41. Deploying the mmpose keypoint detection model on RKNN for model acceleration


Basic idea: a self-trained keypoint detection model needs to be deployed on the rk3399pro platform, so here is a record of the whole process.

Link: https://pan.baidu.com/s/1q9hNysHmhbiqRYXFeinzkQ?pwd=rp9u  Extraction code: rp9u
(shared via Baidu Netdisk)

I. Related links. The self-training procedure and the ONNX conversion are covered in my earlier post "48、mmpose关键点识别模型转ncnn和mnn,并进行训练和部署" (sxj731533730, CSDN). That model is used as the base model here, since it can already be deployed on the ncnn and MNN platforms and is easy to train. Download the system image and packages officially provided by Rockchip, then set up the environment and perform the model conversion as described in "35、ubuntu20.04搭建瑞芯微的npu仿真环境和测试rv1126的Debain系统下的yolov5+npu检测功能以及RKNN推理部署" (sxj731533730, CSDN).


1) Directory structure

ubuntu@ubuntu:~/Downloads/rknn-toolkit-1.7.1/examples/onnx/mmpose$ tree 
.
├── 0.jpeg
├── dataset.txt
├── hrnet_w32_macaque_256x192-f7e9e04f_20210407_sim.onnx
└── onnx2rknn.py

dataset.txt contents (this file lists the images used for quantization calibration, one image path per line)

ubuntu@ubuntu:~/Downloads/rknn-toolkit-1.7.1/examples/onnx/mmpose$ cat dataset.txt 
0.jpeg

2) Model conversion: the conversion script (onnx2rknn.py)

from rknn.api import RKNN

ONNX_MODEL = 'hrnet_w32_macaque_256x192-f7e9e04f_20210407.onnx'
RKNN_MODEL = 'hrnet_w32_macaque_256x192-f7e9e04f_20210407.rknn'

if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN(verbose=True)

    # pre-process config
    print('--> config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], reorder_channel='0 1 2',
                target_platform='rk3399pro',
                quantized_dtype='asymmetric_affine-u8', optimization_level=3, output_optimize=1)
    print('done')

    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset='dataset.txt')  # , pre_compile=True
    if ret != 0:
        print('Build hrnet_w32_macaque_256x192-f7e9e04f_20210407 failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export RKNN model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export hrnet_w32_macaque_256x192-f7e9e04f_20210407_sim.rknn failed!')
        exit(ret)
    print('done')

    rknn.release()

Conversion log

(rknnpy36) ubuntu@ubuntu:~/Downloads/rknn-toolkit-1.7.1/examples/onnx/mmpose$ python3 onnx2rknn.py 


D Process Add_Add_666_144 ...
D RKNN output shape(add): (1 64 48 32)
D Tensor @Add_Add_666_144:out0 type: float32
D Process Conv_Conv_544_279 ...


D Packing Initializer_1865_14 ...
D output tensor id = 0, name = Conv_Conv_816/out0_0
D input tensor id = 1, name = input.1_768
I Build config finished.
done
--> Export RKNN model
done
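
Note that pre_compile is left commented out in the build call above. In RKNN-Toolkit 1.x, building with pre_compile=True is meant to speed up model loading on the device, but as far as I know a pre-compiled .rknn can no longer be run by the PC simulator, so it is worth exporting it as a separate file for on-device use only. A minimal sketch under those assumptions (the _precompile output name is just an example):

# Sketch: export an additional pre-compiled model for on-device use only.
from rknn.api import RKNN

rknn = RKNN()
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
            reorder_channel='0 1 2', target_platform='rk3399pro',
            quantized_dtype='asymmetric_affine-u8', optimization_level=3)
rknn.load_onnx(model='hrnet_w32_macaque_256x192-f7e9e04f_20210407.onnx')
# pre_compile=True: faster model loading on the rk3399pro NPU,
# but the exported file cannot be loaded by the simulator on the PC.
rknn.build(do_quantization=True, dataset='dataset.txt', pre_compile=True)
rknn.export_rknn('hrnet_w32_macaque_256x192_precompile.rknn')
rknn.release()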

II. Running mmpose keypoint inference in the NPU simulator. Keypoint detection needs an object detector to supply the target bounding box; since no detection code is included here, a fixed, hand-annotated box is hard-coded instead, and YOLOv5 can be plugged in later as the detector.
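
For reference, a minimal sketch of how a detector box could be fed in later. The helper detection_to_pose_bbox and the yolov5_box values are illustrative placeholders, not code from the original script; a real YOLOv5 output in (x1, y1, x2, y2, conf) form only needs this xyxy-to-xywh conversion:

# Sketch: converting a detector's (x1, y1, x2, y2, conf) output into the
# [x, y, w, h, score] list that bbox_xywh2cs() in the script below expects.
def detection_to_pose_bbox(x1, y1, x2, y2, conf):
    return [x1, y1, x2 - x1, y2 - y1, conf]

# Hypothetical detector result; identical values to the hard-coded box below.
yolov5_box = (2.213932e+02, 1.935179e+02, 9.873443e+02, 1.035825e+03, 9.995332e-01)
bbox = detection_to_pose_bbox(*yolov5_box)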

import os
import urllib
import traceback
import time
import sys
import warnings

import numpy as np
import cv2
from rknn.api import RKNN

RKNN_MODEL = "hrnet_w32_macaque_256x192-f7e9e04f_20210407.rknn"
IMG_PATH = "0.jpg"

QUANTIZE_ON = True

def bbox_xywh2cs(bbox, aspect_ratio, padding=1., pixel_std=200.):
    """Transform the bbox format from (x,y,w,h) into (center, scale)

    Args:
        bbox (ndarray): Single bbox in (x, y, w, h)
        aspect_ratio (float): The expected bbox aspect ratio (w over h)
        padding (float): Bbox padding factor that will be multiplied to scale.
            Default: 1.0
        pixel_std (float): The scale normalization factor. Default: 200.0

    Returns:
        tuple: A tuple containing center and scale.
        - np.ndarray[float32](2,): Center of the bbox (x, y).
        - np.ndarray[float32](2,): Scale of the bbox w & h.
    """

    x, y, w, h = bbox[:4]
    center = np.array([x + w * 0.5, y + h * 0.5], dtype=np.float32)

    if w > aspect_ratio * h:
        h = w * 1.0 / aspect_ratio
    elif w < aspect_ratio * h:
        w = h * aspect_ratio

    scale = np.array([w, h], dtype=np.float32) / pixel_std
    scale = scale * padding

    return center, scale


def rotate_point(pt, angle_rad):
    """Rotate a point by an angle.

    Args:
        pt (list[float]): 2 dimensional point to be rotated
        angle_rad (float): rotation angle by radian

    Returns:
        list[float]: Rotated point.
    """
    assert len(pt) == 2
    sn, cs = np.sin(angle_rad), np.cos(angle_rad)
    new_x = pt[0] * cs - pt[1] * sn
    new_y = pt[0] * sn + pt[1] * cs
    rotated_pt = [new_x, new_y]

    return rotated_pt


def _get_3rd_point(a, b):
    """To calculate the affine matrix, three pairs of points are required. This
    function is used to get the 3rd point, given 2D points a & b.

    The 3rd point is defined by rotating vector `a - b` by 90 degrees
    anticlockwise, using b as the rotation center.

    Args:
        a (np.ndarray): point(x,y)
        b (np.ndarray): point(x,y)

    Returns:
        np.ndarray: The 3rd point.
    """
    assert len(a) == 2
    assert len(b) == 2
    direction = a - b
    third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)

    return third_pt


def get_affine_transform(center,
                         scale,
                         rot,
                         output_size,
                         shift=(0., 0.),
                         inv=False):
    """Get the affine transform matrix, given the center/scale/rot/output_size.

    Args:
        center (np.ndarray[2, ]): Center of the bounding box (x, y).
        scale (np.ndarray[2, ]): Scale of the bounding box
            wrt [width, height].
        rot (float): Rotation angle (degree).
        output_size (np.ndarray[2, ] | list(2,)): Size of the
            destination heatmaps.
        shift (0-100%): Shift translation ratio wrt the width/height.
            Default (0., 0.).
        inv (bool): Option to inverse the affine transform direction.
            (inv=False: src->dst or inv=True: dst->src)

    Returns:
        np.ndarray: The transform matrix.
    """
    assert len(center) == 2
    assert len(scale) == 2
    assert len(output_size) == 2
    assert len(shift) == 2

    # pixel_std is 200.
    scale_tmp = scale * 200.0

    shift = np.array(shift)
    src_w = scale_tmp[0]
    dst_w = output_size[0]
    dst_h = output_size[1]

    rot_rad = np.pi * rot / 180
    src_dir = rotate_point([0., src_w * -0.5], rot_rad)
    dst_dir = np.array([0., dst_w * -0.5])

    src = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    src[2, :] = _get_3rd_point(src[0, :], src[1, :])

    dst = np.zeros((3, 2), dtype=np.float32)
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
    dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans


def bbox_xyxy2xywh(bbox_xyxy):
    """Transform the bbox format from x1y1x2y2 to xywh.

    Args:
        bbox_xyxy (np.ndarray): Bounding boxes (with scores), shaped (n, 4) or
            (n, 5). (left, top, right, bottom, [score])

    Returns:
        np.ndarray: Bounding boxes (with scores),
            shaped (n, 4) or (n, 5). (left, top, width, height, [score])
    """
    bbox_xywh = bbox_xyxy.copy()
    bbox_xywh[:, 2] = bbox_xywh[:, 2] - bbox_xywh[:, 0]
    bbox_xywh[:, 3] = bbox_xywh[:, 3] - bbox_xywh[:, 1]

    return bbox_xywh


def _get_max_preds(heatmaps):
    """Get keypoint predictions from score maps.

    Note:
        batch_size: N
        num_keypoints: K
        heatmap height: H
        heatmap width: W

    Args:
        heatmaps (np.ndarray[N, K, H, W]): model predicted heatmaps.

    Returns:
        tuple: A tuple containing aggregated results.

        - preds (np.ndarray[N, K, 2]): Predicted keypoint location.
        - maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.
    """
    assert isinstance(heatmaps,
                      np.ndarray), ('heatmaps should be numpy.ndarray')
    assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'

    N, K, _, W = heatmaps.shape
    heatmaps_reshaped = heatmaps.reshape((N, K, -1))
    idx = np.argmax(heatmaps_reshaped, 2).reshape((N, K, 1))
    maxvals = np.amax(heatmaps_reshaped, 2).reshape((N, K, 1))

    preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
    preds[:, :, 0] = preds[:, :, 0] % W
    preds[:, :, 1] = preds[:, :, 1] // W

    preds = np.where(np.tile(maxvals, (1, 1, 2)) > 0.0, preds, -1)
    return preds, maxvals


def transform_preds(coords, center, scale, output_size, use_udp=False):
    """Get final keypoint predictions from heatmaps and apply scaling and
    translation to map them back to the image.

    Note:
        num_keypoints: K

    Args:
        coords (np.ndarray[K, ndims]):

            * If ndims=2, coords are predicted keypoint locations.
            * If ndims=4, coords are composed of (x, y, scores, tags)
            * If ndims=5, coords are composed of (x, y, scores, tags,
              flipped_tags)

        center (np.ndarray[2, ]): Center of the bounding box (x, y).
        scale (np.ndarray[2, ]): Scale of the bounding box
            wrt [width, height].
        output_size (np.ndarray[2, ] | list(2,)): Size of the
            destination heatmaps.
        use_udp (bool): Use unbiased data processing

    Returns:
        np.ndarray: Predicted coordinates in the images.
    """
    assert coords.shape[1] in (2, 4, 5)
    assert len(center) == 2
    assert len(scale) == 2
    assert len(output_size) == 2

    # Recover the scale which is normalized by a factor of 200.
    scale = scale * 200.0

    if use_udp:
        scale_x = scale[0] / (output_size[0] - 1.0)
        scale_y = scale[1] / (output_size[1] - 1.0)
    else:
        scale_x = scale[0] / output_size[0]
        scale_y = scale[1] / output_size[1]

    target_coords = np.ones_like(coords)
    target_coords[:, 0] = coords[:, 0] * scale_x + center[0] - scale[0] * 0.5
    target_coords[:, 1] = coords[:, 1] * scale_y + center[1] - scale[1] * 0.5

    return target_coords


def keypoints_from_heatmaps(heatmaps,
                            center,
                            scale,
                            unbiased=False,
                            post_process='default',
                            kernel=11,
                            valid_radius_factor=0.0546875,
                            use_udp=False,
                            target_type='GaussianHeatmap'):

    # Avoid being affected
    heatmaps = heatmaps.copy()

    N, K, H, W = heatmaps.shape
    preds, maxvals = _get_max_preds(heatmaps)
    # add +/-0.25 shift to the predicted locations for higher acc.
    for n in range(N):
        for k in range(K):
            heatmap = heatmaps[n][k]
            px = int(preds[n][k][0])
            py = int(preds[n][k][1])
            if 1 < px < W - 1 and 1 < py < H - 1:
                diff = np.array([
                    heatmap[py][px + 1] - heatmap[py][px - 1],
                    heatmap[py + 1][px] - heatmap[py - 1][px]
                ])
                preds[n][k] += np.sign(diff) * .25
                if post_process == 'megvii':
                    preds[n][k] += 0.5

    # Transform back to the image
    for i in range(N):
        preds[i] = transform_preds(
            preds[i], center[i], scale[i], [W, H], use_udp=use_udp)

    if post_process == 'megvii':
        maxvals = maxvals / 255.0 + 0.5

    return preds, maxvals

def decode(output, center, scale, score_, batch_size=1):

    c = np.zeros((batch_size, 2), dtype=np.float32)
    s = np.zeros((batch_size, 2), dtype=np.float32)
    score = np.ones(batch_size)
    for i in range(batch_size):
        c[i, :] = center
        s[i, :] = scale
        score[i] = np.array(score_).reshape(-1)

    preds, maxvals = keypoints_from_heatmaps(
        output,
        c,
        s,
        False,
        'default',
        11,
        0.0546875,
        False,
        'GaussianHeatmap'
    )

    all_preds = np.zeros((batch_size, preds.shape[1], 3), dtype=np.float32)
    all_boxes = np.zeros((batch_size, 6), dtype=np.float32)
    all_preds[:, :, 0:2] = preds[:, :, 0:2]
    all_preds[:, :, 2:3] = maxvals
    all_boxes[:, 0:2] = c[:, 0:2]
    all_boxes[:, 2:4] = s[:, 0:2]
    all_boxes[:, 4] = np.prod(s * 200.0, axis=1)
    all_boxes[:, 5] = score
    result = {}

    result['preds'] = all_preds
    result['boxes'] = all_boxes

    print(result)
    return result


def draw(bgr, predict_dict, skeleton):
    bboxes = predict_dict["boxes"]
    for box in bboxes:
        cv2.rectangle(bgr, (int(box[0]), int(box[1])), (int(box[0]) + int(box[2]), int(box[1]) + int(box[3])),
                      (255, 0, 0))

    all_preds = predict_dict["preds"]
    for all_pred in all_preds:
        for x, y, s in all_pred:
            cv2.circle(bgr, (int(x), int(y)), 3, (0, 255, 120), -1)
        for sk in skeleton:
            x0 = int(all_pred[sk[0]][0])
            y0 = int(all_pred[sk[0]][1])
            x1 = int(all_pred[sk[1]][0])
            y1 = int(all_pred[sk[1]][1])
            cv2.line(bgr, (x0, y0), (x1, y1), (0, 255, 0), 1)
    cv2.imwrite("result.jpg", bgr)

if __name__ == "__main__":

    # Create RKNN object
    rknn = RKNN()

    if not os.path.exists(RKNN_MODEL):
        print("model not exist")
        exit(-1)

    # Load RKNN model
    print("--> Loading model")
    ret = rknn.load_rknn(RKNN_MODEL)
    if ret != 0:
        print("Load rknn model failed!")
        exit(ret)
    print("done")

    # init runtime environment
    print("--> Init runtime environment")
    ret = rknn.init_runtime()
    if ret != 0:
        print("Init runtime environment failed")
        exit(ret)
    print("done")

    # Detection box (x, y, w, h, score); normally supplied by a detector such as YOLOv5
    bbox = [2.213932e+02, 1.935179e+02, 9.873443e+02 - 2.213932e+02, 1.035825e+03 - 1.935179e+02, 9.995332e-01]
    image_size = [192, 256]
    src_img = cv2.imread(IMG_PATH)
    img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)  # hwc rgb
    aspect_ratio = image_size[0] / image_size[1]
    img_height = img.shape[0]
    img_width = img.shape[1]
    padding = 1.25
    pixel_std = 200
    center, scale = bbox_xywh2cs(
        bbox,
        aspect_ratio,
        padding,
        pixel_std)
    trans = get_affine_transform(center, scale, 0, image_size)
    img = cv2.warpAffine(
        img,
        trans, (int(image_size[0]), int(image_size[1])),
        flags=cv2.INTER_LINEAR)
    print(trans)
    # img = np.transpose(img, (2, 0, 1)).astype(np.float32)  # chw rgb
    # outputs = rknn.inference(inputs=[img], data_type=None, data_format="nchw")[0]
    # img[0, ...] = ((img[0, ...] / 255.0) - 0.485) / 0.229
    # img[1, ...] = ((img[1, ...] / 255.0) - 0.456) / 0.224
    # img[2, ...] = ((img[2, ...] / 255.0) - 0.406) / 0.225
    # Inference
    print("--> Running model")
    start = time.clock()
    outputs = rknn.inference(inputs=[img])[0]
    # end time
    end = time.clock()
    # elapsed time
    runTime = end - start
    runTime_ms = runTime * 1000
    # print the elapsed time
    print("Inference time:", runTime_ms, "ms")

    print(outputs)
    predict_dict = decode(outputs, center, scale, bbox[-1])
    skeleton = [[15, 13], [13, 11], [16, 14], [14, 12], [11, 12], [5, 11], [6, 12], [5, 6], [5, 7], [6, 8], [7, 9],
                [8, 10], [1, 2], [0, 1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6]]
    draw(src_img, predict_dict, skeleton)
    rknn.release()
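
To double-check the heatmap decoding above: the model outputs a 1x17x64x48 tensor, _get_max_preds takes the per-keypoint argmax of the flattened 64x48 map (x = idx % W, y = idx // W), and transform_preds maps that heatmap coordinate back into the original image through the box center and scale. A minimal sanity-check sketch with a synthetic heatmap (the center/scale values here are made-up examples, not the ones from the test image):

# Sketch: sanity check of _get_max_preds() / transform_preds() on a fake heatmap.
import numpy as np

N, K, H, W = 1, 1, 64, 48
heatmaps = np.zeros((N, K, H, W), dtype=np.float32)
heatmaps[0, 0, 40, 20] = 1.0                          # peak at x=20, y=40

preds, maxvals = _get_max_preds(heatmaps)
print(preds[0, 0], maxvals[0, 0])                     # -> [20. 40.] [1.]

center = np.array([600.0, 615.0], dtype=np.float32)   # example box center
scale = np.array([4.8, 6.4], dtype=np.float32)        # example box scale, (w, h) / 200
img_xy = transform_preds(preds[0], center, scale, [W, H])
print(img_xy)  # heatmap pixel scaled by (scale * 200) / (W, H) and shifted around center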

Simulation result

(simulation result image: keypoints and skeleton drawn on the test picture, saved as result.jpg)

III. C++ test on the RK3399Pro

CMakeLists.txt

cmake_minimum_required(VERSION 3.16)
project(untitled10)
set(CMAKE_CXX_FLAGS "-std=c++11")
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ")

include_directories(${CMAKE_SOURCE_DIR})
include_directories(${CMAKE_SOURCE_DIR}/include)
find_package(OpenCV REQUIRED)
#message(STATUS ${OpenCV_INCLUDE_DIRS})
# add header search paths
include_directories(${OpenCV_INCLUDE_DIRS})
# import the RKNN runtime library
add_library(librknn_api SHARED IMPORTED)
set_target_properties(librknn_api PROPERTIES IMPORTED_LOCATION ${CMAKE_SOURCE_DIR}/lib/librknn_api.so)
# librknn_api.so comes from the official Rockchip repo; see my earlier posts for where to obtain it

add_executable(untitled10 main.cpp)
target_link_libraries(untitled10 ${OpenCV_LIBS} librknn_api )

main.cpp (test code)

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <queue>
#include "rknn_api.h"
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <chrono>
#include <iostream>

using namespace std;

struct Keypoints {
float x;
float y;
float score;

Keypoints() : x(0), y(0), score(0) {}

Keypoints(float x, float y, float score) : x(x), y(y), score(score) {}
};

struct Box {
float center_x;
float center_y;
float scale_x;
float scale_y;
float scale_prob;
float score;

Box() : center_x(0), center_y(0), scale_x(0), scale_y(0), scale_prob(0), score(0) {}

Box(float center_x, float center_y, float scale_x, float scale_y, float scale_prob, float score) :
center_x(center_x), center_y(center_y), scale_x(scale_x), scale_y(scale_y), scale_prob(scale_prob),
score(score) {}
};

void bbox_xywh2cs(float bbox[], float aspect_ratio, float padding, float pixel_std, float *center, float *scale) {
float x = bbox[0];
float y = bbox[1];
float w = bbox[2];
float h = bbox[3];
*center = x + w * 0.5;
*(center + 1) = y + h * 0.5;

if (w > aspect_ratio * h)
h = w * 1.0 / aspect_ratio;
else if (w < aspect_ratio * h)
w = h * aspect_ratio;


*scale = (w / pixel_std) * padding;
*(scale + 1) = (h / pixel_std) * padding;

}

void rotate_point(float *pt, float angle_rad, float *rotated_pt) {
float sn = sin(angle_rad);
float cs = cos(angle_rad);
float new_x = pt[0] * cs - pt[1] * sn;
float new_y = pt[0] * sn + pt[1] * cs;
rotated_pt[0] = new_x;
rotated_pt[1] = new_y;

}

void _get_3rd_point(cv::Point2f a, cv::Point2f b, float *direction) {

float direction_0 = a.x - b.x;
float direction_1 = a.y - b.y;
direction[0] = b.x - direction_1;
direction[1] = b.y + direction_0;


}

void get_affine_transform(float *center, float *scale, float rot, float *output_size, float *shift, bool inv,
cv::Mat &trans) {
float scale_tmp[] = {0, 0};
scale_tmp[0] = scale[0] * 200.0;
scale_tmp[1] = scale[1] * 200.0;
float src_w = scale_tmp[0];
float dst_w = output_size[0];
float dst_h = output_size[1];
float rot_rad = M_PI * rot / 180;
float pt[] = {0, 0};
pt[0] = 0;
pt[1] = src_w * (-0.5);
float src_dir[] = {0, 0};
rotate_point(pt, rot_rad, src_dir);
float dst_dir[] = {0, 0};
dst_dir[0] = 0;
dst_dir[1] = dst_w * (-0.5);
cv::Point2f src[3] = {cv::Point2f(0, 0), cv::Point2f(0, 0), cv::Point2f(0, 0)};
src[0] = cv::Point2f(center[0] + scale_tmp[0] * shift[0], center[1] + scale_tmp[1] * shift[1]);
src[1] = cv::Point2f(center[0] + src_dir[0] + scale_tmp[0] * shift[0],
center[1] + src_dir[1] + scale_tmp[1] * shift[1]);
float direction_src[] = {0, 0};
_get_3rd_point(src[0], src[1], direction_src);
src[2] = cv::Point2f(direction_src[0], direction_src[1]);
cv::Point2f dst[3] = {cv::Point2f(0, 0), cv::Point2f(0, 0), cv::Point2f(0, 0)};
dst[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5);
dst[1] = cv::Point2f(dst_w * 0.5 + dst_dir[0], dst_h * 0.5 + dst_dir[1]);
float direction_dst[] = {0, 0};
_get_3rd_point(dst[0], dst[1], direction_dst);
dst[2] = cv::Point2f(direction_dst[0], direction_dst[1]);

if (inv) {
trans = cv::getAffineTransform(dst, src);
} else {
trans = cv::getAffineTransform(src, dst);
}


}

void
transform_preds(std::vector<cv::Point2f> coords, std::vector<Keypoints> &target_coords, float *center, float *scale,
int w, int h, bool use_udp = false) {
float scale_x[] = {0, 0};
float temp_scale[] = {scale[0] * 200, scale[1] * 200};
if (use_udp) {
scale_x[0] = temp_scale[0] / (w - 1);
scale_x[1] = temp_scale[1] / (h - 1);
} else {
scale_x[0] = temp_scale[0] / w;
scale_x[1] = temp_scale[1] / h;
}
for (int i = 0; i < coords.size(); i++) {
target_coords[i].x = coords[i].x * scale_x[0] + center[0] - temp_scale[0] * 0.5;
target_coords[i].y = coords[i].y * scale_x[1] + center[1] - temp_scale[1] * 0.5;
}

}


void letterbox(cv::Mat rgb, cv::Mat &img_resize, int target_width, int target_height) {

float shape_0 = rgb.rows;
float shape_1 = rgb.cols;
float new_shape_0 = target_height;
float new_shape_1 = target_width;
float r = std::min(new_shape_0 / shape_0, new_shape_1 / shape_1);
float new_unpad_0 = int(round(shape_1 * r));
float new_unpad_1 = int(round(shape_0 * r));
float dw = new_shape_1 - new_unpad_0;
float dh = new_shape_0 - new_unpad_1;
dw = dw / 2;
dh = dh / 2;
cv::Mat copy_rgb = rgb.clone();
if (int(shape_0) != int(new_unpad_0) && int(shape_1) != int(new_unpad_1)) {
cv::resize(copy_rgb, img_resize, cv::Size(new_unpad_0, new_unpad_1));
copy_rgb = img_resize;
}
int top = int(round(dh - 0.1));
int bottom = int(round(dh + 0.1));
int left = int(round(dw - 0.1));
int right = int(round(dw + 0.1));
cv::copyMakeBorder(copy_rgb, img_resize, top, bottom, left, right, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));

}

void printRKNNTensor(rknn_tensor_attr *attr) {
printf("index=%d name=%s n_dims=%d dims=[%d %d %d %d] n_elems=%d size=%d "
"fmt=%d type=%d qnt_type=%d fl=%d zp=%d scale=%f\n",
attr->index, attr->name, attr->n_dims, attr->dims[3], attr->dims[2],
attr->dims[1], attr->dims[0], attr->n_elems, attr->size, 0, attr->type,
attr->qnt_type, attr->fl, attr->zp, attr->scale);
}

int post_process_u8(uint8_t *input0, int model_in_h, int model_in_w) {


return 0;
}

int main(int argc, char **argv) {
float keypoint_score = 0.1f;
cv::Mat bgr = cv::imread("../0.jpg");
cv::Mat rgb;
cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);
float image_target_w = 256;
float image_target_h = 192;
float padding = 1.25;
float pixel_std = 200;
float aspect_ratio = image_target_h / image_target_w;
float bbox[] = {2.213932e+02, 1.935179e+02, 9.873443e+02, 1.035825e+03,
9.995332e-01};// normally this box comes from a detector; here it is a hand-annotated box given as x1 y1 x2 y2 score and converted to x y w h below
bbox[2] = bbox[2] - bbox[0];
bbox[3] = bbox[3] - bbox[1];
float center[2] = {0, 0};
float scale[2] = {0, 0};
bbox_xywh2cs(bbox, aspect_ratio, padding, pixel_std, center, scale);
float rot = 0;
float shift[] = {0, 0};
bool inv = false;
float output_size[] = {image_target_h, image_target_w};
cv::Mat trans;
get_affine_transform(center, scale, rot, output_size, shift, inv, trans);
std::cout << trans << std::endl;
cv::Mat detect_image;//= cv::Mat::zeros(image_target_w ,image_target_h, CV_8UC3);
cv::warpAffine(rgb, detect_image, trans, cv::Size(image_target_h, image_target_w), cv::INTER_LINEAR);
const char *model_path = "../hrnet_w32_macaque_256x192-f7e9e04f_20210407.rknn";
// Load model
FILE *fp = fopen(model_path, "rb");
if (fp == NULL) {
printf("fopen %s fail!\n", model_path);
return -1;
}
fseek(fp, 0, SEEK_END);
int model_len = ftell(fp);
void *model = malloc(model_len);
fseek(fp, 0, SEEK_SET);
if (model_len != fread(model, 1, model_len, fp)) {
printf("fread %s fail!\n", model_path);
free(model);
return -1;
}


rknn_context ctx = 0;

int ret = rknn_init(&ctx, model, model_len, 0);
if (ret < 0) {
printf("rknn_init fail! ret=%d\n", ret);
return -1;
}

/* Query sdk version */
rknn_sdk_version version;
ret = rknn_query(ctx, RKNN_QUERY_SDK_VERSION, &version,
sizeof(rknn_sdk_version));
if (ret < 0) {
printf("rknn_init error ret=%d\n", ret);
return -1;
}
printf("sdk version: %s driver version: %s\n", version.api_version,
version.drv_version);


/* Get input,output attr */
rknn_input_output_num io_num;
ret = rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));
if (ret < 0) {
printf("rknn_init error ret=%d\n", ret);
return -1;
}
printf("model input num: %d, output num: %d\n", io_num.n_input,
io_num.n_output);

rknn_tensor_attr input_attrs[io_num.n_input];
memset(input_attrs, 0, sizeof(input_attrs));
for (int i = 0; i < io_num.n_input; i++) {
input_attrs[i].index = i;
ret = rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]),
sizeof(rknn_tensor_attr));
if (ret < 0) {
printf("rknn_init error ret=%d\n", ret);
return -1;
}
printRKNNTensor(&(input_attrs[i]));
}

rknn_tensor_attr output_attrs[io_num.n_output];
memset(output_attrs, 0, sizeof(output_attrs));
for (int i = 0; i < io_num.n_output; i++) {
output_attrs[i].index = i;
ret = rknn_query(ctx, RKNN_QUERY_OUTPUT_ATTR, &(output_attrs[i]),
sizeof(rknn_tensor_attr));
printRKNNTensor(&(output_attrs[i]));
}

int input_channel = 3;
int input_width = 0;
int input_height = 0;
if (input_attrs[0].fmt == RKNN_TENSOR_NCHW) {
printf("model is NCHW input fmt\n");
input_width = input_attrs[0].dims[0];
input_height = input_attrs[0].dims[1];
printf("input_width=%d input_height=%d\n", input_width, input_height);
} else {
printf("model is NHWC input fmt\n");
input_width = input_attrs[0].dims[1];
input_height = input_attrs[0].dims[2];
printf("input_width=%d input_height=%d\n", input_width, input_height);
}

printf("model input height=%d, width=%d, channel=%d\n", input_height, input_width,
input_channel);


/* Init input tensor */
rknn_input inputs[1];
memset(inputs, 0, sizeof(inputs));
inputs[0].index = 0;
inputs[0].buf = detect_image.data;
inputs[0].type = RKNN_TENSOR_UINT8;
inputs[0].size = input_width * input_height * input_channel;
inputs[0].fmt = RKNN_TENSOR_NHWC;
inputs[0].pass_through = 0;

/* Init output tensor */
rknn_output outputs[io_num.n_output];
memset(outputs, 0, sizeof(outputs));
for (int i = 0; i < io_num.n_output; i++) {
outputs[i].want_float = 1;
}
printf("img.cols: %d, img.rows: %d\n", detect_image.cols, detect_image.rows);
auto t1 = std::chrono::steady_clock::now();
rknn_inputs_set(ctx, io_num.n_input, inputs);
std::chrono::steady_clock::time_point now = std::chrono::steady_clock::now();

ret = rknn_run(ctx, NULL);
if (ret < 0) {
printf("ctx error ret=%d\n", ret);
return -1;
}
auto t2 = std::chrono::steady_clock::now();
std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - now);
std::cout << "It took me " << time_span.count()*1000 << " ms."<<std::endl;
ret = rknn_outputs_get(ctx, io_num.n_output, outputs, NULL);
if (ret < 0) {
printf("outputs error ret=%d\n", ret);
return -1;
}
int shape_b = 0;
int shape_c = 0;
int shape_w = 0;
int shape_h = 0;
for (int i = 0; i < io_num.n_output; ++i) {
shape_b = output_attrs[i].dims[3];
shape_c = output_attrs[i].dims[2];
shape_h = output_attrs[i].dims[1];
shape_w = output_attrs[i].dims[0];
}
printf("batch=%d channel=%d width=%d height= %d\n", shape_b, shape_c, shape_w, shape_h);
std::vector<float> vec_result_heap;
float *output = (float *) outputs[0].buf;
for (int i = 0; i < shape_c; i++) {
for (int j = 0; j < shape_h; j++) {
for (int k = 0; k < shape_w; k++) {
float elements = output[i * shape_w * shape_h + j * shape_w + k];
vec_result_heap.emplace_back(elements);
}
}

}

std::vector<Keypoints> all_preds;
std::vector<int> idx;
for (int i = 0; i < shape_c; i++) {
auto begin = vec_result_heap.begin() + i * shape_w * shape_h;
auto end = vec_result_heap.begin() + (i + 1) * shape_w * shape_h;
float maxValue = *max_element(begin, end);
int maxPosition = max_element(begin, end) - begin;
all_preds.emplace_back(Keypoints(0, 0, maxValue));
idx.emplace_back(maxPosition);
}
std::vector<cv::Point2f> vec_point;
for (int i = 0; i < idx.size(); i++) {
int x = idx[i] % shape_w;
int y = idx[i] / shape_w;
vec_point.emplace_back(cv::Point2f(x, y));
}

for (int i = 0; i < shape_c; i++) {
int px = vec_point[i].x;
int py = vec_point[i].y;
if (px > 1 && px < shape_w - 1 && py > 1 && py < shape_h - 1) {
float diff_0 = vec_result_heap[py * shape_w + px + 1] - vec_result_heap[py * shape_w + px - 1];
float diff_1 = vec_result_heap[(py + 1) * shape_w + px] - vec_result_heap[(py - 1) * shape_w + px];
vec_point[i].x += diff_0 == 0 ? 0 : (diff_0 > 0) ? 0.25 : -0.25;
vec_point[i].y += diff_1 == 0 ? 0 : (diff_1 > 0) ? 0.25 : -0.25;
}
}
std::vector<Box> all_boxes;
bool heap_map = false;
if (heap_map) {
all_boxes.emplace_back(Box(center[0], center[1], scale[0], scale[1], scale[0] * scale[1] * 400, bbox[4]));
}
transform_preds(vec_point, all_preds, center, scale, shape_w, shape_h);
int skeleton[][2] = {{15, 13},
{13, 11},
{16, 14},
{14, 12},
{11, 12},
{5, 11},
{6, 12},
{5, 6},
{5, 7},
{6, 8},
{7, 9},
{8, 10},
{1, 2},
{0, 1},
{0, 2},
{1, 3},
{2, 4},
{3, 5},
{4, 6}};


cv::rectangle(bgr, cv::Point(bbox[0], bbox[1]), cv::Point(bbox[0] + bbox[2], bbox[1] + bbox[3]),
cv::Scalar(255, 0, 0));
for (int i = 0; i < all_preds.size(); i++) {
if (all_preds[i].score > keypoint_score) {
cv::circle(bgr, cv::Point(all_preds[i].x, all_preds[i].y), 3, cv::Scalar(0, 255, 120), -1);// draw the keypoint as a filled circle
}
}
for (int i = 0; i < sizeof(skeleton) / sizeof(skeleton[0]); i++) {
int x0 = all_preds[skeleton[i][0]].x;
int y0 = all_preds[skeleton[i][0]].y;
int x1 = all_preds[skeleton[i][1]].x;
int y1 = all_preds[skeleton[i][1]].y;

cv::line(bgr, cv::Point(x0, y0), cv::Point(x1, y1),
cv::Scalar(0, 255, 0), 1);
}
cv::imwrite("image.jpg", bgr);
ret = rknn_outputs_release(ctx, io_num.n_output, outputs);

if (ret < 0) {
printf("rknn_query fail! ret=%d\n", ret);
goto Error;
}

Error:
if (ctx > 0)
ret=rknn_destroy(ctx);
printf("%d \n",ret);
//the official firmware seems to have a problem destroying the context, or perhaps my firmware is at fault: https://t.rock-chips.com/forum.php?mod=viewthread&tid=2365
if (model)
free(model);
if (fp)
fclose(fp);
return 0;
}

Test output

/tmp/tmp.aWYhSoFJw3/cmake-build-debug/untitled10
[0.2005350017384178, -0, -25.19709322776945;
-2.939193609270158e-18, 0.2005350017384178, 4.736860156114635]
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.6.1 (00c4d8b build: 2021-03-15 16:31:37)
D RKNNAPI: DRV: 1.7.1 (0cfd4a1 build: 2021-12-10 09:43:11)
D RKNNAPI: ==============================================
sdk version: 1.6.1 (00c4d8b build: 2021-03-15 16:31:37) driver version: 1.7.1 (0cfd4a1 build: 2021-12-10 09:43:11)
model input num: 1, output num: 1
index=0 name=input.1_790 n_dims=4 dims=[1 3 256 192] n_elems=147456 size=147456 fmt=0 type=3 qnt_type=2 fl=0 zp=0 scale=0.003922
index=0 name=Conv_Conv_816/out0_0 n_dims=4 dims=[1 17 64 48] n_elems=52224 size=52224 fmt=0 type=3 qnt_type=2 fl=0 zp=4 scale=0.002980
model is NCHW input fmt
input_width=192 input_height=256
model input height=256, width=192, channel=3
img.cols: 192, img.rows: 256
It took me 0.190758 ms.
batch=1 channel=17 width=48 height= 64
W RKNNAPI: rknn_destroy, recv(MsgUnloadAck) fail!
E RKNNAPI: __pthread_recv_msg, recv(MsgHeader) fail, -9(ERROR_PIPE) < 108!

Process finished with exit code 0

Result on the test image

(on-device result image: keypoints and skeleton drawn on the test picture, saved as image.jpg)

References

Index of /pypi/simple/

/home/ubuntu/rknn-toolkit-1.7.1/doc/Rockchip_User_Guide_RKNN_Toolkit_V1.7.1_CN.pdf
