Running YOLO (Yolov8, Yolov5, Yolov6, YoloX, PPYolo) on RockChip NPU (RK3566, RK3568, RK3588, RK3576)

  • Published on Nov 2, 2024

Comments • 60

  • @AntonMaltsev
    @AntonMaltsev  10 months ago +2

    IMPORTANT!!
    RockChip updated the structure of RKNN-Toolkit and ModelZoo completely from 1.6.0.
    Here is the video with update - th-cam.com/video/VjmnH910fac/w-d-xo.html

  • @姚宏轩
    @姚宏轩 11 months ago +4

    I followed your step and it works well on orangepi5 plus board. Thanks a lot.

  • @wolpumba4099
    @wolpumba4099 9 months ago +3

    *Intro*
    - 0:01: The speaker discusses the increasing popularity of Rockchip in the computer vision world.
    - 0:15: Acknowledges the popularity of Rockchip and mentions addressing two common problems in this video.
    - 0:23: Highlights changes and updates in the Rockchip ecosystem over the years.
    - 0:47: Promises an up-to-date guide as of October 2023 to address running different YOLO models on Rockchip NPUs.
    *System Setup and Issues*
    - 1:47: Introduces the system used, a Rockchip RK3568 board, and mentions testing on a different board.
    - 2:57: Discusses potential problems like driver installation, using sudo commands, and NPU detection issues on specific boards.
    - 3:24: Acknowledges the guide may not be universally excellent for all boards and recommends adjusting for specific cases.
    *Logic of YOLO Network Setup*
    - 3:38: Introduces the logic of preparing a YOLO network for Rockchip NPUs.
    - 3:46: Emphasizes the need to understand the process: preparing, exporting, and running the network on the board.
    *Step 1: Set Up the System*
    - 4:49: Describes the first step, which involves SSH connection, using a script to set up the system on an empty board.
    - 5:12: Advises modifying parameters in the script based on the specific board and system.
    - 7:17: Stresses the importance of updating Python versions, with a focus on Python 3.7 in this case.
    *Step 2: Set Up RKNN YOLOv8 Export*
    - 11:25: Introduces the second step, setting up the YOLOv8 wrapper from Rockchip for exporting the model.
    - 13:33: Describes the process of setting up the YOLOv8 repository, creating a new environment, and running the export script.
    *Step 3: Set Up Rknn-Toolkit*
    - 13:54: Introduces the third step, setting up Rknn-Toolkit on an x86 machine to export the model in RKNN format.
    - 15:16: Highlights the complexity of environment setup, particularly for compatibility with specific Python versions and GCC.
    *Step 4: Run Network on RockChip*
    - 20:08: Details the fourth step, running the network on the Rockchip device after successfully exporting the model.
    - 20:31: Discusses modifications needed in the code to ensure compatibility with Rockchip and the successful execution of the network.
    *Modifications to YOLOv8 Code*
    - 25:09: Discusses reshaping in the code and the use of a specific function.
    - 25:22: Mentions making modifications in the future and moves to the next point.
    - 25:40: Talks about additional adjustments in the code.
    - 25:49: Addresses a specific issue in the code, possibly an error.
    - 25:56: Indicates a need for fixing something in the future.
    *Translation and Code Modification*
    - 26:03: Introduces the task of translating the code to English.
    - 26:10: Begins the translation process and addresses the first modification.
    - 26:16: Mentions being at a specific point in the code.
    - 26:23: Discusses running a particular section of the code.
    - 26:29: Highlights the need for translation to English.
    - 26:38: Mentions a lot of modifications that need to be made.
    - 26:46: Addresses a potential error related to model path.
    - 26:56: Refers to specifying the target in the official documentation.
    - 27:02: Talks about specifying the target for the system and discusses the use of default values.
    - 27:11: Emphasizes the importance of specifying certain parameters.
    *Testing and Additional Code Modifications*
    - 27:17: Discusses the need to specify a particular ID.
    - 27:25: Mentions using default data and displaying results.
    - 27:33: Verifies that necessary modifications have been saved.
    - 27:44: Initiates a test of the code.
    - 27:50: Discovers a potential issue and considers removing torch.
    - 28:00: Realizes that torch has been removed and addresses the issue.
    - 28:10: Corrects the model path error.
    - 28:26: Specifies the correct path for the exported model.
    - 28:41: Continues with additional modifications.
    - 28:52: Makes another adjustment in the code.
    - 28:58: Initiates further changes.
    - 29:23: Refers to an application and confirms everything is working.
    - 29:32: Describes the image display through the interface.
    - 29:39: Acknowledges the functionality of the code.
    - 29:46: Comments on the legitimacy of the results.
    - 29:52: Addresses the potential saving location of results.
    - 29:59: Considers the need to specify a specific folder for saving.
    - 30:07: Expresses uncertainty about the saving process.
    - 30:13: Acknowledges the current functionality without modifications.
    - 30:19: Mentions the necessity of adjustments for better performance.
    - 30:26: Confirms the successful conversion of the code.
    - 30:33: Discusses final adjustments needed at runtime without parameters.
    - 30:40: Indicates the conclusion of the code-related discussion.
    Disclaimer: I used ChatGPT 3.5 to summarize the video transcript. This method may make mistakes in recognizing words.
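The export pipeline outlined in Steps 2-4 above can be sketched as a call sequence. This is a hedged sketch based on the rknn-toolkit2 Python API; the paths, normalization values, and target platform are placeholders for your own setup, and the RKNN object is passed in so the sequence can be followed without the toolkit installed:

```python
def export_to_rknn(rknn, onnx_path, rknn_path,
                   target_platform='rk3568', dataset=None):
    """Convert an ONNX model to RKNN format; returns rknn_path on success.

    `rknn` is an RKNN() object from rknn-toolkit2. Mean/std values and the
    target platform below are placeholders - adjust them to your model/board.
    """
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
                target_platform=target_platform)
    if rknn.load_onnx(model=onnx_path) != 0:
        raise RuntimeError('load_onnx failed')
    # Quantize only when a calibration dataset file is given
    if rknn.build(do_quantization=dataset is not None, dataset=dataset) != 0:
        raise RuntimeError('build failed')
    if rknn.export_rknn(rknn_path) != 0:
        raise RuntimeError('export_rknn failed')
    return rknn_path

# On the x86 side, usage would look roughly like:
# from rknn.api import RKNN
# export_to_rknn(RKNN(), 'yolov8n.onnx', 'yolov8n.rknn', 'rk3568', './dataset.txt')
```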

  • @EdjeElectronics
    @EdjeElectronics 9 months ago +2

    Great work man! Keep it up!

    • @AntonMaltsev
      @AntonMaltsev  9 months ago

      Appreciate it! Thank you!

  • @stelioskoroneos3872
    @stelioskoroneos3872 1 year ago +1

    Thanks for the video.
    Do you have any idea how the RockChip NPU would perform on YOLO compared to an RPi + Google Coral and/or an Nvidia Jetson Nano?

    • @AntonMaltsev
      @AntonMaltsev  1 year ago +1

      For the YOLOv5 there are a few of my tests here - medium.com/@zlodeibaal/choosing-computer-vision-board-in-2022-b27eb4ca7a7c

  • @LJC-zl4ye
    @LJC-zl4ye 10 months ago

    The rknn_model_zoo library has recently been updated with yolov8-seg content, but when I tested it, I found that its post_process function takes a long time, and modifying the dfl function as shown in the video has no obvious effect. How should the post_process function be modified to improve the calculation speed? Here is the post_process function, thanks.
    def post_process(input_data):
        # input_data[0], input_data[4], and input_data[8] are detection box information
        # input_data[1], input_data[5], and input_data[9] are category score information
        # input_data[2], input_data[6], and input_data[10] are confidence score information
        # input_data[3], input_data[7], and input_data[11] are segmentation information
        # input_data[12] is the proto information
        proto = input_data[-1]
        boxes, scores, classes_conf, seg_part = [], [], [], []
        defualt_branch = 3
        pair_per_branch = len(input_data) // defualt_branch
        for i in range(defualt_branch):
            boxes.append(box_process(input_data[pair_per_branch * i]))
            classes_conf.append(input_data[pair_per_branch * i + 1])
            scores.append(np.ones_like(input_data[pair_per_branch * i + 1][:, :1, :, :], dtype=np.float32))
            seg_part.append(input_data[pair_per_branch * i + 3])

        def sp_flatten(_in):
            ch = _in.shape[1]
            _in = _in.transpose(0, 2, 3, 1)
            return _in.reshape(-1, ch)

        boxes = [sp_flatten(_v) for _v in boxes]
        classes_conf = [sp_flatten(_v) for _v in classes_conf]
        scores = [sp_flatten(_v) for _v in scores]
        seg_part = [sp_flatten(_v) for _v in seg_part]

        boxes = np.concatenate(boxes)
        classes_conf = np.concatenate(classes_conf)
        scores = np.concatenate(scores)
        seg_part = np.concatenate(seg_part)

        # filter according to threshold
        boxes, classes, scores, seg_part = filter_boxes(boxes, scores, classes_conf, seg_part)

        zipped = zip(boxes, classes, scores, seg_part)
        sort_zipped = sorted(zipped, key=lambda x: (x[2]), reverse=True)
        result = zip(*sort_zipped)

        max_nms = 30000
        n = boxes.shape[0]  # number of boxes
        if not n:
            return None, None, None, None
        elif n > max_nms:  # excess boxes
            boxes, classes, scores, seg_part = [np.array(x[:max_nms]) for x in result]
        else:
            boxes, classes, scores, seg_part = [np.array(x) for x in result]

        nboxes, nclasses, nscores, nseg_part = [], [], [], []
        agnostic = 0
        max_wh = 7680
        c = classes * (0 if agnostic else max_wh)
        ids = torchvision.ops.nms(
            torch.tensor(boxes, dtype=torch.float32) + torch.tensor(c, dtype=torch.float32).unsqueeze(-1),
            torch.tensor(scores, dtype=torch.float32), NMS_THRESH)
        real_keeps = ids.tolist()[:MAX_DETECT]
        nboxes.append(boxes[real_keeps])
        nclasses.append(classes[real_keeps])
        nscores.append(scores[real_keeps])
        nseg_part.append(seg_part[real_keeps])

        if not nclasses and not nscores:
            return None, None, None, None

        boxes = np.concatenate(nboxes)
        classes = np.concatenate(nclasses)
        scores = np.concatenate(nscores)
        seg_part = np.concatenate(nseg_part)

        ph, pw = proto.shape[-2:]
        proto = proto.reshape(seg_part.shape[-1], -1)
        seg_img = np.matmul(seg_part, proto)
        seg_img = sigmoid(seg_img)
        seg_img = seg_img.reshape(-1, ph, pw)

        seg_threadhold = 0.5
        # crop seg outside box
        seg_img = F.interpolate(torch.tensor(seg_img)[None], torch.Size([640, 640]), mode='bilinear', align_corners=False)[0]
        seg_img_t = _crop_mask(seg_img, torch.tensor(boxes))
        seg_img = seg_img_t.numpy()
        seg_img = seg_img > seg_threadhold
        return boxes, classes, scores, seg_img
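One concrete speed-up along the lines discussed in the video is keeping the dfl (distribution focal loss) decode entirely in NumPy rather than torch. This is a hedged sketch assuming the model-zoo layout of 4 box sides times mc softmax bins per position; verify mc against your model's output channels:

```python
import numpy as np

def dfl_numpy(position):
    """DFL decode: softmax over mc bins per box side, then the expectation.

    position: array of shape (n, 4*mc, h, w); returns (n, 4, h, w).
    """
    n, c, h, w = position.shape
    p_num = 4            # four box sides (left, top, right, bottom)
    mc = c // p_num      # bins per side (16 for YOLOv8)
    y = position.reshape(n, p_num, mc, h, w)
    # numerically stable softmax over the bin axis
    e = np.exp(y - y.max(axis=2, keepdims=True))
    y = e / e.sum(axis=2, keepdims=True)
    # expectation over bin indices 0..mc-1
    acc = np.arange(mc, dtype=np.float32).reshape(1, 1, mc, 1, 1)
    return (y * acc).sum(axis=2)
```

Since each decoded value is a convex combination of the bin indices, outputs always fall in [0, mc-1].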

  • @jnassarm
    @jnassarm 3 months ago

    Thanks for your amazing video, Anton... Do you have a video explaining Rock 5 + YOLOv8 + dual Edge Coral TPU M.2? Thanks. Regards

  • @sergpetrusinski
    @sergpetrusinski 11 months ago

    Hi Anton, do you have a similar video for the Jetson Nano / Ubuntu 20.04 with a YOLO detector?

  • @gabrielnilo6101
    @gabrielnilo6101 1 year ago

    what do you recommend for computer vision (CV) with $50 limit?
    And what do you recommend for CV with $150 limit?

    • @AntonMaltsev
      @AntonMaltsev  1 year ago

      It depends on the task and skills.
      For $50:
      For someone who is not familiar with embedded and doesn't need high performance, I can recommend an old RPi - thepihut.com/products/raspberry-pi-3-model-a-plus
      A lot of guides and a small number of bugs. But pretty slow.
      For someone who is familiar with embedded and neural networks, the RockChip 3568 looks nice. There were ~$35 boards a year ago. I don't know if it's still in production.
      Also, there are a lot of Arduino-like boards.
      --------------
      For $150 - a lot of boards are available. You should consider the resources and the task.

  • @rafiliya6258
    @rafiliya6258 2 months ago +1

    How much time does it take for one picture? What is the resolution of the pictures?

    • @AntonMaltsev
      @AntonMaltsev  2 months ago

      I am showing different benchmarks in this video - th-cam.com/video/mDRfXNuIMBE/w-d-xo.html
      Also, in this video, I show different approaches with parallelization that can improve throughput almost six times.

  • @StasGT
    @StasGT 5 months ago

    Well done!

    • @StasGT
      @StasGT 5 months ago

      P.S. I want to buy an Orange Pi 5 Pro - it has LPDDR5, so it should be a bit faster. As I understand it, the Rockchip NPU supports attention layers. That's cool, because Google Coral can't do that. Although... it can do straightforward matrix multiplication, so writing a transformer class shouldn't be hard. Hmm... why hasn't anyone done this yet...? I'll get to it in the next few days and run DETR on the Coral.

    • @AntonMaltsev
      @AntonMaltsev  5 months ago

      The Coral is very old. And there are a lot of differences in the architecture...

  • @LJC-zl4ye
    @LJC-zl4ye 11 months ago +1

    Is there any way to increase the inference speed of RKNN models in Python? Currently the NPU usage is a bit low; how can we increase it?

    • @AntonMaltsev
      @AntonMaltsev  11 months ago

      I did not experiment with this. There are some samples with multithreading - github.com/leafqycc/rknn-multi-threaded (Python) - github.com/leafqycc/rknn-cpp-Multithreading (C++) and this github.com/thanhtantran/rknn-multi-threaded-3588 (C++).
      Maybe a few parallel processes can speed this up.
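A minimal sketch of what such parallel submission could look like. The infer() function here is just a placeholder; in real use each worker would hold its own RKNNLite context and run the model there, as in the linked repositories:

```python
import concurrent.futures
import numpy as np

def infer(frame):
    # Placeholder for per-frame NPU inference; in real use each worker
    # would own an RKNNLite context and call model.inference(inputs=[frame]).
    return float(frame.mean())

# Dummy workload: 16 black frames
frames = [np.zeros((640, 640, 3), np.uint8) for _ in range(16)]

# Submitting frames to a small pool keeps several inferences in flight,
# overlapping pre-/post-processing with NPU execution.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(infer, frames))
```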

    • @LJC-zl4ye
      @LJC-zl4ye 11 months ago

      I have tried using multiprocessing in the following way, adapting the code from the yolo_map_test_rknn.py file of rknn_model_zoo, but found no change in speed.
      import concurrent.futures

      def process_frame(frame):
          img_src = frame
          img = co_helper.letter_box(im=img_src.copy(), new_shape=(IMG_SIZE[1], IMG_SIZE[0]), pad_color=(0, 0, 0))
          img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
          input_data = img
          outputs = model.run([input_data])
          boxes, classes, scores = post_process(outputs, anchors, args)
          img_p = img_src.copy()
          if boxes is not None:
              draw(img_p, co_helper.get_real_box(boxes), scores, classes)
          return img_p

      cap = cv2.VideoCapture(0)
      pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)
      while True:
          ret, frame = cap.read()
          if not ret:
              break
          future = pool.submit(process_frame, frame)
          cv2.imshow("full post process result", future.result())
          if cv2.waitKey(1) & 0xFF == ord('q'):
              break
      pool.shutdown(wait=True)
      cap.release()
      cv2.destroyAllWindows()
      @@AntonMaltsev

    • @AntonMaltsev
      @AntonMaltsev  11 months ago

      @@LJC-zl4ye, you may be limited by the camera speed here: "cap = cv2.VideoCapture(0)"
      Just try it with inference on a fixed image, something like "frame = np.zeros((480,640,3), np.uint8)".
      Your camera may have a 30/60 FPS limitation.
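A minimal benchmark along those lines, with a stub infer() in place of the real model.run() call, a fixed frame, and no camera read or imshow inside the loop:

```python
import time
import numpy as np

def infer(frame):
    # Stand-in for model.run([frame]); swap in the real NPU inference call.
    time.sleep(0.001)

# Fixed image as suggested above: no camera, no display in the timed loop
frame = np.zeros((480, 640, 3), np.uint8)
n = 50
t0 = time.time()
for _ in range(n):
    infer(frame)
fps = n / (time.time() - t0)
print(f"pure inference FPS: {fps:.1f}")
```

If this number is well above 30 while the camera loop stays at 30, the camera (not the NPU) is the bottleneck.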

    • @LJC-zl4ye
      @LJC-zl4ye 11 months ago

      @@AntonMaltsev I've tried this, but the average FPS is still only 30.

    • @AntonMaltsev
      @AntonMaltsev  11 months ago

      @@LJC-zl4ye did you remove "ret, frame = cap.read()", "cv2.imshow", "cv2.waitKey(1)", etc. from the loop?

  • @AaquibNiyama-k1s
    @AaquibNiyama-k1s 3 months ago

    Which camera can we use with the Orange Pi 5B? Can we use the RPi camera module?

    • @AntonMaltsev
      @AntonMaltsev  3 months ago

      I am unsure, but I think the Orange Pi 5 (and some other Orange Pi boards) uses a different camera format than other boards. They did not support standard CSI; they used some internal format. But maybe there are some adapters.

    • @AaquibNiyama-k1s
      @AaquibNiyama-k1s 3 months ago

      @@AntonMaltsev Can you please tell me which cost-effective camera modules would work, because the Orange Pi camera module is costly.

    • @AntonMaltsev
      @AntonMaltsev  3 months ago +1

      @@AaquibNiyama-k1s I never used CSI with the Orange Pi. I used some USB cameras, or different boards with a regular CSI interface.

  • @gundanium
    @gundanium 1 year ago

    Have you been able to get YOLO to run on the Rockchip with C++ rather than through Python?

    • @AntonMaltsev
      @AntonMaltsev  1 year ago

      The model obtained with this approach should run under C++. The only problem is that you need to rewrite NMS and all the other functions applied to the model output.
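As an illustration of the kind of NMS that would need re-implementing, here is a hedged pure-NumPy sketch (boxes assumed in [x1, y1, x2, y2] layout, any per-class offsets applied beforehand); a C++ port would follow the same loop:

```python
import numpy as np

def nms_numpy(boxes, scores, iou_thresh=0.45):
    """Greedy NMS. boxes: (n, 4) as [x1, y1, x2, y2]; returns kept indices
    ordered by descending score."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]       # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best remaining box against the rest
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop boxes overlapping the kept one too much
        order = order[1:][iou <= iou_thresh]
    return keep
```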

  • @adityasaiakella1287
    @adityasaiakella1287 11 months ago

    Does it work on the Rockchip RV1103?

  • @armenkalaidjian4494
    @armenkalaidjian4494 9 months ago

    Anton, will all of this work on the Radxa Zero 3W?

    • @AntonMaltsev
      @AntonMaltsev  9 months ago

      In general, it should, especially if it's not the 1 GB version.
      As for speed - I don't know. It seems to be slightly slower than the 3568.

    • @armenkalaidjian4494
      @armenkalaidjian4494 9 months ago

      @@AntonMaltsev I'll get my board, with 4 GB, at the end of February. I'll test the speed. How many frames per second did you manage to squeeze out with yolov8s?

  • @ARMENIA181
    @ARMENIA181 1 year ago

    If the model is trained for 640, is it possible to run a camera at 1280x736? Thanks.

    • @AntonMaltsev
      @AntonMaltsev  1 year ago +1

      There are a few approaches to this:
      1) Reshape the input to 640 size
      2) Export the model at 1280x736
      However, the best practice is to use the model with the same parameters you used when training it.
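Approach 1 is usually done as a letterbox resize, which preserves the aspect ratio and pads the rest. A hedged NumPy-only sketch (nearest-neighbour resampling for self-containedness; real code would use cv2.resize and cv2.copyMakeBorder):

```python
import numpy as np

def letterbox(img, new_shape=(640, 640), pad_value=0):
    """Resize preserving aspect ratio, pad the remainder.

    Returns (padded image, scale ratio, (left, top) padding offsets),
    which are needed later to map detections back to the original frame.
    """
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    nh, nw = int(round(h * r)), int(round(w * r))
    # nearest-neighbour index maps (stand-in for cv2.resize)
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((new_shape[0], new_shape[1]) + img.shape[2:], pad_value,
                  dtype=img.dtype)
    top = (new_shape[0] - nh) // 2
    left = (new_shape[1] - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out, r, (left, top)
```

For a 1280x736 camera frame and a 640x640 model, this scales by 0.5 and pads the top and bottom.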

  • @AntonMaltsev
    @AntonMaltsev  10 months ago

    IMPORTANT!!!
    The video is for rknn-toolkit2 version 1.5.2 and below.
    The structure was completely updated from 1.6.0. I hope to record a new video soon.

    • @upsangelhk
      @upsangelhk 10 months ago

      Thank you for the heads-up; definitely looking forward to that.

  • @mrriverhe4768
    @mrriverhe4768 10 months ago

    Can you provide a blog post?

  • @benx1326
    @benx1326 1 year ago

    What is the performance of the RK3588 on YOLOv5s and YOLOv8n?

    • @AntonMaltsev
      @AntonMaltsev  1 year ago

      Didn't test it, sorry. Also, when you ask such questions, you need to specify the size of the image: 224x224 vs 640x640 gives you almost a 10x difference in speed. You can check my results for the RK3568 here:
      medium.com/@zlodeibaal/choosing-computer-vision-board-in-2022-b27eb4ca7a7c
      th-cam.com/video/Sadmw6Rrj1Y/w-d-xo.html

  • @ocamlmail
    @ocamlmail 11 months ago

    Thanks, as usual, for the video. But there is no sound and probably won't be. I mean, it's very quiet.

  • @halilozcan8
    @halilozcan8 1 year ago

    What would the performance be in real time?

    • @AntonMaltsev
      @AntonMaltsev  1 year ago

      I tested a few networks a year ago. All the results are in my previous video or in the Medium article:
      medium.com/@zlodeibaal/choosing-computer-vision-board-in-2022-b27eb4ca7a7c
      th-cam.com/video/Sadmw6Rrj1Y/w-d-xo.html

  • @simpleded5454
    @simpleded5454 1 year ago +3

    Good afternoon, Anton! I've been watching your channel for a long time, but for the past year I've suffered because your videos come out in English. The audience hasn't grown much, and as far as I understand, most of it is still Russian, so I don't see much point in releasing videos in this format. For foreign listeners it would be easier to publish an article on Medium, since listening to clumsy English is not very pleasant. Sorry for the negativity - just a small cry from the heart.
    P.S. If you want to learn English, it's better to hire a tutor and sign up for group classes; the process will go faster.

    • @AntonMaltsev
      @AntonMaltsev  1 year ago +2

      1) The audience has grown a lot
      2) Foreign clients have appeared
      3) I've met a bunch of CEOs of different companies
      I consider the experiment a success and will continue :)

    • @AntonMaltsev
      @AntonMaltsev  1 year ago +1

      telegra.ph/Neskolko-slov-pro-blog-i-vokrug-09-16 - statistics from a year ago. Everything is much better now, plus I've learned to work with this a bit more carefully.

    • @simpleded5454
      @simpleded5454 1 year ago

      @@AntonMaltsev If this really benefits you in that way, then I'm happy for you :) But then we'll be waiting for your language skills to level up 😏😏😏

    • @igormotskin
      @igormotskin 1 year ago

      If you have unique content and people need the information, they will watch it whether it's in Chinese or in Russian. Then again, if you can solve a CEO's problem, they will come to you no matter what your level of English is.
      But the language definitely needs improving. Improving significantly.
      In any case, we wish you well and creative success 😇

  • @okay730
    @okay730 8 months ago

    Should I change the classes in the yolo_map_test.py file?

  • @mrriverhe4768
    @mrriverhe4768 10 months ago

    Presenting it with text and images would be better.

  • @cvabds
    @cvabds 8 months ago +1

    Subscribe

  • @robotmovil
    @robotmovil 7 months ago

    When I try to run the code (or other NPU-related code) I get:
    E RKNN: [22:57:09.806] failed to open rknpu module, need to insmod rknpu dirver!
    E RKNN: [22:57:09.806] failed to open rknn device!
    E Catch exception when init runtime!
    This is on an Orange Pi 5, Ubuntu 22.04 downloaded from the Orange Pi website.

    • @AntonMaltsev
      @AntonMaltsev  7 months ago

      I don't have any idea about this.
      It seems that the NPU or the drivers are not correctly installed.
      Are you fully on RKNN version 1.6?