Hello, I trained for 300 epochs according to the cloud data provided by the author and the obtained bounding boxes (bbox) are incorrect. It seems that there is an issue with the coordinates annotation file for the bbox, just as you mentioned. If it were my own dataset, following the steps provided by the author should work correctly, shouldn't it?
I am training the model on a CVAT dataset with 3 classes (bounding boxes) and 11 keypoints, but training fails with this error: "C:\Users\prime\anaconda3\envs\yolo_ultra_8083\lib\site-packages\ultralytics\data\dataset.py", line 139, in get_labels len_cls, len_boxes, len_segments = (sum(x) for x in zip(*lengths)) ValueError: not enough values to unpack (expected 3, got 0)". Can you please give a reference or any data for training multiple classes and their respective keypoints?
That happened to me when I forgot to include the bounding boxes. I would go to your saved dataset in CVAT and check that all the bounding boxes and keypoints are really there.
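One way to narrow this error down before training is to check that every image has a non-empty label file. The sketch below is only an illustration, not code from the video: the function name and the folder paths in the usage note are hypothetical, and it assumes the usual layout where each image in an images folder has a .txt file with the same name in a parallel labels folder.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unlabeled_images(images_dir, labels_dir):
    """Return names of images whose label .txt file is missing or empty."""
    problems = []
    for image_path in sorted(Path(images_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore cache files and anything that is not an image
        label_path = Path(labels_dir) / (image_path.stem + ".txt")
        if not label_path.is_file() or not label_path.read_text().strip():
            problems.append(image_path.name)
    return problems
```

For example, find_unlabeled_images("data/images/train", "data/labels/train") lists the images to re-check in CVAT. An empty label file can be intentional for a background image, so treat the result as a hint rather than a verdict.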
Congratulations! I did everything according to your guide, but I'm getting an error: "No images found in D:\keypoint_detection\leg\data\labels\train.cache, training may not work correctly.". I've tried everything, but the error persists. Is there any way I can contact you to provide the files and code so you can tell me where I made a mistake?
Hi, I have already annotated images (composed of just non-normalized coordinates) without the bounding box. How can I convert my annotation to match yolov8?
Hey, you need the bounding box as well to use yolov8. If annotating the bounding boxes is not a possibility, you could estimate it by taking the bounding box that encloses your keypoints. 🙌
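The suggestion above, estimating the box from the keypoints, could look roughly like this. It is a minimal sketch under stated assumptions: the function name and the margin parameter are made up for illustration, the keypoints are (x, y) pixel coordinates, and the output uses the two-value (x, y) keypoint format without a visibility flag. OpenCV's cv2.boundingRect returns a similar box for an array of points; plain Python is used here so the example has no dependencies.

```python
def keypoints_to_yolo_pose_line(class_id, keypoints, img_w, img_h, margin=0.0):
    """Build one label line: class, box (xc yc w h), then x y per keypoint, all normalized."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    # Smallest axis-aligned box enclosing the keypoints, padded by margin and clipped to the image
    x_min = max(min(xs) - margin, 0.0)
    y_min = max(min(ys) - margin, 0.0)
    x_max = min(max(xs) + margin, float(img_w))
    y_max = min(max(ys) + margin, float(img_h))
    box = [
        (x_min + x_max) / 2 / img_w,  # box center x
        (y_min + y_max) / 2 / img_h,  # box center y
        (x_max - x_min) / img_w,      # box width
        (y_max - y_min) / img_h,      # box height
    ]
    values = box + [c for x, y in keypoints for c in (x / img_w, y / img_h)]
    return " ".join([str(class_id)] + [f"{v:.6f}" for v in values])
```

A box that touches the outermost keypoints is usually tighter than the real object, so a margin of a few pixels may be a better approximation.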
A good step-by-step project walkthrough, from annotation and a local hello world to cloud training and analysing and testing the final results! Do you offer a full-length, Udemy-like course?
I have a question: if I need keypoint detection for two kinds of objects, such as people and animals, must the label file contain the object id, bounding box, and keypoint set of both objects?
Thanks for sharing! By the way, how many images should I annotate for pose detection with 5 keypoints to work well? What are the key factors that affect this quantity?
@@ComputerVisionEngineer Thank you! I tried with 500 and 50 epochs and the results improved a lot compared to my previous training using Roboflow. The only problem I noticed with Google Colab is that the session restarts after a few hours.
@@ComputerVisionEngineer Oh, that should work for my case as well. Thank you again. After I finish my current project I'll take your courses; your explanation was so clear.
Thank you for this tutorial. I have a question: what should I do with images where some keypoints are not present? How do I annotate such images while maintaining the order of the annotation?
Excellent tutorial - it has been incredibly beneficial. I greatly appreciate the sharing of your knowledge. I've gained much insight from it. Thank you immensely for your effort. However, I've encountered an issue when exporting annotations in CVAT 1.1 format: only one bounding box and its associated keypoints are exported, even though there are two boxes and two sets of keypoints present. Are you aware of how to address this problem? Additionally, could you provide guidance on how to train these annotations with YOLOv8?
Hi, thank you! Glad the content is helpful. Not sure what could be going on, does it happen every time you try to export an image with two objects annotated on it?
Thank you for your kind words. I can't do it every time I have more than two sets of annotations. And I want to know how to train more than one set of keypoints.
Can we have different key point numbers for different classes? as an example, I want to annotate my data which has 2 different classes. Class 1 has 3 key points and class 2 has 6 key points. Is it possible to train yolov8 on it? if yes how should we annotate data and prepare the config.yaml file? appreciate any help.
I would advise you to use the same keypoints in your different classes. If this is not possible, the keypoints the classes have in common should have the same 'meaning'. Take a look at the video: some animals have antlers and others don't. For the animals without antlers, the antler keypoints are annotated with a visibility flag of 0 (which means not labeled). All the remaining keypoints have the same 'meaning' in all the animals (eyes, ears, legs, etc.). 💪
Hi Felipe, I followed your tutorial. However, I have a problem with corrupted images or labels: "ignoring corrupt image/label: labels require 11 columns each". I get this for all my images. Do you know where it comes from?
Your CVAT to COCO keypoints format converter does not work for multiple classes. I see all 3 of my classes in the CVAT format, but in the converted output I only see class id 0 at the start of every line.
Hi there, Sorry that this reply is not to your question. I wanted to know if you manually annotated all the images in your dataset or there was a way to do annotation faster. I was working on another dataset which has 12000 train images and 2800 test images. But it doesn't have annotation.txt files for labels. Can you suggest something? Regards
Hey, In this tutorial I show you how to produce annotations using CVAT, but those are not the labels I use to train the model. The dataset I used to train the model was already annotated with the format. 🙌
I didn't manually annotate all the images in the dataset. Depending on the situation, there may be ways to speed up the annotation process; for example, you could 'pre-annotate' the data using the predictions from another model.
@@ComputerVisionEngineer Hello, and thanks for your videos. I've started with object detection for a project and I find your videos the best, most precise and helpful! However, I am still struggling a little. I used your code to convert the xml file to yolo format, but as you pointed out here as well I only have format. So one question: do I need the visibility variable, or is it just as well possible to just use the format, and what difference does it make then?
One follow-up question: In the video, you annotate with CVAT, but your actual annotations have the third visibility dimension for the keypoints. What tool do you recommend that includes that ability to include the visibility dimension? I don't need it for what I'm doing this week, but I'll need it next week. I appreciate any advice you can offer that will save me from trial/error more tools.
Not sure what could be the best tool to label the third visibility dimension keypoint, I haven't used any so far. The dataset I used in the video was already annotated. Sorry I can't help you with this. 🙌
@@ComputerVisionEngineer Thanks for the reply. Looks like RectLabel is capable but the bbox annotation it is producing along with the keypoints is generated as a box around the keypoints, not an additional one around the object. Not exactly what I want so I'm pondering combining a bbox with the keypoints and then swapping the box through a script. If that tool had decent (aka: ANY) tutorial videos it would be a lot easier to use.
Hi Felipe, is it possible to share with us the annotated dataset from the video, or a subset of it, since annotating 600 images is quite a lot of work? Thanks for your great tutorial on the topic.
Is labeling each key-point with visibility flags [0-1-2] any different from labeling them with visibility flags [0-1] or [0-2]? I mean using visibility flag 1 for occluded key-points.
Good, but not good enough to actually bind a skeleton and make it move, so what is it useful for? Also, I would have used a multi-class approach for the different parts of the body.
Thank you for the very useful video. I followed the steps you described for custom data annotation, but there is a problem: after converting from xml to .txt format using your code, how can we get the train.cache and val.cache files that need to be present in the 'labels' folder? Please let me know how to solve this, because without the cache files I'm unable to train on the custom dataset. Thanks in advance.
Hi, great tutorial as always, you are an inspiration for many people. I wonder if the "bounding box" of the classes could be an ellipse or a polygon. Can YOLO recognize this kind of annotation? If it can, how should I edit the yaml file? I noticed that the yaml file has no configuration for the bounding box, so it looks like the rectangle is the default.
To the best of my knowledge, yolov8 object detection only supports rectangular shapes. You may find other technologies to do polygon detection. Alternatively, you could do instance segmentation with yolov8; by doing so you will detect objects by producing a mask with an arbitrary shape. 🙌
Thank you for your efforts. This seems to be really the best tutorial about yolo pose. But I have a question: You annotated with cvat and then saved a cvat format. Then you used a (39,3) dimension during training. How did the cvat to yolo file create the third dimension. I do the same, but I cannot create the third dimension (visibility) with cvat annotation. How should I create the 3 dimensional keypoint annotation with cvat???
Hey, in this video I used CVAT only to show an example of how to annotate images to train a pose detector. CVAT doesn't support the visibility flag; only 2-dimensional xy annotation is supported. The dataset I used to train the pose detector in this video wasn't annotated with CVAT, though. The dataset is publicly available, and (x, y, v) annotations are provided. 🙌
COCO Annotator creates a json file, but the ultralytics JSON2YOLO conversion does not convert it properly. Do I have to write a converter for COCO, like you did for CVAT?
Turning a COCO format json file into YOLO format was the easiest solution. These days the open source community is getting weaker; every piece of free code is used to make pennies for companies.
Thanks for the video. When annotating data, say an antelope, what happens if the antelope is female and does not have antlers? Do we skip them, or do we annotate imaginary antlers?
Ideally, it is better to use the x, y, v format for these situations. If you are using the x, y, v format you can label those keypoints as 0, 0, 0. If you are using the x, y format you could apply different criteria; one of them could be annotating the keypoints where they 'should be located' (in the case of non-existing antlers I would locate the keypoints on the head). But again, this is not ideal; try to use the x, y, v format in those situations. 🙌
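As a small illustration of the 0, 0, 0 convention, the sketch below writes the keypoint part of a label line in a fixed order and fills in any keypoint that was not annotated. The function and keypoint names are hypothetical, and it assumes the COCO-style visibility values discussed in this thread (0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible), marking every annotated point as 2.

```python
def keypoint_fields(annotated, keypoint_names):
    """Return 'x y v' fields in a fixed order; unlabeled keypoints become '0 0 0'."""
    fields = []
    for name in keypoint_names:
        if name in annotated:
            x, y = annotated[name]  # normalized coordinates in [0, 1]
            fields.append(f"{x:.6f} {y:.6f} 2")  # 2 = labeled and visible
        else:
            fields.append("0 0 0")  # 0 = not labeled (e.g. antlers on an animal without them)
    return " ".join(fields)
```

Keeping keypoint_names as one shared list for the whole dataset is what preserves the keypoint order from image to image.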
Can we extract the x,y,z coordinates of a certain landmark with yolo v8 ?? I saw this in one of the mediapipe solutions, but that solution is not customizable on a custom dataset. So I'm looking for the alternative
What a high-quality tutorial, it's going to be very useful for several people. Let me just ask a question, can I follow these steps if there were more than one animal in the image, that is, different instances for the same class in the image? Would I need to modify anything in the annotations?
Thank you! If there is more than one object in the images you would need to adjust the code in CVAT_to_cocoKeypoints.py. In the txt files with the labels, there should be a line for every object in the image; each line should follow the structure I explain in the video. Other than that everything I explain in this tutorial should work fine! 😃🙌
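A rough sketch of what "one line for every object" means in practice is shown below. It is not the code from CVAT_to_cocoKeypoints.py: the function name, the dictionary keys and the class_ids mapping are all assumptions made for the example, and the box and keypoint values are expected to be normalized already.

```python
def write_label_file(path, objects, class_ids):
    """Write one line per object: class id, box (xc yc w h), then the keypoint values."""
    lines = []
    for obj in objects:
        values = list(obj["box"]) + list(obj["keypoints"])
        # Look up the class id by label name instead of always writing 0
        fields = [str(class_ids[obj["label"]])] + [f"{v:.6f}" for v in values]
        lines.append(" ".join(fields))
    with open(path, "w") as label_file:
        label_file.write("".join(line + "\n" for line in lines))
```

Looking the class id up from the label name also covers the multi-class case, where every line would otherwise start with 0.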
@@ComputerVisionEngineer Great, thanks. By the way, one tip I would add is to use skeleton annotation. In this format, CVAT exports in COCO keypoints (json) format, and then you can use RectLabel to convert from COCO json to Yolov8 format.
My data has two object classes and every object has multiple keypoints (23 and 11). I was getting an error for having a heterogeneous number of points, so I made them regular by zero-padding the end of the 2nd object. Now I have 74 values in each row: 1 id + 4 bounding box values + (23*3) = 69 keypoint values. Each image has both objects, so I have two lines. My yaml looks like this:
# number of classes
nc: 2
names:
  0: A
  1: B
nkpt: 46
kpt_shape: [46, 3]
But the model says it is looking for 46*3 = 138 + 5, in total 143 values, so every image is reported as corrupt. Please help me resolve this.
My bad. Since you have 2 objects with 23 points each, just replace these lines in the yaml: nkpt: 23 and kpt_shape: [23, 3]. The rest of the structure stays the same.
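The arithmetic in this exchange (5 + 23*3 = 74 values per row) can be checked against the label files before training. The helper names below are hypothetical; the sketch only counts whitespace-separated values per row and compares them with what a given kpt_shape implies.

```python
def expected_columns(num_keypoints, dims):
    """Columns per label row: class id + 4 box values + dims values per keypoint."""
    return 5 + num_keypoints * dims


def bad_rows(label_text, num_keypoints, dims):
    """Return (line number, column count) for rows that do not match kpt_shape."""
    expected = expected_columns(num_keypoints, dims)
    mismatches = []
    for number, row in enumerate(label_text.splitlines(), start=1):
        count = len(row.split())
        if count and count != expected:  # blank lines are skipped
            mismatches.append((number, count))
    return mismatches
```

With kpt_shape: [23, 3] the expected count is 74, and with 5 keypoints in x, y, v format it is 20, which matches the numbers quoted elsewhere in this thread.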
I am struggling with running the training script. It can find 11 images when I run it but says all 11 are corrupted. Are the images required to be of a specific type? I am using jpg
@@ComputerVisionEngineer Sorry, you were indeed correct, it had nothing to do with the image file type. I have struggled to make it work on data I annotated through CVAT, and I suspect that it might be because the converted COCO label data needs to have the 3rd value, visibility, in order to work. I annotated the data using Roboflow instead, which outputs the labels with this 3rd value, and it worked fine.
Hi Felipe, how do you think if we use a larger model for pretraining will we have a better result? I.e. yolov8m-pose, yolov8l-pose, yolov8x-pose etc. And another question - it looks like while labeling you didn't use the variant v=1 labeled but not visible - do you think it's not important? Thanks.
Hey Konstantin, using a larger model could produce better results. According to performance metrics on yolov8 website larger models achieve a higher average precision. Do you mean during my custom labeling I didn't use v=1? As far as I know CVAT doesn't support the (x, y, v) format for keypoints annotation. Since I used CVAT as my tool, I only used the (x, y) format. I wouldn't say 'it is not important', if you have visibility information for all keypoints, it's even better. But if you don't, you can still try training the model using the (x, y) format. 💪💪
Thanks for your video, but when I follow you I get an error: ValueError: not enough values to unpack (expected 3, got 0) I tried to find the solution on github but still can't, can you help me? Thanks a lot
Hi there! I am trying to find the xy coordinates of the keypoints on my test image. I tried your code to print the keypoints, but it throws an error: 'Keypoints' object has no attribute 'tolist'. Could you help me with this? I typed exactly what's in the video. I just changed the path to my trained model and test image.
@@ComputerVisionEngineer I downloaded the latest ultralytics as I started working with this recently. I wanted to infer and print the keypoint values on test image, so I can compare it with ground truth value. I am not able to infer the keypoints from test image. Could you help me with the code?
@@ComputerVisionEngineer Hi, thanks for your video. I used the same version, ultralytics==8.0.83, and I get: 'Keypoints' object has no attribute 'tolist'. Could you help me with this?
@@BettyLQC Use this code to predict on test images and extract the predicted keypoint coordinates:
from ultralytics import YOLO
model = YOLO('path to your trained model .pt file')
results = model('image file path')  # predict on an image
xy_values = []  # flat list: x0, y0, x1, y1, ...
for xy in results[0].keypoints[0].xy[0].cpu().numpy():
    xy_values.extend(xy)
If you want the normalised keypoint values, replace 'xy' with 'xyn'.
I have detected an object through its keypoints. Can I also track the object using the keypoints, rather than other information such as the bbox coordinates?
Object tracking algorithms I am aware of take the bounding box. Nevertheless, there is some research around object tracking using keypoints. If you prefer using keypoints take a look at some papers on this topic. 💪
@@owenviolet-s5c I'm trying to use the keypoints on an object to localize it. The idea is to acquire the grasping point of the object using yolo (the center of the first two keypoints; the first keypoint is used to calculate the rotation). I have tried v8 and it works pretty well at detecting bboxes and the kpts inside each bbox. To calculate the coordinates of the grasping point for your object (I assume), you need to write some more simple code that calculates the center of the frame based on the detected kpts. The model predicts the kpts in the specific order in which you labelled them; as in this video, the order of keypoints matters. Good luck!
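The "simple code" mentioned above might look something like the following. This is only a guess at the idea, not the commenter's code: the function name is made up, the grasp point is taken as the midpoint of the first two keypoints, and the rotation is assumed to be the angle of the direction from that midpoint to the first keypoint.

```python
import math


def grasp_point_and_angle(keypoints):
    """Return the midpoint of the first two keypoints and an orientation angle in degrees."""
    (x0, y0), (x1, y1) = keypoints[0], keypoints[1]
    center = ((x0 + x1) / 2, (y0 + y1) / 2)
    # Assumed definition: direction from the midpoint towards the first keypoint.
    # In image coordinates y grows downwards, so positive angles turn clockwise on screen.
    angle = math.degrees(math.atan2(y0 - center[1], x0 - center[0]))
    return center, angle
```

Because the result depends on which keypoint comes first, this only works if the keypoints are labelled in a consistent order.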
@@Nufall if your data is comprised of 5 points with the x, y, v format you should have 20 values for every object in your annotations file Class_id, xc, yc, w, h,
Hey, if you are looking for the data I used to train the pose detector in this tutorial, take a look at the Readme file of this project's repository. 🙌
Do you mean the annotations? Do you have an annotated keypoints detection dataset you can use in this project? If not, you can download the one I am using in this video!
For those who get the errors "np.numpy array expected" and "images corrupted": all annotation values must be in numpy format like 0.000000, not 0.0000000000000. I fixed this by adding format(value_here, '.6f') to the "convert to coco" script from the video, for every value that is written to the file. The resulting file then has all the values in the correct format, such as 0.161122, 0.112343 instead of 0.23433111111112, etc.
Did you enjoy this video? Try my premium courses! 😃🙌😊
● Hands-On Computer Vision in the Cloud: Building an AWS-based Real Time Number Plate Recognition System bit.ly/3RXrE1Y
● End-To-End Computer Vision: Build and Deploy a Video Summarization API bit.ly/3tyQX0M
● Computer Vision on Edge: Real Time Number Plate Recognition on an Edge Device bit.ly/4dYodA7
● Machine Learning Entrepreneur: How to start your entrepreneurial journey as a freelancer and content creator bit.ly/4bFLeaC
Learn to create AI-based prototypes in the Computer Vision School! www.computervision.school 😃🚀🎓
Is there a single master course to learn Computer Vision from A to Z?
This is the first video I could find on building a custom dataset and training pose detection for something other than human pose. Thank you so much.
🙂You are welcome! Glad you enjoyed it! 💪🙌
How to mark visibility of any particular point?
These are the best yolo walkthroughs hands down. Thanks for sharing!
You are welcome! 😃🙌
Even after about a year, this video is still my reference for tackling the pose estimation workflow!
Excellent!! Brilliant. The best tutorial on pose estimation and keypoints training, a thousand thanks. The simplicity and humility of the chosen ones, of true geniuses. Thank you, thank you, thank you... I've already subscribed...
Thank you for your support! I'm glad the video was useful to you! 😃🙌
Spectacular video Felipe !
Greetings from Brazil 🇧🇷
Thank you, Otavio! I am glad you enjoyed it! 😃🙌
Excellent tutorial, really very helpful. I learnt a lot and trained on my own custom dataset. Thank you so much for such an intuitive tutorial 😊😊
😃 Glad it was helpful!! You are welcome!! 🙌
Thank you so much. I had been stuck on v7 for two days trying to train on my custom dataset; the tricky part is the tensor size, since I changed the number of keypoints, which differs from the 17 kpt skeleton detection. Today I tried v8 after watching your video, and it was amazing, so convenient! Selecting a good model really does save time lol
Yeah, agreed! I am glad you found the video useful! 🙌
@@ComputerVisionEngineer really useful and helpful. Looking forward to your new videos in the future! Thanks
@@leomatero2524 oh yes, it is wrong, thank you so much for noticing it! I will fix it ASAP 💪🙌
@@ComputerVisionEngineer no worries, annotation format is always tricky 🤣
@@leomatero2524 Could you please tell me what the issue with the annotation format was?
I do not find words to thank you, you did a great video!
Thank you for your kind words, Hamzawi! 😊 Glad you enjoyed it! 😃💪
@@ComputerVisionEngineer I have only one question, I hope you can answer me please, if I want only the key points without the bounding box around the object, is that possible or each object should have both bbox and keypoints. I have a project where the data is annotated with keypoints without any bounding boxes. I watched your video, you added bbox coordinate for each object. Can I train Yolov8 with keypoints detection without object detection (bounding boxes)?
In order to train the keypoint detector you need both; keypoints and also the bounding box. If you only have the keypoints you could calculate an approximated bounding box by using OpenCV to find the bounding box that encloses the keypoints. 🙌
@@ComputerVisionEngineer Thank you so much. Yes, I checked that Yolov8 needs both and after a long preprocessing, I managed to do it. I trained it with both bounding boxes and key points and I achieved high mAP. Thank you so much for your answer. By the way, I did not find any YouTubers who explained the parameters of Yolov8 or the dashboard that appears during training (e.g., df_loss, pose_loss). If you could explain that in one of your future videos, I would really appreciate it.
This is the best project walkthrough I have found so far. By the way, I love the Italian accent. It sounds like the Godfather himself is teaching you 😂
😂 Thank you! My native language is Spanish, though. 🙂
This video is great for pose detection. Thank you so much.
Your videos are so good and easy to follow. I've really learned a lot, thank you.
😃 Thank you! I am glad you enjoy them! 😊
You're the best, bro, this was great. Love you, bro 🤎♥
The best tutorial I've ever seen. ❤
Can you please provide a link to the annotated dataset, including the train and val images, so that we can use it? It takes a lot of time to annotate all the images.
Thank you. ❤
Amazing!!!
Keep up the good work man💪💪
😃💪💪
Thank you sir, your videos are amazing. This is really very helpful for me. I have one request: can you prepare some more end-to-end, industry-oriented projects, specifically for mechanical industries?
Hey Sanjay, glad the videos are helpful! 😃 Sure, I enjoy end to end projects too, I may make more of those in the future. 💪💪
Pretty pretty pretty good tutorial, thank you😀
😃 Thank you! Glad you enjoyed it! 🙌
Really nice work man! God bless you
Thank you! 😊🙌
@@ComputerVisionEngineer I really hope your channel grows. Are you on LinkedIn? Let's stay connected.
Hi, thank you for the video. I just want to ask: when we annotate, is it possible to annotate different positions in the same video, like sitting and standing, and label them separately?
Hey! Good video!
If you have more than one class, what does the config file look like? Maybe the number of keypoints and the flip idx should be different?
Thank you for your good training.
Regarding the Python script to convert the .xml file to YOLO-pose format, is the script in the video tutorial correct or is it available in the Git repository? Because they are different. I used the Git repository but it gives an error when training the algorithm.
Thanks for the great video.
How should I label a point that falls outside of the image?
After running your code, the label .txt files were created, but they do not include the visibility flag (v).
So an error occurs. What is the solution?
Have you actually tried to export directly in YOLO format from CVAT? YOLOv8 format is the same as v5
Yes, I tried. The way I managed to make it work was downloading the annotations in cvat format and then converting to coco keypoints format. If you find an easier way to do it, do let us know! Contributions are always appreciated. 😃💪
Dear Felipe, really great and helpful video. Thank you so much! For my studies I need to analyze/count the head movement of a worm. Do I just need to label the head (and maybe the tail), or would it be better to label every part of the worm? Or, more generally, is this kind of pose detection the way to do it? And/or is segmentation also an option? Any help appreciated. And do you have a donation profile or something?
Of course, this is video data... I extracted the first frame to generate pictures. Is it a good idea to use video data for annotations?
Keypoint detection on a worm sounds like an interesting project. Try with 2 keypoints, the head and the tail. Yes, I have a donations profile, you can support me in my Patreon! 😃🙌
@@ComputerVisionEngineer haha it was interesting, but now I get frustrated :/ The keypoints in the labels, that I get after using your conversion program, are just without any number behind. It appeared to me like you got those numbers automatically, am I wrong?
@@johnton96 hi, what 'number behind' are you referring to? The visibility flag?
@@ComputerVisionEngineer the number which shows if it´s not visible/not labeled, not visivle or visible and labeled
Hi Felipe, your videos have helped me a lot in several of my projects. However, I have a problem when trying to train this model: it does not detect the labels. I have checked everything a thousand times and I still don't understand why, since I followed the whole procedure step by step and the .txt file has the required format.
Hi, are you using the same dataset I use in the video?
@@ComputerVisionEngineer no, I am using a new dataset that contains cubes and 4 keypoints, which would be the edges of the top face of the cube
@@alantorres6371 I couldn't tell you what it might be, but check that everything in the config is right: the absolute path to the data directory, the number of keypoints and the number of classes.
Hi, could I ask what the units are for the results parameters post training, for e.g. parameters such as pose_loss and kobj_loss and where I can find this information? Thank you.
If I just want to ge the coordinate of the keypoint, what should I do?
Execute predict(video, save_txt=True)?
Great tutorial. Thank you.
😃🙌 You are welcome!
in the cvat_to_coco file, the code doesn't properly close files after opening them. This caused me some issues.
This is my correction :
with open(label_file_path, 'w') as label_file:
    # class id 0 followed by the normalized box center, width and height
    label_file.write('0 {} {} {} {} '.format(
        str((xtl + (w / 2)) / width), str((ytl + (h / 2)) / height),
        str(w / width), str(h / height)))
    # ...continue
Thank you so much for this very comprehensive video! 🙌
I was able to train a model on my custom data, but I would like to see those "lines" connecting the key points to generate a skeleton. How can I achieve that?
You could connect the dots using cv2.line() method. 😃💪
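A minimal sketch of the cv2.line() idea above. The edge list SKELETON and both helper names are hypothetical: you have to decide which keypoint indices are connected in your own dataset. It assumes keypoints are (x, y) pixel coordinates for one object and that (0, 0) means "not detected".

```python
def skeleton_segments(keypoints, edges):
    """Return ((x1, y1), (x2, y2)) integer pixel segments, one per drawable edge.

    keypoints: list of (x, y) pixel coordinates for ONE object.
    edges: list of (i, j) keypoint index pairs to connect (dataset specific).
    Keypoints at (0, 0) are treated as missing, so their edges are skipped.
    """
    segments = []
    for i, j in edges:
        x1, y1 = keypoints[i]
        x2, y2 = keypoints[j]
        if (x1, y1) == (0, 0) or (x2, y2) == (0, 0):
            continue
        segments.append(((int(x1), int(y1)), (int(x2), int(y2))))
    return segments


def draw_skeleton(image, keypoints, edges, color=(0, 255, 0), thickness=2):
    """Draw the skeleton in place on a BGR image (NumPy array) with cv2.line()."""
    import cv2  # imported here so skeleton_segments() also works without OpenCV

    for pt1, pt2 in skeleton_segments(keypoints, edges):
        cv2.line(image, pt1, pt2, color, thickness)
    return image


# Hypothetical example: 3 keypoints connected as a chain 0-1-2.
SKELETON = [(0, 1), (1, 2)]
print(skeleton_segments([(10.4, 20.0), (30.0, 40.9), (0, 0)], SKELETON))
```

For a real prediction you would pass the pixel coordinates of one detected object as keypoints and the frame you want to annotate as image.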
What are your future video plans?
😃 There are several possible ideas. Stay tuned to find out! 😎🙌
Hey. Great video. I really enjoyed watching it. However, the bounding box coordinates in your image labels are now incorrect in your google drive. A quick look and comparing the antelope_10002 label from your video (13:55) to the one in your repo will show the change. I plotted the difference. The original correctly listed center with height and width, in your repo (google drive) all of the bounding boxes are listed as the start/stop corner coordinates. (so upper and lower corner of the box). I'm planning to just script a change and convert the files to the correct format but I spent a day wondering why I couldn't build your basic model so I thought I would save someone else the time.
Hey, thank you so much for the heads up! I will take a look and correct it as soon as possible! 🙌
Hello, I trained for 300 epochs according to the cloud data provided by the author and the obtained bounding boxes (bbox) are incorrect. It seems that there is an issue with the coordinates annotation file for the bbox, just as you mentioned. If it were my own dataset, following the steps provided by the author should work correctly, shouldn't it?
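For anyone who wants to script the fix described above, here is a rough sketch. It assumes (this is not confirmed in the thread) that the broken labels store the box as normalized corners in the order x1 y1 x2 y2 right after the class id, and that everything after those four values should be left untouched. The function names are made up for this example.

```python
def corners_to_center(x1, y1, x2, y2):
    """Convert a corner box (x1, y1, x2, y2) to (x_center, y_center, width, height)."""
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def fix_label_line(line):
    """Rewrite one label line, converting only the 4 box values after the class id."""
    parts = line.split()
    class_id, box, rest = parts[0], parts[1:5], parts[5:]
    fixed_box = [f"{v:.6f}" for v in corners_to_center(*map(float, box))]
    return " ".join([class_id] + fixed_box + rest)


# Box given as corners (0.2, 0.1) and (0.6, 0.5), followed by one keypoint.
print(fix_label_line("0 0.2 0.1 0.6 0.5 0.3 0.3 2"))
```

Check a couple of converted files by plotting the boxes before retraining, since applying this to labels that are already in center format would corrupt them.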
I am training the model with cvat dataset with 3 classes(bounding boxes) and 11 key points but not able to train the model it is giving the error like "C:\Users\prime\anaconda3\envs\yolo_ultra_8083\lib\site-packages\ultralytics\data\dataset.py", line 139, in get_labels len_cls, len_boxes, len_segments = (sum(x) for x in zip(*lengths)) ValueError: not enough values to unpack (expected 3, got 0)"
can you please give reference or any data for training multiple classes and respective keypoints
That happened to me when I forgot to add the bounding boxes. I would go to your saved dataset in CVAT and check whether all the bounding boxes and keypoints are really there.
hi, I have the same issue right now? can you please help me if you solved it?
Hi Felipe! Charrúa (Uruguayan)? I lead a startup that uses pose estimation... Congratulations on the channel!
Hi! Yes, Uruguayan! You too? Thanks!! 😃 🙌
@@ComputerVisionEngineer from Argentina! Congratulations on the channel and the content
Great video!! I have one question: is there a significant performance difference if you don't flag the visibility of the points vs doing so? Thanks!!
If you have the visibility information, all the better. But if you don't, the model will most likely learn the structure of your objects anyway.
@@ComputerVisionEngineer Thanks!
Congratulations! I did everything according to your guide, but I'm getting an error: "No images found in D:\keypoint_detection\leg\data\labels\train.cache, training may not work correctly.". I've tried everything, but the error persists. Is there any way I can contact you to provide the files and code so you can tell me where I made a mistake?
Hi, I have already annotated images (composed of just non-normalized coordinates) without the bounding box. How can I convert my annotation to match yolov8?
Hey, you need the bounding box as well to use yolov8. If annotating the bounding boxes is not a possibility, you could estimate it by taking the bounding box that encloses your keypoints. 🙌
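A small sketch of that estimate, assuming the keypoints are (x, y) pixel coordinates for a single object and that the image size is known. The function name and the optional margin parameter are my own additions, not something from the video.

```python
def bbox_from_keypoints(keypoints, img_w, img_h, margin=0.0):
    """Estimate a normalized (x_center, y_center, width, height) box around keypoints.

    keypoints: list of (x, y) pixel coordinates for ONE object.
    margin: optional fraction of the box size added on each side (clamped to the image).
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin
    x_min = max(0.0, x_min - pad_x)
    y_min = max(0.0, y_min - pad_y)
    x_max = min(float(img_w), x_max + pad_x)
    y_max = min(float(img_h), y_max + pad_y)

    w = x_max - x_min
    h = y_max - y_min
    return ((x_min + w / 2) / img_w, (y_min + h / 2) / img_h, w / img_w, h / img_h)


print(bbox_from_keypoints([(100, 50), (300, 150), (200, 250)], img_w=400, img_h=500))
```

A box that touches the outermost keypoints is usually tighter than the real object, so a small margin may help. The keypoints themselves also need to be divided by the image width and height before they go into the label file.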
Good step by step project walkthrough from annotation, local hello world, cloud training, to analysing and testing final results ! Do you offer Full length Udemy like course ?
I have a question: If I need to key point detect two objects, such as people and animals, in my label file must the object id, bounding box, and key point set of both objects be in the label file?
Thanks for sharing! btw, how many images should I annotate to have a pose detection of 5 keypoints working fine? What are the key factors that affects this quantity?
It depends on many factors, I usually try to have a few thousand images to train any type of model. 🙌
@@ComputerVisionEngineer thank you! I tried with 500 images and 50 epochs and the results improved a lot compared to my previous training using Roboflow. The only problem I noticed with Google Colab is that the session restarts after a few hours.
yes, there are time restrictions in google colab, I usually use an ec2 instance in the cloud to train models
@@ComputerVisionEngineer oh, that should work for my case as well. Thank you again. After I finish my current project I'll take your courses; your explanation was so clear.
Thank you for this tutorial. I have a question
So, in the images where some keypoints are not present. What should I do ?
How to annotate such images while maintaining the order of the annotation ?
Do you mean keypoints are not visible or are not present at all?
Excellent tutorial - it has been incredibly beneficial. I greatly appreciate the sharing of your knowledge. I've gained much insight from it. Thank you immensely for your effort. However, I've encountered an issue when exporting annotations in CVAT 1.1 format: only one bounding box and its associated keypoints are exported, even though there are two boxes and two sets of keypoints present. Are you aware of how to address this problem? Additionally, could you provide guidance on how to train these annotations with YOLOv8?
Hi, thank you! Glad the content is helpful. Not sure what could be going on, does it happen every time you try to export an image with two objects annotated on it?
Thank you for your kind words. I can't do it every time I have more than two sets of annotations. And I want to know how to train more than one set of keypoints.
Training/validation images do not all need to be the same size, correct? Thanks :)
could you please tell me the annotations process for MULTIPLE OBJECTs PER IMAGE of same class.
Idol!
Can we have different key point numbers for different classes? as an example, I want to annotate my data which has 2 different classes. Class 1 has 3 key points and class 2 has 6 key points. Is it possible to train yolov8 on it? if yes how should we annotate data and prepare the config.yaml file? appreciate any help.
I would advise you to use the same keypoints in your different classes. If this is not possible, the keypoints the classes have in common should have the same 'meaning'. Take a look at the video: some animals have antlers and others don't. The antler keypoints of animals without antlers are annotated with a visibility flag of 0 (which means not labeled). All the remaining keypoints have the same 'meaning' in all the animals (eyes, ears, legs, etc.). 💪
Bro, I recently tried your emotion detection project, but it shows a "module not found" error for utils.dataset. What should I do? I tried my best but still couldn't get it right.
Try by adding face_classification/src/ to your PYTHONPATH. 💪
Exporting as coco keypoints still not working as of May 2024...
Hi Felipe, I followed your tutorial. However, I have a problem with corrupted images or labels: "ignoring corrupt image/label: labels require 11 columns each". I get this for all my images. Do you know where it comes from?
The issue may be your annotations don't match your kpt_shape. What is your kpt_shape?
@@ComputerVisionEngineer
kpt_shape: [2, 3]
flip_idx: [0, 1]
I only need 2 keypoints
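A quick way to check this kind of mismatch. With kpt_shape: [2, 3] the error asks for 11 columns, which is 1 class id + 4 box values + 2 keypoints x 3 values. The sketch below just counts columns; the function names are made up, and it assumes one object per line with whitespace-separated values.

```python
def expected_columns(kpt_shape):
    """Columns per label line: class id + 4 box values + n_kpts * n_dims."""
    n_kpts, n_dims = kpt_shape
    return 5 + n_kpts * n_dims


def check_label_line(line, kpt_shape):
    """Return (is_ok, found_columns, expected_columns) for one label line."""
    found = len(line.split())
    expected = expected_columns(kpt_shape)
    return found == expected, found, expected


# Two keypoints annotated as (x, y) only: 9 columns, but [2, 3] needs 11.
line = "0 0.5 0.5 0.2 0.2 0.4 0.4 0.6 0.6"
print(check_label_line(line, kpt_shape=[2, 3]))
print(check_label_line(line, kpt_shape=[2, 2]))
```

If the labels only contain x and y (as CVAT exports do, according to other replies in this thread), either set the second value of kpt_shape to 2 or add a visibility value after every keypoint.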
Would it be possible to make a video on how to run keypoint detection with YOLOv8 on the web, using tfjs?
your cvat to coco keypoints format convertor does not work for multiple classes. i see all my 3 classes in cvat format, but in the coco converter version i only see class-id 0 at the start of every line.
Agreed. The script only works for 1 class, it needs to be adapted to deal with multi class annotation. 🙌
I only know Python. Would I be able to understand this, or should I learn other things before starting? Please advise.
Hey, yeah I think you will be able to understand. Even without Python you would be able to follow along 90% of this tutorial. 💪🙌
Great video, I was wondering how you were able to get the visibility tag in the labels files, as they were not included in the exported xml file.
Hi there,
Sorry that this reply is not to your question.
I wanted to know if you manually annotated all the images in your dataset or there was a way to do annotation faster. I was working on another dataset which has 12000 train images and 2800 test images. But it doesn't have annotation.txt files for labels. Can you suggest something?
Regards
Hey, In this tutorial I show you how to produce annotations using CVAT, but those are not the labels I use to train the model. The dataset I used to train the model was already annotated with the format. 🙌
I didn't manually annotate all the images in the dataset. Depending on the situation, there could be ways to speed up the annotation process; for example, you could 'pre-annotate' the data by using the predictions from another model.
@@ComputerVisionEngineer Hello, and thanks for your videos. I've started with object detection for a project and I find your videos the best, most precise and helpful! However, I am still struggling a little. I used your code to convert the xml file to yolo format but, as you pointed out here as well, I only have format. So one question: do I need the visibility variable, or is it just as well possible to just use the format? What difference does it make then?
One follow-up question: In the video, you annotate with CVAT, but your actual annotations have the third visibility dimension for the keypoints. What tool do you recommend that includes that ability to include the visibility dimension? I don't need it for what I'm doing this week, but I'll need it next week. I appreciate any advice you can offer that will save me from trial/error more tools.
Not sure what could be the best tool to label the third visibility dimension keypoint, I haven't used any so far. The dataset I used in the video was already annotated. Sorry I can't help you with this. 🙌
@@ComputerVisionEngineer Thanks for the reply. Looks like RectLabel is capable but the bbox annotation it is producing along with the keypoints is generated as a box around the keypoints, not an additional one around the object. Not exactly what I want so I'm pondering combining a bbox with the keypoints and then swapping the box through a script. If that tool had decent (aka: ANY) tutorial videos it would be a lot easier to use.
Hi Felipe, is it possible to share the annotated dataset from the video, or a subset of it, with us? Annotating 600 images is seriously quite a lot to do. Thanks for your great tutorial on the topic.
Hey, sure, I've added a link to the data in the Readme file of this project's github repository. 🙌
Is labeling each key-point with visibility flags [0-1-2] any different from labeling them with visibility flags [0-1] or [0-2]? I mean using visibility flag 1 for occluded key-points.
Labelling with visibility flag for occluded keypoints could lead to a more robust training.
Spectacular video, CV Engineer! Can it work in occluded environments with humans and animals?
Thank you! Yes, I think it would work with some occlusions. 🙌
Hi, I just saw your videos. Can this be done with yolov5? I mean, can we train pose detection with yolov5? Thank you
What a great content, thanks.
Is the model also capable of activity detection?
Do you mean action recognition?
Yes, sorry for the term abuse. @@ComputerVisionEngineer
@@eitanas85 I would do the action recognition in two steps: landmark detection + Scikit learn classifier
Good, but not good enough to actually bind a skeleton and make it move, so what is it useful for?
Also, I would have used a multi-class approach for the different parts of the body.
Thank you so much for your contributions! I will definitely keep it in mind for future videos. 💪🙌
Can I use this to measure distances between landmarks? Please answer me. Thank you so much
Awesome could be the small word💥
😎😊 Thank you for your support, Sreekar! 🙌
Thank you for very useful video,I followed the steps you said for custom data annotations. But there is a problem, after converting from xml to .txt format using your code, how can we get the train.cache and val.cache that need to present in the 'labels' folder. Please let me how to solve this because without cache file i'm unable to train the custom dataset. Thanks in advance
Hey, the cache files are not needed. You need to locate the data (images and annotations) exactly as I show in the tutorial. 🙌
How do I add the label 0 0 0 in CVAT for a point which is not visible?
Hi, great tutorial like always, you are an inspiration for many people.
I wonder if the "bounding box" of the classes could be an ellipse or a polygon. Can YOLO recognize this kind of annotation? If it can, how should I edit the yaml file? I noticed that the yaml file has no configuration for the bounding box, so it looks like the rectangle is the default.
To the best of my knowledge, yolov8 object detection only supports rectangular shapes. You may find other technologies to do polygon detection. Alternatively, you could do semantic segmentation with yolov8; by doing so you will detect objects by producing a mask with an arbitrary shape. 🙌
Thank you for your efforts. This seems to be really the best tutorial about yolo pose.
But I have a question: You annotated with cvat and then saved a cvat format. Then you used a (39,3) dimension during training. How did the cvat to yolo file create the third dimension. I do the same, but I cannot create the third dimension (visibility) with cvat annotation. How should I create the 3 dimensional keypoint annotation with cvat???
Hey, in this video I used CVAT only to show an example of how to annotate images to train a pose detector. CVAT doesn't support the visibility flag; only 2-dimensional (x, y) annotation is supported.
The dataset I used to train the pose detector in this video wasn't annotated with cvat, though. The dataset is publicly available, (x, y, v) annotations are provided. 🙌
I am struggling to find a (x,y,v) keypoint annotator, preferably opensource 😅… Could you please suggest one?❤
** coco annotator creates a json file, but ultralytics json2yolo conversion does not convert properly. Do I have to write a converter for coco, like you did for cvat?
Turning a COCO format json file into YOLO format was the easiest solution. These days the open source community is getting weaker; every piece of free code is used to make pennies for companies
@@erencerman4466 I struggled to find one too! I didn't find any open source.
Thanks for the video.
When annotating data, say an antelope, what happens if the antelope is female and it does not have antler? do we skip or do we annotate the imaginary antler?
Ideally, it is better to use the x, y, v format for these situations. If you are using the x, y, v format you can label those keypoints as 0, 0, 0.
If you are using the x, y format you could apply different criteria, one of them could be annotating the keypoints where they 'should be located' (in the case of non existing antlers I would locate the keypoints in the head). But again, this is not ideal, try to use the x, y, v format in those situations. 🙌
Can we extract the x,y,z coordinates of a certain landmark with yolo v8 ??
I saw this in one of the mediapipe solutions, but that solution is not customizable on a custom dataset. So I'm looking for the alternative
Did you find a solution to this?
What a high-quality tutorial, it's going to be very useful for several people. Let me just ask a question, can I follow these steps if there were more than one animal in the image, that is, different instances for the same class in the image? Would I need to modify anything in the annotations?
Thank you! If there is more than one object in the images you would need to adjust the code in CVAT_to_cocoKeypoints.py. In the txt files with the labels, there should be a line for every object in the image; each line should follow the structure I explain in the video. Other than that everything I explain in this tutorial should work fine! 😃🙌
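To illustrate the "one line per object" structure, here is a hedged sketch of how the label text for one image could be assembled. It is not the code from CVAT_to_cocoKeypoints.py; the dictionary keys and function names are hypothetical, and it assumes normalized values and keypoints in (x, y, v) form.

```python
def label_line(class_id, bbox, keypoints):
    """Build one label line: class id, normalized box, then x y v per keypoint."""
    fields = [str(class_id)]
    fields += [f"{value:.6f}" for value in bbox]  # xc, yc, w, h
    for x, y, v in keypoints:
        fields += [f"{x:.6f}", f"{y:.6f}", str(int(v))]
    return " ".join(fields)


def label_file_text(objects):
    """Join one line per annotated object; the result is the content of one .txt file."""
    lines = [label_line(o["class_id"], o["bbox"], o["keypoints"]) for o in objects]
    return "\n".join(lines) + "\n"


# Two objects in the same image, one keypoint each to keep the example short.
objects = [
    {"class_id": 0, "bbox": (0.5, 0.5, 0.2, 0.4), "keypoints": [(0.45, 0.4, 2)]},
    {"class_id": 1, "bbox": (0.25, 0.75, 0.1, 0.1), "keypoints": [(0.25, 0.75, 2)]},
]
print(label_file_text(objects), end="")
```

Every line needs the same number of keypoints, in the same order, regardless of the class id at the start of the line.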
@@ComputerVisionEngineer Great, thanks. By the way, one tip I would add is to use skeleton annotation. With it, CVAT exports in COCO keypoints (json) format, and then you can use RectLabel to convert from COCO json to YOLOv8 format.
Oh, amazing! Thank you so much for the heads up! 😃😃😃
@@jordaocassiano if i have the coco keypoints in a json format, how do i use that to make a yolov8 pose estimation model?
what if i have many humans in the image? will the dataset .txt file be the same?
I have data with two objects, and each object has multiple points (23 and 11). I was getting an error for having a heterogeneous number of points, so I made them regular by zero padding at the end of the 2nd object.
Now I have 74 values in each row: 1 id + 4 bounding box + (23*3)=69 keypoint values. With both objects, I have two lines.
My yaml looks like this:
# number of classes
nc: 2
names:
  0: A
  1: B
nkpt: 46
kpt_shape: [46, 3]
But the model says it is looking for 46*3=138 + 5, in total 143 values, so every image is reported as corrupted.
Please help me resolve this.
My bad. Since you have 2 objects and each has 23 points, just replace these lines in the yaml and keep the rest of the structure the same:
nkpt: 23
kpt_shape: [23, 3]
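The zero padding described above could look roughly like this. The function name is hypothetical, and it assumes keypoints are (x, y, v) tuples and that missing ones go at the end as (0, 0, 0), as the commenter did. Padding at the end only makes sense if keypoint index i has the same meaning in both classes.

```python
def pad_keypoints(keypoints, n_kpts):
    """Pad a list of (x, y, v) keypoints with (0, 0, 0) entries up to n_kpts."""
    if len(keypoints) > n_kpts:
        raise ValueError(f"got {len(keypoints)} keypoints, but kpt_shape allows {n_kpts}")
    missing = n_kpts - len(keypoints)
    return list(keypoints) + [(0.0, 0.0, 0)] * missing


# Second object: only 11 annotated keypoints, padded to the 23 used in kpt_shape.
short = [(0.1 * i, 0.05 * i, 2) for i in range(1, 12)]
padded = pad_keypoints(short, n_kpts=23)
values_per_row = 1 + 4 + 3 * len(padded)  # class id + box + keypoint values
print(len(padded), values_per_row)
```

With 23 keypoints in (x, y, v) form, each row ends up with the 74 values mentioned above.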
Hello, can anyone tell what is kobj_loss? Mine is zero everytime during training.
I am struggling with running the training script. It can find 11 images when I run it but says all 11 are corrupted. Are the images required to be of a specific type? I am using jpg
Jpg format should be ok. Have you tried with other images? Additionally, only 11 images may not be enough to train a robust model.
@@ComputerVisionEngineer Sorry, you were indeed correct, it had nothing to do with the image file type. I have struggled to make it work on data I annotated through CVAT, and I suspect that it might be because the converted COCO label data needs to have the 3rd value (visibility) in order to work. I annotated the data using Roboflow instead, which outputs the labels with this 3rd value for me, and it worked fine.
Please could you share the trained model.
Hi Felipe, do you think we would get better results if we started from a larger pretrained model, e.g. yolov8m-pose, yolov8l-pose, yolov8x-pose, etc.?
And another question - it looks like while labeling you didn't use the variant v=1 labeled but not visible - do you think it's not important?
Thanks.
Hey Konstantin, using a larger model could produce better results. According to performance metrics on yolov8 website larger models achieve a higher average precision. Do you mean during my custom labeling I didn't use v=1? As far as I know CVAT doesn't support the (x, y, v) format for keypoints annotation. Since I used CVAT as my tool, I only used the (x, y) format. I wouldn't say 'it is not important', if you have visibility information for all keypoints, it's even better. But if you don't, you can still try training the model using the (x, y) format. 💪💪
Can you make a video about how to implement DeepFace?
I will try to do a video about DeepFace. 🙌
Thanks for your video, but when I follow you I get an error:
ValueError: not enough values to unpack (expected 3, got 0)
I tried to find the solution on github but still can't, can you help me? Thanks a lot
What line is throwing the error?
Hi, I have a question. Is the order of the key points important even if I have a dataset with images all completely on one side? 😊
🤔 I would say that when training a pose detector the order of keypoints is always important
@@ComputerVisionEngineer ooookk thank you very much :)
Hi there! I am trying to find the xy coordinates of the keypoints on my test image. I tried your code to print the keypoints but it throws an error: 'Keypoints' object has no attribute 'tolist'. Could you help me with this? I typed exactly what's in the video. I just changed the path to my trained model and test image.
Hey, are you using the same version of ultralytics that I am using in the video?
@@ComputerVisionEngineer I downloaded the latest ultralytics as I started working with this recently. I wanted to infer and print the keypoint values on test image, so I can compare it with ground truth value. I am not able to infer the keypoints from test image. Could you help me with the code?
@@ComputerVisionEngineer, Hi thanks for your video, I used the same version, ultralytics==8.0.83, Keypoints' object has no attribute 'tolist'. Could you help me with this?
@@BettyLQC Use this code to predict on test images and extract the predicted keypoints coordinates:
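from ultralytics import YOLO
model = YOLO('path to your trained model .pt file')
results = model('image file path')  # predict on an image
result = results[0]  # results for the first (and only) image
xy_values = []  # flat list: x1, y1, x2, y2, ...
for xy in result.keypoints[0].xy[0].cpu().numpy():  # keypoints of the first detected object
    xy_values.extend(xy)
If you want the normalised keypoint values, replace 'xy' with 'xyn'.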
Hi, I want to know if I can use the keypoints to track an object. Thanks!
I have detected an object through its keypoints. Can I keep tracking the object using the keypoints rather than other information such as the bbox coordinates?
Object tracking algorithms I am aware of take the bounding box. Nevertheless, there is some research around object tracking using keypoints. If you prefer using keypoints take a look at some papers on this topic. 💪
@@owenviolet-s5c im trying to use the keypoints on an object to localize it, the idea is to acquire the grasping point (center of the first two keypoints, the first keypoint is used to calculate the rotation) of the object using yolo. I have tried v8 and it works pretty well to detect bboxes and the kpts inside each bbox. For calculating the coordinate of the grasping point (i assume) for your object, you need to write another simple code to calculate the center of the frame based on detected kpts. The model can predict the kpts in a specific order as you labelled them, as in this video, the order of keypoints matters. Good luck!
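One possible reading of that idea, as a sketch only: the grasp point is the midpoint of the first two keypoints, and the rotation is taken as the angle of the vector from the grasp point to the first keypoint. That angle convention and the function name are my assumptions, not something stated in the comment.

```python
import math


def grasp_pose(kp0, kp1):
    """Return ((cx, cy), angle_degrees) from the first two keypoints of an object.

    (cx, cy) is the midpoint of kp0 and kp1. The angle is measured from the
    positive x axis to the vector pointing from the midpoint to kp0. In image
    coordinates y grows downwards, so positive angles turn clockwise on screen.
    """
    cx = (kp0[0] + kp1[0]) / 2
    cy = (kp0[1] + kp1[1]) / 2
    angle = math.degrees(math.atan2(kp0[1] - cy, kp0[0] - cx))
    return (cx, cy), angle


# Pixel coordinates of the first two predicted keypoints of one object.
print(grasp_pose((120.0, 80.0), (100.0, 80.0)))
```

This only works because the model predicts keypoints in the same order they were labelled, as mentioned above.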
Can I extract the bounding box from yolo v8 pose? Thanks
You can detect the bounding box using the pose detector, yes.
In the code that converts xml to txt I get "index out of range" (on the line that starts with bbox).
Can you help me, please?
it said :
bbox = image.getElementsByTagName('box')[0]
IndexError: list index out of range
i got same error? did you fix it?
CVAT to COCO format not working
Thank you
You are welcome! 😃
Got this error when training, could you help? ValueError: not enough values to unpack (expected 3, got 0)
I have 5 keypoints but the labels have 1 class name + 14 coordinates only. Shouldn't it be 15? Is there something wrong with the script?
Hey, you have 5 keypoints and you are using the (x, y, v) format for every keypoint?
@@ComputerVisionEngineer yes, i used the same script proposed on your tutorial
@@Nufall if your data is comprised of 5 points with the x, y, v format you should have 20 values for every object in your annotations file
Class_id, xc, yc, w, h,
@@ComputerVisionEngineer i followed exactly the same steps on your tutorial, where do you think the issues might be? Thank you very much for replying!
I tried, but the custom model could not detect anything. Does your model detect well?
I had an ~ok performance as it shows in the test I did at the end of the video. Did you use the same dataset as I did?
@@ComputerVisionEngineer Thanks. It didn't work well because I downloaded a part of datasets.
how do you download the images in windows with subset
Hey, if you are looking for the data I used to train the pose detector in this tutorial, take a look at the Readme file of this project's repository. 🙌
How will we get the text file?
WARNING ⚠ no labels found in pose set, can not compute metrics without labels
I am getting this message and there is not error loss.
Sir, I was able to train it, but the keypoints are not well placed. I only used 80 images for training.
Try with more images. What do the evaluation plots look like?
@@ComputerVisionEngineer ok👍🏿
Your videos are good, but it's hard to listen to your voice.
Thank you! Do you mean it is hard to understand the accent? Or is the audio volume too low?
How do we get the text files in the labels folder?
Do you mean the annotations? Do you have an annotated keypoints detection dataset you can use in this project? If not, you can download the one I am using in this video!
For those who have the error "np.numpy array expected" and "images corrupted":
All annotations must be in numpy format like 0.000000, not 0.0000000000000.
I fixed this by adding the following to the "convert to coco" script from the video:
format(value_here, '.6f'), for all values where they are written to the file.
So the resulting file will have all the values in the correct format like 0.161122, 0.112343 instead of 0.23433111111112, etc.
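If the label files are already written, the same format(value, '.6f') idea can be applied after the fact. This is only a sketch with a made-up function name; it assumes whitespace-separated label lines with the class id first, and it takes the commenter's word that the number of decimals was what caused the errors.

```python
def round_label_line(line, decimals=6):
    """Rewrite every value after the class id with a fixed number of decimals."""
    parts = line.split()
    class_id, values = parts[0], parts[1:]
    rounded = [format(float(value), f".{decimals}f") for value in values]
    return " ".join([class_id] + rounded)


print(round_label_line("0 0.23433111111112 0.5 0.161122 0.112343"))
```

Note that this also rewrites a visibility flag such as 2 as 2.000000, which is the same number in float form.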