Useful for my project 👏
Thank you for the video.
You're welcome
An early reply would be very helpful. The output feature size is 19x19. How is a label created at this downsampled size, and how is it mapped back onto the original image size?
The 19x19 output gives information about each grid cell. Each cell corresponds to a pixel block of the input image, and the confidence score tells you whether an object is present in that block. If the confidence score is high, the class of the object for that pixel block [cell] is predicted by multiplying the confidence score by the class score. For the object of the predicted class, the box estimate [anchor box] is then taken into account: its centre, height and width. Finally, each grid cell gives you the object class and its bounding box, if an object is predicted in that cell.
Hope this is clear.
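A minimal sketch of the per-cell step described above, in case it helps: check the confidence score first, then multiply it by each class score and keep the best class. The function name decode_cell and the 0.5 confidence threshold are assumptions made for illustration, not values from the video.

```python
def decode_cell(confidence, class_scores, conf_threshold=0.5):
    """Decode one grid cell's prediction.

    confidence     -- objectness score for the cell (0..1)
    class_scores   -- one score per class for the cell
    conf_threshold -- hypothetical cut-off; the source only says "high"

    Returns (class_index, class_confidence), or None when the cell is
    judged to contain no object.
    """
    # Step 1: decide whether an object is present in this pixel block.
    if confidence < conf_threshold:
        return None

    # Step 2: class-specific score = confidence score * class score.
    combined = [confidence * s for s in class_scores]

    # Step 3: keep the class with the highest combined score.
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined[best]
```

For example, decode_cell(0.9, [0.1, 0.8, 0.1]) picks class index 1 with a score of about 0.72, while decode_cell(0.2, [0.1, 0.8, 0.1]) returns None because the confidence is too low.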
It's one of the best explanations I have seen. Loved your explanation :). Can you tell me how we get 19x19 as output with a 304x304 input image and a 16x16 grid cell size?
Sorry for the late reply, I was busy with a training programme with 1000+ participants on Machine Learning for computer vision. It is simple: each grid cell covers 16x16 pixels, so 304/16 = 19 cells in both the x and y directions.
Good explanation, clear enough and with a suitable example. It would be great if you could share the slides.
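The 304/16 = 19 arithmetic, and the mapping between grid cells and the original image, can be sketched as below. The helper names are hypothetical. Assigning an object to the cell that contains its centre is the usual convention for grid-based detectors and is assumed here rather than stated in the thread.

```python
def grid_size(image_size, cell_size):
    """Number of grid cells along one side of the image."""
    return image_size // cell_size


def cell_to_pixel_block(row, col, cell_size):
    """Pixel block covered by grid cell (row, col).

    Returns (x_min, y_min, x_max, y_max) with the max edges exclusive.
    """
    return (col * cell_size, row * cell_size,
            (col + 1) * cell_size, (row + 1) * cell_size)


def pixel_to_cell(x, y, cell_size):
    """Grid cell (row, col) that contains pixel (x, y).

    When building labels, an object is assumed to belong to the cell
    that contains its centre point.
    """
    return (y // cell_size, x // cell_size)


print(grid_size(304, 16))                # 19
print(cell_to_pixel_block(18, 18, 16))   # (288, 288, 304, 304)
print(pixel_to_cell(150, 40, 16))        # (2, 9)
```

So the last cell (18, 18) covers pixels 288 to 303 in both directions, and an object centred at pixel (150, 40) would be labelled in cell row 2, column 9.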
Thanks, appreciate it.
How do I set the threshold for a 7x7 grid?
Which threshold are you talking about? If it is the IOU threshold, we generally use 0.5 or 0.6. If the data has very complex samples, then even a smaller threshold will do.
@aparajitaojha807 IOU, ma'am. Thanks, now it's clear.
Hello ma'am, can you explain what the label vector is if an image contains 4 objects?
For 4 objects, the label will look like this: [1, x1, y1, h1, w1, 1, x2, y2, h2, w2, ..., 1, x4, y4, h4, w4]. The first five entries are for object 1, the next five for object 2, and so on.
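For anyone wondering how the IOU threshold is applied, here is a small sketch. It assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max), which is a choice made for this example. The 0.5 default comes from the reply above.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted, ground_truth, threshold=0.5):
    """True when the predicted box overlaps the ground truth enough."""
    return iou(predicted, ground_truth) >= threshold
```

Two 2x2 boxes offset by one pixel in each direction, (0, 0, 2, 2) and (1, 1, 3, 3), overlap in a 1x1 square, so their IOU is 1/7 and they would not count as a match at a 0.5 threshold.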
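The layout in the reply above can be built like this. It is only a sketch that follows the [1, x, y, h, w] pattern given there. The function name build_label and the sample coordinates are made up for illustration.

```python
def build_label(objects):
    """Concatenate one [1, x, y, h, w] group per object.

    objects -- list of (x, y, h, w) tuples, one per object in the image.
    The leading 1 marks that an object is present.
    """
    label = []
    for x, y, h, w in objects:
        label.extend([1, x, y, h, w])
    return label


# Four hypothetical objects, each as (x, y, h, w).
objects = [
    (0.2, 0.3, 0.1, 0.2),
    (0.5, 0.5, 0.3, 0.3),
    (0.7, 0.2, 0.2, 0.1),
    (0.9, 0.8, 0.1, 0.1),
]
label = build_label(objects)
print(len(label))   # 20
print(label[:5])    # [1, 0.2, 0.3, 0.1, 0.2]
```

With 4 objects the vector has 4 x 5 = 20 entries.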
Simple and nice explanation
Thanks for liking
Hello ma'am, can you explain how to develop and run code for object detection using a CNN?
Yes, sure
loved it
Thanks, I appreciate it.