So many kites?!
10:20 Nice stop sign :v That is dangerous.
Yes, it's early days yet, before they put the motor half onto it. That's going to be pretty fun, but untrustworthy all the same.
It does not detect the car in front (a Daewoo, I think) at 2:00 :(
Too far for YOLO, but after a few seconds it is detected as a truck! A net trained on the COCO dataset is not aware of Daewoo :-)
Probably a Daewoo can't be considered a car ))
wow, this is pretty impressive
This is really impressive... Great work!
Good day from the Czech Republic! :)
Good day from Germany! :-)
Can you tell us how we can find the angle, height and distance between a detected object and the user?
Hi Karol, pretty impressive. I am sure that the kite issue can be tuned out. I was wondering: could all this data be saved, like the object annotations and timestamps? Perhaps in a text file.
Yes
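A minimal sketch of one way to do this, independent of darknet's own output code: log every detection as a CSV row with the frame index and a timestamp derived from the video frame rate. The detection tuple format `(label, confidence, (x, y, w, h))` and the column names are assumptions for illustration, not what darknet prints.

```python
import csv
import io

# Assumed detection format: (label, confidence, (x, y, w, h)) with
# normalized center coordinates, as YOLO-style detectors usually report them.
HEADER = ["frame", "time_s", "label", "confidence", "x", "y", "w", "h"]


def rows_for_frame(frame_index, fps, detections):
    """Turn one frame's detections into CSV rows with a timestamp in seconds."""
    time_s = round(frame_index / fps, 3)
    return [
        (frame_index, time_s, label, round(conf, 3), x, y, w, h)
        for label, conf, (x, y, w, h) in detections
    ]


def write_detections(stream, frames, fps):
    """Write a header plus one row per detection; frames yields per-frame detection lists."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    for frame_index, detections in enumerate(frames):
        writer.writerows(rows_for_frame(frame_index, fps, detections))


# Usage: write to an in-memory buffer here; pass a file opened with newline="" to save to disk.
buffer = io.StringIO()
write_detections(buffer, [[("car", 0.912, (0.5, 0.5, 0.1, 0.2))], []], fps=30.0)
print(buffer.getvalue())
```

A plain CSV keeps the log readable in any spreadsheet; the timestamp only matches wall-clock time if `fps` is the real frame rate of the source video.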
Is this running in real time on your laptop at 5 FPS, or is this post-processing? Super cool!
Thanks! Yes, it runs on a laptop at about 5 FPS, as I remember. YOLO and SSD are fast. Check Faster R-CNN NAS; it gives better results.
Hey, I'm new to the field of Convolutional Neural Networks.
I have a presentation in school on YOLO and I need some help.
Can someone please explain how the output of the convolution layer works?
The input to the first convolution layer is a 448*448*3 tensor, and its output is a 224*224*64 tensor, using a 7*7 filter.
I understand that the depth is 64 because of 64 different filters (features).
Thank you!
With "same" padding, filter size has no impact on output size. In your example you have "same" padding (3 pixels per side for the 7*7 filter), stride=2, and 64 filters give you depth=64.
Check this video from Udacity: th-cam.com/video/jajksuQW4mc/w-d-xo.html
Karol Majek Thank you, sir! How does the 448x448 input give an output of 224x224? Is there a particular formula?
Stride is the key: output = input / stride (rounded up), as long as you use "same" padding. Yes, if stride is 1 you will have the same size.
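The general per-axis formula is output = floor((n + 2p - k) / s) + 1 for input size n, kernel k, stride s and padding p; the depth is simply the number of filters. A small sketch, assuming YOLOv1's first layer pads 3 pixels per side (kernel // 2), which is what it takes to reach 224:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


# YOLOv1's first layer: 448x448x3 input, 64 filters of 7x7, stride 2.
# Assumption: 3 pixels of zero padding per side ("same"-style padding).
side = conv_output_size(448, kernel=7, stride=2, padding=3)
depth = 64  # one output channel per filter
print(side, side, depth)  # 224 224 64

# With no padding ("valid"), the 7x7 kernel does change the result:
print(conv_output_size(448, kernel=7, stride=2, padding=0))  # 221
```

This also shows why filter size only drops out of the formula when the padding grows with the kernel.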
Hi! Nice video! Do you have any experience with running YOLO or SSD on Nvidia Jetson (TX1 or TX2)? How does it perform? Can it be used to detect objects in realtime?
I have experience running SSD on a Samsung S7. I have a TK1 and a TX1, but absolutely no time to test them :-(
Today and over the next few days I'm publishing videos with FPS info. I get below 30 FPS with a GTX 1080.
The net should be optimized for FP16, with many weights zeroed to speed it up, which you can do with TF.
I can recommend SSD in TF, which I am using now. I will publish some videos, but I don't know when...
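To make the two optimizations in the reply concrete, here is a conceptual sketch in plain Python: rounding weights to IEEE half precision (FP16) and zeroing the smallest-magnitude weights (magnitude pruning). This is not TensorFlow's API; in TF you would use its own tooling, such as the TFLite converter's float16 quantization or the Model Optimization Toolkit's pruning. Note that zeroed weights only speed things up if the runtime exploits the sparsity.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def prune_by_magnitude(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.01, 0.3, 0.002, -0.5, 0.04]
print(prune_by_magnitude(weights, 0.5))  # [0.8, 0.0, 0.3, 0.0, -0.5, 0.0]
print(to_fp16(0.1))  # 0.0999755859375 (FP16 keeps only ~3 decimal digits)
```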
Thanks! What about this S7? How did it perform?
th-cam.com/video/1togQecjpzo/w-d-xo.html
I get 30-40 FPS with SSD MobileNet on my Huawei P30 Pro with the TFLite GPU delegate now.
With a Google Edge TPU on a Raspberry Pi, or with a TX2/Jetson Nano, you can get 50+ FPS. Things are improving rapidly.
Hello, I'm new to YOLO. I have a script which detects objects in an image, and I want to use it with a video stream. I'm using Python. Thank you so much!
For Python I recommend SSD with TensorFlow; check this tutorial: pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial
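Whichever detector you use, turning a single-image script into a video script is mostly a frame loop. A minimal sketch, assuming OpenCV (`cv2`) is installed for reading the stream; `detect_image` stands for your existing per-image function and `"input.mp4"` is a hypothetical path, both placeholders rather than part of any library.

```python
def frames_from_video(source=0):
    """Yield BGR frames from a webcam index or a video file path using OpenCV."""
    import cv2  # imported here so the rest of the sketch runs without OpenCV

    capture = cv2.VideoCapture(source)
    if not capture.isOpened():
        raise RuntimeError(f"could not open video source {source!r}")
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of file or camera error
                break
            yield frame
    finally:
        capture.release()


def detect_on_stream(frames, detect_image):
    """Run an existing single-image detector on every frame of a stream."""
    for index, frame in enumerate(frames):
        yield index, detect_image(frame)


# Usage with a real stream (detect_image is your hypothetical per-image function):
# for index, detections in detect_on_stream(frames_from_video("input.mp4"), detect_image):
#     print(index, detections)
```

Keeping frame reading and detection in separate generators means the detector can be tested on a plain list of images without a camera.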
Are you using OpenCV for anything?
There are still a lot of misses, especially when the car is right in front of the camera.
Check the new videos, especially Faster R-CNN NAS, which will be published on December 6, 2017.
Hello, I need the video that you used for detection. Your drive link is not working.
It's available on archive.org
Google: archive karol majek dataset
Really impressed with your algorithm :)
Ran it on an NVIDIA Jetson TX2; it delivered 6.9 FPS.
Can I run webcam-voc.sh to detect only 6 of the 21 classes?
What changes do I need to make to implement this?
Thanks, but that's not mine.
Do you want to speed it up? No chance :-)
You can try SSD MobileNet; check my video with the TX1: goo.gl/gpoC38
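If the aim is only to report 6 of the classes, one simple option is to filter the detections after inference. PASCAL VOC has 20 object classes (21 counting background). The network still scores every class, so this does not make anything faster; a network that only predicts 6 classes would have to be retrained or fine-tuned on them. The chosen class set and the `(label, confidence, box)` tuple format below are hypothetical.

```python
# Hypothetical choice of 6 of the 20 PASCAL VOC object classes.
WANTED_CLASSES = {"person", "car", "bus", "bicycle", "motorbike", "dog"}


def keep_classes(detections, wanted=WANTED_CLASSES):
    """Drop detections whose label is not in `wanted`.

    Detections are assumed to be (label, confidence, box) tuples.
    """
    return [d for d in detections if d[0] in wanted]


detections = [("car", 0.91, (0.5, 0.5, 0.1, 0.1)), ("sofa", 0.66, (0.2, 0.7, 0.3, 0.2))]
print(keep_classes(detections))  # [('car', 0.91, (0.5, 0.5, 0.1, 0.1))]
```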
What are the dependencies, and what GPU is needed?
The unprocessed raw video is no longer available :(
Here's the second video from the set archive.org/details/0002201705192
@@KarolMajek Thank you!
How do I determine the position of each object identified in the video, with output like this:
bus: 89%, Position: (0.032253, 0.110209), Height and Width: 0.063911, 0.072384
You need to multiply x and w by the width of the image in pixels.
Similarly, multiply y and h of the bbox by the height of the image.
You will get coordinates in px (center x, y and width, height of the bbox).
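The conversion described in the reply fits in a few lines. A small sketch, assuming the four normalized numbers are center x, center y, width and height in that order (as the reply describes); the corner coordinates are added because drawing functions usually want them. The image size in the usage line is hypothetical.

```python
def yolo_box_to_pixels(x, y, w, h, img_width, img_height):
    """Convert a normalized YOLO box (center x, center y, width, height) to pixels."""
    cx, bw = x * img_width, w * img_width      # x and w scale with image width
    cy, bh = y * img_height, h * img_height    # y and h scale with image height
    left, top = cx - bw / 2, cy - bh / 2
    return {
        "center": (cx, cy),
        "size": (bw, bh),
        "top_left": (left, top),
        "bottom_right": (left + bw, top + bh),
    }


box = yolo_box_to_pixels(0.5, 0.5, 0.25, 0.5, img_width=800, img_height=600)
print(box["top_left"], box["bottom_right"])  # (300.0, 150.0) (500.0, 450.0)
```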
@@KarolMajek I'm a noob, so can you at least point me in the right direction or help me with a few lines of code? I'm doing this on Google Colab, by the way.
@@killerofothers Can you share your notebook with everyone or not? Maybe it's time to record a tutorial.
@@KarolMajek As I said, I'm a noob; I'm just starting to learn all this.
github.com/pavisj/YoloV3_video_colab/blob/master/Yolo_Darknet_Video_Without_Display.ipynb
This is the notebook I was trying to modify to get the required outputs. If you make one on Google Colab, that would be awesome.
Is there an app on iPhone that’s like this?
Lots of kites and birds in the video. ;-)
Because those kinds of birds are not that kind of birds ;-)
Good morning, is there any YOLO implementation for C++?
Marcin Chmiel What about pjreddie.com/darknet/yolo/ ?
Yes, but from what I can see, YOLO is used from the command line (cmd). I need to write an application in C++ (or C#, Python...) in which I use YOLO to train a classifier on my own samples, and later possibly keep teaching or unteaching the classifier while it runs. In theory I could write a C++ app that calls cmd to run YOLO, but I need to integrate YOLO with OpenCV and, for example, draw the bounding boxes myself based on what YOLO detected.
The source is in C and uses OpenCV for loading/displaying/saving. Integrating it will probably be a nightmare. What would you like to detect, if I may ask?
I need to build a detection-based tracker, or some other tracker, that learns any user-selected object while tracking it, so that after losing it, YOLO can be used to detect it again.
Marcin Chmiel Wow! Doing it online will be hard and might not work. Good luck!
Can you calculate distance to the objects too?
You would need to measure the time a physical pulse (like a radio or sound wave) takes to reach the object and bounce back to the emitter. That's essentially how radar and sonar work.
You can't do that with just a camera. It only captures light and doesn't emit anything.
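The pulse-echo principle from the reply reduces to distance = speed × round-trip time / 2, since the pulse covers the path twice. A short sketch; the speed of sound is an approximate value for air at room temperature, and the timings in the usage lines are made-up examples.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # radar, lidar, time-of-flight sensors
SPEED_OF_SOUND_M_S = 343.0          # sonar in air at roughly 20 °C (approximate)


def echo_distance(round_trip_s, speed_m_s):
    """Distance to a reflector from the time a pulse takes to go out and back."""
    return speed_m_s * round_trip_s / 2.0  # halve it: the pulse travels the path twice


print(echo_distance(0.01, SPEED_OF_SOUND_M_S))  # about 1.715 m
print(echo_distance(1e-6, SPEED_OF_LIGHT_M_S))  # about 149.9 m
```

The light-speed case shows why time-of-flight hardware needs very precise timing: a 1 m error corresponds to only about 6.7 ns of round-trip time.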
Using a single camera you can do it with structure from motion. With only the camera you will not have the proper scale, but if you have odometry, or a second camera as a stereo pair, you can estimate the distance.
Of course you will get the best results using lidar, e.g. a VLP16 or an Ouster.
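For the stereo-pair case, the depth of a point in a rectified pair follows from its disparity: Z = f × B / d, with focal length f in pixels, baseline B in meters and disparity d in pixels. This is also why a single moving camera lacks scale: without odometry the baseline B between views is unknown. A minimal sketch with hypothetical camera numbers:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means the point is at infinity)")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 700 px focal length, 12 cm baseline, 42 px disparity.
print(stereo_depth(700.0, 0.12, 42.0))  # about 2.0 m
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters much more for distant objects than for near ones.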
You can get distance with an RGB-D camera such as the Structure Core or Intel RealSense. They use active stereo: an IR-illuminated pattern helps with the stereo calculations.
They work best indoors, though.
My phone, a Huawei P30 Pro, has a time-of-flight sensor, which works really well at close range indoors.
AR frameworks give you the camera position in the real world, which allows you to track and count objects in 3D. I am working on that right now.
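Once a depth camera gives a metric depth at a detection's pixel (for example the bbox center), the pinhole model turns it into a 3D point in the camera frame, from which the angle to the object also follows. A sketch assuming an undistorted pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, using the x-right, y-down, z-forward convention; camera SDKs usually ship their own deprojection helpers that also handle lens distortion.

```python
import math


def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth to a 3D point in the camera frame.

    Pinhole model without lens distortion; fx, fy, cx, cy are the camera intrinsics.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def horizontal_angle_deg(point):
    """Bearing of a camera-frame point relative to the optical axis (positive = right)."""
    x, _, z = point
    return math.degrees(math.atan2(x, z))


# Hypothetical intrinsics for a 640x480 depth stream; the bbox center is at (620, 240)
# and the depth camera reports 2.0 m there.
p = deproject(620, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(p)                        # (1.0, 0.0, 2.0)
print(horizontal_angle_deg(p))  # about 26.57 degrees
```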
Thank you for uploading the video.
Do you have any idea how to make this system real-time with better accuracy and computational speed?
Faster - SSD
better accuracy - Deformable R-FCN
Thank you for the answer, it helps me.
what is SSD?
Single Shot Multibox Detector
github.com/tensorflow/models/tree/master/research/object_detection
Wowww !!! super coollll
I want to implement object detection on images... can you help me?
Custom object types?
Check YOLOv3 pretrained on COCO (80 categories) or on Open Images v4 (500 categories)
Hello,
How can I reach you?
Please...
How do I create this software in C#?
Hi, how did you do this? I really need to learn it.
This is open source: github.com/karolmajek/darknet, based on pjreddie.com/darknet/yolo/
Nice video! Could I use some clips from this video for my company youtube channel?
Yes, you can. Can you share your video later? Or write me an email?
You can easily find better videos on my channel, this one is pretty old
@@KarolMajek Thank you so much :)) If I upload the video on my company channel, I will share it with you.
@@abcdeunb Thank you!
Is this work published in a paper or not?
YOLO is published (www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf) and is open source: pjreddie.com/darknet/yolo/
Check the new videos for more awesome results!
Is it possible to count the cars?
Yes, it is!
@@KarolMajek I'd like to add counting to this project, but I am not advanced in . If possible, please guide me on how to start.
@@sohrabi.mohammadjavad You need to track detections: identify which detection is which in the following frames. Check the IoU tracker, which matches detections by intersection over union.
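A simplified greedy sketch in the spirit of an IoU tracker, not the reference implementation: each new box inherits the id of the previous-frame box it overlaps most (above a threshold), otherwise it starts a new track, and the count is the number of tracks started. The published IOU tracker adds extra filtering (for example of short tracks), which is omitted here; boxes as `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_objects(frames, iou_threshold=0.5):
    """Count distinct objects across frames with greedy IoU matching.

    frames: list of per-frame lists of boxes. A box that overlaps a box from the
    previous frame by at least iou_threshold keeps that track's id; otherwise it
    starts a new track. Tracks not matched in a frame are dropped.
    """
    previous = {}  # track id -> box in the previous frame
    next_id = 0
    for boxes in frames:
        current = {}
        unmatched = dict(previous)
        for box in boxes:
            best_id, best_score = None, iou_threshold
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:  # no overlap good enough: a new object
                best_id = next_id
                next_id += 1
            else:
                del unmatched[best_id]
            current[best_id] = box
        previous = current
    return next_id  # number of tracks ever started


frames = [
    [(0, 0, 10, 10)],
    [(1, 0, 11, 10)],
    [(2, 0, 12, 10), (50, 50, 60, 60)],
]
print(count_objects(frames))  # 2
```

Since a track ends as soon as a detection is missed, a car that flickers out for one frame is counted twice; that is the main weakness to address first.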
It is more difficult to do with moving objects. However, with fixed-position objects and a moving camera it would be relatively easy, using a SLAM system like the ones in augmented reality frameworks such as ARCore, coupled with a depth camera.
For example, an Intel T265 tracking camera paired with a RealSense D415 would be able to count fixed-position objects relatively easily.
However, object detection needs to be very fast. It needs to keep up with at least the camera frame rate!
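With a camera pose from SLAM, counting static objects can be sketched as: move each detected 3D point from the camera frame into the world frame (p_world = R · p_cam + t), then merge points that land within some radius of an already-known object. The pose convention, the 0.5 m merge radius and the nested-list rotation matrix are assumptions for illustration; a real system would also compare class labels before merging.

```python
import math


def camera_to_world(point_cam, rotation, translation):
    """Map a camera-frame point to the world frame: p_world = R @ p_cam + t.

    rotation is a 3x3 nested list and translation a 3-vector describing the
    camera pose in the world (as a SLAM or tracking camera would report it).
    """
    return tuple(
        sum(rotation[r][c] * point_cam[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def count_fixed_objects(world_points, merge_radius_m=0.5):
    """Count static objects by merging detections that land close together in 3D."""
    landmarks = []
    for p in world_points:
        if all(math.dist(p, q) > merge_radius_m for q in landmarks):
            landmarks.append(p)
    return len(landmarks)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# The same object seen from two camera positions 1 m apart, plus one other object.
a = camera_to_world((1.0, 0.0, 2.0), IDENTITY, (0.0, 0.0, 0.0))
b = camera_to_world((0.0, 0.0, 2.0), IDENTITY, (1.0, 0.0, 0.0))
c = camera_to_world((0.0, 0.0, 5.0), IDENTITY, (0.0, 0.0, 0.0))
print(count_fixed_objects([a, b, c]))  # 2
```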
amazing
+imran shaikh thanks!
Is it real time, or are you using a pre-recorded video?
+Saurabh bajaj This is not real time. This is the original video speed, 30 FPS. The actual detection is slower.
Here you can see performance on GTX1080: th-cam.com/video/-ESq7KWRlEg/w-d-xo.html
Dear Mr. Karol, I want to do this work. Can you help me? It's my dream work. Can you help me, please?
Does it need an NVIDIA GPU?
Yes, recent NVIDIA GPU
How do you run it at 30 FPS?
(I can't see the actual video while I'm writing this)
If there's an FPS counter in the top left corner, that is the real speed.
This video itself is 30 FPS because it's made from images after frame-by-frame prediction.
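The difference between playback rate and processing speed is easy to check yourself: time the detector over a run of frames instead of reading the frame rate of the output video. A minimal sketch; the `lambda` with `time.sleep` is a hypothetical stand-in for a real detector.

```python
import time


def measure_fps(frames, detect):
    """Frames per second actually processed by `detect` (not the video's playback rate)."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in detector that takes about 0.05 s per frame, i.e. at most about 20 FPS.
fps = measure_fps(range(4), lambda frame: time.sleep(0.05))
print(f"{fps:.1f} FPS")
```

A video written from frame-by-frame predictions at 30 FPS will always play at 30 FPS, whatever this measurement says.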
Can anyone do a project for me on helmet detection? I will pay for it.
how much?
reCAPTCHA has a problem
1:54 WWL 5187C Fiat Ducato
Awesome!