YOLO-World: Real-Time, Zero-Shot Object Detection Explained

  • Published on 28 Nov 2024

Comments • 106

  • @Sawaedo
    @Sawaedo 3 months ago +1

    Great Video! I didn't have time to read the YOLO World paper completely, or even test it, but with the video I can understand a lot of its architecture, and its performance! Thank you Peter for explaining in such a great way!

    • @Roboflow
      @Roboflow 3 months ago

      pleasure to read comments like this!

  • @abdshomad
    @abdshomad 9 months ago +4

    As always, the content is well delivered. Thank you for always sharing the knowledge 👍

    • @SkalskiP
      @SkalskiP 9 months ago

      my pleasure!

  • @big_zzzzz
    @big_zzzzz 9 months ago +1

    Priceless info!

  • @sumukharaghavanm6466
    @sumukharaghavanm6466 9 months ago +1

    Great solution for students
    Thanks a lot!!!!

  • @uttamdwivedi7709
    @uttamdwivedi7709 9 months ago +8

    Great work!!! Could you provide a tutorial on how to train (fine-tune) this YOLO-World model on a specific type of data?

    • @Roboflow
      @Roboflow 9 months ago +12

      I'll think about it. If enough people are interested we could at least write a blog.

    • @elhadjmeguellati3031
      @elhadjmeguellati3031 9 months ago

      @Roboflow interested, and thanks for the very useful content

    • @scottsharp142
      @scottsharp142 9 months ago

      Yes, would love to see this as well. Thanks for the great content.

    • @smartfusion8799
      @smartfusion8799 8 months ago

      Yes please. Is it possible to run a fine-tuned / light version on an edge device?

    • @viniciusgardim5154
      @viniciusgardim5154 3 months ago

      @Roboflow do it, please

  • @yzamari
    @yzamari 3 months ago

    Great video! very informative!

  • @KarenWeissKarwei
    @KarenWeissKarwei 9 months ago +1

    Great video, informative and understandable. Thank you!

  • @LukasSmith827
    @LukasSmith827 9 months ago +2

    the best to ever do it

    • @SkalskiP
      @SkalskiP 9 months ago

      haha you are too nice! But thanks!

  • @wolpumba4099
    @wolpumba4099 6 months ago +1

    *YOLO-World Explained: A Bullet List Summary with Timestamps*
    *What is YOLO-World? (0:00)*
    * A cutting-edge, zero-shot object detection model that's 20x faster than predecessors. (0:24)
    * Uses a "prompt-then-detect" paradigm to achieve speed, encoding prompts offline and reusing embeddings. (2:26)
    * Leverages a faster CNN backbone and streamlined architecture for increased efficiency. (2:57)
    * Outperforms previous zero-shot detectors (like GroundingDINO) in terms of speed while maintaining accuracy. (2:12)
    *Advantages of YOLO-World:*
    * No need for custom dataset training for object detection. (0:42)
    * Real-time video processing capabilities (up to 50 FPS on powerful GPUs). (9:22)
    * Can incorporate color and position references in prompts for refined detection. (10:16)
    *Limitations of YOLO-World (13:16):*
    * Still slower than traditional real-time object detectors. (13:34)
    * May be less accurate than models trained on custom datasets, especially in uncontrolled environments. (13:51)
    * Can misclassify objects, particularly with low-resolution images or videos. (14:19)
    *Using YOLO-World Effectively (5:33):*
    * Experiment with different confidence thresholds for optimal results. (7:14)
    * Utilize non-max suppression (NMS) to eliminate duplicate detections. (8:07)
    * Filter detections based on relative area to remove unwanted large bounding boxes. (11:04)
    * Combine with FastSAM or EfficientSAM for zero-shot segmentation tasks. (15:21)
    *Beyond the Basics (15:08):*
    * YOLO-World opens possibilities for open-vocabulary video processing and edge deployment. (15:10)
    * Potential for advanced use cases like background removal, replacement, and object manipulation in video. (15:43)
    I used Gemini 1.5 Pro to summarize the transcript.
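
The three post-processing tips in the summary above (confidence threshold, non-max suppression, relative-area filter) can be sketched in plain Python. This is only an illustration, not the code from the video: the function names, the default thresholds and the choice to apply the area filter before NMS are all assumptions, and the NMS here is a simple greedy, class-agnostic variant.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    boxes: Sequence[Box],
    scores: Sequence[float],
    image_size: Tuple[int, int],       # (width, height)
    min_confidence: float = 0.05,      # assumed default; zero-shot models often need low values
    iou_threshold: float = 0.5,        # assumed default
    max_relative_area: float = 0.10,   # assumed default: drop boxes covering >10% of the image
) -> List[int]:
    """Return indices of the detections that survive all three filters."""
    image_area = float(image_size[0] * image_size[1])

    # 1. Confidence threshold and 2. relative-area filter.
    candidates = [
        i for i in range(len(boxes))
        if scores[i] >= min_confidence
        and box_area(boxes[i]) / image_area <= max_relative_area
    ]

    # 3. Greedy NMS: visit boxes from highest to lowest score and keep a box
    #    only if it does not overlap too much with an already kept box.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


example_boxes = [
    (10, 10, 50, 50),      # strong detection
    (12, 12, 52, 52),      # near-duplicate of the first box
    (0, 0, 640, 480),      # box covering the whole frame
    (200, 200, 240, 240),  # very low confidence
]
example_scores = [0.9, 0.6, 0.8, 0.01]
kept_indices = postprocess(example_boxes, example_scores, image_size=(640, 480))
```

Only the first box should survive here: the second is suppressed as a duplicate, the third is too large relative to the frame, and the fourth falls below the confidence threshold.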

    • @Roboflow
      @Roboflow 6 months ago

      Curious how you did it

    • @wolpumba4099
      @wolpumba4099 6 months ago

      @Roboflow I used the prompt "create bullet list summary: ". Then another prompt "add starting (not stopping) timestamps".

  • @elvenkim
    @elvenkim 9 months ago +1

    Hi Pieter! Great delivery, love the final video on YOLO + SAM. May I check with you how we extract the coordinates of the bounding box?

    • @Roboflow
      @Roboflow 9 months ago +1

      In my code just access detections.xyxy :)
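
A minimal sketch of what reading those coordinates might look like. It assumes that `detections.xyxy` holds one `[x_min, y_min, x_max, y_max]` row per detection in pixel coordinates; a plain nested list stands in for the real array so the snippet runs without a model. The helper name `describe_boxes` is hypothetical.

```python
from typing import Dict, List, Sequence


def describe_boxes(xyxy: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Turn [x_min, y_min, x_max, y_max] rows into corner, size and center values."""
    described = []
    for x_min, y_min, x_max, y_max in xyxy:
        described.append({
            "x_min": x_min,
            "y_min": y_min,
            "x_max": x_max,
            "y_max": y_max,
            "width": x_max - x_min,
            "height": y_max - y_min,
            "center_x": (x_min + x_max) / 2,
            "center_y": (y_min + y_max) / 2,
        })
    return described


# Stand-in for detections.xyxy: a single box from (10, 20) to (110, 220).
sample_xyxy = [[10.0, 20.0, 110.0, 220.0]]
boxes = describe_boxes(sample_xyxy)
print(boxes[0])
```

With a real detections object, passing `detections.xyxy` in place of `sample_xyxy` should work the same way, as long as it is a two-dimensional array whose rows unpack into four numbers.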

    • @elvenkim
      @elvenkim 9 months ago

      @Roboflow many thanks Pieter!

  • @Codewello
    @Codewello 9 months ago +1

    Awesome as always! I have learned a lot from you, especially about supervision. Also, I love the thumbnail.
    You look like you're saying 'come at me, bro' 😁😁

    • @Roboflow
      @Roboflow 9 months ago

      Glad to hear it!

  • @froukehermens2176
    @froukehermens2176 9 months ago +2

    Can you use YOLO-World + SAM to annotate images for training a (faster) object detector? (Or image segmentation, maybe even pose estimation?)

    • @Roboflow
      @Roboflow 9 months ago +1

      Yes you can! Some time ago we showed how to do it with the Grounding DINO + SAM combo: th-cam.com/video/oEQYStnF2l8/w-d-xo.htmlsi=JzsB_leYOXbGtiGL

    • @Amir-vn2wx
      @Amir-vn2wx 9 months ago +1

      @Roboflow This is awesome!

  • @richarddjarbeng7093
    @richarddjarbeng7093 8 months ago +1

    Cool tutorial. I have 2 questions.
    1. Is there a list of classes that the model can detect? For instance, if I want to detect 'yellow tricycles' but I am not sure whether the model knows tricycles, where can I check this?
    2. How do you use this for semantic segmentation? You showed this briefly for the suitcases and croissants but you didn't go into the details.

    • @Roboflow
      @Roboflow 8 months ago

      There is no list… You need to experiment. But that’s easy. All you need to do is use the HF Space: huggingface.co/spaces/stevengrove/YOLO-World
      You need to use boxes coming from YOLO-World to prompt SAM. Take a look at the code here. A few months ago we showed how to combine Grounding DINO + SAM: th-cam.com/video/oEQYStnF2l8/w-d-xo.html

    • @richarddjarbeng7093
      @richarddjarbeng7093 8 months ago

      @Roboflow Will check it out. Thanks for the quick response

  • @nazaruddinnurcharis598
    @nazaruddinnurcharis598 7 months ago +1

    Good information. Can this YOLO be used to detect objects in real time using a camera? I am working on a project to develop YOLO for real-time cameras that I plan to use on my farm to detect predators.

  • @jimshtepa5423
    @jimshtepa5423 6 months ago

    Have you done any video on training a model on a custom dataset?

  • @nidalidais9999
    @nidalidais9999 6 months ago

    Hi man, good work. What is the difference between YOLO-World and the T-REX model, and how do you usually compare models?

  • @avamaeva7999
    @avamaeva7999 9 months ago +1

    This is a game changer, but it needs to work on mobile to be of real use in my setting. Two questions please:
    1 - Can quantization be used on this model to make it much quicker, perhaps to a level where it will work in real time (at least 10 fps) on state-of-the-art phones (e.g. iPhone 15)?
    2 - Can the model be run through the TFLite Converter? If not, any ideas whether that facility might be introduced?
    Many Thanks

    • @Roboflow
      @Roboflow 9 months ago +1

      Good questions. As far as I know, no quantized version has been released yet. I’ll try to reach out to the authors and ask.

  • @99develop80
    @99develop80 8 months ago +1

    Thank you for the video! I have a question. What do you call the technology that uses YOLO-World + EfficientSAM at the end of the video to switch from detection to segmentation along the baseline? Or is there a way to implement it?

    • @Roboflow
      @Roboflow 8 months ago

      I use the Gradio library to build those interactive demos.

  • @TUSHARGOPALKA-nj7jx
    @TUSHARGOPALKA-nj7jx 6 months ago

    Do we have a YOLOv8 model trained on the ADE20K dataset? If not, how would one do it?

  • @alaaalmazroey3226
    @alaaalmazroey3226 9 months ago +1

    Can YOLO-World accurately detect the road area from a dash camera? I need to detect it for an autonomous vehicle.

    • @Roboflow
      @Roboflow 9 months ago

      I recommend you try with your own images here: huggingface.co/spaces/stevengrove/YOLO-World

  • @alaaalmazroey3226
    @alaaalmazroey3226 9 months ago +1

    Hi, can YOLO-World detect objects (e.g. houses) perfectly in geospatial images?

    • @Roboflow
      @Roboflow 9 months ago

      I tested. I’m afraid not ;/

  • @misaeldavidlinareswarthon190
    @misaeldavidlinareswarthon190 9 months ago +1

    Impressive!!! ... I have a question.
    So for maximum speed, do I still have to use YOLOv8, or does YOLO-World have less latency with a custom dataset?

    • @Roboflow
      @Roboflow 9 months ago

      If you need a model that runs in real time or faster, you still need to train an object detector on a custom dataset. It does not need to be YOLOv8.

  • @nourabdou4118
    @nourabdou4118 9 months ago +1

    Thank you, very informative. I have a question regarding the prompts: does it support and understand things like "Red Zones" or "Grey Areas"?
    I've tried to use it on maps, trying to identify grey areas or red areas, but it doesn't work. Is there any workaround? Thank you again!

    • @Roboflow
      @Roboflow 9 months ago +1

      Hard to say without looking at the exact image. "Zone" or "area" sounds very general :/ Is there any chance you could look for a gray rectangle or circle? I'm thinking of something more precise. And I assume you need a very low confidence threshold to do it anyway.

    • @nourabdou4118
      @nourabdou4118 9 months ago

      @Roboflow It works. Obviously it's not 100% correct, but it works, which is good. Thank you so much

  • @rajeshktym
    @rajeshktym 9 months ago +1

    Hi, is it a good idea to use YOLO-World for apple grade detection? A global shutter 2MP camera will capture 5 apples in the same position in a single frame (apple cup conveyor with trigger). We need to find the bounding box of each apple and the classification result, like grade A or grade B. What would be the maximum time required to obtain the grade and bounding box information for each apple using a Jetson Nano?

    • @Roboflow
      @Roboflow 9 months ago

      I think you can always spend a few minutes to try. Like I said in the video: don’t be afraid to experiment, but be prepared that in your use case you might still need to train a model on a custom dataset.
      During my tests, conveyor object detection usually worked really well, at least if objects do not occlude each other. That’s why I feel quite confident that the detection part will work. I’m worried about classification.

  • @potobill
    @potobill 6 months ago

    Is there a C++ version? Is the C++ version faster or the same speed?

  • @Kalyani-k7b
    @Kalyani-k7b 9 months ago +1

    Is this helpful in detecting damaged objects in real time?

    • @Roboflow
      @Roboflow 9 months ago

      Probably depends on the type of object and the type of damage, but I think yes.

    • @Kalyani-k7b
      @Kalyani-k7b 9 months ago +1

      @Roboflow Thank you. Let's consider the example of suitcases and backpacks shown in the video. Can this technology be useful for detecting damage in them?

    • @Roboflow
      @Roboflow 9 months ago

      @Kalyani-k7b I'll try to answer this question during the community session

  • @vipulpardeshi2868
    @vipulpardeshi2868 9 months ago +2

    Hey, I just want to know: is there any method to use Roboflow models in offline projects? Inference through the API is very slow and I want fast detections. Is there any way to save the model .pt file and use it later without always importing the Roboflow workspace? Thanks❤

    • @Roboflow
      @Roboflow 9 months ago +2

      Absolutely! You can use the inference pip package to run any model from Roboflow on your local machine. You only need internet during the first run to download it. Then it is cached locally and you can run it offline.

    • @vipulpardeshi2868
      @vipulpardeshi2868 9 months ago

      Ok, thanks for the reply, you guys are the best

  • @sreekanthreddy6979
    @sreekanthreddy6979 8 months ago +1

    How do I do this with a web camera?

  • @baseerfarooqui5897
    @baseerfarooqui5897 8 months ago +1

    Hi, very informative video. I am getting this error while running the code: "AttributeError: type object 'Detections' has no attribute 'from_inference'". I am running it on my local system.

    • @Roboflow
      @Roboflow 8 months ago

      What version of supervision do you have installed?

  • @jkjhkjhkjhkjpopoipofsi
    @jkjhkjhkjhkjpopoipofsi 9 months ago +1

    Hi, is there a way to count the time objects spend in a zone?

    • @Roboflow
      @Roboflow 9 months ago

      Yup. It is on our list of videos that are coming really soon!

  • @iconolk7338
    @iconolk7338 5 months ago +1

    I want to use this project. It works on Hugging Face, but strangely it doesn't fit my environment; it doesn't work on my PC.
    I want to "clone" the one on Hugging Face. Is there a way?

    • @Roboflow
      @Roboflow 5 months ago

      Yes. HF Spaces work like git. You can clone the entire project to your local machine.

  • @alaaalmazroey3226
    @alaaalmazroey3226 9 months ago +1

    Hi, does YOLO-World + SAM work well for segmenting all the cars and trucks perfectly in video scenes when the road is very crowded? If not, what do you suggest? Thanks

    • @Roboflow
      @Roboflow 9 months ago

      If you plan to detect cars, just use any of the models pre-trained on COCO. You do not need zero-shot detection to find cars :)

    • @DDDprinting
      @DDDprinting 6 months ago

      @Roboflow Do you have a recommendation for a camera for this kind of work?

  • @TUSHARGOPALKA-nj7jx
    @TUSHARGOPALKA-nj7jx 6 months ago

    Would the YOLO-World M or S version run in milliseconds on a CPU?

  • @paulpolizzi3421
    @paulpolizzi3421 9 months ago +1

    Can this work on my kids' soccer videos?

    • @Roboflow
      @Roboflow 9 months ago

      It probably can. But soccer is a pretty standard use case. YOLOv8 or another typical detector is probably a much better choice for you.

  • @isaac10231
    @isaac10231 9 months ago +1

    Can this be run locally on an RTX card? Or at least, how do we run this locally?

    • @Roboflow
      @Roboflow 9 months ago

      Absolutely! I think you can easily run it on RTX.

  • @novandaardhi7867
    @novandaardhi7867 9 months ago +1

    Can this be integrated with ROS 2 using an NVIDIA Jetson Nano?

    • @Roboflow
      @Roboflow 9 months ago

      We are going to test Jetson deployments internally soon, but I can already tell you that it will be pretty hard to run it on the Nano board. Xavier / Orin sounds a lot more realistic.

    • @novandaardhi7867
      @novandaardhi7867 9 months ago

      Thanks, maybe I can consider using Orin to run it. I'll wait for you to do a test on Jetson

  • @khalidalsinan3768
    @khalidalsinan3768 8 months ago +1

    On the Hugging Face website, when I upload a video, it outputs a video of only 2 seconds. Does anyone know how to fix this?

    • @Roboflow
      @Roboflow 8 months ago

      We need to prevent long video processing, because it makes other users wait longer.

    • @Roboflow
      @Roboflow 8 months ago

      You would need to clone the space and make it process longer files.

    • @KhalidAlsinan
      @KhalidAlsinan 8 months ago

      @Roboflow how do I “clone” it?

  • @abdshomad
    @abdshomad 9 months ago

    Yesterday I tried to detect red, yellow, and green traffic lights. It still did not recognize the color. Any specific guide on how to identify color?

    • @atharvpatawar8346
      @atharvpatawar8346 9 months ago +1

      If it’s able to detect the individual traffic lights, get the bounding boxes and use clustering to find the majority colour within that box
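
A rough sketch of the "majority colour inside the box" idea. To stay dependency-free it swaps the suggested clustering for a simpler stand-in: each pixel votes for the nearest of three reference colours, and dark pixels (the housing or unlit lamps) are ignored. The reference RGB values, the brightness cut-off and the function names are assumptions for illustration and would need tuning on real footage.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]

# Hypothetical reference colours; real traffic lights rarely match these exactly.
REFERENCE_COLOURS: Dict[str, RGB] = {
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
}


def nearest_colour(pixel: RGB, min_brightness: int = 100) -> str:
    """Label a pixel with the closest reference colour, or 'off' if it is too dark."""
    if max(pixel) < min_brightness:
        return "off"

    def squared_distance(reference: RGB) -> int:
        return sum((p - r) ** 2 for p, r in zip(pixel, reference))

    return min(REFERENCE_COLOURS, key=lambda name: squared_distance(REFERENCE_COLOURS[name]))


def majority_colour(pixels: Iterable[RGB]) -> str:
    """Return the colour most lit pixels vote for, or 'unknown' if none are lit."""
    votes = Counter(nearest_colour(pixel) for pixel in pixels)
    votes.pop("off", None)  # dark pixels do not get a vote
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


# Synthetic crop of a traffic-light box: mostly dark housing, a lit red lamp,
# and a few greenish pixels from the background.
crop_pixels = [(20, 20, 20)] * 50 + [(230, 40, 30)] * 30 + [(60, 200, 70)] * 5
light_state = majority_colour(crop_pixels)
```

In practice `crop_pixels` would come from the image region inside one detected bounding box; how that crop is read depends on the image library in use.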

    • @abdshomad
      @abdshomad 9 months ago

      @atharvpatawar8346 Currently it can't. It detects the whole set of lights. I even tried changing the prompt to: circle, box, bulb. Still not possible. Maybe I have to apply a 2nd classifier?

    • @SkalskiP
      @SkalskiP 9 months ago +1

      @abdshomad I'd say if you need to use YOLO-World and a second-level classifier, it is probably not worth it.

    • @SkalskiP
      @SkalskiP 9 months ago +1

      @abdshomad which version of the model did you use?

  • @rafaelsetyan1755
    @rafaelsetyan1755 9 months ago +1

    Has anybody tried this model on UAV/drone data? Is it accurate? It should be possible to export to ONNX and do inference in C++, shouldn't it?

    • @Roboflow
      @Roboflow 9 months ago +1

      The only test I made on drone footage was "lake detection". But that was a large object; you are probably considering detecting smaller objects.
      As for ONNX export, yes, export is possible, but (as far as I know) once you export your text prompt is frozen.

  • @g.s.3389
    @g.s.3389 9 months ago +1

    wow

  • @zdong2483
    @zdong2483 8 months ago

    Reporting an issue: when running the notebook on Mar 23, 2023, I had to use !pip install -q ultralytics==8.1.30; otherwise it fails.

    • @Roboflow
      @Roboflow 8 months ago

      I’m not sure what you mean, but I just tested the code and everything works.

  • @polnapanda4934
    @polnapanda4934 9 months ago

    After a couple of hours working on Google Colab, it cuts almost all performance, deletes data and says that I can buy GPU power

    • @Roboflow
      @Roboflow 9 months ago

      Sorry to hear that. Google Colab is free, but only up to a certain point :/

    • @polnapanda4934
      @polnapanda4934 9 months ago +1

      @Roboflow Yep :c I was training my model and it deleted all progress after 4 hours of training

  • @chandanchakma2875
    @chandanchakma2875 5 months ago

    I want to learn AI. Please make a playlist.

  • @hanma9249
    @hanma9249 4 months ago

    GG

  • @vishwamgupta-n6k
    @vishwamgupta-n6k 9 months ago +2

    It does not work well when the object size is small; Grounding DINO works better than YOLO-World.

    • @Roboflow
      @Roboflow 9 months ago +1

      I think it all depends on specific cases. What do you mean by “object size is less”?

    • @vishwamgupta-n6k
      @vishwamgupta-n6k 9 months ago +1

      @Roboflow I mean when the object is far away in the image. YOLO-World could not detect as many objects as Grounding DINO could in such situations.

    • @Roboflow
      @Roboflow 9 months ago +1

      @vishwamgupta-n6k have you tried a lower confidence threshold?

    • @vishwamgupta-n6k
      @vishwamgupta-n6k 9 months ago

      @Roboflow Yes, I tried that too, but the performance of Grounding DINO was still superior. It could detect objects in more images than YOLO-World.

    • @science_electronique
      @science_electronique 9 months ago +1

      Grounding DINO is more accurate

  • @渣渣辉-o7o
    @渣渣辉-o7o 2 months ago

    😂

  • @netq254
    @netq254 8 months ago

    "Cheap Nvidia T4" £1000 is not cheap bro

    • @Roboflow
      @Roboflow 8 months ago +2

      Compared to A100 or H100 it is ;) but what I meant is just using T4 on AWS.

    • @netq254
      @netq254 8 months ago

      @Roboflow Holy hell you're right! I didn't realise how expensive these cards are!