Detect Anything You Want with Grounding DINO | Zero Shot Object Detection SOTA

  • Published on 14 Jun 2024
  • Discover Grounding DINO, a groundbreaking AI model that seamlessly locates objects in images and matches them with corresponding textual labels. Learn how Grounding DINO revolutionizes object detection and text recognition tasks. #GroundingDINO #ObjectDetection #AIModel #LanguageModel
    Chapters:
    00:00 Introduction
    01:29 Setting up the Python environment
    03:11 Loading GroundingDINO model
    04:38 Multimodal deep learning
    05:23 Running GroundingDINO on custom images
    06:21 Prompt engineering and object detection language constraints
    07:36 Multiclass detection
    09:12 More object detection language constraints
    11:19 Dataset auto labeling with Roboflow and GroundingDINO
    12:48 Outro
    Resources:
    🌏 Roboflow: roboflow.com
    🌌 Roboflow Universe: universe.roboflow.com
    📓 Grounding DINO notebook: github.com/roboflow/notebooks...
    🗞 Grounding DINO blog post: blog.roboflow.com/grounding-d...
    🗞 Grounding DINO arXiv paper: arxiv.org/abs/2303.05499
    💻 Grounding DINO repository: github.com/IDEA-Research/Grou...
    🎬 GPT 4: Will We Ever Train Again?: • GPT 4: Will We Ever Tr...
    🎬 CLIP: OpenAI's amazing new zero-shot image classifier: • CLIP: OpenAI's amazing...
    Stay updated with the projects I'm working on at github.com/roboflow and github.com/SkalskiP! ⭐
  • Science & Technology

Comments • 92

  • @pytagore3409
    @pytagore3409 1 year ago +1

    Great video!
    Very cool AI! Looking forward to seeing this kind of AI detect non-COCO dataset objects and more specific things with a prompt.

    • @Roboflow
      @Roboflow 1 year ago

      Looks like we are moving in that direction quite fast 💨

  • @zafirabdullah5488
    @zafirabdullah5488 1 year ago +1

    Wow! Amazing! You are awesome. Easy to learn project any time😊

    • @Roboflow
      @Roboflow 1 year ago

      Thanks a lot :) I try to do my best. Hearing that it works makes me happy.

  • @ziroks51
    @ziroks51 10 months ago +1

    thanks Piotr, I'm learning a lot from you

    • @Roboflow
      @Roboflow 9 months ago

      My pleasure!

  • @body1024
    @body1024 1 year ago +2

    amazing world of computer vision
    keep us updated
    you are awesome

    • @Roboflow
      @Roboflow 1 year ago +1

      Agree! And thanks a lot. Will do ;)

  • @temitopeibrahimamosa2885
    @temitopeibrahimamosa2885 1 year ago

    Very well done, Piotr... As a special request, a video on how to train on a custom dataset would be much appreciated.

  • @haviduck
    @haviduck 3 months ago +1

    It's such an amazing repo

  • @_ABDULGHANI
    @_ABDULGHANI 1 year ago +1

    Thank you for that, very impressive.

    • @Roboflow
      @Roboflow 1 year ago

      Thanks a lot 🙏🏻

  • @nanciszhao4831
    @nanciszhao4831 9 months ago +1

    Amazing work. I believe pre-encoding the image features is a good idea. That way, when the text prompt is replaced, the search can run quickly over the pre-encoded features, which would bring outstanding improvements to large-scale automatic dataset annotation and similar work. Do the developers have such plans?

  • @adolfusadams4615
    @adolfusadams4615 1 year ago +3

    I was just as excited watching you work with GroundingDINO as you were working with it. Can you please use it to do real-time object detection with a live webcam?

    • @Roboflow
      @Roboflow 1 year ago +2

      Thanks a lot 🙏🏻 Grounding DINO is unfortunately too slow to run in real time ;/ I do have a prototype notebook that uses it for tracking, but not in real time, only on static files.

  • @user-ks3vt1jn2z
    @user-ks3vt1jn2z 10 months ago

    Fantastic, this module is so powerful and useful, thank you!

  • @fifthperson9777
    @fifthperson9777 1 year ago +2

    Wow, pretty cool

  • @user-st1rd9ur2k
    @user-st1rd9ur2k 1 year ago

    Can we distinguish between different types of the same object, and what prompts should we give then? Suppose we are trying to classify the severity of burns on skin and we want to annotate the data with 'severe', 'moderate' or 'slight' burn. Can it annotate that?

  • @8w494
    @8w494 1 year ago +1

    Love it.

  • @mbdesign8118
    @mbdesign8118 5 months ago

    If I understood correctly, for real-time object detection it's still better to use YOLOv8, for example, than Grounding DINO, right? Especially if only 1-2 classes are to be detected.

  • @dipayanpariksha1019
    @dipayanpariksha1019 8 months ago +1

    Hi, how can I fine-tune Grounding DINO for object detection?

  • @mariosconstantinou8271
    @mariosconstantinou8271 1 year ago +1

    This is amazing work, well done! I was wondering what its performance would look like for real-time crowd detection. Do you have any metrics for running inference on an embedded device such as a Jetson?

    • @Roboflow
      @Roboflow 1 year ago +1

      Unfortunately not, but given that it only reaches 8 fps on an A100 GPU, I wouldn't count on any real-time applications. I'm working on a few demos, though, and there are other really cool use cases that I'm exploring.

    • @mariosconstantinou8271
      @mariosconstantinou8271 1 year ago +1

      @@Roboflow Thank you for replying, I had a blast today playing with it and learning about it! Well done :)

    • @Roboflow
      @Roboflow 1 year ago +1

      @@mariosconstantinou8271 I love to hear that. More cool toys based on Grounding DINO coming soon

  • @cyberhard
    @cyberhard 1 year ago +5

    Excellent! How does detection speed compare to YOLOv8? If the detector is not good in a certain domain, is it possible to retrain it to improve in that domain?

    • @Roboflow
      @Roboflow 1 year ago +12

      The model is quite slow - 8 fps on A100. So it will not replace YOLO models in real-time scenarios. It can for sure partially automate the data annotation process. And yes it is possible to fine-tune the model on the selected dataset.

    • @evanshlom1
      @evanshlom1 1 year ago

      ❤️
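
The 8 fps figure quoted in this thread gives a rough feel for auto-labeling cost. The sketch below is back-of-the-envelope arithmetic only: it assumes a constant 8 frames per second and ignores image loading and post-processing, and the function name is hypothetical.

```python
def labeling_time_hours(num_images: int, fps: float = 8.0) -> float:
    """Estimate the hours needed to run inference over num_images at a fixed rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # seconds = images / (images per second); 3600 seconds per hour
    return num_images / fps / 3600.0


# 100,000 images at the quoted 8 fps
print(f"{labeling_time_hours(100_000):.2f} h")  # prints "3.47 h"
```

At that rate a dataset of 100,000 images needs a few GPU hours, which fits offline annotation but not live video.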

  • @jimvanvorst1696
    @jimvanvorst1696 1 year ago +1

    Great video! My brain melted a little when I saw it detecting classes it was not trained on. Is the data all local, or does the inference use an API to reach out to the internet somewhere to get results? If it's all local, what is the total size cost? That is, instead of a pre-trained 40MB yolov8 model we now have... 400GB of something else?

    • @Roboflow
      @Roboflow 1 year ago

      Good question. No, the model runs 100% locally. You just need to download the weights file plus (I guess) some additional weights for the backbone, and you are good to go. From that point on you can shut down the internet. As for the size, I'm not sure how much space it takes on the machine, but it fits on a single GPU, so it can't be anything crazy. (You can check that by running the nvidia-smi command once again after you load the model. I'm AFK, so I can't do it myself now.) But I remember that the static weights file is around 200MB.

  • @eliaweiss1
    @eliaweiss1 1 year ago

    I tried to use it to segment a document containing tabular data, and it didn't work at all. I tried different prompts, including: tables, lines, border lines, segmentation.
    I also tried to explain what the document is.
    It keeps returning the entire document.

  • @danielrubio6220
    @danielrubio6220 1 year ago +1

    Great video! So is it possible to apply transfer learning to our own dataset?

    • @Roboflow
      @Roboflow 1 year ago

      Thanks a lot! Could you elaborate on the idea?

  • @gbo10001
    @gbo10001 1 year ago +1

    Wow, great. Do you think Roboflow will embed this feature in the GUI?

    • @Roboflow
      @Roboflow 1 year ago

      We are not yet sure. We need to explore more :) Stay tuned!

  • @siliconvision
    @siliconvision 10 months ago

    The results of predict and predict_with_caption are not working with your supervision box annotation for Grounding DINO.

  • @ihebchhaibi4831
    @ihebchhaibi4831 1 year ago +1

    Amazing work, thank you.
    I trained a YOLOv8 instance segmentation model and I'm using it to predict fish species. Now I would like to predict fish freshness using zero-shot learning. Can I get some help?
    Thank you in advance.

    • @Roboflow
      @Roboflow 1 year ago

      Best thing is to start with our notebook, upload some images of fish you want to detect and write some test queries. Just to get some idea how it performs.

  • @awakenwithoutcoffee
    @awakenwithoutcoffee 2 months ago

    Great introduction to Grounding DINO. One question I have: how can we train our model to recognize different states of an item, e.g. a car that is missing a wheel or has a broken windshield?

    • @tongtongchen-qo6qj
      @tongtongchen-qo6qj 2 months ago

      As far as I know, the DINO model is currently just a recognizer, not a generator.

  • @user-hc2sc2de6y
    @user-hc2sc2de6y 9 months ago

    I am working on a project that uses Grounding DINO, and I want to avoid entering the classes or text prompt manually. I just want to pass in the image and have Grounding DINO detect everything and extract the labels, similar to DetGPT. Is that possible?
    Please help.

  • @lorenzoleongutierrez7927
    @lorenzoleongutierrez7927 1 year ago +2

    Impressive !

    • @lorenzoleongutierrez7927
      @lorenzoleongutierrez7927 1 year ago +1

      I just linked a dataset of images of wheat spikes and it's detecting them! I have a lot to learn because our work so far has been focused on the YOLO family, but this is completely fascinating and out of my scope. Thank you for sharing and please share your insights!

    • @SkalskiP
      @SkalskiP 1 year ago

      @@lorenzoleongutierrez7927 Hi, I guess the YOLO architecture is still helpful and needed. Grounding DINO is powerful but slow: only 8 fps on an NVIDIA A100. But a strong zero-shot object detector like this can be used, for example, to auto-annotate images. As for the theory behind the model, I'm working on a blog post right now. It should be out today or tomorrow. Could you share which dataset you used? Is it on Roboflow Universe?

  • @EzequielBolzi
    @EzequielBolzi 1 year ago +1

    Hi, nice video! I have a question: if I wanted to use it to detect problems in, for example, wind turbines, would it be useful to me?
    Kind regards!

    • @Roboflow
      @Roboflow 1 year ago +1

      It is hard to tell. Sounds like a very narrow and specific use case. I'd say take a few images that are representative of the dataset you expect to work with, upload them to our Google Colab, and test :)

    • @EzequielBolzi
      @EzequielBolzi 1 year ago

      @@Roboflow Thank you, I'm going to try to reorder a 4 thousand database and I will try to use another form.

  • @SarahGarcia-mu2by
    @SarahGarcia-mu2by 2 months ago

    Hi! I truly enjoyed your video! I am trying to use Grounding DINO (the demo file) and Grounded SAM, but I would like to use them without an internet connection (the demo tries to download the BERT base model from Hugging Face), and I would like to know how to deal with that. I have tried downloading the JSON file and cloning the BERT model, but I am really lost about what to add or modify to use it like that :(
    Thanks in advance!

    • @SarahGarcia-mu2by
      @SarahGarcia-mu2by 2 months ago

      I forgot to mention that I am currently using Linux and PyCharm to run the demo file :)
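
One possible starting point for the offline setup asked about here, offered as a sketch rather than a confirmed fix: the Hugging Face libraries read the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables and skip network calls when they are set, using only files already in the local cache. The helper name and the local directory below are hypothetical, and whether the demo lets you point its text encoder at a local folder is an assumption to check against the installed config.

```python
import os


def enable_hf_offline_mode() -> None:
    """Ask Hugging Face libraries to use only locally cached files."""
    # Both variables must be set before transformers / huggingface_hub are imported.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


enable_hf_offline_mode()

# Hypothetical path: a folder holding a previously downloaded copy of the
# BERT files (config.json, vocab.txt, weights). Assumption: the model config
# accepts a local directory wherever it currently names the text encoder.
LOCAL_BERT_DIR = "/path/to/bert-base-uncased"
```

Run the demo once with a connection so the cache is populated, then set the variables at the very top of the demo script.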

  • @omoklamok
    @omoklamok 1 year ago +1

    Hi, thanks, I fixed my problem by identifying the actual data. Sir, how can I use GroundingDINO for counting with a live camera, the same as in the Track & Count Objects video? I don't know the syntax for GroundingDINO.

    • @Roboflow
      @Roboflow 1 year ago

      Hi 👋🏻 Grounding DINO is too slow for real-time processing, unfortunately :/

  • @Legend01745
    @Legend01745 1 year ago

    Great video. How can I localize, or draw a bounding box around, any object around us? I don't want to classify it; it will be used for obstacle avoidance.

    • @Roboflow
      @Roboflow 1 year ago

      I'm afraid you'll need to be more specific in your prompt. Did you experiment with our notebook? What was the result?

  • @petpo-ev1yd
    @petpo-ev1yd 3 months ago

    Hi bro, an error occurred. What should I do to deal with this?
    Error occurred when executing GroundingDinoModelLoader (segment anything):
    File "F:\Blender_ComfyUI\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
    File "F:\Blender_ComfyUI\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
    File "F:\Blender_ComfyUI\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
    File "F:\Blender_ComfyUI\ComfyUI\custom_nodes\comfyui_segment_anything\node.py", line 286, in main
    dino_model = load_groundingdino_model(model_name)
    File "F:\Blender_ComfyUI\ComfyUI\custom_nodes\comfyui_segment_anything\node.py", line 117, in load_groundingdino_model
    get_local_filepath(
    File "F:\Blender_ComfyUI\ComfyUI\custom_nodes\comfyui_segment_anything\node.py", line 111, in get_local_filepath
    download_url_to_file(url, destination)
    File "F:\Blender_ComfyUI\python_embeded\lib\site-packages\torch\hub.py", line 620, in download_url_to_file
    u = urlopen(req)

  • @1907hasancan
    @1907hasancan 11 months ago

    Can I work with my own dataset in that notebook?

  • @user-zy5vm9rv6q
    @user-zy5vm9rv6q 1 year ago +1

    Is it possible to extract detections and bounding boxes as separate images?

    • @Roboflow
      @Roboflow 1 year ago

      Hi 👋🏻! It’s Peter from the video. Do you mean to crop original image using bounding boxes generated with Grounding DINO?

    • @user-zy5vm9rv6q
      @user-zy5vm9rv6q 1 year ago +1

      @@Roboflow Once you have detected objects and you have a bounding box, can you extract the bounding box as a separate image?

    • @Roboflow
      @Roboflow 1 year ago

      @@user-zy5vm9rv6q You would need to write a small method to crop. We actually have an open issue about that: github.com/roboflow/supervision/issues/75. We will add it to supervision soon, but you can take a look at the code proposals there.
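
A minimal sketch of such a cropping helper, assuming the detections are available as pixel-space boxes in (x1, y1, x2, y2) order and the image is a NumPy array in height x width x channels layout. The function name is hypothetical and this is not the implementation proposed in the linked issue.

```python
import numpy as np


def crop_detections(image: np.ndarray, boxes_xyxy) -> list:
    """Return one cropped copy of the image per (x1, y1, x2, y2) box."""
    height, width = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in np.asarray(boxes_xyxy, dtype=float).reshape(-1, 4):
        # Clamp to the image bounds so boxes that spill over the edge still work.
        left, top = max(0, int(np.floor(x1))), max(0, int(np.floor(y1)))
        right, bottom = min(width, int(np.ceil(x2))), min(height, int(np.ceil(y2)))
        if right > left and bottom > top:  # skip empty or inverted boxes
            crops.append(image[top:bottom, left:right].copy())
    return crops


image = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = [[10, 20, 60, 80], [150, 50, 250, 120]]  # second box exceeds the image
print([crop.shape for crop in crop_detections(image, boxes)])
# prints [(60, 50, 3), (50, 50, 3)]
```

Each crop can then be written to disk with whichever image library is already in use.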

  • @payback80
    @payback80 1 year ago +1

    Awesome, how do I save the annotated images in COCO format?

    • @Roboflow
      @Roboflow 1 year ago +1

      Right now you would need to write a ton of Python code to do that. But we actually plan to write simple utils to do that. Stay tuned ;)

    • @Georgehwp
      @Georgehwp 1 year ago

      @@Roboflow You can do this pretty quickly with Voxel51's FiftyOne. I may open a PR on the GroundingDINO repo with this shortly.

    • @Georgehwp
      @Georgehwp 1 year ago +1

      Will submit PR to the repo for this by the end of the day.

    • @payback80
      @payback80 1 year ago

      @@Georgehwp What are the main differences between CVAT and Voxel51? I had never heard of Voxel51 before.

    • @Georgehwp
      @Georgehwp 1 year ago

      It is very common to use Voxel51 and CVAT together.
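
For the COCO export asked about at the top of this thread, the core of the conversion is small. The sketch below uses only the standard library and is not the supervision or FiftyOne API. The input layout (a list of dicts with `file_name`, `width`, `height`, `boxes` and `class_ids`) is a hypothetical convenience format. The output follows the COCO detection convention of `[x, y, width, height]` boxes.

```python
import json


def to_coco(images: list, class_names: list) -> dict:
    """Convert per-image xyxy boxes into a COCO-style detection dictionary."""
    coco = {
        "images": [],
        "annotations": [],
        # COCO category ids conventionally start at 1.
        "categories": [{"id": i + 1, "name": n} for i, n in enumerate(class_names)],
    }
    annotation_id = 1
    for image_id, img in enumerate(images, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
        })
        for (x1, y1, x2, y2), class_id in zip(img["boxes"], img["class_ids"]):
            w, h = x2 - x1, y2 - y1
            coco["annotations"].append({
                "id": annotation_id,
                "image_id": image_id,
                "category_id": class_id + 1,
                "bbox": [x1, y1, w, h],  # COCO stores top-left corner plus size
                "area": w * h,
                "iscrowd": 0,
            })
            annotation_id += 1
    return coco


example = [{"file_name": "dog.jpg", "width": 640, "height": 480,
            "boxes": [[10, 20, 110, 220]], "class_ids": [1]}]
print(json.dumps(to_coco(example, ["chair", "dog"])["annotations"][0]["bbox"]))
# prints [10, 20, 100, 200]
```

Writing the result out is then a single `json.dump` call.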

  • @servanson246
    @servanson246 1 year ago

    Is there a version with DINO V2? Is DINO V2 for public usage?

    • @Roboflow
      @Roboflow 1 year ago

      We don't have a video on DINOv2 yet, but we have this post as a starting point! blog.roboflow.com/what-is-dinov2/

    • @servanson246
      @servanson246 1 year ago

      @@Roboflow 👍 I am already there. Thanks.

  • @Georgehwp
    @Georgehwp 1 year ago +1

    GroundingDINO is ignoring my commas and aggregating the words in the full text prompt (everything the object looks like gets combined into one string).

    • @Roboflow
      @Roboflow 1 year ago

      Could you give me some examples?

    • @AG-sx9ws
      @AG-sx9ws 4 months ago

      GroundingDino is GERMAN?
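
On the comma problem raised in this thread: the upstream Grounding DINO README, as far as I recall, recommends separating category names with periods rather than commas. Treat that as an assumption to verify against the version you have installed. The helper below is a hypothetical illustration of building such a caption, not part of the library.

```python
def build_caption(class_names: list) -> str:
    """Join class names into a single period-separated text prompt."""
    # Lowercase and strip each name, dropping empty entries.
    cleaned = [name.strip().lower() for name in class_names if name.strip()]
    # Assumed convention: "chair . person . dog ."
    return " . ".join(cleaned) + " ."


print(build_caption(["Chair", " person ", "", "dog"]))
# prints "chair . person . dog ."
```

Keeping each phrase short also makes it easier to see which phrase a returned box was matched to.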

  • @ritamonteiro2591
    @ritamonteiro2591 1 year ago +1

    When I load Grounding DINO Model this error occurs:
    FileNotFoundError: [Errno 2] No such file or directory: '/content/weights/groundingdino_swint_ogc.pth'
    Would you help me?

    • @Roboflow
      @Roboflow 1 year ago +1

      Make sure to run the notebook from top to bottom. Looks like you skipped a cell.

    • @ritamonteiro2591
      @ritamonteiro2591 1 year ago +1

      @@Roboflow I got it, thanks.

    • @Roboflow
      @Roboflow 1 year ago

      @@ritamonteiro2591 no worries 😉

    • @ritamonteiro2591
      @ritamonteiro2591 1 year ago +1

      @@Roboflow One question, can I use this script to classify other types of objects? For example, would I be able to separate oranges from apples?

    • @Roboflow
      @Roboflow 1 year ago +1

      @@ritamonteiro2591 this use case is actually really simple. I’m sure it can distinguish between oranges and apples.

  • @nicolagobbo1191
    @nicolagobbo1191 1 year ago +1

    got this error
    AttributeError Traceback (most recent call last)
    in ()
    22
    23 get_ipython().run_line_magic('matplotlib', 'inline')
    ---> 24 sv.show_frame_in_notebook(annotated_frame, (16, 16))
    AttributeError: module 'supervision' has no attribute 'show_frame_in_notebook'

    • @Roboflow
      @Roboflow 1 year ago

      That one should be fixed already