How Computer Vision Works

  • Published on 26 Sep 2024
  • The Google Cloud Vision and Video Intelligence APIs give you access to a pre-trained machine learning model with a single REST API request. But what do those pre-trained models look like behind the scenes? In this video we'll uncover the magic of computer vision models by breaking down how Convolutional Neural Nets work under the hood, and we'll end with a live demo of the Vision API.
    Learn more here!
    How CNNs work → goo.gl/W51CGk
    How RNNs work → goo.gl/I7RChj
    Cloud Vision API → goo.gle/2ND7eMP
    Cloud Video Intelligence API → goo.gle/3t7lTQP
    Subscribe to the Google Cloud Platform channel → goo.gl/S0AS51
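
    For readers wondering what the "single REST API request" looks like, here is a minimal sketch of the JSON body for a label detection call to the Vision API's `images:annotate` method. The helper name `build_label_request` and the placeholder bytes are illustrative assumptions, not from the video, and the sketch only builds the request; sending it also needs an API key or OAuth token.

```python
import base64
import json

# Public REST endpoint for the Vision API's synchronous image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build the JSON body for a single LABEL_DETECTION request (hypothetical helper)."""
    return {
        "requests": [
            {
                # The API expects the raw image bytes as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    body = build_label_request(b"not-a-real-image", max_results=3)
    print("POST", VISION_ENDPOINT)
    print(json.dumps(body, indent=2))
```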

Comments • 113

  • @sunitaskitchen6335
    @sunitaskitchen6335 3 years ago +130

    most informative ad i have ever seen😏

  • @IgorSwxy
    @IgorSwxy 3 years ago +78

    The outcome of this video: don't dare to understand how computer vision works, just use our API :)

  • @ozzyfromspace
    @ozzyfromspace 5 years ago +32

    Building your own computer vision system is, frankly, much more satisfying. And you control every aspect of the technology.

    • @mugodavid6997
      @mugodavid6997 2 years ago +2

      Is it possible to come up with a model that will take an image of an item and compare it with an already stored image of the same item in the database?

    • @yishayhazan1040
      @yishayhazan1040 1 year ago +2

      easier said than done.

    • @brianmaugo8768
      @brianmaugo8768 1 year ago

      Right?

  • @amrith007
    @amrith007 6 years ago +149

    Honestly, Google didn't just get the data. Google gave us Google Photos with unlimited storage, and millions of users uploaded millions of photos. You gave us free photo backup so that we would give you billions of photos to analyse.

    • @offchan
      @offchan 5 years ago +11

      Yes, they get photos, but the photos don't have labels. Thus, it's very hard to make smarter algorithms. In traditional settings, you require labels like "person", "cat", etc in order to train the system.

    • @offchan
      @offchan 4 years ago +9

      @Rizwan Bhatti Slaves is one of the ways to label the data. Sites like MTurk are commonly used to find labelers. Many researchers use this website to label their datasets, including the famous ImageNet. But you don't have to label all the data. After you have trained the system to predict accurately, you can use the system's predictions to help the labeler label faster. E.g. given a cat image, the system will predict cat and the labeler only needs to check whether the prediction is right or wrong.
      Also they can use the unlabeled data to do unsupervised learning. I know the guys at Google are probably smart so they will know how to utilize the remaining tremendous amount of unlabeled data.
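
      The model-assisted labeling loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how Google or ImageNet actually did it: `assisted_labeling`, `toy_predict`, `toy_review` and the file names are all hypothetical stand-ins for a trained model and a human reviewer.

```python
def assisted_labeling(items, predict, review):
    """Label items by letting a human confirm or correct the model's guess.

    predict(item) returns the model's suggested label.
    review(item, suggestion) returns the final label: the suggestion if the
    human agrees with it, otherwise the human's correction.
    """
    labels = {}
    corrections = 0
    for item in items:
        suggestion = predict(item)
        final = review(item, suggestion)
        if final != suggestion:
            corrections += 1  # the human had to fix the model's guess
        labels[item] = final
    return labels, corrections


# Hypothetical ground truth known only to the human reviewer.
TRUTH = {"img1.jpg": "cat", "img2.jpg": "cat", "img3.jpg": "dog"}


def toy_predict(item):
    # Stand-in for a trained model: it always guesses "cat".
    return "cat"


def toy_review(item, suggestion):
    # The labeler only checks the suggestion and fixes it when it is wrong.
    return suggestion if TRUTH[item] == suggestion else TRUTH[item]


labels, corrections = assisted_labeling(TRUTH, toy_predict, toy_review)
print(labels, corrections)
```

      The point of the design is that the human only types a label when the model is wrong, so labeling effort shrinks as the model improves.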

    • @MrBemnet1
      @MrBemnet1 3 years ago +2

      @@offchan in deep learning you don't need labels

    • @offchan
      @offchan 3 years ago +14

      @@MrBemnet1 That's wrong. I'm a machine learning engineer. Deep learning still requires labels unless you are doing unsupervised learning (which is not that good yet).

    • @uditysingh1316
      @uditysingh1316 3 years ago

      Yes you are correct

  • @TheAIEpiphany
    @TheAIEpiphany 4 years ago +23

    There are some inaccuracies in the video, but it does paint the general picture of computer vision.
    Some errors:
    2:55 The early layers pick up low-level patterns (like edges); high-level patterns are higher level in the semantic sense, so say edges -> circles -> eyes -> head going up the network's layer hierarchy.
    4:40 A temporally sensitive model doesn't have to be an RNN; it can be, say, C3D (a 3D convolutional model).
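
    A minimal sketch of the low-level end of that hierarchy: sliding a small kernel over an image to produce a feature map that lights up at edges. It assumes a hand-picked Sobel kernel in place of the weights a CNN would learn during training; `convolve2d` and the toy image are illustrative names, not from the video.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel with the image patch under it and sum the result.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Sobel kernel that responds to vertical edges (left-to-right brightness changes).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x6 toy image: dark left half (0), bright right half (9).
IMAGE = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

feature_map = convolve2d(IMAGE, SOBEL_X)
for row in feature_map:
    print(row)
```

    The response is zero over the flat regions and large where the dark and bright halves meet. Deeper layers apply the same operation to earlier feature maps, which is how edges can combine into shapes and then into parts like eyes.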

  • @benjaminy.
    @benjaminy. 5 months ago

    Thank you for your kind explanation. This is one of the best product introduction videos I've watched in quite a while. You combined the theory of machine learning (CNNs, RNNs) with your digital products. Keep up the good work. As a novice in machine learning, I hope to learn more from you.

  • @micalopes1
    @micalopes1 4 years ago +11

    I wanna cry. Amazing explanation 😍👏🏽

  • @lincolnwang6774
    @lincolnwang6774 5 years ago +18

    Computer vision is not just machine learning, ok? This video gives me the feeling that machine learning is the only way computer vision works.

  • @karanacharya18
    @karanacharya18 5 years ago +3

    I loved the sound design here.

  • @saurabhs4743
    @saurabhs4743 5 years ago +13

    It was awesome.. thanks Google.. please let her do more machine learning tutorials.. she's amazing at that

  • @sihya9602
    @sihya9602 4 months ago +2

    WONDERFUL EXPLANATION

  • @ozzyfromspace
    @ozzyfromspace 5 years ago +26

    You know how a little kid might see a cat 🐈 one time and in the future it'll be able to tell that other cats are cats. And one day it sees a dog and can immediately tell that something's different. That's what I'm interested in: how do we cut down the data requirement from billions of examples to just one or two? Imagine a self driving car that could appreciate context in real-time despite not having a really fine tuned Neural net that's based on a crazy amount of brittle test data. That's what I'm working on as the founder of an extremely early, unnamed startup. If we don't need 5 billion examples of cars being driven to do a good job, neither should machines. We need to build machines like babies that see cats running and just "get it". That's the dream.

    • @mattcollins5519
      @mattcollins5519 4 years ago +1

      Got a name yet?

    • @artinbogdanov7229
      @artinbogdanov7229 4 years ago +2

      I was thinking the same. Then I caught myself thinking that when we see a cat in motion, it's equivalent to many images going one after another.
      If it were just 1 or 2 images, I don't think our brain would be able to recognize it as easily. Especially when we are talking about different breeds and sizes.
      I'd like to learn about your start-up though :)

    • @Anarchy421
      @Anarchy421 4 years ago +9

      Part of a child's ability to immediately grasp "cat" from their first encounter has to do with their ability to explore the various features of the cat from different angles, as well as seeing the cat move into different poses. I wonder if it would be possible to have a neural network interact with a photorealistic 3D model of a cat in a virtual setting with the ability to move around and interact, and whether that would reduce the data requirement.

    • @willd1mindmind639
      @willd1mindmind639 3 years ago

      The human eye and brain move at the speed of light, in that during the course of a second the eye and brain process many "frames" of data even if nothing changes. Vision in living organisms is all about being able to understand the dimensions, perspectives, shapes, colors and textures of objects in three-dimensional space. And during the course of a day, your eyes are constantly taking in those frames, which add up to provide the millions of reference images used to build a model of how the real world looks, which has nothing to do with labels. Computers have no eyes and therefore cannot directly perceive the real world, and they have no intrinsic concept of three-dimensional space, texture, shape, color and dimensions. So what you are actually doing is writing an algorithm that generates patterns of data using statistical methods during training to associate groups of bit patterns with a set of labels. That is not seeing in the way living creatures see. It is just another way of writing code to perform a task, where most of that code is written to do the training, and once the model is built it can be used to dynamically perform certain functions, within some other code or through an API.

    • @PrathamInCloud
      @PrathamInCloud 1 year ago +1

      @@Anarchy421 Good luck trying to match a model that's trained on millions of years of evolution and uses a 576 MP camera for inputs. Not to mention all that data gets stored on the most dense form of storage known to mankind (and possibly the densest form of storage possible too)

  • @joshsmit779
    @joshsmit779 6 years ago +75

    I loved the explanation, but I still would rather build and train my own model.

    • @jamesgillis8122
      @jamesgillis8122 5 years ago

      why

    • @chawza8402
      @chawza8402 4 years ago +5

      @@jamesgillis8122 he might have the resources or he might have a better way to train his model.

    • @adamlee9347
      @adamlee9347 4 years ago +3

      Yeah using APIs is no fun

    • @richardlighthouse5328
      @richardlighthouse5328 3 years ago

      @@jamesgillis8122 Unlimited use, with no caps on things like how many API requests you can make per month.

    • @sid98geek
      @sid98geek 2 years ago

      @@adamlee9347 @Chawza or maybe he doesn't want to risk Google exploiting his data by using their API.

  • @musicandreptiles101
    @musicandreptiles101 2 years ago

    I'm getting GCP certified and was starting to get tired for the day, then came across this video and am interested again.

  • @ozzyfromspace
    @ozzyfromspace 5 years ago +2

    Wow, the model you called a Recurrent Neural Network sounds like a flavor of a project I'm working on. Thanks, I didn't know there was a name for it ☺️

    • @filmonasmerom5235
      @filmonasmerom5235 3 years ago +1

      Heyy, could you give me a rough idea of what you were/are working on? 😊

  • @AshishAwasthiX
    @AshishAwasthiX 6 years ago +4

    Thanks Sara for simple explanation of computer vision and API details. A small correction though, evolution of vision as per en.wikipedia.org/wiki/Cambrian#Dating_the_Cambrian was less than "billions of years ago".

  • @artinbogdanov7229
    @artinbogdanov7229 4 years ago +2

    Very nicely explained. Thanks a lot!

  • @wesleyshiong
    @wesleyshiong 5 years ago +2

    Thanks, Sara. Your video is very helpful to me.

  • @mayalaluna4005
    @mayalaluna4005 2 years ago +1

    hi, Google, I really have something i want to ask the AI community. i am not really a developer, but a friend... and i feel there should be more to AI vision than regular cameras... i started think about the issue because my need to have a good quality scan of my old newspapers collections... the quality of smart phone scan, even with google app, is just not good enough comparing to flatbed scan.... and i thought if we can turn the entire screen of smart phone as the scanning glass as in flatbed, it would be so much better quality... most of all, it would ensure the paper stay in perfect flat shape and every detail would be evenly scanned with extreme high density details... the technology of adding some light sensor to the smart phone screen wouldn't be a huge breakthrough....
    and here really is the connection i am thinking about of AI vision, if smart phone have such flatbed scanning function, it would be like a sense of touching something with a hand, so in a way, AI not only could have the eye vision through camera, it could gain the visions of physical contact by scanning.... inch by inch, square by square.... and by doing so, it would improve the AI's understanding of the world in a much more expanded dimension.... and in the future if the smart phone screen could be soft flexible, it would really be like the skin of AI, it will feel and observe the world in a way human can not even imagine...
    i know my writing isn't so good, but i believe this thought of mine is really worthy of google's consideration....

  • @TheAnugupta
    @TheAnugupta 2 years ago

    Fantastic explanation.

  • @ramkumarr1725
    @ramkumarr1725 1 year ago

    Computers have eyes. Great. ❤

  • @DenisTRUFFAUT
    @DenisTRUFFAUT 6 years ago +3

    Highly professional video.
    It just lacks an explanation of human labelling assisted by Google (is it a billed or a free feature?)

  • @zaheercarrim1035
    @zaheercarrim1035 1 year ago

    She is brilliant.

  • @digvijaysinh26
    @digvijaysinh26 3 years ago

    excellent explanation

  • @pandarzzz
    @pandarzzz 6 years ago +3

    Thank you for sharing this cool video! 👨🖐

  • @jindagi_ka_safar
    @jindagi_ka_safar 3 years ago

    Deep learning/CNNs help the computer understand 'image content' in the same way our human brain does. Great!

  • @smraghu81
    @smraghu81 5 years ago +1

    Does anyone know what software they used to make this video? Kindly suggest.

  • @heavenlypot
    @heavenlypot 2 years ago

    She looks so excited

  • @offload3286
    @offload3286 5 years ago +2

    Thank You Sara Robinson!

  • @ugursoydan8187
    @ugursoydan8187 2 years ago

    thank you well explained

  • @3swayam
    @3swayam 5 years ago +4

    Computer vision enthusiasts pls comment below to collaborate and work on a project

  • @7906jun
    @7906jun 4 years ago +2

    Amazing !!

  • @luis96xd
    @luis96xd 6 years ago

    Excellent video!

  • @gorannovaks
    @gorannovaks 6 years ago +43

    This lady makes me feel a bit uncomfortable with her creepy facial mimic, but I appreciate the content.

  • @딱구리-d1c
    @딱구리-d1c 5 years ago +1

    Even though I start my master's course in CV soon, I couldn't explain CV in my own words. This video helped me organize what CV is and how it works!

    • @3swayam
      @3swayam 5 years ago

      D B, I see you're working on CV. How about collaborating?

  • @devrajashok8579
    @devrajashok8579 3 years ago

    What's the name of the music in the background??

  • @imaginethat704
    @imaginethat704 3 years ago

    Great Info

  • @_rd_kocaman
    @_rd_kocaman 3 years ago

    Thank you Gina

  • @alixaprodev
    @alixaprodev 5 years ago

    just get it now thanks to you

  • @mansinghyadav4365
    @mansinghyadav4365 2 years ago

    Very nice

  • @colorbars8564
    @colorbars8564 6 years ago +1

    What is the best programming language to learn for computer vision? I'm a college student who is fairly proficient in C++ but am not sure if I should focus more on learning Java or Python to increase my chances of landing a job working in this field after college. I'm an applied mathematics major and have taken several classes in linear algebra, real analysis, complex analysis, probability, numerical analysis/methods and differential equations.

    • @styxnexus
      @styxnexus 6 years ago +4

      IMHO, you should focus more on Python !

    • @toshb1384
      @toshb1384 6 years ago

      Python or Processing+Java

    • @renatocastro4285
      @renatocastro4285 5 years ago +1

      Focus on Python; however, C++ is very good for computer vision too. Almost half of the computer vision projects in production are coded in C++.

  • @RoxanaNoe
    @RoxanaNoe 6 years ago +1

    Great great video

  • @prabhasadapa9945
    @prabhasadapa9945 2 years ago

    Could you please share those slides??
    It will be helpful to us

  • @science.20246
    @science.20246 1 year ago

    I do research on this subject, and I couldn't explain it better.

  • @joeljacob1066
    @joeljacob1066 5 years ago

    I loved the video

  • @patriziacasini6661
    @patriziacasini6661 6 years ago +1

    IT WAS REALLY INTERESTING

  • @fizamukhtar3442
    @fizamukhtar3442 5 years ago

    Can anyone suggest the latest research topics in computer vision?

  • @IgorSwxy
    @IgorSwxy 3 years ago

    Google is good at making in-ad videos :)

  • @tcsls-thedesignlab4132
    @tcsls-thedesignlab4132 6 years ago

    Really digging this video. We would like to use this within our organization for a business transformation course, with your permission of course!

  • @AmanRaj-qk6tr
    @AmanRaj-qk6tr 6 years ago

    Awesome

  • @fizamukhtar3442
    @fizamukhtar3442 5 years ago

    Nice

  • @ai.simplified..
    @ai.simplified.. 3 years ago

    3:45 really how?

  • @soulysouly7253
    @soulysouly7253 2 years ago

    it's cool to always throw around the words "machine learning", but it would be even better if we actually educated CS grads with signals and systems theory. No matter how much you know about ML you'll never understand why it works if you don't understand signal (including 2d signal like images) processing.

  • @MasterofPlay7
    @MasterofPlay7 4 years ago +1

    google when are you going to take apple's head off? xD>>???

  • @sichard.rimmons
    @sichard.rimmons 6 years ago +7

    sheepdogs and mops LOL!

  • @sanstechie_official4669
    @sanstechie_official4669 5 years ago

    You're just training your model on millions of users' photos; if not, how could you train it to that extent? You guys have large computing power and millions of users' data, and that's why it's possible to train your model quickly...

  • @LUSHBEE728
    @LUSHBEE728 2 years ago

    WOW

  • @aboodaaboood8007
    @aboodaaboood8007 6 years ago +1

    Please download the translation in Arabic

  • @AtticusDenzil
    @AtticusDenzil 2 years ago

    why not start from the big bang?

  • @johnlin3145
    @johnlin3145 6 years ago

    i don't see the creative power in evolution

  • @silverhawkscape2677
    @silverhawkscape2677 3 years ago +1

    Now Computer vision is used to make PC cheats like aimbot available on Console. 🙃

  • @AnkitSingh-ru8he
    @AnkitSingh-ru8he 2 months ago

    What if we built software instead of shifting to cloud😂😂😂😂

  • @LeonardoRivillini
    @LeonardoRivillini 5 years ago +5

    hahah very happy person, no ? :´p

  • @bilalrajab1879
    @bilalrajab1879 3 years ago

    1:20 lmao

  • @markcuello5
    @markcuello5 1 year ago

    HELP

  • @catherinele8417
    @catherinele8417 3 years ago

    This is deep learning not computer vision ..

  • @jony7779
    @jony7779 6 years ago +1

    viridis

  • @SajiSNairNair-tu9dk
    @SajiSNairNair-tu9dk 7 months ago

    👉🧠r a m 🕵️😂😊

  • @dallaskelley1752
    @dallaskelley1752 5 years ago +3

    This is amazingly inefficient and a surprisingly dumb way to go about image detection.

    • @davidamelemah5770
      @davidamelemah5770 5 years ago +6

      Dallas Kelley do you have a better method

    • @ozzyfromspace
      @ozzyfromspace 5 years ago

      I do

    • @vophie
      @vophie 4 years ago

      it's the best way we have for AI to do it rn

  • @DIYRobotGirl
    @DIYRobotGirl 3 years ago

    I just want to know if we can get a waifu chatbot to have computer vision?

  • @motherfucc
    @motherfucc 1 year ago

    amazon aws

  • @imyasharya
    @imyasharya 4 years ago +24

    Billions of years are achieved in just a few decades. It's so exciting.

    • @oliver99999-e
      @oliver99999-e 2 years ago

      The power of exponential growth

  • @technologyandinnovation4586
    @technologyandinnovation4586 4 years ago

    You can't just create and use buzzwords to impress people. Bottom line: no matter how hard Google tries, it will remain a search engine company and its variants. Every one of their efforts is to optimize search; there is nothing innovative, just brute force.

  • @nathanstake
    @nathanstake 2 years ago

    I am currently recruiting for Computer Vision Engineers for a Robotics Manufacturer in Ohio. This is fascinating.

  • @beckygomez748
    @beckygomez748 4 years ago +1

    Thanks🙏

  • @quinn6152730
    @quinn6152730 2 years ago

    Thank you google🥺

  • @copaboy
    @copaboy 3 years ago

    Wow. That was quick and precise analysis of machine and deep learning.

  • @madvoice3703
    @madvoice3703 2 years ago

    SUPER 😇😇😇😇