Josh Holloway
United States
Joined Aug 15, 2015
Videos
Web Components | Custom Elements
1.8K views · 1 year ago
This video shows you how to create your own custom native HTML elements! 🤩 Custom elements give us the ability to create modular web components similar to React components. However, our web components based on custom elements are 100% native to the browser and require no libraries or dependencies! 🤯 Furthermore, web components work in all browsers (Chrome, Firefox, Edge, & Opera). They even wor...
eCommerce Web App | React (Next.js) + Node + PostgreSQL
3.9K views · 2 years ago
This is a web-app I'm working on. I'm using React (via Next.js) on the frontend. On the backend I am using Node. I'm learning TypeScript while working on this project by using it both on the frontend and backend. Animations are done with GSAP. I'm using a PostgreSQL database with cryptographically hashed passwords via bcrypt and secure JWT authentication.
Detection of objects followed by projecting into the coordinate system of the breadboard
757 views · 5 years ago
The high-level vision module is a convolutional neural network that performs object detection. The network used was Faster-RCNN [7]. The relevant outputs are the coordinates of each object, the object's class, and the corresponding confidence score. At this point the coordinates of each component are in the image plane, in units of pixels. I next map these image coordinat...
Coordinate system projection onto a surface via feature matching & extrinsic parameter estimation
764 views · 5 years ago
The mid-level vision module first detects the fiducial marker on the breadboard. The current fiducial marker is a checkerboard. I first extract features from the marker and then match those features to the marker on the breadboard in the image. From these point correspondences, the extrinsic parameters can be estimated, which allows us to map from the coordinate system of the camera to ...
Intro to CUDA (part 6): Synchronization
21K views · 5 years ago
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Intro to CUDA (part 5): Memory Model
26K views · 5 years ago
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Intro to CUDA (part 4): Indexing Threads within Grids and Blocks
34K views · 5 years ago
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Intro to CUDA (part 3): Parallelizing a For-Loop
35K views · 5 years ago
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Intro to CUDA (part 2): Programming Model
40K views · 5 years ago
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Intro to CUDA (part 1): High Level Concepts
87K views · 5 years ago
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Inverted Pendulum (part 2): Finally Working!
519 views · 5 years ago
Electrical Engineering Senior Design Project at OSU-Tulsa
Inverted Pendulum (part 1): Almost Working
200 views · 5 years ago
Electrical Engineering Senior Design Project at OSU-Tulsa
Inverted Pendulum (part 0): Not Quite Working
304 views · 5 years ago
Electrical Engineering Senior Design Project at OSU-Tulsa
Great video, thx! But at min 5:50 the first row is missing one 4 and the last row should be 0 11. Is that correct?
Thank you! The comment stated at time 10:42 is probably one of the most important parts! Essentially, one thread (within a given block) will create/allocate the shared array, and all other threads within the same block will ignore that line of code and assume that it has been handled by the first thread.
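For readers following along, here is a minimal, hypothetical sketch of that pattern (not taken from the video; the kernel name and block size are illustrative): a __shared__ array exists once per block, every thread in the block sees the same storage, and __syncthreads() is used before any thread reads what the others wrote.

    __global__ void blockSumSketch(const int *in, int *out)
    {
        // One copy per block, visible to all threads in that block.
        // Assumes the kernel is launched with blockDim.x == 256.
        __shared__ int tile[256];

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = in[i];        // each thread writes its own slot
        __syncthreads();                  // wait until the whole block has written

        if (threadIdx.x == 0) {           // a single thread then consumes the shared data
            int sum = 0;
            for (int j = 0; j < blockDim.x; ++j)
                sum += tile[j];
            out[blockIdx.x] = sum;
        }
    }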
Good job :)
excellent explanation, easy to catch up all the concepts and process.. cool
This is an incredibly helpful series. Thank you!
5 years later & this is the best training I've found so far. Thank you Josh!
Smooth one right here
Still great cuda content 5 years later. Thanks for the lectures!
i have cudaitis
Typo at 3:41? The slide says that the device is the device plus the host, that should just be for heterogeneous right?
Extremely helpful! Thank you for the good lecture :)
hey! terrific videos on CUDA... would be great if you could continue uploading follow up videos:)
I am running this kernel to test:

    __global__ void IncrementCuda(int *d_out, int N)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        int tmp = -1;
        if (i < N) {
            tmp = d_out[(i + 1) % N];
            // __syncthreads();
            d_out[i % N] = tmp;
            // __syncthreads();
        }
    }

Notice I have commented out the __syncthreads(). Calling with N up to 20,000,000 generates perfectly sequenced blocks. I would expect to see some 'jags' and gaps in the output in the absence of synchronization, but I see none. I have also experimented with calling or not calling cudaDeviceSynchronize(). Again, no change. What is happening? How do I force a race condition in CUDA and observe its effects?
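For context, a minimal host-side driver of the kind this comment describes might look like the sketch below. It assumes the IncrementCuda kernel above is defined in the same file; the array size, grid/block shape, and the 0..N-1 initialization are illustrative assumptions, not taken from the comment.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    int main()
    {
        const int N = 1 << 20;                          // illustrative size
        int *h = (int *)malloc(N * sizeof(int));
        for (int i = 0; i < N; ++i) h[i] = i;           // h[i] = i, so a clean pass yields h[i] == (i + 1) % N

        int *d_out;
        cudaMalloc((void **)&d_out, N * sizeof(int));
        cudaMemcpy(d_out, h, N * sizeof(int), cudaMemcpyHostToDevice);

        IncrementCuda<<<(N + 255) / 256, 256>>>(d_out, N);
        cudaDeviceSynchronize();                        // make sure the kernel finished before copying back

        cudaMemcpy(h, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);

        int mismatches = 0;
        for (int i = 0; i < N; ++i)
            if (h[i] != (i + 1) % N) ++mismatches;      // count elements that did not shift cleanly
        printf("mismatches: %d\n", mismatches);

        cudaFree(d_out);
        free(h);
        return 0;
    }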
One of the best conceptual overviews of any ML-related topic, not just CUDA. 4 years later - still worth watching. Props
By far the Best Explanation... Thank you sir
Please teach such things more often and make a playlist on youtube Sir ... things like openmp, embedded C, gnu toolchain, etc. are very rarely taught well in TH-cam.
Great content indeed!!! I'm willing to pay for such content 🔥
You definitely deserve at least 1M subs and viewers! great stuff, keep doing good work !!! Subscribed!
I got to say, not coming from C, that void** confuses the hell out of me.
very good introduction! Thanks.
implementation of the kernel will be left as an exercise lmao. now out to buy an Nvidia gpu!
please show how to make multiple different custom elements on the same page
this was so helpful, feel like I need to do more than just subscribe 😀
Thank you from the bottom of my heart for those lectures!
CUDA 101
Great series. Thanks. Subscribed. Cheers
@Josh where is the rest!
6:52 // says to cpy Host to device, fcn says to cpy Host to device. btw great playlist!
Very simple and elegant thanks!
extremely good explanation of a complex topic in an easy way.. You are awesome
why are you using a double pointer at 6:27? why not just d_c? why &d_c?
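For anyone else wondering about the same line, here is a minimal sketch of the pattern in question (variable names are illustrative, not necessarily those used in the video): cudaMalloc has to write a device address into your pointer variable, so it needs the address of that variable, which is why &d_c is passed and cast to void**.

    #include <cuda_runtime.h>

    int main()
    {
        int *d_c = NULL;                               // will hold a device address after the call
        // cudaMalloc writes the allocated device address INTO d_c, so it needs
        // the address of d_c itself (a pointer to the pointer), i.e. (void **)&d_c.
        cudaMalloc((void **)&d_c, 100 * sizeof(int));
        // From here on, d_c points to 100 ints in device global memory.
        cudaFree(d_c);
        return 0;
    }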
th-cam.com/video/4APkMJdiudU/w-d-xo.html Here you mention the threads don't execute at the same time though they perform the same operation. Is that because sometimes there are data dependencies between the threads? You say it's because they are working on different data (the value of the data shouldn't matter), right?
I also really enjoyed this series of videos and would be interested in more [advanced ones] on this topic =).
I really like your videos. Are you going to record the following videos on more advanced use of GPU/Cuda in practice? That would be really great. I mean mostly optimization. Best, Michał
you taught me way more than I learned in 2 weeks of GPU programming lectures 😭. THANKS A BUNCH🙏🏽
FR MAN ,I WAS CRYING BC I DON'T UNDERSTAND ANYTHING
I enjoyed it.
🙌 "Promo sm"
Thanks! I was looking for a quick intro to custom elements and this was great :D
The best CUDA tutorial !!!
Super valuable series thanks for making it! 4 years too late but still never late to say thank you :)
Great stuff! Everything I needed to form a decent mental model of a GPU
Hello Sir! I have a doubt ... wouldn't the transfer of data from cpu to gpu (global memory) be the bottle-neck in the entire architecture? You teach great! Thank You !!!
yo i cannot begin to describe the gratefulness from this playlist. amazing, short-form, and beneficial
Excellent series. Very clear and accurate lectures. These are the best CUDA lectures I have ever seen. Academia, please keep up the good work. You are well qualified to be educators.
I don't usually leave comments, but in this case I can't help it. I followed the entire CUDA video series and I can only thank you for the simplicity and clarity with which you explain the various topics!
can't understand your English accent.. 😢
where can I get the slides? Thanks
2:43 what does it mean by 'because it is executed on many data elements, the memory access latency can be hidden with calculations instead of big cache memory'?
I believe it means the overhead of transferring data (which is way slower than performing a calculation), is compensated by performing many calculations with the same piece of data (e.g. for a matrix multiplication), as opposed to caching data to have good performance on a variety of instructions like a CPU does. Here's a good vid explaining the tradeoffs: th-cam.com/video/3l10o0DYJXg/w-d-xo.html
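As a rough, made-up illustration of that idea (a hypothetical kernel, not from the video): when each loaded element feeds many arithmetic operations, the GPU can keep other warps computing while one warp waits on its memory load, so the load latency is effectively hidden by calculation rather than by a large cache.

    __global__ void highArithmeticIntensity(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float x = in[i];                  // one slow global-memory load per thread ...
            for (int k = 0; k < 256; ++k)     // ... amortized over many arithmetic operations;
                x = x * 1.0001f + 0.5f;       // while one warp stalls on its load, others keep computing
            out[i] = x;
        }
    }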
Wonderful tutorial! So, how do you optimize a CUDA program for full utilization?
Thank you sir for the great lectures, how can we access the rest of the course, is there more than those 6 parts?
look up the books he's showing at the end of the video