
Sketching Out The Cross Entropy Method In Code

Much like we did for the tabular CFR algorithm, here we start out with the cross-entropy method by sketching it out in Spiral pseudo-code.
---
#spiral #functionalprogramming #machinelearning #reinforcementlearning #programming #cpp #programminglanguage #compiler #parallelprogramming #cuda #gpu
Playlist(Staged FP in Spiral): th-cam.com/play/PL04PGV4cTuIVP50-B_1scXUUMn8qEBbSs.html
Spiral: github.com/mrakgr/The-Spiral-Language
Spiral's ML Library: github.com/mrakgr/Spiral-s-ML-Library
Github: github.com/mrakgr/
If you have interesting work opportunities and require an expert functional programmer, don't hesitate to get in touch. My email is on my Github profile. Put "Work" as the subject in order to avoid the spam filters.
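As a rough illustration of what the cross-entropy method does (a generic Python sketch, not the Spiral pseudo-code from the video): sample candidates from a Gaussian, keep the best-scoring fraction, and refit the Gaussian to those elites. The function name `cross_entropy_method`, the toy objective and all hyperparameters below are assumptions made for the example.

```python
import random
import statistics

def cross_entropy_method(score, mean=0.0, std=5.0, n_samples=50,
                         n_elite=10, n_iters=30, seed=0):
    """Maximize `score` over one real parameter with the cross-entropy method."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for _ in range(n_iters):
        # 1. Sample candidate solutions from the current Gaussian.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # 2. Keep the highest-scoring candidates (the "elites").
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # 3. Refit the Gaussian to the elites.
        mean = statistics.fmean(elites)
        std = max(statistics.pstdev(elites), 1e-6)  # floor avoids a zero-width distribution
    return mean

# Toy objective with its maximum at x = 3.
best = cross_entropy_method(lambda x: -(x - 3.0) ** 2)
```

In a reinforcement learning setting the single parameter would be replaced by the policy's parameters and `score` by the return of an episode, but the sample, select, refit loop stays the same.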

Views: 7

Videos

Implementing A Transpose Kernel On The GPU In The Spiral Language

1:17:00

Views: 50 · 9 hours ago

The transpose primitive is a key piece of the cross-entropy algorithm, and we finally get around to implementing this Cuda programming classic in a staged functional programming style.
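For readers unfamiliar with the classic, the usual Cuda transpose kernel copies the matrix one tile at a time through shared memory so that both the global reads and the global writes stay contiguous. The Python sketch below only mimics that access pattern on the CPU. It is not the Spiral kernel from the video, and `tiled_transpose` and `TILE` are names made up for the illustration.

```python
TILE = 4  # hypothetical tile edge; Cuda kernels commonly use 32

def tiled_transpose(matrix, tile=TILE):
    """Transpose a list-of-rows matrix one tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):          # one "thread block" per tile
        for c0 in range(0, cols, tile):
            # Stage 1: read the tile row by row into a scratch buffer
            # (stands in for shared memory).
            scratch = [
                [matrix[r][c] for c in range(c0, min(c0 + tile, cols))]
                for r in range(r0, min(r0 + tile, rows))
            ]
            # Stage 2: write the scratch buffer out column by column,
            # so each output row is filled contiguously.
            for j in range(len(scratch[0])):
                for i in range(len(scratch)):
                    out[c0 + j][r0 + i] = scratch[i][j]
    return out

example = [[r * 5 + c for c in range(5)] for r in range(6)]  # 6x5 input
transposed = tiled_transpose(example)
```

The 6x5 input is deliberately not a multiple of the tile size, so the partial tiles at the edges are exercised as well.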

Counterfactual Regret Training For Heads Up No Limit Hold'em (Using A Random NN Model)

2:56:31

Views: 27 · 19 hours ago

In this video we did an admirable job of refactoring, optimizing and analyzing the performance of the various parts of the ML library, but unfortunately the tabular CFR agents based on random NN models can't learn to read the board and perform poorly. So, in the next video we'll act according to plan and begin work on the cross-entropy method. In order to implement it however, we'll need the tr...
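For context, the core update in tabular CFR is regret matching: each information set plays its actions in proportion to their positive cumulative regret, falling back to uniform play when no regret is positive. A minimal Python sketch of that rule follows; the helper name `regret_matching` is hypothetical and not taken from Spiral's ML library.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (one probability per action)."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0.0:
        # Play each action in proportion to its positive regret.
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n

strategy = regret_matching([1.0, -2.0, 3.0])
```

The rule only sees the regrets it is given, which is consistent with the description's point: if the features feeding the table are uninformative, the resulting strategy has nothing useful to condition on.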

Adding Stack Mutable Types To The Spiral Language

3:27:48

Views: 64 · 1 day ago

Things are coming to a head, and as we proceed in our journey, we need to optimize what can be. Concerned about the register and local memory usage, we play around with various optimizations and in the end, the stack mutable layout types that we build into the language in this video result in significant improvements. A great advantage compared to what we were doing before is that it's very eas...

1:43:02

Revisiting Ampere Matrix Multiplication

Views: 25 · 14 days ago

We go back to the matrix multiply kernel that we did back in early 2024 and comb over it. We redesign it so as to alleviate the code and register blowup that was in the original. We also finally put the async loads to good use. All in all, the matrix multiplication kernel that we created, while passable, is still a disappointment and we're looking forward to the next gen NVidia cards coming out s...

1:59:09

Revisiting Ampere Async Loads

Views: 8 · 14 days ago

In this video, we play with the nanosleep function and show that the Cuda compiler is very good at interleaving global loads with computation even using synchronous instructions. It's very random whether an async load improves performance.

Optimizing The Training Loop For The GPU Implementation Of The CFR Algorithm

1:31:10

Views: 44 · 21 days ago

We make a separate training loop just for optimization purposes and get to work on bringing the register pressure down. We won't really be successful until we add stack mutable types to the language and optimize the matrix multiply kernel a few videos later.

Training A Superhuman Leduc Agent On The GPU Using The CFR Algorithm

1:31:51

Views: 23 · 28 days ago

We finally train that agent we've been striving for. Arguably, when we played against it, it wasn't quite superhuman, but it was still decently competent, giving us confidence that the CFR implementation that we have is correct. Having done this, we'll be spending the next few videos working on optimizing the training process.

Replacing The Use Of Static Shared Memory With Dynamic In Spiral's ML Library

1:01:03

Views: 12 · 1 month ago

Spiral's powerful tools make doing what the title says a lot easier than it would have been in C++. With this out of the way, in the next video we finally train our first Leduc agent and play against it.

Parallel Tabular CFR Training On The GPU

6:06:40

Views: 33 · 1 month ago

We manage to go most of the way there in this video, but to our surprise the Cuda compiler is really bad at optimizing out the static memory allocations. It is only in the next video that we manage to get it to train properly against a random player in Leduc.

Improving The ML Library Testing. We're Going Back To Timelapsing.

41:08

Views: 19 · 1 month ago

One aspect of programming that Ghostlike frequently skimps on is testing, and in this video we aim to improve that just a little. We couldn't get IO redirection inside Cuda kernels to work, but otherwise, we make some improvements to the way testing is done in the ML library. That sets the stage for actually getting the CFR training to work.

Integrating Tabular CFR With The Poker Games On The GPU

6:46:32

Views: 25 · 1 month ago

No trained agents yet, we are still running randos, but this time we are using the policies in the CFR model. The work on actually training them will start the next video. By the way, we are moving back to the old timelapsing style of doing videos. 4x recording, background music and a textbox where Ghostlike types his thoughts. These videos take too long to audio process, render and upload. Gho...

Debugging The Parallel Tabular CFR Implementation (Part 2)

2:32:01

Views: 16 · 1 month ago

After implementing the transposing loops, we do the last bit of debugging before moving on to integrating tabular CFR with the games.

Sharing The Pointers In Shared Memory Automatically Using The Transposing Loops

3:44:33

Views: 43 · 1 month ago

Here we do special kinds of loops that can share data via shared memory automatically. They grab the free variables in a function and pass those via shared memory before splicing them back. It's very useful functionality for the `row_gather` primitives and allows us to improve their implementation significantly. Those of you coming new here, don't be confused. We aren't showing how to transpose ...

Debugging The Parallel Tabular CFR Implementation (Part 1)

2:01:22

Views: 19 · 1 month ago

After sketching out the parallel CFR implementation in code in the previous video, we do some basic debugging to make sure it is sound. When it comes to these algorithms, we'll never be 100% sure whether they are right. All the randomness will make debugging them a lot more difficult, but we can only do our best. In this video we're just doing some basic sanity checking which catches a surp...

Sketching Out A Parallel Tabular CFR Algorithm For The GPU

8:54:41

Views: 384 · 2 months ago

6:51:57

Creating The `row_gather` Primitives

Views: 21 · 2 months ago

How To Do Mouse Highlighting On Windows 11

5:45

Views: 50 · 2 months ago

Running The Game Model With All The Threads. Implementing The Neural Model Loop.

7:19:23

Views: 57 · 2 months ago

Running The (Random) Neural Agent On NL Holdem

4:31:35

Views: 20 · 2 months ago

Running The (Random) Neural Agent On Leduc

3:56:30

Views: 27 · 2 months ago

6:01:58

Refactoring The ML Library (Part 3)

Views: 17 · 2 months ago

6:52:07

Refactoring The ML Library (Part 2)

Views: 14 · 2 months ago

4:19:34

Adding Higher Ranked Types To Spiral

Views: 57 · 3 months ago

4:09:08

Implementing The Masked Softmax

Views: 19 · 3 months ago

11:21:56

Refactoring The ML Library (Part 1)

Views: 20 · 3 months ago

6:10:20

Adding ECharts To The Game Frontend

Views: 142 · 3 months ago

Fixing A GADT Error In The Spiral Language

17:17

Views: 45 · 3 months ago

Redesigning Spiral's Machine Learning Library With GADTs

7:41:55

Views: 46 · 3 months ago

Review Of GADTs In Spiral And Their Implementation (livestream)

1:32:24

Views: 40 · 4 months ago

Comments

@reinismu 7 days ago
Don't give up :) Have you reduced game tree with hand isomorphism and grouping similar states together? Also could try switching to short deck poker variant to see progress quicker
@markogrdinic5126 7 days ago
While those schemes could give some improvement, if I was maybe training it with backprop, the biggest problem is that with a random model, the CFR model on top simply doesn't have informative enough features to do anything useful. It cannot learn to read the board, and just turns the information into mush. This isn't something that is surprising to me, back in 2021 I tried training a NN to learn the hand ranking function in a supervised manner and it couldn't do it perfectly. In the follow up videos, I try to get around the lack of feature learning using the cross-entropy method, and you'll see how it goes...
@markogrdinic5126 7 days ago
Back in 2021, I did manage to get an agent to train on flop poker using actor critic methods even though it just died on NL Holdem, so you are right that reducing the amount of game states would make the problem more tractable. But it's still very unsatisfying to get it to work like that. You don't want the training process to choke on something like NL Holdem, because if it's having trouble with a toy game like that, how will it handle the bigger games? It won't. And I don't want a path where I am putting in hack after hack to make it work well in a domain. I want to go beyond that. But I can't really do it, so in the end developing the Spiral language and getting better at programming becomes the goal of the exercise. Embarrassingly, my journey is one where I get better at programming, but don't get closer to the essence of intelligence. I think that won't hold forever and eventually programming and understanding of intelligence will become synonymous. Programming is a weak proxy to modifying one's own mind right now. Eventually, it will become a strong proxy, and it's only a matter of time until it stops being a proxy at all.
@yunuszenichowski 1 month ago
Hey, I really admire your project and wisdom. However, the longform recordings are a huge disadvantage, if you want people to watch your videos. They make it extremely difficult to follow the development, because one would have to watch hours and hours of programming. Something like a recap of a week's or month's work, like @LadybirdBrowser does it, would be much easier to follow. Although uncut programming can also be good content (@TsodingDaily), people would just need to be able to better follow what you are doing (which is often difficult to achieve).
@matejvolarevic 2 months ago
Sir, I don't know whether you are Croatian (one of ours), but I admire you! One day I would like to be as skilled a programmer as you!
@markogrdinic5126 2 months ago
Thank you very much.
@drewku42 3 months ago
Can we get a new tutorial? Would be greatly appreciated 🙏
@markovujanic 3 months ago
Hey Marko, great stuff. I like your style, no fluff, just right to the point. Hope to see more of your videos in the future :). Regards!
@markogrdinic5126 3 months ago
Thank you.
@yunuszenichowski 4 months ago
Looking forward to the review video about GADTs :)
@yunuszenichowski 4 months ago
But even if the new AI hardware was right around the corner, wouldn't it be nice to have a good CUDA backend anyway? I heard the most difficult part of making new hardware competitive is ML library integration, so according to that, welcoming new hardware when it finally arrives with a library tailored to them would be a great bet. NVIDIA's margins are such a motivation for new companies, I am sure they will figure it out.
@markogrdinic5126 4 months ago
I used to think like that, but my brain and my heart are in disagreement. Even if the Nvidia competitors are destined to make an impact, it will take at least 3 years before it hits. I've been in this for a while and I've been looking forward to novel hardware for nearly a decade, which is why the performance of Nvidia's competitors is so disappointing. Anyway, the C++ backend is complete. You can already give it a try in Spiral v2.11. The screencast chronicling its creation will come out on the 28th July, in 4 days. Thanks for leaving a comment, I want more of those.
@yunuszenichowski 4 months ago
@@markogrdinic5126 I really want to try out Spiral. Didn't yet find the time, but I am very much looking forward to it. I am still an amateurish programmer, so we will see how it goes :) This might be a far fetch, but please tell me, was the anime Gurren Lagann inspiration for the name? Because how the Spiral is presented there, as the thing which describes the exponential human progress, would also fit your ambitions well I think. And I noticed you used anime soundtracks in some videos :)
@WillEhrendreich 5 months ago
Thank you, more content please! =)
@hishadman 5 months ago
Thank you for this video. Can you make one for uploading facefusion on paper space and only Foocus? Without download all of SD?
@vuhoangdung 6 months ago
hi Marko, thanks for the video, I'm able to do the login but I don't quite get the code flow? can you summarize it? also why you need two app registrations on Azure? thanks
@markogrdinic5126 6 months ago
Sorry, it's been so long since I made this vid that I forgot what was in it, and I'd need to study it for a few hours to give you satisfactory answers. Currently, I am busy with making the RPS game work on the GPU in the Spiral series, so I don't have time for that. You'll have to put in some effort yourself in order to understand this, and if you manage it, please do answer your own questions here so that others might benefit from your insight. Good luck!
@saccharineboi 6 months ago
I have an AMD laptop with linux and let me tell you, I've never been able to do any GPGPU. OpenCL driver crashes the whole system, ROCm too. My only option is Vulkan but I'd rather program neural networks in AVX2, or use a compute shader in opengl.
@tobiaskarl4939 6 months ago
What is F# ? 😁 Another lame slow language no one needs. C and only C is the fastest best programming language for that.
@markogrdinic5126 6 months ago
What is C? 😄Another lame slow language no one needs. Assembly and only Assembly is the fastest best programming language for that.
@tobiaskarl4939 6 months ago
@@markogrdinic5126 Oh I see you do not have any clue. C is directly converted into machine code. Try it ! Assembler same but is hard to read too low level.
@frozen_tortus 3 months ago
@@markogrdinic5126 real programer punch cards.
@asasdflkjdsalf345 7 months ago
you're missing 50k subscribers
@byetaeyang 7 months ago
Missed another 0
@Warren_Elrod 7 months ago
Just found your channel, what a treasure trove of information. Thank you so much 🙏
@RodrigoNishino 8 months ago
The way you say things is like Ricky Sanchez without the burps
@hieuleinh180 8 months ago
I have HIP code for a batched matrix-vector multiply like this. What can I do to make it faster without affecting accuracy?

    __global__ void matmul_kernel(float *xout, float *x, float *w, int n, int d) {
        // W (d,n) @ x (n,) -> xout (d,)
        // by far the most amount of time is spent inside this little function
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int d_i = blockIdx.x;
        int batch_i = blockIdx.y;
        float val = 0.0f;
        for (int idx = threadIdx.x; idx < n; idx += blockDim.x) {
            val += w[d_i * n + idx] * x[n * batch_i + idx];
        }
        val = blockReduceSum(val);
        if (threadIdx.x == 0) {
            xout[batch_i * d + d_i] = val;
        }
    }

    void gpu_matmul(float* xout, float* x, float* w, int n, int d, int batch_size, hipStream_t *stream) {
        // printf("batch_size: %d ", batch_size);
        dim3 grid(d, batch_size);
        matmul_kernel<<<grid, 512, 0, *stream>>>(xout, x, w, n, d);
        // CHECK_HIP(hipGetLastError());
    }
@AntaripGangopadhyay 8 months ago
Hi!! I'm trying to install the packages, but trying to install the package "Gradient" is giving me errors [ AttributeError: cython_sources], Can you suggest anything? PS: I tried uninstalling and installing Cython but it's not helping.
@김영삼-h8p 8 months ago
How to Setup fooocus WebUI with Deforum Stable Diffusion on Paperspace please
@halamadrid-l9r 10 months ago
There is a time limit on how many times we can call the API. It returned the error code. Have you figured out ways to avoid it?
@markogrdinic5126 10 months ago
I heard from another user who got temporarily banned for making too many API calls during the startup. I've never run into such an error myself, but then again, I haven't been using Paperspace since the start of the year. If anybody knows how to get around this, please post it in the comments. Maybe Paperspace added something in order to rate limit the number of queries.
@halamadrid-l9r 10 months ago
There is a time limit on how many times you can call the api within that time frame. It returned the error code 1015. How to avoid it?
@Mr-mirsab 11 months ago
please make a video of uploading dataset (more files) from local to paper space
@signin8663 11 months ago
root@xxxxxxxxx:/notebooks# rm -rf /storage/* #wait a few, and it's gonna purge it all#
@svenbardos6637 11 months ago
Nice video, thx
@markogrdinic5126 1 year ago
For all of you out there, I just wanted to give you a heads up. The repetitive stress injury in my right hand has come back and I'm going to have to take longer breaks from here on out. I finished this video 4 days ago, and because my right hand was in such bad shape, only now have I decided to do some editing and publish it. I've actually caved and ordered a Glove80 keyboard. That particular ergonomic keyboard actually costs more than my GPU or monitor... but this is how bad things have gotten for me. I have also gotten a new, much more comfortable chair. I ordered the keyboard a few days ago, and until it gets here, I think I'm just going to sit back and relax and try to recover as much as possible.

I'm not actually typing this comment on a keyboard. In contrast to how I've been doing it for all the previous sessions, I will move to using the voice access feature of Windows 11. There are many reasons why my condition has gotten this bad, starting from using a bad keyboard and mouse and programming with bad posture. But I think the main problem is that I've been typing too much. Since 2015 I've been writing in a journal. And during my recent TH-cam sessions, I've been writing in that notepad to the side in addition to actually programming. I have written a lot of text during that time, probably way more than I should. I should be able to cut the amount of text that I've been writing by 80%.

So far, I've been treating my keystrokes as essentially being free. So I didn't think much of using a keyboard instead of using my voice. But it seems that from here on out, I'm not going to be able to think of using the keyboard like it is nothing. From here on out, every single keystroke that I write will have to be precious.

One of my original goals with starting this TH-cam channel was to improve my talking skills. But that particular goal seems to have gotten sidetracked. Maybe now that I am forced to talk to the computer, I'll be able to actually improve my speaking skills. So, in the future sessions, after I recover a little, I won't actually be writing my thoughts in the notepad. Instead, I will be dictating using my voice.
@konstsh2240 1 year ago
Elmish is not that easy to compose, wish Bolero had more examples with reactive state model, maybe based on FSharp.Data.Adaptive
@indignasmr7379 1 year ago
I'm having flashbacks to old old TH-cam, and I'm hearing that ♪let the bodies hit the floor♪ song even though you're not playing it. Hope this comment gets you some TH-cam algorithm points (:
@hamsterworks 1 year ago
This video has all the coherency of a tennis ace analyzing a chess game. There is no useful content to be found here. The reason nobody could give you a sort library is because requirements for sorting are very different on a hardware level. Do you want to sort millions of objects stored in off-chip SDRAM? Do you want to sort half a dozen items in one clock cycle? Do you want to sort a small number of items in a moderate number of cycles (say 100 items in 10 cycles)? Do you want only the top X or bottom Y items in the data set? Do you need predictable latency? What are you trying to optimize for? throughput? latency? chip area? power? All give you a very different hardware design
@markogrdinic5126 1 year ago
Even assuming you are right, just how many ways are there of implementing sorting using HLS C++? Yes, there are various algorithms for doing it, some of which are O(n^2) and others O(n * log n). And assuming you need to mess with pragmas, that would multiply the number of variations by some amount. Even if you couldn't manipulate the inlining of a library sort function, just why isn't there a library of numerous different kinds of sorting?

> Do you want to sort millions of objects stored in off-chip SDRAM?
> Do you want to sort half a dozen items in one clock cycle?
> Do you want to sort a small number of items in a moderate number of cycles (say 100 items in 10 cycles)?
> Do you want only the top X or bottom Y items in the data set?
> Do you need predictable latency?
> What are you trying to optimize for? throughput? latency? chip area? power?
> All give you a very different hardware design

Why not all of that? That is what libraries are for, so the users do not have to waste their time implementing them themselves. Sorting isn't that trivial, and certainly something you wouldn't attempt to do in a few minutes, not if you wanted to do it correctly.

I admit, some of my assumptions might be broken. I am going to have to study how the C++ code I've been compiling runs at the hardware level because I have no idea how passing pointers to arrays into functions could work otherwise. I know that functions get converted to state machines using FIFO/PIPOs, but being able to pass pointers in that regime makes less sense, not more. Don't tell me they compile it to some kind of chain and pass data from FIFO to FIFO? But that burden of learning is something I am going to have to take on. It doesn't excuse people in the community telling me to implement bubble and insertion sort, and getting upvoted by others.
@hamsterworks 1 year ago
@@markogrdinic5126 Hi! The language is not the issue, it's the approach. For an effective HDL design a lot more design work needs to be done up front. Say we wanted logic to perform a simplified poker ranking, for highest card, pair, two pair, three of a kind, four of a kind and full house. One possible hardware designed approach:

The input: 6 bits for each card (2 bits for suit and 4 bits for value), so 30 bits in total for a 5-card hand.

The output: a 15-bit vector, consisting of
- a pair flag
- a two pair flag
- a three of a kind flag
- a four of a kind flag
- a full house flag
- the 5-bit value for the highest card or first set of cards
- the 5-bit value for the next set of cards (or zero)

If memory was no constraint, we could just look up a value in a table with 2^30 entries and get the rank almost instantly. However, it is a constraint: large memories take many cycles to access and are expensive (in terms of silicon, power, I/O and board space), so a better solution is needed. Here's one possible way.

First, compare each card with each other card in the hand. This will need ten 4-bit comparators, and generate a 10-bit value. That value can then be used to index a 1024-entry lookup table. The rows of this table consist of:
- a bit that indicates if it is one pair
- a bit that indicates if it is two pairs
- a bit that indicates if it is three of a kind
- a bit that indicates if it is four of a kind
- a bit that indicates if it is a full house
- a "sort highest values" flag
- a five-bit mask for the first set of cards to find the highest value
- a five-bit mask for the second set of cards to find the highest value

So that's a 16-bit x 1024-entry read-only memory, or one 'Block RAM' in a common FPGA.

With the output of the table lookup, the highest value in each of the two masked off sets of cards can be determined, and if required those two values can be sorted if the "sort highest values" flag is set. It requires four 6-bit comparisons to find the highest value in each set of cards, and another comparison that is needed to rank two pairs properly.

You then end up with a blob of logic that takes as input a 30-bit 'poker hand' and returns 15-bit 'rankings' a small but fixed number of cycles later (maybe two or three cycles, depending on target clock rate). Throughput could be one hand each clock cycle. You could easily implement it in an HLS; it would look radically different from the usual software-oriented solution, mostly because all the time consuming work of working out which cards have matching face values has been pre-calculated in generating the lookup table.

Looking back over the design gives an estimate for the FPGA resources required: maybe 150 LUT6s, one Block RAM and 45 flipflops, or maybe 100 flipflops if pipelining was required.
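The comparator-plus-table idea in the comment above can be mimicked in software. The Python sketch below is only an illustration under assumptions: the names `equality_mask`, `category`, `TABLE` and `classify` are invented here, ranks are plain integers, and the table stores just the category name rather than the flag and mask bits the comment describes.

```python
from itertools import combinations, product

PAIRS = list(combinations(range(5), 2))  # the ten (i, j) card pairs with i < j

def equality_mask(ranks):
    """Pack the ten pairwise rank comparisons into a 10-bit integer."""
    mask = 0
    for bit, (i, j) in enumerate(PAIRS):
        if ranks[i] == ranks[j]:
            mask |= 1 << bit
    return mask

def category(ranks):
    """Name the pair-type category from how often each rank repeats."""
    counts = sorted((ranks.count(r) for r in set(ranks)), reverse=True)
    names = {
        (1, 1, 1, 1, 1): "high card",
        (2, 1, 1, 1): "one pair",
        (2, 2, 1): "two pair",
        (3, 1, 1): "three of a kind",
        (3, 2): "full house",
        (4, 1): "four of a kind",
    }
    return names.get(tuple(counts), "invalid")  # five equal ranks cannot occur

# Precompute the table once. Five cards have at most five distinct ranks,
# so ranks drawn from range(5) already produce every reachable mask.
TABLE = {equality_mask(h): category(h) for h in product(range(5), repeat=5)}

def classify(ranks):
    """Classify a 5-card hand by table lookup on its equality mask."""
    return TABLE[equality_mask(ranks)]
```

Because equality is transitive, only 52 of the 1024 possible masks can actually occur, which is why the dictionary ends up much smaller than the 1024-row memory in the hardware design.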
@markogrdinic5126 1 year ago
@@hamsterworks
> maybe 150 LUT6s, one Block RAM and 45 flipflops, or maybe 100 flipflops if pipelining was required.

Huh, really? Lol, when I compiled my hand ranker it was close to 20k LUTs. Could you really squeeze it down that much if you used a lookup table? I haven't really thought about it. But all the loops needed to match the cards, as well as do it in parallel, are likely to take up hardware, and the kinds of LUTs I am getting don't feel unreasonable given the quantity of the outputted code even though it would be like 2% of an FPGA.

> You then end up with a blob of logic that takes as input a 30-bit 'poker hand' and returns 15-bit 'rankings' a small but fixed number of cycles later (maybe two or three cycles, depending on target clock rate). Throughput could be one hand each clock cycle.

That'd be pretty awesome performance, way better than in my implementation. I am interested in learning more. Let me think about what you are suggesting...

> First, compare each card with each other in the hand. This will need ten 4-bit comparators, and generate a 10-bit value. That value can then be used to index a 1024-entry lookup table. The rows of this table consist of:

```
int c = 0;
for (int i=0; i < num_cards; i++) {
    for (int j=i+1; j < num_cards; j++) {
        table[c++] = compare_card_ranks(i,j);
    }
}
```

So assuming 5 cards per hand, that would be 4+3+2+1 = 10 loop iterations. We'd be comparing the ranks here which take 4 bits. And that 10 bit value is an array of booleans for each comparison. ...Which we can use to check for pair type hands. What about the straight flushes, flushes and straights?

Also, I am sorry to be so misleading in my Reddit post, but I am not just sorting 5 card hands. At the time I wrote the post I was running into trouble, so I used 5 cards per hand to make it easier to debug them. The game I have in mind is NL Holdem which uses 7 cards. That one would require 6+5+4+3+2+1 = 21 comparisons, a 16-bit * 2^21 table for the pair style table. And in addition to considering the straights and flushes, I'd also want to get back the actual hand, so I can compare it with the opponents. The hand ranker I wrote would work for getting the top hand from an arbitrary sized array of cards.

...Let me think a bit more. If we wanted to take this same approach to detect straight type hands, we'd need to sort them first. This is what motivated my question on the /r/FPGA sub. Flushes are easy to count. For straight flushes, we'd need to make a separate table that also considers the suits. The troublesome thing about them is that each of the possible suits would need to be considered individually, so the number of comparators would be 4x what it would be for a straight.

Also, I am sorry to heap requirements, but I thought it would be neat to have a hand ranker that would also take care of Omaha, which has 9 cards. At that point the approach would break. With more than 5 cards, straights become troublesome as it is no longer enough to just sort the hand, we need to skip over the pairs in a sequence. All of that makes a hand ranker complex.

---

I know there are some sophisticated hand rankers which use hashing, but those do require long tables, much like the ones we are considering here. I was thinking that maybe I'd want to reuse this code on future AI hardware, so it wouldn't be good to use large memories where logic would suffice. The hashing based approaches would be a lot harder to implement than in just two days like I did for the hand ranker here. Actually, it took me a few days longer than that partly because I had to work on the language and the HLS C++ backend for Spiral while at the same time implementing it.
@hamsterworks 1 year ago
@@markogrdinic5126 If you were working with 5 cards and sorted them before ranking, you could make things simpler. Just compare each card for equality with its neighbour to give a four-bit vector. So the hand "2 2 9 K K" would give "1 0 0 1". The lookup table entry for "1001" would indicate that it is two pair, taking the 2nd and 5th cards as the highest card in each pair. Likewise "2 4 4 4 4" would give "0 1 1 1", and that entry would indicate that it is four of a kind, with card #5 being the high card. There are only 16 different options in the lookup table, so it is much more compact, small enough to be implemented using just LUTs rather than block RAM.
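A small Python sketch of that 16-entry table, under some assumptions of mine: it only names the pair-type category (no straights or flushes, and it does not record which card positions are the high cards), and the names `neighbour_bits`, `classify`, `TABLE` and `RANKS` are hypothetical.

```python
def neighbour_bits(sorted_ranks):
    """One bit per adjacent pair: 1 if the two neighbouring cards have equal rank."""
    return tuple(int(a == b) for a, b in zip(sorted_ranks, sorted_ranks[1:]))

def classify(bits):
    """Name the pair-type category from the run lengths of consecutive 1 bits."""
    groups, run = [], 1
    for b in bits:
        if b:
            run += 1
        else:
            groups.append(run)
            run = 1
    groups.append(run)
    shape = tuple(sorted((g for g in groups if g > 1), reverse=True))
    return {
        (): "high card",
        (2,): "pair",
        (2, 2): "two pair",
        (3,): "three of a kind",
        (3, 2): "full house",
        (4,): "four of a kind",
    }.get(shape, "invalid")  # "1111" would need five cards of one rank

# Build the full 16-entry table once, indexed by the four-bit vector.
TABLE = {}
for index in range(16):
    bits = tuple((index >> (3 - k)) & 1 for k in range(4))
    TABLE[bits] = classify(bits)

RANKS = {r: v for v, r in enumerate("23456789TJQKA", start=2)}
hand = sorted(RANKS[c] for c in "2 2 9 K K".split())
print(TABLE[neighbour_bits(hand)])  # two pair
```

The sort is what keeps the table this small: equal ranks end up adjacent, so four neighbour comparisons carry the same information as all ten pairwise ones.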
@markogrdinic5126 1 year ago
Check out the following script if you want to see how manual style editing in the video could be automated: th-cam.com/video/2zBfH2J9GA4/w-d-xo.html
@rainbowcouch7736 1 year ago
Can I upload your starter script to a Paperspace notebook and run it there? I assume that `pip install gradient` is basically there so the local machine can communicate with Paperspace's Gradient notebook, so if I run your script inside the Paperspace Gradient notebook environment, I can avoid the gradient installation issue. However, I don't think placing the code directly in the notebook will work. Thanks.
@markogrdinic5126 1 year ago
Oh, no. It would not make sense to run it in a gradient notebook, since that script is what is used to start it to begin with. Were you the one who opened that issue on the repo? If you told me what kind of error you are getting maybe I could help you. You really shouldn't be having any trouble with running `pip install gradient` from the terminal. Maybe your Python installation isn't in PATH? Can you run the regular Python interpreter? For this kind of issue, maybe you could also ask Bing Chat to help you.
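If it helps with the PATH question, a quick check along these lines would show what the shell can actually find. This is only a sketch: `python_on_path` is a made-up helper name, and it assumes you can start some Python interpreter at all (for example by its full path).

```python
import shutil
import sys

def python_on_path():
    """Report where the shell would find python and pip, if anywhere."""
    return {
        "running_interpreter": sys.executable,
        "python": shutil.which("python"),
        "pip": shutil.which("pip"),
    }

for name, location in python_on_path().items():
    print(f"{name}: {location or 'not found on PATH'}")
```

If `python` is found but `pip` is not, running `python -m pip install gradient` avoids the PATH lookup for pip.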
@rainbowcouch7736 1 year ago
Yeah, I have an issue running `pip install gradient`. I tried Anaconda, Windows PowerShell and Command Prompt, but no luck. I followed Bard, Claude and ChatGPT instructions to locate the PATH, and it ends up with the error: "AttributeError: cython_sources ... Getting requirements to build wheel did not run successfully." I installed Visual C++ and Cython, but in the end it still shows the same error. I also tried a different PC. Maybe I am missing something in Python. But thank you very much for your reply :) @@markogrdinic5126
@rainbowcouch7736 1 year ago
Hi, I'm not able to install gradient using your method. Is `pip install gradient` broken? Thanks.
@markogrdinic5126 1 year ago
Sorry, I haven't used PS in a while, I won't be able to resolve this issue for you. If the setup script is out of date, you are going to have to ask around.
@mstu8097 1 year ago
Thanks for these videos, it's an impressive amount of work. I've been using Fable to do client-side dev lately, and watching you bump into issues while being a clearly more experienced coder than I am is somewhat comforting. It seems so frustrating that there doesn't seem to be a solution that's good enough in 2023. React has the community and reach, but you need to deal with a terrible dynamic language. On the other side of the spectrum you have Fable/Elmish, where you have F#, which is absolutely fantastic and makes webdev coding quite good 90% of the time, but sooner or later you'll run into weird runtime issues because in the end the code gets transpiled to a JS runtime, and you still need to deal with a patchwork of components, webpack, npm and the whole jungle of third-party stuff. It's just insane that this is the state of affairs. And while the open source community does their best and I'm nothing but grateful for their work, the reality is that documentation for open source projects, or at least the projects with less reach or popularity, is always lacking or out of date, because it takes a lot of time and effort, and open source devs need to keep their day jobs and put food on the table. Big corporations do not always produce quality products or docs, but at least they have a fighting chance. I was hoping for Bolero, but clearly it's not mature enough. Maybe Blazor in .NET 8 alongside WebAssembly 2.0 will be the step forward that's sorely needed.
@DaveYostCom 1 year ago
Have you tried using AI to translate a TypeScript library to a .fsi file?
@markogrdinic5126 1 year ago
No, but I wouldn't even think of using something like ChatGPT to generate Fable bindings for Typescript libraries. AI models are good for things where you can be loose like generating images with SD, or pointing me to the right documentation like Bing, or lately, helping me write science fiction like ChatGPT. I think that current models are just too weak for a task that requires deep reasoning like translating between incompatible type systems. Also, `.fsi` files are F# type signature files that are typically generated from regular `.fs` files. They are similar, but just contain types and not code, and tend to be used for documentation. The actual Fable bindings for Typescript libraries have to be placed in `.fs` files. I hope I got the intent behind your question right.
@coin383 1 year ago
credit card
@adelarsq 1 year ago
Cool! Thanks for sharing!
@AkulaAviator 1 year ago
Thanks a lot for your help! Your guides are a must!
@Eji1700 1 year ago
Well, for what it's worth, I'm glancing at most of what you've done, with plans to find some time and really sit down and plow into this. I've been doing a ton of small ETL things with F# for years now but have basically 0 front end skills and have repeatedly hit walls when trying, so having a full end-to-end project to bounce through and learn from has been extremely helpful in filling in some gaps in my knowledge. I'm hoping a thorough go-through later will help more.
@markogrdinic5126 1 year ago
That is great. I am glad somebody found this useful.
@tmxstock 1 year ago
This was so great. Just had a question: do you think you could make a video on how to use extensions on Paperspace? I can't get them to work :(
@balen7555 1 year ago
I think transcribing from any typescript code to F# should be possible, maybe at the cost of paying for API ergonomics. Perhaps the problem is that ts2fable isn't mature? Then that's a solvable problem that isn't intrinsic to F#. F# honestly needs love/funding. (note: I haven't used F# nor Fable, but I am looking to use it)
@markogrdinic5126 1 year ago
The problem with transcribing TS's types to F# is that TS has a much more powerful type system, and a lot of what is expressible in it simply isn't in F#: type-level mappings, type literals, type conditionals, nested variadic args. Moreover, F#'s (.NET's) and JS's object systems are different. You'll always be faced with incompatibilities in such a situation, and TS, which was made to be a superset of JS, will always have an edge when interacting with it. Currently, I am trying out Bolero & Blazor, so depending on how that turns out, it might be better to use those for web dev with F#. I need to spend some time using them before I decide whether I want to recommend them or not. I'd recommend F# in general, as it is a lot easier to use (and a better language) than TS, but after experiencing Fable, I am not sure whether, as far as webdev is concerned, it would be worth the user's time to use it over TS for their own projects. Maybe if you are an experienced webdev who is familiar with the libraries on the TS side, you could get use out of Fable, but for somebody coming in, definitely not. Bolero might be worth a try.
@balen7555 1 year ago
@@markogrdinic5126 Thank you for your detailed reply. I'll check out your video on Bolero. I don't mainly do webdev, but still, I need to do work on the frontend sometimes, and since I do programming as a hobby, I like using languages that feel elegant to me
@georgematsos4416 1 year ago
Is there a way to install InvokeAI in Paperspace? Thanks.
@markogrdinic5126 1 year ago
github.com/invoke-ai/InvokeAI#command-line-installation-for-users-familiar-with-terminals My guess would be yes, but I've never used it before. It has a web server, but I am not sure how it could be exposed publicly. You'd have to follow the instructions here and figure out if there is a setting to do that. If you can't find it after some searching, the best way is to open an issue in the relevant repo asking how to do it. Sorry, I am busy with other projects, so I cannot make the time for a tutorial on how to do it right now.
@georgematsos4416 1 year ago
@@markogrdinic5126 Thanks for the reply. It is hard for me to figure it out. If you have the time in the future and create it, I will be really thankful. If you do, please reply to this message.
@jofla 1 year ago
I was just researching how to achieve this goal! thanks for your video
@markogrdinic5126 1 year ago
The bug where error messages were shown on invalid lines was fixed quickly after I opened the issue. As of Fable 4.1.3, the line and column are shown where they should be when generic params aren't known at compile time.
@markogrdinic5126 1 year ago
github.com/rasheedaboud/Feliz.Auth.Examples Just a heads up, the author of `Feliz.Msal.React` provided some examples on how to use it here. Also, the import issue I opened in this video has been resolved today.
@rainbowasian96 1 year ago
Hey man! Do you know how to set up startup scripts? Scripts that run when you start the notebook, I mean.
@markogrdinic5126 1 year ago
I've never tried it before, sorry. But I recall that you can set up Linux so it runs shell scripts during startup by editing some config files. You'll have to look for that on your own. What do you want them to do? Right now my focus is on webdev, but I'll get back and do more AI art stuff before long. I'll merge my various interests into one, and make some content that might be of interest to a wider audience, so don't hesitate to give me suggestions on what you'd want to see.
@teracota 1 year ago
Hey, I'm planning to buy the monthly package. Is it cost effective? Also, how much time does it take to boot up the Stable Diffusion WebUI every time I run it?
@markogrdinic5126 1 year ago
A few minutes, unfortunately. It is not a problem if you want to prompt for a while, but if you want to do a few quick pics it is a pain in the ass.
@teracota 1 year ago
@@markogrdinic5126 Usually it takes 10 minutes on Google Colab. If Paperspace is quicker than that, it's great.
@yuliyayevtukh9570 1 year ago
Thank you a lot!
@flogginga_dead_horse4022 1 year ago
master? seriously? :P
@EasyWay-M.I.S. 1 year ago
Well done, Marko.
@markogrdinic5126 1 year ago
Thanks.
@EasyWay-M.I.S. 1 year ago
@@markogrdinic5126 Could I get a contact or something? I have a couple of amateur questions.
@markogrdinic5126 1 year ago
@@EasyWay-M.I.S. No problem. You can find my email on my Github profile. My username is mrakgr.
@Eji1700 1 year ago
Just wanted to say thank you for this series. I haven't had time to sit down and do it end to end yet, but just glancing through it has cleared up so many pain points I've run into before, and going end to end with things like actually deploying to Azure helps clear up SOOO many missing parts.
@nvadito7579 1 year ago
Great explanation. Will there be a video on how to load Lora Civitai characters (1 and more) and ControlNet?
@markogrdinic5126 1 year ago
I'll keep your request in mind, though I do not have any plans to do more vids in this playlist for the foreseeable future. One thing I'd really like to do is change the Automatic1111 WebUI to something else, as I tried it yesterday and it was as buggy and unstable as usual, which irks me to no end. I'd really like to try out ComfyUI, but my interest right now is on programming. At some point though, I might return to doing visual novels, and then I'll do more Stable Diffusion videos. In the future, I'd like to do a playlist on how to do a visual novel as a web application, which would feature using generative AIs to assist writing, audio, voice and art.
@markogrdinic5126 1 year ago
In the original CFR algorithm from the 2008 paper, the average policy update is even more confusing than it is in the video. In it, the update happens not after the current policy has been updated, but before that. You can see it as an optimization hack so the algorithm does not have to compute an extra normalization, but as a consequence, the self probability coming into the node might not be, and most likely wouldn't be, the same as the one used when the previous current policy update was made. I feel bad about saying the self probability multiplication doesn't matter after everyone in the RL Reddit thread said it does, so let me say that if anyone can construct an example where the update that omits it converges to the wrong answer, I'd be willing to change my mind.
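To make the ordering concrete, here is a minimal Python sketch of a single-node update under my own simplifications: one information set, counterfactual action values handed in as a plain list, and hypothetical names (`Node`, `regret_matching`, `update`). It is not the code from the video. The `average_first` flag switches between accumulating the average policy before the regret update (the ordering described above) and after it.

```python
def regret_matching(regrets):
    """Current policy proportional to positive regrets, uniform if none are positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)

class Node:
    def __init__(self, num_actions):
        self.regret_sum = [0.0] * num_actions
        self.policy_sum = [0.0] * num_actions

    def update(self, action_values, self_prob, average_first=True):
        """One update of a single node.

        action_values: counterfactual value of each action (assumed given).
        self_prob: the updating player's own probability of reaching the node.
        """
        policy = regret_matching(self.regret_sum)
        if average_first:
            # Accumulate the policy that was used for this traversal.
            self._accumulate(policy, self_prob)
        node_value = sum(p * v for p, v in zip(policy, action_values))
        for a, v in enumerate(action_values):
            self.regret_sum[a] += v - node_value
        if not average_first:
            # Accumulate the freshly updated policy instead.
            self._accumulate(regret_matching(self.regret_sum), self_prob)

    def _accumulate(self, policy, self_prob):
        for a, p in enumerate(policy):
            self.policy_sum[a] += self_prob * p

    def average_policy(self):
        total = sum(self.policy_sum)
        if total > 0.0:
            return [s / total for s in self.policy_sum]
        return [1.0 / len(self.policy_sum)] * len(self.policy_sum)
```

Passing `self_prob=1.0` on every call gives the variant that omits the multiplication, so the two can be compared on a small game.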

Marko Grdinic (Ghostlike)

Comments