For the past 9~ hours I have been trying to figure out the pattern only by analyzing a few example results i already had. You have provided a clear and understandable explanation for the connection between these seemingly random numbers in the table. Thank you very much!
Since you are computing an iterated convolution, you could also use a fourier transform to directly compute the result, eliminating the need for dynamic programming. With a bit of optimization you can get to a space complexity of n and a time complexity of nlogn. It's an interesting use of the DFT considering most TH-cam content only uses it for polynomial multiplication.
@@andy02q Your gonna need a crazy amount of ram to compute the exact asnwer. A naive guess would be 6^(5•10^9), which as an integer would take up 1.6 GB of space alone.
@@andy02q I can help you with this problem. To find the probability of rolling a total sum of 3.5 * 10^10 using 10^10 dice, we can use the Central Limit Theorem (CLT). The Central Limit Theorem states that the distribution of the sum of a large number of independent and identically distributed random variables approaches a normal distribution. In this case, we're dealing with a large number of dice (10^10), so we can use the CLT. Each die has 6 sides, numbered 1 through 6. The mean (μ) and variance (σ^2) of a single die are: μ = (1+2+3+4+5+6)/6 = 3.5 σ^2 = [(1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2]/6 ≈ 2.917 Since there are 10^10 dice, the mean and variance of the sum are: μ_total = 10^10 * 3.5 = 3.5 * 10^10 σ^2_total = 10^10 * 2.917 ≈ 2.917 * 10^10 Now we can standardize the sum using the z-score: z = (x - μ_total) / (σ_total) z = (3.5 * 10^10 - 3.5 * 10^10) / sqrt(2.917 * 10^10) = 0 The z-score is 0, which means the probability of getting exactly 3.5 * 10^10 as the sum is the peak of the normal distribution. To find the probability of obtaining this sum, we can use a standard normal distribution table or an online calculator. However, since we're dealing with discrete dice rolls, the exact probability should be calculated by considering the values in the neighboring region. The probability of getting a sum close to 3.5 * 10^10 will be high, but it's not possible to compute the exact probability without a computer for such a large number of dice. The Central Limit Theorem provides us with an approximation, and the probability will be close to the peak of the normal distribution. However, if you do enough rolls to approximate a random distribution you will find that the amount of instances of rolling 3.5 * 10^10 approaches the peak of the normal distribution. Calculating the peak of the normal distribution requires finding the maximum value of the probability density function (PDF) of the normal distribution. The PDF for a normal distribution is given by: f(x) = (1/(σ√(2π))) * e^(-((x-μ)^2)/(2σ^2)) For this problem, we have the following parameters: μ = μ_total = 3.5 * 10^10 σ = σ_total = sqrt(2.917 * 10^10) To find the peak of the normal distribution, we should evaluate the PDF at the mean (x = μ). This is because the mean of the normal distribution corresponds to the mode, which is the value with the highest probability. f(3.5 * 10^10) = (1/(sqrt(2.917 * 10^10) * √(2π))) * e^(-((3.5 * 10^10 - 3.5 * 10^10)^2)/(2 * (2.917 * 10^10))) Since (3.5 * 10^10 - 3.5 * 10^10)^2 = 0, the exponent part becomes: e^(-0) = 1 Thus, the PDF at the mean is: f(3.5 * 10^10) = (1/(sqrt(2.917 * 10^10) * √(2π))) Calculating this value: f(3.5 * 10^10) ≈ (1/(sqrt(2.917 * 10^10) * 2.5066)) ≈ 9.776 * 10^(-6) The peak of the normal distribution for this problem is approximately 9.776 * 10^(-6). Keep in mind that this value represents the density of the distribution and not the exact probability of obtaining the sum of 3.5 * 10^10. The actual probability can only be found by considering the discrete nature of the dice rolls and summing up the probabilities in the neighborhood of the target sum.
I found Eratosthenes' Sieve to be a much more intuitive implementation of this idea ("bottom-up" instead of "top-down".) To know if N is prime or composite, make an array of N elements and mark every element as prime. Starting at 2 and ending at sqrt(N), mark every multiple of 2 as composite. Then mark every multiple of 3 as composite. Then because 4 is already marked composite, skip to 5, etc. When we get to sqrt(N) then we know that either N is composite because we marked it composite when we were on its divisor M or we never marked it as a multiple of any integer 1 < M
Not sure if I'm right, but the idea of using a lookup table for already computed results looks a lot like the "memoization" aspect of functional programming languages
2:25 I wondered how fast that would actually be. For ten dice my brute force attempt took less than a second to create the full histogram, do not underestimate the power of brute force ;-) 32 seconds for twelve dice and 192 seconds for thirteen. Of course this grows very very quickly. Brute forcing anything beyond that is kinda hopeless. 30 dice would be quite impossible to brute force.
Of course, 32 seconds is an absolute eternity for a modern processor. The optimal solution time is measured in microseconds. How much this matters depends entirely on application. If you're just calculating it once, who cares? If it took an hour it wouldn't matter. But if you wanted to run this calculation in between frames in a video game, a solution that took even one second would be a thousand times too slow.
@@cogspace If calculating it once then the brute-force for 13 dice would much much faster than the "optimum" as the optimum would take longer to code :P
@@andrewnunes9148 hence why i said for 13 it would still be faster. Given the numbers from the video 14-15 might still be faster with bruteforce depending on how fast you are at coding. but 40 dice? Naaah, no chance.
this inspired me to make a function in my python package that can calculate the number of ways you can throw so the number is the same as the number you wanted to throw
The best way to solve any computation problem: Looking it up in a list The second best way to solve any computation problem: Finding a efficient way to make a list.
I solved this problem using the multinomial theorem to expand (x + x^2 + ... + x^6)^m which involves finding all weak compositions of m into 6 parts. Given this composition, lets say its [3, 0, 2, 1, 0, 0] find the coefficient for (x^1)^3 * (x^2)^2 * x^3 using the multinomial coefficient (m:3,2,1). Repeat for all compositions, and the result in standard form tells you for each exponent of x the number of ways to sum to that number.
@@isomorfismo1 i realized now that I shouldnt have just wikipedia'd it without realizing that this sum of multiple multinomial coeffiecients for each x^k is precisely what this dynamic programming algorithm calculates. My solution is a bit slower. To answer your question, look at wikipedia's page on the multinomial theorem. The expansion = the sum of all weak compositions of n (we use m) into 6 parts, where for each composition you do what I said previously. A weak composition of m in 6 parts is a way to sum up to m using exactly 6 numbers where you can use 0s.
This recursion optimization of saving previously calculated values, if i remember correctly, is called memoization(not a typo or baby way to say memorization).
I would say memoization and dynamic programming are not exactly the same thing but often overlap. Memoization is an implementation technique where a function keeps track of arguments it’s been called with and their return values so that if it sees the same arguments it can just return the same value. Dynamic programming is more of a problem solving technique of optimizing a solution by recognizing that you are repeating a lot of work and doing things in a different way so you don’t repeat yourself. Notice that the dynamic programming solution in this video doesn’t actually include any memoization. And memoization can be used separate from dynamic programming (query result caching for example).
I would say there is a difference there. Memoization is storing the result of some inputs. When those same inputs are used again, the stored result is used. Dynamic programming is a concept of dividing a problem into smaller subproblems and using storage to avoid recalculating a subproblem twice.
Excellent discussion. I had a hard time understanding the algorithm and eventually had to write it out myself. The problem for me was that for you the dice are all the same, so rolling a 2 on one die and a 1 on the other to get 3 was the same as rolling a 1 on the first die and a 2 on the other. But to me the dice were different, and so that was *two* different ways of rolling a 3 rather than *one*. Interesting how thinking of the problem space in a slightly different way make the solution harder to understand. The solution was fascinating though. Thanks.
Not sure what you mean, the dices are "different" in the video. Look at the beginning where he counts the number of ways to get a 7 with 2 dice. There are 6 different ways: 1,6 ; 6,1 ; 2,5 ; 5,2 ; 3,4 ; 4,3 Or did I misunderstand you?
@@Feeber2 I think that what they're getting at is that there are two ways to look at combinations, one is ordered and the.other is unordered. Lottos for example are usually unordered. It doesn't matter whether the winning numbers drawn are 1, 2, 3, 4 or 4, 3, 2 , 1 or 2, 3, 1, 4. If your ticket has 1, 2, 3 & 4 on it in any order you win. But order matters for some applications. The numbers 1234, 4321 and 2314 are clearly very different. I think that perhaps OP is saying that when they were thinking about the problem, they were counting rolling a 3 & 2 as one way to roll 5, and rolling 2 & 3 as a separate way to make 5. So where the video counts 1 way to make 5, OP is counting 2. I suspect that the difference just changes how the math(s) works a little bit.
@@spuzzdawg I fully agree with your first paragraph. Thats exactly what I believe as well. Of course there are problems where order matters and problems where it doesn't. Lotto is a good example for one where it does not, and sums of dice rolls is a good example for one where it does. The problem I have is though that they count 2,3 and 3,2 as two seperate ways in the video and not as one, so exactly in the way that OP believes it should be done and also in the way it needs to be done to produce correct results. That's why I was a little confused. Regarding your final sentence: It doesn't just change the math a little bit, it either produces right, or wrong results. If you don't count 2,3 and 3,2 as two seperate ways to roll a sum of 5, then you will never get results that are in line with reality.
@@Feeber2 Hm, I personally thought that @ronaldbell7429 was trying to say that they were trying to compute the possibility of dice with varying numbers of sides rolling a certain number, so for instance, the odds of a die with 4 sides, plus a die with 7 sides, plus a die with 13 sides rolling a specific number.
As a test, I coded this in Python both ways: Top-down using recursion, and bottom-up using the lookup table. The iterative method was incredibly faster. It found the answer to 100 dice and total of 500 in about 50 milliseconds. The top-down approach never finished.
@@victorcossio On a time scale where you eventually give up, I'd say hours or days at most 😂 But knowing the dark side of maths as well, it might've been millions of years as well 👌
I'm taking a discrete structures class right now, so I figured out a way to calculate the same thing using combinatorics and multisets. I don't know whether it's worth it, because factorials take time to calculate, but it works pretty well and isn't *that* hard to understand.
There are no doubt even better solutions than this, but you can cut down on the number of table calls you need to do by considering the chance of rolling a total of M on N dice is the sum of the product of the chances of rolling M-x and x over N/2 dice (with x ranging between N/2 and 3N, being the minimum and maximum rolls on N/2). With some adjustments for odd Ns and the like, this makes it so you mostly only have to check the powers of 2
Optimized recursive algorithm: For n dice, and a sum of m, do n-1 dice, and m-1,m-2,m-3,m-4,m-5,m-6, take m-6, if the new n*6 is less than the new m, we know that the number of ways to achieve that value to be 0, due to the new m being too large to achieve with dice. Likewise, if the new n is greater than the new m, we know that the number of ways to achieve that value to be 0, due to the new m being too small to achieve with dice. In this way, you can prune a lot of impossibilities earlier on. Recursively call this function for what is left, until n=1, where you can solve it as before.
Dynamic programming is just reusing intermediate results to avoid recomputing them. I have no idea why anyone thinks that it warrants a specific term. To me, that's so trivially obvious as an optimization technique that the term just makes the concept more confusing.
Yeah, I ended up here cos I couldn't remember what DP is (teehee). And it's just memoising functions.... Actually think I forgot *because* it's so trivial. Still useful to learn, but the name is silly. Not really a paradigm, is it.
@@JollyboatBros It doesn't even have to be a memoising function, technically, although that is an extremely simple way of implementing the concept. It can be as simple as a for loop with some temporary variables.
Thank you very much, I preciate your work and the idea of counting is cleaver. A develpoment is made by generalizing the idea for example the number of sides is 2 or 3 or what ever it is. R code programme is made to help solving such a problem n
It still kind of messes with my head that we would only count rolling a 1 and a 1 to get a sum of two once, but yet we count rolling a 1 and a 2, and a 2 and a 1 as two ways to get a sum of 3. It feels like we should be counting double ones twice to account for the fact that each die has to roll a 1.
I think it helps when you label the number with the dice "identifier" ( e.g. dice 1 number 1 => d1[1] ). Once you label the numbers and look at the combinations, like d1[1] : d2[1], then you realize the operations only go from d1 to d2, thus 1 and 1 occurs only once. Since we don't care about the order of the combinations, counting combinations going from d2 to d1 becomes unnecessary. Meaning, d1[1] : d2[1] == d2[1] : d1[1]. So, for combinations of 2 dice which sum equals to 3, we get => d1[1] : d2[2] and d1[2] : d2[1] Clearly, with the labels, d1[1] : d2[2] != d1[2] : d2[1], we can see that 1 : 2 and 2 : 1 is not the same thing (meaning, we are not counting 2 : 1 twice). In fact, if we were to count both double ones (from d1 to d2, and d2 to d1), then for dice_sum(2, 3), we would have 4 combinations.
Why? Just try listing up all the ways you can get 2 when rolling 2 dice: die1 - die2 1 - 1 Thats it. The first throw MUST be a 1 and the second throw MUST be a 1 - there is no other way. Now how to get 3: 1 - 2 2 - 1 You have 2 distinct ways of getting a 3 - either the first rolls a 1 and the second a 2, or the first a 2 and the second a 1. With the result of 7 it is way more obvious: 1-6 2-5 3-4 4-3 5-2 6-1 try it out - getting twice the same number is just less likely than getting 2 different numbers with the same result. With the way you described getting there would be no bell-curve but a flat line. Rolling 1000 dies and getting 1000 would be just as likely as getting 3500 - which is not the case.
I concur on both parts, the question becomes when is space the problem or computations per number calculated. There are three fundamental restrictions, Processes Space Accuracy My immediate thought was create a bell curve and report closest integer results.
@@ABaumstumpf the memory save isn't impactful on modern systems but it is substantial if you think about how the memory requirement grows with the size of the problem. But also, it's just a metaphor. This sort of optimization would be great in one environment and not so great in another. Without a precise description of what these dice rolls are being used for, it's impossible to tell whether certain optimizations are good or bad.
Lets examplify 1-1 and 1-2(2-1). For 1-1 we need that the first die be a 1 and the second be a 1 For rolling 3 (1-2 pair) we either roll a 1 then get a 2 or we roll a 2 first then roll a 1, so 2 different possibilities
Hi guys! This problem is the easiest possible combinatorics example. If you don't know how to solve this using discrete math, then go back to studying. Dynamic programming comes way after this. Enjoy your journey!
Yeaah, no. You can check it quite fast. It favorises smaller sums. For example: using 2 dice, the most probable sum is 7, so according to your formula the amount of ways to achieve 7 is 2 * 6 - 7 + 1 = 6. That's correct, but if you plug in some smaller number than 7 the outcome should be always less than 6. So let's check it - 2 * 6 - 3 + 1 = 10. Ultimately saying that there are more ways to get a sum of 3 than to get a sum of 7. Hence 3 is more likely sum than 7. Which is obviously not true making this formula incorrect. Hope this helps😅
Not 100% sure but I think the formula for rolling the value v with d dice of f faces is Σ(n=0 to floor((v-d)/f)) (-1)^n * ((v-6n-1) choose (d-1)) * (d choose n) It seems to work for d
60 Million combinations should be able to be created, iterated over and destroyed in well under a second, though - even single threaded, we have over a billion cycles per second by now, after all 😁
yeah but multiple of them are calculating the same thing, so we use a lookup table to save on doing that. also here's a followup question: how many ways are there to get a 420 on 100 dice
This could be an interesting challenge for ml. For example we could train an AI to take a sequence of 0s and 1s and attempt to predict the next bit/bits in the sequence. Then we can train another AI to generate a sequence of bits where the prediction AI can't find a a solution. We can then use the second AI to generate random numbers. After all randomness is about unpredictability.
Which isn't the point. In computer science, you should learn to recognize which problems grow exponentially and try to refactor them in a way that makes them easier. For instance, changing a problem whose solution takes 2^n time into a problem that takes n^2 (or better still log(n), and best possible linear on n)
@@obamagaming3802lets say it would take 10 seconds, if u add 5 dice, that is already 7000x more time than 10 dice, even 1 extra dice would make it go from 10 seconds to 1 minute
@@andrewnunes9148they're not talking about that, they're talking about how the average was stated to be several minutes but was actually less than 10 seconds
I love how your lookup table is a literal table
I came here after watching your videos on edx - web development in js and python. So far I have been loving the lectures
How is that course? Do you recommend?
@@prakash_77 Yes Highly recommended!!!
For the past 9~ hours I have been trying to figure out the pattern only by analyzing a few example results i already had. You have provided a clear and understandable explanation for the connection between these seemingly random numbers in the table. Thank you very much!
Since you are computing an iterated convolution, you could also use a fourier transform to directly compute the result, eliminating the need for dynamic programming. With a bit of optimization you can get to a space complexity of n and a time complexity of nlogn. It's an interesting use of the DFT considering most TH-cam content only uses it for polynomial multiplication.
I was about to go into the comments to say exactly this, but it seems that you beat me to it.
I'd really like to know how to do it. Like what is the chance to throw a 3.5*10^10 with 10^10 dices? There must be a analytic solution somehow.
@@andy02q Your gonna need a crazy amount of ram to compute the exact asnwer. A naive guess would be 6^(5•10^9), which as an integer would take up 1.6 GB of space alone.
@@Temari_Virus No. I'm totally certain, that it can be done by hand without a computer, let alone with huge RAM size.
@@andy02q I can help you with this problem. To find the probability of rolling a total sum of 3.5 * 10^10 using 10^10 dice, we can use the Central Limit Theorem (CLT). The Central Limit Theorem states that the distribution of the sum of a large number of independent and identically distributed random variables approaches a normal distribution. In this case, we're dealing with a large number of dice (10^10), so we can use the CLT.
Each die has 6 sides, numbered 1 through 6. The mean (μ) and variance (σ^2) of a single die are:
μ = (1+2+3+4+5+6)/6 = 3.5
σ^2 = [(1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2]/6 ≈ 2.917
Since there are 10^10 dice, the mean and variance of the sum are:
μ_total = 10^10 * 3.5 = 3.5 * 10^10
σ^2_total = 10^10 * 2.917 ≈ 2.917 * 10^10
Now we can standardize the sum using the z-score:
z = (x - μ_total) / (σ_total)
z = (3.5 * 10^10 - 3.5 * 10^10) / sqrt(2.917 * 10^10) = 0
The z-score is 0, which means the probability of getting exactly 3.5 * 10^10 as the sum is the peak of the normal distribution. To find the probability of obtaining this sum, we can use a standard normal distribution table or an online calculator.
However, since we're dealing with discrete dice rolls, the exact probability should be calculated by considering the values in the neighboring region. The probability of getting a sum close to 3.5 * 10^10 will be high, but it's not possible to compute the exact probability without a computer for such a large number of dice. The Central Limit Theorem provides us with an approximation, and the probability will be close to the peak of the normal distribution.
However, if you do enough rolls to approximate a random distribution you will find that the amount of instances of rolling 3.5 * 10^10 approaches the peak of the normal distribution.
Calculating the peak of the normal distribution requires finding the maximum value of the probability density function (PDF) of the normal distribution. The PDF for a normal distribution is given by:
f(x) = (1/(σ√(2π))) * e^(-((x-μ)^2)/(2σ^2))
For this problem, we have the following parameters:
μ = μ_total = 3.5 * 10^10
σ = σ_total = sqrt(2.917 * 10^10)
To find the peak of the normal distribution, we should evaluate the PDF at the mean (x = μ). This is because the mean of the normal distribution corresponds to the mode, which is the value with the highest probability.
f(3.5 * 10^10) = (1/(sqrt(2.917 * 10^10) * √(2π))) * e^(-((3.5 * 10^10 - 3.5 * 10^10)^2)/(2 * (2.917 * 10^10)))
Since (3.5 * 10^10 - 3.5 * 10^10)^2 = 0, the exponent part becomes:
e^(-0) = 1
Thus, the PDF at the mean is:
f(3.5 * 10^10) = (1/(sqrt(2.917 * 10^10) * √(2π)))
Calculating this value:
f(3.5 * 10^10) ≈ (1/(sqrt(2.917 * 10^10) * 2.5066)) ≈ 9.776 * 10^(-6)
The peak of the normal distribution for this problem is approximately 9.776 * 10^(-6). Keep in mind that this value represents the density of the distribution and not the exact probability of obtaining the sum of 3.5 * 10^10. The actual probability can only be found by considering the discrete nature of the dice rolls and summing up the probabilities in the neighborhood of the target sum.
This was the precise problem I've been trying to wrap my head around in DP; Thank you!
I found Eratosthenes' Sieve to be a much more intuitive implementation of this idea ("bottom-up" instead of "top-down".) To know if N is prime or composite, make an array of N elements and mark every element as prime. Starting at 2 and ending at sqrt(N), mark every multiple of 2 as composite. Then mark every multiple of 3 as composite. Then because 4 is already marked composite, skip to 5, etc. When we get to sqrt(N) then we know that either N is composite because we marked it composite when we were on its divisor M or we never marked it as a multiple of any integer 1 < M
When you make the lookup table, you can disregard results that are less than the number of dice, not just results less than one
And any that are higher than 6 times the number of dice
Not sure if I'm right, but the idea of using a lookup table for already computed results looks a lot like the "memoization" aspect of functional programming languages
Yep caching is a form of memorization.
However memorization is a design strategy and not an aspect of any language.
Yep, "dynamic programming" is basically "bottom-up recursion with memoization of previous recursion calls".
U are a legend! Keep doing good work
2:25 I wondered how fast that would actually be. For ten dice my brute force attempt took less than a second to create the full histogram, do not underestimate the power of brute force ;-)
32 seconds for twelve dice and 192 seconds for thirteen. Of course this grows very very quickly. Brute forcing anything beyond that is kinda hopeless. 30 dice would be quite impossible to brute force.
Of course, 32 seconds is an absolute eternity for a modern processor. The optimal solution time is measured in microseconds. How much this matters depends entirely on application. If you're just calculating it once, who cares? If it took an hour it wouldn't matter. But if you wanted to run this calculation in between frames in a video game, a solution that took even one second would be a thousand times too slow.
@@cogspace If calculating it once then the brute-force for 13 dice would much much faster than the "optimum" as the optimum would take longer to code :P
@@ABaumstumpfdepends on how high is the amount of dice that u wanna calculate tho, each die make the programs considerably slower
@@andrewnunes9148 hence why i said for 13 it would still be faster.
Given the numbers from the video 14-15 might still be faster with bruteforce depending on how fast you are at coding. but 40 dice? Naaah, no chance.
"We'll need somewhere to store it"
Me internally: "Hashmap, throw a hashmap at it"
Not that i was right tho, but the meme will live on
@@makkusaiko Ackhsutually you can use a hashmap as well, u need to index with no. of dice and the sum 🤓
@@DinujayaRajakaruna yee. They just didn't use one
I like how your lookup table is an actual, wooden table 😊
this inspired me to make a function in my python package that can calculate the number of ways you can throw so the number is the same as the number you wanted to throw
The best way to solve any computation problem: Looking it up in a list
The second best way to solve any computation problem: Finding a efficient way to make a list.
I solved this problem using the multinomial theorem to expand (x + x^2 + ... + x^6)^m which involves finding all weak compositions of m into 6 parts. Given this composition, lets say its [3, 0, 2, 1, 0, 0] find the coefficient for (x^1)^3 * (x^2)^2 * x^3 using the multinomial coefficient (m:3,2,1). Repeat for all compositions, and the result in standard form tells you for each exponent of x the number of ways to sum to that number.
Can you explain with more details? Please
@@isomorfismo1 i realized now that I shouldnt have just wikipedia'd it without realizing that this sum of multiple multinomial coeffiecients for each x^k is precisely what this dynamic programming algorithm calculates. My solution is a bit slower.
To answer your question, look at wikipedia's page on the multinomial theorem.
The expansion = the sum of all weak compositions of n (we use m) into 6 parts, where for each composition you do what I said previously. A weak composition of m in 6 parts is a way to sum up to m using exactly 6 numbers where you can use 0s.
the hardest part of this implentation is finding the compositions, calculating the multinomial coeffecient is easy
7:25 what does that mean "add 6 values"?
This recursion optimization of saving previously calculated values, if i remember correctly, is called memoization(not a typo or baby way to say memorization).
I would say memoization and dynamic programming are not exactly the same thing but often overlap. Memoization is an implementation technique where a function keeps track of arguments it’s been called with and their return values so that if it sees the same arguments it can just return the same value. Dynamic programming is more of a problem solving technique of optimizing a solution by recognizing that you are repeating a lot of work and doing things in a different way so you don’t repeat yourself.
Notice that the dynamic programming solution in this video doesn’t actually include any memoization. And memoization can be used separate from dynamic programming (query result caching for example).
I would say there is a difference there. Memoization is storing the result of some inputs. When those same inputs are used again, the stored result is used.
Dynamic programming is a concept of dividing a problem into smaller subproblems and using storage to avoid recalculating a subproblem twice.
Excellent discussion. I had a hard time understanding the algorithm and eventually had to write it out myself. The problem for me was that for you the dice are all the same, so rolling a 2 on one die and a 1 on the other to get 3 was the same as rolling a 1 on the first die and a 2 on the other. But to me the dice were different, and so that was *two* different ways of rolling a 3 rather than *one*. Interesting how thinking of the problem space in a slightly different way make the solution harder to understand. The solution was fascinating though. Thanks.
Not sure what you mean, the dices are "different" in the video. Look at the beginning where he counts the number of ways to get a 7 with 2 dice. There are 6 different ways: 1,6 ; 6,1 ; 2,5 ; 5,2 ; 3,4 ; 4,3
Or did I misunderstand you?
@@Feeber2 I think that what they're getting at is that there are two ways to look at combinations, one is ordered and the.other is unordered. Lottos for example are usually unordered. It doesn't matter whether the winning numbers drawn are 1, 2, 3, 4 or 4, 3, 2 , 1 or 2, 3, 1, 4. If your ticket has 1, 2, 3 & 4 on it in any order you win. But order matters for some applications. The numbers 1234, 4321 and 2314 are clearly very different.
I think that perhaps OP is saying that when they were thinking about the problem, they were counting rolling a 3 & 2 as one way to roll 5, and rolling 2 & 3 as a separate way to make 5. So where the video counts 1 way to make 5, OP is counting 2.
I suspect that the difference just changes how the math(s) works a little bit.
@@spuzzdawg I fully agree with your first paragraph. Thats exactly what I believe as well. Of course there are problems where order matters and problems where it doesn't. Lotto is a good example for one where it does not, and sums of dice rolls is a good example for one where it does.
The problem I have is though that they count 2,3 and 3,2 as two seperate ways in the video and not as one, so exactly in the way that OP believes it should be done and also in the way it needs to be done to produce correct results. That's why I was a little confused.
Regarding your final sentence: It doesn't just change the math a little bit, it either produces right, or wrong results. If you don't count 2,3 and 3,2 as two seperate ways to roll a sum of 5, then you will never get results that are in line with reality.
@@Feeber2 Hm, I personally thought that @ronaldbell7429 was trying to say that they were trying to compute the possibility of dice with varying numbers of sides rolling a certain number, so for instance, the odds of a die with 4 sides, plus a die with 7 sides, plus a die with 13 sides rolling a specific number.
I like the eerie pascal table that seems to form in the top row, and the symmetry, there must be some ways to skip accessing most cells
As a test, I coded this in Python both ways: Top-down using recursion, and bottom-up using the lookup table. The iterative method was incredibly faster. It found the answer to 100 dice and total of 500 in about 50 milliseconds. The top-down approach never finished.
How much is never?
@@victorcossio On a time scale where you eventually give up, I'd say hours or days at most 😂
But knowing the dark side of maths as well, it might've been millions of years as well 👌
@@victorcossio I gave up after about 10 minutes. I had my answer.
Really good video, glad it appeared in my recc
I'm taking a discrete structures class right now, so I figured out a way to calculate the same thing using combinatorics and multisets. I don't know whether it's worth it, because factorials take time to calculate, but it works pretty well and isn't *that* hard to understand.
Same principle applies to factorials among other ways to optimize them
There are no doubt even better solutions than this, but you can cut down on the number of table calls you need to do by considering the chance of rolling a total of M on N dice is the sum of the product of the chances of rolling M-x and x over N/2 dice (with x ranging between N/2 and 3N, being the minimum and maximum rolls on N/2). With some adjustments for odd Ns and the like, this makes it so you mostly only have to check the powers of 2
I am very much keen to know what tech you are using to make such awesome videos. How you are putting these animations? if possible, please share..
very cool
Awesome video!
Thanks a lot!
Very nicely explained a hard to understand problem. Thanks!
its amazing
I this case order of dice that were rolled matters. In other cases you may not want order of dice to matter so you have to use something else
Quite interesting
LISP is a recursive language. First you curse and then you recurse.
🤣😂
Thank you, now I can code it. There is no more error in my C++ code.
I have no words to explain how easy you make any topic
.. thanks 🎉
What is name of program you used for animation?
Optimized recursive algorithm:
For n dice, and a sum of m, do n-1 dice, and m-1,m-2,m-3,m-4,m-5,m-6, take m-6, if the new n*6 is less than the new m, we know that the number of ways to achieve that value to be 0, due to the new m being too large to achieve with dice. Likewise, if the new n is greater than the new m, we know that the number of ways to achieve that value to be 0, due to the new m being too small to achieve with dice. In this way, you can prune a lot of impossibilities earlier on. Recursively call this function for what is left, until n=1, where you can solve it as before.
does this have any relation to partitions from number theory?
It has relation to partitions from combinatoric.
what software did u use to make the video?
I know just enough to see a tiny pascals triangle in the lookup but i know too little to understand why its there
Dynamic programming is just reusing intermediate results to avoid recomputing them. I have no idea why anyone thinks that it warrants a specific term. To me, that's so trivially obvious as an optimization technique that the term just makes the concept more confusing.
Yeah, I ended up here cos I couldn't remember what DP is (teehee). And it's just memoising functions.... Actually think I forgot *because* it's so trivial. Still useful to learn, but the name is silly. Not really a paradigm, is it.
@@JollyboatBros It doesn't even have to be a memoising function, technically, although that is an extremely simple way of implementing the concept. It can be as simple as a for loop with some temporary variables.
What if the die is not a cube, it has p number of faces???? How does the algorithm change
It just changes the zero values in the lookup table. The problem doesn’t change, just the values in the table.
Thank you very much, I preciate your work and the idea of counting is cleaver.
A develpoment is made by generalizing the idea for example the number of sides is 2 or 3 or what ever it is.
R code programme is made to help solving such a problem
n
It still kind of messes with my head that we would only count rolling a 1 and a 1 to get a sum of two once, but yet we count rolling a 1 and a 2, and a 2 and a 1 as two ways to get a sum of 3. It feels like we should be counting double ones twice to account for the fact that each die has to roll a 1.
I think it helps when you label the number with the dice "identifier" ( e.g. dice 1 number 1 => d1[1] ).
Once you label the numbers and look at the combinations, like d1[1] : d2[1], then you realize the operations only go from d1 to d2, thus 1 and 1 occurs only once.
Since we don't care about the order of the combinations, counting combinations going from d2 to d1 becomes unnecessary.
Meaning, d1[1] : d2[1] == d2[1] : d1[1].
So, for combinations of 2 dice which sum equals to 3, we get => d1[1] : d2[2] and d1[2] : d2[1]
Clearly, with the labels, d1[1] : d2[2] != d1[2] : d2[1], we can see that 1 : 2 and 2 : 1 is not the same thing (meaning, we are not counting 2 : 1 twice).
In fact, if we were to count both double ones (from d1 to d2, and d2 to d1), then for dice_sum(2, 3), we would have 4 combinations.
Why?
Just try listing up all the ways you can get 2 when rolling 2 dice:
die1 - die2
1 - 1
Thats it. The first throw MUST be a 1 and the second throw MUST be a 1 - there is no other way.
Now how to get 3:
1 - 2
2 - 1
You have 2 distinct ways of getting a 3 - either the first rolls a 1 and the second a 2, or the first a 2 and the second a 1.
With the result of 7 it is way more obvious:
1-6
2-5
3-4
4-3
5-2
6-1
try it out - getting twice the same number is just less likely than getting 2 different numbers with the same result.
With the way you described getting there would be no bell-curve but a flat line. Rolling 1000 dies and getting 1000 would be just as likely as getting 3500 - which is not the case.
Guys, there's 420 ways to roll a sum of 13 with 5 dice
Skipped through additive properties versus multiplicate and then went tee hee this might seem simple.
Convole!
You can do it by keeping just two rows. Current and previous. Once current is full, make current as previous and previous as current. Repeat
You also don't need the numbers in the very bottom-left or upper-right of the table that don't contribute to the final number.
Sounds like a simple for loop in programming could work then.
I concur on both parts, the question becomes when is space the problem or computations per number calculated.
There are three fundamental restrictions,
Processes
Space
Accuracy
My immediate thought was create a bell curve and report closest integer results.
Sure - and you would end up with a bit lower memory consumption, and orders of magnitude slower if you wanted to calculate a similar result again.
@@ABaumstumpf the memory save isn't impactful on modern systems but it is substantial if you think about how the memory requirement grows with the size of the problem. But also, it's just a metaphor. This sort of optimization would be great in one environment and not so great in another. Without a precise description of what these dice rolls are being used for, it's impossible to tell whether certain optimizations are good or bad.
But why do all that recursive work when you have pascal's triangle forming there?
I don't understand why you consider 5-1 and 1-5 to be different outcomes but 6-6 and 6-6 are the same
Lets examplify 1-1 and 1-2(2-1).
For 1-1 we need that the first die be a 1 and the second be a 1
For rolling 3 (1-2 pair) we either roll a 1 then get a 2 or we roll a 2 first then roll a 1, so 2 different possibilities
@@andrewnunes9148 i get it now, thanks
Hi guys! This problem is the easiest possible combinatorics example.
If you don't know how to solve this using discrete math, then go back to studying.
Dynamic programming comes way after this.
Enjoy your journey!
I made a formula to calculate that. It's not proved but it seems to work. (number of dice) * (sides of dice) - (wanted number to roll) + 1
Yeaah, no. You can check it quite fast. It favorises smaller sums. For example: using 2 dice, the most probable sum is 7, so according to your formula the amount of ways to achieve 7 is 2 * 6 - 7 + 1 = 6. That's correct, but if you plug in some smaller number than 7 the outcome should be always less than 6. So let's check it - 2 * 6 - 3 + 1 = 10. Ultimately saying that there are more ways to get a sum of 3 than to get a sum of 7. Hence 3 is more likely sum than 7. Which is obviously not true making this formula incorrect. Hope this helps😅
I feel like the algorithm is a little more complex
Not 100% sure but I think the formula for rolling the value v with d dice of f faces is
Σ(n=0 to floor((v-d)/f)) (-1)^n * ((v-6n-1) choose (d-1)) * (d choose n)
It seems to work for d
So you're saying there's 3 ways to roll a 4 on 1 die?
@@MichaelDarrow-tr1mn there're some undiscovered bounds for it ig.
How does the program know what dice are?
🎲
60 Million combinations should be able to be created, iterated over and destroyed in well under a second, though - even single threaded, we have over a billion cycles per second by now, after all 😁
yeah but multiple of them are calculating the same thing, so we use a lookup table to save on doing that. also here's a followup question: how many ways are there to get a 420 on 100 dice
long table meme aged well
Was expecting a binary tree for storing results. Oh well.
This could be an interesting challenge for ml. For example we could train an AI to take a sequence of 0s and 1s and attempt to predict the next bit/bits in the sequence. Then we can train another AI to generate a sequence of bits where the prediction AI can't find a a solution. We can then use the second AI to generate random numbers. After all randomness is about unpredictability.
D&D players if skill checks were D6 be like:
an average computer from 2015 wouldn't take more than 10 seconds to compute the combinations of 10 dice and count them up
Which isn't the point. In computer science, you should learn to recognize which problems grow exponentially and try to refactor them in a way that makes them easier. For instance, changing a problem whose solution takes 2^n time into a problem that takes n^2 (or better still log(n), and best possible linear on n)
@@ronaldbell7429 i'm aware but they said it would take several minutes on an average computer which makes no sense
@@obamagaming3802lets say it would take 10 seconds, if u add 5 dice, that is already 7000x more time than 10 dice, even 1 extra dice would make it go from 10 seconds to 1 minute
@@andrewnunes9148they're not talking about that, they're talking about how the average was stated to be several minutes but was actually less than 10 seconds
The word total is problematic. Result would be unambiguous.