How to Generate Pseudorandom Numbers | Infinite Series

แชร์
ฝัง
  • เผยแพร่เมื่อ 11 ต.ค. 2017
  • Viewers like you help make PBS (Thank you 😃) . Support your local PBS Member Station here: to.pbs.org/donateinfi
    What is a the difference between a random and a pseudorandom number? And what can pseudo random numbers allow us to do that random numbers can't?
    Tweet at us! @pbsinfinite
    Facebook: pbsinfinite series
    Email us! pbsinfiniteseries [at] gmail [dot] com
    Previous Episode
    How many Cops to catch a Robber? | Infinite Series
    • How Many Cops to Catch...
    Computers need to have access to random numbers. They’re used to encrypt information, deal cards in your game of virtual solitaire, simulate unknown variables -- like in weather prediction and airplane scheduling, and so much more. But How can a computer possibly produce a random number?
    Written and Hosted by Kelsey Houston-Edwards
    Produced by Rusty Ward
    Graphics by Ray Lux
    Assistant Editing and Sound Design by Mike Petrow
    Made by Kornhaber Brown (www.kornhaberbrown.com)
    Special Thanks to Alex Townsend
    Big thanks to Matthew O'Connor and Yana Chernobilsky who are supporting us on Patreon at the Identity level!
    And thanks to Nicholas Rose and Mauricio Pacheco who are supporting us at the Lemma level!

ความคิดเห็น • 576

  • @john_hunter_
    @john_hunter_ 6 ปีที่แล้ว +297

    It's going to be really awkward when Infinite Series, eventually ends.

    • @Bodyknock
      @Bodyknock 6 ปีที่แล้ว +52

      John Hunter Good point, the series won’t last an infinite amount of time and they only have a finite amount of time they can actually film. My suggestion is handle the videos as a supertask with each video taking half the time to make and coming out twice as quickly as the last one. That way they could release an infinite series of videos using only a finite amount of material. :)

    • @adblockturnedoff4515
      @adblockturnedoff4515 6 ปีที่แล้ว +9

      Good god. That is a brilliant idea to keep the show alive for infinite time. I love it.
      Only problem is the host and the technical staff need to be able to handle it (if people were able to do it then humanity would reach a whole new level).

    • @Bodyknock
      @Bodyknock 6 ปีที่แล้ว +21

      Sripad Kowshik Subramanyam Reminds me of the old joke about the chemist, the engineer and the mathematician who go vacationing in a cabin in the woods. One night the chemist wakes up and sees a fire on the stove. He quickly looks around, finds some inert powder in the cupboard and dumps it on the fire to put it out. The next night the engineer comes in and sees another fire on the stove (none of them are apparently great with kitchen safety). The engineer thinks for a bit about how much water he’ll need to dowse it, goes and grabs their water canteens and pours out just enough of them to douse the flame. The third night the mathematician comes in and sees yet another fire on the stove. He quickly grabs his notepad and starts vigorously writing down equations. A few minutes later he smiles, puts the pad down on the table and says “A solution exists, QED,” and goes back to bed. 😄

    • @adblockturnedoff4515
      @adblockturnedoff4515 6 ปีที่แล้ว +1

      Doug Rosengard good one.

    • @nibblrrr7124
      @nibblrrr7124 6 ปีที่แล้ว +6

      Who says their need to be infinitely many *distinct* episodes?
      Let n be the number of the last episode. Then every episode e_k with k>n is defined to be equal to episode e_n. Voilà, infinite series. :^)

  • @thechrisgrice
    @thechrisgrice 6 ปีที่แล้ว +165

    Kinda funny watching the video quality getting slaughtered at 2:21. Thanks to Tom Scott, I now know why!

    • @recklessroges
      @recklessroges 6 ปีที่แล้ว +33

      Well spotted. I was concentrating on the audio not the video so I missed Dr Houston-Edwards face pixelating. Tom Scott: TH-cam's cool uncle.

    • @CommaCam
      @CommaCam 6 ปีที่แล้ว +4

      Thanks for that reference. I was wondering what happened there.

    • @sumantopal558
      @sumantopal558 5 ปีที่แล้ว +4

      add more confeity add more confeity !! lol

    • @T33K3SS3LCH3N
      @T33K3SS3LCH3N 5 ปีที่แล้ว

      On a mobile screen at 1080p it's hardly noticeable. Never been this disappointed about a sharp picture.

    • @evilferris
      @evilferris 5 ปีที่แล้ว +3

      Kailei R m.th-cam.com/video/r6Rp-uo6HmI/w-d-xo.html found it!

  • @LightMonkeyHD
    @LightMonkeyHD 6 ปีที่แล้ว +60

    This right down my alley. Computer Science and Math. My goodness.

    • @Tadesan
      @Tadesan 5 ปีที่แล้ว

      She sings her siren's song!

    • @devinschlegel1763
      @devinschlegel1763 3 ปีที่แล้ว

      unfortunately this channel is gone :(

  • @stuartbently421
    @stuartbently421 6 ปีที่แล้ว +153

    PBS explains these crazy topics with such precision and clarity, expecially infinite series. kudos for an exellent channel!

    • @stuartbently421
      @stuartbently421 6 ปีที่แล้ว +3

      omg lmao this ego
      r/iamverysmart

    • @fossilfighters101
      @fossilfighters101 6 ปีที่แล้ว

      +

    • @fossilfighters101
      @fossilfighters101 6 ปีที่แล้ว +1

      Also this thread made me giggle~

    • @szimre95
      @szimre95 5 ปีที่แล้ว

      It is not that precise, it is stated in the video that radiation and things like that are random, while in reality they are not random, it is close to impossible to replicate and things like that, but it is not truly random. Maybe Veritasium or Vsauce had a video about true randomness where it is described a lot more precise. Even a service like random.org which collects electromagnetic background noise from antennas is not truly random, that noise has a source. I don't think true randomness exists, but we can get fairly close.

  • @itisALWAYSR.A.
    @itisALWAYSR.A. 6 ปีที่แล้ว +35

    I just put the numbers 0 - 9999 into a spreadsheet and ran the Middle Square Algorithm on them. A few curiosities arose, which signify reasons it's Bad News for purpose....
    1) There are 20 ways to produce a 0000 value. As 0000 has period 1, there are multiple ways to end up in this sinkhole.
    2) Nearly 39% of the seed numbers can NEVER be produced this way.
    3) A further 35% of sees numbers can only be produced by one other four-digit value
    4) The number 5625 can be generated twelve ways. 2500 can be generated eleven ways (including itself). You might expect a number or two out of 10,000 seeds to be very popular, but knowing what they are means they're very predictable.
    In reality, you're not producing random numbers [0,9999] : you're producing numbers [0,6109] with a ton of spaces and biases scattered within....!

    • @groszak1
      @groszak1 5 ปีที่แล้ว +4

      that's why a good pseudorandom number generator should be used and not a bad one

    • @rumisbadforyou9670
      @rumisbadforyou9670 3 ปีที่แล้ว +4

      MS is an old algorithm, not the best one. But in April of 2017 people have implemented a Middle Square Weyl Sequence PRNG. It is a modified version of MS and instead of [0;9999] produces [0;2^32-1] possible values. It fixes alot of problems of MS, and passes every statistical test. Just take a modulo of the returned value (for example mod 16384), and check the distribution. It should be uniformal.
      Note: `s` is the seed. To initialise `s` take a random even number and add 0xb5ad4eceda1ce2a9 to it.
      en.wikipedia.org/wiki/Middle-square_method#Middle_Square_Weyl_Sequence_PRNG

    • @simonmultiverse6349
      @simonmultiverse6349 3 ปีที่แล้ว

      Knuth (volume 2 of The Art of Computer Programming) has lots of analysis and practical algorithms for pseudo-random numbers. Simple recipe: pseudo-random numbers (one of his algorithms) followed by another neat procedure of Knuth's which is shuffling the numbers in an output buffer.
      Another book, "Numerical Recipes", refers to this. Numerical Recipes has references to Knuth. You can order the Knuth book and Numerical Recipes at any good bookshop. Beware: these are weighty books with a heavy price tag. Also look up "primitive polynomial modulo 2" which gives you a sequence which only repeats after a looooooooooong time. It's fairly easy to find tables of these, i.e. someone has calculated some of these "primitive polynomials modulo 2". (Knuth describes these.) Pick one which has a long repeat period, and the result is a trivially-simple recipe to make a stream of bits which repeats after an amazingly long time, i.e. longer than the age of the universe. Using this as the basis of a random-number generator gives a very fast and simple algorithm.
      Look at tests of random number sequences: if you think you have a random number generator, there are ways of testing it, so hook up your RNG (Random Number Generator) to one of these tests to see how it behaves. The web has lots of information on this. Wikipedia, for example, has descriptions of ways of testing random numbers. So do other mathematics-related web sites. The tests are generally easy to perform.
      I've also experimented with prime numbers. Find a sequence which can be quite simple, based on addition modulo the prime number, and (importantly) repeats with a period of your favorite prime number. Have several of these in parallel, each based on a different prime number, and the period of the resulting generator will be something like the product of all of the prime numbers. If you have 6 prime numbers which are of size about 1,000,000 then the composite behaviour will repeat with a period of about 1,000,000^6.

    • @MrNottocd
      @MrNottocd 3 ปีที่แล้ว +2

      AFIK - The best random number generators are based on cryptology functions. Or, one could just as easily say that cryptology methods are based on random number generators, since the whole point of cryptology is to take a message and create a patternless sequence of numbers.
      This video barely scratched the surface of random number generators. I asked a predoctoral math teacher if she had considered random number research for her doctoral thesis. Her reply was that she should have. There was a lot of research opportunities available. And that was maybe 20 years ago, so more research must still be available.

  • @rednax3788
    @rednax3788 6 ปีที่แล้ว +22

    Random-number-generating algorithms are so interesting.

  • @Kram1032
    @Kram1032 6 ปีที่แล้ว +114

    It's true that a pseudo random generator must eventually repeat. However, it's not true that it must happen as soon as it hits the same output again.
    SRGs can have hidden state. Stuff inside the black box that isn't part of the output. That way they may effectively loop through all possible outputs not in just one way but, in fact, potentially in all possible orders. Similarly you can build them such that they occasionally spit out the same value twice in a row without falling into a loop where they spit out ONLY that one value.
    They have to repeat only once their entire state (hidden state plus output) is hit again. And that could take a LOT longer than hitting every possible output once.

    • @EpicFishStudio
      @EpicFishStudio 6 ปีที่แล้ว +6

      adding irrational number (or in actual computer case, very very accurate decimal one) to the seed on every step will cause the cycle be so long the existence of humanity will come to end faster.

    • @Kram1032
      @Kram1032 6 ปีที่แล้ว +4

      Reaaaally depends on the rest of your setup. But yes, irrational numbers are a nice source of entropy.

    • @OskarElek
      @OskarElek 6 ปีที่แล้ว +5

      You're right, but that's just deferring the problem - if you had N non-hidden states and H hidden ones, then your period is going to be (at best) N*H, so you're essentially back to square one.

    • @Kram1032
      @Kram1032 6 ปีที่แล้ว

      I don't really see a problem here...
      I just pointed out that a PRNG can through the same outputs a *lot* of times before starting to repeat.
      Of course if you want to go through all possible sequences you'll need to have H*N >= N! or, if you want to allow for arbitrary duplicates, it ought to be H*N >= N^N. That's quite a tall ask, usually. (that's just numerically. You'll also want to satisfy some properties to actually generate the desired sequences)
      But why stop there? What if, not only you want to run through all permutations, but you want to permute the order you run through the permutations?
      The numbers obviously become insanely huge long before we even get to that point. Too large to contain on a computer. (By N=59, N! exceeds the estimated number of atoms in the observable universe, N^N clears that by 48)

    • @kickmonlee3390
      @kickmonlee3390 6 ปีที่แล้ว +2

      Hidden states or not once your back to the same "exact" state your started with you are bound to repeat

  • @puncheex2
    @puncheex2 6 ปีที่แล้ว +3

    Knuth in his book pointed out that a randomly generated random number generator does not necessarily generate good random numbers. He discusses an algorithm he designed using things like a human's height and other imponderables. He wrote the code for it and set it to work. Though it had a space of 32 bits (if I remember correctly), if cycled with a period of 12.

  • @verdatum
    @verdatum 6 ปีที่แล้ว +6

    This is such a massively important topic. This issue alone has contributed to literally hundreds of thousands of dollars of my lifetime income. You've done a fantastic job of explaining a topic that, in interviews, I've found software engineers with over a decade of experience manage to get completely wrong.

    • @konstantin7596
      @konstantin7596 ปีที่แล้ว

      What are you doing? :)

    • @verdatum
      @verdatum ปีที่แล้ว

      @@konstantin7596 implementing telecom encryption from spec, writing and verifying secure code through static analysis, reverse engineering, and automated testing, that sort of thing.

  • @Trevortni
    @Trevortni 6 ปีที่แล้ว +2

    Inverse transform sampling is great in theory; in fact, when I needed to generate numbers from a distribution a few years back, it was my first instinct despite not knowing what to call it at the time.
    Turns out, though, that - well, I'll let Wikipedia tell you: "For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution). As a result, this method may be computationally inefficient for many distributions and other methods are preferred."
    And that's how I discovered the Box--Muller Transform. Marvelous bit of work that actually will give you a normal distribution.

  • @Roxor128
    @Roxor128 6 ปีที่แล้ว +2

    Ooh! That drifting numbers background really did a number on the video quality.
    No pun intended.
    Tom Scott did a video on the topic called "Why Snow and Confetti Ruin TH-cam Video Quality" (at least I think that was the title) which gives a basic explanation of the phenomenon if you're interested.

  • @fanrco766
    @fanrco766 6 ปีที่แล้ว +40

    This was a great video, but i feel like the definition of randomness was a little vague. It would be great to see a video on Kolmogorov complexity to get a more in-depth insight on randomness. Im loving the computer science direction of this channel and i can't think of a better topic to cover!

    • @Uejji
      @Uejji 6 ปีที่แล้ว +13

      I'm sure the definition of randomness was vague because the video focused chiefly on psuedorandomness. However, it's pretty difficult to define randomness.
      When I took mathematical statistics for my BS in mathematics, we focused on randomness being the tendency of the image of a random variable to resemble the composition of its PDF/PMFs as the number of samples grows arbitrarily large.
      For instance, the PMF of flipping a fair coin is Pr(X=x) = 1/2, x∈{Heads, Tails}, so if a flip of a fair coin truly is random, then flipping it an arbitrarily large number of times should give a 1/2 chance for each side.
      However, if there were another process involved, such as having to choose one of four coins with replacement each flip, where one coin is fair, two gives give heads 75% of the time and the last gives tails 75% of the time, then if the process truly is random, the outcome should resemble the composition of those functions.
      Additionally, as mentioned in the video, we should expect randomness to not exhibit any identifiable pattern. However, this isn't immediately disqualifying of something as being random. For instance, I could roll a d6 18 times and get 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6. This is extremely unlikely, but can happen under a truly random system. Ultimately, though, this pattern should not continue for arbitrarily large sequences.
      Also, again mentioned in the video, an important aspect of randomness is irreproducibility. You should expect no two random processes to give the exact same output, though if you are extremely unlucky (or extremely lucky!) you might have to observe for a bit before the outputs diverge. In this last season of Doctor Who, there was an episode where people are living in a simulated reality, and they learn of this because any group of people calling out random numbers at the same time always calls out the same sequence of numbers because, as explained in the episode, computers can't generate truly random numbers (quantum computers should allow true random number generation, but that's beside the point).
      Ultimately, though, randomness by its very nature is difficult to precisely define. It is easier to identify what *is not* random and trim it away, like an asymptotic game of mathematical jezzball.

    • @davidshi451
      @davidshi451 6 ปีที่แล้ว

      John D. Cook wrote an interesting blog post on the topic: www.johndcook.com/blog/2012/04/19/random-is-as-random-does/

    •  6 ปีที่แล้ว +2

      Why not define randomness in relation to an observer? If an observer can't predict the sequence, then the sequence is random for that observer. That definition does away with the whole problem.
      The old "true randomness" requires a sequence to be unpredictable to all possible observers. That is problematic, as we don't know what observers there can be. If there is a god, he would be an observer?

    • @fanrco766
      @fanrco766 6 ปีที่แล้ว +2

      Lars Pensjö This is very similar to Kolmogorov Complexity. A series would be considered random if the shortest program that could generate the series is the length of the series plus a constant

    •  6 ปีที่แล้ว

      Interesting, I will look it up!

  • @cmilkau
    @cmilkau 6 ปีที่แล้ว +12

    1. The state of a PRNG can be larger than the last bits of randomness it generated.
    2. Cryptographically secure pseudo-random numbers cannot be distinguished from truly random numbers with limited resources, so unless your program is galactical size it is guaranteed you won't notice a difference. CSPRNG are typically significantly slower than PRNG in general.
    3. The period of the most used Mersenne Twister is much larger than 2^32.

    • @rmidthun
      @rmidthun 6 ปีที่แล้ว +2

      The Mersenne Twister is really generating 19937 bit numbers. The code only gives you 32 bits at a time, giving the appearance that the numbers are repeating within a cycle but that isn't the real size of the output. If you consider the actual output value, then the period is exactly 2^19937 - 1 which is what she meant.
      If you consider all the inputs to the generator, then there can be no hidden state. If there was state that wasn't dependent on the input, then you wouldn't have a PRNG, since it could produce different outputs for the same input. With these technical considerations, what she said is correct, you can't cycle longer that 2^outputbits. It is just a bit more nuanced.

    • @DavidJohnston_deadhat
      @DavidJohnston_deadhat 6 ปีที่แล้ว +2

      The speed thing rather depends on your implementation constraints. For example a CSPRNG based on a block cipher in CTR mode can be implemented in a parallel form that can be extended as far as is required to get the speed that is required. For many non cryptographic RNGs, like for example XORSHIFT or PCGs, this is not the case. The CSPRNG in most modern desktop and server CPUs is designed to exploit this parallelism. The worlds fastest production RNG is a CSPRNG that is frequently reseeded (0.5 to 1 millions times a second). I noticed that the video states that the Mersenne Twister is the gold standard, which is laughably false. Also the stated procedure for random fractions doesn't yield a uniform distribution when tried with floating point arithmetic. Especially if you do the multiply before the divide pushing the first term into the lowest resolution end of the floating point representation. It's easy enough to make uniform floats, but the method in the video is simply wrong.

  • @missprizm
    @missprizm 6 ปีที่แล้ว +1

    Great video. I love when it gets more applied or computer-science-y 😊

  • @vaisuliafu3342
    @vaisuliafu3342 3 ปีที่แล้ว

    Complete and to the point! Great video.

  • @OskarElek
    @OskarElek 6 ปีที่แล้ว +1

    One thing I'd add to this (very good) video is that uniform random numbers, besides having a constant probability density, have to be also *independent*. So every number has to be equally likely regardless what preceded it. Congruential generators might have a hard time achieving this since they essentially loop over the domain defined by M - and it's of course heavily dependent on the chosen A and C. Selecting primes for these parameters is usually a good start...

  • @shorttravelvlogs
    @shorttravelvlogs 3 ปีที่แล้ว

    this is the best explanation I have ever seen for random number algorithms

  • @rafaelr.2228
    @rafaelr.2228 4 ปีที่แล้ว

    65 days quarentined. frist weed i get, i came to watch your videos. Amazing. I understand the Maths, but not why I'm doing this to myself. Congratulations on the video.

  • @claytonsingh
    @claytonsingh 6 ปีที่แล้ว +3

    I think its important to point out that the internal state of a PRNG is not required to be the same size as the output value. In any "good" implementation your period will be much larger then the resulting set. IE using a 64bit (or larger) internal state and returning a 32bit value. It is also important to note that many PRNGs apply functions to the output bits in such a way to protect that internal state from leaking out, such as a hash.
    X_n+1 = (X_n * a + c) % 2^64
    return hash(X_n) & (2^32 - 1)

  • @unsaturated8482
    @unsaturated8482 3 ปีที่แล้ว +1

    ms. kelsey houston edwards you're certified amazing.

  • @tymothylim6550
    @tymothylim6550 2 ปีที่แล้ว

    Thank you very much for this video! It was very clear and informative for me :)

  • @phlsnst5882
    @phlsnst5882 6 ปีที่แล้ว

    Thank you so much

  • @kenji8763
    @kenji8763 6 ปีที่แล้ว

    I love Infinite Series but I gotta say, the Comp-Sci related videos are above and beyond spectacular

  • @kamingfung8077
    @kamingfung8077 6 ปีที่แล้ว +1

    Good job! You explained the idea clearly and precisely. One thing I am still wondering is why the inverse distribution of uniformly sampled cumulative probability densities of a normal distribution becomes a normal distribution again. You have given us a very good example using the heights of US women in the video. I think I will do some more research to find a mathematical proof to convince myself.

  • @t5alx136
    @t5alx136 3 ปีที่แล้ว +1

    Finally a complete explanation as to why pseudo random exists :D

  • @Algebrodadio
    @Algebrodadio 6 ปีที่แล้ว +1

    Persi Diaconis does an excellent lecture series on what it means to be random.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 2 ปีที่แล้ว

    Her videos are always so well made

  • @zirkjalheim
    @zirkjalheim 6 ปีที่แล้ว

    I love the show! Keep it up guys and galls!
    Although the reasoning for obtaining a normal RV from a uniform RV follows from the inverse transform method (as you've illustrated), I'm a bit scared that this might be advocating its use for this case rather than the Box-Muller transform in order to obtain normal variates, which is much easier (and safer) to implement on a range of platforms and in fact much more elegant! (Save for the fact that you might be throwing an unneeded normal RV away or donating it to charity, but who in this simulation driven world only requires a single normal RV - am I right?)
    For the Inverse Transform demonstration I would've used a Pareto / exponential distribution (granted, it would not produce nice S curved graphs as the normal CDF does, but it does give the opportunity of quickly doing an on-screen algebraic manipulation to arrive at the desired RV, which always makes my heart sing :-) )

  • @pierreabbat6157
    @pierreabbat6157 6 ปีที่แล้ว +2

    Mersenne Twister isn't a linear congruential generator. It's a linear feedback shift register.
    For the normal distribution, one normally does not use the inverse distribution function, because that's not expressible in closed form. Instead, one generates two normal variates at once, because the distribution function of the radius (in polar coordinates) of a 2D normal distribution is in closed form. There are at least two ways of doing this.

    • @jesuslovespee
      @jesuslovespee 6 ปีที่แล้ว

      Pierre Abbat, it actually is essentially an lcg, but has a tremendous amount of state.

    • @ge48421
      @ge48421 6 ปีที่แล้ว

      Couldn't agree more, but morons might say No, it’s based on a shift register, not an LCG.

  • @otakuribo
    @otakuribo 6 ปีที่แล้ว +2

    This video's topic was _so random!_

  • @dimitrisgiannopoulos3824
    @dimitrisgiannopoulos3824 6 ปีที่แล้ว +2

    First time i've seen a video this interesting that I couln't focus on the content because the speaker was too beautiful.

  • @kyoung21b
    @kyoung21b 6 ปีที่แล้ว +6

    Though storage was briefly alluded to, the statement that using a “natural” sequence (e.g. via a Geiger counter) isn’t repeatable, isn’t strictly correct, e.g. if one stores the sequence thus generated. While it’s obviously more efficient to use a pseudo-random generator, storing and retrieving a sequence that’s many times the period of a typical pseudo-random generator (e.g. 2^32) isn’t all that impractical in this age of relatively fast storage. And some simulations do require much longer sequences than the periods of typical pseudo-random generators.

    • @Kram1032
      @Kram1032 6 ปีที่แล้ว

      Ok so this might also be nitpicky but a parabola may follow a quite a bit more complex parametrization than that.
      For instance:
      a / b² x² - sin(2α) (x-x0) + (1 - cos(2α)) (y-y0) = 0
      This is the parabola you get if you have a constant gravitational field pointing down with strenght "a" (if a is negative it points up instead), and you throw a ball with starting velocity "b" in a direction dictated by "α" when you at first stand on "(x0, y0)".
      It includes all 2D parabolas that are aligned with the x and y axis. (In fact it has more parameters than necessary. You could do the same with y = a x² + b x + c) But of course you could also want freely rotated parabolas (which would involve an extra parameter giving that angle) or you could look parabolas in higher dimensional space...

    • @kyoung21b
      @kyoung21b 6 ปีที่แล้ว

      LegendLength - maybe nit picky but try to estimate e.g. the likelihood of generating a Higgs at the LHC via the standard model using, e.g. the Merssene twister - only pointing out that when a sequence longer than the period of the standard pseudo-random generators is required one can in fact generate it and also have repeatability via storage and that can be done feasibly.

    • @ineednochannelyoutube5384
      @ineednochannelyoutube5384 6 ปีที่แล้ว

      Afaik, since prng needs a seed seuence to extract entropy from, you must have a cache of real random inputs to run one. Windows supposedly does this by logging assorteed junk data like time keystrokes mous pointer position and various network info at predetermined times into an ever expanding cache of just random strings.

  • @Lorem_youtube
    @Lorem_youtube 6 ปีที่แล้ว

    Nice DIN font choice, fits in nicely.

  • @tonysellu2864
    @tonysellu2864 6 ปีที่แล้ว

    great! worth a second watch to really understand pseudo RNG for newbies like myself

  • @Huntracony
    @Huntracony 6 ปีที่แล้ว +1

    Another nice thing about picking 2^32 as your m is that 32-bit computers would do it automatically. That's just how many bits they have, so any bits after that would just get lopped off. No additional computation required.

  • @megan530
    @megan530 4 ปีที่แล้ว

    Thank god for this video... Teaching me this subject way better than my college professor!

  • @bhaskartripathi
    @bhaskartripathi 3 ปีที่แล้ว

    First of all - Thanks for making such great videos for free. I have a question - How can we generate non-uniform distribution between [0,1]?

  • @marcop.h.1404
    @marcop.h.1404 3 ปีที่แล้ว +1

    Clear and precise explanations, thank you! Could you show me the literature used for this video or recommend some further literature?

  • @williamscharpf294
    @williamscharpf294 6 ปีที่แล้ว +2

    Fun side note, in the 1960s and 1970s IBM had a random number generator call “RANDU”. Donald Knuth, the author of the seminal book “ The Art of Computer Programing”, call it “Truly Horrible”. Once the flaw came out, it called into question a large number publish papers that used this function for simulations.

    • @jasondoe2596
      @jasondoe2596 6 ปีที่แล้ว

      Interesting, thanks!

    • @simonmultiverse6349
      @simonmultiverse6349 3 ปีที่แล้ว

      " in the 1960s and 1970s IBM had a random number generator call “RANDU”. "
      Yes, I think IBM's reply to a query about that was "We guarantee that each number individually is random, but we cannot guarantee that more than one of them is random."

  • @StefanGliga48
    @StefanGliga48 6 ปีที่แล้ว +2

    You should've talked about Linear Feedback Shift Registers... And output bias/properties. But this is still a great video.

    • @pierredufresne996
      @pierredufresne996 2 ปีที่แล้ว

      Ah, Linear Feedback Shift Registers! The Playstation 2 had them on the VPU1 (sort of a vector shader before there were GPUs) and I naively used them to create x,y,z coordinates for snow and rain sprites. It looked fine as long as just a few dozen were generated, but as more were added a pattern would show; all points confined to a few slanted columns. Caused a lot of head scratching; ended up using LFSR for x, a Linear Congruential Generators for y and for z something like z' = frac( z + GoldenRatio). Thought it was a hardware issue but found out later it was a problem inherent to LFSR.

  • @NickMoore
    @NickMoore 6 ปีที่แล้ว

    Very cool, I had never heard of inverse transform sampling before. I can think of a few programming applications where it would have been useful. Thanks!

  • @johngay8416
    @johngay8416 6 ปีที่แล้ว

    I really want to watch these videos, but I'm waiting for the series to finish so I can binge watch the entire series.

  • @pingviinishalalalaa
    @pingviinishalalalaa 6 ปีที่แล้ว

    This was cool topic! Can you do more videos on random or maybe even series of videos about random in mathematics and physics ^^

  • @alword
    @alword 4 ปีที่แล้ว

    Love your videos!🤗

  • @hieronymusnervig8712
    @hieronymusnervig8712 6 ปีที่แล้ว +1

    Someone should recommend this video to the guys who 'programmed' the randomizer for my car music player

  •  3 ปีที่แล้ว

    That was a great video!

  • @thelonelynxx
    @thelonelynxx 5 ปีที่แล้ว

    That was awesome.... Thank you so much.

  • @lightningbolt9155
    @lightningbolt9155 หลายเดือนก่อน

    I think it’s diabolical that this is the _only_ video I found that _actually gives a valid equation._ Like, I’m looking up random number generator, why aren’t you giving me one?

  • @rayredondo8160
    @rayredondo8160 6 ปีที่แล้ว +1

    I actually came up with a random algorithm called XSR for XOR with Shifting Right. It passes many random tests, and is faster and even better than the Mersenne Twister when stacked up.

  • @johnchessant3012
    @johnchessant3012 6 ปีที่แล้ว +1

    In the case of the linear-congruential generator, you could increment the value of c after every pass, so with the right values of m, a, and c, you could have a period of m^2 instead of m.

    • @groszak1
      @groszak1 5 ปีที่แล้ว +1

      but there can only be m states so the period cannot be greater than m

  • @KemalPiro
    @KemalPiro 6 ปีที่แล้ว +1

    There were a lot of issues regarding pseudo random generators (Java 2013, Sony 2010, OpenSSL 2008), but that doesn't mean hardware generators are better and humans definitely aren't. Hardware generator even those which uses radiation reader could be "hacked" by putting another device near it (ofc it requires physical access to that machine but that's always the case). Numbers/phrases generated by humans are even less random that simple PRNG. It could be determine base on that person knowledge, habits, friends how probabilistic is that he will chose exact number/character in sequence.

  • @jc35334
    @jc35334 6 ปีที่แล้ว +2

    Please do more applied stats videos like these. This was amazing!

  • @nickkei2838
    @nickkei2838 6 ปีที่แล้ว +9

    No reference to why it is called Mersenne Twister? 2^19937-1

    • @pbsinfiniteseries
      @pbsinfiniteseries  6 ปีที่แล้ว +10

      Good point. And just to unpack your comment a bit for other people: The period of the common Mersenne Twister generator is 2^19937-1, which is called a Mersenne Prime.

  • @a1b3a3c14nbcv
    @a1b3a3c14nbcv 4 ปีที่แล้ว +2

    how do computers pick the a, m, and c for the linear congruential generator? do companies all use the same one? do they use a different pseudorandom number generator to pick them?

    • @pierredufresne996
      @pierredufresne996 2 ปีที่แล้ว

      WIkipedia has a thorough page on Linear Congruential Generator

  • @NikhilGuptadtu
    @NikhilGuptadtu 2 ปีที่แล้ว

    Awesome video🔥

  • @petemagnuson7357
    @petemagnuson7357 6 ปีที่แล้ว +3

    Could the Cumulative Distribution Function trick (mapping from uniform [0,1] to some other distribution) be expressed in matrix multiplication? Not sure what benefit that could have, but it seem potentially powerful.

    • @DavidJohnston_deadhat
      @DavidJohnston_deadhat 6 ปีที่แล้ว +1

      It depends on the curve. Some curves are not analytic and so need iterative numerical methods. So the process is quite inefficient.

  • @mysterymeat586
    @mysterymeat586 5 ปีที่แล้ว

    I wrote a random number program in an old HP 41CX calculator. It was simple. Extract the 1/10th of a sec digit, set it next to the previous extraction in the alpha register, and when you have the number of digits wanted, execute AMUN and presto, a real random number appears in the X register. Requires a, or several, random execution or presses of the key assigned to extract the digit from the clock.

  • @jameshughes6078
    @jameshughes6078 5 ปีที่แล้ว +1

    I'm showing this to all my coworkers. Really good way to explain how things work behind the scenes for all those different distributions in various random modules/libraries

  • @jonathanseamon9864
    @jonathanseamon9864 6 ปีที่แล้ว +1

    How would you go about proving that a given SRNG is uniformly distributed? how does the Mersenne twister improve upon a linear congruential generator? What are some common randomness tests? I feel like there's a lot this video just glossed over; some further reading would be appreciated.

  • @gejyspa
    @gejyspa 6 ปีที่แล้ว

    One problem alluded to, but not mentioned explicitly with pseudorandom vs random numbers, is that truly random integer sequences could have exactly two (or "n") identical numbers in a row (for example with p=1/1e6 for rolling a thousand-sided die twice) , whereas for pseudorandom sequences, if there are two identical numbers in a row, there must be ONLY that number forever after. Therefore using pseudorandoms pairwise (over the entire range of possible numbers) would never give you the same distribution you would see in actual random sequence.

    • @DavidJohnston_deadhat
      @DavidJohnston_deadhat 6 ปีที่แล้ว +1

      Only if the the state size = the output size. This is not true of sensible PRNGs.

  • @meri7108
    @meri7108 6 ปีที่แล้ว +1

    That's weird, I was just thinking about this this morning. Perfect!

    • @Roxor128
      @Roxor128 6 ปีที่แล้ว

      And I was watching a video from GDC of some game devs talking about procedural content, which usually makes use of pseudorandom numbers just this afternoon.

  • @Mahoney10
    @Mahoney10 6 ปีที่แล้ว

    Hi, I’m trying to simulate gas particles in a 2D box, and would like to generate velocities from the 2D Maxwell-Boltzmann distribution. To do this I’ve changed the probability density function into the cumulative density function, which I found to be -e^(-av**2), with a = m/2kT. However plotting this function against velocity I get probabilities between -1 and 0, which can’t be right. I was wondering if maybe you knew either why the function goes from -1 to 0 and/or if there is something I could do to ‘fix’ it. Thank you in advance and great video btw!

  • @alexandrugheorghe5610
    @alexandrugheorghe5610 6 ปีที่แล้ว

    In the case of banking systems for example with tokens, the seed is the current time since we assume that the laws of physics at the surface of the Earth remain constant such that if your phone clock and the bank systems clocks are synchronized you both get the same generated number through the implemented algorithm. If however one would go a bit higher, let's say low Earth orbit, then the time will differ due to general relativity. You will be unable to login to your bank account from space if you don't correct for the time difference due to earth gravity.

  • @girlflash
    @girlflash 6 ปีที่แล้ว +1

    (Weird comment from game programmer on getting away with sampling a PRNG with a small period. Ignore as needed :P)
    Interesting thing WRT period for procedurally generated content (eg game levels) is it doesn't necessarily have to be particularly large. You are advancing the PRNG through it's sequence every time it is sampled, but you can sample it in such a way that the repetition is not apparent to the player.
    For example, if we have a small period PRNG determining the heights of tiles in a level, it will become apparent if the period is significantly less than the number of tiles generated. But if we generate a tile's height and then before we sample for the next tile we sample the PRNG a second time or more ("how many blades of grass are on this tile?" or "Shall we place a monster here?" - "Which direction is the monster facing?") by the time we get to calculating the next tile's hight, the PRNG could be at a (psuedo) random point in it's sequence. The sequence repeats but less noticeably for something the playing will recognise (tile heights) unless the player can view a lot more of the generated level.
    This can work against us too though; a program sampling a PRNG can request random numbers in step with it's period, so every tile would have the same height, and the same number of blades of grass, the same monster spawned. Or if the program made periodLength/2 samples for a tile, every ~other~ tile will be the same.
    This is largely irrelevant these days since most computers can comfortably generate very long pseudorandom sequences, and do it very quickly. But it is a trick that can be used if careful and when computation power is extremely limited :)

  • @phar0n
    @phar0n 6 ปีที่แล้ว

    This channel is perfect.

  • @eofirdavid
    @eofirdavid 6 ปีที่แล้ว

    You had a video about normal numbers half a year ago. These are exactly the numbers x in [0,1] such that the sequence x_{n+1}=10 x_n mod 1 equidistribute in [0,1]. I'm not sure how close is the connection to pseudo random numbers, but it is true that almost every number in [0,1] is normal (so it is a "good seed").

  • @mdrasidali4107
    @mdrasidali4107 4 ปีที่แล้ว

    Ma'am, please tell that if I use a very big seed value(assume random) and generate a random number (which is of a smaller size than the seed) out of it, then then what will be the entropy of the random number? its upper bound will be the entropy of seed? or its randomness is limited to all its possible value?

  • @KakiStinson
    @KakiStinson 5 ปีที่แล้ว

    Hello this is very helpful, I wonder if you can send some documents about random number generators and validity tests of the RNG's, because I'm doing my these of master in University
    thank you for your work and keep going.

  • @Mothuzad
    @Mothuzad 6 ปีที่แล้ว +1

    Well, I was planning on doing Perlin Noise today anyway. I guess I'll write my own PRNG too.

  • @valentinsarmagal
    @valentinsarmagal 3 ปีที่แล้ว

    excellent video

  • @fraserpaine5783
    @fraserpaine5783 6 ปีที่แล้ว +2

    Was it just a coincidence that the probability density function looked like the derivative of that cumulative distribution of heights? It appeared to be exactly that and I think that makes sense but I've not seen such an elegant link between statistics and calculus before that.

    • @DrunkenUFOPilot
      @DrunkenUFOPilot 6 ปีที่แล้ว +1

      PDF is always the derivative of the cumulative function. The exact opposite of a coincidence!

  • @fossilfighters101
    @fossilfighters101 6 ปีที่แล้ว

    This was a very good video.

  • @leelavathi7921
    @leelavathi7921 5 ปีที่แล้ว

    Very useful video 👍 how is randomness used in entity authentication

  • @datas_cat
    @datas_cat 6 ปีที่แล้ว

    9:56 the Uniform distribution: wouldn't the value in the y-axis be a value y, such that y approaches 0?
    Not y=1, because there would be an uncountably infinite values all with probability equal to 1?
    Or rather in your example y=10,000?

  • @trdi
    @trdi 6 ปีที่แล้ว

    What is "truly" random in every day life is more of a philosophical question than a mathematical one. If a poker website has a RNG that combines seed from micro temperature fluctuations, internet traffic, number of players on website and other variables, I'm going to call it random for the purpose of me trusting the website to deal me cards "randomly".

  • @joshuataylor2105
    @joshuataylor2105 6 ปีที่แล้ว

    I hope there's a follow up episode on some of the tests used to gauge a PRNG's statistical randomness. Looking at the 3-dimensional planes generated by LCGs is a pretty direct way to see how bad they are (in that respect).

  • @famitory
    @famitory 6 ปีที่แล้ว +1

    Lots of RNG engines for programs being activley interacted with (say, a game) use user input as part of the sequence. Is this still psudeorandom, or does the realtime user input make it totally random?

  • @SPrestwood
    @SPrestwood 6 ปีที่แล้ว

    It seems to be a fallacy when the spiraling lengths (1,2,..infinity) ate said to be a cop win when the length of the last one is - infinity. At what point is the overwhelming exception recognized?

  • @philliberatore4265
    @philliberatore4265 6 ปีที่แล้ว +1

    This demonstrates the difference between a mathematician and an engineer. The problem was defined with 2 significant digits. The answer can only possibly contain 2 significant digits. You don't pick up more. So, the baseball rulebook is correct. The correct answer to this problem is "12". The trailing ".019" is meaningless.
    Great videos, sir. Keep them coming!

  • @LimeGreenTeknii
    @LimeGreenTeknii 6 ปีที่แล้ว

    The cycle repeats every time you get the same number? Wouldn't it make more sense to have the algorithm be based on the last several numbers instead of the last one? For example, in the linear congruential generator, the increment could be changed to the seed number we got on the previous turn, and the modulo could be the seed we got two turns ago.
    That leads to the problem of how to actually start the algorithm (How can you start something that relies on previous turns?) but at the very most, this just means we need multiple seeds, and the modulo, increment, etc. become their own seeds.

  • @enisheadpay
    @enisheadpay 6 ปีที่แล้ว

    at 8:20 you say that the distributions of the random number generators is uniform. But isn't this only the case when the period of the sequence is equal to the size of the interval? If so then the Middle-Square Algorithm will never have a uniform distribution (the maximum period was less than the interval), and the Linear Congruential Generator will only have one when "good" values are chosen (such that the period=m). I would be interested to see another episode going into more detail about how these "good" values are found, and about the actual distribution of the Middle-Square Algorithm (as it is not uniform).

  • @chrisherrick2397
    @chrisherrick2397 6 ปีที่แล้ว +1

    Can Inverse Transform Sampling be used to transform a different distribution into a uniform distribution?

  • @ahmedalmutairi4056
    @ahmedalmutairi4056 6 ปีที่แล้ว +1

    Thanks indeed,, could we have an episode on extractor functions? like how to simulate a fair dice using unfair one.

    • @jasondoe2596
      @jasondoe2596 6 ปีที่แล้ว

      Ahmed Almutairi, oh, that's a great topic. Yes please!

  • @SellusionStar
    @SellusionStar 5 ปีที่แล้ว +3

    You're head always moves like a typewriter... :D this distracts me so hard. Is it because of reading from a prompter?

  • @tady159
    @tady159 6 ปีที่แล้ว

    I have seen some pseudo random number generators that use a bell-curve distribution, given its mean and the standard deviation. However, I don't quite understand how they generate the numbers, because they seem to be unbounded, differently from the first two algorithms shown in this video.

  • @kwinvdv
    @kwinvdv 6 ปีที่แล้ว +3

    Are there applications in which a pseudo random sequences are required that never repeat. One sequence that never repeats would be a Thue-Morse sequence, however it does not look very random. However one could use it as a seed to make it look "more random". Or are there better alternatives for this?

    • @kwinvdv
      @kwinvdv 6 ปีที่แล้ว

      LegendLength It is also relatively simple to get an extremely long period by using two long sequences, whose lengths are relatively prime (share no common factor other then one). And then combine the two, for example by addition. The result should have a period equal to the product of the two periods of the original sequences.

    • @drdca8263
      @drdca8263 6 ปีที่แล้ว +2

      No deterministic procedure which can be implemented in a physical system and has a finite size can give an infinite sequence that never goes into a loop, because there are only so many possible physical states for the implementation, by the bekenstein bound.

    • @kwinvdv
      @kwinvdv 6 ปีที่แล้ว

      A counter example would be a Thue-Morse sequence. For example if you start of with some finite sequence X consisting out of numbers between 0 and 1 and not all equal to 0.5 (and maybe some other conditions), then you can recursively extend this sequence such that it is never repeating itself using X:=[X, 1-X]. However this sequence probably does not have very good properties regarding appearing random.

    • @drdca8263
      @drdca8263 6 ปีที่แล้ว

      My point was in the "and has a finite size". Of course, if you have an unlimited amount of memory, then you can make a program that makes a sequence that never repeats, by doing something as simple as 01001000100001... etc. . But for any machine with a finite physical size, it has finitely many possibly physical states (because of the Bekenstein bound), and therefore must eventually repeat.

    • @ineednochannelyoutube5384
      @ineednochannelyoutube5384 6 ปีที่แล้ว

      +drdca I was wondering what you exactly meant by that, because I am absolutely no expert, but I am rather certain that a simple number line will never repeat, and it can be sescribed with xn=xn-1 + 1. But I guess memory couldnt hold the numbers after they exceed its size.

  • @AkashKumar-vc5yl
    @AkashKumar-vc5yl 3 ปีที่แล้ว

    Inverse Transform Sampling should also generate pseudo normal distribution because probability at each point is different but in this case, it will be constant for given ranges however short we go.

  • @01eocoe10
    @01eocoe10 6 ปีที่แล้ว +1

    This video really clarified computer "randomness" for me. I always wondered why they didn't use a hardware solution or query the last few digits on the second on an atomic clock. It makes a lot of sense that for the purposes of debugging this would be bad. Thanks!

  • @joaoaugusto7049
    @joaoaugusto7049 3 ปีที่แล้ว

    what variables do i have to use to get the uniform [0, 1] in the linear congruential generator ?

  • @recklessroges
    @recklessroges 6 ปีที่แล้ว

    As a computer scientist this is unbelievably important.

  • @ilmbrk6570
    @ilmbrk6570 6 ปีที่แล้ว

    If you have X_n = f(X_n-1) and f is the function to define the new psuedorandom number could we just change it to X_n = f(X_n-1 * n) to avoid cycling. Or why dont we slice up sqrt(2) into packs of four and use them as a random sequence?

  • @jeffreygustafson5633
    @jeffreygustafson5633 5 ปีที่แล้ว +1

    So why is it that the shuffle feature on apple music/ spotify is so bad? I have well over a thousand songs downloaded but will hear the same 10-20 songs played several times and never hear the majority of songs on my phone.

    • @majeck
      @majeck 5 ปีที่แล้ว

      A few years ago apple had true randomness in their shuffle function but people complained that sometimes the same song being played twice(randomness can repeat itself, you can get to continuous 5s on a die and it's still random), so they removed the feature with who knows what algorithm.

  • @NidusFormicarum
    @NidusFormicarum 6 ปีที่แล้ว

    But what if we want to create pseudorandom sequence that seem to contain all numbers - that is, any real number between for instance 0 and 1 can be produced? Wouldn't it be possible to acheive this by using a formula that typically produces transcendental numbers? It is true that only certain numbers will be contained, but we don't know which creating the impression that any number can accur. But... Functions that produce transcendental numbers can also contain rational numbers for certain input values. However, I wonder if it is possible to include irrational, non-transcendental numbers as well.

  • @YeshuaAgapao
    @YeshuaAgapao 6 ปีที่แล้ว

    Roll20 uses the true random hardware source to periodically re-seed its pseudorandom number generator (probably every few minutes for each user session).

  • @JordanMatrix
    @JordanMatrix 6 ปีที่แล้ว

    Great video on pseudorandom numbers! I would be happy to hear more on them. Any plans of making a video on low discrepancy series?
    en.wikipedia.org/wiki/Low-discrepancy_sequence

  • @trevinbeattie4888
    @trevinbeattie4888 6 ปีที่แล้ว

    I had a disagreement with my college professor about the randomness of a register on the Atari 8-bit computer. The POKEY chip includes an internal 17-bit polynomial counter from which 8 bits can be read. POKEY advances the counter on every clock cycle, so it's completely independent of the main CPU. In normal operation, the timing of when the CPU reads this register is impacted not just by the number of clock cycles it takes to run the software that reads it but also by interrupts from such things as video refresh and keyboard events. Can this register be considered truly random or not?

    • @DavidJohnston_deadhat
      @DavidJohnston_deadhat 6 ปีที่แล้ว +1

      Nope. Set an interrupt on a timer at (2^17)-1 clocks and read the value on each interrupt.

  • @ArjunSAriyil
    @ArjunSAriyil 6 ปีที่แล้ว

    For random number generation in computers, can we use the modulus of no. of processor cycles passed during a long calculation?

    • @eritain
      @eritain 6 ปีที่แล้ว +1

      It's dangerous getting your randomness from something other users can affect. If I want to push around your random number generation, all I have to do is calculate the right things at the right time to delay your calculation.
      On computers with spinning-platter hard drives, air turbulence in the drive affects the seek time, so that's a pretty good source of randomness -- as long as you're getting the timing from your own seeks. If you'll take any seek, again all I have to do is flood the system with disk access requests of my choice, whatever that particular drive can do most consistently.

  • @eritain
    @eritain 6 ปีที่แล้ว +1

    Custom hardware for randomness is less and less of a problem, now that basically every consumer computer (including phones) comes with a camera in it. Random movement of electrons in the CCD sensor produces thermal noise that you can use. Note that this does not depend on actual light entering the camera.

    • @mrpepino2
      @mrpepino2 6 ปีที่แล้ว +1

      Hi Nathan,
      you are right, a web cam can be used. I made a web base tool some years ago that works inside the browser an captures the web cam
      to generate random numbers.
      You can put a sticker on the cam and the sensor still it generates pixel noise.
      For poeple interested in it:
      retura.de/camrng/camrng.html
      Its in german...klick on "web cam aktivieren" to start the capturing and then on START.
      The graph then shows the quality or "randomness" of the data, by calculating PI with a monte-carlo simulation and the rate between ones and zeros.
      Greetings David

    • @mrpepino2
      @mrpepino2 6 ปีที่แล้ว

      oh I forgot I made an english version too:
      retura.de/camrng/camrng_en.html

  • @AB-ee5tb
    @AB-ee5tb 6 ปีที่แล้ว +3

    I've got a lot of Kahn academy to complete before I can begin to understand ANYTHING you EVER say. I'm 25 and I'm dumber than a 6th grader

    • @AdhiNarayananYR
      @AdhiNarayananYR 3 ปีที่แล้ว

      hope in these 3 years you were able to complete some of those.

  • @ProfessorPolitics
    @ProfessorPolitics 6 ปีที่แล้ว

    So when I use R or Stata to simulate a normally distributed set of numbers, even if I simulate a large number it often deviates from what a standard normal distribution is supposed to look like--sometimes by pretty noticeable amounts. Why is that so if the program is just basing its values off of the equation for the CDF?

    • @DavidJohnston_deadhat
      @DavidJohnston_deadhat 6 ปีที่แล้ว

      Two possible reasons. (1) It takes a lot of deviates for the curve to start looking nice, like a few million. (2) Not all algorithms are perfect.I use Zignor. I don't know what R uses.