Pseudo-, Quasi-, And Real Random Numbers

2y ago
47 Views
2 Downloads
1.16 MB
62 Pages
Last View : Today
Last Download : 2m ago
Upload by : Wade Mabry
Transcription

Pseudo-, Quasi-, and Real RandomNumberson LinuxOleg Goldshmidtolegg@il.ibm.comIBM Haifa Research LaboratoriesHaifux, Sep 1, 2003 – p.1/62

Agenda“Any one who considers arithmetical methods ofproducing random digits is, of course, in a state of sin.”[J. von Neumann, 1951]Sinful pleasures.“If the numbers are not random, they are at leasthiggledy-piggledy.” [G. Marsaglia, 1984]Does it look random enough to you?“Random numbers should not be generated with amethod chosen at random.” [D. Knuth, 1998]Pseudo-random and quasi-random.“Computers are very predictable devices.” [T. Ts’o,probably circa 1994, but maybe as late as 1999]Random tricks with the Linux kernel.Haifux, Sep 1, 2003 – p.2/62

Life in Sin According to Knuth (I)Simulationsciences: just about everywhereoperations research: workloadsSamplingNumerical analysisComputer programmingrandom inputsrandomized algorithmsDecision makingstrategic executive decisionsTechnion paper gradesoptimal strategies in game theoryHaifux, Sep 1, 2003 – p.3/62

Life in Sin According to Knuth (II)AestheticsRecreation (where do you think Monte Carlo comesfrom?)Sins Knuth didn’t consider:Financial marketsHow random is IBM share price?Cryptography and cryptanalysisIs “random” equivalent to “cryptographycallysecure”?RFC 1750, “Randomness Recommendations forSecurity”Haifux, Sep 1, 2003 – p.4/62

Early (Biblical?) Virtues and SinsIntuition: dice, cards, lottery urn, census reportsPhysics: resistance noise (A. Turing, Mark I, 1951)Disadvantage: irreproducible, difficult to debugA CD-ROM of random bytes (G. Marsaglia, 1995)output of noise-diode circuit with scrambled rapmusic — “white and black noise”Early attempts, while virtuous, were cumbersome and inadequate. A radical new approach was needed.Haifux, Sep 1, 2003 – p.5/62

Von Neumann’s Original SinCan random numbers be produced by ordinaryarithmetic? Von Neumann (circa 1946): take a long number (e.g.,10 digits), square it, extract the middle digits:Are these numbers random?No, but who cares? They appear randomObvious problem: 0 is stationaryAnother obvious problem: cycles38 bit numbers: period of 750,000 (N. Metropolis, 1956)Haifux, Sep 1, 2003 – p.6/62

DoesLook Random To You? Are there any patterns in the decimal representation of ?3.14159265358979323846264338327950Haifux, Sep 1, 2003 – p.7/62

DoesLook Random To You?Can this be considered a pattern?3.14159265358979323846264338327950Haifux, Sep 1, 2003 – p.8/62

DoesLook Random To You?Can this be considered a pattern?3.14159265358979323846264338327950“The entire history of human race.” [Dr. I. J. Matrix, 1965]Haifux, Sep 1, 2003 – p.9/62

What Are (Pseudo-)Random Numbers?Working definition of computer-generated randomsequence:a program that generates random sequences shouldbe different and statistically independent from everyprogram that uses its output.Interpretation of the definition:two different generators ought to produce statisticallyindistinguishable results when coupled to yourapplication.if they don’t, at least one of them is not a goodgenerator.Haifux, Sep 1, 2003 – p.10/62

What Are Good PRNGs?Pragmatic point of view: there are statistical tests thatare good at filtering out correlations that are likely to befelt by applications.follows (to a certain extent) from the workingdefinition coupled with insight and experiencegood generators need to pass all the testsor at least the user should be aware of failures tojudge their impactHow?Haifux, Sep 1, 2003 – p.11/62

Test Roll the dicetimes, expect to see on averagetimes:2 3 456789 10 11 124 8 12 16 20 24 20 16 12 842 4 10 12 22 29 21 15 14 96 9 10 11 12 8 7 6 5 4 3 2 Consider a set of independent observations of a randomvariable with a finite number of possible valuesRolling “true” dice:What is the probability that the dice are “loaded” (i.e., theresults are not random)?Haifux, Sep 1, 2003 – p.12/62

test (cont.) ! # For our experiment, & %! # "Compute the statistic— is it improbable? ( #( # ' (Usedistribution withdegrees offreedomNB: ,are not completely independent: giventhe -th value can be computed.'the probability that the sum of the squares of randomnormal variables of zero mean and unit variance will begreater thanHaifux, Sep 1, 2003 – p.13/62

Properties ofDistribution or 'depends only on , not on NB: independent experiments are assumedexercise: combine a set of experiments with itself,beconsider it a single set of size : how willaffected? )) ) ')ifandthedistribution is a goodapproximationshould be large enough that allare large (ruleof thumb: more than 5)large will smooth out locally nonrandom behaviour:not a problem with dice but may be a problem withcomputer-generated numbersHaifux, Sep 1, 2003 – p.14/62

Test Criteriashould not be too high — we do not expect too muchof a deviation from “true” dice!should not be too low — if it is we cannot considerthe numbers to be random!rules of thumb usually expressed in terms ofprobability:less than 1% or greater than 99% — rejectless than 5% or greater than 95% — suspect5% to 10% or 90% to 95% — somewhat suspectbetween 10% and 90% — acceptabledo the test several times - e.g. 2 out of 3Haifux, Sep 1, 2003 – p.15/62

Kolmogorov-Smirnov (KS) Testtest applies when there is a finite number of degreesof freedomwhat about, e.g., random real numbers on [0,1)?yeah, in the computer representation that is finite,but really large, and we want behaviour close to“real” anyway 7"98 ?8 "!? *, " empirical CDF: given a sequence ;: .0/23141645 ,* "KS test: compare cumulative probability distributionfunctions (CDF)Haifux, Sep 1, 2003 – p.16/62

KS Test Algorithm ,* " *, " # # *, "E *, "DE GF AF CB ,* "ED" EGF AF CB @ @D" theoretical criterion: H what isdoing there? std. dev. ofisfor fixed , so the factor makesproportional tothe statistics (largely) independent of*"# #? *" F A? F CB"#I ? F A? F CB @ @D" Ipractical criterion (assume sorting, but can do without)Haifux, Sep 1, 2003 – p.17/62

Practical Application of KS Test should be large enough so that the empirical and thetheoretical CDFs are observably different should be small enough not to wipe out significantlocally nonrandom behaviour . @,K" N CLM"# # *E ," (and of @,KD " J for large () the distribution of) is closely approximated by @,K" (and @,KD "apply KS test again to the sequence of)K @KD " @,K" J apply KS to chunks of a long sequence of medium size()obtain a sequence of,,Haifux, Sep 1, 2003 – p.18/62

and @DAgain,low @ KS Test Criteriashould be neither too high nor tooThere is a probability distribution associated with them,and we reject or suspect too low or too high probabilitiesCan be used in conjunction with thetest for discreetrandom variablesdoon chunks of the sequencenot a good policy to simply count how manyvalues are too large or too smallinstead, obtain the empirical CDF ofuse KS test to compare the empitrical CDF with thetheoretical oneHaifux, Sep 1, 2003 – p.19/62

KS vsKS applies to CDFs without jumpsapplies to CDFs with nothing but jumpscan be applied to continuous CDFs by binningsometimes KS is better, somethimeswinsdivide [0,1) into 100 binsif deviations for bins 0.49 are positive, and for bins50.99 — negative, KS will indicate a biggerdifference thanif even deviations are positive and odd ones arenegative, KS will indicate a closer match thanHaifux, Sep 1, 2003 – p.20/62

Empirical Testsoutlines of algorithms of some common teststheoretical basis (TAOCP) PP : PQPOuniform real numbers on [0,1): R R : RQ ROauxiliary integer sequence on [0,d-1]UPT S RwhereTis typically a power of 2, large enough for ameaningful test, but not too large to be practicalHaifux, Sep 1, 2003 – p.21/62

Empirical Tests (I)V? .Rtest# W TH.8, apply V T' RO T Q ,* "Frequency test (tests uniformity)use KS test withforuse, for eachcountand.with ? W #' T ? ,X"HT . RR"Serial test: we want pairs of successive numbers to beuniformly distributed, too: “The sun comes up just asoften as it goes down, in the long run, but that does notmake its motion random.” [D. Knuth]count, applytest withandgeneralize to triples, quadruples, etc.Haifux, Sep 1, 2003 – p.22/62

Empirical Tests (II) )66 PGap test: examine the length of gaps betweenin a given rangeoccurrences ofcount number of gaps of different lengths, for lengthsof 0,1,. , and lengths, until gaps are tabulated.applytest to the counts — details in TAOCP QROPoker test:splitinto “hands” (quintiples), applytest to“pair”, “two pairs”, “three”, “full house”, “four”, “poker”applytest according to the number of distinctvalues in each “hand” — details in TAOCP.Haifux, Sep 1, 2003 – p.23/62

Empirical Tests (III)Q#T ROCoupon collector’s testobserve the lengths of segments ofthat arerequired to collect a “complete set” of integers from 0, apply— details in TAOCPto6HY #' 6Y6Y 6QPOPermutation testdivideinto segments of lengtheach segment can have different orderingscount occurrences of each ordering, applytestand.withHaifux, Sep 1, 2003 – p.24/62

Empirical Tests (IV)Run test: observe lengths of monotonic segmentsdo not applytest: adjacent runs are notindependent, a long runs will tend to be followed bya short one, and vice versathrow away the element that immediately follows arun to make runs independent — details in TAOCPCollision test: what to do if number of degrees offreedom is much larger than the number ofobservations?hashing: count the number of collisionsa generator will pass the test if it does not generatetoo few or too many collisions — details in TAOCPHaifux, Sep 1, 2003 – p.25/62

DIEHARD I — General Descriptionobtainable fromhttp://stat.fsu.edu/ geo/diehard.htmlsource code available in C, but it it obfuscated: it is“patched and jumbled” Fortran passed through f2cyou need f2c to link it (-lf2c -lm )the original Fortran code is 30 years’ worth ofpatchesvery uncomfortable to alter, so don’t — it is notadvisable anyway unless you really know what youare doing“seems to suit my purposes” [G. Marsaglia]there are also executables for DOS, Linux, and SunHaifux, Sep 1, 2003 – p.26/62

DIEHARD II — Componentsmakewhat — creates test filesasc2bin — converts ascii (hex) to binarydiehard — runs the testsdiequick — a shorter versiona number of built-in random generators to testmakewhat prompts with a lista battery of testsdiehard allows you to choose from 15 testsreally more than 15, since a few tests are compoundsome familiar: run test, permutationssome custom: DNA test, parking lot testHaifux, Sep 1, 2003 – p.27/62

DIEHARD III — Procedurewrite a main() that does one of two things:open a binary file and write your random integers toitopen a text file and write your random integers to itin hex8 hex digits per integer, 10 integers per line, nospacesthen run asc2bin on the text filethe ascii file will be twice the size of the binary oneyour PRNG should produce 32-bit integersif 31-bit, then left-justify by left-shift: some testsfavour leading bitsHaifux, Sep 1, 2003 – p.28/62

(Possible) Problems with rand(3) KKA ? Z 2?Z]\["often linear congruential generators— period no greater thanK\2can provide quite decent random numbers with properchoice of , , and , fast ANSI C specifies that rand(3) return an intRAND MAX is no larger than INT MAXANSI C requires only that INT MAX be greater orrealizations willequal 32767 — a simulation ofrepeat the sequence about 30 timesusually not a problem on 32-bit machinesK\2ANSI C reference implementationLC with a sub-optimal choice of , , andbotched by implementors who try to “improve”Haifux, Sep 1, 2003 – p.29/62

More Problems with rand(3) KJa ( Ka (#("K (LC PRNGs are not free of sequential correlation onsuccessive callsproblem when generating random numbers in manydimensions:plot points in -dimensional space (between 0 and1 in each dimension)“random numbers fall mainly in the planes” [G.Marsaglia], i.e., they will lie on less than-dimensional planes(bad),less than 32 planes(good) ,about 1600 planes“We guarantee that each number is random individually,but we don’t guarantee that more than one of them israndom.” [NR]Haifux, Sep 1, 2003 – p.30/62

So how is glibc rand(3) doing?not an LC generator (cf. man random )do read the man pages for rand(3) andrandom(3) !“The period of this random number generator is verylarge, approximately 16*((2**31)-1)” [man random ]not really very large[[[[[[#[#[#[#[###[DIEHARD tests on rand(3) : so-somany (most?) other generators are no better, e.g. ran2from NR is only a little bit better, Sun f77 (old?) isreally lousy.Haifux, Sep 1, 2003 – p.31/62

Are there Any Good PRNGs? A ]\[D 2 " yes, some pass all the tests with flying colors:KISSThe "Mother of all random number generators"Multiply-With-CarryMersenne Twisterdo check!check by yourselfgenerators improvetests get tougherif you are really serious, develop your own testsHaifux, Sep 1, 2003 – p.32/62

Implementation of Good PRNGsfromhttp://www.cs.yorku.ca/ ew (z 36969*(z&65535) (z 16))wnew (w 18000*(w&65535) (w 16))MWC ((znew 16) wnew )SHR3 (jsrˆ (jsr 17), \jsrˆ (jsr 13), \jsrˆ (jsr 5))#define CONG (jcong 69069*jcong 1234567)#define FIB ((b a b),(a b-a))#define KISS ((MWCˆCONG) SHR3)not the last word in software engineering, one can dobetter with little effortHaifux, Sep 1, 2003 – p.33/62

Seeding PRNGsfixed seed very useful for debuggingsrand(time(NULL))get a MOSIX cluster, runfor i in ‘seq 1 10‘; do (./sim &); donesrand(clock())will be surprisingly similar from run to run: manyruns will only use a few seedscall gettimeofday(2) , use low-order bits ofmicroseconds, mix with pid , etc.good, but note that gettimeofday(2) is notPOSIX , not guaranteed to workentropy — also non-portable!Haifux, Sep 1, 2003 – p.34/62

Monte Carlo Simulations IApplicationsthrowing dice or spinning wheels (if you are intogetting rich, quickly)modelling price fluctuations in various markets (ifyou are hired to help the rich keep their money)studying Brownian motion, diffusion, cosmic raypropagation, etc. (if getting rich is not the objective)designing new computers and/or algorithms forefficient management of resources under uncertainworkloads (strictly for common good, of course)Haifux, Sep 1, 2003 – p.35/62

Monte Carlo Simulations II ,"7 ,b " TTechniquegenerate “realizations” based on random sequencescompute the expectation value of the result (payoff,displacement, etc.) and the “likely” deviation as anerror estimateequivalent to integration over realizations78 , b"example:— generate random pointscount those for whichRc R ccTbRRRTbexample:over a complicated shape — encloseinto a simple shapethat can be easily sampled,computewherein ,outside ofHaifux, Sep 1, 2003 – p.36/62

Non-Uniform RNG: TransformationsT d T7 d " G 7 "We , zero7 to obtain,D " 7? ih ? Tdj7?:hg d j"G j" TjG 7e j"7 T j;hgin multiple dimensions, with Jacobian iH * 7 7b" V8 TH Twe must solvebeing the CDF of7 "e forwithfor G *" 7 "7 b"7 for uniform on [0,1)otherwise, "f d dG , " T7d7 T "We dfundamental transformation law of probability:Haifux, Sep 1, 2003 – p.37/62

Normal Deviates: Box-Mullerlk from uniform (on [0,1))kl 7Generate normal deviates: o p3nm pnq # 73nm # 7the transformation: s7nB 7 fBot7# 7CLM [ror, equivalentlyHaifux, Sep 1, 2003 – p.38/62

Normal Deviates: Box-Muller (cont.)HD7Lxw ufBH noB tv{y zy H uinside the unit circle (using u[ u, a further trick: pickrejection)v,u" # 7LDgddHthe Jacobian:nq ppnqno need to compute#n m #n mu v" u v" H Hv v oro p 7 7 u o pv,get two normal deviates!Haifux, Sep 1, 2003 – p.39/62

Non-Uniform RNG: Rejection ,b "What if we do not know the inverse CDF? ,b ": G, b "T is known and analytically invertible: ,b "T E , "8 such that *"pick , reject if7 , "8 if7 acceptV uniform on "7generate } b" computeuniform in [0,A)*D " 7 7generateHaifux, Sep 1, 2003 – p.40/62

Variance Reduction Techniques D complexity analysis for integration using uniformlydistributed random points in an -dimensional space:each point adds linearly to the accumulate sum thatwill become the function averageit also adds linearly to the accumulated sum ofsquares that will become the variancethe estimates error comes from the square root of— slow convergence!the variance, henceantithetic variablesnon-uniform sampling (importance, stratification)Haifux, Sep 1, 2003 – p.41/62

D Can We beat the Square Root? D is not inevitablechoose points on a Cartesian grid, sample each gridor betterpoint exactly once in whatever order:convergenceproblem: must decide in advance how fine the gridhas to be, commit to sample all the pointscan we pick sample points “at random” yet spread themout in some self-avoiding way, eliminating the “localclustering” of uniformly random points? another context: search an -dimensional volume for apoint where some locally computable condition holdswe want to move smoothly to finer scales, and dobetter than randomHaifux, Sep 1, 2003 – p.42/62

Quasi-Random Sequences sequences of -tuples that fill -dimensional spacemore uniformly than uncorrelated random pointsnot random at all, “maximally avoiding” each other1 I ? 1I? 1Halton sequence: algorithmforwrite as a number in base , where is primereverse the digits and put a radix point in frontexample:,: 17 base 3 is 122,base 3? ? IIHalton sequence: intuitionevery time the number of digits in increases,becomes finer-meshedthe fastest-changing digit in controls the mostsignificant digit ofHaifux, Sep 1, 2003 – p.43/62

Quasi-Monte CarloD H, i.e., almost as complexity:curvature n3m "many quasi-random (a.k.a. low-discrepancy)sequences: Halton, Faure, Sobol, Niederreiter,Antonov-Saleev (efficient variant of Sobol) — details inNR and references thereinnon-trivial mathematics involved, with a bit ofcan be orders of magnitude better convergencetricky to implementbeware of USPTO!Haifux, Sep 1, 2003 – p.44/62

drivers/char/random.cGathering “environmental noise”inter-keypress timings from the keyboardmouse interrupt timings and position as reported byhardwareinter-interrupt timingsnot all interrupts are suitable (consider timer)finishing time of block requestsa8 &maintain an “entropy pool” mixed with a CRC-likefunction (fast enough to do on every interrupt)random bytes are obtained by taking SHAmessage of lengthbits160-bit “digest”keep an estimate of “true randomness” in the pool, ifzero an attacker has a chance if he cracks SHAHaifux, Sep 1, 2003 – p.45/62

/dev/random and /dev/urandom/dev/randomwill only return a maximum of the number of bits ofrandomness contained in the entropy pool/dev/urandomwill return as many bytes as are requested, withoutgiving the kernel time to replenish the poolacceptable for many applicationsvery random, passes DIEHARD (Ts’o: suitable forone-time pads)void get random bytes(void *buf, int n);for use within the kernelnot very fast, good for seeding, non-portableHaifux, Sep 1, 2003 – p.46/62

Unpredictable over Rebootson shutdown:seed /var/run/random-seedtouch seed; chmod 600 seedpool /proc/sys/kernel/random/poolsize[ -r pool ] && bytes ‘cat pool‘ bytes 512dd if /dev/urandom of seed count 1 bs byteson boot:seed /var/run/random-seed[ -f seed ] && cat seed /dev/urandomtouch seed; chmod 600 seedpool /proc/sys/kernel/random/poolsize[ -r pool ] && bytes ‘cat pool‘ bytes 512dd if /dev/urandom of seed count 1 bs bytesHaifux, Sep 1, 2003 – p.47/62

Other Interfacessysctl(8)# /sbin/sysctl -A 2 /dev/null grep randkernel.random.uuid e005e4a2-fde0-423d-8734-0akernel.random.boot id 68eedc43-9a54-4a42-80dekernel.random.write wakeup threshold 128kernel.random.read wakeup threshold 8kernel.random.entropy avail 4

producing random digits is, of course, in a state of sin.” [J. von Neumann, 1951] Sinful pleasures. “If the numbers are not random, they are at least higgledy-piggledy.” [G. Marsaglia, 1984] Does it look random enough to you? “Random numbers should not be generated with a method chosen at random.

Related Documents:

Compute Platform SoC. The Qualcomm Pseudo Random Number Generator implements a SHA-256 Hash DRBG as defined in SP 800-90A. The hardware sub-chip cryptographic modules are specified in the following table: Component Type Version Number Qualcomm Pseudo Random Number Generator hardware 2.1.0 Qualcomm Pseudo Random Number Generator hardware 2.3.1

Pseudo-code Algorithms can be speci ed using some form of pseudo-code Good pseudo-code: I Balances clarity and detail I Abstracts the algorithm I Makes use of good mathematical notation I Is easy to read Bad pseudo-code: I Gives too many details I Is implementation or language speci c Good Pseudo-code Example Intersection

such a dice rolls. Pseudo Random Number Generators are algorithms that utilize mathematical formulas to produce sequences that will appear random, or at least have the e ect of randomness. If the results of a Pseudo Random Number Generator mimicking dice rolls are listed it will appear random. However, statistical analysis will prove that the

Start by finding out how Python generates random numbers. Type ?random to find out about scipy's random number generators. Try typing 'random.random()' a few times. Try calling it with an integer argument. Use 'hist' (really pylab.hist) to make a histogram of 1000 numbers generated by random.random. Is th

Start by finding out how Python generates random numbers. Type ?random to find out about scipy's random number generators. Try typing 'random.random()' a few times. Try calling it with an integer argument. Use 'hist' (really pylab.hist) to make a histogram of 1000 numbers generated by random.random. Is the distribution Gaussian, uniform, or .

Pseudo Code Practice Problems: Listed below is a brief explanation of Pseudo code as well as a list of examples and solutions. Pseudo

PSEUDO-AND QUASI-ULTRASOUND ABNORMALITIES OF THE FETUS. Mary E Norton, MD. Professor, Obstetrics, Gynecology, and Reproductive Sciences. University of California, San Francisco. 6

learning teams, guided inquiry activities, critical and analytical thinking, problem solving, reporting, metacognition, and individual responsibility. Strategies for the successful use of learning teams are discussed, the roles of the instructor in this learning environment are described, and implementation hints