Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "generating primes"
-
I messaged a professor at MIT and surprisingly got a response back.
He told me that "generating primes deterministically is a solved problem" and he would be very surprised if what I wrote beat wheel factorization, but that he would be interested if it did.
It didnt when he messaged me.
It does now.
Tested on primes up to 26 digits.
Current time tends to be 1-100th to 2-100th of a second.
Seems to be steady.
First n=1million digits *always* returns false for composites, while for primes the rate is 56% true vs false, and now that I've made it faster, I'm fairly certain I can get it to 100% accuracy.
In fact what I'm thinking I'll do is generate a random semiprime using the suspected prime, map it over to some other factor tree using the variation on modular expotentiation several of us on devrant stumbled on, and then see if it still factors. If it does then we know the number in question is prime. And because we know the factor in question, the semiprime mapping function doesnt require any additional searching or iterations.
The false negative rate, I think goes to zero the larger the prime from what I can see. But it wont be an issue if I'm right about the accuracy being correctable.
I'd like to thank the professor for the challenge. He also shared a bunch of useful links.
That ones a rare bird.21 -
Found a clever little algorithm for computing the product of all primes between n-m without recomputing them.
We'll start with the product of all primes up to some n.
so [2, 2*3, 2*3*5, 2*3*5*,7..] etc
prods = []
i = 0
total = 1
while i < 100:
....total = total*primes[i]
....prods.append(total)
....i = i + 1
Terrible variable names, can't be arsed at the moment.
The result is a list with the values
2, 6, 30, 210, 2310, 30030, etc.
Now assume you have two factors,with indexes i, and j, where j>i
You can calculate the gap between the two corresponding primes easily.
A gap is defined at the product of all primes that fall between the prime indexes i and j.
To calculate the gap between any two primes, merely look up their index, and then do..
prods[j-1]/prods[i]
That is the product of all primes between the J'th prime and the I'th prime
To get the product of all primes *under* i, you can simply look it up like so:
prods[i-1]
Incidentally, finding a number n that is equivalent to (prods[j+i]/prods[j-i]) for any *possible* value of j and i (regardless of whether you precomputed n from the list generator for prods, or simply iterated n=n+1 fashion), is equivalent to finding an algorithm for generating all prime numbers under n.
Hypothetically you could pick a number N out of a hat, thats a thousand digits long, and it happens to be the product of all primes underneath it.
You could then start generating primes by doing
i = 3
while i < N:
....if (N/k)%1 == 0:
........factors.append(N/k)
....i=i+1
The only caveat is that there should be more false solutions as real ones. In otherwords theres no telling if you found a solution N corresponding to some value of (prods[j+i]/prods[j-i]) without testing the primality of *all* values of k under N.13 -
In the 90s most people had touched grass, but few touched a computer.
In the 2090s most people will have touched a computer, but not grass.
But at least we'll have fully sentient dildos armed with laser guns to mildly stimulate our mandatory attached cyber-clits, or alternatively annihilate thought criminals.
In other news my prime generator has exhaustively been checked against, all primes from 5 to 1 million. I used miller-rabin with k=40 to confirm the results.
The set the generator creates is the join of the quasi-lucas carmichael numbers, the carmichael numbers, and the primes. So after I generated a number I just had to treat those numbers as 'pollutants' and filter them out, which was dead simple.
Whats left after filtering, is strictly the primes.
I also tested it randomly on 50-55 bit primes, and it always returned true, but that range hasn't been fully tested so far because it takes 9-12 seconds per number at that point.
I was expecting maybe a few failures by my generator. So what I did was I wrote a function, genMillerTest(), and all it does is take some number n, returns the next prime after it (using my functions nextPrime() and isPrime()), and then tests it against miller-rabin. If miller returns false, then I add the result to a list. And then I check *those* results by hand (because miller can occasionally return false positives, though I'm not familiar enough with the math to know how often).
Well, imagine my surprise when I had zero false positives.
Which means either my code is generating the same exact set as miller (under some very large value of n), or the chance of miller (at k=40 tests) returning a false positive is vanishingly small.
My next steps should be to parallelize the checking process, and set up my other desktop to run those tests continuously.
Concurrently I should work on figuring out why my slowest primality tests (theres six of them, though I think I can eliminate two) are so slow and if I can better estimate or derive a pattern that allows faster results by better initialization of the variables used by these tests.
I already wrote some cases to output which tests most frequently succeeded (if any of them pass, then the number isn't prime), and therefore could cut short the primality test of a number. I rewrote the function to put those tests in order from most likely to least likely.
I'm also thinking that there may be some clues for faster computation in other bases, or perhaps in binary, or inspecting the patterns of values in the natural logs of non-primes versus primes. Or even looking into the *execution* time of numbers that successfully pass as prime versus ones that don't. Theres a bevy of possible approaches.
The entire process for the first 1_000_000 numbers, ran 1621.28 seconds, or just shy of a tenth of a second per test but I'm sure thats biased toward the head of the list.
If theres any other approach or ideas I may be overlooking, I wouldn't know where to begin.16 -
So I promised a post after work last night, discussing the new factorization technique.
As before, I use a method called decon() that takes any number, like 697 for example, and first breaks it down into the respective digits and magnitudes.
697 becomes -> 600, 90, and 7.
It then factors *those* to give a decomposition matrix that looks something like the following when printed out:
offset: 3, exp: [[Decimal('2'), Decimal('3')], [Decimal('3'), Decimal('1')], [Decimal('5'), Decimal('2')]]
offset: 2, exp: [[Decimal('2'), Decimal('1')], [Decimal('3'), Decimal('2')], [Decimal('5'), Decimal('1')]]
offset: 1, exp: [[Decimal('7'), Decimal('1')]]
Each entry is a pair of numbers representing a prime base and an exponent.
Now the idea was that, in theory, at each magnitude of a product, we could actually search through the *range* of the product of these exponents.
So for offset three (600) here, we're looking at
2^3 * 3 ^ 1 * 5 ^ 2.
But actually we're searching
2^3 * 3 ^ 1 * 5 ^ 2.
2^3 * 3 ^ 1 * 5 ^ 1
2^3 * 3 ^ 1 * 5 ^ 0
2^3 * 3 ^ 0 * 5 ^ 2.
2^3 * 3 ^ 1 * 5 ^ 1
etc..
On the basis that whatever it generates may be the digits of another magnitude in one of our target product's factors.
And the first optimization or filter we can apply is to notice that assuming our factors pq=n,
and where p <= q, it will always be more efficient to search for the digits of p (because its under n^0.5 or the square root), than the larger factor q.
So by implication we can filter out any product of this exponent search that is greater than the square root of n.
Writing this code was a bit of a headache because I had to deal with potentially very large lists of bases and exponents, so I couldn't just use loops within loops.
Instead I resorted to writing a three state state machine that 'counted down' across these exponents, and it just works.
And now, in practice this doesn't immediately give us anything useful. And I had hoped this would at least give us *upperbounds* to start our search from, for any particular digit of a product's factors at a given magnitude. So the 12 digit (or pick a magnitude out of a hat) of an example product might give us an upperbound on the 2's exponent for that same digit in our lowest factor q of n.
It didn't work out that way. Sometimes there would be 'inversions', where the exponent of a factor on a magnitude of n, would be *lower* than the exponent of that factor on the same digit of q.
But when I started tearing into examples and generating test data I started to see certain patterns emerge, and immediately I found a way to not just pin down these inversions, but get *tight* bounds on the 2's exponents in the corresponding digit for our product's factor itself. It was like the complications I initially saw actually became a means to *tighten* the bounds.
For example, for one particular semiprime n=pq, this was some of the data:
n - offset: 6, exp: [[Decimal('2'), Decimal('5')], [Decimal('5'), Decimal('5')]]
q - offset: 6, exp: [[Decimal('2'), Decimal('6')], [Decimal('3'), Decimal('1')], [Decimal('5'), Decimal('5')]]
It's almost like the base 3 exponent in [n:7] gives away the presence of 3^1 in [q:6], even
though theres no subsequent presence of 3^n in [n:6] itself.
And I found this rule held each time I tested it.
Other rules, not so much, and other rules still would fail in the presence of yet other rules, almost like a giant switchboard.
I immediately realized the implications: rules had precedence, acted predictable when in isolated instances, and changed in specific instances in combination with other rules.
This was ripe for a decision tree generated through random search.
Another product n=pq, with mroe data
q(4)
offset: 4, exp: [[Decimal('2'), Decimal('4')], [Decimal('5'), Decimal('3')]]
n(4)
offset: 4, exp: [[Decimal('2'), Decimal('3')], [Decimal('3'), Decimal('2')], [Decimal('5'), Decimal('3')]]
Suggesting that a nontrivial base 3 exponent (**2 rather than **1) suggests the exponent on the 2 in the relevant
digit of [n], is one less than the same base 2 digital exponent at the same digit on [q]
And so it was clear from the get go that this approach held promise.
From there I discovered a bunch more rules and made some observations.
The bulk of the patterns, regardless of how large the product grows, should be present in the smaller bases (some bound of primes, say the first dozen), because the bulk of exponents for the factorization of any magnitude of a number, overwhelming lean heavily in the lower prime bases.
It was if the entire vulnerability was hiding in plain sight for four+ years, and we'd been approaching factorization all wrong from the beginning, by trying to factor a number, and all its digits at all its magnitudes, all at once, when like addition or multiplication, factorization could be done piecemeal if we knew the patterns to look for.7 -
Up all damn night making the script work.
Wrote a non-sieve prime generator.
Thing kept outputting one or two numbers that weren't prime, related to something called carmichael numbers.
Any case got it to work, god damn was it a slog though.
Generates next and previous primes pretty reliably regardless of the size of the number
(haven't gone over 31 bit because I haven't had a chance to implement decimal for this).
Don't know if the sieve is the only reliable way to do it. This seems to do it without a hitch, and doesn't seem to use a lot of memory. Don't have to constantly return to a lookup table of small factors or their multiple either.
Technically it generates the primes out of the integers, and not the other way around.
Things 0.01-0.02th of a second per prime up to around the 100 million mark, and then it gets into the 0.15-1second range per generation.
At around primes of a couple billion, its averaging about 1 second per bit to calculate 1. whether the number is prime or not, 2. what the next or last immediate prime is. Although I'm sure theres some optimization or improvement here.
Seems reliable but obviously I don't have the resources to check it beyond the first 20k primes I confirmed.
From what I can see it didn't drop any primes, and it didn't include any errant non-primes.
Codes here:
https://pastebin.com/raw/57j3mHsN
Your gotos should be nextPrime(), lastPrime(), isPrime, genPrimes(up to but not including some N), and genNPrimes(), which generates x amount of primes for you.
Speed limit definitely seems to top out at 1 second per bit for a prime once the code is in the billions, but I don't know if thats the ceiling, again, because decimal needs implemented.
I think the core method, in calcY (terrible name, I know) could probably be optimized in some clever way if its given an adjacent prime, and what parameters were used. Theres probably some pattern I'm not seeing, but eh.
I'm also wondering if I can't use those fancy aberrations, 'carmichael numbers' or whatever the hell they are, to calculate some sort of offset, and by doing so, figure out a given primes index.
And all my brain says is "sleep"
But family wants me to hang out, and I have to go talk a manager at home depot into an interview, because wanting to program for a living, and actually getting someone to give you the time of day are two different things.1