Hey, been gone a hot minute from devrant, so I thought I'd say hi to Demolishun, atheist, Lensflare, Root, kobenz, score, jestdotty, figoore, cafecortado, typosaurus, and the raft of other people I've met along the way and got to know somewhat.
All of you have been really good.
And while I'm here, it's time for maaaaaaaaath.
So I decided to horribly mutilate the concept of bloom filters.
If you don't know what that is: you take two random prime numbers, m and p, where m < p, and generate two numbers, a and b, that define a function (something like h(x) = ((a*x + b) mod p) mod m). That function is a hash.
Normally you'd have, say, five to ten different hashes.
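A minimal sketch of that construction, assuming the classic universal-hash family (the constants here are made up):

import random

def make_hash(m, p):
    # one random member of the family h(x) = ((a*x + b) mod p) mod m,
    # with p prime and m < p
    a = random.randrange(1, p)  # a != 0 so the function isn't constant
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

# e.g. five independent hashes, as above
m, p = 97, 101
hashes = [make_hash(m, p) for _ in range(5)]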
A bloom filter lets you probabilistically say whether you've seen something before, with no false negatives.
It lets you do this very space efficiently, with some caveats.
Each hash function should be uniformly distributed (any value input to it is likely to be mapped to any other value).
Then you interpret these output values as bit indexes.
So Hi might output [0, 1, 0, 0, 0]
while Hj outputs [0, 0, 0, 1, 0]
and Hk outputs [1, 0, 0, 0, 0]
producing [1, 1, 0, 1, 0]
And if your bloom filter has bits set in all those places, congratulations, you've probably seen that number before.
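Here's a toy version of the whole filter, reusing make_hash from the sketch above (again, just a sketch, not production code):

class BloomFilter:
    def __init__(self, m, p, k=5):
        self.bits = [0] * m
        self.hashes = [make_hash(m, p) for _ in range(k)]

    def add(self, x):
        for h in self.hashes:
            self.bits[h(x)] = 1  # set the bit each hash points at

    def query(self, x):
        # every bit set -> probably seen; any bit clear -> definitely not
        return all(self.bits[h(x)] for h in self.hashes)

bf = BloomFilter(97, 101)
bf.add(731)
print(bf.query(731))  # True; no false negatives
print(bf.query(999))  # usually False, occasionally a false positive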
It's used by big companies like Google to avoid re-indexing pages they've already seen, among other things.
Well I thought, what if instead of using it as a has-been-seen-before filter, we mangled its purpose until a square peg fit in a round hole?
Not long after, I went and wrote a script that 1. generates data, 2. generates a hash function to encode it, and 3. finds a hash function that reverses the encoding.
And it just works. Reversible hashes.
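Something like this sketch, in spirit (the loop bounds are made up, and this is only the exact-match version):

import random

def find_reversible_pair(data, m, p, outer=1000, inner=1000):
    # brute-force search for (h0, h1) with h1(h0(x)) == x for all x in data;
    # note: exact recovery is only possible if every value in data is < m
    for _ in range(outer):
        a0, b0 = random.randrange(1, p), random.randrange(0, p)
        encoded = [((a0 * x + b0) % p) % m for x in data]
        for _ in range(inner):
            a1, b1 = random.randrange(1, p), random.randrange(0, p)
            decoded = [((a1 * y + b1) % p) % m for y in encoded]
            if decoded == data:
                return (a0, b0), (a1, b1)
    return None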
Of course you can't strictly use it for compression, not under normal circumstances, but these aren't normal circumstances.
The first thing I tried was finding a hash function h0 that predicts each subsequent value in a list given the previous value. This doesn't work by default because of hash collisions: an output like 64 might have come from 731 in one place and from 453 in another, so trying to invert the output to get the original sequence back leads to branching. It occurs to me just now that we might use a checkpointing system, with lookahead to see if a branch is the correct one, but I digress; I tried some other things first.
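To see how bad that branching gets, reusing make_hash from the first sketch:

h = make_hash(97, 101)
preimages = {}
for x in range(1000):
    preimages.setdefault(h(x), []).append(x)

# every output with more than one preimage forces a branch when inverting
branches = {y: xs for y, xs in preimages.items() if len(xs) > 1}
print(len(branches))  # nearly all of them: 1000 inputs into 97 buckets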
The next problem was 1. long sequences are slow to generate. I solved this by tuning the number of iterations of the outer and inner loops. We find h0 first, then search for h1: put all the inputs through h0 to generate an intermediate list, put that through h1, and see if the output of h1 matches the original input. If it does, we return h0 and h1. It turns out this can take inordinate amounts of time if h0 lands on a hash function that doesn't play well with h1, so the next step was 2. adding an error margin.
Something fun happens here: if you allow the sequence generated by h1 (the decoder) to match *within* an error margin, then under a certain error value it'll find candidate hash functions hn such that the outputs of h1 are *always* the same distance from their parent values in the original input to h0. That constant distance becomes our salt value, k.
So our hash-function generator, called encoder_decoder() or 'ed' (lol, two-letter function names), also calculates the k value and outputs it along with the hash functions for our data.
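My read of it, as a sketch (the names and bounds are mine, not the actual script):

def ed(data, m, p, margin=5, outer=1000, inner=1000):
    # like find_reversible_pair above, but accept h1 outputs that sit a
    # constant distance from their parent values; that distance is the salt k
    for _ in range(outer):
        a0, b0 = random.randrange(1, p), random.randrange(0, p)
        encoded = [((a0 * x + b0) % p) % m for x in data]
        for _ in range(inner):
            a1, b1 = random.randrange(1, p), random.randrange(0, p)
            decoded = [((a1 * y + b1) % p) % m for y in encoded]
            diffs = {d - x for x, d in zip(data, decoded)}
            # "always the same distance from their parent values"
            if len(diffs) == 1:
                k = diffs.pop()
                if abs(k) <= margin:
                    return (a0, b0), (a1, b1), k
    return None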
This is all well and good, but what if we want to go further? With a few tweaks, taking the output values, converting them to binary, and left-padding each value with 0s, we can then calculate Shannon entropy in its most essential form.
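That calculation, roughly (a sketch):

from collections import Counter
from math import log2

def bit_entropy(values, width):
    # left-pad each value to `width` bits, concatenate, then compute the
    # Shannon entropy (bits per symbol) of the resulting bitstream
    stream = "".join(format(v, f"0{width}b") for v in values)
    n = len(stream)
    return -sum(c / n * log2(c / n) for c in Counter(stream).values())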
Turns out that with tens of thousands of values (and tens of thousands of bits), the output of h1 with the salt has a higher entropy than the original input. Meaning that finding an h1 and h0 hash function for your data is equivalent to compressing below the known Shannon limit.
By how much?
Approximately 0.15%
Of course this doesn't factor in the five numbers you need: a0 and b0 to define h0, a1 and b1 to define h1, and the salt value k. So it probably works out to the same. I'd like to see what the savings are with even larger sets, though.
Next I said, well what if we COULD compress our data further?
What if all we needed were the numbers to define our hash functions, a starting value, a salt, and a number to represent 'depth'?
What if we could rearrange this system so we *could* use the starting value to represent n subsequent elements of our input x?
And that's what I did.
We break the input into blocks of 15-25 items, b/c that's the fastest size to work with and find hashes for.
We then follow the math to get a block, which is:
H0, H1, H2, H3, depth (how many items our 1st item will reproduce), and a starting value, i.e. the 1st item in this slice of our input.
x goes into h0, giving us y; y goes into h1 -> z; z into h2 -> y; y into h3, giving us back x.
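So per block you'd store something like this (just the structure; how depth actually unrolls the starting value into n items is what's in the image):

from dataclasses import dataclass

@dataclass
class Block:
    h0: tuple  # (a, b) constants defining each hash
    h1: tuple
    h2: tuple
    h3: tuple
    depth: int  # how many items the starting value reproduces
    start: int  # 1st item in this slice of the input

def roundtrip(x, h0, h1, h2, h3):
    # the chain: x -> h0 -> y -> h1 -> z -> h2 -> y -> h3 -> x
    return h3(h2(h1(h0(x))))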
The rest is in the image.
Anyway, good to see you all again.
-
The saddest and funniest side of our industry (at least in India): someone works hard and makes it into the best colleges, does great projects on AI and ML, gets a good score on LeetCode and CodeChef, gets a job at a FAANG-like company...
And then changes colors in CSS and text in HTML.
And why is there so much emphasis on Data Structures and Algorithms? I mean, a little bit is fine, but why get obsessed with it when you never write algorithms in production code?
Now, don't tell me "we use libraries, so we should know what we're doing"; no, we don't touch those algorithms even when using libraries.
Now, before you tell me that MySQL uses B-trees for maintaining indexes: you really don't need to solve tricky questions to understand how a B-tree works.
It's just absurd.
I know a little bit about how to design scalable systems.
I know how to write good code that is both modular and extensible.
I know how to mentor interns and turn them into employees.
I know how to mentor junior engineers (freshers) and help them get started.
Heck, I can even invert a binary tree.
But some FAANG company would reject me because I cannot solve a very tricky dynamic programming question.
-
So my data structures exam results came out, and I scored 57.5 out of 80. My classmates, who barely know anything about C, scored way more than me. I am so embarrassed at myself, but I gave the right answers in the exam. My scores in the exams before were 39/40 and 38.5/40. All my hard work failed because it was a so-called THEORY exam where there were only 2 small questions about writing algos; all the others were just like "describe pre-order traversal of a binary tree" or "write the difference between a tree and a graph? Define adjacent node, path, and complete graph"...
When will this fuckery end?
-
Is it just me, or does everyone think the binary ++ counter is a good idea?
I think it's really cool because it sort of hides the number of ++ you have behind a wall of laziness. People probably aren't going to convert every score, and I personally just look at the length of it as a rough estimate of how upvoted a user is.
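The length-as-estimate thing is just log2 at work; a toy sketch (my guess, not devRant's actual display code):

def binary_score(n):
    return format(n, "b")  # string length grows like log2(n), so exact counts blur

print(binary_score(5))     # '101'
print(binary_score(1000))  # '1111101000' -- longer string, roughly more ++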
I think this really helps eliminate the "likes contest" that can come with other social media (and, as a result, reduces spam that is posted/reposted just to get likes).
Anyone agree? Disagree?