mysql

Ranter

AurthurM

117

Comments

20

N00bPancakes

7419

4y

>because it will skip number when row is deleted

Yeah but ... what's the bad part there?
9

theabbie

7622

4y

What does she want? should all index update on deletion? That's too much unnecessary work.
12

LLAMS

3628

4y

“My teacher said it is wrong to use this feature that is built into basically every database engine.”

And this is why some people teach rather than do.
2

Eklavya

1735

4y

If you want a primary key rather than AI, you can use time with seconds in your db
6

TeachMeCode

5190

4y

Everywhere I worked used auto incrementing keys. What’s the issue with deletion? Nothing will break bc the last value is stored so the next key will always be unique.
5

N00bPancakes

7419

4y

@LLAMS

Let's all just be happy this person isn't in a meeting rasing critical issues like this ;)
2

AurthurM

117

4y

@N00bPancakes thats what i dont understand.. its my teachers statement
btw. I thought maybe i was wrong..thinking of custom key generator. but custom key generators also doesn't always guarantee uniqness .. 🤔
3

Eklavya

1735

4y

@TeachMeCode No, it's not like that.If the last value is deleted, it will not be repeated again
2

AurthurM

117

4y

@Eklavya yeah that is a way.. but it is like a hack lol.. but i agree with u
9

Root

77451

4y

@Eklavya No. Bad. Bad bad bad.

Why? Two subsequent inserts will have colliding IDs. and if a single db instance handles this nicely, two will not.
4

TeachMeCode

5190

4y

@Eklavya I’ll admit it’s a guess, never really read into it. I always assumed that the last value is always stored somewhere and increments when a new row is added to guarantee uniqueness
7

N00bPancakes

7419

4y

I'm no DB expert (and I say this because of that) but generally if a database does a thing for me like auto increment IDs ... man they do it right just fine on their own.

If they didn't I sure as hell would have heard about it.

Let the database do it's database thing has been a good rule of thumb for me.
3

AurthurM

117

4y

Also there are many approaches for many things in coding.. some are good practice and some are bad.. but nothing is wrong if it works completely. I hate teachers who sticks to one way and thinks other ways are always wrong. :(
3

N00bPancakes

7419

4y

@AurthurM

It would be interesting to hear her response if you asked about what is bad about that situation.

Personally I wouldn't challenge the professor on it... i'd just want to hear why they don't like that scenario / situation.
2

AurthurM

117

4y

@N00bPancakes yeah... Now i am confident that it is not a wrong way..
2

AurthurM

117

4y

@N00bPancakes i will keep on updating.. btw it willl jeopardize my project marks lmao
6

TeachMeCode

5190

4y

@AurthurM nah don’t jeopardize your project, just do what she wants and forget all the crap she taught you when you leave
2

N00bPancakes

7419

4y

@AurthurM

I wouldn't jeopardize any relationships or marks over it.
3

AurthurM

117

4y

@TeachMeCode best advice i got today.Thank you
14

Root

77451

4y

Autoinc pks are fine.

They’re the fastest for the db to generate and verify (and index). By contrast, uuids take a small but significant amount of time to generate, and longer to check. Autoinc pks also have the same amount of problems (or fewer) than other schemes, and are also the easiest to understand.

Two caveats:
1) do not display these to the user. Ever. Why? They are trivial to predict and therefore allow scraping. Make a secure id column that’s unique but random. (Suggestion: prefix a few chars that hint at the object type; helps for debugging and customer support.)
2) When sharding the db, you’ll have to modify the pk scheme. Why? Two simultaneous inserts on two dbs will generate rows with colliding autoinc pks. There are several approaches to this, all with pros and cons. Best I’ve found: compound ID. Database id + separator + pk. A previous company used “J” because it didn’t look like a letter (and some ego reasons). But this way the China db would have a record like 1J12345678, which prevented a collision when the US db had 0J12345678 as a pk.

tl;dr
your teacher had an idea that autoinc pls could cause problems, but didn’t really understand why, and jumped to the wrong conclusion. Too little understanding is always a bad thing.
1

AurthurM

117

4y

@Root thanks for your answer.. i will keep that in mind.
3

A4Abhiraj

1740

4y

One point people miss is auto increment uses locks in most databases. It depends on where you use it and most of the times it won't cause issues, but beware of this fact.
1

AurthurM

117

4y

@A4Abhiraj can you elaborate incase of mysql db
4

A4Abhiraj

1740

4y

Continuing, you shouldn't be afraid to use auto increments in concurrent applications, general scenarios won't be problematic.
There will be a shared lock between inserts, it's designed not to block parallel inserts but the critical part is still blocking, n+1 isn't blocking, the part where you retrieve n is. (Think of how would you yourself implement a simple auto incr)

Second consideration here - in heavy parallel transactions, you will anyways find gaps between numbers, so don't base anything on the assumption that there won't be any holes in this sequence, what's guaranteed is a uniqueness and a sequence, not 'consecutive'ness

Point being, the holes part isn't just about delete. It's knowledge that's passed around without context in the tech community. Auto Inc causes holes also due to parallelism and a fail fast approach.
1

AurthurM

117

4y

@A4Abhiraj i understand your point. So what do you think we should use to avoid such cases?
1

AurthurM

117

4y

@A4Abhiraj and in google login we get googles 21 digit int id .. are those unique to each users?
4

A4Abhiraj

1740

4y

Reference to locking -
https://dev.mysql.com/doc/refman/...

The first question is do you need to avoid this.
My experience comes from using postgres as a queue(using notification on trigger) and we started with auto increment and with every insert, the time to insert went up.
It's a big no for this scenario. Figure out if concurrent inserts are in question and does the next insert depend on the previous. ( Not for the pkey, but for application logic, like the above example)

And you can use UUID instead. Slightly heavier to calculate. And recent databases support indexing uuids by saving them as binary data, not chars.

MySQL specific reference -
https://stitcher.io/blog/...
4

A4Abhiraj

1740

4y

@AurthurM not sure which ID, but I'm guessing they are unique.
Uuids will have a format of xxxx-xxxx-xxxx-xxxx-xxxx
But it depends on various standards and there are a few. Perhaps Google uses UUID and then converts it to base64 string to reduce the size.

Or maybe they add a date stamp to the end, because companies like Google can still have collisions even with UUID.
Amazon recently had some issue I heard.
1

AurthurM

117

4y

@A4Abhiraj ok. I will keep them in mind. Thaks for explanation.
3

Root

77451

4y

@A4Abhiraj Everything will be a nightmare at that scale. Sounds like a fun problem to solve! :)
1

A4Abhiraj

1740

4y

@Root yeah the chances of collision are never zero, they just tend to zero.
I've seen people mention using 2 uuids. The solution to this is reducing the chances of collision, rather than making it 0.
2

Root

77451

4y

@A4Abhiraj Sounds like a flawed approach. If something you’re doing is to mitigate a problem when it happens, the better approach would be to find a way to prevent it from happening in the first place.

Here the two uuids are mitigating a race condition.
1

A4Abhiraj

1740

4y

@Root there's no race condition with uuids , atleast as a process, every UUID function runs on its own, can be run on different servers and will be unique (one standard depends on the MAC address)
As a theory, yes a race condition in the sense that 2 rows have the same UUID.

Read here
https://stackoverflow.com/questions...
1

A4Abhiraj

1740

4y

@Root I'm sure there will be mathematical models to have true unique values.
Software has always been a midway between perfection and practicality.
UUID is one such example..
4

Root

77451

4y

@A4Abhiraj Exactly. The problem isn’t in the uuid, it’s elsewhere.

Like i said, everything at that scale is a nightmare — because it’s complicated with many edge cases, and the solution must be perfect in every way. Fun problem.
4

danielstaleiny

383

4y

It is bad because your stupid co-workers make code dependent on incrementing that id. I puked when I saw him for each to match id
1

Root

77451

4y

@danielstaleiny 😂
2

danielstaleiny

383

4y

@Root sadly true story. The same person made 4k lines php backend API.

He left the company to work in the bank which is nice. :D
3

TeachMeCode

5190

4y

@danielstaleiny um he did what??? It sounds like he queried a million rows then tried to loop through each to find an id that matches what was passed in. That’s pretty sad
2

A4Abhiraj

1740

4y

@danielstaleiny someone's money is in bad hands 😂
1

TeachMeCode

5190

4y

@danielstaleiny he left to work at a bank as a...?
2

danielstaleiny

383

4y

@TeachMeCode good question. :D I am pretty sure as software developer. Don't ask me how he passed the interview.
3

TrevorTheRat

1184

4y

The id number not being used again when deleted is a good thing not a bad thing, don’t try to change that behaviour.

But I would argue for using a UUID instead so that you can generate it before writing to the database a d apply it to other things - maybe you have a message to put into a bus, a file to write, a letter to print etc. Whatever it is. It’s easier to do that with a UUID than it is to save to the database and then apply the id to those examples
2

Oktokolo

11330

4y

Your teacher is wrong for almost all use cases.

Only if guessing of IDs is an actuall attack scenario (like when you completely mess up your authorization and/or rate limiting), a source of strictly monotonic increasing positive integral numbers (that is, what autoincrement represents) doesn't cut it.
Don't forget, to set the initial value to some random number having enough digits if users shouldn't be able to estimate the total amount of records at creation time of the record.

If users must not be able to infer order of creation for any two records, you do actually need the record IDs to be random. But even then you can still use autoincrement IDs internally and map them to randomly generated public IDs when they need to be exposed (don't derive them from the internal ID).
Chances are, that you aren't stumbling over that use case even once in your life as a coder...
1

AurthurM

117

4y

@Oktokolo yeah in my project those ids are not exposed to user
1

Oktokolo

11330

4y

@AurthurM
Then i don't see any point in not using the automatic sequence of guarenteed-to-be-unique IDs provided by the one entity, which ist most able and also absurdly optimized to do so.

But don't worry: Teachers are wrong all the time.
Try to get his reasons and then debunk them one by one. After that he either stops insisting on that fake news or he is an idiot.
1

AurthurM

117

4y

@Oktokolo yeah.. i will do that :)
2

IntrusionCM

13947

4y

A lot of this rant should be put in big fat brackets followed by the sentence

"theoretically, but only under certain circumstances that in most use cases are highly unlikely".

I'll try to make it quick:

- the dreaded rollover is the common scenario: an auto increment reaches the maximum value of the type it can hold and thus - after incrementing - rolls over to 0 and starts at 1 again

Highly unlikely. MySQL e.g. has INT unsigned - 4_294_967_295 is the maximum value.

Unless you do really really really really frequent inserts, this will be hard to reach.

- the tales of transactions and locking.

Very short: if you come to the point where this matters, you'll have to deal with a lot of problems. And most likely you'll know what you are doing. Cause it's hard to provoke transaction problems and require e.g. an change of the default isolation mode for transactions, unless you try to use the database for advanced locking (advanced in the sense of actively locking certain rows, e.g. creating semaphores or stuff like that).

Which requires a lot of knowledge, cause this is a pain in the butt... And really a bad idea for many reasons. Not impossible, but if you don't know what you're doing, you're literally standing in an rain of invisible arrows and end up getting pierced by them. (not joking. This is really really really hard stuff).

Usually you can ignore this as long as you use the default isolation mode and an transactional engine and don't decide to poke into the default behaviours (shared row locking, trying to be too clever, over optimizing, REPLACE / ON DUPLICATE KEY UPDATE...)

- the promised land of UUIDs.

Ugh. This is wrong. For many many reasons.

Unless the database provides an integrated datatype for UUIDs, think twice.

UUIDs exist in many forms, commonly (but erroneously) UUID refers to UUID V4.

36 chars are _a lot_ in databases, especially when used as a PK.

This matters. UUIDs as string are a bad idea.

Don't do if you have alternatives.
0

AurthurM

117

4y

@IntrusionCM what should be the alternative from your pov?..
0

IntrusionCM

13947

4y

@AurthurM

The cats motto: If it fits, it sits.

Auto increment keys are fine.

What @Root says is true - auto increment keys are predictable, but again: only consider this as an security issue if it matters.

E.g. in an API endpoint with authentication and user permissions, I see no reason to not use auto increment IDs. After all, if the permission system does not work, the predictability of the Auto Inc IDs is the least of your problems...

Same for everything that happens only internal - e.g. with no possibility of user / client access.

If you have to obfuscate, try to look for the simplest solution.

Don't over anticipate - it's what ends usually in the most complex solution, only adding burden and not really solving stuff.

E.g. most people fear hash collisions.

As I said before - an INT 32 unsigned is really large.

The easiest solution is to just generate a SHA 256 hash of the ID plus a random generated salt, and return the first half.

16 chars.

Easily done in any ORM, only used for lookups.

Should it ever fail, which is really unlikely, you can just retry.

Note: this is just a random generated string. The SHA256 is just for lengthening the string, the random value to obfuscate.
3

Nihil75

738

4y

This is why I tought myself everything, and am afraid to send my son to school.
2

Root

77451

4y

@Nihil75 Yes. This, exactly.
3

rantsauce

6739

4y

Your teacher is a moron (like most of them).
1

natesymer

213

4y

Ignore teachers. Do your own research.

Autoincrementing integer primary keys are used EVERYWHERE in industry. You could use GUIDs, and if you were using postgresql, you could have the DB autogen them for you. But GUIDs aren't guaranteed to be unique (extreme edge case), so I'd go with the autoincrementing id. Autoincrementing ids won't let you down.
1

lorentz

15165

4y

It works fine in basically all cases. If you have a foreign key referring to either of multiple tables and don't have a discriminator column (which is already a pretty fucked up sitch) or if you want the IDs to go client side and prevent guessing, use UUIDs.
0

AurthurM

117

4y

Now i am still using autoinc primary key but with uuid as secondary key .
0

John-Does

2000

4y

UUID will be always bigger than incrementing a number why bother ?