6

is using auto incremented id as primary key bad practice in mysql??

My teacher said it is wrong to use autoincrement because it will skip number when row is deleted. But it does to make it unique .

Or should i use uuid ? But doesn't it impact performance?

Comments
  • 20
    >because it will skip number when row is deleted

    Yeah but ... what's the bad part there?
  • 9
    What does she want? should all index update on deletion? That's too much unnecessary work.
  • 12
    “My teacher said it is wrong to use this feature that is built into basically every database engine.”

    And this is why some people teach rather than do.
  • 2
    If you want a primary key rather than AI, you can use time with seconds in your db
  • 6
    Everywhere I worked used auto incrementing keys. What’s the issue with deletion? Nothing will break bc the last value is stored so the next key will always be unique.
  • 5
    @LLAMS

    Let's all just be happy this person isn't in a meeting rasing critical issues like this ;)
  • 2
    @N00bPancakes thats what i dont understand.. its my teachers statement
    btw. I thought maybe i was wrong..thinking of custom key generator. but custom key generators also doesn't always guarantee uniqness .. 🤔
  • 3
    @TeachMeCode No, it's not like that.If the last value is deleted, it will not be repeated again
  • 2
    @Eklavya yeah that is a way.. but it is like a hack lol.. but i agree with u
  • 9
    @Eklavya No. Bad. Bad bad bad.

    Why? Two subsequent inserts will have colliding IDs. and if a single db instance handles this nicely, two will not.
  • 4
    @Eklavya I’ll admit it’s a guess, never really read into it. I always assumed that the last value is always stored somewhere and increments when a new row is added to guarantee uniqueness
  • 7
    I'm no DB expert (and I say this because of that) but generally if a database does a thing for me like auto increment IDs ... man they do it right just fine on their own.

    If they didn't I sure as hell would have heard about it.

    Let the database do it's database thing has been a good rule of thumb for me.
  • 3
    Also there are many approaches for many things in coding.. some are good practice and some are bad.. but nothing is wrong if it works completely. I hate teachers who sticks to one way and thinks other ways are always wrong. :(
  • 3
    @AurthurM

    It would be interesting to hear her response if you asked about what is bad about that situation.

    Personally I wouldn't challenge the professor on it... i'd just want to hear why they don't like that scenario / situation.
  • 2
    @N00bPancakes yeah... Now i am confident that it is not a wrong way..
  • 2
    @N00bPancakes i will keep on updating.. btw it willl jeopardize my project marks lmao
  • 6
    @AurthurM nah don’t jeopardize your project, just do what she wants and forget all the crap she taught you when you leave
  • 2
    @AurthurM

    I wouldn't jeopardize any relationships or marks over it.
  • 3
    @TeachMeCode best advice i got today.Thank you
  • 14
    Autoinc pks are fine.

    They’re the fastest for the db to generate and verify (and index). By contrast, uuids take a small but significant amount of time to generate, and longer to check. Autoinc pks also have the same amount of problems (or fewer) than other schemes, and are also the easiest to understand.

    Two caveats:
    1) do not display these to the user. Ever. Why? They are trivial to predict and therefore allow scraping. Make a secure id column that’s unique but random. (Suggestion: prefix a few chars that hint at the object type; helps for debugging and customer support.)
    2) When sharding the db, you’ll have to modify the pk scheme. Why? Two simultaneous inserts on two dbs will generate rows with colliding autoinc pks. There are several approaches to this, all with pros and cons. Best I’ve found: compound ID. Database id + separator + pk. A previous company used “J” because it didn’t look like a letter (and some ego reasons). But this way the China db would have a record like 1J12345678, which prevented a collision when the US db had 0J12345678 as a pk.

    tl;dr
    your teacher had an idea that autoinc pls could cause problems, but didn’t really understand why, and jumped to the wrong conclusion. Too little understanding is always a bad thing.
  • 1
    @Root thanks for your answer.. i will keep that in mind.
  • 3
    One point people miss is auto increment uses locks in most databases. It depends on where you use it and most of the times it won't cause issues, but beware of this fact.
  • 1
    @A4Abhiraj can you elaborate incase of mysql db
  • 4
    Continuing, you shouldn't be afraid to use auto increments in concurrent applications, general scenarios won't be problematic.
    There will be a shared lock between inserts, it's designed not to block parallel inserts but the critical part is still blocking, n+1 isn't blocking, the part where you retrieve n is. (Think of how would you yourself implement a simple auto incr)

    Second consideration here - in heavy parallel transactions, you will anyways find gaps between numbers, so don't base anything on the assumption that there won't be any holes in this sequence, what's guaranteed is a uniqueness and a sequence, not 'consecutive'ness

    Point being, the holes part isn't just about delete. It's knowledge that's passed around without context in the tech community. Auto Inc causes holes also due to parallelism and a fail fast approach.
  • 1
    @A4Abhiraj i understand your point. So what do you think we should use to avoid such cases?
  • 1
    @A4Abhiraj and in google login we get googles 21 digit int id .. are those unique to each users?
  • 4
    Reference to locking -
    https://dev.mysql.com/doc/refman/...

    The first question is do you need to avoid this.
    My experience comes from using postgres as a queue(using notification on trigger) and we started with auto increment and with every insert, the time to insert went up.
    It's a big no for this scenario. Figure out if concurrent inserts are in question and does the next insert depend on the previous. ( Not for the pkey, but for application logic, like the above example)

    And you can use UUID instead. Slightly heavier to calculate. And recent databases support indexing uuids by saving them as binary data, not chars.

    MySQL specific reference -
    https://stitcher.io/blog/...
  • 4
    @AurthurM not sure which ID, but I'm guessing they are unique.
    Uuids will have a format of xxxx-xxxx-xxxx-xxxx-xxxx
    But it depends on various standards and there are a few. Perhaps Google uses UUID and then converts it to base64 string to reduce the size.

    Or maybe they add a date stamp to the end, because companies like Google can still have collisions even with UUID.
    Amazon recently had some issue I heard.
  • 1
    @A4Abhiraj ok. I will keep them in mind. Thaks for explanation.
  • 3
    @A4Abhiraj Everything will be a nightmare at that scale. Sounds like a fun problem to solve! :)
  • 1
    @Root yeah the chances of collision are never zero, they just tend to zero.
    I've seen people mention using 2 uuids. The solution to this is reducing the chances of collision, rather than making it 0.
  • 2
    @A4Abhiraj Sounds like a flawed approach. If something you’re doing is to mitigate a problem when it happens, the better approach would be to find a way to prevent it from happening in the first place.

    Here the two uuids are mitigating a race condition.
  • 1
    @Root there's no race condition with uuids , atleast as a process, every UUID function runs on its own, can be run on different servers and will be unique (one standard depends on the MAC address)
    As a theory, yes a race condition in the sense that 2 rows have the same UUID.

    Read here
    https://stackoverflow.com/questions...
  • 1
    @Root I'm sure there will be mathematical models to have true unique values.
    Software has always been a midway between perfection and practicality.
    UUID is one such example..
  • 4
    @A4Abhiraj Exactly. The problem isn’t in the uuid, it’s elsewhere.

    Like i said, everything at that scale is a nightmare — because it’s complicated with many edge cases, and the solution must be perfect in every way. Fun problem.
  • 4
    It is bad because your stupid co-workers make code dependent on incrementing that id. I puked when I saw him for each to match id
  • 1
  • 2
    @Root sadly true story. The same person made 4k lines php backend API.

    He left the company to work in the bank which is nice. :D
  • 3
    @danielstaleiny um he did what??? It sounds like he queried a million rows then tried to loop through each to find an id that matches what was passed in. That’s pretty sad
  • 2
    @danielstaleiny someone's money is in bad hands 😂
  • 1
    @danielstaleiny he left to work at a bank as a...?
  • 2
    @TeachMeCode good question. :D I am pretty sure as software developer. Don't ask me how he passed the interview.
  • 3
    The id number not being used again when deleted is a good thing not a bad thing, don’t try to change that behaviour.

    But I would argue for using a UUID instead so that you can generate it before writing to the database a d apply it to other things - maybe you have a message to put into a bus, a file to write, a letter to print etc. Whatever it is. It’s easier to do that with a UUID than it is to save to the database and then apply the id to those examples
  • 2
    Your teacher is wrong for almost all use cases.

    Only if guessing of IDs is an actuall attack scenario (like when you completely mess up your authorization and/or rate limiting), a source of strictly monotonic increasing positive integral numbers (that is, what autoincrement represents) doesn't cut it.
    Don't forget, to set the initial value to some random number having enough digits if users shouldn't be able to estimate the total amount of records at creation time of the record.

    If users must not be able to infer order of creation for any two records, you do actually need the record IDs to be random. But even then you can still use autoincrement IDs internally and map them to randomly generated public IDs when they need to be exposed (don't derive them from the internal ID).
    Chances are, that you aren't stumbling over that use case even once in your life as a coder...
  • 1
    @Oktokolo yeah in my project those ids are not exposed to user
  • 1
    @AurthurM
    Then i don't see any point in not using the automatic sequence of guarenteed-to-be-unique IDs provided by the one entity, which ist most able and also absurdly optimized to do so.

    But don't worry: Teachers are wrong all the time.
    Try to get his reasons and then debunk them one by one. After that he either stops insisting on that fake news or he is an idiot.
  • 1
    @Oktokolo yeah.. i will do that :)
  • 2
    A lot of this rant should be put in big fat brackets followed by the sentence

    "theoretically, but only under certain circumstances that in most use cases are highly unlikely".

    I'll try to make it quick:

    - the dreaded rollover is the common scenario: an auto increment reaches the maximum value of the type it can hold and thus - after incrementing - rolls over to 0 and starts at 1 again

    Highly unlikely. MySQL e.g. has INT unsigned - 4_294_967_295 is the maximum value.

    Unless you do really really really really frequent inserts, this will be hard to reach.

    - the tales of transactions and locking.

    Very short: if you come to the point where this matters, you'll have to deal with a lot of problems. And most likely you'll know what you are doing. Cause it's hard to provoke transaction problems and require e.g. an change of the default isolation mode for transactions, unless you try to use the database for advanced locking (advanced in the sense of actively locking certain rows, e.g. creating semaphores or stuff like that).

    Which requires a lot of knowledge, cause this is a pain in the butt... And really a bad idea for many reasons. Not impossible, but if you don't know what you're doing, you're literally standing in an rain of invisible arrows and end up getting pierced by them. (not joking. This is really really really hard stuff).

    Usually you can ignore this as long as you use the default isolation mode and an transactional engine and don't decide to poke into the default behaviours (shared row locking, trying to be too clever, over optimizing, REPLACE / ON DUPLICATE KEY UPDATE...)

    - the promised land of UUIDs.

    Ugh. This is wrong. For many many reasons.

    Unless the database provides an integrated datatype for UUIDs, think twice.

    UUIDs exist in many forms, commonly (but erroneously) UUID refers to UUID V4.

    36 chars are _a lot_ in databases, especially when used as a PK.

    This matters. UUIDs as string are a bad idea.

    Don't do if you have alternatives.
  • 0
    @IntrusionCM what should be the alternative from your pov?..
  • 0
    @AurthurM

    The cats motto: If it fits, it sits.

    Auto increment keys are fine.

    What @Root says is true - auto increment keys are predictable, but again: only consider this as an security issue if it matters.

    E.g. in an API endpoint with authentication and user permissions, I see no reason to not use auto increment IDs. After all, if the permission system does not work, the predictability of the Auto Inc IDs is the least of your problems...

    Same for everything that happens only internal - e.g. with no possibility of user / client access.

    If you have to obfuscate, try to look for the simplest solution.

    Don't over anticipate - it's what ends usually in the most complex solution, only adding burden and not really solving stuff.

    E.g. most people fear hash collisions.

    As I said before - an INT 32 unsigned is really large.

    The easiest solution is to just generate a SHA 256 hash of the ID plus a random generated salt, and return the first half.

    16 chars.

    Easily done in any ORM, only used for lookups.

    Should it ever fail, which is really unlikely, you can just retry.

    Note: this is just a random generated string. The SHA256 is just for lengthening the string, the random value to obfuscate.
  • 3
    This is why I tought myself everything, and am afraid to send my son to school.
  • 2
    @Nihil75 Yes. This, exactly.
  • 3
    Your teacher is a moron (like most of them).
  • 1
    Ignore teachers. Do your own research.

    Autoincrementing integer primary keys are used EVERYWHERE in industry. You could use GUIDs, and if you were using postgresql, you could have the DB autogen them for you. But GUIDs aren't guaranteed to be unique (extreme edge case), so I'd go with the autoincrementing id. Autoincrementing ids won't let you down.
  • 1
    It works fine in basically all cases. If you have a foreign key referring to either of multiple tables and don't have a discriminator column (which is already a pretty fucked up sitch) or if you want the IDs to go client side and prevent guessing, use UUIDs.
  • 0
    Now i am still using autoinc primary key but with uuid as secondary key .
  • 0
    UUID will be always bigger than incrementing a number why bother ?
Add Comment