17

Issue reporter:
Feed from external service provider is overwriting the wrong things, pls fix

Resolution:
Service provider is sending duplicate unique IDs, you need to get with them to discuss

Service provider response:
This is a *RE-USABLE UNIQUE ID* . . . The bottom line is you should not use this UNIQUE ID as a primary key.

Comments
  • 25
    🙃
  • 3
    kind of makes sense... I mean, if I have 3 fruits

    1 - banana

    2 - apple

    3 - orange

    and I eat the orange, I then have

    1 - banana

    2 - apple

    Then I get a lemon and reuse the UID:

    1 - banana

    2 - apple

    3 - lemon

    Technically, it's still a UID. Not a single item in that table shares its ID with any other item in that same table, so at that point in time, all the IDs are unique.
  • 4
    @netikras this falls apart as soon as you have even a single other table reference the primary key in the first one
  • 3
    @netikras That brings up a good question. When we talk about a unique identifier, is unique an attribute of the identifier itself or an attribute of the _entity_ it identifies?
  • 1
    @localpost Noone said anything about references ;)

    @ethernetzero exactly. And I see OP and the 3rd-party are having the same disagreement
  • 2
    The service provider is stupid IMHO.

    A reusable UID is only useful if it can only identify one entity at any given time. For example, the Spanish national ID _is_ reusable but there can't be more than one living person with the same ID. If more than one entity can exist at the same time and have the same unique ID, then it never was a unique ID in the first place. Call it a category, a tag, a grouping ID or what have you, but not an UID in that case.
  • 5
    @netikras you are right and the service provider is technically right. But this is a very weird use of a unique id 😆.
    I can‘t think of a good reason to make it reusable. It only introduces problems into the system.
    Like in your example, oranges can become lemons.

    On a less serious note, if life gives you lemons, it might be because some dumbass decided that sharable unique ids are a good idea.
  • 2
    @ethernetzero that's a fair point and as far as I understand the situation there is a time component involved - the issue with this feed is that one entity was pulled in with this ID a little while ago, and now a different entity has that ID on the provider side.

    The difficulty is that the whole point of this feed is to keep our system up to date with information on the provider side, so it kind of relies on one ID referring to one single entity in perpetuity. I totally understand the time-related argument, it's just a funny/frustrating situation.

    This might be getting a bit philosophical, but in my mind unique means an ID that doesn't give me a different reference depending on _when_ I ask for it...
  • 3
    @Lensflare agreed, and in @netikras example I think a more appropriate name would be... "Stack level" or "stack ID"? Since in that context all the value is really telling you is the position in a list, and we may happen to assume that the items in the list are unique.

    I think it would be akin to performing lookups based on ROW_NUMBER() rather than an actual ID value.
  • 2
    Note from very lengthy, sometimes blood and gore filled discussions....

    If you stick a foreign identifier into your internal models, worse database, and use the foreign identifier as a primary key....

    ... You should really... Really.... Really... Smash your head on the table till the thought is gone permanently.

    I said in the beginning bloody and gore filled as many of these discussions become very violent due to the fact that many devs fail to see that an foreign identifier is something that **must** never be trusted.

    There are plethora of reasons for it.

    The simplest is: It's an foreign ID.

    Who knows, what Smeagol thought was their treasure and is actually a cursed ring that will lead Sauron to yourself...

    Foreign means it can change.

    It can have a different meaning depending on context, with you being uninformed what contexts exist or what the true meaning is.

    It can be an encoded value, where the encoding can change without further notice (and thus the value could be entirely different, e.g. what was a 16 char value with no special chars is now a 16 char value with special chars).

    It could be re-emitted as a way of "retry" mechanism, leading to an erroneous "duplicate key" error on your side.

    It might not have an defined or unknown type, e.g. a 128 bit / 16 byte integer can look like a 32 bit / 4 byte integer, but when that thing starts overflowing, you will be ruined.

    ....

    I could go on for hours.

    Don't rely on foreign information.

    Never use a foreign identifier as an primary key.

    Always treat foreign identifiers like a living grenade whose detonator is on and just waiting to explode.

    ...

    And most important: Don't argue with the two most unreasonable arguments against basic security measure - "it's redundant" and "it's inefficient", cause that might lead to blood and gore I mentioned.

    Better safe than having a multi tenant db going haywire "cause dev didn't want redundant information for efficiency".
  • 0
    Don't use client supplies ids for primary keys, use your own, and create a cross reference table
  • 0
    @netikras if you ever used any sensible database you would know that what happens (as it should) is that the lemon gets id 4.
  • 0
    @Midnight-shcode Wrong. Lemon gets whatever ID it's assigned.

    It does because whatever is inserting the lemon into the table relies on the DB's sequences to "generate" an ID for it. Regardless, the ID has to be explicitly assigned to the entry *before* persisting it. Whoever is running the INSERT has to explicitly specify the value for a FK. That value can be randomly generated, it can equally be a favourite colour of anyone in the room, it can be a sequence.next() value if you run out of imagination how to generate a unique value for it.

    Sequences are but helpers. Not a rule.
  • 0
    @Midnight-shcode FWIW sequences are not salvation. They do not scale horizontally (if you have an active-active app cluster with separate DBs, e.g. a DR), and they very quickly become sparse, i.e. even if the DB only had 3 items in the table (with IDs 1, 2, 3) that have never changed, the next added item can get an ID=3323562. Sequences do not roll back if the transaction is aborted.
  • 0
    @netikras ok let me rephrase.

    BY DEFAULT, when you let the DB engine decide, and when it's a sane relational database, and in your case I recommend doing that, since you seem to not understand why it's the default behavior, lemon gets assigned 4. said in a more general way: the next automatically assigned id is always the highestEverUsed+1 id, regardless of what's the highest ID currently existing in the db.

    the hint as to why this happens is in the name - relational.

    there might come a time when for various reasons a related record might not get it's "i'm related to this thing" column erased along with that thing itself.

    in which case, you want that ID to dangle into nothingness, instead of randomly attaching itself to some random-ass entry created sometime later.
  • 1
    @netikras

    "(if you have an active-active app cluster with separate DBs, e.g. a DR), and they very quickly become sparse, i.e. even if the DB only had 3 items in the table (with IDs 1, 2, 3) that have never changed, the next added item can get an ID=3323562. Sequences do not roll back if the transaction is aborted."

    good. i wouldn't ever want to even IMAGINE having to debug/figure out the temporal ID collision issue that I outlined in a fucking cluster. It's a nightmare enough to try and figure out what happened and how to fix it in a single DB accessed by a single app.
  • 1
    @Midnight-shcode

    This isn't true.

    In this case I'm nitpicky cause auto increment has a lot of gory details. It's safe to assume you describe auto increment / MySQL and not a sequence as in the SQL standard, cause a SEQUENCE does a whole lot more.

    But even the Auto Increment keys in MySQL aren't easy - relying on a behaviour of an auto increment key is another thing one should never do.

    It can have gaps - be it for rollback, be it for deletion, be it due to manual override, due to overflow...

    ... A lot of things sequence additionally handles.

    https://postgresql.org/docs/...

    As one example.

    The even more fun thing is that auto increment key and sequence can be seen as anti patterns in many ways.

    They're basically the poor man's choice for not finding a unique index based on the table data.
  • 1
    @IntrusionCM parts of what you wrote is stuff i don't know about, but the parts i do know i agree with.
    "unique key based on the data" - compound primary key is an option? and then using it as an FK? last time i tried to do something like that was a very long time ago, but back then it wasn't possible, afaik.

    still, i was responding to netikras, and the general logic of those two options discussed, and even though my response may be even more underinformed than i originally thought, within the framework of what i was responding to, i stand by my point, precisely for the reasons already stated.
  • 0
    @Midnight-shcode could be a matter of taste. Could be a matter of use case. But at least half of the tables I had to ever create - I wished I didn't have to rely on sequences. Esp after sec tests :)
  • 1
    @Midnight-shcode

    A foreign key composed of multiple columns, compound, is possible.

    It depends a lot on the RDBMs - as some RDBMs are limited regarding the possibilities, especially in regards to storing the data. Storing data as compound keys might exceed limits regarding size (e.g. MySQL InnoDB allows [ancient versions require large_prefix] only up to 3072 bytes per whole index key).

    But - hence my comment on being an anti-pattern - when you have already defined an sequence to be an primary key, you would require an unique index for the compound key to define an foreign key.

    Which would lead to a massive resource loss - as you have an primary key, an unique key plus the foreign key.

    Anti-pattern for the sequence in this case as you added an artificial primary key despite having an alternative, non-artificial PK.

    Just to clarify: I can understand your point, I just wanted to make it very clear that the expectation that an auto - increment or sequence doesn't overflow or is a "true" sequence of always increment one is a dangerous one.

    Dangerous as the expectation doesn't match reality. You can enforce a true sequence of plus one, but this requires locking and a great deal of invasive design - which is in almost all cases a very bad idea TM.

    That was the reason I pointed it out, cause many have this "idea" stuck in their head that an auto-increment / sequence generates a true plus 1 increment **always**.

    I'm not in favor of recycling IDs - but as an overflow is almost always a possibility with auto-increments or sequences, one should plan carefully how to avoid it.

    long (64 bit) or even long long (128 bit) integers are just a way of sacrificing resources to make an obvious problem an unlikely problem ;)
  • 1
    @IntrusionCM
    1. thanks for the info
    2. i understand your point now and i agree
Add Comment