6

Interesting...

On Friday, I was playing with the ChatGPT integration in DBeaver. I was using the DBeaver sample SQLite database. This database has a couple of tables, among them Album and Artist, where Album has a foreign key into Artist.

So, I asked it:

"give me a query that lists all albums from artists who's name starts with s"

The query I got back was:

SELECT * FROM Album

Uhh, okay.

But then, I noticed that I wrote "who's" instead of 'whose', which would be proper grammatically. So, I changed that, and then I got this query:

SELECT * FROM Album WHERE ArtistId IN (SELECT ArtistId FROM Artist WHERE Name LIKE 'S%');

Hooray, that works! I'm not sure it's the best way to write the query... I might have written:

SELECT * FROM album a, artist r WHERE a.artistid=r.artistid AND r.name LIKE 'S%'

...I'd have to check to see if one performs better than the other, and consider which syntax I find clearer, but that's a separate issue, it's just nice to see a working, reasonable query generated because that's the point, after all.

But I found it interesting that such a minor error would cause it to not work, that's my main point.

Interestingly, it seems to have learned: I just tried the same thing, and I got the right query either way. So that's pretty cool.

It's a pretty neat feature and I can see some legitimate value in it. I'm pretty good writing SQL myself... I've managed to write some truly hideously complex queries over the years... but there are definitely instances I can recall where the query didn't seem obvious at the start, and having an AI that can MAYBE produce something that is AT LEAST a starting point is definitely something I can get onboard with.

Comments
  • 2
    The effectiveness of either query depends on the database in use. Explain-analyze plans for both queries and see what costs are assigned and what overall algorithms are used.

    P.S. plans will change over time as growing amount of data decays the performance of a particular plan and it has to be adapted.
  • 5
    I wouldn't even use the "table1, table2 where"

    Use a proper JOIN.

    SELECT * FROM Album JOIN Artist ON Album.ArtistId = Artist.ArtistId WHERE Artist.Name LIKE 'S%';

    The option ChatGPT gave you is also possible of you have a ton of album rows but only a few artists rows.

    The join is more efficient as long as you have correct indexes on the tables.
  • 3
    There's a good chance it didn't learn, you were just unlucky with the first one. Remember that GPT generation is stochastic by nature. At the end, it randomly samples words from the distribution to continue the sentence. So if you ask it 10 times, you gonna get roughly 2-3 sentences, but might get 10 different ones if the RNG gods deem that appropriate
  • 0
    @Hazarth Ah, that's interesting, thanks for that explanation!
  • 2
    @Grumm

    I'd elaborate on that, cause it's a misconception.

    SQL-89 join syntax (jep, 89) is technically a cartesian product filtered by the where - as the where clause uses equality, it is an INNER JOIN.

    Any database with SQL-92 support will treat it (and probably rewrite it during optimize) with an inner join.

    Otherwise you would be right - in strict SQL 89 *without* an proper optimizer, the cartesian product would be produced first, then the equality check would filter (large result set, then trimmed down).

    Given that SQL - 89 is... *old*… I'd really doubt that any sane, modern database system would pull that stunt.

    Note that this applies only to a clean inner join, not an left join. Left join sucks.

    WHERE x.id = y.id (+) (oracle syntax)

    If I remember correctly, MsSQL phased that out... I think it was another char than the plus...
  • 0
    There is a special place in hell for people who use implicit joins or implicit converts.
  • 1
    @IntrusionCM Sure, but why keep writing SQL queries in the SQL-89 standard ?

    Just because the compiler can optimize it doesn't mean we cannot write it like that too ?

    I prefer to see a lot of 'JOIN' syntax.
  • 1
    @Grumm definitely, i just wanted to clean up the misconception regarding behaviour.

    I *hate* the SQL-89 syntax.

    Especially in long queries when your brain is tired from picking apart the rest, a sudden join specific part in an ocean of conditions is the worst.

    Brain goes boom.
Add Comment