Optimized a query today. Before it timed out after 10 minutes, now it takes 4.3 seconds. Very proud.

Ranter

Awlex

17587

Comments

2

kamen

6786

3y

Hope it still carries out the same function as well...
8

ElectroArchiver

2847

3y

@kamen I can make it so fast, it doesn't even exist heh
6

darksideofyay

5788

3y

we had a query that was so bad that it wouldn't run in a week. i got it down to 20s. there was a ton of garbage on it lol
0

Lensflare

19978

3y

n^2 -> log(n)?
1

cb219

1044

3y

@Lensflare more like n^2 -> itlog(n)?
1

Lensflare

19978

3y

@cb219 sry I don’t know what itlog means. I googled it and read something about eggs. Still confused.
1

dev0x01

56

3y

@Lensflare BECAUSE EGGS
3

Midnight-shcode

4441

3y

ahhh, reminds me my first time i tried to optimize a query, got it from like 10 seconds to like 120 milliseconds, felt amazing :)

(wasn't as awesome of an optimization as it sounds like, since it was basically just written (by me) in the worst way possible the first time, since i gave it zero thought)
1

cb219

1044

3y

@Lensflare oh sry, just googled myself. 😂
I used a notation I've learned in university. Apparently not the "official" one. I meant log*(n). Iterated log.
2

nebula

1786

3y

gz :)

been there some years ago and sp_blitz helped me https://www.brentozar.com/blitz/

it was so satisfying to say that a query which executes multiple times a day got reduces from a minute or so to a few seconds just because I removed implicit conversation from varchar to nvarchar.
0

hjk101

5552

3y

What was the issue?
0

datablitz7

285

3y

Did you just change the query or also added/improved indexes to your tables?
0

Awlex

17587

3y

Damn, looking at what I wrote, i think I could write an article about it

Alright, time to explain what I did. Essentially we had a big query that would search for all device ids that should get a push notification for a new message while also considering the accounts currently on the device and the device settings.

This in itself isn't that expensive. What made the difference between the new performance and the old performance was that we also had to calculate how many unread messages are currently on all accounts & sub-accounts of the device.

After examining the query I found that the part that was taking the most time was the condition where I'd check if the message should go to one of the devices sub-accounts, which is saved in an array in the message.

The fix for this was to
1. Split this query in 2 part: Querying the receiver devices and counting the unread messages
2. Apply a GIN-index on the array of sub-accounts and replace `receiver_id IN message.sub_accoubts` with `ARRAY[receiver_id] <@ message.receivers` which essentially means "check if the array on the left occurs in the array on the right".

I know the second step sounds crazy, but that's essentially the only way I found make it use an index, because `IN` doesn't make use of them in postgresql at least.

Thanks to TDD for making this refactoring possible without bringing down prod.

Add Comment

Optimized a query today. Before it timed out after 10 minutes, now it takes 4.3 seconds. Very proud.

random