help

Ranter

rox79

823

Comments

1

hitko

3005

4y

The part you're doing wrong is trying to process large data by sending it to a web server and waiting for a response. That's not what web is for. Also I'm 99% sure you're trying to do something really dumb, because when you have a legit reason to send that much data and process it on a server, you normally already know how to do it the right way (hint: it doesn't include waiting for the task to complete within the same request).
0

100110111

3778

4y

What @hitko said. Now you may want to look into event-driven architecture and sse (though I’m not actually sure how to apply to PHP)
0

C0D4

64776

4y

There's a few things that come to mind to get this working again.

1) optimise the query to reduce execution time.

2) once optimised to the point it doesn't get any better, load it to a temporary file instead of sending it back to the browser, and zip it up.
Then email a link back to the user and download the file and have it deleted.

3) if this still takes to long, run it as a cronjob from the server side, create a db table with a queue of sorts and schedule a job to read this db table and any record in that table would generate said data export and remove its self from the queue - this would remove the time dependency as the server is initiating the php job and not apache / Nginx.

4) if it still doesn't work, log into the DB yourself and run the query directly 😅

5) go back to 1, but now add limit and offset to the end of the query and make them arguments you can send through and download the data in chunks instead of the entire thing at once.

5.1) I see you made it back here, now try again with smaller chunk s until the thing works.
1

rox79

823

4y

@C0D4 yeah 3 seems the last option. 😔 If nothing helps today I will create batches and run cron
1

100110111

3778

4y

@rox79 n:o 3 would pretty much be my first option if this had even the remotest possibility of becoming a recurring issue (also, n:o 1 as well - optimize the fuck out of everything!)
1

C0D4

64776

4y

@rox79 you may want to reread for future you to consider options.

@100110111 depends on skill level, #1 is usually enough though, unless it's a tonne of data or really poor joins.
0

100110111

3778

4y

@C0D4 fair enough. My issue with optimizing the queries as the only course of action is that it doesn’t necessarily scale well (assuming the db also gets new data inserted into and actually grows in size). I personally would want to remove a future pain point away as early as possible. People don’t think of scalability nearly enough, I’ve come to notice. We have an example on an application that I work with, where some daily run queries where designed and implemented 3-4 years ago, - ran fast nuff back then, but stull big nuff a job to need a solution quite like your 3rd point - but the data to be queried has grown so vast the job now takes over 2h to complete. It* was easy to circumvent but the problem will persist until I have the time to tackle the issue. I know how to fix it, but I’m stuck with more urgent issues now, so it’ll have to wait.

* the job had a timeout set for 2h, where it would be considered as failed and retried. So for a while we got the results of the job twice and since it ran twice, it also incurred computing costs.

Related Rants

Add Comment

devrant

suggestion