dinsdag 30 september 2014

"Cache database queries heavily, because the database is always the biggest bottleneck in every PHP application."

Today I read onother of those lovely articles that tell you what to do in your web developement.

Lots of good advice, of course, and some pretty bad.

For example, the often quoted mantra: "Cache database queries heavily, because the database is always the biggest bottleneck in every PHP application."

The problem lies in the definition of "bottleneck". It is certainly true that SQL queries usually take more time to run than the rest of the PHP script put together, but that doesn't mean it's slow. It just takes that muche time, get over it. For some reason, PHP users seem to start their project with a benchmark of a script that just contains <?php ?> and then want to use caching to compensate for every extra microsecond.

But the real problem with caching queries is that you can't. The old saying is that there are two things difficult in programming, naming things and when to invalidate your cache. If you do decide to cache queries then you are not just going to run into the latter, you will soon find that you cannot find a way to reliably invalidate your cache. You simply cannot tell that the database has changed without looking at the database. The best you can do is set a timeout and pray that the stale data won't be a problem.... but it wil.... but you know it will.... So how can you make your database related work faster? Well that depends on what you are doing.

if the data from your query is going to be used for further mutations, then just don't cache it. Never. The problems you can get by accidentally feeding a procedure with stale data are simply not worth the performance gain. Don't cache the query that fetches a user's current credit, or his shopping cart, or the current stock for a product, or... well anything that changes while the user is working.

If the data is meant for display only then there is no real risk involved if it's stale. Still, I would never cache the queries, I'd cache the HTML that is generated from the query data. After all: why cache the source and waste time re-creating the HTML that will be identical for the same source data?

So... what if your queries are just too slow?

Make them faster!  Learn SQL, use indexes properly, use the features that your database has to offer, like partial indexes, stored functions, CTE's, windowing and the like.

Ok, but what if I really can't make thing fast enough and I really do need to cache query results?

Simple: don't. Have you not been listening?  Use a materialized view instead. That's a bit like a cache, but it's controlled by the only entity that *can* tell when data is stale: the database.

But please, never cache raw query output in your application, you're just digging a big hole for yourself.