zondag 23 maart 2014

SQL Performance thingies , part 1. Don't modify a datetime to compare against NOW().

When searching for records of a particular age, people tend to do this:

SELECT * FROM table WHERE (NOW() - datefield) >= INTERVAL 5 DAY;


SELECT * FROM table WHERE datefield + INTERVAL 5 DAY >= NOW()

These are bad because they require the database to calculate the difference for all records, which is a no-index operation.

The best way to do this is to calculate the actual date  value that marks '5 days ago'. ie: NOW() - INTERVAL 5 DAY:

SELECT * FROM table WHERE datefield >= NOW()  - INTERVAL 5 DAY;

This way the database will see that NOW()-INTERVAL 5 DAY is a constant within the transaction, and calculate it only once. Comparing a date field against a constant can use an index so this is the fastest way of doing this.

zondag 9 maart 2014

Everyday PostgreSQL: Counting votes.

Problem: a movie fansite allows users to vote, and rather than try to block users from voting multiple times, they just log every vote and make their minds up later.


CREATE TABLE movies (name VARCHAR(100));
CREATE TABLE votes (userid INT, moviename VARCHAR(100));

INSERT INTO movies (name) VALUES ('Austin Powers'), ('Tank girl'), ('The fifth element');
INSERT INTO votes (userid, moviename) VALUES (1,'Tank girl'),(1,'Tank girl'),(1,'Tank girl'),(1,'The fifth element'),(2,'Austin Powers'),(2,'Tank girl'),(3,'Tank girl'),(3, 'The fifth element');

So, how do you count unique votes per user, per movie?

One way is to do exactly what the problem describes; count the number of unique votes per movie.
First; create a list of unique votes per movie, using a plain old DISTINCT:

SELECT DISTINCT userid, moviename FROM votes;

Then use that as a sourc of vote information  in a nice CTE:

WITH distinctvotes AS (SELECT DISTINCT userid, moviename FROM votes)
SELECT COUNT(*),moviename FROM distinctvotes INNER JOIN movies ON movies.name=distinctvotes.moviename GROUP BY moviename

This approach also enables you to define more rules for which votes count and which don't. The CTE query could for example exclude al users who voted more than ten times in the same minute. Putting that query in a CTE leaves the actual count query clean and easy to manage.

woensdag 5 maart 2014

Stored procedures, yes or no?

Today I stumbled onto a stack-exchange post by a guy who was very, very clear about how extremely bad practice it is to use stored functions. His argument (in it's entirety) was that his 30+ years of experience tought him that they are a maintenance nightmare.

So, is that true? No, not really, but I understand what he's on about.

When you work on scripts or source code you can update each file separately and in the case of PHP or Python you can even update separate files directly on a production server. A stored procedure can depend on other database entities, some of which may even have to be dropped before the update can proceed.

The thing is; you can't really just update separate files on a production server, for bovious reasons, I would think. A proper update should always bring the system into a single-user mode, then do the update, test the new code, and then bring the server back into mutli-user mode.

If the system is going into single-user mode anyway, it nolonger matters how many changes you make, you can easily execute a single SQL file that removes and re-creates a set of dependent SP's. It is exactly the same operation you would do if you had to update several interdependent sourcefiles.

So no, it's not a nightmare, it's a simple matter of getting your procedures in order.