Wednesday, February 26, 2014

Why I like PostgreSQL more than MySQL: Check constraints

Databases can basically only store strings and numbers. Your business, however, has a frightening number of rules and regulations about what the data should look like.

A sales price, for example, must always be more than zero. A discount must always be between zero and the sale price. A tax rate is always between 0% and 100%. The number of units per package is always one or more. A product cannot be marked as sold out if backorders are allowed. And so on...

In MySQL you can enforce such rules with triggers, which works, but is hardly elegant, let alone easy to manage.

In PostgreSQL you can use a CHECK constraint on a single field:

CREATE TABLE foo (
  id INT,
  price NUMERIC(10,2) CONSTRAINT legal_price_check CHECK (price > 0)
);
INSERT INTO foo (id, price) VALUES (1, 0);

"ERROR:  new row for relation "foo" violates check constraint "legal_price_check""

Or on the table as a whole:

CREATE TABLE foo (
  id INT,
  price NUMERIC(10,2),
  weight NUMERIC(5,2),
  shippingcost NUMERIC(4,2),
  CONSTRAINT legal_shipping_cost CHECK (
    (weight <= 100 AND shippingcost = 0)
    OR
    (weight > 100 AND shippingcost > 0)
  )
);

INSERT INTO foo (id, price, weight, shippingcost) VALUES (1,10, 90,0);
INSERT INTO foo (id, price, weight, shippingcost) VALUES (1,10, 90,10);

The first insert succeeds; the second is rejected because the shipping cost must be zero when the weight is 100 or less.

Checking the progress of a MySQL ALTER TABLE query?

I've often said that MySQL users spend way too much of their time working around its shortcomings, and today I saw this post:


It shows a way to see the progress of an ALTER TABLE statement in MySQL. Normally I'd say "that's a cool gadget for a special X-mas post in a blog", but he's dead serious about it: ALTER TABLE in MySQL rebuilds the entire table, which not only takes a long time on large tables, it also locks the table while doing it. In a large production system you cannot alter a large table without seriously disturbing the functionality of the system. Hence you want to be able to tell the users how long the system is going to be down; hence the progress bar.

Kudos for finding out how to do it, but seriously, shouldn't you switch to a database that does not suffer from this ridiculous issue in the first place?

Tuesday, February 25, 2014

Why I like PostgreSQL more than MySQL: PL/PgSQL

MySQL supports stored functions, true. Unfortunately it only supports the SQL-standard way, which is quite limited and not a very user-friendly language to write in. (Isn't it ironic: the one time they do stick to the standard is where they would have been better off using something custom.) One of the most annoying limitations of the SQL-standard PL language is that you cannot execute dynamic queries: queries that are composed in variables.

Dynamic queries allow you to build a query that suits the current requirements, depending on things like the contents of the parameters, the contents of the records and, of course, business rules.

This can be particularly useful for filtering and sorting. A function for fetching the latest products may include a sort-order parameter that controls which field should be used to determine the order, and that field will have to be added to the query in an ORDER BY clause. Obviously you could just create an ORDER BY clause that uses a large CASE statement, but query planners cannot optimize that because they can't predict which field will be used.

In PostgreSQL's PL/pgSQL you can include a command like this (from the manual):

EXECUTE 'UPDATE tbl SET '
        || quote_ident(colname)
        || ' = '
        || quote_literal(newvalue)
        || ' WHERE key = '
        || quote_literal(keyvalue);
As you can see, it concatenates a series of strings with quoted variables and calls EXECUTE on the result.
Effectively this means that your stored function can use complex logic to construct a query and then execute it.
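To make that concrete, here is a minimal sketch of the sorting example above. The table and parameter names are my own invention; it uses format() with %I, which quotes the identifier so the caller cannot inject arbitrary SQL through the sort-order parameter:

```sql
-- Hypothetical example: fetch the latest products, sorted on a
-- caller-chosen column. %I quote the column name as an identifier.
CREATE FUNCTION latest_products(sortorder TEXT)
RETURNS SETOF products AS $$
BEGIN
  RETURN QUERY EXECUTE format(
    'SELECT * FROM products ORDER BY %I DESC LIMIT 10',
    sortorder);
END;
$$ LANGUAGE plpgsql;

-- Usage: sort on whichever column the application asks for.
SELECT * FROM latest_products('created_at');
```

Because the column name is a plain value at call time, the planner sees a simple ORDER BY on one column and can use an index on it, which the big-CASE-statement trick prevents.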

"Isn't that dangerous?"

Talking to developers, #1: what's important in databases?

From a conversation with a "developer":

"What do you think is the most important aspect of a database?" I ask.

"Performance!" the developer says.

"What is performance?" I ask.

"Well it should be fast!"

"Fast at what?"

"Running queries!"

"What kind of queries?"

"All kinds, everything must be really fast."

"What about reliability?"

"What about it?"

"Well, shouldn't you be able to trust your database to save the data you tell it to save verbatim?"

"No, we do that in the application, the database just has to store the data, really quickly."

"Doesn't the application need time to make sure the data is valid?"

"Of course, but the database is much slower at that."

"Ok... so how do you do these validations?"

"We run queries on the database to see if the data we are about to send actually exists before we send it."


Saturday, February 22, 2014

Every day PostgreSQL: Using XML functions for quick HTML previews.

Over the past months I've been working with OpenCV for image processing. One of the projects I'm working on requires comparing histograms. I won't go into any sort of detail about that; suffice it to say that I now have a table that holds a few million comparison results from a few thousand images that have been compared to each other. The next step is of course to process the results and find images that meet the requirements. A regular query can return the names of the images, but I need to see the actual images to check if the query returned matching images. I could run the query, fetch the data into a language and write code that generates HTML with the names of the images in IMG tags, or I could use PostgreSQL's XML functions to generate an HTML page, so I only have to read one field from the result set and print it.

-- Create a table of images
CREATE TEMPORARY TABLE foo (id int, filename TEXT);

-- Step one: insert a few test images
INSERT INTO foo (id, filename) VALUES (1,'basil.jpg'),(2,'sybil.jpg'),(3,'polly.jpg');

-- Step two: aggregate the image tags inside a body tag.
-- I use a CTE here to make the code more readable.
WITH imagetags AS ( SELECT XMLELEMENT(name img, XMLATTRIBUTES(filename AS src)) AS imgtag FROM foo )
SELECT XMLELEMENT(name body, XMLAGG(imgtag)) FROM imagetags;

-- The final code, added an HTML tag and a header with a CSS link.
WITH imagetags AS ( SELECT XMLELEMENT(name img, XMLATTRIBUTES(filename AS src)) AS imgtag FROM foo)
SELECT XMLELEMENT(name html, XMLELEMENT(name head, XMLELEMENT(name link, XMLATTRIBUTES('style.css' AS href, 'stylesheet' AS rel))),
XMLELEMENT(name body, XMLAGG(imgtag))) as h FROM imagetags;

Now all I have to do in the program is execute the query, fetch one record and print the value of column 'h'.

How maintainable is this? Well, you could just plonk this query into a view, which makes updating it a matter of executing the CREATE statement of that view once for every change, just like you'd press 'save' after updating a PHP file. In both cases you have to reload the page to see the result, so there isn't much difference between the two methods.
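For instance, the final query could be wrapped like this (the view name is my own choice, and I'm assuming foo is a regular, non-temporary table):

```sql
-- Wrap the report in a view; changing the HTML layout is then
-- just a matter of re-running CREATE OR REPLACE VIEW.
CREATE OR REPLACE VIEW image_preview AS
WITH imagetags AS (
  SELECT XMLELEMENT(name img, XMLATTRIBUTES(filename AS src)) AS imgtag
  FROM foo
)
SELECT XMLELEMENT(name html,
         XMLELEMENT(name head,
           XMLELEMENT(name link,
             XMLATTRIBUTES('style.css' AS href, 'stylesheet' AS rel))),
         XMLELEMENT(name body, XMLAGG(imgtag))) AS h
FROM imagetags;

-- The program now only needs this one-liner:
SELECT h FROM image_preview;
```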

Friday, February 21, 2014

Composer security issue, take care!

Composer has a feature that allows code from one package to overwrite code from another package, and the creators seem reluctant to fix it. Short-term fix: don't update packages you installed through Composer. Long-term fix: hope that the Composer guys have a change of heart, because their initial reaction is "it's not a big deal". Get the details here: http://blog.astrumfutura.com/2014/02/composer-downloading-random-code-is-not-a-security-vulnerability/

Thursday, February 20, 2014

Why I like PostgreSQL more than MySQL: functional indexes

Indexes enable a database to look up records quickly. They contain a sorted subset of columns from the target table, which makes them small and fast to traverse.

However, they can only help if the clause that your query uses can be resolved by looking at the data that is stored in the index. For example, a query containing the clause "WHERE YEAR(datefield)=2014" will not use an index on datefield.

In MySQL the only thing you can do is write a trigger that saves the output of YEAR(datefield) into a separate column or even a separate table, and index that. This is very clumsy and difficult to use in your application.

In PostgreSQL you can simply create an index on the expression EXTRACT(YEAR FROM datefield), and the database will use it whenever a clause mentions that same expression. It will of course automatically update the index, just like any other index.

CREATE INDEX datefield_year_idx ON foo (EXTRACT(YEAR FROM datefield));

This type of indexing can also be used on XML and JSON fields.
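For example, the same trick works on a key inside a JSON document (the table, column and key names here are invented for illustration; note the extra parentheses, which CREATE INDEX requires around expressions that aren't plain function calls):

```sql
CREATE TABLE docs (id INT, data JSON);

-- Index the value of one key inside the JSON column.
CREATE INDEX docs_status_idx ON docs ((data->>'status'));

-- Queries that filter on the same expression can use the index:
SELECT * FROM docs WHERE data->>'status' = 'published';
```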

Monday, February 17, 2014

What is the best naming convention in databases?

The short answer is: the one you feel comfortable with.

The long answer is that every convention has its advantages and disadvantages, and both sides usually make valid points.

For example: I have always used the plural form for table names, because a table holds a collection of data from which a query makes a subselection. There is nothing singular about the entire thing. Arguments against this include that it is sometimes difficult to predict the plural form of a table name. That's sort of a valid point if you have to guess the names of your tables, but I tend to know the names of my tables, I don't guess them.

Similarly, some people like to name their foreign key fields after the tables and fields that are involved. Personally I prefer to name the foreign key field after the role it plays in the model, because that is the information I need when I'm working with the field. If a field contains the id of the person who is the teacher for the current record's user, I name the field "teacher_id". That makes sense to me. I already know that teachers are stored in the teachers table; I don't need to put that information in every field that refers to it.

And these kinds of arguments can be given for every rule of every convention.

So, when you speak to someone who says that his method is the best, he means "I like this". And that is all it means. Listen to the reasoning and remember the things that make sense to you and that help you write better code.

Sunday, February 9, 2014

Why I like PostgreSQL more than MySQL: LIKE '%hello' can use indexes.

LIKE searches for substrings, and if the search expression starts with a wildcard a B-tree index cannot be used (the index is sorted on the leading characters, which are most likely not even in the search expression).

In PostgreSQL however, you can create a trigram index (provided by the pg_trgm extension), which indexes every three-character substring of the value. So if your field contains "Hello", the index will contain "hel", "ell" and "llo", plus a few padded variants for the start and end of the string. (Yes, a frightening number of entries is created, so take care if high insert performance is required.) When searching, the search expression is chopped into the same trigrams, so the index can narrow the match down to candidate rows very quickly indeed.
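A minimal sketch of how to set this up (the table and column names are made up):

```sql
-- pg_trgm ships with PostgreSQL as a contrib extension.
CREATE EXTENSION pg_trgm;

CREATE TABLE products (id INT, name TEXT);

-- A GIN index over the trigrams of "name".
CREATE INDEX products_name_trgm_idx ON products
  USING gin (name gin_trgm_ops);

-- This leading-wildcard search can now use the index:
SELECT * FROM products WHERE name LIKE '%hello';
```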

Tuesday, February 4, 2014

Why I like PostgreSQL more than MySQL: triggers on foreign key events

Triggers let you... well, "trigger" business logic whenever a record's content changes. A common use for this is auditing changes, or preventing illegal values by triggering a check before accepting the query.

The single reason why you would want to do these things in a trigger is that the trigger will fire whenever a particular action is performed on the target table, regardless of how that change was initiated. If your PHP script issues a query, the trigger will fire. If you run a query by hand, the trigger will fire. If you update a record which leads to a cascading update, the trigger will fire.

Except when it's MySQL, because MySQL does not fire triggers if the change is the result of an ON DELETE CASCADE or ON UPDATE CASCADE. So if you are auditing changes to your shoppingcart table, you will never see an entry for records that are removed by a cascading delete on the order table.
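A minimal sketch of such an audit trigger in PostgreSQL (table, column and function names are invented for illustration):

```sql
-- Log every row removed from shoppingcart, no matter how the
-- DELETE was initiated: application query, manual query, or a
-- cascading delete on the order table.
CREATE TABLE cart_audit (cart_id INT, deleted_at TIMESTAMPTZ);

CREATE FUNCTION log_cart_delete() RETURNS trigger AS $$
BEGIN
  INSERT INTO cart_audit (cart_id, deleted_at) VALUES (OLD.id, now());
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER shoppingcart_audit
  AFTER DELETE ON shoppingcart
  FOR EACH ROW EXECUTE PROCEDURE log_cart_delete();
```

In PostgreSQL this fires for cascaded deletes too; in MySQL the equivalent trigger would silently miss them.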

PostgreSQL does not suffer from this asinine exception.