dinsdag 23 december 2014

Why I like PostgreSQL more than MySQL: REPLACE()

"UPSERT", update a record or insert it if it does not exist,  is a feature that is sorely missing from most database engines.

MySQL has implemented a fools-upsert through REPLACE(), because it's not UPSERT, it's DELETE and INSERT. The statement will first delete any existing record and then create a new one.

This has a few side effects that usually don't show up until you've spent an hour or two debugging:

1. The DELETE will trigger ON DELETE CASCADE. So if you REPLACE a user record, then all records that refer to the user with an ON DELETE CASCADE will be deleted. Then the new user record is created, and what you see is: "the user record still exists the PK is still the same, but all referencing records have disappeared. WUT?!"

2. A new record is created, which means that any fields  that are not mentioned in the REPLACE query will be left at their default values. What you see is:"I did a REPLACE to change the name and now the user's date of birth is empty. WUT?!"

3. Because the DELETE will trigger the CASCADE, you will get a foreign-key violation when you try to REPLACE a record that has relations with a RESTRICT setting. What you see is: "I try to replace a record and it says that it cannot DELETE... WUT?!"

How does PostgreSQL handle this? It doesn't. If you want to change a record, you use UPDATE. If you want to delete a record, use DELETE. If you want to create a record, use CREATE.

maandag 8 december 2014

Stop living with your head in the cloud!

First there was the computer. Then people began to work on computers in environments where data had to be shared, and the network was created. With the need for a central location to store data came the server and with the first server-crash came the redundant setup with multiple servers replicating data between them. As computing power requirements grew, so did the number of servers, until the only realistic name for it was a "server farm". These farms worked well, the were a group if servers working together behind a firewall and loadbalancer, with internal communications making the whole setup fully redundant.

But thats old technology, says marketing, today we use the cloud. The cloud is fast, safe, cheap, secure and scalable, it's perfect!

That sounds just peachy, what kind of marvel is this new "cloud" thing exactly? Well that's the brilliant bit, and it is really brilliant, you see, it's a group if servers working together behind a firewall and loadbalancer, with internal communications making the whole setup fully redundant.

No, that's not a copy/paste error, a cloud really is exactly what we used to call a "server farm".

Now I can hear you mumbling:

"But the cloud is virtualized!"
Virtual servers are still servers, with the same issues regarding crashes and security. Making processing share hardware was considered a bad thing until the word "cloud" was invented.

"But the cloud doesn't use servers, it uses instances!"
It either uses instances of servers. There are options to run an application as a service, which introduces a whole new set of risks.

"The could cannot be hacked!"
It's hooked up to the internet, therefor it can be hacked.
There is no such thing as an unhackable setup, the fact that you can login means that a hacker can too. In fact the risk is greater at a cloud because they can get at you by loggin in to your cloud, or by getting into the cloud management service. It wasn't too long ago that I posted about a company that had to fold because someone got into their cloud and simply deleted the virtual servers. Boom.
At least with real servers you have to get into each one separately (unles the admin is a moron who uses the same credentials everywhere.... sadface)

"It is much more secure than a regular server!" Again: it is a regular server.

"But, are there really no advantages to a cloud over a server-farm?"
No. It *IS* a server farm.

There are certainly benefits to a server farm, but calling it a cloud  doesn't change what it is or how it works.

vrijdag 5 december 2014

Everyday PostgreSQL: de top-N berichten uit een tabel per groep.

Elke zoveel tijd kom je een vraag tegen als "Ik heb blogposts/producten/foto's in meerdere categorieen en nu wil ik per categorie de nieuwste X laten zien."

De standaard oplossing is ingewikkeld doen met lussen of correlated subqueries, maar in PostgreSQL (en meerdere van de betere databases) heb je toegang tot windowing, en daarmee wordt het ineens een elegante oplossing:

DROP TABLE IF EXISTS tmp_artikelen;
CREATE TEMPORARY TABLE tmp_artikelen (id SERIAL, title VARCHAR(100), group_id INT);
INSERT INTO tmp_artikelen VALUES
(1, 'Hoe maak je een salade?', 1),
(2, 'Het water staat hoog.',2),
(3, 'Twee maten om te meten',1),
(4, 'Geniet maar drink met mate.', 1),
(5, 'Wizzel en pop gaan uit.', 1),
(6, 'Tomas de locomotief.', 2),
(7, 'Het weer is weer weerbarstig.', 3),
(8, 'Gnurkie wap wap!', 3);

WITH data_plus_rownumber AS (
FROM tmp_artikelen)
SELECT * FROM data_plus_rownumber WHERE r <4;
De CTE maakt een query die naast de data uit de tabel ook het rij-nummer genereert op basis van het group_id en de volgorde van de id's per group_id. De feitelijke query pikt daaruit alle records met een rownummer van minder dan vier, met andere woorden: alleen de eerste drie per group_id.

Het enige wat je hier nog bij moet doen is een beetje sorteren en klaar ben je.