Sunday 24 November 2013

Should you translate your content through Google Translate?

The short answer is: no, you should not. Not automatically, anyway.

The long(er) answer is that Google does such a horrible job of translating that the result is either hard to read or simply doesn't mean the same as the original.

For example:

Dutch: "Mijn vlakbank kan niet van dikte zagen"
Actual meaning: "my planing bench cannot cut a board to the correct thickness"
Google: "My bank can not just cutting thickness."

Dutch: "Ik gebruik een schep om te scheppen."
Actual meaning: "I use a shovel for shoveling"
Google: "I use a shovel to create."

Dutch: "Vertel nog eens een moppie."
Actual meaning: "Tell me another joke"
Google: "Tell me a baby."

Dutch: "Deze grijze trui is mooi wollig, zit lekker en staat heel ruig voor op de kop af dertien euro."
Actual meaning: "This gray sweater is beautifully woolly, comfortable to wear and makes you look real rough, for exactly thirteen euros."
Google: (and I'm not making this up) "This gray sweater is nice woolly, is delicious and is very rough on the nose for thirteen euros."

Dutch: "Bing is beter than google."
Actual meaning: "Bing is better than google"
Google: "Is Bing better than google." (it's subtle, but the meaning goes from a statement to a question)

For those of you who say that "any translation is better than no translation", think again.
Firstly, when visitors read a series of texts with very poor grammar and spelling, they will move on. Nothing gives the impression of amateurism and ineptitude more than a text full of grammar and spelling mistakes.

Secondly, the translated texts simply don't mean the same as the original. They will be indexed by the search engines, and you will be ranked under keywords that have nothing to do with your site. Your website about gardening equipment is never going to be found in the target language if every word related to "shoveling" is translated as "creating" instead, and nobody is going to buy a sweater that is "delicious and rough on the nose".

So no, don't automatically translate your content through Google. If you want to offer translations as an extra service, create a button that opens a popup with Google Translate in it, so the visitor can see that it's Google that's making the mistakes and not you.

Don't contradict the locals!

"In the land of the blind, the one-eyed man is king."

Every forum has a group of locals who spend most of their forum-time on that particular forum to share their knowledge with those who post questions. This is a good thing.

Being on the forum 24/7, they answer most of the questions, and because most questions are posted by people who know next to nothing about the subject, the experts' answers are accepted without question. This is a bad thing.

It's bad because never being told that they are wrong makes the locals feel that they are right. About everything. So when you go to a forum and see someone tell a newbie that 1+1=3, and you point out that it is in fact 2, *you* are the bad guy for disagreeing with the expert.

But what makes an expert an expert? Learning! If you have knowledge and someone says you are wrong, you check his claims. If you are in fact wrong, congrats, you've just fixed a bug in your knowledge! If you're right, congrats, you've fixed a bug in his knowledge!

As Mythbuster Adam Savage says: "I love being wrong because it means I'll learn something new."

So why are there so many forum users and "experts" who will set fire to you if you dare to disagree? How will they ever learn more than they know? How will they ever become actual experts if they insist that they already know everything and are incapable of making mistakes?

If anybody out there knows of a forum where the locals have social skills and a desire to learn rather than pretend that they are brilliant, I'd love to know about it because I can't find any.

Saturday 23 November 2013

To ENUM or not to ENUM?

An ENUM restricts the allowed values for a column to a hardcoded set. This sounds exactly like what a foreign key does, but there are a few significant differences.

Firstly, an ENUM is part of the database schema, not of the data stored *in* the schema. It is not meant to be changed by a user; in fact, it is supposed to be impossible for anyone other than the administrator to change the values.

Secondly, a column that is controlled by an ENUM will sort its values according to the order of the ENUM definition, not according to the actual value. If the ENUM defines 'Zed' before 'Cucumber' then ORDER BY will return 'Zed' before 'Cucumber'.

So why would you use an ENUM if it's not easy to change and doesn't sort properly?
Well, that's exactly why: the options are not going to be changed by a user and the sorting is hardcoded. This means that the rest of the database can make assumptions about this without having to look values up and manually sort them.
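
The difference between the two sort orders can be sketched in a few lines of Python. This is only a model of the behaviour described above, not how any database implements it; the list ENUM_VALUES and both helper functions are made up for the illustration.

```python
# Hypothetical ENUM declaration: the order written here is the sort order.
ENUM_VALUES = ["Zed", "Cucumber", "Apple"]

# Map each label to its position in the declaration.
POSITION = {label: index for index, label in enumerate(ENUM_VALUES)}


def sort_like_enum(rows):
    """Sort labels by their position in the ENUM declaration."""
    return sorted(rows, key=POSITION.__getitem__)


def sort_like_text(rows):
    """Sort labels alphabetically, as a plain text column would."""
    return sorted(rows)


rows = ["Cucumber", "Apple", "Zed"]
print(sort_like_enum(rows))  # declaration order: Zed, Cucumber, Apple
print(sort_like_text(rows))  # alphabetical order: Apple, Cucumber, Zed
```

Code that relies on the declaration order never has to look the labels up or sort them by hand, which is exactly the assumption an ENUM lets the rest of the database make.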

How does ENUM work in real life?

For MySQL the answer is simple: it doesn't. MySQL's implementation is very bad; it does not even reject duplicate values in the declaration, it merely warns:
mysql> create temporary table foo (a ENUM('a','a'));
Query OK, 0 rows affected, 1 warning (0.01 sec)
And removing existing options from an ENUM... well, best not to think about that, because MySQL wouldn't be MySQL if it didn't just blank out the fields that point to the removed value (outside strict SQL mode they become the empty-string error value with nothing more than a warning, so... bye bye data).
MySQL has no alternatives besides messing about with triggers, so use a lookup table instead.

In PostgreSQL the ENUM is enforced properly, and it is done through a custom data type, so you can re-use the definition in every table that needs it, eliminating the chance of discrepancies. As per the manual:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (
    name text,
    current_mood mood
);

The ENUM has one property that is often unexpected (and undesired): its values sort by their order in the ENUM definition, so in this example an "ORDER BY current_mood" would list 'sad' first, then 'ok' and then 'happy'.

If you do need ordering to work normally, you can use a CHECK constraint to simply check whether the value is in a fixed list, such as IN ('sad','ok','happy'). And you can make that re-usable by creating a DOMAIN for it:

CREATE DOMAIN current_mood_domain AS TEXT
   CHECK (VALUE IN ('happy', 'very happy', 'ecstatic'));
CREATE TABLE foo (current_mood current_mood_domain);
INSERT INTO foo(current_mood) VALUES ('happy');  -- accepted
INSERT INTO foo(current_mood) VALUES ('angry');  -- rejected: violates the domain's CHECK constraint

Thursday 7 November 2013


When something happens on your website that prevents the normal flow from completing, there is only one way to handle it correctly:

Show an error message that explains to the visitor what is going on, and use header() to send HTTP status code 500 (internal server error) or 503 (service temporarily unavailable).
The error message is the obvious part: your visitors need to know that something is wrong, so you can't just show a blank page. The HTTP status code prevents search engines from treating your error message as new content or, worse, deciding that your whole website is broken and resetting your ranking.

Is it hard to do this properly? Not at all.
PHP has exceptions. In principle you can put a try{} block in your index.php with a catch() that sends the error message and the HTTP code. Any error that occurs in your scripts only has to throw an exception to end up in the outermost catch and have the message and status code sent.

This also means that die() and exit() appear only in that outermost catch(), and nowhere else. *NOWHERE*.

try {
    // Buffer all output so it can be thrown away if something goes wrong.
    ob_start();
    // Do all your work here.
    if (!fopen(...)) {
        // Something went wrong: throw an exception that is caught by the outermost catch block.
        throw new \Exception("Could not open file");
    }
    // Send all printed data to the browser.
    ob_end_flush();
} catch (\Exception $e) {
    // Throw away all printed data so that only the error message is sent to the browser.
    ob_end_clean();
    header('HTTP/1.1 503 Service Temporarily Unavailable');
    header('Status: 503 Service Temporarily Unavailable');
    echo "Sorry, the site is temporarily unavailable.";
}
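
The idea of a single outermost handler is not tied to PHP. As a rough sketch of the same pattern in Python, assuming a WSGI application: handle_request, the "/broken" path and the page text are all hypothetical and exist only for this illustration.

```python
def handle_request(environ):
    """Hypothetical page logic: raises on failure instead of exiting."""
    if environ.get("PATH_INFO") == "/broken":
        raise RuntimeError("Could not open file")
    return "Hello, visitor."


def application(environ, start_response):
    """WSGI entry point: the only place where errors become a response."""
    try:
        # Build the complete page first, so nothing is sent if it fails.
        body = handle_request(environ)
        status = "200 OK"
    except Exception:
        # Drop the partial page and answer with a 503, so that search
        # engines do not index the error message as content.
        body = "Sorry, the site is temporarily unavailable."
        status = "503 Service Unavailable"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

The page code only ever raises; deciding on the status code and the message happens in one place, just as with the outermost catch above.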

Saturday 2 November 2013

New Nested HSTORE development outperforms MongoDB

If you've ever used PostgreSQL's HSTORE feature then you'll know about its speed, ease of use and horrible inability to store nested data.

This is about to change.

The HSTORE developers are working on a nested version, and at a recent conference they showcased the new features. The highlight, which drew a round of applause, was a demonstration that HSTORE's document model can actually outperform MongoDB.

See: http://momjian.us/main/blogs/pgblog/2013.html#November_1_2013