A long journey to formatting a date internationally in PHP

2017-03-20: xBazilio has rewritten my PHP-CPP extension as a native PHP 7 extension; give it a try. The repo is the same: https://github.com/ksimka/intl_dtpg.

TL;DR: At the moment you can't simply format a date internationally without a year. The best solution so far is a bunch of config files with prepared patterns for every locale you need.

Let's start with a simple problem: format a date internationally so that it contains a day, a full month name and a year. It's a really easy one. PHP has an i18n extension called intl, which has been bundled with PHP since 5.3. And intl has the IntlDateFormatter class. We will use its LONG format.

<?php

foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {  
    $formatter = new IntlDateFormatter(
        $locale, 
        IntlDateFormatter::LONG, 
        IntlDateFormatter::NONE, 
        'Europe/Moscow'
    );
    echo $formatter->format(1455111783), PHP_EOL;
}

Output:

February 10, 2016  
10 февраля 2016 г.  
10 de febrero de 2016  
۱۰ فوریهٔ ۲۰۱۶ م. // this is actually RTL text, but I'm not sure this blog can handle it properly

So far so good. Now let's slightly change the initial conditions: format a date internationally so that it contains only a day and a full month name.

What we want is

February 10  
10 февраля  
10 de febrero  
۱۰ فوریهٔ م. // not sure that I edited this one correctly, but who cares?

Now it's not as simple as it seems.

Honestly, this whole post is about this very problem.

Wait, but why?

You may wonder why on earth I need that kind of format. The answer is obvious if you've ever worked with any kind of timestamped feed. If you haven't, let's look at Twitter.

[Screenshot: Twitter date formats]

If a tweet was posted recently, Twitter shows you a smartly formatted time interval. If it was posted a long time ago, you'll see a smartly formatted date; moreover, if it happened this year, the date has no year part. And that's great. Smart date formats == good UX.

So yes, we want our users to grasp when something happened as quickly as possible. Smart formatting FTW.

Now that you know, let's get back to our problem.
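The Twitter-like rule above fits in a few lines of PHP. This is a minimal sketch, not Twitter's actual logic: the function name chooseDateFormatKey, the one-day threshold for relative intervals and the format keys are all hypothetical, and both dates are assumed to be in the same (user's) timezone so that comparing years makes sense.

```php
<?php

/**
 * Hypothetical helper: pick which kind of format to show for a feed item.
 * Assumes $date and $now are in the same timezone (the user's).
 */
function chooseDateFormatKey(DateTimeImmutable $date, DateTimeImmutable $now): string
{
    $secondsAgo = $now->getTimestamp() - $date->getTimestamp();

    // Recent items: show a relative interval such as "5 min ago".
    if ($secondsAgo >= 0 && $secondsAgo < 86400) {
        return 'relative';
    }

    // Same calendar year: the year is redundant, so drop it.
    if ($date->format('Y') === $now->format('Y')) {
        return 'long_no_year';
    }

    // Older items: show the full date, year included.
    return 'long';
}

$tz  = new DateTimeZone('Europe/Moscow');
$now = new DateTimeImmutable('2016-02-11 12:00:00', $tz);

echo chooseDateFormatKey(new DateTimeImmutable('2016-02-11 09:30:00', $tz), $now), PHP_EOL; // relative
echo chooseDateFormatKey(new DateTimeImmutable('2016-01-05 10:00:00', $tz), $now), PHP_EOL; // long_no_year
echo chooseDateFormatKey(new DateTimeImmutable('2015-12-31 10:00:00', $tz), $now), PHP_EOL; // long
```

The hard part is not this decision but what 'long_no_year' should look like in each locale, which is what the rest of the post is about.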

Ok, there must be a format for this, like LONG, but without a year, right?

Wrong. There are only four standard formats: FULL, LONG, MEDIUM and SHORT, and they all have a year.

<?php

$formats = [
    IntlDateFormatter::FULL, 
    IntlDateFormatter::LONG, 
    IntlDateFormatter::MEDIUM, 
    IntlDateFormatter::SHORT,   
];
foreach ($formats as $format) {  
    $formatter = new IntlDateFormatter(
        'en_US', 
        $format, 
        IntlDateFormatter::NONE, 
        'Europe/Moscow'
    );
    echo $formatter->format(1455111783), PHP_EOL;
}

Output:

Wednesday, February 10, 2016  
February 10, 2016  
Feb 10, 2016  
2/10/16  

If you think about it for a bit, you'll see it's impossible to have a constant for every custom format a developer might theoretically want.

Ha, I've just come up with an easy solution!

Really? Let me guess: you want to just cut the year out of a date formatted as LONG, am I right? I'm quite sure I am. The problem is hard to see without examples, but we have some, so take a look.

As a reminder, here's what we have:

February 10, 2016  
10 февраля 2016 г.  
10 de febrero de 2016  
۱۰ فوریهٔ ۲۰۱۶ م.

Let's strip the year:

February 10,  
10 февраля г.  
10 de febrero de  
۱۰ فوریهٔ م.

Do you see all those artefacts, like the dangling "," and "г." and "de", and definitely something I can't even make out in the last string?

So no, it's not a solution at all. Don't worry, there's no shame in it: I thought the same way, and it was my first "a-ha!" moment too.

Ok then, there are patterns, let's use patterns!

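Yes, IntlDateFormatter works with patterns internally, and you can pass your own pattern too.

A pattern consists of a number of predefined sequences of letters (for example, d is the day of the month, MMMM is the full month name, y is the year); text in single quotes, like 'de', is copied to the output literally.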
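To make that structure concrete, here is a small tokenizer that splits a pattern into field runs (repeated ASCII letters) and literal text, treating anything in single quotes as one literal. tokenizeIcuPattern is a hypothetical helper, not part of intl, and it is a sketch that covers only the basic quoting rules rather than a full ICU pattern parser.

```php
<?php

/**
 * Hypothetical helper: split an ICU date pattern into tokens.
 * - A run of the same ASCII letter ("d", "MMMM", "y") is a field.
 * - Text in single quotes ("'de'") is one literal; "''" inside it is an escaped quote.
 * - Anything else (spaces, commas, dots) is a literal too.
 */
function tokenizeIcuPattern(string $pattern): array
{
    preg_match_all("/'(?:[^']|'')*'|([A-Za-z])\\1*|[^A-Za-z']+/u", $pattern, $matches);

    return $matches[0];
}

echo json_encode(tokenizeIcuPattern("d 'de' MMMM 'de' y"), JSON_UNESCAPED_UNICODE), PHP_EOL;
// ["d"," ","'de'"," ","MMMM"," ","'de'"," ","y"]
```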

Looks like we need a "d MMMM" pattern.

<?php

foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {  
    $formatter = new IntlDateFormatter(
        $locale, 
        IntlDateFormatter::NONE, 
        IntlDateFormatter::NONE, 
        'Europe/Moscow',
        null,
        "d MMMM"
    );
    echo $formatter->format(1455111783), PHP_EOL;
}

Output:

10 February  
10 февраля  
10 febrero  
۱۰ فوریهٔ

Looks good! But hey, what's this? Oh, shit...

Recall what we want:

February 10  
10 февраля  
10 de febrero  
۱۰ فوریهٔ م.

No, it's not good: only one of them matches. That's because with a pattern you placed all the date parts yourself, as if you said "the day goes first, the month goes next, and yeah, put a space between them", for every locale. That's just wrong.

The truth is that a locale is not only a language, it's also a formatting pattern. And a pattern is not only which parts are there, it's also where those parts go.

<?php

foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {  
    $formatter = new IntlDateFormatter(
        $locale, 
        IntlDateFormatter::LONG, 
        IntlDateFormatter::NONE, 
        'Europe/Moscow'
    );
    echo $formatter->getPattern(), PHP_EOL;
}

Output:

MMMM d, y  
d MMMM y 'г'.  
d 'de' MMMM 'de' y  
d MMMM y G  

As you can see they are all different.
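Dropping the year token from these patterns runs into exactly the leftovers we saw when cutting the year out of formatted strings. Below is a deliberately naive sketch (stripYearNaively is hypothetical) that removes every run of y together with the whitespace in front of it; it assumes no y appears inside a quoted literal, which holds for these four patterns.

```php
<?php

/**
 * Deliberately naive, hypothetical: remove the year field ("y", "yy", "yyyy")
 * and the whitespace before it. Separators and literals that belong to the
 * year are left untouched, which is exactly the problem.
 */
function stripYearNaively(string $pattern): string
{
    return preg_replace('/\s*y+/u', '', $pattern);
}

// LONG patterns as printed above.
$longPatterns = [
    'en_US' => "MMMM d, y",
    'ru_RU' => "d MMMM y 'г'.",
    'es_ES' => "d 'de' MMMM 'de' y",
    'fa_IR' => "d MMMM y G",
];

foreach ($longPatterns as $locale => $pattern) {
    printf("%s: %s\n", $locale, stripYearNaively($pattern));
}
// en_US: MMMM d,           <- dangling comma
// ru_RU: d MMMM 'г'.       <- orphaned "г."
// es_ES: d 'de' MMMM 'de'  <- dangling "de"
// fa_IR: d MMMM G          <- era (G) without a year
```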

But there must be an existing solution! It can't be that there isn't one! Or...?

*sigh* Yes, I thought the same way. How could it be that PHP, a mature language with a strong and mature ecosystem, lacks such basic functionality? Sad but true: it does.

There is at least one very important thing missing in intl: the DateTimePatternGenerator class from ICU. It is designed precisely to solve our little problem and all the other similar ones.

Wait-wait-wait, what is ICU?

ICU stands for "International Components for Unicode".

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

...

Formatting: Format numbers, dates, times and currency amounts according the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.

In short, it's a set of libraries. PHP's intl doesn't do any magic itself; it's a kind of proxy to those libraries.

$ php -i | grep intl -A5
intl

Internationalization support => enabled  
version => 1.1.0  
ICU version => 56.1  
ICU Data version => 56.1  

To use IntlDateFormatter, ICU must be installed on your system (or you have to build ICU separately and then build PHP against it). Different versions of ICU can give you different output when formatting.
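Since the output depends on the ICU version, it can help to record which ICU your PHP actually uses, not just what the system package manager reports. A small sketch based on the INTL_ICU_VERSION and INTL_ICU_DATA_VERSION constants defined by the intl extension; icuVersionInfo is a hypothetical helper name, and it returns null on a PHP build without intl.

```php
<?php

/**
 * Hypothetical helper: report the ICU library and data versions that the
 * intl extension was built with, or null if intl is not loaded.
 */
function icuVersionInfo(): ?array
{
    if (!extension_loaded('intl')) {
        return null;
    }

    return [
        'icu'      => INTL_ICU_VERSION,
        'icu_data' => defined('INTL_ICU_DATA_VERSION') ? INTL_ICU_DATA_VERSION : null,
    ];
}

var_dump(icuVersionInfo());
```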

$ dpkg -S icu
libicu52:amd64: /usr/lib/x86_64-linux-gnu/libicule.so.52.1  
libicu52:amd64: /usr/lib/x86_64-linux-gnu/libicule.so.52  
libicu52:amd64: /usr/lib/x86_64-linux-gnu/libicutest.so.52  
...

Got it. You've mentioned DateTimePatternGenerator, tell me more

Yeah, DateTimePatternGenerator, it's the most magical date-time-formatting-related thing in ICU.

Quote from the ICU site:

This class provides flexible generation of date format patterns, like "yy-MM-dd".

The user can build up the generator by adding successive patterns. Once that is done, a query can be made using a "skeleton", which is a pattern which just includes the desired fields and lengths. The generator will return the "best fit" pattern corresponding to that skeleton.

The main method people will use is getBestPattern(String skeleton), since normally this class is pre-built with data from a particular locale. However, generators can be built directly from other data as well.

That's exactly what we need! We pass a so-called "skeleton" (the date parts we want in the formatted output) to the getBestPattern method, and it returns the "best fit" pattern. Then we pass that pattern to IntlDateFormatter, and voila!

Here's how it could work:

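<?php

// IntlDateTimePatternGenerator is hypothetical here: intl had no such class when this was written.
$skeleton = "MMMMd";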
foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {  
    $pgen = new IntlDateTimePatternGenerator($locale);
    $pattern = $pgen->getBestPattern($skeleton);

    $formatter = new IntlDateFormatter(
        $locale, 
        IntlDateFormatter::NONE, 
        IntlDateFormatter::NONE, 
        'Europe/Moscow',
        null,
        $pattern
    );
    echo $formatter->format(1455111783), PHP_EOL;
}

Output (probably):

February 10  
10 февраля  
10 de febrero  
۱۰ فوریهٔ م.

Woohoo! Yes-yes-yes, I just copy-pasted our "what we want" block. In fact, I don't know what the output would be, but I hope I'm right.
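Since this post was written, PHP 8.1 added this to intl: the IntlDatePatternGenerator class wraps ICU's DateTimePatternGenerator and has a getBestPattern() method. Below is a minimal sketch of the loop above on top of it, assuming PHP 8.1+ with intl; formatWithSkeleton is a hypothetical helper name, and the exact strings still depend on your ICU/CLDR version.

```php
<?php

/**
 * Hypothetical helper: format a timestamp using only the fields named in
 * $skeleton, letting ICU choose order, separators and literals per locale.
 * Requires PHP 8.1+ (IntlDatePatternGenerator).
 */
function formatWithSkeleton(int $timestamp, string $locale, string $skeleton, string $timezone): string
{
    $generator = new IntlDatePatternGenerator($locale);
    $pattern = $generator->getBestPattern($skeleton);
    if ($pattern === false) {
        throw new RuntimeException("No pattern for skeleton '$skeleton' in $locale");
    }

    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        $timezone,
        null,
        $pattern
    );
    $formatted = $formatter->format($timestamp);
    if ($formatted === false) {
        throw new RuntimeException($formatter->getErrorMessage());
    }

    return $formatted;
}

if (class_exists('IntlDatePatternGenerator')) {
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        echo formatWithSkeleton(1455111783, $locale, 'MMMMd', 'Europe/Moscow'), PHP_EOL;
    }
}
```

On older PHP versions the class doesn't exist, and the extension mentioned at the top of this post (or the config-file approach below) is the fallback.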

So if I want to use a custom format, what should I do?

I came up with the second obvious solution: generate a config file with every custom pattern for each locale in your project. And do it again every time a new locale is added.

Here is a quick snippet.

<?php

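// ... (your own setup; here, the same locales as in the examples above)
$locales = ['en_US', 'ru_RU', 'es_ES', 'fa_IR'];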
foreach ($locales as $locale) {  
    $pattern = <<<CONFIG
        '%s' => [
            'medium_no_year' => "%s", // %s
            'long_no_year' => "%s", // %s
        ],

CONFIG;

    $mediumF = new IntlDateFormatter($locale, IntlDateFormatter::MEDIUM, IntlDateFormatter::NONE);
    $longF = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::NONE);

    printf(
        $pattern,
        $locale,
        $mediumF->getPattern(),
        $mediumF->format(1455111783),
        $longF->getPattern(),
        $longF->format(1455111783)
    );
}

This will give you something like:

        'en_US' => [
            'medium_no_year' => "MMM d, y", // Feb 10, 2016
            'long_no_year' => "MMMM d, y", // February 10, 2016
        ],
        'ru_RU' => [
            'medium_no_year' => "d MMM y 'г'.", // 10 февр. 2016 г.
            'long_no_year' => "d MMMM y 'г'.", // 10 февраля 2016 г.
        ],
        'es_ES' => [
            'medium_no_year' => "d MMM y", // 10 feb. 2016
            'long_no_year' => "d 'de' MMMM 'de' y", // 10 de febrero de 2016
        ],
        'fa_IR' => [
            'medium_no_year' => "d MMM y G", // ۱۰ فوریهٔ ۲۰۱۶ م.
            'long_no_year' => "d MMMM y G", // ۱۰ فوریهٔ ۲۰۱۶ م.
        ],

Then you have to edit it by hand, removing the year part:

        'en_US' => [
            'medium_no_year' => "MMM d", // Feb 10
            'long_no_year' => "MMMM d", // February 10
        ],
        'ru_RU' => [
            'medium_no_year' => "d MMM", // 10 февр.
            'long_no_year' => "d MMMM", // 10 февраля
        ],
        'es_ES' => [
            'medium_no_year' => "d MMM", // 10 feb.
            'long_no_year' => "d 'de' MMMM", // 10 de febrero
        ],
        'fa_IR' => [
            'medium_no_year' => "d MMM", // ۱۰ فوریهٔ
            'long_no_year' => "d MMMM", // ۱۰ فوریهٔ
        ],
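Once the hand-edited patterns exist, using them at runtime is a lookup plus an IntlDateFormatter with a custom pattern. A sketch, assuming the config is a plain PHP array shaped like the one above; datePatternFor and its fall-back-to-en_US policy are hypothetical choices, not something intl provides.

```php
<?php

// Hand-edited patterns (a subset of the config above).
$datePatterns = [
    'en_US' => ['medium_no_year' => "MMM d", 'long_no_year' => "MMMM d"],
    'ru_RU' => ['medium_no_year' => "d MMM", 'long_no_year' => "d MMMM"],
    'es_ES' => ['medium_no_year' => "d MMM", 'long_no_year' => "d 'de' MMMM"],
];

/**
 * Hypothetical helper: find a pattern for $locale, falling back to
 * $fallbackLocale when the locale (or the key) is missing from the config.
 */
function datePatternFor(array $patterns, string $locale, string $key, string $fallbackLocale = 'en_US'): string
{
    if (isset($patterns[$locale][$key])) {
        return $patterns[$locale][$key];
    }
    if (isset($patterns[$fallbackLocale][$key])) {
        return $patterns[$fallbackLocale][$key];
    }
    throw new InvalidArgumentException("No '$key' pattern for $locale or $fallbackLocale");
}

if (extension_loaded('intl')) {
    $locale = 'es_ES';
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        'Europe/Moscow',
        null,
        datePatternFor($datePatterns, $locale, 'long_no_year')
    );
    echo $formatter->format(1455111783), PHP_EOL; // 10 de febrero
}
```

Note the trade-off in the fallback: a locale missing from the config silently gets the English field order with translated month names, so you may prefer to fail loudly instead.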

When you have dozens of locales, it's exhausting work, TRUST ME (*quiet weeping*).

And that's not all. If you want to output a date with a time, you have to generate twice as many patterns, because you can't simply format the time and concatenate it with the date. Should it be appended, prepended, or inserted somewhere in between?

<?php

foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {  
    $pattern = <<<CONFIG
        '%s' => [
            'medium_no_year-short' => "%s", // %s
            'long_no_year-short' => "%s", // %s
        ],

CONFIG;

    $mediumF = new IntlDateFormatter($locale, IntlDateFormatter::MEDIUM, IntlDateFormatter::SHORT);
    $longF = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::SHORT);

    printf(
        $pattern,
        $locale,
        $mediumF->getPattern(),
        $mediumF->format(1455111783),
        $longF->getPattern(),
        $longF->format(1455111783)
    );
}

Output:

        'en_US' => [
            'medium_no_year-short' => "MMM d, y, h:mm a", // Feb 10, 2016, 2:43 PM
            'long_no_year-short' => "MMMM d, y 'at' h:mm a", // February 10, 2016 at 2:43 PM
        ],
        'ru_RU' => [
            'medium_no_year-short' => "d MMM y 'г'., H:mm", // 10 февр. 2016 г., 14:43
            'long_no_year-short' => "d MMMM y 'г'., H:mm", // 10 февраля 2016 г., 14:43
        ],
        'es_ES' => [
            'medium_no_year-short' => "d MMM y H:mm", // 10 feb. 2016 14:43
            'long_no_year-short' => "d 'de' MMMM 'de' y, H:mm", // 10 de febrero de 2016, 14:43
        ],
        'fa_IR' => [
            'medium_no_year-short' => "d MMM y G،‏ H:mm", // ۱۰ فوریهٔ ۲۰۱۶ م.،‏ ۱۴:۴۳
            'long_no_year-short' => "d MMMM y G، ساعت H:mm", // ۱۰ فوریهٔ ۲۰۱۶ م.، ساعت ۱۴:۴۳
        ],

Remove the year, again:

        'en_US' => [
            'medium_no_year-short' => "MMM d, h:mm a", // Feb 10, 2:43 PM
            'long_no_year-short' => "MMMM d 'at' h:mm a", // February 10 at 2:43 PM
        ],
        'ru_RU' => [
            'medium_no_year-short' => "d MMM, H:mm", // 10 февр., 14:43
            'long_no_year-short' => "d MMMM, H:mm", // 10 февраля, 14:43
        ],
        'es_ES' => [
            'medium_no_year-short' => "d MMM H:mm", // 10 feb. 14:43
            'long_no_year-short' => "d 'de' MMMM, H:mm", // 10 de febrero, 14:43
        ],
        'fa_IR' => [
            'medium_no_year-short' => "d MMM،‏ H:mm", // ۱۰ فوریهٔ،‏ ۱۴:۴۳
            'long_no_year-short' => "d MMMM، ساعت H:mm", // ۱۰ فوریهٔ، ساعت ۱۴:۴۳
        ],

Even the comma in Farsi is not a standard comma, so you can't just guess how to assemble this from two separate parts.

But how does the rest of the PHP world do this?

Hard to believe, but they either don't do datetime i18n or do it wrong.

I've looked into the sources of the most popular PHP-based CMSes.

I won't touch frameworks, because in my opinion it's not a framework's job. Maybe some basic functionality... Hmmm, maybe. Actually, I checked Yii2, and they just recommend using bare intl. So let's just look at some CMSes.
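With a pattern generator, the date-and-time case needs no second set of hand-made patterns: you put the time fields into the skeleton and ICU picks the glue (a comma, "at", the Farsi comma) for you. A sketch assuming PHP 8.1+ with intl's IntlDatePatternGenerator; in skeletons, j asks for the locale's preferred hour format (12- or 24-hour). The keys mirror the config above, and the exact output depends on the ICU/CLDR version, so none is promised here.

```php
<?php

// Requires PHP 8.1+ (IntlDatePatternGenerator). Skeleton fields:
// MMM/MMMM = abbreviated/full month, d = day, j = locale's preferred hour, mm = minutes.
$skeletons = [
    'medium_no_year-short' => 'MMMdjmm',
    'long_no_year-short'   => 'MMMMdjmm',
];

if (class_exists('IntlDatePatternGenerator')) {
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $generator = new IntlDatePatternGenerator($locale);
        foreach ($skeletons as $key => $skeleton) {
            $pattern = $generator->getBestPattern($skeleton);
            if ($pattern === false) {
                continue; // no usable pattern for this locale/skeleton
            }
            $formatter = new IntlDateFormatter(
                $locale,
                IntlDateFormatter::NONE,
                IntlDateFormatter::NONE,
                'Europe/Moscow',
                null,
                $pattern
            );
            printf("%s %s: \"%s\" => %s\n", $locale, $key, $pattern, $formatter->format(1455111783));
        }
    }
}
```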

Drupal

I started searching and immediately stumbled upon an issue with an intriguing title: "Date intl support is broken, remove it". Lolwut!? But it's not a joke, they really did it.

[Screenshot: intl was removed]

Basically, they solve the formatting problem the same way (a bunch of custom patterns), but before that patch, as you can see in the screenshot, they had an intl key for an international pattern. Now they just don't care.

Also, if I understand correctly (I'm not an active Drupal user), every user has to create those patterns manually after installation. For each locale. Or maybe I simply missed something.

This is how it's done in 8.1:

[Screenshot: date formats in 8.1]

And this is the 9.x one:

[Screenshot: date formats in 9.x]

(Looks like they just haven't stripped those intl keys from it yet.)

It's not that bad, but as a CMS user I don't want to study different cultures to know which datetime patterns people prefer. All that work has already been done by CLDR. Yes, I want custom patterns sometimes, but all I'm willing to do is specify which parts should be there (like only a day and a month, or only a month and a time).

WordPress

I'm also not an active user of WordPress, so I searched through GitHub. It looks like the main function here is date_i18n from functions.php (btw, wtf, guys? A 5.2k-line file of functions? r u serious?).

[Screenshot: wordpress date_i18n]

I honestly tried to understand how it works. But... but... just look at this.

[Screenshot: wordpress date_i18n contents]

Holy shit... Anyway, it definitely doesn't look like date i18n done right, if only because it relies on date_format. They try to localize day and month names, nothing more. Correct me if I'm wrong.

Joomla!

It's almost the same as Drupal's: a number of predefined formats which have to be defined for each locale.

The en-GB config from a default installation:

[Screenshot: en-GB Joomla config]

Those letters tell us that Joomla! uses date_format too and doesn't use intl. Is that justified? I don't think so. intl has been part of PHP since 5.3, we already have PHP 7, and 5.5 is the minimum requirement for most modern code. Let's call it legacy or tech debt or whatever and move on.
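Why date()-style format letters can't be the whole answer: PHP's date() and DateTime::format() always produce English month and day names regardless of the current locale, so any localization has to be bolted on afterwards. A quick check using only core PHP; the setlocale call may fail silently if that locale isn't installed, which doesn't change the result either way.

```php
<?php

$date = new DateTimeImmutable('@1455111783'); // '@' timestamps are in UTC

// Try to switch to Russian; DateTime::format() ignores the locale anyway.
setlocale(LC_TIME, 'ru_RU.UTF-8');

echo $date->format('j F Y'), PHP_EOL;   // 10 February 2016
echo $date->format('l, j F'), PHP_EOL;  // Wednesday, 10 February
```

The field order is also fixed by whoever wrote the format string, which is the same problem as the hardcoded "d MMMM" pattern earlier.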

ModX Revolution

transport.core.system_settings.php

get.class.php

modifier.date_format.php

When it comes to i18n, strftime is better than date_format (or nothing), but it doesn't do the job right either.

Magento2

Magento is not a classic CMS, it's an e-commerce platform, but it's well known and it has its own framework. So why not.

I must say that it's the only codebase in my review that does date and time i18n almost right! It's the only one where IntlDateFormatter is used at all, and it's used as the core date formatting component.

But their code is not perfect. See Timezone.php.

I couldn't find out whether they try to format a date without a year anywhere. But it looks like they're making the same mistake we made a little earlier: replacing the year when formatting a "date with long year". Or are they? I'm not a regex expert (even though I'm familiar with lookahead and lookbehind), so I'd better simply run that getDateFormatWithLongYear code and see what it actually does.

<?php

foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {  
    $dateFormat = (new \IntlDateFormatter(
        $locale,
        \IntlDateFormatter::SHORT,
        \IntlDateFormatter::NONE
    ))->getPattern();

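    // Same replacement as Magento's getDateFormatWithLongYear():
    // a standalone "yy" (not part of a longer run of y) becomes "Y".
    $formatWithLongYear = preg_replace(
        '/(?<!y)yy(?!y)/',
        'Y',
        $dateFormat
    );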

    $formatter = new \IntlDateFormatter(
        $locale, 
        \IntlDateFormatter::NONE, 
        \IntlDateFormatter::NONE, 
        null, 
        null, 
        $formatWithLongYear
    );
    echo $formatter->format(1455111783), PHP_EOL;
}

Output:

2/10/2016  
10.02.2016  
10/2/2016  
۲۰۱۶/۲/۱۰ م.

Looks good! So OK, sometimes replacing works. Though the fact that they resort to replacing is more evidence that PHP lacks a pattern generator.

And their getDateTimeFormat is obviously a mistake: they concatenate patterns. No, the order of date and time is not the same for every locale.
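One caveat about that replacement: in ICU patterns, unlike PHP's date(), Y is not the calendar year but the year of the "week of year" calendar, while y is the calendar year. Around New Year's Day the two can differ. A quick check with intl (the guard skips it when intl is missing); the commented results assume ICU's en_US week rules, where weeks start on Sunday and week 1 is the week containing January 1. formatUtc is a hypothetical helper name.

```php
<?php

// 2015-12-31 12:00:00 UTC: a Thursday in the week that contains 2016-01-01.
$timestamp = 1451563200;

/** Hypothetical helper: format $timestamp in UTC with a custom ICU pattern. */
function formatUtc(int $timestamp, string $locale, string $pattern): string
{
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        'UTC',
        null,
        $pattern
    );

    return (string) $formatter->format($timestamp);
}

if (extension_loaded('intl')) {
    echo formatUtc($timestamp, 'en_US', 'M/d/yyyy'), PHP_EOL; // 12/31/2015 (calendar year)
    echo formatUtc($timestamp, 'en_US', 'M/d/Y'), PHP_EOL;    // 12/31/2016 (week-based year)
}
```

For February 10 both letters agree, which is why the output above looks right.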
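To see why concatenation goes wrong, compare the pattern intl builds for a date and a time together with what you get by gluing the date-only and time-only patterns. A sketch using only IntlDateFormatter::getPattern(); patternFor is a hypothetical helper, and joining with a single space is an assumption about how a naive implementation would concatenate.

```php
<?php

/** Hypothetical helper: the pattern intl uses for the given date/time styles. */
function patternFor(string $locale, int $dateType, int $timeType): string
{
    $formatter = new IntlDateFormatter($locale, $dateType, $timeType, 'UTC');
    $pattern = $formatter->getPattern();
    if ($pattern === false) {
        throw new RuntimeException($formatter->getErrorMessage());
    }

    return $pattern;
}

if (extension_loaded('intl')) {
    foreach (['en_US', 'ru_RU', 'es_ES', 'fa_IR'] as $locale) {
        $concatenated = patternFor($locale, IntlDateFormatter::MEDIUM, IntlDateFormatter::NONE)
            . ' '
            . patternFor($locale, IntlDateFormatter::NONE, IntlDateFormatter::SHORT);
        $combined = patternFor($locale, IntlDateFormatter::MEDIUM, IntlDateFormatter::SHORT);

        printf("%s\n  concatenated: %s\n  combined:     %s\n", $locale, $concatenated, $combined);
    }
}
```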

You can also look here. Good work anyway!

So, what do you want?

It's obvious: I want someone to implement the missing pattern generator from ICU in PHP's intl!

Consider it a public request. See how many people are struggling with i18n date and time formatting. Ever wanted to change the world? Here's your chance! Do it!

UPDATES

2016-02-26

Tagged in i18n , intl , date , time , php
ksimka

PHP developer at Wamba.com
