


Topic: [Done]: [Suggestion for code review: select_lang.inc.php] automatic language detection


ripat

  • Coppermine newbie
  • Posts: 11
    • Regex tester

First, I would like to say that I'm impressed by the quality of Coppermine and by the amount of work it represents.

Living in a country where three languages are spoken, I paid special attention to the automatic language detection based on the Accept-Language and User-Agent HTTP headers.

GENERAL REMARK

MY SUGGESTION
The code below is faster and has more features. It is faster because it uses the PCRE regex functions (preg_*), which are *much* faster than the POSIX ones (ereg*). In a small benchmark (100 loops), the new code is 3 times faster when an Accept-Language header is present and up to 5 times faster when it falls back to the User-Agent string.

As for the new feature, the definition of the HTTP Accept-Language header in RFC 2616 says:
Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1".
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4

My code below takes the user's preferences into account by sorting the language tokens by their weight (q=0.x).

For example, if the Accept-Language string looks like ww,ww-zz,de=0.2;q=0.1,it;q=0.5,en;q=0.3, the code will disregard the ww and ww-zz tags, which match no available language, and pick the language tag with the highest q factor: it (Italian) in this case.
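The q-factor ordering can be tried in isolation with a minimal standalone sketch. The function names sort_accept_language() and pick_language() and the plain list of supported codes are hypothetical illustrations, not Coppermine's API, and matching on the primary subtag only is a simplification of the regex matching used in the real code. The header parsing follows the same idea: split on commas, split each token on ';q=', default the quality to 1, then sort by descending quality.

```php
<?php
// Hypothetical helper (not part of Coppermine): parse an Accept-Language
// header into tag => quality pairs, highest quality first.
function sort_accept_language(string $header): array
{
    $weights = [];
    foreach (explode(',', $header) as $token) {
        $parts = explode(';q=', trim($token));
        $tag   = strtolower(trim($parts[0]));
        // RFC 2616: a missing quality value defaults to q=1
        $q     = isset($parts[1]) ? (float) $parts[1] : 1.0;
        // keep the first occurrence of each non-empty tag
        if ($tag !== '' && !isset($weights[$tag])) {
            $weights[$tag] = $q;
        }
    }
    // sort by value (quality) in descending order, keeping the tags as keys
    arsort($weights);
    return $weights;
}

// Hypothetical: return the first supported primary language code,
// trying tags in order of decreasing quality.
function pick_language(string $header, array $supported): ?string
{
    foreach (array_keys(sort_accept_language($header)) as $tag) {
        // compare only the primary subtag ("fr-be" -> "fr")
        $primary = explode('-', (string) $tag)[0];
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return null;
}

$header = 'ww,ww-zz,de=0.2;q=0.1,it;q=0.5,en;q=0.3';
echo pick_language($header, ['de', 'en', 'fr', 'it']), "\n"; // prints "it"
```

Because ww and ww-zz carry the default q=1 they are tried first, but neither is supported, so the highest-weighted supported tag, it (q=0.5), wins. The malformed de=0.2 token never matches a supported code.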

Code:
function lang_detect_q($available_languages) {
    if (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        $language_tokens = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
        // loop through each Accept-Language token and find its quality level (e.g. q=0.8)
        $lang_tag = $quality_tag = array();
        foreach ($language_tokens as $language_token) {
            // split the token on ';q=' into language tag and quality value
            $q_explode = explode(';q=', $language_token);
            // if the token has no q factor, the quality value defaults to 1
            $q = isset($q_explode[1]) ? (float) $q_explode[1] : 1.0;
            // store tag and quality under the same index; trim() drops the
            // optional whitespace after commas (e.g. "en, fr;q=0.8")
            $lang_tag[]    = trim($q_explode[0]);
            $quality_tag[] = $q;
        }
        // sort by quality value in descending order, keeping the keys
        // so they still point into $lang_tag (array_multisort was too slow)
        arsort($quality_tag);
        // loop through the quality values, highest first
        foreach ($quality_tag as $q_key => $q_val) {
            // loop through each available_languages
            foreach ($available_languages as $key => $language) {
                if (preg_match('#^(?:'. $language[0] .')#i', $lang_tag[$q_key])){
                    // exit function on first match.
                    return $available_languages[$key][1];
                }
            }
        }

    // if Accept-Language not present in the client's http header, we try the User-Agent string
    } elseif (!empty($_SERVER['HTTP_USER_AGENT'])) {     
        // once again, loop through each available_languages
        foreach ($available_languages as $key => $language) {
            if (preg_match('#[(,; [](?:'. $language[0] .')[]),;]#i', $_SERVER['HTTP_USER_AGENT'])) {
                // exit function on first match.
                return $available_languages[$key][1];
            }
        }
    }
    // if nothing found --> exit function with false (or default language value if necessary)
    return false;
}

$lang = lang_detect_q($available_languages);
// If a valid language was detected, use it
if ($lang) {
    $USER['lang'] = $lang;
}
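The User-Agent fallback only matches a language pattern when it is surrounded by one of the delimiters ( , ; space [ on the left and ] ) , ; on the right. A small standalone check of that same delimiter pattern shows which strings it catches. The helper name ua_matches() is hypothetical, and the sample User-Agent strings are illustrative assumptions: a Firefox-2-style string with the locale between semicolons, a Netscape-4-style string with the locale in brackets, and a cURL-style string with no locale at all.

```php
<?php
// Hypothetical helper reusing the delimiter pattern of the
// User-Agent fallback: the language pattern must be preceded by
// one of ( , ; space [ and followed by one of ] ) , ;
function ua_matches(string $pattern, string $user_agent): bool
{
    return preg_match('#[(,; [](?:' . $pattern . ')[]),;]#i', $user_agent) === 1;
}

// Illustrative pattern in the style of an $available_languages entry
$fr = 'fr(?:-[[:alpha:]]{2})?|french';

$samples = [
    // locale token between semicolons, as older Gecko browsers sent it
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) Gecko/20061010 Firefox/2.0',
    // locale in square brackets, Netscape 4 style
    'Mozilla/4.7 [fr] (Win98; I)',
    // no locale at all, so the fallback cannot detect anything
    'curl/7.19.0 (i686-pc-linux-gnu) libcurl/7.19.0',
];

foreach ($samples as $ua) {
    echo (ua_matches($fr, $ua) ? 'match   ' : 'no match'), '  ', $ua, "\n";
}
```

The third sample illustrates why the fallback is weaker than the Accept-Language path: when the client puts no locale token in its User-Agent, there is simply nothing to match.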

As for the $available_languages array, the PCRE functions run slightly faster when the grouping parentheses (option1|option2) are made non-capturing, as in (?:option1|option2). So the French entry becomes:
'fr' => array('fr(?:-[[:alpha:]]{2})?|french', 'french', 'fr'),
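The difference between the two groupings shows up in what preg_match() records. With a capturing group it also stores the submatch in the matches array; with (?:...) only the overall match is kept, which is the bookkeeping the non-capturing form avoids. Both patterns accept exactly the same strings. The subject fr-BE is an illustrative assumption; the pattern is the French entry above, anchored the same way lang_detect_q() anchors it.

```php
<?php
$subject = 'fr-BE';

// Capturing version: the inner group is also recorded, as $capturing[1]
preg_match('#^(?:fr(-[[:alpha:]]{2})?|french)#i', $subject, $capturing);

// Non-capturing version, as in the $available_languages entry:
// only the overall match $plain[0] is recorded
preg_match('#^(?:fr(?:-[[:alpha:]]{2})?|french)#i', $subject, $plain);

print_r($capturing); // holds 'fr-BE' and '-BE'
print_r($plain);     // holds only 'fr-BE'
```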

Let me know if something needs to be changed.
« Last Edit: November 30, 2008, 12:33:54 am by Nibbler »

Nibbler

  • Guest

Good work. Has this been tested on the main web browsers?

ripat

  • Coppermine newbie
  • Posts: 11
    • Regex tester

Yes, I did.

IE 5.5
IE 6.0
IE 7.0
FF 2.0 (Linux)
FF 2.0 (OS-X)
Opera (Linux)
Opera (Windows)
Safari 9.2 (OS-X)

And even cURL and wget :=)

They all work, which is to be expected since they all send fairly standard Accept-Language headers. When that header is absent, as with cURL and wget, the fallback to the User-Agent string is much less reliable, because User-Agent strings are far from standard and don't always contain a localisation tag.

What I mean is that the language detection relies on the strings the browser sends in its HTTP headers. Pretty straightforward, unlike HTML/CSS, where the client receives the page and must parse it correctly!

Jean-Luc.

Nibbler

  • Guest

Committed to 1.5.

Nibbler

  • Guest

Would be nice to hook the language detection into the language manager.

Joachim Müller

  • Dev Team member
  • Coppermine addict
  • Gender: Male
  • Posts: 47844
  • aka "GauGau"
    • gaugau.de

That's what I'm up to. The language manager isn't done yet. My goal is to let the admin decide if he wants language auto-selection based on browser language or not. Let's hope I get all the features done before the feature freeze stage.
