forum.coppermine-gallery.net

Dev Board => cpg1.4 Testing/Bugs => cpg1.4 Testing/Bugs: FIXED/CLOSED => Topic started by: itang on April 26, 2005, 06:21:28 pm

Title: 1.4.1 - keyword list return incorrect result for chinese characters (utf8)
Post by: itang on April 26, 2005, 06:21:28 pm
Running 1.4.1 with UTF-8 encoding. I have some pictures with Chinese-character keywords, and the keyword list is displayed correctly.

When I click on certain Chinese keywords, no pictures are found, and the title of the search-results table (at the top right-hand corner) shows corrupted Chinese characters.

However, a normal search works without problems and all the Chinese characters show up correctly.
Post by: Casper on April 26, 2005, 06:47:34 pm
I have no way to test this, so I've moved it to the 1.4 Testing/Bugs board.

itang, thanks for your report.
Post by: itang on April 28, 2005, 03:51:35 pm
It only happens in IE.

In Firefox it works normally!

Get back the web!
Post by: Joachim Müller on July 31, 2005, 12:24:31 pm
[moderation]
Bumping this unresolved thread to the top...
Post by: Joachim Müller on August 09, 2005, 09:33:30 am
@itang: what are your character encoding settings in IE?
Post by: omniscientdeveloper on September 12, 2005, 06:35:23 am
*BUMP*
Post by: artistsinhawaii on September 23, 2005, 11:01:40 am
Finally feeling a little more comfortable around Coppermine, I thought I'd give the Japanese language a test drive. 

The sort function works fine for Japanese in both standard and keyword search modes, provided the keywords are not in quotes.

With Japanese, however (and this goes back to using quotes to separate keywords), I can't find a way to separate the keywords in a keyword field so that each one is recognized as a separate word. Neither Japanese nor English quotes work.

If I separate the words with spaces, they are linked as ONE keyword and listed in the keyword search as: 吉川 田辺 神奈川 (ONE keyword with no surrounding quotes).
If I separate the words with quotes, they are again linked as ONE keyword and listed in the keyword search as: "吉川" " 田辺" " 神奈川" (ONE keyword, all within quotes).

Without quotes, there is no problem searching. 

With quotes, the search function fails and the returned result is:

search:
検索結果 (Search results) - ""吉川" "田辺" "神奈川""


Dennis
Post by: Joachim Müller on October 21, 2005, 11:28:24 pm
We're getting nowhere with this thread. I'm marking it as a "known issue".
Post by: artistsinhawaii on October 22, 2005, 03:47:22 am
I don't know if this will help any of the developers, but the space character for Asian languages (Chinese/Japanese) is &#12288; (Unicode U+3000, IDEOGRAPHIC SPACE), as opposed to %20 for the ordinary space.

Dennis
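As a quick check of the numbers above, here is a minimal PHP sketch (an illustration, not Coppermine code; `codepoint_to_utf8` is a hypothetical helper) that encodes a code point as UTF-8. It shows that &#12288; is U+3000, stored as the three bytes E3 80 80, while the ordinary space &#32; is the single byte 0x20, which is what %20 stands for in a URL. It assumes PHP 7 or later; with the mbstring extension installed, PHP 7.2+ also offers `mb_chr()`.

```php
<?php
// Hypothetical helper: encode a Basic Multilingual Plane code point
// (U+0000..U+FFFF) as UTF-8. Supplementary-plane values are rejected;
// surrogate values are not validated in this sketch.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0 || $cp > 0xFFFF) {
        throw new InvalidArgumentException("Only BMP code points are handled");
    }
    if ($cp < 0x80) {
        return chr($cp);                          // 1 byte:  0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 2 bytes: 110xxxxx
             . chr(0x80 | ($cp & 0x3F));          //          10xxxxxx
    }
    return chr(0xE0 | ($cp >> 12))                // 3 bytes: 1110xxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))        //          10xxxxxx
         . chr(0x80 | ($cp & 0x3F));              //          10xxxxxx
}

echo dechex(12288), "\n";                       // 3000   (&#12288; is U+3000)
echo bin2hex(codepoint_to_utf8(12288)), "\n";   // e38080 (three bytes)
echo bin2hex(codepoint_to_utf8(32)), "\n";      // 20     (one byte, %20 in a URL)
```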
Post by: logue on October 23, 2005, 04:31:57 am
Couldn't this be a problem with the php.ini settings?

I ran into a similar problem when translating SMF into Japanese.
The garbled characters went away once I modified php.ini as follows:

Code:
; changed from: unicode.runtime_encoding = iso-8859-1
unicode.runtime_encoding = utf-8
; changed from: ;mbstring.internal_encoding = EUC-JP
mbstring.internal_encoding = UTF-8
; changed from: ;mbstring.substitute_character = none;
mbstring.substitute_character = 12307
; changed from: tidy.clean_output = Off
tidy.clean_output = On

Some of these settings may not be related, though...
Post by: CapriSkye on November 09, 2005, 01:15:09 am
It might have something to do with your database encoding...
I'm using the latest Apache, PHP, and MySQL 5,
and everything works okay; there's nothing wrong with keyword search.
I even tried the keyword itang was having trouble with.
Just FYI.
Post by: artistsinhawaii on November 09, 2005, 02:09:55 am
Interesting. Did you try stringing multiple keywords together, i.e. more than one keyword in the keyword field?

I found that replacing the double-byte space character with %20 allows these words to be searched separately.

Dennis
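The reason a %20 separator behaves differently is that the two kinds of space percent-encode differently. A minimal PHP sketch using the built-in `rawurlencode()`; it assumes PHP 7+ and UTF-8 text, and the sample keywords are borrowed from the earlier post:

```php
<?php
$ascii       = ' ';        // U+0020, &#32;
$ideographic = "\u{3000}"; // U+3000, &#12288;

echo rawurlencode($ascii), "\n";        // %20
echo rawurlencode($ideographic), "\n";  // %E3%80%80

// Keywords typed with an IME: the separator is U+3000, so no %20 appears.
$typed = "吉川\u{3000}田辺";
echo rawurlencode($typed), "\n";

// After swapping U+3000 for an ASCII space, the words are separated by %20.
echo rawurlencode(str_replace($ideographic, $ascii, $typed)), "\n";
```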
Post by: CapriSkye on November 09, 2005, 03:06:29 am
Here's my test site; you can try it yourself:
http://www.capriskye.com/gallery

The snowboard picture uses the same keywords itang used.
The msu picture has one character in the keyword field.
And no matter how many characters I use to search, I still get the correct results.
Post by: artistsinhawaii on November 10, 2005, 05:11:59 am

But what about characters separated by spaces? Suppose you wanted to link a picture to three different albums, e.g.:

四角 丸い 細長 (square, round, elongated)

Will the search function read each as a separate keyword, or will it read them all as one? In my case, it reads them all as one.

Dennis

Post by: CapriSkye on November 10, 2005, 05:57:17 am
Are you saying that if you search for 角, it won't return anything,
even though that character is in 四角 丸い 細長?
If so, that's not the case for me.

I've changed the keywords for the snowboard picture to the following:
電腦 動作 雪板 雪 滑 (computer, action, snowboard, snow, slide)

If you search for just 電, it returns that picture.
If you search for 電腦動作, without the spaces between them, it doesn't return anything.
But I don't think that's a problem; isn't that how it's supposed to work?
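CapriSkye's results are consistent with a plain substring match of the search term against the stored keyword string. The PHP sketch below models that behaviour; it is an assumption for illustration only (the helper `keyword_matches` is hypothetical, and Coppermine's real search runs as a database query), not Coppermine's search code.

```php
<?php
// Hypothetical model of a substring search: a picture matches when the search
// term occurs anywhere in its keyword string (similar to SQL LIKE '%term%').
// Byte-wise strpos() is safe on valid UTF-8: a multi-byte character can only
// match starting at a real character boundary.
function keyword_matches(string $keywords, string $term): bool
{
    return $term !== '' && strpos($keywords, $term) !== false;
}

$snowboard = '電腦 動作 雪板 雪 滑';

var_dump(keyword_matches($snowboard, '電'));        // bool(true):  part of 電腦
var_dump(keyword_matches($snowboard, '電腦'));      // bool(true):  a whole keyword
var_dump(keyword_matches($snowboard, '電腦動作'));  // bool(false): the space breaks the run
```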
Post by: artistsinhawaii on November 10, 2005, 06:12:44 am
Yep, that's how it's supposed to work. Good to know that the problem is not in CPG but somewhere in my database encoding. I'll sort it out. Thanks.

Dennis
Post by: DJMaze on November 24, 2005, 10:28:40 am
If you look carefully, only one character has been turned into garbage.
The two question marks should actually be three, since the Chinese character consists of three bytes, not two.
Usually this means that the font used in IE doesn't contain the character, or the application can't handle the code.

I don't have IE and I don't understand Chinese, so finding that specific character is hard for me. Maybe post it?
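The byte count behind that remark can be checked in PHP: a CJK character such as 電 is one character but three bytes in UTF-8. A minimal sketch, assuming PHP 7+; it counts characters with the PCRE /u modifier so it does not depend on the mbstring extension, and `utf8_char_count` is a hypothetical helper name.

```php
<?php
// Count UTF-8 characters without mbstring: with /u, "." matches one whole
// character (/s lets it match newlines too). Invalid UTF-8 makes
// preg_match_all() fail, which this sketch does not handle.
function utf8_char_count(string $s): int
{
    return preg_match_all('/./us', $s);
}

$char = "\u{96FB}"; // 電 (U+96FB)

echo strlen($char), "\n";          // 3 (bytes)
echo utf8_char_count($char), "\n"; // 1 (character)
echo bin2hex($char), "\n";         // e99bbb
```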
Post by: artistsinhawaii on November 25, 2005, 04:32:29 am
@DJMAZE,


I can't remember how I dug this one up, but for Asian character sets, the blank space inserted between characters when one hits the space bar is:

&#12288; as opposed to %20 or &#32; for standard Latin ASCII characters. If this could be replaced with the Latin ASCII equivalent, that would resolve the issue.


Dennis
Post by: DJMaze on November 25, 2005, 04:12:58 pm
If this really is a "space" issue, then there are probably many more issues, since there are many kinds of spaces.

I'll put together a PHP page with the complete Unicode character set so that people can test each and every character, OK?
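One way to see how many kinds of space there are is the PCRE property \p{Zs} (Unicode space separators), which covers the ASCII space, the no-break space U+00A0, the em space U+2003, the ideographic space U+3000 and several others. A minimal sketch of a hypothetical helper, `normalize_spaces`, that folds all of them into an ordinary ASCII space; this is an alternative technique to replacing only U+3000, assuming PHP 7+ with PCRE Unicode support.

```php
<?php
// Replace every Unicode space separator (general category Zs) with an ASCII space.
// \p{Zs} needs the /u modifier, which also makes PCRE treat the input as UTF-8.
// On invalid UTF-8, preg_replace() returns null; the input is then returned unchanged.
function normalize_spaces(string $s): string
{
    return preg_replace('/\p{Zs}/u', ' ', $s) ?? $s;
}

$samples = [
    "a\u{0020}b", // SPACE
    "a\u{00A0}b", // NO-BREAK SPACE
    "a\u{2003}b", // EM SPACE
    "a\u{3000}b", // IDEOGRAPHIC SPACE (&#12288;)
];

foreach ($samples as $s) {
    echo normalize_spaces($s), "\n"; // prints "a b" each time
}
```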
Post by: DJMaze on November 25, 2005, 05:48:42 pm
Never mind, here it is: http://dragonflycms.org/unicode/?p=60#12288

I've split it into pages of 150 characters each; otherwise your browser might choke on it.
Post by: artistsinhawaii on November 27, 2005, 01:26:13 am
DJMAZE,

The use of Japanese or Chinese characters in other places works as expected.  The real issue, as I see it, is that when someone using CPG in Japanese or Chinese tries to input Kanji as their keywords and hits the spacebar between each set of Kanji to separate each "keyword", the keyword manager will save this as:

keyword1&#12288;keyword2&#12288;keyword3

rather than:

keyword1 keyword2 keyword3   (where the spaces are the ordinary Latin &#32; variety)

Therefore the search engine would read the first example as ONE word, not three separate words, and list it as ONE word at the bottom of the search page. The keyword manager would also read the entry as ONE keyword, so the linking feature would fail to work. If I literally replace &#12288; with ' ' (a Latin space), there is no problem. We need to replace the &#12288; character with ' ' (the Latin equivalent) JUST FOR THE KEYWORD field, before it is saved to the database.

(Hope you didn't read this while I still had %20 in there... because %20 wouldn't work either.)

Dennis
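The failure described above can be reproduced in a few lines of PHP, assuming (for illustration only) that the keyword string is split on the ASCII space alone; Coppermine's actual splitting code may differ. Replacing U+3000 with ' ' before the split, as proposed, yields three separate keywords.

```php
<?php
// Keywords typed with an East Asian IME: separated by U+3000, not U+0020.
$typed = "keyword1\u{3000}keyword2\u{3000}keyword3";

// Assumed splitting rule for this sketch: ASCII space only.
$before = explode(' ', $typed);
echo count($before), "\n"; // 1: the whole string is one "keyword"

// Replace the ideographic space with an ASCII space before saving/splitting.
$fixed = str_replace("\u{3000}", ' ', $typed);
$after = explode(' ', $fixed);
echo count($after), "\n";  // 3
print_r($after);
```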
Post by: DJMaze on November 27, 2005, 03:46:17 am
I understood; I was thinking ahead about other issues but posted in this topic instead of the correct forum.

Anyway, as shown on the Unicode page, use the hex code (the UTF-8 bytes E3 80 80); the following code change might work (someone needs to test it):

thumbnails.php, around line 93:
Code:
$USER['search'] = $_POST;
$album = 'search';
}
Replace with:
Code:
$USER['search'] = $_POST;
// Turn the UTF-8 ideographic space (U+3000, bytes E3 80 80) into an ASCII space
$USER['search']['search'] = preg_replace('#[\xE3][\x80][\x80]#', ' ', $USER['search']['search']);
$album = 'search';
}
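To try the pattern outside Coppermine, here is a small PHP sketch (the sample string is made up). In the single-quoted pattern, `\xE3` is passed through unchanged to PCRE, which, without the /u modifier, matches the raw byte 0xE3; so the pattern matches exactly the UTF-8 encoding of U+3000 and should give the same result as a plain `str_replace()` on those bytes. Assumes PHP 7+.

```php
<?php
// The pattern from the fix: the three bytes E3 80 80 (UTF-8 for U+3000).
$pattern = '#[\xE3][\x80][\x80]#';

$search = "吉川\u{3000}田辺\u{3000}神奈川"; // sample search string

$viaRegex   = preg_replace($pattern, ' ', $search);
$viaReplace = str_replace("\xE3\x80\x80", ' ', $search); // same bytes, no regex needed

var_dump($viaRegex === $viaReplace); // bool(true)
echo $viaRegex, "\n";                // 吉川 田辺 神奈川
```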
Post by: artistsinhawaii on November 27, 2005, 04:31:49 am
@DJMAZE

Sorry, it didn't really do anything.
It allowed searching for independent characters separated by the space, but it did not resolve the issue of all the characters being strung together as ONE keyword.

In other words, if I type in three different keywords in Japanese/Chinese separated by spaces and save the settings, all three end up as ONE keyword, because CPG does not recognize the Asian space character as a keyword separator. This defeats the linking feature of different keywords.

So the character replacement has to occur before the keywords are saved to the database, OR, better yet, CPG must also recognize the Asian space character as a legitimate space character.

Dennis

Post by: DJMaze on November 27, 2005, 04:25:14 pm
The code above only fixes the search itself, not the keywords.
Since it works, here's the fix for the keywords:

editOnePic.php, around line 41, replace:
Code:
    $title        = $_POST['title'];
    $caption      = $_POST['caption'];
    $keywords     = $_POST['keywords'];
    $user1        = $_POST['user1'];
    $user2        = $_POST['user2'];
with
Code:
    $title        = $_POST['title'];
    $caption      = $_POST['caption'];
    // Convert ideographic spaces (U+3000, bytes E3 80 80) to ASCII spaces so each keyword is stored separately
    $keywords     = preg_replace('#[\xE3][\x80][\x80]#', ' ', $_POST['keywords']);
    $user1        = $_POST['user1'];
    $user2        = $_POST['user2'];
Post by: artistsinhawaii on November 27, 2005, 06:07:15 pm
@DJMAZE

Perfect! I tried it as a keyword link and as a search keyword, and then I tried searching for a particular Japanese/Chinese character and phrase when the Kanji was in the caption and title. Everything works.

Dennis
Post by: DJMaze on November 27, 2005, 06:16:48 pm
Bugfix added to the CVS 'devel' module.
It needs testing; if it works, I will commit it to 'stable'.

/devel/editOnePic.php (http://cvs.sourceforge.net/viewcvs.py/coppermine/devel/editOnePic.php) new revision: 1.43
/devel/thumbnails.php (http://cvs.sourceforge.net/viewcvs.py/coppermine/devel/thumbnails.php) new revision: 1.31
/devel/include/functions.inc.php (http://cvs.sourceforge.net/viewcvs.py/coppermine/devel/include/functions.inc.php) new revision: 1.209