[rt-users] utf8 and accents.

Curtis Bruneau curtisb at vianet.ca
Fri Aug 8 16:51:52 EDT 2008


Ruslan Zakirov wrote:
> On Sat, Aug 9, 2008 at 12:20 AM, Curtis Bruneau <curtisb at vianet.ca> wrote:
>   
>> I need some suggestions. I have come to the conclusion that none of
>> the utf8 collations handle French properly, not the way latin1 did
>> anyway. All accented variants of a letter are seen as the same: while
>> they are distinct in binary, they cannot be uniquely indexed, and both
>> sorting and queries using any variant character treat them as equal.
>>
>> So I'm in a bit of a bind. If I were to use RT with a case-sensitive
>> collation like utf8_bin, would the application behave as expected? I
>> know search would be much stricter and possibly confusing to the end
>> user.
>>     
>
> utf8_bin is a good choice. You're free to use a binary collation.
> Maybe the utf8_general_ci collation will be better for you. Any
> collation is OK as long as you know how to deal with it in MySQL.
>
>
>   
OK, just wondering, I'll give it a try. I was more curious whether any
string-type clauses would still work internally, since binary collations
are sensitive to everything, case included. I'm guessing that's all fine,
because I think Postgres stores its stuff as binary_cs and relies on the
OS to do collations (something like that; other Postgres DBs around here
seem to be case sensitive).
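
For anyone following along, here is a rough sketch of the difference
being discussed. It is not MySQL code: accent_insensitive_key() is a
hypothetical helper that only approximates what an accent- and
case-insensitive collation such as utf8_general_ci does (strip accents,
fold case), while binary_key() stands in for utf8_bin by comparing the
raw UTF-8 bytes. The word list is made up for illustration.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Approximate an accent- and case-insensitive comparison key.

    Assumption: this only imitates the general idea of a *_ci collation;
    real MySQL collations use their own weight tables.
    """
    # Split each accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so "Cote" and "cote" compare equal.
    return stripped.casefold()


def binary_key(text: str) -> bytes:
    """Stand-in for a binary collation: compare the raw UTF-8 bytes."""
    return text.encode("utf-8")


words = ["cote", "côte", "coté", "côté", "Cote"]

ci_keys = {accent_insensitive_key(w) for w in words}
bin_keys = {binary_key(w) for w in words}

print(len(ci_keys))   # 1: every variant collapses to the same key
print(len(bin_keys))  # 5: every variant is distinct
```

With the insensitive key a unique index would reject four of the five
words as duplicates; with the binary key all five are accepted, but a
search for "cote" no longer finds "Cote" or "côté".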
>> My other option would be to continue to use latin1. Is there any way
>> to accomplish this using the latest code base? It's probably not
>> configurable, and I don't want to have to manage diffs for the
>> possible changes, unless it is fairly minimal to do.
>>     
>
> No, we won't return to that, as it's totally wrong and has
> consequences: it actually violates the purpose of the setting. RT was
> storing UTF-8 encoded data in a latin1 column, so collations worked
> absolutely incorrectly for everything, even latin1, and were close to
> binary.
>
> At this point I can suggest you either move to a binary collation or
> create a new one and send it to the MySQL team for inclusion.
>
>   
Understood, I wasn't liking that idea either. Oddly enough,
latin1_swedish_ci (the latin1 default) isn't supposed to be accent
sensitive, while latin1_general_ci is, but my old database (MySQL 4.1)
seems to be indexing accented characters and seeing them as separate.
The collation isn't specified, so I'm assuming swedish, but it's
behaving like general; perhaps the old version respected the
differences. I'm basically trying to get it to behave the same as before
(perhaps if swedish had been enforced before, I wouldn't be in this
position). Regardless, this isn't really an issue with RT.
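
Ruslan's point about UTF-8 data sitting in a latin1 column may explain
why the old database looked accent sensitive. A small sketch of the
idea, assuming the server simply interprets the stored bytes as latin1
(the variable names are only for illustration):

```python
# What the application means to store.
text = "é"

# What RT actually sent: the UTF-8 encoding, two bytes.
utf8_bytes = text.encode("utf-8")

# What a latin1 column believes it holds: one character per byte.
seen_by_server = utf8_bytes.decode("latin-1")

print(utf8_bytes)           # b'\xc3\xa9'
print(seen_by_server)       # Ã©
print(len(seen_by_server))  # 2
```

A latin1 collation would then be comparing "Ã©" against "e", not "é"
against "e", so the accented and unaccented forms never look equal,
whichever latin1 collation is in effect.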
>> The issue in question -> http://bugs.mysql.com/bug.php?id=34130
>>
>> They said it's on the 'todo' list. MSSQL handles this with ci_ai,
>> ci_as, cs_ai and cs_as collations, where accents are either
>> significant or not. Hopefully they do come around to it.
>>
>> Character difference for mysql .. http://www.collation-charts.org/mysql60/
>>
>>
>> Curtis
>>     
Thanks again for your time. I'm really excited to launch 3.8.x; compared
to 3.4.x, our users are loving it, especially the reporting and all that.
Curtis.


