[rt-users] Bad characters in names loaded from LDAP (AD)
Bill Cole
rtusers-20090205 at billmail.scconsult.com
Mon Oct 10 23:41:00 EDT 2016
On 10 Oct 2016, at 16:26, Jan Burian wrote:
> Hi all,
>
> we have RT 4.4.0 on CentOS 7 and Perl v5.22.1. And we are starting to
> use RT in production.
>
> We configured RT to authenticate users via LDAP
> (RT::Authen::ExternalAuth::LDAP). Our LDAP server is MS AD (Win 2008
> R2).
[...]
> Authentication is working fine. Users can log in, if the user doesn't
> exist in RT the account is autocreated. All the configured attributes
> are transferred.
This is a strong sign that the LDAP part is working correctly. If the
LDAP server (AD) and client (Perl's Net::LDAP module) were using
mismatched encodings, the mismatch would most likely show up as
authentication failures, because different 8-bit encodings assign the
same (logical) characters to different byte values in the 0x80-0xff
range.
Fortunately, it is somewhere between arcane and impossible to make
Net::LDAP use anything other than UTF-8. There's *probably* some way to
make it do T.61 for ancient-history compatibility, but that's mostly
pointless.
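To illustrate the point about byte values 0x80-0xff, here is a minimal
Python sketch (Python only for illustration; RT and Net::LDAP are Perl).
The function name and the sample name "Jiří" are made up for the example.
It shows that the same logical characters come out as different bytes
under cp1250 and UTF-8, which is why two sides that disagree about the
encoding would not see matching strings.

```python
def encode_both(text):
    """Return the UTF-8 and cp1250 byte encodings of the same text."""
    return text.encode("utf-8"), text.encode("cp1250")


# Hypothetical sample name containing two non-ASCII characters.
utf8_bytes, cp1250_bytes = encode_both("Jiří")

# In cp1250 each accented letter is a single byte in the 0x80-0xff range;
# in UTF-8 each one is a two-byte sequence.
print(utf8_bytes.hex(" "))    # 4a 69 c5 99 c3 ad
print(cp1250_bytes.hex(" "))  # 4a 69 f8 ed
print(utf8_bytes == cp1250_bytes)  # False
```

A byte-for-byte comparison of the two results fails even though both
represent the same name.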
[...]
> We had a similar problem with Moodle. When we configured Moodle
> against Active Directory and set cp1250 encoding, then it was doing
> exactly the same thing. After we changed the encoding for the LDAP
> connector to utf-8, the names were corrected.
Which makes sense: LDAP v3 by default uses UTF-8 and you have a modern
system with a mature LDAP client. I know of no way to configure a CentOS
7/Perl 5.22 system such that the LDAP interaction with an AD LDAP server
talking UTF-8 would be the source of this sort of encoding conflict. I'm
mildly surprised that anything talking LDAPv3 can be made to use cp1250
encoding, but I suppose Microsoft makes their own rules to go along with
their own unique code pages.
[...]
> Also I read that MS AD in LDAP protocol version 3 returns any string
> to the LDAP client in utf-8 encoding.
> I really don't know where the problem could be.
The most likely place is in your database. I'm guessing that you are
using MySQL, which defaults to latin1 encoding. When you store a UTF-8
string in a latin1 table, each byte of a multi-byte character is treated
as a separate latin1 character, so one accented letter turns into 2 or 3
characters, but the right bits are still there. This issue has come up a
few times on this list over the past decade and I think Best Practical
has documented how to safely convert an RT database with that sort of
problem from latin1 to utf8. It is probably worth looking through their
docs (possibly one of the UPGRADING* files?) and the RT Wiki for a
solution. I expect it could be done with a binary dump of the database,
altering any latin1 tables to use utf8, and a re-import of the binary
dump. I'm not enough of a MySQL expert to detail that process (I
generally use Postgres where possible).
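The breakage described above can be sketched in a few lines of Python.
This is only a simulation under stated assumptions, not the RT or MySQL
conversion procedure: the function names and the sample name are made
up, and Python's "latin-1" codec maps every byte to a character, whereas
MySQL's latin1 is closer to cp1252, so real data may behave slightly
differently for some byte values.

```python
def read_utf8_as_latin1(text):
    """Simulate UTF-8 bytes being treated as one latin1 character per byte."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the simulation: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "Jiří"  # hypothetical sample name, 4 characters
garbled = read_utf8_as_latin1(original)

# Each of the two accented letters became two characters.
print(len(original), len(garbled))  # 4 6

# The underlying bytes are unchanged, so the text is recoverable.
print(repair(garbled) == original)  # True
```

The round trip works here because no bytes were lost, which is the sense
in which "the right bits are still there".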