[svk-devel] svk 2.0.1, Win32 and locales

Thomas Lauer thomas.lauer at virgin.net
Wed May 9 07:09:31 EDT 2007


(Sorry for the empty mail I sent an hour or so ago: hit a key I
shouldn't have hit.)

I have used (or rather tested) SVK for the last two weeks in a small C
project. So far, things do look pretty good. Emboldened by this, I've
just decided to deploy a bigger project: a chaotic mixture of .doc and
.xls files (that's MS Office stuff) and a tree with my website contents,
text, binaries, warts and all.

The good news is that basically it works. The bad news is that there are
a few small, but annoying niggles which seem to be Win32 and/or locale
specific. I have some filenames (say "Tüt.txt") with [\x80-\xff]
characters in them. This is what I get when SVK tries to commit one of
these files: "Can't decode path as utf8." (The commit is aborted.)
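
To illustrate the failure mode, here is a minimal sketch in Python (not SVK's Perl code). It assumes the filename reaches the program as iso-8859-1 bytes, which is what the symptoms suggest but which has not been verified. The helper decode_path() is hypothetical and only mimics the wording of SVK's message.

```python
# "Tüt.txt" as a Latin-1 (iso-8859-1) byte string: the ü is the single byte 0xFC.
name = "T\u00fct.txt"
latin1_bytes = name.encode("iso-8859-1")


def decode_path(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw path bytes or fail loudly."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Mimics the wording of the SVK error; not SVK's actual code path.
        raise ValueError(f"Can't decode path as {encoding}.") from exc


# Decoding with the matching encoding round-trips cleanly.
assert decode_path(latin1_bytes, "iso-8859-1") == name

# Decoding the same bytes as UTF-8 fails, because a lone 0xFC byte
# is not a valid UTF-8 sequence.
try:
    decode_path(latin1_bytes, "utf-8")
    utf8_ok = True
except ValueError:
    utf8_ok = False
```

With the wrong encoding assumed, any byte in the [\x80-\xff] range can make the decode fail, which would explain why only the accented filenames abort the commit.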

The problem seems to arise in SVK's Util.pm, get_encoding(). This always
returns utf8 on my machine regardless of my locale or LANG settings. For
the time being I've patched get_encoding() so it returns a hard-coded
"iso-8859-1".
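
For comparison, a locale-aware lookup with a fallback could look roughly like the following Python sketch. This is not how SVK's get_encoding() is written. The helper name guess_path_encoding() and the fallback choice are assumptions made purely for illustration.

```python
import codecs
import locale


def guess_path_encoding(fallback: str = "utf-8") -> str:
    """Hypothetical helper: use the locale's preferred encoding if a codec exists."""
    name = locale.getpreferredencoding(False)
    try:
        # Normalise the name and confirm that a codec for it is available.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown or empty locale encoding: fall back to the given default.
        return fallback


encoding = guess_path_encoding()
```

The point of the sketch is only that the result depends on the environment rather than being a constant; a hard-coded return value, as in the workaround above, trades portability for predictability.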

After that I once more tried the commit and now all files, including
those with accents etc., went through without a hitch: nice. Even nicer
is that almost everything else works as well: revert, cat, log... all
of that looks good, even for filenames containing non-ASCII characters.

However, what doesn't work *at all* are the update and diff commands
(perhaps there are others). I get a blunt "Can't encode path as
iso-8859-1." with update and a not very helpful "Invalid argument: Safe
data '--- T' was followed by non-ASCII byte 195: unable to convert
to/from UTF-8" when doing a diff.
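
The byte value in the diff message is a useful clue: 195 is 0xC3, the first byte of "ü" when it is encoded as UTF-8. The Python sketch below (an illustration, not SVK or Subversion code) builds a diff-style header for the example filename and finds the first non-ASCII byte. The helper first_non_ascii() is hypothetical.

```python
name = "T\u00fct.txt"

# A diff header line for the file, encoded as UTF-8.
header = ("--- " + name).encode("utf-8")


def first_non_ascii(data: bytes):
    """Hypothetical helper: return (ASCII prefix, first byte above 127 or None)."""
    for i, b in enumerate(data):
        if b > 127:
            return data[:i], b
    return data, None


safe, byte = first_non_ascii(header)
# safe is b'--- T' and byte is 195 (0xC3), matching the error text.
```

This suggests, without proving it, that at the point of failure the diff header already holds the filename in UTF-8 and that some later step expects plain ASCII there.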

I find these messages rather strange, as most of the other commands
that deal with such files *do* work. (It's also surprising that an app
at a 2.x release level that's supposed to work on Win32 shows this sort
of problem, though admittedly that may well be down to a lack of
documentation. Then again, perhaps Real Unix Programmers [tm] never use
accented characters and umlauts in their filenames...)

Anyway, in the case of update it seems that to_native() (that's the
function throwing the error message) is called with a string composed of
two filenames, as it were: "Tüt.txt\Tüt.txt". Obviously, this is not a
valid utf8 string. However, I have not investigated this or the diff
case any further.

There are two issues here, I think.

1) Why won't get_encoding() return the right encoding on Win32? (I have
the current version of Win32::Console.)

2) What happens with the filenames when I try to update or diff that
doesn't happen when I revert or cat?

BTW, all that happens on a Win2K, SP4 rollup machine with

> svk -v
> This is svk, version v2.0.1 (using Subversion bindings 1.004003)

> perl -v
> This is perl, v5.8.8 built for MSWin32-x86-multi-thread
> Copyright 1987-2006, Larry Wall

I first installed the Win32 binary package (which appears to be at the
2.0.0 level). After running into this problem I updated the whole lot to
SVK 2.0.1 and the 1.4.3 Subversion Perl bindings... still no joy.

However, repositories based on source trees populated only with "normal"
filenames work in all circumstances I have tried so far, so I assume the
SVK installation itself is up to scratch.

Hints, patches or workarounds anyone?

-- 
cheers  thomasl

web : http://thomaslauer.com/start

