[rt-users] RT saves data in quoted-printable, why???

Václav Ovsík vaclav.ovsik at i.cz
Fri Mar 6 09:18:56 EST 2015


Hi,

On Thu, Mar 05, 2015 at 06:37:21PM -0500, Alex Vandiver wrote:
> On Fri, 6 Mar 2015 00:06:32 +0100 Václav Ovsík <vaclav.ovsik at i.cz>
> wrote:
> > https://issues.bestpractical.com/Ticket/Display.html?id=29735
> 
> Aha -- thanks for digging that out!  I thought I vaguely recalled
> something in this area previously.
> https://issues.bestpractical.com/Ticket/Attachment/286095/157750/utf8-encoding.patch
> looks to be functionally fairly similar to the branch.

Thanks for your attention to this.

> There are a few other, orthogonal fixes in there that may still be
> interesting to tease out into their own commits.  It looks like I see
> changes to:
> 
>  * Fix the computed max size of base64'd attachments; I'd need to
> squint at it harder, but seems eminently reasonable.
> 
>  * Attempt to gracefully deal with TruncateLongAttachments truncating
> mid-byte of UTF-8 data.  As above; the decode/encode is an interesting
> trick to attempt to ensure that the byte stream is consistent.  I'd
> like to test it a bit, but seems not unreasonable.

It may not be very efficient, but it is easy, and safety comes first :)
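
For illustration, the decode/encode trick quoted above can be sketched roughly as follows. This is Python rather than RT's Perl and is not the code from the patch; the function name truncate_utf8 and its parameters are made up for this example. It assumes the data is meant to be UTF-8, and note that errors="ignore" also silently drops any invalid bytes elsewhere in the data, not just a cut-off tail.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut data to at most max_bytes without leaving a partial UTF-8 character.

    Hypothetical helper for illustration only, not RT's implementation.
    """
    # A plain byte slice may end in the middle of a multi-byte character.
    cut = data[:max_bytes]
    # Decoding with errors="ignore" discards the incomplete trailing sequence
    # (and any other invalid bytes); re-encoding gives consistent UTF-8 again.
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


name = "Václav".encode("utf-8")      # "á" takes two bytes in UTF-8
print(truncate_utf8(name, 2))        # the cut falls inside "á"
print(truncate_utf8(name, 3))        # the cut falls after "á"
```

The result can be shorter than max_bytes by up to three bytes, which is the price of the extra decode/encode pass.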

>  * Choose base64 vs QP based on which is shorter; I'm less convinced by
> this, since it means that for large data, it gets QP'd, base64'd, and
> then one of those _again_ -- which isn't terribly efficient.  I'm less
> convinced by the tradeoff of computation time to stored in-database
> size.

You are right. My intention was to keep as much of the text readable as
possible. A text may contain some invalid characters while the rest
of it is still readable; QP is then more appropriate, because it leaves
most of the text readable.
So comparing the lengths of the Base64-encoded and QP-encoded data
indicates how many ASCII characters the data contains:
 len Base64 < len QP - mostly binary data - probably an octet stream
 len QP < len Base64 - mostly ASCII chars - probably text
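
As a rough illustration of this length comparison, here is a sketch in Python (not RT's Perl, and not the code from the patch). The function name pick_encoding is made up for this example; base64 and quopri are modules from the Python standard library. As Alex points out, it encodes the data twice just to make the decision, so it trades computation time for stored size.

```python
import base64
import quopri


def pick_encoding(data: bytes) -> str:
    """Return the name of whichever transfer encoding yields the shorter output.

    Hypothetical helper for illustration only, not RT's implementation.
    """
    b64_len = len(base64.b64encode(data))
    qp_len = len(quopri.encodestring(data))
    # QP leaves printable ASCII as-is but spends three characters on every
    # other byte; Base64 always spends four characters per three bytes.
    return "quoted-printable" if qp_len < b64_len else "base64"


print(pick_encoding(b"The quick brown fox jumps over the lazy dog"))
print(pick_encoding(bytes(range(128, 256))))
```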

But this is probably a corner case and not very interesting.
Most text should be valid UTF-8, and the rest is not
interesting these days.

> If you're interested in reworking the patch into a 2-3 commit series,
> I'm happy to apply for 4.2-trunk.
>  - Alex

https://github.com/bestpractical/rt/compare/stable...zito:4.2-zito-encodelob-utf8-fix
This is a somewhat newer version that I am using in a production instance of rt-4.2.9.
I will be happy if some part of it is usable for RT mainline.

Thanks for the fine software!
Cheers
-- 
Zito


