[Rt-commit] rt branch, 4.2/utf8-reckoning, repushed
Alex Vandiver
alexmv at bestpractical.com
Wed Sep 3 13:49:41 EDT 2014
The branch 4.2/utf8-reckoning was deleted and repushed:
was 948620a5c3d91444e41ac38361a7b3fa81d5c466
now af9fe7c431b030f3c78cf0729819cc71df8d61a9
1: 2715890 = 1: 15dde68 Modernize and condense t/mail/sendmail.t and t/mail/sendmail-plaintext.t
2: af6491f = 2: a275a7f Always log bytes, not characters
3: b345603 = 3: 18ef9b2 The alluded-to deficiency is not a concern in perl ≥ 5.8.3
4: feb718b ! 4: 6d9bd63 Ensure all MIME::Entity bodies are UTF-8 encoded bytes
@@ -7,7 +7,7 @@
and noting their character set.
In the case of Approvals/index.html, there was no need for an explicit
- MIME::Entity object; ->Correspond creates on as needed from a "Content"
+ MIME::Entity object; ->Correspond creates one as needed from a "Content"
argument.
diff --git a/lib/RT/Action/CreateTickets.pm b/lib/RT/Action/CreateTickets.pm
5: 6dbe1b1 ! 5: 41d084f Ensure all MIME::Entity headers are UTF-8 encoded bytes
@@ -12,7 +12,7 @@
While the majority of these headers will never have wide characters in
them, always decoding and encoding ensures the proper disipline to
- guarantee that strings with the "UTF-8" flag do not get placed in a
+ guarantee that strings with the "UTF8" flag do not get placed in a
header, which can cause double-encoding.
diff --git a/lib/RT/Action/SendEmail.pm b/lib/RT/Action/SendEmail.pm
6: a122628 = 6: 12c2671 Make RT::Action::SendEmail->SetHeader take characters, not bytes
7: 2fcc445 ! 7: a21eb81 Add a utility method to check that an input is bytes
@@ -2,20 +2,20 @@
Add a utility method to check that an input is bytes
- Note that it is impossible to verify that an input characters; here, we
- can only validate if it _could_ be bytes.
+ Note that it is impossible to verify that an input is characters; here,
+ we can only validate if it _could_ be bytes.
- First, any string with the "UTF-8" flag off cannot contain codepoints
- above 255, and as such is safe. Additionally, if the "UTF-8" flag is
- on, having no codepoints above 127 means the bytes are unambigious.
- Having codepoints above 255 is guaranteedly a sign that the input is not
- a byte string.
+ First, any string with the "UTF8" flag off cannot contain codepoints
+ above 255, and as such is safe. Additionally, if the "UTF8" flag is on,
+ having no codepoints above 127 means the bytes are unambigious. Having
+ codepoints above 255 is guaranteedly a sign that the input is not a byte
+ string.
- This leaves only the case of a string with the "UTF-8" flag on, and
- codepoints above 127 but below 255. The "UTF-8" flag is a sign that
- they were _likely_ touched by character data at some point. In such
- cases we warn, suggesting that the bytes have the UTF-8 flag disabled by
- means of utf8::downgrade, if they are indeed bytes.
+ This leaves only the case of a string with the "UTF8" flag on, and
+ codepoints above 127 but below 255. The "UTF8" flag is a sign that they
+ were _likely_ touched by character data at some point. In such cases we
+ warn, suggesting that the bytes have the "UTF8" flag disabled by means
+ of utf8::downgrade, if they are indeed bytes.
diff --git a/lib/RT/Util.pm b/lib/RT/Util.pm
--- a/lib/RT/Util.pm
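The three cases in the commit message for 7 can be sketched as below. This is an illustration only, not the code added to lib/RT/Util.pm: the name classify_bytes and its return values are hypothetical. It assumes the check can be built from utf8::is_utf8 and two character-class matches.

```perl
use strict;
use warnings;

# Hypothetical helper; NOT RT::Util::assert_bytes itself.
# Returns "ok", "warn" or "not_bytes" for the cases the commit message lists.
sub classify_bytes {
    my ($string) = @_;

    # "UTF8" flag off: no codepoint can be above 255, so this could be bytes.
    return "ok" unless utf8::is_utf8($string);

    # Flag on, but nothing above 127: the bytes are unambiguous.
    return "ok" unless $string =~ /[^\x00-\x7F]/;

    # A codepoint above 255 can never be a byte.
    return "not_bytes" if $string =~ /[^\x00-\xFF]/;

    # Flag on, codepoints in 128..255: likely touched by character data.
    return "warn";
}

my $latin1 = "caf\xE9";            # flag off
my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);          # same string, flag on
my $wide = "\x{263A}";             # codepoint above 255

print classify_bytes($latin1),   "\n";
print classify_bytes($upgraded), "\n";
print classify_bytes($wide),     "\n";

# If the flagged string really is bytes, utf8::downgrade clears the flag.
utf8::downgrade($upgraded);
print classify_bytes($upgraded), "\n";
```

The sketch returns a label so that the caller decides whether to warn or die. How the real method reports each case is not shown in this message.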
8: 0aea559 ! 8: 17702cd Verify that MIME::Entity bodies are bytes, and remove _utf8_off call
@@ -6,7 +6,7 @@
body is indeed bytes, and not characters.
We also remove the _utf8_off call -- because, contrary to what the
- comment implies, the presence or absence of the "UTF-8" flag does _not_
+ comment implies, the presence or absence of the "UTF8" flag does _not_
determine if a string is "encoded as octets and not as characters"; it
merely states that the string is capable of holding codepoints > 255.
If it happens to not contain any, the _utf8_off does nothing. If it
@@ -18,7 +18,7 @@
fixed by a simple _utf8_off, but instead must be fixed by ensuring that
the body always contains bytes, not wide characters -- as it now does,
thanks to the prior commits. The call to RT::Util::assert_bytes serves
- as an additional safeguard against backsliding o nthat assumption.
+ as an additional safeguard against backsliding on that assumption.
diff --git a/lib/RT/I18N.pm b/lib/RT/I18N.pm
--- a/lib/RT/I18N.pm
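The claim in the commit message for 8 that the "UTF8" flag does not decide whether a string is bytes or characters can be checked with a small, self-contained sketch. It uses only utf8::upgrade and Encode::encode; the variable names are illustrative and nothing here is taken from lib/RT/I18N.pm.

```perl
use strict;
use warnings;
use Encode ();

# The same four codepoints, once with the "UTF8" flag off and once with it on.
my $flag_off = "caf\xE9";
my $flag_on  = "caf\xE9";
utf8::upgrade($flag_on);

# Perl treats both as the same string: equal, and of equal length.
my $same_string = ($flag_off eq $flag_on) ? 1 : 0;
my $same_length = (length($flag_off) == length($flag_on)) ? 1 : 0;

# Encoding either one as UTF-8 gives the same octets.
my $bytes_off  = Encode::encode("UTF-8", $flag_off);
my $bytes_on   = Encode::encode("UTF-8", $flag_on);
my $same_bytes = ($bytes_off eq $bytes_on) ? 1 : 0;

print "same string: $same_string\n";
print "same length: $same_length\n";
print "same bytes:  $same_bytes\n";
```

Since the flag only describes internal storage, whether a value is "bytes" has to follow from where it came from, which is what the assert_bytes safeguard is meant to back up.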
9: f1660db = 9: ba11085 Verify that MIME::Entity headers are bytes, and remove _utf8_off call
10: 41f6ff8 = 10: 1d18663 Standardize on the stricter Encode::encode("UTF-8", ...) everywhere
11: 0b4f458 = 11: ed0458d Remove "use utf8" from RT::I18N::fr, making NBSP explicit
12: 62668f9 = 12: 7548587 Remove remaining cases of "use utf8"
13: fe89415 = 13: fb58e26 Dashboard: decode bytes in query parameters into characters
14: 39c008c = 14: b2db8fc Tests: WWW::Mechanize correctly returns characters now
15: 52e4290 ! 15: 2be0797 _utf8_on in EncodeToMIME is needless and incorrect; remove it
@@ -4,12 +4,12 @@
66930fd8 switched from an explicit _utf8_off to an explicit _utf8_on, in
an attempt to switch from splitting on bytes to splitting on characters.
- However, the "UTF-8" flag does not magically determine if a string is
+ However, the "UTF8" flag does not magically determine if a string is
bytes or characters. Instead, only consistency in calling convention
can do so. All callsites of RT::Interface::Email::EncodeToMIME and
RT::Action::SendEmail::MIMEEncodeString now pass character strings; all
that _utf8_on can do is incorrectly "decode" those strings as UTF-8 if
- they happen to not have the "UTF-8" flag set.
+ they happen to not have the "UTF8" flag set.
diff --git a/lib/RT/Interface/Email.pm b/lib/RT/Interface/Email.pm
--- a/lib/RT/Interface/Email.pm
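The incorrect "decode" that the commit message for 15 describes can be reproduced in isolation. The sketch below is illustrative only and assumes nothing about EncodeToMIME beyond what the message says; it calls Encode::_utf8_on on a character string that happens to be stored without the "UTF8" flag.

```perl
use strict;
use warnings;
use Encode ();

# A character string of two codepoints, U+00C3 and U+00A9, flag off.
my $unflagged = "\xC3\xA9";
my $length_before = length($unflagged);

# _utf8_on reinterprets the internal buffer as UTF-8, so the two
# characters collapse into the single codepoint U+00E9.
Encode::_utf8_on($unflagged);
my $length_after = length($unflagged);
my $codepoint    = ord($unflagged);

# The same two codepoints with the flag already on are left alone.
my $flagged = "\xC3\xA9";
utf8::upgrade($flagged);
Encode::_utf8_on($flagged);
my $flagged_length = length($flagged);

printf "unflagged: %d -> %d (U+%04X)\n",
    $length_before, $length_after, $codepoint;
printf "flagged:   %d\n", $flagged_length;
```

The result depends on an internal storage detail that callers cannot see, which is why removing the call is safe once every callsite passes character strings.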
16: c73b596 = 16: f67c72a Move comment from PreprocessTimeUpdates to DecodeArgs, where it belongs
17: 9bba281 = 17: 3ac9388 Always decode data in %ARGS as UTF-8 in DecodeArgs
18: 5c8dcd5 = 18: 9cc181b Add RT::Util::assert_bytes checks to _EncodeLOB and _DecodeLOB
19: e6c9339 = 19: b1af637 Update POD and comments to be clearer about characters vs bytes
20: 82fa2b3 = 20: 701c7dd Remove an unreachable line
21: 44cd960 = 21: 4d70cfb TSV need not explicitly encode as UTF-8; all output is UTF-8 encoded
22: a4c0582 = 22: b2e341b Move "use Encode" calls to one central location
23: 40b9dc2 = 23: d91b416 Consistent character/byte hygene allows RT to run with DBD::Pg 3.3.0
24: ea0eeed = 24: 89a8568 Note that HTTP output still incorrectly relies on is_utf8
25: c014818 = 25: bc8e5e9 Comment the logic for database decode_utf8/is_utf8 checking
26: 8eb5159 = 26: 0a5fd0a Encode characters on their way out of tests
27: 948620a = 27: af9fe7c Stop hiding "Wide character in..." warnings