[Rt-commit] rt branch, 4.4/rescue-outlook-html, created. rt-4.4.3-52-g98ff7e87e

Gergely Nagy algernon at bestpractical.com
Wed Oct 3 13:16:44 EDT 2018


The branch, 4.4/rescue-outlook-html has been created
        at  98ff7e87e8a7d10868749df2bf151bd24e4fb937 (commit)

- Log -----------------------------------------------------------------
commit 98ff7e87e8a7d10868749df2bf151bd24e4fb937
Author: Gergely Nagy <algernon at bestpractical.com>
Date:   Wed Oct 3 19:14:09 2018 +0200

    WIP: Teach RT::EmailParser::RescueOutlook how to clean up HTML mail too
    
    RescueOutlook only cleaned up the `text/plain` parts of a message, but over
    the years HTML mail has become much more common, and the HTML parts produced
    by Outlook have similar issues. This change attempts to clean up those parts
    too.
    
    (WIP because it is untested: the regexp works, but I am unsure about the
    part extraction.)
    
    Signed-off-by: Gergely Nagy <algernon at bestpractical.com>
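A quick way to check the substitution the patch applies to the HTML part is to run it on a plain string. The sketch below is a standalone Perl script; `strip_empty_paragraphs` and the sample markup are hypothetical, made up for illustration rather than taken from a real Outlook message. It also shows that the pattern only matches paragraphs that carry a `style` attribute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of RT): applies the same substitution as the
# patch and returns the cleaned HTML together with the number of removals.
sub strip_empty_paragraphs {
    my ($html) = @_;
    my $count = $html =~ s{<p(\s*style="[^"]*")><br>\n</p>}{}mg;
    return ( $html, $count || 0 );
}

# Made-up sample: two real paragraphs separated by a <br>-only paragraph.
my $outlook_html =
    qq{<p style="margin:0">Hello,</p>\n}
  . qq{<p style="margin:0"><br>\n</p>\n}
  . qq{<p style="margin:0">Second paragraph.</p>\n};

my ( $cleaned, $removed ) = strip_empty_paragraphs($outlook_html);
print "removed $removed empty paragraph(s)\n";    # removed 1 empty paragraph(s)
print $cleaned;

# The style attribute group is not optional, so a bare <p><br> paragraph
# is left untouched.
my ( $bare, $bare_removed ) = strip_empty_paragraphs(qq{<p><br>\n</p>});
print "bare paragraph removals: $bare_removed\n";    # bare paragraph removals: 0
```

The substitution deletes only the `<p>...</p>` element itself, so the newlines that surrounded it stay in place; browsers ignore that whitespace between block elements.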

diff --git a/lib/RT/EmailParser.pm b/lib/RT/EmailParser.pm
index ad26a291b..c4c2103b9 100644
--- a/lib/RT/EmailParser.pm
+++ b/lib/RT/EmailParser.pm
@@ -634,6 +634,9 @@ in it.  it's cool to have a 'text/plain' part, but the problem is the part is
 not so right: all the "\n" in your main message will become "\n\n" :/
 
 this method will fix this bug, i.e. replaces "\n\n" to "\n".
+
+Similarly, the HTML part of such a message has the same problem: consecutive paragraphs are separated by an empty paragraph that contains only a line break. This method removes those line-break-only paragraphs too.
+
 return 1 if it does find the problem in the entity and get it fixed.
 
 =cut
@@ -645,7 +648,7 @@ sub RescueOutlook {
 
     return unless $mime && $self->LooksLikeMSEmail($mime);
 
-    my $text_part;
+    my ( $text_part, $html_part );
     if ( $mime->head->get('Content-Type') =~ m{multipart/mixed} ) {
         my $first = $mime->parts(0);
         if ( $first->head->get('Content-Type') =~ m{multipart/alternative} )
@@ -655,6 +658,11 @@ sub RescueOutlook {
             {
                 $text_part = $inner_first;
             }
+            my $inner_second = $first->parts(1);
+            if ( $inner_second && $inner_second->head->get('Content-Type') =~ m{text/html} )
+            {
+                $html_part = $inner_second;
+            }
         }
     }
     elsif ( $mime->head->get('Content-Type') =~ m{multipart/alternative} ) {
@@ -662,6 +670,10 @@ sub RescueOutlook {
         if ( $first->head->get('Content-Type') =~ m{text/plain} ) {
             $text_part = $first;
         }
+        my $second = $mime->parts(1);
+        if ( $second && $second->head->get('Content-Type') =~ m{text/html} ) {
+            $html_part = $second;
+        }
     }
 
     # Add base64 since we've seen examples of double newlines with
@@ -694,6 +706,27 @@ sub RescueOutlook {
         }
     }
 
+    if ($html_part) {
+
+        # use the unencoded string
+        my $content = $html_part->bodyhandle->as_string;
+
+        if ( $content =~ s{<p(\s*style="[^"]*")><br>\n</p>}{}mg ) {
+
+            # only write if we changed the content
+            if ( my $io = $html_part->open("w") ) {
+                $io->print($content);
+                $io->close;
+                $RT::Logger->debug(
+                    "Removed empty paragraphs from MS Outlook HTML message.");
+                return 1;
+            }
+            else {
+                $RT::Logger->error("Can't write to HTML body to remove empty paragraphs");
+            }
+        }
+    }
+
     return;
 }
 

-----------------------------------------------------------------------
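The part-extraction logic the commit message is unsure about can be modelled without MIME-tools. In this sketch plain hashes stand in for `MIME::Entity` objects: the `type` and `parts` keys and the `part_at`/`find_alternative_parts` helpers are hypothetical substitutes for `head->get('Content-Type')` and `parts($i)`. It restructures the two branches so they share one lookup, and it checks that each part exists before inspecting it, since a `multipart/alternative` container may hold a single part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for MIME::Entity->parts($i): returns the child at
# index $i, or undef when the part does not exist.
sub part_at {
    my ( $node, $i ) = @_;
    my $parts = $node->{parts} || [];
    return $parts->[$i];
}

# Locate the multipart/alternative container, then take text/plain from
# index 0 and text/html from index 1.
sub find_alternative_parts {
    my ($mime) = @_;
    my ( $text_part, $html_part, $alt );

    if ( $mime->{type} =~ m{multipart/mixed} ) {
        my $container = part_at( $mime, 0 );
        $alt = $container
          if $container && $container->{type} =~ m{multipart/alternative};
    }
    elsif ( $mime->{type} =~ m{multipart/alternative} ) {
        $alt = $mime;
    }
    return ( undef, undef ) unless $alt;

    my $first  = part_at( $alt, 0 );
    my $second = part_at( $alt, 1 );
    $text_part = $first  if $first  && $first->{type}  =~ m{text/plain};
    $html_part = $second if $second && $second->{type} =~ m{text/html};

    return ( $text_part, $html_part );
}

# Made-up message: an alternative body followed by an attachment.
my $message = {
    type  => 'multipart/mixed; boundary="x"',
    parts => [
        {
            type  => 'multipart/alternative; boundary="y"',
            parts => [
                { type => 'text/plain; charset="us-ascii"' },
                { type => 'text/html; charset="us-ascii"' },
            ],
        },
        { type => 'application/pdf' },
    ],
};

my ( $text, $html ) = find_alternative_parts($message);
print "text: $text->{type}\n";    # text: text/plain; charset="us-ascii"
print "html: $html->{type}\n";    # html: text/html; charset="us-ascii"

# A single-part alternative yields a text part but no HTML part.
my ( $t2, $h2 ) = find_alternative_parts(
    { type => 'multipart/alternative', parts => [ { type => 'text/plain' } ] } );
print defined $h2 ? "html found\n" : "no html part\n";    # no html part
```

Looking up index 1 for the HTML part, rather than index 0 again, is what keeps the `text/plain` part from being tested a second time as a candidate HTML part.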