[rt-users] HTML stripper

Craig Schenk murple at murple.net
Tue Mar 30 19:45:53 EST 2004


I've seen numerous posts here from people ($self->include) wishing that RT
could take incoming HTML mail and strip it down to plain text. I wrote this
Perl script to do that. It may not be the most elegant solution, but it works
and can be used with RT, Majordomo, or whatever.

Basically, it reads a message from STDIN and writes a new one to STDOUT, so
you can put something like this in your /etc/aliases:

     rt-queue: "| htmldump | rt-mailgate etc..."

If incoming mail is a straight text/html MIME type, the script will run it
through lynx -dump (or you can use html2txt) to generate a text version. Since
this may not be the prettiest formatting, a header is attached saying "this
was generated from HTML automatically etc" and the original HTML email is
preserved. The output of the script in this case will be a multipart MIME
email which has the text part first and then the HTML as another MIME
attachment, given the name "original.html" so it's obvious when viewed in the
RT ticket.
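
For anyone who wants to see the shape of that single-part case without the
Perl script in hand, here is a rough Python sketch of the same idea. It is not
the script described above. The names wrap_html_message, html_to_text and
NOTICE are made up for illustration, the notice wording is a guess, and the
small HTMLParser class is only a crude stand-in for lynx -dump so that the
example runs with nothing but the standard library.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html.parser import HTMLParser

# Hypothetical wording; the real script's notice text is not shown in the post.
NOTICE = "[This text was generated automatically from an HTML message.]\n\n"

# Headers that describe the old body and must not be copied to the new message.
CONTENT_HEADERS = ("content-type", "content-transfer-encoding", "mime-version")


class _TextExtractor(HTMLParser):
    """Collects text nodes only; a crude stand-in for `lynx -dump`."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text nodes and squeeze runs of whitespace into single spaces.
    return " ".join(" ".join(parser.chunks).split())


def wrap_html_message(raw):
    """Turn a single-part text/html message into text + original.html."""
    msg = message_from_string(raw)
    if msg.get_content_type() != "text/html":
        return raw  # not the case handled here; pass through unchanged

    charset = msg.get_content_charset() or "utf-8"
    html = msg.get_payload(decode=True).decode(charset, errors="replace")

    out = MIMEMultipart("mixed")
    for name, value in msg.items():
        if name.lower() not in CONTENT_HEADERS:
            out[name] = value  # keep From, To, Subject and so on

    # Text version first, so it is what shows up as the message body.
    out.attach(MIMEText(NOTICE + html_to_text(html), "plain", "utf-8"))

    # Then the untouched HTML, named so it is obvious in the ticket.
    html_part = MIMEText(html, "html", "utf-8")
    html_part.add_header("Content-Disposition", "attachment",
                         filename="original.html")
    out.attach(html_part)
    return out.as_string()
```

To use it as a filter in a pipe, something like
sys.stdout.write(wrap_html_message(sys.stdin.read())) would do.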

If the incoming email is already multipart, any text parts and attachments are
passed on unchanged. HTML parts are treated as above, except that if the MIME
header already has a filename for the HTML part, it keeps that name instead of
being given the "original.html" name.
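
The multipart rule can be sketched the same way. Again, this is a Python
illustration and not the Perl script itself. rewrite_multipart and crude_text
are hypothetical names, the regex tag stripper is a placeholder for lynx -dump,
and the sketch only looks at top-level parts (nested multipart containers are
passed through as they are).

```python
import re
from email import message_from_string
from email.mime.text import MIMEText


def crude_text(html):
    """Hypothetical stand-in for `lynx -dump`: drop tags, squeeze whitespace."""
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


def rewrite_multipart(raw, to_text=crude_text):
    """Add a generated text part in front of each top-level HTML part."""
    msg = message_from_string(raw)
    if not msg.is_multipart():
        return raw  # single-part mail is a different case

    parts = []
    for part in msg.get_payload():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "utf-8"
            html = part.get_payload(decode=True).decode(charset,
                                                        errors="replace")
            # Generated text goes first, then the original HTML part.
            parts.append(MIMEText(to_text(html), "plain", "utf-8"))

            # Only name the part if the sender did not already name it.
            if part.get_filename() is None:
                if "Content-Disposition" in part:
                    part.set_param("filename", "original.html",
                                   header="Content-Disposition")
                else:
                    part.add_header("Content-Disposition", "attachment",
                                    filename="original.html")
        # Text parts, attachments and the HTML itself pass through unchanged.
        parts.append(part)

    msg.set_payload(parts)
    return msg.as_string()
```

The to_text argument is there so a real converter (for example a wrapper
around lynx) can be dropped in without touching the rest.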

I'm sure it can be improved, but it seems to work well enough for what I need.


