Bug 30436 - inbox: strip HTML attachments
Summary: inbox: strip HTML attachments
Status: RESOLVED FIXED
Alias: None
Product: sourceware
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: overseers mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-09 21:55 UTC by Mark Wielaard
Modified: 2023-07-10 09:08 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
remove_redundant_html_parts.pl filter (1.32 KB, text/plain)
2023-07-09 14:10 UTC, Mark Wielaard

Description Mark Wielaard 2023-05-09 21:55:40 UTC
Currently public-inbox just drops emails that have HTML.

public-inbox-mda says:
May 09 20:40:46 *** We only accept plain-text mail, No HTML ***

This is fairly hardcoded into public-inbox.

So we might want to add a filter in front of public-inbox-mda that filters out any text/html attachments like mailman does.
Comment 1 Mark Wielaard 2023-06-23 14:22:57 UTC
mimedefang has been installed but not yet configured to do the actual stripping for the inbox user.
Comment 2 Mark Wielaard 2023-07-01 22:04:54 UTC
With mimedefang we could use the following simple mimedefang-filter:

# -*- Perl -*-
sub filter_end {
    my($entity) = @_;
    remove_redundant_html_parts($entity);
}
# DO NOT delete the next line, or Perl will complain.
1;

But it isn't clear to me whether (or how) we can use the milter setup to filter only messages sent to the inbox user, or how to integrate it into the inbox .forward filter /home/inbox/public-inbox-mda-true.sh.

Running mimedefang.pl directly by hand seems to work, but then we need another wrapper to set up the COMMANDS and interpret the RESULTS as described in mimedefang-protocol. Maybe such a wrapper already exists?
Comment 3 Mark Wielaard 2023-07-09 14:10:02 UTC
Created attachment 14957 [details]
remove_redundant_html_parts.pl filter

Trying to use the milter interface might be tricky. But the actual functionality required from mimedefang can be easily extracted. The attached remove_redundant_html_parts.pl script acts as a filter that takes as input an email and either outputs that original email or the email with redundant html parts removed.

This could be used as a filter in front of public-inbox-mda.
Comment 4 Mark Wielaard 2023-07-09 19:06:33 UTC
(In reply to Mark Wielaard from comment #3)
> Created attachment 14957 [details]
> remove_redundant_html_parts.pl filter
> 
> This could be used as filter to public-inbox-mda

This has been installed now as filter-public-inbox-mda-true.sh which is the .forward script for the inbox calling public-inbox-mda. It seems to work as intended.
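
For illustration only, a minimal sketch of what such a .forward wrapper could look like, assuming it simply chains the filter and public-inbox-mda. This is not the installed script: the function name `deliver`, the FILTER and MDA environment overrides, and the always-succeed return (a guess based on the `-true` suffix in the script name) are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, not the contents of filter-public-inbox-mda-true.sh.
# FILTER and MDA default to the paths mentioned in this bug; making them
# overridable is an assumption that allows dry runs with stand-in commands.
FILTER="${FILTER:-/home/inbox/remove_redundant_html_parts.pl}"
MDA="${MDA:-/usr/bin/public-inbox-mda}"

deliver() {
    # Read one message on stdin, strip redundant HTML parts,
    # and hand the result to the delivery agent.
    "$FILTER" | "$MDA"
    # Assumption: always report success to the MTA, whatever the MDA returns.
    return 0
}
```

A script built on this sketch would end with a bare `deliver` call so that the message arriving on stdin from the .forward pipe is processed.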

We do still have to (re)import old (rejected by public-inbox) emails containing HTML. Those are in the pipermail archives (already stripped).
Comment 5 Mark Wielaard 2023-07-10 09:08:35 UTC
(In reply to Mark Wielaard from comment #4)
> We do still have to (re)import old (rejected by public-inbox) emails
> containing HTML. Those are in the pipermail archives (already stripped).

This was done overnight using the .public-inbox/emergency mailbox (which stores all rejected messages):

for i in .public-inbox/emergency/cur/*; do orig_to=$(grep ^X-Original-To: "$i" | cut -f2 -d\ ); export ORIGINAL_RECIPIENT="$orig_to"; cat "$i" | /home/inbox/remove_redundant_html_parts.pl | /usr/bin/public-inbox-mda --no-precheck; done

which was also a good test of the remove_redundant_html_parts.pl script.
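
For readability, the re-import loop above can be written out as a small function. This is a sketch under stated assumptions rather than the exact command that was run: the function name `reimport_maildir`, the FILTER and MDA environment overrides, and the choice to use only the first X-Original-To header are illustrative additions.

```shell
#!/bin/sh
# Sketch of the re-import loop. FILTER and MDA default to the paths used in
# the comment above; overriding them is an assumption that allows dry runs.
FILTER="${FILTER:-/home/inbox/remove_redundant_html_parts.pl}"
MDA="${MDA:-/usr/bin/public-inbox-mda}"

reimport_maildir() {
    # $1: Maildir "cur" directory holding the rejected messages.
    for msg in "$1"/*; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        [ -f "$msg" ] || continue
        # Use the first X-Original-To header as the original recipient.
        orig_to=$(grep '^X-Original-To:' "$msg" | head -n 1 | cut -f2 -d' ')
        # Strip redundant HTML parts, then deliver with the recipient set
        # in the environment of the delivery agent.
        "$FILTER" < "$msg" | ORIGINAL_RECIPIENT="$orig_to" "$MDA" --no-precheck
    done
}
```

It would be invoked as `reimport_maildir .public-inbox/emergency/cur` from the inbox user's home directory.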

A quick inspection of inbox.sourceware.org shows that messages with (redundant) HTML parts are now archived as they were with pipermail.