Currently public-inbox just rejects emails that contain HTML. public-inbox-mda says:

    May 09 20:40:46 *** We only accept plain-text mail, No HTML ***

This check is fairly hardcoded into public-inbox, so we might want to add a filter in front of public-inbox-mda that strips any text/html parts, as mailman does.
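To get a rough idea of which incoming messages would trip this check, one can look for a text/html Content-Type header anywhere in the message. This is our own approximation, not public-inbox's actual check: `has_html_part` is a hypothetical helper name, and a plain grep does not parse MIME, so folded headers or unusual layouts can fool it.

```shell
#!/bin/sh
# has_html_part (hypothetical helper): read a message on stdin and succeed
# if any line looks like a text/html Content-Type header.
# Rough approximation only: MIME structure is not parsed, and folded or
# unusually formatted headers may be missed.
has_html_part() {
    grep -qi '^content-type:[[:space:]]*text/html'
}
```

For example, `has_html_part < message && echo "would be rejected"`.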
mimedefang has been installed but not yet configured to do the actual stripping for the inbox user.
With mimedefang we could use the following simple mimedefang-filter:

    # -*- Perl -*-
    sub filter_end {
        my($entity) = @_;
        remove_redundant_html_parts($entity);
    }

    # DO NOT delete the next line, or Perl will complain.
    1;

But it isn't clear to me how, or whether, we can use the milter setup to filter only messages sent to the inbox user, or how to integrate it into the inbox .forward filter, /home/inbox/public-inbox-mda-true.sh. Running mimedefang.pl directly by hand seems to work, but then we need another wrapper to set up the COMMANDS and interpret the RESULTS as described in mimedefang-protocol. Maybe such a wrapper already exists?
Created attachment 14957
remove_redundant_html_parts.pl filter

Trying to use the milter interface might be tricky, but the functionality we actually need from mimedefang can easily be extracted. The attached remove_redundant_html_parts.pl script acts as a filter: it reads an email on standard input and writes either the original email unchanged or the email with its redundant HTML parts removed. This could be used as a filter in front of public-inbox-mda.
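Since public-inbox-mda should always receive some message, one option is a wrapper that falls back to the unmodified input whenever the filter exits with an error, so a bug in the filter cannot lose mail. This is a minimal sketch under our own assumptions: `safe_filter` is a hypothetical helper, not part of the attached script, and it relies on `mktemp`, which is widely available but not specified by POSIX.

```shell
#!/bin/sh
# safe_filter CMD [ARGS...] (hypothetical helper): run CMD on the message from
# stdin; print its output if it succeeds, otherwise print the original message.
safe_filter() {
    tmp_in=$(mktemp) || return 1
    tmp_out=$(mktemp) || { rm -f "$tmp_in"; return 1; }
    cat > "$tmp_in"                      # buffer the whole message
    if "$@" < "$tmp_in" > "$tmp_out"; then
        cat "$tmp_out"                   # filter succeeded: pass its output on
    else
        cat "$tmp_in"                    # filter failed: fall back to the original
    fi
    rm -f "$tmp_in" "$tmp_out"
}
```

Usage would look like `safe_filter /home/inbox/remove_redundant_html_parts.pl < msg | /usr/bin/public-inbox-mda`. Buffering through temporary files rather than command substitution keeps trailing newlines in the message intact.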
(In reply to Mark Wielaard from comment #3)
> Created attachment 14957
> remove_redundant_html_parts.pl filter
>
> This could be used as filter to public-inbox-mda

This has now been installed as filter-public-inbox-mda-true.sh, the inbox user's .forward script, which pipes mail through the filter into public-inbox-mda. It seems to work as intended.

We still have to (re)import the old emails containing HTML that public-inbox rejected. Those are in the pipermail archives (already stripped).
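The thread does not show the contents of filter-public-inbox-mda-true.sh. A plausible shape, assuming it simply pipes each message through the filter into public-inbox-mda, is sketched below. The `FILTER`/`MDA` variables and the `deliver` function are hypothetical, and always returning success (so the MTA never bounces the message) is only our reading of the "-true" suffix.

```shell
#!/bin/sh
# Hypothetical sketch of a .forward pipe wrapper; the real
# filter-public-inbox-mda-true.sh is not shown in this thread.
FILTER=${FILTER:-/home/inbox/remove_redundant_html_parts.pl}
MDA=${MDA:-/usr/bin/public-inbox-mda}

deliver() {
    # Strip redundant HTML parts, then hand the result to public-inbox-mda.
    status=0
    "$FILTER" | "$MDA" "$@" || status=$?   # status of the last command (MDA)
    if [ "$status" -ne 0 ]; then
        echo "deliver: $MDA exited with status $status" >&2
    fi
    # Assumption: always report success so the MTA does not bounce the mail.
    return 0
}
```

Whether swallowing the exit status is desirable depends on the setup: it avoids bounces, but it also hides delivery failures from the MTA, which is why the sketch at least logs them to stderr.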
(In reply to Mark Wielaard from comment #4)
> We do still have to (re)import old (rejected by public-inbox) emails
> containing HTML. Those are in the pipermail archives (already stripped).

This was done overnight using the .public-inbox/emergency mailbox, which stores all rejected messages:

    for i in .public-inbox/emergency/cur/*; do
        # public-inbox-mda uses ORIGINAL_RECIPIENT to pick the target inbox
        orig_to=$(grep -m1 '^X-Original-To:' "$i" | cut -f2 -d' ')
        export ORIGINAL_RECIPIENT="$orig_to"
        /home/inbox/remove_redundant_html_parts.pl < "$i" |
            /usr/bin/public-inbox-mda --no-precheck
    done

This was also a good test of the remove_redundant_html_parts.pl script. A quick inspection of inbox.sourceware.org shows that messages with (redundant) HTML parts are now archived as they were with pipermail.
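The `grep ^X-Original-To:` step matches anywhere in the file, including the message body, and can return more than one line. A slightly more defensive variant, our own and not what was actually run, reads only the header block and prints the first matching value. `original_to` is a hypothetical helper name.

```shell
#!/bin/sh
# original_to FILE (hypothetical helper): print the value of the first
# X-Original-To: header in FILE, looking only at the header block, i.e.
# the lines before the first empty line. Folded headers are not handled.
original_to() {
    awk '
        /^\r?$/ { exit }                          # empty line: end of headers
        tolower($0) ~ /^x-original-to:/ {
            sub(/^[^:]*:[ \t]*/, "")              # drop field name and blanks
            sub(/\r$/, "")                        # drop a trailing CR, if any
            print
            exit
        }
    ' "$1"
}
```

Inside the loop it would replace the grep/cut pipeline, e.g. `export ORIGINAL_RECIPIENT="$(original_to "$i")"`.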