inbox.sourceware.org experiment

Mark Wielaard mark@klomp.org
Wed Aug 24 21:06:06 GMT 2022


Hi,

On Wed, Aug 24, 2022 at 12:05:03PM +0200, Mark Wielaard via Overseers wrote:
> I noticed two issues some lists seem to have a bad/corrupt xapian
> database and generate an error while indexing (gcc-patches).

I tried reindexing and compacting the largest lists. This did not
help. But the compacting did reduce the disk size of the xapian
indexes by 10GB (!).

There is now a bit more logging in
/home/inbox/logs/public-inbox-mda.out.log

It looks like this error:

rollback ineffective with AutoCommit enabled at
/usr/share/perl5/vendor_perl/PublicInbox/V2Writable.pm line 621.
checkpoint: Exception: Error writing block 147232
shard close: Exception: Error writing block 147236

This only happens after importing a new gcc-patches message. The message
isn't fully indexed, but can be referenced normally. It won't show up
in full-text searches though. I haven't figured out why. I'll ask
upstream how to better debug this.

> emails with slashes / in the Message-ID sometimes get wrongly
> escaped and appear to not be in the archive while they really are.
> e.g. the message I am replying to shows as:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG%2F+@wildebeest.org/
> But should be:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG/+@wildebeest.org/

This isn't a big deal except when the / is at the end of the
Message-ID, which unfortunately happens for bugzilla emails, whose
Message-IDs end in @http.sourceware.org/bugzilla/. That last slash
seems to be a real problem. I don't know a workaround for that yet.

You can see that public-inbox does know about the Message-ID by
searching for:
https://inbox.sourceware.org/libabigail/bug-29464-9487@http.sourceware.org/bugzilla//
This will suggest the actual URL as a "partial match", but when
following that link the slashes get escaped again... I will ask
upstream if there is any solution for this.
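
To illustrate the escaping, here is a minimal Python sketch. It is not
public-inbox's code (which is Perl), and the escape_mid helper and its
set of "safe" characters are assumptions chosen only to mirror the
URLs shown above: "/" becomes %2F while "@" and "+" are left alone.

```python
from urllib.parse import quote, unquote


def escape_mid(mid: str) -> str:
    """Percent-encode a Message-ID for use as a single URL path segment.

    Hypothetical helper: "@" and "+" are kept literal to match the
    URLs quoted in this thread; "/" is always encoded.
    """
    return quote(mid, safe="@+")


# Message-ID with a slash in the middle.
mid = "YwVP8+LHvyLzUG/+@wildebeest.org"
escaped = escape_mid(mid)
print(escaped)

# Decoding the escaped form gives back the original Message-ID.
print(unquote(escaped) == mid)

# Bugzilla-style Message-ID that ends in a slash.
bz = "bug-29464-9487@http.sourceware.org/bugzilla/"
print(escape_mid(bz))

# Left unescaped in a URL path, the trailing slash of the Message-ID
# is indistinguishable from the path separator: splitting on "/"
# yields an empty last segment.
print(("libabigail/" + bz).split("/"))
```

The last line shows why the trailing slash is the hard case: without
escaping, nothing in the path says whether the final / belongs to the
Message-ID or merely ends the URL.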

Finally, there are some lists that accept HTML emails (by stripping off
the HTML part). public-inbox, however, simply rejects those emails.

*** We only accept plain-text mail, No HTML ***

Again, we should ask upstream if there could be an option to accept
just the text/plain part of such emails.

Note that such emails do end up in the .public-inbox/emergency mailbox
so in theory we could remove the text/html part and then reinsert the
message.
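
A rough sketch of what removing the text/html part could look like,
using only the Python standard library email package. The function
name keep_plain_text is made up for illustration, and the sketch
assumes the message has a text/plain body. It drops everything else
(including any attachments), so it is only a starting point. How the
result gets reinserted into the archive is left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def keep_plain_text(raw: bytes) -> bytes:
    """Return the message with its body reduced to the text/plain part.

    Hypothetical helper: headers such as Message-ID and Subject are
    kept, all other MIME parts (text/html, attachments) are discarded.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Find the text/plain body, e.g. inside multipart/alternative.
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        raise ValueError("message has no text/plain part")
    text = body.get_content()

    # Drop the old payload and Content-* headers, then set the
    # plain text as the only content.
    msg.clear_content()
    msg.set_content(text)
    return msg.as_bytes()


# Build a small multipart/alternative test message.
demo = EmailMessage()
demo["Subject"] = "test"
demo["Message-ID"] = "<x@example.org>"
demo.set_content("hello\n")
demo.add_alternative("<p>hello</p>", subtype="html")

cleaned = message_from_bytes(keep_plain_text(demo.as_bytes()),
                             policy=policy.default)
print(cleaned.get_content_type())
```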

So there are some issues, but in general I think it works just fine
now.

Cheers,

Mark

