inbox.sourceware.org experiment

Mark Wielaard mark@klomp.org
Tue Aug 16 21:36:17 GMT 2022


Hi,

On Sat, Aug 13, 2022 at 04:14:03PM +0200, Mark Wielaard via Overseers wrote:
> Looking at the mailman2inbox.sh script I have a few suggestions (I can
> make them to the script myself, but don't know if you are currently
> editing/running it):
> 
> - public-inbox-init should probably use -V2 (see above). You can then
>   also use -j JOBS to speed up the import.
> 
> - --indexlevel should be full to make the Xapian searching more useful
>   (this is the default, so you can also not set it). Note that this
>   also affects the incremental updating done by public-inbox-mda.
> 
> - You want to kill public-inbox-httpd using -SIGHUP so it just reloads
>   the new config files. You also want to kill the other daemons,
>   public-inbox-imapd and public-inbox-nntpd
> 
> - The --ng name should be based on the primary domain name (see
>   above). I don't know how to determine that easily though. Maybe
>   mailman knows, then we can also set the initial ADDRESS properly.
> 
> The formail -s public-inbox-mda seems to work well for batch
> importing, but is it efficient enough for keeping the importing up to
> date? It looks like the last .mbox file is just really big and new
> messages are appended at the end, so we would be trying to import all
> messages all the time. And how do we make sure it is triggered when new
> messages come in?

It turns out public-inbox does support importing a full mbox in one
go, but it doesn't have a nice binary for it yet. There is, however,
scripts/import_vger_from_mbox in the upstream public-inbox git
repository, which is easily adapted (just remove the vger specific
filtering).

I put this in the inbox homedir as import_from_mbox. To test it, I
removed the already imported elfutils-devel inbox and reimported it
with the import_from_mbox script:

$ public-inbox-init -V2 --ng inbox.sourceware.elfutils-devel -L full elfutils-devel /home/inbox/lists/elfutils-devel https://inbox.sourceware.org/elfutils-devel elfutils@sourceware.org elfutils-devel@lists.fedorahosted.org

$ ./import_from_mbox elfutils-devel elfutils-devel@lists.fedorahosted.org lists/elfutils-devel < /sourceware/projects/elfutils-home/elfutils-devel.nospam.mbox

$ for i in /var/lib/mailman/archives/private/elfutils-devel.mbox/*mbox; do ./import_from_mbox elfutils-devel elfutils-devel@sourceware.org lists/elfutils-devel < $i; done

Note this is V2 plus full indexing and includes an extra historical
elfutils-devel.nospam.mbox.

Surprisingly this only took ~30 seconds in total.
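
For the other lists, the loop above could be wrapped in a small
helper. This is only a dry-run sketch: print_import_cmds is a
hypothetical name, and it just prints the import_from_mbox commands
it would run so they can be reviewed first.

```shell
# Dry-run helper: print the import command for every mbox file of one
# list. import_from_mbox is the locally adapted script described above;
# nothing is executed here, the commands are only printed for review.
print_import_cmds() {
    list=$1      # inbox name, e.g. elfutils-devel
    addr=$2      # list address passed to import_from_mbox
    mboxdir=$3   # directory holding the mailman *mbox files
    for mbox in "$mboxdir"/*mbox; do
        [ -e "$mbox" ] || continue   # glob matched nothing
        printf './import_from_mbox %s %s lists/%s < %s\n' \
            "$list" "$addr" "$list" "$mbox"
    done
}

print_import_cmds elfutils-devel elfutils-devel@sourceware.org \
    /var/lib/mailman/archives/private/elfutils-devel.mbox
```

Quoting "$mbox" keeps the loop safe for file names with spaces.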

Unfortunately the elfutils-devel.nospam.mbox doesn't contain enough
headers to do proper threading. But the full index does make it
possible to match on similar subjects.
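
A rough way to see how much threading information an mbox carries is
to compare the number of messages with the number of In-Reply-To and
References header lines. The sketch below is an approximation (grep
also counts matching lines inside message bodies), the function name
is made up, and the sample mbox is invented just for the demo.

```shell
# Count "From " separator lines and threading header lines in an mbox.
# Approximate: body lines that happen to match are counted too.
count_threading_headers() {
    msgs=$(grep -c '^From ' "$1" || true)
    refs=$(grep -Eci '^(In-Reply-To|References):' "$1" || true)
    echo "$msgs messages, $refs threading header lines"
}

# Tiny invented sample: two messages, only the second is a reply.
sample=$(mktemp)
printf '%s\n' \
    'From a@example.org Tue Aug 16 21:36:17 2022' \
    'Message-ID: <1@example.org>' \
    'Subject: hello' \
    '' \
    'body' \
    '' \
    'From b@example.org Tue Aug 16 21:40:00 2022' \
    'Message-ID: <2@example.org>' \
    'In-Reply-To: <1@example.org>' \
    'Subject: Re: hello' \
    '' \
    'reply' > "$sample"

count_threading_headers "$sample"
```

An mbox where the second number is close to zero will mostly show up
as unthreaded messages.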

I don't have a solution for keeping the archive up to date. Parsing
mboxes is really discouraged upstream for two reasons: every update
needs to reparse all messages, and there is no locking mechanism for
mboxes, so if mailman writes to the mbox while public-inbox reads
from it, odd things can happen.

One way to make it work with public-inbox-watch is to subscribe the
inbox user to each list and create a Maildir of messages. But then the
message headers will have been rewritten by mailman. So it would be
better to somehow get the inbox user the messages before mailman sees
them, or somehow get the inbox user a copy of the message as mailman
would add it to the mbox archive, instead of what it sends to list
subscribers.
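
For the Maildir variant, the public-inbox-watch side would roughly
look like the sketch below. It uses the publicinbox.<name>.watch and
publicinbox.<name>.watchheader settings from public-inbox-config(5).
The Maildir path and the List-Id value are assumptions, and the
settings are written to a scratch file rather than the real config.

```shell
# Scratch config so nothing real is modified; the real file would
# normally be ~/.public-inbox/config.
cfg=$(mktemp)

# Hypothetical Maildir that mail for the inbox user is delivered to.
git config --file "$cfg" publicinbox.elfutils-devel.watch \
    maildir:/home/inbox/Maildir

# Assumed List-Id value, used to pick this list's messages out of a
# Maildir shared between several lists.
git config --file "$cfg" publicinbox.elfutils-devel.watchheader \
    'List-Id:<elfutils-devel.sourceware.org>'

# Show what was written.
git config --file "$cfg" --list
```

Matching on List-Id only works once mailman has added that header, so
it fits the "subscribe the inbox user" variant, not the "before
mailman sees them" one.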

Cheers,

Mark

