inbox.sourceware.org experiment

Mark Wielaard mark@klomp.org
Wed Aug 17 12:25:17 GMT 2022


Hi Frank,

On Tue, Aug 16, 2022 at 06:10:58PM -0400, Frank Ch. Eigler wrote:
> > It turns out public-inbox does support importing a full mbox in one
> > go. But it doesn't have a nice binary for it yet. There is however
> > scripts/import_vger_from_mbox in upstream git which is easily adapted
> > (just remove the vger specific filtering).
> 
> This is already 99% done for the sourceware mailing lists.

Nice. Was this done using the mailman2inbox.sh script? I believe that
still generates v1 archives, which is why I regenerated the
elfutils-devel one.

> > [...]
> > Note this is V2 plus full indexing and includes an extra historical
> > elfutils-devel.nospam.mbox
> 
> Is there a need for "full" indexing as opposed to "basic"?  I don't
> see why we'd need another text search engine for this stuff, we
> already have.  The basic "v1" with basic indexing seems fine and
> effective for web and nntp.

Note that full indexing is separate from using v1 or v2 archives.

I don't think we should be using v1 archives. Those are deprecated
upstream, and they strongly recommend using v2 archives, which are much
more scalable. Reimporting the lists as v2 archives using the
import_from_mbox script should be much more efficient and can be done
in a couple of hours instead of days.

A full index does not just make full text search of the mailing list
really fast, it also indexes addresses, date ranges, subjects, headers,
body, attachments, etc. The results are also available as mbox. So
you would then be able to easily express "give me all emails/threads
in gcc-patches from the last 6 months that discuss dwarf2out.cc where
I was not the sender or one of the receivers" and then download the
whole mbox or browse all those messages/threads online. See
e.g. https://inbox.sourceware.org/elfutils-devel/_/text/help/ for the
xapian queries you can execute.
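
As a rough sketch of what such a query could look like, the snippet
below builds a search URL for the example above. It assumes the d:
(date range), b: (body), f: (From) and tc: (To/Cc) prefixes listed on
the help page, that terms are ANDed by default, and that a gcc-patches
inbox would live under the same host; build_search_url and its
parameters are made up for illustration and the 182-day window is only
an approximation of "6 months".

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Host taken from the help URL above; the per-list path is an assumption.
BASE_URL = "https://inbox.sourceware.org"


def build_search_url(inbox, term, me, today, days=182):
    """Build a search URL for messages mentioning `term` in the body,
    newer than `days` days, where `me` is neither sender nor recipient.

    Hypothetical helper: it only assembles the query string, it does
    not contact the server.
    """
    since = (today - timedelta(days=days)).strftime("%Y%m%d")
    query = " ".join([
        f"d:{since}..",    # open-ended date range, YYYYMMDD
        f'b:"{term}"',     # phrase match in the message body
        f"NOT f:{me}",     # not sent by me
        f"NOT tc:{me}",    # not addressed to me (To or Cc)
    ])
    return f"{BASE_URL}/{inbox}/?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_search_url("gcc-patches", "dwarf2out.cc",
                           "mark@klomp.org", date(2022, 8, 17)))
```

Pasting the resulting URL into a browser should show the matching
messages/threads, from where the mbox can be fetched.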

> > [...]
> > I don't have a solution for keeping the archive up to date. [...]
> 
> We can hack a postfix->|mailman and |inbox-mda alias-fork
> and dual pipe delivery for each mailing list.

That would be great. But I would need some time reading up on
postfix/mailman configs. Do you have an example of where/how this hack
would be done?

Thanks,

Mark


More information about the Overseers mailing list