inbox.sourceware.org experiment

Frank Ch. Eigler fche@elastic.org
Wed Aug 17 13:24:56 GMT 2022


Hi -

> [...]
> > Is there a need for "full" indexing as opposed to "basic"?  I don't
> > see why we'd need another text search engine for this stuff; we
> > already have one.  The basic "v1" with basic indexing seems fine and
> > effective for web and nntp.
> [...]
> I don't think we should be using v1 archives, those are deprecated
> upstream and they strongly recommend using v2 archives which are much
> more scalable.

Given that v1 is the default of public-inbox-init, they can't be that bad.

> Reimporting the lists as v2 archives using the import_from_mbox
> script should be much more efficient and can be done in a couple of
> hours instead of days.

That speed is nice, but I suspect that's not a v1/v2 representation
efficiency issue but something else.


> A full index does not just make full text search of the mailing list
> really fast, it also indexes addresses, date ranges, subjects, headers,
> body, attachments, etc. And the results are also available as mbox. So
> you would then be able to easily express "give me all emails/threads
> in gcc-patches from the last 6 months that discuss dwarf2out.cc where
> I was not the sender or one of the receivers" and then download the
> whole mbox or browse all those messages/threads online.  [...]

Yes, understood that the extra indexing can do extra searches.  My
question was about utility/need for this.  For elfutils-devel, note
that the full xapian indexes are about 10x the size of the
git-compressed email archive, whereas in the case of the systemtap
import, it's only about 0.2x, so there is a serious cost/benefit
question.

(In both v1 and v2 cases, the git representation of the mailboxes is
about 60% of the size of the raw mbox files.  That's pretty puny
compression TBH, I expected much better.)


> That would be great. But I would need some time reading up on
> postfix/mailman configs. Do you have an example of where/how this hack
> would be done?

postfix delivers mailing list traffic via /etc/mailman/aliases,
e.g.:

autobook-cvs:             "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"

I would use a script to generate a new config file from that, so that the
primary mailing list incoming aliases are forked:

autobook-cvs:             autobook-cvs-mailman, autobook-cvs-inbox
autobook-cvs-mailman:     "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-inbox:       "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"

and then switch postfix to this alias file instead.
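
A rough sketch of what such a generator script could look like, in
Python.  The function names are made up for illustration, and the
public-inbox-mda command is kept as the same SOMETHING placeholder as
above, since the real environment and arguments still need to be
worked out.  It only forks aliases whose target is "mailman post
<same list name>", and passes every other line through unchanged.

```python
import re

# Placeholder copied from the example above; the actual environment and
# arguments for public-inbox-mda are NOT known here and must be filled in.
MDA_COMMAND = '"|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"'

# "name:   target" alias lines; comment lines starting with '#' do not match.
ALIAS_RE = re.compile(r'^(?P<name>[^\s:#]+):\s*(?P<target>.+?)\s*$')
# Targets of the form "|.../mailman post LISTNAME" (the primary posting alias).
POST_RE = re.compile(r'mailman\s+post\s+(?P<list>\S+?)"?$')


def fmt(name, target):
    """Format one alias line, padding the name to the column used above."""
    return f"{name + ':':<26}{target}"


def fork_aliases(lines):
    """Return new alias lines with each primary list alias forked in two."""
    out = []
    for line in lines:
        m = ALIAS_RE.match(line)
        post = POST_RE.search(m.group("target")) if m else None
        if not (m and post and post.group("list") == m.group("name")):
            # Not a primary posting alias: keep the line as it is.
            out.append(line.rstrip("\n"))
            continue
        name = m.group("name")
        out.append(fmt(name, f"{name}-mailman, {name}-inbox"))
        out.append(fmt(f"{name}-mailman", m.group("target")))
        out.append(fmt(f"{name}-inbox", MDA_COMMAND))
    return out
```

Feeding it the lines of /etc/mailman/aliases and writing the result to
a separate file would leave the mailman-generated file untouched, so
the script can simply be rerun whenever lists are added or removed.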

- FChE

