inbox.sourceware.org experiment
Frank Ch. Eigler
fche@elastic.org
Wed Aug 17 13:24:56 GMT 2022
Hi -
> [...]
> > Is there a need for "full" indexing as opposed to "basic"? I don't
> > see why we'd need another text search engine for this stuff, we
> > already have one. The basic "v1" with basic indexing seems fine and
> > effective for web and nntp.
> [...]
> I don't think we should be using v1 archives, those are deprecated
> upstream and they strongly recommend using v2 archives which are much
> more scalable.
Given that v1 is the default of public-inbox-init, they can't be that bad.
> Reimporting the lists as v2 archives using the import_from_mbox
> script should be much more efficient and can be done in a couple of
> hours instead of days.
That speed is nice, but I suspect that's not a v1/v2 representation
efficiency issue but something else.
> A full index does not just make full text search of the mailinglist
> really fast, it also indexes addresses, date ranges, subjects, headers,
> body, attachments, etc. And the results are also available as mbox. So
> you would then be able to easily express "give me all emails/threads
> in gcc-patches from the last 6 months that discuss dwarf2out.cc where
> I was not the sender or one of the receivers" and then download the
> whole mbox or browse all those messages/threads online. [...]
Yes, understood that the extra indexing can do extra searches. My
question was about utility/need for this. For elfutils-devel, note
that the full xapian indexes are about 10x the size of the
git-compressed email archive, whereas in the case of the systemtap
import, it's only about 0.2x, so there is a serious cost/benefit
question.
(In both v1 and v2 cases, the git representation of the mailboxes is
about 60% of the size of the raw mbox files. That's pretty puny
compression TBH, I expected much better.)
> That would be great. But I would need some time reading up on
> postfix/mailman configs. Do you have an example of where/how this hack
> would be done?
postfix delivers mailing list traffic via /etc/mailman/aliases,
e.g.:
autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"
I would use a script to generate a new config file from that, so that the
primary mailing list incoming aliases are forked:
autobook-cvs: autobook-cvs-mailman, autobook-cvs-inbox
autobook-cvs-mailman: "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-inbox: "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"
and then switch postfix to this alias file instead.
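A minimal sketch of such a generator script, in Python. The function name
fork_aliases and the regular expression are illustrative assumptions, not
an existing tool. The public-inbox-mda command line keeps the SOMETHING
placeholders from above, since the actual environment and arguments are
not settled here.

```python
import re

# Matches a Mailman "post" alias line such as:
#   autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
# Other alias kinds (bounces, confirm, join, ...) do not match.
POST_ALIAS = re.compile(
    r'^(?P<alias>[^\s:#]+):\s*"\|(?P<mailman>\S+)\s+post\s+(?P<list>[^\s"]+)"\s*$'
)

# Placeholder copied verbatim from the message; the real environment
# variables and arguments for public-inbox-mda are still to be filled in.
INBOX_COMMAND = "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"


def fork_aliases(lines):
    """Return alias lines where each 'post' alias is forked to mailman and inbox."""
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = POST_ALIAS.match(line)
        if m is None:
            # Not a primary posting alias: pass it through unchanged.
            out.append(line)
            continue
        alias = m.group("alias")
        mailman = m.group("mailman")
        listname = m.group("list")
        out.append(f"{alias}: {alias}-mailman, {alias}-inbox")
        out.append(f'{alias}-mailman: "|{mailman} post {listname}"')
        out.append(f'{alias}-inbox: "{INBOX_COMMAND}"')
    return out


# Demo on the aliases quoted above.
sample = [
    'autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"',
    'autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"',
    'autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"',
    'autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"',
]
print("\n".join(fork_aliases(sample)))
```

Run on the four autobook-cvs lines, this should reproduce the forked
alias set shown above. In practice it would read /etc/mailman/aliases
and write the result to a separate file, leaving the original untouched
so that switching back is trivial.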
- FChE