
Re: Any advice for a searchable web archiver ?

Marc Chantreux
November 20, 2017 19:43
> I'm glad to see that you're interested in JMAP :)  We're also betting
> very heavily on it at FastMail as I'm sure you're aware!

yes! i saw the JMAP proxy you have on github and subscribed on

> We're using Xapian as part of Cyrus IMAP, and it's quite useful for
> what we're doing,

do you think this would be enough to store mailing list archives?

> There are some pitfalls to look out for, for example if you naively
> index everything a search for "references" is going to return quite a
> lot of messages.
> Another problem with naive indexing is that Maildir allows message file
> names to move as flags are added/removed, and you'll want your indexer
> to avoid reindexing them every time.  I expect you might already have a
> datastructure that handles that though.
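
The first pitfall quoted above (a search for "references" matching nearly every reply, because the word appears as a header name) can be avoided by indexing only chosen header values and the body, each under its own field prefix. What follows is a minimal sketch in plain Python, not Xapian's or Cyrus's actual indexing code; the `INDEXED_HEADERS` selection, the `field:word` term scheme and the `index_message` helper are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from email import message_from_string

# Assumption: only these header values are worth searching.
INDEXED_HEADERS = ("subject", "from", "to")

def index_message(index, doc_id, raw):
    """Add one raw message to a dict mapping 'field:word' -> set of doc ids."""
    msg = message_from_string(raw)
    for header in INDEXED_HEADERS:
        # Index the header value only, never the header name itself.
        for word in re.findall(r"\w+", msg.get(header, "").lower()):
            index[f"{header}:{word}"].add(doc_id)
    body = msg.get_payload()
    if isinstance(body, str):  # multipart messages are skipped in this sketch
        for word in re.findall(r"\w+", body.lower()):
            index[f"body:{word}"].add(doc_id)

index = defaultdict(set)
raw = "Subject: hello\nReferences: <a@b>\n\nsee you\n"
index_message(index, 1, raw)
```

With this scheme the `References:` header contributes no terms, so a body search for "references" only hits messages that actually use the word.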

i was planning to have an "append only" maildir strategy: nothing would be
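
The renaming problem mentioned in the quoted text can be sidestepped by keying the index on the part of the Maildir file name that does not change when flags are added or removed. This is a minimal sketch under that assumption: the helper names `maildir_key` and `files_to_index` and the sample file names are hypothetical, and an append-only archive may never need it.

```python
import os

def maildir_key(path):
    """Return the part of a Maildir file name that is stable across flag changes."""
    name = os.path.basename(path)
    # Flags live after the first ':' (for example ':2,RS');
    # everything before it is the unique name.
    return name.split(":", 1)[0]

def files_to_index(paths, indexed_keys):
    """Return the paths whose stable key is new, recording each key as seen."""
    todo = []
    for path in paths:
        key = maildir_key(path)
        if key not in indexed_keys:
            indexed_keys.add(key)
            todo.append(path)
    return todo

seen = set()
first_pass = files_to_index(["new/1511207000.M1P2.host"], seen)
# The same message after being moved to cur/ and flagged as seen.
second_pass = files_to_index(["cur/1511207000.M1P2.host:2,S"], seen)
```

Because both paths reduce to the same key, the second pass finds nothing to reindex.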

> In terms of search usefulness, most of our customers love the stemming
> support, but it does have some exciting issues around languages and
> diacritics and inability to match on anything other than word prefixes -
> so you can't match partial strings inside a word.  That may or may not
> be an issue for your usecase.
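
The prefix-only limitation quoted above can be illustrated with a toy term list. This is a sketch of why prefix lookups are cheap on a sorted set of terms while substring lookups are not available; it is not how Xapian is implemented, and the `prefix_matches` helper and sample terms are made up for the example.

```python
from bisect import bisect_left

def prefix_matches(sorted_terms, prefix):
    """Return every term starting with prefix, using the sort order to stop early."""
    start = bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        out.append(term)
    return out

terms = sorted(["archive", "archiver", "search", "searchable"])
by_prefix = prefix_matches(terms, "arch")
by_infix = prefix_matches(terms, "chiv")
```

Here `by_prefix` contains both "archive" and "archiver", while `by_infix` is empty even though "chiv" occurs inside both words.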

thanks for sharing. this needs to be considered carefully.

> I don't know Dezi or Lucy, so I don't have a strong opinion there.

also good to know.

thanks a lot for your reply.

