Feature Proposal: Delegate More Processing To Search Algorithm
Motivation
During my development of the Kino Search Algorithm in
SearchEngineKinoSearchAddOn, it became obvious that the TWiki core needs to delegate more choices to the Search Algorithm.
This work may be interwoven with some of the ResultSet and ExtractAndCentralizeFormattingRefactor work.
Description and Documentation
In TWiki 4.2.2, when a SEARCH happens, we call a very naive pluggable function once per web:
SearchAlgorithm::search( $searchString, $topics, $options, $sDir, $sandbox, $web )
where $options contains only scope, type, casesensitive, and wordboundaries, and $topics is a (painfully) created list of topics.
This function then returns a hash of topic name to 'extract', which the Search rendering then throws away, keeping only the topicname list.
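To make the wasted work concrete, here is a minimal sketch of that flow, written in Python rather than TWiki's Perl purely for brevity. The names `naive_search` and `render_topic_list` and the in-memory `webs` dictionary are assumptions made up for this illustration; they are not the real TWiki code.

```python
import re

def naive_search(search_string, topics, options, web_text):
    """Hypothetical stand-in for the per-web call: returns {topic: extract}."""
    flags = 0 if options.get("casesensitive") else re.IGNORECASE
    pattern = re.compile(re.escape(search_string), flags)
    hits = {}
    for topic in topics:
        text = web_text.get(topic, "")
        match = pattern.search(text)
        if match:
            # Build a short contextual extract around the first match.
            start = max(match.start() - 20, 0)
            hits[topic] = text[start:match.end() + 20]
    return hits

def render_topic_list(webs, search_string, options):
    """Calls the algorithm once per web and keeps only the topic names."""
    names = []
    for web, web_text in webs.items():
        topics = sorted(web_text)  # the topic list is built up front
        hits = naive_search(search_string, topics, options, web_text)
        # Iterating the dict yields keys only: the extracts are discarded.
        names.extend(f"{web}.{topic}" for topic in hits)
    return names
```

Everything the algorithm computed beyond the topic names is lost at the last step, which is the behaviour this proposal wants to change.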
SearchEngineKinoSearchAddOn (like the Xapian engine I'm working on) can return, very quickly, all the meta information for the topic, including a contextual extract. It can also return non-topics (attachments and other external data), which I would love to use.
Impact
Implementation
So: I propose to refactor the TWiki::Store::SearchAlgorithms and TWiki::Store::QueryAlgorithms APIs (which I understand only Crawford and I have worked with; please pipe up if I've missed you) to:
- bring them into one API, where multiple SearchAlgorithms can register themselves as capable of processing a search type (or list of types)
- create the UI elements to dynamically add support for enabled 'types' in the WebSearch topic (so we can have checkboxes for attachment, external doc, and Google search)
- pass the SearchAlgorithms all the known settings that might allow it to optimise a query (including the format string)
- use any information that SearchAlgorithms return in the output rendering, thus leveraging advanced improvements
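The points above could look roughly like the following. This is only a sketch under stated assumptions, in Python rather than TWiki's Perl: `SearchResult`, `SearchRegistry`, `toy_attachment_search`, and the `ATTACHMENTS` data are all hypothetical names invented here, and the `$topic` and `$summary` tokens stand in for a SEARCH format string.

```python
from dataclasses import dataclass, field

@dataclass
class SearchResult:
    name: str
    kind: str = "topic"   # could also be "attachment", "external", ...
    extract: str = ""
    meta: dict = field(default_factory=dict)

class SearchRegistry:
    """One API where algorithms register the search types they can handle."""

    def __init__(self):
        self._by_type = {}

    def register(self, algorithm, types):
        for search_type in types:
            self._by_type.setdefault(search_type, []).append(algorithm)

    def available_types(self):
        # A WebSearch-style UI could build its checkboxes from this list.
        return sorted(self._by_type)

    def search(self, search_type, query, settings):
        # Every known setting, including the format string, is passed along.
        results = []
        for algorithm in self._by_type.get(search_type, []):
            results.extend(algorithm(query, settings))
        return results

# Hypothetical sample data standing in for an attachment index.
ATTACHMENTS = {
    "report.pdf": "quarterly search statistics",
    "logo.png": "company logo",
}

def toy_attachment_search(query, settings):
    return [
        SearchResult(name, "attachment", text, {"format": settings.get("format")})
        for name, text in ATTACHMENTS.items()
        if query.lower() in text.lower()
    ]

def render(results, fmt):
    # The renderer uses what the algorithm returned instead of discarding it.
    return [fmt.replace("$topic", r.name).replace("$summary", r.extract)
            for r in results]
```

A caller would register `toy_attachment_search` for the `attachment` type, run `search("attachment", ...)`, and hand the results to `render`. The design choice being illustrated is that the registry, not the core, decides which algorithm serves which type.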
For backwards compatibility, the existing search types and scopes will be required to return results identical to those of previous versions of TWiki. This implies that scope=all will not in fact search all data types, but rather only topicname and topic text.
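One way to picture the compatibility rule is sketched below, in Python rather than TWiki's Perl. The `LEGACY_SCOPES` table, the `fields_for` helper, and the idea of passing extra types separately are assumptions for illustration only, not a settled design.

```python
# Existing scopes keep their old meaning; "all" stays limited to name and text.
LEGACY_SCOPES = {
    "topic": ("topicname",),
    "text": ("text",),
    "all": ("topicname", "text"),
}

def fields_for(scope, extra_types=()):
    """Return what a search covers: legacy fields plus explicitly enabled types."""
    if scope not in LEGACY_SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    # New data types (attachments, external docs) are opt-in only.
    return LEGACY_SCOPES[scope] + tuple(extra_types)
```

With this shape, `fields_for("all")` still covers only the topic name and topic text, and attachments are searched only when a caller asks for them, for example `fields_for("all", ["attachment"])`.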
--
Contributors: SvenDowideit - 19 Aug 2008
Discussion
Great Initiative, Sven!!!
From my studies of TWiki performance, I realized that search and store are the worst bottlenecks. I was planning to try out Xapian (it seems to be very fast).
TWiki-5 will fly
--
GilmarSantosJr - 19 Aug 2008
Sounds excellent, Sven. The devil is in the detail; it sounds like you will be doing a lot of refactoring in Search.pm (to get rid of those topic lists, for a start).
Ideally I'd like the API fixes to climb higher up the tree so that I can perform multiple-web searches with one call; though that may be a refactoring too far.
--
CrawfordCurrie - 19 Aug 2008
It would be so cool to make it a modern interface using iterators over result sets. I can imagine that most of the current Search.pm simply goes over the fence.
--
MichaelDaum - 19 Aug 2008
Please remember to set a date in the Date of Commitment field so the proposal app can work. Added today's date.
--
KennethLavrsen - 11 Sep 2008
I am setting this to parked and no committed developer. Please feel free to flip that and own & implement.
--
PeterThoeny - 2010-08-01