
Question

Plucene installed successfully with the BackEnd parsers and updated CPAN libraries. Indexing succeeds (topics and attachments), but search doesn't return results from attachments; it seems to search only within topics. Any suggestions on how to debug this? The Plucene index log reports success, and the Apache log doesn't mention anything regarding Plucene.

Environment

TWiki version: TWikiRelease04x00x05
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: VM Debian Stable Linux install
Web server: Apache
Perl version:  
Client OS: Linux
Web Browser:  
Categories: Plugins, Add-Ons

-- MiloValenzuela - 17 Nov 2006

Answer


As it says in the SearchEnginePluceneAddOn topic, it doesn't index attachments.

You might consider using the SearchEngineSwishEAddOn instead.

-- CrawfordCurrie - 16 Dec 2006

Actually, SearchEnginePluceneAddOn does index attachments. However, I am not sure how to debug your case.

-- PeterThoeny - 16 Dec 2006

Is there any way to identify what actually gets indexed? I assume that indexing means the attachments are converted to some sort of text format so that they can be searched afterwards. Is there any way to check this?

-- MiloValenzuela - 21 Dec 2006

Sorry, you are right, it indexes PDF, HTML and text attachments, but not office or M$ documents, which is why I never started using it.

There appears to be a system in Plucene for plugging in "back end parsers", which I assume are responsible for converting the attachments to a canonical (indexable) form. Whether that form is text or not, I don't know.

-- CrawfordCurrie - 22 Dec 2006

It also indexes M$ documents if you install the ExtraBackendParsers.zip parsers attached to the SearchEnginePluceneAddOnDev topic.

-- PeterThoeny - 24 Dec 2006

I did install the parsers you mention (with their respective dependencies). No luck. Do you know where the "canonical form" that CrawfordCurrie mentioned might be stored?

-- MiloValenzuela - 29 Dec 2006

The backend parsers transform a proprietary format (.doc, .pdf) into intermediate HTML for indexing. Sorry, I am not familiar enough with the add-on to help debug.

-- PeterThoeny - 29 Dec 2006
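
One way to narrow this down, given that the parsers produce intermediate HTML: run the converter for a problem attachment by hand, capture its HTML output, and check whether any searchable text survives. The sketch below is only an illustration in Python using the standard library. The names `TextExtractor`, `indexable_words` and `check_conversion` are made up for this example and are not part of Plucene or the add-on, and it assumes you can obtain the intermediate HTML as a string by some means of your own.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_words(html):
    """Return the lowercased words an indexer could see in the HTML."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks).lower().split()


def check_conversion(name, html):
    """Report whether a converted attachment produced any searchable text."""
    words = indexable_words(html)
    status = "ok" if words else "EMPTY (nothing to index)"
    return f"{name}: {len(words)} words, {status}"
```

If `check_conversion` reports an empty result for a given attachment, the conversion step is the likely culprit rather than the search itself. If it reports plenty of words, the problem is more likely in how the index is written or queried.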

 
Topic revision: r8 - 2006-12-29 - PeterThoeny
 
Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.