I am looking for a search facility for attached files (e.g. Word documents, PPT, XLS, PDF, ...), possibly through a plug-in/add-on.

PS. I have just started with TWiki, so if the feature is already present, just let me know.

-- JoWyns - 25 Jul 2002

No, it's not present at this time. Since the search relies on grep, searching this kind of content is not trivial. ;-)

  1. Have a look at Search::searchWeb as an extension point. I think you could implement a plugin concept like the one used for the normal plugins. Have a look at http://jakarta.apache.org/lucene/docs/index.html for implementing such a framework (note: it's Java ;-))
  2. There are several other topics in the Codev web that describe using other indexing servers (e.g. IIS Indexing Services or Verity).
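
As a rough illustration of the plugin concept in point 1, here is a minimal Python sketch (TWiki itself is Perl, so this is not TWiki code). It assumes a registry that maps a file extension to a text extractor, so that search code can ask for plain text from any attachment type that has a handler. All names (`register_extractor`, `extract_text`, `search_attachments`) are hypothetical and are not part of Search::searchWeb or any TWiki API.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical registry: file extension -> function turning raw bytes into text.
Extractor = Callable[[bytes], str]
EXTRACTORS: Dict[str, Extractor] = {}


def register_extractor(extension: str, extractor: Extractor) -> None:
    """Register a text extractor for one file extension (without the dot)."""
    EXTRACTORS[extension.lower()] = extractor


def extract_text(filename: str, data: bytes) -> str:
    """Return searchable text, or an empty string if no extractor is registered."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    extractor = EXTRACTORS.get(ext)
    return extractor(data) if extractor else ""


def search_attachments(
    attachments: Iterable[Tuple[str, bytes]], pattern: str
) -> List[str]:
    """Return the names of attachments whose extracted text matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        name
        for name, data in attachments
        if regex.search(extract_text(name, data))
    ]


# Plain text needs no conversion; binary formats (doc, pdf, ...) would each
# need a real converter registered here.
register_extractor("txt", lambda data: data.decode("utf-8", errors="replace"))
```

With only the `txt` extractor registered, an attachment in an unhandled format is simply skipped rather than grepped as raw bytes, which is the main point of routing everything through a registry.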

-- MarkusKling - 25 Jul 2002

I'll have a look at it. I was already thinking of using Oracle InterMediaText (yep, we are heavy Oracle users) and trying to store the attachments in the Oracle database. Then I would need to provide a link (in the header or footer) to let the user search in the Oracle database.

Any suggestions?

Grep-like search is nice, but not a must (for our users).

-- JoWyns - 25 Jul 2002

See SearchAttachments and SearchEngineVsGrepSearch for similar discussions. I know of two open source search engines that can handle MS Office documents: Namazu and Perlfect. There are probably more.

-- MattWilkie - 31 Jul 2002

Some of our users need a search facility for attached files too.

So I changed the search script. The new search script is attached. In my tests it also works well with Word .doc files.

I think this would be better included in the TWiki environment, for example as an additional "Search Attachment" checkbox in the web search dialog.

-- NorbertWindrich - 29 Jan 2003

I too would like to see this as a plugin or part of the main code tree.

-- VickiBrown - 08 Dec 2004

Take a look at SearchEnginePluceneAddOn. This add-on is a 100% Perl implementation (it uses Plucene, a Perl port of the Java library Lucene).

-- JoanMVigo - 21 Dec 2004

One possibility, and a possible replacement for the whole of search (and hence for the grep-backtick-security-bug scenario), is to use ht://dig - http://www.htdig.org/

A few accommodations might be needed. It may require a plugin to update the index every time a topic is edited or an attachment is added. But ht://dig ...

  • Can act as a robot, just following the 'view' links
  • Can handle arbitrarily long boolean expressions
  • Can handle soundex or similar searches
  • Can use external parsers - http://www.htdig.org/contrib/

But, you say, a wiki is a dynamic site. OK, so look at http://www.devshed.com/c/a/PHP/Search-This/
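
To make the "update the index every time a topic is edited" idea concrete, here is a small Python sketch of an in-memory inverted index that is refreshed per document and answers AND queries. It is not how ht://dig works internally; the class name `TinyIndex` and its methods are hypothetical, and a real setup would persist the index and support richer boolean expressions.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class TinyIndex:
    """Hypothetical in-memory inverted index, updated one document at a time."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # term -> doc ids
        self.doc_terms: Dict[str, Set[str]] = {}  # doc id -> terms currently indexed

    @staticmethod
    def tokenize(text: str) -> Set[str]:
        """Lowercase the text and split it into alphanumeric terms."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def update(self, doc_id: str, text: str) -> None:
        """Call on every save: drop the old postings, then index the new text."""
        for term in self.doc_terms.get(doc_id, set()):
            self.postings[term].discard(doc_id)
        terms = self.tokenize(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search_all(self, query: str) -> Set[str]:
        """Return the ids of documents containing every term of the query (AND)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))
```

Because `update` first removes the document's previous terms, re-saving a topic never leaves stale hits behind, which is the property a dynamic site needs from its index.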

-- AntonAylward - 21 Dec 2004

Perhaps you can take a look at SearchEngineKinoSearchAddOn. It's an indexed search that also indexes attachments such as PDF, DOC and XLS documents. It is similar to SearchEnginePluceneAddOn but is based on KinoSearch and is thus much faster and more scalable.

-- MarkusHesse - 16 Sep 2007

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| Search.pm | r1 | 27.6 K | 2003-01-29 - 08:13 | UnknownUser | search attachments |
Topic revision: r9 - 2007-09-16 - MarkusHesse
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.