I am looking for a way to search inside attached files
(e.g. Word documents, PPT, XLS, PDF, ...),
possibly through a plug-in/add-on.
PS. I have just started with TWiki, so if the feature is already present,
just let me know.
--
JoWyns - 25 Jul 2002
No, it is not present at this time. Since the search relies on grep, searching this content is not trivial.
- Have a look at Search::searchWeb as an extension point. I think you could implement a plugin concept like the one used for the normal plugins. Have a look at http://jakarta.apache.org/lucene/docs/index.html
for implementing such a framework (note: it's Java ;-)).
- Several other topics in the Codev web describe using other indexing servers (e.g. IIS Indexing Services or Verity).
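
To illustrate the plugin concept suggested above, here is a minimal sketch of a per-file-type extractor registry. It is written in Python purely for illustration (TWiki itself is Perl), and every name in it (EXTRACTORS, register, extract_text, attachment_matches) is hypothetical, not part of TWiki or of Search::searchWeb. Real extractors for Word, PPT, XLS or PDF would have to call external converters; only plain text is handled here.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: file extension -> function returning searchable text.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {}


def register(*extensions: str):
    """Decorator that registers an extractor for one or more extensions."""
    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return decorator


@register(".txt", ".csv")
def extract_plain(data: bytes) -> str:
    # Plain-text formats only need decoding; undecodable bytes are replaced.
    return data.decode("utf-8", errors="replace")


def extract_text(filename: str, data: bytes) -> str:
    """Return the text of an attachment, or "" if no extractor is registered."""
    extractor = EXTRACTORS.get(Path(filename).suffix.lower())
    return extractor(data) if extractor else ""


def attachment_matches(filename: str, data: bytes, term: str) -> bool:
    """Case-insensitive substring search over the extracted text."""
    return term.lower() in extract_text(filename, data).lower()
```

With this shape, supporting a new file type means registering one more function; the search code itself does not change. File types without an extractor are simply skipped.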
--
MarkusKling - 25 Jul 2002
I'll have a look at it.
I was already thinking of using Oracle InterMediaText (yep, we are heavy Oracle users) and trying to store the attachments in the Oracle database. Then I would need to provide a link (in the header or footer) to let the user search in the Oracle database.
Any suggestions?
Grep-like search is nice to have, but not a must (for our users).
--
JoWyns - 25 Jul 2002
See SearchAttachments and SearchEngineVsGrepSearch for similar discussions. I know of two open source search engines that can handle MS Office documents: Namazu and Perlfect. There are probably more.
--
MattWilkie - 31 Jul 2002
Some of our users need a search facility for attached files too,
so I changed the search script. The new search script is attached. In my tests
it also works well with Word documents.
I think it would be better to include this in the TWiki environment, for example as an additional "Search Attachment" checkbox in the web search dialog.
--
NorbertWindrich - 29 Jan 2003
I too would like to see this as a plugin or as part of the main code tree.
--
VickiBrown - 08 Dec 2004
Take a look at SearchEnginePluceneAddOn. This add-on is a 100% Perl implementation (it uses Plucene, which is a Perl port of the Java library Lucene).
--
JoanMVigo - 21 Dec 2004
One possibility, which could replace the whole of search and hence remove the grep-backtick-security-bug scenario, is to use ht://dig:
http://www.htdig.org/
A few accommodations might be needed. It may require a plugin to update the index every time a topic is edited or an attachment is added. But ht://dig ...
- Can act as a robot, just following the 'view' links
- Can handle arbitrarily long boolean expressions
- Can handle soundex or similar searches
- Can use external parsers - http://www.htdig.org/contrib/
But, you say, a wiki is a dynamic site. OK, so look at
http://www.devshed.com/c/a/PHP/Search-This/
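
The idea of refreshing the index whenever a topic is edited or an attachment is added can be sketched as a small in-memory inverted index. This is only an illustration in Python of the general technique, not how ht://dig works internally; the class AttachmentIndex and its methods are hypothetical, and only AND queries are shown rather than arbitrary boolean expressions.

```python
import re
from collections import defaultdict


class AttachmentIndex:
    """Toy inverted index that can be updated one document at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> names of documents containing it
        self.doc_words = {}               # document name -> words it contributed

    def remove(self, name):
        # Drop every posting that the old version of the document created.
        for word in self.doc_words.pop(name, set()):
            self.postings[word].discard(name)

    def update(self, name, text):
        # Call this from a save/attach hook: re-index just the changed document.
        self.remove(name)
        words = set(re.findall(r"\w+", text.lower()))
        self.doc_words[name] = words
        for word in words:
            self.postings[word].add(name)

    def search_all(self, *terms):
        # AND query: documents that contain every term.
        sets = [self.postings.get(term.lower(), set()) for term in terms]
        return set.intersection(*sets) if sets else set()
```

A save hook would call update() with the new text of the topic or attachment, so a search never has to rescan the whole web the way a grep-based search does.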
--
AntonAylward - 21 Dec 2004
Perhaps you can take a look at SearchEngineKinoSearchAddOn. It is an indexed search that also indexes attachments such as PDF, DOC and XLS documents. It is similar to SearchEnginePluceneAddOn but is based on KinoSearch and is thus much faster and more scalable.
--
MarkusHesse - 16 Sep 2007