SearchPDFPlugin
SearchPDFPlugin allows the contents of attached PDF files to be included in searches. Other plugins (SearchEngineSwishEAddOn and SearchEnginePluceneAddOn) already cover this functionality, but they depended on CPAN modules or programs that did not run on a Windows server.
How does it work?
This plugin uses an external program to extract text from PDF files and stores the results in a META tag within the topic. The process has three main components:
SearchPDFPlugin: Handles the events raised when attachments are added to or removed from topics, and checks whether the attachment is a PDF. When a PDF attachment is removed, the plugin removes any META data associated with it. When a PDF attachment is added, the plugin writes an entry into the SearchPDF.txt file in the work area.
SearchPDF.txt: Tracks when new attachments have been added and need to be indexed.
- If this file contains the word ALL on a single line, then all topics are checked for PDF attachments.
indexPDF.pl: Processes the SearchPDF.txt file in the work area by calling the text extraction program to generate META data, then saves that data in the appropriate topic.
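The hand-off between the plugin and the indexing script can be sketched as follows. This is a minimal illustration in Python, not the plugin's actual Perl code. The class `WorkFile`, the helper `is_pdf`, the file-extension check, and the tab-separated entry format are all assumptions made for the example; the source only states that entries are written for added PDFs and that a line containing only ALL triggers a check of all topics.

```python
from dataclasses import dataclass, field


def is_pdf(attachment_name: str) -> bool:
    """Assumed check: treat an attachment as a PDF if its name ends in .pdf."""
    return attachment_name.lower().endswith(".pdf")


@dataclass
class WorkFile:
    """In-memory stand-in for SearchPDF.txt (the real line format is not documented here)."""
    lines: list = field(default_factory=list)

    def queue(self, web: str, topic: str, attachment: str) -> None:
        # Hypothetical entry format: "Web.Topic<TAB>attachment".
        # Non-PDF attachments are ignored, mirroring the plugin's PDF check.
        if is_pdf(attachment):
            self.lines.append(f"{web}.{topic}\t{attachment}")

    def wants_full_reindex(self) -> bool:
        # The word ALL alone on a line asks the indexer to check every topic.
        return any(line.strip() == "ALL" for line in self.lines)

    def pending(self) -> list:
        # Return queued (topic, attachment) pairs, skipping blank lines and ALL.
        entries = []
        for line in self.lines:
            line = line.strip()
            if not line or line == "ALL":
                continue
            topic, _, attachment = line.partition("\t")
            entries.append((topic, attachment))
        return entries
```

In this sketch the indexer would first call `wants_full_reindex()` and, if it returns false, process only the pairs returned by `pending()`.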
Plugin Installation Instructions
- Download the ZIP file from the Plugin web (see below)
- Unzip SearchPDFPlugin.zip in your root ($TWIKI_ROOT) directory. Content:
| File: | Description: |
| data/TWiki/SearchPDFPlugin.txt | This page. |
| lib/TWiki/Plugins/SearchPDFPlugin.pm | The plugin code. |
| pub/_work_areas/SearchPDFPlugin/SearchPDF.txt | Work file that stores recently attached PDFs that need to be indexed (contains 'ALL' so that the first time the script runs, all topics with PDFs are indexed). |
| tools/indexPDF.pl | Script that reads the SearchPDF.txt file and adds META data to topics. |
- Create a new user TWikiSearchPDF that is a member of the TWikiAdminGroup (or edit the preferences below to select an account of your choice).
- Download and install the XPDF program for extracting text from PDF files (http://www.foolabs.com/xpdf/download.html)
- Edit the TWikiPreferences for your site and add the following:
* Search PDF plugin needs a user account in order to modify topics
* Set SEARCHPDFUSER = TWikiSearchPDF
* Set SEARCHPDFUSERWEB = Main
- Add a line to LocalSite.cfg that specifies the location and name of the XPDF program:
- $TWiki::cfg{Plugins}{SearchPDFPlugin}{XPDFLocation} = 'c:/Wiki/xpdf-3.02-win32/pdftotext.exe';
- Visit configure in your TWiki installation, and enable the plugin in the {Plugins} section.
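Before running the indexing script, it can help to confirm that the configured extraction program is reachable. The following Python sketch illustrates one way to do that; it is not part of the plugin. The function names `build_command` and `extract_text` are hypothetical, and the use of `-enc UTF-8` is an assumption about the desired output encoding. Passing `-` as the output file asks pdftotext to write the extracted text to standard output.

```python
import subprocess
from pathlib import Path


def build_command(xpdf_location: str, pdf_path: str) -> list:
    # "-" as the text-file argument sends the extracted text to stdout.
    # "-enc UTF-8" is an assumed choice of output encoding for this sketch.
    return [xpdf_location, "-enc", "UTF-8", pdf_path, "-"]


def extract_text(xpdf_location: str, pdf_path: str) -> str:
    # Fail early with a clear message if the configured path is wrong.
    if not Path(xpdf_location).is_file():
        raise FileNotFoundError(f"pdftotext not found at {xpdf_location}")
    result = subprocess.run(
        build_command(xpdf_location, pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text` with the same path that is set in `{XPDFLocation}` and any small PDF gives a quick check that the installation step succeeded.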
Plugin Info
- Set SHORTDESCRIPTION = Search attached PDF documents.
Related Topics: TWikiPlugins, DeveloperDocumentationCategory, AdminDocumentationCategory, TWikiPreferences