It would be nice to automatically convert documents in FileAttachments from their native formats to HTML. This could be done when the file is uploaded, or lazily, the first time the HTML equivalent is accessed. The HTML format has the added advantage that search could include file attachments as well.
Previously I was already considering automatic conversion of Word, PowerPoint and Excel files to HTML format so that attachments could be retrieved either in the original format or in a somewhat degraded HTML format. At that time I could not find any free software that was suitable or had acceptable conversion quality. Ideally the conversion utility would be written in Perl, for portability reasons. Simple C without too many system or library dependencies would be acceptable too.
--
PeterThoeny - 09 Apr 2000
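The lazy variant described above (convert on first access, not at upload time) can be sketched as a small cache check. This is only an illustration under stated assumptions: the function name `html_equivalent`, the `.html` suffix convention and the pluggable `convert` callable are all hypothetical and not part of TWiki. Python is used here for brevity; the same logic would carry over to Perl.

```python
from pathlib import Path
from typing import Callable


def html_equivalent(attachment: Path, convert: Callable[[Path, Path], None]) -> Path:
    """Return the cached HTML version of an attachment, converting only when needed.

    `convert(source, target)` is a hypothetical callable that writes the HTML
    rendering of `source` to `target`; in practice it might shell out to an
    external conversion tool.
    """
    # Assumed naming convention: report.doc -> report.doc.html, next to the original.
    html_path = attachment.with_name(attachment.name + ".html")

    # Convert on first access, or again if the attachment is newer than the cache
    # (for example after a new version was uploaded).
    if (not html_path.exists()
            or html_path.stat().st_mtime < attachment.stat().st_mtime):
        convert(attachment, html_path)
    return html_path
```

Converting at upload time would amount to calling the same function once from the upload handler; the lazy form simply defers the cost until someone actually asks for the HTML.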
Check out wvHtml at http://www.wvware.com - it's standard with most Linux distros now, and does quite a good job - you can even convert inline graphics to gif/jpg if you must...
--
CrisBailiff - 13 Feb 2001
Thanks for the pointer. By coincidence I got the same pointer from Matt Sergeant, director and CTO of http://axkit.org/. Another library to investigate is the file conversion part of Sun's OpenOffice suite (formerly StarOffice).
For now we should first tackle AttachmentsUnderRevisionControl.
--
PeterThoeny - 18 Feb 2001
Cool. Hmm. Looks like we'd look for a wvTwiki ;^)
--
MartinCleaver - 26 Mar 2001
wvTWiki: Yes please! 
I've spent the last four hours "porting" old system docs, many of which are already in HTML, to TWikiShorthand. I have barely scratched the surface and am beginning to wonder if it's really worth it.
Why convert docs which are already in HTML to wiki, you ask? Because raw HTML is hard to edit in the little edit form, especially if it's polluted with many many gratuitous font tags. This means that even though the old docs can be viewed painlessly in TWiki, they are not likely to get updated, thereby losing the main reason for bringing them in in the first place.
--
MattWilkie - 25 Oct 2001
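As a rough illustration of the cleanup described above, the following sketch drops font tags and maps a handful of HTML elements to TWiki shorthand. It is not an existing tool: the class and function names are made up for this example, only a small subset of tags is handled, and real legacy HTML would need many more cases (tables, links, nested lists, whitespace inside bold or italic spans).

```python
import re
from html.parser import HTMLParser


class HtmlToTWiki(HTMLParser):
    """Hypothetical converter: a small subset of HTML to TWiki shorthand."""

    INLINE = {"b": "*", "strong": "*", "i": "_", "em": "_"}
    HEADINGS = {"h1": "---+ ", "h2": "---++ ", "h3": "---+++ "}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n   * ")
        # <font> and every other unknown tag is dropped; its text is kept.

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n")

    def handle_data(self, data):
        # Collapse whitespace runs, as a browser would when rendering.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_twiki(html: str) -> str:
    parser = HtmlToTWiki()
    parser.feed(html)
    parser.close()
    lines = [line.rstrip() for line in "".join(parser.out).split("\n")]
    # Never leave more than one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


sample = ('<h2><font face="Arial" size="4">Setup</font></h2>'
          '<p><font color="#000000">Run the <b>installer</b> first.</font></p>')
print(html_to_twiki(sample))
```

For the sample input this prints a `---++ Setup` heading followed by a blank line and `Run the *installer* first.`, with the font tags gone.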
Quick list of conversion programs:
- xlHtml - for Excel documents
- wvware - a library which allows access to Microsoft Word files. It can load and parse Word 2000, 97, 95 and 6 file formats
- catdoc & xls2csv - convert Word to plain text and Excel to comma-separated ASCII (csv)
- xpdf - view/convert PDF to text
As part of changes I made to the TocPlugin, and some scripts to support it, I am using htmldoc to automatically convert sets of TWiki pages into large PDF files. Importing Word documents is time consuming, although I've had fairly good success using "copy" and "paste" to move masses of text into the TWiki edit boxes. Tables are slow (inserting all the vertical bar characters), but the results are good.
An automatic way to import either the HTML or the RTF into TWiki form would be good, but I'm not waiting for it. There is so much redundant and obscuring markup added by Word that I think one would almost need to take a compiler approach to the problem: build a parse tree, optimize for redundancy, then optimize for "code quality".
--
CarlMikkelsen - 21 May 2003
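The manual step of inserting vertical bar characters into pasted tables could be scripted. The sketch below assumes that a table copied from Word or Excel arrives as tab-separated plain text, which is common but not guaranteed; the function name is hypothetical, and cells that themselves contain a vertical bar are not handled.

```python
def tsv_to_twiki_table(text: str) -> str:
    """Turn tab-separated lines into TWiki table rows (| cell | cell |)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        cells = [cell.strip() for cell in line.split("\t")]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(rows)


pasted = "Name\tQty\nBolt\t12\n"
print(tsv_to_twiki_table(pasted))
```

The output can then be pasted into the TWiki edit box in place of the raw tab-separated text.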
Isn't there a way to just refer to external documents? I have a project that generates documentation in HTML using Python's help() function. I don't want to duplicate the logic involved in help() to locate all of the relevant entities in order to extract their docstrings to produce TWiki output. If I could create a TWiki topic that automatically included the body of an external HTML document, that would work well.
Alternatively, I'd have to convert the HTML to TWiki format which is, apparently, possible, but how do I get that output into my TWiki? I'd be happy if there was a means to check out, replace, then check in a TWiki topic via some API. That would allow me to use a script to produce the HTML, convert it to TWiki format, then update the appropriate TWiki topic with the new HTML.
What have I missed? Surely there's functionality for doing one of those two things, right?
--
RobStewart - 13 Jun 2003
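For the first option, pulling out "the body of an external HTML document" is the easy part and can be sketched in a few lines. The helper below is hypothetical and uses a regular expression rather than a real HTML parser, so it is only suitable for well-behaved generated pages; how the result gets into a topic is a separate question that this sketch does not answer.

```python
import re

# Matches everything between <body ...> and </body>, case-insensitively.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html: str) -> str:
    """Return the contents of the <body> element, or the whole input if none is found."""
    match = _BODY_RE.search(html)
    return (match.group(1) if match else html).strip()


page = '<html><head><title>t</title></head><BODY class="x">\n<h1>mod</h1>\n</BODY></html>'
print(extract_body(page))
```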
Hi Rob, welcome to twiki.org :- )
IncludeTopicsAndWebPages might work for you. Also there is an interesting-sounding plugin, SlashFilenamePlugin, due out any day now.
--
MattWilkie - 13 Jun 2003
If a clickable link is enough, InterWiki is what I used. If not, you may define custom TWikiVariables that open an IFRAME and close it.
--
PeterMasiar - 14 Jun 2003
If the attachments are MSWord documents then you can convert them to wiki-text using wv and the stylesheet I attached to MsOfficeIntegration.
--
TobyCabot - 28 Jul 2003