HTML to TWiki Converter
This is a simple HTML to TWiki converter, useful for converting HTML pages into TWiki format before they are published in a TWiki web. It can also find and replace Wiki words.
This application is still in a very basic form. Maybe we could turn it into a very powerful TWiki tool (e.g., I have an idea of incorporating editor functionality).
It contains a user-friendly GUI.
To run the application, edit the bin/startConverter.bat file (assuming a Windows platform) to point to your JDK home, then run the file.
You can either open an HTML file and convert it, or paste HTML text into the HTML pane and then click convert from the run menu.
email: azeez@beyondmPLEASENOSPAM.net
--
AfkhamAzeez - 30 Jul 2002
Does this replace HTML tags with TWiki markup (where there are known equivalents)? What else does it do?
--
RandyKramer - 30 Jul 2002
Windows installation note: if you already have a Java 1.3 or newer runtime installed (e.g. as per the TouchGraph wiki browser), you don't need to install the SDK. Just create a new batch file which contains the following (the -m options do not seem to be mandatory):
java -ms16m -mx96m com.hSenid.converter.MainApplication
As for the app itself, this looks like a pretty good start to me. My initial observations from a 15-minute trial:
- Headings and lists are converted to TWiki syntax very well. Likewise for bold and italic (but not strong and emphasis).
- Links need some more work: TWiki already automatically converts http://something.ca/etc/ so keeping the surrounding a href tags is superfluous. I imagine this is on the todo list.
- Tables are ignored but the contents are converted. This is a good thing for pages which use tables for layout but not so good for tables which actually contain tabular data.
- I would prefer extra whitespace to be removed for better readability and future refactoring.
- This application works well for HTML documents which are well structured, not so well otherwise. (well structured = headings are actually headings, not just used to make text bigger, for example).
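To make the points about strong/emphasis and links concrete, here is a minimal Python sketch of how such rules might look. It is not taken from the converter's source: the names INLINE_RULES, convert_link and convert_inline are made up for illustration, and it assumes double-quoted href attributes and non-nested tags.

```python
import re

# Hypothetical rule table (an assumption, not the converter's actual rules):
# <b>/<strong> become *bold*, <i>/<em> become _italic_.
INLINE_RULES = [
    (re.compile(r"<(b|strong)\b[^>]*>(.*?)</\1>", re.I | re.S), r"*\2*"),
    (re.compile(r"<(i|em)\b[^>]*>(.*?)</\1>", re.I | re.S), r"_\2_"),
]

# Assumes the href value is wrapped in double quotes.
LINK = re.compile(r'<a\b[^>]*\bhref="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)


def convert_link(match):
    """Drop the <a> wrapper when the label is just the URL itself."""
    url = match.group(1)
    label = match.group(2).strip()
    if not label or label == url:
        return url  # TWiki links bare URLs automatically
    return "[[%s][%s]]" % (url, label)


def convert_inline(html):
    """Apply the bold/italic rules, then rewrite links."""
    for pattern, replacement in INLINE_RULES:
        html = pattern.sub(replacement, html)
    return LINK.sub(convert_link, html)
```

A regex approach like this only copes with simple, well-formed markup; badly nested tags would need a real HTML parser.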
Would you please post a list of the TWiki syntax rules obeyed and a list of those ignored at the present time? There is no point in testing rules which haven't been implemented yet. : )
--
MattWilkie - 31 Jul 2002
And if you want to run it on Unix, you need to unpack the exe file under Windows, go into the lib/ subdirectory, and type:
java -classpath HT* com.hSenid.converter.MainApplication
There is a Unix-related bug: all references to paths in the source use \\ instead of /, so the graphics and so on cannot be loaded. I think that changing them to / will work without a hitch.
Here's a request: can you wrap the complete package in a jar file so that we just have to type java -jar html2twiki.jar?
Thanks for your work so far.
--
WoutMertens - 01 Aug 2002
When I saw this, I thought this is just what we've been looking for, but I think it's just the opposite: we need something to convert MS Excel sheets to TML, and the tool above seems to skip tables?
(Maybe this is really a separate TWikiAddOnProduct from the HtmlToTWikiConverter, but this still seems like a good place to put it.)
If you take an Excel sheet and choose File|Save as HTML, the result is not pretty. It renders just fine but the code is a mess of intricate MS Office-CSS rules. But it does contain the basic table layout, so if we ignore everything but the table tags themselves, we'd be on track.
I guess if we just make a Perl script that
- discards all code outside <table></table> and then
- converts all <td> and </td> tags to " | ", and
- all <tr> and </tr> to single newlines, and finally
- removes all other tags
then we'd have what we needed; at least that would be rough pseudo-code for it. The output would be TML containing only table cells and values, nothing else. Of course something like that would run into problems when there are line breaks inside the <td>s, but for Excel we can be pretty sure that's not the case.
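The steps above could be sketched roughly as follows. This is in Python rather than Perl, and only as an illustration: the function name html_table_to_tml is made up, only the first table in the input is handled, and cells are extracted one by one instead of literally substituting " | " for the tags, so that every row gets exactly one leading and one trailing bar.

```python
import re


def html_table_to_tml(html):
    """Turn the first <table> in `html` into TWiki table rows (rough sketch)."""
    # Step 1: discard everything outside <table>...</table>.
    match = re.search(r"<table\b.*?</table>", html, re.I | re.S)
    if match is None:
        return ""
    table = match.group(0)

    rows = []
    # Step 2: one output line per <tr>...</tr>.
    for row in re.findall(r"<tr\b[^>]*>(.*?)</tr>", table, re.I | re.S):
        cells = re.findall(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", row, re.I | re.S)
        cleaned = []
        for cell in cells:
            text = re.sub(r"<[^>]+>", "", cell)  # Step 3: remove all other tags
            text = " ".join(text.split())        # fold line breaks inside a cell
            cleaned.append(text)
        if cleaned:
            rows.append("| " + " | ".join(cleaned) + " |")
    return "\n".join(rows)
```

Folding whitespace inside each cell also sidesteps the line-break problem mentioned above. HTML entities such as &nbsp; are left untouched in this sketch.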
--
TorbenGB - 07 Aug 2002
If your data doesn't have any commas in it, you can export from Excel in CSV format, then translate , to |, add a | at the beginning and end of each line, and then paste into TWiki. Easier than trying to junk the HTML, IMHO.
If you do have commas in the spreadsheet, then it gets trickier.
--
JohnRouillard - 07 Aug 2002
John is correct, as long as there are no commas in the data. Unfortunately that's rather US-centric, because many countries use the comma as a decimal separator and the period as a thousands separator, the reverse of the US convention. That makes CSV problematic unless you go through maneuvers to change the default delimiter used in the CSV file.
--
TorbenGB - 08 Aug 2002
In CSV format, if there is a comma in the data, then quotes are put around the data (e.g., 12,2 becomes "12,2"). If there are quotes in the data, each quote is doubled and the field is quoted as well (e.g., "abc" becomes """abc"""). Seems easy enough to deal with...
--
ThomasWeigert - 08 Aug 2002
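As a rough illustration of the CSV route, here is a minimal Python sketch that relies on the standard csv module to undo the quoting described above. The function name csv_to_tml and its delimiter parameter are made up for this example; the parameter is there for exports that use ; instead of , as the separator. Replacing a literal | with &#124; is an assumption about how one would keep a cell from breaking the table.

```python
import csv
import io


def csv_to_tml(text, delimiter=","):
    """Convert CSV text into TWiki table rows (illustrative sketch)."""
    # csv.reader handles quoted fields and doubled quotes for us.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    lines = []
    for row in reader:
        if not row:
            continue  # skip blank lines in the export
        # Assumption: a literal | inside a cell is written as &#124;
        cells = [cell.strip().replace("|", "&#124;") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

For example, csv_to_tml('a;12,2', delimiter=';') keeps the decimal comma intact, because only the semicolon is treated as a field separator.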
You are right, that seems simple. So maybe this can be added to the TWikiConverter discussed above? Then we could also use it for spreadsheets and other table-based files.
--
TorbenGB - 09 Aug 2002
HTML-TWiki Converter, enhanced version with JTidy integration
This is based on version 1.0, developed by AfkhamAzeez, and offers some modifications and enhancements:
- packaged as jar file for easy handling: just type 'java -jar H2TConverter.jar'
- integration of HTML Tidy (JTidy)
- enables clean-up of HTML code prior to conversion to TWiki syntax (does improve conversion results in many cases)
- makes a fine standalone tool for HTML cleanup, even if you don't want to convert to TWiki
- for JTidy source licence and notes please refer to the READMEs provided in the .zip archive...
- slightly modified and rearranged menu items and toolbar buttons
- added keyboard shortcuts
- added (preliminary) support for
| and tags...
My focus was clearly on JTidy integration, as I've often seen much better conversion results with previously cleaned-up text. I'm planning some more additions in the future:
- Add a preference dialog for JTidy options...
- Better conversion of tables...
Another option would be to wrap up the converter code as a JEdit plugin, as JEdit provides a much nicer and more complete environment for editing... (well, of course)
--
SaschaLosko - 27 Apr 2003
FYI, the PowerEditPlugin java applet contains a built-in HTML to TWiki converter.
--
CrawfordCurrie - 29 Apr 2003
Thanks for the info. I do know about the PowerEditPlugin, I'm actually excited about it. Two issues:
- Text encoding of tags is incorrect. Supposedly you have fixed it, but there has been no new release yet.
- I'd like to have JTidy integration.
--
SaschaLosko - 30 Apr 2003
Ah, Sascha, I'm sorry. Too many irons in the fire. I'll try and make the release this weekend, but improvements to the ActionTrackerPlugin take precedence.
--
CrawfordCurrie - 01 May 2003
Simple HTML to TWiki converter Perl script by FrankHarrell and Jeff Horner
We wrote this to have a simple command-line converter. It does most of the work, but you need to manually add multiples of three spaces for nested bullet lists, etc.
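For readers unfamiliar with the three-space rule: TWiki bullets are written as three spaces, an asterisk and a space, and each nesting level adds three more spaces. A small Python sketch of that indentation (not part of the Perl script; render_bullets is a made-up helper that takes nested Python lists):

```python
def render_bullets(items, depth=1):
    """Render nested lists of strings as TWiki bullet lines.

    Each nesting level is indented by a further three spaces.
    """
    lines = []
    for item in items:
        if isinstance(item, list):
            # A nested list becomes a deeper bullet level.
            lines.extend(render_bullets(item, depth + 1))
        else:
            lines.append("   " * depth + "* " + item)
    return lines
```

Calling "\n".join(render_bullets(["fruit", ["apple", "pear"], "veg"])) gives a two-level list with "fruit" and "veg" at three spaces and the nested items at six.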
--
FrankHarrell - 31 Jan 2004
HTML to TWiki XSLT
This is an XSLT file that translates basic HTML code to TWiki markup. At the time of writing, not everything is supported. Everything that is not supported is returned as is.
--
MichielHendriks - 21 Jul 2006
Nice. Could you possibly package this as an HtmlToTWikiXsltAddOn package? See more at AddOnPackageHowTo.
--
PeterThoeny - 21 Jul 2006