HTML to TWiki Converter

This is a simple HTML to TWiki converter, useful for converting HTML pages into TWiki format before they are published in a TWiki web. It can also find and replace Wiki words.

This application is still in a very basic form. Maybe we could turn it into a very powerful TWiki tool (e.g. I have an idea of incorporating editor functionality).

It contains a user friendly GUI.

To run the application, simply edit the bin/startConverter.bat file (assuming a Windows platform) to point to your JDK home, then run this file.

You can either open an HTML file and convert it, or simply paste HTML text into the HTML pane and then click convert in the run menu.

email: azeez@beyondmPLEASENOSPAM.net

-- AfkhamAzeez - 30 Jul 2002

Does this replace HTML tags with TWiki markup (where there are known equivalents)? What else does it do?

-- RandyKramer - 30 Jul 2002

Windows installation note: if you already have a Java 1.3 or newer runtime installed (e.g. as per the TouchGraph wiki browser), you don't need to install the SDK. Just create a new batch file containing the following (the -ms and -mx options do not seem to be mandatory):

java -ms16m -mx96m com.hSenid.converter.MainApplication

As for the app itself, this looks like a pretty good start to me. My initial observations from a 15 minute trial:

  • Headings and lists are converted to TWiki syntax very well. Likewise for bold and italic (but not strong and emphasis).
  • Links need some more work: TWiki already automatically links http://something.ca/etc/, so keeping the surrounding a href tags is superfluous. I imagine this is on the todo list.
  • Tables are ignored but the contents are converted. This is a good thing for pages which use tables for layout, but not so good for tables which actually contain tabular data.
  • I would prefer extra whitespace to be removed, for better readability and future refactoring.
  • This application works well for HTML documents which are well structured, not so well otherwise (well structured = headings are actually headings, not just used to make text bigger, for example).
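
The link and strong/emphasis points above could be handled by a small post-processing pass. What follows is only a sketch in Python, not the converter's actual code: the function name tidy_inline_markup is hypothetical, it assumes double-quoted absolute http(s) links, and it uses regular expressions rather than a real HTML parser.

```python
import re

def tidy_inline_markup(text):
    """Hypothetical helper: clean up links and strong/em tags left in converted text."""

    def link(match):
        url, label = match.group(1), match.group(2).strip()
        # TWiki links bare URLs automatically, so a link whose label is
        # its own URL (or empty) can be reduced to the URL alone.
        if not label or label == url:
            return url
        return "[[%s][%s]]" % (url, label)

    # Assumption: href values are double-quoted absolute http(s) URLs.
    text = re.sub(r'<a\s+href="(https?://[^"]+)"[^>]*>(.*?)</a>',
                  link, text, flags=re.I | re.S)
    # Treat strong like bold and em like italic.
    text = re.sub(r"<(?:strong|b)>(.*?)</(?:strong|b)>", r"*\1*",
                  text, flags=re.I | re.S)
    text = re.sub(r"<(?:em|i)>(.*?)</(?:em|i)>", r"_\1_",
                  text, flags=re.I | re.S)
    return text
```

Links that do not match the assumed pattern (relative or single-quoted hrefs, for instance) are left untouched.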

Would you please post a list of the TWiki syntax rules obeyed and a list of those ignored at the present time? There is no point in testing rules which haven't been implemented yet. : )

-- MattWilkie - 31 Jul 2002

And if you want to run it on Unix, you need to unpack the exe file under Windows, go into the lib/ subdirectory, and type

   java -classpath HT* com.hSenid.converter.MainApplication

There is a Unix-related bug: all references to paths in the source use \\ instead of /, so the graphics and so on cannot be loaded. I think that changing them to / will work without a hitch.

Here's a request: can you wrap the complete package in a jar file, so that we just have to type java -jar html2twiki.jar?

Thanks for your work so far :)

-- WoutMertens - 01 Aug 2002

When I saw this, I thought this was just what we've been looking for, but I think it's just the opposite: we need something to convert MS Excel sheets to TML, and the tool above seems to skip tables?

(Maybe this is really a separate TWikiAddOnProduct from the HtmlToTWikiConverter, but this still seems like a good place to put it.)

If you take an Excel sheet and choose File|Save as HTML, the result is not pretty. It renders just fine but the code is a mess of intricate MS Office-CSS rules. But it does contain the basic table layout, so if we ignore everything but the table tags themselves, we'd be on track.

I guess if we just make a Perl script that

  • discards all code outside <table></table> and then
  • converts all <td> and </td> tags to " | ", and
  • all <tr> and </tr> to single newlines, and finally
  • removes all other tags
then we'd have what we needed; at least that would be rough pseudo-code for it. The output would be TML containing only table cells and values, nothing else. Of course something like that would run into problems when there are line breaks inside the <td>'s, but for Excel we can be pretty sure that's not the case.
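
For illustration, here are the four steps sketched in Python rather than Perl. This is untested against real Excel output, and the function name excel_html_to_tml is made up. It differs from the pseudo-code in one way: it emits one pipe per cell instead of one per tag, because replacing both <td> and </td> would double the separators. It only looks at the first table and at <td> cells.

```python
import html
import re

def excel_html_to_tml(page):
    """Hypothetical sketch: keep only the first table and emit TWiki table rows."""
    # Step 1: discard everything outside <table>...</table>.
    match = re.search(r"<table\b[^>]*>(.*?)</table>", page, re.I | re.S)
    if match is None:
        return ""
    # Flatten whitespace so stray line breaks cannot split a row.
    body = re.sub(r"\s+", " ", match.group(1))
    # Step 2: each opening <td> starts a cell.
    body = re.sub(r"<td\b[^>]*>", "|", body, flags=re.I)
    # Step 3: each </tr> closes the row's last cell and ends the line.
    body = re.sub(r"</tr\s*>", "|\n", body, flags=re.I)
    # Step 4: remove all other tags.
    body = re.sub(r"<[^>]+>", "", body)

    rows = []
    for line in body.splitlines():
        # Drop the empty pieces before the first pipe and after the last one.
        cells = [html.unescape(c).strip() for c in line.strip().split("|")[1:-1]]
        if cells:
            rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

A cell whose text itself contains a | character would still break the row; the sketch does not try to handle that.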

-- TorbenGB - 07 Aug 2002

If your data doesn't have any commas in it, you can export from Excel in CSV format, then translate each , to |, add a | at the beginning and end of each line, and paste into TWiki. Easier than trying to junk the HTML, IMHO. If you do have commas in the spreadsheet, then it gets trickier.

-- JohnRouillard - 07 Aug 2002

John is correct, as long as there are no commas in the data. Unfortunately that's rather US-centric, because many countries use the comma as a decimal separator and the period as a thousands separator, the reverse of the US convention. That makes CSV problematic unless you go through maneuvers to change the default delimiter used in the CSV file.

-- TorbenGB - 08 Aug 2002

In CSV format, if there is a comma in data, then quotes are put around the data (i.e., 12,2 becomes "12,2"). If there are quotes, then double quotes are added (i.e., "abc" becomes """abc"""). Seems easy enough to deal with...
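
As a rough illustration of how little code this takes, here is a sketch in Python using its standard csv module, which already understands the quoting rules just described. The function name csv_to_tml is made up, the delimiter parameter is there for exports that use something other than a comma, and replacing | with &#124; inside a cell is an assumption about what TWiki will accept.

```python
import csv
import io

def csv_to_tml(text, delimiter=","):
    """Hypothetical sketch: turn CSV text into TWiki table rows."""
    # csv.reader undoes the quoting: "12,2" -> 12,2 and """abc""" -> "abc".
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        # Collapse whitespace (including line breaks inside quoted fields)
        # and escape | so the data cannot split a cell (assumed escape).
        cells = [" ".join(field.split()).replace("|", "&#124;")
                 for field in record]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

For a semicolon-delimited export, csv_to_tml(text, delimiter=";") would be the call.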

-- ThomasWeigert - 08 Aug 2002

You are right, that seems simple. So maybe this can be added to the TWikiConverter discussed above? Then we could also use it for spreadsheets and other table-based files.

-- TorbenGB - 09 Aug 2002

HTML-TWiki Converter, enhanced version with JTidy integration -

This is based on version 1.0, developed by AfkhamAzeez, and offers some modifications and enhancements:

  • packaged as jar file for easy handling: just type 'java -jar H2TConverter.jar'
  • integration of HTML Tidy (JTidy)
    • enables clean-up of HTML code prior to conversion to TWiki syntax (does improve conversion results in many cases)
    • makes a fine standalone tool for HTML cleanup, even if you don't want to convert to TWiki
    • for JTidy source licence and notes please refer to the READMEs provided in the .zip archive...
  • slightly modified and rearranged menu items and toolbar buttons
  • added keyboard shortcuts
  • added (preliminary) support for and tags...

My focus was clearly on JTidy integration, as I've often seen much better conversion results with previously cleaned-up text. I'm planning to add some more additions in the future:

  • Add a preference dialog for JTidy options...
  • Better conversion of tables...

Another option would be to wrap up the converter code as a JEdit plugin, as JEdit provides a much nicer and more complete environment for editing... (well, of course)

-- SaschaLosko - 27 Apr 2003

FYI, the PowerEditPlugin java applet contains a built-in HTML to TWiki converter.

-- CrawfordCurrie - 29 Apr 2003

Thanks for the info. I do know about the PowerEditPlugin, I'm actually excited about it. Two issues:

  • Text encoding of tags is incorrect. Supposedly you have fixed it, but there has been no new release yet.
  • I'd like to have JTidy integration.

-- SaschaLosko - 30 Apr 2003

Ah, Sascha, I'm sorry. Too many irons in the fire. I'll try and make the release this weekend, but improvements to the ActionTrackerPlugin take precedence.

-- CrawfordCurrie - 01 May 2003

Simple HTML to TWiki converter Perl script by FrankHarrell and Jeff Horner

We wrote this to have a simple command-line converter. It does most of the work, but you need to manually add multiples of three spaces for nested bullet lists, etc.

-- FrankHarrell - 31 Jan 2004

HTML to TWiki XSLT

This is an XSLT file that translates basic HTML code to TWiki markup. At the time of writing, not everything is supported. Anything that is not supported is returned as is.

-- MichielHendriks - 21 Jul 2006

Nice. Could you possibly package this as a HtmlToTWikiXsltAddOn package? See more at AddOnPackageHowTo.

-- PeterThoeny - 21 Jul 2006

Topic attachments

html2twiki.pl.txt (r1, 1.6 K, 2004-01-31 14:35, FrankHarrell): Simple HTML to TWiki conversion Perl script
Topic revision: r23 - 2007-03-15 - TroyGoodson
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.