MS Word to TWiki Markup Language Add-On
When a migration to TWiki is planned but many documents already exist in MS Word .doc format, migrating them manually can be a pain. Casual contributors also like to use their favorite text editor, which in many cases is MS Word. Several plugins have attempted to overcome this problem; the best known is TWiki:Plugins.MsOfficeAttachmentsAsHTMLPlugin. However, documents handled that way are not editable in the wiki fashion, i.e. 'click-edit-save'.
This simple VBA script converts a .doc to TWiki:Codev.TWikiML. It is far from complete, but it handles the basics.
- User edits Word file
- User saves Word file for future reference
- User clicks Tools-Macros > Word2TWiki
- Text in Word file is converted to TWikiML, and is copied to the clipboard
- Document is saved as filtered htm (additional document besides your previously saved word document)
- Inline images are collected in folder ActiveDocumentPath\YourFileName_files
- Explorer opens the images folder
- User pastes data into TWiki (Ctrl + V or Edit -> Paste) and saves topic
- User uploads all images from the folder. (Recommended: use TWiki:Plugins.BatchUploadPlugin to simply upload all the images as a zip file that gets unzipped by the plugin on the topic)
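The upload step can be prepared with a small script. The following is a minimal sketch in Python (it is not part of the macro) that packs the images folder into a flat zip file that can then be uploaded in one go. The function name `zip_image_folder` and the paths in the usage note are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def zip_image_folder(folder, zip_path):
    """Pack every file in `folder` into a flat zip archive.

    Returns the list of file names that were added, in sorted order.
    """
    target = Path(zip_path).resolve()
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(Path(folder).iterdir()):
            # Skip sub-folders, and skip the archive itself if it lives in `folder`.
            if item.is_file() and item.resolve() != target:
                # Store only the file name so the images unpack without sub-paths.
                archive.write(item, arcname=item.name)
                added.append(item.name)
    return added
```

A call such as `zip_image_folder(r"C:\docs\YourFileName_files", r"C:\docs\YourFileName_images.zip")` (hypothetical paths) would produce one archive containing all collected images.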
The macro sometimes crashes on MS Word tables and then leaves an orphaned Excel process running. If that happens, go to the point where the conversion stopped, use the MS Word menu 'Table' > 'Convert' > 'Table to text...', and re-run the conversion (or simply convert tables to text before running the macro).
Currently, it handles conversions for the following:
- Headings (1-6)
- Typewriter font (Courier New)
- Bold (also combined with Italics, Underline, Typewriter font)
- Text color (see "Colored text" in TWikiPreferences#Rendering_Shortcuts)
- (Nested) Bullet lists
- (Nested) Numbered lists (No fancy styles or continuations)
- %BR% is added before linebreaks (paragraph breaks are not touched)
- Regular tables, with or without merged cells (rowspans and colspans). Nested tables are also handled
- Inline images fully handled
All other objects or features are left untouched.
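To make the target markup for the items above concrete, here is a minimal sketch in Python of the TWiki syntax that headings, inline formatting, list items and colored text map to. It illustrates the output format only and is not the macro's VBA code; the helper names (`wrap`, `heading`, `list_item`, `colored`) are assumptions.

```python
# TWiki inline markers for the formatting combinations listed above.
MARKERS = {
    "bold": "*",
    "italic": "_",
    "bold_italic": "__",
    "fixed": "=",       # typewriter font
    "bold_fixed": "==",
}


def wrap(text, marker):
    """Wrap text in an inline marker, keeping surrounding spaces outside it."""
    core = text.strip()
    if not core:
        # Nothing visible to format: return the text unchanged, no empty markers.
        return text
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    return f"{lead}{marker}{core}{marker}{trail}"


def heading(text, level):
    """TWiki heading: three dashes followed by one plus sign per level (1-6)."""
    if not 1 <= level <= 6:
        raise ValueError("TWiki supports heading levels 1 to 6")
    return "---" + "+" * level + " " + text.strip()


def list_item(text, depth, numbered=False):
    """TWiki list item: three spaces per nesting level, then '*' or '1.'."""
    bullet = "1." if numbered else "*"
    return "   " * depth + bullet + " " + text.strip()


def colored(text, color):
    """TWiki color shortcut, for example %RED% ... %ENDCOLOR%."""
    return f"%{color.upper()}% {text} %ENDCOLOR%"
```

For instance, `heading("Intro", 2)` gives `---++ Intro` and `wrap(" bold ", MARKERS["bold"])` gives ` *bold* `, with the spaces left outside the markers so that TWiki still recognizes the formatting.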
It would be really nice if somebody could try to improve this Add-On to:
| Detect if a TOC is present in the .doc and replace it with %TOC% | Not yet implemented |
| Fix the Known Problems | |
- Has been known to cause Word to hang altogether (at least partly fixed in version 1.1; please provide feedback and test cases (Word docs) if you still notice this)
- If (lines in) a table cell are bold or italic but don't actually contain any text, the macro still inserts = = or _ _.
- Numbered lists inside table cells: all the numbers get reset to "1".
- Word drawings are saved as images by Word's "Save as Web Page" function, which is as good as MS Word can offer. To work around the problem, convert the drawing to an image before running the macro.
- Right-aligned images with text on the left do not keep their layout.
- Single paragraph breaks still exist in the converted TWiki source but disappear when TWiki renders the topic. Double (or more) paragraph breaks create a paragraph break in TWiki. A line break in Word [Shift-Enter] gets an additional %BR% during conversion.
- Superscripts and subscripts are not converted.
- Greek letters are not converted.
- Converted text requires quite a bit of time to edit and correct.
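The line break behaviour described above can be sketched as follows. This is a minimal Python illustration, not the macro's code. It relies on the fact that Word text uses Chr(11) for a manual line break and Chr(13) for a paragraph mark; the function name `convert_breaks` and the optional `separate_paragraphs` switch (a possible workaround for disappearing single paragraph breaks) are assumptions.

```python
import re

LINE_BREAK = "\x0b"  # Chr(11): manual line break (Shift-Enter) in Word text
PARAGRAPH = "\r"     # Chr(13): paragraph mark in Word text


def convert_breaks(word_text, separate_paragraphs=False):
    """Convert Word break characters to TWiki source text."""
    # Put %BR% before each manual line break so TWiki renders it.
    text = word_text.replace(LINE_BREAK, "%BR%\n")
    if separate_paragraphs:
        # Hypothetical workaround: force a blank line between paragraphs,
        # which TWiki renders as a real paragraph break.
        return re.sub(PARAGRAPH + "+", "\n\n", text)
    # Default: paragraph marks become plain newlines and are otherwise untouched.
    return text.replace(PARAGRAPH, "\n")
```

With the default settings a single paragraph mark becomes a single newline, which is exactly the case that TWiki collapses when it renders the topic.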
Add-On Installation Instructions
Unlike most TWiki Add-Ons, this one is not installed on the server; it is a macro that is installed in MS Word.
- Download the .BAS file from the Add-on Home (see below)
- If you have problems downloading the .BAS file, try the .ZIP version.
- Launch Microsoft Word and go to Tools | Macro | Visual Basic Editor (Alt+F11)
- Right-click the Normal project in the Project Explorer window (if you don't see that window, go to View | Project Explorer (Ctrl-R)) and do an Insert | Module
- Use the context menu or the File menu to select Import file... and pick the downloaded .BAS file
- Choose File | Save Normal, then File | Close and Return to Microsoft Word
- Optionally, create an icon on the toolbar for direct access to this macro
Please note that from version 1.400, this macro requires MS Excel in order to handle merged table cells. Since most people/corporations who own Word also own Excel, this decision was believed to be acceptable. If you do not own MS Excel but do want to use this macro, replace the ConvertTable() subfunction with the function in version 1.310 (the Add On will then fail on encountering tables with merged cells).
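For context on what merged cells have to become on the TWiki side, here is a minimal Python sketch of the target table markup. It does not reproduce ConvertTable() or its use of Excel. It assumes the table has already been read into a grid in which covered cells are flagged with the hypothetical markers `COL_CONT` and `ROW_CONT`, and it assumes TWiki's TablePlugin is enabled, since the `^` rowspan notation depends on it.

```python
COL_CONT = object()  # cell is covered by the merged cell to its left
ROW_CONT = object()  # cell is covered by the merged cell above it


def twiki_table(grid):
    """Render a grid of cell texts and span markers as TWiki table rows."""
    lines = []
    for row in grid:
        line = "|"
        for cell in row:
            if cell is COL_CONT:
                # An extra bar with no content widens the previous cell (colspan).
                line += "|"
            elif cell is ROW_CONT:
                # A caret extends the cell above it (rowspan, via TablePlugin).
                line += " ^ |"
            else:
                line += f" {cell} |"
        lines.append(line)
    return "\n".join(lines)
```

A grid such as `[["A", COL_CONT, "B"], ["C", "D", ROW_CONT]]` renders as a first row where "A" spans two columns and a second row where the last column continues "B" from above.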
If your version of Word is older than MS Word 2007, uncomment the first line in the macro, so that it reads:
Attribute VB_Name = "Module1"
- Set SHORTDESCRIPTION = Visual Basic script to convert Microsoft Word documents to the TWiki markup language
| 24 Jun 2009: | v1.5: TWiki:Main.CharlieMao - Fix image problem: a) No image missing, b) No misplacement in table cell, c) table can have colspan and rowspan. |
| 23 Jun 2009: | v1.486: TWiki:Main.CharlieMao - the Excel object is created and destroyed once only to speed up the process. Add "Dim I as integer" to some subs to make it local. |
| 21 Oct 2008: | v1.485: TWiki:Main.SeanCMorgan - added blank line before headings, to break up the raw view for easier reading. |
| 15 Oct 2008: | v1.484: TWiki:Main.SeanCMorgan - added alt attribute to img tag, to provide file name on mouseover |
| 07 Oct 2008: | v1.483: TWiki:Main.SeanCMorgan - added conversion of < (else anything in angle brackets simply disappeared) |
| 06 Aug 2007: | v1.482: TWiki:Main/AlexanderStedile Bug fixes, uppercase and spacing issues. |
| 26 Jul 2007: | v1.481: TWiki:Main/AlexanderStedile fixed converting hyperlinks by changing conversion step execution order. |
| 25 Jul 2007: | v1.480: TWiki:Main/AlexanderStedile added converting text color (named Word/TWiki colors), added converting linebreaks (paragraphs are not touched). |
| 23 Jul 2007: | v1.470: TWiki:Main/AlexanderStedile added conversion for underline, typewriter font, combinations with bold. Refactored and removed some copy&paste code. |
| 26 Apr 2007: | v1.460: TWiki:Main/DougClaar merged the bugfixes of 1.4.4 back in. They got lost in 1.4.5 |
| 25 Feb 2007: | v1.450: Added support for inline images by saving the document as htm, which collects the images in a folder, and further modifies the links to them |
| 26 May 2006: | v1.440: Removed the bug in which the macro hangs while converting the links; the order of text and address is also corrected. (thanks Touseef) |
| 06 Apr 2006: | v1.430: handle nested lists even better (plus small bugfix) (thanks Pablo) |
| 19 Sep 2005: | v1.410: More robust and elegant handling of merged cells (thanks Merlijn) |
| 18 Sep 2005: | v1.400: Tables with merged cells are supported (thanks Merlijn) |
| 22 Aug 2005: | v1.310: Small bugfix, removed double variable declaration in sub ConvertLists. |
| 22 Aug 2005: | v1.300: Correct conversion of nested bullet- and numbered lists. (thanks Mikael) |
| 06 Aug 2005: | v1.200: Better conversion of bold and italic formatting by correct handling of trailing and leading formatted spaces. |
| 05 Aug 2005: | v1.100: Fixes bug where Word hangs if formatting (bold/italics) is applied to the paragraph mark at the end of a line that is contained in a bullet-list. |
| 08 Jul 2005: | v1.000: Initial version |
| CPAN Dependencies: | |
| Other Dependencies: | Requires MS Word and MS Excel (http://www.microsoft.com) |
| Perl Version: | |
| Add-on Home: | |
Version 1.1 of this Add On was more than heavily based on / directly copied from:
Version 1.4.5 handling of inline images is based on lightly modified code from
Related Topic: TWikiAddOns
- 09 Jul 2005