
SID-01843: Complete Failure of MediaWikiToTWikiAddOn to process XML dump

Status: Unanswered | TWiki version: 5.1.0 | Perl version: 5.8.8
Category: MediaWikiToTWikiAddOn | Server OS: SLC5 & RHEL5 | Last update: 10 years ago

I am attempting to convert a MediaWiki XML dump to TWiki, but the conversion fails immediately, on the first few lines of the XML dump.

My command and error are:

[root@ws twiki]# ( cd /disc2/www/twiki/bin && /disc2/www/twiki/tools/mediawiki2twiki --dry --debug --max 10 --file /disc2/tcrane/wikibackup.xml -web MediaWiki --images /disc2/tcrane/mediawiki/images/ )
DEBUG: opening /disc2/tcrane/wikibackup.xml

not well-formed (invalid token) at line 11, column 74, byte 715 at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser/Expat.pm line 616
    XML::Parser::ExpatNB::parse_more('XML::Parser::ExpatNB=HASH(0x831c1c0)', '<?xml version="1.0"?> <mysqldump xmlns:xsi="http://www.w3.org...') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 227
    Parse::MediaWikiDump::Pages::parse_more('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 196
    Parse::MediaWikiDump::Pages::init('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 43
    Parse::MediaWikiDump::Pages::new('Parse::MediaWikiDump::Pages', '/disc2/tcrane/wikibackup.xml') called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn/Converter.pm line 112
    TWiki::Contrib::MediaWikiToTWikiAddOn::Converter::new('TWiki::Contrib::MediaWikiToTWikiAddOn::Converter', 'webMapString', '', 'plugin', '', 'topicMapString', '', 'maxPages', 10, ...) called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn.pm line 84
    TWiki::Contrib::MediaWikiToTWikiAddOn::main() called at /disc2/www/twiki/tools/mediawiki2twiki line 81
 at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser/Expat.pm line 616
    XML::Parser::ExpatNB::parse_more('XML::Parser::ExpatNB=HASH(0x831c1c0)', '<?xml version="1.0"?> <mysqldump xmlns:xsi="http://www.w3.org...') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 227
    Parse::MediaWikiDump::Pages::parse_more('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 196
    Parse::MediaWikiDump::Pages::init('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 43
    Parse::MediaWikiDump::Pages::new('Parse::MediaWikiDump::Pages', '/disc2/tcrane/wikibackup.xml') called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn/Converter.pm line 112
    TWiki::Contrib::MediaWikiToTWikiAddOn::Converter::new('TWiki::Contrib::MediaWikiToTWikiAddOn::Converter', 'webMapString', '', 'plugin', '', 'topicMapString', '', 'maxPages', 10, ...) called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn.pm line 84
    TWiki::Contrib::MediaWikiToTWikiAddOn::main() called at /disc2/www/twiki/tools/mediawiki2twiki line 81

The first few lines of the XML dump are:

<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="gregoire">
  <table_structure name="archive">
    <field Field="ar_namespace" Type="int(11)" Null="NO" Key="MUL" Default="0" Extra="" />
    <field Field="ar_title" Type="varchar(255)" Null="NO" Key="" Default="" Extra="" />
    <field Field="ar_text" Type="mediumblob" Null="NO" Key="" Extra="" />
    <field Field="ar_comment" Type="tinyblob" Null="NO" Key="" Extra="" />
    <field Field="ar_user" Type="int(10) unsigned" Null="NO" Key="" Default="0" Extra="" />
    <field Field="ar_user_text" Type="varchar(255)" Null="NO" Key="MUL" Extra="" />
    <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="" Extra="" />
    <field Field="ar_minor_edit" Type="tinyint(4)" Null="NO" Key="" Default="0" Extra="" />
    <field Field="ar_flags" Type="tinyblob" Null="NO" Key="" Extra="" />
    <field Field="ar_rev_id" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
    <field Field="ar_text_id" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
    <field Field="ar_deleted" Type="tinyint(3) unsigned" Null="NO" Key="" Default="0" Extra="" />
    <field Field="ar_len" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
    <field Field="ar_page_id" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
    <key Table="archive" Non_unique="1" Key_name="name_title_timestamp" Seq_in_index="1" Column_name="ar_namespace" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
    <key Table="archive" Non_unique="1" Key_name="name_title_timestamp" Seq_in_index="2" Column_name="ar_title" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
    <key Table="archive" Non_unique="1" Key_name="name_title_timestamp" Seq_in_index="3" Column_name="ar_timestamp" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
    <key Table="archive" Non_unique="1" Key_name="usertext_timestamp" Seq_in_index="1" Column_name="ar_user_text" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
    <key Table="archive" Non_unique="1" Key_name="usertext_timestamp" Seq_in_index="2" Column_name="ar_timestamp" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
    <options Name="archive" Engine="InnoDB" Version="10" Row_format="Compact" Rows="1" Avg_row_length="16384" Data_length="16384" Max_data_length="0" Index_length="32768" Data_free="0" Create_time="2013-12-19 12:34:13" Collation="latin1_swedish_ci" Create_options="" Comment="InnoDB free: 10240 kB" />
  </table_structure>

I have tried removing the first three tag lines and their corresponding closing tags at the end of the XML dump file, but without improvement. My version of XML::Parser::Expat is 2.34. I don't know which MediaWiki version produced the dump; I was handed an SQL dump to convert to XML. Any ideas? Does anyone have a MediaWiki XML dump known to work with MediaWikiToTWikiAddOn that I could try and compare with?

Thanks, Tom Crane

-- Tom Crane - 2013-12-19

Discussion and Answer

-- Tom Crane - 2013-12-19

I have attached two files: the first few lines of the XML dump, and the command plus error message. My earlier copy-and-paste removed the line feeds, making the text unreadable!

-- Tom Crane - 2013-12-19

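There seems to be a problem with the XML dump. Opening the file in vi shows Default="^@^@^@^@^@^@^@^@^@^@^@^@^@^@" on line 11, which is a run of null characters (vi displays a 0x00 byte as ^@). A NUL character is not allowed anywhere in an XML document, hence the "invalid token" error.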

-- Peter Thoeny - 2013-12-19
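To check a large dump for stray NUL bytes without opening it in an editor, a short scan is enough. The following is a minimal Python sketch; the function name find_nul_bytes is hypothetical and not part of MediaWikiToTWikiAddOn. It reports each NUL as a 1-based line number and a 0-based byte column, which is how expat counts columns, so on the unmodified dump the first hit should correspond to the "line 11, column 74" in the expat error in the question.

```python
def find_nul_bytes(path, limit=10):
    """Return up to `limit` (line, column) positions of NUL bytes in a file.

    Lines are 1-based; columns are 0-based byte offsets within the line,
    matching the way expat reports error columns.
    """
    hits = []
    # Binary mode: NULs and non-UTF-8 bytes are read back untouched.
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            col = line.find(b"\x00")
            while col != -1:
                hits.append((lineno, col))
                if len(hits) >= limit:
                    return hits
                col = line.find(b"\x00", col + 1)
    return hits
```

For example, find_nul_bytes("/disc2/tcrane/wikibackup.xml") returns a list of (line, column) pairs; an empty list means the file contains no NUL bytes.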

Hi Peter, thanks for pointing that out. Curiously, even using 'mysqldump --hex-blob --compatible=ansi --xml' does not remove those ASCII NULs, although it did remove a huge amount of binary and top-bit-set characters further down the dump. Replacing those ASCII NULs with "0" characters in Emacs fixed that problem. Now I get further, but it fails with:

[Thu Dec 19 20:16:10 2013] mediawiki2twiki: buffer length of 10010 exceeds 10000 at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 232.

I progressively raised the value of BUF_LIMIT in lib/CPAN/lib/Parse/MediaWikiDump.pm up to 100000, which lets the script run for many seconds before it fails with:

[Thu Dec 19 20:11:54 2013] mediawiki2twiki: could not init at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 197.

Any more ideas? Thanks, Tom

-- Tom Crane - 2013-12-19
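The 14 NULs are presumably the default value of the binary(14) column ar_timestamp: MySQL right-pads BINARY values with 0x00 bytes, and --hex-blob changes how column data is dumped, which would explain why it left this attribute of the table structure alone. The Emacs substitution can also be scripted for dumps too large to edit comfortably. Below is a minimal Python sketch (replace_nul_bytes is a hypothetical helper) that streams the file in chunks and replaces every NUL with "0". It replaces NULs everywhere in the file, so it is only appropriate when, as here, the real binary row data has already been hex-encoded.

```python
def replace_nul_bytes(src, dst, replacement=b"0", chunk_size=1 << 20):
    """Copy src to dst, replacing every NUL byte (0x00) with `replacement`.

    Reads fixed-size chunks so the whole dump never has to fit in memory.
    A NUL is a single byte, so it can never be split across two chunks.
    Returns the number of NUL bytes replaced.
    """
    count = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\x00")
            fout.write(chunk.replace(b"\x00", replacement))
    return count
```

For example, replace_nul_bytes("/disc2/tcrane/wikibackup.xml", "/disc2/tcrane/wikibackup-clean.xml") writes a cleaned copy (the output file name is only an illustration) and returns how many NULs it replaced.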

Not sure. Maybe reading the source helps? https://metacpan.org/source/TRIDDLE/Parse-MediaWikiDump-0.2/lib/Parse/MediaWikiDump.pm

-- Peter Thoeny - 2013-12-19

Did anyone resolve this problem?

-- Kamil Gee - 2014-09-04

No. In the end I gave up and manually converted the few pages of MediaWiki markup I had into TWiki markup.

Sorry, Tom.

-- Tom Crane - 2014-09-04

I bet other people like you and me will stumble upon this problem. I've been doing the same thing, except that I'm importing into Foswiki. I used mysqldump to export everything and got the same error:

could not init at /www/foswiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 196.

This error means that your backup cannot be parsed. It seems that you can't import database dumps with this tool: the XML file you created with mysqldump is a database export, so it can only be imported back into a database. If you only want to import page text into TWiki or Foswiki, use MediaWiki's dumpBackup.php maintenance script instead, which writes the MediaWiki XML page-export format that Parse::MediaWikiDump reads: https://www.mediawiki.org/wiki/Manual:DumpBackup.php

-- TWiki Guest - 2015-09-14
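A quick way to tell which kind of file you have is to look at its root element: mysqldump --xml writes a <mysqldump> root, while a MediaWiki page export (with <siteinfo> and <page> elements) has a <mediawiki> root in the MediaWiki export namespace. The following minimal Python sketch (dump_kind is a hypothetical helper) stops at the first start event instead of walking the whole dump. It still parses the first block of input, so it may raise xml.etree.ElementTree.ParseError if invalid bytes such as NULs appear near the start of the file.

```python
import xml.etree.ElementTree as ET


def dump_kind(path):
    """Classify an XML dump by its root element.

    Returns "mysqldump" for `mysqldump --xml` output, "mediawiki" for a
    MediaWiki page export, or the raw root tag for anything else.
    """
    with open(path, "rb") as f:
        # The first "start" event is always the root element, so we can
        # stop there instead of parsing the whole (possibly huge) dump.
        for _event, elem in ET.iterparse(f, events=("start",)):
            # MediaWiki exports declare a default namespace, so the tag looks
            # like "{http://www.mediawiki.org/xml/export-0.10/}mediawiki".
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name in ("mysqldump", "mediawiki"):
                return local_name
            return elem.tag
    return None
```

If dump_kind returns "mysqldump", the file is a database export, and MediaWikiToTWikiAddOn will not find any pages in it.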

Closing this question after more than 30 days of inactivity. Feel free to reopen it if needed. Consider engaging one of the TWiki consultants if you need timely help. We invite you to get involved with the community; you are more likely to get community support if you support the open source project!

-- Peter Thoeny - 2015-12-03

SupportForm
Status: Unanswered
Title: Complete Failure of MediaWikiToTWikiAddOn to process XML dump
SupportCategory: MediaWikiToTWikiAddOn
TWiki version: 5.1.0
Server OS: SLC5 & RHEL5
Web server: Apache/2.2.3
Perl version: 5.8.8
Browser & version: N/A
Topic attachments
Attachment | History | Size | Date | Who | Comment
mediawiki2twiki_error.log | r1 | 2.7 K | 2013-12-19 - 18:40 | UnknownUser | Command and error message (earlier copy&paste to TWiki removed linefeeds)
mediawiki2twiki_wikibackup-25.xml | r1 | 2.5 K | 2013-12-19 - 18:38 | UnknownUser | First few lines of mediawiki xml dump file (earlier copy&paste to TWiki removed linefeeds)
Topic revision: r8 - 2015-12-03 - PeterThoeny
 
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.