SID-01843: Complete failure of MediaWikiToTWikiAddOn to process XML dump
| Status: | Unanswered | TWiki version: | 5.1.0 | Perl version: | 5.8.8 |
| Category: | MediaWikiToTWikiAddOn | Server OS: | SLC5 & RHEL5 | Last update: | 10 years ago |
I am attempting to convert a MediaWiki XML dump to TWiki, but the conversion fails immediately -- on the first few lines of the XML dump.
My command and error are:
[root@ws twiki]# ( cd /disc2/www/twiki/bin && /disc2/www/twiki/tools/mediawiki2twiki --dry --debug --max 10 --file /disc2/tcrane/wikibackup.xml -web MediaWiki --images /disc2/tcrane/mediawiki/images/ )
DEBUG: opening /disc2/tcrane/wikibackup.xml
not well-formed (invalid token) at line 11, column 74, byte 715 at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser/Expat.pm line 616
XML::Parser::ExpatNB::parse_more('XML::Parser::ExpatNB=HASH(0x831c1c0)', '<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org...') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 227
Parse::MediaWikiDump::Pages::parse_more('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 196
Parse::MediaWikiDump::Pages::init('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 43
Parse::MediaWikiDump::Pages::new('Parse::MediaWikiDump::Pages', '/disc2/tcrane/wikibackup.xml') called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn/Converter.pm line 112
TWiki::Contrib::MediaWikiToTWikiAddOn::Converter::new('TWiki::Contrib::MediaWikiToTWikiAddOn::Converter', 'webMapString', '', 'plugin', '', 'topicMapString', '', 'maxPages', 10, ...) called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn.pm line 84
TWiki::Contrib::MediaWikiToTWikiAddOn::main() called at /disc2/www/twiki/tools/mediawiki2twiki line 81
at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser/Expat.pm line 616
XML::Parser::ExpatNB::parse_more('XML::Parser::ExpatNB=HASH(0x831c1c0)', '<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org...') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 227
Parse::MediaWikiDump::Pages::parse_more('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 196
Parse::MediaWikiDump::Pages::init('Parse::MediaWikiDump::Pages=HASH(0x873d700)') called at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 43
Parse::MediaWikiDump::Pages::new('Parse::MediaWikiDump::Pages', '/disc2/tcrane/wikibackup.xml') called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn/Converter.pm line 112
TWiki::Contrib::MediaWikiToTWikiAddOn::Converter::new('TWiki::Contrib::MediaWikiToTWikiAddOn::Converter', 'webMapString', '', 'plugin', '', 'topicMapString', '', 'maxPages', 10, ...) called at /disc2/www/twiki/lib/TWiki/Contrib/MediaWikiToTWikiAddOn.pm line 84
TWiki::Contrib::MediaWikiToTWikiAddOn::main() called at /disc2/www/twiki/tools/mediawiki2twiki line 81
The first few lines of the XML dump are:
<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="gregoire">
<table_structure name="archive">
<field Field="ar_namespace" Type="int(11)" Null="NO" Key="MUL" Default="0" Extra="" />
<field Field="ar_title" Type="varchar(255)" Null="NO" Key="" Default="" Extra="" />
<field Field="ar_text" Type="mediumblob" Null="NO" Key="" Extra="" />
<field Field="ar_comment" Type="tinyblob" Null="NO" Key="" Extra="" />
<field Field="ar_user" Type="int(10) unsigned" Null="NO" Key="" Default="0" Extra="" />
<field Field="ar_user_text" Type="varchar(255)" Null="NO" Key="MUL" Extra="" />
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="" Extra="" />
<field Field="ar_minor_edit" Type="tinyint(4)" Null="NO" Key="" Default="0" Extra="" />
<field Field="ar_flags" Type="tinyblob" Null="NO" Key="" Extra="" />
<field Field="ar_rev_id" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
<field Field="ar_text_id" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
<field Field="ar_deleted" Type="tinyint(3) unsigned" Null="NO" Key="" Default="0" Extra="" />
<field Field="ar_len" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
<field Field="ar_page_id" Type="int(10) unsigned" Null="YES" Key="" Extra="" />
<key Table="archive" Non_unique="1" Key_name="name_title_timestamp" Seq_in_index="1" Column_name="ar_namespace" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
<key Table="archive" Non_unique="1" Key_name="name_title_timestamp" Seq_in_index="2" Column_name="ar_title" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
<key Table="archive" Non_unique="1" Key_name="name_title_timestamp" Seq_in_index="3" Column_name="ar_timestamp" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
<key Table="archive" Non_unique="1" Key_name="usertext_timestamp" Seq_in_index="1" Column_name="ar_user_text" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
<key Table="archive" Non_unique="1" Key_name="usertext_timestamp" Seq_in_index="2" Column_name="ar_timestamp" Collation="A" Cardinality="1" Null="" Index_type="BTREE" Comment="" />
<options Name="archive" Engine="InnoDB" Version="10" Row_format="Compact" Rows="1" Avg_row_length="16384" Data_length="16384" Max_data_length="0" Index_length="32768" Data_free="0" Create_time="2013-12-19 12:34:13" Collation="latin1_swedish_ci" Create_options="" Comment="InnoDB free: 10240 kB" />
</table_structure>
I have tried removing the first 3 tag lines and their corresponding closing tags at the end of the XML dump file, but without improvement. My version of XML::Parser::Expat is 2.34. I don't know which version of MediaWiki produced the dump; I was handed an SQL dump to convert to XML. Any ideas? Does anyone have a MediaWiki XML dump known to work with MediaWikiToTWikiAddOn that I could try and compare with?
Thanks
Tom Crane
--
Tom Crane - 2013-12-19
Discussion and Answer
--
Tom Crane - 2013-12-19
I have attached files of the first few lines of the XML dump and the command + error message. An earlier copy&paste removed the line feeds, making the text unreadable!
--
Tom Crane - 2013-12-19
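There seems to be a problem with the XML dump. Opening the file in vi shows
Default="^@^@^@^@^@^@^@^@^@^@^@^@^@^@" on line 11, which is a run of NUL characters. NUL is not a legal character in XML, which would explain the "not well-formed (invalid token)" error reported for line 11.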
--
Peter Thoeny - 2013-12-19
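For anyone hitting the same parser error, the NUL bytes found above can also be located without opening the dump in an editor. The following is a minimal Python sketch, not part of MediaWikiToTWikiAddOn; the function names `find_illegal_bytes` and `replace_nuls` are hypothetical, and it assumes the dump fits in memory. It reports control bytes that XML 1.0 forbids, with a 1-based line, 0-based column and byte offset comparable to the values in the expat message, and offers the same "0" substitution that can be done by hand.

```python
import re

# Bytes that XML 1.0 forbids: every C0 control character except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
ILLEGAL_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_bytes(data: bytes, limit: int = 10):
    """Return up to `limit` tuples of (line, column, byte_offset, byte_value).

    Lines are 1-based; columns and byte offsets are 0-based.
    """
    hits = []
    for match in ILLEGAL_XML_BYTES.finditer(data):
        offset = match.start()
        line = data.count(b"\n", 0, offset) + 1
        # rfind returns -1 on the first line, so the line start becomes 0.
        column = offset - (data.rfind(b"\n", 0, offset) + 1)
        hits.append((line, column, offset, data[offset]))
        if len(hits) >= limit:
            break
    return hits


def replace_nuls(data: bytes, replacement: bytes = b"0") -> bytes:
    """Replace each NUL byte with `replacement` (a workaround, not a real fix)."""
    return data.replace(b"\x00", replacement)
```

Typical use would be to read the dump with `open(path, "rb").read()`, pass the bytes to `find_illegal_bytes`, and write the result of `replace_nuls` to a new file rather than overwriting the original. Note that replacing NULs changes the data, so this only helps to get past the parser.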
Hi Peter,
Thanks for pointing that out. Curiously, even using 'mysqldump --hex-blob --compatible=ansi --xml' does not remove those ASCII NULs, although it did remove a huge amount of binary and top-bit-set characters further down the dump. Substituting "0" characters for those ASCII NULs in emacs fixed that problem. Now I get further, but it fails with:
[Thu Dec 19 20:16:10 2013] mediawiki2twiki: buffer length of 10010 exceeds 10000 at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 232.
I tried progressively raising the value of BUF_LIMIT in lib/CPAN/lib/Parse/MediaWikiDump.pm up to 100000, which lets the script run for many seconds before it fails with:
[Thu Dec 19 20:11:54 2013] mediawiki2twiki: could not init at /disc2/www/twiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 197.
Any more ideas?
Thanks
Tom
--
Tom Crane - 2013-12-19
Not sure. Maybe reading the source helps?
https://metacpan.org/source/TRIDDLE/Parse-MediaWikiDump-0.2/lib/Parse/MediaWikiDump.pm
--
Peter Thoeny - 2013-12-19
Did anyone resolve this problem?
--
Kamil Gee - 2014-09-04
No. In the end I gave up and manually converted the few pages of MediaWiki markup I had into TWiki markup.
Sorry
Tom.
--
Tom Crane - 2014-09-04
I bet other people like you and me will stumble upon this problem. I've been doing the same thing, except that I'm importing into Foswiki. I used mysqldump to export everything and got the same error:
could not init at /www/foswiki/lib/CPAN/lib//Parse/MediaWikiDump.pm line 196.
This error means that your backup cannot be parsed. It seems that the add-on cannot import database dumps: the XML file you created with mysqldump is a database export, so it can only be imported back into a database. If you only want to import the page text into your TWiki or Foswiki, export the wiki with MediaWiki's own dump script instead:
https://www.mediawiki.org/wiki/Manual:DumpBackup.php
--
TWiki Guest - 2015-09-14
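The distinction made above can be checked before running the converter. This is a rough Python sketch, assuming that a MediaWiki page export (as written by dumpBackup.php) has `<mediawiki>` as its root element, whereas the dump quoted in the question starts with `<mysqldump>`. The helper names `root_element` and `classify_dump` are hypothetical, and the check is only a heuristic on the first bytes of the file, not a full XML parse.

```python
import re
from typing import Optional

# First start tag that is not an XML declaration (<?...) or a
# comment/DOCTYPE (<!...); the group captures the element name.
ROOT_TAG = re.compile(rb"<([A-Za-z_][\w.\-:]*)")


def root_element(head: bytes) -> Optional[str]:
    """Return the name of the first element found in `head`, if any."""
    match = ROOT_TAG.search(head)
    return match.group(1).decode("ascii") if match else None


def classify_dump(head: bytes) -> str:
    """Guess the dump type from the first bytes of the file."""
    name = root_element(head)
    if name == "mediawiki":
        return "mediawiki-export"   # page export, e.g. from dumpBackup.php
    if name == "mysqldump":
        return "mysqldump-xml"      # database export, e.g. mysqldump --xml
    return "unknown"
```

Reading the first few kilobytes with `open(path, "rb").read(4096)` and passing them to `classify_dump` should be enough. A result of "mysqldump-xml" suggests the file is a database export of the kind that failed in this thread, and that a page export from the running wiki is needed instead.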
Closing this question after more than 30 days of inactivity. Feel free to reopen if needed. Consider engaging one of the TWiki consultants if you need timely help. We invite you to get involved with the community; it is more likely you get community support if you support the open source project!
--
Peter Thoeny - 2015-12-03