Question
TWiki uses RCS to store revisions of all attachments uploaded to TWiki topics. The attachments and their corresponding RCS files are stored in the pub directory of your TWiki installation. How can I get all RCS revisions of the uploaded files? This question matters when you want to migrate from TWiki to another content management system and carry all revisions over.
Environment
-- AlokNarula - 08 Oct 2008
Answer
HOW TO GET A PREVIOUS REVISION OF AN RCS ARCHIVE PROGRAMMATICALLY
TWiki stores all attachments under its pub directory, /var/www/twiki/pub in this example. Each attachment is stored in a subdirectory named after the web and the topic that contain it. For example, the attachment "INTERNATIONAL_TRAVEL.doc" uploaded to the TravelPolicy topic in the HR web is stored as "/var/www/twiki/pub/HR/TravelPolicy/INTERNATIONAL_TRAVEL.doc" in the filesystem.
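The layout described above can be sketched as a small shell helper. The function name attachment_path and the TWIKI_PUB variable are invented for illustration; the default root assumes the /var/www/twiki/pub location used in this answer.

```shell
#!/bin/sh
# Hypothetical helper: build the filesystem path of an attachment
# from its web, topic, and file name.
# TWIKI_PUB is an assumed variable; it defaults to the pub directory used above.
TWIKI_PUB="${TWIKI_PUB:-/var/www/twiki/pub}"

attachment_path() {
    # $1 = web, $2 = topic, $3 = attachment file name
    printf '%s/%s/%s/%s\n' "$TWIKI_PUB" "$1" "$2" "$3"
}

# Usage: attachment_path HR TravelPolicy INTERNATIONAL_TRAVEL.doc
```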
When you upload a file as an attachment, TWiki creates an RCS archive of the file (the file name with a ,v suffix). Every time you upload the same file again, TWiki updates the archive and stores the delta of the changes in the ,v file. The archive also holds a log of all changes made to the file. To get a previous revision of the uploaded file, run the following command from the command line:
co -p -r<rev#> <original_file> > <new_file>
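The checkout (co) command retrieves original_file at the specified revision. The -p option prints that revision to standard output instead of overwriting the working file, and the redirection saves it as new_file.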
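For repeated use, the same command can be wrapped in a function that quotes its arguments, so that attachment names containing spaces survive. This is only a sketch; save_revision is a hypothetical name, and the paths in the usage line are examples.

```shell
#!/bin/sh
# Hypothetical wrapper around "co -p -r<rev>": write one revision of an
# attachment to a separate file without touching the working file.
save_revision() {
    # $1 = revision number (e.g. 1.2), $2 = attachment path, $3 = output file
    co -p -r"$1" "$2" > "$3"
}

# Usage (example paths only):
# save_revision 1.2 /var/www/twiki/pub/HR/TravelPolicy/INTERNATIONAL_TRAVEL.doc /var/tmp/INTERNATIONAL_TRAVEL_1.2.doc
```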
TO GET PREVIOUS REVISIONS OF ALL RCS ARCHIVE FILES IN YOUR TWIKI "PUB" DIRECTORY
1. Create a list of all attachments in the /var/www/twiki/pub directory. You can use the -prune option of the Unix find command to exclude the directories that you don't want to search. Search from the absolute path, because the script in step 4 expects /pub/ to appear in every file name. For example, the following command excludes everything under the TWiki, _work_areas, Bugs, _default, Sandbox, Trash, Hidden, and Publish directories and lists regular files only:
find /var/www/twiki/pub \( -name "TWiki" -o -name "_work_areas" -o -name "Bugs" -o -name "_default" -o -name "Sandbox" -o -name "Trash" -o -name "Hidden" -o -name "Publish" \) -prune -o -type f -print > /var/tmp/filelist.txt
2. Remove the RCS archives (,v files) from the list created in step 1 and keep only the extensions that you're interested in (doc, pdf, chm, exe, ppt, txt, vsd, zip, xls, mpp, rpt). Note that the IGNORECASE variable is specific to GNU awk (gawk):
grep -v ",v$" /var/tmp/filelist.txt | awk 'BEGIN { FS="/"; IGNORECASE=1 } $NF ~ /\.DOC$|\.PDF$|\.CHM$|\.EXE$|\.PPT$|\.TXT$|\.VSD$|\.ZIP$|\.XLS$|\.MPP$|\.RPT$/ {print}' > attachments.txt
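On systems where awk is not GNU awk, a similar filter can be written with grep alone. The sketch below assumes a grep that supports -i and -E; filter_attachments is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the same filter without gawk's IGNORECASE:
# drop the RCS archives, then keep the wanted extensions, ignoring case.
filter_attachments() {
    # Reads a file list on standard input, writes matching paths to standard output.
    grep -v ',v$' |
        grep -iE '\.(doc|pdf|chm|exe|ppt|txt|vsd|zip|xls|mpp|rpt)$'
}

# Usage: filter_attachments < /var/tmp/filelist.txt > attachments.txt
```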
3. Create a list of all previous revisions by running the following Perl script. It runs rlog on every attachment and writes one line per attachment to revdata.txt, in the form file:revision:revision:...
#!/usr/bin/perl
use strict;
use warnings;

$" = ":"; # delimiter used when an array is interpolated into a string

# Open an input handle to read all the TWiki attachments
my $attachments = "attachments.txt";
open(my $docs, '<', $attachments) or
    die "Cannot open $attachments: $!";
# Open an output handle to record each attachment's revision data
my $attachrevs = "revdata.txt";
open(my $revrec, '>>', $attachrevs) or
    die "Cannot open $attachrevs: $!";
while (my $file = <$docs>) {
    chomp $file;
    my @rev_struct = (); # file name followed by its revision numbers
    # The list form of open bypasses the shell, so spaces in names are safe
    open(my $revinfo, '-|', 'rlog', $file) or
        die "Cannot run rlog on $file: $!";
    # Parse the rlog output and populate the revisions structure
    while (my $line = <$revinfo>) {
        if ($line =~ m/^Working\sfile:\s(.+)/) {
            push(@rev_struct, $1);
        }
        if ($line =~ m/^revision\s(\S+)/) {
            push(@rev_struct, $1);
        }
    }
    close($revinfo);
    # Print the revisions data, skipping files for which rlog reported nothing
    print {$revrec} "@rev_struct\n" if @rev_struct;
}
close($revrec);
close($docs);
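To make the record format concrete, here is a sketch of the same parsing step done with awk on rlog output read from standard input. It mirrors what the Perl script above writes to revdata.txt; rlog_to_record is a hypothetical name, and the sketch assumes the "Working file:" and "revision" line prefixes that the Perl script matches.

```shell
#!/bin/sh
# Sketch: turn rlog output (on standard input) into one "file:rev:rev..." record.
rlog_to_record() {
    awk '
        # Keep the text that follows the 14-character prefix "Working file: "
        /^Working file: / { rec = substr($0, 15) }
        # The second field of a "revision" line is the revision number
        /^revision [0-9]/ { rec = rec ":" $2 }
        END { if (rec != "") print rec }
    '
}

# Usage: rlog /path/to/attachment | rlog_to_record
```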
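4. Parse each record in revdata.txt into five fields: the pub root, the directory, the attachment name, the extension, and the colon-separated revision list. For each revision, create a directory named after the file and the revision, and save that revision into it. The directories are created relative to the current working directory, so run the script from the directory that should receive the exported revisions. The pattern only matches attachments with a three-character extension.
#!/usr/bin/perl
use strict;
use warnings;

my $revlist = "revdata.txt";
# Read the revisions info of each TWiki attachment
open(my $revinfo, '<', $revlist) or
    die "Cannot open $revlist: $!\n";
# For each record:
# 1. Create a directory matching the file name and revision info:
#    mkdir -p <directory><attachment>V<revision>
# 2. Save each revision into its corresponding directory:
#    co -p -r<revision> <original file> > <directory><attachment>V<revision>/<attachment><extension>
while (my $line = <$revinfo>) {
    chomp $line;
    # Skip records that do not match, e.g. names without a three-character extension
    next unless $line =~ m/(.*\/pub\/)(.*\/)(.*?)(\..{3}):(.*)/;
    my ($root, $dir, $name, $ext, $revs) = ($1, $2, $3, $4, $5);
    my $source = "$root$dir$name$ext";
    foreach my $rev (split(/:/, $revs)) {
        # Create a directory matching the file name and revision info
        my $target = "$dir${name}V$rev";
        system('mkdir', '-p', $target) == 0 or next;
        # Save each revision into its corresponding directory.
        # Single quotes protect spaces; names containing a single quote are not handled.
        system("co -p -r'$rev' '$source' > '$target/$name$ext'");
    }
}
close($revinfo);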
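Before running the export, it can help to preview which files it would create. The sketch below reads revdata.txt records and prints one "revision source target" line per revision without calling co. plan_checkouts is a hypothetical name; unlike the Perl pattern, this sketch accepts extensions of any length, and it assumes that paths contain no colon.

```shell
#!/bin/sh
# Sketch: preview the checkouts for each record of revdata.txt without running co.
plan_checkouts() {
    while IFS= read -r record; do
        file=${record%%:*}        # attachment path, before the first colon
        revs=${record#*:}         # colon-separated revision list
        [ "$revs" != "$record" ] || continue   # no colon: nothing to do
        rel=${file##*/pub/}       # path relative to the pub directory
        dir=${rel%/*}             # web and topic directories
        base=${rel##*/}           # attachment file name
        name=${base%.*}           # file name without its extension
        printf '%s\n' "$revs" | tr ':' '\n' | while IFS= read -r rev; do
            printf '%s %s %s\n' "$rev" "$file" "$dir/${name}V$rev/$base"
        done
    done
}

# Usage: plan_checkouts < revdata.txt
```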