Question

My Wiki is growing rapidly and I am starting to notice discrepancies in the WikiWords people are using. Has anyone out there got a way to show all words that have been mentioned but not defined?

Thanks, Martin.

  • TWiki version: Dec 2000
  • Web server: Apache
  • Server OS: NT4 sp6

-- MartinCleaver - 01 Mar 2001

Answer

With the stock TWiki installation there isn't, as far as I'm aware, a pre-written way of doing this. I've just knocked up the following, though, which may be useful. Ideally this should be run from a cron job and do things more intelligently, but as a first pass it's hopefully a useful starting point.

#!/usr/bin/env perl
# Report which WikiWords mentioned in one web have a topic file and which do not.

# Data directory of the web to scan; adjust to your installation.
$webDataLocation = "/usr/local/httpd/twiki/webs/Projects/data";

opendir(WEBDIR, $webDataLocation) or die "Cannot open $webDataLocation: $!";
while (defined($file = readdir(WEBDIR))) {
   next unless ($file =~ /\.txt$/);
   open(IN, "<", "$webDataLocation/$file") or die "Cannot read $file: $!";
   $slurp = join(" ", <IN>);
   close IN;

   # Reduce every run of characters other than letters and digits to one space.
   $slurp =~ s/[^a-zA-Z0-9]+/ /g;

   foreach $word (split(/\s+/, $slurp)) {
      # Accept words made of exactly two capitalised parts (e.g. WebHome);
      # longer names are not counted.
      if ($word =~ /^[A-Z]+[^A-Z]+[A-Z]+[^A-Z]+$/) {
         $seen{$word}++;             # total mentions
         $seenIn{$word}{$file}++;    # mentions per referring file
      }
   }
}
closedir WEBDIR;

# Build one report line per word: "<count> : <word> ref'd by : <files>".
foreach $word (keys %seen) {
    $line = "$seen{$word} : $word ref'd by : " . join(" ", sort keys %{$seenIn{$word}}) . "\n";
    if (-e "$webDataLocation/$word.txt") {
        push(@exists, $line);
    } else {
        push(@notexists, $line);
    }
}

# Each line starts with its count, so the numeric sort puts the most
# frequently mentioned words first.
$EXISTS = join("", sort { $b <=> $a } @exists);
$NOTEXISTS = join("", sort { $b <=> $a } @notexists);
print <<REPORT;
TWiki Topics Referenced that have Topics defined
$EXISTS

TWiki Topics Referenced that need Writing
$NOTEXISTS
REPORT
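
The test `$word =~ /^[A-Z]+[^A-Z]+[A-Z]+[^A-Z]+$/` only accepts words with exactly two capitalised parts, so a name such as DanglingLinksToolNeeded is never counted. Below is a minimal sketch of the matching step on its own, assuming a wider pattern (capitals, then lowercase letters or digits, then capitals, then any letters or digits). The sub name wiki_words is made up for illustration, and the pattern is an assumption about what should count as a WikiWord, not something taken from the TWiki source.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the WikiWord candidates found in a chunk of topic text.
# Assumed pattern: capitals, lowercase letters or digits, capitals,
# then any mix of letters and digits.
sub wiki_words {
    my ($text) = @_;
    $text =~ s/[^a-zA-Z0-9]+/ /g;    # same normalisation as the script
    return grep { /^[A-Z]+[a-z0-9]+[A-Z]+[a-zA-Z0-9]*$/ } split ' ', $text;
}

# Only WebHome and DanglingLinksToolNeeded qualify; "See" and "TWiki" do not.
print join(", ", wiki_words("See WebHome, the TWiki docs and DanglingLinksToolNeeded.")), "\n";
```

Dropping this pattern into the `if` in the main loop should be enough to pick up the longer names, at the cost of also matching a few more odd strings such as AbCD.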

This probably ought to go into the Codev web as something suitable for FeatureBrainstorming...

-- MichaelSparks - 20 Jul 2001

Sheesh. I never thanked you for this Michael! Thanks Michael!

-- MartinCleaver - 23 May 2003

I tried to edit the script so that it exports a TWiki topic:

  1. edit the script so that the result is a suitable TWiki topic.
  2. put /bin/notdefined > /data/TWikiTopic.txt in crontab, have it run e.g. every half hour

I've got this working now, and have noticed the following behaviour:

  1. If I set the data directory to one specific web (web A), it reports topics in web B that are referenced by topics in web A as 'not defined'.
  2. If I set the data directory to the root of all webs, I get all the references from debug.txt and the log2005MM.txt files.
  3. In this latter case, the subdirectories of the data directory are not searched.

Does anybody with Perl skills know how to adjust the script so it ignores log and debug files? (Or is it easy to move those to another location without TWiki breaking down?)
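
One possible approach, offered as a rough sketch rather than a tested fix: point the script at the root data directory, walk each web's subdirectory, skip anything named like a debug or log file, and read a Web.TopicName reference as pointing into that other web. The sub names (is_topic_file, scan_webs, missing_topics) are made up for illustration, and the log-file pattern is an assumption based only on the file names mentioned above. Files sitting directly in the root directory are never read, because only subdirectories are scanned.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# True for files that hold topics. Assumption: logs are named debug.txt
# or log<digits>.txt; adjust to whatever your data directory contains.
sub is_topic_file {
    my ($file) = @_;
    return 0 unless $file =~ /\.txt$/;
    return 0 if $file =~ /^(?:debug|log\d+)\.txt$/;
    return 1;
}

# Collect references from every web (subdirectory) under $data_root.
# Returns a hash: "Web.Topic" => { "Web/File.txt" => mention count }.
sub scan_webs {
    my ($data_root) = @_;
    my %seen_in;
    opendir(my $root, $data_root) or die "Cannot open $data_root: $!";
    my @webs = grep { !/^\./ && -d "$data_root/$_" } readdir($root);
    closedir($root);

    for my $web (@webs) {
        opendir(my $dir, "$data_root/$web") or die "Cannot open $web: $!";
        my @files = grep { is_topic_file($_) } readdir($dir);
        closedir($dir);

        for my $file (@files) {
            open(my $in, '<', "$data_root/$web/$file")
                or die "Cannot read $web/$file: $!";
            my $text = do { local $/; <$in> };
            close($in);
            next unless defined $text;

            # A reference is either qualified (Web.TopicName) or bare
            # (TopicName, taken to live in the referring web).
            while ($text =~ /\b(?:([A-Z][A-Za-z0-9]*)\.)?([A-Z]+[a-z0-9]+[A-Z]+[a-zA-Z0-9]*)\b/g) {
                my $target_web = defined $1 ? $1 : $web;
                $seen_in{"$target_web.$2"}{"$web/$file"}++;
            }
        }
    }
    return %seen_in;
}

# Keep only the targets for which no topic file exists.
sub missing_topics {
    my ($data_root, %seen_in) = @_;
    return grep {
        my ($web, $topic) = split /\./, $_, 2;
        !-e "$data_root/$web/$topic.txt";
    } sort keys %seen_in;
}

# Run the report only when a data directory is given on the command line.
if (@ARGV) {
    my $data_root = $ARGV[0];
    my %seen_in   = scan_webs($data_root);
    for my $target (missing_topics($data_root, %seen_in)) {
        print "$target referenced by: ",
            join(" ", sort keys %{ $seen_in{$target} }), "\n";
    }
}
```

It takes the root data directory as its only argument. The output format differs from the original report (no counts, no sorting by frequency), so the print loop would need adapting before it can replace the report section of the script above.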

For completeness, these are the small changes to the script (the part before the foreach loop is unchanged):

foreach $word (keys %seen) {
    if ( -e "$webDataLocation/$word.txt") {
        push (@exists, "$seen{$word} : $word ref'd by : " . (join(" ", sort keys %{$seenIn{$word}})) . "\n");
    } else {
        push (@notexists , "$seen{$word} : $word _referenced by : " . (join(" ", sort keys %{$seenIn{$word}})) . "_\n \%BR\%");
    }
}
$EXISTS = join ("", sort { $b <=> $a } @exists);
$NOTEXISTS = join ("", sort { $b <=> $a } @notexists);
print <<REPORT;
---+ List of non-existing pages
This page automagically lists all pages which are referenced but undefined (i.e. which show with a '?' in TWiki). This list is refreshed every 30 minutes.

*NB:* Do not add pages that are people's names. Those pages will be created when he or she registers in TWiki (in TWiki.TWikiRegistration).

Use this page to discover:
   * which important pages are missing (top of the list)
   * spelling errors in page links (bottom of the list)

---++ Pages that are referenced but undefined
$NOTEXISTS
REPORT

-- JosMaccabiani - 2 Jul 2005

Significant Reworking of Scripts

I started out with this script, but quickly reworked it to handle [[double bracket]] links. I attach this below.
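
The attachment itself is not reproduced in this text. As a rough illustration of what handling these links involves (not the attached script), here is a minimal sketch that pulls the targets out of [[target]] and [[target][label]] links. The sub names are made up, and the rule that a spaced target such as "double bracket" maps to the topic DoubleBracket is an assumption about TWiki's behaviour that should be checked against your version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Turn the target of a [[...]] link into a topic name. Assumption: a target
# containing spaces is joined into one word with each part capitalised;
# any other target is used unchanged.
sub topic_from_link {
    my ($target) = @_;
    $target =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $target unless $target =~ /\s/;
    return join '', map { ucfirst($_) } split ' ', $target;
}

# Return the topic names referenced by [[target]] and [[target][label]]
# links in $text. Targets that look like URLs or mail links are skipped.
sub bracket_links {
    my ($text) = @_;
    my @topics;
    while ($text =~ /\[\[([^\]\[]+)\](?:\[[^\]]*\])?\]/g) {
        my $target = $1;
        next if $target =~ m{^[a-z]+://}i || $target =~ /^mailto:/i;
        push @topics, topic_from_link($target);
    }
    return @topics;
}

print join(", ",
    bracket_links("See [[double bracket]] and [[WebHome][the home page]].")), "\n";
```

Anchors, Web.Topic targets and any other link forms are passed through untouched here, which is exactly the kind of detail that drifts out of step with TWiki's own renderer.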

However, this reworking was not enough to make me happy. The code smell is duplication: the code inside TWiki::Renderer already understands link syntax, and this tool reimplements it. Therefore in DanglingLinksToolNeeded I have attached a script that takes a different approach, using TWiki::Renderer to do the work (and hence, hopefully, staying consistent even if TWiki's link syntax changes).

I will move this discussion to DanglingLinksToolNeeded.

-- AndyGlew - 22 Mar 2006

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| findunwrittentwikipages-old.pl.txt | r1 | 5.3 K | 2006-03-22 21:55 | AndyGlew | Slightly reworked version of script on this page |
Topic revision: r11 - 2006-03-22 - AndyGlew
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.