Currently TWiki has no way of FindingUnwrittenButLinkedPages, as per the question ListingAllUndefinedButUsedWikiWords asked by MartinCleaver.
As a first pass I've written a simple script that goes away and collates the following info:
- WikiWord, Use count, Existing or not, List of existing pages that reference the WikiWord
This is currently done on the fly, but it might be nice to work in a more "make"-like style
and re-use the stored info for any page that hasn't changed since the table was last generated.
The existing script I've knocked up looks like this, but probably needs redoing to be
really useful:
#!/usr/bin/env perl
use strict;
use warnings;

# Data directory of the web to scan (one .txt file per topic).
my $webDataLocation = "/usr/local/httpd/twiki/webs/Projects/data";
my (%seen, %seenIn);

opendir(my $webdir, $webDataLocation) or die "Cannot open $webDataLocation: $!";
while (defined(my $file = readdir($webdir))) {
    next unless $file =~ /\.txt$/;
    open(my $in, '<', "$webDataLocation/$file") or die "Cannot read $file: $!";
    my $slurp = join(" ", <$in>);
    close $in;
    # Reduce the topic text to plain alphanumeric words.
    $slurp =~ s/[^a-zA-Z0-9 ]/ /g;
    foreach my $word (split ' ', $slurp) {
        # Rough WikiWord test: capitals, lower case or digits, capitals again.
        # (The first version of this pattern only accepted exactly two humps.)
        next unless $word =~ /^[A-Z]+[a-z0-9]+[A-Z]+[A-Za-z0-9]*$/;
        $seen{$word}++;
        $seenIn{$word}{$file}++;
    }
}
closedir $webdir;

# Most-used words first, split into topics that exist and topics that don't.
my (@exists, @notexists);
foreach my $word (sort { $seen{$b} <=> $seen{$a} || $a cmp $b } keys %seen) {
    my $line = "$seen{$word} : $word ref'd by : "
             . join(" ", sort keys %{ $seenIn{$word} }) . "\n";
    if (-e "$webDataLocation/$word.txt") { push @exists, $line }
    else                                 { push @notexists, $line }
}
my $EXISTS    = join("", @exists);
my $NOTEXISTS = join("", @notexists);

print <<REPORT;
TWiki Topics Referenced that have Topics defined
$EXISTS
TWiki Topics Referenced that need Writing
$NOTEXISTS
REPORT
It's probably of some use as is, but I think it needs a lot of tarting up and persistent memoisation to be properly useful.
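One possible shape for the "make"-style memoisation, as a rough sketch only: keep each topic's word counts in a cache file together with the topic's modification time, and rescan a topic only when that time has changed. The sub names (scan_topic, scan_web) and the cache file are made up for illustration; Storable is a core Perl module. The WikiWord pattern is a rough approximation, not TWiki's own rule.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw(retrieve nstore);

# Count WikiWord-like words in one topic file (rough pattern, an assumption).
sub scan_topic {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $text = join(" ", <$in>);
    close $in;
    $text =~ s/[^a-zA-Z0-9 ]/ /g;
    my %count;
    $count{$_}++ for grep { /^[A-Z]+[a-z0-9]+[A-Z]+[A-Za-z0-9]*$/ } split ' ', $text;
    return \%count;
}

# Return { "Topic.txt" => { WikiWord => count } }, rescanning only the files
# whose modification time differs from the one recorded in the cache file.
sub scan_web {
    my ($dir, $cacheFile) = @_;
    my $cache = -e $cacheFile ? retrieve($cacheFile) : {};
    my (%fresh, %result);
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    foreach my $file (grep { /\.txt$/ } readdir $dh) {
        my $mtime = (stat "$dir/$file")[9];
        my $entry = $cache->{$file};
        $entry = { mtime => $mtime, words => scan_topic("$dir/$file") }
            unless $entry && $entry->{mtime} == $mtime;
        $fresh{$file}  = $entry;
        $result{$file} = $entry->{words};
    }
    closedir $dh;
    nstore(\%fresh, $cacheFile);   # topics deleted since the last run drop out here
    return \%result;
}

# Hypothetical usage (the cache path is an assumption):
# my $words = scan_web("/usr/local/httpd/twiki/webs/Projects/data", "/tmp/wikiwords.cache");
```

Like make itself, this trusts timestamps, so a topic changed twice within the same second as a scan could be missed. The per-file results can then be merged into the use-count and referenced-by tables as before.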
--
MichaelSparks - 20 Jul 2001
Great. What say we make this a regular report as per WebStatistics?
--
MartinCleaver - 22 Jul 2001
I just noticed this. I think it's also useful for finding weakly linked pages: the topics near the bottom of the "existing" list are the ones that might be hard for people to find using a normal "stumbling" pattern.
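A small sketch of how such a list might be produced, assuming a hash of reference counts like the script's %seen is already available. The sub name weakly_linked and its parameters are invented for this example. It also lists topics that nobody references at all, which never show up in %seen.

```perl
use strict;
use warnings;

# Given reference counts per WikiWord (a hash ref shaped like %seen) and the
# web's data directory, return [topic, count] pairs for existing topics,
# least referenced first. Unreferenced topics get a count of 0.
sub weakly_linked {
    my ($seen, $dir, $limit) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @topics = map { /^(.+)\.txt$/ ? $1 : () } readdir $dh;
    closedir $dh;
    my @sorted = sort { ($seen->{$a} // 0) <=> ($seen->{$b} // 0) || $a cmp $b } @topics;
    splice(@sorted, $limit) if defined $limit && @sorted > $limit;
    return map { [ $_, $seen->{$_} // 0 ] } @sorted;
}

# Hypothetical usage: the ten least referenced topics of the Projects web.
# printf "%d : %s\n", $_->[1], $_->[0]
#     for weakly_linked(\%seen, "/usr/local/httpd/twiki/webs/Projects/data", 10);
```

Note that a topic mentioning its own name counts as a reference here, so the numbers are only a rough guide.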
--
MikeMaurer - 14 Aug 2003
Some tasks crop up again and again:
A clean way to
FindAllLinksInPage is necessary for
The problem is that there are many ways to create links.
(Hmm... this is a wiki -- it's a feature, not a problem!)
To definitely, positively, absolutely cover all cases,
you have to apply all the rules that TWiki applies,
including all installed plugins.
This is not feasible.
This must be delegated to the TWiki renderer itself.
Ways to do this:
- let TWiki render the page completely into a naked template, then de-render all links: sounds ugly and costly in terms of run-time
- provide hooks for all functions emitting links, so you could register to collect all links as you go by: probably too many places to change
The make-approach could make the run-time cost bearable.
Maybe combined with an on-save hook and fields in
ADatabaseCalledTWiki?
If something comes out of this,
I'll be glad to get the missing links into the
TouchGraphAddOn
--
PeterKlausner - 14 Aug 2003
A 3rd solution: make a hook that is called when an unresolved link is found:
- It should not slow TWiki on normal pages
- then you can have a plugin storing all the missing links, triggered by a web-crawling of the site
This could also be helpful to implement "catch-alls" to catch redirected pages or webs.
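To make the idea concrete, here is a stand-alone sketch of such a hook. Nothing in it is an existing TWiki API: register_missing_link_handler and report_missing_link are hypothetical names, and the two calls at the bottom merely simulate what the renderer would do during a crawl.

```perl
use strict;
use warnings;

my @missingLinkHandlers;          # hypothetical registry, not a TWiki API
sub register_missing_link_handler { push @missingLinkHandlers, @_ }

# Stand-in for the point in the renderer where a WikiWord turns out to have
# no topic. With no handlers registered the loop body never runs, so normal
# page views would pay almost nothing.
sub report_missing_link {
    my ($web, $fromTopic, $target) = @_;
    $_->($web, $fromTopic, $target) for @missingLinkHandlers;
}

# A "plugin" that just remembers which topics reference each missing page.
my %missing;
register_missing_link_handler(sub {
    my ($web, $fromTopic, $target) = @_;
    $missing{"$web.$target"}{$fromTopic}++;
});

# Simulated renderer calls, as a crawl of the site would trigger them.
report_missing_link('Projects', 'WebHome',  'NotYetWritten');
report_missing_link('Projects', 'PlanPage', 'NotYetWritten');

foreach my $page (sort keys %missing) {
    print "$page ref'd by : ", join(" ", sort keys %{ $missing{$page} }), "\n";
}
# prints: Projects.NotYetWritten ref'd by : PlanPage WebHome
```

A "catch-all" handler for redirected pages or webs could be registered in the same way, as another sub on the same list.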
--
ColasNahaboo - 14 Aug 2003
Nice work.

But it generates too many hits on the "non-existing" side, since e.g. author names in meta data are normally not fully qualified ("Main."). Ignoring meta data could work...
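A minimal sketch of ignoring meta data, assuming it sits in the topic file on lines that start with %META: (as in the TOPICINFO line carrying the author). The sub name strip_meta and the sample topic text are made up for illustration.

```perl
use strict;
use warnings;

# Drop meta data lines (assumed to start with %META:) so that names stored
# there, such as the author, are not counted as references.
sub strip_meta {
    my ($text) = @_;
    $text =~ s/^%META:.*\n?//mg;
    return $text;
}

# Made-up sample topic text.
my $topic = <<'TOPIC';
%META:TOPICINFO{author="OliverKrueger" date="1104364800" format="1.0" version="1.1"}%
See also WebStatistics.
TOPIC

print strip_meta($topic);   # prints "See also WebStatistics."
```

In the script this would have to run on the raw topic text before the lines are joined with spaces, because the pattern relies on %META: being at the start of a line.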
--
OliverKrueger - 30 Dec 2004