How to do a Reverse Search?

I want to be able to search for topics which do NOT match the search text.

This is for a TWikiApplication: Instead of listing all the changes in a web a la WebChanges, I want to be able to construct a series of changes pages, something like this:

  • List all recently updated topics which are NOT system pages (WebStatistics, WebSearch, WebTopicEditTemplate, etc.)
  • List all recently updated topics which ARE system pages
  • List all recently updated topics which are users' personal JournalEntries
  • ...(or BugReports, or ...)
  • List all topics which are NOT yet categorised (e.g. TopicClassification is not set)
  • etc.

-- MattWilkie - 14 Jul 2003

Answer

Sorry, didn't get very far, maybe somebody else has a workable approach — left my "notes" here FWIW.

Hmm, I thought some of the above (at least the first two) could be done, but I ran into a problem — I assumed that the goals (at least for the first two searches) could be translated to "show pages that start with 'Web'" (system pages) and "show pages that do not start with 'Web'" (non system pages).

The problem is that there are pages that start with "Web" which are not system pages, for example:

I don't know if there are system pages that don't start with Web. One solution (that may require more work than anyone would like) is to rename the system pages (or the non-system pages that start with Web).

If this approach had been workable (i.e., if no non-system pages began with "Web"), some fine tuning of the following would be appropriate. (Like making the searches case sensitive to distinguish between "Web" and "WEB" and limiting the search to say 50 results, and similar.)

With respect to the search for topics that are not yet categorized and other form field data, look at TWiki.FormattedSearch#Table_showing_form_field_values_ — I haven't made any attempt to "negate" the search, so don't know if it can be done or not.

Non System Pages

The following avoids showing any pages that start with "Web". As stated above, the problem is that there are pages that start with Web that are not system pages.

Note: Couldn't get the following to appear properly for a copy and paste — you will probably need to edit or view raw. Fixed, <verbatim> works fine, if you don't mind the really long line. -- WalterMundt - 21 Jul 2003

%SEARCH{ "^[^Web]" regex="on" scope="topic" nosearch="on" nototal="on" header="   * *Page Title: Summary* <p />" format="   * [[$topic]]: $summary <p />" }%

You may need to modify the above search to find an uppercase letter after the [^Web]. An uppercase match might be found using [A-Z]. (You might also need to modify the [^Web] to [^W][^e][^b] or similar.)

Neither of these will work. The given search matches anything whose first character is not "W", "e", or "b", which means that it won't match such topics as WelcomeGuest, which start with "W" but not "Web". With the pattern "^[^W][^e][^b]", you match anything which has none of the following: starts with a "W", has an "e" as the second letter, or has a "b" as the third. Thus not only do topics like the above not match, but such innocuous names as SeeSkin wouldn't either. There's just no way to do this neatly, which is why PeterThoeny is thinking about adding special negation syntax for TWiki below. You COULD do "^([^W]|.[^e]|..[^b])" (the group keeps all three alternatives anchored at the start of the name), which would work, except for "W" and "We", which aren't valid topics anyway. But that's a nasty hack that only really works for very short searches. --WalterMundt - 21 Jul 2003
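To make the difference between these patterns concrete, here is a minimal Perl sketch that runs each of them over a handful of topic names. The topic list and the helper name matching are made up for illustration, and the alternation is written with a group so that every alternative stays anchored at the start of the name. The last line shows the plain "invert the match" filter (what grep -v would do), which is the behaviour actually wanted.

```perl
use strict;
use warnings;

# Sample topic names (illustrative only); the first two start with "Web".
my @topics = qw(WebHome WebStatistics WelcomeGuest SeeSkin BugReports);

# Hypothetical helper: return the sample topics that match a pattern.
sub matching {
    my ($pattern) = @_;
    return grep { /$pattern/ } @topics;
}

# First character is none of "W", "e", "b".
my @char_class = matching(qr/^[^Web]/);

# No "W" first AND no "e" second AND no "b" third.
my @three_classes = matching(qr/^[^W][^e][^b]/);

# No "W" first OR no "e" second OR no "b" third, all anchored at the start.
my @alternation = matching(qr/^(?:[^W]|.[^e]|..[^b])/);

# Inverted match: keep everything that does NOT start with "Web".
my @inverted = grep { !/^Web/ } @topics;

print "^[^Web]:               @char_class\n";
print "^[^W][^e][^b]:         @three_classes\n";
print "^([^W]|.[^e]|..[^b]):  @alternation\n";
print "not /^Web/:            @inverted\n";
```

With these sample names, only the grouped alternation and the inverted match keep WelcomeGuest and SeeSkin while dropping both Web* topics.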

System Pages

The following shows any pages that start with "Web". As stated above, the problem is that there are pages that start with Web that are not system pages.

Note: Couldn't get the following to appear properly for a copy and paste — you will probably need to edit or view raw.

%SEARCH{ "^Web" regex="on" scope="topic" nosearch="on" nototal="on" header="   * *Page Title: Summary* <p />" format="   * [[$topic]]: $summary <p />" }%

Fine Tuning Parameters

Some of these might help with fine tuning, if that becomes appropriate.

order="modified" reverse="on" limit="50" casesensitive="on"

Other

I just ran across an excellent resource for REs: see Power Regexps, Part II, by Simon Cozens. It is the second part of a three-part article; I haven't found the URL for the first part yet. Some of it includes "verbalization" of the commands (similar to "x gets x + 1" for x = x + 1), which I find a very effective tool to learn and remember things.

-- RandyKramer - 14 Jul 2003

Thanks for ideas Randy. I'll chew on them for awhile and see what I can do. There is one thing I should have said though: I'm searching for TWikiForms data. What I want to do is have a page, e.g. WebHome, which shows the last 10 updated topics NOT including any pages which have a TopicClassification of "SystemPage" or "NotReadyForDiscussion" or any other category I want to exclude.

The first article is http://www.perl.com/lpt/a/2003/06/06/regexps.html .

-- MattWilkie - 16 Jul 2003

Not sure if this is possible with the current implementation. Sounds like a -v switch in grep. TWiki's search could be enhanced for that.

Example (TWiki enhanced) regular expression search syntax: sushi;maguro;!ebi would search for "sushi" AND "maguro" BUT NOT "ebi". Seems like a useful enhancement. There was some related discussion in the Codev web.

-- PeterThoeny - 20 Jul 2003
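As a rough illustration of the proposed syntax, the following Perl sketch evaluates an expression such as sushi;maguro;!ebi against a piece of text. It is only a sketch of the idea, not TWiki's implementation: the helper name matches_expression and the sample topic texts are hypothetical, and each term is simply treated as a Perl regular expression.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text satisfies an expression such as
# "sushi;maguro;!ebi". Every plain term must match (AND) and no term
# prefixed with "!" may match (NOT).
sub matches_expression {
    my ( $expression, $text ) = @_;
    for my $term ( split /;/, $expression ) {
        next unless length $term;    # ignore empty terms such as ";;"
        if ( $term =~ /^!(.+)$/ ) {
            my $negated = $1;
            return 0 if $text =~ /$negated/;     # a NOT term matched
        }
        else {
            return 0 unless $text =~ /$term/;    # an AND term is missing
        }
    }
    return 1;
}

# Made-up topic texts, keyed by topic name.
my %topics = (
    LunchMenu  => "sushi with maguro and ebi",
    DinnerMenu => "sushi with maguro and unagi",
    SnackMenu  => "maguro only",
);

my @hits = grep { matches_expression( "sushi;maguro;!ebi", $topics{$_} ) }
    sort keys %topics;

print "@hits\n";
```

Here only DinnerMenu is listed: LunchMenu is rejected by the !ebi term and SnackMenu lacks "sushi".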

I had the problem that my search looked like this:

%SEARCH{"^%META:FORM{name=\"SocialActivityForm.*\""  scope="text" regex="on" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(Owned by) |" }%

But WebTopicEditTemplate also uses the SocialActivityForm, so the search returned it along with the topics it should return. I've just hacked SearchDotPm:

-bash-2.05b$ diff -u lib/TWiki/Search.pm.orig lib/TWiki/Search.pm
--- lib/TWiki/Search.pm.orig    Mon Aug 11 03:58:00 2003
+++ lib/TWiki/Search.pm Mon Aug 11 08:55:45 2003
@@ -307,6 +307,9 @@

         next if ( $noEmpty && ! @topicList ); # Nothing to show for this topic

+        my $exclude="(.*Template)";
+        @topicList = grep (!/$exclude/, @topicList);
+
         # use hash tables for date, author, rev number and view permission
         my %topicRevDate = ();
         my %topicRevUser = ();

Now SEARCH never returns a topic whose name contains "Template". That is definitely bad because it confuses admins, but it will do for now, as it trades that for not confusing users. If anyone would like to contribute the details of assigning to $exclude from a passed-in parameter, I would greatly appreciate the input.

-- MartinCleaver - 11 Aug 2003

Martin's fix is one way to exclude the template topic. Here are two other ways to exclude the template topic without changing the code:

  1. Do not search for the form name; instead, search for a field like TopicClassification. Do not set the classification in the template topic; leave it at "Select one..." or whatever initial state you choose. Then, in the form where you create a new topic based on the template, initialize the field with a hidden input field. See AskedQuestions for an example (append ?raw=on to the URL).
    • Note: This approach also has the advantage that you get "poor man's polymorphism", e.g. you can define multiple forms that share some fields and have other fields unique. For example, if you have a "CarForm", a "TrainForm" and a "BicycleForm", you can search for common fields like "AverageSpeed", and you can also search within just one mode of transportation.
  2. Search for a keyword in the topic, e.g. "Back to MyIndexTopic". The template has that string but escaped with NOP, e.g. Back to My%NOP%IndexTopic. The NOP gets removed when a topic is instantiated based on the template.

-- PeterThoeny - 12 Aug 2003
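The NOP trick in the second approach can be pictured with a few lines of Perl. This is only a sketch: the substitution below imitates the removal of %NOP% that happens when a topic is created from the template, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The keyword the SEARCH looks for.
my $search_string = "Back to MyIndexTopic";

# Text as stored in the template topic: %NOP% splits the keyword.
my $template_text = "Back to My%NOP%IndexTopic";

# Imitate topic creation from the template, which removes %NOP%.
( my $new_topic_text = $template_text ) =~ s/%NOP%//g;

# Plain substring tests, so no regex metacharacters are involved.
my $template_matches  = index( $template_text,  $search_string ) >= 0 ? 1 : 0;
my $new_topic_matches = index( $new_topic_text, $search_string ) >= 0 ? 1 : 0;

print "template matches:  $template_matches\n";
print "new topic matches: $new_topic_matches\n";
```

The raw template text does not contain the keyword, so a text search skips it, while every topic created from the template does contain it.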

The workarounds for this problem are a real pain, as you have to struggle to find something that will not match the template but will match everything generated from it. Adding %NOP% works only sometimes for me. Instead, I made the following changes to the code, which was quite painless. They add a parameter, exclude="TopicNameOne|TopicNameTwo|etc", so that you can explicitly state which topics are excluded from the search. This may be a good idea for a future release of the code as well.

In TWikiDotPm, we need to add the exclude parameter to the %SEARCH{...}% variable and to the call of TWiki::Search::searchWeb() :

    my $attrHeader        = extractNameValuePair( $attributes, "header" );
    my $attrFormat        = extractNameValuePair( $attributes, "format" );
    my $attrExclude       = extractNameValuePair( $attributes, "exclude" );    ## added by RCLutz 10/23/03

    return &TWiki::Search::searchWeb( "1", $attrWeb, $searchVal, $attrScope,
       $attrOrder, $attrRegex, $attrLimit, $attrReverse,
       $attrCasesensitive, $attrNosummary, $attrNosearch,
       $attrNoheader, $attrNototal, $attrBookview, $attrRenameview,
       $attrShowlock, $attrNoEmpty, $attrTemplate, $attrHeader, $attrFormat,
      $attrExclude                                              ## added RCLutz 10/23/03
    );

In SearchDotPm, add the parameter to the parameters passed:

    my ( $doInline, $theWebName, $theSearchVal, $theScope, $theOrder,
         $theRegex, $theLimit, $revSort, $caseSensitive, $noSummary,
         $noSearch, $noHeader, $noTotal, $doBookView, $doRenameView,
         $doShowLock, $noEmpty, $theTemplate, $theHeader, $theFormat,
         $theExclude,                                           ## added RCLutz 10/23/03
         @junk ) = @_;

and exclude topics with the same approach used above by MartinCleaver:

        next if ( $noEmpty && ! @topicList ); # Nothing to show for this topic

        if ($theExclude) {                                      ## Added RCLutz 10/23/03
            @topicList = grep (!/$theExclude/, @topicList);     ##
        }                                                       ##

        # use hash tables for date, author, rev number and view permission

This is also posted to the Codev web: ExcludeWebTopicsFromSearch

-- RaymondLutz - 23 Oct 2003
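One detail worth noting about both patches: grep with an unanchored pattern drops every topic whose name merely contains the pattern. The following Perl sketch contrasts that with matching whole topic names. The helper exclude_topics and the sample topic names are hypothetical, and anchoring the pattern is an assumption about the desired behaviour, not something the patches above do.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every topic whose WHOLE name matches one of
# the "|"-separated alternatives in $exclude.
sub exclude_topics {
    my ( $exclude, @topics ) = @_;
    return @topics unless defined $exclude && length $exclude;
    my $whole_name = qr/^(?:$exclude)$/;
    return grep { !/$whole_name/ } @topics;
}

# Made-up topic names for illustration.
my @topics = qw(WebTopicEditTemplate SocialActivityOne TemplateIdeas WebHome);

# Unanchored: anything containing "Template" is dropped.
my @unanchored = grep { !/Template/ } @topics;

# Anchored: only the exactly named topics are dropped.
my @anchored = exclude_topics( "WebTopicEditTemplate|WebHome", @topics );

print "unanchored: @unanchored\n";
print "anchored:   @anchored\n";
```

In this sample the unanchored filter also loses TemplateIdeas, which is an ordinary topic, whereas the anchored version removes only the two topics that were named.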

ExcludeWebTopicsFromSearch and InvertedSearchFeature (AND NOT search) are now implemented in the latest TWikiAlphaRelease.

-- PeterThoeny - 04 Jan 2004

Just realised I never actually thanked you, here in the topic which started it all, for actually implementing this. :) Thank you Peter!

-- MattWilkie - 18 Feb 2005

You're welcome!

-- PeterThoeny - 21 Feb 2005

Topic revision: r12 - 2005-02-21 - PeterThoeny
 