Question
Update: I've added a similar problem at the bottom of this page, added some questions, and changed the status back to "asked question" -- hope this works.
-- RandyKramer - 29 Sep 2002
I'm creating a lab report table using two types of wiki templated pages: tasks and sub-stories.
A problem I can't seem to figure out is how to change a regex so that it will not match results that have a certain string in them.
My current search string is:
"TOPICPARENT.*name\=.*%TOP1C%"
This automatically grabs all the topics that are children of the current page. I know that pages containing "Story" in their title, or just in their standard text, are of a different type and should not be found. I've seen that the [^ ] notation can be used to exclude individual characters; surely this can be extended to whole strings, or there is some other mechanism for doing it in regex.
Update:
I've got the problem simplified (I think). Since the two kinds of pages use different TWiki forms, the form would be the thing to differentiate the two searches. A search like:
"TOPICPARENT.*name\=.*%TOP1C% (something I can't figure out...) FORM.*name=.*FormName"
should work. I've tried several ways to do that part I can't figure out, which would skip over the page's actual contents. Several examples I've seen regarding regex on other sites say something like:
*[] or *[[:space:][:print:]]
would do that part, but it doesn't work in TWiki.
Is it possible that Windows is causing this not to work? I've thought that might be the case, but everything else works fine for me, except the differences feature (which I haven't worried about since I'm using this as an internal testing system which will get put up on a Linux server once all the bugs are out).
Suggestions?
I realize this isn't specific to TWiki, but it might be a bug, or something that might be considered for future %TWIKIVARIABLE% settings.
- TWiki version: Dec 2001
- Web server: Apache
- Server OS: WinXP
- Web browser: IE6
- Client OS: WinXP/Debian
-- MikeMaurer - 18 Feb 2002
Answer
Mike, here are a couple of pointers, both described or linked in SearchEngineVsGrepSearch
- andgrep - this is a simple Perl script that searches for regex A 'and' regex B in a file (but not in topic name). May now be obsolete since metadata is used for forms, but it may give you some ideas.
- GNU bool - this is a C program that does proper AND and NEAR searching, though not for regexes.
Both are drop-in replacements for egrep/fgrep, so quite easy to experiment with. You may find you want to write a custom version of andgrep if the above tools aren't what you need.
-- RichardDonkin - 14 Mar 2002
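For what it's worth, the "regex A and regex B in the same file" idea can be tried from a shell with plain grep. This is only a rough sketch, not the andgrep script itself: the sample topics, the and_search function name, and the LabReport / TaskForm / StoryForm names are all made up for illustration, and it assumes the form shows up as a FORM metadata line in the topic text.

```shell
# Hypothetical sample topics; file names and contents are invented for this sketch.
dir=$(mktemp -d)
printf '%s\n' '%META:TOPICPARENT{name="LabReport"}%' 'Task text' '%META:FORM{name="TaskForm"}%' > "$dir/TaskOne.txt"
printf '%s\n' '%META:TOPICPARENT{name="LabReport"}%' 'Story text' '%META:FORM{name="StoryForm"}%' > "$dir/StoryOne.txt"
printf '%s\n' '%META:TOPICPARENT{name="OtherPage"}%' 'Task text' '%META:FORM{name="TaskForm"}%' > "$dir/TaskTwo.txt"

# Print the name of every .txt file in directory $1 that has
# a line matching regex $2 AND a line matching regex $3.
and_search() {
    for f in "$1"/*.txt; do
        if grep -q -E "$2" "$f" && grep -q -E "$3" "$f"; then
            basename "$f"
        fi
    done
}

and_search "$dir" 'TOPICPARENT.*name=.*LabReport' 'FORM.*name=.*TaskForm'
```

Each pattern is tested separately against the whole file, so nothing has to "skip over" the page contents between the two metadata lines.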
I've found that a regex search like ^(?!Csic).*mail on some data I've taken from the Wikilearn web works fine in Nedit, but does not work in TWiki -- on WebChanges try an advanced search on topic names with regex checked.
What I expected it to do (and what it does in Nedit) is display all pages with names that include "mail" but do not start with "Csic". It displays no hits -- there should be around 50. (On the perlre man page, "?!" is described as "A zero-width negative look-ahead assertion...".)
Questions:
- Is this because TWiki is possibly using an older version of Perl which does not support this syntax, or does TWiki do some preprocessing of the search query before handing it off to Perl? Or is there some other reason it doesn't work on TWiki?
- Is there a syntax that works? I've tried the following -- some of them come close, but those that come closest exclude topics starting with "Cs" as well as topics starting with "Csic", and that is not what I need.
- ^[^(Csic)].*mail
- ^[^Csic].*mail
- ^[^C][^s][^i][^c].*mail
-- RandyKramer - 29 Sep 2002
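A note on why the bracket attempts above over-exclude: [^...] is a character class, so it rejects single characters rather than the string "Csic". The sketch below shows this with plain grep on a few made-up topic names (the names and the function names are hypothetical, chosen only to illustrate the behaviour).

```shell
# ^[^Csic] rejects any name whose FIRST character is C, s, i or c.
class_attempt() {
    grep '^[^Csic].*mail'
}

# ^[^C][^s][^i][^c] rejects a name if ANY of its first four characters
# equals the letter at the same position in "Csic".
positional_attempt() {
    grep '^[^C][^s][^i][^c].*mail'
}

# Hypothetical topic names, one per line.
names='Csic_email
Cs_email
cat_email
Joic_email
Jo_email'

printf '%s\n' "$names" | class_attempt
printf '%s\n' "$names" | positional_attempt
```

With this input the first pattern keeps Joic_email and Jo_email but drops Cs_email and cat_email; the second keeps cat_email and Jo_email but drops Cs_email and Joic_email. Neither one expresses "does not start with Csic".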
From the command line the following seems to do what you want (if you intend to not match the four-letter string "Csic").
perl -we 'print "Jo" if ("Csic_email" =~ /(?=^(?:(?!^Csic.*$).)*$).*mail/s)'
The Perl Cookbook has this recipe in section 6.17.
I did not investigate whether TWiki does something to your regexps.
-- FrankHartmann - 29 Sep 2002
I'm trying to discourage people from adding their own version of a question at the end of an existing Support page - this is because such questions are usually lacking in information (e.g. TWiki version, TWiki.cfg, error messages, etc). This one is OK, but I don't want to encourage less experienced users of TWiki.org to do this - hence the SupportGuidelines. I think it's OK to add on to an existing page if it is discussing 'how do I' type questions, but not if it's a 'TWiki doesn't work' type question. Not sure if the distinction can be made clear enough, though.
As for your question - you are assuming that TWiki uses Perl to search, when in fact it uses grep. See TWiki.cfg for the command used. Grep doesn't have this sort of negative lookahead feature, though no doubt you can find a Perl grep that does, at some cost in performance.
-- RichardDonkin - 30 Sep 2002
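Outside TWiki, the usual way to get "contains X but does not start with Y" from grep without lookahead is to chain two greps, the second one inverted with -v. The following is a minimal sketch on a list of topic names read from stdin; the function name and the sample names are made up, and whether TWiki's own search can be configured to run two greps this way is not something established on this page.

```shell
# Keep names that contain "mail", then drop those that start with "Csic".
# Reads candidate topic names on stdin, one per line.
mail_not_csic() {
    grep 'mail' | grep -v '^Csic'
}

# Hypothetical topic names.
printf '%s\n' Csic_email Cs_email Jo_email WebHome | mail_not_csic
```

Here Cs_email and Jo_email survive, while Csic_email is dropped by the second grep and WebHome by the first.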
Thanks to you both -- some comments below, but so far it looks like there is no (easy) solution.
Frank: Thanks -- tried your regex a little in TWiki until I recognized that it included the "?!" which Richard pointed out does not work in grep (in addition, I didn't "realize" TWiki used grep instead of Perl for searching).
Richard: Thanks -- I had seen your request about not adding to the bottom of a question but decided this was essentially the same question and it made more sense (to me) to "reopen" the question on the same page than to start a new page with the same question, and I thought that reopening the question (that is, marking it "AskedQuestions") and being careful to (try to) include enough information would make it OK. (I guess I thought it would be better than having duplicate pages asking the same question, but who knows -- if I hadn't found this page I would have started a new one.) In general, I will only reopen an old question if it is (AFAICT) a duplicate of the question or possibly a very appropriate followup question. If the consensus is that this is the wrong thing to do then I'll try not to ever do it.
Thanks for pointing out that TWiki uses grep instead of Perl for searching. (I guess I "knew" that in some sense, but didn't really put two and two together when I was attempting to get the regex to work -- funny how my brain works sometimes (not).) Left it as an "asked" question pending someone stating that there is no regex that will accomplish this in the current version of TWiki.
Digression: Hmm, I just (almost) got bit by one of the (variety of) bugs related to losing your changes when I hit the back button -- I'm trying to write down the sequence of events so I remember it and can (consider) check(ing) against the bug descriptions:
- I'm editing this at one of the local colleges, so I'm not using my normal setup, and I'm probably behind some "massive" firewalls / proxy servers.
- I'm using IE 5.5.
- I edited a page, previewed it, hit the back button to re-edit, and my changes were not there.
- Without thinking, I hit refresh -- no help.
- I finally "recovered" by hitting the forward button (to get back to the preview, which had my changes), saved, and then hit edit to re-edit the page.
Of course, the above may not be a bug, or rather, may be the best that TWiki can do in this environment (IE (5), firewall, proxy, gateway, etc.) -- I'll try to find those bugs again and look at them -- more importantly, I probably need to change my TWiki editing habits when I use the computers at this school.
Just in case it's a useful clue, I was using the default "skin" at the time. (I've never (IIRC) tried a different skin.) -- RandyKramer - 09 Apr 2003
-- RandyKramer - 30 Sep 2002
Re the IE 5 'back button' problems - I've put a pointer to this page on BackFromPreviewStillLosesText. Such problems are quite rare since BackFromPreviewLosesText was implemented in the TWikiAlphaRelease and on TWiki.org, but they can still happen occasionally.
-- RichardDonkin - 29 Nov 2002