Question
I'm attempting to do a logical OR in a formatted search. Using the example data and form from the manual, the following works... sort of:
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "[O]peratingSystem.*?value=\"[O]sHPUX\"|[O]sVersion.*?value=\"2005\"" scope="text" type="regex"
nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
It works if the only operating system selected for a topic is OsHPUX. If two (or more) are selected, the topic doesn't show up in the results.
Any suggestions?
Environment
--
TerrillBennett - 22 Jun 2007
Answer
If you answer a question - or have a question you asked answered by someone - please remember to edit the page and set the status to answered. The status is in a drop-down list below the edit box.
Some unverified ideas...
You may be able to say
[O]peratingSystem.*?value=\".*?[O]sHPUX.*?\"
That would tell it there could be other options selected before or after OsHPUX in the value string.
Another option would be to have it return more results than you need, then use CALC to parse the results and display only the ones you want. Of course, the first approach is far less complex and takes far fewer resources to run.
I hope one of those solutions works for you.
--
MatthewCardozo - 26 Jun 2007
Matthew,
Thanks for your comments. You were exactly right. I had found the solution a couple of days ago and was writing up the following for others to reference. It's long, but those who know Regular Expressions but not how TWiki stores its data will get something of value. Those who do not know Regular Expressions should get even more. Here's my answer, which I've spent two days writing...
I highly recommend the book Mastering Regular Expressions by Jeffrey E. F. Friedl. That said, here we go...
And the solution to the Logical OR question is:
Recall that I am using the forms example found in the TWiki documentation. The data in the four test topics that I created, and to which I attached the form, are as follows:
My search criteria: find any topic that has an OperatingSystem of OsHPUX checked OR an OsVersion of 2005.
Looking at the data above, three (3) records should be returned.
After copying the exact search string from the manual (which only looked for records with an operating system of OsHPUX), I was only getting FormTest3. FormTest1 was missing from the results and should have been included.
Adding my logical OR worked and returned the record with 2005 in it, but the other HPUX records were still missing.
The Answer - The Search String
I'll break the search strings up so they are human readable without scrolling, and highlight the important part:
The Original Search String:
%SEARCH{ "
[O]peratingSystem.*?value=\"[O]sHPUX\"|[O]sVersion.*?value=\"2005\""
scope="text" type="regex" nosearch="on" nototal="on"
format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
The "Fixed" Search String:
My final solution is different, but here's the syntax that initially solved the problem and got me moving in the right direction:
%SEARCH{ "
[O]peratingSystem.*?value=\".*[O]sHPUX.*\"|[O]sVersion.*?value=\"2005\""
scope="text" type="regex" nosearch="on" nototal="on"
format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
You have your search string answer. Just copy, paste, modify. No need to understand "why." You can go home, now.
Before you go, let me point out something you'll miss if you leave now: search this page for the word "hacker."
There is a lack of explanation of Regular Expressions on TWiki, and for good reason: "Regular Expressions" isn't a short topic. The Mastering Regular Expressions book mentioned above is 542 pages, if that tells you anything!
For those of you who don't know Regular Expressions and would like to know why it works... keep reading. Sorry the explanation is so long, but keep in mind this is a simple Regular Expression; be thankful it wasn't something complex!
What are we looking IN?
You must know the format of what you are actually searching. Since it isn't documented, I peeked into the FormTest?.txt files (where ? = 1, 2, 3 or 4) to see if, where and how the data is stored. The data is kept in META statements, with each form field in its own statement. Here is the OperatingSystem field data of two (2) of the records, from FormTest1 and FormTest2:
%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value="OsHPUX, OsSolaris"}%
%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value="OsLinux, OsSolaris, OsWin"}%
It took less than five minutes to "fix" my search string once I knew where, and most important, how the data was stored and formatted.
Breaking It Down - The Regular Expressions
Removing everything that's not important to this discussion, we are left with the following portion of the solution's syntax:
"[O]peratingSystem.*?value=\".*[O]sHPUX.*\"|[O]sVersion.*?value=\"2005\""
The double-quotes on both ends delimit the search string. They're not part of the Regular Expression so throw them out. Simple enough, eh? We are done with the outside quotes.
The [O]: the brackets denote a Regular Expression character set. If we changed it to read [Osx], in Regular Expression syntax it would mean "match any ONE of the characters O, s or x at this position." So our search expression is telling the underlying Perl script we want to match an uppercase "O" here.
The peratingSystem part means exactly what it says: we are looking for the characters peratingSystem in that order.
So far we have: look for an uppercase O followed by the characters peratingSystem.
I really hate to do this to you this early in your study, but: did we really need the brackets? The answer is: No!
- We could have just said: OperatingSystem and been done with it. Think about it.
- Or, we COULD have said [O][p][e][r][a][t][i][n][g][S][y][s][t][e][m] to get the same results... if you like typing a lot.
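To see that the three spellings really are equivalent, here is a small check using Python's `re` module as a stand-in for the Perl engine behind TWiki (an assumption on my part; for plain character sets the two behave the same). The `meta` line is the FormTest1 OperatingSystem field shown earlier.

```python
import re

# The OperatingSystem field line from FormTest1, as stored in the topic file.
meta = '%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value="OsHPUX, OsSolaris"}%'

# Three ways to write the same thing: a one-character set, a plain literal, one set per letter.
patterns = [
    r"[O]peratingSystem",
    r"OperatingSystem",
    r"[O][p][e][r][a][t][i][n][g][S][y][s][t][e][m]",
]
for p in patterns:
    print(p, "->", bool(re.search(p, meta)))  # True for all three

# A set with several members matches any ONE of them at that position.
for word in ["foo", "boo", "zoo"]:
    print(word, "->", bool(re.fullmatch(r"[fb]oo", word)))  # foo and boo True, zoo False
```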
Since we now know the example given in the manual has more syntax than necessary... and since we now understand character sets if we ever really need them (for example, [fb]oo matches both foo and boo), let us simplify the search expression and start over. In our revised expression, so far we've analyzed the leading OperatingSystem portion:
"OperatingSystem.*?value=\".*OsHPUX.*\"|OsVersion.*?value=\"2005\""
The dot ( "." ) in a Regular Expression means: match ANY single character (except a newline, \n) here.
The asterisk ( "*" ) means: repeat the preceding expression zero or more times. So when we put the dot and the asterisk together, we get: zero or more characters (any characters) here.
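A quick check of both claims, again with Python's `re` as a stand-in (it treats "." the same way Perl does without the /s modifier): a carriage return is matched, a newline is not, and ".*" is happy to match nothing at all.

```python
import re

# "." matches any single character except a newline ("\n"); "\r" is matched.
print(bool(re.fullmatch(r"a.c", "abc")))   # True
print(bool(re.fullmatch(r"a.c", "a\rc")))  # True
print(bool(re.fullmatch(r"a.c", "a\nc")))  # False

# ".*" means zero or more of any character, so it can also match the empty string.
print(bool(re.fullmatch(r"x.*y", "xy")))            # True
print(bool(re.fullmatch(r"x.*y", "x anything y")))  # True
```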
The question mark, placed right after the asterisk, means: don't be greedy!
Greedy? Yes, greedy. If I say I'm looking for "a.*d" you might say it means: find the character "a" followed by ANY characters zero or more times, followed by the character "d".
If the string we were searching is:
- Joe and I were trespassing on Bill's land.
Then THE Question is: What string does "a.*d" match???
If your answer is "and", then you're not being greedy. The expression "a.*d" matches:
- and I were trespassing on Bill's land
That's what we mean by "greedy." You said any characters, zero or more times. You didn't say when to stop! So the expression a.*d matches as much as possible, from the first "a" it finds until the last "d" that comes after that "a" in the string!
Change your expression to "a.*?d" (the question mark goes right after the asterisk) and it matches what you expected: the first and (there's an "and" in "land," don't ya know?). With the ?, the expression finds the shortest match, from the first "a" to the first "d" it finds after the "a".
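The three variations side by side on the sentence above, using Python's `re` as a stand-in. Note where the question mark goes: right after the asterisk it makes the star lazy, while a question mark after the d only makes the d optional, so the star stays greedy and runs to the end of the string.

```python
import re

s = "Joe and I were trespassing on Bill's land."

greedy = re.search(r"a.*d", s).group()       # runs to the LAST "d"
lazy = re.search(r"a.*?d", s).group()        # stops at the FIRST "d"
optional_d = re.search(r"a.*d?", s).group()  # "d?" is just an optional d

print(greedy)      # and I were trespassing on Bill's land
print(lazy)        # and
print(optional_d)  # and I were trespassing on Bill's land.
```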
So far we're searching for: find OperatingSystem followed by ANYTHING, but don't be greedy, followed by...
Exactly what it says: the string value=. We could have been looking for "Fred=" or "Mr. Goodfellow=" or "Samba the Magnificent=" but we're not. We're looking for the exact string of characters "value=". The fact that the word value means something in English is insignificant to this expression; it just happens to be the string of characters we're looking for.
The next part says it should be followed by a double-quote: \"
The backslash ( "\" ) is an escape character, telling TWiki not to end the string here; the double-quote is part of the data inside the double-quotes on both ends (the ones we threw away, remember?).
So now we are looking for: find OperatingSystem followed by ANYTHING, but don't be greedy, followed by value= followed by a double-quote followed by...
This is a repeat: any character, zero or more times, and BE greedy. Well, actually, we could have said .*? and not been greedy, and it would have worked, too.
Then we're looking for:
OsHPUX followed by ANYTHING zero or more times followed by a double-quote. See? You're an expert... we can move faster, now!
And now that you are the expert, can you tell us: why didn't the example copied from the manual work correctly?
Here are the two Regular Expressions, the original first, then the "fixed" expression:
value=\"[O]sHPUX\"
value=\".*[O]sHPUX.*\"
The original search string was looking for exactly: double-quote, followed by OsHPUX, followed by double-quote. The only records it could match were those in which OsHPUX was the only operating system selected! Selecting a second system (e.g. OsWin) placed it between OsHPUX and the closing double-quote, and the original expression did not allow for that.
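Here is the difference in practice, running both versions against META lines like the ones shown earlier. This sketch uses Python's `re`; the `\"` escapes are only needed inside a TWiki %SEARCH% string, so the patterns below use plain quotes. The third, single-value line is hypothetical.

```python
import re

prefix = '%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value='
lines = {
    "FormTest1": prefix + '"OsHPUX, OsSolaris"}%',
    "FormTest2": prefix + '"OsLinux, OsSolaris, OsWin"}%',
    "single (hypothetical)": prefix + '"OsHPUX"}%',
}

original = r'OperatingSystem.*?value="OsHPUX"'   # value must be exactly "OsHPUX"
fixed = r'OperatingSystem.*?value=".*OsHPUX.*"'  # other values may sit before/after OsHPUX

results = {name: (bool(re.search(original, line)), bool(re.search(fixed, line)))
           for name, line in lines.items()}
for name, (orig_hit, fixed_hit) in results.items():
    print(f"{name}: original={orig_hit} fixed={fixed_hit}")
# FormTest1: original=False fixed=True
# FormTest2: original=False fixed=False
# single (hypothetical): original=True fixed=True
```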
So our almost final expression for the first half reads:
OperatingSystem.*?value=\".*OsHPUX.*\"
Well, all of the expressions below will work. Pick one, but make absolutely certain you understand the possible consequences to the integrity of your TWiki Application (see the part about hackers below): the farther you go down this list, the easier it becomes for me to subvert the integrity of your application. I'd personally stick with the one above:
OperatingSystem.*?value=.*OsHPUX.*\"
OperatingSystem.*?value=.*OsHPUX.*
OperatingSystem.*?value=.*OsHPUX
O.*S.*?value=.*OsHPUX
O.*?v.*=.*O.*UX
O.*=.*UX
That last one simply SCREAMS "subvert me!" It will work, but if any topic anywhere in your web (form or no form attached) has "Oh gee I wish A was = to UX" on one line, your entire Application is screwed. We'll get back to that...
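To make the risk concrete, here are the patterns from the list above tested against a real META line and against that harmless sentence. This uses Python's `re`, which is case-sensitive here; ignoring case, as TWiki appears to, can only make more lines match.

```python
import re

meta = '%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value="OsHPUX, OsSolaris"}%'
prose = "Oh gee I wish A was = to UX"

# From strictest to loosest, as in the list above (TWiki's \" escapes dropped).
patterns = [
    r'OperatingSystem.*?value=".*OsHPUX.*"',
    r'OperatingSystem.*?value=.*OsHPUX.*"',
    r'OperatingSystem.*?value=.*OsHPUX.*',
    r'OperatingSystem.*?value=.*OsHPUX',
    r'O.*S.*?value=.*OsHPUX',
    r'O.*?v.*=.*O.*UX',
    r'O.*=.*UX',
]

results = {p: (bool(re.search(p, meta)), bool(re.search(p, prose))) for p in patterns}
for p, (on_meta, on_prose) in results.items():
    print(f"{p:40} meta={on_meta} prose={on_prose}")
# Every pattern matches the META line; only O.*=.*UX also matches the sentence.
```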
Logical OR - I think
That brings us to the second half of our expression:
"OperatingSystem.*?value=\".*OsHPUX.*\"
|OsVersion.*?value=\"2005\""
The bar ( "|" ) is the logical OR in TWiki Regular Expressions. Note: I haven't found any TWiki documentation that says so; I made an educated guess when I read this article about Regular Expressions and the AND operator. (The bar is the standard alternation operator in regular expressions generally, so it was a safe guess.)
A logical OR says: match the expression on the left
OR the expression on the right.
If we look at the remainder of the expression, it's a repeat of the first half, except we're searching for OsVersion, anything, value=, double-quote, 2005, double-quote.
Ouch! That was easy!-)
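Putting both halves together: a sketch that runs the whole OR expression over three made-up topics. Python's `re` stands in for TWiki's search, and the topic names and field values are hypothetical. Because "." never matches a newline, each alternative has to match within a single META line.

```python
import re

def topic(os_values: str, version: str) -> str:
    # Builds hypothetical topic text: one META:FIELD line per form field.
    return (
        f'%META:FIELD{{name="OperatingSystem" attributes="" title="OperatingSystem" value="{os_values}"}}%\n'
        f'%META:FIELD{{name="OsVersion" attributes="" title="OsVersion" value="{version}"}}%\n'
    )

topics = {
    "TopicA": topic("OsHPUX, OsSolaris", "95"),
    "TopicB": topic("OsLinux, OsWin", "2005"),
    "TopicC": topic("OsLinux", "98"),
}

# The combined expression, with TWiki's \" escapes removed.
search = r'OperatingSystem.*?value=".*OsHPUX.*"|OsVersion.*?value="2005"'

matched = [name for name, text in topics.items() if re.search(search, text)]
print(matched)  # ['TopicA', 'TopicB']
```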
Logical AND
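In TWiki, the logical AND is the semicolon (;): the search string is split at each semicolon, and a topic must match every piece (the pieces don't have to match on the same line). This is a TWiki convention, not Regular Expression syntax; Perl regular expressions have no AND operator (the usual workaround is lookaheads such as (?=.*A)(?=.*B)), so keep that in mind if you're reading Mastering Regular Expressions.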
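The semicolon behaviour can be mimicked in a few lines. This is a hypothetical Python emulation, not TWiki's code: it splits the search string on ; and requires every piece to match somewhere in the topic text. The topics and field values are made up; the search is adapted from the Logical AND example at the end of this post, with TWiki's \" escapes removed.

```python
import re

def matches_all_parts(topic_text: str, search: str) -> bool:
    # Hypothetical emulation of TWiki's ';' (not TWiki's actual code):
    # split the search on ';' and require every piece to match somewhere in the text.
    return all(re.search(part, topic_text) for part in search.split(";"))

def topic(os_values: str, version: str) -> str:
    # Hypothetical topic text: one META:FIELD line per form field.
    return (
        f'%META:FIELD{{name="OperatingSystem" attributes="" title="OperatingSystem" value="{os_values}"}}%\n'
        f'%META:FIELD{{name="OsVersion" attributes="" title="OsVersion" value="{version}"}}%\n'
    )

# OsWin AND OsVersion 95; the two pieces match on different lines.
search = r'OperatingSystem.*?value=".*OsWin.*";OsVersion.*?value="95"'

print(matches_all_parts(topic("OsLinux, OsSolaris, OsWin", "95"), search))  # True
print(matches_all_parts(topic("OsWin", "2005"), search))                    # False
```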
Final Notes: - Case and Hackers!
It seems that the TWiki Regular Expression engine applies a CaSE iNsENsItiVe match, so matching the case isn't required: operatingsystem and oshpux work just as well as OperatingSystem and OsHPUX. (%SEARCH% is case-insensitive by default; its casesensitive="on" parameter turns that off.) But I'd advise you not to count on that... the engine could change in the future without asking for your permission.
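Python's `re`, by contrast, is case-sensitive unless told otherwise, which makes it easy to see what the case-insensitive behaviour buys you (illustration only; the flag below is Python's, not TWiki's):

```python
import re

meta = '%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value="OsHPUX, OsSolaris"}%'
pattern = r'operatingsystem.*?value=".*oshpux.*"'  # all lowercase

case_sensitive = bool(re.search(pattern, meta))
case_insensitive = bool(re.search(pattern, meta, re.IGNORECASE))

print(case_sensitive)    # False
print(case_insensitive)  # True
```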
About those hackers...
As I said above when giving you the list of possible expressions... it seems TWiki ignores whether or not a topic has the form attached. TWiki just searches every document in your web for the expression! A clever individual (such as yourself) could look at any page that has the form attached, do a little digging into the workings of TWiki Applications Using Forms (you've already done that, here!) and easily subvert your application.
Using our example, if on any page in the entire web I put:
OperatingSystem value="OsHPUX"
then that page will be returned in the results. It will not have correct data (the data portion will be empty), but it will be returned as part of the results. And now your job is to find the page on which someone placed that string!
This could happen inadvertently, through something as simple as the documentation for your Application.
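One way to narrow the hole, sketched in Python and untested inside TWiki, is to tie the pattern to the start of the stored META line and keep the value match inside its quotes with [^"]*. Whether TWiki passes the ^ anchor and the %META text through a %SEARCH% string unchanged is an assumption you would need to verify. The spoofed line is the one from the example above.

```python
import re

meta = '%META:FIELD{name="OperatingSystem" attributes="" title="OperatingSystem" value="OsHPUX, OsSolaris"}%'
spoof = 'OperatingSystem value="OsHPUX"'  # the line someone could type on any page

loose = r'OperatingSystem.*?value=".*OsHPUX.*"'
# Hypothetical tighter pattern: the line must start with the stored META prefix for this
# field, and [^"]* keeps the match inside the value's quotes.
strict = r'^%META:FIELD\{name="OperatingSystem".*?value="[^"]*OsHPUX[^"]*"'

results = {name: (bool(re.search(loose, line)), bool(re.search(strict, line)))
           for name, line in [("meta", meta), ("spoof", spoof)]}
for name, (loose_hit, strict_hit) in results.items():
    print(name, "loose:", loose_hit, "strict:", strict_hit)
# meta loose: True strict: True
# spoof loose: True strict: False
```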
Summary
The TWiki topics on FormattedSearch, Forms and Web Applications Using Forms don't go into much depth. They give you a simple example of a form, how to check a checkbox, one simple search string that only works if exactly one of the three checkboxes is selected, the parameters, etc.
It's understandable why they don't go further, but they don't tell you how (or whether you even can) perform a logical AND, OR, XOR, NOR or NOT.
Experimentation has shown that NOT ( "!" ) doesn't work. I've even tried "(?!xxx)" (negative lookahead) and it doesn't work. If someone figures out which character, if any, is used for NOT, let me know.
Enjoy!
Below is a copy of all my searches, verbatim, for your enjoyment. I hope you learned something about Regular Expressions, and thank you for your undivided attention.
Class dismissed!
!FormTest - Show All Data<br />
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "OperatingSystem.*? value=\".*\"" scope="text" type="regex" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
!FormTest - Logical OR on partial value:<br />
OsSolaris AND OsVersion 9(5 OR 8) (two records; change !FormTest2 to 98 to verify this works):<br />
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "OperatingSystem.*?value=.*OsSolaris.*\";OsVersion.*?value=\"9(5|8)\"" scope="text" type="regex" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
!FormTest - Multiple Values (OR) in One Value<br />
OsSolaris AND OsVersion (95 OR 2005) (three records):<br />
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "OperatingSystem.*?value=.*OsSolaris.*\";OsVersion.*?value=\"(95|2005)\"" scope="text" type="regex" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
!FormTest - Logical OR<br />
OsHPUX or OsVersion 2005:<br />
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "OperatingSystem.*?value=.*OsHPUX.*\"|OsVersion.*?value=\"2005\"" scope="text" type="regex" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
!FormTest - Logical AND (" ; ")<br />
OsWin AND OsVersion 95 (one record):<br />
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "OperatingSystem.*? value=\".*OsWin.*\";OsVersion.*? value=\"95\"" scope="text" type="regex" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
Form Search Results, PublicFAQ:<br />
| *Topic:* | *OperatingSystem:* | *OsVersion:* |
%SEARCH{ "TopicClassification.*? value=\"PublicFAQ\"" scope="text" type="regex" nosearch="on" nototal="on" format="| [[$topic]] | $formfield(OperatingSystem) | $formfield(OsVersion) |" }%
--
TerrillBennett - 27 Jun 2007
Fun read! This should be refactored into a supporting topic for the regular expression search documentation in the TWiki web (as long as it is still needed).
--
FranzJosefGigler - 27 Jun 2007
I'm glad you were able to find a solution that worked for you. I have a couple of comments about your tutorial...
1. The [ ] characters around the O in [O]peratingSystem
- You are correct to say that they are not needed.
- They are used as a trick to make sure that the page containing the search does not get returned as a result of the search itself: that page's text reads "[O]peratingSystem", which the regex [O]peratingSystem does not match.
2. The logical ! does work as a NOT. I use it on a regular basis in some of my more complex searches. If you need to do more complex comparisons on the data you're searching, you can use the SpreadSheetPlugin, which contains most of the logical operations.
3. The .* before and after OsHPUX. In this case there is very little difference between that and .*?; however, in theory the regular expression engine works harder to find the results you want if you omit the ?. It will have found what you want, but because you told it to be greedy, it keeps going: it runs to the end of the line (which isn't far off, since each meta field sits on its own line) and then backtracks until it finds a ". Again, that's not a huge deal here, but since you know that the value property will have a closing ", you could have made the expression non-greedy and it would stop as soon as it reached the first ". This idea may come in handy for someone searching through large files, or lots and lots of them, where optimizing the search is beneficial.
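Point 1 is easy to verify. A sketch with Python's `re` and two hypothetical, abbreviated search pages: a page holding a search also contains the search text itself, so a plain pattern can find its own page, while the bracketed one cannot, because the page text reads "[O]peratingSystem" (an O followed by ], not by p).

```python
import re

# Hypothetical, abbreviated raw text of two pages that each hold a search.
plain_page = '%SEARCH{ "OperatingSystem.*OsHPUX" scope="text" type="regex" }%'
bracket_page = '%SEARCH{ "[O]peratingSystem.*OsHPUX" scope="text" type="regex" }%'

plain_hit = bool(re.search(r"OperatingSystem.*OsHPUX", plain_page))
bracket_hit = bool(re.search(r"[O]peratingSystem.*OsHPUX", bracket_page))

print(plain_hit)    # True: the page matches its own search
print(bracket_hit)  # False: the page text has "O]peratingSystem", never "OperatingSystem"
```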
I can't say whether or not the book you mentioned is any good. I would like to offer a shorter option that is available on the web for free. It was invaluable to my development as a regular expression writer.
http://www.regular-expressions.info/
--
MatthewCardozo - 27 Jun 2007
This is a very good topic. I would like to know which of AND (;), NOT (!) and OR (|) binds tightest.
So does A;B|C mean (A AND B) OR C, or A AND (B OR C)?
Can brackets be used to define the order in which the logic is evaluated?
--
JonathanDorling - 01 Nov 2007
; is stronger, i.e. A;B|C means A AND (B OR C).
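One way to see why: if the search string is split on ; first (my understanding of how TWiki's regex search handles it, not something confirmed on this page), the | can only ever act inside a single piece. A hypothetical Python emulation; under this model brackets also cannot span a ;, since each piece must be a valid regex on its own.

```python
import re

def matches_all_parts(text: str, search: str) -> bool:
    # Hypothetical emulation: split on ';' first, then every piece must match.
    return all(re.search(part, text) for part in search.split(";"))

print("A;B|C".split(";"))                 # ['A', 'B|C']
print(matches_all_parts("C", "A;B|C"))    # False; (A AND B) OR C would be True here
print(matches_all_parts("A C", "A;B|C"))  # True

# "(A;B)|C" splits into "(A" and "B)|C", and "(A" alone is not a valid regex.
try:
    matches_all_parts("A C", "(A;B)|C")
except re.error as exc:
    print("invalid piece:", exc)
```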
--
PeterThoeny - 02 Nov 2007
When I try to use brackets, it doesn't work.
I want to combine these two searches; could you please advise?
"TOPICPARENT{name=\"%INCLUDINGTOPIC%\"};[A]rea.*value\=\".*IT.*\";[O]ther.*value\=\"\""
"TOPICPARENT{name=\"%INCLUDINGTOPIC%\"};[O]ther.*value\=\".*IT.*\""
Thanks Jonathan
--
JonathanDorling - 02 Nov 2007
Sorry, closing this question after more than 30 days of inactivity. Please feel free to re-open if necessary.
--
PeterThoeny - 11 Dec 2007