SID-02335: exclude keywords when web="All" actually includes instead

Status: Answered
TWiki version: 6.0.0
Category: CategorySearch

When I search across All Public Webs (web=ALL), whether with a standard keyword search, an inline search, or an expression search (using ; and !), the results INCLUDE the keywords I have asked to EXCLUDE with a - or ! where applicable.

If I search ONE web at a time, it works as it should.

How do we fix this?

-- TWiki Guest - 2017-10-09

Discussion and Answer

I can confirm this behaviour, and I have a hunch that this is a bug in TWiki itself. It seems to have slipped through the tests because the exclusion does work in the first web of a list, but only in the first web.

The following patch should do the trick (also attached as SID-2335.diff):

Index: lib/TWiki/Search.pm
===================================================================
--- lib/TWiki/Search.pm   (revision 30388)
+++ lib/TWiki/Search.pm   (working copy)
@@ -238,7 +238,8 @@
     $scope = 'text' unless ( $scope =~ /^(topic|all)$/ );
 
     # AND search - search once for each token, ANDing result together
-    foreach my $token ( @$tokens ) {
+    my @tokens = @$tokens;
+    foreach my $token ( @tokens ) {
 
         my $invertSearch = 0;
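
Why copying the token list helps is not spelled out above. A plausible reading, and it is only an assumption about the loop body the diff does not show, is that the loop strips the leading ! from $token in place. Perl's foreach aliases the loop variable to the array elements, so such a change would alter the shared token list, and every web after the first would see the token without its negation marker. The Python sketch below imitates that effect with hypothetical names (search_webs, topics_by_web) and invented topics; it is an analogy, not TWiki code.

```python
def search_webs(topics_by_web, tokens, copy_tokens):
    """Return {web: [matching topic names]} for an AND search with ! negation.

    Hypothetical model of the bug: stripping the '!' marker writes back into
    the token list, the way an aliased Perl loop variable would.
    """
    results = {}
    for web, topics in topics_by_web.items():
        # The patch corresponds to working on a private copy for each web.
        work = list(tokens) if copy_tokens else tokens
        hits = set(topics)
        for i, token in enumerate(work):
            invert = token.startswith("!")
            if invert:
                token = token[1:]
                work[i] = token  # in-place change, seen by later webs if shared
            matching = {name for name, text in topics.items()
                        if token in text.lower()}
            hits = hits - matching if invert else hits & matching
        results[web] = sorted(hits)
    return results


topics_by_web = {
    "Web1": {"A": "tpd setup", "B": "httpd and tpd"},
    "Web2": {"C": "tpd notes", "D": "httpd with tpd"},
}

buggy = search_webs(topics_by_web, ["tpd", "!httpd"], copy_tokens=False)
fixed = search_webs(topics_by_web, ["tpd", "!httpd"], copy_tokens=True)
print(buggy)
print(fixed)
```

Under this model the shared list yields topic A for Web1 but topic D (the one containing httpd) for Web2, while the copied list yields A and C.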

Do you want to file a bug report yourself? Otherwise I'll do it.

-- Harald Jörg - 2017-10-09

Harald,

Thanks for the reply and confirmation. I am having trouble getting registered on this TWiki support site and had started to wonder if this product was actually no longer supported. I would love it if you could file the bug report for future users.

How do I use the code above? I am an end user of the TWiki in my organization. I have been experimenting with inline searches and queries to get around this... but I don't know how to use a .pm file. I might, however, be able to get one of our admins to implement something if you can tell me what to tell them to do with your suggestion.

Thanks, Shannon

-- TWiki Guest - 2017-10-10

PS - I am looking again at what you have above and wondering if I can put that into an inline search query... I'm "okay" with coding so I am still figuring out how to define the search text (which I see as 'text'), the exclude text and the token, which is either the exclude or the web name?

-- TWiki Guest - 2017-10-10

Hello Shannon,

Re registering: We have had similar reports. I guess this is a not-yet-fixed consequence of a recent server move of twiki.org (beyond my reach, I'm not an admin there). But as far as I dare say, the product is supported :). I've filed the bug under Bugs:Item7822, and I'll commit a fix, which might look a bit different in its final version and should be part of the next release.

How to use the code: Unfortunately, this is not something a TWiki user can apply: you need write access to the TWiki code. So, yes, this is for your admins. It is a "patch" file, to be used with the command of the same name under Unix/Linux: in the root directory of your TWiki installation, run patch -p0 < path_to_SID-2335.diff. Put simply, remove the line starting with a - sign and add the lines starting with a + sign; all of this happens in lib/TWiki/Search.pm around line 237.

Of course, TWiki is versatile enough so that you could, in principle, get around that error by creating a macro which splits a search into different searches one-per-web with some voodoo and TWiki:Plugins/SpreadSheetPlugin and invoke that macro instead of a search... but I doubt that's worth the effort given that the patch is rather non-intrusive.

-- Harald Jörg - 2017-10-10

Our admin applied the patch above and the search function has improved! I have an extensive search list to undertake, so I will see if any other issues arise :)

THANK YOU!!

-- TWiki Guest - 2017-10-10

PS - I did something like what you suggested by doing an inline search and creating one separate line where web= was a different web, representing each web. Then I made my text search string = %MYSTRING% and set %MYSTRING%=[whatever] on my personal settings page. This gave me about a dozen web searches at once on the same page :) Does that make sense?

-- TWiki Guest - 2017-10-10

Edit: created a separate line for each web...

It looked like this:

WEB1 (heading 2)

%SEARCH{"%MYSTRING%" web="Web1"}%

WEB2 (heading 2)

%SEARCH{"%MYSTRING%" web="Web2"}%

WEB3 (heading 2)

%SEARCH{"%MYSTRING%" web="Web3"}%

...and so on for about a dozen webs...
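
A list like this has to be edited by hand whenever a web is added. As a minimal sketch, assuming the web names are available from somewhere (the ones below are placeholders), a short Python script can print the block for pasting into a topic. The function name per_web_searches is made up for illustration; ---++ is TWiki's markup for a level-2 heading, and MYSTRING is the preference setting described above.

```python
def per_web_searches(webs, variable="MYSTRING"):
    """Build one level-2 heading plus one inline search per web."""
    blocks = []
    for web in webs:
        blocks.append(f'---++ {web}\n\n%SEARCH{{"%{variable}%" web="{web}"}}%')
    return "\n\n".join(blocks)


page = per_web_searches(["Web1", "Web2", "Web3"])
print(page)
```

The search string itself stays in %MYSTRING%, so only the list of webs needs regenerating.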

-- TWiki Guest - 2017-10-10

FYI, I found a bug in the existing solution... it does not exclude phrases in quotes.

Example: the search tpd -httpd -ntpd -"On-Line TPD" will find all instances of tpd, but exclude results that have httpd or ntpd. It does not exclude results with On-Line TPD in them.
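
To pin down what the expected behaviour is, here is a small model of that search string in Python. It is not TWiki's parser: shlex.split merely stands in for "quotes group words into one phrase", keyword_match is a made-up helper, and the sample topics are invented.

```python
import shlex


def keyword_match(query, text):
    """True if text contains every plain term and none of the -excluded ones."""
    text = text.lower()
    for term in shlex.split(query):
        exclude = term.startswith("-")
        needle = term[1:].lower() if exclude else term.lower()
        # A plain term must be present; an excluded term must be absent.
        if (needle in text) == exclude:
            return False
    return True


query = 'tpd -httpd -ntpd -"On-Line TPD"'
topics = {
    "TpdGuide": "TPD provisioning guide",
    "WebServer": "Restart httpd, then tpd",
    "Catalog": "See the On-Line TPD catalog",
}
expected = [name for name, text in topics.items() if keyword_match(query, text)]
print(expected)
```

In this model only TpdGuide should be listed; with the behaviour reported here, a topic like Catalog would still show up.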

-- TWiki Guest - 2017-10-10

About your solution with a list of single web searches: Yes, of course this makes sense. The only downside is that this is difficult to maintain when a new web comes up, or when you need a different search string or format. That's what I meant when I wrote about "voodoo with Plugins/SpreadSheetPlugin" - this has a learning curve, but could possibly automate such stuff. Again, I doubt that it's worth it.

About your new bug discovery: Yes, indeed, you found yet another bug! Thanks for reporting - I'll file another bug report (Bugs:Item7823). At first glance, there's (again) a simple patch, but in this case I'm not so sure about side effects with other search strings. I'll attach it anyway (SID-2335_2.diff), if you like an adventure :)

What made me think was your statement "I have an extensive search list to undertake". In that case, it might be worth the effort to get acquainted with QuerySearch: it also comes with a learning curve, and the searches are only available through the SEARCH variable, not from the ready-made WebSearch page, but it allows many complex queries that are impossible with the "naive" literal and keyword searches. Your search would look scary at first glance, but I think it can be understood rather easily by decomposing it:

%SEARCH{"(lc(text) ~ '*tpd*') AND NOT (lc(text) ~ '*httpd*') AND NOT (lc(text) ~ '*ntpd*') AND NOT (lc(text) ~ '*on-line tpd*')" type="query"}%
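
Since the search terms change often, a query of this shape can also be generated rather than typed. The helper below is a hypothetical convenience, not part of TWiki: it lowercases each term, wraps it in * wildcards, and joins the clauses with AND / AND NOT. It does no escaping, so terms containing quote characters are not handled. The optional web argument is passed through unchanged.

```python
def build_query_search(include, exclude, web=None):
    """Assemble a type="query" SEARCH from lists of substrings."""
    def clause(term):
        return f"(lc(text) ~ '*{term.lower()}*')"

    parts = [clause(t) for t in include]
    parts += [f"NOT {clause(t)}" for t in exclude]
    query = " AND ".join(parts)
    web_arg = f' web="{web}"' if web else ""
    return f'%SEARCH{{"{query}" type="query"{web_arg}}}%'


search = build_query_search(["tpd"], ["httpd", "ntpd", "On-Line TPD"], web="all")
print(search)
```

The printed line can be pasted into a topic as is.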

-- Harald Jörg - 2017-10-10

Harald,

Thank you so much for all the help on this!

I have tried a few query searches with some unexpected results. Your breakdown above makes perfect sense, and I had wondered if I needed lc(text) in my search. Also, do I only use ~ when there's a wildcard * ? Such that the AND NOT statements do not need the ~ ? And if it doesn't matter whether it is there or not on those, I can just keep them.

How to do this with All Public Webs? Is it still web="All"? Because I think that was failing when I tried the query search.

Thank you also for another patch suggestion. I can ask our admin if he wants to try your attached changes, though your caveat that it may have unintended consequences is heard.

To give you more insight, I need to locate a bunch of documents in a vast TWiki that has had content contributed to it for years. We have a few types of documents that may or may not have been created for each area of our business. Examples are, technical guide, users guide, provisioning guide and troubleshooting. The names of these guides are very inconsistent. They may have an acronym, they may have the full title, they may have the word guide and they may not. There was NO control over how they were named. So I am doing my best to try a large number of search strings and I am constantly altering the string to try searching on something else, and to try to remove stuff that clogs the results.

What would you do? If it's the voodoo answer, the learning curve might be too broad. I am getting mixed results with the search capabilities that I already have, and I can probably trudge through this as is. My biggest impediment is when I need to exclude phrases that are clogging results and it won't let me.

Thanks! Shannon

-- TWiki Guest - 2017-10-11

Hello Shannon,

I'm glad that I can help. At the moment I have some time on my hands to do TWiki stuff, and for me it's still fun :)

Many thanks for the description of your problem. You have my sympathy, I've worked with (big) libraries and archives in the past and am still enthusiastic about things around document management, information lifecycle management and stuff. I think I know pretty well what you're talking about. Quite often I've encouraged people to just submit their contributions without bothering about the formal stuff - that could be bolted on later. From that point of view, your TWiki seems to be a success so far!

Let's start with the technical stuff:

  • The phrase lc(text) is required when you want to search case-insensitively: convert your text to lowercase, and match it against a lowercase test string. TWiki's keyword search is case insensitive, but query search isn't. Note that because of this I've converted your third keyword On-Line TPD to lower case to match against lc(text).
  • The tilde ~ says "matches" and makes sense if you have some sort of wildcard on the right hand side. A pattern of '*foo*' means that there is, somewhere in your text, a "foo". So lc(text) ~ '*foo*' in a query search is almost the same as a search for foo in a keyword search. If there's no wildcard character on the right hand side, then text ~ 'foo' is the same as text = 'foo', which usually is not what you want.
  • The AND NOT phrase simply glues the conditions together. So, in words, the search is for "topics which contain tpd but do not contain httpd, nor ntpd, nor on-line tpd".
  • Search in all public webs with web="all" (lower case "a", because "All" is just a valid name for a web!) should work with query search as with any other type of search. I'm writing "should" because so far you've been pretty successful in spotting bugs which went undetected for a decade or so :)
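
The first two points can be tried out outside TWiki. In the sketch below, Python's fnmatch.fnmatchcase serves as a stand-in for ~ because it uses the same * wildcard idea, and str.lower() plays the role of lc(). This is an approximation for illustration, not TWiki's implementation, and the sample text is invented.

```python
from fnmatch import fnmatchcase

text = "Restart NTPD after editing On-Line TPD settings"

# Without lc(): the comparison is case sensitive, so a lowercase pattern misses.
case_sensitive = fnmatchcase(text, "*ntpd*")

# With lc(): lowercase text matched against a lowercase pattern.
case_insensitive = fnmatchcase(text.lower(), "*ntpd*")

# No wildcard: the pattern has to equal the whole text, just like '='.
no_wildcard = fnmatchcase(text.lower(), "ntpd")

print(case_sensitive, case_insensitive, no_wildcard)
```

Only the middle variant matches, which is why each clause of the query combines lc(text) with a pattern wrapped in asterisks.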

So, what would I do? Well, "it depends". I haven't yet understood what the desired "final result" of your search efforts is, i.e. what you want to have achieved once you've trudged through. Anyway, I think that if voodoo needs to take place, then only to make the results visually appealing.

Usually, in such a situation, I'd fire up a TWiki Application, simply because I've done that by the score (note: you don't need to be an admin for that). TWiki applications add structure to unstructured data, and unstructured data seems to be what you have so far (historical note: this is one of the reasons why I chose TWiki over other wiki engines years ago). You have a target structure (business area, document type), so the question is how to connect this metadata with the topics. The Perl mantra TIMTOWTDI (there is more than one way to do it) applies here as well. Either way, it starts with defining a Form for the metadata. In addition to business area and document type, document management systems usually record the author(s), the status, and maybe other things which depend on your documentation governance. Topics with forms can be queried almost SQL-like with QuerySearch, without depending on any control over topic names. As for how to connect the metadata to the topics, there are several options:

  • Add the form and fill in the data to each topic you've found with your search efforts,
  • Create new topics containing the form for every document which should exist, and then
    • either link to existing documents you've found in another form field, or
    • add an "intelligent" search which dynamically pulls in documents fitting the metadata. However, I'd be very careful with that, because it can't guarantee that new documents will be correctly "found".

In any case, in addition to the technical part you'll need to define a process for future documents, so that they don't break your schema. But I guess that's beyond the scope of technical TWiki support :)

-- Harald Jörg - 2017-10-11

Topic attachments

  • SID-2335.diff (r1, 0.5 K, 2017-10-09 19:27, HaraldJoerg): Patch to fix the issue with negated search in multiple webs
  • SID-2335_2.diff (r1, 0.5 K, 2017-10-10 22:14, HaraldJoerg): A second patch for a search bug... yet to be tested fully.

Topic revision: r8 - 2017-10-11 - HaraldJoerg