Question

I have a problem in a web where search no longer works. The web has too many topics in it for grep, which says:

bash: /bin/grep: Argument list too long

I tried to play around with find and xargs but did not succeed. Could a Unix guru help out?

  • TWiki version: 01 Dec 2001
  • Web server: Apache
  • Server OS: Linux
  • Web browser: N/A
  • Client OS: N/A

-- PeterThoeny - 11 Dec 2001

Answer

The typical approach, which should work on any system with xargs, e.g. Linux and System V Release 4 variants such as Solaris, is:

   find . -type f -name '*.txt' -print | xargs grep -l pattern

I just tried this out on Linux and it works OK. Any use of wildcarded arguments should really go through xargs so that it scales beyond the system limit on the total size of command line arguments (ARG_MAX, which varies from one Unix system to another).

-- RichardDonkin - 12 Dec 2001
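
For anyone wanting to check the limit on their own server, here is a minimal sketch. It prints ARG_MAX and wraps the pipeline above in a function. The name grep_topics is made up for illustration, and -print0 / -0 are extensions found in GNU and BSD find and xargs (assumed available here) that keep file names containing spaces intact.

```shell
# Print the limit, in bytes, on the combined size of arguments and environment.
getconf ARG_MAX

# grep_topics PATTERN
# Hypothetical helper: list the *.txt files under the current directory
# that contain PATTERN.
grep_topics() {
    # -print0 / -0 pass file names NUL-terminated, so spaces are safe.
    # "--" stops option parsing in case PATTERN starts with a dash.
    find . -type f -name '*.txt' -print0 | xargs -0 grep -l -- "$1"
}
```

Run from inside the web's data directory, grep_topics pattern should list the same files as the command above, each still prefixed with ./ as find reports them.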

Thanks Richard, that works from the command line. A sample output is:
./TopicA.txt
./TopicB.txt

Now I need to make it work for TWiki. TWiki specifies the egrep or fgrep command in TWiki.cfg, e.g. $egrepCmd = "/bin/egrep";

Parameters (switches, pattern and scope) are appended by TWiki, e.g. /bin/egrep -i 'pattern' *.txt

It looks like I need a wrapper (call it, say, xegrep) that expects the same parameters as egrep and does the find / xargs stuff. Note that the leading ./ should be stripped and that the number of parameters varies.

-- PeterThoeny - 12 Dec 2001
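
A rough sketch of such a wrapper, written as a shell function, might look like the following. It rests on one important assumption: the file glob has to reach the wrapper as a single quoted argument ('*.txt'). If the shell expands *.txt before starting the wrapper, launching the wrapper fails with the same "Argument list too long" error. The name xegrep comes from the post above; the argument handling is only one possible approach.

```shell
# xegrep [SWITCHES...] PATTERN 'GLOB'
# Hypothetical wrapper: same argument order as egrep, except that the last
# argument must be ONE quoted glob (e.g. '*.txt'), not an expanded file list.
xegrep() {
    if [ "$#" -lt 2 ]; then
        echo "usage: xegrep [switches] pattern 'glob'" >&2
        return 2
    fi
    # Split the arguments: the last one is the glob, the rest go to grep.
    total=$#
    index=0
    for arg in "$@"; do
        index=$((index + 1))
        if [ "$index" -eq 1 ]; then
            set --                    # start rebuilding the argument list
        fi
        if [ "$index" -lt "$total" ]; then
            set -- "$@" "$arg"        # keep switches and pattern
        else
            glob=$arg                 # last argument: the file glob
        fi
    done
    # find supplies the names, sed strips the leading "./", and xargs splits
    # the list into as many grep calls as the argument limit requires.
    find . -type f -name "$glob" -print | sed 's|^\./||' | xargs grep -E "$@"
}
```

For example, xegrep -i -l 'pattern' '*.txt' lists the matching topics without the ./ prefix. Because the switches are passed through untouched, a varying number of parameters is not a problem.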

The wrapper would work, but it would be easier to configure TWiki if you put the commands directly into a TWiki variable, so that $findCmd and $xargsCmd can be changed as appropriate; otherwise the user has to remember to edit the 'xegrep' script when installing TWiki, not just edit TWiki.cfg.

-- RichardDonkin - 13 Dec 2001

You are correct, better to do a clean solution. What I need for now is simply a quick hack on one installation.

Clean solution: Change $egrepCmd to include %SWITCHES%, %PATTERN% and %FILTER% placeholders. Then it is possible to replace the regular egrep with find / xargs.

-- PeterThoeny - 12 Dec 2001

I ran into this problem on the GambasWiki and was able to fix it with a combination of the strategies described here.

In lib/TWiki/Search.pm, I changed the following lines starting at 212:

    if( $theScope eq "topic" ) {
        $cmd = "$TWiki::lsCmd %FILES% | %GREP% %SWITCHES% -- $TWiki::cmdQuote%TOKEN%$TWiki::cmdQuote";
    } else {
        $cmd = "%GREP% %SWITCHES% -l -- $TWiki::cmdQuote%TOKEN%$TWiki::cmdQuote %FILES%";
    }

with the following:

    if( $theScope eq "topic" ) {
        $cmd = "find . -type f -name '%FILES%' -print | perl -pe 's|^\.\/||;' | grep %SWITCHES% -l -- $TWiki::cmdQuote%TOKEN%$TWiki::cmdQuote";
    } else {
        $cmd = "find . -type f -name '%FILES%' -print | perl -pe 's|^\.\/||;' | xargs -n 5 grep %SWITCHES% -l -- $TWiki::cmdQuote%TOKEN%$TWiki::cmdQuote";
    }

It seems to be working well. (We include a "referenced by" search at the bottom of every page so our generated static version used in the help browser is automagically cross-referenced, so it was kinda important.) I'm surprised this doesn't come up more often and wonder if this fix should be standard issue. (Edit: I added the perl -pe clause up above because it fixed some formatting weirdness due to find's property of (correctly) prepending "./" to everything if you search ".".)

-- RobKudla - 27 Jul 2003
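
To make the two less obvious pieces of this patch concrete: xargs -n 5 caps each grep invocation at five file names, and the perl -pe clause only removes the ./ that find prepends. The sketch below uses echo in place of grep so the batching is visible, and sed as a stand-in for the Perl one-liner; strip_dot_slash is a hypothetical name.

```shell
# Seven names, at most five per command: xargs runs echo twice.
printf '%s\n' A B C D E F G | xargs -n 5 echo
# prints:
#   A B C D E
#   F G

# sed equivalent of the perl -pe clause: drop a leading "./" from each line.
strip_dot_slash() {
    sed 's|^\./||'
}

printf '%s\n' ./TopicA.txt ./TopicB.txt | strip_dot_slash
# prints:
#   TopicA.txt
#   TopicB.txt
```

With -n 5, a web of several thousand topics starts one grep per five files. Leaving -n out lets xargs pack as many names into each call as the argument limit allows, which usually means far fewer processes.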

I thank you for posting this and commend your ingenuity, Rob. I also urge the CoreTeam to be cautious when we implement this. There is nothing wrong with this suggestion, but we really ought to isolate all architecture-dependent functionality, in particular where it materialises as external calls. I suggest we route them all through a new TWiki/Arch.pm (ArchDotPm) with subclasses TWiki::Arch::Unix, TWiki::Arch::Windows, etc.

-- MartinCleaver - 27 Jul 2003

You're right, of course; I was assuming when I wrote that fix that TWiki already depended on egrep and ls (for example) being there but I suppose that's why it's $TWiki::lsCmd and %GREP% in the command lines.

I actually wonder if in this case some of that couldn't be avoided by reimplementing what we need of ls and egrep in a platform-independent way. Maybe the way to do that would be to have e.g. TWiki::Arch::Fallback and have a native Perl implementation of external commands that differ by architecture, for those architectures without niceties like xargs.

-- RobKudla - 27 Jul 2003

Indeed, so in turn you are quite right. TWiki and the people using it and supporting it would all benefit if we GetRidOfAllExternalLinkages.

IIRC, someone submitted such a bunch of patches on Codev quite recently, but I forget who, and it is too easy to lose such things on Codev (see PleaseCreateNewCategories).

I strongly suspect that the patches didn't make TWikiAlphaRelease.

-- MartinCleaver - 27 Jul 2003

Putting all external code into TWiki isn't an unalloyed benefit - it would simplify installation, but Perl-based grep is definitely slower, and search is already the slowest feature of TWiki for large installations. Patches to avoid the need to use xargs are very welcome, as are those to use it in a configurable way - I'd suggest a flag in TWiki.cfg called $largeWeb that is set to 1 to force use of xargs (or a slower Perl-based loop for argument processing + grep-launching technique).

As for the development issues, this is best talked about on Codev - everything I am working on (or not) is on TWiki.org, but right now I don't have a lot of time and I'm afraid the same is true of most other CoreTeam members. The best way for people to get onto the CoreTeam is to submit a few high quality patches that are suitable for the core and conform to the PatchGuidelines. So far I haven't seen many such patches...

-- RichardDonkin - 28 Jul 2003
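
The loop-based alternative mentioned above (process the file list in batches and launch grep once per batch) can be sketched without xargs. The post has Perl in mind; the version below is a shell illustration of the same idea only, and grep_batched and its batch-size parameter are hypothetical names. It relies on the fact that a glob expanded inside a for loop never goes through the exec call that enforces the argument limit.

```shell
# grep_batched PATTERN BATCH_SIZE
# Hypothetical helper: list the *.txt files in the current directory that
# contain PATTERN, launching one grep per BATCH_SIZE files.
grep_batched() {
    pattern=$1
    batch_size=$2
    set --                                   # reuse "$@" as the current batch
    for file in *.txt; do
        [ -f "$file" ] || continue           # skip the literal "*.txt" if nothing matches
        set -- "$@" "$file"
        if [ "$#" -ge "$batch_size" ]; then
            grep -l -- "$pattern" "$@" || :  # exit status 1 only means "no match"
            set --
        fi
    done
    # Search whatever is left in the final, partial batch.
    if [ "$#" -gt 0 ]; then
        grep -l -- "$pattern" "$@" || :
    fi
}
```

Something like grep_batched pattern 500 keeps each grep call well inside the limit. Since the names come from the glob rather than from find, there is no ./ prefix to strip.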

This needs to be fixed, see Codev.ArgumentListIsTooLongForSearch

-- PeterThoeny - 10 Sep 2003

The ArgumentListIsTooLongForSearch issue has been fixed on 01 Nov 2003 and is available in the latest TWikiAlphaRelease or TWikiBetaRelease.

-- PeterThoeny - 26 Jan 2004

I implemented the above fix by modifying my Search.pm file, but now the ref-by search performed by the rename script doesn't work. Using the ref-by link seems to work OK, as does a search, but the rename script missed the pages.

Can anyone suggest why this might be? I will upgrade to the latest version, but I'll need to do that under change control, and right now I need to get it working.

-- AlexGarner - 17 Mar 2005

The "ref-by" function does not work because of a few small bugs in the patch above. 1) Instead of hardwiring grep, the command should use %GREP%. This allows egrep to be invoked when needed. 2) Delete the -l option from the first $cmd line. This allows "topic" level searches to work again. 3) Technically, there is a missing backslash in the perl command, although this bug does not cause any harm.

The corrected patch that will fix the broken "ref-by" functionality is:

    if( $theScope eq "topic" ) {
        $cmd = "find . -type f -name '%FILES%' -print | perl -pe 's|^\\.\/||;' | %GREP% %SWITCHES% -- $TWiki::cmdQuote%TOKEN%$TWiki::cmdQuote";
    } else {
        $cmd = "find . -type f -name '%FILES%' -print | perl -pe 's|^\\.\/||;' | xargs -n 5 %GREP% %SWITCHES% -l -- $TWiki::cmdQuote%TOKEN%$TWiki::cmdQuote";
    }

This patch (like the original one above) does not work for RegularExpression searches that include a semicolon (;) which indicates a boolean AND search. It is also slow because of the xargs command. I attach an alternate fix (context diff given by recursive_grep.txt) that solves both problems. This fix works only if you have a version of grep which supports the -r recursive and the --include=PATTERN filter options. Used together, they eliminate the need for xargs. In ad hoc testing on systems with 10,000 to 30,000 documents, the recursive grep is about 6-8 times faster than grep using xargs. However because of other processing overhead, the actual page rendering is only about twice as fast as the xargs version. Still, it is definitely noticeable. With this fix, you might be able to squeeze a few more productive months out of the 01 Feb 2003 version.

-- BrianPark - 25 Jun 2005
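
For readers without the attachment to hand, the core of the recursive approach can be sketched as below. This is not the attached patch, only an illustration of the two grep options it depends on. It assumes GNU grep (for -r and --include), and grep_recursive is a hypothetical name.

```shell
# grep_recursive PATTERN
# Hypothetical helper: list the *.txt files under the current directory that
# contain PATTERN, using a single grep process and no xargs.
grep_recursive() {
    # -r         descend into the directory given as the last argument
    # --include  only search files whose base name matches the glob
    # -l         print the names of matching files only
    # sed        strip the "./" prefix that comes from searching "."
    grep -r -l --include='*.txt' -- "$1" . | sed 's|^\./||'
}
```

Because grep walks the directory itself, no file list is ever placed on a command line, so the argument limit does not come into play. Files that do not match the glob, such as the TopicName.txt,v history files, are skipped.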


Category: TWikiPatches
Topic attachments

Attachment | Revision | Size | Date | Who | Comment
recursive_grep.txt | r1 | 5.6 K | 2005-06-25 04:54 | UnknownUser | Search.pm using recursive grep
Topic revision: r15 - 2005-06-25 - BrianPark
 
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.