
Idea: SEARCH With Regular Expression Sort

Spec TBD.

-- Contributors: PeterThoeny - 05 Oct 2006

Discussion

The discussion below was moved here from AutoIncTopicNameOnSave.

-- PeterThoeny - 05 Oct 2006

I think that the discussion about sort order begs the real question: Rather than padding numbers out to a fixed length for the sorting...

How about extending SORT to handle numeric order?

  • Much as UNIX sort -k 4 -n does

Instead of getting into field numbers, provide a way of extracting a sort key - e.g. a regexp - and then specifying a sort order on that

  • e.g. s/Item\([0-9][0-9]*\).*/\1\t&/
  • and then sort -k 1 -n?
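The extract-a-key-then-sort idea could be sketched in Perl roughly as follows. This is only an illustration, not TWiki code: sort_by_numeric_key is a hypothetical helper name, and the choice to give non-matching names a key of -1 (so they sort first) is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort names by the number captured by $key_re.
# Names that do not match get a key of -1 and therefore sort first.
sub sort_by_numeric_key {
    my ($key_re, @names) = @_;
    return map  { $_->[1] }                       # 3. drop the key again
           sort { $a->[0] <=> $b->[0]             # 2. numeric order on the key,
                  or $a->[1] cmp $b->[1] }        #    name as tie-breaker
           map  { [ /$key_re/ ? $1 : -1, $_ ] }   # 1. extract the key
           @names;
}

my @sorted = sort_by_numeric_key(qr/^Item(\d+)/,
                                 qw(Item10-b Item9-a Item100-c Item2-d));
print join("\n", @sorted), "\n";
```

With the sample names this prints Item2-d, Item9-a, Item10-b, Item100-c, one per line, so no zero-padding of the topic names is needed.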

-- AndyGlew - 02 Oct 2006

Sort numerically by topic name: Not sure how this can be defined in a generic & useful way with a %SEARCH{}%.

-- PeterThoeny - 02 Oct 2006

If you support regexps

  • define a regexp to extract the fields, concatenating them in order from primary through lesser keys
  • concatenate using something standard - tab or the like
  • this defines fields
  • then specify a numeric/alphabetic sort on a field basis.

E.g. Item0-Subject, Item6565-subject

%SEARCH{ topic="Ite*", sort_regexp( s/^Item\([0-9]+\).*/\1\t\&/ ), field1=numeric }%
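A rough Perl sketch of this field-based scheme, under some assumptions: the regexp has one capture group per sort field (primary first), each field is declared 'numeric' or 'alpha', and sort_by_fields is a hypothetical name. None of this is an existing %SEARCH{}% parameter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: $re captures one sort field per group, primary first;
# @$types names the comparison for each field ('numeric' or 'alpha').
sub sort_by_fields {
    my ($re, $types, @names) = @_;
    my @keyed = map { [ $_, [ $_ =~ $re ] ] } @names;   # [ name, [fields] ]
    my $cmp = sub {
        my ($x, $y) = @_;
        for my $i (0 .. $#$types) {
            my ($fx, $fy) = ($x->[1][$i], $y->[1][$i]);
            # Missing fields (no match) compare as 0 or the empty string.
            my $r = $types->[$i] eq 'numeric'
                  ? ($fx // 0)  <=> ($fy // 0)
                  : ($fx // '') cmp ($fy // '');
            return $r if $r;
        }
        return $x->[0] cmp $y->[0];   # full name as final tie-breaker
    };
    return map { $_->[0] } sort { $cmp->($a, $b) } @keyed;
}

my @topics = qw(Item6565-subject Item0-Subject Item12-zeta Item12-alpha);
my @sorted = sort_by_fields(qr/^Item(\d+)-(.*)$/, ['numeric', 'alpha'], @topics);
print join("\n", @sorted), "\n";
```

Here the item number is the primary, numeric key and the subject is the secondary, alphabetic key, so the output is Item0-Subject, Item12-alpha, Item12-zeta, Item6565-subject.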

-- AndyGlew - 05 Oct 2006

This could be useful for some wiki applications, although it is a bit complex to use. We should find a spec that is easy to grasp and is flexible. For example, sort with regex could be on topic name, a form field value, or a regex on topic text.

-- PeterThoeny - 05 Oct 2006

Do we need to specify the regular expression? Just specify "numeric" and let the code figure it out. "numeric" is really a flag that says to sort any embedded numbers as numbers.

I sketched out a test program that sorts a list of items containing either prefixed or postfixed numbers (i.e. 1item or item1). The code figures out which case applies and sorts accordingly.

Here is the test data:
item1
item2
item21
item31
item3
item04
item50
item0005
item100

The Result:
item1
item2
item3
item04
item0005
item21
item31
item50
item100

The test code:

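#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

# Sort comparator for [part1, part2] pairs: the all-digit part is
# compared numerically, the other part as text.
sub by_numeric {
    my($res);

    if( $a->[0] =~ m/^\d+$/ && $b->[0] =~ m/^\d+$/ ){
        # Number first (1item): compare number, then text
        $res = $a->[0] <=> $b->[0];
        return( ($res == 0) ? $a->[1] cmp $b->[1] : $res );
    } elsif( $a->[1] =~ m/^\d+$/ && $b->[1] =~ m/^\d+$/ ){
        # Number last (item1): compare text, then number
        $res = $a->[0] cmp $b->[0];
        return( ($res == 0) ? $a->[1] <=> $b->[1] : $res );
    }
    # Mixed forms (1item vs item1): fall back to a plain text compare
    return( join('', @$a) cmp join('', @$b) );
} # by_numeric

sub Main {
    my(@data, @split);

    @data = <STDIN>;

    foreach ( @data ){
        $_ =~ s/[\r\n]+$//;    # strip trailing CR/LF
        # Split into [number, text] or [text, number]; skip other lines
        if( $_ =~ m/^(\d+)([^0-9]+)$/ || $_ =~ m/^([^0-9]+)(\d+)$/ ){
            push(@split, [$1, $2]);
        }
    }
    print STDOUT "in: ", Dumper(\@data), "split: ", Dumper(\@split), "\n";
    print STDOUT "sort: ", Dumper([ sort by_numeric @split]), "\n";
    print STDOUT "joined:\n", join("\n", map { join('', @$_); } sort by_numeric @split), "\n";
}

Main();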

Is this what you want? Could always be extended to handle embedded numbers if needed.
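One way the extension to embedded numbers might look is a chunk-by-chunk ("natural") comparison. This is a sketch under assumptions, not part of the test program above: natural_cmp is a hypothetical name, and the rule that a digit run compared against text falls back to a plain string compare is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two strings chunk by chunk: digit runs numerically, the rest as text.
sub natural_cmp {
    my ($x, $y) = @_;
    my @xs = $x =~ /(\d+|\D+)/g;    # e.g. "a12b" -> ("a", "12", "b")
    my @ys = $y =~ /(\d+|\D+)/g;
    while (@xs && @ys) {
        my ($cx, $cy) = (shift @xs, shift @ys);
        my $r = ($cx =~ /^\d/ && $cy =~ /^\d/)
              ? $cx <=> $cy
              : $cx cmp $cy;
        return $r if $r;
    }
    # One string ran out of chunks first, or all chunks tied ("04" vs "4").
    return scalar(@xs) <=> scalar(@ys) || $x cmp $y;
}

my @sorted = sort { natural_cmp($a, $b) }
             qw(1item v2rc10 item21 v2rc9 item3 10item item04);
print join("\n", @sorted), "\n";
```

For the sample list this gives 1item, 10item, item3, item04, item21, v2rc9, v2rc10. Prefixed, postfixed and embedded numbers are all handled by the same comparator, so no case detection is needed.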

-- CraigMeyer - 06 Oct 2006

I experimented with extending SEARCH with order=numeric, splitting each topic name into non-numeric, numeric, and whatever is left. It seems to do what you wanted. Here are the code fragments:

sub by_numeric {
    return( $a->[0] cmp $b->[0] ||                  # 1st term non-numeric
            ($a->[1] || 0) <=> ($b->[1] || 0) ||    # 2nd term numeric ('' counts as 0)
            $a->[2] cmp $b->[2]                     # Optional 3rd term non-numeric
            );
} # by_numeric

In Search.pm, in "sub searchWeb", just before if( $sortOrder eq 'modified' ), add:

if( $sortOrder eq 'numeric' ){
    @topicList = map { join('', @$_); } sort by_numeric
                 map { ($_ =~ m/^([^0-9]+)(\d*)(.*)$/) ? [$1, $2, $3] :
                       [$_, '', '']; } @topicList;
} elsif( $sortOrder eq 'modified' ){

-- CraigMeyer - 06 Oct 2006

 
Topic revision: r3 - 2006-10-06 - CraigMeyer
 