KushalKumaran - 2011-09-19:

Regular expressions as implemented by common programming languages are far more powerful than the "regular expressions" that computer science theorists define in connection with finite state automata. There are very good reasons behind this power. Even something as basic as a backreference is beyond the capabilities of regular expressions in the formal sense. I found this: http://dev.perl.org/perl6/doc/design/apo/A05.html, where Larry Wall says: "... generally having to do with what we call "regular expressions", which are only marginally related to real regular expressions".
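
To make the backreference point concrete, here is a minimal sketch in Python (my choice of language for illustration; the thread itself is about Perl). The pattern and the helper name `is_mirrored` are assumptions made up for this example. The language "n a's, one b, then the same n a's" is not regular in the formal sense, yet a single backreference recognizes it.

```python
import re

# \1 must repeat exactly the text captured by group 1, so this pattern
# accepts a^n b a^n for n >= 1. No finite state automaton can do that,
# because it would have to count an unbounded number of a's.
MIRRORED = re.compile(r"^(a+)b\1$")

def is_mirrored(s: str) -> bool:
    """Return True if s is some run of a's, a single b, then the same run."""
    return MIRRORED.match(s) is not None

if __name__ == "__main__":
    print(is_mirrored("aaabaaa"))  # True
    print(is_mirrored("aaabaa"))   # False
```

The matcher pays for this power with backtracking: when `\1` fails, the engine retries shorter captures for `(a+)`.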


PeterThoeny - 2011-09-19:

Thanks for the pointer, Kushal. I see there are dramatic changes/fixes to regexes coming with Perl 6.


EdGrimm - 2011-09-19:

That tagging is a great idea. I've used recursion in some toy code of mine, which works well but has less than wonderful performance, and it has the limitation that the inner parentheses must be processed away before the outer parentheses can be evaluated. I haven't tested the tagging approach yet, but it does have potential.

That having been said, it looks to me like it'd be better to put the tag after the parenthesis. That way, the parenthesis itself indicates that the next character will be the first character of a tag, so no non-parenthesis character can be temporarily mistaken for the start of a tag and trigger backtracking. You then need a sequence to end the tag, but since you can't possibly be looking at a false positive, a single non-digit character will do.

So, instead of

$ROUND-esc-1( $TIME-esc-2( -esc-2), $TIMEDIFF-esc-2( $TIME-esc-3(
  $T-esc-4( R$ROW-esc-5( -esc-5):C$COLUMN-esc-5( -1 -esc-5) -esc-4)
 -esc-3), day -esc-2) -esc-1)

you have

$ROUND(1. $TIME(2. )2., $TIMEDIFF(2. $TIME(3.
  $T(4. R$ROW(5. )5.:C$COLUMN(5. -1 )5. )4.
 )3., day )2. )1.
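
For anyone who wants to experiment, the tag-after-parenthesis form above can be sketched roughly as follows. This is Python rather than the Perl that TWiki uses, and the function names `tag_parens` and `bodies_at_depth` are made up for this sketch; it is not TWiki's actual implementation. The first function writes the nesting depth and a terminating `.` after every parenthesis. The second pulls out the text between a matching pair at a given depth with a plain non-greedy regex and no recursion.

```python
import re

def tag_parens(text: str) -> str:
    """Rewrite '(' as '(N.' and ')' as ')N.', where N is the nesting depth."""
    out = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
            out.append(f"({depth}.")
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            out.append(f"){depth}.")
            depth -= 1
        else:
            out.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '('")
    return "".join(out)

def bodies_at_depth(tagged: str, depth: int) -> list[str]:
    """Return the text between each matching '(N.' ... ')N.' pair for N == depth."""
    # Deeper pairs carry different numbers, so the first ')N.' after '(N.'
    # is the matching close and a non-greedy match is enough.
    pattern = re.compile(rf"\({depth}\.(.*?)\){depth}\.", re.DOTALL)
    return pattern.findall(tagged)

if __name__ == "__main__":
    formula = "$ROUND( $TIME( ), $TIMEDIFF( $TIME( ), day ) )"
    tagged = tag_parens(formula)
    print(tagged)
    print(bodies_at_depth(tagged, 2))
```

One caveat of this sketch: it does not guard against formula text that happens to look like a closing tag, such as a literal `)2.` produced by `)` followed by `2.5`. Real code would need an escape for that case.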


PeterThoeny - 2011-09-20:

Thanks, Ed, for your insights. I'm not sure how much performance you gain by swapping the order of the parenthesis and the escape token. The -esc- was just for visuals; in reality it is a null character. Does it make a difference in performance if you scan for .N( vs (N.? Your approach is better in the sense that you can take any non-digit character to terminate the sequence, vs. needing a null character to start the sequence.


Topic revision: r5 - 2011-09-20 - PeterThoeny
 
