Formatted Search Regex Cant Return Rest Of Page
It is not possible with the Regex syntax supported and documented for use in FormattedSearch to skip the top of a topic up to a point and then return the remainder of the topic. This is documented correctly as a limitation, but I don't understand why we need to live with this limitation!
I am placing a unique marker on the page to illustrate this problem.
FIRST MARKER
And some text in the middle. The following is the sample for the email example.
2ND MARKER
According to the definition of the regex used in FormattedSearch, it is possible to return:
- the entire topic, using the $text variable.
- A portion of the page that is framed by two unique string sequences, returning the part in between:
$pattern(.*?FIRST MARKER(.*?)2ND MARKER.*)
%SEARCH{"FIRST MARKER" topic="%TOPIC%" format="$pattern(.*?FIRST MARKER(.*?)2ND MARKER.*)" nonoise="on"}%
Result:
And some text in the middle. The following is the sample for the email example.
- A portion of the page that is framed by two unique string sequences, returning the frame as well:
$pattern(.*?(FIRST MARKER.*?2ND MARKER).*)
%SEARCH{"FIRST MARKER" topic="%TOPIC%" format="$pattern(.*?(FIRST MARKER.*?2ND MARKER).*)" nonoise="on"}%
Result:
FIRST MARKER
And some text in the middle. The following is the sample for the email example.
2ND MARKER
- the first part of the page, up to a single unique pattern:
$pattern((.*?)2ND MARKER.*)
%SEARCH{"FIRST MARKER" topic="%TOPIC%" format="$pattern((.*?)2ND MARKER.*)" nonoise="on"}%
Result:
Formatted Search Regex Cant Return Rest Of Page
It is not possible with the Regex syntax supported and documented for use in FormattedSearch to skip the top of a topic up to a point and then return the remainder of the topic. This is documented correctly as a limitation, but I don't understand why we need to live with this limitation!
I am placing a unique marker on the page to illustrate this problem.
FIRST MARKER
And some text in the middle. The following is the sample for the email example.
- a portion of one line, from a unique sequence to the end of that line:
$pattern(.*?\*.*?Email\:\s*([^\n\r]+).*)
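The working patterns above can be tried outside TWiki with a minimal Python sketch. It assumes that $pattern() replaces the first match of the regex with its first capture group and lets "." match newlines, and it uses Python's re module in place of TWiki's Perl regexes. The names extract_pattern, TOPIC and EXAMPLES, and the e-mail bullet in the sample text, are hypothetical. The last line hints at why a pattern must cover the whole text: in this model, whatever lies outside the match is kept.

```python
import re

# Hypothetical stand-in for the topic text. The e-mail bullet is invented,
# because the page's own e-mail sample is not visible here.
TOPIC = (
    "Intro text.\n"
    "FIRST MARKER\n"
    "And some text in the middle.\n"
    "2ND MARKER\n"
    "   * Name: Jane Doe\n"
    "   * Email: jane@example.com\n"
    "More text.\n"
)


def extract_pattern(text: str, pattern: str) -> str:
    """Hypothetical model of $pattern(): replace the first match of `pattern`
    with capture group 1 ('.' also matches newlines); no match gives ''."""
    result, count = re.subn(pattern, r"\1", text, count=1, flags=re.DOTALL)
    return result if count else ""


# The working patterns listed above, written as Python regexes.
EXAMPLES = {
    "between markers": r".*?FIRST MARKER(.*?)2ND MARKER.*",
    "including markers": r".*?(FIRST MARKER.*?2ND MARKER).*",
    "up to a marker": r"(.*?)2ND MARKER.*",
    "rest of one line": r".*?\*.*?Email\:\s*([^\n\r]+).*",
}

for name, pattern in EXAMPLES.items():
    print(f"{name}: {extract_pattern(TOPIC, pattern)!r}")

# Without the leading .*? and trailing .*, the text outside the match survives.
print(extract_pattern(TOPIC, r"(FIRST MARKER.*?2ND MARKER)") == TOPIC)  # True
```

In this model the lazy .*? pieces stop at the first occurrence of the following marker, which is why the "up to a marker" result ends just before the first 2ND MARKER.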
However, it is not possible to do the following:
- return the last part of the page after a single unique pattern. You would like to use this:
$pattern(?.*FIRST MARKER(.*))
%SEARCH{"FIRST MARKER" topic="%TOPIC%" format="$pattern(?.*FIRST MARKER(.*))" nonoise="on"}%
Result:
)
As you can see, this doesn't work. I've fiddled with the syntax forever, but indeed, it is documented that it won't work, because $pattern() needs .* at the end. Its parsing is so crude that it stops at the first .*) it sees, treats that ) as the closing parenthesis of $pattern(), and then outputs the real closing ) as literal text.
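The stray ) in the result can be reproduced with a small Python model. It assumes, following the reply in the Discussion, that the scanner takes everything after "$pattern(" up to the first .*) as the regex, that the first whole match is replaced by capture group 1, and that an invalid regex produces an empty string. PATTERN_TOKEN, extract_pattern and render_format are hypothetical names, not TWiki's actual code.

```python
import re

# Hypothetical scanner: the regex ends at the FIRST ".*)" after "$pattern(".
PATTERN_TOKEN = re.compile(r"\$pattern\((.*?\.\*)\)", re.DOTALL)


def extract_pattern(text: str, pattern: str) -> str:
    """Replace the first match with group 1; '' on no match or invalid regex."""
    try:
        result, count = re.subn(pattern, r"\1", text, count=1, flags=re.DOTALL)
    except re.error:
        return ""
    return result if count else ""


def render_format(fmt: str, text: str) -> str:
    """Expand every $pattern(...) token found by the naive scanner."""
    return PATTERN_TOKEN.sub(lambda m: extract_pattern(text, m.group(1)), fmt)


TOPIC = "Top of page.\nFIRST MARKER rest of the page.\n"

for fmt in ("$pattern(?.*FIRST MARKER(.*))", "$pattern(.*?FIRST MARKER(.*))"):
    regex = PATTERN_TOKEN.search(fmt).group(1)
    print(f"{fmt} -> regex {regex!r} -> output {render_format(fmt, TOPIC)!r}")
```

In this model both spellings lose their inner closing parenthesis to the scanner, so the captured regex is unbalanced, the extraction yields nothing, and only the leftover ) reaches the page, matching the result shown above.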
-- Contributors: RaymondLutz - 25 Jul 2007
Discussion
Why the heck does the comment plugin not work here?
-- KennethLavrsen - 25 Jul 2007
Maybe it has something to do with
but don't ask me how this can be.
-- FranzJosefGigler - 25 Jul 2007
Yes, a .*) is not allowed inside the regex, since this sequence is used to determine the end of the regex pattern. As a workaround, you can scan and capture greedily over all characters with a different syntax. For example, instead of capturing (.*) you can capture ([^\x01]*). This assumes that the hex character 01 does not occur in the text.
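A quick Python check illustrates why the workaround is safe, using Python's re module as a stand-in for TWiki's Perl regexes (an assumption). A negated character class such as [^\x01] matches newlines even without a DOTALL flag, so ([^\x01]*) captures the same text as (.*) with DOTALL, yet its spelling contains no .*) for the scanner to stop at. The last line shows what the stated assumption guards against.

```python
import re

text = "line one\nline two\n"

# (.*) needs DOTALL to cross newlines; a negated class matches them anyway.
dot_all = re.fullmatch(r"(.*)", text, re.DOTALL).group(1)
not_x01 = re.fullmatch(r"([^\x01]*)", text).group(1)
print(dot_all == not_x01 == text)  # True

# Only the first spelling contains the ".*)" sequence that ends the pattern.
print(".*)" in "(.*)", ".*)" in r"([^\x01]*)")  # True False

# The stated assumption matters: a \x01 in the text stops the capture.
print(repr(re.match(r"([^\x01]*)", "before\x01after").group(1)))  # 'before'
```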
- return the last part of the page, after the last occurrence of a pattern:
$pattern(.*FIRST MARKER([^\x01]*).*)
%SEARCH{"FIRST MARKER" topic="%TOPIC%" format="$pattern(.*FIRST MARKER([^\x01]*).*)" nonoise="on"}%
Result:
(after marker)
and this is the end of the topic.
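The workaround can be replayed in a minimal Python model of $pattern(), under these assumptions: the format scanner ends the regex at the first .*) (as described above), the first whole match is replaced by capture group 1 with "." matching newlines, and an invalid regex yields an empty string. PATTERN_TOKEN, extract_pattern, render_format and the sample text are hypothetical. Because the leading .* is greedy, it backtracks only as far as the last FIRST MARKER, so the result starts after the final occurrence.

```python
import re

# Hypothetical scanner: the regex ends at the FIRST ".*)" after "$pattern(".
PATTERN_TOKEN = re.compile(r"\$pattern\((.*?\.\*)\)", re.DOTALL)


def extract_pattern(text: str, pattern: str) -> str:
    """Replace the first match with group 1; '' on no match or invalid regex."""
    try:
        result, count = re.subn(pattern, r"\1", text, count=1, flags=re.DOTALL)
    except re.error:
        return ""
    return result if count else ""


def render_format(fmt: str, text: str) -> str:
    """Expand every $pattern(...) token found by the naive scanner."""
    return PATTERN_TOKEN.sub(lambda m: extract_pattern(text, m.group(1)), fmt)


TOPIC = (
    "FIRST MARKER\n"
    "And some text in the middle.\n"
    "This is the last occurrence of: FIRST MARKER (after marker)\n"
    "and this is the end of the topic.\n"
)

workaround = r"$pattern(.*FIRST MARKER([^\x01]*).*)"
naive = r"$pattern(.*FIRST MARKER(.*).*)"

print(repr(render_format(workaround, TOPIC)))
# ' (after marker)\nand this is the end of the topic.\n'
print(repr(render_format(naive, TOPIC)))
# '.*)'  (the scanner cut the regex short, so the rest leaks out)
```

The naive spelling fails the same way as the original attempt: the scanner stops at the (.*) group, the truncated regex is invalid, and the unparsed tail .*) is printed verbatim.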
-- PeterThoeny - 28 Jul 2007
This is the last occurrence of: FIRST MARKER (after marker)
and this is the end of the topic.