| Author: | Mark Nodine |
|---|---|
| Contact: | mark.nodine@mot.com |
| Revision: | 1.3 |
| Date: | 2005-05-31 |
| Copyright: | This document has been placed in the public domain. |
This document explains how to write new modules to extend trip. There are two principal mechanisms by which trip can be extended: adding new plug-in directives and adding new writers. For either of these tasks, the programmer should be familiar with the DOM data structure and the DOM.pm subroutines.
To add a plug-in directive, the programmer should be familiar with the RST::Directive:: routines starting with RST::Directive::arg_check, as well as the RST:: routines RST::system_message and RST::Paragraph.
A plug-in directive can be added by creating a Perl module with the same name as the directive (with a ".pm" extension, of course). The Perl module must have a BEGIN block which registers the routine to call to process the directive using RST::Directive::handle_directive.
For example, the if plug-in directive registers its handler as follows:
BEGIN {
    RST::Directive::handle_directive('if', \&RST::Directive::if::main);
}
Whatever routine you designate will get called with the following arguments:
- $name: The directive name. This argument is useful if you use the same routine to process more than one directive with different names.
- $parent: Pointer to the parent DOM object. It is needed to add new DOM objects to the parent's contents.
- $source: A string indicating the source for the directive. If you call RST::Paragraphs to parse reStructuredText recursively, you should supply it a source like "$name directive at $source, line $lineno".
- $lineno: The line number indicating where in the source this directive appears.
- $dtext: The directive text in a format suitable for parsing by RST::Directive::parse_directive. It consists of only the arguments, options, and content sections.
- $lit: The complete literal text of the explicit markup containing the directive. Used for generating error messages.
The directive's routine returns any DOM objects representing system messages. It will usually also have the side effect of appending new DOM objects to the parent object's contents.
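The calling convention described above can be sketched as a skeleton plug-in. This is only a minimal illustration under stated assumptions: the hello directive, its package, and the %HANDLER registry are hypothetical, and RST::Directive::handle_directive is replaced by a stand-in so that the snippet runs outside trip. Inside trip, the plug-in would live in its own hello.pm and use the real registration routine.

```perl
use strict;
use warnings;

# Stand-in for trip's registration routine so this sketch runs on its own.
# The %HANDLER registry is hypothetical, not trip's actual data structure.
package RST::Directive;
our %HANDLER;
sub handle_directive {
    my ($name, $routine) = @_;
    $HANDLER{$name} = $routine;
}

# Hypothetical plug-in package for a "hello" directive.
package RST::Directive::hello;

BEGIN {
    # Register the handler at compile time, as the if directive does.
    RST::Directive::handle_directive('hello', \&RST::Directive::hello::main);
}

sub main {
    # The six arguments every directive handler receives.
    my ($name, $parent, $source, $lineno, $dtext, $lit) = @_;
    # A real handler would parse $dtext here and append new DOM
    # objects to $parent. This sketch does neither.
    return;    # empty list: no system messages to report
}

package main;

# Simulate the call that trip would make when it meets the directive.
my $handler  = $RST::Directive::HANDLER{hello};
my @messages = $handler->('hello', undef, 'example.rst', 1, '', '.. hello::');
print "registered; ", scalar(@messages), " system messages\n";
```

Because the handler returns an empty list, the caller sees no system messages; any DOM objects it did return would be treated as messages to report.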
The first thing the directive's routine will usually do is to call RST::Directive::parse_directive as follows:
my $dhash = RST::Directive::parse_directive($dtext, $lit, $source, $lineno);
If the directive encounters any parse errors (wrong number of arguments, content that is missing or not allowed, etc.), it is recommended that it return a system_message DOM object formatted with RST::Directive::system_message, which labels the message as having come from that specific directive.
The package is also responsible for providing the documentation that appears when the user runs trip -h. Any text in the Perl module between =begin Description and =end Description within a POD section is printed as the module's help. For example, here is the help documentation from the if directive:
=pod
=begin reST
=begin Description
Executes its argument as a perl expression and returns its
content if the perl expression is true. The content is
interpreted as reStructuredText. It has no options. It processes
the following defines:
-D perl='perl-code'
                Specifies some perl code that is executed prior
                to evaluating the first perl directive. This
                option can be used to specify variables on the
                command line; for example::

                  -D perl='$a=1; $b=2'

                defines constants ``$a`` and ``$b`` that can
                be used in the perl expression.
-D trusted      Must be specified for if directives to use any
                operators normally masked out in a Safe environment.
                This requirement is to prevent an if directive in a
                file written elsewhere from doing destructive things
                on your computer.
=end Description
=end reST
=cut
Note
The help text should parse correctly as reStructuredText, since it is passed through trip to create the web documentation.

The output from a writer is generated by traversing the DOM tree recursively. There can be multiple phases of traversal, and the value produced by the top-level DOM object in the final phase is what actually gets written to the output.
Each writer exists in a file that is the writer's name with the extension .wrt. A .wrt file has a special write schema format specifically designed to make development of writers easy. Here is a BNF for the write schema file format:
parser := phase_list
phase_list := phase_desc | phase_list phase_desc
phase_desc := phase id eq '{' NL sub_list NL '}' NL
phase := 'phase' |
eq := '=' |
sub_list := sub_desc | sub_list sub_desc
sub_desc := sub id eq '{' NL perl_code NL '}' NL
sub := 'sub' |
An id is any sequence of non-space characters. NL is a newline. perl_code is the perl code for a subroutine. Note that the words "phase" and "sub" are optional, as is the equal sign ("=") between the id and the open brace.
The ids associated with phases are arbitrary. The phases are executed in the order they appear in the file.
The names of the subroutines are regular expressions to match the tag field in the DOM structure. The first subroutine in the phase whose regular expression matches the tag field of the DOM object to be processed is the one that is called, and is referred to as the handler for that tag. The handlers are called in a post-order traversal of the tree; in other words, once all of the children (members of the content field) of a DOM object have had their handlers called, the DOM object's own handler is called. The arguments of the subroutine are:
- $dom: A reference to the DOM object being processed.
- $str: The concatenation of the strings returned by the handlers of all the children of the DOM object being processed.
- $parent: A reference to the parent of the DOM object.
The subroutine needs to return a string that is the combined result of processing all the layers from the DOM on down (assisted, of course, by the $str argument). The result returned by the subroutine gets cached in the val field of the DOM object for future use, as well as being propagated as part of the $str argument of the parent's handler routine.
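The traversal described above can be sketched in plain Perl. This is a simplified model rather than trip's actual implementation: DOM objects are ordinary hashes with tag, content, text, and val fields, the process routine and the handler table are hypothetical, and anchoring the regular expression to the whole tag is an assumption.

```perl
use strict;
use warnings;

# Handlers in file order; the first regular expression that matches wins.
my @handlers = (
    [ '#PCDATA', sub { my ($dom, $str, $parent) = @_; return $dom->{text} } ],
    [ '.*',      sub { my ($dom, $str, $parent) = @_;
                       return "<$dom->{tag}>$str</$dom->{tag}>" } ],
);

# Hypothetical driver: visits the children first (post-order), then
# calls the handler for the node itself.
sub process {
    my ($dom, $parent) = @_;
    # $str is the concatenation of the children's handler results.
    my $str = join '', map { process($_, $dom) } @{ $dom->{content} || [] };
    foreach my $h (@handlers) {
        my ($re, $code) = @$h;
        # Assumption: the expression must match the entire tag.
        next unless $dom->{tag} =~ /^(?:$re)$/;
        # Cache the result in the val field and pass it up to the parent.
        return $dom->{val} = $code->($dom, $str, $parent);
    }
    # Assumption: with no matching handler, pass the children's text through.
    return $dom->{val} = $str;
}

my $tree = {
    tag     => 'document',
    content => [
        { tag     => 'paragraph',
          content => [ { tag => '#PCDATA', text => 'Hello', content => [] } ] },
    ],
};

print process($tree, undef), "\n";
# prints <document><paragraph>Hello</paragraph></document>
```

Each handler sees only its own node and the already-combined output of its children, which is why a catch-all `.*` handler placed last is enough to cover every tag without a more specific entry.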
Options to the writer can be specified using a -W define, which has the format
-W var[=val]
If no value is supplied, then the value defaults to 1. Any such defines become available to the writer directly in a variable $var.
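The defaulting rule can be illustrated with a short sketch. The parse_define routine, the %define hash, and the width define are hypothetical; trip itself makes each define available as a variable $var, and how it does so is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: split "var[=val]" into a name and a value,
# defaulting the value to 1 when no "=val" part is given.
sub parse_define {
    my ($arg) = @_;
    my ($var, $val) = split /=/, $arg, 2;
    $val = 1 unless defined $val;
    return ($var, $val);
}

# Collect the defines in a hash for illustration only.
my %define;
foreach my $arg ('nobackn', 'width=80') {
    my ($var, $val) = parse_define($arg);
    $define{$var} = $val;
}

print "$_ = $define{$_}\n" foreach sort keys %define;
# prints:
# nobackn = 1
# width = 80
```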
As an example, here is the code for the dom writer:
phase PROCESS {
    sub \#PCDATA = {
        my ($dom, $str) = @_;
        if (! $nobackn) {
            $dom->{text} =~ s/\n\n/\n\\n\\\n/g;
            $dom->{text} =~ s/ $/ \\n\\/;
        }
        $dom->{text} .= "\n" unless substr($dom->{text},-1) eq "\n";
        return $dom->{text};
    }
    sub .* = {
        my ($dom, $str) = @_;
        $str =~ s/^/ /mg unless $str eq '';
        my $attr = defined $dom->{attr} ?
            join('',map(qq( $_) . (defined $dom->{attr}{$_} ?
                                   qq(="$dom->{attr}{$_}") : ""),
                        sort keys %{$dom->{attr}})) : '';
        my $internal = '';
        if (defined $dom->{internal} && %{$dom->{internal}}) {
            my $int = $dom->{internal};
            $internal = " .. internal attributes:\n";
            my $spaces = (" " x 9);
            $internal .= "$spaces.transform: $int->{'.transform'}\n";
            $internal .= "$spaces.details:\n";
            my $key;
            foreach $key (sort keys %{$int->{'.details'}}) {
                my $val = $int->{'.details'}{$key};
                my $string;
                if (ref($val) eq 'DOM') {
                    $string = main::ProcessDOM($val);
                    $string =~ s/^/$spaces /mg;
                    $string = "\n$string";
                }
                elsif ($val eq "") { $string = " None\n" }
                else { $string = " $val\n" }
                $internal .= "$spaces $key:$string";
            }
        }
        return "<$dom->{tag}$attr>\n$str$internal";
    }
}
This example is perhaps not typical, since it needs to call the internal main::ProcessDOM routine in order to process the DOM objects in the internal .details field of the DOM; most writers should have no need to do so.
It is also up to the writer to provide the documentation that appears when the user runs trip -h. Any comment in the writer appearing in a POD (Perl's Plain-Old-Documentation) Description section is printed for the writer's help. For example, here is the help documentation from the dom writer:
=pod
=begin reST
=begin Description
This writer dumps out the internal Document Object Model (DOM, also
known as a doctree) in an indented format known as pseudo-XML. It
is useful for checking the results of the parser or transformations.
It recognizes the following defines:
-W nobackn      Disables placing "\\n\\" at ends of lines that would
                otherwise end in whitespace.
=end Description
=end reST
=cut
Note
The help text should parse correctly as reStructuredText, since it is passed through trip to create the web documentation.
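Both plug-in directives and writers supply their help text in a Description section. As a rough illustration of what that convention implies, the following sketch pulls the text between the =begin Description and =end Description markers out of a string. The description routine and the sample text are hypothetical, and this is not trip's own extraction code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every Description section
# found in $text, concatenated in order of appearance.
sub description {
    my ($text) = @_;
    my @parts = $text =~
        /^=begin[ \t]+Description[^\n]*\n(.*?)^=end[ \t]+Description/msg;
    return join '', @parts;
}

# Sample module text, built with join so that no POD marker starts
# a line of this script itself.
my $module = join "\n",
    '# code before the documentation',
    '=pod',
    '=begin reST',
    '=begin Description',
    'Says hello to the reader.',
    '=end Description',
    '=end reST',
    '=cut',
    '';

print description($module);
# prints "Says hello to the reader." followed by a newline
```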