MythStream Parsers
1. Parser interface
Input
Three parameters are passed to the parser:
- The name of the file containing the content fetched by the harvester. Note that the harvester doesn't fetch any data when the parser name in the handler field is prefixed with an asterisk (*parsername).
- The stream item url, passed between double quotes. These quotes must be removed in the parser. If the parser uses the url to fetch content itself, the harvester's fetch step can be skipped; in that case, use the * prefix in the handler field.
- The stream name, passed between double quotes. These quotes should also be removed in the parser.
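A minimal sketch of reading these parameters in a Perl parser. It assumes each quoted argument arrives with its double quotes as the first and last characters, and that the argument order stays the same for *-prefixed parsers; unquote and parser_args are hypothetical helper names, not part of MythStream.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip one pair of surrounding double quotes, if present.
sub unquote {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s/^"(.*)"$/$1/s;
    return $s;
}

# Hypothetical helper: return (file, url, name) from the command line,
# with the quotes around url and name removed.
sub parser_args {
    my ($file, $url, $name) = @_;
    return ($file, unquote($url), unquote($name));
}

# In a real parser: file fetched by the harvester, item url, stream name.
my ($file, $url, $name) = parser_args(@ARGV);
```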
Output
The MythStream harvester expects the following xml output from the parser:
<items>
  <item>
    <name>stream name</name>
    <url>stream url with optional name::value pairs</url>
    <descr>stream description</descr>
    <handler>handling hint or parser script name</handler>
    <meta>
      <name>name</name>
      <content>the meta data</content>
      <viewer>viewer</viewer>
    </meta>
  </item>
  ...
</items>
The "descr", "handler" and "meta" nodes are optional.
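If you'd rather not depend on XML::DOM, this structure can also be produced with core Perl only. The following is a minimal sketch; xml_escape and item_xml are made-up helper names, and it assumes the harvester reads the output with a standard XML parser, so &, < and > in text (for example the & in query strings) must be escaped as entities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;    # must run first so the entities below stay intact
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build one <item>: name and url are always written, descr and handler
# only when they are defined.
sub item_xml {
    my (%f) = @_;
    my $xml = '<item>';
    $xml .= '<name>' . xml_escape($f{name}) . '</name>';
    $xml .= '<url>' . xml_escape($f{url}) . '</url>';
    for my $tag (qw(descr handler)) {
        $xml .= "<$tag>" . xml_escape($f{$tag}) . "</$tag>" if defined $f{$tag};
    }
    return $xml . '</item>';
}

# Hypothetical demo item; a real parser would loop over fetched entries.
print '<?xml version="1.0"?>', "\n";
print '<items>',
      item_xml(name    => 'Demo stream',
               url     => 'http://example.com/demo.mp3',
               handler => 'STREAM_DL'),
      "</items>\n";
```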
The handler field tells MythStream how to handle the url in the stream item. Possible values:
- <parser name>: the harvester downloads and stores the content the url refers to, then calls the parser
- *<parser name>: the harvester skips the download step and calls the parser directly
- STREAM_DL: start a download, and match the downloaded file to the stream item using the stream item's url field
- FUZZY_DL: start a download, and match the downloaded file to the stream item using the stream item's name field
Use STREAM_DL for non-dynamic stream urls. Note 1: when STREAM_DL is used with dynamic urls, the match between stream item and downloaded file breaks when a new session is started; if a user revisits such a stream item and selects it, a new download starts. Note 2: if the match between a stream item and its download is broken, the download can still be accessed in the "downloads" folder.
If unsure, use FUZZY_DL and check whether that works.
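The rules above for choosing a handler value can be summarized in a small helper. This is only a sketch: pick_handler and its options are hypothetical, and whether a url is static across sessions has to be decided by the parser author for each site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick the handler value for a stream item.
#   parser         => name of the parser script to call next (optional)
#   parser_fetches => true if that parser fetches the url itself
#   static_url     => true only if the url stays the same across sessions
sub pick_handler {
    my (%opt) = @_;
    if (defined $opt{parser}) {
        # '*' tells the harvester to skip its own download step
        return ($opt{parser_fetches} ? '*' : '') . $opt{parser};
    }
    # Download: match on url only when the url is known to be stable,
    # otherwise fall back to matching on the stream item name.
    return $opt{static_url} ? 'STREAM_DL' : 'FUZZY_DL';
}

print pick_handler(parser => 'mysite.pl', parser_fetches => 1), "\n";  # *mysite.pl
print pick_handler(static_url => 1), "\n";                             # STREAM_DL
print pick_handler(), "\n";                                            # FUZZY_DL
```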
The viewer field of the meta node can contain the special values:
- text: show as plain text after opening the viewer window
- html: parse as html after opening the viewer window
- inline: show as a line of text in the information window
- url: (not implemented yet) url for user feedback
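A sketch of writing a meta node with one of these viewer values. The default used here (inline for single-line content, text for multi-line content) is a hypothetical convention, not a MythStream rule, and meta_xml and esc are made-up names. Html content is entity-escaped like any other text, assuming the harvester unescapes it before passing it to the html viewer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub esc {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical rule: without an explicit viewer, single-line content goes
# to the information window ("inline"), multi-line content to "text".
sub meta_xml {
    my ($name, $content, $viewer) = @_;
    $content = '' unless defined $content;
    $viewer //= ($content =~ /\n/) ? 'text' : 'inline';
    return '<meta><name>' . esc($name) . '</name>'
         . '<content>' . esc($content) . '</content>'
         . '<viewer>' . esc($viewer) . '</viewer></meta>';
}

print meta_xml('Duration', '3:25'), "\n";
print meta_xml('Notes', "line one\nline two"), "\n";
print meta_xml('Page', '<b>bold</b>', 'html'), "\n";
```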
The url field supports one or more <:name::value:> tags. Each tag brings up a popup window in which the user can edit the value (with the virtual keyboard if desired). The last entered value is stored in the stream storage. This can be used to build dynamic queries, or to ask for a result page number (stream index sites often split search results over multiple pages). Example from the demo database:
http://www.dailymotion.com/rss/relevance/search/<:search term::funny:>/<:page::1:>
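MythStream expands these tags itself; the following sketch only illustrates the tag syntax, for example to reproduce by hand the url a parser will receive when testing it. url_tags and expand_url are hypothetical names. Values are substituted as-is without url-encoding; whether MythStream encodes them is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One <:name::value:> tag; non-greedy so several tags in one url work.
my $TAG = qr/<:(.*?)::(.*?):>/;

# Return the tags of a url template as [name, stored value] pairs.
sub url_tags {
    my ($url) = @_;
    my @tags;
    while ($url =~ /$TAG/g) {
        push @tags, [$1, $2];
    }
    return @tags;
}

# Replace each tag with the value given in %$values, falling back to
# the value stored in the tag itself.
sub expand_url {
    my ($url, $values) = @_;
    $url =~ s/$TAG/exists $values->{$1} ? $values->{$1} : $2/ge;
    return $url;
}

my $template = 'http://www.dailymotion.com/rss/relevance/search/<:search term::funny:>/<:page::1:>';
print expand_url($template, { page => 2 }), "\n";
# prints http://www.dailymotion.com/rss/relevance/search/funny/2
```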
2. Debugging
When a parser fails to deliver stream items, the message "no url's found" is shown in the message field (below the status and time labels in MythStream). When a parser returns data in an incorrect format, the message "parser problem please check on commandline" is shown.
The harvester stores fetched data in the file list.xml in the parser directory. If a parser fails, this file can be used to test the parser on the command line:
cd ~/.mythtv/mythstream/parsers
perl pathto/parser.pl list.xml
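To check a parser's output before MythStream sees it, save the output (for example perl pathto/parser.pl list.xml > out.xml) and run a rough check on it. This sketch only approximates the two messages above with regular expressions; it is not MythStream's own validation, and check_output is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough, regex-based approximation of the two debugging messages.
sub check_output {
    my ($out) = @_;
    # An empty root element written as <items/>
    return "no url's found" if $out =~ m{<items\s*/>};
    return 'parser problem' unless $out =~ m{<items>.*</items>}s;
    # Count non-empty <url> elements (escaped text contains no '<').
    my $n = () = $out =~ m{<url>\s*[^<\s][^<]*</url>}g;
    return $n ? "ok: $n url(s)" : "no url's found";
}

# Usage: perl check.pl out.xml
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!\n";
    my $text = do { local $/; <$fh> } // '';
    close $fh;
    print check_output($text), "\n";
}
```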
3. Example parser
#!/usr/bin/perl
# Example MythStream parser: turns an RSS feed fetched by the harvester
# into the <items> output described in section 1.
use strict;
use warnings;
use XML::Simple;
use XML::DOM;

my $xml  = XML::Simple->new;
my $doc  = XML::DOM::Document->new;
my $head = $doc->createXMLDecl('1.0');
my $root = $doc->createElement('items');

# Create <$name>$value</$name>; an undefined value becomes empty text.
sub newNode {
    my ($name, $value) = @_;
    my $node = $doc->createElement($name);
    $node->appendChild($doc->createTextNode(defined $value ? $value : ''));
    return $node;
}

#------------------------------------------------------------------------------
# Init and Run
#------------------------------------------------------------------------------
my @in     = read_parse();   # command line parameters
my $source = $in[0];         # file fetched by the harvester
                             # ($in[1] = url, $in[2] = stream name; unused here)

# ForceArray keeps {item} an array ref even if the feed has a single item;
# SuppressEmpty turns empty elements into '' instead of {}.
my $data = eval {
    $xml->XMLin($source, ForceArray => ['item'], SuppressEmpty => '');
} || {};

# Items sit directly below the root in RSS 1.0 feeds and below <channel> in RSS 2.0.
my $entries = $data->{item} || $data->{channel}{item} || [];

foreach my $entry (@$entries) {
    next unless defined $entry->{link} && $entry->{link} ne '';
    my $item = $doc->createElement('item');
    $item->appendChild(newNode('name',    $entry->{title}));
    $item->appendChild(newNode('url',     $entry->{link}));
    $item->appendChild(newNode('descr',   $entry->{description}));
    $item->appendChild(newNode('handler', 'STREAM_DL'));
    $root->appendChild($item);
}

print $head->toString;
print $root->toString;
print "\n";

#------------------------------------------------------------------------------
# get command line parameters, converting '+' to spaces
#------------------------------------------------------------------------------
sub read_parse {
    my @args = @ARGV;
    s/\+/ /g foreach @args;
    return @args;
}