Publishing XML for Wireless Devices
part of Perl for the Web
Publishing content to wireless devices, including cell phones, Palm computers, and other personal digital assistants (PDAs), is a good use of Extensible Markup Language (XML). Many of the data formats being adopted for these devices are defined as XML languages, and sharing document data between standard HTML-producing Web applications and new wireless applications is best done through a structured format such as XML. In addition, the data shared by these applications can be used to create new applications using much of the same code base.
Taking advantage of new content outlets requires a new way of presenting information on a site. For sites with underlying XML documents, one answer is to provide different views of the same data in a variety of formats. These views are easy to provide if a site is already using templates and persistent Perl programs. The key is to use the Web server to route incoming requests to the appropriate Web applications and to provide the full path to the requested information.
Wireless Markup Language (WML) Decks for the Wireless Web
Wireless devices would seem like a natural way to access the Web. Web-based directions, address books, and other mobile-friendly data often are needed when a computer isn't nearby, and wireless networks provide connections virtually anywhere. Unfortunately, Web access from the current generation of wireless devices is abysmal. Screens on most wireless devices are tiny, with poor resolution, no color, and few input methods. Wireless Web connections are slow, and the memory limitations of cell phones are extreme. As a result, Web pages written for display on large, colorful monitors with full keyboards and high-speed connections are likely to be unrecognizable and unusable on a wireless device.
An early answer to the wireless Web problem came in the form of the Wireless Application Protocol (WAP) and the Wireless Markup Language (WML). WAP was implemented as a low-bandwidth protocol for retrieving Web information from WAP gateways that provide bridges to the Web. WML is an XML language that addresses the tiny screen sizes and special navigation requirements of wireless devices by using a subset of the simpler formatting aspects of HTML combined with additional WAP-centric tags.
WML is set up like a deck of cards, where each card is an HTML-like page and cards are collected into decks, which are complete WML files. The cards are designed to carry screen-sized chunks of information, and decks collect related cards to reduce the wait associated with repeatedly accessing the wireless network. When accessed, each WML file is compiled into a stream of bytecode by the WAP gateway and delivered to the wireless device over WAP. The workings of WAP and the format of the bytecode are immaterial to WML site designers, however, because WAP gateways create the bridge from WAP to HTTP automatically.
WML and Future Standards
WML has had a difficult time gaining acceptance because of the restrictions imposed on it by devices and by design. For instance, WML documents are limited to a maximum size determined by the bytecode produced by the WAP gateway. Because WML is an XML language, the syntax for WML files is also strict. Site designers who are used to the forgiving nature of HTML browsers sometimes find it frustrating when a WAP gateway rejects their WML pages due to malformed syntax. Worse, no indication is given to the client or the server as to why the WML file was rejected, so fixing WML is usually a trial-and-error process. Listing 17.1 is a WML version of the XML table of contents in Listing 16.1 from Chapter 16, "XML and Content Management."
Listing 17.1 Table of Contents in WML
01 <?xml version="1.0"?>
02 <!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
03 "http://www.wapforum.org/DTD/wml_1.1.xml">
04 <wml>
05
06 <template>
07 <do type="prev" name="Prev" label="Prev">
08 <prev/>
09 </do>
10 </template>
11
12 <card id="index" title="Perl for the Web">
13 <p>Table of Contents:</p>
14 <p><a href="#c0">Introduction</a></p>
15 <p><a href="#c1">Foobar chapter</a></p>
16 <p><a href="#c2">Barbaz chapter</a></p>
17 <p><a href="#cA">Appendix A</a></p>
18 </card>
19
20 <card id="c0" title="Introduction">
21 <p><a href="c.psp/c0/s1/p0.wml">About this book</a></p>
22 <p><a href="c.psp/c0/s2/p0.wml">Conventions</a></p>
23 </card>
24
25 <card id="c1" title="Foobar chapter">
26 <p><a href="c.psp/c1/s1/p0.wml">How to Foobar</a></p>
27 <p><a href="c.psp/c1/s2/p0.wml">How to Foobaz</a></p>
28 </card>
29
30 <card id="c2" title="Barbaz chapter">
31 <p><a href="c.psp/c2/s1/p0.wml">How to Barbaz</a></p>
32 <p><a href="c.psp/c2/s2/p0.wml">When not to Barbaz</a></p>
33 </card>
34
35 <card id="cA" title="Alphabet Soup: Reference and Glossary">
36 <p><a href="c.psp/c3/s1/p0.wml">XML Bestiary</a></p>
37 <p><a href="c.psp/c3/s2/p0.wml">Specifications and Organizations</a></p>
38 </card>
39
40 </wml>
The table of contents WML file in Listing 17.1 is laid out as a deck with five simple cards. The file starts by defining a template for all the cards in the deck with the <template> tag on line 06. (This <template> tag is valid WML, not a server-side tag.) The template defines a <do> tag in lines 07–09, which, in this case, sets a Prev menu option that sends the browser to the previous card viewed. The index card defined in lines 12–18 is the default card displayed and contains a top-level listing of the chapters in the table of contents. Each chapter listing is also a link to the card detailing the chapter. Line 14, for instance, is a listing named "Introduction," which points to the card named c0 that is defined in lines 20–23. Each chapter card contains a listing of chapter sections, which are hyperlinks to the associated WML files. The end result is a WML deck that enables wireless Web users to drill down through small sets of menu choices to find a specific section, as shown in Figure 17.1. In practice, this file would be generated from the original XML by a Perl script that keeps track of the relationships between cards in the deck.
Figure 17.1
Table of contents viewed in a WAP simulator.
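The kind of generator script mentioned above might look something like the following minimal sketch. It assumes the table of contents has already been reduced to a plain Perl structure (the `@toc` list, the `toc_deck` and `xml_escape` subroutines, and the numeric card ids are all hypothetical names chosen for illustration, not part of the original listings), and it numbers chapters by position rather than using the `cA` appendix id seen in Listing 17.1.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a WML deck: one index card that links to one card per chapter.
# $toc is a hypothetical list of { title => ..., sections => [ ... ] }.
sub toc_deck {
    my ($book_title, $toc) = @_;
    my $index = '';
    my $cards = '';

    for my $c (0 .. $#{$toc}) {
        my $chapter = $toc->[$c];
        my $title   = xml_escape($chapter->{title});

        # Entry on the index card pointing at the chapter card.
        $index .= qq{<p><a href="#c$c">$title</a></p>\n};

        # The chapter card itself, with one link per section.
        $cards .= qq{<card id="c$c" title="$title">\n};
        my $s = 0;
        for my $section (@{ $chapter->{sections} }) {
            $s++;
            my $label = xml_escape($section);
            $cards .= qq{<p><a href="c.psp/c$c/s$s/p0.wml">$label</a></p>\n};
        }
        $cards .= qq{</card>\n};
    }

    my $deck_title = xml_escape($book_title);
    return <<"WML";
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<template>
<do type="prev" name="Prev" label="Prev">
<prev/>
</do>
</template>
<card id="index" title="$deck_title">
<p>Table of Contents:</p>
$index</card>
$cards</wml>
WML
}

my @toc = (
    { title => 'Introduction',   sections => ['About this book', 'Conventions'] },
    { title => 'Foobar chapter', sections => ['How to Foobar', 'How to Foobaz'] },
);
print toc_deck('Perl for the Web', \@toc);
```

Escaping every title before it reaches the deck matters more for WML than for HTML, because a single stray ampersand is enough for a gateway to reject the whole file.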
After WML was introduced, a competing language called iMode was developed by NTT DoCoMo, a company that produces cell phone interfaces. Based on a subset of HTML, iMode takes a more lenient approach to wireless Web site design. iMode sites are very similar to HTML-based sites, with the caveat that iMode devices have small screens and tiny memories and only understand basic HTML tags. Unfortunately, iMode phones can't access WML sites and vice versa; a site hoping to offer services to both would be forced to create a separate version of the service for each.
Recently, a new wireless markup standard based on the Extensible Hypertext Markup Language (XHTML) was developed and christened XHTML Basic. The idea behind XHTML Basic is to provide a language that has the familiar usage of HTML, the strict syntax of XML, and a reduced set of tags tailored to wireless devices. XHTML Basic has already been agreed on as the next version of both iMode and WML. Therefore, content managers who convert site content to WML directly might find themselves with another conversion project very soon. Converting site content to an intermediate XML format might be a better choice because structured XML documents lend themselves to dynamic transformations. When used with templates, an intermediate XML format enables HTML, WML, and new standards to be presented simultaneously by the same site. See the "Multihomed Documents" section later in this chapter for more information.
Book Content for Wireless Devices
Although reading an entire book on a cell phone is not recommended, it's sometimes handy to have Web content in a form that is accessible anywhere at any time. For example, it might be useful to offer the book content from Chapter 16 in a wireless format. Luckily, the framework for this has already been set up; chapters already are segmented into individual sections and paragraphs, and a template system already has been created to provide access to the chapters through XML::Simple.
These same principles would apply to any site information from any data source offered in any text-like form, not just book content from XML files for the wireless Web. For instance, a news site such as Slashdot might want to offer headlines from its database as VoiceXML, which is an XML format used to provide voice interaction with text data. The use of templates and the persistent Perl processor still would be the same. In fact, after these techniques are developed for one XML format, very little additional work is required to adapt them to other formats over different channels.
Adding Templates for WML
As described in Chapter 16, chapters are stored in Simple Book Format (SBF) files. To produce WML files from the SBF file, a template can be used, just as it would to produce HTML output. In this case, using a template gives an additional benefit; the template can be tested on a wireless device, and all pages generated from the tested template can be assumed to work identically. Listing 17.2 is a template that sets a paragraph from the SBF chapter file into a WML card format with simple navigation. (See Listing 17.3 later in this chapter for the Perl program that processes this template.)
Listing 17.2 SBF Paragraph to WML Template
01 <tag name="wmltemplate" accepts="title, next, previous, paragraph">
02 <output>
03 <?xml version="1.0"?>
04 <!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
05 "http://www.wapforum.org/DTD/wml_1.1.xml">
06 <wml>
07 <card id="index" title="$title">
08
09 <do type="prev" name="Prev" label="Prev">
10 <go href="p$previous.wml" />
11 </do>
12 <do type="options" name="Next" label="Next">
13 <go href="p$next.wml" />
14 </do>
15
16 <p>$paragraph->{content}</p>
17
18 </card>
19 </wml>
20 </output>
21 </tag>
The WML deck template in Listing 17.2 is laid out in a fashion similar to the table of contents deck in Listing 17.1. The template is set up using the <tag> tag in lines 01 and 21, which accepts variables set by the template processor. (This template also could be designed to accept the full parse tree, but processing the parse tree beforehand makes for a simpler, easier-to-test template.) Line 07 starts the card and sets the title. Lines 09–14 create two menu options for navigation, Prev and Next, which are hyperlinks to the paragraphs before and after the one displayed. Line 16 displays the paragraph contents as provided. The result is a WML file that contains a single paragraph of the book XML file, as shown in Figure 17.2.
Figure 17.2
Book paragraph viewed in a WAP simulator.
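For readers without a PSP-style template processor at hand, the substitution performed by the template in Listing 17.2 can be approximated in plain Perl. The following is only a sketch under that assumption: `wml_paragraph` and `xml_escape` are hypothetical helper names, and the paragraph content is passed in as a plain string rather than as the parsed structure used in the listing.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Render one paragraph as a single-card WML deck, mirroring Listing 17.2.
# Expects title, content, previous, and next (the last two are page numbers).
sub wml_paragraph {
    my (%arg) = @_;
    my $title = xml_escape($arg{title});
    my $text  = xml_escape($arg{content});

    return <<"WML";
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="index" title="$title">
<do type="prev" name="Prev" label="Prev">
<go href="p$arg{previous}.wml" />
</do>
<do type="options" name="Next" label="Next">
<go href="p$arg{next}.wml" />
</do>
<p>$text</p>
</card>
</wml>
WML
}

print wml_paragraph(
    title    => 'How to Foobar',
    content  => 'Foobar early & often.',
    previous => 2,
    next     => 4,
);
```

Because the links are relative (`p2.wml`, `p4.wml`), the same output works wherever the deck is served from, as long as sibling paragraphs share a directory.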
With this template and a coherent naming scheme, any chapter can be displayed paragraph by paragraph with paging provided by the Prev and Next buttons. Combined with the table of contents page and search page (which could be generated with similar templates), an entire book easily can be made available to any wireless device. Additional templates could be created after the transition to XHTML Basic is started, and updates to the underlying chapter files would be made available to all formats simultaneously.
Multihomed Documents
HTML, WML, VoiceXML, and other translations can be viewed as virtual versions of the underlying XML documents. As such, it's sometimes helpful to provide more coherent URL names to access the translated versions. A standard URL for a program-generated WML file might look like the following:
http://www.site.com/perl/wap.pl?file=chapter12&section=4
Unfortunately, the URL itself gives very little indication of the format of the resulting file. The program name wap.pl hints that the result might be in WML format, but it also might be in XHTML or another WAP format. In addition, the relationship between the parameter values and the result isn't easy to determine directly. If the same output came from a static file listed on the same site, it probably would have a more understandable URL:
http://www.site.com/wap/chapter12/section4.wml
This URL is both easier to decipher and easier to remember. Because site designers and some visitors use URLs to understand the structure of a site, providing a URL that resembles a static location is sometimes preferable. This becomes more important when different views of the same underlying data are presented via separate document URLs. Translating a URL such as this is possible using multihomed documents, which are documents that are processed differently based on the way they're accessed. For instance, the following URLs all access the same file:
http://www.site.com/thebook/chapter12.html
http://www.site.com/wap/c12/s4/p1.wml
http://www.site.com/xml/chapter12.xml
http://www.site.com/voice/chapter12/section4.vxml
Multihomed documents work well in situations in which data files are stored in a hierarchy that needs to be repeated in multiple formats. URL translation of this sort also enables underlying data sources to be changed while maintaining existing URLs. For instance, if a file is first offered as HTML and then changed to be generated from an XML source file, the transition can be eased by creating a multihomed document that responds to the original URL by generating the appropriate HTML output.
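One way to picture a multihomed document is as a lookup from the first part of the URL path to an output format, with the rest of the path identifying the shared source file. The sketch below illustrates that idea only; the `%VIEW` table, the `resolve_view` subroutine, the content types chosen for each view, and the `/thebook/chapterNN.xml` source naming are assumptions made for the example, not a description of any particular server setup.

```perl
use strict;
use warnings;

# Hypothetical table: first path component => output format and content type.
my %VIEW = (
    thebook => { format => 'html', type => 'text/html' },
    wap     => { format => 'wml',  type => 'text/vnd.wap.wml' },
    xml     => { format => 'xml',  type => 'text/xml' },
    voice   => { format => 'vxml', type => 'application/voicexml+xml' },
);

# Map a URL path onto the view to render and the one XML file behind it.
# Returns undef when the path does not belong to a known view.
sub resolve_view {
    my ($path) = @_;

    my ($prefix, $rest) = $path =~ m{^/([^/]+)/(.+)$} or return;
    my $view = $VIEW{$prefix} or return;

    # Accept both the long form (chapter12) and the short form (c12).
    my ($chapter) = $rest =~ m{^(?:chapter|c)(\d+)} or return;

    return {
        %{$view},
        chapter => $chapter + 0,
        source  => sprintf('/thebook/chapter%02d.xml', $chapter),
    };
}

for my $path ('/thebook/chapter12.html', '/wap/c12/s4/p1.wml',
              '/xml/chapter12.xml', '/voice/chapter12/section4.vxml') {
    my $view = resolve_view($path);
    print "$path => $view->{format} from $view->{source}\n";
}
```

All four example paths resolve to the same source file and differ only in the format that would be produced from it.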
Uncoupling Document Names with PATH_INFO
Multihomed documents can be created in a number of ways, largely dependent on the Web server and Perl environment. In the simplest case, file information can be added to the end of a program URL as though the program were a directory. The additional information is passed to the program in the PATH_INFO environment variable. For instance, a program called c.psp that displays a WML file might be accessed by the following URL:
http://www.site.com/c.psp/c1/s1/p0.wml
The Web server would access the c.psp program and give it the PATH_INFO value of /c1/s1/p0.wml. This value then could be used to determine the file to process and other request information. Form variables sent through GET and POST requests still would be processed in the ordinary fashion, if present. PATH_INFO is often used to indicate an actual file on the Web server, but it can indicate any other dynamic information as well.
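As a rough illustration of how a program might pick such a value apart, the following sketch splits a PATH_INFO string into chapter, section, paragraph, and format. The `parse_path_info` name and the returned hash layout are hypothetical, and the fallback path is there only so the snippet can be run outside a Web server.

```perl
use strict;
use warnings;

# Split a PATH_INFO value such as "/c1/s1/p0.wml" into its parts.
# Returns a hash reference, or undef when the path does not fit the scheme.
sub parse_path_info {
    my ($path_info) = @_;
    return unless defined $path_info;

    my ($chapter, $section, $paragraph, $format) =
        $path_info =~ m{/c(\w+)/s(\d+)/p(\d+)\.(\w+)$}
        or return;

    return {
        chapter   => $chapter,
        section   => $section + 0,
        paragraph => $paragraph + 0,
        format    => $format,
    };
}

# Outside a Web server PATH_INFO is unset, so fall back to a sample value.
my $path_info = defined $ENV{PATH_INFO} ? $ENV{PATH_INFO} : '/c1/s1/p0.wml';
my $request   = parse_path_info($path_info);

if ($request) {
    printf "chapter %s, section %d, paragraph %d as %s\n",
        @{$request}{qw(chapter section paragraph format)};
}
else {
    print "unrecognized path: $path_info\n";
}
```

Returning undef for anything that does not match gives the calling program one obvious place to send an error response instead of guessing at a file to open.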
Book Publisher with PATH_INFO
To publish book contents using URL information provided through PATH_INFO, it is necessary to translate the path into a form that the program can use to locate the requested information. This might be a simple change when translating an HTML path to an XML filename, or it can require regular expressions to extract variables encoded into a readable URL. Listing 17.3 is an example of the latter and uses PATH_INFO to store the parameters for generating a WML file.
Listing 17.3 WML Chapter Display Using PATH_INFO
01 <perl>
02 use XML::Simple ();
03
04 my ($c, $s, $p) = $ENV{PATH_INFO} =~ m{/c(.+?)/s(\d+)/p(\d+)\.wml$};
05 $c = "0$c" unless (length($c) > 1);
06
07 my $xs = XML::Simple->new(forcearray => ['section', 'paragraph'],
08 cache => 'memshare');
09 my $chapter = $xs->XMLin("$ENV{DOCUMENT_ROOT}/thebook/chapter$c.xml");
10 my ($section, $title, @paragraphs);
11
12 if ($s)
13 {
14 $section = $chapter->{section}->[$s -1];
15 $title = $section->{title};
16 @paragraphs = (); # add_paragraphs (line 18) adds this section's own paragraphs
17
18 add_paragraphs(\@paragraphs, $section);
19 }
20 else
21 {
22 $title = $chapter->{title};
23 @paragraphs = @{$chapter->{paragraph}};
24 }
25
26 my $paragraph = $paragraphs[$p];
27 my $next = ($p < $#paragraphs) ? ($p + 1) : 0;
28 my $previous = ($p > 0) ? ($p - 1) : $#paragraphs;
29
30 print "Content-type: text/vnd.wap.wml\n\n";
31 </perl>
32
33 <include file="$ENV{DOCUMENT_ROOT}/templates/wmlparagraph.psp" />
34
35 <output>
36 <wmltemplate title="$title" next="$next" previous="$previous" paragraph="$paragraph" />
37 </output>
38
39 <perl>
40 sub add_paragraphs
41 {
42 my $paragraphs = shift;
43 my $section = shift;
44
45 push (@{$paragraphs}, @{$section->{paragraph}});
46
47 foreach my $subsection (@{$section->{section}})
48 {
49 add_paragraphs($paragraphs, $subsection);
50 }
51 }
52 </perl>
The majority of the code in Listing 17.3 finds a specified paragraph within a specified section within a specified chapter. These specified values come from the PATH_INFO environment variable, as extracted in line 04. The regular expression in line 04 searches for the numeric parts of the path information (preceded by c, s, and p designators) and assigns the values to the $c, $s, and $p variables, respectively. Line 05 makes sure that the number in $c is in the same format as those used to designate the chapter files. Line 09 parses the requested chapter file, and lines 12–24 load the requested section's paragraphs into the @paragraphs array. Line 26 chooses the requested paragraph, and lines 27 and 28 set the numbers of the previous and next paragraphs, wrapping around if necessary. Line 30 sets the proper content type for the WML file, and line 33 calls the template specified in Listing 17.2.
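The two pieces of logic that are easiest to get wrong here, flattening nested sections and wrapping the page numbers, can be tried out separately from the Web server. The sketch below is one possible standalone version: `collect_paragraphs` and `neighbors` are hypothetical names, and paragraphs are plain strings here rather than the parsed structures the listing works with.

```perl
use strict;
use warnings;

# Collect a section's own paragraphs, then those of its subsections,
# depth first, so the result follows reading order.
sub collect_paragraphs {
    my ($section) = @_;
    my @found = @{ $section->{paragraph} || [] };
    push @found, collect_paragraphs($_) for @{ $section->{section} || [] };
    return @found;
}

# Previous and next indexes for paragraph $p out of $count paragraphs,
# wrapping around at both ends. $count must be greater than zero.
sub neighbors {
    my ($p, $count) = @_;
    return (($p - 1) % $count, ($p + 1) % $count);
}

my $section = {
    paragraph => ['intro one', 'intro two'],
    section   => [
        { paragraph => ['sub A'] },
        { paragraph => ['sub B'] },
    ],
};

my @paragraphs = collect_paragraphs($section);
my ($previous, $next) = neighbors(0, scalar @paragraphs);
print scalar(@paragraphs), " paragraphs; from 0, Prev is $previous and Next is $next\n";
```

The modulus form relies on Perl returning a non-negative result when the right operand is positive, which holds as long as `use integer` is not in effect.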
Adding Directory Processors
In most cases, using the URL of the program as the base URL of the multihomed document isn't an ideal solution. This is especially true when the program is being used to replace a set of static files with the output of dynamic translations. The solution in these cases is to make the program a directory processor, which is a program that handles all requests to a particular directory or file type. The Web server provides the file path as a PATH_INFO variable to the program specified as the processor for that directory. For instance, the following URLs would both receive the same response if the /thebook/c.psp program were defined as the processor for the /wap directory:
http://www.site.com/thebook/c.psp/wap/c1/s2/p0.wml
http://www.site.com/wap/c1/s2/p0.wml
The procedure for setting up directory processors varies depending on the Web server being used. Version 1.3 of Apache Server, for instance, uses the Action and SetHandler directives in the httpd.conf configuration file to specify a directory processor. For instance, to set up the relationship used in the URLs in the previous code example, the following could be added to the httpd.conf file:
Action wap-chapter /thebook/c.psp
<Location "/wap">
SetHandler wap-chapter
</Location>
The Action directive assigns the wap-chapter identifier to the program located at the /thebook/c.psp virtual file. Note that this location can be any kind of program, including persistent Common Gateway Interface (CGI)-style programs and templated programs using HTML::Mason or Perl Server Pages (PSP). The SetHandler directive then can be used in any new or existing Location or Directory block. The Location block is used for virtual locations and the Directory block is used for translated file paths within the file system.
WML Revisited
Multihomed documents suit WML and wireless devices for a number of reasons. First, multihomed documents make it possible to compress the URL identifier for a specific page down to the absolute minimum of characters. Because WAP defines an upper limit to page size, any effort to reduce the overall size of a page is worthwhile. With a multihomed document, the first hyperlink in the following code (from a file at /wap/c12/s2/p0.wml, for instance) can be replaced with the second hyperlink, which is a dramatic improvement over an already-short URL:
<go href="c.psp?c=12&amp;s=2&amp;p=1" />
<go href="p1.wml" />
In addition, WML presents a taste of the kind of challenges Web application developers will face in the future. Web programmers don't want to write the same applications over and over again for different types of browsers, so a facility that enables multiple views into the same data might provide a more reusable approach to application design. For wireless devices, a version of the site can be provided for WML 1.1, WML 1.2, iMode, and XHTML Basic without having to update each version separately. With such a simple change in focus comes a great deal more freedom from the vagaries of standards.
Summary
Wireless devices pose a special problem for Web application designers. Their requirements are strictly defined, but the standards to which they adhere are likely to change. The solution is to create an interface that can be decoupled from the underlying data so that it can be recreated as new standards become available. Templates, XML documents, and flexible Perl programs can be used to generate today's WML files using the same XML documents used by existing Web applications. Tomorrow's standards and more can be addressed by creating multiple windows into the same data in a variety of formats and adding templates as necessary to support more.