Global Spin: XML as a B2B Interface

Beyond content management, Extensible Markup Language (XML) has gained prominence as a base protocol for business-to-business (B2B) integration efforts. XML enables business data to be encoded in a format that is both cross-platform and easily standardized. XML works well as a cross-platform protocol because it is a text format with Unicode support, which makes it readable by any platform with any character set. Standards can be created for languages based on XML by developing a Document Type Definition (DTD), an XML Schema, or a sample document for the specification. This enables businesses to describe their interfaces in terms of a named standard–the purchasing language cXML, for instance–instead of defining a custom interface for each business relationship.

Perl is an excellent language for implementing B2B interfaces because the process usually involves text processing, network interaction, and systems integration, which are three of Perl's strongest areas. Perl already connects to many common information systems, including most databases. Additional systems can be accessed by Perl programs through a number of local and network-based interfaces, including sockets and component layers such as COM and CORBA. In addition, existing Web interfaces are good candidates for B2B integration with Perl because most of the work of integrating backend systems already has been done.

Early XML-based B2B interfaces were developed for specific partner relationships, but companies and organizations soon realized that the cost of implementing a new interface for each partner was prohibitive, even using XML. Custom interfaces are giving way to generalized interface protocols such as XML Remote Procedure Call (XML-RPC) and Simple Object Access Protocol (SOAP), which offer better integration with existing software models. These generic interfaces also hold the promise of true interface automation, which should lead to more robust interfaces.

B2B Examples

With all the hype surrounding the Internet, e-commerce, and the new economy, it's amazing how many businesses still interact using low-tech methods. High-tech companies on the cutting edge are no exception–many times the fulfillment process for e-commerce orders consists of an employee printing and faxing the order to a supplier. Automation tends to stop at the door. Therefore, many processes outside the core business still start with hand entry and end with a printed confirmation. Many of these processes could benefit from B2B integration, regardless of whether the relationship is between businesses, organizations, or private users. When implemented in a standardized way, B2B integration promises to be as great an improvement in automated partner interactions as the Web was for user interactions.

B2B interaction takes many forms. Standards have been developed for many B2B-related XML languages, and more are in development. Business processes can be connected to each other directly, or they can interact through a trading floor. Web content can be distributed through B2B-style interfaces, as can other syndicated content.

Trading Floors

Trading floors are centralized repositories of XML documents, usually designed to provide generic interactions between a number of trading partners at the same time. Trading floors might be run by a company such as Ariba, which consolidates connections between businesses and their suppliers, or they might be implemented by information consolidators such as SourceForge.

Connecting to a trading floor with Perl is a job that can be tackled using existing modules. Trading floors usually support a specific set of XML standards, so all interaction with the trading floor has to be conducted using those languages. When a language is provided, modules such as XML::DOM or Orchard can be used to create, read, or modify documents for the interface. Modules such as LWP can be used if the trading floor uses HTTP or FTP network protocols, or custom protocols can be implemented using the IO::Socket::INET module. Local information can be gathered from databases through the DBI modules. For even easier integration, document samples provided by the trading floor can be imported into Document Object Model (DOM) or Orchard structures as templates and local data can be added to them.
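The template approach can be sketched with XML::DOM. This is only an illustration: the order document and its element names (partNumber, quantity) are hypothetical and are not taken from cXML or any other real purchasing language, and the fill_template helper is invented for this example.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical sample document, standing in for one supplied by a trading floor.
my $template = <<'XML';
<order>
  <partNumber/>
  <quantity/>
</order>
XML

# Parse the sample as a DOM tree and add local data to the named elements.
sub fill_template {
    my ($xml, %data) = @_;
    my $doc = XML::DOM::Parser->new->parse($xml);
    for my $name (sort keys %data) {
        # Use the first element with this name; skip names the template lacks.
        my $node = $doc->getElementsByTagName($name)->item(0) or next;
        $node->appendChild($doc->createTextNode($data{$name}));
    }
    my $out = $doc->toString;
    $doc->dispose;    # release the DOM tree's circular references
    return $out;
}

print fill_template($template, partNumber => 'PW-1001', quantity => 12);
```

In a real interface, the values would come from a DBI query rather than literals, and the finished document would be sent to the trading floor with LWP.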

Creating a trading floor in Perl can be done just as easily. In fact, a trading floor in XML can be handled using templates, embedded programming, and a persistent environment–just as a Web site would. Persistent Perl is capable of handling the high traffic necessary for automating interactions between all the clients of a trading floor. Templates also can be used to implement new interfaces that enable clients to use their preferred tools. Translation from one interface style to another–between the cXML and CBL purchasing languages, for instance–can be provided as an additional service. All the logging, security, and customer service issues that pertain to a robust Web site pertain to a trading floor as well. Thus, it makes sense for both to use similar tools.

Early trading floors have had mixed success. Many trading floor operators have found that their potential clients need technical assistance getting their own systems integrated, let alone integrated with their partners. XML is an ideal language for structured communication, but many businesses have little or no expertise in using XML tools. Fortunately, Perl can help with this. Trading floor proprietors can implement lightweight Perl servers at client sites to serve as a bridge between client systems and the trading floor. Costs can be kept down by using inexpensive hardware and the open-source LAMP architecture–Linux, Apache, MySQL, and Perl.

Web Site Content Mirrors

An application of B2B interfaces that might not be obvious is the mirroring of Web sites and other Internet content. Companies such as Akamai and Digital Island offer services that spread Web site content around the Internet so that a high-speed connection always is available between a site visitor and a content mirror. Similar mirroring systems are used to deliver popular software–the Comprehensive Perl Archive Network (CPAN) itself is mirrored around the world. Common to all mirroring systems is the capability to transfer as little data as possible to reduce the network load on the central server. To this end, mirroring systems usually break Web pages into sections that can be reconstituted on the mirrored site.

Because existing mirroring schemes usually require a specialized interface, some improvement might be gained by implementing a generic XML interface to mirror content on remote sites. If a site is designed to use a set of XML files as core data, the files can be transferred from a central server to mirrors around the world. The mirror sites then can use templates to localize the content and integrate it with other mirrored content. Using XML as a data transfer language also enables mirror sites to provide dynamic content, including searchable databases. Databases can be synchronized by using an XML interface to transfer records that have been modified since the last update.
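As a rough sketch of the synchronization idea, the following code selects records changed since the last update and wraps them in XML for transfer. Everything here is an assumption made for illustration: the record fields (id, modified, body), the records and record element names, and the records_since helper do not come from any standard.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe in XML text and attribute values.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
sub xml_escape {
    my ($text) = @_;
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build an XML document containing only the records modified after $since.
# Timestamps are assumed to be epoch seconds.
sub records_since {
    my ($since, @records) = @_;
    my $xml = sprintf qq{<records since="%d">\n}, $since;
    for my $r (grep { $_->{modified} > $since } @records) {
        $xml .= sprintf qq{  <record id="%s" modified="%d">%s</record>\n},
            xml_escape($r->{id}), $r->{modified}, xml_escape($r->{body});
    }
    return $xml . "</records>\n";
}

print records_since(
    100,
    { id => 'a', modified => 50,  body => 'unchanged record' },
    { id => 'b', modified => 150, body => 'updated R&D record' },
);
```

A mirror would remember the timestamp of its last successful update and pass it as the first argument on the next run.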

Content Syndication

Mirrors aren't the only way in which content gets reused. Traditional content syndication services, such as Reuters and Associated Press, offer news feeds to news sites around the Web. In addition, Web sites such as Freshmeat, PlanetOut, and Space.com have syndicated their content feeds to other Web sites. Many additional types of Web sites would benefit greatly from syndicating their content, using the syndicated content from other sites, or both. For instance, a company that produces electric cars might benefit from syndicated news about its cars from news sites or reliability reports from consumer sites. Those sites might benefit from updated pricing, availability, and release information about the cars. In other cases, open syndication–providing raw XML versions of site content–might be preferred. For instance, Slashdot offers its latest headlines in an XML format to encourage the development of desktop applications that display its content. Any site looking to build consistent traffic streams might benefit from a similar approach.

Perl and XML make content syndication as easy as creating or using local XML files. XML from syndication servers can be transferred to local storage using a scheduled process, or it can be gathered in real time through the LWP module. Caching algorithms can be implemented on the target sites to provide near-real-time updates without the problem of local storage. Templates then can be created to provide a layer of abstraction between site design and the underlying network protocols.
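A scheduled transfer to local storage might look like the following sketch, which uses the mirror method of LWP::UserAgent so that the feed is downloaded again only when the server reports a newer version. The feed URL, the cache path, and the refresh_feed helper are placeholders invented for this example.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Refresh a locally cached copy of an XML feed.
# mirror() sends If-Modified-Since when the cache file already exists.
sub refresh_feed {
    my ($url, $cache_file) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $response = $ua->mirror($url, $cache_file);

    return 'unchanged' if $response->code == 304;    # cached copy is current
    return 'updated'   if $response->is_success;     # new copy written
    return 'failed: ' . $response->status_line;
}

# Run only when a feed URL is given on the command line, for example from cron:
#   perl refresh_feed.pl http://www.example.com/headlines.xml /tmp/headlines.xml
if (@ARGV == 2) {
    print refresh_feed(@ARGV), "\n";
}
```

Templates on the target site can then read the cached file without knowing how or when it was fetched.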

The real key to widespread content syndication is standards. If a new interface to content doesn't have to be created for each relationship between source and target sites, more sites can syndicate content from a wider group of sources. Standardized XML formats for content distribution will give rise to specialized Perl modules for accessing the content in the same way that XML's standard structure gave rise to modules such as XML::Simple. If content syndication interfaces become common enough that they require little effort to implement, more Web sites will benefit from content that is already available.

Implementations with XML-RPC

In 1998, Dave Winer of Userland (http://www.userland.com) developed a method for calling procedures on remote machines and returning the results in an XML format. His idea was to use existing Internet protocols–TCP/IP, HTTP, and XML–to pass messages back and forth from the client to the server. The result was XML-RPC, which is a specification that enables arbitrary method calls to be invoked with a standard set of arguments. The specification covers message structures, standard data types, and error handling.

XML-RPC has seen limited use, as compared to the more robust SOAP protocol, mainly because of the third-party support SOAP has gained since the two were developed. SOAP is designed to be applicable to a wider array of circumstances, but it is a much more complex protocol as a result. XML-RPC has a simpler specification and a much narrower set of implementation possibilities. As a result, in many circumstances, it's possible to implement an equivalent XML-RPC interface with simpler tools and less expertise.

Remote Procedure Calls (RPC)

An RPC is a method that is called on a class residing on another server. It's directly analogous to the kinds of calls that can be made on Perl modules, including a named method that takes parameters and returns a simple result. For instance, a class called headlines might provide a method called getHeadlineTitle, which takes the name of a magazine as a string argument and returns an article title as a string result. In XML-RPC, the method request and its associated response might look like the following:

<methodCall>
 <methodName>headlines.getHeadlineTitle</methodName>
 <params>
  <param><value><string>PerlWeek</string></value></param>
 </params>
</methodCall>

<methodResponse>
 <params>
  <param><value><string>Sites that Bite</string></value></param> 
 </params> 
</methodResponse>

The method call is contained in a <methodCall> tag, which in turn contains a <methodName> tag containing the name of the class and method being invoked on the remote server. Arguments to the method are contained in a <params> tag, with each argument represented by a <param> tag containing a data type tag and the parameter value. In this case, the parameter is a string containing PerlWeek, but a parameter can be an arbitrarily complex construct made up of a number of data types.

The data types enabled by XML-RPC should be familiar to Perl programmers–with a few additions. The string and int data types correspond to most scalars that a Perl program uses. Perl doesn't distinguish between string variables and various possible classes of numeric variables, but variables can be processed to test whether they are the correct type, if necessary. The array data type corresponds directly to Perl arrays, and the struct data type in XML-RPC is like a hash variable in Perl. XML-RPC does not include data types for use as variable references, but both struct and array parameters can include recursive struct and array types as values.

The response to a method call is contained in a <methodResponse> tag, which contains a similar <params> tag to hold result values. The <methodResponse> tag also can contain a <fault> tag with an error definition if problems were encountered during processing.
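These messages rarely need to be written by hand. As a sketch, the Frontier::RPC2 module (introduced in the next section) can produce and parse them. The fault code and fault message used here are invented for illustration.

```perl
use strict;
use warnings;
use Frontier::RPC2;

my $coder = Frontier::RPC2->new;

# Encode the request shown above: a method name plus one string argument.
my $request = $coder->encode_call('headlines.getHeadlineTitle', 'PerlWeek');

# Decode it again, as a server would on receipt.
my $call = $coder->decode($request);
print "method:   $call->{method_name}\n";
print "argument: $call->{value}[0]\n";

# Encode a normal response and, alternatively, a fault response.
my $response = $coder->encode_response('Sites that Bite');
my $fault    = $coder->encode_fault(1, 'Unknown magazine');

print $request, "\n", $response, "\n", $fault, "\n";
```

Perl arrays and hashes passed by reference in place of the string would be encoded as XML-RPC array and struct values, respectively.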

Exposing XML-RPC Interfaces

The Frontier::RPC2 module, a Perl interface to the XML-RPC protocol, was developed by Ken MacLeod in response to the XML-RPC specification. (The "2" refers to the second iteration of the protocol–the first was developed for a server product called Frontier and simply labeled "RPC.") Frontier::RPC2 provides a number of methods for encoding Perl data structures into an XML-RPC request or response message. However, most interaction with the module occurs through the Client and Daemon modules included with the package.

An XML-RPC client can be implemented through the Frontier::Client module. Calls to Frontier::Client are simple and use an object-oriented structure. A client object is created with the URL of the XML-RPC server specified, and the call method is used to call a specified method with an array of arguments. Arguments might take the form of Perl scalars, arrays, or hashes, all of which are translated into the corresponding XML-RPC data types. Alternately, the arguments might be special XML-RPC data types, such as boolean or double, that can be created using Frontier::Client object methods of the same names. The response from a call method is returned as either an implicitly-translated Perl variable or a data type object that can be translated explicitly using a value method.
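A minimal client sketch follows. It assumes an XML-RPC server that exposes the headlines.getHeadlineTitle method from the earlier example; the server URL and the headline_title helper are placeholders.

```perl
use strict;
use warnings;
use Frontier::Client;

# Ask a remote XML-RPC server for the headline title of a magazine.
sub headline_title {
    my ($url, $magazine) = @_;
    my $client = Frontier::Client->new(url => $url);

    # Wrapping the argument with string() requests the XML-RPC string type
    # explicitly, which matters for numeric-looking names such as "2600".
    return $client->call('headlines.getHeadlineTitle',
                         $client->string($magazine));
}

# Run only when a magazine name is given on the command line.
# The URL below is an assumed location for the server.
if (@ARGV) {
    print headline_title('http://localhost:8080/RPC2', $ARGV[0]), "\n";
}
```

The boolean and double methods mentioned above are used the same way as string.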

An XML-RPC server is implemented using the Frontier::Daemon module. The implementation is simple; XML-RPC methods are mapped to Perl subroutines using a hash specified when the Frontier::Daemon object is instantiated. The module converts incoming requests into subroutine calls with the appropriate parameters and converts the returned result into an XML-RPC response.
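A matching server can be sketched as follows. The port number and the headline data are assumptions made for this example; the method name matches the earlier request.

```perl
use strict;
use warnings;
use Frontier::Daemon;

# Sample data; a real server would more likely query a database through DBI.
my %titles = (PerlWeek => 'Sites that Bite');

# An ordinary Perl subroutine that will be exposed as an XML-RPC method.
sub get_headline_title {
    my ($magazine) = @_;
    return exists $titles{$magazine} ? $titles{$magazine} : '';
}

# Start the daemon only when asked: perl headlines_rpc.pl serve
# Frontier::Daemon->new enters the server loop and does not return.
if (@ARGV && $ARGV[0] eq 'serve') {
    Frontier::Daemon->new(
        LocalPort => 8080,
        methods   => { 'headlines.getHeadlineTitle' => \&get_headline_title },
    );
}
```

Because the exposed method is a plain subroutine, it can be tested locally before it is published through the daemon.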

Implementations with SOAP

Another RPC interface that has seen widespread use is SOAP. SOAP was developed by Microsoft, IBM, Userland, and a host of other companies and organizations. SOAP was originally an offshoot of the same project that spawned XML-RPC, but it since has grown into a full World Wide Web Consortium (W3C) specification. Support for SOAP has been developed, or is in development, for Java, C, Visual Basic, Python, and Perl.

Microsoft has enough faith in SOAP to make it a core part of its .NET initiative, which the company hopes will become a platform for true distributed computing over the Internet. SOAP was originally a Microsoft project, but it has been adopted as a standard by so many other companies and organizations that it has taken on an independent meaning outside the .NET initiative. If standards are adhered to, SOAP clients and servers are likely to become the next big wave of Internet activity.

SOAP::Lite

Perl support for the SOAP protocol is provided by the SOAP::Lite module written by Paul Kulchenko. Like XML::Simple, SOAP::Lite provides an XML interface without the need to program around the peculiarities of XML. It translates Perl structures into SOAP envelopes, which are SOAP messages containing routing and format information as well as method calls and serialized data. SOAP::Lite also handles most aspects of SOAP network connections, including compatibility layers for the slight differences in current SOAP server implementations.

SOAP implements data types similar to those used by XML-RPC, and a slew of additional data types. SOAP data types come mostly from the XML Schema specification, which defines simple types, such as strings, as well as complex types. In addition, XML Schema provides a method for declaring custom SOAP data types for a specific request, which enables complex structures to be defined and reused throughout a collection of SOAP services.

A SOAP::Lite Server

Although SOAP messages are designed to be the same for both request and response, SOAP::Lite is implemented in two parts: a client and a server. The server accepts SOAP requests and creates a result based on Perl modules provided to the SOAP::Lite module. Listing 18.1 is an example of a SOAP-based headline server implemented using SOAP::Lite's Common Gateway Interface (CGI)-style server.

Listing 18.1 Simple SOAP::Lite Headline Server

01 #!/usr/bin/perl -w
02 use strict;
03 use SOAP::Transport::HTTP;
04 
05 SOAP::Transport::HTTP::CGI
06  -> dispatch_to('Headlines')
07  -> handle;
08 
09 
10 package Headlines;
11 
12 sub get_headline 
13 {
14  my $class = shift;
15  my $source = shift;
16 
17  my %headlines = 
18  (
19   'Slashdot' => {title => 'Perl Causes Warts',
20          url  => 'http://www.slashdot.org/',
21          synopsis => 'Jon Katz uncovers a Perl scandal.'},
22   'PerlWeek' => {title => 'SOAP and You',
23          url  => 'http://www.perlweek.com/',
24          synopsis => 'An overview of the personal impact.'},
25  );
26 
27  return $headlines{$source};
28 }

Lines 03–07 of Listing 18.1 are the core of the SOAP interface. Line 03 loads the SOAP::Transport::HTTP module–a part of the SOAP::Lite bundle that implements the HTTP interfaces. Line 05 starts the call to the SOAP::Transport::HTTP::CGI class, line 06 calls the dispatch_to method, and line 07 calls the handle method to handle the incoming request. dispatch_to is used to declare classes that are made available to the SOAP::Lite server. In this case, the Headlines package defines all the methods available to SOAP requests. Other packages and modules could be added to the dispatch_to parameters to provide multiple classes, and package directories can be specified as well.

Lines 10–28 of Listing 18.1 define the simple package Headlines, which has one subroutine, get_headline. The get_headline subroutine takes one argument and returns a hash reference with keys for the title, url, and synopsis of an article. When the server receives a SOAP request destined for the program in Listing 18.1, SOAP::Lite does the following:

Deserializes the request from a SOAP envelope into a usable format
Determines which class the request is being called against (Headlines)
Calls the subroutine (get_headline) corresponding to the SOAP method
Serializes the result (a reference to a hash containing article details) into a SOAP response envelope

SOAP is a protocol that can be used over many transports, including HTTP, FTP, and POP3/SMTP. SOAP::Lite correspondingly provides a transport class for each transport. In addition, SOAP::Lite provides a number of different servers for use in a Web context, including a mod_perl server, an Apache module, a CGI-style server, and a standalone server that implements its own network daemon.
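As a sketch of the standalone option, the same Headlines class can be served by SOAP::Lite's HTTP daemon instead of the CGI-style server. The port number is an assumption, and the headline data is abbreviated from the earlier listing.

```perl
use strict;
use warnings;
use SOAP::Transport::HTTP;

package Headlines;

# Same interface as the CGI example: one method returning a hash reference.
sub get_headline {
    my ($class, $source) = @_;
    my %headlines = (
        PerlWeek => {
            title    => 'SOAP and You',
            url      => 'http://www.perlweek.com/',
            synopsis => 'An overview of the personal impact.',
        },
    );
    return $headlines{$source};
}

package main;

# Start the daemon only when asked: perl headlines_soap.pl serve
if (@ARGV && $ARGV[0] eq 'serve') {
    my $daemon = SOAP::Transport::HTTP::Daemon
        ->new(LocalPort => 8080)
        ->dispatch_to('Headlines');
    print "SOAP server listening at ", $daemon->url, "\n";
    $daemon->handle;    # serve requests until the process is stopped
}
```

Only the transport class changes; the Headlines package is the same code that the CGI-style server dispatches to.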

A SOAP::Lite Client

SOAP::Lite clients connect to SOAP servers and process server methods with Perl data types given as arguments. Listing 18.2 is an example of a SOAP-based headline display client designed to be incorporated into a dynamic Perl Server Pages (PSP) page.

Listing 18.2 Simple SOAP::Lite Headline Client

01 <tag name="headlines" accepts="source">
02 
03 <perl>
04 use SOAP::Lite +autodispatch => 
05  uri  => 'http://localhost/Headlines',
06  proxy => 'http://localhost/cgi-bin/headlines.cgi';
07 
08 my $h = Headlines->get_headline($source);
09 </perl>
10 
11 <output>
12 <p><b><a href="$h->{url}">$h->{title}</a></b>
13 <br />$h->{synopsis}</p>
14 </output>
15 
16 </tag>

Line 01 of Listing 18.2 declares a PSP tag called headlines that accepts an attribute called source, which is available to the tag's code as $source. Lines 04–06 set up the SOAP::Lite interface. Line 04 invokes the autodispatch feature of SOAP::Lite, which treats remote classes as though they are local Perl modules. Line 05 declares the namespace of the class being called on the server, which generally is defined by the server itself. In this case, the namespace is http://localhost/Headlines because the Headlines class is being called from the server at localhost. Line 06 declares the endpoint of the SOAP server, which, in this case, is the URL at which the server program can be found.

Line 08 calls the get_headline method of the Headlines class from the server, which was defined in Listing 18.1. This class and method would be called in exactly the same way, no matter how it was implemented. The deserialized result of the get_headline method is saved to the variable $h. Lines 11–14 display the results as though $h were any other Perl hash reference. The tag created by this definition could then be called from any PSP page:

<headlines source="PerlWeek" />

As line 08 of Listing 18.2 illustrates, SOAP servers can be used in a Perl program as though they were repositories of Perl modules. For e-commerce systems, this provides a novel way to implement functions such as credit-card authorization, shipping calculations, and any other library functions that need a simple interface to remote systems. Eventually, interfaces to common services can be created by providers in the same way that Web sites are created. These Web services provide a standardized, distributed set of functions that can be incorporated into any network-enabled application. (See Chapter 19, "Web Services," for more applications of this idea.)
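Outside a PSP page, the same call can be made from an ordinary Perl script. The following sketch uses SOAP::Lite's explicit object interface rather than autodispatch so that a SOAP fault can be checked before the result is used. The endpoint and namespace repeat the values from the client listing, and the fetch_headline helper is invented for this example.

```perl
use strict;
use warnings;
use SOAP::Lite;

# Call get_headline on the remote Headlines class and return the hash reference.
sub fetch_headline {
    my ($endpoint, $source) = @_;
    my $som = SOAP::Lite
        ->uri('http://localhost/Headlines')    # namespace of the remote class
        ->proxy($endpoint)                     # URL of the SOAP server
        ->get_headline($source);

    die 'SOAP fault: ' . $som->faultstring . "\n" if $som->fault;
    return $som->result;
}

# Run only when a source name is given on the command line.
if (@ARGV) {
    my $h = fetch_headline('http://localhost/cgi-bin/headlines.cgi', $ARGV[0]);
    print "$h->{title}\n$h->{url}\n$h->{synopsis}\n";
}
```

The explicit form is slightly more verbose than autodispatch, but it keeps remote calls visibly distinct from local ones.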

Sidebar: Paul Kulchenko

One of Perl's strongest aspects is the community that surrounds it, especially the many people who contribute updates, fixes, and new modules. The Perl community has shown time and again that it can write excellent software as a group without needing a controlling corporation to drive the process. If there's a single shining example of the community that makes Perl a robust language, it's Paul Kulchenko. In SOAP::Lite, Paul created a module that exemplifies the simplicity, flexibility, and intuitiveness of Perl. Above all, he wrote it single-handedly.

The SOAP interface originally developed for Perl was awful. It was complex, half-finished, and it didn't do its job. That's really the worst a Perl module can do. I first ran across it when I was tasked with adding SOAP capabilities to an XML product. I had looked over the SOAP protocol and was suitably confused by it. So, I was hoping for a simple, easy-to-implement SOAP module that would abstract away most of the extraneous aspects of SOAP so that I could concentrate on the connection to my server. Unfortunately, what I found was a module that was unusable–literally. It would fail when invoked with SOAP. Ordinarily, this would be an excuse to dive under the hood and make my own fixes, but the code in the SOAP module was difficult to follow. I finally gave up on it and decided to write my own.

I was only a few days into my own SOAP module when I heard about SOAP::Lite. Paul Kulchenko had noticed the abysmal state of the SOAP module as well, and decided as I did to start fresh with a simpler implementation. Paul's module was truly simple to use, and I was able to implement the interface I needed using a very early version of it–version 0.36, I believe. Since then, Paul has added a host of new features–all of which work beautifully–in response to user requests. His module has been cited as the epitome of SOAP implementations, and it's given Perl a leg up on the SOAP-based landscape of the next generation Web. I just hope that Paul gets suitably rewarded for his efforts and that more Perl enthusiasts like him follow in his footsteps.

Summary

Perl and XML provide a good infrastructure for building B2B interfaces between business partners and other organizations. The possibilities for B2B integration are endless–purchase cycles, content mirrors, and content syndication are just three of them. Custom B2B interaction languages can be created, or a standard such as XML-RPC or SOAP can be used to integrate RPCs from remote machines as though they were local Perl modules.