The Power of Persistence
part of Perl for the Web
Chapter 8, "Performance Myths," covered the aspects of Web applications that won't help the performance of a CGI application, but it raises this question: "Where can Perl performance be improved on a Web server?" The answer is simple: don't exit. Another word for this idea is "persistence"; a persistent program is one that hangs around as long as it's needed without having to restart. Each new request to a persistent program is handled as part of a continuous stream instead of in fits and starts.
Implementing the solution is trickier than it looks. The core assumption of the Common Gateway Interface (CGI) and CGI programs is that the program is called, it responds to the call, and it is tossed away. Thus, the first thing to do when setting up persistence for a Web application is to replace the CGI interface entirely. CGI isn't set up to work in a persistent form. CGI has no way of accessing a program that's already running because it assumes it is starting a program. The replacement for this protocol has to provide the same bridge between the Web server and Web applications that CGI does, with the new assumption that it always is accessing programs that are already running. After that happens, it's still necessary to set up the persistent application itself. Because Web applications written for use with CGI assume that they are started fresh for a specific request, it's likely that an application that works well in a CGI environment needs some work to be made persistent.
There are a few ways to implement a persistent interface between a Web server and applications. The Web server usually provides one of its own, known as the server application program interface (API), which treats applications as extensions of the server process itself to be started at the same time. Another way of implementing a persistent interface is by using a Web server extension to start and manage external programs as they are accessed by incorporating them into the extension at runtime. Perl makes this possible by providing a mechanism by which external programs can be compiled into a Perl program while it's running. A third way to achieve persistence is by using a network-like protocol to communicate between the Web server and an external process, which contains the Web application. Additional forms of persistent Perl applications can be created using a mix of these three techniques as well.
These changes don't just affect one aspect of the CGI performance bottleneck. When applications are persistent, it's possible to tune performance even further by taking real advantage of the hardware and environment. Persistent applications can be clustered to use memory more efficiently, cached to provide instant response, and balanced across physical servers to provide high availability. Additional aspects of the applications, such as database access, can be made persistent after the applications themselves are persistent. We discuss this in Chapter 14, "Database-Backed Web Sites."
Basically, Don't Exit
The principle behind persistent applications is simple, but it's not simple enough to go without some explanation. In fact, for most Perl programmers, the idea of continuing after a request is fulfilled is foreign. Perl programs generally are written with a straight path from beginning to end, aside from a few loops and conditionals. Even system utilities written in Perl are designed to run once, from beginning to end, to provide one result, and to exit. This way of behaving comes from Perl's roots as a command-line scripting language for UNIX systems; many UNIX programs simply accept arguments, process them, and return the result before exiting.
Interactive programs don't operate this way. Familiar programs, such as Microsoft Word or the GNU Image Manipulation Program (GIMP), load all their program code into memory when the program starts, and then execute program sections as they are needed. They do this by way of an event loop, which is basically a program loop that doesn't specify the conditions for its own end. Instead, it continuously checks for the occurrence of system events, including keystrokes, mouse clicks, and network requests. Events usually cause parts of the program to be executed, but in some cases, events are handled by additional code called plug-ins, often provided by third-party vendors. These program modules are loaded into memory with the program at runtime and serve as extensions to the core functions of the program.
GIMP and Word Revisited
As stated in Chapter 5, "Architecture-Based Performance Loss," common programs with graphic user interfaces (such as GIMP and Word) would be unusable if they used the CGI model for handling user requests. If either program ended after every keystroke or mouse action and started up again with each new one, the user would spend most of his or her time waiting for the program to perform even simple tasks. The program would have great trouble keeping track of state information, too; any action the program performed would have memory only of the steps performed during the current request. As a result, external data structures would be needed to store information about the current size of the main window, the position of the cursor in a document, or even the current document being worked on. Writing an application that works this way would be ludicrous; thus, most applications that interact with a user are persistent as a matter of course.
How do these programs get consistent performance? Basically, they stay put. Very little of the program's structure is dependent on any one action, so the core of the program can stay constant while specific routines take care of incoming requests. This is very different from a CGI application, which can't be very complex. CGI applications have to start from scratch with each request, so they're usually written to perform a single task with no data structures shared between instances or between different aspects of the interface seen by a user.
Persistence improves these applications in many ways. One thing a persistent application can count on is the capability to set up complex or time-consuming data structures only once (usually when the program starts up) and have them available from that point forward. Word, for instance, needs to load a dictionary for spell checking, which might take up megabytes of storage either on disk or in memory. It would be prohibitive to create such a massive data structure every time spelling is checked, especially if it's checked after every new word is typed. Instead, the entire dictionary is loaded into a standard memory structure before the program starts executing, so any further action taken has that structure at its disposal without having to create it. Many such data structures can be created when the program starts up, and each one enriches the possibilities of what a program can do as it's running.
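The dictionary case reduces to a short sketch: build the structure once at startup, then consult it cheaply on every check. The word list here is a tiny stand-in for a real dictionary file:

```perl
use strict;
use warnings;

# Pay the setup cost once: build a large lookup structure at
# startup, then consult it on every request without rebuilding it.
# A real program would read these words from a dictionary file.
my %dictionary = map { $_ => 1 } qw(persistence daemon handler server);

sub spell_ok {
    my ($word) = @_;
    return exists $dictionary{ lc $word };   # constant-time lookup
}
```

In a persistent process, %dictionary is built exactly once; every subsequent call to spell_ok() reuses it, whereas a CGI-style program would rebuild the hash on each request.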
Another difference between Word or GIMP and a CGI application is that the former can take much longer to compile without incurring performance penalties at runtime. GIMP, for instance, can be compiled once when new versions are released, and then used a thousand times before the next version is compiled. Because of this, compilation can take much longer than a user is willing to wait for the program to start because compilation happens separately from runtime. For a Perl program, this isn't possible; Perl assumes that a program is compiled from source code every time it is run. However, it is possible to use the same principle when running Perl applications in a Web server environment. As long as an application is compiled before the user first interacts with it, that application can have the same benefits of a long compile time that a compiled program enjoys. This idea is explored in greater detail in the "More Power: Clustering, Precaching, and Balancing" section later in this chapter.
Word and GIMP also provide the means for additional code to be run within the same persistent environment. This gives users and third-party vendors the ability to write simpler programs that take advantage of the environment provided by the program to perform more complex tasks. Plug-ins for GIMP, for instance, can use any GIMP tool by invoking its corresponding function call from the plug-in scripting language. The languages GIMP provides for these plug-ins include C, a custom language called Script-Fu, and Perl. Using these languages, plug-ins can be written that perform complex operations on the current document using the GIMP tools to achieve an overall effect. A plug-in written to create drop shadows behind an image, for instance, might invoke GIMP functions to duplicate the image, fill the duplicate with black, and then blur the duplicate. Because the plug-in is running persistently in the same environment as GIMP, it appears to the user as though the plug-in is simply another function provided by GIMP.
The Event Loop
The way most programs handle the simple task of continuing to run is through an event loop. The event loop is a continuous loop that cycles through some trivial task while waiting for a request to come in from the keyboard, mouse, or other input stream. GIMP, for instance, gladly sits and does nothing if no user input is given to it; however, it springs to life the moment a key is pressed or the pointer is moved or clicked. Each keystroke or click is then dealt with as though it were a singular event, with a specific set of input arguments and a specific result. For instance, clicking the About menu choice when using GIMP is a single event that causes a window containing program information to be displayed. Most event loops are designed to process many events at essentially the same time, either by keeping a list of events that need to be processed or by processing them in parallel through simultaneous subprocesses called threads.
To deal with events as they occur, each possible event is assigned a subroutine or program section called an event handler, which is responsible for performing an action. For instance, the event handler assigned to the About menu choice event might be a subroutine called show_about_box() that displays a window with a graphic and some text. Event handlers can perform more complex tasks as well, including changing the program environment and operating on the current document. In fact, event handlers do most of the work of a program such as GIMP because almost all program functions are the result of events.
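The dispatch-table idea behind an event loop and its handlers can be sketched in a few lines of Perl. The event names and handlers here are hypothetical stand-ins, and the loop walks a fixed queue where a real one would block waiting for input:

```perl
use strict;
use warnings;

# Each possible event type is assigned an event handler in a
# dispatch table. These names are hypothetical stand-ins for the
# keystrokes and menu choices a program like GIMP processes.
my %handler = (
    'menu.about' => sub { "About this program" },
    'key.press'  => sub { my ($event) = @_; "typed: $event->{key}" },
);

# The event loop: dispatch each event to its registered handler.
# Events with no handler are ignored. A real loop would block
# waiting for the next event instead of walking a fixed list.
sub run_event_loop {
    my @results;
    for my $event (@_) {
        my $code = $handler{ $event->{type} } or next;
        push @results, $code->($event);
    }
    return @results;
}
```

Registering a plug-in, in this model, amounts to adding one more entry to the dispatch table at runtime.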
As discussed in the previous section, plug-ins for a program such as GIMP are treated as parts of the main program. In fact, each plug-in serves as a special kind of event handler, which is invoked when the corresponding menu choice is made and given the current document as a working environment. Plug-ins might be written in a language other than the main program, though, so it's necessary for common program functions to be translated into the plug-in language and made available to the plug-in environment. This set of program functions is known as an API, which comprises the complete set of functions any plug-in can perform. In the case of GIMP, the API is integrated into the Script-Fu plug-in language and offered as a set of documented Perl modules that can be used when writing a GIMP plug-in in Perl. The API encapsulates all the functionality of the main program that is accessible by each plug-in, plus the requirements of a plug-in program that enables it to be called by the main program as an event handler.
A Web Application as a Single Plug-In
One way to get persistence in a Web application is to write a Web application in the way in which you would write a plug-in module for GIMP, with the Web server taking the role of the central program. Like a plug-in for GIMP, a Web server plug-in serves as an event handler for events received by the main program. The Web server itself serves as an event loop, processing requests as they come in (from the network in this case, rather than from a keyboard or mouse) and then distributing the requests to handlers assigned to each class of events. Web applications written in this fashion are accessed the way in which the Web server itself is accessed. The new code becomes an extension of the Web server and becomes persistent as a result.
Of course, rewriting the Web server as part of the application (instead of the other way around) is possible, but not advised. Web servers such as Apache have been optimized over the course of years to serve Web pages and related files as quickly as possible. This functionality would be difficult to duplicate in a Web application, and it's likely that the new implementation would be significantly slower at serving the Web application. In addition, standard Web files, such as static pages and images, would likely be served much less efficiently than a dedicated Web server could manage, so the overall performance of the Web site would suffer appreciably.
The Web Server API
A plug-in Web application is integrated into the Web server through an API that is like the one used by GIMP. Different Web servers have different APIs, including Apache API, Netscape Server API (NSAPI) for Netscape and related servers, and Internet Server API (ISAPI) for Microsoft Internet Information Server (IIS) and related servers. Plug-ins written to address an API can rely on the Web server to listen for network traffic and route that traffic accordingly, invoking the plug-in when necessary to handle incoming request events that warrant it. In turn, the plug-in is responsible for retrieving the request and related environment information and returning the result through the same API.
Because Web server applications use the server process as an event loop, they can rely on some basic assumptions:
- The Web server handles each request as it comes in, whether by threading the request handlers or by keeping a list of requests to be processed. As a result, no requests are lost due to a busy handler.
- The Web server calls plug-in code as needed to process requests, with all information needed to process that request provided by the given environment. As a result, no additional programming is needed to provide network listeners or similar constructs.
- The Web server provides an internally consistent set of methods for accessing information about the request and the system environment. The server also provides standard functions for modifying and delivering the response to the client.
One benefit to writing Web applications as plug-ins to the Web server is the tight integration this affords. Complex applications that take up a large fraction of the requests a Web server processes are good candidates for this kind of implementation because more of the Web server's tasks can be taken over and customized by the plug-in. For instance, an application might be designed to respond to requests for HTML pages by translating XML files on disk into HTML. (Similar applications are discussed in Chapter 16, "XML and Content Management.") For this application, it would be useful to override the Web server's default way of translating page request URLs into filenames. You would do this by processing those URLs indirectly within the Web application plug-in itself. The Web server API is likely to provide functions that make this possible without requiring the plug-in to perform other functions provided by the Web server.
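With a Perl-accessible API such as the one mod_perl provides, overriding the server's URL-to-filename translation might be sketched as follows. The module name, directory layout, and inline constants are hypothetical stand-ins (a real module would import OK and DECLINED from the server API rather than defining them):

```perl
package Apache::XMLTrans;
use strict;
use warnings;

# Stand-ins for the constants a real server API (such as mod_perl's
# Apache::Constants) would provide; defined inline so this sketch
# stands alone.
use constant { OK => 0, DECLINED => -1 };

# A URI-translation handler: requests for .html pages are mapped
# onto XML source files in a hypothetical /var/www/xml tree,
# overriding the server's default filename translation.
sub handler {
    my $r   = shift;            # the request object
    my $uri = $r->uri;
    return DECLINED unless $uri =~ s/\.html$/.xml/;
    $r->filename("/var/www/xml$uri");
    return OK;
}

1;
```

Returning DECLINED tells the server to fall back to its normal translation for requests this handler doesn't cover, so static files are still served the usual way.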
On the other hand, that tight integration is also a major drawback to writing Web applications as Web server plug-ins. Because plug-ins are written to integrate with a specific Web server API, most applications written this way are incompatible with any other Web server API. In fact, some Web server APIs change from one version of the server to the next, so Web applications written to one version of the API do not work under subsequent versions of the same server. This incompatibility is worse for plug-ins written in compiled languages because they are likely to be incompatible even with the same Web server running on a different platform. All these incompatibilities can provide a major obstacle to deploying a Web application in a diverse environment, and they can make the application a barrier to upgrading the Web server architecture at a later time.
Another drawback to writing plug-ins based on an API is the steep learning curve for programmers with a CGI background who make the switch to API programming. The Web server API provides a fine grain of control over all the Web server's functions, but inherent in this control is the expectation that the programmer specifies explicitly every aspect of the Web application's interaction with the Web server and the network. Most CGI-style programmers (especially those who use Perl) are used to environments that assist application development by preprocessing input variables and buffering and post-processing output. These programmers sometimes balk at a whole new layer of interaction that has to be addressed by the Web application. Even if Web programmers are up to the task, the relative merits of better performance or shorter development cycles should be weighed before deciding to develop a Web server plug-in. (Other alternatives are discussed in the "Web Applications in One Modular Program" and "Web Applications as Persistent Separate Programs" sections later in this chapter.)
Apache API, NSAPI, and ISAPI
The Apache Web server, the most popular server at the time of this writing, provides an API in the C language. Apache plug-in modules can be incorporated into the server in one of two ways: at compile time, or at runtime as a Dynamic Shared Object (DSO). After Apache is compiled for a specific system, it is possible to provide modules that are compiled directly into the program. Because it's not always feasible to recompile the entire Web server when a plug-in Web application changes, Apache provides the ability to write plug-ins as DSOs, which are compiled as separate object modules and relinked to the Web server at runtime. The DSO interface does not provide any additional portability, however, so DSOs are not exempt from the main drawback of API-dependent plug-ins. An Apache DSO module still has to be compiled against the precise version of Apache with which it is used because the server API is likely to change between versions.
A third way to write plug-in modules for Apache is provided indirectly by the mod_perl module. This module provides the entire Apache API as a set of Perl functions, similar to the way in which GIMP provides its API to Perl. With the Perl version of the API, it's possible to write Perl plug-in modules for Apache, which are compiled into the Web server at runtime. These plug-ins are treated the same as DSO or compiled C plug-ins, so they have most of the same benefits and drawbacks as any other API-dependent application. (mod_perl is discussed in greater depth in Chapter 10, "Tools for Perl Persistence.") One benefit to writing plug-in modules in Perl, however, is the portability across operating systems that Perl offers: plug-ins written in Perl for Apache under Linux, for instance, probably work without modification under FreeBSD. Portability across versions also is more likely because the process of translating the server API into Perl functions adds inertia to the interface and potentially slows the pace of changes. Neither of these benefits is set in stone, however, because the choice of when to make changes still is up to the API designer.
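Under mod_perl, a minimal plug-in module along these lines might look like the following sketch. The module name is hypothetical, and the OK constant is defined inline here (mod_perl supplies it through Apache::Constants); the counter illustrates state that survives between requests in a persistent process:

```perl
package Apache::Counter;
use strict;
use warnings;

# Under mod_perl, OK would be imported from Apache::Constants; it
# is defined inline here so the sketch stands alone.
use constant OK => 0;

# This lexical survives between requests because the module is
# compiled into the server process once, not once per request.
my $count = 0;

# The event handler mod_perl calls for each matching request.
sub handler {
    my $r = shift;                       # the Apache request object
    $count++;
    $r->send_http_header('text/plain');
    $r->print("Request number $count served by process $$\n");
    return OK;
}

1;
```

Assigning the module to a class of requests is then a matter of server configuration (for example, a PerlHandler directive for a virtual directory), not additional code.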
Assigning an Apache plug-in module to a class of events is done by configuring the Apache server to use the module for a defined group of requests. A module might be assigned to handle requests in a particular virtual directory, or it might be assigned to handle any request with a particular file extension in the URL. One familiar module that works in both ways is mod_cgi, the module that processes CGI requests for the Apache server. The mod_cgi module usually is assigned to handle any requests to the virtual cgi-bin directory, and it might also be configured to handle any request for files with the .cgi extension. As a result, both of the following requests would be handled by mod_cgi:
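Assuming a typical configuration (the directives, paths, and hostnames here are hypothetical), requests of both forms reach mod_cgi:

```apache
# mod_cgi handles everything under the cgi-bin virtual directory...
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
# ...and any file ending in .cgi, wherever it lives
AddHandler cgi-script .cgi

# Both of these requests would then be handled by mod_cgi:
#   http://www.example.com/cgi-bin/lookup.pl
#   http://www.example.com/pages/lookup.cgi
```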
Like Apache, the Netscape line of Web servers (sometimes called iPlanet servers) provides a server API for plug-in applications. The programming API for this server is appropriately named the Netscape Server API, or NSAPI. NSAPI modules are implemented similarly to Apache API modules, with the Netscape server providing the event loop and the plug-in modules serving as event handlers. At the time of this writing, NSAPI servers do not provide a Perl interface to the API, so the NSAPI interface is tangential to the scope of this book. However, some solutions for persistent Perl on Netscape and iPlanet servers are listed in Chapter 10.
The Microsoft IIS also supports a similar plug-in programming interface, called ISAPI. Again, at the time of this writing, no ISAPI servers have a direct interface to the API in Perl, but persistent Perl solutions for ISAPI servers are detailed in Chapter 10. Incidentally, support for ISAPI is slated to be included in version 2.0 of the Apache Web server for Windows.
Threading and Perl 5.6
Threading is a method of event loop processing by which events can be handled by the same process at the same time by executing program sections in parallel. The Netscape and Microsoft IIS Web servers are threaded, and the NSAPI and ISAPI interfaces also enable plug-in code to be threaded. As of version 2.0, the Apache Web server enables threading within the server process.
At first glance, threading a Perl environment would seem to be an excellent idea. Threads would increase the amount of compiled code shared between processes and would reduce the overall memory requirements of Perl processes. It would be simpler to create shared data structures if it could be assumed that all requests would be handled within the same memory space. Also, persistent connections could be pooled between threads to reduce the total number of connections necessary at any given time. Overall, the benefits of a threaded model are very enticing.
However, the threading model requires both the Perl environment and all code running within it to be thread-safe code, or code that behaves consistently with or without threads. Support for threaded code is present in the current version of Perl (5.6 as this is written), but Perl threading still is considered experimental and many modules are not written to be thread-safe. This is due in part to the fact that threading models are platform-specific, so duplicating the same threading behavior in all varieties of Perl has been difficult. Some modules have had success in cleaning out any code that isn't thread-safe, but not all modules are thread-safe for all platforms. Because thread safety in Perl programs can be a concern in terms of application stability, it's generally advised to avoid threaded Perl environments for Web applications until Perl threading support is more mature.
Web Applications in One Modular Program
Another way to achieve persistence in a Web application is by writing a single Web application in parts and incorporating the parts into the main process at runtime.
This method enables each program section to be written more simply than a combined Web application based on the Web server API would be written. Aspects of persistent API programming that are common to each application can be written once in a core translation module, and additional code for specific functions can be written in separate programs that are combined implicitly at runtime. For instance, Perl programs written to address the standard input and standard output devices can be combined with a translator application, which preprocesses request input through the server API, buffers the standard output, and post-processes that output back into API calls. In fact, there are already Perl modules, such as Apache::Registry, that perform this task.
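The output-buffering half of such a translator can be sketched with an in-memory filehandle (available in recent versions of Perl; Apache::Registry achieves a similar effect by tying STDOUT to the server API):

```perl
use strict;
use warnings;

# Run a CGI-style block of code that prints to standard output, but
# capture that output in memory so a translator can post-process it
# and hand it back through a server API instead.
sub capture_output {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Can't open in-memory file: $!";
    my $old = select $fh;       # redirect default output to the buffer
    $code->();
    select $old;                # restore the previous output handle
    close $fh;
    return $buffer;
}

# A stand-in for a CGI-style program that just prints its response.
my $output = capture_output(sub {
    print "Content-type: text/html\n\n";
    print "<p>Hello from a CGI-style program</p>\n";
});
```

The CGI-style code runs unmodified; only the plumbing around it knows the output is being collected rather than sent straight to a client.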
This also enables many different Web applications to be combined into a single persistent application, which enables the development of a persistent Web application to more closely follow the way in which CGI-based applications and Web sites are designed. As discussed in the "The Nature of CGI" section in Chapter 5, CGI applications are developed using the same one-file-per-request paradigm that governs HTML files. Because each page is developed to address one request, it would be unwieldy to redevelop these individual applications into a monolithic application that handles classes of requests based on factors other than the page URL. If this type of application didn't rely on Web conventions for filenames and directory structures, an alternative organization would have to be developed anyway.
The API interface to a Web server can be accessed directly by writing plug-in modules, or it can be used indirectly by incorporating individual CGI-style programs into a larger Perl program, which then can be used directly as a plug-in module. The key to this translation is the capability of Perl programs to rewrite another Perl program as a subroutine and compile the subroutine into the main program. Perl provides the eval keyword to accomplish all sorts of program modification and recompilation at runtime. Compiling code with eval at runtime makes it possible to process the source code of a Perl program as though it were any string, modify it for use as a Perl subroutine, and then compile it into the main program as though it had been present all along.
In addition, Perl namespaces provide a convenient way to keep track of code that has been compiled into the program environment. After modification, each subroutine generally is placed in a package named for the original program file. This enables code to be reused by accessing the same subroutine in the same namespace again, which provides the full performance benefits of persistence without having to explicitly write code that accesses the API directly. By adding an additional processing step, which translates non-Perl structures into Perl code before compiling, this method also can be used to process template code or Perl code embedded into HTML pages. Applications of this method, such as Apache::Registry, HTML::Mason, and VelociGen, are discussed in more detail in Chapter 10.
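The wrap-and-compile technique can be sketched as follows; the package-naming scheme and the handler wrapper are simplified stand-ins for what modules such as Apache::Registry actually do:

```perl
use strict;
use warnings;

my %compiled;   # packages whose source has already been compiled

# Read a CGI-style script off disk, wrap it in a subroutine inside
# a package named for the file, and compile it once with eval.
# Later calls reuse the compiled subroutine instead of recompiling.
sub run_script {
    my ($filename) = @_;

    # Derive a package name from the filename, so each script gets
    # its own namespace.
    (my $package = $filename) =~ s/\W/_/g;
    $package = "Persistent::$package";

    unless ($compiled{$package}) {
        open my $fh, '<', $filename or die "Can't read $filename: $!";
        my $source = do { local $/; <$fh> };
        close $fh;

        # Wrap the program text in a named subroutine and compile it
        # into the running program.
        my $wrapped = "package $package;\n"
                    . "sub handler {\n$source\n}\n1;\n";
        eval $wrapped;
        die "Compilation of $filename failed: $@" if $@;
        $compiled{$package} = 1;
    }

    no strict 'refs';
    return &{"${package}::handler"}();   # call the compiled code
}
```

The first call pays the compilation cost; every call after that is a plain subroutine call into code that is already resident in memory.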
Monolithic Application Drawbacks
Even though the source files for this method of persistence might be stored separately on disk, the result at runtime is in fact a single running program, known as a monolithic application. This kind of application architecture is closer to the single-program architecture found in Web applications written for a specific server API than it is to the multiple-program architecture described in the "Web Applications as Persistent Separate Programs" section of this chapter. This is true even if the program changes over time and loads sections of itself interactively as new page types are requested from the network. The server process still shares one memory space between all parts of itself and requires no communication protocols other than the server API. The program is a monolithic application even if many copies of the server process are running at any one time because each copy has the potential to be a duplicate of all the others. In addition, the processes can't communicate with each other after they have been started.
Keeping the program together as a monolithic application can be efficient in terms of system resources. However, monolithic applications can have serious drawbacks when used in a Web environment. Because each process has the potential to duplicate the entire Web application, it's possible to have server processes that are all very large due to the size in memory of a few functions.
Also, monolithic applications (especially those made up of many smaller programs compiled in at runtime) have the potential to be unstable if any of the smaller subprograms or plug-ins crashes or behaves erratically. For instance, a Perl program incorporated into a server through Apache::Registry might be modified incorrectly during application development or maintenance, inadvertently introducing an infinite loop or memory leak that causes the program to behave erratically. Because the misbehaving subprogram is compiled into the Web server itself, crashing the subprogram causes the Web server itself to crash, along with any other Web applications that have been compiled into the server process. Worse yet, when the Web server restarts after a crash, it loads the same subprogram again in response to the first Web request that calls for it. Because all copies of the Web server are being disrupted by the misbehaving code, it's possible that all other requests to the server are being ignored or mishandled while the server crashes and restarts.
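One common partial defense is to call each compiled subprogram inside eval, so a fatal error is trapped instead of taking down the server process. Note that this contains a die, but not an infinite loop or a memory leak:

```perl
use strict;
use warnings;

# Call a compiled subprogram, trapping fatal errors so that one
# misbehaving handler can't crash the process hosting all of them.
sub call_handler {
    my ($handler) = @_;
    my $result = eval { $handler->() };
    if ($@) {
        warn "handler failed: $@";        # log it and keep serving
        return "500 Internal Server Error";
    }
    return $result;
}
```

Persistence environments such as Apache::Registry wrap compiled scripts in much this way, which is why a die in one script produces a server error page rather than a dead server.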
Another drawback to monolithic application architectures is the inability to separate out sections of the Web application for independent processing. For large-scale sites with broad areas of focus, a Web application covering all the different functions of the site might become unwieldy to load entirely into one Web server process. A Web application that keeps an in-memory cache of parsed Extensible Markup Language (XML) structures, for instance, might quickly grow to fill available memory, causing the application to use virtual memory on disk and perform poorly. This situation is made worse when the Web server uses multiple processes because dozens of processes might each create a duplicate of the oversized program. It would be helpful in this case to split the Web application into independent sections, each of which processes a portion of incoming requests and loads only a fraction of the subprograms required by the entire Web application. However, this is difficult to set up within a monolithic application because decisions as to which Web server process handles a specific request are usually made before any modules are executed, so modules themselves can't route network requests within the process.
It should be noted that application environments that provide persistence by incorporating smaller applications into the Web server process already are available, so it's generally not necessary to develop an environment like this from scratch. These tools are detailed in Chapter 10.
Web Applications as Persistent Separate Programs
In the UNIX world, a program that runs continuously and that waits for external processes to access it to perform tasks is called a daemon. Similar constructssometimes called listeners or serversexist for any networked operating system and provide the same functionality, regardless of what they're called. For example, the Web server process itself is a daemon, usually listening on port 80 for requests from a Web client. Other common network daemons include Telnet for remote terminal access, SMTP and POP for email transfer, and FTP for file transfer. Each listens at a different port for incoming traffic. The ports used by these applications aren't set in stone; Web servers are known to listen to ports from 82 to 8000, and responses to requests usually are sent out to ports in the 10,000 range on the client machine.
This flexibility enables new daemons to be created that provide services that are not so common, but that are likely to be used more by local applications. A database server such as Oracle, for example, serves as a daemon for client processes accessing the database over a network. The Oracle server provides a standard interface to the database assigned to a specific port, such as 1521, and client programs connect to that port when it's necessary to retrieve data. This method provides persistence in contrast to simpler database systems, such as dBASE or Microsoft Access, which invoke the database server off disk whenever the client program is launched. The network connection between client and server need not take place between two machines that are physically separated on the network; in many cases, the client and the daemon are on the same machine, and the network simply provides a standard interface between the two.
A Web server can be considered a client as well because the Web server process can execute code that connects across the network to another daemon. For instance, in our Oracle example, the database server might be accessed by code running within a Web server, which in turn is processing a request sent by another client. This leads to a chain of persistent network daemons and provides another way to separate functional components of an application while maintaining the persistence of the application as a whole. In contrast, if the database server functionality needed to be incorporated into the Web server along with network interaction and data processing code, the application as a whole would quickly become unwieldy, and it would be difficult to add robustness to either the Web server part of the application or the database part without negatively impacting the other part. Similarly, if the client had to access the Web server and the database server independently, the overall utility of the endpoint application would be greatly limited. Combining a single client endpoint (the Web server) with a chain of more robust services (the Web server and the database, among others) enables application designers to concentrate on the task at hand without worrying unduly about how to integrate other applications into the endpoint.
The daemon idea can be extended further to provide persistence to Web applications. Just as the database server was kept separate from the Web process to enable each program to become independently robust, the Web application process can be separated out from the Web server to enable a more robust Web application while keeping a persistent link to the Web server process. Again, this provides a consistent interface to the client by providing a single endpoint for server interaction (the Web server) while using a chain of robust services (the Web server and the Web application, among others) to provide the body of the response. Because the Web application is separated from the Web server process, it's also possible to tune the performance characteristics of the Web application without having to modify the performance of the Web server. For instance, this provides a solution to the problem of threading, as mentioned in the "A Web Application as a Single Plug-In" section earlier in this chapter. The Web server process can be threaded for greater performance and uniform caching, even if the Web application runs as separate processes to provide more stability for Perl processing.
Most daemons listening at network ports are likely to require any incoming requests to adhere to a specified protocol. A Web server requires the Hypertext Transfer Protocol (HTTP), a file server requires FTP, and remote procedure call daemons might require the Simple Object Access Protocol (SOAP). In this usage, a protocol is the format of the data stream sent from the client to the server; it governs things like the format of meta-information, the location of the content length if necessary, and the symbols used to indicate the end of the data stream. Similarly, Web applications designed as stand-alone network daemons would need to specify a protocol for the Web server process to use when transferring request information and receiving response information. This protocol might be custom developed for a single Web application, or it could use one of the preexisting protocols, such as FastCGI, that were developed for just this purpose.
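To make the idea of such a protocol concrete, here is a toy wire format for passing request metadata and a body from the Web server to an application daemon: header lines, a Content-Length header, a blank line, then the body. This format is invented for illustration; it is deliberately simpler than FastCGI or SOAP.

```perl
use strict;
use warnings;

# Serialize a request as "Name: value" header lines terminated by
# CRLF, a Content-Length header, a blank line, and then the body.
sub encode_request {
    my ($headers, $body) = @_;
    my $message = '';
    for my $name (sort keys %$headers) {
        $message .= "$name: $headers->{$name}\015\012";
    }
    $message .= 'Content-Length: ' . length($body) . "\015\012";
    $message .= "\015\012$body";
    return $message;
}

# Parse a message back into a header hash and a body, trusting
# Content-Length to delimit the body as a real protocol would.
sub decode_request {
    my ($message) = @_;
    my ($head, $body) = split /\015\012\015\012/, $message, 2;
    my %headers;
    for my $line (split /\015\012/, $head) {
        my ($name, $value) = split /: /, $line, 2;
        $headers{$name} = $value;
    }
    $body = substr $body, 0, $headers{'Content-Length'};
    return (\%headers, $body);
}
```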
It's possible to set up a Web application daemon that runs Perl programs the same way Apache::Registry does. Because the application daemon is running persistently, it's possible to use the same techniques to process CGI-style Perl programs within a stand-alone daemon. The Web application daemon can accept preprocessed request information from the Web server and load the corresponding Perl application from the compiled cache to process the request. The Web server still handles the majority of Web requests, including static HTML and image files, and implements all the network interface code necessary to provide a robust Web site.
Just as in the case of an Apache::Registry module, a Web application daemon can translate each application into a subroutine in its own namespace, creating a fairly clean environment in which the application can allocate its own variables and create its own data structures. Unlike Apache::Registry, however, a stand-alone process can interact with any Web server that supports its protocol. This gives greater freedom to system administrators when deciding on server architecture because the Web applications no longer dictate a choice of Web server.
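As a sketch of that translation, the hypothetical load_script() below wraps a script's source in a handler subroutine inside a package derived from its path, compiles it once with eval, and caches the result so later requests skip compilation. The function names are illustrative, and source is passed in as a string for brevity where a real environment would read the script file and handle many more edge cases, as Apache::Registry does.

```perl
use strict;
use warnings;

my %compiled;   # cache of script path => package name

# Compile a CGI-style script into a handler subroutine inside its
# own namespace, then cache it under the script's path.
sub load_script {
    my ($path, $source) = @_;
    return $compiled{$path} if exists $compiled{$path};

    # Derive a package name from the script path.
    (my $package = $path) =~ s/\W/_/g;
    $package = "WebApp::$package";

    my $wrapped = "package $package;\n"
                . "sub handler {\n$source\n}\n1;\n";
    eval $wrapped or die "Compilation of $path failed: $@";

    return $compiled{$path} = $package;
}

# Dispatch a request to the cached, precompiled handler.
sub run_script {
    my ($path, $source) = @_;
    my $package = load_script($path, $source);
    no strict 'refs';
    return &{"${package}::handler"}();
}
```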
Another important difference in this case is that fine-grain control over the Web server process is no longer available. Because all communication between the server process and the Web application daemon is taking place through a network protocol rather than through a server API, it's unlikely that the Web server would provide a protocol that enables daemon processes to modify the environment greatly. However, some control over the Web server process can be gained by configuring the client module, which does interact with the server through the server API.
Greater compatibility with complex CGI programs can be gained by making each individual program a separate daemon. A CGI search engine, for instance, can be separated out from the rest of the Perl applications if it has been written to handle many different types of search requests within a single application. Some CGI programs were designed in this fashion because it more closely resembled traditional interactive programs, even though the nature of the CGI protocol kept these applications from benefiting from traditional applications' persistence. In the case of such a complex program, it's sometimes more sensible to set the program up as its own Web application daemon because the code necessary to handle network protocol requests would be a minor addition.
Separating a CGI-style Perl program out as an individual daemon also provides greater compatibility with the way CGI programs execute. Because some stand-alone Perl programs assume they are running as unique processes, little regard is given to naming conventions, namespaces, and other style issues that make it possible to combine Perl programs the way that Apache::Registry does. If a Perl program is written in a style that would cause it to behave erratically under a shared environment, separating the program out would give the program free rein over its own process. At that point, problems caused by executing the program in a persistent environment can be discovered and solved individually without trying to fit the application into a shared environment at the same time.
This idea also works for CGI programs that aren't written in Perl, although those applications are outside the scope of this book. By implementing the same Web application protocols in these applications that are used in the Perl applications, it's possible to provide a persistent interface to all Web applications, no matter how they are implemented. This additional level of abstraction also provides the opportunity to change the implementation of any Web application daemon at any time without impacting other Web applications.
Variables, Objects, and Memory Issues
Persistence compatibility for average CGI Perl programs can be helped greatly by simulating the conditions of a clean restart every time a program is accessed. This can be implemented within a program by keeping track of variables manually and resetting each variable at the end of the program. Those kinds of modifications tend to be costly to implement, if not practically impossible, because existing code would have to be searched by someone as knowledgeable about the program as the original developer, with the added capability to tell when subroutines and additional modules create variables and data structures implicitly. In most cases, this problem can be overcome by using good Perl coding style because Perl has facilities for automatically cleaning up my-scoped variables. In addition, some of the tools listed in Chapter 10 reinitialize Perl variables by searching the symbol table and keeping track of variables independently of Perl.
Objects can present specific difficulties when relying on Perl to clean up memory allocated, especially if the objects have circular references to themselves. Objects in Perl are stored as references to data structures that might contain references to other data structures as well. Perl decides when to dispose of an object by checking its reference count, which is the number of variables that hold a reference to the object at any point in the course of the program. Because objects can also store references, it's possible for an object to reference another object that references it in return. Because both objects maintain a nonzero reference count until either object is disposed, it's possible that neither object will be disposed even after the program is finished using it. In a persistent environment, this can result in objects continuing to stay in memory long after the programs (or subroutines) that started them have exited. Disposing of these objects automatically can be difficult, so it's lucky that this situation does not occur very often.
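The cycle, and the usual remedy of breaking it with Scalar::Util's weaken(), can be seen in a few lines. The Node class below is contrived for demonstration; its DESTROY method simply records when Perl reclaims an object.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

our %destroyed;   # records which objects Perl has reclaimed

package Node;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, peer => undef }, $class;
}
sub DESTROY { $main::destroyed{ $_[0]{name} }++ }

package main;

# A reference cycle: each object holds the other's reference count
# above zero, so neither is reclaimed when the scope ends.
{
    my $x = Node->new('x');
    my $y = Node->new('y');
    $x->{peer} = $y;
    $y->{peer} = $x;
}
# $destroyed{x} and $destroyed{y} are still undef here: leaked.

# Weakening one side of the cycle lets both objects be reclaimed.
{
    my $p = Node->new('p');
    my $q = Node->new('q');
    $p->{peer} = $q;
    $q->{peer} = $p;
    weaken($q->{peer});   # this back-reference no longer counts
}
# Both DESTROY methods have now run.
```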
Again, environments that handle the conversion from CGI applications to persistent Perl programs already are available, so it's generally not necessary to write cleanup routines for these from scratch. These tools are detailed in Chapter 10.
More Power: Clustering, Precaching, and Balancing
Persistence isn't the only performance improvement made possible by these architectural changes. When Web applications are persistent, clustering, precaching, and balancing applications can provide additional performance tuning. Most of these techniques are possible with only one of the architectures mentioned in this chapter (from the "Web Applications as Persistent Separate Programs" section), but other architectures might be able to use variations of these techniques.
Clustering Web applications enables individual programs to become more responsive while reducing the memory requirements of a particular daemon, or application engine, as it's sometimes called. Although the usual approach to improving Web server performance in a multiprocess environment is to create identical duplicates of a process in memory, clustering takes the approach that each application engine has a preferred group of programs it hosts. With the first approach, any available application engine is considered to be equivalent when deciding which engine should process a request. Clustered engines, on the other hand, are weighted by their preferred group of programs, so requests are routed to the first open engine that has the appropriate program to handle each request. This makes it more likely that any given request reaches an application engine that has recently handled a similar request, which increases the use of cached code and reduces the amount of duplicate code.
Precaching Perl modules and data structures is important because much of the time spent compiling program code and instantiating data should be spent before the first request gets handled. Perl makes caching modules easy because each module has its own package and namespace that is referenced in the same way, regardless of the namespace of the program or subroutine that accesses it. Precaching data structures and application code is a little more difficult, but only because the data and applications are more likely to change over the course of the application engine's lifetime.
Balancing is the performance improvement most Web server installations are likely to implement in hardware, but the intelligence behind most balancing doesn't take Web application structure into account. The assumption is that all Web site sections are being accessed equally and that they cause the same load on the server, so a simple duplication of all the Web servers would improve performance across the board. With a clustered Web application, though, it's possible to load balance on a smaller scale, shifting the load of higher-use applications to more application engines and even to additional Web application servers.
Clustering Application Engines
Because daemons can be accessed independently of the Web server processes, it's possible to cluster useful programs in each daemon so that the overall size of the daemons in memory stays as small as possible. This is an architectural change that should have no effect on the way applications are accessed; programs are simply grouped behind the scenes into clusters that are more manageable for the server environment. This idea is similar to Web server clustering, which uses network hardware to determine which Web server should respond to a class of Web requests to provide as uniform a stream of requests as possible to any one Web server.
Applications can be clustered within application engines in a number of logical configurations:
- Programs can be clustered based on the directories in which they reside or other URL information. For instance, requests for programs residing in the /search/ directory might be routed to one application engine, while requests for programs in the /wml/ directory might be routed to another. This type of routing is the easiest to set up in most Web server environments.
- Programs also can be grouped by function. If a group of programs comprises a search application that takes considerably more time and processing power than other applications, that group can be routed to a specific application engine to improve the response time of other applications.
- Programs can be grouped even by the Perl modules they use. If a group of programs uses the XML::Parser module, for instance, it might be advisable to separate those programs out to reduce overall memory usage. This grouping probably is useful in few cases, though, because many Web applications use the same core set of modules and differentiating between similar combinations would be difficult.
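A directory-based routing table of the first kind reduces to a simple prefix map. In the sketch below, the engine addresses and URL prefixes are hypothetical; a real configuration would live in the Web server's routing module rather than in application code.

```perl
use strict;
use warnings;

# Hypothetical routing table: URL prefixes mapped to the host and
# port of the application engine clustered to serve them.
my %engine_for = (
    '/search/' => 'engine-search:9001',
    '/wml/'    => 'engine-wml:9002',
);
my $default_engine = 'engine-general:9000';

# Pick an engine by the longest matching URL prefix, falling back
# to a general-purpose engine for everything else.
sub route_request {
    my ($url) = @_;
    for my $prefix (sort { length $b <=> length $a } keys %engine_for) {
        return $engine_for{$prefix} if index($url, $prefix) == 0;
    }
    return $default_engine;
}
```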
Additionally, it's possible to cluster applications automatically using a balancing algorithm to decide which application engine should respond to each request. The development of such an algorithm is outside the scope of this book, but the general idea is to funnel incoming requests to engines that have previously processed similar requests. Thus, the application needed to process any particular request is less likely to be duplicated across multiple engines because the balancing algorithm assigns priority to the engines that already have that application cached. Subsequently, each application engine is likely to cache fewer applications because requests outside its scope are likely to be handled by other engines.
One way that persistence techniques using subroutines can help the performance of Perl Web applications is by enabling modules to be shared between applications without any changes to the individual program. This is made possible by the way Perl separates modules into their own namespaces, which are accessed through an absolute path from anywhere in the program. For instance, variables and subroutines from the DBI module can be accessed directly by prefacing them with DBI, so the connect subroutine could be accessed as &DBI::connect(). Object notation adds an additional layer of abstraction to these calls by enabling subroutines in a specific package to be accessed as methods off an object, which then inherently calls the appropriate package. Thus, DBI->connect() calls a subroutine from the DBI package, but so do $dbh->quote() and $dbh->disconnect().
Because modules are shared between the subroutines that pass for individual programs, it's easy to load the modules before any requests are accepted at all. A list of common modules can be kept either manually or determined automatically from the scripts to be cached by a particular daemon, and each module can be loaded into memory by calling it from the main daemon program as it is initialized. Similarly, a list of the Perl programs that are likely to be invoked by a particular daemon can be used to load those programs into their own namespaces as soon as the daemon initializes. In both cases, the modules and program subroutines then are available in precompiled form to all requests, and any new modules or programs are added in as they are requested.
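A daemon's initialization code might preload its common modules like this. The module list is illustrative, and a preload failure is logged rather than fatal so one missing module doesn't keep the daemon from starting.

```perl
use strict;
use warnings;

# Modules the daemon's applications are known to share. Loading them
# once at startup means no request pays their compilation cost.
# This particular list is only an example.
my @common_modules = qw(File::Basename POSIX Data::Dumper);

# Compile each module into memory, returning the ones that loaded.
sub precache_modules {
    my (@modules) = @_;
    my @loaded;
    for my $module (@modules) {
        if (eval "require $module; 1") {
            push @loaded, $module;
        } else {
            warn "Could not preload $module: $@";
        }
    }
    return @loaded;
}

# Run once as the daemon initializes, before any request arrives.
precache_modules(@common_modules);
```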
The difference is seen during the first request to a particular program after the server or daemon has been started. If modules have been precached, the first request takes just as long as any other request. However, if modules haven't been cached, the first request to a particular engine takes much longer than subsequent requests. In fact, the first request takes almost as long as an equivalent CGI request would take because the same compilation and instantiation overhead is seen in both cases; only the Perl compiler itself is already in memory.
Precaching Data Structures
If each daemon has a specific set of programs it hosts and each program is precompiled within that application engine, it's also possible to preload the data structures for those applications when the daemon is first started. Perl doesn't assist this process inherently the way it makes module caching possible, but a few changes to the modules used by a Web application can do a similar job. For instance, the DBI module has a corresponding Apache::DBI module, which enables applications in a persistent environment to create database connections when the application engine is first invoked so that they are available when the first request is handled by the engine. Similar techniques can be used to precache parsed XML files, compiled template code, or lengthy data files. Precaching these kinds of data requires more specific knowledge of the data structures the Web application uses, but in some cases, the performance benefits are well worth the additional effort and care.
Of course, additional checks must be performed when a request comes in for each program to make sure that the program's data structures haven't changed or become invalid. A database connection, for instance, might be closed by the database after a period of inactivity. If a persistent application tries to use the connection without checking its status, the application might behave erratically. A simple check needs to be added to make sure the connection still is available. Similarly, an XML file used as an on-disk data repository might have been modified by another process since it was last cached, so it would be useful to check the timestamp on the file against the cached version, update the file if it has changed, and purge the file if it is no longer present.
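For the on-disk file case, the check reduces to comparing a cached modification time against the file's current one. In the sketch below, fetch() reloads on change and purges the entry when the file disappears; load_file() is a stand-in for whatever parsing the application actually performs.

```perl
use strict;
use warnings;

# Cache of file contents keyed by path, with the modification time
# recorded when each entry was loaded.
my %cache;   # path => { mtime => ..., data => ... }

# Placeholder for real parsing work (XML, templates, and so on).
sub load_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;   # slurp the whole file
    return <$fh>;
}

# Return cached data, reloading only when the file has changed and
# purging the entry when the file no longer exists.
sub fetch {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    unless (defined $mtime) {
        delete $cache{$path};   # file gone: purge the stale entry
        return undef;
    }
    my $entry = $cache{$path};
    if (!$entry or $entry->{mtime} != $mtime) {
        $cache{$path} = $entry = {
            mtime => $mtime,
            data  => load_file($path),
        };
    }
    return $entry->{data};
}
```

The analogous check for a database handle is a ping before use, which is exactly what Apache::DBI layers over DBI connections.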
Many Perl modules have persistent versions or Web-centric add-ons that enable this kind of preloading and persistence management to occur transparently. The DBI module is an example that is discussed in detail in Chapter 14. The persistence-enhanced versions of other modules might be found by searching for Apache::* versions on the Comprehensive Perl Archive Network (http://www.cpan.org).
Load Balancing Across Engines
Another benefit to separating Web application engines from the Web server process is additional control over the number of engines available to process a given type of request. If a Web application requires many more Web server processes than application engines, for instance, it's possible to balance the load from all Web server processes across a smaller set of application engines. Load balancing this way generally requires a more complex algorithm for routing requests than the standard one-process-per-server approach, but the results can be more flexible and configurable.
This type of load balancing also can benefit Web server performance by enabling an administrator to limit the number of engines available to process certain types of requests. For instance, if a database used infrequently by Web applications is licensed for only a small number of simultaneous connections to save money on licensing fees, it is possible to cap the total number of engines capable of accessing the database server. Because this number can be set independently from other Web application and Web server performance tuning variables, it's possible to fix a value without negatively impacting the performance of the Web site as a whole. This same technique also can be used to limit the impact of a specific user's Web applications in a multiuser environment, as would be found in a shared Web-hosting situation.
Application engines also can be spread across multiple machines to create a Web application cluster. Because each application engine is also a network daemon, it doesn't matter whether the engine is running on the same system as the Web server or on another system connected to it through a network. The protocol is the same, the request is the same, and the response is the same. Because of this, it's possible to run any or all application engines on separate servers, with the additional possibility of clustering application engines based on the servers that have those applications available. Of course, those servers have to be connected through a suitably fast connection, but in most cases, the internal connections between servers on a Local Area Network (LAN) are much faster than the connection between the Web servers and the Internet.
In practice, a combination of automated load balancing and clustering techniques provides the most robust and responsive environment possible. Because it's impossible to determine the relative load on a server caused by requests to one Web application or another, it's good to load balance engines that have been clustered. This enables each engine type to have as much processing power devoted to it as necessary without creating unnecessary duplicates of engines that aren't under as much load.
Sidebar: Two Important Principles to Remember
I really can't emphasize enough the importance of two simple principles in Web application design:
- Don't exit
- Keep connections open
The first is discussed here and in subsequent chapters, and the second is discussed in terms of databases in Chapter 14. In truth, though, the two principles would be invaluable even without explanation because they underlie every major performance increase this book has to offer.
It's amazing to me how often these two simple principles are overlooked. I was programming in persistent environments with continuous database connections before eBay was popular, but I still run across executives from Fortune 500 companies who are confused by the concept of a persistent environment. Most are convinced that their performance problems will be solved by rewriting entire Web applications in a different language or for a different operating system. They ignore the fact that both the old and the new systems they're using are made inefficient by throwing compiled and instantiated programs away after every request and creating and destroying database handles every time they're used. Even more amazing was the arrogant explanation that I got from one e-commerce CTO: "I have sixty programmers working night and day to optimize code. I don't need a new architecture."
Although this chapter described a few important types of Web server architectures, there are many more possibilities waiting to be discovered. In fact, as the performance limitations of each architecture are reached, it becomes possible to see the next level of architectural improvement necessary to meet exponential growth yet again. In all cases, though, the same two questions can be used as a filter for architectural performance improvements:
- Does the improvement add any persistence to the application or its data?
- Does the improvement keep connections open that otherwise would have to be closed?
To keep the idea even simpler, the two principles can be combined into one overarching goal: recycle everything. Don't throw any piece of compiled code away. Hold on to that database connection as long as possible. Keep a cached version of those XML files, and update only the sections of them that have changed. Use idle clock cycles on other machines to augment Web processing. Provide cached copies of results when the data isn't likely to have changed. Above all, recycle those resources that are more precious than all the processing power in the world: reuse the code that cost so many programmer-hours to produce.
Persistence is a Web server concept that has many possible implementations. The general idea behind a persistent Web application is to mimic the way in which an interactive application, such as GIMP, would process incoming requests through an event loop and event handler code. One way to duplicate this in a Web server environment is by writing the Web application as a server plug-in that interacts through the Web server's API. A second approach is to take the existing CGI applications and rewrite them automatically by incorporating them as event handler subroutines into a larger Perl program. Both approaches create monolithic applications that are difficult to modify for greater performance, however. A third approach separates out the Web application code by creating a network daemon that communicates with the server through a protocol layer. By combining these approaches, it's possible to create an environment that is configured for persistent performance but that accepts CGI-style code as well as persistence-specific applications. Furthermore, it's possible to use these techniques to cluster applications into functional engines and balance application processing over multiple servers without needing to modify the Web application code itself.