Sources of Unexpected Traffic
part of Perl for the Web
On the Web, one word can signify both a boon and a danger to an application: traffic. There's no surer death to a Web application than a lack of traffic, which indicates a lack of interest and usually ends up with a lack of sales. On the flip side, though, increased traffic can mean greater demands on limited hardware and software resources, and many a site has been buried under its own success.
Sites That Suffer Outages
Everyone hopes their site will be a site for the ages: one that never gets old, goes out of style, or fails for any reason. Yahoo! is one of these sites; its core content has been the same since its inception, its design has changed very little in the years it has been available, and it's reliable enough that many people use it to determine whether their Internet connection has failed. In other words, if you can't reach Yahoo!, you can't reach anything.
Of course, not every site is Yahoo!, and not every site is accessible, fast, and reliable. When site response is slow, when pages are intermittently returned, or when the site is inaccessible, it can cause a lot of problems for both the people accessing the site and the people running the site.
I've collected some examples of Web site outages from my travels. Most of these sites became inaccessible not because of extraordinary circumstances, but because of more traffic than the designers expected. It's an easy thing to run into; I've designed sites that operated for years longer than their expected lifetimes. When such a site fails, it's seen less as a disaster than as the end of a miraculous run that was never expected.
The Macadamia Story
The holidays are a time of love, joy, and consumerism. As much as I try, I'm still not out of the woods where that is concerned. Of course, being the twenty-first-century man that I am, I like to do the shopping I have to do online. It means that if I can't keep from getting the people I love tacky, overpriced gifts that they'll never keep, I can at least buy those gifts from a Web site while wearing boxers, listening to the Muppets sing carols, and drinking eggnog while playing Minesweeper in another window.
Unfortunately, the 1999 holiday season made this less convenient than I originally had hoped. It was my first season shopping entirely online, and I was feeling very merry about the prospect of doing all my shopping over the course of an afternoon while the rest of my neighborhood engaged in a blood match for the last Furby down at the local mall. In fact, I was so optimistic about my ability to buy everything online that I waited until the last minute to order; I had heard that most sites wouldn't guarantee on-time delivery unless the order was placed that day. Of course, most sites didn't mention this outright, but I figured I'd better be safe.
The last gift I had to buy was one of the most important: my mother's. She's a sweet woman who likes macadamia nuts. So, I figured the choice was clear. I'd find a site that offered a nice assortment of macadamias in a bright shiny package, and my shopping would be complete. Being the savvy comparison shopper I am, I also planned to visit a few locations to make sure I found the perfect package at the right price. You know, multiple browser windows offer the kind of shopping experience that a mall just can't duplicate. My first destination was Harry and David, a company I was used to dealing with through mail order and brick and mortar. I had just noticed that they had a Web site. So, it seemed the perfect place to search for those macadamia nuts.
The Harry and David site was well laid out, but I noticed a performance problem as soon as I opened the first page. I'm used to waiting for Web pages, though, and I tend to give sites the benefit of the doubt when surfing because I know how it is to be under the gun when designing a Web site. I was willing to wait for those macadamia nuts because there are always other browser windows and other sites to visit while a page loads. Unfortunately, the wait grew longer with each page; the site must really be getting hit hard, I thought. Finally, pages started taking too long for the otherwise infinite patience of my Web browser; I still persevered, though, reloading each page repeatedly until something came back. At the very last step, just before I got the chance to look at the macadamia assortment I thought would be best, the site returned an ASP application error and refused to show me any further pages. I was shut out; there would be no macadamias from Harry and David.
I was determined to go on, though. Certainly there had to be another site that would provide the macadamias I wanted to buy and the stability I needed to actually process the order. I went to site after site, big names and small, but they were all slow, damaged, or both. Some sites timed out when I tried to order, others gave application errors when I searched for that devil term "macadamia," and many more just quit without letting me know whether nuts were available at all. It was getting to the point where I had to decide whether to continue this hellish endeavor or give up and join the fray at the mall. I couldn't, though. It had reached the point where it was a matter of principle. Somewhere, someone was going to sell me some macadamia nuts.
My final destination was a true desperation move. I went to the corporate Web site for Mauna Loa, makers of macadamia nuts. Their headquarters are in Hawaii. The site wasn't geared for e-commerce; in fact, it practically discouraged me from purchasing actual food. Not to be deterred, I navigated to the samples section, where nuts could be shipped to those without the benefit of other macadamia outlets. I gave my credit card number and my mother's address. No gift wrap was available, no message could be included, and the tin would contain just nuts. In short, it would look like it came from a supermarket. No promises were made about shipping dates, and it would probably be sent through the U.S. mail. Still, it worked; I was able to order online, and that was all that was important at that point. I let Mom know that the macadamia nuts were on their way, and she was happy enough with the thought involved.
Incidentally, I called my mother back a few weeks after the holidays to ask if the macadamia nuts had arrived safely; they never did. It could have been the vagaries of the postal service or a shipping error at the Mauna Loa plant. The order itself could have been ignored due to the sheer idiocy of a man in California ordering nuts from Hawaii to be shipped to Wisconsin when they're already available there. Whatever the reason, the nuts never arrived. I sometimes imagine that they're making their way around the country, looking for my mother the way that the fugitive looked for the one-armed man, destined to search for eternity.
Case Study: WFMY
Chris Ellenburg knows how important it can be to prepare for unexpected traffic increases. Ellenburg is director of new media at WFMY, a television station in Greensboro, North Carolina, and webmaster for the Web site the station launched in August 1999. WFMY was the first station in its market to offer a Web site, and that site's success nearly overwhelmed it when traffic became more than it could handle.
Ellenburg initially developed the WFMY Web site using Perl CGI to connect to a MySQL database that provided dynamic content for use throughout the site. News stories, weather forecasts, program listings, and archival content were all processed by Web applications because most pages on the site needed to provide up-to-the-minute information. This information made the site an instant success with viewers throughout the region and around the country, but that popularity quickly led to a collapse.
Shortly after the WFMY site was made public, Hurricane Floyd (possibly the worst storm in North Carolina's history) caught national attention, and the site received a flood of requests for information about the hurricane. WFMY was the definitive Internet news source for the region, so the site was pummeled with requests; site logs recorded over 3 million hits that month. The traffic onslaught proved to be too much for the CGI applications on the site, as Ellenburg found out quickly; the site became slow to respond, taking up to 20 seconds to return each page. Eventually, the site stopped returning pages at all, and the Web server itself went down every 10 minutes under the load. There was little time to take steps to improve performance or stem the flow of traffic before the wave of interest, and the hurricane itself, passed. Site usage dropped below 1 million hits per month afterward, a level the server was able to handle.
Ellenburg realized that something had to be done to prevent the same thing from happening again. So, he evaluated potential solutions to the performance problems plaguing the site. A Web server upgrade was initially considered, which would involve either increasing the server's memory beyond its original half-gigabyte of RAM, or clustering a group of servers using load-balancing hardware. Both of these options were prohibitively expensive for the station, however, and either option would give at best an incremental improvement in speed. Ellenburg also considered using static HTML files for the entire site to remove the overhead due to CGI processing, but the timely information that was the site's core would have suffered as a result. Despite the performance hit they entailed, dynamic pages and Web applications were indispensable.
Clearly, the solution to the problem had to come from performance improvements to the site's Perl processing. Ellenburg came across mod_perl and VelociGen, both of which claimed to provide performance improvements over CGI by caching and precompiling Perl-based Web applications. (mod_perl, VelociGen, and other tools for improving Perl performance are detailed in Chapter 10, "Tools for Perl Persistence.")
Both packages performed as advertised and then some. Tests that compared the original CGI with a persistent application using embedded Perl showed the persistent application performing up to ten times faster than the CGI version (see Chapter 12, "Environments For Reducing Development Time"). In the process of testing, Ellenburg also found that he enjoyed the ease afforded by embedding Perl directly into his dynamic HTML pages.
A new WFMY site was developed using embedded Perl applications and launched in October 2000 to unanimous approval. Site visitors immediately noticed a marked improvement in site responsiveness, and, more importantly, so did Ellenburg's boss. The site recorded 3 million hits for the month once again, this time without so much as a hiccup. Remember, this was after a year of stagnating at only a million. No longer constrained by site responsiveness, traffic continued to increase, reaching 7.6 million hits in December 2000, 2 million of which came from page views alone. Ellenburg estimates that the site could easily handle 20 million hits per month in its present configuration. When the next hurricane hits, whether it be digital or meteorological, Chris Ellenburg will be ready.
Evaluating the Cost of Down Time
WFMY didn't lose any business due to the slow responsiveness of its site, but businesses with e-commerce sites would certainly feel the impact of a sluggish or unavailable site. In the case of an e-commerce site, the cost of being slow or unavailable is easy to estimate based on the revenue lost for that time period. For instance, if Amazon.com had revenues of $1.8 billion for the first nine months of 2000, that translates to a cost of roughly $280,000 per hour of down time. That means a performance solution that improves Amazon's uptime from 99.9 percent to 99.99 percent (a reduction from 90 seconds to 9 seconds of unavailability a day) would be worth over $6,300 a day in revenue alone, or about $190,000 a month. This doesn't even count the cost of lost productivity among site staff, delays to site improvements, or other internal costs.
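As a rough sketch of that arithmetic (the revenue figure comes from the estimate above; the hourly rate and daily savings are derived from it, not quoted from Amazon), the calculation might look like this in Perl:

#!/usr/bin/perl
# downtime_cost.pl - back-of-the-envelope version of the estimate above.
# The revenue figure is the nine-month estimate quoted in the text;
# everything else is derived from it.
use strict;

my $revenue      = 1_800_000_000;   # revenue for the period, in dollars
my $period_hours = 9 * 30 * 24;     # roughly nine months of hours

my $cost_per_hour = $revenue / $period_hours;

# Daily down time at 99.9 percent versus 99.99 percent availability,
# using the rounded figures from the text (90 and 9 seconds per day)
my $seconds_saved   = 90 - 9;
my $savings_per_day = ($seconds_saved / 3600) * $cost_per_hour;

printf "Cost per hour of down time:  \$%.0f\n", $cost_per_hour;
printf "Revenue protected per day:   \$%.0f\n", $savings_per_day;
printf "Revenue protected per month: \$%.0f\n", $savings_per_day * 30;

The script reports about $278,000 an hour and $6,250 a day; rounding the hourly figure up to $280,000 is what gives the $6,300-a-day and $190,000-a-month figures used above.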
The numbers are no less compelling for the cost of performance degradation. Even if a server is up 99.9 percent of the time, it could still be slow enough to discourage potential customers from exploring the site. A slow site can deter visitors drastically, as WFMY can attest; their site usage fell off when the site was slow, but grew rapidly when performance was improved. Estimating the cost in this situation is less reliable, but more compelling. If Amazon had discouraged just 25 percent of its customers overall, it would have lost $450 million in those nine months. If it improved performance like WFMY and saw a 600 percent increase in customers, it could stand to gain $11 billion in revenues. (Money is never a bad thing for a company struggling to be profitable.) The idea of expansion isn't revolutionary or untested; any brick-and-mortar establishment would find it an obvious choice to increase the capacity of a popular store before turning customers away at the door.
Obviously these numbers are estimates and would vary widely from business to business, but the principle holds true: basic Web site performance improvements can translate to large gains in visitor interest and revenue. When evaluating the cost of performance improvements and site architecture upgrades, you should weigh the opportunity costs of not upgrading against the costs of the change.
The Cost of Readiness
Preparing a Web site to handle traffic increases can be costly, but cost overruns can be avoided if the true costs of readiness are known and budgeted for before the upgrade begins. Knowing the hidden costs of an upgrade has an additional benefit: better architectural decisions that improve efficiency and decrease hidden costs will become more attractive to managers looking to get the most bang for their buck.
The costs of a performance overhaul fall into four categories:
- Time spent researching performance problems and deciding on solutions
- Hardware upgrade costs
- Software upgrade costs
- The cost of training (or hiring) staff to develop and administer the new systems
Usually, only the costs of new hardware and software are considered; the assumptions are that research and development time is already paid for and that staffing costs are equivalent for all solutions. However, choices made in each of the four areas can have enormous effects on the costs of the others.
Evaluating all areas as a whole provides a better indication of the true cost of a solution. For instance, Windows is sometimes chosen as a server operating system with the idea that Windows administrators are easier to train and less expensive to hire than Linux administrators. Unfortunately, this choice would lead to an increased hardware cost due to the need for more Windows servers to handle the same load as equivalent Linux servers, which, in combination with increased down time and reconfiguration needs, would necessitate hiring more administrators. This additional hiring usually cancels out individual salary savings.
Sudden Traffic Increases
Sudden increases in site traffic can come from many sources and become more likely as the Web increases in size and consolidates around a few very popular points. A Web site can cause its own traffic surge due to new marketing or popular interest in the site's product, service, or information. The site also could become popular by being linked to another popular site or service, which causes a sudden increase in traffic with little or no warning.
Fame and the Slashdot Effect
Slashdot (http://www.slashdot.org) is a site that aggregates news and product reviews that might be of interest to the geek community on the Web. The site's motto is "News for nerds. Stuff that matters," and it has gained a devoted following since its inception in 1997. The stories on Slashdot are designed to showcase a new idea or announce the availability of a new technology rather than comment on the state of something ongoing, as the nightly news is likely to do. However, Slashdot doesn't limit itself to technology articles; any new discovery, theory, product, service, or idea that might have some interest to geeks is fair game, and hundreds of submissions arrive each day from Slashdot devotees looking to have their names listed under the latest headline.
In early 2001, Slashdot had a readership of millions. Many visit the site every day; some (like me) read the site many times over the course of the day as a running summary of the geekosphere at any given moment. Because of this, millions of Slashdot readers are likely to respond to a story within hours of its posting, rather than over the course of an entire day or week as with daily or weekly news sites.
Because of its format, Slashdot concentrates the majority of reader focus on one article at any given time. The site lists about a day's worth of headlines directly on the home page, usually with a link to the site or news story being discussed. As headlines are supplanted by new stories, they move down the page and eventually get listed in a separate archive area. Because of the quick availability of these few links, many Slashdot readers are likely to click on the same link within the same short period of time. The result of this extreme interest by so many people in such a short time is known as the Slashdot Effect.
The Slashdot Effect causes enough of a spike in traffic to crash many of the Web servers that are lavished with the site's attention; afterward, comments attached to the Slashdot story mention that the site had been Slashdotted. This usually happens to relatively unknown sites because the Slashdot Effect is larger in comparison to their usual site traffic.
The Slashdot Effect can be made even larger when the Slashdot readership is less likely to have seen the linked site already. In fact, small sites can misdiagnose the Effect as a distributed denial-of-service attack. If a site's connection to the Internet is being saturated by incoming requests, it's difficult to distinguish legitimate interest from malicious intent. In most cases of legitimate interest, however, the Web server is overloaded long before the network connection, which makes the difference easier to discern.
No site or subject is safe from the Slashdot Effect, even if the site's content or function is outside the stated focus of Slashdot. Because the editors of Slashdot have independent editorial control, it's possible that any one editor might deem a story fit for posting.
Other Sources of Fame
There are many sites on the Web that create effects similar to the Slashdot Effect, but traffic from these sites builds over a longer period of time. Sites like Yahoo News or CNet can focus a lot of attention on a previously unknown site by simply mentioning it in an article; the effect is magnified if Associated Press or Reuters syndicates the article. With such syndication, the article appears on many news sites simultaneously. News writers on the Web also are known to use each other as sources; even a story with very little information can be quoted again and again as part of other news items.
A Web site also can see sudden increases in traffic when it provides a Web application that becomes an integral part of another application or toolkit. For instance, the XML Today Web site (http://www.xmltoday.com/) saw a ten-fold increase in traffic when its XML stock quote service was included as a test in an XML development kit released by IBM. Each new test of the IBM software caused an automatic burst of traffic to XML Today, and users of the development kit found the service interesting enough to keep using it.
Cumulative Increase
Luckily, not all increases in Web site traffic are as sudden or as short-lived as the Slashdot Effect. Gentle increases are just as likely as a site becomes more widely known. Additional increases in site traffic come with the general expansion of the World Wide Web (WWW) as more Web sites are created and more people connect to the Internet. With these types of increases, it's much easier to estimate the need for additional site resources well before performance problems occur. Still, it's necessary to keep an eye on traffic increases because they can add up quickly and keep a site from handling larger spikes.
The Network Effect
Traffic increase due to the increase in connections to a site from other sites is known as the Network Effect. This occurs when new Web sites come online with links to your site; a fraction of each new site's traffic is added to yours. For instance, every new Amazon.com partner bookstore that is created provides new links back to Amazon.com, and those new links increase the total traffic to that site. A traffic increase also occurs when existing sites add new links to the site, as in the case of SourceForge, an open source development site that links to software projects in progress. A new link on SourceForge attracts visitors interested in that kind of project.
The Network Effect can cause the traffic to a site to increase even if the total number of links doesn't increase. This increase occurs because the sites that are already linked get more traffic as time goes on, and they transfer a portion of that increase accordingly. Thus, a link from the Yahoo! directory to a site provides more and more traffic over time, simply because Yahoo! itself gets more traffic due to promotions and consolidation. It is possible for this principle to work in reverse (sites providing less traffic to linked sites as viewer numbers decline), but that case is much less likely due to the overall increase in people and sites connected to the Web. (See the "Web Expansion and New Devices" section later in this chapter.)
Site Log Indicators
It's possible to determine the average traffic a site receives by studying Web site activity logs. These logs are created by nearly every Web server available, so they should be available for use in determining the traffic to any Web site, regardless of which platform or Web server it uses. In addition, most activity logs are stored in variations on a standard format. This allows standard analysis software to be used across different sites.
Usually, activity logs are analyzed in terms of the pages that are most popular or the overall amount of traffic a site gets. This is a holdover from the early days of the Web, when any traffic at all was a sign of prestige and webmasters were mainly interested in identifying the popular pages. Site traffic was viewed in the same light as Nielsen ratings for a television show; a site was seen as a success if it increased its number of hits, or requests for files. This style of log analysis is still used to determine viewer intent by ranking popular areas of a site; the resulting data is sometimes used to determine which parts of a site are worth devoting resources to.
When analyzing the effect of increased traffic on Web applications, however, it's important to separate the idea of hits from the real usage a Web application gets. For instance, a processing-intensive Web application that uses four dynamic pages and 20 images (icons, advertisements, and such) registers as 24 hits in a traditional log analysis. However, another application with one dynamic page and 23 images also registers as 24 hits. If a server log shows only the total hit count for a period of time, it becomes difficult to determine how many of the hits are due to static files and how many are due to Web applications. If requests for the first Web application increase while requests to the second decrease, the overall increase in server load would go unreported by a hit count.
To get a clearer picture of the current load on a Web application server, more emphasis needs to be placed on the dynamic pages served overall. If a site sees an increase from 3,000 hits one month to 6,000 the next, for instance, it might seem that load on the server has doubled. If the dynamic pages served by the site have increased from 1,000 to 4,000 in the same time period, however, it's more likely that server load has quadrupled.
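A minimal sketch of this kind of breakdown, assuming a Common Log Format access log and assuming that dynamic pages can be recognized by their paths (the /cgi-bin/ prefix and .pl extension here are placeholders for whatever conventions a given site uses):

#!/usr/bin/perl
# hit_breakdown.pl - count dynamic page requests separately from total
# hits in a Common Log Format access log. The path patterns below are
# assumptions; adjust them to match how your site names its applications.
use strict;

my ($hits, $dynamic) = (0, 0);

while (my $line = <>) {
    # CLF: host ident user [date] "METHOD /path HTTP/1.x" status bytes
    next unless $line =~ m{"(?:GET|POST|HEAD)\s+(\S+)};
    my $path = $1;
    $hits++;
    $dynamic++ if $path =~ m{^/cgi-bin/} or $path =~ m{\.pl(?:\?|$)};
}

printf "Total hits:            %d\n", $hits;
printf "Dynamic page requests: %d\n", $dynamic;
printf "Static file requests:  %d\n", $hits - $dynamic;

Run against last month's log and this month's log (for example, perl hit_breakdown.pl access_log), the two dynamic counts give a much better sense of how application load is changing than the raw hit totals do.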
Estimating Time to Full Load
Estimates of future site activity should take a few different factors into account. Existing traffic, past trends, seasonal patterns, and traffic changes due to site redesigns all can be used to get an estimate of the activity a site can expect. By taking server load under current conditions into account, it's possible to get a good estimate of when a server upgrade is necessary to keep up with traffic increases.
Determining Traffic Patterns
If a site has been available long enough to show a trend of increasing site traffic, a base estimate of future site traffic can be determined by assuming that traffic increases will follow the same general curve as previous increases. Again, dynamic page usage should be weighted higher in determining overall load, and the trends in dynamic page usage should be estimated independently to get a sense of their effect.
If the content and usage of a site follows a seasonal pattern, take the seasonal pattern into account when estimating future traffic. E-commerce sites, for instance, are likely to see an increase of traffic before the holidays, and financial sites are likely to see the greatest spike before tax season. News sites are likely to see seasonal increases as well; WFMY counts on an increase in traffic during hurricane season, and sites in tourist-oriented markets can count on more visitors during peak season. Site sections also can show seasonal patterns; during ski season, a news site in Colorado might see a disproportionate increase in traffic to the Web application that displays weather. Logs that go back over a year would be the best way to gauge seasonal changes, but the current revisions of many sites are less than a year old.
Dramatic changes to a site, usually caused by redesigning the site or adding new sections or services, should also be taken into account. A site redesign is likely to affect the number of files present overall, the number of images requested per Web page, and the relative number of Web applications in use on the site. For instance, if a news article that uses one static page and five images is replaced by one that is broken into four dynamic pages with ten images overall, the number of hits and dynamic page accesses both increase, regardless of overall traffic increases. If the site this article is part of records a 300 percent increase in hits and a ten-fold increase in dynamic pages generated, only a fraction of that increase is due to the increase in visitors. So, when determining trends on such a site, the traffic levels before the site redesign should be scaled (up in this case, down in some others) to match the traffic levels directly afterward. For instance, if site traffic increases 70 percent from the week before a redesign to the week after while the number of visitors stays constant, traffic before the redesign can be scaled up by 70 percent when figuring traffic trends.
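As a small sketch of that adjustment (the monthly figures below are hypothetical, and the 70 percent jump is the example from the text rather than a measured value):

#!/usr/bin/perl
# scale_trend.pl - adjust pre-redesign traffic figures so they can be
# compared with post-redesign figures on the same scale.
use strict;

my $redesign_factor = 1.70;   # pages per visitor grew 70% with the redesign

# Dynamic pages served in the months before the redesign (hypothetical)
my @before = (100_000, 120_000, 150_000, 190_000);

# Scale the old figures up so the trend line isn't distorted by the redesign
my @adjusted = map { int($_ * $redesign_factor) } @before;

print "Adjusted pre-redesign figures: ", join(', ', @adjusted), "\n";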
Determining Maximum Load
Once estimates of future traffic are available, it's possible to estimate how soon a Web server will need upgrading to handle the increased traffic. You can't necessarily determine this from current server statistics alone. A server can experience performance problems due to a variety of factors, and each possible bottleneck would have to be considered. For instance, static file performance is rarely going to peak before database connections have hit their maximum. Fortunately, the maximum load a server can handle can be determined directly through testing.
First, get an idea of the maximum load the server supports. Test the server using a load simulator to get the maximum number of requests the server can support in a given amount of time. For instance, if the server can process 100 dynamic page requests per second at peak capacity, that translates to 260 million dynamic page requests per month. Likewise, if the server can handle 1000 mixed requests (applications, static files, and images) per second, that translates to 2.6 billion hits per month. A discussion of load testing and performance analysis tools is available in Chapter 15, "Testing Site Performance."
Then, check this number against the estimates of traffic. If a server can handle four times the current load and that number will be reached in six months according to traffic estimates, for instance, it's time to start planning. These numbers also can be used when evaluating performance improvements to a site. If a server upgrade will double the server's capacity but traffic increases will outstrip that value in nine months, more work might be necessary to extend the life of the server.
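A short sketch of that projection, combining the capacity figure from load testing with an estimated monthly growth rate (all of the numbers below are hypothetical):

#!/usr/bin/perl
# months_to_capacity.pl - estimate how long until projected traffic
# growth reaches the capacity found by load testing. The peak rate,
# current traffic, and growth rate are all hypothetical figures.
use strict;

my $peak_per_second = 100;                            # dynamic pages/sec from load testing
my $capacity        = $peak_per_second * 86_400 * 30; # roughly 260 million pages/month

my $current_monthly = 65_000_000;   # dynamic pages served last month
my $growth          = 1.25;         # 25 percent growth per month, estimated from logs

my ($months, $load) = (0, $current_monthly);
while ($load < $capacity) {
    $load *= $growth;
    $months++;
}

printf "Monthly capacity:   %d dynamic pages\n", $capacity;
printf "Current headroom:   %.1f times current traffic\n", $capacity / $current_monthly;
printf "Months until the server reaches full load: %d\n", $months;

With about four times the current load in headroom and 25 percent monthly growth, this sketch reports roughly seven months until full load, which is the kind of lead time that makes planning an upgrade possible.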
Of course, this doesn't take sudden spikes of traffic into account. If a server can handle four times the current traffic, it still gets Slashdotted when sudden interest increases traffic by ten times or more. As a result, it's a good rule of thumb to keep server capacity at least ten times the current traffic levels at all times, and plan server upgrades to maintain that level. This isn't likely to cost a great deal more than maintaining a lesser readiness, and it provides a much-needed buffer when making decisions about how to serve visitors' needs in the future.
Web Expansion and New Devices
Of course, Web traffic won't always come from the same sources it comes from today. The Web itself is a very young medium, and as such, it reinvents itself at an alarming pace. Sites that are popular one moment can be gone the next, and entirely new genres of sites can be invented within months. The growth of the Web adds to the growing pains; that growth comes both from traditional computers being connected to the Web and from new devices, such as cell phones and palmtop computers, gaining Web access.
The Web Doubles Every Eight Months
The number of Web sites available to browsers doubles every eight months, according to numbers published by Alexa Internet in 1998. More recently, that rate was verified by server totals published by the Netcraft Web Server Survey in December 2000. Although this number seems phenomenal, the number of people and businesses not connected to the Web is still large enough to support this kind of growth for years to come.
This growth doesn't automatically mean that site traffic to every site also doubles every eight months. Although the potential for a traffic increase is doubling at that rate, the actual increase (or decrease) of traffic to any given Web site is governed more by the interest in that site than by the sheer number of people and possible connections that are available on the Web. However, the likelihood that a site with consistent performance will see this kind of growth is good.
WML and Slow Connections
With new Web-enabled devices come new possibilities, as well as new headaches. A cell phone connected to the Web sounds like a good thing at first; it provides mobile Web access to millions of people who wouldn't carry wireless laptops. Unfortunately, the interface on a cell phone is not nearly as rich or detailed as the one most Web browsers use today; thus, a special set of technologies had to be developed to give cell phone and wireless PDA users a way to get Web content without dealing with the high-bandwidth, visually oriented sites that are currently available. The answers were the Wireless Application Protocol (WAP) and the Wireless Markup Language (WML).
WAP provides a low-bandwidth way for wireless devices to access the Internet without incurring the overhead of the usual TCP/IP connection. A detailed description of WAP is outside the scope of this book, but the important thing to remember about WAP is that it appears to a Web server as a very slow (but otherwise standard) client connection.
WML is a markup language based on XML (the Extensible Markup Language) that uses a completely different way of organizing Web content than does the more familiar HTML. WML organizes content into decks of cards, similar to the way a slide presentation or a stack of flash cards would be organized. A deck is a single document with a collection of demarcated sections, called cards. A card is a snippet of HTML-like content with no graphics that can have links both within the deck and out to other WML decks. This represents a radical departure from the full-screen HTML pages to which Web users and designers have become accustomed. This departure means that existing HTML content would have to be translated into WML before it could be repurposed for wireless devices. In practice, this has generally meant the complete redesign of a Web site for WML presentation, with all the decisions and hassles that entails. The end result is two versions of the Web site, both of which have to be updated whenever site information changes.
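To give a feel for the deck-of-cards structure, here is a minimal sketch of a CGI script that emits a WML deck; the card names and headline text are invented, and a production site would generate the cards from the same data that feeds its HTML pages (see Chapter 17).

#!/usr/bin/perl
# wml_deck.pl - emit a tiny WML deck with three cards. The headlines and
# card names are made up for illustration.
use strict;

print "Content-type: text/vnd.wap.wml\n\n";
print <<'END_WML';
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card id="headlines" title="Headlines">
    <p>
      <a href="#story1">Hurricane warning issued</a><br/>
      <a href="#story2">School closings</a>
    </p>
  </card>
  <card id="story1" title="Hurricane">
    <p>The text of the first story, kept short for a phone screen.</p>
  </card>
  <card id="story2" title="Closings">
    <p>The text of the second story, linked from the first card.</p>
  </card>
</wml>
END_WML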
Once a site is made available through WML and WAP, the slow connection speed of the current set of Web-enabled wireless devices can cause additional headaches for Web site administrators. Slow connections can have a marked effect on site performance; because the Web server has to keep a slow connection open longer than a speedy one, it's possible to degrade the performance of a site by tying up all its available connections with wireless devices. Unfortunately, this can't be easily rectified; it simply has to be accounted for when planning. See Chapter 17, "Publishing XML for the Future," for more information on WML support in Web applications.
XML and Automated Requests
Eventually, the majority of Web application traffic might not come from viewers at all. The rise of business-to-business (B2B) and peer-to-peer (P2P) communications provides new avenues for computers and applications to interact with each other directly, as well as new reasons for them to do so. The CDDB (Compact Disc Database) protocol, for instance, enables a compact disc jukebox in someone's home to contact a Web site and request information on an album or song. This connection enables the jukebox to display the song's title and artist while it's being played, without requiring the owner to type in any information at all. As a result, though, sites that provide information in CDDB format are likely to get requests every time a song is played or a disc is loaded, regardless of whether an actual person is viewing the resultant information.
These sorts of requests are likely to become more common as Web applications are developed to enable other Web applications to query them in ways in which users would not. Employment sites, for instance, might receive many more requests for job listings and resumes if they made those listings available to other job sites in a standard format. The same sites would also generate requests by querying the same information from other sites. Maintaining these interfaces will be just as important as maintaining the HTML interface to customers because the site becomes less usable if its sources of information become unavailable.
The favored language for these kinds of requests is XML. XML is a flexible way for all types of information to be marked up in ways that both humans and computers can understand. In practice, it has become the foundation for a new generation of industry-specific languages that can be used to represent data specific to a business or other population on the Internet. For instance, the human resources (HR) and job search community has proposed HR-XML, a set of XML languages that can be used to describe job postings, resumes, and other documents used in the field.
A standard way of sending requests automatically for such HR documents comes in the form of interprocess communication protocols. These protocols include the Simple Object Access Protocol (SOAP) and XML Remote Procedure Call (XML-RPC). They give a standard set of protocols that one program can use to access documents from another program remotely, without having to develop a specific protocol for each pair of programs. These standard protocols, combined with service directories using Universal Description, Discovery, and Integration (UDDI) and the Web Services Description Language (WSDL), give programs a generic interface to a whole range of remote applications by implementing support for just these languages and protocols. This is similar to the way in which a Web browser has a generic interface to a whole range of remote documents using HTML, HTTP, and Internet search engines. The catch is that the protocols and languages themselves are as difficult to implement as HTML and HTTP, if not more so. This fact once again gives Web application programmers more to support with the same architecture. For some solutions to this problem, see Chapter 17, "Publishing XML for the Future," Chapter 18, "XML as a B2B Interface," and Chapter 19, "Web Services."
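As a taste of what such automated requests look like from the client side, here is a minimal sketch using the XMLRPC::Lite module (part of the SOAP::Lite distribution); the endpoint URL, method name, and response structure are all hypothetical, since a real service would publish its own interface.

#!/usr/bin/perl
# job_query.pl - sketch of one program querying another over XML-RPC.
# The endpoint, method name, and result structure are invented for
# illustration; a real service defines its own.
use strict;
use XMLRPC::Lite;

my $result = XMLRPC::Lite
    ->proxy('http://jobs.example.com/xmlrpc')
    ->call('listings.search', { keyword => 'perl', region => 'NC' })
    ->result;

# Assume the service returns a list of hash references describing jobs
for my $job (@{ $result || [] }) {
    print "$job->{title} at $job->{company}\n";
}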
Summary
Traffic comes from many sources, and it can be both a blessing and a curse. Sites like WFMY have found that too much traffic all at once can cripple a site and deny service to additional site visitors. When this happens to an e-commerce site, it can mean lost revenue and lost opportunities. However, it's sometimes difficult to tell just when traffic will spike. Traffic can increase quickly due to exposure on a site such as Slashdot. It can increase slowly due to exposure on other sites. By evaluating overall traffic patterns, it's possible to determine the margin a site has before the Web server becomes overloaded. Also, by considering the possible effects of new technologies such as WAP and XML, it's still possible to stay ahead of the curve and keep site visitors happy while avoiding performance problems.