Mashup Patterns: Designs and Examples for the Modern Enterprise
Chapter 1, Understanding Mashup Patterns
In this section you'll learn about the various types of mashups and the difference between
consumer mashups and enterprise mashups. You'll also learn about potential sources for mashup data,
how to acquire data from the Web, and the structure of HTML.
Types of Mashups
Mashups have several different colloquial interpretations, which has resulted in some confusion regarding the term and its use. The word originated in the music industry, where a mashup was a combination of two or more songs to create a new experience. Typically, the vocal track of one song was combined with the instrumental background of another in this process.
The technology industry extended this definition to encompass a new application genre that described the combination of two or more sources into an integrated site. This technique of development hybridization can be roughly split into two separate categories: consumer mashups and enterprise mashups.
Consumer mashups are generally associated with Web 2.0. They require a lesser amount of programming expertise because they rely on public Web sites that expose well-defined APIs and feeds (see Figure 1.4).
The output is usually created by one of the sites participating in the mashup. In the classic "show craigslist listings on a Google map" example, the Google Maps API is used to plot and present the feed obtained from craigslist.com. The limitation of this approach is that resources must be "mashup ready."
Enterprise 2.0 mashups (sometimes referred to as data mashups) are more complex. Depending on which solution a firm deploys, enterprise mashups can emerge in several ways:
- Mashups are used solely by IT to rapidly deliver products. Application developers use both
internal and external sources to create data mashups and employ traditional coding techniques to
create the user interface around them. Users aren't directly involved in the construction process,
but they benefit from IT's ability to provide solutions more quickly.

Figure 1.4 A small number of sites with public APIs account for the majority of consumer-created mashups. Source: http://www.programmableweb.com/apis
- IT creates a set of "mashable" components and gives end users a sandbox environment where they
can freely mix and match the pieces themselves. If users need new components, they must solicit
IT's help to create them.
- An organization deploys an environment that lets anyone create and combine their own mashups.
This approach is the most difficult implementation to manage, but it probably has the greatest
impact. To understand the challenge, consider the use of Microsoft Excel in many firms. Users can
create spreadsheet-based applications and pass them around without any central oversight of what
exists, how it is used, or whether it was tested. This friction-free creation and distribution
model spreads good solutions as quickly as bad ones.
Whether mashups are used by IT, business associates, or both, their agile nature makes them a key enabler of Enterprise 2.0. Unfortunately, they are not without potential downsides. In an attempt to "deconstruct" the success of Google, the Harvard Business Review points out several pitfalls that can hinder success in a culture of open development:
- As people spend more time experimenting, productivity in other areas can suffer.
- Poor coordination across groups can lead to duplication of efforts and repeated mistakes.
- A constant stream of new products may confuse the organization and its employees.
Despite these potential hazards, the authors indirectly identify the virtuous circle of Enterprise 2.0 (Figure 1.5). As diverse products are combined to create useful new resources, they themselves become fodder for the next generation of useful products. In principle, this process isn't very different from the longstanding goal of reusability that firms have strived for in their applications and architecture. Three important differences arise this time around, however:
1. In the age of mashups "reuse" is no longer an ivory-tower concept restricted to the purview of application architects. Because end users and developers alike will be creating solutions, everyone will engage in the practice of reuse.
Figure 1.5 The virtuous circle of mashups
2. The existing approach to reuse front-loads development efforts with additional planning and coding to create open APIs and extra documentation that may never be used. Because mashups impose reusability "after the fact," their creators will build their own APIs and include only the minimum functionality needed.
3. Traditional reuse practices don't require that a system that leverages existing code or libraries is itself reusable. This leads to implementations that are essentially "dead ends." Mashups are implicitly reusable, which creates a never-ending cycle of potential associations and recombination.
Acquiring Data from the Web
As we saw in the last section, the majority of consumer mashups use the public APIs of a handful of Web sites. In the enterprise model, the best potential sources for mashup data may not be as forthcoming. In these situations, it becomes necessary to employ creative techniques to extract information. One of the most common and controversial techniques is often referred to as "screen scraping," a derogatory phrase with a long, sullied history that detractors invoke to undermine the approach.
Traditional "screen scraping" owes its origins to the early days of desktop computing, when IT departments developed various techniques to migrate "dumb terminal" mainframe applications to end-user computers. Rather than tackle the costly and time-consuming task of rewriting or replacing existing applications, many IT departments used special PC-based applications that emulated the original terminals. These applications could receive the data from the mainframe and extract the contents of the forms presented on the old green-screen systems. User keystrokes were likewise emulated to send input back to the original application. This technique relied on developer-created templates and was both highly position-sensitive and extremely unforgiving. The smallest alteration in the mainframe display would invalidate the predefined template and break the new application.
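The brittleness of those position-sensitive templates can be sketched in a few lines of Python. The screen layout and field offsets below are invented for illustration; real terminal scrapers worked against 24x80 character grids in exactly this fashion.

```python
# A "dumb terminal" screen is a fixed grid of characters; old-style
# screen scrapers extracted fields by hard-coded row/column offsets.
SCREEN = [
    "ACCOUNT INQUIRY          ",
    "NAME:    JONES, ALICE    ",
    "BALANCE: 001234.56       ",
]

def scrape_balance(screen):
    # Template rule: the balance always starts at row 2, column 9, width 9.
    return screen[2][9:18].strip()

print(scrape_balance(SCREEN))   # -> 001234.56

# The smallest layout change -- here, a single extra leading space --
# shifts every offset and silently corrupts the extracted value.
SHIFTED = [
    "ACCOUNT INQUIRY          ",
    "NAME:    JONES, ALICE    ",
    " BALANCE: 001234.56      ",
]
print(scrape_balance(SHIFTED))  # -> 001234.5  (truncated, wrong)
```

Nothing in the template fails loudly; the scraper simply returns the wrong data, which is why the approach earned its reputation as a fragile hack.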
Because of these drawbacks, screen scraping was generally viewed as a hack and a last resort. The negative experiences associated with this approach continue to haunt any solution that promises to extract raw data from a user interface. Before organizations feel comfortable with mashups, users will need to understand how modern methods differ from the brittle approaches of the past.
Too many of us have forgotten that the "L" in HTML stands for "Language." In most people's minds, the description of the presentation and the presentation itself are inextricably bound; many view HTML and what is displayed in their browser as two sides of the same coin.
In fact, it is the underlying Document Object Model (DOM) that makes mashup "screen scraping" something that should more appropriately be referred to as "Web harvesting" or "DOM parsing." When HTML is read by a browser, it is internally organized into a hierarchical structure. The underlying data structure is tree based and much more organized than what the user sees (see "The Structure of HTML" sidebar). HTML elements may contain additional nonvisual information such as the id and class attributes (see "The class and id Attributes" sidebar).
The Structure of HTML
Consider the following simple Web form:
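The excerpt omits the book's original markup, so the form below is a hypothetical stand-in, paired with a sketch of the element tree a parser builds from it. The sketch uses only Python's standard-library html.parser; a browser's DOM is far richer, but the hierarchical shape is the same.

```python
from html.parser import HTMLParser

# A hypothetical sign-on form (invented for illustration).
FORM = """<form id="login" class="signon-form">
<label>Sign On ID</label><input id="userid" type="text">
<label>Password</label><input id="passwd" type="password">
</form>"""

class Outliner(HTMLParser):
    """Record each element, indented by its depth in the tree."""
    VOID = {"input", "br", "img", "hr"}  # elements with no closing tag

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag + " " + str(dict(attrs)))
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

def dom_outline(html):
    o = Outliner()
    o.feed(html)
    return o.lines

print("\n".join(dom_outline(FORM)))
# form {'id': 'login', 'class': 'signon-form'}
#   label {}
#   input {'id': 'userid', 'type': 'text'}
#   label {}
#   input {'id': 'passwd', 'type': 'password'}
```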
When parsed by a browser, this HTML is internally organized into a hierarchical structure known as the Document Object Model (DOM). The DOM is more conducive to automated analysis than the presentation users receive.
The class and id Attributes
The ubiquitous use of id and class in HTML makes them ideal markers for Web scrapers to identify document elements.
Beyond their original intent within HTML, id and class attributes can also serve as "markers" for general-purpose processing by other applications and agents (e.g., mashups). Unlike the screen scrapers of the past, which relied solely on positional information to parse screen content, mashups can examine the underlying attributes used to build the presentation. The approach is not foolproof, but these attributes change much less frequently than the look and feel of a site, as demonstrated in the sidebar "Presentation Changes Don't Break Object Discovery." While consumer mashup builders queue up and wait for content providers to expose an API, enterprise teams are using Web harvesting to grab whatever data they want.
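The attribute-based lookup described above can be sketched with the standard library alone. The markup and the "passwd" id are invented for illustration; production harvesters typically use a full DOM library, but the principle, matching on attributes rather than position, is identical.

```python
from html.parser import HTMLParser

class AttrFinder(HTMLParser):
    """Locate an element by its id attribute, ignoring layout entirely."""
    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id") == self.wanted_id:
            self.found = (tag, a)

def find_by_id(html, element_id):
    f = AttrFinder(element_id)
    f.feed(html)
    return f.found

PAGE = ('<form><input id="signon" type="text">'
        '<input id="passwd" type="password"></form>')

print(find_by_id(PAGE, "passwd"))
# -> ('input', {'id': 'passwd', 'type': 'password'})
```

Because the lookup keys on the id attribute, the surrounding tables, divs, and styling can change freely without disturbing the extraction.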
Presentation Changes Don't Break Object Discovery
This example shows a sample Web page before and after a radical redesign. Although a visitor might be disoriented by the drastic changes, similarities in the underlying HTML (and resulting DOM tree) will not slow down a mashup that examines the site.
As part of a larger system, a mashup is created to sign in to a Web site by supplying a "Sign On ID" and a "Password." The form attributes and DOM information are displayed following the screenshot.
Even though the site has been radically redesigned, it still contains form elements for "Sign On ID" and "Password." A peek at the underlying HTML and DOM shows that these fields retain the same attributes. A mashup most likely will not have a problem recognizing the new design, even though a human might take some time to become accustomed to the new interface.
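The sidebar's screenshots aren't reproduced in this excerpt, but the idea can be sketched with two hypothetical layouts: a table-based original and a div-based redesign. Both expose the same id attributes, so an id-driven mashup finds its fields in either version.

```python
from html.parser import HTMLParser

class IdIndex(HTMLParser):
    """Collect every element's id attribute into a set."""
    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "id" in a:
            self.ids.add(a["id"])

def ids_in(html):
    idx = IdIndex()
    idx.feed(html)
    return idx.ids

# Original layout: a table-based sign-on form (invented markup).
OLD = ('<table><tr><td>Sign On ID <input id="signon"></td>'
       '<td>Password <input id="passwd" type="password"></td></tr></table>')

# Radical redesign: entirely different structure, same field ids.
NEW = ('<div class="hero"><section><input id="signon">'
       '<input id="passwd" type="password"></section></div>')

# Both versions expose the same anchors for the mashup.
print(sorted(ids_in(OLD) & ids_in(NEW)))  # -> ['passwd', 'signon']
```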
Enterprise mashups are not restricted to skimming content from HTML: They can leverage more structured formats such as XML (RSS, Atom), Web services, or even binary formats such as Excel and PDF (as shown in Figure 1.6). Nevertheless, the great promise of enterprise mashups derives from their ability to treat the entire World Wide Web as a first-class data source.
Figure 1.6 Enterprise mashups can consume a variety of different data sources.
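Consuming one of those structured formats requires even less effort than Web harvesting. The sample RSS document below is invented for illustration; the parse uses Python's standard-library xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# A hypothetical internal RSS feed (invented for illustration).
RSS = """<rss version="2.0"><channel>
  <title>Order Status</title>
  <item><title>Order 1001 shipped</title></item>
  <item><title>Order 1002 delayed</title></item>
</channel></rss>"""

root = ET.fromstring(RSS)
titles = [item.findtext("title") for item in root.iter("item")]
print(titles)  # -> ['Order 1001 shipped', 'Order 1002 delayed']
```

Because the feed is already structured, there is no presentation layer to reverse-engineer; the mashup reads the data directly.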
This was first published in May 2009