most recent, version 2.0, is not.
Atom is a newer, but similar, syndication protocol. It is a proposed standard at the Internet Engineering Task Force (IETF) and seeks to maintain better metadata than RSS, to provide better and more rigorous documentation, and to incorporate the notion of constructs for common data.
These syndication technologies are great for mashups that aggregate event-based or update-driven content, such as news and weblog aggregators.
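To make this concrete, here is a minimal sketch of how a mashup might pull entries from an Atom feed using only the Python standard library. The feed URL is a placeholder, and a production aggregator would also need caching, conditional requests, and error handling.

    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM_NS = "{http://www.w3.org/2005/Atom}"
    FEED_URL = "https://example.com/feed.atom"  # placeholder feed URL

    # Fetch the raw feed document and parse it as XML.
    with urllib.request.urlopen(FEED_URL) as response:
        root = ET.fromstring(response.read())

    # Each Atom <entry> carries a title, an update timestamp, and links.
    for entry in root.findall(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="(untitled)")
        updated = entry.findtext(ATOM_NS + "updated", default="")
        link = entry.find(ATOM_NS + "link")
        href = link.get("href") if link is not None else ""
        print(updated, title, href)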
Like any other data integration domain, mashup development is replete with technical challenges that need to be addressed, especially as mashup applications become more feature- and functionality-rich. This section touches on a handful of these challenges, some of which you can address and mitigate, while others remain open issues.
Data Integration Challenges: Semantic Meaning and Data Quality
Qualitative surveys suggest that the number one enterprise IT concern today is data integration within the enterprise virtual organization. (In this context, I use the term virtual organization to mean a composition of federated business units, each contained within its own administrative domain.) Like many enterprise IT managers who find themselves up against the task of integrating legacy data sources (for example, to create corporate dashboards that reflect current business conditions), mashup developers are faced with the analogous challenge of deriving shared semantic meaning between heterogeneous data sets. Therefore, to get an idea of what mashup developers have in store, you need look no further than the storied integration challenges faced by enterprise IT.
For example, translation systems between data models must be designed. When converting data into common forms, reasonable assumptions often have to be made when the mapping is not a complete one (for example, one data source might have a model in which an address type contains a country field, whereas another does not). This already-challenging task is exacerbated by the fact that the mashup developers might not be domain experts on the source data models, because the models are third party to them, and these reasonable assumptions might not be intuitive or clear.
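A sketch of such a translation layer appears below. The two source address models and the fallback country value are hypothetical, but they show how a mapping must make an explicit, documented assumption whenever a source field is simply absent.

    from dataclasses import dataclass

    # Hypothetical record from a source whose address model includes a country.
    @dataclass
    class SourceAddressA:
        street: str
        city: str
        country: str

    # Hypothetical record from a source whose address model omits the country.
    @dataclass
    class SourceAddressB:
        street: str
        city: str

    # The common form that the mashup integrates over.
    @dataclass
    class CommonAddress:
        street: str
        city: str
        country: str

    def to_common(addr) -> CommonAddress:
        # When the source model lacks a country field, the mapping must
        # assume one; the default chosen here is arbitrary and illustrative.
        country = getattr(addr, "country", "US")
        return CommonAddress(street=addr.street, city=addr.city, country=country)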
In addition to missing data or incomplete mappings, the mashup designer might discover that the data they wish to integrate is not suitable for machine automation and first needs cleansing. For example, law enforcement arrest records might be entered inconsistently, using common abbreviations for names (such as “mkt sqr” in one record and “Market Square” in another), making automated reasoning about equality difficult, even with good heuristics. Semantic modeling technologies, such as RDF, can help ease the problem of automated reasoning across different data sets, provided that such modeling is built into the data store. Legacy data sources, however, are likely to require much human effort in analysis and data cleansing before they can be made available to semantic modeling technologies.
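A first pass at such cleansing often looks like the following sketch: a normalization table (hypothetical here, and in practice built by analyzing the actual legacy records) expands known abbreviations before records are compared.

    import re

    # Hypothetical expansion table; a real one would come from analyzing
    # the abbreviations that actually appear in the legacy records.
    ABBREVIATIONS = {
        "mkt": "market",
        "sqr": "square",
        "st": "street",
    }

    def normalize_location(raw: str) -> str:
        # Lowercase, strip punctuation, and split into tokens.
        tokens = re.sub(r"[^\w\s]", "", raw.lower()).split()
        # Expand each token if it is a known abbreviation.
        return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)

    # Both legacy spellings now compare equal.
    assert normalize_location("mkt sqr") == normalize_location("Market Square")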
Mashup developers might also have to contend with several issues that IT integration managers
might not, one of which is data pollution. As part of their application design, many mashups solicit
public user input. As evidenced in the wiki application domain, this is a double-edged sword: it can be quite powerful because it enables open contribution and best-of-breed data evolution, yet it leaves the application subject to inconsistent, incorrect, or intentionally misleading data entry. The latter can cast doubt on data trustworthiness, which can ultimately compromise the value the mashup provides.

Another host of integration issues facing mashup developers arises when screen-scraping techniques must be used for data acquisition. As discussed in the previous section, deriving parsing and
acquisition tools and data models requires significant reverse-engineering effort. Even in the best case, where these tools and models can be created, all it takes is a refactoring of how the source site presents its content (or the site's mothballing and abandonment) to break the integration process and cause the mashup application to fail.
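The following sketch illustrates that fragility. It assumes a hypothetical page layout and the third-party BeautifulSoup parser; the CSS selector encodes the scraper's reverse-engineered model of the page, so the code fails loudly, rather than returning bad data, the moment the site's markup changes.

    import urllib.request
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    PAGE_URL = "https://example.com/listings"  # placeholder URL

    class ScrapeError(RuntimeError):
        """Raised when the page no longer matches the assumed layout."""

    def scrape_titles(url: str) -> list[str]:
        with urllib.request.urlopen(url) as response:
            soup = BeautifulSoup(response.read(), "html.parser")
        # This selector encodes a hypothetical page structure; any
        # refactoring of the source site's markup will invalidate it.
        nodes = soup.select("div.listing h2.title")
        if not nodes:
            raise ScrapeError("expected markup not found; layout may have changed")
        return [node.get_text(strip=True) for node in nodes]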