Monday, March 31, 2008

Exploring the History of XML

Whatever your personal opinion of XML, you have almost certainly heard of it. Not everyone, however, knows its origins, and it helps to understand at least the basics of its evolution. Imagine you are at a company party, and someone from management (it is even worse when they are not from the information technology [IT] group) asks you about XML because they have been hearing all about it in meetings. Once you have covered the history of XML, you can be certain of being left alone for the rest of the night. Seriously, though, understanding how and why XML was conceived shows which problems it was originally meant to solve, which in turn helps you decide whether to use it and how to apply it to current problems.

Generalized Markup Language

XML can trace its roots all the way back to 1969. Charles F. Goldfarb, previously a practicing attorney, accepted a position at IBM that involved integrating information systems with legal practices. The project involved integrating text editing, information retrieval, and document rendering. The problem at hand was that each application required different markup. Goldfarb, along with Ed Mosher and Ray Lorie, began work on what would eventually be known as the Generalized Markup Language (GML). The name was actually based on the initials of Goldfarb, Mosher, and Lorie, and from here the term markup language was coined.

The purpose of GML was to describe the structure of a document using tags, allowing for the retrieval of different parts of the text while separating document formatting from its content. This way the same document could easily be used amongst different applications and systems. These different systems would then use their own processing commands based upon the tags encountered within the document. Another important aspect was the introduction of Document Type Definitions (DTDs). GML was officially named in 1973.

Standard Generalized Markup Language

In 1978, Goldfarb joined the American National Standards Institute (ANSI) and worked on a project based on GML to be known as the Standard Generalized Markup Language (SGML). While GML was a proprietary IBM format, SGML was developed by many people and groups and aimed to standardize textual representation and manipulation in documents in a platform- and vendor-neutral, open format. SGML is not really a language in the sense most people think of languages but rather defines how to create a markup language, so it is really a metalanguage.

The first working draft of SGML was published in 1980 and continued to evolve, being released as a recommendation for an industry standard in 1983. In 1986, the International Organization for Standardization (ISO) published it as an international standard.

Although adopted by some large organizations, such as the U.S. Department of Defense (DOD), the U.S. Internal Revenue Service (IRS), and the Association of American Publishers (AAP), SGML was extremely complex, which ultimately prevented its widespread adoption. Most companies did not have the time or resources to leverage SGML in their business activities. However, some people say using SGML reduces a product’s time to market, because in the long run less time is spent on application integration and day-to-day editing. This may be true, but the upfront cost in time is typically too great for smaller companies that cannot afford to dedicate enough resources to this.

The complexity of SGML and the time-to-market paradigm of using it play significant roles in the history of XML and ultimately led to its creation. The following are a few notable concepts of SGML that are relevant in the evolution of XML:

  • A document is defined structurally by a DTD.

  • Named elements, also referred to as markup tags, defined within the DTD comprise the document.

  • Entities, which are named parts of the document and consist of a name and a value, can perform substitutions within the document.
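The three concepts above carried over into XML, so they can be sketched with a small XML document rather than SGML itself. The example below is only an illustration under stated assumptions: the element names (memo, to, body) and the entity name (company) are invented here, and Python's standard xml.etree.ElementTree parser is used purely for convenience. That parser reads the DTD's entity declaration and performs the substitution, but it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names (memo, to, body) and the
# entity name (company) are made up for this illustration.
DOCUMENT = """<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY company "Example Corp">
]>
<memo>
  <to>Staff</to>
  <body>Welcome to &company;.</body>
</memo>"""


def summarize(text):
    """Return the root element name, its child element names, and the body text."""
    root = ET.fromstring(text)
    child_names = [child.tag for child in root]
    # The parser has already replaced &company; with the entity's value.
    return root.tag, child_names, root.findtext("body")


if __name__ == "__main__":
    print(summarize(DOCUMENT))
```

The DTD in the internal subset declares the structure (a memo holds a to element followed by a body element), the named elements make up the document, and the entity reference is replaced by its declared value when the document is parsed.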

Hypertext Markup Language

Many of you may not remember the Internet before the World Wide Web was created. In those days, Gopher was a common technology used to access documents on the Internet. It was extremely primitive compared to what everyone uses today, but back then it allowed people to access documents and in most cases search for documents from all over the globe.

In 1989, while working at CERN, the European Particle Physics Laboratory, Tim Berners-Lee came up with an idea that would allow documents on the Internet to cross-reference each other. In basic terms, a document could link to other documents, including specific text within the documents. The language used to create these documents was Hypertext Markup Language (HTML). In 1990, the Web was born with the first live HTML document on the Internet.

HTML was based on SGML and added some features such as hyperlinking and anchors. Specifically created for the Internet, HTML featured a small set of tags and was designed for displaying content, which helped it and the Web quickly gain widespread adoption. Its features, however, were also its major limitations. The price of its simplicity is a fixed tag set that cannot be extended. The tags also have no meaning to anything other than the application, such as a browser, that renders the document.

Extensible Markup Language

The technology started to come full circle in 1996. With SGML being considered too complicated and HTML too limited, the next logical step was taken. The World Wide Web Consortium (W3C) formed a committee to combine the flexibility and power of SGML with the simplicity and ease of use of HTML, which resulted in XML. Finally, in February 1998, XML 1.0 was released as a W3C recommendation. XML was originally intended for electronic publishing, and its designers did not anticipate the far-reaching effects it would have. The design goals were as follows:

  • XML shall be straightforwardly usable over the Internet.

  • XML shall support a wide variety of applications.

  • XML shall be compatible with SGML.

  • It shall be easy to write programs that process XML documents.

  • The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

  • XML documents should be human legible and reasonably clear.

  • The XML design should be prepared quickly.

  • The design of XML shall be formal and concise.

  • XML documents shall be easy to create.

  • Terseness in XML markup is of minimal importance.

To understand how simple XML can be, consider that a complete well-formed XML document can be as simple as a single empty element tag.
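As a rough illustration of how small a well-formed document can be, the sketch below checks a few strings with Python's standard xml.etree.ElementTree parser. The element name document is a placeholder chosen for this example, not something taken from the text above, and the helper is_well_formed is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ParseError:
        return False
    return True


if __name__ == "__main__":
    # "document" is a placeholder element name used only for illustration.
    print(is_well_formed("<document/>"))  # a single empty element
    print(is_well_formed("<document>"))   # start tag that is never closed
```

A lone empty element passes the check because it supplies the one thing every XML document needs, a root element, while the unclosed start tag is rejected.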
