Monday, March 31, 2008
PHP Tutorials
One of the first things most people want to know about PHP is what the initials stand for. Then they wish they had never asked. Officially, PHP stands for PHP: Hypertext Preprocessor.
Basics of Object Oriented Programming
This tutorial is aimed at an audience unfamiliar with the basic concepts of object-oriented programming (OOP). The intent is to provide a general overview of OOP with a view toward using PHP effectively.
Object Oriented Features New to PHP 5
Object Oriented Features New to PHP 5: a useful tutorial.
Introduction to XML and Web Services
The Extensible Markup Language (XML) is a simple, platform-independent standard for describing data within a structured format. XML is not a language but instead a metalanguage that allows you to create markup languages.
What is PHP and Why Should I Care?
PHP is a scripting language that brings websites to life in the following ways:
* Sending feedback from your website directly to your mailbox
* Sending email with attachments
* Uploading files to a web page
* Watermarking images
* Generating thumbnails from larger images
* Displaying and updating information dynamically
* Using a database to display and store information
* Making websites searchable
* And much more . . .
PHP is easy to learn; it is platform-neutral, so the same code runs on Windows, Mac OS X, and Linux; and all the software you need to develop with PHP is open source and therefore free. There was a brief debate on the PHP General mailing list (http://news.php.net/php.general) in early 2006 about changing what PHP stands for. Small wonder, then, that it drew the comment that people who use PHP are Positively Happy People.
PHP started out as Personal Home Page in 1995, but it was decided to change the name a couple of years later, as it was felt that Personal Home Page sounded like something for hobbyists, and did not do justice to the range of sophisticated features that had been added. Since then, PHP has developed even further, adding extensive support for object-oriented programming (OOP) in PHP 5. One of the language’s great attractions, though, is that it remains true to its roots. You can start writing useful scripts very quickly without the need to learn lots of theory, yet be confident in the knowledge that you are using a technology with the capability to develop industrial-strength applications. Although PHP supports OOP, it is not an object-oriented language.
Make no mistake, though. Using simple techniques does not mean the solutions you will find in these pages are not powerful. They are.
Embracing the power of code
How hard is PHP to use and learn?
How safe is PHP?
Embracing the power of code
The CSS Zen Garden, cultivated by Dave Shea, played a pivotal role in convincing designers of the power of code. The underlying XHTML of every page showcased at www.csszengarden.com is identical, yet the CSS produces stunningly different results. You do not need to be a CSS superhero, but as long as you have a good understanding of the basics of XHTML and CSS, you are ready to take your web design skills to the next stage by adding PHP to your arsenal.
PHP is a server-side language. That means it runs on the web server, unlike CSS or JavaScript, which run on the client side (that is, the computer of the person visiting your site). This gives you much greater control. As long as the code works on your server, everyone receives the same output. You can do the same thing with JavaScript, but what visitors to your site actually see depends on two things: JavaScript being enabled in their web browser, and the browser they are using understanding the version of JavaScript you have used. With PHP, this does not matter, because the dynamic process takes place entirely on the server and creates the XHTML needed to display the page with a random choice of image. The server chooses the image filename and inserts it into the
What PHP does is enable you to introduce logic into your web pages. This logic is based on alternatives. If it is Wednesday, show Wednesday’s TV schedules . . . If the person who logs in has administrator privileges, display the admin menu; otherwise, deny access . . . that sort of thing.
PHP bases some decisions on information that it gleans from the server: the date, the time, the day of the week, information held in the page’s URL, and so on. At other times, the decisions are based on user input, which PHP extracts from XHTML forms. As a result, you can create an infinite variety of output from a single script. For example, if you visit http://foundationphp.com/blog/ and click various internal links, what you see is always the same page, but with different content.
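The kind of decision logic described above can be sketched in a few lines. This is only an illustration: the function name choose_content() and its two parameters are made up for the example, and a real page would take the administrator flag from a login system rather than a hard-coded value.

```php
<?php
// Hypothetical helper: picks content from the day of the week and the user's role.
function choose_content($dayOfWeek, $isAdmin)
{
    // Decision based on information from the server (the day name)
    if ($dayOfWeek === 'Wednesday') {
        $schedule = "Wednesday's TV schedules";
    } else {
        $schedule = "Schedules for $dayOfWeek";
    }

    // Decision based on who is logged in
    if ($isAdmin) {
        $menu = 'admin menu';
    } else {
        $menu = 'access denied';
    }

    return "$schedule | $menu";
}

// date('l') returns the full day name according to the server's clock
echo choose_content(date('l'), false), "\n";
```

The same script produces different output on different days and for different visitors, which is the point of server-side logic.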
How hard is PHP to use and learn?
PHP is not rocket science, but at the same time, do not expect to become an expert in five minutes. If you are a design-oriented person, you may find it takes time to get used to the way PHP is written. What we like about it very much is that it is succinct. For instance, in classic ASP, to display each word of a sentence on a separate line, you have to type out all this:
<%@ Language=VBScript %>
<% Option Explicit %>
<%
Dim strSentence, arrWords, strWord
strSentence = "ASP uses far more code to do the same as PHP"
arrWords = Split(strSentence, " ", -1, 1)
For Each strWord in arrWords
Response.Write(strWord)
Response.Write("<br />")
Next
%>
In PHP, it is simply
<?php
$sentence = 'ASP uses far more code to do the same as PHP';
// explode() splits the sentence into an array of words
$words = explode(' ', $sentence);
foreach ($words as $word) {
    echo "$word<br />";
}
?>
That may not seem a big difference, but the extra typing gets very tiresome over a long script. PHP also makes it easy to recognize variables, because they always begin with $. Most of the functions have very intuitive names. For example, mysql_connect() connects you to a MySQL database. Even when the names look strange at first sight, you can often work out where they came from. In the preceding example, explode() “blows apart” text and converts it into an array of its component parts.
Perhaps the biggest shock to newcomers is that PHP is far less tolerant of mistakes than browsers are with XHTML. If you omit a closing tag in XHTML, most browsers will still render the page. If you omit a closing quote, semicolon, or brace in PHP, you will get an uncompromising error message. This is not just a feature of PHP, but of all server-side technologies, including ASP, ASP.NET, and ColdFusion. It is why you need to have a reasonable understanding of XHTML and CSS before embarking on PHP. If the underlying structure of your web pages is shaky to start with, your learning curve with PHP will be considerably steeper.
PHP is not like XHTML: you cannot choose from a range of PHP editors that generate all the code for you automatically. Dreamweaver does have considerable support for PHP, and it automates a lot of code generation, mainly for integrating web pages with the MySQL database.
How safe is PHP?
Basics of Object Oriented Programming
This tutorial is aimed at an audience unfamiliar with the basic concepts of object-oriented programming (OOP). The intent is to provide a general overview of OOP with a view toward using PHP effectively. We will restrict the discussion to a few basic concepts of OOP as it relates to PHP, though it is sometimes useful to look at other object-oriented (OO) languages such as Java or C++.
We will discuss three aspects of object orientation in this tutorial: class, access modifiers, and inheritance. Although OOP may be a different programming paradigm, in many respects it is an extension of procedural programming, so where appropriate, we will use examples from procedural programming to help explain these concepts.
Class
Objects Need Access Modifiers
Object Reuse and Inheritance
Class
You cannot have OOP without objects, and that is what classes provide. At the simplest level, a class is a data type. However, unlike primitive data types such as an integer, a float, or a character, a class is a complex, user-defined data type. A class is similar to a database record in that it encapsulates the characteristics of an object. For example, the record of a Person might contain a birth date, an address, a name, and a phone number. A class is a data type made up of other data types that together describe an object.
Classes Versus Records
Although a class is like a record, an important difference is that classes contain functions as well as different data types. And, when a function becomes part of a data type, procedural programming is turned on its head, quite literally, as you can see in the following example syntax. A function call that looked like this:
function_call($somevariable);
looks something like this with OOP:
$somevariable->function_call();
The significant difference here is that OO variables do not have things done to them; they do things. They are the actors rather than the acted upon, and for this reason they are said to behave. The behavior of a class is the sum of its functions.
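To make the contrast concrete, here is a minimal sketch that does the same job both ways. The Person class, the describe_person() function, and the sample data are invented for illustration.

```php
<?php
// Procedural style: the data is passed to a function.
function describe_person($person)
{
    return $person['name'] . ' was born on ' . $person['birthDate'];
}

// OO style: the data and the function live together in a class.
class Person
{
    public $name;
    public $birthDate;

    public function __construct($name, $birthDate)
    {
        $this->name = $name;
        $this->birthDate = $birthDate;
    }

    // The object "behaves": it describes itself.
    public function describe()
    {
        return $this->name . ' was born on ' . $this->birthDate;
    }
}

$record = array('name' => 'Ada', 'birthDate' => '1815-12-10');
$person = new Person('Ada', '1815-12-10');

echo describe_person($record), "\n"; // function_call($somevariable)
echo $person->describe(), "\n";      // $somevariable->function_call()
```

Both calls produce the same text; what changes is where the function lives and who is doing the acting.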
A Cohesive Whole
Procedural programmers often work with code libraries. These libraries usually group related functions together. For instance, all database functions might be grouped together in a file called dbfunctions.inc. The functions that make up an object’s behavior should also be related to one another, but in a much stronger fashion than functions in the same library. Just as the different elements of a Person record describe an individual, so too should the behavior of a class describe the class. In order for something to be an object, it should be a cohesive whole incorporating appropriate characteristics and appropriate behavior.
Objects Are Instances
Classes are not themselves objects, but a way of creating objects: they are templates or blueprints that form the model for an object. When speaking loosely, these two terms are sometimes used interchangeably, but strictly speaking an object is an instance of a class. This is somewhat like the difference between the concept of an integer and a specific variable $X with a specific value. The concept of a class as a template for an object becomes clearer in the context of inheritance, especially when we discuss multiple inheritance (a topic we will deal with shortly).
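A short sketch may help separate the two ideas. The Counter class below is hypothetical; the point is only that one class can produce many objects, each with its own state.

```php
<?php
// Hypothetical class used only for illustration.
class Counter
{
    public $count = 0;

    public function increment()
    {
        $this->count++;
    }
}

// Two objects created from the same template
$first = new Counter();
$second = new Counter();

$first->increment();
$first->increment();
$second->increment();

// Each instance keeps its own state
echo $first->count, ' ', $second->count, "\n";
```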
Objects Need Access Modifiers
OOP is made possible by this simple concept of a class as a cohesive aggregate of characteristics and behaviors (this is exactly what objects are in PHP 4), but one of the most important features of any OO language is the use of access modifiers. Access modifiers refine the object model by controlling how an object is used or reused. Simply put, access modifiers provide guidance about what you can and cannot do with an object. To get a sense of what this means, let’s use an example from procedural programming.
Let’s define a subroutine as a function that is never invoked directly but that is only called by other functions. Now suppose you are a procedural programmer with a library of functions and subroutines that is used by several other programmers. The ability to flag subroutines as secondary would be helpful in instructing others how to use your library, but the only way to do this is through documentation. However, in OOP, access modifiers not only indicate the primacy of certain functions over others, they enforce it programmatically. They implement language constraints to ensure that “subroutines” are never called directly. Properly constructed classes are self-documenting and self-regulating.
In the situation just described, the need to document a code library arises because it is used in a collaborative environment; the exact same circumstance accounts for the existence of access modifiers. One of the assumptions of OOP is that it is conducted within an interactive context with access modifiers defining the ways of interacting. This is one of the important differences between OOP and procedural programming. Access modifiers provide the rules for using a class, and this syntactically defined “etiquette” is commonly referred to as an interface. Because a class provides an interface, there is less need to rely on documentation and on user programmers “doing the right thing.”
Documenting code libraries is important because libraries get reused; access modifiers matter for exactly the same reason: they facilitate reuse.
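As a rough PHP 5 sketch of the "subroutine" idea, the hypothetical Greeter class below exposes one public method and keeps its helper private. The class and method names are assumptions made for the example.

```php
<?php
// Hypothetical class: one public method backed by a private "subroutine".
class Greeter
{
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    // Part of the interface: callers are meant to use this.
    public function greet()
    {
        return 'Hello, ' . $this->normalize($this->name) . '!';
    }

    // Helper that only methods of this class may call.
    private function normalize($text)
    {
        return ucfirst(strtolower(trim($text)));
    }
}

$greeter = new Greeter('  aLiCe ');
echo $greeter->greet(), "\n";

// is_callable() reports whether a method can be called from this scope
var_dump(is_callable(array($greeter, 'greet')));     // public method
var_dump(is_callable(array($greeter, 'normalize'))); // private method
```

Calling $greeter->normalize() from outside the class would stop the script with an error, so the restriction is enforced by the language rather than by documentation.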
Object Reuse and Inheritance
In a biological sense, a child inherits genes from its parents, and this genetic material conditions the appearance and behavior of the child. In OOP the meaning of inheritance is analogous: it is the ability to pass along characteristics and behavior. At first this feature of OOP may seem somehow magical, but really inheritance is just a technique for reusing code, much the way you might include a library of functions in procedural programming.
If you identify an existing class that exactly suits your needs, you can simply use it and benefit from the predefined behavior. Inheritance comes into play when a class does not do quite what you want. This situation is not much different from adding functions to an existing code library. Through inheritance you can take advantage of existing behavior but also graft on any additional capabilities you need. For example, if you know that you want to create a Blue jay class and none exists, you can use an existing Bird class by inheriting from it, then modify it to suit your specific situation.
When one class forms the basis for a new class, as a Bird class might for a Blue jay class, the original class is often referred to as the base (or parent) class. For obvious reasons, a class derived from another class is called a derived class or a child class.
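The Bird and Blue jay relationship might look something like the following sketch. The method names and return strings are invented; only the extends relationship comes from the discussion above.

```php
<?php
// Base (parent) class
class Bird
{
    protected $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function eat()
    {
        return $this->name . ' eats seeds';
    }

    public function fly()
    {
        return $this->name . ' flies';
    }
}

// Derived (child) class: reuses fly() unchanged, overrides eat(), adds call().
class BlueJay extends Bird
{
    public function eat()
    {
        return $this->name . ' eats acorns';
    }

    public function call()
    {
        return $this->name . ' calls "jay! jay!"';
    }
}

$jay = new BlueJay('Blue jay');
echo $jay->fly(), "\n";
echo $jay->eat(), "\n";
echo $jay->call(), "\n";
```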
Multiple Inheritance
In nature, multiple inheritance is the norm, but in the world of OO PHP, an object can have only one parent class. The creators of PHP 5 rejected the idea of multiple inheritance for classes. To see why, let’s use the Bird class again to show what multiple inheritance is and how it can lead to problems. If you wanted to create a Whooping crane class, it would make sense to derive this class from the Bird class. Suppose you also have an Endangered species class. Multiple inheritance would allow you to create a Whooping crane class from a combination of these two classes. This would seem to be an excellent idea until you realize that both classes define an eating behavior. Which one should you prefer? Awkward situations like this highlight the disadvantages of multiple inheritance. With single inheritance this kind of situation never arises.
Having Your Cake and Eating It Too
Single inheritance offers a simpler and more straightforward approach, but there are times when you may wish to combine behaviors from different classes. A whooping crane is both a bird and endangered. It does not make sense to build one of these classes from scratch every time you want this combination. Is there a way of combining different classes and avoiding the problem of overlapping behavior?
PHP solves this problem by introducing the concept of an interface. In this context, interface means a class with no data members that is made up only of functions that lack an implementation (function prototypes with no bodies). Any class that inherits from an interface must implement the missing function body. If Endangered species were an interface rather than a class, having more than one eating function would not matter. The method definition in the Bird class would act as the implementation of the interface function. In this way interfaces avoid the problem of defining the same function twice.
Because PHP does not require function prototyping, you may be unfamiliar with this concept. A function prototype is the declaration of a function name and parameters prior to its use (the function signature, if you like).
A class may inherit from only one class, but because interfaces lack an implementation any number of them may be inherited. In true PHP fashion, interfaces contribute to a powerful but flexible programming language.
Interfaces can be described as abstract because they always require an implementation. Because they are abstract, interfaces bear more resemblance to templates than classes do. Unlike classes, they can never be used "as is"; they are only meaningful in the context of inheritance. Because interfaces lack an implementation they can act only as a model for creating a derived class.
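A minimal sketch of the whooping crane case, assuming an interface named EndangeredSpecies with two made-up methods. Note that the eat() method inherited from Bird is what satisfies the interface, so the behavior is defined only once.

```php
<?php
// Interface: method signatures only, no bodies.
interface EndangeredSpecies
{
    public function eat();
    public function populationEstimate();
}

class Bird
{
    public function eat()
    {
        return 'eats like a bird';
    }
}

// One parent class, plus any number of interfaces.
class WhoopingCrane extends Bird implements EndangeredSpecies
{
    // eat() is inherited from Bird and serves as the implementation
    // of the interface method.

    public function populationEstimate()
    {
        return 'unknown'; // placeholder value, not real data
    }
}

$crane = new WhoopingCrane();
echo $crane->eat(), "\n";
var_dump($crane instanceof Bird);
var_dump($crane instanceof EndangeredSpecies);
```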
Object Oriented Features New to PHP 5
PHP 3 was released in mid-1998. Some basic object-oriented (OO) capabilities were included, more or less as an afterthought, to “provide new ways of accessing arrays.” No significant changes were made to the object model when version 4 was released in mid-2000. The basics of object-oriented programming (OOP) were there: you could create a class, and single inheritance was supported.
With the release of PHP 5 in 2004 there was plenty of room for improving PHP’s OO capabilities. At this point, Java, the most popular OO language to date, had already been around for almost 10 years. Why did it take PHP so long to become a full-fledged OO language? The short answer is because PHP is principally a web development language and the pressures of web development have only recently pushed it in this direction.
Support for objects has been grafted onto the language: you can choose to use objects or simply revert to procedural programming. That PHP is a hybrid language should be viewed as something positive, not as a disadvantage. There are some situations where you will simply want to insert a snippet of PHP and other situations where you will want to make use of its OO capabilities.
In some cases, an OO solution is the only solution. PHP 5 recognizes this fact and incorporates a full-blown object model, consolidating PHP’s position as the top server-side scripting language.
Access Modifiers
Built in Classes
Backward Compatibility
Where to Go from Here
Access Modifiers
PHP 5 gives us everything we would expect in this area. In previous versions of PHP there was no support for data protection, meaning that all elements of a class were publicly accessible. This lack of access modifiers was probably the biggest disincentive to using objects in PHP 4.
A notion closely related to data protection is information hiding. Access modifiers make information hiding possible by exposing an interface. This is also referred to as encapsulation of an object.
Built in Classes
Every OOP language comes with some built-in classes, and PHP is no exception. PHP 5 introduces the Standard PHP Library (SPL), which provides a number of ready-made classes and interfaces. As of version 5.1, depending upon how PHP is configured, there are well over 100 built-in classes and interfaces, a healthy increase from the number available in version 5.0.
Having ready-made objects speeds up development, and native classes written in C offer significant performance advantages. Even if these built-in classes do not do exactly what you want, they can easily be extended to suit your needs.
Exceptions
All OOP languages support exceptions, which are the OO way of handling errors. In order to use exceptions, we need the keywords try, catch, and throw. A try block encloses code that may cause an error. If an error occurs, it is thrown and caught by a catch block. The advantage of exceptions over errors is that exceptions can be handled centrally, making for much cleaner code. Exceptions also significantly reduce the amount of error-trapping code you need to write, which offers welcome relief from an uninspiring task. Also, having a built-in exception class makes it very easy to create your own customized exceptions through inheritance.
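The three keywords fit together roughly as follows. The exception class name DivisionByZeroAttempt and the two functions are hypothetical; the only built-in piece is the Exception base class.

```php
<?php
// Custom exception created through inheritance from the built-in Exception class.
class DivisionByZeroAttempt extends Exception
{
}

function divide($numerator, $denominator)
{
    if ($denominator == 0) {
        // Signal the problem instead of returning an error code
        throw new DivisionByZeroAttempt('Cannot divide by zero');
    }
    return $numerator / $denominator;
}

function safe_divide($numerator, $denominator)
{
    try {
        return divide($numerator, $denominator);
    } catch (DivisionByZeroAttempt $e) {
        // One central place to decide what an error looks like to the caller
        return 'Error: ' . $e->getMessage();
    }
}

echo safe_divide(10, 4), "\n";
echo safe_divide(1, 0), "\n";
```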
Database Classes
Because PHP is all about building dynamic web pages, database support is all-important. PHP 5 introduces the mysqli (MySQL Improved) extension with support for the features of MySQL databases versions 4.1 and higher. You can now use features such as prepared statements with MySQL, and you can do so using the built-in OO interface. In fact, anything you can do procedurally can also be done with this interface.
SQLite is a database engine that is incorporated directly into PHP. It is not a general-purpose database like MySQL, but it is an ideal solution in some situations, in many cases producing faster, leaner, and more versatile applications. Again an entirely OO interface is provided.
PHP versions 5.1 and higher also bundle PHP Data Objects (PDO) with the main PHP distribution. If you need to communicate with several different database back ends, then this package is the ideal solution. PDO’s common interface for different database systems is only made possible by the new object model.
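As a sketch of the OO database interface, the following uses PDO with an in-memory SQLite database so that no server is needed. It assumes the pdo_sqlite driver is enabled; the table name and rows are made up. Pointing the same code at another back end would mainly mean changing the connection string passed to the PDO constructor.

```php
<?php
// Assumes the pdo_sqlite driver is enabled; the table and data are invented.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE birds (name TEXT, status TEXT)');

// Prepared statement: the query is written once, values are bound at execution
$insert = $db->prepare('INSERT INTO birds (name, status) VALUES (?, ?)');
$insert->execute(array('Whooping crane', 'endangered'));
$insert->execute(array('Blue jay', 'common'));

$select = $db->prepare('SELECT name FROM birds WHERE status = ?');
$select->execute(array('endangered'));
$names = $select->fetchAll(PDO::FETCH_COLUMN);

print_r($names);
```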
Web Services
In PHP 5 all Extensible Markup Language (XML) support is provided by the libxml2 XML toolkit (www.xmlsoft.org). The underlying code for the Simple API for XML (SAX) and for the Document Object Model (DOM) has been rewritten, and DOM support has been brought in line with the standard defined by the World Wide Web Consortium.
Unified treatment of XML under libxml2 makes for a more efficient and easily maintained implementation. This is particularly important because support for XML under PHP 4 is weak, and web services present many problems that require an OO approach.
Under PHP 4, creating a SOAP client and reading an RSS feed are challenging programming tasks that require creating your own classes or making use of external classes such as NuSOAP (http://sourceforge.net/projects/nusoap). There is no such need in PHP 5.
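To give a feel for how little code this takes in PHP 5, here is a sketch that reads item titles from an RSS document with the SimpleXML extension. SimpleXML is one possible choice, not the only one, and the feed is a made-up string kept inline so the example needs no network access.

```php
<?php
// A made-up RSS 2.0 document kept inline so the example needs no network access.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);

$titles = array();
foreach ($feed->channel->item as $item) {
    // Casting a SimpleXMLElement to string yields its text content
    $titles[] = (string) $item->title;
}

print_r($titles);
```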
Reflection Classes
The reflection classes included in PHP 5 provide ways to introspect objects and reverse engineer code. The average web developer might be tempted to ignore these classes.
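For a sense of what introspection looks like, this sketch asks a hypothetical Bird class for its public methods using ReflectionClass. The class being inspected is invented for the example.

```php
<?php
// Hypothetical class to inspect.
class Bird
{
    public function eat() {}
    public function fly() {}
    private function molt() {}
}

$class = new ReflectionClass('Bird');

// Collect only the methods declared public
$publicMethods = array();
foreach ($class->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
    $publicMethods[] = $method->getName();
}

echo $class->getName(), ': ', implode(', ', $publicMethods), "\n";
```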
Iterator
In addition to built-in classes, PHP 5 also offers built-in interfaces. Iterator is the most important, as a number of classes and interfaces are derived from this interface.
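Implementing Iterator lets an object be used directly in a foreach loop. The Flock class below is a hypothetical example. It is written for current PHP (8.0 or later), which expects return type declarations on these five methods; under PHP 5 the same methods would be written without them.

```php
<?php
// Hypothetical collection that can be walked with foreach.
class Flock implements Iterator
{
    private $birds;
    private $position = 0;

    public function __construct(array $birds)
    {
        $this->birds = array_values($birds);
    }

    // The five methods required by the Iterator interface
    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return isset($this->birds[$this->position]);
    }

    public function current(): mixed
    {
        return $this->birds[$this->position];
    }

    public function key(): mixed
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }
}

$seen = array();
foreach (new Flock(array('Blue jay', 'Whooping crane')) as $index => $bird) {
    $seen[] = "$index: $bird";
}

print_r($seen);
```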
Backward Compatibility
Backward compatibility may be an issue if your code already uses objects. PHP 5 introduces a number of new "magic" methods. Magic methods begin with a double underscore, and this requires changing any user-defined methods or functions that use this naming convention. The most important ones relate to how objects are created and destroyed. The PHP 4 style of object creation is still supported, but you are encouraged to use the new magic method approach.
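The two magic methods concerned with creation and destruction are __construct() and __destruct(). A small sketch, using a hypothetical Connection class that records when each one runs:

```php
<?php
class Connection
{
    public static $log = array();
    private $name;

    // PHP 5 constructor: runs when the object is created with new
    public function __construct($name)
    {
        $this->name = $name;
        self::$log[] = "open {$this->name}";
    }

    // Destructor: runs when the last reference to the object goes away
    public function __destruct()
    {
        self::$log[] = "close {$this->name}";
    }
}

$conn = new Connection('db');
unset($conn); // no references remain, so __destruct() is called here

print_r(Connection::$log);
```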
PHP 5 deprecates some existing object-related functions. For example, is_a has been replaced by a new operator, instanceof. This particular change won’t affect how your code runs under PHP 5. If you use a deprecated function, you will see a warning if the error-reporting level is set to E_STRICT. In another example, the get_parent_class, get_class, and get_class_methods functions now return a case-sensitive result (though they do not require a case-sensitive parameter), so if you are using the returned result in a case-sensitive comparison you will have to make changes.
Pass By Reference
The preceding examples of changes are relatively minor and fairly easy to detect and upgrade. However, there is one change in particular that is of an entirely different magnitude.
The major change to PHP in version 5 relating to OOP is usually summed up by saying that objects are now passed by reference. This is true enough, but do not let this mask what is really at issue: a change in the way that the assignment operator works when used with objects.
Granted, the assignment operator is often invoked indirectly when an object is passed to a function or method, but objects are now passed by reference because of the implicit assignment. Prior to PHP 5, the default behavior was to assign objects by value and pass them to functions by value.
This is perfectly acceptable behavior for primitives, but it incurs far too much overhead with objects. Making a copy of a large object by passing it by value can put strains on memory and in most cases, all that is wanted is a reference to the original object rather than a copy. Changing the function of the assignment operator is a fairly significant change. In fact, the scripting engine that underlies PHP, the Zend engine, was entirely rewritten for PHP 5.
In PHP 4 it is possible to pass objects by reference using the reference operator (&), and in fact it is good programming practice to do so. Needless to say, this use of the reference operator becomes entirely superfluous after upgrading to PHP 5.
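The change in assignment behavior is easiest to see in a few lines. In this sketch the Account class and deposit() function are hypothetical. The clone keyword, which is not discussed above, is the PHP 5 way to ask for a separate copy when one is really wanted.

```php
<?php
class Account
{
    public $balance = 0;
}

function deposit($account, $amount)
{
    // No & needed: the function works on the caller's object, not a copy
    $account->balance += $amount;
}

$original = new Account();
$alias = $original;      // both variables refer to the same object
$copy = clone $original; // an explicit, separate copy

deposit($alias, 50);

echo $original->balance, ' ', $copy->balance, "\n";
```

Under PHP 4, the assignment to $alias would have produced a second object, and the deposit would not have been visible through $original.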
Prognosis
The mere enumeration of the details of backward compatibility masks what can be a highly charged issue. Whenever you change an established language, there are competing interests. In many cases you are damned if you do and damned if you do not. For example, retaining inconsistent function naming conventions may be necessary to maintain backward compatibility, but you may also be criticized for this very lack of consistency.
Of course, breaking backward compatibility means that some existing code won’t function properly. In many circumstances it is not easy to decide where and when to break backward compatibility, but changing PHP to pass objects by reference is a fairly defensible change despite any inconveniences. The only thing you can be sure of is that any change will give rise to complaints in some quarter. Certainly, having deprecated functions issue warnings is one good way to give advance notice and let developers prepare for coming changes.
Where to Go from Here
If you know PHP already, then learning OO PHP will not be too difficult. Given the relative simplicity of PHP’s object model, certainly less effort is required than for a C programmer to learn C++. Nevertheless, moving to a new language or a new version of a language entails some cost in terms of time and effort, especially if it has an impact on your existing code libraries.
We have covered some of the backward compatibility issues as they relate to OOP. Almost all procedural code will run with no changes under PHP 5. No rewrites are required, and code does not need to be converted to an OO style. Upgrading existing applications to take advantage of PHP 5 is a different matter. In the case of some large applications, upgrading may require significant effort. Many applications will benefit by being upgraded. If you have ever tried to customize software such as phpBB (the popular open-source forum), you know that the task would be much simpler if the application was object-oriented. However, upgrading an application such as phpBB means beginning again from scratch.
And there are other considerations besides code compatibility. After learning the ins and outs of OOP with PHP 5, will you actually be able to make use of it? Are there actually servers out there running PHP 5?
Adoption of PHP 5
As of this writing PHP 5 is hardly a bleeding-edge technology. It has been available for more than a year, and there have been a number of bug fixes. It is a stable product. Where developers have control over web server configuration there is no question that upgrading to PHP 5 will be beneficial. But developers do not always have a choice in this matter. In some situations (where the developer has no control of the web host, for instance), the decision to upgrade is in someone else’s hands.
PHP is a victim of its own success. The popularity and stability of PHP 4 have slowed the adoption of PHP 5. PHP 4 is a mature language that supports many applications, open-source and otherwise. There is naturally a reluctance to rock the boat. For this reason the adoption of PHP 5 has been somewhat slow, especially in shared hosting environments.
Other web hosting options have been much quicker to adopt PHP 5. The various virtual private server (VPS) hosting options usually include PHP 5, as do dedicated hosts. As a more secure and increasingly inexpensive hosting option, VPS is becoming much more popular.
Compromise
Widespread adoption of PHP 5 will happen sooner or later, but this book recognizes that developers may need, at least for a time, to continue writing new applications that will run under PHP 4. For this reason, wherever possible, a PHP 4 version of code has been provided in addition to the PHP 5 version. In a sense, PHP 5 just formalizes what was already possible in PHP 4. For instance, even though PHP 4 allows direct access to instance variables, when creating a class in PHP 4 it makes sense to write accessor methods for variables rather than setting or retrieving them directly. This requires a disciplined approach, but it will yield code that not only runs under PHP 4 but also will be much easier to upgrade to PHP 5. Adding restrictive access modifiers to variables will be a relatively simple task if accessor methods are already in place. Writing code with the expectation of upgrading it will also invariably mean writing better code. That is all the talk about OOP. In the remaining chapters you’re going to do OOP.
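A sketch of the accessor approach in its PHP 5 form follows; the Person class and its validation rule are invented for illustration. A PHP 4 version would declare the variable with var instead of private and could not throw an exception, but the get and set methods would look the same, which is what makes the later upgrade simple.

```php
<?php
class Person
{
    // PHP 5: access restricted; PHP 4 would use "var $age = 0;"
    private $age = 0;

    public function getAge()
    {
        return $this->age;
    }

    public function setAge($age)
    {
        // The accessor is the single place where the value is checked
        if (!is_int($age) || $age < 0) {
            throw new InvalidArgumentException('Age must be a non-negative integer');
        }
        $this->age = $age;
    }
}

$person = new Person();
$person->setAge(42);
echo $person->getAge(), "\n";
```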
Introduction to XML and Web Services
The Extensible Markup Language (XML) is a simple, platform-independent standard for describing data within a structured format. XML is not a language but instead a metalanguage that allows you to create markup languages. In layman’s terms, it allows data to be tagged using descriptive names so both humans and computer applications can understand the meaning of different pieces of data.
For example, reading the following structure, it is easy to understand what this data means:
<state>
  <name>Maine</name>
  <capital>Augusta</capital>
  <animal>Moose</animal>
  <bird>Chickadee</bird>
  <tree>White Pine</tree>
</state>
The state capital of Maine is Augusta. The state animal is the moose, the state bird is the chickadee, and the state tree is the white pine. Although no officially named standard markup language was used for this example, it is still a well-formed XML document. XML offers the freedom of defining your own language to describe your data as needed.
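Well-formedness is something a parser can check mechanically. The sketch below uses PHP 5's DOMDocument class to test two strings; the element names are illustrative, and the is_well_formed() helper is a made-up name for this example.

```php
<?php
// Element names here are illustrative; any consistent names would do.
$xml = '<state><name>Maine</name><capital>Augusta</capital></state>';
$broken = '<state><name>Maine</capital></state>'; // mismatched closing tag

function is_well_formed($source)
{
    // Collect parser errors quietly instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(is_well_formed($xml));
var_dump(is_well_formed($broken));
```

Note that the check says nothing about whether the tag names belong to any standard vocabulary, only that the tags are properly nested and closed.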
With these new languages, the number of applications (ranging from document publishing applications to distributed applications) and the number of people and businesses adopting XML continue to grow. One of the most visible XML-based technologies today is the Web service technology, where Web-based applications are able to communicate in a standardized, platform-neutral way over the Internet. As you may have guessed, this is a big reason why XML and Web services have become buzzwords. With almost 30 years of history leading up to its creation, XML may just be what the original pioneers behind generalized markup envisioned.
This tutorial will cover XML and Web services, beginning with the history of XML and including the introduction of Web services. By the end of this tutorial, you should have an idea of the problems XML was initially meant to solve and how it has evolved to what it is today.
Exploring the History of XML
Using XML in the Real World
Introducing Service Oriented Architecture and Web Services
Defining Common Terms and Acronyms
Exploring the History of XML
Regardless of your personal opinion of XML, everyone has at least heard of it. Not everyone, however, knows the origins of XML, and it is helpful to understand at least the basics of its evolution. Imagine you are attending a company party, and someone from management (it is even worse when they are not from the information technology [IT] group) decides to ask you about XML because they have been hearing all about it in meetings. After covering the history of XML, you will be certain to be left alone the rest of the night. Seriously, though, understanding how and why XML was conceived will provide an understanding of the problems it was originally meant to solve, which ultimately can aid in determining whether you should use it and how you can use it to solve current problems.
Generalized Markup Language
XML can trace its roots all the way back to 1969. Charles F. Goldfarb, previously a practicing attorney, accepted a position at IBM that involved integrating information systems with legal practices. The project involved integrating text editing, information retrieving, and document rendering. The problem at hand was that each application required different markup. Goldfarb, along with Ed Mosher and Ray Lorie, began what was to be eventually known as the Generalized Markup Language (GML). The name was actually created based on the initials of Goldfarb, Mosher, and Lorie, and from here the term markup language was coined.
The purpose of GML was to describe the structure of a document using tags, allowing for the retrieval of different parts of the text while separating document formatting from its content. This way the same document could easily be used amongst different applications and systems. These different systems would then use their own processing commands based upon the tags encountered within the document. Another important aspect was the introduction of Document Type Definitions (DTDs). GML was officially named in 1973.
Standard Generalized Markup Language
In 1978, Goldfarb joined the American National Standards Institute (ANSI) and worked on a project based on GML to be known as the Standard Generalized Markup Language (SGML). While GML was a proprietary IBM format, SGML was developed by many people and groups and aimed to standardize textual representation and manipulation in documents in a platform- and vendor-neutral, open format. SGML is not really a language in the sense most people think of languages but rather defines how to create a markup language, so it is really a metalanguage.
The first working draft of SGML was published in 1980 and continued to evolve, being released as a recommendation for an industry standard in 1983. In 1986, the International Organization for Standardization (ISO) published it as an international standard.
Although adopted by some large organizations, such as the U.S. Department of Defense (DOD), the U.S. Internal Revenue Service (IRS), and the Association of American Publishers (AAP), SGML was extremely complex, which ultimately prevented its widespread adoption. Most companies did not have the time or resources to leverage SGML in their business activities. However, some people say using SGML reduces a product’s time to market, because in the long run less time is spent on application integration and day-to-day editing. This may be true, but the upfront cost in time is typically too great for smaller companies that cannot afford to dedicate enough resources to this.
The complexity of SGML and the time-to-market paradigm of using it play significant roles in the history of XML and ultimately led to its creation. The following are a few notable concepts of SGML that are relevant in the evolution of XML:
- A document is defined structurally by a DTD.
- Named elements, also referred to as markup tags, defined within the DTD comprise the document.
- Entities, which are named parts of the document and consist of a name and a value, can perform substitutions within the document.
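These three concepts carried over into XML, so they can be seen together in one small document. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module; the element name memo and the entity name company are made up for illustration. Note that this parser is non-validating: it reads the DTD and performs the entity substitution but does not check the document against the element declaration.

```python
import xml.etree.ElementTree as ET

# A document with an internal DTD subset. The DTD declares one element
# (memo) and one entity (company); both names are hypothetical.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ENTITY company "Example Corp">
]>
<memo>Welcome to &company;!</memo>"""

# The parser substitutes the entity's value wherever &company; appears.
root = ET.fromstring(DOCUMENT)
print(root.tag)   # memo
print(root.text)  # Welcome to Example Corp!
```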
Hypertext Markup Language
Many of you may not remember the Internet before the World Wide Web was created. In those days, Gopher was a common technology used to access documents on the Internet. It was extremely primitive compared to what everyone uses today, but back then it allowed people to access documents and in most cases search for documents from all over the globe.
In 1989, while working at CERN, the European Particle Physics Laboratory, Tim Berners-Lee came up with an idea that would allow documents on the Internet to cross-reference each other. In basic terms, a document could link to other documents, including specific text within the documents. The language used to create these documents was Hypertext Markup Language (HTML). In 1990, the Web was born with the first live HTML document on the Internet.
HTML was based on SGML and added some features such as hyperlinking and anchors. Specifically created for the Internet, HTML featured a small set of tags and was designed for displaying content, causing it and the Web to quickly gain widespread adoption. Its features, however, were also its major limitations. Because it is simple, its tag set is not extendable. The tags also have no meaning to anything other than the application, such as a browser, that renders the document.
Extensible Markup Language
The technology started to come full circle in 1996. With SGML being considered too complicated and HTML too limited, the next logical step was taken. The World Wide Web Consortium (W3C) formed a committee to combine the flexibility and power of SGML with the simplicity and ease of use of HTML, which resulted in XML. Finally, in February 1998, XML 1.0 was released as a W3C recommendation. Again, it was originally intended for electronic publishing, and its designers did not anticipate the far-reaching effects XML would have. The design goals were as follows:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs that process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
- XML documents should be human legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.
To understand how simple XML can be, consider that a complete well-formed XML document can be as simple as a single empty element, for example <doc/>.
Using XML in the Real World
Once it hit the streets, XML became the flavor of the day. Its use started spreading like wildfire. It was the age of the "dot-com," where companies were popping up like weeds and XML was being applied to everything. Although this may be grossly overstated, because many companies (especially the larger, well-founded ones) were using XML sparingly and judiciously, the vast majority of these start-up companies tried applying XML to virtually every situation. My opinions on this matter originate not only from personal experience but also from acquaintances who experienced the same situation.
I remember, while working at one company, word coming down from management that we had to incorporate XML into our development. XML did not particularly fit and better technologies existed, but it was out of our control, so we did it. To this day, I can only speculate on why we received this mandate. It could have been that everyone was talking about the technology, and someone in management questioned why it was not being used or thought it would make sense to use the technology so that, when the company was discussed amongst potential venture capitalists, management could throw out the XML word to sound more attractive. In any event, XML is a useful technology when used correctly. Everyone needs to remember XML is not the Holy Grail but is just another technology that can get the job done. In fact, this is important to remember when dealing with any technology!
Once the Internet bubble started deflating and companies, at least the ones that survived, began re-evaluating their business and technology, it appears they also began using technology more prudently. You will always encounter the XML zealots who have to use XML for everything and claim it can replace most other technologies. You will also encounter those on the other end of the spectrum who contend XML is just a fad and will soon die. Reality, however, paints a different picture. XML is alive and doing well, just no longer plastered everywhere and touted as the second coming. Before you start mumbling something about Web services under your breath, let’s focus on some of the areas where XML has some real use, because this is the heart of the matter at hand. I will break the discussion down into four general areas:
- Standardized data description
- Publishing
- Data storage and retrieval
- Distributed computing
In most cases, the same XML data is used within more than one of these areas, which is one of its original design goals as well as why it became so popular.
Standardized Data Description
Standardized data description is not technically an application of XML but rather its heart and soul. It is the backbone of XML-based applications. Take, for example, the following document:
Hello World
This is a well-formed XML document in a language I just created; however, it is pretty much useless to anyone but myself, which is fine as long as I am the only one who needs to use the data. It does not work this way in the real world, however.
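To make "well-formed" concrete, here is a small sketch in Python that hands three strings to the standard library's XML parser. The greeting element is a hypothetical name standing in for home-made markup; it is an assumption for illustration, not necessarily the markup used in the example above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical home-made markup: a single greeting element.
print(is_well_formed("<greeting>Hello World</greeting>"))  # True
# Mismatched start and end tags break well-formedness.
print(is_well_formed("<greeting>Hello World</greet>"))     # False
# Text with no root element is not an XML document at all.
print(is_well_formed("Hello World"))                       # False
```

The parser can confirm the syntax, but it knows nothing about what greeting is supposed to mean. That shared meaning is exactly what a standardized language supplies.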
Companies, organizations, and even industries formally define languages as standards, meaning everyone must use the set of defined rules without deviation. This ensures data can be shared and easily understood by any human or machine that uses the defined language. If you were to search the Web for GML, trying to locate information about the Generalized Markup Language, you may be surprised at the results. You will get an abundance of information covering the Geography Markup Language and Geotech-XML, and if you are lucky, you might find several sites that actually concern the Generalized Markup Language. In fact, try a search on ML prefixed by almost any random character or two, and odds are you will find some sort of XML-based markup language. The following are just a few examples of publicly defined standardized languages.
Mathematical Markup Language
Mathematical Markup Language (MathML) is a standard, developed by the W3C, that defines a universally consistent manner to describe mathematics for use on the Web. It actually has two parts, consisting of presentation tags and content tags. The presentation tags, as their name suggests, are for presentation in a browser, and the content tags describe the meaning of an expression, which can then also be used in automated processes.
Listing. Presentation Tags Expressing 1+2
Listing. Content Tags Expressing 1+2
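The difference between the two tag sets is easiest to see side by side. The sketch below, in Python, holds one possible presentation form and one possible content form of 1+2 as strings and then processes each. The evaluate_sum helper is a hypothetical name and handles only this one shape of expression; it is meant as an illustration of "used in automated processes," not as a MathML implementation.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.w3.org/1998/Math/MathML"}

# Presentation markup: a row containing a number, an operator, and a number.
PRESENTATION = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow><mn>1</mn><mo>+</mo><mn>2</mn></mrow>
</math>"""

# Content markup: apply the plus operation to the numbers 1 and 2.
CONTENT = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><plus/><cn>1</cn><cn>2</cn></apply>
</math>"""

def evaluate_sum(content_mathml):
    """Evaluate <apply><plus/>...</apply> over integer <cn> values."""
    root = ET.fromstring(content_mathml)
    apply_node = root.find("m:apply", NS)
    if apply_node is None or apply_node.find("m:plus", NS) is None:
        raise ValueError("only <apply><plus/>...</apply> is handled in this sketch")
    return sum(int(cn.text) for cn in apply_node.findall("m:cn", NS))

# Presentation markup only says what to draw, token by token.
row = ET.fromstring(PRESENTATION).find("m:mrow", NS)
drawn = "".join(child.text for child in row)
print(drawn)                  # 1+2
# Content markup says what the expression means, so it can be computed.
print(evaluate_sum(CONTENT))  # 3
```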
Extensible Business Reporting Language
Extensible Business Reporting Language (XBRL) is an open and international standard for describing business and financial data. This language is not as simple and short as MathML, so you can find real examples of this at Reuters (http://www.reuters.com) and Microsoft (http://www.microsoft.com). Each of these companies offers financial reports, available to the public, in XBRL format. It is also noteworthy that the Committee of European Banking Supervisors (CEBS), the U.S. Securities and Exchange Commission, and the United Kingdom are among some of the early adopters of this technology.
Publishing
Publishing is an obvious application of XML. Looking at XML’s history, this was the primary factor driving the development of generalized markup languages. Publishing involves taking the data content and transforming it for presentation. The presentation may take any form understandable to a user or program, such as Portable Document Format (PDF), HTML, or even another markup language.
Publishing to Different Formats
XML offers the flexibility to present the same content in multiple formats. Envision an application where the data needs to be sent to a Web browser in HTML format as well as to a wireless device understanding the Wireless Markup Language (WML). The same data content can be transformed into each of these markup languages using Extensible Stylesheet Language Transformations (XSLT).
Content Syndication
You might remember Microsoft’s Active Channels from many years ago. The Channel Definition Format (CDF) was the first Web syndication technology based on the push method. (The push method basically meant the server was pushing this content down your throat.) If you are lucky enough to not have been online during the Microsoft/Netscape technology wars back then, you are probably more familiar with the current-day RSS or Atom. These are much more friendly because the client machine pulls the data if and when you want it. This data is then loaded into some type of parser, which then processes the data, usually for display.
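As a rough sketch of the pull model, the Python snippet below parses an RSS 2.0 feed and prints each item. The feed text is made up, and the read_items helper is a hypothetical name; a real client would first fetch the feed over HTTP whenever the user asked for it.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; a real client would download this text.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/stories/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/stories/2</link>
    </item>
  </channel>
</rss>"""

def read_items(feed_text):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_text).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

# Process the parsed data for display.
for title, link in read_items(FEED):
    print(f"{title}: {link}")
```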
Content Management Systems
A content management system (CMS) is a system used for creating, editing, organizing, searching, and publishing content. You can put XML to good use within a CMS (though it is not required, and many of the CMSs you may encounter do not use any XML at all). For those that do employ XML, its use may fall into a few of the previously mentioned areas. Using a CMS for a Web site as an example, the minimum it would do is transform the XML content into HTML. As the site design changes or the business focus changes, you would have no need to modify the content. You might need to make some changes to style sheets for output, but you could leave the core content alone. Compare this to having content just embedded within an HTML page. Although you could use Cascading Style Sheets (CSS) for some design changes, moving content around within the layout would require some large cut-and-paste operations. This leads right into content-editing issues.
Even for small companies and organizations, copy changes to HTML-only pages are not all that simple. Normally the changes are coming from those who are not involved in the technical aspects of the Web site. This leads to the request for changes having to go through the proper channels until a designer actually makes the changes. In addition, the changes, after being made to the HTML, usually have to be double-checked and approved before they can move into the production system. While this may not seem all that difficult, imagine the implications when dealing on a larger scale, such as in big corporations or global organizations. Basically, it becomes a management nightmare. As you may infer from this, not only is the publishing of the data playing a role in the problem but the editing of the content is also contributing to the problem.
The final content used in the output typically consists of many smaller pieces of content, with some content even referencing and possibly including other chunks of content. Systems dealing with this often have a built-in editor where each person or group is in control of their own pieces of content, which are managed by the CMS. When dealing with XML-based content, the editor will help ensure valid syntax is used so the user does not require knowledge of XML. As content is added or edited, no longer is a large process needed to publish any of the changes. The content may still need to go through an approval process, but the ones involved would include only those who specifically deal with the site content. The CMS would take care of publishing these changes, again by processing all the content involved, which may include adding any referenced subcontent pieces and transforming the content into the appropriate layout. This would effectively take an IT department out of the process, because the IT team would no longer be needed to manually update copy, resulting in an increase in productivity.
Data Storage and Retrieval
Data storage, search, and retrieval is another area where XML is used. For simplicity’s sake, and because it aids in understanding this area, I will break this topic down into two distinct parts. On a small scale, you can use an XML document as a cross-platform database. Looking at the much larger picture, systems dealing with large amounts of XML content need ways to store this data so it can easily be searched, modified, and retrieved. Though related in some small way, the applications of these two examples differ significantly.
An XML Document As a Database
Many instances exist where data needs to be stored and retrieved, but conventional databases are overkill or simply cannot be used. For example, desktop applications need to load and save user settings. In many cases, simple text files (or in the case of some Windows applications, the registry) are used for storing the data. Typical text files use a layout consisting of a section identifier followed by name/value pairs that correspond to specific settings within the application. The following listing shows an example of this.
[General]
Version=1.0
Country=United States
[Menu]
Background=212 226 217
FontColor=0 0 0
An application would read this file and set its internal parameters accordingly. An alternate approach would be to use XML for this, as shown in the following listing.
Listing. Configuration File Example (XML Format)
1.0
United States
Using XML in this manner is mainly a personal preference. As demonstrated in the example, it is a bit more verbose than a simple text file, but in certain cases it can also add some benefit. A large configuration file could easily be broken up into smaller files, with the possibility of certain files residing on a network. An application could use an XML parser to load the main configuration file, reassemble the entire configuration file, and load the settings into the application. Sharing a configuration file amongst applications is also easier. Common settings could live within one level of the document, and application-specific settings could live within their own respective levels in the hierarchy. Again, this is just an alternative way to handle configuration files but can be found in some applications on the market today.
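To compare the two approaches directly, the following Python sketch reads the text-file settings shown earlier and rebuilds them as an XML tree. The XML layout is an assumption: the root element name Configuration and the choice of one child element per setting are picked for illustration, and ini_to_xml is a hypothetical helper name.

```python
import configparser
import xml.etree.ElementTree as ET

INI_TEXT = """[General]
Version=1.0
Country=United States
[Menu]
Background=212 226 217
FontColor=0 0 0
"""

def ini_to_xml(ini_text):
    """Map each section to an element and each setting to a child element."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original case of setting names
    parser.read_string(ini_text)
    root = ET.Element("Configuration")  # assumed root element name
    for section in parser.sections():
        section_node = ET.SubElement(root, section)
        for name, value in parser.items(section):
            ET.SubElement(section_node, name).text = value
    return root

config = ini_to_xml(INI_TEXT)
print(ET.tostring(config, encoding="unicode"))

# Settings are then addressed by their position in the hierarchy.
print(config.findtext("General/Version"))  # 1.0
print(config.findtext("Menu/FontColor"))   # 0 0 0
```

Because every setting has a path in the hierarchy, shared settings and application-specific settings can sit in separate branches of the same document.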
Native XML Databases
Recently, native XML databases have begun to gain traction in the marketplace. A native XML database (NXD) specializes in XML storage, focuses on document storage, and uses XPath to query data. Historically, XML has been stored in relational databases in a few ways. A binary large object (BLOB) field could store the entire document. Documents could also be stored on the file system, with the database used to locate the documents. A document could also be mapped to a database, where an element could be represented by a table, and its attributes and nested elements could be represented by fields within the table.
Take, for example, Microsoft’s SQL Server 2000. The database could be queried using the following hypothetical Structured Query Language (SQL), which would output the record in XML format:
SELECT user_id AS ID, user_name AS NAME FROM Users AS [User] WHERE user_id = 1 FOR XML AUTO
With FOR XML AUTO, the fields are returned as attributes of the User element within the document. Inserts and updates to the table, however, are still accomplished using standard INSERT and UPDATE SQL commands with field name/value pairs. An NXD, on the other hand, uses XML technologies such as XPath and the Document Object Model (DOM) to create and manipulate documents within the database. For systems and companies utilizing XML-based content, NXDs may make sense because they offer common XML syntax for data access and deal with documents in their native formats. Relational databases, however, have also made strides in this area; many are beginning to include advanced XML features. These “XML-enabled” databases still provide their core relational model but also add many of the features of an NXD, such as native XML storage that preserves the infoset, and querying with XPath or XQuery. It is yet to be seen, however, whether these new XML-enabled databases will make native XML databases obsolete or just position the native ones to target XML-focused organizations with no real needs for relational data.
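To give a feel for querying a document by path rather than by SQL, here is a small Python sketch. It uses the limited XPath subset supported by the standard library's ElementTree, not a real NXD. The stored document, its user names, and the find_user helper are all invented for illustration; the User element with ID and NAME attributes mirrors the shape described for the SQL query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document; the names and IDs are invented.
USERS = ET.fromstring("""<Users>
  <User ID="1" NAME="alice"/>
  <User ID="2" NAME="bob"/>
</Users>""")

def find_user(users, user_id):
    """Select the User element whose ID attribute equals user_id."""
    # user_id is expected to be an integer, so it is safe to place in the path.
    return users.find(f"User[@ID='{int(user_id)}']")

user = find_user(USERS, 1)
print(user.attrib)            # {'ID': '1', 'NAME': 'alice'}
print(find_user(USERS, 99))   # None, because no such user exists
```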
Distributed Computing
Distributed computing is not a new technology. Ever since computers were hooked into networks, systems have been working together and sharing tasks with other systems. With the introduction of the Internet came a much larger distributed network that could be leveraged. XML brings a common technology that can easily be used by all systems to take advantage of this area. The next section focuses on Web services and goes into greater detail on this matter.
Introducing Service Oriented Architecture and Web Services
Systems integration is one thing that virtually every IT department has had to deal with, from management down to the single developer. Whether a common platform was required or the same tool sets were needed, integration was never a simple task in the past and was usually costly in both time and money. Service Oriented Architecture (SOA) is a concept where none of these issues matters. It takes the approach that interacting systems should not be tightly bound to each other, thus promoting independence and reusability of services.
Using object-oriented programming in PHP 5 as an example, say you build an application using objects. The classes for the objects were well thought out, so each performs operations for specific areas of functionality. Another area of the company is working on a separate application and ends up needing to access functionality from the first application. On top of that, this new application is not even written in PHP, so it cannot reuse any code natively. The brute-force method would be to have this new application duplicate the logic of the PHP application. This, however, presents problems if the logic were to change in the PHP application. The other application would also need to change its logic or face the problem that it no longer works correctly, which could lead to a variety of problems within the company, including data corruption.
Using SOA, the PHP application can expose the functionality of its classes via a service. Through a common protocol and descriptive messaging, the other application can access the functionality of the PHP application. For example, a daemon, which is a process waiting for invocation to perform a task, is written in PHP and run via the PHP command-line interpreter (CLI). The daemon accepts connections via Transmission Control Protocol/Internet Protocol (TCP/IP) and processes requests based on the messages it receives, which are written in some company-standardized text language. This text language describes the class to access, the function to call, the arguments, and their values needed by the function. The outside application then connects to the daemon, sends its message, and receives some response. Because the task was an external process, the calling application does not care how it was done, just that it was performed.
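The core of such a daemon is the step that turns a text message into a method call. The sketch below shows only that dispatch step, written in Python rather than PHP for brevity, with the TCP/IP listener left out. Everything in it is an assumption made for illustration: the "name: value" message format, the Inventory class, and the handle_message function are not taken from any real system.

```python
class Inventory:
    """Stand-in for a class of the existing application (hypothetical)."""

    def __init__(self):
        self.stock = {"A-100": 5}

    def reserve(self, sku, quantity):
        quantity = int(quantity)  # values arrive as text
        if self.stock.get(sku, 0) < quantity:
            return "ERROR insufficient stock"
        self.stock[sku] -= quantity
        return f"OK {self.stock[sku]} left"

# Only the classes listed here are reachable from outside.
SERVICES = {"Inventory": Inventory()}

def handle_message(message):
    """Parse 'name: value' lines, then call the named function on the named class."""
    fields = {}
    for line in message.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    service = SERVICES.get(fields.pop("class", None))
    function = fields.pop("function", "")
    method = None
    if service is not None and not function.startswith("_"):
        method = getattr(service, function, None)
    if not callable(method):
        return "ERROR unknown class or function"
    # The remaining fields are the function's arguments and their values.
    return method(**fields)

REQUEST = """class: Inventory
function: reserve
sku: A-100
quantity: 2"""

print(handle_message(REQUEST))  # OK 3 left
```

The caller sees only the message it sent and the response it got back. How the reservation is carried out remains private to the service, which is the loose coupling SOA is after.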
Although generic in its description and not going into specifics, the previous scenario should give you some sense of what SOA is. The inception of the Web service technology, which is a specific implementation of SOA, has brought new steam to the SOA concept. XML as a common message format using standard Internet protocols, such as Hypertext Transfer Protocol (HTTP) and HTTP Secure (HTTPS), has sparked new interest in this type of architecture, because using these standards is simple, is universally supported, and does not require anyone to reinvent the wheel.
The term Web services has to be one of the most confusing and controversial terms ever. In extremely general terms, Web services are a form of distributed computing using XML in their communications. Before attempting to define Web services, some background of how they came about is in order.
Evolution of Web Services
Tracing the roots of Web services, it seems XML-RPC, which is Remote Procedure Call (RPC) over HTTP via XML, is the obvious starting point. XML-RPC was a fork of the early, still in development, SOAP specification. A general misconception was that XML-RPC was the origin of SOAP and that SOAP was actually built upon XML-RPC.
These technologies, XML-RPC and SOAP, are just another form of distributed computing and use XML for the encoding, which allows for greater interoperability. You may have heard the Web service technology is a replacement for distributed object technologies, such as Distributed Component Object Model (DCOM), Common Object Request Broker Architecture (CORBA), or Remote Method Invocation (RMI). You can probably find arguments both for and against this. The Web service technology, however, is not a replacement for these technologies and is not even the same as them. Similarities do exist, but XML is just another tool to build distributed systems.
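To see what "using XML for the encoding" looks like, the following Python sketch uses the standard library's xmlrpc.client module to encode a call and a reply without contacting any server. The procedure name math.add is hypothetical.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote procedure named "math.add".
request_xml = xmlrpc.client.dumps((1, 2), methodname="math.add")
print(request_xml)

# The receiving side decodes the same text back into arguments and a name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)  # math.add (1, 2)

# A reply is encoded the same way, marked as a method response.
response_xml = xmlrpc.client.dumps((3,), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)
print(result[0])  # 3
```

Because both sides exchange nothing but XML text over HTTP, neither needs to know what language or platform the other is built on.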
The Definition of Web Services
If you asked ten people to define the term Web services, you are likely to get ten different answers. This term has no single definition. Even the standards authorities cannot agree on what this term means. Before presenting you with what we consider to be a Web service, let’s first examine some definitions you may encounter.
The W3C created the Web Services Architecture Working Group to advise and create architectural documents in the area of Web services. The closest definition from the latest Working Group is as follows:
A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.
In addition, the Web Services Interoperability Organization (WS-I) conveniently does not state any definition for Web services; rather, the group defines requirements for the interoperability of Web services, which must be adhered to for an application to be granted conformance. (The WS-I is not a standards body but a collection of the larger corporations considered “leaders” in the Web service arena.) A definition that can be inferred from reading the specifications is that a Web service consists of Web Services Description Language (WSDL), SOAP, and Universal Description, Discovery, and Integration (UDDI). This is pretty much in line with what you would be told if you were to ask a Web service purist to define Web service.
The companies pushing WSDL, SOAP, and UDDI as the backbone of Web services are the same ones that have invested heavily in these technologies over the years. It is in their best interests to push these as standards to at least recoup some of the cost they have incurred. Based on those strict guidelines, Representational State Transfer (REST) is not even considered a Web service, although most people think of REST-based services as such. You almost get the feeling that unless you are using WSDL, SOAP, and UDDI, you are doing it wrong.
As developers, we all know there is only ever a single solution to a problem, and everything else is just plain wrong. The basic XML was not difficult.
Web Services in the Real World
It may be easier to come to some understanding of the term Web services by looking at a few places it is currently used on the Internet. Some big Internet companies, which you are probably already familiar with, offer Web services so you can tie your application into their systems. A few examples are the services offered by Yahoo, Google, Amazon, and eBay.
Yahoo Web Services
The Yahoo Web service, which uses REST, allows an application to use Yahoo’s search engine to find images, businesses, news, and video on the Internet. You must register for the service to obtain an application ID that is used in the requests. You can obtain this ID via http://developer.yahoo.net/; its use is limited to the terms of service on the Yahoo Web site. (The following example does not require registration because it is just using the demo mode.)
Consider a hypothetical application that needs to search on terms and display the results it finds on the Internet to a user. Prior to these public Web services, many people would have their application perform a request to the search engine the same way a browser would do it. The result would be that the application would receive a nice HTML page, which then the developer would have to somehow parse to gather the correct information. This was not all that easy, and if the resulting HTML layout changed or if the content the application expected to be there for identification purposes changed, the application would need to be modified to work again. This is considered screen scraping, and some Web sites frown upon this method.
Using the Yahoo application programming interface (API), a search for the term XML is now very simple, and the results are easy to integrate into an application. Using a browser, enter the following location:
http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=xml&results=2
The result should be an XML document that is easily parsed and contains two results. Compare that with what is normally returned when searching from a browser: http://search.yahoo.com/search?p=xml&sm=Yahoo%21+Search&fr=FP-tab-web-t&toggle=1.
The first two results from the normal browser search are the same as the results returned from the Web service. The format is completely different. The Web service returns the information in XML, which allows for easy application integration, and the normal browser search is returned in HTML for presentation.
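In an application, the request location would be assembled from its parameters rather than typed by hand. The Python sketch below builds the same URL shown above and then splits it apart again; build_search_url is a hypothetical helper name. No request is actually sent, since the service may no longer be available at this address.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://api.search.yahoo.com/WebSearchService/V1/webSearch"

def build_search_url(query, results, appid="YahooDemo"):
    """Assemble the request URL, escaping parameter values as needed."""
    return BASE + "?" + urlencode({"appid": appid, "query": query, "results": results})

url = build_search_url("xml", 2)
print(url)

# The receiving side sees the same parameters after splitting the URL apart.
print(parse_qs(urlsplit(url).query))
```

In a REST-style service like this one, the whole request is expressed in the URL, which is why it can be tried from a browser.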
Google Web APIs
Google also offers a wide range of Web services, including searches as well as integration with many of their other services such as AdWords and Blogger. You can find a complete list of the services at http://www.google.com/apis/index.html. Registration is required to obtain a license key and access the Web services. Accessing the Web Search API is different from the previous Yahoo Web service example. Google uses SOAP rather than REST, though the concept is the same as Yahoo. XML is used in communications so an application can be easily integrated.
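For contrast with the REST request, a SOAP request wraps the call in an XML envelope that is sent in the body of the HTTP message. The Python sketch below builds a generic SOAP 1.1 envelope. Only the envelope namespace is standard; the service namespace, the search operation, and its parameter names are hypothetical and are not Google's actual API.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:search"                      # hypothetical service namespace

def build_envelope(operation, arguments):
    """Wrap one operation call and its arguments in a SOAP Envelope and Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in arguments.items():
        ET.SubElement(call, name).text = str(value)
    return envelope

# The operation and parameter names below are invented for illustration.
envelope = build_envelope("search", {"key": "YOUR-LICENSE-KEY", "q": "xml", "maxResults": 2})
print(ET.tostring(envelope, encoding="unicode"))
```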
A more advanced Web service is the AdWords API. AdWords is Google’s cost-per-click advertising service. Using the API, an application can hook directly into the AdWords server, allowing for remote management of accounts and campaigns. For example, the application can manage the keywords, ad text, and the Uniform Resource Locator (URL) of a running advertisement.
Amazon E-commerce Service (ECS)
Amazon provides access to its products and to its e-commerce functionality through its E-commerce Service (ECS). The service is accessible using either REST or SOAP, which offers more flexibility to developers because they can use the technology they are most comfortable using. Registration is required to obtain a subscription ID for accessing the service. You will need to navigate to the Web service page from http://www.amazon.com for more information.
The service provides access to product information, including descriptions, images, and customer reviews, as well as search capabilities such as wish list searches. On top of the normal functionality you would expect, you can also access remote shopping carts. Putting all these services together, a site dedicated to some specific topic (for example, dogs) could dynamically add products from Amazon involving dogs to its pages and offer the ability to add items to a cart that is eventually sent to Amazon for the checkout process. Prior to this capability, it was common to see a product on a Web site linked directly to Amazon for purchase. Using the service, the user could remain on the developer’s site and continue adding products until they are ready to check out.
eBay
eBay offers a developer program, at http://developer.ebay.com/, allowing an application to tap into its platform using eBay’s XML API, REST, or SOAP. Registration is required, and a free individual license is available. The REST API is quite limited in functionality compared to the other two APIs. Using REST, only publicly available information can be accessed, so it is currently limited to searching listings. The other APIs, however, offer an extensive collection of functionality. Virtually anything you can do via a browser can now be automated through an application. For example, an application could integrate with a current inventory and sales system. This not only reduces the amount of time spent manually handling transactions and keying them into a system and offers a seamless user interface (UI) for a sales system, but it also allows eBay transactions to be integrated with an inventory system to maintain a real-time inventory.
Defining Common Terms and Acronyms
XML is one of those technologies where you just cannot escape acronyms. The following table is a quick guide to some of the more commonly used terms and acronyms.
XML-Related Terms
Term | Definition |
URI | Uniform Resource Identifier. An address to locate a resource on a network (for example, http://www.example.com). |
URL | Uniform Resource Locator. URLs are subsets of URIs but today are considered synonymous with URIs. |
W3C | World Wide Web Consortium (http://www.w3.org/). An international consortium developing Web standards. |
OASIS | Organization for the Advancement of Structured Information Standards (http://www.oasis-open.org/). An international consortium developing various standards. |
ANSI | American National Standards Institute (http://www.ansi.org/). A private organization that creates standards for the computer and communications industries. |
ISO | International Organization for Standardization (http://www.iso.org/). An international standards organization consisting of national standards bodies from around the world. |
DTD | Document Type Definition. This is used within an XML document primarily for validation. |
Parser | A processor that reads and breaks up XML documents. A validating parser can also validate documents, at a minimum against DTDs. |
DOM | Document Object Model. |
SAX | Simple API for XML. |
XSLT | Extensible Stylesheet Language Transformations. |
XPath | A language for addressing parts of an XML document. |
REST | Representational State Transfer. |
SOAP | This once stood for Simple Object Access Protocol. As of SOAP 1.2, though, this is no longer considered an acronym. |