Chapter 3 - XML for Document Exchange

The PCATS Electronic Commerce Guidelines - XML Business-to-Business Document Exchange recommend the eXtensible Markup Language (XML) as a vendor-neutral medium for exchanging business documents between trading partners.  The XML documents defined in these guidelines carry business data between trading partners, and their content is defined and constrained by XML Schema definitions (XSDs) written in the XML Schema Definition Language, as specified in XSDL 1.0.

XML Background

XML is essentially a lighter-weight, web-friendly version of its parent language, the Standard Generalized Markup Language (SGML).  Like SGML, XML is a meta-language, which makes it suitable for defining other languages or data exchange vocabularies.  In this sense it is an architecture, not an application.  HTML, by contrast, is one particular vocabulary that has been defined using SGML.  The difference is one of extensibility: HTML has a single, fixed set of tags designed for a single purpose (describing how a document should be rendered for viewing in a browser), whereas XML may be used to describe new document types in almost limitless ways.

XML brings a host of advantages for loosely coupled system-to-system integration and data exchange.  It is a readily available open standard designed to attach meaning directly to the data it represents, which makes it both human-readable and simple to work with programmatically.  Its ability to represent data sets whose content changes often makes XML a logical choice for a dynamic business environment.  As more organizations adopt XML as a standard way to represent virtually any kind of structured data, the ability to exchange data in XML becomes increasingly valuable.  In summary, XML offers the following features:

  • Vendor-neutral
  • Human- and machine-readable
  • Flexible and easily extensible
  • Handles both batch and real-time modes of operation
  • Usable with both legacy systems and the latest object-oriented systems
  • Simple to create and read using off-the-shelf parsers and other tools
  • Published as a Recommendation by the World Wide Web Consortium (W3C), http://www.w3.org

How XML Works

XML uses data elements to hold data.  A data element is made up of three parts: a start-tag, content, and an end-tag.  The start-tag is enclosed in angle brackets (<>) and contains an identifier (sometimes called a generic identifier, or GI) that names the data.  The start-tag may also carry attributes (hereafter called XML attributes), which are simple name/value pairs that describe the content.  The end-tag is also enclosed in angle brackets and uses the same identifier as the start-tag, except that the identifier is preceded by a slash (/).  The data between the start-tag and the end-tag is the content of the element.
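As a minimal sketch of the three parts described above, the following Python snippet uses the standard-library module xml.etree.ElementTree to pull the identifier, the XML attributes, and the content out of a single data element.  The element name CustomerName and the attribute type="billing" are hypothetical illustrations, not definitions taken from the PCATS-NAXML schemas.

```python
import xml.etree.ElementTree as ET

# One data element: a start-tag carrying one XML attribute, the content,
# and the matching end-tag.  Names here are illustrative only.
fragment = '<CustomerName type="billing">Acme Fuel Stop</CustomerName>'

element = ET.fromstring(fragment)

print(element.tag)     # identifier (GI) from the start-tag: CustomerName
print(element.attrib)  # XML attributes as name/value pairs: {'type': 'billing'}
print(element.text)    # content between the tags: Acme Fuel Stop
```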

A typical XML tag is, in effect, an instruction from the sender (SEN) to the receiver (REC) to treat the enclosed data as the tag specifies; for example, to treat the data as a customer name if the tags are <CustomerName>...</CustomerName>.  Both sender and receiver must therefore share a common understanding of the tag names assigned to fields in an XML document.  Use of the PCATS EB2B Data Elements ensures that both parties have agreed on the meaning of the tags, and that agreement is formalized in an associated XML Schema (XSD).

A Schema is a formal definition of the data elements allowed and expected in a particular type of XML document.  It specifies which names may be used for tags, where they may occur, the allowed values of their content, and how everything fits together.
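The kinds of constraint a Schema expresses can be illustrated with a small, hypothetical rule table in Python.  This is only a toy stand-in: real PCATS-NAXML constraints are written in XSD and checked by a validating parser, and the element names, child ordering, and value patterns below are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule table mimicking three kinds of Schema constraint.
# Real PCATS-NAXML rules live in XSD files, not in Python dictionaries.
RULES = {
    # Which child elements may appear, and in what order (like an xs:sequence).
    "Invoice": {"children": ["InvoiceNumber", "Amount"]},
    # Allowed values of the content, expressed as regular expressions.
    "InvoiceNumber": {"pattern": r"[A-Z]{2}\d{4}"},
    "Amount": {"pattern": r"\d+\.\d{2}"},
}


def check(element, rules=RULES):
    """Return a list of rule violations for element and its descendants."""
    rule = rules.get(element.tag)
    if rule is None:
        # Names not declared in the rule table are not allowed at all.
        return [f"unknown element <{element.tag}>"]
    errors = []
    expected = rule.get("children")
    if expected is not None:
        actual = [child.tag for child in element]
        if actual != expected:
            errors.append(f"<{element.tag}> has children {actual}, expected {expected}")
    pattern = rule.get("pattern")
    if pattern is not None and not re.fullmatch(pattern, element.text or ""):
        errors.append(f"<{element.tag}> value {element.text!r} is not allowed")
    for child in element:
        errors.extend(check(child, rules))
    return errors


good = ET.fromstring(
    "<Invoice><InvoiceNumber>AB1234</InvoiceNumber><Amount>19.99</Amount></Invoice>"
)
bad = ET.fromstring(
    "<Invoice><Amount>19.9</Amount><InvoiceNumber>AB1234</InvoiceNumber></Invoice>"
)
print(check(good))  # []
print(check(bad))   # wrong child order, and "19.9" lacks two decimal places
```

Both sample documents parse without error; only the rule check distinguishes them, which is the role a Schema plays for real PCATS-NAXML documents.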

The body of work contained in this document includes both the PCATS-NAXML Schema Definitions (XSDs) and examples of their corresponding XML documents.

It is possible to design a set of XML documents for the interchange of business information without reference to Schema definitions.  Doing so, however, could produce a chaotic situation in which each trading partner specifies its own elements and document constructs.  The result would be one-off solutions that would likely negate the benefits of standardizing the exchange.

XML Parsers

From a programmatic standpoint, XML is very easy to work with, at least in its more basic forms.  To generate an XML-compliant document, a sending program need only produce standard text in a predefined way.  XML receivers typically employ an XML parser to break the incoming document into its constituent data elements.
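A minimal sketch of both sides of the exchange, using Python's standard-library ElementTree: the sender builds a document and serializes it to text, and the receiver parses that text back into its data elements.  The element and attribute names (PurchaseOrder, id, CustomerName, Quantity) are hypothetical and not drawn from the PCATS-NAXML schemas.

```python
import xml.etree.ElementTree as ET

# Sending program: build the document and serialize it as plain text.
# Element and attribute names are hypothetical, not PCATS-defined.
order = ET.Element("PurchaseOrder", {"id": "1001"})
ET.SubElement(order, "CustomerName").text = "Acme Fuel Stop"
ET.SubElement(order, "Quantity").text = "12"
payload = ET.tostring(order, encoding="unicode")

# Receiving program: parse the text back into its constituent data elements.
received = ET.fromstring(payload)
fields = {child.tag: child.text for child in received}

print(payload)
print(received.get("id"), fields)
```

Building the text through a library rather than by string concatenation means that characters such as & and < in the content are escaped automatically.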

There are two levels of checking whether an XML document is error-free.  The basic level ensures that the document is "well-formed", i.e., that it conforms to all of the syntactic rules for an XML document.  The second level determines whether the document is "valid", i.e., whether it conforms to the XSD associated with it.  An XML document can be well-formed but not valid, whereas a valid document must, by definition, also be well-formed.
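A minimal sketch of the first level of checking, using Python's standard-library ElementTree parser, which checks well-formedness but does not validate against a Schema.  The helper name is_well_formed and the element names are hypothetical.  The third call shows a document that is well-formed even though a Schema declaring Qty as numeric would reject it; catching that requires a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<Order><Qty>5</Qty></Order>"))     # True
print(is_well_formed("<Order><Qty>5</Order></Qty>"))     # False: tags improperly nested
print(is_well_formed("<Order><Qty>five</Qty></Order>"))  # True, though possibly not valid
```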

Various XML parsers are available, depending on the needs of the third-party application.  Some parsers check only for well-formedness; others check for both well-formedness and validity.  Because these guidelines recommend the use of Schemas, a validating parser is a necessity.  An up-to-date listing of commonly available parsers can be found at http://www.xml.com.

Namespaces

In January 1999, the W3C approved the Namespaces in XML specification.  Namespaces provide scoping for element and XML attribute names, and their use avoids naming collisions when XML documents are exchanged in a broader context.  A namespace is a collection of names, identified by a URI (Uniform Resource Identifier), that are used as XML element and XML attribute names.  Because each name is qualified by the namespace it belongs to, the same element name, such as <bat>, can carry a different meaning depending on the namespace declaration it references.
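A minimal sketch of the <bat> example using Python's standard-library ElementTree, which reports every namespaced name in the form {URI}local-name.  The two namespace URIs under example.com and the Inventory root element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define <bat>.  The URIs are made-up examples.
doc = (
    '<Inventory xmlns:zoo="http://example.com/zoo"'
    ' xmlns:sport="http://example.com/sport">'
    "<zoo:bat>Fruit bat</zoo:bat>"
    "<sport:bat>Maple bat</sport:bat>"
    "</Inventory>"
)

root = ET.fromstring(doc)

# ElementTree reports each name as {namespace-URI}local-name,
# so the two <bat> elements no longer collide.
for element in root:
    print(element.tag, "->", element.text)

# Look up an element by prefix, using a prefix-to-URI mapping.
namespaces = {"zoo": "http://example.com/zoo"}
animal = root.find("zoo:bat", namespaces)
print(animal.text)  # Fruit bat
```

Only the URI matters when names are compared; the prefix passed to find() need not match the prefix used inside the document, as long as it maps to the same URI.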

XSL/XSLT

One of the challenges facing a designer of XML documents is how to define tags so that they can be easily recognized and used by all trading partners.  In practice, this is simply not possible.  However, XSL Transformations (XSLT), part of the Extensible Stylesheet Language (XSL) family, give each trading partner the ability to transform an XML document conforming to the PCATS Guidelines into the XML document or other document format that the partner requires.  It is left to each trading partner to develop the XSLT stylesheets needed to accomplish its own translation/transformation.  For information purposes, one example of an XSLT transformation is provided in this Guideline.
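Python's standard library does not include an XSLT processor, so the sketch below swaps in a hand-written Python function that performs the kind of element renaming an XSLT stylesheet would typically do when mapping guideline names to a partner's own names.  It is not XSLT, and both the source element names and the partner names PO, Cust, and Qty in the hypothetical NAME_MAP are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from guideline element names to one partner's names.
NAME_MAP = {
    "PurchaseOrder": "PO",
    "CustomerName": "Cust",
    "Quantity": "Qty",
}


def transform(element: ET.Element) -> ET.Element:
    """Return a renamed copy of element and its descendants; unmapped names are kept."""
    renamed = ET.Element(NAME_MAP.get(element.tag, element.tag), dict(element.attrib))
    renamed.text, renamed.tail = element.text, element.tail
    for child in element:
        renamed.append(transform(child))
    return renamed


source = ET.fromstring(
    '<PurchaseOrder id="1001"><CustomerName>Acme Fuel Stop</CustomerName>'
    "<Quantity>12</Quantity></PurchaseOrder>"
)
result = ET.tostring(transform(source), encoding="unicode")
print(result)
```

Because the mapping lives in data rather than code, each partner can maintain its own table, much as each partner maintains its own XSLT stylesheets under these guidelines.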