Analyzing XML schemas with the Schema Infoset Model
Easily perform complex queries on your schemas with this model
Level: Intermediate
Shane Curcuru (shane_curcuru@us.ibm.com)
Advisory Software Engineer, IBM
July 2002
As the use of schemas grows, so does the need for tools to manipulate them. The new Schema Infoset Model provides a complete model of schemas themselves, covering both their concrete representations and the abstract relationships within a schema or a set of schemas. This article shows some of the power of this library by querying the model of a schema for detailed information about it; you could also update the schema to fix any problems found and write the schema back out.
Note: This tip assumes you have a basic knowledge of schema documents; there are a number of links to schema documentation and a tutorial in Resources.
Although there are a number of parsers and tools that use schemas to validate or analyze XML documents, tools that allow querying and advanced manipulation of schema documents themselves are still being built. The Schema Infoset Model (AKA org.eclipse.xsd.*, or just "the library") provides a rich API library that models schemas -- both their concrete representations (perhaps in a schema.xsd file) and the abstract concepts in a schema as defined by the specification. As anyone who has read the schema specs knows, they're quite detailed, and this model strives to expose all the details within any schema. This lets you manage your schema collection efficiently, and it enables higher-level schema tools -- perhaps schema-aware parsers and transformers.
Schema Infoset Model UML diagrams
The library includes various UML diagrams for the actual library classes, which give a quick overview of the relationships and attributes of common schema components:
Abstract Schema Component relationships
Abstract Schema Component attributes
Schema Library class listing
These diagrams are included in the library's documentation, along with several other UML diagrams for both the abstract and concrete class trees.
For an interface listing of the library showing all the schema objects modeled, please see Schema Infoset Model UML diagrams. The library also includes the UML diagrams used in building the library interfaces themselves; these diagrams show the relationships between the library objects, which very closely mimic the concepts in the schema specifications.
Example: Analyzing your schemas
In this example, you'll check a schema for integer-derived simple types that fail to specify range restrictions. This could be useful for ensuring that all order quantities in purchase orders have been bounded. Here, the schemas must be very specific, so you want to require that every simple type that derives from integer has both a lower bound (a minInclusive or minExclusive facet) and an upper bound (a maxInclusive or maxExclusive facet). A facet inherited from a type that this type derives from is still sufficient.
While you can use XSLT or XPath to query a schema's concrete representation in an .xsd file or inside some other .xml content, it is much more difficult to discover the type derivations and interrelationships that schema components actually have. Because the Schema Infoset Model library models both the concrete representation and the abstract concept of the schema, it can easily be used to collect details about a schema's components, even when the schema has deep type hierarchies or is defined in multiple schema files.
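To make that limitation concrete, here is a minimal sketch that queries only the concrete representation, using the XPath support in the standard JDK rather than the Schema Infoset Model. The class name, the two-type schema, and the assumption that the schema uses the xs prefix are all illustrative and are not taken from the article's sample code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: looks only at the concrete .xsd text, not at the abstract model.
class ConcreteXPathQuery {

    // Illustrative schema: SmallQuantity derives from integer only indirectly, via Quantity.
    static final String SAMPLE =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='Quantity'>"
        + "<xs:restriction base='xs:integer'>"
        + "<xs:minInclusive value='1'/><xs:maxInclusive value='100'/>"
        + "</xs:restriction></xs:simpleType>"
        + "<xs:simpleType name='SmallQuantity'>"
        + "<xs:restriction base='Quantity'>"
        + "<xs:maxInclusive value='10'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:schema>";

    // Returns the names of simple types whose restriction names xs:integer directly.
    static List<String> directIntegerRestrictions(String schemaText) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaText)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() avoids namespace-context setup; the base test assumes the xs prefix.
            String expr = "//*[local-name()='simpleType']"
                + "[*[local-name()='restriction' and @base='xs:integer']]/@name";
            NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getNodeValue());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the schema text", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directIntegerRestrictions(SAMPLE));
    }
}
```

On this sample the query reports only Quantity. SmallQuantity is also an integer-derived type, but finding it from the concrete text alone would mean resolving each base reference by hand, which is the work the abstract model does for you.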
In this simple schema, you will find some types that meet the criteria of having max/min facets, and some that do not. (You can find the full schema in FindTypesMissingFacets.xsd included in the zip file.)
Listing 1. Sample schema
Loading schemas into the library
The library can read and write schema objects from a variety of sources. This example uses the org.eclipse.emf ResourceSet framework to load sets of schemas; you can also build and emit schemas directly from or to a DOM object that you manage yourself. The library provides a custom XSDResourceSet implementation that can intelligently and automatically load sets of schemas related by includes, imports, and redefines. The abstract relationship between related schemas is also modeled in the library.
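As a rough illustration of what "related by includes, imports, and redefines" means at the document level, the sketch below lists the schemaLocation references found in one schema document. It uses only the JDK's DOM parser and is not the library's loading API; the class name and the sample schema are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the other schema documents that one schema refers to.
class RelatedSchemaLocations {

    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Illustrative schema with one include and two imports (one without a location).
    static final String SAMPLE =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:include schemaLocation='common.xsd'/>"
        + "<xs:import namespace='urn:example:address' schemaLocation='address.xsd'/>"
        + "<xs:import namespace='urn:example:other'/>"
        + "</xs:schema>";

    // Returns entries such as "include common.xsd", grouped by kind of reference.
    static List<String> find(String schemaText) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaText)));
            List<String> locations = new ArrayList<>();
            for (String kind : new String[] {"include", "import", "redefine"}) {
                NodeList nodes = doc.getElementsByTagNameNS(XSD_NS, kind);
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element element = (Element) nodes.item(i);
                    // An import may legally omit schemaLocation; skip those.
                    if (element.hasAttribute("schemaLocation")) {
                        locations.add(kind + " " + element.getAttribute("schemaLocation"));
                    }
                }
            }
            return locations;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the schema text", e);
        }
    }

    public static void main(String[] args) {
        find(SAMPLE).forEach(System.out::println);
    }
}
```

A loader built this way would still have to resolve each location against the schema's own URI, load it, and repeat the process for every document it finds. The article describes the library's resource set as doing that automatically.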
Convenient schema querying
Now that you have an XSDSchema object, you need to query it to find any types that are missing max/min facets. First, you'll use some convenient library methods to quickly find all of its simpleTypeDefinitions that derive from the built-in integer type. Since the library provides a complete model of the abstract meaning of a schema, this turns out to be very straightforward. You can query the XSDSchema for its getTypeDefinitions() listing, and then filter for XSDSimpleTypeDefinitions that actually inherit from the base integer type.
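The filtering step can be sketched independently of the library. The example below assumes a hypothetical SimpleType class holding only a name and a base type; it is a stand-in for the library's type definitions, not its API. It keeps the types whose base-type chain reaches integer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of filtering a type listing by derivation.
class DerivationFilter {

    // Stand-in for a simple type definition: a name plus its base type (null at the root).
    static final class SimpleType {
        final String name;
        final SimpleType base;

        SimpleType(String name, SimpleType base) {
            this.name = name;
            this.base = base;
        }
    }

    // True if walking up the base-type chain from 'type' reaches 'ancestor'.
    static boolean derivesFrom(SimpleType type, SimpleType ancestor) {
        for (SimpleType t = type.base; t != null; t = t.base) {
            if (t == ancestor) {
                return true;
            }
        }
        return false;
    }

    // Returns the names of all listed types that derive from 'ancestor', in listing order.
    static List<String> namesDerivedFrom(List<SimpleType> types, SimpleType ancestor) {
        List<String> names = new ArrayList<>();
        for (SimpleType type : types) {
            if (derivesFrom(type, ancestor)) {
                names.add(type.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Built-in chain: anySimpleType <- decimal <- integer, and anySimpleType <- string.
        SimpleType any = new SimpleType("anySimpleType", null);
        SimpleType decimal = new SimpleType("decimal", any);
        SimpleType integer = new SimpleType("integer", decimal);
        SimpleType string = new SimpleType("string", any);
        // Illustrative user-defined types.
        SimpleType quantity = new SimpleType("Quantity", integer);
        SimpleType small = new SimpleType("SmallQuantity", quantity);
        SimpleType code = new SimpleType("ProductCode", string);

        System.out.println(namesDerivedFrom(List.of(quantity, small, code), integer));
    }
}
```

Because the walk follows the whole chain, SmallQuantity is found even though it never names integer itself, while ProductCode is excluded.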
The schema components model
Every component defined in the W3C schema specifications is modeled in detail in the library. Now that you have a list of all XSDSimpleTypeDefinitions that derive from an integer, you can query this list for ones that are missing either their max or min facets, and produce a report. Note that the library can conveniently group the effective max/minExclusive or max/minInclusive facets together for quick searching; it also provides detailed access to each type, including the actual lexical values if needed.
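The idea of an effective facet, one that is either declared on a type or inherited from a type it derives from, can be sketched in plain Java as follows. The SimpleType class, the facet map, and the report wording are hypothetical; the library exposes effective facets through its own API, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of reporting types that lack an effective min or max facet.
class FacetReport {

    // Stand-in for a simple type: name, base type, and the facets it declares itself.
    static final class SimpleType {
        final String name;
        final SimpleType base;
        final Map<String, String> facets;

        SimpleType(String name, SimpleType base, Map<String, String> facets) {
            this.name = name;
            this.base = base;
            this.facets = facets;
        }
    }

    // Returns the nearest declared value for a facet, searching the type and then its ancestors.
    static String effectiveFacet(SimpleType type, String facetName) {
        for (SimpleType t = type; t != null; t = t.base) {
            String value = t.facets.get(facetName);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // True if any of the named facets is effective on the type.
    static boolean hasAny(SimpleType type, String... facetNames) {
        for (String facetName : facetNames) {
            if (effectiveFacet(type, facetName) != null) {
                return true;
            }
        }
        return false;
    }

    // Builds one report line per type that lacks a lower bound, an upper bound, or both.
    static List<String> report(List<SimpleType> types) {
        List<String> lines = new ArrayList<>();
        for (SimpleType type : types) {
            boolean hasMin = hasAny(type, "minInclusive", "minExclusive");
            boolean hasMax = hasAny(type, "maxInclusive", "maxExclusive");
            if (!hasMin && !hasMax) {
                lines.add(type.name + ": missing min and max facets");
            } else if (!hasMin) {
                lines.add(type.name + ": missing min facet");
            } else if (!hasMax) {
                lines.add(type.name + ": missing max facet");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Illustrative types only; these are not the contents of the article's sample schema.
        SimpleType integer = new SimpleType("integer", null, Map.of());
        SimpleType quantity = new SimpleType("Quantity", integer,
            Map.of("minInclusive", "1", "maxInclusive", "100"));
        SimpleType small = new SimpleType("SmallQuantity", quantity, Map.of("maxInclusive", "10"));
        SimpleType positive = new SimpleType("PositiveOnly", integer, Map.of("minExclusive", "0"));
        SimpleType unbounded = new SimpleType("Unbounded", integer, Map.of());

        report(List.of(quantity, small, positive, unbounded)).forEach(System.out::println);
    }
}
```

With these illustrative types, SmallQuantity passes because it inherits its lower bound from Quantity, while PositiveOnly and Unbounded are reported.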
Your report: Types missing max/min facets
With just a little bit of code, you've discovered some fairly detailed information about the schema. If you download the sample code and run it against the provided schema file, you should see a listing of the types that are missing max or min facets.
Conclusion
Although this is a contrived example, it does show how the library's detailed representation of a schema makes it easy to find exactly the parts of a schema you need. The library provides setter methods for the properties of schema components, so it is easy to update your sample to automatically fix any types it finds by adding the missing facets. And since the library models the concrete representation of the schema as well, you can write your updated schema back out to an .xsd file.
Sample code
A sample program, XSDFindTypesMissingFacets.java, shows the example in this article. It uses a schema document, FindTypesMissingFacets.xsd, which has a number of types with and without max/min facets.
You can download the sample program and the following sample .java files in a zip file.
Copies of several other sample .java files normally shipped with the Schema Infoset Model are also attached. These include:
XSDSchemaQueryTools.java showcases a number of other ways to perform advanced queries on schema objects.
XSDSchemaBuildingTools.java provides convenience methods for building schemas programmatically.
XSDPrototypicalSchema.java uses the library to build the ever-popular schema primer PurchaseOrder sample.
This content was adapted from an article on IBM developerWorks at http://www.ibm.com/developerWorks/.
About the author
Shane Curcuru has been a developer and quality engineer at Lotus and IBM for 12 years and is a member of the Apache Software Foundation. He has worked on such diverse projects as Lotus 1-2-3, Lotus eSuite, Apache's Xalan-J XSLT processor, and a variety of XML Schema tools. Questions about this article or about automated testing can be sent to him at shane_curcuru@us.ibm.com.