This article was originally published on the Encodo Blogs. Browse on over to see more!
Metadata is, by definition, data about an application’s data. It describes the properties and capabilities of and connections between different types of information in a particular application domain. Examples of application domains are bookkeeping, document management, a lending library or inventory management.
Listed below are a few commonly used terms for modeling methodologies:
The approach recommended in the following paper can be generally described as MDD, but diverges from traditional approaches in several key ways.
All software defines and uses metadata, but usually defines it implicitly, encoding it in the constructs of the programming language (e.g. classes, field, methods, references, etc.) in which it is written. A non-trivial application—one that renders a GUI or responds to a web-service request—needs much, much more information than that. More often than not, that extra information is encoded in application-specific code that is usually not generalized (though it may follow a pattern).
Software increases in complexity and size with time. In order to maintain any sort of control over larger software projects, it is important to strike a balance between the following two precepts:
Using a standardized metadata framework helps an application apply the D.R.Y. principle, but seems to violate K.I.S.S. However, applying K.I.S.S. to the metadata framework itself results in a simple API that makes any application using it also much simpler than it would have been with an ad-hoc solution. Using the same simple pattern and library throughout several applications decreases the overall complexity and drastically improves maintainability.
Imagine an application that generates a list of entities, say Books. Regardless of whether it generates a report, a web page or a graphical user interface (GUI), it needs to identify columns of data with labels. Traditionally, each application solves the problem of retrieval and internationalization of labels on its own. Programmers without much time or experience—and without framework support—will most likely hard-code the text for the labels, making maintenance more time-consuming and error-prone.
A metadata-based application, on the other hand, starts off with a model, in which the labels for bits of data are already defined. This application need concern itself only with transforming the information in the metadata to the output format and not with retrieval and internationalization of the metadata itself. On top of that, the application is free to use other available metadata which maps to the output format, like color, borders or font-size.
Apps using metadata have the following advantages:
Those few systems that do use explicit metadata do so in order to provide object-relational mapping (ORM) and CrUD (Create/Update/Delete) access to a database. To name just a few:
These solutions use various mechanisms to specify the metadata (uniqueness, primary key, relationships, etc.) needed to communicate with a database: Cayenne uses XML files, Hibernate allows either XML files or Java annotations, LINQ uses the class definition along with optional attributes and Django uses inner classes.
Django goes further still by providing metadata for web UIs as well as data access. It uses this metadata to generate the entire administrative back-end—including sortable, filterable lists and CrUD UI for all objects—automatically. The .NET framework has grids that automatically provide CrUD from a LINQ dataset, but these cannot be customized by tweaking central metadata, as in Django . Other frameworks have been similarly inspired—Rails (Ruby) and Grails (Groovy) come to mind—but are still in very early stages and don’t seem to use much central metadata. For example, the BeanForm component in Tapestry generates web forms from objects using Java reflection and Hibernate properties (if present). However, it can only be further customized (e.g. styles and classes or layout) with metadata hard-coded in the HTML document.
All of the approaches above use the class hierarchy as the model and the reflection/introspection services of the language itself as the metadata API. Metadata not directly expressible in the language is attached using attributes (LINQ), annotations (Hibernate) or inner classes (Django).
This raises a few issues:
With metadata so central to the application, it makes no sense to cede control of it to a single external component (like an ORM). Instead, it must be independently and centrally defined and under application control.
In order to be both truly useful, metadata must satisfy the following conditions:
None of the requirements listed above places restrictions on the software environment. Rather, it emphasizes only that none of the components gets to control the metadata. For example, it is trivial to generate the data classes needed by Hibernate or LINQ from the metadata. Similarly, nothing prevents a project from generating the in-memory metadata from more traditional modeling approaches like UML.
The application benefits from the centralized metadata, but can continue to any external components without restriction.
The first step is to define the boundaries of an application domain with entities, properties, operations and relationships (described in section 5 – “Elements of metadata”). However, as mentioned above, an non-trivial application needs specialized metadata in order to work with one or more external components, such as:
To address these different needs—and to stay centralized but decoupled—metadata is split into aspects, each of which encapsulates one of the tasks listed above. For example, an application might want to store the following information in its metadata:
Aspects that are very similar, like “gui” and “web”, can put shared metadata into another aspect (e.g. “view”). This way, common metadata, like “color” and “font-size” are in the “view” aspect, while browser-specific metadata is defined in the “web” aspect. A desktop application includes the “gui” and “view” aspects, while a web application includes “web” and “view”. The reporting tool mentioned above includes only “view” so that it has access to the required display metadata.
The sections below briefly sketch the basic parts of metadata and are not meant to be comprehensive.
Since metadata is descriptive, all of its elements have the following basic properties:
Complicating things slightly, each textual property—like the “singular title” above—has multiple values, one for each language supported by the application. In addition to these basic, shared properties (of which there can be many, many more), specific types of elements add other features.
A lending library application is used in the examples below.
Entities describe particular types of objects in the application domain, like books, authors, publishers or customers. Less obvious entities are languages, media types, genres or lending transactions (when a customer loans a book for a period of time). Entities include the following lists:
Each of these is described in more detail in the following sections.
A property is a single feature of an entity, like a book’s title, an author’s first or last name or the due date for a loaned book. In addition to the basic features above (title, description), a meta-property has the following features:
There are, of course, dozens of other features that an application might associate with a property, but things like “color”, “font-size” or “visible” are only interesting to particular application domains. Therefore, these features belong in domain-specific aspects (as described above in “Aspects of Meta-Data”).
An operation is an action that can be executed against an entity (or, more precisely, an instance of an entity). In addition to the basic features above (title, description), a meta-operation has the following features:
Including the signature in the metadata is useful for validation, but the implementation is best left in the application itself (including it in the metadata would violate K.I.S.S.).
An application domain consists of more than just free-floating entities: it is the relationships between those entities that truly describe what is possible within a metadata model. A book has a list of authors as well as a publisher, whereas publishers and customers have lists of books. A relationship has the following features:
As with properties, these are the basic features that all relationships have to describe them fully; domain-specific aspects may add more.
Using metadata explicitly is the kind of approach that comes after years of experience working with other methodologies and technologies. It arises from a need to avoid re-inventing the wheel with each new application. We started Encodo after working for years with modeling tools that offered some—though not all—of the advantages mentioned in this paper.
Most available components and libraries, however, don’t work with metadata as we’d grown accustomed to. We tried using them as-is but, soon enough, realized we could adhere to neither K.I.S.S. nor D.R.Y. principles. Having had the advantage of using metadata in previous applications, our standards were raised to a level where we were no longer willing to work without it.