On Metadata Management Technology: Status and Issues

<!--#set var="title" value="Journal of Object Technology - On Metadata Management Technology: Status, Won Kim" -->
<!--#include virtual="/include/wide_header.html" -->
  <div align="center">
 <table border="0" cellpadding="0" cellspacing="2">
    <tr> 
      <td class="text"> <div align="right"> 
          <table border="0" cellpadding="5" cellspacing="0" width="0">
            <tr> 
              <td> <p class="text"><a href="../column3">Previous column</a></p></td>
              <td align="right"> <p class="text"><a href="../column5">Next
                     column</a></p></td>
            </tr>
          </table>
          <hr>
          <table border="0" cellpadding="0" cellspacing="2" width="100%">
            <tr> 
              <td valign="top"> <h1>On Metadata Management Technology: Status
                  and Issues</h1>
			    <p class="text"><strong>Won Kim</strong>, SamSung Electronics, Suwon, Korea</p></td>
              <td align="right" valign="top"><img src="/images/graph/line20h.gif" alt="space" width="20" height="5" border="0"></td>
              <td align="right" valign="top"><span class="toptitle">COLUMN</span><br>
                <br> 
				<a href="column4.pdf"><img src="/images/logos/pdficonsmall.gif" alt="PDF Icon" width="22" height="24" border="0"><br>
                </a><span class="text_small">PDF Version</span></td>
            </tr>
          </table>
        </div>
        <h4>Abstract</h4>
        <p>Metadata captures the semantics of data in disparate data sources
          in an integrated enterprise information system. As such, there has
          long been universal agreement on its importance. However, only a
          small number of vendors offer metadata management systems
          as a separate product. In this article, I review the status of metadata
          management technology and vendors, and outline some of the key issues
          that are beyond the capabilities of metadata management products and
        fall in the domain of consulting services.</p>
        <hr noshade width="80%" size="1"> 
        <h3>1	INTRODUCTION</h3>
        <p>Metadata is loosely defined as &#8220;data about data&#8221; (i.e.,
          descriptions of stored data). Although this definition is not incorrect,
          it is too loose and vague when one has to organize,
          search, and manage metadata to support applications that drive one&#8217;s
          business or organizational operations. Metadata management has
          a long history. The first generation of metadata management systems
          consisted of file-based data dictionary systems. The second generation consisted of metadata
          repositories based on relational database systems. There are also several
          vendors of federated database systems, now called enterprise
          information integration systems; a metadata management system is always
          an integral part of such systems. Today there are very few satisfactory
          universal metadata management systems on the market. Enterprises that
          need metadata management in their information system infrastructure
          should adopt one of the systems on the market, and shore up its deficiencies
        with system integration and consulting services. </p>
        <p>In this article I discuss the following: </p>
        <ol>
          <li>types of metadata</li>
          <li>difficulties in metadata management</li>
          <li>metadata management system functions and architecture</li>
          <li>metadata management system vendors, and</li>
          <li>metadata management issues that may require consulting services.</li>
        </ol>
		  <h3>2 TYPES OF METADATA</h3>
        <p>In practice, I believe that there are several types of “data about data”; that is, it is useful and necessary to define different types of metadata.</p>
        <ol>
          <li><strong>system catalogs metadata</strong><br>
            Relational database systems automatically maintain a type of metadata
              typically named system catalogs. System catalogs are data descriptors,
              and include such tables as the Relation Table, Column Table, Usage
              Table, etc. The Relation Table includes the column names for each relation
              in the database, while the Column Table includes the data type, length,
              integrity constraints (Null allowed or not; Unique or not), etc.
              The Usage Table
              includes information about when compiled code becomes invalid
              and requires re-compilation.</li>
          <li><strong>relationship metadata</strong><br>
            Relationship metadata is information about the relationships
                between data entities (i.e., tables). Relationships include the primary
                key-foreign key relationship between a column in one table and a column in
                another table; the generalization/specialization relationship (i.e.,
                IS-A relationship)
              between a class and its subclass in an object-oriented system or
                object-oriented database; the aggregation relationship between an
                entity and its attributes;
              the inheritance relationship between a class and its subclass in an
                object-oriented system or object-oriented database; and any other
                special semantic
            relationship that implies an update or delete dependency.</li>
          <li><strong>content metadata</strong><br>
            Content metadata is a description of the contents of stored data
                at an arbitrary granule. Content metadata may describe an individual
                object
              (in the case of a textual document), a column in a table, or a
                table. Content metadata may be as simple as one keyword, or as
                complex as
              a business rule or a formula for computing tax or commission, or
                a link to an entire document. Content metadata is one of the
                most labor-intensive
              types of metadata with respect to its creation, reading, and updating.
              There are some products on the market, such as Interwoven&#8217;s
              MetaCode, that use text-mining technology to automatically capture
              keywords or
            summaries of textual documents as content metadata about such documents.</li>
          <li><strong>data lineage metadata</strong><br>
            Data lineage metadata is lifecycle data about stored data. In particular,
              it includes information about the creation of data (when, who,
                why), subsequent updates (when, who, why), transformation, versioning,
                summarization,
              migration, and replication. It also includes transformation rules,
              and descriptions of migration and replication. Like content
                metadata, data lineage metadata may be at an arbitrary granule.
                Data lineage
              metadata is, broadly speaking, a form of relationship metadata,
                since data transformation, migration, and replication imply dependency
                among
              different manifestations of the same original data. For example,
                when wrong data is found in one document, changes need to be
                made not only
              to that document, but also to all other documents from which the document
            was derived.</li>
          <li><strong>technical metadata</strong><br>
            Technical metadata is technical information about stored data.
                It includes such information as the format (e.g., .doc, .gif,
                .wav), compression
              or encoding algorithm used, encryption and decryption algorithm,
                encryption and decryption keys, software (including the release
                number) used to
            create or update the data, API used to access the data, etc.</li>
          <li><strong>data usage metadata</strong><br>
            Data usage metadata describes how and for what purposes
              the data is to be used by users and applications. It is often
              called &#8220;business
          data&#8221;, as the intended users are often business analysts.</li>
          <li><strong>system metadata</strong><br>
            System metadata describes the overall system environment,
            including hardware, operating systems, application software, etc.</li>
          <li><strong>process metadata</strong><br>
            Process metadata describes the process in which the applications
          operate, and any relevant outputs of each step of the process.</li>
        </ol>        
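        <p>The first two types in the list above can be read directly from a relational system. The following is a minimal sketch, assuming Python&#8217;s standard <code>sqlite3</code> module and a hypothetical two-table database (the <code>department</code> and <code>employee</code> tables are illustrative only). It reads column descriptors (system catalogs metadata) and foreign keys (relationship metadata) from SQLite&#8217;s catalog; other database systems expose the same kind of information through their own catalog tables.</p>

```python
import sqlite3


def describe_schema(conn):
    """Read column and foreign-key descriptors from SQLite's system catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": name, "type": ctype, "nullable": not notnull, "primary_key": bool(pk)}
            for _cid, name, ctype, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        ]
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


def build_demo_database():
    """Create a hypothetical two-table database in memory (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            dept_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE employee (
            emp_id  INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            dept_id INTEGER REFERENCES department(dept_id)
        );
        """
    )
    return conn
```

        <p>Calling <code>describe_schema(build_demo_database())</code> returns one entry per table. None of the other types of metadata in the list can be recovered this way, which is why they must be captured by other means.</p>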
		<p>I note that although other authors and companies include data usage metadata, system metadata, and process metadata as legitimate types of metadata, and I too include them herein, the case for including them is rather weak. These types of metadata are more accurately “data” than “metadata”. Further, although other authors and companies tend to label such items as SQL code and design diagrams as metadata, I think these too should be regarded as “data”. “Legitimate” metadata is metadata that the customer needs in order to understand the semantics and lineage of stored data, and to properly run the applications that support the business needs. In other words, it is not necessary to accept as legitimate metadata all of the very broadly and vaguely defined “metadata” in various technical white papers or product brochures.</p>
		<h3>3	DIFFICULTIES IN METADATA MANAGEMENT</h3>
		<p>There are three types of difficulties in metadata management: metadata definition and management, technology, and standards. Metadata definition and management is about defining, creating, updating, transforming, and migrating all types of metadata that are relevant and important to a user’s objectives. As described in the previous section, most metadata, other than the system catalogs metadata and small parts of other types of metadata, requires diligent, timely, and disciplined manual capture and update. Many organizations have neither the human resources nor the discipline to identify, capture, and manage comprehensive metadata.</p>
		<p> Metadata management technology includes metadata design tools that allow
		  users to model the schema of metadata across all data sources, and metadata
		  repository systems that allow the users to extract metadata from various
		  data sources, search and query metadata, exchange metadata with other users,
		  etc. I will discuss trends in metadata management technology in the next
	  section.</p>
		<p> Metadata standards include not only those for modeling and exchanging
	  metadata, but also the vocabulary and knowledge ontology.</p>
		<p> It is these difficulties that have stunted universal adoption of metadata
		  management technologies. Most vendors of metadata management technology claim
		  to have adopted, or to be planning to adopt, the Object Management Group’s metadata modeling standards, Meta Object Facility (MOF) and Common Warehouse Metamodel (CWM), and the metadata import and export standard, XML Metadata Interchange (XMI). Further, there are efforts, such as the Dublin Core Metadata Initiative’s Metadata Terms, to standardize certain metadata vocabulary. A standard knowledge ontology is also needed to organize such types of metadata as content metadata and data usage metadata.
  </p>
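		<p>To illustrate what adopting a standard vocabulary can look like, here is a minimal sketch that records content metadata using the fifteen elements of the Dublin Core Metadata Element Set. It assumes Python&#8217;s standard <code>xml.etree.ElementTree</code> module; the <code>record</code> wrapper element and the function name <code>to_dublin_core</code> are hypothetical and not part of any standard. It is not an XMI document.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def to_dublin_core(fields):
    """Build an XML record whose children are Dublin Core elements.

    Raises ValueError for any term outside the standard vocabulary, so
    that ad hoc terms cannot enter the repository unnoticed.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record
```

		<p>An enterprise that finds the standard too narrow could extend the set of permitted terms with its own internal vocabulary, as long as the extension is defined and enforced in one place.</p>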
		<p>With respect to the vocabulary and knowledge ontology, where there are suitable
		  industry standards, the standards may be adopted in full or in part. Where
		  there is no industry standard or where the industry standard is too cumbersome
	  or inappropriate, at least “standards” internal to an enterprise should be defined and used.</p>
		<p>Further, appropriate procedures need to be defined and followed within the
		  enterprise for documenting the capture, update, transformation, migration,
		  and replication of metadata, along with the relevant transformation rules,
		  business rules, etc. </p>
		  <h3>4	METADATA MANAGEMENT SYSTEM FUNCTIONS AND ARCHITECTURE</h3>
		  <p>The first and second generations of metadata management systems did not
		    fare well. They did not provide adequate facilities for managing
		    metadata, and there were no standards. One of the major problems with
		    these systems
		    was that all metadata was stored centrally, so that when changes
		    occurred in data sources, the central metadata had to be updated manually.
		    One
		    trend in metadata management is toward real-time access to distributed
		    data sources. This means that a global metadata model is kept in
		    a central repository,
		    and metadata is extracted from distributed data sources on demand.
		    This is called a federated metadata repository. The real-time extraction
		    of
		    metadata from a data source is performed through an adapter designed
		    for that data source. However, a purely federated metadata repository
		    has drawbacks. Some types of metadata (e.g., data lineage metadata,
		    technical metadata, data usage metadata, system metadata, process
		    metadata) do not come from any data source, and even the metadata
		    that can be extracted
		    from data sources (e.g., content metadata, system catalogs metadata, relationship
		    metadata) should often be kept in the central repository for performance
		    reasons. So, a hybrid approach of maintaining both the federated
		    global metadata
		    and some of the actual metadata in a central repository is the desired
	    architecture.</p>
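		  <p>The hybrid architecture can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any vendor&#8217;s product: the class name <code>HybridMetadataRepository</code>, its methods, and the modeling of an adapter as a plain function are all hypothetical.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical adapter interface: given an object name in one data source,
# return its metadata as a dict, or None if the source does not know it.
Adapter = Callable[[str], Optional[dict]]


@dataclass
class HybridMetadataRepository:
    """Central store for some metadata, on-demand adapters for the rest."""

    adapters: Dict[str, Adapter] = field(default_factory=dict)
    central: Dict[Tuple[str, str], dict] = field(default_factory=dict)
    adapter_calls: int = 0  # counts real-time extractions, for illustration

    def register_adapter(self, source: str, adapter: Adapter) -> None:
        self.adapters[source] = adapter

    def store(self, source: str, name: str, metadata: dict) -> None:
        """Record metadata that no data source can supply (e.g. lineage notes)."""
        self.central[(source, name)] = metadata

    def lookup(self, source: str, name: str, keep_copy: bool = True) -> Optional[dict]:
        key = (source, name)
        if key in self.central:           # served centrally, no remote access
            return self.central[key]
        adapter = self.adapters.get(source)
        if adapter is None:
            return None
        self.adapter_calls += 1           # federated path: extract on demand
        metadata = adapter(name)
        if metadata is not None and keep_copy:
            self.central[key] = metadata  # keep a central copy for performance
        return metadata
```

		  <p>The sketch deliberately omits invalidation. A central copy can go stale when the data source changes, which is the problem that hurt the centrally stored repositories of earlier generations; passing <code>keep_copy=False</code> trades performance for freshness.</p>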
		  <p>Metadata management systems now on the market have become
		      more powerful than the first and second-generation metadata repositories
		      in terms
		      of metadata management facilities. The basic set of facilities
		    in a metadata management system should include</p>
		  <ol>
		    <li>a metadata designer/modeler with a graphical user interface</li>
		    <li>a query manager (with query formulation, index creation, and
		      management facilities) and a metadata and query results browser
		      with a graphical user interface</li>
		    <li>security and access control (either by an access
		      control list or by group and role-based access control)</li>
		    <li>backup and recovery (of metadata)</li>
		    <li>adapters to allow extraction of data
		      from a very wide variety of modern enterprise applications,
		      such as ERP, CRM, SCM, and ECM systems, and a wide variety of data types,
		      such as relational databases, indexed sequential files, legacy
		      hierarchical databases, message middleware,
		      HTML, XML, multimedia data, etc.</li>
		    <li>support for application development in Java, XML,
		      and web services</li>
		    <li>support for such standards as XMI, MOF, and CWM.</li>
	    </ol>		  
		  <p>Typically, vendors of metadata management systems offer their
	      own adapters and adapters provided by third-party adapter vendors,
		    such as iWay. Further,
		    the trend is to make the adapters bi-directional, in that the metadata
		    management systems receive data (and metadata) from the data sources,
      and also push updated data back to the data sources.</p>
		  <p>Beyond the above &#8220;basic&#8221; facilities,
	      metadata management systems need to provide facilities for automatic
	      impact analysis and
		    data lineage analysis, and support for terminology and ontology standards.
		    Almost no metadata repository system supports these &#8220;advanced&#8221; facilities.
		    Data Advantage Group&#8217;s MetaCenter, although lacking in some key areas,
		    does provide some impact analysis and data lineage analysis support.
		    Further, automatic means of capturing content metadata can
		    help the users of metadata repository systems. Interwoven&#8217;s MetaCode
		    is one example; using text-mining technology, it extracts key phrases
		    and summaries
	    (i.e., content metadata) from textual documents.</p>
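		  <p>Impact analysis and data lineage analysis are two traversals of the same dependency graph, in opposite directions. The following is a minimal sketch, assuming that derivations between data items have already been recorded as lineage metadata; the class <code>LineageGraph</code> and its methods are hypothetical names, not the interface of any product mentioned here.</p>

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of 'derived from' relationships between data items."""

    def __init__(self):
        self._derived = defaultdict(set)  # item -> items derived from it
        self._sources = defaultdict(set)  # item -> items it was derived from

    def add_derivation(self, source, derived):
        self._derived[source].add(derived)
        self._sources[derived].add(source)

    @staticmethod
    def _reachable(start, edges):
        """Breadth-first search; returns every item reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in edges.get(node, ()):
                if neighbor not in seen and neighbor != start:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    def impact(self, item):
        """Impact analysis: everything that must be revisited if item changes."""
        return self._reachable(item, self._derived)

    def lineage(self, item):
        """Lineage analysis: everything item was directly or indirectly derived from."""
        return self._reachable(item, self._sources)
```

		  <p>For example, after recording that a hypothetical <code>report.revenue</code> is derived from two warehouse tables, <code>lineage("report.revenue")</code> returns those tables and all of their upstream sources, while <code>impact</code> on any source returns the report.</p>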
		  <p>As remarked earlier,
		      much of the metadata can only be created or updated manually, and
		    such metadata needs to augment the part of
		      metadata that can be automatically extracted and updated via adapters
		      to data sources.
		      For this reason, metadata repository systems now tend to emphasize
		      metadata extensibility, that is, provide facilities to accommodate
		      adding new types
		      of metadata. However, this important facility is not easy to provide.
		      The reason is that as a new type of metadata is added, such considerations
		      as data lineage, data dependency based on semantic relationships,
		      vocabulary
		      and knowledge ontology related to the new type of metadata must
		    be accommodated in a manner that is consistent with those for the existing
      types of metadata.</p>
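		  <p>One way to picture metadata extensibility is a registry in which each metadata type declares its required fields and its controlled vocabulary, so that a newly added type is checked by the same rules as the existing ones. This is a minimal sketch only; the class <code>MetadataTypeRegistry</code> and its methods are hypothetical, and it covers vocabulary consistency but not lineage or semantic dependencies.</p>

```python
class MetadataTypeRegistry:
    """Registry of metadata types; new types can be added at run time."""

    def __init__(self):
        self._types = {}

    def register(self, type_name, required_fields, vocabulary=None):
        if type_name in self._types:
            raise ValueError(f"metadata type already registered: {type_name}")
        self._types[type_name] = {
            "required": tuple(required_fields),
            # vocabulary maps a field name to its set of allowed terms
            "vocabulary": {f: set(terms) for f, terms in (vocabulary or {}).items()},
        }

    def validate(self, type_name, record):
        """Return a list of problems; an empty list means the record is acceptable."""
        if type_name not in self._types:
            return [f"unknown metadata type: {type_name}"]
        spec = self._types[type_name]
        problems = [f"missing field: {f}" for f in spec["required"] if f not in record]
        for field_name, allowed in spec["vocabulary"].items():
            value = record.get(field_name)
            if value is not None and value not in allowed:
                problems.append(f"term not in vocabulary: {field_name}={value}")
        return problems
```

		  <p>Registering a type is the easy part. The difficulty described above lies in everything the registry does not model, such as how records of the new type depend on, and are derived from, records of the existing types.</p>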
	  <h3>5	METADATA MANAGEMENT SYSTEM VENDORS</h3>
		  <p>There are at least five categories of vendors that possess metadata management
		    technology: traditional database system vendors, enterprise application
		    vendors, content management system vendors, metadata repository vendors,
		    and enterprise information integration vendors. I will discuss only the
	    last three categories below.</p>
		  <p>Vendors of metadata repository systems
		    (along with the names of their products) include Data Advantage Group
		    (MetaCenter), ASG/ViaSoft (Rochade), Informatica (SuperGlue), Microsoft
		    (Repository), Server
	    Enterprise (Saphir), Computer Associates (Platinum), etc.</p>
		  <p>Enterprise
		    Information Integration (EII) technology is basically federated database
		    technology
		    that integrates multiple data sources (e.g., database systems, file
		    systems, applications that store and manage their own data in databases
		    and files)
		    while keeping all data in their native data sources. EII systems
		    must create a global view of all the metadata that describes the data
		    residing in the
		    external data sources. The global view is the metadata that unifies
		    all metadata. Just about every EII vendor uses XML as the data model
		    for the
		    global metadata, and provides an adapter to each of the external
		    data sources. EII vendors focus on the “information integration” aspects,
		    rather than the metadata management aspects, of EII technology. There
		    are many EII vendors.
		    The list (along with the names of their products) includes MetaMatrix
		    (MetaBase), IPEDO (Information Hub), XAware (XA-Suite), Actuate/Nimble
		    (Integration
		    Engine), Attunity, DataMirror (iFederate), TIBCO (Canon Developer),
	    Certive, Venetica (VeniceBridge), etc.</p>
		  <p>Another category of vendors
		      that have a metadata
		      management component in their products is ECM (enterprise content
		      management) vendors. ECM systems include sub-products such as document
		      management systems,
		      website management systems, records management systems, digital
		    asset management systems, collaboration systems, etc. Website management
		      systems in particular
		      need to bring together different types of data from a variety of
		      data sources, for distribution and presentation to portals. ECM
		    vendors
		      include FileNet,
		      OpenText, Interwoven, Vignette, Stellent, Documentum, etc. Vignette,
		      in particular, creates an object-oriented global view of different
		      data sources
		      as metadata. However, ECM vendors do not offer the metadata management
		      component of their product suite as a separately supported product.</p>
			  <h3>6	METADATA MANAGEMENT ISSUES THAT MAY REQUIRE CONSULTING SERVICES</h3>
<p>Enterprises that depend on one or more complex enterprise applications that manage large volumes of complex data invariably need to manage metadata. Given today’s metadata management technology, the state of metadata management standards, and the fact that much of metadata management cannot be automated, enterprises require not only metadata repository systems but also consulting services. The following are areas (or topics) of metadata management in which consulting services need to complement metadata repository systems.</p>
<ol>
  <li>identification of metadata that is relevant and important to an enterprise’s
      data management objectives (this will require interviewing business analysts
      and technical managers)</li>
  <li>metadata design and modeling (using a particular
      metadata repository system)</li>
  <li>definition of metadata vocabulary (this should
      be done in stages; further, this will require interviewing business
      analysts and technical managers)</li>
  <li>definition of metadata knowledge ontology (this
      too should be done in stages; further, this will require interviewing
      business analysts and technical managers)</li>
  <li>adapter development (using an adapter development SDK that
      comes with a metadata repository system)</li>
  <li>determination of a strategy for prefetching metadata and
      data into the metadata repository.</li>
		</ol>
        <h4>About the author<br>
        </h4>       
		 <table border="0" cellpadding="0" cellspacing="0">
          <tr> 
            <td valign="top"><img src="images/WonKim.jpg" width="100" height="100"></td>
            <td valign="top"><img src="/images/graph/line10h.gif" alt="space" width="10" height="10" border="0"></td>
            <td valign="top" class="text"><strong>Won Kim</strong> is Senior Advisor at SamSung
              Electronics, Korea. He is Editor-in-Chief of ACM Transactions on
              Internet Technology (<a href="http://www.acm.org/toit" target="_blank">www.acm.org/toit</a>), and Chair of ACM Special
              Interest Group on Knowledge Discovery and Data Mining (<a href="http://www.acm.org/sigkdd" target="_blank">www.acm.org/sigkdd</a>).
              He is the recipient of the ACM 2001 Distinguished Service Award.
              He can be reached at <a href="mailto:wonkim@austin.rr.com">wonkim@austin.rr.com</a>.</td>
          </tr>
        </table>        
		 <hr noshade width="80%" size="1">
        <p>Cite this column as follows: Won Kim: &#8220;On Metadata Management
          Technology: Status and Issues&#8221;,
          in <em>Journal of Object Technology</em>,
          vol. 4, no. 2, March-April 2005, pp. 41-47, <a href="http://www.jot.fm/issues/issue_2005_03/column4">http://www.jot.fm/issues/issue_2005_03/column4</a></p>
        <hr> 
        </td>
    </tr>
  </table>
</div>
<!--#include virtual="/include/wide_footer.html" -->