IBM Corp. has announced two new open source initiatives around its Unstructured Information Management Architecture (UIMA) framework, aimed at fostering further development of search and content analytics software.
The company has helped establish a UIMA Technical Committee, drawn from members of the OASIS (Organization for the Advancement of Structured Information Standards) consortium, to help companies standardize on UIMA. IBM is also championing a new Apache Software Foundation incubator project for developing UIMA-based software.
The goal of the UIMA Technical Committee is to refine and finalize a set of specifications based on an initial contribution from IBM, DPRA and MITRE. Other founding committee members include EMC, SRI International, Science Applications International, Temis, Thompson and the Army Intelligence and Information Warfare Directorate, as well as several universities, including Carnegie Mellon, Columbia and Stanford.
Carnegie Mellon’s Language Technologies Institute is also hosting a UIMA Component Repository website where developers can post information about their UIMA-compliant analytics components.
IBM will also contribute the initial UIMA source code to the new Apache incubator project. The Apache Software Foundation supports open source software projects through open, collaborative development.
Nelson Mattos, vice president, Information and Interaction, IBM Research, said the aim of the two initiatives is to help companies squeeze more business value from corporate data by discovering relationships and identifying patterns in text documents, email, call center notes, customer surveys, audio recordings, images, blogs, RSS news feeds and other unstructured data sources.
“We’re making UIMA available to the community at large with the belief that it can help accelerate innovation, collaboration, and adoption of semantic search and content analytics software.”
IBM has certainly played a pivotal role in developing and pushing UIMA as a standard integration framework for disparate text analytics and business intelligence tools.
IBM first released UIMA in August 2005 and at the start of this year introduced it to the open source community to drive more development of advanced text analysis applications.
At its core, the UIMA standard defines a common set of interfaces for integrating different text analytics components and applications, either in batch or in real time. Technically, the UIMA framework abstracts away the proprietary ontological, semantic and extraction layers of individual tools, allowing them to interoperate more smoothly. Vendors simply plug their components into the framework through these common interfaces.
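To make the plug-in idea concrete, here is a minimal Python sketch under a deliberately simplified assumption: every component implements one shared interface and reads from and writes to a common document object, so a component from one vendor can build on another vendor's output. All names (`Annotator`, `AnalysisDocument`, `TokenAnnotator`, `CapitalizedNameAnnotator`, `run_pipeline`) are hypothetical and are not UIMA's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import re


@dataclass
class Annotation:
    """A typed span over the document text (begin inclusive, end exclusive)."""
    type_name: str
    begin: int
    end: int


@dataclass
class AnalysisDocument:
    """Hypothetical shared container holding the text plus every component's output.

    Illustrative only; this is not UIMA's real analysis structure.
    """
    text: str
    annotations: list[Annotation] = field(default_factory=list)

    def select(self, type_name: str) -> list[Annotation]:
        # Return all annotations of one type, in the order they were added.
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


class Annotator(ABC):
    """The common interface every plugged-in component implements (hypothetical)."""

    @abstractmethod
    def process(self, doc: AnalysisDocument) -> None:
        ...


class TokenAnnotator(Annotator):
    """Hypothetical "vendor A" component: marks word tokens."""

    def process(self, doc: AnalysisDocument) -> None:
        for m in re.finditer(r"\w+", doc.text):
            doc.annotations.append(Annotation("Token", m.start(), m.end()))


class CapitalizedNameAnnotator(Annotator):
    """Hypothetical "vendor B" component: reuses vendor A's tokens, not its internals."""

    def process(self, doc: AnalysisDocument) -> None:
        for tok in doc.select("Token"):
            if doc.covered_text(tok)[0].isupper():
                doc.annotations.append(Annotation("Name", tok.begin, tok.end))


def run_pipeline(text: str, annotators: list[Annotator]) -> AnalysisDocument:
    """Run each component in order over one shared document."""
    doc = AnalysisDocument(text)
    for annotator in annotators:
        annotator.process(doc)
    return doc


if __name__ == "__main__":
    doc = run_pipeline("IBM contributed UIMA to Apache",
                       [TokenAnnotator(), CapitalizedNameAnnotator()])
    print([doc.covered_text(a) for a in doc.select("Name")])
```

Run as a script, this prints `['IBM', 'UIMA', 'Apache']`. Because the name finder depends only on the shared `Token` annotations, either component could in principle be swapped for another implementation of the same interface without changing the pipeline, which is the kind of interoperability the standard is meant to provide.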
A number of BI and text analysis software vendors, including ClearForest, Cognos, SAS Institute, Factiva and Nstein Technologies, have endorsed UIMA for use in their products.
UIMA-compliant products are already commercially available from IBM, Attensity, ClearForest, Temis and Nstein.
The UIMA source code is freely downloadable from the SourceForge.net website. A UIMA SDK, which comes with additional tools and components, is also available from IBM’s alphaWorks site.