Semantic Engines

The IKS vision is to bring semantic technologies as open source components to small and medium-sized CMS providers. A major contribution of IKS to the CMS space is Apache Stanbol, the open source semantic enhancement engine that is being developed as immediately usable software for existing content management systems.

While traditional metadata services are usually covered by CMSes, Apache Stanbol provides semantic lifting of textual content: the automatic detection of “Named Entities” such as persons, organisations and places, and their linking to external sources, e.g. to DBpedia descriptions of those resources. The enhancement capability is currently the most mature part of Apache Stanbol, but the engine framework is not restricted to this activity.

A RESTful API for content analysis and entity linking

We focus on linking named entities because simple tagging with words is not enough to overcome the ambiguity and complexity of meaning: tags don’t cut it. We acknowledge the huge amount of legacy data and unstructured text out there, and therefore provide a mechanism for the automatic detection of entities and links. Our system runs locally, so content does not need to be posted elsewhere but is kept in house. This way, a content management system can rely on its own security and backup configuration. And last but not least, it’s freely available under a permissive open source license.

Apache Stanbol offers a stateless interface: clients submit content to the Enhancement Engines and receive the resulting RDF enhancements in a single response, without anything being stored on the server side. This mechanism was designed to serve the huge variety of running systems, from simple web content management to all kinds of enterprise content management, implemented in various programming languages and frameworks.
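Because every request is self-contained, a client in almost any language needs little more than a single HTTP POST. The following is a minimal Python sketch using only the standard library. The endpoint URL is an assumption (a local instance on port 8080 with an `/enhancer` path), and the function names are hypothetical; adjust both to your deployment.

```python
import urllib.request

# Assumption: a local Stanbol instance exposing its enhancer at this URL.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancement_request(endpoint_url: str, text: str,
                              accept: str = "text/turtle") -> urllib.request.Request:
    """Build (without sending) one self-contained enhancement request."""
    return urllib.request.Request(
        endpoint_url,
        data=text.encode("utf-8"),          # the whole content travels in the body
        headers={
            "Content-Type": "text/plain",   # format of the submitted content
            "Accept": accept,               # RDF serialization wanted back
        },
        method="POST",
    )


def enhance(text: str, endpoint_url: str = ENHANCER_URL, timeout: float = 30.0) -> str:
    """POST the text and return the RDF enhancements as a string.

    No session is involved: each call carries all the content it needs,
    and the enhancements come back in the same response.
    """
    request = build_enhancement_request(endpoint_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a Stanbol instance running, `enhance("Barack Obama is president of the United States of America.")` would return the Turtle document for that text. Splitting request construction from sending keeps the part that encodes the protocol easy to inspect without a server.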

Getting Started: A Simple Example

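Send a piece of plain text to the enhancer endpoint of a running Stanbol instance (STANBOL_ENHANCER_URL below is a placeholder for that endpoint):

curl -X POST -H "Accept: text/turtle" -H "Content-Type: text/plain" \
  --data "Barack Obama is president of the United States of America." \
  "$STANBOL_ENHANCER_URL"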

You will get a response with two Text Enhancements, for the entities “Barack Obama (Person)” and “United States of America (Location)”, as well as several so-called Entity Enhancements: links to the corresponding DBpedia resources, which are derived from Wikipedia.

What happened? Natural language processing and named entity recognition are provided by the first engine, which is based on OpenNLP. Its stable version works with English texts only and detects the following entity types: Person, Organisation and Location. The next engine takes this output as its input and suggests possible links to resources in DBpedia.
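The idea of a chain, where each engine reads what the previous one produced, can be illustrated with a toy version. This is not how the OpenNLP-based engine works: its trained recognition model is replaced here by a hypothetical two-entry gazetteer, and the DBpedia lookup by a hard-coded dictionary. All class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextAnnotation:
    selected_text: str
    entity_type: str   # e.g. "Person", "Organisation" or "Location"
    start: int
    end: int


@dataclass
class EntityAnnotation:
    annotation: TextAnnotation
    entity_reference: str   # URI of the suggested resource


# Hypothetical stand-in for a trained NER model: a tiny gazetteer.
GAZETTEER = {
    "Barack Obama": "Person",
    "United States of America": "Location",
}

# Hypothetical stand-in for a DBpedia lookup.
DBPEDIA_INDEX = {
    "Barack Obama": "http://dbpedia.org/resource/Barack_Obama",
    "United States of America": "http://dbpedia.org/resource/United_States",
}


def ner_engine(text: str) -> list:
    """Engine 1: find every occurrence of a known name and record its type and offsets."""
    found = []
    for surface, entity_type in GAZETTEER.items():
        start = text.find(surface)
        while start != -1:
            found.append(TextAnnotation(surface, entity_type, start, start + len(surface)))
            start = text.find(surface, start + len(surface))
    return sorted(found, key=lambda a: a.start)


def linking_engine(annotations: list) -> list:
    """Engine 2: consume engine 1's output and suggest a link for each known mention."""
    return [EntityAnnotation(a, DBPEDIA_INDEX[a.selected_text])
            for a in annotations if a.selected_text in DBPEDIA_INDEX]


def run_chain(text: str):
    """Run the engines in order, feeding each one the previous engine's result."""
    text_annotations = ner_engine(text)
    entity_annotations = linking_engine(text_annotations)
    return text_annotations, entity_annotations
```

For the example sentence, `run_chain` yields two text annotations (a Person and a Location) and one suggested link for each, mirroring the two kinds of enhancements described above.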

How structured description and linking of content works

The resulting RDF graph is the nugget you get for every text from Apache Stanbol. Here, the magic starts (with your help)!
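To give a feel for what you can do with that graph, here is a sketch that walks a hand-written, simplified set of triples and collects the suggested links for each mention. The property and class names are modelled on Stanbol's FISE enhancement vocabulary (fise:TextAnnotation, fise:EntityAnnotation, fise:selected-text, fise:entity-reference, dc:relation), but treat them as assumptions and check them against the output of your Stanbol version. The subject identifiers are made up; a real client would parse the Turtle response with an RDF library such as rdflib instead of using a hard-coded list.

```python
# Hand-written, simplified enhancement graph as (subject, predicate, object) triples.
# Identifiers are illustrative; vocabulary names are modelled on Stanbol's FISE terms.
TRIPLES = [
    ("urn:ta-1", "rdf:type", "fise:TextAnnotation"),
    ("urn:ta-1", "fise:selected-text", "Barack Obama"),
    ("urn:ta-2", "rdf:type", "fise:TextAnnotation"),
    ("urn:ta-2", "fise:selected-text", "United States of America"),
    ("urn:ea-1", "rdf:type", "fise:EntityAnnotation"),
    ("urn:ea-1", "dc:relation", "urn:ta-1"),
    ("urn:ea-1", "fise:entity-reference", "http://dbpedia.org/resource/Barack_Obama"),
    ("urn:ea-2", "rdf:type", "fise:EntityAnnotation"),
    ("urn:ea-2", "dc:relation", "urn:ta-2"),
    ("urn:ea-2", "fise:entity-reference", "http://dbpedia.org/resource/United_States"),
]


def index_triples(triples):
    """Group objects by subject and predicate: {subject: {predicate: [objects]}}."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return index


def links_by_mention(triples):
    """Map each selected text to the entity URIs suggested for it."""
    index = index_triples(triples)
    result = {}
    for props in index.values():
        if "fise:EntityAnnotation" not in props.get("rdf:type", []):
            continue
        # An entity annotation points back to the text annotation(s) it explains.
        for text_annotation in props.get("dc:relation", []):
            for mention in index.get(text_annotation, {}).get("fise:selected-text", []):
                result.setdefault(mention, []).extend(props.get("fise:entity-reference", []))
    return result
```

From a mapping like this, a CMS could, for example, turn each mention into a link or store the URIs as structured metadata next to the content.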
