Volume 7 : Number 2 : Paper 4

December 2004 Special Issue of Best Papers presented at CLEI2003, La Paz, Bolivia. Guest Editors: Dr. Jose Carlos Maldonado (ICMC-USP, Brazil), Dr. Cesar Ibarra Guerrero (Bolivia), Dr. Adenilso da Silva Simao
TM-Builder: An Ontology Builder based on XML Topic Maps

Authors and Affiliations:
Giovani Rubert Librelotto, Universidade do Minho, Departamento de Inform
Jose Carlos Ramalho, Universidade do Minho, Departamento de Inform
Pedro Rangel Henriques, Universidade do Minho, Departamento de Inform

Every day a huge number of new information resources are linked to the web.
As a result the web is growing very fast, which makes search tasks
increasingly difficult and their results worse. To address this problem
several initiatives were undertaken and a new area of research and
development emerged: the Semantic Web. When we refer to the Semantic Web we
are thinking about a
network of concepts. Each concept has a group of related resources and can be
related to other concepts; we can then use this concept network to navigate
among web resources or simply among information resources. One of these
initiatives became an ISO standard: Topic Maps (ISO 13250). The aim of this
paper is to introduce a Topic Map (TM) Builder, that is, a processor that
extracts topics and relations from instances of a family of XML documents. A
TM-Builder is strongly dependent on the structure of the resources. So, to
extract a topic map from different collections of information resources (sets
of documents with different structures) we have to implement several
TM-Builders, one for each collection. This is not an easy task. To overcome
this inconvenience we have created an XML abstraction layer for TM-Builders
that enables us to specify the topic map we want to build from a concrete
family of resources, so that the intended extractor can be generated
automatically. To describe that process, i.e. the extraction of knowledge
from XML documents to produce a TM, we present a language for specifying
topic maps for a class of XML documents, which we call XSTM (XML
Specification for Topic Maps). We also discuss XSTM-P, an XSL processor that
automatically generates the extractor from its formal specification written
in XSTM.
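
The idea of driving the extraction from a specification, rather than
hand-writing one extractor per document family, can be illustrated with a
minimal sketch. It is not the authors' implementation and does not use XSTM
syntax or XSL: the source document, the dictionary-based SPEC and the function
build_topic_map are all hypothetical stand-ins, assumed here only to show a
single generic extractor being configured by a declarative description.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the element names are invented for illustration.
SOURCE = """
<library>
  <book><title>Dom Casmurro</title><author>Machado de Assis</author></book>
  <book><title>Quincas Borba</title><author>Machado de Assis</author></book>
</library>
"""

# Hypothetical specification: a plain-dictionary stand-in for an XSTM document
# (this is NOT the real XSTM syntax). It says which elements become topics and
# which pairs of elements are linked by an association.
SPEC = {
    "topics": [
        {"type": "book", "path": ".//book/title"},
        {"type": "author", "path": ".//author"},
    ],
    "associations": [
        {"type": "written-by", "scope": ".//book", "from": "title", "to": "author"},
    ],
}


def build_topic_map(xml_text, spec):
    """Extract topics and associations from xml_text as directed by spec."""
    root = ET.fromstring(xml_text)

    # One topic per distinct element text; the value records the topic type.
    topics = {}
    for rule in spec["topics"]:
        for node in root.findall(rule["path"]):
            topics[node.text.strip()] = rule["type"]

    # One association per context element matched by the rule's scope.
    associations = []
    for rule in spec["associations"]:
        for context in root.findall(rule["scope"]):
            source = context.findtext(rule["from"]).strip()
            target = context.findtext(rule["to"]).strip()
            associations.append((rule["type"], source, target))

    return topics, associations


topics, associations = build_topic_map(SOURCE, SPEC)
for name, topic_type in topics.items():
    print(topic_type, name)
for assoc_type, source, target in associations:
    print(source, assoc_type, target)
```

Under these assumptions, handling a different family of documents only
requires a different SPEC; build_topic_map itself stays unchanged, which is
the role the abstract assigns to the generated extractor.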

Full paper, 14 pages [ PDF, 427 Kb ]