AlignmentGenerator: Hotspot AlignmentGenerator
Developers: Björn Endres
Description
This experimental AlignmentGenerator tackles large-scale ontology alignment, for which other approaches fail due to high memory and/or performance requirements. The general idea is a "divide and conquer" approach: it continuously aligns parts of the ontologies and joins them into a single, large alignment.
A basic concept used within this approach is the ContextOntology. It defines a subontology by taking a single class of another ontology along with its context. A Hotspot consists of the following attributes:
- a source ContextOntology, a subontology of the original source ontology.
- a target ContextOntology, a subontology of the original target ontology.
- an alignment between those two
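The three attributes above can be pictured as a small data structure. The following is only a sketch in Python, not the PhaseLibs classes: the field names (`focus`, `context`, `alignment`), the `classes()` and `overlaps()` helpers, and the representation of an alignment as a mapping from entity pairs to confidences are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextOntology:
    """Hypothetical model of a subontology: one class plus its context."""
    focus: str                                # the single class the subontology is built around
    context: frozenset[str] = frozenset()     # classes considered part of its context

    def classes(self) -> frozenset[str]:
        # All classes covered by this subontology.
        return self.context | {self.focus}


@dataclass
class Hotspot:
    """Hypothetical model of a Hotspot: two subontologies and their alignment."""
    source: ContextOntology
    target: ContextOntology
    # (source class, target class) -> confidence; an assumed representation
    alignment: dict[tuple[str, str], float] = field(default_factory=dict)

    def overlaps(self, other: "Hotspot") -> bool:
        # Overlap is required on both the source and the target side.
        return (bool(self.source.classes() & other.source.classes())
                and bool(self.target.classes() & other.target.classes()))
```

Requiring overlap on both sides in `overlaps()` mirrors the condition under which hotspots are joined later in the algorithm.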
The algorithm first identifies a Hotspot using single sure matches as anchors. This Hotspot is then optimised in size with respect to a hotspot quality measure. Finally, the resulting hotspot is joined with every other hotspot it overlaps (in both source and target). These steps are repeated until a satisfactory cover rate is reached.
Since the motivation for this approach was the alignment of large-scale ontologies, the hotspots are only allowed to grow to a certain size limit during optimisation. Hotspots can, however, become *very* big by being joined with others. Therefore, the joining algorithm does not involve any similarity measures, but merely joins their ContextOntologies and relation sets, while giving priority to the hotspot of higher quality.
To implement this approach in a generic way, the algorithm requires two exchangeable modules, which are specified by the interfaces HotspotIdentifier and HotspotMeasure. They provide the methods to identify new hotspots in the ontologies and to judge the quality of the hotspot at hand. See the definitions of these two modules to learn about their various implementations.
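The identify, optimise, join cycle described above might be sketched as follows. This is an illustrative Python outline under stated assumptions, not the actual implementation: all five callables, the `target_cover` and `max_rounds` parameters, and the single pass over existing hotspots when joining are hypothetical choices.

```python
def generate_alignment(identify, optimise, overlaps, join, cover_rate,
                       target_cover, max_rounds=1000):
    """Repeat identify -> optimise -> join until the cover rate is reached.

    All arguments are hypothetical callables supplied by the caller:
      identify(hotspots)  -> a new hotspot, or None if no anchor is left
      optimise(hotspot)   -> the hotspot after size optimisation
      overlaps(a, b)      -> True if a and b overlap in source and target
      join(a, b)          -> the joined hotspot
      cover_rate(hotspots)-> fraction of the ontologies covered so far
    """
    hotspots = []
    for _ in range(max_rounds):          # safety bound, an assumption
        if cover_rate(hotspots) >= target_cover:
            break
        candidate = identify(hotspots)
        if candidate is None:            # no further anchors available
            break
        candidate = optimise(candidate)
        remaining = []
        for existing in hotspots:        # single pass; an assumed simplification
            if overlaps(candidate, existing):
                candidate = join(candidate, existing)
            else:
                remaining.append(existing)
        hotspots = remaining + [candidate]
    return hotspots
```

Passing the steps in as callables keeps the loop independent of how anchors are found or how quality is judged.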
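A similarity-free join of this kind could look like the sketch below. It rests on assumptions not stated in the text: hotspots are modelled as plain dictionaries with `source`, `target` and `relations` keys, and the preference for the higher-quality hotspot is interpreted as "its confidence wins when both hotspots contain the same relation". The `quality` callable is a hypothetical stand-in for a HotspotMeasure.

```python
def join_hotspots(a, b, quality):
    """Join two hotspots by set union only; no similarity is computed.

    a, b    : dicts with keys "source" (set), "target" (set) and
              "relations" ({(source class, target class): confidence})
    quality : hypothetical callable returning a hotspot's quality
    """
    # Decide which hotspot takes precedence on conflicting relations.
    preferred, other = (a, b) if quality(a) >= quality(b) else (b, a)
    relations = dict(other["relations"])
    relations.update(preferred["relations"])   # preferred entries overwrite
    return {
        "source": a["source"] | b["source"],
        "target": a["target"] | b["target"],
        "relations": relations,
    }
```

Because only set unions and a dictionary update are involved, the cost of a join grows with the size of the hotspots rather than with the cost of any similarity computation.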
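To make the role of the two modules concrete, here is a Python sketch of what such interfaces and two deliberately simple implementations might look like. Only the interface names come from the text; the method names `identify` and `quality`, their signatures, and the classes `ExactLabelIdentifier` and `MeanConfidenceMeasure` are hypothetical and do not reflect the real PhaseLibs definitions.

```python
from typing import Optional, Protocol


class HotspotIdentifier(Protocol):
    """Assumed shape: propose an anchor pair for a new hotspot."""
    def identify(self, source_classes: set, target_classes: set,
                 covered: set) -> Optional[tuple]: ...


class HotspotMeasure(Protocol):
    """Assumed shape: judge the quality of a hotspot's relations."""
    def quality(self, relations: dict) -> float: ...


class ExactLabelIdentifier:
    """Toy identifier: identical class names count as sure matches."""
    def identify(self, source_classes, target_classes, covered):
        for name in sorted(source_classes & target_classes):
            if name not in covered:
                return (name, name)
        return None                      # no uncovered anchor left


class MeanConfidenceMeasure:
    """Toy measure: average confidence of the hotspot's relations."""
    def quality(self, relations):
        if not relations:
            return 0.0
        return sum(relations.values()) / len(relations)
```

Any object with matching methods satisfies the protocols, so implementations can be swapped without touching the surrounding algorithm.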
Characteristics
Generally, the algorithm is very sensitive to the performance of the HotspotIdentifier and HotspotMeasure used. Tests with smaller ontologies revealed acceptable results, even with simple implementations of these modules. Evaluation for larger ontologies (like the anatomy or the food scenario) is pending due to the lack of a ground truth.
Evaluation/Performance
Specification
Parameters
Parameter name | ValueType | Default | Description |
PARAM_TAXONOMY_ONLY | Boolean | FALSE | Defines whether only the taxonomy (classes) should be aligned. By default, the classes are aligned along with the properties. |
PARAM_THRESHOLD | Double | 0.0 | This standard parameter forces the algorithm to drop all relations with a confidence below the given value. However, it must be chosen carefully, since the confidences produced by the BordaCount algorithm are not meaningful values in themselves. It would probably make more sense to apply a threshold to each SimilarityMeasure, but this would require a set of threshold parameters rather than a single one. |
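The effect of PARAM_THRESHOLD can be illustrated with a few lines of Python. This is a sketch only: the function name `apply_threshold` and the dictionary representation of relations are assumptions, and relations whose confidence equals the threshold are assumed to be kept.

```python
def apply_threshold(relations, threshold=0.0):
    """Drop every relation whose confidence is below the threshold.

    relations: {(source entity, target entity): confidence}, an assumed format
    """
    return {pair: conf for pair, conf in relations.items() if conf >= threshold}
```

With the default of 0.0 and non-negative confidences, nothing is dropped, which matches the cautious default in the table above.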
Dependencies
License Issues
This module is subject to the license the PhaseLibs project is published under.