wiki:generator_PhaseTab

Version 4 (modified by endres, 18 years ago) (diff)

--

TracNav?

AlignmentGenerator: PhaseTab Algorithm

Developers: Björn Endres and Malte Kiesel

Description

The PhaseTab algorithm is a generic algorithm that employs an arbitrary number of SimilarityMeasures to generate an alignment. The algorithm was originally developed by Malte Kiesel and Ludger van Elst and has been presented in the paper referenced here. It was then adapted for the PhaseLibs project.

The basic principle of the algorithm is the iterative confirmation of the best ranked relation proposals. This approach makes use of a feature of some SimilarityMeasures, especially the similarity flooding, which increase their precision when given higher confidences on correct relations.

In order to archieve a robust and fair ranking, a BordaCount? alogrithm is used to identify the n best rated realtion of all SimilarityMeasures involved.

An iteration cycle now consists of three phases:

  1. Generate the SimilarityMatrix for all SimilarityMeasures involved, using the current alignment as an input; the current alignment being empty in the first iteration. If the SimilarityMeasure is independent of such input, e.g. with a frame name based comparison measure, the matrix should not be recalculated. This is, however, up to the measure to decide.
  1. Copy all 1.0 rated realtions, i.e. "confirmed ones", into a new and empty alignment.
  1. Rank the remaining relations and identify the n best. These are also confirmed by setting their confidence to 1.0.
  1. Finally add all remaining relations with the confidences calculated by the BordaCount? algorithm. It is up to the used Alignment implementation, to filter these masses of relations. Typically, a MatchingAlignment would be used.
  1. Reiterate 1)-4) until a maximal number is reached or all relations are confirmed.

Characteristics

This algorithm estimates, that the best scoring relations can be confied in. According to our experiences, this is the case in many scenarios. It is, however, highly vulnerable to errors, since they become confirmed and add a high amount of noise to the algorithms confidences. Therefore, it is crucial to keep an eye on at least two parameters: the number n of relations confirmed per iteration, and the number of iterations. We have not yet been able to define a general criteria to determine when the algorithm trails into cascades of false confirmation.

The alignment generated by this algorithm lacks, due to its radical approach, reasonable relation confidences. This should not be a problem for end users, but is not beneficial when it comes to using it as a subroutine.

Evaluation/Performance

Specification

Parameters

Parameter name ValueType Default Description
PARAM_TAXONOMY_ONLY Boolean FALSE Defines whether only the taxonomy (classses) should be aligned. This defaults to aligning the classes along with the properties.
PARAM_THRESHOLD Double 0.0 This standard parameter forces the algorithm to drop all relations with lesser confidence. However, this parameter must be chosen carefully, since the confidences produced by the BordaCount? algorithm are no explanatory values. It would probably make more sense, to apply a threshold to each SimilarityMeasure, but this requires for a set of threshold parameters and not just a single one.
PARAM_MAXIMAL_ITERATIONS Integer 64 Defines the number of iteration cycles to be performed. Thus (PARAM_MAXIMAL_ITERATIONS * PARAM_CONFIRMATION_STEP_SIZE) relations eventually be confirmed.
PARAM_CONFIRMATION_STEP_SIZE Integer 5 Defines the number of relations to confirm with each iteration.
PARAM_FIX_1_0_RELATIONS Boolean TRUE EXPERIMENTAL: This enables the user to lift the 1.0 restriction, i.e. each iteration the relations with confidence 1.0 are set to 0.9999 to enable the system to review them. However, first test produced poor results.

Dependencies

The algorithm uses the BordaCount? alignment generator and the MatchingAlignment alignment implementation.

License Issues

This module is subject to the license the PhaseLibs project is published under.