AlignmentGenerator: PhaseTab Algorithm
Developers: Björn Endres and Malte Kiesel
Description
The PhaseTab algorithm is a generic algorithm that employs an arbitrary number of SimilarityMeasures to generate an alignment. It was originally developed by Malte Kiesel and Ludger van Elst and presented in the paper referenced here. It was then adapted for the PhaseLibs project.
The basic principle of the algorithm is the iterative confirmation of the best-ranked relation proposals. This approach exploits a property of some SimilarityMeasures, notably SimilarityFlooding, whose precision increases when correct relations are supplied with higher confidences.
To achieve a robust and fair ranking, a BordaCount algorithm is used to identify the n best-rated relations across all SimilarityMeasures involved.
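The ranking step can be illustrated with a minimal Borda count sketch. It is not the PhaseLibs implementation: the `Relation` type, the dictionary-based score tables, the function names, the tie handling and the normalisation to the range 0 to 1 are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Relation = Tuple[str, str]  # hypothetical: (source entity, target entity)


def borda_count(matrices: List[Dict[Relation, float]]) -> Dict[Relation, float]:
    """Combine the score tables of several measures into one Borda score per relation."""
    relations = sorted(set().union(*matrices))  # every relation rated by any measure
    k = len(relations)
    if k < 2:
        return {r: 1.0 for r in relations}
    points = {r: 0 for r in relations}
    for matrix in matrices:
        # Best score first. Assumption: a relation a measure did not rate counts as 0.0,
        # and ties keep the lexicographic order of the relations (the sort is stable).
        ranked = sorted(relations, key=lambda r: matrix.get(r, 0.0), reverse=True)
        for position, relation in enumerate(ranked):
            points[relation] += k - 1 - position  # k-1 points for first place, 0 for last
    max_points = (k - 1) * len(matrices)
    # Assumption: normalise so that a relation ranked first by every measure scores 1.0.
    return {r: points[r] / max_points for r in relations}


def n_best(scores: Dict[Relation, float], n: int) -> List[Relation]:
    """Return the n relations with the highest combined score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy score tables standing in for three SimilarityMeasures (made-up values).
name_scores = {("a", "x"): 0.9, ("a", "y"): 0.2, ("b", "y"): 0.6}
flooding_scores = {("a", "x"): 0.7, ("a", "y"): 0.1, ("b", "y"): 0.8}
other_scores = {("a", "x"): 0.8, ("a", "y"): 0.3, ("b", "y"): 0.5}

scores = borda_count([name_scores, flooding_scores, other_scores])
best_two = n_best(scores, 2)
```

Because only the rank within each measure counts, a measure that produces scores on a different scale cannot dominate the combined result, which is one reason a rank-based vote is considered fair here.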
An iteration cycle consists of four phases:
1. Generate the SimilarityMatrix for every SimilarityMeasure involved, using the current alignment as input; the alignment is empty in the first iteration. If a SimilarityMeasure does not depend on this input, e.g. a measure that only compares frame names, its matrix need not be recalculated. This is, however, up to the measure to decide.
2. Copy all relations rated 1.0, i.e. the "confirmed" ones, into a new, empty alignment.
3. Rank the remaining relations and identify the n best. These are confirmed as well by setting their confidence to 1.0.
4. Finally, add all remaining relations with the confidences calculated by the BordaCount algorithm. It is up to the Alignment implementation in use to filter this large set of relations. Typically, a MatchingAlignment would be used.
5. Repeat steps 1 to 4 until a maximum number of iterations is reached or all relations are confirmed.
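The cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PhaseLibs code: measures are modelled as plain functions from the current alignment to a score table, `phase_tab` and `borda` are hypothetical names, "rated 1.0" is read as referring to the current alignment, and the filtering normally done by the Alignment implementation (e.g. a MatchingAlignment) is left out.

```python
from typing import Callable, Dict, List, Tuple

Relation = Tuple[str, str]            # hypothetical: (source entity, target entity)
Alignment = Dict[Relation, float]     # relation -> confidence
Measure = Callable[[Alignment], Dict[Relation, float]]  # stand-in for a SimilarityMeasure


def borda(matrices: List[Dict[Relation, float]]) -> Dict[Relation, float]:
    """Normalised Borda score per relation (same assumptions as any rank-based vote)."""
    relations = sorted(set().union(*matrices))
    k = len(relations)
    if k < 2:
        return {r: 1.0 for r in relations}
    points = {r: 0 for r in relations}
    for matrix in matrices:
        ranked = sorted(relations, key=lambda r: matrix.get(r, 0.0), reverse=True)
        for position, relation in enumerate(ranked):
            points[relation] += k - 1 - position
    top = (k - 1) * len(matrices)
    return {r: points[r] / top for r in relations}


def phase_tab(measures: List[Measure], n: int, max_iterations: int) -> Alignment:
    """Iteratively confirm the n best-ranked relations (expects n >= 1)."""
    alignment: Alignment = {}  # empty in the first iteration
    for _ in range(max_iterations):
        # Step 1: let every measure compute its scores from the current alignment.
        matrices = [measure(alignment) for measure in measures]
        # Step 2: carry over the relations that are already confirmed.
        confirmed = {r: 1.0 for r, c in alignment.items() if c == 1.0}
        # Step 3: rank the unconfirmed relations and confirm the n best.
        scores = borda(matrices)
        open_relations = [r for r in scores if r not in confirmed]
        for r in sorted(open_relations, key=scores.get, reverse=True)[:n]:
            confirmed[r] = 1.0
        # Step 4: add the rest with their Borda confidences.
        alignment = dict(confirmed)
        for r in open_relations:
            alignment.setdefault(r, scores[r])
        if all(c == 1.0 for c in alignment.values()):
            break  # everything is confirmed
    return alignment


def name_measure(alignment: Alignment) -> Dict[Relation, float]:
    # Toy measure that ignores the alignment (made-up values).
    return {("a", "x"): 0.9, ("a", "y"): 0.2, ("b", "x"): 0.1, ("b", "y"): 0.6}


def context_measure(alignment: Alignment) -> Dict[Relation, float]:
    # Toy measure that rates (b, y) higher once (a, x) has been confirmed.
    boost = 0.5 if alignment.get(("a", "x")) == 1.0 else 0.0
    return {("a", "x"): 0.8, ("a", "y"): 0.3, ("b", "x"): 0.2, ("b", "y"): 0.4 + boost}


result = phase_tab([name_measure, context_measure], n=1, max_iterations=2)
```

In this toy run the first iteration confirms `("a", "x")`, which raises the second measure's score for `("b", "y")` so that it is confirmed next. Raising `max_iterations` to 4 would confirm every relation, including the poorly rated ones.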
Characteristics
This algorithm assumes that the best-scoring relations can be trusted. In our experience, this holds in many scenarios. The algorithm is, however, highly vulnerable to errors: a wrong relation, once confirmed, adds a large amount of noise to the confidences computed in later iterations. It is therefore crucial to keep an eye on at least two parameters: the number n of relations confirmed per iteration, and the number of iterations. We have not yet been able to define a general criterion for determining when the algorithm drifts into cascades of false confirmations.
Because of its radical approach, the alignment generated by this algorithm lacks meaningful relation confidences. This should not be a problem for end users, but it is a drawback when the algorithm is used as a subroutine.
Evaluation/Performance
Specification
Initialisation
Parameters
none
Dependencies
none
License Issues
TODO