Changes between Version 6 and Version 7 of similarity_SimFlood


Ignore:
Timestamp:
08/31/06 16:07:42 (18 years ago)
Author:
anonymous
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • similarity_SimFlood

    v6 v7  
    4242 
    4343=== Characteristics === 
    44 The similarity flooding (SF) is optimized through dividing the whole graph representing a ontology into several subgraphs. It tries to create a set of minimal pairwise connectivity graphs for later fixpoint computation. Three policies are adopted to create the subgraphs. According to the default one Taxonomy an OntoGraph object is created from each given class of the class pair. The graph objects contain all the classes having is-a relations to the given classes or vice versa. Then a paarwise connectivity graph is generated from the two OntoGraph objects. The other policy SuperSubClsOnly includes all the superclasses and subclasses of the given class pair, the superclasses of superclasses and subclasses of subclasses are considered. The vertices belonging to the same derivation level can make a combinatorisch PCG vertex. The vertices from different level will not make a PCG vertex. The last policy All provides the same result as the original implementation.  
    45 The optimized SF allows also setting different fixpoint computing formula introduced in article [http://www-db.stanford.edu/~melnik/mm/sfa/ Similarity Flooding: A Versatile Graph Matching Algorithm]. It uses the same names A, B, C to identify the different formula.  
     44The optimized similarity flooding (SF) contains both the implementation of original algorithm and 3 optimized ones. The differences are characterized by the way of constructing pairwise connectivity graphs (PCG), which is configurable by the parameter sf.optimized.growthPolicy. If the growth policy is set to All, the original algorithm is used. If the policy is set to SinglePCG, it will constructs PCGs based on the relations of the input alignment. It constructs only PCGs containing all the input relations.  
     45With a parameter sf.optimized.numberOfPCGs, the top n PCGs ranked by variance of confidence will be selected as output. According to observation, PCGs with lower quality has always low variance. Another policy is called PCGStepwise, it tries to construct all the PCGs having more than one vertex. Then based on the parameter sf.optimized.numberOfPCGsTaxonomy, the top n PCGs with biggest variance will be selected. The last one is Taxonomy. Just as the name said, only one Taxonomy from ontology A will be compared with a taxonomy of ontology B.  
    4646 
    47 According to policies SuperSubClsOnly and Taxonomy, new subgraphs will be added if required. To avoid computation overhead, the new graph will be firstly merged into the existing subgraphs, if the same vertex is found in the existing subgraphs, the existing ones will take place of the new ones. Then the subgraphs having the same vertices will be marked as dirty. A fixpoint computation will run on the new graph and the dirty graphs.    
     47The optimized SF allows also setting all fixpoint computing formula introduced in article [http://www-db.stanford.edu/~melnik/mm/sfa/ Similarity Flooding: A Versatile Graph Matching Algorithm]. 2 kinds of propagation coefficients are also available. 
     48 
     49To achieve the best performance, if there is no input relation, fixpoint formula Basic is proven to be the best. With input relations e.g. from string based measures, formula C is generally the best. For faster evaluation, SinglePCG with formula C are considered a good match in circumstance that SF get input alignment from other similarity measure, the growth policy can results in constructing only 1 to 3 PCGs.   
     50 
     51de.dfki.km.phaselib.impl.alignments.SimCombinedAlignGenerator is a example of SF accepting a alignment from string based measure, threshold by 0.5, which has a precision of 83% and recall 38%. In de.dfki.km.phaselib.impl.alignments.inria.BestFirstAlignGenerator SF uses original implementation with formula basic, and string based measure generates alignment parallel with threshold 0.5. The better one of the 2 generated alignment will be choosen as the final result. It's assumed that the best alignment find more relations. With the simple rule this generator reaches a precision of 0.98 and recall 0.58. 
    4852 
    4953=== Evaluation/Performance === 
    50 TODO 
     54Alone with formula basic and inria benchmarks. precision 0.68, recall 0.31. 
    5155 
    5256=== Specification ===