Changes between Version 2 and Version 3 of similarity_SimFlood


Ignore:
Timestamp:
05/02/06 10:35:56 (19 years ago)
Author:
endres
Comment:

restructured, removed (some) typos

Legend:

Unmodified
Added
Removed
Modified
  • similarity_SimFlood

    v2 v3  
    11= Similarity Measure: Similarity Flooding = 
    2 Developer: [mailto:kiesel@dfki.uni-kl.de Malte Kiesel] [mailto:endres@dfki.uni-kl.de Björn Endres] [mailto:Lizhen.Qu@dfki.de Lizhen Qu]  
     2Developers: [mailto:kiesel@dfki.uni-kl.de Malte Kiesel] [mailto:endres@dfki.uni-kl.de Björn Endres] [mailto:Lizhen.Qu@dfki.de Lizhen Qu]  
    33 
    44== Description == 
    5 This is based on the algorithm similarity flooding from Sergey Melnik,Hector Garcia-Molina and Erhard Rahm. The detailed description of the algorithmus can be found at [http://www-db.stanford.edu/~melnik/mm/sfa/ Similarity Flooding: A Versatile Graph Matching Algorithm]. Besides a implementation of the original algorithmus, [mailto:Lizhen.Qu@dfki.de Lizhen Qu] provides a optimized version to make it run more efficient under huge onologies.  
     5This module is based on the algorithm similarity flooding from Sergey Melnik, Hector Garcia-Molina and Erhard Rahm. The detailed description of the algorithmus can be found at [http://www-db.stanford.edu/~melnik/mm/sfa/ Similarity Flooding: A Versatile Graph Matching Algorithm]. Besides a implementation of the original algorithmus, [mailto:Lizhen.Qu@dfki.de Lizhen Qu] provides a optimized version to make it run more efficiently with huge ontologies.  
    66 
     7---- 
    78 
    8 == Specification == 
    9 The original version from [mailto:kiesel@dfki.uni-kl.de Malte Kiesel] [mailto:endres@dfki.uni-kl.de Björn Endres] is under namespace de.dfki.km.phaselib.impl.similarities.sf. The opimized one is under de.dfki.km.phaselib.impl.similarities.sf.optimized. The usage of this similarity measure is no different than the other similarity measures, just call the corresponding methods that inherit from de.dfki.km.phaselib.model.evidence.SimilarityMeasure. 
     9= Original SF = 
     10Developers: [mailto:kiesel@dfki.uni-kl.de Malte Kiesel] [mailto:Lizhen.Qu@dfki.de Lizhen Qu] [mailto:endres@dfki.uni-kl.de Björn Endres]  
    1011 
    11 === original SF === 
    12 The original SF takes a matrix calculation based approach. The library colt enables the efficient matrix calculation. The whole implementation is achieved through 2 classes Graph and SimFlood. The class Graph is used for several purposes: A graph model representing the Ontology, a data structure for paarwise connectivity graph (PCG) and conversion from PCG to induced propagation graph(IPG). IPG is represented as an object of cern.colt.matrix.DoubleMatrix2D. The class SimFlood, which implements de.dfki.km.phaselib.model.evidence.SimilarityMeasure, calculates the similarity values from IPG and stores them in an object of class de.dfki.km.phaselib.impl.similarities.common.SFSimilarityMatrix. Every call of either calcSimilarities() or calcSimilarity results in a new calculation of similarity values.  
     12The original SF takes a matrix calculation based approach, the library colt supporting efficient matrix calculations. The whole implementation is achieved through the two classes Graph and SimFlood. The class Graph is used for several purposes: A graph model representing the Ontology, a data structure for pairwise connectivity graph (PCG) and the conversion from the PCG to the induced propagation graph (IPG). The IPG is represented as an object of cern.colt.matrix.DoubleMatrix2D. The class SimFlood, which implements the SimilarityMeasure, calculates the similarity values from IPG and stores them in an object of class de.dfki.km.phaselib.impl.similarities.common.SFSimilarityMatrix.  
    1313 
    14 ==== dependencies ==== 
    15 [http://hoschek.home.cern.ch/hoschek/colt/ colt] a efficient scientific library is used here for matrix based calculation. 
    16 === optimized SF === 
    17 The central class of optimized SF is SimilarityMatcher, which implements SimilarityMeasure. OntoGraph describs a model, which is a graph representation of an ontology. A paarwise connectivity graph is represented by PCGGraph. Because IPG is basically a PCG with weighted edges going in 2 directions, it's also represented by the class PCGGraph. Other than the original implementation, optimized SF uses PCGVertex to store the similarity values, which make it easier to add new subgraphs latter on. 
     14=== Characteristics === 
     15Every call of either {{{calcSimilarities()}}}  or {{{calcSimilarity()}}} results in a new calculation of similarity values.  
    1816 
    19 The similarity flooding (SF) is optimized through dividing the whole graph representing a ontology into several subgraphs. It tries to create a set of minimal paarwise connectivity graphs for latter fixpoint computation. 3 policies are adopted to create the subgraphs. According to the default one Taxonomy an OntoGraph object is created from each given class of the class pair. The graph objects contain all the classes having is-a relations to the given classes or vice versa. Then a paarwise connectivity graph is generated from the 2 OntoGraph objects. The other policy SuperSubClsOnly includes all the superclasses and subclasses of the given class pair, the superclasses of superclasses and subclasses of subclasses are considered. The vertices belonging to the same derivation level can make a combinatorisch PCG vertex. The vertices from different level will not make a PCG vertex. The last policy All provides the same result as the original implementation.  
     17=== Evaluation/Performance === 
     18TODO 
     19 
     20=== Specification === 
     21==== Intitialisation ==== 
     22The SimilarityMeasure class is 
     23 {{{de.dfki.km.phaselib.impl.similarities.sf.SimFlood}}} 
     24 
     25Initialisation is straight forward ({{{new SimFlood()}}}) 
     26 
     27==== Parameters ==== 
     28none 
     29 
     30==== Dependencies ==== 
     31 1. [http://hoschek.home.cern.ch/hoschek/colt/ colt] - an efficient scientific library used here for matrix based calculation. 
     32 
     33---- 
     34 
     35= Optimized SF = 
     36Developer: [mailto:Lizhen.Qu@dfki.de Lizhen Qu]  
     37 
     38The central class of the optimized SF is SimilarityMatcher, which implements SimilarityMeasure. OntoGraph describes a model, which is a graph representation of an ontology. A pairwise connectivity graph is represented by PCGGraph. Because IPG is basically a PCG with weighted edges going into two directions, it's also represented by the class PCGGraph. Other than the original implementation, optimized SF uses PCGVertex to store the similarity values, which makes it easier to add new subgraphs later on. 
     39 
     40=== Characteristics === 
     41The similarity flooding (SF) is optimized through dividing the whole graph representing a ontology into several subgraphs. It tries to create a set of minimal pairwise connectivity graphs for later fixpoint computation. Three policies are adopted to create the subgraphs. According to the default one Taxonomy an OntoGraph object is created from each given class of the class pair. The graph objects contain all the classes having is-a relations to the given classes or vice versa. Then a paarwise connectivity graph is generated from the two OntoGraph objects. The other policy SuperSubClsOnly includes all the superclasses and subclasses of the given class pair, the superclasses of superclasses and subclasses of subclasses are considered. The vertices belonging to the same derivation level can make a combinatorisch PCG vertex. The vertices from different level will not make a PCG vertex. The last policy All provides the same result as the original implementation.  
    2042The optimized SF allows also setting different fixpoint computing formula introduced in article [http://www-db.stanford.edu/~melnik/mm/sfa/ Similarity Flooding: A Versatile Graph Matching Algorithm]. It uses the same names A, B, C to identify the different formula.  
    2143 
    2244According to policies SuperSubClsOnly and Taxonomy, new subgraphs will be added if required. To avoid computation overhead, the new graph will be firstly merged into the existing subgraphs, if the same vertex is found in the existing subgraphs, the existing ones will take place of the new ones. Then the subgraphs having the same vertices will be marked as dirty. A fixpoint computation will run on the new graph and the dirty graphs.    
    2345 
    24 ==== dependencies ==== 
    25 log4j - logging 
     46=== Evaluation/Performance === 
     47TODO 
     48 
     49=== Specification === 
     50==== Intitialisation ==== 
     51The SimilarityMeasure class is 
     52 {{{de.dfki.km.phaselib.impl.similarities.sf.optimized.SimilarityMatcher}}} 
     53 
     54Initialisation is straight forward ({{{new SimilarityMatcher()}}}) 
     55 
     56==== Parameters ==== 
     57none 
     58 
     59==== Dependencies ==== 
     60 1. log4j - logging 
    2661