Citation
Ruspini, E. and Thomere, J. and Wolverton, M. Database Editing Metrics for Pattern Matching, in Proceedings of the IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety (CIHSPS-04), S. Giuliano – Venice, Italy, pp. 39-45, Aug 2004.
Abstract
Pattern-matching techniques are important tools to treat problems in several fields, including bioinformatics, case-based reasoning, information retrieval, and pattern recognition. These procedures are important in homeland security and crime prevention applications because the underlying problems require the discovery, in large databases, of instances of patterns known to be associated with illegal activities.
While pattern matching may be defined in strict terms as the satisfaction of a logical expression, defining the pattern, by a set of assertions contained in the database, the value of relevant procedures is considerably enhanced by permitting the discovery of approximate matches between database and patterns. The notion of approximate matching is based on the consideration of soft predicates, which may be satisfied to a degree, rather than the conventional crisp predicates of classical logic.
This paper introduces a family of metrics to measure the degree of qualitative match between a database and a pattern, that is, an elastic constraint on database objects and their relations. These metrics provide a formal foundation for the application of graph-editing metrics—measures of the cost associated with graph transformations—to pattern-matching problems. The degree of matching between database and patterns is determined by means of similarity measures that gauge the resemblance between pairs of objects. In our treatment, these measures have a semantic basis stemming from consideration of knowledge structures, such as ontologies, describing common properties of two objects. Approximate pattern matching is treated as the process of modifying databases into a transformed database that strictly satisfies the constraints expressed by the pattern. Associated with each transformation is a measure of admissibility derived from the similarity between the original and transformed databases. The degree of matching of database to pattern is defined as the admissibility of the transformation with highest admissibility value.