Citation
Eric Yeh and Eneko Agirre. SRIUBC: Simple similarity features for semantic textual similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 617-623, 2012.
Abstract
We describe the systems submitted by SRI International and the University of the Basque Country to the Semantic Textual Similarity (STS) task at SemEval-2012. Our systems use a simple set of features that combines semantic similarity resources, lexical match heuristics, and part-of-speech (POS) information. We also incorporate precision-focused scores over lexical and POS information, derived from the BLEU measure, as well as lexical and POS features computed over the split-bigrams used by the ROUGE-S measure.
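The BLEU- and ROUGE-S-style features can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: whitespace tokenization, clipped n-gram precision without BLEU's brevity penalty or geometric averaging, split-bigrams with unlimited distance between the two tokens, and hypothetical function names. Because the functions operate on token lists, the same code applies unchanged to sequences of POS tags.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Counter of the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision: each candidate n-gram is
    credited at most as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    ref = ngrams(reference, n)
    matched = sum(min(count, ref[g]) for g, count in cand.items())
    return matched / total


def split_bigrams(tokens):
    """All in-order token pairs (i < j), with any number of tokens between them."""
    return Counter((tokens[i], tokens[j])
                   for i, j in combinations(range(len(tokens)), 2))


def rouge_s(candidate, reference):
    """Split-bigram precision, recall and F1 (ROUGE-S style, beta = 1)."""
    cand, ref = split_bigrams(candidate), split_bigrams(reference)
    overlap = sum((cand & ref).values())  # multiset intersection: min counts
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Lexical features for one pair; clipped precision is asymmetric,
# so it is computed in both directions.
s1 = "a man is playing a guitar".split()
s2 = "a man plays the guitar".split()
print(clipped_precision(s1, s2, 1), clipped_precision(s2, s1, 1))  # 0.5 0.6
print(rouge_s(s1, s2))

# Hypothetical hand-assigned Penn Treebank tags; a real system would run a tagger.
t1 = "DT NN VBZ VBG DT NN".split()
t2 = "DT NN VBZ DT NN".split()
print(clipped_precision(t1, t2, 1), rouge_s(t1, t2))
```

Split-bigrams reward shared word order even when other words intervene: "a ... guitar" and "man ... guitar" match across the two sentences although neither is a contiguous bigram in both.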
These features were used to train support vector regressors over the sentence pairs in the training data. Of the three systems we submitted, two performed well in the overall ranking, and split-bigrams improved performance on pairs drawn from the Microsoft Research (MSR) Video Description Corpus. Our third system maintained three separate regressors, each trained only on the STS dataset it was meant to score. It used a multinomial classifier to predict which dataset a given pair came from, and scored the pair with that dataset's regressor. This system underperformed, primarily because of errors in the dataset predictor.
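The third system's routing scheme can be sketched as below. The abstract does not specify the classifier's inputs or the regressors' parameters, so the rule-based `toy_predictor` and the `linear` scorers are hypothetical stand-ins for the multinomial classifier and the trained support vector regressors, and the three-feature layout is assumed for illustration. The labels are the three STS 2012 training sets (MSRpar, MSRvid, SMTeuroparl). What the sketch does show faithfully is hard routing: each pair is scored by exactly one regressor, so a misclassified pair is scored by a model trained on a different dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical feature layout: [unigram precision, split-bigram F1, mean length].
Features = Sequence[float]


def linear(weights: Sequence[float], bias: float) -> Callable[[Features], float]:
    """Hypothetical stand-in for a trained support vector regressor."""
    return lambda x: bias + sum(w * v for w, v in zip(weights, x))


@dataclass
class RoutedScorer:
    """Score a pair with the regressor of the dataset the classifier predicts."""
    predict_dataset: Callable[[Features], str]
    regressors: Dict[str, Callable[[Features], float]]

    def score(self, features: Features) -> float:
        dataset = self.predict_dataset(features)
        # A wrong prediction silently sends the pair to a regressor
        # trained on a different data distribution.
        return self.regressors[dataset](features)


def toy_predictor(features: Features) -> str:
    """Hypothetical rule standing in for the multinomial classifier:
    routes on mean sentence length only."""
    mean_len = features[2]
    if mean_len < 10:
        return "MSRvid"
    return "MSRpar" if mean_len < 25 else "SMTeuroparl"


scorer = RoutedScorer(
    predict_dataset=toy_predictor,
    regressors={
        "MSRvid": linear([3.0, 2.0, 0.0], 0.0),
        "MSRpar": linear([2.0, 1.0, 0.0], 0.5),
        "SMTeuroparl": linear([1.5, 1.5, 0.0], 1.0),
    },
)

pair = [0.5, 0.24, 5.5]
print(scorer.predict_dataset(pair), scorer.score(pair))
```

With these made-up weights, changing only the length feature from 5.5 to 12.0 routes the pair to the MSRpar regressor and moves its score from about 1.98 to 1.74, although the similarity features are identical; this is the kind of sensitivity to predictor errors that the abstract reports.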