MT-Based Query Translation CLIR Meets Frequent Case Generation

Kimmo Kettunen

Abstract


The paper presents evaluation results of Cross Language Information Retrieval (CLIR) for three target languages, Finnish, German and Swedish, using English as the source language. Our CLIR approach is based on machine translation of topics and on the use of the Frequent Case Generation (FCG) method to manage query term variation in translated topics and retrieval in inflected indexes. Retrieval results for more standard approaches to query term variation management, such as stemming and lemmatization of translated topics, are also shown. The results show that when machine translation of queries is combined with FCG, results can at best be very promising. The best Machine Translation (MT) programs seem to translate standard laboratory-type Information Retrieval (IR) topics quite well, at least from the point of view of query performance. In a few cases the translated queries perform as well as or slightly better than the monolingual baseline, and in many cases the differences from the monolingual baseline are small.

PAPER TYPE: Research Paper

KEYWORDS: Machine Translation; Cross Language Information Retrieval (CLIR); Frequent Case Generation (FCG); Information Retrieval (IR); Stemming; Lemmatization





TRIM is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.