Information Extraction for Thai Documents

Rattasit Sukhahuta, Dan J. Smith

Research output: Contribution to conference, Paper

2 Citations (Scopus)

Abstract

An increasing amount of electronically available information is stored in Asian language documents, which makes Information Retrieval (IR) and Information Extraction (IE) for these languages important for a large number of users. Analysis and extraction of information in these languages present several interesting problems not seen in Western European languages; these are interesting in their own right and for the insights they can give into more general IR and IE techniques. We describe these problems and our system for Thai language IE. One of the main concerns when working with Thai natural language is that the structure of the language itself is highly ambiguous. The analyser therefore requires more sophisticated techniques and large amounts of domain knowledge to cope with these ambiguities. We describe our approach to a natural language analysis system that performs preprocessing for the Thai language, together with an extraction module that retrieves specific information according to predefined concept definitions.
Original language: English
Pages: 103-110
Number of pages: 8
Publication status: Published - 2000
Event: Proceedings of the fifth international workshop on Information retrieval with Asian languages (IRAL 2000) - Hong Kong, China
Duration: 30 Sep 2000 to 1 Oct 2000

Conference

Conference: Proceedings of the fifth international workshop on Information retrieval with Asian languages (IRAL 2000)
Country: China
City: Hong Kong
Period: 30/09/00 to 1/10/00
