An increasing amount of electronically available information is stored in Asian-language documents, which makes Information Retrieval (IR) and Information Extraction (IE) for these languages important to a large number of users. Analysing and extracting information in these languages presents several problems not seen in Western European languages; these problems are interesting in their own right and for the insights they can give into more general IR and IE techniques. We describe these problems and our system for Thai language IE. One of the main concerns when working with Thai natural language is that the structure of the language itself is highly ambiguous. The analyser therefore requires more sophisticated techniques and large amounts of domain knowledge to cope with these ambiguities. We describe our approach to a natural language analysis system that performs preprocessing for the Thai language, together with an extraction module that retrieves specific information according to predefined concept definitions.
|Number of pages||8|
|Publication status||Published - 2000|
|Event||5th International Workshop on Information Retrieval with Asian Languages - Hong Kong, China|
Duration: 30 Sep 2000 → 1 Oct 2000
|Workshop||5th International Workshop on Information Retrieval with Asian Languages|
|Abbreviated title||IRAL 2000|
|Period||30/09/00 → 1/10/00|