Machine Translation presentation

Contents

Slide 2

Introduction

Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another (http://en.wikipedia.org/)
Use: translation of large amounts of data in the shortest possible time
Standard documents
Instructions and manuals
Web sites, multilingual search
Reference information (addresses, recipes, etc.)
Aim: to understand the main content of a document written in a foreign language unknown to the user
NOT to be used instead of human translation!

Slide 3

Approaches to machine translation
Rule-based approach
Statistical
Example-based approach
Hybrid machine translation

Slide 4

Rule-based translation

Stages
Morphological analysis of the source language
Parsing the source language (syntactic groups)
Getting syntactic information about each word
Dictionary-based translation
Example:

A girl eats an apple. (Eng.-Ger.)
Stages of translation:
1st: getting basic part-of-speech information for each source word: a = ind. art.; girl = n.; eats = v.; an = ind. art.; apple = n.
2nd: getting syntactic information about the verb “to eat”: here, eats = Present Simple, 3rd person singular, active voice
3rd: parsing the source sentence: (an apple) = the object of eat
4th: translating English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen…
5th: finding appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
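
The five stages above can be sketched in a few lines of Python. This is only a toy illustration for the one sentence on the slide: every table (`POS`, `VERB_FORMS`, `DICTIONARY`, `INDEF_ARTICLE`, `VERB_INFLECTION`) and the "subject before the verb, object after it" parsing rule are hand-written assumptions, not part of any real system.

```python
# Toy rule-based pipeline for the slide's example sentence.
# All tables and rules below are illustrative assumptions.

# Stage 1: part-of-speech lexicon for the source words.
POS = {"a": "ind.art", "an": "ind.art", "girl": "n", "eats": "v", "apple": "n"}

# Stage 2: morphological information about verb forms.
VERB_FORMS = {"eats": {"lemma": "eat", "person": 3, "number": "sg", "tense": "pres"}}

# Stage 4: bilingual dictionary; German nouns carry grammatical gender.
DICTIONARY = {
    "girl": {"lemma": "Mädchen", "gender": "neut"},
    "apple": {"lemma": "Apfel", "gender": "masc"},
    "eat": {"lemma": "essen"},
}

# Stage 5: inflection tables for the indefinite article and the verb.
INDEF_ARTICLE = {
    ("masc", "nom"): "ein", ("masc", "acc"): "einen",
    ("neut", "nom"): "ein", ("neut", "acc"): "ein",
    ("fem", "nom"): "eine", ("fem", "acc"): "eine",
}
VERB_INFLECTION = {("essen", 3, "sg", "pres"): "isst"}


def translate(sentence: str) -> str:
    words = sentence.rstrip(".").lower().split()
    tags = [POS[w] for w in words]                      # stage 1
    verb_index = tags.index("v")
    verb = VERB_FORMS[words[verb_index]]                # stage 2

    output = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        # Stage 3 (very crude parse): material before the verb is the
        # subject (nominative), material after it is the object (accusative).
        case = "nom" if i < verb_index else "acc"
        if tag == "ind.art":
            # The article agrees with the next noun in gender and case.
            noun = next(w for w, t in zip(words[i + 1:], tags[i + 1:]) if t == "n")
            output.append(INDEF_ARTICLE[(DICTIONARY[noun]["gender"], case)])
        elif tag == "n":
            output.append(DICTIONARY[word]["lemma"])    # stage 4
        else:
            target = DICTIONARY[verb["lemma"]]["lemma"]
            key = (target, verb["person"], verb["number"], verb["tense"])
            output.append(VERB_INFLECTION[key])         # stage 5
    text = " ".join(output)
    return text[0].upper() + text[1:] + "."


print(translate("A girl eats an apple."))
```

Even in this tiny case the article form ("ein" vs. "einen") depends on the parse and on the gender stored in the dictionary, which is why the stages cannot simply be collapsed into a word list.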

Slide 5

Statistical translation

Translations are generated according to a probability distribution on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora
Benefits
Better use of resources
More natural translations
No programmers or linguists* involved
Shortcomings
Corpus creation can be costly for users with limited resources.
The results can be unexpected; superficial fluency can be deceiving.
Statistical machine translation does not work well between languages that have significantly different word orders.
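
The definition at the top of this slide (translations generated according to a probability distribution) is often made concrete with the noisy-channel formulation. The slide does not state it; it is given here as the standard textbook form, where f is the source sentence, e ranges over candidate target sentences, P(f | e) is the translation model estimated from a bilingual corpus, and P(e) is a language model of the target language.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The second equality follows from Bayes' rule, since P(f) does not depend on e.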

Slide 6

Statistical translation

Basis: a parallel corpus
Probabilities are assigned by computing the most probable translation variant
Probability estimates depend on the size and quality of the training corpus
Linguistic information: sentence splitting, graphematic analysis, morphology
Given a corpus, a very simple translation system can be built in 2 weeks
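
A minimal sketch of how probabilities can be obtained by counting over a parallel corpus. It assumes that word alignments are already given, which real systems have to learn from sentence pairs; `ALIGNED_PAIRS` is an invented toy sample, and `estimate` and `most_probable` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Invented word-aligned pairs (English, German) standing in for a corpus.
ALIGNED_PAIRS = [
    ("bank", "Bank"), ("bank", "Bank"), ("bank", "Bank"), ("bank", "Ufer"),
    ("apple", "Apfel"), ("apple", "Apfel"),
]


def estimate(pairs):
    """Relative-frequency estimate: P(target | source) = count(pair) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    table = defaultdict(dict)
    for (source, target), n in pair_counts.items():
        table[source][target] = n / source_counts[source]
    return table


def most_probable(table, word):
    """Pick the translation variant with the highest estimated probability."""
    return max(table[word], key=table[word].get)


table = estimate(ALIGNED_PAIRS)
print(table["bank"])                 # probabilities for each variant of "bank"
print(most_probable(table, "bank"))  # the variant seen most often
```

With only four observations of "bank" the estimates are very coarse, which illustrates why the estimates depend on the size and quality of the training corpus.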

Slide 7

Rule-based vs. statistical

Examples (not preserved in this transcript): news; document

Slide 8

Rule-based translation

Types
Dictionary-based (direct)
Transfer-based
Interlingual

Slide 9

Dictionary-based (direct)

word by word translation
with or without morphological analysis or lemmatisation
Application
translation of long lists of phrases at the subsentential (i.e., not a full sentence) level, e.g. lists, inventories or simple catalogs of products and services.
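
Word-by-word translation with optional lemmatisation can be sketched as follows. The `LEMMAS` and `DICTIONARY` tables are invented toy data, and `translate_direct` is a hypothetical helper; unknown words are simply copied through.

```python
# Invented toy tables for English-to-German direct translation.
LEMMAS = {"apples": "apple", "pears": "pear", "knives": "knife"}
DICTIONARY = {"apple": "Apfel", "pear": "Birne", "knife": "Messer", "and": "und"}


def translate_direct(phrase: str, lemmatise: bool = True) -> str:
    """Translate each word independently; no parsing, no reordering."""
    output = []
    for word in phrase.lower().split():
        key = LEMMAS.get(word, word) if lemmatise else word
        # Words missing from the dictionary are passed through unchanged.
        output.append(DICTIONARY.get(key, word))
    return " ".join(output)


print(translate_direct("apples and pears"))
print(translate_direct("apples and pears", lemmatise=False))
```

With lemmatisation the dictionary lookup succeeds but number is lost (the output uses singular dictionary forms); without it, inflected words are not found at all. Neither variant handles agreement or word order, which fits the slide's restriction to lists and catalogs.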

Slide 10

Direct translation example

Slide 11

Transfer-based machine translation

1. Analyzing the input text for morphology and syntax (and sometimes semantics)
2. Creating an internal representation
3. Generating translation using both bilingual dictionaries and grammatical rules

Sentence in a source language --(analysis)--> Source language structure --(transfer)--> Target language structure --(synthesis)--> Sentence in a target language
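
The analysis, transfer and synthesis steps can be sketched for a single noun phrase. The example (English "the red car" to Spanish "el coche rojo") is not from the slides; the `NounPhrase` structure, the `LEXICON` table and the three function names are assumptions chosen to show that the transfer step changes both the words and the structure.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Internal representation: the words plus a word-order feature."""
    determiner: str
    adjective: str
    noun: str
    adjective_first: bool


# Invented bilingual dictionary for this one phrase.
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}


def analyse(phrase: str) -> NounPhrase:
    # Analysis: assumes the fixed English pattern determiner + adjective + noun.
    determiner, adjective, noun = phrase.lower().split()
    return NounPhrase(determiner, adjective, noun, adjective_first=True)


def transfer(source: NounPhrase) -> NounPhrase:
    # Transfer: dictionary lookup plus a grammatical rule that places
    # the adjective after the noun in the target structure.
    return NounPhrase(
        LEXICON[source.determiner],
        LEXICON[source.adjective],
        LEXICON[source.noun],
        adjective_first=False,
    )


def synthesise(target: NounPhrase) -> str:
    # Synthesis: linearise the target structure into a string.
    if target.adjective_first:
        middle = [target.adjective, target.noun]
    else:
        middle = [target.noun, target.adjective]
    return " ".join([target.determiner, *middle])


print(synthesise(transfer(analyse("the red car"))))
```

Note that the transfer rule is specific to this language pair; a new pair needs its own dictionary and its own structural rules.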

Slide 12

Interlingua machine translation

the source language is transformed into an interlingua, i.e., an abstract language-independent representation
the target language is generated from the interlingua.
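
A toy sketch of the two steps, reusing the sentence "A girl eats an apple." The frame layout (`predicate`, `agent`, `patient`), the concept labels and the per-language `GENERATORS` tables are all invented for illustration; the French output is an added example and does not come from the slides.

```python
# Analysis lexicon: English words mapped to language-independent concepts.
ANALYSIS_LEXICON = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}

# One generation table per target language. Noun concepts map to
# (subject form, object form); the verb concept maps to a finite form.
GENERATORS = {
    "de": {"GIRL": ("ein Mädchen", "ein Mädchen"),
           "APPLE": ("ein Apfel", "einen Apfel"),
           "EAT": "isst"},
    "fr": {"GIRL": ("une fille", "une fille"),
           "APPLE": ("une pomme", "une pomme"),
           "EAT": "mange"},
}


def to_interlingua(english: str) -> dict:
    # Assumes the fixed pattern: article noun verb article noun.
    words = [w for w in english.rstrip(".").lower().split() if w not in {"a", "an"}]
    agent, predicate, patient = (ANALYSIS_LEXICON[w] for w in words)
    return {"predicate": predicate, "agent": agent, "patient": patient}


def generate(frame: dict, language: str) -> str:
    table = GENERATORS[language]
    subject = table[frame["agent"]][0]
    verb = table[frame["predicate"]]
    obj = table[frame["patient"]][1]
    text = f"{subject} {verb} {obj}."
    return text[0].upper() + text[1:]


frame = to_interlingua("A girl eats an apple.")
print(frame)
print(generate(frame, "de"))
print(generate(frame, "fr"))
```

The generators never see the English sentence, only the frame, so adding a target language means adding one generation table rather than one transfer module per language pair.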

Slide 13

Transfer vs. interlingua

Slide 14

Hybrid machine translation 

A method of machine translation characterized by the use of multiple approaches within a single machine translation system.
Types:
RBMT guided by statistics
Statistical method guided by RBMT
File name: Machine-Translation.pptx