12th International Conference on Computer and Knowledge Engineering
Introducing E4MT and LMBNC: Persian pre-processing utilities
Authors:
Zakieh Shakeri (1)
Mehran Ziabary (2)
Behrooz Vedadian (3)
Fatemeh Azadi (4)
Saeed Torabzadeh (5)
Arian Atefi (6)
1- Targoman Intelligent Processing Co. Pjc.
2- Targoman Intelligent Processing Co. Pjc.
3- Targoman Intelligent Processing Co. Pjc.
4- Targoman Intelligent Processing Co. Pjc.
5- Amirkabir University of Technology
6- Targoman Intelligent Processing Co. Pjc.
Keywords:
Natural language processing, Neural Machine Translation, Persian text pre-processing
Abstract:
In this paper, we introduce two utilities that are used extensively in our services and products: a Persian pre-processor (E4MT), which we use for both training and inference in our machine translation services, and a corpus-level, language-model-based error corrector (LMBNC), which we apply to corpora before training. E4MT (Essential tools for MT) consists of character normalization, spell correction, entity tagging, and tokenization/detokenization modules. It addresses the large vocabulary size of Persian by reducing the vocabulary size by a factor of approximately 2. We show that applying E4MT to the English-Persian translation task yields an improvement of at least 1.2 BLEU over other toolkits. We apply LMBNC, which uses a domain-specific language model to identify context-dependent misspellings, to the training corpora. The results show that training on the corrected corpora improves English-Persian translation quality by 0.6 BLEU over the baseline. Additionally, manual evaluation shows 97.9% precision for E4MT and 98.1% precision for LMBNC.
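The abstract names the modules but does not give their rules, so the following is only a minimal sketch of two of the ideas, under stated assumptions. Part 1 illustrates character normalization: variant code points (Arabic yeh and kaf, Arabic-Indic and Persian digits, short-vowel diacritics) are mapped to one canonical form, which is one way spelling variants can be merged and the vocabulary shrunk. Part 2 illustrates context-dependent error correction in the spirit of LMBNC, but swaps in a toy add-one-smoothed bigram model that picks the best-scoring word from a confusion set. The mapping table, the confusion set, and all function names (normalize, train_bigrams, log_prob, correct) are hypothetical and are not E4MT's or LMBNC's actual rules or API; the LM example uses English words purely for readability.

```python
import math
import re
import unicodedata
from collections import Counter

# --- Part 1: character normalization (hypothetical rule set, not E4MT's) ---
# Assumed mapping of Arabic code points that often appear in Persian text.
CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic (U+0660..) and Extended Arabic-Indic/Persian (U+06F0..) digits -> ASCII.
for d in range(10):
    CHAR_MAP[chr(0x0660 + d)] = str(d)
    CHAR_MAP[chr(0x06F0 + d)] = str(d)
TRANSLATION = str.maketrans(CHAR_MAP)
DIACRITICS = re.compile("[\u064B-\u0652]")  # tanwin, short vowels, shadda, sukun


def normalize(text: str) -> str:
    """Map variant characters to one canonical form so spelling variants merge."""
    text = unicodedata.normalize("NFKC", text)  # folds Arabic presentation forms
    text = text.translate(TRANSLATION)
    text = DIACRITICS.sub("", text)
    return re.sub(r"[ \t]+", " ", text).strip()  # collapse runs of spaces/tabs


# --- Part 2: toy LM-based correction of context-dependent misspellings ---
def train_bigrams(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def correct(sentence, confusions, unigrams, bigrams):
    """Greedily replace each confusable word with its best-scoring candidate."""
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        if word in confusions:
            candidates = [word] + confusions[word]
            # Score the whole sentence with each candidate substituted at position i.
            tokens[i] = max(
                candidates,
                key=lambda c: log_prob(tokens[:i] + [c] + tokens[i + 1:], unigrams, bigrams),
            )
    return " ".join(tokens)


if __name__ == "__main__":
    # Two spellings of "book" (Arabic kaf vs. Persian keheh) collapse to one type.
    print(len({normalize(w) for w in ["\u0643\u062A\u0627\u0628", "\u06A9\u062A\u0627\u0628"]}))
    uni, bi = train_bigrams(["we went there yesterday", "they lost their keys"])
    print(correct("they lost there keys", {"there": ["their"], "their": ["there"]}, uni, bi))
```

Both words in the sentence are valid dictionary entries, so a plain spell checker would accept "there"; only the surrounding context lets the language model prefer "their". A real system would use a much larger, domain-specific model and a confusion set learned from data rather than hand-written.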