12th International Conference on Computer and Knowledge Engineering
Introducing E4MT and LMBNC: Persian pre-processing utilities
Authors :
Zakieh Shakeri (1), Mehran Ziabary (2), Behrooz Vedadian (3), Fatemeh Azadi (4), Saeed Torabzadeh (5), Arian Atefi (6)
1- Targoman Intelligent Processing Co. Pjc.
2- Targoman Intelligent Processing Co. Pjc.
3- Targoman Intelligent Processing Co. Pjc.
4- Targoman Intelligent Processing Co. Pjc.
5- Amirkabir University of Technology
6- Targoman Intelligent Processing Co. Pjc.
Keywords :
Natural language processing, Neural Machine Translation, Persian text pre-processing
Abstract :
In this paper, we introduce two utilities that are used extensively in our services and products: a Persian pre-processor (E4MT), which we use for both training and inference in our machine translation services, and a corpus-level, language-model-based error corrector (LMBNC), which we apply to corpora before training. E4MT (Essential tools for MT) consists of character normalization, spell correction, entity tagging, and tokenization/detokenization modules. It addresses the large-vocabulary problem of Persian by reducing the vocabulary size by a factor of approximately 2. We show that applying E4MT to the English-Persian translation task yields an improvement of at least 1.2 BLEU over other toolkits. LMBNC uses a domain-specific language model to identify context-dependent misspellings in the training corpora. The results show that training on the corrected corpora improves English-Persian translation quality by 0.6 BLEU over the baseline. Additionally, manual evaluation shows 97.9% precision for E4MT and 98.1% precision for LMBNC.
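To make the link between character normalization and vocabulary size concrete, the following is a minimal sketch and not E4MT's actual code or API. The function names `normalize_chars` and `vocabulary` are hypothetical, and the mapping covers only a few well-known cases (Arabic yeh and kaf, and Arabic-Indic digits) where Persian text commonly mixes visually similar code points. The reduction by a factor of about 2 reported above comes from the full pipeline, which this sketch does not attempt to reproduce.

```python
# Illustrative sketch only; this is NOT E4MT's implementation or API.

# Arabic code points that Persian text usually replaces with Persian ones.
_CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits (U+0660..U+0669) -> Extended Arabic-Indic digits (U+06F0..U+06F9).
_CHAR_MAP.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})
_TABLE = str.maketrans(_CHAR_MAP)


def normalize_chars(text: str) -> str:
    """Map look-alike Arabic code points to their Persian counterparts."""
    return text.translate(_TABLE)


def vocabulary(sentences, normalize=False):
    """Collect the set of whitespace-separated tokens in the sentences."""
    vocab = set()
    for sentence in sentences:
        if normalize:
            sentence = normalize_chars(sentence)
        vocab.update(sentence.split())
    return vocab


if __name__ == "__main__":
    # The same word ("book") typed once with Arabic kaf and once with Persian keheh.
    corpus = ["\u0643\u062A\u0627\u0628", "\u06A9\u062A\u0627\u0628"]
    print(len(vocabulary(corpus)), len(vocabulary(corpus, normalize=True)))
```

In this toy corpus the two spellings count as separate vocabulary entries before normalization and collapse into one afterwards, which is one of the ways a pre-processor can shrink the vocabulary a translation model has to learn.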