Recognition and Translation of Thai Characters and Words from Text


Goal/Objective:

● Detect, segment, and translate Thai characters from images of printed text. (Character classification)
○ This includes characters that are rotated or slanted, that appear in noisy images,
or that are photographed under uneven lighting.
○ Initially, we will work on detecting a single Thai font, then add different
fonts and sizes to improve the robustness of the algorithm.
○ We will start with only the 44 consonants; if we can classify these, we will
move on to the vowels, which appear above, below, and beside the
consonants.
● We will collect data from Thai newspaper websites.
○ First, training data for a single font will be produced by copying text into Google Docs
and saving the pages as JPEG images.
○ We will then physically print these documents and photograph them under different
lighting conditions and orientations, using the photos as both training and test data.
○ Finally, we will use images of actual newspapers downloaded from the web, with
different fonts and sizes, as further test data.
● Translate the text after language detection.
● Reach goal: use machine learning to derive a feature model for vowel and consonant
detection, primarily for handwritten Thai.
● We will evaluate our performance by the percentage of characters classified correctly,
along with the numbers of false positives and misclassifications. We will try to
increase the correct-classification percentage and decrease the misclassification and
false-positive percentages.
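
The evaluation measures named above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the proposal: each detection is already aligned with a ground-truth slot, `None` marks "no character here", and the function name `score_predictions` and the choice of denominators (true characters for the correct and misclassification percentages, reported detections for the false-positive percentage) are hypothetical.

```python
from collections import Counter

def score_predictions(truth, predicted):
    """Compare aligned label lists; None means 'no character' (assumed convention)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    counts, seen, hits = Counter(), Counter(), Counter()
    for t, p in zip(truth, predicted):
        if t is None and p is None:
            continue                      # empty slot, nothing reported
        if t is None:
            counts["false_positive"] += 1  # detector reported a character on background
            continue
        seen[t] += 1
        if p is None:
            counts["missed"] += 1
        elif p == t:
            counts["correct"] += 1
            hits[t] += 1
        else:
            counts["misclassified"] += 1

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    n_chars = sum(seen.values())
    n_detections = (counts["correct"] + counts["misclassified"]
                    + counts["false_positive"])
    return {
        "correct_pct": pct(counts["correct"], n_chars),
        "misclassified_pct": pct(counts["misclassified"], n_chars),
        "false_positive_pct": pct(counts["false_positive"], n_detections),
        "per_character_pct": {c: pct(hits[c], seen[c]) for c in seen},
    }

# Toy example: four true consonants and one background slot.
truth = ["ก", "ข", "ค", None, "ง"]
predicted = ["ก", "ค", "ค", "ก", None]
report = score_predictions(truth, predicted)
print(report)
```

Reporting a per-character percentage alongside the totals makes it easier to see which consonants are being confused with one another.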

Methods:

1. Create a database of Thai characters, starting with one font and then using multiple fonts.
a. Obtain screenshots, in computer-typeset fonts, of the 44 Thai consonants, the 15 vowel
symbols, and the various vowel forms to use as character templates.
2. Generate character detection filters, such as hit-miss and minimum rank filters, and
templates from various Thai fonts.
3. Process and extract features and characters from images via locally adaptive gray level
thresholding, binarization, and region moments.
a. Resize and rotate images for proper matching and alignment
b. Reduce noise with median filtering, etc.
c. Identify regions of neighboring text.
4. Detect characters using filters and template matching.
5. Translate words and sentences with Google
6. Additional: extract features of different Thai consonants and vowels using machine
learning tools to create a model to detect handwritten Thai.
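
Steps 3 and 4 can be illustrated with a small NumPy sketch: median filtering, locally adaptive thresholding, then template matching. It is a minimal sketch under stated assumptions, not the project's implementation. The function names, the 15-pixel window, the offset of 10 gray levels, and the use of normalized cross-correlation as the matching score are all choices made here for illustration. The L-shaped glyph is a synthetic stand-in for a real Thai character template, and hit-miss filters, region moments, rotation, and resizing are left out.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with replicated edges; removes isolated specks."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0)

def adaptive_binarize(img, window=15, offset=10.0):
    """Mark a pixel as ink (True) if it is darker than its local mean minus offset."""
    assert window % 2 == 1, "window must be odd"
    r = window // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Integral image with a leading zero row and column for fast box sums.
    integral = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    box = (integral[window:window + h, window:window + w]
           - integral[:h, window:window + w]
           - integral[window:window + h, :w]
           + integral[:h, :w])
    return img < box / (window * window) - offset

def match_template(binary, template):
    """Return (score, row, col) of the best normalized cross-correlation match."""
    th, tw = template.shape
    t = template.astype(np.float64)
    t -= t.mean()
    best = (-1.0, 0, 0)
    for y in range(binary.shape[0] - th + 1):
        for x in range(binary.shape[1] - tw + 1):
            p = binary[y:y + th, x:x + tw].astype(np.float64)
            p -= p.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best[0]:
                best = (score, y, x)
    return best

def synthetic_page(glyph, top, left, size=40):
    """Gray page with a left-to-right lighting gradient, one dark glyph, and specks."""
    page = np.tile(np.linspace(120.0, 220.0, size), (size, 1))
    h, w = glyph.shape
    page[top:top + h, left:left + w][glyph] -= 80.0   # darken the glyph pixels
    page[3, 5] = page[30, 10] = 0.0                   # isolated dark noise
    page[33, 35] = 255.0                              # isolated bright noise
    return page

# Synthetic stand-in for a character template: an L shape with 3-pixel strokes.
glyph = np.zeros((9, 9), dtype=bool)
glyph[:, :3] = True
glyph[6:, :] = True

page = synthetic_page(glyph, top=12, left=20)
binary = adaptive_binarize(median_filter3(page))
score, row, col = match_template(binary, glyph)
print(row, col, round(float(score), 2))
```

On this synthetic page the best match should land where the glyph was drawn (row 12, column 20). Note that a 3x3 median filter erases strokes that are only one pixel wide, so small font sizes would likely need a gentler noise filter or a higher scan resolution.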
