OCR SYSTEM WITH PATTERN, COLOR RECOGNITION AND
TRANSLATION FUNCTION

HO CHUN KIT
(Date of birth: 23 MAY 1986; identity card no.: 860523-56-6709)

Faculty of Electrical Engineering
Universiti Teknologi Malaysia
Academic session: 2009/2010

Supervisor: Assoc. Prof. Dr. Yahaya Md Sam
Date: 20 APRIL 2010
“I hereby declare that I have read this thesis and in my opinion this thesis is
sufficient in terms of scope and quality for the award of the degree of
Bachelor of Electrical Engineering (Control and Instrumentation).”

Signature : ....................................................
Name of Supervisor : Assoc. Prof. Dr. Yahaya Md Sam
Date : 20 APRIL 2010
OCR SYSTEM WITH PATTERN, COLOR RECOGNITION AND
TRANSLATION FUNCTION
HO CHUN KIT
A thesis submitted in fulfillment of the
requirements for the award of the degree of
Bachelor of Electrical Engineering (Control and Instrumentation)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
APRIL 2010
I declare that this thesis entitled “OCR System with Pattern, Color Recognition and
Translation Function” is the result of my own research except as cited in the
references. The thesis has not been accepted for any degree and is not concurrently
submitted in candidature of any other degree.

Signature : ....................................................
Name : HO CHUN KIT
Date : 20 APRIL 2010
To my beloved family and supervisor.
ACKNOWLEDGEMENT
I am grateful to many people for their assistance in the preparation of this
work: firstly, to my supervisor, Assoc. Prof. Dr. Yahaya Md Sam, for much helpful
guidance, encouragement and advice.

I would like to thank all of my friends who have given me support and
opinions; without their opinions I would not have been able to finish this project so
quickly. My training company, TT VISION SDN. BHD., inspired me to choose this
title; thank you for the guidance and valuable experience during my training.

Thanks to my family, who have been giving me support, love and
encouragement all the time.

To one and all, I extend my appreciation and thanks.
ABSTRACT
OCR stands for Optical Character Recognition. OCR is mainly concerned
with pattern recognition, artificial intelligence and machine vision, and modern OCR
techniques also include image processing. OCR knowledge is becoming popular
nowadays due to its market value in machine vision inspection systems, character
scanning and recognition, and pattern recognition. OCR can greatly decrease the
error probability and workload of human operators. The construction and operation
of this system are based on industrial OCR and IC chip inspection systems; the
translation function of the software is based on OCR software for character
recognition. This project develops an intelligent OCR system that can be used for
material and character recognition for inspection, as well as for language learning
purposes. In addition, it is equipped with an easy-to-use teaching system that
performs instant image recognition and memorization, so that the system becomes
more accurate after learning.
ABSTRAK
OCR stands for Optical Character Recognition. OCR is generally concerned
with pattern recognition, artificial intelligence and machine vision. OCR techniques
now also include image processing techniques. Nowadays, OCR knowledge is
becoming increasingly important because it is very valuable in machine vision
inspection systems and in character and pattern recognition. OCR can reduce the
probability of errors and lighten human workload. This report describes the
construction of an intelligent OCR system for use in various fields. The system is
built on industrial OCR and on integrated circuit defect inspection systems used
during manufacturing, while the language translation part of the software is based on
OCR software for language and character recognition. The aim of this project is to
build an intelligent OCR system for detecting product defects and for language
learning. In addition, the system includes an easy-to-use teaching function that
recognizes and memorizes images instantly, so that the system becomes increasingly
‘intelligent’ after ‘learning’.
TABLE OF CONTENTS

CHAPTER   TITLE

          DECLARATION
          DEDICATION
          ACKNOWLEDGEMENT
          ABSTRACT
          ABSTRAK
          TABLE OF CONTENTS
          LIST OF TABLES
          LIST OF FIGURES
          LIST OF APPENDICES

1         INTRODUCTION
          1.1  OBJECTIVES
          1.2  SCOPE
          1.3  PROBLEM STATEMENT
          1.4  BACKGROUND OF THE STUDY
          1.5  COMPARISONS OF THE PREVIOUS THESES OF UTM'S STUDENTS

2         LITERATURE REVIEW
          2.1  METHOD AND TECHNIQUES
               2.1.1  MODERN APPROACHES
               2.1.2  TECHNIQUES
                      2.1.2.1  AID OF PROCESSING TOOLS
                      2.1.2.2  ACCESS THE IMAGE AND PROCESS THE PIXELS
          2.2  DEVICES
               2.2.1  RS232 DB9 AS SERIAL COMMUNICATOR
               2.2.2  PIC18F452
               2.2.3  SENSOR AND COMPARATOR
               2.2.4  MOTOR DRIVER
          2.3  SOFTWARE
               2.3.1  MICROSOFT VISUAL BASIC 2005
               2.3.2  HARDWARE PROGRAMMING (mikroC)

3         METHODOLOGY
          3.1  FLOW OF WORK
          3.2  EQUIPMENT
               3.2.1  SOFTWARE
               3.2.2  HARDWARE
          3.3  PROGRAM DESIGN
               3.3.1  FLOW CHARTS
               3.3.2  THRESHOLD
               3.3.3  RECOGNITION ALGORITHM
               3.3.4  INTELLIGENT TEACH FUNCTION
               3.3.5  SHAPE RECOGNITION
               3.3.6  COLOR RECOGNITION
               3.3.7  DIFFERENT CHARACTER SIZE RECOGNITION
               3.3.8  CHINESE RECOGNITION
               3.3.9  JAPANESE RECOGNITION AND PRONUNCIATION GUIDES
               3.3.10 SERIAL COMMUNICATION
          3.4  HARDWARE AND CIRCUIT DESIGN
               3.4.1  LIGHTING
               3.4.2  REJECTER DESIGN
               3.4.3  ITEM SLOTS

4         PROJECT IMPLEMENTATION
          4.1  OPERATION DESCRIPTION
          4.2  SYSTEM IMPLEMENTATION AND RESULT
               4.2.1  SOFTWARE IMPLEMENTATION
                      4.2.1.1  CHARACTER RECOGNITION AND INSPECTION
                      4.2.1.2  SHAPE RECOGNITION AND PCB INSPECTION
                      4.2.1.3  COLOR RECOGNITION AND INSPECTION
                      4.2.1.4  CHINESE RECOGNITION AND TRANSLATION
                      4.2.1.5  JAPANESE RECOGNITION AND PRONUNCIATION GUIDES
                      4.2.1.6  DIFFERENT CHARACTER SIZE RECOGNITION
               4.2.2  HARDWARE IMPLEMENTATION
          4.3  RESULT ANALYSIS

5         RECOMMENDATION AND CONCLUSION
          5.1  CONCLUSION
          5.2  FUTURE WORK RECOMMENDATION

          REFERENCES
          APPENDICES A-E
LIST OF TABLES

TABLE NO.  TITLE

3.1        List of components
3.2        Calculation of the distribution of black pixels for several characters
LIST OF FIGURES

FIGURE NO.  TITLE

1.1    Basic flow of the project
1.2    Pattern and color inspection
1.3    A demo version of an OCR tool
1.4    Two freeware examples of OCR tools written in VB6 to let people get basic programming concepts
1.5    Industrial OCR tools used for IC inspection, from TT VISION SDN BHD
2.1    Example of image threshold from MATLAB
2.2    Distinguishing two characters from each other
2.3    Color specification of the RGB color cube
2.4    TTVISION epoxy inspection based on segmentation and blob discovery concepts
2.5    The number '0' in pixels
2.6    Pixel distribution pattern for the number '0'
2.7    Fuzzy output of the number recognition (SOP is sum of pixels; TERM is width)
2.8    RS232 DB9 pinout
2.9    Connection between PIC and RS232-DB9
2.10   Sample program using VB6 to open a COM port
2.11   PIC18F452 pin diagram from Microchip Technology Inc.
2.12   IR sensors and operation
2.13   Pin diagram of LM324
2.14   IR sensor working with comparator
2.15   L293D pin diagram
3.1    Project development flow
3.2    Circuit diagram and circuit built
3.3    Hardware and items: a) 3D drawing of the system b) inspection item models
3.4    Characters inspection program flow chart
3.5    Pattern inspection program flow chart
3.6    Colors inspection program flow chart
3.7    Chinese and Japanese recognition program flow chart
3.8    Flow chart of automatic lighting threshold adjustment
3.9    Image threshold process
3.10   Lighting setting and auto-calibrate function of the software
3.11   Sample 'A' captured under certain lighting conditions
3.12   Distribution of pixels
3.13   Image under different lighting threshold values
3.14   Detecting and processing multiple characters; boundaries formed in sequence a to z and 1 to 2
3.15   Pixel manipulation: (a) forming array (b) pixel detection
3.16   Teach function input box
3.17   Shape inspection selections
3.18   Color selection (red rectangle)
3.19   Percentage color inspection and threshold selections
3.20   Characters of different sizes
3.21   Chinese character recognition and translation interface
3.22   Set of hiragana
3.23   Set of katakana with some modern digraph additions with diacritics
3.24   Japanese recognition interface
3.25   Japanese recognition interface with combination of characters
3.26   Port setting dialog box
3.27   Camera and lighting positioning: a) camera distance fixing b) lighting setup
3.28   Rejecter design and position; red arrow shows the movement path of the rejecter
3.29   Rejecter should be at proper height: a) view from behind b) side view
3.30   Item slots: a) top view b) side view c) bottom view
3.31   Motor for item slots
3.32   First trial (detecting the black strips)
3.33   IR sensor functioning by reflection
3.34   IR sensor detecting the slit
3.35   Shielded IR sensor
3.36   Slits on item slots
4.1    Location of the exe file
4.2    Character recognition and inspection interface
4.3    Port setting user interface
4.4    Image preview from camera
4.5    NOT PASS code generated
4.6    Teach function of the system
4.7    OCR result after teaching the system
4.8    Database in a text file for system recognition
4.9    Shape recognition interface
4.10   Threshold image of a simple PCB pattern
4.11   Marked selection region
4.12   Result of shape inspection
4.13   Color recognition interface
4.14   Regional color inspection mode
4.15   Regional color inspection for PASS item
4.16   Regional color inspection for NOT PASS item
4.17   Percentage color inspection mode
4.18   Improved setting to filter out colors
4.19   Color percentage inspection is independent of the item's pattern
4.20   Item that fails the test because of over-percentage
4.21   Chinese recognition interface
4.22   Chinese recognition result
4.23   Chinese recognition result turned into editable text
4.24   Chinese recognition result can be exported
4.25   Chinese recognition teach mode
4.26   Key in the character by Chinese software in the text box
4.27   New recognition result
4.28   Japanese recognition result
4.29   Japanese recognition result for combination of pronunciation
4.30   Larger-size character recognition
4.31   Smaller-size character recognition
4.32   Overall system hardware
4.33   a) Inspection items position b) System scans through each item
4.34   System flow: a) rejecter rejects the item b) process continues until the last item
4.35   System resets to initial position automatically after finishing
5.1    Use a hand phone to translate any language
LIST OF APPENDICES

APPENDIX   TITLE

A          2D MECHANICAL DRAWING (BODY TOP VIEW)
B          2D MECHANICAL DRAWING (SLOTS TOP VIEW)
C          2D MECHANICAL DRAWING (OVERALL TOP VIEW)
D          MICE EXHIBITION CERTIFICATE
E          PIC18F452 CODING
CHAPTER 1
INTRODUCTION
This project can basically perform mark or character recognition
(communicating with hardware to do inspection, such as IC mark inspection), shape
recognition (communicating with hardware to do inspection, such as PCB
inspection), color recognition (communicating with hardware to do inspection, such
as color inspection), and Chinese recognition with translation to English. As an extra
function, the system is also able to do Japanese recognition with pronunciation guides.
OCR can still be considered new industrial knowledge; it is normally applied in
building machine vision systems for various manufacturing fields, and this kind of
industrial software is normally kept proprietary.
Generally, this project studies OCR and its application for different purposes:
OCR of numbers and alphabets (applied to IC mark inspection), color recognition
(applied to organic material inspection), Chinese and Japanese character recognition,
teaching, conversion to editable text, and translation. The following parts of this paper
describe in more detail how this system can be created at the lowest cost with an
easy-to-understand algorithm based on pixel RGB manipulation, pixel division and
array matching.

The basic flow and concept of this project are shown in Figure 1.1. First, an
image is acquired under proper lighting conditions and processed; the result is then
analyzed by the computer. Next, the computer communicates with the hardware to
take further action, for example, item rejection.
Figure 1.1: Basic flow of the project
1.1  OBJECTIVES

The aims of building the OCR system are:

i.   To create an OCR system that can react or give a signal according to the
     recognition result (alphabets and words).
ii.  To create an OCR system with pattern and color recognition.
iii. To create an OCR system that translates Chinese characters to English.
1.2  SCOPE

i.   Create an OCR system to do mark, character or word inspection.
     a. Identify the word/alphabet (specific font) in pictures.
     b. Differentiate it from the desired word/alphabets.
ii.  Create an OCR system to do pattern inspection.
     a. Identify and differentiate simple patterns on a PCB or IC lead.
iii. Create an OCR system to do color inspection.
     a. Identify and differentiate colors (3 colors).
iv.  Create an OCR system to translate Chinese characters to English.
     a. Users can teach the system.
1.3  PROBLEM STATEMENT

OCR technology is not solely about transforming scanned text into computer
text. It is becoming important to the electronics production industries in vision
inspection systems. It would be good to develop a vision system that can do OCR,
pattern recognition and even color recognition in one package. It is also essential for
the system to interact with the computer and the user, reacting and giving alerts after
obtaining the recognition results, to minimize the work of operators. Inspection by
operators is usually slow and tiring, has a high probability of error, and incurs the
cost of paying operators' salaries. Machine inspection is still a new industry, and
more exploration of this knowledge is required. Many inspection machines still use
color sensors. Color inspection is normally applied in the inspection or sorting of
organic substances like tobacco leaves. Color inspection using color sensors with
limited specifications is not flexible: additional costs have to be spent on any change
of colors or number of colors. If the OCR system can recognize colors as well, cost
can be saved. Besides inspection, OCR can also be applied to translating Chinese or
other characters into English.
Figure 1.2: Pattern and color inspection
1.4  BACKGROUND OF THE STUDY

OCR techniques have been developed since 1929 in Germany. The first OCR
concept used a photodetector: when a lined character passes in front of the
photodetector, no light reaches it, and the pattern of light reflection then signals the
computer to continue the analysis.

The first commercial use of OCR was by the Reader's Digest company, to
avoid the time consumed in retyping text. The second system was sold to the
Standard Oil Company of California for reading credit card imprints for billing
purposes, with many more systems sold to other oil companies. Other systems sold
by IMR during the late 1950s included a bill stub reader for the Ohio Bell Telephone
Company and a page scanner for the United States Air Force for reading and
transmitting typewritten messages by teletype. IBM and others were later licensed on
Shepard's OCR patents. [1] In America, the OCR system has been applied to mail
sorting. OCR has basically been applied in barcode reading and in text reading for
blind people. OCR programs performing mainly text-to-computer-text conversion
have been sold since 1979.
Nowadays, OCR technology keeps improving, achieving almost 99% accuracy.
There are many ways to write an OCR program, such as using neural networks, C++,
C#, VB and so on; each language performs the task in almost the same way. VB is
widely used in industry today due to its low cost and ease of learning. OCR is being
applied to many advanced scanning purposes by companies in Malaysia, for example
TT VISION and VITROX in Penang. The vision industry is still new in Malaysia.
OCR and pattern recognition are applied in IC chip inspection systems to identify
defects in marks, leads, thickness and labels. These companies use high-resolution
cameras, such as Panasonic industrial cameras, and various LED lighting systems to
obtain good images for OCR and pattern recognition. Because OCR systems are
highly repeatable, the cost of hiring operators to examine defects can be cut, and the
probability of error is minimized.
The common methods of vision are:
a. Pixel counting: counts the number of light or dark pixels
b. Threshold: converts an image with gray tones to simply black and white
c. Segmentation: used to locate and/or count parts
   i.   Blob discovery and manipulation: inspecting an image for discrete
        blobs of connected pixels (e.g. a black hole in a grey object) as
        image landmarks. These blobs frequently represent optical targets
        for machining, robotic capture, or manufacturing failure.
   ii.  Recognition-by-components: extracting geons from visual input
   iii. Robust pattern recognition: locating an object that may be rotated,
        partially hidden by another object, or varying in size
d. Barcode reading: decoding of 1D and 2D codes designed to be read or
   scanned by machines
e. Optical character recognition: automated reading of text such as serial
   numbers
f. Gauging: measurement of object dimensions in inches or millimeters
g. Edge detection: finding object edges
h. Template matching: finding, matching, and/or counting specific patterns. [2]
There is a lot of OCR software available on the market, sold at high prices:
for example, ExperVision TypeReader & OpenRTK, ABBYY FineReader OCR,
Asprise OCR as shown in Figure 1.3, Microsoft Office Document Imaging, Brainware,
SimpleOCR, Tesseract and many more. There is also free OCR learning software,
such as Quickwrite OCR as shown in Figure 1.4. An example of industrial OCR
software, from TTVISION, is shown in Figure 1.5.
Figure 1.3: A Demo Version of an OCR tool.
Figure 1.4: Two freeware examples of OCR tools written in VB6 to let people get
basic programming concepts.
Figure 1.5: Industrial OCR tools used for IC inspection. From TT VISION SDN
BHD.
1.5  COMPARISONS OF THE PREVIOUS THESES OF UTM'S STUDENTS

LEOW YEE RUN from 5SEM wrote a thesis entitled “Neural Network
Simulator for Alphabet Recognition with Speech Synthesizer”. [3] The author used
several software packages to perform the OCR task: VB6, MATLAB, neural
networks and Microsoft Agent were applied to do image recognition and then convert
the result to a sound signal using Microsoft Agent. The only hardware the author used
was a simple webcam.

NOOR KHAFILAH BINTI KHALID from 4SEM wrote a thesis entitled “An
Image Processing Approach Towards Classification of Defects on Printed Circuit
Board”. [4] The purpose of the software design is to compare a sample image to a
prototype image; an image processing approach is applied to examine the defects of
the PCB. The author did not use any hardware: the work is simply a comparison
between the pixels of two pictures to obtain a percentage error, and if the picture
exceeds a certain error threshold, the user of the software is notified.

AZIZUL BIN KEPLI, 5SEM, wrote a thesis on “Vision Based Autonomous
Color Detection and Object Tracking Robot”. [5] This author differs from the
previous ones in using only hardware to do the image recognition: a CMUcam2 is
used to differentiate the colors of the object, and the robot is able to react to the
object. Lastly, LEE CHEE WEI from 4SEI wrote a thesis on “Smart Automated
Parking System Using Plate Number Recognition Technology”, using Visual Basic
2008 to write the OCR program. The system is able to recognize the plate number
and record it in the computer when the webcams are triggered by the PIC; he used
the aid of an image processing tool (TESSERACT).

All these studies applied OCR and pattern recognition technology. There are
still criteria that need to be improved to optimize the image recognition results and
the abilities of the OCR systems.
CHAPTER 2
LITERATURE REVIEW
2.1  METHOD AND TECHNIQUES

This section mainly discusses the methods and techniques applied in building
the OCR system, the devices, and the types of development software involved.
2.1.1 MODERN APPROACHES
The traditional OCR method using a light source and photodetector is no longer
applicable due to inaccuracy and resolution problems. Modern approaches use
progressive-scan cameras to capture the image, which is then analyzed by computer.
Hence, software writing is very important.
There are several common modern approaches to OCR software writing:

Visual Basic:
Visual Basic is very common software for writing graphical user interfaces (GUIs)
in many applications. This method of developing OCR software is widely applied in
commercial and industrial OCR. The reasons are that the programming language is
very simple and easy to learn, it has more GUI features, coding references are easy to
find because of its popularity, and, most importantly, Microsoft provides a free
download of this software for everybody.

MATLAB:
MATLAB provides a toolbox for users to do image processing. Although most image
processing functions are provided, it is still not widely applied in developing OCR,
because the coding is more complicated and, most importantly, it is not very capable
for developing GUIs.

Neural Network:
This method is popular for analyzing images and is widely applied for complicated
image processing purposes, for example analyzing maps or blood cells. This software
and its concepts are more difficult to learn compared to Visual Basic.

Fuzzy Logic:
This method requires a microcontroller as a fuzzy engine that can receive fuzzy input
data, for example the MC68HC11E9 8-bit microcontroller provided by Freescale
Semiconductor [6]. Users must write fuzzy rules to provide inputs to this
microcontroller. The coding is complicated, but the concepts are easy to understand.
This method is not popular in OCR because people prefer to use a computer to
analyze rather than a microcontroller. However, the concept of writing fuzzy rules is
widely applied in writing OCR software.
2.1.2 TECHNIQUES
Although there are many different approaches to writing OCR software, such as
using Visual Basic, MATLAB, neural networks, fuzzy logic, C# or C++, in the end
one commonly ends up with two choices:
1. Aid of image processing tools
2. Access the image and process the pixels
2.1.2.1  AID OF PROCESSING TOOLS
If we choose the first method, there are many commercially available tools
such as TESSERACT, the LEADTOOLS SDK, ABBYY FineReader OCR and
more. With this method, the programmer can directly call the DLL files provided by
these commercial tools to do the image processing jobs, saving a lot of time
otherwise spent figuring out how to manipulate the pixels. There are some drawbacks
to this method. Firstly, it is less flexible: the processing tools cannot always meet the
needs of all the different conditions and system requirements. Secondly, creativity is
limited. Finally, cost is needed to purchase this kind of processing tool.
2.1.2.2  ACCESS THE IMAGE AND PROCESS THE PIXELS
If we choose the second method, we have to do a lot of coding for image
processing. It is time consuming, because image processing normally involves a lot
of logic behind the coding. The image processing methods we have to apply include
pixel manipulation, threshold, edge detection, blob discovery, and applying fuzzy
rules. This method is more flexible, leaves more space for creativity and
improvement, and saves cost. Figure 2.1 shows one of the image processing
techniques, called threshold.
Figure 2.1: Example of image threshold from MATLAB.
Threshold is a very important technique in image processing for
differentiating and extracting the objects in pictures. It is generally about image
contrast adjustment. This technique is applied even in blood cell and map analysis
for more advanced purposes [7],[8]. Further techniques include sub-pixels,
segmentation, pixel location detection, array matching [6],[10], and fuzzy rules [11].

The captured image may contain colors and blur that affect the OCR result,
so we first convert it to grayscale and then to black and white, bit by bit. After this,
we rotate the image into position and then do segmentation and recognition.
Figure 2.2 shows the way to calculate pixels to distinguish characters.
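The grayscale-then-threshold step described above can be sketched as follows. This is an illustrative Python sketch (the thesis itself uses Visual Basic 2005); the threshold value of 128 is an assumed example, not a value taken from the thesis.

```python
# Sketch of the grayscale-and-threshold preprocessing step.
# Images are represented as 2D lists for clarity; a real system would
# read pixels from a captured bitmap instead.

def to_grayscale(rgb_image):
    """Convert a 2D grid of (R, G, B) tuples to grayscale intensities."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]

def threshold(gray_image, level=128):
    """Map each pixel to black (0) or white (1) against a threshold level."""
    return [[1 if px >= level else 0 for px in row] for row in gray_image]

# A tiny 2x2 example image: dark red, white, mid gray, black.
img = [[(200, 10, 10), (255, 255, 255)],
       [(128, 128, 128), (0, 0, 0)]]
bw = threshold(to_grayscale(img))
print(bw)  # [[0, 1], [1, 0]]
```

In practice the threshold level would be tuned to the lighting conditions, which is exactly what the automatic lighting threshold adjustment in Chapter 3 addresses.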
Figure 2.2: Distinguishing two characters from each other.
The all-black square is a pixel that belongs to the connected component that
comprises the current character. Since none of the pixels belonging to the next character
touches the current character, an ending location is set.
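The "ending location" idea above can be illustrated with a simple column-scan: when a fully white column is reached, the current character has ended and the next one begins. This is a hedged Python sketch of the concept, not the thesis author's actual Visual Basic code.

```python
# Split a binary image into per-character column ranges by finding
# all-white columns, which mark the ending location of each character.

def split_characters(bw):
    """bw: 2D list, 1 = black ink, 0 = background.
    Returns a list of (start_col, end_col) ranges, one per character."""
    width = len(bw[0])
    ranges, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in bw)
        if has_ink and start is None:
            start = x                      # a new character begins
        elif not has_ink and start is not None:
            ranges.append((start, x - 1))  # ending location is set
            start = None
    if start is not None:
        ranges.append((start, width - 1))  # character touches the right edge
    return ranges

# Two characters separated by one blank column:
image = [[1, 1, 0, 1],
         [1, 0, 0, 1],
         [1, 1, 0, 1]]
print(split_characters(image))  # [(0, 1), (3, 3)]
```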
For color recognition, the computer defines color by red, green and blue
components, each ranging from 0 to 255. The possible combinations of the color
components are 256*256*256 = 16,777,216 colors, as shown in the color cube in
Figure 2.3. In Visual Basic 2005 we can retrieve a pixel's color value with

Color = Bitmap.GetPixel(x, y)

Color values retrieved from the captured image are then compared with a reference.
Figure 2.3: Color specification of RGB color cube.
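The comparison against a reference color can be sketched as a per-channel tolerance check. This Python sketch only illustrates the idea; the tolerance of 30 per channel is an assumed example value, not a parameter from the thesis.

```python
# Compare a retrieved (R, G, B) value against a reference color,
# channel by channel, within an assumed tolerance.

def color_matches(pixel, reference, tolerance=30):
    """True if every RGB channel is within `tolerance` of the reference."""
    return all(abs(p - r) <= tolerance for p, r in zip(pixel, reference))

reference_red = (200, 20, 20)
print(color_matches((210, 25, 10), reference_red))  # True
print(color_matches((90, 200, 40), reference_red))  # False
```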
Real coding examples for industrial and commercial OCR are not easy to get
due to copyright issues. Most of the time, we can only obtain the techniques and
theories of image processing, for example through the journals "International Journal
of Pattern Recognition and Artificial Intelligence" and "Photogrammetric
Engineering and Remote Sensing". From personal experience through industrial
training at TT VISION SDN. BHD. in Pulau Pinang, their development of inspection
systems basically uses the same approaches, such as pixel manipulation, blob
discovery and segmentation, as shown in Figure 2.4. The way the pixels are
manipulated depends on creativity.
Figure 2.4: TTVISION epoxy inspection based on segmentation and blob discovery
concepts
Here, again, let us take an example from Freescale Semiconductor of developing
fuzzy rules to do character recognition.
Figure 2.5: The number ‘0’ in pixels [6]
From Figures 2.5 and 2.6, the distribution of the black pixels can be observed:
from high to low, then to high again, and lastly decreasing to zero. This is true even
for a skewed number. Each number has a different distribution pattern.
Figure 2.6: Pixel distribution pattern for the number ‘0’.
The next step is to divide the pixels of the number into a few columns to
generate fuzzy rules. For example, if the first column has a large number of black
pixels, it provides an output of 'High'; the second column is 'Low'; if it is medium,
the output is 'Medium'. But this is not enough: some numbers have similar
distribution patterns, for example '0' with '8', or '2' with '5', and so on. To handle
this, we count the total number of pixels as a second level of recognition between
these cases: a larger number of pixels produces 'High' and a lower number produces
'Low'. If there is still a conflict between similar cases, we calculate the width of the
numbers to produce another fuzzy output. At last, the fuzzy outputs are combined to
differentiate the numbers; this output is then loaded into the microcontroller to do
OCR.
Figure 2.7: Fuzzy output of the number recognition. SOP is sum of pixels. TERM is
width.
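The column-distribution idea above can be sketched in code: count the black pixels per vertical band, map each count to a coarse linguistic label, and keep the total pixel sum (the "SOP" in Figure 2.7) as a second discriminator. This Python sketch is only an illustration of the concept; the band count and label cut-offs are assumptions, not Freescale's actual rules.

```python
# Column-band profile of a binary character plus a crude fuzzification,
# mimicking the High/Medium/Low description in the text.

def column_profile(bw, bands=3):
    """Sum of black pixels in each of `bands` vertical column bands."""
    width = len(bw[0])
    step = width // bands
    return [sum(row[x] for row in bw for x in range(i * step, (i + 1) * step))
            for i in range(bands)]

def fuzzify(count, low=3, high=6):
    """Map a pixel count to a coarse linguistic label (assumed cut-offs)."""
    if count >= high:
        return "High"
    return "Medium" if count >= low else "Low"

# A crude 5x6 '0': black border, white interior.
zero = [[1, 1, 1, 1, 1, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1, 1]]
profile = column_profile(zero)
print(profile)                        # [7, 4, 7] - higher at the edges
print([fuzzify(c) for c in profile])  # ['High', 'Medium', 'High']
print("SOP =", sum(sum(row) for row in zero))  # SOP = 18
```

An '8' drawn on the same grid would give a similar edge-heavy profile but a larger SOP, which is exactly why the total pixel count is used as the second level of recognition.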
2.2
DEVICES
2.2.1 RS232 DB9 AS SERIAL COMMUNICATOR
Figure 2.8: RS232 DB9 Pin out.
Figure 2.9: Connection between PIC and RS232-DB9
The serial connection between the PIC and RS232 is as shown in Figures 2.8 and 2.9.
Normally we have to consider the following:

1) Baud rate: the baud unit is named after Jean-Maurice-Émile Baudot, an officer in
the French Telegraph Service, who is credited with devising the first uniform-length
5-bit code for characters of the alphabet in the late 19th century. What baud really
refers to is the modulation rate, the number of times per second that a line changes
state. This is not always the same as bits per second (BPS). If you connect two serial
devices together using direct cables, then baud and BPS are in fact the same: if you
are running at 19200 BPS, the line is also changing state 19200 times per second. [12]

2) Data bits: directly following the start bit, the data bits are sent. A bit value of 1
causes the line to go into the mark state; a bit value of 0 is represented by a space.
The least significant bit is always the first bit sent.

3) Stop bit: the stop bit identifying the end of a data frame can have different lengths.
Actually, it is not a real bit but a minimum period of time the line must be idle (mark
state) at the end of each word. On PCs this period can have three lengths: the time
equal to 1, 1.5 or 2 bits. 1.5 bits is only used with data words of 5 bits in length, and
2 only for longer words; a stop bit length of 1 bit is possible for all data word sizes.

4) Parity bit: for error detection purposes, it is possible to add an extra bit to the data
word automatically. The transmitter calculates the value of the bit depending on the
information sent, and the receiver performs the same calculation, checking whether
the actual parity bit value corresponds to the calculated value. [13]
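The parity computation described in point 4 can be made concrete with a small sketch. The UART hardware does this automatically; the Python below only illustrates the even-parity arithmetic.

```python
# Even parity: the parity bit makes the total number of 1s
# (data bits + parity bit) even.

def even_parity_bit(data, bits=8):
    """Parity bit for an even-parity scheme over `bits` data bits."""
    ones = bin(data & ((1 << bits) - 1)).count("1")
    return ones % 2  # 1 if the data already has an odd number of 1s

def parity_ok(data, parity, bits=8):
    """Receiver-side check: recompute the parity bit and compare."""
    return even_parity_bit(data, bits) == parity

word = 0b01101001          # four 1s -> parity bit 0
p = even_parity_bit(word)
print(p)                   # 0
print(parity_ok(word, p))  # True
print(parity_ok(word, 1))  # False (detected as corrupted)
```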
Create an instance of CRs232, then set the COM parameters before invoking the
Open method. Here is an example:

Dim moRS232 As New Rs232()
With moRS232
    .Port = 1                               '// Uses COM1
    .BaudRate = 2400                        '// 2400 baud rate
    .DataBit = 8                            '// 8 data bits
    .StopBit = Rs232.DataStopBit.StopBit_1  '// 1 stop bit
    .Parity = Rs232.DataParity.Parity_None  '// No parity
    .Timeout = 500                          '// 500 ms timeout to get all required bytes
End With

'// Initialize and open
moRS232.Open()

You can optionally control the state of the DTR/RTS lines after the port is open:

'// Set state of RTS / DTR
moRS232.Dtr = True

Figure 2.10: Sample program using VB6 to open a COM port.
The baud rate is very important in serial communication so that the PIC and the
computer can be synchronized. On the computer side this is done in code as shown in
Figure 2.10.

Baud rate calculation:

Desired Baud Rate = Fosc / (64(X + 1))

where Fosc is the frequency of the crystal used and X is the value loaded into the
SPBRG register.

The following is an example command for opening the COM port in the PIC
program code:

OpenUSART (USART_TX_INT_OFF & USART_RX_INT_OFF &
USART_ASYNCH_MODE & USART_EIGHT_BIT & USART_CONT_RX &
USART_BRGH_HIGH, X)
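A worked example of the SPBRG calculation above, using the low-speed formula baud = Fosc / (64 × (X + 1)). The 20 MHz crystal and 19200 baud target are assumed example values, not figures from the thesis.

```python
# Worked example: solve Desired Baud Rate = Fosc / (64 * (X + 1)) for X,
# then check the actual baud rate achieved with the rounded register value.

def spbrg_value(fosc_hz, desired_baud):
    """Nearest integer X to load into the SPBRG register."""
    return round(fosc_hz / (64 * desired_baud) - 1)

def actual_baud(fosc_hz, x):
    """Baud rate actually produced by register value X."""
    return fosc_hz / (64 * (x + 1))

fosc = 20_000_000                    # assumed 20 MHz crystal
x = spbrg_value(fosc, 19200)
print(x)                             # 15
print(round(actual_baud(fosc, x)))   # 19531 (about 1.7% error)
```

Because X must be an integer, the achieved baud rate is only approximate; the small residual error (here about 1.7%) is normally tolerable for asynchronous serial links.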
2.2.2 PIC18F452
The PIC18F452 in Figure 2.11 is a 40-pin high-performance, enhanced-FLASH
microcontroller with a 10-bit A/D converter. It is an enhanced version of the PIC16F
and PIC17F microcontrollers. It has more memory space than those earlier families, as
stated before, yet its source code is compatible with the PIC16 and PIC17 instruction sets.
Popular features of the PIC18F452 include analog-to-digital conversion, enhanced
FLASH program memory with a typical 100,000 erase/write cycles, data EEPROM
memory with 1,000,000 erase/write cycles, FLASH/data EEPROM retention of more than
40 years, a wide operating voltage range (2.0 V to 5.5 V), low power consumption, and
high-speed FLASH/EEPROM technology.
Figure 2.11: PIC18F452 Pin Diagram from Microchip Technology Inc
The reasons for choosing the PIC18F452 for this project are its high performance,
robustness, ease of programming in C, large number of I/O pins, and reasonable price.
2.2.3 SENSOR AND COMPARATOR
The sensor is an IR emitter and IR phototransistor pair. An infrared emitter is an LED
made from gallium arsenide, which emits near-infrared energy at about 880 nm. The
infrared phototransistor acts as a transistor with its base voltage determined by the
amount of light hitting the transistor; hence it acts as a variable current source. A greater
amount of IR light causes a greater current to flow through the collector-emitter leads. [14]
Figure 2.12: IR sensors and Operation.
The IR sensor pair normally works with a comparator in order to trigger or give a
signal according to the voltage output of the IR pair, as shown in Figure 2.12. An example
of a comparator circuit is the LM324 chip, a Low Power Quad Operational Amplifier. Its
low power requirement of 2 nA and +5 V to 30 V operating range make it suitable for
this project. The LM324 in Figure 2.13 provides four op-amps that can be used as comparators.
Figure 2.13: Pin Diagram of LM324
Figure 2.14: IR Sensor working with comparator.
When the receiver IR RX receives infrared light from the emitter, the potential at the
negative input of the comparator rises. When it is higher than the positive input voltage,
sensor X is triggered. R3 is used to adjust the sensitivity of the IR sensor, because the
voltages of the transmitter and receiver should be adjusted to balance under different
lighting conditions before use. The basic circuit is shown in Figure 2.14.
2.2.4 MOTOR DRIVER
A motor driver is required in this project even though motors with small power
requirements are being used. The PIC cannot drive the motors directly, because motors
produce a back electromotive force that would damage the PIC; it is always good
practice to put a motor driver between the PIC and the motors. The motor driver chosen
for this project is the L293D, a Push-Pull Four Channel Driver with diodes. This chip
provides four channels, enabling users to control four motors, as shown in Figure 2.15.
Figure 2.15: L293D Pin Diagram
2.3 SOFTWARE
2.3.1 MICROSOFT VISUAL BASIC 2005
Visual Basic 2005 is a very popular programming language for building Windows and
Web applications. Most importantly, it provides the .NET Framework, a very large
collection of functions that users can call to complete their tasks. It is powerful and
user-friendly software that can even create some applications without a single line of
code. Visual Basic 2005 provides a large number of tools for users to create Windows
forms according to their needs, and its built-in debugging tools help users debug their
code. Visual Studio 2005 Express Edition is freeware that users can download directly
from the official Microsoft website.
2.3.2 HARDWARE PROGRAMMING (mikroC)
The software used to program the PIC18F452 is mikroC. This software allows users
to develop their own applications quickly and easily with an intuitive C compiler for
PIC microcontrollers. It provides useful built-in tools, many practical code examples, a
broad set of built-in routines, and comprehensive Help, making it suitable for
experienced engineers and beginners alike. It was chosen for this project because a
free student version is available and it provides an easy-to-use USART module for
serial communication.
CHAPTER 3
METHODOLOGY
3.1 FLOW OF WORK
Figure 3.1: Project development flow

Computer software development: the character recognition, color recognition, and
translation software is developed using Visual Basic 2005. Here, references had to be
consulted: plenty of OCR software and techniques are available in the market, but they
are not open source, so getting the basic ideas and concepts was very important. In this
project, no image processing toolkits are involved, to achieve a true learning and
design process.

For the hardware design part, the PIC18F452 is the main hardware controlling the
flow of the whole inspection system. In this process there are many problems to solve,
for example the dimensions of the mechanical parts, the motors, noise on the serial
port, isolation of the supply, orientation of the inspection items, and so on.

After that, the hardware communicates with the computer software over serial RS232
with the help of a MAX232 chip. Communication code in both Visual Basic and
mikroC is needed; this is also a large topic.

The testing and improvement part mainly concerns improving the programming,
robustness, and speed of the system.
3.2 EQUIPMENTS
3.2.1 SOFTWARE
i. Microsoft Visual Basic 2005, Express Edition
ii. mikroC 8.0, student version
iii. UIC00A PICkit 2 v2.55 (PIC burner)
3.2.2 HARDWARE
For the circuit, the required components are listed below. The mechanical parts are as
stated in the mechanical drawings: the tracks are aluminum, while the other parts are
zinc. For detailed two-dimensional drawings, refer to APPENDIX A, APPENDIX B
and APPENDIX C.
1. PIC18F452
2. L293D motor driver
3. DC motor x2
4. DC supply 12V x2
5. MAX232
6. IR sensor set x2
7. LED x5
8. 10MHz crystal
9. 30pF capacitor x4
10. 10µF capacitor x7
11. Push button x2
12. 10kΩ resistor x2
13. 220Ω resistor
14. RS232 port
15. Webcam (akkord)
16. Adaptor terminal x2
17. 7805 regulator x2
18. Crystal 10MHz
19. 1kΩ variable resistor x2
20. Soldering gun
21. PIC programmer
22. Jumper wires
23. Male/female connectors
Table 3.1: List of components
The system consists of a simple webcam in a fixed position to minimize noise from
lighting changes. There is a sliding plate consisting of 5 slots; only 4 slots can be used to
place items, so that the sliding plate stays in the track of the motor (the motor keeps
touching the plate). A picking mechanism, moved by another motor, pushes defective
items out. The sliding plate has slits between the slots so that the IR sensors can sense the
positions of the plate and stop it at the right position. The motor speed is controlled by
PWM to avoid any overshoot of the plate position. Figure 3.2 shows the circuit of the
system built, while Figure 3.3 shows the hardware and inspection item models.
Figure 3.2: Circuit diagram and circuit built
Figure 3.3: Hardware and items a) 3D drawing of the system b) inspection item models
3.3 PROGRAM DESIGN
3.3.1 FLOW CHARTS
Each feature of the system has its own program flow. The features basically do not
interact with each other, but each links to the main program (form), which sends signals
to the PIC.
Figure 3.4: Characters Inspection Program Flow Chart
Figure 3.5: Pattern Inspection Program Flow Chart
Figure 3.6: Colors Inspection Program Flow Chart
Figure 3.7: Chinese and Japanese Recognition Program Flow Chart
The programs count to 4 because the inspection system model built is limited to
inspecting 4 items at once. The item counting is done by the PIC18F. The concepts
behind these flows are discussed in more detail in the sections below.
3.3.2 THRESHOLD
Thresholding is the first concept to address in image processing: basically, we need to
turn the picture into black and white. The first step of OCR is capturing a good picture,
where adequate lighting [15] is crucial to the threshold process. Figure 3.9 shows how
the threshold process filters undesired colors and noise out of a picture for ease of pixel
manipulation and calculation.
Each pixel has its own RGB (Red, Green, Blue) values from 0 to 255. To threshold a
pixel, we retrieve its RGB values and multiply each by a color ratio using equation 3.1:

H = (x, y).R*CR + (x, y).G*CG + (x, y).B*CB ………………………………… (3.1)

where H is the threshold value, (x, y) is the location of the pixel, (x, y).R*CR is the red
component of that particular pixel multiplied by the red color ratio, (x, y).G*CG is the
green component multiplied by the green color ratio, and (x, y).B*CB is the blue
component multiplied by the blue color ratio [16]. The H value varies with the
surrounding lighting, and a lower value indicates a darker color. So, in the program, if
H < lightsetting the pixel is set to black; otherwise it is white. For user friendliness,
lightsetting is a variable from 0 to 255 set by the user with a simple horizontal drag bar,
as shown in Figure 3.10. This is very important so that the system can be used under
different lighting conditions, and it gives the flexibility to set any threshold value for
acquiring images of whatever quality the user likes.
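The thresholding rule of equation 3.1 can be sketched as follows. This is an illustrative Python version, not the thesis's Visual Basic code; the default ratios CR = 0.299, CG = 0.587, CB = 0.114 are the common luminance weights and are an assumption here, since the thesis leaves the color ratios configurable.

```python
def threshold_pixel(r, g, b, light_setting, cr=0.299, cg=0.587, cb=0.114):
    """Equation 3.1: H = R*CR + G*CG + B*CB, then compare H with the
    user-set lightsetting (0-255). Returns 0 (black) or 255 (white)."""
    h = r * cr + g * cg + b * cb
    return 0 if h < light_setting else 255

def threshold_image(pixels, light_setting):
    """pixels: 2-D list of (R, G, B) tuples; returns a 2-D black/white image."""
    return [[threshold_pixel(r, g, b, light_setting) for (r, g, b) in row]
            for row in pixels]
```

With these weights a dark pixel such as (20, 20, 20) gives H = 20 and becomes black for any lightsetting above 20, while a bright pixel stays white; raising the drag bar therefore turns more of the image black.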
The first program developed was not good, because the lighting threshold had to be
adjusted every time an image was captured. This is not user friendly, and the user
cannot know which threshold value suits a particular lighting condition, which causes
difficulty in the pixel manipulation and calculation process. The program was therefore
developed further to be more immune to lighting changes: the software is designed so
that users calibrate the lighting tolerance using the letter 'A'.
To make the software more user friendly, auto calibration of the brightness of the
captured picture is a good idea. Auto calibration is a process that lets the system adjust
the threshold value to a suitable one under different lighting conditions. In this case,
the character 'A' is used as the sample.
First, capture an image of the character 'A'. Then all the user needs to do is press the
'Auto Calibrate' button; the system will automatically adjust the threshold bar until a
suitable threshold value is obtained.
To do this, some looping is required in the program. The basic structure of the program
flow is shown in Figure 3.8:
Figure 3.8: Flow chart of automatic lighting threshold adjustment
Following this structure, the image is first thresholded; then the total number of black
pixels is calculated. In earlier testing, the best-quality threshold for the sample 'A' was
found to give a total of 415 black pixels. If an image is captured under a different
lighting condition, the total number of black pixels changes. Hence, the program counts
the black pixels, and if the count is not equal to 415, the system adjusts the lighting
adjustment bar. The process continues until the total number of black pixels equals 415;
the corresponding setting is the best threshold value for that particular lighting condition.
Figure 3.9: Image threshold process.
Figure 3.10: The Lighting setting and auto calibrate function of the software.
Figure 3.11: Sample ‘A’ captured under certain lighting condition.
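The auto-calibration loop described above can be sketched as follows. This is illustrative Python, not the actual VB implementation; the linear sweep over the 0-255 setting range stands in for the software's scroll-bar adjustment, and the luminance weights are the same assumed ratios as before. The sketch assumes the reference image contains the taught character.

```python
def auto_calibrate(pixels, target_black=415, max_setting=255):
    """Sweep the lighting threshold until the thresholded image of the
    reference character contains the target number of black pixels
    (415 for the sample 'A' in the text). Returns the best setting found."""
    best_setting, best_diff = 0, float("inf")
    for setting in range(max_setting + 1):
        # Count pixels whose H value (equation 3.1) falls below this setting.
        black = sum(1 for row in pixels for (r, g, b) in row
                    if r * 0.299 + g * 0.587 + b * 0.114 < setting)
        diff = abs(black - target_black)
        if diff < best_diff:
            best_setting, best_diff = setting, diff
        if diff == 0:
            break  # exact match: 415 black pixels reached
    return best_setting
```

If no setting yields exactly the target count, the sweep returns the setting whose black-pixel count comes closest, which mirrors stopping the adjustment bar at the best achievable position.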
3.3.3 RECOGNITION ALGORITHM
The first trial of recognition applied Fuzzy Rules to the number of black pixels
distributed in different regions, as shown in Figure 3.12.
Figure 3.12: Distribution of Pixels
For ease of explanation, take the character 'T' in Figure 3.12 as an example. Moving
from left to right, we can observe that the distribution of black pixels increases, reaches
its maximum at the middle of the character, and then decreases towards the right. The
same happens moving from top to bottom: the pixel distribution is maximum at the top,
decreases towards the middle, and is then maintained. Through these characteristic
curves we can differentiate characters.
In this method, the total number of pixels of each character is analyzed. Different
characters form differently shaped graphs. The highest point of the graph is classed as
'High', the middle point as 'Medium', and the lowest as 'Low'. Different characters have
different sequences of 'High', 'Medium', and 'Low'. In similar cases like '0' and 'H',
which produce the same sequence, the total number of pixels is considered as well. After
testing, this method proved not good enough, because many characters are similar and
further ways to differentiate them would be needed. This method is also sensitive to
lighting changes from the surroundings.
Table 3.2: Calculation of the distribution of black pixel for several characters
The second trial, the array matching method, is preferred and applied in this project.
Referring to Figure 3.16 (a), we can observe that each character or alphabet consists of
black pixels and white pixels. First, we need to find the first point of the character,
which is the intersection of line 'a' and line 'b' in Figure 3.16. The program then goes
through each pixel from that point to the end point, the intersection of line 'c' and
line 'd'. Black pixels are saved as '1' and white pixels as '0', forming a two-dimensional
array. This array (array A) is compared with the arrays stored in text files earlier by the
Teaching Function. Take one of the stored arrays as an example and call it array B. To
compare the two, we run through the elements of array A and array B simultaneously;
whenever the elements match, i.e. array A(x, y) = '1' and array B(x, y) = '1', or
array A(x, y) = '0' and array B(x, y) = '0', it is counted as one same bit. The percentage
of similarity can then be calculated as in equation 3.2:

 number_ of _ same_ bit

*100 …………. (3.2)

 area_ bounded_ by _ line _ abcd

Percentage of similarity = 
Each stored array is recalled and compared with array A, and each comparison yields a
percentage of similarity; the highest percentage is taken as the OCR result.
The recognition algorithm implemented here is better than the commonly applied
Fuzzy Rules, because fewer similarity cases need to be considered and the
characteristics of different characters do not overlap. With the Teaching Function, it
can handle more cases without reprogramming the software to add extra
character-differentiation logic. In addition, the array matching method is more immune
to lighting changes and poor threshold value selection. For example, case (a) in
Figure 3.13 is the captured image. Case (b) is the result of thresholding with a correct
value, so the system detects it as 'Y'. Case (c) uses a poorer threshold value, but the
system still recognizes it as 'Y', because it still yields the highest percentage of similarity.
Figure 3.13: Image Under Different Lighting Threshold Value.
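The comparison of equation 3.2 can be sketched as follows. This is illustrative Python; representing the stored arrays as a dictionary keyed by character label is an assumption made here, whereas the thesis stores them line by line in text files.

```python
def percent_similarity(array_a, array_b):
    """Equation 3.2: compare two equal-sized binary arrays element by
    element and return the percentage of matching bits over the area."""
    rows, cols = len(array_a), len(array_a[0])
    same = sum(1 for y in range(rows) for x in range(cols)
               if array_a[y][x] == array_b[y][x])
    return same / (rows * cols) * 100

def recognize(unknown, taught):
    """taught: dict mapping a character label to its stored binary array.
    The label with the highest percentage of similarity is the OCR result."""
    return max(taught, key=lambda label: percent_similarity(unknown, taught[label]))
```

Because the winner is simply the highest-scoring candidate, a moderately degraded image (as in case (c) above) is still recognized correctly as long as no other stored character scores higher.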
For multiple characters, looping is required. After detecting the first character, extra
code checks whether there are more characters in the image: the image is scanned again
for any remaining black pixels, and if there are, another array is formed, as shown in
Figure 3.14.
Figure 3.14: Detecting and processing multi character. Boundaries formed in
sequence a to z and 1 to 2
3.3.4 INTELLIGENT TEACH FUNCTION
The alphabet recognition developed earlier differentiated alphabets by their number of
pixels. The user could teach the system to learn alphabets, and the input was stored in
RAM. This was not good, because many alphabets have the same number of pixels,
and the RAM contents are lost when the program restarts.
After several improvements, the system is now equipped with an intelligent
memorizing and recognition function, as well as a camera function that links directly to
any camera connected to the computer. Once the user captures an image, the Teaching
Function lets the program detect the position of the captured character or alphabet,
memorize the positions of its black and white pixels (as shown in Figure 3.15 (b)), and
store them in an array and in files that act as a ROM, so the system remembers these
characters even after a restart.
From Figure 3.15 (a), the system first scans through the x and y axes of the picture to
find line 'a' (the first black pixel met), then lines 'b', 'c' and 'd'. A two-dimensional
array is formed, with black pixels stored as '1' and white pixels as '0'. This array is
exported to a text file to serve as ROM. The user is then prompted with a new window
asking what the shape represents, and the answer is stored in the proper location. That
completes the Teach Function. Hence, whatever the font or shape, the system is capable
of learning it and turning it into editable text. The same rules apply to Chinese and
Japanese character recognition.
Figure 3.15: Pixel manipulation (a) Forming array (b) Pixel Detection
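The scan for lines 'a' to 'd' amounts to finding the bounding box of the black pixels. A hedged Python sketch (not the thesis's VB code; it assumes the image contains at least one black pixel, encoded as 1):

```python
def bounding_box(image):
    """Find lines a, b, c, d: the first and last rows and columns containing
    a black pixel. image is a 2-D list of 0/1 values; returns the sub-array
    inside the box, which is what the Teach Function stores."""
    rows = [y for y, row in enumerate(image) if any(row)]          # lines a (first) and c (last)
    cols = [x for x in range(len(image[0]))
            if any(row[x] for row in image)]                       # lines b (first) and d (last)
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Cropping to the bounding box before storing makes the taught array independent of where the character sat in the captured frame, which is what lets recognition later match it at any position.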
After storing the array, the user is prompted to tell the system the meaning of that
character (the stored array). This is done with an Input Box function, and the data from
the Input Box is stored in another file, as shown in Figure 3.16. If the user teaches the
system again, the new data is saved to the same file paths for the array and the meaning,
each on a new line of its file. Hence, the number of lines in these files represents the
number of data entries stored. For example, the first array is stored at line 1 of file A
and its meaning at line 1 of file B; the next array and meaning are stored at line 2 of
each file. The teaching process then ends.
The system's Teach Function is a very good idea because the code does not have to be
rewritten whenever a new image must be recognized. It is very easy to use, with just a
few simple steps; even a person who knows no programming can complete the task.
Figure 3.16: Teach Function Input Box.
3.3.5 SHAPE RECOGNITION
By applying the same array matching concept, the system can be programmed to
perform shape recognition and inspection. Here, a feature is added so that users can
select up to 6 regions of a picture for matching and inspection. The results are returned
as percentages of similarity, and the user can select the threshold at which a region
becomes 'PASS' or 'NOT PASS'.
Shape recognition is normally applied for PCB or IC lead inspection. The flexibility
for users to select the regions they want to inspect is very important for increasing the
speed of the inspection process. Although it would be more convenient to let the
system memorize the whole circuit, the inspection speed would decrease; so, to
increase speed, it is better to process only the regions selected by the user. A user who
does not care about speed can still drag and select the whole circuit for the system to
memorize.
The program is written so that it memorizes the array within the selected region relative
to the image reference point, which is the first point of the image (the first black pixel
from the left and top). After memorizing, the inspection process can start. The program
runs through the image from position x = 0, y = 0, and the first black pixel detected
becomes the reference point. Based on this reference point, the coordinates of the stored
array are recalled and the comparison is carried out, giving a percentage of similarity.
If the percentage of similarity is less than the threshold set by the user, a 'NOT PASS'
code is generated and the item is rejected; otherwise the item passes.
For the user's convenience, the program memorizes the coordinates of the reference
point and the arrays in text files. Even if the user restarts the program, the stored data
is retained unless deleted by the user. An example of a shape recognition selection is
shown in Figure 3.17.
Figure 3.17: Shape Inspection Selections.
3.3.6 COLOR RECOGNITION
As stated earlier, each pixel has its own RGB values. Testing shows that two pixels may
have very different RGB values even though we cannot notice any difference in color
between the two dots; this difference is caused by uneven lighting when the image is
captured.
In this project, the color recognition function has two modes: Regional Color Inspection
and Percentage Color Inspection. The user can select up to 3 colors to recognize by
dragging the mouse. Any colors can be selected; the computer handles the RGB values,
so the user need not worry about them.
Figure 3.18: Color Selection (red rectangle).
From the region selected as shown in Figure 3.18, the computer runs through each
pixel within that area, gets its R, G and B values, and computes the average as in
equation 3.3:

Average R = (Σ R_value_of_each_pixel) / (total_number_of_pixel) ……………………………………. (3.3)
A similar calculation is done for G (green) and B (blue). These values are memorized
by the system. The tolerance can be adjusted with a simple scroll bar provided in the
software interface. At run time, if the system gets a different RGB value in one of the
selected regions, the item is rejected. The item passes if it follows the rules below:

Average R of item − tolerance ≤ Average R stored ≤ Average R of item + tolerance
Average G of item − tolerance ≤ Average G stored ≤ Average G of item + tolerance
Average B of item − tolerance ≤ Average B stored ≤ Average B of item + tolerance
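Equation 3.3 and the tolerance rules can be sketched as follows. This is illustrative Python, not the thesis's VB code; representing the selected region as a flat list of (R, G, B) tuples is an assumption made here.

```python
def average_rgb(pixels):
    """Equation 3.3 applied per channel: average R, G, B over the region.
    pixels is a list of (R, G, B) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def color_pass(stored_avg, item_avg, tolerance):
    """The item passes if, on every channel, the stored average lies within
    the item's average +/- tolerance (the three rules above)."""
    return all(item - tolerance <= stored <= item + tolerance
               for stored, item in zip(stored_avg, item_avg))
```

Raising the tolerance scroll bar widens the acceptance band on every channel, which is how the system is made robust against the uneven lighting noted above.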
For Percentage Color Inspection, users can again select up to 3 colors, as easily as in
the previous function (Figure 3.18). This time, the system gets the average RGB values
and scans the whole picture for similar colors. The system filters out the unselected
colors and calculates the percentage of the selected colors in the picture. Users can set
a threshold percentage; if the measured percentage exceeds it, the item is rejected.
The program runs through each pixel of the image. If a pixel's RGB values are in the
range of the selected RGB, it is displayed; all other pixels are turned white. The
calculation is then made according to equation 3.4:

 colored _ pixel 
 total _ number _ of _ pixel 


Percentage of selected color in image = 
…….. (3.4)
Referring to Figure 3.19, if the user sets the 'PASS' threshold at 2% for each color
and, after processing, the red color occupies 4%, a 'NOT PASS' alert is generated and
the item is rejected. Figure 3.19 also shows the interface for setting the color thresholds.
This concept is applicable to organic material inspection, where the percentage of
certain colors in rotten fruits or tobacco leaves determines their grade and quality.
Figure 3.19: Percentage Color Inspection and Threshold Selections
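Equation 3.4 and the pass/fail decision can be sketched as follows. This is illustrative Python; matching a pixel to the selected color by a per-channel ± tolerance is an assumption made here about how "similar color" is decided, since the thesis does not spell out the matching rule.

```python
def selected_color_percentage(pixels, target_rgb, tolerance):
    """Equation 3.4: count pixels whose RGB lies within +/- tolerance of
    the selected color on every channel (all other pixels would be shown
    as white) and return the result as a percentage of the image."""
    def matches(p):
        return all(abs(p[c] - target_rgb[c]) <= tolerance for c in range(3))
    colored = sum(1 for p in pixels if matches(p))
    return colored / len(pixels) * 100

def inspect(pixels, target_rgb, tolerance, threshold_percent):
    """'NOT PASS' if the selected color exceeds the user-set threshold."""
    pct = selected_color_percentage(pixels, target_rgb, tolerance)
    return "NOT PASS" if pct > threshold_percent else "PASS"
```

For the worked example above, an image in which 4% of the pixels match the selected red exceeds a 2% threshold and is rejected.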
3.3.7 DIFFERENT CHARACTER SIZE RECOGNITION
What about characters of different sizes, a user may ask? The software handles them
too. Character recognition for different font sizes is a little more complicated than
plain array matching. Different character sizes are normally not encountered in mark
inspection, but since this project is a study of OCR, other applications such as car
license plate recognition should be considered.
A similar method allows the system to recognize fonts of different sizes. This time we
cannot compare pixel by pixel, because different sizes give different numbers of pixels.
Instead, we divide the character into N x N parts, as in Figure 3.20. Through
observation, we can see that the fraction of each small part occupied by dark pixels
stays the same even as the character 'grows' bigger. So we store a two-dimensional
array based on the rule that if the dark area covers more than 50% of a small part, that
part is stored as '1'; otherwise it is stored as '0'. After that, the same array matching
method can be used.
Figure 3.20: Character of Different Size
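The N x N cell rule can be sketched as follows. This is illustrative Python; n = 8 is an arbitrary default, and the integer cell boundaries are an implementation assumption, since the thesis does not fix either.

```python
def grid_signature(image, n=8):
    """Divide a binary character image into n x n cells; a cell becomes '1'
    if more than 50% of its area is dark, so characters of different sizes
    yield the same n x n signature for the usual array matching."""
    h, w = len(image), len(image[0])
    sig = []
    for gy in range(n):
        row = []
        for gx in range(n):
            # Integer cell boundaries covering the whole image.
            y0, y1 = gy * h // n, (gy + 1) * h // n
            x0, x1 = gx * w // n, (gx + 1) * w // n
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            dark = sum(cell)
            row.append(1 if cell and dark * 2 > len(cell) else 0)
        sig.append(row)
    return sig
```

Because the signature depends only on the fraction of dark area per cell, a character and a scaled-up copy of it produce the same n x n array, which is exactly the size invariance the section describes.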
3.3.8 CHINESE RECOGNITION
Chinese recognition is one of the functions of OCR. There are many software packages
on the market that do such work, but their code is of course kept private by the
developers. The Chinese recognition part of this project is based solely on the author's
own ideas, without references or help from image processing tools from other sources.
First, the captured image is thresholded to black and white automatically; if the
threshold quality is not good, the user can readjust the threshold value. The distance of
the image from the camera is fixed, so for this function the software can only recognize
characters of a certain size. The user must first teach the system with a set of characters;
for this purpose, the Teach Function allows the user to do instant image recognition.
After thresholding the image, a little programming is required to let the user select the
character to recognize. As the user moves the mouse over the thresholded image, a red
square is drawn as a guide to proper character selection, as shown in Figure 3.21. This
is done with the mouse move event and the system drawing functions provided in
Visual Basic [17].
Figure 3.21: Chinese character recognition and translation interface.
When the user clicks, the image within the selection region is converted to an array and
stored in a text file on the computer. An Input Box is then prompted, asking the user to
insert the meaning in English, which is stored in another file path.
Chinese characters sometimes cannot stand alone to represent one meaning. To solve
this problem, the recognized image must be turned into editable text (text of the kind
normally typed into a word processor file). Hence, after the Input Box that asks for the
character's meaning in English, another Input Box is prompted asking the user to type
the Chinese character itself, which is stored in the system. To teach the system, one
must therefore know Chinese well, and the computer must allow the user to input
Chinese characters.
After teaching, the system will recognize the same image when the user selects it again.
The concept is the same as for alphabet recognition, based on array matching. When
the system is not in Teach Mode and the user clicks a character, the array bounded by
the red square is saved in a temporary array inside the program. This array is compared
with each line of the arrays stored in the computer's text file, each comparison yielding
a percentage of similarity. After comparing all the stored arrays, the five arrays with
the highest percentages of similarity become the recognition result. The locations of
these arrays are given by their line numbers in the text file, and their meanings are
stored at the same line numbers in another text file. Once the locations of the top 5
arrays are known, the meanings can be recalled and displayed in the text boxes of the
software.
When the user clicks one of the text boxes, its text is copied to the rich text box on the
right-hand side of the software interface. This is very important for letting the user
copy the text elsewhere on the computer. If the user clicks the 'EXPORT' button, an
event is called (System.Diagnostics.Process.Start()) to open an empty text file and copy
the contents of the rich text box into it. The user can then save the recognized text
anywhere on the computer in text file format.
3.3.9 JAPANESE RECOGNITION AND PRONUNCIATION GUIDES
Japanese characters cannot stand alone to represent one meaning, so this project cannot
provide translation of Japanese characters into English. Due to time constraints, it is
better to develop the software to provide pronunciation guides only.
There are three sets of Japanese characters: Katakana, Hiragana, and Kanji. Hiragana
is part of the Japanese writing system. Japanese writing normally consists of kanji,
which are used for the main words in a sentence, and hiragana, which are used for the
little words that make up the grammar (in English these would be words like "from" and
"his"). Hiragana is also used for the endings of some words [18]. Katakana is a
Japanese syllabary, one component of the Japanese writing system along with hiragana,
kanji, and in some cases the Latin alphabet. The word katakana means "fragmentary
kana", as the katakana scripts are derived from components of more complex kanji.
Katakana are characterized by short, straight strokes and angular corners, and are the
simplest of the Japanese scripts [19].
Figure 3.22: Set of Hiragana [18]
Figure 3.23: Set of katakana with some Modern digraph additions with diacritics. [19]
The methods of teaching, recognizing, and storing arrays are basically the same as for
Chinese recognition. Instead of asking the user to key in a meaning, this time the Input
Box asks the user to key in the pronunciation of the character. One part makes the
Japanese recognition system more difficult: the 'yoon' part of Hiragana and the
diacritics of Katakana. In these two cases, when certain characters come together, their
pronunciation changes, and this requires special attention in the programming.
First, the system is taught the vowels of Hiragana and the monographs of Katakana
using the same method as in the Chinese recognition part. The system can then
recognize and provide pronunciation guides for single Hiragana or Katakana
characters. But this is not enough, because it would give wrong pronunciation guides in
the 'yoon' and diacritic cases. For example, referring to Figure 3.22, when 'ki' is
followed by 'ya', the pronunciation becomes 'kya' instead of 'ki ya'.
Figure 3.24: Japanese Recognition Interface
Figure 3.25: Japanese Recognition Interface with combination of characters.
Another example of a character combination, shown in Figure 3.25, is 'mya'. In the
program, a text box is provided below the rich text box of the pronunciation guide to
further notify the user of the correct pronunciation of 'mi ya'; this text box is hidden
again when the user makes a further selection.
How is this done? The trick is the 'Tricky part' shown in Figure 3.25, which displays
the pronunciation of the current selection and the previous selection. It plays an
important role in the program for detecting any combination of characters whose joint
pronunciation is different. For example, when 'mi' is the previous selection and 'ya' is
the current selection, the text box appears and provides the extra pronunciation guide;
the same method applies to other combinations. So, while providing the pronunciation
guide, the program needs extra logic like 'Select Case … Case 'mi' … if followed by
'ya' … End Select' to prompt the proper pronunciation.
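The Select Case logic for combined pronunciations can be sketched with a lookup table. This is illustrative Python; the table below lists only a few sample pairs and is an assumption, not the full yoon/diacritic set the software would need.

```python
# Combinations whose joint pronunciation differs from the concatenation
# of the individual readings (yoon contractions and similar cases).
COMBINED = {
    ("ki", "ya"): "kya",
    ("mi", "ya"): "mya",
    ("shi", "yu"): "shu",
}

def pronounce(previous: str, current: str) -> str:
    """Return the corrected reading for a pair of selections, mirroring the
    Select Case logic described in the text: known combinations get their
    contracted reading, everything else is read character by character."""
    return COMBINED.get((previous, current), f"{previous} {current}")
```

Keeping the previous selection around, as the 'Tricky part' does, is what makes this pairwise lookup possible: the correction can only be decided once both characters of the combination are known.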
Next, how does the system know whether the captured and selected image is Chinese or
Japanese? Referring to Figure 3.25, the interface provides a menu strip that lets the user
switch between Chinese recognition mode and Japanese recognition mode. In the
program, if the user selects Japanese mode, only the Japanese recognition text files are
accessed; if the user selects Chinese mode, only the Chinese recognition text files are
accessed.
3.3.10 SERIAL COMMUNICATION
In this project, serial communication is used to tell the hardware what action should be
taken for items that pass or fail the test. Serial communication is done over RS232, with
a MAX232 chip connecting the computer to the PIC18F452. The following code is the
serial communication example provided in the mikroC help.
void main() {
    unsigned short i;
    USART_init(9600);              // initialize USART module
                                   // (8 bit, 9600 baud rate, no parity bit...)
    while (1) {
        if (USART_Data_Ready()) {  // if data is received
            i = USART_Read();      // read the received data
            USART_Write(i - 32);   // send data back via USART
        }
    }
}
Starting from this code, we can modify it to suit our needs: when certain data is
received, a certain function is launched, and after the task finishes, data is sent
back to the computer to update the status. APPENDIX D contains the code modified for
this project. The data is 8 bits wide; in this case it is a character (alphabet),
because a character is 8 bits long.
The code in APPENDIX D runs on the PIC18F452. For the software to “talk” with
the PIC, another set of code is needed. In Visual Basic 2005, the serial
communication support has been improved compared to the previous version. The
required namespaces must be imported at the beginning of the code as shown below.
Imports System.IO.Ports
Imports System.Runtime.Remoting.Messaging
Detailed example code is available at www.lvr.com [20]. After the declarations, a
dialog box (Figure 3.26) is created to let the user choose the comport number,
baud rate, handshaking, and so on. For this project, a lowercase letter is sent to the
PIC to make it perform a different task for each letter. After finishing the assigned
task, the PIC sends back the same letter in uppercase to update the status; the
uppercase letter is obtained by subtracting 32 from the ASCII code. In the program,
any lowercase letter typed inside the text box of the ‘COMPORT ACTIVITY’ is sent to
the PIC, and the echo is displayed in the same text box, so the user can also control
the movement of the hardware manually from the keyboard.
In the serial data exchange, if the OCR inspection passes, ‘a’ is sent to the PIC
and the hardware moves to the next item. If the OCR inspection fails, ‘b’ is sent,
which activates the rejecter after the Item Slots have moved to the next item. The
hardware acts according to the serial data sent to the PIC; for more detail, refer
to APPENDIX D.
Figure 3.26: Port Setting Dialog Box.
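The letter protocol above can be sketched as follows. This is a minimal Python illustration of the case-conversion handshake (the function names are assumptions; real code would read and write an actual serial port):

```python
def status_echo(command):
    """PIC-side transform: a lowercase command letter comes in, and the same
    letter is echoed back in uppercase by subtracting 32 from its ASCII code."""
    return chr(ord(command) - 32)

def task_finished(command, echo):
    """Host-side check: the task for `command` is done when its uppercase
    echo arrives from the PIC."""
    return echo == status_echo(command)
```

So sending ‘a’ (pass, advance) or ‘b’ (reject) and waiting for ‘A’ or ‘B’ confirms that the hardware has completed the corresponding action.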
3.4 HARDWARE AND CIRCUIT DESIGN
3.4.1 LIGHTING
Lighting plays an important role in image capture: proper lighting helps to
reduce noise in the capturing process. First of all, the distance of the camera from
the image has to be fixed before the OCR program is written. With the help of super-bright
LEDs, the threshold value and image position can then be adjusted. Figure
3.27 below shows the lighting set-up process and criteria.
Figure 3.27: Camera and lighting positioning: a) camera distance fixing, b) lighting set-up
The LED lighting forms a 50 mm x 50 mm frame surrounding the picture, as shown in Figure
3.27(b). The LEDs used are white. For testing (circuit on a proto board),
11 LEDs were used; the real circuit requires only 7. LEDs were chosen as the
light source because they are energy saving and long lived. The lighting angle is about
45 to 50 degrees from the horizontal, that is, from the image. The camera distance from
the image is 640 mm; this distance must be kept fixed at all times to avoid any change
in font size.
3.4.2 REJECTER DESIGN
The rejecter consists of a small 5 V DC motor and two ‘legs’ that push out items
that fail the inspection test, as shown in Figure 3.28.
Figure 3.28: Rejecter design and position (motor and legs labeled); the red arrow shows
the movement path of the rejecter.
The rejecter is one of the motors driven by the motor driver. The legs are small
aluminum plates stuck on the movable part of the rejecter, with small pieces of sponge
stuck on them to enlarge their bottom area for pushing items out. The assembly is
actually a part salvaged from an ordinary CD player.
The rejecter must be positioned very carefully to avoid any collision in the
system. It is placed next to the camera so it can push out an item immediately after
the software has processed the image, and it can be programmed to move forward and
backward. Care must be taken not to damage the motor during programming, because the
rejecter has limited room to move. In the program, the motor moves forward for
110 ms, stops for 300 ms, then moves backward for 120 ms; these values were obtained
through trial and error. If the motor is driven forward or backward for longer, the
gears will be damaged, and if the motor is not stopped for a while between the forward
and backward motions, it will not last long either. The rejecter should be supported
at the proper height (100 mm), as shown in Figure 3.29 below.
Figure 3.29: Rejecter at the proper height (100 mm): a) view from behind, b) side view
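The timing sequence above can be sketched as follows, a minimal Python illustration in which the hardware primitives are stand-ins for the real pin writes and delay calls on the PIC:

```python
def reject_cycle(drive, delay_ms):
    """One reject cycle with the trial-and-error timings from the text:
    forward 110 ms, stop 300 ms (so the gears are not reversed under load),
    backward 120 ms, then stop."""
    drive("forward")
    delay_ms(110)
    drive("stop")
    delay_ms(300)
    drive("backward")
    delay_ms(120)
    drive("stop")

# Demo with recording stubs standing in for the real pin writes and delays:
trace = []
reject_cycle(trace.append, lambda ms: trace.append(ms))
```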
3.4.3 ITEM SLOTS
The item slots, or inspection slots, are where the items to be inspected are placed.
The proper dimensions of the Item Slots are given in APPENDIX B, and the Item Slots
as built are shown in Figure 3.30 below. Several magnets are stuck below the Item
Slots to hold the inspection items in position; with proper mechanical fabrication,
the magnets would not be required.
Figure 3.30: Item Slots (slits and holding magnets labeled): a) top view, b) side view, c) bottom view
To move the slots, a small 12 V DC motor is mounted under the Item Slots as
shown in Figure 3.31. An anti-slide material (intended for car dashboards) is stuck
along the Item Slots and the motor gear to create friction for smooth motor movement,
instead of using an expensive conveyor belt and timing belt.
Figure 3.31: Motor for the Item Slots (anti-slide material labeled)
Some effort was needed to make the slots stop at the exact location for the camera to
capture the image. The first trial used black strips stuck on the back of the Item
Slots, with an IR sensor pair placed below the camera to detect the reflection of
light: on detecting a black strip, the sensor signals the PIC to stop the Item Slots.
Unfortunately, after the mechanism was tested under different lighting conditions, the
Item Slots did not stop at the same position each time. This was due to noise from the
surroundings: different light intensities affect the sensitivity of the IR sensor. The
first trial is shown in Figures 3.32 and 3.33.
Figure 3.32: First trial (IR sensor detecting the black strips)
Figure 3.33: IR sensor functioning by reflection
A better method therefore had to be found. Applying the concept of a rotary
encoder, several slits were built along the slots as shown in Figure 3.34, and the
IR sensor was shielded to make it immune to lighting changes from the surroundings.
This time the IR sensor was configured to detect the slits directly rather than
relying on reflection, which solved the problem.
Figure 3.34: IR sensor detecting the slit
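The slit-detection idea can be sketched as edge detection on the sensor signal. This is a minimal Python illustration (the sampled 1/0 signal representation is an assumption; on the PIC this would be a pin read in the main loop):

```python
def slit_stops(samples):
    """Indices at which the shielded sensor first sees a slit (rising edge of
    a 1/0 'slit detected' signal); each such edge would tell the PIC to stop
    the Item Slots at that position."""
    previous = 0
    stops = []
    for index, sample in enumerate(samples):
        if sample and not previous:   # signal went from 'no slit' to 'slit'
            stops.append(index)
        previous = sample
    return stops
```

Because the shielded sensor only sees its own transmitter through a slit, this edge signal is unaffected by ambient light, unlike the reflection method.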
Figure 3.35: Shielded IR sensor (transmitter and receiver inside)
Figure 3.36: Slits on Item Slots
CHAPTER 4
PROJECT IMPLEMENTATION
This chapter discusses how to use and implement the software and hardware of the
system, and presents the results of the implemented system.
4.1 OPERATION DESCRIPTION
This project consists of a software part and a hardware part. The software part
comprises OCR inspection with a teach function, shape recognition, color recognition,
Chinese recognition with English translation, Japanese recognition with pronunciation
guides, and, lastly, recognition of different font sizes. The hardware part needs
little attention here because it is controlled by the software through serial
communication.
4.2 SYSTEM IMPLEMENTATION AND RESULT
4.2.1 SOFTWARE IMPLEMENTATION
After the software is written in Visual Basic 2005, an exe file is built when the
project is run. The exe file is stored inside the bin folder of the project as shown
in Figure 4.1; this exe file lets the user run the system. When the user double-clicks
the exe file, the Graphical User Interface is executed.
Figure 4.1: Location of the exe file
4.2.1.1 CHARACTER RECOGNITION AND INSPECTION
After launching the exe file, the first interface is the OCR inspection interface.
This interface contains various functions: a menu strip with drop-down lists, a
lighting threshold adjustment bar, an Available Devices display listing the cameras
connected to the computer, several function buttons, a comport activity and status
display, and the Teach function, as shown in Figure 4.2.
Figure 4.2: Character recognition and inspection interface (labeled: menu strip,
function buttons, available cameras, comport status, image frames, OCR display,
teach mode, lighting threshold adjustment, automatic inspection button)
First of all, the comport must be activated so that the computer can communicate with
the hardware, and the hardware must be switched on. In the menu strip, click
‘SETTING’, then select ‘COMPORT’ from the drop-down list. A window (Figure 4.3) pops
up to let the user select the comport, baud rate, handshaking, and so on. For this
project, select COM1 and a baud rate of 9600, since 9600 is the rate used in the
mikroC code.
Figure 4.3: Port setting user interface.
After setting the comport, the user clicks ‘OK’ and returns to the earlier OCR
interface. The project communicates with the hardware by sending 8-bit characters.
The user can observe the characters sent and received by selecting ‘COMPORT
ACTIVITY’ in the ‘SETTING’ section of the menu strip; a text box is then displayed
above the comport status text box. Next, the user clicks ‘OPEN PORT’ in the OCR
interface. The button changes to ‘CLOSE PORT’ because it is programmed as a
multi-function button, so clicking it again closes the comport. The camera must be
connected to the computer to enable the ‘Start Preview’ button.
After the Start Preview button is clicked, the camera image is displayed in the
picture box as shown in Figure 4.4, and the user can adjust the position of the items
based on it. For OCR inspection, the program is written so that it can recognize the
image at any position in the picture box, but the image must not be rotated: a rotated
image generates an error in the software and stops the inspection process.
Figure 4.4: Image Preview from camera.
Next, the user adjusts the lighting bar to set the image threshold value and clicks
the ‘Black White’ button until a satisfactory image appears in the next picture box;
the Auto Calibrate button can also be used, applying the method stated earlier. After
capturing a good image with a suitable threshold value, the user can test the OCR
result by clicking ‘OCR3’. The software processes the image and displays the
recognized characters in the text box labeled OCR result. If the result does not
match the characters on the item, either the threshold value should be adjusted again
or the item is mispositioned or rotated.
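The thresholding step can be sketched as follows, a minimal Python illustration of turning a grayscale image black and white at a chosen threshold (the actual program does this in Visual Basic on the captured bitmap; the list-of-rows representation is an assumption):

```python
def black_white(gray, threshold):
    """Binarize a grayscale image (rows of 0-255 intensity values): pixels at
    or above the threshold become white (1), the rest become black (0)."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]
```

Raising or lowering the threshold with the lighting bar corresponds to changing the cut-off value here until the characters separate cleanly from the background.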
Below the ‘OCR result’ section is a section called ‘OCR Model’, where the user inserts
the model for the inspection. If the characters recognized in ‘OCR result’ do not
match the ‘OCR Model’, a NOT PASS code is generated as shown in Figure 4.5 and the
item is rejected; if they match, a PASS code is generated. The OCR Model can, of
course, be changed: selecting ‘INSERT OCR MODEL’ in the ‘SETTING’ section of the menu
strip opens an Input Box asking the user to key in the OCR model, with proper guidance
on how to key it in provided in the Input Box.
If the OCR result matches the characters in the image and the hardware is in the
correct position, the user can press the green button to start the inspection
process. A small text box labeled ‘Item’ displays the item being inspected and counts
the number of inspected items; when the count reaches 4, the inspection ends and the
slots reset to the initial position. To stop the inspection in an emergency, the user
can click the ‘STOP INSPECTION’ button, displayed in red, which is initially green and
labeled ‘START INSPECTION’. After the inspection starts, the hardware moves and the
software captures and analyzes the images automatically; the movement of the hardware
is discussed in the hardware implementation part.
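The automatic inspection flow can be sketched as follows, a minimal Python illustration of the capture, compare, and send loop (the stubbed OCR reader stands in for the camera and recognizer; the four-item count and the ‘a’/‘b’ codes are from the text, and the reset after the last item is handled on the PIC side):

```python
def run_inspection(read_ocr, model, send, items=4):
    """Inspect `items` slots in turn: send 'a' when the recognized text
    matches the OCR model (pass, advance) and 'b' when it does not (reject)."""
    for _ in range(items):
        send('a' if read_ocr() == model else 'b')

# Demo with a stubbed OCR reader in place of the camera and recognizer:
results = iter(["AB12", "AB12", "XY99", "AB12"])
sent = []
run_inspection(lambda: next(results), "AB12", sent.append)
```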
Figure 4.5: NOT PASS code generated (differing characters marked)
The ‘OCR TEACH ENABLE’ item in the ‘SETTING’ section of the menu strip switches the
system to Teach Mode. This function is password protected to avoid misuse (the
password is ckho), and the user needs some knowledge of the program flow to teach the
system. Enabling it activates the ‘Teach’ button. The system is not yet “mature”:
there are still symbols and characters it does not recognize. To improve it, the user
can always teach the system by pressing the ‘Teach’ button after capturing and
thresholding an image; the system then memorizes the array pattern and stores it on
the computer as its ROM. The captured image should contain only one character, at any
position but not rotated. The teaching process is shown in Figure 4.6 below: after
the Teach button is clicked, an Input Box appears asking the user to key in the
character shown in the image.
Next, click ‘OK’ and the teach step is complete. The user can now test the result by
clicking OCR3; the result is displayed as shown in Figure 4.7. A NOT PASS code is
generated because the result does not match the OCR model, but the system can now
recognize the inverted R.
How can a taught character be erased if a mistake was made during teaching? Open the
text file named “OCRdatabase.txt” and delete the highlighted part shown in Figure 4.8.
Here we can observe that the system saves each character as an array of ‘1’ and ‘0’
elements, as stated earlier. The system is already programmed for 500 alphabets,
10000 Chinese characters, and 10000 Japanese characters, and these limits can easily
be raised with a little extra programming.
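The teach step can be sketched as follows, a minimal Python illustration of how a thresholded pattern is flattened to a ‘1’/‘0’ string and stored with its label, as in OCRdatabase.txt (the in-memory list stands in for the text file, and the exact file layout is an assumption):

```python
def teach(pattern, label, database):
    """Store a taught character: the thresholded pixel pattern becomes a
    string of '1'/'0' array elements tied to the label the user keyed in."""
    database.append((label, "".join("1" if p else "0" for p in pattern)))

# Demo: teach a (tiny, hypothetical) 5-pixel pattern as the character 'R'.
taught = []
teach([1, 0, 1, 1, 0], "R", taught)
```

Erasing a mistaken entry then amounts to deleting the corresponding line of the stored file.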
Figure 4.6: Teach function of the system
Figure 4.7: OCR result after teaching the system.
Figure 4.8: Database in a text file for system recognition.
4.2.1.2 SHAPE RECOGNITION AND PCB INSPECTION
In the menu strip of the OCR inspection interface, the user selects the ‘PROJECT’
item; a drop-down list appears, and the ‘Shape recognition’ item can be selected to
enter the next window, the shape recognition mode. To perform an inspection, the
comport must be activated and the hardware switched on. A new interface for shape
recognition appears as shown in Figure 4.9.
Figure 4.9: Shape recognition interface (labeled: menu strip, camera frame, camera
function, function buttons, image frame, threshold adjustment, percentage setting,
selection display, start inspection button).
First of all, the camera preview is started by pressing the ‘Start Preview’ button.
The image threshold is then adjusted with the lighting threshold bar, and the user
presses the ‘Black White’ button to see the thresholded result. If the threshold is
not satisfactory, it is adjusted again until a clearer black-and-white image is
obtained, as shown in Figure 4.10.
Figure 4.10: Threshold image of a simple PCB pattern.
The user can select a maximum of 6 parts for inspection, as stated earlier; this
technique is called blob discovery. First, remember that the system saves patterns in
its ROM, so the ‘Delete’ button should be pressed to remove any previously selected
pattern from memory; an Input Box asks the user to confirm. After the previous
pattern is deleted, selection can begin: the user drags the mouse in the Image Frame
over each part to inspect, and a red square appears at the selected part. Clicking
the ‘Memories Blob’ button then memorizes the parts, and the number of parts
memorized is shown in the Selection section in light blue. An example with two
selected parts is shown in Figure 4.11.
Figure 4.11: Marked selection region.
Then the ‘SET PERCENTAGE TO PASS’ value should be adjusted. The more similar the
image, the higher its percentage of similarity, and this section lets the user set the
threshold that decides whether an item passes: if the calculated similarity is below
this value, the item is rejected during the inspection process. The user can test the
result by capturing an image of a defective PCB as shown in Figure 4.12: after
capturing the image, press the ‘Black White’ button, then press the ‘Blob Inspection’
button to observe the result.
In this test, the defective PCB is missing one part, the first selected region, so it
has a lower percentage of similarity (63.141 %). The similarity is not zero because
some white region was included in the selection, and that white region still matches
the defective circuit. The non-defective part scores 90.833 % (not 100 %, since
perfect positioning cannot be achieved). The threshold percentage was set to 83 % by
trial and error; any one part that fails the calculation causes the item to be
rejected.
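The pass decision can be sketched as a pixel-match percentage, a minimal Python illustration (the flat 0/1 pixel lists are an assumed representation of the memorized and tested regions):

```python
def similarity_percent(reference, candidate):
    """Percentage of matching pixels between the memorized blob and the test
    region (flat lists of 0/1 pixels of equal length)."""
    matches = sum(1 for a, b in zip(reference, candidate) if a == b)
    return 100.0 * matches / len(reference)

def blob_passes(reference, candidate, threshold=83.0):
    """A selected part passes only if its similarity clears the threshold
    (83 % in this project, found by trial and error)."""
    return similarity_percent(reference, candidate) >= threshold
```

A missing component lowers the match count for its region, which is why the defective board scored 63.141 % against the 83 % threshold.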
After the selection and threshold testing, the inspection is started by clicking the
green ‘START INSPECTION’ button, and the system runs automatically.
Figure 4.12: Result of shape inspection
4.2.1.3 COLOR RECOGNITION AND INSPECTION
To enter the color inspection mode, the user selects the ‘PROJECT’ item in the menu
strip of the OCR inspection interface; from the drop-down list, the ‘Color
recognition’ item can be selected to enter the next window, the color recognition
mode. First, to enable the Group Box, select ‘Percentage color inspection’ and
‘Regional color inspection’ under ‘Menu’ in the menu strip, then tick the radio button
in the ‘Region’ section to enable mouse selection.
Figure 4.13: Color recognition interface (labeled: menu strip, mouse marking, camera
frame, function buttons, image frame, percentage setting, result frames, RGB
threshold setting, RGB values display).
The color recognition interface has two modes: Regional Color Inspection and
Percentage Color Inspection. For regional color inspection, the user can select a
maximum of 3 colors to inspect. First, the ‘Delete’ button should be pressed to clear
previously stored data. Next, click the ‘Start Preview’ button to enable the camera,
then press the ‘Capture’ button to capture the image and display it in the Image
frame. The user then drags to select any three regions in the Image frame for the
system to memorize; the selected regions are marked with red squares. Clicking
‘Memories Color’ in the function buttons section makes the system memorize the colors.
The RGB threshold is adjusted with the ‘REGIONAL’ scrollbar; a tolerance of 20 is
sufficient for this camera, and a better camera would probably allow a lower
tolerance. The interface should then look similar to Figure 4.14.
Figure 4.14: Regional color inspection mode.
The system has now memorized the three colors at their respective positions. The user
can test the result by clicking the ‘Color Inspect’ button; the result should be PASS
as shown in Figure 4.15. Then try a defective item: the result should be NOT PASS as
shown in Figure 4.16. The tolerance should be adjusted if the system gives a PASS
signal for defective colors; if any one of the colors differs, the item is considered
a fail. After this testing, the user can start the inspection by clicking the green
‘START INSPECTION (REGIONAL)’ button.
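The regional check can be sketched as a per-channel tolerance test, a minimal Python illustration using the tolerance of 20 from the text (the RGB tuples are an assumed representation):

```python
def color_matches(taught, measured, tolerance=20):
    """True when each RGB channel of the measured color is within the
    tolerance of the taught color."""
    return all(abs(t - m) <= tolerance for t, m in zip(taught, measured))

def regional_pass(taught_colors, measured_colors, tolerance=20):
    """All selected regions must match; one differing color fails the item."""
    return all(color_matches(t, m, tolerance)
               for t, m in zip(taught_colors, measured_colors))
```

Tightening the tolerance makes the check stricter, which is why a better camera with less color noise would allow a value below 20.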
Figure 4.15: Regional color inspection for PASS item.
Figure 4.16: Regional color inspection for NOT PASS item.
For percentage color inspection, the system detects and calculates the percentage of
certain colors on the item instead of checking the colors of certain regions. The
‘Delete’ button should be pressed first, and selection is made by the same method; the
RGB threshold can be adjusted to filter the colors better in the Result frames. After
‘Color % Inspect’ is clicked to test the result, the interface should look similar to
Figure 4.17.
In Figure 4.17 the filtered colors are not good enough, so the RGB threshold has to be
adjusted again. The result is NOT PASS because the percentage-to-pass setting is
0 %. For this case, each selected color on the item is limited to 3 %, so the text
boxes of the Percentage setting section have to be changed to 3 %; if any selected
color exceeds 3 %, the item is rejected.
Figure 4.17: Percentage color inspection mode.
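The percentage mode can be sketched as follows, a minimal Python illustration that counts pixels near each selected color and applies the 3 % limit from the text (the pixel-list representation is an assumption):

```python
def color_percentage(pixels, target, tolerance=20):
    """Percentage of pixels whose RGB values are all within the tolerance of
    the selected color."""
    hits = sum(1 for p in pixels
               if all(abs(a - b) <= tolerance for a, b in zip(p, target)))
    return 100.0 * hits / len(pixels)

def percentage_pass(pixels, targets, limit=3.0, tolerance=20):
    """Reject the item when any selected color exceeds the limit
    (3 % in this project)."""
    return all(color_percentage(pixels, t, tolerance) <= limit for t in targets)
```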
After the RGB tolerance is increased to 20 on each channel and the percentages are
changed, the result should be PASS after the ‘Color % Inspect’ button is clicked, and
the filtered colors become clearer. The PERCENTAGE section shows the calculated
percentage of the selected colors on the item; the result is shown in Figure 4.18.
Figure 4.19 shows a PASS item, while Figure 4.20 shows an item that the system will
reject. To start the inspection and let the system run automatically, the user should
click the green ‘START INSPECTION (PERCENTAGE)’ button after carrying out the testing
procedures above.
Figure 4.18: Improved setting to filter out colors.
Figure 4.19: Color percentage inspection is independent of the item’s pattern.
Figure 4.20: Item that fails the test because a color exceeds the percentage limit.
4.2.1.4 CHINESE RECOGNITION AND TRANSLATION
To enter Chinese recognition mode, the user clicks the ‘Chinese’ item under ‘PROJECT’
in the menu strip of the OCR inspection interface. A new window pops up as shown in
Figure 4.21.
Figure 4.21: Chinese recognition interface (labeled: menu strip, camera frame,
recognition result, image frame, lighting threshold adjustment, editable text
section, mode display, counting).
Using the same method, an image is acquired and thresholded; the lighting threshold
should be adjusted each time to get a clear black-and-white image. The computer must
be equipped with Chinese software in order to display and type Chinese characters;
NJStar Communicator is used in this project.
After capturing an image, threshold it by clicking the ‘Black White’ button. As the
user moves the mouse into the Image frame, a red square appears; the user moves the
square to surround a character in the image and then clicks. The recognition result
and translation appear in the Recognition result section, similar to Figure 4.22.
Five candidate characters appear simultaneously in the Recognition result section,
and the user has to select the best result; in this case it is the fourth one. If the
user clicks that text box in the Recognition result section, the result is transferred
as editable text to the Editable text section, as shown in Figure 4.23.
Figure 4.22: Chinese recognition result (candidate text boxes labeled).
Figure 4.23: Chinese recognition result turned to editable text.
After a few recognitions, the user can export the editable text to any path on the
computer in text file format by clicking the ‘EXPORT’ button. A text file containing
the editable text pops up, and the user can save it to the desired path as shown in
Figure 4.24.
Figure 4.24: Chinese recognition result can be exported.
The system is equipped with an intelligent teach function. If the system cannot
recognize a character, enable the teach function with the same password; ‘Teach Mode
Enable’ is under the ‘Menu’ item in the menu strip, and the system indicates Teach
mode in the mode display section. Select the character with the red square and an
Input Box appears; the user keys in the English meaning of the Chinese character and
clicks OK. A message box then prompts the user to enter the Chinese character in the
text box that appears, as shown in Figure 4.26. After keying in the Chinese
character, click the ‘SAVE’ button, and end teach mode by clicking ‘Teach Mode
Enable’ again in the Menu.
After that, the system recognizes the character when it is selected again, as shown
in Figure 4.27. To erase a taught character, the same method as in the OCR inspection
interface applies: the files to access are “chinese.txt”, “ChineseEng_database.txt”,
and “Chinese_database.txt”, and the last line of each file should be deleted.
Figure 4.25: Chinese recognition Teach Mode.
Figure 4.26: Keying in the character with the Chinese software in the text box.
Figure 4.27: New recognition result.
4.2.1.5 JAPANESE RECOGNITION AND PRONUNCIATION GUIDES
The same method applies as for Chinese recognition; the only difference is the extra
rich text box beside the editable text section, provided for pronunciation guides. To
change from Chinese recognition to Japanese recognition, click ‘Japanese Recognition’
in the ‘Setting’ section of the menu strip; the mode indication changes. The teaching
and export procedures are the same, and the results are shown in Figures 4.28 and
4.29.
When there is a combination of pronunciation, the system prompts the user as shown in
Figure 4.29. To delete a taught character in this mode, the user has to access
“japanese.txt”, “JapanesePro_database.txt”, and “Japanese_database.txt” and delete the
last line of each.
Figure 4.28: Japanese Recognition result.
Figure 4.29: Japanese recognition result for combination of pronunciation.
4.2.1.6 DIFFERENT CHARACTER SIZE RECOGNITION
Different character size recognition is built for other applications such as plate
number recognition. To enter this mode, select the ‘Plat Number’ item under ‘PROJECT’
in the menu strip of the OCR inspection interface. After the window pops up, acquire
an image by the method above, then try recognizing characters of different sizes by
adjusting the camera’s distance from the image and clicking the OCR button. The
system is still able to recognize the image, though errors sometimes occur; the user
can teach the system by the same method as in the OCR inspection interface, pressing
the ‘Recognize’ button to teach.
Figure 4.30: Larger-size character recognition.
Figure 4.31: Smaller-size character recognition.
4.2.2 HARDWARE IMPLEMENTATION
Hardware implementation is very simple. First, make sure the Item Slots are at their
initial position, as shown in Figure 4.32. Switch on both power supplies and observe
that the LEDs light up, also shown in Figure 4.32. The camera should be connected to
a USB port of the computer, and the inspection item models placed on the Item Slots as
shown in Figure 4.33. The system scans through the items one by one after the user
clicks the ‘START INSPECTION’ button in each mode, and the rejecter rejects items
according to the inspection results. After scanning through 4 items, the Item Slots
reset to the initial position.
Figure 4.32: Overall system hardware (labeled: MAX232-to-RS232 circuit, track,
camera, PIC programmer, rejecter, lighting, Item Slots initial position).
Figure 4.33: a) Inspection items in position, b) system scanning through each item.
Figure 4.34: System flow: a) rejecter rejects the item (red light indicates NOT
PASS), b) process continues until the last item (green light indicates PASS).
Figure 4.35: System resets to the initial position automatically after finishing.
4.3 RESULT ANALYSIS
After numerous tests of each mode of the system, errors occurred most frequently in
OCR inspection mode. They arise when the position of an item is rotated: the system
cannot analyze a rotated image, which generates an error and forces a restart. Effort
was made in the programming to handle this problem: when the error occurs, the system
now stops the inspection and gives the user an error message instead of requiring the
program to be restarted. This is a temporary measure; a more suitable method for
recognizing rotated images should be found.
The errors in shape inspection are normally of the same type as in OCR inspection:
rotated items reduce the percentage of similarity, so the system rejects them even
when they are not defective. For this project, a small metal plate was stuck on the
bottom face of each inspection item; together with the magnets below the Item Slots,
it helps hold the items in position. The other inspection and recognition modes show
no problems.
The inspection speed is about 3 to 4 seconds per item. Color recognition takes longer
if the user selects more colors. Processing speed can be increased by closing all
other running programs on the computer and by using a better processor.
CHAPTER 5
CONCLUSION AND FUTURE WORKS
5.1 CONCLUSION
An OCR system with various functions has been created. Although some weaknesses
remain in the mechanical parts, they can be solved through accurate mechanical
fabrication. The speed of the whole process can be increased drastically with
computers that have faster processors. The concepts in the software can be applied to
materials inspection and to language learning with conversion to editable text, and
more features can be added to extend the OCR functions in the future. The project is
basically a study and application of OCR knowledge: once the image processing skills
are understood, many ideas can be generated to make computers work for us.
5.2 FUTURE WORK RECOMMENDATION
In the future, the mechanical weaknesses of this project can be removed by proper
mechanical fabrication, which would prevent the inspection items from rotating out of
their original positions. A faster CPU should be used to speed up image processing
and pixel calculation. The whole project should be shielded with dark Perspex to keep
out lighting noise from the surroundings. More reference points should be used for
shape recognition and inspection, instead of the single reference point in this
project. Finally, a better quality camera would increase the image capture speed.
The character recognition software could also be modified and installed on a mobile
phone: in a foreign country, it could help us capture and translate any language we
do not understand.
Figure 5.1: Using a mobile phone to translate any language.
REFERENCES
1. “Optical Character Recognition”, Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/OCR
2. “Machine Vision”, Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/Machine_Vision
3. Leow Yee Run, “Neural Network Simulator for Alphabet Recognition with Speech Synthesizer”, Final Year Thesis, UTM, FKE, Sem 5, 2003/2004.
4. Noor Khafilah Binti Khalid, “An Image Processing Approach Towards Classification of Defects Printed Circuit Board”, Final Year Thesis, UTM, FKE, Sem 5, 2007/2008.
5. Azizul Bin Kepli, “Vision Based Autonomous Color Detection and Object Tracking Robot”, Final Year Thesis, UTM, FKE, Sem 5, 2003/2004.
6. William A. Gowan, “Optical Character Recognition Using Fuzzy Logic”, Freescale Semiconductor, http://www.freescale.com/files/.../doc/app.../AN1220_D.pdf, 2004.
7. Hao Song and WeiXing Wang, “A new separation algorithm for overlapping blood cells using shape analysis”, International Journal of Pattern Recognition and Artificial Intelligence, World Scientific Publishing Company, vol. 23, pp. 847-864.
8. “An assessment of geometric activity features for per-pixel classification of urban man-made objects using very high resolution satellite imagery”, Photogrammetric Engineering and Remote Sensing, vol. 75, April 2009, pp. 397-411.
9. Zhangquan, “Modification of pixel swapping algorithm with initialization from a sub pixel / pixel spatial attraction model”, Photogrammetric Engineering and Remote Sensing, vol. 75, April 2009, pp. 557-567.
10. Mikael Laine, “A Standalone OCR System for Mobile Cameraphones”, University of Turku, Department of Information Technology and Turku Centre for Computer Science (TUCS), 2006.
11. Zhidong Lu, “A Robust, Language-Independent OCR System”, Cambridge, MA 02138, http://www.metacarta.com/.../Language-independent-OCR-Kornai.pdf
12. “Baud Rate”, http://www.pccompci.com/Baud_Rate.html
13. Lammert Bies, “RS232 Specifications and Standards”, http://www.lammertbies.nl/comm/info/RS-232_specs.html, 2009.
14. Kai Rider and Kai Tracer, “Use of Infrared Transmitter and Receiver to Detect Black or White Line”, http://irbasic.blogspot.com/, November 24, 2006.
15. Nello Zuech, President, Vision Systems International, “Considerations in OCR/OCV Applications”, Kallett’s Corner, Machine Vision Market Analysis, 2009. http://www.machinevisiononline.org/
16. “KnowledgeBase for ActiveReports for .NET”, http://www.datadynamics.com/forums/76296/ShowPost.aspx
17. Evangelos Petroutsos, “Mastering Microsoft Visual Basic 2005”, Wiley Publishing, Inc., 2006.
18. “Hiragana”, Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Hiragana
19. “Katakana”, Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Katakana
20. Jan Axelson, Lakeview Research, http://www.lvr.com
21. “Machine Vision”, Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/Machine_Vision
22. “MAX232, MAX232I Dual EIA-232 Drivers/Receivers”, http://focus.ti.com/lit/ds/symlink/max232.pdf, Texas Instruments, February 1989, revised March 2004.
23. Wichit Sirichote, “RS232C Level Converter”, http://chaokhun.kmitl.ac.th/~kswichit/MAX232/MAX232.htm
24. Shoaib Ali, “To Test MAX232”, http://www.arcelect.com/rs232.htm, 06/30/09.
25. “Low Power Quad Operational Amplifiers”, http://www.datasheetcatalog.com/LM324, STMicroelectronics, 2001.
26. “Tobacco Sorter TS3”, http://www.key.net/products/tobacco-sorter/default.html
APPENDIX A
2D MECHANICAL DRAWING (BODY TOP VIEW)
APPENDIX B
2D MECHANICAL DRAWING (SLOTS TOP VIEW)
APPENDIX C
2D MECHANICAL DRAWING (OVERALL TOP VIEW)
APPENDIX D
MICE EXHIBITION CERTIFICATE
APPENDIX E
PIC18F452 CODING
int i = 300;                // pulse count for open-loop motor moves
int j = 1;
unsigned char b;            // command byte received over the USART
/////////////////////////////////////
void motorleft(void)        // function that moves the motor to the left
{
    i = 300;                // reset the pulse count before every move
    while (i--)
    {
        PORTA.F5 = 1;
    }
    PORTA.F5 = 0;
}
///////////////////////////////////////
void picker(void)           // function for rejecter movement
{
    PORTB.F1 = 1;           // forward (push)
    Delay_ms(110);
    PORTB.F1 = 0;
    Delay_ms(300);
    PORTB.F2 = 1;           // backward (pull back)
    Delay_ms(120);
    PORTB.F2 = 0;
}
///////////////////////////////////////
void reachline1(void)
{
    while (PORTD.F0 != 0)   // while the sensor does not sense the slit, keep moving
    {
        PORTA.F5 = 1;
        Delay_us(4);        // utilizes the concept of PWM to control motor speed
    }
    PORTA.F5 = 0;
}
//////////////////////////////////////
void escapeline(void)
{
    while (PORTD.F0 == 1)   // while the sensor is sensing the slit, move away from it
    {
        PORTA.F5 = 1;
    }
    PORTA.F5 = 0;
}
///////////////////////////////////////
void reachfollowing(void)   // reach the following slit
{
    while (PORTD.F0 != 1)
    {
        PORTA.F5 = 1;
        Delay_us(8);        // PWM
        PORTA.F5 = 0;
    }
    PORTA.F5 = 0;
}
//////////////////////////////////////
void escapelinej(void)      // while the sensor is sensing the slit, move away from it
{                           // (movement towards the reset position)
    while (PORTD.F0 == 1)
    {
        PORTB.F4 = 1;
    }
    PORTB.F4 = 0;
}
//////////////////////////////////////
void reachfollowingj(void)  // reach the following slit (towards the reset position)
{
    while (PORTD.F0 != 1)
    {
        PORTB.F4 = 1;
        Delay_us(13);       // PWM
        PORTB.F4 = 0;
    }
    PORTB.F4 = 0;
}
///////////////////////////////////////
void overshoot_test(void)   // test whether the slot position has overshot
{
    if (PORTD.F0 == 0)
    {
        while (PORTD.F0 != 1)   // if overshot, move back to find the slit
        {
            PORTB.F4 = 1;
            Delay_us(3);        // PWM: very slow movement back to the slit
            PORTB.F4 = 0;
        }
        PORTB.F4 = 0;
    }
}
/////////////////////////////////////////
void main() {
    Usart_Init(9600);       // initialize the USART module (8 data bits, 9600 baud, no parity bit)
    ADCON1 = 6;             // configure the PORTA pins as digital I/O
    TRISD = 0xff;           // PORTD as input (slit sensor)
    TRISA = 0;
    PORTA = 0;
    TRISB = 0;
    PORTB = 0;
    while (1)
    {
        if (Usart_Data_Ready())
        {
            b = Usart_Read();
            if (b == 'i')               // move motor to the left
            {
                motorleft();
                Usart_Write(b - 32);    // acknowledge by echoing the uppercase letter
            }
            if (b == 'j')
            {
                PORTA.F1 = 0;
                PORTA.F0 = 0;
                i = 300;                // reset the pulse count before the move
                while (i--)
                {
                    PORTB.F4 = 1;
                }
                PORTB.F4 = 0;
                Usart_Write(b - 32);
            }
            if (b == 'p')               // picker or rejecter
            {
                picker();
                Usart_Write(b - 32);
            }
            if (b == 'v')               // OCR NOT PASS
            {
                PORTA.F1 = 0;
                PORTA.F0 = 1;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Delay_ms(500);
                picker();
                Delay_ms(1000);
                Usart_Write(b - 32);
            }
            if (b == 't')               // OCR PASS
            {
                PORTA.F1 = 1;
                PORTA.F0 = 0;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Usart_Write(b - 32);
            }
            if (b == 'u')
            {
                reachfollowing();
                Usart_Write(b - 32);
            }
            if (b == 'k')               // SHAPE PASS
            {
                PORTA.F1 = 1;
                PORTA.F0 = 0;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Usart_Write(b - 32);
            }
            if (b == 'l')               // SHAPE NOT PASS
            {
                PORTA.F1 = 0;
                PORTA.F0 = 1;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Delay_ms(500);
                picker();
                Delay_ms(1000);
                Usart_Write(b - 32);
            }
            if (b == 'm')               // COLOR PASS (regional)
            {
                PORTA.F1 = 1;
                PORTA.F0 = 0;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Usart_Write(b - 32);
            }
            if (b == 'n')               // COLOR NOT PASS (regional)
            {
                PORTA.F1 = 0;
                PORTA.F0 = 1;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Delay_ms(500);
                picker();
                Delay_ms(1000);
                Usart_Write(b - 32);
            }
            if (b == 'o')               // COLOR PASS (percentage)
            {
                PORTA.F1 = 1;
                PORTA.F0 = 0;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Usart_Write(b - 32);
            }
            if (b == 'r')               // COLOR NOT PASS (percentage)
            {
                PORTA.F1 = 0;
                PORTA.F0 = 1;
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Delay_ms(500);
                overshoot_test();
                Delay_ms(500);
                picker();
                Delay_ms(1000);
                Usart_Write(b - 32);
            }
            if (b == 'x')               // motor to the right, through each slit
            {
                escapelinej();
                reachfollowingj();
                escapelinej();
                reachfollowingj();
                escapelinej();
                reachfollowingj();
                escapelinej();
                reachfollowingj();
                Usart_Write(b - 32);
            }
            if (b == 'z')               // motor to the left, through each slit
            {
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                escapeline();
                Delay_ms(1000);
                reachfollowing();
                Usart_Write(b - 32);
            }
            else if (b == 'a' || b == 'b' || b == 'c' || b == 'd' || b == 'e' || b == 'f' || b == 'g' || b == 'h')
            {
                Usart_Write(b - 32);    // echo the slot commands as uppercase
            }
        }
    }
}