Introduction to Classification Methods for Printed Chinese Character Recognition


The recognition method is the core of the entire system. Pattern recognition methods for Chinese character recognition can be broadly divided into three groups: structural pattern recognition, statistical pattern recognition, and methods that combine the two. Each is introduced below.

Structural pattern recognition

Chinese characters are a special kind of pattern. Although their structure is complicated, it follows rather strict rules. In other words, a Chinese character image contains rich structural information, and one can extract structural features and composition rules that capture this information and use them as the basis for recognition. This is structural pattern recognition.

Structural pattern recognition was the main approach in early Chinese character recognition research. Its starting point is the way Chinese characters are composed: a character is built from strokes (dots, horizontal and vertical strokes, and so on) and radicals, or, more generally, from smaller structural primitives. These primitives and their interrelationships can describe a character precisely, just as an article is composed of characters, words, phrases, and sentences according to grammatical rules. For this reason the method is also called syntactic pattern recognition. During recognition, the structural information is analyzed with syntactic parsing methods, much like a logical reasoner.

Describing the structure of Chinese characters in this way is theoretically appropriate. Its main advantages are strong adaptability to font variations and a good ability to distinguish similar characters. In practice, however, its main weakness is poor resistance to noise, because real text images contain many kinds of disturbance, such as skew, distortion, broken strokes, touching strokes, stains on the paper, and poor contrast. These factors directly affect the extraction of the structural primitives, and if the primitives cannot be extracted accurately, the subsequent reasoning has nothing reliable to build on. In addition, structural descriptions are complicated and the matching process is computationally expensive. As a result, pure structural methods have gradually declined in printed Chinese character recognition, and syntactic recognition faces increasing challenges.

Statistical pattern recognition

Statistical decision theory developed earlier and is more mature. The main idea is to extract a set of statistical features from the pattern to be recognized and then make a classification decision using a decision function determined by some criterion.

Statistical pattern recognition of Chinese characters treats the character bitmap as a whole and derives its features from statistics computed over the whole image. Statistical features are robust to noise, and the matching and classification algorithms are simple and easy to implement. The drawback is weak fine-discrimination ability, so similar characters are harder to tell apart. Common statistical pattern recognition methods include:

(1) Template matching. Template matching requires no separate feature extraction step: the character image itself is used as the feature. It is compared with each template in the dictionary, and the class of the most similar template is the recognition result. The method is simple and can be parallelized; however, a template only recognizes characters of the same size and font, and the method adapts poorly to skew and to thickened strokes.

(2) Transform features. The character image is processed with a binary transform (such as the Walsh or Hadamard transform) or a more complex transform (such as the Karhunen-Loeve, Fourier, cosine, or slant transform), and only a subset of the transform coefficients is kept, which greatly reduces the feature dimension. However, these transforms are not rotation-invariant, so recognition of skewed or deformed characters deviates considerably. Binary transforms are cheap to compute, but the resulting features have no obvious physical meaning. The K-L transform is optimal in the minimum mean-square-error sense, but its computational cost is too high for practical use. In short, transform features carry high computational complexity.

(3) Projection histograms. The character image is projected onto the horizontal and vertical directions, and the projection profiles are used as features. This method is very sensitive to skew and rotation and has poor fine-discrimination ability.

(4) Geometric moment features. M. K. Hu proposed using moment invariants as features, which sparked a wave of research on moments, and researchers subsequently derived dozens of moment invariants. The common goal is to find stable, reliable features that adapt well to many kinds of disturbance, and research on geometric moments reflects this goal. The geometric moments described above are invariant under linear transformations; in real images, however, it is hard to guarantee that distortions are linear.

(5) Spline curve approximation and Fourier descriptors. Both methods operate on the contour of the character image. Spline approximation finds points of high curvature on the contour and fits spline curves to the contour segments between adjacent points. Fourier descriptors represent a closed contour with a Fourier series and use the series coefficients as features. The former is very sensitive to rotation. The latter does not apply to characters whose contours are not closed, so it is difficult to use for characters with broken strokes.

(6) Stroke density features. Stroke density has several definitions; the following is used here: the stroke density of a given region of a character image is the number of times a fixed number of scan lines in the horizontal, vertical, or diagonal direction cross strokes within that region. This feature describes how densely the strokes are distributed in each part of the character and provides fairly complete information. It is quite stable when image quality is good, and it is also widely used in off-line handwriting recognition. However, the error becomes large when strokes inside the character touch each other.

(7) Peripheral features. The outline of a Chinese character carries rich information, and even when the strokes inside the character touch, the outline information remains relatively complete. This makes the feature well suited to coarse classification.

(8) Microstructure features. This method starts from the observation that Chinese characters are composed of strokes, and strokes can in turn be decomposed into rectangular segments with a certain direction, positional relationship, and aspect ratio. These rectangular segments are called microstructures. Recognizing Chinese characters from the microstructures and the relationships between them has produced good results, especially for multi-font recognition. The drawback is that microstructures are hard to extract when internal strokes touch.

(9) Feature point features. As early as 1957, the Solartron Electronic Group released the first OCR system based on the peephole method. Its main idea is to distinguish characters by a set of representative black points (stroke pixels) and white points (background pixels) in the character bitmap. The method was later applied to Chinese character recognition, with attributes such as endpoint, vertex, and intersection attached to the black points, and it also gave good results. It adapts well to characters whose internal strokes touch and is intuitive, but its features are hard to represent as a vector, which makes them unsuitable for coarse classification and difficult to match.

There are, of course, many other kinds of statistical features, such as the graph description method, inclusion selection method, shelling perspective method, and differential stroke method, which are not introduced individually here.
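Item (1), template matching, can be illustrated with a minimal Python sketch. It assumes the character image and every template are NumPy arrays of 0s and 1s (1 = stroke pixel) that have already been normalized to the same size; the pixel-agreement similarity and the function name `match_template` are illustrative choices, not a method taken from a specific system.

```python
import numpy as np

def match_template(image, templates):
    """Return (label, score) of the template most similar to `image`.

    `templates` maps a class label to a binary template array. Similarity is
    the fraction of pixels on which image and template agree (an assumption;
    correlation-based scores are also common).
    """
    img = np.asarray(image)
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        tmpl = np.asarray(tmpl)
        if tmpl.shape != img.shape:
            raise ValueError(f"template {label!r} has shape {tmpl.shape}, expected {img.shape}")
        score = float(np.mean(tmpl == img))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Because the score is computed pixel by pixel, shifting or slanting the input by even one pixel lowers it, which is the poor adaptability to skew and stroke thickness noted in item (1).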
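For item (2), the sketch below computes a 2-D Walsh transform of a square binary character image (side length a power of 2, an assumption) and keeps only the lowest-sequency k×k coefficients, showing how the feature dimension drops from n² to k². The Walsh matrix is obtained by sorting the rows of a Sylvester-type Hadamard matrix by their number of sign changes; the helper names are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix with +1/-1 entries."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def walsh(n):
    """Hadamard rows reordered by sequency (number of sign changes)."""
    h = hadamard(n)
    sign_changes = np.count_nonzero(np.diff(h, axis=1), axis=1)
    return h[np.argsort(sign_changes)]

def walsh_features(image, keep=4):
    """2-D Walsh transform of a square image, truncated to the keep x keep
    lowest-sequency coefficients and flattened into a feature vector."""
    img = np.asarray(image, dtype=float)
    n = img.shape[0]
    if img.shape != (n, n):
        raise ValueError("image must be square")
    w = walsh(n)
    coeffs = w @ img @ w.T / n  # w @ w.T == n * I, so the full transform is invertible
    return coeffs[:keep, :keep].ravel()
```

Only additions and subtractions of pixel values are involved, which is why item (2) calls binary transforms cheap; the coefficients themselves, however, have no direct geometric interpretation.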
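The projection histograms of item (3) reduce to row and column sums. This sketch assumes a 0/1 NumPy image with 1 for stroke pixels; the function name is illustrative.

```python
import numpy as np

def projection_features(image):
    """Concatenate the horizontal projection (stroke pixels per row)
    and the vertical projection (stroke pixels per column)."""
    img = np.asarray(image, dtype=int)
    row_profile = img.sum(axis=1)
    col_profile = img.sum(axis=0)
    return np.concatenate([row_profile, col_profile])
```

For a character tilted by a few degrees, a horizontal stroke spreads over several rows and its sharp peak in the row profile flattens, which is the tilt sensitivity mentioned above.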
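To make item (4) concrete, the sketch computes normalized central moments and the first two of Hu's seven moment invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11², where η_pq = μ_pq / μ00^(1+(p+q)/2) and μ_pq is the central moment about the centroid. The image is assumed to be a NumPy array with x as the column index and y as the row index; only φ1 and φ2 are shown for brevity, and the function names are illustrative.

```python
import numpy as np

def normalized_central_moment(image, p, q):
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2), with x = column, y = row."""
    img = np.asarray(image, dtype=float)
    m00 = img.sum()
    if m00 == 0:
        raise ValueError("image has no stroke pixels")
    y, x = np.indices(img.shape)
    xc = (x * img).sum() / m00  # centroid column
    yc = (y * img).sum() / m00  # centroid row
    mu_pq = ((x - xc) ** p * (y - yc) ** q * img).sum()
    return mu_pq / m00 ** (1 + (p + q) / 2)

def hu_first_two(image):
    """First two Hu moment invariants (phi1, phi2) of an image."""
    eta20 = normalized_central_moment(image, 2, 0)
    eta02 = normalized_central_moment(image, 0, 2)
    eta11 = normalized_central_moment(image, 1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2
```

Shifting the pattern or rotating it by 90 degrees leaves both values unchanged; arbitrary rotations and scalings are only approximately preserved on a pixel grid, which echoes the text's point that real distortions rarely satisfy the ideal assumptions.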
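The Fourier descriptors of item (5) can be sketched as follows, assuming the contour has already been traced into an ordered list of (x, y) boundary points (contour tracing itself is not shown). Each point is treated as a complex number z = x + iy, and the FFT coefficients are normalized in one standard way: dropping the zero-frequency term removes translation, taking magnitudes removes rotation and the choice of starting point, and dividing by the first harmonic removes scale. This normalization is an illustrative choice, not necessarily the one used by the systems the text surveys.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=8):
    """Translation-, rotation-, scale- and start-point-invariant Fourier
    descriptors of a closed contour given as an ordered (N, 2) array of (x, y)."""
    pts = np.asarray(contour, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]         # boundary as a complex sequence
    coeffs = np.fft.fft(z)
    mags = np.abs(coeffs[1:n_coeffs + 1])  # drop coeffs[0] (centroid), keep magnitudes
    if mags[0] == 0:
        raise ValueError("first harmonic is zero; cannot normalize scale")
    return mags / mags[0]                  # divide by |first harmonic| for scale
```

The sketch presupposes a single closed boundary, which is exactly why the text finds Fourier descriptors hard to apply to characters with broken strokes.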
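A minimal sketch of item (6): it counts how many separate runs of stroke pixels each of a fixed number of evenly spaced horizontal and vertical scan lines passes through. It assumes a 0/1 NumPy image; diagonal scan lines (obtainable, for example, with `np.diagonal`) are omitted, and the even spacing and function names are illustrative choices.

```python
import numpy as np

def count_runs(line):
    """Number of separate runs of stroke pixels (1s) along one scan line."""
    padded = np.concatenate([[0], np.asarray(line, dtype=int), [0]])
    return int(np.count_nonzero(np.diff(padded) == 1))  # count 0 -> 1 transitions

def stroke_density(image, n_lines=4):
    """Crossing counts for n_lines horizontal and n_lines vertical scan lines."""
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    # Evenly spaced scan lines, excluding the outermost border rows/columns.
    rows = np.linspace(0, h - 1, n_lines + 2)[1:-1].round().astype(int)
    cols = np.linspace(0, w - 1, n_lines + 2)[1:-1].round().astype(int)
    horizontal = [count_runs(img[r, :]) for r in rows]
    vertical = [count_runs(img[:, c]) for c in cols]
    return np.array(horizontal + vertical)
```

If two neighbouring strokes touch, their runs merge into one and the count drops, which is the error source the text mentions for touching internal strokes.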
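Item (7) can be sketched with one simple definition, assumed here: for each scan line, the distance from the image border to the first stroke pixel. The sketch computes this profile from the left edge, obtains the other three sides by rotating the image with `np.rot90`, and averages each side's profile over a few strips to get a short vector suited to coarse classification. The strip averaging and function names are illustrative assumptions.

```python
import numpy as np

def left_profile(image):
    """For each row, the number of background pixels before the first stroke
    pixel when scanning from the left border (the row width if the row is empty)."""
    img = np.asarray(image).astype(bool)
    width = img.shape[1]
    first = np.argmax(img, axis=1)  # index of first True; 0 for empty rows
    return np.where(img.any(axis=1), first, width)

def peripheral_features(image, n_strips=2):
    """Average peripheral distance per strip for the left, top, right and
    bottom sides; rotating by k * 90 degrees brings each side to the left."""
    feats = []
    for k in range(4):
        profile = left_profile(np.rot90(image, k))
        for strip in np.array_split(profile, n_strips):
            feats.append(float(strip.mean()))
    return np.array(feats)
```

Because only the first stroke pixel on each line matters, strokes that touch deeper inside the character do not change the feature, in line with the robustness claimed above.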
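For item (9), the sketch below labels endpoints and intersections on a thinned (one-pixel-wide) skeleton using the crossing number, the count of 0-to-1 transitions when walking once around a pixel's eight neighbours. A crossing number of 1 marks an endpoint and 3 or more marks a fork or intersection. This is a standard skeleton-analysis technique chosen for illustration; the text does not say which method the cited systems used, thinning is assumed to have been done already, and the function names are hypothetical.

```python
import numpy as np

# The 8 neighbours of a pixel, listed clockwise starting at the top-left.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def crossing_number(padded, r, c):
    """Number of 0 -> 1 transitions walking once around pixel (r, c)."""
    ring = [int(padded[r + dr, c + dc]) for dr, dc in RING]
    return sum(abs(ring[i] - ring[(i + 1) % 8]) for i in range(8)) // 2

def feature_points(skeleton):
    """Return (endpoints, intersections) of a skeleton as lists of (row, col)."""
    sk = (np.asarray(skeleton) > 0).astype(int)
    padded = np.pad(sk, 1)  # zero border so every pixel has 8 neighbours
    endpoints, intersections = [], []
    for r, c in np.argwhere(sk):
        cn = crossing_number(padded, r + 1, c + 1)
        if cn == 1:
            endpoints.append((int(r), int(c)))
        elif cn >= 3:
            intersections.append((int(r), int(c)))
    return endpoints, intersections
```

The result is a variable-length list of labelled points rather than a fixed-length vector, which illustrates why the text calls these features hard to vectorize and hard to match.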

Combination of statistical and structural recognition

Structural and statistical pattern recognition each have their own strengths and weaknesses, and as our understanding of both deepens, the two are gradually merging. The grid feature is a product of this combination. The character image is divided, uniformly or non-uniformly, into regions called "grids", and various features are computed within each grid: the ratio of stroke pixels to background pixels, the number of intersections, stroke endpoints, the length of the thinned strokes, the stroke density within the grid cell, and so on. Because the features are accumulated per grid cell, errors at individual points have little effect, which strengthens the features' resistance to noise. This method is increasingly widely used.
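The grid idea can be sketched in a few lines. This version assumes a 0/1 NumPy image, uses a uniform grid (cells may differ by one pixel when the image size is not divisible by the grid size), and records only the ratio of stroke pixels in each cell; the other per-cell statistics listed above would be appended in the same loop. The function name is illustrative.

```python
import numpy as np

def grid_features(image, grid=(4, 4)):
    """Split a binary character image into grid[0] x grid[1] roughly equal
    cells and return the fraction of stroke pixels in each cell, row by row."""
    img = np.asarray(image, dtype=float)
    feats = []
    for band in np.array_split(img, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            feats.append(cell.mean() if cell.size else 0.0)
    return np.array(feats)
```

A single noisy pixel changes one cell's ratio by only 1/(cell area), which is the damping effect on point errors described above.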

Artificial neural networks

An artificial neural network (ANN) is a network structure that imitates the neurons of the human brain. It is an adaptive nonlinear dynamic system made up of a large number of simple, interconnected basic elements (neurons). Research on biological neurons is still incomplete, and we cannot tell whether ANNs work the same way as the neurons in the human brain, but ANNs are attracting more and more attention.

Each neuron in an ANN has a fairly simple structure and function, but a large number of them combined can form a very complex system. ANNs also have some capacity for adaptive learning and self-organization. The "cells" that make up the network can work in parallel, and complex functions such as classification and recognition can be accomplished by adjusting the connection weights between them. This is something a conventional von Neumann computer cannot do.

An ANN can serve either as a simple classifier (without feature extraction or selection) or as a full-featured classifier. For problems with few classes, such as recognizing English letters and digits, the character bitmap is often fed directly into the network. Unlike traditional pattern recognition methods, the features extracted by the network in this case have no obvious physical meaning; they are stored in the connection weights between neurons, which frees people from having to design the feature extraction method and its implementation. In this sense, ANNs make "automatic character recognition" possible. In addition, an ANN classifier is nonlinear and can form complex decision boundaries between classes that are difficult to imagine, which offers a possible solution to complex classification problems.
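To illustrate feeding the bitmap directly into the network, here is a small sketch: a one-hidden-layer network (tanh hidden units, softmax output) trained by full-batch gradient descent on flattened 0/1 bitmaps. The architecture, learning rate, epoch count, and function names are arbitrary assumptions for a toy example, not values from any recognition system; the learned weights `w1` and `w2` play the role of the features "without obvious physical meaning" described above.

```python
import numpy as np

def train_mlp(X, y, n_classes, hidden=16, lr=0.2, epochs=3000, seed=0):
    """Train on raw pixel vectors X (n_samples x n_pixels) with integer labels y.
    Returns the parameters and the per-epoch mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output.
        h = np.tanh(X @ w1 + b1)
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.log(probs[np.arange(n), y]))))
        # Backward pass for the mean cross-entropy loss.
        d_logits = (probs - onehot) / n
        d_w2 = h.T @ d_logits
        d_b2 = d_logits.sum(axis=0)
        d_h = (d_logits @ w2.T) * (1.0 - h ** 2)
        d_w1 = X.T @ d_h
        d_b1 = d_h.sum(axis=0)
        # Gradient-descent update of the connection weights.
        w1 -= lr * d_w1
        b1 -= lr * d_b1
        w2 -= lr * d_w2
        b2 -= lr * d_b2
    return (w1, b1, w2, b2), losses

def predict(params, X):
    """Class index with the highest output score for each row of X."""
    w1, b1, w2, b2 = params
    return np.argmax(np.tanh(X @ w1 + b1) @ w2 + b2, axis=1)

def bitmap(rows):
    """Flatten a list of strings ('#' = stroke) into a 0/1 pixel vector."""
    return np.array([[ch == "#" for ch in row] for row in rows], dtype=float).ravel()
```

For example, three 5×5 toy patterns (a vertical bar, a horizontal bar, and a cross) built with `bitmap` can be stacked into `X` with labels `[0, 1, 2]`. A real alphabet would need many samples per class and a held-out test set; the toy patterns only demonstrate the mechanics.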

At present, for problems with a very large number of classes, such as Chinese character recognition, ANNs become very large and complex and are still far from practical. There are many reasons for this; the main one is that we still lack complete answers to many questions about how the human brain works and about ANNs themselves.
