Hybrid Architecture based on RNN-SVM for Multilingual Online Handwriting Recognition using Beta-elliptic and CNN models

Currently, deep learning-based approaches have proven successful in the handwriting recognition field. Despite this, research in this area is still required, particularly in multilingual online handwriting recognition, by exploiting new network architectures and combining relevant parametric models. In this paper, we propose a multi-stage deep learning-based algorithm for multilingual online handwriting recognition built on hybrid deep Bidirectional Long Short-Term Memory (DBLSTM) and SVM networks. The main contributions of our work lie in the composition of a new multi-stage architecture of deep learning networks associated with effective feature vectors that integrate dynamic and visual characteristics. First, the proposed system pre-processes the acquired script and delimits its Segments of Online Handwriting Trajectories (SOHTs). Second, two types of feature vectors combining the Beta-Elliptic Model (BEM) and a Convolutional Neural Network (CNN) are extracted for each SOHT. Then, the SOHT feature vectors are fuzzily classified using DBLSTM networks, whose outputs are finally fused by an SVM engine to recognize the overall script.


Introduction
Automatic handwriting recognition has regained importance with the diffusion of intelligent mobile devices such as PDAs, tablet PCs, and smartphones equipped with a stylus or touch screen. These devices allow us to easily record online handwriting input in the so-called digital ink format. This format describes temporal and spatial information on the sequence of points M_i(x_i, y_i, t_i) sampling the pen or touch trajectory [1] (where x_i, y_i denote the position coordinates at instant t_i). The high popularity of these hand-held devices and their smart way of inputting data, coupled with recent major progress in machine learning, especially deep learning techniques in speech recognition, has invited researchers to adopt and develop deep learning approaches to build more efficient online handwriting recognition systems.
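The digital-ink representation above can be sketched as a plain sequence of timestamped points; the structure and units below are illustrative assumptions of ours, not taken from the paper:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InkPoint:
    x: float  # horizontal position
    y: float  # vertical position
    t: float  # timestamp in seconds

# A trajectory is the ordered sequence of sampled pen points M_i(x_i, y_i, t_i).
trajectory: List[InkPoint] = [
    InkPoint(0.0, 0.0, 0.000),
    InkPoint(1.2, 0.5, 0.008),
    InkPoint(2.1, 1.4, 0.016),
]

def curvilinear_velocity(p0: InkPoint, p1: InkPoint) -> float:
    """Approximate curvilinear speed between two consecutive samples."""
    dt = p1.t - p0.t
    return ((p1.x - p0.x) ** 2 + (p1.y - p0.y) ** 2) ** 0.5 / dt
```

Such per-sample velocities are what the Beta-elliptic segmentation of section 6 operates on.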
The variability of writing styles, coupled with the incorporation of multilanguage words in the same online handwriting script, poses challenges for the recognition task. In fact, besides shape variation and the opposed evolution directions of scripts such as Arabic and Latin, there are more character categories depending on their positions in the word, especially for Arabic script. This makes it difficult to directly adapt existing recognition systems to such tasks and calls for a new architecture based on the combination of deep learning and relevant complementary models.
One of the most relevant criteria for a successful online recognition system is the choice of a powerful handwriting model. A wide number of handcrafted features, such as structural, parametric, and global features, are described in the literature to overcome specific problems like occlusions, ligatures, and scale variations. Moreover, many other properties of human handwriting movements, such as speed, amplitude, and script size, can be modified involuntarily without altering the global shape of the velocity profile modeling the handwriting trajectory [2]. The hybrid kinematic and geometric aspects of online handwriting generation movements are captured by the Beta-elliptic model (BEM) [3], which has attained immense success in many fields such as handwriting regeneration [4], writer identification [5], [6], and temporal order recovery [7]. The handwritten trajectory is normally segmented into n simple movements, limited by curvature or velocity extrema, called strokes. In our work, we exploit the great benefit of this BEM in online handwriting trajectory modeling.
Lately, deep learning technologies [8] have attracted immense research interest and industrial applications. These new categories of methods provide powerful representations for end-to-end solutions. For instance, the CNN is a prominent deep learning method which automatically extracts useful features from low to high level. It has been widely used in the pattern recognition field with excellent results. Also, recurrent neural networks (RNNs) have been employed successfully in online handwriting recognition. They are very effective, with LSTM, for handwriting sequence generation [9] and produce top results in online handwriting recognition [10]. Motivated by these ideas and noting the great diversity of writing styles, we are interested in applying CNN- and RNN-based models to our problem. In this paper, we present a multi-stage system that recognizes online handwritten multi-script text based on DBLSTM and the SVM algorithm.
The main idea is to combine two different approaches for handwritten trajectory representation: the enhanced BEM, which combines two aspects of online data modeling, namely the geometric and dynamic profiles, and the CNN as an efficient feature extraction model applied after transforming the online trajectory into image-like representations. We aim to overcome the limits of the online module by allowing the recognition system to apprehend the different writing styles that differ in the timeline of the ink plot but are similar in the final layout. The feature set we employed is described in section 6. Indeed, we compare the performance of single classifiers and their combination with the aim of enhancing the global system discrimination, using three available datasets described later.
The remainder of this paper is organized as follows: In section 2, we present an outline of the most prominent related works. Section 3 provides a brief description of the scripts on which we mainly worked. An overview of our proposed recognition process is given in section 4. Section 5 summarizes the pre-processing techniques used in our work. We describe in section 6 the proposed strategy for handwritten trajectory modeling and the pre-classification step of SOHTs. The recognition process, which includes SOHT fuzzy classification, regularization, data augmentation, and script recognition, is presented in section 7. Finally, section 8 describes the evaluation of the proposed system using three databases and discusses the obtained results before ending with a conclusion and future scope of work.

Related work
Many studies have been done on online handwriting recognition [1], [2]. It remains an active area of research because practical applications are relatively recent, and its technology has been applied in many fields such as artificial intelligence, computer engineering, and image analysis. In the literature, studies dealing with online handwriting recognition can be classified into two main categories: conventional (traditional) approaches and deep learning-based approaches.

The conventional approaches
Generally, conventional approaches require an enormous effort of engineering expertise to design a representative feature extractor. Various studies have been made in this context, such as decomposition of the online signal into elementary strokes based on the Beta-Elliptic model, characteristic strokes [3], or grapheme units as presented in [11], which is based on a baseline detection algorithm. Also, tangent differences and histograms of tangent features are used to represent online Arabic characters [12]. A framework based on statistical features for online Arabic character recognition is investigated in [13]. In this context, some studies implement conventional pattern classification algorithms, including decision trees [14], template matching [15], Hidden Markov Models (HMM) [16], neural networks [17], and support vector machines (SVM) [18]. More recently, Zitouni et al. [19] have proposed a two-stage SVM classifier for online Arabic script recognition based on a combination of beta-elliptic and fuzzy perceptual code representations. Also, a reinforcement learning-based approach for online handwriting recognition is proposed by [20]. It consists of extracting structural features using Freeman chain codes and elementary perceptual codes (EPCs), and parametric features employing beta stroke theory, after segmenting the handwriting trajectory into strokes.

Deep learning-based approaches
Deep learning models extract features automatically from raw sequential data. Recently, the performance of online handwriting recognition systems has been significantly enhanced thanks to the employment of models such as convolutional neural networks (e.g., [18], [21]) and Deep Belief Networks (DBN) (e.g., [22]), which have been broadly applied to handwriting recognition. The CNN is considered a powerful tool for image classification, and some research works use CNNs on online characters after converting the handwriting trajectory to image-like representations. Also, Yuan et al. [23] proposed word-level recognition for Latin script employing a CNN architecture, where the CNN is used for recognizing online handwritten documents after word segmentation.
Likewise, CNNs have been widely applied to handwritten Chinese text recognition [24]. They are considered powerful models that automatically generate multiple lower-level and high-level features.
Furthermore, RNNs have been used successfully for online handwriting recognition due to their ability to model sequential data. They give good results with Long Short-Term Memory (LSTM) for online Arabic character recognition based on a rough path signature [25]. Likewise, Ghosh et al. [26] have performed word recognition for both cursive and non-cursive Devanagari and Bengali scripts based on LSTM and BLSTM recurrent networks. In that work, each word is partitioned into three horizontal zones (upper, middle, lower); the middle zone is then re-segmented into basic strokes, from which directional and structural features are extracted before training the LSTM and BLSTM. To improve the performance of deep BLSTM in online Arabic handwriting recognition, Maalej et al. [27] employed three techniques: dropout, Maxout, and the ReLU activation function. These techniques are used at different positions in the BLSTM layers and provide good results. The dropout technique is used to solve the overfitting problem. The second technique is based on rectified linear units (ReLU), placed between the LSTM hidden layers in order to reduce the label error rate. For the third technique, Maxout units are integrated inside the LSTM nodes and also stacked in a separate layer after the BLSTM layers.
Combinations of these techniques have also been investigated. An instance of this is CNN with deep BLSTM (e.g., [28]), in which the CNN automatically generates features from the scanned ink sequence, while the BLSTM models frame dependency within the sequence. RNN with BLSTM (e.g., [29]) and BLSTM with gated recurrent units (GRU) have been used for online Chinese character recognition and generation [30].
Further, several studies focus specifically on mono-language scripts, for instance Arabic [12], Chinese [7], Tamil [31], and Japanese [26]. Similarly, there exist multi-script and multi-language systems for online handwriting recognition, like those presented in [32], [33], as well as commercial systems such as those developed by Apple [34] and Microsoft [29].

Language specification
In this section, we briefly provide an overview of Arabic and Latin scripts supported by our system, considering the writing diversity of these scripts compared to others.
The Arabic script is used for multiple languages such as Persian, Pashto, Kurdish, and Urdu [13]. More than 420 million people in the world use Arabic as a major language and script (UNESCO 2012) [35]. Arabic script is generally written in a cursive manner: most letters are connected to their neighboring characters. It contains 28 basic letters with 10 additional marks that can change the meaning of a word, as well as 10 digits for the usual digit recognition tasks. While English has 26 alphabetic letters, Arabic has 28, plus positional variants. In running text, Arabic characters are written from right to left and also contain small marks and dots. Moreover, the letters have multiple shapes that depend on their position in the word. We can distinguish four different forms (Beginning, Middle, Isolated, and End) of the Arabic letters according to their position within a word, as shown in Table 1. However, some characters have the same beginning and isolated shapes (e.g., Alif ' ', Raa ' '). Also, several letters share the same body (e.g., Baa ' ', Thaa ' ') and differ only in the position and number of dots and diacritical marks, which may be accidentally misplaced in handwriting; this makes Arabic script more difficult to recognize than many other scripts. Further details of Arabic characteristics and writing difficulties are available in [36].
The Latin script is the most widely adopted writing system in the world, used as the standard set of writing glyphs for 59 languages. Depending on the writing style, Latin script can be written cursively or semi-cursively, and the characters' shapes vary accordingly. In addition to the basic 2 × 26 characters (capital and lowercase shapes), it also supports accented characters.
Unlike Arabic script, the Latin script is written from left-to-right.

System overview
In this section, we introduce an outline of our proposed multi-stage architecture based on a hybrid of DBLSTM recurrent neural networks and SVM for multilingual online handwriting recognition. Indeed, the main purpose of our work is to adapt and exploit the effectiveness of the BEM and CNN models for online multilingual handwriting recognition.
As shown in Fig. 1, the proposed system proceeds as follows: First, the input signal (x, y) is normalized and denoised by a preprocessing technique. Second, the handwriting trajectory is divided into continuous components called SOHTs (Segments of Online Handwriting Trajectory), each of which represents a trajectory segment limited between two successive points: a starting point (pen-down) and an ending point (pen-up). Third, we extract two types of feature vectors for each SOHT: the online hand-drawn trajectory parameters extracted using the BEM, and the generic features generated by the CNN after converting the trajectory into an image-like representation.

Preprocessing
The handwriting trajectories are collected online via a digitizing device.
These trajectories are characterized by high variation, which requires geometric correction and denoising steps to reduce the handwriting variability, eliminate noise, and normalize the handwriting size. Given the handwriting trajectory, Chebyshev type II low-pass filtering with a cut-off frequency of f_cut = 10 Hz is applied to attenuate the effect of noise and errors caused by the spatial and temporal quantification of the acquisition system. The value of the cut-off frequency results from a trade-off between conserving the handwriting ripple produced by young agile writers and eliminating the physiological tremors generated by older writers [37].
The used filter has a magnitude transfer function given by Eq. 1, |H(jω)|² = 1 / (1 + [ε C_N(ω_s/ω)]⁻²), where C_N is the Chebyshev polynomial of order N given in Eq. 2, ω_s is a frequency scaling constant, and ε is a constant that adjusts the influence of C_N(ω_s/ω) in the denominator of |H(jω)|². Consequently, the hyperbolic cosine form of C_N in Eq. 2 applies at low frequencies, which, from Eq. 1, yields a response near unity; the trigonometric cosine applies at high frequencies beyond ω_s, resulting in a rippling response of small magnitude.
As shown in Fig. 2, the frequency response R(f) of the Chebyshev type II low-pass filter is described by Eq. 3, where A is the required sidelobe attenuation in decibels (dB). Fig. 3 shows an example of applying the Chebyshev type II low-pass filter to the online handwriting trajectory of the Arabic character ' '.
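As a sketch of this filtering step, SciPy's Chebyshev type II designer can build such a filter. Only the 10 Hz cut-off comes from the text; the sampling rate, filter order, and stopband attenuation below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import cheby2, filtfilt

FS = 100.0      # tablet sampling frequency in Hz (assumption)
F_CUT = 10.0    # cut-off frequency from the paper
A_STOP = 40.0   # stopband (sidelobe) attenuation A in dB (assumption)

def denoise(coords: np.ndarray) -> np.ndarray:
    """Zero-phase Chebyshev type II low-pass filtering of a pen trajectory.

    coords: array of shape (n_points, 2) holding the sampled (x, y) positions.
    filtfilt runs the filter forward and backward so the pen timing is not shifted.
    """
    b, a = cheby2(N=4, rs=A_STOP, Wn=F_CUT, btype='low', fs=FS)
    return filtfilt(b, a, coords, axis=0)
```

The zero-phase (forward-backward) application matters here: a one-way filter would delay the trajectory and distort the velocity extrema used later for stroke segmentation.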
Three decomposition zones, namely the upper, core, and lower regions, are extracted relative to the horizontal baseline of the handwriting trajectory. Then, a handwriting size normalization procedure is applied to adjust its height to a fixed value h = 128 while retaining the same length/height ratio [3], [4]. The pre-processing techniques are applied and tested for the supported scripts: Arabic, Latin, and digits. After denoising and eliminating the handwriting variability, the trajectory is ready for segmentation and feature extraction.
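The size normalization can be sketched as a single isotropic scaling; this is a minimal sketch of ours, and the paper's procedure may additionally re-center the trajectory on its baseline:

```python
import numpy as np

def normalize_height(traj: np.ndarray, h: float = 128.0) -> np.ndarray:
    """Scale an (n, 2) trajectory so its height equals h.

    Both axes are scaled by the same factor, so the length/height ratio
    is preserved; coordinates are shifted so the minimum is at the origin.
    """
    mins = traj.min(axis=0)
    height = traj[:, 1].max() - traj[:, 1].min()
    return (traj - mins) * (h / height)
```

For example, a trajectory 4 units tall and 2 units wide becomes 128 tall and 64 wide.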

Feature extraction and SOHTs pre-classification
In this section, we describe the feature extraction methods for the handwriting trajectory used to ameliorate the recognition rates. Indeed, we identify two types of feature classes in our work: 1. Progressive plot features are extracted for each stroke after segmentation using the BEM. In this context, we benefit from this model to represent the dynamic (velocity) and static (geometric) profiles of the online handwritten trajectory.
2. Post-drawing or perceptive features are extracted from the bitmap of the image using the CNN model.

Beta-Elliptic modeling (BEM)
The BEM is derived from the kinematic Beta model with a juxtaposed analysis of the spatial profile. It considers a simple movement as the response of the neuromuscular system, which is described by a sum of impulse signals [38] modeled by the Beta function [1]. The specificity of the BEM is the combination of two aspects of online handwriting stroke modeling: the velocity profile (dynamic features), represented by a beta function that culminates at a time t_c coinciding with a local extremum (maximum, minimum, or double inflexion point), as shown in Fig. 4.a); and an elliptic arc, as illustrated in Fig. 4.b), modeling the static (geometric) profile of each stroke of the segmented trajectory. We describe in the following sub-sections how the BEM works.

Velocity model
In the dynamic profile, the curvilinear velocity V_σ(t) is a signal that alternates between extrema (minima, maxima, and inflexion points), which determine and delimit the trajectory strokes. In the BEM, V_σ(t) can be reconstructed by overlapping Beta signals, where each stroke corresponds to the generation of one beta impulse described by Eq. 4, where t_0 and t_1 are respectively the starting and ending times of the generated impulse, which delimit the corresponding trajectory stroke; t_c is the instant when the beta function reaches its maximum value, as given in Eq. 5; K is the impulse amplitude; and p and q are intermediate shape parameters.
As described in Eq. 6, the velocity profile can be reconstructed from the overlapped beta signals. Examples of velocity profile modeling for the online Arabic character ' ', the Arabic word ' ', and the Latin character 'a' are presented in Fig. 5.a), c), and e), respectively.
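A minimal sketch of the Beta impulse and the reconstructed velocity profile, assuming the standard Beta-elliptic formulation for Eq. 4 - Eq. 6; the exact parameterization used in the paper may differ:

```python
def t_c(t0: float, t1: float, p: float, q: float) -> float:
    """Instant of maximum velocity (the usual form of Eq. 5)."""
    return (p * t1 + q * t0) / (p + q)

def beta_impulse(t, K, t0, t1, tc, p, q):
    """One Beta velocity impulse (Eq. 4).

    Nonzero only on (t0, t1); reaches its maximum K at t = tc.
    """
    if t <= t0 or t >= t1:
        return 0.0
    return K * ((t - t0) / (tc - t0)) ** p * ((t1 - t) / (t1 - tc)) ** q

def velocity_profile(t, strokes):
    """Curvilinear velocity as the sum of overlapping Beta impulses (Eq. 6).

    strokes: iterable of (K, t0, t1, tc, p, q) tuples, one per stroke.
    """
    return sum(beta_impulse(t, *s) for s in strokes)
```

Differentiating the log of the impulse confirms the peak location: p/(t - t0) = q/(t1 - t) gives t = (p·t1 + q·t0)/(p + q), matching `t_c`.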

Trajectory modeling
In the space domain, many studies have addressed the generation of online handwriting trajectories. In Bezine et al. [4], each stroke located between two points M_1 and M_2 is fitted to an elliptic trajectory satisfying Eq. 7, where X and Y denote the cartesian coordinates along the elliptic stroke, and a and b are respectively the minor and major axis dimensions.
Also, an elliptic arc can be described by the trajectory tangents at its endpoints M_1 and M_2, as introduced by [39]. Indeed, each elementary beta stroke situated between two successive velocity extrema can be represented by such an elliptic arc.
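Assuming the canonical axis-aligned form with the axis dimensions a and b defined above, Eq. 7 presumably reads:

```latex
\left(\frac{X}{a}\right)^{2} + \left(\frac{Y}{b}\right)^{2} = 1
```

In practice the arc is also rotated and translated so that it passes through the stroke endpoints M_1 and M_2; those pose parameters are presumably among the four elliptic features listed in Table 2.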

Hybrid Beta and Elliptical models
As mentioned previously, each simple stroke is described by 10 parameters, summarized in Table 2. The first six beta parameters give the overall temporal properties of the neuromuscular networks implicated in motion generation, whereas the last four elliptic parameters capture the global geometric properties of all the muscles and joints involved in executing the movement.

CNN features
In this sub-section, we briefly explain the use of the CNN model in our context. As shown in Fig. 6, the CNN output in the trajectory modeling step is denoted by F_i ∈ R^D, where D is the depth of the CNN feature vector.

SOHTs pre-classification
After segmentation of the handwriting trajectory into SOHTs and the extraction of their feature vectors, the SOHTs are fuzzily classified using DBLSTM networks. LSTM recurrent networks have memory: they maintain a state vector that implicitly contains information about the history of all the past elements of a sequence. Hochreiter et al. [40] introduced the LSTM network, which is often employed to learn longer-term dependencies and reduce the vanishing gradient problem [41]. As shown in Fig. 7, an LSTM layer is a special kind of RNN composed of multiple recurrently connected subnets called memory blocks. Each block includes a set of internal units known as cells, which are used both for storing and accessing information over long time durations. The activation of these cells is carried out by three multiplicative 'gate' units: at time-step t, the input gate i_t, forget gate f_t, and output gate o_t, which are presented by Eq. 8 - Eq. 10.
Where W_* represents the input-to-hidden weight matrix, U_* is the state-to-state recurrent weight matrix, and b_* is the bias vector.
i_t and f_t are used to control the updating of c_t, which in turn saves the long-term memory. • denotes the element-wise vector product.
The output gate o_t is used to control the updating of h_t, as shown in Eq. 13.
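For reference, the standard LSTM updates that Eq. 8 - Eq. 13 correspond to are, in the notation above (σ is the logistic sigmoid and • the element-wise product); this is the textbook formulation, which we assume matches the paper's equations:

```latex
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
c_t = f_t \bullet c_{t-1} + i_t \bullet \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
h_t = o_t \bullet \tanh(c_t)
```

The forget gate f_t scales the previous cell state c_{t-1}, which is what lets gradients flow over long time spans.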
Given an input sequence [x_1, x_2, ..., x_t], we obtain a hidden state sequence [h_1, h_2, ..., h_t] by passing the input signal through the forward LSTM.
In various tasks, it is necessary to employ both past and future contexts, instead of an RNN that only uses past contexts. To build a bidirectional LSTM (BLSTM) model, we combine two LSTM sub-layers in the forward and backward directions [28]. Indeed, we obtain another hidden state sequence by passing the reversed sequence [x_t, x_{t-1}, ..., x_1] through the backward LSTM. All the hidden states are then fed to a fully connected layer followed by a softmax layer for final classification. The details of the BLSTM architecture are described in the experimental section.
As mentioned previously, each trajectory T is composed of a number of SOHTs, T = {SOHT_1, SOHT_2, ..., SOHT_n}. In the training phase, each SOHT sample is assigned to the most likely group C_j, j = 1...K, according to its visual offline features stemming from the CNN. Thereafter, this unsupervised assignment serves to train two DBLSTM networks used for SOHT fuzzy labeling. Indeed, the SOHT labeling stage considers two types of feature sets for each handwriting segment SOHT_i, i = 1...n.
The first set, X_i^on, combines the dynamic and geometric features extracted using the BEM, while the second, X_i^off, contains the post-drawing features extracted by the CNN. As shown in Fig. 6, the output of each DBLSTM is a vector of size K, {P(X_i^on|C_1), P(X_i^on|C_2), ..., P(X_i^on|C_K)} and {P(X_i^off|C_1), P(X_i^off|C_2), ..., P(X_i^off|C_K)}, giving the membership probabilities of the i-th analyzed handwritten trajectory segment to the K SOHT groups for the online and offline branches respectively.

Regularization and data augmentation
Regularization is an important process to enhance the efficiency of deep recurrent neural networks. The most commonly used regularization method is the dropout technique [42]. It is used in the training phase to avoid overfitting by randomly dropping hidden units. In our case, we apply dropout in both the input and fully-connected layers with a probability of 0.3, similar to the approach taken in [30]. Also, a large amount of training data is another key to the success of deep neural networks. To increase our training dataset, we adopt the data augmentation strategy widely used in image-based recognition systems [43]. This approach generates randomly distorted samples from those composing the original dataset. For this, we modify parameters such as the inclination angle of the trajectory, the baselines, the handwriting slant, and the smoothing. These techniques enlarge the training set, generate more training samples, and bring more variation into the training set.
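The distortion step can be sketched as random affine transforms of the trajectory; the parameter ranges below are illustrative assumptions of ours, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(traj: np.ndarray) -> np.ndarray:
    """Random rotation/slant/scale distortion of an (n, 2) trajectory.

    Each call draws a small rotation angle, an italic shear factor,
    and a size jitter, then applies the combined linear map to every point.
    """
    theta = rng.uniform(-0.1, 0.1)   # inclination angle in radians (assumption)
    shear = rng.uniform(-0.2, 0.2)   # slant factor (assumption)
    scale = rng.uniform(0.9, 1.1)    # size jitter (assumption)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    sh = np.array([[1.0, shear], [0.0, 1.0]])
    return scale * traj @ (rot @ sh).T
```

Because the transform is applied to the raw point sequence, the temporal order of the ink, which the BEM features depend on, is left untouched.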

SVM fusion engine for script recognition
After SOHT fuzzy labeling, the outputs of the DBLSTMs of both the online and offline sub-systems are treated as input for the SVM classifier, which performs decision fusion. Indeed, the SVM is an efficient tool for both linear and nonlinear classification based on a supervised learning algorithm, and it has achieved great success in practical applications such as pattern classification. The RBF kernel used in our system is described as K(x, x') = exp(-γ‖x - x'‖²), where γ is a parameter determined empirically, x and x' denote input vectors, and φ is the corresponding nonlinear transformation into the feature space.
For the recognition of the overall handwritten script, the SOHT vectors belonging to the same online trajectory (word, letter, digit, etc.) are gathered to form the fuzzy SOHT membership matrices P_on(i, j) and P_off(i, j), provided as input to the SVM, which merges the local decisions and determines the classification of the overall handwritten script.
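This fusion stage can be sketched with scikit-learn's RBF-kernel SVM; the data below are random stand-ins (class count and feature size are toy assumptions, not the paper's), with each row playing the role of the concatenated online/offline membership vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 20))          # toy stand-in for concatenated P_on / P_off rows
y_train = rng.integers(0, 4, size=200)   # 4 script classes (assumption)

# gamma is the kernel parameter the paper tunes empirically.
fusion = SVC(kernel='rbf', gamma='scale', C=1.0)
fusion.fit(X_train, y_train)
pred = fusion.predict(X_train[:5])
```

Concatenating the two branches' probability vectors lets the SVM weigh the online (BEM) and offline (CNN) evidence jointly rather than picking one branch's argmax.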

Experiments and results
In this section, we describe our experimentation on multilingual online handwriting recognition. The utilized datasets are presented first, followed by the conducted ablation studies and a discussion of the results. Two metrics, the Character Error Rate (CER) and the Word Error Rate (WER), defined as the percentage of characters or words recognized incorrectly, were employed to evaluate the proposed system. These metrics are evaluated on the output of each DBLSTM and also on their fusion using the SVM engine. Next, we compare our approach with state-of-the-art approaches using the same databases. Finally, we present some strengths and limitations of the proposed system in the error analysis section. All tests were run on a Core i7 processor at 3.2 GHz with 8 GB of RAM.
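As a concrete reading of these metrics, CER can be computed from the Levenshtein edit distance between reference and hypothesis transcriptions (a standard sketch, not the paper's exact evaluation code; WER is identical with word tokens in place of characters):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # min over deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def cer(refs, hyps):
    """Character Error Rate over a corpus, as a percentage."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```

For instance, one substituted character out of five reference characters gives a CER of 20%.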

Datasets overview
One of the most difficult problems of online handwriting recognition is the requirement for a standard database that covers a variety of writing styles and includes the various classes of the target language. In order to test the performance and efficiency of our proposed system, we have used three datasets: Online-KHATT [45] for isolated Arabic characters, ADAB [46] for Arabic words, and UNIPEN [47] for Latin characters and digits. In this section, we describe these publicly available datasets.

Online-KHATT dataset
Online-KHATT is a new open-vocabulary Arabic dataset proposed by [36]. It is composed of 10,040 lines of Arabic online text taken from 40 books, written by about 623 participants using Android- and Windows-based devices.
The authors claim that this database is challenging, presenting many difficulties such as variations in stroke thickness, writing styles, and the number and position of dots. In our experiment, we use a subset of the segmented characters of this dataset [13] that contains 44,795 samples after applying the data augmentation technique. The subset is divided into a training set of 30,795 samples, a validation set of 4,000 samples, and a test set of 10,000 samples. As described in the previous section, the shapes of the letters differ depending on their position in the word, which makes a total of 114 different shapes in this dataset.

Unipen benchmark dataset
The Unipen dataset contains three sections, 1a, 1b, and 1c, for digits, uppercase, and lowercase Latin characters, with 16K, 28K, and 61K samples respectively. This database is divided into training, validation, and test sets. Table 3 summarizes the total number of characters for each section. In order to increase the training accuracy, we applied the data augmentation technique. We used sets 1, 2, and 3, which contain more than 45,158 words, for the training process; sets 5 and 6, which contain 4,000 samples, for validation; and 8,417 words from set 4 for testing.

Ablation studies
To study the impact of the proposed architecture for recognizing multilingual scripts using hybrid DBLSTM and SVM engine on both online and offline branches, we have designed three groups of experiments.
The first experiment is related to the pre-classification step, which consists in choosing the value K corresponding to the best grouping of SOHTs. After trajectory segmentation, we construct a dataset composed of more than 200,000 SOHTs. These are classified into K groups using the k-means clustering algorithm based on the extracted CNN parameters. After several tests, the value K is fixed empirically to 210, the number of groups that returns the smallest sum of intra-cluster point-to-centroid distances. 70% of this dataset is used for training the next stage, composed of the DBLSTMs, and the rest for classification tests. The second experiment consists in choosing the architecture that offers the best CNN features. For this, we tried several CNN settings to train the offline system on the SOHT training dataset. As illustrated in Table 5, three variants of the CNN model were designed by changing the number of layers and the number of filters in each layer. We chose CNN3, with 13 convolution layers and 4 max-pooling layers similar to Zhang et al. [30], because it is the most efficient architecture, generating the highest accuracy on the SOHT dataset. The input layer is a 32x32 gray-level image. The convolutional layers apply 3x3 filters with a fixed convolution stride of one. The feature map dimension in each convolution layer is increased gradually from 50 in layer 1 to 400 in layer 12. After three convolutional layers, a 2x2 max-pooling window with stride 2 is applied to halve the size of the feature map. The network is trained using stochastic gradient descent with momentum (SGDM), a learning rate of 0.001, and a mini-batch size of 100. We also use the dropout technique, with probability 0.2, only in the last layer.
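The pre-classification step can be sketched with scikit-learn's k-means on the CNN feature vectors; the feature dimension and the small K below are toy assumptions to keep the sketch fast (the paper settles on K = 210 empirically):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cnn_features = rng.random((1000, 64))  # toy stand-in for the SOHT CNN vectors

K = 5  # illustrative; the paper uses K = 210
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(cnn_features)
inertia = km.inertia_  # sum of intra-cluster squared distances, the selection criterion
labels = km.labels_    # pseudo-labels later used to train the DBLSTM stage
```

Sweeping K and plotting `inertia` is the usual way to pick the grouping with the smallest intra-cluster point-to-centroid distances, as described above.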

DBLSTM parameters
In the proposed system, we use the same DBLSTM architecture for SOHT fuzzy classification on both the online and offline data. The network parameters were fixed after several tests on the training data. The best topology of our network is composed of three bidirectional hidden layers. The size of the input layer depends on the dimension of the feature vector produced by the BEM or the CNN, respectively. We also varied the number of nodes (32, 64, 128, 256, and 400) in the forward and backward hidden layers; the best number of nodes in our DBLSTM is 400, as described in the experimental results section. The size of the output layer is defined by the number of SOHT classes. Dropout is used in the fully connected layer with a probability of 0.3. The network is trained using SGDM with a mini-batch size of 200. The training process starts with an initial learning rate of 0.001 and runs for a maximum of 400 epochs. A categorical cross-entropy loss function is used to optimize the network. After each epoch, the training data is shuffled to form different mini-batches.
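The described topology can be sketched in PyTorch; this is our sketch, not the authors' implementation, and the input dimension 10 (the BEM stroke parameters of Table 2) and 210 output classes are taken from the paper's settings:

```python
import torch
import torch.nn as nn

class DBLSTMClassifier(nn.Module):
    """Three stacked bidirectional LSTM layers, as in the paper's best topology."""
    def __init__(self, input_dim: int, num_classes: int, hidden: int = 400):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.drop = nn.Dropout(0.3)            # dropout on the fully connected layer
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                      # x: (batch, time, input_dim)
        out, _ = self.blstm(x)                 # out: (batch, time, 2 * hidden)
        # Last time step concatenates forward and backward states;
        # a softmax over these logits gives the fuzzy SOHT memberships.
        return self.fc(self.drop(out[:, -1]))

model = DBLSTMClassifier(input_dim=10, num_classes=210)
logits = model(torch.randn(2, 30, 10))         # 2 sequences of 30 strokes each
```

Using only the final hidden state is one of several pooling choices; the paper's fully connected plus softmax head could equally consume all time steps.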

Experimental Results
One of the most important problems of handwriting recognition is the selection of a relevant set of features, which has been the subject of several studies [48]. The focus of our work is to show that the proposed method, based on the combination of the two approaches of handwriting representation described above, can be useful in online handwriting recognition. To assess the effectiveness of our method, we planned three groups of experiments: one based on the dynamic and geometric features obtained using the BEM, a second based on the bitmap features from the CNN model, and a third based on the fusion of the two.

Experiments on Online-KHATT
To evaluate the performance of our proposed system on Arabic characters, we used the Online-KHATT dataset described previously. As shown in Fig. 9 and Table 6, we conducted a further study by changing the number of layers and nodes per layer for both BEM and CNN features to determine the optimal size of the BLSTM model. We observe that, for the two input features, the use of 3 layers surpasses shallower networks, and the addition of more layers brings almost no improvement. Further, using 256 nodes per layer is enough, as larger networks provide only limited improvements, if any. To further improve the results, we carried out some tests using the SVM engine with different kernel functions.
As shown in Table 7, the CER on the Online-KHATT dataset decreased to 5.25% with the RBF kernel when hybridizing the two models. Moreover, we compare our system to others that also experimented with online Arabic character recognition. The results are summarized in Table 8, which presents for each work the classifier used, the feature extraction model, and the accuracy of the results. The performance obtained by our system is better than that of the two presented systems trained on the same training dataset: the results on Online-KHATT are significantly better than those found by [13] and [25] (Table 8, excerpt: LSTM with path-signature features [25], 7.43% CER; our DBLSTM-SVM system with BEM + CNN, 5.25% CER). This is due, on the one hand, to the use of complementary models for handwriting representation, which describe the script simultaneously through the geometric and dynamic features extracted with BEM and through the discriminating power of a deep CNN model to differentiate Arabic characters. On the other hand, combining the models enhances the discriminating power of the global system.
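The CER figures reported in Tables 6-8 are conventionally computed as the total character-level edit distance divided by the total reference length. A minimal sketch with our own helper names, not code from the described system:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[n]

def cer(references, hypotheses):
    """Character Error Rate: total edit distance / total reference length."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars
```

For example, `cer(["handwriting"], ["handwritting"])` yields 1/11, since one inserted character accounts for the whole difference. WER follows the same formula applied to word sequences instead of character sequences.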

Experiments on UNIPEN
We evaluate the performance of our system on the test set of online signals from the UNIPEN dataset, distributed in three sections: 1a, 1b, and 1c. Table 9 shows comparison results with previous studies using the same database. We observe that the results achieved by our system using the combination of the two models are better than the 'writer-independent' experiment reported by [49], which uses cluster generative statistical dynamic time warping (CSDTW).
This indicates that our proposed method is also robust and very promising for online handwriting of Latin characters and digits. We also refer to more recent work [50], which uses LSTM with CTC; our recognition rate is very competitive. It should be noted that UNIPEN contains very difficult data because of the wide variety of writers and the presence of mislabeled or noisy samples. The obtained WER results show the effectiveness of our architecture using single models. Likewise, the lowest WER is achieved when we use 3 layers of BLSTM with 256 nodes.
Further, the performance of our system improved when we combined both DBLSTM models. As shown in Table 7, the lowest WER is 1.28%, obtained by employing SVM with the RBF kernel function.
To better understand the efficiency and robustness of our system, we discuss its performance compared to previous state-of-the-art systems. It may be noted from Table 11 that our system provides better results than other works. Indeed, although the association of online and offline data has previously been used in related works for handwriting recognition, our system is distinguished from the literature [18], [16] by the integration of multi-stage deep learning networks for the analysis and fuzzy classification of both online and offline data. We demonstrate in this sub-section the strengths and limitations of the present system by depicting some examples of correctly classified and misclassified sequences. Fig. 11 shows some examples of handwritten characters, words, and digits from the used datasets, processed by our recognition system. Each item is annotated with the label of the corresponding handwritten proposition.
We can identify some samples that are not recognized by one of the used models and corrected by the other, and vice versa. For instance, the first and third top samples of the corrected partition are well classified by the online BEM but not handled well by CNN. Likewise, the second, fourth, and right fifth samples are recognized only by the CNN model. These observations show that the accuracy of our global system can be increased by combining the two modules, which confirms their complementarity.
However, despite the good results achieved by our system, we notice that the results obtained on the Online-KHATT and UNIPEN datasets can still be improved. This is due to the strong similarity between several characters and the great diversity of writing styles. We note that some examples are quite distorted, such as the first sample of ' ' and the third sample of ' ' for the Arabic script, as well as the '3' and '9' digit samples in the error-samples partition.
Also, we notice that the first and fourth samples of ' ' are legible but misclassified, likely reflecting rare writing variants of this letter.

Conclusion and Future Work
We presented in this study a multilingual online handwriting recognition system based on the combination of DBLSTM and SVM. The proposed system divides the online handwriting trajectory into SOHTs, which are then pre-classified into sub-groups. Then, two online handwriting representation models are employed: the BEM, characterized by the dynamic and geometric features of the online pen-tip trajectory, and a powerful feature representation extracted using a CNN model. We employ the extracted features to evaluate the classification system and compare the performance of single DBLSTM classifiers. We also combine the obtained results using SVM to improve the discrimination power of the global system.
The effectiveness of the corresponding models, and of their combination, was evaluated in experiments using three databases: Online-KHATT for online Arabic characters, ADAB for Arabic words, and UNIPEN for Latin characters and digits. The results obtained by combining the models suggest that our proposed method is well suited to online handwriting recognition compared to other state-of-the-art works, which can be explained by the complementarity and strength of both models.
We have also demonstrated that the feature extraction models used are rather generic; their applicability to other scripts such as Persian, English, and Chinese is interesting and is considered as future work. We also plan to extend our method to continuous text recognition.
CNN features are extracted using the last CNN layer after transforming the cursive handwriting trace into a bitmap image. After that, SOHTs are fuzzily classified into k sub-classes defined by the k-means unsupervised algorithm in the training phase. The SOHT fuzzy classification module takes as input the extracted BEM and CNN feature vectors, which train two DBLSTM networks integrated into the online and offline branches, respectively. Finally, the descriptions of the fuzzy outputs obtained by the two DBLSTMs are combined using SVM to improve the discriminating power of the global system. In the following sections, we introduce each module in detail.

Figure 3 :
Figure 3: (a) An acquired trajectory of the Arabic character ' ' and (b) the same trajectory after low-pass filtering and smoothing.

Figure 5 :
Figure 5: Online handwriting modeling of the Arabic character ' ', the Arabic word ' ', and the Latin character 'a' with BEM. (a), (c), and (e) describe the velocity profiles; (b), (d), and (f) represent the geometric profiles.
an elliptic arc described by four geometric characteristics: a, b, θ, and θp, as shown in Fig. 4(b). Here, a and b define the half-lengths of the major and minor axes of the elliptic arc, respectively; θ is the inclination angle of the ellipse's major axis; and θp denotes the inclination of the trajectory tangent at the minimum-velocity endpoint. These parameters reflect the geometric properties of the end-effector (pen or finger) trace, dragged by the set of muscles and joints involved in handwriting. Fig. 5(b), (d), and (f) depict examples of the modeled geometric profiles of the same chosen Arabic and Latin samples.
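To make the geometric parameters concrete, the following NumPy sketch traces an elliptic arc from the half-axes a, b and the inclination θ of the major axis (θp, the tangent inclination at the minimum-velocity endpoint, is not modeled here). The values and function names are illustrative only:

```python
import numpy as np

def elliptic_arc(a, b, theta, center=(0.0, 0.0), n_points=50):
    """Points of an elliptic arc with half-axes a (major) and b (minor),
    whose major axis is inclined by angle theta (radians)."""
    t = np.linspace(0.0, np.pi, n_points)   # parameter sweep over half the ellipse
    x = a * np.cos(t)                       # arc in the ellipse's own frame
    y = b * np.sin(t)
    c, s = np.cos(theta), np.sin(theta)     # rotate by theta and translate to center
    xr = center[0] + c * x - s * y
    yr = center[1] + s * x + c * y
    return np.stack([xr, yr], axis=1)

arc = elliptic_arc(a=2.0, b=1.0, theta=np.pi / 4)
```

The first point lies at the tip of the major axis rotated by θ, which is how the inclination parameter shapes the reconstructed stroke segment.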
text. Inspired by the recent progress of deep learning technologies in different areas, we apply a CNN architecture to the offline script bitmaps reconstructed from the SOHT database. The latter represents only the final layout of the hand drawing, disregarding the chronological order of its generation. The CNN features are extracted from the offline SOHTs by a convolutional network that contains multiple convolution and max-pooling layers. Batch normalization and dropout techniques are also applied to enhance the feature extraction step. The details of the CNN architecture are described in the experimental section. The input image (32×32) is transformed into a CNN feature of size (Batchsize, L, D), where Batchsize is fixed to 32 and L and D are determined by the network configuration.
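The spatial dimensions flowing through such a convolution/max-pooling stack follow the standard output-size arithmetic. The sketch below applies it to the 32×32 input with a hypothetical two-stage configuration; the actual layer parameters are those listed in Table 5:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel, stride=None):
    """Spatial output size of a max-pooling layer (stride defaults to kernel)."""
    stride = stride or kernel
    return (size - kernel) // stride + 1

# Hypothetical pipeline on the 32x32 input: two conv(3x3, pad 1) + pool(2x2) stages.
s = 32
for _ in range(2):
    s = conv_out(s, kernel=3, padding=1)  # 'same' convolution keeps the size
    s = pool_out(s, kernel=2)             # pooling halves the spatial size
```

Here `s` ends at 8, i.e. each 2×2 pooling stage halves the spatial resolution while the padded 3×3 convolutions preserve it.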

traction of their offline visual features, we obtained an extensive database of multilingual handwriting segments with an indefinite number of labels, due to the large variability of writers' handwriting styles, especially when cursive and discrete styles are mixed. Thus, being unable to manually assign a label to each SOHT, we chose to use the k-means unsupervised clustering algorithm for SOHT pre-classification. The considered number of sub-groups K is defined empirically to maximize the recognition rate. Indeed, changing the value of K modifies the accuracy of each classifier's network and consequently that of the overall fusion recognition system.
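The pre-classification step described above can be sketched with a plain Lloyd's-algorithm k-means applied to synthetic feature vectors. This is an illustrative re-implementation under our own naming, not the system's actual code:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm): returns centroids and labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated synthetic "SOHT feature" clusters of 4-D vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(5.0, 0.1, (20, 4))])
centroids, labels = kmeans(X, k=2)
```

In the actual system, K would be swept over several values and the one maximizing the downstream recognition rate retained, as described above.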

Figure 6 :
Figure 6: SOHTs pre-classification and online handwriting script recognition process.

Fig. 8 .
As shown in Fig. 8(a), SVM determines an optimal separating hyperplane by mapping the sample points into a high-dimensional feature space through a nonlinear transformation. It was originally designed to solve binary classification problems, but it can also be employed for multi-class problems (see Fig. 8(b)) using methods such as one-versus-all [44], based on dot-product (kernel) functions in the feature space. Examples of these functions include the linear kernel, the Radial Basis Function (RBF), and the sigmoid function. In our context, we use the RBF function as the kernel in the hybrid model.
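The RBF kernel in question is the standard K(x, y) = exp(-γ‖x − y‖²). A small NumPy sketch of the kernel-matrix computation; the γ value here is an arbitrary illustration, not the system's tuned hyperparameter:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    # Squared Euclidean distances via the expansion ||x||^2 + ||y||^2 - 2 x.y
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negatives from rounding

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, gamma=0.5)
```

Every point has similarity 1.0 with itself, and similarity decays smoothly with squared distance, which is what lets the SVM draw a nonlinear boundary between the fuzzy outputs of the two DBLSTM branches.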

Figure 9 :
Figure 9: CER of trained DBLSTM models on the Online-KHATT dataset with different numbers of LSTM layers and nodes using BEM and CNN inputs. Solid lines illustrate the results using BEM; dashed lines indicate the results with the CNN model.

Figure 10 :
Figure 10: WER of DBLSTM models trained on the ADAB dataset by modifying the numbers of LSTM layers and nodes using BEM and CNN inputs.

offline data. We also adopt a decision-model merger between the fuzzy classification results obtained by the two branch classifiers of the online and offline data, respectively, using the SVM engine. This new architecture demonstrated its efficiency in the test phase by achieving an advantageous WER of 1.28% with respect to the state of the art, in particular compared to the system of [51], which uses a deep learning network only in the offline data-processing branch.
The first and the third samples of ' ', on the other hand, are legible but misclassified, most likely because their appearance is rare among this letter's training examples. Likewise, there is confusion between different positional forms of the same character, such as the isolated ' ' and the final ' ', and between similar positional forms, such as the medial ' ' and the medial ' ', which can be explained by wrong identification of the delayed stroke. Other examples are not recognized by our system for Latin scripts, such as the three samples of the uppercase characters 'F' and 'C' and the lowercase character 'b', which can be explained by the diversity of writing styles and the similarity between many characters of these scripts.

Figure 11 :
Figure 11: Some correctly classified and misclassified samples of Arabic and Latin characters and digits.

Table 1 :
Some Arabic characters in different forms

Table 2 :
Features extracted by using the BEM.

Table 4 ,
The database of [46] is partitioned into six distinct sets, which were collected for the ICDAR 2011 online Arabic handwriting recognition competition.

Table 5 :
CNN configuration. 'Conv' denotes a Conv-Norm-ReLU layer. The convolution and max-pooling layers' parameters are given as "Conv: filter size" and "Max pool: filter size", respectively.

Table 6 :
CER (%) on the Online-KHATT dataset for different BLSTM layers and nodes.

Table 7 :
Error Rate (ER) on the Online-KHATT and ADAB datasets using the SVM engine with different kernel functions.

Table 8 :
CER on Online-KHATT in Comparison to the state of the Art.

Table 9 :
CER on UNIPEN data compared to the state of the art. We have also carried out experiments on the test set of the ADAB database using the same network architecture; the results are summarized in Table 10 and Fig. 10.

Table 10 :
WER (%) on ADAB Data for different BLSTM layers and Nodes.

Table 11 :
WER on ADAB Data compared to the State of the Art.