Towards an Automated Model-Based Test Generation Approach from Controlled Natural Language Specifications Written in Arabic

Abstract. Model-Based Testing (MBT) is an effective technique for evaluating and validating the behaviour of a large class of systems from different fields such as smart cities, the Internet of Things, cloud architectures, web services, telecommunications, transportation, electronic games, etc. This technique rests on strong and sophisticated formal foundations, which makes it perfectly appropriate for the adoption of automated validation processes. In this paper, we propose a new model-based testing approach which takes as input a set of requirements described in an Arabic Controlled Natural Language (CNL), a subset of the Arabic language generated by a specific grammar. The semantics of the considered requirements is defined using the Case Grammar Theory (CGT). The requirements are translated into transition relations, which serve as an input for test case generation tools. We illustrate our approach with a simple case study from the automotive domain.


Introduction
Nowadays, testing [18] is a very important phase in the development of computer and electronic systems for a large variety of software and technologies in different domains and applications such as smart cities, the Internet of Things, cloud architectures, web services, telecommunications, transportation, electronic games, etc. Indeed, these tests should contribute to the detection and elimination of different types of errors and anomalies which could cause enormous damage, given the great complexity and inter-connectivity of the systems used in our daily life. Different types of tests exist, such as unit testing [10], integration testing [40], system testing [5], regression testing [13], performance testing [37], security testing [25,23,24], etc. These different test scenarios can be executed either at system development time or at runtime, after the system has been put into operation. In general, this testing phase is very costly in terms of time and resources.
The majority of software requirements specifications are written in natural language, making requirements engineering artifacts a great starting point for the creation of software products.
According to [42], three important phases can be identified in the automation of the testing procedure: 1. automatic generation of test scenarios from requirements descriptions; 2. automatic execution of test scenarios; 3. automatic evaluation of results.
In this work, we focus mainly on the first phase. Model-based testing (MBT) is an efficient methodology for automating these steps. In this case, test scenarios are extracted from behavioral models [11] written using pure formal specification languages or semi-formal languages (such as UML [38]).
Testing practitioners frequently break the system down in multiple ways or use scenarios to generate test cases for model-based systems, after which models are produced for each scenario. These intermediate models are used to produce the test scenarios. The complexity and time necessary to discover scenarios, the accompanying intermediate models and, as a result, the generated test cases are among the drawbacks of model-based techniques. Typically, the initial requirements specifications are ideal inputs with which to begin the testing process, since they allow for cost reduction and increased efficacy [Nogueira2014].
Our approach is intended to work as follows. Initially, it parses the textual system requirements to check whether they conform to the CNL structure. Our CNL is a precise and unambiguous subset of the Arabic language. After parsing, we need to provide a semantic interpretation for the requirements [2]; this technique was initially proposed by [8]. From the obtained case frames, the requirements' semantics are translated into an internal model representation. Based on this model, we are able to generate test scenarios using the RT-Tester tool and its SMT solver. Section 2 gives a short overview of the characteristics of the Arabic language. Section 3 is dedicated to the description of the test generation phase. Section 4 addresses related work, and Section 5 presents conclusions and future work.
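As an illustration of the semantic interpretation step mentioned above, a case frame can be represented as a simple structure mapping a verb to its semantic cases. The class name, the case labels and the automotive example below are our own illustrative assumptions, not the internal representation actually used by the approach:

```python
from dataclasses import dataclass, field

# A case frame in the spirit of Case Grammar: a verb plus a mapping
# from semantic cases (Agent, Object, Instrument, ...) to their fillers.
@dataclass
class CaseFrame:
    verb: str
    cases: dict = field(default_factory=dict)

# Hypothetical English gloss of an Arabic-CNL requirement from the
# automotive domain: "the driver presses the brake pedal".
frame = CaseFrame(
    verb="press",
    cases={"Agent": "driver", "Object": "brake pedal"},
)

print(frame.verb, "-", frame.cases)
```

Such frames abstract away from the surface word order of the sentence, which is convenient for a language such as Arabic where both verbal and nominal sentence orders occur.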

Characteristics of the Arabic Language
Arabic [14] is a Semitic language spoken by more than 400 million people across 22 countries in the Middle East and North Africa. With such a number of native speakers, it is considered the fifth most widely spoken language in the world. The specificities of this language differentiate it from other (Indo-European) languages and make it worthy of study. These specificities occur at many levels: morphology, syntax and semantics, as illustrated below. Arabic is written and read from right to left, and its alphabet contains 28 letters. Words can be divided into three basic types: nouns, verbs and particles. The language is based on a trilateral root system, which means that words (verbs and nouns) are derived from roots of three letters (consonants). Various vowels, prefixes and suffixes are added to the initial roots to form different meanings surrounding that root. Such a system makes it easy to understand the meaning of unseen words by looking at their root.
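The root-and-pattern derivation described above can be sketched as simple template substitution. The `apply_pattern` helper and the small set of templates below are our own illustration (a real morphological analyzer handles many more patterns and phonological adjustments):

```python
# Minimal illustration of Arabic root-and-pattern morphology.
# A trilateral root such as k-t-b (the root associated with writing)
# is combined with templates in which C1, C2, C3 mark the positions
# of the three root consonants.

def apply_pattern(root: str, pattern: str) -> str:
    """Substitute the three root consonants into a derivation template."""
    assert len(root) == 3, "trilateral roots only"
    for i, consonant in enumerate(root, start=1):
        pattern = pattern.replace(f"C{i}", consonant)
    return pattern

ROOT_KTB = "كتب"  # the root k-t-b

# A few well-known derivations (illustrative subset):
PATTERNS = {
    "C1اC2C3": "active participle (kātib, 'writer')",
    "مC1C2وC3": "passive participle (maktūb, 'written')",
    "C1C2اC3": "verbal noun (kitāb, 'book')",
}

if __name__ == "__main__":
    for template, gloss in PATTERNS.items():
        print(apply_pattern(ROOT_KTB, template), "-", gloss)
```

This regularity is what makes the meaning of unseen words recoverable from their root, and it is also what a CNL lexicon for Arabic can exploit.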
Moreover, Arabic is known for its morphological richness. For instance, there are up to 1000 synonyms referring to a camel, 500 for a lion, 52 for darkness, and 34 for the rain. Unlike many other (Indo-European) languages, Arabic contains two types of sentences: nominal sentences (or clauses) and verbal sentences (or clauses); the main difference between the two types is the start of the sentence. A verbal sentence starts with a verb which is followed by a subject, while a nominal sentence starts with a noun (or a pronoun), which is the subject, followed by a predicate (noun, adjective, preposition and noun, or verb).

Specification Analysis and Test Generation
Our work is mainly inspired by [6,7,44]. We adopt Natural Language Processing (NLP) for parsing every system requirement with respect to our Arabic-CNL (Figure 1). For every valid requirement, the parser produces a corresponding syntax tree (ST).
A Controlled Natural Language (CNL) is a subset of a given natural language which uses a limited set of grammar rules and a predefined lexicon containing the vocabulary of the considered application domain. CNLs are mainly used for avoiding textual ambiguity and complexity. The Arabic-CNL was defined to produce unambiguous requirements in the Arabic language. We adopt the Case Grammar linguistic approach [2] for representing the semantic meaning of the natural language. The obtained representation is then translated into a new model whose formal semantics is represented using a transition relation.
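A minimal sketch of such a conformance check, assuming a toy lexicon and a single verbal-sentence rule (verb followed by subject then object); the actual Arabic-CNL grammar and lexicon are of course much richer:

```python
# Toy conformance check for a CNL: a requirement is accepted only if
# every token is in the predefined lexicon and the sequence of token
# categories matches an allowed sentence pattern
# (here: verbal sentence = verb, noun, noun).

LEXICON = {            # word -> syntactic category (illustrative)
    "يضغط": "V",       # "presses"
    "السائق": "N",     # "the driver"
    "الدواسة": "N",    # "the pedal"
}

ALLOWED_PATTERNS = [("V", "N", "N")]   # verbal sentence: verb subject object

def conforms(requirement: str) -> bool:
    tokens = requirement.split()
    if any(t not in LEXICON for t in tokens):
        return False                    # out-of-lexicon word: reject
    categories = tuple(LEXICON[t] for t in tokens)
    return categories in ALLOWED_PATTERNS

print(conforms("يضغط السائق الدواسة"))  # VSO verbal sentence: accepted
print(conforms("السائق يضغط"))          # no matching pattern: rejected
```

Restricting both the lexicon and the sentence patterns in this way is precisely what removes ambiguity before the Case Grammar interpretation is applied.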
The behavior of the system is modeled by means of state machines, described using the RT-Tester Internal Model Representation. To produce test scenarios with appropriate test data, the concurrent state machine semantics is modeled using transition relations which associate pre-states, the current time and other variables with post-states. An example of a specification written in Arabic and its translation into English is given in Figure 2. The proposed specification conforms to the considered CNL (Arabic-CNL). The decomposition of the considered specification is illustrated in Figure 3.
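As a sketch of what such a transition relation looks like, the following toy example encodes a two-variable state machine as a predicate over (pre-state, input, post-state) and performs a bounded search for an input sequence reaching a goal state. The variable names and the simple breadth-first reachability search are our own illustration, not RT-Tester internals (which use an SMT solver for this purpose):

```python
from itertools import product

# A state is a (mode, sensor_ok) pair; the transition relation is a
# predicate over (pre-state, input, post-state), in the spirit of the
# pre-state/post-state relations used for test-data generation.

STATES = list(product(["idle", "braking"], [False, True]))
INPUTS = ["press_pedal", "release_pedal", "sensor_ok"]

def transition(pre, inp, post):
    mode, ok = pre
    if inp == "press_pedal":
        return post == ("braking", ok)
    if inp == "release_pedal":
        return post == ("idle", ok)
    if inp == "sensor_ok":
        return post == (mode, True)
    return False

def find_test(start, goal, bound=4):
    """Bounded breadth-first search over the relation:
    returns an input sequence driving start to goal, or None."""
    frontier = [(start, [])]
    for _ in range(bound):
        next_frontier = []
        for state, seq in frontier:
            for inp, post in product(INPUTS, STATES):
                if transition(state, inp, post):
                    if post == goal:
                        return seq + [inp]
                    next_frontier.append((post, seq + [inp]))
        frontier = next_frontier
    return None

print(find_test(("idle", False), ("braking", True)))
```

A symbolic tool replaces this explicit enumeration by solving the same relation over symbolic pre- and post-states, which is what makes the approach scale to realistic models.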

Related Work
In [41], the authors presented a technique aiming at model-based test case generation while taking into consideration natural language requirements deliverables. The technique is aided by an application that allows natural language specifications to be automatically translated into Statechart models. In [34], the authors addressed the problem of automatically creating executable security test scenarios from natural language security requirements.
In [9], ongoing research aiming at developing an appropriate technique for producing test scenarios from natural language descriptions using a formal language was proposed. An initial prototype was developed and applied to a project from the aerospace field. In [16], the researchers presented a technique which generates MBT test scenarios from NL requirements. These requirements are translated into statecharts, and test scenarios are derived from these statecharts.
In [17], engineers and experts write requirements using structured NL scenarios. These scenarios are then translated automatically into executable test cases which provide backward and forward traceability. In [4], the authors proposed a tool for analysing requirements. This tool accepts NL requirements, classifies them, interacts with users to refine them, automatically transforms the NL requirements into a logical description in order to validate them, and ultimately produces test scenarios.
In [43], the authors proposed a methodology for generating test cases which takes as input a set of restricted NL requirements. This methodology automatically transforms these requirements into executable Petri nets. The latter are used as an input for the generation of test cases. In [19], a tool called Kirby was developed for automatically generating executable test cases from structured English requirements.
In [15], the authors applied ML techniques to identify semi-structured requirements descriptions and proposed a rule-based methodology for their transformation into Cause-Effect (CE) graphs. This study demonstrated that the proposed technique allowed about 85% time savings in the creation of the test models without any quality degradation.
Concerning the Arabic language, the authors of [12] tried to apply formal techniques to Arabic as a preliminary phase towards its computation. Similarly, in [1], a formal model for extracting ontological relations for the Arabic language was proposed. To the best of our knowledge, no previous work in the literature has been dedicated to model-based testing for specifications written in the Arabic language.

Conclusion and Future Work
In this work, we have proposed a first attempt at developing an automatic test generation methodology from specifications written in the Arabic language. This work is still in a preliminary phase and should be improved in different directions.
First, we have to implement a test generation prototype for the considered specification class in order to experimentally validate the proposed approach. This prototype should be tested in different fields and disciplines in order to estimate its efficiency and performance.
Moreover, our approach may be extended to the case of distributed systems made of several components [3,30,22,21,20]. In this case, many test generators and testers will be needed, and an optimal coordination [28] between these different components will be required. We may also take into account behavioral and structural adaptations of the considered systems under test [29], i.e., how to adapt our test scenarios in these situations.
It will also be useful to consider test cases written in conformance to the TTCN-3 standard [27] (Figure 4) and to cover other types of test scenarios such as security testing [39] and load testing [26,32,31,33].
Finally, it will be interesting to consider a larger class of Arabic requirements with fewer restrictions on the structure of the specifications. This may be achieved by considering specific machine learning techniques [35,36].