The software industry is constantly trying to solve the quality problem, but it is difficult to say how meaningful its success has been so far. This article deals with a new generation of testing tools designed to improve program quality. However, tools, even automated ones, cannot help if they are used incorrectly. The discussion of the tools is therefore preceded by a presentation of the general principles of "proper" testing.
The "struggle for quality" of programs can be conducted in two ways. The first way is the "easy" one: assemble a team of good programmers with experience in similar projects, give them a well-defined task and good tools, and create good working conditions. With high probability, such a team will develop a software system of good quality.
The second way is not so simple, but it yields high-quality products even when those conditions cannot be met: when there are not enough good programmers, the problem statement lacks clarity, and so on. This path requires standardized software development processes: uniform requirements for entering each stage of work, documentation, regular meetings, code inspections, and so on. One of the first advances on this front was the introduction of the concept of the software life cycle, which clearly identified the many tasks that must be addressed if a software project is to succeed.
The simplest set of life-cycle stages is as follows:
A standardized life-cycle scheme, with clear regulation of the required work and a list of the corresponding documentation, became the basis of the so-called "waterfall", or cascade, model. The waterfall model implies a strict partition of the software development process into stages; the transition from one stage to the next takes place only after the previous stage has been fully completed. Each stage ends with the release of complete documentation, sufficient for development to be continued by another team. The waterfall model became dominant in the development process standards of the Ministry of Defense. Many developers, even while deviating from this model willy-nilly, on the whole agree with its soundness and utility.
The waterfall model required all requirements to be formulated precisely and completely up front; requirements could be changed only after all work had been completed. It gave no answer to the question of what to do when the requirements, or the understanding of them, change during development.
In the late 1980s the so-called spiral model was proposed, and the method of iterative and incremental development (IID) was developed and tested in practice. The spiral model took the problems of the waterfall model into account. Its main focus is on an iterative process. Experiments with IID have been described in which an iteration lasted only half a day. Each iteration ends with the release of a new version of the software. With each version, the requirements for the target system are refined (and possibly changed), and measures are taken to meet the new requirements. On the whole, the Rational Unified Process (RUP) also follows this model.
Is it possible to solve the quality problem? Only to some extent.
The problem of improving software quality in general, and testing in particular, is attracting more and more attention: universities are introducing special courses on testing and quality assurance and are training specialist testers and quality assurance engineers. Yet software bugs still cost the U.S. alone from 20 to 60 billion dollars annually, with approximately 60% of the losses borne by end users. The result is a situation in which consumers are forced to buy goods that are known to be defective.
However, the situation is not hopeless. A study conducted by the National Institute of Standards and Technology found that losses associated with software failures could be reduced by about a third if additional effort were invested in testing infrastructure, particularly in the development of testing tools.
What is the direction of the main attack? What does "best practice" offer?
In the 1980s and 1990s the answer to this question went something like this. The most expensive mistakes are made in the first phases of the life cycle: errors in determining requirements, in the choice of architecture, in high-level design. We should therefore concentrate on finding mistakes in all phases, including the earliest, rather than waiting until they show up when the finished implementation is tested. The general thesis was: "Reduce the time between making a mistake and discovering it." The thesis is a good one, but not very constructive, because it gives no direct recommendations on how to reduce this time.
In recent years, with the advent of methods described by the epithet agile ("nimble", "quick"), new techniques for the early detection of development errors have been proposed and put into practice. For example, current models such as Microsoft Solutions Framework (MSF, www.microsoft.com/rus/msdn/msf) and eXtreme Programming (XP) include the following recommendations for the development of tests:
In other words, a mistake, whether in the requirements, the design, or the implementation, does not live longer than the first run of the test that checks the corresponding requirement. Hence, although the calendar time between making an error and discovering it may turn out to be long, little effort is wasted, because the implementation has not had time to go far.
We will not dwell on the validity of these recommendations or on their effectiveness. As often happens, a side effect of the innovation proved more important than the idea itself. In this case, the debate around "agile" methods led to a new understanding of the place of testing in the software development process. It turned out that testing in the broad sense, that is, developing tests, running them, and analyzing the results, does more than find errors already made in the code. A serious approach to testing helps prevent mistakes: if, before writing code, one thinks about what errors might be made in it and writes a test aimed at those errors, the quality of the code improves.
In the new life-cycle models, testing is, as it were, dissolved into the other phases of development. Thus, MSF contains no separate testing phase: tests are written and used all the time!
Thus, the various activities in the software production process must be well integrated with testing. Accordingly, testing tools should be well integrated with many other development tools. Among the major producers of programming tools, the first to understand this were Telelogic (a tool set for the design, simulation, implementation, and testing of telecommunications software, based on the SDL/MSC/TTCN notations) and Rational Software (a similar tool set, based predominantly on UML notation). The next step was taken by IBM, which began integrating the capabilities of the Rational tools into the Eclipse software development environment.
The XP thesis "Write the test before the implementation" is fine as a slogan, but in reality it is just as unconstructive. For large software systems, tests have to be developed for different purposes: unit tests, integration and component tests, and system tests.
Three components of testing: an excursion into theory
Unit tests exercise small modules (procedures, classes, etc.). When testing a relatively small unit of 100-1000 lines, it is possible to check, if not all, then at least many of the logical branches in the implementation, the different paths in the data-dependency graph, and the boundary values. Test coverage criteria are built accordingly (coverage of all statements, all logical branches, all boundary points, etc.).
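As a minimal sketch of what such a unit test looks like, the Python fragment below tests a hypothetical function `clamp` (invented for this illustration, not taken from any of the tools discussed here). The inputs are chosen so that every logical branch and both boundary points of the range are exercised.

```python
def clamp(value, low, high):
    """Hypothetical unit under test: restrict value to the range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return low
    if value > high:
        return high
    return value


def run_unit_tests():
    """Exercise every branch and boundary point; return the number of checks run."""
    cases = [
        # (value, low, high, expected)
        (-1, 0, 10, 0),    # below the range: "value < low" branch
        (0, 0, 10, 0),     # exactly on the lower boundary
        (5, 0, 10, 5),     # interior point: fall-through branch
        (10, 0, 10, 10),   # exactly on the upper boundary
        (11, 0, 10, 10),   # above the range: "value > high" branch
    ]
    for value, low, high, expected in cases:
        assert clamp(value, low, high) == expected

    # The error branch is a logical branch too and needs its own test.
    try:
        clamp(0, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an inverted range")
    return len(cases) + 1


run_unit_tests()
```

With six checks the four branches of `clamp` are all covered; a coverage tool would report this as full statement and branch coverage for the function, although that says nothing yet about how the function behaves inside a larger system.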
Verifying the correctness of each module does not, unfortunately, guarantee the correctness of the system built from those modules. The literature sometimes discusses the "classical" model of incorrectly organized testing of a modular system, often called the "great leap" method. The method consists in first testing each module separately, then combining them all into a system and testing the system as a whole. For large systems this is unrealistic: with this approach a great deal of time is spent localizing errors, and the quality of testing remains low. The alternative to the "great leap" is integration testing, in which the system is built in stages and groups of modules are added gradually.
The spread of component technologies has given rise to the term "component testing" as a special case of integration testing.
System testing is applied to the fully implemented software system. At this stage the tester is interested not in the correctness of individual procedures and methods but in the program as a whole, as the end user sees it. The tests are based on the general requirements for the program, including not only functional correctness but also performance, response time, and resilience to failures, attacks, user errors, and so on. System and component testing use specific kinds of test coverage criteria (for example, whether all typical scenarios are covered, all scenarios with abnormal situations, pairwise compositions of scenarios, etc.).
Having finished our excursion into methodology, let us return to the question of which testing tools are currently in use and how well they fit the new ideas about the place of testing in software development.
At present, the following stages are automated to the greatest extent: executing tests, collecting the resulting data, analyzing test coverage (for unit testing, information is usually gathered about covered statements and covered logical branches), and tracking the status of bug-fix requests.
The overview of test tools proceeds in reverse order, from system testing to unit testing.
Tools for testing applications with a graphical user interface are widespread. They are often called functional testing tools. If the application is not highly critical, testing can be limited to this kind, which is also the cheapest.
This form of testing makes wide use of record/playback tools; the best-known products include Rational Robot (IBM/Rational), WinRunner (Mercury Interactive), and QARun (Compuware). There are also tools for text-based terminal interfaces, for example QAHiperstation from Compuware.
For system load testing of Web applications and other distributed systems, LoadRunner from Mercury Interactive is widely used; it is not aimed at generating sophisticated test scripts, but it provides a wealth of material for performance analysis and for finding the bottlenecks that affect the performance of a distributed system.
The approximate overall scheme for using a record/playback tool is as follows:
The resulting script can be run many times, with small changes made as needed.
While recording the script, the tester can pause to specify which system responses should be considered correct in particular situations, which variations of user input are possible, and so on. When such variations are specified, the tool itself chooses one of the alternatives each time the test is played back. If the system's response does not match the expected one, an error is recorded.
However, the capabilities of this type of testing are limited:
The next class of tools is component testing tools. An example is Test Architect (IBM/Rational). These tools help organize the testing of applications built on one of the component technologies (e.g., EJB). A set of templates is provided for creating the various parts of a test program, in particular unit tests, scenarios, and stubs.
Does this tool support developing tests ahead of the implementation? In general, yes: to create a test it is enough to describe the interfaces of the components. But it has weaknesses, which are, however, shared by most other tools. The test scenario has to be written by hand. In addition, there is no unified way of specifying test coverage criteria or of linking those criteria to the functional requirements for the system.
The last class of tools discussed here is unit testing tools. An example is Test RealTime (IBM/Rational), designed for testing modules written in C++. An important component of this tool is a mechanism for checking assertions. Assertions make it possible to formulate requirements for the input and output data of functions and class methods as logical conditions; invariant requirements on object data can be stated in a similar form. This is a significant step forward compared with Test Architect. The assertion mechanism makes it possible to represent functional requirements systematically and to build test coverage criteria on the basis of those requirements (although Test RealTime does not provide automated support for coverage analysis).
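To make the idea of assertions concrete, here is a minimal sketch in Python rather than in the C++ notation of Test RealTime, whose actual syntax is not reproduced here. The class `BoundedStack` is hypothetical; it shows requirements on inputs (preconditions), on outputs (postconditions), and an invariant on the object's data, all stated as logical conditions.

```python
class BoundedStack:
    """Hypothetical class used only to illustrate assertion-style requirements."""

    def __init__(self, capacity):
        assert capacity > 0, "precondition: capacity must be positive"
        self._capacity = capacity
        self._items = []
        self._check_invariant()

    def _check_invariant(self):
        # Invariant on the object's data: size always stays within [0, capacity].
        assert 0 <= len(self._items) <= self._capacity, "invariant violated"

    def push(self, item):
        assert len(self._items) < self._capacity, "precondition: stack is not full"
        old_size = len(self._items)
        self._items.append(item)
        # Postcondition: size grew by one and the new item is on top.
        assert len(self._items) == old_size + 1 and self._items[-1] == item
        self._check_invariant()

    def pop(self):
        assert self._items, "precondition: stack is not empty"
        old_size = len(self._items)
        item = self._items.pop()
        # Postcondition: size shrank by one.
        assert len(self._items) == old_size - 1
        self._check_invariant()
        return item

    def size(self):
        return len(self._items)


stack = BoundedStack(2)
stack.push(1)
stack.push(2)
top = stack.pop()  # the last pushed item comes back first
```

Each assertion corresponds to one functional requirement, which is what allows coverage criteria to be phrased in terms of requirements rather than lines of code. Note that Python drops `assert` statements when run with the `-O` flag, so this style suits test builds only.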
In principle, this tool can be used for developing tests ahead of the implementation, but the same function remains unautomated as before, namely the actual generation of test inputs: this work must be done manually. Nor is there technical or methodological support for the reuse of tests and assertions.
A solution to these problems is offered by a new generation of tools that follow the approach of model-based testing, also called specification-based testing.
In the mind of the developer and of the tester there is always some "model" of the program's structure, as well as a "model" of its desired behavior, on the basis of which, among other things, lists of properties to be checked are drawn up and the corresponding test cases are created. (Note that these are different models; the first is often called architectural, the second functional or behavioral.) They are often compiled informally, on the basis of documents or discussions.
The development of models and specifications is associated with the "mathematization" of programming. Attempts to use various mathematical approaches for the design and even the generation of programs have been made since the earliest years of computing. Relative success was achieved in compiler theory, relational databases, and a few highly specialized fields; in most practical areas significant results were not achieved. Many came to regard formal methods in programming with skepticism.
A new surge of interest in formal methods occurred in the first half of the 1990s. It was prompted by the first results obtained from using formal models and formal specifications in testing.
The advantages of model-based testing were seen in the fact that:
However, there was no clarity as to the quality of such tests. Models are usually simpler than implementations, so it could be assumed that tests which "cover" the model well would cover real systems too poorly. Extensive experiments on real projects were required.
A model is a reflection of the structure and behavior of a system. A model can be described in terms of the system's states, its input stimuli, final states, data flows and control flows, the results returned by the system, and so on. Different sets of terms are used to reflect different aspects of the system. A formal specification is a complete description of the system model and of the requirements for its behavior in terms of some formal method. Several models in several formalisms can be used to describe the characteristics of a system. As a rule, the more general the modeling notation, the more difficulties arise in automating the testing of software against a model or specification written in that notation. Some notations and languages are oriented more toward accessibility and transparency of the description, others toward subsequent analysis and translation, in particular the translation of a specification into a test. Attempts have been made to develop formal specification languages that meet the requirements of industrial use (for example, the RAISE methodology), but they have not found widespread use.
There are several formal specification notations that are now classic: VDM, Z, B, CCS, LOTOS, and others. Some of them, e.g., VDM, are used primarily for rapid prototyping. The B language is convenient for analysis, in particular for analytical verification of models. All of these languages are widely used in university curricula. In industrial practice, UML is used to describe architectural models, and behavioral models are built with the SDL/MSC languages, executable UML diagrams, and notations close to them.
These languages and notations for behavioral models are, unfortunately, not general enough. They have worked well in telecommunications applications, but are practically useless for describing the functionality of "general-purpose" software systems: operating systems, compilers, database systems, etc.
The role of test development tools for such systems is claimed by a new generation of tools for describing models and specifications and for generating tests that check whether the behavior of an implementation conforms to a given model.
Test RealTime is one of the first representatives of this group. Jtest from Parasoft offers more capabilities. Conformiq offers an interesting tool. A family of model-based test development tools is offered by the Institute for System Programming in cooperation with the company ATS. Since the author is much more familiar with the UniTesK family, the general scheme of the model-based testing approach is presented here using UniTesK as the example.
The first phase is relatively short, but in real projects it is important. It is here that the level of abstraction of the model is set. The model should be as simple as possible: this is what makes it feasible to demand an exhaustive set of tests. At the same time, the model should be rich enough to expose the specifics of the implementation under test. Thus, the task of the first phase is to find a compromise between abstraction and detail.
The task of the second phase is to describe the requirements for the system's behavior. Many approaches (e.g., SDL) propose describing executable models, which can be regarded as prototypes of the future implementation. The requirements in this case are defined by the principle "the implementation must behave as the model does". The approach is understandable, but unfortunately in many real situations it does not work. For example, the header of a message built by the model shows one time, while the same header from the implementation shows a slightly different one. Is this a bug or not? Another example: the model of a memory management system returned a pointer to a free block of memory, and the real system returned a different pointer, because the model and the system operate in different address spaces. Is this a bug?
UniTesK proposes using so-called implicit specifications, or constraint specifications. They are given in the form of pre- and postconditions of procedures and invariant constraints on data types. This mechanism does not describe in the model the algorithms for computing the expected values of functions, only their properties. For example, in the memory management case the model would specify, in the postcondition, a Boolean expression of the kind "the value of the pointer belongs to the region of memory". A simple example of a postcondition for the "square root" function is shown in Fig. 2, where the same specification is presented in three different notations: in the style of C, Java, and C#. Using specification extensions of conventional programming languages instead of classical formal specification languages is a step that almost all developers of such tools are taking. They differ only in the expressive power of their notations and in their capabilities for analyzing and translating specifications.
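The idea of an implicit specification can be sketched in plain Python. This is an illustration only and is not the UniTesK notation; the function names and the tolerance values are assumptions made for the example. The postcondition states a property of the result (non-negative, and its square is close to the argument) without saying how the root is computed.

```python
import math


def sqrt_precondition(x):
    """Requirement on the input: the argument must be non-negative."""
    return x >= 0.0


def sqrt_postcondition(x, result, rel_tol=1e-9):
    """Property of the result, not an algorithm for computing it.

    The tolerances are assumptions chosen for this sketch.
    """
    return result >= 0.0 and math.isclose(
        result * result, x, rel_tol=rel_tol, abs_tol=1e-12
    )


def checked_sqrt(x, implementation=math.sqrt):
    """Call the implementation under test and check it against the specification."""
    assert sqrt_precondition(x), "precondition violated"
    result = implementation(x)
    assert sqrt_postcondition(x, result), "postcondition violated"
    return result


root = checked_sqrt(2.0)
```

Because the postcondition only constrains the result, the same specification accepts any correct implementation, which is exactly what is needed in cases like the memory manager above, where the model cannot predict the concrete value.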
The third phase is the development of test scenarios. In the simplest case a scenario can be written by hand, but for this group of tools that is bad form. A test, i.e., a sequence of calls to operations of the target system with suitable parameters, can be generated from some description of the program or data structure. We call such a description a scenario. Conformiq proposes describing a finite state machine. Different states of the machine correspond to different values of the target system's variables, and transitions correspond to calls of the system's operations. To define the machine means to describe, for every state, which state is reached from it when any given operation is invoked with any given parameters. If such a description is easy to obtain, nothing more is needed: the tool automatically generates the test and presents the test results, for example in the form of MSC diagrams. But is it easy to do for, say, a program with one integer variable and two or three operations? Most likely, yes. In the general case, however, it is impossible.
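For a program as small as the one just mentioned, the explicit state machine and a test generated from it can be sketched as follows. This is a simplified illustration in Python and does not reflect the input format or the algorithm of the Conformiq tool; the automaton `COUNTER_FSM` (a counter limited to 0..2) and the greedy tour strategy are assumptions of the example.

```python
from collections import deque

# Hypothetical explicit automaton: state -> {operation: next_state}.
# It models a counter whose value is limited to the range 0..2.
COUNTER_FSM = {
    0: {"inc": 1, "reset": 0},
    1: {"inc": 2, "dec": 0, "reset": 0},
    2: {"dec": 1, "reset": 0},
}


def path_to_uncovered(fsm, start, covered):
    """Breadth-first search for the shortest path ending in an uncovered transition."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        for op, nxt in sorted(fsm[state].items()):
            if (state, op) not in covered:
                return path + [(state, op, nxt)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(state, op, nxt)]))
    return None  # every reachable transition is already covered


def generate_transition_tour(fsm, initial):
    """Build one call sequence that passes through every reachable transition."""
    covered, tour, state = set(), [], initial
    while True:
        path = path_to_uncovered(fsm, state, covered)
        if path is None:
            return tour
        for src, op, dst in path:
            covered.add((src, op))
            tour.append((src, op, dst))
        state = path[-1][2]


tour = generate_transition_tour(COUNTER_FSM, 0)
```

The resulting `tour` is a single connected sequence of operation calls; replaying it against the real counter and comparing the observed state after each call with the expected `dst` gives a test that covers all eight transitions. The sketch also shows the weakness noted above: the dictionary has to list every state explicitly, which stops being practical as soon as the state space grows.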
In UniTesK the finite automaton used to generate test sequences is not described in advance but is built as the test executes. All that is required of the test developer is a method for computing the model state from the state of the target system and a method for enumerating the test stimuli applicable in the current state. These computations are written in the test scenarios. The next test stimulus is chosen on the basis of the scenario specification, depending on the results of the previous stimuli. This approach has two important advantages. First, it allows complex test sequences to be described in an extremely compact form that is easy to write and understand. Second, the tests become highly flexible: they can easily be parameterized according to current testing needs and can even adapt automatically to minor changes in the model. Fig. 3 shows an example of a scenario method.
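The principle of building the automaton during test execution can be illustrated with a small Python sketch. It is not the UniTesK engine or its notation: the target class `Counter`, the helper names, and the traversal strategy are all assumptions made for the example, and the sketch further assumes that the target behaves deterministically. The test developer supplies only `model_state` and `stimuli`; the graph of states is learned as stimuli are applied.

```python
from collections import deque


class Counter:
    """Hypothetical target system: an integer kept within 0..limit."""

    def __init__(self, limit=2):
        self.value, self.limit = 0, limit

    def inc(self):
        if self.value < self.limit:
            self.value += 1

    def dec(self):
        if self.value > 0:
            self.value -= 1


def model_state(target):
    # Written by the test developer: compute the model state from the target.
    return target.value


def stimuli(state):
    # Written by the test developer: stimuli to try in the given state.
    return ["inc", "dec"]


def route_to_unfinished(graph, start):
    """Search the learned graph for a state that still has untried stimuli."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        known = graph.get(state, {})
        if any(s not in known for s in stimuli(state)):
            return path
        for stim, nxt in known.items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [stim]))
    return None


def apply(target, state, stim, graph, trace):
    """Apply one stimulus to the target and record the observed transition."""
    getattr(target, stim)()
    nxt = model_state(target)
    graph.setdefault(state, {})[stim] = nxt
    trace.append((state, stim, nxt))
    return nxt


def explore(target):
    """Apply stimuli until every stimulus has been tried in every reachable state."""
    graph, trace = {}, []
    state = model_state(target)
    while True:
        route = route_to_unfinished(graph, state)
        if route is None:
            return graph, trace
        for stim in route:  # replay known transitions to reach the unfinished state
            state = apply(target, state, stim, graph, trace)
        untried = next(s for s in stimuli(state) if s not in graph.get(state, {}))
        state = apply(target, state, untried, graph, trace)


graph, trace = explore(Counter(limit=2))
```

Nothing in the scenario code mentions the number of states: changing `limit` changes the discovered automaton without any change to `model_state` or `stimuli`, which is one way to read the claim that such tests adapt to minor changes automatically. In a real tool each applied stimulus would also be checked against the postconditions of the specification.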
In general, a test scenario describes iterators for all methods of a given class, but each time the developer solves only a local problem: how to enumerate the input parameters of a single method. The general problem, namely how to organize the sequence of calls, how many times to return to the same state in order to test yet another method or yet another parameter value, and when to stop so as not to do extra work, is taken on entirely by the tool.
UniTesK uses a single test architecture, suitable for testing systems of varying complexity from different subject areas and ensuring the scalability of tests. The test components that must be written by a person are separated from the library components and from those generated automatically.
In real systems the number of distinct states and the number of test stimuli admissible in each state are very large, which leads to a combinatorial "state explosion". To combat this effect, a model factorization mechanism has been developed: states of the target system whose differences are insignificant for the purposes of the given test are combined into one generalized model state, and test stimuli are grouped in a similar way. Factorization gives developers room for creativity; nevertheless, it rests on rigorous research that has established sufficient conditions guaranteeing the correctness of the results and a significant reduction in testing time while preserving the achieved test coverage.
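A toy illustration of factorization, in Python: suppose the target is a hypothetical bounded queue and that, for the purposes of a given test, only three situations matter: the queue is empty, partly filled, or full. The grouping function below is an assumption of the example; it does not show the sufficient conditions mentioned above, only the reduction in the number of states.

```python
def generalized_state(size, capacity):
    """Map a concrete queue length to one of three generalized model states."""
    if size == 0:
        return "empty"
    if size == capacity:
        return "full"
    return "partial"


def state_counts(capacity):
    """Return (number of concrete states, number of generalized states)."""
    concrete = range(capacity + 1)
    generalized = {generalized_state(size, capacity) for size in concrete}
    return len(concrete), len(generalized)


counts = state_counts(1000)  # (1001, 3)
```

A test that must visit every state now has three states to reach instead of 1001. Whether such a grouping preserves the achieved coverage depends on the test objectives, which is why the choice is left to the developer.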
The creators of UniTesK, believing that testing should not require a separate development environment, not only made it possible for its notations to mimic a variety of programming languages, but also integrated its component tools into popular software development environments.
The new quality promised by the new tools
As noted above, the creators of testing tools usually face the following challenges:
Do testing tools that generate tests from a model or a formal specification of the target system have fundamental advantages over traditional tools? To answer this question, we show how tools that use models solve the problems mentioned above.
Test coverage criteria. The main criterion is checking all assertions, in particular the assertions that define the postconditions of procedures or methods. It is easy to check and easy to link to the functional requirements for the target system. For instance, the UniTesK tools for Java and C# provide four levels of nested criteria.
Reuse of tests. The level of reuse is significantly higher than with traditional tools. The test developer writes not a test script but test criteria in the form of assertions, plus a test scenario. Both are free of many implementation details, so they are easier to reuse for a new version of the target system or to adapt for a similar project. For example, UniTesK usage statistics show that the level of reuse for tests of the kernels of different operating systems exceeds 50%.
Automatic test generation. This is the main advantage of the new tools; here they are significantly ahead of traditional tools, because they use not arbitrary notations and techniques of modeling and specification but precisely those that give an advantage in automatic test generation. Thus, assertions make it possible to generate test "oracles", programs that automatically analyze the correctness of a result, while various kinds of finite automata or their equivalents make it possible to generate test sequences. In addition, because models are usually simpler than implementations, they can be analyzed more thoroughly, so the test suite becomes more systematic.
The tools described above have been tried out on real, large-scale projects. Of course, every project has its own specifics, which may prevent exhaustive testing. However, experience with these tools shows that it is usually possible to achieve good results, better than those obtained under similar conditions with manual testing. UniTesK users usually accept 70-80% code coverage of the target system as an acceptable level of quality, on condition that, at a minimum, the criterion of covering all logical branches in the postconditions is met. For some complex programs (including the optimization unit of the GCC compiler), coverage of 90-95% has been reached.
Are there any fundamental limitations on the applicability of this approach? It is almost impossible to apply when, for whatever reason, nobody in the chain of customer, developer, and tester was able or willing to articulate the requirements for the target system. However, this is not only a limitation but also an additional incentive to improve the development process, and one more reason to explain to the customer that the investment in the design phase is more than repaid by the reduction in overall development time and project cost.