Essays on Data Validation


Why validate data at all?

It might seem that "invalid" data, that is, data that do not satisfy certain constraints, can cause a program to malfunction. But what does that actually mean? Suppose that at some point the program throws an exception while trying to convert a malformed string to a number. If the exception is never caught, the program may crash. But that is a highly unlikely scenario. Far more likely, a handler somewhere up the call stack intercepts the exception. It then either shows the user an error message or writes an entry to the error log, and the program tries to recover from the failure and carry on. In other words, even without validation, chances are that nothing bad will happen.

Still, a lack of validation can have negative consequences. Let's take a closer look at the problems that may arise.

  • Inability to recover from a failure. It is not always possible to "put things back." Before the failure, the program may already have performed irreversible actions. It may have deleted a file, sent something over the network, or printed something. It may even have started a machine tool that has already partially machined a workpiece. And even where recovery is possible in principle, the recovery algorithm itself may be wrong, which sometimes has very serious consequences.
  • Additional load on the system. Recovering from a failure is extra work, and all the work done before the failure is wasted as well. That is an additional burden on the system, which could have been avoided by checking the data in advance. On the other hand, validation is also extra load. Recovery is needed only occasionally, while checks must run every time, so it is not obvious which approach is cheaper.
  • Injections do not cause crashes. One of the main ways of exploiting vulnerabilities is to "trick" the validators. The attacker passes data that the validator accepts as correct but that is then interpreted in an unintended way. This can give the attacker unauthorized access to data or to certain features of the program, or the ability to destroy data or programs. Such an attack crashes nothing, so relying on failure recovery offers no protection. With no validation at all, the attacker's task becomes as easy as it can be.
  • Difficulty identifying the cause of a problem. If an exception surfaces from somewhere deep inside the program, finding its root cause is not easy. Even when it can be found, it may be hard to explain to the user that the malfunction was caused by data they entered some time ago, in a completely different part of the program. If the data are checked immediately after entry, identifying the source of the problem poses no difficulty.

In short, a lack of validation can lead to the problems listed above, and possibly to others. Conversely, validation helps prevent serious failures and makes problems easier to diagnose. The price is performance, because the additional checks increase the load on the system. This brings us to the second question: how to reduce that load.

Where and when should data be validated?

As mentioned above, if load were the only concern, the best option would be not to validate data at all.

But if validation is needed, common sense suggests checking the data at the point where they enter the program from the outside world. Once the data have passed such a check, we can be sure the program has received correct information. From then on, the data can be used without further checks.

The entry point may be a user interface through which a person enters data. It may be a file containing the program's settings or the data it is supposed to process. It may be a database into which other programs also write information. It may be a network protocol used to communicate with other programs. Finally, it may be a programming interface that another program uses by calling functions or procedures and passing them parameters.
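Here is a minimal sketch of validating at the entry point. It assumes a program whose settings arrive as JSON text. The hypothetical `load_settings` function checks the raw input once and returns an immutable `Settings` object, so later code can use `settings.port` without re-checking it. The field names, the allowed log levels and the port range are illustrative assumptions, not part of the original text.

```python
import json
from dataclasses import dataclass

# Hypothetical set of accepted log levels (an assumption for this sketch).
ALLOWED_LEVELS = frozenset({"debug", "info", "warning", "error"})


@dataclass(frozen=True)
class Settings:
    """Settings that have already passed validation."""
    port: int
    log_level: str


def load_settings(text: str) -> Settings:
    """Validate raw JSON once, at the boundary; raise ValueError on bad input."""
    try:
        raw = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"settings are not valid JSON: {exc}") from exc
    if not isinstance(raw, dict):
        raise ValueError("settings must be a JSON object")

    port = raw.get("port")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        raise ValueError(f"port must be an integer in 1..65535, got {port!r}")

    level = raw.get("log_level", "info")
    if not isinstance(level, str) or level not in ALLOWED_LEVELS:
        raise ValueError(f"log_level must be one of {sorted(ALLOWED_LEVELS)}, got {level!r}")

    return Settings(port=port, log_level=level)
```

In this sketch only `load_settings` creates `Settings`, so holding a `Settings` object suggests the checks have run. Python does not enforce this, though: other code could still construct `Settings` directly.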

Alas, common sense sometimes has to give way to reality. Checking data at the point of entry is sometimes not just impractical but outright impossible. Here are some of the reasons.

  • Validation requires access to a part of the system that is not available at that point. This applies especially to data that a person enters through a graphical user interface. Modern applications are often built on a layered architecture. The UI is isolated in the presentation layer, but validation may need access to other layers, all the way down to the database layer. This is particularly evident in Web applications, where the user interface runs in the browser on the client side while validating the input requires comparing it with what is stored in the database. In this situation, validation has to happen after the data are sent to the server. (With the advent of AJAX, however, this problem has been partially solved.)
  • Validation requires fully replicating the processing logic. As noted above, in a multi-tier architecture the user interface usually lives in a separate presentation layer, while the data-processing logic resides in another layer. Sometimes validation would mean running almost the entire process, because there is no shorter way to find out whether it will complete successfully.

How do I perform data validation?

Wherever validation is performed, it can be done in several ways, depending on which constraints are imposed on the data.

  • Character-by-character checks. Such checks are typically performed in the user interface while data are being entered, but not only there. For example, a compiler detects invalid characters as it reads the source file. Such checks can therefore be called "lexical."
  • Checks of individual values. In a user interface, this means checking the value of a single field. The check can run while the user types, on the incomplete value entered so far. It can also run when input is finished and the field loses focus. For a programming interface (API), it is a check of one of the parameters passed to a subroutine. For data from a file, it is a check of an individual fragment read from the file. Again by analogy with compiler terminology, such checks might be called "syntactic."
  • Checks of a set of input values. Suppose the program first receives some data and is then given a signal to start processing. For example, a user fills in a form or a series of forms (a so-called "wizard") and finally clicks the OK button. At this point, so-called "semantic" validation can be performed. It checks not only the individual values but also the relationships and mutual constraints between them. Each individual value may well be "syntactically" correct while together they form an inconsistent set. In a user interface, this kind of validation checks the whole set of input values. For a procedure, it checks the full set of its parameters, and for data received from a file, all the data that have been read.
  • Checking the state of the system after processing. Finally, there is one last method to fall back on if the input data cannot be validated directly: process them, but keep the ability to return everything to its original state. Such a mechanism is often called transactional.

A transaction is a sequence of actions that either all complete successfully, or, if any individual action fails, have the results of all previous steps undone. Validation can thus be performed during the transaction, with a final check at the end of processing. In this case we validate not the data themselves but the state that results from processing them. If that state does not satisfy certain constraints, we declare the input data invalid and return everything to its original state.
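Here is a minimal sketch of the transactional approach, using Python's built-in `sqlite3` module and a hypothetical `accounts` table. The input (a transfer amount) is not checked up front. Instead, the resulting state is checked inside the transaction, and a violation rolls back every step. The table, the `transfer` and `balances` functions and the "no negative balance" rule are illustrative assumptions.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from `src` to `dst`, validating the resulting state.

    Used as a context manager, the connection commits if the block finishes
    normally and rolls back if an exception escapes it.
    """
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # Validate the state produced by processing, not the input itself.
        (negatives,) = conn.execute(
            "SELECT COUNT(*) FROM accounts WHERE balance < 0"
        ).fetchone()
        if negatives:
            raise ValueError("transfer would leave a negative balance")


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_demo_db()
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))  # {'alice': 70, 'bob': 30}
    try:
        transfer(conn, "alice", "bob", 500)
    except ValueError as exc:
        print("rejected:", exc)
    print(balances(conn))  # unchanged: {'alice': 70, 'bob': 30}
```

The invariant is checked inside the same transaction as the updates, so a failed check leaves no partial update behind.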

Which validation method should be used in a particular case? Usually it is neither possible nor necessary to limit oneself to one method. Data can and should be validated in several stages, with the checks becoming progressively more complex.

First, during input, make sure the data contain no invalid characters. For example, in a numeric field the user can be prevented from typing non-numeric characters.
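Here is a minimal sketch of this lexical check. It assumes keystrokes arrive one character at a time. The hypothetical `filter_keystroke` function drops any character that is not an ASCII digit, so the field never contains anything else. Real UI toolkits provide their own hooks for this, and none of them is shown here.

```python
# An explicit digit set is used instead of str.isdigit(), because isdigit()
# also accepts characters such as "²" that int() cannot parse.
DIGITS = frozenset("0123456789")


def filter_keystroke(current: str, char: str) -> str:
    """Return the field text after one keystroke; non-digits are ignored."""
    return current + char if char in DIGITS else current


def type_text(keys: str) -> str:
    """Simulate typing `keys` into an empty numeric field."""
    text = ""
    for ch in keys:
        text = filter_keystroke(text, ch)
    return text


print(type_text("4a2!"))  # 42
```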

Once input is complete, the whole value can be checked. The number entered may be subject to certain constraints. For example, it must not exceed some maximum allowed value. If the field holds an age, it should be in the range from 0 to, say, 120.
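Here is a minimal sketch of the whole-value ("syntactic") check for the age example, run when the field loses focus. The hypothetical `parse_age` function converts the text and enforces the 0 to 120 range mentioned above. It repeats the digit check, because the text may have reached the field by a route the keystroke filter does not see, such as pasting.

```python
MIN_AGE, MAX_AGE = 0, 120  # bounds taken from the example in the text


def parse_age(text: str) -> int:
    """Check the completed field value as a whole and return it as an int."""
    if not text:
        raise ValueError("age is required")
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"age must be a whole number, got {text!r}")
    age = int(text)
    if not MIN_AGE <= age <= MAX_AGE:
        raise ValueError(f"age must be between {MIN_AGE} and {MAX_AGE}, got {age}")
    return age
```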

When all the fields are filled in, you can check whether the entered values are compatible with each other. For example, suppose the form has a passport-number field in addition to the age field. The application can then verify that, if a passport number is filled in, the age is at least 14.
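Here is a minimal sketch of the cross-field ("semantic") check from the passport example. The hypothetical `PersonForm` holds values that have already passed their individual checks. The `check_form` function returns a list of inconsistencies instead of stopping at the first one, so the user can see all of them at once. The 14-year threshold comes from the example; the rest is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

PASSPORT_MIN_AGE = 14  # threshold taken from the example in the text


@dataclass
class PersonForm:
    age: int                        # already checked individually (0..120)
    passport_number: Optional[str]  # None when the field was left empty


def check_form(form: PersonForm) -> List[str]:
    """Semantic validation: return cross-field errors (empty list if consistent)."""
    errors = []
    if form.passport_number and form.age < PASSPORT_MIN_AGE:
        errors.append(
            f"a passport number requires age >= {PASSPORT_MIN_AGE}, got {form.age}"
        )
    return errors
```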

Finally, if everything has been entered correctly, you can start processing. Perform checks along the way and at the end, and if something goes wrong, roll back to the original state.

Of course, each later level of checks can, as a safeguard, repeat the checks of the earlier levels. For example, in Web applications the data arriving at the server in an HTTP request must always be checked, whether or not they were validated in the browser beforehand. The reason is that client-side checks can be bypassed.
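Here is a minimal sketch of server-side re-validation. It assumes the form arrives as an `application/x-www-form-urlencoded` request body. The server parses the raw body with `urllib.parse.parse_qs` and repeats the digit, range and cross-field checks itself instead of trusting the browser. The field names `age` and `passport` and the function `validate_request_body` are hypothetical.

```python
from urllib.parse import parse_qs


def validate_request_body(body: str) -> dict:
    """Re-run every check on the server; browser-side checks may have been bypassed."""
    fields = parse_qs(body, keep_blank_values=True)
    errors = []

    # Individual value: exactly one age, ASCII digits only, within 0..120.
    age = None
    age_values = fields.get("age", [])
    if len(age_values) != 1:
        errors.append("exactly one 'age' value is required")
    else:
        raw = age_values[0]
        if raw.isascii() and raw.isdigit() and 0 <= int(raw) <= 120:
            age = int(raw)
        else:
            errors.append(f"age must be a whole number in 0..120, got {raw!r}")

    # Cross-field constraint: a passport number requires age >= 14.
    passport = fields.get("passport", [""])[0].strip()
    if passport and age is not None and age < 14:
        errors.append("a passport number requires age >= 14")

    if errors:
        raise ValueError("; ".join(errors))
    return {"age": age, "passport": passport or None}
```

Collecting all errors before raising lets the server report every problem in a single response.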
