What are they?
Non-functional requirements define how the system under test operates rather than how it behaves; requirements that define behaviour are known as functional requirements.
The categories under which non-functional requirements are grouped are numerous, with a degree of overlap. We are going to attempt to demystify some of these, whilst articulating how they can be tested and some of the common pitfalls.
The world of non-functional testing can be murky and ambiguous in contrast to its functional counterpart where expected functionality and behaviour is easier to define.
Hopefully this blog post will give some insight into how non-functional requirements can be tackled and made testable.
The ‘ilities’ are a collective name for system quality attributes.
They define how non-functional requirements are grouped. The name is a little misleading, as a number of them do not end in ‘ility’.
- Disaster Recovery.
There are many more ‘ilities’, but there is significant overlap and we genuinely believe that the majority of a system’s non-functional requirements can be covered using the categories defined in the next section.
What they mean
Let’s look at each one in turn…
Performance is fairly self-explanatory in that it is a measure of an application, component or service’s ability to respond within a defined period of time, with examples being:
- A REST service call must respond within 500 milliseconds,
- A synchronous database request must respond within 1000 milliseconds,
- A batch process must complete its processing within 60 minutes.
In our examples we have a component and a clear response time, so these are all good examples of Performance non-functional requirements.
Our performance tests only become relevant when run under load and concurrency; more on this later.
As we will be dealing with a large sample size of response times, and to ensure we ignore any anomalous results, a sensible way to measure response times is to use the 95th percentile.
A percentile is a measure used in statistical analysis: measuring the 95th percentile means that the upper 5% of the samples are ignored, and the greatest response time from the remaining 95% of samples is the one we compare against our requirements.
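As a minimal sketch of the calculation described above, the function below uses the nearest-rank method: sort the samples, ignore the upper 5%, and take the greatest remaining value. The function name and the sample response times are made up for illustration, and load testing tools may use slightly different interpolation rules.

```python
def percentile_nearest_rank(samples, percentile=95):
    """Return the greatest sample left once the top (100 - percentile)% are ignored.

    percentile is expected to be a whole number between 1 and 100.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    ordered = sorted(samples)
    # Nearest-rank: ceil(percentile / 100 * n), done with integer arithmetic
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]


# 20 hypothetical response times in milliseconds, including one slow outlier
response_times_ms = [120] * 10 + [300] * 9 + [4000]
p95 = percentile_nearest_rank(response_times_ms)
print(p95)  # the single 4000 ms outlier falls in the ignored upper 5%
```

With 20 samples the upper 5% is exactly one sample, so the outlier is dropped and the value compared against the requirement is the slowest of the other 19.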
Our examples above can be revised to be:
- A REST service call must respond within 500 milliseconds, at the 95th percentile,
- A synchronous database request must respond within 1000 milliseconds, at the 95th percentile,
- A batch process must complete its processing within 60 minutes.
Notice we have not changed the batch process requirement. Whilst it may affect a significant number of records, it is still a single execution, so a percentile calculation is not relevant.
Volume is the level of load and concurrency under which our application has to perform, examples being:
- A REST service must be able to support 100 requests a second,
- The application must be able to support 1000 concurrent users,
- The database must be able to support 10 million records,
- The message queues must be able to support 100 messages per second.
To make these volume requirements relevant we must acknowledge that, under these levels of load, we must still meet our performance requirements. Our examples above can be revised to:
- A REST service must be able to support 100 requests a second and still meet its response time requirement,
- The application must be able to support 1000 concurrent users and still meet its response time requirement,
- The database must be able to support 10 million records,
- The message queues must be able to support 100 messages per second with no backlogging of message requests.
Notice we have not changed our database record size requirement. We will discuss how we further combine non-functional requirements later to make them testable.
Scalability is about increasing the level of load and concurrency to ensure the application can support predicted growth over a number of years.
Examples of non-functional requirements that fall into this category are:
- The application must be able to support an annual transactional growth rate of 10%, and still meet all defined transactional performance requirements,
- The database must be able to support an annual growth rate of 20%, with no degradation in database performance,
- The application must be able to support a 10% growth in user concurrency, and still meet all defined transactional performance requirements.
We have already seen two examples of revising non-functional requirements to give them context; the examples above, and all those from now on, include this technique.
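To turn the growth rates in the Scalability examples into concrete load targets, the rate can be compounded year on year. The sketch below assumes a current volume of 100 requests a second and a three-year horizon; both figures, and the function name, are illustrative rather than taken from any real system.

```python
def projected_volume(current, annual_growth_rate, years):
    """Compound the current volume by the annual growth rate for each year."""
    return current * (1 + annual_growth_rate) ** years


current_rps = 100  # assumed current peak, in requests a second
for year in range(1, 4):
    target = projected_volume(current_rps, 0.10, year)
    print(f"year {year}: {target:.1f} requests a second")
```

The year-three figure is the load at which the transactional performance requirements still need to be met.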
The Maintenance non-functional requirements are there to ensure that the application can not only be maintained once delivered into production, but also meets any regulatory requirements it may face; examples being:
- All log files must be regularly cycled to ensure application disk space is recovered,
- An audit record for each business transaction is written to the audit log,
- Log files will be pushed to an alternative tool to make them readily available to all system users.
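As one illustration of the log cycling requirement, Python's standard library `RotatingFileHandler` rolls a log file over once it reaches a size limit and keeps a fixed number of old files, which caps the disk space used. The logger name, file name and size limits below are assumptions chosen to keep the demonstration small.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_rotating_logger(path, max_bytes=1_000_000, backup_count=5):
    """Create a logger whose file is cycled once it reaches max_bytes."""
    logger = logging.getLogger("app.sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Demonstration in a temporary directory with deliberately tiny limits
with tempfile.TemporaryDirectory() as log_dir:
    logger = build_rotating_logger(
        os.path.join(log_dir, "app.log"), max_bytes=500, backup_count=2
    )
    for i in range(200):
        logger.info("business transaction %d processed", i)
    log_files = sorted(os.listdir(log_dir))
    # Release the file handles before the directory is removed
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)

print(log_files)
```

A test for this requirement can then assert that the number of files never exceeds the configured backup count plus the live file.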
This categorisation of non-functional requirements is about how the application or system will be supported in production once it is promoted.
They include making sure that prior to going live your production monitoring tools have been tested to ensure they pick up on application failure and warnings, with an example being:
- The application server will be monitored for transaction response times exceeding their non-functional requirements at the 95th percentile, for longer than 1 minute.
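A monitoring rule of this shape can be tested before go-live by feeding it known data. The sketch below assumes the monitoring tool produces one 95th percentile reading per second; the function name, the readings and the 500 millisecond threshold are hypothetical.

```python
def breaches_alert_rule(p95_by_second_ms, threshold_ms, max_breach_seconds=60):
    """Return True if the p95 stays above the threshold for longer than
    max_breach_seconds consecutive one-second readings."""
    consecutive = 0
    for p95 in p95_by_second_ms:
        # Count consecutive breaching readings; any good reading resets the run
        consecutive = consecutive + 1 if p95 > threshold_ms else 0
        if consecutive > max_breach_seconds:
            return True
    return False


short_spike = [450] * 30 + [700] * 60 + [450] * 30  # exactly 1 minute over
sustained = [450] * 30 + [700] * 61 + [450] * 30    # longer than 1 minute
print(breaches_alert_rule(short_spike, threshold_ms=500))
print(breaches_alert_rule(sustained, threshold_ms=500))
```

Only the second series should raise an alert, because the requirement is worded as "longer than 1 minute".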
Utilisation is about the impact of high volumes of transactions on your infrastructure; you need to determine, through suitable non-functional requirements, what the ceilings may be. Examples include:
- The application server shall not exceed 50% CPU consumption under levels of peak volume and concurrency,
- The database server shall not exceed 90% of its available memory without performing a Garbage Collection.
Utilisation is a nice segue into Resilience, which is next; the reason for this is explained below.
Resilience is all about loss of servers or recovery from failure.
The reason that the Utilisation non-functional requirements overlap with Resilience is that the CPU consumption and memory footprint of your application under load, at peak times, must be supportable on your infrastructure under conditions of failure.
To outline how these requirements may be defined see the examples below:
- Loss of a single application server must not have a detrimental effect on the performance of defined transactional requirements, at levels of peak load and concurrency, or compromise the server CPU or Application Memory,
- The database must replicate in real-time to a standby instance, unless an active–active configuration is configured.
The database non-functional requirement is important: whilst it is not necessarily to do with supporting load during failure, it is about the ability to recover from a database server failure should you only have a single instance.
It is important to note that in the world of pods and containers failure of services and servers is handled by the technology.
Whilst Resilience is important, recovery from failure only needs to be proven under test conditions and does not form part of a manual activity as it once would have.
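The overlap between Utilisation and Resilience can be made concrete with some simple arithmetic. The sketch below is a deliberate simplification: it assumes load is balanced evenly across identical servers and that CPU consumption scales linearly with load, neither of which is guaranteed in practice. The function name and server counts are illustrative.

```python
def cpu_after_server_loss(servers, cpu_per_server_pct, servers_lost=1):
    """Estimate per-server CPU once the same load is spread over fewer servers."""
    remaining = servers - servers_lost
    if remaining <= 0:
        raise ValueError("at least one server must remain")
    # Total work is unchanged; it is divided between the remaining servers
    return cpu_per_server_pct * servers / remaining


for server_count in (2, 4):
    estimate = cpu_after_server_loss(server_count, cpu_per_server_pct=50)
    print(f"{server_count} servers at 50% CPU, one lost: {estimate:.1f}% each")
```

Under these assumptions a two-server cluster running at the 50% ceiling has no headroom left after losing a server, which is why the utilisation ceiling and the resilience requirement have to be considered together.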
Security is hugely important, and the ramifications of not considering it are significant. Things to think about are:
- The application can prevent cross-site scripting attacks,
- The application does not store hard-coded sensitive information.
There are a significant number of internal security non-functional requirements that you can test for.
Some tests, penetration tests being an example, are better outsourced to specialist companies.
This is because they have the specialist knowledge required to perform the tests, and because service attack tests must be done from outside the company’s estate in order to be valid.
Availability is about ensuring the application performs over an extended period of time.
The response times of an application, which we have discussed in the performance section, must not degrade the longer the application is available, with some examples being:
- The application must be able to support a continuous level of availability under levels of normal operating volumes and concurrency, with no application performance degradation over a period of time between planned application restarts,
- The application CPU and Memory consumption must not degrade over a period of time between planned application restarts,
- The application must be available between 7am and 10pm, 7 days a week.
We discuss planned application restarts in a number of these requirements. If your application is 24/7/365 then this obviously makes this non-functional requirement complicated to test, and this will be discussed in later sections.
Recoverability and Disaster Recovery
We will discuss these non-functional requirement groupings together as there is a significant amount of common ground and overlap between the two, with examples below:
- The database must run a full backup once a week with incremental backups daily,
- The database backup must run after any overnight batch processing and not compromise the batch window,
- A disaster recovery test will be scheduled and executed annually to ensure recovery from primary site failure is achievable, and the steps to perform this process are well documented and regularly revised.
Infrastructure as code is important as it allows you to build environments from machine-readable definition files when developing using DevOps principles. This should be considered when starting to build your recovery strategies.
Making them testable
We have discussed the various groupings of non-functional requirements; now let’s discuss how we make these testable.
Let’s take an example from our definitions section:
We had a Performance non-functional requirement that stated
A REST service call must respond within 500 milliseconds, at the 95th percentile
and a Volume non-functional requirement that stated
A REST service must be able to support 100 requests a second and still meet its response time requirement
So our testable requirement is
A REST service must be able to support 100 requests a second, and must respond within 500 milliseconds, at the 95th percentile
We can add to this by including a Utilisation non-functional requirement
The application server shall not exceed 50% CPU consumption
Our testable requirement is now expanded to give us
A REST service must be able to support 100 requests a second, and must respond within 500 milliseconds at the 95th percentile, and the application server shall not exceed 50% CPU consumption
It is an important point that non-functional requirements do not need to be tested in isolation. Some can be, but for the most part they cannot, and you need to combine them to make them testable.
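One way to picture a combined requirement is as a single check with several conditions that must all hold for the same test run. The sketch below is only an illustration: the `LoadTestResult` structure, the function name and the result figures are hypothetical, and the thresholds are passed in as parameters so they can be taken from whatever the agreed requirement says.

```python
from dataclasses import dataclass


@dataclass
class LoadTestResult:
    """Summary figures from one test run (hypothetical structure)."""
    requests_per_second: float
    p95_response_ms: float
    peak_cpu_pct: float


def check_combined_requirement(result, min_rps, max_p95_ms, max_cpu_pct):
    """Return the failed conditions; an empty list means the requirement is met."""
    failures = []
    if result.requests_per_second < min_rps:
        failures.append(f"throughput {result.requests_per_second} is below {min_rps} requests a second")
    if result.p95_response_ms > max_p95_ms:
        failures.append(f"95th percentile {result.p95_response_ms} ms exceeds {max_p95_ms} ms")
    if result.peak_cpu_pct > max_cpu_pct:
        failures.append(f"CPU {result.peak_cpu_pct}% exceeds {max_cpu_pct}%")
    return failures


# Made-up figures: volume and response time are met, CPU is not
run = LoadTestResult(requests_per_second=104.0, p95_response_ms=480.0, peak_cpu_pct=57.0)
failures = check_combined_requirement(run, min_rps=100, max_p95_ms=500, max_cpu_pct=50)
print(failures or "requirement met")
```

Reporting every failed condition, rather than stopping at the first, shows which category of non-functional requirement a run fell short on.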
So we’ve discussed what non-functional requirements might look like, and we’ve discussed how we make them testable by combining different categories of non-functional requirements.
Testing them however needs discussing because it’s not always that obvious.
For the most part you are going to have to automate these tests, as you will be running under levels of load and concurrency. This may be obvious, but it is worth pointing out.
How to do this is not within the scope of this blog post but there are many examples in the OctoPerf blog pages.
In our DevOps world we shift-left our performance testing and test our services in isolation and this is important to de-risk performance issues.
Ultimately we need to build a number of well thought out scenarios using our testable non-functional requirements to ensure our system under test meets our defined non-functional requirements.
Let’s talk about these scenarios.
Peak Hour Load
This is one of the most important scenarios there is, where peak volumes of load and concurrency are combined to mirror those at the seasonal peak of your application.
You would look to satisfy the majority of your Performance and Volume non-functional requirements with this scenario, as well as ensuring that the Maintenance and Utilisation ones are satisfied.
There is always the option to re-run the Peak Hour Load Scenario under limited server resources to satisfy the Resilience non-functional requirements.
Soak Tests run over a protracted period of time to ensure that our Availability non-functional requirements are met, and a sensible duration for this test must be defined.
We would recommend running for 2 overnight cycles, and the days in between, to give us a 3-day test that satisfies the non-functional requirements for any Soak Test.
A blog on this subject is available on the OctoPerf blog pages.
Soak Tests should be run at average levels of load and concurrency, which is normally 60% of the Peak Hour Load test volumes.
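Degradation over a soak test can be hard to spot by eye, so one option is to fit a straight line through the readings and look at its slope. The sketch below uses an ordinary least-squares slope over hourly 95th percentile readings; the function name and both data series are made up, and what counts as an acceptable slope is something you would need to agree for your own system.

```python
def trend_per_hour(hourly_samples):
    """Least-squares slope of evenly spaced hourly readings (units per hour)."""
    n = len(hourly_samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(hourly_samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(hourly_samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# 72 hourly readings, matching a 3-day soak test
stable_p95_ms = [400.0] * 72
degrading_p95_ms = [400.0 + 2.0 * hour for hour in range(72)]

print(f"stable: {trend_per_hour(stable_p95_ms):.2f} ms per hour")
print(f"degrading: {trend_per_hour(degrading_p95_ms):.2f} ms per hour")
```

The same function can be applied to memory or CPU readings to check that consumption does not creep up between planned restarts.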
In a scalability test, starting at your Peak Hour Load test volumes, you systematically increase the load in sensible increments until either you have met your Scalability non-functional requirements or the system reaches breaking point.
Testing to destruction, if you have already met your expected growth rates, is sometimes a good idea: it determines what volumes your application could support should your business model change and volumes increase more rapidly than expected.
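The stepped increase can be planned up front as a simple schedule of load levels. The sketch below assumes increments of 10% of the peak load and a target taken from a three-year growth projection; the function name, increment size and figures are illustrative, and a real run would stop early if the system reached breaking point before the target.

```python
def load_steps(peak_load, target_load, increment_pct=10):
    """Yield load levels from peak up to target, stepping by a percentage of peak."""
    if target_load < peak_load:
        raise ValueError("target load must not be below peak load")
    step = peak_load * increment_pct / 100
    if step <= 0:
        raise ValueError("increment must be positive")
    level = peak_load
    while level < target_load:
        yield level
        level += step
    # Always finish exactly on the target, even if it falls between steps
    yield target_load


schedule = list(load_steps(peak_load=100, target_load=133.1))
print(schedule)
```

To test to destruction, the same schedule can be extended with a much higher target and run until the requirements are no longer met.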
Pitfalls and how to avoid them
Over-engineering your testing is a common problem. Trying to write automated tests to cover all business processes is something that a lot of people attempt, and these tests become difficult to maintain.
Get a reasonable balance of tests and importantly get your levels of load and concurrency correct.
You don’t need a significant number of scenarios to cover your non-functional testing requirements. The three outlined above are generally enough to satisfy most non-functional requirements, with the Security and Recoverability tests probably requiring a more manual approach.
Make your non-functional requirements testable by combining them.
Do not over-engineer your scripts or tests.
You don’t need to automate everything; some non-functional testing requires a manual approach, those supporting Recoverability and Disaster Recovery in particular.
Shift-left as much as you can to de-risk your non-functional requirements.