By: Josiah Huckins - 8/28/2019
Testing should not be an afterthought when sizing your environments. High-performing applications and services are desirable, but you need to test that performance and ensure real-world usage aligns with expectations. Beyond performance, security testing should be considered through all stages of an environment's lifespan. There are many types of cloud environments, as well as mixed environments of virtual and physical servers. In this post, I'll cover practical, common-sense options usable in any environment.
Know your throughput and constraints
To begin sizing a new environment, you need to understand how much data it is expected to process in a specified amount of time. For new systems, this can be challenging to determine, which is why testing is key to success.

Begin by laying out the pieces needed to build the complete environment in an architecture diagram. In this diagram, list the servers and their calls to other servers, both internal and external to the environment. For each server, list its resources: CPU cores, RAM and mass storage. List the capacity of any SAN or NAS storage shared among servers. Be sure to include firewalls, proxies and routing devices, and the number of WAN links (if applicable). You don't necessarily have to include redundancy considerations at this stage; this is meant to establish the bare minimum requirements for a functioning service or application.
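The inventory behind such a diagram can also be captured in code for later automation. Here's a minimal sketch in Python; the server names, resource figures and call graph are purely hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    """One node in the architecture diagram (all values illustrative)."""
    name: str
    cpu_cores: int
    ram_gb: int
    storage_gb: int
    calls: list = field(default_factory=list)  # downstream servers this node calls

# Bare-minimum inventory for a hypothetical three-tier service
web = Server("web-01", cpu_cores=4, ram_gb=8, storage_gb=100, calls=["app-01"])
app = Server("app-01", cpu_cores=8, ram_gb=16, storage_gb=200, calls=["db-01"])
db  = Server("db-01", cpu_cores=16, ram_gb=64, storage_gb=2000)

for s in (web, app, db):
    print(f"{s.name}: {s.cpu_cores} cores, {s.ram_gb} GB RAM -> calls {s.calls}")
```

Keeping the diagram's data in a structured form like this makes it easy to diff the planned environment against what monitoring later reports.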
With the initial diagram created, it's time to determine the number of transactions each piece of your architecture can process in a specified time frame. This can and should be measured in transactions per second, per minute, per hour and per day. You can measure against other timing metrics, but transactions per second is the most important. When it comes to computer data processing, seconds can seem like centuries; an environment with high latency or bottlenecks will stand out.
To determine a server's transactions per second, look at how long it takes the server to receive a request and provide a response. This will differ depending on the server type and transaction complexity. Don't just measure one request and response; measure multiple simultaneous ones. At this stage it's a good idea to stress test the servers to determine their breaking points. You need to know how much they can handle before crashing or failing, so that you can establish a maximum throughput for the host. Excluding certain database stored procedures and views, most cloud service requests should complete in under a second. Repeat this testing process for any other servers and networking equipment.
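The simultaneous-request measurement above can be sketched with Python's standard thread pool. The request handler here is a stand-in that simulates a 10 ms round trip; in a real test you would replace it with an actual call to the server under test:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i):
    # Stand-in for a real request/response round trip
    time.sleep(0.01)  # simulated 10 ms service time
    return i

def measure_tps(n_requests=200, concurrency=20):
    """Fire n_requests at the given concurrency and report transactions/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(handle_request, range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed

tps = measure_tps()
print(f"~{tps:.0f} transactions/sec at concurrency 20")
```

Ramping `concurrency` upward while watching the measured rate is a simple way to find the point where throughput plateaus or errors begin, which marks the host's practical breaking point.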
Remember, modern routers and switches are really just computers themselves, though they handle pre-processed data and have the added burden of wrapping or encoding data in a format other network devices can understand. Expect separate metrics for the network devices in your environment. Note any performance-enhancing techniques relevant to network I/O, such as NIC teaming, frame size adjustments and caching. For network transmission lines, know the rated bandwidth (bits per second) of the cabling you use. If sending data out of the local area (to another LAN or a WAN), be aware of potential attenuation and EMI issues with the transmission medium. Based on the rated bandwidth, stress test the transmission lines to understand the true maximum throughput of your connections. For example, if a line is rated to handle 10Gbps, your throughput measurements should reflect a value near that.
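Comparing measured throughput against rated bandwidth is simple arithmetic: convert bytes transferred to bits and divide by elapsed time. A small sketch, with the transfer numbers invented for illustration:

```python
def effective_throughput_gbps(bytes_transferred, seconds):
    """Convert a measured transfer (bytes over seconds) to gigabits per second."""
    return bytes_transferred * 8 / seconds / 1e9

# Hypothetical stress-test result on a link rated at 10 Gbps
rated_gbps = 10.0
measured = effective_throughput_gbps(bytes_transferred=110e9, seconds=100)
utilization = measured / rated_gbps
print(f"measured {measured:.2f} Gbps ({utilization:.0%} of rated bandwidth)")
```

If the measured figure falls well short of the rated line speed, look for protocol overhead, duplex mismatches, or the attenuation and EMI issues mentioned above before blaming the servers.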
Taken altogether, this gives you a good representation of what your servers, network devices and connections can handle. If the environment does not perform well against these metrics, you can add capacity or tweak configurations until you hit the desired baseline. Capacity can be added vertically (more cores and RAM per server) or horizontally (more servers). The direction you choose is important: consider the needs for processing power, redundancy and failover. Don't be rigid; sometimes you need to expand in both directions.
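For the horizontal case, the sizing math is a one-liner: divide the target throughput by what each server can sustainably handle and round up. The headroom factor below (running each server at no more than 70% of its measured maximum, to leave room for spikes and failover) is an illustrative assumption, not a universal rule:

```python
import math

def servers_needed(target_tps, per_server_tps, headroom=0.7):
    """Horizontal sizing: run each server at no more than `headroom` of its max."""
    usable = per_server_tps * headroom
    return math.ceil(target_tps / usable)

# Hypothetical numbers: 5,000 TPS target, each server maxes out at 1,200 TPS
print(servers_needed(5000, 1200))  # at 70% headroom -> 6 servers
```

The same calculation applied to CPU, RAM or storage demand (instead of transactions) gives you the vertical-scaling answer for a single host.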
Divide and...Coordinate
With a near-final production environment built, you should duplicate this build, preferably twice, to establish separate testing environments. Performance testing should not just be an initial project task but an ongoing, never-ending effort. Security testing is also an ongoing requirement: you need to regularly penetration test any environment that connects to other networks. A dedicated testing environment that is a replica of production, or at least proportional to its size, will allow you to pen test without affecting real end users. Test to find holes and scan for vulnerabilities. If issues are discovered, patch, update configurations and then test again. Once the issues are resolved, apply the updates to the production environment.

Why the preference for two testing environments? So you can pen test and load test at the same time and coordinate testing efforts. This efficient strategy will allow you to find issues with performance and/or security quickly. Specifically, it will allow you to correlate any DoS problems with current configurations, software versions and external malicious entities.