November 01, 1996 12:00 AM

LoadSim Revealed: Scientific Method to the Rescue

Windows IT Pro
InstantDoc ID #2812

Microsoft provides a handy little tool with Exchange Server 4.0 called LoadSim (see Screen A), which functions as a load generator and user simulator for capacity testing a messaging platform, specifically Exchange. It runs on one or more client machines in tandem, sending and receiving messages, accessing public or private folders, and so on, emulating the activities of a normal Exchange user.

While LoadSim was intended to be a capacity planning tool (to find out how many users you can support on a system, and with what kind of response times), it also makes an excellent performance testing tool if used properly. However, LoadSim is not without problems: client dependencies, a quirky user interface, and sometimes unpredictable behavior. If you are aware of these holes and plan your testing strategy around them, you can use LoadSim to test existing systems or to find out what a new one will do for you. In the Windows NT Magazine Lab, we decided that LoadSim would make an excellent first step in testing server hardware as messaging platforms. We can tune the system configuration (number of CPUs, amount of memory, disk and network layouts, etc.) and change the user load (number of users, transaction mix) to come up with curves that tell a more complete story about a particular machine. Instead of relying on a single number to characterize the performance of an entire client/server system, we can use these curves to find trends and breakpoints for various types of systems.
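As a rough illustration of reading a breakpoint off such a curve, the following Python sketch reports the first user count at which response time rises more than a chosen tolerance above the lightest-load baseline. The function name, the 15% default, and the sample numbers are assumptions made up for this example; they are not Lab results and not part of LoadSim.

```python
from typing import Optional, Sequence, Tuple


def find_breakpoint(
    curve: Sequence[Tuple[int, float]], tolerance: float = 0.15
) -> Optional[int]:
    """Return the first user count whose response time exceeds the
    baseline (the response time at the lowest user count) by more than
    `tolerance`, or None if the curve never leaves that band.

    `curve` holds (user_count, response_time_ms) pairs in any order.
    """
    points = sorted(curve)
    if not points:
        raise ValueError("curve must contain at least one point")
    baseline = points[0][1]
    limit = baseline * (1.0 + tolerance)
    for users, response_ms in points:
        if response_ms > limit:
            return users
    return None


# Illustrative, made-up measurements: (simulated users, response time in ms).
sample_curve = [(10, 200.0), (50, 210.0), (100, 225.0), (150, 320.0), (200, 540.0)]
print(find_breakpoint(sample_curve))  # 150
```

A threshold on the baseline is only one way to define a breakpoint; a slope-based rule would serve equally well, and the right tolerance depends on how noisy the measurements are.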

Know Your Enemy
First, let's look at the problems we know about. Client dependencies in LoadSim are fairly significant: the horsepower of the client system has a large bearing on measured response times. LoadSim is more memory constrained than CPU constrained, but even with a large amount of memory, the client falls down at high user counts. Besides, the test has to stay realistic. You can't simulate 1000 users on a single physical client system, because doing so introduces dependencies at the client level that you are trying to avoid. In fact, it introduces on the client the same kind of dependencies that you are trying to measure on the server! With too high a user count, whether or not the CPU is fully taxed and memory is optimal, the I/O capabilities of the client system get in the way. With an appropriately fat client, you can simulate a certain number of users and attain the same throughput for each one (within an acceptable tolerance) as you would with a separate physical machine for each client. If you go too far, you hit bottlenecks in the client (network bandwidth, memory, CPU, disk utilization, and so on) that warp your results.

When we set up our testing environment for the Tricord review, we ran tests using a maximum configuration on the server (four CPUs, 1GB of RAM), while varying the number of users simulated on a single physical client system. We found that the response time didn't start degrading noticeably until we went above 100 users (that is, the response time at 10 users was within 10% to 15% of that for 100 users). Other vendors such as Compaq, and even Microsoft, have performed similar tests in an environment comparable to ours and came up with the same results for client load. We also tuned the user load and think times (how long the pause is between user operations) to values that fall between the absolute "real world," which is an eight-hour day with long breaks between actions, and a livable testing environment that wouldn't take 24 hours to produce a single data point. We ended up with a two-hour day and a four-hour test run, which neither overwhelmed the client system nor represented an unrealistic environment. We took data points from the two middle hours (the last half of the first day and the first half of the second day), so that neither the ramp-up time (the first hour) needed for the test to reach steady state nor the ramp-down as the users log off influenced the results.
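The windowing described above (a four-hour run with only the two middle hours kept) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the sample layout of (seconds since start, response time in ms), the function name, and the sample values are all hypothetical, and nothing here reflects LoadSim's actual log format.

```python
from statistics import mean
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (seconds since test start, response time in ms)


def steady_state_samples(
    samples: Iterable[Sample],
    run_seconds: float,
    ramp_up_seconds: float,
    ramp_down_seconds: float,
) -> List[Sample]:
    """Keep only samples taken after ramp-up and before ramp-down."""
    start = ramp_up_seconds
    end = run_seconds - ramp_down_seconds
    if start >= end:
        raise ValueError("ramp periods leave no steady-state window")
    return [(t, v) for t, v in samples if start <= t < end]


HOUR = 3600

# Made-up samples, one every 30 minutes of a four-hour run.
raw = [
    (0, 900.0), (1800, 400.0),     # first hour: ramp-up
    (3600, 210.0), (5400, 200.0),  # second hour: steady state
    (7200, 190.0), (9000, 200.0),  # third hour: steady state
    (10800, 350.0), (12600, 600.0) # fourth hour: ramp-down
]

kept = steady_state_samples(raw, run_seconds=4 * HOUR,
                            ramp_up_seconds=1 * HOUR,
                            ramp_down_seconds=1 * HOUR)
print(len(kept), mean(v for _, v in kept))  # 4 200.0
```

Passing the ramp lengths as parameters keeps the same function usable if a different day length or run length is chosen.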

Since we could operate within a reasonable range of real-world results and keep the test believable and repeatable, we determined that LoadSim was a good starting point for messaging tests. But what about the other problems I mentioned, the inconsistent interface and the unpredictability, which would seem to argue against using this tool at all?

The interface is a resolvable issue; it just takes a little babysitting of the test runs. The utility itself follows the typical Microsoft GUI guidelines (rather than being a command-line interface), but the error trapping is a little weak, so restarting the tool or reloading a set of test parameters can change test settings. Before each run, we had to double-check every system to make sure that it was going to run the test we intended.
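That pre-run double check amounts to comparing the settings you intended against the settings each client actually shows. A minimal Python sketch of the comparison follows; the setting names and values are hypothetical, and how the actual values are obtained from the tool (for example, by reading them off the screen) is outside the scope of the example.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder for a setting present on only one side


def settings_mismatches(
    intended: Dict[str, Any], actual: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (intended_value, actual_value)} for every
    setting whose value differs or that is missing on one side."""
    keys = sorted(set(intended) | set(actual))
    return {
        key: (intended.get(key, MISSING), actual.get(key, MISSING))
        for key in keys
        if intended.get(key, MISSING) != actual.get(key, MISSING)
    }


# Hypothetical setting names, chosen only for illustration.
intended = {"users": 100, "day_length_hours": 2, "test_length_hours": 4}
actual = {"users": 100, "day_length_hours": 8, "test_length_hours": 4}

print(settings_mismatches(intended, actual))  # {'day_length_hours': (2, 8)}
```

An empty result means the client matches the plan; anything else is worth fixing before starting a multi-hour run.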

Unpredictability is a little more difficult to deal with, and it is a twofold problem. First is the unpredictability of the interface, which I just explained. Second is the unpredictability of the test results. On the one hand, LoadSim is a fine end-to-end testing environment; on the other hand, you don't really know what it is measuring, and you can only infer certain things by analyzing the results against server operations (such as disk and CPU utilization). There is a narrow band of test settings, combined with a specific hardware configuration on the server, that seems to give relatively error-free logs (see the section on load and scalability in the main article). A test run isn't necessarily invalid if there are errors; the errors simply point to bottlenecks in the system.

