In “Calculating Concurrent Sessions, Part 1” and “Calculating Concurrent Sessions, Part 2,” I covered a task to calculate the maximum number of concurrent sessions for each application. I started the series by presenting a set-based solution (call it Original Set-Based Solution) that didn’t perform well because it had quadratic algorithmic complexity. I also presented a cursor-based solution (call it Cursor-Based Solution) and explained that the cursor alternative performed better because it had linear complexity. In the second part of the series I presented a new set-based solution (call it New Set-Based Solution 1) with linear complexity. This solution performed better than the cursor-based solution, but it involved using a temporary table, a couple of scans of the data, plus a seek operation in an index for each row from the table.
I thought that New Set-Based Solution 1 was the best available, but I was pleasantly surprised to learn about a fantastic set-based solution (call it New Set-Based Solution 2) that performs even better. Several readers sent me this solution. New Set-Based Solution 2 doesn’t involve the use of any temporary tables, requires only two scans of the data, and has linear complexity. In a benchmark test that I ran, it proved to be an order of magnitude faster compared with New Set-Based Solution 1. I’d like to thank Ben Flanaghan, Arnold Fribble, and R. Barry Young for coming up with the new solution.
Creating and Populating the Table
When I originally presented the task, I provided code to create and populate a table called Sessions. Listing 1 contains the code to create and populate the Sessions table with sample data. Listing 2 contains code to create a helper table function called GetNums, which returns a table result with a sequence of integers of a requested size. Listing 3 contains the code to populate the Sessions table with a large set of rows to test the performance of the solutions.
As a reminder, you’re supposed to calculate the maximum number of concurrent sessions (active simultaneously) for each application. If one application ends at the same time that another starts, you’re not supposed to consider the two as concurrent. Table 1 shows the desired results for the sample data provided in Listing 1.
Table 1: Desired Results| app |
mx |
| app1 |
4 |
| app2 |
3 |
New Set-Based Solution 2
Like the cursor-based solution that I covered previously, New Set-Based Solution 2 relies on unifying start and end events into one chronological sequence of events, associating a +1 event type to start events, and a -1 event type to end events. However, instead of using a cursor to process the records one at a time and keep aggregating the event type to calculate the count of concurrent sessions at each point, New Set-Based Solution 2 uses row numbers to calculate the count of concurrent sessions. The technique is ingenious and surprisingly simple. The first part of the solution involves calculating start ordinals (how many sessions started so far), as well as start-or-end ordinals (how many sessions either started or ended so far). Start ordinals are calculated by assigning row numbers, partitioned by app, ordered by starttime, only to start events, and assigning a place holder (e.g., a NULL) to end events, like so:
SELECT app, starttime AS ts, +1 AS type,
ROW_NUMBER() OVER(PARTITION BY app
ORDER BY starttime) AS start_ordinal
FROM dbo.Sessions
UNION ALL
SELECT app, endtime, -1, NULL
FROM dbo.Sessions;