Last month I introduced sequences, which are a new feature in the upcoming version of SQL Server (code-named Denali). I explained how to create and alter sequences, I discussed how to obtain new values using the NEXT VALUE FOR expression, and I compared sequences with the IDENTITY column property. This month I discuss caching of sequence values, obtaining a range of sequence values, and producing multiple unique sequence values in the same target row.
Caching
SQL Server supports defining a caching option for sequences, with the purpose of improved performance by minimizing disk access. The caching option is an extension to standard SQL. When you create or alter a sequence, you can indicate the CACHE option along with a cache value or NO CACHE if you want to prevent caching. The current default option (as of press time) is to use a cache value of 50; this default might change in the future. (For more information about cache values, see the sidebar “How to Determine the Default Cache Value in SQL Server Denali.”)
If you choose the NO CACHE option (remember, the default is CACHE 50), every time a new sequence value is generated, SQL Server has to write the change to the system tables on disk. This, of course, can have a negative performance effect. When using the CACHE <cache_value> option, SQL Server writes to the system tables only once every <cache_value> request. What SQL Server records on disk every time a new block of cached values is allocated is the first value to be used in the next block. In the meantime, SQL Server keeps in memory only two members with the same type as the sequence type, holding the current value and how many values are left in the current block of cached values before the next write to disk needs to occur. Every time a request for a new sequence value is made, SQL Server updates those two members in memory.
Compared with not caching values, you can observe significant performance improvements when caching with even fairly small blocks of values (e.g., 50); the bigger the cache value, the better the performance. So why not simply use as big a value as possible in the cache option? Because if SQL Server shuts down uncleanly (e.g., power failure), you basically lose the remaining range up to the value currently written to disk. Given the fact that new sequence values generated in a transaction that doesn’t commit are eventually lost anyway, sequences can’t guarantee no gaps to begin with.
Note that if SQL Server shuts down cleanly, it writes the current value plus one to disk; then upon restart, when the next request for a sequence value is made, the new sequence value starts from that point. As an example, suppose the cache size is 50. The stored value in the system tables is 151 (first noncached value), the current sequence value is 142, and the number of values remaining to use is 8 before the next request will cause a write to disk. So if you shut down the system cleanly at this point, SQL Server will write the number 143—the next value to be used—to the system tables. When you restart the system and a request for a new sequence value is made, SQL Server will allocate 50 values by writing to disk 193.
If you’re curious about whether SQL Server caches values for IDENTITY as well, it does. Currently (at press time), the cache size of IDENTITY is a hard-coded value of 10, but this size might change in the future. The internals of the IDENTITY property and sequences are quite similar—but because the default cache size is 50 for sequences and 10 for IDENTITY, you should see a small performance advantage of sequences compared with IDENTITY.
There’s another interesting difference in the handling of caching between the two features. As I mentioned, with sequences, if the system shuts down unexpectedly, you lose the remaining cached values in the current block and simply end up with a gap in your values. With IDENTITY, when SQL Server restarts, during recovery it scans the log records to determine the last used IDENTITY value, so you end up not losing the remaining cached values (except in certain scenarios with low likelihood because their time window is very small). If you’re wondering why SQL Server can’t recover the last used sequence value from the log in a similar manner, it’s because sequence values can be generated by SELECT queries without inserting them into a table—and therefore no record exists in the log for those values. So SQL Server simply doesn’t attempt to recover the lost cache values.
Let’s examine some performance numbers from a test I ran on my laptop. It’s not really a thorough or exhaustive test, but rather a fairly simple test just to get a general sense of performance. Use the code in Listing 1 to create the objects used in this performance test.