Learn about NTFS5's Distributed Link Tracking, sparse-file support, volume change tracking, encryption, and alternate data streams
In "Inside Win2K NTFS, Part 1," November 2000, I began this two-part series by describing general indexing, consolidated security, reparse points, and quota management, all of which are features new to NTFS 5.0 (NTFS5), the Windows 2000 version of NTFS. In this issue, I look at how NTFS5 implements Distributed Link Tracking (DLT), sparse-file support, volume change tracking, and encryption. I conclude with a look at alternate data streams, an NTFS feature rarely used before NTFS5.
Distributed Link Tracking
Windows Explorer supports a type of symbolic link called a shell link or shell shortcut. Shell links, which often appear on the Windows desktop and in the Start menu, let you easily access programs and files without having to navigate to their original location. Another type of Windows link is an OLE link. OLE links are links in one application's files that store data belonging to another application. For example, if you embed a Microsoft Excel spreadsheet in a Microsoft Word document, an OLE link in the Word document refers to the Excel document.
If you've ever moved a link source (the executable program that a shell or OLE link refers to) in a pre-Win2K version of Windows and then clicked the link that referenced the source, you've witnessed the heuristic-based search that Windows Explorer performs in its attempt to find the source's new location. If the search is unsuccessful, Windows Explorer gives up and requires you to tell it where you moved the source. In Win2K, DLT automatically updates shell links to point at moved link sources. The only requirement is that the link source's original and final locations are both on NTFS5 volumes and in the same domain.
NTFS5 is required for automatic link updating because NTFS5 lets an application assign files and directories a 16-byte (128-bit) object ID. Not coincidentally, 16 bytes is the length of a Windows globally unique ID (GUID). Windows uses GUIDs as a general-purpose identification mechanism because a GUID's length and the algorithm that Windows uses to generate a GUID virtually guarantee that every GUID is statistically unique. The DLT service, which Win2K implements as a Win32 service, assigns a GUID to every link source that resides on an NTFS volume and directs NTFS to record the GUID as the source's object ID. When you click on a shell link or open a document with an embedded OLE link and Windows Explorer fails to find the link source at the path the link specifies, Windows Explorer queries the DLT service for the source's new location. DLT uses the object ID that Windows Explorer supplied to attempt to open the link source on each volume in the domain, starting with local volumes. When DLT finds the file (or directory), DLT asks NTFS for the name of the file and returns the result to Windows Explorer, which updates the link and follows it to the tracked location. Thus, even if the source moves to a volume on another computer in the domain, the link continues to work.
When the DLT service (or another application) assigns an object ID to a file or directory, NTFS creates an attribute of type $OBJECT_ID in the file's Master File Table (MFT) entry, as Figure 1, page 46, shows. (For information about the MFT and its entries and attributes, see "Inside NTFS," January 1998.) This 16-byte attribute stores the GUID. At the same time, NTFS adds an entry to the \$Extend$ObjId metadata file that maps the GUID to the file's MFT entry. Win2K stores the metadata file's entries in an index named $O, which NTFS sorts by object ID in a B+ tree. When DLT finds a moved link and attempts to use the link's object ID to open it, NTFS can look up the link's MFT entry number in the $ObjId file's $O index. After DLT uses the file's MFT entry number to open the file, DLT asks NTFS for the file's name, updates the link, and opens the link source. The $ObjId file's use is an example of general indexing, a new NTFS feature I describe in "Inside Win2K NTFS, Part 1."
Sparse Files
Many applications consist of a server that logs data to a file and a client that reads the data that the server writes. This architecture usually requires the use of a technique called circular logging, in which the log file is a fixed size and the server rolls over, or returns to the beginning of the file, after reaching the file's end. A rollover can lead to overwriting data before the client has a chance to read it. Other applications, such as databases, allocate extremely large files, of which valid data fills only small portions, resulting in disk space that is allocated but unused. NTFS5 includes a feature called sparse-file support that Win2K applies to both these situations to minimize unnecessary disk utilization and avoid the rollover problems inherent in circular logging.
Sparse-file support lets an application designate unused portions of a file as empty, freeing the disk space allocated to the empty regions. In a logging application, the server appends log information to the file without needing to roll over, and the client marks file areas empty as it reads them. A log file might therefore appear to be very large but use only a small amount of disk space. When you view a file's properties, Windows Explorer shows both sizes and calls the amount of space the file takes on disk the Disk size. If an application reads from part of a sparse file that is designated as empty, the application receives zero-filled data. In the case of a database, the database application marks the unused parts of the database as empty, releasing the disk space. Figure 2 illustrates a sparse file.
The NTFS implementation of compressed files has always had a type of sparse-file support. NTFS's compression algorithm compresses files in blocks of 16 virtual clusters, typically allocating fewer than 16 logical clusters on disk to store that data. A virtual cluster is a cluster within a file, and a logical cluster is a cluster on a volume. For an uncompressed file, virtual clusters correspond to logical clusters one-to-one, but in a compressed file, multiple virtual clusters might map to fewer logical clusters. When an application reads from a compressed file, NTFS decompresses the logical clusters that make up the 16-cluster block that the application is reading, recreating the 16 uncompressed virtual clusters in memory. If a 16-cluster block is filled with zeros, NTFS optimizes disk utilization by not allocating any logical clusters for the block. When an application reads from such a region, which is called a sparse region, NTFS returns zero-filled data.
NTFS5's sparse-file support relies
on the same driver subroutines that NTFS uses for sparse regions of compressed files. However, two differences exist between sparse files and compressed files. One difference is that the nonempty portions of a sparse file aren't compressed. The second difference is that an application can indicate to NTFS that a region of a sparse file has become empty, freeing the logical clusters that were previously allocated to it. To make a sparse region of a compressed file, an application must write zero-filled data to the file.
You might think that users could use sparse files to exceed their disk quota
by allocating large empty files and then filling them over time. However, NTFS counts a sparse file's virtual size, not the on-disk size, against a user's quota limit.
Volume Change Tracking
Before Win2K, an application that needed to monitor a volume for changes had limited options. The Win32 API exports functions that notify an application when the contents of a directory or directory tree change, but the application must either scan directories to determine the nature of changes as they occur or take the risk that the list of changes that Windows returns will overflow the application's buffer. Scanning directories after every change causes an unacceptable performance hit, yet many applications, such as incremental backup solutions, can't tolerate missed changes. NTFS5's new volume change tracking facility lets applications easily monitor changes to a volume and provides an effective alternative to earlier approaches for file-
replication, incremental-backup, and virus-scanning applications.
Interestingly, volume change tracking relies on sparse-file support. NTFS disables volume change tracking by default, so applications that need to use it must enable it. When an application turns on volume change tracking, NTFS creates a change journal, which is a sparse file named \$Extend\$UsnJrnl. Figure 3 depicts the change journal. Every change to a volume, such as creating, resizing, or deleting a file, logs an entry to the $J alternate data stream (I describe alternate data streams in more detail later) in the $UsnJrnl file. A change entry records the name of the file that changed, the type and time of the change, and several other pieces of useful information. Applications can request that NTFS notify them when it adds new entries to the journal and can read entries as the system logs them. The Win2K File Replication Service (FRS), which Win2K uses to replicate Dfs shares and to propagate group policies, is one of the change journal's clients.
NTFS monitors the on-disk space that the change journal consumes and limits this space to the amount the application specifies when it enables change tracking. When a journal exceeds the specified size, NTFS marks the change journal's oldest valid entries as empty. For even a moderately sized change journal, only an extremely high volume of file activity would cause an application to fall so far behind that it couldn't read entries before NTFS deletes them.
Encryption
Support for Encrypting File System (EFS) is an important part of NTFS5. Although I cover that support extensively in the series "Inside Encrypting File System" (June and July 1999), I provide a summary here. (For background information about EFS, see "Related Articles in Previous Issues." For an administrator's view of EFS management, see Mark Minasi, Inside Out, "Decrypting EFS," page 139.)
EFS isn't built in to NTFS but is an add-on driver (\winnt\system32\driversefs.sys). EFS provides transparent file-based encryption so that users can protect sensitive data that might fall into unauthorized hands. Although NTFS security prevents unauthorized access to a file while a Win2K system is online, a malicious user could bypass this security by booting a computer to another installation or by using NTFSDOS (available from www.sysinternals.com/ntfs30
.htm) from a DOS boot floppy disk.
When someone uses the Advanced Attributes panel of a file's properties dialog (which Figure 4 shows) to initially encrypt a file, the EFS service, which runs in the Local Security Authority Subsystem (LSASS\winnt\system32\lsass .exe), uses Crypto API services to assign the user a private/public key pair that EFS can use for file encryption. EFS randomly generates a file encryption key (FEK) for an encrypted file and uses the FEK and the Data Encryption Standard X (DESX) encryption algorithm to encrypt the file's data. Because DESX is a symmetric algorithm, decryption also uses the FEK, so EFS must protect the FEK from unauthorized access. To protect the FEK, EFS uses the RSA asymmetric encryption algorithm to encrypt the FEK with the user's EFS public key and stores the encrypted FEK with the file. When a user opens and reads an encrypted file, EFS uses the user's EFS private key to decrypt the FEK, then uses the decrypted FEK to decrypt the file data.
Because EFS uses the user's password to encrypt the user's EFS private key, an intruder would have to obtain the user's password to compromise encrypted file data. Further, although the initial release of Win2K stores encrypted EFS private keys on disk within a user's profile directory, subsequent releases will let users store their keys on smart cards.
EFS stores encrypted FEKs in a file's $LOGGED_UTILITY_STREAM attribute (which is new to NTFS5), as Figure 5, page 50, shows. The stored data consists of a Data Decryption Field (DDF) and a Data Recovery Field (DRF). The DDF contains a copy of the FEK encrypted with the user's EFS public key, and the DRF contains a copy of the FEK encrypted with the system's Recovery Agent EFS public key. On a standalone system, the local administrator is the default system Recovery Agent; on a domain-based system, the default Recovery Agent is the domain administrator. Because EFS includes in every file an FEK encrypted with a Recovery Agent's EFS public key, an authorized user can decrypt the file and recover its contents if the user is unable to do so.
When the system boots, NTFS reads the Registry value HKEY_LOCAL_MACHINE\ SYSTEM\CurrentControlSet\Control\FileSystem\NtfsEncryption
Service to obtain the EFS driver's name, then starts the driver. The EFS driver registers callbacks (i.e., routines that NTFS directly invokes) so that NTFS can hand off encrypted-file-related operations to the EFS driver. Internally, NTFS handles encrypted files much as it handles compressed files. NTFS decompresses a compressed file's data into the Win2K file-system cache and recompresses the data when it's written to the on-disk file. Similarly, NTFS decrypts an encrypted file's data into the cache and re-encrypts the data when writing it to disk.