GetRetrievalPointers. The GetRetrievalPointers command returns an array of mapping information
about the clusters allocated to a file. Each fragment of the file corresponds to an array entry that consists of a
logical cluster number within the file and a drive cluster number. A final array entry contains the file cluster number
of the end of the file. For example, consider a file with three fragments: The first fragment starts at drive cluster
1200 and continues for 4 clusters, the second fragment starts at drive cluster 1000 and continues for 3 clusters, and
the third fragment starts at drive cluster 1300 and continues for 5 clusters. Table 1 shows the mapping array that
GetRetrievalPointers returns for the file.
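To make the layout of the mapping array concrete, here is a minimal Python sketch that turns the array for the three-fragment file above into a list of fragments with lengths. The list-of-pairs representation and the name extents_from_mapping are assumptions made for illustration; they are not the binary format that the command returns.

```python
def extents_from_mapping(pairs, end_file_cluster):
    """Turn (file_cluster, drive_cluster) pairs into
    (file_cluster, drive_cluster, length) fragments."""
    extents = []
    for i, (file_cluster, drive_cluster) in enumerate(pairs):
        # A fragment ends where the next one begins, or at the
        # end-of-file entry for the last fragment.
        if i + 1 < len(pairs):
            next_start = pairs[i + 1][0]
        else:
            next_start = end_file_cluster
        extents.append((file_cluster, drive_cluster, next_start - file_cluster))
    return extents


# The three-fragment file from the example: 4 + 3 + 5 = 12 clusters.
mapping = [(0, 1200), (4, 1000), (7, 1300)]
extents = extents_from_mapping(mapping, end_file_cluster=12)
for file_cluster, drive_cluster, length in extents:
    print(f"file cluster {file_cluster}: drive cluster {drive_cluster}, "
          f"{length} clusters")
```

The length of each fragment never appears in the array itself; it falls out of the difference between consecutive file cluster numbers.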
Sometimes GetRetrievalPointers returns a mapping array that contains a drive cluster entry of -1, as shown in Table
2. This entry signals a compressed file. In Table 2's example, the compressed data starts at drive cluster 1200 and
continues for 4 clusters. The final file cluster entry of 16 means that the uncompressed file will require 12 more
clusters.
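The arithmetic behind the compressed-file example can be sketched in a few lines of Python. The pair representation and the names VIRTUAL and count_clusters are assumptions for illustration only.

```python
VIRTUAL = -1  # drive cluster value that marks clusters with no disk allocation


def count_clusters(pairs, end_file_cluster):
    """Return (on_disk, virtual) cluster counts for a mapping array
    given as (file_cluster, drive_cluster) pairs."""
    on_disk = 0
    virtual = 0
    for i, (file_cluster, drive_cluster) in enumerate(pairs):
        if i + 1 < len(pairs):
            next_start = pairs[i + 1][0]
        else:
            next_start = end_file_cluster
        length = next_start - file_cluster
        if drive_cluster == VIRTUAL:
            virtual += length
        else:
            on_disk += length
    return on_disk, virtual


# Compressed data occupies 4 clusters at drive cluster 1200; the end-of-file
# entry is file cluster 16.
on_disk, virtual = count_clusters([(0, 1200), (4, VIRTUAL)], end_file_cluster=16)
print(on_disk, virtual)
```

For this file the sketch counts 4 clusters on disk and 12 clusters that exist only once the data is uncompressed, matching the description above.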
MoveFile. The MoveFile command is the heart of NT's defragmentation support. MoveFile requires
handles to both the file that contains the segments to be moved and the file's drive. Additional parameters track the
starting cluster (relative to the start of the file) of the segment to be moved, the target drive cluster number, and
the length of the segment. If the target clusters are free, MoveFile relocates the original clusters in a way that
prevents data loss in case the system crashes during the move.
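The parameters are easier to follow with a toy model. The Python sketch below is an in-memory simulation, not a call into NT: the function move_segment, the list file_map, and the set free are all hypothetical, and the model makes no attempt to reproduce the crash safety of the real command.

```python
def move_segment(file_map, free, start_file_cluster, target_drive_cluster, length):
    """Simulate a move.

    file_map[v] is the drive cluster that holds file cluster v.
    free is the set of unallocated drive clusters.
    Returns True if the move happened, False if it was refused.
    """
    if start_file_cluster + length > len(file_map):
        return False  # segment runs past the end of the file
    targets = range(target_drive_cluster, target_drive_cluster + length)
    if not all(cluster in free for cluster in targets):
        return False  # some target cluster is in use: nothing changes
    for offset, target in enumerate(targets):
        v = start_file_cluster + offset
        free.discard(target)    # the target cluster now belongs to the file
        free.add(file_map[v])   # the old cluster is released
        file_map[v] = target
    return True


# The three-fragment file from the earlier example, with free space
# directly after its first fragment.
file_map = [1200, 1201, 1202, 1203, 1000, 1001, 1002,
            1300, 1301, 1302, 1303, 1304]
free = set(range(1204, 1212))
move_segment(file_map, free, 4, 1204, 3)   # second fragment
move_segment(file_map, free, 7, 1207, 5)   # third fragment
print(file_map == list(range(1200, 1212)))
```

Two moves are enough to make this file contiguous; a move whose target range includes an allocated cluster is refused and leaves the model untouched.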
GetVolumeData. NT includes the two NTFS-specific defragmenting commands because the NTFS driver
uses clusters differently from the FAT driver. The GetVolumeData command obtains detailed information about an NTFS
drive, including its cluster size (in bytes), the size of the drive (in clusters), and the amount of free space on the
drive (in clusters).
Defragmenters use GetVolumeData to identify a reserved portion of the disk (known as the MFT-Zone) that NTFS uses
for expanding the Master File Table (MFT). The MFT is NTFS's index to all the files and directories on a drive.
To optimize file lookups, NTFS tries to keep the MFT defragmented by not allocating clusters around the MFT to
other files. GetVolumeBitmap reports free clusters in the MFT-Zone, but MoveFile will not relocate clusters to this
area, so defragmenters need to know the MFT-Zone's location to avoid it.
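One way a defragmenter might steer clear of the MFT-Zone is to trim the zone out of its list of free runs before choosing move targets. The Python sketch below assumes that free space is held as (start, length) runs and that the zone is known as a half-open range of drive clusters; the name usable_free_runs and both representations are hypothetical.

```python
def usable_free_runs(free_runs, zone_start, zone_end):
    """Remove the range [zone_start, zone_end) from a list of
    (start, length) free runs, splitting runs where necessary."""
    usable = []
    for start, length in free_runs:
        end = start + length
        # Keep the part of the run that lies before the zone.
        if start < zone_start:
            usable.append((start, min(end, zone_start) - start))
        # Keep the part of the run that lies after the zone.
        if end > zone_end:
            begin = max(start, zone_end)
            usable.append((begin, end - begin))
    return usable


runs = [(100, 50), (480, 60), (600, 20)]
print(usable_free_runs(runs, zone_start=500, zone_end=560))
```

A run that straddles the zone is split into the pieces on either side, and a run that lies entirely inside the zone disappears from the list.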
ReadMFTRecord. The other NTFS-specific command, ReadMFTRecord, obtains a record of information from
the MFT that you can use to create a cluster map for a file. Alternatively, you can enumerate the files on the drive and
use GetRetrievalPointers to obtain mapping information for each file. However, sequentially reading MFT records and
interpreting the raw NTFS on-disk data structures can enhance performance.
How NT 4.0 Defragmenters Work
Disk defragmenters aim to optimize system performance by reorganizing non-contiguous file clusters.
At press time, only Executive Software and Symantec offer defragmenters for NT 4.0. Both vendors provide
downloadable versions of their products. The products perform the same basic operations, but they employ different
defragmentation algorithms. Let's look at how the products work.
First, each product creates a map of the drive, which shows the file fragmentation to the user. Mapping a drive
takes three steps: Get the map of free clusters on the drive with GetVolumeBitmap; enumerate all the files on the drive
and obtain file cluster maps with GetRetrievalPointers or ReadMFTRecord; and display the file mappings.
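The first mapping step amounts to scanning a bitmap for runs of clear bits. The Python sketch below assumes one bit per cluster, a set bit meaning "in use," and cluster 0 stored in the least significant bit of the first byte; that bit ordering and the name free_runs are assumptions for illustration.

```python
def free_runs(bitmap, total_clusters):
    """Return (start, length) runs of free clusters found in a bitmap."""
    runs = []
    run_start = None
    for cluster in range(total_clusters):
        in_use = (bitmap[cluster // 8] >> (cluster % 8)) & 1
        if not in_use:
            if run_start is None:
                run_start = cluster  # a new free run begins here
        elif run_start is not None:
            runs.append((run_start, cluster - run_start))
            run_start = None
    if run_start is not None:
        # The drive ends in free space.
        runs.append((run_start, total_clusters - run_start))
    return runs


# Clusters 0-3 and 12-15 are in use; clusters 4-11 are free.
bitmap = bytes([0b00001111, 0b11110000])
print(free_runs(bitmap, 16))
```

The resulting run list is the natural input for deciding where a fragmented file could fit.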
Some clusters in the bitmap appear to be in use but not allocated to a file. These clusters belong to system
metadata files (i.e., files that store file system-related information), directories, or files accessed exclusively by a
process other than the defragmenter. Both products designate these clusters as immovable in their GUIs. The products
also identify the MFT-Zone if the drive is an NTFS volume.
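Finding the clusters that are in use but unaccounted for is a set difference between the bitmap and the file maps. In the Python sketch below, the set in_use and the list of (drive_cluster, length) fragments are hypothetical representations of what the mapping phase collected.

```python
def immovable_clusters(in_use, file_fragments):
    """Return the clusters marked in use that no enumerated file claims.

    in_use: set of drive clusters the bitmap reports as allocated.
    file_fragments: (drive_cluster, length) fragments gathered from
    every file the defragmenter managed to map.
    """
    claimed = set()
    for drive_cluster, length in file_fragments:
        claimed.update(range(drive_cluster, drive_cluster + length))
    return sorted(in_use - claimed)


in_use = set(range(0, 8)) | set(range(1000, 1003))
fragments = [(4, 4), (1000, 3)]
print(immovable_clusters(in_use, fragments))
```

In this example clusters 0 through 3 are allocated but belong to no mapped file, so a defragmenter would treat them as immovable.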
Next, the products enter a defragmenting phase. Because the drive mapping information constantly changes as
programs create, delete, grow, and shrink files, the products do not rely on the information displayed to the user.
Instead, they again enumerate all the files on the drive and perform the following steps for each file: Get the map of
free clusters on the drive with GetVolumeBitmap; obtain a cluster map for the file in question using
GetRetrievalPointers or ReadMFTRecord; and move segments of the file with MoveFile in an attempt to defragment the file.
The logic behind the third step is different for each product, depending on whether the defragmenter tries to move
files to defragment the drive's free space or make room for files in specific places. Defragmentation is an iterative
process that can even be undone as other processes perform file operations, so the defragmenters often repeat the third
step many times.
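As an illustration of what the third step could look like, the Python sketch below plans moves with a simple first-fit strategy: find the first free run that can hold the whole file and move every fragment into it. This is not either vendor's algorithm; the strategy and the names is_fragmented, first_fit, and plan_moves are assumptions chosen for brevity.

```python
def is_fragmented(fragments):
    """fragments: (drive_cluster, length) pairs in file order."""
    return any(start + length != next_start
               for (start, length), (next_start, _)
               in zip(fragments, fragments[1:]))


def first_fit(free_runs, needed):
    """Return the start of the first free run with at least `needed` clusters."""
    for start, length in free_runs:
        if length >= needed:
            return start
    return None


def plan_moves(fragments, free_runs):
    """Return (file_cluster, target_drive_cluster, length) moves that would
    make the file contiguous, or [] if no move is needed or possible."""
    if not is_fragmented(fragments):
        return []
    total = sum(length for _, length in fragments)
    target = first_fit(free_runs, total)
    if target is None:
        return []
    moves = []
    file_cluster = 0
    for _, length in fragments:
        moves.append((file_cluster, target + file_cluster, length))
        file_cluster += length
    return moves


fragments = [(1200, 4), (1000, 3), (1300, 5)]
print(plan_moves(fragments, free_runs=[(50, 6), (2000, 20)]))
```

Each planned move maps directly onto the MoveFile parameters: a starting cluster within the file, a target drive cluster, and a length. Because other processes can change the drive between planning and moving, a real defragmenter would have to re-read the bitmap and retry when a move fails.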
Because MoveFile requires a handle to the file to be moved, the defragmenters must open a file before moving it.
Opening a file that another process has already opened for exclusive access is not possible. Neither product can move
files such as the MFT, Registry files, and paging files because the system opens these files for exclusive access.
NTFS Caveats
Some restrictions apply to the NTFS implementation of MoveFile because its cluster movement engine uses NTFS file
compression code. NTFS file compression adds a twist to the way NTFS allocates clusters for files.
NTFS performs compression on 16-cluster segments of a file. If a 16-cluster segment of data compresses down to 5
clusters, for instance, NTFS stores the 5 clusters on disk and notes the remaining 11 clusters as virtual clusters.
To read the compressed file, the system reads the 5-cluster compressed portion from the disk, allocates memory for the
11 virtual clusters, and fills those memory locations with 0s. The system passes this 16-cluster chunk to the
decompression algorithm, which re-creates the original data.
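The buffer that reaches the decompression algorithm can be modeled in Python as follows. The sketch covers only the zero-filling step, not the decompression itself, and the name unit_buffer and the 512-byte cluster size in the usage lines are assumptions for illustration.

```python
CLUSTERS_PER_UNIT = 16  # NTFS compresses files in 16-cluster segments


def unit_buffer(stored, bytes_per_cluster):
    """Pad the clusters read from disk with zero-filled virtual clusters.

    Returns the full 16-cluster buffer and the number of virtual clusters.
    """
    unit_bytes = CLUSTERS_PER_UNIT * bytes_per_cluster
    if len(stored) % bytes_per_cluster != 0 or len(stored) > unit_bytes:
        raise ValueError("stored data must be whole clusters within one unit")
    virtual_clusters = CLUSTERS_PER_UNIT - len(stored) // bytes_per_cluster
    padding = bytes(virtual_clusters * bytes_per_cluster)  # all zeros
    return stored + padding, virtual_clusters


# Five compressed clusters read from disk, as in the example above.
stored = b"\x01" * (5 * 512)
buffer, virtual_clusters = unit_buffer(stored, bytes_per_cluster=512)
print(len(buffer), virtual_clusters)
```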
On FAT volumes, MoveFile can move clusters individually. The NTFS MoveFile routine moves clusters only in
16-cluster blocks because NTFS file compression works with 16-cluster segments.
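A defragmenter that wants to move an arbitrary segment on NTFS therefore has to widen the request to whole blocks. The Python sketch below assumes that the blocks are aligned on file cluster numbers that are multiples of 16; that alignment and the name align_to_blocks are assumptions, not details confirmed above.

```python
BLOCK = 16  # clusters per NTFS move block


def align_to_blocks(start_file_cluster, length):
    """Widen a segment so that it starts and ends on 16-cluster boundaries.

    Returns (aligned_start, aligned_length).
    """
    first = (start_file_cluster // BLOCK) * BLOCK
    # Round the end of the segment up to the next block boundary.
    end = -(-(start_file_cluster + length) // BLOCK) * BLOCK
    return first, end - first


print(align_to_blocks(20, 10))
print(align_to_blocks(15, 2))
```

A 10-cluster segment that starts at file cluster 20 fits inside one block, whereas a 2-cluster segment that starts at file cluster 15 crosses a boundary and pulls in two blocks.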
Furthermore, the NTFS MoveFile function does not work with clusters larger than 4KB (NTFS file compression buffers
are 64KB in size: 64KB ÷ 16 = 4KB). On drives larger than 4GB, the FORMAT utility initializes NTFS partitions with
cluster sizes greater than 4KB; consequently, large drives with FORMAT's default cluster sizes do not support
defragmentation.
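The cluster-size limit follows directly from the buffer arithmetic, as this short Python sketch shows. The name supports_ntfs_move is hypothetical; a defragmenter could apply a check like this to the cluster size that GetVolumeData reports.

```python
COMPRESSION_BUFFER_BYTES = 64 * 1024
CLUSTERS_PER_BLOCK = 16
# 64KB / 16 clusters = 4KB per cluster at most.
MAX_MOVABLE_CLUSTER_BYTES = COMPRESSION_BUFFER_BYTES // CLUSTERS_PER_BLOCK


def supports_ntfs_move(bytes_per_cluster):
    """True if a 16-cluster block fits in the 64KB compression buffer."""
    return bytes_per_cluster <= MAX_MOVABLE_CLUSTER_BYTES


for size in (512, 4096, 8192):
    print(size, supports_ntfs_move(size))
```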
Finally, NTFS prevents deallocated clusters from being used again until NTFS checkpoints the drive's state.
Once every few seconds, NTFS ensures that all its crash recovery data is safely on disk; only then can deallocated
clusters be reused. This characteristic challenges defragmenters because the only way they can determine when
deallocated clusters have become available again is to call GetVolumeBitmap repeatedly.
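In practice this means polling. The Python sketch below shows the general shape of such a loop; read_free_clusters is a hypothetical stand-in for a routine that calls GetVolumeBitmap and returns the set of free drive clusters, and the attempt count and delay are arbitrary choices.

```python
import time


def wait_until_free(read_free_clusters, wanted, attempts=10, delay=0.5):
    """Poll until every cluster in `wanted` is reported free.

    read_free_clusters: callable that returns the current set of free
    drive clusters (a stand-in for a GetVolumeBitmap call).
    Returns True once all wanted clusters are free, False on giving up.
    """
    for _ in range(attempts):
        if wanted <= read_free_clusters():
            return True
        time.sleep(delay)
    return False


# Simulated bitmap snapshots: the wanted clusters become free only after
# the third read, as if a checkpoint had just completed.
snapshots = iter([{1, 2}, {1, 2, 1000}, {1, 2, 1000, 1001, 1002}])
ready = wait_until_free(lambda: next(snapshots), {1000, 1001, 1002},
                        attempts=5, delay=0)
print(ready)
```

Each pass through the loop costs another bitmap read, which is the overhead the checkpoint behavior imposes on a defragmenter.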