File Recovery Basics: How Data Recovery Works

No two data recovery situations are alike. Many times, it's possible to completely recover lost files from a disk, including the original filenames and folder structure. Other times, the files and data may be recovered, but the filenames, date/timestamps and folder paths are lost. And in some cases, no intact files can be found. This raises a common question from our users: why?

To answer this question, it helps to have a basic understanding of how files are stored on the disk and how they can be recovered. While professional data recovery usually requires years of experience and deep knowledge of the technical nuances of file systems and disk physics, learning the basics can help you set reasonable expectations for your data recovery software.

In this article, we'll take a high-level look at how file recovery works. We'll also show you how to apply this knowledge to a few common scenarios in order to estimate your chances for a successful file recovery.

How files are stored on the disk

To understand how files can be recovered from a disk, it helps to understand how files are stored on hard disks before they are lost.

Most modern operating systems divide (or "partition") the entire physical hard drive into one or several independent parts ("partitions"). In the DOS/Windows-based OS families, these partitions are called "logical disks." Logical disks are assigned drive letters and optional descriptive labels. For example, C: (System) or D: (Data). Each partition has its own file system type, independent from other partitions on the same physical disk. For example, a physical hard drive for a Windows system may contain two logical disks: one NTFS and another FAT32. Information about the partitions on the disk is stored at the beginning of the hard drive. This is usually referred to as a "partition table" or "partition map."
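As a rough illustration of what a partition table holds, the sketch below parses the four primary entries of a classic MBR (Master Boot Record). MBR is only one common partition table format, and the article does not say which one a given disk uses. The helper names (parse_mbr, build_demo_mbr) and the demo values are hypothetical, and GPT disks and extended partitions are not handled.

```python
import struct

def parse_mbr(sector: bytes):
    """Return the non-empty primary partition entries of a classic MBR sector."""
    # A valid MBR ends with the two-byte boot signature 0x55 0xAA.
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        # Entry layout: status(1) CHS start(3) type(1) CHS end(3) first LBA(4) sectors(4)
        status, ptype, first_sector, sector_count = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({
            "bootable": status == 0x80,
            "type": ptype,
            "first_sector": first_sector,
            "sector_count": sector_count,
        })
    return partitions

def build_demo_mbr() -> bytes:
    """Build a made-up MBR with one NTFS-type and one FAT32-type partition."""
    sector = bytearray(512)
    sector[446:462] = struct.pack("<B3xB3xII", 0x80, 0x07, 2048, 204800)    # 0x07: NTFS
    sector[462:478] = struct.pack("<B3xB3xII", 0x00, 0x0C, 206848, 102400)  # 0x0C: FAT32 (LBA)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)

if __name__ == "__main__":
    for part in parse_mbr(build_demo_mbr()):
        print(part)
```

If this small table is damaged or overwritten, the partitions it describes are no longer listed anywhere obvious, even though their contents are still on the disk.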

A typical partition structure is shown in Figure 1.
Figure 1: Hard drive structure

The portion of Figure 1 labeled as hard drive service data and info about partition structure is known as "meta-data": information about the data on the disk, as opposed to the data itself. Similarly, each partition or logical disk is divided into two parts: one stores information about the disk (folder structure, file system, etc.) and the other stores the data that makes up the files. Separating data from meta-data allows for better disk space management, faster file searches and increased reliability.

Figure 2 shows a typical logical disk structure.
Figure 2: Logical disk structure

The disk service information shown in Figure 2 contains specific information about the partition size, file system type, etc. This is necessary for the computer to correctly find the necessary data on the partition.

The info about files and folders contains file records that store filenames, sizes, date/times, and other technical information. This information also includes the exact physical locations (addresses) of the file data on the disk. This information is usually backed up on the drive itself, in case the first copy becomes corrupted.

Various file systems have different forms of storing this information. For example, the FAT file system stores this info in a File Allocation Table (FAT), whereas the NTFS file system stores it in a Master File Table (MFT).

When a computer needs to read a file, it first goes to the info about files and folders and searches for the record of that file. Then, it looks up the address of the file, goes to the specified place on the disk, and then reads the file data.

For contiguous files, where the data is grouped together on the disk, this process is very straightforward. However, files on the disk may be fragmented. That is, they may occupy several non-adjacent disk areas. This is more common than most users realize. After all, when you view a file from Windows Explorer or Finder, it is always represented as a single file. This is because the file system is doing all the work of piecing together the fragments behind the scenes. The info about files and folders stores the addresses of each fragmented piece of data so they can be quickly and reliably retrieved when the computer needs to read the file. This information and how it's retrieved plays an important role in file recovery.
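To make the lookup concrete, here is a minimal sketch of a toy volume in which each file record lists the pieces ("extents") that hold the file's data. The FileRecord class, the extent layout, and the demo contents are assumptions made for illustration; real file systems such as FAT and NTFS encode this information differently.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    name: str
    size: int
    # (offset, length) pairs, in the order the pieces belong in the file
    extents: list = field(default_factory=list)

def read_file(disk: bytes, record: FileRecord) -> bytes:
    """Follow the record's extents and stitch the pieces back together."""
    pieces = [disk[offset:offset + length] for offset, length in record.extents]
    return b"".join(pieces)[:record.size]

# A toy "disk" where another file's data sits between the two fragments.
disk = bytearray(32)
disk[0:5] = b"HELLO"
disk[5:10] = b"other"
disk[10:15] = b"WORLD"

fragmented = FileRecord(name="greeting.txt", size=10, extents=[(0, 5), (10, 5)])

if __name__ == "__main__":
    print(read_file(bytes(disk), fragmented))
```

Without the record, nothing on the toy disk says that the bytes at offset 10 continue the file that starts at offset 0. That is the information a recovery program loses when the file records are destroyed.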

When a computer wants to delete a file, it doesn't immediately destroy its data. Instead, it makes some changes to the info about files and folders to designate that the file has been deleted. Some operating systems simply mark the file as deleted, retaining all the meta-data about the file until it becomes necessary to overwrite it with meta-data about a new file. This is how Windows file systems handle deletions. Other operating systems, like Mac OS X, completely destroy the file record of the deleted file. While operating systems vary in whether they preserve or delete the info about files and folders immediately, all operating systems leave the actual file data untouched until it becomes necessary to allocate that disk space for another file. If no new files are written to the disk, the information about the file and its data may remain there indefinitely.
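The "mark as deleted" behavior can be sketched with a toy volume. The ToyVolume class and its record layout are hypothetical and do not correspond to any real file system; the point is only that deleting changes the record, while a later write changes the data.

```python
class ToyVolume:
    """A toy volume: a data area plus a dictionary of file records."""

    def __init__(self, size: int):
        self.data = bytearray(size)  # the file data area
        self.records = {}            # the info about files and folders

    def write(self, name: str, offset: int, content: bytes) -> None:
        self.data[offset:offset + len(content)] = content
        self.records[name] = {"offset": offset, "size": len(content), "deleted": False}

    def delete(self, name: str) -> None:
        # Only the record is touched; the data bytes stay where they are.
        self.records[name]["deleted"] = True

    def recover(self, name: str) -> bytes:
        rec = self.records[name]
        return bytes(self.data[rec["offset"]:rec["offset"] + rec["size"]])

if __name__ == "__main__":
    vol = ToyVolume(32)
    vol.write("notes.txt", 0, b"important")
    vol.delete("notes.txt")
    print(vol.recover("notes.txt"))  # data is still intact after deletion
    vol.write("new.bin", 0, b"XXXX")  # a new file reuses the freed space
    print(vol.recover("notes.txt"))  # the first four bytes are now gone
```

The second recovery attempt returns a mix of new and old bytes, which is what happens when new files are written to a disk before recovery is attempted.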

As noted above, the portion of the disk that stores the file data also contains a backup copy of the info about files and folders. This part of the disk may also contain some additional pieces of information about file and folder structure scattered across the entire disk.

File recovery methods

Before we discuss the different methods for file recovery, it's important to note one thing:
If the data on the disk is overwritten, then the old data is gone. No program or commercially available data recovery method can recover it.

This is why it's of the utmost importance that no new files are written to a disk prior to attempting a data recovery.

For files that have not been overwritten, there are two file recovery methods. All data recovery software uses one or both of these techniques.

Method One: File recovery through analysis of the info about files and folders

This is the first method that a file recovery program attempts because, when successful, it recovers files with their original names, paths, date/time stamps, and data.

The file recovery software starts by trying to read and process the first copy of the info about files and folders. In some cases (such as accidental file deletion), this is the only step that needs to be taken in order to recover the files in their entirety.

If the first copy of the info about files and folders is severely damaged, the software scans the disk for the second copy of the info about files and folders. It also attempts to glean additional information about the folder and file structure that may be on the data part of the disk. Then, it processes all this information to reconstruct the original folder and file structure.

If the file system on the disk isn't severely damaged, it is often possible to recover the entire file and folder structure.

If the file system on the disk is severely damaged, this recovery method cannot recreate the entire folder structure. In that case, recovered files whose parent folders cannot be traced will appear in "orphaned" folders.
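One way to picture how orphaned folders arise is the sketch below. It assumes, purely for illustration, that each surviving file record stores its own name and the ID of its parent folder. The record layout, the ROOT_ID value, and the "orphan_<id>" naming are hypothetical and are not how R-Studio or R-Undelete name such folders.

```python
ROOT_ID = 0  # assumed ID of the volume's root folder

def build_paths(records: dict) -> dict:
    """Map each record ID to a path, given records of the form id -> (name, parent_id)."""
    paths = {}
    for rec_id in records:
        parts = []
        current = rec_id
        seen = set()  # guards against loops in damaged meta-data
        while current != ROOT_ID and current not in seen:
            seen.add(current)
            if current not in records:
                # The parent's record was lost: group the branch under a placeholder.
                parts.append(f"orphan_{current}")
                break
            name, parent_id = records[current]
            parts.append(name)
            current = parent_id
        paths[rec_id] = "/".join(reversed(parts))
    return paths

# Record 3 (a folder) did not survive, so its child cannot be attached to the root.
surviving = {
    1: ("Documents", ROOT_ID),
    2: ("report.doc", 1),
    4: ("photo.jpg", 3),
}

if __name__ == "__main__":
    for rec_id, path in build_paths(surviving).items():
        print(rec_id, path)
```

The file itself is fully recoverable here; only its place in the original folder tree is unknown.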

Figure 3 shows these "orphaned" folders in R-Studio and R-Undelete.
Figure 3: Recovered files and folders structure and "orphaned" folders

Method Two: File recovery using search for known file types (raw file recovery)

If the first method fails to produce satisfactory results, a raw file search is performed. This second data recovery method can recover file data with greater success than the first method, but it cannot reconstruct the original file names, date/time stamps or the entire folder and file structure of the disk.

A search for known file types, or raw file recovery, works by analyzing the contents of the disk for "file signatures." File signatures are common patterns that signify the beginning or end of a file. Almost every file type has at least one file signature. For example, all PNG (Portable Network Graphics) files start with the byte 0x89 followed by the string "PNG" (often displayed as "‰PNG"), and many MP3 files start with the "ID3" string. Such file signatures can be used to recognize that a piece of data on the disk belongs to a certain file type and can therefore be recovered.
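A signature search of this kind can be sketched in a few lines. The SIGNATURES table below holds only the two examples from the text (the full eight-byte PNG signature and the "ID3" tag marker), and find_signatures is a hypothetical helper. Real recovery tools use much larger signature sets and scan the disk sector by sector rather than as one in-memory buffer.

```python
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",  # the eight-byte PNG file signature
    "mp3": b"ID3",                # start of an ID3v2 tag, found in many MP3 files
}

def find_signatures(data: bytes):
    """Return a sorted list of (offset, extension) for every signature hit."""
    hits = []
    for extension, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, extension))
            pos = data.find(magic, pos + 1)
    return sorted(hits)

# A made-up chunk of raw disk data containing one PNG start and one MP3 start.
sample = b"\x00" * 4 + SIGNATURES["png"] + b"abcd" + b"ID3" + b"\x03\x00"

if __name__ == "__main__":
    for offset, extension in find_signatures(sample):
        print(f"offset {offset}: possible .{extension} file")
```

Note that a hit only says where a file of that type may begin. It does not provide a filename, a folder, or the file's length.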

After performing a raw file search, R-Studio and R-Undelete will show possibly recoverable files as "Extra Found Files." These are assigned file extensions based on the identified file signature and given placeholder filenames (sequential numbers). See Figure 4 and Figure 5.
Figure 4: Files recovered using search for Known File Types (Extra Found Files) that are found inside a logical disk

Figure 5: Files recovered using search for Known File Types (Extra Found Files) that are found outside a logical disk

Limitations of Raw File Recovery

Although this method is usually the best technique for recovering data from a heavily damaged file system, there are some limitations. First is the fact that some file types have a "beginning" and "end" file signature, some have only a "beginning" file signature (and no "end" file signature) and some file types have no immediately recognizable file signature at all.

If the data recovery software can identify the "beginning" and "end" file signature, the file can easily be identified and recovered. For files that do not have an "end" file signature, the software can sometimes recover the file by assuming that it ends at the beginning of the next file. For files with no signature, like encrypted disk containers, a raw file search won't be able to distinguish the data from pieces of unallocated disk space.
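The two cases above (an end signature is available, or the file is assumed to run until the next file begins) can be sketched as follows. The carve function and the sample data are hypothetical. The PNG end marker used here is the tail of the IEND chunk, and the sketch assumes the file is contiguous.

```python
PNG_START = b"\x89PNG\r\n\x1a\n"
PNG_END = b"IEND\xaeB`\x82"  # IEND chunk type plus its CRC, which closes every PNG file

def carve(data: bytes, start: int, end_sig=None, next_start=None) -> bytes:
    """Cut one file out of `data`, beginning at offset `start`."""
    if end_sig is not None:
        end = data.find(end_sig, start)
        if end != -1:
            # Beginning and end signatures both found: the file boundary is known.
            return data[start:end + len(end_sig)]
    # No end signature: assume the file runs up to the next detected file,
    # or to the end of the scanned area if nothing follows.
    stop = next_start if next_start is not None else len(data)
    return data[start:stop]

# Made-up disk contents: a PNG, a gap, an MP3 start, then unallocated zero bytes.
sample = PNG_START + b"pixels" + PNG_END + b"\x00\x00" + b"ID3audio" + b"\x00" * 4

if __name__ == "__main__":
    print(carve(sample, 0, end_sig=PNG_END))  # exact file, no extra bytes
    print(carve(sample, 24))                  # MP3 data plus trailing zero bytes
```

The second call returns the MP3 data followed by bytes that never belonged to the file, because nothing marks where the file really ends.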

Each of these circumstances is further complicated by file fragmentation. Furthermore, files without end file signatures may have long "garbage" trails after recovery. Figure 6 illustrates why.
Figure 6: File data on the disk

In the situation shown in Figure 6, File 1 and File 3 will be successfully recovered, while the recovery of File 2 and File 4 will be unsuccessful. Table 1 explains why:

Table 1: Data Recovery of Fragmented Files
File | Condition | Outcome
File 1 | No end file signature, but the file ends right before the File 2 beginning signature. | Successful file recovery.
File 2 | Fragmented file. File 3 intersects File 2. | Unsuccessful data recovery. The software will end the file at the beginning of File 3. The second part of File 2 will be lost.
File 3 | Contiguous file with a beginning signature and an end signature. | Successful file recovery.
File 4 | No end file signature, followed by unallocated space. | Unsuccessful data recovery. The software will assume the file ends at the beginning of File N, and will attach unallocated space to the end of File 4.

In addition to the complications of fragmented files, a raw file search also occasionally yields "false positives": a string such as "ID3" may occur in a file without being a file signature. For example, the text you are currently reading includes the string "ID3" but is not an MP3 file. As such, a raw file search may incorrectly identify a portion of this text as the beginning of an MP3 file.
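One way to reduce such false positives is to check the bytes that follow a candidate signature. The sketch below applies the header layout of an ID3v2 tag (ten bytes: "ID3", two version bytes, a flags byte, and four size bytes that are each below 0x80). Accepting only major versions 2, 3 and 4 is an extra heuristic assumed here, and looks_like_id3v2 is a hypothetical helper, not a description of how any particular recovery program validates its hits.

```python
def looks_like_id3v2(data: bytes, pos: int) -> bool:
    """Heuristically check whether an ID3v2 tag header starts at `pos`."""
    header = data[pos:pos + 10]
    if len(header) < 10 or header[:3] != b"ID3":
        return False
    major, revision = header[3], header[4]
    # Assumption: only ID3v2.2, v2.3 and v2.4 tags are expected on the disk.
    if major not in (2, 3, 4) or revision == 0xFF:
        return False
    # The four size bytes are "synchsafe": the top bit of each is always zero.
    return all(byte < 0x80 for byte in header[6:10])

real_tag = b"ID3\x03\x00\x00\x00\x00\x02\x01" + b"frame data"
plain_text = b'this text includes the string "ID3" but is not an MP3 file'

if __name__ == "__main__":
    print(looks_like_id3v2(real_tag, 0))
    print(looks_like_id3v2(plain_text, plain_text.find(b"ID3")))
```

A check like this lowers the number of false hits but cannot remove them entirely, since random data can still happen to match.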

Advanced File Recovery

While all data recovery programs use a variation of one of the above methods, there are a number of additional search parameters and data recovery techniques that can be applied to garner better results. Advanced file recovery programs, like R-Studio, allow users to manually specify very complex and advanced sets of file signatures, including custom file signatures.

In practice, these two file recovery methods are usually used together: some files are recovered using one method, while the remaining files are recovered using the second method. Advanced file recovery programs like R-Studio and R-Undelete can use them simultaneously. That is, they can analyze the remnants of a damaged file system and search for known file types during the same disk scan.

Estimating File Recovery Outcomes

Using the above information, you can make a fairly accurate estimate of your chances for a successful file recovery. However, it's important to keep the following factors in mind:

  • The damage caused to a file system can be unpredictable. The condition of your files will depend on what caused the files to be lost, the health of the disk before the failure or data loss and any actions taken prior to data recovery attempts. The estimations given in this article are approximate, and only apply to disks to which no new data has been written since the damage occurred.
  • Data recovery should not be attempted on disks with hardware failures. If you suspect that the disk may be physically damaged, your best course of action is to refer the case to a data recovery professional with the proper experience and equipment. Any attempts to tamper with a physically damaged disk may cause further data loss, rendering subsequent data recovery attempts futile.
  • We recommend performing all data recovery tasks from images of the actual disks in order to preserve the original data on the disk. This allows you to run multiple data recovery attempts without causing changes to the disk or risking further data loss. Advanced file recovery programs like R-Studio and R-Undelete can create disk images and scan the data simultaneously.
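The idea of working from an image can be sketched as a plain chunked copy from a source opened read-only into an image file, with a checksum so the copy can be verified later. This is a minimal sketch, not how R-Studio or R-Undelete create images: create_image is a hypothetical helper, the demo uses an ordinary temporary file in place of a disk, and the unreadable sectors that a failing disk would produce are not handled.

```python
import hashlib
import os
import tempfile

def create_image(source_path: str, image_path: str, chunk_size: int = 1024 * 1024) -> str:
    """Copy `source_path` to `image_path` in chunks and return the SHA-256 of the data."""
    digest = hashlib.sha256()
    # The source is opened read-only, so the original is never modified.
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()

def demo() -> bool:
    """Image a temporary file standing in for a disk and verify the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source.bin")
        image = os.path.join(tmp, "disk.img")
        with open(source, "wb") as f:
            f.write(os.urandom(3000))
        checksum = create_image(source, image, chunk_size=1024)
        with open(source, "rb") as f:
            original = f.read()
        with open(image, "rb") as f:
            copy = f.read()
        return copy == original and checksum == hashlib.sha256(original).hexdigest()

if __name__ == "__main__":
    print("image matches source:", demo())
```

All later scans can then be run against the image file, and a failed attempt costs nothing because the original is left untouched.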

Case 1: File recovery from a hard drive with damaged service information

If a disk is improperly mounted or removed (e.g. due to a power outage or user error), some or all of the hard drive meta-data can be damaged or lost. Although the original hard drive service data and information about partition structure is lost, the rest of the drive data remains intact. In these cases, file recovery programs can analyze the info about files and folders that is still intact on the partitions and recover all files and folders from them. An extra search for known file types is rarely necessary in these cases. This is the best case scenario for data recovery and typically yields the most successful results.

Case 2: File recovery from a repartitioned hard drive (physical disk)

If a disk is accidentally repartitioned, the outlook is similar to Case 1, with one important exception: When a new partition structure is created, some new data is written to the disk. This new data will overwrite the existing disk service information about the entire physical disk. However, the rest of the data from the old partitions remains untouched, including the info about files and folders. Thus, file recovery programs can scan the disk, find this information, and recover those files and folders that haven't been affected by the new partition data. An extra search for known file types is rarely necessary.

Case 3: File recovery from a reformatted partition (logical disk)

Reformatting is typically more disruptive than repartitioning, depending on the type of disk format operation that was performed.

A full format causes all data on the partition to be overwritten with a certain pattern (usually 00 or FF). This makes it impossible to recover any files from the partition.
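A simple way to see whether an area of a disk has been wiped in this way is to test whether a block consists of one repeated byte. The fill_byte helper below is a hypothetical sketch. A single uniform block proves little, since unused areas of a healthy disk can look the same, so a meaningful check would sample many blocks across the partition.

```python
def fill_byte(block: bytes):
    """Return the byte value if `block` is one byte repeated throughout, else None."""
    if not block:
        return None
    first = block[0]
    if block == bytes([first]) * len(block):
        return first
    return None

if __name__ == "__main__":
    print(fill_byte(bytes(512)))                   # a sector of 00 bytes
    print(fill_byte(b"\xff" * 512))                # a sector of FF bytes
    print(fill_byte(b"\x00" * 511 + b"\x01"))      # a sector that still holds some data
```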

A quick format causes some or all of the info about files and folders to be overwritten, but leaves the file data untouched. File recovery programs can scan the disk, find what remains from the previous file system, and recover files and folders accordingly. The results of the first method vary widely depending on the file systems used before and after the reformat. An extra search for known file types can be very successful in this case, even if the first method fails to find any files.

Case 4: File recovery from a disk with a damaged file system

This case depends heavily on how severely the file system has been damaged. Recall that the disk contains two copies of the info about files and folders. If only one copy is damaged, a file recovery program can read data from the backup copy and recover all the file info and data. If both copies are severely damaged, the prospects for a successful file recovery could be very grim. As with Case 3, an extra search for known file types may be the saving grace.

Case 5: File recovery when files have been moved across the disk

If a computer freezes or crashes during a disk defragmentation or repartitioning operation, the results can be disastrous. This is typically the worst case scenario for file recovery. Although the info about the files and folders may look healthy, the meta-data may point to the wrong physical addresses for files that were in the process of being moved. For example, the data may have been written to a new location, but the info about the files and folders may not have been updated yet. Or, the info about the files and folders may have been updated, but some or all of the file data hasn't been moved yet. In these cases, even an extra search for known file types may be unsuccessful, as many files may be fragmented.

Conclusion

Although this article only gives a brief overview of the basic file recovery concepts, it should help you understand what to do and what to expect from your data recovery scenario. For more detailed instructions on how to use file recovery programs like R-Studio and R-Undelete, refer to their online help documents. You can also find detailed case studies and tutorials on specific file recovery cases in our online articles.

As always, the best way to estimate your chances for file recovery is to try our programs for free. Using the demo mode of R-Studio or R-Undelete, you can perform a full disk scan and advanced data recovery analyses in order to see which files are recoverable. If the files you are looking for are recoverable, you can register on the fly in order to fully recover your files and save them to another disk.

© Copyright 2000-2016 R-Tools Technology Inc.