Bad sector

A bad sector in computing is a sector on a disk storage unit that is unreadable. When a sector is damaged, all information stored on it is lost. Once a bad sector is found and marked, the operating system (such as Windows or Linux) skips it in the future. Bad sectors are a threat to information security in the sense of data remanence.

Details

Bad sectors can be "soft" (logical) or "hard" (hardware, physical), depending on what makes the sector inaccessible. After power loss, bit rot (more likely on floppy disks), or firmware issues, the on-disk data can be corrupted beyond what the error-correcting code can fix. This is a "soft" bad sector: writing over the corruption would succeed.^[1]

On the other hand, physically broken sectors cannot be restored: writing would fail, forcing a remap. A new drive may start with some innocuous bad sectors due to manufacturing flaws. Larger patches can develop during use, due to head crash, wear and tear, physical shock, or dust intrusion.^[2]^[1]

On solid-state drives, flash wear or flash controller error may also cause bad sectors.^[3]

Handling

Operating system

Bad sectors may be detected by the operating system or the disk controller. Most file systems contain provisions for sectors to be marked as bad, so that the operating system avoids them in the future. Disk diagnostic utilities, such as CHKDSK (Microsoft Windows), Disk Utility (on macOS), or badblocks (on Linux) can actively look for bad sectors upon user request.

With the advent of SMART-enabled disk controllers (see below), the burden of avoiding bad sectors more commonly falls to the disk.^[4] Some newer file systems such as Btrfs and ZFS do not have a bad-block avoidance feature at all.^[5] Software tools that look for bad blocks still have a use case: by issuing writes at detected bad sectors, one can expedite the remapping process, avoiding further attempts at reading the bad sector.^[6]
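A read-only scan of the kind such tools perform can be sketched as follows. This is a minimal illustration, not how CHKDSK or badblocks is implemented: the function name, the 512-byte sector size, and the injectable `read_at` parameter are assumptions made for the example. Reads here may also be served from the operating system's cache, which real diagnostic tools take care to bypass.

```python
import errno
import os

SECTOR_SIZE = 512  # assumed logical sector size; many drives use 4096


def scan_for_unreadable_sectors(path, sector_size=SECTOR_SIZE, read_at=os.pread):
    """Return the indices of sectors whose read raised an I/O error (EIO)."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total size in bytes
        sector_count = (size + sector_size - 1) // sector_size
        for index in range(sector_count):
            try:
                read_at(fd, sector_size, index * sector_size)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # not a media error; let the caller see it
                bad.append(index)
    finally:
        os.close(fd)
    return bad
```

The returned indices could then be passed to a tool that rewrites those sectors, which is what prompts the drive to remap them.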

Disk controller

When a sector fails a normal read access, the firmware of a typical modern (post-1990) disk controller retries a few times in the hope of succeeding before timing out and marking the sector as "pending". (A successful read must not only produce data but also pass the error correction code check.) Pending sectors may be retried on further reads. The repeated retries produce a noise known as (the HDD version of) the click of death.

When a sector is found to be unwritable (or fails to hold the written data when read immediately after a write), the firmware typically remaps the logical sector to a different physical sector, regardless of whether it was marked pending. Conversely, if a pending sector is successfully written to, it is removed from the pending list.

In both cases, the operations are transparent to the operating system, which only needs to issue sector-read and sector-write commands. The retries due to an unreadable sector may cause excessive delay before a definite success or failure.
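The bookkeeping described in this section can be modelled with a toy simulation. The class below is a hypothetical sketch, not any vendor's firmware: the class and attribute names are invented, reads are deterministic (a real drive may succeed on a retry), and dropping a remapped sector from the pending list is an assumption of the model.

```python
class SimulatedDrive:
    """Toy model of pending and remap bookkeeping inside a disk controller."""

    def __init__(self, spare_sectors, read_retries=3):
        self.spare_sectors = spare_sectors
        self.read_retries = read_retries
        self.unreadable = set()  # sectors whose stored data fails ECC
        self.unwritable = set()  # sectors with physical damage
        self.pending = set()     # sectors that failed a read, awaiting a write
        self.remapped = {}       # logical sector -> spare slot index

    def read(self, lba):
        """Return True if the read succeeds, False if the host sees an error."""
        if lba in self.remapped:
            return True  # transparently redirected to a healthy spare
        for _ in range(self.read_retries):
            if lba not in self.unreadable:
                return True
        self.pending.add(lba)  # all retries failed
        return False

    def write(self, lba):
        """Return True if the write succeeds, False if the host sees an error."""
        if lba in self.remapped:
            return True
        if lba in self.unwritable:
            if len(self.remapped) >= self.spare_sectors:
                return False  # no spares left: the write error reaches the host
            self.remapped[lba] = len(self.remapped)
        # A successful (possibly remapped) write leaves fresh, readable data.
        self.unreadable.discard(lba)
        self.pending.discard(lba)
        return True
```

In this model a soft bad sector is cleared by any write, while a hard bad sector consumes a spare, and the host only ever sees the boolean result of each command.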

There are two types of remapping by disk hardware: P-LIST (mapping during factory production tests) and G-LIST (mapping during consumer usage by disk microcode).^[4] Utilities can read the Self-Monitoring, Analysis, and Reporting Technology (SMART) information to tell how many sectors have been reallocated and how many spare sectors the drive may still have.^[7] Because reads and writes to G-LIST sectors are automatically redirected (remapped) to spare sectors, drive access slows down slightly even if the data on the drive is defragmented. Once the G-LIST fills up, the storage unit can no longer remap sectors and reports write errors to the operating system.^[8]^[9]
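As an illustration of reading these counters, the sketch below parses a SMART attribute table in the column layout that `smartctl -A` commonly prints. The sample report is invented for the example, and the attribute names `Reallocated_Sector_Ct` and `Current_Pending_Sector` are the ones usually reported for ATA drives; vendors may name or encode them differently, so treat the parser as an assumption-laden sketch.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

# Invented sample in the usual ten-column attribute layout (not real output).
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
"""


def parse_sector_counters(report, watched=WATCHED):
    """Map each watched attribute name to its raw value."""
    counters = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and carry the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            counters[fields[1]] = int(fields[9])
    return counters
```

Calling `parse_sector_counters(SAMPLE_REPORT)` on the sample yields 8 reallocated and 2 pending sectors.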

Command set comparison

Compared to ATA, the SCSI command set allows finer-grained management of bad sectors. Users can read the G-LIST, control whether automatic remap is performed (possible not only on write failures [AWRE] but also on reads that succeed only through ECC [ARRE]), and use a dedicated command, REASSIGN BLOCKS, to manually remap if needed. The command set also provides a way to perform a low-level format with FORMAT UNIT.^[10]

For example, with a sector that fails the first read attempt but succeeds on subsequent ones, the following choices are reasonable: ignore it, write back what was successfully read (refresh), or attempt to remap (because the sector has proven to be bad at retaining data). The ATA command set provides ways to do the first two but not the third, leaving the possibility of further rot on this sector. The situation is similar with fully unreadable sectors, only more severe, as the sector has proven capable of total data loss. The old filesystem mechanisms for avoiding bad sectors can be used in this case to avoid writing new data to such dubious sectors.
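The "refresh" option can be sketched at the operating-system level as a read followed by a write of the same bytes. The function name, retry count, and 512-byte sector size below are assumptions for illustration. On a real device the read may be served from cache rather than from the medium, so this shows the idea rather than a reliable repair tool.

```python
import os


def refresh_sector(fd, lba, sector_size=512, attempts=3):
    """Re-read a marginal sector and write the same bytes back in place."""
    offset = lba * sector_size
    for _ in range(attempts):
        try:
            data = os.pread(fd, sector_size, offset)
        except OSError:
            continue  # unreadable this time; try again
        if len(data) != sector_size:
            return False  # short read: the offset lies past the end
        os.pwrite(fd, data, offset)
        os.fsync(fd)  # ask the OS to push the write down to the device
        return True
    return False  # never read successfully, so there is nothing to write back
```

A `False` result after repeated read errors corresponds to the fully unreadable case, where only overwriting with new data or a remap can help.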

The SCSI / ATA Translation (SAT) standard defines a read-write-verify sequence for translating the REASSIGN BLOCKS command to ATA.^[11]

Manipulation methods

The SCSI and ATA standards used to have WRITE LONG commands for writing the raw contents of a sector, including the error correction code (ECC) data. As a result, they could be used to create deliberate "soft" bad sectors, which in turn can be used to verify bad sector support in disk utilities and forensic tools. This is supported by:

The Windows program ATATool. For example, to create an error at LBA 10, ATATOOL /BADECC:10 \\.\PhysicalDrive1.
The Linux program hdparm, via --make-bad-sector. It can also issue the SCSI version of the command, as needed for SAT.^[6]
The Linux sg3_utils includes a command sg_write_long for this purpose.^[12]

The newer alternative to a raw WRITE LONG in ATA is WRITE UNCORRECTABLE EXT, which flags the sector as bad immediately and prohibits further retries. The corresponding command in SCSI is WRITE LONG with the WR_UNCOR (write uncorrectable) option bit set. Both can be accessed through hdparm^[6] or sg_write_long.
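For test harnesses, the hdparm invocations can be assembled programmatically. The sketch below only builds argument lists and never runs them, because these commands destroy data on the target sector. The helper names are hypothetical, and the `--yes-i-know-what-i-am-doing` safety flag and the `f` prefix (requesting the flagged variant) follow the hdparm(8) manual page as understood here; check the installed version before relying on them.

```python
def make_bad_sector_argv(device, lba, flagged=False):
    """Build the hdparm argument list that marks one sector as bad."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    # Assumption from hdparm(8): an 'f' prefix asks for the flagged variant,
    # which makes later reads fail immediately instead of after retries.
    sector = f"f{lba}" if flagged else str(lba)
    return ["hdparm", "--yes-i-know-what-i-am-doing",
            "--make-bad-sector", sector, device]


def repair_sector_argv(device, lba):
    """Build the hdparm argument list that rewrites (repairs) one sector."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return ["hdparm", "--yes-i-know-what-i-am-doing",
            "--repair-sector", str(lba), device]
```

A harness would pass such a list to `subprocess.run` against a scratch disk only, then confirm that the tool under test reports the sector before repairing it again.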

The counterpart to WRITE LONG is the equally obsolete READ LONG. It can be used to read the raw sector contents including the ECC data. It is accessible through sg_read_long for SCSI. No known program uses the ATA version of this command.

Frequency

In a 2007 study, researchers from NetApp and the University of Wisconsin–Madison analyzed the read errors returned by 1.53 million hard drives of 30 models over 32 months. They found that 3.5% of drives developed "latent sector errors" (i.e. unreadable bad sectors), and that a disk with one bad sector is more likely to develop more. Bad sectors cluster spatially (within a 10 MB neighborhood) and temporally. Errors recovered by ECC, which are reported by enterprise drives (using the SCSI command set), also indicate a higher chance of a bad sector in the future.^[13]

References

[1] Zhang (2 March 2018). "Hard vs Soft Bad Sectors in HDD: Different Causes and Solutions". Data Recovery Blog.
[2] Chris Hoffman (5 July 2017). "Bad Sectors Explained: Why Hard Drives Get Bad Sectors and What You Can Do About It". How-To Geek.
[3] "Question - should i rma my 980 pro". Tom's Hardware Forum. 14 February 2023. Retrieved 22 July 2024.
[4] "Bad Sector Remapping". mjm.co.uk. Archived from the original on 10 March 2018. Retrieved 9 March 2018.
[5] "badblocks - Can btrfs track / avoid bad blocks?". Unix & Linux Stack Exchange.
[6] hdparm(8) – Linux Programmer's Manual – Administration and Privileged Commands from Manned.org. "--make-bad-sector Deliberately create a bad sector (aka. "media error") on the disk. [...] Note also that the --repair-sector option can be used to restore (any) bad sectors when they are no longer needed, including sectors that were genuinely bad (the drive will likely remap those to a fresh area on the media). --write-sector: This can be used to force a drive to repair a bad sector (media error)."
[7] "Monitoring Hard Disks with SMART". Linux Journal, 2004.
[8] "Encyclopedia". PCMag.com. Ziff Davis.
[9] Stephens, Curtis E, ed. (11 December 2006), Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS), working draft revision 3f (PDF), ANSI INCITS, pp. 198–213, 327–344, archived from the original (PDF) on 30 July 2007
[10] "INCITS 506-202x - Information technology - SCSI Block Commands - 4 (SBC-4) draft revision 22". 15 September 2020. Retrieved 22 May 2023. (sbc4r22.pdf)
[11] Information technology - SCSI / ATA Translation - 6 (SAT-6) Draft Revision 2 SAT6r02.pdf
[12] sg_write_long(8) – Linux System Administration Manual from ManKier.com
[13] Lakshmi N. Bairavasundaram; Garth R. Goodson; Shankar Pasupathy; Jiri Schindler (June 2007). "An analysis of latent sector errors in disk drives". Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. San Diego, California, United States: ACM. pp. 289–300. CiteSeerX 10.1.1.63.1412. doi:10.1145/1254882.1254917. ISBN 9781595936394. S2CID 14164251. Retrieved 9 June 2012.

External links

Bad Blocks Definition

