(NOTE: This is a work in progress. Don't add it to the index pages)
Disk partitioning in Linux can be confusing. This article attempts to take some of the mystery out of it.
Disk controllers and disks
The first thing to know about is the naming of disks. Linux's approach to disk naming is slightly confusing, and requires a little knowledge about the hardware.
Linux treats different types of disk hardware differently. Concentrating on commodity PC hardware, there are five main types of storage hardware available:
- IDE (also known as ATA, PATA, or ATAPI for CD drives)
- SCSI
- Serial ATA (SATA)
- USB (including memory sticks, external disk drives)
- Firewire (IEEE 1394)
IDE has its own driver layer, and all of the others are implemented through the SCSI layer. We will limit the discussion, therefore, to IDE and SCSI devices. If you have Serial ATA, USB or Firewire devices, the SCSI information below applies.
The IDE specification allows a maximum of two drives per IDE controller, with both drives on a single ribbon cable. One drive is designated the master, and the other is the slave. Most PC systems have traditionally had two controllers on-board, for a maximum of four drives. Additional controllers can be added with expansion cards. Typically, controllers are described as "primary", "secondary", "tertiary", etc., usually starting with the two on-board controllers.
Linux allocates device identifiers to drives according to how they are attached to the controllers (known as the "topology"). So:
- /dev/hda is the primary master drive
- /dev/hdb is the primary slave
- /dev/hdc is the secondary master
- /dev/hdd is the secondary slave
- /dev/hde is the tertiary master
- and so on.
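The topology rule above can be sketched in a few lines of Python. This is only an illustration of the naming pattern in the list; the function name ide_device and its arguments are made up for this example and are not part of any Linux tool.

```python
import string


def ide_device(controller: int, is_slave: bool) -> str:
    """Return the /dev/hdX name for a drive, given where it is attached.

    controller: 0 for primary, 1 for secondary, 2 for tertiary, ...
    is_slave:   False for the master drive, True for the slave.
    (Hypothetical helper, for illustration only.)
    """
    # Each controller contributes two letters: master first, then slave.
    index = controller * 2 + (1 if is_slave else 0)
    if not 0 <= index < 26:
        raise ValueError("position outside the hda..hdz range")
    return "/dev/hd" + string.ascii_lowercase[index]


print(ide_device(0, False))  # primary master
print(ide_device(1, True))   # secondary slave
print(ide_device(2, False))  # tertiary master
```

Note that the name depends only on the position of the drive, so /dev/hdc exists even if there is no /dev/hdb.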
IDE CD drives (also known as ATAPI devices) are named using the same scheme. For example, an IDE CD-ROM (or CD-RW, or DVD drive, or DVD-RW) connected as the secondary master will be called /dev/hdc.
(As a historical aside, SATA devices were run from the IDE layer up until about kernel 2.6.7. Since SATA-I only allows one device per controller, SATA devices using the IDE layer are named /dev/hda, /dev/hdc, /dev/hde, /dev/hdg, and so on.)
The SCSI specification allows many more drives (and other devices such as scanners and tapes) attached to an individual controller. Linux names SCSI devices in order of their discovery. It also splits up the different types of SCSI devices into separate groups. SCSI hard disks are called sd, and are lettered:
- /dev/sda is the first SCSI hard disk
- /dev/sdb is the second SCSI hard disk
- /dev/sdc is the third SCSI hard disk
- and so on.
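As a rough sketch of the lettering scheme, the following Python function turns a discovery order into a disk name. The function name scsi_disk_name is invented for this example. It also assumes that the lettering carries on as sdaa, sdab, ... once sdz has been used, which is how the kernel names disks on systems with more than 26 of them.

```python
def scsi_disk_name(index: int) -> str:
    """Return the sd name of the index-th discovered SCSI disk (0-based).

    Hypothetical helper, for illustration only.
    """
    if index < 0:
        raise ValueError("index must not be negative")
    letters = ""
    n = index
    while True:
        # Peel off the last letter, then move on to the next one to the left.
        letters = chr(ord("a") + n % 26) + letters
        n = n // 26 - 1
        if n < 0:
            break
    return "/dev/sd" + letters


print(scsi_disk_name(0))   # first disk
print(scsi_disk_name(2))   # third disk
print(scsi_disk_name(26))  # twenty-seventh disk
```

Unlike the IDE names, these say nothing about where the disk is plugged in: adding or removing a disk can change the letters of the disks discovered after it.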
SCSI CD drives (and other CD or DVD drives) are sr, and are numbered:
- /dev/sr0 is the first SCSI CD drive
- /dev/sr1 is the second SCSI CD drive
- /dev/sr2 is the third SCSI CD drive
- and so on.
SCSI tape drives are st, and are numbered:
- /dev/st0 is the first SCSI tape
- /dev/st1 is the second SCSI tape
- and so on.
Other SCSI devices are usually implemented as "generic" devices, and are called sg, numbered:
- /dev/sg0 is the first SCSI generic device
- /dev/sg1 is the second SCSI generic device
- and so on.
Remember that things like USB and SATA also use the SCSI drivers, and so the first SATA drive on a system is /dev/sda.
Disk Addresses, or, "Cylinders, Heads, and Sectors? Aren't they parts of a car?"
This section generally isn't something you need to know about these days, with modern hardware and software. If you're reading this for the first time, you may want to skip to the next section, on partitioning.
Back in days of yore, when geeks were real geeks, and hard disks were new technology and required a fork-lift truck to move... raw data was laid out on disks in "spider-web" style, where the data was put in concentric circular regions around the disk, and then each ring was divided into the same number of equal parts. Since a disk system was often composed of a number of separate platters, each platter with its own read/write head, the concentric regions were called "cylinders" (imagine the outermost ring on each disk, placed one above the other, forming a cylinder). Finding a particular piece of data on such a device would require three pieces of information: the cylinder it was in, which read/write head (and hence which platter) to read it from, and the position round the disk from some known start-point. These three pieces of location information are known as cylinder, head and sector, abbreviated to CHS. Since that time, PC disk drives have traditionally used CHS indexing to identify a location on the disk.
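To see how the three coordinates pin down exactly one place on the disk, it can help to flatten them into a single running sector number (usually called a logical block address, or LBA). The sketch below uses the conventional conversion formula. The function names, and the geometry of 16 heads and 63 sectors per track used in the example calls, are assumptions chosen for illustration. By convention cylinders and heads count from 0, but sectors count from 1.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Flatten a (cylinder, head, sector) address into one sector number."""
    if cylinder < 0 or not 0 <= head < heads_per_cylinder:
        raise ValueError("cylinder or head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sectors are numbered from 1")
    # Whole cylinders first, then whole tracks, then the offset in the track.
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse of chs_to_lba: recover (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1


# Assumed example geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))
print(chs_to_lba(1, 0, 1, 16, 63))
print(lba_to_chs(1008, 16, 63))
```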
(Stuff about CHS limitations, CHS translation, BIOSes, the 1024-cylinder limit, other limits)
Partitioning and disk layout
Whilst it is possible to use a raw disk device for storage, it isn't a typical method of use. Instead, disks are partitioned (called "disk slices" in some systems). The standard method of partitioning PC disks is, as with most PC hardware, fraught with limitations and caveats stemming from the way that the technique has evolved over time.
The format of a standard PC disk is that the first sector of the disk (512 bytes) is set aside as the Master boot record (or MBR, also known as the boot sector). This contains up to 446 bytes of boot code, followed by 66 bytes at the end of the sector. The final two bytes of the sector are the boot signature (0x55 0xAA), which marks the sector as a valid boot record. The other 64 bytes are the partition table, consisting of four 16-byte records. These four records each describe one partition on the disk. Thus a PC format disk can contain only four partitions.
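The layout just described is simple enough to pick apart by hand. The following Python sketch reads the four records out of a 512-byte sector. The field positions inside each 16-byte record (a status byte, a type byte at offset 4, and two little-endian 32-bit numbers giving the starting sector and the length in sectors) follow the conventional MBR layout. The function name parse_mbr and the sample sector built at the end are made up for this example, so nothing here touches a real disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # the partition table starts after the boot code
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"


def parse_mbr(sector: bytes):
    """Return the used partition table records of an MBR as a list of dicts."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[510:512] != SIGNATURE:
        raise ValueError("boot signature 0x55 0xAA not found")
    partitions = []
    for slot in range(4):
        start = TABLE_OFFSET + slot * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # status, (3 CHS bytes skipped), type, (3 CHS bytes skipped),
        # first sector, number of sectors
        status, ptype, first, count = struct.unpack("<B3xB3xII", raw)
        if ptype == 0:
            continue  # type 0 marks an unused slot
        partitions.append({
            "number": slot + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "first_sector": first,
            "sectors": count,
        })
    return partitions


# Build a made-up sector with one bootable Linux (type 0x83) partition.
sample = bytearray(SECTOR_SIZE)
sample[446:462] = struct.pack("<B3xB3xII", 0x80, 0x83, 2048, 204800)
sample[510:512] = SIGNATURE
print(parse_mbr(bytes(sample)))
```

To try it on a real disk you would read the first 512 bytes of the disk device (which normally needs root), but a sector saved to a file works just as well.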
Of course, this is pure bunkum.
The four partitions that can be described in the main MBR are known as the primary partitions. To be able to have more than four partitions, one of the primary partitions can be marked as an extended partition. This partition contains no data of its own, but has a "boot sector". In this boot sector, more partitions can be defined. These are known as the logical partitions, and these logical partitions can contain data. See (PICTURE).
In Linux, the primary partitions are numbered 1-4, and the logical partitions are numbered from 5 onwards, regardless of how many primary partitions there are. So, if you have two primary partitions, one of which is extended and contains three logical partitions, they will be numbered 1, 2, 5, 6 and 7. Partition 2 will not be usable for data, as it is just the extended partition.
Partition devices in Linux are the device name of the disk, followed by the partition number. So the first primary partition on the second SCSI drive in the system will be /dev/sdb1. The second logical partition on the first IDE drive in the system will be /dev/hda6.
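The numbering and naming rules can be written down as a short Python sketch. Both function names are invented for this example, and partition_numbers assumes the simple case where the primary slots are filled in order from 1.

```python
def partition_numbers(primary_count: int, logical_count: int):
    """List the partition numbers Linux would use.

    primary_count includes the extended partition, if there is one.
    (Hypothetical helper; assumes primary slots are used in order.)
    """
    if not 0 <= primary_count <= 4:
        raise ValueError("there are only four primary slots")
    primaries = list(range(1, primary_count + 1))
    # Logical partitions always start at 5, however many primaries exist.
    logicals = list(range(5, 5 + logical_count))
    return primaries + logicals


def partition_device(disk: str, number: int) -> str:
    """Name of a partition on an hd or sd disk: disk name plus number."""
    return f"{disk}{number}"


# Two primaries (one of them extended) and three logical partitions.
print(partition_numbers(2, 3))
print([partition_device("/dev/hda", n) for n in partition_numbers(2, 3)])
```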
The most important thing to remember here is:
A filesystem is not a partition.
A partition is a simple, flat, storage container into which data can be written. Think of it like a page in a book. A filesystem is a specific layout for data, which can be interpreted to give (typically) a hierarchical layout of data and metadata, giving you directories, files, file permissions, and all the other paraphernalia of permanent on-line data storage. Think of the filesystem like the structure of the text on a page -- margins, column layout, headings, subheadings.
Typically, a filesystem is created inside a partition, one filesystem per partition. The one-to-one relationship means that the filesystem is usually referred to using the name of the partition that it lives on. So, when running, say, "fsck /dev/hda3", you aren't "fscking the partition", but "fscking the filesystem on the partition".
To use a filesystem properly, it must be mounted somewhere on the system. Any filesystem may be mounted anywhere in the directory structure using the "mount" command. However, the default locations for filesystem mount-points are defined in /etc/fstab.
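Each non-comment line of /etc/fstab holds up to six whitespace-separated fields: the device, the mount point, the filesystem type, the mount options, and two numbers used by dump and fsck. The Python sketch below reads such lines. The FstabEntry type, the parse_fstab function and the sample table are all made up for illustration and do not come from any real system.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fstype: str
    options: str
    dump: int
    fsck_pass: int


def parse_fstab(text: str):
    """Parse fstab-style text into a list of FstabEntry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry
        # The last two fields are optional and default to 0.
        dump = int(fields[4]) if len(fields) > 4 else 0
        fsck_pass = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(fields[0], fields[1], fields[2],
                                  fields[3], dump, fsck_pass))
    return entries


# A made-up example table.
SAMPLE = """\
# device     mount point  type  options   dump  pass
/dev/hda1    /            ext3  defaults  1     1
/dev/hda5    /home        ext3  defaults  1     2
/dev/hda6    none         swap  sw        0     0
"""

for entry in parse_fstab(SAMPLE):
    print(entry.device, "->", entry.mount_point)
```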
The GRUB bootloader handles disks and partitions rather differently. Because GRUB runs outside any operating system, and can be used to boot any OS, it has to use the machine's BIOS to determine what disks are available. On PC hardware, the BIOS will number the disks in some order by its own rules (typically, on-board IDE devices first, followed by devices on expansion cards, although that can be changed). GRUB refers to the disks it can see as (hd0), (hd1), (hd2), etc. These don't necessarily tie up in any obvious way with the Linux devices. If you look at /boot/grub/device.map, you can see the mapping between GRUB devices and Linux devices.
GRUB also uses different numbers for partitions. GRUB numbers its primary partitions 0-3, and logical partitions from 4 onwards. Partition numbers in GRUB are therefore one smaller than the corresponding partition number in Linux. Thus, the first primary partition on the first BIOS disk will be (hd0,0), and the second logical partition on the second BIOS disk will be (hd1,5). (This is the numbering used by GRUB Legacy; GRUB 2 still numbers disks from 0, but numbers partitions from 1.)
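Putting the two rules together, a Linux partition name can be translated into the GRUB form described above by looking the disk up in the device map and subtracting one from the partition number. The Python sketch below does this for the zero-based numbering described in this section. The function names and the two-line sample device map are invented for illustration; a real map depends on your BIOS and hardware.

```python
import re


def parse_device_map(text: str):
    """Map Linux disk names to GRUB disk names from device.map-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        grub_disk, linux_disk = line.split()[:2]
        mapping[linux_disk] = grub_disk
    return mapping


def grub_partition(linux_partition: str, device_map: dict) -> str:
    """Translate e.g. /dev/hda1 into (hd0,0) using zero-based numbering."""
    match = re.fullmatch(r"(/dev/[a-z]+)(\d+)", linux_partition)
    if match is None:
        raise ValueError("expected a name like /dev/hda1")
    disk, number = match.group(1), int(match.group(2))
    grub_disk = device_map[disk]           # e.g. "(hd0)"
    # Drop the closing bracket, then append the partition number minus one.
    return f"{grub_disk[:-1]},{number - 1})"


# A made-up device map: one IDE disk and one SCSI disk.
SAMPLE_MAP = """\
(hd0) /dev/hda
(hd1) /dev/sda
"""

device_map = parse_device_map(SAMPLE_MAP)
print(grub_partition("/dev/hda1", device_map))
print(grub_partition("/dev/sda6", device_map))
```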