Thursday, August 27, 2020

How to replace a bad hard drive in ZFS Raid

This guide shows how to replace a failed hard drive in a ZFS mirror and start the rebuild (resilvering) process.


STEP 1 - GET INFORMATION ABOUT THE FAILED DRIVE

Get the GUID of the failed drive. Running zdb with no arguments prints the cached configuration of each pool:

root@localhost# zdb
raid1:
    version: 5000
    name: 'raid1'
    state: 0
    txg: 1178836
    pool_guid: 8019483820723122312
    errata: 0
    hostid: 3155752912
    hostname: 'a6'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 8019483820723122312
        create_txg: 4
        children[0]:
            type: 'mirror'
            id: 0
            guid: 11864727355575360377
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 2000384688128
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                guid: 15304200656844780564
                path: '/dev/sdb1'
                devid: 'ata-HITACHI_HUA723020ALA640_YGKU6BBG-part1'
                phys_path: 'pci-0000:00:1f.2-ata-2'
                whole_disk: 1
                create_txg: 4
                com.delphix:vdev_zap_leaf: 130
            children[1]:
                type: 'disk'
                id: 1
                guid: 980353070042574228
                path: '/dev/sdc1'
                devid: 'ata-HITACHI_HUA723020ALA640_YGKT7U6G-part1'
                phys_path: 'pci-0000:00:1f.2-ata-3'
                whole_disk: 1
                DTL: 384
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

In this example we will pretend /dev/sdc is the bad drive. Its entry in the listing above (path: '/dev/sdc1') has the GUID 980353070042574228.
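Reading the GUID off a long listing by eye is error-prone, so here is a minimal sketch of a helper that pulls it out for a given path. The function name guid_for_path is hypothetical, and the sketch assumes that "guid:" appears before "path:" inside each disk entry, as it does in the listing above.

```shell
# Print the GUID of the vdev whose "path:" matches the given device node.
# Reads zdb output on stdin. Assumes "guid:" precedes "path:" in each
# disk entry, as in the listing above.
guid_for_path() {
    awk -v want="'$1'" '
        $1 == "guid:" { guid = $2 }                       # remember the most recent GUID
        $1 == "path:" && $2 == want { print guid; exit }  # path matched: emit that GUID
    '
}

# Example (not executed here):
#   zdb | guid_for_path /dev/sdc1
```

Note that the path must be given as it appears in the zdb output, including the partition suffix (/dev/sdc1, not /dev/sdc).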

Get the serial number:

root@localhost# smartctl -a /dev/sdc | grep Serial
Serial Number: YGKT7U6G
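In the zdb listing above, the devid of each disk ends with the drive's serial number, so the two can be cross-checked before pulling anything out of the machine. The sketch below does that; both function names are hypothetical, and it assumes smartctl prints a line starting with "Serial Number:" as shown above.

```shell
# Extract the serial number from smartctl output read on stdin.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number/ { print $2; exit }'
}

# Succeed if the serial ($1) appears somewhere in the devid string ($2).
serial_in_devid() {
    case "$2" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example (not executed here):
#   serial=$(smartctl -a /dev/sdc | serial_from_smartctl)
#   serial_in_devid "$serial" 'ata-HITACHI_HUA723020ALA640_YGKT7U6G-part1' && echo "match"
```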


STEP 2 - TAKE THE FAILED DRIVE OFFLINE

zpool offline raid1 980353070042574228
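Before pulling the drive it is worth confirming that the pool really shows it as offline. A minimal sketch, assuming the usual "NAME STATE READ WRITE CKSUM" table in zpool status output; the function name vdev_state is hypothetical, and the device name to look up (assumed to be sdc here) should be whatever appears in the NAME column on your system.

```shell
# Print the STATE column for one row of the config table in zpool status
# output (read from stdin). NAME must match the first column exactly.
vdev_state() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (not executed here):
#   zpool status raid1 | vdev_state sdc
```

If the command prints anything other than OFFLINE for the failed drive, stop and check the GUID used in the offline command.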


STEP 3 - REPLACE THE HARD DRIVE PHYSICALLY

Physically swap the failed drive for a new one of equal or larger capacity. Use the serial number from Step 1 to make sure you pull the right drive.


STEP 4 - COPY PARTITION TABLE

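Note the argument order: the device given to --replicate is the TARGET (the new drive), and the last argument is the SOURCE (the healthy drive). Reversing them overwrites the partition table of the healthy drive.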

sgdisk --replicate=[TARGET] [SOURCE]
sgdisk --replicate=/dev/sdc /dev/sdb
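Because the target comes first and the source last, it is easy to get this command backwards. The sketch below builds the command from explicitly named arguments and only prints it, so it can be reviewed before being run by hand. The function name replicate_cmd and its source-then-target argument order are assumptions for illustration.

```shell
# Print (do not run) the sgdisk command that copies the partition table
# from a healthy SOURCE disk ($1) onto the new TARGET disk ($2).
replicate_cmd() {
    source_disk=$1
    target_disk=$2
    if [ "$source_disk" = "$target_disk" ]; then
        echo "source and target must differ" >&2
        return 1
    fi
    # sgdisk expects the target inside --replicate= and the source last.
    printf 'sgdisk --replicate=%s %s\n' "$target_disk" "$source_disk"
}

# Example:
#   replicate_cmd /dev/sdb /dev/sdc
```

For the drives in this post, the example call prints the same command as above: sgdisk --replicate=/dev/sdc /dev/sdb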


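STEP 5 - GENERATE RANDOM GUIDS

Replicating the partition table also copies the disk and partition GUIDs of the source drive, so give the new drive its own:

sgdisk --randomize-guids /dev/sdc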


STEP 6 - ADD NEW HARD DRIVE TO ZFS POOL

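When only one device is given, zpool replace assumes the new drive sits at the same path as the old one (/dev/sdc here):

zpool replace raid1 /dev/sdc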


FINAL - CHECK AND MONITOR THE RESILVERING PROCESS

watch zpool status -v raid1
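If you would rather be told when the rebuild finishes than watch the screen, a small polling loop can do it. This is a minimal sketch that assumes the scan line of zpool status contains the text "resilver in progress" while the rebuild is running; the function name resilver_active and the 60-second interval are arbitrary choices.

```shell
# Succeed if zpool status output (read from stdin) reports a running resilver.
resilver_active() {
    grep -q 'resilver in progress'
}

# Example (not executed here): poll once a minute until the resilver ends.
#   while zpool status raid1 | resilver_active; do
#       sleep 60
#   done
#   echo "resilver finished"
```

Once the loop exits, run zpool status once more to confirm that the pool is healthy and that no errors were reported.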