Thursday, August 27, 2020

How to replace a bad hard drive in ZFS RAID

How to replace a bad hard drive in a ZFS RAID and start the regeneration (resilvering) process.


STEP 1 - GET INFORMATION ABOUT THE FAILED DRIVE

Get GUID of the failed drive:

root@localhost# zdb
raid1:
    version: 5000
    name: 'raid1'
    state: 0
    txg: 1178836
    pool_guid: 8019483820723122312
    errata: 0
    hostid: 3155752912
    hostname: 'a6'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 8019483820723122312
        create_txg: 4
        children[0]:
            type: 'mirror'
            id: 0
            guid: 11864727355575360377
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 2000384688128
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                guid: 15304200656844780564
                path: '/dev/sdb1'
                devid: 'ata-HITACHI_HUA723020ALA640_YGKU6BBG-part1'
                phys_path: 'pci-0000:00:1f.2-ata-2'
                whole_disk: 1
                create_txg: 4
                com.delphix:vdev_zap_leaf: 130
            children[1]:
                type: 'disk'
                id: 1
                guid: 980353070042574228
                path: '/dev/sdc1'
                devid: 'ata-HITACHI_HUA723020ALA640_YGKT7U6G-part1'
                phys_path: 'pci-0000:00:1f.2-ata-3'
                whole_disk: 1
                DTL: 384
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

In this example we will pretend /dev/sdc is the bad drive. From the zdb output above, the GUID for /dev/sdc is 980353070042574228.
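If your version of ZFS supports it, 'zpool status -g' prints vdev GUIDs in place of device names, which is a quick way to cross-check the GUID you pulled out of the zdb output (the -g flag may be missing on older releases):

zpool status -g raid1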

Get serial #:

root@localhost# smartctl -a /dev/sdc | grep Serial
Serial Number:    YGKT7U6G


STEP 2 - REMOVE THE FAILED DRIVE

zpool offline raid1 980353070042574228
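
Before pulling the drive, you can confirm it was taken offline; the pool should now report a DEGRADED state with the offlined disk marked OFFLINE:

zpool status raid1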


STEP 3 - REPLACE THE HARD DRIVE PHYSICALLY

Please replace the broken hard drive with a new hard drive.


STEP 4 - COPY PARTITION TABLE

Please note the first device in the command below is the TARGET and the second device is the SOURCE.

sgdisk --replicate=[TARGET] [SOURCE]
sgdisk --replicate=/dev/sdc /dev/sdb
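
Before moving on, it is worth verifying that both drives now carry the same partition layout. sgdisk's --print option dumps the GPT partition table so you can compare the two by eye:

sgdisk --print /dev/sdb
sgdisk --print /dev/sdc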


STEP 5 - GENERATE RANDOM GUID

sgdisk --randomize-guids /dev/sdc


STEP 6 - ADD NEW HARD DRIVE TO ZFS POOL

zpool replace raid1 /dev/sdc
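
This assumes the new drive came up under the same device node (/dev/sdc). If it appeared under a different name, you can instead reference the old disk by the GUID we found in STEP 1 and point ZFS at the new device, along these lines (/dev/sdX is a placeholder for whatever node your new drive received):

zpool replace raid1 980353070042574228 /dev/sdX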


FINAL - CHECK AND MONITOR THE RESILVERING PROCESS

watch zpool status -v raid1




Saturday, June 6, 2020

Proxmox PVE APT UPDATE "not authorized" Error Message - How to configure No-Subscription APT Repository

Proxmox PVE is a great piece of software and a mature virtualization environment.  I am very thankful to the team and grateful for the software. I wish I could afford to pay for a subscription, but so far I am not profitable enough, so I am still not a subscriber :-(    However, I plan to be, and I recommend that anyone who can afford it subscribe.

Having said the above, I would like to show you the steps necessary to disable the Enterprise repository and enable the No-Subscription repository.  This is for the new Proxmox PVE 6.x, which is based on Debian Buster.

STEP 1 - ADD NO-SUBSCRIPTION REPOSITORY

nano /etc/apt/sources.list

Add the following lines at the end of your /etc/apt/sources.list

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve buster pve-no-subscription

Your new sources.list file should now look like this:

deb http://ftp.us.debian.org/debian buster main contrib

deb http://ftp.us.debian.org/debian buster-updates main contrib

# security updates
deb http://security.debian.org buster/updates main contrib

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve buster pve-no-subscription


STEP 2 - COMMENT OUT THE ENTERPRISE REPOSITORY FILE


nano /etc/apt/sources.list.d/pve-enterprise.list

Put a pound sign (#) in front of this line:

deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise

Your pve-enterprise.list file should now look like this:

# deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise



STEP 3 - TEST BY EXECUTING APT UPDATE

You should now be able to execute 'apt update' without errors.
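
apt update

When you go on to actually upgrade the packages, note that on Proxmox the recommended command is 'apt dist-upgrade' (or 'apt full-upgrade') rather than a plain 'apt upgrade', so that new dependencies are handled correctly.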






STEP 4 - CONGRATULATIONS! YOU CAN NOW UPDATE FROM PROXMOX

Remember to Subscribe to Proxmox when you can!

Monday, May 18, 2020

Proxmox PVE version 6.2 has been released! (Proxmox Virtualization Environment)

Proxmox is one of the best open source projects for virtualization.  The Proxmox team has released version 6.2 of Proxmox PVE, the latest in the 6.x series.

Here is the list of features and changes for Proxmox 6.2 (thanks to the magic of copy-and-paste):

Advanced options for the web-based management interface:
  • Proxmox VE implements built-in validation of domains for Let's Encrypt TLS certificates via the DNS-based challenge mechanism, in addition to the already existing HTTP-based validation mode.
  • Full support for up to eight corosync network links is available. The more links are used, the higher the cluster availability.
  • In the storage content view, administrators can now filter the stored data with the new ‘Creation Date’ column, which, for example, simplifies searching for a backup from a certain date.
  • The language of the web interface can be seamlessly changed without the need to restart the session. An Arabic translation has been added and thus Proxmox VE supports 20 languages in total.
Linux Container:
  • The integrated container technology has been updated to LXC 4.0.2 and lxcfs 4.0.3. Proxmox VE 6.2 now allows you to create templates for containers on directory-based storage.
  • New LXC templates for Ubuntu 20.04, Fedora 32, CentOS 8.1, Alpine Linux and Arch Linux.
Zstandard for Backup/Restore:
  • The integrated backup manager supports the fast and highly efficient lossless data compression algorithm Zstandard (zstd).
User and permission management:
  • Proxmox VE uses a role-based user and permission management for all objects such as VMs, storage, nodes, etc. The new LDAP sync enables synchronization of LDAP users and groups into the Proxmox user and group permission framework.
  • Full support and integration for API tokens has been added, allowing stateless access to most parts of the REST API by another system, software, or API client. API tokens can be generated for individual users and can optionally be configured with separate permissions and expiration dates to limit the scope and duration of the access. Should an API token get compromised, it can be revoked without having to disable the user itself (see the small example after the list below).

Further notable enhancements

  • QEMU/KVM: Support for Live Migration with replicated disks (storage replication with zfs) is enabled.
  • Testing the Ceph storage has become easier as the uninstall process has been simplified.
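
As a small illustration of the API tokens mentioned above, here is a minimal sketch of a REST call authenticated with a token. Every value in it (host name, user, token ID, and secret) is a placeholder you would substitute with your own:

curl -H "Authorization: PVEAPIToken=root@pam!mytoken=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" \
     https://pve.example.com:8006/api2/json/nodes

The header format is PVEAPIToken=USER@REALM!TOKENID=SECRET; since the token carries its own permissions, no login session or cookie handling is needed.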

Wednesday, February 26, 2020

Converting LVM to LVM Thin so Proxmox PVE will support Snapshot Backup and Thin Provisioning

I have been using Proxmox PVE since version 2.x.  Back in Proxmox versions 2 and 3, a regular LVM partition with enough free space in its LVM volume group was enough for the Proxmox backup snapshot function to work.  However, starting with Proxmox version 5, you have to use LVM Thin (thin provisioning) in order to make backup with snapshot work and to support over-provisioning.

Backup with snapshot is important because it allows the VM or CT to continue working, uninterrupted, during the backup process.  This is crucial and makes backups seamless and effortless, with no downtime.

Over-provisioning allows you to create VMs and CTs without actually taking up the full storage size specified for the VM or CT.  For example, let's say you create a CT with 100GB of storage but the CT actually uses only 20GB; with thin provisioning, the LVM Thin volume will take up only 20GB instead of 100GB, allowing you to assign / provision more storage than you physically have.  Of course, if all your CTs / VMs actually use 100% of their capacity and you have over-provisioned all of them, you will run out of space!

Here are the steps I took to convert my existing partition (/dev/md0), which was a typical ext4 filesystem mounted as a directory in Proxmox (a setup that does not work for backup with snapshot), to LVM Thin:


STEP 1 - MOVE ALL YOUR VM AND CT

First, you need to empty the partition you are working on and remove all the VMs and CTs that use it.  I simply STOP them, back them up one at a time, and restore them on a different Proxmox server.  Even if you backup and restore, you still have to remove all the VMs and CTs from the hardware node you are working on.  The point is: the partition we will be converting to LVM Thin must be empty, because its data will be erased during the process.


STEP 2 - UNMOUNT THE PARTITION

Unmount the partition using the command:

umount /mnt/md0

In case the above fails, you can try using 'force':

umount -f /mnt/md0


STEP 3 - CHANGE PARTITION TYPE TO LVM

See the list of devices recognized by fdisk and find the device ID you are working on:

fdisk -l

My device ID is /dev/md127

Next, I will use fdisk to create a new single partition (100% size) and change its type to LVM

fdisk /dev/md127

Type command 'n' to create a new partition and accept the defaults (press Enter a few times) until the partition has been created.

Type command 't' to change its type, then type '8e' to select Linux LVM.

Type command 'w' to commit the changes.


STEP 4 - CREATE PHYSICAL VOLUME AND VOLUME GROUP

Use fdisk again to find the newly created LVM partition:

fdisk -l

My new device ID for the partition is /dev/md127p1

Create physical volume first:

pvcreate /dev/md127p1

Check to make sure physical volume has been created:

pvdisplay



Then create the volume group; I will call my volume group local_md0:

vgcreate local_md0 /dev/md127p1

Check to make sure the volume group has been created:

vgdisplay

STEP 5 - CREATE LOGICAL VOLUME WITH THIN OPTION

The lvcreate command below will create a THIN logical volume (a thin pool) because of the -T option, and will leave 5% of the volume group free.  That 5% is enough in my case because this is a 2TB partition.  You only need to reserve enough space for the snapshot changes that accumulate during the backup operation.  Meaning, let's say your backup takes 30 minutes: is your 5% of free space enough to record all the changes that may happen within that 30-minute window?
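
To put rough illustrative numbers on it (assuming, purely for the sake of example, that your workloads write about 10 MB/s of changes during the backup): 30 minutes at 10 MB/s is about 18 GB of changed data, while 5% of a 2TB volume group is roughly 100 GB of reserve, so there is plenty of headroom.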

lvcreate -T -n local_md0 -l 95%FREE local_md0
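
You can confirm the thin pool was created with lvs; for a thin pool, the Attr column starts with 't':

lvs local_md0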


STEP 6 - ADD STORAGE IN PROXMOX PVE AS LVM THIN


Click on Datacenter > Storage > Add and choose LVM-Thin.
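
If you prefer the command line, the same storage can be added with pvesm. This is a sketch that assumes the volume group and the thin pool are both named local_md0, as created above:

pvesm add lvmthin local_md0 --vgname local_md0 --thinpool local_md0 --content rootdir,images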




STEP 7 - CONFIRM IT HAS BEEN ADDED AS STORAGE
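
From the shell you can verify the same thing with pvesm; the new storage should appear in the list with its type shown as lvmthin:

pvesm status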






Congratulations!  You now have LVM Thin, which means you can do backup with snapshot, and your VMs and CTs can be over-provisioned!