Backing up your homelab using Bacula

Like most homelab enthusiasts out there, I came to the realization that several workloads I’m hosting contribute significantly to my household’s digital quality of life. While I tend to ensure such workloads have minimal single points of failure, the fact remains that mistakes DO happen and can cause significant data loss. Backing up important data is essential to facilitate service recovery in such situations.

Because I’m hosting rather large amounts of data, such as video surveillance footage, I wanted a backup solution that would allow me to select recovery points going back at least 30 days without breaking the bank on storage capacity. In practice, that means a solution which supports Full, Differential and Incremental backup types.

Enter Bacula!

Bacula Community Edition is the solution that ticked all the boxes for me:

  • It is Open Source and actively maintained.
  • It provides Full, Differential and Incremental backup types.
  • It can back up remote workloads.
  • It is an enterprise ready solution – homelab knowledge gains often translate into professional skills! 😉

My backup server

I’m using an old HP Compaq 8200 Elite Convertible Minitower with 12 GB of RAM as my backup server. The operating system disk is a Crucial BX500 480GB 3D NAND SATA 2.5-Inch Internal SSD, and I’m using two Seagate BarraCuda 8TB 3.5 Inch SATA 6 Gb/s 5400 RPM HDDs, set up as a striped ZFS pool.
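
For reference, creating a striped pool like this boils down to a single zpool create command listing both disks. Below is a minimal sketch; the pool name and device paths are assumptions, so adjust them to your own hardware:

    # Create a striped (RAID0) ZFS pool named "backups" across both 8 TB drives.
    # Note: striping offers no redundancy; losing either disk loses the whole pool.
    zpool create backups /dev/sdb /dev/sdc

    # Verify the pool layout and available capacity.
    zpool status backups
    zpool list backups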

I have opted to add this computer to my existing Proxmox cluster and virtualize the Bacula server so that I can easily migrate it to another host at a later time, should my compute or storage requirements increase. As you can see in Figure 1 below, I have assigned the Bacula VM 4 vCPUs, 4 GB of RAM, a 40 GB OS disk and a 10 TB disk to store the backups. This leaves me a little over 5 TB of storage to expand the Bacula VM’s volumes if necessary, or to provision additional VMs on the Proxmox node!

Figure 1. Bacula server VM's hardware assignment in Proxmox.

I’m using Ubuntu Server 20.04 for the Bacula VM’s operating system, mainly because I’m quite familiar with this Linux distribution and because Ubuntu 22.04 had not yet been released when I provisioned the VM.

I have mounted the 10 TB volume on the following path: /mnt/backups
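
Inside the VM, this is nothing more than formatting the virtual disk and adding an fstab entry. A rough sketch follows; the device name and ext4 filesystem are assumptions, so adapt them to your own setup:

    # Format the 10 TB data disk and mount it persistently at /mnt/backups.
    mkfs.ext4 /dev/sdb
    mkdir -p /mnt/backups
    echo '/dev/sdb  /mnt/backups  ext4  defaults  0  2' >> /etc/fstab
    mount /mnt/backups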

Database backend

While Bacula supports more than one database backend for its catalog, I’ve opted for MariaDB, and I host both Bacula and the database on the same VM. Having everything related to Bacula co-located on one VM was desirable to me, as it makes lifting and shifting this workload to another host that much easier.
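
If you are setting the catalog up by hand rather than through Ansible, Bacula ships helper scripts that create the database, its tables and the required grants. The sketch below is only indicative; the scripts' location varies depending on how Bacula was installed:

    # Install MariaDB, then initialize the Bacula catalog with the bundled helper scripts.
    apt install mariadb-server
    ./create_mysql_database
    ./make_mysql_tables
    ./grant_mysql_privileges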

Installing Bacula

I have automated the installation of my Bacula server using an Ansible playbook. While that may be an interesting topic to cover in another post, for now it suffices to say that:

  • Bacula is configured to use:
    • --with-mysql for the MariaDB backend.
    • --with-openssl in order to encrypt the backup volumes destined to be synchronized into an Azure Storage Account (a simplified configure sketch follows this list).
  • There are two Storage Daemons.
  • Ansible installs and configures Postfix to relay e-mails to my SMTP server. This will enable Bacula to send e-mails on job completion, error, etc.
  • I also opted to install Bacula-web so that I can use the Web UI to review backup statuses, etc.
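
For the curious, the build step the playbook performs boils down to something along these lines (a simplified sketch; an actual build will typically pass a few more options, such as installation paths):

    # Build Bacula from source with MariaDB catalog support and OpenSSL.
    ./configure --with-mysql --with-openssl
    make
    make install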

Configuring Bacula

Storage daemons and pools

I’m backing up my data into two tiers:

The first one is critical data, backed up into the CloudPool. This includes pictures and videos I’ve taken over time, database exports, home automation configuration files, mailboxes, video surveillance media where the cameras have detected movement and most importantly, a backup of the latest Bacula catalog.

The second tier, backed up into the LocalPool, comprises everything else.

    CloudPool

    Critical data is backed up in a storage pool called CloudPool, served by a dedicated Storage Daemon. As mentioned before, I use this pool for very specific files. To adhere to the 3-2-1 backup principle, once all of the day’s backups have completed, Bacula triggers a script that uploads the BKPCLOUD volumes to an Azure Storage Account, which satisfies the offsite copy requirement.

    Because I’m limiting this pool’s total size to 1.5 TB and my Storage Account is configured with the Cool access tier, I can rest assured that my Azure invoice will not exceed roughly CAD$18/month for this storage.
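
    How that trigger is wired up can vary; one possibility is a RunScript block on whichever job runs last (for instance the catalog backup). The snippet below is purely illustrative and assumes the PowerShell upload script is invoked through pwsh:

    Job {
        Name = "BackupCatalog"
        # ... usual Job settings (Client, FileSet, Pool = CloudPool, Schedule, etc.) ...
        RunScript {
            RunsWhen = After                 # run once this job has finished
            RunsOnClient = No                # execute on the Director host, not the client
            Command = "pwsh -File /etc/bacula/scripts/UploadToAzure.ps1"
        }
    }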

    Device {
        Name = CloudFileDevice
        Media Type = File
        Archive Device = /mnt/backups/cloud
        LabelMedia = yes;                   # lets Bacula label unlabeled media
        Random Access = Yes;
        AutomaticMount = yes;               # when device opened, read it
        RemovableMedia = no;
        AlwaysOpen = no;
    }
    
    Pool {
        Name = CloudPool
        Pool Type = Backup
        Storage = CloudFileStorage     # This value overrides the one from the Job!
        Recycle = yes                  # Bacula can automatically recycle Volumes
        AutoPrune = yes                # Prune expired volumes
        LabelFormat = "BKPCLOUD"       # Prefix for backup files
        Volume Retention = 60 days     # 60 days
        Maximum Volume Bytes = 1G      # Limit Volume size to something reasonable
        Maximum Volumes = 1500         # Limit the Blob storage consumption to 1.5 TB!!!
        Action On Purge = Truncate
    }

    CloudPool highlights

    The important properties in the CloudPool configuration above are:

    • Archive Device and LabelFormat: These two properties define the directory and the volume file-name prefix Bacula uses for this specific pool. With this configuration, the volumes will be stored as /mnt/backups/cloud/BKPCLOUD0001, /mnt/backups/cloud/BKPCLOUD0002 and so on.

    • Maximum Volumes and Maximum Volume Bytes limit the number of individual backup volumes Bacula will create as well as the maximum size each volume can occupy on the file system. Together they cap the CloudPool’s file system usage at 1.5 TB (1500 x 1 GB volumes). Also, by limiting the volume size to 1 GB, I potentially reduce the amount of data to fetch from the Storage Account should I need to restore from the cloud. Furthermore, the upload script only transfers volumes whose checksum differs from the copy already in the Storage Account.

    • By enabling Recycle and AutoPrune, Bacula will re-use a volume once all of the jobs it contains are older than the Volume Retention period. For example, if the last created volume is /mnt/backups/cloud/BKPCLOUD0123 but all of the jobs in volume /mnt/backups/cloud/BKPCLOUD0002 are older than 60 days, Bacula will re-use BKPCLOUD0002 instead of labelling a new volume named BKPCLOUD0124. This further ensures the pool never consumes more than 1.5 TB of storage space.

    LocalPool

    I use the LocalPool to back up data I consider not important enough to warrant an offsite copy, but for which keeping a second copy around is beneficial should a workload disaster occur. This mostly includes the operating system disks of my computers and servers. Because I typically provision my servers using Ansible playbooks, I can re-deploy workloads with limited effort if required, and these backups serve as a safety net in case a server’s configuration has drifted over time.

    Device {
        Name = LocalFileDevice
        Media Type = File
        Archive Device = /mnt/backups/local
        LabelMedia = yes;                   # lets Bacula label unlabeled media
        Random Access = Yes;
        AutomaticMount = yes;               # when device opened, read it
        RemovableMedia = no;
        AlwaysOpen = no;
    }
    
    Pool {
        Name = LocalPool
        Pool Type = Backup
        Storage = LocalHugeFileStorage # This value overrides the one from the Job!
        Recycle = yes                  # Bacula can automatically recycle Volumes
        AutoPrune = yes                # Prune expired volumes
        LabelFormat = "BKPLOCAL"       # Prefix for backup files
        Volume Retention = 35 days     # 35 days
        Maximum Volume Bytes = 5G      # Limit Volume size to something reasonable
        Maximum Volumes = 1660         # 1660 * 5 = 8.3T
        Action On Purge = Truncate
    }


    You can see in the above LocalPool configuration that 8.3 TB is made available to the pool. Backup volumes are stored in a different directory, namely /mnt/backups/local, and the volume names start with BKPLOCAL.

    Configuration files

    I maintain Bacula’s configuration files in a Git repository, with a CI/CD pipeline deploying its contents to the Bacula server, reloading the configuration and verifying service health. Here is the file system structure for the configuration files:

    .
    ├── bacula-dir.conf
    ├── bacula-fd.conf
    ├── bacula-sd-cloud.conf
    ├── bacula-sd-local.conf
    ├── bconsole.conf
    ├── clients
    │   └── server_1_cli.conf
    ├── filesets
    │   └── server_1_fs.conf
    ├── jobs
    │   └── server_1_jobs.conf
    ├── messages.conf
    ├── pools.conf
    ├── schedules.conf
    ├── scripts
    │   ├── UploadToAzure.ps1
    │   └── [All other scripts included with Bacula]
    └── storages.conf
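
    As an illustration, the validate-and-reload portion of the pipeline can be as simple as the two commands below (a minimal sketch, assuming the stock configuration paths):

    # Validate the Director configuration before applying it.
    bacula-dir -t -c /etc/bacula/bacula-dir.conf

    # Ask the running Director to reload its configuration.
    echo "reload" | bconsole -c /etc/bacula/bconsole.conf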


    Whenever I add or remove a configuration file under the clients, filesets or jobs directories, it gets dynamically included in Bacula’s configuration by means of the following excerpt at the very bottom of the bacula-dir.conf file:

    # Includes
    @/etc/bacula/schedules.conf
    @/etc/bacula/messages.conf
    @/etc/bacula/storages.conf
    @/etc/bacula/pools.conf
    
    # The following commands return the full path and name of *.conf files, prefixed with an at-sign.
    
    # Clients
    @"|sh -c 'ls -d -1 /etc/bacula/clients/*.conf | sed \"s/^\\(.\\)/@\\1/g\"'"
    
    # File Sets
    @"|sh -c 'ls -d -1 /etc/bacula/filesets/*.conf | sed \"s/^\\(.\\)/@\\1/g\"'"
    
    # Jobs
    @"|sh -c 'ls -d -1 /etc/bacula/jobs/*.conf | sed \"s/^\\(.\\)/@\\1/g\"'"


    With that said, all that’s left is to start creating new files under the clients, filesets or jobs directories in order for new workloads to be backed up using Bacula!
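
    To give an idea of what those files contain, here is a stripped-down, hypothetical trio for a new client I’ll call server_2. The hostname, password, catalog and Schedule names are placeholders, and real files would carry a few more options:

    # clients/server_2_cli.conf
    Client {
        Name = server_2-fd
        Address = server_2.home.lab           # placeholder hostname
        FDPort = 9102
        Catalog = MyCatalog                    # placeholder catalog name
        Password = "changeme"                  # must match the client's File Daemon password
        AutoPrune = yes
    }

    # filesets/server_2_fs.conf
    FileSet {
        Name = "server_2-fs"
        Include {
            Options {
                signature = MD5
                compression = GZIP
            }
            File = /etc
            File = /home
        }
    }

    # jobs/server_2_jobs.conf
    Job {
        Name = "server_2-backup"
        Type = Backup
        Client = server_2-fd
        FileSet = "server_2-fs"
        Pool = LocalPool                       # or CloudPool for critical data
        Schedule = "WeeklyCycle"               # placeholder schedule name
        Messages = Standard
    }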

    One thing to note: one can never be too cautious when storing data on the Internet, so every File Daemon (client) uses a distinct PKI keypair to encrypt the data written to the Bacula volumes. The PKI Signatures, PKI Encryption and PKI Keypair properties are set in the File Daemon configuration files. This way, even if someone gains unauthorized access to my Storage Account and downloads my Bacula catalog file and the BKPCLOUD volumes, the files will be useless without my private keys.
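
    On the File Daemon side, this amounts to a handful of directives; a minimal sketch, with placeholder names and paths:

    FileDaemon {
        Name = server_2-fd
        # ... usual File Daemon settings ...
        PKI Signatures = Yes                           # sign the backed-up data
        PKI Encryption = Yes                           # encrypt data before it leaves the client
        PKI Keypair = "/etc/bacula/server_2-fd.pem"    # placeholder path to this client's certificate and private key
    }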

    In the next post, I will cover the automated deployment of the Bacula server using Ansible.
