Backing up your homelab using Bacula
Like most homelab enthusiasts out there, I came to the realization that several workloads I’m hosting contribute significantly to the digital quality of life of my household, and while I tend to ensure such workloads have minimal single points of failure, the fact remains that mistakes DO happen and can cause significant data loss. Backing up important data is essential to facilitate service recovery in such situations.
Because I’m hosting rather large amounts of data, such as video surveillance footage, I wanted a backup solution that would allow me to select recovery points going back at least 30 days without breaking the bank on storage device capacity. To accomplish this, I wanted a backup solution which supports Full, Differential and Incremental backup types.
Enter Bacula!
Bacula Community Edition is the solution that ticked all the boxes for me:
- It is Open Source and actively maintained.
- It provides Full, Differential and Incremental backup types.
- It can back up remote workloads.
- It is an enterprise-ready solution – homelab knowledge gains often translate into professional skills! 😉
My backup server
I’m using an old HP Compaq 8200 Elite Convertible Minitower with 12 GB of RAM as my backup server. The operating system disk is a Crucial BX500 480GB 3D NAND SATA 2.5-Inch Internal SSD, and I’m using two Seagate BarraCuda 8TB 3.5 Inch SATA 6 Gb/s 5400 RPM HDDs set up as a striped ZFS pool.
I have opted to add this computer to my existing Proxmox cluster and virtualize the Bacula server so that I can easily migrate it to another host at a later time, should my compute or storage requirements increase. As you can see in Figure 1 below, I have assigned the Bacula VM 4 vCPUs, 4 GB of RAM, a 40 GB OS disk and a 10 TB disk to store the backups. This leaves me a little over 5 TB of storage to expand the Bacula VM’s volumes if necessary, or to provision additional VMs on the Proxmox node!
I’m using Ubuntu Server 20.04 for the Bacula VM’s operating system, merely because I’m quite familiar with this Linux distribution and back when I provisioned the VM, Ubuntu 22.04 was not officially released yet.
I have mounted the 10 TB volume on the following path: /mnt/backups
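For reference, that mount boils down to a single /etc/fstab entry; the device name and file system below are assumptions on my part and will differ depending on how the virtual disk shows up in your own VM:
# /etc/fstab (excerpt) - assuming the 10 TB virtual disk appears as /dev/sdb with one ext4 partition
/dev/sdb1   /mnt/backups   ext4   defaults,noatime   0   2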
Database backend
While Bacula supports more than one database backend to store its catalog, I’ve opted to use MariaDB and I am hosting both Bacula and the database on the same VM. Having everything related to Bacula co-located on the same VM was desirable to me, as it will make lifting and shifting this workload to another host that much easier.
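Preparing the catalog database itself is straightforward. Bacula ships helper scripts to create the catalog schema, but the database and its dedicated user can be prepared with something along these lines; the database name, user and password below are placeholders:
# Create an empty catalog database and a dedicated MariaDB user for Bacula (placeholder names/password).
sudo mysql -e "CREATE DATABASE bacula;"
sudo mysql -e "CREATE USER 'bacula'@'localhost' IDENTIFIED BY 'changeme';"
sudo mysql -e "GRANT ALL PRIVILEGES ON bacula.* TO 'bacula'@'localhost';"
sudo mysql -e "FLUSH PRIVILEGES;"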
Installing Bacula
I have automated the installation of my Bacula server using an Ansible playbook, and while this may be an interesting topic to cover in another post, for now it suffices to say that:
- Bacula is configured with --with-mysql for the MariaDB backend and --with-openssl in order to encrypt the backup volumes destined to be synchronized into an Azure Storage Account (a sketch of the configure invocation follows this list).
- There are two Storage Daemons.
- Ansible installs and configures Postfix to relay e-mails to my SMTP server. This will enable Bacula to send e-mails on job completion, error, etc.
- I also opted to install Bacula-web so that I can use the Web UI to review backup statuses, etc.
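For the curious, the build flags above translate to a configure invocation along these lines. Only --with-mysql and --with-openssl come straight from my setup; the prefix and sysconfdir values are generic autoconf options shown purely as an example:
# Minimal sketch of a source build; adjust paths to taste.
./configure --prefix=/opt/bacula --sysconfdir=/etc/bacula --with-mysql --with-openssl
make
sudo make install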
Configuring Bacula
Storage daemons and pools
I’m backing up my data into two tiers:
The first tier is critical data, backed up into the CloudPool. This includes pictures and videos I’ve taken over time, database exports, home automation configuration files, mailboxes, video surveillance media where the cameras have detected movement and, most importantly, a backup of the latest Bacula catalog.
The second tier, backed up into the LocalPool, comprises everything else.
CloudPool
Critical data is backed up in a storage pool called CloudPool, served by a dedicated Storage Daemon. As mentioned before, I’m using this pool for very specific files, and in order to adhere to the 3-2-1 backup principle, when all of the day’s backups have completed, a script gets triggered by Bacula to upload the BKPCLOUD volumes into an Azure Storage Account, which satisfies the offsite copy requirement.
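One simple way to wire up such a trigger is a RunScript block on the job that runs last in the nightly schedule, in my case the catalog backup. The sketch below is illustrative only; the job name and script path are assumptions, and the RunScript mechanics are what matter:
Job {
  Name = "BackupCatalog"            # illustrative job name
  # ... usual Job directives ...
  RunScript {
    RunsWhen = After                # run once the backup job has completed
    RunsOnClient = No               # execute on the Director host, not on the client
    Command = "pwsh /etc/bacula/scripts/UploadToAzure.ps1"   # assumed path; PowerShell installed on the VM
  }
}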
Because I’m limiting this pool’s total size to 1.5 TB and my Storage Account is configured using the Cool access tier, I can rest assured that my Azure invoice will not go over ~CAD$18/month for such storage.
Device {
  Name = CloudFileDevice
  Media Type = File
  Archive Device = /mnt/backups/cloud
  LabelMedia = yes;                  # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;              # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}
Pool {
  Name = CloudPool
  Pool Type = Backup
  Storage = CloudFileStorage         # This value overrides the one from the Job!
  Recycle = yes                      # Bacula can automatically recycle Volumes
  AutoPrune = yes                    # Prune expired volumes
  LabelFormat = "BKPCLOUD"           # Prefix for backup files
  Volume Retention = 60 days
  Maximum Volume Bytes = 1G          # Limit Volume size to something reasonable
  Maximum Volumes = 1500             # Limit the Blob storage consumption to 1.5 TB!!!
  Action On Purge = Truncate
}
CloudPool highlights
The important properties for the CloudPool are highlighted in the above code snippet:
- Archive Device and LabelFormat: these two properties define the path and file names that Bacula’s backup volumes will be stored into for this specific pool. With this configuration, the volumes will be stored as /mnt/backups/cloud/BKPCLOUD0001, /mnt/backups/cloud/BKPCLOUD0002 and so on.
- Maximum Volumes and Maximum Volume Bytes serve to limit the number of individual backup volumes Bacula will create as well as the maximum size each volume can occupy on the file system. This essentially limits the CloudPool file system usage to 1.5 TB (1500 x 1 GB volumes). Also, by limiting the volume size to 1 GB, I’m potentially reducing the amount of data to fetch from the Storage Account should I need to restore from the cloud. Furthermore, the upload script only uploads the volumes whose checksum differs from the copy already in the Storage Account.
- By enabling Recycle and AutoPrune, Bacula will re-use a volume once all of its backup jobs are older than the Volume Retention. For example, if the last created volume is /mnt/backups/cloud/BKPCLOUD0123 but all of the jobs in volume /mnt/backups/cloud/BKPCLOUD0002 are older than 60 days, Bacula will re-use BKPCLOUD0002 instead of creating (labelling) a new volume named BKPCLOUD0124. This further constrains the pool from consuming more than 1.5 TB of storage space.
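If you want to keep an eye on how volumes get created and recycled over time, bconsole can list them per pool; the command below is standard bconsole, and the output will depend on your own jobs:
# From bconsole: show every volume in the CloudPool along with its current status.
*list volumes pool=CloudPool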
LocalPool
I use the LocalPool to back up data I consider not sufficiently important to warrant an offsite copy, but for which keeping a second copy around would be beneficial should a workload disaster occur. This mostly includes the operating system disks for my computers and servers. Because I typically provision my servers using Ansible playbooks, I can re-deploy workloads with limited effort if required, and these backups serve as a safety net if a server’s configuration has somehow drifted over time.
Device {
  Name = LocalFileDevice
  Media Type = File
  Archive Device = /mnt/backups/local
  LabelMedia = yes;                  # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;              # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}
Pool {
  Name = LocalPool
  Pool Type = Backup
  Storage = LocalHugeFileStorage     # This value overrides the one from the Job!
  Recycle = yes                      # Bacula can automatically recycle Volumes
  AutoPrune = yes                    # Prune expired volumes
  LabelFormat = "BKPLOCAL"           # Prefix for backup files
  Volume Retention = 35 days
  Maximum Volume Bytes = 5G          # Limit Volume size to something reasonable
  Maximum Volumes = 1660             # 1660 * 5 = 8.3T
  Action On Purge = Truncate
}
You can see in the above LocalPool configuration that 8.3 TB is made available to the pool. Backup volumes are stored in a different directory, namely /mnt/backups/local, and the volume names will start with BKPLOCAL.
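For completeness, the Storage = CloudFileStorage and Storage = LocalHugeFileStorage values in the pools above refer to Storage resources defined on the Director side, which point at the corresponding Storage Daemon and its device. Here is a minimal sketch for the local one; the address, port and password are placeholders, since both of my Storage Daemons live on the Bacula VM itself:
Storage {
  Name = LocalHugeFileStorage
  Address = 127.0.0.1               # placeholder: the SD runs on the Bacula VM
  SD Port = 9103                    # placeholder: the second SD would listen on another port
  Password = "redacted"             # must match the Director password in bacula-sd-local.conf
  Device = LocalFileDevice          # the Device resource shown earlier
  Media Type = File
}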
Configuration files
I maintain Bacula’s configuration files in a Git repository, with a CI/CD pipeline deploying its contents to the Bacula server, reloading the configuration and verifying service health. Here is the file system structure for the configuration files:
.
├── bacula-dir.conf
├── bacula-fd.conf
├── bacula-sd-cloud.conf
├── bacula-sd-local.conf
├── bconsole.conf
├── clients
│  └── server_1_cli.conf
├── filesets
│  └── server_1_fs.conf
├── jobs
│  └── server_1_jobs.conf
├── messages.conf
├── pools.conf
├── schedules.conf
├── scripts
│  ├── UploadToAzure.ps1
│  └── [All other scripts included with Bacula]
└── storages.conf
Whenever I add or delete a configuration file under the clients, filesets or jobs directories, Bacula’s configuration picks up the change dynamically thanks to the following excerpt at the very bottom of the bacula-dir.conf file:
# Includes
@/etc/bacula/schedules.conf
@/etc/bacula/messages.conf
@/etc/bacula/storages.conf
@/etc/bacula/pools.conf
# The following commands return the full path and name of *.conf files, prefixed with an at-sign.
# Clients
@"|sh -c 'ls -d -1 /etc/bacula/clients/*.conf | sed \"s/^\\(.\\)/@\\1/g\"'"
# File Sets
@"|sh -c 'ls -d -1 /etc/bacula/filesets/*.conf | sed \"s/^\\(.\\)/@\\1/g\"'"
# Jobs
@"|sh -c 'ls -d -1 /etc/bacula/jobs/*.conf | sed \"s/^\\(.\\)/@\\1/g\"'"
With that said, all that’s left is to start creating new files under the clients, filesets or jobs directories in order for new workloads to be backed up using Bacula!
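As an example, onboarding a hypothetical server_2 would mean dropping three small files into those directories. The names, address, password and paths below are all placeholders, but the resource structure is standard Bacula:
# clients/server_2_cli.conf
Client {
  Name = server_2-fd
  Address = server_2.example.lan
  FD Port = 9102
  Catalog = MyCatalog
  Password = "redacted"             # must match the Director password in the client's bacula-fd.conf
  File Retention = 35 days
  Job Retention = 60 days
  AutoPrune = yes
}

# filesets/server_2_fs.conf
FileSet {
  Name = "server_2_fs"
  Include {
    Options {
      signature = MD5               # store checksums in the catalog
      compression = GZIP            # compress data on the client
    }
    File = /etc
    File = /home
  }
}

# jobs/server_2_jobs.conf
Job {
  Name = "server_2_backup"
  Type = Backup
  Level = Incremental               # default level; the schedule's Run directives can override it
  Client = server_2-fd
  FileSet = "server_2_fs"
  Schedule = "WeeklyCycle"          # placeholder: one of the schedules defined in schedules.conf
  Storage = LocalHugeFileStorage
  Pool = LocalPool
  Messages = Standard
}
Once the pipeline copies the files over, a reload from bconsole is enough for the Director to pick up the new resources.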
One thing to note is that one can never be too cautious when storing data on the Internet, and for that reason, every File Daemon (client) uses a distinct PKI keypair to encrypt the data written to the Bacula volumes. The PKI Signatures, PKI Encryption and PKI Keypair properties are set in the File Daemon configuration files, as sketched below. This way, if someone ever gains unauthorized access to my Storage Account and is able to download my Bacula catalog file and the BKPCLOUD volumes, the files will be useless unless they also have my private keys.
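In the bacula-fd.conf of each client, that boils down to a handful of directives in the FileDaemon resource; here is a sketch, with the key path as a placeholder (the PEM file holds both the client’s certificate and its private key):
FileDaemon {
  Name = server_2-fd
  # ... usual FileDaemon directives ...
  PKI Signatures = Yes              # sign the backed up data
  PKI Encryption = Yes              # encrypt the backed up data
  PKI Keypair = "/etc/bacula/server_2-fd.pem"   # placeholder path to the client's certificate + private key
}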
In the next post, I will cover the automated deployment of the Bacula server using Ansible.