<p>λk.(k blog): Posts tagged 'tutorial'. Setting up your backup service, by William J. Bowman, 2020-06-30T20:54:20Z.</p>
<p>I just ran the command <span class="stt">rm -rf ~</span>, deleting all my personal files in the process.
This was not the first time, and it was no big deal, because I back up my files
with automatic rolling backups.
My backup system is secure, redundant, and has low resource requirements.
The backup repository is encrypted, deduplicated, compressed, and mirrored
across multiple machines.
You can choose to use any or none of these features while following this guide.</p>
<p>In this guide, I describe how to set up a secure and robust backup service
yourself, which runs on Linux, macOS, and Windows via WSL 2.
I provide my own scripts, config files, and workflows for maintaining,
validating, and restoring the backups.
This is all set up using free software, supports multiple configurations with
varying degrees of security and redundancy, and scales well to more backup
clients.</p>
<p>If you’d prefer to not set this up yourself and you run macOS or Windows, I
recommend Backblaze:</p>
<blockquote>
<p><a href="https://www.backblaze.com/cloud-backup.html#af9v9g"><span class="url">https://www.backblaze.com/cloud-backup.html#af9v9g</span></a></p></blockquote>
<p>They automatically handle everything, including most of the features I want in a
backup service and some I could never implement myself, for $6/month per machine
(USD).</p>
<!--more-->
<p></p>
<div class="SIntrapara">
<h1 class="fake-header">Table of Contents</h1>
</div>
<div class="SIntrapara">
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._sec~3aintro%29">1<span class="hspace"> </span>Introduction</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"></span></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._sec~3aprereq%29">2<span class="hspace"> </span>Install Prerequisite Software</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Backup_.Software%29">2.1<span class="hspace"> </span>Backup Software</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Optional_.G.U.I_for_.Client%29">2.2<span class="hspace"> </span>Optional GUI for Client</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Mirror_.Software%29">2.3<span class="hspace"> </span>Mirror Software</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"></span></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._sec~3ainit%29">3<span class="hspace"> </span>Initialize the Backup Repository</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Setup_.Server_.Environment%29">3.1<span class="hspace"> </span>Setup Server Environment</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Setup_.Client-.Only_.Environment%29">3.2<span class="hspace"> </span>Setup Client-Only Environment</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Create_the_.Encrypted_.Repository%29">3.3<span class="hspace"> </span>Create the Encrypted Repository</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._sec~3ainit~3async-client-only%29">3.4<span class="hspace"> </span>Mirror the Client-Only Repository Offsite</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"></span></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._sec~3aconfig-client%29">4<span class="hspace"> </span>Configure the Backup Client</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Install_.Backup_.Script%29">4.1<span class="hspace"> </span>Install Backup Script</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._sec~3aconfig-client~3aexclude%29">4.2<span class="hspace"> </span>Exclude Extraneous Files From Backup</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Configure_.Access_to_the_.Backup_.Repository%29">4.3<span class="hspace"> </span>Configure Access to the Backup Repository</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Client-only_.Repository_.Folder%29">4.3.1<span class="hspace"> </span>Client-only Repository Folder</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Backup_.Server_via_.S.S.H%29">4.3.2<span class="hspace"> </span>Backup Server via SSH</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Least_.Priviledge_for_.Client_.S.S.H_.Key%29">4.3.3<span class="hspace"> </span>Least Privilege for Client SSH Key</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"></span></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._sec~3amirrors%29">5<span class="hspace"> </span>Configure Mirrors</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Least_.Priviledge_for_.Mirrors%29">5.1<span class="hspace"> </span>Least Privilege for Mirrors</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"></span></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._sec~3amonitor%29">6<span class="hspace"> </span>Monitor and Check Backups</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Check_.Backups_are_.Happening%29">6.1<span class="hspace"> </span>Check Backups are Happening</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Integrity_.Check_the_.Repository%29">6.2<span class="hspace"> </span>Integrity Check the Repository</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Prune_.Expired_.Snapshots%29">6.3<span class="hspace"> </span>Prune Expired Snapshots</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toclink" data-pltdoc="x" href="#%28part._.Finding_.Large_.Extraneous_.Files_in_the_.Repository%29">6.4<span class="hspace"> </span>Finding Large Extraneous Files in the Repository</a></p></td></tr>
<tr>
<td>
<p><span class="hspace"></span></p></td></tr>
<tr>
<td>
<p><span class="hspace"> </span><a class="toptoclink" data-pltdoc="x" href="#%28part._.Restore_from_.Backups%29">7<span class="hspace"> </span>Restore from Backups</a></p></td></tr></tbody></table></div>
<h1>1
<tt> </tt><a name="(part._sec~3aintro)"></a>Introduction</h1>
<p>This guide will help you set up a backup system that automatically records hourly
snapshots, compresses, deduplicates, and encrypts them, enabling a very robust
and secure backup system that takes up very little drive space.
For example, I have four machines backed up with 2.5TB of snapshots stored in 21GB of
space, mirrored on machines in multiple locations.
It would take an extraordinary event for me to lose data.
I’ve successfully recovered GBs of data after losses that usually resulted from
my own stupidity, and occasionally from various tools corrupting files or the
whole filesystem.</p>
<p>I describe two main configuration options: (1) client-only, which requires only a
single machine but relies on an external service for saving the backups
offsite; or (2) a client/server approach that requires access to an
always-on server but offers more redundancy.
Within these two main configurations, I describe additional configuration
measures, such as setting up offsite mirrors for the backup repository and
implementing the principle of least privilege to restrict remote access while
still automating backups.</p>
<p>At the end, you too will be able to (but probably shouldn’t) use <span class="stt">rm -rf</span>
without fear, among other benefits.</p>
<h1>2
<tt> </tt><a name="(part._sec~3aprereq)"></a>Install Prerequisite Software</h1>
<h2>2.1
<tt> </tt><a name="(part._.Backup_.Software)"></a>Backup Software</h2>
<p></p>
<div class="SIntrapara">The main backup software is <span class="stt">borg</span>.
</div>
<div class="SIntrapara">
<blockquote>
<p><a href="https://borgbackup.readthedocs.io/en/stable/index.html"><span class="url">https://borgbackup.readthedocs.io/en/stable/index.html</span></a></p></blockquote></div>
<p><span class="stt">borg</span> features automatic compression, deduplication, and encryption.
It also supports an on-demand backup server via SSH, useful file exclusion
methods, and filtering/recreating backup archives for when you realize you
backed up something that you didn’t need to and it’s taking up too much space.
These features, its superb documentation, and its ease of use have made it better
than every other tool I’ve tried.</p>
<p>Install this on the server and all clients.</p>
<p></p>
<div class="SIntrapara">For example, on Arch:
</div>
<div class="SIntrapara">
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">pacman -S borg</span></p></td></tr></tbody></table></div></div>
<p></p>
<div class="SIntrapara">Or macOS:
</div>
<div class="SIntrapara">
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">brew install borgbackup</span></p></td></tr></tbody></table></div></div>
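<p>After installing, it can help to confirm that the tools are actually on the <span class="stt">PATH</span> of the user that will run the backups. The following is a minimal sketch; the function name <span class="stt">require_cmd</span> is hypothetical and not part of <span class="stt">borg</span>.</p>

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "ok: <name>" or "missing: <name>"; returns non-zero when missing.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Example: check the tools this guide relies on.
# for tool in borg ssh; do require_cmd "$tool"; done
```

<p>Run the check as the same user that owns the cron job, since cron often uses a shorter <span class="stt">PATH</span> than an interactive shell.</p>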
<h2>2.2
<tt> </tt><a name="(part._.Optional_.G.U.I_for_.Client)"></a>Optional GUI for Client</h2>
<p></p>
<div class="SIntrapara"><span class="stt">borg</span> has an optional, third-party (still free software) GUI you can
install called <span class="stt">vorta</span>.
</div>
<div class="SIntrapara">
<blockquote>
<p><a href="https://vorta.borgbase.com/"><span class="url">https://vorta.borgbase.com/</span></a></p></blockquote></div>
<p>If you’re uncomfortable with commandline nonsense, you can use this on
the clients to configure most of what I describe below.
I haven’t used it myself, so you’ll need to figure out the translation from each
concept and my scripts to the equivalent in the GUI.
The GUI looks pretty discoverable, though, so this shouldn’t be hard.</p>
<h2>2.3
<tt> </tt><a name="(part._.Mirror_.Software)"></a>Mirror Software</h2>
<p>To make redundant mirrors of your backup repository offsite, you’ll need a tool
to synchronize the repository to the mirrors.
I own several machines, and treat all of them as mirrors for maximum redundancy
without relying on cloud services.</p>
<p></p>
<div class="SIntrapara">I recommend <span class="stt">rclone</span> for this, but alternatives like <span class="stt">rsync</span> or
<a href="https://github.com/bcpierce00/unison"><span class="stt">unison</span></a> work well too.
</div>
<div class="SIntrapara">
<blockquote>
<p><a href="https://rclone.org/"><span class="url">https://rclone.org/</span></a></p></blockquote></div>
<p><span class="stt">rclone</span> provides <span class="stt">rsync</span>-like capabilities, but also performs local
caching to speed up computing the delta to be transferred, and supports
various cloud storage backends, in case you want to sync to ~the cloud~.</p>
<p>Install this on all mirrors.</p>
<p></p>
<div class="SIntrapara">Arch:
</div>
<div class="SIntrapara">
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">pacman -S rclone</span></p></td></tr></tbody></table></div></div>
<p></p>
<div class="SIntrapara">macOS:
</div>
<div class="SIntrapara">
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">brew install rclone</span></p></td></tr></tbody></table></div></div>
<p>If you’re using a client-only configuration, you can also install this on the
client if you wish to synchronize the local repository to a cloud service or
secondary machine.
However, unless your cloud service features strong and easy to use version
control, I recommend installing <span class="stt">git</span> instead, as there are some downsides
to a client automatically synchronizing a local backup repository without
version control.
I discuss this in <a data-pltdoc="x" href="#%28part._sec~3ainit~3async-client-only%29">Mirror the Client-Only Repository Offsite</a>.</p>
<h1>3
<tt> </tt><a name="(part._sec~3ainit)"></a>Initialize the Backup Repository</h1>
<h2>3.1
<tt> </tt><a name="(part._.Setup_.Server_.Environment)"></a>Setup Server Environment</h2>
<p>For the client/server model, the backup server needs:</p>
<ol>
<li>
<p>A name or fixed IP address. I call this <span class="stt">backup-server.tld</span>.</p></li>
<li>
<p>An SSH daemon.</p></li>
<li>
<p>A user with SSH access, permission to execute <span class="stt">borg</span>, and shell access.
I’ll call this user <span class="stt">backupd</span>.</p></li>
<li>
<p>A folder this user owns to store the backup repository.
I call this folder <span class="stt">~/backups</span> (meaning <span class="stt">~backupd/backups</span>).</p></li></ol>
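<p>The requirements above could be met on a Linux server roughly as follows. This is a sketch under assumptions: it presumes <span class="stt">useradd</span> and <span class="stt">sudo</span> are available and that home directories live under <span class="stt">/home</span>; the function name <span class="stt">setup_backup_user</span> is hypothetical.</p>

```shell
#!/bin/sh
# Hypothetical helper: create the backup user and its repository folder.
# Assumes a Linux server with useradd and sudo; adjust for your system.
setup_backup_user() {
    user="${1:-backupd}"
    # Create the user with a home directory and a login shell.
    sudo useradd --create-home --shell /bin/sh "$user" || return 1
    # Create the folder that will hold the repository, owned by that user.
    sudo -u "$user" mkdir -p "/home/$user/backups"
}

# Example (run on the server):
# setup_backup_user backupd
```

<p>The SSH daemon and the server's name or address still need to be configured separately.</p>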
<h2>3.2
<tt> </tt><a name="(part._.Setup_.Client-.Only_.Environment)"></a>Setup Client-Only Environment</h2>
<p>For the client-only model, you only need a folder that the client has read/write
access to.
I’ll call this folder <span class="stt">~/backups</span>, and call the client user <span class="stt">client-user</span>.</p>
<h2>3.3
<tt> </tt><a name="(part._.Create_the_.Encrypted_.Repository)"></a>Create the Encrypted Repository</h2>
<p>Next we need to initialize the backup repository with an encryption key.
The backup repository is encrypted at rest.</p>
<p>Run the following command.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">borg init -e repokey ~/backups</span></p></td></tr></tbody></table></div>
<p>You’ll be prompted for a password.</p>
<p>I strongly recommend storing the password in a password manager.
<span class="stt">borg</span> can automatically read from the password manager using the environment
variable <span class="stt">BORG_PASSCOMMAND</span>.
For example, I use <a href="https://www.passwordstore.org/"><span class="stt">pass</span></a> as
my password manager, and set <span class="stt">BORG_PASSCOMMAND="pass show
backup-server.tld/borg"</span>, which in turn causes <span class="stt">gpg-agent</span> to query me or
my login keychain for the master password.</p>
<p>You can also set the password as a string in the environment variable
<span class="stt">BORG_PASSPHRASE</span>.
For example, if your password is "password", you can set
<span class="stt">BORG_PASSPHRASE="password"</span>.
You should not do this if the environment variable is stored in a plaintext
file.</p>
<p>There are several other initialization options which you can explore if you want
to customize encryption levels, disable encryption (don’t do it!), or optimize
for hardware acceleration, but I’m happy with the default.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">borg init --help</span></p></td></tr></tbody></table></div>
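<p>Since the repository is created once and then reused, a setup script can guard the <span class="stt">borg init</span> call so that re-running it is harmless. The sketch below assumes that an initialized repository contains a file named <span class="stt">config</span>, which it uses as a marker; the function name <span class="stt">init_repo_once</span> is hypothetical.</p>

```shell
#!/bin/sh
# Initialize the repository only if it does not exist yet.
# Assumption: an initialized borg repository contains a "config" file.
init_repo_once() {
    repo="$1"
    if [ -f "$repo/config" ]; then
        echo "repository already initialized: $repo"
        return 0
    fi
    borg init -e repokey "$repo"
}

# Example:
# init_repo_once ~/backups
```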
<h2>3.4
<tt> </tt><a name="(part._sec~3ainit~3async-client-only)"></a>Mirror the Client-Only Repository Offsite</h2>
<p>If you do not have a backup server, we need to set up at least one mirror.
We need to make sure the local backup repository is stored somewhere
else in the event of a total data loss locally (<span class="emph">e.g.,</span> a stolen laptop),
or a partial data loss that affects the backup repository itself (<span class="emph">e.g.,</span> a
corrupted drive).</p>
<p>Bad solutions include using a file synchronization service such as Dropbox,
Google Drive, or OneDrive as a mirror; or automatically synchronizing via rsync,
unison, or rclone to a secondary machine.
In the event of data loss, an automatic synchronization service could
overwrite the remote copy with a completely empty backup repository, totally
destroying your backups.
Some file-sync services will allow you to restore older versions of a file,
which mitigates some of this risk.
This is not a good solution unless you’re really sure of the version control.</p>
<p>An acceptable solution is to use a version-controlled file hosting service like
GitHub or GitLab to host your backup repository.
You can set up a cron job to automatically commit and push the backup repository
regularly, tagging each commit in the same way as the archives are tagged.
Ideally, the repository should be private, but since it’s encrypted, this is not
strictly required.
This exposes your data to more risk, as with sufficient resources, a dedicated
attacker (such as a corporation or government) could break the encryption.
However, such attackers probably aren’t targeting you, and if they are, you
might have bigger problems.</p>
<p>To use my suggested method, first make <span class="stt">~/backups</span> a git repo.
Run the following commands.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">cd ~/backups</span></p></td></tr>
<tr>
<td>
<p><span class="stt">git init</span></p></td></tr>
<tr>
<td>
<p><span class="stt">git checkout -b main</span></p></td></tr>
<tr>
<td>
<p><span class="stt">git add -A</span></p></td></tr>
<tr>
<td>
<p><span class="stt">git commit -m "Initialize repo"</span></p></td></tr></tbody></table></div>
<p>Next, add the remote repository:</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">git remote add -m main origin git@git-repo.tld:client-user/backup-repo.git</span></p></td></tr></tbody></table></div>
<p>Now add a cron job.
Run <span class="stt">crontab -e</span> and add the following line.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">@hourly /home/client-user/bin/sync-local-borg-repo.sh</span></p></td></tr></tbody></table></div>
<p>Finally, install the following script in <span class="stt">~/bin/</span> for the client:</p>
<p></p>
<div class="SIntrapara"><a href="//resources/@|filename|">sync-local-borg-repo.sh</a></div>
<div class="SIntrapara">
<div class="brush: shell">
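<pre><code>#!/bin/sh
# Commit the local borg repository and push it to the git remote.
cd ~/backups || exit 1
git add -A
git commit --fixup HEAD
# Tag the commit like a borg archive: hostname+timestamp.
git tag "$(hostname)+$(date +"%Y-%m-%dT%H_%M_%S")"
git push origin main</code></pre></div></div>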
<p>And make it executable: <span class="stt">chmod +x ~/bin/sync-local-borg-repo.sh</span>.</p>
<p>This method will use considerable disk space on the client; in the
client/server configuration, that space is split between the client and server.
I recommend you regularly prune the git repo, but only do so manually after
checking your backups (see <a data-pltdoc="x" href="#%28part._sec~3amonitor%29">Monitor and Check Backups</a>).
Setting up an automatic job to prune it risks deleting your backup repository in
the event of a data loss.
The commit option <span class="stt">--fixup HEAD</span> in the script above makes this easy with the
following commands:</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">env EDITOR=true git rebase --root --autosquash -i</span></p></td></tr>
<tr>
<td>
<p><span class="stt">git gc</span></p></td></tr>
<tr>
<td>
<p><span class="stt">git push -f origin main</span></p></td></tr></tbody></table></div>
<p>This will squash the entire history of the repo and force push to the remote.
Losing the history is not a big deal, since the backup repository is actually
keeping hourly snapshots.
The git history is only for preventing synchronization from losing data if an
automatic push happens after a data loss.</p>
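<p>As an extra safeguard, the sync step could refuse to run when the local repository looks like it has been wiped. The following is only a sketch: it assumes that a healthy repository contains the <span class="stt">config</span> file and <span class="stt">data</span> directory that <span class="stt">borg init</span> creates, the function names are hypothetical, and the tagging step is omitted for brevity.</p>

```shell
#!/bin/sh
# Succeed only if the directory still looks like a borg repository.
# This is a heuristic (config file plus data directory), not an
# integrity check.
repo_looks_intact() {
    [ -f "$1/config" ] && [ -d "$1/data" ]
}

# Commit and push only when the repository appears to be present.
guarded_sync() {
    repo="$1"
    if ! repo_looks_intact "$repo"; then
        echo "refusing to sync: $repo does not look like a borg repository" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$repo" || exit 1
        git add -A
        git commit --fixup HEAD
        git push origin main
    )
}

# Example:
# guarded_sync ~/backups
```

<p>This does not replace the git history as protection; it only avoids pushing an obviously empty repository.</p>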
<h1>4
<tt> </tt><a name="(part._sec~3aconfig-client)"></a>Configure the Backup Client</h1>
<p>Each backup client needs:</p>
<ol>
<li>
<p>A user with read access to all files included in the backup.
I call this user <span class="stt">client-user</span>.
For me, this is my username on the client machine.
In some circumstances, I create a group, <span class="stt">backupg</span>, to give this user read
access to special files.</p></li>
<li>
<p>A cron daemon of some kind.</p></li></ol>
<p>To start the backup system, we need to add a script that automatically backs
up files, and exclude any extraneous files.
I take the approach of including everything by default, and then manually
inspecting archives from time to time for large extraneous files and folders.</p>
<h2>4.1
<tt> </tt><a name="(part._.Install_.Backup_.Script)"></a>Install Backup Script</h2>
<p>I use the following script, which I set to run every hour.
Add the following cron job to <span class="stt">client-user</span>’s crontab by running
<span class="stt">crontab -e</span>, and adding:</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">@hourly /home/client-user/bin/borg-backup.sh</span></p></td></tr></tbody></table></div>
<p>Then install the following script in <span class="stt">~/bin/</span> for <span class="stt">client-user</span>.</p>
<p></p>
<div class="SIntrapara"><a href="//resources/@|filename|">borg-backup.sh</a></div>
<div class="SIntrapara">
<div class="brush: shell">
<pre><code>#!/bin/sh
## borg-backup.sh
## Usage:
# run `borg-backup.sh`
#
# Optional environment variable inputs:
# - TAG By default, the tag for the archive is set using the hostname of the
# client machine. To manually set a tag, set the environment variable
# `TAG` prior to running, e.g., `env TAG="manual-tag+"
# borg-backup.sh`.
# - WAIT The wait time in seconds to obtain a write lock on the repository from
# the server. By default, 600 seconds (10 minutes).
## Configuration
# Set to the location of the backup repository.
# Can be a remote directory, using SSH, or a local directory.
# Make sure the SSH agent and/or SSH key is readable by the backup daemon,
# and the remote location is accessible by a key in the ssh-agent or configured
# in .ssh/config.
#
# Example: REPO="backupd@backup-server.tld:backups"
# Example: REPO="$HOME/backups"
REPO="borg-server:backups"
# Set the password or passcommand for encrypted repositories.
export BORG_PASSCOMMAND='pass show backup-server.tld/borg'
## Create auxiliary files to be part of the backup.
# Export the installed package list from the package manager, so it can be backed up.
mkdir -p /tmp/pacman-local/
echo "# Pipe to pakku -S to reinstall" > /tmp/pacman-local/pacman.lst
pacman -Qenq >> /tmp/pacman-local/pacman.lst
pacman -Qemq >> /tmp/pacman-local/pacman.lst
## Create a new backup archive.
# Add additional files to backup as needed.
borg create \
-C lzma,9 \
-c 60 \
--exclude-from ~/borg-exclude \
--exclude-if-present '.borg-ignore' \
--lock-wait ${WAIT:-600} \
$REPO::'{hostname}+'${TAG:-}'{now:%Y-%m-%dT%H:%M:%S}' \
/tmp/pacman-local/ \
/etc/sysctl.d \
/etc/modprobe.d \
/etc/makepkg.conf \
/etc/pacman.conf \
/etc/fstab \
/etc/X11 \
~/</code></pre></div></div>
<p>Make it executable with <span class="stt">chmod +x ~/bin/borg-backup.sh</span>.</p>
<p>There are two necessary configuration steps:</p>
<ul>
<li>
<p>Change the <span class="stt">REPO</span> variable to point to your backup repository.
If you’re using a client-only model, this is the path to the backup
repository <span class="stt">~/backups</span>.
If you’re using a server, you can enter the SSH address and path, or configure
the <span class="stt">.ssh/config</span> file as discussed later.</p></li>
<li>
<p>Change the <span class="stt">export BORG_PASSCOMMAND</span> to export your password manager
command, or change the line to <span class="stt">export BORG_PASSPHRASE</span> to export the
password string as described earlier.
You really shouldn’t use <span class="stt">BORG_PASSPHRASE</span> since this stores the password
in plaintext, but I suppose if your hard drive is encrypted, and the backup
script is only stored on the client, it’s probably fine. Ish.</p></li></ul>
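<p>Before relying on the cron job, it is worth checking that the password command actually yields a password in a non-interactive shell. The sketch below is an assumption-laden approximation: it runs the command through <span class="stt">sh</span>, which should behave the same as <span class="stt">borg</span> for simple commands such as <span class="stt">pass show ...</span>, and the function name <span class="stt">passcommand_works</span> is hypothetical.</p>

```shell
#!/bin/sh
# Succeed only if BORG_PASSCOMMAND is set and prints a non-empty password.
# The output is captured and discarded, so the password is never printed.
passcommand_works() {
    [ -n "${BORG_PASSCOMMAND:-}" ] || return 1
    [ -n "$(sh -c "$BORG_PASSCOMMAND" 2>/dev/null)" ]
}

# Example, placed before the borg create call in borg-backup.sh:
# passcommand_works || { echo "cannot read backup password" >&2; exit 1; }
```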
<p>You’ll probably also want to change the list of files that are included in the
snapshot.
I include my list for reference, which assumes an Arch Linux machine and
includes some of my customized root config files.</p>
<p></p>
<div class="SIntrapara">The script is documented with its major features, but I’ll explain the
<span class="stt">borg</span> command in more detail.
</div>
<div class="SIntrapara">
<ul>
<li>
<p>The option <span class="stt">-C lzma,9</span> enables LZMA compression level 9 (maximum
compression).
This slows down archive creation but decreases the archive size substantially.
In my experience, my snapshots take about a minute to create and upload to
the server, so I’m fine with max compression.</p></li>
<li>
<p>The option <span class="stt">-c 60</span> tells <span class="stt">borg</span> to create a checkpoint every 60
seconds, saving a partial backup if the backup process is interrupted.
This can happen if you’re running on a laptop that goes to sleep in the
middle of the backup, for example.
I choose 60 seconds since most of my snapshots only take that long, so any
longer might indicate a real change to keep track of.</p></li>
<li>
<p>The option <span class="stt">--exclude-from ~/borg-exclude</span> excludes any files that match
the pattern specification found in the file <span class="stt">~/borg-exclude</span>.
I use this file to filter common files, such as compiler generated files.
I share this file in <a data-pltdoc="x" href="#%28part._sec~3aconfig-client~3aexclude%29">Exclude Extraneous Files From Backup</a>.</p></li>
<li>
<p>The option <span class="stt">--exclude-if-present '.borg-ignore'</span> excludes the directory
from the backup if there is a file named <span class="stt">.borg-ignore</span> in that directory.
I use this for excluding directories that don’t neatly fit some pattern in
<span class="stt">borg-exclude</span>, such as large git repos that I contribute to infrequently but
don’t manage, or cache or temporary directories.</p></li>
<li>
<p>The option <span class="stt">--lock-wait</span> specifies how long to wait for a lock.
Only one client can write to the backup repository at a time.
I use 10 minutes as a default; my clients usually only take a minute or so to
finish running a backup, so waiting 10 minutes should be enough for all clients
to finish if there’s contention.</p></li>
<li>
<p>The line <span class="stt">$REPO::'{hostname}+' ...</span> tells <span class="stt">borg</span> where the backup
repository is located (before the <span class="stt">::</span>), and what the backup archive should be
named.
I name the archive using the hostname of the client, followed by <span class="stt">+</span> as a
delimiter, followed optionally by some tag, followed by a timestamp.
This naming scheme makes it easy to sort and filter backups when validating
backups or searching for a restore point.</p></li>
<li>
<p>The remaining lines are files or directories to include in the backup archive.
All files and sub-directories, recursively, are included, unless excluded by
one of the above exclude options.</p></li></ul></div>
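<p>To make the archive naming scheme concrete, here is a small sketch that builds a name in the same shape as the script (hostname, <span class="stt">+</span>, optional tag, timestamp) and recovers the hostname from it. Both function names are hypothetical, and the sample values are made up.</p>

```shell
#!/bin/sh
# Build an archive name: hostname, "+", optional tag, then a timestamp.
archive_name() {
    host="$1"; tag="$2"; stamp="$3"
    printf '%s+%s%s\n' "$host" "$tag" "$stamp"
}

# Extract the hostname part: everything before the first "+".
archive_host() {
    printf '%s\n' "${1%%+*}"
}

# Example:
# archive_name laptop "" 2020-06-30T20:00:00
# archive_name laptop "manual-tag+" 2020-06-30T20:00:00
```

<p>Because every name starts with the hostname and the delimiter, a glob such as <span class="stt">laptop+*</span> selects all archives from one client; in <span class="stt">borg</span> 1.x this can be passed to <span class="stt">borg list</span> with the <span class="stt">--glob-archives</span> option.</p>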
<h2>4.2
<tt> </tt><a name="(part._sec~3aconfig-client~3aexclude)"></a>Exclude Extraneous Files From Backup</h2>
<p>My <span class="stt">~/borg-exclude</span> file is below.
Install this file in <span class="stt">~/</span> on the client; it only needs read permissions for
<span class="stt">client-user</span>.</p>
<p></p>
<div class="SIntrapara"><a href="//resources/@|filename|">borg-exclude</a></div>
<div class="SIntrapara">
<div class="brush: shell">
<pre><code>re:/\.ssh
re:/\.bash_history
.zsh_*
re:/no-backup/
re:/\.junk/
re:/\.cron/
re:workspace/aur4/.*/pkg
re:workspace/aur4/.*/src
re:compiled/
*.tar.xz
*.tar.gz
*/.emacs.d
*/.unison/fp*
*/.unison/ar*
*/.vim/bundle
*~
.*.trash
*.aux
*.log
*.out
*.toc
*.fls
*.swp
*.class
*.pyc
*.fdb_latexmk
*.o
*.out
*.xpi
*.zo
*.dep
*.vo
*.glob
*.bbl
*.safe
*.agdai
*.hi
*.tdo
re:\.mutt/cache
re:\.mutt/sent
re:workspace/.*/paper.pdf
re:workspace/.*/techrpt.pdf
re:workspace/.*/final.pdf
*/retex-cache/*
re:\.gnupg/S\..*
re:\.~lock.*\.odp#
y
re:/Pictures/.*/\._
re:/Pictures/.*/\.comments
*.DS_Store</code></pre></div></div>
<p>This configuration file accepts exclude patterns, one per line.
Each exclude pattern can be either a shell glob or regexp pattern prefixed by
<span class="stt">re:</span>.
I exclude lots of generated-file patterns, certain mail folders, and files or
folders that are tracked by other systems.
Some depend on my workflows and naming conventions, so they might not be
relevant to you.</p>
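<p>One way to check what the patterns actually match is a dry run. The sketch below assumes <span class="stt">borg</span> 1.x, where <span class="stt">borg create --dry-run --list</span> reports excluded items with an <span class="stt">x</span> status on standard error; the function name <span class="stt">list_excluded</span> and the archive name <span class="stt">exclude-preview</span> are hypothetical, and no archive is written.</p>

```shell
#!/bin/sh
# List the paths borg would exclude, without creating an archive.
# Assumption: excluded items appear in the listing as "x <path>",
# and the listing is written to stderr.
list_excluded() {
    repo="$1"; shift
    borg create --dry-run --list \
        --exclude-from "$HOME/borg-exclude" \
        --exclude-if-present '.borg-ignore' \
        "$repo::exclude-preview" "$@" 2>&1 | grep '^x '
}

# Example:
# list_excluded borg-server:backups ~/
```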
<p>If I want to exclude some folder that doesn’t neatly fit a pattern, I run
<span class="stt">touch path/to/folder/.borg-ignore</span>, and <span class="stt">borg</span> will automatically
begin ignoring it due to the <span class="stt">--exclude-if-present</span> option in
<span class="stt">borg-backup.sh</span>.</p>
<p>Be sure to run <span class="stt">touch ~/backups/.borg-ignore</span>.
This will prevent you from DOSing yourself if either you use a client-only
configuration, or if your clients are also mirrors.</p>
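<p>If you mark many folders this way, a pair of small helpers makes it easier to see what is being skipped. This is a sketch with hypothetical function names; it only wraps <span class="stt">touch</span> and <span class="stt">find</span>.</p>

```shell
#!/bin/sh
# Mark each given directory so the backup script skips it.
borg_ignore() {
    for dir in "$@"; do
        touch "$dir/.borg-ignore" || return 1
    done
}

# List every directory under a root that carries the marker file.
list_ignored() {
    find "$1" -name .borg-ignore -type f | while read -r marker; do
        dirname "$marker"
    done | sort
}

# Example:
# borg_ignore ~/backups ~/Downloads
# list_ignored ~
```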
<h2>4.3
<tt> </tt><a name="(part._.Configure_.Access_to_the_.Backup_.Repository)"></a>Configure Access to the Backup Repository</h2>
<p>Finally, we need to make sure the backup script has uninterrupted access to the
backup repository.</p>
<h3>4.3.1
<tt> </tt><a name="(part._.Client-only_.Repository_.Folder)"></a>Client-only Repository Folder</h3>
<p>If you’re using a client-only configuration, you’re done!</p>
<h3>4.3.2
<tt> </tt><a name="(part._.Backup_.Server_via_.S.S.H)"></a>Backup Server via SSH</h3>
<p>If you’re running a separate server, we’ll configure SSH access.
Ideally, to keep backups running uninterrupted, we don’t even want to be
prompted for an SSH key password.
(Although, I do deal with this on one of my clients, because I haven’t
configured the keychain to cache the SSH key while logged in.)</p>
<p>I recommend configuring access through the <span class="stt">.ssh/config</span> file, and either a
keychain that caches your SSH key that you use everywhere (probably acceptable
security), or a fresh passwordless SSH key that provides <span class="stt">client-user</span>
restricted access to <span class="stt">borg</span> as the <span class="stt">backupd</span> user on
<span class="stt">backup-server.tld</span> (better practice security).</p>
<p>I’ll assume you have a fresh passwordless private key called
<span class="stt">~/.ssh/id_rsa-borg-client</span> paired with the public key
<span class="stt">~/.ssh/id_rsa-borg-client.pub</span> on the client machines.
You can generate a fresh passwordless key-pair with:</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">ssh-keygen -t rsa -b 4096 -C "borg client" -f /home/client-user/.ssh/id_rsa-borg-client -P ""</span></p></td></tr></tbody></table></div>
<p>Make sure to set the permissions correctly, restricting access to the private key.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">chmod 600 ~/.ssh/id_rsa-borg-client</span></p></td></tr></tbody></table></div>
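<p>To double-check the result, the permission string reported by <span class="stt">ls</span> can be compared against the expected mode. This is a minimal sketch with a hypothetical function name, <span class="stt">key_is_private</span>.</p>

```shell
#!/bin/sh
# Succeed only if the file is a regular file with mode 600, that is,
# readable and writable by its owner and by no one else.
key_is_private() {
    perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    [ "$perms" = "-rw-------" ]
}

# Example:
# key_is_private ~/.ssh/id_rsa-borg-client || echo "fix key permissions"
```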
<p>Add the following snippet to your <span class="stt">.ssh/config</span>, and the
<span class="stt">borg-backup.sh</span> will automatically use the SSH key
<span class="stt">~/.ssh/id_rsa-borg-client</span> on the client machine when connecting as
<span class="stt">backupd</span> to the <span class="stt">backup-server.tld</span>.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">Host borg-server</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">Hostname backup-server.tld</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">IdentityFile ~/.ssh/id_rsa-borg-client</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">User backupd</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">ForwardAgent no</span></p></td></tr></tbody></table></div>
<h3>4.3.3
<tt> </tt><a name="(part._.Least_.Priviledge_for_.Client_.S.S.H_.Key)"></a>Least Privilege for Client SSH Key</h3>
<p>If you want to follow better practice security, you should restrict access for
the <span class="stt">id_rsa-borg-client</span> key so it has only the permission it needs: to
communicate with the <span class="stt">borg</span> server.
Add the following line to <span class="stt">~/.ssh/authorized_keys</span> for <span class="stt">backupd</span> on
the server, replacing <span class="stt">&lt;id_rsa-borg-client.pub&gt;</span> with the contents of the
public key <span class="stt">~/.ssh/id_rsa-borg-client.pub</span> from the client.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">command="/home/backupd/.ssh/ssh-borg-serve.sh",no-pty,no-agent-forwarding,no-port-forwarding &lt;id_rsa-borg-client.pub&gt;</span></p></td></tr></tbody></table></div>
<p>Next, install the following file in <span class="stt">~/.ssh/</span> on the server and give it
execute permissions with <span class="stt">chmod +x ~/.ssh/ssh-borg-serve.sh</span>.</p>
<p></p>
<div class="SIntrapara"><a href="//resources/@|filename|">ssh-borg-serve.sh</a></div>
<div class="SIntrapara">
<div class="brush: shell">
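<pre><code>#!/bin/sh
# Disable globbing so wildcards in the client's command are not expanded.
set -f
case "$SSH_ORIGINAL_COMMAND" in
    "borg serve"*)
        # Left unquoted on purpose so the command is split into arguments.
        exec $SSH_ORIGINAL_COMMAND
        ;;
    # "/usr/lib/ssh/sftp-server")
    #     exec /usr/lib/ssh/sftp-server -R
    #     ;;
    *)
        echo "Invalid command $SSH_ORIGINAL_COMMAND"
        exit 1
        ;;
esac</code></pre></div></div>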
<p>This will allow the key <span class="stt">id_rsa-borg-client</span> to run <span class="emph">only</span> a command
starting with <span class="stt">borg serve</span>, which launches the <span class="stt">borg</span> server.
If an attacker gets your <span class="stt">id_rsa-borg-client</span> key, they can launch the
<span class="stt">borg</span> server, but without the backup repository password, they won’t be
able to read anything in the repository.</p>
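<p>To see which commands the wrapper would accept without touching a live server, the dispatch logic can be sketched as a standalone function.
This is only a minimal sketch: <span class="stt">classify_command</span> is a hypothetical name that is not part of the scripts in this guide, and it prints the decision instead of running anything.</p>

```shell
#!/bin/sh
# Hypothetical helper (not one of the scripts in this guide): mirrors the
# case statement in ssh-borg-serve.sh, but prints the decision instead of
# exec-ing the command.
classify_command() {
    case "$1" in
        "borg serve"*) echo "allow" ;;
        *)             echo "deny" ;;
    esac
}

classify_command "borg serve --umask=077"   # prints: allow
classify_command "rm -rf /home/backupd"     # prints: deny
classify_command ""                         # prints: deny
```

<p>The empty command corresponds to an interactive login attempt, where <span class="stt">SSH_ORIGINAL_COMMAND</span> is not set, so the wrapper rejects that too.</p>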
<p>The second, commented out, command would enable the client to launch a read-only
SFTP server.
This is useful for making all clients mirrors.
However, allowing the client key to also use the SFTP server violates the
principle of least privilege, and you should instead configure a separate mirror
key as described in <a data-pltdoc="x" href="#%28part._sec~3amirrors%29">Configure Mirrors</a>.
An attacker with SFTP access would be able to download the encrypted repository,
and possibly read other files on the server.</p>
<h1>5
<tt> </tt><a name="(part._sec~3amirrors)"></a>Configure Mirrors</h1>
<p>Having backups stored offsite is good, but what if the server goes down, or
is struck by a meteor?
It’s best to have not only offsite backups, but redundant offsite backups.
Thankfully, this is easy to support.
Particularly, if you, like me, have too many computers: a laptop, a desktop, a
media server, a VPS, and a work computer... mirrors galore!</p>
<p>On each mirror, we configure <span class="stt">rclone</span> with the server as a remote.
Add the following to <span class="stt">~/.config/rclone/rclone.conf</span> on the mirror.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">[borg-server]</span></p></td></tr>
<tr>
<td>
<p><span class="stt">type = sftp</span></p></td></tr>
<tr>
<td>
<p><span class="stt">host = backup-server.tld</span></p></td></tr>
<tr>
<td>
<p><span class="stt">user = backupd</span></p></td></tr>
<tr>
<td>
<p><span class="stt">port =</span></p></td></tr>
<tr>
<td>
<p><span class="stt">pass =</span></p></td></tr>
<tr>
<td>
<p><span class="stt">key_file = id_rsa-borg-mirror</span></p></td></tr>
<tr>
<td>
<p><span class="stt">md5sum_command = md5sum</span></p></td></tr>
<tr>
<td>
<p><span class="stt">sha1sum_command = sha1sum</span></p></td></tr></tbody></table></div>
<p>This tells <span class="stt">rclone</span> how to connect to the server via SFTP.
Following the principle of least privilege, we’ll need a new key pair for the
mirror.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">ssh-keygen -t rsa -b 4096 -C "borg mirror" -f /home/client-user/.ssh/id_rsa-borg-mirror -P ""</span></p></td></tr>
<tr>
<td>
<p><span class="stt">chmod 600 ~/.ssh/id_rsa-borg-mirror</span></p></td></tr></tbody></table></div>
<p>We also need to install and restrict the key on the server.
Add the following line to the <span class="stt">~/.ssh/authorized_keys</span> file for <span class="stt">backupd</span> on the server.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">command="/home/backupd/.ssh/ssh-borg-mirror.sh",no-pty,no-agent-forwarding,no-port-forwarding &lt;id_rsa-borg-mirror.pub&gt;</span></p></td></tr></tbody></table></div>
<p>Next, install the following file in <span class="stt">~/.ssh/</span> on the server and give it
execute permissions with <span class="stt">chmod +x ~/.ssh/ssh-borg-mirror.sh</span>.</p>
<p></p>
<div class="SIntrapara"><a href="//resources/@|filename|">ssh-borg-mirror.sh</a></div>
<div class="SIntrapara">
<div class="brush: shell">
<pre><code>#!/bin/sh
set -f
case "$SSH_ORIGINAL_COMMAND" in
    "/usr/lib/ssh/sftp-server")
        exec /usr/lib/ssh/sftp-server -R
        ;;
    *)
        echo "Invalid command $SSH_ORIGINAL_COMMAND"
        exit 1
        ;;
esac</code></pre></div></div>
<p>This restricts the mirror’s key so it can only be used to launch the SFTP server
in read-only mode.</p>
<p>Finally, set up a cron job to mirror the repository.
Run <span class="stt">crontab -e</span> on the mirror and enter:</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">@hourly rclone sync borg-server:backups ~/backups</span></p></td></tr></tbody></table></div>
<p><span class="stt">rclone</span> will perform a one-way sync from the server to the mirror every
hour.
<span class="stt">rclone</span> only transfers files that have changed since the last sync, so
after the initial copy each sync uses little bandwidth, much like <span class="stt">rsync</span>.
It also supports more backends than <span class="stt">rsync</span>, so you can set up additional
mirrors to cloud services like Dropbox, Google Drive, etc., if you want.</p>
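<p>One thing an hourly cron job does not guard against is a sync that runs longer than an hour, such as the first full copy, overlapping with the next one.
The following is a minimal sketch of one way to skip a run while an earlier one is still going; <span class="stt">run_locked</span> and the lock path are hypothetical names introduced for this example, not part of my setup.</p>

```shell
#!/bin/sh
# Sketch: run a command only if no earlier run still holds the lock.
# run_locked and the lock path are names made up for this example.
run_locked() {
    lock=$1
    shift
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still holds $lock; skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lock"
    return "$status"
}

# A script called from cron could then end with (commented out here so
# that sourcing this file does not start a sync):
# run_locked "$HOME/.borg-mirror.lock" rclone sync borg-server:backups "$HOME/backups"
```

<p>If the machine reboots in the middle of a sync, the lock directory is left behind and has to be removed by hand before the next run will proceed.</p>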
<p>Now when a meteor strikes your server just after a burglar stole your laptop,
you’ll still have your data.
Set up LOTS of mirrors for extra redundancy.</p>
<h2>5.1
<tt> </tt><a name="(part._.Least_.Priviledge_for_.Mirrors)"></a>Least Privilege for Mirrors</h2>
<p>I know it seems like we already did this with the whole read-only SFTP server,
but that’s not enough.
Right now, an attacker compromising the mirror key can read <span class="emph">any</span> file that
<span class="stt">backupd</span> has access to.
That’s no good.
Better security practice would be to configure the SSH daemon to <span class="stt">chroot</span> the
mirror to the <span class="stt">~/backups</span> directory, so they can only read this folder.
Recall this folder is encrypted, so an attacker compromising the mirror SSH key
still has to break the encryption to get anything.</p>
<p>Unfortunately, this requires root access on the server, reconfiguring the SSH
daemon, and creating and managing multiple user and group permissions, which you
may be unable or unwilling to do.</p>
<p>To <span class="stt">chroot</span> the mirror, we need a second user on the server, which I’ll call
<span class="stt">mirrord</span>.
The <span class="stt">ssh-borg-mirror.sh</span> script and addition to <span class="stt">authorized_keys</span> we
added to <span class="stt">backupd</span> above should be thrown out, as we require a different
configuration to <span class="stt">chroot</span>.</p>
<p>Next, we need a new group, <span class="stt">mirrorg</span>, to provide <span class="stt">mirrord</span> read access
to the directory <span class="stt">~backupd/backups</span>, owned by <span class="stt">backupd</span>.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">groupadd mirrorg</span></p></td></tr>
<tr>
<td>
<p><span class="stt">gpasswd -a mirrord mirrorg</span></p></td></tr></tbody></table></div>
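<p>Now we set the group on <span class="stt">~/backups</span> to <span class="stt">mirrorg</span>, and provide the
group read access.
As <span class="stt">root</span> (or as <span class="stt">backupd</span>, if <span class="stt">backupd</span> is also a member of <span class="stt">mirrorg</span>), run the following commands.</p>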
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">chgrp -R mirrorg ~backupd/backups</span></p></td></tr>
<tr>
<td>
<p><span class="stt">chmod -R g+rX ~backupd/backups</span></p></td></tr></tbody></table></div>
<p>We need to modify the <span class="stt">ssh-borg-serve.sh</span> script (owned by <span class="stt">backupd</span>)
to maintain the group-read permission.
Change the file using the following diff.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">- exec $SSH_ORIGINAL_COMMAND</span></p></td></tr>
<tr>
<td>
<p><span class="stt">+ exec borg serve --umask=027</span></p></td></tr></tbody></table></div>
<p>This will force the <span class="stt">borg</span> server to provide read permissions to
<span class="stt">mirrorg</span> when writing to the backup repository.</p>
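<p>To confirm that the permissions really are what the mirror needs, you can list anything in the repository that the group cannot read.
This is a minimal sketch; <span class="stt">not_group_readable</span> is a hypothetical helper name, and it assumes the group needs read access on files plus read and search access on directories.</p>

```shell
#!/bin/sh
# Sketch: print files that lack group-read, and directories that lack
# group-read or group-search, below the directory given as $1.
# The function name is made up for this example.
not_group_readable() {
    find "$1" \( \( -type f ! -perm -040 \) -o \( -type d ! -perm -050 \) \) -print
}

# Example, run on the server (commented out so sourcing has no effect):
# not_group_readable ~backupd/backups
```

<p>No output means every file and directory is readable by the group; anything listed would be invisible to <span class="stt">mirrord</span>.</p>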
<p>Now, modify the SSH daemon to <span class="stt">chroot</span> the <span class="stt">mirrord</span> user.
As <span class="stt">root</span> on the server, add the following to <span class="stt">/etc/ssh/sshd_config</span>.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">Match User mirrord</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">ChrootDirectory /home/backupd/backups</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">ForceCommand internal-sftp -R</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">AllowTcpForwarding no</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">X11Forwarding no</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">PasswordAuthentication no</span></p></td></tr></tbody></table></div>
<p>Finally, add the following line to <span class="stt">~/.ssh/authorized_keys</span> for <span class="stt">mirrord</span>.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">&lt;id_rsa-borg-mirror.pub&gt;</span></p></td></tr></tbody></table></div>
<p>Note that we do not require any restrictions, since the SSH daemon is already
restricting <span class="stt">mirrord</span>.</p>
<p>Now you have a pretty secure mirror.</p>
<h1>6
<tt> </tt><a name="(part._sec~3amonitor)"></a>Monitor and Check Backups</h1>
<h2>6.1
<tt> </tt><a name="(part._.Check_.Backups_are_.Happening)"></a>Check Backups are Happening</h2>
<p>Backups are no good if you can’t restore from them.
I have a weekly reminder to check on my backups.
To check, I run <span class="stt">borg list -P machine-name+</span> on the repository machine
(server, or client-only), which lists the backups for the machine with
<span class="stt">hostname</span> "machine-name".
I check to see that hourly backups are being created for each client.
If they aren’t, the daemon on that client may not be working for some reason.</p>
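<p>When there are many archives, only the most recent one matters for this check.
The following is a minimal sketch that picks it out of the listing; <span class="stt">latest_archive</span> is a hypothetical helper name, and it assumes <span class="stt">borg list</span> prints archives oldest first with the archive name as the first space-separated field.</p>

```shell
#!/bin/sh
# Sketch: read `borg list` output on stdin and print the name of the
# last archive listed. The function name is made up for this example.
latest_archive() {
    tail -n 1 | cut -f 1 -d ' '
}

# Example, run on the repository machine (commented out so that sourcing
# this file does not call borg):
# borg list -P machine-name+ ~/backups | latest_archive
```

<p>If the name printed is more than an hour or two old, that client deserves a closer look.</p>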
<h2>6.2
<tt> </tt><a name="(part._.Integrity_.Check_the_.Repository)"></a>Integrity Check the Repository</h2>
<p>Every month or so, I run <span class="stt">borg check ~/backups</span>.
This runs some integrity checks on the whole repository, and can take a while.
I recommend running it in a <span class="stt">screen</span> session so you can disconnect and
check back on it later.
I’ve never had any integrity problems.</p>
<h2>6.3
<tt> </tt><a name="(part._.Prune_.Expired_.Snapshots)"></a>Prune Expired Snapshots</h2>
<p>I don’t want to keep hourly snapshots forever.
I have a policy for expiring backups, and a script for doing it.
I keep hourly snapshots for the last 24 hours, daily snapshots for the last
week, weekly snapshots for the last month, and monthly snapshots forever.
With deduplication and my workload, this strikes a good balance between data
recovery and minimizing the repository size.</p>
<p>Each week after checking my backups, I run the following script to prune any
expired snapshots:</p>
<p></p>
<div class="SIntrapara"><a href="//resources/@|filename|">borg-prune.sh</a></div>
<div class="SIntrapara">
<div class="brush: shell">
<pre><code>#!/bin/sh
# borg-prune.sh
## Usage
# - borg-prune.sh machine-name        Perform a pruning dry-run, seeing what
#                                     would be pruned.
# - borg-prune.sh machine-name --wet  Perform a non-dry run.
REPO=$HOME/backups
DRY_RUN="-n"
if [ "$2" = "--wet" ]; then
    echo "Pruning..."
    DRY_RUN=""
fi
# $DRY_RUN is left unquoted so that an empty value adds no argument.
borg prune --list "$REPO" --prefix "$1+" \
    --keep-hourly 24 \
    --keep-daily 7 \
    --keep-weekly 4 \
    --keep-monthly -1 \
    --keep-yearly -1 \
    $DRY_RUN \
    -v</code></pre></div></div>
<h2>6.4
<tt> </tt><a name="(part._.Finding_.Large_.Extraneous_.Files_in_the_.Repository)"></a>Finding Large Extraneous Files in the Repository</h2>
<p>Sometimes, a large file will get backed up and make the repository unnecessarily
large.
A few times, I’ve accidentally backed up the entire repository in itself, DOSing
my VPS by filling the drive.</p>
<p><span class="stt">borg</span> makes it sort of easy to find these mistakes.</p>
<p>On the repository machine, run <span class="stt">borg info -P machine-name+</span> to get a print
out of the size of each archive for <span class="stt">machine-name</span>.
When one of the archives prints out as suddenly larger, that’s usually a good target.
Copy that archive name; I’ll call it <span class="stt">$archive_name</span>.</p>
<p>Next, we mount the archive to see what files are too large.
Run the following commands on the repository machine.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">mkdir -p /tmp/borg</span></p></td></tr>
<tr>
<td>
<p><span class="stt">borg mount ~/backups::$archive_name /tmp/borg</span></p></td></tr></tbody></table></div>
<p>Now we can explore the mounted archive to find large files.
I run the following command, which I alias as <span class="stt">ducks</span> in my shell.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">du -sch * .* | sort -rh | head</span></p></td></tr></tbody></table></div>
<p>This will print out a list of the 10 largest files or folders in the current
directory.
You might need to exclude the <span class="stt">.*</span> pattern if there are no hidden files.</p>
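<p>If the <span class="stt">.*</span> pattern gets in the way, a variant based on <span class="stt">find</span> picks up hidden and regular entries alike and works whether or not hidden files exist.
This is a minimal sketch, not my actual alias; <span class="stt">largest_entries</span> is a hypothetical name, sizes are printed in kilobytes, and it assumes a <span class="stt">find</span> that supports <span class="stt">-mindepth</span> and <span class="stt">-maxdepth</span> (GNU and BSD both do).</p>

```shell
#!/bin/sh
# Sketch: print the ten largest entries directly inside a directory,
# hidden ones included, with sizes in kilobytes.
# The function name is made up for this example.
largest_entries() {
    dir=${1:-.}
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -rn | head
}

# Example, after mounting an archive (commented out so sourcing is a no-op):
# largest_entries /tmp/borg
```

<p>Call it again on whichever directory comes out on top to keep descending.</p>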
<p>I then follow the large directories until I find a likely looking file; call it
<span class="stt">/path/to/large-unnecessary-file</span>.</p>
<p>Once we find a file, we want to exclude it from further backups and remove it
from existing backups.
I add it to the <span class="stt">borg-exclude</span> patterns or add a <span class="stt">.borg-ignore</span> file
as appropriate.
Then, I run the following loop to recreate and filter all archives.
This loop is in <span class="stt">fish</span> syntax; you’ll need to figure out loops in your
shell on your own, because I’ve never figured out how to write a shell loop
properly.</p>
<p>I’ve never had any problems, but <span class="emph">you should back up your repository before
running <span class="stt">borg recreate</span></span>.
Use <span class="stt">rclone</span> to put it anywhere else, at least temporarily.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">for archive in (borg list --lock-wait 600 -P machine-name+ ~/backups | cut -f 1 -d ' ')</span></p></td></tr>
<tr>
<td>
<p><span class="stt"></span><span class="hspace"> </span><span class="stt">yes YES | borg recreate --lock-wait 600 -C lzma,9 -s --exclude "/path/to/large-unnecessary-file" ~/backups::$archive</span></p></td></tr>
<tr>
<td>
<p><span class="stt">end</span></p></td></tr></tbody></table></div>
<p>This is considered experimental, so it requires that you confirm each recreation
by typing "YES".
I just pipe <span class="stt">yes YES</span> because I like to live on the edge, and have mirrors
of this repository if I break something.</p>
<p><span class="stt">borg recreate</span> can take multiple <span class="stt">--exclude</span> flags if you find
multiple files you want removed.
It will also recompress the archive, so you can specify new and different
compression options with <span class="stt">-C</span>, if you want to change the compression
algorithm.</p>
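<p>For readers who do not use <span class="stt">fish</span>, the same loop could be written for a POSIX shell roughly as follows.
This is a minimal sketch rather than something taken from my own scripts: <span class="stt">recreate_without</span>, <span class="stt">BORG</span>, and <span class="stt">REPO</span> are hypothetical names introduced for the example, and the <span class="stt">borg</span> flags are the same ones used in the loop above.</p>

```shell
#!/bin/sh
# Sketch: POSIX sh version of the fish loop. BORG, REPO, and the function
# name are placeholders introduced for this example.
BORG=${BORG:-borg}
REPO=${REPO:-$HOME/backups}

# Reads archive names on stdin, one per line, and recreates each archive
# without the path given as $1.
recreate_without() {
    exclude=$1
    while IFS= read -r archive; do
        # borg reads its confirmation from `yes`, not from the loop's stdin.
        yes YES | "$BORG" recreate --lock-wait 600 -C lzma,9 -s \
            --exclude "$exclude" "$REPO::$archive"
    done
}

# Usage (commented out so that sourcing this file changes nothing):
# borg list --lock-wait 600 -P machine-name+ ~/backups | cut -f 1 -d ' ' |
#     recreate_without "/path/to/large-unnecessary-file"
```

<p>Reading the names with <span class="stt">while read</span> instead of a <span class="stt">for</span> loop avoids word splitting on the list of archives.</p>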
<p>Now the file should be excluded from all existing archives.</p>
<h1>7
<tt> </tt><a name="(part._.Restore_from_.Backups)"></a>Restore from Backups</h1>
<p>In the likely event that you need to restore from backups, run <span class="stt">borg list
-P machine-name+</span> to list the archives available for <span class="stt">machine-name</span>.
This will give you a list of archive names on the left, with some metadata on
the right.
Copy and paste the name for the archive you want to restore from; I’ll call this
<span class="stt">$archive_name</span>.</p>
<p>Next, we mount that archive.
Run the following commands, which create a temporary mount point and
mount the archive.</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">mkdir -p /tmp/borg</span></p></td></tr>
<tr>
<td>
<p><span class="stt">borg mount ~/backups::$archive_name /tmp/borg</span></p></td></tr></tbody></table></div>
<p>You can now see all your backed-up files in <span class="stt">/tmp/borg</span>.</p>
<p>Next, from the client, copy over your files:</p>
<div class="SCodeFlow">
<table cellpadding="0" cellspacing="0" class="SVerbatim">
<tbody>
<tr>
<td>
<p><span class="stt">rsync -avz --progress backupd@backup-server.tld:/tmp/borg/ /</span></p></td></tr></tbody></table></div>
<p>How I Redex---Experimenting with Languages in Redex</p>
<p>William J. Bowman, 2019-10-06T19:45:13Z</p>
<p>Recently, I asked my research assistant, Paulette, to create a Redex model. She had never used Redex, so I pointed her to the usual tutorials:</p>
<ul>
<li><a href="https://redex.racket-lang.org/">https://redex.racket-lang.org/</a></li>
<li><a href="https://docs.racket-lang.org/redex/tutorial.html">https://docs.racket-lang.org/redex/tutorial.html</a></li>
<li><a href="https://docs.racket-lang.org/redex/redex2015.html">https://docs.racket-lang.org/redex/redex2015.html</a></li></ul>
<p>While she was able to create the model from the tutorials, she was left with the question “what next?”. I realized that the existing tutorials and documentation for Redex do a good job of explaining <em>how</em> to implement a Redex model, but fail to communicate <em>why</em> and <em>what</em> one does with a Redex model.</p>
<p>I decided to write a tutorial that introduces Redex from the perspective I take when working on language models: as a tool for experimenting with them. The tutorial was originally going to be a blog post, but it ended up quite a bit longer than is reasonable to read on a single page, so I’ve published it as a document here:</p>
<div style="text-align: center">
<h3><a href="/doc/experimenting-with-redex/">Experimenting with Languages in Redex</a></h3></div>