Moving to HGKeeper

Motivation

So, where do we come from? I started using the Mercurial version control system around 2009, if I remember correctly. I had used Subversion and SVK (does anyone remember that?) before and was curious about distributed version control. Back then Mercurial was better suited for platform-independent use on Linux and Windows, and I still had to use the latter at work. Mercurial’s user interface was very much the same as Subversion’s, with basically just push and pull added, so switching from Subversion to Mercurial was easy. Git was weird at the time, at least for me. Meanwhile we use Git exclusively at work, and it has gotten a lot better on Windows over time. However, for nostalgic reasons and to stay somewhat fluent in more than one VCS, I kept most of my private projects in Mercurial.

For “hosting” my own repos I used the Debian package mercurial-server, at least up to Debian 9 (stretch), but after upgrading that server to Debian 10 (buster) things started falling apart, and I looked around for a new hosting solution. For the record: I thought about converting all those repos to Git, but decided against it. I have accumulated quite a number of repos, and although I had already converted one or two, I figured it would be easier to switch the hosting than to convert every single repo.

Speaking of hosting: I don’t need a huge forge for myself, just a rather simple solution for server-side “central” repos, so I can easily work from different laptops or workstations. So I scanned over MercurialHosting in the Mercurial wiki, and every self-hosting solution seemed like cracking a nut with a sledgehammer, except HGKeeper.

Introducing HGKeeper

HGKeeper introduces itself like this in its own repo:

HGKeeper is a server for mercurial repositories. It provides access control for SSH access and public HTTP access via hgweb.

It was originally designed to be run in a container but recently support has been added to run it from an existing openssh-server.

SSH and simple HTTP is all I need, and running in a container suits me fine, especially since I had started deploying things with Docker and Ansible and could use a little more practice with that. Running in a container is especially helpful when running things implemented in those fancy new languages like Go or Rust on an old-fashioned Linux like Debian. (For reference see for example Package managers all the way down on LWN about how modern languages create a dependency hell for classical Linux distributions.)

Running the HGKeeper Docker container itself was easy; however, SSH access would go through a non-standard port, at least if I wanted to keep accessing the host machine through port 22.

The README promised that HGKeeper can also be run together with OpenSSH on the default port. But is it possible to do all of this at once: run in a container, access HGKeeper through port 22, and keep access to the host on the same port? I reached out to Gary Kramlich, the author of HGKeeper, and that was a very nice experience. Let’s say I nerd-sniped him somehow?!

Installing HGKeeper

So the goal is to run HGKeeper in a Docker container and access it through OpenSSH. While doing the setup I decided to go through an SSH server on a different machine, the one that is exposed to the internet from my local network anyway, and where mercurial-server was installed before. Access from outside thus goes through OpenSSH on standard port 22 to hg.example.com, which is an alias for the virtual machine falbala.internal.example.com. That machine tunnels the traffic to another virtual machine, miraculix.internal.example.com, where HGKeeper runs in the Docker container, with SSH port 22022 exposed to the local network.
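In other words, an SSH connection takes this path (a rough sketch of the setup just described):

laptop --(ssh, port 22)--> falbala (hg.example.com, OpenSSH)
       --(ssh, port 22022)--> miraculix (HGKeeper in Docker)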

Preparations

We follow the HGKeeper README and prepare things on the Docker host (miraculix) first. I created a directory /srv/data/hgkeeper where all related data is supposed to live. In the subfolder host-keys I created the SSH host keys as suggested in section “SSH Host Keys”:

$ ssh-keygen -t rsa -b 4096 -o -f host-keys/ssh_host_rsa_key
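One caveat: ssh-keygen does not create the target directory on its own, so the directory layout (the same paths used throughout this post) has to exist beforehand:

$ mkdir -p /srv/data/hgkeeper/host-keys /srv/data/hgkeeper/tmp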

The Docker container itself needs some preparation, so we run it once manually as suggested in section “Running in a Container”. The important part here is to pass the SSH public key of the client workstation from which you will first access the HG repos. I copied that from my laptop to /srv/data/hgkeeper/tmp/ before. The admin username passed here (alex) should also be adapted to your needs:

cd /srv/data/hgkeeper
docker run --rm \
    -v $(pwd)/repos:/repos \
    -v $(pwd)/tmp/id_rsa.pub:/admin-pubkey:ro \
    -e HGK_ADMIN_USERNAME=alex \
    -e HGK_ADMIN_PUBKEY=/admin-pubkey \
    -e HGK_REPOS_PATH=/repos \
    docker.io/rwgrim/hgkeeper:latest \
    hgkeeper setup
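After this one-off setup run, the repos directory should no longer be empty; if I remember the README correctly, it now holds the special hgkeeper repository used for access control. A quick sanity check:

$ ls -l repos/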

Setting up OpenSSH

As stated before, I tried to set up as many things as possible with Ansible. The preparation steps above could probably also have been done with Ansible, but I had them in place from playing around and did not bother at the time. It depends on your philosophy anyway, whether you want to automate sensitive tasks like creating crypto keys. From here on, however, I have everything in a playbook and will show snippets from it to illustrate my setup. The OpenSSH config is for the host falbala and thus lives in falbala.yml; see the first part here:

---
- hosts: falbala
  become: true

  vars:
    hg_homedir: /var/lib/hg

  tasks:
    - name: Add system user hg
      user:
        name: hg
        comment: Mercurial Server
        group: nogroup
        shell: /bin/sh
        system: yes
        create_home: yes
        home: "{{ hg_homedir }}"

That system needs a local user. You can name it as you like, but it was named hg in my old setup and I wanted to keep that, so I don’t have to change my working copies. The user needs to have a shell set, otherwise OpenSSH won’t be able to call the commands needed later. I use the $HOME dir to hold the SSH known_hosts file, so it does not clutter my global settings. Doing this manually on Debian would look like this:

% sudo adduser --system --home /var/lib/hg --shell /bin/sh hg

The next step is that known_hosts file. You can handle this manually by logging in as the hg user once and making a single connection to the SSH server on the other machine, like this:

$ sudo -i -u hg
$ ssh -p 22022 hg@miraculix.internal.example.com

For Ansible I prepared a known_hosts file, and that was somewhat tricky due to the non-standard port. You cannot just look into your existing files for reference, because hosts and ports are hashed in there, and the documentation (man 8 sshd) does not cover that part. I had to guess the format from ssh -v output. The file I came up with is named pubkeys/hgkeeper in my Ansible project and it looks like this (one single line):

[miraculix.internal.example.com]:22022 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDD8hrAkg7z1ao3Hq1w/4u9Khxc4aDUfJiKfbhin0cYRY7XrNIn3mix9gwajGWlV1m0P9nyXiNTW4E/ZW0rgTF4I1PZs3dbh66dIAH7Jif4YLFj5VPj350TF5XeytyFjalecWBa36S1Y+UydIY/o/yC104D5Hg7M27bQzL+blqk1eIlY0aaM+faEuxFYHexK5fa+Xq150F6NswHdsVPCYOKu6t+myGHpe2+X6qhVNuftDOP5JQO6BzxhN6MmG1arZ9dkeBb6Ry++R4o3soeV1k9uZ33jbJGnqFryvL3cyOPq7mVdoSffwqef1i4+0fNTGgO8U93w2An6z5fRjvPufA+VIVvFDwRoREFKvO1Q+WdeOSUWOl6QwVjKPrv0M3QnSnTJHpZpNlshOaDZyQNHLXLEO43vdbGr6rk7l9ApUcF34Y7eLWp42XktQLlzDitua009v7uNBAuIzKR3+UAWaFpj+CGl1jDm7a3n8kXlJjumVN5hfXo0lLz7n+G/Yd/U87dHftL0kiYcVRR4n1qMmhV5UL4lq0FNDBwwzRzSKyNw80mRoMHRiKBBUTFXJApzlIAXiJ7g1JThM2rcNnskpyhZSrL38ses5Ns2GBOzEZsi51U+S5O91+KwHDTb10sxoJskUvIyJxCUILkOGZpbd4uWI+6tAWycP4QMT33MUHFEQ==
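In hindsight, ssh-keyscan would have produced an entry in exactly this format without any guessing (run against the container once it is up and listening on port 22022):

$ ssh-keyscan -p 22022 miraculix.internal.example.com > pubkeys/hgkeeper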

With that in place, it’s straightforward in the playbook:

    - name: Ensure user hg has .ssh dir
      file:
        path: "{{ hg_homedir }}/.ssh"
        state: directory
        owner: hg
        group: nogroup
        mode: '0700'

    - name: Ensure known_hosts entry for miraculix exists
      known_hosts:
        path: "{{ hg_homedir }}/.ssh/known_hosts"
        name: "[miraculix.internal.example.com]:22022"
        key: "{{ lookup('file', 'pubkeys/hgkeeper') }}"

    - name: Ensure access rights for known_hosts file
      file:
        path: "{{ hg_homedir }}/.ssh/known_hosts"
        state: file
        owner: hg
        group: nogroup

For the SSH daemon configuration two things are needed. First, if you use domain names instead of IP addresses, you have to set UseDNS yes in sshd_config. This snippet does it in Ansible:

    - name: Ensure OpenSSH does remote host name resolution
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?UseDNS'
        line: 'UseDNS yes'
        validate: /usr/sbin/sshd -T -f %s
        backup: yes
      notify:
        - Restart sshd
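Whether the option actually took effect can be checked afterwards, since sshd -T prints the fully resolved server configuration:

$ sudo /usr/sbin/sshd -T | grep -i usedns
usedns yes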

The second and most important part is matching the hg user, authenticating against the HGKeeper app running on the other host, and tunneling the traffic through. This is done with the following snippet, which contains the literal block to be added to /etc/ssh/sshd_config if you’re doing it manually:

    - name: Ensure SSH tunnel block to hgkeeper is present
      blockinfile:
        path: /etc/ssh/sshd_config
        marker: "# {mark} ANSIBLE MANAGED BLOCK"
        insertafter: EOF
        validate: /usr/sbin/sshd -T -C user=hg -f %s
        backup: yes
        block: |
          Match User hg
              AuthorizedKeysCommand /usr/bin/curl -q --data-urlencode 'fp=%f' --get http://miraculix.internal.example.com:8081/hgk/authorized_keys
              AuthorizedKeysCommandUser hg
              PasswordAuthentication no
      notify:
        - Restart sshd
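So for every login attempt as user hg, sshd asks HGKeeper over HTTP for the authorized_keys entries matching the fingerprint (%f) of the key the client offered. You can poke at that endpoint by hand to see what sshd gets back; the fingerprint here is just a made-up placeholder:

$ curl -q --get --data-urlencode 'fp=SHA256:PLACEHOLDER' \
    http://miraculix.internal.example.com:8081/hgk/authorized_keys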

You might have noticed two things: curl has to be installed, and sshd has to be restarted (or at least reloaded) after its config has changed. Here:

    - name: Ensure curl is installed
      package:
        name: curl
        state: present

  handlers:
    - name: Restart sshd
      service:
        name: sshd
        state: reloaded

Running HGKeeper Docker Container with Ansible

Running a Docker container from Ansible is quite easy. I just translated the docker run call and its arguments from the HGKeeper documentation:

---
- hosts: miraculix
  become: true

  vars:
    data_basedir: /srv/data
    hgkeeper_data: "{{ data_basedir }}/hgkeeper"
    hgkeeper_host_keys: host-keys
    hgkeeper_repos: repos
    hgkeeper_ssh_port: "22022"

  tasks:
    - name: Setup docker container for HGKeeper
      docker_container:
        name: hgkeeper
        image: "docker.io/rwgrim/hgkeeper:latest"
        pull: true
        state: started
        detach: true
        restart_policy: unless-stopped
        volumes:
          - "{{ hgkeeper_data }}/{{ hgkeeper_host_keys }}:/{{ hgkeeper_host_keys }}:ro"
          - "{{ hgkeeper_data }}/{{ hgkeeper_repos }}:/{{ hgkeeper_repos }}"
        env:
          HGK_SSH_HOST_KEYS: "/{{ hgkeeper_host_keys }}"
          HGK_REPOS_PATH: "/{{ hgkeeper_repos }}"
          HGK_EXTERNAL_HOSTNAME: "miraculix.internal.example.com"
          HGK_EXTERNAL_PORT: "{{ hgkeeper_ssh_port }}"
        ports:
          - "8081:8080"   # http
          - "{{ hgkeeper_ssh_port }}:22222"   # ssh
        command: hgkeeper serve
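Once the playbook has run, a quick look on miraculix confirms the container is up and the port mappings (8081 for HTTP, 22022 for SSH) are in place:

$ docker ps --filter name=hgkeeper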

Client Settings

Almost done. After trivially copying my old repos over from the old virtual machine to the new one, the server is ready. For the laptops and workstations nothing in my setup had to be changed, except for one thing: the new setup needs agent forwarding in the SSH client config, presumably because falbala opens a second SSH connection to miraculix on my behalf, and that hop needs my key again. It is simple though, see the lines I added to ~/.ssh/config here:

Host hg.example.com hg
HostName hg.example.com
ForwardAgent yes
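With that in place, daily work is unchanged and cloning looks just like before (the repository name is only an example):

$ hg clone ssh://hg@hg.example.com/myproject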

All in all, this was a pleasant endeavor, both in working on the project itself and in the outcome I have now.
