The Trouble with Tribbles...: HA PostgreSQL on Tribblix with Patroni

When it comes to managing PostgreSQL replication, there are a number of options available.

Updated: If you're running Tribblix m22 or later, you'll need python-3.7 as shown below. On older releases, use python-2.7 in the commands below.

I looked at stolon, but it's not the only game in town. In terms of a fully managed system, there's also patroni.

In terms of overall functionality, stolon and patroni are pretty similar. They both rely on etcd (or something similar) for storing state; they both take care of running the postgres server with the right options, and reconfiguring it as necessary; they'll both promote a replica to continue service if the master fails.

So, here's how to set up a HA PostgreSQL cluster using patroni.

Before starting on anything like this with Tribblix, it's always a good idea to

zap refresh

so that you're up to date in terms of packages and overlays.

First create 3 zones, just like before:

zap create-zone -z node1 -t whole \
-o base -O patroni -x 192.168.0.231

zap create-zone -z node2 -t whole \
-o base -O patroni -x 192.168.0.232

zap create-zone -z node3 -t whole \
-o base -O patroni -x 192.168.0.233

Building the zones like this, with the patroni overlay, will ensure that all the required packages are installed in the zones so you don't need to mess around with installing packages later.

Then zlogin to each node and run the etcd commands as before, to create the user and start etcd.

Now create a user to run postgres on each node

zlogin node1 (and 2 and 3)
useradd -u 11799 -g staff -s /bin/bash -d /export/home/pguser pguser
passwd -N pguser
mkdir -p /export/home/pguser
chown -hR pguser /export/home/pguser

Now you need to create yaml files containing the configuration for each node. See http://petertribble.co.uk/Solaris/patroni/ for the sample files I've used here

Log in to each node in turn

pfexec zlogin -l pguser node1

wget http://petertribble.co.uk/Solaris/patroni/node1.yaml
/usr/versions/python-3.7/bin/patroni ${HOME}/node1.yaml

And it initializes a cluster, with just the one node as of yet, and that node will start off as the master.

Now the 2nd node

pfexec zlogin -l pguser node2

wget http://petertribble.co.uk/Solaris/patroni/node2.yaml
/usr/versions/python-3.7/bin/patroni ${HOME}/node2.yaml

And it sets it up as a secondary, replicating from node1.

What do things look like right now? You can check that with:

/usr/versions/python-3.7/bin/patronictl \
-d etcd://192.168.0.231:2379 \
list my-ha-cluster

Now the third node:

pfexec zlogin -l pguser node3

wget http://petertribble.co.uk/Solaris/patroni/node3.yaml
/usr/versions/python-3.7/bin/patroni ${HOME}/node3.yaml

You can force a failover by killing (or ^C) the patroni process on the master, which should be node1. You'll see one of the replicas coming up as master, and replication on the other replica change to use the new master. One thing I did notice is that patroni initiates the failover process pretty much instantly, whereas stolon waits a few seconds to be sure.

You can initiate a planned failover too:

/usr/versions/python-3.7/bin/patronictl \
-d etcd://192.168.0.231:2379 \
failover my-ha-cluster

It will ask you for the new master node, and for confirmation, and then you'll have a new master.

But you're not done yet. There's nothing to connect to. For that, patroni doesn't supply its own component (like stolon does with its proxy) but depends on a haproxy instance. The overlay install we used when creating the zone will have made sure that haproxy is installed in each zone, all we have to do is configure and start it.

zlogin to each node, as root, and

wget http://petertribble.co.uk/Solaris/patroni/haproxy.cfg -O /etc/haproxy.cfg
svcadm enable haproxy

You don't have to set up the haproxy stats page, but it's a convenient way to see what's going on. If you go to the stats page

http://192.168.0.231:7000

Then you can see that it's got the active backend up and the secondaries marked as down - haproxy is checking the patroni REST api which is only showing the active postgres instance as up, so haproxy will route all connections through to the master. And, if you migrate the master, haproxy will follow it.

Which to choose? That's always a matter of opinion, and to be honest while there are a few differences, they're pretty much even.

stolon is in go, and comes as a tiny number of standalone binaries, which makes it easier to define how it's packaged up
patroni is in python, so needs python and a whole bunch of modules as dependencies, which makes deployment harder (which is why I created an overlay - there are 32 packages in there, over 2 dozen python modules)
stolon has its own proxy, rather than relying on a 3rd-party component like haproxy

As a distro maintainer, it doesn't make much difference - dealing with those differences and dependencies is part and parcel of daily life. For standalone use, I think I would probably tend towards stolon, simply because of the much smaller packaging effort.

(It's not that stolon necessarily has fewer dependencies, but remember that in go these are all resolved at build time rather than runtime.)

The Trouble with Tribbles...

Monday, April 29, 2019

HA PostgreSQL on Tribblix with Patroni

No comments: