Thursday, April 25, 2019

HA PostgreSQL on Tribblix with stolon

I wrote about setting up postgres replication, and noted there that while it did what it said it did - ensured that your data was safely sent off to another system - it wasn't a complete HA solution, as additional steps were needed to actually make any use of the hot standby.

What I'm going to describe here is one way to create a fully-automatic HA configuration, using stolon. There's a longer article about stolon, roughly explaining the motivations behind the project.

Stolon uses etcd (or similar) as a reliable, distributed configuration store. So this article follows on directly from setting up an etcd cluster - I'm going to use the same zones, the same names, the same IP addresses, so you will need to have got the etcd cluster running as described there first.

We start off by logging in to each zone using zlogin (with pfexec if you set your account up as the zone administrator when creating the zone):

pfexec zlogin node1 (and node2 and node3)

Followed by installing stolon and postgres on each node, and creating an account for them to use:

zap refresh
zap install TRIBblix-postgres11 TRIBblix-stolon TRIBtext-locale
useradd -u 11799 -g staff -s /bin/bash -d /export/home/pguser pguser
passwd -N pguser
mkdir -p /export/home/pguser
chown -hR pguser /export/home/pguser

In all the following commands I'm assuming you have set your PATH correctly so it contains the postgres and stolon executables. Either add /opt/tribblix/postgres11/bin and /opt/tribblix/stolon/bin to the PATH, or prefix the commands with

env PATH=/opt/tribblix/postgres11/bin:/opt/tribblix/stolon/bin:$PATH
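Alternatively, to set it permanently, you could add this line to pguser's .profile (a sketch, assuming bash as the login shell, as created above):

export PATH=/opt/tribblix/postgres11/bin:/opt/tribblix/stolon/bin:$PATH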

Log in to the first node as pguser.

pfexec zlogin -l pguser node1

Configure the cluster (do this just the once):

stolonctl --cluster-name stolon-cluster \
  --store-backend=etcdv3 init

This saves the cluster metadata to etcd, although at this point it's just a single key marking the stolon cluster as existing.
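If you're curious, you can peek at what got stored with etcdctl. A quick sketch - the exact key layout is a stolon implementation detail, and with older etcd releases you may need ETCDCTL_API=3 in the environment to get the v3 API:

etcdctl get --prefix --keys-only stolon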

Now we need a sentinel.

stolon-sentinel --cluster-name stolon-cluster \
  --store-backend=etcdv3

It complains that there are no keepers, so zlogin to node1 in another window and start one of those up too:

stolon-keeper --cluster-name stolon-cluster \
  --store-backend=etcdv3 \
  --uid postgres0 --data-dir data/postgres0 \
  --pg-su-password=fancy1 \
  --pg-repl-username=repluser \
  --pg-repl-password=replpassword \
  --pg-listen-address='192.168.0.231'

After a little while, a postgres instance appears. Cool!

Note that you have to specify the listen address explicitly. That's also the address that other parts of the cluster use to connect, so you can't use "localhost" or "*"; you have to use the actual address.

You also specify the postgres superuser password, and the account for replication and its password. Obviously these ought to be the same for all the nodes in the cluster, so they can all talk to each other successfully.

And now we can add a proxy, after another zlogin to node1:

stolon-proxy --cluster-name stolon-cluster \
  --store-backend=etcdv3 --port 25432

If you now point your client (such as psql) at port 25432, you can talk to the database through the proxy.
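For example, from within the zone (this assumes stolon's default superuser name of postgres; the password is the --pg-su-password we gave the keeper):

psql -h 127.0.0.1 -p 25432 -U postgres postgres

The proxy listens on localhost by default; give it a --listen-address if you want to reach it from elsewhere.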

Just having one node doesn't meet our desire to build an HA cluster, so let's add some more nodes.

Right, go to the second node,

pfexec zlogin -l pguser node2

and add a sentinel and keeper there:

stolon-sentinel --cluster-name stolon-cluster \
  --store-backend=etcdv3


stolon-keeper --cluster-name stolon-cluster \
  --store-backend=etcdv3 \
  --uid postgres1 --data-dir data/postgres1 \
  --pg-su-password=fancy1 \
  --pg-repl-username=repluser \
  --pg-repl-password=replpassword \
  --pg-listen-address='192.168.0.232'

What you'll then see happening on the second node is that stolon will automatically set the new postgres instance up as a replica of the first one (it assumes the first one you run is the master).
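You can verify this from any of the nodes with stolonctl, which shows the keepers in the cluster and which one is currently the master:

stolonctl --cluster-name stolon-cluster \
  --store-backend=etcdv3 status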

Then set up the third node:

pfexec zlogin -l pguser node3

with another sentinel and keeper:

stolon-sentinel --cluster-name stolon-cluster \
  --store-backend=etcdv3

stolon-keeper --cluster-name stolon-cluster \
  --store-backend=etcdv3 \
  --uid postgres2 --data-dir data/postgres2 \
  --pg-su-password=fancy1 \
  --pg-repl-username=repluser \
  --pg-repl-password=replpassword \
  --pg-listen-address='192.168.0.233'

You can also run a proxy on the second and third nodes (or on any other node you might wish to use, come to that). Stolon will configure the proxy for you so that it's always connecting to the master.

At this point you can play around, create a table, insert some data.
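Something like this, via the proxy (the table here is purely an illustration):

psql -h 127.0.0.1 -p 25432 -U postgres postgres

create table ha_test (id integer primary key, note text);
insert into ha_test values (1, 'hello from stolon');
select * from ha_test;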

And you can test failover. This is the real meat of the problem.

Kill the master (^C its keeper). It takes a while, because it wants to be sure there's actually a problem before taking action, but what you'll see is one of the slaves being promoted to master. And if you run psql against the proxies, they'll send your queries off to the new master. Everything works as it should.
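A handy way to watch this from the client side is to ask postgres which server address you've landed on; run it through the proxy before and after killing the master and you'll see it switch from one keeper's listen address to another:

select inet_server_addr();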

Even better, if you restart the old failed master (as in, restart its keeper), then it successfully sets the old master up as a slave. No split-brain, you get your redundancy back.

I tried this a few more times, killing the new master (aka the original slave), and it failed over again each time.

I'm actually mighty impressed with stolon.
