This morning I have some weirdness to share with you, in the form of a post I just made to the Mattermost discussion forums.
(For those not familiar with it, Mattermost provides a collaboration hub - effectively “chat” with many useful plugins, like Slack or Discord - for teams. I use their open-source edition for home and business, and, this weirdness notwithstanding, highly recommend it. It’s an excellent product with the distinct advantage that you get to own your own discussions in-house.)
So here’s the scenario:
Mattermost is running as a container under Kubernetes (configured by the Mattermost operator), using as its back-end a Postgres database running in the same cluster. For configuring the connection to the database, we have a k8s secret defined thus:
---
apiVersion: v1
kind: Secret
metadata:
  name: mattermost-db-credentials
  namespace: mattermost
stringData:
  DB_CONNECTION_STRING: "postgres://mattermost:PASSWORD_GOES_HERE@postgres-mattermost-postgresql.mattermost.svc.cluster.local:5432/mattermost?sslmode=disable"
which is referenced by the Mattermost manifest thus:
database:
  external:
    secret: mattermost-db-credentials
and so comes through in the final definition of the Mattermost pod thus:
env:
  - name: MM_CONFIG
    valueFrom:
      secretKeyRef:
        name: mattermost-db-credentials
        key: DB_CONNECTION_STRING
So far, so good. It worked fine like this for a long time.
Until we needed to move that Postgres database. Same data and configuration - running off literally the same back-end storage - so the only thing that changes is the connection string. All we should need to do is update it in the mattermost-db-credentials secret and restart the Mattermost pod, and all should be okay, yes?
No.
After updating the connection string (and making sure the new pod was picking it up), Mattermost wouldn’t start, and worse, the pod logs kept indicating that Mattermost couldn’t connect to the database using the old connection string, which was nowhere in the visible configuration.
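One way to double-check what the cluster is actually holding for that secret is to dump it as JSON and decode it. Values written under stringData come back base64-encoded under data, so they need decoding before you can compare them by eye. The helper below is only a sketch; the function name is made up for illustration.

```python
import base64
import json


def connection_string_from_secret(secret_json: str,
                                  key: str = "DB_CONNECTION_STRING") -> str:
    """Return the decoded value of `key` from a Secret dumped as JSON.

    `secret_json` is expected to be the output of something like:
      kubectl get secret mattermost-db-credentials -n mattermost -o json
    """
    secret = json.loads(secret_json)
    # Entries supplied via stringData are stored base64-encoded under "data".
    encoded = secret["data"][key]
    return base64.b64decode(encoded).decode("utf-8")
```

Feeding it the kubectl output for mattermost-db-credentials should give back the new connection string if the update really landed. Note that this only confirms the secret itself, not what any given pod has loaded.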
After a little poking at this, it turned out that we only saw that error when Mattermost was correctly configured with the new connection string, and not when it was configured with a blatantly erroneous one.
So, after taking a moment to say “Wut?”, it occurred to me to fire up my Postgres client and take a look at the configurations table in the Mattermost database, find the latest entry per the createat column, and what do I find:
The old connection string stored in the database! (And yes, changing the connection string in that entry in the new database eliminated the error and let Mattermost start up correctly; apparently Mattermost was connecting to the database at the new location for just long enough to read the latest configuration entry, pull the old connection string out of it, and then fail to connect to it at that old location.)
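For anyone wanting to repeat that check without eyeballing the row, here is a minimal sketch. It assumes the value stored in the configurations table is a JSON document laid out like Mattermost's config.json, with the connection string under SqlSettings.DataSource; that layout is my assumption, and the function names are hypothetical. It works on the text of the row, so fetching the latest entry (by createat) is left to whatever Postgres client you prefer.

```python
import json
from typing import Optional


def stored_data_source(config_value: str) -> Optional[str]:
    """Extract the connection string recorded inside a stored configuration.

    ASSUMPTION: `config_value` is JSON shaped like config.json, with the
    connection string at SqlSettings.DataSource.
    """
    config = json.loads(config_value)
    return config.get("SqlSettings", {}).get("DataSource")


def is_stale(config_value: str, expected_dsn: str) -> bool:
    """True when the stored configuration points somewhere other than expected_dsn."""
    stored = stored_data_source(config_value)
    return stored is not None and stored != expected_dsn
```

If is_stale returns True for the latest entry after a move, you are probably looking at the same problem described here.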
My working hypothesis here (not having had time to dig through the code to check if I’m right; I welcome corrections from those who know better) is that Mattermost copied the connection settings specified on the pod into the database way back when it was first started, and that ever since, the stored copy has been overriding the environment variable the operator sets on the pod. The result is that any change you make to the configuration on the k8s side ends up simply ignored, apart from telling Mattermost where to find the stored configuration.
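To make that hypothesis concrete, here is a toy model of the startup sequence it implies. To be loud about it: this is not Mattermost's code, and every name in it (start_server, Unreachable, the databases dictionary) is invented for illustration. It only shows that a two-step lookup of this shape would produce exactly the symptoms above.

```python
from typing import Dict, List


class Unreachable(Exception):
    """Raised when the toy model tries a DSN with no database behind it."""


def start_server(bootstrap_dsn: str, databases: Dict[str, List[str]]) -> str:
    """Toy model of the hypothesized startup; returns the DSN finally used.

    `databases` maps each reachable DSN to the connection strings saved in
    that database's configuration history, oldest first.
    """
    # Step 1: the DSN from the environment is used only to find the stored config.
    if bootstrap_dsn not in databases:
        raise Unreachable(f"bootstrap: {bootstrap_dsn}")
    history = databases[bootstrap_dsn]

    # Step 2: on a first start nothing is stored yet, so a copy is saved.
    if not history:
        history.append(bootstrap_dsn)

    # Step 3: the latest stored value, not the environment, decides where to connect.
    effective_dsn = history[-1]
    if effective_dsn not in databases:
        raise Unreachable(f"stored: {effective_dsn}")
    return effective_dsn
```

In this model, a first start against "old" saves "old" into the history. Move the same history to "new" and start with "new", and the failure names the old string. Start with a bogus string and it fails at step 1 without ever mentioning the old one, which matches what we saw. Editing the last history entry to "new" lets it start.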
Which, well, I can see how it got there, but man, it’s confusing as hell when it actually happens to you.