Tailscale and Kubernetes: Cluster Peer Relays
Faster communications with Tailscale-using containers.
As those of you who’ve been following me for a while both know, I am an enthusiastic user of both Kubernetes…
…and of Tailscale.
Now, there are a number of containers running on my cluster which make use of Tailscale in one way or another. I have my Tor proxy there, which is published as a service using the Tailscale k8s operator. I keep a dedicated exit node in a pod, which is also handled for me by the operator. And there are some that use Tailscale directly (golink, idp, etc.).
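(For the operator-published ones, by the way, exposing an existing Service to the tailnet is just an annotation. A minimal sketch, where the Service name is a made-up placeholder rather than one of mine:)

```sh
# Ask the Tailscale k8s operator to publish an existing Service on the tailnet.
# "tor-proxy" is a placeholder name, not my actual Service.
kubectl -n kube-public annotate service tor-proxy tailscale.com/expose="true"
```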
But using these isn’t as efficient as it could be, because when you try to use them, traffic always starts out going via one of the off-site DERPs for a while until the hole-punching sorts itself out. After all, one of the points of k8s is that the internal pod and service networks are separate from your physical network, and not routable from it:
```
❯ tailscale ping idp
pong from idp (100.93.174.43) via DERP(dfw) in 34ms
pong from idp (100.93.174.43) via DERP(dfw) in 99ms
pong from idp (100.93.174.43) via DERP(dfw) in 28ms
pong from idp (100.93.174.43) via DERP(dfw) in 26ms
pong from idp (100.93.174.43) via 172.16.0.129:16232 in 1ms
```
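(If you want to see which DERP region your own machine ends up bouncing through, `tailscale netcheck` will tell you; I'm not reproducing my output here.)

```sh
# Reports the preferred DERP region and the measured latency to each region.
tailscale netcheck
```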
So I started thinking about how to optimize this, and it occurred to me: we have peer relays now. And how hard could it be to put one of those inside a pod with access to both the host and the pod networks?
As it turns out, not very hard at all, but harder than it should be.
```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: tailscale-peer-relay-state
  namespace: kube-public
stringData:
  TS_AUTHKEY: tskey-auth-REDACTED_NOT_THAT_IT_MATTERS
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tailscale-peer-relay
  namespace: kube-public
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["secrets"]
    # Create can not be restricted to a resource name.
    verbs: ["create"]
  - apiGroups: [""] # "" indicates the core API group
    resourceNames: ["tailscale-peer-relay-state"]
    resources: ["secrets"]
    verbs: ["get", "update", "patch"]
  - apiGroups: [""] # "" indicates the core API group
    resources: ["events"]
    verbs: ["get", "create", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tailscale-peer-relay
  namespace: kube-public
subjects:
  - kind: ServiceAccount
    name: "tailscale-peer-relay"
roleRef:
  kind: Role
  name: tailscale-peer-relay
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tailscale-peer-relay
  namespace: kube-public
---
apiVersion: v1
kind: Service
metadata:
  name: tailscale-peer-relay
  namespace: kube-public
  labels:
    app: tailscale-peer-relay
  annotations:
    metallb.io/loadBalancerIPs: fdc9:b01a:9d26:0:2::3,172.16.2.3
spec:
  type: LoadBalancer
  ipFamilyPolicy: RequireDualStack
  ports:
    - name: peer-relay
      protocol: UDP
      port: 61441
      targetPort: 61441
  selector:
    app: tailscale-peer-relay
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tailscale-peer-relay
  namespace: kube-public
  labels:
    app: tailscale-peer-relay
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tailscale-peer-relay
  template:
    metadata:
      labels:
        app: tailscale-peer-relay
    spec:
      serviceAccountName: tailscale-peer-relay
      containers:
        - name: tailscale
          imagePullPolicy: Always
          image: "ghcr.io/tailscale/tailscale:latest"
          env:
            - name: TS_KUBE_SECRET
              value: "tailscale-peer-relay-state"
            - name: TS_USERSPACE
              value: "false"
            - name: TS_DEBUG_FIREWALL_MODE
              value: auto
            - name: TS_AUTHKEY
              valueFrom:
                secretKeyRef:
                  name: tailscale-peer-relay-state
                  key: TS_AUTHKEY
                  optional: true
            - name: TS_EXTRA_ARGS
              value: "--hostname=k8s-peer-relay" # --relay-server-port=61441 --relay-server-static-endpoints="[fdc9:b01a:9d26:0:2::3]:61441,172.16.2.3:61441"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_UID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid
          securityContext:
            privileged: true
```
If you’re going to try and replicate this, note that I’ve put it in the kube-public namespace where I keep all the things that intermediate between the cluster and the outside world; and that you will need to change the MetalLB annotations to suit whatever load balancer and local network addressing scheme you have.
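Rolling it out is the usual kubectl dance. Assuming you've saved the manifests above as tailscale-peer-relay.yaml, something like this should do it:

```sh
# Apply the manifests and wait for the relay pod to come up.
kubectl apply -f tailscale-peer-relay.yaml
kubectl -n kube-public rollout status deployment/tailscale-peer-relay

# Check that MetalLB handed the Service the expected dual-stack addresses.
kubectl -n kube-public get service tailscale-peer-relay
```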
Also, this isn’t quite all of it.
You can configure Tailscale to be a peer relay using the tailscale set command after it’s running, but you can’t use the relevant --relay-server-port and --relay-server-static-endpoints options with tailscale up. Ordinarily I’d put a script in a custom container to do this sort of thing, but since I’m experimenting at the moment, and since Tailscale is storing its state in a secret that persists past pod restarts, I just kubectl exec-ed into the tailscale-peer-relay pod once it was up and ran:
```
tailscale set --relay-server-port=61441 --relay-server-static-endpoints="[fdc9:b01a:9d26:0:2::3]:61441,172.16.2.3:61441"
```
Again, if you’re doing this yourself, change the endpoints to the same ones you used for the service.
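If you'd rather do it in one shot from outside the pod, something like this should work on any reasonably recent kubectl (same endpoints as above, so adjust them to match your service):

```sh
# Exec into whichever pod the Deployment is currently running and enable the
# peer relay, binding it to the LoadBalancer's static endpoints.
kubectl -n kube-public exec deploy/tailscale-peer-relay -- \
  tailscale set --relay-server-port=61441 \
  --relay-server-static-endpoints="[fdc9:b01a:9d26:0:2::3]:61441,172.16.2.3:61441"
```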
How does it look now?
```
❯ tailscale ping darkweb-tor
pong from darkweb-tor (100.65.32.54) via DERP(dfw) in 26ms
pong from darkweb-tor (100.65.32.54) via peer-relay(172.16.2.3:61441:vni:57) in 2ms
pong from darkweb-tor (100.65.32.54) via peer-relay(172.16.2.3:61441:vni:57) in 2ms
pong from darkweb-tor (100.65.32.54) via peer-relay(172.16.2.3:61441:vni:57) in 1ms
pong from darkweb-tor (100.65.32.54) via peer-relay(172.16.2.3:61441:vni:57) in 3ms
pong from darkweb-tor (100.65.32.54) via 172.16.0.130:23828 in 2ms
```
Much better! Everything but the first packet in the ping stays local, saving a good 100 ms of DERP-relayed round trips while the direct route gets established.
Now, if I could just figure out a way to avoid hitting an external DERP for that first packet at all, that’d be perfect. But this is a start.

