Talk to your house from anywhere!

Assuming you happen to be running my particular combination of software, of course.

May 12, 2023

Well, okay, that’s not entirely fair. You should be able to generalize what I did here to work on your personal setup, whatever that happens to be.

In my case, what I have is a Mattermost household chat server that I just got done moving to self-hosting (but the same principles should be readily adjustable to Slack, Discord, Matrix, or whatever you use for this); my Home Assistant installation, with its recently-added Assist voice (or more accurately, natural-language) assistant; and of course the indispensable Node-RED to glue it all together.

For this is the story of how this morning I had an inspiration concerning the item “can I do ‘voice’ control of my house using text chat” and doing so turned out to be much simpler than I was expecting.

So how did I do it?

Well, the first task, of course, is to get the text out of the chat. To do that, I set up an outgoing webhook in Mattermost. Before that, though, I had to change a couple of global settings. The first is this one, in the System Console under Environment→Developer:

You need to enter the hostname of your Node-RED server (since I run Home Assistant in Kubernetes, it has its own ingress hostname separate from the HA one) under “Allow untrusted internal connections to:”; otherwise your webhook won’t work.

Secondly, of course, you have to actually enable the use of outgoing webhooks on your server down under Integrations→Integration Management:

As well as Enable Outgoing Webhooks, I also turned on Enable integrations to override usernames and Enable integrations to override picture icons. That’s because I prefer to have my Home Assistant appear under its own username and icon rather than sharing mine, as the webhook creator.

Having attended to these one-time prerequisites, it’s time to set up the webhook itself. On Mattermost, this is done in your “team”, under Integrations→Outgoing Webhooks.

So. Create a new outgoing webhook, thus:

You can give it any title and description you want, preferably something relevant.
Set the Content Type to application/json, since, conveniently, Node-RED parses that automatically when it receives it.
Set up the Channel and Trigger Words to ensure that commands you type are sent to the webhook. I have a dedicated channel (“Jeeves”) on my Mattermost where Home Assistant notifications are sent, so I used that one, and left the trigger words blank to have all messages sent via the webhook.

(Conveniently, neither responses to this webhook nor messages sent from the incoming webhook I use to deliver notifications are sent to the outgoing webhook, which saves me some later work. If you are using a general-purpose channel or trigger words, you’ll need to modify the Node-RED flow we’ll get to later to filter out unwanted messages and remove the trigger words before passing the message on to Assist.)
Set the callback URL to the URL of the Node-RED http in node that will be handling the webhook. You can set the end of the path to whatever you want and these nodes are normally located under /red-nodes, so it’s easy enough to figure out in advance, but if you want to be sure, you can set up the flow first.

For me, this comes out as https://jeeves-node.harmony.arkane-systems.lan/red-nodes/jeeves-assist.
Set the username to the username you are using for your Home Assistant.

The result should look something like this:

Now, onto the Node-RED part. I’m going to show you a picture of the flow and talk you through it first, and then I’ll include the code for you to import as you will.

See! Fairly simple.

We begin at the top-left, with the http in node that receives calls from the webhook in the form of an HTTP POST request.

After that, it goes on to Validate token. When you set up the webhook in Mattermost, it creates a token (as you can see in the above screenshot) which it sends along with every request to prove that it’s a genuine webhook call; this is a simple switch node that checks the token is valid. If it isn’t, it returns a 401 Unauthorized error.

This doesn’t actually matter much in my case, where both the Mattermost server and the Node-RED server are on my internal network, and the latter has no direct exposure to the outside world so no-one’s going to be submitting fake webhook calls to it anyway. (Or if they can, I have much bigger issues.) On the other hand, if you’re using something cloud-hosted like Slack or Discord and as such you have to expose the Node-RED endpoint to the Internet, it will matter a great deal to you.
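If you'd like to see the token check spelled out away from Node-RED, here is a minimal Python sketch. It assumes the parsed JSON body carries the token in a `token` field, which is how Mattermost sends it for outgoing webhooks. The `EXPECTED_TOKEN` value and the `validate_token` name are placeholders invented for illustration, and the constant-time comparison is an extra precaution of this sketch, not something the switch node does.

```python
import hmac

# Hypothetical placeholder: the real value is the token Mattermost shows
# on the outgoing webhook's settings page.
EXPECTED_TOKEN = "replace-with-your-webhook-token"


def validate_token(payload: dict, expected: str = EXPECTED_TOKEN) -> int:
    """Return the HTTP status to use: 200 to carry on, 401 to reject."""
    supplied = payload.get("token")
    if not isinstance(supplied, str):
        return 401
    # compare_digest compares in constant time, so response timing does not
    # reveal how much of a guessed token was correct.
    if hmac.compare_digest(supplied.encode(), expected.encode()):
        return 200
    return 401
```

On an internal-only setup like mine a plain equality test is fine; the constant-time version only earns its keep once the endpoint is reachable from the Internet.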

Next up, strip to text, which removes the various extraneous information the webhook gives us and leaves only the plain text of the chat message. (Right around this is when you’d do your filtering/editing, if you need to do what’s mentioned above, as before this point is when you have channel/team/user/etc. information handy.)
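As a rough illustration of that step, including the optional filtering, here is a Python sketch. It assumes the Mattermost payload fields `text`, `trigger_word`, and `user_name`; the function name and the `ignore_user` parameter are made up for this example, and the real flow does its work in Node-RED nodes rather than Python.

```python
from typing import Optional


def strip_to_text(payload: dict, ignore_user: str = "") -> Optional[str]:
    """Reduce a webhook payload to the bare chat message, or None to drop it."""
    # Fields may be missing or null, so fall back to an empty string.
    text = (payload.get("text") or "").strip()
    trigger = (payload.get("trigger_word") or "").strip()

    # Optional filter: skip messages posted by a given user (for example,
    # the bot's own account on a shared channel).
    if ignore_user and payload.get("user_name") == ignore_user:
        return None

    # If a trigger word was used, remove it and any punctuation after it.
    if trigger and text.lower().startswith(trigger.lower()):
        text = text[len(trigger):].lstrip(" ,:")

    return text or None
```

With a dedicated channel and no trigger words, this collapses to just pulling out `text`, which is all my own flow needs.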

Then we have a current state node from Home Assistant. I have a helper set up in HA named input_boolean.bot_shutdown, and this node doesn’t change the message in any way. What it does is block any messages if that switch is turned on in Home Assistant. This lets me have a quick switch to flip if something goes wrong upstream of here, or for testing and maintenance, which will prevent any chat requests from being sent on to Assist.

Moving on, format for conversation API does exactly what it says on the tin: it wraps the chat message in the JSON it needs to be understood by the Home Assistant conversation API (details here).
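For reference, the request body is small. This Python sketch builds the JSON that Home Assistant's `/api/conversation/process` endpoint accepts (`text`, plus optional `language` and `conversation_id`). The function name and the `"en"` default are assumptions for illustration; how the request is actually sent (a Home Assistant node in Node-RED, or a plain HTTP POST with a long-lived access token) is up to your setup.

```python
from typing import Optional


def build_conversation_request(
    text: str,
    conversation_id: Optional[str] = None,
    language: str = "en",  # assumed default; use whatever your Assist speaks
) -> dict:
    """Wrap a chat message in the body expected by the conversation API."""
    body = {"text": text, "language": language}
    # Only include the ID when we have one, so a fresh conversation starts
    # cleanly instead of sending an explicit null.
    if conversation_id:
        body["conversation_id"] = conversation_id
    return body
```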

When the API returns, we split it into the three kinds of possible returns (completed actions, query answers, and errors), although you’ll notice in the current flow that we don’t actually handle them differently. This is, shall we say, room for future developments. ☺

And then we extract speech response from what the API returns, format it appropriately for the job, and use our http response node to pass that back to Mattermost.
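To make the shape of the data concrete, here is a Python sketch of the extraction and the reply. It assumes the conversation API's documented result layout, where the spoken text lives at `response.speech.plain.speech` and `response.response_type` is one of `action_done`, `query_answer`, or `error`. The function names, the fallback message, and the `"Jeeves"` username are placeholders for illustration.

```python
def response_kind(api_result: dict) -> str:
    """Return the response type, treating anything malformed as an error."""
    response = api_result.get("response") or {}
    return response.get("response_type") or "error"


def extract_speech(api_result: dict) -> str:
    """Dig the plain-text speech out of a conversation API result."""
    response = api_result.get("response") or {}
    plain = (response.get("speech") or {}).get("plain") or {}
    # Hypothetical fallback text for results with no speech at all.
    return plain.get("speech") or "Sorry, Assist returned no reply."


def mattermost_reply(api_result: dict, username: str = "Jeeves") -> dict:
    """Build the JSON body returned to Mattermost's outgoing webhook."""
    # "username" only takes effect if username overrides are enabled on the
    # server, as described earlier in the post.
    return {"text": extract_speech(api_result), "username": username}
```

Splitting out `response_kind` mirrors the three-way split in the flow, so handling errors differently later would only mean branching on its return value.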

Job done!

Ah, you ask, but what about all those conversation_id nodes? Well, Assist has the notion of an ongoing conversation, which provides context. To support this in its limited way, when the response from the API includes a (Home Assistant-generated) conversation ID, we stash that in a flow variable; and when that variable isn’t null, we add the ID back into the next request we make so it’s perceived as part of the same conversation. As a bit of a clumsy hack to avoid conversations lasting forever, 30 seconds after the last response to include a conversation ID, we flush the stored one, assuming that anything more after a pause that lengthy must be a new conversation.
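The same bookkeeping can be sketched in a few lines of Python. Note one deliberate difference: rather than a timer that flushes the stored ID, this version checks the ID's age lazily whenever it is read, which gives the same 30-second behaviour without a background task. The class and method names are invented for the example, and the injectable `clock` is only there to make it testable.

```python
import time
from typing import Callable, Optional


class ConversationMemory:
    """Remember the latest conversation ID, forgetting it after a pause."""

    def __init__(
        self,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._id: Optional[str] = None
        self._stored_at = 0.0

    def remember(self, conversation_id: Optional[str]) -> None:
        """Store an ID from a response; each new one restarts the countdown."""
        if conversation_id:
            self._id = conversation_id
            self._stored_at = self._clock()

    def current(self) -> Optional[str]:
        """Return the stored ID, or None if it has expired or was never set."""
        if self._id is not None and self._clock() - self._stored_at >= self._ttl:
            self._id = None
        return self._id
```

You would call `remember()` with the `conversation_id` from each API response, and `current()` when building the next request.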

You can download the code here.

And does it work?

Well:

(Like I said, much simpler than I expected. Kudos to the Home Assistant team on this one.)

