As the complexity of your smart home increases, it’s important to implement some kind of system monitoring to ensure the long-term reliability and stability of the different components that make up the system.
Technology failed me when I had my bike stolen not too long ago. When integration problems go undetected for weeks or months, you develop a false sense of reliability: you place trust in a system that is already failing and is just waiting to show its symptoms.
To give some backstory: on the night of the incident, the IP camera facing the driveway was not recording. The alarm system did not trigger because one motion sensor was out of battery and the other had been turned off the day before for some yard work. A Wifi-enabled outdoor light did not turn on because its Wifi connection had dropped at the time. This combination of failures gave the bike thief the perfect opportunity.
This made me realise how wrongly people approach monitoring in a smart home context. Much of the effort on home automation forums goes into monitoring host metrics like CPU and memory usage, percentage of available disk space, and internet connection details such as download/upload rates and ping statistics.
It is a fun project to get this info into Home Assistant and create some nice dashboards in Lovelace using the different cards available in the Home Assistant Community Store. Over time, however, these dashboards are forgotten because there’s never a good reason to look at them, and it’s easy to slip into thinking “System was working fine yesterday, therefore it is working fine today”.
The problem can be summarised like this:
- Tracking the wrong metrics, ones that do not relate to the functionality you rely on daily
- Performing manual data analysis via dashboards
Start tracking meaningful metrics
CPU and RAM are not indicative of how the system is functioning from the user’s point of view. When was the last time a Wifi reconnection loop in one of your lights manifested itself as erratic CPU usage on the machine running Home Assistant?
In 5 years of home automation, my CPU usage was never all that exciting, and for good reason: the hardware is set up to adequately support the services I am running. Ensuring your home automation server runs on reliable computing and networking hardware is a prerequisite! Once this is achieved and normal “business-as-usual” operation begins, monitoring the underlying infrastructure gives few useful insights about the system, especially when there are more important metrics to collect.
How do you know which metrics to track?
Look at the value chain of your smart home system. Where do you derive the most value? What are the most useful and critical functions your smart home performs for you? Do you have long chains of integrations that are on the critical path of some core functionality?
Example 1: If you depend on camera streams for your alarm system, set up pings to each camera’s IP address. This asserts network connectivity at the very least. A better approach would be to monitor the camera streaming API directly, as this is what is consumed by Home Assistant.
Example 2: Monitor all Wifi-enabled devices with connectivity checks. This makes you aware of failures in seldom-used devices such as outdoor security lights.
Example 3: Monitor the uptime of critical services such as RF gateways and network video recorders, and possibly even automate sensible recovery steps such as automatic reboots. These services are at the core of the system or perform critical functions, so any downtime severely impacts usability or safety.
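The connectivity checks in the examples above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the device names, IP addresses and ports are hypothetical placeholders, and it uses a plain TCP connection attempt instead of an ICMP ping because that needs no special privileges. A successful connection only shows that something is listening on the port, so it is still a weaker signal than querying the camera’s streaming API itself.

```python
import socket

# Hypothetical devices: name -> (host, port). Replace with your own.
DEVICES = {
    "driveway_camera": ("192.168.1.50", 554),  # 554 is the usual RTSP port
    "outdoor_light": ("192.168.1.60", 80),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def unreachable_devices(devices, timeout=2.0):
    """Return the sorted names of devices that did not accept a connection."""
    return sorted(
        name
        for name, (host, port) in devices.items()
        if not is_reachable(host, port, timeout)
    )
```

Calling `unreachable_devices(DEVICES)` on a schedule and sending a notification whenever the returned list is not empty would be one way to use it.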
Choose what to monitor
Monitor the critical devices you depend on every day to make sure they are operating reliably. My bike theft went undetected because a motion sensor was out of battery, a Wifi light was stuck in a reconnection boot loop and the camera was not transmitting a live stream at the time. Who knows how long these components were malfunctioning? It’s the same problem with smoke alarms. If you don’t test them regularly you have no idea if they are working correctly in the event of an emergency.
It’s easy enough to test smoke alarms. Smart homes differ due to the sheer number of services, potential points of failure and things to potentially monitor on a regular basis.
System monitoring for your smart home is like an automated smoke alarm test for hundreds of alarms at once.
The following are some suggestions for good monitoring candidates in a home automation context:
- Monitor battery-operated devices
If your device sends battery information captured in Home Assistant, set up an alert to monitor the battery status.
- Monitor sensor events
If a sensor fails to send telemetry data for 5 minutes it may have connection issues.
- Monitor each individual application
Monitor each individual application installed on your server, from Home Assistant itself to the NVR running in a Docker container.
- Monitor MQTT topics
Many devices transmit over MQTT and you should take advantage of this via monitoring. Should they stop sending telemetry data for an extended period of time, something must be wrong.
- Monitor your camera’s object detection
This is a critical service! Does the camera normally record 15 motion events a day and there have not been any events in the last 2 days? This looks like a problem.
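Several of the suggestions above come down to the same two questions: is a value below a threshold, and when did I last hear from this device? The sketch below illustrates that idea in plain Python. It is not tied to Home Assistant’s API: the function names, the 20% battery cut-off and the input dictionaries are assumptions made for illustration, while the 5-minute silence window and the 2-day camera window come from the list above.

```python
from datetime import datetime, timedelta

# Assumed thresholds; tune them per device.
BATTERY_MIN_PERCENT = 20                  # hypothetical low-battery cut-off
MAX_SILENCE = timedelta(minutes=5)        # sensor or MQTT topic silence
CAMERA_QUIET_PERIOD = timedelta(days=2)   # no motion events at all


def low_batteries(levels, minimum=BATTERY_MIN_PERCENT):
    """Return sorted names of devices whose battery percentage is below minimum."""
    return sorted(name for name, pct in levels.items() if pct < minimum)


def silent_sources(last_seen, now, max_silence=MAX_SILENCE):
    """Return sorted names of sensors or topics silent for longer than max_silence."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


def camera_is_quiet(event_times, now, period=CAMERA_QUIET_PERIOD):
    """Return True if no motion event falls inside the last `period`."""
    return not any(now - t <= period for t in event_times)


# Example with made-up data.
now = datetime(2025, 1, 1, 12, 0)
batteries = {"front_door": 80, "garden_motion": 12}
last_seen = {
    "hall_motion": now - timedelta(minutes=2),
    "garden_motion": now - timedelta(minutes=9),
}
print(low_batteries(batteries))
print(silent_sources(last_seen, now))
print(camera_is_quiet([now - timedelta(days=3)], now))
```

A single fixed window will not suit every device. A sensor that only reports on state changes can be silent for hours while working perfectly, so the thresholds are best set per device.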
Home Assistant is the glue holding together multiple integrations and ecosystems. Consider how you are chaining devices from different ecosystems and how many dependencies this creates. The longer the chain of integrations, the more likely it is that some link breaks and the whole process fails. Once you identify these chains, think about ways to break them up. This is general advice for improving the reliability of your system.
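A quick back-of-the-envelope calculation shows why long chains hurt. Assume, purely for illustration, that each link works 99% of the time and that links fail independently of one another. The whole chain then works only when every link does, so the probabilities multiply:

```python
import math


def chain_reliability(link_reliabilities):
    """Probability that every link works, assuming independent failures."""
    return math.prod(link_reliabilities)


# Illustrative numbers only: each link is assumed to work 99% of the time.
for links in (1, 3, 5, 10):
    print(links, round(chain_reliability([0.99] * links), 3))
```

Under these assumptions, a chain of ten links works only about 90% of the time, even though each individual link looks very reliable. Real failures are rarely fully independent, so treat this as an intuition aid, not a prediction.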
Follow along on the next page where I discuss how I implemented automated analysis and alerting based on heartbeat health checks.