Real-Time Monitoring with Netdata

Tags: DevOps, Homelab, Observability, Kubernetes

Published Jan. 25, 2026, 9:19 a.m. by wielandtech

I've been running a Kubernetes homelab for a while now, and while I've written about the setup before, I recently added something new that I'm pretty excited about: live cluster metrics on the homelab page.

What's New

The homelab page used to be pretty static--just a list of what's running and some hardware specs. Now it shows real-time metrics pulled directly from my cluster. CPU usage, memory, network traffic, disk I/O, temperatures, pod counts, deployment health, and uptime--all updating every second.

It's powered by Netdata agents running on each of my three nodes. I've been using Netdata alongside Prometheus and Grafana for a while, but I never exposed the data publicly before. The API is straightforward, and it gives me exactly what I need without the overhead of querying Prometheus for simple metrics.

The Metrics

Here's what you'll see on the page:

CPU Utilization: Aggregated across all nodes with a percentage and total core count
Memory: Cluster-wide usage in GB with a percentage
Network: Bandwidth in Mbps (sent and received)
Disk I/O: Read/write speeds aggregated across nodes
Temperature: Average and peak CPU temps
Pods: Real-time count of running pods
Deployments: Healthy vs total deployments
Uptime: Time since the last reboot (using the minimum across nodes)

Everything updates every second, which feels pretty snappy. There are visual indicators too--warnings when CPU or memory gets high, critical states when things are really stressed.
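The warning and critical indicators boil down to a threshold check. Here is a minimal sketch of that idea; the post doesn't say which cutoffs the page uses, so the 75% and 90% values and the function name are assumptions for illustration only.

```python
from typing import Literal

Status = Literal["ok", "warning", "critical"]

# Hypothetical thresholds; the real page may use different values.
WARNING_PCT = 75.0
CRITICAL_PCT = 90.0


def classify_usage(percent: float) -> Status:
    """Map a CPU or memory percentage to a display state."""
    if percent >= CRITICAL_PCT:
        return "critical"
    if percent >= WARNING_PCT:
        return "warning"
    return "ok"
```

Checking the critical threshold first keeps the function correct even if the two cutoffs are tuned later.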

Why Netdata?

I already had Netdata running for troubleshooting, so it made sense to use it for this. The API is clean, it handles aggregation well, and I can query both system metrics and Kubernetes state through the k8s_state collector. Plus, it's already collecting everything I need.

The implementation was pretty straightforward. I'm using Netdata's API v1 for per-node system metrics (CPU, memory, disk I/O, uptime) and API v2 for cluster-wide aggregates like network traffic and Kubernetes metrics. The trickiest part was aggregating metrics across multiple nodes--especially for things like network traffic where I need to sum values, versus CPU where I want an average.
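The sum-versus-average distinction is easy to show in a few lines. This is a sketch under assumptions, not the site's actual code: the dictionary keys and the function name are made up, and it only covers CPU (average), network (sum), and uptime (minimum, as described in the metrics list).

```python
from statistics import mean


def aggregate_cluster(nodes: list[dict[str, float]]) -> dict[str, float]:
    """Combine per-node readings into cluster-wide values.

    Each metric gets the reduction that matches its meaning:
    percentages are averaged, throughput is summed, and uptime
    takes the minimum so one recently rebooted node is visible.
    """
    if not nodes:
        raise ValueError("need at least one node reading")
    return {
        "cpu_percent": mean(n["cpu_percent"] for n in nodes),
        "net_sent_mbps": sum(n["net_sent_mbps"] for n in nodes),
        "net_recv_mbps": sum(n["net_recv_mbps"] for n in nodes),
        "uptime_seconds": min(n["uptime_seconds"] for n in nodes),
    }
```

A plain average treats every node equally. If the nodes had different core counts, weighting the CPU figure by cores would arguably be more accurate.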

I also added a weather station page while I was at it. It pulls data from my Home Assistant setup via Prometheus, showing current conditions, an interactive wind compass, and historical charts. The data flows through Home Assistant → Prometheus → my Django API, updating every minute.
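For the Prometheus hop in that chain, an instant query against /api/v1/query returns a JSON envelope with one sample per series. The sketch below parses that envelope; the metric name, the "entity" label, and the sample payload are illustrative assumptions, not values taken from the actual setup.

```python
import json


def latest_values(response_text: str) -> dict[str, float]:
    """Extract {entity: value} from a Prometheus instant-query response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    values: dict[str, float] = {}
    for series in payload["data"]["result"]:
        # "entity" is an assumed label name; adjust to the real labels.
        name = series["metric"].get("entity", "unknown")
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, raw = series["value"]
        values[name] = float(raw)
    return values


# Made-up payload in the shape of an instant-vector result.
SAMPLE = """{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "weather_temperature_celsius",
              "entity": "sensor.outdoor_temperature"},
   "value": [1769332740, "3.5"]}
]}}"""

print(latest_values(SAMPLE))
```

Prometheus sends sample values as strings, so the float conversion is needed before doing any math on them.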

Technical Bits

The backend is a Django endpoint (/api/metrics/) that queries Netdata, aggregates the data, and caches it for one second. The frontend just fetches every second and updates the display. I'm using Netdata's context-based queries for Kubernetes metrics, which makes it easy to get pod counts and deployment status.
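The one-second cache is what keeps many browsers polling at once from turning into many Netdata queries. In a Django view the built-in cache framework would be the natural fit; the sketch below uses only the standard library so the idea stands on its own. The class name and the injectable clock are assumptions added for illustration and testability.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Hold one value and recompute it only after it expires."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at = float("-inf")  # forces a compute on first use

    def get(self, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        if now >= self._expires_at:
            self._value = compute()
            self._expires_at = now + self._ttl
        return self._value


metrics_cache = TTLCache(ttl_seconds=1.0)
# Within any one-second window, only the first call runs the lambda.
snapshot = metrics_cache.get(lambda: {"cpu_percent": 12.5})
```

Using a monotonic clock rather than wall-clock time means the expiry is unaffected by system clock adjustments.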

One thing I learned: Netdata's API v1 and v2 handle things differently. V1 is great for per-node system metrics, but v2's context aggregation is much better for cluster-wide views. The k8s_state collector is particularly handy for Kubernetes metrics.
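To make the v1 side concrete, here is a rough sketch of asking one node for the latest point of a chart and totalling its dimensions. It covers v1 only. The host name and the sample payload are made up, and the response shape assumed here is the plain "json" format with a labels list and a data list of rows.

```python
from urllib.parse import urlencode


def v1_data_url(base_url: str, chart: str) -> str:
    """Build a v1 query for the most recent single point of a chart."""
    query = urlencode({"chart": chart, "after": -1, "points": 1, "format": "json"})
    return f"{base_url.rstrip('/')}/api/v1/data?{query}"


def latest_total(payload: dict) -> float:
    """Sum every dimension in the single returned row, skipping the timestamp."""
    labels = payload["labels"]
    row = payload["data"][0]
    return sum(
        value
        for label, value in zip(labels, row)
        if label != "time" and value is not None
    )


# Hypothetical node address; 19999 is Netdata's default port.
url = v1_data_url("http://node1:19999", "system.cpu")

# Made-up response in the assumed shape.
sample = {"labels": ["time", "user", "system"], "data": [[1769332740, 2.5, 1.5]]}
print(url, latest_total(sample))
```

Running this once per node and feeding the results into a cluster-level average is the per-node pattern that v2's context aggregation replaces with a single request.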

Error handling was important too. If Netdata is unreachable, the page falls back to cached data so it doesn't just break. Same approach for the weather data.
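The fallback behaviour can be sketched as a small wrapper that remembers the last successful result. The class name and the stale flag are assumptions for illustration. Catching OSError covers connection failures and timeouts raised by the standard library, including urllib.error.URLError.

```python
from typing import Any, Callable


class StaleFallback:
    """Serve the last good result when a fresh fetch fails."""

    def __init__(self) -> None:
        self._last_good: Any = None

    def fetch(self, fetcher: Callable[[], Any]) -> tuple[Any, bool]:
        """Return (data, is_stale)."""
        try:
            data = fetcher()
        except OSError:
            if self._last_good is None:
                raise  # nothing cached yet, so surface the error
            return self._last_good, True
        self._last_good = data
        return data, False
```

Returning the stale flag alongside the data lets the frontend show that the numbers are old instead of silently freezing.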

Check It Out

You can see the live metrics on the homelab page and the weather station. The metrics update in real time, so you'll see the cluster doing its thing. It's not production-grade monitoring, but it's a fun way to show what's happening in the homelab.

I'm thinking about adding more metrics or maybe some historical charts next. For now, it's nice to have a public view of what the cluster is up to.
