Building the Foundation: A Deep Dive into Our Infrastructure Journey
In the world of modern development, it is easy to get caught up in the flashy final product: the user interface, the slick animations, or the latest feature set. But those of us who live in the terminal know that the real magic happens in the shadows. Over the past year, we have been quietly obsessing over a complete overhaul of our infrastructure. We haven’t just been building applications; we’ve been building a fortress to house them.
The philosophy behind our current setup is simple: decentralized power with centralized control. We wanted an environment that was resilient enough to survive a node failure, transparent enough to monitor in real-time, and flexible enough to deploy new services without manual intervention. To get there, we had to move away from the "one big server" model and embrace a multi-VM architecture that separates our core logic from our data and monitoring tiers.
The Architecture of the "Master Stack"
At the heart of our operations is the Master VM. This is the orchestrator. Using Docker Compose, we have moved toward a container-first workflow. This allows us to maintain a "Zero-Footprint" host. If we need to move our entire stack to a new provider tomorrow, we can do it in minutes rather than days.
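To make that concrete, here is a minimal sketch of what a Compose manifest on a master VM like this might look like. The service names and images below are placeholders for illustration, not our actual stack:

```yaml
# Hypothetical docker-compose.yml for the Master VM.
# Service names and images are illustrative placeholders.
services:
  blog:
    image: ghcr.io/example/blog:latest    # placeholder image
    restart: unless-stopped
    networks:
      - proxy

  docs:
    image: ghcr.io/example/docs:latest    # placeholder image
    restart: unless-stopped
    networks:
      - proxy

networks:
  proxy:
    external: true    # shared with the reverse proxy container
```

Because every service lives in a container attached to a shared network, moving the stack is essentially a matter of copying the Compose files and volumes to the new host and running `docker compose up -d`.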
The Master VM handles our primary entry points. We’ve implemented a robust reverse proxy layer using Nginx Proxy Manager (NPM). This serves as our digital gatekeeper, handling SSL termination through Let's Encrypt and ensuring that traffic is routed efficiently to the correct internal containers. This layer doesn’t just provide security; it provides a professional, unified front for every service we host, from internal documentation to public-facing blogs.
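An NPM deployment in this style can be sketched with the project's standard Compose setup (paths and ports here follow NPM's documented quick start; adjust to taste):

```yaml
# Minimal Nginx Proxy Manager sketch, per the NPM quick-start layout.
services:
  npm:
    image: jc21/nginx-proxy-manager:latest
    restart: unless-stopped
    ports:
      - "80:80"      # HTTP, typically redirected to HTTPS per host
      - "443:443"    # HTTPS; SSL is terminated here via Let's Encrypt
      - "81:81"      # NPM admin UI
    volumes:
      - ./data:/data                      # NPM configuration and database
      - ./letsencrypt:/etc/letsencrypt    # issued certificates
    networks:
      - proxy

networks:
  proxy:
    external: true    # internal containers join this network to receive traffic
```

Proxy hosts, SSL certificates, and access rules are then managed through the admin UI on port 81 rather than hand-edited Nginx config files.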
Separating the Data Layer
One of the biggest shifts in our infrastructure work was the decision to move our databases off the primary application server. We now run them on a dedicated database VM (VM3). In our earlier days, having a database live in a container on the same machine as the app was fine for development, but it created a single point of failure and resource contention.
By hosting our PostgreSQL and MongoDB instances on a separate, hardened VM, we have achieved a much higher level of data integrity. This separation allows us to perform maintenance on the application stack without touching the raw data. It also allows us to tune the kernel of the database VM specifically for high disk I/O, ensuring that as our services grow, our data access remains lightning-fast.
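A dedicated database VM in this shape might run a Compose file like the following. The versions and credentials are placeholders; the volume paths are the standard data directories for each image:

```yaml
# Sketch of the database VM's docker-compose.yml.
# Versions and passwords are placeholders; use real secrets in practice.
services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: change-me    # placeholder; inject via secrets/env file
    volumes:
      - pg_data:/var/lib/postgresql/data    # Postgres data directory
    ports:
      - "5432:5432"    # in practice, bind only to the private/tunnel interface

  mongo:
    image: mongo:7
    restart: unless-stopped
    volumes:
      - mongo_data:/data/db    # MongoDB data directory
    ports:
      - "27017:27017"

volumes:
  pg_data:
  mongo_data:
```

Keeping the data in named volumes means the application VMs can be rebuilt or replaced at will while the database tier stays untouched.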
Security and Identity: The Authentik Integration
Infrastructure is nothing without security. We’ve spent a significant amount of time integrating Authentik into our core stack. We believe that "security through obscurity" is a myth, so we’ve opted for a robust Single Sign-On (SSO) and OpenID Connect (OIDC) framework.
Nearly every service we run is now gated behind this identity provider. This means we have a centralized dashboard to manage permissions, MFA (Multi-Factor Authentication), and user access. Whether we are accessing an internal monitoring tool or a private management interface, the experience is seamless and secure. By using OIDC, we’ve eliminated the "password fatigue" that comes with managing twenty different sets of credentials, all while significantly narrowing our attack surface.
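As one example of how a service plugs into this, here is a sketch of Grafana configured to authenticate against Authentik over generic OAuth/OIDC. The environment variables are Grafana's documented OAuth settings, and the endpoint paths follow Authentik's standard layout, but the hostname, client ID, and secret are placeholders:

```yaml
# Hypothetical sketch: Grafana as an OIDC client of Authentik.
# auth.example.com, the client ID, and the secret are placeholders.
services:
  grafana:
    image: grafana/grafana:latest
    restart: unless-stopped
    environment:
      GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
      GF_AUTH_GENERIC_OAUTH_NAME: "Authentik"
      GF_AUTH_GENERIC_OAUTH_CLIENT_ID: "grafana"         # placeholder
      GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET: "change-me"   # placeholder; use secrets
      GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email"
      GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://auth.example.com/application/o/authorize/"
      GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://auth.example.com/application/o/token/"
      GF_AUTH_GENERIC_OAUTH_API_URL: "https://auth.example.com/application/o/userinfo/"
```

Every other service gets the same treatment: one provider, one set of credentials, MFA enforced in a single place.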
Transparency Through Observability
You cannot fix what you cannot see. Perhaps the work we are most proud of is our observability stack. We’ve deployed a comprehensive monitoring suite consisting of Prometheus, Grafana, Loki, and Promtail.
- Prometheus acts as our time-series database, scraping metrics from our containers and host machines to give us a heartbeat of our CPU, RAM, and network health.
- Grafana serves as our "mission control," where we’ve designed custom dashboards that give us a bird's-eye view of the entire network.
- Loki and Promtail handle our log aggregation. Instead of SSH-ing into three different machines to check docker logs, we now have a centralized, searchable stream of every log produced across our entire infrastructure.
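The plumbing behind that pipeline is small. Below is a sketch of the two key config files: a Prometheus scrape block pulling node metrics, and a Promtail config tailing container logs into Loki. Hostnames and paths are placeholders; the field names are the standard ones from each tool's configuration reference:

```yaml
# prometheus.yml (sketch) — scrape node_exporter on each VM.
# Target hostnames are placeholders.
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - master-vm:9100
          - db-vm:9100

---
# promtail-config.yml (sketch, shown here for brevity) —
# ship Docker's JSON log files to Loki.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml    # where Promtail remembers its read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push    # Loki push endpoint
scrape_configs:
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
```

Once Promtail is running on each VM, a single Grafana query replaces the old routine of SSH-ing into every box to run docker logs by hand.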
This level of transparency has changed how we develop. We no longer guess why a service crashed; we look at the Grafana spikes and the Loki logs to see exactly what happened in the seconds leading up to the failure.
The Road Ahead
Building this infrastructure hasn't been without its headaches. We’ve battled database migration loops, permission issues with persistent volumes, and the complexities of cross-VM networking over encrypted tunnels like Tailscale. But each hurdle has resulted in a more stable environment.
We didn't build this infra just to say we have it. We built it because a solid foundation is a prerequisite for innovation. By automating our deployments, securing our identity layer, and making our system transparent, we’ve freed up our mental energy to focus on what actually matters: building great things. The "boring" work of configuring YAML files and tuning database parameters is exactly what makes the "exciting" work possible.
Welcome to the new era of our digital home. It’s clean, it’s fast, and it’s built to last.
-Bray