Every great system starts with a deliberate choice of tools. As I begin this journey, I want to share the foundational stack that will underpin the systems we’ll design here.
Software engineering is dynamic, and these choices aren’t set in stone. However, part of a Systems Architect’s job is to choose tools that provide the best balance of reliability, maintainability, and speed. These are my choices for the projects ahead.
Terraform: Infrastructure as Code (IaC)
For provisioning, I’ll be using Terraform. While a developer might manually spin up a server to “get it working”, a Systems Architect views infrastructure as a versioned, repeatable asset.
Why Terraform?
- Declarative Blueprints: Describe the desired state of infrastructure. To replicate an environment, rerun the code instead of relying on memory.
- Provider Ecosystem: Manage multiple services (AWS, Azure, Cloudflare, etc.) through a single workflow.
- State Management: Track resource state so each change is planned against what already exists and configuration drift can be detected early.
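To make the declarative idea concrete, here is a toy sketch in Python of the "compare desired state with recorded state, then plan the difference" loop. It is not how Terraform is implemented; the `Plan` class, the `plan` function, and the resource names are all hypothetical and only illustrate the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    create: dict  # resources that should exist but are not recorded yet
    update: dict  # resources whose recorded settings differ from the desired ones
    delete: list  # resources that are recorded but no longer desired


def plan(desired: dict, actual: dict) -> Plan:
    """Compare desired resources with recorded state and list the changes needed."""
    create = {k: v for k, v in desired.items() if k not in actual}
    update = {k: v for k, v in desired.items() if k in actual and actual[k] != v}
    delete = sorted(k for k in actual if k not in desired)
    return Plan(create=create, update=update, delete=delete)


# Hypothetical resources, keyed by name.
desired = {"vpc-main": {"cidr": "10.0.0.0/16"}, "web-1": {"size": "small"}}
actual = {"web-1": {"size": "medium"}, "old-db": {"size": "large"}}

result = plan(desired, actual)
print(result)
```

When desired and recorded state already match, the plan comes back empty, which is the property that makes re-running the same definition safe.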
Ansible: Configuration and Orchestration
If Terraform builds the house, Ansible moves the furniture in. I’ll use Ansible to manage the internal state of servers and services.
Why Ansible?
- Idempotency: A well-written playbook can run repeatedly and only changes what’s needed.
- Agentless Simplicity: Uses SSH — avoids installing and managing agents across servers, which reduces operational burden and shrinks the attack surface.
- Consistency: Every node becomes a predictable clone, reducing “it works on my machine” failures.
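Idempotency is easier to see than to describe. The snippet below is a minimal Python sketch, not Ansible code: `ensure_line` is a hypothetical helper that adds a line only when it is missing and reports whether it changed anything, similar in spirit to the "changed" status Ansible reports per task.

```python
def ensure_line(lines: list[str], line: str) -> tuple[list[str], bool]:
    """Return (new_lines, changed). Adds `line` only if it is missing."""
    if line in lines:
        return lines, False
    return lines + [line], True


# Hypothetical config contents.
config = ["Port 22"]
config, changed_first = ensure_line(config, "PermitRootLogin no")
config, changed_second = ensure_line(config, "PermitRootLogin no")

print(changed_first, changed_second)  # True False
```

The second run is a no-op, so running it a hundred times leaves the same two lines in place.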
Provisioning vs. Configuration: Where the Lines Are Drawn
| Feature | Terraform (The Foundation) | Ansible (The Interior) |
|---|---|---|
| Primary Goal | Provisioning: Create infrastructure (servers, VPCs, DBs). | Configuration: Set up software and app settings. |
| Model | Declarative: define what should exist. | Procedural/Hybrid: tasks run in order, though most modules declare a target state. |
| State / Source of Truth | Stateful: the .tfstate file represents reality. | Stateless: the live system state is the truth. |
Note: These lines can blur. My rule of thumb is: Terraform owns infrastructure lifecycle; Ansible owns OS/app configuration.
Eraser.io + Mermaid: sketch fast, version the truth
Architecture only works if it can be communicated — a design you can’t explain can’t be built, operated, or improved reliably.
For system design and communication, I use Eraser.io for speed and clarity — especially early on. I’ll often jump on a Teams call, share my screen, and sketch ideas live so we can align quickly.
Once a design becomes real (or needs to live alongside the code), I translate the diagram into Mermaid so it’s:
- Versioned with the repository (PR-reviewed like everything else)
- Easy to update as the architecture evolves
- Portable across docs, READMEs, and internal write-ups
Here’s a small example of the kind of diagram I’ll keep in-repo: a single flow showing where I draw the line between provisioning and configuration.
Demo: provisioning vs configuration in one flow
sequenceDiagram
autonumber
participant SA as Systems Architect
participant TF as Terraform
participant Cloud as Cloud Provider (AWS/Azure)
participant Servers as Compute (Servers)
participant AN as Ansible
participant App as Application
SA->>TF: Apply IaC (desired state)
TF->>Cloud: Provision VPC/Network
TF->>Servers: Provision VM Instances
Note over TF,Servers: Output: IP Addresses/Tags
SA->>AN: Run playbook (targeted by Tags/IP)
AN->>Servers: Configure OS/Packages
AN->>App: Deploy Code & Config
AN->>App: Restart Service
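The hand-off in the middle of that flow (Terraform outputs feeding Ansible targeting) can be sketched in a few lines of Python. This is one possible glue step, not a prescribed one. It assumes the JSON shape printed by `terraform output -json`, where each output name maps to an object with a `value` field; the output name `web_ips`, the group name `web`, and the addresses are hypothetical.

```python
import json


def inventory_from_outputs(outputs_json: str, output_name: str, group: str) -> str:
    """Build a minimal INI-style Ansible inventory from Terraform outputs."""
    outputs = json.loads(outputs_json)
    ips = outputs[output_name]["value"]  # assumed to be a list of IP strings
    return "\n".join([f"[{group}]", *ips]) + "\n"


# Stand-in for the text captured from `terraform output -json`.
sample = json.dumps(
    {"web_ips": {"sensitive": False, "value": ["10.0.1.10", "10.0.1.11"]}}
)

inventory = inventory_from_outputs(sample, "web_ips", "web")
print(inventory)
```

Written to a file, the result could be passed to `ansible-playbook -i`. A dynamic inventory plugin is another common way to do the same job; this sketch only shows where the boundary sits.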
The Systems Architect’s Filter: Questions to Guide Design
Tools alone don’t create great systems; the architectural “-ilities” (scalability, reliability, maintainability, and so on) do. For every project on this blog, I filter decisions through these eight questions. This is how we move from “making it work” to “making it resilient.”
1) What are the core requirements?
What is the minimum we need? Before touching code, define the problem and solve it with the least complexity possible.
- The Systems Architect view: What is the Minimum Viable Architecture (MVA)? Are we over-engineering?
- The goal: Avoid resume-driven development. Build what’s needed—no more, no less.
2) What are the constraints?
Constraints shape architecture more than preferences do.
- The Systems Architect view: What are the hard limits—time, budget, team size, hosting, data sensitivity, compliance, and operational support?
- The goal: Make constraints explicit so trade-offs are intentional, not accidental.
3) Are we adding components because they’re useful — or because they’re trendy?
New tools are tempting. AI components are especially easy to shoehorn in because they feel innovative.
- The Systems Architect view: What problem does this solve? What’s the simplest alternative? What failure modes, security risks, operational burden, and vendor dependencies does it introduce?
- The goal: Avoid hype-driven architecture. If it doesn’t earn its place with measurable value, we don’t use it.
4) Where are the bottlenecks?
A system is only as fast as its slowest component.
- The Systems Architect view: If we hit 500 concurrent requests, will the DB pool exhaust? Are services so tightly coupled that one slow call stalls the whole request path, a “distributed monolith”?
- The goal: Identify the critical path and plan for pressure (caching, load balancing, queueing, async processing).
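A quick way to sanity-check the pool question is Little's law: average connections in use equals request rate multiplied by the average time each request holds a connection. The Python sketch below applies it. The traffic numbers and the 70% headroom threshold are assumptions for illustration, and the result is an average, so bursts will need more.

```python
def connections_in_use(requests_per_second: float, hold_seconds: float) -> float:
    """Little's law: average busy connections = arrival rate x average hold time."""
    return requests_per_second * hold_seconds


def pool_is_sufficient(
    pool_size: int,
    requests_per_second: float,
    hold_seconds: float,
    headroom: float = 0.7,  # assumption: keep average usage under 70% of the pool
) -> bool:
    busy = connections_in_use(requests_per_second, hold_seconds)
    return busy <= pool_size * headroom


# Hypothetical load: 1000 requests/s, each holding a connection for 50 ms.
busy = connections_in_use(1000, 0.05)
print(busy)
print(pool_is_sufficient(20, 1000, 0.05))
print(pool_is_sufficient(100, 1000, 0.05))
```

With these numbers roughly 50 connections are busy on average, so a pool of 20 would queue requests while a pool of 100 leaves room to spare.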
5) Will this scale?
Scaling isn’t just “add more servers.” It’s about the scaling efficiency of the architecture.
- The Systems Architect view: Where is state stored? If I double hardware, do I get double throughput — or hit a locked resource?
- The goal: Prefer horizontal scaling. Components should be replicable and replaceable without manual re-wiring.
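One rough way to reason about the "double hardware, double throughput?" question is Amdahl's law, which caps speedup by the fraction of work that cannot run in parallel (a shared lock, a single writer). The sketch below is a back-of-the-envelope model, not a capacity plan; the 90% parallel fraction is a made-up figure.

```python
def speedup(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: speedup from `nodes` workers when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


# Hypothetical workload where 90% of the work can be spread across nodes.
for nodes in (1, 2, 4, 8):
    print(nodes, round(speedup(0.9, nodes), 2))
```

Under that assumption, going from 4 to 8 nodes yields about 4.7x rather than 8x, and no amount of hardware gets past 10x. The serialized 10% is where the design attention belongs.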
6) What breaks, and how do we recover?
Failure is inevitable. Resilience is designed.
- The Systems Architect view: What are the likely failure modes (dependency outage, slow DB, bad deploy, malformed input, credential expiry)? What’s the blast radius? What is the rollback plan?
- The goal: Reduce downtime with graceful degradation, retries with backoff, circuit breakers where appropriate, and clear runbooks.
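Here is a minimal sketch of "retries with backoff" in Python, assuming the failing dependency raises `ConnectionError`. The function name, default delays, and the `flaky` stand-in are hypothetical. The sleep function is injectable so the behaviour can be checked without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The cap doubles each attempt; random jitter avoids clients retrying in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky() -> str:
    """Stand-in dependency that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("dependency unavailable")
    return "ok"


delays: list[float] = []
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, calls["count"], len(delays))  # ok 3 2
```

Retries only help with transient faults. For a dependency that stays down, a circuit breaker or a degraded response is usually the better answer.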
7) How secure is this system?
Security is not a feature you add at the end—it’s a system property.
- The Systems Architect view: Are we using least privilege? Are network boundaries tight in IaC? Are secrets stored and rotated properly?
- The goal: Security by design: protect data at rest and in transit from day one.
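"Tight network boundaries" can be checked mechanically. The Python sketch below flags ingress rules that expose sensitive ports to the whole internet. The rule format (dicts with `cidr` and `port`) and the list of sensitive ports are assumptions made up for this example, not any provider's schema.

```python
import ipaddress

# Assumption: ports we never want open to the world (SSH, RDP, PostgreSQL).
SENSITIVE_PORTS = {22, 3389, 5432}


def open_to_world(rules: list[dict]) -> list[dict]:
    """Return the rules that allow any address to reach a sensitive port."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        # A prefix length of 0 means 0.0.0.0/0 or ::/0, i.e. every address.
        if network.prefixlen == 0 and rule["port"] in SENSITIVE_PORTS:
            findings.append(rule)
    return findings


# Hypothetical ingress rules.
rules = [
    {"cidr": "0.0.0.0/0", "port": 443},
    {"cidr": "0.0.0.0/0", "port": 22},
    {"cidr": "10.0.0.0/16", "port": 5432},
]

findings = open_to_world(rules)
print(findings)
```

A check like this fits naturally into a PR pipeline, so an overly broad rule is caught in review rather than in production.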
8) How do we monitor, operate, and maintain it?
If it breaks at 3 AM, the system must tell us why, not just that it broke.
- The Systems Architect view: Do we have correlation IDs? Structured logs? SLOs and alerts tied to user impact rather than noise?
- The goal: Reduce MTTR by making the system transparent, measurable, and operable.
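As a small illustration of structured logs with correlation IDs, here is a sketch using Python's standard `logging` module. The logger name `orders`, the field names, and the log messages are hypothetical. In a real service the ID would typically arrive on the incoming request instead of being generated on the spot.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Set through `extra=`; None when the caller did not pass one.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)

correlation_id = str(uuid.uuid4())
log.info("payment authorized", extra={"correlation_id": correlation_id})
log.error("inventory reservation failed", extra={"correlation_id": correlation_id})

print(stream.getvalue())
```

Because both lines carry the same `correlation_id`, one search reconstructs the path of a single request across log lines, which is what shortens the 3 AM investigation.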
Closing Thoughts
With these tools and principles, I’m ready to start designing systems that are scalable, efficient, and understandable. These choices are a strong starting point, but the real value is in how they’re applied to solve real problems.