# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is an Ansible-based infrastructure-as-code repository for managing the liz.coffee homelab infrastructure. It orchestrates deployment of services across a Docker Swarm cluster (3 nodes: swarm-one, swarm-two, swarm-three) and an outbound proxy server.

## Architecture

### Infrastructure Layout

- **Swarm Cluster**: 3-node Docker Swarm cluster at 10.128.0.201-203
  - Primary services deployed as Docker Swarm stacks
  - Shared Ceph storage mounted across all nodes
  - Keepalived for high availability
  - Traefik as the ingress controller with automatic TLS via Let's Encrypt

- **Outbound Proxy**: External-facing NGINX reverse proxy (outbound-two.liz.coffee)
  - Routes external traffic to internal services via the swarm load balancer
  - Uses docker-compose instead of swarm stacks

### Service Deployment Patterns

Services fall into two deployment models:

1. **Docker Swarm Services** (most services): Use `tasks/manage-docker-swarm-service.yml`
   - Deployed via `docker stack deploy`
   - Templates rendered from `playbooks/roles/{service}/templates/`
   - Health checks and rolling updates configured in docker-compose.yml
   - Traefik labels for automatic routing and TLS

2. **Docker Compose Services** (nginx_proxy, outbound): Use `tasks/manage-docker-compose-service.yml`
   - Deployed via systemd service `docker-compose@{service}`
   - Supports zero-downtime rollouts via the docker-rollout tool
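
On the target hosts the two models reduce to different commands. A minimal sketch — the `myservice` stack name and compose file path are hypothetical placeholders; `docker-compose@outbound` follows the systemd unit naming described above:

```shell
# Swarm model: deploy or update a stack from its rendered compose file
# ("myservice" is a hypothetical stack name)
docker stack deploy --resolve-image=always -c docker-compose.yml myservice

# Compose model: the lifecycle is owned by the templated systemd unit
systemctl restart docker-compose@outbound
```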

### Common Task Files

- `tasks/manage-docker-swarm-service.yml`: Renders templates and deploys swarm stack
- `tasks/manage-docker-compose-service.yml`: Renders templates, manages systemd service, performs rollouts
- `tasks/copy-rendered-templates-recursive.yml`: Recursively renders Jinja2 templates and copies them to the destination

## Key Commands

### Vault Management

Initialize or update vault secrets:
```bash
./ansible-vault-init.sh [secret_name]
```
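
Beyond the init script, routine vault maintenance can go through `ansible-vault` directly (assuming `secrets.pwd` and `secrets.enc` exist at the repo root):

```shell
# View decrypted secrets without writing plaintext to disk
ansible-vault view --vault-password-file secrets.pwd secrets.enc

# Edit in $EDITOR; the file is re-encrypted automatically on save
ansible-vault edit --vault-password-file secrets.pwd secrets.enc
```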

To avoid password prompts, store the vault password in `secrets.pwd`:
```bash
echo "your_password" > secrets.pwd
```
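
Since `secrets.pwd` holds the vault password in plaintext, it is worth locking down. A sketch building on the snippet above — whether the repo's `.gitignore` already covers the file is an assumption, but the guard makes the step idempotent either way:

```shell
# Create the password file (placeholder password), then lock it down
echo "your_password" > secrets.pwd
chmod 600 secrets.pwd

# Make sure it can't be committed (no-op if already listed)
grep -qxF 'secrets.pwd' .gitignore 2>/dev/null || echo 'secrets.pwd' >> .gitignore
```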

### Deployment

Full deployment (all services in order):
```bash
ansible-playbook -e @secrets.enc --vault-password-file secrets.pwd deploy.yml
```

Deploy a specific playbook during development:
```bash
ansible-playbook -e @secrets.enc --vault-password-file secrets.pwd playbooks/{service}.yml
```
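
A dry run before a real deployment can catch template and variable errors early. `--check --diff` are standard `ansible-playbook` flags, though tasks that shell out (e.g. `docker stack deploy`) may not report meaningful changes in check mode:

```shell
ansible-playbook -e @secrets.enc --vault-password-file secrets.pwd \
  --check --diff playbooks/{service}.yml
```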

### Linting

```bash
yamllint --strict .
ansible-lint
```

### Creating New Services

Use the `create.py` script to scaffold a new service:
```bash
./create.py --service-name myservice --container-image myimage:latest --service-port 8080 [--external] [--internal]
```

This generates:
- Ansible role in `playbooks/roles/{service}/`
- Docker compose template with Traefik labels
- Group vars in `group_vars/{service}.yml`
- Inventory entry and playbook hook in `deploy.yml`
- NGINX config (if `--external` specified)
- DNS records (Cloudflare if `--external`, LabDNS if `--internal`)

## File Organization

- `inventory`: Ansible inventory defining host groups and connection details
- `deploy.yml`: Master playbook importing all service playbooks in deployment order
- `playbooks/`: Individual service playbooks
- `playbooks/roles/`: Service-specific roles containing tasks and templates
  - `{service}/tasks/main.yml`: Task entry point
  - `{service}/templates/`: Jinja2 templates (docker-compose.yml, configs, etc.)
- `group_vars/`: Variables per service/host group
- `secrets.enc`: Ansible vault encrypted secrets
- `ansible.cfg`: Ansible configuration (inventory path, SSH settings)

## Variable Conventions

Each service typically defines the following in `group_vars/{service}.yml`:
- `{service}_domain`: FQDN for the service
- `{service}_base`: Base directory path on swarm nodes (usually under `{{ swarm_base }}`)

Common variables available across all playbooks:
- `deployment_time`: Timestamp of deployment (forces container recreation)
- `timezone`: System timezone
- `homelab_build`: Boolean indicating local vs production deployment
- `loadbalancer_ip`: Internal VIP for the swarm cluster
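
To confirm what a host actually resolves one of these variables to, an ad-hoc `debug` call works. A sketch — it assumes the inventory host name `swarm-one` from this repo and that the variable is defined in group vars loaded by the inventory:

```shell
ansible -e @secrets.enc --vault-password-file secrets.pwd \
  -m debug -a "var=loadbalancer_ip" swarm-one
```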

## Important Notes

- Most services use swarm-one (10.128.0.201) as the deployment target in inventory
- Secrets are referenced as `{{ secret_name }}` from the vault
- All swarm services should connect to the `proxy` external network for Traefik routing
- Use `--resolve-image=always` in stack deploys to ensure the latest images are pulled
- The outbound role manages NGINX configs in `playbooks/roles/outbound/templates/proxy/nginx/conf.d/`