Architecture
This app is a proof of concept (PoC) for federated/distributed training, built around two components: a central coordinator and distributed workers.
Components
Server / Coordinator (FastAPI)
- Hosts the REST API on port 8000
- Assigns and shards data (CIFAR-10 baseline) across registered workers
- Aggregates model updates using FedAvg (Federated Averaging)
- Tracks metrics (loss, accuracy) per round
- Manages training state: round progression, worker registration, result collection
- Runs on macOS or Linux — CPU-only is fine for the coordinator
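The coordinator's bookkeeping boils down to tracking registered workers, the current round, and which results have arrived. A minimal sketch of that state machine is below; the class and method names are hypothetical (the real server wraps this logic in FastAPI routes), but it shows how round progression can follow naturally from result collection:

```python
from dataclasses import dataclass, field

@dataclass
class CoordinatorState:
    """Hypothetical sketch of the coordinator's round/worker bookkeeping."""
    round_num: int = 0
    workers: dict = field(default_factory=dict)   # worker_id -> hardware info
    results: dict = field(default_factory=dict)   # worker_id -> update this round

    def register(self, worker_id, hw_info):
        """Record a worker and the hardware it announced."""
        self.workers[worker_id] = hw_info

    def submit(self, worker_id, update):
        """Collect one worker's update; advance the round when all have reported.

        Returns True when the round just completed (aggregation — FedAvg —
        would run at that point), False otherwise.
        """
        self.results[worker_id] = update
        if len(self.results) == len(self.workers):
            self.round_num += 1
            self.results.clear()
            return True
        return False
```

This keeps all training state in memory, which is adequate for a PoC; a production coordinator would also need timeouts for stragglers and persistence across restarts.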
Workers
Each worker runs on a separate machine (Ubuntu recommended for GPU support):
- Register — worker connects to the coordinator and announces itself (hardware info, GPU type)
- Poll — worker checks if there’s work to do (a new round)
- Download — worker pulls the current global model
- Train — worker trains locally on its assigned data shard for N epochs
- Submit — worker sends model updates (gradients/weights) back to the coordinator
- Repeat — worker loops back to polling for the next round
Workers can be:
- NEW GPU (RTX 20/30/40, A100, H100) — current PyTorch + CUDA
- LEGACY GPU (GTX 1080 / 1080 Ti) — pinned legacy Torch stack
- CPU-only — slower but functional
Networking
Machines communicate over Tailscale (100.x addresses). This avoids NAT issues and keeps the multi-node setup reproducible across different network environments.
See the Tailscale networking guide for setup instructions.
FedAvg algorithm
The coordinator implements Federated Averaging:
- Initialize — coordinator creates a global model
- Distribute — coordinator sends the global model to all workers
- Local training — each worker trains on its local data shard
- Aggregate — coordinator collects updates and averages them (weighted by data size)
- Update — the averaged result becomes the new global model
- Repeat — steps 2–5 repeat for N rounds
This approach allows training across machines without sharing raw data — only model updates are transmitted.
Project structure
lydianai_ml/
├── server/ # FastAPI coordinator
├── worker/ # Worker agent
├── client/ # CLI client (submit_job)
├── common/ # Shared utilities and model definitions
├── requirements_server.txt
├── requirements_new_gpu.txt
└── requirements_legacy_gpu.txt
Data flow
CLI (start) → Server → Workers register
↓
Round N begins
↓
Server sends global model → Workers
↓
Local training
↓
Server ← model updates ← Workers
↓
FedAvg aggregation
↓
Round N+1 begins...