Quickstart (FedAvg + FastAPI)
This proof of concept (PoC) runs federated training with FedAvg across heterogeneous machines:
- Server/Coordinator (macOS CPU OK): FastAPI + FedAvg aggregation
- Workers (Ubuntu GPU/CPU): register → poll → train → submit update
- Dataset: CIFAR-10
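For orientation, FedAvg combines worker updates by averaging their parameters, weighting each worker by the number of samples it trained on. The sketch below is a minimal illustration of that idea, not the repo's aggregation code: the `Weights` type, the `fedavg` function, and the flat-list parameter layout are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average worker weights, weighting each worker by its sample count."""
    if not updates:
        raise ValueError("no updates to aggregate")
    total = sum(num_samples for _, num_samples in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    aggregated: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, num_samples in updates:
            share = num_samples / total  # this worker's fraction of all samples
            for i, value in enumerate(weights[name]):
                acc[i] += value * share
        aggregated[name] = acc
    return aggregated


if __name__ == "__main__":
    worker_a = ({"layer.weight": [1.0, 2.0]}, 100)
    worker_b = ({"layer.weight": [3.0, 4.0]}, 300)
    print(fedavg([worker_a, worker_b]))  # {'layer.weight': [2.5, 3.5]}
```

Worker B holds three times as many samples as worker A, so its parameters contribute 75% of the average.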
0) Networking (Tailscale)
On all machines:
sudo tailscale up
tailscale ip -4
Note the server's Tailscale IPv4 address (it has the form 100.x.y.z); workers need it to reach the coordinator. See the Tailscale networking guide for more detail.
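If you want to sanity-check the address before handing it to workers, Tailscale assigns IPv4 addresses from the shared 100.64.0.0/10 range. The helper below is a small sketch using only the standard library; the function name `looks_like_tailscale_ip` is made up for this example and is not part of the repo.

```python
import ipaddress

# Tailscale IPv4 addresses come from the carrier-grade NAT range.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def looks_like_tailscale_ip(addr: str) -> bool:
    """Return True if addr parses as an IP inside 100.64.0.0/10."""
    try:
        return ipaddress.ip_address(addr) in TAILSCALE_CGNAT
    except ValueError:
        # Not a valid IP address at all.
        return False


if __name__ == "__main__":
    for candidate in ("100.101.102.103", "192.168.1.5"):
        print(candidate, looks_like_tailscale_ip(candidate))
```

A `False` result for the output of `tailscale ip -4` usually means you copied a LAN address instead of the Tailscale one.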
1) Clone the repo
git clone https://github.com/polyplay/lydianai_ml.git
cd lydianai_ml
2) Set up the server (coordinator)
The server runs on macOS or Linux. CPU-only is fine.
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements_server.txt
Start the coordinator:
python -m server.main --host 0.0.0.0 --port 8000
The server listens on port 8000 and exposes an HTTP API, built with FastAPI, that workers and the CLI client both use.
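Before setting up workers, it can help to confirm that the coordinator's port is reachable from another machine. This sketch only checks that a TCP connection can be opened; it deliberately makes no assumptions about the API's routes. The function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int = 8000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace with the server's Tailscale IP noted in step 0.
    print(can_connect("127.0.0.1", 8000))
```

If this returns `False` from a worker machine, check that Tailscale is up on both ends and that the server was started with `--host 0.0.0.0`.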
3) Set up workers
Workers run on Ubuntu machines with GPU or CPU. Choose your install path based on your hardware — see the NEW vs LEGACY GPU guide.
NEW GPU (RTX 20/30/40, A100, H100):
python3.12 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements_new_gpu.txt
LEGACY GPU (GTX 1080 / 1080 Ti):
python3.10 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements_legacy_gpu.txt
Start a worker (point it at the server’s Tailscale IP):
python -m worker.main --server http://<server-ip>:8000
The worker registers with the coordinator, then enters a loop: poll → download model → train → submit update.
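The loop described above can be sketched as follows. This is an illustration of the control flow only: `FakeCoordinator` is an in-memory stand-in for the server, and every class, method, and field name in it is hypothetical rather than taken from the repo. The training step is a placeholder that nudges each weight.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeCoordinator:
    """In-memory stand-in for the server; all method names are hypothetical."""
    rounds_left: int = 2
    model: List[float] = field(default_factory=lambda: [0.0, 0.0])
    workers: List[str] = field(default_factory=list)
    updates: List[Dict] = field(default_factory=list)

    def register(self, name: str) -> str:
        self.workers.append(name)
        return f"worker-{len(self.workers)}"

    def poll(self, worker_id: str) -> Optional[int]:
        """Return the next round number, or None when no work remains."""
        if self.rounds_left == 0:
            return None
        self.rounds_left -= 1
        return len(self.updates) + 1

    def download_model(self) -> List[float]:
        return list(self.model)

    def submit_update(self, worker_id: str, round_no: int,
                      weights: List[float], num_samples: int) -> None:
        self.updates.append({"worker": worker_id, "round": round_no,
                             "weights": weights, "num_samples": num_samples})
        # Single-worker demo: accept the update as the new global model.
        self.model = list(weights)


def local_train(weights: List[float]) -> List[float]:
    """Placeholder for real training: shift every weight by a fixed step."""
    return [w + 0.1 for w in weights]


def run_worker(coord: FakeCoordinator, name: str) -> int:
    """Register once, then poll -> download -> train -> submit until done."""
    worker_id = coord.register(name)
    completed = 0
    while (round_no := coord.poll(worker_id)) is not None:
        weights = coord.download_model()
        new_weights = local_train(weights)
        coord.submit_update(worker_id, round_no, new_weights, num_samples=1)
        completed += 1
    return completed


if __name__ == "__main__":
    coordinator = FakeCoordinator(rounds_left=3)
    print(run_worker(coordinator, "gpu-box"), "rounds completed")
```

A real worker would talk to the coordinator over HTTP and would typically sleep between polls instead of exiting when no round is available.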
4) Launch a training run
From any machine that can reach the server:
python -m client.submit_job --server http://<server-ip>:8000 start
This starts FedAvg training. The coordinator shards CIFAR-10 across registered workers and begins round 1.
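To make the sharding step concrete, here is one simple way to split a dataset's sample indices across workers. This quickstart does not say how the coordinator partitions the data, so treat the shuffled, near-equal split below as an assumption; `shard_indices` and the worker IDs are made up for the example. The only fixed fact is that the CIFAR-10 training set has 50,000 images.

```python
import random
from typing import Dict, List

CIFAR10_TRAIN_SIZE = 50_000  # size of the CIFAR-10 training split


def shard_indices(num_samples: int, worker_ids: List[str],
                  seed: int = 0) -> Dict[str, List[int]]:
    """Shuffle sample indices and deal them out round-robin to workers."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    step = len(worker_ids)
    return {wid: indices[i::step] for i, wid in enumerate(worker_ids)}


if __name__ == "__main__":
    shards = shard_indices(CIFAR10_TRAIN_SIZE, ["w1", "w2", "w3"])
    for worker_id, shard in shards.items():
        print(worker_id, len(shard))
```

Shards are disjoint and differ in size by at most one sample, so with three workers the sizes are 16,667, 16,667, and 16,666.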
5) Monitor and fetch results
# Check training status
python -m client.submit_job --server http://<server-ip>:8000 status
# Watch round-by-round progress
python -m client.submit_job --server http://<server-ip>:8000 monitor
# Fetch final results
python -m client.submit_job --server http://<server-ip>:8000 results
# List connected workers
python -m client.submit_job --server http://<server-ip>:8000 workers
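If you drive these commands from a script, a thin wrapper around the CLI avoids retyping the server URL. The sketch below only assembles and runs the same command lines shown above; the helper names `build_cli_args` and `run_cli` are hypothetical, and no assumption is made about what the CLI prints.

```python
import subprocess
import sys
from typing import List

# Subcommands that appear in this quickstart.
COMMANDS = ("start", "status", "monitor", "results", "workers")


def build_cli_args(server_url: str, command: str) -> List[str]:
    """Build the argument list for `python -m client.submit_job`."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return [sys.executable, "-m", "client.submit_job",
            "--server", server_url, command]


def run_cli(server_url: str, command: str) -> int:
    """Run the CLI from the repo root and return its exit code."""
    return subprocess.run(build_cli_args(server_url, command), check=False).returncode


if __name__ == "__main__":
    # Example address only; use the server's Tailscale IP from step 0.
    print(build_cli_args("http://100.64.0.1:8000", "status"))
```

Call `run_cli` from the repository root with the virtual environment active so that `client.submit_job` is importable.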
Next steps
- Architecture — understand the server/worker design
- CLI reference — full list of commands
- NEW vs LEGACY GPU setup — detailed install paths
- Tailscale networking — network configuration