LydianAI distributed training

How federated ML works

This first LydianAI app is a federated/distributed training PoC: a FastAPI coordinator runs FedAvg rounds while workers (CPU-only or GPU) train locally and submit model updates.
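
Concretely, each round turns on a small update payload that workers send back. Below is a minimal sketch of what that payload might look like as a Pydantic schema (a natural fit with FastAPI); the model and field names are illustrative assumptions, not the actual LydianAI API.

```python
# Illustrative update payload a worker might POST to the coordinator each
# round. The names (ModelUpdate, worker_id, state_dict_b64, ...) are
# assumptions for this sketch, not the real LydianAI schema.
from pydantic import BaseModel

class ModelUpdate(BaseModel):
    worker_id: str
    round_id: int
    num_samples: int     # local sample count; becomes the FedAvg weight
    state_dict_b64: str  # base64-encoded, serialized torch state_dict
```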

Server / Coordinator (macOS CPU OK)

Hosts a FastAPI API, shards CIFAR-10 across workers, aggregates updates using FedAvg, and tracks metrics per round.
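
FedAvg itself is just a sample-weighted average: the new global weights are w ← Σ_k (n_k / n) · w_k, where n_k is worker k's local sample count and n = Σ_k n_k. A minimal sketch, assuming updates arrive as (state_dict, num_samples) pairs; this is not the coordinator's actual implementation.

```python
import torch

def fedavg(updates: list[tuple[dict[str, torch.Tensor], int]]) -> dict[str, torch.Tensor]:
    """Average worker state_dicts, weighted by each worker's sample count."""
    total = sum(n for _, n in updates)
    return {
        key: sum(sd[key].float() * (n / total) for sd, n in updates)
        for key in updates[0][0]
    }
```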

Workers (Ubuntu GPU/CPU)

Register → poll → download model → train locally → submit update → repeat. Workers can be modern GPUs, legacy Pascal GPUs, or CPU-only.
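
In code, that loop might look like the sketch below. The endpoint paths (/register, /task, /model, /update), payload shapes, and tailnet address are all assumptions for illustration, not the real worker client.

```python
import time
import requests

SERVER = "http://100.64.0.1:8000"  # hypothetical coordinator address on the tailnet

def train_locally(weights: bytes, task: dict) -> tuple[str, int]:
    # Stand-in for the real local training step: load the weights, train on
    # the local shard, then return (base64 state_dict, num_samples).
    return "", 0

def run_worker(worker_id: str) -> None:
    requests.post(f"{SERVER}/register", json={"worker_id": worker_id})
    while True:
        task = requests.get(f"{SERVER}/task", params={"worker_id": worker_id}).json()
        if task.get("round_id") is None:
            time.sleep(5)  # no open round yet; keep polling
            continue
        weights = requests.get(f"{SERVER}/model", params={"round_id": task["round_id"]}).content
        update_b64, num_samples = train_locally(weights, task)
        requests.post(f"{SERVER}/update", json={
            "worker_id": worker_id,
            "round_id": task["round_id"],
            "num_samples": num_samples,
            "state_dict_b64": update_b64,
        })
```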


Key design goals

Heterogeneous hardware

Mixed compute is the default: different GPUs, different speeds, and even CPU-only machines.

Legacy GPU support

Pascal GPUs (sm_61) require older Torch wheels and often older Python. The PoC supports a LEGACY install path and a legacy Torch mode.
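
One way a worker can pick a path at runtime is to check the device's compute capability, which Pascal reports as (6, 1). A minimal sketch; the pre-Volta cutoff used here is an assumption for illustration.

```python
import torch

def needs_legacy_torch() -> bool:
    """Heuristic: route pre-Volta GPUs (Pascal sm_6x and older) to the LEGACY path."""
    if not torch.cuda.is_available():
        return False  # CPU-only workers take the normal install path
    major, minor = torch.cuda.get_device_capability(0)
    return (major, minor) < (7, 0)
```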

Simple networking

All machines join the same Tailscale tailnet. Workers connect to the server using stable 100.x IPs.
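
As a sketch, a worker might take the coordinator's tailnet address from an environment variable; the variable name and address below are illustrative, not the project's actual config.

```python
import os

# Hypothetical: point the worker at the coordinator's stable 100.x tailnet IP.
COORDINATOR_URL = os.environ.get("LYDIAN_COORDINATOR_URL", "http://100.64.0.1:8000")
```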


What v1 is (and isn't)

It is

  • FedAvg training rounds
  • FastAPI coordinator + worker loop
  • CLI client to start runs, monitor progress, and fetch results
  • NEW vs LEGACY GPU install paths

It isn't

  • A production scheduler
  • A hosted "managed service"
  • A full marketplace / multi-tenant platform

Roadmap

Start with working code on real hardware. Then harden it.

Now: Distributed training PoC

  • FastAPI coordinator
  • FedAvg aggregation
  • Heterogeneous workers
  • NEW vs LEGACY GPU support
  • CLI to start, monitor, and fetch results

Next: reliability + observability

  • Round watchdogs and timeouts
  • Better worker diagnostics
  • Structured logging + metrics
  • More robust state handling

Then: Inference-first runtime

A clean inference mode (jobs, batching, routing) is the next practical wedge once the cluster mechanics are solid.

Later: optional managed control plane

A hosted control plane is optional; self-hosting remains first-class.

Clone, install & run →