LydianAI distributed inference

How it works

This first LydianAI app is a federated/distributed training proof of concept (PoC): a FastAPI coordinator runs FedAvg rounds while workers (CPU-only or GPU) train locally and submit model updates.

Server / Coordinator (macOS CPU OK)

Hosts a FastAPI API, shards CIFAR-10 across workers, aggregates updates using FedAvg, and tracks metrics per round.
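At the aggregation step, FedAvg reduces to a sample-weighted average of worker updates. A minimal sketch of that math — parameter names, shapes, and the tiny example values are illustrative, not the project's actual code:

```python
import numpy as np

def fedavg(updates):
    """updates: list of (params, num_samples) pairs, where params maps
    parameter names to arrays. Returns the sample-weighted average."""
    total = sum(n for _, n in updates)
    keys = updates[0][0].keys()
    return {
        k: sum(params[k] * (n / total) for params, n in updates)
        for k in keys
    }

# Example: two workers with different local dataset sizes
w1 = {"w": np.array([1.0, 1.0])}   # trained on 100 samples
w2 = {"w": np.array([3.0, 3.0])}   # trained on 300 samples
avg = fedavg([(w1, 100), (w2, 300)])
# avg["w"] == array([2.5, 2.5]), i.e. 1.0 * 0.25 + 3.0 * 0.75
```

Weighting by sample count (rather than a plain mean) keeps workers with larger shards from being underrepresented in the global model.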

Workers (Ubuntu GPU/CPU)

Register → poll → download model → train locally → submit update → repeat. Workers can be modern GPUs, legacy Pascal GPUs, or CPU-only.
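The register → poll → download → train → submit loop can be sketched end to end. Everything below — the `FakeCoordinator` class, its method names, and the scalar "model" — is a hypothetical in-memory stand-in for the real HTTP calls, kept small to show the control flow only:

```python
import time

class FakeCoordinator:
    """In-memory stand-in for the FastAPI coordinator (illustration only)."""
    def __init__(self):
        self.model = 0.0      # a single scalar "model" for brevity
        self.updates = []
    def register(self):
        return "worker-0"
    def round_ready(self, worker_id):
        return True           # a real worker would poll an HTTP endpoint here
    def download_model(self, worker_id):
        return self.model
    def submit_update(self, worker_id, update, num_samples):
        self.updates.append((update, num_samples))
        self.model = update   # one worker: FedAvg degenerates to the update

def local_train(weights):
    # Pretend one local epoch nudges the scalar model halfway toward 1.0
    return weights + 0.5 * (1.0 - weights), 100

def run_worker(api, rounds):
    worker_id = api.register()                   # register
    for _ in range(rounds):
        while not api.round_ready(worker_id):    # poll
            time.sleep(0.1)
        weights = api.download_model(worker_id)  # download model
        update, n = local_train(weights)         # train locally
        api.submit_update(worker_id, update, n)  # submit update

api = FakeCoordinator()
run_worker(api, rounds=3)
```

Swapping `FakeCoordinator` for a thin HTTP client against the coordinator's endpoints leaves `run_worker` unchanged, which is the point of the loop's shape.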


Key design goals

Heterogeneous hardware

Mixed compute is the default: different GPUs, different speeds, and even CPU-only machines.
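One common way to make mixed compute the default is to pick the device at startup and fall back to CPU. A minimal sketch (assumes PyTorch; guarding the import also lets the snippet run on a box without Torch installed):

```python
# Device selection sketch: prefer CUDA when available, otherwise CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # assumption: CPU-only machine without Torch
    device = "cpu"

print(f"training on {device}")
```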

Legacy GPU support

Pascal GPUs (sm_61) require older Torch wheels and often older Python. The PoC supports a LEGACY install path and a legacy Torch mode.
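In practice the LEGACY path boils down to pinning older wheels in a dedicated environment. A hypothetical setup fragment — the exact versions are assumptions and must be matched to the card and installed driver; `torch==1.13.1+cu117` is one published CUDA 11.x combination commonly used on older drivers:

```shell
# LEGACY install sketch (versions are assumptions -- match to your driver).
python3 -m venv .venv && source .venv/bin/activate
pip install "torch==1.13.1+cu117" \
    --extra-index-url https://download.pytorch.org/whl/cu117
```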

Simple networking

All machines join the same Tailscale tailnet; workers reach the coordinator over its stable 100.x Tailscale address.


What v1 is (and isn’t)

It is

  • FedAvg training rounds
  • FastAPI coordinator + worker loop
  • CLI client to start runs, monitor progress, and fetch results
  • NEW vs LEGACY GPU install paths

It isn’t

  • A production scheduler
  • A hosted “managed service”
  • A full marketplace / multi-tenant platform