Workflows that don’t lose state.

A workflow engine in Go that runs your functions reliably across restarts, retries, and worker failure. Postgres-backed history, Redis-stream dispatch, replay on yield, standby HA. Operate it yourself.

Apache 2.0 · Go 1.25+ · Postgres + Redis

Three ideas the engine takes seriously.

Durable state

Every state transition persists to Postgres before the RPC returns. Workflows are append-only event logs. A restart loses nothing.

Replay on yield

Workflow functions are deterministic over their event history. When a worker dies, the engine re-dispatches the workflow against the same history, and commands that were already recorded become no-ops on the second run.
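
As a rough illustration of why recorded commands turn into no-ops, here is a toy model of replay. It is not the engine's code: `replayCtx`, `Do`, and `workflow` are hypothetical names invented for this sketch, and history is an in-memory slice rather than a Postgres event log.

```go
package main

import "fmt"

// replayCtx is a hypothetical stand-in for a workflow context. It is a toy
// model of replay, not the engine's implementation.
type replayCtx struct {
	history  []string // results recorded by earlier runs, in command order
	next     int      // index of the next command this run will issue
	executed int      // number of commands that actually ran a side effect
}

// Do returns the recorded result when history already covers this command;
// otherwise it runs the side effect once and appends the result to history.
func (c *replayCtx) Do(effect func() string) string {
	if c.next < len(c.history) {
		res := c.history[c.next] // already recorded: replay is a no-op
		c.next++
		return res
	}
	res := effect()
	c.history = append(c.history, res)
	c.next++
	c.executed++
	return res
}

// workflow is deterministic: the same history always yields the same commands.
func workflow(c *replayCtx) string {
	a := c.Do(func() string { return "email-1 sent" })
	b := c.Do(func() string { return "email-2 sent" })
	return a + "; " + b
}

func main() {
	// First run: the worker "dies" after one command has been recorded.
	first := &replayCtx{}
	first.Do(func() string { return "email-1 sent" })

	// Re-dispatch against the same history: only the second command executes.
	second := &replayCtx{history: first.history}
	fmt.Println(workflow(second))
	fmt.Println("side effects on replay:", second.executed)
}
```

The sketch only works because `workflow` issues the same commands in the same order on every run, which is the determinism constraint described above.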

Standby HA

Engine replicas contend for a Postgres advisory lock. The replica that holds it is the leader and serves traffic; standbys idle. If the leader is lost, a standby promotes within a single retry interval.
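
The promotion loop can be sketched roughly as follows. This is an assumption about the general shape, not the engine's code: `tryLock` and `waitForLeadership` are hypothetical names, and the lock is simulated so the example runs without a database. Against a real Postgres, the hook could wrap `SELECT pg_try_advisory_lock(key)` on a dedicated connection; session-level advisory locks are released when the holding session ends, which is what lets a standby take over.

```go
package main

import (
	"fmt"
	"time"
)

// tryLock reports whether this replica now holds the leader lock.
// Hypothetical hook: a real version might run pg_try_advisory_lock.
type tryLock func() (bool, error)

// waitForLeadership polls lock once per retry interval until it succeeds
// or maxAttempts is exhausted. It returns the number of attempts made.
func waitForLeadership(lock tryLock, retry time.Duration, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		ok, err := lock()
		if err != nil {
			return attempt, err
		}
		if ok {
			return attempt, nil
		}
		time.Sleep(retry)
	}
	return maxAttempts, fmt.Errorf("not leader after %d attempts", maxAttempts)
}

func main() {
	// Simulate a leader that holds the lock for two polls and then dies.
	polls := 0
	lock := func() (bool, error) {
		polls++
		return polls > 2, nil
	}
	attempts, err := waitForLeadership(lock, 10*time.Millisecond, 5)
	fmt.Println("promoted after attempts:", attempts, "err:", err)
}
```

Because a standby retries on a fixed interval, the worst-case gap between the lock being released and a standby noticing is about one interval.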

Write workflows as plain Go.

A workflow is a function. Inside it, the SDK exposes a few durable primitives: schedule an activity, sleep, wait for a signal. Each one is recorded in history; the engine replays the function against that history on re-dispatch.

No DSL, no YAML, no state machine generator. Just Go with an honest set of constraints.

client.RegisterWorkflow("MonthlyReport", func(ctx sdk.WorkflowContext, input any) (any, error) {
    for i := range 3 {
        ctx.QueueActivity("SendEmail", fmt.Sprintf("user%d@example.com", i))
        ctx.Sleep(2 * time.Second)
    }
    return "report complete", nil
})

Architecture

Three processes, two data stores, one binary.

Workers run your code and poll the engine over bidi gRPC streams. The engine owns the durable side: it writes history to Postgres, queues tasks on Redis Streams, and emits metrics and traces. No sidecars, no coordinator, no leader election service beyond a Postgres advisory lock.

What the engine does for you.

Durable workflow execution

Workflows survive engine restart, worker crash, and network blips. Every transition is in the event log.

Retries with exponential backoff

Failing activities are retried per the registered RetryPolicy. Backoff lives in a real timer, not a sleep loop.
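
For a sense of what an exponential schedule looks like, here is a minimal sketch of the delay calculation. The `retryPolicy` struct and its field names are hypothetical; the SDK's actual RetryPolicy may be shaped differently, and this shows only the arithmetic, not how the engine arms its timers.

```go
package main

import (
	"fmt"
	"time"
)

// retryPolicy is a hypothetical shape; the SDK's real RetryPolicy may differ.
type retryPolicy struct {
	InitialInterval time.Duration // delay before the first retry
	Multiplier      float64       // growth factor applied per retry
	MaxInterval     time.Duration // upper bound on any single delay
	MaxAttempts     int           // total attempts, including the first
}

// delayBefore returns how long to wait before the given retry (1-based),
// and false once the policy's attempts are exhausted.
func (p retryPolicy) delayBefore(retry int) (time.Duration, bool) {
	if retry < 1 || retry >= p.MaxAttempts {
		return 0, false
	}
	d := float64(p.InitialInterval)
	for i := 1; i < retry; i++ {
		d *= p.Multiplier
		if d >= float64(p.MaxInterval) {
			return p.MaxInterval, true // capped; no need to keep multiplying
		}
	}
	if d > float64(p.MaxInterval) {
		return p.MaxInterval, true
	}
	return time.Duration(d), true
}

func main() {
	p := retryPolicy{
		InitialInterval: time.Second,
		Multiplier:      2,
		MaxInterval:     5 * time.Second,
		MaxAttempts:     5,
	}
	for retry := 1; ; retry++ {
		d, ok := p.delayBefore(retry)
		if !ok {
			break
		}
		fmt.Printf("retry %d after %s\n", retry, d)
	}
}
```

With these example values the delays are 1s, 2s, 4s, then 5s (capped), after which the attempts run out.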

Signals and waits

WaitForSignal yields the workflow function. When the signal arrives the engine re-dispatches; replay returns the recorded value.

Bidi-streamed dispatch

Workers open a stream once and ack credits per task. Sub-millisecond delivery latency. Graceful shutdown is instant.

Activity heartbeats

Long-running activities heartbeat to extend visibility. Cancellation propagates via the same channel.

Full observability

Structured slog logs, Prometheus metrics on /metrics, OpenTelemetry tracing across engine/worker/dashboard.

Three steps to a running engine.

Clone and bring up the stack.

git clone https://github.com/edaywalid/sched.git
cd sched
make up

Build the dashboard bundle.

make web-build
docker compose build dashboard
docker compose up -d dashboard

Open the dashboard and start a workflow.

# Visit http://localhost:8080
# Or queue one via the API:
curl -X POST http://localhost:8080/api/workflows \
  -H "Content-Type: application/json" \
  -d '{"workflowName":"HelloWorld","input":"world"}'
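
The same request can be issued from Go with only the standard library. This is a sketch that mirrors the curl call above: the URL, path, and JSON field names are taken from it, while `startRequest` and `newStartRequest` are hypothetical helpers, and the response body is not decoded because its format is not described here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// startRequest mirrors the JSON body of the curl example.
type startRequest struct {
	WorkflowName string `json:"workflowName"`
	Input        any    `json:"input"`
}

// newStartRequest builds a POST to {baseURL}/api/workflows with a JSON body.
func newStartRequest(baseURL, name string, input any) (*http.Request, error) {
	body, err := json.Marshal(startRequest{WorkflowName: name, Input: input})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/workflows", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newStartRequest("http://localhost:8080", "HelloWorld", "world")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err) // e.g. the stack is not running
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```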