More than edge scripts. Workers can host full-stack web apps, APIs, AI agents, static sites, mobile backends, and framework-based applications on Cloudflare's global network.
Code is cheaper. App count goes up. The bottleneck moves from writing software to running it safely, globally, and without infrastructure glue.
Absorb the spike. Give every app platform services. Scale to zero when the work disappears.
More apps only help if the runtime can absorb them. So the next question is overhead: how small can the deployable unit become?
Workers uses V8 isolates instead of provisioning a VM or container per unit of work. Less runtime per app, more apps per machine, faster scale-down to zero.
The less runtime you duplicate per app, the easier it is to run many small, globally distributed pieces of code — bursty APIs, full-stack routes, and agent workloads included.
Physical servers: one box, one app, and operational ownership of the OS, runtime, packages, and capacity.
Virtual machines: better isolation and scheduling, but each unit still carries a lot of runtime surface.
Containers: a great packaging model, but still a heavier unit than code running in a shared isolate runtime.
Isolates (Workers): lightweight user code running on Cloudflare's global runtime, surrounded by platform bindings.
Lightweight compute is the start, not the whole platform. The real leverage appears when storage, AI, media, and orchestration are native bindings.
A Worker is the request path. Bindings are the platform around it — data, AI, media, orchestration, and network controls — exposed directly to code without hand-rolled API-token plumbing between every service.
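As a minimal sketch of the binding model, the Worker below receives a platform service on its `env` object instead of building a client from an API token. The binding name `SETTINGS`, the `greeting` key, and the `KVLike` interface are assumptions for illustration; `KVLike` is a trimmed stand-in for the real KV namespace type so the block stays self-contained.

```typescript
// Trimmed stand-in for a KV namespace binding (assumption: only `get` is modelled).
interface KVLike {
  get(key: string): Promise<string | null>;
}

// Bindings arrive on `env`; the names are chosen in the Worker's configuration.
interface Env {
  SETTINGS: KVLike; // hypothetical binding name
}

// Reads a value through the binding and falls back to a default.
async function readGreeting(env: Env): Promise<string> {
  const stored = await env.SETTINGS.get("greeting");
  return stored ?? "hello";
}

// In a real Worker this object would be the module's default export.
const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    return new Response(await readGreeting(env));
  },
};
```

Because the handler only depends on the shape of `env`, the same code can be exercised locally with an in-memory object in place of the binding.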
This is the mental model to land: Workers is not isolated from the platform. It is the place where the platform is composed.
Once the platform is available inside the Worker, the same model can host websites, APIs, tenant code, agents, storage-backed apps, and media workflows.
Workers becomes the deployment target for modern web applications: server-rendered routes, APIs, static assets, and edge logic all living close to users.
React Router
npm create cloudflare@latest -- rr-app
# choose React Router
Next.js
npm create cloudflare@latest -- next-app
# choose Next (via @opennextjs/cloudflare)
Astro
npm create cloudflare@latest -- astro-app
# choose Astro
Vite + React
npm create cloudflare@latest -- vite-app
# choose Vite
npm create cloudflare@latest
bindings → env.DB / env.AI / env.BUCKET
wrangler versions upload → wrangler versions deploy
Not every workload starts as a full-stack app. Some begin as one route, one API, or one integration point.
Workers can be a single request handler, a structured API, or a backend that sits in front of other services. You choose how much framework you need.
Use vanilla JavaScript or TypeScript for small APIs, Hono for lightweight routing, or Python support where that ecosystem fits the workload.
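For the small-API case, a framework-free handler might look like the sketch below. The routes `/health` and `/items/:id` are hypothetical, and the routing logic is kept in a plain function so it can be tested without a runtime.

```typescript
// Result of routing, kept separate from Response so the logic is easy to test.
interface RouteResult {
  status: number;
  body: unknown;
}

// Hypothetical routes: GET /health and GET /items/:id.
function route(method: string, pathname: string): RouteResult {
  if (method === "GET" && pathname === "/health") {
    return { status: 200, body: { ok: true } };
  }
  const item = /^\/items\/([^/]+)$/.exec(pathname);
  if (method === "GET" && item) {
    return { status: 200, body: { id: item[1] } };
  }
  return { status: 404, body: { error: "not found" } };
}

// Worker entry point; in a real project this object is the default export.
const api = {
  async fetch(request: Request): Promise<Response> {
    const { status, body } = route(request.method, new URL(request.url).pathname);
    return new Response(JSON.stringify(body), {
      status,
      headers: { "content-type": "application/json" },
    });
  },
};
```

Once the route table grows beyond a handful of entries, a router such as Hono is likely the more maintainable choice.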
Then the same request path can become intelligent: retrieve context, choose a model, call tools, keep state, and queue follow-up work.
Workers AI runs 50+ open-source models on Cloudflare's GPU network (Llama, Mistral, Stable Diffusion, Whisper, BGE embeddings), billed by usage with no GPUs to provision. AI Gateway puts the same code path in front of external providers (OpenAI, Anthropic, Replicate, Google AI, Groq) with caching, logging, rate limits, and BYOK key management.
Workers AI is the inference layer. AI Gateway is the control plane. Vectorize and Agents SDK compose on top.
env.AI.run("@cf/meta/llama-3.1-8b-instruct", { messages: [{ role: "system", content: SYSTEM_PROMPT }, { role: "user", content: userContent }], max_tokens: 300 })
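The call above can be wrapped in a small function, sketched here under a few assumptions: `AiLike` is a trimmed stand-in for the Workers AI binding type, the result is assumed to carry the generated text in a `response` field, and the prompt text is hypothetical.

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// Trimmed stand-in for the Workers AI binding (assumption: text-generation shape only).
interface AiLike {
  run(
    model: string,
    input: { messages: ChatMessage[]; max_tokens?: number },
  ): Promise<{ response?: string }>;
}

const SYSTEM_PROMPT = "You are a concise assistant."; // hypothetical prompt

// Sends one user message and returns the generated text, or "" if none came back.
async function answer(ai: AiLike, userContent: string): Promise<string> {
  const result = await ai.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userContent },
    ],
    max_tokens: 300,
  });
  return result.response ?? "";
}
```

Inside a Worker the first argument would be `env.AI`.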
50+ open-source models on Cloudflare's GPU network. One binding (env.AI). Pricing uses Neurons, a metric for the actual processing power used: $0.011 per 1,000 Neurons, with 10,000 free Neurons per day.
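The pricing figures above reduce to simple arithmetic. The sketch below assumes the free allocation is subtracted from each day's usage before billing; the function name is illustrative.

```typescript
const USD_PER_1000_NEURONS = 0.011;
const FREE_NEURONS_PER_DAY = 10_000;

// Estimated cost for one day of usage, after the free daily allocation.
function dailyCostUsd(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

For example, a day that consumes 110,000 Neurons has 100,000 billable Neurons, which is 100 × $0.011, or about $1.10. A day at or below 10,000 Neurons costs nothing.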
Same code path, BYOK, cached and logged. Switch providers without changing the Worker.
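One way to picture "switch providers without changing the Worker" is to treat the provider as configuration. The sketch below builds an AI Gateway base URL from an account ID, a gateway ID, and a provider slug; the URL shape reflects the gateway endpoint format as commonly documented and should be checked against the current AI Gateway docs, and all identifiers shown are placeholders.

```typescript
// Hypothetical identifiers; real values come from the Cloudflare dashboard.
interface GatewayConfig {
  accountId: string;
  gatewayId: string;
  provider: string; // e.g. "openai" or "anthropic"
}

// Builds the base URL requests are sent to instead of the provider's own host.
function gatewayBaseUrl({ accountId, gatewayId, provider }: GatewayConfig): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
}
```

With this shape, changing `provider` in configuration redirects traffic while the calling code stays the same.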
Compute and AI need state. The platform exposes storage as Worker bindings — pick the shape that matches the access pattern.
Different access patterns deserve different storage. The platform exposes each primitive as a Worker binding, so the same code path can read from key/value, SQL, objects, an existing relational database, or strongly consistent per-key state.
Five bindings cover most application data needs. Mix per app — KV for the hot path, D1 for relations, R2 for files, Hyperdrive in front of an existing database, Durable Objects when state must be strongly consistent.
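As a rough sketch of mixing bindings per access pattern, the code below reads a user name through KV first, falls back to D1, and writes a file to R2. The interfaces are trimmed stand-ins for the real binding types, and the `users` table, key format, and function names are assumptions for illustration.

```typescript
// Trimmed stand-ins for three storage bindings (assumption: only the methods used below).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): { first<T>(): Promise<T | null> };
  };
}
interface R2Like {
  put(key: string, body: string): Promise<unknown>;
}
interface StorageEnv {
  KV: KVLike;
  DB: D1Like;
  BUCKET: R2Like;
}

// Hot path in KV, source of truth in D1 (hypothetical `users` table).
async function getUserName(env: StorageEnv, id: string): Promise<string | null> {
  const cacheKey = `user-name:${id}`;
  const cached = await env.KV.get(cacheKey);
  if (cached !== null) return cached;

  const row = await env.DB.prepare("SELECT name FROM users WHERE id = ?")
    .bind(id)
    .first<{ name: string }>();
  if (row === null) return null;

  await env.KV.put(cacheKey, row.name); // populate the cache for the next read
  return row.name;
}

// Files go to object storage.
async function saveUpload(env: StorageEnv, key: string, body: string): Promise<void> {
  await env.BUCKET.put(key, body);
}
```

KV is eventually consistent, so a cache like this suits values that tolerate brief staleness.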
env.KV.get(key)
env.DB.prepare(sql)
env.BUCKET.put(key, body)
A Durable Object is a tiny serverless actor with its own SQLite database, WebSockets, timers, and one globally agreed identity, perfect for anything that must coordinate live state.
chatRoom:42, match:abc, or userSession:xyz always routes to the same global instance: no lookup service, no sticky-session hacks.
Mental model: a tiny serverless actor with a database that the entire world agrees on, the missing stateful building block for global applications.
Where customers feel the magic
Chat rooms · multiplayer matches · collaborative docs · bookings with no double-books · live auctions · presence · leaderboards · rate limiters · workflow state.
One instance worldwide. Strong consistency by default.
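The routing idea can be illustrated with a local simulation. The sketch below is not the Durable Objects runtime: `LocalNamespace` is an in-memory stand-in that only mirrors the `idFromName` and `get` call shape, and `BookingSlot` is a hypothetical object showing why one instance per name prevents double-booking.

```typescript
// Stand-in for one Durable Object instance: state owned by a single slot.
class BookingSlot {
  private holder: string | null = null;

  // Returns true only for the first caller; later callers see the slot as taken.
  reserve(user: string): boolean {
    if (this.holder !== null) return false;
    this.holder = user;
    return true;
  }
}

// Local simulation of a namespace (assumption: mirrors the idFromName/get call shape only).
class LocalNamespace {
  private instances = new Map<string, BookingSlot>();

  idFromName(name: string): string {
    return name;
  }

  get(id: string): BookingSlot {
    let instance = this.instances.get(id);
    if (!instance) {
      instance = new BookingSlot();
      this.instances.set(id, instance);
    }
    return instance;
  }
}

const slots = new LocalNamespace();
// Both callers name the same slot, so both reach the same instance.
const first = slots.get(slots.idFromName("booking:room-7")).reserve("alice");
const second = slots.get(slots.idFromName("booking:room-7")).reserve("bob");
```

Here `first` is `true` and `second` is `false`: the second request reaches the same instance and finds the slot already held. On the platform, the namespace binding plays the role of `LocalNamespace` and the instance is addressed through a stub.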
Put it together — compute, AI, storage, media, network, orchestration. This is what changes when "serverless" becomes "platform".
Lambda gave us functions without the platform around them. Workers is the inverse: a real runtime with storage, AI, media, queues, security, and network controls all bound directly to code. One programmable edge. No glue between services. No infrastructure to provision. Scale to zero, scale with the internet.
Compute is the start, not the product. The product is what you can compose around it — inference next to data, retrieval next to context, state next to coordination, the CDN and WAF already in the request path. That is the platform Workers ships.
No pre-provisioned GPUs, VMs, or containers. Compute-based pricing — pay nothing when idle, pay per request when busy. Apps that wait on I/O do not pay for the wait.
Code runs within about 50 ms of roughly 95% of the world's internet-connected population. Smart Placement and bindings keep compute next to the data, models, and services it needs.
Inference, state, storage, media, queues, deployment, observability — in one platform, behind one binding model. Local dev with Wrangler, idea to production in seconds.
The categories above explain what you can build. These demos show the same platform primitives in motion: media, agents, retrieval, queues, AI, and multi-cloud ingress.
Three videos, three postures: public, origin-restricted, signed-URL gated. Signed video uses the Stream binding to mint short-lived JWTs server-side.
One Agent, one MCP Server Portal, three MCP servers behind Access. Tools discovered at runtime; every call crosses a real Zero-Trust boundary. Workers AI generates images; D1 / R2 / KV store results.
Docs into R2 → Queue → Workflow → Workers AI embeds → vectors in Vectorize. At chat time, query vector + top matches stitched into the LLM prompt. Six bindings, one Worker.
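The chat-time step, where top matches are stitched into the LLM prompt, might be sketched as below. The `Match` shape, the prompt wording, and the default of three matches are assumptions; in the demo the matches would come from a Vectorize query rather than a literal array.

```typescript
// Hypothetical shape for a retrieved chunk and its similarity score.
interface Match {
  id: string;
  score: number;
  text: string;
}

// Keeps the highest-scoring matches and formats them as numbered context.
function buildPrompt(question: string, matches: Match[], topK = 3): string {
  const context = [...matches]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((m, i) => `[${i + 1}] ${m.text}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Numbering the chunks makes it possible to ask the model to cite which passage it used.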
Cloudflare edge validates and normalises events, pushes clean payloads into a Queue. A GCP Cloud Run backend pulls via HTTP and acks. Bot mitigation, queueing, and AI visibility in one path via AI Gateway.
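The validate-then-enqueue step could look roughly like this. The event fields, the status codes, and the `QueueLike` interface are assumptions for illustration; `QueueLike` models only a `send` method, as a trimmed stand-in for a queue producer binding.

```typescript
// Hypothetical clean payload shape.
interface CleanEvent {
  type: string;
  userId: string;
  timestamp: number; // milliseconds since epoch
}

// Trimmed stand-in for a queue producer binding.
interface QueueLike {
  send(body: CleanEvent): Promise<void>;
}

// Returns a normalised event, or null when the input is not usable.
function normaliseEvent(raw: unknown): CleanEvent | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (typeof r.type !== "string" || typeof r.userId !== "string") return null;
  const type = r.type.trim().toLowerCase();
  const userId = r.userId.trim();
  const timestamp =
    typeof r.timestamp === "number" ? r.timestamp : Date.parse(String(r.timestamp));
  if (type === "" || userId === "" || Number.isNaN(timestamp)) return null;
  return { type, userId, timestamp };
}

// Rejects bad input at the edge; only clean payloads reach the queue.
async function ingest(raw: unknown, queue: QueueLike): Promise<number> {
  const event = normaliseEvent(raw);
  if (event === null) return 400;
  await queue.send(event);
  return 202;
}
```

Rejecting malformed events before they are queued keeps the downstream consumer free of validation logic.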
A 9-step space mission: SWAPI fetch → AI analysis → AI risk assessment → human-in-the-loop pause → AI report → KV archive. Arm a mid-flight "bomb" to fail a step — previous outputs stay preserved, retry recovers without re-running the AI calls.
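The retry behaviour described above, where completed steps are not re-run, can be illustrated with a local stand-in. `StepRunner` below is not the Workflows runtime; it is an in-memory sketch that memoises successful step results by name, and the step names and `mission` function are hypothetical.

```typescript
// Local stand-in for durable step execution (assumption: results are memoised by step name).
class StepRunner {
  private results = new Map<string, unknown>();

  async do<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.results.has(name)) return this.results.get(name) as T;
    const value = await fn();
    this.results.set(name, value); // stored only when the step succeeds
    return value;
  }
}

// Two-step mission: an "AI" step, then a report step that can be made to fail.
async function mission(
  step: StepRunner,
  calls: { ai: number },
  failReport: boolean,
): Promise<string> {
  const analysis = await step.do("ai-analysis", async () => {
    calls.ai += 1; // counts how often the expensive step actually runs
    return "analysis";
  });
  return step.do("ai-report", async () => {
    if (failReport) throw new Error("bomb armed");
    return `report on ${analysis}`;
  });
}
```

Running `mission` once with `failReport` set and then again without it, against the same `StepRunner`, leaves `calls.ai` at 1: the retry reuses the stored analysis instead of calling the AI step again.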