Start the Flash development server for local testing, with automatic reload on code changes. The local development server provides a unified testing interface while your @Endpoint functions execute on Runpod Serverless.
flash run [OPTIONS]

Example

Start the development server with defaults:
flash run
Start with auto-provisioning to eliminate cold-start delays:
flash run --auto-provision
Start on a custom port:
flash run --port 3000

Flags

--host (string, default: localhost)
Host address to bind the server to.
--port, -p (integer, default: 8888)
Port number to bind the server to.
--reload/--no-reload (default: enabled)
Enable or disable auto-reload on code changes.
--auto-provision
Provision all Serverless endpoints at server startup instead of lazily on first call. Eliminates cold-start delays during development.
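If you launch the server from a script or test harness, you can assemble the command from the flags above. The following is a minimal sketch that uses only the documented flags. The helper name build_flash_run_args is hypothetical and is not part of Flash. Options left as None are omitted, so the CLI defaults apply.

```python
import shlex
from typing import List, Optional


def build_flash_run_args(
    host: Optional[str] = None,
    port: Optional[int] = None,
    reload: Optional[bool] = None,
    auto_provision: bool = False,
) -> List[str]:
    """Compose a `flash run` command line (hypothetical helper, not a Flash API)."""
    args = ["flash", "run"]
    if host is not None:
        args += ["--host", host]
    if port is not None:
        args += ["--port", str(port)]
    if reload is not None:
        # --reload/--no-reload is a boolean toggle pair
        args.append("--reload" if reload else "--no-reload")
    if auto_provision:
        args.append("--auto-provision")
    return args


# Print the shell form; pass the list itself to subprocess.run(...) to launch it.
print(shlex.join(build_flash_run_args(port=3000, auto_provision=True)))
# flash run --port 3000 --auto-provision
```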

Endpoint descriptions from docstrings

Flash extracts the first line of each function’s docstring and uses it in two places:
  • Startup table: The “Description” column shows the docstring when the server starts.
  • Swagger UI: The endpoint summary in the API explorer at /docs.
Add docstrings to your @Endpoint functions to make your API self-documenting:
@Endpoint(name="text-processor", gpu=GpuGroup.ANY)
def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores."""
    # Implementation here
    return {"sentiment": "positive"}
When you run flash run, the startup table displays “Analyze text and return sentiment scores” as the description for this endpoint, and the same text appears in the Swagger UI summary.
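The rule described above ("first line of the docstring") can be reproduced with the standard library. This helps when you want to check what a function will show before starting the server. The sketch below illustrates that rule and is not Flash's own implementation. The @Endpoint decorator is omitted so that the example runs without Flash installed.

```python
import inspect
from typing import Callable, Optional


def summary_from_docstring(func: Callable) -> Optional[str]:
    """Return the first line of a function's docstring, or None if it has none."""
    doc = inspect.getdoc(func)  # cleaned (dedented) docstring, or None
    if not doc:
        return None
    return doc.splitlines()[0].strip()


def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores."""
    return {"sentiment": "positive"}


print(summary_from_docstring(analyze_text))
# Analyze text and return sentiment scores.
```

Only the first line is used, so keep it short and move longer details to later docstring lines.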

Architecture

With flash run, Flash starts a local development server alongside remote Serverless endpoints. Key points:
  • A local development server provides a convenient testing interface at localhost:8888 (the default host and port).
  • @Endpoint functions deploy to Runpod Serverless with a live- prefix that distinguishes them from production endpoints.
  • Code changes are picked up automatically without restarting the server.
  • The development server routes requests to the appropriate remote endpoints.
This differs from flash deploy, where all endpoints run on Runpod without a local server.
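Conceptually, the development server maps each local route to a remote endpoint whose name carries the live- prefix. This page does not document the exact naming and routing rules Flash uses. The sketch below is a hypothetical illustration that assumes a dev endpoint is named by prepending live- to the configured endpoint name. The route segments (gpu_worker, lb_worker) come from the testing examples on this page.

```python
from typing import Dict

DEV_PREFIX = "live-"  # prefix used for development (flash run) endpoints


def dev_endpoint_name(name: str) -> str:
    """Assumed naming rule: prefix + the name given in @Endpoint(name=...)."""
    return f"{DEV_PREFIX}{name}"


def build_routes(endpoint_names: Dict[str, str]) -> Dict[str, str]:
    """Map local URL prefixes (e.g. /gpu_worker) to hypothetical dev endpoint names."""
    return {f"/{route}": dev_endpoint_name(name) for route, name in endpoint_names.items()}


def resolve(routes: Dict[str, str], path: str) -> str:
    """Return the remote endpoint whose route prefix matches the request path."""
    for prefix, endpoint in routes.items():
        if path == prefix or path.startswith(prefix + "/"):
            return endpoint
    raise KeyError(f"no endpoint registered for {path}")
```

For example, with routes built from {"gpu_worker": "gpu-worker"}, a request to /gpu_worker/runsync resolves to live-gpu-worker. A production deployment would use the unprefixed name instead.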

Auto-provisioning

By default, endpoints are provisioned lazily, when an @Endpoint function is first called. Use --auto-provision to provision all endpoints at server startup:
flash run --auto-provision

How it works

  1. Discovery: Scans your app for @Endpoint decorated functions.
  2. Deployment: Deploys resources concurrently (up to 3 at a time).
  3. Confirmation: Asks for confirmation if deploying more than 5 endpoints.
  4. Caching: Stores deployed resources in .runpod/resources.pkl for reuse.
  5. Updates: Recognizes existing endpoints and updates them if their configuration has changed.
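The five steps above can be sketched as a small provisioning routine: deploy at most 3 at a time, confirm when more than 5 would be deployed, and cache results in .runpod/resources.pkl so unchanged endpoints are skipped. This is an illustration of the workflow, not Flash's implementation. The deploy and confirm callables, the cache layout ({"config", "endpoint_id"} per endpoint), and the choice to count only new or changed endpoints toward the confirmation threshold are all assumptions. Discovery is assumed to have already produced configs.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict

MAX_CONCURRENT = 3      # deploy up to 3 resources at a time
CONFIRM_THRESHOLD = 5   # ask before deploying more than 5 endpoints
CACHE_PATH = Path(".runpod/resources.pkl")


def auto_provision(
    configs: Dict[str, dict],
    deploy: Callable[[str, dict], str],
    confirm: Callable[[int], bool] = lambda n: True,
    cache_path: Path = CACHE_PATH,
) -> Dict[str, dict]:
    """Hypothetical sketch of eager provisioning with a pickle cache."""
    # Reuse previously deployed resources across server restarts
    cache = pickle.loads(cache_path.read_bytes()) if cache_path.exists() else {}
    # Only new endpoints or endpoints whose configuration changed need deploying
    pending = {n: c for n, c in configs.items() if cache.get(n, {}).get("config") != c}
    if len(pending) > CONFIRM_THRESHOLD and not confirm(len(pending)):
        return cache
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        futures = {n: pool.submit(deploy, n, c) for n, c in pending.items()}
        for name, fut in futures.items():
            cache[name] = {"config": configs[name], "endpoint_id": fut.result()}
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(pickle.dumps(cache))
    return cache
```

A second call with identical configs deploys nothing. Changing one endpoint's config redeploys only that endpoint.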

Benefits

  • Zero cold start: All endpoints ready before you test them.
  • Faster development: No waiting for deployment on first HTTP call.
  • Resource reuse: Cached endpoints are reused across server restarts.

When to use

  • Local development with multiple endpoints.
  • Testing workflows that call multiple remote functions.
  • Debugging, when you want deployment kept separate from handler logic.

Provisioning modes

| Mode | When endpoints are deployed |
|---|---|
| Default (lazy) | On first @Endpoint function call |
| --auto-provision | At server startup |
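The difference between the two modes comes down to when the provisioning step runs. In the sketch below, the DevEndpoint class and the provision callable are hypothetical stand-ins, not Flash APIs. Lazy mode defers deployment to the first call, which then absorbs the deployment and cold-start delay. Eager mode, the analogue of --auto-provision, deploys in the constructor, that is, at server startup.

```python
from typing import Callable, Optional


class DevEndpoint:
    """Hypothetical wrapper contrasting lazy and eager (--auto-provision) deployment."""

    def __init__(self, name: str, provision: Callable[[str], str], eager: bool = False):
        self.name = name
        self._provision = provision
        self.endpoint_id: Optional[str] = None
        if eager:
            # --auto-provision: deploy at server startup
            self.endpoint_id = provision(name)

    def __call__(self, payload: dict) -> dict:
        if self.endpoint_id is None:
            # Default (lazy): the first call pays the deployment cost
            self.endpoint_id = self._provision(self.name)
        # Stand-in for forwarding the request to the remote endpoint
        return {"endpoint": self.endpoint_id, "input": payload}
```

In both modes, provisioning happens at most once per endpoint. Only the moment it happens differs.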

Testing your API

Once the server is running, test your endpoints:
# Health check
curl http://localhost:8888/

# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"message": "Hello from GPU!"}}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"data": "test"}'
Queue-based endpoints require the {"input": {...}} wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
Open http://localhost:8888/docs for the interactive API explorer.
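To call the same routes from Python instead of curl, a standard-library client is enough. This sketch assumes the server is running on the default localhost:8888 and reuses the example routes above (/gpu_worker/runsync and /lb_worker/process). The helper names are hypothetical. It shows the one formatting difference that matters: queue-based endpoints get the {"input": {...}} wrapper, while load-balanced endpoints get the raw JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # default flash run host and port


def queue_payload(data: dict) -> dict:
    """Wrap data in the {"input": {...}} envelope that queue-based endpoints expect."""
    return {"input": data}


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for a dev-server route such as /gpu_worker/runsync."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """Send the request to the running dev server and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, post_json("/gpu_worker/runsync", queue_payload({"message": "Hello from GPU!"})) mirrors the first curl call, and post_json("/lb_worker/process", {"data": "test"}) mirrors the second. The shape of each response depends on your handler.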

Requirements

  • RUNPOD_API_KEY must be set in your .env file or environment.
  • A valid Flash project structure (created by flash init or manually).
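A quick pre-flight check can confirm that RUNPOD_API_KEY is available before you start the server. The sketch below is a hypothetical helper, not how Flash itself loads configuration. Its .env parser is deliberately simplified (KEY=VALUE lines, # comments, surrounding quotes stripped). A library such as python-dotenv handles the full syntax.

```python
import os
from pathlib import Path
from typing import Dict


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (simplified, no expansion)."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_runpod_api_key(env_file: Path = Path(".env")) -> bool:
    """True if RUNPOD_API_KEY is set in the environment or in the .env file."""
    if os.environ.get("RUNPOD_API_KEY"):
        return True
    return bool(read_dotenv(env_file).get("RUNPOD_API_KEY"))
```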

flash run vs flash deploy

| Aspect | flash run | flash deploy |
|---|---|---|
| Local development server | Yes (http://localhost:8888) | No |
| @Endpoint functions run on | Runpod Serverless | Runpod Serverless |
| Endpoint persistence | Temporary (live- prefix) | Persistent |
| Code updates | Automatic reload | Manual redeploy |
| Use case | Development | Production |