@Endpoint functions execute on Runpod Serverless.
Example
Start the development server with defaults:
Flags
Host address to bind the server to.
Port number to bind the server to.
Enable or disable auto-reload on code changes. Enabled by default.
--auto-provision
Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
Endpoint descriptions from docstrings
Flash extracts the first line of each function’s docstring and uses it in two places:
- Startup table: The “Description” column shows the docstring when the server starts.
- Swagger UI: The endpoint summary in the API explorer at /docs.
Add docstrings to your @Endpoint functions to make your API self-documenting:
When you run flash run, the startup table displays “Analyze text and return sentiment scores” as the description for this endpoint, and the same text appears in the Swagger UI summary.
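As a rough illustration of the "first line of the docstring" rule, the sketch below pulls that line out with the standard library. It is not Flash's implementation: `first_docstring_line` and `analyze_text` are hypothetical names, and the function is left undecorated because the `@Endpoint` arguments are not shown in this section.

```python
import inspect


def first_docstring_line(func):
    """Return the first line of func's docstring, or "" if it has none."""
    doc = inspect.getdoc(func)  # normalizes indentation; None when no docstring
    if not doc:
        return ""
    return doc.strip().splitlines()[0].strip()


# Plain function standing in for an @Endpoint function (decorator omitted on purpose).
def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores

    Anything after the first line would not appear in a one-line description.
    """
    # Placeholder scores; a real handler would compute these.
    return {"text": text, "positive": 0.0, "negative": 0.0}


description = first_docstring_line(analyze_text)
print(description)
```

Keeping the first docstring line short and self-contained matters here, since only that line would be surfaced as the description.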
Architecture
With flash run, Flash starts a local development server alongside remote Serverless endpoints:
Key points:
- A local development server provides a convenient testing interface at localhost:8888.
- @Endpoint functions deploy to Runpod Serverless with a live- prefix to distinguish them from production.
- Code changes are picked up automatically without restarting the server.
- The development server routes requests to appropriate remote endpoints.
This differs from flash deploy, where all endpoints run on Runpod without a local server.
Auto-provisioning
By default, endpoints are provisioned lazily on the first @Endpoint function call. Use --auto-provision to provision all endpoints at server startup:
How it works
- Discovery: Scans your app for @Endpoint decorated functions.
- Deployment: Deploys resources concurrently (up to 3 at a time).
- Confirmation: Asks for confirmation if deploying more than 5 endpoints.
- Caching: Stores deployed resources in .runpod/resources.pkl for reuse.
- Updates: Recognizes existing endpoints and updates if configuration changed.
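The steps above can be pictured with a small standard-library sketch: bounded concurrency, a confirmation threshold, and a pickle-backed cache. This is an assumption-laden model, not Flash's code. `fake_deploy`, `provision_all`, and `needs_confirmation` are hypothetical names, the cache is assumed to be a plain dict, and the file is written to a temporary directory rather than a real project. The "Updates" step (detecting configuration changes) is not modeled.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_CONCURRENT = 3     # "up to 3 at a time"
CONFIRM_THRESHOLD = 5  # confirm when deploying more than 5 endpoints


def needs_confirmation(names):
    """True when the batch is large enough to warrant asking the user."""
    return len(names) > CONFIRM_THRESHOLD


def fake_deploy(name):
    """Hypothetical stand-in for a real deployment call."""
    return {"name": name, "status": "ready"}


def provision_all(names, cache_path, deploy=fake_deploy):
    """Deploy uncached endpoints with bounded concurrency, then save the cache."""
    cache = {}
    if cache_path.exists():
        cache = pickle.loads(cache_path.read_bytes())
    pending = [n for n in names if n not in cache]
    # The pool size caps how many deployments run at the same time.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        for result in pool.map(deploy, pending):
            cache[result["name"]] = result
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(pickle.dumps(cache))
    return pending  # names that actually had to be deployed


# Temporary directory so the sketch never touches a real project.
cache_file = Path(tempfile.mkdtemp()) / ".runpod" / "resources.pkl"
endpoints = ["analyze", "summarize", "embed", "classify"]
first_run = provision_all(endpoints, cache_file)   # deploys everything
second_run = provision_all(endpoints, cache_file)  # served from the cache
print(first_run, second_run)
```

The second call returning an empty list mirrors the resource reuse across server restarts described in this section.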
Benefits
- Zero cold start: All endpoints ready before you test them.
- Faster development: No waiting for deployment on first HTTP call.
- Resource reuse: Cached endpoints are reused across server restarts.
When to use
- Local development with multiple endpoints.
- Testing workflows that call multiple remote functions.
- Debugging where you want deployment separated from handler logic.
Provisioning modes
| Mode | When endpoints are deployed |
|---|---|
| Default (lazy) | On first @Endpoint function call |
| --auto-provision | At server startup |
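The difference between the two modes in the table can be modeled in a few lines. The sketch below is purely illustrative: `FakeEndpoint` and `start_server` are hypothetical names, and provisioning is reduced to flipping a flag.

```python
class FakeEndpoint:
    """Hypothetical stand-in for a remote endpoint; not a Flash class."""

    def __init__(self, name):
        self.name = name
        self.provisioned = False

    def provision(self):
        # A real implementation would deploy remote resources here.
        self.provisioned = True

    def call(self, payload):
        if not self.provisioned:
            # Lazy mode: the first call pays the provisioning cost.
            self.provision()
        return {"endpoint": self.name, "echo": payload}


def start_server(endpoints, auto_provision=False):
    """Provision everything up front only when auto_provision is set."""
    if auto_provision:
        for ep in endpoints:
            ep.provision()
    return endpoints


lazy = start_server([FakeEndpoint("a"), FakeEndpoint("b")])
lazy_before = [ep.provisioned for ep in lazy]
lazy[0].call({"x": 1})  # triggers provisioning of "a" only
lazy_after = [ep.provisioned for ep in lazy]

eager = start_server([FakeEndpoint("a"), FakeEndpoint("b")], auto_provision=True)
eager_state = [ep.provisioned for ep in eager]
print(lazy_before, lazy_after, eager_state)
```

In the lazy case only the endpoint that was actually called ends up provisioned, which is why the first request to each endpoint is the slow one.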
Testing your API
Once the server is running, test your endpoints:
Queue-based endpoints require the {"input": {...}} wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
Requirements
- RUNPOD_API_KEY must be set in your .env file or environment.
- A valid Flash project structure (created by flash init or manually).
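For the two payload shapes described under Testing your API, the following sketch shows how the same data would be serialized for each endpoint type. `build_body` is a hypothetical helper and the sample data is made up. Request URLs are left out because route paths are not given in this section.

```python
import json


def build_body(data, queue_based):
    """Serialize data; queue-based endpoints get the {"input": ...} wrapper."""
    payload = {"input": data} if queue_based else data
    return json.dumps(payload)


data = {"text": "Flash makes testing easy"}
queue_body = build_body(data, queue_based=True)   # wrapped in "input"
lb_body = build_body(data, queue_based=False)     # sent as-is
print(queue_body)
print(lb_body)
```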
flash run vs flash deploy
| Aspect | flash run | flash deploy |
|---|---|---|
| Local development server | Yes (http://localhost:8888) | No |
| @Endpoint functions run on | Runpod Serverless | Runpod Serverless |
| Endpoint persistence | Temporary (live- prefix) | Persistent |
| Code updates | Automatic reload | Manual redeploy |
| Use case | Development | Production |
Related commands
- flash init - Create a new project
- flash deploy - Deploy to production
- flash undeploy - Remove endpoints