synapse_sdk.plugins.actions.inference.deployment

Ray Serve deployment action base class.

DeploymentProgressCategories

class DeploymentProgressCategories:
    INITIALIZE = 'initialize'
    DEPLOY = 'deploy'
    REGISTER = 'register'

BaseDeploymentAction

class BaseDeploymentAction(BaseAction[P]):

Base class for deploying BaseServeDeployment subclasses to Ray Serve. Handles Ray initialization, serve deployment with decorators, and backend registration.

Class Attributes

Attribute    Type                           Description
progress     DeploymentProgressCategories   Progress category constants
entrypoint   type | None                    BaseServeDeployment subclass to deploy

Properties

client

@property
def client(self) -> BackendClient

Backend client from runtime context.

agent_client

@property
def agent_client(self) -> AgentClient

Agent client for serve application registration.

Methods

ray_init

def ray_init(self, **kwargs: Any) -> None

Initialize the Ray cluster connection. Connects to the cluster specified by the environment.

deploy

def deploy(self) -> None

Deploy the entrypoint class to Ray Serve. Automatically applies @serve.deployment and @serve.ingress(app) decorators, using get_ray_actor_options() for resource configuration and get_route_prefix() for URL routing.

Runs _check_serve_capacity(ray_actor_options, serve_options) immediately before serve.run(...). Raises RuntimeError when the agent denies capacity or is unreachable (see below).

_check_serve_capacity (SYN-7005)

def _check_serve_capacity(
    self,
    ray_actor_options: dict[str, Any],
    serve_options: dict[str, Any],
) -> None

Pre-flight cluster capacity gate. Calls ctx.agent_client.check_feasibility(kind='serve', ...) once before serve.run(...). See Inference Actions — Capacity Gate for the user-facing branch / payload matrix.

Raises:

  • RuntimeError('Serve deploy denied: insufficient cluster capacity ([reasons])') when agent returns allowed=False.
  • RuntimeError('Serve deploy capacity check failed: agent unreachable ({error_type})') on ClientError / ClientTimeoutError / TimeoutError / ConnectionError / OSError / malformed response. Original exception attached via __cause__.

Graceful skip: when ctx.agent_client is None, the gate emits a WARNING ('serve capacity gate skipped: agent_client not provided') and returns without raising.

Module-level constants (_CAPACITY_DENIED_MSG, _CAPACITY_UNREACHABLE_MSG, _CAPACITY_SKIP_LOG) are exported for downstream test monkey-patching.
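The three branches described above (skip, unreachable, denied) can be sketched in plain Python. This is an illustrative stand-in, not the SDK source: the function name, the keyword arguments passed to check_feasibility beyond kind, the dict-shaped response with 'allowed' and 'reasons' keys, and the stub agent classes are all assumptions. The SDK-specific ClientError / ClientTimeoutError types are left out so the sketch needs only the standard library.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("capacity_gate_sketch")

# Templates modeled on the documented messages; exact formatting is assumed.
CAPACITY_DENIED_MSG = "Serve deploy denied: insufficient cluster capacity ({reasons})"
CAPACITY_UNREACHABLE_MSG = "Serve deploy capacity check failed: agent unreachable ({error_type})"
CAPACITY_SKIP_LOG = "serve capacity gate skipped: agent_client not provided"


def check_serve_capacity(
    agent_client: Optional[Any],
    ray_actor_options: dict[str, Any],
    serve_options: dict[str, Any],
) -> None:
    """Hypothetical re-creation of the pre-flight gate's control flow."""
    if agent_client is None:
        # Graceful skip: warn and let the deploy proceed.
        logger.warning(CAPACITY_SKIP_LOG)
        return
    try:
        # Keyword names other than `kind` are assumptions for illustration.
        response = agent_client.check_feasibility(
            kind="serve",
            ray_actor_options=ray_actor_options,
            serve_options=serve_options,
        )
        allowed = response["allowed"]  # malformed responses fail here
    except (TimeoutError, ConnectionError, OSError, KeyError, TypeError) as exc:
        message = CAPACITY_UNREACHABLE_MSG.format(error_type=type(exc).__name__)
        raise RuntimeError(message) from exc  # original kept on __cause__
    if not allowed:
        reasons = response.get("reasons", [])
        raise RuntimeError(CAPACITY_DENIED_MSG.format(reasons=reasons))


class DenyingAgent:
    """Stub agent that always denies capacity."""

    def check_feasibility(self, **kwargs: Any) -> dict[str, Any]:
        return {"allowed": False, "reasons": ["insufficient GPU"]}


class UnreachableAgent:
    """Stub agent that simulates a network failure."""

    def check_feasibility(self, **kwargs: Any) -> dict[str, Any]:
        raise ConnectionError("agent down")
```

Because both failure branches surface as RuntimeError, a caller that needs to tell them apart can inspect __cause__: in this sketch it is set only on the unreachable branch.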

register_serve_application

def register_serve_application(self) -> int | None

Register the deployed serve application with the Synapse backend.

Returns: Serve application ID if created, None otherwise.

Configuration Methods

Override these to customize deployment behavior:

get_serve_app_name

def get_serve_app_name(self) -> str

Default: value of the SYNAPSE_PLUGIN_RELEASE_CODE environment variable.

get_route_prefix

def get_route_prefix(self) -> str

Default: /{SYNAPSE_PLUGIN_RELEASE_CHECKSUM} or /{md5(app_name)}.
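One plausible reading of this default is sketched below. It assumes the MD5 form is a fallback used when SYNAPSE_PLUGIN_RELEASE_CHECKSUM is unset or empty, and that the hash is the hex digest of the UTF-8 encoded app name. Neither detail is confirmed here, and default_route_prefix is a hypothetical helper name.

```python
import hashlib
import os


def default_route_prefix(app_name: str) -> str:
    """Sketch of the documented default; the fallback order is an assumption."""
    checksum = os.environ.get("SYNAPSE_PLUGIN_RELEASE_CHECKSUM")
    if checksum:
        return f"/{checksum}"
    # Assumed fallback: hex MD5 of the app name, used as a stable URL segment.
    digest = hashlib.md5(app_name.encode("utf-8")).hexdigest()
    return f"/{digest}"
```

Either branch yields a single path segment with a leading slash, which is the shape Ray Serve expects for a route prefix.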

get_ray_actor_options

def get_ray_actor_options(self) -> dict[str, Any]

Default: extracts num_cpus, num_gpus, and memory from the action params.
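A minimal sketch of that extraction follows, assuming the params are available as a plain dict and that unset (None) values are omitted rather than passed to Ray. The real params object may be a model class rather than a dict, and default_ray_actor_options is a hypothetical name.

```python
from typing import Any

# The three resource keys named in the documented default.
RESOURCE_KEYS = ("num_cpus", "num_gpus", "memory")


def default_ray_actor_options(params: dict[str, Any]) -> dict[str, Any]:
    """Copy only the resource keys that are present and not None."""
    return {key: params[key] for key in RESOURCE_KEYS if params.get(key) is not None}
```

Filtering out None keeps Ray's own defaults in effect for any resource the action does not specify.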

get_runtime_env

def get_runtime_env(self) -> dict[str, Any]

Default: Empty dict {}.
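The override pattern for these methods might look like the sketch below. To stay runnable without the SDK or Ray installed, it subclasses a hypothetical stand-in (StandInDeploymentAction) that only mimics the four configuration methods. In real code the parent would be BaseDeploymentAction, and the placeholder return values here are not the SDK defaults.

```python
from typing import Any


class StandInDeploymentAction:
    """Hypothetical stand-in for BaseDeploymentAction (not the SDK class)."""

    def get_serve_app_name(self) -> str:
        return "demo-app"  # placeholder; the SDK reads an env var here

    def get_route_prefix(self) -> str:
        return "/default"  # placeholder value

    def get_ray_actor_options(self) -> dict[str, Any]:
        return {}  # placeholder; the SDK derives this from action params

    def get_runtime_env(self) -> dict[str, Any]:
        return {}


class MyDeploymentAction(StandInDeploymentAction):
    """Customizes routing, resources, and the runtime environment."""

    def get_route_prefix(self) -> str:
        # Human-readable route instead of a checksum-based one.
        return f"/models/{self.get_serve_app_name()}"

    def get_ray_actor_options(self) -> dict[str, Any]:
        # Start from the parent's options and request one GPU if unset.
        options = super().get_ray_actor_options()
        options.setdefault("num_gpus", 1)
        return options

    def get_runtime_env(self) -> dict[str, Any]:
        # `env_vars` is a standard Ray runtime_env field.
        return {"env_vars": {"MODEL_PRECISION": "fp16"}}
```

Calling super() in get_ray_actor_options keeps whatever the parent derived and only fills in what is missing, which is usually safer than replacing the dict wholesale.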

Usage

See Inference Actions for a complete example.