synapse_sdk.plugins.actions.inference.serve

Ray Serve deployment base class and model multiplexing utilities.

BaseServeDeployment

class BaseServeDeployment(BaseAction):

Base class for Ray Serve inference deployments. Inherits from BaseAction to enable action discovery. Provides model loading with JWT-based multiplexing support.

Class Attributes

| Attribute | Type | Description |
|---|---|---|
| `action_name` | `str` | Action name for discovery (e.g., `'inference'`) |
| `app` | `FastAPI` | FastAPI app instance; decorators are applied by `deploy()` |

Constructor

def __init__(self, backend_url: str) -> None
| Parameter | Type | Description |
|---|---|---|
| `backend_url` | `str` | URL of the Synapse backend for fetching models |

Methods

execute

def execute(self) -> None

No-op. Serve deployments handle inference via infer(), not execute().

infer_remote (classmethod)

@classmethod
def infer_remote(cls, params: dict[str, Any], ctx: Any) -> Any

Call the deployed serve endpoint for inference. Resolves the route prefix, creates a model-multiplexing token, and forwards the request.

Params keys:

| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | `int \| str` | No | `None` | Model ID for multiplexing header |
| `method` | `str` | No | `'post'` | HTTP method |
| `json` | `dict` | Yes | `{}` | Request body forwarded to the serve endpoint |

Route prefix resolution order:

  1. SYNAPSE_PLUGIN_RELEASE_CHECKSUM env var → /{checksum}
  2. SYNAPSE_PLUGIN_RELEASE_CODE env var → /{md5(code)}
  3. config.yaml in working directory → /{md5(code@version)}
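
As a sketch, a call might look like the following; `MyServeDeployment` stands in for a concrete subclass (a full sketch appears after the abstract methods below), and `ctx` is whatever context object the action framework supplies:

```python
# Hypothetical call; params keys follow the table above.
result = MyServeDeployment.infer_remote(
    params={
        'model': 42,                    # optional: sets the multiplexing header
        'method': 'post',               # optional: defaults to 'post'
        'json': {'inputs': ['hello']},  # required: body forwarded to the endpoint
    },
    ctx=ctx,  # context object passed through by the caller (assumed)
)
```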

get_model

async def get_model(self) -> Any

Get the current model using Ray Serve's multiplexed model ID from the request headers. Model-specific loading is implemented by the _get_model hook (see the subclass sketch below).

_get_model (abstract)

@abstractmethod
async def _get_model(self, model_info: dict[str, Any]) -> Any

Load model from extracted artifacts. Override to implement model-specific loading.

| Parameter | Type | Description |
|---|---|---|
| `model_info` | `dict` | Model metadata with a `'path'` key pointing to the extracted artifact directory |

infer (abstract)

@abstractmethod
async def infer(self, *args: Any, **kwargs: Any) -> Any

Run inference. Override and decorate with @app.post('/') to define the endpoint.
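
The two abstract hooks are typically overridden together. A minimal subclass sketch follows; the torch loading code, the `model.pt` filename, and the payload shape are illustrative assumptions, while the class attributes, the hooks, and the `@app.post('/')` decoration come from this reference:

```python
from typing import Any

from fastapi import FastAPI

from synapse_sdk.plugins.actions.inference.serve import BaseServeDeployment


class MyServeDeployment(BaseServeDeployment):
    action_name = 'inference'   # registered for action discovery
    app = FastAPI()             # class-level app; decorators applied by deploy()

    async def _get_model(self, model_info: dict[str, Any]) -> Any:
        # model_info['path'] points at the extracted artifact directory.
        # torch and the 'model.pt' filename are illustrative assumptions;
        # load whatever your framework produces.
        import torch

        return torch.load(f"{model_info['path']}/model.pt")

    @app.post('/')
    async def infer(self, payload: dict[str, Any]) -> dict[str, Any]:
        # get_model() resolves the per-request multiplexed model ID from
        # the request headers and loads it via _get_model.
        model = await self.get_model()
        return {'outputs': model(payload['inputs'])}  # illustrative call
```

Presumably `deploy()` wires the rest up in the usual Ray Serve ingress style (wrapping the class as a deployment, attaching `app`, and passing `backend_url` to the constructor); that wiring belongs to the SDK, not the subclass.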

create_serve_multiplexed_model_id

def create_serve_multiplexed_model_id(
model_id: int | str,
token: str,
backend_url: str,
tenant: str | None = None,
) -> str

Create a JWT-encoded model ID for serve multiplexing.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model_id` | `int \| str` | Yes | Model ID to encode |
| `token` | `str` | Yes | User access token |
| `backend_url` | `str` | Yes | Backend URL (used as the JWT secret) |
| `tenant` | `str \| None` | No | Tenant identifier |

Returns: JWT-encoded model token string for the serve_multiplexed_model_id header.
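
A sketch of attaching the token to a request against the deployed endpoint; the URLs, payload, and `requests` client are placeholders, and only the header name and the function signature come from this reference:

```python
import requests  # any HTTP client works; requests is just for illustration

from synapse_sdk.plugins.actions.inference.serve import create_serve_multiplexed_model_id

multiplexed_id = create_serve_multiplexed_model_id(
    model_id=42,                                # model to route to
    token='<user-access-token>',                # placeholder credential
    backend_url='https://backend.example.com',  # also used as the JWT secret
    tenant=None,
)

# The serve endpoint reads this header to resolve the model (see get_model above).
response = requests.post(
    'http://localhost:8000/<route-prefix>/',    # see route prefix resolution above
    headers={'serve_multiplexed_model_id': multiplexed_id},
    json={'inputs': ['hello']},
)
```

In practice `infer_remote` performs these steps for you; calling `create_serve_multiplexed_model_id` directly is only needed when building a custom client.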