Allow configuring keepalive_ping_timeout_seconds for tts.stream_websocket (AsyncFishAudio) #47

@stackpiles-naka

Description

I'm using AsyncFishAudio with tts.stream_websocket to stream TTS audio over WebSocket in a conversational application.

When generating relatively long responses, I frequently hit a WebSocketNetworkError coming from httpx_ws during ws.receive_bytes(). After some investigation, I found that increasing keepalive_ping_timeout_seconds on the underlying aconnect_ws call from the default 20 seconds to 60 seconds completely eliminates these errors for my use case.

Right now, the only way I can change this timeout is by modifying the library source directly, which is not maintainable.

Environment

  • Library: fishaudio (AsyncFishAudio)
  • Feature: AsyncTTSClient.stream_websocket / client.tts.stream_websocket(...)
  • Transport: WebSocket via httpx_ws.aconnect_ws
  • Use case: Long-form, streamed conversational TTS (responses can be quite long)

Current behavior

  • AsyncTTSClient.stream_websocket internally calls:
 async with aconnect_ws(
     "/v1/tts/live",
     client=self._client.client,
     headers={"model": model, "Authorization": f"Bearer {self._client.api_key}"}
 ) as ws:
     ...
  • For long TTS generations, there can be periods where no audio chunks or other frames are received for more than 20 seconds.
  • In those cases, httpx_ws raises a WebSocketNetworkError, which bubbles up to my application and breaks the TTS stream.
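To make the failure mode concrete, here is a toy asyncio simulation of a keepalive check. It is not httpx_ws code. As far as the httpx_ws documentation describes it, keepalive_ping_timeout_seconds bounds how long the client waits for a Pong after sending a Ping, so a peer that answers pings slowly during a long generation could trip it. The names `PingTimeout`, `fake_pong`, `keepalive_check`, and `survives`, and the scaled-down delays, are illustrative assumptions.

```python
import asyncio


class PingTimeout(Exception):
    """Stand-in for the network error raised when a pong arrives too late."""


async def fake_pong(delay: float) -> str:
    # Hypothetical peer that answers a ping after `delay` seconds.
    await asyncio.sleep(delay)
    return "pong"


async def keepalive_check(pong_delay: float, ping_timeout: float) -> str:
    # Wait for the pong, but give up once `ping_timeout` has elapsed.
    try:
        return await asyncio.wait_for(fake_pong(pong_delay), timeout=ping_timeout)
    except asyncio.TimeoutError as exc:
        raise PingTimeout(f"no pong within {ping_timeout}s") from exc


def survives(pong_delay: float, ping_timeout: float) -> bool:
    # True if the connection would stay up for this delay/timeout pair.
    try:
        asyncio.run(keepalive_check(pong_delay, ping_timeout))
        return True
    except PingTimeout:
        return False
```

With the same slow peer, a short timeout drops the connection while a longer one keeps it open, which matches what I see when moving from 20 to 60 seconds.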

Workaround

If I patch the library locally and change the aconnect_ws call to:

async with aconnect_ws(
    "/v1/tts/live",
    client=self._client.client,
    headers={"model": model, "Authorization": f"Bearer {self._client.api_key}"},
    keepalive_ping_timeout_seconds=60,
) as ws:

...then the WebSocketNetworkError no longer occurs, and long TTS responses stream successfully. However, this requires modifying the installed package, which is fragile and hard to maintain across upgrades.
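A possible stopgap that avoids editing the installed package is to bind a longer default onto the connect function and rebind the name the client looks up. The sketch below only demonstrates the binding step, using a stand-in `fake_connect` rather than the real aconnect_ws. Which fishaudio module holds the aconnect_ws reference is something I have not verified, so the rebinding itself is left out.

```python
import asyncio
import functools
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_connect(url, *, keepalive_ping_timeout_seconds=20.0, **kwargs):
    # Stand-in for the real connect helper; it only reports the timeout it received.
    yield {"url": url, "ping_timeout": keepalive_ping_timeout_seconds}


# Bind a longer default; an explicit keyword at the call site still wins.
patient_connect = functools.partial(
    fake_connect, keepalive_ping_timeout_seconds=60.0
)


async def opened_with(connect, **kwargs):
    # Open a connection and return the timeout the connect helper saw.
    async with connect("/v1/tts/live", **kwargs) as ws:
        return ws["ping_timeout"]
```

This is still a monkeypatch in practice: it depends on library internals and can break silently on upgrade, which is why a supported parameter would be preferable.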

Requested / expected behavior

It would be great if the keepalive timeout were configurable from the public API, for example by:

  • Adding an optional parameter to:
 async def stream_websocket(
     self,
     text_stream: AsyncIterable[Union[str, TextEvent, FlushEvent]],
     *,
     reference_id: Optional[str] = None,
     references: Optional[List[ReferenceAudio]] = None,
     format: Optional[AudioFormat] = None,
     latency: Optional[LatencyMode] = None,
     speed: Optional[float] = None,
     config: TTSConfig = TTSConfig(),
     model: Model = "s1",
     keepalive_ping_timeout_seconds: Optional[float] = None,  # for example
 ):

and passing it through to aconnect_ws, with a sensible default (e.g. current behavior).

  • Or alternatively, exposing this via some configuration or RequestOptions-like object.
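As a rough sketch of the second option, a small options object could collect the WebSocket settings and forward only the ones the caller set, so the transport defaults stay untouched otherwise. `WebSocketOptions` and `to_connect_kwargs` are hypothetical names for illustration, not part of fishaudio or httpx_ws.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class WebSocketOptions:
    # Hypothetical options object; None means "keep the transport's default".
    keepalive_ping_interval_seconds: Optional[float] = None
    keepalive_ping_timeout_seconds: Optional[float] = None

    def to_connect_kwargs(self) -> Dict[str, Any]:
        # Forward only the values the caller actually set.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Inside stream_websocket, the result could then be splatted into the existing call, e.g. `aconnect_ws(..., **options.to_connect_kwargs())`. One caveat: as far as I can tell, httpx_ws treats None for keepalive_ping_interval_seconds as "disable pings", so a separate sentinel would be needed if disabling keepalive should also be expressible through this object.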

Questions

  1. Is keepalive_ping_timeout_seconds intended to be user-configurable for long-running TTS streams?
  2. Would you be open to a PR that adds an optional parameter (or configuration mechanism) to control this timeout without patching the library source?
  3. Is there a recommended pattern in this library for configuring WebSocket-level timeouts for TTS streaming?

Having an official way to configure this timeout would make it much easier to support long-form conversational TTS without resorting to local patches. Thanks!

Labels: enhancement (New feature or request)