Description
I'm using AsyncFishAudio with tts.stream_websocket to stream TTS audio over WebSocket in a conversational application.
When generating relatively long responses, I frequently hit a WebSocketNetworkError coming from httpx_ws during ws.receive_bytes(). After some investigation, I found that increasing keepalive_ping_timeout_seconds on the underlying aconnect_ws call from the default 20 seconds to 60 seconds completely eliminates these errors for my use case.
Right now, the only way I can change this timeout is by modifying the library source directly, which is not maintainable.
Environment
- Library: fishaudio (AsyncFishAudio)
- Feature: AsyncTTSClient.stream_websocket / client.tts.stream_websocket(...)
- Transport: WebSocket via httpx_ws.aconnect_ws
- Use case: Long-form, streamed conversational TTS (responses can be quite long)
Current behavior
AsyncTTSClient.stream_websocket internally calls:

```python
async with aconnect_ws(
    "/v1/tts/live",
    client=self._client.client,
    headers={"model": model, "Authorization": f"Bearer {self._client.api_key}"},
) as ws:
    ...
```
- For long TTS generations, there can be periods where no audio chunks or other frames are received for more than 20 seconds.
- In those cases, httpx_ws raises a WebSocketNetworkError, which bubbles up to my application and breaks the TTS stream.
Workaround
If I patch the library locally and change the aconnect_ws call to:
```python
async with aconnect_ws(
    "/v1/tts/live",
    client=self._client.client,
    headers={"model": model, "Authorization": f"Bearer {self._client.api_key}"},
    keepalive_ping_timeout_seconds=60,
) as ws:
    ...
```

With that change, the WebSocketNetworkError no longer occurs, and long TTS responses stream successfully. However, this requires modifying the installed package, which is fragile and hard to maintain across upgrades.
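Until an official option exists, a slightly less invasive stopgap than editing the installed file is to rebind aconnect_ws at runtime with a different default. The sketch below only shows the wrapping pattern. It uses a stand-in function instead of the real httpx_ws.aconnect_ws, and the helper name with_default_kwargs is hypothetical. Because aconnect_ws returns an async context manager, returning its result unchanged from a plain wrapper keeps `async with` working.

```python
import functools
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def with_default_kwargs(fn: Callable[..., T], **defaults: Any) -> Callable[..., T]:
    """Hypothetical helper: wrap fn so that missing keyword arguments get defaults.

    Keyword arguments passed explicitly by the caller always override the defaults.
    """

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        merged = {**defaults, **kwargs}  # caller's kwargs win
        return fn(*args, **merged)

    return wrapper


# Stand-in for httpx_ws.aconnect_ws so the sketch runs on its own;
# it simply echoes back the arguments it received.
def fake_aconnect_ws(url: str, **kwargs: Any) -> dict:
    return {"url": url, **kwargs}


patched = with_default_kwargs(fake_aconnect_ws, keepalive_ping_timeout_seconds=60)
print(patched("/v1/tts/live"))
# {'url': '/v1/tts/live', 'keepalive_ping_timeout_seconds': 60}
```

To apply this for real, the wrapper would have to be assigned to whatever module-level name the fishaudio TTS client looks up when it calls aconnect_ws (that module path is not given here). This is still a monkeypatch tied to library internals, so it shares the upgrade fragility described above, which is why a public option is preferable.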
Requested / expected behavior
It would be great if the keepalive timeout were configurable from the public API, for example by:
- Adding an optional parameter to stream_websocket:

  ```python
  async def stream_websocket(
      self,
      text_stream: AsyncIterable[Union[str, TextEvent, FlushEvent]],
      *,
      reference_id: Optional[str] = None,
      references: Optional[List[ReferenceAudio]] = None,
      format: Optional[AudioFormat] = None,
      latency: Optional[LatencyMode] = None,
      speed: Optional[float] = None,
      config: TTSConfig = TTSConfig(),
      model: Model = "s1",
      keepalive_ping_timeout_seconds: Optional[float] = None,  # for example
  ):
      ...
  ```

  and passing it through to aconnect_ws, keeping the current behavior as the default when it is not set.
- Or, alternatively, exposing this via a configuration or RequestOptions-like object.
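Either option comes down to forwarding a value to aconnect_ws only when the caller sets it. Here is a minimal sketch of the options-object variant, assuming a hypothetical WebSocketOptions class (nothing by that name exists in fishaudio today). Unset fields are left out of the kwargs entirely instead of being passed as None, so whatever default httpx_ws uses stays in effect. The same "omit if None" logic would apply to a plain keepalive_ping_timeout_seconds parameter on stream_websocket.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class WebSocketOptions:
    """Hypothetical options object for WebSocket-level settings (not part of fishaudio)."""

    keepalive_ping_timeout_seconds: Optional[float] = None

    def to_aconnect_ws_kwargs(self) -> Dict[str, Any]:
        # Forward only values the caller actually set; an unset field is omitted
        # so httpx_ws keeps its own default (the current behavior).
        kwargs: Dict[str, Any] = {}
        if self.keepalive_ping_timeout_seconds is not None:
            kwargs["keepalive_ping_timeout_seconds"] = self.keepalive_ping_timeout_seconds
        return kwargs


print(WebSocketOptions().to_aconnect_ws_kwargs())
# {}
print(WebSocketOptions(keepalive_ping_timeout_seconds=60).to_aconnect_ws_kwargs())
# {'keepalive_ping_timeout_seconds': 60}
```

Inside stream_websocket, the result could then be splatted into the existing call, e.g. `aconnect_ws("/v1/tts/live", client=..., headers=..., **options.to_aconnect_ws_kwargs())`, which leaves the call unchanged for users who pass no options.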
Questions
- Is keepalive_ping_timeout_seconds intended to be user-configurable for long-running TTS streams?
- Would you be open to a PR that adds an optional parameter (or configuration mechanism) to control this timeout without patching the library source?
- Is there a recommended pattern in this library for configuring WebSocket-level timeouts for TTS streaming?
Having an official way to configure this timeout would make it much easier to support long-form conversational TTS without resorting to local patches. Thanks!