Releases: mkiol/dsnote
Speech Note 4.8.3
Linux Desktop
Changes:
- General
- Fix: The model cannot be activated when the license file cannot be downloaded due to an error.
- Speech to Text
- Fix: App crashes when WhisperCpp is used on certain CPUs.
- Text to Speech
- Fix: The Coqui XTTS model license cannot be downloaded.
- Translator
- Fix: App crashes when the translator is used on certain CPUs.
Speech Note 4.8.2
Linux Desktop
Changes:
- Text to Speech
- New Piper voices for Argentine Spanish, Hindi, Malayalam and Nepali
- Fix: Using the Coqui TTS engine causes the app to crash on some platforms.
- Fix: Crash when the TTS engine generates a corrupted audio file
- Speech to Text
- New languages enabled in Whisper: Azerbaijani, Belarusian, Kannada, Malayalam, Tamil
- Flatpak
- Downgrade numba Python package to 0.60.0 version
Sailfish OS
Changes:
- Text to Speech
- New Piper voices for Argentine Spanish, Hindi, Malayalam and Nepali
- Fix: Crash when the TTS engine generates a corrupted audio file
- Speech to Text
- New languages enabled in Whisper: Azerbaijani, Belarusian, Kannada, Malayalam, Tamil
Speech Note 4.8.1
Linux Desktop
Changes:
- Translator
- Fix: Model download error for Portuguese, Dutch, Persian, Norwegian and Icelandic languages.
- Updated models with improved accuracy: German to English, Dutch to English, English to Ukrainian, English to Hungarian, English to Catalan, Catalan to English, English to Lithuanian, English to Latvian, English to Slovenian, Slovenian to English, English to Slovak, English to Russian
- New models: Azerbaijani to English, Belarusian to English, Bengali to English, Gujarati to English, Hebrew to English, Hindi to English, Kannada to English, Malayalam to English, Malay to English, Albanian to English, Tamil to English
- Speech to Text
- New very large Vosk model for German language: Tuda-DE Large
- Text to Speech
- Coqui MMS models for the following new languages: Kannada, Malayalam, Tamil
- User Interface
- Speech Note has been translated into German language.
Sailfish OS
Changes:
- Translator
- Fix: Model download error for Portuguese, Dutch, Persian, Norwegian and Icelandic languages.
- Updated models with improved accuracy: German to English, Dutch to English, English to Ukrainian, English to Hungarian, English to Catalan, Catalan to English, English to Lithuanian, English to Latvian, English to Slovenian, Slovenian to English, English to Slovak, English to Russian
- New models: Azerbaijani to English, Belarusian to English, Bengali to English, Gujarati to English, Hebrew to English, Hindi to English, Kannada to English, Malayalam to English, Malay to English, Albanian to English, Tamil to English
- User Interface
- Speech Note has been translated into German language.
Speech Note 4.8.0
Linux Desktop
Video presentation of all new features: https://www.youtube.com/watch?v=ww6skKOOzZ8
Changes:
- General
- Case-sensitive matching in Rules
- User Interface
- Speech Note has been translated into Arabic, Catalan, Spanish, Turkish and French-Canadian languages.
- Command line option and DBus API for exporting synthesized speech to an audio file instead of playing it aloud. Use --output-file together with start-reading-clipboard or start-reading-text actions.
- Speech to Text
- New CrisperWhisper model for FasterWhisper engine. CrisperWhisper is designed for fast, precise, and verbatim speech recognition with accurate word-level timestamps. Unlike the original Whisper, which tends to omit disfluencies and follows more of an intended transcription style, CrisperWhisper aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters and false starts. CrisperWhisper model is enabled only for English and German languages.
- New KBLab Whisper models for Swedish. The National Library of Sweden has released fine-tuned STT models trained on its library collections. The models have significantly improved accuracy compared to regular Whisper models.
- FUTO Whisper models. New models used in the FUTO mobile keyboard app.
- Using an existing note as the initial context in decoding. This has the potential to improve transcription quality and reduce the "hallucination" problem. If you observe a degradation in quality, turn off the Use note as context option.
- Option to pause listening while processing. This option can be useful when Listening mode is Always on. By default, listening continues even when a piece of audio data is being processed. Using this option, you can temporarily pause listening for the duration of processing.
- Option to play an audible tone when starting and stopping listening
- Text to Speech
- Kokoro TTS engine. Kokoro is a compact yet powerful open-source multilingual TTS engine. Despite its modest size (trained on less than 100 hours of audio), it delivers impressive results. Kokoro voices are enabled for: English, Chinese, Japanese, Hindi, Italian, French, Spanish and Portuguese.
- F5-TTS engine. The F5-TTS provides exceptional voice cloning capabilities. The currently enabled model works with English and Chinese languages. F5-TTS works best with CUDA acceleration. CPU only processing can be very slow.
- Parler-TTS engine. Parler-TTS can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc). The speaker's characteristics are defined by a text description (prompt). To use Parler-TTS models, you need to configure a Text voice profile. This can be done in the Voice profiles menu. Parler-TTS primarily supports English, but a multilingual model for French, Spanish, Portuguese, Polish, German, Dutch and Italian is also included. Currently, the multilingual model provides rather poor quality and not entirely usable speech. Parler-TTS works best with CUDA acceleration. CPU only processing can be very slow.
- S.A.M. TTS engine. S.A.M. is a small speech synthesizer designed for the Commodore 64. It features a robotic voice that evokes a strong sense of nostalgia. The S.A.M. voice is available in English only.
- Normalize audio setting option. Use this option to enable/disable audio volume normalization. The volume is normalized independently for each sentence, which can lead to unstable volume levels in different sentences. Disable this option if you observe this problem.
- New Piper voices for Dutch, Finnish, German and Luxembourgish
- New RHVoice voice for Spanish
- Updated RHVoice voice for Czech
- Translator
- New models: English to Chinese, English to Arabic, Arabic to English, English to Korean, English to Japanese
- Accessibility (Wayland)
- Support for Insert into active window under Wayland. Using start-listening-active-window or start-listening-translate-active-window actions you can directly insert the decoded text into any window which is currently in focus. This feature worked under X11 only, but now it is also supported under Wayland. For actions to work, ydotool daemon must be installed and running. If you are using Flatpak, also make sure that the application has permission to access ydotool daemon socket file.
- Support for Global keyboard shortcuts under Wayland. Global keyboard shortcuts allow you to start or stop listening and reading using keyboard even when the application is not active (e.g. minimized or in the background). Until now, this capability was only available under X11. Now integration with XDG Desktop Portal has been added, making global keyboard shortcuts possible also under Wayland. For shortcuts to work, your desktop environment has to support GlobalShortcuts interface on XDG Desktop Portal service. Right now, GlobalShortcuts is only supported in KDE Plasma and latest GNOME.
- Flatpak
- Python support enabled in Tiny and ARM packages. Python libraries are not included in Tiny or ARM packages, but using the Location of Python libraries option, you can set an external directory that contains the libraries. Make sure that the Flatpak application has permissions to access this directory.
- Flatpak runtime update to version 5.15-24.08
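The export-to-file option listed under User Interface can be driven from a script. The sketch below only builds the argument list and does not launch the application. It assumes the executable is called `dsnote` and that actions are selected with an `--action` option; neither detail is stated in these notes, and `hello.wav` is a hypothetical file name. The `--output-file` and `--text` options and the two action names are taken from the notes.

```python
import shlex


def export_speech_args(output_file, text=None, executable="dsnote"):
    """Build an argv list that asks Speech Note to write speech to a file.

    With `text` set, the start-reading-text action is used; otherwise the
    clipboard content is read via start-reading-clipboard.
    """
    action = "start-reading-text" if text is not None else "start-reading-clipboard"
    # ASSUMPTION: "--action" and the executable name are not confirmed by the notes.
    args = [executable, "--action", action]
    if text is not None:
        args += ["--text", text]
    args += ["--output-file", output_file]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection instead of running it.
    print(shlex.join(export_speech_args("hello.wav", text="Hello from Speech Note")))
```

To actually run the command, the list can be passed to `subprocess.run(args, check=True)`. When using Flatpak, the executable part would need to be replaced with the matching `flatpak run` invocation.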
Sailfish OS
Changes:
- User Interface
- Speech Note has been translated into Arabic, Catalan, Spanish, Turkish and French-Canadian languages.
- Speech to Text
- New KBLab Whisper models for Swedish. The National Library of Sweden has released fine-tuned STT models trained on its library collections. The models have significantly improved accuracy compared to regular Whisper models.
- FUTO Whisper models. New models used in the FUTO mobile keyboard app.
- Using an existing note as the initial context in decoding. This has the potential to improve transcription quality and reduce the "hallucination" problem. If you observe a degradation in quality, turn off the Use note as context option.
- Option to pause listening while processing. This option can be useful when Listening mode is Always on. By default, listening continues even when a piece of audio data is being processed. Using this option, you can temporarily pause listening for the duration of processing.
- Option to play an audible tone when starting and stopping listening
- Text to Speech
- S.A.M. TTS engine. S.A.M. is a small speech synthesizer designed for the Commodore 64. It features a robotic voice that evokes a strong sense of nostalgia. The S.A.M. voice is available in English only.
- Normalize audio setting option. Use this option to enable/disable audio volume normalization. The volume is normalized independently for each sentence, which can lead to unstable volume levels in different sentences. Disable this option if you observe this problem.
- New Piper voices for Dutch, Finnish, German and Luxembourgish
- New RHVoice voice for Spanish
- Updated RHVoice voice for Czech
- Translator
- New models: English to Chinese, English to Arabic, Arabic to English, English to Korean, English to Japanese
Speech Note 4.8.0 Beta4
Speech Note 4.7.1
Linux Desktop
Changes:
- General
- Fix: The application failed to start when the processor did not support AVX CPU extension.
- Translator:
- New models: Korean to English, Japanese to English
- Updated models: Chinese to English
Speech Note 4.7.0
Linux Desktop
Changes:
- General
- Rules for text transformations that can be applied after Speech to Text or before Text to Speech. With Rules, you can easily and flexibly correct errors in decoded text or correct mispronounced words.
- New modes for inserting text at the cursor position or replacing the current note. To insert text at the cursor position rather than at the end of the note, change Text appending mode option to Add at the cursor position in the settings. When the Replace an existing note option is set, whenever new text is added, it will replace the existing note.
- DBus API for integration with external applications
- User Interface:
- Speech Note has been translated into Slovenian language.
- Status indication in the system tray icon. When using the system tray icon, statuses such as processing, listening, etc. are presented with an animated tray icon.
- Models grouped by type in model browser. To improve usability, instead of a list containing models of all types, models are grouped by type in separate tabs.
- New General and Advanced tabs in Settings
- Command line options for printing available or active model IDs. Use --print-available-models or --print-active-model to list all available models or the currently active model.
- Command line option to print the current status of the application. Use --print-state to see the current state. This option can be useful when integrating with external programs or widgets on the desktop.
- Speech to Text:
- Support for Vulkan GPU acceleration in WhisperCpp. Vulkan acceleration enables much faster STT decoding with Intel, AMD or NVIDIA graphics cards. With Vulkan, decoding is quicker than with OpenVINO, OpenCL and ROCm, but still may be slightly slower compared to CUDA. The biggest advantage of Vulkan is that you can use it without installing any GPU acceleration add-ons.
- New Whisper Large Turbo model for both WhisperCpp and FasterWhisper. Turbo is a finetuned version of a pruned Whisper Large-v3. It's the exact same model, except that the number of decoding layers has been reduced. As a result, the model is way faster, at the expense of a minor quality degradation. Unlike the regular Large model, the Turbo model cannot translate into English.
- Simplified engine configuration options. Instead of multiple options, you can now select a Profile, which allows you to change the engine's processing parameters. There are three profiles to choose from: Best Performance, Best Quality and Custom.
- Echo mode. After processing, the decoded text will be immediately read out using the currently set Text to Speech model.
- Text to Speech:
- New Piper voice for Latvian
- New WhisperSpeech Small model for: English, Italian, German, French, Spanish, Dutch and Portuguese
- Translator:
- New models: English to Finnish, English to Turkish, English to Swedish, Swedish to English, English to Slovak, English to Indonesian, English to Romanian, English to Greek, Chinese to English
- Updated models: English to Catalan, English to Russian, English to Ukrainian, English to Czech
- Accessibility:
- Option to scan special key strokes when setting keyboard shortcuts (X11 only). If you want to use special keys as shortcuts (so-called "multimedia keys"), instead of typing their names, you can automatically set the key by pressing it.
- Keyboard shortcuts enabled for several user interface elements. Elements such as menu items or buttons can be controlled using the keyboard shortcuts. Examples: Switch to Notepad (Ctrl+N), Switch to Translator (Ctrl+T), Open Languages (Ctrl+L), Read (Ctrl+Alt+Shift+R), Listen (Ctrl+Alt+Shift+L), Stop (Ctrl+Alt+Shift+S), Cancel (Ctrl+Alt+Shift+C), Pause (Ctrl+Alt+Shift+P) and more...
- New Actions and global keyboard shortcut to force translation of text in STT: start-listening-translate, start-listening-translate-active-window, start-listening-translate-clipboard. The decoded text is always translated into English when the "translate" action is triggered. This only works when using Whisper models.
- New Action to read text from the command-line option: start-reading-text. To pass text, use the --text option in the command-line interface.
- Flatpak:
- The Flatpak GPU acceleration add-on for AMD is no longer recommended. Better results can be achieved with Vulkan acceleration, which does not require the add-on.
- whisper.cpp update to version 1.7.1
- PyTorch update to version 2.5.1
- ROCm update to version 6.2.2 (AMD add-on)
- cuDNN update to version 9.5.1 (NVIDIA add-on)
Video presentation of all new features: https://www.youtube.com/watch?v=cEht4Fts6Bo
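The --print-state, --print-active-model and --print-available-models options are meant for integration with external programs, so a small wrapper may be handy. The following is a minimal sketch under stated assumptions: the executable name `dsnote` is a guess, and because the notes do not describe the output format, the helper returns raw output lines without parsing them.

```python
import subprocess

# Query options named in the 4.7.0 release notes.
QUERY_OPTIONS = ("--print-state", "--print-active-model", "--print-available-models")


def query_args(option, executable="dsnote"):
    """Return the argv list for one of the documented query options."""
    if option not in QUERY_OPTIONS:
        raise ValueError(f"unsupported query option: {option}")
    # ASSUMPTION: the executable name is not confirmed by the notes.
    return [executable, option]


def run_query(option, executable="dsnote"):
    """Run the query and return its stdout as a list of lines (unparsed)."""
    result = subprocess.run(
        query_args(option, executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

A desktop widget could, for example, call `run_query("--print-state")` periodically and display whatever the application reports.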
Sailfish OS
Changes:
- General
- New mode for replacing the current note instead of appending new text to it. When the Replace an existing note option is set, whenever new text is added, it will replace the existing note.
- User Interface:
- Speech Note has been translated into Slovenian language.
- Speech to Text:
- Settings option Profile which allows you to change WhisperCpp processing parameters. There are two profiles to choose from: Best Performance, Best Quality.
- Echo mode. After processing, the decoded text will be immediately read out using the currently set Text to Speech model.
- Update the whisper.cpp library. This provides a 10% increase in STT speed with WhisperCpp models.
- Text to Speech:
- New Piper voice for Latvian
- Translator:
- New models: English to Finnish, English to Turkish, English to Swedish, Swedish to English, English to Slovak, English to Indonesian, English to Romanian, English to Greek, Chinese to English
- Updated models: English to Catalan, English to Russian, English to Ukrainian, English to Czech
Speech Note 4.6.1
Linux Desktop
Changes:
- General
- Fix: The application failed to start when the processor did not support the required CPU extension.
- User Interface
- Swedish translation has been updated.
- Accessibility
- Fix: Special keyboard keys were not supported as a keyboard shortcut. Examples: 'Favorites', 'Launch Mail', 'Refresh', 'Home Page', 'Calculator' and many more...
- Translator
- New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
- Updated models: English to Hungarian, Czech to English, Greek to English
Sailfish OS
Changes:
- User Interface
- Swedish translation has been updated.
- Translator
- New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
- Updated models: English to Hungarian, Czech to English, Greek to English
Speech Note 4.6.0
Linux Desktop
Changes:
- User Interface
- Speech Note has been translated into Norwegian language.
- Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
- Speech to Text
- The name of all Whisper models has been changed to WhisperCpp to better reflect the engine behind them.
- Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
- Separate settings for engines. The configuration of each engine has been separated in the settings. You can separately set the parameters for WhisperCpp and FasterWhisper. The new configuration parameters that have been added to the settings are: Number of simultaneous threads, Beam search width, Audio context size, Use Flash Attention.
- Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
- Support for OpenVINO hardware acceleration in WhisperCpp engine. With OpenVINO, decoding on CPU is much quicker. If you are not using GPU acceleration, it is recommended to enable OpenVINO in WhisperCpp engine settings. Currently, OpenVINO is enabled only for CPU acceleration.
- Option for inserting processing statistics. New settings option allows inserting processing related information to the text after decoding, such as processing time and audio length. This can be useful for comparing the performance of different models, engines and their parameters.
- Text to Speech
- Control tags for advanced TTS processing. Control tags allow you to dynamically change the speed of synthesized text or add silence between sentences. To use control tags, insert {speed: 0.5} or {silence: 1s} into the text. For convenience, you can also insert predefined control tags using text context menu Insert control tag.
- Welsh language. New language is enabled with Piper voice.
- New Piper voices for Spanish, Italian and English
- New RHVoice voices for Slovak and Croatian
- Translator
- Improved Translator UI. The Translate, Switch languages and Add buttons have been placed between text areas which is more convenient.
- Support for older hardware. Until now, the translator did not work on older processors without CPU AVX extension. Now there is no such restriction anymore.
- New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
- Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English
- Flatpak
- New library: OpenVINO version 2024.1.0.15008
- whisper.cpp update to version 1.6.2
- CTranslate2 update to version 4.3.1
Video presentation of all new features: https://www.youtube.com/watch?v=AVW5OY63wjg
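The control tags introduced under Text to Speech can also be generated programmatically before text is handed to the application. The sketch below covers only the two tag forms quoted in the notes, {speed: 0.5} and {silence: 1s}. The helper names are hypothetical, and the placement of tags relative to sentences in the sample text is an assumption rather than documented behavior.

```python
import re

# Matches the two tag forms quoted in the notes, e.g. {speed: 0.5} or {silence: 1s}.
TAG_RE = re.compile(r"\{(speed|silence):\s*([^}]+)\}")


def speed_tag(factor):
    """Format a speed control tag, e.g. speed_tag(0.5) gives '{speed: 0.5}'."""
    return f"{{speed: {factor}}}"


def silence_tag(seconds):
    """Format a silence control tag in seconds, e.g. silence_tag(1) gives '{silence: 1s}'."""
    return f"{{silence: {seconds}s}}"


def find_tags(text):
    """Return (name, value) pairs for every control tag found in the text."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_RE.finditer(text)]


# ASSUMPTION: tag placement below is illustrative only.
sample = (
    f"{speed_tag(0.5)} This sentence is read slowly. "
    f"{silence_tag(1)} This one follows a pause."
)
```

Calling `find_tags(sample)` is a quick way to check which tags a note contains before sending it to the TTS engine.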
Sailfish OS
Changes:
- User Interface
- Speech Note has been translated into Norwegian language.
- Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
- Option to enable/disable support for subtitles. Subtitle support is a niche functionality. To simplify the user interface, the subtitle options are not visible by default. To enable them, use the Subtitles support option in the settings.
- Speech to Text
- The name of all Whisper models has been changed to WhisperCpp to better reflect the engine behind them.
- Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
- Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
- Translate to English option for WhisperCpp models. When enabled, speech is automatically translated into English.
- Option for inserting processing statistics. New settings option allows inserting processing related information to the text after decoding, such as processing time and audio length. This can be useful for comparing the performance of different models, engines and their parameters.
- Text to Speech
- Welsh language. New language is enabled with Piper voice.
- New Piper voices for Spanish, Italian and English
- New RHVoice voices for Slovak and Croatian
- Translator
- New button for switching languages.
- New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
- Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English
Speech Note 4.5.0
Linux Desktop
Changes:
- User Interface
- Import subtitles embedded into video file. If your video file contains one or many subtitle streams, you can import the selected subtitles into notepad.
- Support for more subtitles formats. You can import and export subtitles in SRT, WebVTT and ASS formats.
- Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using unified menu bar option.
- Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
- Settings option for default action when importing note from a file. You can set Ask whether to add or replace, Add to an existing note or Replace an existing note.
- Enhanced text editor font settings. You can set the font family, style and size of the font used in the text editor.
- Text to Text repair options. With these options you can directly fix diacritical marks and punctuation in the text.
- Text context menu with additional options: Read selection and Translate selection. To activate context menu use mouse right click.
- New text appending style: After empty line
- System tray menu for changing active STT/TTS model
- User friendly names of audio input devices
- Simplified model filtering. It is now less flexible, but much easier to understand and use.
- Speech Note has been translated into Ukrainian and Russian languages.
- Fix: Cancellation was blocking the user interface.
- Speech to Text
- Text to Speech
- WhisperSpeech engine that generates voice with exceptional naturalness. The new engine comes with models for English and Polish languages. All models support voice cloning.
- New voice cloning model for Vietnamese: viXTTS. Model is a fine-tuned version of the phenomenal Coqui XTTS.
- New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
- New RHVoice voice for Czech
- Settings option to enable/disable speech synchronization with subtitle timestamps. This may be useful for creating voice overs.
- Mixing speech with audio from an existing file. When exporting to a file, you can overlay speech with audio from an existing media file. This can be useful when creating voice overs from subtitles.
- Context menu option to read from cursor position or read only selected text. To activate context menu use mouse right click.
- Speech audio is always normalized after TTS processing.
- Fix: Mimic3 models could not be downloaded.
- Translator
- New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
- Updated models: Czech and Lithuanian
- Handy buttons to quickly add translated text to the note or to replace it and switch languages
- Context menu option to translate from cursor position or translate only selected text. To activate context menu use mouse right click.
- Accessibility
- New Actions for STT/TTS models switching: switch-to-next-stt-model, switch-to-prev-stt-model, switch-to-next-tts-model, switch-to-prev-tts-model, set-stt-model, set-tts-model
- New global keyboard shortcuts for STT/TTS models switching (X11 only): Switch to next STT model, Switch to prev STT model, Switch to next TTS model, Switch to prev TTS model
- Toggle option for keyboard shortcuts (X11 only). When Toggle behavior is enabled, Start listening/reading shortcuts will also stop listening/reading if they are triggered while listening/reading is active.
- Fix: Accented characters (e.g.: ã, ê) were not transferred correctly to the active window.
- Flatpak
- Flatpak runtime update to version 5.15-23.08
- AMD ROCm update to version 5.7.3
- PyTorch update to version 2.2.1
- CTranslate2 update to version 4.2.1
- Faster-Whisper update to version 1.0.2
A video demonstration of all the changes in 4.5.0: https://www.youtube.com/watch?v=S9MJ7y8-bcw
Sailfish OS
Changes:
- User Interface
- Import subtitles in many formats and subtitles embedded into video file. You can import and export subtitles in SRT, WebVTT and ASS formats. If your video file contains one or many subtitle streams, you can import the selected subtitles into notepad.
- Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using unified pull-down menu option.
- Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
- Settings option for default action when importing note from a file. You can set Ask whether to add or replace, Add to an existing note or Replace an existing note.
- New text appending style: After empty line
- Speech Note has been translated into Ukrainian and Russian languages.
- Fix: Cancellation was blocking the user interface.
- Speech to Text
- Subtitles support in STT. To generate timestamped text in SRT format, change the text format to SRT Subtitles using the button at the bottom of the text area. Check the settings to find more subtitle options.
- Text to Speech
- Speech synchronized with subtitle timestamps in TTS. When the text format is set to SRT Subtitles, the generated speech will be synchronized with the subtitle timestamps. This can be useful if you want to make a voice over.
- New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
- New RHVoice voice for Czech
- Settings option to enable/disable speech synchronization with subtitle timestamps.
- Speech audio is always normalized after TTS processing.
- Translator
- New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
- Updated models: Czech and Lithuanian