Merge pull request #658 from harry0703/dev

bump version to 1.2.6
Merge branch 'add-siliconflow-tts' into dev
2025-05-10 14:14:42 +08:00 · 2025-05-10 14:13:37 +08:00 · 2025-05-10 14:13:18 +08:00 · 2025-05-10 14:12:11 +08:00 · 2025-05-10 14:11:26 +08:00 · 2025-05-10 14:10:42 +08:00
65 changed files with 5296 additions and 1404 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,81 @@
+name: 🐛 Bug
+description: Something is broken or not working as expected
+title: "Please enter a title here"
+labels:
+  - bug
+
+body:
+  - type: markdown
+    attributes:
+      value: |
+        **Before submitting this issue, please make sure you have read the following documentation: [Getting Started (English)](https://github.com/harry0703/MoneyPrinterTurbo/blob/main/README-en.md#system-requirements-) or [Quick Start (Chinese)](https://github.com/harry0703/MoneyPrinterTurbo/blob/main/README.md#%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B-).**
+
+        **Please fill in the following information:**
+  - type: checkboxes
+    attributes:
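+      label: Is there an existing issue for this?
+      description: |
+        Please check whether this issue has already been reported.
+
+        Before opening a new issue, use GitHub's issue search (including closed issues) or tools such as Google and StackOverflow to confirm that it is not a duplicate.
+
+        You may find that a solution already exists!
+      options:
+        - label: I have searched the existing issues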
+          required: true
+  - type: textarea
+    attributes:
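+      label: Current behavior
+      description: Describe what is currently happening.
+      placeholder: |
+        MoneyPrinterTurbo does not work as expected. When I perform a certain action, the video is not generated / the program throws an error...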
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Expected behavior
+      description: Describe what you expected to happen.
+      placeholder: |
+        When I perform a certain action, the program should...
+    validations:
+      required: true
+  - type: textarea
+    attributes:
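+      label: Steps to reproduce
+      description: Describe the steps to reproduce the issue. The more detail you provide, the easier it is to locate and fix the problem.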
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Stack trace / logs
+      description: |
+        If you have any stack traces or logs, paste them here. (Make sure they do not contain sensitive information.)
+    validations:
+      required: true
+  - type: input
+    attributes:
+      label: Python version
+      description: The Python version you were using when you encountered this issue.
+      placeholder: v3.13.0, v3.10.0, etc.
+    validations:
+      required: true
+  - type: input
+    attributes:
+      label: Operating system
+      description: The operating system you were running MoneyPrinterTurbo on when you encountered the issue.
+      placeholder: macOS 14.1, Windows 11, etc.
+    validations:
+      required: true
+  - type: input
+    attributes:
+      label: MoneyPrinterTurbo version
+      description: In which version of MoneyPrinterTurbo did you encounter this issue?
+      placeholder: v1.2.2, etc.
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Additional information
+      description: Is there anything else you would like to add? For example, screenshots or a screen recording of the issue.
+    validations:
+      required: false
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: false
--- a/.github/ISSUE_TEMPLATE/feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,38 @@
+name: ✨ Feature request
+description: Suggest a new idea for this project
+title: "Please enter a title here"
+labels:
+  - enhancement
+
+body:
+  - type: checkboxes
+    attributes:
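+      label: Is there an existing feature request for this?
+      description: Please make sure this feature request is not a duplicate.
+      options:
+        - label: I have searched the existing feature requests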
+          required: true
+  - type: textarea
+    attributes:
+      label: Problem
+      description: Please explain your feature request.
+      placeholder: I would like this to be possible
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Suggested solution
+      description: Describe the solution you have in mind.
+      placeholder: You could add this feature / change this workflow / use a certain approach
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Useful resources
+      description: Please provide any resources that would help implement your suggestion.
+  - type: textarea
+    attributes:
+      label: Additional information
+      description: Is there anything else you would like to add? For example, screenshots or a screen recording.
+    validations:
+      required: false
--- a/.gitignore
+++ b/.gitignore
@@ -20,3 +20,9 @@ node_modules
 /sites/docs/.vuepress/.cache
 # Static files directory generated by the default VuePress build
 /sites/docs/.vuepress/dist
+# Model directory
+/models/
+./models/*
+
+venv/
+.venv
--- a/.pdm-python
+++ b/.pdm-python
@@ -0,0 +1 @@
+./MoneyPrinterTurbo/.venv/bin/python
--- a/4
+++ b/4
@@ -1,5 +1,5 @@
 # Use an official Python runtime as a parent image
-FROM python:3.10-slim-bullseye
+FROM python:3.11-slim-bullseye

 # Set the working directory in the container
 WORKDIR /MoneyPrinterTurbo
@@ -41,4 +41,4 @@ CMD ["streamlit", "run", "./webui/Main.py","--browser.serverAddress=127.0.0.1","
 ## For Linux or MacOS:
 # docker run -v $(pwd)/config.toml:/MoneyPrinterTurbo/config.toml -v $(pwd)/storage:/MoneyPrinterTurbo/storage -p 8501:8501 moneyprinterturbo
 ## For Windows:
-# docker run -v %cd%/config.toml:/MoneyPrinterTurbo/config.toml -v %cd%/storage:/MoneyPrinterTurbo/storage -p 8501:8501 moneyprinterturbo
+# docker run -v ${PWD}/config.toml:/MoneyPrinterTurbo/config.toml -v ${PWD}/storage:/MoneyPrinterTurbo/storage -p 8501:8501 moneyprinterturbo
--- a/README-en.md
+++ b/README-en.md
@@ -35,9 +35,18 @@ like to express our special thanks to
 **RecCloud (AI-Powered Multimedia Service Platform)** for providing a free `AI Video Generator` service based on this
 project. It allows for online use without deployment, which is very convenient.

-https://reccloud.com
+- Chinese version: https://reccloud.cn
+- English version: https://reccloud.com

-![](docs/reccloud.com.jpg)
+![](docs/reccloud.cn.jpg)
+
+## Thanks for Sponsorship 🙏
+
+Thanks to Picwish https://picwish.cn for supporting and sponsoring this project, enabling continuous updates and maintenance.
+
+Picwish focuses on **image processing**, providing a rich set of **image processing tools** that greatly simplify complex operations and make image processing genuinely easier.
+
+![picwish.jpg](docs/picwish.jpg)

 ## Features 🎯

@@ -51,28 +60,26 @@ https://reccloud.com
  satisfactory one
 - [x] Supports setting the **duration of video clips**, facilitating adjustments to material switching frequency
 - [x] Supports video copy in both **Chinese** and **English**
-- [x] Supports **multiple voice** synthesis
+- [x] Supports **multiple voice** synthesis, with **real-time preview** of effects
 - [x] Supports **subtitle generation**, with adjustable `font`, `position`, `color`, `size`, and also
  supports `subtitle outlining`
 - [x] Supports **background music**, either random or specified music files, with adjustable `background music volume`
-- [x] Video material sources are **high-definition** and **royalty-free**
-- [x] Supports integration with various models such as **OpenAI**, **moonshot**, **Azure**, **gpt4free**, **one-api**,
-  **qianwen**, **Google Gemini**, **Ollama** and more
+- [x] Video material sources are **high-definition** and **royalty-free**, and you can also use your own **local materials**
+- [x] Supports integration with various models such as **OpenAI**, **Moonshot**, **Azure**, **gpt4free**, **one-api**,
+  **Qwen**, **Google Gemini**, **Ollama**, **DeepSeek**, **ERNIE** and more
+    - For users in China, **DeepSeek** or **Moonshot** is recommended as the LLM provider (both are directly accessible in China without a VPN, and new accounts receive free credits that are usually sufficient)

 ❓[How to Use the Free OpenAI GPT-3.5 Model?](https://github.com/harry0703/MoneyPrinterTurbo/blob/main/README-en.md#common-questions-)

 ### Future Plans 📅

-- [ ] Introduce support for GPT-SoVITS dubbing
-- [ ] Enhance voice synthesis with large models for a more natural and emotionally resonant voice output
-- [ ] Incorporate video transition effects to ensure a smoother viewing experience
-- [ ] Improve the relevance of video content
-- [ ] Add options for video length: short, medium, long
-- [ ] Package the application into a one-click launch bundle for Windows and macOS for ease of use
-- [ ] Enable the use of custom materials
-- [ ] Offer voiceover and background music options with real-time preview
-- [ ] Support a wider range of voice synthesis providers, such as OpenAI TTS, Azure TTS
-- [ ] Automate the upload process to the YouTube platform
+- [ ] GPT-SoVITS dubbing support
+- [ ] Optimize voice synthesis using large models for more natural and emotionally rich voice output
+- [ ] Add video transition effects for a smoother viewing experience
+- [ ] Add more video material sources, improve the matching between video materials and script
+- [ ] Add video length options: short, medium, long
+- [ ] Support more voice synthesis providers, such as OpenAI TTS
+- [ ] Automate upload to YouTube platform

 ## Video Demos 📺

@@ -115,10 +122,27 @@ https://reccloud.com
 - At least 4 CPU cores and 8 GB of memory are recommended; a GPU is not required
 - Windows 10 or macOS 11.0, or later versions

+## Quick Start 🚀
+
+Download the one-click startup package, extract it, and run it directly (the extraction path must not contain **Chinese characters**, **special characters**, or **spaces**)
+
+### Windows
+- Baidu Netdisk (v1.2.1, an older version): https://pan.baidu.com/s/1pSNjxTYiVENulTLm6zieMQ?pwd=g36q Extraction code: g36q
+
+After downloading, it is recommended to first **double-click** `update.bat` to update to the **latest code**, then double-click `start.bat` to launch
+
+After launch, the browser opens automatically (if the page is blank, try **Chrome** or **Edge**)
+
+### Other Systems
+
+One-click startup packages are not yet available for other systems. See the **Installation & Deployment** section below; deploying with **Docker** is recommended as the most convenient option.
+
 ## Installation & Deployment 📥

+### Prerequisites
+
 - Try to avoid using **Chinese paths** to prevent unpredictable issues
-- Ensure your **network** is stable, meaning you can access foreign websites normally
+- Ensure your **network** is stable; if you use a VPN, set it to `global traffic` mode

 #### ① Clone the Project

@@ -132,11 +156,6 @@ git clone https://github.com/harry0703/MoneyPrinterTurbo.git
 - Follow the instructions in the `config.toml` file to configure `pexels_api_keys` and `llm_provider`, and according to
  the llm_provider's service provider, set up the corresponding API Key

-#### ③ Configure Large Language Models (LLM)
-
-- To use `GPT-4.0` or `GPT-3.5`, you need an `API Key` from `OpenAI`. If you don't have one, you can set `llm_provider`
-  to `g4f` (a free-to-use GPT library https://github.com/xtekky/gpt4free)
-
 ### Docker Deployment 🐳

 #### ① Launch the Docker Container
@@ -152,6 +171,8 @@ cd MoneyPrinterTurbo
 docker-compose up
 ```

+> Note: Recent versions of Docker install Docker Compose as a plugin, so the start command becomes `docker compose up`
+
 #### ② Access the Web Interface

 Open your browser and visit http://0.0.0.0:8501
@@ -162,27 +183,28 @@ Open your browser and visit http://0.0.0.0:8080/docs Or http://0.0.0.0:8080/redo

 ### Manual Deployment 📦

-#### ① Create a Python Virtual Environment
+> Video tutorials
+>
+> - Complete usage demonstration: https://v.douyin.com/iFhnwsKY/
+> - How to deploy on Windows: https://v.douyin.com/iFyjoW3M

-It is recommended to create a Python virtual environment
-using [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)
+#### ① Install Dependencies
+
+It is recommended to use [pdm](https://pdm-project.org/en/latest/#installation)

 ```shell
 git clone https://github.com/harry0703/MoneyPrinterTurbo.git
 cd MoneyPrinterTurbo
-conda create -n MoneyPrinterTurbo python=3.10
-conda activate MoneyPrinterTurbo
-pip install -r requirements.txt
+pdm sync
 ```

 #### ② Install ImageMagick

 ###### Windows:

-- Download https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-29-Q16-x64-static.exe
+- Download ImageMagick from https://imagemagick.org/script/download.php and choose the Windows build. Make sure to select the **static library** version, such as ImageMagick-7.1.1-32-Q16-x64-**static**.exe
 - Install the downloaded ImageMagick and **do not change the installation path**
-- Modify the `config.toml` configuration file, set `imagemagick_path` to your actual installation path (if you didn't
-  change the path during installation, just uncomment it)
+- Modify the `config.toml` configuration file, set `imagemagick_path` to your actual installation path

 ###### MacOS:

@@ -209,14 +231,12 @@ Note that you need to execute the following commands in the `root directory` of
 ###### Windows

 ```bat
-conda activate MoneyPrinterTurbo
 webui.bat
 ```

 ###### MacOS or Linux

 ```shell
-conda activate MoneyPrinterTurbo
 sh webui.sh
 ```

@@ -235,13 +255,15 @@ online for a quick experience.

 A list of all supported voices can be viewed here: [Voice List](./docs/voice-list.txt)

+2024-04-16 v1.1.2: Added 9 new Azure voices. They require an API key to be configured and sound more realistic.
+
 ## Subtitle Generation 📜

 Currently, there are 2 ways to generate subtitles:

-- edge: Faster generation speed, better performance, no specific requirements for computer configuration, but the
+- **edge**: Faster generation speed, better performance, no specific requirements for computer configuration, but the
  quality may be unstable
-- whisper: Slower generation speed, poorer performance, specific requirements for computer configuration, but more
+- **whisper**: Slower generation speed, poorer performance, specific requirements for computer configuration, but more
  reliable quality

 You can switch between them by modifying the `subtitle_provider` in the `config.toml` configuration file
@@ -250,18 +272,22 @@ It is recommended to use `edge` mode, and switch to `whisper` mode if the qualit
 satisfactory.

 > Note:
-> If left blank, it means no subtitles will be generated.
+>
+> 1. In whisper mode, you need to download a model file of about 3GB from HuggingFace, so please ensure a good internet connection
+> 2. If left blank, it means no subtitles will be generated.

-**Download whisper**
-- Please ensure a good internet connectivity
-- `whisper` model can be downloaded from HuggingFace: https://huggingface.co/openai/whisper-large-v3/tree/main
+> Since HuggingFace is not accessible in China, you can use the following methods to download the `whisper-large-v3` model file

-After downloading the model to local machine, copy the whole folder and put it into the following path: `.\MoneyPrinterTurbo\models`
+Download links:

-This is what the final path should look like: `.\MoneyPrinterTurbo\models\whisper-large-v3`
+- Baidu Netdisk: https://pan.baidu.com/s/11h3Q6tsDtjQKTjUu3sc5cA?pwd=xjs9
+- Quark Netdisk: https://pan.quark.cn/s/3ee3d991d64b
+
+After downloading the model, extract it and place the entire directory in `.\MoneyPrinterTurbo\models`.
+The final file path should look like this: `.\MoneyPrinterTurbo\models\whisper-large-v3`

 ```
-MoneyPrinterTurbo  
+MoneyPrinterTurbo
  ├─models
  │   └─whisper-large-v3
  │          config.json
@@ -302,6 +328,16 @@ Once successfully started, modify the `config.toml` configuration as follows:
 - Change `openai_base_url` to `http://localhost:3040/v1/`
 - Set `openai_model_name` to `gpt-3.5-turbo`

+> Note: This method may be unstable
+
+### ❓AttributeError: 'str' object has no attribute 'choices'
+
+This issue is caused by the large language model not returning a correct response.
+
+This is most likely a network issue. Use a **VPN**, or set `openai_base_url` to your proxy; that should solve the problem.
+
+We also recommend using **Moonshot** or **DeepSeek** as the LLM provider, as these providers are faster to access and more stable from China.
+
 ### ❓RuntimeError: No ffmpeg exe could be found

 Normally, ffmpeg will be automatically downloaded and detected.
@@ -326,14 +362,14 @@ ffmpeg_path = "C:\\Users\\harry\\Downloads\\ffmpeg.exe"
 [issue 56](https://github.com/harry0703/MoneyPrinterTurbo/issues/56)

 ```
-failed to generate audio, maybe the network is not available. 
+failed to generate audio, maybe the network is not available.
 if you are in China, please use a VPN.
 ```

 [issue 44](https://github.com/harry0703/MoneyPrinterTurbo/issues/44)

 ```
-failed to download videos, maybe the network is not available. 
+failed to download videos, maybe the network is not available.
 if you are in China, please use a VPN.
 ```

@@ -353,6 +389,43 @@ For Linux systems, you can manually install it, refer to https://cn.linux-consol

 Thanks to [@wangwenqiao666](https://github.com/wangwenqiao666) for their research and exploration

+### ❓ImageMagick's security policy prevents operations related to temporary file @/tmp/tmpur5hyyto.txt
+
+You can find these policies in ImageMagick's configuration file `policy.xml`.
+This file is usually located in `/etc/ImageMagick-X/` (where `X` is the major version) or a similar location in the ImageMagick installation directory.
+Find the entry containing `pattern="@"` and change `rights="none"` to `rights="read|write"` to allow read and write operations on files.
+
+### ❓OSError: [Errno 24] Too many open files
+
+This issue is caused by the system's limit on the number of open files. You can solve it by raising that limit.
+
+Check the current limit:
+
+```shell
+ulimit -n
+```
+
+If it's too low, you can increase it, for example:
+
+```shell
+ulimit -n 10240
+```
+
+### ❓Whisper model download failed, with the following error
+
+LocalEntryNotFoundError: Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and
+outgoing traffic has been disabled.
+To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.
+
+or
+
+An error occured while synchronizing the model Systran/faster-whisper-large-v3 from the Hugging Face Hub:
+An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the
+specified revision on the local disk. Please check your internet connection and try again.
+Trying to load the model directly from the local cache, if it exists.
+
+Solution: [Click to see how to manually download the model from netdisk](#subtitle-generation-)
+
 ## Feedback & Suggestions 📢

 - You can submit an [issue](https://github.com/harry0703/MoneyPrinterTurbo/issues) or
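The ImageMagick `policy.xml` fix described in README-en.md above is a manual edit. As a minimal sketch, assuming the relevant entry is a single `<policy .../>` element with double-quoted attributes (the exact layout varies between ImageMagick versions and Linux distributions), the same change could be scripted in Python. `allow_at_pattern` is a hypothetical helper, not part of MoneyPrinterTurbo:

```python
import re

# Matches one <policy .../> element whose pattern attribute starts with "@".
# Assumption: attributes are double-quoted and the element is self-closing.
_AT_POLICY = re.compile(r'<policy\b[^>]*\bpattern="@[^"]*"[^>]*/>')


def allow_at_pattern(policy_xml: str) -> str:
    """Return policy_xml with rights="none" replaced by rights="read|write"
    on every policy element whose pattern starts with "@"."""

    def _fix(match: re.Match) -> str:
        return match.group(0).replace('rights="none"', 'rights="read|write"')

    return _AT_POLICY.sub(_fix, policy_xml)


if __name__ == "__main__":
    sample = (
        "<policymap>\n"
        '  <policy domain="path" rights="none" pattern="@*"/>\n'
        '  <policy domain="coder" rights="none" pattern="PS"/>\n'
        "</policymap>\n"
    )
    # Only the "@" entry changes; the PS entry keeps rights="none".
    print(allow_at_pattern(sample))
```

To apply it, you would read the real `policy.xml` (writing to `/etc/...` usually requires root), keep a backup, and write the returned text back. A regular expression is used instead of `xml.etree.ElementTree` because the default ElementTree parser drops comments and the DOCTYPE, which would alter the rest of the file.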
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@
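 - [x] Supports **background music**, either random or specified music files, with adjustable `background music volume`
 - [x] Video material sources are **high-definition** and **royalty-free**, and you can also use your own **local materials**
 - [x] Supports integration with various models such as **OpenAI**, **Moonshot**, **Azure**, **gpt4free**, **one-api**, **Qwen**, **Google Gemini**, **Ollama**,
-  **DeepSeek** and more
+  **DeepSeek**, **ERNIE** and more
    - For users in China, **DeepSeek** or **Moonshot** is recommended as the LLM provider (directly accessible in China without a VPN; new accounts receive free credits that are usually sufficient)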

 ### Future Plans 📅
@@ -72,10 +72,6 @@
 - [ ] Support more voice synthesis providers, such as OpenAI TTS
 - [ ] Automate upload to the YouTube platform

-## Discussion 💬
-
-<img src="docs/wechat-group.jpg" width="250">
-
 ## Video Demos 📺

 ### Portrait 9:16
@@ -121,20 +117,15 @@

 ## Quick Start 🚀

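-Download the one-click startup package, extract it, and use it directly (the path must not contain **Chinese characters** or **spaces**)
+Download the one-click startup package, extract it, and use it directly (the path must not contain **Chinese characters**, **special characters**, or **spaces**)

 ### Windows
-
-- Baidu Netdisk: https://pan.baidu.com/s/1jKF1mgsjfN8fBk6uTEHArQ?pwd=jrp7 Extraction code: jrp7
+- Baidu Netdisk (v1.2.1, an older version): https://pan.baidu.com/s/1pSNjxTYiVENulTLm6zieMQ?pwd=g36q Extraction code: g36q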

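 After downloading, it is recommended to first **double-click** `update.bat` to update to the **latest code**, then double-click `start.bat` to launch

 After launch, the browser opens automatically (if the page is blank, try **Chrome** or **Edge**)

-### Other Systems
-
-One-click startup packages are not yet available. See the **Installation & Deployment** section below; deploying with **docker** is recommended as the most convenient option.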
-
 ## Installation & Deployment 📥

 ### Prerequisites
@@ -148,7 +139,7 @@
 git clone https://github.com/harry0703/MoneyPrinterTurbo.git
 ```

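-#### ② Modify the Configuration File
+#### ② Modify the Configuration File (optional; you can also configure it in the WebUI after launching)

 - Copy `config.example.toml` and name the copy `config.toml`
 - Following the instructions in `config.toml`, configure `pexels_api_keys` and `llm_provider`, and, depending on the provider chosen for llm_provider, configure the corresponding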
@@ -170,6 +161,8 @@ cd MoneyPrinterTurbo
 docker-compose up
 ```

+> Note: Recent versions of Docker install Docker Compose as a plugin, so the start command becomes `docker compose up`
+
 #### ② Access the Web Interface

 Open your browser and visit http://0.0.0.0:8501
@@ -185,16 +178,14 @@ docker-compose up
 - Complete usage demonstration: https://v.douyin.com/iFhnwsKY/
 - How to deploy on Windows: https://v.douyin.com/iFyjoW3M

-#### ① Create a Virtual Environment
+#### ① Install Dependencies

-It is recommended to use [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to create a Python virtual environment
+It is recommended to use [pdm](https://pdm-project.org/en/latest/#installation)

 ```shell
 git clone https://github.com/harry0703/MoneyPrinterTurbo.git
 cd MoneyPrinterTurbo
-conda create -n MoneyPrinterTurbo python=3.10
-conda activate MoneyPrinterTurbo
-pip install -r requirements.txt
+pdm sync
 ```

 #### ② Install ImageMagick
@@ -225,14 +216,12 @@ pip install -r requirements.txt
 ###### Windows

 ```bat
-conda activate MoneyPrinterTurbo
 webui.bat
 ```

 ###### MacOS or Linux

 ```shell
-conda activate MoneyPrinterTurbo
 sh webui.sh
 ```

@@ -300,33 +289,6 @@ MoneyPrinterTurbo

 ## Common Questions 🤔

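-### ❓How to Use the Free OpenAI GPT-3.5 Model?
-
-[OpenAI announced that GPT-3.5 in ChatGPT is now free](https://openai.com/blog/start-using-chatgpt-instantly), and developers have wrapped it into an API that can be called directly
-
-**Make sure you have installed and started the docker service**, then run the following command to start the container
-
-```shell
-docker run -p 3040:3040 missuo/freegpt35
-```
-
-Once it has started successfully, modify the configuration in `config.toml`
-
-- Set `llm_provider` to `openai`
-- Fill in any value for `openai_api_key`, for example '123456'
-- Change `openai_base_url` to `http://localhost:3040/v1/`
-- Change `openai_model_name` to `gpt-3.5-turbo`
-
-> Note: This method may be unstable
-
-### ❓AttributeError: 'str' object has no attribute 'choices'
-
-This issue is caused by the large language model not returning a correct response.
-
-This is most likely a network issue. Use a **VPN**, or set `openai_base_url` to your proxy; that should solve the problem.
-
-We also recommend using **Moonshot** or **DeepSeek** as the LLM provider; these two providers are faster to access and more stable from China.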
-
 ### ❓RuntimeError: No ffmpeg exe could be found

 Normally, ffmpeg is downloaded and detected automatically.
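The ffmpeg FAQ relies on automatic detection, with a manual `ffmpeg_path` entry in `config.toml` as the fallback. The lookup order can be sketched as follows. `resolve_ffmpeg` is a hypothetical helper, not MoneyPrinterTurbo's actual detection code, and it assumes an explicitly configured path should take precedence over any `ffmpeg` found on `PATH`:

```python
import os
import shutil
from typing import Optional


def resolve_ffmpeg(configured_path: str = "") -> Optional[str]:
    """Return a usable ffmpeg executable path, or None if none is found.

    1. An explicit path (e.g. the ffmpeg_path value from config.toml) wins
       if it points to an existing file.
    2. Otherwise search PATH the way a shell would.
    """
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    # shutil.which returns None when no matching executable is on PATH.
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    found = resolve_ffmpeg()
    print(found or "ffmpeg not found: set ffmpeg_path in config.toml")
```

Checking the configured path first means a deliberately pinned binary is never shadowed by a different ffmpeg that happens to be installed system-wide.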
--- a/app/asgi.py
+++ b/app/asgi.py
@@ -1,12 +1,13 @@
 """Application implementation - ASGI."""
+
 import os

 from fastapi import FastAPI, Request
 from fastapi.exceptions import RequestValidationError
-from fastapi.responses import JSONResponse
-from loguru import logger
-from fastapi.staticfiles import StaticFiles
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import JSONResponse
+from fastapi.staticfiles import StaticFiles
+from loguru import logger

 from app.config import config
 from app.models.exception import HttpException
@@ -24,7 +25,9 @@ def exception_handler(request: Request, e: HttpException):
 def validation_exception_handler(request: Request, e: RequestValidationError):
    return JSONResponse(
        status_code=400,
-        content=utils.get_response(status=400, data=e.errors(), message='field required'),
+        content=utils.get_response(
+            status=400, data=e.errors(), message="field required"
+        ),
    )


@@ -61,7 +64,9 @@ app.add_middleware(
 )

 task_dir = utils.task_dir()
-app.mount("/tasks", StaticFiles(directory=task_dir, html=True, follow_symlink=True), name="")
+app.mount(
+    "/tasks", StaticFiles(directory=task_dir, html=True, follow_symlink=True), name=""
+)

 public_dir = utils.public_dir()
 app.mount("/", StaticFiles(directory=public_dir, html=True), name="")
--- a/app/config/init.py
+++ b/app/config/init.py
@@ -10,7 +10,9 @@ from app.utils import utils
 def __init_logger():
    # _log_file = utils.storage_dir("logs/server.log")
    _lvl = config.log_level
-    root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
+    root_dir = os.path.dirname(
+        os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
+    )

    def format_record(record):
        # Get the full path of the file from the log record
@@ -21,10 +23,13 @@ def __init_logger():
        record["file"].path = f"./{relative_path}"
        # Return the modified format string
        # Adjust the format here as needed
-        _format = '<green>{time:%Y-%m-%d %H:%M:%S}</> | ' + \
-                  '<level>{level}</> | ' + \
-                  '"{file.path}:{line}":<blue> {function}</> ' + \
-                  '- <level>{message}</>' + "\n"
+        _format = (
+            "<green>{time:%Y-%m-%d %H:%M:%S}</> | "
+            + "<level>{level}</> | "
+            + '"{file.path}:{line}":<blue> {function}</> '
+            + "- <level>{message}</>"
+            + "\n"
+        )
        return _format

    logger.remove()
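The reformatted `root_dir` expression walks three directory levels up from the config module (named `app/config/init.py` in the diff header) to the project root, and `format_record` then displays each record's file path relative to that root as `./<relative_path>`. The lines that compute `relative_path` are not shown in this diff, so the following is only a hypothetical stdlib sketch of the same idea, using made-up POSIX-style paths:

```python
import os


def project_root(config_init_file: str) -> str:
    """Walk three levels up: <root>/app/config/<file> -> <root>."""
    return os.path.dirname(os.path.dirname(os.path.dirname(config_init_file)))


def display_path(file_path: str, root_dir: str) -> str:
    """Render a source path relative to the project root, prefixed with './'."""
    return f"./{os.path.relpath(file_path, root_dir)}"


if __name__ == "__main__":
    # Hypothetical install location, for illustration only.
    root = project_root("/srv/MoneyPrinterTurbo/app/config/init.py")
    print(root)
    print(display_path("/srv/MoneyPrinterTurbo/app/services/task.py", root))
```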
--- a/app/config/config.py
+++ b/app/config/config.py
@@ -1,7 +1,8 @@
 import os
-import socket
-import toml
 import shutil
+import socket
+
+import toml
 from loguru import logger

 root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
@@ -17,7 +18,7 @@ def load_config():
        example_file = f"{root_dir}/config.example.toml"
        if os.path.isfile(example_file):
            shutil.copyfile(example_file, config_file)
-            logger.info(f"copy config.example.toml to config.toml")
+            logger.info("copy config.example.toml to config.toml")

    logger.info(f"load config from file: {config_file}")

@@ -25,7 +26,7 @@ def load_config():
        _config_ = toml.load(config_file)
    except Exception as e:
        logger.warning(f"load config failed: {str(e)}, try to load as utf-8-sig")
-        with open(config_file, mode="r", encoding='utf-8-sig') as fp:
+        with open(config_file, mode="r", encoding="utf-8-sig") as fp:
            _cfg_content = fp.read()
            _config_ = toml.loads(_cfg_content)
    return _config_
@@ -35,6 +36,7 @@ def save_config():
    with open(config_file, "w", encoding="utf-8") as f:
        _cfg["app"] = app
        _cfg["azure"] = azure
+        _cfg["siliconflow"] = siliconflow
        _cfg["ui"] = ui
        f.write(toml.dumps(_cfg))

@@ -44,7 +46,13 @@ app = _cfg.get("app", {})
 whisper = _cfg.get("whisper", {})
 proxy = _cfg.get("proxy", {})
 azure = _cfg.get("azure", {})
-ui = _cfg.get("ui", {})
+siliconflow = _cfg.get("siliconflow", {})
+ui = _cfg.get(
+    "ui",
+    {
+        "hide_log": False,
+    },
+)

 hostname = socket.gethostname()

@@ -52,9 +60,11 @@ log_level = _cfg.get("log_level", "DEBUG")
 listen_host = _cfg.get("listen_host", "0.0.0.0")
 listen_port = _cfg.get("listen_port", 8080)
 project_name = _cfg.get("project_name", "MoneyPrinterTurbo")
-project_description = _cfg.get("project_description",
-                               "<a href='https://github.com/harry0703/MoneyPrinterTurbo'>https://github.com/harry0703/MoneyPrinterTurbo</a>")
-project_version = _cfg.get("project_version", "1.1.9")
+project_description = _cfg.get(
+    "project_description",
+    "<a href='https://github.com/harry0703/MoneyPrinterTurbo'>https://github.com/harry0703/MoneyPrinterTurbo</a>",
+)
+project_version = _cfg.get("project_version", "1.2.6")
 reload_debug = False

 imagemagick_path = app.get("imagemagick_path", "")
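Note that `_cfg.get("ui", {"hide_log": False})` only applies the default when the whole `[ui]` table is missing from `config.toml`; if `[ui]` exists but lacks `hide_log`, the key stays absent. If per-key defaults are wanted, a minimal sketch of a merge could look like this. `section_with_defaults` is a hypothetical helper, not part of the project:

```python
from typing import Any, Dict, Mapping


def section_with_defaults(
    cfg: Mapping[str, Any], name: str, defaults: Mapping[str, Any]
) -> Dict[str, Any]:
    """Return the [name] table from a parsed config, filled in key by key.

    Keys present in the file win; keys missing from the file fall back to
    `defaults`, even when the table itself exists.
    """
    return {**defaults, **cfg.get(name, {})}


if __name__ == "__main__":
    # A parsed config.toml whose [ui] table lacks hide_log
    # ("language" is only an example key).
    parsed = {"ui": {"language": "en"}}
    print(parsed.get("ui", {"hide_log": False}))  # dict.get ignores the default here
    print(section_with_defaults(parsed, "ui", {"hide_log": False}))
```

One side effect to keep in mind: because `save_config` writes the `ui` dict back into `config.toml`, merged defaults would be persisted on the next save, which is usually harmless.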
--- a/app/controllers/base.py
+++ b/app/controllers/base.py
@@ -7,14 +7,14 @@ from app.models.exception import HttpException


 def get_task_id(request: Request):
-    task_id = request.headers.get('x-task-id')
+    task_id = request.headers.get("x-task-id")
    if not task_id:
        task_id = uuid4()
    return str(task_id)


 def get_api_key(request: Request):
-    api_key = request.headers.get('x-api-key')
+    api_key = request.headers.get("x-api-key")
    return api_key


@@ -23,5 +23,9 @@ def verify_token(request: Request):
    if token != config.app.get("api_key", ""):
        request_id = get_task_id(request)
        request_url = request.url
-        user_agent = request.headers.get('user-agent')
-        raise HttpException(task_id=request_id, status_code=401, message=f"invalid token: {request_url}, {user_agent}")
+        user_agent = request.headers.get("user-agent")
+        raise HttpException(
+            task_id=request_id,
+            status_code=401,
+            message=f"invalid token: {request_url}, {user_agent}",
+        )
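`verify_token` compares the `x-api-key` header with `config.app["api_key"]` using `!=`. If you want to harden this, a common choice (a swapped-in technique, not what this diff does) is a constant-time comparison with the standard library's `hmac.compare_digest`, which does not reveal how many leading characters matched. A minimal sketch, with `token_is_valid` as a hypothetical helper:

```python
import hmac
from typing import Optional


def token_is_valid(provided: Optional[str], expected: str) -> bool:
    """Constant-time check of an API token taken from a request header.

    A missing header (None) never matches. Both sides are encoded to bytes
    because hmac.compare_digest rejects str arguments with non-ASCII characters.
    """
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Mapping `None` to `False` mirrors the existing behaviour, where a request without the header never equals the configured key.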
--- a/app/controllers/manager/base_manager.py
+++ b/app/controllers/manager/base_manager.py
@@ -1,5 +1,5 @@
 import threading
-from typing import Callable, Any, Dict
+from typing import Any, Callable, Dict


 class TaskManager:
@@ -18,28 +18,35 @@ class TaskManager:
                print(f"add task: {func.__name__}, current_tasks: {self.current_tasks}")
                self.execute_task(func, *args, **kwargs)
            else:
-                print(f"enqueue task: {func.__name__}, current_tasks: {self.current_tasks}")
+                print(
+                    f"enqueue task: {func.__name__}, current_tasks: {self.current_tasks}"
+                )
                self.enqueue({"func": func, "args": args, "kwargs": kwargs})

    def execute_task(self, func: Callable, *args: Any, **kwargs: Any):
-        thread = threading.Thread(target=self.run_task, args=(func, *args), kwargs=kwargs)
+        thread = threading.Thread(
+            target=self.run_task, args=(func, *args), kwargs=kwargs
+        )
        thread.start()

    def run_task(self, func: Callable, *args: Any, **kwargs: Any):
        try:
            with self.lock:
                self.current_tasks += 1
-            func(*args, **kwargs)  # 在这里调用函数，传递*args和**kwargs
+            func(*args, **kwargs)  # call the function here, passing *args and **kwargs.
        finally:
            self.task_done()

    def check_queue(self):
        with self.lock:
-            if self.current_tasks < self.max_concurrent_tasks and not self.is_queue_empty():
+            if (
+                self.current_tasks < self.max_concurrent_tasks
+                and not self.is_queue_empty()
+            ):
                task_info = self.dequeue()
-                func = task_info['func']
-                args = task_info.get('args', ())
-                kwargs = task_info.get('kwargs', {})
+                func = task_info["func"]
+                args = task_info.get("args", ())
+                kwargs = task_info.get("kwargs", {})
                self.execute_task(func, *args, **kwargs)

    def task_done(self):
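`TaskManager` caps concurrency by counting running tasks under a lock and queueing the rest. In the lines shown, `current_tasks` is incremented inside `run_task`, that is, in the worker thread after it has started, so a capacity check in `add_task` can observe a stale count when several tasks are added in quick succession. The following is a hypothetical in-memory sketch, not the project's class, that reserves the slot under the lock before starting the thread and hands a freed slot straight to the next queued task:

```python
import threading
from collections import deque
from typing import Any, Callable, Deque, List, Tuple


class InMemoryTaskManager:
    """Run at most max_concurrent_tasks callables at once; queue the rest (FIFO)."""

    def __init__(self, max_concurrent_tasks: int) -> None:
        self.max_concurrent_tasks = max_concurrent_tasks
        self.current_tasks = 0  # slots reserved by running or starting tasks
        self.lock = threading.Lock()
        self.queue: Deque[Tuple[Callable, tuple, dict]] = deque()
        self.threads: List[threading.Thread] = []

    def add_task(self, func: Callable, *args: Any, **kwargs: Any) -> None:
        with self.lock:
            if self.current_tasks < self.max_concurrent_tasks:
                # Reserve the slot while still holding the lock, so concurrent
                # add_task calls cannot all claim the same free slot.
                self.current_tasks += 1
                self._start(func, args, kwargs)
            else:
                self.queue.append((func, args, kwargs))

    def _start(self, func: Callable, args: tuple, kwargs: dict) -> None:
        # Caller must hold self.lock.
        thread = threading.Thread(target=self._run, args=(func, args, kwargs))
        self.threads.append(thread)
        thread.start()

    def _run(self, func: Callable, args: tuple, kwargs: dict) -> None:
        try:
            func(*args, **kwargs)
        finally:
            self._task_done()

    def _task_done(self) -> None:
        with self.lock:
            if self.queue:
                # Hand the finished task's slot directly to the next queued task.
                func, args, kwargs = self.queue.popleft()
                self._start(func, args, kwargs)
            else:
                self.current_tasks -= 1

    def wait_all(self) -> None:
        """Block until every task added so far, including queued ones, has run."""
        index = 0
        while True:
            with self.lock:
                if index >= len(self.threads):
                    return
                thread = self.threads[index]
            thread.join()
            index += 1
```

Handing the slot over inside `_task_done` keeps `current_tasks` unchanged while the queue is non-empty, so the counter never dips below the limit and lets a newly added task jump ahead of queued ones.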
--- a/app/controllers/manager/redis_manager.py
+++ b/app/controllers/manager/redis_manager.py
@@ -8,7 +8,7 @@ from app.models.schema import VideoParams
 from app.services import task as tm

 FUNC_MAP = {
-    'start': tm.start,
+    "start": tm.start,
    # 'start_test': tm.start_test
 }

@@ -24,11 +24,15 @@ class RedisTaskManager(TaskManager):
    def enqueue(self, task: Dict):
        task_with_serializable_params = task.copy()

-        if 'params' in task['kwargs'] and isinstance(task['kwargs']['params'], VideoParams):
-            task_with_serializable_params['kwargs']['params'] = task['kwargs']['params'].dict()
+        if "params" in task["kwargs"] and isinstance(
+            task["kwargs"]["params"], VideoParams
+        ):
+            task_with_serializable_params["kwargs"]["params"] = task["kwargs"][
+                "params"
+            ].dict()

        # Convert the function object to its name
-        task_with_serializable_params['func'] = task['func'].__name__
+        task_with_serializable_params["func"] = task["func"].__name__
        self.redis_client.rpush(self.queue, json.dumps(task_with_serializable_params))

    def dequeue(self):
@@ -36,10 +40,14 @@ class RedisTaskManager(TaskManager):
        if task_json:
            task_info = json.loads(task_json)
            # Convert the function name back to a function object
-            task_info['func'] = FUNC_MAP[task_info['func']]
+            task_info["func"] = FUNC_MAP[task_info["func"]]

-            if 'params' in task_info['kwargs'] and isinstance(task_info['kwargs']['params'], dict):
-                task_info['kwargs']['params'] = VideoParams(**task_info['kwargs']['params'])
+            if "params" in task_info["kwargs"] and isinstance(
+                task_info["kwargs"]["params"], dict
+            ):
+                task_info["kwargs"]["params"] = VideoParams(
+                    **task_info["kwargs"]["params"]
+                )

            return task_info
        return None
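`RedisTaskManager` can only push JSON to Redis, so `enqueue` replaces the function object with its name and the `VideoParams` model with a dict, and `dequeue` reverses both steps through `FUNC_MAP`. The round trip can be sketched without Redis or pydantic: here a stdlib dataclass stands in for `VideoParams`, a plain list stands in for the Redis list, and `start` is a hypothetical placeholder task. The sketch also copies `kwargs` before rewriting it, because `task.copy()` in the diff is a shallow copy and so modifies the caller's `kwargs` dict in place:

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Params:
    """Stand-in for the pydantic VideoParams model (hypothetical fields)."""

    video_subject: str
    paragraph_number: int = 1


def start(task_id: str, params: Params) -> str:
    """Placeholder task function."""
    return f"{task_id}:{params.video_subject}"


FUNC_MAP: Dict[str, Callable] = {"start": start}


def serialize(task: Dict[str, Any]) -> str:
    kwargs = dict(task["kwargs"])  # copy so the caller's dict is left untouched
    if isinstance(kwargs.get("params"), Params):
        kwargs["params"] = asdict(kwargs["params"])
    payload = {"func": task["func"].__name__, "args": list(task["args"]), "kwargs": kwargs}
    return json.dumps(payload)


def deserialize(raw: str) -> Dict[str, Any]:
    task = json.loads(raw)
    task["func"] = FUNC_MAP[task["func"]]  # name -> function object
    if isinstance(task["kwargs"].get("params"), dict):
        task["kwargs"]["params"] = Params(**task["kwargs"]["params"])
    return task


if __name__ == "__main__":
    queue: List[str] = []  # stands in for the Redis list (rpush / lpop)
    original = {"func": start, "args": (), "kwargs": {"task_id": "t1", "params": Params("cats")}}
    queue.append(serialize(original))
    restored = deserialize(queue.pop(0))
    print(restored["func"](*restored["args"], **restored["kwargs"]))
```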
--- a/app/controllers/ping.py
+++ b/app/controllers/ping.py
@@ -1,9 +1,13 @@
-from fastapi import APIRouter
-from fastapi import Request
+from fastapi import APIRouter, Request

 router = APIRouter()


-@router.get("/ping", tags=["Health Check"], description="检查服务可用性", response_description="pong")
+@router.get(
+    "/ping",
+    tags=["Health Check"],
+    description="Check service availability",
+    response_description="pong",
+)
 def ping(request: Request) -> str:
    return "pong"
--- a/app/controllers/v1/base.py
+++ b/app/controllers/v1/base.py
@@ -1,10 +1,10 @@
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter


 def new_router(dependencies=None):
    router = APIRouter()
-    router.tags = ['V1']
-    router.prefix = '/api/v1'
+    router.tags = ["V1"]
+    router.prefix = "/api/v1"
    # Apply the authentication dependencies to all routes
    if dependencies:
        router.dependencies = dependencies
--- a/app/controllers/v1/llm.py
+++ b/app/controllers/v1/llm.py
@@ -1,31 +1,45 @@
 from fastapi import Request
+
 from app.controllers.v1.base import new_router
-from app.models.schema import VideoScriptResponse, VideoScriptRequest, VideoTermsResponse, VideoTermsRequest
+from app.models.schema import (
+    VideoScriptRequest,
+    VideoScriptResponse,
+    VideoTermsRequest,
+    VideoTermsResponse,
+)
 from app.services import llm
 from app.utils import utils

-# 认证依赖项
+# authentication dependency
 # router = new_router(dependencies=[Depends(base.verify_token)])
 router = new_router()


-@router.post("/scripts", response_model=VideoScriptResponse, summary="Create a script for the video")
+@router.post(
+    "/scripts",
+    response_model=VideoScriptResponse,
+    summary="Create a script for the video",
+)
 def generate_video_script(request: Request, body: VideoScriptRequest):
-    video_script = llm.generate_script(video_subject=body.video_subject,
-                                       language=body.video_language,
-                                       paragraph_number=body.paragraph_number)
-    response = {
-        "video_script": video_script
-    }
+    video_script = llm.generate_script(
+        video_subject=body.video_subject,
+        language=body.video_language,
+        paragraph_number=body.paragraph_number,
+    )
+    response = {"video_script": video_script}
    return utils.get_response(200, response)


-@router.post("/terms", response_model=VideoTermsResponse, summary="Generate video terms based on the video script")
+@router.post(
+    "/terms",
+    response_model=VideoTermsResponse,
+    summary="Generate video terms based on the video script",
+)
 def generate_video_terms(request: Request, body: VideoTermsRequest):
-    video_terms = llm.generate_terms(video_subject=body.video_subject,
-                                     video_script=body.video_script,
-                                     amount=body.amount)
-    response = {
-        "video_terms": video_terms
-    }
+    video_terms = llm.generate_terms(
+        video_subject=body.video_subject,
+        video_script=body.video_script,
+        amount=body.amount,
+    )
+    response = {"video_terms": video_terms}
    return utils.get_response(200, response)
--- a/app/controllers/v1/video.py
+++ b/app/controllers/v1/video.py
@@ -1,11 +1,12 @@
-import os
 import glob
+import os
 import pathlib
 import shutil
+from typing import Union

-from fastapi import Request, Depends, Path, BackgroundTasks, UploadFile
-from fastapi.responses import FileResponse, StreamingResponse
+from fastapi import BackgroundTasks, Depends, Path, Request, UploadFile
 from fastapi.params import File
+from fastapi.responses import FileResponse, StreamingResponse
 from loguru import logger

 from app.config import config
@@ -14,10 +15,19 @@ from app.controllers.manager.memory_manager import InMemoryTaskManager
 from app.controllers.manager.redis_manager import RedisTaskManager
 from app.controllers.v1.base import new_router
 from app.models.exception import HttpException
-from app.models.schema import TaskVideoRequest, TaskQueryResponse, TaskResponse, TaskQueryRequest, \
-    BgmUploadResponse, BgmRetrieveResponse, TaskDeletionResponse
-from app.services import task as tm
+from app.models.schema import (
+    AudioRequest,
+    BgmRetrieveResponse,
+    BgmUploadResponse,
+    SubtitleRequest,
+    TaskDeletionResponse,
+    TaskQueryRequest,
+    TaskQueryResponse,
+    TaskResponse,
+    TaskVideoRequest,
+)
 from app.services import state as sm
+from app.services import task as tm
 from app.utils import utils

 # authentication dependency
@@ -34,48 +44,81 @@ _max_concurrent_tasks = config.app.get("max_concurrent_tasks", 5)
 redis_url = f"redis://:{_redis_password}@{_redis_host}:{_redis_port}/{_redis_db}"
 # Choose the task manager implementation based on the configuration
 if _enable_redis:
-    task_manager = RedisTaskManager(max_concurrent_tasks=_max_concurrent_tasks, redis_url=redis_url)
+    task_manager = RedisTaskManager(
+        max_concurrent_tasks=_max_concurrent_tasks, redis_url=redis_url
+    )
 else:
    task_manager = InMemoryTaskManager(max_concurrent_tasks=_max_concurrent_tasks)

-# @router.post("/videos-test", response_model=TaskResponse, summary="Generate a short video")
-# async def create_video_test(request: Request, body: TaskVideoRequest):
-#     task_id = utils.get_uuid()
-#     request_id = base.get_task_id(request)
-#     try:
-#         task = {
-#             "task_id": task_id,
-#             "request_id": request_id,
-#             "params": body.dict(),
-#         }
-#         task_manager.add_task(tm.start_test, task_id=task_id, params=body)
-#         return utils.get_response(200, task)
-#     except ValueError as e:
-#         raise HttpException(task_id=task_id, status_code=400, message=f"{request_id}: {str(e)}")
-

@router.post("/videos", response_model=TaskResponse, summary="Generate a short video")
-def create_video(background_tasks: BackgroundTasks, request: Request, body: TaskVideoRequest):
+def create_video(
+    background_tasks: BackgroundTasks, request: Request, body: TaskVideoRequest
+):
+    return create_task(request, body, stop_at="video")
+
+
+@router.post("/subtitle", response_model=TaskResponse, summary="Generate subtitle only")
+def create_subtitle(
+    background_tasks: BackgroundTasks, request: Request, body: SubtitleRequest
+):
+    return create_task(request, body, stop_at="subtitle")
+
+
+@router.post("/audio", response_model=TaskResponse, summary="Generate audio only")
+def create_audio(
+    background_tasks: BackgroundTasks, request: Request, body: AudioRequest
+):
+    return create_task(request, body, stop_at="audio")
+
+
+def create_task(
+    request: Request,
+    body: Union[TaskVideoRequest, SubtitleRequest, AudioRequest],
+    stop_at: str,
+):
    task_id = utils.get_uuid()
    request_id = base.get_task_id(request)
    try:
        task = {
            "task_id": task_id,
            "request_id": request_id,
-            "params": body.dict(),
+            "params": body.model_dump(),
        }
        sm.state.update_task(task_id)
-        # background_tasks.add_task(tm.start, task_id=task_id, params=body)
-        task_manager.add_task(tm.start, task_id=task_id, params=body)
-        logger.success(f"video created: {utils.to_json(task)}")
+        task_manager.add_task(tm.start, task_id=task_id, params=body, stop_at=stop_at)
+        logger.success(f"Task created: {utils.to_json(task)}")
        return utils.get_response(200, task)
    except ValueError as e:
-        raise HttpException(task_id=task_id, status_code=400, message=f"{request_id}: {str(e)}")
+        raise HttpException(
+            task_id=task_id, status_code=400, message=f"{request_id}: {str(e)}"
+        )
+
+from fastapi import Query
+
+@router.get("/tasks", response_model=TaskQueryResponse, summary="Get all tasks")
+def get_all_tasks(
+    request: Request, page: int = Query(1, ge=1), page_size: int = Query(10, ge=1)
+):
+    request_id = base.get_task_id(request)
+    tasks, total = sm.state.get_all_tasks(page, page_size)
+
+    response = {
+        "tasks": tasks,
+        "total": total,
+        "page": page,
+        "page_size": page_size,
+    }
+    return utils.get_response(200, response)


-@router.get("/tasks/{task_id}", response_model=TaskQueryResponse, summary="Query task status")
-def get_task(request: Request, task_id: str = Path(..., description="Task ID"),
-             query: TaskQueryRequest = Depends()):
+
+@router.get(
+    "/tasks/{task_id}", response_model=TaskQueryResponse, summary="Query task status"
+)
+def get_task(
+    request: Request,
+    task_id: str = Path(..., description="Task ID"),
+    query: TaskQueryRequest = Depends(),
+):
    endpoint = config.app.get("endpoint", "")
    if not endpoint:
        endpoint = str(request.base_url)
@@ -108,10 +151,16 @@ def get_task(request: Request, task_id: str = Path(..., description="Task ID"),
            task["combined_videos"] = urls
        return utils.get_response(200, task)

-    raise HttpException(task_id=task_id, status_code=404, message=f"{request_id}: task not found")
+    raise HttpException(
+        task_id=task_id, status_code=404, message=f"{request_id}: task not found"
+    )


-@router.delete("/tasks/{task_id}", response_model=TaskDeletionResponse, summary="Delete a generated short video task")
+@router.delete(
+    "/tasks/{task_id}",
+    response_model=TaskDeletionResponse,
+    summary="Delete a generated short video task",
+)
 def delete_video(request: Request, task_id: str = Path(..., description="Task ID")):
    request_id = base.get_task_id(request)
    task = sm.state.get_task(task_id)
@@ -125,32 +174,40 @@ def delete_video(request: Request, task_id: str = Path(..., description="Task ID
        logger.success(f"video deleted: {utils.to_json(task)}")
        return utils.get_response(200)

-    raise HttpException(task_id=task_id, status_code=404, message=f"{request_id}: task not found")
+    raise HttpException(
+        task_id=task_id, status_code=404, message=f"{request_id}: task not found"
+    )


-@router.get("/musics", response_model=BgmRetrieveResponse, summary="Retrieve local BGM files")
+@router.get(
+    "/musics", response_model=BgmRetrieveResponse, summary="Retrieve local BGM files"
+)
 def get_bgm_list(request: Request):
    suffix = "*.mp3"
    song_dir = utils.song_dir()
    files = glob.glob(os.path.join(song_dir, suffix))
    bgm_list = []
    for file in files:
-        bgm_list.append({
-            "name": os.path.basename(file),
-            "size": os.path.getsize(file),
-            "file": file,
-        })
-    response = {
-        "files": bgm_list
-    }
+        bgm_list.append(
+            {
+                "name": os.path.basename(file),
+                "size": os.path.getsize(file),
+                "file": file,
+            }
+        )
+    response = {"files": bgm_list}
    return utils.get_response(200, response)


-@router.post("/musics", response_model=BgmUploadResponse, summary="Upload the BGM file to the songs directory")
+@router.post(
+    "/musics",
+    response_model=BgmUploadResponse,
+    summary="Upload the BGM file to the songs directory",
+)
 def upload_bgm_file(request: Request, file: UploadFile = File(...)):
    request_id = base.get_task_id(request)
    # check file ext
-    if file.filename.endswith('mp3'):
+    if file.filename.endswith("mp3"):
        song_dir = utils.song_dir()
        save_path = os.path.join(song_dir, file.filename)
        # save file
@@ -158,26 +215,26 @@ def upload_bgm_file(request: Request, file: UploadFile = File(...)):
            # If the file already exists, it will be overwritten
            file.file.seek(0)
            buffer.write(file.file.read())
-        response = {
-            "file": save_path
-        }
+        response = {"file": save_path}
        return utils.get_response(200, response)

-    raise HttpException('', status_code=400, message=f"{request_id}: Only *.mp3 files can be uploaded")
+    raise HttpException(
+        "", status_code=400, message=f"{request_id}: Only *.mp3 files can be uploaded"
+    )


@router.get("/stream/{file_path:path}")
 async def stream_video(request: Request, file_path: str):
    tasks_dir = utils.task_dir()
    video_path = os.path.join(tasks_dir, file_path)
-    range_header = request.headers.get('Range')
+    range_header = request.headers.get("Range")
    video_size = os.path.getsize(video_path)
    start, end = 0, video_size - 1

    length = video_size
    if range_header:
-        range_ = range_header.split('bytes=')[1]
-        start, end = [int(part) if part else None for part in range_.split('-')]
+        range_ = range_header.split("bytes=")[1]
+        start, end = [int(part) if part else None for part in range_.split("-")]
        if start is None:
            start = video_size - end
            end = video_size - 1
@@ -186,7 +243,7 @@ async def stream_video(request: Request, file_path: str):
        length = end - start + 1

    def file_iterator(file_path, offset=0, bytes_to_read=None):
-        with open(file_path, 'rb') as f:
+        with open(file_path, "rb") as f:
            f.seek(offset, os.SEEK_SET)
            remaining = bytes_to_read or video_size
            while remaining > 0:
@@ -197,10 +254,12 @@ async def stream_video(request: Request, file_path: str):
                remaining -= len(data)
                yield data

-    response = StreamingResponse(file_iterator(video_path, start, length), media_type='video/mp4')
-    response.headers['Content-Range'] = f'bytes {start}-{end}/{video_size}'
-    response.headers['Accept-Ranges'] = 'bytes'
-    response.headers['Content-Length'] = str(length)
+    response = StreamingResponse(
+        file_iterator(video_path, start, length), media_type="video/mp4"
+    )
+    response.headers["Content-Range"] = f"bytes {start}-{end}/{video_size}"
+    response.headers["Accept-Ranges"] = "bytes"
+    response.headers["Content-Length"] = str(length)
    response.status_code = 206  # Partial Content

    return response
@@ -219,8 +278,10 @@ async def download_video(_: Request, file_path: str):
    file_path = pathlib.Path(video_path)
    filename = file_path.stem
    extension = file_path.suffix
-    headers = {
-        "Content-Disposition": f"attachment; filename={filename}{extension}"
-    }
-    return FileResponse(path=video_path, headers=headers, filename=f"{filename}{extension}",
-                        media_type=f'video/{extension[1:]}')
+    headers = {"Content-Disposition": f"attachment; filename={filename}{extension}"}
+    return FileResponse(
+        path=video_path,
+        headers=headers,
+        filename=f"{filename}{extension}",
+        media_type=f"video/{extension[1:]}",
+    )
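`stream_video` turns the `Range` request header into an inclusive `start`/`end` byte pair and answers with `206 Partial Content`, `Content-Range: bytes {start}-{end}/{video_size}` and `Content-Length = end - start + 1`. Part of that handling sits behind the hunk boundary, so the following is only a standalone sketch of single-range parsing, not the handler's actual logic. The helper name `parse_range` is hypothetical; it covers the suffix form (`bytes=-N`), the open-ended form (`bytes=N-`) and clamps an end past EOF, but does not reject unsatisfiable ranges (a real server would reply `416`).

```python
from typing import Optional, Tuple


def parse_range(range_header: Optional[str], size: int) -> Tuple[int, int]:
    """Return inclusive (start, end) byte offsets for a single-range header."""
    if not range_header:
        return 0, size - 1  # no Range header: the whole file
    spec = range_header.split("bytes=")[1]
    first, _, last = spec.partition("-")
    start = int(first) if first else None
    end = int(last) if last else None
    if start is None:
        # Suffix range "bytes=-N": the last N bytes of the file.
        start = max(size - end, 0)
        end = size - 1
    elif end is None or end >= size:
        # Open-ended "bytes=N-" or an end past EOF: clamp to the last byte.
        end = size - 1
    return start, end


start, end = parse_range("bytes=500-", 1000)
length = end - start + 1  # value for Content-Length
print(f"bytes {start}-{end}/1000", length)  # bytes 500-999/1000 500
```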
--- a/app/models/const.py
+++ b/app/models/const.py
@@ -1,11 +1,25 @@
 PUNCTUATIONS = [
-    "?", ",", ".", "、", ";", ":", "!", "…",
-    "？", "，", "。", "、", "；", "：", "！", "...",
+    "?",
+    ",",
+    ".",
+    "、",
+    ";",
+    ":",
+    "!",
+    "…",
+    "？",
+    "，",
+    "。",
+    "、",
+    "；",
+    "：",
+    "！",
+    "...",
 ]

 TASK_STATE_FAILED = -1
 TASK_STATE_COMPLETE = 1
 TASK_STATE_PROCESSING = 4

-FILE_TYPE_VIDEOS = ['mp4', 'mov', 'mkv', 'webm']
-FILE_TYPE_IMAGES = ['jpg', 'jpeg', 'png', 'bmp']
+FILE_TYPE_VIDEOS = ["mp4", "mov", "mkv", "webm"]
+FILE_TYPE_IMAGES = ["jpg", "jpeg", "png", "bmp"]
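The diff does not show where `PUNCTUATIONS` is consumed. As an assumption, a list like this is typically used to break a script into shorter pieces (for example, subtitle lines). The sketch below shows one way to do that; `split_by_punctuation` is a hypothetical helper, not a function from this project. Tokens are tried longest-first so that `"..."` is matched as a single separator rather than three `"."`.

```python
import re
from typing import List

# Copied from app/models/const.py.
PUNCTUATIONS = [
    "?", ",", ".", "、", ";", ":", "!", "…",
    "？", "，", "。", "、", "；", "：", "！", "...",
]


def split_by_punctuation(text: str) -> List[str]:
    # Deduplicate and sort longest-first so "..." wins over ".".
    tokens = sorted(set(PUNCTUATIONS), key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in tokens)
    # Split on any punctuation token and drop empty fragments.
    parts = re.split(pattern, text)
    return [p.strip() for p in parts if p.strip()]


print(split_by_punctuation("Hello, world... 你好，世界！"))
# ['Hello', 'world', '你好', '世界']
```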
--- a/app/models/exception.py
+++ b/app/models/exception.py
@@ -5,16 +5,18 @@ from loguru import logger


 class HttpException(Exception):
-    def __init__(self, task_id: str, status_code: int, message: str = '', data: Any = None):
+    def __init__(
+        self, task_id: str, status_code: int, message: str = "", data: Any = None
+    ):
        self.message = message
        self.status_code = status_code
        self.data = data
-        # 获取异常堆栈信息
+        # Retrieve the exception stack trace information.
        tb_str = traceback.format_exc().strip()
        if not tb_str or tb_str == "NoneType: None":
-            msg = f'HttpException: {status_code}, {task_id}, {message}'
+            msg = f"HttpException: {status_code}, {task_id}, {message}"
        else:
-            msg = f'HttpException: {status_code}, {task_id}, {message}\n{tb_str}'
+            msg = f"HttpException: {status_code}, {task_id}, {message}\n{tb_str}"

        if status_code == 400:
            logger.warning(msg)
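`HttpException.__init__` compares the result of `traceback.format_exc()` against `"NoneType: None"` because that is what the standard library returns when no exception is currently being handled. The sketch below isolates that message-building logic so the two cases can be seen side by side; `build_message` is a hypothetical helper, not part of the project.

```python
import traceback


def build_message(status_code: int, task_id: str, message: str = "") -> str:
    # format_exc() describes the exception currently being handled; outside an
    # except block there is none and it returns "NoneType: None".
    tb_str = traceback.format_exc().strip()
    if not tb_str or tb_str == "NoneType: None":
        return f"HttpException: {status_code}, {task_id}, {message}"
    return f"HttpException: {status_code}, {task_id}, {message}\n{tb_str}"


print(build_message(404, "t1", "task not found"))  # no traceback appended

try:
    raise ValueError("boom")
except ValueError:
    # Inside an except block the traceback is appended after a newline.
    print(build_message(400, "t2", "bad request"))
```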
--- a/app/models/schema.py
+++ b/app/models/schema.py
@@ -1,12 +1,16 @@
+import warnings
 from enum import Enum
-from typing import Any, Optional, List
+from typing import Any, List, Optional, Union

 import pydantic
 from pydantic import BaseModel
-import warnings

 # Ignore a specific Pydantic warning
-warnings.filterwarnings("ignore", category=UserWarning, message="Field name.*shadows an attribute in parent.*")
+warnings.filterwarnings(
+    "ignore",
+    category=UserWarning,
+    message="Field name.*shadows an attribute in parent.*",
+)


 class VideoConcatMode(str, Enum):
@@ -14,6 +18,15 @@ class VideoConcatMode(str, Enum):
    sequential = "sequential"


+class VideoTransitionMode(str, Enum):
+    none = None
+    shuffle = "Shuffle"
+    fade_in = "FadeIn"
+    fade_out = "FadeOut"
+    slide_in = "SlideIn"
+    slide_out = "SlideOut"
+
+
 class VideoAspect(str, Enum):
    landscape = "16:9"
    portrait = "9:16"
@@ -40,45 +53,6 @@ class MaterialInfo:
    duration: int = 0


-# VoiceNames = [
-#     # zh-CN
-#     "female-zh-CN-XiaoxiaoNeural",
-#     "female-zh-CN-XiaoyiNeural",
-#     "female-zh-CN-liaoning-XiaobeiNeural",
-#     "female-zh-CN-shaanxi-XiaoniNeural",
-#
-#     "male-zh-CN-YunjianNeural",
-#     "male-zh-CN-YunxiNeural",
-#     "male-zh-CN-YunxiaNeural",
-#     "male-zh-CN-YunyangNeural",
-#
-#     # "female-zh-HK-HiuGaaiNeural",
-#     # "female-zh-HK-HiuMaanNeural",
-#     # "male-zh-HK-WanLungNeural",
-#     #
-#     # "female-zh-TW-HsiaoChenNeural",
-#     # "female-zh-TW-HsiaoYuNeural",
-#     # "male-zh-TW-YunJheNeural",
-#
-#     # en-US
-#
-#     "female-en-US-AnaNeural",
-#     "female-en-US-AriaNeural",
-#     "female-en-US-AvaNeural",
-#     "female-en-US-EmmaNeural",
-#     "female-en-US-JennyNeural",
-#     "female-en-US-MichelleNeural",
-#
-#     "male-en-US-AndrewNeural",
-#     "male-en-US-BrianNeural",
-#     "male-en-US-ChristopherNeural",
-#     "male-en-US-EricNeural",
-#     "male-en-US-GuyNeural",
-#     "male-en-US-RogerNeural",
-#     "male-en-US-SteffanNeural",
-# ]
-
-
 class VideoParams(BaseModel):
    """
    {
@@ -93,30 +67,36 @@ class VideoParams(BaseModel):
      "stroke_width": 1.5
    }
    """
+
    video_subject: str
-    video_script: str = ""  # 用于生成视频的脚本
-    video_terms: Optional[str | list] = None  # 用于生成视频的关键词
+    video_script: str = ""  # Script used to generate the video
+    video_terms: Optional[str | list] = None  # Keywords used to generate the video
    video_aspect: Optional[VideoAspect] = VideoAspect.portrait.value
    video_concat_mode: Optional[VideoConcatMode] = VideoConcatMode.random.value
+    video_transition_mode: Optional[VideoTransitionMode] = None
    video_clip_duration: Optional[int] = 5
    video_count: Optional[int] = 1

    video_source: Optional[str] = "pexels"
-    video_materials: Optional[List[MaterialInfo]] = None  # 用于生成视频的素材
+    video_materials: Optional[List[MaterialInfo]] = (
+        None  # Materials used to generate the video
+    )

    video_language: Optional[str] = ""  # auto detect

    voice_name: Optional[str] = ""
    voice_volume: Optional[float] = 1.0
+    voice_rate: Optional[float] = 1.0
    bgm_type: Optional[str] = "random"
    bgm_file: Optional[str] = ""
    bgm_volume: Optional[float] = 0.2

    subtitle_enabled: Optional[bool] = True
    subtitle_position: Optional[str] = "bottom"  # top, bottom, center
+    custom_position: float = 70.0
    font_name: Optional[str] = "STHeitiMedium.ttc"
    text_fore_color: Optional[str] = "#FFFFFF"
-    text_background_color: Optional[str] = "transparent"
+    text_background_color: Union[bool, str] = True

    font_size: int = 60
    stroke_color: Optional[str] = "#000000"
@@ -125,6 +105,38 @@ class VideoParams(BaseModel):
    paragraph_number: Optional[int] = 1


+class SubtitleRequest(BaseModel):
+    video_script: str
+    video_language: Optional[str] = ""
+    voice_name: Optional[str] = "zh-CN-XiaoxiaoNeural-Female"
+    voice_volume: Optional[float] = 1.0
+    voice_rate: Optional[float] = 1.2
+    bgm_type: Optional[str] = "random"
+    bgm_file: Optional[str] = ""
+    bgm_volume: Optional[float] = 0.2
+    subtitle_position: Optional[str] = "bottom"
+    font_name: Optional[str] = "STHeitiMedium.ttc"
+    text_fore_color: Optional[str] = "#FFFFFF"
+    text_background_color: Union[bool, str] = True
+    font_size: int = 60
+    stroke_color: Optional[str] = "#000000"
+    stroke_width: float = 1.5
+    video_source: Optional[str] = "local"
+    subtitle_enabled: Optional[str] = "true"
+
+
+class AudioRequest(BaseModel):
+    video_script: str
+    video_language: Optional[str] = ""
+    voice_name: Optional[str] = "zh-CN-XiaoxiaoNeural-Female"
+    voice_volume: Optional[float] = 1.0
+    voice_rate: Optional[float] = 1.2
+    bgm_type: Optional[str] = "random"
+    bgm_file: Optional[str] = ""
+    bgm_volume: Optional[float] = 0.2
+    video_source: Optional[str] = "local"
+
+
 class VideoScriptParams:
    """
    {
@@ -133,6 +145,7 @@ class VideoScriptParams:
      "paragraph_number": 1
    }
    """
+
    video_subject: Optional[str] = "春天的花海"
    video_language: Optional[str] = ""
    paragraph_number: Optional[int] = 1
@@ -146,14 +159,17 @@ class VideoTermsParams:
      "amount": 5
    }
    """
+
    video_subject: Optional[str] = "春天的花海"
-    video_script: Optional[str] = "春天的花海，如诗如画般展现在眼前。万物复苏的季节里，大地披上了一袭绚丽多彩的盛装。金黄的迎春、粉嫩的樱花、洁白的梨花、艳丽的郁金香……"
+    video_script: Optional[str] = (
+        "春天的花海，如诗如画般展现在眼前。万物复苏的季节里，大地披上了一袭绚丽多彩的盛装。金黄的迎春、粉嫩的樱花、洁白的梨花、艳丽的郁金香……"
+    )
    amount: Optional[int] = 5


 class BaseResponse(BaseModel):
    status: int = 200
-    message: Optional[str] = 'success'
+    message: Optional[str] = "success"
    data: Any = None


@@ -188,9 +204,7 @@ class TaskResponse(BaseResponse):
            "example": {
                "status": 200,
                "message": "success",
-                "data": {
-                    "task_id": "6c85c8cc-a77a-42b9-bc30-947815aa0558"
-                }
+                "data": {"task_id": "6c85c8cc-a77a-42b9-bc30-947815aa0558"},
            },
        }

@@ -209,8 +223,8 @@ class TaskQueryResponse(BaseResponse):
                    ],
                    "combined_videos": [
                        "http://127.0.0.1:8080/tasks/6c85c8cc-a77a-42b9-bc30-947815aa0558/combined-1.mp4"
-                    ]
-                }
+                    ],
+                },
            },
        }

@@ -229,8 +243,8 @@ class TaskDeletionResponse(BaseResponse):
                    ],
                    "combined_videos": [
                        "http://127.0.0.1:8080/tasks/6c85c8cc-a77a-42b9-bc30-947815aa0558/combined-1.mp4"
-                    ]
-                }
+                    ],
+                },
            },
        }

@@ -243,7 +257,7 @@ class VideoScriptResponse(BaseResponse):
                "message": "success",
                "data": {
                    "video_script": "春天的花海，是大自然的一幅美丽画卷。在这个季节里，大地复苏，万物生长，花朵争相绽放，形成了一片五彩斑斓的花海..."
-                }
+                },
            },
        }

@@ -254,9 +268,7 @@ class VideoTermsResponse(BaseResponse):
            "example": {
                "status": 200,
                "message": "success",
-                "data": {
-                    "video_terms": ["sky", "tree"]
-                }
+                "data": {"video_terms": ["sky", "tree"]},
            },
        }

@@ -272,10 +284,10 @@ class BgmRetrieveResponse(BaseResponse):
                        {
                            "name": "output013.mp3",
                            "size": 1891269,
-                            "file": "/MoneyPrinterTurbo/resource/songs/output013.mp3"
+                            "file": "/MoneyPrinterTurbo/resource/songs/output013.mp3",
                        }
                    ]
-                }
+                },
            },
        }

@@ -286,8 +298,6 @@ class BgmUploadResponse(BaseResponse):
            "example": {
                "status": 200,
                "message": "success",
-                "data": {
-                    "file": "/MoneyPrinterTurbo/resource/songs/example.mp3"
-                }
+                "data": {"file": "/MoneyPrinterTurbo/resource/songs/example.mp3"},
            },
        }
--- a/app/router.py
+++ b/app/router.py
@@ -6,9 +6,10 @@ Resources:
    1. https://fastapi.tiangolo.com/tutorial/bigger-applications

 """
+
 from fastapi import APIRouter

-from app.controllers.v1 import video, llm
+from app.controllers.v1 import llm, video

 root_api_router = APIRouter()
 # v1
--- a/app/services/llm.py
+++ b/app/services/llm.py
@@ -1,10 +1,11 @@
+import json
 import logging
 import re
-import json
 from typing import List
+
+import g4f
 from loguru import logger
-from openai import OpenAI
-from openai import AzureOpenAI
+from openai import AzureOpenAI, OpenAI
 from openai.types.chat import ChatCompletion

 from app.config import config
@@ -13,189 +14,254 @@ _max_retries = 5


 def _generate_response(prompt: str) -> str:
-    content = ""
-    llm_provider = config.app.get("llm_provider", "openai")
-    logger.info(f"llm provider: {llm_provider}")
-    if llm_provider == "g4f":
-        model_name = config.app.get("g4f_model_name", "")
-        if not model_name:
-            model_name = "gpt-3.5-turbo-16k-0613"
-        import g4f
-        content = g4f.ChatCompletion.create(
-            model=model_name,
-            messages=[{"role": "user", "content": prompt}],
-        )
-    else:
-        api_version = ""  # for azure
-        if llm_provider == "moonshot":
-            api_key = config.app.get("moonshot_api_key")
-            model_name = config.app.get("moonshot_model_name")
-            base_url = "https://api.moonshot.cn/v1"
-        elif llm_provider == "ollama":
-            # api_key = config.app.get("openai_api_key")
-            api_key = "ollama"  # any string works but you are required to have one
-            model_name = config.app.get("ollama_model_name")
-            base_url = config.app.get("ollama_base_url", "")
-            if not base_url:
-                base_url = "http://localhost:11434/v1"
-        elif llm_provider == "openai":
-            api_key = config.app.get("openai_api_key")
-            model_name = config.app.get("openai_model_name")
-            base_url = config.app.get("openai_base_url", "")
-            if not base_url:
-                base_url = "https://api.openai.com/v1"
-        elif llm_provider == "oneapi":
-            api_key = config.app.get("oneapi_api_key")
-            model_name = config.app.get("oneapi_model_name")
-            base_url = config.app.get("oneapi_base_url", "")
-        elif llm_provider == "azure":
-            api_key = config.app.get("azure_api_key")
-            model_name = config.app.get("azure_model_name")
-            base_url = config.app.get("azure_base_url", "")
-            api_version = config.app.get("azure_api_version", "2024-02-15-preview")
-        elif llm_provider == "gemini":
-            api_key = config.app.get("gemini_api_key")
-            model_name = config.app.get("gemini_model_name")
-            base_url = "***"
-        elif llm_provider == "qwen":
-            api_key = config.app.get("qwen_api_key")
-            model_name = config.app.get("qwen_model_name")
-            base_url = "***"
-        elif llm_provider == "cloudflare":
-            api_key = config.app.get("cloudflare_api_key")
-            model_name = config.app.get("cloudflare_model_name")
-            account_id = config.app.get("cloudflare_account_id")
-            base_url = "***"
-        elif llm_provider == "deepseek":
-            api_key = config.app.get("deepseek_api_key")
-            model_name = config.app.get("deepseek_model_name")
-            base_url = config.app.get("deepseek_base_url")
-            if not base_url:
-                base_url = "https://api.deepseek.com"
-        else:
-            raise ValueError("llm_provider is not set, please set it in the config.toml file.")
-
-        if not api_key:
-            raise ValueError(f"{llm_provider}: api_key is not set, please set it in the config.toml file.")
-        if not model_name:
-            raise ValueError(f"{llm_provider}: model_name is not set, please set it in the config.toml file.")
-        if not base_url:
-            raise ValueError(f"{llm_provider}: base_url is not set, please set it in the config.toml file.")
-
-        if llm_provider == "qwen":
-            import dashscope
-            from dashscope.api_entities.dashscope_response import GenerationResponse
-            dashscope.api_key = api_key
-            response = dashscope.Generation.call(
+    try:
+        content = ""
+        llm_provider = config.app.get("llm_provider", "openai")
+        logger.info(f"llm provider: {llm_provider}")
+        if llm_provider == "g4f":
+            model_name = config.app.get("g4f_model_name", "")
+            if not model_name:
+                model_name = "gpt-3.5-turbo-16k-0613"
+            content = g4f.ChatCompletion.create(
                model=model_name,
-                messages=[{"role": "user", "content": prompt}]
+                messages=[{"role": "user", "content": prompt}],
+            )
+        else:
+            api_version = ""  # for azure
+            if llm_provider == "moonshot":
+                api_key = config.app.get("moonshot_api_key")
+                model_name = config.app.get("moonshot_model_name")
+                base_url = "https://api.moonshot.cn/v1"
+            elif llm_provider == "ollama":
+                # api_key = config.app.get("openai_api_key")
+                api_key = "ollama"  # any string works but you are required to have one
+                model_name = config.app.get("ollama_model_name")
+                base_url = config.app.get("ollama_base_url", "")
+                if not base_url:
+                    base_url = "http://localhost:11434/v1"
+            elif llm_provider == "openai":
+                api_key = config.app.get("openai_api_key")
+                model_name = config.app.get("openai_model_name")
+                base_url = config.app.get("openai_base_url", "")
+                if not base_url:
+                    base_url = "https://api.openai.com/v1"
+            elif llm_provider == "oneapi":
+                api_key = config.app.get("oneapi_api_key")
+                model_name = config.app.get("oneapi_model_name")
+                base_url = config.app.get("oneapi_base_url", "")
+            elif llm_provider == "azure":
+                api_key = config.app.get("azure_api_key")
+                model_name = config.app.get("azure_model_name")
+                base_url = config.app.get("azure_base_url", "")
+                api_version = config.app.get("azure_api_version", "2024-02-15-preview")
+            elif llm_provider == "gemini":
+                api_key = config.app.get("gemini_api_key")
+                model_name = config.app.get("gemini_model_name")
+                base_url = "***"
+            elif llm_provider == "qwen":
+                api_key = config.app.get("qwen_api_key")
+                model_name = config.app.get("qwen_model_name")
+                base_url = "***"
+            elif llm_provider == "cloudflare":
+                api_key = config.app.get("cloudflare_api_key")
+                model_name = config.app.get("cloudflare_model_name")
+                account_id = config.app.get("cloudflare_account_id")
+                base_url = "***"
+            elif llm_provider == "deepseek":
+                api_key = config.app.get("deepseek_api_key")
+                model_name = config.app.get("deepseek_model_name")
+                base_url = config.app.get("deepseek_base_url")
+                if not base_url:
+                    base_url = "https://api.deepseek.com"
+            elif llm_provider == "ernie":
+                api_key = config.app.get("ernie_api_key")
+                secret_key = config.app.get("ernie_secret_key")
+                base_url = config.app.get("ernie_base_url")
+                model_name = "***"
+                if not secret_key:
+                    raise ValueError(
+                        f"{llm_provider}: secret_key is not set, please set it in the config.toml file."
+                    )
+            else:
+                raise ValueError(
+                    "llm_provider is not set, please set it in the config.toml file."
+                )
+
+            if not api_key:
+                raise ValueError(
+                    f"{llm_provider}: api_key is not set, please set it in the config.toml file."
+                )
+            if not model_name:
+                raise ValueError(
+                    f"{llm_provider}: model_name is not set, please set it in the config.toml file."
+                )
+            if not base_url:
+                raise ValueError(
+                    f"{llm_provider}: base_url is not set, please set it in the config.toml file."
+                )
+
+            if llm_provider == "qwen":
+                import dashscope
+                from dashscope.api_entities.dashscope_response import GenerationResponse
+
+                dashscope.api_key = api_key
+                response = dashscope.Generation.call(
+                    model=model_name, messages=[{"role": "user", "content": prompt}]
+                )
+                if response:
+                    if isinstance(response, GenerationResponse):
+                        status_code = response.status_code
+                        if status_code != 200:
+                            raise Exception(
+                                f'[{llm_provider}] returned an error response: "{response}"'
+                            )
+
+                        content = response["output"]["text"]
+                        return content.replace("\n", "")
+                    else:
+                        raise Exception(
+                            f'[{llm_provider}] returned an invalid response: "{response}"'
+                        )
+                else:
+                    raise Exception(f"[{llm_provider}] returned an empty response")
+
+            if llm_provider == "gemini":
+                import google.generativeai as genai
+
+                genai.configure(api_key=api_key, transport="rest")
+
+                generation_config = {
+                    "temperature": 0.5,
+                    "top_p": 1,
+                    "top_k": 1,
+                    "max_output_tokens": 2048,
+                }
+
+                safety_settings = [
+                    {
+                        "category": "HARM_CATEGORY_HARASSMENT",
+                        "threshold": "BLOCK_ONLY_HIGH",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_HATE_SPEECH",
+                        "threshold": "BLOCK_ONLY_HIGH",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
+                        "threshold": "BLOCK_ONLY_HIGH",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
+                        "threshold": "BLOCK_ONLY_HIGH",
+                    },
+                ]
+
+                model = genai.GenerativeModel(
+                    model_name=model_name,
+                    generation_config=generation_config,
+                    safety_settings=safety_settings,
+                )
+
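+                try:
+                    response = model.generate_content(prompt)
+                    candidates = response.candidates
+                    generated_text = candidates[0].content.parts[0].text
+                except (AttributeError, IndexError) as e:
+                    # no usable candidate (e.g. the reply was blocked): raise instead of
+                    # falling through to an unbound generated_text
+                    raise Exception(f"[{llm_provider}] returned no text: {e}") from e
+
+                return generated_text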
+
+            if llm_provider == "cloudflare":
+                import requests
+
+                response = requests.post(
+                    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model_name}",
+                    headers={"Authorization": f"Bearer {api_key}"},
+                    json={
+                        "messages": [
+                            {
+                                "role": "system",
+                                "content": "You are a friendly assistant",
+                            },
+                            {"role": "user", "content": prompt},
+                        ]
+                    },
+                )
+                result = response.json()
+                logger.info(result)
+                return result["result"]["response"]
+
+            if llm_provider == "ernie":
+                import requests
+
+                params = {
+                    "grant_type": "client_credentials",
+                    "client_id": api_key,
+                    "client_secret": secret_key,
+                }
+                access_token = (
+                    requests.post(
+                        "https://aip.baidubce.com/oauth/2.0/token", params=params
+                    )
+                    .json()
+                    .get("access_token")
+                )
+                url = f"{base_url}?access_token={access_token}"
+
+                payload = json.dumps(
+                    {
+                        "messages": [{"role": "user", "content": prompt}],
+                        "temperature": 0.5,
+                        "top_p": 0.8,
+                        "penalty_score": 1,
+                        "disable_search": False,
+                        "enable_citation": False,
+                        "response_format": "text",
+                    }
+                )
+                headers = {"Content-Type": "application/json"}
+
+                response = requests.request(
+                    "POST", url, headers=headers, data=payload
+                ).json()
+                return response.get("result")
+
+            if llm_provider == "azure":
+                client = AzureOpenAI(
+                    api_key=api_key,
+                    api_version=api_version,
+                    azure_endpoint=base_url,
+                )
+            else:
+                client = OpenAI(
+                    api_key=api_key,
+                    base_url=base_url,
+                )
+
+            response = client.chat.completions.create(
+                model=model_name, messages=[{"role": "user", "content": prompt}]
            )
            if response:
-                if isinstance(response, GenerationResponse):
-                    status_code = response.status_code
-                    if status_code != 200:
-                        raise Exception(
-                            f"[{llm_provider}] returned an error response: \"{response}\"")
-
-                    content = response["output"]["text"]
-                    return content.replace("\n", "")
+                if isinstance(response, ChatCompletion):
+                    content = response.choices[0].message.content
                else:
                    raise Exception(
-                        f"[{llm_provider}] returned an invalid response: \"{response}\"")
+                        f'[{llm_provider}] returned an invalid response: "{response}", please check your network '
+                        f"connection and try again."
+                    )
            else:
                raise Exception(
-                    f"[{llm_provider}] returned an empty response")
+                    f"[{llm_provider}] returned an empty response, please check your network connection and try again."
+                )

-        if llm_provider == "gemini":
-            import google.generativeai as genai
-            genai.configure(api_key=api_key, transport='rest')
-
-            generation_config = {
-                "temperature": 0.5,
-                "top_p": 1,
-                "top_k": 1,
-                "max_output_tokens": 2048,
-            }
-
-            safety_settings = [
-                {
-                    "category": "HARM_CATEGORY_HARASSMENT",
-                    "threshold": "BLOCK_ONLY_HIGH"
-                },
-                {
-                    "category": "HARM_CATEGORY_HATE_SPEECH",
-                    "threshold": "BLOCK_ONLY_HIGH"
-                },
-                {
-                    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
-                    "threshold": "BLOCK_ONLY_HIGH"
-                },
-                {
-                    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
-                    "threshold": "BLOCK_ONLY_HIGH"
-                },
-            ]
-
-            model = genai.GenerativeModel(model_name=model_name,
-                                          generation_config=generation_config,
-                                          safety_settings=safety_settings)
-
-            try:
-                response = model.generate_content(prompt)
-                candidates = response.candidates
-                generated_text = candidates[0].content.parts[0].text
-            except (AttributeError, IndexError) as e:
-                print("Gemini Error:", e)
-
-            return generated_text
-
-        if llm_provider == "cloudflare":
-            import requests
-            response = requests.post(
-                f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model_name}",
-                headers={"Authorization": f"Bearer {api_key}"},
-                json={
-                    "messages": [
-                        {"role": "system", "content": "You are a friendly assistant"},
-                        {"role": "user", "content": prompt}
-                    ]
-                }
-            )
-            result = response.json()
-            logger.info(result)
-            return result["result"]["response"]
-
-        if llm_provider == "azure":
-            client = AzureOpenAI(
-                api_key=api_key,
-                api_version=api_version,
-                azure_endpoint=base_url,
-            )
-        else:
-            client = OpenAI(
-                api_key=api_key,
-                base_url=base_url,
-            )
-
-        response = client.chat.completions.create(
-            model=model_name,
-            messages=[{"role": "user", "content": prompt}]
-        )
-        if response:
-            if isinstance(response, ChatCompletion):
-                content = response.choices[0].message.content
-            else:
-                raise Exception(
-                    f"[{llm_provider}] returned an invalid response: \"{response}\", please check your network "
-                    f"connection and try again.")
-        else:
-            raise Exception(
-                f"[{llm_provider}] returned an empty response, please check your network connection and try again.")
-
-    return content.replace("\n", "")
+        return content.replace("\n", "")
+    except Exception as e:
+        return f"Error: {str(e)}"


-def generate_script(video_subject: str, language: str = "", paragraph_number: int = 1) -> str:
+def generate_script(
+    video_subject: str, language: str = "", paragraph_number: int = 1
+) -> str:
    prompt = f"""
 # Role: Video Script Generator

@@ -236,10 +302,10 @@ Generate a script for a video, depending on the subject of the video.
        paragraphs = response.split("\n\n")

        # Select the specified number of paragraphs
-        selected_paragraphs = paragraphs[:paragraph_number]
+        # selected_paragraphs = paragraphs[:paragraph_number]

        # Join the selected paragraphs into a single string
-        return "\n\n".join(selected_paragraphs)
+        return "\n\n".join(paragraphs)

    for i in range(_max_retries):
        try:
@@ -260,8 +326,10 @@ Generate a script for a video, depending on the subject of the video.

        if i < _max_retries:
            logger.warning(f"failed to generate video script, trying again... {i + 1}")
-
-    logger.success(f"completed: \n{final_script}")
+    if "Error: " in final_script:
+        logger.error(f"failed to generate video script: {final_script}")
+    else:
+        logger.success(f"completed: \n{final_script}")
    return final_script.strip()


@@ -295,21 +363,30 @@ Please note that you must use English for generating video search terms; Chinese
    logger.info(f"subject: {video_subject}")

    search_terms = []
+    response = ""
    for i in range(_max_retries):
        try:
            response = _generate_response(prompt)
+            if "Error: " in response:
+                logger.error(f"failed to generate video terms: {response}")
+                return response
            search_terms = json.loads(response)
-            if not isinstance(search_terms, list) or not all(isinstance(term, str) for term in search_terms):
+            if not isinstance(search_terms, list) or not all(
+                isinstance(term, str) for term in search_terms
+            ):
                logger.error("response is not a list of strings.")
                continue

        except Exception as e:
-            match = re.search(r'\[.*]', response)
-            if match:
-                try:
-                    search_terms = json.loads(match.group())
-                except json.JSONDecodeError:
-                    pass
+            logger.warning(f"failed to generate video terms: {str(e)}")
+            if response:
+                match = re.search(r"\[.*]", response)
+                if match:
+                    try:
+                        search_terms = json.loads(match.group())
+                    except Exception as parse_err:
+                        logger.warning(f"failed to parse video terms: {parse_err}")

        if search_terms and len(search_terms) > 0:
            break
@@ -322,9 +399,13 @@ Please note that you must use English for generating video search terms; Chinese

 if __name__ == "__main__":
    video_subject = "生命的意义是什么"
-    script = generate_script(video_subject=video_subject, language="zh-CN", paragraph_number=1)
+    script = generate_script(
+        video_subject=video_subject, language="zh-CN", paragraph_number=1
+    )
    print("######################")
    print(script)
-    search_terms = generate_terms(video_subject=video_subject, video_script=script, amount=5)
+    search_terms = generate_terms(
+        video_subject=video_subject, video_script=script, amount=5
+    )
    print("######################")
    print(search_terms)
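`generate_terms()` first tries `json.loads()` on the whole reply and only falls back to a regex when that fails. A minimal stand-alone sketch of the same idea follows; `extract_search_terms` is a hypothetical helper, not part of the codebase. It compiles the pattern with `re.DOTALL` (an assumption added here so that a list the model spreads over several lines is still captured) and applies the list-of-strings check to both the direct parse and the regex fallback.

```python
import json
import re
from typing import List


def extract_search_terms(response: str) -> List[str]:
    """Parse an LLM reply into a list of search-term strings.

    Hypothetical helper: json.loads() on the whole reply first, then on the
    first "[" ... last "]" span (re.DOTALL lets it cross newlines).
    """
    candidates = [response]
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if match:
        candidates.append(match.group())

    for text in candidates:
        try:
            terms = json.loads(text)
        except json.JSONDecodeError:
            continue
        # Accept only a list whose items are all strings.
        if isinstance(terms, list) and all(isinstance(t, str) for t in terms):
            return terms
    return []
```

Because `.*` is greedy, a reply containing two separate lists is matched from the first `[` to the last `]`, which usually fails to parse; the helper then returns an empty list and the caller can retry.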
--- a/app/services/material.py
+++ b/app/services/material.py
@@ -1,14 +1,14 @@
 import os
 import random
+from typing import List
 from urllib.parse import urlencode

 import requests
-from typing import List
 from loguru import logger
 from moviepy.video.io.VideoFileClip import VideoFileClip

 from app.config import config
-from app.models.schema import VideoAspect, VideoConcatMode, MaterialInfo
+from app.models.schema import MaterialInfo, VideoAspect, VideoConcatMode
 from app.utils import utils

 requested_count = 0
@@ -19,7 +19,8 @@ def get_api_key(cfg_key: str):
    if not api_keys:
        raise ValueError(
            f"\n\n##### {cfg_key} is not set #####\n\nPlease set it in the config.toml file: {config.config_file}\n\n"
-            f"{utils.to_json(config.app)}")
+            f"{utils.to_json(config.app)}"
+        )

    # if only one key is provided, return it
    if isinstance(api_keys, str):
@@ -30,28 +31,32 @@ def get_api_key(cfg_key: str):
    return api_keys[requested_count % len(api_keys)]


-def search_videos_pexels(search_term: str,
-                         minimum_duration: int,
-                         video_aspect: VideoAspect = VideoAspect.portrait,
-                         ) -> List[MaterialInfo]:
+def search_videos_pexels(
+    search_term: str,
+    minimum_duration: int,
+    video_aspect: VideoAspect = VideoAspect.portrait,
+) -> List[MaterialInfo]:
    aspect = VideoAspect(video_aspect)
    video_orientation = aspect.name
    video_width, video_height = aspect.to_resolution()
    api_key = get_api_key("pexels_api_keys")
    headers = {
-        "Authorization": api_key
+        "Authorization": api_key,
+        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    }
    # Build URL
-    params = {
-        "query": search_term,
-        "per_page": 20,
-        "orientation": video_orientation
-    }
+    params = {"query": search_term, "per_page": 20, "orientation": video_orientation}
    query_url = f"https://api.pexels.com/videos/search?{urlencode(params)}"
    logger.info(f"searching videos: {query_url}, with proxies: {config.proxy}")

    try:
-        r = requests.get(query_url, headers=headers, proxies=config.proxy, verify=False, timeout=(30, 60))
+        r = requests.get(
+            query_url,
+            headers=headers,
+            proxies=config.proxy,
+            verify=False,
+            timeout=(30, 60),
+        )
        response = r.json()
        video_items = []
        if "videos" not in response:
@@ -83,10 +88,11 @@ def search_videos_pexels(search_term: str,
    return []


-def search_videos_pixabay(search_term: str,
-                          minimum_duration: int,
-                          video_aspect: VideoAspect = VideoAspect.portrait,
-                          ) -> List[MaterialInfo]:
+def search_videos_pixabay(
+    search_term: str,
+    minimum_duration: int,
+    video_aspect: VideoAspect = VideoAspect.portrait,
+) -> List[MaterialInfo]:
    aspect = VideoAspect(video_aspect)

    video_width, video_height = aspect.to_resolution()
@@ -97,13 +103,15 @@ def search_videos_pixabay(search_term: str,
        "q": search_term,
        "video_type": "all",  # Accepted values: "all", "film", "animation"
        "per_page": 50,
-        "key": api_key
+        "key": api_key,
    }
    query_url = f"https://pixabay.com/api/videos/?{urlencode(params)}"
    logger.info(f"searching videos: {query_url}, with proxies: {config.proxy}")

    try:
-        r = requests.get(query_url, proxies=config.proxy, verify=False, timeout=(30, 60))
+        r = requests.get(
+            query_url, proxies=config.proxy, verify=False, timeout=(30, 60)
+        )
        response = r.json()
        video_items = []
        if "hits" not in response:
@@ -121,7 +129,7 @@ def search_videos_pixabay(search_term: str,
            for video_type in video_files:
                video = video_files[video_type]
                w = int(video["width"])
-                h = int(video["height"])
+                # h = int(video["height"])
                if w >= video_width:
                    item = MaterialInfo()
                    item.provider = "pixabay"
@@ -153,9 +161,21 @@ def save_video(video_url: str, save_dir: str = "") -> str:
        logger.info(f"video already exists: {video_path}")
        return video_path

+    headers = {
+        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
+    }
+
    # if video does not exist, download it
    with open(video_path, "wb") as f:
-        f.write(requests.get(video_url, proxies=config.proxy, verify=False, timeout=(60, 240)).content)
+        f.write(
+            requests.get(
+                video_url,
+                headers=headers,
+                proxies=config.proxy,
+                verify=False,
+                timeout=(60, 240),
+            ).content
+        )

    if os.path.exists(video_path) and os.path.getsize(video_path) > 0:
        try:
@@ -168,20 +188,21 @@ def save_video(video_url: str, save_dir: str = "") -> str:
        except Exception as e:
            try:
                os.remove(video_path)
-            except Exception as e:
+            except Exception:
                pass
            logger.warning(f"invalid video file: {video_path} => {str(e)}")
    return ""


-def download_videos(task_id: str,
-                    search_terms: List[str],
-                    source: str = "pexels",
-                    video_aspect: VideoAspect = VideoAspect.portrait,
-                    video_contact_mode: VideoConcatMode = VideoConcatMode.random,
-                    audio_duration: float = 0.0,
-                    max_clip_duration: int = 5,
-                    ) -> List[str]:
+def download_videos(
+    task_id: str,
+    search_terms: List[str],
+    source: str = "pexels",
+    video_aspect: VideoAspect = VideoAspect.portrait,
+    video_contact_mode: VideoConcatMode = VideoConcatMode.random,
+    audio_duration: float = 0.0,
+    max_clip_duration: int = 5,
+) -> List[str]:
    valid_video_items = []
    valid_video_urls = []
    found_duration = 0.0
@@ -190,9 +211,11 @@ def download_videos(task_id: str,
        search_videos = search_videos_pixabay

    for search_term in search_terms:
-        video_items = search_videos(search_term=search_term,
-                                    minimum_duration=max_clip_duration,
-                                    video_aspect=video_aspect)
+        video_items = search_videos(
+            search_term=search_term,
+            minimum_duration=max_clip_duration,
+            video_aspect=video_aspect,
+        )
        logger.info(f"found {len(video_items)} videos for '{search_term}'")

        for item in video_items:
@@ -202,7 +225,8 @@ def download_videos(task_id: str,
                found_duration += item.duration

    logger.info(
-        f"found total videos: {len(valid_video_items)}, required duration: {audio_duration} seconds, found duration: {found_duration} seconds")
+        f"found total videos: {len(valid_video_items)}, required duration: {audio_duration} seconds, found duration: {found_duration} seconds"
+    )
    video_paths = []

    material_directory = config.app.get("material_directory", "").strip()
@@ -218,14 +242,18 @@ def download_videos(task_id: str,
    for item in valid_video_items:
        try:
            logger.info(f"downloading video: {item.url}")
-            saved_video_path = save_video(video_url=item.url, save_dir=material_directory)
+            saved_video_path = save_video(
+                video_url=item.url, save_dir=material_directory
+            )
            if saved_video_path:
                logger.info(f"video saved: {saved_video_path}")
                video_paths.append(saved_video_path)
                seconds = min(max_clip_duration, item.duration)
                total_duration += seconds
                if total_duration > audio_duration:
-                    logger.info(f"total duration of downloaded videos: {total_duration} seconds, skip downloading more")
+                    logger.info(
+                        f"total duration of downloaded videos: {total_duration} seconds, skip downloading more"
+                    )
                    break
        except Exception as e:
            logger.error(f"failed to download video: {utils.to_json(item)} => {str(e)}")
@@ -234,4 +262,6 @@ def download_videos(task_id: str,


 if __name__ == "__main__":
-    download_videos("test123", ["Money Exchange Medium"], audio_duration=100, source="pixabay")
+    download_videos(
+        "test123", ["Money Exchange Medium"], audio_duration=100, source="pixabay"
+    )
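`get_api_key()` accepts either one key or a list of keys and, for a list, indexes it with a module-level counter so that successive requests are spread across the keys. The sketch below reproduces that round-robin selection outside the app; `pick_api_key` is hypothetical, `app_config` stands in for `config.app`, and `itertools.count` replaces the `requested_count` global.

```python
from itertools import count
from typing import Dict, List, Union

# Shared call counter; next() yields 0, 1, 2, ...
_request_counter = count()


def pick_api_key(app_config: Dict[str, Union[str, List[str]]], cfg_key: str) -> str:
    """Return an API key for cfg_key, rotating round-robin through a list.

    Hypothetical stand-alone version of get_api_key(); app_config stands in
    for config.app.
    """
    api_keys = app_config.get(cfg_key)
    if not api_keys:
        raise ValueError(f"{cfg_key} is not set, please set it in config.toml")
    # A single key is returned unchanged.
    if isinstance(api_keys, str):
        return api_keys
    # Several keys: take the next one in turn, wrapping around.
    return api_keys[next(_request_counter) % len(api_keys)]
```

Rotating keys this way only spreads load evenly; it does not skip a key that has hit its quota, so a failing key is still retried on every N-th request.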
--- a/app/services/state.py
+++ b/app/services/state.py
@@ -1,12 +1,12 @@
 import ast
 from abc import ABC, abstractmethod
+
 from app.config import config
 from app.models import const


 # Base class for state management
 class BaseState(ABC):
-
    @abstractmethod
    def update_task(self, task_id: str, state: int, progress: int = 0, **kwargs):
        pass
@@ -15,19 +15,36 @@ class BaseState(ABC):
    def get_task(self, task_id: str):
        pass

+    @abstractmethod
+    def get_all_tasks(self, page: int, page_size: int):
+        pass
+

 # Memory state management
 class MemoryState(BaseState):
-
    def __init__(self):
        self._tasks = {}

-    def update_task(self, task_id: str, state: int = const.TASK_STATE_PROCESSING, progress: int = 0, **kwargs):
+    def get_all_tasks(self, page: int, page_size: int):
+        start = (page - 1) * page_size
+        end = start + page_size
+        tasks = list(self._tasks.values())
+        total = len(tasks)
+        return tasks[start:end], total
+
+    def update_task(
+        self,
+        task_id: str,
+        state: int = const.TASK_STATE_PROCESSING,
+        progress: int = 0,
+        **kwargs,
+    ):
        progress = int(progress)
        if progress > 100:
            progress = 100

        self._tasks[task_id] = {
+            "task_id": task_id,
            "state": state,
            "progress": progress,
            **kwargs,
@@ -43,17 +60,46 @@ class MemoryState(BaseState):

 # Redis state management
 class RedisState(BaseState):
-
-    def __init__(self, host='localhost', port=6379, db=0, password=None):
+    def __init__(self, host="localhost", port=6379, db=0, password=None):
        import redis
+
        self._redis = redis.StrictRedis(host=host, port=port, db=db, password=password)

-    def update_task(self, task_id: str, state: int = const.TASK_STATE_PROCESSING, progress: int = 0, **kwargs):
+    def get_all_tasks(self, page: int, page_size: int):
+        start = (page - 1) * page_size
+        end = start + page_size
+        tasks = []
+        cursor = 0
+        total = 0
+        while True:
+            cursor, keys = self._redis.scan(cursor, count=page_size)
+            # index of keys[0] among all keys seen so far
+            batch_start = total
+            total += len(keys)
+            # translate the page window [start, end) into indices within this batch
+            for key in keys[max(0, start - batch_start) : max(0, end - batch_start)]:
+                task_data = self._redis.hgetall(key)
+                task = {
+                    k.decode("utf-8"): self._convert_to_original_type(v)
+                    for k, v in task_data.items()
+                }
+                tasks.append(task)
+            # keep scanning to the end so that total counts every key
+            if cursor == 0:
+                break
+        return tasks, total
+
+    def update_task(
+        self,
+        task_id: str,
+        state: int = const.TASK_STATE_PROCESSING,
+        progress: int = 0,
+        **kwargs,
+    ):
        progress = int(progress)
        if progress > 100:
            progress = 100

        fields = {
+            "task_id": task_id,
            "state": state,
            "progress": progress,
            **kwargs,
@@ -67,7 +113,10 @@ class RedisState(BaseState):
        if not task_data:
            return None

-        task = {key.decode('utf-8'): self._convert_to_original_type(value) for key, value in task_data.items()}
+        task = {
+            key.decode("utf-8"): self._convert_to_original_type(value)
+            for key, value in task_data.items()
+        }
        return task

    def delete_task(self, task_id: str):
@@ -79,7 +128,7 @@ class RedisState(BaseState):
        Convert the value from byte string to its original data type.
        You can extend this method to handle other data types as needed.
        """
-        value_str = value.decode('utf-8')
+        value_str = value.decode("utf-8")

        try:
            # try to convert byte string array to list
@@ -100,4 +149,10 @@ _redis_port = config.app.get("redis_port", 6379)
 _redis_db = config.app.get("redis_db", 0)
 _redis_password = config.app.get("redis_password", None)

-state = RedisState(host=_redis_host, port=_redis_port, db=_redis_db, password=_redis_password) if _enable_redis else MemoryState()
+state = (
+    RedisState(
+        host=_redis_host, port=_redis_port, db=_redis_db, password=_redis_password
+    )
+    if _enable_redis
+    else MemoryState()
+)
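`RedisState.get_all_tasks()` pages through keys that arrive in `SCAN` batches whose size is only a hint, so a global window `[start, end)` has to be mapped onto each batch. That mapping must use the global index of the batch's first key, i.e. the running total *before* the batch is counted. The hypothetical `paginate_batches` helper below isolates the arithmetic so it can be checked without a Redis server; each inner list plays the role of one `SCAN` reply.

```python
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate_batches(
    batches: Iterable[Sequence[T]], page: int, page_size: int
) -> Tuple[List[T], int]:
    """Return (items on the requested 1-based page, total item count).

    Hypothetical helper: batches stand in for successive SCAN replies.
    """
    start = (page - 1) * page_size
    end = start + page_size
    page_items: List[T] = []
    total = 0
    for batch in batches:
        batch_start = total  # global index of batch[0]
        total += len(batch)
        # Translate the global window [start, end) into batch-local indices;
        # clamping at 0 avoids negative (from-the-end) slice bounds.
        lo = max(0, start - batch_start)
        hi = max(0, end - batch_start)
        page_items.extend(batch[lo:hi])
    return page_items, total
```

Redis documents that `SCAN` may return the same key more than once over a full iteration, so a production version might also deduplicate keys before counting them.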
--- a/app/services/subtitle.py
+++ b/app/services/subtitle.py
@@ -1,9 +1,9 @@
 import json
 import os.path
 import re
+from timeit import default_timer as timer

 from faster_whisper import WhisperModel
-from timeit import default_timer as timer
 from loguru import logger

 from app.config import config
@@ -23,18 +23,22 @@ def create(audio_file, subtitle_file: str = ""):
        if not os.path.isdir(model_path) or not os.path.isfile(model_bin_file):
            model_path = model_size

-        logger.info(f"loading model: {model_path}, device: {device}, compute_type: {compute_type}")
+        logger.info(
+            f"loading model: {model_path}, device: {device}, compute_type: {compute_type}"
+        )
        try:
-            model = WhisperModel(model_size_or_path=model_path,
-                                 device=device,
-                                 compute_type=compute_type)
+            model = WhisperModel(
+                model_size_or_path=model_path, device=device, compute_type=compute_type
+            )
        except Exception as e:
-            logger.error(f"failed to load model: {e} \n\n"
-                         f"********************************************\n"
-                         f"this may be caused by network issue. \n"
-                         f"please download the model manually and put it in the 'models' folder. \n"
-                         f"see [README.md FAQ](https://github.com/harry0703/MoneyPrinterTurbo) for more details.\n"
-                         f"********************************************\n\n")
+            logger.error(
+                f"failed to load model: {e} \n\n"
+                f"********************************************\n"
+                f"this may be caused by network issue. \n"
+                f"please download the model manually and put it in the 'models' folder. \n"
+                f"see [README.md FAQ](https://github.com/harry0703/MoneyPrinterTurbo) for more details.\n"
+                f"********************************************\n\n"
+            )
            return None

    logger.info(f"start, output file: {subtitle_file}")
@@ -49,7 +53,9 @@ def create(audio_file, subtitle_file: str = ""):
        vad_parameters=dict(min_silence_duration_ms=500),
    )

-    logger.info(f"detected language: '{info.language}', probability: {info.language_probability:.2f}")
+    logger.info(
+        f"detected language: '{info.language}', probability: {info.language_probability:.2f}"
+    )

    start = timer()
    subtitles = []
@@ -62,11 +68,9 @@ def create(audio_file, subtitle_file: str = ""):
        msg = "[%.2fs -> %.2fs] %s" % (seg_start, seg_end, seg_text)
        logger.debug(msg)

-        subtitles.append({
-            "msg": seg_text,
-            "start_time": seg_start,
-            "end_time": seg_end
-        })
+        subtitles.append(
+            {"msg": seg_text, "start_time": seg_start, "end_time": seg_end}
+        )

    for segment in segments:
        words_idx = 0
@@ -84,7 +88,7 @@ def create(audio_file, subtitle_file: str = ""):
                    is_segmented = True

                seg_end = word.end
-                # 如果包含标点,则断句
+                # If the word contains punctuation, end the current segment here.
                seg_text += word.word

                if utils.str_contains_punctuation(word.word):
@@ -119,7 +123,11 @@ def create(audio_file, subtitle_file: str = ""):
    for subtitle in subtitles:
        text = subtitle.get("msg")
        if text:
-            lines.append(utils.text_to_srt(idx, text, subtitle.get("start_time"), subtitle.get("end_time")))
+            lines.append(
+                utils.text_to_srt(
+                    idx, text, subtitle.get("start_time"), subtitle.get("end_time")
+                )
+            )
            idx += 1

    sub = "\n".join(lines) + "\n"
@@ -136,12 +144,12 @@ def file_to_subtitles(filename):
    current_times = None
    current_text = ""
    index = 0
-    with open(filename, 'r', encoding="utf-8") as f:
+    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            times = re.findall("([0-9]*:[0-9]*:[0-9]*,[0-9]*)", line)
            if times:
                current_times = line
-            elif line.strip() == '' and current_times:
+            elif line.strip() == "" and current_times:
                index += 1
                times_texts.append((index, current_times.strip(), current_text.strip()))
                current_times, current_text = None, ""
@@ -150,27 +158,124 @@ def file_to_subtitles(filename):
    return times_texts


+def levenshtein_distance(s1, s2):
+    if len(s1) < len(s2):
+        return levenshtein_distance(s2, s1)
+
+    if len(s2) == 0:
+        return len(s1)
+
+    previous_row = range(len(s2) + 1)
+    for i, c1 in enumerate(s1):
+        current_row = [i + 1]
+        for j, c2 in enumerate(s2):
+            insertions = previous_row[j + 1] + 1
+            deletions = current_row[j] + 1
+            substitutions = previous_row[j] + (c1 != c2)
+            current_row.append(min(insertions, deletions, substitutions))
+        previous_row = current_row
+
+    return previous_row[-1]
+
+
+def similarity(a, b):
+    a, b = a.lower(), b.lower()
+    max_length = max(len(a), len(b))
+    if max_length == 0:
+        # Two empty strings are identical; avoid dividing by zero.
+        return 1.0
+    return 1 - (levenshtein_distance(a, b) / max_length)
+
+
 def correct(subtitle_file, video_script):
    subtitle_items = file_to_subtitles(subtitle_file)
    script_lines = utils.split_string_by_punctuations(video_script)

    corrected = False
-    if len(subtitle_items) == len(script_lines):
-        for i in range(len(script_lines)):
-            script_line = script_lines[i].strip()
-            subtitle_line = subtitle_items[i][2]
-            if script_line != subtitle_line:
-                logger.warning(f"line {i + 1}, script: {script_line}, subtitle: {subtitle_line}")
-                subtitle_items[i] = (subtitle_items[i][0], subtitle_items[i][1], script_line)
+    new_subtitle_items = []
+    script_index = 0
+    subtitle_index = 0
+
+    while script_index < len(script_lines) and subtitle_index < len(subtitle_items):
+        script_line = script_lines[script_index].strip()
+        subtitle_line = subtitle_items[subtitle_index][2].strip()
+
+        if script_line == subtitle_line:
+            new_subtitle_items.append(subtitle_items[subtitle_index])
+            script_index += 1
+            subtitle_index += 1
+        else:
+            combined_subtitle = subtitle_line
+            start_time = subtitle_items[subtitle_index][1].split(" --> ")[0]
+            end_time = subtitle_items[subtitle_index][1].split(" --> ")[1]
+            next_subtitle_index = subtitle_index + 1
+
+            while next_subtitle_index < len(subtitle_items):
+                next_subtitle = subtitle_items[next_subtitle_index][2].strip()
+                if similarity(
+                    script_line, combined_subtitle + " " + next_subtitle
+                ) > similarity(script_line, combined_subtitle):
+                    combined_subtitle += " " + next_subtitle
+                    end_time = subtitle_items[next_subtitle_index][1].split(" --> ")[1]
+                    next_subtitle_index += 1
+                else:
+                    break
+
+            if similarity(script_line, combined_subtitle) > 0.8:
+                logger.warning(
+                    f"Merged/Corrected - Script: {script_line}, Subtitle: {combined_subtitle}"
+                )
+                new_subtitle_items.append(
+                    (
+                        len(new_subtitle_items) + 1,
+                        f"{start_time} --> {end_time}",
+                        script_line,
+                    )
+                )
                corrected = True
+            else:
+                logger.warning(
+                    f"Mismatch - Script: {script_line}, Subtitle: {combined_subtitle}"
+                )
+                new_subtitle_items.append(
+                    (
+                        len(new_subtitle_items) + 1,
+                        f"{start_time} --> {end_time}",
+                        script_line,
+                    )
+                )
+                corrected = True
+
+            script_index += 1
+            subtitle_index = next_subtitle_index
+
+    # Process the remaining lines of the script.
+    while script_index < len(script_lines):
+        logger.warning(f"Extra script line: {script_lines[script_index]}")
+        if subtitle_index < len(subtitle_items):
+            new_subtitle_items.append(
+                (
+                    len(new_subtitle_items) + 1,
+                    subtitle_items[subtitle_index][1],
+                    script_lines[script_index],
+                )
+            )
+            subtitle_index += 1
+        else:
+            new_subtitle_items.append(
+                (
+                    len(new_subtitle_items) + 1,
+                    "00:00:00,000 --> 00:00:00,000",
+                    script_lines[script_index],
+                )
+            )
+        script_index += 1
+        corrected = True

    if corrected:
        with open(subtitle_file, "w", encoding="utf-8") as fd:
-            for item in subtitle_items:
-                fd.write(f"{item[0]}\n{item[1]}\n{item[2]}\n\n")
-        logger.info(f"subtitle corrected")
+            for i, item in enumerate(new_subtitle_items):
+                fd.write(f"{i + 1}\n{item[1]}\n{item[2]}\n\n")
+        logger.info("Subtitle corrected")
    else:
-        logger.success(f"subtitle is correct")
+        logger.success("Subtitle is correct")


 if __name__ == "__main__":
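The new `correct()` no longer requires the subtitle and script line counts to match. When a script line differs from the current Whisper segment, it greedily appends the following segments for as long as that raises the normalized Levenshtein similarity, then rewrites the merged time span with the script text. The standalone sketch below reproduces that merge step on plain strings so it can be checked without SRT files. `greedy_merge` is a hypothetical helper, not part of the codebase, and `similarity` here returns 1.0 for two empty strings to avoid a division by zero.

```python
from typing import List, Tuple


def levenshtein_distance(s1: str, s2: str) -> int:
    # Two-row dynamic-programming edit distance (insert/delete/substitute cost 1).
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


def similarity(a: str, b: str) -> float:
    # Case-insensitive; 1.0 means identical.
    a, b = a.lower(), b.lower()
    max_length = max(len(a), len(b))
    if max_length == 0:
        return 1.0
    return 1 - levenshtein_distance(a, b) / max_length


def greedy_merge(script_line: str, segments: List[str], start: int) -> Tuple[str, int]:
    """Hypothetical helper: merge segments[start:] while similarity improves.

    Returns the merged text and the index of the first unconsumed segment.
    """
    combined = segments[start]
    nxt = start + 1
    while nxt < len(segments):
        candidate = combined + " " + segments[nxt]
        if similarity(script_line, candidate) > similarity(script_line, combined):
            combined = candidate
            nxt += 1
        else:
            break
    return combined, nxt


merged, nxt = greedy_merge(
    "Money is a tool for exchange",
    ["Money is a tool", "for exchange", "Next sentence"],
    0,
)
print(merged, nxt)
```

With these sample segments the first two are merged (similarity rises from about 0.54 to 1.0) and merging stops before the third. Note that in `correct()` the merged line is written with the script text whether or not the final similarity exceeds 0.8; the threshold only selects which warning is logged.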
--- a/app/services/task.py
+++ b/app/services/task.py
@@ -7,57 +7,42 @@ from loguru import logger

 from app.config import config
 from app.models import const
-from app.models.schema import VideoParams, VideoConcatMode
-from app.services import llm, material, voice, video, subtitle
+from app.models.schema import VideoConcatMode, VideoParams
+from app.services import llm, material, subtitle, video, voice
 from app.services import state as sm
 from app.utils import utils


-def start(task_id, params: VideoParams):
-    """
-    {
-        "video_subject": "",
-        "video_aspect": "横屏 16:9（西瓜视频）",
-        "voice_name": "女生-晓晓",
-        "enable_bgm": false,
-        "font_name": "STHeitiMedium 黑体-中",
-        "text_color": "#FFFFFF",
-        "font_size": 60,
-        "stroke_color": "#000000",
-        "stroke_width": 1.5
-    }
-    """
-    logger.info(f"start task: {task_id}")
-    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=5)
-
-    video_subject = params.video_subject
-    voice_name = voice.parse_voice_name(params.voice_name)
-    paragraph_number = params.paragraph_number
-    n_threads = params.n_threads
-    max_clip_duration = params.video_clip_duration
-
+def generate_script(task_id, params):
    logger.info("\n\n## generating video script")
    video_script = params.video_script.strip()
    if not video_script:
-        video_script = llm.generate_script(video_subject=video_subject, language=params.video_language,
-                                           paragraph_number=paragraph_number)
+        video_script = llm.generate_script(
+            video_subject=params.video_subject,
+            language=params.video_language,
+            paragraph_number=params.paragraph_number,
+        )
    else:
        logger.debug(f"video script: \n{video_script}")

    if not video_script:
        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
        logger.error("failed to generate video script.")
-        return
+        return None

-    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=10)
+    return video_script

+
+def generate_terms(task_id, params, video_script):
    logger.info("\n\n## generating video terms")
    video_terms = params.video_terms
    if not video_terms:
-        video_terms = llm.generate_terms(video_subject=video_subject, video_script=video_script, amount=5)
+        video_terms = llm.generate_terms(
+            video_subject=params.video_subject, video_script=video_script, amount=5
+        )
    else:
        if isinstance(video_terms, str):
-            video_terms = [term.strip() for term in re.split(r'[,，]', video_terms)]
+            video_terms = [term.strip() for term in re.split(r"[,，]", video_terms)]
        elif isinstance(video_terms, list):
            video_terms = [term.strip() for term in video_terms]
        else:
@@ -68,9 +53,13 @@ def start(task_id, params: VideoParams):
    if not video_terms:
        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
        logger.error("failed to generate video terms.")
-        return
+        return None

-    script_file = path.join(utils.task_dir(task_id), f"script.json")
+    return video_terms
+
+
+def save_script_data(task_id, video_script, video_terms, params):
+    script_file = path.join(utils.task_dir(task_id), "script.json")
    script_data = {
        "script": video_script,
        "search_terms": video_terms,
@@ -80,11 +69,16 @@ def start(task_id, params: VideoParams):
    with open(script_file, "w", encoding="utf-8") as f:
        f.write(utils.to_json(script_data))

-    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=20)

+def generate_audio(task_id, params, video_script):
    logger.info("\n\n## generating audio")
-    audio_file = path.join(utils.task_dir(task_id), f"audio.mp3")
-    sub_maker = voice.tts(text=video_script, voice_name=voice_name, voice_file=audio_file)
+    audio_file = path.join(utils.task_dir(task_id), "audio.mp3")
+    sub_maker = voice.tts(
+        text=video_script,
+        voice_name=voice.parse_voice_name(params.voice_name),
+        voice_rate=params.voice_rate,
+        voice_file=audio_file,
+    )
    if sub_maker is None:
        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
        logger.error(
@@ -93,86 +87,102 @@ def start(task_id, params: VideoParams):
 2. check if the network is available. If you are in China, it is recommended to use a VPN and enable the global traffic mode.
        """.strip()
        )
-        return
+        return None, None, None

-    audio_duration = voice.get_audio_duration(sub_maker)
-    audio_duration = math.ceil(audio_duration)
+    audio_duration = math.ceil(voice.get_audio_duration(sub_maker))
+    return audio_file, audio_duration, sub_maker

-    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=30)

-    subtitle_path = ""
-    if params.subtitle_enabled:
-        subtitle_path = path.join(utils.task_dir(task_id), f"subtitle.srt")
-        subtitle_provider = config.app.get("subtitle_provider", "").strip().lower()
-        logger.info(f"\n\n## generating subtitle, provider: {subtitle_provider}")
-        subtitle_fallback = False
-        if subtitle_provider == "edge":
-            voice.create_subtitle(text=video_script, sub_maker=sub_maker, subtitle_file=subtitle_path)
-            if not os.path.exists(subtitle_path):
-                subtitle_fallback = True
-                logger.warning("subtitle file not found, fallback to whisper")
+def generate_subtitle(task_id, params, video_script, sub_maker, audio_file):
+    if not params.subtitle_enabled:
+        return ""

-        if subtitle_provider == "whisper" or subtitle_fallback:
-            subtitle.create(audio_file=audio_file, subtitle_file=subtitle_path)
-            logger.info("\n\n## correcting subtitle")
-            subtitle.correct(subtitle_file=subtitle_path, video_script=video_script)
+    subtitle_path = path.join(utils.task_dir(task_id), "subtitle.srt")
+    subtitle_provider = config.app.get("subtitle_provider", "edge").strip().lower()
+    logger.info(f"\n\n## generating subtitle, provider: {subtitle_provider}")

-        subtitle_lines = subtitle.file_to_subtitles(subtitle_path)
-        if not subtitle_lines:
-            logger.warning(f"subtitle file is invalid: {subtitle_path}")
-            subtitle_path = ""
+    subtitle_fallback = False
+    if subtitle_provider == "edge":
+        voice.create_subtitle(
+            text=video_script, sub_maker=sub_maker, subtitle_file=subtitle_path
+        )
+        if not os.path.exists(subtitle_path):
+            subtitle_fallback = True
+            logger.warning("subtitle file not found, fallback to whisper")

-    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=40)
+    if subtitle_provider == "whisper" or subtitle_fallback:
+        subtitle.create(audio_file=audio_file, subtitle_file=subtitle_path)
+        logger.info("\n\n## correcting subtitle")
+        subtitle.correct(subtitle_file=subtitle_path, video_script=video_script)

-    downloaded_videos = []
+    subtitle_lines = subtitle.file_to_subtitles(subtitle_path)
+    if not subtitle_lines:
+        logger.warning(f"subtitle file is invalid: {subtitle_path}")
+        return ""
+
+    return subtitle_path
+
+
+def get_video_materials(task_id, params, video_terms, audio_duration):
    if params.video_source == "local":
        logger.info("\n\n## preprocess local materials")
-        materials = video.preprocess_video(materials=params.video_materials, clip_duration=max_clip_duration)
-        print(materials)
-
+        materials = video.preprocess_video(
+            materials=params.video_materials, clip_duration=params.video_clip_duration
+        )
        if not materials:
            sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
-            logger.error("no valid materials found, please check the materials and try again.")
-            return
-        for material_info in materials:
-            print(material_info)
-            downloaded_videos.append(material_info.url)
+            logger.error(
+                "no valid materials found, please check the materials and try again."
+            )
+            return None
+        return [material_info.url for material_info in materials]
    else:
        logger.info(f"\n\n## downloading videos from {params.video_source}")
-        downloaded_videos = material.download_videos(task_id=task_id,
-                                                     search_terms=video_terms,
-                                                     source=params.video_source,
-                                                     video_aspect=params.video_aspect,
-                                                     video_contact_mode=params.video_concat_mode,
-                                                     audio_duration=audio_duration * params.video_count,
-                                                     max_clip_duration=max_clip_duration,
-                                                     )
-    if not downloaded_videos:
-        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
-        logger.error(
-            "failed to download videos, maybe the network is not available. if you are in China, please use a VPN.")
-        return
+        downloaded_videos = material.download_videos(
+            task_id=task_id,
+            search_terms=video_terms,
+            source=params.video_source,
+            video_aspect=params.video_aspect,
+            video_contact_mode=params.video_concat_mode,
+            audio_duration=audio_duration * params.video_count,
+            max_clip_duration=params.video_clip_duration,
+        )
+        if not downloaded_videos:
+            sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
+            logger.error(
+                "failed to download videos; the network may be unavailable. If you are in China, please use a VPN."
+            )
+            return None
+        return downloaded_videos

-    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=50)

+def generate_final_videos(
+    task_id, params, downloaded_videos, audio_file, subtitle_path
+):
    final_video_paths = []
    combined_video_paths = []
-    video_concat_mode = params.video_concat_mode
-    if params.video_count > 1:
-        video_concat_mode = VideoConcatMode.random
+    video_concat_mode = (
+        params.video_concat_mode if params.video_count == 1 else VideoConcatMode.random
+    )
+    video_transition_mode = params.video_transition_mode

    _progress = 50
    for i in range(params.video_count):
        index = i + 1
-        combined_video_path = path.join(utils.task_dir(task_id), f"combined-{index}.mp4")
+        combined_video_path = path.join(
+            utils.task_dir(task_id), f"combined-{index}.mp4"
+        )
        logger.info(f"\n\n## combining video: {index} => {combined_video_path}")
-        video.combine_videos(combined_video_path=combined_video_path,
-                             video_paths=downloaded_videos,
-                             audio_file=audio_file,
-                             video_aspect=params.video_aspect,
-                             video_concat_mode=video_concat_mode,
-                             max_clip_duration=max_clip_duration,
-                             threads=n_threads)
+        video.combine_videos(
+            combined_video_path=combined_video_path,
+            video_paths=downloaded_videos,
+            audio_file=audio_file,
+            video_aspect=params.video_aspect,
+            video_concat_mode=video_concat_mode,
+            video_transition_mode=video_transition_mode,
+            max_clip_duration=params.video_clip_duration,
+            threads=params.n_threads,
+        )

        _progress += 50 / params.video_count / 2
        sm.state.update_task(task_id, progress=_progress)
@@ -180,13 +190,13 @@ def start(task_id, params: VideoParams):
        final_video_path = path.join(utils.task_dir(task_id), f"final-{index}.mp4")

        logger.info(f"\n\n## generating video: {index} => {final_video_path}")
-        # Put everything together
-        video.generate_video(video_path=combined_video_path,
-                             audio_path=audio_file,
-                             subtitle_path=subtitle_path,
-                             output_file=final_video_path,
-                             params=params,
-                             )
+        video.generate_video(
+            video_path=combined_video_path,
+            audio_path=audio_file,
+            subtitle_path=subtitle_path,
+            output_file=final_video_path,
+            params=params,
+        )

        _progress += 50 / params.video_count / 2
        sm.state.update_task(task_id, progress=_progress)
@@ -194,16 +204,136 @@ def start(task_id, params: VideoParams):
        final_video_paths.append(final_video_path)
        combined_video_paths.append(combined_video_path)

-    logger.success(f"task {task_id} finished, generated {len(final_video_paths)} videos.")
+    return final_video_paths, combined_video_paths
+
+
+def start(task_id, params: VideoParams, stop_at: str = "video"):
+    logger.info(f"start task: {task_id}, stop_at: {stop_at}")
+    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=5)
+
+    if isinstance(params.video_concat_mode, str):
+        params.video_concat_mode = VideoConcatMode(params.video_concat_mode)
+
+    # 1. Generate script
+    video_script = generate_script(task_id, params)
+    if not video_script or "Error: " in video_script:
+        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
+        return
+
+    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=10)
+
+    if stop_at == "script":
+        sm.state.update_task(
+            task_id, state=const.TASK_STATE_COMPLETE, progress=100, script=video_script
+        )
+        return {"script": video_script}
+
+    # 2. Generate terms
+    video_terms = ""
+    if params.video_source != "local":
+        video_terms = generate_terms(task_id, params, video_script)
+        if not video_terms:
+            sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
+            return
+
+    save_script_data(task_id, video_script, video_terms, params)
+
+    if stop_at == "terms":
+        sm.state.update_task(
+            task_id, state=const.TASK_STATE_COMPLETE, progress=100, terms=video_terms
+        )
+        return {"script": video_script, "terms": video_terms}
+
+    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=20)
+
+    # 3. Generate audio
+    audio_file, audio_duration, sub_maker = generate_audio(
+        task_id, params, video_script
+    )
+    if not audio_file:
+        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
+        return
+
+    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=30)
+
+    if stop_at == "audio":
+        sm.state.update_task(
+            task_id,
+            state=const.TASK_STATE_COMPLETE,
+            progress=100,
+            audio_file=audio_file,
+        )
+        return {"audio_file": audio_file, "audio_duration": audio_duration}
+
+    # 4. Generate subtitle
+    subtitle_path = generate_subtitle(
+        task_id, params, video_script, sub_maker, audio_file
+    )
+
+    if stop_at == "subtitle":
+        sm.state.update_task(
+            task_id,
+            state=const.TASK_STATE_COMPLETE,
+            progress=100,
+            subtitle_path=subtitle_path,
+        )
+        return {"subtitle_path": subtitle_path}
+
+    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=40)
+
+    # 5. Get video materials
+    downloaded_videos = get_video_materials(
+        task_id, params, video_terms, audio_duration
+    )
+    if not downloaded_videos:
+        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
+        return
+
+    if stop_at == "materials":
+        sm.state.update_task(
+            task_id,
+            state=const.TASK_STATE_COMPLETE,
+            progress=100,
+            materials=downloaded_videos,
+        )
+        return {"materials": downloaded_videos}
+
+    sm.state.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=50)
+
+    # 6. Generate final videos
+    final_video_paths, combined_video_paths = generate_final_videos(
+        task_id, params, downloaded_videos, audio_file, subtitle_path
+    )
+
+    if not final_video_paths:
+        sm.state.update_task(task_id, state=const.TASK_STATE_FAILED)
+        return
+
+    logger.success(
+        f"task {task_id} finished, generated {len(final_video_paths)} videos."
+    )

    kwargs = {
        "videos": final_video_paths,
-        "combined_videos": combined_video_paths
+        "combined_videos": combined_video_paths,
+        "script": video_script,
+        "terms": video_terms,
+        "audio_file": audio_file,
+        "audio_duration": audio_duration,
+        "subtitle_path": subtitle_path,
+        "materials": downloaded_videos,
    }
-    sm.state.update_task(task_id, state=const.TASK_STATE_COMPLETE, progress=100, **kwargs)
+    sm.state.update_task(
+        task_id, state=const.TASK_STATE_COMPLETE, progress=100, **kwargs
+    )
    return kwargs

-# def start_test(task_id, params: VideoParams):
-#     print(f"start task {task_id} \n")
-#     time.sleep(5)
-#     print(f"task {task_id} finished \n")
+
+if __name__ == "__main__":
+    task_id = "task_id"
+    params = VideoParams(
+        video_subject="金钱的作用",  # "The role of money"
+        voice_name="zh-CN-XiaoyiNeural-Female",
+        voice_rate=1.0,
+    )
+    start(task_id, params, stop_at="video")
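The refactored `start()` runs the pipeline as discrete stages (script, terms, audio, subtitle, materials, video) and returns early with progress 100 when `stop_at` names a stage. Between stages, progress moves through fixed checkpoints (5, 10, 20, 30, 40, 50); `generate_final_videos()` then adds `50 / video_count / 2` after combining and again after rendering each video, so the last video lands on 100. The sketch below only replays that arithmetic; `final_video_progress` is a hypothetical name, not part of the codebase.

```python
from typing import List


def final_video_progress(video_count: int, start: float = 50) -> List[float]:
    # Mirrors the loop in generate_final_videos(): each video advances the
    # progress twice (after combining, after rendering) by 50 / count / 2.
    progress = start
    checkpoints = []
    for _ in range(video_count):
        for _step in ("combine", "render"):
            progress += 50 / video_count / 2
            checkpoints.append(progress)
    return checkpoints


print(final_video_progress(2))
```

For `video_count=2` the reported values are 62.5, 75.0, 87.5 and 100.0. For counts that do not divide 50 evenly, floating-point accumulation can leave the last value marginally off 100 until `start()` sets it to exactly 100 on completion.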
--- a/app/services/utils/video_effects.py
+++ b/app/services/utils/video_effects.py
@@ -0,0 +1,21 @@
+from moviepy import Clip, vfx
+
+
+# FadeIn
+def fadein_transition(clip: Clip, t: float) -> Clip:
+    return clip.with_effects([vfx.FadeIn(t)])
+
+
+# FadeOut
+def fadeout_transition(clip: Clip, t: float) -> Clip:
+    return clip.with_effects([vfx.FadeOut(t)])
+
+
+# SlideIn
+def slidein_transition(clip: Clip, t: float, side: str) -> Clip:
+    return clip.with_effects([vfx.SlideIn(t, side)])
+
+
+# SlideOut
+def slideout_transition(clip: Clip, t: float, side: str) -> Clip:
+    return clip.with_effects([vfx.SlideOut(t, side)])
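Two pieces of arithmetic in the `combine_videos()` changes to `app/services/video.py` can be checked without MoviePy. The new splitting loop only records windows that are a full `max_clip_duration` long, and the resize branch scales a mismatched clip to fit inside the target frame, leaving the black `ColorClip` to fill the rest. The sketch below mirrors both in plain Python; `split_windows` and `letterbox_size` are hypothetical helper names, not functions in the codebase.

```python
from typing import List, Tuple


def split_windows(
    clip_duration: float, max_clip_duration: float, sequential: bool = False
) -> List[Tuple[float, float]]:
    # Mirrors the new splitting loop: only full-length windows are kept,
    # and a shorter tail is dropped.
    items = []
    start_time = 0.0
    while start_time < clip_duration:
        end_time = min(start_time + max_clip_duration, clip_duration)
        if clip_duration - start_time >= max_clip_duration:
            items.append((start_time, end_time))
        start_time = end_time
        if sequential:
            # Sequential concat mode only considers the first window.
            break
    return items


def letterbox_size(clip_w: int, clip_h: int, video_w: int, video_h: int) -> Tuple[int, int]:
    # Fit inside the target frame while preserving the clip's aspect ratio.
    clip_ratio = clip_w / clip_h
    video_ratio = video_w / video_h
    if clip_ratio > video_ratio:
        scale_factor = video_w / clip_w  # wider than target: match width
    else:
        scale_factor = video_h / clip_h  # taller than target: match height
    return int(clip_w * scale_factor), int(clip_h * scale_factor)


print(split_windows(12, 5), letterbox_size(1920, 1080, 1080, 1920))
```

For example, a 12 s source with `max_clip_duration=5` yields the windows (0, 5) and (5, 10); the 2 s tail is dropped, and a source shorter than 5 s contributes no windows at all. A 1920x1080 clip placed in a 1080x1920 frame is scaled to 1080x607.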
--- a/app/services/video.py
+++ b/app/services/video.py
@@ -1,15 +1,102 @@
 import glob
+import itertools
+import os
 import random
+import gc
+import shutil
 from typing import List
-from PIL import ImageFont, Image
 from loguru import logger
-from moviepy.editor import *
+from moviepy import (
+    AudioFileClip,
+    ColorClip,
+    CompositeAudioClip,
+    CompositeVideoClip,
+    ImageClip,
+    TextClip,
+    VideoFileClip,
+    afx,
+    concatenate_videoclips,
+)
 from moviepy.video.tools.subtitles import SubtitlesClip
+from PIL import ImageFont

 from app.models import const
-from app.models.schema import VideoAspect, VideoParams, VideoConcatMode, MaterialInfo
+from app.models.schema import (
+    MaterialInfo,
+    VideoAspect,
+    VideoConcatMode,
+    VideoParams,
+    VideoTransitionMode,
+)
+from app.services.utils import video_effects
 from app.utils import utils

+
+class SubClippedVideoClip:
+    def __init__(
+        self,
+        file_path,
+        start_time=None,
+        end_time=None,
+        width=None,
+        height=None,
+        duration=None,
+    ):
+        self.file_path = file_path
+        self.start_time = start_time
+        self.end_time = end_time
+        self.width = width
+        self.height = height
+        if duration is None:
+            self.duration = end_time - start_time
+        else:
+            self.duration = duration
+
+    def __str__(self):
+        return (
+            f"SubClippedVideoClip(file_path={self.file_path}, "
+            f"start_time={self.start_time}, end_time={self.end_time}, "
+            f"duration={self.duration}, width={self.width}, height={self.height})"
+        )
+
+
+audio_codec = "aac"
+video_codec = "libx264"
+fps = 30
+
+
+def close_clip(clip):
+    if clip is None:
+        return
+
+    try:
+        # close main resources
+        if hasattr(clip, "reader") and clip.reader is not None:
+            clip.reader.close()
+
+        # close audio resources
+        if hasattr(clip, "audio") and clip.audio is not None:
+            if hasattr(clip.audio, "reader") and clip.audio.reader is not None:
+                clip.audio.reader.close()
+            del clip.audio
+
+        # close mask resources
+        if hasattr(clip, "mask") and clip.mask is not None:
+            if hasattr(clip.mask, "reader") and clip.mask.reader is not None:
+                clip.mask.reader.close()
+            del clip.mask
+
+        # handle child clips in composite clips
+        if hasattr(clip, "clips") and clip.clips:
+            for child_clip in clip.clips:
+                if child_clip is not clip:  # avoid possible circular references
+                    close_clip(child_clip)
+
+        # clear clip list
+        if hasattr(clip, "clips"):
+            clip.clips = []
+
+    except Exception as e:
+        logger.error(f"failed to close clip: {str(e)}")
+
+    del clip
+    gc.collect()
+
+
+def delete_files(files: List[str] | str):
+    if isinstance(files, str):
+        files = [files]
+
+    for file in files:
+        try:
+            os.remove(file)
+        except OSError:
+            # Ignore files that are already gone or cannot be removed.
+            pass
+

 def get_bgm_file(bgm_type: str = "random", bgm_file: str = ""):
    if not bgm_type:
@@ -27,113 +114,203 @@ def get_bgm_file(bgm_type: str = "random", bgm_file: str = ""):
    return ""


-def combine_videos(combined_video_path: str,
-                   video_paths: List[str],
-                   audio_file: str,
-                   video_aspect: VideoAspect = VideoAspect.portrait,
-                   video_concat_mode: VideoConcatMode = VideoConcatMode.random,
-                   max_clip_duration: int = 5,
-                   threads: int = 2,
-                   ) -> str:
+def combine_videos(
+    combined_video_path: str,
+    video_paths: List[str],
+    audio_file: str,
+    video_aspect: VideoAspect = VideoAspect.portrait,
+    video_concat_mode: VideoConcatMode = VideoConcatMode.random,
+    video_transition_mode: VideoTransitionMode = None,
+    max_clip_duration: int = 5,
+    threads: int = 2,
+) -> str:
    audio_clip = AudioFileClip(audio_file)
    audio_duration = audio_clip.duration
-    logger.info(f"max duration of audio: {audio_duration} seconds")
+    logger.info(f"audio duration: {audio_duration} seconds")
    # Required duration of each clip
    req_dur = audio_duration / len(video_paths)
    req_dur = max_clip_duration
-    logger.info(f"each clip will be maximum {req_dur} seconds long")
+    logger.info(f"maximum clip duration: {req_dur} seconds")
    output_dir = os.path.dirname(combined_video_path)

    aspect = VideoAspect(video_aspect)
    video_width, video_height = aspect.to_resolution()

-    clips = []
+    processed_clips = []
+    subclipped_items = []
    video_duration = 0
-
-    raw_clips = []
    for video_path in video_paths:
-        clip = VideoFileClip(video_path).without_audio()
+        clip = VideoFileClip(video_path)
        clip_duration = clip.duration
+        clip_w, clip_h = clip.size
+        close_clip(clip)
+
        start_time = 0

        while start_time < clip_duration:
-            end_time = min(start_time + max_clip_duration, clip_duration)
-            split_clip = clip.subclip(start_time, end_time)
-            raw_clips.append(split_clip)
-            # logger.info(f"splitting from {start_time:.2f} to {end_time:.2f}, clip duration {clip_duration:.2f}, split_clip duration {split_clip.duration:.2f}")
-            start_time = end_time
+            end_time = min(start_time + max_clip_duration, clip_duration)
+            if clip_duration - start_time >= max_clip_duration:
+                subclipped_items.append(
+                    SubClippedVideoClip(
+                        file_path=video_path,
+                        start_time=start_time,
+                        end_time=end_time,
+                        width=clip_w,
+                        height=clip_h,
+                    )
+                )
+            start_time = end_time
            if video_concat_mode.value == VideoConcatMode.sequential.value:
                break

-    # random video_paths order
+    # random subclipped_items order
    if video_concat_mode.value == VideoConcatMode.random.value:
-        random.shuffle(raw_clips)
-
+        random.shuffle(subclipped_items)
+
+    logger.debug(f"total subclipped items: {len(subclipped_items)}")
+
    # Add downloaded clips over and over until the duration of the audio (max_duration) has been reached
-    while video_duration < audio_duration:
-        for clip in raw_clips:
-            # Check if clip is longer than the remaining audio
-            if (audio_duration - video_duration) < clip.duration:
-                clip = clip.subclip(0, (audio_duration - video_duration))
-            # Only shorten clips if the calculated clip length (req_dur) is shorter than the actual clip to prevent still image
-            elif req_dur < clip.duration:
-                clip = clip.subclip(0, req_dur)
-            clip = clip.set_fps(30)
-
+    for i, subclipped_item in enumerate(subclipped_items):
+        if video_duration > audio_duration:
+            break
+
+        logger.debug(
+            f"processing clip {i + 1}: {subclipped_item.width}x{subclipped_item.height}, "
+            f"current duration: {video_duration:.2f}s, "
+            f"remaining: {audio_duration - video_duration:.2f}s"
+        )
+
+        try:
+            clip = VideoFileClip(subclipped_item.file_path).subclipped(
+                subclipped_item.start_time, subclipped_item.end_time
+            )
+            clip_duration = clip.duration
            # Not all videos are same size, so we need to resize them
            clip_w, clip_h = clip.size
            if clip_w != video_width or clip_h != video_height:
                clip_ratio = clip.w / clip.h
                video_ratio = video_width / video_height
-
+                logger.debug(
+                    f"resizing clip, source: {clip_w}x{clip_h}, ratio: {clip_ratio:.2f}, "
+                    f"target: {video_width}x{video_height}, ratio: {video_ratio:.2f}"
+                )
+
                if clip_ratio == video_ratio:
-                    # 等比例缩放
-                    clip = clip.resize((video_width, video_height))
+                    clip = clip.resized(new_size=(video_width, video_height))
                else:
-                    # 等比缩放视频
                    if clip_ratio > video_ratio:
-                        # 按照目标宽度等比缩放
                        scale_factor = video_width / clip_w
                    else:
-                        # 按照目标高度等比缩放
                        scale_factor = video_height / clip_h

                    new_width = int(clip_w * scale_factor)
                    new_height = int(clip_h * scale_factor)
-                    clip_resized = clip.resize(newsize=(new_width, new_height))

-                    background = ColorClip(size=(video_width, video_height), color=(0, 0, 0))
-                    clip = CompositeVideoClip([
-                        background.set_duration(clip.duration),
-                        clip_resized.set_position("center")
-                    ])
-
-                logger.info(f"resizing video to {video_width} x {video_height}, clip size: {clip_w} x {clip_h}")
+                    background = ColorClip(size=(video_width, video_height), color=(0, 0, 0)).with_duration(clip_duration)
+                    clip_resized = clip.resized(new_size=(new_width, new_height)).with_position("center")
+                    clip = CompositeVideoClip([background, clip_resized])
+                    
+                    close_clip(clip_resized)
+                    close_clip(background)
+                    
+            shuffle_side = random.choice(["left", "right", "top", "bottom"])
+            if video_transition_mode.value == VideoTransitionMode.none.value:
+                pass  # no transition
+            elif video_transition_mode.value == VideoTransitionMode.fade_in.value:
+                clip = video_effects.fadein_transition(clip, 1)
+            elif video_transition_mode.value == VideoTransitionMode.fade_out.value:
+                clip = video_effects.fadeout_transition(clip, 1)
+            elif video_transition_mode.value == VideoTransitionMode.slide_in.value:
+                clip = video_effects.slidein_transition(clip, 1, shuffle_side)
+            elif video_transition_mode.value == VideoTransitionMode.slide_out.value:
+                clip = video_effects.slideout_transition(clip, 1, shuffle_side)
+            elif video_transition_mode.value == VideoTransitionMode.shuffle.value:
+                transition_funcs = [
+                    lambda c: video_effects.fadein_transition(c, 1),
+                    lambda c: video_effects.fadeout_transition(c, 1),
+                    lambda c: video_effects.slidein_transition(c, 1, shuffle_side),
+                    lambda c: video_effects.slideout_transition(c, 1, shuffle_side),
+                ]
+                shuffle_transition = random.choice(transition_funcs)
+                clip = shuffle_transition(clip)

            if clip.duration > max_clip_duration:
-                clip = clip.subclip(0, max_clip_duration)
-
-            clips.append(clip)
+                clip = clip.subclipped(0, max_clip_duration)
+                
+            # write clip to temp file
+            clip_file = f"{output_dir}/temp-clip-{i+1}.mp4"
+            clip.write_videofile(clip_file, logger=None, fps=fps, codec=video_codec)
+            
+            close_clip(clip)
+        
+            processed_clips.append(SubClippedVideoClip(file_path=clip_file, duration=clip.duration, width=clip_w, height=clip_h))
            video_duration += clip.duration
+            
+        except Exception as e:
+            logger.error(f"failed to process clip: {str(e)}")
+    
+    # loop processed clips until the video duration matches or exceeds the audio duration.
+    if video_duration < audio_duration:
+        logger.warning(f"video duration ({video_duration:.2f}s) is shorter than audio duration ({audio_duration:.2f}s), looping clips to match audio length.")
+        base_clips = processed_clips.copy()
+        for clip in itertools.cycle(base_clips):
+            if video_duration >= audio_duration:
+                break
+            processed_clips.append(clip)
+            video_duration += clip.duration
+        logger.info(f"video duration: {video_duration:.2f}s, audio duration: {audio_duration:.2f}s, looped {len(processed_clips)-len(base_clips)} clips")
+     
+    # merge video clips progressively, avoid loading all videos at once to avoid memory overflow
+    logger.info("starting clip merging process")
+    if not processed_clips:
+        logger.warning("no clips available for merging")
+        return combined_video_path
+    
+    # if there is only one clip, use it directly
+    if len(processed_clips) == 1:
+        logger.info("using single clip directly")
+        shutil.copy(processed_clips[0].file_path, combined_video_path)
+        delete_files([c.file_path for c in processed_clips])
+        logger.info("video combining completed")
+        return combined_video_path
+    
+    # create initial video file as base
+    base_clip_path = processed_clips[0].file_path
+    temp_merged_video = f"{output_dir}/temp-merged-video.mp4"
+    temp_merged_next = f"{output_dir}/temp-merged-next.mp4"
+    
+    # copy first clip as initial merged video
+    shutil.copy(base_clip_path, temp_merged_video)
+    
+    # merge remaining video clips one by one
+    for i, clip in enumerate(processed_clips[1:], 1):
+        logger.info(f"merging clip {i}/{len(processed_clips)-1}, duration: {clip.duration:.2f}s")
+        
+        try:
+            # load current base video and next clip to merge
+            base_clip = VideoFileClip(temp_merged_video)
+            next_clip = VideoFileClip(clip.file_path)
+            
+            # merge these two clips
+            merged_clip = concatenate_videoclips([base_clip, next_clip])

-    video_clip = concatenate_videoclips(clips)
-    video_clip = video_clip.set_fps(30)
-    logger.info(f"writing")
-    # https://github.com/harry0703/MoneyPrinterTurbo/issues/111#issuecomment-2032354030
-    video_clip.write_videofile(filename=combined_video_path,
-                               threads=threads,
-                               logger=None,
-                               temp_audiofile_path=output_dir,
-                               audio_codec="aac",
-                               fps=30,
-                               )
-    video_clip.close()
-    logger.success(f"completed")
+            # save merged result to temp file
+            merged_clip.write_videofile(
+                filename=temp_merged_next,
+                threads=threads,
+                logger=None,
+                temp_audiofile_path=output_dir,
+                audio_codec=audio_codec,
+                fps=fps,
+            )
+            close_clip(base_clip)
+            close_clip(next_clip)
+            close_clip(merged_clip)
+            
+            # replace base file with new merged file
+            delete_files(temp_merged_video)
+            os.rename(temp_merged_next, temp_merged_video)
+            
+        except Exception as e:
+            logger.error(f"failed to merge clip: {str(e)}")
+            continue
+    
+    # after merging, rename final result to target file name
+    os.replace(temp_merged_video, combined_video_path)
+    
+    # clean temp files
+    clip_files = [clip.file_path for clip in processed_clips]
+    delete_files(clip_files)
+            
+    logger.info("video combining completed")
    return combined_video_path


-def wrap_text(text, max_width, font='Arial', fontsize=60):
-    # Create a font object
+def wrap_text(text, max_width, font="Arial", fontsize=60):
+    # Create ImageFont
    font = ImageFont.truetype(font, fontsize)

    def get_text_size(inner_text):
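For reference, the aspect-ratio branch of `combine_videos` in the hunk above scales by whichever dimension is the binding constraint and then centers the result on a black `ColorClip`. The sketch below reproduces only that arithmetic in plain Python, without moviepy. `fit_within` is a hypothetical helper, and the centering offsets are computed here with integer division as an illustration, not read from moviepy.

```python
from typing import Tuple


def fit_within(clip_w: int, clip_h: int, video_w: int, video_h: int) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, x, y): the scaled clip size and the top-left offset
    that centers it inside a video_w x video_h frame (letterbox/pillarbox)."""
    clip_ratio = clip_w / clip_h
    video_ratio = video_w / video_h
    if clip_ratio > video_ratio:
        # Wider than the target: width is the binding constraint (bands above and below)
        scale_factor = video_w / clip_w
    else:
        # Taller than or equal to the target: height is the binding constraint
        scale_factor = video_h / clip_h
    new_w = int(clip_w * scale_factor)
    new_h = int(clip_h * scale_factor)
    # Offsets that center the scaled clip on the background
    x = (video_w - new_w) // 2
    y = (video_h - new_h) // 2
    return new_w, new_h, x, y


# A 1920x1080 landscape clip placed into a 1080x1920 portrait frame
print(fit_within(1920, 1080, 1080, 1920))  # (1080, 607, 0, 656)
```

In this example the landscape clip keeps its full width and gets black bands of about 656 pixels above and below.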
@@ -145,13 +322,11 @@ def wrap_text(text, max_width, font='Arial', fontsize=60):
    if width <= max_width:
        return text, height

-    # logger.warning(f"wrapping text, max_width: {max_width}, text_width: {width}, text: {text}")
-
    processed = True

    _wrapped_lines_ = []
    words = text.split(" ")
-    _txt_ = ''
+    _txt_ = ""
    for word in words:
        _before = _txt_
        _txt_ += f"{word} "
@@ -167,14 +342,13 @@ def wrap_text(text, max_width, font='Arial', fontsize=60):
    _wrapped_lines_.append(_txt_)
    if processed:
        _wrapped_lines_ = [line.strip() for line in _wrapped_lines_]
-        result = '\n'.join(_wrapped_lines_).strip()
+        result = "\n".join(_wrapped_lines_).strip()
        height = len(_wrapped_lines_) * height
-        # logger.warning(f"wrapped text: {result}")
        return result, height

    _wrapped_lines_ = []
    chars = list(text)
-    _txt_ = ''
+    _txt_ = ""
    for word in chars:
        _txt_ += word
        _width, _height = get_text_size(_txt_)
@@ -182,24 +356,24 @@ def wrap_text(text, max_width, font='Arial', fontsize=60):
            continue
        else:
            _wrapped_lines_.append(_txt_)
-            _txt_ = ''
+            _txt_ = ""
    _wrapped_lines_.append(_txt_)
-    result = '\n'.join(_wrapped_lines_).strip()
+    result = "\n".join(_wrapped_lines_).strip()
    height = len(_wrapped_lines_) * height
-    # logger.warning(f"wrapped text: {result}")
    return result, height


-def generate_video(video_path: str,
-                   audio_path: str,
-                   subtitle_path: str,
-                   output_file: str,
-                   params: VideoParams,
-                   ):
+def generate_video(
+    video_path: str,
+    audio_path: str,
+    subtitle_path: str,
+    output_file: str,
+    params: VideoParams,
+):
    aspect = VideoAspect(params.video_aspect)
    video_width, video_height = aspect.to_resolution()

-    logger.info(f"start, video size: {video_width} x {video_height}")
+    logger.info(f"generating video: {video_width} x {video_height}")
    logger.info(f"  ① video: {video_path}")
    logger.info(f"  ② audio: {audio_path}")
    logger.info(f"  ③ subtitle: {subtitle_path}")
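`wrap_text` first tries word-level wrapping and, when that is not possible, falls back to character-level wrapping, which is what CJK text without spaces needs. The sketch below illustrates that two-pass idea under stated assumptions; it is not the function in this diff. `greedy_wrap` is a hypothetical name, `len` stands in for the pixel width that `get_text_size` measures with `ImageFont`, and unlike the original this version never lets a line exceed `max_width`.

```python
from typing import Callable, List


def greedy_wrap(text: str, max_width: int, measure: Callable[[str], int] = len) -> List[str]:
    """Pack words onto each line until the next word would overflow max_width.

    Falls back to per-character wrapping when a single word is wider than
    max_width, e.g. CJK text that contains no spaces.
    """
    words = text.split(" ")
    if any(measure(word) > max_width for word in words):
        # Per-character fallback: start a new line before the overflowing character
        lines, line = [], ""
        for ch in text:
            if line and measure(line + ch) > max_width:
                lines.append(line)
                line = ch
            else:
                line += ch
        lines.append(line)
        return [ln.strip() for ln in lines if ln.strip()]

    # Word-level pass
    lines, line = [], ""
    for word in words:
        candidate = f"{line} {word}" if line else word
        if measure(candidate) <= max_width:
            line = candidate
        else:
            lines.append(line)
            line = word
    lines.append(line)
    return lines


print("\n".join(greedy_wrap("the quick brown fox jumps", 10)))
```

Passing a real font-measuring function as `measure` would turn the character counts into pixel widths.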
@@ -215,46 +389,71 @@ def generate_video(video_path: str,
        if not params.font_name:
            params.font_name = "STHeitiMedium.ttc"
        font_path = os.path.join(utils.font_dir(), params.font_name)
-        if os.name == 'nt':
+        if os.name == "nt":
            font_path = font_path.replace("\\", "/")

-        logger.info(f"using font: {font_path}")
+        logger.info(f"  ⑤ font: {font_path}")

    def create_text_clip(subtitle_item):
+        params.font_size = int(params.font_size)
+        params.stroke_width = int(params.stroke_width)
        phrase = subtitle_item[1]
        max_width = video_width * 0.9
-        wrapped_txt, txt_height = wrap_text(phrase,
-                                            max_width=max_width,
-                                            font=font_path,
-                                            fontsize=params.font_size
-                                            )
+        wrapped_txt, txt_height = wrap_text(
+            phrase, max_width=max_width, font=font_path, fontsize=params.font_size
+        )
+        interline = int(params.font_size * 0.25)
+        size=(int(max_width), int(txt_height + params.font_size * 0.25 + (interline * (wrapped_txt.count("\n") + 1))))
+
        _clip = TextClip(
-            wrapped_txt,
+            text=wrapped_txt,
            font=font_path,
-            fontsize=params.font_size,
+            font_size=params.font_size,
            color=params.text_fore_color,
            bg_color=params.text_background_color,
            stroke_color=params.stroke_color,
            stroke_width=params.stroke_width,
-            print_cmd=False,
+            # interline=interline,
+            # size=size,
        )
        duration = subtitle_item[0][1] - subtitle_item[0][0]
-        _clip = _clip.set_start(subtitle_item[0][0])
-        _clip = _clip.set_end(subtitle_item[0][1])
-        _clip = _clip.set_duration(duration)
+        _clip = _clip.with_start(subtitle_item[0][0])
+        _clip = _clip.with_end(subtitle_item[0][1])
+        _clip = _clip.with_duration(duration)
        if params.subtitle_position == "bottom":
-            _clip = _clip.set_position(('center', video_height * 0.95 - _clip.h))
+            _clip = _clip.with_position(("center", video_height * 0.95 - _clip.h))
        elif params.subtitle_position == "top":
-            _clip = _clip.set_position(('center', video_height * 0.1))
-        else:
-            _clip = _clip.set_position(('center', 'center'))
+            _clip = _clip.with_position(("center", video_height * 0.05))
+        elif params.subtitle_position == "custom":
+            # Ensure the subtitle is fully within the screen bounds
+            margin = 10  # Additional margin, in pixels
+            max_y = video_height - _clip.h - margin
+            min_y = margin
+            custom_y = (video_height - _clip.h) * (params.custom_position / 100)
+            custom_y = max(
+                min_y, min(custom_y, max_y)
+            )  # Constrain the y value within the valid range
+            _clip = _clip.with_position(("center", custom_y))
+        else:  # center
+            _clip = _clip.with_position(("center", "center"))
        return _clip

-    video_clip = VideoFileClip(video_path)
-    audio_clip = AudioFileClip(audio_path).volumex(params.voice_volume)
+    video_clip = VideoFileClip(video_path).without_audio()
+    audio_clip = AudioFileClip(audio_path).with_effects(
+        [afx.MultiplyVolume(params.voice_volume)]
+    )
+
+    def make_textclip(text):
+        return TextClip(
+            text=text,
+            font=font_path,
+            font_size=params.font_size,
+        )

    if subtitle_path and os.path.exists(subtitle_path):
-        sub = SubtitlesClip(subtitles=subtitle_path, encoding='utf-8')
+        sub = SubtitlesClip(
+            subtitles=subtitle_path, encoding="utf-8", make_textclip=make_textclip
+        )
        text_clips = []
        for item in sub.subtitles:
            clip = create_text_clip(subtitle_item=item)
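The new `custom` subtitle position maps `params.custom_position` (a percentage) onto the free vertical space and then clamps the result to a 10-pixel margin. The same arithmetic as a standalone function, for checking values by hand; `custom_subtitle_y` is a hypothetical name and the frame and clip heights are illustrative.

```python
def custom_subtitle_y(video_height: int, clip_height: int, custom_position: float, margin: int = 10) -> float:
    """Top y coordinate of a subtitle clip at custom_position percent
    (0 = top, 100 = bottom), clamped so it stays `margin` pixels from both edges."""
    custom_y = (video_height - clip_height) * (custom_position / 100)
    min_y = margin
    max_y = video_height - clip_height - margin
    return max(min_y, min(custom_y, max_y))


# 1080x1920 portrait video with a 120-pixel-tall subtitle clip
for pos in (0, 50, 100):
    print(pos, custom_subtitle_y(1920, 120, pos))
# 0 10        (clamped to the top margin)
# 50 900.0
# 100 1790    (clamped to the bottom margin)
```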
@@ -264,24 +463,28 @@ def generate_video(video_path: str,
    bgm_file = get_bgm_file(bgm_type=params.bgm_type, bgm_file=params.bgm_file)
    if bgm_file:
        try:
-            bgm_clip = (AudioFileClip(bgm_file)
-                        .volumex(params.bgm_volume)
-                        .audio_fadeout(3))
-            bgm_clip = afx.audio_loop(bgm_clip, duration=video_clip.duration)
+            bgm_clip = AudioFileClip(bgm_file).with_effects(
+                [
+                    afx.MultiplyVolume(params.bgm_volume),
+                    afx.AudioFadeOut(3),
+                    afx.AudioLoop(duration=video_clip.duration),
+                ]
+            )
            audio_clip = CompositeAudioClip([audio_clip, bgm_clip])
        except Exception as e:
            logger.error(f"failed to add bgm: {str(e)}")

-    video_clip = video_clip.set_audio(audio_clip)
-    video_clip.write_videofile(output_file,
-                               audio_codec="aac",
-                               temp_audiofile_path=output_dir,
-                               threads=params.n_threads or 2,
-                               logger=None,
-                               fps=30,
-                               )
+    video_clip = video_clip.with_audio(audio_clip)
+    video_clip.write_videofile(
+        output_file,
+        audio_codec=audio_codec,
+        temp_audiofile_path=output_dir,
+        threads=params.n_threads or 2,
+        logger=None,
+        fps=fps,
+    )
    video_clip.close()
-    logger.success(f"completed")
+    del video_clip


 def preprocess_video(materials: List[MaterialInfo], clip_duration=4):
@@ -292,93 +495,40 @@ def preprocess_video(materials: List[MaterialInfo], clip_duration=4):
        ext = utils.parse_extension(material.url)
        try:
            clip = VideoFileClip(material.url)
-        except Exception as e:
+        except Exception:
            clip = ImageClip(material.url)

        width = clip.size[0]
        height = clip.size[1]
        if width < 480 or height < 480:
-            logger.warning(f"video is too small, width: {width}, height: {height}")
+            logger.warning(f"low resolution material: {width}x{height}, minimum 480x480 required")
            continue

        if ext in const.FILE_TYPE_IMAGES:
            logger.info(f"processing image: {material.url}")
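-            # Create an image clip and set its duration to 3 seconds
-            clip = ImageClip(material.url).set_duration(clip_duration).set_position("center")
-            # Use the resize method to add a zoom effect; a lambda makes the zoom change over time.
-            # Suppose we want to scale gradually from the original size up to 120%.
-            # t is the current time and clip.duration is the total clip duration, 3 seconds here.
-            # Note: 1 means 100% size, so 1.2 means 120% size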
-            zoom_clip = clip.resize(lambda t: 1 + (clip_duration * 0.03) * (t / clip.duration))
+            # Create an image clip and set its duration to clip_duration seconds
+            clip = (
+                ImageClip(material.url)
+                .with_duration(clip_duration)
+                .with_position("center")
+            )
+            # Apply a zoom effect using the resized method.
+            # A lambda function makes the zoom factor change over time.
+            # The zoom starts at the original size and grows linearly to 1 + clip_duration * 0.03 (112% for the default 4 seconds).
+            # t is the current time, and clip.duration is the total duration of the clip (clip_duration seconds).
+            # Note: 1 represents 100% size, so 1.12 represents 112% size.
+            zoom_clip = clip.resized(
+                lambda t: 1 + (clip_duration * 0.03) * (t / clip.duration)
+            )

-            # If needed, create a composite video clip containing the zoomed clip
-            # (very useful when you want to add other elements to the video)
+            # Optionally, create a composite video clip containing the zoomed clip.
+            # This is useful when you want to add other elements to the video.
            final_clip = CompositeVideoClip([zoom_clip])

-            # Output the video
+            # Output the video to a file.
            video_file = f"{material.url}.mp4"
            final_clip.write_videofile(video_file, fps=30, logger=None)
-            final_clip.close()
+            close_clip(clip)
            material.url = video_file
-            logger.success(f"completed: {video_file}")
-    return materials
-
-
-if __name__ == "__main__":
-    m = MaterialInfo()
-    m.url = "/Users/harry/Downloads/IMG_2915.JPG"
-    m.provider = "local"
-    materials = preprocess_video([m], clip_duration=4)
-    print(materials)
-
-    # txt_en = "Here's your guide to travel hacks for budget-friendly adventures"
-    # txt_zh = "测试长字段这是您的旅行技巧指南帮助您进行预算友好的冒险"
-    # font = utils.resource_dir() + "/fonts/STHeitiMedium.ttc"
-    # for txt in [txt_en, txt_zh]:
-    #     t, h = wrap_text(text=txt, max_width=1000, font=font, fontsize=60)
-    #     print(t)
-    #
-    # task_id = "aa563149-a7ea-49c2-b39f-8c32cc225baf"
-    # task_dir = utils.task_dir(task_id)
-    # video_file = f"{task_dir}/combined-1.mp4"
-    # audio_file = f"{task_dir}/audio.mp3"
-    # subtitle_file = f"{task_dir}/subtitle.srt"
-    # output_file = f"{task_dir}/final.mp4"
-    #
-    # # video_paths = []
-    # # for file in os.listdir(utils.storage_dir("test")):
-    # #     if file.endswith(".mp4"):
-    # #         video_paths.append(os.path.join(utils.storage_dir("test"), file))
-    # #
-    # # combine_videos(combined_video_path=video_file,
-    # #                audio_file=audio_file,
-    # #                video_paths=video_paths,
-    # #                video_aspect=VideoAspect.portrait,
-    # #                video_concat_mode=VideoConcatMode.random,
-    # #                max_clip_duration=5,
-    # #                threads=2)
-    #
-    # cfg = VideoParams()
-    # cfg.video_aspect = VideoAspect.portrait
-    # cfg.font_name = "STHeitiMedium.ttc"
-    # cfg.font_size = 60
-    # cfg.stroke_color = "#000000"
-    # cfg.stroke_width = 1.5
-    # cfg.text_fore_color = "#FFFFFF"
-    # cfg.text_background_color = "transparent"
-    # cfg.bgm_type = "random"
-    # cfg.bgm_file = ""
-    # cfg.bgm_volume = 1.0
-    # cfg.subtitle_enabled = True
-    # cfg.subtitle_position = "bottom"
-    # cfg.n_threads = 2
-    # cfg.paragraph_number = 1
-    #
-    # cfg.voice_volume = 1.0
-    #
-    # generate_video(video_path=video_file,
-    #                audio_path=audio_file,
-    #                subtitle_path=subtitle_file,
-    #                output_file=output_file,
-    #                params=cfg
-    #                )
+            logger.success(f"image processed: {video_file}")
+    return materials
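The zoom applied to still images in `preprocess_video` grows linearly with time. Evaluating the same expression shows the scale actually reached at the end of the clip: 1 + 0.03 * clip_duration, i.e. 112% for the default 4 seconds rather than a fixed 120%. `zoom_scale` is a hypothetical helper that mirrors the lambda passed to `resized()`, assuming (as in the code above) that `clip.duration` equals `clip_duration`.

```python
def zoom_scale(t: float, clip_duration: float) -> float:
    """Scale factor at time t for a clip lasting clip_duration seconds:
    1.0 at t=0, rising linearly to 1 + 0.03 * clip_duration at the end."""
    return 1 + (clip_duration * 0.03) * (t / clip_duration)


for duration in (3, 4, 5):
    print(f"{duration}s clip ends at {zoom_scale(duration, duration):.0%}")
# 3s clip ends at 109%
# 4s clip ends at 112%
# 5s clip ends at 115%
```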
--- a/app/services/voice.py
+++ b/app/services/voice.py
@@ -2,21 +2,48 @@ import asyncio
 import os
 import re
 from datetime import datetime
+from typing import Union
 from xml.sax.saxutils import unescape
+
+import edge_tts
+import requests
+from edge_tts import SubMaker, submaker
 from edge_tts.submaker import mktimestamp
 from loguru import logger
-from edge_tts import submaker, SubMaker
-import edge_tts
 from moviepy.video.tools import subtitles

 from app.config import config
 from app.utils import utils


+def get_siliconflow_voices() -> list[str]:
+    """
+    Get the list of SiliconFlow voices.
+
+    Returns:
+        A list of voice names in the form ["siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male", ...]
+    """
+    # SiliconFlow voices and their genders (gender is used for display)
+    voices_with_gender = [
+        ("FunAudioLLM/CosyVoice2-0.5B", "alex", "Male"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "anna", "Female"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "bella", "Female"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "benjamin", "Male"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "charles", "Male"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "claire", "Female"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "david", "Male"),
+        ("FunAudioLLM/CosyVoice2-0.5B", "diana", "Female"),
+    ]
+
+    # Add the "siliconflow:" prefix and format as display names
+    return [
+        f"siliconflow:{model}:{voice}-{gender}"
+        for model, voice, gender in voices_with_gender
+    ]
+
+
 def get_all_azure_voices(filter_locals=None) -> list[str]:
-    if filter_locals is None:
-        filter_locals = ["zh-CN", "en-US", "zh-HK", "zh-TW", "vi-VN"]
-    voices_str = """
+    azure_voices_str = """
 Name: af-ZA-AdriNeural
 Gender: Female

@@ -302,21 +329,33 @@ Gender: Female
 Name: en-US-AnaNeural
 Gender: Female

+Name: en-US-AndrewMultilingualNeural
+Gender: Male
+
 Name: en-US-AndrewNeural
 Gender: Male

 Name: en-US-AriaNeural
 Gender: Female

+Name: en-US-AvaMultilingualNeural
+Gender: Female
+
 Name: en-US-AvaNeural
 Gender: Female

+Name: en-US-BrianMultilingualNeural
+Gender: Male
+
 Name: en-US-BrianNeural
 Gender: Male

 Name: en-US-ChristopherNeural
 Gender: Male

+Name: en-US-EmmaMultilingualNeural
+Gender: Female
+
 Name: en-US-EmmaNeural
 Gender: Female

@@ -602,12 +641,24 @@ Gender: Male
 Name: it-IT-ElsaNeural
 Gender: Female

-Name: it-IT-GiuseppeNeural
+Name: it-IT-GiuseppeMultilingualNeural
 Gender: Male

 Name: it-IT-IsabellaNeural
 Gender: Female

+Name: iu-Cans-CA-SiqiniqNeural
+Gender: Female
+
+Name: iu-Cans-CA-TaqqiqNeural
+Gender: Male
+
+Name: iu-Latn-CA-SiqiniqNeural
+Gender: Female
+
+Name: iu-Latn-CA-TaqqiqNeural
+Gender: Male
+
 Name: ja-JP-KeitaNeural
 Gender: Male

@@ -644,7 +695,7 @@ Gender: Male
 Name: kn-IN-SapnaNeural
 Gender: Female

-Name: ko-KR-HyunsuNeural
+Name: ko-KR-HyunsuMultilingualNeural
 Gender: Male

 Name: ko-KR-InJoonNeural
@@ -758,7 +809,7 @@ Gender: Male
 Name: pt-BR-FranciscaNeural
 Gender: Female

-Name: pt-BR-ThalitaNeural
+Name: pt-BR-ThalitaMultilingualNeural
 Gender: Female

 Name: pt-PT-DuarteNeural
@@ -988,27 +1039,20 @@ Name: zh-CN-XiaoxiaoMultilingualNeural-V2
 Gender: Female
    """.strip()
    voices = []
-    name = ''
-    for line in voices_str.split("\n"):
-        line = line.strip()
-        if not line:
-            continue
-        if line.startswith("Name: "):
-            name = line[6:].strip()
-        if line.startswith("Gender: "):
-            gender = line[8:].strip()
-            if name and gender:
-                # voices.append({
-                #     "name": name,
-                #     "gender": gender,
-                # })
-                if filter_locals:
-                    for filter_local in filter_locals:
-                        if name.lower().startswith(filter_local.lower()):
-                            voices.append(f"{name}-{gender}")
-                else:
-                    voices.append(f"{name}-{gender}")
-                name = ''
+    # Regex that matches each Name line together with the Gender line after it
+    pattern = re.compile(r"Name:\s*(.+)\s*Gender:\s*(.+)\s*", re.MULTILINE)
+    # Find all matches with the regex
+    matches = pattern.findall(azure_voices_str)
+
+    for name, gender in matches:
+        # Apply the locale filter
+        if filter_locals and any(
+            name.lower().startswith(fl.lower()) for fl in filter_locals
+        ):
+            voices.append(f"{name}-{gender}")
+        elif not filter_locals:
+            voices.append(f"{name}-{gender}")
+
    voices.sort()
    return voices
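`get_all_azure_voices` now extracts `(name, gender)` pairs with one regular expression instead of a line-by-line loop. The sketch below runs the same pattern and filter logic on a three-voice sample taken from the list above. `parse_voices` and `SAMPLE_VOICES` are illustrative names, and the `.strip()` calls are a defensive addition that is not in the diff.

```python
import re
from typing import List, Optional

SAMPLE_VOICES = """
Name: en-US-AvaMultilingualNeural
Gender: Female

Name: ko-KR-HyunsuMultilingualNeural
Gender: Male

Name: it-IT-GiuseppeMultilingualNeural
Gender: Male
""".strip()

# Same pattern as in the diff: "." does not match a newline, so each (.+)
# stops at the end of its line and one match covers a Name/Gender pair.
PATTERN = re.compile(r"Name:\s*(.+)\s*Gender:\s*(.+)\s*")


def parse_voices(voices_str: str, filter_locals: Optional[List[str]] = None) -> List[str]:
    voices = []
    for name, gender in PATTERN.findall(voices_str):
        name, gender = name.strip(), gender.strip()  # defensive, not in the original
        # Keep every voice when no filter is given, otherwise match by locale prefix
        if not filter_locals or any(name.lower().startswith(fl.lower()) for fl in filter_locals):
            voices.append(f"{name}-{gender}")
    voices.sort()
    return voices


print(parse_voices(SAMPLE_VOICES, ["ko-KR"]))  # ['ko-KR-HyunsuMultilingualNeural-Male']
```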

@@ -1028,33 +1072,76 @@ def is_azure_v2_voice(voice_name: str):
    return ""


-def tts(text: str, voice_name: str, voice_file: str) -> [SubMaker, None]:
+def is_siliconflow_voice(voice_name: str):
+    """Check whether the voice is a SiliconFlow voice."""
+    return voice_name.startswith("siliconflow:")
+
+
+def tts(
+    text: str,
+    voice_name: str,
+    voice_rate: float,
+    voice_file: str,
+    voice_volume: float = 1.0,
+) -> Union[SubMaker, None]:
    if is_azure_v2_voice(voice_name):
        return azure_tts_v2(text, voice_name, voice_file)
-    return azure_tts_v1(text, voice_name, voice_file)
+    elif is_siliconflow_voice(voice_name):
+        # Extract the model and voice from voice_name
+        # Format: siliconflow:model:voice-Gender
+        parts = voice_name.split(":")
+        if len(parts) >= 3:
+            model = parts[1]
+            # Remove the gender suffix, e.g. "alex-Male" -> "alex"
+            voice_with_gender = parts[2]
+            voice = voice_with_gender.split("-")[0]
+            # Build the full voice parameter in the form "model:voice"
+            full_voice = f"{model}:{voice}"
+            return siliconflow_tts(
+                text, model, full_voice, voice_rate, voice_file, voice_volume
+            )
+        else:
+            logger.error(f"Invalid siliconflow voice name format: {voice_name}")
+            return None
+    return azure_tts_v1(text, voice_name, voice_rate, voice_file)


-def azure_tts_v1(text: str, voice_name: str, voice_file: str) -> [SubMaker, None]:
+def convert_rate_to_percent(rate: float) -> str:
+    if rate == 1.0:
+        return "+0%"
+    percent = round((rate - 1.0) * 100)
+    if percent >= 0:
+        return f"+{percent}%"
+    else:
+        return f"{percent}%"
+
+
+def azure_tts_v1(
+    text: str, voice_name: str, voice_rate: float, voice_file: str
+) -> Union[SubMaker, None]:
    voice_name = parse_voice_name(voice_name)
    text = text.strip()
+    rate_str = convert_rate_to_percent(voice_rate)
    for i in range(3):
        try:
            logger.info(f"start, voice name: {voice_name}, try: {i + 1}")

            async def _do() -> SubMaker:
-                communicate = edge_tts.Communicate(text, voice_name)
+                communicate = edge_tts.Communicate(text, voice_name, rate=rate_str)
                sub_maker = edge_tts.SubMaker()
                with open(voice_file, "wb") as file:
                    async for chunk in communicate.stream():
                        if chunk["type"] == "audio":
                            file.write(chunk["data"])
                        elif chunk["type"] == "WordBoundary":
-                            sub_maker.create_sub((chunk["offset"], chunk["duration"]), chunk["text"])
+                            sub_maker.create_sub(
+                                (chunk["offset"], chunk["duration"]), chunk["text"]
+                            )
                return sub_maker

            sub_maker = asyncio.run(_do())
            if not sub_maker or not sub_maker.subs:
-                logger.warning(f"failed, sub_maker is None or sub_maker.subs is None")
+                logger.warning("failed, sub_maker is None or sub_maker.subs is None")
                continue

            logger.info(f"completed, output file: {voice_file}")
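Two small string conversions in this hunk feed the TTS backends: a SiliconFlow voice id is split into a model and a `model:voice` pair, and the speed multiplier is turned into the signed percentage passed as `edge_tts.Communicate(..., rate=...)`. The sketch mirrors both. `parse_siliconflow_voice` is a hypothetical name, and this `convert_rate_to_percent` uses `percent >= 0` so that rates which round to zero (for example 1.004) still produce a signed `+0%`, matching edge-tts's own default of `"+0%"`.

```python
from typing import Optional, Tuple


def convert_rate_to_percent(rate: float) -> str:
    """Turn a speed multiplier (1.0 = normal speed) into a signed percentage string."""
    percent = round((rate - 1.0) * 100)
    # Keep an explicit sign even when the change rounds to 0
    return f"+{percent}%" if percent >= 0 else f"{percent}%"


def parse_siliconflow_voice(voice_name: str) -> Optional[Tuple[str, str]]:
    """Split "siliconflow:<model>:<voice>-<Gender>" into (model, "<model>:<voice>")."""
    if not voice_name.startswith("siliconflow:"):
        return None
    parts = voice_name.split(":")
    if len(parts) < 3:
        return None
    model = parts[1]
    voice = parts[2].split("-")[0]  # drop the gender suffix, e.g. "alex-Male" -> "alex"
    return model, f"{model}:{voice}"


print(convert_rate_to_percent(1.2))  # +20%
print(parse_siliconflow_voice("siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male"))
# ('FunAudioLLM/CosyVoice2-0.5B', 'FunAudioLLM/CosyVoice2-0.5B:alex')
```

Only the third `:`-separated field is split on `-`, so hyphens inside the model name (as in `CosyVoice2-0.5B`) are preserved.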
@@ -1064,7 +1151,145 @@ def azure_tts_v1(text: str, voice_name: str, voice_file: str) -> [SubMaker, None
    return None


-def azure_tts_v2(text: str, voice_name: str, voice_file: str) -> [SubMaker, None]:
+def siliconflow_tts(
+    text: str,
+    model: str,
+    voice: str,
+    voice_rate: float,
+    voice_file: str,
+    voice_volume: float = 1.0,
+) -> Union[SubMaker, None]:
+    """
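+    Generate speech with the SiliconFlow API.
+
+    Args:
+        text: the text to convert to speech
+        model: model name, e.g. "FunAudioLLM/CosyVoice2-0.5B"
+        voice: voice name, e.g. "FunAudioLLM/CosyVoice2-0.5B:alex"
+        voice_rate: speech speed, in the range [0.25, 4.0]
+        voice_file: path of the output audio file
+        voice_volume: speech volume, in the range [0.6, 5.0]; converted to SiliconFlow's gain range [-10, 10]
+
+    Returns:
+        A SubMaker object, or None on failure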
+    """
+    text = text.strip()
+    api_key = config.siliconflow.get("api_key", "")
+
+    if not api_key:
+        logger.error("SiliconFlow API key is not set")
+        return None
+
+    # Convert voice_volume to SiliconFlow's gain value
+    # The default voice_volume of 1.0 corresponds to a gain of 0
+    gain = voice_volume - 1.0
+    # Clamp gain to the range [-10, 10]
+    gain = max(-10, min(10, gain))
+
+    url = "https://api.siliconflow.cn/v1/audio/speech"
+
+    payload = {
+        "model": model,
+        "input": text,
+        "voice": voice,
+        "response_format": "mp3",
+        "sample_rate": 32000,
+        "stream": False,
+        "speed": voice_rate,
+        "gain": gain,
+    }
+
+    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
+
+    for i in range(3):  # up to 3 attempts
+        try:
+            logger.info(
+                f"start siliconflow tts, model: {model}, voice: {voice}, try: {i + 1}"
+            )
+
+            response = requests.post(url, json=payload, headers=headers)
+
+            if response.status_code == 200:
+                # Save the audio file
+                with open(voice_file, "wb") as f:
+                    f.write(response.content)
+
+                # Create an empty SubMaker object
+                sub_maker = SubMaker()
+
+                # Get the actual duration of the audio file
+                try:
+                    # Try to read the audio duration with moviepy
+                    from moviepy import AudioFileClip
+
+                    audio_clip = AudioFileClip(voice_file)
+                    audio_duration = audio_clip.duration
+                    audio_clip.close()
+
+                    # Convert the duration to 100-nanosecond units (compatible with edge_tts)
+                    audio_duration_100ns = int(audio_duration * 10000000)
+
+                    # Split the text to build more accurate subtitles:
+                    # break it into sentences at punctuation marks
+                    sentences = utils.split_string_by_punctuations(text)
+
+                    if sentences:
+                        # Estimate each sentence's duration (allocated in proportion to character count)
+                        total_chars = sum(len(s) for s in sentences)
+                        char_duration = (
+                            audio_duration_100ns / total_chars if total_chars > 0 else 0
+                        )
+
+                        current_offset = 0
+                        for sentence in sentences:
+                            if not sentence.strip():
+                                continue
+
+                            # Duration of the current sentence
+                            sentence_chars = len(sentence)
+                            sentence_duration = int(sentence_chars * char_duration)
+
+                            # Add it to the SubMaker
+                            sub_maker.subs.append(sentence)
+                            sub_maker.offset.append(
+                                (current_offset, current_offset + sentence_duration)
+                            )
+
+                            # Advance the offset
+                            current_offset += sentence_duration
+                    else:
+                        # If the text cannot be split, use the whole text as a single subtitle
+                        sub_maker.subs = [text]
+                        sub_maker.offset = [(0, audio_duration_100ns)]
+
+                except Exception as e:
+                    logger.warning(f"Failed to create accurate subtitles: {str(e)}")
+                    # Fall back to a single subtitle
+                    sub_maker.subs = [text]
+                    # Use the actual audio duration; if it is unavailable, assume 10 seconds
+                    sub_maker.offset = [
+                        (
+                            0,
+                            audio_duration_100ns
+                            if "audio_duration_100ns" in locals()
+                            else 10000000,
+                        )
+                    ]
+
+                logger.success(f"siliconflow tts succeeded: {voice_file}")
+                logger.debug(f"siliconflow subtitles: {sub_maker.subs}, offsets: {sub_maker.offset}")
+                return sub_maker
+            else:
+                logger.error(
+                    f"siliconflow tts failed with status code {response.status_code}: {response.text}"
+                )
+        except Exception as e:
+            logger.error(f"siliconflow tts failed: {str(e)}")
+
+    return None
+
+
+def azure_tts_v2(text: str, voice_name: str, voice_file: str) -> Union[SubMaker, None]:
    voice_name = is_azure_v2_voice(voice_name)
    if not voice_name:
        logger.error(f"invalid voice name: {voice_name}")
@@ -1074,8 +1299,12 @@ def azure_tts_v2(text: str, voice_name: str, voice_file: str) -> [SubMaker, None
    def _format_duration_to_offset(duration) -> int:
        if isinstance(duration, str):
            time_obj = datetime.strptime(duration, "%H:%M:%S.%f")
-            milliseconds = (time_obj.hour * 3600000) + (time_obj.minute * 60000) + (time_obj.second * 1000) + (
-                    time_obj.microsecond // 1000)
+            milliseconds = (
+                (time_obj.hour * 3600000)
+                + (time_obj.minute * 60000)
+                + (time_obj.second * 1000)
+                + (time_obj.microsecond // 1000)
+            )
            return milliseconds * 10000

        if isinstance(duration, int):
@@ -1108,20 +1337,29 @@ def azure_tts_v2(text: str, voice_name: str, voice_file: str) -> [SubMaker, None
            # Creates an instance of a speech config with specified subscription key and service region.
            speech_key = config.azure.get("speech_key", "")
            service_region = config.azure.get("speech_region", "")
-            audio_config = speechsdk.audio.AudioOutputConfig(filename=voice_file, use_default_speaker=True)
-            speech_config = speechsdk.SpeechConfig(subscription=speech_key,
-                                                   region=service_region)
+            audio_config = speechsdk.audio.AudioOutputConfig(
+                filename=voice_file, use_default_speaker=True
+            )
+            speech_config = speechsdk.SpeechConfig(
+                subscription=speech_key, region=service_region
+            )
            speech_config.speech_synthesis_voice_name = voice_name
            # speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary,
            #                            value='true')
-            speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceResponse_RequestWordBoundary,
-                                       value='true')
+            speech_config.set_property(
+                property_id=speechsdk.PropertyId.SpeechServiceResponse_RequestWordBoundary,
+                value="true",
+            )

            speech_config.set_speech_synthesis_output_format(
-                speechsdk.SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3)
-            speech_synthesizer = speechsdk.SpeechSynthesizer(audio_config=audio_config,
-                                                             speech_config=speech_config)
-            speech_synthesizer.synthesis_word_boundary.connect(speech_synthesizer_word_boundary_cb)
+                speechsdk.SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3
+            )
+            speech_synthesizer = speechsdk.SpeechSynthesizer(
+                audio_config=audio_config, speech_config=speech_config
+            )
+            speech_synthesizer.synthesis_word_boundary.connect(
+                speech_synthesizer_word_boundary_cb
+            )

            result = speech_synthesizer.speak_text_async(text).get()
            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
@@ -1129,9 +1367,13 @@ def azure_tts_v2(text: str, voice_name: str, voice_file: str) -> [SubMaker, None
                return sub_maker
            elif result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = result.cancellation_details
-                logger.error(f"azure v2 speech synthesis canceled: {cancellation_details.reason}")
+                logger.error(
+                    f"azure v2 speech synthesis canceled: {cancellation_details.reason}"
+                )
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
-                    logger.error(f"azure v2 speech synthesis error: {cancellation_details.error_details}")
+                    logger.error(
+                        f"azure v2 speech synthesis error: {cancellation_details.error_details}"
+                    )
            logger.info(f"completed, output file: {voice_file}")
        except Exception as e:
            logger.error(f"failed, error: {str(e)}")
@@ -1168,11 +1410,7 @@ def create_subtitle(sub_maker: submaker.SubMaker, text: str, subtitle_file: str)
        """
        start_t = mktimestamp(start_time).replace(".", ",")
        end_t = mktimestamp(end_time).replace(".", ",")
-        return (
-            f"{idx}\n"
-            f"{start_t} --> {end_t}\n"
-            f"{sub_text}\n"
-        )
+        return f"{idx}\n{start_t} --> {end_t}\n{sub_text}\n"

    start_time = -1.0
    sub_items = []
@@ -1229,12 +1467,16 @@ def create_subtitle(sub_maker: submaker.SubMaker, text: str, subtitle_file: str)
            try:
                sbs = subtitles.file_to_subtitles(subtitle_file, encoding="utf-8")
                duration = max([tb for ((ta, tb), txt) in sbs])
-                logger.info(f"completed, subtitle file created: {subtitle_file}, duration: {duration}")
+                logger.info(
+                    f"completed, subtitle file created: {subtitle_file}, duration: {duration}"
+                )
            except Exception as e:
                logger.error(f"failed, error: {str(e)}")
                os.remove(subtitle_file)
        else:
-            logger.warning(f"failed, sub_items len: {len(sub_items)}, script_lines len: {len(script_lines)}")
+            logger.warning(
+                f"failed, sub_items len: {len(sub_items)}, script_lines len: {len(script_lines)}"
+            )

    except Exception as e:
        logger.error(f"failed, error: {str(e)}")
@@ -1258,7 +1500,6 @@ if __name__ == "__main__":
    voices = get_all_azure_voices()
    print(len(voices))

-
    async def _do():
        temp_dir = utils.storage_dir("temp")

@@ -1307,12 +1548,13 @@ if __name__ == "__main__":
        for voice_name in voice_names:
            voice_file = f"{temp_dir}/tts-{voice_name}.mp3"
            subtitle_file = f"{temp_dir}/tts.mp3.srt"
-            sub_maker = azure_tts_v2(text=text, voice_name=voice_name, voice_file=voice_file)
+            sub_maker = azure_tts_v2(
+                text=text, voice_name=voice_name, voice_file=voice_file
+            )
            create_subtitle(sub_maker=sub_maker, text=text, subtitle_file=subtitle_file)
            audio_duration = get_audio_duration(sub_maker)
            print(f"voice: {voice_name}, audio duration: {audio_duration}s")

-
    loop = asyncio.get_event_loop_policy().get_event_loop()
    try:
        loop.run_until_complete(_do())
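The SiliconFlow fallback above spreads the audio length across sentences by character count, and both it and `_format_duration_to_offset` work in 100-ns units (10,000 per millisecond). A minimal standalone sketch of that timing arithmetic follows; the function names are hypothetical and not part of the project. Unlike the diff, it ignores blank sentences when counting characters and assigns cumulative offsets, so the last subtitle ends exactly at the audio end instead of falling short through `int()` truncation. The SRT formatter mirrors the `HH:MM:SS,mmm` shape that `create_subtitle` produces by replacing `.` with `,`.

```python
from datetime import datetime
from typing import List, Tuple

TICKS_PER_MS = 10_000  # one tick = 100 ns, so 1 ms = 10,000 ticks


def duration_str_to_ticks(duration: str) -> int:
    """Convert an "HH:MM:SS.ffffff" string to 100-ns ticks (millisecond precision)."""
    t = datetime.strptime(duration, "%H:%M:%S.%f")
    ms = t.hour * 3_600_000 + t.minute * 60_000 + t.second * 1000 + t.microsecond // 1000
    return ms * TICKS_PER_MS


def allocate_offsets(sentences: List[str], total_ticks: int) -> List[Tuple[int, int]]:
    """Split total_ticks across non-blank sentences in proportion to their length.

    Offsets are computed from the cumulative character count, so rounding never
    accumulates and the final end offset equals total_ticks exactly.
    """
    kept = [s for s in sentences if s.strip()]
    total_chars = sum(len(s) for s in kept)
    if total_chars == 0:
        return []
    offsets = []
    start, seen_chars = 0, 0
    for s in kept:
        seen_chars += len(s)
        end = total_ticks * seen_chars // total_chars
        offsets.append((start, end))
        start = end
    return offsets


def ticks_to_srt_timestamp(ticks: int) -> str:
    """Format 100-ns ticks as an SRT timestamp "HH:MM:SS,mmm"."""
    total_ms = ticks // TICKS_PER_MS
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"
```

For a 3.2 s clip and the sentences `["Hello world.", "", "Bye."]`, the blank entry is skipped and the 12:4 character ratio yields `(0, 24_000_000)` and `(24_000_000, 32_000_000)`, i.e. `00:00:00,000 --> 00:00:02,400` and `00:00:02,400 --> 00:00:03,200`.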
--- a/app/utils/utils.py
+++ b/app/utils/utils.py
@@ -1,12 +1,13 @@
+import json
 import locale
 import os
-import platform
+from pathlib import Path
 import threading
 from typing import Any
-from loguru import logger
-import json
 from uuid import uuid4
+
 import urllib3
+from loguru import logger

 from app.models import const

@@ -15,44 +16,44 @@ urllib3.disable_warnings()

 def get_response(status: int, data: Any = None, message: str = ""):
    obj = {
-        'status': status,
+        "status": status,
    }
    if data:
-        obj['data'] = data
+        obj["data"] = data
    if message:
-        obj['message'] = message
+        obj["message"] = message
    return obj


 def to_json(obj):
    try:
-        # 定义一个辅助函数来处理不同类型的对象
+        # Define a helper function to handle different types of objects
        def serialize(o):
-            # 如果对象是可序列化类型，直接返回
+            # If the object is a serializable type, return it directly
            if isinstance(o, (int, float, bool, str)) or o is None:
                return o
-            # 如果对象是二进制数据，转换为base64编码的字符串
+            # If the object is binary data, convert it to a base64-encoded string
            elif isinstance(o, bytes):
                return "*** binary data ***"
-            # 如果对象是字典，递归处理每个键值对
+            # If the object is a dictionary, recursively process each key-value pair
            elif isinstance(o, dict):
                return {k: serialize(v) for k, v in o.items()}
-            # 如果对象是列表或元组，递归处理每个元素
+            # If the object is a list or tuple, recursively process each element
            elif isinstance(o, (list, tuple)):
                return [serialize(item) for item in o]
-            # 如果对象是自定义类型，尝试返回其__dict__属性
-            elif hasattr(o, '__dict__'):
+            # If the object is a custom type, attempt to return its __dict__ attribute
+            elif hasattr(o, "__dict__"):
                return serialize(o.__dict__)
-            # 其他情况返回None（或者可以选择抛出异常）
+            # Return None for other cases (or choose to raise an exception)
            else:
                return None

-        # 使用serialize函数处理输入对象
+        # Use the serialize function to process the input object
        serialized_obj = serialize(obj)

-        # 序列化处理后的对象为JSON字符串
+        # Serialize the processed object into a JSON string
        return json.dumps(serialized_obj, ensure_ascii=False, indent=4)
-    except Exception as e:
+    except Exception:
        return None


@@ -94,7 +95,7 @@ def task_dir(sub_dir: str = ""):


 def font_dir(sub_dir: str = ""):
-    d = resource_dir(f"fonts")
+    d = resource_dir("fonts")
    if sub_dir:
        d = os.path.join(d, sub_dir)
    if not os.path.exists(d):
@@ -103,7 +104,7 @@ def font_dir(sub_dir: str = ""):


 def song_dir(sub_dir: str = ""):
-    d = resource_dir(f"songs")
+    d = resource_dir("songs")
    if sub_dir:
        d = os.path.join(d, sub_dir)
    if not os.path.exists(d):
@@ -112,7 +113,7 @@ def song_dir(sub_dir: str = ""):


 def public_dir(sub_dir: str = ""):
-    d = resource_dir(f"public")
+    d = resource_dir("public")
    if sub_dir:
        d = os.path.join(d, sub_dir)
    if not os.path.exists(d):
@@ -182,7 +183,7 @@ def split_string_by_punctuations(s):
            next_char = s[i + 1]

        if char == "." and previous_char.isdigit() and next_char.isdigit():
-            # 取现1万，按2.5%收取手续费, 2.5 中的 . 不能作为换行标记
+            # In "withdraw 10,000, charged a 2.5% fee", the "." in "2.5" must not be treated as a line-break marker
            txt += char
            continue

@@ -199,7 +200,8 @@ def split_string_by_punctuations(s):

 def md5(text):
    import hashlib
-    return hashlib.md5(text.encode('utf-8')).hexdigest()
+
+    return hashlib.md5(text.encode("utf-8")).hexdigest()


 def get_system_locale():
@@ -209,7 +211,7 @@ def get_system_locale():
        # en_US, en_GB return en
        language_code = loc[0].split("_")[0]
        return language_code
-    except Exception as e:
+    except Exception:
        return "en"


@@ -225,4 +227,4 @@ def load_locales(i18n_dir):


 def parse_extension(filename):
-    return os.path.splitext(filename)[1].strip().lower().replace(".", "")
+    return Path(filename).suffix.lower().lstrip(".")
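`split_string_by_punctuations` treats a period between two digits as part of a number rather than as a sentence break. The idea can be sketched in isolation as below; `split_by_punctuation` and the `PUNCTUATIONS` set are assumptions for illustration and do not reproduce the project's full punctuation list or its other special cases.

```python
from typing import List

# Hypothetical punctuation set (ASCII plus full-width CJK marks); the project's list may differ.
PUNCTUATIONS = set(",.!?;:，。！？；：")


def split_by_punctuation(s: str) -> List[str]:
    """Split s at punctuation, but keep a '.' that sits between two digits (e.g. "2.5")."""
    parts: List[str] = []
    buf = ""
    for i, ch in enumerate(s):
        prev_ch = s[i - 1] if i > 0 else ""
        next_ch = s[i + 1] if i + 1 < len(s) else ""
        # A decimal point belongs to the number, not to a sentence boundary.
        if ch == "." and prev_ch.isdigit() and next_ch.isdigit():
            buf += ch
            continue
        if ch in PUNCTUATIONS:
            # Flush the current segment, dropping empty or whitespace-only pieces.
            if buf.strip():
                parts.append(buf.strip())
            buf = ""
        else:
            buf += ch
    if buf.strip():
        parts.append(buf.strip())
    return parts
```

With this sketch, `"取现1万，按2.5%收取手续费。"` splits into `["取现1万", "按2.5%收取手续费"]`, while `"Version 3. Next"` still breaks after `3` because the following character is not a digit.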
--- a/changelog.py
+++ b/changelog.py
@@ -12,6 +12,6 @@ build_and_render(
    parse_refs=False,
    sections=["build", "deps", "feat", "fix", "refactor"],
    versioning="pep440",
-    bump="1.1.2",   # 指定bump版本
+    bump="1.1.2",  # specify the version to bump to
    in_place=True,
 )
--- a/config.example.toml
+++ b/config.example.toml
@@ -1,194 +1,205 @@
 [app]
+video_source = "pexels" # "pexels" or "pixabay"

-    video_source = "pexels"  # "pexels" or "pixabay"
-    # Pexels API Key
-    # Register at https://www.pexels.com/api/ to get your API key.
-    # You can use multiple keys to avoid rate limits.
-    # For example: pexels_api_keys = ["123adsf4567adf89","abd1321cd13efgfdfhi"]
-    # 特别注意格式，Key 用英文双引号括起来，多个Key用逗号隔开
-    pexels_api_keys = []
+# Whether to hide the basic config panel in the WebUI
+hide_config = false

-    # Pixabay API Key
-    # Register at https://pixabay.com/api/docs/ to get your API key.
-    # You can use multiple keys to avoid rate limits.
-    # For example: pixabay_api_keys = ["123adsf4567adf89","abd1321cd13efgfdfhi"]
-    # 特别注意格式，Key 用英文双引号括起来，多个Key用逗号隔开
-    pixabay_api_keys = []
+# Pexels API Key
+# Register at https://www.pexels.com/api/ to get your API key.
+# You can use multiple keys to avoid rate limits.
+# For example: pexels_api_keys = ["123adsf4567adf89","abd1321cd13efgfdfhi"]
+# Mind the format: wrap each key in ASCII double quotes and separate multiple keys with commas
+pexels_api_keys = []

-    # 如果你没有 OPENAI API Key，可以使用 g4f 代替，或者使用国内的 Moonshot API
-    # If you don't have an OPENAI API Key, you can use g4f instead
+# Pixabay API Key
+# Register at https://pixabay.com/api/docs/ to get your API key.
+# You can use multiple keys to avoid rate limits.
+# For example: pixabay_api_keys = ["123adsf4567adf89","abd1321cd13efgfdfhi"]
+# Mind the format: wrap each key in ASCII double quotes and separate multiple keys with commas
+pixabay_api_keys = []

-    # 支持的提供商 (Supported providers):
-    #   openai
-    #   moonshot (月之暗面)
-    #   oneapi
-    #   g4f
-    #   azure
-    #   qwen (通义千问)
-    #   gemini
-    llm_provider="openai"
+# 支持的提供商 (Supported providers):
+#   openai
+#   moonshot    (月之暗面)
+#   azure
+#   qwen        (通义千问)
+#   deepseek
+#   gemini
+#   ollama
+#   g4f
+#   oneapi
+#   cloudflare
+#   ernie       (文心一言)
+llm_provider = "openai"

-    ########## Ollama Settings
-    # No need to set it unless you want to use your own proxy
-    ollama_base_url = ""
-    # Check your available models at https://ollama.com/library
-    ollama_model_name = ""
+########## Ollama Settings
+# No need to set it unless you want to use your own proxy
+ollama_base_url = ""
+# Check your available models at https://ollama.com/library
+ollama_model_name = ""

-    ########## OpenAI API Key
-    # Get your API key at https://platform.openai.com/api-keys
-    openai_api_key = ""
-    # No need to set it unless you want to use your own proxy
-    openai_base_url = ""
-    # Check your available models at https://platform.openai.com/account/limits
-    openai_model_name = "gpt-4-turbo"
+########## OpenAI API Key
+# Get your API key at https://platform.openai.com/api-keys
+openai_api_key = ""
+# No need to set it unless you want to use your own proxy
+openai_base_url = ""
+# Check your available models at https://platform.openai.com/account/limits
+openai_model_name = "gpt-4o-mini"

-    ########## Moonshot API Key
-    # Visit https://platform.moonshot.cn/console/api-keys to get your API key.
-    moonshot_api_key=""
-    moonshot_base_url = "https://api.moonshot.cn/v1"
-    moonshot_model_name = "moonshot-v1-8k"
+########## Moonshot API Key
+# Visit https://platform.moonshot.cn/console/api-keys to get your API key.
+moonshot_api_key = ""
+moonshot_base_url = "https://api.moonshot.cn/v1"
+moonshot_model_name = "moonshot-v1-8k"

-    ########## OneAPI API Key
-    # Visit https://github.com/songquanpeng/one-api to get your API key
-    oneapi_api_key=""
-    oneapi_base_url=""
-    oneapi_model_name=""
+########## OneAPI API Key
+# Visit https://github.com/songquanpeng/one-api to get your API key
+oneapi_api_key = ""
+oneapi_base_url = ""
+oneapi_model_name = ""

-    ########## G4F
-    # Visit https://github.com/xtekky/gpt4free to get more details
-    # Supported model list: https://github.com/xtekky/gpt4free/blob/main/g4f/models.py
-    g4f_model_name = "gpt-3.5-turbo"
+########## G4F
+# Visit https://github.com/xtekky/gpt4free to get more details
+# Supported model list: https://github.com/xtekky/gpt4free/blob/main/g4f/models.py
+g4f_model_name = "gpt-3.5-turbo"

-    ########## Azure API Key
-    # Visit https://learn.microsoft.com/zh-cn/azure/ai-services/openai/ to get more details
-    # API documentation: https://learn.microsoft.com/zh-cn/azure/ai-services/openai/reference
-    azure_api_key = ""
-    azure_base_url=""
-    azure_model_name="gpt-35-turbo" # replace with your model deployment name
-    azure_api_version = "2024-02-15-preview"
+########## Azure API Key
+# Visit https://learn.microsoft.com/zh-cn/azure/ai-services/openai/ to get more details
+# API documentation: https://learn.microsoft.com/zh-cn/azure/ai-services/openai/reference
+azure_api_key = ""
+azure_base_url = ""
+azure_model_name = "gpt-35-turbo"        # replace with your model deployment name
+azure_api_version = "2024-02-15-preview"

-    ########## Gemini API Key
-    gemini_api_key=""
-    gemini_model_name = "gemini-1.0-pro"
+########## Gemini API Key
+gemini_api_key = ""
+gemini_model_name = "gemini-1.0-pro"

-    ########## Qwen API Key
-    # Visit https://dashscope.console.aliyun.com/apiKey to get your API key
-    # Visit below links to get more details
-    # https://tongyi.aliyun.com/qianwen/
-    # https://help.aliyun.com/zh/dashscope/developer-reference/model-introduction
-    qwen_api_key = ""
-    qwen_model_name = "qwen-max"
+########## Qwen API Key
+# Visit https://dashscope.console.aliyun.com/apiKey to get your API key
+# Visit below links to get more details
+# https://tongyi.aliyun.com/qianwen/
+# https://help.aliyun.com/zh/dashscope/developer-reference/model-introduction
+qwen_api_key = ""
+qwen_model_name = "qwen-max"


-    ########## DeepSeek API Key
-    # Visit https://platform.deepseek.com/api_keys to get your API key
-    deepseek_api_key = ""
-    deepseek_base_url = "https://api.deepseek.com"
-    deepseek_model_name = "deepseek-chat"
+########## DeepSeek API Key
+# Visit https://platform.deepseek.com/api_keys to get your API key
+deepseek_api_key = ""
+deepseek_base_url = "https://api.deepseek.com"
+deepseek_model_name = "deepseek-chat"

-    # Subtitle Provider, "edge" or "whisper"
-    # If empty, the subtitle will not be generated
-    subtitle_provider = "edge"
+# Subtitle Provider, "edge" or "whisper"
+# If empty, the subtitle will not be generated
+subtitle_provider = "edge"

-    #
-    # ImageMagick
-    #
-    # Once you have installed it, ImageMagick will be automatically detected, except on Windows!
-    # On Windows, for example "C:\Program Files (x86)\ImageMagick-7.1.1-Q16-HDRI\magick.exe"
-    # Download from https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-29-Q16-x64-static.exe
+#
+# ImageMagick
+#
+# Once you have installed it, ImageMagick will be automatically detected, except on Windows!
+# On Windows, for example "C:\Program Files (x86)\ImageMagick-7.1.1-Q16-HDRI\magick.exe"
+# Download from https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-29-Q16-x64-static.exe

-    # imagemagick_path = "C:\\Program Files (x86)\\ImageMagick-7.1.1-Q16\\magick.exe"
+# imagemagick_path = "C:\\Program Files (x86)\\ImageMagick-7.1.1-Q16\\magick.exe"


-    #
-    # FFMPEG
-    #
-    # 通常情况下，ffmpeg 会被自动下载，并且会被自动检测到。
-    # 但是如果你的环境有问题，无法自动下载，可能会遇到如下错误：
-    #   RuntimeError: No ffmpeg exe could be found.
-    #   Install ffmpeg on your system, or set the IMAGEIO_FFMPEG_EXE environment variable.
-    # 此时你可以手动下载 ffmpeg 并设置 ffmpeg_path，下载地址：https://www.gyan.dev/ffmpeg/builds/
+#
+# FFMPEG
+#
+# 通常情况下，ffmpeg 会被自动下载，并且会被自动检测到。
+# 但是如果你的环境有问题，无法自动下载，可能会遇到如下错误：
+#   RuntimeError: No ffmpeg exe could be found.
+#   Install ffmpeg on your system, or set the IMAGEIO_FFMPEG_EXE environment variable.
+# 此时你可以手动下载 ffmpeg 并设置 ffmpeg_path，下载地址：https://www.gyan.dev/ffmpeg/builds/

-    # Under normal circumstances, ffmpeg is downloaded automatically and detected automatically.
-    # However, if there is an issue with your environment that prevents automatic downloading, you might encounter the following error:
-    #   RuntimeError: No ffmpeg exe could be found.
-    #   Install ffmpeg on your system, or set the IMAGEIO_FFMPEG_EXE environment variable.
-    # In such cases, you can manually download ffmpeg and set the ffmpeg_path, download link: https://www.gyan.dev/ffmpeg/builds/
+# Under normal circumstances, ffmpeg is downloaded automatically and detected automatically.
+# However, if there is an issue with your environment that prevents automatic downloading, you might encounter the following error:
+#   RuntimeError: No ffmpeg exe could be found.
+#   Install ffmpeg on your system, or set the IMAGEIO_FFMPEG_EXE environment variable.
+# In such cases, you can manually download ffmpeg and set the ffmpeg_path, download link: https://www.gyan.dev/ffmpeg/builds/

-    # ffmpeg_path = "C:\\Users\\harry\\Downloads\\ffmpeg.exe"
-    #########################################################################################
+# ffmpeg_path = "C:\\Users\\harry\\Downloads\\ffmpeg.exe"
+#########################################################################################

-    # 当视频生成成功后，API服务提供的视频下载接入点，默认为当前服务的地址和监听端口
-    # 比如 http://127.0.0.1:8080/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
-    # 如果你需要使用域名对外提供服务（一般会用nginx做代理），则可以设置为你的域名
-    # 比如 https://xxxx.com/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
-    # endpoint="https://xxxx.com"
+# 当视频生成成功后，API服务提供的视频下载接入点，默认为当前服务的地址和监听端口
+# 比如 http://127.0.0.1:8080/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
+# 如果你需要使用域名对外提供服务（一般会用nginx做代理），则可以设置为你的域名
+# 比如 https://xxxx.com/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
+# endpoint="https://xxxx.com"

-    # When the video is successfully generated, the API service provides a download endpoint for the video, defaulting to the service's current address and listening port.
-    # For example, http://127.0.0.1:8080/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
-    # If you need to provide the service externally using a domain name (usually done with nginx as a proxy), you can set it to your domain name.
-    # For example, https://xxxx.com/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
-    # endpoint="https://xxxx.com"
-    endpoint=""
+# When the video is successfully generated, the API service provides a download endpoint for the video, defaulting to the service's current address and listening port.
+# For example, http://127.0.0.1:8080/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
+# If you need to provide the service externally using a domain name (usually done with nginx as a proxy), you can set it to your domain name.
+# For example, https://xxxx.com/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
+# endpoint="https://xxxx.com"
+endpoint = ""


-    # Video material storage location
-    # material_directory = ""                    # Indicates that video materials will be downloaded to the default folder, the default folder is ./storage/cache_videos under the current project
-    # material_directory = "/user/harry/videos"  # Indicates that video materials will be downloaded to a specified folder
-    # material_directory = "task"                # Indicates that video materials will be downloaded to the current task's folder, this method does not allow sharing of already downloaded video materials
+# Video material storage location
+# material_directory = ""                    # Indicates that video materials will be downloaded to the default folder, the default folder is ./storage/cache_videos under the current project
+# material_directory = "/user/harry/videos"  # Indicates that video materials will be downloaded to a specified folder
+# material_directory = "task"                # Indicates that video materials will be downloaded to the current task's folder, this method does not allow sharing of already downloaded video materials

-    # 视频素材存放位置
-    # material_directory = ""                    #表示将视频素材下载到默认的文件夹，默认文件夹为当前项目下的 ./storage/cache_videos
-    # material_directory = "/user/harry/videos"  #表示将视频素材下载到指定的文件夹中
-    # material_directory = "task"                #表示将视频素材下载到当前任务的文件夹中，这种方式无法共享已经下载的视频素材
+# 视频素材存放位置
+# material_directory = ""                    #表示将视频素材下载到默认的文件夹，默认文件夹为当前项目下的 ./storage/cache_videos
+# material_directory = "/user/harry/videos"  #表示将视频素材下载到指定的文件夹中
+# material_directory = "task"                #表示将视频素材下载到当前任务的文件夹中，这种方式无法共享已经下载的视频素材

-    material_directory = ""
+material_directory = ""

-    # Used for state management of the task
-    enable_redis = false
-    redis_host = "localhost"
-    redis_port = 6379
-    redis_db = 0
-    redis_password = ""
+# Used for state management of the task
+enable_redis = false
+redis_host = "localhost"
+redis_port = 6379
+redis_db = 0
+redis_password = ""

-    # 文生视频时的最大并发任务数
-    max_concurrent_tasks = 5
-
-    # webui界面是否显示配置项
-    # webui hide baisc config panel
-    hide_config = false
+# Maximum number of concurrent text-to-video tasks
+max_concurrent_tasks = 5


 [whisper]
-    # Only effective when subtitle_provider is "whisper"
+# Only effective when subtitle_provider is "whisper"

-    # Run on GPU with FP16
-    # model = WhisperModel(model_size, device="cuda", compute_type="float16")
+# Run on GPU with FP16
+# model = WhisperModel(model_size, device="cuda", compute_type="float16")

-    # Run on GPU with INT8
-    # model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
+# Run on GPU with INT8
+# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")

-    # Run on CPU with INT8
-    # model = WhisperModel(model_size, device="cpu", compute_type="int8")
+# Run on CPU with INT8
+# model = WhisperModel(model_size, device="cpu", compute_type="int8")

-    # recommended model_size: "large-v3"
-    model_size="large-v3"
-    # if you want to use GPU, set device="cuda"
-    device="CPU"
-    compute_type="int8"
+# recommended model_size: "large-v3"
+model_size = "large-v3"
+# if you want to use GPU, set device="cuda"
+device = "CPU"
+compute_type = "int8"


 [proxy]
-    ### Use a proxy to access the Pexels API
-    ### Format: "http://<username>:<password>@<proxy>:<port>"
-    ### Example: "http://user:pass@proxy:1234"
-    ### Doc: https://requests.readthedocs.io/en/latest/user/advanced/#proxies
+### Use a proxy to access the Pexels API
+### Format: "http://<username>:<password>@<proxy>:<port>"
+### Example: "http://user:pass@proxy:1234"
+### Doc: https://requests.readthedocs.io/en/latest/user/advanced/#proxies

-    # http = "http://10.10.1.10:3128"
-    # https = "http://10.10.1.10:1080"
+# http = "http://10.10.1.10:3128"
+# https = "http://10.10.1.10:1080"

 [azure]
-    # Azure Speech API Key
-    # Get your API key at https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices
-    speech_key=""
-    speech_region=""
+# Azure Speech API Key
+# Get your API key at https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices
+speech_key = ""
+speech_region = ""
+
+[siliconflow]
+# SiliconFlow API Key
+# Get your API key at https://siliconflow.cn
+api_key = ""
+
+[ui]
+# UI related settings
+# 是否隐藏日志信息
+# Whether to hide logs in the UI
+hide_log = false
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -6,7 +6,7 @@ services:
    build:
      context: .
      dockerfile: Dockerfile
-    container_name: "webui"
+    container_name: "moneyprinterturbo-webui"
    ports:
      - "8501:8501"
    command: [ "streamlit", "run", "./webui/Main.py","--browser.serverAddress=127.0.0.1","--server.enableCORS=True","--browser.gatherUsageStats=False" ]
@@ -16,7 +16,7 @@ services:
    build:
      context: .
      dockerfile: Dockerfile
-    container_name: "api"
+    container_name: "moneyprinterturbo-api"
    ports:
      - "8080:8080"
    command: [ "python3", "main.py" ]
--- a/docs/api.jpg
+++ b/docs/api.jpg
--- a/docs/webui-en.jpg
+++ b/docs/webui-en.jpg
--- a/docs/webui.jpg
+++ b/docs/webui.jpg
--- a/docs/wechat-group.jpg
+++ b/docs/wechat-group.jpg
--- a/main.py
+++ b/main.py
@@ -1,8 +1,16 @@
 import uvicorn
 from loguru import logger
+
 from app.config import config

-if __name__ == '__main__':
-    logger.info("start server, docs: http://127.0.0.1:" + str(config.listen_port) + "/docs")
-    uvicorn.run(app="app.asgi:app", host=config.listen_host, port=config.listen_port, reload=config.reload_debug,
-                log_level="warning")
+if __name__ == "__main__":
+    logger.info(
+        "start server, docs: http://127.0.0.1:" + str(config.listen_port) + "/docs"
+    )
+    uvicorn.run(
+        app="app.asgi:app",
+        host=config.listen_host,
+        port=config.listen_port,
+        reload=config.reload_debug,
+        log_level="warning",
+    )
--- a/pdm.lock
+++ b/pdm.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,32 @@
+[project]
+name = "MoneyPrinterTurbo"
+version = "1.2.3"
+description = "Default template for PDM package"
+authors = [
+    {name = "yyhhyyyyyy", email = "yyhhyyyyyy8@gmail.com"},
+]
+dependencies = [
+    "moviepy==2.1.1",
+    "streamlit==1.40.2",
+    "edge-tts==6.1.19",
+    "fastapi==0.115.6",
+    "uvicorn==0.32.1",
+    "openai==1.56.1",
+    "faster-whisper==1.1.0",
+    "loguru==0.7.2",
+    "google-generativeai==0.8.3",
+    "dashscope==1.20.14",
+    "g4f==0.3.8.1",
+    "azure-cognitiveservices-speech==1.41.1",
+    "redis==5.2.0",
+    "python-multipart==0.0.19",
+    "streamlit-authenticator==0.4.1",
+    "pyyaml",
+]
+requires-python = "==3.11.*"
+readme = "README.md"
+license = {text = "MIT"}
+
+
+[tool.pdm]
+distribution = false
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,26 +1,15 @@
-requests~=2.31.0
-moviepy~=2.0.0.dev2
-openai~=1.13.3
-faster-whisper~=1.0.1
-edge_tts~=6.1.10
-uvicorn~=0.27.1
-fastapi~=0.110.0
-tomli~=2.0.1
-streamlit~=1.33.0
-loguru~=0.7.2
-aiohttp~=3.9.3
-urllib3~=2.2.1
-pillow~=10.3.0
-pydantic~=2.6.3
-g4f~=0.3.0.4
-dashscope~=1.15.0
-google.generativeai~=0.4.1
-python-multipart~=0.0.9
-redis==5.0.3
-# if you use pillow~=10.3.0, you will get "PIL.Image' has no attribute 'ANTIALIAS'" error when resize video
-# please install opencv-python to fix "PIL.Image' has no attribute 'ANTIALIAS'" error
-opencv-python~=4.9.0.80
-# for azure speech
-# https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/9-more-realistic-ai-voices-for-conversations-now-generally/ba-p/4099471
-azure-cognitiveservices-speech~=1.37.0
-git-changelog~=2.5.2
+moviepy==2.1.2
+streamlit==1.45.0
+edge_tts==6.1.19
+fastapi==0.115.6
+uvicorn==0.32.1
+openai==1.56.1
+faster-whisper==1.1.0
+loguru==0.7.3
+google.generativeai==0.8.3
+dashscope==1.20.14
+g4f==0.5.2.2
+azure-cognitiveservices-speech==1.41.1
+redis==5.2.0
+python-multipart==0.0.19
+pyyaml
--- a/test/README.md
+++ b/test/README.md
@@ -0,0 +1,39 @@
+# MoneyPrinterTurbo Test Directory
+
+This directory contains unit tests for the **MoneyPrinterTurbo** project.
+
+## Directory Structure
+
+- `services/`: Tests for components in the `app/services` directory  
+  - `test_video.py`: Tests for the video service  
+  - `test_task.py`: Tests for the task service  
+
+## Running Tests
+
+You can run the tests using Python’s built-in `unittest` framework:
+
+```bash
+# Run all tests
+python -m unittest discover -s test
+
+# Run a specific test file
+python -m unittest test/services/test_video.py
+
+# Run a specific test class
+python -m unittest test.services.test_video.TestVideoService
+
+# Run a specific test method
+python -m unittest test.services.test_video.TestVideoService.test_preprocess_video
+```
+
+## Adding New Tests
+
+To add tests for other components, follow these guidelines:
+
+1. Create test files prefixed with `test_` in the appropriate subdirectory
+2. Use `unittest.TestCase` as the base class for your test classes
+3. Name test methods with the `test_` prefix
+
+## Test Resources
+
+Place any resource files required for testing in the `test/resources` directory.
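Following the guidelines above, a new test module might look like the sketch below. The file location (for example `test/utils/test_utils.py`, which would also need a `test/utils/__init__.py` for discovery) is hypothetical, and `parse_extension` is copied inline from this diff's `app/utils/utils.py` so the example runs on its own; in the repository you would import it instead.

```python
import unittest
from pathlib import Path


# Stand-in copy of app.utils.utils.parse_extension so this module is self-contained.
def parse_extension(filename: str) -> str:
    return Path(filename).suffix.lower().lstrip(".")


class TestParseExtension(unittest.TestCase):
    def test_lowercases_extension(self):
        self.assertEqual(parse_extension("clip.MP4"), "mp4")

    def test_uses_last_suffix_only(self):
        # Only the final suffix counts, so "tar.gz" yields "gz".
        self.assertEqual(parse_extension("archive.tar.gz"), "gz")

    def test_no_extension(self):
        self.assertEqual(parse_extension("README"), "")
```

Placed in the repository, such a module would be picked up by `python -m unittest discover -s test`.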
--- a/test/__init__.py
+++ b/test/__init__.py
@@ -0,0 +1 @@
+# Unit test package for test
--- a/test/resources/1.png
+++ b/test/resources/1.png
--- a/test/resources/1.png.mp4
+++ b/test/resources/1.png.mp4
--- a/test/resources/2.png
+++ b/test/resources/2.png
--- a/test/resources/2.png.mp4
+++ b/test/resources/2.png.mp4
--- a/test/resources/3.png
+++ b/test/resources/3.png
--- a/test/resources/3.png.mp4
+++ b/test/resources/3.png.mp4
--- a/test/resources/4.png
+++ b/test/resources/4.png
--- a/test/resources/5.png
+++ b/test/resources/5.png
--- a/test/resources/6.png
+++ b/test/resources/6.png
--- a/test/resources/7.png
+++ b/test/resources/7.png
--- a/test/resources/8.png
+++ b/test/resources/8.png
--- a/test/resources/9.png
+++ b/test/resources/9.png
--- a/test/services/__init__.py
+++ b/test/services/__init__.py
@@ -0,0 +1 @@
+# Unit test package for services
--- a/test/services/test_task.py
+++ b/test/services/test_task.py
@@ -0,0 +1,66 @@
+import unittest
+import os
+import sys
+from pathlib import Path
+
+# add project root to python path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+from app.services import task as tm
+from app.models.schema import MaterialInfo, VideoParams
+
+resources_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "resources")
+
+class TestTaskService(unittest.TestCase):
+    def setUp(self):
+        pass
+    
+    def tearDown(self):
+        pass
+    
+    def test_task_local_materials(self):
+        task_id = "00000000-0000-0000-0000-000000000000"
+        video_materials=[]
+        for i in range(1, 4):
+            video_materials.append(MaterialInfo(
+                provider="local",
+                url=os.path.join(resources_dir, f"{i}.png"),
+                duration=0
+            ))
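+        # sample input in Chinese: subject "The role of money" and a short script on it, read by a zh-CN voice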
+        params = VideoParams(
+            video_subject="金钱的作用",
+            video_script="金钱不仅是交换媒介，更是社会资源的分配工具。它能满足基本生存需求，如食物和住房，也能提供教育、医疗等提升生活品质的机会。拥有足够的金钱意味着更多选择权，比如职业自由或创业可能。但金钱的作用也有边界，它无法直接购买幸福、健康或真诚的人际关系。过度追逐财富可能导致价值观扭曲，忽视精神层面的需求。理想的状态是理性看待金钱，将其作为实现目标的工具而非终极目的。",
+            video_terms="money importance, wealth and society, financial freedom, money and happiness, role of money",
+            video_aspect="9:16",
+            video_concat_mode="random",
+            video_transition_mode="None",
+            video_clip_duration=3,
+            video_count=1,
+            video_source="local",
+            video_materials=video_materials,
+            video_language="",
+            voice_name="zh-CN-XiaoxiaoNeural-Female",
+            voice_volume=1.0,
+            voice_rate=1.0,
+            bgm_type="random",
+            bgm_file="",
+            bgm_volume=0.2,
+            subtitle_enabled=True,
+            subtitle_position="bottom",
+            custom_position=70.0,
+            font_name="MicrosoftYaHeiBold.ttc",
+            text_fore_color="#FFFFFF",
+            text_background_color=True,
+            font_size=60,
+            stroke_color="#000000",
+            stroke_width=1.5,
+            n_threads=2,
+            paragraph_number=1
+        )
+        result = tm.start(task_id=task_id, params=params)
+        print(result)
+        self.assertIsNotNone(result)  # start() is assumed to return None when a step fails
+
+if __name__ == "__main__":
+    unittest.main()
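`test_task_local_materials` drives the whole pipeline, including speech synthesis and video rendering. As a result, it can be slow and may depend on network access. One common pattern is to make such a test opt-in. The sketch below assumes a hypothetical `MPT_RUN_INTEGRATION` environment variable that the project does not define. The body is only a placeholder for the real `tm.start(...)` call.

```python
import os
import unittest

# Hypothetical opt-in switch; MoneyPrinterTurbo does not define this variable itself.
RUN_INTEGRATION = os.environ.get("MPT_RUN_INTEGRATION") == "1"


class TestTaskServiceIntegration(unittest.TestCase):
    @unittest.skipUnless(RUN_INTEGRATION, "set MPT_RUN_INTEGRATION=1 to run the full pipeline")
    def test_task_local_materials(self):
        # In the real test this body builds VideoParams and calls tm.start(...);
        # it only executes when the switch above is set.
        self.assertTrue(RUN_INTEGRATION)
```

Without the variable, `python -m unittest discover -s test` reports the test as skipped instead of running the full pipeline.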
--- a/test/services/test_video.py
+++ b/test/services/test_video.py
@@ -0,0 +1,85 @@
+
+import unittest
+import os
+import sys
+from pathlib import Path
+from moviepy import (
+    VideoFileClip,
+)
+# add project root to python path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+from app.models.schema import MaterialInfo
+from app.services import video as vd
+from app.utils import utils
+
+resources_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "resources")
+
+class TestVideoService(unittest.TestCase):
+    def setUp(self):
+        self.test_img_path = os.path.join(resources_dir, "1.png")
+    
+    def tearDown(self):
+        pass
+    
+    def test_preprocess_video(self):
+        if not os.path.exists(self.test_img_path):
+            self.fail(f"test image not found: {self.test_img_path}")
+        
+        # test preprocess_video function
+        m = MaterialInfo()
+        m.url = self.test_img_path
+        m.provider = "local"
+        print(m)
+        
+        materials = vd.preprocess_video([m], clip_duration=4)
+        print(materials)
+        
+        # verify result
+        self.assertIsNotNone(materials)
+        self.assertEqual(len(materials), 1)
+        self.assertTrue(materials[0].url.endswith(".mp4"))
+        
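+        # read the video back with moviepy, then release the file handle
+        clip = VideoFileClip(materials[0].url)
+        print(clip)
+        clip.close()  # an open handle makes os.remove fail on Windows
+        # clean generated test video file
+        if os.path.exists(materials[0].url):
+            os.remove(materials[0].url)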
+    
+    def test_wrap_text(self):
+        """test text wrapping function"""
+        # no try/except: failures and errors propagate with their full traceback
+        font_path = os.path.join(utils.font_dir(), "STHeitiMedium.ttc")
+        if not os.path.exists(font_path):
+            self.fail(f"font file not found: {font_path}")
+
+        # test english text wrapping
+        test_text_en = "This is a test text for wrapping long sentences in english language"
+
+        wrapped_text_en, text_height_en = vd.wrap_text(
+            text=test_text_en,
+            max_width=300,
+            font=font_path,
+            fontsize=30
+        )
+        print(wrapped_text_en, text_height_en)
+        # verify text is wrapped
+        self.assertIn("\n", wrapped_text_en)
+
+        # test chinese text wrapping
+        # (sample says: "text for testing line wrapping of long Chinese sentences; it should wrap at the width limit")
+        test_text_zh = "这是一段用来测试中文长句换行的文本内容，应该会根据宽度限制进行换行处理"
+        wrapped_text_zh, text_height_zh = vd.wrap_text(
+            text=test_text_zh,
+            max_width=300,
+            font=font_path,
+            fontsize=30
+        )
+        print(wrapped_text_zh, text_height_zh)
+        # verify chinese text is wrapped
+        self.assertIn("\n", wrapped_text_zh)
+
+
+if __name__ == "__main__":
+    unittest.main() 
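`test_preprocess_video` deletes the generated `.mp4` only at the end, so a failing assertion leaves the file behind. The checked-in `test/resources/1.png.mp4` suggests output is written next to the input image. A sketch of the `addCleanup` pattern follows, using a temporary directory and a hypothetical `fake_preprocess` stand-in for `vd.preprocess_video`. Both the stand-in and its `<image>.mp4` naming are assumptions.

```python
import os
import tempfile
import unittest


def fake_preprocess(image_path):
    """Hypothetical stand-in for vd.preprocess_video: writes `<image>.mp4` beside the input."""
    out_path = image_path + ".mp4"
    with open(out_path, "wb") as f:
        f.write(b"\x00")  # placeholder bytes, not a real video
    return out_path


def remove_if_exists(path):
    if os.path.exists(path):
        os.remove(path)


class TestGeneratedFileCleanup(unittest.TestCase):
    def setUp(self):
        # Scratch directory that is deleted after every test, pass or fail.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.image_path = os.path.join(tmp.name, "1.png")
        open(self.image_path, "wb").close()

    def test_generated_file_is_removed(self):
        out_path = fake_preprocess(self.image_path)
        # Registered before the assertions, so it runs even if one of them fails.
        self.addCleanup(remove_if_exists, out_path)
        self.assertTrue(out_path.endswith(".mp4"))
        self.assertTrue(os.path.exists(out_path))
```

Cleanups run in last-in, first-out order. The generated file is removed first, and then the scratch directory is deleted.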
--- a/webui/Main.py
+++ b/webui/Main.py
--- a/webui/i18n/de.json
+++ b/webui/i18n/de.json
@@ -1,6 +1,14 @@
 {
-  "Language": "German",
+  "Language": "Deutsch",
  "Translation": {
+    "Login Required": "Anmeldung erforderlich",
+    "Please login to access settings": "Bitte melden Sie sich an, um auf die Einstellungen zuzugreifen",
+    "Username": "Benutzername",
+    "Password": "Passwort",
+    "Login": "Anmelden",
+    "Login Error": "Anmeldefehler",
+    "Incorrect username or password": "Falscher Benutzername oder Passwort",
+    "Please enter your username and password": "Bitte geben Sie Ihren Benutzernamen und Ihr Passwort ein",
    "Video Script Settings": "**Drehbuch / Topic des Videos**",
    "Video Subject": "Worum soll es in dem Video gehen? (Geben Sie ein Keyword an, :red[Dank KI wird automatisch ein Drehbuch generieren])",
    "Script Language": "Welche Sprache soll zum Generieren von Drehbüchern  verwendet werden? :red[KI generiert anhand dieses Begriffs das Drehbuch]",
@@ -10,12 +18,19 @@
    "Generate Video Keywords": "Klicken Sie, um KI zum Generieren zu verwenden [Video Keywords] basierend auf dem **Drehbuch**",
    "Please Enter the Video Subject": "Bitte geben Sie zuerst das Drehbuch an",
    "Generating Video Script and Keywords": "KI generiert ein Drehbuch und Schlüsselwörter...",
-    "Generating Video Keywords": "AI is generating video keywords...",
+    "Generating Video Keywords": "KI generiert Video-Schlüsselwörter...",
    "Video Keywords": "Video Schlüsselwörter (:blue[① Optional, KI generiert ② Verwende **, (Kommas)** zur Trennung der Wörter, in englischer Sprache])",
    "Video Settings": "**Video Einstellungen**",
    "Video Concat Mode": "Videoverkettungsmodus",
    "Random": "Zufällige Verkettung (empfohlen)",
    "Sequential": "Sequentielle Verkettung",
+    "Video Transition Mode": "Video-Übergangsmodus",
+    "None": "Kein Übergang",
+    "Shuffle": "Zufällige Übergänge",
+    "FadeIn": "FadeIn",
+    "FadeOut": "FadeOut",
+    "SlideIn": "SlideIn",
+    "SlideOut": "SlideOut",
    "Video Ratio": "Video-Seitenverhältnis",
    "Portrait": "Portrait 9:16",
    "Landscape": "Landschaft 16:9",
@@ -23,9 +38,10 @@
    "Number of Videos Generated Simultaneously": "Anzahl der parallel generierten Videos",
    "Audio Settings": "**Audio Einstellungen**",
    "Speech Synthesis": "Sprachausgabe",
-    "Speech Region": "Region(:red[Required，[Get Region](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
-    "Speech Key": "API Key(:red[Required，[Get API Key](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
+    "Speech Region": "Region(:red[Erforderlich，[Region abrufen](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
+    "Speech Key": "API-Schlüssel(:red[Erforderlich，[API-Schlüssel abrufen](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Volume": "Lautstärke der Sprachausgabe",
+    "Speech Rate": "Lesegeschwindigkeit (1,0 bedeutet 1x)",
    "Male": "Männlich",
    "Female": "Weiblich",
    "Background Music": "Hintergrundmusik",
@@ -41,6 +57,7 @@
    "Top": "Oben",
    "Center": "Mittig",
    "Bottom": "Unten (empfohlen)",
+    "Custom": "Benutzerdefinierte Position (70, was 70% von oben bedeutet)",
    "Font Size": "Schriftgröße für Untertitel",
    "Font Color": "Schriftfarbe",
    "Stroke Color": "Kontur",
@@ -52,26 +69,37 @@
    "Video Generation Completed": "Video erfolgreich generiert",
    "Video Generation Failed": "Video Generierung fehlgeschlagen",
    "You can download the generated video from the following links": "Sie können das generierte Video über die folgenden Links herunterladen",
-    "Basic Settings": "**Grunde Instellungen**",
-    "Pexels API Key": "Pexels API Key ([Get API Key](https://www.pexels.com/api/))",
-    "Pixabay API Key": "Pixabay API Key ([Get API Key](https://pixabay.com/api/docs/#api_search_videos))",
-    "Language": "Language",
-    "LLM Provider": "LLM Provider",
-    "API Key": "API Key (:red[Required])",
-    "Base Url": "Base Url",
-    "Model Name": "Model Name",
-    "Please Enter the LLM API Key": "Please Enter the **LLM API Key**",
-    "Please Enter the Pexels API Key": "Please Enter the **Pexels API Key**",
-    "Please Enter the Pixabay API Key": "Please Enter the **Pixabay API Key**",
-    "Get Help": "If you need help, or have any questions, you can join discord for help: https://harryai.cc",
-    "Video Source": "Video Source",
-    "TikTok": "TikTok (TikTok support is coming soon)",
-    "Bilibili": "Bilibili (Bilibili support is coming soon)",
-    "Xiaohongshu": "Xiaohongshu (Xiaohongshu support is coming soon)",
-    "Local file": "Local file",
-    "Play Voice": "Play Voice",
-    "Voice Example": "This is an example text for testing speech synthesis",
-    "Synthesizing Voice": "Synthesizing voice, please wait...",
-    "TTS Provider": "Select the voice synthesis provider"
+    "Basic Settings": "**Grundeinstellungen** (:blue[Klicken zum Erweitern])",
+    "Language": "Sprache",
+    "Pexels API Key": "Pexels API-Schlüssel ([API-Schlüssel abrufen](https://www.pexels.com/api/))",
+    "Pixabay API Key": "Pixabay API-Schlüssel ([API-Schlüssel abrufen](https://pixabay.com/api/docs/#api_search_videos))",
+    "LLM Provider": "KI-Modellanbieter",
+    "API Key": "API-Schlüssel (:red[Erforderlich])",
+    "Base Url": "Basis-URL",
+    "Account ID": "Konto-ID (Aus dem Cloudflare-Dashboard)",
+    "Model Name": "Modellname",
+    "Please Enter the LLM API Key": "Bitte geben Sie den **KI-Modell API-Schlüssel** ein",
+    "Please Enter the Pexels API Key": "Bitte geben Sie den **Pexels API-Schlüssel** ein",
+    "Please Enter the Pixabay API Key": "Bitte geben Sie den **Pixabay API-Schlüssel** ein",
+    "Get Help": "Wenn Sie Hilfe benötigen oder Fragen haben, können Sie dem Discord beitreten: https://harryai.cc",
+    "Video Source": "Videoquelle",
+    "TikTok": "TikTok (TikTok-Unterstützung kommt bald)",
+    "Bilibili": "Bilibili (Bilibili-Unterstützung kommt bald)",
+    "Xiaohongshu": "Xiaohongshu (Xiaohongshu-Unterstützung kommt bald)",
+    "Local file": "Lokale Datei",
+    "Play Voice": "Sprachausgabe abspielen",
+    "Voice Example": "Dies ist ein Beispieltext zum Testen der Sprachsynthese",
+    "Synthesizing Voice": "Sprachsynthese läuft, bitte warten...",
+    "TTS Provider": "Sprachsynthese-Anbieter auswählen",
+    "TTS Servers": "TTS-Server",
+    "No voices available for the selected TTS server. Please select another server.": "Keine Stimmen für den ausgewählten TTS-Server verfügbar. Bitte wählen Sie einen anderen Server.",
+    "SiliconFlow API Key": "SiliconFlow API-Schlüssel",
+    "SiliconFlow TTS Settings": "SiliconFlow TTS-Einstellungen",
+    "Speed: Range [0.25, 4.0], default is 1.0": "Geschwindigkeit: Bereich [0.25, 4.0], Standardwert ist 1.0",
+    "Volume: Uses Speech Volume setting, default 1.0 maps to gain 0": "Lautstärke: Verwendet die Sprachlautstärke-Einstellung, Standardwert 1.0 entspricht Verstärkung 0",
+    "Hide Log": "Protokoll ausblenden",
+    "Hide Basic Settings": "Grundeinstellungen ausblenden\n\nWenn diese Option aktiviert ist, wird die Grundeinstellungen-Leiste nicht auf der Seite angezeigt.\n\nWenn Sie sie erneut anzeigen möchten, setzen Sie `hide_config = false` in `config.toml`",
+    "LLM Settings": "**LLM-Einstellungen**",
+    "Video Source Settings": "**Videoquellen-Einstellungen**"
  }
 }
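Each locale file has the same structure: a display name under `"Language"` and a `"Translation"` object. That object maps the English source string (the key) to the localized text. A minimal lookup over that structure is sketched below. The `tr` name and the fall-back-to-key behaviour are assumptions for illustration, not a statement of how the web UI implements it. Under that assumption, a key missing from a locale would surface as its English key text.

```python
# Abridged locale data in the same shape json.load returns for webui/i18n/*.json.
LOCALES = {
    "de": {"Language": "Deutsch", "Translation": {"Login": "Anmelden"}},
    "en": {"Language": "English", "Translation": {"Login": "Login"}},
}


def tr(key, lang, locales=LOCALES):
    """Translate `key` for `lang`; fall back to the key itself when no entry exists."""
    translation = locales.get(lang, {}).get("Translation", {})
    return translation.get(key, key)
```

For example, `tr("Login", "de")` gives `"Anmelden"`. Because `"Hide Log"` is absent from the abridged German data above, `tr("Hide Log", "de")` returns `"Hide Log"`.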
--- a/webui/i18n/en.json
+++ b/webui/i18n/en.json
@@ -1,6 +1,14 @@
 {
  "Language": "English",
  "Translation": {
+    "Login Required": "Login Required",
+    "Please login to access settings": "Please login to access settings",
+    "Username": "Username",
+    "Password": "Password",
+    "Login": "Login",
+    "Login Error": "Login Error",
+    "Incorrect username or password": "Incorrect username or password",
+    "Please enter your username and password": "Please enter your username and password",
    "Video Script Settings": "**Video Script Settings**",
    "Video Subject": "Video Subject (Provide a keyword, :red[AI will automatically generate] video script)",
    "Script Language": "Language for Generating Video Script (AI will automatically output based on the language of your subject)",
@@ -16,6 +24,13 @@
    "Video Concat Mode": "Video Concatenation Mode",
    "Random": "Random Concatenation (Recommended)",
    "Sequential": "Sequential Concatenation",
+    "Video Transition Mode": "Video Transition Mode",
+    "None": "None",
+    "Shuffle": "Shuffle",
+    "FadeIn": "FadeIn",
+    "FadeOut": "FadeOut",
+    "SlideIn": "SlideIn",
+    "SlideOut": "SlideOut",
    "Video Ratio": "Video Aspect Ratio",
    "Portrait": "Portrait 9:16",
    "Landscape": "Landscape 16:9",
@@ -26,6 +41,7 @@
    "Speech Region": "Region(:red[Required，[Get Region](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Key": "API Key(:red[Required，[Get API Key](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Volume": "Speech Volume (1.0 represents 100%)",
+    "Speech Rate": "Speech Rate (1.0 means 1x speed)",
    "Male": "Male",
    "Female": "Female",
    "Background Music": "Background Music",
@@ -41,6 +57,7 @@
    "Top": "Top",
    "Center": "Center",
    "Bottom": "Bottom (Recommended)",
+    "Custom": "Custom position (70, indicating 70% down from the top)",
    "Font Size": "Subtitle Font Size",
    "Font Color": "Subtitle Font Color",
    "Stroke Color": "Subtitle Outline Color",
@@ -73,6 +90,16 @@
    "Play Voice": "Play Voice",
    "Voice Example": "This is an example text for testing speech synthesis",
    "Synthesizing Voice": "Synthesizing voice, please wait...",
-    "TTS Provider": "Select the voice synthesis provider"
+    "TTS Provider": "Select the voice synthesis provider",
+    "TTS Servers": "TTS Servers",
+    "No voices available for the selected TTS server. Please select another server.": "No voices available for the selected TTS server. Please select another server.",
+    "SiliconFlow API Key": "SiliconFlow API Key [Click to get](https://cloud.siliconflow.cn/account/ak)",
+    "SiliconFlow TTS Settings": "SiliconFlow TTS Settings",
+    "Speed: Range [0.25, 4.0], default is 1.0": "Speed: Range [0.25, 4.0], default is 1.0",
+    "Volume: Uses Speech Volume setting, default 1.0 maps to gain 0": "Volume: Uses Speech Volume setting, default 1.0 maps to gain 0",
+    "Hide Log": "Hide Log",
+    "Hide Basic Settings": "Hide Basic Settings\n\nWhen hidden, the basic settings panel is not displayed on the page.\n\nTo display it again, set `hide_config = false` in `config.toml`",
+    "LLM Settings": "**LLM Settings**",
+    "Video Source Settings": "**Video Source Settings**"
  }
 }
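The new `"Custom"` subtitle position takes a percentage, where 70 means "70% down from the top". One plausible way to turn that into a pixel offset is sketched below. It is a hypothetical implementation resting on two assumptions. First, the percentage is applied to the free space (frame height minus subtitle height). Second, the result is clamped to keep a small margin inside the frame. The actual rendering code may differ.

```python
def custom_subtitle_y(video_height, text_height, percent, margin=10):
    """Top edge, in pixels, of a subtitle placed `percent`% down the frame.

    ASSUMPTIONS: the percentage applies to (video_height - text_height), and the
    result is clamped so the text stays at least `margin` px inside the frame.
    """
    y = (video_height - text_height) * (percent / 100.0)
    max_y = video_height - text_height - margin
    return round(max(margin, min(y, max_y)))
```

For a 1920 px tall portrait video and a 120 px subtitle block, a setting of 70 places the subtitle's top edge at 1260 px. Settings of 0 and 100 are clamped to 10 px and 1790 px respectively.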
--- a/webui/i18n/pt.json
+++ b/webui/i18n/pt.json
@@ -0,0 +1,105 @@
+{
+  "Language": "Português Brasileiro",
+  "Translation": {
+    "Login Required": "Login Necessário",
+    "Please login to access settings": "Por favor, faça login para acessar as configurações",
+    "Username": "Nome de usuário",
+    "Password": "Senha",
+    "Login": "Entrar",
+    "Login Error": "Erro de Login",
+    "Incorrect username or password": "Nome de usuário ou senha incorretos",
+    "Please enter your username and password": "Por favor, digite seu nome de usuário e senha",
+    "Video Script Settings": "**Configurações do Roteiro do Vídeo**",
+    "Video Subject": "Tema do Vídeo (Forneça uma palavra-chave, :red[a IA irá gerar automaticamente] o roteiro do vídeo)",
+    "Script Language": "Idioma para Gerar o Roteiro do Vídeo (a IA irá gerar automaticamente com base no idioma do seu tema)",
+    "Generate Video Script and Keywords": "Clique para usar a IA para gerar o [Roteiro do Vídeo] e as [Palavras-chave do Vídeo] com base no **tema**",
+    "Auto Detect": "Detectar Automaticamente",
+    "Video Script": "Roteiro do Vídeo (:blue[① Opcional, gerado pela IA  ② Pontuação adequada ajuda na geração de legendas])",
+    "Generate Video Keywords": "Clique para usar a IA para gerar [Palavras-chave do Vídeo] com base no **roteiro**",
+    "Please Enter the Video Subject": "Por favor, insira o Tema do Vídeo primeiro",
+    "Generating Video Script and Keywords": "A IA está gerando o roteiro do vídeo e as palavras-chave...",
+    "Generating Video Keywords": "A IA está gerando as palavras-chave do vídeo...",
+    "Video Keywords": "Palavras-chave do Vídeo (:blue[① Opcional, gerado pela IA ② Use **vírgulas em inglês** para separar, somente em inglês])",
+    "Video Settings": "**Configurações do Vídeo**",
+    "Video Concat Mode": "Modo de Concatenação de Vídeo",
+    "Random": "Concatenação Aleatória (Recomendado)",
+    "Sequential": "Concatenação Sequencial",
+    "Video Transition Mode": "Modo de Transição de Vídeo",
+    "None": "Nenhuma Transição",
+    "Shuffle": "Transição Aleatória",
+    "FadeIn": "FadeIn",
+    "FadeOut": "FadeOut",
+    "SlideIn": "SlideIn",
+    "SlideOut": "SlideOut",
+    "Video Ratio": "Proporção do Vídeo",
+    "Portrait": "Retrato 9:16",
+    "Landscape": "Paisagem 16:9",
+    "Clip Duration": "Duração Máxima dos Clipes de Vídeo (segundos)",
+    "Number of Videos Generated Simultaneously": "Número de Vídeos Gerados Simultaneamente",
+    "Audio Settings": "**Configurações de Áudio**",
+    "Speech Synthesis": "Voz de Síntese de Fala",
+    "Speech Region": "Região(:red[Obrigatório，[Obter Região](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
+    "Speech Key": "Chave da API(:red[Obrigatório，[Obter Chave da API](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
+    "Speech Volume": "Volume da Fala (1.0 representa 100%)",
+    "Speech Rate": "Velocidade da Fala (1.0 significa velocidade 1x)",
+    "Male": "Masculino",
+    "Female": "Feminino",
+    "Background Music": "Música de Fundo",
+    "No Background Music": "Sem Música de Fundo",
+    "Random Background Music": "Música de Fundo Aleatória",
+    "Custom Background Music": "Música de Fundo Personalizada",
+    "Custom Background Music File": "Por favor, insira o caminho do arquivo para a música de fundo personalizada:",
+    "Background Music Volume": "Volume da Música de Fundo (0.2 representa 20%, a música de fundo não deve ser muito alta)",
+    "Subtitle Settings": "**Configurações de Legendas**",
+    "Enable Subtitles": "Ativar Legendas (Se desmarcado, as configurações abaixo não terão efeito)",
+    "Font": "Fonte da Legenda",
+    "Position": "Posição da Legenda",
+    "Top": "Superior",
+    "Center": "Centro",
+    "Bottom": "Inferior (Recomendado)",
+    "Custom": "Posição personalizada (70, indicando 70% abaixo do topo)",
+    "Font Size": "Tamanho da Fonte da Legenda",
+    "Font Color": "Cor da Fonte da Legenda",
+    "Stroke Color": "Cor do Contorno da Legenda",
+    "Stroke Width": "Largura do Contorno da Legenda",
+    "Generate Video": "Gerar Vídeo",
+    "Video Script and Subject Cannot Both Be Empty": "O Tema do Vídeo e o Roteiro do Vídeo não podem estar ambos vazios",
+    "Generating Video": "Gerando vídeo, por favor aguarde...",
+    "Start Generating Video": "Começar a Gerar Vídeo",
+    "Video Generation Completed": "Geração do Vídeo Concluída",
+    "Video Generation Failed": "Falha na Geração do Vídeo",
+    "You can download the generated video from the following links": "Você pode baixar o vídeo gerado a partir dos seguintes links",
+    "Basic Settings": "**Configurações Básicas** (:blue[Clique para expandir])",
+    "Language": "Idioma",
+    "Pexels API Key": "Chave da API do Pexels ([Obter Chave da API](https://www.pexels.com/api/))",
+    "Pixabay API Key": "Chave da API do Pixabay ([Obter Chave da API](https://pixabay.com/api/docs/#api_search_videos))",
+    "LLM Provider": "Provedor LLM",
+    "API Key": "Chave da API (:red[Obrigatório])",
+    "Base Url": "URL Base",
+    "Account ID": "ID da Conta (Obter no painel do Cloudflare)",
+    "Model Name": "Nome do Modelo",
+    "Please Enter the LLM API Key": "Por favor, insira a **Chave da API LLM**",
+    "Please Enter the Pexels API Key": "Por favor, insira a **Chave da API do Pexels**",
+    "Please Enter the Pixabay API Key": "Por favor, insira a **Chave da API do Pixabay**",
+    "Get Help": "Se precisar de ajuda ou tiver alguma dúvida, você pode entrar no discord para obter ajuda: https://harryai.cc",
+    "Video Source": "Fonte do Vídeo",
+    "TikTok": "TikTok (Suporte para TikTok em breve)",
+    "Bilibili": "Bilibili (Suporte para Bilibili em breve)",
+    "Xiaohongshu": "Xiaohongshu (Suporte para Xiaohongshu em breve)",
+    "Local file": "Arquivo local",
+    "Play Voice": "Reproduzir Voz",
+    "Voice Example": "Este é um exemplo de texto para testar a síntese de fala",
+    "Synthesizing Voice": "Sintetizando voz, por favor aguarde...",
+    "TTS Provider": "Selecione o provedor de síntese de voz",
+    "TTS Servers": "Servidores TTS",
+    "No voices available for the selected TTS server. Please select another server.": "Não há vozes disponíveis para o servidor TTS selecionado. Por favor, selecione outro servidor.",
+    "SiliconFlow API Key": "Chave API do SiliconFlow",
+    "SiliconFlow TTS Settings": "Configurações do SiliconFlow TTS",
+    "Speed: Range [0.25, 4.0], default is 1.0": "Velocidade: Intervalo [0.25, 4.0], o padrão é 1.0",
+    "Volume: Uses Speech Volume setting, default 1.0 maps to gain 0": "Volume: Usa a configuração de Volume de Fala, o padrão 1.0 corresponde ao ganho 0",
+    "Hide Log": "Ocultar Log",
+    "Hide Basic Settings": "Ocultar Configurações Básicas\n\nQuando oculto, o painel de configurações básicas não será exibido na página.\n\nSe precisar exibi-lo novamente, defina `hide_config = false` em `config.toml`",
+    "LLM Settings": "**Configurações do LLM**",
+    "Video Source Settings": "**Configurações da Fonte do Vídeo**"
+  }
+}
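`pt.json` is a new locale, and each release in this range adds keys such as `"TTS Servers"` and `"Hide Log"` to every file. For that reason, a key-parity check against `en.json` is a cheap safeguard. The sketch below treats `en.json` as the reference, which is an assumption about which file is authoritative. It reports the `"Translation"` keys that each other locale lacks.

```python
import json
from pathlib import Path


def missing_keys(reference, candidate):
    """Keys in the reference "Translation" map that the candidate does not define."""
    return set(reference.get("Translation", {})) - set(candidate.get("Translation", {}))


def check_locale_dir(i18n_dir, reference="en.json"):
    """Return {locale file name: missing keys} for every locale lagging behind the reference."""
    root = Path(i18n_dir)
    ref = json.loads((root / reference).read_text(encoding="utf-8"))
    report = {}
    for path in sorted(root.glob("*.json")):
        if path.name == reference:
            continue
        gaps = missing_keys(ref, json.loads(path.read_text(encoding="utf-8")))
        if gaps:
            report[path.name] = gaps
    return report
```

Running `check_locale_dir("webui/i18n")` from the project root returns an empty dict when every locale covers all English keys.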
--- a/webui/i18n/vi.json
+++ b/webui/i18n/vi.json
@@ -1,6 +1,14 @@
 {
  "Language": "Tiếng Việt",
  "Translation": {
+    "Login Required": "Yêu cầu đăng nhập",
+    "Please login to access settings": "Vui lòng đăng nhập để truy cập cài đặt",
+    "Username": "Tên đăng nhập",
+    "Password": "Mật khẩu",
+    "Login": "Đăng nhập",
+    "Login Error": "Lỗi đăng nhập",
+    "Incorrect username or password": "Tên đăng nhập hoặc mật khẩu không chính xác",
+    "Please enter your username and password": "Vui lòng nhập tên đăng nhập và mật khẩu của bạn",
    "Video Script Settings": "**Cài Đặt Kịch Bản Video**",
    "Video Subject": "Chủ Đề Video (Cung cấp một từ khóa, :red[AI sẽ tự động tạo ra] kịch bản video)",
    "Script Language": "Ngôn Ngữ cho Việc Tạo Kịch Bản Video (AI sẽ tự động xuất ra dựa trên ngôn ngữ của chủ đề của bạn)",
@@ -16,6 +24,13 @@
    "Video Concat Mode": "Chế Độ Nối Video",
    "Random": "Nối Ngẫu Nhiên (Được Khuyến Nghị)",
    "Sequential": "Nối Theo Thứ Tự",
+    "Video Transition Mode": "Chế Độ Chuyển Đổi Video",
+    "None": "Không Có Chuyển Đổi",
+    "Shuffle": "Chuyển Đổi Ngẫu Nhiên",
+    "FadeIn": "FadeIn",
+    "FadeOut": "FadeOut",
+    "SlideIn": "SlideIn",
+    "SlideOut": "SlideOut",
    "Video Ratio": "Tỷ Lệ Khung Hình Video",
    "Portrait": "Dọc 9:16",
    "Landscape": "Ngang 16:9",
@@ -26,6 +41,7 @@
    "Speech Region": "Vùng(:red[Bắt Buộc，[Lấy Vùng](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Key": "Khóa API(:red[Bắt Buộc，[Lấy Khóa API](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Volume": "Âm Lượng Giọng Đọc (1.0 đại diện cho 100%)",
+    "Speech Rate": "Tốc độ đọc (1.0 biểu thị tốc độ gốc)",
    "Male": "Nam",
    "Female": "Nữ",
    "Background Music": "Âm Nhạc Nền",
@@ -41,6 +57,7 @@
    "Top": "Trên",
    "Center": "Giữa",
    "Bottom": "Dưới (Được Khuyến Nghị)",
+    "Custom": "Vị trí tùy chỉnh (70, chỉ ra là cách đầu trang 70%)",
    "Font Size": "Cỡ Chữ Phụ Đề",
    "Font Color": "Màu Chữ Phụ Đề",
    "Stroke Color": "Màu Viền Phụ Đề",
@@ -52,10 +69,10 @@
    "Video Generation Completed": "Hoàn Tất Tạo Video",
    "Video Generation Failed": "Tạo Video Thất Bại",
    "You can download the generated video from the following links": "Bạn có thể tải video được tạo ra từ các liên kết sau",
-    "Pexels API Key": "Khóa API Pexels ([Lấy Khóa API](https://www.pexels.com/api/))",
-    "Pixabay API Key": "Pixabay API Key ([Get API Key](https://pixabay.com/api/docs/#api_search_videos))",
    "Basic Settings": "**Cài Đặt Cơ Bản** (:blue[Nhấp để mở rộng])",
    "Language": "Ngôn Ngữ",
+    "Pexels API Key": "Khóa API Pexels ([Lấy Khóa API](https://www.pexels.com/api/))",
+    "Pixabay API Key": "Khóa API Pixabay ([Lấy Khóa API](https://pixabay.com/api/docs/#api_search_videos))",
    "LLM Provider": "Nhà Cung Cấp LLM",
    "API Key": "Khóa API (:red[Bắt Buộc])",
    "Base Url": "Url Cơ Bản",
@@ -63,16 +80,26 @@
    "Model Name": "Tên Mô Hình",
    "Please Enter the LLM API Key": "Vui lòng Nhập **Khóa API LLM**",
    "Please Enter the Pexels API Key": "Vui lòng Nhập **Khóa API Pexels**",
-    "Please Enter the Pixabay API Key": "Vui lòng Nhập **Pixabay API Key**",
+    "Please Enter the Pixabay API Key": "Vui lòng Nhập **Khóa API Pixabay**",
    "Get Help": "Nếu bạn cần giúp đỡ hoặc có bất kỳ câu hỏi nào, bạn có thể tham gia discord để được giúp đỡ: https://harryai.cc",
-    "Video Source": "Video Source",
-    "TikTok": "TikTok (TikTok support is coming soon)",
-    "Bilibili": "Bilibili (Bilibili support is coming soon)",
-    "Xiaohongshu": "Xiaohongshu (Xiaohongshu support is coming soon)",
-    "Local file": "Local file",
-    "Play Voice": "Play Voice",
-    "Voice Example": "This is an example text for testing speech synthesis",
-    "Synthesizing Voice": "Synthesizing voice, please wait...",
-    "TTS Provider": "Select the voice synthesis provider"
+    "Video Source": "Nguồn Video",
+    "TikTok": "TikTok (Hỗ trợ TikTok sắp ra mắt)",
+    "Bilibili": "Bilibili (Hỗ trợ Bilibili sắp ra mắt)",
+    "Xiaohongshu": "Xiaohongshu (Hỗ trợ Xiaohongshu sắp ra mắt)",
+    "Local file": "Tệp cục bộ",
+    "Play Voice": "Phát Giọng Nói",
+    "Voice Example": "Đây là văn bản mẫu để kiểm tra tổng hợp giọng nói",
+    "Synthesizing Voice": "Đang tổng hợp giọng nói, vui lòng đợi...",
+    "TTS Provider": "Chọn nhà cung cấp tổng hợp giọng nói",
+    "TTS Servers": "Máy chủ TTS",
+    "No voices available for the selected TTS server. Please select another server.": "Không có giọng nói nào cho máy chủ TTS đã chọn. Vui lòng chọn máy chủ khác.",
+    "SiliconFlow API Key": "Khóa API SiliconFlow",
+    "SiliconFlow TTS Settings": "Cài đặt SiliconFlow TTS",
+    "Speed: Range [0.25, 4.0], default is 1.0": "Tốc độ: Phạm vi [0.25, 4.0], mặc định là 1.0",
+    "Volume: Uses Speech Volume setting, default 1.0 maps to gain 0": "Âm lượng: Sử dụng cài đặt Âm lượng Giọng nói, mặc định 1.0 tương ứng với tăng ích 0",
+    "Hide Log": "Ẩn Nhật Ký",
+    "Hide Basic Settings": "Ẩn Cài Đặt Cơ Bản\n\nKhi bị ẩn, thanh cài đặt cơ bản sẽ không hiển thị trên trang web.\n\nNếu bạn muốn hiển thị lại, vui lòng đặt `hide_config = false` trong `config.toml`",
+    "LLM Settings": "**Cài Đặt LLM**",
+    "Video Source Settings": "**Cài Đặt Nguồn Video**"
  }
-}
+}
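The SiliconFlow TTS strings state two rules: speed must lie in [0.25, 4.0] with a default of 1.0, and the Speech Volume default of 1.0 maps to a gain of 0. The sketch below encodes both rules. The speed range comes from the UI text. The volume-to-gain function is an assumption: a linear offset clamped to a hypothetical [-10, 10] range. The source does not give the real mapping or the gain's unit.

```python
SPEED_MIN, SPEED_MAX = 0.25, 4.0  # range stated in the UI string


def clamp_speed(rate):
    """Keep the speech rate inside [0.25, 4.0]."""
    return max(SPEED_MIN, min(SPEED_MAX, rate))


def volume_to_gain(volume, min_gain=-10.0, max_gain=10.0):
    """Map the Speech Volume setting to a gain so that the default 1.0 gives 0.

    ASSUMPTION: gain = volume - 1.0, clamped to a hypothetical [min_gain, max_gain].
    """
    return max(min_gain, min(max_gain, volume - 1.0))
```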
--- a/webui/i18n/zh.json
+++ b/webui/i18n/zh.json
@@ -1,6 +1,14 @@
 {
  "Language": "简体中文",
  "Translation": {
+    "Login Required": "需要登录",
+    "Please login to access settings": "请登录后访问配置设置 (:gray[默认用户名: admin, 密码: admin, 您可以在 config.toml 中修改])",
+    "Username": "用户名",
+    "Password": "密码",
+    "Login": "登录",
+    "Login Error": "登录错误",
+    "Incorrect username or password": "用户名或密码不正确",
+    "Please enter your username and password": "请输入用户名和密码",
    "Video Script Settings": "**文案设置**",
    "Video Subject": "视频主题（给定一个关键词，:red[AI自动生成]视频文案）",
    "Script Language": "生成视频脚本的语言（一般情况AI会自动根据你输入的主题语言输出）",
@@ -16,6 +24,13 @@
    "Video Concat Mode": "视频拼接模式",
    "Random": "随机拼接（推荐）",
    "Sequential": "顺序拼接",
+    "Video Transition Mode": "视频转场模式",
+    "None": "无转场",
+    "Shuffle": "随机转场",
+    "FadeIn": "渐入",
+    "FadeOut": "渐出",
+    "SlideIn": "滑入",
+    "SlideOut": "滑出",
    "Video Ratio": "视频比例",
    "Portrait": "竖屏 9:16（抖音视频）",
    "Landscape": "横屏 16:9（西瓜视频）",
@@ -26,6 +41,7 @@
    "Speech Region": "服务区域 (:red[必填，[点击获取](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Key": "API Key (:red[必填，密钥1 或 密钥2 均可 [点击获取](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/SpeechServices)])",
    "Speech Volume": "朗读音量（1.0表示100%）",
+    "Speech Rate": "朗读速度（1.0表示1倍速）",
    "Male": "男性",
    "Female": "女性",
    "Background Music": "背景音乐",
@@ -41,6 +57,7 @@
    "Top": "顶部",
    "Center": "中间",
    "Bottom": "底部（推荐）",
+    "Custom": "自定义位置（70，表示离顶部70%的位置）",
    "Font Size": "字幕大小",
    "Font Color": "字幕颜色",
    "Stroke Color": "描边颜色",
@@ -54,8 +71,8 @@
    "You can download the generated video from the following links": "你可以从以下链接下载生成的视频",
    "Basic Settings": "**基础设置** (:blue[点击展开])",
    "Language": "界面语言",
-    "Pexels API Key": "Pexels API Key ([点击获取](https://www.pexels.com/api/))",
-    "Pixabay API Key": "Pixabay API Key ([点击获取](https://pixabay.com/api/docs/#api_search_videos))",
+    "Pexels API Key": "Pexels API Key ([点击获取](https://www.pexels.com/api/)) :red[推荐使用]",
+    "Pixabay API Key": "Pixabay API Key ([点击获取](https://pixabay.com/api/docs/#api_search_videos)) :red[可以不用配置，如果 Pexels 无法使用，再选择Pixabay]",
    "LLM Provider": "大模型提供商",
    "API Key": "API Key (:red[必填，需要到大模型提供商的后台申请])",
    "Base Url": "Base Url (可选)",
@@ -73,6 +90,16 @@
    "Play Voice": "试听语音合成",
    "Voice Example": "这是一段测试语音合成的示例文本",
    "Synthesizing Voice": "语音合成中，请稍候...",
-    "TTS Provider": "语音合成提供商"
+    "TTS Provider": "语音合成提供商",
+    "TTS Servers": "TTS服务器",
+    "No voices available for the selected TTS server. Please select another server.": "当前选择的TTS服务器没有可用的声音，请选择其他服务器。",
+    "SiliconFlow API Key": "硅基流动API密钥 [点击获取](https://cloud.siliconflow.cn/account/ak)",
+    "SiliconFlow TTS Settings": "硅基流动TTS设置",
+    "Speed: Range [0.25, 4.0], default is 1.0": "语速范围 [0.25, 4.0]，默认值为1.0",
+    "Volume: Uses Speech Volume setting, default 1.0 maps to gain 0": "音量：使用朗读音量设置，默认值1.0对应增益0",
+    "Hide Log": "隐藏日志",
+    "Hide Basic Settings": "隐藏基础设置\n\n隐藏后，基础设置面板将不会显示在页面中。\n\n如需要再次显示，请在 `config.toml` 中设置 `hide_config = false`",
+    "LLM Settings": "**大模型设置**",
+    "Video Source Settings": "**视频源设置**"
  }
 }
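The new locale keys add a "Video Transition Mode" with the options None, Shuffle, FadeIn, FadeOut, SlideIn and SlideOut. The sketch below shows one plausible way such a setting could be resolved into a per-clip transition list. It is not the project's code. The enum, the `transitions_for` helper, and the assumption that "Shuffle" picks independently among the four concrete effects for each clip are all hypothetical.

```python
import random
from enum import Enum
from typing import List, Optional


class TransitionMode(str, Enum):
    # Values mirror the English keys in the locale file
    NONE = "None"
    SHUFFLE = "Shuffle"
    FADE_IN = "FadeIn"
    FADE_OUT = "FadeOut"
    SLIDE_IN = "SlideIn"
    SLIDE_OUT = "SlideOut"


# Assumption: "Shuffle" chooses among these concrete effects
CONCRETE = [
    TransitionMode.FADE_IN,
    TransitionMode.FADE_OUT,
    TransitionMode.SLIDE_IN,
    TransitionMode.SLIDE_OUT,
]


def transitions_for(
    n_clips: int, mode: TransitionMode, rng: Optional[random.Random] = None
) -> List[Optional[TransitionMode]]:
    """Hypothetical helper: return one transition per clip (None = hard cut)."""
    rng = rng or random.Random()
    if mode is TransitionMode.NONE:
        return [None] * n_clips
    if mode is TransitionMode.SHUFFLE:
        # Pick an effect independently for every clip
        return [rng.choice(CONCRETE) for _ in range(n_clips)]
    # A fixed mode applies the same effect to every clip
    return [mode] * n_clips
```

Passing a seeded `random.Random` makes "Shuffle" output reproducible, which is convenient for tests.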
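"Speech Rate" is presented to users as a multiplier ("1.0 means 1x speed"). Some TTS backends instead take a signed percentage string. For example, edge-tts's `Communicate` accepts `rate` values such as `"+0%"`. The following is a minimal sketch of a conversion, assuming a backend that expects that format. The helper name `rate_to_percent` is hypothetical.

```python
def rate_to_percent(rate: float) -> str:
    """Hypothetical helper: convert a speed multiplier (1.0 = normal)
    into a signed percentage string such as "+20%" or "-10%"."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # round() absorbs float noise, e.g. (1.2 - 1.0) * 100 == 19.999...
    percent = round((rate - 1.0) * 100)
    # "+d" always emits a sign, so 1.0 becomes "+0%"
    return f"{percent:+d}%"
```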
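The "Custom" subtitle position takes a percentage ("70 means 70% from the top"). The sketch below shows one possible interpretation of that percentage in pixels. It assumes the value is a fraction of the free vertical space (frame height minus subtitle height), so 0 puts the subtitle at the top and 100 at the bottom. The result is clamped so the subtitle stays inside the frame with a small margin. The function name and the 10-pixel margin are assumptions and do not come from the source.

```python
def custom_subtitle_y(
    video_height: int, subtitle_height: int, position_percent: float, margin: int = 10
) -> int:
    """Hypothetical helper: top y coordinate (pixels from the top edge)
    for a subtitle block placed at `position_percent` of the free space."""
    # Vertical room the subtitle block can move within
    available = video_height - subtitle_height
    y = available * position_percent / 100
    # Keep at least `margin` pixels between the block and either edge
    lower = margin
    upper = max(margin, available - margin)
    return int(max(lower, min(y, upper)))
```

For a 1920-pixel-tall portrait video with a 200-pixel subtitle block, `custom_subtitle_y(1920, 200, 70)` places the block's top edge at 1204 pixels.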
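The SiliconFlow strings give two rules. Speed is limited to [0.25, 4.0] with a default of 1.0. Volume is taken from the existing "Speech Volume" setting, and its default of 1.0 corresponds to a gain of 0. The sketch below honors both constraints. It assumes that gain is expressed in decibels and uses the standard amplitude-to-dB conversion 20·log10(volume), which maps 1.0 to 0. It also assumes gain bounds of ±10. The actual mapping and bounds used by the application or the SiliconFlow API may differ, and `siliconflow_params` is a hypothetical name.

```python
import math

SPEED_MIN, SPEED_MAX = 0.25, 4.0  # range stated in the UI text
GAIN_MIN, GAIN_MAX = -10.0, 10.0  # assumed gain bounds (dB), not from the source


def siliconflow_params(speed: float = 1.0, speech_volume: float = 1.0) -> dict:
    """Hypothetical helper: build clamped speed/gain values for a TTS request."""
    # Clamp speed into the documented [0.25, 4.0] range
    speed = min(max(speed, SPEED_MIN), SPEED_MAX)
    if speech_volume <= 0:
        # Silence has no finite dB value; fall back to the lowest gain
        gain = GAIN_MIN
    else:
        # Amplitude ratio -> decibels: 1.0 -> 0 dB, 2.0 -> about +6 dB
        gain = 20 * math.log10(speech_volume)
    gain = min(max(gain, GAIN_MIN), GAIN_MAX)
    return {"speed": speed, "gain": round(gain, 2)}
```

With the defaults, the function returns `{"speed": 1.0, "gain": 0.0}`, which matches the "1.0 maps to gain 0" note in the locale text.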
Author	SHA1	Message	Date
Harry	0bfec956c5	Merge pull request #658 from harry0703/dev bump version to 1.2.6	2025-05-10 14:14:42 +08:00
harry	fec3a8b6bd	Merge branch 'add-siliconflow-tts' into dev	2025-05-10 14:13:37 +08:00
harry	3108c2e4e5	perf: bump version to 1.2.6	2025-05-10 14:13:18 +08:00
Harry	d8dd1f1acf	Merge pull request #657 from harry0703/add-siliconflow-tts feat: update SiliconFlow API Key descriptions in localization files	2025-05-10 14:12:11 +08:00
Harry	208ea5c11b	Merge pull request #653 from yyhhyyyyyy/add-siliconflow-tts feat: Increase SiliconFlow TTS services.	2025-05-10 14:11:26 +08:00
harry	71d791a9af	feat: update SiliconFlow API Key descriptions in localization files	2025-05-10 14:10:42 +08:00
Harry	03a06f141c	Merge pull request #655 from harry0703/dev Dev	2025-05-10 13:27:27 +08:00
harry	4c9ac5e6df	feat: loop video clips to match audio duration	2025-05-10 13:26:24 +08:00
harry	4a64e211f9	fix: correct condition for subclipping	2025-05-10 12:35:45 +08:00
harry	97c631e696	feat: improve file extension parsing using pathlib	2025-05-10 12:34:53 +08:00
harry	a601705bf4	feat: add unit tests	2025-05-10 12:34:37 +08:00
yyhhyyyyyy	45f32756a3	feat: increase siliconflow TTS services	2025-05-09 23:31:04 +08:00
yyhhyyyyyy	22f47d90de	feat: add TTS services provider selection list	2025-05-09 22:14:43 +08:00
Harry	c03dc9c984	Merge pull request #652 from harry0703/dev perf: optimize memory usage and processing performance, bump version to 1.2.5	2025-05-09 20:56:14 +08:00
harry	7569c08a62	perf: bump version to 1.2.5	2025-05-09 20:55:36 +08:00
harry	f07e5802f7	perf: optimize memory usage and processing performance	2025-05-09 20:55:12 +08:00
Harry	ffcfe8e03b	Merge pull request #642 from harry0703/dev feat: remove voice filter	2025-05-08 18:10:16 +08:00
harry	35a7ef657a	feat: remove voice filter	2025-05-08 18:09:26 +08:00
Harry	250ec4f65c	Merge pull request #641 from harry0703/dev update	2025-05-08 17:39:44 +08:00
harry	5d0ffdad8a	feat: update README.md for clarity and remove outdated information	2025-05-08 17:39:16 +08:00
harry	95e4d3170d	feat: rename container names in docker-compose.yml	2025-05-08 17:35:12 +08:00
harry	dfa8328bb0	feat: optimize code	2025-05-08 17:34:51 +08:00
harry	5177c1871a	feat: comment out interline and size parameters in video.py	2025-05-08 17:34:09 +08:00
Harry	1901c2905b	Merge pull request #639 from harry0703/dev feat: remove streamlit_authenticator	2025-05-08 15:53:06 +08:00
harry	b312c52a33	feat: remove streamlit_authenticator	2025-05-08 15:51:33 +08:00
Harry	fb974cefcf	Merge pull request #638 from harry0703/dev bump version to 1.2.4	2025-05-08 15:45:00 +08:00
harry	c7f7fa12b4	feat: optimize code and bump version to 1.2.4	2025-05-08 15:44:07 +08:00
harry	6a19e2bb29	feat: update requirements.txt and config.example.toml	2025-05-08 15:40:46 +08:00
Harry	443f5bf61e	Merge pull request #632 from eren1106/fix-subtitle-bug Fix subtitle generation not working by setting the default subtitle provider to "edge"	2025-05-08 09:10:19 +08:00
Harry	7d00e9c768	Merge pull request #617 from garylab/main Solve subtitle header and footer was cut in some font family	2025-05-08 09:09:45 +08:00
Harry	c0ab0ba473	Merge pull request #614 from faycal-rakza/fix/comment fix(dockerfile): comment fix	2025-05-08 09:08:55 +08:00
Gary Meng	4b2f9e42d7	Merge branch 'harry0703:main' into main	2025-05-07 11:28:57 +04:00
eren	4ce32a8851	fix: set default subtitle provider to 'edge'	2025-05-01 14:35:23 +08:00
yyhhyyyyyy	47e4cff758	feat: Add PDM support with auth & i18n enhancements (#627 ) * feat: Add PDM support with auth & i18n enhancements 1. Added PDM project dependency management - Created pyproject.toml for dependency definitions - Added PDM lock file for reproducible builds - Created .pdm-python for virtual environment management 2. Enhanced authentication & configuration - Added user validation in base configuration - Implemented streamlit-authenticator for login functionality - Updated config.example.toml with user authentication fields 3. Improved internationalization (i18n) - Updated translation files for multiple languages (en, de, pt, vi, zh) - Enhanced i18n support in the web UI - Standardized translation structure across language files	2025-04-27 13:35:45 +08:00
Gary Meng	96e109e199	Solve subtitle header and footer was cut in some font family	2025-03-26 20:57:13 +04:00
Harry	36dffe8de3	Merge pull request #599 from bz-e/main refactor: Refactor the get_all_azure_voices function	2025-03-23 18:45:26 +08:00
Harry	6d2e4a8081	Merge pull request #603 from garymengcom/main Add get_all_tasks() endpoint and update .gitignore	2025-03-23 18:40:52 +08:00
faycal	a7c45b125f	fix(dockerfile): comment fix	2025-03-09 00:23:55 +01:00
Guozao Meng	6c2b5b8cf4	Update .gitignore	2025-03-08 22:54:10 +04:00
Guozao Meng	91e9f3900d	Add get_all_tasks() endpoint	2025-03-08 22:53:22 +04:00
evan.zhang5	ab1bd03f0b	refactor: Refactor the get_all_azure_voices function to reduce the amount of code by half	2025-02-27 17:31:32 +08:00
Harry	cd0cbc8061	Merge pull request #583 from iorikingdom/main Update requirements.txt	2025-02-10 11:08:23 +08:00
iorikingdom	c6c6390a83	Update requirements.txt	2025-02-09 02:26:43 +09:00
iorikingdom	6bfb9355cf	Update requirements.txt	2025-02-09 02:20:21 +09:00
harry	34d785a246	feat: remove wechat qrcode	2025-02-07 17:07:06 +08:00
harry	c9bd480514	fix: ModuleNotFoundError: No module named 'app'	2025-02-07 17:06:26 +08:00
Harry	5349f29415	Merge pull request #579 from vipinbihari/patch-1 Update video.py - Fixing BackGround Music Volume Multiplier	2025-02-05 14:53:04 +08:00
VIPIN BIHARI	6500cafa4f	Update video.py - Fixing BackGround Music Volume Multiplier These was a typo in MuiliplyVolume function parameter. The name of the parameter should be bgm_voice	2025-01-29 21:08:17 +05:30
yyhhyy	e2e92a433e	✨ feat: Add video transition effects (fadein, fadeout, slidein, slideout)	2025-01-23 12:13:04 +08:00
yyhhyyyyyy	dd90cfecbb	✨ feat: Added SlideIn and SlideOut video transition effects and optimized front-end implementation	2025-01-09 19:46:57 +08:00
yyhhyyyyyy	7a5b037ad8	✨ feat: Add video transition effects (fadein, fadeout)	2024-12-24 22:39:48 +08:00
Harry	ee0d2371d5	Merge pull request #554 from yyhhyyyyyy/llm-logic 🐛 fix: fix the LLM logic	2024-12-12 16:54:09 +08:00
yyhhyyyyyy	c4586d37f5	🎨 style: format llm.py code	2024-12-12 14:32:17 +08:00
yyhhyyyyyy	2d8cd23fe7	🐛 fix: fix the LLM logic	2024-12-12 14:29:14 +08:00
Harry	85d446e2d0	Merge pull request #552 from yyhhyyyyyy/code-cleanup 🎨 style: Format Code	2024-12-10 14:45:11 +08:00
yyhhyyyyyy	afd064e15d	🎨 style: Format Code	2024-12-10 10:34:56 +08:00
Harry	809d6cabbb	Merge pull request #548 from harry0703/dev feat: add feature request template	2024-12-06 15:48:01 +08:00
harry	8058eed9ab	feat: add feature request template	2024-12-06 15:47:04 +08:00
Harry	15ee6126a5	Merge pull request #547 from harry0703/dev feat: add issue template	2024-12-06 15:37:45 +08:00
harry	b6a7ea2756	feat: add issue template	2024-12-06 15:37:23 +08:00
Harry	63c3402c94	Update version to 1.2.2	2024-12-06 13:45:43 +08:00
Harry	5a6dd6c7a5	Merge pull request #541 from yyhhyyyyyy/update-requirements ⬆️ deps: Upgrade dependencies to latest versions and address minor issues	2024-12-05 11:02:14 +08:00
yyhhyy	8c226322a0	Merge branch 'main' into update-requirements	2024-12-05 10:59:41 +08:00
Harry	3a7888937f	Merge pull request #536 from Felix3322/main better requirements.txt	2024-12-05 10:47:26 +08:00
yyhhyyyyyy	6760a0ad00	📝 docs: Update documentation	2024-12-05 10:34:09 +08:00
yyhhyyyyyy	6288b70ae2	⬆️ deps: Upgrade dependencies to latest versions and address minor issues	2024-12-05 10:16:38 +08:00
Jiaying Liu	4adc010388	Update requirements.txt	2024-11-27 15:04:46 -05:00
Harry	162b5e17c3	Merge pull request #508 from flingjie/main allow api key empty when using ollama	2024-11-20 15:45:40 +08:00
Harry	0d43ba2124	Merge pull request #505 from LucasHenriqueDiniz/main feat: add PT-BR translation	2024-11-20 15:45:18 +08:00
Harry	080d8d82b4	Merge pull request #504 from Dreyfi/fix-403-error-pexels-request Fix the response 403 from pexels - search_videos_pexels - failed to download videos, maybe the network is not available. if you are in China, please use a VPN.	2024-11-20 15:44:46 +08:00
Harry	fc50e16bc5	Merge pull request #486 from FLY-Open-AI/main [Readme]Docker部署，启动命令优化。	2024-11-20 15:44:08 +08:00
Jie.F	345b6d59a1	allow api key empty when using ollama the ollama API key is not required	2024-10-08 09:44:39 +08:00
Dreyfi	4ec19fd56a	Add headers with user_agent to save_video request	2024-09-30 15:48:54 +10:00
Lucas Diniz	136630ec60	feat: add PT-BR translation	2024-09-29 19:30:12 -03:00
Dreyfi	9d3d99a595	Fix the response 403 from pexels search_videos_pexels - failed to download videos, maybe the network is not available. if you are in China, please use a VPN.	2024-09-28 16:25:53 +10:00
wangyanfei	747c745ec0	[Readme]Docker部署，启动命令优化。最新版的docker安装时会自动以插件的形式安装docker compose，启动命令调整为docker compose up	2024-08-31 07:22:05 +08:00
Harry	a53ca843e8	Merge pull request #467 from harry0703/dev update readme	2024-07-26 18:23:52 +08:00
harry	8b18d84d8a	update readme	2024-07-26 18:23:04 +08:00
Harry	edc4df6eb5	Merge pull request #466 from harry0703/dev fixed: subtitle generation failure	2024-07-26 17:56:32 +08:00
harry	5ed98d317c	fixed: subtitle generation failure	2024-07-26 17:55:26 +08:00
Harry	c22ef5f1d2	Merge pull request #462 from harry0703/dev update readme	2024-07-25 15:00:07 +08:00
harry	bcc9621976	update readme	2024-07-25 14:59:45 +08:00
Harry	6512e3f140	Merge pull request #461 from harry0703/dev Optimize memory usage in moviepy	2024-07-25 13:58:46 +08:00
harry	931e1a0caa	Optimize memory usage in moviepy Upgrade version number to 1.2.0	2024-07-25 13:57:39 +08:00
yyhhyy	84ae8e5248	Merge pull request #460 from yyhhyyyyyy/code-formatting Code Formatting	2024-07-25 13:39:05 +08:00
yyhhyyyyyy	5c2db3aa92	resolve issue with video concatenation order always being random	2024-07-25 13:36:21 +08:00
yyhhyyyyyy	905841965a	Format project code	2024-07-24 14:59:06 +08:00
Harry	bbd4e94941	Merge pull request #459 from yyhhyyyyyy/customize-subtitle-position feat: support custom subtitle positioning	2024-07-24 14:35:50 +08:00
yyhhyyyyyy	b89250874b	Change default value to 70.0	2024-07-24 14:31:56 +08:00
yyhhyyyyyy	e8b20c697d	feat: support custom subtitle positioning	2024-07-24 14:25:20 +08:00
Harry	e64041c93d	Merge pull request #458 from yyhhyyyyyy/refactor-task-add-subtitle-api Refactor task.py and add subtitle API	2024-07-24 11:47:27 +08:00
yyhhyyyyyy	17b4a61e64	1.Refactor task.py to encapsulate separable functions. 2.Add a new subtitle API.	2024-07-23 17:00:23 +08:00
Harry	6d520a4266	Merge pull request #453 from yyhhyyyyyy/fit-oneapi fit(oneapi):Fix the issue where model_name is always empty when using OneAPI as the LLM source.	2024-07-22 10:38:10 +08:00
yyhhyyyyyy	7ff8467f9d	Fix the issue where model_name is always empty when using OneAPI as the LLM source.	2024-07-20 09:36:19 +08:00
Harry	4cf9cefb5c	Merge pull request #450 from yyhhyyyyyy/fit-subtitle-correct fit(subtitle):Fix subtitle correction logic	2024-07-20 08:25:25 +08:00
yyhhyyyyyy	33534db8bb	1. .gitignore ignores the models folder 2. Fix subtitle correction logic	2024-07-19 15:00:17 +08:00
Harry	ec16f1c41b	Merge pull request #449 from harry0703/dev update readme	2024-07-19 14:21:56 +08:00
harry	9653d7d18a	update readme	2024-07-19 14:21:35 +08:00
Harry	36a367d713	Merge pull request #448 from yyhhyyyyyy/add-rate feat(azure_tts_v1): Allows to control the speed of speech generation.	2024-07-19 14:17:15 +08:00
yyhhyyyyyy	77b304537a	Speech Rate	2024-07-19 11:15:36 +08:00
yyhhyyyyyy	63fb848a17	1. Add azure_tts_v1 to control the speed of speech	2024-07-19 11:06:34 +08:00
Harry	6853163905	Merge pull request #447 from harry0703/dev update readme	2024-07-15 14:09:55 +08:00
harry	052c29b579	update readme	2024-07-15 14:09:33 +08:00
Harry	df62529f2a	Merge pull request #443 from harry0703/dev update readme	2024-07-09 13:41:04 +08:00
harry	934eff13ae	update readme	2024-07-09 13:40:43 +08:00
Harry	0472338184	Merge pull request #437 from harry0703/dev support baidu ERNIE llm	2024-07-03 21:13:51 +08:00
harry	66c81a04bf	support baidu ERNIE llm	2024-07-03 21:12:21 +08:00
Harry	8dd66cf624	Merge pull request #435 from harry0703/dev update readme	2024-07-02 10:00:53 +08:00
harry	dca23d99e4	update readme	2024-07-02 09:57:53 +08:00
Harry	42560cc7f5	Merge pull request #421 from harry0703/dev update readme	2024-06-21 11:01:41 +08:00
harry	11478063e7	update readme	2024-06-21 11:01:15 +08:00
Harry	bf0dbcc045	Merge pull request #414 from harry0703/dev update readme	2024-06-15 17:37:36 +08:00
harry	43df593ac3	update readme	2024-06-15 17:36:37 +08:00
Harry	7cf21c6541	Merge pull request #408 from harry0703/dev update readme	2024-06-11 11:50:48 +08:00
harry	f76f905833	update readme	2024-06-11 11:48:04 +08:00
Harry	0f27c26042	Merge pull request #399 from harry0703/dev update readme	2024-06-04 10:36:18 +08:00
harry	e1d7318cee	update readme	2024-06-04 10:34:32 +08:00
Harry	6408c31b7f	Merge pull request #391 from harry0703/dev update readme	2024-05-28 18:41:24 +08:00
harry	b0d694db08	update readme	2024-05-28 14:51:03 +08:00
Harry	730c2a461a	Merge pull request #381 from harry0703/dev update readme	2024-05-23 18:21:05 +08:00
harry	bdb49a4c82	update readme	2024-05-23 18:20:45 +08:00
Harry	a4692060a0	Merge pull request #372 from harry0703/dev enhanced exception handling for generating terms	2024-05-17 17:12:13 +08:00
harry	fc6844dd19	enhanced exception handling for generating terms	2024-05-17 17:11:35 +08:00
Harry	d740a6babd	Merge pull request #370 from harry0703/dev update readme	2024-05-17 08:44:01 +08:00
harry	9c58991830	update readme	2024-05-17 08:43:35 +08:00