issue/155: Server-side support for repetition_penalty #164
Open
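Unique token tracking (scripts/infer_task.py)
- `_unique_generated_tokens` set that tracks unique token IDs
- Updated incrementally whenever `next()` generates a new token
- `get_unique_previous_tokens()` returns a sorted array of the unique tokens

Batching layer (scripts/jiuge.py)
- `JiugeBatchedTask` collects the unique tokens from all tasks

C++ interface updates
- `inferBatchJiuge()` and `inferBatch()` now accept `previous_tokens_per_req` and `previous_tokens_len_per_req`
- `InferRequest` struct now includes the unique-token fields
- `inferDeviceBatch()` and `inferDeviceBatchPaged()` pass the unique tokens through
- `InferenceContext::randomSample()` accepts and forwards the unique tokens

Python bindings (scripts/libinfinicore_infer/jiuge.py)
- `inferBatchJiuge` argument types extended to include the unique-token arrays
- `infer_batch()` method signature

API server (scripts/launch_server.py)
- `--port` and `--host` arguments for server configuration
- `/models` endpoint
- `chat_template_kwargs` passthrough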
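
The tracking and batching steps described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class `UniqueTokenTracker` and the helpers `apply_repetition_penalty` and `collect_batch` are hypothetical names, and the penalty rule (divide positive logits by the penalty, multiply negative ones) is the common convention, assumed here rather than confirmed by the PR. Only `_unique_generated_tokens`, `next()`, and `get_unique_previous_tokens()` mirror names from the description.

```python
from typing import Iterable, List, Sequence, Tuple


class UniqueTokenTracker:
    """Hypothetical stand-in for the per-task tracking in infer_task.py."""

    def __init__(self) -> None:
        # Set of token IDs generated so far; duplicates collapse automatically.
        self._unique_generated_tokens = set()

    def next(self, token_id: int) -> None:
        # Incremental update: O(1) per newly generated token.
        self._unique_generated_tokens.add(int(token_id))

    def get_unique_previous_tokens(self) -> List[int]:
        # Sorted so the output is deterministic for a given history.
        return sorted(self._unique_generated_tokens)


def apply_repetition_penalty(
    logits: Sequence[float], previous_tokens: Iterable[int], penalty: float
) -> List[float]:
    """Assumed convention: shrink positive logits, push negative ones lower."""
    out = list(logits)  # copy so the caller's logits are not modified
    for tok in previous_tokens:
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def collect_batch(
    trackers: Sequence[UniqueTokenTracker],
) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical batching step: one token list and one length per request."""
    per_req = [t.get_unique_previous_tokens() for t in trackers]
    lens = [len(tokens) for tokens in per_req]
    return per_req, lens


if __name__ == "__main__":
    tracker = UniqueTokenTracker()
    for tok in [5, 3, 5, 9]:
        tracker.next(tok)
    print(tracker.get_unique_previous_tokens())
    print(apply_repetition_penalty([2.0, -1.0, 0.5, 4.0], [0, 1], 2.0))
```

Because each token is penalized once regardless of how often it appeared, passing only the unique token IDs is enough, which keeps the per-request arrays handed to the sampler small.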