Fix validate consistency #7679
base: develop
Thank you for contributing to the PaddlePaddle documentation. The docs preview is building; once Docs-New finishes, it can be previewed at: http://preview-pr-7679.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
Pull request overview
This PR fixes validation consistency issues in the PyTorch to PaddlePaddle API documentation conversion tools. The changes improve the API difference validation script, refactor the API discovery logic to recursively scan all markdown files, and reorganize API documentation by moving files from category-specific subdirectories to more appropriate locations.
Changes:
- Enhanced validation script with overloaded API support and optimized lookup performance using dictionary mapping
- Refactored API discovery to recursively scan all markdown files regardless of directory structure
- Reorganized API documentation files, moving transformers APIs from `torch_more_args`/`paddle_more_args`/`others` to `invok_only_diff`/`args_name_diff`/`composite_implement`
- Updated various API signatures with missing asterisks (*) to denote keyword-only parameters
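The last item concerns the bare `*` in a signature, which makes every parameter after it keyword-only. A minimal sketch of that rule in plain Python follows; `matrix_rank_like` and `keyword_only_names` are hypothetical names for illustration and are not part of this PR or of any library.

```python
import inspect


def matrix_rank_like(x, tol=None, hermitian=False, *, out=None):
    """Hypothetical stand-in: `out` follows the bare `*`, so it is keyword-only."""
    return (x, tol, hermitian, out)


def keyword_only_names(func):
    """Return the names of parameters that can only be passed by keyword."""
    params = inspect.signature(func).parameters.values()
    return [p.name for p in params if p.kind is inspect.Parameter.KEYWORD_ONLY]
```

With this shape, `matrix_rank_like(1, None, False, out=2)` is accepted, while passing `out` as a fourth positional argument raises `TypeError`. Omitting the `*` in a documented signature therefore changes what callers are told they may do.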
Reviewed changes
Copilot reviewed 153 out of 153 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| validate_api_difference_consistency.py | Added OVERLOADED_APIS dictionary, ALLOW_MISSING_DIFF_DOCS list, api_diff_map for O(1) lookup, optimized validation logic |
| get_api_difference_info.py | Refactored discover_all_metas to recursively scan all .md files with automatic library prefix detection |
| transformers.PretrainedConfig.md | Deleted from torch_more_args (moved to invok_only_diff) |
| transformers.GenerationConfig.md | Deleted from torch_more_args (moved to invok_only_diff) |
| transformers.AddedToken.md | Deleted from torch_more_args (moved to invok_only_diff) |
| torchvision.models.inception.*.md | Added new InceptionA-E documentation |
| torchvision.models.Inception3.md | Added new Inception3 documentation |
| torch.*.md (multiple) | Updated API signatures with asterisks for keyword-only parameters |
| transformers.*.md (multiple) | Changed references from paddlenlp to paddleformers |
| torch.nn.Module.*.md (multiple) | Deleted files moved from paddle_more_args/input_args_usage_diff to other categories |
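The two script changes in the table (recursive discovery of `.md` files and a dictionary for O(1) lookup) can be sketched roughly as below. This is an assumption-laden illustration, not the code in `get_api_difference_info.py` or `validate_api_difference_consistency.py`: the function names are hypothetical, and it assumes each file is named `<api name>.md`, with the library prefix being the text before the first dot.

```python
from pathlib import Path


def discover_api_docs(root):
    """Recursively map API name -> doc path for every .md file under root.

    Assumption: files are named "<api name>.md", e.g. "torch.std_mean.md".
    If the same API name appears in two directories, the later path wins.
    """
    api_diff_map = {}
    for path in sorted(Path(root).rglob("*.md")):
        api_diff_map[path.stem] = path  # stem drops only the final ".md"
    return api_diff_map


def library_prefix(api_name):
    """Return the leading package name, e.g. "torch" for "torch.std_mean"."""
    return api_name.split(".", 1)[0]
```

Building the dictionary once means each later check is a single `api_name in api_diff_map` test rather than a scan over every discovered document, and the result no longer depends on which category directory a file lives in.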
zhwesky2010
left a comment
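- Some documents do not conform to the difference-doc specification and need to follow pytorch_api_mapping_format_cn.md. CI has an automated tool that blocks non-conforming documents.
- One possible issue: some no_need_convert entries in paconvert have not been updated yet, and quite a few APIs were modified recently. We may need to test paconvert first and add all the APIs that are already aligned.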
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.std_mean.md
Outdated
.../guides/model_convert/convert_from_pytorch/api_difference/args_name_diff/torch.Tensor.std.md
Outdated
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.var_mean.md
Outdated
...s/model_convert/convert_from_pytorch/api_difference/input_args_type_diff/torch.block_diag.md
Outdated
..._convert/convert_from_pytorch/api_difference/input_args_type_diff/torch.broadcast_tensors.md
Outdated
...ides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.Tensor.round.md
Outdated
...des/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.Tensor.round_.md
Outdated
docs/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.baddbmm.md
Outdated
...el_convert/convert_from_pytorch/api_difference/torch_more_args/torch.nn.functional.kl_div.md
Outdated
| # functions currently. Currently, we hard code the check of overloaded functions | ||
| # in this file. | ||
| OVERLOADED_APIS = { |
Why is this list so long? Could it be added to the pre-commit whitelist?
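> Why is this list so long?

I removed the overloads that can be merged into a single signature.

> Could it be added to the pre-commit whitelist?

I don't think that is necessary; it is a bit more readable once formatted.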
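> > Why is this list so long?
>
> I removed the overloads that can be merged into a single signature.
>
> > Could it be added to the pre-commit whitelist?
>
> I don't think that is necessary; it is a bit more readable once formatted.

A whitelist only needs to run; it generally does not need to be readable. It should not take up so many lines, or it can be moved into a separate file.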
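There still seem to be some errors, yet CI passed. Both tools still have quite a few points to improve:
- the difference-doc format checker
- the difference-doc content checker
While fixing the existing documents, record the cases the tools miss or flag incorrectly; once the existing documents are fixed, we will work on improving the tools.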
.../guides/model_convert/convert_from_pytorch/api_difference/args_name_diff/torch.Tensor.var.md
| paddleformers.generation.LogitsProcessor(input_ids: paddle.Tensor, scores: paddle.Tensor) | ||
| ``` | ||
| The two have the same functionality but inconsistent parameter names; some parameter names differ, specifically as |
"具体如下:" (that is, end the sentence with "as follows:")
| torch.std_mean(input, dim, unbiased=True, keepdim=False) | ||
| torch.std_mean(input, dim=None, unbiased=True, keepdim=False, *, correction=None) | ||
| ``` | ||
| Returns the standard deviation and mean of a Tensor. PaddlePaddle currently has no corresponding API; the following code combination can be used to implement it. |
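Write it in this format:
Note: older versions of torch additionally overloaded the signature `torch.std_mean(input, unbiased=True)`; no conversion example is provided for that usage.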
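For readers unfamiliar with what the composite has to compute, here is a pure-Python sketch of the semantics for a flat list of numbers. It is not the Paddle code from the reviewed document; the function below is a hypothetical stand-in, and it assumes the usual meaning of `unbiased`: dividing by `n - 1` (Bessel's correction) when true and by `n` otherwise.

```python
import math


def std_mean(values, unbiased=True):
    """Return (standard deviation, mean) of a non-empty list of numbers.

    Illustrative only: operates on a flat list, with no `dim` or `keepdim`.
    """
    n = len(values)
    mean = sum(values) / n
    divisor = n - 1 if unbiased else n  # assumption: unbiased means n - 1
    variance = sum((v - mean) ** 2 for v in values) / divisor
    return math.sqrt(variance), mean
```

A composite implementation in another framework follows the same shape: compute the standard deviation and the mean separately with matching reduction arguments and return them as a pair.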
| ```python | ||
| # PyTorch usage | ||
| std, mean = torch.std_mean(x, dim=1) | ||
| std, mean = torch.std_mean(x, True) # torch allows unbiased to be passed as the second positional argument |
No need to include this here; it tends to confuse readers.
| ```python | ||
| # PyTorch usage | ||
| var, mean = torch.var_mean(x, dim=1) | ||
| var, mean = torch.var_mean(x, True) # torch allows unbiased to be passed as the second positional argument |
Use the same form as for std_mean.
| ## [ Only the API invocation differs ]transformers.LogitsProcessorList | ||
| ### [transformers.LogitsProcessorList](https://hf-mirror.com/docs/transformers/v4.42.0/en/internal/generation_utils#transformers.LogitsProcessorList) | ||
| ```python | ||
| transformers.LogitsProcessorList() |
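Does it take any parameters? If there are too many, write `**kwargs`; otherwise users may mistake it for a function that takes no arguments.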
| ```python | ||
| torch.linalg.matrix_rank(A, *, atol=None, rtol=None ,hermitian=False, out=None) | ||
| torch.linalg.matrix_rank(x, tol=None, hermitian=False, *, name=None) | ||
| torch.linalg.matrix_rank(x, tol=None, hermitian=False, *, out=None) |
TypeError: linalg_matrix_rank() received an invalid combination of arguments - got (), but expected one of:
* (Tensor input, *, Tensor atol = None, Tensor rtol = None, bool hermitian = False, Tensor out = None)
* (Tensor input, *, float atol = None, float rtol = None, bool hermitian = False, Tensor out = None)
* (Tensor input, Tensor tol, bool hermitian = False, *, Tensor out = None)
* (Tensor input, float tol, bool hermitian = False, *, Tensor out = None)
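The error message lists several accepted overloads, which is the case a hard-coded overload table has to cover. The sketch below shows one way a validator could accept a documented signature when it matches any registered overload. The table contents, the `normalize` helper and the function name are assumptions for illustration; they are transcribed loosely from the error message above and are not the `OVERLOADED_APIS` dictionary in this PR.

```python
# Hypothetical overload table; entries paraphrase the error message above.
OVERLOADED_APIS = {
    "torch.linalg.matrix_rank": [
        "torch.linalg.matrix_rank(input, *, atol=None, rtol=None, hermitian=False, out=None)",
        "torch.linalg.matrix_rank(input, tol, hermitian=False, *, out=None)",
    ],
}


def normalize(signature):
    """Drop all whitespace so that spacing differences do not cause mismatches."""
    return "".join(signature.split())


def matches_known_overload(api_name, documented_signature, overloads=OVERLOADED_APIS):
    """Return True if the documented signature equals any registered overload."""
    known = overloads.get(api_name, [])
    return normalize(documented_signature) in {normalize(s) for s in known}
```

Comparing whitespace-normalized strings is deliberately strict: a signature that swaps `out=None` for `name=None` is rejected, while one that only differs in spacing, such as `rtol=None ,hermitian=False`, still matches.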
| ```python | ||
| torch.linalg.matrix_rank(A, *, atol=None, rtol=None ,hermitian=False, out=None) | ||
| torch.linalg.matrix_rank(x, tol=None, hermitian=False, *, name=None) | ||
| torch.linalg.matrix_rank(x, tol=None, hermitian=False, *, out=None) |
Are these signatures correct?
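| | inception_blocks | - | Inception modules used to build the network. Paddle has no such parameter; there is currently no conversion. | | ||
| | init_weights | - | Whether to initialize the weights. Paddle has no such parameter; there is currently no conversion. | | ||
| | dropout | - | Dropout probability. Paddle has no such parameter; there is currently no conversion. | | ||
| | - | with_pool | Whether to apply pooling before the final fully connected layer. Paddle-specific parameter. | |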
How should this be handled on the Paddle side?
This PR fixes inconsistencies between the difference docs and the mapping rules in PaConvert, mainly involving the following categories