Copilot AI commented Oct 6, 2025

Implemented pinyin grouping functions in two variants for different use cases:

New Functions

1. pinyin_group() and lazy_pinyin_group()

  • pinyin_group: Returns pinyin as a list (supports multi-pronunciation with heteronym=True)
  • lazy_pinyin_group: Returns pinyin as a string (single pronunciation only, simpler output)

Both functions:

  • Segment input text and group by words
  • Each group returns hanzi and corresponding pinyin
  • Handle erhua (儿化音) by merging 儿 into the preceding syllable
  • Insert an apostrophe where joined pinyin would otherwise be ambiguous (e.g., xi'an)
  • Accept either a unicode string or a list of strings as input
    • String input: performs segmentation
    • List input: skips segmentation (allows custom segmentation)
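
The joining rules listed above can be sketched in a few lines. This is an illustration only, not the code added in this PR: `join_group` and `group_words` are hypothetical helpers, the input is assumed to be already segmented into words with one syllable per character, and the space separator mirrors the 'ni hao' output shown in this comment. Treating every toneless "er" on a non-initial 儿 as erhua is a simplification.

```python
# Hypothetical helpers for illustration only; not the code added in this PR.

# Syllables that start with a, o, or e (with or without a tone mark) take an
# apostrophe when they follow another syllable, e.g. xi + an -> xi'an.
_APOSTROPHE_INITIALS = set("aoeāáǎàōóǒòēéěè")


def join_group(hanzi, syllables, sep=" "):
    """Join the per-character syllables of one word into one pinyin string."""
    result = ""
    for index, (char, syllable) in enumerate(zip(hanzi, syllables)):
        if index == 0:
            result = syllable
        elif char == "儿" and syllable in ("er", "r"):
            # Erhua: fold 儿 into the previous syllable (hua + er -> huar).
            result += "r"
        elif syllable[0] in _APOSTROPHE_INITIALS:
            # Ambiguous boundary: mark it with an apostrophe instead of sep.
            result += "'" + syllable
        else:
            result += sep + syllable
    return result


def group_words(words):
    """Turn (hanzi, syllables) pairs into the string-shaped group dicts."""
    return [{"hanzi": h, "pinyin": join_group(h, s)} for h, s in words]


print(group_words([("西安", ["xi", "an"]), ("花儿", ["hua", "er"])]))
```

Passing pre-segmented words here corresponds to the list-input path described above, where segmentation is skipped and the caller decides the word boundaries.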

Implementation Details

Architecture (following existing patterns)

  • Added Pinyin.pinyin_group() and Pinyin.lazy_pinyin_group() methods in the Pinyin class
  • Module-level functions call the respective methods
  • Follows the same pattern as pinyin() / lazy_pinyin() functions
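
As a rough picture of the delegation described above, here is a toy sketch in which a module-level function forwards to a method on a class. All names (`ToyPinyin`, `toy_lazy_pinyin_group`, `_TOY_TABLE`) are made up for this example, the lookup table stands in for the real conversion logic, and how pypinyin actually constructs its `Pinyin` object is not shown in this comment.

```python
# Toy illustration of "module-level function calls the class method".
# Every name here is hypothetical; this is not pypinyin's code.

_TOY_TABLE = {"你": "ni", "好": "hao", "吗": "ma"}


class ToyPinyin:
    """Stand-in for a converter class that owns the grouping logic."""

    def __init__(self, table):
        self._table = table

    def lazy_pinyin_group(self, words):
        groups = []
        for word in words:
            # Characters missing from the table (e.g. punctuation) get no pinyin.
            syllables = [self._table[c] for c in word if c in self._table]
            groups.append({"hanzi": word, "pinyin": " ".join(syllables)})
        return groups


def toy_lazy_pinyin_group(words):
    """Module-level convenience wrapper that delegates to the method."""
    return ToyPinyin(_TOY_TABLE).lazy_pinyin_group(words)


print(toy_lazy_pinyin_group(["你好", "吗", "?"]))
```

Keeping the logic on the class and exposing a thin wrapper means callers who need a customized converter can use the method directly, while everyone else uses the function.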

Key Differences

pinyin_group:

[{'hanzi': '你好', 'pinyin': ['ni hao']}]  # pinyin is a list

lazy_pinyin_group:

[{'hanzi': '你好', 'pinyin': 'ni hao'}]  # pinyin is a string
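
To make the difference between the two shapes concrete, the sketch below converts the list-shaped output into the string-shaped one. `to_lazy_shape` is a hypothetical helper, not part of this PR, and it assumes that the first entry of each list is the reading to keep and that a group without pinyin maps to an empty string.

```python
# Hypothetical helper; assumes the first list entry is the preferred reading.
def to_lazy_shape(groups):
    """Convert list-shaped groups into string-shaped groups."""
    lazy = []
    for group in groups:
        readings = group["pinyin"]
        lazy.append({
            "hanzi": group["hanzi"],
            # An empty list (e.g. for punctuation) becomes an empty string.
            "pinyin": readings[0] if readings else "",
        })
    return lazy


print(to_lazy_shape([{"hanzi": "你好", "pinyin": ["ni hao"]}]))
```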

Use Cases

  • pinyin_group: When you need multi-pronunciation support or consistency with other functions
  • lazy_pinyin_group: Simpler output for common cases where single pronunciation is sufficient
  • HTML <ruby> tag integration for annotating Chinese text
  • Handling erhua and apostrophe display requirements
  • Educational applications with grouped pinyin display
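
For the <ruby> use case, a minimal sketch of how the string-shaped groups could be rendered as HTML. `to_ruby_html` is a hypothetical helper written for this example; it assumes each group is a dict with 'hanzi' and a string 'pinyin', and that groups without pinyin (such as punctuation) are emitted as plain text.

```python
from html import escape


def to_ruby_html(groups):
    """Render string-shaped groups as HTML <ruby> annotations."""
    parts = []
    for group in groups:
        # quote=False keeps apostrophes such as the one in xi'an readable;
        # quotes do not need escaping inside element content.
        hanzi = escape(group["hanzi"], quote=False)
        pinyin = group["pinyin"]
        if pinyin:
            parts.append(
                "<ruby>{}<rp>(</rp><rt>{}</rt><rp>)</rp></ruby>".format(
                    hanzi, escape(pinyin, quote=False)
                )
            )
        else:
            # No pinyin: emit the text without an annotation.
            parts.append(hanzi)
    return "".join(parts)


print(to_ruby_html([
    {"hanzi": "西安", "pinyin": "xi'an"},
    {"hanzi": "?", "pinyin": ""},
]))
```

The `<rp>` elements provide parenthesized fallback text for browsers that do not render ruby annotations.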

Testing

  • All existing tests pass (229 passed, 2 skipped)
  • Added 22 comprehensive tests covering both functions
    • Basic functionality, punctuation handling
    • Erhua and apostrophe handling
    • List input support
    • Method existence verification
    • Comparison tests between the two variants

Exports

Both functions are exported from pypinyin/__init__.py:

  • pinyin_group
  • lazy_pinyin_group

Type hints added to .pyi files for both functions and methods.

Closes #245

Original prompt

This section details the original issue you should resolve

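<issue_title>Add a function that outputs results in groups</issue_title>
<issue_description>## Problem description

Add a function that outputs results in groups (grouped by word, by erhua, and by pinyin joined with an apostrophe separator):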

>>> xxx('你好吗?')
[
   {
      "hanzi": "你好",
      "pinyin": ["ní hǎo"],
   },
   {
      "hanzi": "吗",
      "pinyin": ["ma"],
   },
   {
      "hanzi": "?",
      "pinyin": [],
   },
]

>>> xxx('西安')
[
   {
      "hanzi": "西安",
      "pinyin": ["xi'an"],
   },
]

>>> xxx('花儿')
[
   {
      "hanzi": "花儿",
      "pinyin": ["huar"],
   },
]

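Use cases:

  • Together with the <ruby> tag, display hanzi annotated with pinyin: 你好 (níhǎo)(ma) (that said, for this scenario it is better to obtain the pinyin in the frontend with a JavaScript library)

  • Cover display scenarios that involve erhua and joined pinyin such as xi'an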

#245
</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #280

@coderabbitai

coderabbitai bot commented Oct 6, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI and others added 2 commits October 6, 2025 08:56
Co-authored-by: mozillazg <485054+mozillazg@users.noreply.github.com>
Co-authored-by: mozillazg <485054+mozillazg@users.noreply.github.com>
Copilot AI changed the title [WIP] 新增一个函数用于实现对结果进行分组输出 Add pinyin_group function for grouped pinyin output with erhua and apostrophe handling Oct 6, 2025
Copilot AI requested a review from mozillazg October 6, 2025 09:03
@mozillazg
Owner

@copilot

…up method

Co-authored-by: mozillazg <485054+mozillazg@users.noreply.github.com>
Copilot AI requested a review from mozillazg October 6, 2025 13:37
@mozillazg
Owner

@copilot

Co-authored-by: mozillazg <485054+mozillazg@users.noreply.github.com>
Copilot AI changed the title Add pinyin_group function for grouped pinyin output with erhua and apostrophe handling Add pinyin_group and lazy_pinyin_group functions for grouped pinyin output with erhua and apostrophe handling Oct 6, 2025
Copilot AI requested a review from mozillazg October 6, 2025 13:54
dependabot bot and others added 7 commits November 15, 2025 19:28
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Fix flake8 errors: E129 (visual indent), E501 (line too long), F401 (unused import).
@mozillazg mozillazg changed the title Add pinyin_group and lazy_pinyin_group functions for grouped pinyin output with erhua and apostrophe handling Add pinyin_group and lazy_pinyin_group functions for grouped pinyin output with apostrophe handling Jan 2, 2026
@mozillazg
Owner

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

* chore: migrate project management and CI/CD to uv

- Add pyproject.toml and uv.lock for uv-based dependency management.
- Update Makefile to use 'uv build' for package building.
- Update .github/workflows/ci.yml to use astral-sh/setup-uv and uv commands.
- Update .circleci/config.yml to install and use uv for testing.
- Add uv installation instructions to README.rst and README_en.rst.

* fix ci

* no longer support pypy2

* use astral-sh/setup-uv@v7