fix: resolve GBK encoding errors on Windows for Chinese content #610

Open
Mars-ending wants to merge 1 commit into OpenBMB:main from Mars-ending:fix-gbk-encoding

@Mars-ending

Problem

On Windows systems with a Chinese locale, Python's stdout uses GBK encoding by default. This causes UnicodeEncodeError when:

  1. Model responses contain characters that GBK cannot encode (emoji and some CJK characters)
  2. Logs are written to files via FileHandler without an explicit encoding

Error example:
'gbk' codec can't encode character '\U0001f4d6' in position 1189
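
The failure can be reproduced on any platform by encoding to GBK directly. This is a minimal sketch, not code from the PR: `gbk_failure` is a hypothetical helper name, and the sample strings are illustrative. Common Chinese text encodes fine in GBK, while the emoji from the error above does not.

```python
def gbk_failure(text):
    """Return the first character GBK cannot encode, or None if all encode."""
    try:
        text.encode("gbk")
    except UnicodeEncodeError as exc:
        # exc.object is the input string, exc.start the failing index
        return exc.object[exc.start]
    return None


# Common Chinese characters are inside GBK's repertoire.
print(gbk_failure("你好"))
# U+1F4D6 (open book emoji) is not, which matches the error message above.
print(ascii(gbk_failure("阅读 \U0001f4d6")))
```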

Changes

  1. server_main.py:

    • Wrap sys.stdout/stderr with UTF-8 TextIOWrapper on Windows
    • Add encoding='utf-8' to FileHandler for server.log
  2. utils/structured_logger.py:

    • Add encoding='utf-8' to FileHandler for workflow logs
  3. utils/logger.py:

    • Wrap print() in try/except for UnicodeEncodeError fallback
    • On encoding error, strip problematic characters gracefully
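
The three changes above could look roughly like the following. This is a sketch under assumptions, not the PR's actual diff: the helper names `force_utf8_stdio`, `make_file_handler`, and `safe_print` are hypothetical, and the choice of `errors="replace"` for the wrapped streams is an assumption.

```python
import io
import logging
import sys


def force_utf8_stdio():
    """Re-wrap sys.stdout/sys.stderr as UTF-8 on Windows (hypothetical helper)."""
    if sys.platform != "win32":
        return
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name)
        buffer = getattr(stream, "buffer", None)
        if buffer is None:
            # Stream was replaced by something without a binary buffer; skip it.
            continue
        wrapped = io.TextIOWrapper(
            buffer, encoding="utf-8", errors="replace", line_buffering=True
        )
        setattr(sys, name, wrapped)


def make_file_handler(path):
    """FileHandler with an explicit encoding instead of the locale default."""
    return logging.FileHandler(path, encoding="utf-8")


def safe_print(message, stream=None):
    """print() with a fallback that drops characters the stream cannot encode."""
    stream = stream or sys.stdout
    try:
        print(message, file=stream)
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        # errors="ignore" strips unencodable characters instead of raising
        cleaned = message.encode(encoding, errors="ignore").decode(encoding)
        print(cleaned, file=stream)
```

In this sketch `force_utf8_stdio()` would be called once at startup, before any logging handlers capture `sys.stderr`. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` is an alternative to re-wrapping the stream.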

Testing

Verified on Windows 11 with a Chinese locale:

  • Workflow with Chinese task prompts now completes without encoding errors
  • Generated files correctly contain Unicode characters (CJK, emoji)

---
Co-Authored-By: Claude <noreply@anthropic.com>