fix(docx): properly escape special markdown characters at start of lines #2160

Open
saitejabandaru-in wants to merge 1 commit into microsoft:main from saitejabandaru-in:fix-docx-escaping
@saitejabandaru-in

What does this PR do?

Fixes an issue where DOCX conversion with markitdown does not escape Markdown special characters at the beginning of a line (e.g. #, >, +). As a result, plain text that happens to start with one of these characters is rendered as a Markdown heading, blockquote, or list item.

Fixes #2157

Changes

Added escaping logic to process_text in _markdownify.py that detects Markdown block characters at the start of a line and escapes them. Also added a pytest test to verify that strings starting with # are escaped correctly.
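
For readers who want a concrete picture of the technique, here is a minimal sketch of line-start escaping. It is not the code from this PR: the helper name `escape_line_start` and the regular expression are hypothetical, and the set of markers handled (`#`, `>`, `+`, `-`, `*`) is an assumption based on the examples in the description.

```python
import re

# Hypothetical pattern, not taken from the PR. It matches a block marker
# at the start of a line (after up to three spaces of indentation):
#   - 1 to 6 '#' followed by whitespace or end of line (ATX heading)
#   - '>' (blockquote, which needs no trailing space)
#   - '+', '-' or '*' followed by whitespace or end of line (bullet list)
_LINE_START_MARKER = re.compile(
    r"^( {0,3})(#{1,6}(?=[ \t]|$)|>|[+*-](?=[ \t]|$))",
    re.MULTILINE,
)


def escape_line_start(text: str) -> str:
    """Prefix line-starting Markdown block markers with a backslash."""
    # \1 keeps the indentation, \\ emits a literal backslash,
    # \2 re-emits the marker itself.
    return _LINE_START_MARKER.sub(r"\1\\\2", text)


def test_escape_line_start():
    # A pytest-style check in the spirit of the test described above.
    assert escape_line_start("# not a heading") == "\\# not a heading"
    assert escape_line_start("#hashtag") == "#hashtag"
```

Only the first character of the marker is escaped, which is enough to stop a Markdown parser from treating the line as a block element. Text such as `#hashtag` or `+1` is left alone because the marker is not followed by whitespace.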

@saitejabandaru-in

Author

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

converting docx files to markdown with markitdown.exe does not escape special characters