MCP mode: AI assistant as OCR driver, no separate LLM configuration needed #2165

Open
mambo-wang wants to merge 12 commits into microsoft:main from mambo-wang:pr_mcp

Conversation

@mambo-wang

Summary

  • Remove project-specific files (openspec configs, repowiki docs, QoderWork commands/skills, examples, banner image) that were accidentally included
  • Update README.md
  • Translate markitdown-convert SKILL.md to English

Changes

  • Deleted .qoderwork/, openspec/, repowiki/, examples/, PROJECT_GUIDE.md, markitdown-mcp-banner.png
  • Updated skills/markitdown-convert/SKILL.md (Chinese to English)
  • Updated README.md

…M dependency

Add extract_only mode across all packages, enabling AI assistants (CodeBuddy/Qoder/QoderWork)
to drive OCR and image analysis with their own vision capabilities via a file side-channel pattern.

- Core: MarkItDown(extract_only=True) propagates to all converters, extracts images to disk
- ImageConverter: extract_only branch saves images with metadata comments
- OCR plugin: all converters (PDF/DOCX/PPTX/XLSX) support extract_only with _convert_extract_only()
- MCP server: new analyze_document tool returns text skeleton + image manifest as JSON
- MCP deps: changed from markitdown[all] to markitdown (core) for Python 3.14 compatibility
- Added openspec proposal/design/specs/tasks and repowiki documentation
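
The file side-channel pattern described above can be sketched roughly as follows. This is a stdlib-only illustration, not the code from this PR: the function name `analyze_document_sketch`, its parameters, and the JSON field names (`text`, `images`, `name`, `path`, `bytes`) are all assumptions made for the example. The idea is that images are written to disk and the tool returns only the text skeleton plus a manifest of file paths, so the assistant can open each image with its own vision capabilities.

```python
import json
import tempfile
from pathlib import Path


def analyze_document_sketch(text: str, images: dict[str, bytes], out_dir: Path) -> str:
    """Hypothetical stand-in for an analyze_document-style tool.

    Writes each image to out_dir (the file side channel) and returns a JSON
    string holding the text skeleton plus a manifest of the saved files.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for name, data in images.items():
        path = out_dir / name
        path.write_bytes(data)  # the image travels via disk, not via the JSON payload
        manifest.append(
            {"name": name, "path": str(path.resolve()), "bytes": len(data)}
        )
    return json.dumps({"text": text, "images": manifest}, indent=2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = analyze_document_sketch(
            "# Report\n\n![figure](figure1.png)\n",
            {"figure1.png": b"\x89PNG placeholder bytes"},  # made-up image data
            Path(tmp),
        )
        print(result)
```

Keeping image bytes out of the JSON response keeps the tool result small; the caller decides which of the listed files are worth reading.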
Relax the mcp version constraint from ~=1.8.0 to >=1.8.0 to support the latest
MCP SDK, 1.28.0 (protocol version 2025-11-25). All imports and API calls were
verified to be compatible.
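
To illustrate why the old constraint excluded 1.28.0: under PEP 440, `~=1.8.0` is a compatible-release specifier, equivalent to `>=1.8.0, ==1.8.*`. The sketch below is a simplified stdlib-only check, not how pip evaluates specifiers. The helper names are hypothetical, and it assumes plain numeric versions with the same number of components.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '1.28.0' into integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(version: str, base: str) -> bool:
    """Simplified '~=base': at least base, and equal in all but the last component."""
    v, b = parse(version), parse(base)
    return v >= b and v[: len(b) - 1] == b[:-1]


def satisfies_minimum(version: str, base: str) -> bool:
    """Simplified '>=base'."""
    return parse(version) >= parse(base)


if __name__ == "__main__":
    print(satisfies_compatible_release("1.8.1", "1.8.0"))   # True
    print(satisfies_compatible_release("1.28.0", "1.8.0"))  # False
    print(satisfies_minimum("1.28.0", "1.8.0"))             # True
```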
- README.md: restructured as Chinese fork README with actual project state,
  covering AI assistant-driven OCR, extract_only mode, MCP server config,
  supported formats, installation, and usage examples
- PROJECT_GUIDE.md: new document explaining the motivation (eliminate external
  LLM dependency), architecture (file side-channel + two-phase workflow),
  implementation details, and step-by-step usage guide
- assistant-orchestration skill: two-phase workflow for AI assistant-driven OCR
- openspec workflow commands: propose, explore, apply-change, archive-change
Without [all] extras, converting pptx/docx/pdf/xlsx etc. via MCP
would fail with MissingDependencyException.
MCP Server (markitdown-mcp):
- Add ocr_image tool for single-image text extraction
- Add extract_images option to convert_to_markdown
- Support optional LLM client via MARKITDOWN_LLM_* env vars
- Enable plugins by default (previously disabled)
- Normalize Windows short paths (8.3 names) for tool compatibility
- Resolve relative image refs to absolute disk paths
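
As one possible reading of the last item above, resolving relative image references to absolute disk paths could look like the sketch below. It is not taken from this PR: the function name `resolve_image_refs`, the regular expression, and the choice to leave URLs and already-absolute paths untouched are assumptions for illustration.

```python
import re
from pathlib import Path

# Matches Markdown image syntax: ![alt](target), with no spaces in the target.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
# Matches targets that start with a URI scheme such as https: or data:.
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def resolve_image_refs(markdown: str, base_dir: Path) -> str:
    """Rewrite relative image targets as absolute paths under base_dir."""

    def replace(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        # Leave URLs, data URIs, and absolute paths as they are.
        if HAS_SCHEME.match(target) or Path(target).is_absolute():
            return match.group(0)
        absolute = (base_dir / target).resolve().as_posix()
        return f"![{alt}]({absolute})"

    return IMAGE_REF.sub(replace, markdown)


if __name__ == "__main__":
    text = "![chart](images/chart.png) and ![logo](https://example.com/logo.png)"
    print(resolve_image_refs(text, Path.cwd()))
```

Absolute paths let a tool that runs in a different working directory open the extracted files without knowing where the conversion happened.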

OCR Plugin bug fixes:
- Fix duplicate image_output_dir kwarg in extract_only mode
  for pptx, docx, and xlsx converters
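
The PR text does not show the offending code, so the following is only a generic illustration of how a duplicate keyword argument typically arises in Python and one common way to avoid it. All function names here are hypothetical; only the `image_output_dir` keyword comes from the commit message.

```python
def save_images(source: str, image_output_dir: str, **options) -> str:
    """Hypothetical helper standing in for a converter's image export step."""
    return f"{source} -> {image_output_dir}"


def convert_buggy(source: str, **kwargs) -> str:
    # Bug pattern: image_output_dir is passed explicitly and may also be
    # present in kwargs, so Python sees the keyword twice.
    return save_images(
        source,
        image_output_dir=kwargs.get("image_output_dir", "images"),
        **kwargs,
    )


def convert_fixed(source: str, **kwargs) -> str:
    # Fix: pop the key from a copy first, so it is forwarded exactly once.
    options = dict(kwargs)
    image_output_dir = options.pop("image_output_dir", "images")
    return save_images(source, image_output_dir=image_output_dir, **options)


if __name__ == "__main__":
    try:
        convert_buggy("deck.pptx", image_output_dir="out")
    except TypeError as exc:
        print(f"buggy call failed: {exc}")
    print(convert_fixed("deck.pptx", image_output_dir="out"))
```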

Optional dependency: pip install markitdown-mcp[ocr]
Add MarkItDown-MCP banner image for use in blog posts and documentation.
@mambo-wang changed the title from "Clean up project-specific artifacts and update skill docs" to "MCP mode: AI assistant as OCR driver, no separate LLM configuration needed" on Jun 26, 2026
@mambo-wang

Author

@microsoft-github-policy-service agree company="New H3C"
