MCP mode: AI assistant as OCR driver, no separate LLM configuration needed #2165
Open
mambo-wang wants to merge 12 commits into
…M dependency

Add extract_only mode across all packages, enabling AI assistants (CodeBuddy/Qoder/QoderWork) to drive OCR and image analysis using their own vision capabilities via a file side-channel pattern.

- Core: MarkItDown(extract_only=True) propagates to all converters, extracts images to disk
- ImageConverter: extract_only branch saves images with metadata comments
- OCR plugin: all converters (PDF/DOCX/PPTX/XLSX) support extract_only with _convert_extract_only()
- MCP server: new analyze_document tool returns text skeleton + image manifest as JSON
- MCP deps: changed from markitdown[all] to markitdown (core) for Python 3.14 compatibility
- Added openspec proposal/design/specs/tasks and repowiki documentation
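To make the file side-channel pattern concrete, here is a minimal sketch, not the code in this PR. Images are written to disk, and the caller gets back a JSON payload holding a text skeleton plus an image manifest. The function name `build_analysis_payload`, the JSON keys, and the placeholder comment format are all hypothetical and may not match what `analyze_document` actually returns.

```python
import json
import tempfile
from pathlib import Path


def build_analysis_payload(text_blocks, images, output_dir):
    """Write images to disk and return a JSON text skeleton plus image manifest.

    Hypothetical helper: the name, JSON keys, and placeholder format are
    assumptions for illustration, not the PR's actual schema.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    skeleton = list(text_blocks)
    manifest = []
    for index, (name, data) in enumerate(images, start=1):
        path = out / name
        # Side channel: the image bytes go to disk, not into the reply.
        path.write_bytes(data)
        manifest.append(
            {"id": index, "path": str(path.resolve()), "bytes": len(data)}
        )
        # Leave a marker in the text that points at the manifest entry.
        skeleton.append(f"<!-- image {index}: {name} -->")

    return json.dumps({"text": "\n".join(skeleton), "images": manifest}, indent=2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(
            build_analysis_payload(
                ["# Scanned report"], [("page1.png", b"\x89PNG")], tmp
            )
        )
```

In a two-phase workflow of this kind, the assistant would read the JSON first, then open each listed path with its own vision capability and fill in the text for the matching marker.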
Relax mcp version constraint from ~=1.8.0 to >=1.8.0 to support latest MCP SDK 1.28.0 (protocol version 2025-11-25). All imports and API calls verified compatible.
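The practical difference between the two specifiers can be sketched with plain tuple comparison. This is a simplified illustration that assumes purely numeric `X.Y.Z` versions and does not implement full PEP 440; the helper names are hypothetical.

```python
def parse(version):
    """Turn '1.28.0' into (1, 28, 0). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(version, base):
    """Approximate '~=base': at least base, with all but the last segment fixed."""
    v, b = parse(version), parse(base)
    return v >= b and v[: len(b) - 1] == b[:-1]


def satisfies_minimum(version, base):
    """Approximate '>=base': any version at or above base."""
    return parse(version) >= parse(base)


if __name__ == "__main__":
    # '~=1.8.0' pins the 1.8 series, so SDK 1.28.0 is rejected.
    print(satisfies_compatible_release("1.28.0", "1.8.0"))  # False
    # '>=1.8.0' accepts any later release, including 1.28.0.
    print(satisfies_minimum("1.28.0", "1.8.0"))  # True
```

The trade-off is that `>=` also admits future releases that have not been verified, so the compatibility check mentioned in the commit message only covers the versions tested at the time.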
- README.md: restructured as Chinese fork README with actual project state, covering AI assistant-driven OCR, extract_only mode, MCP server config, supported formats, installation, and usage examples
- PROJECT_GUIDE.md: new document explaining the motivation (eliminate external LLM dependency), architecture (file side-channel + two-phase workflow), implementation details, and step-by-step usage guide
- assistant-orchestration skill: two-phase workflow for AI assistant-driven OCR
- openspec workflow commands: propose, explore, apply-change, archive-change
Without [all] extras, converting pptx/docx/pdf/xlsx etc. via MCP would fail with MissingDependencyException.
MCP Server (markitdown-mcp):
- Add ocr_image tool for single-image text extraction
- Add extract_images option to convert_to_markdown
- Support optional LLM client via MARKITDOWN_LLM_* env vars
- Default plugins to enabled (was false)
- Normalize Windows short paths (8.3 names) for tool compatibility
- Resolve relative image refs to absolute disk paths

OCR Plugin bug fixes:
- Fix duplicate image_output_dir kwarg in extract_only mode for pptx, docx, and xlsx converters

Optional dependency: pip install markitdown-mcp[ocr]
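As an illustration of the "resolve relative image refs to absolute disk paths" item, here is a minimal sketch using only the standard library. It is not the PR's implementation: `resolve_image_refs` and `IMAGE_REF` are hypothetical names, the regex handles only simple `![alt](target)` references, and Windows 8.3 short-path normalization is not covered.

```python
import re
from pathlib import Path

# Matches simple Markdown image syntax: ![alt](target)
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")

# Matches a leading URI scheme such as "https:" or "data:".
# A Windows drive letter ("C:") also matches, which is fine: it is already absolute.
SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def resolve_image_refs(markdown, base_dir):
    """Rewrite relative image targets as absolute paths under base_dir."""
    base = Path(base_dir).resolve()

    def _rewrite(match):
        alt, target = match.group(1), match.group(2)
        # Leave URLs, data URIs, and already-absolute paths untouched.
        if SCHEME.match(target) or Path(target).is_absolute():
            return match.group(0)
        absolute = (base / target).resolve().as_posix()
        return f"![{alt}]({absolute})"

    return IMAGE_REF.sub(_rewrite, markdown)


if __name__ == "__main__":
    text = "![chart](images/chart.png) and ![logo](https://example.com/logo.png)"
    print(resolve_image_refs(text, "."))
```

Absolute paths matter here because the assistant that later opens the image may run with a different working directory than the server that wrote it.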
Add MarkItDown-MCP banner image for use in blog posts and documentation.
Author
@microsoft-github-policy-service agree company="New H3C"
Summary
Changes