What is OCR?
OCR is a command-line tool that converts PDF documents into formatted Markdown text.
It works by rendering PDF pages as images and feeding them into the deepseek-ocr:latest model via Ollama.
Unlike cloud-based solutions, this runs entirely on your machine—keeping your documents private while leveraging state-of-the-art vision-language models.
Features
Everything runs locally through Ollama. No data is ever sent to the cloud.
Converts scanned documents or slides directly into clean, editable Markdown format.
Process specific pages, ranges, or exclude parts of the document easily with CLI flags.
Uses deepseek-ocr, a specialized model for understanding layout and text in images.
Requirements
-
Ollama running locally with the model pulled:
ollama pull deepseek-ocr:latest -
Poppler (required for
pdf2image):
Debian/Ubuntu:sudo apt-get install poppler-utils
macOS:brew install poppler
Installation
The recommended way to install is via pipx:
pipx install git+https://github.com/arrase/OCR.git
Or with pip:
pip install git+https://github.com/arrase/OCR.git
Usage
Run the tool on any PDF file:
ocr document.pdf
This will create document.md in the same directory.
Page Selection
You can selectively process pages using --include and --exclude (1-based page numbers).
Process only the first page:
ocr --include 1 document.pdf
Process pages 1 through 5, skipping page 3:
ocr --include 1-5 --exclude 3 document.pdf
Complex combinations:
ocr --include 1,3,5-8 --exclude 6-7 document.pdf
Configuration
You can configure the tool using environment variables if your Ollama setup is non-standard.
OLLAMA_BASE_URL(default:http://localhost:11434/v1)OLLAMA_MODEL(default:deepseek-ocr:latest)