Turn PDFs into Markdown

Private, local, and accurate OCR using Ollama and DeepSeek-OCR.

Get Started

What is OCR?

OCR is a command-line tool that converts PDF documents into formatted Markdown text. It works by rendering PDF pages as images and feeding them into the deepseek-ocr:latest model via Ollama.

Unlike cloud-based solutions, this runs entirely on your machine—keeping your documents private while leveraging state-of-the-art vision-language models.

Features

🔒 Privacy First

Everything runs locally through Ollama. No data is ever sent to the cloud.

📄 PDF to Markdown

Converts scanned documents or slides directly into clean, editable Markdown format.

🎯 Page Selection

Process specific pages, ranges, or exclude parts of the document easily with CLI flags.

🤖 AI-Powered

Uses deepseek-ocr, a specialized model for understanding layout and text in images.

Requirements

  • Ollama running locally with the model pulled:
    ollama pull deepseek-ocr:latest
  • Poppler (required for pdf2image):
    Debian/Ubuntu: sudo apt-get install poppler-utils
    macOS: brew install poppler

Installation

The recommended way to install is via pipx:

pipx install git+https://github.com/arrase/OCR.git

Or with pip:

pip install git+https://github.com/arrase/OCR.git

Usage

Run the tool on any PDF file:

ocr document.pdf

This will create document.md in the same directory.

Page Selection

You can selectively process pages using --include and --exclude (1-based page numbers).

Process only the first page:

ocr --include 1 document.pdf

Process pages 1 through 5, skipping page 3:

ocr --include 1-5 --exclude 3 document.pdf

Complex combinations:

ocr --include 1,3,5-8 --exclude 6-7 document.pdf

Configuration

You can configure the tool using environment variables if your Ollama setup is non-standard.

  • OLLAMA_BASE_URL (default: http://localhost:11434/v1)
  • OLLAMA_MODEL (default: deepseek-ocr:latest)