How to Convert arXiv Papers to Markdown for AI Research

arXiv papers are PDFs. PDFs are terrible for AI workflows. They don’t search well, they waste tokens when fed to LLMs, and they can’t be easily combined with other research materials in a knowledge base.

If you’re doing AI research, or working in any field that relies on arXiv, converting papers to Markdown makes them far easier to search, combine, and feed to an LLM.

Why Markdown for Research Papers?

LLMs understand Markdown natively. Feed Claude or ChatGPT a PDF and it struggles with formatting, page breaks, and two-column layouts. Feed it Markdown and the structure survives: headings, equations, code blocks, and references arrive as plain, clearly delimited text.

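Far fewer tokens. A typical arXiv paper is 200-500KB as a PDF. The same content in Markdown is 10-30KB, roughly a tenth of the size or less. File size is only a proxy for token count, because much of a PDF is fonts, images, and layout data, but clean Markdown still lets you fit many more papers in a single Claude context window.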
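To see how much context a Markdown library would actually take up, a rough estimate can be sketched in a few lines of Python. This is only an approximation: the 4-characters-per-token ratio is a common rule of thumb for English text, not a real tokenizer, and the function names here are made up for illustration.

```python
from pathlib import Path

# Assumption: roughly 4 characters per token for English text.
# Real counts depend on the model's tokenizer and on the content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def library_token_estimates(folder: str) -> dict[str, int]:
    """Map each Markdown file under `folder` to its estimated token count."""
    root = Path(folder)
    estimates = {}
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        estimates[path.relative_to(root).as_posix()] = estimate_tokens(text)
    return estimates
```

Calling library_token_estimates("research") and summing the values gives a ballpark figure to compare against the context window of whichever model you use.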

Searchable across your entire library. With 50 papers as Markdown files in a folder, you can grep for any concept across all of them in milliseconds. Try that with PDFs.
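If you would rather search from a script than from the command line, the same idea can be sketched in Python. This is a minimal example, assuming the papers are UTF-8 Markdown files under one folder; search_library is a hypothetical helper name, and the match is a plain case-insensitive substring test rather than a regular expression.

```python
from pathlib import Path


def search_library(folder: str, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each case-insensitive match."""
    root = Path(folder)
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

For example, search_library("research", "scaling law") would list every line in the library that mentions the phrase, along with the file and line number it came from.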

Works with Obsidian. Papers as Markdown files in Obsidian become linked, tagged, and searchable. Add your own notes inline. Create connections between papers with [[wikilinks]].

How to Save arXiv Papers as Markdown

Method 1: Minibase on the arXiv Abstract Page

Minibase is a browser extension that converts the arXiv abstract page (and many HTML-rendered papers) to clean Markdown.

  1. Open the arXiv paper page (e.g., arxiv.org/abs/2401.12345)
  2. Click the Minibase extension icon
  3. Get a Markdown file with the title, authors, abstract, and available content

For papers with HTML versions (increasingly common on arXiv), Minibase extracts the full paper content, including equations, figure references, and citations.

Method 2: arXiv HTML + Minibase

Many recent papers have an HTML version on arXiv (look for the “HTML” link next to the PDF). Open the HTML version and use Minibase --- you’ll get the full paper as clean Markdown.
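If you have a list of abstract links and want to jump straight to the HTML versions, the URL can be derived from the paper ID. The sketch below assumes arXiv's current URL layout (arxiv.org/abs/ID, arxiv.org/html/ID, arxiv.org/pdf/ID), which is not guaranteed to stay stable, and it only handles new-style IDs such as 2401.12345. The function name is hypothetical, and an HTML link built this way may lead nowhere, since not every paper has an HTML version.

```python
import re

# Matches new-style arXiv IDs such as 2401.12345 or 2401.12345v2.
# Old-style IDs (e.g. hep-th/9901001) are deliberately not handled.
ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf|html)/(\d{4}\.\d{4,5})(v\d+)?",
    re.IGNORECASE,
)


def arxiv_links(url: str) -> dict[str, str]:
    """Return the abs, html, and pdf URLs for the paper an arXiv URL points to."""
    match = ARXIV_ID.search(url)
    if match is None:
        raise ValueError(f"not a recognised arXiv URL: {url}")
    # Keep the version suffix (v1, v2, ...) if the original URL had one.
    paper_id = match.group(1) + (match.group(2) or "")
    return {
        kind: f"https://arxiv.org/{kind}/{paper_id}"
        for kind in ("abs", "html", "pdf")
    }
```

Open the "html" link in your browser; if the page loads, use Minibase there, and if it does not, fall back to the abstract page.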

Method 3: Semantic Scholar or Papers With Code

These sites often have cleaner HTML renderings of papers. Open the paper page and use Minibase.

Building a Research Knowledge Base

The real power comes from accumulating papers over time:

research/
  attention/
    attention-is-all-you-need.md
    flash-attention-v2.md
    multi-head-latent-attention.md
  scaling/
    chinchilla-scaling-laws.md
    scaling-data-constrained.md
  agents/
    toolformer.md
    react-prompting.md
    mcp-protocol.md

Point Claude Code at this folder:

cd research
claude

Now you can ask: “Compare the attention mechanisms in these papers” or “What are the key findings on scaling laws?” Claude reads all your papers and synthesizes answers grounded in actual research.

The Karpathy Pattern

Andrej Karpathy described this approach: build a personal wiki of Markdown files, then let an LLM research across them. For AI researchers, this means:

  1. Save every important paper as Markdown
  2. Organize by topic
  3. Add your own notes and annotations
  4. Let Claude or ChatGPT work with the full collection

After a few months, you have a personal research assistant that knows every paper you’ve read.
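As the collection grows, an index note helps both you and the LLM see what is in it. The sketch below is one possible way to generate such a note, assuming a layout of one subfolder per topic as in the research/ example above. It links each paper with a [[wikilink]] by file name, which is how Obsidian resolves links by default; build_index and INDEX_NAME are names made up for this example.

```python
from pathlib import Path

INDEX_NAME = "index.md"  # hypothetical name for the generated index note


def build_index(folder: str) -> str:
    """Return Markdown text listing every paper under `folder`, grouped by topic."""
    root = Path(folder)
    by_topic: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.md")):
        if path.name == INDEX_NAME:
            continue  # do not list the index inside itself
        rel = path.relative_to(root)
        # The first-level subfolder is treated as the topic.
        topic = rel.parts[0] if len(rel.parts) > 1 else "unsorted"
        by_topic.setdefault(topic, []).append(path.stem)

    lines = ["# Research index", ""]
    for topic in sorted(by_topic):
        lines.append(f"## {topic}")
        lines.extend(f"- [[{stem}]]" for stem in by_topic[topic])
        lines.append("")
    return "\n".join(lines)
```

To save the result, write the returned text to a file, for example with Path("research", INDEX_NAME).write_text(build_index("research"), encoding="utf-8"), and rerun it whenever you add papers.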

Get Started

Install Minibase and start with the next arXiv paper you read. Over time, your Markdown research library compounds into something no generic AI can match.


Turn arXiv papers into a searchable, AI-readable knowledge base. Install Minibase --- free to start.

Written by

Save Team
