v2.16 · Public Beta

Turn any document into
structured intelligence

GeoGraph OCR uses Google Gemini AI to extract entities, build knowledge graphs, and enrich every scan with GPS metadata — slashing research time from 40 hours to under 5 minutes. Your data, your ownership.

Free to start
Offline-first PWA
You own your data
Open source (MIT)
500B+
Pre-2010 docs not yet digitized
$1.2B
AI training data market opportunity
40 hrs → 5 min
Research time per collection
The problem

80–90% of historical data is
invisible to AI

Institutions have spent decades digitizing records, but the data stays locked in flat files with no relationships, no GPS, and no structure AI can use. The "data wall", the point at which models exhaust fresh high-quality training data, is expected to arrive in 2026–27.

Today's reality
  • Millions of digitized items stored as flat JSON — no relationships
  • 35+ inconsistent schemas across archives — impossible to join
  • No entity extraction, no GPS metadata, no semantic search
  • Researchers spend 40+ hours per collection on manual curation
  • LLMs trained on post-2010 web data — historical knowledge is dark
  • Google / AWS OCR: vendor lock-in, no graph, you don't own the rows
GeoGraph OCR solves this
  • Automatic entity extraction from any photo — people, places, dates, orgs
  • Knowledge graph auto-built with semantic cross-document links
  • GPS + GIS metadata on every document, normalized to WGS84
  • Reduce curation from 40 hours to under 5 minutes per collection
  • Structured AI training datasets ready to license
  • User-owned Supabase rows with Row-Level Security — not vendor-locked

Everything you need to unlock dark data

Built on Google Gemini 2.5 Flash with a privacy-first, offline-capable architecture — 50,000+ lines of production TypeScript.

AI-Powered OCR

Extract text from historical documents, artifacts, signs, and scenery. Multi-language detection built in. Configurable: Gemini, OpenAI, or local models.

Knowledge Graph

Automatically links entities across documents. Interactive force-directed D3.js visualization. Export to JSON, CSV, or GraphML.
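
As an illustration of the GraphML export mentioned above, here is a minimal sketch. The `GraphNode` and `GraphEdge` shapes and the `toGraphML` function are hypothetical names chosen for this example; the page does not describe GeoGraph OCR's actual schema or exporter, so treat this as one plausible way to serialize a small graph.

```typescript
// Hypothetical shapes: the real GeoGraph OCR schema is not documented here.
interface GraphNode { id: string; label: string; type: string; }
interface GraphEdge { source: string; target: string; relation: string; }

// Escape the five XML special characters so values are safe in attributes and text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize nodes and edges into a GraphML document string.
function toGraphML(nodes: GraphNode[], edges: GraphEdge[]): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
    '  <key id="label" for="node" attr.name="label" attr.type="string"/>',
    '  <key id="type" for="node" attr.name="type" attr.type="string"/>',
    '  <key id="relation" for="edge" attr.name="relation" attr.type="string"/>',
    '  <graph id="G" edgedefault="undirected">',
  ];
  for (const node of nodes) {
    lines.push(
      `    <node id="${escapeXml(node.id)}">`,
      `      <data key="label">${escapeXml(node.label)}</data>`,
      `      <data key="type">${escapeXml(node.type)}</data>`,
      "    </node>",
    );
  }
  edges.forEach((edge, i) => {
    lines.push(
      `    <edge id="e${i}" source="${escapeXml(edge.source)}" target="${escapeXml(edge.target)}">`,
      `      <data key="relation">${escapeXml(edge.relation)}</data>`,
      "    </edge>",
    );
  });
  lines.push("  </graph>", "</graphml>");
  return lines.join("\n");
}
```

The resulting string can be saved with a `.graphml` extension and opened in graph tools that read the format, such as Gephi or yEd.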

GIS Metadata

Enrich every scan with GPS coordinates, zone classification, and historical location correlation. Coordinates normalized to WGS84.
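
To make the coordinate handling concrete, here is a minimal sketch of one piece of it: converting degrees/minutes/seconds readings (the form camera GPS tags typically use) into signed decimal degrees and range-checking them. It assumes the input is already referenced to WGS84; converting from another datum needs a projection library and is out of scope. The type and function names are hypothetical, not GeoGraph OCR's API.

```typescript
type Hemisphere = "N" | "S" | "E" | "W";

// One coordinate as degrees, minutes, seconds plus a hemisphere reference.
interface Dms { degrees: number; minutes: number; seconds: number; ref: Hemisphere; }
interface LatLng { lat: number; lng: number; }

// Convert a DMS reading to signed decimal degrees (south and west are negative).
function dmsToDecimal(dms: Dms): number {
  const magnitude = dms.degrees + dms.minutes / 60 + dms.seconds / 3600;
  return dms.ref === "S" || dms.ref === "W" ? -magnitude : magnitude;
}

// Build a decimal lat/lng pair and reject values outside the valid ranges.
function toLatLng(lat: Dms, lng: Dms): LatLng {
  const result: LatLng = { lat: dmsToDecimal(lat), lng: dmsToDecimal(lng) };
  if (Math.abs(result.lat) > 90 || Math.abs(result.lng) > 180) {
    throw new RangeError("Coordinate outside valid latitude/longitude range");
  }
  return result;
}
```

For example, `toLatLng({ degrees: 40, minutes: 30, seconds: 0, ref: "N" }, { degrees: 74, minutes: 0, seconds: 0, ref: "W" })` yields `{ lat: 40.5, lng: -74 }`.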

3D Metaverse Explore

Navigate your entire document corpus in an immersive 3D spatial environment powered by Three.js. Semantic clustering groups related records visually.

Smart Deduplication

Semantic NLP detects and merges duplicate entities across thousands of documents automatically. No manual cleanup required.
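
The semantic matching described above is not specified on this page. As a much simpler stand-in, the sketch below merges entities whose names are identical after string normalization (case, diacritics, punctuation). This is exact-key matching, not semantic NLP, and the `Entity` shape and function names are assumptions made for the example.

```typescript
// Hypothetical entity record: name, entity type, and the documents it appears in.
interface Entity { name: string; type: string; documentIds: string[]; }

// Lowercase, strip diacritics, and collapse punctuation/whitespace to single spaces.
function normalizeName(name: string): string {
  return name
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Merge entities that share a type and a normalized name, unioning their document lists.
function mergeDuplicates(entities: Entity[]): Entity[] {
  const merged = new Map<string, Entity>();
  for (const entity of entities) {
    const key = `${entity.type}:${normalizeName(entity.name)}`;
    const existing = merged.get(key);
    if (existing) {
      existing.documentIds = Array.from(
        new Set(existing.documentIds.concat(entity.documentIds)),
      );
    } else {
      // Copy so the caller's input objects are not mutated.
      merged.set(key, { name: entity.name, type: entity.type, documentIds: entity.documentIds.slice() });
    }
  }
  return Array.from(merged.values());
}
```

A semantic approach would replace the normalized-string key with a similarity comparison (for example, over embeddings), at the cost of needing a threshold and a review step for borderline matches.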

Batch Processing

Process hundreds or thousands of documents in parallel. Server-side queue with pause/resume/cancel controls and real-time progress tracking.
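
The queue behavior described above (parallel work with pause, resume, cancel, and progress) can be sketched as a small in-memory class. This is an illustrative sketch under assumptions, not the product's server-side implementation: `BatchQueue` and its method names are hypothetical, failed tasks are simply counted as done, and pausing stops new tasks from starting but does not interrupt tasks already in flight.

```typescript
type Task<T> = () => Promise<T>;
type QueueState = "idle" | "running" | "paused" | "cancelled";

class BatchQueue<T> {
  readonly results: T[] = [];
  private pending: Task<T>[] = [];
  private active = 0;
  private completed = 0;
  private total = 0;
  private state: QueueState = "idle";
  private readonly concurrency: number;
  private readonly onProgress: ((done: number, total: number) => void) | undefined;

  constructor(concurrency: number, onProgress?: (done: number, total: number) => void) {
    this.concurrency = Math.max(1, concurrency);
    this.onProgress = onProgress;
  }

  add(task: Task<T>): void {
    this.pending.push(task);
    this.total++;
    this.pump();
  }

  start(): void {
    if (this.state === "cancelled") return;
    this.state = "running";
    this.pump();
  }

  pause(): void {
    if (this.state === "running") this.state = "paused";
  }

  resume(): void {
    if (this.state === "paused") this.start();
  }

  // Drop all tasks that have not started; in-flight tasks finish on their own.
  cancel(): void {
    this.state = "cancelled";
    this.pending = [];
  }

  status(): { state: QueueState; done: number; total: number; pending: number; active: number } {
    return { state: this.state, done: this.completed, total: this.total, pending: this.pending.length, active: this.active };
  }

  // Start tasks until the concurrency limit is reached or nothing is left to run.
  private pump(): void {
    while (this.state === "running" && this.active < this.concurrency && this.pending.length > 0) {
      const task = this.pending.shift()!;
      this.active++;
      task()
        .then((result) => { this.results.push(result); })
        .catch(() => { /* a real queue would record the failure */ })
        .then(() => {
          this.active--;
          this.completed++;
          if (this.onProgress) this.onProgress(this.completed, this.total);
          this.pump();
        });
    }
  }
}
```

A caller would create `new BatchQueue<string>(4, (done, total) => console.log(done, total))`, add one task per document, and call `start()`.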

Privacy-First Storage

All data is stored locally in IndexedDB by default. Cloud sync with Supabase is fully opt-in. End-to-end encryption. You control what gets shared.

AI Training Marketplace

License your structured datasets to AI companies. Earn passive income as a fractional owner when your data is licensed via the Web3 marketplace.

Mobile PWA

Installable progressive web app with full offline capability. Point your phone at any document and capture it on the spot — museum, estate sale, or archive.

How it works

Three steps from photo
to structured knowledge

No complex setup. No vendor lock-in. Point, capture, explore.

1

📸 Capture

Point your camera at any document, artifact, sign, or page. GeoGraph captures the image and automatically tags the GPS location.

2

🧠 Extract

Gemini 2.5 Flash extracts raw text, then identifies entities (people, places, dates, orgs), temporal era, and semantic relationships.

3

🔗 Connect

Entities are linked across your entire corpus in an interactive knowledge graph. Search, query, visualize in 3D, export, or license to AI companies.
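
One simple way to picture the Connect step is an inverted index from entity to documents, from which document-to-document links follow. The sketch below assumes entity names have already been extracted and normalized; `ExtractedDoc`, `Link`, and `linkDocuments` are hypothetical names, and the product's actual linking logic may differ.

```typescript
// Hypothetical output of the Extract step: a document id and its entity names.
interface ExtractedDoc { id: string; entities: string[]; }
// A link between two documents that mention the same entity.
interface Link { a: string; b: string; sharedEntity: string; }

function linkDocuments(docs: ExtractedDoc[]): Link[] {
  // Inverted index: entity name -> ids of documents that mention it.
  const index = new Map<string, string[]>();
  for (const doc of docs) {
    // De-duplicate so an entity repeated within one document is indexed once.
    const unique = Array.from(new Set(doc.entities));
    for (const entity of unique) {
      const ids = index.get(entity);
      if (ids) {
        ids.push(doc.id);
      } else {
        index.set(entity, [doc.id]);
      }
    }
  }

  // Emit one link per pair of documents sharing an entity.
  const links: Link[] = [];
  index.forEach((docIds, entity) => {
    for (let i = 0; i < docIds.length; i++) {
      for (let j = i + 1; j < docIds.length; j++) {
        links.push({ a: docIds[i], b: docIds[j], sharedEntity: entity });
      }
    }
  });
  return links;
}
```

Pairwise linking grows quadratically for entities that appear in many documents, so a larger corpus would more likely store entity nodes and document-to-entity edges instead.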

See GeoGraph OCR in action

A full-featured web app that works on desktop and mobile, online and offline.

geographocrnode.vercel.app
[Screenshots: GeoGraph OCR main interface, knowledge graph visualization, mobile view, mobile graph view]
Competitive comparison

Built for what the others missed

Existing tools give you OCR. GeoGraph gives you a structured, user-owned, monetizable knowledge base.

Capability Google Vision AWS Textract Smithsonian JSON GeoGraph OCR ✦
Text extraction (OCR) Partial
Entity extraction (NLP) Limited Limited ✓ Full
Knowledge graph auto-build
GPS / GIS metadata
User-owned DB rows (RLS) ✗ Vendor lock ✗ Vendor lock ✓ Supabase RLS
Offline-first PWA
AI training data monetization ✓ Marketplace
3D spatial visualization ✓ Three.js

Who uses GeoGraph OCR

From solo researchers to enterprise institutions — if you work with documents, this is for you.

🏛️

Archivists & Museums

Process entire collections in hours. Auto-extract entities, build provenance graphs, and make holdings searchable by anyone.

~50K institutions globally
📚

Researchers & Historians

Stop spending 40 hours manually cataloging each collection. Let AI extract the knowledge so you can focus on the insights.

10M+ knowledge workers
⚖️

Legal Firms

Automate discovery. Extract key entities from thousands of documents in minutes. Full confidentiality with offline-first storage.

~1.3M firms globally
🤖

AI Companies

Buy structured, verified, historically rich training datasets from the marketplace — the kind of data that pushes models past the data wall.

$1.2B market opportunity
Pricing

Simple, transparent pricing

Start for free with your own API keys. Scale as your corpus grows. Cancel anytime.

Free
$0 / month

For individuals and hobbyists starting their first collections.

  • 100 OCR scans / month
  • Local IndexedDB storage
  • Basic knowledge graph
  • GIS metadata capture
  • Bring your own API keys
  • Community support
Get started free
Enterprise
$199 / month

For institutions, museums, and enterprise archival teams needing white-glove service.

  • Everything in Pro
  • Multi-user workspaces
  • AI training data licensing
  • Custom integrations & REST API
  • Dedicated onboarding & SLA
  • Compliance support
  • White-label option available
Contact sales

Processing credits included on paid plans. Free plan requires your own Gemini / OpenAI API key.

Get paid to live life.
Capture history. Earn income.

GeoGraph is the only OCR platform that gives you fractional ownership of the data you create. When AI companies license the corpus, revenue flows back to contributors — proportional to what you captured.

Step 1
📸

Capture documents

Photograph records at museums, archives, estate sales, workplaces. Every scan is structured and recorded in your account.

Step 2
🧩

Own a data shard

Your structured records are minted as GARD Data Shards — ERC-1155 tokens on-chain. You hold fractional ownership of the corpus.

Step 3
💰

AI companies license

OpenAI, Anthropic, and others buy structured historical datasets from the marketplace. Revenue splits proportionally to shard holders.

Step 4
🔄

Earn more, capture more

To grow your share, capture more records. Visit museums. Explore archives. Build a data portfolio while living your life.
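
The proportional split described in Step 3 comes down to simple arithmetic. The sketch below works in integer cents and hands out leftover cents by largest remainder so payouts always sum to the licensed amount. It assumes no platform fee, one entry per holder, and off-chain calculation; `Holding` and `splitRevenue` are hypothetical names, and the page does not describe the actual payout contract.

```typescript
// Hypothetical record of how many shards each contributor holds.
interface Holding { holder: string; shards: number; }

// Split totalCents across holders in proportion to shards held.
function splitRevenue(totalCents: number, holdings: Holding[]): Record<string, number> {
  const totalShards = holdings.reduce((sum, h) => sum + h.shards, 0);
  if (totalShards <= 0) throw new Error("No shards outstanding");

  const payouts: Record<string, number> = {};
  const remainders: { holder: string; remainder: number }[] = [];
  let distributed = 0;

  for (const h of holdings) {
    const exact = (totalCents * h.shards) / totalShards;
    const whole = Math.floor(exact);
    payouts[h.holder] = whole;
    distributed += whole;
    remainders.push({ holder: h.holder, remainder: exact - whole });
  }

  // Give the cents lost to rounding to the holders with the largest fractional parts.
  remainders.sort((x, y) => y.remainder - x.remainder);
  const leftover = totalCents - distributed;
  for (let i = 0; i < leftover; i++) {
    payouts[remainders[i].holder] += 1;
  }
  return payouts;
}
```

For instance, a $10.00 license (1000 cents) split between a holder with 3 shards and a holder with 1 shard pays 750 and 250 cents respectively.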

ERC-1155 Shards · Supabase RLS · Ethers.js 6 · On-chain provenance · Fractionalized ownership

Seeking $150K for a
$1.2B opportunity

Pre-seed round — 8–10% equity to structure the first 100 archival collections and prove the AI licensing model. The AI training data crisis is real, and the window to own this market is closing.

✓ 50K+ lines TypeScript ✓ Production on Vercel + Supabase ✓ Security audited ✓ v2.16 — bi-weekly releases
Talk to us → · Read pitch deck

Frequently asked

What types of documents can GeoGraph OCR process?

Any photo or scan — handwritten letters, printed records, historical maps, newspaper clippings, legal documents, safety posters, artifacts, and museum placards. Multi-language detection covers Latin scripts, CJK characters, and more.

Does it work offline?

Yes. GeoGraph OCR is an offline-first PWA. All data is stored in IndexedDB on your device. Cloud sync with Supabase is fully opt-in — your data never has to leave your device unless you choose to enable it.

How is my data kept private?

By default everything stays on your device. When you enable cloud sync, data is stored in your own Supabase instance with Row-Level Security — even we cannot read your rows. OCR API calls go directly from your browser to your AI provider using your own key.

What is the Web3 / data ownership model?

When you capture and structure documents, you can opt in to mint them as GARD Data Shards: ERC-1155 NFTs representing fractional ownership of the corpus. When AI companies license datasets, revenue is distributed proportionally to shard holders. Think royalties for the data you help create.

Can I export the knowledge graph?

Yes. Export your entire graph as JSON, CSV, or GraphML. Pro users also get structured AI-training dataset exports compatible with Hugging Face and OpenAI fine-tuning formats.

Do I need a Google Gemini API key?

On the Free plan, yes — bring your own Gemini or OpenAI key (Gemini 2.5 Flash has a generous free tier). On paid plans, processing credits are included. Local model support via Ollama and LM Studio is also available.

Ready to unlock your
dark data?

Join researchers, archivists, and enterprises transforming their collections with GeoGraph OCR.

Start for free · Talk to us