A specialized web crawler that converts website content into LLM-friendly text format.

LLM URL Crawler

What it Does

  1. Takes a website URL as input
  2. Finds all pages on that website automatically
  3. Downloads and converts content to clean, structured text
  4. Saves everything into llm_full.txt in a format that's easy for LLMs to process

Setup

  1. Make sure Python is installed

  2. Set up your workspace:

# Create environment
python -m venv .venv

# Activate it (Windows)
.venv\Scripts\activate
# or (Mac/Linux)
source .venv/bin/activate

# Install packages
pip install -r requirements.txt

How to Use

  1. Start the crawler:

python crawler.py

  2. Enter a website URL (example: https://example.com)

  3. The program will:

    • Scan the website for all pages
    • Convert content to LLM-friendly format
    • Save everything to llm_full.txt
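"Scan the website for all pages" typically means collecting every same-site link from each fetched page and crawling them in turn. A hypothetical sketch of the link-discovery part, using only the standard library (function names are illustrative, not taken from crawler.py):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gathers absolute, same-site links from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Keep only pages hosted on the same domain as the start URL
        if urlparse(absolute).netloc == urlparse(self.base_url).netloc:
            self.links.add(absolute.split("#")[0])  # drop fragments


def same_site_links(html: str, base_url: str) -> list:
    """Return sorted same-site links found in an HTML snippet."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(collector.links)


print(same_site_links(
    '<a href="/docs">Docs</a><a href="https://other.com/x">Ext</a>',
    "https://example.com",
))
# → ['https://example.com/docs']
```

A full crawler would feed each discovered link back into a queue and track visited URLs to avoid fetching the same page twice.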

Docker Usage

Pull and run the container:

docker pull ghcr.io/YOUR_GITHUB_USERNAME/url-crawler:latest
docker run -it ghcr.io/YOUR_GITHUB_USERNAME/url-crawler

Or build locally:

docker build -t url-crawler .
docker run -it url-crawler

Output Format

The llm_full.txt file contains:

  • Clean, structured text without HTML or other markup
  • Clear separation between different pages
  • Content organized in a way that's optimal for LLM processing
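The "clear separation between different pages" can be pictured as a delimiter line carrying each page's URL. The exact layout of llm_full.txt is defined by crawler.py; the separator below is purely an assumed example:

```python
def format_page(url: str, text: str) -> str:
    """Render one crawled page as a delimited block (hypothetical layout)."""
    return f"===== {url} =====\n{text}\n"


# Concatenate several pages the way the output file might be assembled
doc = "".join(format_page(u, t) for u, t in [
    ("https://example.com/", "Welcome"),
    ("https://example.com/docs", "Documentation"),
])
print(doc)
```

Keeping the source URL next to each page's text lets an LLM attribute answers to specific pages when processing the file.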

Common Issues

  • Some websites might block automated access
  • Large websites might take longer to process
  • Make sure you have a stable internet connection
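Transient failures (flaky connections, temporary blocks) are usually handled by retrying with a growing delay. A generic sketch of that pattern, not a description of how crawler.py handles errors:

```python
import time


def fetch_with_retry(fetch, url, retries=3, backoff=1.0):
    """Call fetch(url), retrying on OSError with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(backoff * 2 ** attempt)


# Demonstrate with a fake fetcher that fails twice, then succeeds
calls = {"n": 0}

def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("temporary failure")
    return "ok"

print(fetch_with_retry(flaky, "https://example.com", backoff=0))
# → ok
```

For sites that block automated access outright, retries won't help; respecting robots.txt and the site's terms of use is the right response.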
