This project allows you to download and view the contents of a GitHub repository as a single text file. It also supports uploading local ZIP files containing text files and presenting their contents as a single text file.
This tool was created to make it easy to share context with LLM chatbots like ChatGPT, Claude, and Gemini. By downloading a GitHub repository or uploading a ZIP file as a single text file, you can easily provide the chatbot with the necessary context to understand and discuss the contents of the repository.
Here are a few examples of how this tool can be useful when interacting with chatbots:
- Share the source code of a project with the chatbot to discuss implementation details or get suggestions for improvements.
- Provide the chatbot with a repository containing documentation or research papers to facilitate a conversation about the topics covered.
- Use the downloaded text file to give the chatbot context about a specific topic or domain, enabling more informed and relevant responses.
Here's some stuff that is at least somewhat interesting about the project.
We provide a token count for the text file. The interesting thing about this is that the count is computed in the browser, using a WASM packaging of the Python tiktoken library.
Counting tokens isn't exactly a trivial task. By doing it on the client we take a load off the server. The tradeoff is that it's a bit more than 4MB to download.
To download a given repo, we just slap /archive/master.zip onto the end of the URL. This is basically the same as clicking the "Download ZIP" button on the GitHub page. See this issue for improvements someone could make to this.
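As a rough illustration of the URL trick above, here is a minimal sketch. The function name build_archive_url and its branch parameter are hypothetical and not taken from the project; the parameter only shows where a branch name other than master would go.

```python
from urllib.parse import urlsplit, urlunsplit


def build_archive_url(repo_url: str, branch: str = "master") -> str:
    """Return the ZIP archive URL for a GitHub repository page URL."""
    parts = urlsplit(repo_url.strip())
    # Drop a trailing slash or ".git" suffix so the path is just /owner/repo.
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    archive_path = f"{path}/archive/{branch}.zip"
    # Query string and fragment are discarded on purpose.
    return urlunsplit((parts.scheme, parts.netloc, archive_path, "", ""))


print(build_archive_url("https://github.com/your-username/gh_repo_download/"))
```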
We don't write the downloaded files to disk, nor do we extract the ZIP file to disk. Everything is kept in memory for a couple of reasons:
- No static/media files to worry about. No storage needed. No cleaning up storage needed.
- Security implications of unzipping files from the internet. We don't have to think about handling relative paths or any other shenanigans that could be in the ZIP file.
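The in-memory approach can be sketched with nothing but the standard library. This is an illustration, not the project's code: read_zip_in_memory is a hypothetical name, and the archive built at the bottom is made-up test data. The point is that member names are only ever used as dictionary keys, so a name like ../outside.txt never becomes a filesystem path.

```python
import io
import zipfile


def read_zip_in_memory(zip_bytes: bytes) -> dict[str, bytes]:
    """Map each file name in the archive to its raw bytes, without touching disk."""
    contents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # The name is only a dict key here; it is never used as a path.
            contents[info.filename] = archive.read(info)
    return contents


# Build a tiny archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("repo-master/README.md", "# Hello\n")
    archive.writestr("../outside.txt", "never written anywhere\n")

files = read_zip_in_memory(buffer.getvalue())
print(sorted(files))
```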
A side effect of this is that we do a bit of non-standard handling of the POST/GET of the form. We do the downloading and processing of the repo in the view that you get redirected to after the form is submitted. The weird part is that the downloading and processing might lead to some errors that are best displayed on the form page. So we store the appropriate errors in the session and then redirect back to the form page which reads the errors from the session and sends them to the template for display.
As a side effect of doing everything in-memory, we don't have a URL to download the concatenated file from. Instead, we embed the file in the HTML on the results page as a data URI. I was at first concerned about the size of the data URI, but it seems that you can have a lot!
Firefox version 97 and newer supports data URLs of up to 32MB (before 97 the limit was close to 256MB). Chromium objects to URLs over 512MB, and Webkit (Safari) to URLs over 2048MB.
Since we also have a "copy to clipboard" button, we just read the data URI in JavaScript, decode it, and copy it to the clipboard.
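For a feel of what the data URI round trip looks like, here is a small sketch. It assumes base64 encoding and a text/plain MIME type, neither of which is confirmed by this README, and the function names are hypothetical. The decoding happens in JavaScript on the real page; it is shown in Python here only to keep the example in one language.

```python
import base64


def to_data_uri(text: str, mime_type: str = "text/plain") -> str:
    """Encode text as a base64 data URI suitable for an href attribute."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"data:{mime_type};charset=utf-8;base64,{payload}"


def from_data_uri(uri: str) -> str:
    """Recover the original text from a URI produced by to_data_uri."""
    _header, _, payload = uri.partition(",")
    return base64.b64decode(payload).decode("utf-8")


uri = to_data_uri("hello, world\n")
print(uri)
print(from_data_uri(uri) == "hello, world\n")
```

Keep in mind that base64 makes the payload about a third larger than the original text, which matters when comparing against the browser limits quoted above.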
We do need to worry about downloading excessively large repos or delivering excessively large text files.
- downloader.repo_utils.download_repo restricts the largest size of ZIP file we download.
- downloader.file_utils.extract_text_files limits the total number of files we'll process.
- downloader.file_utils.extract_text_files also limits the total size of text we'll deliver.
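The file-count and text-size limits could be enforced along these lines. This is a sketch under assumptions, not a copy of extract_text_files: the function name, the "# File:" header format, and the choice to stop at the first file that would exceed the size limit are all made up for illustration.

```python
def concatenate_with_limits(files, max_file_count, max_text_size):
    """Join (name, text) pairs until a file-count or size limit is hit.

    Returns the combined text and a flag saying whether anything was left out.
    """
    parts = []
    total_size = 0
    for index, (name, text) in enumerate(files):
        if index >= max_file_count:
            return "".join(parts), True
        chunk = f"# File: {name}\n{text}\n"
        # Measure in bytes, since the limits are expressed as sizes.
        chunk_size = len(chunk.encode("utf-8"))
        if total_size + chunk_size > max_text_size:
            return "".join(parts), True
        parts.append(chunk)
        total_size += chunk_size
    return "".join(parts), False


sample = [("a.txt", "alpha"), ("b.txt", "beta"), ("c.txt", "gamma")]
text, truncated = concatenate_with_limits(sample, max_file_count=2, max_text_size=1000)
print(text)
print(truncated)
```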
Since we're constructing data to give to a text-based LLM, we only want text files. The first thought I had was to have a hard-coded list of file extensions that would be considered text, but that's not very flexible and requires maintenance.
Instead, I use the algorithm the zlib library uses. You can see it in downloader.file_utils.is_plain_text_file. The algorithm identifies a file as plain text if it contains at least one byte from an allowed list (including typical text characters like TAB, LF, CR, and the range SPACE to 255) and no bytes from a block list (comprising mainly control characters outside common text usage, such as NUL to 6, and 14 to 31), offering a straightforward method for text detection without necessitating the analysis of the entire file or considering ambiguous byte values.
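Here is a minimal sketch of that heuristic, written from zlib's own byte classes rather than from the project's is_plain_text_file, which may differ in detail. In zlib the block list is 0 to 6, 14 to 25, and 28 to 31; bytes 7, 8, 11, 12, 26, and 27 are treated as ambiguous and ignored. The function name looks_like_plain_text is hypothetical.

```python
# Byte classes from zlib's detect_data_type heuristic.
ALLOWED = frozenset({9, 10, 13}) | frozenset(range(32, 256))
BLOCKED = frozenset(range(0, 7)) | frozenset(range(14, 26)) | frozenset(range(28, 32))


def looks_like_plain_text(data: bytes) -> bool:
    """Return True if data has at least one allowed byte and no blocked byte."""
    seen = set(data)  # distinct byte values present in the sample
    if seen & BLOCKED:
        return False
    # Ambiguous bytes alone (or empty input) are not enough to call it text.
    return bool(seen & ALLOWED)


print(looks_like_plain_text(b"hello, world\n"))
print(looks_like_plain_text(b"PK\x03\x04 binary header"))
```

Passing only the first few kilobytes of a file is usually enough for a quick check, at the cost of missing blocked bytes that appear later.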
Once we've determined a file is a text file, we check it for encoding declarations like XML's <?xml version="1.0" encoding="UTF-8"?> or Python's # -*- coding: utf-8 -*- and use whatever we find there. If we don't find anything we just assume "UTF-8". See more here: downloader.file_utils.detect_internal_encoding.
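A simplified sketch of this kind of declaration sniffing follows. It only handles the two declaration styles mentioned above and is not the project's detect_internal_encoding; the function name, the 1024-byte window, and the fallback for unknown codec names are assumptions.

```python
import codecs
import re

# XML declaration: <?xml version="1.0" encoding="..."?>
XML_DECL = re.compile(rb"""<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")
# PEP 263 declaration, only honoured on the first or second line of a file.
PEP263_DECL = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the normalized name of a declared encoding, or the default."""
    head = data[:1024]
    match = XML_DECL.search(head)
    if match is None:
        for line in head.splitlines()[:2]:
            match = PEP263_DECL.match(line)
            if match:
                break
    if match is None:
        return default
    name = match.group(1).decode("ascii")
    try:
        # Normalize aliases and reject names Python does not know.
        return codecs.lookup(name).name
    except LookupError:
        return default


print(detect_declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?><a/>'))
print(detect_declared_encoding(b"#!/usr/bin/env python\n# -*- coding: ascii -*-\n"))
print(detect_declared_encoding(b"no declaration here"))
```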
- The maximum size of repository that will be downloaded is in settings.MAX_REPO_SIZE.
- The maximum size of ZIP file that can be uploaded is in settings.MAX_REPO_SIZE.
- We'll stop after processing settings.MAX_FILE_COUNT files from the repo or ZIP file.
- We'll only deliver up to settings.MAX_TEXT_SIZE of text.
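For orientation, the corresponding settings excerpt might look roughly like the following. The setting names come from the list above and the magnitudes (10 MB, 1000 files) from the limits stated later in this README; treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
# settings.py (excerpt): illustrative values, assuming 1 MB = 1024 * 1024 bytes.
MAX_REPO_SIZE = 10 * 1024 * 1024  # largest ZIP download or upload, in bytes
MAX_FILE_COUNT = 1000             # stop after processing this many files
MAX_TEXT_SIZE = 10 * 1024 * 1024  # largest concatenated text delivered, in bytes
```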
Thank you for your interest in contributing! Here are the steps to get started with development:
- Python 3.12 or later
- Node.js 20 or later
- Fork the Repository: Begin by forking the project to your GitHub account.
- Clone Your Fork: Clone your fork to your local environment:
git clone https://github.com/your-username/gh_repo_download.git
For managing Python dependencies, you can choose between uv and pip-tools. uv is much faster for installing dependencies, but you may be more comfortable with pip-tools.
uv installation instructions here.
Use a virtualenv!
Select the one that best fits your workflow:
- Using uv:
  - Install uv if you haven't already.
  - Compile and sync dependencies:
    uv pip compile requirements.in dev-requirements.in -o requirements.txt
    uv pip sync requirements.txt
- Using pip-tools:
  - Install pip-tools:
    python3 -m pip install pip-tools
  - Compile and sync dependencies:
    pip-compile requirements.in dev-requirements.in -o requirements.txt
    pip-sync requirements.txt
For Node.js dependencies:
npm install
- Run the Server:
  python manage.py runserver
- Run the Server:
  npm run dev
- Create a new branch for your changes:
  git checkout -b feature/your-feature-name
- Implement your changes, adhering to the project's code style and guidelines.
- Test your changes thoroughly.
- Commit and push your changes:
  git commit -am 'Add some feature'
  git push origin feature/your-feature-name
- Open a Pull Request (PR) from your branch to the main repository. Provide a detailed description of your changes and the impact they have.
- Adhere to the coding standards (PEP 8 for Python; black for code formatting).
- Ensure new features or fixes are accompanied by tests.
- Update the documentation as necessary to reflect your changes.
If you need assistance or have questions, feel free to open/continue an issue on GitHub.
- Enter the URL of a GitHub repository in the form.
- Click the "Submit" button.
- The site will process the repository and generate a downloadable text file.
- On the next page, click the "Download file" link to save the text file locally.
- Select a ZIP file containing the desired text files using the file input field.
- Click the "Submit" button to upload and process the ZIP file.
- The site will extract the text files from the ZIP and generate a downloadable text file.
- On the next page, click the "Download file" link to save the text file locally.
- The maximum size of repositories that can be downloaded is 10.0 MB.
- The maximum size of ZIP files that can be uploaded is 10.0 MB.
- Only text files from the repository or ZIP file will be included in the downloaded file.
- The downloaded file is a concatenation of the text files from the repository or ZIP file.
- We'll stop after processing 1000 files from the repo or ZIP file.
- We'll only deliver up to 10.0 MB of text.
Q: What types of files are included in the download?
A: Only text files from the repository or ZIP file are included in the downloaded file.
Q: What is the size limit for the repositories that can be downloaded?
A: The maximum size limit for repositories is 10.0 MB. Repositories exceeding this limit cannot be downloaded.
Q: What is the size limit for the ZIP files that can be uploaded?
A: The maximum size limit for ZIP files is 10.0 MB. ZIP files exceeding this limit cannot be uploaded.