KeyError: 'prediction' #765

Open
zvictor opened this issue Jan 16, 2025 · 0 comments
Labels: bug, table structure

zvictor commented Jan 16, 2025

Bug

When converting a specific PDF document to JSON using the docling CLI tool, the process fails with a KeyError: 'prediction' in the table_structure_model.py file.

The error is raised because the prediction key is missing from the table_out["predict_details"] dictionary during the table structure recognition step.
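
To make the failure mode concrete, here is a minimal sketch in plain Python. It does not use docling itself: the dictionaries are hypothetical stand-ins for table_out, and get_prediction is an illustrative helper name, not a docling function. It only shows why direct indexing raises KeyError when the key is absent, and how a guarded lookup would let a caller skip such a table instead of aborting the conversion.

```python
from typing import Any, Optional


def get_prediction(table_out: dict[str, Any]) -> Optional[Any]:
    """Return table_out["predict_details"]["prediction"], or None if absent.

    Hypothetical helper for illustration only; not part of docling.
    """
    details = table_out.get("predict_details") or {}
    return details.get("prediction")


def direct_lookup_error(table_out: dict[str, Any]) -> Optional[str]:
    """Mimic the direct indexing from the traceback and report the error."""
    try:
        table_out["predict_details"]["prediction"]
    except KeyError as exc:
        return f"KeyError: {exc}"
    return None


# Hypothetical payloads: the real structure of table_out is an assumption.
with_prediction = {"predict_details": {"prediction": {"placeholder": []}}}
without_prediction = {"predict_details": {}}

if __name__ == "__main__":
    print(direct_lookup_error(without_prediction))
    print(get_prediction(without_prediction))
```

Whether skipping the table is the right behavior, or whether the model should always emit a prediction, is for the maintainers to decide.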

Notably, this issue only occurs with this specific PDF document, as other similar documents (around 10 tested so far) convert successfully.

Steps to reproduce

  1. Run the following command in the terminal:
    docling --from pdf --to json --image-export-mode placeholder --output /tmp https://venda-imoveis.caixa.gov.br/editais/EL01030224CPARE.PDF
  2. Observe the error message:
WARNING:docling.pipeline.base_pipeline:Encountered an error during conversion of document 3be6f7171b899a5cd051aefc1d9c3782971ce2a31a8394d1593596e4bf0d0f66:
Traceback (most recent call last):

 File "/home/zvictor/development/martelada/data-lab/.venv/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 150, in _build_document
   for p in pipeline_pages:  # Must exhaust!
            ^^^^^^^^^^^^^^

 File "/home/zvictor/development/martelada/data-lab/.venv/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 116, in _apply_on_pages
   yield from page_batch

 File "/home/zvictor/development/martelada/data-lab/.venv/lib/python3.12/site-packages/docling/models/page_assemble_model.py", line 60, in __call__
   for page in page_batch:
               ^^^^^^^^^^

 File "/home/zvictor/development/martelada/data-lab/.venv/lib/python3.12/site-packages/docling/models/table_structure_model.py", line 215, in __call__
   otsl_seq = table_out["predict_details"]["prediction"][
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^

KeyError: 'prediction'

Docling version

Docling version: 2.15.1
Docling Core version: 2.14.0
Docling IBM Models version: 3.1.2
Docling Parse version: 3.0.0

Python version

Python 3.12.8

Additional context

The issue appears to be specific to the structure or content of this PDF document, as other similar documents process without errors. This suggests that the document may contain unexpected or unsupported table structures that the model cannot handle.

zvictor added the bug label on Jan 16, 2025
maxmnemonic self-assigned this on Jan 17, 2025