Code:

from unstructured.partition.pdf import partition_pdf

input_path = "../input/"
output_path = "../output/"
file_path = input_path + "attention.pdf"

chunks = partition_pdf(
    filename=file_path,
    infer_table_structure=True,  # extract tables
    strategy="hi_res",  # mandatory to infer tables
    extract_image_block_types=["Image", "Table"],  # add "Table" to the list to extract images of tables
    # image_output_dir_path=output_path,  # if None, images and tables will be saved as base64
    extract_image_block_to_payload=True,  # if True, extracts base64 for API usage
    chunking_strategy="by_title",  # or "basic"
    max_characters=10000,  # defaults to 500
    combine_text_under_n_chars=2000,  # defaults to 0
    new_after_n_chars=6000,
    # extract_images_in_pdf=True,  # deprecated
)
Output: no tables are found in the chunks (all 12 elements are CompositeElement):
[<unstructured.documents.elements.CompositeElement at 0x7f86226bc0d0>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc2e0>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc160>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc280>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc3a0>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc3d0>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc580>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc5b0>,
<unstructured.documents.elements.CompositeElement at 0x7f8621e8e530>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc640>,
<unstructured.documents.elements.CompositeElement at 0x7f86226bc310>,
<unstructured.documents.elements.CompositeElement at 0x7f8621e8d870>]
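Rather than reading the repr list by eye, the tables can be pulled out of chunks programmatically. The helper below is a minimal sketch; it assumes table output shows up as elements whose class is named Table (or TableChunk, which newer releases use when a large table is split during chunking) and that their HTML lives in metadata.text_as_html, which is populated when infer_table_structure=True. The name tables_as_html is hypothetical, not part of unstructured.

```python
from typing import Iterable, List

# Class names treated as table output (assumption: matching by class name is
# enough; this avoids importing unstructured just to do isinstance checks).
TABLE_TYPE_NAMES = {"Table", "TableChunk"}


def tables_as_html(chunks: Iterable[object]) -> List[str]:
    """Return the HTML of every table-like element found in `chunks`.

    Elements without a metadata.text_as_html value are skipped.
    """
    html = []
    for el in chunks:
        if type(el).__name__ in TABLE_TYPE_NAMES:
            meta = getattr(el, "metadata", None)
            text_as_html = getattr(meta, "text_as_html", None)
            if text_as_html:
                html.append(text_as_html)
    return html
```

Calling tables_as_html(chunks) on the failing output above would return an empty list, since it contains no Table elements at all.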
The same code works as expected with the following versions:
unstructured 0.11.5
unstructured-inference 0.7.19
Output: four tables are found (4 Table elements alongside 13 CompositeElement chunks):
[<unstructured.documents.elements.CompositeElement at 0x7fdb74e00dc0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74d35060>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e018d0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e012a0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e028c0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e011e0>,
<unstructured.documents.elements.Table at 0x7fdb6c1e02e0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74ccfa00>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e03250>,
<unstructured.documents.elements.Table at 0x7fdb6c210ac0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e024d0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e02830>,
<unstructured.documents.elements.Table at 0x7fdb6c3f49a0>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74ebda20>,
<unstructured.documents.elements.Table at 0x7fdb74d37730>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e01150>,
<unstructured.documents.elements.CompositeElement at 0x7fdb74e00be0>]
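To compare the two runs more compactly than the raw lists, one can count elements by class name. This is a sketch with hypothetical helper names (element_type_counts and type_count_diff are illustrative, not unstructured APIs) that relies only on each element's class name.

```python
from collections import Counter
from typing import Dict, Iterable


def element_type_counts(elements: Iterable[object]) -> Counter:
    """Count elements by class name, e.g. CompositeElement vs Table."""
    return Counter(type(el).__name__ for el in elements)


def type_count_diff(before: Iterable[object], after: Iterable[object]) -> Dict[str, int]:
    """Per-type change in count (after minus before); unchanged types are omitted."""
    b, a = element_type_counts(before), element_type_counts(after)
    # Counter returns 0 for missing keys, so types present in only one run work too.
    return {name: a[name] - b[name] for name in sorted(set(a) | set(b)) if a[name] != b[name]}
```

For the two outputs above, the failing run gives Counter({'CompositeElement': 12}) and the working run gives Counter({'CompositeElement': 13, 'Table': 4}), so type_count_diff(failing, working) returns {'CompositeElement': 1, 'Table': 4}.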