Max length check for Chipper #326

Merged: 3 commits merged on May 9, 2024

@ajjimeno (Contributor) commented Mar 7, 2024

Chipper's decoder uses a `max_length` parameter during generation to cap the number of tokens it produces. Generating past that limit may raise an error or produce nonsensical output.
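As a rough illustration of why the cap matters, here is a minimal sketch of a greedy decoding loop bounded by `max_length`. The names `greedy_decode`, `next_token`, `bos_id`, and `eos_id` are hypothetical and this is not Chipper's actual decoder code; it only shows that without the cap, a decoder that never emits an end-of-sequence token would keep generating indefinitely.

```python
from typing import Callable, List


def greedy_decode(
    next_token: Callable[[List[int]], int],
    bos_id: int,
    eos_id: int,
    max_length: int,
) -> List[int]:
    """Generate tokens until EOS is emitted or the sequence holds max_length tokens.

    `next_token` (hypothetical) stands in for one decoder step: given the
    tokens so far, it returns the id of the next token.
    """
    tokens = [bos_id]
    # The cap bounds the loop even if the decoder never emits EOS.
    while len(tokens) < max_length:
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens


def never_stops(tokens: List[int]) -> int:
    # Toy decoder step that never emits EOS, so only max_length stops it.
    return 7


out = greedy_decode(never_stops, bos_id=0, eos_id=1, max_length=5)
print(out)  # [0, 7, 7, 7, 7]
```

Here the sequence length includes the start token, which mirrors how some generation APIs count `max_length`; other decoders count only newly generated tokens.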

However, `max_length` was being ignored even when it was set in a Chipper model's configuration. This PR fixes that.
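The bug and fix follow a common pattern: a per-model setting exists in configuration but never reaches the generation call. The sketch below is a hypothetical illustration of that pattern only; `MODEL_CONFIGS`, `DEFAULT_MAX_LENGTH`, and the two `generation_kwargs_*` helpers are made-up names and do not reproduce this PR's actual diff.

```python
from typing import Any, Dict

DEFAULT_MAX_LENGTH = 1536  # hypothetical fallback value

# Hypothetical per-model configuration, loosely modeled on MODEL_TYPES.
MODEL_CONFIGS: Dict[str, Dict[str, Any]] = {
    "chipperv3": {"max_length": 200},
}


def generation_kwargs_buggy(model_name: str) -> Dict[str, Any]:
    # Bug pattern: the model's configured max_length is never consulted.
    return {"max_length": DEFAULT_MAX_LENGTH}


def generation_kwargs_fixed(model_name: str) -> Dict[str, Any]:
    # Fix pattern: prefer the model's configured value, else use the fallback.
    config = MODEL_CONFIGS.get(model_name, {})
    return {"max_length": config.get("max_length", DEFAULT_MAX_LENGTH)}


print(generation_kwargs_buggy("chipperv3"))  # {'max_length': 1536}
print(generation_kwargs_fixed("chipperv3"))  # {'max_length': 200}
```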

You can verify the fix by running the code below. Without this change, the full document is generated as usual; with it, generation stops after at most 200 tokens. The test lowers the `chipperv3` model's `max_length` from 1536 to 200 so that the limit is easy to observe.

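```python
from unstructured_inference.inference.layout import DocumentLayout
from unstructured_inference.models.base import get_model
from unstructured_inference.models.chipper import MODEL_TYPES

# Lower the chipperv3 token limit from 1536 to 200 so the cap is easy to see.
# Set it before get_model() loads the model so the new value is picked up.
MODEL_TYPES["chipperv3"]["max_length"] = 200

model = get_model("chipper")

# Run Chipper over a sample PDF; with the fix, output stops after at most 200 tokens.
doc = DocumentLayout.from_file(
    "sample-docs/layout-parser-paper.pdf",
    element_extraction_model=model,
    pdf_image_dpi=300,
)

print(doc)
```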

@ajjimeno requested review from qued and leah1985 on March 7, 2024 04:52

@awalker4 left a comment:


This could really help in the api!

@ajjimeno merged commit 76619ca into main on May 9, 2024 (5 of 7 checks passed)
@ajjimeno deleted the feat/chipper-max-length branch on May 9, 2024 21:56