Feature/pdfinfo win #367

Open: Falven wants to merge 1 commit into main from feature/pdfinfo_win

@Falven Falven commented Jul 9, 2024

layout.py

def process_data_with_model(
    data: BinaryIO,
    model_name: Optional[str],
    suffix: Optional[str] = ".pdf",
    **kwargs,
) -> DocumentLayout:
    """Processes pdf file in the form of a file handler (supporting a read method) into a
    DocumentLayout by using a model identified by model_name."""
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp_file:
        tmp_file.write(data.read())
        tmp_file.flush()  # Make sure the file is written out
        layout = process_file_with_model(
            tmp_file.name,
            model_name,
            **kwargs,
        )

    return layout

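process_data_with_model in layout.py writes the input to a NamedTemporaryFile and passes its name on while the file is still open. On Windows, a NamedTemporaryFile created with the default settings cannot be opened a second time by name while the original handle is open. In practice this means no other process can read the file while the process that created it is still using it.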
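
The platform difference can be checked with a small probe. This is only an illustrative sketch: the function name `can_reopen_by_name_while_open` is hypothetical and not part of unstructured, and it reopens the file from the same process rather than from a separate one, which is enough to show the behavior on Windows.

```python
import tempfile


def can_reopen_by_name_while_open() -> bool:
    """Return True if a NamedTemporaryFile can be opened a second time by name
    while the original handle is still open (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp_file:
        tmp_file.write(b"placeholder bytes, not a real PDF")
        tmp_file.flush()
        try:
            # Second open by name, while tmp_file is still open.
            with open(tmp_file.name, "rb") as second_handle:
                second_handle.read()
            return True
        except PermissionError:
            # Expected on Windows with the default delete-on-close behavior.
            return False


if __name__ == "__main__":
    print(can_reopen_by_name_while_open())
```

On Unix-like systems the second open is expected to succeed; on Windows it is expected to fail with a PermissionError.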

Further down the call chain, pdf2image's convert_from_path is invoked, which runs the system Poppler installation to gather information about the PDF. Because Poppler runs as a separate process, it cannot open the temporary file that Python still holds open, and the call fails with an error.
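
A common way around this, sketched below, is to create the temporary file with delete=False, close it before handing its path to anything that spawns an external process, and remove it manually afterwards. This is a minimal sketch under assumptions and not necessarily what this PR implements: `with_closed_temp_copy` and its `process` callback are hypothetical names standing in for the call to process_file_with_model.

```python
import io
import os
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def with_closed_temp_copy(
    data: BinaryIO,
    process: Callable[[str], T],
    suffix: str = ".pdf",
) -> T:
    """Copy `data` to a temporary file, close it, then call `process(path)`.

    Hypothetical helper: `process` stands in for process_file_with_model.
    """
    # delete=False keeps the file on disk after close(), so it can be
    # reopened by name, including by a separate process on Windows.
    tmp_file = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        try:
            tmp_file.write(data.read())
        finally:
            # Close our handle before anything else tries to open the file.
            tmp_file.close()
        return process(tmp_file.name)
    finally:
        # We opted out of automatic deletion, so clean up ourselves.
        os.unlink(tmp_file.name)


if __name__ == "__main__":
    # Stand-in for real processing: report the size of the written file.
    size = with_closed_temp_copy(io.BytesIO(b"example bytes"), os.path.getsize)
    print(size)
```

On Python 3.12 and later, NamedTemporaryFile also accepts delete_on_close=False, which keeps the context-manager cleanup while allowing the file to be closed and reopened inside the with block.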

@Falven force-pushed the feature/pdfinfo_win branch 4 times, most recently from 96a4a5b to 579c131, on July 9, 2024 at 21:35
@Falven force-pushed the feature/pdfinfo_win branch from 437f176 to 437f852 on July 9, 2024 at 21:37

Falven commented Sep 19, 2024

Could someone please review and approve? Without this fix, unstructured will not work on Windows.
