Release magic_pdf-1.0.1-released · opendatalab/MinerU

What's Changed

New API Interface
- Data-side API: the new Dataset class provides a flexible data processing framework. It currently supports images (.jpg and .png), PDFs, Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx), and covers processing tasks ranging from simple to complex.
- User-side API: the MinerU processing workflow is now structured as a series of composable Stages. Each Stage represents one processing step, so users can define new Stages as needed and combine them to customize their data processing workflows.
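The idea of composable Stages can be illustrated with a minimal sketch. This is not MinerU's actual API: `Document`, `Stage`, `compose`, and the three example stages are hypothetical names chosen for illustration, and a real Stage would operate on parsed PDF data rather than a plain dictionary.

```python
from typing import Any, Callable, Dict

# Hypothetical types for illustration only; not MinerU's real classes.
Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right into a single stage."""
    def pipeline(doc: Document) -> Document:
        for stage in stages:
            doc = stage(doc)
        return doc
    return pipeline


def split_blocks(doc: Document) -> Document:
    # Split the raw text into blocks on blank lines.
    return {**doc, "blocks": doc["text"].split("\n\n")}


def strip_blocks(doc: Document) -> Document:
    # Trim whitespace and drop blocks that end up empty.
    return {**doc, "blocks": [b.strip() for b in doc["blocks"] if b.strip()]}


def to_markdown(doc: Document) -> Document:
    # Join the cleaned blocks back into a Markdown string.
    return {**doc, "markdown": "\n\n".join(doc["blocks"])}


# A custom workflow is just a particular ordering of stages.
workflow = compose(split_blocks, strip_blocks, to_markdown)
result = workflow({"text": "Title\n\n  body text  \n\n"})
print(result["markdown"])
```

Because each stage takes a document and returns a document, a user-defined stage can be inserted anywhere in the chain without changing the others.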
Enhanced Compatibility
- By optimizing the dependency environment and configuration items, we ensure stable and efficient operation on ARM architecture Linux systems.
- We have integrated Huawei Ascend NPU acceleration, providing autonomous and controllable high-performance computing and supporting the localization and development of AI application platforms in China. See: Ascend NPU Acceleration.
Automatic Language Identification
- A new language recognition model has been introduced. Setting the lang configuration to auto during document parsing now automatically selects the appropriate OCR language model, improving the accuracy of scanned document parsing.
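The behavior of the auto setting can be sketched as a small dispatch step. Everything below is an assumption made for illustration: the model names, the language codes, `select_ocr_model`, and the toy detector are hypothetical and do not reflect MinerU's internals.

```python
from typing import Callable

# Hypothetical mapping from a language code to an OCR model name.
OCR_MODELS = {"en": "ocr-latin", "ch": "ocr-chinese", "ja": "ocr-japanese"}
DEFAULT_MODEL = "ocr-latin"


def select_ocr_model(lang: str, detect: Callable[[bytes], str], page: bytes) -> str:
    """Pick an OCR model; run language detection only when lang is 'auto'."""
    code = detect(page) if lang == "auto" else lang
    # Fall back to a default model for unknown language codes.
    return OCR_MODELS.get(code, DEFAULT_MODEL)


def fake_detector(page: bytes) -> str:
    # Toy stand-in for a real language recognition model:
    # treats any page containing the UTF-8 lead byte 0xE3 as Japanese.
    return "ja" if b"\xe3" in page else "en"


auto_choice = select_ocr_model("auto", fake_detector, "こんにちは".encode("utf-8"))
fixed_choice = select_ocr_model("ch", fake_detector, b"ignored")
print(auto_choice, fixed_choice)
```

An explicitly configured language bypasses detection entirely, so existing configurations keep their current behavior.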
Other Changes
- Added MPS acceleration on Apple silicon chips for some tasks (such as layout detection and formula detection).
- Converted the OCR model to ONNX format to improve OCR performance on ARM CPUs.

New Contributors

@IMSUVEN made their first contribution in #1281
@pangguosheng1106 made their first contribution in #1325
@beholder91 made their first contribution in #1479

Full Changelog: magic_pdf-0.10.6-released...magic_pdf-1.0.1-released
