ClusterTabNet: Supervised clustering method for table detection and table structure recognition

Description

This repository implements the deep learning model for table detection and table structure recognition described in the paper "ClusterTabNet: Supervised clustering method for table detection and table structure recognition" (https://arxiv.org/abs/2402.07502).

Requirements

The required Python packages are listed in the requirements.txt file.
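Before running the notebooks or the training script, it can help to confirm that the packages named in requirements.txt are actually installed. The following is a minimal sketch, not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parsing assumes simple `name==version` style lines rather than the full requirements-file grammar.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text: str) -> list[str]:
    """Extract distribution names from requirements-file text (simple lines only)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        absent = missing_packages(requirement_names(req_file.read_text()))
        print("Missing packages:", ", ".join(absent) if absent else "none")
    else:
        print("requirements.txt not found in the current directory")
```

This only checks that each package is present; it does not compare installed versions against the pinned ones.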

Download and Installation

For sample inference and training, see the Jupyter notebook demo.ipynb.

Download the PubTables-1M, PubTabNet, FinTabNet, SynthTabNet, and ICDAR 2019 datasets, then format them using the notebooks in the train_data_preparation folder.

To run evaluation and further training, call:
CUDA_VISIBLE_DEVICES=0 python train/table_extraction.py --output_dir=OUTPUT_DIRECTORY -t=both --ocr_labels_folder=ocr --learning_rate=0.00001 --is_use_4_points --is_use_image_patches --use_dox_datasets --eval_set='test' --checkpoint_path=model_weights/table_recognition.pth
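The same invocation can be assembled from Python, which may be convenient when looping over several output directories or checkpoints. This is a minimal sketch under stated assumptions: the function name `build_eval_command` is hypothetical, the flags are copied verbatim from the command above without interpreting them, and the script is launched only if train/table_extraction.py exists in the current directory.

```python
import os
import subprocess
import sys
from pathlib import Path

SCRIPT = "train/table_extraction.py"


def build_eval_command(output_dir: str, checkpoint_path: str) -> list[str]:
    """Build the argument list for the command shown above (flags copied as-is)."""
    return [
        sys.executable,
        SCRIPT,
        f"--output_dir={output_dir}",
        "-t=both",
        "--ocr_labels_folder=ocr",
        "--learning_rate=0.00001",
        "--is_use_4_points",
        "--is_use_image_patches",
        "--use_dox_datasets",
        "--eval_set=test",  # shell quotes around 'test' are not needed in a list
        f"--checkpoint_path={checkpoint_path}",
    ]


if __name__ == "__main__":
    command = build_eval_command("OUTPUT_DIRECTORY", "model_weights/table_recognition.pth")
    # Restrict the run to GPU 0, as CUDA_VISIBLE_DEVICES=0 does in the shell command.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
    if Path(SCRIPT).exists():
        subprocess.run(command, env=env, check=True)
    else:
        print("Script not found; command would be:", " ".join(command))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an error if the script exits with a non-zero status.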

Known Issues

No known issues.

How to obtain support

Create an issue in this repository if you find a bug or have questions about the content.

For additional support, ask a question in SAP Community.

Contributing

If you wish to contribute code or offer fixes or improvements, please send a pull request. For legal reasons, contributors will be asked to accept a Developer Certificate of Origin (DCO) when they create their first pull request to this project. This happens automatically during the submission process. SAP uses the standard DCO text of the Linux Foundation.
