Tool documentation for converting PDF to ebook
- OCR from PDF to txt:
ocrmypdf --remove-background --deskew --clean --clean-final -l ita --sidecar out.txt in.pdf out.pdf
- proofread the txt and turn it into markdown, e.g. (the % lines are the pandoc title block: title, then author):
% My Book
% Sam Smith
This is my book!
# Chapter One
Chapter one is over.
# Chapter Two
- from markdown to epub:
pandoc in.md -o out.epub
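The workflow above can be wrapped in a small script. This is a minimal sketch, assuming bash: pdf2epub, run, DRY_RUN and the BASE.ocr.pdf output name are hypothetical choices, the ocrmypdf flags are the ones from this note, and the proofreading still happens by hand between the two stages.

```shell
#!/usr/bin/env bash
# pdf2epub: hypothetical wrapper around the commands in this note (not an official tool).
#   pdf2epub ocr BASE    BASE.pdf -> BASE.ocr.pdf + BASE.txt, then copied to BASE.md
#   pdf2epub build BASE  BASE.md (after manual proofreading) -> BASE.epub
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

pdf2epub() {
  local stage=$1 base=$2
  case $stage in
    ocr)
      # Copy the sidecar text only if OCR succeeded.
      run ocrmypdf --remove-background --deskew --clean --clean-final -l ita \
        --sidecar "$base.txt" "$base.pdf" "$base.ocr.pdf" &&
        run cp "$base.txt" "$base.md"   # proofread this copy by hand
      ;;
    build)
      run pandoc "$base.md" -o "$base.epub"
      ;;
    *)
      echo "usage: pdf2epub ocr|build BASENAME" >&2
      return 2
      ;;
  esac
}
```

For example, DRY_RUN=1 pdf2epub ocr book prints the ocrmypdf and cp commands without running them; without DRY_RUN they are executed.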
https://pypi.org/project/ocrmypdf/
usage:
ocrmypdf in.pdf out.pdf
The main task of ocrmypdf is creating a mixed-mode (image + text) PDF.
For epub, use the --sidecar file.txt option
to write a text file containing the text recognized by OCR.
Decent results with:
ocrmypdf --remove-background --deskew --clean --clean-final -l ita --sidecar out.txt in.pdf out.pdf
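Some mechanical cleanup of the sidecar text can be scripted before proofreading. A minimal sketch, assuming GNU sed and assuming the sidecar separates pages with form feed characters; clean_ocr_text is a hypothetical name. Words are rejoined only when a line ends in letter + hyphen, so genuine hyphenated compounds that break at a line end are merged too and need checking during proofreading.

```shell
#!/usr/bin/env bash
# clean_ocr_text: hypothetical helper, not part of ocrmypdf.
# Reads OCR text on stdin and writes cleaned text on stdout.
# Assumes GNU sed and that pages are separated by form feeds (\f).
clean_ocr_text() {
  # Turn page separators (form feeds) into plain line breaks.
  tr '\f' '\n' |
    # Rejoin words split at line end ("paro-" + "la" -> "parola"),
    # only when a letter precedes the hyphen, so "2020-" is left alone.
    sed -e ':a' \
        -e '/[[:alpha:]]-$/N' \
        -e 's/\([[:alpha:]]\)-\n/\1/' \
        -e 'ta' |
    # Squeeze runs of empty lines into a single empty line.
    cat -s
}
```

Usage: clean_ocr_text < out.txt > in.md, then proofread in.md.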
Alternatively, via calibre: https://ostechnix.com/convert-pdf-to-epub-in-linux/
usage:
ebook-convert file.pdf file.epub --enable-heuristics
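To convert a whole folder of PDFs the same way, a loop over ebook-convert is enough. A sketch assuming bash; convert_dir, run and DRY_RUN are hypothetical names, and --enable-heuristics is the only option used, as in the command above.

```shell
#!/usr/bin/env bash
# convert_dir: hypothetical helper that runs calibre's ebook-convert on
# every *.pdf in a directory, writing NAME.epub next to NAME.pdf.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

convert_dir() {
  local dir=${1:-.} pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # no PDFs: the glob is left unexpanded
    run ebook-convert "$pdf" "${pdf%.pdf}.epub" --enable-heuristics
  done
}
```

For example, convert_dir ~/scans converts every PDF in ~/scans; DRY_RUN=1 convert_dir ~/scans only lists the commands.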
Sigil (epub editor): https://itsfoss.com/sigile-epub-editor/
gscan2pdf (scan paper documents to PDF)
For import into Google Docs or further editing (markdown to docx):
pandoc MyFile.md -f markdown -t docx -s -o MyFile.docx