Notebooks with examples of Dedoc usage


Task description

Link to the notebook

Document text preprocessing for subsequent document classification:
  • automatic detection of document format: DOC, DOCX, PDF or any image format;

  • text extraction and structuring;

  • saving the result to a JSON file.

Notebook 1

Extraction of table text and structure from images of scanned documents:
  • automatic detection of document format: PDF or any image format;

  • table extraction, including multi-page tables;

  • grouping tables by the document page on which they are located;

  • saving each page to a CSV file.

Notebook 2

ADVANCED: Extract text from scanned documents and get its location on the document image:
  • automatic detection of image format;

  • text extraction from image;

  • text location visualization;

  • text recognition confidence visualization.

Notebook 3
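The JSON-saving step of the first task (Notebook 1) can be sketched with the standard library alone. In the notebook, Dedoc produces the structured result. Here a small nested dict with hypothetical `text`/`children` keys stands in for it (this is not Dedoc's output schema), and the helper names `tree_to_text` and `save_as_json` are likewise illustrative. The sketch saves the tree to JSON and also flattens it into plain text, one plausible input for a downstream classifier.

```python
import json
import tempfile
from pathlib import Path


def tree_to_text(node: dict) -> str:
    """Depth-first walk that joins every node's text, one node per line."""
    parts = [node["text"]] if node["text"] else []
    for child in node["children"]:
        child_text = tree_to_text(child)
        if child_text:
            parts.append(child_text)
    return "\n".join(parts)


def save_as_json(tree: dict, path: Path) -> None:
    """Write the tree as UTF-8 JSON, keeping non-ASCII characters readable."""
    path.write_text(json.dumps(tree, ensure_ascii=False, indent=2), encoding="utf-8")


# Hypothetical structured result; the keys are illustrative, NOT Dedoc's schema.
document_tree = {
    "text": "Contract No. 42",
    "children": [
        {"text": "1. Subject of the contract", "children": [
            {"text": "The supplier delivers the goods.", "children": []},
        ]},
        {"text": "2. Price", "children": []},
    ],
}

out_path = Path(tempfile.gettempdir()) / "document.json"
save_as_json(document_tree, out_path)
plain_text = tree_to_text(document_tree)
print(plain_text)
```

Keeping both forms is a deliberate choice: the JSON file preserves the hierarchy for inspection, while the flattened text is what most text classifiers consume.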
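For the second task (Notebook 2), the grouping and CSV-export steps can be illustrated independently of how the tables were extracted. The sketch assumes a hypothetical list of tables, each carrying the index of the page where it starts and its cells as rows of strings; a multi-page table is assumed to be already merged into one table and attributed to its first page. `group_by_page` and `save_pages_to_csv` are hypothetical helpers, not Dedoc functions.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical extracted tables (not Dedoc's output format): the page where each
# table starts and its cells as rows of strings.
tables = [
    {"page": 0, "cells": [["Name", "Qty"], ["Bolt", "10"]]},
    {"page": 2, "cells": [["Date", "Sum"], ["01.02", "5"], ["03.02", "7"]]},  # spans pages 2-3
    {"page": 0, "cells": [["Total", "10"]]},
]


def group_by_page(tables: list[dict]) -> dict[int, list[list[list[str]]]]:
    """Collect the cell grids of all tables that start on the same page."""
    pages: dict[int, list[list[list[str]]]] = defaultdict(list)
    for table in tables:
        pages[table["page"]].append(table["cells"])
    return dict(pages)


def save_pages_to_csv(pages: dict[int, list[list[list[str]]]], out_dir: Path) -> list[Path]:
    """Write one CSV per page; tables on the same page are separated by an empty row."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page, page_tables in sorted(pages.items()):
        path = out_dir / f"page_{page}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for i, cells in enumerate(page_tables):
                if i > 0:
                    writer.writerow([])  # blank separator row between tables
                writer.writerows(cells)
        written.append(path)
    return written


out_dir = Path(tempfile.mkdtemp())
paths = save_pages_to_csv(group_by_page(tables), out_dir)
print([p.name for p in paths])
```

Putting all tables of a page into one file, separated by a blank row, mirrors the "one CSV per page" wording; one CSV per table would be an equally reasonable design if downstream tools expect a single rectangular grid per file.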
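The two visualization steps of the third task (Notebook 3) come down to drawing each recognized word's bounding box and coloring it by recognition confidence. The sketch below does this by emitting an SVG with the standard library only, so no plotting package is needed. The word list, its `(text, (x, y, width, height), confidence)` layout and the helper names are assumptions for illustration; the way Dedoc itself reports text location and confidence may be structured differently.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

# Hypothetical OCR output (not Dedoc's annotation format): word text, bounding box
# in pixels as (x, y, width, height), and recognition confidence in [0, 1].
words = [
    ("Invoice", (40, 30, 120, 28), 0.97),
    ("No.", (170, 30, 45, 28), 0.88),
    ("4S7", (225, 30, 60, 28), 0.41),
]


def confidence_color(conf: float) -> str:
    """Map confidence linearly from red (0.0) to green (1.0) as a hex color."""
    conf = min(max(conf, 0.0), 1.0)
    red = round(255 * (1 - conf))
    green = round(255 * conf)
    return f"#{red:02x}{green:02x}00"


def words_to_svg(words, page_width: int, page_height: int) -> str:
    """Draw each word's box, filled by confidence, with the recognized text inside."""
    shapes = []
    for text, (x, y, w, h), conf in words:
        shapes.append(
            f'<rect x="{x}" y="{y}" width="{w}" height="{h}" '
            f'fill="{confidence_color(conf)}" fill-opacity="0.4" stroke="black"/>'
            f'<text x="{x + 2}" y="{y + h - 6}" font-size="{h // 2}">{escape(text)}</text>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{page_width}" height="{page_height}">'
        + "".join(shapes)
        + "</svg>"
    )


svg = words_to_svg(words, page_width=400, page_height=100)
out_path = Path(tempfile.gettempdir()) / "ocr_confidence.svg"
out_path.write_text(svg, encoding="utf-8")
```

Opening the file in a browser shows low-confidence words (here the hypothetical misread `4S7`) in red and confident ones in green; drawing the same rectangles over the page image itself, for example with an imaging library, would be the natural next step.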