Dedoc installation

There are two ways to install and run dedoc as a web application or a library: using docker or using pip. Both are described below.

Install and run dedoc using docker

You need git and docker installed to run dedoc this way. This method is more flexible because it doesn’t depend on the operating system or on other limitations of the user’s environment. However, docker must be installed and configured properly.

  1. Clone the repository

git clone https://github.com/ispras/dedoc
  2. Go to the dedoc directory

cd dedoc
  3. Build the image and run the application

docker-compose up --build

If you need to change some application settings, update config.py according to your needs and rebuild the image.

If you don’t need to change the application configuration, you can use the prebuilt docker image instead.

  1. Pull the image

docker pull dedocproject/dedoc
  2. Run the container

docker run -p 1231:1231 --rm dedocproject/dedoc python3 /dedoc_root/dedoc/main.py
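Once the container is started, it can be useful to confirm that the application answers on the published port. The following is a minimal sketch, not part of dedoc itself: the helper name wait_for_dedoc, the retry count, and the timeout are hypothetical, and it assumes curl is installed and that the application is reached at http://localhost:1231, as published by the command above.

```shell
#!/bin/sh
# Hypothetical helper (not provided by dedoc): poll the published port
# until the web application sends back any HTTP response.
# Assumptions: curl is installed; URL and retry count are illustrative defaults.
wait_for_dedoc() {
    url="${1:-http://localhost:1231}"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: no progress output, -o /dev/null: discard the body,
        # --max-time 2: give up on a single attempt after two seconds.
        if curl -s -o /dev/null --max-time 2 "$url"; then
            echo "dedoc is reachable at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "dedoc did not respond at $url after $tries attempts" >&2
    return 1
}

# Example usage (commented out so that sourcing this file has no side effects):
# wait_for_dedoc http://localhost:1231 30
```

The helper only checks that something responds over HTTP at that address. It does not verify that the response comes from dedoc or that document parsing works.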

See Docker Hub for more information about the available dedoc images.

Install dedoc using pip

If you don’t want to use docker to run the application, you can run dedoc locally. However, this method does not work on every operating system (Ubuntu 20+ is recommended), and the machine may not have enough resources to run it. You should have python (python3.8+ is recommended) and pip installed.

  1. Install libreoffice and djvulibre-bin packages:

sudo apt-get install -y libreoffice djvulibre-bin

These packages are used by converters (doc, odt to docx; xls, ods to xlsx; ppt, odp to pptx; djvu to pdf). If you don’t need converters, you can skip this step.

  2. Install the Tesseract OCR 5 framework. You can follow any tutorial for this purpose or look here for an example of installing Tesseract for the dedoc container.

  3. Install the dedoc library via pip. To fulfil all the library requirements, you should have torch~=1.11.0 and torchvision~=0.12.0 installed. You can install the versions of these libraries that suit your environment yourself and then install dedoc using the pip command:

pip install dedoc

Or you can install dedoc with torch and torchvision included:

pip install "dedoc[torch]"
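The prerequisites from the steps above can be checked before running dedoc locally. The following is a minimal sketch under stated assumptions, not an official dedoc tool: all function names are hypothetical, and the command names soffice, ddjvu, and tesseract are assumptions about what the libreoffice, djvulibre-bin, and Tesseract packages put on PATH. The version check treats torch~=1.11.0 as "any 1.11.x release", which is how pip interprets a compatible-release specifier.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for a local (non-docker) dedoc setup.

# Print "found"/"missing" for each command name given as an argument.
# Returns success only if every command is available on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Extract the major version from the output of `tesseract --version`.
# Assumes the first line looks like "tesseract 5.3.0" (an optional "v" is allowed).
tesseract_major() {
    printf '%s\n' "$1" |
        sed -n '1s/^tesseract[[:space:]]*v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

# Compatible-release check: succeeds if version $1 starts with "<prefix>.",
# e.g. prefix 1.11 accepts 1.11.0 and 1.11.0+cu113 but rejects 1.12.0.
matches_compatible() {
    case "$1" in
        "$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (commented out so that sourcing this file has no side effects):
# check_tools soffice ddjvu tesseract
# tesseract_major "$(tesseract --version 2>&1)"
# matches_compatible "$(python3 -c 'import torch; print(torch.__version__)')" 1.11
# matches_compatible "$(python3 -c 'import torchvision; print(torchvision.__version__)')" 0.12
```

If the converters are not needed, the soffice and ddjvu checks can be dropped, since the corresponding packages are optional.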