dedoc.converters

class dedoc.converters.AbstractConverter(*, config: dict)

This class provides the common methods for all converters: can_convert() and do_convert().

__init__(*, config: dict) → None
Parameters:

config – configuration of the converter, e.g. a logger to use for logging

abstract can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Check whether this converter can convert a file with the given extension and MIME type.

Parameters:
  • extension – file extension, for example .doc or .pdf

  • mime – MIME type of the file

  • parameters – any additional parameters for the given document

Returns:

True if this converter can convert the file, False otherwise

abstract do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert the given file to another format if possible. Call this method only on appropriate files: ensure that can_convert() returns True for the given file. If the file format is unsupported, a ConversionException is raised.

Parameters:
  • tmp_dir – directory where the original file is located and where the result will be saved

  • filename – name of the original file without extension

  • extension – extension of the original file

Returns:

name of the converted file
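A minimal sketch of the converter contract, assuming the interface exactly as documented above. So that it runs without dedoc installed, the code defines a local stand-in named AbstractConverter with the same signatures instead of importing dedoc's class. MarkdownConverter, the "logger" config key and the convention of returning the new file name together with its extension are assumptions made for illustration.

```python
import logging
import os
from abc import ABC, abstractmethod
from typing import Optional


class AbstractConverter(ABC):
    """Local stand-in mirroring the documented interface (not dedoc's own class)."""

    def __init__(self, *, config: dict) -> None:
        self.config = config
        # Assumption: a logger, if given, is stored under the "logger" key.
        self.logger = config.get("logger", logging.getLogger(__name__))

    @abstractmethod
    def can_convert(self, extension: str, mime: str, parameters: Optional[dict] = None) -> bool:
        ...

    @abstractmethod
    def do_convert(self, tmp_dir: str, filename: str, extension: str) -> str:
        ...


class MarkdownConverter(AbstractConverter):
    """Hypothetical converter that turns .md files into .txt by renaming them."""

    def can_convert(self, extension: str, mime: str, parameters: Optional[dict] = None) -> bool:
        # Decide by extension only; the MIME type is ignored in this sketch.
        return extension.lower() in {".md", ".markdown"}

    def do_convert(self, tmp_dir: str, filename: str, extension: str) -> str:
        src = os.path.join(tmp_dir, filename + extension)
        dst = os.path.join(tmp_dir, filename + ".txt")
        os.rename(src, dst)
        self.logger.info("renamed %s to %s", src, dst)
        # Assumption: the returned name includes the new extension.
        return filename + ".txt"
```

A caller checks can_convert() first and calls do_convert() only when it returns True, as the description of do_convert() requires; the abstract stand-in itself cannot be instantiated.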

class dedoc.converters.FileConverterComposition(converters: List[AbstractConverter])

This class converts a document into one of a predefined set of formats, according to the available list of converters. The list of converters is set via the class constructor. The first suitable converter is used (the first one whose can_convert() method returns True), so the order of converters is important.

__init__(converters: List[AbstractConverter]) → None
Parameters:

converters – the list of converters that implement can_convert() and do_convert(); they are used to convert files into the specified formats

do_converting(tmp_dir: str, filename: str, parameters: dict | None = None) → str

Convert the file if there is a converter that can do it. If no converter is able to convert the file, the file is left unchanged.

Parameters:
  • tmp_dir – the directory where the file is located and where the converted file will be saved

  • filename – the name of the file to convert

  • parameters – parameters of the conversion

Returns:

the name of the converted file if a conversion was performed, otherwise the name of the original file
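The first-match rule can be sketched as a plain function. This is a simplified illustration rather than dedoc's implementation: it assumes that filename includes its extension, splits it with os.path.splitext, and guesses the MIME type from the name with the standard mimetypes module (dedoc may detect MIME types differently).

```python
import mimetypes
import os
from typing import List, Optional, Protocol


class Converter(Protocol):
    """Structural type for any object with the documented converter methods."""

    def can_convert(self, extension: str, mime: str, parameters: Optional[dict] = None) -> bool: ...

    def do_convert(self, tmp_dir: str, filename: str, extension: str) -> str: ...


def do_converting(converters: List[Converter], tmp_dir: str, filename: str,
                  parameters: Optional[dict] = None) -> str:
    """Simplified first-match dispatch (a sketch, not dedoc's actual code)."""
    name, extension = os.path.splitext(filename)
    # Assumption: the MIME type is guessed from the file name alone.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    for converter in converters:
        # Order matters: the first converter that accepts the file wins.
        if converter.can_convert(extension, mime, parameters):
            return converter.do_convert(tmp_dir, name, extension)
    # No suitable converter: the file is left unchanged.
    return filename
```

Because the loop returns on the first match, placing a more specific converter before a general one is how the order of converters takes effect.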

class dedoc.converters.BinaryConverter(*, config: dict)

Bases: AbstractConverter

Converts image-like documents with mime=application/octet-stream into PNG. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is image-like (e.g. it has a .bmp, .jpg or .tiff extension) and has mime=application/octet-stream.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert image-like documents with mime=application/octet-stream into files with the .png extension.
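The BinaryConverter check combines two conditions. A sketch, assuming a hypothetical IMAGE_LIKE_EXTENSIONS set built from the examples in the text plus obvious spelling variants (dedoc keeps its own, possibly longer, list):

```python
from typing import Optional

# Hypothetical extension set; dedoc's real list of image-like extensions may differ.
IMAGE_LIKE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".tif", ".tiff"}


def is_image_like(extension: str) -> bool:
    # Compare case-insensitively so ".JPG" is treated like ".jpg".
    return extension.lower() in IMAGE_LIKE_EXTENSIONS


def binary_can_convert(extension: str, mime: str, parameters: Optional[dict] = None) -> bool:
    # Both must hold: an image-like extension and the generic binary MIME type.
    return is_image_like(extension) and mime == "application/octet-stream"
```

The PNGConverter described below would correspond to is_image_like() alone, without the MIME condition.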

class dedoc.converters.DocxConverter(*, config: dict)

Bases: AbstractConverter

Converts docx-like documents into DOCX using the soffice application. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is docx-like, e.g. it has a .doc, .rtf or .odt extension.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert docx-like documents into files with the .docx extension using the soffice application.
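The soffice step can be sketched with subprocess. The --headless, --convert-to and --outdir options are standard LibreOffice command-line flags, but the exact command, timeout and error handling dedoc uses are not documented here, so soffice_command() and convert_with_soffice() are hypothetical helpers.

```python
import os
import shutil
import subprocess
from typing import List


def soffice_command(tmp_dir: str, filename: str, extension: str, target: str) -> List[str]:
    """Build a headless LibreOffice conversion command, e.g. with target="docx"."""
    source = os.path.join(tmp_dir, filename + extension)
    return ["soffice", "--headless", "--convert-to", target, "--outdir", tmp_dir, source]


def convert_with_soffice(tmp_dir: str, filename: str, extension: str, target: str,
                         timeout: float = 60.0) -> str:
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice is not installed or not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired if soffice hangs.
    subprocess.run(soffice_command(tmp_dir, filename, extension, target),
                   check=True, timeout=timeout, capture_output=True)
    # soffice writes <filename>.<target> into the --outdir directory.
    converted = f"{filename}.{target}"
    if not os.path.exists(os.path.join(tmp_dir, converted)):
        raise RuntimeError(f"soffice did not produce {converted}")
    return converted
```

The same helper would serve ExcelConverter and PptxConverter with target "xlsx" or "pptx".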

class dedoc.converters.ExcelConverter(*, config: dict)

Bases: AbstractConverter

Converts xlsx-like documents into XLSX using the soffice application. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is xlsx-like, e.g. it has an .xls or .ods extension.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert xlsx-like documents into files with the .xlsx extension using the soffice application.

class dedoc.converters.PptxConverter(*, config: dict)

Bases: AbstractConverter

Converts pptx-like documents into PPTX using the soffice application. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is pptx-like, e.g. it has a .ppt or .odp extension.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert pptx-like documents into files with the .pptx extension using the soffice application.
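Taken together, DocxConverter, ExcelConverter and PptxConverter amount to a lookup from source extension to soffice target format. A sketch limited to the extensions named in their descriptions; SOFFICE_TARGETS and soffice_target() are hypothetical, and dedoc's real extension lists may be longer.

```python
import os
from typing import Optional

# Only the example extensions from the descriptions above; dedoc may accept more.
SOFFICE_TARGETS = {
    ".doc": "docx", ".rtf": "docx", ".odt": "docx",  # DocxConverter
    ".xls": "xlsx", ".ods": "xlsx",                  # ExcelConverter
    ".ppt": "pptx", ".odp": "pptx",                  # PptxConverter
}


def soffice_target(filename: str) -> Optional[str]:
    """Return the target format for a file name, or None if none of the three applies."""
    _, extension = os.path.splitext(filename)
    return SOFFICE_TARGETS.get(extension.lower())
```

Files that are already in a target format (for example report.docx) map to None, so no soffice conversion is attempted for them.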

class dedoc.converters.PDFConverter(*, config: dict)

Bases: AbstractConverter

Converts pdf-like documents into PDF using the ddjvu application. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is pdf-like, e.g. it has a .djvu extension.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert pdf-like documents into files with the .pdf extension using the ddjvu application.
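A sketch of the ddjvu call, assuming DjVuLibre's ddjvu is installed. The -format=pdf option selects ddjvu's PDF output; ddjvu_command() and convert_djvu_to_pdf() are hypothetical helpers, not dedoc's code, and returning the name with its new extension is an assumption.

```python
import os
import subprocess
from typing import List


def ddjvu_command(tmp_dir: str, filename: str, extension: str) -> List[str]:
    """Build the command that renders <filename><extension> as <filename>.pdf."""
    source = os.path.join(tmp_dir, filename + extension)
    target = os.path.join(tmp_dir, filename + ".pdf")
    return ["ddjvu", "-format=pdf", source, target]


def convert_djvu_to_pdf(tmp_dir: str, filename: str, extension: str) -> str:
    # check=True raises CalledProcessError if ddjvu exits with a non-zero status.
    subprocess.run(ddjvu_command(tmp_dir, filename, extension), check=True)
    # Assumption: the returned name includes the new extension.
    return filename + ".pdf"
```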

class dedoc.converters.PNGConverter(*, config: dict)

Bases: AbstractConverter

Converts image-like documents into PNG. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is image-like, e.g. it has a .bmp, .jpg or .tiff extension.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert image-like documents into files with the .png extension.

class dedoc.converters.TxtConverter(*, config: dict)

Bases: AbstractConverter

Converts txt-like documents into TXT by simply renaming them. See the AbstractConverter documentation for information about the methods’ parameters.

can_convert(extension: str, mime: str, parameters: dict | None = None) → bool

Checks whether the document is txt-like, e.g. it has an .xml extension.

do_convert(tmp_dir: str, filename: str, extension: str) → str

Convert txt-like documents into files with the .txt extension by renaming them.