Configuring the structure type

Parameters for configuring the structure type

Parameter

Possible values

Default value

Where it can be used

Description

document_type

other, law, tz, diploma, article, fintoc

other

Type of the document structure, chosen according to the document's domain. If you use the default manager configuration for DedocManager, the available options are the values listed above: other, law, tz, diploma, article, fintoc.

If you use a custom configuration, see the documentation of StructureExtractorComposition.

patterns

A list of patterns based on AbstractPattern, a list of pattern dictionaries, or a list of dictionaries converted to a string

None

This parameter is used only by DefaultStructureExtractor (document_type="other"). It configures the default document structure; see "Configure structure extraction using patterns" for more details.

structure_type

tree, linear

tree

The type of the output document representation. If you use the default manager configuration for DedocManager, the following options are available:

  • tree – the document is represented as a hierarchical structure in which nodes are document lines or paragraphs, and child nodes have a greater hierarchy level than their parents, according to the level found by the structure extractor. In this case, TreeConstructor is used to construct the structure.

  • linear – the document is represented as a tree whose root is an empty node, and all document lines are children of the root. In this case, LinearConstructor is used to construct the structure.

If you use a custom configuration, see the documentation of StructureConstructorComposition.
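
The difference between the two structure_type values can be illustrated with a small toy model. The sketch below is not the code of TreeConstructor or LinearConstructor. The Node class and the functions build_tree, build_linear and build_structure are hypothetical names introduced only for this illustration, and each line is assumed to already carry an integer hierarchy level of 1 or more, as a structure extractor might assign.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical input format for this sketch: (hierarchy level, line text).
Line = Tuple[int, str]


@dataclass
class Node:
    """Toy node; not a dedoc class."""
    text: str
    level: int
    children: List["Node"] = field(default_factory=list)


def build_linear(lines: List[Line]) -> Node:
    """Empty root; every line becomes a direct child of the root."""
    root = Node(text="", level=0)
    root.children = [Node(text=text, level=level) for level, text in lines]
    return root


def build_tree(lines: List[Line]) -> Node:
    """Attach each line to the nearest preceding line with a smaller level."""
    root = Node(text="", level=0)
    stack = [root]  # path from the root to the most recently added node
    for level, text in lines:
        node = Node(text=text, level=level)
        # Drop candidates that cannot be a parent (same or greater level).
        # The root always stays on the stack.
        while len(stack) > 1 and stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def build_structure(lines: List[Line], structure_type: str = "tree") -> Node:
    """Pick a builder by name, mirroring the tree/linear choice in the table."""
    builders: Dict[str, Callable[[List[Line]], Node]] = {
        "tree": build_tree,
        "linear": build_linear,
    }
    if structure_type not in builders:
        raise ValueError(f"unknown structure_type: {structure_type!r}")
    return builders[structure_type](lines)


lines = [
    (1, "Chapter 1"),
    (2, "Section 1.1"),
    (2, "Section 1.2"),
    (1, "Chapter 2"),
]
tree = build_structure(lines, "tree")
linear = build_structure(lines, "linear")
```

In this toy model, tree places both sections under "Chapter 1" and keeps the two chapters directly under the root, while linear puts all four lines directly under the empty root. The actual dedoc constructors may treat line types and levels differently.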