GOST frame handling

Parameters for GOST frame handling

Parameter

Possible values

Default value

Where can be used

Description

need_gost_frame_analysis

True, False

False

This option enables recognition of GOST frames (Russian government standard “ГОСТ Р 21.1101”) in PDF documents and images.

In some technical documents, the content of each page is placed inside a special GOST frame (see Examples of GOST frame below). Such frames contain meta-information and are not part of the document's text content. For this reason, GOST frames can be ignored during extraction; this works for PDF documents and images.

If need_gost_frame_analysis=True, the GOST frame itself is ignored and only the content inside the frame is extracted.

Examples of GOST frame

For example, suppose you send a PDF document with two pages:

../_images/page_with_gost_frame_1.png ../_images/page_with_gost_frame_2.png

Parameter usage

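import requests

# Placeholder path to the PDF to upload; replace it with your own file.
filename = "document_with_gost_frame.pdf"

# Form fields sent with the file; need_gost_frame_analysis enables GOST frame handling.
data = {
    "pdf_with_text_layer": "auto_tabby",
    "need_gost_frame_analysis": "true",
    "return_format": "html"
}
with open(filename, "rb") as file:
    files = {"file": (filename, file)}
    # Send the document to the upload endpoint.
    r = requests.post("http://localhost:1231/upload", files=files, data=data)
    # The response body holds the result in the requested format (HTML here).
    result = r.content.decode("utf-8")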
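
When the option has to be toggled from code, it can help to build the form fields in one place. The following is a minimal sketch, not part of the API: the helper name build_gost_params is hypothetical, and it assumes that the string "false" is accepted for need_gost_frame_analysis in the same way as "true" is in the request above.

```python
def build_gost_params(analyze_gost_frame: bool, return_format: str = "html") -> dict:
    """Build the form fields for an upload request (hypothetical helper).

    The boolean flag is converted to a lowercase string, mirroring the
    "true" value used in the request above. Sending "false" to disable
    the option is an assumption.
    """
    return {
        "pdf_with_text_layer": "auto_tabby",
        "need_gost_frame_analysis": "true" if analyze_gost_frame else "false",
        "return_format": return_format,
    }


# Fields for a request with GOST frame handling enabled.
params = build_gost_params(True)
print(params["need_gost_frame_analysis"])
```

The returned dictionary can be passed as the data argument of requests.post in place of the hand-written data dictionary.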

Request result

../_images/result_gost_frame.png