GOST frame handling
| Parameter | Possible values | Default value | Where can be used | Description |
|---|---|---|---|---|
| need_gost_frame_analysis | True, False | False | | This option is used to enable GOST (Russian government standard “ГОСТ Р 21.1101”) frame recognition for PDF documents or images. |
The content of each page of some technical documents is placed inside special GOST frames (see Examples of GOST frame below). Such frames contain meta-information and are not part of the document's text content. For this reason, we have implemented functionality for ignoring GOST frames in documents. It works for:

- Copyable PDF documents (dedoc.readers.PdfTxtlayerReader and dedoc.readers.PdfTabbyReader);
- Non-copyable PDF documents and images (dedoc.readers.PdfImageReader).
If the parameter need_gost_frame_analysis=True, the GOST frame itself is ignored and only the content inside the frame is extracted.
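To make the idea of "keeping only the content inside the frame" concrete, here is a minimal geometric sketch. It assumes the content area enclosed by the frame has already been detected as a rectangle and that each text line comes with a bounding box. The `BBox` type, the `keep_inside_frame` function and all coordinates are hypothetical illustrations, not part of dedoc's API, and this is not how dedoc's readers are implemented internally.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned box in page coordinates (x grows right, y grows down)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "BBox") -> bool:
        # True if `other` lies entirely within this box
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


def keep_inside_frame(lines: List[Tuple[str, BBox]], content_area: BBox) -> List[str]:
    """Keep only text lines that lie fully inside the frame's content area."""
    return [text for text, box in lines if content_area.contains(box)]


# Hypothetical page: one body line inside the frame, one frame-annotation line outside it
page_lines = [
    ("1. General provisions", BBox(60, 40, 400, 55)),
    ("Sheet 1 of 2", BBox(450, 800, 560, 815)),
]
content_area = BBox(50, 20, 580, 760)
print(keep_inside_frame(page_lines, content_area))  # ['1. General provisions']
```

Real frame detection (finding the frame lines on a scanned or rendered page) is the hard part and is not shown here; the sketch only illustrates the filtering step once the frame is known.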
Examples of GOST frame
For example, suppose you send the following two-page PDF document:
(Figure: PDF document with two pages)
Parameter’s usage
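```python
import requests

filename = "example.pdf"  # path to your document (placeholder name)

data = {
    "pdf_with_text_layer": "auto_tabby",
    "need_gost_frame_analysis": "true",  # ignore GOST frames, keep only their contents
    "return_format": "html"
}

# Upload the file to a running dedoc server
with open(filename, "rb") as file:
    files = {"file": (filename, file)}
    r = requests.post("http://localhost:1231/upload", files=files, data=data)

result = r.content.decode("utf-8")
```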
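To see what the parameter changes, one option is to send the same file twice, once with need_gost_frame_analysis enabled and once disabled, and compare the two results. The sketch below assumes a dedoc server at the same address as in the request example; `build_form_data` and `parse_document` are hypothetical helper names, and sending the flag as the lowercase strings "true"/"false" simply mirrors the request example. No network call is made until `parse_document` is invoked.

```python
import requests

DEDOC_URL = "http://localhost:1231/upload"  # same address as in the request example


def build_form_data(need_gost_frame_analysis: bool, return_format: str = "html") -> dict:
    """Build the form fields for /upload; the flag is sent as a lowercase string."""
    return {
        "pdf_with_text_layer": "auto_tabby",
        "need_gost_frame_analysis": "true" if need_gost_frame_analysis else "false",
        "return_format": return_format,
    }


def parse_document(filename: str, need_gost_frame_analysis: bool) -> str:
    """Send `filename` to the dedoc server and return the decoded response body."""
    with open(filename, "rb") as file:
        files = {"file": (filename, file)}
        data = build_form_data(need_gost_frame_analysis)
        r = requests.post(DEDOC_URL, files=files, data=data)
    r.raise_for_status()  # fail loudly on HTTP errors instead of decoding an error page
    return r.content.decode("utf-8")


# Usage (requires a running dedoc server and a local file; "example.pdf" is a placeholder):
# with_frames = parse_document("example.pdf", need_gost_frame_analysis=False)
# without_frames = parse_document("example.pdf", need_gost_frame_analysis=True)
```

Keeping the form construction in a separate function makes it easy to check the request parameters without a server.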
Request’s result