Types of textual lines
Each reader returns an UnstructuredDocument containing textual lines.
Readers don't fill the hierarchy_level metadata field (structure extractors do this), but they can fill tag_hierarchy_level
with information about line types.
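The split between readers and structure extractors can be illustrated with a minimal Python sketch. The classes `LevelInfo`, `LineMetadata`, `Line` and the function `extract_structure` are hypothetical stand-ins written for this illustration, not the library's API, and the level numbers in `LEVELS` are arbitrary assumptions. The "reader" side sets only `tag_hierarchy_level`; the toy structure extractor then derives `hierarchy_level` from it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the concepts above; NOT the library's real classes.
@dataclass
class LevelInfo:
    line_type: str                 # e.g. "header", "list_item", "unknown", "key"
    level_1: Optional[int] = None
    level_2: Optional[int] = None


@dataclass
class LineMetadata:
    tag_hierarchy_level: Optional[LevelInfo] = None  # may be filled by a reader
    hierarchy_level: Optional[LevelInfo] = None      # filled later by a structure extractor


@dataclass
class Line:
    text: str
    metadata: LineMetadata = field(default_factory=LineMetadata)


# Assumed, illustrative level numbers per line type.
LEVELS = {"header": (1, 0), "list_item": (2, 0), "key": (3, 0)}


def extract_structure(lines: List[Line]) -> List[Line]:
    """Toy structure extractor: derive hierarchy_level from the reader's tag_hierarchy_level."""
    for line in lines:
        tag = line.metadata.tag_hierarchy_level
        # A line without a tag (reader gave no information) is treated as "unknown".
        line_type = tag.line_type if tag is not None else "unknown"
        level_1, level_2 = LEVELS.get(line_type, (None, None))
        line.metadata.hierarchy_level = LevelInfo(line_type, level_1, level_2)
    return lines


lines = [
    Line("Introduction", LineMetadata(tag_hierarchy_level=LevelInfo("header"))),
    Line("1. First item", LineMetadata(tag_hierarchy_level=LevelInfo("list_item"))),
    Line("Plain text"),  # the reader left tag_hierarchy_level empty
]
for line in extract_structure(lines):
    print(line.metadata.hierarchy_level.line_type)  # prints header, list_item, unknown
```

The reader's tag is left untouched; only the extractor writes `hierarchy_level`, mirroring the division of responsibilities described above.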
The readers that can return a non-empty tag_hierarchy_level
in the metadata of document lines are listed below:
+ means that the reader can return lines of this type.
- means that the reader doesn't return lines of this type due to the complexity of the task or a lack of information provided by the format.
Reader | header | list_item | unknown | key
-------+--------+-----------+---------+-----
       |   +    |     +     |    +    |  -
       |   +    |     +     |    +    |  -
       |   +    |     +     |    +    |  -
       |   -    |     -     |    +    |  -
       |   -    |     +     |    +    |  +
       |   -    |     -     |    +    |  -
       |   +    |     +     |    +    |  -
       |   -    |     -     |    +    |  -
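To compare a particular document against the columns of this table, one option is to collect the tag_hierarchy_level line type of each line and mark which of the four types occur. The sketch below assumes those line types have already been gathered into a plain list of strings (with `None` for lines whose tag_hierarchy_level is empty); `line_type_profile` is a hypothetical helper, not part of any library. A `+` confirms that the reader produced that type for this document, while a `-` only means the document has no such line, not that the reader cannot produce it.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Columns of the table above.
LINE_TYPES = ("header", "list_item", "unknown", "key")


def line_type_profile(tag_types: Iterable[Optional[str]]) -> Dict[str, str]:
    """Mark each line type with '+' if at least one line has it, else '-'.

    `tag_types` holds the tag_hierarchy_level line types of a document's lines;
    None means the reader left tag_hierarchy_level empty for that line.
    """
    counts = Counter(t for t in tag_types if t is not None)
    return {t: "+" if counts[t] > 0 else "-" for t in LINE_TYPES}


print(line_type_profile(["header", "unknown", None, "list_item"]))
# {'header': '+', 'list_item': '+', 'unknown': '+', 'key': '-'}
```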