Contents
The PageXML document model and parsing PageXML files
Reading and analysing archives with many PageXML files
Text search in PageXML files
Text search in line format files
Analysing characteristics of an entire PageXML archive
Analysing scan characteristics: Checking quality
Analysing scan characteristics: Comparing subsets
Analysing individual inventories
Turning text lines into running text elements
Sorting text regions as columns or as rows
Restructuring PageXML documents