pagexml.analysis package
Submodules
- pagexml.analysis.layout_stats module
  - average_baseline_height()
  - categorise_line_width()
  - compute_baseline_distances()
  - compute_bounding_box_distances()
  - compute_columns_stats()
  - compute_height_stats()
  - compute_lines_stats()
  - compute_pages_stats()
  - compute_pagexml_stats()
  - compute_points_distances()
  - compute_scans_stats()
  - compute_textregion_distance()
  - compute_textregions_stats()
  - find_line_width_boundary_points()
  - find_lowest_point()
  - get_baseline_y()
  - get_bottom_points()
  - get_boundary_width_ranges()
  - get_line_distances()
  - get_line_height_stats()
  - get_line_width_stats()
  - get_line_widths()
  - get_text_heights()
  - get_textregion_avg_char_width()
  - get_textregion_avg_line_distance()
  - get_textregion_avg_line_width()
  - get_textregion_line_distances()
  - interpolate_baseline_points()
  - interpolate_points()
  - line_starts_with_big_capital()
  - sort_coords_above_below_baseline()
- pagexml.analysis.stats module
- pagexml.analysis.text_stats module
  - LineAnalyser
  - LineCharAnalyser
  - LineWordAnalyser
  - WordBreakDetector
  - compute_complement_keyness()
  - compute_expected()
  - compute_keyness()
  - compute_log_likelihood()
  - determine_word_break()
  - determine_word_break_typical_merge_end()
  - end_is_common_word()
  - end_start_are_bigram()
  - end_start_are_hyphenated_compound()
  - get_doc_words()
  - get_keyness_vocab()
  - get_line_text()
  - get_observed()
  - get_typical_start_end_words()
  - get_word_cat_stats()
  - get_words_per_line()
  - has_common_merge_end()
  - has_non_merge_word()
  - has_word_break_symbol()
  - is_non_mid_word()
  - make_line_analyser()
  - merge_analysers()
  - merge_is_more_common()
  - show_word_break_context()
  - start_is_titleword()
  - start_word_has_incorrect_titlecase()