Module 3: Text extraction for search and analysis
How to extract text from Akoma Ntoso XML documents for use in full-text search indexing and machine learning analysis.
In this module we'll cover the following:
Why extracting text is important
The basics of extracting text
Extracting text for specific document portions or provisions (e.g. chapters, sections)
We will use:
Python and the XML library
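As a rough preview of the two extraction tasks listed above, here is a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree, which may not be the XML library this module uses, and a small made-up Akoma Ntoso 3.0 fragment. The helper names extract_text and extract_portion are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace (assumption: the documents use version 3.0).
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

# A tiny invented document, used only to make the sketch runnable.
SAMPLE = f"""<akomaNtoso xmlns="{AKN_NS}">
  <act>
    <body>
      <section eId="sec_1">
        <num>1.</num>
        <heading>Short title</heading>
        <content><p>This Act is the <term>Example</term> Act.</p></content>
      </section>
      <section eId="sec_2">
        <num>2.</num>
        <heading>Commencement</heading>
        <content><p>This Act commences on publication.</p></content>
      </section>
    </body>
  </act>
</akomaNtoso>"""


def extract_text(element):
    """Join every text node under `element` and collapse whitespace.

    Text nodes are joined with a space, so inline markup that splits a
    single word would introduce an unwanted space. That is acceptable
    for a sketch, but worth checking on real documents.
    """
    return " ".join(" ".join(element.itertext()).split())


def extract_portion(root, eid):
    """Return the text of the first element whose eId equals `eid`, or None."""
    for el in root.iter():
        if el.get("eId") == eid:
            return extract_text(el)
    return None


root = ET.fromstring(SAMPLE)
full_text = extract_text(root)              # text of the whole document
section_2 = extract_portion(root, "sec_2")  # text of a single provision
print(full_text)
print(section_2)
```

Looking up a portion by its eId attribute avoids writing namespace-qualified paths, which keeps the sketch short. A namespace-aware XPath query would be a reasonable alternative.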
The full code for this module is available on Google Colab: