Laws.Africa Developer Guide


Module 3: Text extraction for search and analysis

This module shows how to extract text from Akoma Ntoso XML documents for use in full-text search indexing and machine learning analysis.

In this module we'll cover the following:

Why extracting text is important
The basics of extracting text
Extracting text for specific document portions or provisions (e.g. chapters, sections)

We will use:

Python and the lxml XML library
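
As a rough preview of the approach, here is a minimal sketch of pulling text out of an Akoma Ntoso document with lxml. The sample XML, the `extract_text` helper and the `sec_2` identifier are made up for illustration and are not taken from the module's Colab notebook; real documents are far richer, and joining text nodes with spaces may need refining for inline markup.

```python
from lxml import etree

# Hypothetical, heavily simplified Akoma Ntoso document (illustration only).
AKN_XML = b"""<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act>
    <body>
      <section eId="sec_1">
        <num>1.</num>
        <heading>Short title</heading>
        <content><p>This Act may be cited as the Example Act.</p></content>
      </section>
      <section eId="sec_2">
        <num>2.</num>
        <heading>Commencement</heading>
        <content><p>This Act commences on publication.</p></content>
      </section>
    </body>
  </act>
</akomaNtoso>"""

# Akoma Ntoso 3.0 namespace, bound to a prefix for use in XPath queries.
NS = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}


def extract_text(element):
    """Join every text node under `element`, collapsing runs of whitespace.

    Assumed helper for this sketch, not part of lxml or the guide.
    """
    return " ".join(" ".join(element.itertext()).split())


root = etree.fromstring(AKN_XML)

# Text of the whole body, e.g. for a full-text search index.
body = root.find(".//akn:body", NS)
full_text = extract_text(body)

# Text of a single provision, selected by its eId attribute.
section = root.xpath('//akn:section[@eId="sec_2"]', namespaces=NS)[0]
section_text = extract_text(section)

print(full_text)
print(section_text)
```

The same helper serves both cases: only the element passed to it changes, which is why selecting the right portion of the document is treated as a separate step from extracting its text.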

The full code for this module is available on Google Colab: https://colab.research.google.com/drive/1BEPG5abvOKoCB5zmHWDUBNNPIof1cE0M

