Extracting text for analysis and machine learning
How to extract text from all portions of a document, for use in analysis or machine learning.
In this section
Extracting text from different portions of a document
Recursive text extract using the Table of Contents
Fetch the Table of Contents (TOC)
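import json
from urllib.request import Request, urlopen

# TOKEN is assumed to hold your Laws.Africa API token; set it before running
url = "https://api.laws.africa/v3/akn/za-cpt/act/by-law/2014/control-undertakings-liquor/eng/toc.json"
request = Request(url, headers={"Authorization": f"Token {TOKEN}"})
toc = json.loads(urlopen(request).read())['toc']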
toc
# [{'type': 'preface',
# 'component': 'main',
# 'title': 'Preface',
# 'children': [],
# 'basic_unit': False,
# 'num': None,
# ....
Extract text from TOC entries
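The TOC is a nested list: each entry carries a `children` list that may hold further entries. As a minimal sketch of the recursive traversal, assuming only the fields visible in the response above (`type`, `title`, `children`), the entries can be visited parents-first as shown here. The helper name `walk_toc` and the sample data are hypothetical and are not part of the Laws.Africa API; fetching the text for each entry is deliberately left out.

```python
def walk_toc(entries, depth=0):
    """Yield (depth, entry) for every TOC entry, parents before children."""
    for entry in entries:
        yield depth, entry
        # 'children' is a (possibly empty) list of nested TOC entries
        yield from walk_toc(entry.get("children") or [], depth + 1)


# Hypothetical sample shaped like the response above, for illustration only
sample_toc = [
    {"type": "preface", "title": "Preface", "children": []},
    {"type": "chapter", "title": "Chapter 1", "children": [
        {"type": "section", "title": "1. Definitions", "children": []},
    ]},
]

# Print an indented outline of the sample TOC
for depth, entry in walk_toc(sample_toc):
    print("  " * depth + f"{entry['type']}: {entry['title']}")
```

With a real response, `walk_toc(toc)` would visit every entry in document order, which gives a single place to decide which entries to extract text from.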