# Why extracting text is important

Akoma Ntoso XML documents and the HTML documents generated from them contain rich, structured information. While this structure is important for displaying, formatting and analysing the document, sometimes we need to work with just the text content.

Extracting text content from a structured Akoma Ntoso XML or HTML document is useful for tasks such as:

* Indexing into Elasticsearch or other search engines to support full-text search
* Calculating [embeddings](https://en.wikipedia.org/wiki/Word_embedding) for use with machine learning models such as large language models (LLMs)
* Natural-language processing and analysis (NLP)

The text content we need depends on our use case. We may need to ignore certain text content such as metadata, editorial comments, embedded text, headings and portion numbers.

To extract text content from a document, we process the XML and use XML-based queries to extract just the text we're interested in.
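As a rough illustration of this idea, the sketch below uses Python's standard-library `xml.etree.ElementTree` to pull paragraph text out of the body of a small document. The sample document and the `extract_body_text` helper are hypothetical and exist only for this example; the element names (`body`, `section`, `num`, `heading`, `content`, `p`) and the namespace are assumed to follow Akoma Ntoso 3.0. A real document may need different queries depending on which text matters for the use case.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace, bound to the prefix "a" for use in queries.
NS = {"a": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}

# A tiny, made-up document used only for this illustration.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act name="act">
    <meta>
      <identification source="#example">
        <FRBRWork>
          <FRBRalias value="Example Act" name="title"/>
        </FRBRWork>
      </identification>
    </meta>
    <body>
      <section eId="sec_1">
        <num>1.</num>
        <heading>Short title</heading>
        <content>
          <p>This Act may be cited as the <term>Example Act</term>.</p>
        </content>
      </section>
    </body>
  </act>
</akomaNtoso>"""


def extract_body_text(xml_string):
    """Return the text of every <p> inside <body>, one paragraph per line.

    Metadata, portion numbers (<num>) and headings are skipped because the
    query only selects <p> elements that sit inside <body>.
    """
    root = ET.fromstring(xml_string)
    paragraphs = []
    for p in root.findall(".//a:body//a:p", NS):
        # itertext() also yields text nested in inline elements such as <term>;
        # split/join collapses runs of whitespace into single spaces.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


text = extract_body_text(SAMPLE)
print(text)
```

Changing the query changes what is extracted. For example, selecting `.//a:body//a:heading` instead would return only the headings, which may suit a different use case.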

In this module we'll explore how to extract text content for different use cases.

