How to Parse XML in Python: ElementTree, lxml, XPath, Namespaces, and Large Files
XML is a text-based markup language that we can use to store and exchange structured data. If you’ve worked with XML, you know it can be easily read by humans. And it is also machine-readable. But a program needs structured data to understand and work with the XML. This is why we parse XML data. Many developers use the Python programming language to parse XML because it makes it quick and easy to work with XML data. We can also use it for reading configuration files and structuring data with ease.
This guide/tutorial will show you how to parse XML in Python. We’ll cover everything, from key tools and patterns to code examples.
Key Takeaways:
- XML is still used in SOAP APIs, RSS feeds, CRMs, and ERPs.
- With Python’s built‑in ElementTree, basic XML parsing is easy.
- lxml provides XPath support, and it is great for complex files.
- You can use Python to parse XML from files, strings, and even URLs.
- It’s important to understand tree structure, including elements, attributes, and nested tags, before parsing XML.
- Streaming and event‑driven parsing help with large XML files.
- Having correct namespace information is important so your queries don’t provide the wrong elements.
- With Python, you can modify and save XML back to disk.
- For secure parsing, defusedxml is a good option.
Why Use XML in Modern Development?
Although JSON is now preferred for storing and sending structured data, many APIs, enterprise services, and config files still use XML. Here’s a quick overview of where XML is still used:
- Many enterprise APIs still return XML responses instead of JSON data.
- Blog readers and news sites usually provide XML feeds that you need to parse and present.
- Some web applications store settings in XML
- Enterprise systems, such as ERP and CRM, usually exchange large XML documents.
Python is great for reading, manipulating XML, saving XML documents programmatically, and extracting information.
Quick Start: Parse XML in 60 Seconds (ElementTree)
Before looking at complex examples of Python XML parsing, let’s first understand how XML is structured and a simple example of parsing XML in Python.
The Minimum You Need to Know About XML Structure
An XML file contains hierarchical structures. It consists of:
- Elements: The main building blocks (tags).
- Attributes: Extra data in key-value form inside tags.
- Text: The content inside an element.
- Nesting: Elements can contain child elements.
- Root node: The single top-level element containing everything.
Here is an example XML document:
| <book> <title>Python Fundamentals</title> <author>Jane Doe</author> <year>2025</year> </book> |
<book> is the root element, <title>, <author>, and <year> are child elements, and Jane Doe is the text inside <author>.
Your First Parse: File → Root → Value
The simplest way to parse XML using Python is the built-in xml.etree.ElementTree library. Here is how you can access a specific value inside the XML:
| import xml.etree.ElementTree as ET tree = ET.parse(‘book.xml’) root = tree.getroot() print(root.find(‘title’).text) # Output: Python Fundamentals |
In this code:
- We are using ElementTree to read an XML file
- ET.parse() reads the XML and creates an object.
- getroot() gives the root element.
- root.find(‘title’) finds the first child element
- .text gets the text inside the title, and prints it
XML Parsing With Python: Environment Setup
A proper Python setup includes everything (tools, packages, and libraries) you’ll need to parse XML
Required Libraries
Python offers several XML parsing packages. In this tutorial, we’ll use the following libraries:
- xml.etree.ElementTree
It’s a built-in library that many developers use when they want to parse simple XML. It is easy to learn and supports basic XPath-style searches.
- XML parsing with lxml
lxml is a third-party Python library for XML parsing. It’s faster than ElementTree and is great for projects that use XPath or have large files.
Here’s how you can install it:
| pip install lxml |
- defusedxml
When you’re parsing XML from user uploads, defusedxml is the best choice. It protects against XML attacks, such as entity expansion or Billion Laughs attacks, and provides drop-in replacement for libraries like ElementTree.
Here’s how to install it:
| pip install defusedxml |
Choosing the Right Parser
If you’re just getting started and working with small and simple XML files, ElementTree is the best option. For advanced searching or complex documents, lxml works best.
If you’re parsing user-provided documents, use defusedxml as it protects against XML attacks.
Loading and Parsing XML in Python
When we parse XML with Python, we need to load the XML from a file, a string, or a remote source before parsing it. It’s easy to do this with Python’s ElementTree modules. The library allows us to represent the XML as a tree of elements that we can easily inspect.
Loading XML from a File
Many times, we need to work with XML stored on disk. Here’s how you can load XML from a file:
| tree = ET.parse(‘data.xml’) root = tree.getroot() |
- ET.parse(‘data.xml’) finds and reads the file and turns it into an object.
- tree.getroot() gives you the root element. You can use it to look at child elements.
For very large files, it’s best to use ET.iterparse() instead of loading the whole tree to save memory.
Loading XML from a String
Here’s how to load XML from a string:
| xml_string = ‘<root><item>Example</item></root>’ root = ET.fromstring(xml_string) |
In this code, ET.fromstring(xml_string) takes an XML document in the form of a text string and turns it into an object that is the root of the tree. It provides the same type of object you’d get from parsing a file, so you can still navigate the tree.
You can use this method when the XML is coming from other functions or an API response. However, always make sure the string has proper XML before parsing to avoid errors.
Loading XML from a URL (Web API / Feed)
Many APIs and data feeds (such as RSS/Atom) provide XML over HTTP. Here’s an example for loading XML from a web API or feed:
| import requests response = requests.get(‘https://example.com/feed.xml’, timeout=10) if response.headers[‘Content-Type’] == ‘application/xml’: root = ET.fromstring(response.content) |
In this code, requests.get(…) downloads the content at the URL. The Content-Type header check verifies that the server is returning XML. ET.fromstring(response.content) parses the downloaded XML into an Element that you can navigate.
Navigating the XML Tree Structure
Once you parse a document and have the root element, you get a hierarchical data tree of elements. Here’s how you can access them:
Accessing Child Elements
| for child in root: print(child.tag, child.text) |
One important thing to know here is that iterating like this only gives you immediate children. It won’t automatically explore deeper nested elements. To search deeper, you can use .findall() (below) or a recursive iterator.
Accessing Nested Elements
Here is a simple way to find nested elements with XML Parsing in Python:
| nested = root.find(‘parent/child’) if nested is not None: print(nested.text) |
With this code, you can get nested elements. However, it only finds matching direct nested children in that element sequence.
If the element is deeper than you expect, the path won’t match. In such cases, you can use .findall() with .// for recursive searching.
Accessing Attributes
| element = root.find(‘book’) print(element.get(‘id’, ‘default_id’)) |
If you want to access all attributes at once, you can use .attrib.
Finding Specific Elements
ElementTree provides many methods that let you search based on tag names or simple paths without manually looping:
- find(): Provides the first matching element
- findall(): Provides a list of all matching elements
- findtext(): Provides the text of the first element that matches the tag name, or a default value
- .//tag: Recursive search for all matching tags at any depth
Best Libraries that You Can Use to Parse XML

XML Parsing in Python: Standard Library Modules
Using xml.etree.ElementTree
Most developers use this library for simple, well-formed XML. It’s lightweight, easy to use, and supports basic XPath.
Using xml.dom.minidom for DOM-Based XML Parsing in Python
It parses the whole document into memory. It’s useful for pretty-printing small XML files.
Using xml.sax for Event-Driven XML Parsing with Python
It processes XML as a stream of events. It’s the best choice for larger XML files where building a tree is too costly.
Third-Party XML Parsing Libraries
Many useful third-party XML parsing libraries are also available:
Using lxml for High-Performance XML Parsing
lxml is a fast library written in C. It can handle HTML documents and XML exceptionally well and parse documents quickly. lxml also supports full XPath queries and XSLT transformations for complex data.
Using BeautifulSoup for Forgiving XML Parsing
BeautifulSoup is widely used for HTML web scraping. It can also parse XML efficiently. It can handle messy markup that doesn’t strictly follow XML rules. It builds a navigable tree that you can search using find() and find_all().
Using untangle for Simple XML-to-Object Conversion
untangle doesn’t make you work with ElementTree or node trees at all. Instead, it takes an XML document and turns it into a normal Python object that you can work with. After parsing, each XML tag is a part of that object; you can access it using dot notation.
Performance Comparison: What Actually Matters
For high performance of XML parsing, you need to consider different things, such as how big your XML files are, what features you need, and how the parser handles building the in-memory tree of elements.
For small files, Python’s built-in library is a great choice because you don’t need any extra setup, and it gives you a simple tree that is easy to work with.
But if XML documents are large, they can take up lots of memory. For such documents, it’s best to use SAX parsers or iterparse(). Why? Because these tools don’t load everything into memory at once. So when you use these tools, you can read and handle the XML file in pieces.
So which library should you use for advanced searches and full XPath expressions? lxml is the right choice.
Choosing an XML Python Library for Your Project
- Are you working with small and simple XML files? ElementTree will work well for your project.
- For XPath-heavy files, most developers prefer lxml
- Are your XML files huge? use iterparse/SAX.
- When XML documents are from a third party, it’s best to use defusedxml
- BeautifulSoup is a great option for messy XML
- For quick object mapping, you can use untangle
Using XPath for Complex Queries
If your XML has deep nesting, writing so many manual loops can be difficult and time-consuming. XPath is a small query language for XML that lets you express exactly which nodes you want to find in a single string.
XPath uses path‑like expressions to move through the document tree. It doesn’t walk children and grandchildren step by step. With XPath, you need to write only one expression that the parser can use to return what you need. Using XPath makes complex navigation much simpler.
Here is an example of how to use XPath:
| books = root.xpath(‘//book[author=”Jane Doe”]/title’) |
Common XPath Patterns You’ll Actually Use
Here are the most common XPath patterns:
- One very basic pattern is //. We use it in front of a tag name, and XPath looks through the whole document for that name. For example, if we use //book, it will find every <book> element anywhere in the XML.
- Attribute names filtering is also used frequently. For instance, if we write //*[@type=’text’], XPath will find all elements whose type attribute is exactly “text”, so you don’t need to write any extra code.
Handling XML Namespaces Without Pain
Namespaces in XML are a great way to avoid naming conflicts, but they can make searching and parsing a little tricky. A namespace appears in XML like xmlns:prefix=”URI” and tells the parser that any tag using that prefix belongs to a particular space.
Python’s XML libraries, such as xml.etree.ElementTree and the lxml library understand these default namespaces. But this means you usually need to include the namespace URI when you search for elements by name. Otherwise, your search won’t find the nodes you expect.
Detecting Namespaces in Real Files
A simple way to detect what namespaces are present is to inspect the nsmap property if you are using lxml. Or you can check the xmlns attributes in the root element of your XML document.
Searching Namespaced Elements
If you’re using Python’s ElementTree, you can search namespaced elements. You need to provide a dictionary that links a prefix to its full namespace URI. You can then use that prefix in your search path when calling findall() or find()
Here’s one example for searching namespaced elements:
| ns = {‘ns’: ‘http://example.com/ns’} root.find(‘ns:book’, namespaces=ns) |
Parsing Huge XML Files Safely and Efficiently
iterparse for Incremental Processing
For huge files that don’t fit in RAM, iterparse is a good option. It lets you process piece by piece, so memory usage is low. Basically, you can work with each tag when it appears in the file, which means it doesn’t build a complete tree in memory.
When SAX Is Better Than iterparse
SAX (Simple API for XML) is another great parser for big XML documents. It never stores the XML structure but only the element you’re processing, which means it doesn’t use much memory.
Modifying XML Data
Changing Element Text
Here is how we can change element text:
| book = root.find(‘book/title’) # Find the <title> element under the first <book> element if book is not None: book.text = “Updated Python Fundamentals” # Change the text of the <title> element |
Modifying Attributes
Here’s how you can change attributes:
| book = root.find(“book”) # Find the first <book> element if book is not None: book.set(“genre”, “Python”) # Modify the genre attribute |
Inserting New Elements into XML
Adding a New Element
Here’s how we can add a new element:
| # Create a new <book> element with an attribute id=”b3″ new_book = ET.Element(‘book’, attrib={‘id’:’b3′}) # Create a <title> child element under the new_book elementtitle = ET.SubElement(new_book, ‘title’) # Set the text inside the <title> element to “Python for Data Science” title.text = ‘Python for Data Science’ root.append(new_book) |
Inserting Elements with Attributes
| # Create a new element new_book = ET.Element(“book”, attrib={ “id”: “bk003”, # Unique identifier for this book “author”: “Jane Doe”, # Author attribute “genre”: “Programming” # Genre attribute }) # Optionally, add a child <title> element under the new book title = ET.SubElement(new_book, “title”) title.text = “Python Advanced Concepts” # Set the text for <title> root.append(new_book) # Append the new book element to the root |
Deleting XML Elements
Sometimes you need to delete some elements instead of writing and modifying. However, most XML parsers don’t allow a node remove itself, so we use .remove() on the element’s parent. That’s because XML is a tree, and only a parent knows how to detach one of its child nodes.
Removing Elements
When you know the parent element, you call parent.remove(child) and pass the child element you want to delete.
Here’s an example of basic removal:
| for element in root.findall(‘item’): if element.get(‘type’) == ‘obsolete’: root.remove(element) # remove by calling .remove() on parent (root) |
Saving Changes to XML Files
Once you’ve parsed and modified an XML tree in memory, you need to write your changes back to disk. Without this, everything you did stays only in memory and won’t be visible the next time you load the file.
Saving a File to Disk
You can call tree.write(…) on an ElementTree. Python then writes the entire XML tree to a file. You should always specify encoding (use encoding=”utf-8″) when writing XML files. Also, add the xml_declaration=True, which tells parsers how to interpret the file’s encoding.
Here’s a simple example to write the file back to disk:
| tree.write(“updated.xml”, encoding=”utf-8″, xml_declaration=True) |
Pretty-Printing (Optional)
Even though you can use Raw XML output, it’s usually not easy to read. You can pretty-print your XML, formatting it with indents and line breaks.
But it’s not recommended to pretty-print large files. Pretty-printing adds extra whitespace that further increases the file size.
Encoding and Decoding XML
Writing XML with Proper Encoding
Getting encoding right is important so that other systems and parsers can also understand and read your data.
Similarly, when you’re receiving XML from different sources, you need to detect and decode different encodings before parsing.
When you save or generate an XML file in Python, it’s recommended to explicitly set the encoding. UTF‑8 is the most widely supported and used encoding.
Decoding XML Data from Different Encodings
XML parsers usually detect encoding based on the XML declaration or a byte order mark (BOM) at the start of a file. If you read the XML using the wrong character encoding, the text can look mixed up or unreadable.
To avoid this, it’s better to open your file in ‘rb’binary mode so the XML parser can correctly detect the right encoding from the XML declaration or a byte order mark (BOM). Or you can convert the raw bytes yourself based on what the file says.
Mapping XML to Python Objects
Map XML to dict / dataclass (Recommended)
When you have an XML and you just want to use the data in your code, you can convert the XML into a dictionary-like structure. Python dictionaries use simple keys to access values. When you use this method, keys reflect XML tag names, and nested tags convert into lists.
Another option is Python dataclass, which is a built‑in standard feature. It lets you define simple classes with fields and type hints.
Example: Mapping XML Data to a Custom Class
Here’s a simple coding example that parses XML with Python, extracts values, and constructs Python objects from it.
| import xml.etree.ElementTree as ET from dataclasses import dataclass @dataclass class Book: id: int title: str author: str year: int # Parse the XML tree = ET.parse(“library.xml”) root = tree.getroot() books = [] for book_elem in root.findall(“book”): book = Book( id=int(book_elem.get(“id”)), # parse attribute title=book_elem.findtext(“title”) or “”, # extract text child author=book_elem.findtext(“author”) or “”, year=int(book_elem.findtext(“year”) or “0”) ) books.append(book) # Now `books` is a list of Book objects for b in books: print(b.title, b.author) |
Error Handling and Debugging
When you call ET.parse() or ET.fromstring(), you will get an error from the parser if the XML isn’t well‑formed. There can be many reasons for this error:
- Mismatched or unclosed tags
- Encoding issues
- A missing root element
- Invalid characters.
Sometimes, syntax validity isn’t enough. Your XML should follow a specific structure. You can use schema validation for this purpose. XML documents can include or reference a DTD (Document Type Definition). Or you can use an external XML Schema (XSD).
XML Parsing in Python: Security Considerations
When you’re processing XML files from external sources, there are security risks. Here’s how you can process such files securely:
Use defusedxml for Untrusted Input
If you’re parsing XML that comes from users, you should avoid using the standard parsers, such as xml.etree.ElementTree. In such cases, you should use defusedxml because it prevents common XML attacks.
Safe Limits and Timeouts (When XML Comes from the Web)
When you’re parsing XML from the web, there are huge security risks. You should limit the download size and use connect and read timeouts for such files.
XML Parsing in Python Best Practices
- Write clear and specific XML queries instead of broad ones
- Choose a parser that is fast and easy to use.
- Creating small XML sample files for unit tests is also important.
Article written by:

Full Stack AI Engineer
Alexandre brings deep full-stack expertise to Proxywing's engineering efforts — from backend architecture and performance optimization to AI-driven development workflows. His hands-on work spans Node.js, React, cloud infrastructure, and RAG pipelines, giving him a rare ability to tackle both proxy platform internals and user-facing product challenges. At Proxywing, Alexandre focuses on designing resilient systems, eliminating performance bottlenecks, and integrating modern AI tooling into the development process. Outside of coding, he's passionate about exploring the frontiers of AI engineering and building side projects that push his technical boundaries.
All articles by author (42)


