XML vulnerabilities and Excel files

If your code ingests .xlsx files from sources you do not fully trust, be aware that .xlsx files are made up of XML and are therefore susceptible to XML's vulnerabilities.
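
To make the point concrete, here is a minimal sketch that lists the XML parts inside a zip container such as an .xlsx file, using only the standard library. The helper name xml_parts and the tiny in-memory archive are hypothetical stand-ins for illustration; a real workbook contains many more parts.

```python
import io
import zipfile


def xml_parts(xlsx_file):
    """Return the names of archive members that end in .xml."""
    with zipfile.ZipFile(xlsx_file) as archive:
        return [name for name in archive.namelist() if name.endswith(".xml")]


# Build a tiny stand-in archive in memory (hypothetical content, not a valid workbook).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

print(xml_parts(buffer))
```

Each of these members is handed to an XML parser when the workbook is read, which is why the parser's handling of untrusted XML matters.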

excelrd uses ElementTree to parse XML, but as you’ll find if you look into it, there are many different ElementTree implementations. A good summary of the vulnerabilities you should worry about can be found here: xml-vulnerabilities.

For clarity, excelrd will try to import ElementTree from the following sources. The list is in priority order: earlier entries are preferred over later ones.

  1. xml.etree.cElementTree

  2. cElementTree

  3. lxml.etree

  4. xml.etree.ElementTree

  5. elementtree.ElementTree
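
The lookup order above can be sketched as a fallback import loop. This is not excelrd's actual code, only an illustration of the priority list; the names CANDIDATES and first_importable are hypothetical. Running it in your own environment gives a rough idea of which implementation would likely be picked there.

```python
import importlib

# Candidate modules, in the priority order listed above.
CANDIDATES = [
    "xml.etree.cElementTree",
    "cElementTree",
    "lxml.etree",
    "xml.etree.ElementTree",
    "elementtree.ElementTree",
]


def first_importable(candidates=CANDIDATES):
    """Return the name of the first candidate module that imports, or None."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # Module is not available in this environment; try the next one.
            continue
        return name
    return None


print(first_importable())
```

Knowing which implementation is in use matters because the implementations differ in which XML attacks they are exposed to.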

To guard against these problems, consider the defusedxml project, which can be used as follows:

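import defusedxml
from defusedxml.common import EntitiesForbidden
from excelrd import open_workbook

# Monkey-patch the standard-library XML modules with defusedxml's hardened versions.
defusedxml.defuse_stdlib()


def secure_open_workbook(*args, **kwargs):
    # Accepts the same positional and keyword arguments as open_workbook.
    try:
        return open_workbook(*args, **kwargs)
    except EntitiesForbidden:
        # The file declares XML entities, which defusedxml refuses to process.
        raise ValueError('Please use an xlsx file without XEE')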