Grauw’s blog

On application/xml and text/xml

November 26th, 2008

XML has two MIME types, application/xml and text/xml. These are often used interchangeably, but there is a subtle difference, which is why application/xml is generally recommended over text/xml.

Let me explain why: according to the standard, text/* MIME types have a us-ascii character set unless otherwise specified in the HTTP headers. This effectively means that any encoding declared in the XML prolog (e.g. <?xml version="1.0" encoding="UTF-8"?>) is ignored. This is of course not the expected or desired behaviour.

To further complicate matters, most if not all browsers actually implement nonstandard behaviour for text/xml: they process the encoding as if the document were application/xml.

So, text/* has encoding issues and is not implemented by browsers in a standards-compliant manner, which is why using application/* is recommended.
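To make the practical consequence concrete, here is a minimal Python sketch (the function name is illustrative, not from any library) of how you would set the Content-Type header under the old rules:

```python
# Build response headers for serving a UTF-8 XML document.
# With application/xml, the parser reads the encoding from the
# XML prolog itself. With text/xml under the old rules, the
# default was us-ascii, so the charset had to be stated
# explicitly in the header to avoid misinterpretation.

def xml_response_headers(use_text_type: bool = False) -> dict:
    """Return HTTP headers for a UTF-8 XML document."""
    if use_text_type:
        # text/xml (pre-RFC 7231): charset must be explicit,
        # or the recipient may assume us-ascii.
        return {"Content-Type": "text/xml; charset=utf-8"}
    # application/xml: no charset parameter needed; the
    # encoding comes from the XML prolog.
    return {"Content-Type": "application/xml"}

print(xml_response_headers())
print(xml_response_headers(use_text_type=True))
```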

Update: The situation has changed in the new HTTP/1.1 RFC (RFC 7231):

The default charset of ISO-8859-1 for text media types has been removed; the default is now whatever the media type definition says.

So, there is no obstruction to using text/* media types anymore.

Grauw

Comments

Thanks by Rich at 2009-07-01 19:28

I’ve been searching for a concise explanation of these differences. Thanks very much! :)
-Rich

thnx by 3rdbit at 2010-12-15 13:41

thnx 4 the nfo

by at 2011-01-20 21:05

Nice!

Just what I needed by Antony at 2011-03-30 13:16

Concise answer to my simple question – nice

efficient by Lawrence at 2011-12-09 10:21

No need to crawl hundreds of forums when a mere webpage gives you the right explanation.
I appreciate it!

You Rules by Paul at 2012-06-13 17:50

Concise explanation. Two thumbs up!

(-_-) by Hassan at 2012-07-02 06:59

merci

Awsome Article by Osmumos at 2012-07-05 05:53

This has saved my day! I was really torn between the two wondering why in the world they should exist together.
Thanks!

Interesting by Tom at 2012-07-13 10:33

This is very interesting. I’ve never seen a mention of this issue but it certainly seems important. Might I ask about the source of this information? Have you found it in a standard I can refer to or confirmed this by experiment?

Re: Interesting by Grauw at 2012-07-21 18:22

Hi Tom,

It came up in various discussions on W3C mailing lists I used to frequent. The character encoding of the text media type is described in section 4.1.2 of RFC 2046:

4.1.2. Charset Parameter

A critical parameter that may be specified in the Content-Type field for “text/plain” data is the character set. This is specified with a “charset” parameter, as in:

Content-type: text/plain; charset=iso-8859-1

Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.

The specification for any future subtypes of “text” must specify whether or not they will also utilize a “charset” parameter, and may possibly restrict its values as well. For other subtypes of “text” than “text/plain”, the semantics of the “charset” parameter should be defined to be identical to those specified here for “text/plain”, i.e., the body consists entirely of characters in the given charset.

[...]

Note that the character set used, if anything other than US-ASCII, must always be explicitly specified in the Content-Type field.

The underlying reason for this default is the assumption that the encoding can only be specified in the MIME header. After all, if the encoding were specified in the content, how would you read that encoding identifier in a document whose encoding you don't yet know?

However, in practice there are now several content types which allow this (XML, HTML, CSS, etc.), and the reason this works is the predominance of ASCII-based character sets: the declaration itself is restricted to ASCII characters, so it can be read before the full encoding is known.
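This bootstrapping trick can be sketched in a few lines of Python (the function is illustrative and deliberately ignores BOM and UTF-16 detection, which the XML specification also covers):

```python
import re

def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its prolog.

    This works because the prolog is restricted to ASCII
    characters, so the first bytes can be read before the
    document's real encoding is known (BOM and UTF-16
    detection are omitted for brevity).
    """
    # Decode only the leading bytes as ASCII-compatible text.
    head = data[:100].decode("ascii", errors="replace")
    m = re.search(r'encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    # UTF-8 is the default when no encoding is declared.
    return m.group(1) if m else "UTF-8"

xml = b'<?xml version="1.0" encoding="ISO-8859-1"?><root/>'
print(sniff_xml_encoding(xml))  # ISO-8859-1
```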

Thanks by Leonid at 2012-11-21 13:46

Thanks a lot. Very clearly stated, and it helped me avoid an encoding issue, as the server did not pass the encoding in Content-Type as I had specified.

Yet another thankyou by Owen Wood at 2013-03-14 10:21

I know it's been said, but Great Work :).

I'm new to REST services, being more comfortable in the SOAP world.

Was recently thrown a REST service and had to write a client that took SOAP requests from our internal SOAP WS and passed on the XML (JAXB) to a REST API living out on the interwebs (Die Antwoord reference). Our SOAP service served as an aggregation / interrogation / security layer. XML parsing was a must, and I was wondering what the difference / benefits of application/xml over text/xml were.

Saved another newbie looking stupid in the face of a / multiple Project Managers.

Depends on presence of embeded encoding by Chris Hill at 2014-02-14 13:44

I would expect that if the XML contains an encoding specification (<?xml version="1.0" encoding="UTF-8"?>) then sending it as text/xml with any character encoding (explicit or implicit) that differs from the embedded encoding is a bad idea, since it is ambiguous. In this case application/xml has got to be the right answer, the recipient can inspect the header to determine the encoding and understand the content, using the algorithm described in the XML specification.

If the XML contains no encoding specification then this information should be expressed externally to the XML, implicitly or explicitly in the text/xml charset specification.

Consider the issues surrounding an EBCDIC encoded XML file. The recipient on their ASCII platform should be able to unambiguously determine how to process the XML. Either it comes as text/xml (with charset set to an EBCDIC character set) or it comes as application/xml with an embedded encoding specification. If it is text/xml then any XML header encoding specification must match the charset from the mime header, otherwise the recipient doesn’t know how to process it as there is ambiguity.
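The ambiguity check Chris describes could be sketched like this (a hypothetical helper, not part of any library; the regexes are simplified):

```python
import re

def charsets_consistent(content_type: str, body: bytes) -> bool:
    """Return True unless the header charset and the encoding
    declared in the XML prolog are both present and disagree.
    """
    # Charset from the Content-Type header, if any.
    m = re.search(r'charset=([A-Za-z0-9._-]+)', content_type)
    header_cs = m.group(1).lower() if m else None
    # Encoding from the XML prolog, read as ASCII, if any.
    head = body[:100].decode("ascii", errors="replace")
    m = re.search(r'encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    embedded = m.group(1).lower() if m else None
    # Ambiguous only when both are specified and differ.
    if header_cs and embedded:
        return header_cs == embedded
    return True

print(charsets_consistent(
    "text/xml; charset=utf-8",
    b'<?xml version="1.0" encoding="UTF-8"?><a/>'))  # True
```

Note that a strict comparison like this ignores charset aliases (e.g. "utf8" vs "utf-8"); a real implementation would normalize names first.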

Thanks by Ahmed at 2015-05-27 13:36

Thanks