Tuesday, June 13, 2006

What, exactly, is XML?

This seems like a dumb question: what, exactly, is XML? But, honestly, can anybody out there explain to me in a few words what XML really is? Or, to be more precise, what is XML all about?

For a couple of years I just listened to the guys 'n ladies who should know the answer and, believe it or not, I got different answers depending on whom I asked:

Listen to the consultants and it is the one-and-only salvation of our new computer century, solving every problem you ever had. As if that were not enough, it can even solve problems you are not yet aware you have.

Listen to the techies and you learn that anything you want to achieve technically can be done with XML.

Listen to the decision-makers and it is the answer to life, the Universe and everything, all in one big shot.

But, honestly, can anyone out there answer the simple question "What, exactly, is XML?" I doubt it. At first sight it is merely a format, but at second sight everyone has pinned their hopes on it to solve problems XML was never meant to solve. I can still remember the height of the IT hype, when someone at a meeting simply said: "Well, to solve this problem we'll provide an XML interface; that should do." Needless to say, it didn't.

And it is supposedly easy to process. There are so many frameworks, tools and APIs covering all that inconvenient stuff of "reading", "parsing" and "testing" XML (configurations) that most people will tell you using XML is much simpler than doing without it. Have you never felt that you needed an hour to do something in XML that could have been done in a couple of minutes without it? I have.

To be more precise, the troubles I have with XML are the following:

- It is not defined which element (or tag) to use if the same one appears more than once in the same scope. As far as I know, all the tools and APIs take the first value they find, but this can be a source of errors, especially if you insert a new element that is meant to override (or rather: does not override) the previous value.

- XML is not human-readable. It is an exchange format with fault tolerance. That is the one advantage XML has (compared to, say, a fixed-size, record-driven format): if an element is missing, the structure is (mostly) kept intact. But reading XML as a human is painful.

- XML is a format, not a programming language. XSLT is a cute way to transform XML into other XML (or text, or you-name-it), but XSLT is no programming language either. The moment you stop thinking of XML as a language is the moment you start recognizing how often it is misused nowadays.

- XML produces overhead. Huge overhead. Just think of SOAP web services (or BLOBs represented in XML): to send one bit of information, you have to process, assemble and send a few kilobytes around the corner. Agreed, SOAP web services are well-defined and easy to extend. But do you need XML-ish things for each and every problem?
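To make the first point in the list above concrete: XML itself says nothing about what a repeated element means; that is left to whoever reads the document. Python's standard ElementTree is one example of the "first value wins" behaviour: `find()` returns only the first matching child. The configuration and the helper `single_value` below are hypothetical, a minimal sketch of how a reader could refuse ambiguous input instead of silently picking one value.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration in which <timeout> appears twice under the same parent.
CONFIG = """<config>
  <timeout>30</timeout>
  <retries>3</retries>
  <timeout>60</timeout>
</config>"""

root = ET.fromstring(CONFIG)

# find() stops at the first match, so the second <timeout> is silently ignored.
first = root.find("timeout").text
# findall() returns every match, which at least makes the duplicate visible.
all_values = [e.text for e in root.findall("timeout")]


def single_value(parent, tag):
    """Return the text of exactly one child <tag>; raise if it is missing or repeated."""
    matches = parent.findall(tag)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one <{tag}>, found {len(matches)}")
    return matches[0].text


print(first)                          # 30
print(all_values)                     # ['30', '60']
print(single_value(root, "retries"))  # 3
```

Calling `single_value(root, "timeout")` raises a `ValueError` instead of quietly returning 30, which turns the "does not override" surprise into an explicit error.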

So what, exactly, is XML? An all-purpose problem solver? Definitely not. An extensible exchange format with fault tolerance and plenty of tools to handle, transform and display it? I'd bet on that one.
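A rough way to see the overhead argument from the list above is simply to count bytes. The message below is a hand-written, minimal SOAP 1.1 request: only the envelope namespace is real, while the `SetFlag` operation and the `example.com` namespace are made up for illustration. It leaves out the HTTP headers and any WS-* header blocks that real services add, so actual messages are larger still.

```python
import xml.etree.ElementTree as ET

# Minimal, hand-written SOAP 1.1 request carrying one boolean.
# SetFlag and http://example.com/flags are hypothetical names.
ENVELOPE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SetFlag xmlns="http://example.com/flags">
      <enabled>true</enabled>
    </SetFlag>
  </soap:Body>
</soap:Envelope>"""

# Parsing the bytes confirms that the envelope is well-formed XML.
ET.fromstring(ENVELOPE.encode("utf-8"))

payload_bytes = len(b"1")  # the same information as a single character
envelope_bytes = len(ENVELOPE.encode("utf-8"))
print(f"{envelope_bytes} bytes of XML to carry {payload_bytes} byte of data")
```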




Anonymous said...

XML is simply a canonical text encoding for structured data so the data can be transferred between processes that speak different languages, or persisted to a file in a "neutral" format.

Whenever someone uses XML in a conversation, I mentally replace it with the word text.
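The "canonical text encoding" view can be shown with a round trip: structured data goes out as text and comes back unchanged, whatever language sits on the other end. The sketch below uses Python's standard ElementTree; the `record` layout is hypothetical and assumes flat, non-empty string values whose keys are valid XML element names.

```python
import xml.etree.ElementTree as ET


def to_xml(record: dict) -> str:
    """Encode a flat dict of non-empty strings as a small XML document."""
    root = ET.Element("record")
    for key, value in record.items():
        # Each key becomes an element name, so it must be a valid XML name.
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Decode a document produced by to_xml back into a dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


original = {"name": "Ada", "language": "Python"}
text = to_xml(original)
print(text)  # <record><name>Ada</name><language>Python</language></record>
assert from_xml(text) == original
```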

6:45 AM  
Kick The Donkey said...

"- XML is not human readable. It is an exchange format with fault-tolerancy. This is the one advantage XML has (compared to, say, a fixed size record-driven format): In case an elenment is not defined, the structure is kept (mostly). But reading XML as a human reader is painful."

I actually disagree with this one. The system I support sends time reports to another system, and the format is a fixed-length text file. The lines are so long that they're impossible to read, and if you don't have the schema next to you, they're hard to make sense of.

With that in mind, I wrote a parser that takes that fixed-length record file and generates XML from it. Very easy to read.

So long as the XML (or any markup for that matter) is nicely indented, I would rather read XML than CSV or flat text record data ANY day.
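A converter like the one described might look roughly like this. The record layout (field names and widths) is invented for illustration, since the real time-report format isn't given; `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout of one time-report line: (field name, width in characters).
LAYOUT = [("employee", 6), ("project", 5), ("date", 8), ("hours", 4)]


def parse_fixed(line: str) -> dict:
    """Slice one fixed-length line into named fields, stripping pad spaces."""
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields


def records_to_xml(lines) -> str:
    """Wrap each fixed-length line in a <report> element and pretty-print the result."""
    root = ET.Element("timeReports")
    for line in lines:
        report = ET.SubElement(root, "report")
        for name, value in parse_fixed(line).items():
            ET.SubElement(report, name).text = value
    ET.indent(root)  # adds newlines and two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


print(records_to_xml(["E00042PRJ0120060613 7.5"]))
```

The indentation step is what makes the output readable; without `ET.indent` the same XML comes out on a single line.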

6:50 AM  
Anonymous said...

My biggest gripe with XML is that, for a text format, it sure is painful to have to worry about escaping ordinary text characters like "<" and ">" all the time.
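For what it's worth, the escaping rarely has to be done by hand: serializers do it on the way out and parsers undo it on the way in. A minimal sketch with Python's standard library, where `escape` from `xml.sax.saxutils` handles `&`, `<` and `>`, and ElementTree escapes element text automatically:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

text = "if a < b && b > c"

# Manual escaping: & becomes &amp;, < becomes &lt;, > becomes &gt;.
escaped = escape(text)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c
assert unescape(escaped) == text

# ElementTree escapes element text when serializing...
element = ET.Element("expr")
element.text = text
serialized = ET.tostring(element, encoding="unicode")
print(serialized)  # <expr>if a &lt; b &amp;&amp; b &gt; c</expr>

# ...and the parser turns the entities back into ordinary characters.
assert ET.fromstring(serialized).text == text
```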

6:55 AM  
Loud Monkey said...

XML is a database format. It's a text/tag-based way of creating data tables. It isn't particularly efficient in terms of characters, so XML isn't intended to replace large databases, but apps like iTunes make good use of it.

7:01 AM  
Anonymous said...

BTW, it's "meant" not "ment".

7:05 AM  
BonzaiEvilJoker said...

XML is God!

Now that we've established this simple principle: I always thought XML was better used for fairly complex, constantly evolving files, like RSS or Atom feeds. I wouldn't use XML, for example, to keep information about a 3D model or for entries in a large database.

That being said, XML is better at representing metadata than the "data" itself.

7:19 AM  
Prashanth said...

If you don't want to escape common characters like '<', you can always send the text as CDATA.
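A CDATA section does spare the writer from escaping, and a conforming parser hands the content back as plain text. Two caveats: the section cannot itself contain `]]>`, so that sequence has to be split across two sections, and Python's ElementTree never writes CDATA when serializing (it escapes instead). The helper `to_cdata` below is a hypothetical sketch of the splitting trick:

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" that would end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


doc = "<expr>" + to_cdata("if a < b && b > c") + "</expr>"
print(doc)  # <expr><![CDATA[if a < b && b > c]]></expr>

# The parser reports CDATA content as ordinary character data.
assert ET.fromstring(doc).text == "if a < b && b > c"

# Even text containing the terminator survives the round trip.
tricky = "x]]>y"
assert ET.fromstring("<t>" + to_cdata(tricky) + "</t>").text == tricky
```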

7:47 AM  
Georgi said...

Kick The Donkey, you are right: fixed-size formats, CSV or EDI are very hard to read. But why did you write a parser to generate XML? Imagine the following:

Input: aaaaaabbbbbcccccccdddddd
where aaaaaa is field a, bbbbb is field b, and so on. Listed field by field, that becomes:

a: aaaaaa
b: bbbbb
c: ccccccc
d: dddddd

Or, simply put: what is the advantage of converting text files into XML to make them more readable?
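Georgi's point can be made in a few lines: printing each field of a fixed-width record next to its name already gives the readable form shown above, with no XML in between. The widths (6, 5, 7, 6) are read off the example record; the function name `labelled` is hypothetical.

```python
# Field widths read off the example record: a=6, b=5, c=7, d=6 characters.
WIDTHS = [("a", 6), ("b", 5), ("c", 7), ("d", 6)]


def labelled(record: str) -> str:
    """Return one 'name: value' line per fixed-width field."""
    lines, pos = [], 0
    for name, width in WIDTHS:
        lines.append(f"{name}: {record[pos:pos + width]}")
        pos += width
    return "\n".join(lines)


print(labelled("aaaaaabbbbbcccccccdddddd"))
# a: aaaaaa
# b: bbbbb
# c: ccccccc
# d: dddddd
```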
I have often had to deal with XML where you would not expect it, such as configuration files:

{c='value1' /} --> could not use less/greater than here
{d='value2' /}

Why not simply use a property file for such simple purposes? It would look like this:

c=value1
d=value2

Isn't that more readable? I do think so. Greetings, Georgi
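The comparison can be sketched in code as well. Python has no built-in reader for Java-style property files, so `load_properties` below is a hypothetical, deliberately minimal one (no escapes and no line continuations, unlike `java.util.Properties`). The XML side uses a hypothetical `entry` element, because the example in the comment had to be typed with braces and doesn't give exact element names. Both files hold the settings c=value1 and d=value2.

```python
import xml.etree.ElementTree as ET

PROPERTIES = """\
# same two settings as the XML example
c=value1
d=value2
"""

XML_CONFIG = """<config>
  <entry key="c" value="value1"/>
  <entry key="d" value="value2"/>
</config>"""


def load_properties(text: str) -> dict:
    """Minimal key=value reader: skips blank lines and '#' comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def load_xml_config(text: str) -> dict:
    """Read every <entry key=... value=...> element into a dict."""
    root = ET.fromstring(text)
    return {e.get("key"): e.get("value") for e in root.iter("entry")}


print(load_properties(PROPERTIES))  # {'c': 'value1', 'd': 'value2'}
print(load_xml_config(XML_CONFIG))  # {'c': 'value1', 'd': 'value2'}
```

Both readers end up with the same dictionary; the difference is mostly in how much there is to type and to read.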

9:38 AM  
Georgi said...

Oops... it ate my XML example. Here it is again, with { and } instead of angle brackets and . instead of spaces:
..{c="value1" /}
..{d="value2" /}

9:41 AM  
Anonymous said...

I STILL don't know what XML is.

When I hear things like "wrapper", I want to run for the hills.

The only other wrapper I have experience with is .AVI, which is the tenth level of hell where the devil takes a dump on you. You never know which audio or video codec is "wrapped" in an AVI. There are dozens of custom variants, and you need to find the right player for the right codec because every new thing doesn't work with whatever you were using.

It's a minor miracle that VLC does as good a job as it does.

Now we get to XML, where you can have ANYTHING in there. :Roll Eyes:

What's wrong with keeping it simple? Maybe 8.3 naming is outdated, but the "other side of the dot" principle (the file extension) works just fine and lets a human being manage his own data.

11:21 AM  
PatoPoc said...

- XML is not intended to be human-readable. It's self-describing; that's one of its greatest features.
- Of course it's not a programming language!! Were you expecting it to be one?
- Yeah, overhead... there you have a point! But what kind of overhead? Storage overhead? Storage is cheap, so who cares... and the trade-off for interoperability is worth it.
Parsing complexity is a real issue with XML... you need a lot of computing power to parse large XML files (several MB). So that's where academia has to focus now.
The industry is OK with XML as it is right now, as long as we understand what XML is and what it is not.
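On the size problem, one common mitigation is to stream the document instead of building the whole tree in memory. This does not make parsing cheaper in CPU terms, but it keeps memory use roughly flat however large the file gets. A minimal sketch with Python's `ElementTree.iterparse`; the `<items>`/`<item>` document is made up:

```python
import io
import xml.etree.ElementTree as ET


def count_items(source) -> int:
    """Count <item> elements while discarding each one after it has been seen."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            count += 1
            root.clear()  # drop finished children so the tree never grows large
    return count


# Small in-memory stand-in for a large file.
data = "<items>" + "".join(f"<item id='{i}'>x</item>" for i in range(1000)) + "</items>"
print(count_items(io.BytesIO(data.encode("utf-8"))))  # 1000
```

With a real file, passing its path (or an open binary file object) instead of the `BytesIO` works the same way.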

4:21 PM  
