“Capture” is the wrong word. These languages (or formats, that’s probably a more accurate description of them) are used to tag text data, to give data semantic meaning. That means that if you have data tagged in a given format you can send it to another program, which in turn can automatically extract the data.
So for example, the following is a description of a book. It’s useless as far as computers are concerned because it’s written in natural language, in English.
The Baron in the Trees is a book by the Italian author Italo Calvino, first published in 1957. A metaphor for independence, the book recounts the adventures of a boy who climbs up a tree one day and never climbs down, spending the rest of his life inhabiting an arboreal kingdom.
Let’s assume you want to involve computers here. Say you have many books (could be ten, a hundred, a million; doesn’t matter) and you want them recorded. So let’s structure the contents of that text a bit:
title: The Baron in the Trees
author: Italo Calvino
first published: 1957
description: A metaphor for independence, the book recounts the adventures of a boy who climbs up a tree one day and never climbs down, spending the rest of his life inhabiting an arboreal kingdom.
Computers find it difficult to tell where a value that contains spaces starts and ends. Most languages have a concept called strings, with some way (usually quotes) to denote verbatim text characters. So maybe the following would be better:
{
"title": "The Baron in the Trees",
"author": "Italo Calvino",
"first published": 1957,
"description": "A metaphor for independence, the book recounts the adventures of a boy who climbs up a tree one day and never climbs down, spending the rest of his life inhabiting an arboreal kingdom."
}
That would be JSON.
But there are loads of other formats; JSON is just one of them. If you want to transfer data (be that within a program, between programs, between computers, etc.), as long as the sender can inform the receiver of the format used, the receiver can then [in theory] trivially extract data. The point is to automate: the formatting enables the automation.
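As a rough sketch of what “trivially extract” can look like on the receiving side, here is the book record being read with Python’s standard json module. The surrounding braces are what make it a complete JSON object, the variable names are made up for the example, and the description is shortened to save space.

```python
import json

# The book record as it might arrive from another program (description shortened).
payload = """
{
  "title": "The Baron in the Trees",
  "author": "Italo Calvino",
  "first published": 1957,
  "description": "A metaphor for independence..."
}
"""

book = json.loads(payload)  # text in, ordinary dictionary out

print(book["title"])            # The Baron in the Trees
print(book["first published"])  # 1957 (a number, not a string)
```

No guessing about English grammar is involved: the receiver asks for a key and gets the value back.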
Say you want to render the information about the book on a web page. The transfer format here, HTML, exists to allow the web browser to render that text in a way that can be controlled.
<h1>The Baron in the Trees</h1>
<p><a href="link/to/author">Italo Calvino</a></p>
<p>A metaphor for independence, the book recounts the adventures of a boy who climbs up a tree one day and never climbs down, spending the rest of his life inhabiting an arboreal kingdom.</p>
HTML is strictly specified: there is a set of tags you can use, and those tags can have some attributes applied to them to enhance the information the browser can extract (so, for example, that anchor tag has an href attribute). So if you send some data in HTML format and you read that data with a browser, the browser will do a predictable thing with it. It’s not at all useful for providing contextual structure to the information though. It is a way of taking some blocks of text and telling the browser where to put them.
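To illustrate the “predictable thing” part, here is a small sketch using Python’s standard html.parser module in place of a browser. LinkCollector is a hypothetical helper written for this example; it only shows that tags and attributes such as href can be picked out mechanically.

```python
from html.parser import HTMLParser

page = """
<h1>The Baron in the Trees</h1>
<p><a href="link/to/author">Italo Calvino</a></p>
"""

class LinkCollector(HTMLParser):
    """Hypothetical helper: remembers the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['link/to/author']
```

Note what is missing: nothing in the markup says that the link text is an author or that the heading is a book title. The parser knows about headings, paragraphs and links, not about books.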
So for structured data, JSON is one [of many] formats. XML is another.
XML is from the same family as HTML, using
<named_tags_in_angle_brackets>to wrap data</named_tags_in_angle_brackets>
rather than just
named_keys="with some data associated to them"
There are certain advantages to formatting that looks like XML; parsing is easier in some ways, and in some respects it’s easier to define complex [nested] data. As an aside, this kind of nested structure closely resembles the S-expressions used by a programming language called Lisp.
So in XML, I could maybe format my book like this:
<book>
<title>The Baron in the Trees</title>
<description>A metaphor for independence, the book recounts the adventures of a boy who climbs up a tree one day and never climbs down, spending the rest of his life inhabiting an arboreal kingdom.</description>
</book>
Or maybe like this:
<book
title="The Baron in the Trees"
description="A metaphor for independence, the book recounts the adventures of a boy who climbs up a tree one day and never climbs down, spending the rest of his life inhabiting an arboreal kingdom."
/>
I say “maybe” because what XML requires, alongside the data, is some information that describes how the data is structured. XML is extremely powerful and can be used to describe any structured data. The tradeoff is that requirement: you have to describe the structure so that the meaning can be easily extracted. This often requires a suite of associated tools (XPath, XSLT, etc.).
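Here is a sketch of why the receiver needs to know which structure was chosen, using Python’s standard xml.etree.ElementTree module. The book wrapper element and the variable names are assumptions made for the example, and the description is shortened. The same title has to be fetched in two different ways depending on the style.

```python
import xml.etree.ElementTree as ET

# Style 1: each piece of data is wrapped in its own child element.
as_elements = """
<book>
  <title>The Baron in the Trees</title>
  <description>A metaphor for independence...</description>
</book>
""".strip()

# Style 2: the same data as attributes on a single element.
as_attributes = (
    '<book title="The Baron in the Trees" '
    'description="A metaphor for independence..." />'
)

book1 = ET.fromstring(as_elements)
book2 = ET.fromstring(as_attributes)

title1 = book1.findtext("title")  # path lookup (ElementTree supports a small subset of XPath)
title2 = book2.get("title")       # attribute lookup

print(title1 == title2)  # True
```

Both documents are perfectly good XML, but code written for one style finds nothing in the other, which is the gap that the description of the structure is meant to fill.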
So XML was seen as a sort of ultimate data transfer format. It is, I guess, but it’s also complicated and ugly. JSON has none of the power, but it’s very, very simple and very easy to read (both in human and computer terms). So JSON is likely the most common format you’re going to need to work directly with.
XML used to be used extensively as an output format for data APIs (e.g. you query example.com/api/books/123, what comes back is that data above). Not so much now: JSON is by far and away the most common response format. So 15-20 years ago you would have used it quite a lot, because if you were getting data from somewhere it would be common for it to be in XML format. Now, not so much in a “requesting data over HTTP” context; JSON won that particular battle.
That being said, it is still used all over the place.
- @a_aramini mentioned Android manifests – that’s an example of it being used for configuration settings.
- Lots of document formats are built on XML – docx, the MS Word file format, is a good example. HTML is ok for basic webpage formatting, but for complex documents you often require a lot more functionality. So you can define something like HTML but with a lot more built in. This means an application can save your work in that format, then you can open it anywhere else that understands the format. It also explains why there is often a huge issue opening older documents in newer versions of the program (or vice versa): in those cases the formatting instructions for the XML have changed between program versions.
It doesn’t do anything, it’s just a way to define a structure for some data. You may pass that data between a server and a client. You may pass that data between programs on a single computer. You may pass that data between components inside a single application.
To take the books example: you could write about your favourite books in prose form, then write and train and tune some highly complex AI that can infer meaning from natural language, which will in turn extract information from your prose, and then you can use that information in some application, which will maybe take a few years. Or you can just write the information about your favourite books in the form of JSON or XML or whatever; that’ll maybe take you a few minutes instead.
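The “few minutes” option can be sketched like this, again with Python’s standard library. The record and its field names are hypothetical; the point is only that the same data can be written out as JSON or as XML without much effort.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are made up for this example.
book = {
    "title": "The Baron in the Trees",
    "author": "Italo Calvino",
    "first_published": 1957,
}

# JSON: one call turns the dictionary into text another program can read.
as_json = json.dumps(book)

# XML: build a <book> element with one child element per field.
root = ET.Element("book")
for key, value in book.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

Either string can be saved to a file or sent to another program, which can then read it back without any natural-language processing.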
No, not any more or less, because it’s just a way to format data in a structured manner. It is unlikely you need to learn about it in any detail, as direct use of it isn’t hugely common nowadays (YMMV). Edit: so, for example, I mentioned JSX. You don’t need to know how XML works, or how it works in that particular context; you can just use JSX. Same for most things.
Note XML works just fine in a web browser; if you open an arbitrary XML file then normally the browser will show the file as-is, along with a warning that there is no styling information associated with it. But if styling information is included, the browser can use it to render the data for the user in whatever way the styling specifies.