Decoding HTML encoded source to XML text


Posted by jpluimers on 2026/03/03

For Some links on getting the most recent defragmentation time of a Windows volume, I needed to copy some XML code back and forth between my ARM MacBook Pro and a remote Windows machine that I accessed via the Microsoft Windows App (the app formerly known as Microsoft Remote Desktop for Mac).

The problem was that the copying would lose line breaks. That does not change the meaning of the XML, but it does hurt human readability while editing the XML in the Event Viewer query dialog.
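If XML does arrive on a single line, one way to make it readable again is to re-indent it locally. The following is only a minimal sketch, assuming Python 3.9 or later (for `ET.indent`) and a hypothetical, shortened single-line query; it is not what the post ended up using.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-line XML, as it might look after the line breaks were lost.
flat = ('<QueryList><Query Id="0" Path="Application">'
        '<Select Path="Application">*[System[(EventID=258)]]</Select>'
        '</Query></QueryList>')

root = ET.fromstring(flat)
ET.indent(root, space="  ")  # re-indents in place; available since Python 3.9
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Only whitespace between elements is added; the text inside `Select` is left as it was.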

So I went to the “Code” view in my Classic WordPress editor (did I ever tell you how much I dislike – especially the accessibility of – the not so new but still haughtily named Gutenberg editor?), copied the HTML encoded form, and wanted to convert it to unencoded XML text.

Well, here I entered naming confusion land, which I will talk about further below, but first two of the potential solutions:

Note: the tools are listed in this particular order because I had forgotten about the naming confusion and that CyberChef by GCHQ could be used as well.

[Wayback/Archive] HTML Decode / Unescape – Online Tools from [Wayback/Archive] Online Tools (their main page [Wayback/Archive] emn178’s Online Tools refers to it) with source code at
[Wayback/Archive] HTML Decode Online is the Best Tool to Decode HTML String, HTML URL and HTML File. from [Wayback/Archive] Code Beautify and Code Formatter For Developers – to Beautify, Validate, Minify, JSON, XML, JavaScript, CSS, HTML, Excel and more of which I blogged first more than 10 years ago in Best Online XML Viewer, Formatter, Editor, Analyser, Beautify-Beautifier, Minify, Tree structure, Notepad, Marker

[Wayback/Archive] CyberChef with the “From HTML Entity” example at [Wayback/Archive] From HTML Entity – CyberChef – the HTML Entity text to XML text conversion

from

&lt;QueryList&gt;
  &lt;Query Id="0" Path="Application"&gt;
    &lt;Select Path="Application"&gt;
*[System[Provider[@Name='Microsoft-Windows-Defrag'] and (Level=4 or Level=0) and (EventID=258)]]
and
*[EventData[Data[1]='defragmentation']]
and
*[EventData[Data[2]='(C:)']]
    &lt;/Select&gt;
  &lt;/Query&gt;
&lt;/QueryList&gt;

to

<QueryList>
  <Query Id="0" Path="Application">
    <Select Path="Application">
*[System[Provider[@Name='Microsoft-Windows-Defrag'] and (Level=4 or Level=0) and (EventID=258)]]
and
*[EventData[Data[1]='defragmentation']]
and
*[EventData[Data[2]='(C:)']]
    </Select>
  </Query>
</QueryList>

All three tools work completely on the client side.
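The same conversion can also be done locally without any of these tools. As a minimal sketch, assuming Python 3 and a shortened version of the encoded query above as sample input, the standard library's `html.unescape` does the decoding:

```python
import html

# HTML-encoded form as copied from the editor's "Code" view (shortened sample).
encoded = """&lt;QueryList&gt;
  &lt;Query Id="0" Path="Application"&gt;
    &lt;Select Path="Application"&gt;
*[EventData[Data[1]='defragmentation']]
    &lt;/Select&gt;
  &lt;/Query&gt;
&lt;/QueryList&gt;"""

# html.unescape replaces named and numeric character references;
# line breaks and indentation are left untouched.
decoded = html.unescape(encoded)
print(decoded)
```

Going the other way, `html.escape(decoded, quote=False)` turns the `<` and `>` back into their encoded form.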

Both the first and last tools have source code on-line:

Terminology

Above you see these terms:

HTML encoded
HTML decode
HTML unescape
HTML entity

The last one is the formal term, and the entities needing decoding were &lt; and &gt;. They are explained in List of XML and HTML character entity references: List of character entity references in HTML – Wikipedia, from which I copied some table rows into an HTML table with a proper <thead>:

All named character entity references in HTML and XML
Entities	Char.	Codepoints	Standard	DTD^[b]	Old ISO subset^[c]	Description^[d]
…	…	…	…	…	…	…
&lt;^[a] &LT;^[a]	<	U+003C	XML 1.0 HTML 5.0	html.dtd HTMLspecial	ISOnum	less-than sign
…	…	…	…	…	…	…
…	…	…	…	…	…	…
&gt;^[a] &GT;^[a]	>	U+003E	XML 1.0 HTML 5.0	html.dtd HTMLspecial	ISOnum	greater-than sign
…	…	…	…	…	…	…
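The entity names and code points in these rows can be cross-checked against the tables that ship with Python's standard library. This is a small sketch, assuming Python 3; `name2codepoint` holds the classic HTML 4 names and `html5` holds the HTML5 named character references, including the upper-case variants.

```python
import html
from html.entities import html5, name2codepoint

# Classic (HTML 4 era) entity names map to code points.
for name in ("lt", "gt"):
    print(f"&{name}; -> U+{name2codepoint[name]:04X}")

# The HTML5 table also lists the upper-case variants.
for ref in ("lt;", "LT;", "gt;", "GT;"):
    print(f"&{ref} -> {html5[ref]!r}")
```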

What you see is that it points to html.dtd, which a table further down that page describes as

HTML DTD entities subsets
Name	Version	Formal public identifier	System identifier
…	…	…	…
HTMLspecial	HTML 4	`"-//W3C//ENTITIES Special//EN//HTML"`	`"http://www.w3.org/TR/html4/HTMLspecial.ent"` (optional)
HTMLspecial	XHTML 1	`"-//W3C//ENTITIES Special for XHTML//EN"`	`"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent"`
html.dtd^[i]	N/A	`"http://info.cern.ch/MarkUp/html-spec/html.dtd"`
…	…	…	…

where the [i] points to

The original HTML 1.0 DTD, which would have been available at http://info.cern.ch/MarkUp/html-spec/html.dtd

It implies that html.dtd did not and does not exist: it was never created in the turmoil of the early HTML days, hence its system identifier page [Wayback/Archive] http://info.cern.ch/MarkUp/html-spec/html.dtd returns an HTTP 404.

As a side note, there is no DTD for HTML 5 either (due to billion laughs attacks).

The HTMLspecial system identifiers do exist and have not changed since early this century. Here they are with the respective entries quoted:

20000712: [Wayback/Archive] http://www.w3.org/TR/html4/HTMLspecial.ent

<!ENTITY lt CDATA "&#60;" -- less-than sign, U+003C ISOnum -->
<!ENTITY gt CDATA "&#62;" -- greater-than sign, U+003E ISOnum -->

20020806: [Wayback/Archive] http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

<!ENTITY lt "&#38;#60;"> <!-- less-than sign, U+003C ISOnum -->
<!ENTITY gt "&#62;"> <!-- greater-than sign, U+003E ISOnum -->
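These declarations define the named entities in terms of numeric character references. As a quick illustration, assuming Python 3, the decimal, hexadecimal and named forms all decode to the same two characters:

```python
import html

# Decimal, hexadecimal and named references for the same two characters.
references = ["&#60;", "&#x3C;", "&lt;", "&#62;", "&#x3E;", "&gt;"]
decoded = [html.unescape(ref) for ref in references]
print(decoded)
```

The decimal values 60 and 62 are the code points U+003C and U+003E listed in the table rows earlier.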

Wikipedia links

Character encodings in HTML – Wikipedia
List of XML and HTML character entity references – Wikipedia has sections
- List of character entity references in HTML
- Formal public identifiers for HTML DTD entities subsets
HTML – Wikipedia has these sections
- HTML version timeline
- HTML draft version timeline
  - XHTML versions (the former XHTML which is no longer developed)

Queries

html decoder – Google Search (could not be archived)
[Wayback/Archive] html decoder at DuckDuckGo
[Wayback/Archive] emn178.github.io at DuckDuckGo
[Wayback/Archive] html 1.0 dtd at DuckDuckGo
[Wayback/Archive] HTML and XHTML Document Type Definitions at DuckDuckGo

--jeroen

This entry was posted on 2026/03/03 at 18:00 and is filed under Cyberchef, Development, Encoding, HTML, Mojibake, Software Development, URL Encoding, Web Development.
