.NET/C#: Generating a WordPress posting categories page – part 1
Posted by jpluimers on 2012/07/31
From the category cloud it is hard to see that the categories are organized as a hierarchy. The combobox on the right shows that, but does not have room to properly show the hierarchy. Since WordPress.com does not allow you to deploy your own code, I worked around it in this way using a small .NET C# console program:
- Extract the HTML for the All Categories combobox on the right of the page.
- Convert that HTML to XHTML (and therefore XML)
- Generate XSD from that XML
- Generate C# class wrappers from the XSD
Future posts will show more logic on how to handle the imported information, and generate nice category overviews. Preliminary source code is at the BeSharp.net source repository.
Extract the HTML
The HTML is not fully accurate (see my post on HTML and XML escapes from last week), but it is fairly easy to extract. Most web browsers allow you to view the source of your web page. Do that, then search for “All Categories”. Now you see HTML like this:
<h2 class="widgettitle">All categories</h2>
<select class="postform" name="cat">
<option value="-1">Select Category</option>
<option class="level-0" value="256">About (66)</option>
<option class="level-1" value="64"> Personal (60)</option>
<option class="level-2" value="20254983"> Adest Musica (7)</option>
<option class="level-2" value="32122"> Certifications (2)</option>
...
<option class="level-0" value="756">Comics (3)</option>
<option class="level-0" value="780">Development (473)</option>
<option class="level-1" value="872460"> Database Development (55)</option>
...
<option class="level-0" value="9280">User Experience (3)</option>
</select>
I don’t need the H2 heading line, but the rest I do need to generate XML from. I saved the HTML into a text file for processing by the console app.
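Copying the select element by hand works fine for a one-off. If you would rather let the console app cut it out of the saved page source, the following is a minimal sketch of that step. The class and method names are hypothetical (they are not from the BeSharp.net repository), and it assumes the combobox is the first select element whose opening tag carries name="cat".

```csharp
using System;

public static class CategoryHtmlExtractor
{
    // Returns the first <select ...>...</select> element whose opening tag
    // contains name="cat" (or name='cat'), or null when there is no such element.
    public static string ExtractCategorySelect(string pageSource)
    {
        int searchFrom = 0;
        while (true)
        {
            int start = pageSource.IndexOf("<select", searchFrom, StringComparison.OrdinalIgnoreCase);
            if (start < 0) return null;

            int tagEnd = pageSource.IndexOf('>', start);
            if (tagEnd < 0) return null;

            string openingTag = pageSource.Substring(start, tagEnd - start + 1);
            bool isCategorySelect =
                openingTag.Contains("name=\"cat\"") || openingTag.Contains("name='cat'");

            if (isCategorySelect)
            {
                const string closingTag = "</select>";
                int end = pageSource.IndexOf(closingTag, tagEnd, StringComparison.OrdinalIgnoreCase);
                if (end < 0) return null;
                return pageSource.Substring(start, end - start + closingTag.Length);
            }

            // Not the categories combobox: continue after this opening tag.
            searchFrom = tagEnd + 1;
        }
    }
}
```

Plain string searching is enough here because the select element does not nest; for anything more complicated a real HTML parser would be the safer choice.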
Convert the HTML to XML
The HTML contains loads of &nbsp;, but XML does not allow for that entity. So the & ampersand needs to be escaped into &amp;. This also solves other uses of & in the HTML. The rest of the HTML is XHTML compliant, so it does not require changes, which results in this C# conversion method:
private static string toXml(string inputHtml)
{
    // Escape every ampersand so entities such as &nbsp; become plain text that XML accepts.
    string result = inputHtml.Replace("&", "&amp;");
    return result;
}
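To check that the escaped text really is well-formed XML, you can parse it. This is a minimal sketch using LINQ to XML; the class and method names are hypothetical, and EscapeAmpersands repeats the idea that every & becomes &amp; so the block stands on its own.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlConversionCheck
{
    // Same escaping idea as the conversion method: every ampersand becomes &amp;.
    public static string EscapeAmpersands(string inputHtml)
    {
        return inputHtml.Replace("&", "&amp;");
    }

    // Parses the escaped text and returns the number of <option> elements
    // directly under the root element. XDocument.Parse throws an XmlException
    // when the text is not well-formed XML.
    public static int CountOptions(string inputHtml)
    {
        XDocument document = XDocument.Parse(EscapeAmpersands(inputHtml));
        return document.Root.Elements("option").Count();
    }
}
```

Note that after this escaping the option text contains the literal characters &nbsp; rather than non-breaking spaces, so any later processing of the category names has to strip or replace them.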
Generate an XSD for the XML, then amend the XSD
Given my comparison of tools for generating XSD from XML, I used the XmlForAsp XML Schema generator with the “Separate Complex Types” option.
(Note: I will link to the XSD before/after, as WordPress, yet again, screws up the XSD source code in the post; this should do for now.) That gives me XSD like this (XML is also at pastebin):
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="select" type="selectType" />
  <xsd:complexType name="selectType">
    <xsd:sequence>
      <xsd:element maxOccurs="unbounded" name="option" type="optionType" />
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" />
    <xsd:attribute name="id" type="xsd:string" />
    <xsd:attribute name="class" type="xsd:string" />
  </xsd:complexType>
  <xsd:complexType name="optionType">
    <xsd:attribute name="value" type="xsd:int" />
  </xsd:complexType>
</xsd:schema>
This is not complete, but it gives a good start. The actual XSD needs a more elaborate optionType complex type that also defines its own content as deriving from xsd:string and adds the class attribute (XML is also at pastebin):
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="select" type="selectType" />
  <xsd:complexType name="selectType">
    <xsd:sequence>
      <xsd:element maxOccurs="unbounded" name="option" type="optionType" />
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" />
    <xsd:attribute name="id" type="xsd:string" />
    <xsd:attribute name="class" type="xsd:string" />
  </xsd:complexType>
  <xsd:complexType name="optionType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="class" type="xsd:string" />
        <xsd:attribute name="value" type="xsd:int" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
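Before generating classes, it can help to confirm that the converted XML actually validates against the amended schema. The following is a minimal sketch using XmlSchemaSet and the LINQ to XML Validate extension method; the class and method names are hypothetical and not part of the original console program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class CategorySchemaValidator
{
    // Validates xml against the schema text in xsd and returns the
    // validation messages; an empty list means the document is valid.
    public static List<string> Validate(string xml, string xsd)
    {
        var schemas = new XmlSchemaSet();
        using (var schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            // null: use the target namespace declared in the schema itself (none here).
            schemas.Add(null, schemaReader);
        }

        var messages = new List<string>();
        XDocument document = XDocument.Parse(xml);
        // The handler collects errors and warnings instead of throwing on the first one.
        document.Validate(schemas, (sender, e) => messages.Add(e.Message));
        return messages;
    }
}
```

Running the same XML against the generated schema should report problems for the option elements, since that version of optionType declares neither text content nor a class attribute.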
Generate C# classes from the XSD
You can generate C# wrapper classes using the XSD.exe tool that ships with Visual Studio. However, XSD.exe is hard to use and hard to integrate into Visual Studio (despite a Microsoft Connect request for it), its generated code still needs work for deserializing, and it has very limited generation options (heck, after it changed from .NET 1.x to 2.0, it hasn’t been updated for about a decade). XSD2Code has some great reviews, so I used that instead. And indeed, it integrates very well into Visual Studio 2010 and generates very nice C#, especially when you use these options (see also the screenshot on the right):
- Under Serialization, set Enabled to True
- Under Serialization, set GenerateXmlAttributes to True
That way, loading the HTML, converting it to XML, then deserializing it into object instances is as simple as this:
string inputFileName = args[0];
string inputHtml = getHtml(inputFileName);
string xml = toXml(inputHtml);
selectType select = selectType.Deserialize(xml);
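For readers without XSD2Code, here is a hand-written approximation of what such wrapper classes boil down to, built on the standard XmlSerializer attributes. This is a sketch only: SelectSketch and OptionSketch are hypothetical names, and the real generated selectType and optionType classes will look different.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("select")]
public class SelectSketch
{
    [XmlAttribute("name")]
    public string Name { get; set; }

    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlAttribute("class")]
    public string Class { get; set; }

    // One entry per <option> child element.
    [XmlElement("option")]
    public List<OptionSketch> Options { get; set; }

    // Reads a <select> document from a string into object instances.
    public static SelectSketch Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(SelectSketch));
        using (var reader = new StringReader(xml))
        {
            return (SelectSketch)serializer.Deserialize(reader);
        }
    }
}

public class OptionSketch
{
    // Hierarchy level as emitted by WordPress, for instance "level-0".
    [XmlAttribute("class")]
    public string Class { get; set; }

    // The category ID from the value attribute.
    [XmlAttribute("value")]
    public int Value { get; set; }

    // The element's text content: category name plus post count.
    [XmlText]
    public string Text { get; set; }
}
```

The [XmlText] property mirrors the simpleContent extension in the amended XSD: an option has attributes and string content at the same time.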
More on actually working with the loaded instances in the next episode, including the great benefit of XSD2Code: it generates C# code as partial classes.
–jeroen
This entry was posted on 2012/07/31 at 06:00 and is filed under .NET, C#, C# 4.0, C# 5.0, Development, SocialMedia, Software Development, Usability, User Experience (ux), Web Development, WordPress, XML, XML escapes, XML/XSD, XSD.