When it comes to Web standards, I like to consider myself a fan and follower, although not in an obsessive kind of way.  I think that it’s also important (usually more important) to get the necessary features in the product.

To validate or not to validate, that is the question


The line for me is exactly there: I always try to deliver the best I can.  For a website, for example, standards compliance is important, but meeting users’ needs is more important.

Let’s take Joomla! as an example.  This open source CMS allows you to have a website up and running much faster than if you were developing it from scratch.  It has a powerful backend and a lot of already-tested features and extensions in place.

However, when it comes to having the site be XHTML 1.0 compliant, Joomla! fights you all the way.  You can go crazy trying to do it.  I know, I’ve tried.

Creating a valid XHTML 1.0 template is not a particularly difficult thing to do if you pay even minimal attention to standards.  Extensions are a different ball of wax altogether.

Houston, We Have a Problem

Making sure that all the components and modules you use generate valid XHTML 1.0 code is damn near impossible.  Joomla! extensions (i.e., components, modules and plug-ins) are developed by lots of people from all over the world, some of whom care about Web standards (good) and some of whom don’t (not so good).

To correct this situation you have to have a pretty good understanding of Joomla’s architecture, dive into PHP/HTML code and modify its core files.  Ditto for the extensions.  That would be a crazy thing to do.  It invalidates one of the big benefits of working with an OSS CMS in that you are now going to be messing around with time-tested code.

This is unsustainable, not to mention impractical.  Every time you update one of your components, you will most likely have to implement your changes all over again.  Alternatively, you could keep track of all your customized files and then review each one and manually upgrade after each update.  More likely than not, once you’ve made extensive changes, you’ll be very reluctant to update and will not be able to take advantage of bug fixes, new features, etc.

Creating Valid XHTML 1.0 Content

One popular component is TinyMCE, the default text editor that comes with Joomla!

This little monster has a mind of its own and will mess with everything you do.  You can turn off the “Code Cleanup on Startup” and “Code cleanup on save” parameters, but to no avail.  If TinyMCE could talk, this is the conversation you’d have with it:

You: Hi Tiny, how have you been?
Tiny MCE: Great, buddy, what can I do for you?
You: Please put this <br /> tag in my article’s HTML code.
Tiny MCE: He he he. You crazy. I’ll save space by putting the usual <br> instead.
You: No please, add the slash in there. It’s important so the site can be valid XHTML 1.0.
Tiny MCE: Oh, I’m sorry, I already did it my way. BTW, what does “valid XHTML 1.0” mean?
You: [Uncontrollable sobbing sounds]

Even if you could find a better HTML editor that listens to you and respects your feelings, you may still end up with non-compliant code because each author can mess with the HTML code directly (and since we’re a technology company, many of our authors do).  In this case, you would have to monitor and check every single article that comes into the system to make sure it’s compliant.

Another Approach

At this point it is very tempting to just give up on the whole thing and let the site be non-compliant.

Even though passing the XHTML 1.0 validation is not vital for a Joomla!-based website, it has been bothering me for a while and I don’t like it.  I mean, we’re a technology company and this reflects badly on us.

So, I decided to create the CleanHTML plug-in to resolve this situation.

This plug-in runs after the final HTML has been generated by Joomla! and just before the page is sent to the visitor (the onAfterRender Joomla! event).  At that point, the plug-in cleans up the code to make it compliant, regardless of what extensions and authors do (yay!).
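To make the hook concrete, here is a minimal sketch of how a Joomla! 1.5 system plug-in can rewrite the page in onAfterRender.  It is not the actual CleanHTML source: the class name plgSystemCleanhtmlSketch and the function cleanhtml_sketch_cleanup are made up for illustration, and the single cleanup rule is only a placeholder.  JPlugin, JResponse::getBody() and JResponse::setBody() are the Joomla! 1.5 framework calls this relies on.

```php
<?php
// Hypothetical cleanup step; the real plug-in's rules are not shown in this post.
function cleanhtml_sketch_cleanup($html)
{
    // Placeholder rule: turn a bare <br> into a self-closed <br />.
    return preg_replace('/<br\s*>/i', '<br />', $html);
}

// Declare the plug-in class only when running inside Joomla! 1.5,
// so this file can also be loaded on its own.
if (class_exists('JPlugin')) {
    class plgSystemCleanhtmlSketch extends JPlugin
    {
        // Fired after the page has been rendered and before it is sent.
        function onAfterRender()
        {
            $html = JResponse::getBody();
            JResponse::setBody(cleanhtml_sketch_cleanup($html));
            return true;
        }
    }
}

// Standalone usage of the cleanup step.
echo cleanhtml_sketch_cleanup("Line one<br>Line two"), "\n";
// prints: Line one<br />Line two
```

Keeping the cleanup in a plain function, separate from the plug-in class, makes it easy to try the rules outside Joomla! before wiring them into the event.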

CleanHTML Features

CleanHTML can be configured to use the Tidy PHP extension to parse and fix your code to make it valid XHTML 1.0 or HTML 4.01.
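As a rough illustration of the Tidy route, the sketch below repairs a page with tidy_repair_string() when the extension is loaded and returns the page untouched otherwise.  The function name cleanhtml_sketch_tidy and the chosen options are my assumptions for this example, not necessarily what the plug-in uses.

```php
<?php
// Hypothetical helper: repair a page with Tidy when the extension is loaded.
function cleanhtml_sketch_tidy($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // Tidy is not installed: leave the page untouched.
    }

    // Assumed option set; adjust to the doctype your template declares.
    $config = array(
        'output-xhtml' => true,     // emit XHTML rather than HTML
        'doctype'      => 'strict', // request a strict doctype
        'wrap'         => 0,        // do not re-wrap long lines
    );

    $fixed = tidy_repair_string($html, $config, 'utf8');

    // Fall back to the original page if Tidy could not produce output.
    return ($fixed === false || $fixed === '') ? $html : $fixed;
}

$page = '<html><head><title>Demo</title></head><body><p>One<br>Two</body></html>';
echo cleanhtml_sketch_tidy($page), "\n";
```

Falling back to the original markup keeps the site working on hosts where Tidy is missing or fails, at the cost of serving the uncleaned page.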

When Tidy is not available (most hosting providers still don’t include this extension) or if you just decide not to enable this feature, the plug-in can perform a set of manual cleanups using regular expressions, such as replacing <br> with <br />, fixing unescaped query strings, sanitizing ID attributes, and enclosing <script> and <style> blocks of code in CDATA sections.
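The following sketch shows what three of these regular-expression cleanups might look like.  The function names are invented for the example and the patterns are deliberately simple; they are an approximation of the idea, not the rules shipped in the plug-in, and they will not cope with every piece of markup found in the wild.

```php
<?php
// Escape ampersands in one href/src value unless they already start an entity.
function cleanhtml_sketch_escape_url($m)
{
    $url = preg_replace('/&(?!#?\w+;)/', '&amp;', $m[2]);
    return $m[1] . '="' . $url . '"';
}

// Wrap inline script code in a commented CDATA section, once only.
function cleanhtml_sketch_wrap_cdata($m)
{
    $code = $m[2];
    if (trim($code) === '' || strpos($code, 'CDATA[') !== false) {
        return $m[0]; // external script or already wrapped
    }
    return $m[1] . "\n//<![CDATA[\n" . $code . "\n//]]>\n" . $m[3];
}

// Hypothetical entry point that applies the three rules in order.
function cleanhtml_sketch_regex($html)
{
    // 1. Self-close bare <br> tags.
    $html = preg_replace('/<br\s*>/i', '<br />', $html);

    // 2. Fix unescaped query strings in double-quoted href and src attributes.
    $html = preg_replace_callback(
        '/\b(href|src)="([^"]*)"/i',
        'cleanhtml_sketch_escape_url',
        $html
    );

    // 3. Enclose inline <script> blocks in CDATA sections.
    $html = preg_replace_callback(
        '/(<script\b[^>]*>)(.*?)(<\/script>)/is',
        'cleanhtml_sketch_wrap_cdata',
        $html
    );

    return $html;
}

echo cleanhtml_sketch_regex('<a href="index.php?a=1&b=2">Go</a><br>'), "\n";
// prints: <a href="index.php?a=1&amp;b=2">Go</a><br />
```

Each rule is written so that running it on already-clean markup changes nothing, which matters because the cleanup runs on every page view.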

Last but not least, you can add custom rules with raw PHP code that are then applied to the page, held in the $html variable.

For example, say you’ve noticed that some onclick attributes in certain pages are coded as onClick, which makes the page non-compliant because XHTML 1.0 requires lowercase attribute names.  In this case, you can add a PHP replacement line such as:

$html = str_replace('onClick=', 'onclick=', $html);

This will fix all instances of this problem in all of your frontend pages.  Cool, huh?
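If several event attributes are affected (onMouseOver, onChange and so on), a single rule can lowercase them all.  The sketch below is one possible way to write it; the function names are hypothetical, and the pattern is loose enough that it would also rewrite matching text outside of tags, so test it on your own pages first.

```php
<?php
// Callback: rebuild the attribute with a lowercase name.
function cleanhtml_sketch_lower_event($m)
{
    return ' ' . strtolower($m[1]) . '=' . $m[2];
}

// Hypothetical helper: lowercase every on* attribute name followed by a quote.
function cleanhtml_sketch_lower_events($html)
{
    return preg_replace_callback(
        '/\s(on[a-z]+)\s*=\s*(["\'])/i',
        'cleanhtml_sketch_lower_event',
        $html
    );
}

$html = '<a href="#" onClick="go()" onMouseOver="hi()">x</a>';
$html = cleanhtml_sketch_lower_events($html);
echo $html, "\n";
// prints: <a href="#" onclick="go()" onmouseover="hi()">x</a>
```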

Final Notes

It is important to mention that executing regular expression replacements for every generated page can have an impact on website performance.  This impact is almost imperceptible for the most part unless you have lots and lots of visitors and a lousy hosting provider.  Otherwise, you won’t even notice the difference.
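If you want to check the cost on your own server, a rough timing loop like the one below gives a first estimate.  The synthetic page and the single rule are stand-ins I chose for the example; real pages and the full rule set will produce different numbers.

```php
<?php
// Build a synthetic page of 1000 short paragraphs.
$page = str_repeat("<p>Line one<br>Line two</p>\n", 1000);

$runs  = 100;
$clean = '';

$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    // One sample cleanup rule applied to the whole page.
    $clean = preg_replace('/<br\s*>/i', '<br />', $page);
}
$perPageMs = (microtime(true) - $start) * 1000 / $runs;

printf("Average cleanup time: %.3f ms per page\n", $perPageMs);
```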

Given the “global community” nature of Joomla!, it’s very hard to get all the contributors lined up to generate valid XHTML 1.0 websites out of the box.  For that you would need to:

  1. Have the Joomla! Core Team signed up to generate only valid XHTML 1.0 markup.
  2. Have some sort of automated QA subsystem to qualify Joomla! extensions (which sounds complicated and unrealistic).
  3. Replace TinyMCE with an editor that generates valid XHTML 1.0 code.

Of course, all of this is complicated by the fact that “compliance” is somewhat of a loose thing because THERE’S NO SAMPLE IMPLEMENTATION of these standards.

Therefore, this is the only way I can think of to achieve this (for now at least).  So, there you have it.  What do you think?

CleanHTML 1.0 plug-in for Joomla 1.5

If you decide to try this plug-in, please let me know if you find any issues with it so I can fix them.  I hope you find this plug-in useful.