Showing posts with label xml. Show all posts

Tuesday, April 12, 2011

PHP: Encode Text Into Numeric HTML Entities While Keeping HTML

If you need to encode the special characters in some text as numeric HTML entities, but the text also contains HTML markup that you do not want encoded, here is a function that does it.


Why Numeric HTML Entities?

If you've worked with RSS before, I'm sure you understand the dilemma. For those of you who haven't: a feed fails validation if it contains named HTML entities such as &nbsp; or &eacute; (the normal output you get from htmlentities()), because XML only predefines &amp;, &lt;, &gt;, &quot; and &apos;. Numeric entities such as &#160; are always valid. More headaches, more XML fun!
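
To see the problem outside of a feed validator, here is a minimal sketch in Python (not the language of this post, just a convenient way to run a plain XML parser). The helper name parses_as_xml and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(fragment):
    """Return True if a plain XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Hypothetical feed snippets: same text, different entity styles.
named = "<description>caf&eacute; &nbsp; menu</description>"
numeric = "<description>caf&#233; &#160; menu</description>"

print(parses_as_xml(named))    # named HTML entities are undefined in XML
print(parses_as_xml(numeric))  # numeric character references need no declaration
```

The first snippet is rejected with an "undefined entity" error, which is roughly what a feed validator complains about; the second parses fine.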


Why Keep HTML?

Well, to be honest, it's just because we have a special scenario right now. But, I'm sure some people may find this helpful.



*Note* The numeric HTML entities foreach loop is based off of Michael Krenz's xml_character_encode function. Thank you sir!


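function htmlentities_keephtml($text) {
    // Assumes $text is ISO-8859-1: every key in the table is then a single
    // byte, so ord() returns the character's code point. With the UTF-8
    // default of PHP 5.4+ the keys are multibyte and ord() only sees the
    // first byte, so the charset is passed explicitly.
    $entities = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');

    // leave the characters that make up HTML markup alone
    unset($entities['"']);
    unset($entities['<']);
    unset($entities['>']);
    unset($entities['&']);

    // map each remaining character to its numeric entity, e.g. chr(233) => &#233;
    foreach ($entities as $k => $v) {
        $entities[$k] = "&#" . ord($k) . ";";
    }
    $s = array_keys($entities);
    $r = array_values($entities);

    $text = html_entity_decode($text, ENT_NOQUOTES, 'ISO-8859-1'); // decode the named entities
    $text = str_replace($s, $r, $text); // now encode to numeric entities

    return $text;
}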
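
For comparison, here is a rough Python sketch of the same idea. It is a variant, not a port of the function above: instead of decoding everything first, it rewrites named entities in place and leaves the five XML-safe ones (&amp;, &lt;, &gt;, &quot;, &apos;) untouched, so escaped markup stays escaped. The function name numeric_entities_keep_html is made up for this example.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; these are left exactly as they are.
XML_SAFE = {"amp", "lt", "gt", "quot", "apos"}

def numeric_entities_keep_html(text):
    """Turn named entities and non-ASCII characters into numeric entities,
    leaving tags and the XML-safe entities alone."""
    def named_to_numeric(match):
        name = match.group(1)
        if name in XML_SAFE or name not in name2codepoint:
            return match.group(0)  # keep XML-safe or unknown entities unchanged
        return "&#%d;" % name2codepoint[name]

    text = re.sub(r"&([A-Za-z][A-Za-z0-9]*);", named_to_numeric, text)
    # Any remaining non-ASCII character becomes a numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(numeric_entities_keep_html("<p>caf\u00e9 &nbsp;&amp; &copy;</p>"))
# prints <p>caf&#233; &#160;&amp; &#169;</p>
```

The tags pass through because <, >, " and & are plain ASCII and are never touched by either step.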

Tuesday, September 30, 2008

XML RFC822 and W3CDTF Validation

I've been tasked with cleaning up XML pages at work. We streamline the process with a Perl script (just like many of our other publishing tools), which produces a nice neat XML file.

So, I've been spending my time lately validating XML feeds on FeedValidator.org (very useful validator, bookmark it!). Apparently, some of our date formats were not valid. We weren't following the proper RFC822 format for the "pubDate" tag and W3CDTF format for the "dc:date" tag.

Luckily, with localtime and strftime, this process is easy. So I wrote a new subroutine for our Perl module, and here we have it!

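use POSIX qw(strftime);

sub Now2
{
    my $format = shift;

    if ($format == 1) {
        ## returns in RFC822 format:
        ## e.g. Tue, 30 Sep 2008 12:57:06 -0400
        ## (%d rather than %e, so single-digit days are zero-padded
        ## instead of leaving a double space)
        my $now_string = strftime "%a, %d %b %Y %H:%M:%S %z", localtime;
        return $now_string;
    }
    elsif ($format == 2) {
        ## returns in W3CDTF format:
        ## e.g. 2008-09-30T13:00:35-04:00
        my $now_string = strftime "%Y-%m-%dT%H:%M:%S%z", localtime;
        ## W3CDTF wants a colon in the UTC offset (-04:00, not -0400)
        $now_string =~ s/([+-]\d\d)(\d\d)$/$1:$2/;
        return $now_string;
    }
    else {
        return;
    }
}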


Call the subroutine with 1 or 2, depending on which date format you want. For example, to get a W3CDTF date, call it like this: Now2(2)
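
If you want to cross-check the two formats outside of Perl, here is a small Python sketch of the same subroutine. The name now2 and the optional when parameter are my own additions for illustration (the parameter just makes the output reproducible); it leans on the standard library's email.utils.format_datetime and datetime.isoformat rather than strftime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def now2(fmt, when=None):
    """Return `when` (default: current local time) as RFC 822 (1) or W3CDTF (2)."""
    if when is None:
        when = datetime.now().astimezone()  # local time, including its UTC offset
    if fmt == 1:
        # RFC 822 style, offset written without a colon
        return format_datetime(when)
    if fmt == 2:
        # W3CDTF style, offset written with a colon
        return when.isoformat(timespec="seconds")
    return None

# A fixed sample time so the two formats can be compared side by side.
sample = datetime(2008, 9, 30, 12, 57, 6, tzinfo=timezone(timedelta(hours=-4)))
print(now2(1, sample))  # Tue, 30 Sep 2008 12:57:06 -0400
print(now2(2, sample))  # 2008-09-30T12:57:06-04:00
```

Note the difference in the offset: RFC822 writes -0400, while W3CDTF writes -04:00.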

Cheers!