PHP Regex to Preserve HTML Entities

posted Saturday, July 16th 2011 at 7:13 PM by Adam Konieska

So here's the deal. Sometimes I post stuff and it's got ampersands in it. Stuff like M&Ms, Good & Plenty, Mike & Ike, etc.

But, being the W3C standards-minded person I am, naturally I wanted to convert all my & characters to &amp;.

No biggie, right? Just PHP str_replace the shit out of it, right? And Kablamo! Suddenly all the beautiful links you had to www.example.com/?foo=bar&oh=noes turned into www.example.com/?foo=bar&amp;oh=noes, and any entity you had already written as &amp; got mangled into &amp;amp;
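To make the Kablamo concrete, here's a tiny sketch of the blanket-replace approach. The sample string and variable names are made up for illustration; only str_replace itself is real PHP.

```php
<?php
// Hypothetical post text: a bare ampersand, a URL with a query string,
// and an entity that is already encoded.
$post = 'M&Ms at www.example.com/?foo=bar&oh=noes and an existing &amp; entity';

// The naive approach: replace every single & no matter where it sits.
$naive = str_replace('&', '&amp;', $post);

echo $naive, PHP_EOL;
// M&amp;Ms at www.example.com/?foo=bar&amp;oh=noes and an existing &amp;amp; entity
```

The bare ampersand comes out right, but the link text and the existing entity both get rewritten too.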

So what's a nerd to do? Solve it with a regular expression! (insert XKCD comic here)

aaaaaaand what that does, is replace any instance of & unless it has a ; within 5 characters after it (which means it's already an entity), or unless the & is followed by an = sign (which means it's part of a URL query string).
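The regex itself isn't reproduced here, so what follows is only a rough reconstruction from the description above, not the original pattern. The function name escape_bare_ampersands and the exact lookaheads are my own assumptions: one negative lookahead skips anything that looks like an entity (a ; within 5 characters), the other skips an & that runs into an = before the next space.

```php
<?php
// A sketch under stated assumptions, not the original expression.
// escape_bare_ampersands() is a hypothetical helper name.
function escape_bare_ampersands(string $text): string
{
    // &                       a literal ampersand
    // (?![A-Za-z0-9#]{1,4};)  ...not followed by 1-4 entity characters and a ;
    //                         (so the ; lands within 5 characters: &amp; &quot; &nbsp;)
    // (?![^\s&=]*=)           ...not followed by a run of non-space characters
    //                         ending in = (a query-string parameter like &oh=noes)
    $pattern = '/&(?![A-Za-z0-9#]{1,4};)(?![^\s&=]*=)/';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '&amp;', $text) ?? $text;
}

echo escape_bare_ampersands('M&Ms, Good & Plenty, Mike & Ike'), PHP_EOL;
// M&amp;Ms, Good &amp; Plenty, Mike &amp; Ike

echo escape_bare_ampersands('www.example.com/?foo=bar&foo2=jkl;'), PHP_EOL;
// www.example.com/?foo=bar&foo2=jkl;
```

Note that a 5-character window won't catch longer entities (numeric ones like &#8217; for instance), so widen the {1,4} if your posts use those.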

So, that will replace all & symbols with the proper &amp; yet keep URLs intact, including crazy URLs with semi-colons in them, i.e. www.example.com/?foo=bar&foo2=jkl;

That covers pretty much any case I could think of where you'd want to replace & with &amp; and the scenarios where you'd want to keep it. Testing has been limited, and doesn't include the things I haven't thought of yet, but hopefully this is of use to someone.

  • Replying to Adam Konieska on PHP Regex to Preserve HTML Entities