Shopping List: Input Encoding

It all started a couple of weeks ago with an apostrophe. We were adding “Esther’s 80th card” to the shopping list, and it kept coming up as “Esther\’s 80th card”. Seemed innocuous enough: at some level the apostrophe was probably being escaped with a backslash. This wasn’t the first time we’d seen it, and I had a little time over the weekend, so I thought I would jump in and fix it with some better encoding.

Little did I know that this would expose all kinds of other problems with special characters, particularly ampersands, spaces, and quote marks of any kind. There seemed to be some black magic going on at multiple levels, which made it difficult to debug why simple encode()/decode() calls were falling far short.

Eventually, after learning about such fun concepts as PHP’s “magic quotes” and the many variants of encoding and decoding calls, each broken in its own way, here’s what I came up with for saving data:

 

I think this works for all reasonable inputs. At least it works for every test I can come up with. We end up with fully encoded strings stored in the flat file that aren’t decoded until immediately before being displayed to the user. The most fun part by far is the oddity that $_POST only works properly when it is given an encoded input string, but then it unhelpfully decodes the string for you (I couldn’t figure out how to turn this off, anyway). So I have to encode once in the JavaScript and once in the PHP.
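As a rough illustration of that round trip (not the original listing), here is a minimal JavaScript sketch. The helper names and the `item` field are hypothetical, `URLSearchParams` stands in for the decoding PHP does when it fills $_POST, and using `encodeURIComponent` for the storage pass is an assumption about how the second encoding might look.

```javascript
// Hypothetical helper names; a sketch under stated assumptions.

// Client side: percent-encode the item so "&", "=", and spaces
// cannot be mistaken for form-body syntax.
function buildPostBody(item) {
  return "item=" + encodeURIComponent(item);
}

// Stand-in for the server's form parser: it splits the body and
// percent-decodes each value, much as PHP does when filling $_POST.
function parsePostBody(body) {
  return new URLSearchParams(body).get("item");
}

// Second encoding pass, applied before the value goes into the flat file.
function encodeForStorage(item) {
  return encodeURIComponent(item);
}

// Decode only at the last moment, just before display.
function decodeForDisplay(stored) {
  return decodeURIComponent(stored);
}

const original = "Esther's 80th card & M&Ms";
const body = buildPostBody(original);   // what the browser sends
const received = parsePostBody(body);   // already decoded on arrival
const stored = encodeForStorage(received);
const shown = decodeForDisplay(stored);

console.log(body);
console.log(shown === original);
```

One thing worth noting: `encodeURIComponent` leaves a plain apostrophe untouched, so an apostrophe problem like the one that started all this would need to be handled by a separate step.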

In any case, time to go get some M&Ms! I can’t remember the last time I ate regular M&Ms, but they make for a great example of HTML special characters.
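For the curious, the ampersand in M&Ms is exactly the kind of character that has to be escaped before it lands in HTML. A minimal sketch, using a hypothetical `escapeHtml` helper that is not part of the shopping list code:

```javascript
// Hypothetical helper: replace the five characters that are special in HTML.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(escapeHtml("M&Ms")); // M&amp;Ms
```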
