There are a number of hazards that using multiple character sets and multi-byte character sets can expose web applications to. This article will examine the normal method of sanitizing strings in SQL statements, research into multi-byte character sets, and the hazards they can introduce.
SQL Injection and Sanitizing
Web applications sanitize the apostrophe (') character in strings coming from user input being passed to SQL statements using an escape (\) character. The hex code for the escape character is 0x5c. When an attacker puts an apostrophe into a user input, the ' is turned into \' during the sanitizing process. The DBMS does not treat \' as a string delimiter and thusly the attacker (in normal circumstances) is prevented from terminating the string and injecting malicious SQL into the statement.
If a multi-byte character supported by the server ended in the hex code 0x5c, it is possible for an attacker to insert the prefix to this character before the apostrophe, so that the escape, in combination with this prefix, turns into a different character altogether and allows the single quote to escape the string input unscathed. While this idea isn't necessarily new, finding research online that includes an entire list of character sets and characters is cumbersome at best. This article attempts to put all of the research and tools in one place.
Researching Multi-byte Character Sets
A small python script was devised to determine which character set and characters within them contained multi-byte characters ending in 0x5c. The script iterates over all installed character sets and then inspects their hexadecimal values for each character. A list of character sets found to contain valid multi-byte character sets ending in 0x5c is provided in Figure A. Additionally, a video of running the script has been provided to show what the output should look like in Figure B.
In conclusion, there are hundreds of multi-byte characters that could potentially allow attackers to perform SQL injection through sanitizing. It is interesting to note that these character sets are intended for use in a specific region of the world. Ways to fix this by forcing both the webserver and the SQL server to use the same character set exist, as this vulnerability only occurs when multiple (and different) character sets are in use. Those looking to do so may find this research interesting.