Fix UTF-8 in the surrogate range passing as good.

Our UTF-8 validity checker failed to recognize that characters in the surrogate range (D800-DFFF) were invalid. Fortunately, Python 2 is happy about that, therefore it doesn't crash (Python 3 fixed that range too). Unfortunately, SL isn't, therefore we fix it.

Added corresponding unit tests.
This commit is contained in:
Sei Lisa 2016-12-13 03:10:01 +01:00
parent ae984169ad
commit b593141f9f
2 changed files with 12 additions and 1 deletions

View file

@ -460,8 +460,9 @@ def InternalUTF8toString(s):
if partialchar:
if 0x80 <= o < 0xC0 and (
partialchar[1:2]
or b'\xC2' <= partialchar < b'\xF4' and partialchar not in b'\xE0\xF0'
or b'\xC2' <= partialchar < b'\xF4' and partialchar not in b'\xE0\xED\xF0'
or partialchar == b'\xE0' and o >= 0xA0
or partialchar == b'\xED' and o < 0xA0
or partialchar == b'\xF0' and o >= 0x90
or partialchar == b'\xF4' and o < 0x90
):