Implement a couple of subtleties of the lexer in the handling of strings.

First, a string does not have its own lexer context; it is matched by a single regular expression. Therefore, if there is no matching closing quote before EOF, it is not an unterminated string. It plainly *isn't* a string.

Second, between the quotes the string lexer matches either the regexp '[^"\\]' (any character except a double quote or a backslash) or the regexp '\\.', that is, a backslash followed by the "any character" metacharacter. But that metacharacter does NOT match a newline. Consequently, a '\' followed by a newline makes the match fail, and the whole thing is not considered a string.

Isolated double quotes are then simply ignored, like many other characters.
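
Both rules can be checked quickly with Python's `re` module. This is only a sketch: the combined pattern below is assembled from the two alternatives described above, and `STRING_RE` and `match_string` are hypothetical names that do not appear in the parser.

```python
import re

# Assumption: the two alternatives from the text, repeated and wrapped in
# double quotes. STRING_RE is an illustrative name, not part of the parser.
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')

def match_string(text, pos=0):
    """Return the string token starting at pos, or None if there is none."""
    m = STRING_RE.match(text, pos)
    return m.group(0) if m else None

print(match_string('"Hello, Avatar!" rest'))  # a normal string
print(match_string('"a \\" b" rest'))         # an escaped quote stays inside
print(match_string('"never closed'))          # None: no closing quote before EOF
print(match_string('"broken \\\nline"'))      # None: '.' does not match a newline
```

Note that `[^"\\]` does match a bare newline, so multi-line strings are still accepted; only a backslash immediately followed by a newline makes the match fail.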

An example illustrating both cases:
```
default
{
    state_entry()
    {
        string msg = "Hello, Avatar!";
        llOwnerSay"(msg); // the opening quote will be ignored
        // the following backslash at EOL cancels the previous double quote: \
        llOwnerSay(llGetKey(")); // Unterminated string; unmatched so ignored.
    }
}
```
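
The expected outcome for that script can be reproduced with a small stand-alone scanner. This is a simplified sketch of the backtracking idea, not the parser's code: `scan_string` and `find_strings` are hypothetical helpers, only the `\n` escape is translated, comments are not skipped, and a trailing backslash at EOF is assumed to mean "not a string".

```python
def scan_string(script, pos):
    """Try to read a string whose opening quote is at script[pos].

    Returns (literal, new_pos) on success, or (None, pos + 1) when the
    quote does not start a string and should simply be skipped.
    """
    savepos = pos + 1  # first character after the opening quote
    pos = savepos
    literal = ''
    while script[pos:pos + 1] != '"':
        if pos >= len(script):
            return None, savepos  # EOF reached: not a string, backtrack
        if script[pos] == '\\':
            pos += 1
            if pos >= len(script):
                return None, savepos  # assumption: same treatment as EOF
            if script[pos] == '\n':
                return None, savepos  # backslash + newline: not a string
            literal += '\n' if script[pos] == 'n' else script[pos]
        else:
            literal += script[pos]
        pos += 1
    return literal, pos + 1  # step over the closing quote

def find_strings(script):
    """Collect every string literal; isolated quotes are ignored."""
    found, pos = [], 0
    while pos < len(script):
        if script[pos] == '"':
            literal, pos = scan_string(script, pos)
            if literal is not None:
                found.append(literal)
        else:
            pos += 1
    return found

# Raw string, so the backslash at the end of the comment line is kept as is.
SAMPLE = r'''default
{
    state_entry()
    {
        string msg = "Hello, Avatar!";
        llOwnerSay"(msg); // the opening quote will be ignored
        // the following backslash at EOL cancels the previous double quote: \
        llOwnerSay(llGetKey(")); // Unterminated string; unmatched so ignored.
    }
}
'''

print(find_strings(SAMPLE))  # ['Hello, Avatar!']
```

Only the first literal is recognised; the other two quotes are skipped as ordinary characters, which matches the comments in the example.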
Sei Lisa 2015-03-08 16:53:58 +01:00
parent defa9fde97
commit 6a0eebf157

@@ -356,8 +356,16 @@ class parser(object):
         self.pos += 1
         strliteral = '"'
+    savepos = self.pos # we may need to backtrack
+    is_string = True # by default
     while self.script[self.pos:self.pos+1] != '"':
-        self.ueof()
+        # per the grammar, on EOF, it's not considered a string
+        if self.pos >= self.length:
+            self.pos = savepos
+            is_string = False
+            break
         if self.script[self.pos] == '\\':
             self.pos += 1
             self.ueof()
@@ -365,14 +373,21 @@ class parser(object):
                 strliteral += '\n'
             elif self.script[self.pos] == 't':
                 strliteral += ' '
+            elif self.script[self.pos] == '\n':
+                # '\' followed by a newline; it's not a string.
+                self.pos = savepos
+                is_string = False
+                break
             else:
                 strliteral += self.script[self.pos]
         else:
             strliteral += self.script[self.pos]
         self.pos += 1
-    self.pos += 1
-    return ('STRING_VALUE', strliteral.decode('utf8'))
+    if is_string:
+        self.pos += 1
+        return ('STRING_VALUE', strliteral.decode('utf8'))
+    # fall through (to consider the L or to ignore the ")
 if isalpha_(c):
     # Identifier or reserved