This solves a long-standing issue where we needed more data about LSL functions than just whether they're side-effect-free.
There's still some debug code, which is kept for historical purposes.
- Separate library loading code into a new module. parser.__init__() no longer loads the library; it accepts (but does not depend on) a library as a parameter.
- Add an optional library argument to parse(). It's no longer mandatory to create a new parser for switching to a different builtins or seftable file.
- Move warning() and types from lslparse to lslcommon.
- Add .copy() to uses of base_keywords, to not rely on it being a frozen set.
- Adjust the test suite.
They were returning TOUCH_INVALID_TEXCOORD for 0 <= idx <= 15 in detection events which were not touch events. That is incorrect.
Now it correctly returns:
- ZERO_VECTOR when idx < 0 or idx > 15 or the event is known not to be a detection event.
- TOUCH_INVALID_TEXCOORD when idx == 0 and the event is known to be a detection event that is not a touch event.
- Raises ELSLCantCompute otherwise.
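A sketch of that decision logic; the event classification sets and names below are assumptions for illustration, not the actual identifiers:

    class ELSLCantCompute(Exception): pass

    TOUCH_EVENTS = {'touch_start', 'touch', 'touch_end'}
    DETECTION_EVENTS = TOUCH_EVENTS | {'collision_start', 'collision',
                                       'collision_end', 'sensor'}

    def fold_llDetectedTouchUV(idx, event):
        if idx < 0 or idx > 15 or event not in DETECTION_EVENTS:
            return (0., 0., 0.)    # ZERO_VECTOR
        if idx == 0 and event not in TOUCH_EVENTS:
            # index 0 always exists in a detection event; outside touch
            # events its texture coordinates are invalid
            return (-1., -1., 0.)  # TOUCH_INVALID_TEXCOORD
        raise ELSLCantCompute      # depends on information only known at runtime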
We overlooked that the optimization that 8d33746 applied was already present, so there's no need to duplicate it.
A better place for handling '|' was under the code that already did so. No functionality change involved.
CleanNode was too greedy, because children of global declarations (particularly lists) are not marked executable. Make a special case for them and don't recurse, since what matters is whether the declaration itself is executed. Its contents can't be cleaned up.
In Python, NaN*Indet in any order returns the second operand, and NaN/Indet in any order returns the first operand. LSL consistently returns NaN in all cases.
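A minimal sketch of the LSL-compatible behaviour for multiplication (the helper name is hypothetical, not the real code):

    import math

    def lsl_mul(a, b):
        # LSL yields plain NaN whenever either operand is NaN or Indet,
        # regardless of which operand carries the NaN payload.
        if math.isnan(a) or math.isnan(b):
            return float('nan')
        return a * b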
The force type functions ff(), fi(), fs()... should normally trigger ELSLTypeMismatch when the input is not in the expected range of types, rather than ELSLInvalidType, which is reserved for the case where the type is not a valid LSL type.
It can't always be done: flag1 and flag2 must be nonzero powers of two. In that case, we can transform it to:
!~(x|~(flag1|flag2)) = !~(x|constant)
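Reading the identity as the test that both flags are set in x, a quick Python check (flag values are just examples; any nonzero powers of two work):

    flag1, flag2 = 0x10, 0x4000
    constant = ~(flag1 | flag2)              # folds into a single constant
    for x in (0, flag1, flag2, flag1 | flag2, -1, 0x12345678):
        both_set = (x & flag1) != 0 and (x & flag2) != 0
        assert (not ~(x | constant)) == both_set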
The -2147483648 case has trouble with the sign hack detector and I couldn't trigger it.
Rather than assert that the types are correct, use the force type functions on the parameters:
ff, fk, fs, q2f, v2f, and the new fi, fl.
These functions have also been modified to ensure that the input type supports an implicit typecast to the target type, performing it if so, or raising ELSLInvalidType otherwise, rather than failing an assertion. fl in particular returns the original list if it isn't changed, or a copy if it is.
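A rough sketch of what a force-type function such as ff() could look like under that description (not the actual implementation):

    import struct

    class ELSLInvalidType(Exception): pass

    def F32(f):
        # round-trip through IEEE single precision (good enough for this sketch)
        return struct.unpack('f', struct.pack('f', f))[0]

    def ff(x):
        """Force x to an LSL float, accepting the implicit integer->float cast."""
        if isinstance(x, float):
            return F32(x)
        if isinstance(x, int):         # integer casts to float implicitly in LSL
            return F32(float(x))
        raise ELSLInvalidType          # no implicit cast to float exists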
A couple bugs were found in testfuncs.py as a result, which have been fixed as well. A test has been added to ensure that the exception that caught these bugs remains in place.
The isxxxx functions are no longer necessary, so they are removed. Same goes for the painful cast handling process in foldconst, which was basically performing this task, and not necessarily well.
This approach is much more robust and should have been used since the beginning, but I didn't figure it out then.
Also simplify and fix the matching expression for #line (gcc inserts numeric flags at the end).
It still has many problems. It's O(n^2). It's calculated at every EParse, and EParse can be triggered and ignored while scanning vectors or globals. UniConvScript doesn't read #line at all, thus failing to report a meaningful input line. But at least it's a start.
ReportError() needed to account for terminal encodings that don't support the characters being printed. It was also reporting an inaccurate column number and its corresponding marker position, because the count was in bytes, not in characters, so that has been fixed.
Now EParse.__init__() calls a new function GetErrLineCol() that calculates the line and column corresponding to an error position.
The algorithm for finding the start of the line has also been changed in both ReportError() and EParse.__init__(); as a result, function fieldpos() has been removed.
The exception's lno and cno fields have been changed to be 1-based, rather than 0-based.
Thanks to @Jomik for the report. Fixes #5.
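A minimal sketch of what such a helper can look like, with 1-based results as described (behaviour inferred from the text above, not the real code):

    def GetErrLineCol(errorpos, text):
        """Return 1-based (line, column) for character offset errorpos in text."""
        lno = text.count('\n', 0, errorpos) + 1
        linestart = text.rfind('\n', 0, errorpos) + 1   # 0 if on the first line
        cno = errorpos - linestart + 1
        return lno, cno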
lslcleanup: Variables renamed, order changed, comments added.
Other changes: remove semicolon at end of sentence, use self.Cast instead of creating a CAST node on the fly.
That's exactly what 'cond' is for. Things like a loop with a constant zero vector condition should be recognized.
It was correct before just by chance, because FoldCond currently transforms a constant node into an integer node, but let's not rely on that.
When the index is good, on non-touch functions:
- llDetectedTouchFace returns -1.
- llDetectedTouchST and llDetectedTouchUV return TOUCH_INVALID_TEXCOORD.
We were returning 0 and ZERO_VECTOR respectively.
No other functional changes. This required quite some reorganization affecting many files. As a side effect, PythonType2LSL and LSLType2Python aren't duplicated anymore.
- Resolve for lists that aren't constants, but that are SEF, and reference a constant element, e.g. llList2String([llGetKey(), 5], 1) -> 5.
- Make it work with invalid conversions, if they are SEF.
- Simplify invalid conversions when the list argument is a llGetObjectDetails call and SEF.
- Simplify llList2String(llGetObjectDetails(id, [single_element]), 0) (or -1) to (string)llGetObjectDetails...; same with llList2Key and llList2Integer, converting it to (integer)((string)llGetObjectDetails...).
- Add TODO item to do it similarly for llGet[Link]PrimitiveParams.
Adds auxiliary functions to obtain a node or constant from a list in any of its forms, to obtain the list's length, and to obtain a constant from a node or constant.
Adds some utility members for LSL<->Python type conversion (one of them duplicated from lslparse.py).
Needs maintenance of the types returned by llGetObjectDetails.
if(list!=[]) is always better than if(list), which compiles to if(!(list==[])).
if(float) can't be optimized, but it is equivalent to if(float!=0.0) which can. Same for vector, rotation.
It was failing with a && b && c where a, b, c were known to be bools. It managed to simplify them to a & b & -c but that's not optimal.
Now it recurses on some operators that may return bool when used with bool operands: &, |, ^, * (in the case of &, it's bool if either operand is; for the rest, it's bool if both are). As a result, the above now simplifies to a & b & c, which is optimal.
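The recursion, sketched over a hypothetical tuple-based expression form (op, left, right) with plain ints as leaves; the real node representation differs:

    def is_bool(e):
        if isinstance(e, int):
            return e in (0, 1)
        op, a, b = e
        if op in ('==', '!=', '<', '<=', '>', '>=', '&&', '||'):
            return True                        # these always yield 0 or 1
        if op == '&':
            return is_bool(a) or is_bool(b)    # bool if either operand is
        if op in ('|', '^', '*'):
            return is_bool(a) and is_bool(b)   # bool only if both operands are
        return False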
Adds a new EParseInvalidBrkContArg exception. Previously it raised EParseInvalidBreak or EParseInvalidCont, whose text was misleading for this type of error.
We were letting Python typecast, and that causes the wrong result on Windows. Return the correct result explicitly when "nan" is found in the string.
Also, small reformatting of an else if -> elif.
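The gist of the fix, as an illustrative fragment (not the actual parsing code):

    def parse_float_prefix(s):
        s = s.strip()
        if s.lstrip('+-').lower().startswith('nan'):
            return float('nan')    # explicit, so every platform gives the same result
        # ... the normal numeric-prefix parsing continues here ...
        return 0.0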
The reusable names table was being emptied as identifiers were used. This was sub-optimal for function parameters, because new identifiers would need to be created when exhausted.
Reset the reusable names list to a saved copy every time a new parameter list is started, in a similar fashion to what we did with the sequentially generated identifiers.
Rather than using tuples for set belonging, use a frozen set. That eliminates the need to separate them by lengths.
Also, 'Pop' is not a reserved word. It's perfectly valid as a possible substitution identifier, so allow it. If used, it will go to the used words anyway and thus will be skipped by the sequential name generator.
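Illustration of the membership test (word list abbreviated):

    KEYWORDS = frozenset({'default', 'do', 'else', 'for', 'if', 'jump',
                          'return', 'state', 'while'})
    'Pop' in KEYWORDS    # False, so 'Pop' is usable as a substitution identifier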
The former ones didn't match LSL's behaviour; in particular, llRotBetween(<1,0,0>,<-1,0,0>) should return <0,0,1,0> but it didn't. Fix by using an algorithm that is closer to the one used by the sim.
We had too much precision. In LSL, llRound(0.49999997) gives 1, not 0, because the loss of precision after adding 0.5 to that makes the result 1. Fixed by converting to F32 prior to flooring.
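An illustration of the effect (F32 here means rounding to single precision; this mirrors the description, not necessarily the exact code):

    import math, struct

    def F32(f):
        return struct.unpack('f', struct.pack('f', f))[0]

    def llRound(f):
        return int(math.floor(F32(F32(f) + 0.5)))

    print(llRound(0.49999997))            # 1, as in LSL
    print(math.floor(0.49999997 + 0.5))   # 0 with full double precision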
LL's algorithm for llEuler2Rot involves calculating a rotation matrix from the Eulers, then transforming the rotation matrix to quaternion. This generates sign discrepancies with our direct quaternion calculation algorithm.
Fixed by making only the necessary calculations to find the correct sign. Unfortunately, they are still heavy (notably, six more trig functions).
Mainly, the input quaternion wasn't being normalized prior to using it for calculations, and that broke compatibility. But even then, sometimes the arc sine of a value greater than 1 could be calculated, so we clamp it.
For an all-zeros input, the result should be <1,0,0,0>, but we were producing <0,0,0,.5> because the first branch was taken. Fixed.
We forgot to take the square root when calculating the magnitude of the quaternion while normalizing. Fixed.
Use qnz instead of special casing.
LSL behaves as if it was ZERO_ROTATION instead, so we fix it by creating a function that returns ZERO_ROTATION when given <0,0,0,0>. Use it for llRot2{Fwd,Left,Up} as well.
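The gist of qnz, using a plain 4-tuple for the quaternion just for illustration:

    def qnz(q):
        # <0,0,0,0> behaves as ZERO_ROTATION, i.e. <0,0,0,1>
        return (0., 0., 0., 1.) if q == (0., 0., 0., 0.) else q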
llAbs(-2147483648) raises this:
System.OverflowException: Value is too small.
at System.Math.Abs (Int32 value) [0x00000] in <filename unknown>:0
at LindenLab.SecondLife.Library.llAbs (Int32 i) [0x00000] in <filename unknown>:0
So it's actually not computable. In LSO it returns -2147483648, though.
llGetSubString and llList2List could produce output longer than the input for some parameters. Fix that, and merge both into one unique function for uniformity. Also, get rid of the special case for empty elements, because the general case handles it properly, and it's not so common as to merit special attention.
When the input was of the form e.g. "%4%40", the second "%" was erroneously starting another quoted character. LSL doesn't behave that way: parsing resumes without starting another quoted character. Disturbingly, the expected result in the corresponding test was wrong. Fixed both the test case and the code to match actual LSL behaviour.
llBase64ToString hid another surprise: characters in the range U+0000 to U+001F are substituted by "?" except for tabs (\x09), line feeds (\x0A), shift ins (\x0F) and unit separators (\x1F), which are kept verbatim. So, mimic this behaviour.
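A sketch of that substitution (illustrative only):

    KEEP = {u'\x09', u'\x0a', u'\x0f', u'\x1f'}   # kept verbatim

    def sub_controls(s):
        return u''.join(c if c >= u'\x20' or c in KEEP else u'?' for c in s)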
When a function was successfully applied, the resulting constant node was not marked as side effect-free. This could potentially lead to poor optimization.
Testing shows that AGENT_LIST_xxx values are not used as a bitfield, but as an enum. There are only three possible valid values. Check only those three.
Lists can't contain lists at runtime, but they can at parse time, so the optimizer must behave properly when handling nested lists. And it didn't, because it neglected to preserve the previous state of self.listmode. So we fix that.
The function's SEF status was not taken into account when substituting functions with their values. This affected llModPow and llXorBase64Strings, both of which have a delay, and were erroneously substituted.
But allow them to be substituted when in calculator mode.
When 'if', 'for', and 'while' statements have a label as the sub-statement, e.g. `while (FALSE) @label;`, the label was improperly removed in some cases, causing a potential compilation failure.
Why someone would want to do that, one never knows, but just to be sure, it has been fixed.
When a global list includes a reference to a global variable of type key, the corresponding list entry type is string, not key (SCR-295, possibly caused by SVC-1710 or SVC-4485).
This implementation is fishy, because it hard-codes the type in the node regardless of the child types. But in some quick experimenting, it seemed to work. And since the main purpose is to document LSO's behaviour, rather than actually being usable, it's OK like that.
The rationale for using -1 was that it had all bits set, but that's a pretty weak argument, really. Lack of optimization of the sign could be worse, so we change it to 1, which is the value of the constant TRUE.
Also change the wording of a comment, for clarity.
The reason is they have an embedded delay. A script might rely on it, therefore substituting the call with its value is not equivalent to leaving the call. They were both already excluded from the SEF table for the same reason.
When we reduced the scope of the try block in commit a823158, we introduced a bug because the tree modification was attempted even if no value was assigned (when the exception was triggered). Returning when the function is not computable ensures that this won't happen.
Scarily, there was no check that caught this.
Apparently, under Windows, Python does a UTF-16 word-by-word comparison when comparing two strings:
>>> u'\ud700' < u'\U0001d41a'
True
>>> u'\ue000' < u'\U0001d41a'
False
Fix it by encoding as UTF-32 big endian before comparison, when that happens.
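The essence of the workaround: byte-wise comparison of the UTF-32 big-endian encodings orders strings by code point, independently of the internal representation.

    def cmp_by_codepoint(a, b):
        a32, b32 = a.encode('utf-32-be'), b.encode('utf-32-be')
        return (a32 > b32) - (a32 < b32)    # -1, 0 or 1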
- Remove it from lslextrafuncs, and move all the code to lslbasefuncs.
- Make it match SL's behaviour more accurately. Denormals always return 0 in SL.
- Use int() for truncation rather than floor/ceil.
- Add test cases.
inf and nan already did the right thing in Python, but just in case that doesn't happen on all platforms, we handle them explicitly. Also, that will make it more immune to bugs in future.
We had quite a mess with type conversion. That caused a bug where passing a key to a function that required a string, or vice versa, crashed the script.
Diminish the chaos by modifying the parameters just prior to invocation (in lslfoldconst). We also remove the now-unnecessary calls to force floats, whether alone or within vectors or quaternions.
The previous commit didn't work as expected. "from module import var" freezes the value at load time; changing it later has no effect. A reference to the module needs to be used.
Fix that and the similar problem with LSO. Also revert some "from lslcommon import *" introduced earlier.
That also revealed another bug about missing 'cond' in the import list of lslextrafuncs. This should fix all functions that return values on null key input.
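A minimal demonstration of the pitfall, with generic names rather than the project's:

    import sys, types

    mod = types.ModuleType('fakecfg')          # stand-in for the settings module
    mod.flag = False
    sys.modules['fakecfg'] = mod

    from fakecfg import flag    # "from module import var": value frozen here
    import fakecfg              # "import module": reads go through the module

    fakecfg.flag = True         # the main program changes the setting later
    print(flag)                 # False -- the frozen copy doesn't change
    print(fakecfg.flag)         # True  -- the module reference sees the change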
Instead of using an option in the command line, use a global in lslcommon, settable by the main program (only the main LSLCalc program, which differs from LSL-PyOptimizer's main, changes it).
Failure to do so caused a regression test to fail. Harmless, because that option is overridden by main, but fixed.
The bug was introduced in commit 397dc89, which required the 'optimize' option to be active for output optimizations to be applied, but forgot to update the function header to add that default option.
lslcalc is currently derived from a snapshot of PyOptimizer at some point in the past, making it difficult to maintain when bug fixes are applied to the optimizer. Solve this by incorporating expression evaluation capabilities.
llBase64ToString doesn't behave like llUnescapeURL wrt invalid UTF-8. First, the last NUL if any is removed, and the remaining NULs are converted to "?". Second, all overlong sequences are converted to a single "?", *including the 5- and 6-byte UTF-8*.
Implement this behaviour and the corresponding unit tests.
LSO strings are byte arrays, but our strings are made for Mono which uses Unicode, and invalid UTF-8 sequences can't be stored in Unicode without using a custom representation.
One possible representation is to only use the codepoints 0-255 in the Unicode string, to avoid supporting multiple types for strings. Something to study in future.
Our UTF-8 validity checker failed to recognize that characters in the surrogate range (D800-DFFF) were invalid. Fortunately, Python 2 is happy about that, therefore it doesn't crash (Python 3 fixed that range too). Unfortunately, SL isn't, therefore we fix it.
Added corresponding unit tests.
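The extra check amounts to rejecting 3-byte sequences ED A0 80 .. ED BF BF, which encode U+D800..U+DFFF (an illustrative test on the lead byte and first continuation byte):

    def starts_surrogate(b0, b1):
        return b0 == 0xED and 0xA0 <= b1 <= 0xBF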
This option normally takes effect through the base class in lsloptimize.py, which doesn't call the optimization techniques if deactivated. However, lsloutput.py is not called by it, and it applied some optimizations on its own.
Fixed when reading the optimization options, by filtering them according to whether the optimize option is active.
Literal strings were not conforming to Mono's strict "NUL is end-of-string" rules, so a file with an embedded NUL within a string literal would let the NUL be part of the string. Mostly academic, because it's probably not possible to upload a file with an embedded NUL to SL, but fixed it regardless.
As an enhancement over LSL, we trigger a Type Mismatch when there are void expressions in list constructors, because in LSL, while accepted, they trigger an ugly runtime exception.
This works fine, but expression lists, where this is checked, are not exclusive to list constructors; they are used in other places. One of these places is the initializer and the iterator of FOR loops. As a consequence, we didn't allow void functions in the FOR initializer or iterator.
Fix by adding another possible value to the parameter 'expected_types' in Parse_expression_list. False means don't allow void either (what None did before); None now means allow anything. All callers of Parse_expression_list are changed accordingly. Added corresponding regression test.
This caused "Label not defined within scope" when breakcont was active:
default{timer(){jump x;while(0)@x;}}
The problem was that breakcont opens a new scope for the case where it has to deal with a loop which is the single statement of an outer statement, e.g.
default{timer(){if(1)while(1)break;}}
would add braces to jump to the correct break point:
default{timer(){if(1){while(1)jump brk;@brk;}}}
To avoid excessive complication, a new scope was always opened for the whole statement in each of the loops, regardless of whether we needed braces later on or not. That should be transparent most of the time, but then if the statement was a label declaration, the label would be in a new scope that would be invisible outside the loop.
Fix that by checking explicitly for a label to temporarily get rid of the new scope in that case, and add a test case for it.
The pragma warnings were duplicated, one during globals scan, another during actual parsing. This should fix it.
This is somewhat potentially dangerous, as some directives (pragmas, notably) could in future affect the global scan phase by changing the language.
Commit b73805e introduced lazy lists in assignments only. Commit 890e960 generalized it, allowing any identifier to be followed by brackets, removing the need for assignment-specific treatment. However, we forgot to remove the assignment-specific parsing code, so do it now.
'(float)"-nan"' doesn't return Indet in LSL:
llOwnerSay(llList2CSV([(float)"-nan"])); // outputs "nan", not "-nan"
Therefore, for the output to yield the correct result, we have to use a different strategy to generate an indeterminate. We choose '(1e40*0)', which is shorter than the alternatives.
Also, we don't output infinities as '(float)"[-]inf"' but always as '1e40' or '(float)-1e40' (or just '-1e40' if we're in globals).
In SL, all three functions have the Intel problem of inaccuracy with large values. Since we work in single precision, it's usually barely detectable, but for large input numbers the difference between the imprecise results of SL and the more accurate results of recent glibc becomes more obvious.
This change brings back the Intel inaccuracy, even across systems (different versions of Python/C might behave differently otherwise).
Reference:
https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/
It was designed so that only ints and floats should be accepted. But for safety, and for reuse, make that "any non-float" so that long integers are also truncated to F32.
We were using a very dubious method to distinguish an indeterminate from a NaN, via struct.unpack. Turns out that math.copysign gets the job done and seems more robust.
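The simpler test: an indeterminate is a NaN with the sign bit set, which math.copysign exposes without any bit fiddling.

    import math

    def is_indet(f):
        return math.isnan(f) and math.copysign(1.0, f) < 0.0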
Besides dividing by zero, any result producing NaN including inf/inf, NaN/anything, anything/NaN causes a math error as well. We only contemplated NaN/anything and neglected the rest, so we generalize it.
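The generalized condition, sketched:

    import math

    def div_is_math_error(num, den):
        if den == 0.0:
            return True
        if math.isnan(num) or math.isnan(den):
            return True
        return math.isinf(num) and math.isinf(den)   # inf/inf is NaN too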
Well, within reasonable limits.
For break, it's explained in the code. If the block is empty of actual code (code that generates output), then the break position is eliminated.
For default, if the default label is at the top of the block, the jump and the label are both eliminated. This check doesn't extend to the case where only non-code-generating statements precede the label, so there's room for improvement (added corresponding TODO item).
This way, we're on par with FS, in case someone (ab)used the FS brokenness when the 'default' label was absent. All one has to do now is add the default label at the top, and the generated code will be the same as it was in FS when abusing that bug, with no extra burden.
Example: In FS, the code at the top acted as default and there was no jump for it:
default { timer() {
    switch(1)
    {
        0;
    case 1:
        1;
    }
}}
This generated roughly:
default { timer() {
    {
        if (1 == 1) jump CASE_1;
        0;
        @CASE_1;
        1;
    }
}}
In this system, it would trigger an error by default. To obtain the FS behaviour, the scripter can do instead:
default { timer() {
    switch(1)
    {
    default:
        0;
    case 1:
        1;
    }
}}
Thanks to this optimization, the code will be the same as in FS. Without it, an extra default label and corresponding jump would be present.
Commit 5804a9a introduced a bug where having the foldtabs option disabled (the normal setting) prevented optimizations of functions. Fix it for good (hopefully). While at it, rename the nofoldtabs option to warntabs, making it the default, and use it to disable a warning when there are tabs in a string.
Changed my mind. This looks saner. Now, if the 'default:' label is missing, an error will be thrown by default. It has to be explicitly disabled if normal C-like behaviour is desired (namely to jump to the 'break' label if no condition is met).
In the absence of a 'default' label within a switch, FS falls through to the first CASE label (or to the code before the first CASE if there's any).
This commit adds an option, active by default for FS compatibility, that mimics this broken behaviour, as well as a warning in case the 'default' label is absent, which also triggers another warning when the option is active.
For example, this script reported the type mismatch at 'default':
key k=0;
default{timer(){}}
Now it's reported at the semicolon, which is as early as it can be identified and much more meaningful.
When a function folds to a string that contains a tab, e.g. llUnescapeURL("%09"), or to a list containing a string with a tab, a warning is emitted unless the foldtabs option (which forces the optimization) is used. This option allows silencing the warning without forcing the optimization.
Resolving some constant function calls may generate tabs, e.g. llUnescapeURL("%09"). That can't be pasted into a script, and a warning is emitted. The warning text was misleading, making one think that something went wrong when all that happened is that an optimization couldn't be applied. Rewrite the text of the warning to be clearer about what the problem is.
Identifiers can be equal if they belong to different syntactic scopes. That will allow better reuse and less creation of new identifiers, most notably as function and event parameter names.
The implementation would require a stack of counters where the current value is pushed when entering a new scope, and popped when exiting, rather than using a single counter for the whole program.
- Move TODO of <0 to !=-1 to boolean, as it's counter-productive otherwise.
- All a!=b except list!=list are equivalent to !(a==b).
- Change a>b to b<a to simplify the cases to analyse.
- Proper fall-through in some spots, or return in some others where it didn't apply.
- Simplify handling of a<-2147483648 as a&0, falling through, instead of coping with it ourselves.