It's unlikely to hit this one (requires more than 10,000 identifiers to trigger) but it's a bug nevertheless.
While touching the file, try to explain the situation better in the first comment.
That was long overdue. Obviously, this is a large commit.
The new nr (node record) class has built-in dump capabilities, rather than using print_node().
SEF always exists now, and is a boolean, rather than using the existence of SEF as the flag. This was changed for sanity. However, other flags like 'X' are still possibly absent, and in some cases the absence itself has meaning (in the case of 'X', its absence means that the node has not yet been analyzed).
Similarly, an event is distinguished from a UDF by checking for the existence of the 'scope' attribute. This trick works because events are not in the symbol table therefore they have no scope. But this should probably be changed in future to something more rational and faster.
A few minor bugfixes were applied while going through the code.
- Some tabs used as Unicode were written as byte strings. Add the u'\t' prefix.
- After simplifying a%1 -> a&0, fold again the node and return. It's not clear why it didn't return, and whether it depended on subsequent passes (e.g. after DCR) for possibly optimizing out the result. Now we're sure.
- A few places lacked a SEF declaration.
- Formatting changes to split lines that spilled the margin.
- Some comment changes.
- Expanded lazy_list_set definition while adapting it to object format. The plan was to re-compress it after done, but decided to leave it in expanded form.
- Added a few TODOs & FIXMEs, resisting the temptation to fix them in the same commit:
- TODO: ~-~-~-expr -> expr + -3.
- FIXME: Now that we have CompareTrees, we can easily check if expr + -expr cancels out and remove a TODO. Low-hanging fruit.
- TODO: Check what we can do when comparing non-SEF and non-CONST values in '>' (current code relies on converting '>' to '<' for applying more optimizations, but that may miss some opportunities).
- FIXME: Could remove one comparison in nt == '&&' or nt == '||'. Low-hanging fruit.
Implements another TODO.
There was a TODO about a new counter per scope, but that makes no sense. The renamer only acts on global variables, global function and parameter names, state names, and event parameters. We're already restarting the counters at every function, which is the closest to what that TODO was about.
Gives us a few more opportunities for catching single-letter identifiers.
UsedNames was not restarted. It's unlikely that this had any detrimental effect on optimization, and it was certainly safe to not restart it. But it looks more correct like this.
The reusable names table was being emptied as identifiers were used. This was sub-optimal for function parameters, because new identifiers would need to be created when exhausted.
Reset the reusable names list to a saved copy every time a new parameter list is started, in a similar fashion to what we did with the sequentially generated identifiers.
Rather than using tuples for set belonging, use a frozen set. That eliminates the need to separate them by lengths.
Also, 'Pop' is not a reserved word. It's perfectly valid as a possible substitution identifier, so allow it. If used, it will go to the used words anyway and thus will be skipped by the sequential name generator.
Identifiers can be equal if they belong to different syntactic scopes. That will allow better reuse and less creation of new identifiers, most notably as function and event parameter names.
The implementation would require a stack of counters where the current value is pushed when entering a new scope, and popped when exiting, rather than using a single counter for the whole program.