Hi,
Here are my thoughts regarding positions and AST. The major goal was
to come up with a consistent view, even though this causes some more work
for refactoring and for the implementation of the new AST. But I think
it might be worth if we can come up with a consistent story for positions.
Some general statements
-
sourceStart and sourceEnd should always cover the whole node. This is different
to the current implementation where for some nodes declarationSourceStart
and declararionSourcEnd covers the whole node and sourceStart and sourceEnd
only covers the name (examples are: LocalVariableDeclaration, TypeDeclararion,
...)
-
sourceStart and sourceEnd should also cover all subnodes
-
whenever possible we should follow the grammar as defined in The Java Language
Specification book. So if the grammar says that a production includes the
semicolon then the AST node should include it too. For example the grammar
defines a return statement like
return (expression) ;
So the corresponding AST node should include the ;
Some statements from earlier discussions (mainly between Jim, Philippe,
and me)
-
There will be an ExpressionStatement node for expressions used as statements.
For example "if (isChecked()) {}" versus "isCheck();". We agreed that the
expression will not include the semicolon and the ExpressionStatement will.
For the isCheck() example this will look like [[isChecked()];]. This is
consistent with the grammer defined in (3). This together with the general
statement (2) leads to the conclusion that statements that have child statements
will include the semicolon if the child statement has one. For the example
for (int i= 0; i < 10; i++)
foo();
sourceEnd of the for statement will include the semicolon of the expression
statement.
Open issues
Multiple local declarations
Currently multiple local declarations appear in the AST as n separate local
declarations without any relationship to each others. This raises various
questions:
-
what are the positions of those local declarations
-
how is a visitor of that AST able to figure out that he deals with multiple
local declaration.
Since the new AST isn't a 1:1 mapping of the compiler's AST anyway (we
have the ExpressionStatement node) I opt to introduce new nodes as defined
in the grammar. Since the semicolon doesn't belong to the variable declaration,
it should be managed by the parent node that ties together multiple declarations.
Here is an example:
int x= 10, x[]= null, i;
LocalVariableDeclaration node manages:
the type (e.g. int)
the positions of the commas (if needed)
the actual variable declarators
sourceStart= start of the type
sourceEnd= ;
VariableDeclarator node manages:
the variable name and its positions
the initialization
sourceStart= start of variable name
sourceEnd= end of initialization. Doesn't include
the comma.
If we want to do some optimization we could also have a node SingleLocalVariableDeclaration
for declaration like int x; or int y= 10; The node would have the following
fields:
the type
the variable name and its positions
the initialization
sourceStart= start of type
sourceEnd= ;
Updates in for statements
Analogous to the local variable declaration, the comma to separate the
update expressions can not be part of the expression (expressions don't
contain a semicolon so they can't contain a comma either). To know the
positions of the commas the for statement should manage them in a separate
array.
The general rule is, that whenever language elements are separate
using a comma (for example an interface list in the implements statement,
arguments of a method declaration, ...) the node containing the separated
nodes should manage the positions of the comma, if they are of any interest.
In a first implementation we could leave these positions out and use the
scanner to find them if they are of interest.
Treatment of semicolon
From our experiences with refactoring it is helpful in some cases to know
where the position of the semicolon is. For example if the user extract
a for statement and he doesn't select the action's semicolon we allow the
extraction. So what can we do in these cases:
-
simple don't allow the case. To support better selection we can offer some
actions to extend the text selection to spawn valid AST nodes. We have
a running prototype for this.
-
do some parsing of the source code to find the position of the semicolon.
We could use the scanner for this.
Answers to explicit questions from Olivier
-
for (;;); : in this case the for statement should cover the semicolon.
The best way to achieve this is to have an empty statement as defined in
the grammar.
-
declaration source start of an argument: yes, Adam uses argument.type.sourceStart
as the start not declarationSourceStart.
-
test: the test we have are the refactoring test. We don't have special
test to check if the AST positions are correct.