|Published (Last):||20 December 2005|
This tutorial refers to examples that are available in the Lookahead directory under the examples directory of the release. We assume that you have already taken a look at some of the simple examples provided in the release before you read this section. The job of a parser is to read an input stream and determine whether or not the input stream conforms to the grammar. This determination in its most general form can be quite time consuming.
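Consider the following example (file Example1). The grammar has two non-terminals: Input matches “a”, then the non-terminal BC, then “c”; BC matches “b” followed by an optional “c”. In this simple example, it is quite clear that there are exactly two strings that match the grammar, namely “abc” and “abcc”. The general way to perform this match is to walk through the grammar based on the string.
Here, we use “abc” as the input string. Matching it requires going back to an earlier choice point and trying the other alternative (the individual steps are walked through further below). As this example indicates, the general problem of matching an input against a grammar may require a large amount of backtracking and making new choices, and this can consume a lot of time.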
The amount of time taken can also be a function of how the grammar is written. Note that many grammars can be written to cover the same set of inputs, that is, the same language.
For example, an equivalent grammar for the same language can be parsed faster than the previous grammar if it avoids the choice that had to be retried. The performance hit from such backtracking is unacceptable for most systems that include a parser.
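As a rough illustration of the cost of backtracking, the sketch below hand-codes two matchers in Java. It assumes the first grammar is Input → “a” BC “c” with BC → “b” [ “c” ], which is what the walk-through of “abc” suggests, and it assumes an equivalent rewritten grammar “a” “b” “c” [ “c” ]. The class and method names are invented for this example, and this is not code that JavaCC generates.

```java
class BacktrackDemo {
    // Number of times the last call to matchWithBacktracking had to undo a choice.
    static int backtracks;

    // Assumed grammar 1: Input -> "a" BC "c" ;  BC -> "b" [ "c" ]
    static boolean matchWithBacktracking(String s) {
        backtracks = 0;
        if (!s.startsWith("ab")) return false;      // "a", then "b" inside BC
        int pos = 2;
        // Choice point: first try going inside [ "c" ].
        if (pos < s.length() && s.charAt(pos) == 'c') {
            if (exactlyOneC(s, pos + 1)) return true;
            backtracks++;                            // bad choice: retrace and skip [ "c" ]
        }
        return exactlyOneC(s, pos);                  // the closing "c" of Input
    }

    // True when the input from pos onward is exactly one 'c'.
    static boolean exactlyOneC(String s, int pos) {
        return pos + 1 == s.length() && s.charAt(pos) == 'c';
    }

    // Assumed grammar 2 (same language, no retry needed): Input -> "a" "b" "c" [ "c" ]
    static boolean matchFactored(String s) {
        if (!s.startsWith("abc")) return false;
        int pos = 3;
        if (pos < s.length() && s.charAt(pos) == 'c') pos++;   // optional trailing "c"
        return pos == s.length();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abcc", "abd"}) {
            System.out.println(s + ": " + matchWithBacktracking(s)
                + " (backtracks=" + backtracks + "), factored: " + matchFactored(s));
        }
    }
}
```

Both matchers accept the same two strings, but only the first one ever has to undo a decision, and it does so on the input “abc”.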
Hence most parsers do not backtrack in this general manner (or do not backtrack at all); rather, they make decisions at choice points based on limited information and then commit to them. Parsers generated by Java Compiler Compiler make decisions at choice points based on some exploration of tokens further ahead in the input stream, and once they make such a decision, they commit to it. But these choices are made in different ways and are the subject of a different tutorial.
The default choice determination algorithm looks ahead 1 token in the input stream and uses this to help make its choice at choice points. In the above example, the grammar has been written such that the default choice determination algorithm does the right thing.
Another thing to note is that the choice determination algorithm works in a top to bottom order – if Choice 1 was selected, the other choices are not even considered.
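The top-to-bottom rule can be sketched in a few lines of Java. The production below is hypothetical: four alternatives identified only by their first token, with the first and fourth both starting with an identifier. The class name, the token names and the `choose` method are assumptions made for this illustration, not part of JavaCC.

```java
import java.util.List;

class DefaultChoice {
    // First token of each alternative, in grammar order (hypothetical production).
    static final List<String> FIRST_TOKEN = List.of("ID", "(", "new", "ID");

    // Returns the 1-based number of the first alternative that can start with
    // nextToken, or -1 if none can. Later alternatives are never examined
    // once an earlier one fits.
    static int choose(String nextToken) {
        for (int i = 0; i < FIRST_TOKEN.size(); i++) {
            if (FIRST_TOKEN.get(i).equals(nextToken)) return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(choose("ID"));
        System.out.println(choose("new"));
    }
}
```

With one token of lookahead, an identifier always selects the first alternative, so the fourth one can never be reached. That is the kind of situation a choice conflict warning points at.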
While this is not an issue in this example (except for performance), it will become important later when local ambiguities require the insertion of LOOKAHEAD hints. More on this later. You can try running the parser generated from Example3. It will complain that it encountered a “.” token. Note that when you built the parser, it would have given you a warning message.
Essentially, JavaCC is saying it has detected a situation in your grammar which may cause the default lookahead algorithm to do strange things. The generated parser will still work using the default lookahead algorithm, except that it may not do what you expect of it. In the above example, note that the choice determination algorithm does not look beyond the … Suppose there was another production in that same grammar (file Example5).
Intuitively, the right thing to do in this situation is to skip the … We have shown you examples of two kinds of choice points in the examples above – “exp1 exp … So far, we have described the default lookahead algorithm of the generated parsers.
In the majority of situations, the default algorithm works just fine. In situations where it does not work well, Java Compiler Compiler provides you with warning messages like the ones shown above. If you have a grammar that goes through Java Compiler Compiler without producing any warnings, then the grammar is an LL(1) grammar. When warnings do appear, there are two ways to respond. Option 1 is to modify your grammar so that the warning messages go away; Option 2, discussed further below, is to give the generated parser lookahead hints.
Taking Option 1 means that you attempt to make your grammar LL(1) by making some changes to it.
JavaCC™: LOOKAHEAD MiniTutorial
What we have done here is to factor the fourth choice into the first choice. This process of modifying grammars to make them LL(1) is called “left factoring”. Option 2 is to provide the generated parser with some hints to help it out in the non-LL(1) situations that the warning messages bring to your attention.
A design decision must be made to determine if Option 1 or Option 2 is the right one to take. The only advantage of choosing Option 1 is that it makes your grammar perform better.
JavaCC generated parsers can handle LL(1) constructs much faster than other constructs. However, the advantage of choosing Option 2 is that you have a simpler grammar, one that is easier to develop and maintain and that focuses on human-friendliness rather than machine-friendliness.
Sometimes Option 2 is the only choice, especially in the presence of user actions. One kind of hint is the global LOOKAHEAD option. The value of this option is an integer which is the number of tokens to look ahead when making choice decisions. Suppose you set the value of this option to 2. The choice determination algorithm then examines two tokens instead of one at every choice point. Hence, the parser will now work properly for Example3. Similarly, the problem with Example5 goes away.
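The effect of a larger lookahead can be sketched in Java as follows. The four alternatives are hypothetical and are described only by the literal tokens they begin with; a real parser would also work out the tokens that a nested non-terminal can start with, which this simplification leaves out. The names `KTokenChoice`, `PREFIXES` and `choose` are invented for the example.

```java
import java.util.List;

class KTokenChoice {
    // Leading tokens of each alternative, in grammar order (hypothetical production).
    static final List<List<String>> PREFIXES = List.of(
        List.of("ID", "("),   // choice 1
        List.of("("),         // choice 2
        List.of("new"),       // choice 3
        List.of("ID", "."));  // choice 4

    // Returns the 1-based number of the first alternative whose leading tokens
    // agree with the next k input tokens, or -1 if no alternative fits.
    static int choose(List<String> input, int k) {
        for (int i = 0; i < PREFIXES.size(); i++) {
            List<String> prefix = PREFIXES.get(i);
            int n = Math.min(k, prefix.size());       // compare at most k tokens
            if (input.size() >= n && prefix.subList(0, n).equals(input.subList(0, n))) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> input = List.of("ID", ".", "ID");
        System.out.println("k=1 picks choice " + choose(input, 1));
        System.out.println("k=2 picks choice " + choose(input, 2));
    }
}
```

With one token the input ID . ID is sent to choice 1, which then fails on the “.”; with two tokens the same input reaches choice 4.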
A LOOKAHEAD hint can also be given locally, at a single choice point. This way, the majority of the grammar can remain LL(1) and hence perform better, while at the same time one gets the flexibility of LL(k) grammars. Setting the option globally is less attractive: most grammars are predominantly LL(1), hence you will be unnecessarily degrading performance by converting the entire grammar to LL(k) to facilitate just some portions of the grammar that are not LL(1).
If your grammar and the input files being parsed are very small, then this is okay. You should also keep in mind that the warning messages JavaCC prints when it detects ambiguities at choice points (such as the two messages shown earlier) simply tell you that the specified choice points are not LL(1).
The “else S2” can be bound to either of the two if statements. The standard interpretation is that it is bound to the inner if statement (the one closest to it).
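The inner binding falls out naturally when the parser takes an “else” whenever one is the next token. The Java sketch below shows this on a toy statement grammar, assumed here to be Stm → “if” E Stm [ “else” Stm ] | S, where conditions and plain statements are single tokens. The class and its methods are hypothetical and exist only for this illustration.

```java
import java.util.List;

class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    DanglingElse(List<String> tokens) { this.tokens = tokens; }

    // Parses one statement and returns it as a parenthesized string.
    String parseStm() {
        String t = tokens.get(pos++);
        if (!t.equals("if")) return t;               // a plain statement token such as "S1"
        String cond = tokens.get(pos++);             // the condition token
        String thenPart = parseStm();
        // Choice point: with one token of lookahead, take the else if it is there.
        if (pos < tokens.size() && tokens.get(pos).equals("else")) {
            pos++;
            String elsePart = parseStm();
            return "(if " + cond + " " + thenPart + " else " + elsePart + ")";
        }
        return "(if " + cond + " " + thenPart + ")";
    }

    static String parse(List<String> tokens) {
        return new DanglingElse(tokens).parseStm();
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("if", "E1", "if", "E2", "S1", "else", "S2")));
    }
}
```

Because the innermost call reaches the “else” first and consumes it, the outer if statement never sees it.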
The default choice determination algorithm happens to do the right thing, but it still prints a warning message. To suppress the warning message, you could simply tell JavaCC that you know what you are doing by stating the lookahead explicitly at that choice point. At the syntactic level, ClassDeclaration can start with any number of “abstract”s, “final”s, and “public”s.
While a subsequent semantic check will produce error messages for multiple uses of the same modifier, this does not happen until parsing is completely over. Similarly, InterfaceDeclaration can start with any number of “abstract”s and “public”s. What if the next tokens in the input stream are a very large number of “abstract”s followed by “interface”?
One can argue that this is such a weird situation that it does not warrant any reasonable error message and that it is okay to make the wrong choice in some pathological situations. But suppose one wanted to be precise about this. One way to do this is to use a very large integer value (such as the largest possible integer) as the lookahead. If the lookahead is instead specified as a complete class declaration, the LOOKAHEAD calculation could stop as soon as the token “class” is encountered, but the specification forces the calculation to continue until the end of the class declaration has been reached, which is rather time consuming.
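A lookahead that inspects only the modifiers and the keyword after them avoids that cost. The Java sketch below is a hand-written stand-in for such a check, not JavaCC output; the class name, the `tokensExamined` counter and the token spellings are assumptions made for the example.

```java
import java.util.List;
import java.util.Set;

class ModifierLookahead {
    static final Set<String> CLASS_MODIFIERS = Set.of("abstract", "final", "public");

    // Number of tokens examined by the most recent call to startsClassDeclaration.
    static int tokensExamined;

    // Scans past any run of modifiers and reports whether "class" comes next.
    // The scan ends at that token; the class body is never inspected.
    static boolean startsClassDeclaration(List<String> tokens) {
        int i = 0;
        while (i < tokens.size() && CLASS_MODIFIERS.contains(tokens.get(i))) i++;
        tokensExamined = Math.min(i + 1, tokens.size());
        return i < tokens.size() && tokens.get(i).equals("class");
    }

    public static void main(String[] args) {
        List<String> input = List.of("abstract", "public", "class", "A", "{", "}");
        System.out.println(startsClassDeclaration(input) + " after " + tokensExamined + " tokens");
    }
}
```

However many modifiers precede the keyword, the decision is made at the first token that is not a modifier.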
By ending the lookahead specification at “class”, you make the choice determination algorithm stop as soon as it sees “class”. Actually, when such a limit is not specified, it defaults to the largest integer value. Let us suppose that there is a good reason for writing a grammar this way (maybe the way actions are embedded).
As noted earlier, this grammar recognizes two strings, “abc” and “abcc”.
The problem here is that the default LL(1) algorithm will choose the [ “c” ] every time it sees a “c” and therefore “abc” will never be matched. We need to specify that this choice must be made only when the next token is a “c” and the token following that is also a “c”, so that one “c” is left over for Input to match. The boolean expression essentially states the desired property.
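The Java sketch below works through that choice by hand, assuming the grammar Input → “a” BC “c” with BC → “b” [ “c” ]. Reasoning from that grammar, the optional “c” inside BC should be taken only when a second “c” follows it, because Input still needs one “c” afterwards; the condition in the code encodes exactly that. The class, the `peek` helper and the `semantic` flag are invented for this illustration.

```java
class SemanticLookaheadDemo {
    // Character at index i, or '\0' past the end (stands in for end of input).
    static char peek(String s, int i) {
        return i < s.length() ? s.charAt(i) : '\0';
    }

    // Matches the assumed grammar. With semantic == false the optional "c" is
    // taken whenever the next character is 'c' (one-token default behaviour);
    // with semantic == true it is taken only if another 'c' follows it.
    static boolean matches(String s, boolean semantic) {
        int pos = 0;
        if (peek(s, pos++) != 'a') return false;     // Input: "a"
        if (peek(s, pos++) != 'b') return false;     // BC: "b"
        boolean take = peek(s, pos) == 'c'
            && (!semantic || peek(s, pos + 1) == 'c');
        if (take) pos++;                             // BC: [ "c" ]
        if (peek(s, pos++) != 'c') return false;     // Input: "c"
        return pos == s.length();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abcc"}) {
            System.out.println(s + ": default=" + matches(s, false)
                + ", with condition=" + matches(s, true));
        }
    }
}
```

Under the default behaviour “abc” is rejected, since the only “c” is used up inside BC; with the extra condition both “abc” and “abcc” are accepted.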
A couple of advanced topics follow. A LOOKAHEAD specification has three entries: a lookahead limit, a syntactic lookahead (an expansion), and a semantic lookahead (a boolean expression). At least one of the three entries must be present. If more than one are present, they are separated by commas. Each entry has a default value that is used when the entry is omitted.
The general way of matching the first example grammar, using “abc” as the input string, goes through the following steps.
Step 1: There is only one choice here – the first input character must be ‘a’ – and since that is indeed the case, we are OK.
Step 2: We now proceed on to non-terminal BC. Here again, there is only one choice for the next input character – it must be ‘b’. The input matches this one too, so we are still OK.
Step 3: We now come to a “choice point” in the grammar. We can either go inside the [...] or skip it. We decide to go inside. So the next input character must be a ‘c’. We are again OK.
Step 4: Now we have completed non-terminal BC and go back to non-terminal Input. Now the grammar says the next character must be yet another ‘c’. But there are no more input characters. So we have a problem.
Step 5: When we have such a problem in the general case, we conclude that we may have made a bad choice somewhere. In this case, we made the bad choice in Step 3. So we retrace our steps back to Step 3, make another choice and try that. This process is called “backtracking”.
Step 6: We have now backtracked and made the other choice we could have made at Step 3, namely, to ignore the [...]. Back in non-terminal Input, the grammar says the next character must be a ‘c’. The next input character is a ‘c’, so we are OK now.
Step 7: We realize we have reached the end of the grammar (the end of non-terminal Input) successfully.