Friday, October 5, 2007

Outline of the Flex Generated Scanner Routine yylex()

Recently I was implementing a parser for the Network Access Identifier (NAI) using flex and yacc (which come by default with a Red Hat Enterprise Linux installation). While working with these tools, I realized how important it is to understand the generated code in order to appreciate how the tools work. You don't need to know every detail of the generated code to use the tools effectively (though studying it is a good exercise), but knowing the big picture will help you grasp their essence.

To understand how flex works, it is enlightening to look at the file lex.yy.c that it generates. The output of the lexical analyzer generator flex is the C function yylex(). While the actual generated yylex() routine is complex, its outline is shown below.
    int yylex(void)
    {
        YY_USER_INIT;                 /* the real code runs this on the first call only */
        while (1) {
            read_input_and_match_pattern();
            int yy_act = find_correct_action();
            switch (yy_act) {
            case 1:
                YY_RULE_SETUP
                /* user-defined action */
                YY_BREAK              /* expands to "break;" by default */
            /* more user-defined actions */
            case 82:
                YY_RULE_SETUP
                /* pre-defined action */
                YY_BREAK
            /* more pre-defined actions */
            default:
                /* error: no action found */
                YY_FATAL_ERROR("fatal flex scanner internal error--no action found");
            }
        }
    }
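To make the outline concrete, here is a toy, hand-written imitation of it in plain C. It is not flex output: the in-memory input, the rule set, the hook counters, echo_out, toy_set_input() and toy_yylex() are all hypothetical, and only the macro names come from flex. One function, match_rule(), plays the role of both placeholder functions, and ECHO writes into a buffer instead of yyout so its effect can be checked.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token code for a word. */
enum { TOKEN_WORD = 258 };

/* Counters that make the effect of each hook observable. */
static int init_calls = 0;
static int action_calls = 0;

/* User-supplied hooks (in a real .l file these would go in the %{ %} section).
   YY_USER_ACTION carries its own semicolon because YY_RULE_SETUP is used without one. */
#define YY_USER_INIT   init_calls++
#define YY_USER_ACTION action_calls++;

/* Internal macros, mirroring flex's default definitions. */
#define YY_RULE_SETUP  YY_USER_ACTION   /* essentially YY_USER_ACTION in flex */
#define YY_BREAK       break;

static const char *yy_cur = "";   /* hypothetical in-memory input */
static int yy_init = 0;
static char yytext[32];
static char echo_out[64];         /* ECHO target (real flex writes to yyout) */
static size_t echo_len = 0;

void toy_set_input(const char *s) { yy_cur = s; }

/* Hypothetical matcher. Rules: 1 = [a-z]+, 2 = " ",
   3 = default rule (any other single character), 4 = end of input. */
static int match_rule(void)
{
    size_t n = 0;
    int rule;
    if (*yy_cur == '\0')
        return 4;
    if (islower((unsigned char)*yy_cur)) {
        while (islower((unsigned char)*yy_cur) && n < sizeof yytext - 1)
            yytext[n++] = *yy_cur++;
        rule = 1;
    } else {
        rule = (*yy_cur == ' ') ? 2 : 3;
        yytext[n++] = *yy_cur++;
    }
    yytext[n] = '\0';
    return rule;
}

int toy_yylex(void)
{
    if (!yy_init) {                  /* flex runs YY_USER_INIT on the first call only */
        yy_init = 1;
        YY_USER_INIT;
    }
    while (1) {
        int yy_act = match_rule();   /* plays both placeholder functions */
        switch (yy_act) {
        case 1:                      /* user rule [a-z]+, action: return TOKEN_WORD; */
            YY_RULE_SETUP
            return TOKEN_WORD;
            YY_BREAK                 /* unreachable after return, as in flex output */
        case 2:                      /* user rule " ", empty action: keep looping */
            YY_RULE_SETUP
            YY_BREAK
        case 3:                      /* default rule: ECHO the unmatched character */
            YY_RULE_SETUP
            if (echo_len + strlen(yytext) < sizeof echo_out) {
                strcpy(echo_out + echo_len, yytext);
                echo_len += strlen(yytext);
            }
            YY_BREAK
        case 4:                      /* end of input: yyterminate() returns 0 */
            return 0;
        default:                     /* flex raises a fatal "no action found" error */
            return -1;
        }
    }
}
```

With the input "ab, cd", the first call returns TOKEN_WORD for "ab". The second call runs the default rule for ",", the empty action for " ", and only returns when it reaches "cd"; the third call returns 0. YY_USER_INIT fires once, while YY_USER_ACTION fires for every rule action.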
Here is some information that helps in understanding the big picture.
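  • The macros YY_USER_INIT, YY_RULE_SETUP and YY_BREAK are the hooks the tool provides so that you can customize the generated yylex() function a little. YY_USER_INIT and YY_BREAK can be defined by the user directly; YY_RULE_SETUP is defined by flex itself and expands to YY_USER_ACTION, which is the user-definable hook executed before every rule's action. When you do define these hooks, you will typically point them at your own functions.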
  • Each flex rule has a pattern on the left-hand side and a user-defined action on the right-hand side. These user-defined actions become the bodies of the case labels in the switch statement, which tells you exactly where the C code you write in the flex input file ends up.
  • Apart from the user-defined actions, there are also predefined actions. The default rule copies any character that no user rule matches to the output (yyout, which is standard output by default) using ECHO. Similarly, there is a predefined action for end-of-file, yyterminate(), which makes yylex() return 0.
  • If you look at the yylex() outline above, you will notice that it does not return unless a user-defined action executes a return statement. When flex and yacc work together, the yacc-generated parser calls yylex() each time it needs the next token, so your actions must return a token code for every token the parser should see.
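To see why that return matters, here is a sketch of both sides of the flex/yacc contract, written by hand in C rather than generated. It assumes a hypothetical NUMBER token code (normally defined in the y.tab.h header that yacc writes when run with -d) and a hypothetical sum_expression() function standing in for the yacc-generated parser. yylval is the variable yacc uses for a token's semantic value (an int by default), and returning a single character as its own token code follows the usual yacc convention.

```c
#include <ctype.h>

/* Hypothetical token code; a yacc-generated y.tab.h would normally define it. */
#define NUMBER 258

int yylval;                           /* semantic value, like yacc's default int YYSTYPE */
static const char *input_pos = "";    /* hypothetical in-memory input */

/* Hand-written stand-in for a flex scanner with the rules
     [0-9]+   { yylval = atoi(yytext); return NUMBER; }
     [ \t]+   ;                       (no return: keep scanning)
     .        { return yytext[0]; }
   Each call returns exactly one token, or 0 at end of input. */
int yylex(void)
{
    while (1) {
        if (*input_pos == '\0')
            return 0;                              /* end of input */
        if (*input_pos == ' ' || *input_pos == '\t') {
            input_pos++;                           /* action without return: loop again */
            continue;
        }
        if (isdigit((unsigned char)*input_pos)) {
            int value = 0;
            while (isdigit((unsigned char)*input_pos))
                value = value * 10 + (*input_pos++ - '0');
            yylval = value;
            return NUMBER;                         /* hand one token back to the caller */
        }
        return (unsigned char)*input_pos++;        /* single-char token: its own code */
    }
}

/* Stand-in for the parser side: it pulls one token per yylex() call and
   adds NUMBERs separated by '+'. Returns -1 on malformed input. */
int sum_expression(const char *text)
{
    int total = 0, tok, expect_number = 1;
    input_pos = text;
    while ((tok = yylex()) != 0) {
        int ok = expect_number ? (tok == NUMBER) : (tok == '+');
        if (!ok)
            return -1;
        if (tok == NUMBER)
            total += yylval;
        expect_number = !expect_number;
    }
    return expect_number ? -1 : total;
}
```

For "1 + 2 + 30" the parser receives NUMBER, '+', NUMBER, '+', NUMBER and then 0, and never sees the blanks, because the whitespace action does not return.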
The functions read_input_and_match_pattern() and find_correct_action() used in the outline are just placeholders (pseudo-code) that stand for the relevant portions of the real yylex() routine; don't let them confuse you. The point of this post is to show where the user-defined actions sit inside the yylex() routine.
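One way to picture what find_correct_action() stands for is flex's rule-selection policy: among the rules that match at the current position, the longest match wins, and on a tie the rule listed first in the input file wins. The sketch below applies that policy by trying hypothetical per-rule matcher functions one at a time. Real flex compiles all patterns into a single table-driven DFA instead, so this shows only the selection rule, not how flex implements it; choose_rule(), rule_matcher and the two example rules are illustrative names.

```c
#include <string.h>

/* Hypothetical rule matcher: returns how many characters of s its pattern
   matches at the start of s (0 = no match). */
typedef size_t (*rule_matcher)(const char *s);

/* Rule 1: the keyword "if" */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 2: [a-z]+ (an identifier) */
static size_t match_ident(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Rules in the order they would appear in the .l file. */
static rule_matcher rules[] = { match_if, match_ident };

/* Returns a 1-based rule number (like yy_act) and stores the match length
   in *len. Returns 0 when nothing matches; in flex the default rule would
   take over in that case. */
int choose_rule(const char *s, size_t *len)
{
    int best = 0;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {          /* strictly longer only: earlier rules keep ties */
            best_len = n;
            best = (int)i + 1;
        }
    }
    *len = best_len;
    return best;
}
```

For "if x" both rules match two characters, so rule 1 (the keyword) wins the tie; for "iffy" the identifier rule matches four characters and wins on length.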
