Aaron (sodabrew) wrote,
Aaron
sodabrew

libSieve hacking

Today I spent some time hacking on libSieve for the first time in way too long. I was telling a friend about the inefficiency of putting all of your keywords into a lexer because it has to go through tons of state shifts to get to the terminals. A better way to do it is to match a generic identifier with a regular expression, then look up the matches in a hash table of keywords. Well, enough was enough, time for me to do that! The old CMU sieve-lex.l, with 56 keywords each as its own lexer state, yielded this flex -v output:

/bin/sh ../ylwrap sieve-lex.l lex.yy.c sieve-lex.c — flex -v -s -olex.yy.c
flex version 2.5.33 usage statistics:
scanner options: -svB8 -Cem -olex.yy.c -Plibsieve_sieve
584/2000 NFA states
340/1000 DFA states (737 words)
72 rules
Compressed tables always back-up
Beginning-of-line patterns used
3/40 start conditions
179 epsilon states, 98 double epsilon states
8/100 character classes needed 22/500 words of storage, 0 reused
1235 state/nextstate pairs created
414/821 unique/duplicate transitions
347/1000 base-def entries created
448/2000 (peak 399) nxt-chk entries created
28/2500 (peak 266) template nxt-chk entries created
0 empty table entries
8 protos created
7 templates created, 23 uses
38/256 equivalence classes created
4/256 meta-equivalence classes created
0 (15 saved) hash collisions, 81 DFAs equal
0 sets of reallocations needed
1884 total table entries needed

After moving the keywords out of sieve-lex.l and into sieve-keywords.gperf, replacing them all with a generic state,

ident :?[a-zA-Z_][a-zA-Z_0-9]*
<INITIAL>{ident} { int k = libsieve_keyword(yytext, yyleng); … error handling … }

The flex -v output reads like this:

flex version 2.5.33 usage statistics:
scanner options: -svB8 -Cem -olex.yy.c -Plibsieve_sieve
146/2000 NFA states
72/1000 DFA states (304 words)
19 rules
Compressed tables always back-up
Beginning-of-line patterns used
3/40 start conditions
79 epsilon states, 50 double epsilon states
11/100 character classes needed 165/500 words of storage, 1 reused
830 state/nextstate pairs created
144/686 unique/duplicate transitions
83/1000 base-def entries created
204/2000 (peak 392) nxt-chk entries created
88/2500 (peak 308) template nxt-chk entries created
0 empty table entries
13 protos created
11 templates created, 35 uses
28/256 equivalence classes created
8/256 meta-equivalence classes created
3 (3 saved) hash collisions, 79 DFAs equal
0 sets of reallocations needed
858 total table entries needed

Ok, that’s better. But what about that one ident with the :? up front? What if it were this:

ident [a-zA-Z_][a-zA-Z_0-9]*
<INITIAL>{ident} { int k = libsieve_keyword(yytext, yyleng); … error handling … }
<INITIAL>:{ident} { int k = libsieve_keyword(yytext, yyleng); … error handling … }

And flex -v says:

flex version 2.5.33 usage statistics:
scanner options: -svB8 -Cem -olex.yy.c -Plibsieve_sieve
150/2000 NFA states
73/1000 DFA states (306 words)
20 rules
Compressed tables always back-up
Beginning-of-line patterns used
3/40 start conditions
81 epsilon states, 52 double epsilon states
11/100 character classes needed 165/500 words of storage, 3 reused
844 state/nextstate pairs created
145/699 unique/duplicate transitions
85/1000 base-def entries created
207/2000 (peak 420) nxt-chk entries created
96/2500 (peak 336) template nxt-chk entries created
0 empty table entries
14 protos created
12 templates created, 36 uses
28/256 equivalence classes created
8/256 meta-equivalence classes created
0 (4 saved) hash collisions, 79 DFAs equal
0 sets of reallocations needed
868 total table entries needed

Originally published at HydricAcid.. You can comment here or there.

Tags: general
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 0 comments