[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Patch to gperf 2.7.2 for determining minimal key character position set
From: |
Bruce Lilly |
Subject: |
Patch to gperf 2.7.2 for determining minimal key character position set |
Date: |
Mon, 25 Feb 2002 15:18:44 -0500 |
Currently, the user must specify a set of key
positions with the -k option to specify key
character positions for the hash function.
Determining an optimal set of character positions
manually is not a pleasant task for a large key
set.
The attached patch to gperf 2.7.2 computes an
optimal set if there is no user-specified -k
option. Because the computation may take some
time, the current defaults of '1,$' are used if
the -f option is given (one could also explicitly
specify '1,$'). The optimal set determined is
recorded in a comment so that future gperf runs
on the same key set can explicitly set the
appropriate -k option.
In implementing this, the underlying implementation
of the stored key positions was changed; sorting
is no longer required, and as the key positions are
recorded as a map, access to whether a particular
position is set is now O(1) instead of linear (O(N)).
The patch includes an improved version of the case-
independence patch and also corrects a typo.
Best regards,
Bruce Lilly
*** doc/gperf.1.orig Fri Feb 15 00:33:37 2002
--- doc/gperf.1 Fri Feb 15 00:37:30 2002
***************
*** 54,62 ****
\fB\-7\fR, \fB\-\-seven\-bit\fR
Assume 7-bit characters.
.TP
\fB\-c\fR, \fB\-\-compare\-strncmp\fR
Generate comparison code using strncmp rather than
! strcmp.
.TP
\fB\-C\fR, \fB\-\-readonly\-tables\fR
Make the contents of generated lookup tables
--- 54,65 ----
\fB\-7\fR, \fB\-\-seven\-bit\fR
Assume 7-bit characters.
.TP
+ \fB\-A\fR, \fB\-\-ignore\-case\fR
+ Match keys independent of case.
+ .TP
\fB\-c\fR, \fB\-\-compare\-strncmp\fR
Generate comparison code using strncmp rather than
! strcmp (strncasecmp if -A is specified).
.TP
\fB\-C\fR, \fB\-\-readonly\-tables\fR
Make the contents of generated lookup tables
*** doc/gperf.texi.orig Fri Feb 15 00:33:59 2002
--- doc/gperf.texi Sun Feb 24 18:41:24 2002
***************
*** 535,545 ****
However, the keywords in the input file still must not contain NUL
characters.
! If option @samp{-l} is used, then the hash table performs binary
! comparison. The keywords in the input file may contain NUL characters,
! written in string syntax as @code{\000} or @code{\x00}, and the code
! generated by @code{gperf} will treat NUL like any other character.
! Also, in this case the @samp{-c} option is ignored.
@node Options, Bugs, Description, Top
@chapter Invoking @code{gperf}
--- 535,545 ----
However, the keywords in the input file still must not contain NUL
characters.
! If option @samp{-l} is used and @samp{-A} is not used, then the hash table
! performs binary comparison. The keywords in the input file may contain NUL
! characters, written in string syntax as @code{\000} or @code{\x00}, and the
! code generated by @code{gperf} will treat NUL like any other character. Also,
! in this case the @samp{-c} option is ignored.
@node Options, Bugs, Description, Top
@chapter Invoking @code{gperf}
***************
*** 671,680 ****
default in versions of @code{gperf} earlier than 2.7; now the default is
to assume 8-bit characters.
@item -c
@itemx --compare-strncmp
Generates C code that uses the @code{strncmp} function to perform
! string comparisons. The default action is to use @code{strcmp}.
@item -C
@itemx --readonly-tables
--- 671,685 ----
default in versions of @code{gperf} earlier than 2.7; now the default is
to assume 8-bit characters.
+ @item -A
+ @itemx --ignore-case
+ Matches keywords in a case-independent manner.
+
@item -c
@itemx --compare-strncmp
Generates C code that uses the @code{strncmp} function to perform
! string comparisons (@code{strncasecmp} if -A is specified). The default
! action is to use @code{strcmp} (@code{strcasecmp} if -A is specified).
@item -C
@itemx --readonly-tables
*** src/gen-perf.cc.orig Thu Feb 14 22:19:30 2002
--- src/gen-perf.cc Sun Feb 24 18:35:50 2002
***************
*** 19,24 ****
--- 19,25 ----
along with GNU GPERF; see the file COPYING. If not, write to the Free
Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111, USA. */
+ #include <ctype.h>
#include <stdio.h>
#include <stdlib.h> /* declares rand(), srand() */
#include <time.h> /* declares time() */
***************
*** 41,46 ****
--- 42,51 ----
int asso_value_max;
int non_linked_length;
+ if (option[DEFAULTCHARS] && option[FAST]) {
+ option.add_key(1);
+ option = LASTCHAR;
+ }
Key_List::read_keys ();
if (option[ORDER])
reorder ();
***************
*** 61,67 ****
srand ((long) time (0));
for (int i = 0; i < ALPHA_SIZE; i++)
! asso_values[i] = (rand () & asso_value_max - 1);
}
else
{
--- 66,75 ----
srand ((long) time (0));
for (int i = 0; i < ALPHA_SIZE; i++)
! if ((option[NOCASE] != 0) && (islower(i) != 0))
! asso_values[i] = asso_values[toupper(i)];
! else
! asso_values[i] = (rand () & asso_value_max - 1);
}
else
{
***************
*** 187,192 ****
--- 195,205 ----
asso_values[(unsigned char)c] =
(asso_values[(unsigned char)c] + (option.get_jump () ?
option.get_jump () : rand ()))
& (option.get_asso_max () - 1);
+ if (option[NOCASE])
+ if (isupper((unsigned char)c))
+ asso_values[tolower(c)] = asso_values[(unsigned char)c];
+ else if (islower((unsigned char)c))
+ asso_values[toupper(c)] = asso_values[(unsigned char)c];
/* Iteration Number array is a win, O(1) intialization time! */
reset ();
***************
*** 208,213 ****
--- 221,231 ----
/* Restore original values, no more tries. */
asso_values[(unsigned char)c] = original_char;
+ if (option[NOCASE])
+ if (isupper((unsigned char)c))
+ asso_values[tolower(c)] = original_char;
+ else if (islower((unsigned char)c))
+ asso_values[toupper(c)] = original_char;
/* If we're this far it's time to try the next character.... */
return 1;
}
*** src/key-list.cc.orig Thu Feb 14 22:10:58 2002
--- src/key-list.cc Sun Feb 24 21:42:14 2002
***************
*** 470,475 ****
--- 470,540 ----
}
if (option[ALLCHARS])
option.set_keysig_size (max_key_len);
+ if (option[DEFAULTCHARS] && !option[FAST]) /* find minimal key char set
for keys */
+ {
+ int i, m, n, npos, x;
+
+ for (x=0; x<2; x++) /* 0: w/o LASTCHAR, 1: w/ */
+ {
+ option.set_keys(npos=max_key_len); /* all chars */
+ for (m=max_key_len; m>0; m--)
+ { /* fast but not optimal elimination of key positions
that don't differentiate key words */
+ option.remove_key(m);
+ npos--;
+ rehash(); /* uses hash surrogate which considers raw
char_set and optionally key_length */
+ if (collision())
+ { /* need key position m; restore it */
+ option.add_key(m);
+ npos++;
+ }
+ }
+ if (npos && (npos != max_key_len))
+ {
+ option.clear_keys(); /* find optimum key set for npos keys
(slow) */
+ for (i=0,m=max_key_len; i<npos; i++,m--)
+ option.add_key(m); /* largest key positions (ignored for
short keys) */
+ do
+ rehash();
+ while (collision() && option.worse_keyset()); /* move some
key positions to earlier characters if required */
+ for (m=max_key_len; m>0; m--) /* further elimination of
non-differentiating key positions (very fast) */
+ {
+ if (!option.has_key(m))
+ continue;
+ option.remove_key(m);
+ npos--;
+ rehash();
+ if (collision())
+ { /* need key position m; restore it */
+ option.add_key(m);
+ npos++;
+ break; /* no further elimination possible */
+ }
+ }
+ }
+ if (!x)
+ {
+ option.save_keys();
+ n = npos;
+ option = LASTCHAR; /* try again with '$' */
+ }
+ else if (n <= npos + 1)
+ {
+ option != LASTCHAR;
+ option.restore_keys();
+ npos = n;
+ }
+ }
+ rehash(); /* fix char_set */
+ /* set occurrences to correct values */
+ memset(occurrences, 0, MAX_ALPHA_SIZE * sizeof(int));
+ for (temp = head; temp; temp=temp->next)
+ for (ptr=temp->char_set,i=1,m=temp->char_set_length; i <= m;
i++,ptr++)
+ {
+ ++occurrences[(unsigned char)(*ptr)];
+ if (option[NOCASE])
+ ++occurrences[toupper(*ptr)];
+ }
+ }
}
}
***************
*** 485,490 ****
--- 550,557 ----
T (Trace t ("Key_List::merge");)
List_Node *result;
List_Node **resultp = &result;
+ int len = !option[NOLENGTH];
+ int which;
for (;;)
{
if (!list1)
***************
*** 497,507 ****
*resultp = list1;
break;
}
! if (occurrence_sort && list1->occurrence < list2->occurrence
! || hash_sort && list1->hash_value > list2->hash_value)
{
*resultp = list2;
! resultp = &list2->next; list2 = list1; list1 = *resultp;
}
else
{
--- 564,595 ----
*resultp = list1;
break;
}
! which = 0;
! if (occurrence_sort)
! if (list1->occurrence < list2->occurrence)
! which = 1;
! else
! which = -1;
! if (!which) {
! if (hash_sort)
! if (list1->hash_value > list2->hash_value)
! which = 1;
! else
! which = -1;
! if (!which) {
! if (len)
! which = list1->key_length - list2->key_length;
! if (!which) {
! which = list1->char_set_length - list2->char_set_length;
! if (!which)
! which = strcmp(list1->char_set, list2->char_set);
! }
! }
! }
! if (which > 0)
{
*resultp = list2;
! resultp = &list2->next; list2 = *resultp;
}
else
{
***************
*** 587,593 ****
}
/* Reorders the table by first sorting the list so that frequently occuring
! keys appear first, and then the list is reorded so that keys whose values
are already determined will be placed towards the front of the list. This
helps prune the search time by handling inevitable collisions early in the
search process. See Cichelli's paper from Jan 1980 JACM for details.... */
--- 675,681 ----
}
/* Reorders the table by first sorting the list so that frequently occuring
! keys appear first, and then the list is reordered so that keys whose values
are already determined will be placed towards the front of the list. This
helps prune the search time by handling inevitable collisions early in the
search process. See Cichelli's paper from Jan 1980 JACM for details.... */
***************
*** 708,713 ****
--- 796,864 ----
return count;
}
+ /* -------------------------------------------------------------------------
*/
+
+ /* Returns nonzero if there would be duplicate hash values. */
+
+ int
+ Key_List::collision (void)
+ {
+ T (Trace t ("Key_List::collision");)
+ List_Node *temp;
+ int len = !option[NOLENGTH];
+
+ /* sort by key length, char_set_length, char_set */
+ hash_sort = occurrence_sort = 0;
+ for (temp = head = merge_sort(head); temp->next; temp=temp->next)
+ {
+ if (len && (temp->key_length != temp->next->key_length))
+ continue;
+ if (temp->char_set_length != temp->next->char_set_length)
+ continue;
+ if (!memcmp(temp->char_set, temp->next->char_set,
temp->char_set_length))
+ return 1;
+ }
+ return 0;
+ }
+
+ /* -------------------------------------------------------------------------
*/
+
+ /* Re-sorts list by key length (if !option[NOLENGTH]) char_set_length, and
char_set. */
+ /* Maintains consistent char_set, char_set_length. */
+
+ void
+ Key_List::rehash (void)
+ {
+ T (Trace t ("Key_List::rehash");)
+ List_Node *temp;
+ char *cs, *ptr;
+ const char *key;
+ int i, j;
+ char tmp;
+
+ for (temp = head; temp; temp=temp->next)
+ {
+ for (cs=ptr=temp->char_set,i=1,j=temp->key_length,key=temp->key; i <=
j; i++)
+ if (option.has_key(i))
+ if (option[NOCASE] && isupper(key[i-1]))
+ *ptr++ = tolower(key[i-1]); /* char_set is always lower-case for
case-independence */
+ else
+ *ptr++ = key[i-1];
+ if (option[LASTCHAR])
+ if (option[NOCASE] && isupper(key[j-1]))
+ *ptr++ = tolower(key[j-1]);
+ else
+ *ptr++ = key[j-1];
+ *ptr = '\0';
+ for (i=0,j=(temp->char_set_length=ptr-cs)-1; i<j; i++)
+ {
+ for (j=i+1,tmp=cs[j]; j>0 && tmp<cs[j-1]; j--)
+ cs[j] = cs[j - 1];
+ cs[j] = tmp;
+ }
+ }
+ }
+
/* -------------------- Output_Constants and subclasses --------------------
*/
/* This class outputs an enumeration defining some constants. */
***************
*** 972,977 ****
--- 1123,1174 ----
printf ("[len] == '\\0'");
}
+ /* This class outputs a comparison using strcasecmp. */
+
+ struct Output_Compare_Strcasecmp : public Output_Compare
+ {
+ virtual void output_comparison (const Output_Expr& expr1,
+ const Output_Expr& expr2) const;
+ Output_Compare_Strcasecmp () {}
+ virtual ~Output_Compare_Strcasecmp () {}
+ };
+
+ void Output_Compare_Strcasecmp::output_comparison (const Output_Expr& expr1,
+ const Output_Expr& expr2) const
+ {
+ T (Trace t ("Output_Compare_Strcasecmp::output_comparison");)
+ printf ("!strcasecmp (");
+ expr1.output_expr ();
+ printf (", ");
+ expr2.output_expr ();
+ printf (")");
+ }
+
+ /* This class outputs a comparison using strncasecmp.
+ Note that the length of expr1 will be available through the local variable
+ `len'. */
+
+ struct Output_Compare_Strncasecmp : public Output_Compare
+ {
+ virtual void output_comparison (const Output_Expr& expr1,
+ const Output_Expr& expr2) const;
+ Output_Compare_Strncasecmp () {}
+ virtual ~Output_Compare_Strncasecmp () {}
+ };
+
+ void Output_Compare_Strncasecmp::output_comparison (const Output_Expr& expr1,
+ const Output_Expr& expr2)
const
+ {
+ T (Trace t ("Output_Compare_Strncasecmp::output_comparison");)
+ printf ("!strncasecmp (");
+ expr1.output_expr ();
+ printf (", ");
+ expr2.output_expr ();
+ printf (", len) && ");
+ expr2.output_expr ();
+ printf ("[len] == '\\0'");
+ }
+
/* This class outputs a comparison using memcmp.
Note that the length of expr1 (available through the local variable `len')
must be verified to be equal to the length of expr2 prior to this
***************
*** 1009,1016 ****
Key_List::output_hash_function (void)
{
T (Trace t ("Key_List::output_hash_function");)
! const int max_column = 10;
int field_width;
/* Calculate maximum number of digits required for MAX_HASH_VALUE. */
field_width = 2;
--- 1206,1215 ----
Key_List::output_hash_function (void)
{
T (Trace t ("Key_List::output_hash_function");)
! const int max_column = 8;
int field_width;
+ int key_pos;
+ int contd = 0;
/* Calculate maximum number of digits required for MAX_HASH_VALUE. */
field_width = 2;
***************
*** 1053,1171 ****
/* Output the function's body. */
printf ("{\n");
! /* First the asso_values array. */
! printf (" static %s%s asso_values[] =\n"
! " {",
! const_readonly_array,
! smallest_integral_type (max_hash_value + 1));
!
! for (int count = 0; count < ALPHA_SIZE; count++)
{
! if (count > 0)
! printf (",");
! if (!(count % max_column))
! printf ("\n ");
! printf ("%*d", field_width,
! occurrences[count] ? asso_values[count] : max_hash_value + 1);
! }
!
! printf ("\n"
! " };\n");
!
! /* Optimize special case of ``-k 1,$'' */
! if (option[DEFAULTCHARS])
! printf (" return %sasso_values[%sstr[len - 1]] +
asso_values[%sstr[0]];\n",
! option[NOLENGTH] ? "" : "len + ",
! char_to_index, char_to_index);
! else
! {
! int key_pos;
!
! option.reset ();
! /* Get first (also highest) key position. */
! key_pos = option.get ();
!
! if (!option[ALLCHARS] && (key_pos == WORD_END || key_pos <=
min_key_len))
{
! /* We can perform additional optimizations here:
! Write it out as a single expression. Note that the values
! are added as `int's even though the asso_values array may
! contain `unsigned char's or `unsigned short's. */
!
! printf (" return %s",
! option[NOLENGTH] ? "" : "len + ");
!
! for (; key_pos != WORD_END; )
! {
! printf ("asso_values[%sstr[%d]]", char_to_index, key_pos - 1);
! if ((key_pos = option.get ()) != EOS)
! printf (" + ");
! else
! break;
! }
!
! if (key_pos == WORD_END)
! printf ("asso_values[%sstr[len - 1]]", char_to_index);
!
! printf (";\n");
}
- else
- {
- /* We've got to use the correct, but brute force, technique. */
- printf (" register int hval = %s;\n\n"
- " switch (%s)\n"
- " {\n"
- " default:\n",
- option[NOLENGTH] ? "0" : "len",
- option[NOLENGTH] ? "len" : "hval");
! /* User wants *all* characters considered in hash. */
! if (option[ALLCHARS])
! {
! for (int i = max_key_len; i > 0; i--)
! printf (" case %d:\n"
! " hval += asso_values[%sstr[%d]];\n",
! i, char_to_index, i - 1);
!
! printf (" break;\n"
! " }\n"
! " return hval;\n");
! }
! else /* do the hard part... */
! {
! while (key_pos != WORD_END && key_pos > max_key_len)
! if ((key_pos = option.get ()) == EOS)
! break;
! if (key_pos != EOS && key_pos != WORD_END)
! {
! int i = key_pos;
! do
! {
! for ( ; i >= key_pos; i--)
! printf (" case %d:\n", i);
! printf (" hval += asso_values[%sstr[%d]];\n",
! char_to_index, key_pos - 1);
! key_pos = option.get ();
! }
! while (key_pos != EOS && key_pos != WORD_END);
! for ( ; i >= min_key_len; i--)
! printf (" case %d:\n", i);
! }
! printf (" break;\n"
! " }\n"
! " return hval");
! if (key_pos == WORD_END)
! printf (" + asso_values[%sstr[len - 1]]", char_to_index);
! printf (";\n");
! }
! }
}
printf ("}\n\n");
}
--- 1252,1344 ----
/* Output the function's body. */
printf ("{\n");
! if (option.get_max_keysig_size())
{
! /* First the asso_values array. */
! printf (" static %s%s asso_values[] =\n"
! " {",
! const_readonly_array,
! smallest_integral_type (max_hash_value + 1));
! for (int count = 0; count < ALPHA_SIZE; count++)
{
! if (count > 0)
! printf (",");
! if (!(count % max_column)) {
! if (count && isalnum(count-1) && isalnum(count-max_column))
! printf ("\t/* %c - %c */", count-max_column, count-1);
! printf ("\n ");
! }
! printf ("%*d", field_width,
! occurrences[count] ? asso_values[count] : max_hash_value +
1);
}
! printf ("\n"
! " };\n");
! }
! if (!option[ALLCHARS] && (option.max_key() <= min_key_len))
! {
! /* We can perform additional optimizations here:
! Write it out as a single expression. Note that the values
! are added as `int's even though the asso_values array may
! contain `unsigned char's or `unsigned short's. */
! if (!option[NOLENGTH])
! contd = 1;
! printf (" return %s", contd ? "len" : "");
! for (key_pos=1; key_pos <= min_key_len; key_pos++)
! if (!option.has_key(key_pos))
! continue;
! else
! printf ("%sasso_values[%sstr[%d]]",
! contd++?" + ":"", char_to_index, key_pos - 1);
! if (option[LASTCHAR])
! printf ("%sasso_values[%sstr[len - 1]]", contd?" + ":"",
char_to_index);
!
! printf (";\n");
}
+ else
+ {
+ /* We've got to use the correct, but brute force, technique. */
+ printf (" register int hval = %s;\n\n"
+ " switch (%s)\n"
+ " {\n"
+ " default:\n",
+ option[NOLENGTH] ? "0" : "len",
+ option[NOLENGTH] ? "len" : "hval");
+
+ /* User wants *all* characters considered in hash. */
+ if (option[ALLCHARS])
+ {
+ for (int i = max_key_len; i > 0; i--)
+ printf (" case %d:\n"
+ " hval += asso_values[%sstr[%d]];\n",
+ i, char_to_index, i - 1);
+
+ printf (" break;\n"
+ " }\n"
+ " return hval;\n");
+ }
+ else /* do the hard part... */
+ {
+ for (key_pos=max_key_len; key_pos > 0; key_pos--) {
+ printf (" case %d:\n", key_pos);
+ if (option.has_key(key_pos))
+ printf (" hval += asso_values[%sstr[%d]];\n",
+ char_to_index, key_pos - 1);
+ }
+ printf (" break;\n"
+ " }\n"
+ " return hval");
+ if (option[LASTCHAR])
+ printf (" + asso_values[%sstr[len - 1]]", char_to_index);
+ printf (";\n");
+ }
+ }
printf ("}\n\n");
}
***************
*** 2010,2021 ****
if (!option[GLOBAL])
output_lookup_tables ();
! if (option[LENTABLE])
! output_lookup_function_body (Output_Compare_Memcmp ());
! else
! {
! if (option[COMP])
! output_lookup_function_body (Output_Compare_Strncmp ());
else
output_lookup_function_body (Output_Compare_Strcmp ());
}
--- 2183,2202 ----
if (!option[GLOBAL])
output_lookup_tables ();
! if (option[LENTABLE]) {
! if (option[NOCASE])
! output_lookup_function_body (Output_Compare_Strcasecmp ());
! else
! output_lookup_function_body (Output_Compare_Memcmp ());
! } else
! {
! if (option[COMP]) {
! if (option[NOCASE])
! output_lookup_function_body (Output_Compare_Strncasecmp ());
! else
! output_lookup_function_body (Output_Compare_Strncmp ());
! } else if (option[NOCASE])
! output_lookup_function_body (Output_Compare_Strcasecmp ());
else
output_lookup_function_body (Output_Compare_Strcmp ());
}
*** src/key-list.h.orig Sat Feb 23 14:07:28 2002
--- src/key-list.h Sun Feb 24 17:57:14 2002
***************
*** 77,82 ****
--- 77,84 ----
const char *get_special_input (char delimiter);
List_Node *merge (List_Node *list1, List_Node *list2);
List_Node *merge_sort (List_Node *head);
+ int collision (void);
+ void rehash (void);
protected:
List_Node *head; /* Points to the head of
the linked list. */
*** src/list-node.cc.orig Thu Feb 14 23:09:40 2002
--- src/list-node.cc Mon Feb 25 13:59:08 2002
***************
*** 20,25 ****
--- 20,26 ----
#include "list-node.h"
+ #include <ctype.h>
#include <stdio.h>
#include <stdlib.h> /* declares exit() */
#include "options.h"
***************
*** 61,95 ****
link (0), next (0), key (k), key_length (len), rest (r), index (0)
{
T (Trace t ("List_Node::List_Node");)
! char *key_set = new char[(option[ALLCHARS] ? len :
option.get_max_keysig_size ())];
char *ptr = key_set;
int i;
! if (option[ALLCHARS]) /* Use all the character positions in the
KEY. */
! for (i = len; i > 0; k++, ptr++, i--)
! ++occurrences[(unsigned char)(*ptr = *k)];
! else /* Only use those character positions
specified by the user. */
{
! /* Iterate through the list of key_positions, initializing occurrences
table
! and char_set (via char * pointer ptr). */
! for (option.reset (); (i = option.get ()) != EOS; )
! {
! if (i == WORD_END) /* Special notation for last KEY
position, i.e. '$'. */
! *ptr = key[len - 1];
! else if (i <= len) /* Within range of KEY length, so we'll keep
it. */
! *ptr = key[i - 1];
! else /* Out of range of KEY length, so we'll just
skip it. */
! continue;
! ++occurrences[(unsigned char)(*ptr++)];
}
/* Didn't get any hits and user doesn't want to consider the
keylength, so there are essentially no usable hash positions! */
if (ptr == char_set && option[NOLENGTH])
{
fprintf (stderr, "Can't hash keyword %.*s with chosen key
positions.\n",
! key_length, key);
exit (1);
}
}
--- 62,115 ----
link (0), next (0), key (k), key_length (len), rest (r), index (0)
{
T (Trace t ("List_Node::List_Node");)
! char *key_set = new char[(option[ALLCHARS]? len + 1 : option[DEFAULTCHARS]
? option[FAST] ? 3 : len + 2 : option.get_max_keysig_size () +
(option[LASTCHAR]?2:1))];
char *ptr = key_set;
int i;
! if (option[ALLCHARS] || (option[DEFAULTCHARS] && !option[FAST])) {
/* Use all the character positions in the KEY. */
! for (i=1; i<=len; k++, ptr++, i++)
! {
! if (option[NOCASE] && isupper(*k))
! *ptr = tolower(*k); /* char_set is always lower-case for
case-independence */
! else
! *ptr = *k;
! ++occurrences[(unsigned char)(*ptr)];
! if (option[NOCASE])
! ++occurrences[toupper(*ptr)];
! }
! } else /* Only use those character positions
specified by the user. */
{
! /* Iterate through the list of key_positions, initializing char_set
(via char * pointer ptr). */
! for (i=1; i <= len; i++)
! if (option.has_key(i)) {
! if (option[NOCASE] && isupper(k[i-1]))
! *ptr = tolower(k[i-1]); /* char_set is always lower-case for
case-independence */
! else
! *ptr = k[i-1];
! ++occurrences[(unsigned char)(*ptr)];
! if (option[NOCASE])
! ++occurrences[toupper(*ptr)];
! ptr++;
}
+ if (option[LASTCHAR]) { /* Special notation for last KEY
position, i.e. '$'. */
+ if (option[NOCASE] && isupper(k[len-1]))
+ *ptr = tolower(k[len-1]);
+ else
+ *ptr = k[len - 1];
+ ++occurrences[(unsigned char)(*ptr)];
+ if (option[NOCASE])
+ ++occurrences[toupper(*ptr)];
+ ptr++;
+ }
+ *ptr = '\0';
/* Didn't get any hits and user doesn't want to consider the
keylength, so there are essentially no usable hash positions! */
if (ptr == char_set && option[NOLENGTH])
{
fprintf (stderr, "Can't hash keyword %.*s with chosen key
positions.\n",
! key_length, k);
exit (1);
}
}
*** src/list-node.h.orig Sat Feb 23 20:13:28 2002
--- src/list-node.h Sun Feb 24 21:44:48 2002
***************
*** 33,39 ****
const char *key; /* Each keyword string stored here. */
int key_length; /* Length of the key. */
const char *rest; /* Additional information for building hash
function. */
! const char *char_set; /* Set of characters to hash, specified by
user. */
int char_set_length; /* Length of char_set. */
int hash_value; /* Hash value for the key. */
int occurrence; /* A metric for frequency of key set
occurrences. */
--- 33,39 ----
const char *key; /* Each keyword string stored here. */
int key_length; /* Length of the key. */
const char *rest; /* Additional information for building hash
function. */
! char *char_set; /* Set of characters to hash, specified by
user. */
int char_set_length; /* Length of char_set. */
int hash_value; /* Hash value for the key. */
int occurrence; /* A metric for frequency of key set
occurrences. */
*** src/options.cc.orig Thu Feb 14 21:02:42 2002
--- src/options.cc Sun Feb 24 18:37:40 2002
***************
*** 24,29 ****
--- 24,30 ----
#include "getopt.h"
#include "options.h"
#include "iterator.h"
+ #include "key-list.h"
#include "trace.h"
#include "vectors.h"
#include "version.h"
***************
*** 61,66 ****
--- 62,68 ----
int Options::option_word;
int Options::total_switches;
int Options::total_keysig_size;
+ int Options::saved_keysig_size;
int Options::size;
int Options::key_pos;
int Options::jump;
***************
*** 76,81 ****
--- 78,84 ----
const char *Options::wordlist_name;
const char *Options::delimiters;
char Options::key_positions[MAX_KEY_POS];
+ char Options::saved_keys[MAX_KEY_POS];
/* Prints program usage to given stream. */
***************
*** 83,89 ****
Options::short_usage (FILE * strm)
{
T (Trace t ("Options::short_usage");)
! fprintf (strm, "Usage: %s
[-cCdDef[num]F<initializers>GhH<hashname>i<init>Ijk<keys>K<keyname>lL<language>nN<function
name>ors<size>S<switches>tTvW<wordlistname>Z<class name>7] [input-file]\n"
"Try `%s --help' for more information.\n",
program_name, program_name);
}
--- 86,92 ----
Options::short_usage (FILE * strm)
{
T (Trace t ("Options::short_usage");)
! fprintf (strm, "Usage: %s
[-aAcCdDef[num]F<initializers>GhH<hashname>i<init>Ijk<keys>K<keyname>lL<language>nN<function
name>ors<size>S<switches>tTvW<wordlistname>Z<class name>7] [input-file]\n"
"Try `%s --help' for more information.\n",
program_name, program_name);
}
***************
*** 133,139 ****
" `Perfect_Hash'.\n"
" -7, --seven-bit Assume 7-bit characters.\n"
" -c, --compare-strncmp Generate comparison code using strncmp
rather than\n"
! " strcmp.\n"
" -C, --readonly-tables Make the contents of generated lookup
tables\n"
" constant, i.e., readonly.\n"
" -E, --enum Define constant values using an enum
local to the\n"
--- 136,143 ----
" `Perfect_Hash'.\n"
" -7, --seven-bit Assume 7-bit characters.\n"
" -c, --compare-strncmp Generate comparison code using strncmp
rather than\n"
! " strcmp, or strncasecmp rather than
strcasecmp if -A\n"
! " is specified.\n"
" -C, --readonly-tables Make the contents of generated lookup
tables\n"
" constant, i.e., readonly.\n"
" -E, --enum Define constant values using an enum
local to the\n"
***************
*** 160,165 ****
--- 164,173 ----
" Prevents the transfer of the type
declaration to the\n"
" output file. Use this option if the type
is already\n"
" defined elsewhere.\n"
+ " -A, --ignore-case Match keywords in a case-independent
manner. String\n"
+ " comparisons are made with strcasecmp
(strncasecmp if\n"
+ " -c is specified) and the associate
values are symmetric\n"
+ " with respect to lower-case and
upper-case ASCII indices.\n"
"\n"
"Algorithm employed by gperf:\n"
" -k, --key-positions=KEYS\n"
***************
*** 275,280 ****
--- 283,302 ----
}
printf (" */");
+ if (option_word & DEFAULTCHARS)
+ {
+ int i, m;
+
+ /* output comment detailing key set */
+ putchar ('\n');
+ printf ("/* %s key set: ", (option_word&FAST)?"Default":"Minimal");
+ for (i=0,m=0; m<MAX_KEY_POS; m++)
+ if (key_positions[m])
+ printf ("%s%d", i++?",":"", m+1);
+ if (option[LASTCHAR])
+ printf ("%s$", i++?",":"");
+ fputs (" */", stdout);
+ }
}
/* Sorts the key positions *IN REVERSE ORDER!!*
***************
*** 282,305 ****
Uses a simple Insertion Sort since the set is probably ordered.
Returns 1 if there are no duplicates, 0 otherwise. */
! inline int
! Options::key_sort (char *base, int len)
{
! T (Trace t ("Options::key_sort");)
! int i, j;
! for (i = 0, j = len - 1; i < j; i++)
! {
! int curr, tmp;
! for (curr = i + 1,tmp = base[curr]; curr > 0 && tmp >= base[curr - 1];
curr--)
! if ((base[curr] = base[curr - 1]) == tmp) /* oh no, a duplicate!!! */
! return 0;
! base[curr] = tmp;
! }
! return 1;
}
/* Sets the default Options. */
--- 304,431 ----
Uses a simple Insertion Sort since the set is probably ordered.
Returns 1 if there are no duplicates, 0 otherwise. */
! void
! Options::save_keys (void)
{
! T (Trace t ("Options::save_keys");)
! memcpy(saved_keys, key_positions, sizeof(saved_keys));
! saved_keysig_size = total_keysig_size;
! if (option_word & LASTCHAR)
! saved_keysig_size--;
! }
! void
! Options::restore_keys (void)
! {
! T (Trace t ("Options::restore_keys");)
! memcpy(key_positions, saved_keys, sizeof(saved_keys));
! total_keysig_size = saved_keysig_size;
! if (option_word & LASTCHAR)
! total_keysig_size++;
! }
! void
! Options::clear_keys (void)
! {
! T (Trace t ("Options::clear_keys");)
! memset(key_positions, 0, sizeof(key_positions));
! total_keysig_size = 0;
! if (option_word & LASTCHAR)
! total_keysig_size++;
! }
! void
! Options::set_keys (int len)
! {
! T (Trace t ("Options::set_keys");)
! memset(key_positions, 1, total_keysig_size = len);
! if (option_word & LASTCHAR)
! total_keysig_size++;
! }
!
! void
! Options::add_key (int key)
! {
! T (Trace t ("Options::add_key");)
! if (has_key(key))
! return;
! key_positions[key-1] = 1;
! total_keysig_size++;
! }
!
! int
! Options::has_key (int key)
! {
! T (Trace t ("Options::has_key");)
! return key_positions[key-1];
! }
!
! void
! Options::remove_key (int key)
! {
! T (Trace t ("Options::remove_key");)
! if (!has_key(key))
! return;
! key_positions[key-1] = 0;
! total_keysig_size--;
! }
!
! int
! Options::max_key (void)
! {
! T (Trace t ("Options::max_key");)
! int i;
! for (i=MAX_KEY_POS-1; i; i--)
! if (key_positions[i])
! break;
! return i+1;
! }
!
! int
! Options::worse_keyset(void)
! {
! T (Trace t ("Options::worse_keyset");)
! int m, min0, min1, x;
!
! for (min1=0; !key_positions[min1] && (min1<MAX_KEY_POS); min1++)
! ;
! if (min1 >= MAX_KEY_POS)
! return 0;
! if (!key_positions[0])
! {
! for (m=min1; m>0; m--)
! if (!key_positions[m-1])
! {
! key_positions[m-1] = 1, key_positions[min1] = 0;
! return 1;
! }
! return 0;
! }
! else
! {
! for (min0=0; key_positions[min0] && (min0<MAX_KEY_POS); min0++)
! ;
! if (min0 >= MAX_KEY_POS)
! return 0;
! if (min1 < min0)
! for (min1=min0+1; min1<MAX_KEY_POS; min1++)
! if (key_positions[min1])
! break;
! if (min1 >= MAX_KEY_POS)
! return 0;
! for (m=min1-1; (m>min0) && key_positions[m]; m--)
! ;
! key_positions[m] = 1, key_positions[min1] = 0;
! for (--m; m>0; m--)
! if (!key_positions[m])
! {
! x = key_positions[0], key_positions[m] = x, key_positions[0] = 0;
! break;
! }
! return 1;
! }
! return 0;
}
/* Sets the default Options. */
***************
*** 307,316 ****
Options::Options (void)
{
T (Trace t ("Options::Options");)
! key_positions[0] = WORD_START;
! key_positions[1] = WORD_END;
! key_positions[2] = EOS;
! total_keysig_size = 2;
delimiters = DEFAULT_DELIMITERS;
jump = DEFAULT_JUMP_VALUE;
option_word = DEFAULTCHARS | C;
--- 433,439 ----
Options::Options (void)
{
T (Trace t ("Options::Options");)
! memset(key_positions, total_keysig_size = 0, sizeof(key_positions));
delimiters = DEFAULT_DELIMITERS;
jump = DEFAULT_JUMP_VALUE;
option_word = DEFAULTCHARS | C;
***************
*** 331,337 ****
T (Trace t ("Options::~Options");)
if (option_word & DEBUG)
{
! char *ptr;
fprintf (stderr, "\ndumping Options:"
"\nDEBUG is.......: %s"
--- 454,460 ----
T (Trace t ("Options::~Options");)
if (option_word & DEBUG)
{
! int i;
fprintf (stderr, "\ndumping Options:"
"\nDEBUG is.......: %s"
***************
*** 355,360 ****
--- 478,485 ----
"\nENUM is........: %s"
"\nINCLUDE is.....: %s"
"\nSEVENBIT is....: %s"
+ "\nNOCASE is......: %s"
+ "\nLASTCHAR is....: %s"
"\niterations = %d"
"\nlookup function name = %s"
"\nhash function name = %s"
***************
*** 387,392 ****
--- 512,519 ----
option_word & ENUM ? "enabled" : "disabled",
option_word & INCLUDE ? "enabled" : "disabled",
option_word & SEVENBIT ? "enabled" : "disabled",
+ option_word & NOCASE ? "enabled" : "disabled",
+ option_word & LASTCHAR ? "enabled" : "disabled",
iterations,
function_name, hash_name, wordlist_name, key_name,
initializer_suffix, jump, size - 1, initial_asso_value,
***************
*** 397,407 ****
fprintf (stderr, "maximum keysig size = %d\nkey positions are: \n",
total_keysig_size);
! for (ptr = key_positions; *ptr != EOS; ptr++)
! if (*ptr == WORD_END)
! fprintf (stderr, "$\n");
! else
! fprintf (stderr, "%d\n", *ptr);
fprintf (stderr, "finished dumping Options\n");
}
--- 524,534 ----
fprintf (stderr, "maximum keysig size = %d\nkey positions are: \n",
total_keysig_size);
! for (i = 0; i < MAX_KEY_POS; i++)
! if (key_positions[i])
! fprintf (stderr, "%d\n", i+1);
! if (option_word & LASTCHAR)
! fprintf (stderr, "$\n");
fprintf (stderr, "finished dumping Options\n");
}
***************
*** 421,426 ****
--- 548,554 ----
{ "lookup-fn-name", required_argument, 0, 'N' },
{ "class-name", required_argument, 0, 'Z' },
{ "seven-bit", no_argument, 0, '7' },
+ { "ignore-case", no_argument, 0, 'A' },
{ "compare-strncmp", no_argument, 0, 'c' },
{ "readonly-tables", no_argument, 0, 'C' },
{ "enum", no_argument, 0, 'E' },
***************
*** 457,463 ****
while ((option_char =
getopt_long (argument_count, argument_vector,
! "adcCDe:Ef:F:gGhH:i:Ij:k:K:lL:nN:oprs:S:tTvW:Z:7",
long_options, (int *)0))
!= -1)
{
--- 585,591 ----
while ((option_char =
getopt_long (argument_count, argument_vector,
! "aAdcCDe:Ef:F:gGhH:i:Ij:k:K:lL:nN:oprs:S:tTvW:Z:7",
long_options, (int *)0))
!= -1)
{
***************
*** 465,470 ****
--- 593,603 ----
{
case 'a': /* Generated code uses the ANSI prototype
format. */
break; /* This is now the default. */
+ case 'A': /* Generate strcasecmp rather than strcmp,
make associate values case-symmetric. */
+ {
+ option_word |= NOCASE;
+ break;
+ }
case 'c': /* Generate strncmp rather than strcmp. */
{
option_word |= COMP;
***************
*** 564,572 ****
option_word = (option_word & ~DEFAULTCHARS) | ALLCHARS;
else
{
- char *key_pos;
! for (key_pos = key_positions; (value = expand ()) != EOS;
key_pos++)
if (value == BAD_VALUE)
{
fprintf (stderr, "Illegal key value or range, use
1,2,3-%d,'$' or '*'.\n",
--- 697,704 ----
option_word = (option_word & ~DEFAULTCHARS) | ALLCHARS;
else
{
! while ((value = expand ()) != EOS)
if (value == BAD_VALUE)
{
fprintf (stderr, "Illegal key value or range, use
1,2,3-%d,'$' or '*'.\n",
***************
*** 575,600 ****
exit (1);
}
else
! *key_pos = value;;
!
! *key_pos = EOS;
! if (! (total_keysig_size = (key_pos - key_positions)))
{
fprintf (stderr, "No keys selected.\n");
short_usage (stderr);
exit (1);
}
- else if (! key_sort (key_positions, total_keysig_size))
- {
- fprintf (stderr, "Duplicate keys selected\n");
- short_usage (stderr);
- exit (1);
- }
! if (total_keysig_size != 2
! || (key_positions[0] != 1 || key_positions[1] !=
WORD_END))
! option_word &= ~DEFAULTCHARS;
}
break;
}
--- 707,731 ----
exit (1);
}
else
! if (value == WORD_END)
! option_word |= LASTCHAR;
! else
! key_positions[value-1] = 1;
!
! for (value=0; value < MAX_KEY_POS; value++)
! if (key_positions[value])
! total_keysig_size++;
! if (option_word & LASTCHAR)
! total_keysig_size++;
! if (! total_keysig_size)
{
fprintf (stderr, "No keys selected.\n");
short_usage (stderr);
exit (1);
}
! option_word &= ~DEFAULTCHARS;
}
break;
}
*** src/options.h.orig Thu Feb 14 21:09:03 2002
--- src/options.h Sun Feb 24 13:57:14 2002
***************
*** 43,49 ****
ALLCHARS = 04, /* Use all characters in hash function. */
TYPE = 010, /* Handle user-defined type structured
keyword input. */
RANDOM = 020, /* Randomly initialize the associated values
table. */
! DEFAULTCHARS = 040, /* Make default char positions be 1,$ (end of
keyword). */
SWITCH = 0100, /* Generate switch output to save space. */
NOLENGTH = 0200, /* Don't include keyword length in hash
computations. */
LENTABLE = 0400, /* Generate a length table for string
comparison. */
--- 43,49 ----
ALLCHARS = 04, /* Use all characters in hash function. */
TYPE = 010, /* Handle user-defined type structured
keyword input. */
RANDOM = 020, /* Randomly initialize the associated values
table. */
! DEFAULTCHARS = 040, /* Make default char positions be optimal
set. */
SWITCH = 0100, /* Generate switch output to save space. */
NOLENGTH = 0200, /* Don't include keyword length in hash
computations. */
LENTABLE = 0400, /* Generate a length table for string
comparison. */
***************
*** 59,65 ****
CPLUSPLUS = 01000000, /* Generate C++ code: prototypes, const,
class, inline, enum. */
ENUM = 02000000, /* Use enum for constants. */
INCLUDE = 04000000, /* Generate #include statements. */
! SEVENBIT = 010000000 /* Assume 7-bit, not 8-bit, characters. */
};
/* Define some useful constants (these don't really belong here, but I'm
--- 59,67 ----
CPLUSPLUS = 01000000, /* Generate C++ code: prototypes, const,
class, inline, enum. */
ENUM = 02000000, /* Use enum for constants. */
INCLUDE = 04000000, /* Generate #include statements. */
! SEVENBIT = 010000000, /* Assume 7-bit, not 8-bit, characters. */
! NOCASE = 020000000, /* Case-independent comparisons */
! LASTCHAR = 040000000, /* Use last character of key in hash function
*/
};
/* Define some useful constants (these don't really belong here, but I'm
***************
*** 69,75 ****
enum
{
MAX_KEY_POS = 128 - 1, /* Max size of each word's key set. */
- WORD_START = 1, /* Signals the start of a word. */
WORD_END = 0, /* Signals the end of a word. */
EOS = MAX_KEY_POS /* Signals end of the key list. */
};
--- 71,76 ----
***************
*** 88,95 ****
static void print_options (void);
static void set_asso_max (int r);
static int get_asso_max (void);
- static void reset (void);
- static int get (void);
static int get_iterations (void);
static int get_max_keysig_size (void);
static void set_keysig_size (int);
--- 89,94 ----
***************
*** 103,113 ****
--- 102,123 ----
static const char *get_hash_name (void);
static const char *get_wordlist_name (void);
static const char *get_delimiter (void);
+ static void save_keys (void);
+ static void restore_keys (void);
+ static void clear_keys (void);
+ static void set_keys (int);
+ static void add_key (int);
+ static int has_key (int);
+ static void remove_key (int);
+ static void change_key (int, int);
+ static int max_key (void);
+ static int worse_keyset (void);
private:
static int option_word; /* Holds the
user-specified Options. */
static int total_switches; /* Number of switch
statements to generate. */
static int total_keysig_size; /* Total number of
distinct key_positions. */
+ static int saved_keysig_size; /* Number of
distinct key_positions in saved_keys. */
static int size; /* Range of the
hash table. */
static int key_pos; /* Tracks current
key position for Iterator. */
static int jump; /* Jump length when
trying alternative values. */
***************
*** 122,129 ****
static const char *hash_name; /* Name used for
generated hash function. */
static const char *wordlist_name; /* Name used for
hash table array. */
static const char *delimiters; /* Separates
keywords from other attributes. */
! static char key_positions[MAX_KEY_POS]; /* Contains
user-specified key choices. */
! static int key_sort (char *base, int len); /* Sorts key
positions in REVERSE order. */
static void short_usage (FILE * strm); /* Prints proper
program usage. */
static void long_usage (FILE * strm); /* Prints proper
program usage. */
};
--- 132,139 ----
static const char *hash_name; /* Name used for
generated hash function. */
static const char *wordlist_name; /* Name used for
hash table array. */
static const char *delimiters; /* Separates
keywords from other attributes. */
! static char key_positions[MAX_KEY_POS]; /* Contains
user-specified or determined minimal key set. */
! static char saved_keys[MAX_KEY_POS]; /* Temporary copy
of key set. */
static void short_usage (FILE * strm); /* Prints proper
program usage. */
static void long_usage (FILE * strm); /* Prints proper
program usage. */
};
*** src/options.icc.orig Sat Feb 23 16:20:34 2002
--- src/options.icc Sun Feb 24 16:49:52 2002
***************
*** 36,41 ****
--- 36,43 ----
{
T (Trace t ("Options::operator=");)
option_word |= opt;
+ if (opt & LASTCHAR)
+ total_keysig_size++;
}
/* Disables option OPT. */
***************
*** 44,65 ****
{
T (Trace t ("Options::operator!=");)
option_word &= ~opt;
! }
!
! /* Initializes the key Iterator. */
! INLINE void
! Options::reset (void)
! {
! T (Trace t ("Options::reset");)
! key_pos = 0;
! }
!
! /* Returns current key_position and advance index. */
! INLINE int
! Options::get (void)
! {
! T (Trace t ("Options::get");)
! return key_positions[key_pos++];
}
/* Sets the size of the table size. */
--- 46,53 ----
{
T (Trace t ("Options::operator!=");)
option_word &= ~opt;
! if (opt & LASTCHAR)
! total_keysig_size--;
}
/* Sets the size of the table size. */
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Patch to gperf 2.7.2 for determining minimal key character position set,
Bruce Lilly <=