Cannot destroy semantic value of $undefined token

bug-bison

From:	Tim Reid
Subject:	Cannot destroy semantic value of $undefined token - unfreed memory
Date:	Sat, 19 Sep 2015 21:16:45 +0100
When a valid token causes a syntax error, Bison calls the appropriate
destructors for the current lookahead as well as for the symbols on the
stack. This allows memory to be tidied up, rather than simply lost.

However, when the lexer passes a token which is unrecognised in the
grammar, it is translated internally (by the macro YYTRANSLATE) to a
token of type $undefined. A syntax error then happens, as before, but
when the lookahead token is passed to yydestruct(), nothing happens,
because of the rule "default %destructors only for user-defined as
opposed to Bison-defined symbols". There also seems to be no way to
specify a destructor specifically for $undefined. This means that there
is no opportunity to destroy the semantic value of the token passed in.

As a simple example, consider the following example.y. The grammar
attempts to match the value "ab":

============================================================================
%{

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define YYDEBUG 1

#define YYSTYPE struct semantic_value*
struct semantic_value {
  int value;
};

YYSTYPE new_semantic_value (int);
void destroy_semantic_value (YYSTYPE);

int yylex (void);
int yyerror (const char *);
int yydebug;

%}

%destructor { destroy_semantic_value($$); } <>

%%
z : 'a' 'b' { destroy_semantic_value($1); destroy_semantic_value($2);
              $$ = new_semantic_value('z'); };
%%

YYSTYPE new_semantic_value (int c) {
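  YYSTYPE p = malloc(sizeof *p);
  if (!p) {
    /* Out of memory: nothing sensible to return to the parser. */
    perror("malloc");
    exit(EXIT_FAILURE);
  }
  p->value = c;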
  printf("New semantic value created for '%c'\n", c);

  return p;
}

void destroy_semantic_value (YYSTYPE p) {
  printf("Destructor called for '%c'\n", p->value);
  free(p);
}

int yyerror (const char *s) {
  printf("Bison error: %s\n", s);
  return 1;
}

const char *p;
extern YYSTYPE yylval;

int yylex (void) {
  int c;

  if (!*p)
    return 0;

  c = *(p++);
  yylval = new_semantic_value(c);

  return c;
}

int main (int argc, char *argv[]) {
  int errors = 0;
  int i = 1;

  if (argc > i &&
      strcmp(argv[i], "--trace") == 0) {
    yydebug = 1;
    i++;
  }

  while (i<argc) {
    int rc;

    printf("Parsing: %s\n", argv[i]);
    p = argv[i];
    rc = yyparse();

    if (rc == 0) {
      printf("OK\n\n");
    } else {
      errors++;
      printf("Syntax error\n\n");
    }
    i++;
  }

  return errors > 0 ? 1 : 0;
}
============================================================================

When parsing input "ab", everything goes as planned: the destructor is
called explicitly for 'a' and for 'b' in the rule, and Bison calls the
destructor for 'z' once the input is accepted:

============================================================================
$ bison -o example.c example.y
$ gcc -W -Wall -o example example.c -ly
$ ./example ab
Parsing: ab
New semantic value created for 'a'
New semantic value created for 'b'
Destructor called for 'a'
Destructor called for 'b'
New semantic value created for 'z'
Destructor called for 'z'
OK

============================================================================

When parsing input "aa", a syntax error occurs, and the 'a' lookahead
and the 'a' on the stack are both destroyed correctly:

============================================================================
$ ./example aa
Parsing: aa
New semantic value created for 'a'
New semantic value created for 'a'
Bison error: syntax error
Destructor called for 'a'
Destructor called for 'a'
Syntax error

============================================================================

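However, when parsing "ac", a syntax error occurs, and only the 'a'
on the stack is destroyed. The 'c' in the lookahead is not present in
the grammar, so it becomes $undefined, and no destructor is called. This
results in memory being left unfreed. (Valgrind lists the block as
"still reachable" rather than lost, presumably because the global yylval
still points to it at exit.)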

============================================================================
$ ./example ac
Parsing: ac
New semantic value created for 'a'
New semantic value created for 'c'
Bison error: syntax error
Destructor called for 'a'
Syntax error

$ valgrind ./example ac
==24504== Memcheck, a memory error detector
==24504== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==24504== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==24504== Command: ./example ac
==24504==
Parsing: ac
New semantic value created for 'a'
New semantic value created for 'c'
Bison error: syntax error
Destructor called for 'a'
Syntax error

==24504==
==24504== HEAP SUMMARY:
==24504==     in use at exit: 4 bytes in 1 blocks
==24504==   total heap usage: 2 allocs, 1 frees, 8 bytes allocated
==24504==
==24504== LEAK SUMMARY:
==24504==    definitely lost: 0 bytes in 0 blocks
==24504==    indirectly lost: 0 bytes in 0 blocks
==24504==      possibly lost: 0 bytes in 0 blocks
==24504==    still reachable: 4 bytes in 1 blocks
==24504==         suppressed: 0 bytes in 0 blocks
==24504== Rerun with --leak-check=full to see details of leaked memory
==24504==
==24504== For counts of detected and suppressed errors, rerun with: -v
==24504== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
============================================================================

This problem can be avoided by ensuring that the lexer does not pass
tokens to Bison which are not part of the grammar. This way, no tokens
are translated to $undefined, and appropriate destructors can take care
of freeing memory. The problem with this approach, however, is that the
lexer needs to know exactly which tokens are acceptable to the grammar.
As the grammar is developed, the lexer needs to be kept in lock-step with
those changes so that it is always able to generate only those tokens
which are in the grammar. While possible, this represents additional
effort, requires the lexer to be specific to the grammar, and defeats
the whole point of the $undefined token.

Possible ways to address this problem include:

  * adding an optional mechanism to allow $undefined tokens to be
    destroyed, or
  * detecting that a token will be translated as $undefined and calling
    the appropriate destructors *before* the translation happens.

In the absence of a solution, it seems as if memory leaks can be avoided
only by ensuring that the lexer does not generate tokens which will be
translated to $undefined.

Tim Reid.