[perl #122283] Possible regexp memory explosion in 5.20.0

From: Hugo van der Sanden
Date: July 13, 2014 14:27
Subject: [perl #122283] Possible regexp memory explosion in 5.20.0
Message ID: rt-4.0.18-449-1405261635-1578.122283-75-0@perl.org

# New Ticket Created by  Hugo van der Sanden 
# Please include the string:  [perl #122283]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=122283 >



This is a bug report for perl from hv@crypt.org,
generated with the help of perlbug 1.40 running under perl 5.20.0.


-----------------------------------------------------------------
[Please describe your issue here]

I've been experimenting with taking a SQL grammar expressed in BNF and
converting it programmatically into a Regexp::Grammars (v1.035) grammar
that can parse SQL.

The code below is (60%) cut down from an interim stage in that process;
this reaches about 10MB process size under perl-5.16.3; under perl-5.20.0
it grows to over 1GB. Cutting down the grammar rule by rule does gradually
reduce the memory use, but it remains a high multiple of the memory use
under perl-5.16.3, and I've not yet found any smoking gun; I've included
the full 200-odd lines here rather than risk eliding something important.

Damian and I are looking into it, but he suggested I perlbug it as a
heads-up of a possible problem in 5.20, likely of interest to davem
as potentially relating to regexp engine changes.

zen% ulimit -v # I've set a 1GB process-size limit
1000000
zen% /usr/bin/time /opt/perl-5.16.3/bin/perl ./t0 # top(1) shows peak 10MB VIRT
ok
8.52user 0.01system 0:08.54elapsed 99%CPU (0avgtext+0avgdata 34816maxresident)k
0inputs+0outputs (0major+2331minor)pagefaults 0swaps
zen% /usr/bin/time /opt/perl-5.20.0/bin/perl ./t0
Out of memory!
Command exited with non-zero status 1
41.59user 2.10system 0:43.83elapsed 99%CPU (0avgtext+0avgdata 3641344maxresident)k
0inputs+0outputs (0major+228082minor)pagefaults 0swaps
zen% cat t0
#!/opt/perl-5.20.0/bin/perl
use strict;
use warnings;
use Regexp::Grammars;

my $g = qr{
^ <query_specification> $

<rule: simple_Latin_letter> <simple_Latin_upper_case_letter> | <simple_Latin_lower_case_letter>
<token: simple_Latin_upper_case_letter> A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
<token: simple_Latin_lower_case_letter> a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
<token: digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<token: double_quote> \"
<token: left_paren> \(
<token: right_paren> \)
<token: asterisk> \*
<token: plus_sign> \+
<token: comma> \,
<token: minus_sign> \-
<token: period> \.
<token: solidus> \/
<token: less_than_operator> \<
<token: equals_operator> \=
<token: greater_than_operator> \>
<token: question_mark> \?
<token: underscore> _
<rule: regular_identifier> <identifier_body>
<rule: identifier_body> <identifier_start> (?:<underscore> | <identifier_part>)*
<token: identifier_start> \w
<rule: identifier_part> <identifier_start> | <digit>
<rule: unsigned_integer> [0-9]+
<rule: sign> <plus_sign> | <minus_sign>
<rule: introducer> <underscore>
<rule: character_set_specification> <standard_character_repertoire_name> | <implementation_dash_defined_character_repertoire_name> | <user_dash_defined_character_repertoire_name> | <standard_universal_character_form_dash_of_dash_use_name> | <implementation_dash_defined_universal_character_form_dash_of_dash_use_name>
<rule: standard_character_repertoire_name> <character_set_name>
<rule: character_set_name> (?:<schema_name> <period>)? <SQL_language_identifier>
<rule: schema_name> (?:<catalog_name> <period>)? <unqualified_schema_name>
<rule: catalog_name> <identifier>
<rule: identifier> (?:<introducer> <character_set_specification>)? <actual_identifier>
<rule: actual_identifier> <regular_identifier>
<token: nondoublequote_character> [^"]
<rule: doublequote_symbol> <double_quote> <double_quote>
<rule: unqualified_schema_name> <identifier>
<rule: SQL_language_identifier> <SQL_language_identifier_start> (?:<underscore> | <SQL_language_identifier_part>)*
<rule: SQL_language_identifier_start> <simple_Latin_letter>
<rule: SQL_language_identifier_part> <simple_Latin_letter> | <digit>
<rule: implementation_dash_defined_character_repertoire_name> <character_set_name>
<rule: user_dash_defined_character_repertoire_name> <character_set_name>
<rule: standard_universal_character_form_dash_of_dash_use_name> <character_set_name>
<rule: implementation_dash_defined_universal_character_form_dash_of_dash_use_name> <character_set_name>
<token: not_equals_operator> \<\>
<token: greater_than_or_equals_operator> \>\=
<token: less_than_or_equals_operator> \<\=
<token: concatenation_operator> \|\|
<rule: qualified_local_table_name> MODULE <period> <local_table_name>
<rule: local_table_name> <qualified_identifier>
<rule: qualified_identifier> <identifier>
<rule: column_name> <identifier>
<rule: time_precision> <time_fractional_seconds_precision>
<rule: time_fractional_seconds_precision> <unsigned_integer>
<rule: timestamp_precision> <time_fractional_seconds_precision>
<rule: interval_qualifier> <start_field> TO <end_field> | <single_datetime_field>
<rule: start_field> <non_dash_second_datetime_field> (?:<left_paren> <interval_leading_field_precision> <right_paren>)?
<token: non_dash_second_datetime_field> YEAR | MONTH | DAY | HOUR | MINUTE
<rule: interval_leading_field_precision> <unsigned_integer>
<rule: end_field> <non_dash_second_datetime_field> | SECOND (?:<left_paren> <interval_fractional_seconds_precision> <right_paren>)?
<rule: interval_fractional_seconds_precision> <unsigned_integer>
<rule: single_datetime_field> <non_dash_second_datetime_field> (?:<left_paren> <interval_leading_field_precision> <right_paren>)? | SECOND (?:<left_paren> <interval_leading_field_precision> (?:<comma> <left_paren> <interval_fractional_seconds_precision>)? <right_paren>)?
<rule: qualified_name> (?:<schema_name> <period>)? <qualified_identifier>
<rule: datetime_value_function> <current_date_value_function> | <current_time_value_function> | <current_timestamp_value_function>
<token: current_date_value_function> CURRENT_DATE
<rule: current_time_value_function> CURRENT_TIME (?:<left_paren> <time_precision> <right_paren>)?
<rule: current_timestamp_value_function> CURRENT_TIMESTAMP (?:<left_paren> <timestamp_precision> <right_paren>)?
<rule: table_name> <qualified_name> | <qualified_local_table_name>
<rule: column_name_list> <column_name> (?:<comma> <column_name>)*
<rule: search_condition> <in_predicate>
<rule: boolean_term> <boolean_factor> (?:AND <boolean_factor>)*
<rule: boolean_factor> (?:NOT)? <boolean_test>
<rule: boolean_test> <boolean_primary> (?:IS (?:NOT)? <truth_value>)?
<rule: boolean_primary> <predicate> | <left_paren> <search_condition> <right_paren>
<rule: predicate> <comparison_predicate> | <between_predicate> | <in_predicate> | <like_predicate> | <null_predicate> | <quantified_comparison_predicate> | <exists_predicate> | <match_predicate> | <overlaps_predicate>
<rule: comparison_predicate> <row_value_constructor> <comp_op> <row_value_constructor>
<rule: row_value_constructor> <row_value_constructor_element> | <left_paren> <row_value_constructor_list> <right_paren> | <row_subquery>
<rule: row_value_constructor_element> <value_expression> | <null_specification> | <default_specification>
<rule: value_expression> <numeric_value_expression> | <string_value_expression> | <datetime_value_expression> | <interval_value_expression>
<rule: numeric_value_expression> <term> (?:<plus_sign> <term> | <minus_sign> <term>)*
<rule: term> <factor> (?:<asterisk> <factor> | <solidus> <factor>)*
<rule: factor> <sign>? <numeric_primary>
<rule: numeric_primary> <value_expression_primary> | <numeric_value_function>
<rule: value_expression_primary> <question_mark> | <column_reference>
<rule: column_reference> (?:<qualifier> <period>)? <column_name>
<rule: qualifier> <table_name> | <correlation_name>
<rule: correlation_name> <identifier>
<token: set_quantifier> DISTINCT | ALL
<rule: subquery> <left_paren> <query_expression> <right_paren>
<rule: query_expression> <non_dash_join_query_expression> | <joined_table>
<rule: non_dash_join_query_expression> (?:<non_dash_join_query_term> | <joined_table> UNION (?:ALL)? <corresponding_spec>? <query_term> | <joined_table> EXCEPT (?:ALL)? <corresponding_spec>? <query_term>) (?:UNION (?:ALL)? <corresponding_spec>? <query_term> | EXCEPT (?:ALL)? <corresponding_spec>? <query_term>)*
<rule: non_dash_join_query_term> (?:<non_dash_join_query_primary> | <joined_table> INTERSECT (?:ALL)? <corresponding_spec>? <query_primary>) (?:INTERSECT (?:ALL)? <corresponding_spec>? <query_primary>)*
<rule: non_dash_join_query_primary> <simple_table> | <left_paren> <non_dash_join_query_expression> <right_paren>
<rule: simple_table> <query_specification> | <table_value_constructor> | <explicit_table>
<rule: query_specification> SELECT <set_quantifier>? <select_list> <table_expression>
<rule: select_list> <asterisk> | <select_sublist> (?:<comma> <select_sublist>)*
<rule: select_sublist> <derived_column> | <qualifier> <period> <asterisk>
<rule: derived_column> <value_expression> <as_clause>?
<rule: as_clause> (?:AS)? <column_name>
<rule: table_expression> <from_clause> <where_clause>?
<rule: from_clause> FROM <table_reference> (?:<comma> <table_reference>)*
<rule: table_reference> <table_name>
<rule: table_subquery> <subquery>
<rule: joined_table> <cross_join> | <qualified_join> | <left_paren> <joined_table> <right_paren>
<rule: cross_join> <table_reference> CROSS JOIN <table_reference>
<rule: qualified_join> <table_reference> (?:NATURAL)? <join_type>? JOIN <table_reference> <join_specification>?
<rule: join_type> INNER | <outer_join_type> (?:OUTER)? | UNION
<token: outer_join_type> LEFT | RIGHT | FULL
<rule: join_specification> <join_condition> | <named_columns_join>
<rule: join_condition> ON <search_condition>
<rule: named_columns_join> USING <left_paren> <join_column_list> <right_paren>
<rule: join_column_list> <column_name_list>
<rule: where_clause> WHERE <search_condition>
<rule: collate_clause> COLLATE <collation_name>
<rule: collation_name> <qualified_name>
<rule: table_value_constructor> VALUES <table_value_constructor_list>
<rule: table_value_constructor_list> <row_value_constructor> (?:<comma> <row_value_constructor>)*
<rule: explicit_table> TABLE <table_name>
<rule: query_term> <non_dash_join_query_term> | <joined_table>
<rule: corresponding_spec> CORRESPONDING (?:BY <left_paren> <corresponding_column_list> <right_paren>)?
<rule: corresponding_column_list> <column_name_list>
<rule: query_primary> <non_dash_join_query_primary> | <joined_table>
<rule: numeric_value_function> <position_expression> | <extract_expression> | <length_expression>
<rule: position_expression> POSITION <left_paren> <character_value_expression> IN <character_value_expression> <right_paren>
<rule: character_value_expression> <character_factor> (?:<concatenation_operator> <character_factor>)*
<rule: concatenation> <character_value_expression> <concatenation_operator> <character_factor>
<rule: character_factor> <character_primary> <collate_clause>?
<rule: character_primary> <value_expression_primary> | <string_value_function>
<rule: string_value_function> <character_value_function> | <bit_value_function>
<rule: character_value_function> <character_substring_function> | <fold> | <form_dash_of_dash_use_conversion> | <character_translation> | <trim_function>
<rule: character_substring_function> SUBSTRING <left_paren> <character_value_expression> FROM <start_position> (?:FOR <string_length>)? <right_paren>
<rule: start_position> <numeric_value_expression>
<rule: string_length> <numeric_value_expression>
<rule: fold> (?:UPPER | LOWER) <left_paren> <character_value_expression> <right_paren>
<rule: form_dash_of_dash_use_conversion> CONVERT <left_paren> <character_value_expression> USING <form_dash_of_dash_use_conversion_name> <right_paren>
<rule: form_dash_of_dash_use_conversion_name> <qualified_name>
<rule: character_translation> TRANSLATE <left_paren> <character_value_expression> USING <translation_name> <right_paren>
<rule: translation_name> <qualified_name>
<rule: trim_function> TRIM <left_paren> <trim_operands> <right_paren>
<rule: trim_operands> (?:<trim_specification>? <trim_character>? FROM)? <trim_source>
<token: trim_specification> LEADING | TRAILING | BOTH
<rule: trim_character> <character_value_expression>
<rule: trim_source> <character_value_expression>
<rule: bit_value_function> <bit_substring_function>
<rule: bit_substring_function> SUBSTRING <left_paren> <bit_value_expression> FROM <start_position> (?:FOR <string_length>)? <right_paren>
<rule: bit_value_expression> <bit_factor> (?:<concatenation_operator> <bit_factor>)*
<rule: bit_concatenation> <bit_value_expression> <concatenation_operator> <bit_factor>
<rule: bit_factor> <bit_primary>
<rule: bit_primary> <value_expression_primary> | <string_value_function>
<rule: extract_expression> EXTRACT <left_paren> <extract_field> FROM <extract_source> <right_paren>
<rule: extract_field> <datetime_field> | <time_zone_field>
<rule: datetime_field> <non_dash_second_datetime_field> | SECOND
<token: time_zone_field> TIMEZONE_HOUR | TIMEZONE_MINUTE
<rule: extract_source> <datetime_value_expression> | <interval_value_expression>
<rule: datetime_value_expression> (?:<datetime_term> | <interval_value_expression> <plus_sign> <datetime_term>) (?:<plus_sign> <interval_term> | <minus_sign> <interval_term>)*
<rule: interval_term> (?:<interval_factor> | <term> <asterisk> <interval_factor>) (?:<asterisk> <factor> | <solidus> <factor>)*
<rule: interval_factor> <sign>? <interval_primary>
<rule: interval_primary> <value_expression_primary> <interval_qualifier>?
<rule: interval_term_2> <interval_term>
<rule: interval_value_expression> (?:<interval_term> | <left_paren> <datetime_value_expression> <minus_sign> <datetime_term> <right_paren> <interval_qualifier>) (?:<plus_sign> <interval_term_1> | <minus_sign> <interval_term_1>)*
<rule: interval_value_expression_1> <interval_value_expression>
<rule: interval_term_1> <interval_term>
<rule: datetime_term> <datetime_factor>
<rule: datetime_factor> <datetime_primary> <time_zone>?
<rule: datetime_primary> <value_expression_primary> | <datetime_value_function>
<rule: time_zone> AT <time_zone_specifier>
<rule: time_zone_specifier> LOCAL | TIME ZONE <interval_value_expression>
<rule: length_expression> <char_length_expression> | <octet_length_expression> | <bit_length_expression>
<rule: char_length_expression> (?:CHAR_LENGTH | CHARACTER_LENGTH) <left_paren> <string_value_expression> <right_paren>
<rule: string_value_expression> <character_value_expression> | <bit_value_expression>
<rule: octet_length_expression> OCTET_LENGTH <left_paren> <string_value_expression> <right_paren>
<rule: bit_length_expression> BIT_LENGTH <left_paren> <string_value_expression> <right_paren>
<token: null_specification> NULL
<token: default_specification> DEFAULT
<rule: row_value_constructor_list> <row_value_constructor_element> (?:<comma> <row_value_constructor_element>)*
<rule: row_subquery> <subquery>
<rule: comp_op> <equals_operator> | <not_equals_operator> | <less_than_operator> | <greater_than_operator> | <less_than_or_equals_operator> | <greater_than_or_equals_operator>
<rule: between_predicate> <row_value_constructor> (?:NOT)? BETWEEN <row_value_constructor> AND <row_value_constructor>
<rule: in_predicate> <row_value_constructor> (?:NOT)? IN <in_predicate_value>
<rule: in_predicate_value> <left_paren> <in_value_list> <right_paren>
<rule: in_value_list> <question_mark> (?:<comma> <question_mark>)+
<rule: like_predicate> <match_value> (?:NOT)? LIKE <pattern> (?:ESCAPE <escape_character>)?
<rule: match_value> <character_value_expression>
<rule: pattern> <character_value_expression>
<rule: escape_character> <character_value_expression>
<token: null_predicate> IS (?:NOT)? NULL
<rule: quantified_comparison_predicate> <row_value_constructor> <comp_op> <quantifier> <table_subquery>
<rule: quantifier> <all> | <some>
<token: all> ALL
<token: some> SOME | ANY
<rule: exists_predicate> EXISTS <table_subquery>
<rule: match_predicate> <row_value_constructor> MATCH (?:UNIQUE)? (?:PARTIAL | FULL)? <table_subquery>
<rule: overlaps_predicate> <row_value_constructor_1> OVERLAPS <row_value_constructor_2>
<rule: row_value_constructor_1> <row_value_constructor>
<rule: row_value_constructor_2> <row_value_constructor>
<token: truth_value> TRUE | FALSE | UNKNOWN

}x;
print "ok\n";
zen% 
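
As a rough sketch of how the two /usr/bin/time results above compare, the
following shell fragment pulls the maxresident figure out of each summary
line and prints the ratio. The helper name max_resident is hypothetical, and
the two input strings are copied from the transcript. Since the 5.20.0 run
was killed by the ulimit, the ratio should be read as a lower bound.

```shell
#!/bin/sh
# Extract the peak resident-set figure from a GNU time summary line such as
# "... (0avgtext+0avgdata 34816maxresident)k".
# max_resident is a hypothetical helper name, not part of any tool.
max_resident() {
    printf '%s\n' "$1" | sed -n 's/.*avgdata \([0-9][0-9]*\)maxresident.*/\1/p'
}

# Summary lines copied from the transcript above.
old='8.52user 0.01system 0:08.54elapsed 99%CPU (0avgtext+0avgdata 34816maxresident)k'
new='41.59user 2.10system 0:43.83elapsed 99%CPU (0avgtext+0avgdata 3641344maxresident)k'

old_kb=$(max_resident "$old")
new_kb=$(max_resident "$new")

# Integer division; the 5.20.0 run hit the ulimit, so this is a lower bound.
echo "5.16.3: $old_kb  5.20.0: $new_kb  ratio: $((new_kb / old_kb))x"
```

The ratio is independent of the unit in which time reports the figure, which
matters here because the 34816 reported for 5.16.3 does not match the 10MB
peak seen in top(1).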

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.20.0:

Configured by hv at Thu Jun 26 22:38:47 BST 2014.

Summary of my perl5 (revision 5 version 20 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-46-generic, archname=i686-linux
    uname='linux shad 2.6.32-46-generic #108-ubuntu smp thu apr 11 15:55:01 utc 2013 i686 gnulinux '
    config_args='-des -Dprefix=/opt/perl-5.20.0 -Doptimize=-g -O6 -Dusedevel -Uversiononly'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-g -O6',
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i486-linux-gnu/4.4.3/include-fixed /usr/include/i486-linux-gnu /usr/lib /lib/../lib /usr/lib/../lib /lib /usr/lib/i486-linux-gnu /usr/lib64
    libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.11.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.11.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -g -O6 -L/usr/local/lib -fstack-protector'


---
@INC for perl 5.20.0:
    /opt/perl-5.20.0/lib/site_perl/5.20.0/i686-linux
    /opt/perl-5.20.0/lib/site_perl/5.20.0
    /opt/perl-5.20.0/lib/5.20.0/i686-linux
    /opt/perl-5.20.0/lib/5.20.0
    .

---
Environment for perl 5.20.0:
    HOME=/home/hv
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/hv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash

