This fixes a false positive warning seen with LTO:
12/bits/regex_compiler.tcc:443:32: error: '__last_char._M_char' may be used uninitialized [-Werror=maybe-uninitialized]
Given that the std::regex code is not very efficient anyway, the
overhead of initializing this byte should be minimal.
libstdc++-v3/ChangeLog:
PR middle-end/103984
* include/bits/regex_compiler.h (_BracketMatcher::_M_char): Use
default member initializer.
This moves the last two template parameters of __regex_algo_impl to be
runtime function parameters instead, so that we don't need four
different instantiations for the possible ways to call it. Most of the
function (and what it instantiates) is the same in all cases, so making
them compile-time choices doesn't really have much benefit.
Use 'if constexpr' for conditions that check template parameters, so
that when we do depend on a compile-time condition we only instantiate
what we need to.
libstdc++-v3/ChangeLog:
* include/bits/regex.h (__regex_algo_impl): Change __policy and
__match_mode template parameters to be function parameters.
(regex_match, regex_search): Pass policy and match mode as
function arguments.
* include/bits/regex.tcc (__regex_algo_impl): Change template
parameters to function parameters.
* include/bits/regex_compiler.h (_RegexTranslatorBase): Use
'if constexpr' for conditions using template parameters.
(_RegexTranslator): Likewise.
* include/bits/regex_executor.tcc (_Executor::_M_handle_accept):
Likewise.
* testsuite/util/testsuite_regex.h (regex_match_debug)
(regex_search_debug): Move template arguments to function
arguments.
std::regex currently allows invalid bracket ranges such as [\w-a] which
are only allowed by ECMAScript when in web browser compatibility mode.
It should be an error, because the start of the range is a character
class, not a single character. The current implementation of
_Compiler::_M_expression_term does not provide a way to reject this,
because we only remember a previous character, not whether we just
processed a character class (or collating symbol etc.)
This patch replaces the pair<bool, CharT> used to emulate
optional<CharT> with a custom class closer to pair<tribool,CharT>. That
allows us to track three states, so that we can tell when we've just
seen a character class.
With this additional state the code in _M_expression_term for processing
the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
case, without regressing for valid cases such as [\w-] and [----].
libstdc++-v3/ChangeLog:
PR libstdc++/102447
* include/bits/regex_compiler.h (_Compiler::_BracketState): New
class.
(_Compiler::_BrackeyMatcher): New alias template.
(_Compiler::_M_expression_term): Change pair<bool, CharT>
parameter to _BracketState. Process first character for
ECMAScript syntax as well as POSIX.
* include/bits/regex_compiler.tcc
(_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
(_Compiler::_M_expression_term): Use _BracketState to store
state between calls. Improve handling of dashes in ranges.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
Add more tests for ranges containing dashes. Check invalid
ranges with character class at the beginning.
This change is inspired by the suggestion in
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1715r0.html
The new std::__conditional_t alias template is functionally equivalent
to std::conditional_t but should be more efficient to compile, due to
only ever instantiating two specializations (std::__conditional<true>
and std::__conditional<false>) rather than a new specialization for
every use of std::conditional.
The new alias template is also available in C++11, unlike the C++14
std::conditional_t alias.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/type_traits (__conditional): New class template
for internal uses of std::conditional.
(__conditional_t): New alias template to replace conditional_t.
(__and_, __or_, __result_of_memfun, __result_of_memobj): Use
__conditional_t instead of conditional::type.
* include/bits/atomic_base.h (__atomic_impl::_Diff): Likewise.
* include/bits/hashtable.h (_Hashtable): Likewise.
* include/bits/hashtable_policy.h (_Node_iterator, _Insert_base)
(_Local_iterator): Likewise. Replace typedefs with
using-declarations.
* include/bits/move.h (move_if_noexcept): Use __conditional_t.
* include/bits/parse_numbers.h (_Select_int_base): Likewise.
* include/bits/ptr_traits.h (__make_not_void): Likewise.
* include/bits/ranges_algobase.h (__copy_or_move_backward)
(__copy_or_move): Likewise.
* include/bits/ranges_base.h (borrowed_iterator_t): Likewise.
* include/bits/ranges_util.h (borrowed_subrange_t): Likewise.
* include/bits/regex_compiler.h (_BracketMatcher): Use
__conditional_t. Replace typedefs with using-declarations.
* include/bits/shared_ptr_base.h (__shared_count): Use
__conditional_t.
* include/bits/stl_algobase.h (__copy_move, __copy_move_backward):
Likewise.
* include/bits/stl_iterator.h (__detail::__clamp_iter_cat)
(reverse_iterator::iterator_concept)
(__make_move_if_noexcept_iterator)
(iterator_traits<common_iterator<_It, _Sent>>)
(iterator_traits<counted_iterator<_It>>): Likewise.
* include/bits/stl_pair.h (_PCC, pair::operator=): Likewise.
* include/bits/stl_tree.h (_Rb_tree::insert_return_type)
(_Rb_tree::_M_clone_node): Likewise.
* include/bits/unique_ptr.h (unique_ptr(unique_ptr<U,E>&&)):
Likewise.
* include/bits/uses_allocator.h (__uses_alloc): Likewise.
(__is_uses_allocator_predicate): Likewise.
* include/debug/functions.h (__foreign_iterator_aux2): Likewise.
* include/experimental/any (any::_Manager, __any_caster):
Likewise.
* include/experimental/executor (async_completion): Likewise.
* include/experimental/functional (__boyer_moore_base_t):
Likewise.
* include/std/any (any::_Manager): Likewise.
* include/std/functional (__boyer_moore_base_t): Likewise.
* include/std/ranges (borrowed_iterator_t)
(borrowed_subrange_t, __detail::__maybe_present_t)
(__detail::__maybe_const_t, split_view): Likewise.
* include/std/tuple (__empty_not_final, tuple::operator=):
Likewise.
* include/std/variant (__detail::__variant::__get_t): Likewise.
The standard says that it is invalid for more than one grammar element
to be set in a value of type regex_constants::syntax_option_type. This
adds a check in the regex compiler andthrows an exception if an invalid
value is used.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex_compiler.h (_Compiler::_S_validate): New
function.
* include/bits/regex_compiler.tcc (_Compiler::_Compiler): Use
_S_validate to check flags.
* include/bits/regex_error.h (_S_grammar): New error code for
internal use.
* testsuite/28_regex/basic_regex/ctors/grammar.cc: New test.
Introduce a new _M_compile function which does the common work needed by
all constructors and assignment. Call that directly to avoid multiple
levels of constructor delegation or calls to basic_regex::assign
overloads.
For assignment, there is no need to construct a std::basic_string if we
already have a contiguous sequence of the correct character type, and no
need to construct a temporary basic_regex when assigning from an
existing basic_regex.
Also define the copy and move assignment operators as defaulted, which
does the right thing without constructing a temporary and swapping it.
Copying or moving the shared_ptr member cannot fail, so they can be
noexcept. The assign(const basic_regex&) and assign(basic_regex&&)
member can then be defined in terms of copy or move assignment.
The new _M_compile function takes pointer arguments, so the caller has
to convert arbitrary iterator ranges into a contiguous sequence of
characters. With that simplification, the __compile_nfa helpers are not
needed and can be removed.
This also fixes a bug where construction from a contiguous sequence with
the wrong character type would fail to compile, rather than converting
the elements to the regex character type.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex.h (__detail::__is_contiguous_iter): Move
here from <bits/regex_compiler.h>.
(basic_regex::_M_compile): New function to compile an NFA from
a regular expression string.
(basic_regex::basic_regex): Use _M_compile instead of delegating
to other constructors.
(basic_regex::operator=(const basic_regex&)): Define as
defaulted.
(basic_regex::operator=(initializer_list<C>)): Use _M_compile.
(basic_regex::assign(const basic_regex&)): Use copy assignment.
(basic_regex::assign(basic_regex&&)): Use move assignment.
(basic_regex::assign(const C*, flag_type)): Use _M_compile
instead of constructing a temporary string.
(basic_regex::assign(const C*, size_t, flag_type)): Likewise.
(basic_regex::assign(const basic_string<C,T,A>&, flag_type)):
Use _M_compile instead of constructing a temporary basic_regex.
(basic_regex::assign(InputIter, InputIter, flag_type)): Avoid
constructing a temporary string for contiguous iterators of the
right value type.
* include/bits/regex_compiler.h (__is_contiguous_iter): Move to
<bits/regex.h>.
(__enable_if_contiguous_iter, __disable_if_contiguous_iter)
(__compile_nfa): Remove.
* testsuite/28_regex/basic_regex/assign/exception_safety.cc: New
test.
* testsuite/28_regex/basic_regex/ctors/char/other.cc: New test.
There is no benefit to using _SizeT instead of size_t, and IterT tells
you less about the type than const _CharT*. This removes some unhelpful
typedefs.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex_automaton.h (_NFA_base::_SizeT): Remove.
* include/bits/regex_compiler.h (_Compiler::_IterT): Remove.
* include/bits/regex_compiler.tcc: Likewise.
* include/bits/regex_scanner.h (_Scanner::_IterT): Remove.
* include/bits/regex_scanner.tcc: Likewise.
The std::regex code uses std::map and std::vector, which means that when
_GLIBCXX_DEBUG is defined it uses the debug versions of those
containers. That no longer compiles, because I changed <regex> to
include <bits/stl_map.h> and <bits/stl_vector.h> instead of <map> and
<vector>, so the debug versions aren't defined, and std::map doesn't
compile. There is also a use of std::stack, which defaults to std::deque
which is the debug deque when _GLIBCXX_DEBUG is defined.
Using std::map, std::vector, and std::deque is probably a mistake, and
we should qualify them with _GLIBCXX_STD_C instead so that the debug
versions aren't used. We do not need the overhead of checking our own
uses of those containers, which should be correct anyway. The exception
is the vector base class of std::match_results, which exposes iterators
to users, so can benefit from debug mode checks for its iterators. For
other accesses to the vector elements, match_results already does its
own checks, so can access the _GLIBCXX_STD_C::vector base class
directly.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex.h (basic_regex::transform_primary): Use
_GLIBCXX_STD_C::vector for local variable.
* include/bits/regex.tcc (__regex_algo_impl): Use reference to
_GLIBCXX_STD_C::vector base class of match_results.
* include/bits/regex_automaton.tcc (_StateSeq:_M_clone): Use
_GLIBCXX_STD_C::map and _GLIBCXX_STD_C::deque for local
variables.
* include/bits/regex_compiler.h (_BracketMatcher): Use
_GLIBCXX_STD_C::vector for data members.
* include/bits/regex_executor.h (_Executor): Likewise.
* include/std/regex [_GLIBCXX_DEBUG]: Include <debug/vector>.
Avoid creation of unnecessary basic_string objects by using a simplified
string_view type and performing comparisons on that type instead. A
temporary basic_string object is still used when the sub_match's
iterators are not contiguous, in order to get an object that the
__string_view can reference.
* include/bits/regex.h (sub_match::operator string_type): Call str().
(sub_match::compare): Use _M_str() instead of str().
(sub_match::_M_compare): New public function.
(sub_match::__string_view): New helper type.
(sub_match::_M_str): New overloaded functions to avoid creating a
string_type object when not needed.
(operator==, operator!=, operator<, operator>, operator<=, operator>=):
Use sub_match::_M_compare instead of creating string_type objects.
Fix Doxygen comments.
* include/bits/regex_compiler.h (__has_contiguous_iter): Remove.
(__is_contiguous_normal_iter): Rename to __is_contiguous_iter and
simplify.
(__enable_if_contiguous_iter, __disable_if_contiguous_iter): Use
__enable_if_t.
* include/std/type_traits (__enable_if_t): Define for C++11.
* testsuite/28_regex/sub_match/compare.cc: New.
* testsuite/util/testsuite_iterators.h (remove_cv): Add transformation
trait.
(input_iterator_wrapper): Use remove_cv for value_type argument of
std::iterator base class.
From-SVN: r262318
PR libstdc++/81002
* include/bits/regex.h (basic_regex): Adjust call to __compile_nfa
so iterator type is deduced.
* include/bits/regex_compiler.h (__compile_nfa): Reorder template
parameters to allow iterator type to be deduced.
* testsuite/28_regex/basic_regex/ctors/basic/iter.cc: New.
From-SVN: r248989
PR libstdc++/71500
* include/bits/regex.h (basic_regex::basic_regex): Use ECMAScript
when the syntax is not specified.
* include/bits/regex_compiler.h (_RegexTranslator,
_RegexTranslatorBase): Partially support icase in ranges.
* include/bits/regex_compiler.tcc (_BracketMatcher::_M_apply):
Refactor _M_apply to make the control flow easier to follow, and
call _M_translator._M_match_range as added previously.
* testsuite/28_regex/traits/char/icase.cc: Add new tests.
* testsuite/28_regex/traits/char/user_defined.cc: Add new tests.
From-SVN: r243093
PR libstdc++/67015
* include/bits/regex_compiler.h (_Compiler<>::_M_expression_term,
_BracketMatcher<>::_M_add_collating_element): Change signature
to make checking the and of bracket expression easier.
* include/bits/regex_compiler.tcc (_Compiler<>::_M_expression_term):
Treat '-' as a valid literal if it's at the end of bracket expression.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
New testcases.
From-SVN: r226336
PR libstdc++/64584
PR libstdc++/64585
* include/bits/regex.h (basic_regex<>::basic_regex,
basic_regex<>::assign, basic_regex<>::imbue,
basic_regex<>::swap, basic_regex<>::mark_count): Drop NFA after
imbuing basic_regex; Make assign() transactional against exception.
* include/bits/regex_compiler.h (__compile_nfa<>): Add back
__compile_nfa SFINAE.
* include/std/regex: Adjust include order to avoid __compile_nfa
forward declaration.
* testsuite/28_regex/basic_regex/assign/char/string.cc: New testcase.
* testsuite/28_regex/basic_regex/imbue/string.cc: New testcase.
From-SVN: r219865
* include/bits/regex_compiler.h (__detail::_BracketMatcher): Reorder
members to avoid wasted space when not using a cache.
(__detail::_BracketMatcher::_M_ready()): Sort and deduplicate set.
* include/bits/regex_compiler.tcc
(__detail::_BracketMatcher::_M_apply(_CharT, false_type)): Use binary
search on set.
* include/bits/regex_executor.h (__detail::_Executor::_Match_mode):
New enumeration type to indicate match mode.
(__detail::_Executor::_State_info): New type holding members only
needed in BFS-mode. Replace unique_ptr<vector<bool>> with
unique_ptr<bool[]>.
(__detail::_Executor::_M_rep_once_more, __detail::_Executor::_M_dfs):
Replace template parameter with run-time function parameter.
(__detail::_Executor::_M_main): Likewise. Dispatch to ...
(__detail::_Executor::_M_main_dispatch): New overloaded functions to
implement DFS and BFS mode.
* include/bits/regex_executor.tcc (__detail::_Executor::_M_main):
Split implementation into ...
(__detail::_Executor::_M_main_dispatch): New overloaded functions.
(__detail::_Executor::_M_lookahead): Create nested executor on stack.
(__detail::_Executor::_M_rep_once_more): Pass match mode as function
argument instead of template argument.
(__detail::_Executor::_M_dfs): Likewise.
* include/bits/regex_scanner.tcc: Fix typos in comments.
* testsuite/performance/28_regex/range.cc: New.
From-SVN: r211143
2014-04-27 Tim Shen <timshen91@gmail.com>
* include/bits/regex_automaton.h (_NFA<>::_M_insert_repeat):
Add _S_opcode_repeat support to distingush a loop from
_S_opcode_alternative.
* include/bits/regex_automaton.tcc (_State_base::_M_print,
_State_base::_M_dot, _NFA<>::_M_eliminate_dummy,
_StateSeq<>::_M_clone): Likewise.
* include/bits/regex_compiler.tcc (_Compiler<>::_M_quantifier):
Likewise.
* include/bits/regex_executor.tcc (_Executor<>::_M_dfs): Likewise.
* include/bits/regex_scanner.tcc (_Scanner<>::_M_eat_escape_ecma):
Uglify local variable __i.
* include/bits/regex_compiler.h (_BracketMatcher<>::_M_make_cache):
Use size_t instead of int to compare with vector::size().
2014-04-27 Tim Shen <timshen91@gmail.com>
* include/bits/regex_executor.h: Add _M_rep_count to track how
many times this repeat node are visited.
* include/bits/regex_executor.tcc (_Executor<>::_M_rep_once_more,
_Executor<>::_M_dfs): Use _M_rep_count to prevent entering
infinite loop.
2014-04-27 Tim Shen <timshen91@gmail.com>
* include/bits/regex.tcc (__regex_algo_impl<>): Remove
_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT and use
_GLIBCXX_REGEX_USE_THOMPSON_NFA instead.
* include/bits/regex_automaton.h: Remove quantifier counting variable.
* include/bits/regex_automaton.tcc (_State_base::_M_dot):
Adjust debug NFA dump.
From-SVN: r209844
2014-01-17 Tim Shen <timshen91@gmail.com>
* include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone()): Do not
use std::map.
* include/bits/regex_automaton.h: Do not use std::set.
* include/bits/regex_compiler.h (_BracketMatcher<>::_M_add_char(),
_BracketMatcher<>::_M_add_collating_element(),
_BracketMatcher<>::_M_add_equivalence_class(),
_BracketMatcher<>::_M_make_range()): Likewise.
* include/bits/regex_compiler.tcc (_BracketMatcher<>::_M_apply()):
Likewise.
* include/bits/regex_executor.h: Do not use std::queue.
* include/bits/regex_executor.tcc (_Executor<>::_M_main(),
_Executor<>::_M_dfs()): Likewise.
* include/std/regex: Remove <map>, <set> and <queue>.
2014-01-17 Tim Shen <timshen91@gmail.com>
* include/bits/regex.h (__compile_nfa<>(), basic_regex<>::basic_regex(),
basic_regex<>::assign()): Change __compile_nfa to accept
const _CharT* only.
* include/bits/regex_compiler.h: Change _Compiler's template
argument from <_FwdIter, _TraitsT> to <_TraitsT>.
* include/bits/regex_compiler.tcc: Likewise.
2014-01-17 Tim Shen <timshen91@gmail.com>
* include/bits/regex_compiler.h: Change _ScannerT into char-type
templated.
* include/bits/regex_scanner.h (_Scanner<>::_Scanner()): Separate
_ScannerBase from _Scanner; Change _Scanner's template argument from
_FwdIter to _CharT. Avoid use of std::map and std::set by using arrays
instead.
* include/bits/regex_scanner.tcc (_Scanner<>::_Scanner(),
_Scanner<>::_M_scan_normal(), _Scanner<>::_M_eat_escape_ecma(),
_Scanner<>::_M_eat_escape_posix(), _Scanner<>::_M_eat_escape_awk()):
Likewise.
* include/std/regex: Add <cstring> for using strchr.
2014-01-17 Tim Shen <timshen91@gmail.com>
* bits/regex_automaton.tcc: Indentation fix.
* bits/regex_compiler.h (__compile_nfa<>(), _Compiler<>,
_RegexTranslator<> _AnyMatcher<>, _CharMatcher<>,
_BracketMatcher<>): Add bool option template parameters and
specializations to make matching more efficient and space saving.
* bits/regex_compiler.tcc: Likewise.
From-SVN: r206690
* include/bits/regex_compiler.h (__detail::__compile_nfa): Overload
so that std::basic_string<C> and std::vector<C> iterators dispatch to
the const C* compiler.
From-SVN: r204574
* include/bits/regex_automaton.h (__detail::_State): Split
non-dependent parts into new _State_base.
(__detail::_NFA): Likewise for _NFA_base. Use std::move() to avoid
copies when inserting _MatcherT and _StateT objects.
* include/bits/regex_automaton.tcc: Move member definitions to base
class. Qualify dependent names.
* include/bits/regex_compiler.h (__detail::_Compiler::_M_get_nfa): Make
non-const and use std::move to avoid copying.
* include/bits/regex_compiler.tcc: Likewise.
* include/bits/regex_executor.h (__detail::_Executor::_M_is_word): Use
array, so past-the-end iterator is valid.
From-SVN: r204571
2013-09-24 Tim Shen <timshen91@gmail.com>
* include/Makefile.am: Add regex.tcc.
* include/Makefile.in: Regenerate.
* include/bits/regex.h: Remove definitions to regex.tcc.
* include/bits/regex.tcc: New.
(match_results::format, regex_replace): Implement;
* include/bits/regex_compiler.h: Move _M_flags to the top of class
member list, because other members' initialization depend on it.
* include/bits/regex_compiler.tcc
(_Compiler<>::_Compiler): Adjust member initializations.
(_Compiler<>::_M_quantifier): Fix ungreedy interval quantifier.
* include/bits/regex_executor.h: Remove _RegexT from _*Executor classes.
In the future, all regex classes may refactor to *Impl style.
* include/bits/regex_executor.tcc (_Executor::_M_set_results):
Merge identical code from _*Executor classes.
* testsuite/28_regex/algorithms/regex_match/extended/
string_dispatch_01.cc (fake_match<>): Adjust the hacking-style testcase
caller for new __get_executors interface.
* testsuite/28_regex/algorithms/regex_replace/char/basic_replace.cc:
New.
* testsuite/28_regex/match_results/format.cc: New.
* testsuite/28_regex/traits/char/lookup_collatename.cc: Remove digraph
testcase.
* testsuite/28_regex/traits/wchar_t/lookup_collatename.cc: Likewise.
From-SVN: r202858