Currently move_max follows the tuning feature first, but ideally it
should sync with prefer-vector-width when that option is explicitly set,
so that vector moves and vector operations use the same vector size.
gcc/ChangeLog:
PR target/112824
* config/i386/i386-options.cc (ix86_option_override_internal):
Sync ix86_move_max/ix86_store_max with prefer_vector_width when
it is explicitly set.
gcc/testsuite/ChangeLog:
PR target/112824
* gcc.target/i386/pieces-memset-45.c: Remove
-mprefer-vector-width=256.
* g++.target/i386/pr112824-1.C: New test.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu): Do not
set Grand Ridge depending on RAO-INT.
* config/i386/i386.h: Remove PTA_RAOINT from PTA_GRANDRIDGE.
* doc/invoke.texi: Adjust documentation.
Notice that the current generic vector cost model makes PR112387 fail to
vectorize. Adapt it to match the ARM SVE generic vector cost model, which
fixes it.
Committed as it is an obvious fix.
PR target/112387
gcc/ChangeLog:
* config/riscv/riscv.cc: Adapt generic cost model same ARM SVE.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr112387-1.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr112387-2.c: New test.
Following Richard's suggestion, we should not model address cost in the loop
vectorizer for select_vl or decrement IV, since other vectorization styles
don't do that.
This makes the cost model comparison apples to apples.
This patch changes COST from 2 to 1, which turns out to give better codegen
in various cases for RVV.
Ok for trunk ?
PR target/111153
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Remove address cost for select_vl/decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr111153-1.c: New test.
This adds the C++23 std::print functions, which use std::format to write
to a FILE stream or std::ostream (defaulting to stdout).
The new extern symbols are in the libstdc++exp.a archive, so we aren't
committing to stable symbols in the DSO yet. There's a UTF-8 validating
and transcoding function added by this change. That can certainly be
optimized, but it's internal to libstdc++exp.a so can be tweaked later
at leisure.
Currently the external symbols work for all targets, but are only
actually used for Windows, where it's necessary to transcode to UTF-16
to write to the console. The standard seems to encourage us to also
diagnose invalid UTF-8 for non-Windows targets when writing to a
terminal (and only when writing to a terminal), but I'm reliably
informed that that wasn't the intent of the wording. Checking for
invalid UTF-8 sequences only needs to happen for Windows, which is good
as checking for a terminal requires a call to isatty, and on Linux that
uses an ioctl syscall, which would make std::print ten times slower!
Testing the std::print behaviour is difficult if it depends on whether
the output stream is connected to a Windows console or not, as we can't
(as far as I know) do that non-interactively in DejaGNU. One of the new
tests uses the internal __write_to_terminal function directly. That
allows us to verify its UTF-8 error handling on POSIX targets, even
though that's not actually used by std::print. For Windows, that
__write_to_terminal function transcodes to UTF-16 but then uses
WriteConsoleW which fails unless it really is writing to the console.
That means the 27_io/print/2.cc test FAILs on Windows. The UTF-16
transcoding has been manually tested using mingw-w64 and Wine, and
appears to work.
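The UTF-8 validation mentioned above can be sketched as follows. This is a generic well-formedness check written for illustration, not the code in libstdc++exp.a; the name is_valid_utf8 is hypothetical, and the sketch only validates, it does not transcode to UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the libstdc++ implementation): returns true if
// `s` is well-formed UTF-8, rejecting truncated sequences, overlong forms,
// surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n)
    {
      const unsigned char lead = static_cast<unsigned char>(s[i]);
      std::size_t len;      // bytes in this sequence
      unsigned long cp;     // code point assembled so far
      unsigned long min;    // smallest code point that needs `len` bytes
      if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
      else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
      else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
      else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
      else
        return false;       // stray continuation byte or invalid lead byte
      if (len > n - i)
        return false;       // sequence is cut off by the end of the input
      for (std::size_t k = 1; k < len; ++k)
        {
          const unsigned char c = static_cast<unsigned char>(s[i + k]);
          if ((c & 0xC0) != 0x80)
            return false;   // not a continuation byte
          cp = (cp << 6) | (c & 0x3F);
        }
      if (cp < min)
        return false;       // overlong encoding
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;       // UTF-16 surrogate, never valid in UTF-8
      if (cp > 0x10FFFF)
        return false;       // beyond the Unicode range
      i += len;
    }
  return true;
}
```

A single pass with no allocation keeps the check cheap, which matters if it ever runs on every write to a console.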
libstdc++-v3/ChangeLog:
PR libstdc++/107760
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (__cpp_lib_print): Define.
* include/bits/version.h: Regenerate.
* include/std/format (__literal_encoding_is_utf8): New function.
(_Seq_sink::view()): New member function.
* include/std/ostream (vprintf_nonunicode, vprintf_unicode)
(print, println): New functions.
* include/std/print: New file.
* src/c++23/Makefile.am: Add new source file.
* src/c++23/Makefile.in: Regenerate.
* src/c++23/print.cc: New file.
* testsuite/27_io/basic_ostream/print/1.cc: New test.
* testsuite/27_io/print/1.cc: New test.
* testsuite/27_io/print/2.cc: New test.
Fix an incorrect call to _Sink::_M_reserve() which should have passed
the __n parameter. This was not actually a problem because it was in a
discarded statement, because only the _Seq_sink<basic_string<C>>
specialization was used.
Also add some branch prediction hints, explanatory comments, and debug
mode assertions to _Seq_sink.
libstdc++-v3/ChangeLog:
* include/std/format (_Seq_sink): Fix missing argument in
discarded statement. Add comments, likely/unlikely attributes
and debug assertions as sanity checks.
These tests are expected to run interactively, with the output checked
by eye. Nobody ever does that, but we can at least use dg-output to
check that the output is as expected.
libstdc++-v3/ChangeLog:
* testsuite/27_io/objects/char/2.cc: Use dg-output.
* testsuite/27_io/objects/wchar_t/2.cc: Use dg-output.
I got the order of arguments to std::format_to wrong. It was in a
discarded statement, for a case which wasn't being tested.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_S): Fix order
of arguments to std::format_to.
* testsuite/20_util/duration/io.cc: Test subsecond duration with
floating-point rep.
This is sort of like r14-5514, but at block scope. Consider
struct A { A(int, int); };
void
g (int a)
{
A bar(auto(a), 42); // not a fn decl
}
where we emit error: 'auto' parameter not permitted in this context
which is bogus -- bar doesn't declare a function, so the auto is OK,
but we don't know it till we've seen the second argument. The error
comes from grokdeclarator invoked just after we've parsed the auto(a).
A possible approach seems to be to delay the auto parameter checking
and only check once we know we are indeed dealing with a function
declaration. For tparms, we should still emit the error right away.
PR c++/112482
gcc/cp/ChangeLog:
* decl.cc (grokdeclarator): Do not issue the auto parameter error while
tentatively parsing a function parameter.
* parser.cc (cp_parser_parameter_declaration_clause): Check it here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast15.C: New test.
After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant to a register
if it is shared with one of the other operands. The problem is that we used the
comparison mode for the register, but that could be different from the operand
mode. This causes some issues on some targets.
To fix it, we need to make sure the mode of the comparison matches the mode
of the other operands, before we can compare the constants (CONST_INT has no
modes so compare_rtx returns true if they have the same value even if the usage
is in a different mode).
Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.
PR middle-end/111260
gcc/ChangeLog:
* optabs.cc (emit_conditional_move): Change the modes to be
equal before forcing the constant to a register.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/condmove-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that
checks even e.g. sk_template_parms, and, as the comment says, affects
cp_fold_r as well. Here we had an expression with
min ((long int) VIEW_CONVERT_EXPR<long unsigned int>(bytecount), (long int) <<< Unknown tree: sizeof_expr
(int) <<< error >>> >>>)
as its sub-expression, and we never evaluated that into
min ((long int) bytecount, 4)
so the SIZEOF_EXPR leaked into the middle end. We need to make sure
we are calling cp_fold on the SIZEOF_EXPR.
PR c++/112869
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_immediate_r): Check cp_unevaluated_operand
and DECL_IMMEDIATE_FUNCTION_P rather than in_immediate_context.
gcc/testsuite/ChangeLog:
* g++.dg/template/sizeof18.C: New test.
Recent commit f5fc001a84
"aarch64: enable mixed-types for aarch64 simdclones" added lines to those
test cases and GCN-specific line numbers got out of sync, which had
originally gotten added in commit b73c49f6f8
"amdgcn: OpenMP SIMD routine support".
gcc/testsuite/
* gcc.dg/vect/vect-simd-clone-1.c: Update GCN 'dg-warning's.
* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-3.c: Likewise.
* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
Add a new parameter param_fully_pipelined_fma. If it is non-zero,
reassociation considers the benefit of parallelizing FMA's
multiplication part and addition part, assuming FMUL and FMA use the
same units that can also do FADD.
With the patch and new option, there's ~2% improvement in spec2017
508.namd on AmpereOne. (The other options are "-Ofast -mcpu=ampere1
-flto".)
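To illustrate the trade-off, here is a minimal sketch of the two association orders for a four-term sum of products. The function names are hypothetical and the code only mirrors the shape of the expressions; it is not what the reassociation pass emits. Reassociating floating-point operations can change rounding in general, which is why this kind of transformation needs fast-math style options such as the -Ofast used above.

```cpp
#include <cassert>
#include <cmath>

// Width 1: a single serial chain. Every FMA has to wait for the result
// of the previous one.
double dot4_serial(const double* a, const double* b)
{
  double acc = a[0] * b[0];
  acc = std::fma(a[1], b[1], acc);
  acc = std::fma(a[2], b[2], acc);
  acc = std::fma(a[3], b[3], acc);
  return acc;
}

// Width 2: two independent chains that can execute in parallel, joined
// by one final addition.
double dot4_width2(const double* a, const double* b)
{
  const double lo = std::fma(a[1], b[1], a[0] * b[0]);
  const double hi = std::fma(a[3], b[3], a[2] * b[2]);
  return lo + hi;
}
```

The serial form uses fewer instructions, while the wider form shortens the dependency chain at the price of an extra FADD; which one wins depends on whether the FMA units are fully pipelined.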
PR tree-optimization/110279
gcc/ChangeLog:
* doc/invoke.texi: New parameter fully-pipelined-fma.
* params.opt: New parameter fully-pipelined-fma.
* tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
the latency of MULT_EXPRs that can't be hidden by the FMAs.
(get_reassociation_width): Search for a smaller width
considering the benefit of fully pipelined FMA.
(rank_ops_for_fma): Return the number of MULT_EXPRs.
(reassociate_bb): Pass the number of MULT_EXPRs to
get_reassociation_width; avoid calling
get_reassociation_width twice.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110279-2.c: New test.
PR fortran/112873
gcc/fortran/ChangeLog:
* gfortran.texi: Update to reflect the changes.
* intrinsic.cc (add_functions): Update the standard that the
various degree trigonometric functions have been described in.
(gfc_check_intrinsic_standard): Add an error string for F2023.
* intrinsic.texi: Update accordingly.
The test says that CTAD from inherited constructors doesn't work
before C++23 so we should use c++20_down for the error.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction67.C: Correct dg-error target.
In extract_bit_field_1 we try to get a better vector mode before
extracting from it. Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent tieable vector mode with a fitting inner mode.
On riscv this triggered an ICE (PR112999) because we would take the
detour of extracting from a mask-mode vector via a vector integer mode.
One element of that mode could be subreg-punned with TImode which, in
turn, would need to be operated on in DImode chunks.
This patch adds
&& known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
&& multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))
to the list of criteria for a better mode.
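As a rough illustration of what the two added conditions check, here is a sketch with plain integers standing in for GCC's poly_int values. The function name unit_precision_fits is hypothetical, and unit_precision is assumed to be non-zero.

```cpp
#include <cassert>

// Hypothetical stand-in for the two new criteria. The extracted field
// must be exactly one element wide and must start on an element boundary
// of the candidate mode.
bool unit_precision_fits(unsigned bitsize, unsigned bitnum,
                         unsigned unit_precision)
{
  return bitsize == unit_precision          // known_eq (bitsize, ...)
         && bitnum % unit_precision == 0;   // multiple_p (bitnum, ...)
}
```

Under this reading, extracting a single mask bit (bitsize 1) from a mode with 8-bit elements is rejected, so the detour described above is not taken.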
gcc/ChangeLog:
PR target/112999
* expmed.cc (extract_bit_field_1): Ensure better mode
has fitting unit_precision.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112999.c: New test.
This changes the vec_extract path of extract_bit_field to use
GET_MODE_PRECISION instead of GET_MODE_BITSIZE and uses
the mode obtained from insn_data[icode].operand[0] as target mode.
Also, it adds a vec_extract<mode>bi expander for riscv that maps
to vec_extract<mode>qi. This fixes an ICE on riscv where we did
not find a vec_extract optab and continued with the generic code
which requires 1-byte alignment that riscv mask modes do not provide.
Apart from that it adds poly_int support to riscv's vec_extract
expander and makes the RVV..BImode -> QImode expander call
emit_vec_extract in order not to duplicate code.
gcc/ChangeLog:
PR target/112773
* config/riscv/autovec.md (vec_extract<mode>bi): New expander
calling vec_extract<mode>qi.
* config/riscv/riscv-protos.h (riscv_legitimize_poly_move):
Export.
(emit_vec_extract): Change argument from poly_int64 to rtx.
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Export.
(riscv_legitimize_move): Use rtx instead of poly_int64.
* expmed.cc (store_bit_field_1): Change BITSIZE to PRECISION.
(extract_bit_field_1): Change BITSIZE to PRECISION and use
return mode from insn_data as target mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/pr112773.c: New test.
As it stands, GCC doesn't document any public AArch64-specific operand
modifiers for use in inline asm. This patch fixes that by documenting
an initial set of public AArch64-specific operand modifiers.
gcc/ChangeLog:
* doc/extend.texi: Document AArch64 Operand Modifiers.
This makes constexpr std::vector (mostly) work in Debug Mode. All safe
iterator instrumentation and checking is disabled during constant
evaluation, because it requires mutex locks and calls to non-inline
functions defined in libstdc++.so. It should be OK to disable the safety
checks, because most UB should be detected during constant evaluation
anyway.
We could try to enable the full checking in constexpr, but it would mean
wrapping all the non-inline functions like _M_attach with an inline
_M_constexpr_attach that does the iterator housekeeping inline without
mutex locks when called for constant evaluation, and calls the
non-inline function at runtime. That could be done in future if we find
that we've lost safety or useful checking by disabling the safe
iterators.
There are a few test failures in C++20 mode, which I'm unable to
explain. The _Safe_iterator::operator++() member gives errors for using
non-constexpr functions during constant evaluation, even though those
functions are guarded by std::is_constant_evaluated() checks. The same
code works fine for C++23 and up.
libstdc++-v3/ChangeLog:
PR libstdc++/109536
* include/bits/c++config (__glibcxx_constexpr_assert): Remove
macro.
* include/bits/stl_algobase.h (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr to overloads for
debug mode iterators.
* include/debug/helper_functions.h (__unsafe): Add constexpr.
* include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
macro, folding it into ...
(_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
__glibcxx_constexpr_assert.
* include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
to some member functions. Omit attaching, detaching and checking
operations during constant evaluation.
* include/debug/safe_container.h (_Safe_container): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator): Likewise.
* include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr.
* include/debug/vector (_Safe_vector, vector): Add constexpr.
Omit safe iterator operations during constant evaluation.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc:
Remove dg-xfail-if for debug mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/bool/cons/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/1.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/capacity/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
* testsuite/23_containers/vector/data_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.
When BB reduction vectorization picks up a chain with an ASM def
in it and that's inside the vectorized region we fail to get its
LHS. Instead of trying to get the correct def the following
avoids vectorizing such def and instead keeps it as def to add
in the epilog.
PR tree-optimization/113018
* tree-vect-slp.cc (vect_slp_check_for_roots): Only start
SLP discovery from stmts with a LHS.
This patch implements C++23 class template argument deduction from
inherited constructors, the mechanism for which relies on alias
CTAD which we already fully support. The process for transforming
the return type of an inherited guide is specified in terms of a
partially specialized class template, but this patch implements it in
a simpler way, effectively performing ahead of time deduction instead
of instantiation time deduction. I wasn't able to find an example for
which this implementation strategy makes a difference, but I didn't
look very hard. Support seems good enough to advertise as complete
but there doesn't seem to be a feature-test macro update for this
feature yet. There should be no functional change before C++23 mode.
There's a couple of FIXMEs, one in inherited_ctad_tweaks for recognizing
more forms of inherited constructors, and one in deduction_guides_for for
making the cache aware of base-class dependencies.
gcc/cp/ChangeLog:
* cp-tree.h (type_targs_deducible_from): Adjust return type.
* pt.cc (alias_ctad_tweaks): Also handle C++23 inherited CTAD.
(inherited_ctad_tweaks): Define.
(type_targs_deducible_from): Return the deduced arguments or
NULL_TREE instead of a bool. Handle 'tmpl' being a TREE_LIST
representing a synthetic alias template.
(ctor_deduction_guides_for): Do inherited_ctad_tweaks for each
USING_DECL in C++23 mode.
(deduction_guides_for): Add FIXME for stale cache entries in
light of inherited CTAD.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction67.C: Accept in C++23 mode.
* g++.dg/cpp23/class-deduction-inherited1.C: New test.
* g++.dg/cpp23/class-deduction-inherited2.C: New test.
* g++.dg/cpp23/class-deduction-inherited3.C: New test.
* g++.dg/cpp23/class-deduction-inherited4.C: New test.
The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.
PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.
* g++.dg/vect/pr112793.cc: New testcase.
Avoid copying eedges in infinite_loop::infinite_loop.
Use initializer lists in the various places reported in
PR analyzer/112655 (apart from coord_test's ctor, which
would require nontrivial refactoring).
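The rvalue-reference change follows a common C++ pattern, sketched below in miniature. The names path and edge_id are illustrative only and are not analyzer types.

```cpp
#include <cassert>
#include <utility>
#include <vector>

typedef int edge_id;   // hypothetical stand-in for an edge pointer

class path
{
public:
  // Taking an rvalue reference makes callers hand over their vector
  // explicitly; std::move then lets the member take over the buffer
  // instead of copying the elements.
  explicit path(std::vector<edge_id>&& edges)
    : m_edges(std::move(edges))
  {}

  const std::vector<edge_id>& edges() const { return m_edges; }

private:
  std::vector<edge_id> m_edges;
};

// The caller builds the vector locally and moves it into the new object.
path make_path()
{
  std::vector<edge_id> edges;
  for (int i = 0; i < 3; ++i)
    edges.push_back(i);
  return path(std::move(edges));
}
```

Compared with a by-value parameter that is then copied into the member, this avoids duplicating the elements when the caller no longer needs them.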
gcc/analyzer/ChangeLog:
PR analyzer/112655
* infinite-loop.cc (infinite_loop::infinite_loop): Pass eedges
via rvalue reference rather than by value.
(starts_infinite_loop_p): Move eedges when constructing an
infinite_loop instance.
* sm-file.cc (fileptr_state_machine::fileptr_state_machine): Use
initializer list for states.
* sm-sensitive.cc
(sensitive_state_machine::sensitive_state_machine): Likewise.
* sm-signal.cc (signal_state_machine::signal_state_machine):
Likewise.
* sm-taint.cc (taint_state_machine::taint_state_machine):
Likewise.
* varargs.cc (va_list_state_machine::va_list_state_machine): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Being very simplistic, early-ra just models an allocno's live range
as a single interval. This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in
SSA form, each intermediate result will be a separate SSA name and
will remain separate from the accumulator even after out-of-ssa.
This means that in something like:
for (;;)
{
x = x + ...;
x = x + ...;
}
the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.
A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context.
This patch instead looks for sharability at a more local level,
as described in the comments. It's a bit hackish, but hopefully
not too much.
The patch also contains some small tweaks that are needed to make
the new and existing tests pass:
- fix a case where a pseudo that was only moved was wrongly treated
as not an FPR candidate
- fix some bookkeeping related to is_strong_copy_src
- use the number of FPR preferences as a tiebreaker when sorting colors
I fully expect that we'll need to be more aggressive at skipping the
early-ra allocation. For example, it probably makes sense to refuse any
allocation that involves an FPR move. But I'd like to keep collecting
examples of where things go wrong first, so that hopefully we can improve
the cases with strided registers or structures.
gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::is_equiv): New
member variable.
(allocno_info::equiv_allocno): Replace with...
(allocno_info::related_allocno): ...this member variable.
(allocno_info::chain_prev): Put into an enum with...
(allocno_info::last_use_point): ...this new member variable.
(color_info::num_fpr_preferences): New member variable.
(early_ra::m_shared_allocnos): Likewise.
(allocno_info::is_shared): New member function.
(allocno_info::is_equiv_to): Likewise.
(early_ra::dump_allocnos): Dump sharing information. Tweak column
widths.
(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
(early_ra::start_new_region): Handle m_shared_allocnos.
(early_ra::create_allocno_group): Set related_allocno rather than
equiv_allocno.
(early_ra::record_allocno_use): Likewise. Detect multiple calls
for the same program point. Update last_use_point and is_equiv.
Clear is_strong_copy_src rather than is_strong_copy_dest.
(early_ra::record_allocno_def): Use related_allocno rather than
equiv_allocno. Update last_use_point.
(early_ra::valid_equivalence_p): Replace with...
(early_ra::find_related_start): ...this new function.
(early_ra::record_copy): Look for cases where a destination copy chain
can be shared with the source allocno.
(early_ra::find_strided_accesses): Update for equiv_allocno->
related_allocno change. Only call consider_strong_copy_src_chain
at the head of a copy chain.
(early_ra::is_chain_candidate): Skip shared allocnos. Update for
new representation of equivalent allocnos.
(early_ra::chain_allocnos): Update for new representation of
equivalent allocnos.
(early_ra::try_to_chain_allocnos): Likewise.
(early_ra::merge_fpr_info): New function, split out from...
(early_ra::set_single_color_rep): ...here.
(early_ra::form_chains): Handle shared allocnos.
(early_ra::process_copies): Count the number of FPR preferences.
(early_ra::cmp_decreasing_size): Rename to...
(early_ra::cmp_allocation_order): ...this. Sort equal-sized groups
by the number of FPR preferences.
(early_ra::finalize_allocation): Handle shared allocnos.
(early_ra::process_region): Reset chain_prev as well as chain_next.
gcc/testsuite/
* gcc.target/aarch64/sve/accumulators_1.c: New test.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
be in any order.
* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.
Arrange for strub internal wrappers to pass volatile arguments by
reference to the wrapped bodies.
for gcc/ChangeLog
PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Pass volatile args
by reference to internal strub wrapped bodies.
for gcc/testsuite/ChangeLog
PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: Check indirection of
volatile args.
When generating code for an internal strub wrapper, don't clear the
DECL_NOT_GIMPLE_REG_P flag of volatile args, and gimplify them both
before and after any conversion.
While at that, move variable TMP into narrower scopes so that it's
more trivial to track where ARG lives.
for gcc/ChangeLog
PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Handle promoted
volatile args in internal strub. Simplify.
for gcc/testsuite/ChangeLog
PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: New.
More fallout from the c99 conversion. The m68k specific test pr63347.c calls
exit and abort without a prototype in scope. This patch turns them into
__builtin calls avoiding the error.
Bootstrapped and regression tested on m68k-linux-gnu, pushed to the trunk.
gcc/testsuite
* gcc.target/m68k/pr63347.c: Call __builtin_abort and __builtin_exit
instead of abort and exit.
... to avoid issues such as:
In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
[...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
return malloc (size);
^
make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1
Minor fix-up for commit cd794c3961
"A new copy propagation and PHI elimination pass".
gcc/
* gimple-ssa-sccopy.cc: '#define INCLUDE_ALGORITHM' instead of
'#include <algorithm>'.
Define the libgrust directory as a host compilation module as well as
for targets. Disable target libgrust if we're not building target
libstdc++.
ChangeLog:
* Makefile.def: Add libgrust as host & target module.
* configure.ac: Add libgrust to host tools list. Add libgrust to
noconfigdirs if we're not building target libstdc++.
* Makefile.in: Regenerate.
* configure: Regenerate.
gcc/rust/ChangeLog:
* config-lang.in: Add libgrust as a target module for the rust
language.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>
On top of the previously posted patch, this simplifies say (x * 16) / (x * 4)
into 4. Unlike the previous pattern, this is something we didn't fold
previously on GENERIC, so I think it shouldn't be all wrapped with #if
GIMPLE. The question whether there should be fold_overflow_warning for the
TYPE_OVERFLOW_UNDEFINED case remains.
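A minimal sketch of the equivalence, assuming a signed type, a non-zero x and no overflow in either multiplication (here u = 16 is a multiple of v = 4). The function names are hypothetical.

```cpp
#include <cassert>

// The expression as written in the source.
long ratio_before(long x) { return (x * 16) / (x * 4); }

// What it folds to under the assumptions above: 16 / 4.
long ratio_after(long) { return 4; }
```

For x == 0 the original expression divides by zero, which is undefined behaviour, so the folded form is free to return any value there.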
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112994
* match.pd ((t * u) / (t * v) -> (u / v)): New simplification.
* gcc.dg/tree-ssa/pr112994-2.c: New test.
The following testcase is optimized just on GENERIC (using
strict_overflow_p = false;
if (TREE_CODE (arg1) == INTEGER_CST
&& (tem = extract_muldiv (op0, arg1, code, NULL_TREE,
&strict_overflow_p)) != 0)
{
if (strict_overflow_p)
fold_overflow_warning (("assuming signed overflow does not occur "
"when simplifying division"),
WARN_STRICT_OVERFLOW_MISC);
return fold_convert_loc (loc, type, tem);
}
) but not on GIMPLE.
An earlier version of the patch regressed
+FAIL: gcc.dg/Wstrict-overflow-3.c correct warning (test for warnings, line 12)
test, we are indeed assuming that signed overflow does not occur
when simplifying division in there.
This version of the patch (which provides the simplification only
for GIMPLE) fixes that.
And/or we could add the
fold_overflow_warning (("assuming signed overflow does not occur "
"when simplifying division"),
WARN_STRICT_OVERFLOW_MISC);
call into the simplification, but in that case IMHO it should go into
the (t * u) / u -> t simplification as well, there we assume the exact
same thing (of course, in both cases only in the spots where we don't
verify it through ranger that it never overflows).
Guarding the whole simplification to GIMPLE only IMHO makes sense because
the above mentioned folding does it for GENERIC (and extract_muldiv even
handles far more cases, dunno how many from that we should be doing on
GIMPLE in match.pd and what could be done elsewhere; e.g. extract_muldiv
can handle (x * 16 + y * 32) / 8 -> x * 2 + y * 4 etc.).
Dunno about the fold_overflow_warning, I always have doubts about why
such a warning is useful to users.
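The role of the overflow assumption can be seen in a small sketch. The function names are hypothetical, u = 6 and v = 2 are arbitrary constants with u a multiple of v, and unsigned int is assumed to be 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Signed case: with t * 6 not overflowing, both forms agree.
long scaled_before(long t) { return (t * 6) / 2; }
long scaled_after(long t)  { return t * 3; }

// Wrapping case: unsigned arithmetic is modulo 2^32, so the two forms
// can differ once t * 6 wraps around. This is why a wrapping type needs
// a range check before the rewrite is valid.
std::uint32_t wrap_before(std::uint32_t t) { return (t * 6u) / 2u; }
std::uint32_t wrap_after(std::uint32_t t)  { return t * 3u; }
```

For t = 0x80000000 the product t * 6 wraps to 0, so wrap_before yields 0 while wrap_after yields 0x80000000.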
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112994
* match.pd ((t * 2) / 2 -> t): Adjust comment to use u instead of 2.
Punt without range checks if TYPE_OVERFLOW_SANITIZED.
((t * u) / v -> t * (u / v)): New simplification.
* gcc.dg/tree-ssa/pr112994-1.c: New test.
This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:
_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16
It also handles more complicated situations, e.g.:
_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
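The core test behind both examples can be sketched as follows: a group of PHIs is redundant if, ignoring references to members of the group, all arguments name one single value. This is a simplified illustration using strings for SSA names; phi_map and single_outside_value are hypothetical, and the sketch leaves out how the groups are discovered in the first place.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps a PHI result name to the names of its arguments.
typedef std::map<std::string, std::vector<std::string> > phi_map;

// Hypothetical helper: if every argument of the PHIs in `group` is
// either a member of `group` or one single outside value, return that
// value. Otherwise return an empty string.
std::string single_outside_value(const phi_map& phis,
                                 const std::set<std::string>& group)
{
  std::string found;
  for (const std::string& name : group)
    {
      const phi_map::const_iterator it = phis.find(name);
      if (it == phis.end())
        return "";              // not a PHI we know about
      for (const std::string& arg : it->second)
        {
          if (group.count(arg))
            continue;           // reference that stays inside the group
          if (found.empty())
            found = arg;        // first outside value seen
          else if (found != arg)
            return "";          // a second, different outside value
        }
    }
  return found;
}
```

A degenerate PHI is then just a group of size one, and the more complicated case above is the group {_8, _9, _10} whose only outside value is _1.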
gcc/ChangeLog:
* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/sccopy-1.c: New test.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
This patch half-reverts 3aaf704bca and replaces it with a fix with
relaxed requirements for invoking build_reconstructed_reference in
build_ref_for_model.
build_ref_for_model/build_ref_for_offset is used in two slightly
different contexts. The first is when we are looking at an assignment
like
p->field_A.field_B = s.field_B;
and we have replacements for e.g. s.field_B.field_C.field_D and we
want to store them directly to p->field_A.field_B.field_C.field_D (as
opposed to going through s or using a MEM_REF based in
p->field_A.field_B). In this case, the offset of the
"model" (s.field_B.field_C.field_D) within this can be different than
offset within the LHS that we want to reach (field_C.field_D within
the "base" p->field_A.field_B). Patch 3aaf704bca has caused us to
unnecessarily create MEM_REFs for these situations. These uses of
build_ref_for_model work with the relaxed condition just fine.
The second, problematic, context is when somewhere in the function we
have an assignment
s.field_A = t.field_A.field_B;
and we are creating an access structure to represent s.field_A.field_B
even if it is not actually accessed in the original input. This is
done after scanning the entire function body and we need to construct
a "universal" reference to s.field_A.field_B. In this case the "base"
is "s" and it has to be the DECL itself and not some reference for it
because for arbitrary references we need a GSI pointing to a statement
which we don't have, the reference is supposed to be universal.
But then using build_ref_for_model and within it
build_reconstructed_reference misbehaves if the expression contains
any ARRAY_REFs. In the first case those are fine because as we
eventually reach the aggregate type that matches a real LHS or RHS, we
know we can just bolt the rest of the references onto it and end up
with the correct overall reference. However when dealing with
s.array[1].field_A = s.array[2].field_B;
we cannot just bolt array[2] reference when we want array[1] but that
is exactly what happens when we use build_reconstructed_reference and
keep it walking all the way to s.
I was considering making all users of the second kind use
build_ref_for_offset directly instead of build_ref_for_model, but the
latter also handles COMPONENT_REFs to bit-fields, which the former does
not. Therefore I have decided to use the NULL-ness of GSI as an
indicator of how strict we need to be. I have changed the function
comment to reflect that.
I have been able to observe disambiguation improvements with this patch
over current master: we successfully manage a few more
aliasing_component_refs_p disambiguations when compiling cc1, going
from:
Alias oracle query stats:
refs_may_alias_p: 94354287 disambiguations, 106279231 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407309 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
TBAA oracle: 22675296 disambiguations 61781978 queries
14045969 are in alias set 0
10997085 queries asked about the same object
153 queries asked about the same alias set
0 access volatile
12485774 are dependent in the DAG
1577701 are aritificially in conflict with void *
Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825109 queries
modref clobber: 1371014 disambiguations, 40152535 queries
5190238 tbaa queries (0.129263 per modref query)
1341663 base compares (0.033414 per modref query)
PTA query stats:
pt_solution_includes: 36784427 disambiguations, 46141175 queries
pt_solutions_intersect: 4519387 disambiguations, 17081996 queries
to:
Alias oracle query stats:
refs_may_alias_p: 94354083 disambiguations, 106278948 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407310 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67104 disambiguations, 3081781 queries
TBAA oracle: 22676608 disambiguations 61782455 queries
14044948 are in alias set 0
10998619 queries asked about the same object
153 queries asked about the same alias set
0 access volatile
12484882 are dependent in the DAG
1577245 are aritificially in conflict with void *
Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825106 queries
modref clobber: 1371028 disambiguations, 40152504 queries
5190319 tbaa queries (0.129265 per modref query)
1341403 base compares (0.033408 per modref query)
PTA query stats:
pt_solution_includes: 36784449 disambiguations, 46141210 queries
pt_solutions_intersect: 4519320 disambiguations, 17082083 queries
gcc/ChangeLog:
2023-12-13 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Allow offset smaller than
model->offset when gsi is non-NULL. Adjust function comment.
vpbroadcastd/vpbroadcastq are available only under TARGET_AVX2, but the
vec_dup{v4di,v8si} pattern is available under AVX with a memory operand.
Putting the constant in a register therefore causes LRA/Reload to
generate a spill and reload.
gcc/ChangeLog:
PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Don't convert to
broadcast for vec_dup{v4di,v8si} when TARGET_AVX2 is not
available.
(ix86_broadcast_from_constant): Allow broadcast for V4DI/V8SI
when !TARGET_AVX2 since it will be forced to memory later.
(ix86_expand_vector_move): Force constant to mem for
vec_dup{v8si,v4di} when TARGET_AVX2 is not available.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr100865-7a.c: Adjust testcase.
* gcc.target/i386/pr100865-7c.c: Ditto.
* gcc.target/i386/pr112992.c: New test.
After the recent RVV cost model tweak, I found that the issue in this PR
has been fixed. Added a testcase and committed.
PR target/112387
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: New test.
struct bar { int num_vectors; double *vectors; };
is 16 bytes only on 64-bit targets, on 32-bit ones it is just 8 bytes,
so the explicit matching of the * 16 multiplication only works on the
former.
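The size difference can be checked with a short C sketch. It assumes
the common ABIs where the pointer is at least as large as int and is
aligned to its own size; the helper name expected_bar_size is
hypothetical and not part of the testcase.

```c
#include <assert.h>
#include <stddef.h>

struct bar { int num_vectors; double *vectors; };

/* Models the layout under the stated assumption: the int member is
   padded up to pointer alignment and followed by the pointer.  This
   gives 8 + 8 bytes with 8-byte pointers and 4 + 4 bytes with 4-byte
   pointers.  */
size_t
expected_bar_size (void)
{
  size_t ptr = sizeof (double *);
  /* The model only holds when int fits within one pointer-sized slot.  */
  assert (ptr >= sizeof (int));
  size_t int_slot = (sizeof (int) + ptr - 1) / ptr * ptr;
  return int_slot + ptr;
}
```

On such targets sizeof (struct bar) should equal expected_bar_size (),
which is why a pattern matching only "* 16" misses the 32-bit case.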
2023-12-14 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/gomp/target-enter-data-1.c: Match also sizeof bar on
32-bit targets - 8 bytes - rather than just 16 bytes.
On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote:
> * g++.target/i386/pr112904.C: New test.
The new test FAILs on i686-linux and even on x86_64-linux I think
it doesn't actually test what was reported, unless one performs testing
with -march= for some XOP enabled CPU or -mxop.
The following patch fixes that, tested on x86_64-linux with
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} i386.exp=pr112904.C'
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR target/112904
* g++.target/i386/pr112904.C: Add dg-do compile, dg-options -mxop
and for ia32 also dg-additional-options -mmmx.
With valgrind checking, there are various errors reported on some C++26
libstdc++ tests, like:
==2009913== Conditional jump or move depends on uninitialised value(s)
==2009913== at 0x914C59: gt_ggc_mx_lang_tree_node(void*) (gt-cp-tree.h:107)
==2009913== by 0x8AB7A5: gt_ggc_mx_tinst_level(void*) (gt-cp-pt.h:32)
==2009913== by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const*) (ggc-common.cc:75)
==2009913== by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104)
==2009913== by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227)
==2009913== by 0xDB70F6: execute_one_pass(opt_pass*) (passes.cc:2738)
==2009913== by 0xDB721F: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==2009913== by 0xDB7258: execute_pass_list(function*, opt_pass*) (passes.cc:2766)
==2009913== by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695)
==2009913== by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248)
==2009913== by 0xA5890D: symbol_table::finalize_compilation_unit() (cgraphunit.cc:2555)
==2009913== by 0xEB02A1: compile_file() (toplev.cc:473)
I think the problem is in the tinst_level::to_list optimization from 2018.
That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in.
Either it freshly allocates using build_tree_list (NULL, NULL); + stores
TREE_PURPOSE/TREE_VALUE, and that case is fine (the whole tree_list object
is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE
modified later; the above also means in particular TREE_TYPE of it is NULL
and TREE_CHAIN is NULL and both are accessible/initialized even in valgrind
annotations).
Or it grabs a TREE_LIST node from a freelist.
If defined(ENABLE_GC_CHECKING), the object is still all zeros except
for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case
(but unlike the build_tree_list case in the valgrind annotations
TREE_TYPE and TREE_CHAIN are marked as uninitialized).
If !defined(ENABLE_GC_CHECKING), I believe the actual memory content
is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is
zeros and TREE_CHAIN is something random (whatever next entry is in the
freelist, nothing overwrote it) and from valgrind POV again,
TREE_TYPE and TREE_CHAIN are marked as uninitialized.
When using the other freelist instantiations (pending_template and
tinst_level) I believe everything is correct: from valgrind POV it marks
the whole pending_template or tinst_level as uninitialized, but the
caller initializes it all.
One way to fix this would be to let tinst_level::to_list not store just
TREE_PURPOSE (ret) = tldcl;
TREE_VALUE (ret) = targs;
but also
TREE_TYPE (ret) = NULL_TREE;
TREE_CHAIN (ret) = NULL_TREE;
Though, that seems like wasted effort in the build_tree_list case to me.
So, the following patch instead does that TREE_CHAIN = NULL_TREE store only
in the case where it isn't already done (and likewise for TREE_TYPE just to
be sure) and marks both TREE_CHAIN and TREE_TYPE as initialized (the former
is at that spot, the latter is because we never really touch TREE_TYPE of a
TREE_LIST anywhere and so the NULL gets stored into the freelist and
restored from there (except for ENABLE_GC_CHECKING where it is poisoned
and then cleared again)).
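The stale-link problem can be sketched with a toy freelist in C. This is
not the pt.cc code: struct node, node_free and node_reuse are
hypothetical stand-ins, and the valgrind annotations are left out. It
only shows why a reused node needs its chain and type pointers cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a TREE_LIST node; not GCC's tree_node.  */
struct node
{
  int code;
  struct node *type;
  struct node *chain;
  void *purpose;
  void *value;
};

static struct node *free_head;

/* Put a node on the freelist.  Everything is cleared except chain,
   which links to the next free entry.  */
void
node_free (struct node *n)
{
  assert (n != NULL);
  memset (n, 0, sizeof *n);
  n->chain = free_head;
  free_head = n;
}

/* Take a node off the freelist, or return NULL if it is empty.  Without
   the two stores at the end, chain would still point at the next free
   entry when the caller receives the node.  */
struct node *
node_reuse (int code)
{
  struct node *n = free_head;
  if (n == NULL)
    return NULL;
  free_head = n->chain;
  n->code = code;
  n->chain = NULL;
  n->type = NULL;
  return n;
}
```

A caller that only fills in purpose and value after node_reuse then sees
NULL in both chain and type, matching a freshly allocated node.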
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR c++/112968
* pt.cc (freelist<tree_node>::reinit): Make whole obj->common
defined for valgrind annotations rather than just obj->base,
and do it even for ENABLE_GC_CHECKING. If not ENABLE_GC_CHECKING,
clear TREE_CHAIN (obj) and TREE_TYPE (obj).
The alpha port failed its weekly test due to a lack of a prototype for the
syscall() routine. Fixed thusly and pushed to the trunk.
gcc/testsuite
* gcc.c-torture/execute/20001229-1.c: Prototype syscall().