This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which cannot be
sorted into order at compile time. Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.
This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time. Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).
For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c). That could perhaps be addressed with a follow-up
patch, if necessary.
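To illustrate the restriction on variably-indexed accesses, here is a minimal sketch of a run-time check that two component accesses refer to the same element of an array of structs. It is not the libgomp implementation; the names elem, element_index, same_element and indices_accepted are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct standing in for a C struct or Fortran derived type.
struct elem
{
  int *arrptr;
  int len;
};

// Index of the array element that contains the component at ADDR.
// Assumes ADDR points into the array that starts at BASE.
std::size_t
element_index (const elem *base, const void *addr)
{
  const char *b = reinterpret_cast<const char *> (base);
  const char *a = static_cast<const char *> (addr);
  return static_cast<std::size_t> (a - b) / sizeof (elem);
}

// True if both component addresses belong to the same array element,
// i.e. the variable indices used to form them were in fact equal.
bool
same_element (const elem *base, const void *comp1, const void *comp2)
{
  return element_index (base, comp1) == element_index (base, comp2);
}

// Models mapping dtarr[i].arrptr together with dtarr[j].len.
bool
indices_accepted (std::size_t i, std::size_t j)
{
  static elem dtarr[8];
  return same_element (dtarr, &dtarr[i].arrptr, &dtarr[j].len);
}
```

With equal indices the pair of accesses is accepted; with different indices it is rejected, which is the case the patch turns into an error.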
This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).
2023-08-18 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.
libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
This patch uses the new force_reload_address routine added by the
previous patch to fix PR112906.
gcc/ChangeLog:
PR target/112906
* config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq<mode>_le):
Use force_reload_address to reload addresses that aren't suitable for
ld1rq in the pre-RA splitter.
gcc/testsuite/ChangeLog:
PR target/112906
* gcc.target/aarch64/sve/acle/general/pr112906.c: New test.
In PR112906 we ICE because we try to use force_reg to reload an
auto-increment address, but force_reg can't do this.
With the aim of fixing the PR by supporting reloading arbitrary
addresses in pre-RA splitters, this patch generalizes
lra-constraints.cc:emit_inc and makes it available to the rest of the
compiler by moving the generalized version to emit-rtl.cc.
We observe that the separate IN parameter to LRA's emit_inc is
redundant, since the function is static and is only (statically) called
once in lra-constraints.cc, with in == value. As such, we drop the IN
parameter and simplify the code accordingly.
We wrap the emit_inc code in a virtual class to allow LRA to override
how reload pseudos are created, thereby preserving the existing LRA
behaviour as much as possible.
We then add a second (higher-level) routine to emit-rtl.cc,
force_reload_address, which can reload arbitrary addresses. This uses
the generalized emit_inc code to handle the RTX_AUTOINC case. The
second patch in this series uses force_reload_address to fix PR112906.
Since we intend to call address_reload_context::emit_autoinc from within
splitters, and the code lifted from LRA calls recog, we have to avoid
clobbering recog_data. We do this by introducing a new RAII class for
saving/restoring recog_data on the stack.
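The save/restore idea can be sketched as follows. This is a simplified illustration, not the actual recog_data_saver: scratch_data, g_scratch, scratch_data_saver and clobbering_helper are hypothetical stand-ins, and the real recog_data has different contents.

```cpp
#include <cassert>

// Hypothetical stand-in for a global scratch structure such as recog_data.
struct scratch_data
{
  int n_operands;
  int operand[4];
};

scratch_data g_scratch;

// RAII helper: copies the global on construction and restores it on
// destruction, so the saved copy lives on the stack of the caller.
class scratch_data_saver
{
  scratch_data m_saved;

public:
  scratch_data_saver () : m_saved (g_scratch) {}
  ~scratch_data_saver () { g_scratch = m_saved; }
  scratch_data_saver (const scratch_data_saver &) = delete;
  scratch_data_saver &operator= (const scratch_data_saver &) = delete;
};

// A routine that clobbers the global as a side effect.
void
clobbering_helper ()
{
  g_scratch.n_operands = 99;
}

// A caller that needs g_scratch to survive the helper call.
int
call_helper_preserving_state ()
{
  g_scratch.n_operands = 2;
  {
    scratch_data_saver saver;
    clobbering_helper ();
  } // saver's destructor restores g_scratch here
  return g_scratch.n_operands;
}
```

Tying the restore to a destructor means every exit path from the scope restores the global, without explicit cleanup code.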
gcc/ChangeLog:
PR target/112906
* emit-rtl.cc (address_reload_context::emit_autoinc): New.
(force_reload_address): New.
* emit-rtl.h (struct address_reload_context): Declare.
(force_reload_address): Declare.
* lra-constraints.cc (class lra_autoinc_reload_context): New.
(emit_inc): Drop IN parameter, invoke
code moved to emit-rtl.cc:address_reload_context::emit_autoinc.
(curr_insn_transform): Drop redundant IN parameter in call to
emit_inc.
* recog.h (class recog_data_saver): New.
For example, for GCN or nvptx target configurations, using newlib:
FAIL: gcc.dg/pr110279-2.c (test for excess errors)
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-not reassoc2 "was chosen for reassociation"
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-times optimized "\\.FMA " 3
[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:11:1: error: unknown type name '__attribute_noinline__'
[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:12:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'foo'
We cannot assume 'stdio.h' to define '__attribute_noinline__' -- but then, that
also isn't necessary for this test case (there is nothing to inline into).
gcc/testsuite/
* gcc.dg/pr110279-2.c: Don't '#include <stdio.h>'. Remove
'__attribute_noinline__'.
While looking at a bitint ICE, I've noticed that in the f1 and f5
functions below we don't optimize the 2 casts into just one at GIMPLE,
even though we optimize it in convert_to_integer if it appears in the same
stmt. The large match.pd simplification of two conversions in a row
has many complex rules and as the testcase shows, everything else from
the narrowest -> widest -> prec_in_between all integer conversions
is already handled, either because the inside_unsignedp == inter_unsignedp
rule kicks in, or the
&& ((inter_unsignedp && inter_prec > inside_prec)
== (final_unsignedp && final_prec > inter_prec))
one, but there is no reason why sign extension from narrowest to
widest type followed by truncation to something in between can't be
done just as sign extension from narrowest to the final type. After all,
if the widest type is signed rather than unsigned, regardless of the final
type signedness we already handle it that way.
And since PR93044 we also handle it if the final precision is not wider
than the inside precision.
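The equivalence being exploited can be checked with a small sketch. The functions below are hypothetical and are not the f1/f5 functions from the testcase; they use fixed-width types to model narrowest (8 bits), widest (64 bits) and in-between (32 bits) precisions.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend to the widest type, then truncate to an in-between type.
std::uint32_t
two_casts (std::int8_t x)
{
  return static_cast<std::uint32_t> (static_cast<std::int64_t> (x));
}

// Single conversion from the narrowest type to the final type.
std::uint32_t
one_cast (std::int8_t x)
{
  return static_cast<std::uint32_t> (x);
}

// Exhaustively compare both forms for every int8_t value.
bool
casts_agree ()
{
  for (int v = -128; v <= 127; ++v)
    {
      std::int8_t x = static_cast<std::int8_t> (v);
      if (two_casts (x) != one_cast (x))
        return false;
    }
  return true;
}
```

Both forms produce the same value for every input, so the intermediate widening conversion carries no information.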
2023-12-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113024
* match.pd (two conversions in a row): Simplify scalar integer
sign-extension followed by truncation.
* gcc.dg/tree-ssa/pr113024.c: New test.
As shown in the testcase, .{ADD,SUB,MUL}_OVERFLOW calls are another
exception to the middle/large/huge _BitInt discovery through SSA_NAMEs
next to stores of INTEGER_CSTs to memory and their conversions to
floating point.
The calls can have normal COMPLEX_TYPE with INTEGER_TYPE elts return type
(or BITINT_TYPE with small precision) and one of the arguments can be
SSA_NAME with an INTEGER_TYPE or small BITINT_TYPE as well; still, when
there is an INTEGER_CST argument with large/huge BITINT_TYPE, we need to
lower it that way.
2023-12-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113003
* gimple-lower-bitint.cc (arith_overflow_arg_kind): New function.
(gimple_lower_bitint): Use it to catch .{ADD,SUB,MUL}_OVERFLOW
calls with large/huge INTEGER_CST arguments.
* gcc.dg/bitint-54.c: New test.
This test fails on darwin as it does not support _Decimal64,
so require dfp for it.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112943.c: Require dfp.
Currently move_max follows the tuning feature first, but ideally it
should sync with prefer-vector-width when that is explicitly set, to keep
vector moves and operations at the same vector size.
gcc/ChangeLog:
PR target/112824
* config/i386/i386-options.cc (ix86_option_override_internal):
Sync ix86_move_max/ix86_store_max with prefer_vector_width when
it is explicitly set.
gcc/testsuite/ChangeLog:
PR target/112824
* gcc.target/i386/pieces-memset-45.c: Remove
-mprefer-vector-width=256.
* g++.target/i386/pr112824-1.C: New test.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu): Do not
set Grand Ridge depending on RAO-INT.
* config/i386/i386.h: Remove PTA_RAOINT from PTA_GRANDRIDGE.
* doc/invoke.texi: Adjust documentation.
Note that the current generic vector cost model makes PR112387 fail to vectorize.
Adapt it to match the ARM SVE generic vector cost model, which fixes it.
Committed as it is an obvious fix.
PR target/112387
gcc/ChangeLog:
* config/riscv/riscv.cc: Adapt generic cost model to match ARM SVE.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr112387-1.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr112387-2.c: New test.
Following Richard's suggestions, we should not model address cost in the loop
vectorizer for select_vl or decrement IV, since other styles of vectorization
don't do that.
This makes the cost model comparison apples to apples.
This patch sets COST from 2 to 1, which turns out to give better codegen
in various cases for RVV.
Ok for trunk?
PR target/111153
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Remove address cost for select_vl/decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr111153-1.c: New test.
This adds the C++23 std::print functions, which use std::format to write
to a FILE stream or std::ostream (defaulting to stdout).
The new extern symbols are in the libstdc++exp.a archive, so we aren't
committing to stable symbols in the DSO yet. There's a UTF-8 validating
and transcoding function added by this change. That can certainly be
optimized, but it's internal to libstdc++exp.a so can be tweaked later
at leisure.
Currently the external symbols work for all targets, but are only
actually used for Windows, where it's necessary to transcode to UTF-16
to write to the console. The standard seems to encourage us to also
diagnose invalid UTF-8 for non-Windows targets when writing to a
terminal (and only when writing to a terminal), but I'm reliably
informed that that wasn't the intent of the wording. Checking for
invalid UTF-8 sequences only needs to happen for Windows, which is good
as checking for a terminal requires a call to isatty, and on Linux that
uses an ioctl syscall, which would make std::print ten times slower!
Testing the std::print behaviour is difficult if it depends on whether
the output stream is connected to a Windows console or not, as we can't
(as far as I know) do that non-interactively in DejaGNU. One of the new
tests uses the internal __write_to_terminal function directly. That
allows us to verify its UTF-8 error handling on POSIX targets, even
though that's not actually used by std::print. For Windows, that
__write_to_terminal function transcodes to UTF-16 but then uses
WriteConsoleW which fails unless it really is writing to the console.
That means the 27_io/print/2.cc test FAILs on Windows. The UTF-16
transcoding has been manually tested using mingw-w64 and Wine, and
appears to work.
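As a rough illustration of what a UTF-8 validating and transcoding step involves, here is a minimal sketch. It is not the code in src/c++23/print.cc; utf8_to_utf16 is a hypothetical helper that rejects invalid input outright, which may differ from how the library handles errors.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the UTF-16 form of IN, or std::nullopt
// if IN is not valid UTF-8.
std::optional<std::u16string>
utf8_to_utf16 (std::string_view in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size ())
    {
      unsigned char b0 = static_cast<unsigned char> (in[i]);
      char32_t cp, min_cp;
      std::size_t len;
      if (b0 < 0x80)
        { cp = b0; len = 1; min_cp = 0; }
      else if ((b0 & 0xE0) == 0xC0)
        { cp = b0 & 0x1F; len = 2; min_cp = 0x80; }
      else if ((b0 & 0xF0) == 0xE0)
        { cp = b0 & 0x0F; len = 3; min_cp = 0x800; }
      else if ((b0 & 0xF8) == 0xF0)
        { cp = b0 & 0x07; len = 4; min_cp = 0x10000; }
      else
        return std::nullopt; // stray continuation byte or invalid lead byte
      if (in.size () - i < len)
        return std::nullopt; // truncated sequence
      for (std::size_t k = 1; k < len; ++k)
        {
          unsigned char b = static_cast<unsigned char> (in[i + k]);
          if ((b & 0xC0) != 0x80)
            return std::nullopt; // not a continuation byte
          cp = (cp << 6) | (b & 0x3F);
        }
      // Reject overlong encodings, surrogates and out-of-range values.
      if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return std::nullopt;
      if (cp < 0x10000)
        out.push_back (static_cast<char16_t> (cp));
      else
        {
          cp -= 0x10000;
          out.push_back (static_cast<char16_t> (0xD800 + (cp >> 10)));
          out.push_back (static_cast<char16_t> (0xDC00 + (cp & 0x3FF)));
        }
      i += len;
    }
  return out;
}
```

Code points above U+FFFF become a surrogate pair, which is why the output can contain more code units than the input has characters.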
libstdc++-v3/ChangeLog:
PR libstdc++/107760
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (__cpp_lib_print): Define.
* include/bits/version.h: Regenerate.
* include/std/format (__literal_encoding_is_utf8): New function.
(_Seq_sink::view()): New member function.
* include/std/ostream (vprintf_nonunicode, vprintf_unicode)
(print, println): New functions.
* include/std/print: New file.
* src/c++23/Makefile.am: Add new source file.
* src/c++23/Makefile.in: Regenerate.
* src/c++23/print.cc: New file.
* testsuite/27_io/basic_ostream/print/1.cc: New test.
* testsuite/27_io/print/1.cc: New test.
* testsuite/27_io/print/2.cc: New test.
Fix an incorrect call to _Sink::_M_reserve() which should have passed
the __n parameter. This was not actually a problem because it was in a
discarded statement, because only the _Seq_sink<basic_string<C>>
specialization was used.
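A small sketch shows why an error in a discarded statement can go unnoticed. The names classify, classify_string and member_that_may_not_exist are hypothetical; this is not the _Seq_sink code.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template<typename T>
int
classify (const T &t)
{
  if constexpr (std::is_same_v<T, std::string>)
    return 1;
  else
    // Dependent expression: it is only checked when this branch is
    // instantiated, so a mistake here is silent for std::string.
    return t.member_that_may_not_exist ();
}

// Only the std::string specialization is used, so the else branch is
// discarded and never instantiated.
int
classify_string ()
{
  return classify (std::string ("x"));
}
```

The ill-formed call would only be diagnosed once classify is instantiated with some other type, which is why such bugs survive until a test exercises that case.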
Also add some branch prediction hints, explanatory comments, and debug
mode assertions to _Seq_sink.
libstdc++-v3/ChangeLog:
* include/std/format (_Seq_sink): Fix missing argument in
discarded statement. Add comments, likely/unlikely attributes
and debug assertions as sanity checks.
These tests are expected to run interactively, with the output checked
by eye. Nobody ever does that, but we can at least use dg-output to
check that the output is as expected.
libstdc++-v3/ChangeLog:
* testsuite/27_io/objects/char/2.cc: Use dg-output.
* testsuite/27_io/objects/wchar_t/2.cc: Use dg-output.
I got the order of arguments to std::format_to wrong. It was in a
discarded statement, for a case which wasn't being tested.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_S): Fix order
of arguments to std::format_to.
* testsuite/20_util/duration/io.cc: Test subsecond duration with
floating-point rep.
This is sort of like r14-5514, but at block scope. Consider
  struct A { A(int, int); };
  void
  g (int a)
  {
    A bar(auto(a), 42); // not a fn decl
  }
where we emit error: 'auto' parameter not permitted in this context
which is bogus -- bar doesn't declare a function, so the auto is OK,
but we don't know it till we've seen the second argument. The error
comes from grokdeclarator invoked just after we've parsed the auto(a).
A possible approach seems to be to delay the auto parameter checking
and only check once we know we are indeed dealing with a function
declaration. For tparms, we should still emit the error right away.
PR c++/112482
gcc/cp/ChangeLog:
* decl.cc (grokdeclarator): Do not issue the auto parameter error while
tentatively parsing a function parameter.
* parser.cc (cp_parser_parameter_declaration_clause): Check it here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast15.C: New test.
After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant to a register
if it is shared with one of the other operands. The problem is that we used the
comparison mode for the register, but that could be different from the operand
mode. This causes issues on some targets.
To fix it, we need to make sure the mode of the comparison matches the mode
of the other operands, before we can compare the constants (CONST_INT has no
modes so compare_rtx returns true if they have the same value even if the usage
is in a different mode).
Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.
PR middle-end/111260
gcc/ChangeLog:
* optabs.cc (emit_conditional_move): Change the modes to be
equal before forcing the constant to a register.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/condmove-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that
checks even e.g. sk_template_parms, and, as the comment says, affects
cp_fold_r as well. Here we had an expression with
min ((long int) VIEW_CONVERT_EXPR<long unsigned int>(bytecount), (long int) <<< Unknown tree: sizeof_expr
(int) <<< error >>> >>>)
as its sub-expression, and we never evaluated that into
min ((long int) bytecount, 4)
so the SIZEOF_EXPR leaked into the middle end. We need to make sure
we are calling cp_fold on the SIZEOF_EXPR.
PR c++/112869
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_immediate_r): Check cp_unevaluated_operand
and DECL_IMMEDIATE_FUNCTION_P rather than in_immediate_context.
gcc/testsuite/ChangeLog:
* g++.dg/template/sizeof18.C: New test.
Recent commit f5fc001a84
"aarch64: enable mixed-types for aarch64 simdclones" added lines to those
test cases, so the GCN-specific line numbers, which had originally been
added in commit b73c49f6f8 "amdgcn: OpenMP SIMD routine support", got
out of sync.
gcc/testsuite/
* gcc.dg/vect/vect-simd-clone-1.c: Update GCN 'dg-warning's.
* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-3.c: Likewise.
* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
Add a new parameter param_fully_pipelined_fma. If it is non-zero,
reassociation considers the benefit of parallelizing FMA's
multiplication part and addition part, assuming FMUL and FMA use the
same units that can also do FADD.
With the patch and new option, there's ~2% improvement in spec2017
508.namd on AmpereOne. (The other options are "-Ofast -mcpu=ampere1
-flto".)
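The trade-off can be pictured with a hand-written sketch of two ways to associate a four-term sum of products. This only illustrates the idea of a narrower versus wider reassociation; it is not what the pass emits, and dot4_chain and dot4_width2 are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using vec4 = std::array<double, 4>;

// One serial chain: each FMA depends on the previous result.
double
dot4_chain (vec4 a, vec4 b)
{
  double acc = a[0] * b[0];
  acc = std::fma (a[1], b[1], acc);
  acc = std::fma (a[2], b[2], acc);
  acc = std::fma (a[3], b[3], acc);
  return acc;
}

// Two independent chains that can execute in parallel, joined by a
// final addition.
double
dot4_width2 (vec4 a, vec4 b)
{
  double lo = std::fma (a[1], b[1], a[0] * b[0]);
  double hi = std::fma (a[3], b[3], a[2] * b[2]);
  return lo + hi;
}
```

The second form shortens the dependency chain at the cost of an extra FADD, which only pays off if the hardware can overlap the independent operations.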
PR tree-optimization/110279
gcc/ChangeLog:
* doc/invoke.texi: New parameter fully-pipelined-fma.
* params.opt: New parameter fully-pipelined-fma.
* tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
the latency of MULT_EXPRs that can't be hidden by the FMAs.
(get_reassociation_width): Search for a smaller width
considering the benefit of fully pipelined FMA.
(rank_ops_for_fma): Return the number of MULT_EXPRs.
(reassociate_bb): Pass the number of MULT_EXPRs to
get_reassociation_width; avoid calling
get_reassociation_width twice.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110279-2.c: New test.
PR fortran/112873
gcc/fortran/ChangeLog:
* gfortran.texi: Update to reflect the changes.
* intrinsic.cc (add_functions): Update the standard that the
various degree trigonometric functions have been described in.
(gfc_check_intrinsic_standard): Add an error string for F2023.
* intrinsic.texi: Update accordingly.
The test says that CTAD from inherited constructors doesn't work
before C++23 so we should use c++20_down for the error.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction67.C: Correct dg-error target.
In extract_bit_field_1 we try to get a better vector mode before
extracting from it. Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent tieable vector mode with a fitting inner mode.
On riscv this triggered an ICE (PR112999) because we would take the
detour of extracting from a mask-mode vector via a vector integer mode.
One element of that mode could be subreg-punned with TImode which, in
turn, would need to be operated on in DImode chunks.
This patch adds
&& known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
&& multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))
to the list of criteria for a better mode.
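In scalar terms the two added criteria amount to the check sketched below. This is a simplified model using plain integers, whereas the real code operates on poly_int values via known_eq and multiple_p; unit_precision_fits is a hypothetical name and unit_precision is assumed to be non-zero.

```cpp
#include <cassert>

// True if the extracted field is exactly one element of the candidate
// mode and starts on an element boundary.
bool
unit_precision_fits (unsigned bitsize, unsigned bitnum,
                     unsigned unit_precision)
{
  return bitsize == unit_precision && bitnum % unit_precision == 0;
}
```

For example, extracting 8 bits at bit position 16 fits a mode with 8-bit elements, while extracting a single mask bit at position 3 does not.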
gcc/ChangeLog:
PR target/112999
* expmed.cc (extract_bit_field_1): Ensure better mode
has fitting unit_precision.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112999.c: New test.
This changes the vec_extract path of extract_bit_field to use
GET_MODE_PRECISION instead of GET_MODE_BITSIZE and uses
the mode obtained from insn_data[icode].operand[0] as target mode.
Also, it adds a vec_extract<mode>bi expander for riscv that maps
to vec_extract<mode>qi. This fixes an ICE on riscv where we did
not find a vec_extract optab and continued with the generic code
which requires 1-byte alignment that riscv mask modes do not provide.
Apart from that it adds poly_int support to riscv's vec_extract
expander and makes the RVV..BImode -> QImode expander call
emit_vec_extract in order not to duplicate code.
gcc/ChangeLog:
PR target/112773
* config/riscv/autovec.md (vec_extract<mode>bi): New expander
calling vec_extract<mode>qi.
* config/riscv/riscv-protos.h (riscv_legitimize_poly_move):
Export.
(emit_vec_extract): Change argument from poly_int64 to rtx.
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Export.
(riscv_legitimize_move): Use rtx instead of poly_int64.
* expmed.cc (store_bit_field_1): Change BITSIZE to PRECISION.
(extract_bit_field_1): Change BITSIZE to PRECISION and use
return mode from insn_data as target mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/pr112773.c: New test.
As it stands, GCC doesn't document any public AArch64-specific operand
modifiers for use in inline asm. This patch fixes that by documenting
an initial set of public AArch64-specific operand modifiers.
gcc/ChangeLog:
* doc/extend.texi: Document AArch64 Operand Modifiers.
This makes constexpr std::vector (mostly) work in Debug Mode. All safe
iterator instrumentation and checking is disabled during constant
evaluation, because it requires mutex locks and calls to non-inline
functions defined in libstdc++.so. It should be OK to disable the safety
checks, because most UB should be detected during constant evaluation
anyway.
We could try to enable the full checking in constexpr, but it would mean
wrapping all the non-inline functions like _M_attach with an inline
_M_constexpr_attach that does the iterator housekeeping inline without
mutex locks when called for constant evaluation, and calls the
non-inline function at runtime. That could be done in future if we find
that we've lost safety or useful checking by disabling the safe
iterators.
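The general pattern of skipping out-of-line bookkeeping during constant evaluation can be sketched as follows. This is not the libstdc++ debug-mode code: checked_iter and attach_out_of_line are hypothetical, and the sketch assumes a compiler that provides __builtin_is_constant_evaluated (GCC and Clang do) rather than using std::is_constant_evaluated.

```cpp
#include <cassert>

int g_attach_calls = 0;

// Stand-in for a non-inline, non-constexpr library function.
void
attach_out_of_line ()
{
  ++g_attach_calls;
}

struct checked_iter
{
  int pos;

  constexpr explicit checked_iter (int p) : pos (p)
  {
    // Skip the bookkeeping during constant evaluation.
    if (!__builtin_is_constant_evaluated ())
      attach_out_of_line ();
  }
};

constexpr int
position_at_compile_time ()
{
  checked_iter it (3);
  return it.pos;
}

static_assert (position_at_compile_time () == 3,
               "usable in constant expressions");

// Number of bookkeeping calls made when constructing at run time.
int
attach_calls_at_run_time (int p)
{
  int before = g_attach_calls;
  checked_iter it (p);
  (void) it;
  return g_attach_calls - before;
}
```

The same constructor is usable in a constant expression and still performs its bookkeeping when called at run time.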
There are a few test failures in C++20 mode, which I'm unable to
explain. The _Safe_iterator::operator++() member gives errors for using
non-constexpr functions during constant evaluation, even though those
functions are guarded by std::is_constant_evaluated() checks. The same
code works fine for C++23 and up.
libstdc++-v3/ChangeLog:
PR libstdc++/109536
* include/bits/c++config (__glibcxx_constexpr_assert): Remove
macro.
* include/bits/stl_algobase.h (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr to overloads for
debug mode iterators.
* include/debug/helper_functions.h (__unsafe): Add constexpr.
* include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
macro, folding it into ...
(_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
__glibcxx_constexpr_assert.
* include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
to some member functions. Omit attaching, detaching and checking
operations during constant evaluation.
* include/debug/safe_container.h (_Safe_container): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator): Likewise.
* include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr.
* include/debug/vector (_Safe_vector, vector): Add constexpr.
Omit safe iterator operations during constant evaluation.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc:
Remove dg-xfail-if for debug mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/bool/cons/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/1.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/capacity/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
* testsuite/23_containers/vector/data_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.
When BB reduction vectorization picks up a chain with an ASM def
in it, and that def is inside the vectorized region, we fail to get its
LHS. Instead of trying to get the correct def, the following
avoids vectorizing such a def and instead keeps it as a def to add
in the epilog.
PR tree-optimization/113018
* tree-vect-slp.cc (vect_slp_check_for_roots): Only start
SLP discovery from stmts with a LHS.
This patch implements C++23 class template argument deduction from
inherited constructors, the mechanism for which relies on alias
CTAD which we already fully support. The process for transforming
the return type of an inherited guide is specified in terms of a
partially specialized class template, but this patch implements it in
a simpler way, effectively performing ahead of time deduction instead
of instantiation time deduction. I wasn't able to find an example for
which this implementation strategy makes a difference, but I didn't
look very hard. Support seems good enough to advertise as complete
but there doesn't seem to be a feature-test macro update for this
feature yet. There should be no functional change before C++23 mode.
There are a couple of FIXMEs, one in inherited_ctad_tweaks for recognizing
more forms of inherited constructors, and one in deduction_guides_for for
making the cache aware of base-class dependencies.
gcc/cp/ChangeLog:
* cp-tree.h (type_targs_deducible_from): Adjust return type.
* pt.cc (alias_ctad_tweaks): Also handle C++23 inherited CTAD.
(inherited_ctad_tweaks): Define.
(type_targs_deducible_from): Return the deduced arguments or
NULL_TREE instead of a bool. Handle 'tmpl' being a TREE_LIST
representing a synthetic alias template.
(ctor_deduction_guides_for): Do inherited_ctad_tweaks for each
USING_DECL in C++23 mode.
(deduction_guides_for): Add FIXME for stale cache entries in
light of inherited CTAD.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction67.C: Accept in C++23 mode.
* g++.dg/cpp23/class-deduction-inherited1.C: New test.
* g++.dg/cpp23/class-deduction-inherited2.C: New test.
* g++.dg/cpp23/class-deduction-inherited3.C: New test.
* g++.dg/cpp23/class-deduction-inherited4.C: New test.
The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.
PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.
* g++.dg/vect/pr112793.cc: New testcase.
Avoid copying eedges in infinite_loop::infinite_loop.
Use initializer lists in the various places reported in
PR analyzer/112655 (apart from coord_test's ctor, which
would require nontrivial refactoring).
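The effect of passing by rvalue reference and moving can be seen in a small sketch. loop_record is a hypothetical stand-in for infinite_loop, with the element type simplified to int.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct loop_record
{
  std::vector<int> edges;

  // Taking an rvalue reference and moving from it avoids copying the
  // vector's contents.
  explicit loop_record (std::vector<int> &&e) : edges (std::move (e)) {}
};

// True if the constructed object took over the caller's buffer.
bool
construction_reuses_buffer ()
{
  std::vector<int> edges = {1, 2, 3};
  const int *buffer = edges.data ();
  loop_record rec (std::move (edges));
  return rec.edges.data () == buffer && rec.edges.size () == 3;
}
```

The member ends up owning the original allocation, so no element is copied during construction.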
gcc/analyzer/ChangeLog:
PR analyzer/112655
* infinite-loop.cc (infinite_loop::infinite_loop): Pass eedges
via rvalue reference rather than by value.
(starts_infinite_loop_p): Move eedges when constructing an
infinite_loop instance.
* sm-file.cc (fileptr_state_machine::fileptr_state_machine): Use
initializer list for states.
* sm-sensitive.cc
(sensitive_state_machine::sensitive_state_machine): Likewise.
* sm-signal.cc (signal_state_machine::signal_state_machine):
Likewise.
* sm-taint.cc (taint_state_machine::taint_state_machine):
Likewise.
* varargs.cc (va_list_state_machine::va_list_state_machine): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Being very simplistic, early-ra just models an allocno's live range
as a single interval. This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in
SSA form, each intermediate result will be a separate SSA name and
will remain separate from the accumulator even after out-of-ssa.
This means that in something like:
  for (;;)
    {
      x = x + ...;
      x = x + ...;
    }
the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.
A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context.
This patch instead looks for sharability at a more local level,
as described in the comments. It's a bit hackish, but hopefully
not too much.
The patch also contains some small tweaks that are needed to make
the new and existing tests pass:
- fix a case where a pseudo that was only moved was wrongly treated
  as not an FPR candidate
- fix some bookkeeping related to is_strong_copy_src
- use the number of FPR preferences as a tiebreaker when sorting colors
I fully expect that we'll need to be more aggressive at skipping the
early-ra allocation. For example, it probably makes sense to refuse any
allocation that involves an FPR move. But I'd like to keep collecting
examples of where things go wrong first, so that hopefully we can improve
the cases with strided registers or structures.
gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::is_equiv): New
member variable.
(allocno_info::equiv_allocno): Replace with...
(allocno_info::related_allocno): ...this member variable.
(allocno_info::chain_prev): Put into an enum with...
(allocno_info::last_use_point): ...this new member variable.
(color_info::num_fpr_preferences): New member variable.
(early_ra::m_shared_allocnos): Likewise.
(allocno_info::is_shared): New member function.
(allocno_info::is_equiv_to): Likewise.
(early_ra::dump_allocnos): Dump sharing information. Tweak column
widths.
(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
(early_ra::start_new_region): Handle m_shared_allocnos.
(early_ra::create_allocno_group): Set related_allocno rather than
equiv_allocno.
(early_ra::record_allocno_use): Likewise. Detect multiple calls
for the same program point. Update last_use_point and is_equiv.
Clear is_strong_copy_src rather than is_strong_copy_dest.
(early_ra::record_allocno_def): Use related_allocno rather than
equiv_allocno. Update last_use_point.
(early_ra::valid_equivalence_p): Replace with...
(early_ra::find_related_start): ...this new function.
(early_ra::record_copy): Look for cases where a destination copy chain
can be shared with the source allocno.
(early_ra::find_strided_accesses): Update for equiv_allocno->
related_allocno change. Only call consider_strong_copy_src_chain
at the head of a copy chain.
(early_ra::is_chain_candidate): Skip shared allocnos. Update for
new representation of equivalent allocnos.
(early_ra::chain_allocnos): Update for new representation of
equivalent allocnos.
(early_ra::try_to_chain_allocnos): Likewise.
(early_ra::merge_fpr_info): New function, split out from...
(early_ra::set_single_color_rep): ...here.
(early_ra::form_chains): Handle shared allocnos.
(early_ra::process_copies): Count the number of FPR preferences.
(early_ra::cmp_decreasing_size): Rename to...
(early_ra::cmp_allocation_order): ...this. Sort equal-sized groups
by the number of FPR preferences.
(early_ra::finalize_allocation): Handle shared allocnos.
(early_ra::process_region): Reset chain_prev as well as chain_next.
gcc/testsuite/
* gcc.target/aarch64/sve/accumulators_1.c: New test.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
be in any order.
* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.
Arrange for strub internal wrappers to pass volatile arguments by
reference to the wrapped bodies.
for gcc/ChangeLog
PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Pass volatile args
by reference to internal strub wrapped bodies.
for gcc/testsuite/ChangeLog
PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: Check indirection of
volatile args.
When generating code for an internal strub wrapper, don't clear the
DECL_NOT_GIMPLE_REG_P flag of volatile args, and gimplify them both
before and after any conversion.
While at that, move variable TMP into narrower scopes so that it's
easier to track where ARG lives.
for gcc/ChangeLog
PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Handle promoted
volatile args in internal strub. Simplify.
for gcc/testsuite/ChangeLog
PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: New.
More fallout from the C99 conversion. The m68k-specific test pr63347.c calls
exit and abort without a prototype in scope. This patch turns them into
__builtin calls, avoiding the error.
Bootstrapped and regression tested on m68k-linux-gnu, pushed to the trunk.
gcc/testsuite
* gcc.target/m68k/pr63347.c: Call __builtin_abort and __builtin_exit
instead of abort and exit.
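As a minimal sketch of the kind of change described above (this is not the
actual contents of pr63347.c; check_value and finish are hypothetical names),
a test can reach abort and exit through GCC's builtins without including
<stdlib.h>:

```c
/* No <stdlib.h> here: under C99 and later, a plain abort () or exit ()
   call without a prototype in scope would be an implicit declaration.  */

/* Hypothetical helper, not taken from the real test.  */
int
check_value (int v)
{
  if (v != 42)
    __builtin_abort ();   /* Builtin spelling; needs no declaration.  */
  return v;
}

/* Hypothetical helper showing the exit counterpart.  */
void
finish (void)
{
  __builtin_exit (0);
}
```

The builtin spellings keep the test independent of the target's C library
headers.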
... to avoid issues such as:
In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
[...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
return malloc (size);
^
make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1
Minor fix-up for commit cd794c3961
"A new copy propagation and PHI elimination pass".
gcc/
* gimple-ssa-sccopy.cc: '#define INCLUDE_ALGORITHM' instead of
'#include <algorithm>'.
Define the libgrust directory as a host compilation module as well as
for targets. Disable target libgrust if we're not building target
libstdc++.
ChangeLog:
* Makefile.def: Add libgrust as host & target module.
* configure.ac: Add libgrust to host tools list. Add libgrust to
noconfigdirs if we're not building target libstdc++.
* Makefile.in: Regenerate.
* configure: Regenerate.
gcc/rust/ChangeLog:
* config-lang.in: Add libgrust as a target module for the rust
language.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>
On top of the previously posted patch, this simplifies, say, (x * 16) / (x * 4)
into 4. Unlike the previous pattern, this is something we didn't fold
previously on GENERIC, so I think it shouldn't all be wrapped in #if
GIMPLE. The question of whether there should be a fold_overflow_warning for
the TYPE_OVERFLOW_UNDEFINED case remains.
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112994
* match.pd ((t * u) / (t * v) -> (u / v)): New simplification.
* gcc.dg/tree-ssa/pr112994-2.c: New test.
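For illustration, a function of the shape this pattern targets might look
like the sketch below. The name ratio is hypothetical and this is not the
contents of pr112994-2.c. The result 4 assumes x is nonzero and that neither
multiplication overflows; both cases are undefined behavior for signed int,
which is what would let the compiler fold the expression.

```c
/* Hypothetical example of (t * u) / (t * v) with t = x, u = 16, v = 4.
   For nonzero x and no signed overflow, the value is always 16 / 4.  */
int
ratio (int x)
{
  return (x * 16) / (x * 4);
}
```

Callers must not pass 0, since the division would then be by zero.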
The following testcase is optimized just on GENERIC (using
strict_overflow_p = false;
if (TREE_CODE (arg1) == INTEGER_CST
&& (tem = extract_muldiv (op0, arg1, code, NULL_TREE,
&strict_overflow_p)) != 0)
{
if (strict_overflow_p)
fold_overflow_warning (("assuming signed overflow does not occur "
"when simplifying division"),
WARN_STRICT_OVERFLOW_MISC);
return fold_convert_loc (loc, type, tem);
}
) but not on GIMPLE.
An earlier version of the patch regressed the
+FAIL: gcc.dg/Wstrict-overflow-3.c correct warning (test for warnings, line 12)
test; we are indeed assuming that signed overflow does not occur
when simplifying division in there.
This version of the patch (which provides the simplification only
for GIMPLE) fixes that.
And/or we could add the
fold_overflow_warning (("assuming signed overflow does not occur "
"when simplifying division"),
WARN_STRICT_OVERFLOW_MISC);
call into the simplification, but in that case IMHO it should go into
the (t * u) / u -> t simplification as well, there we assume the exact
same thing (of course, in both cases only in the spots where we don't
verify it through ranger that it never overflows).
Guarding the whole simplification to GIMPLE only IMHO makes sense because
the above mentioned folding does it for GENERIC (and extract_muldiv even
handles far more cases, dunno how many from that we should be doing on
GIMPLE in match.pd and what could be done elsewhere; e.g. extract_muldiv
can handle (x * 16 + y * 32) / 8 -> x * 2 + y * 4 etc.).
Dunno about the fold_overflow_warning, I always have doubts about why
such a warning is useful to users.
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112994
* match.pd ((t * 2) / 2 -> t): Adjust comment to use u instead of 2.
Punt without range checks if TYPE_OVERFLOW_SANITIZED.
((t * u) / v -> t * (u / v)): New simplification.
* gcc.dg/tree-ssa/pr112994-1.c: New test.
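A minimal sketch of an expression of the (t * u) / v form follows. The name
scale is hypothetical and this is not the contents of pr112994-1.c. The
constants are chosen so that u is an exact multiple of v; the precise
conditions checked in match.pd are not shown in this message.

```c
/* Hypothetical example of (t * u) / v with t = x, u = 16, v = 4.
   If x * 16 does not overflow, this equals x * (16 / 4), i.e. x * 4.  */
int
scale (int x)
{
  return (x * 16) / 4;
}
```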
This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:
_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16
It also handles more complicated situations, e.g.:
_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
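As a rough source-level illustration (an assumption, not taken from
gcc.dg/sccopy-1.c), a loop that only shuffles copies of one value between
variables is the sort of code whose SSA form may contain a cycle of PHIs with
a single outside input. Whether GCC produces exactly the PHIs shown above for
it depends on the earlier passes; rotate_copies is a hypothetical name.

```c
/* x and y both start as copies of a and only ever exchange values, so
   the PHIs for x and y in the loop header depend on each other and on
   a alone.  Every use of x or y could be replaced by a.  */
int
rotate_copies (int a, int n)
{
  int x = a;
  int y = a;
  for (int i = 0; i < n; i++)
    {
      int t = x;
      x = y;
      y = t;
    }
  return x;   /* Always equal to a.  */
}
```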
gcc/ChangeLog:
* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/sccopy-1.c: New test.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
This patch half-reverts 3aaf704bca and replaces it with a fix with
relaxed requirements for invoking build_reconstructed_reference in
build_ref_for_model.
build_ref_for_model/build_ref_for_offset is used in two slightly
different contexts. The first is when we are looking at an assignment
like
p->field_A.field_B = s.field_B;
and we have replacements for e.g. s.field_B.field_C.field_D and we
want to store them directly to p->field_A.field_B.field_C.field_D (as
opposed to going through s or using a MEM_REF based in
p->field_A.field_B). In this case, the offset of the
"model" (s.field_B.field_C.field_D) within this can be different than
offset within the LHS that we want to reach (field_C.field_D within
the "base" p->field_A.field_B). Patch 3aaf704bca has caused us to
unnecessarily create MEM_REFs for these situations. These uses of
build_ref_for_model work with the relaxed condition just fine.
The second, problematic, context is when somewhere in the function we
have an assignment
s.field_A = t.field_A.field_B;
and we are creating an access structure to represent s.field_A.field_B
even if it is not actually accessed in the original input. This is
done after scanning the entire function body and we need to construct
a "universal" reference to s.field_A.field_B. In this case the "base"
is "s" and it has to be the DECL itself and not some reference for it
because for arbitrary references we need a GSI pointing to a statement
which we don't have; the reference is supposed to be universal.
But then using build_ref_for_model and within it
build_reconstructed_reference misbehaves if the expression contains
any ARRAY_REFs. In the first case those are fine because as we
eventually reach the aggregate type that matches a real LHS or RHS, we
know we can just bolt the rest of the references onto it and end up
with the correct overall reference. However when dealing with
s.array[1].field_A = s.array[2].field_B;
we cannot just bolt on the array[2] reference when we want array[1], but that
is exactly what happens when we use build_reconstructed_reference and
keep it walking all the way to s.
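To make the array case concrete, here is a small self-contained sketch of
such an assignment. The type names leaf, elem and S and the function
copy_between_elements are hypothetical, chosen only so that field_A and
field_B are aggregates of the same type. It shows the source-level behavior
that any reference SRA builds must preserve: the store goes to element 1 and
the load comes from element 2.

```c
struct leaf { int lo; int hi; };
struct elem { struct leaf field_A; struct leaf field_B; };
struct S { struct elem array[3]; };

/* Copy a sub-aggregate from array[2] into array[1] and encode the
   destination's contents in the return value.  */
int
copy_between_elements (void)
{
  struct S s = { 0 };

  s.array[1].field_A.lo = 1;
  s.array[1].field_A.hi = 2;
  s.array[2].field_B.lo = 20;
  s.array[2].field_B.hi = 21;

  /* A reference rebuilt for the left-hand side must keep index 1;
     reusing the right-hand side's array[2] would be wrong.  */
  s.array[1].field_A = s.array[2].field_B;

  return s.array[1].field_A.lo * 100 + s.array[1].field_A.hi;
}
```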
I was considering making all users of the second kind use
build_ref_for_offset directly instead of build_ref_for_model but the latter
also handles COMPONENT_REFs to bit-fields which the former does not.
Therefore I have decided to use the NULL-ness of GSI as an indicator
of how strict we need to be. I have changed the function comment to
reflect that.
I have been able to observe disambiguation improvements with this patch
over current master: we successfully manage a few more
aliasing_component_refs_p disambiguations when compiling cc1, going
from:
Alias oracle query stats:
refs_may_alias_p: 94354287 disambiguations, 106279231 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407309 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
TBAA oracle: 22675296 disambiguations 61781978 queries
14045969 are in alias set 0
10997085 queries asked about the same object
153 queries asked about the same alias set
0 access volatile
12485774 are dependent in the DAG
1577701 are aritificially in conflict with void *
Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825109 queries
modref clobber: 1371014 disambiguations, 40152535 queries
5190238 tbaa queries (0.129263 per modref query)
1341663 base compares (0.033414 per modref query)
PTA query stats:
pt_solution_includes: 36784427 disambiguations, 46141175 queries
pt_solutions_intersect: 4519387 disambiguations, 17081996 queries
to:
Alias oracle query stats:
refs_may_alias_p: 94354083 disambiguations, 106278948 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407310 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67104 disambiguations, 3081781 queries
TBAA oracle: 22676608 disambiguations 61782455 queries
14044948 are in alias set 0
10998619 queries asked about the same object
153 queries asked about the same alias set
0 access volatile
12484882 are dependent in the DAG
1577245 are aritificially in conflict with void *
Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825106 queries
modref clobber: 1371028 disambiguations, 40152504 queries
5190319 tbaa queries (0.129265 per modref query)
1341403 base compares (0.033408 per modref query)
PTA query stats:
pt_solution_includes: 36784449 disambiguations, 46141210 queries
pt_solutions_intersect: 4519320 disambiguations, 17082083 queries
gcc/ChangeLog:
2023-12-13 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Allow offset smaller than
model->offset when gsi is non-NULL. Adjust function comment.