When top-level configure has either --enable-checking=valgrind or
--enable-valgrind-annotations, we want to activate a couple of workarounds
in libcpp. They do not use anything from the Valgrind API, so just
delete all detection.
libcpp/ChangeLog:
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac (ENABLE_VALGRIND_CHECKING): Delete.
(ENABLE_VALGRIND_ANNOTATIONS): Rename to
ENABLE_VALGRIND_WORKAROUNDS. Delete Valgrind header checks.
* lex.cc (new_buff): Adjust for renaming.
(_cpp_free_buff): Ditto.
The following testcase ICEs, because cbranchv16qi4 expansion calls
ix86_expand_branch with op1 being a pre-AVX unaligned memory and
ix86_expand_branch emits a xorv16qi3 instruction without making sure
the operand predicates are satisfied.
I could manually check whether the argument (or both?) doesn't
match the vector_operand predicate (apparently this one or bcst_vector_operand
is used by all integral 16+ byte *xorv*3 instructions) and force it into a
register, but as all gen_xorv*3 expanders call
ix86_expand_vector_logical_operator, it seems easier to just call that
function, which ensures the right thing happens.  Calling the individual
gen_xorv*3 functions would mean an ugly switch on the modes, and using
expand_simple_binop here seems too high level to me.
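As a rough sketch of the shape of the change (not the exact hunk; tmp, op0,
op1 and mode are illustrative names), it replaces the raw SET with a call to
the expander helper:

  /* Previously: emit the XOR directly, which can leave a pre-AVX
     unaligned memory operand that fails the insn predicates.  */
  emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));

  /* Now (sketch): let the logical-operator expander legitimize the
     operands before emitting the insn.  */
  rtx ops[3] = { tmp, op0, op1 };
  ix86_expand_vector_logical_operator (XOR, mode, ops);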
2023-11-24 Jakub Jelinek <jakub@redhat.com>
PR target/112681
* config/i386/i386-expand.cc (ix86_expand_branch): Use
ix86_expand_vector_logical_operator to expand vector XOR rather than
gen_rtx_SET on gen_rtx_XOR.
* gcc.target/i386/sse4-pr112681.c: New test.
This adds some helpers to access-utils.h for removing accesses from an
access_array. This is needed by the upcoming aarch64 load/store pair
fusion pass.
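A rough usage sketch (the exact signatures are assumed here for illustration,
not quoted from the patch):

  /* Drop all accesses to REGNO from an access_array, allocating the
     filtered array above the pass's obstack watermark.  */
  access_array old_uses = insn->uses ();
  access_array new_uses = remove_regno_access (watermark, old_uses, regno);

  /* check_remove_regno_access additionally asserts that an access to
     REGNO was present and has been removed.  */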
gcc/ChangeLog:
* rtl-ssa/access-utils.h (filter_accesses): New.
(remove_regno_access): New.
(check_remove_regno_access): New.
* rtl-ssa/accesses.cc (rtl_ssa::remove_note_accesses_base): Use
new filter_accesses helper.
The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass. We then update all mem uses that require re-parenting in one go
at the end of the pass.
To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.
New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.
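A rough sketch of the intended usage (names and signatures assumed for
illustration):

  /* Allocate the new insn and its accesses above a temporary watermark,
     so the whole attempt can be dropped without extra bookkeeping.  */
  auto attempt = crtl->ssa->new_change_attempt ();
  insn_info *pair = crtl->ssa->create_insn (attempt, INSN, pair_pat);
  /* Build an insn_change for PAIR and go through restrict_movement /
     change_insns as usual; abandoning ATTEMPT backs everything out.  */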
gcc/ChangeLog:
* rtl-ssa/accesses.cc (function_info::create_set): New.
* rtl-ssa/accesses.h (access_info::is_temporary): New.
* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
(function_info::finalize_new_accesses): Handle new/temporary
user-created accesses.
(function_info::apply_changes_to_insn): Ensure m_is_temp flag
on new insns gets cleared.
(function_info::change_insns): Handle new/temporary insns.
(function_info::create_insn): New.
* rtl-ssa/changes.h (class insn_change): Make function_info a
friend class.
* rtl-ssa/functions.h (function_info): Declare new entry points:
create_set, create_insn. Declare new change_alloc helper.
* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
dump.
* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
is_temporary accessor.
* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
false.
* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
handling for temporary defs.
The following testcase is lowered by the bitint lowering pass, then
vectorizer vectorizes one of the loops in it, so we have
vect__18.6_34 = VIEW_CONVERT_EXPR<vector(4) unsigned long>(x_35(D));
_8 = BIT_FIELD_REF <vect__18.6_34, 64, 0>;
...
_18 = BIT_FIELD_REF <vect__18.6_34, 64, 64>;
etc., where x_35(D) is a _BitInt(256) argument.  That is a valid BIT_FIELD_REF:
the first argument is a vector and it extracts vector elements from it.
Then comes forwprop4 and simplifies that using match.pd into
_8 = (unsigned long) x_35(D);
...
_18 = BIT_FIELD_REF <x_35(D), 64, 64>;
and tree-cfg verification ICEs on the latter (though even the first cast
is kind of undesirable after bitint lowering; we want large/huge bitints
lowered).  The ICE is because if BIT_FIELD_REF's first argument has
INTEGRAL_TYPE_P, we require type_has_mode_precision_p, but that is not the
case for _BitInt(256), which has BLKmode.
The following patch fixes it by doing the BIT_FIELD_REF with VCE to
BIT_FIELD_REF simplification only if the result is valid.
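Paraphrased in C++ terms, the new guard amounts to something like the
following (a sketch of the condition, not the match.pd text itself):

  /* Only turn BIT_FIELD_REF <VIEW_CONVERT_EXPR <@0>, ...> into
     BIT_FIELD_REF <@0, ...> if the result would pass tree-cfg
     verification.  */
  tree t = TREE_TYPE (op0);
  bool ok = !INTEGRAL_TYPE_P (t) || type_has_mode_precision_p (t);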
2023-11-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112673
* match.pd (bit_field_ref (vce @0) -> bit_field_ref @0): Only simplify
if either @0 doesn't have scalar integral type or if it has mode
precision.
* gcc.dg/pr112673.c: New test.
The bitint lowering pass only does something if it sees BITINT_TYPE (medium,
large, huge) SSA_NAMEs.  In the past I've already run into one special case
where the above doesn't work well: if there is a store of a medium/large/huge
BITINT_TYPE INTEGER_CST into memory, there might not be any BITINT_TYPE
SSA_NAMEs in the function, yet we need to lower.  This has been solved by
also checking SSA_NAME_IS_VIRTUAL_OPERAND SSA_NAMEs for whether there is
such a store at the vdef (the whole intent is to make the pass as cheap as
possible in the
currently very likely case that the IL doesn't have any BITINT_TYPEs at
all).
And the following testcase shows a similar problem. With -frounding-math
we don't fold some FLOAT_EXPRs with INTEGER_CST operands, and if those
INTEGER_CSTs are medium/large/huge BITINT_TYPEs, we need to either cast
the INTEGER_CST to corresponding INTEGER_TYPE (for medium) or lower to
internal fn call which is later turned into libgcc call (for large/huge).
The following patch does that, but of course admittedly this discovery
of stores and FLOAT_EXPRs means we already look through quite a few
SSA_NAME_DEF_STMTs even when BITINT_TYPEs never appear.
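In GIMPLE terms, the extra check in the first loop looks roughly like this
(a paraphrase, not the committed code):

  gimple *g = SSA_NAME_DEF_STMT (s);
  if (is_gimple_assign (g)
      && gimple_assign_rhs_code (g) == FLOAT_EXPR
      && TREE_CODE (gimple_assign_rhs1 (g)) == INTEGER_CST
      && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (g))) == BITINT_TYPE)
    {
      /* Lowering is needed even though no BITINT_TYPE SSA_NAME exists;
         record has_large_huge / kind from the constant's _BitInt kind.  */
      kind = bitint_precision_kind (TREE_TYPE (gimple_assign_rhs1 (g)));
      break;
    }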
2023-11-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112679
* gimple-lower-bitint.cc (gimple_lower_bitint): Also stop first loop on
floating point SSA_NAME set in FLOAT_EXPR assignment from BITINT_TYPE
INTEGER_CST. Set has_large_huge for those if that BITINT_TYPE is large
or huge. Set kind to such FLOAT_EXPR assignment rhs1 BITINT_TYPE's kind.
* gcc.dg/bitint-42.c: New test.
The following makes sure to allocate enough space for vectype_op
in vectorizable_reduction.
PR tree-optimization/112677
* tree-vect-loop.cc (vectorizable_reduction): Use alloca
to allocate vectype_op.
The by-pieces compare can be implemented by overlapped operations, so
it should be taken into consideration when doing the adjustment for
overlapping operations.  The mode returned from
widest_fixed_size_mode_for_size is already checked against mov_optab in
by_pieces_mode_supported_p, called by widest_fixed_size_mode_for_size,
so there is no need to check mov_optab again in by_pieces_ninsns.  The
patch fixes these issues.
gcc/
* expr.cc (by_pieces_ninsns): Include by-pieces compare when
doing the adjustment for overlap operations.  Replace mov_optab
checks with a gcc assertion.
As the following testcase shows, there are some bugs in the
-fnon-call-exceptions bit-field load lowering. In particular, there
is a case where we want to emit a load early in the initialization
(before m_init_gsi), and because that load might throw an exception, we need
to split the block after the load so that it has an EH edge.
Now, across this splitting, we have the m_init_gsi and save_gsi (something
we put back into m_gsi afterwards) statement iterators and m_preheader_bb,
which is used to determine the pre-header edge of a loop (if any).
As the testcase shows, both of these statement iterators as well as
m_preheader_bb need adjustments if the block was split.  If the stmt iterators
refer to a statement, they need to be updated so that if the statement is
in the bb after the split, their gsi_bb and gsi_seq are updated; otherwise they
ought to point to the start of the new (second) bb.
Similarly, m_preheader_bb should be updated to the second bb if it was
the first one before.  Other spots where we insert something before m_init_gsi
don't split blocks there and are fine.
The m_gsi iterator is a normal iterator to insert statements before,
so gsi_end_p means inserting statements at the end of the basic block.
m_init_gsi is, on the other hand, an iterator after which statements should be
inserted (so gsi_end_p means inserting statements at the start of the basic
block after labels), but the whole pass is written for insertion of statements
before iterators, so when in 3 spots it wants to insert something after
m_init_gsi, it saves the current iterator to save_gsi and sets m_gsi to
gsi_after_labels if m_init_gsi was gsi_end_p, or to the next statement.
But it actually wasn't updating m_init_gsi back when switching back to the
normal iterator; this patch changes that such that further statements inserted
after m_init_gsi will appear after the set of statements inserted before
m_init_gsi.
Finally, the pass had a couple of places where it wanted to create a gsi_end_p
iterator for a particular basic block; instead of doing
m_gsi = gsi_last_bb (bb); if (!gsi_end_p (m_gsi)) gsi_next (&m_gsi);
the pass now uses the new gsi_end_bb function: m_gsi = gsi_end_bb (bb).
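Semantically the new helper is just a shorthand for that idiom; a sketch
(not necessarily the committed implementation):

static inline gimple_stmt_iterator
gsi_end_bb (basic_block bb)
{
  gimple_stmt_iterator gsi = gsi_last_bb (bb);
  if (!gsi_end_p (gsi))
    gsi_next (&gsi);
  return gsi;
}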
2023-11-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112668
* gimple-iterator.h (gsi_end, gsi_end_bb): New inline functions.
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
temporarily adding statements after m_init_gsi, update m_init_gsi
such that later additions after it will be after the added statements.
(bitint_large_huge::handle_load): Likewise. When splitting
gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
and update saved m_gsi as well if needed.
(bitint_large_huge::lower_mergeable_stmt,
bitint_large_huge::lower_comparison_stmt,
bitint_large_huge::lower_mul_overflow,
bitint_large_huge::lower_bit_query): Use gsi_end_bb.
* gcc.dg/bitint-40.c: New test.
The following testcase ICEs with -std=c++98 since r14-5086 because
block_may_fallthru is called on a TRY_CATCH_EXPR whose second operand
is a MODIFY_EXPR rather than STATEMENT_LIST, which try_catch_may_fallthru
apparently expects.
I've been wondering whether that isn't some kind of FE bug and whether
there isn't some unwritten rule that second operand of TRY_CATCH_EXPR
must be a STATEMENT_LIST. Looking at the FEs, the C++ FE uses mostly its
own trees, TRY_BLOCK (TRY_CATCH_EXPR replacement) with HANDLER in it (CATCH_EXPR
replacement) - but HANDLER can be immediate second operand rather than nested
in STATEMENT_LIST, EH_SPEC_BLOCK (this one stands for both TRY_CATCH_EXPR
and EH_FILTER_EXPR in its second argument); both of these are only replaced
by the generic trees during gimplification though, so are unlikely to be seen
by block_may_fallthru; and then CLEANUP_STMT, which is genericized
into TRY_CATCH_EXPR with non-CATCH_EXPR/EH_FILTER_EXPR in its body (this is
the one that causes the ICE on this testcase).
The Go and Rust FEs create TRY_CATCH_EXPR with CATCH_EXPR immediately as its
second argument (but they are either lucky that block_may_fallthru isn't called
on it, or the body can always fall through, or the ICE is latent), while the D
FE most likely hit this ICE and attempts to work around it by checking at
TRY_CATCH_EXPR creation time whether the second argument from pop_stmt_list is
a STATEMENT_LIST and, if not, forcefully wrapping it into a STATEMENT_LIST.
Unfortunately, I don't see an easy way to create an artificial tree iterator
from just a single tree statement, so the patch duplicates what the loops
later do (after all, it is very simple; I just didn't want to duplicate
also the large comments explaining it, hence the 3 'See below.' comments).
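Structurally the workaround looks roughly like this (a sketch only; the real
code repeats the per-element handling of the loops in try_catch_may_fallthru):

  tree op1 = TREE_OPERAND (stmt, 1);
  if (TREE_CODE (op1) != STATEMENT_LIST)
    {
      /* OP1 is a single statement, so there is no tree_stmt_iterator to
         walk; dispatch on TREE_CODE (op1) directly (CATCH_EXPR,
         EH_FILTER_EXPR or anything else) and do for this one statement
         exactly what the corresponding loop does per element.  */
    }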
2023-11-24 Jakub Jelinek <jakub@redhat.com>
PR c++/112619
* tree.cc (try_catch_may_fallthru): If second operand of
TRY_CATCH_EXPR is not a STATEMENT_LIST, handle it as if it was a
STATEMENT_LIST containing a single statement.
* g++.dg/eh/pr112619.C: New test.
The following tries to reduce the number of cases in which we use an unsigned
type for the addition, when we know the original signed increment was
OK, which is when the computed total unsigned increment fits the signed
type as well.
This fixes the observed testsuite fallout.
PR tree-optimization/112344
* tree-chrec.cc (chrec_apply): Only use an unsigned add
when the overall increment doesn't fit the signed type.
While working on fixing bugs for zvl1024b, I noticed a special VLA SLP case
that can be better optimized:
v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... })
Before this patch, we are using the generic approach (vrgather):
vid
vadd.vx
vrgather
vmsgeu
vrgather
With this patch, we use vec_extract + slide1up:
scalar = vec_extract (last element of op1)
v = slide1up (op2, scalar)
Tested on zvl128b/zvl256b/zvl512b/zvl1024b of both RV32 and RV64 with no regressions.
Ok for trunk?
PR target/112599
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns): New function.
(expand_vec_perm_const_1): Add new optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112599-2.c: New test.
The testcase noted in the PR fails because the context of the lambda is
not in namespace scope, but rather in class scope. This patch removes
the assertion that the context must be a namespace and ensures that
lambdas in class scope still get the correct merge_kind.
PR c++/107398
gcc/cp/ChangeLog:
* module.cc (trees_out::get_merge_kind): Handle lambdas in class
scope.
(maybe_key_decl): Remove assertion and fix whitespace.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-6_a.C: New test.
* g++.dg/modules/lambda-6_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
gcc/ChangeLog:
PR target/112643
* config/i386/driver-i386.cc (check_avx10_avx512_features):
Renamed to ...
(check_avx512_features): ... this, and remove avx10 check.
(host_detect_local_cpu): Never append -mno-avx10.1-{256,512} to
avoid emitting warnings when building GCC with native arch.
* config/i386/i386-builtin.def (BDESC): Add missing AVX512VL for
128/256 bit builtin for AVX512VP2INTERSECT.
* config/i386/i386-options.cc (ix86_option_override_internal):
Also check whether the AVX512 flags are set when trying to reset.
* config/i386/i386.h
(PTA_SKYLAKE_AVX512): Add missing PTA_EVEX512.
(PTA_ZNVER4): Ditto.
The check for exporting a declaration that was previously declared as not
exported is implemented in 'duplicate_decls', but this doesn't handle
declarations of classes.  This patch adds these checks and slightly
adjusts the associated error messages for clarity.
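A minimal illustration of the newly diagnosed case (not the committed
testcase):

export module M;

class A;             // first declared without 'export'
export class A { };  // now diagnosed: redeclared as exported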
PR c++/98885
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Adjust error message.
(xref_tag): Adjust error message. Check exporting decl that is
already declared as non-exporting.
gcc/testsuite/ChangeLog:
* g++.dg/modules/export-1.C: Adjust error messages. Remove
xfails for working case. Add new test case.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Tests with keys that match both PASS and FAIL (or now
optionally XPASS) count as a fail.  XPASSes were previously
ignored.  Handling them as FAIL seems the most useful
alternative, but not counting XPASSes may be deliberate.
It's also a matter of compatibility, so make it optional.
Attempts to use --handle-xpass-as-fail were previously
flagged as a usage error.  If you pass it now, for a state
with previous mixed XPASS and PASS results that doesn't
change in this run, the XPASS is discovered as a (new)
regression.
For new XPASSing tests, it's handled as a new FAIL.
* btest-gcc.sh (--handle-xpass-as-fail): New option.
This is a long-standing bug: passing "-j --add-passes-despite-regression"
or "--add-passes-despite-regression -j" caused the second option to be
treated as TARGET, the first non-option parameter.
* btest-gcc.sh (Option handling): Handle multiple options.
This is needed to avoid a linker warning.
2023-11-23 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* g++.dg/pr104869.C: Export main on hpux.
The change applied in r14-5760-g2a46e0e7e20 changed the behaviour of
functions with assembly like:
bar:
__acle_se_bar:
where both bar and __acle_se_bar are globals referring to the same
function body. The old behaviour overrides 'bar' with '__acle_se_bar'
and the scan tests for that label.
The change here re-allows the override.
Cases like this are not legal in Mach-O (where two global symbols cannot
have the same address in the assembler output). However, given the
constraints on the Mach-O scanning, it does not seem that it is
necessary to skip the change (any incorrect case should be easily
evident in the assembler).
gcc/testsuite/ChangeLog:
* lib/scanasm.exp: Allow multiple function start symbols,
taking the last as the function name.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
The Fortran standard requires that NULL() passed to an assumed-rank
dummy argument has a MOLD argument.
gcc/testsuite/ChangeLog:
PR fortran/104819
* gfortran.dg/assumed_rank_10.f90: Add MOLD argument to NULL().
* gfortran.dg/assumed_rank_8.f90: Likewise.
Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK to
have a decimal exponent range at least as large as a default integer,
and that all integer arguments have the same kind type parameter.
gcc/fortran/ChangeLog:
PR fortran/112609
* check.cc (gfc_check_system_clock): Add checks on integer arguments
to SYSTEM_CLOCK specific to F2023.
* error.cc (notify_std_msg): Adjust to handle new features added
in F2023.
* gfortran.texi (_gfortran_set_options): Document GFC_STD_F2023_DEL,
remove obsolete option GFC_STD_F2008_TS and fix enumeration values.
* libgfortran.h (GFC_STD_F2023_DEL): Add and use in GFC_STD_OPT_F23.
* options.cc (set_default_std_flags): Add GFC_STD_F2023_DEL.
gcc/testsuite/ChangeLog:
PR fortran/112609
* gfortran.dg/system_clock_1.f90: Add option -std=f2003.
* gfortran.dg/system_clock_3.f08: Add option -std=f2008.
* gfortran.dg/system_clock_4.f90: New test.
2023-11-23 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/initlist-const1.C: xfail scan-assembler-not
check on hppa*-*-hpux*.
This adds the std::ranges::to functions for C++23. The rest of P1206R7
is not yet implemented, i.e. the new constructors taking the
std::from_range tag, and the new insert_range, assign_range, etc. member
functions. std::ranges::to works with the standard containers even
without the new constructors, so this is useful immediately.
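A short usage sketch of what this enables:

#include <ranges>
#include <vector>
#include <set>

auto v = std::views::iota(1, 5) | std::ranges::to<std::vector<int>>(); // {1, 2, 3, 4}
auto s = std::ranges::to<std::set<int>>(v);                            // copy into a set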
The __cpp_lib_ranges_to_container feature test macro can be defined now,
because that only indicates support for the changes in <ranges>, which
are implemented by this patch. The __cpp_lib_containers_ranges macro
will be defined once all containers support the new member functions.
libstdc++-v3/ChangeLog:
PR libstdc++/111055
* include/bits/ranges_base.h (from_range_t): Define new tag
type.
(from_range): Define new tag object.
* include/bits/version.def (ranges_to_container): Define.
* include/bits/version.h: Regenerate.
* include/std/ranges (ranges::to): Define.
* testsuite/std/ranges/conv/1.cc: New test.
* testsuite/std/ranges/conv/2_neg.cc: New test.
* testsuite/std/ranges/conv/version.cc: New test.
The operator== function is only a friend of the LHS argument, so cannot
access the private member of the RHS argument. Use the public accessor
instead.
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_allocator.h (uneq_allocator): Fix
equality operator for heterogeneous comparisons.
2023-11-23 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* c-c++-common/Wattributes.c: Don't skip check for warning
at line 411 in Wattributes.c on hppa*64*-*-*.
In <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628748.html>
I proposed -fhardened, a new umbrella option that enables a reasonable set
of hardening flags. The read of the room seems to be that the option
would be useful. So here's a patch implementing that option.
Currently, -fhardened enables:
-D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
-D_GLIBCXX_ASSERTIONS
-ftrivial-auto-var-init=zero
-fPIE -pie -Wl,-z,relro,-z,now
-fstack-protector-strong
-fstack-clash-protection
-fcf-protection=full (x86 GNU/Linux only)
-fhardened will not override options that were specified on the command line
(before or after -fhardened). For example,
-D_FORTIFY_SOURCE=1 -fhardened
means that _FORTIFY_SOURCE=1 will be used. Similarly,
-fhardened -fstack-protector
will not enable -fstack-protector-strong.
Currently, -fhardened is only supported on GNU/Linux.
In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
to anything. This patch provides -Whardened, enabled by default, which
warns when -fhardened couldn't enable a particular option. I think most
often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
were not enabled.
gcc/c-family/ChangeLog:
* c-opts.cc: Include "target.h".
(c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
and _GLIBCXX_ASSERTIONS.
gcc/ChangeLog:
* common.opt (Whardened, fhardened): New options.
* config.in: Regenerate.
* config/bpf/bpf.cc: Include "opts.h".
(bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
not inform that -fstack-protector does not work.
* config/i386/i386-options.cc (ix86_option_override_internal): When
-fhardened, maybe enable -fcf-protection=full.
* config/linux-protos.h (linux_fortify_source_default_level): Declare.
* config/linux.cc (linux_fortify_source_default_level): New.
* config/linux.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Redefine.
* configure: Regenerate.
* configure.ac: Check if the linker supports '-z now' and '-z relro'.
Check if -fhardened is supported on $target_os.
* doc/invoke.texi: Document -fhardened and -Whardened.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Add.
* gcc.cc (driver_handle_option): Remember if any link options or -static
were specified on the command line.
(process_command): When -fhardened, maybe enable -pie and
-Wl,-z,relro,-z,now.
* opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
(finish_options): When -fhardened, enable
-ftrivial-auto-var-init=zero and -fstack-protector-strong.
(print_help_hardened): New.
(print_help): Call it.
* opts.h (flag_stack_protector_set_by_fhardened_p): Declare.
* target.def (fortify_source_default_level): New target hook.
* targhooks.cc (default_fortify_source_default_level): New.
* targhooks.h (default_fortify_source_default_level): Declare.
* toplev.cc (process_options): When -fhardened, enable
-fstack-clash-protection. If flag_stack_protector_set_by_fhardened_p,
do not warn that -fstack-protector not supported for this target.
Don't enable -fhardened when !HAVE_FHARDENED_SUPPORT.
gcc/testsuite/ChangeLog:
* gcc.misc-tests/help.exp: Test -fhardened.
* c-c++-common/fhardened-1.S: New test.
* c-c++-common/fhardened-1.c: New test.
* c-c++-common/fhardened-10.c: New test.
* c-c++-common/fhardened-11.c: New test.
* c-c++-common/fhardened-12.c: New test.
* c-c++-common/fhardened-13.c: New test.
* c-c++-common/fhardened-14.c: New test.
* c-c++-common/fhardened-15.c: New test.
* c-c++-common/fhardened-2.c: New test.
* c-c++-common/fhardened-3.c: New test.
* c-c++-common/fhardened-4.c: New test.
* c-c++-common/fhardened-5.c: New test.
* c-c++-common/fhardened-6.c: New test.
* c-c++-common/fhardened-7.c: New test.
* c-c++-common/fhardened-8.c: New test.
* c-c++-common/fhardened-9.c: New test.
* gcc.target/i386/cf_check-6.c: New test.
The function __hardcfr_check_fail in hardcfr.c is internal and static
inline. It receives many arguments, which require more than five
registers to be passed on bpf-unknown-none targets. BPF is limited to
that number of registers to pass arguments, and therefore libgcc fails
to build in that target. This patch marks the function with the
always_inline attribute, fixing the bpf build.
Tested in bpf-unknown-none target and x86_64-linux-gnu host.
libgcc/ChangeLog:
* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
We have an issue with `scan-assembler-times' handling expressions using
subexpressions as produced by capturing parentheses `()' in an odd way,
and one that is inconsistent with `scan-assembler', `scan-assembler-not',
etc. The problem comes from calling `regexp' with `-inline -all', which
causes a list to be returned that would otherwise be placed in match
variables.
Consequently if we have say:
/* { dg-final { scan-assembler-times "\\s(foo|bar)\\s" 1 } } */
in a test case and there is a lone `foo' present in output being matched,
then our invocation of `regexp -inline -all' in `scan-assembler-times'
will return:
{ foo } foo
and that in turn will confuse our match count calculation as `llength'
will return 2 rather than 1, making the test fail even though `foo' was
only actually matched once.
It seems unclear why we chose to call `regexp' in such an odd way in the
first place just to figure out the number of matches. The first version
of TCL that supports the `-all' option to `regexp' is 8.3, and according
to its documentation[1][2] `regexp' already returns the number of matches
found whenever `-all' has been used *unless* `-inline' has also been used.
Remove the `-inline' option then along with the `llength' invocation.
References:
[1] "Tcl Built-In Commands - regexp manual page",
<https://www.tcl.tk/man/tcl8.2.3/TclCmd/regexp.html>
[2] "Tcl Built-In Commands - regexp manual page",
<https://www.tcl.tk/man/tcl8.3/TclCmd/regexp.html>
gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Remove the `-inline'
option to `regexp' and the wrapping `llength' call.
Use non-capturing parentheses for the subexpressions used with
`scan-assembler-times', to avoid a quirk with double-counting.
gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: Use non-capturing parentheses
with `scan-assembler-times'.
Use non-capturing parentheses for the subexpressions used with
`scan-assembler-times', to avoid a quirk with double-counting.
gcc/testsuite/
* gcc.target/arm/pr53447-5.c: Use non-capturing parentheses with
`scan-assembler-times'.
My recent commit 0c2037d9d9 added a
switch statement lacking a default clause, leading to warnings or
errors when building with --enable-werror-always.
Fix by adding an empty default.
Committed as obvious.
2023-11-23 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-functions.h
(full_width_access::memory_vector_mode): Add default clause.
The parityhi2_cmp instruction clobbers its input operand, so use
a temporary register in the call to gen_parityhi2_cmp.
PR target/112672
gcc/ChangeLog:
* config/i386/i386.md (parityhi2):
Use temporary register in the call to gen_parityhi2_cmp.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112672.c: New test.
With the above two options, use a temporary register regno (as returned
from split_stack_prologue_scratch_regno) as an indirect call scratch
register to hold the __morestack function address. On 64-bit targets, two
temporary registers are always available, so load the function address into
%r11 and call __morestack_large_model with its one-argument-register value
in %r10. On 32-bit targets, bail out with a "sorry" if the temporary
register cannot be obtained.
On 32-bit targets, also emit a PIC sequence that reuses the obtained indirect
call scratch register before moving the function address to it. We cannot
set up the %ebx PIC register in this case, but __morestack is prepared
for this situation and sets it up by itself.
PR target/89316
gcc/ChangeLog:
* config/i386/i386.cc (ix86_expand_split_stack_prologue): Obtain
scratch regno when flag_force_indirect_call is set. On 64-bit
targets, call __morestack_large_model when flag_force_indirect_call
is set and on 32-bit targets with -fpic, manually expand PIC sequence
to call __morestack. Move the function address to an indirect
call scratch register.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr89316.C: New test.
* gcc.target/i386/pr112605-1.c: New test.
* gcc.target/i386/pr112605-2.c: New test.
* gcc.target/i386/pr112605.c: New test.
Implement flags output for inline assemblies. Only use one output constraint
that captures the whole condition code. No breakout into different condition
codes is allowed. Also, only one condition code variable is allowed.
Add further logic to canonicalize various cases where we combine different
cases of possible condition codes.
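A hedged usage sketch (assuming the single output constraint is spelled
"=@cc" and yields the full condition code as an int in the range 0..3):

unsigned int
add_logical (unsigned int a, unsigned int b, int *ccp)
{
  int cc;
  asm ("alr\t%[a],%[b]" : [a] "+d" (a), "=@cc" (cc) : [b] "d" (b));
  *ccp = cc;   /* condition code set by the asm, 0..3 */
  return a;
}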
gcc/ChangeLog:
* config/s390/s390-c.cc (s390_cpu_cpp_builtins): Define
__GCC_ASM_FLAG_OUTPUTS__.
* config/s390/s390.cc (s390_canonicalize_comparison): More
UNSPEC_CC_TO_INT cases.
(s390_md_asm_adjust): Implement flags output.
* config/s390/s390.md (ccstore4): Allow mask operands.
* doc/extend.texi: Document flags output.
gcc/testsuite/ChangeLog:
* gcc.target/s390/ccor.c: New test.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
Issue two loads when using GPRs instead of one load-multiple.
Bootstrapped and tested on s390. OK for mainline?
gcc/ChangeLog:
* config/s390/s390.md: Split TImode loads.
gcc/testsuite/ChangeLog:
* gcc.target/s390/int128load.c: New test.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
When using GNU vector extensions, an access outside of the vector size
caused an ICE on s390.  Fix this by aligning with the vec_extract
builtin, i.e., computing the constant index modulo the number of lanes.
Fixes testcase gcc.target/s390/pr89233.c.
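An illustration in the spirit of that testcase (a constant index beyond the
vector length; with the fix the lane is selected modulo the number of
elements, as the vec_extract builtin does, instead of ICEing):

typedef int v4si __attribute__ ((vector_size (16)));

int
f (v4si x)
{
  return x[5];   /* lane 5 % 4 == 1 under the modulo behaviour */
}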
gcc/ChangeLog:
* config/s390/vector.md (*vec_extract): Fix.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
Previously for ops.length >= 3, when FMA is present, we don't
rank the operands so that more FMAs can be preserved.  But this
brings more FMAs with loop dependency, which leads to worse
performance on some targets.
Rank the operands (set width=2) when:
1. avoid_fma_max_bits is set, and
2. a loop-dependent FMA sequence is found.
In this way, we don't have to discard all the FMA candidates
in the badly shaped sequence in widening_mul; instead we can keep
fewer FMAs without loop dependency.
With this patch, there's about 2% improvement in 510.parest_r
1-copy run on ampere1 (with "-Ofast -mcpu=ampere1 -flto
--param avoid-fma-max-bits=512").
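For illustration only, the kind of loop-dependent FMA chain this is about:
every fused multiply-add feeds the next one through s, so keeping all of
them serializes the reduction.

double
dot2 (const double *a, const double *b, const double *c,
      const double *d, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s = s + a[i] * b[i] + c[i] * d[i];   /* two FMAs chained through s */
  return s;
}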
PR tree-optimization/110279
gcc/ChangeLog:
* tree-ssa-reassoc.cc (get_reassociation_width): Check
for loop dependent FMAs.
(reassociate_bb): For 3 ops, refine the condition to call
swap_ops_for_binary_stmt.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110279-1.c: New test.
Add wrapper for vec_extract since my following patch will need to call it.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (emit_vec_extract): New function.
* config/riscv/riscv-v.cc (emit_vec_extract): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_move): Refine codes.
This patch fixes following FAILs in zvl1024b of both RV32/RV64:
FAIL: gcc.c-torture/execute/990128-1.c -O2 execution test
FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/990128-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/990128-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/990128-1.c -O3 -g execution test
FAIL: gcc.dg/torture/pr58955-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
The root cause can be simply described by the following small case:
https://godbolt.org/z/7GaxbEGzG
typedef int64_t v1024b __attribute__ ((vector_size (128)));
void foo (void *out, void *in, int64_t a, int64_t b)
{
v1024b v = {a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
v1024b v2 = {b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b};
v1024b index = *(v1024b*)in;
v1024b v3 = __builtin_shuffle (v, v2, index);
__riscv_vse64_v_i64m1 (out, (vint64m1_t)v3, 10);
}
Incorrect ASM:
foo:
li a5,31
vsetivli zero,10,e64,m1,ta,mu
vmv.v.x v2,a5
vl1re64.v v1,0(a1)
vmv.v.x v4,a2
vand.vv v1,v1,v2
vmv.v.x v3,a3
vmsgeu.vi v0,v1,16
vrgather.vv v2,v4,v1 --> AVL = VLMAX according to codes.
vadd.vi v1,v1,-16
vrgather.vv v2,v3,v1,v0.t --> AVL = VLMAX according to codes.
vse64.v v2,0(a0) --> AVL = 10 according to codes.
ret
For a vrgather dest, source, index instruction, the index may have a value greater than the AVL of the
following store, that is, an index value > 10.  In this situation, the code above ends up with
the source vector of vrgather having undefined values for index >= AVL (which is 10 in this case).
So disable AVL propagation for the vrgather instruction.
PR target/112599
PR target/112670
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (avl_can_be_propagated_p): New function.
(vlmax_ta_p): Disable vrgather AVL propagation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112599-1.c: New test.
As the following testcase shows, we ICE when trying to emit ADDR_EXPR of
a bitint variable which doesn't have mode width.
The problem is in the EXTEND_BITINT stuff which makes sure we treat the
padding bits on memory reads from user bitint vars as undefined.
When expanding ADDR_EXPR on such vars outside of initializers,
expand_expr_addr* uses EXPAND_CONST_ADDRESS modifier and EXTEND_BITINT
does nothing, but in initializers it keeps using EXPAND_INITIALIZER
modifier. So, we need to treat EXPAND_INITIALIZER the same as
EXPAND_CONST_ADDRESS in this regard.
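A paraphrase of the resulting guard (a sketch, not the exact EXTEND_BITINT
macro):

  /* Don't mask off the _BitInt padding bits when expanding for a
     constant address or inside a static initializer.  */
  if (modifier != EXPAND_CONST_ADDRESS && modifier != EXPAND_INITIALIZER)
    temp = reduce_to_bit_field_precision (temp, NULL_RTX, type);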
2023-11-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112336
* expr.cc (EXTEND_BITINT): Don't call reduce_to_bit_field_precision
if modifier is EXPAND_INITIALIZER.
* gcc.dg/bitint-41.c: New test.
The _M_realloc_insert member does not have the trivial relocation
optimization for C++98, which seems to be why the _M_end_of_storage
member does not get optimized away. Make this test unsupported for
C++98.
gcc/testsuite/ChangeLog:
PR libstdc++/110879
* g++.dg/opt/pr110879.C: Require C++11 or later.