The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, such as APUs,
and for implementing OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate metadata and
mark the load instructions as early-clobber. When the port supports scheduling
of s_waitcnt instructions there will be further requirements.
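As a usage illustration (a minimal sketch, not part of this patch): with
Unified Shared Memory the device can dereference ordinary host pointers, so
no map clauses are needed for the pointed-to storage.

#include <cstdlib>

#pragma omp requires unified_shared_memory

int main ()
{
  int *p = (int *) std::malloc (4 * sizeof (int));
  for (int i = 0; i < 4; i++)
    p[i] = i;
  /* With USM the device dereferences the host pointer directly.  */
  #pragma omp target
  for (int i = 0; i < 4; i++)
    p[i] *= 2;
  int ok = p[3] == 6;
  std::free (p);
  return !ok;
}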
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (NO_XNACK): Ignore missing -march.
(XNACKOPT): Match on/off; ignore any.
* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
Add xnack compatible alternatives.
(gather<mode>_insn_2offsets<exec>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
other than Fiji and gfx1030.
(gcn_expand_epilogue): Remove early-clobber problems.
(gcn_hsa_declare_function_name): Obey -mxnack setting.
* config/gcn/gcn.md (xnack): New attribute.
(enabled): Rework to include "xnack" attribute.
(*movbi): Add xnack compatible alternatives.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.
* config/gcn/gcn.opt (-mxnack): Change the default to "any".
* doc/invoke.texi: Remove placeholder notice for -mxnack.
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation works for page-scale allocations; finer-grained
allocations will be implemented in a future patch.
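A minimal sketch of the approach (not the actual libgomp code; error
handling reduced to the essentials):

#include <sys/mman.h>
#include <cstddef>

void *pinned_alloc (std::size_t size)
{
  /* mmap rather than malloc, so the mapping can be munlock'd and
     munmap'd independently of any heap when it is freed.  */
  void *p = mmap (nullptr, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return nullptr;
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return nullptr;
    }
  return p;
}

void pinned_free (void *p, std::size_t size)
{
  munlock (p, size);
  munmap (p, size);
}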
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Add a dg-do compile target directive that restricts the test case to
being built with C++17 or later.
2023-12-13 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: Add dg-do compile target c++17 directive.
Refine the test cases:
* Follow the test naming convention.
* Add run test cases.
These test cases used to cause out-of-bounds writes to the stack and
therefore showed unreliable behavior: depending on the execution
environment they could either pass or fail. As of now, with the
latest QEMU version, they pass even without the underlying issue
fixed. As the test cases are known to have triggered the problem
before, we keep them as run test cases for future reference.
PR target/112929
PR target/112988
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
This patch improves the code generated for bitfield sign extensions on
ARC CPUs without a barrel shifter.
Compiling the following test case:
int foo(int x) { return (x<<27)>>27; }
with -O2 -mcpu=em, generates two loops:
foo: mov lp_count,27
lp 2f
add r0,r0,r0
nop
2: # end single insn loop
mov lp_count,27
lp 2f
asr r0,r0
nop
2: # end single insn loop
j_s [blink]
and the closely related test case:
struct S { int a : 5; };
int bar (struct S *p) { return p->a; }
generates the slightly better:
bar: ldb_s r0,[r0]
mov_s r2,0 ;3
add3 r0,r2,r0
sexb_s r0,r0
asr_s r0,r0
asr_s r0,r0
j_s.d [blink]
asr_s r0,r0
which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"]. Using this, the sign
extensions above on ARC's EM both become:
bmsk_s r0,r0,4
xor r0,r0,16
sub r0,r0,16
which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
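In C, the idiom for the 5-bit case looks like this (illustrative only):

#include <cassert>

/* Sign-extend a 5-bit field: ((x & mask) ^ msb) - msb
   with mask = 0x1f and msb = 0x10.  */
static int sext5 (int x) { return ((x & 0x1f) ^ 0x10) - 0x10; }

int main ()
{
  for (int x = 0; x < 32; x++)
    assert (sext5 (x) == (x < 16 ? x : x - 32));
}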
2023-12-13 Roger Sayle <roger@nextmovesoftware.com>
Jeff Law <jlaw@ventanamicro.com>
gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using an AND, XOR and MINUS sequence.
gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.
Because the crypto vector extensions depend on the vector extension,
add the implied ISA info for the corresponding crypto vector extensions.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Modify implied ISA info.
* config/riscv/arch-canonicalize: Add crypto vector implied info.
The change in r14-6468-ga01462ae8bafa8 was only supposed to apply to %C
formats, not %Y.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Do
not round century down for %Y formats.
This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by
Eric Gallager <egallager@gcc.gnu.org>.
ChangeLog:
* Makefile.def (gettext): Disable (via missing)
{install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install
to true. Add --disable-threads --disable-libasprintf. Drop the
lib_path (as there are no shared libs).
* Makefile.in: Regenerate.
Fix a VSETVL bug where the AVL gets polluted:
.L15:
li a3,9
lui a4,%hi(s)
sw a3,%lo(j)(t2)
sh a5,%lo(s)(a4) <-- a4 holds the address of s
beq t0,zero,.L42
sw t5,8(t4)
vsetvli zero,a4,e8,m8,ta,ma <<-- a4 used as the AVL
Actually, this vsetvl is redundant.
The root cause is that we include the full-available optimization in the
LCM local data computation; the full-available optimization should happen
after the LCM computation.
PR target/112929
PR target/112988
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::compute_lcm_local_properties): Remove full available.
(pre_vsetvl::pre_global_vsetvl_info): Add full available optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.
Some toolchain configs would report:
fatal error: gnu/stubs-ilp32.h: No such file or directory
Fix method suggested by Juzhe-Zhong
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h: New file.
Signed-off-by: demin.han <demin.han@starfivetech.com>
Before this patch, a simple conversion case produced the following RVV codegen:
foo:
ble a2,zero,.L8
addiw a5,a2,-1
li a4,6
bleu a5,a4,.L6
srliw a3,a2,3
slli a3,a3,3
add a3,a3,a0
mv a5,a0
mv a4,a1
vsetivli zero,8,e16,m1,ta,ma
.L4:
vle8.v v2,0(a5)
addi a5,a5,8
vzext.vf2 v1,v2
vse16.v v1,0(a4)
addi a4,a4,16
bne a3,a5,.L4
andi a5,a2,-8
beq a2,a5,.L10
.L3:
slli a4,a5,32
srli a4,a4,32
subw a2,a2,a5
slli a2,a2,32
slli a5,a4,1
srli a2,a2,32
add a0,a0,a4
add a1,a1,a5
vsetvli zero,a2,e16,m1,ta,ma
vle8.v v2,0(a0)
vzext.vf2 v1,v2
vse16.v v1,0(a1)
.L8:
ret
.L10:
ret
.L6:
li a5,0
j .L3
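The source is a widening conversion loop along these lines (a hypothetical
reconstruction; the exact test case is not shown here):

#include <cstdint>

void foo (const std::uint8_t *a, std::uint16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    b[i] = a[i];  /* each 8-bit element zero-extends to 16 bits */
}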
This vectorization goes through the first loop:
vsetivli zero,8,e16,m1,ta,ma
.L4:
vle8.v v2,0(a5)
addi a5,a5,8
vzext.vf2 v1,v2
vse16.v v1,0(a4)
addi a4,a4,16
bne a3,a5,.L4
Each iteration processes 8 elements.
For scalable vectorization on a CPU with VLEN > 128 bits, this is fine
when VLEN = 128, but whenever VLEN > 128 bits it wastes CPU resources:
with VLEN = 256 bits, for example, only half of the vector unit is
working and the other half is idle.
After investigation, I realized that I forgot to adjust the COST for
SELECT_VL. So, adjust the COST for SELECT_VL style length
vectorization from 3 to 2, since after this patch:
foo:
ble a2,zero,.L5
.L3:
vsetvli a5,a2,e16,m1,ta,ma -----> SELECT_VL cost.
vle8.v v2,0(a0)
slli a4,a5,1 -----> additional shift of outcome SELECT_VL for memory address calculation.
vzext.vf2 v1,v2
sub a2,a2,a5
vse16.v v1,0(a1)
add a0,a0,a5
add a1,a1,a4
bne a2,zero,.L3
.L5:
ret
This patch is a simple fix for something I previously forgot.
Ok for trunk ?
If not, I will adjust the cost in the backend cost model instead.
PR target/111317
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust COST for decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
The following testcase ICEs because a PHI argument from the latch edge
uses an SSA_NAME that is set only in a conditionally executed block
inside of the loop.
This happens when we have some outer cast which lowers its operand several
times, under some condition with variable index, under different condition
with some constant index, otherwise something else, and then there is
an inner cast from non-_BitInt integer (or small/middle one). Such cast
in certain conditions is emitted by initializing some SSA_NAMEs in the
initialization statements before loops (say for casts from <= limb size
precision by computing a SSA_NAME for the first limb and then extension
of it for the later limbs) and uses the prepare_data_in_out function
to create a PHI node. That function is passed the value (constant or
SSA_NAME) to use in the PHI argument from the pre-header edge, but for
the latch edge it always created a new SSA_NAME, and the caller then
emitted in 3 following spots an extra assignment to set that SSA_NAME
to whatever value we want from the latch edge. In all these 3 cases
the argument from the latch edge is known already before the loop though,
either constant or SSA_NAME computed in pre-header as well.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.
The following patch fixes it by extending the prepare_data_in_out
method, so that when the latch edge argument is known beforehand
(constant or computed in the pre-header), we can just use it directly
and avoid the extra assignment, which would otherwise hopefully be
optimized away later into what we now emit directly.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.
* gcc.dg/bitint-53.c: New test.
The r14-6076 commit changed the allocation of attribute tables from
table = new attribute_spec[2];
to
table = new attribute_spec { ... };
with
ignored_attributes_table.safe_push (table);
later in both cases, but didn't change the corresponding delete in
free_attr_data, which means valgrind is unhappy about that:
FAIL: c-c++-common/Wno-attributes-2.c -Wc++-compat (test for excess errors)
Excess errors:
==974681== Mismatched free() / delete / delete []
==974681== at 0x484965B: operator delete[](void*) (vg_replace_malloc.c:1103)
==974681== by 0x707434: free_attr_data() (attribs.cc:318)
==974681== by 0xCFF8A4: compile_file() (toplev.cc:454)
==974681== by 0x704D23: do_compile (toplev.cc:2150)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
==974681== Address 0x51dffa0 is 0 bytes inside a block of size 40 alloc'd
==974681== at 0x4845FF5: operator new(unsigned long) (vg_replace_malloc.c:422)
==974681== by 0x70A040: handle_ignored_attributes_option(vec<char*, va_heap, vl_ptr>*) (attribs.cc:301)
==974681== by 0x7FA089: handle_pragma_diagnostic_impl<false, false> (c-pragma.cc:934)
==974681== by 0x7FA089: handle_pragma_diagnostic(cpp_reader*) (c-pragma.cc:1028)
==974681== by 0x75814F: c_parser_pragma(c_parser*, pragma_context, bool*) (c-parser.cc:14707)
==974681== by 0x784A85: c_parser_external_declaration(c_parser*) (c-parser.cc:2027)
==974681== by 0x785223: c_parser_translation_unit (c-parser.cc:1900)
==974681== by 0x785223: c_parse_file() (c-parser.cc:26713)
==974681== by 0x7F6331: c_common_parse_file() (c-opts.cc:1301)
==974681== by 0xCFF87D: compile_file() (toplev.cc:446)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
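The mismatch boils down to this pattern (a minimal illustration, not the
actual attribs.cc code):

struct spec { int dummy; };

int main ()
{
  /* Before r14-6076: array new, correctly paired with delete[].  */
  spec *t1 = new spec[2];
  delete[] t1;

  /* After r14-6076: scalar new, but free_attr_data kept delete[];
     the scalar delete below is what the fix uses.  */
  spec *t2 = new spec { 0 };
  delete t2;
}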
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112953
* attribs.cc (free_attr_data): Use delete x rather than delete[] x.
The following patch fixes ICE on the testcase in similar way to how
other folded builtins are handled in ix86_gimple_fold_builtin when
they don't have a lhs; these builtins are const or pure, so normally
DCE would remove them later, but with -O0 that isn't guaranteed to
happen, and if they are marked TREE_SIDE_EFFECTS, their expansion may
still be attempted.
This removes them right away during the folding.
Initially I wanted to also change all gsi_replace last args in that function
to true, but Andrew pointed to PR107209, so I've kept them as is.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR target/112962
* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
and abs without lhs replace with nop.
* gcc.target/i386/pr112962.c: New test.
When investigating PR111591 with respect to TBAA and stack slot sharing
I noticed we're eventually scrapping a [TARGET_]MEM_REF offset when
rewriting the VAR_DECL base of the MEM_EXPR to use a pointer to the
partition instead. The following makes sure to preserve that.
* emit-rtl.cc (set_mem_attributes_minus_bitpos): Preserve
the offset when rewriting an existing MEM_REF base for
stack slot sharing.
The following does away with adding a fake edge as in the original
PR112961 fix and instead exposes handling of entry PHIs as an
additional parameter of the region VN run.
PR tree-optimization/112991
PR tree-optimization/112961
* tree-ssa-sccvn.h (do_rpo_vn): Add skip_entry_phis argument.
* tree-ssa-sccvn.cc (do_rpo_vn): Likewise.
(do_rpo_vn_1): Likewise, merge with auto-processing.
(run_rpo_vn): Adjust.
(pass_fre::execute): Likewise.
* tree-if-conv.cc (tree_if_conversion): Revert last change.
Value-number latch block but disable value-numbering of
entry PHIs.
* tree-ssa-uninit.cc (execute_early_warn_uninitialized): Adjust.
* gcc.dg/torture/pr112991.c: New testcase.
The following avoids creating an unsupported VEC_PERM after vector
lowering from the pattern merging a bit-insert from a bit-field-ref
to a VEC_PERM. For the already existing s390 testcase we get
TImode vectors which later ICE during attempted expansion of
a vec_perm_const.
PR tree-optimization/112990
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..):
Restrict to vector modes after lowering.
While tidying the prototype patch I did for the reduced testcase in
PR111591, and in the process trying to produce a testcase that is
miscompiled by stack slot coalescing with the TBAA info left
unaltered, I realized we do not need to adjust TBAA info.
The following documents this in the place we adjust points-to info
which we do need to adjust.
PR middle-end/111591
* cfgexpand.cc (update_alias_info_with_stack_vars): Document
why not adjusting TBAA info on accesses is OK.
Rather than a dubious fix for a dubious warning, namely adding a
period after a parenthesized @xref because the warning demands it, use
@pxref that is meant for exactly this case. Thanks to Joseph Myers
for introducing me to it.
for gcc/ChangeLog
* doc/invoke.texi (multiflags): Drop extraneous period, use
@pxref instead.
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
----------------------------
void __pldx (/*constant*/ unsigned int /*access_kind*/,
/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pld (void const volatile *addr);
2. Instruction prefetch intrinsics:
-----------------------------------
void __plix (/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pli (void const volatile *addr);
`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.
While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.
`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.
`__pld' and `__pli' prefetch data and instructions, respectively,
using default values for both cache-level and retention policies.
Bootstrapped and tested on aarch64-none-linux-gnu.
[1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin): Add prefetch expansion.
(require_const_argument): New.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): Likewise.
* config/aarch64/arm_acle.h (__pld): Likewise.
(__pli): Likewise.
(__plix): Likewise.
(__pldx): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/builtin_pld_pli.c: New.
* gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
As PR112788 shows, on rs6000 with -mabi=ieeelongdouble, type _Float128
has a different type precision (128) from that (127) of type long
double, but they actually have the same underlying mode, so they should
have the same precision, as the mode indicates the same real type
format ieee_quad_format.
It's not sensible to have such two types which have the same mode but
different type precisions, some fix attempt was posted at [1].
As the discussion there, there are some historical reasons and
practical issues. Considering we have passed stage 1 and it also
affected the build as reported, this patch temporarily works around
the issue. I thought of introducing a hook, but that seems a bit
overkill; assuming that scalar float types with the same mode have
the same precision looks sensible.
[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd5b4@linux.ibm.com/
PR tree-optimization/112788
gcc/ChangeLog:
* value-range.h (range_compatible_p): Workaround same type mode but
different type precision issue for rs6000 scalar float types
_Float128 and long double.
For building a constant, e.g. r120=0x66666666, which does not fit
'li' or 'lis', 'pli' is used via 'emit_move_insn'.
For a more complicated constant, e.g. 0x6666666666666666ULL, however,
when 'rs6000_emit_set_long_const' splits the constant recursively, it
fails to use 'pli' to build the half-part constant 0x66666666.
'rs6000_emit_set_long_const' could be updated to use 'pli' to build a
half part of the constant when necessary. For example, for
0x6666666666666666ULL, "pli 3,1717986918; rldimi 3,3,32,0" can be used.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34-bit constants.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/const-build-1.c: New test.
Trunk GCC supports building more constants via two instructions,
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic", so num_insns_constant
should also be updated.
Function "rs6000_emit_set_long_const" is used to build complicated
constants, and "num_insns_constant_gpr" is used to compute how many
instructions are needed to build a constant. So, these two functions
should be kept aligned.
The idea of this patch is to reuse "rs6000_emit_set_long_const" to
compute/record the number of instructions (when only computing the
count, no instructions are emitted).
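The shape of the reuse is roughly the following (a generic illustration
with hypothetical names, not the rs6000 code):

#include <cstdio>
#include <string>
#include <vector>

/* One routine either emits instructions or, when NUM is non-null,
   only counts how many it would emit.  */
static void
emit_or_count (std::vector<std::string> *stream, const char *insn, int *num)
{
  if (num)
    ++*num;                    /* counting mode: record only */
  else
    stream->push_back (insn);  /* emitting mode */
}

int main ()
{
  int n = 0;
  emit_or_count (nullptr, "pli 3,1717986918", &n);
  emit_or_count (nullptr, "rldimi 3,3,32,0", &n);
  std::printf ("insns needed: %d\n", n);  /* prints 2 */
}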
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.
The FUNCTION_DECL check ignored member function templates.
gcc/cp/ChangeLog:
* class.cc (propagate_class_warmth_attribute): Handle
member templates.
gcc/testsuite/ChangeLog:
* g++.dg/ext/attr-hotness.C: Add member templates.
Co-authored-by: Jason Xu <rxu@DRWHoldings.com>
Adding a testcase for PR112822 to ensure we won't regress.
2023-12-12 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: New test.
When I added a fast path for std::format("{}", x) in
r14-5587-g41a5ea4cab2c59 I forgot to handle char separately from other
integral types. That caused std::format("{}", 'c') to return "99"
instead of "c".
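A minimal check of the fixed behaviour:

#include <format>
#include <cassert>

int main ()
{
  /* Before the fix this produced "99", the integer value of 'c'.  */
  assert (std::format ("{}", 'c') == "c");
}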
libstdc++-v3/ChangeLog:
* include/std/format (__do_vformat_to): Handle char separately
from other integral types.
* testsuite/std/format/functions/format.cc: Check for expected
output for char and bool arguments.
* testsuite/std/format/string.cc: Check that 0 filling is
rejected for character and string formats.
During discussion of LWG 4022 I noticed that we do not correctly
implement floored division for the century. We were just truncating
towards zero, rather than applying the floor function. For negative
values that rounds the wrong way.
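For example (an illustrative check, not the library code): the century of
year -101 is floor(-101/100) = -2, while truncation towards zero would
give -1.

#include <chrono>
#include <format>
#include <iostream>

int main ()
{
  using namespace std::chrono;
  /* %C is the floored century: -2 here, not the truncated -1.  */
  std::cout << std::format ("{:%C}", year (-101)) << '\n';
}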
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Fix
rounding for negative centuries.
* testsuite/std/time/year/io.cc: Check %C for negative years.
In r14-4060-gc4baeaecbbf7d0 I moved some files from src/c++98 to
src/c++11 but I didn't remove the redundant -std=gnu++11 flags for those
files. The flags aren't needed now, because AM_CXXFLAGS for that
directory already uses -std=gnu++11. This removes them.
libstdc++-v3/ChangeLog:
* src/c++11/Makefile.am: Remove redundant -std=gnu++11 flags.
* src/c++11/Makefile.in: Regenerate.
PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR. This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions).
gcc/ChangeLog:
2023-12-12 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/112822
* tree-sra.cc (load_assign_lhs_subreplacements): Invoke
force_gimple_operand_gsi also when LHS has partial stores and RHS is a
VIEW_CONVERT_EXPR.
In discussion of PR71093 it came up that more clobber_kind options would be
useful within the C++ front-end.
gcc/ChangeLog:
* tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
CLOBBER_STORAGE_END. Add CLOBBER_STORAGE_BEGIN,
CLOBBER_OBJECT_BEGIN, CLOBBER_OBJECT_END.
* gimple-lower-bitint.cc
* gimple-ssa-warn-access.cc
* gimplify.cc
* tree-inline.cc
* tree-ssa-ccp.cc: Adjust for rename.
* tree-pretty-print.cc: And handle new values.
gcc/cp/ChangeLog:
* call.cc (build_trivial_dtor_call): Use CLOBBER_OBJECT_END.
* decl.cc (build_clobber_this): Take clobber_kind argument.
(start_preparsed_function): Pass CLOBBER_OBJECT_BEGIN.
(begin_destructor_body): Pass CLOBBER_OBJECT_END.
gcc/testsuite/ChangeLog:
* gcc.dg/pr87052.c: Adjust expected CLOBBER output.
Co-authored-by: Nathaniel Shead <nathanieloshead@gmail.com>
Refactor the parsing to have a single API and fix a few parsing issues:
- Different handling of "bti+none" and "none+bti": these should be
rejected because "none" can only appear alone.
- Empty strings such as "bti++pac-ret" or "bti+" were accepted; this
bug was caused by using strtok_r.
- Memory was leaked (str_root was never freed), and two buffers were
allocated when one is enough.
allocated when one is enough.
The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally. The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options): Update.
(aarch64_handle_attr_branch_protection): Update.
* config/arm/aarch-common-protos.h (aarch_parse_branch_protection):
Remove.
(aarch_validate_mbranch_protection): Add new argument.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Update.
(aarch_handle_standard_branch_protection): Update.
(aarch_handle_pac_ret_protection): Update.
(aarch_handle_pac_ret_leaf): Update.
(aarch_handle_pac_ret_b_key): Update.
(aarch_handle_bti_protection): Update.
(aarch_parse_branch_protection): Remove.
(next_tok): New.
(aarch_validate_mbranch_protection): Rewrite.
* config/arm/aarch-common.h (struct aarch_branch_protect_type):
Add field "alone".
* config/arm/arm.cc (arm_configure_build_target): Update.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/branch-protection-attr.c: Update.
* gcc.target/aarch64/branch-protection-option.c: Update.
On aarch64 this caused an ICE with pragma push_options since
commit ae54c1b099
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
CommitDate: 2022-06-01 18:13:57 +0100
AArch64: Cleanup option processing code
The failure is at pop_options:
internal compiler error: ‘global_options’ are modified in local context
On arm the variable was unused.
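A reproducer sketch (hypothetical; the ICE appeared at the pop_options):

#pragma GCC push_options
#pragma GCC target ("branch-protection=pac-ret")
void f (void) {}
#pragma GCC pop_options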
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Do not override branch_protection options.
(aarch64_override_options): Remove accepted_branch_protection_string.
* config/arm/aarch-common.cc (BRANCH_PROTECT_STR_MAX): Remove.
(aarch_parse_branch_protection): Remove
accepted_branch_protection_string.
* config/arm/arm.cc: Likewise.
The following avoids over/under-reading of storage when vectorizing
a non-grouped load with SLP. Instead of forcing peeling for gaps,
use a smaller load for the last vector, which might access excess
elements. This builds upon the existing optimization avoiding
peeling for gaps, generalizing it to all gap widths leaving a
power-of-two remaining number of elements (but it doesn't replace
or improve that particular case at this point).
I wonder if the poly relational compares I set up are good enough
to guarantee /* remain should now be > 0 and < nunits. */.
There is existing test coverage that always runs into
/* DR will be unused. */ when the gap is wider than nunits. Compared to the
existing gap == nunits/2 case this only adjusts the load that will
cause the overrun at the end, not every load. Apart from the
poly relational compares it should reliably cover these cases but
I'll leave it for stage1 to remove.
PR tree-optimization/112736
* tree-vect-stmts.cc (vectorizable_load): Extend optimization
to avoid peeling for gaps to handle single-element non-groups
we now allow with SLP.
* gcc.dg/torture/pr112736.c: New testcase.
The following adds no_icf handling for variables where the attribute
was rejected. It also fixes the check for no_icf by checking both
the source and the target's decl.
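A sketch of what the attribute now covers (illustrative; without the
attribute, ICF could merge the identical tables):

__attribute__((no_icf)) static const int tab1[4] = { 1, 2, 3, 4 };
__attribute__((no_icf)) static const int tab2[4] = { 1, 2, 3, 4 };

int main () { return tab1[0] + tab2[0]; }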
PR ipa/92606
gcc/c-family/
* c-attribs.cc (handle_noicf_attribute): Also allow the
attribute on global variables.
gcc/
* ipa-icf.cc (sem_item_optimizer::merge_classes): Check
both source and alias for the no_icf attribute.
* doc/extend.texi (no_icf): Document variable attribute.
The following makes sure to also process the (empty) latch when
performing CSE on the if-converted loop body. That's important
to get all uses of copies propagated out on the backedge as well.
To avoid CSE of the PHI nodes themselves, which is prohibitive
(see PR90402), this temporarily adds a fake entry edge to the loop.
PR tree-optimization/112961
* tree-if-conv.cc (tree_if_conversion): Instead of excluding
the latch block from VN, add a fake entry edge.
* g++.dg/vect/pr112961.cc: New testcase.
With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
FE_INEXACT and it should produce an integral result (if the input is not
NaN or Inf). Thus FE_INEXACT should not be raised.
But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN,
non-Inf value; C23 recommends doing so in a footnote.
Thus we should not simplify (int)trunc(x) to (int)x when
-fno-fp-int-builtin-inexact is in effect.
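An illustration of the difference (a sketch; the observable behaviour
depends on the -f[no-]fp-int-builtin-inexact setting):

#include <cfenv>
#include <cmath>
#include <cstdio>

int main ()
{
  volatile double x = 1.5;

  std::feclearexcept (FE_ALL_EXCEPT);
  int i = (int) std::trunc (x);  /* trunc must not raise FE_INEXACT under
                                    -fno-fp-int-builtin-inexact, and
                                    converting the integral result is exact */
  std::printf ("trunc path inexact: %d\n",
               std::fetestexcept (FE_INEXACT) != 0);

  std::feclearexcept (FE_ALL_EXCEPT);
  int j = (int) x;               /* may raise FE_INEXACT (C23 footnote) */
  std::printf ("direct path inexact: %d\n",
               std::fetestexcept (FE_INEXACT) != 0);
  return i - j;
}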
gcc/ChangeLog:
PR middle-end/107723
* convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
early if !flag_fp_int_builtin_inexact and flag_trapping_math.
gcc/testsuite/ChangeLog:
PR middle-end/107723
* gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.