The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, such as APUs,
and for implementing OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate metadata and
mark the load instructions as early-clobber. When the port supports scheduling
of s_waitcnt instructions there will be further requirements.
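As a usage illustration (a minimal sketch, not part of this patch): with
Unified Shared Memory the device can dereference ordinary host pointers, so
no map clauses are needed for the pointed-to storage.

#include <cstdlib>

#pragma omp requires unified_shared_memory

int main ()
{
  int *p = (int *) std::malloc (4 * sizeof (int));
  for (int i = 0; i < 4; i++)
    p[i] = i;
  /* With USM the device dereferences the host pointer directly.  */
  #pragma omp target
  for (int i = 0; i < 4; i++)
    p[i] *= 2;
  int ok = p[3] == 6;
  std::free (p);
  return !ok;
}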
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (NO_XNACK): Ignore missing -march.
(XNACKOPT): Match on/off; ignore any.
* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
Add xnack compatible alternatives.
(gather<mode>_insn_2offsets<exec>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
other than Fiji and gfx1030.
(gcn_expand_epilogue): Remove early-clobber problems.
(gcn_hsa_declare_function_name): Obey -mxnack setting.
* config/gcn/gcn.md (xnack): New attribute.
(enabled): Rework to include "xnack" attribute.
(*movbi): Add xnack compatible alternatives.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.
* config/gcn/gcn.opt (-mxnack): Change the default to "any".
* doc/invoke.texi: Remove placeholder notice for -mxnack.
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation works for page-scale allocations; finer-grained
allocations will be implemented in a future patch.
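A minimal sketch of the approach (not the actual libgomp code; error
handling reduced to the essentials):

#include <sys/mman.h>
#include <cstddef>

void *pinned_alloc (std::size_t size)
{
  /* mmap rather than malloc, so the mapping can be munlock'd and
     munmap'd independently of any heap when it is freed.  */
  void *p = mmap (nullptr, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return nullptr;
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return nullptr;
    }
  return p;
}

void pinned_free (void *p, std::size_t size)
{
  munlock (p, size);
  munmap (p, size);
}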
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Add a dg-do compile target directive that restricts the test case to
being built with C++17 or later.
2023-12-13 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: Add dg-do compile target c++17 directive.
Refine the test cases:
* Follow the test naming convention.
* Add run test cases.
These test cases used to cause out-of-bounds writes to the stack and
therefore showed unreliable behavior: depending on the execution
environment they could either pass or fail. As of now, with the
latest QEMU version, they pass even without the underlying issue
fixed. As the test cases are known to have triggered the problem
before, we keep them as run test cases for future reference.
PR target/112929
PR target/112988
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
This patch improves the code generated for bitfield sign extensions on
ARC CPUs without a barrel shifter.
Compiling the following test case:
int foo(int x) { return (x<<27)>>27; }
with -O2 -mcpu=em, generates two loops:
foo: mov lp_count,27
lp 2f
add r0,r0,r0
nop
2: # end single insn loop
mov lp_count,27
lp 2f
asr r0,r0
nop
2: # end single insn loop
j_s [blink]
and the closely related test case:
struct S { int a : 5; };
int bar (struct S *p) { return p->a; }
generates the slightly better:
bar: ldb_s r0,[r0]
mov_s r2,0 ;3
add3 r0,r2,r0
sexb_s r0,r0
asr_s r0,r0
asr_s r0,r0
j_s.d [blink]
asr_s r0,r0
which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"]. Using this, the sign
extensions above on ARC's EM both become:
bmsk_s r0,r0,4
xor r0,r0,16
sub r0,r0,16
which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
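In C, the idiom for the 5-bit case looks like this (illustrative only):

#include <cassert>

/* Sign-extend a 5-bit field: ((x & mask) ^ msb) - msb
   with mask = 0x1f and msb = 0x10.  */
static int sext5 (int x) { return ((x & 0x1f) ^ 0x10) - 0x10; }

int main ()
{
  for (int x = 0; x < 32; x++)
    assert (sext5 (x) == (x < 16 ? x : x - 32));
}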
2023-12-13 Roger Sayle <roger@nextmovesoftware.com>
Jeff Law <jlaw@ventanamicro.com>
gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using an AND, XOR and MINUS sequence.
gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.
Because the crypto vector extensions depend on the vector extension,
add the implied ISA info for the corresponding crypto vector extensions.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Modify implied ISA info.
* config/riscv/arch-canonicalize: Add crypto vector implied info.
The change in r14-6468-ga01462ae8bafa8 was only supposed to apply to %C
formats, not %Y.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Do
not round century down for %Y formats.
This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by
Eric Gallager <egallager@gcc.gnu.org>.
ChangeLog:
* Makefile.def (gettext): Disable (via missing)
{install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install
to true. Add --disable-threads --disable-libasprintf. Drop the
lib_path (as there are no shared libs).
* Makefile.in: Regenerate.
Fix a VSETVL bug where the AVL gets polluted:
.L15:
li a3,9
lui a4,%hi(s)
sw a3,%lo(j)(t2)
sh a5,%lo(s)(a4) <-- a4 holds the address of s
beq t0,zero,.L42
sw t5,8(t4)
vsetvli zero,a4,e8,m8,ta,ma <<-- a4 used as the AVL
Actually, this vsetvl is redundant.
The root cause is that we include the full-available optimization in the
LCM local data computation; the full-available optimization should happen
after the LCM computation.
PR target/112929
PR target/112988
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::compute_lcm_local_properties): Remove full available.
(pre_vsetvl::pre_global_vsetvl_info): Add full available optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.
Some toolchain configs would report:
fatal error: gnu/stubs-ilp32.h: No such file or directory
Fix method suggested by Juzhe-Zhong
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h: New file.
Signed-off-by: demin.han <demin.han@starfivetech.com>
Before this patch, a simple conversion case produced the following RVV codegen:
foo:
ble a2,zero,.L8
addiw a5,a2,-1
li a4,6
bleu a5,a4,.L6
srliw a3,a2,3
slli a3,a3,3
add a3,a3,a0
mv a5,a0
mv a4,a1
vsetivli zero,8,e16,m1,ta,ma
.L4:
vle8.v v2,0(a5)
addi a5,a5,8
vzext.vf2 v1,v2
vse16.v v1,0(a4)
addi a4,a4,16
bne a3,a5,.L4
andi a5,a2,-8
beq a2,a5,.L10
.L3:
slli a4,a5,32
srli a4,a4,32
subw a2,a2,a5
slli a2,a2,32
slli a5,a4,1
srli a2,a2,32
add a0,a0,a4
add a1,a1,a5
vsetvli zero,a2,e16,m1,ta,ma
vle8.v v2,0(a0)
vzext.vf2 v1,v2
vse16.v v1,0(a1)
.L8:
ret
.L10:
ret
.L6:
li a5,0
j .L3
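The source is a widening conversion loop along these lines (a hypothetical
reconstruction; the exact test case is not shown here):

#include <cstdint>

void foo (const std::uint8_t *a, std::uint16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    b[i] = a[i];  /* each 8-bit element zero-extends to 16 bits */
}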
This vectorization goes through the first loop:
vsetivli zero,8,e16,m1,ta,ma
.L4:
vle8.v v2,0(a5)
addi a5,a5,8
vzext.vf2 v1,v2
vse16.v v1,0(a4)
addi a4,a4,16
bne a3,a5,.L4
Each iteration processes 8 elements.
For scalable vectorization on a CPU with VLEN > 128 bits, this is fine
when VLEN = 128, but whenever VLEN > 128 bits it wastes CPU resources:
with VLEN = 256 bits, for example, only half of the vector unit is
working and the other half is idle.
After investigation, I realized that I forgot to adjust the COST for
SELECT_VL. So, adjust the COST for SELECT_VL style length
vectorization from 3 to 2, since after this patch:
foo:
ble a2,zero,.L5
.L3:
vsetvli a5,a2,e16,m1,ta,ma -----> SELECT_VL cost.
vle8.v v2,0(a0)
slli a4,a5,1 -----> additional shift of outcome SELECT_VL for memory address calculation.
vzext.vf2 v1,v2
sub a2,a2,a5
vse16.v v1,0(a1)
add a0,a0,a5
add a1,a1,a4
bne a2,zero,.L3
.L5:
ret
This patch is a simple fix for something I previously forgot.
Ok for trunk ?
If not, I will adjust the cost in the backend cost model instead.
PR target/111317
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust COST for decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
The following testcase ICEs because a PHI argument from the latch edge
uses an SSA_NAME that is set only in a conditionally executed block
inside of the loop.
This happens when we have some outer cast which lowers its operand several
times, under some condition with variable index, under different condition
with some constant index, otherwise something else, and then there is
an inner cast from non-_BitInt integer (or small/middle one). Such cast
in certain conditions is emitted by initializing some SSA_NAMEs in the
initialization statements before loops (say for casts from <= limb size
precision by computing a SSA_NAME for the first limb and then extension
of it for the later limbs) and uses the prepare_data_in_out function
to create a PHI node. That function is passed the value (constant or
SSA_NAME) to use in the PHI argument from the pre-header edge, but for
the latch edge it always created a new SSA_NAME, and the caller then
emitted in 3 following spots an extra assignment to set that SSA_NAME
to whatever value we want from the latch edge. In all these 3 cases
the argument from the latch edge is known already before the loop though,
either constant or SSA_NAME computed in pre-header as well.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.
The following patch fixes it by extending the prepare_data_in_out
method, so that when the latch edge argument is known beforehand
(constant or computed in the pre-header), we can just use it directly
and avoid the extra assignment, which would otherwise hopefully be
optimized away later into what we now emit directly.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.
* gcc.dg/bitint-53.c: New test.
The r14-6076 commit changed the allocation of attribute tables from
table = new attribute_spec[2];
to
table = new attribute_spec { ... };
with
ignored_attributes_table.safe_push (table);
later in both cases, but didn't change the corresponding delete in
free_attr_data, which means valgrind is unhappy about that:
FAIL: c-c++-common/Wno-attributes-2.c -Wc++-compat (test for excess errors)
Excess errors:
==974681== Mismatched free() / delete / delete []
==974681== at 0x484965B: operator delete[](void*) (vg_replace_malloc.c:1103)
==974681== by 0x707434: free_attr_data() (attribs.cc:318)
==974681== by 0xCFF8A4: compile_file() (toplev.cc:454)
==974681== by 0x704D23: do_compile (toplev.cc:2150)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
==974681== Address 0x51dffa0 is 0 bytes inside a block of size 40 alloc'd
==974681== at 0x4845FF5: operator new(unsigned long) (vg_replace_malloc.c:422)
==974681== by 0x70A040: handle_ignored_attributes_option(vec<char*, va_heap, vl_ptr>*) (attribs.cc:301)
==974681== by 0x7FA089: handle_pragma_diagnostic_impl<false, false> (c-pragma.cc:934)
==974681== by 0x7FA089: handle_pragma_diagnostic(cpp_reader*) (c-pragma.cc:1028)
==974681== by 0x75814F: c_parser_pragma(c_parser*, pragma_context, bool*) (c-parser.cc:14707)
==974681== by 0x784A85: c_parser_external_declaration(c_parser*) (c-parser.cc:2027)
==974681== by 0x785223: c_parser_translation_unit (c-parser.cc:1900)
==974681== by 0x785223: c_parse_file() (c-parser.cc:26713)
==974681== by 0x7F6331: c_common_parse_file() (c-opts.cc:1301)
==974681== by 0xCFF87D: compile_file() (toplev.cc:446)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
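The mismatch boils down to this pattern (a minimal illustration, not the
actual attribs.cc code):

struct spec { int dummy; };

int main ()
{
  /* Before r14-6076: array new, correctly paired with delete[].  */
  spec *t1 = new spec[2];
  delete[] t1;

  /* After r14-6076: scalar new, but free_attr_data kept delete[];
     the scalar delete below is what the fix uses.  */
  spec *t2 = new spec { 0 };
  delete t2;
}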
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112953
* attribs.cc (free_attr_data): Use delete x rather than delete[] x.
The following patch fixes ICE on the testcase in similar way to how
other folded builtins are handled in ix86_gimple_fold_builtin when
they don't have a lhs; these builtins are const or pure, so normally
DCE would remove them later, but with -O0 that isn't guaranteed to
happen, and if they are marked TREE_SIDE_EFFECTS, their expansion may
still be attempted.
This removes them right away during the folding.
Initially I wanted to also change all gsi_replace last args in that function
to true, but Andrew pointed to PR107209, so I've kept them as is.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR target/112962
* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
and abs without lhs replace with nop.
* gcc.target/i386/pr112962.c: New test.
When investigating PR111591 with respect to TBAA and stack slot sharing
I noticed we're eventually scrapping a [TARGET_]MEM_REF offset when
rewriting the VAR_DECL base of the MEM_EXPR to use a pointer to the
partition instead. The following makes sure to preserve that.
* emit-rtl.cc (set_mem_attributes_minus_bitpos): Preserve
the offset when rewriting an existing MEM_REF base for
stack slot sharing.
The following does away with adding a fake edge as in the original
PR112961 fix and instead exposes handling of entry PHIs as an
additional parameter of the region VN run.
PR tree-optimization/112991
PR tree-optimization/112961
* tree-ssa-sccvn.h (do_rpo_vn): Add skip_entry_phis argument.
* tree-ssa-sccvn.cc (do_rpo_vn): Likewise.
(do_rpo_vn_1): Likewise, merge with auto-processing.
(run_rpo_vn): Adjust.
(pass_fre::execute): Likewise.
* tree-if-conv.cc (tree_if_conversion): Revert last change.
Value-number latch block but disable value-numbering of
entry PHIs.
* tree-ssa-uninit.cc (execute_early_warn_uninitialized): Adjust.
* gcc.dg/torture/pr112991.c: New testcase.
The following avoids creating an unsupported VEC_PERM after vector
lowering from the pattern merging a bit-insert from a bit-field-ref
to a VEC_PERM. For the already existing s390 testcase we get
TImode vectors which later ICE during attempted expansion of
a vec_perm_const.
PR tree-optimization/112990
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..):
Restrict to vector modes after lowering.
While tidying the prototype patch I did for the reduced testcase in
PR111591, and in the process trying to produce a testcase that is
miscompiled by stack slot coalescing with the TBAA info left
unaltered, I realized we do not need to adjust TBAA info.
The following documents this in the place we adjust points-to info
which we do need to adjust.
PR middle-end/111591
* cfgexpand.cc (update_alias_info_with_stack_vars): Document
why not adjusting TBAA info on accesses is OK.
Rather than a dubious fix for a dubious warning, namely adding a
period after a parenthesized @xref because the warning demands it, use
@pxref that is meant for exactly this case. Thanks to Joseph Myers
for introducing me to it.
for gcc/ChangeLog
* doc/invoke.texi (multiflags): Drop extraneous period, use
@pxref instead.
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
----------------------------
void __pldx (/*constant*/ unsigned int /*access_kind*/,
/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pld (void const volatile *addr);
2. Instruction prefetch intrinsics:
-----------------------------------
void __plix (/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pli (void const volatile *addr);
`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.
While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.
`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.
`__pld' and `__pli' prefetch data and instructions, respectively,
using default values for both cache-level and retention policies.
Bootstrapped and tested on aarch64-none-linux-gnu.
[1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin): Add prefetch expansion.
(require_const_argument): New.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): Likewise.
* config/aarch64/arm_acle.h (__pld): Likewise.
(__pli): Likewise.
(__plix): Likewise.
(__pldx): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/builtin_pld_pli.c: New.
* gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
As PR112788 shows, on rs6000 with -mabi=ieeelongdouble, type _Float128
has a different type precision (128) from that (127) of type long
double, but they actually have the same underlying mode, so they should
have the same precision, as the mode indicates the same real type
format ieee_quad_format.
It's not sensible to have such two types which have the same mode but
different type precisions, some fix attempt was posted at [1].
As the discussion there, there are some historical reasons and
practical issues. Considering we have passed stage 1 and it also
affected the build as reported, this patch temporarily works around
the issue. I thought of introducing a hook, but that seems a bit
overkill; assuming that scalar float types with the same mode have
the same precision looks sensible.
[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd5b4@linux.ibm.com/
PR tree-optimization/112788
gcc/ChangeLog:
* value-range.h (range_compatible_p): Workaround same type mode but
different type precision issue for rs6000 scalar float types
_Float128 and long double.
For building a constant, e.g. r120=0x66666666, which does not fit
'li' or 'lis', 'pli' is used via 'emit_move_insn'.
For a more complicated constant, e.g. 0x6666666666666666ULL, however,
when 'rs6000_emit_set_long_const' splits the constant recursively, it
fails to use 'pli' to build the half-part constant 0x66666666.
'rs6000_emit_set_long_const' could be updated to use 'pli' to build a
half part of the constant when necessary. For example, for
0x6666666666666666ULL, "pli 3,1717986918; rldimi 3,3,32,0" can be used.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34-bit constants.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/const-build-1.c: New test.
Trunk GCC supports building more constants via two instructions,
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic", so num_insns_constant
should also be updated.
Function "rs6000_emit_set_long_const" is used to build complicated
constants, and "num_insns_constant_gpr" is used to compute how many
instructions are needed to build a constant. So, these two functions
should be kept aligned.
The idea of this patch is to reuse "rs6000_emit_set_long_const" to
compute/record the number of instructions (when only computing the
count, no instructions are emitted).
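The shape of the reuse is roughly the following (a generic illustration
with hypothetical names, not the rs6000 code):

#include <cstdio>
#include <string>
#include <vector>

/* One routine either emits instructions or, when NUM is non-null,
   only counts how many it would emit.  */
static void
emit_or_count (std::vector<std::string> *stream, const char *insn, int *num)
{
  if (num)
    ++*num;                    /* counting mode: record only */
  else
    stream->push_back (insn);  /* emitting mode */
}

int main ()
{
  int n = 0;
  emit_or_count (nullptr, "pli 3,1717986918", &n);
  emit_or_count (nullptr, "rldimi 3,3,32,0", &n);
  std::printf ("insns needed: %d\n", n);  /* prints 2 */
}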
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.
The FUNCTION_DECL check ignored member function templates.
gcc/cp/ChangeLog:
* class.cc (propagate_class_warmth_attribute): Handle
member templates.
gcc/testsuite/ChangeLog:
* g++.dg/ext/attr-hotness.C: Add member templates.
Co-authored-by: Jason Xu <rxu@DRWHoldings.com>
Adding a testcase for PR112822 to ensure we won't regress.
2023-12-12 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: New test.
When I added a fast path for std::format("{}", x) in
r14-5587-g41a5ea4cab2c59 I forgot to handle char separately from other
integral types. That caused std::format("{}", 'c') to return "99"
instead of "c".
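A minimal check of the fixed behaviour:

#include <format>
#include <cassert>

int main ()
{
  /* Before the fix this produced "99", the integer value of 'c'.  */
  assert (std::format ("{}", 'c') == "c");
}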
libstdc++-v3/ChangeLog:
* include/std/format (__do_vformat_to): Handle char separately
from other integral types.
* testsuite/std/format/functions/format.cc: Check for expected
output for char and bool arguments.
* testsuite/std/format/string.cc: Check that 0 filling is
rejected for character and string formats.
During discussion of LWG 4022 I noticed that we do not correctly
implement floored division for the century. We were just truncating
towards zero, rather than applying the floor function. For negative
values that rounds the wrong way.
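For example (an illustrative check, not the library code): the century of
year -101 is floor(-101/100) = -2, while truncation towards zero would
give -1.

#include <chrono>
#include <format>
#include <iostream>

int main ()
{
  using namespace std::chrono;
  /* %C is the floored century: -2 here, not the truncated -1.  */
  std::cout << std::format ("{:%C}", year (-101)) << '\n';
}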
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Fix
rounding for negative centuries.
* testsuite/std/time/year/io.cc: Check %C for negative years.
In r14-4060-gc4baeaecbbf7d0 I moved some files from src/c++98 to
src/c++11 but I didn't remove the redundant -std=gnu++11 flags for those
files. The flags aren't needed now, because AM_CXXFLAGS for that
directory already uses -std=gnu++11. This removes them.
libstdc++-v3/ChangeLog:
* src/c++11/Makefile.am: Remove redundant -std=gnu++11 flags.
* src/c++11/Makefile.in: Regenerate.
PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR. This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions).
gcc/ChangeLog:
2023-12-12 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/112822
* tree-sra.cc (load_assign_lhs_subreplacements): Invoke
force_gimple_operand_gsi also when LHS has partial stores and RHS is a
VIEW_CONVERT_EXPR.
In discussion of PR71093 it came up that more clobber_kind options would be
useful within the C++ front-end.
gcc/ChangeLog:
* tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
CLOBBER_STORAGE_END. Add CLOBBER_STORAGE_BEGIN,
CLOBBER_OBJECT_BEGIN, CLOBBER_OBJECT_END.
* gimple-lower-bitint.cc
* gimple-ssa-warn-access.cc
* gimplify.cc
* tree-inline.cc
* tree-ssa-ccp.cc: Adjust for rename.
* tree-pretty-print.cc: And handle new values.
gcc/cp/ChangeLog:
* call.cc (build_trivial_dtor_call): Use CLOBBER_OBJECT_END.
* decl.cc (build_clobber_this): Take clobber_kind argument.
(start_preparsed_function): Pass CLOBBER_OBJECT_BEGIN.
(begin_destructor_body): Pass CLOBBER_OBJECT_END.
gcc/testsuite/ChangeLog:
* gcc.dg/pr87052.c: Adjust expected CLOBBER output.
Co-authored-by: Nathaniel Shead <nathanieloshead@gmail.com>
Refactor the parsing to have a single API and fix a few parsing issues:
- Different handling of "bti+none" and "none+bti": these should be
rejected because "none" can only appear alone.
- Empty strings such as "bti++pac-ret" or "bti+" were accepted; this
bug was caused by using strtok_r.
- Memory was leaked (str_root was never freed), and two buffers were
allocated when one is enough.
allocated when one is enough.
The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally. The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options): Update.
(aarch64_handle_attr_branch_protection): Update.
* config/arm/aarch-common-protos.h (aarch_parse_branch_protection):
Remove.
(aarch_validate_mbranch_protection): Add new argument.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Update.
(aarch_handle_standard_branch_protection): Update.
(aarch_handle_pac_ret_protection): Update.
(aarch_handle_pac_ret_leaf): Update.
(aarch_handle_pac_ret_b_key): Update.
(aarch_handle_bti_protection): Update.
(aarch_parse_branch_protection): Remove.
(next_tok): New.
(aarch_validate_mbranch_protection): Rewrite.
* config/arm/aarch-common.h (struct aarch_branch_protect_type):
Add field "alone".
* config/arm/arm.cc (arm_configure_build_target): Update.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/branch-protection-attr.c: Update.
* gcc.target/aarch64/branch-protection-option.c: Update.
On aarch64 this caused an ICE with pragma push_options since
commit ae54c1b099
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
CommitDate: 2022-06-01 18:13:57 +0100
AArch64: Cleanup option processing code
The failure is at pop_options:
internal compiler error: ‘global_options’ are modified in local context
On arm the variable was unused.
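A reproducer sketch (hypothetical; the ICE appeared at the pop_options):

#pragma GCC push_options
#pragma GCC target ("branch-protection=pac-ret")
void f (void) {}
#pragma GCC pop_options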
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Do not override branch_protection options.
(aarch64_override_options): Remove accepted_branch_protection_string.
* config/arm/aarch-common.cc (BRANCH_PROTECT_STR_MAX): Remove.
(aarch_parse_branch_protection): Remove
accepted_branch_protection_string.
* config/arm/arm.cc: Likewise.
The following avoids over/under-reading of storage when vectorizing
a non-grouped load with SLP. Instead of forcing peeling for gaps,
use a smaller load for the last vector, which might access excess
elements. This builds upon the existing optimization avoiding
peeling for gaps, generalizing it to all gap widths leaving a
power-of-two remaining number of elements (but it doesn't replace
or improve that particular case at this point).
I wonder if the poly relational compares I set up are good enough
to guarantee /* remain should now be > 0 and < nunits. */.
There is existing test coverage that always runs into
/* DR will be unused. */ when the gap is wider than nunits. Compared to the
existing gap == nunits/2 case this only adjusts the load that will
cause the overrun at the end, not every load. Apart from the
poly relational compares it should reliably cover these cases but
I'll leave it for stage1 to remove.
PR tree-optimization/112736
* tree-vect-stmts.cc (vectorizable_load): Extend optimization
to avoid peeling for gaps to handle single-element non-groups
we now allow with SLP.
* gcc.dg/torture/pr112736.c: New testcase.
The following adds no_icf handling for variables where the attribute
was rejected. It also fixes the check for no_icf by checking both
the source and the target's decl.
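A sketch of what the attribute now covers (illustrative; without the
attribute, ICF could merge the identical tables):

__attribute__((no_icf)) static const int tab1[4] = { 1, 2, 3, 4 };
__attribute__((no_icf)) static const int tab2[4] = { 1, 2, 3, 4 };

int main () { return tab1[0] + tab2[0]; }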
PR ipa/92606
gcc/c-family/
* c-attribs.cc (handle_noicf_attribute): Also allow the
attribute on global variables.
gcc/
* ipa-icf.cc (sem_item_optimizer::merge_classes): Check
both source and alias for the no_icf attribute.
* doc/extend.texi (no_icf): Document variable attribute.
The following makes sure to also process the (empty) latch when
performing CSE on the if-converted loop body. That's important
to get all uses of copies propagated out on the backedge as well.
To avoid CSE of the PHI nodes themselves, which is prohibitive
(see PR90402), this temporarily adds a fake entry edge to the loop.
PR tree-optimization/112961
* tree-if-conv.cc (tree_if_conversion): Instead of excluding
the latch block from VN, add a fake entry edge.
* g++.dg/vect/pr112961.cc: New testcase.
With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
FE_INEXACT and it should produce an integral result (if the input is not
NaN or Inf). Thus FE_INEXACT should not be raised.
But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN,
non-Inf value; C23 recommends doing so in a footnote.
Thus we should not simplify (int)trunc(x) to (int)x when
-fno-fp-int-builtin-inexact is in effect.
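An illustration of the difference (a sketch; the observable behaviour
depends on the -f[no-]fp-int-builtin-inexact setting):

#include <cfenv>
#include <cmath>
#include <cstdio>

int main ()
{
  volatile double x = 1.5;

  std::feclearexcept (FE_ALL_EXCEPT);
  int i = (int) std::trunc (x);  /* trunc must not raise FE_INEXACT under
                                    -fno-fp-int-builtin-inexact, and
                                    converting the integral result is exact */
  std::printf ("trunc path inexact: %d\n",
               std::fetestexcept (FE_INEXACT) != 0);

  std::feclearexcept (FE_ALL_EXCEPT);
  int j = (int) x;               /* may raise FE_INEXACT (C23 footnote) */
  std::printf ("direct path inexact: %d\n",
               std::fetestexcept (FE_INEXACT) != 0);
  return i - j;
}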
gcc/ChangeLog:
PR middle-end/107723
* convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
early if !flag_fp_int_builtin_inexact and flag_trapping_math.
gcc/testsuite/ChangeLog:
PR middle-end/107723
* gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.