FreeChainXenon/gcc - Aiden Isik's Forgejo Server

Author	SHA1	Message	Date
Richard Biener	0f7945417f	middle-end/113622 - handle store with variable index to register The following implements storing to a non-MEM_P with a variable offset. We usually avoid this by forcing expansion to memory but this doesn't work for hard register variables. The solution is to spill and operate on the stack. PR middle-end/113622 * expr.cc (expand_assignment): Spill hard registers if we index them with a variable offset. * gcc.target/i386/pr113622-2.c: New testcase. * gcc.target/i386/pr113622-3.c: Likewise.	2024-01-29 14:25:10 +01:00
Richard Biener	96bc048d78	middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs The following expands .VEC_SET and .VEC_EXTRACT instruction selection to global hard registers, not only automatic variables (possibly) promoted to registers. This can avoid some ICEs later and create better code. PR middle-end/113622 * gimple-isel.cc (gimple_expand_vec_set_extract_expr): Also allow DECL_HARD_REGISTER variables. * gcc.target/i386/pr113622-1.c: New testcase.	2024-01-29 14:25:10 +01:00
Alex Coplan	d41a1873f3	aarch64: Ensure iterator validity when updating debug uses [PR113616] The fix for PR113089 introduced range-based for loops over the debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a debug insn, the use would get removed from the use list, and thus we would end up using an invalidated iterator in the next iteration of the loop. In practice this means we end up terminating the loop prematurely, and hence ICE as in PR113089 since there are debug uses that we failed to fix up. This patch fixes that by introducing a general mechanism to avoid this sort of problem. We introduce a safe_iterator to iterator-utils.h which wraps an iterator, and also holds the end iterator value. It then pre-computes the next iterator value at all iterations, so it doesn't matter if the original iterator got invalidated during the loop body, we can still move safely to the next iteration. We introduce an iterate_safely helper which effectively adapts a container such as iterator_range into a container of safe_iterators over the original iterator type. We then use iterate_safely around all loops over debug_insn_uses () in the aarch64 ldp/stp pass to fix PR113616. While doing this, I remembered that cleanup_tombstones () had the same problem. I previously worked around this locally by manually maintaining the next nondebug insn, so this patch also refactors that loop to use the new iterate_safely helper. While doing that I noticed that a couple of cases in cleanup_tombstones could be converted from using dyn_cast<set_info *> to as_a<set_info *>, which should be safe because there are no clobbers of mem in RTL-SSA, so all defs of memory should be set_infos. gcc/ChangeLog: PR target/113616 * config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add): Use iterate_safely when iterating over debug uses. (fixup_debug_uses): Likewise. 
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate over nondebug insns instead of manually maintaining the next insn. * iterator-utils.h (class safe_iterator): New. (iterate_safely): New. gcc/testsuite/ChangeLog: PR target/113616 * gcc.c-torture/compile/pr113616.c: New test.	2024-01-29 13:29:54 +00:00
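The safe-iteration idea described in the entry above can be sketched in standalone C++. This is an illustration only, not the code added to iterator-utils.h: the names `safe_iter`, `safe_range`, `iterate_safely_sketch` and `erase_even` are hypothetical, and a `std::set` stands in for the RTL-SSA use lists. The wrapper holds the current and end iterators and pre-computes the successor before the loop body runs, so erasing the current element does not derail the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical sketch: wraps an iterator, remembers the end, and computes
// the successor up front so the loop body may invalidate the current one.
template <typename It>
class safe_iter {
public:
  safe_iter(It cur, It end) : cur_(cur), end_(end), next_(cur) {
    if (cur_ != end_) ++next_;
  }
  typename std::iterator_traits<It>::reference operator*() const {
    return *cur_;
  }
  safe_iter &operator++() {
    cur_ = next_;                 // never touches the old (maybe dead) cur_
    if (next_ != end_) ++next_;   // pre-compute the following position
    return *this;
  }
  bool operator!=(const safe_iter &other) const { return cur_ != other.cur_; }

private:
  It cur_, end_, next_;
};

// Adapts a [begin, end) pair into something a range-based for can consume.
template <typename It>
struct safe_range {
  It b, e;
  safe_iter<It> begin() const { return safe_iter<It>(b, e); }
  safe_iter<It> end() const { return safe_iter<It>(e, e); }
};

template <typename C>
safe_range<typename C::iterator> iterate_safely_sketch(C &c) {
  return {c.begin(), c.end()};
}

// Erases the even elements while iterating; returns the remaining size.
inline std::size_t erase_even(std::set<int> &s) {
  for (int v : iterate_safely_sketch(s)) {
    if (v % 2 == 0) {
      s.erase(v);  // invalidates only the iterator to the erased element
      assert(s.count(v) == 0);
    }
  }
  return s.size();
}
```

The sketch relies on a container whose erase invalidates only the erased element's iterator (true for `std::set` and `std::list`); it would not be safe for `std::vector`.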
H.J. Lu	291f75fa1b	x86: Save callee-saved registers in noreturn functions for -O0/-Og Save callee-saved registers in noreturn functions for -O0/-Og so that debugger can restore callee-saved registers in caller's frame. Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute lookup. gcc/ PR target/38534 * config/i386/i386-options.cc (ix86_set_func_type): Save callee-saved registers in noreturn functions for -O0/-Og. gcc/testsuite/ PR target/38534 * gcc.target/i386/pr38534-5.c: New file. * gcc.target/i386/pr38534-6.c: Likewise.	2024-01-29 05:29:01 -08:00
Prathamesh Kulkarni	8a48723dac	PR112950: Use #pragma GCC for including arm_sve.h. gcc/testsuite/ChangeLog: PR target/112950 * gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive and instead use #pragma GCC for including arm_sve.h.	2024-01-29 18:42:44 +05:30
Tobias Burnus	7cc2262ec9	gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615] gcc/ChangeLog: PR target/113615 * config/gcn/gcn-valu.md (fold_left_plus_<mode>): Only define for !TARGET_RDNA2_PLUS. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-29 13:51:25 +01:00
Richard Sandiford	1a8261e047	vect: Tighten vect_determine_precisions_from_range [PR113281] This was another PR caused by the way that vect_determine_precisions_from_range handles shifts. We tried to narrow 32768 >> x to a 16-bit shift based on range information for the inputs and outputs, with vect_recog_over_widening_pattern (after PR110828) adjusting the shift amount. But this doesn't work for the case where x is in [16, 31], since then 32-bit 32768 >> x is a well-defined zero, whereas no well-defined 16-bit 32768 >> y will produce 0. We could perhaps generate x < 16 ? 32768 >> x : 0 instead, but since vect_determine_precisions_from_range was never really supposed to rely on fix-ups, it seems better to fix that instead. The patch also makes the code more selective about which codes can be narrowed based on input and output ranges. This showed that vect_truncatable_operation_p was missing cases for BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR (equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1). pr113281-1.c is the original testcase. pr113281-[23].c failed before the patch due to overly optimistic narrowing. pr113281-[45].c previously passed and are meant to protect against accidental optimisation regressions. gcc/ PR target/113281 * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove workaround for right shifts. (vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR. (vect_determine_precisions_from_range): Be more selective about which codes can be narrowed based on their input and output ranges. For shifts, require at least one more bit of precision than the maximum shift amount. gcc/testsuite/ PR target/113281 * gcc.dg/vect/pr113281-1.c: New test. * gcc.dg/vect/pr113281-2.c: Likewise. * gcc.dg/vect/pr113281-3.c: Likewise. * gcc.dg/vect/pr113281-4.c: Likewise. * gcc.dg/vect/pr113281-5.c: Likewise.	2024-01-29 12:33:08 +00:00
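The arithmetic behind the entry above can be checked with a small standalone C++ sketch. It is not vectorizer code; the helper names `shift32`, `shift16` and `narrow_shift_can_produce_zero` are hypothetical. It shows that a 32-bit `32768 >> x` is a well-defined zero for x in [16, 31], while no well-defined 16-bit shift amount (0 to 15) turns 32768 into zero, which is why the commit requires at least one more bit of precision than the maximum shift amount.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit logical right shift of 32768; well defined for 0 <= x <= 31.
inline std::uint32_t shift32(unsigned x) {
  assert(x < 32);
  return std::uint32_t{32768} >> x;
}

// Model of a 16-bit shift: only amounts 0..15 are well defined.
inline std::uint16_t shift16(unsigned y) {
  assert(y < 16);
  return static_cast<std::uint16_t>(std::uint16_t{32768} >> y);
}

// True if some well-defined 16-bit shift amount maps 32768 to zero.
inline bool narrow_shift_can_produce_zero() {
  for (unsigned y = 0; y < 16; ++y)
    if (shift16(y) == 0)
      return true;
  return false;
}
```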
Tobias Burnus	9e89b5e925	nvptx.opt: Add sm_89 and sm_90a to -march-map= The -march-map= option maps the compute capability to the closest lower compute capability that has been implemented; for sm_89 and sm_90a, which were previously missing, that is currently -march=sm_80 (alias -misa=sm_80). gcc/ChangeLog: * config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-29 13:06:27 +01:00
Tobias Burnus	c1a38cd67e	install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled gcc/ChangeLog: * doc/install.texi (amdgcn): Recommend LLVM 15+ and newlib 4.4+, but keep requiring only newlib 4.3+ and, if gfx1100 is disabled, LLVM 13.0.1+. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-29 11:20:49 +01:00
Tobias Burnus	ef5ccdbbc6	gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966] Some more '-g' fixes, as the .mkoffload.dbg.o debug file has ELF flags which did not match those generated for the compilation, leading to linker errors. For .mkoffload.dbg.o, the ELF flags are generated by mkoffload itself - while for the other .o files, that's done by the compiler via setting default and mainly via the ASM_SPEC. This is a follow up to r14-8332-g13127dac106724 which fixed an issue caused by the default arch. In this patch, it is mainly for gfx1100 and gfx1030 which always failed. It also affects gfx906 and possibly gfx900 but only when using the -mxnack/-msram-ecc flags explicitly. What happens on the compiler side is mainly determined by gcn-hsa.h's and otherwise by some default setting. In particular for xnack and sram_ecc, there is: For gfx1100 and gfx1030, neither xnack nor sram_ecc is set (only '+wavefrontsize64'). For fiji, gfx900, gfx906 and gfx908 there is always -mattr=-xnack and for all but gfx908 also -msram-ecc=no - independent of what has been passed to the compiler. However, on the ELF flags, the result differs: For fiji, due to the HSACOv3, it is always set to 0 via copy_early_debug_info; for gfx900, gfx906 and gfx908, xnack is OFF. For sram-ecc, it is 'unset' for gfx900, 'any' for gfx906 and for gfx908 it is 'any' unless overridden. For gfx90a, the -msram-ecc= and -mxnack= are passed on, or if not present, ...=any is passed on. Note that this "any" is different from the argument not being present at the ELF flag level: For XNACK: unset/unsupported is 0, any = 0x100, off = 0x200, on = 0x300. For SRAMECC: unset/unsupported is 0, any = 0x400, off = 0x800, on = 0xc00. The obstack_ptr_grow changes are more to avoid confusion than having an actual effect as they would otherwise be filtered out via the ASM_SPEC. gcc/ChangeLog: PR other/111966 * config/gcn/mkoffload.cc (SET_XNACK_UNSET, TEST_SRAM_ECC_UNSET): New. (SET_SRAM_ECC_UNSUPPORTED): Renamed to ... 
(SET_SRAM_ECC_UNSET): ... this. (copy_early_debug_info): Remove gfx900 special case, now handled as part of the generic handling. (main): Update SRAM_ECC and XNACK for the -march as done in gcn-hsa.h. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-29 11:10:33 +01:00
Tobias Burnus	cb366731e7	libgomp.c/declare-variant-4.h: Fix used variant function for gfx1030/gfx1100 libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4.h: Use gfx1100/gfx1030 function not gfx90a for gfx1100/gfx1030 context selector. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-29 11:06:15 +01:00
Jakub Jelinek	b338fdbc2b	tree-ssa-strlen: Fix pdata->maxlen computation [PR110603] On the following testcase we emit an invalid range of [2, 1] due to UB in the source. Older VRP code silently swapped the boundaries and made [1, 2] range out of it, but newer code just ICEs on it. The reason for pdata->minlen 2 is that we see a memcpy in this case setting both elements of the array to non-zero value, so strlen (a) can't be smaller than 2. The reason for pdata->maxlen 1 is that in char a[2] array without UB there can be at most 1 non-zero character because there needs to be '\0' termination in the buffer too. IMHO we shouldn't create invalid ranges like that and even creating for that case a range [1, 2] looks wrong to me, so the following patch just doesn't set maxlen in that case to the array size - 1, matching what will really happen at runtime when triggering such UB (strlen will be at least 2, perhaps more or will crash). This is what the second hunk of the patch does. The first hunk fixes a fortunately harmless thinko. If the strlen pass knows the string length (i.e. get_string_length function returns non-NULL), we take a different path, we get to this only if all we know is that there are certain number of non-zero characters but we don't know what it is followed with, whether further non-zero characters or zero termination or either of that. If we know exactly how many non-zero characters it is, such as char a[42]; ... memcpy (a, "01234567890123456789", 20); then we take an earlier if for the INTEGER_CST case and set correctly just pdata->minlen to 20 in that case, but if we have something like int len; ... if (len < 15 || len > 32) return; memcpy (a, "0123456789012345678901234567890123456789", len); then we have [15, 32] range for the nonzero_chars and we set pdata->minlen correctly to 15, but incorrectly set also pdata->maxlen to 32. 
That is not what the above implies, it just means that in some cases we know that there are at least 32 non-zero characters, followed by something we don't know. There is no guarantee that there is '\0' right after it, so it means nothing. The reason this is harmless, just confusing, is that the code a few lines later fortunately overwrites this incorrect pdata->maxlen value with something different (either array length - 1 or all ones etc.). 2024-01-29 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/110603 * tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect setting of pdata->maxlen to vr.upper_bound (which is unconditionally overwritten anyway). Avoid creating invalid range with minlen larger than maxlen. Formatting fix. * gcc.c-torture/compile/pr110603.c: New test.	2024-01-29 10:20:32 +01:00
Richard Biener	b702dc9802	debug/103047 - argument order of inlined functions The inliner puts variables for parameters of the inlined functions in the inline scope in reverse order. The following reverses them again so that we get consistent ordering between the DW_TAG_subprogram DW_TAG_formal_parameter and the DW_TAG_inlined_subroutine DW_TAG_formal_parameter set. I failed to create a testcase with regexps since the inline instances have just abstract origins and so I can't match them up. PR debug/103047 * tree-inline.cc (initialize_inlined_parameters): Reverse the decl chain of inlined parameters.	2024-01-29 08:41:20 +01:00
Andrew Pinski	5b393ac7f1	testsuite: Fix vect_long_mult for 32-bit Power [PR109705] As noted in PR109705#c17, commit r14-7270 failed to consider that the long type is 32-bit with option -m32. This patch takes care of that accordingly. Note that vect_long_mult is supposed to be used only in vect/ (generic), so powerpc_altivec_ok would be guaranteed. PR testsuite/109705 gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_vect_long_mult): Fix powerpc--* checks by considering ilp32. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>	2024-01-28 20:35:05 -06:00
GCC Administrator	91b3da6f11	Daily bump.	2024-01-29 00:18:44 +00:00
Victor Do Nascimento	7dd4466b39	Libatomic: Add checks in ifunc selectors for LSE/LSE2 requirements. At present, evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space, results in a trap by the kernel which then returns the value read in by the system register. Given the undesirable nature of the computational expense associated with the context switch, it is important to implement mechanisms to, wherever possible, forgo the operation. In light of this, given how other architectural requirements serving as prerequisites have long been assigned HWCAP bits by the kernel, we can inexpensively query for their availability before attempting to read any system registers. Where one of these early tests fails, we can assert that the main feature of interest (be it LSE2 or LSE128) cannot be present, allowing us to return from the function early and skip the unnecessary expensive kernel-mediated access to system registers. libatomic/ChangeLog: * config/linux/aarch64/host-config.h (has_lse2): Add test for LSE. (has_lse128): Add test for LSE2.	2024-01-28 20:02:17 +00:00
Victor Do Nascimento	5ad64d76c0	libatomic: Enable LSE128 128-bit atomics for Armv9.4-a The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * SWPP - Atomic swap of one 128-bit value with 128-bit value held in a pair of registers. It is worth noting that in keeping with existing 128-bit atomic operations in `atomic_16.S', we have chosen to merge certain less-restrictive orderings into more restrictive ones. This is done to minimize the number of branches in the atomic functions, minimizing both the likelihood of branch mispredictions and, in keeping code small, limit the need for extra fetch cycles. Past benchmarking has revealed that acquire is typically slightly faster than release (5-10%), such that for the most frequently used atomics (CAS and SWP) it makes sense to add support for acquire, as well as release. Likewise, it was identified that combining acquire and release typically results in little to no penalty, such that it is of negligible benefit to distinguish between release and acquire-release, making the combining release/acq_rel/seq_cst a worthwhile design choice. This patch adds the logic required to make use of these when the architectural feature is present and a suitable assembler available. In order to do this, the following changes are made: 1. Add a configure-time check to check for LSE128 support in the assembler. 2. Edit host-config.h so that when N == 16, nifunc = 2. 3. Where available due to LSE128, implement the second ifunc, making use of the novel instructions. 4. 
For atomic functions unable to make use of these new instructions, define a new alias which causes the _i1 function variant to point ahead to the corresponding _i2 implementation. libatomic/ChangeLog: * Makefile.am (AM_CPPFLAGS): add conditional setting of -DHAVE_FEAT_LSE128. * acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New. * config/linux/aarch64/atomic_16.S (LSE128): New macro definition. (libat_exchange_16): New LSE128 variant. (libat_fetch_or_16): Likewise. (libat_or_fetch_16): Likewise. (libat_fetch_and_16): Likewise. (libat_and_fetch_16): Likewise. * config/linux/aarch64/host-config.h (IFUNC_COND_2): New. (IFUNC_NCOND): Add operand size checking. (has_lse2): Renamed from `ifunc1`. (has_lse128): New. (HWCAP2_LSE128): Likewise. * configure.ac: Add call to LIBAT_TEST_FEAT_AARCH64_LSE128. * configure (ac_subst_vars): Regenerated via autoreconf. * Makefile.in: Likewise. * auto-config.h.in: Likewise.	2024-01-28 20:02:01 +00:00
Victor Do Nascimento	a899a1f2f3	libatomic: Add support for __ifunc_arg_t arg in ifunc resolver With support for new atomic features in Armv9.4-a being indicated by HWCAP2 bits, libatomic's ifunc resolver must now query its second argument, of type __ifunc_arg_t. We therefore make this argument known to libatomic, allowing us to query hwcap2 bits in the following manner: bool resolver (unsigned long hwcap, const __ifunc_arg_t *features) { return (features->hwcap2 & HWCAP2_<FEAT_NAME>); } libatomic/ChangeLog: * config/linux/aarch64/host-config.h (__ifunc_arg_t): Conditionally-defined if `sys/ifunc.h' not found. (_IFUNC_ARG_HWCAP): Likewise. (IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc. (ifunc1): Modify function signature to accept __ifunc_arg_t argument. * configure.tgt: Add second `const __ifunc_arg_t *features' argument to IFUNC_RESOLVER_ARGS.	2024-01-28 19:52:42 +00:00
Victor Do Nascimento	e64602c025	libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their suffixes possibly altered. This patch uses pre-processor `#define' statements to map each suffix to a descriptive feature name macro, for example: #define LSE(NAME) NAME##_i1 Where we wish to generate ifunc names with the pre-processor's token concatenation feature, we add a level of indirection to previous macro calls. If before we would have had `MACRO(<name>_i<n>)', we now have `MACRO_FEAT(name, feature)'. Where we wish to refer to base functionality (i.e., functions where ifunc suffixes are absent), the original `MACRO(<name>)' may be used to bypass suffixing. Consequently, for base functionality, where the ifunc suffix is absent, the macro interface remains the same. For example, the entry and endpoints of `libat_store_16' remain defined by: ENTRY (libat_store_16) and END (libat_store_16) For the LSE2 implementation of the same 16-byte atomic store, we now have: ENTRY_FEAT (libat_store_16, LSE2) and END_FEAT (libat_store_16, LSE2) For the aliasing of function names, we define the following new implementation of the ALIAS macro: ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX) Defining the `CORE(NAME)' macro to be the identity operator, it returns the base function name unaltered and allows us to alias target-specific ifuncs to the corresponding base implementation. For example, we'd alias the LSE2 `libat_exchange_16' to its base implementation with: ALIAS (libat_exchange_16, LSE2, CORE) libatomic/ChangeLog: * config/linux/aarch64/atomic_16.S (CORE): New macro. (LSE2): Likewise. (ENTRY_FEAT): Likewise. (ENTRY_FEAT1): Likewise. (END_FEAT): Likewise. (END_FEAT1): Likewise. 
(ALIAS): Modify macro to take in `arch' arguments. (ALIAS1): New.	2024-01-28 19:52:41 +00:00
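The suffix-mapping scheme in the entry above can be tried out with the C preprocessor alone. The following is a sketch under stated assumptions, not the contents of atomic_16.S: `FEAT_NAME`, `STR`, `STR1` and the `demo_op` functions are hypothetical, and the `_i1` suffix chosen for `LSE2` is only illustrative. `CORE` is the identity macro, as the commit message describes, and the two-level `STR`/`STR1` pair shows why an extra level of indirection is needed: the feature macro has to expand before the result is stringized or pasted.

```cpp
#include <cassert>
#include <cstring>

// Feature macros map a base name to its ifunc-suffixed name.
#define CORE(NAME) NAME        // identity: base functionality, no suffix
#define LSE2(NAME) NAME##_i1   // suffix value is an assumption for this sketch

// Hypothetical helper: apply a feature macro to a base name.
#define FEAT_NAME(NAME, FEAT) FEAT(NAME)

// Two levels so that the argument is fully expanded before '#' is applied.
#define STR1(X) #X
#define STR(X) STR1(X)

// Defines demo_op (base) and demo_op_i1 (the "LSE2" variant).
inline int FEAT_NAME(demo_op, CORE)(int x) { return x; }
inline int FEAT_NAME(demo_op, LSE2)(int x) { return x + 1; }

inline const char *core_store_name() {
  return STR(FEAT_NAME(libat_store_16, CORE));
}
inline const char *lse2_store_name() {
  return STR(FEAT_NAME(libat_store_16, LSE2));
}
```

Calling `STR1` directly on `FEAT_NAME(libat_store_16, LSE2)` would stringize the unexpanded macro call instead of the suffixed name, which is the problem the extra level avoids.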
Harald Anlauf	c4773944bb	Fortran: NULL actual to optional dummy with VALUE attribute [PR113377] gcc/fortran/ChangeLog: PR fortran/113377 * trans-expr.cc (conv_dummy_value): Treat NULL actual argument to optional dummy with the VALUE attribute as not present. (gfc_conv_procedure_call): Likewise. gcc/testsuite/ChangeLog: PR fortran/113377 * gfortran.dg/optional_absent_11.f90: New test.	2024-01-28 20:06:37 +01:00
Iain Sandoe	f74f840d35	Objective-C, Darwin: Do not overalign CFStrings and Objective-C metadata. We have reports of regressions in both Objective-C and Objective-C++ on Darwin23 (macOS 14). In some cases, these are linker warnings about the alignment of CFString constants; in other cases the built executables crash during runtime initialization. The underlying issue is the same in both cases; since the objects (CFStrings, Objective-C meta-data) are TU- local, we are choosing to increase their alignment for efficiency - to values greater than ABI alignment. However, although these objects are TU-local, they are also visible to the linker (since they are placed in specific named sections). In many cases the metadata can be regarded as tables of data, and thus it is expected that these sections can be concatenated from multiple TUs and the data treated as tabular. In order for this to work the data cannot be allowed to exceed ABI alignment - which leads to the crashes. For GCC-15+ it would be nice to find a more elegant solution to this issue (perhaps by adjusting the concept of binds-locally to exclude specific named sections) - but I do not want to do that in stage 4. The solution here is to force the alignment to be preserved as created by setting DECL_USER_ALIGN on the relevant objects. gcc/ChangeLog: * config/darwin.cc (darwin_build_constant_cfstring): Prevent over- alignment of CFString constants by setting DECL_USER_ALIGN. gcc/objc/ChangeLog: * objc-next-runtime-abi-02.cc (build_v2_address_table): Prevent over-alignment of Objective-C metadata by setting DECL_USER_ALIGN on relevant variables. (build_v2_protocol_list_address_table): Likewise. (generate_v2_protocol_list): Likewise. (generate_v2_meth_descriptor_table): Likewise. (generate_v2_meth_type_list): Likewise. (generate_v2_property_table): Likewise. (generate_v2_dispatch_table): Likewise. (generate_v2_ivars_list): Likewise. (generate_v2_class_structs): Likewise. (build_ehtype): Likewise. 
* objc-runtime-shared-support.cc (generate_strings): Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>	2024-01-28 11:46:27 +00:00
Iain Sandoe	30d9e81c19	testsuite, Objective-C: Fix duplicate libobjc cases. Two of the encode testcases include '-lobjc' as their dg-options. Since the library is already appended as part of the generic testsuite handling, this means that two instances appear on the link line leading to spurious warnings from Darwin's new linker. gcc/testsuite/ChangeLog: * obj-c++.dg/encode-10.mm: Remove unneeded '-lobjc' option addition. * obj-c++.dg/encode-9.mm: Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>	2024-01-28 11:03:26 +00:00
Iain Sandoe	837827f8f2	Fix __builtin_nested_func_ptr_{created,deleted} symbol versions [PR113402] The symbols for the functions supporting heap-based trampolines were exported at an incorrect symbol version, the following patch fixes that. As requested in the PR, this also renames __builtin_nested_func_ptr* to __gcc_nested_func_ptr. In carrying out the rename, we move the builtins to use DEF_EXT_LIB_BUILTIN. PR libgcc/113402 gcc/ChangeLog: * builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED. * builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED, BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and rename the library fallbacks to __gcc_nested_func_ptr_created and __gcc_nested_func_ptr_deleted. * doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created and __gcc_nested_func_ptr_deleted. * tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED. * tree.cc (build_common_builtin_nodes): Build the BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local builtins only for non-explicit. libgcc/ChangeLog: * config/aarch64/heap-trampoline.c: Rename __builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and __builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted. * config/i386/heap-trampoline.c: Likewise. * libgcc2.h: Likewise. * libgcc-std.ver.in (GCC_7.0.0): Likewise and then move __gcc_nested_func_ptr_created and __gcc_nested_func_ptr_deleted from this symbol version to ... (GCC_14.0.0): ... this one. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> Co-authored-by: Jakub Jelinek <jakub@redhat.com>	2024-01-28 10:59:34 +00:00
Iain Sandoe	557bea3d2e	testsuite, jit: Stabilize error output. Currently when a test fails, we print out a lot of information, this includes items that are not stable between invocations (e.g. the PID for the executable). That makes automated comparisons between test runs flag any persistent fails as new ones each time which is not usually what is wanted. This patch amends the error output to drop the variable portion of the message and retain items that should only change if the failure mode changes. gcc/testsuite/ChangeLog: * jit.dg/jit.exp: Filter error output to remove per-run variable content. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>	2024-01-28 10:58:33 +00:00
YunQiang Su	46df13697a	doc/invoke: Remove duplicate explicit-relocs entry of MIPS When the new-style -mexplicit-relocs= option was added, the old-style entry was not removed. gcc * doc/invoke.texi: Remove duplicate MIPS explicit-relocs option.	2024-01-28 13:21:41 +08:00
GCC Administrator	3979171d8e	Daily bump.	2024-01-28 00:16:34 +00:00
Jose E. Marchesi	f64448f4ff	bpf: add constant pointer to helper-skb-ancestor-cgroup-id.c test The purpose of this test is to make sure that constant propagation is achieved with the proper optimization level, so a BPF call instruction to a kernel helper is generated. This patch updates the test so it also covers kernel helpers defined with constant static pointers. The motivation for this patch is: https://lore.kernel.org/bpf/20240127185031.29854-1-jose.marchesi@oracle.com/T/#u Tested in bpf-unknown-none target x86_64-linux-gnu host. gcc/testsuite/ChangeLog * gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Add constant version of kernel helper static pointer.	2024-01-27 20:08:12 +01:00
Harald Anlauf	ce61de1b8a	Fortran: fix bounds-checking errors for CLASS array dummies [PR104908] Commit r11-1235 addressed issues with bounds of unlimited polymorphic array dummies. However, using the descriptor from sym->backend_decl does break the case of CLASS array dummies. The obvious solution is to restrict the fix to the unlimited polymorphic case, thus keeping the original descriptor in the ordinary case. gcc/fortran/ChangeLog: PR fortran/104908 * trans-array.cc (gfc_conv_array_ref): Restrict use of transformed descriptor (sym->backend_decl) to the unlimited polymorphic case. gcc/testsuite/ChangeLog: PR fortran/104908 * gfortran.dg/pr104908.f90: New test.	2024-01-27 17:41:43 +01:00
H.J. Lu	7cc9adc62c	x86: Don't save callee-saved registers in noreturn functions There is no need to save callee-saved registers in noreturn functions if they don't throw nor support exceptions. We can treat them the same as functions with no_callee_saved_registers attribute. Adjust stack-check-17.c for noreturn function which no longer saves any registers. With this change, __libc_start_main in glibc 2.39, which is a noreturn function, is changed from __libc_start_main: endbr64 push %r15 push %r14 mov %rcx,%r14 push %r13 push %r12 push %rbp mov %esi,%ebp push %rbx mov %rdx,%rbx sub $0x28,%rsp mov %rdi,(%rsp) mov %fs:0x28,%rax mov %rax,0x18(%rsp) xor %eax,%eax test %r9,%r9 to __libc_start_main: endbr64 sub $0x28,%rsp mov %esi,%ebp mov %rdx,%rbx mov %rcx,%r14 mov %rdi,(%rsp) mov %fs:0x28,%rax mov %rax,0x18(%rsp) xor %eax,%eax test %r9,%r9 In Linux kernel 6.7.0 on x86-64, do_exit is changed from do_exit: endbr64 call <do_exit+0x9> push %r15 push %r14 push %r13 push %r12 mov %rdi,%r12 push %rbp push %rbx mov %gs:0x0,%rbx sub $0x28,%rsp mov %gs:0x28,%rax mov %rax,0x20(%rsp) xor %eax,%eax call 0x0(%rip) # <do_exit+0x39> test $0x2,%ah je <do_exit+0x8d3> to do_exit: endbr64 call <do_exit+0x9> sub $0x28,%rsp mov %rdi,%r12 mov %gs:0x28,%rax mov %rax,0x20(%rsp) xor %eax,%eax mov %gs:0x0,%rbx call 0x0(%rip) # <do_exit+0x2f> test $0x2,%ah je <do_exit+0x8c9> I compared GCC master branch bootstrap and test times on a slow machine with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13 with the backported patch. The performance data isn't precise since the measurements were done on different days with different GCC sources under different 6.6 kernel versions. 
GCC master branch build time in seconds: before after improvement 30043.75user 30013.16user 0% 1274.85system 1243.72system 2.4% GCC master branch test time in seconds (new tests added): before after improvement 216035.90user 216547.51user 0 27365.51system 26658.54system 2.6% gcc/ PR target/38534 * config/i386/i386-options.cc (ix86_set_func_type): Don't save and restore callee saved registers for a noreturn function with nothrow or compiled with -fno-exceptions. gcc/testsuite/ PR target/38534 * gcc.target/i386/pr38534-1.c: New file. * gcc.target/i386/pr38534-2.c: Likewise. * gcc.target/i386/pr38534-3.c: Likewise. * gcc.target/i386/pr38534-4.c: Likewise. * gcc.target/i386/stack-check-17.c: Updated.	2024-01-27 04:10:49 -08:00
H.J. Lu	a96549dce7	x86: Add no_callee_saved_registers function attribute When an interrupt handler is implemented by an assembly stub which does: 1. Save all registers. 2. Call a C function. 3. Restore all registers. 4. Return from interrupt. it is completely unnecessary to save and restore any registers in the C function called by the assembly stub, even if they would normally be callee-saved. Add no_callee_saved_registers function attribute, which is complementary to no_caller_saved_registers function attribute, to mark a function which doesn't have any callee-saved registers. Such a function won't save and restore any registers. Classify function call-saved register handling type with: 1. Default call-saved registers. 2. No caller-saved registers with no_caller_saved_registers attribute. 3. No callee-saved registers with no_callee_saved_registers attribute. Disallow sibcall if callee is a no_callee_saved_registers function and caller isn't a no_callee_saved_registers function. Otherwise, callee-saved registers won't be preserved. After a no_callee_saved_registers function is called, all registers may be clobbered. If the calling function isn't a no_callee_saved_registers function, we need to preserve all registers which aren't used by function calls. gcc/ PR target/103503 PR target/113312 * config/i386/i386-expand.cc (ix86_expand_call): Replace no_caller_saved_registers check with call_saved_registers check. Clobber all registers that are not used by the callee with no_callee_saved_registers attribute. * config/i386/i386-options.cc (ix86_set_func_type): Set call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for noreturn function. Disallow no_callee_saved_registers with interrupt or no_caller_saved_registers attributes together. (ix86_set_current_function): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_handle_no_caller_saved_registers_attribute): Renamed to ... (ix86_handle_call_saved_registers_attribute): This. 
(ix86_gnu_attributes): Add ix86_handle_call_saved_registers_attribute. * config/i386/i386.cc (ix86_conditional_register_usage): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_function_ok_for_sibcall): Don't allow callee with no_callee_saved_registers attribute when the calling function has callee-saved registers. (ix86_comp_type_attributes): Also check no_callee_saved_registers. (ix86_epilogue_uses): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_hard_regno_scratch_ok): Likewise. (ix86_save_reg): Replace no_caller_saved_registers check with call_saved_registers check. Don't save any registers for TYPE_NO_CALLEE_SAVED_REGISTERS. Save all registers with TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with no_callee_saved_registers attribute is called. (find_drap_reg): Replace no_caller_saved_registers check with call_saved_registers check. * config/i386/i386.h (call_saved_registers_type): New enum. (machine_function): Replace no_caller_saved_registers with call_saved_registers. * doc/extend.texi: Document no_callee_saved_registers attribute. gcc/testsuite/ PR target/103503 PR target/113312 * gcc.dg/torture/no-callee-saved-run-1a.c: New file. * gcc.dg/torture/no-callee-saved-run-1b.c: Likewise. * gcc.target/i386/no-callee-saved-1.c: Likewise. * gcc.target/i386/no-callee-saved-2.c: Likewise. * gcc.target/i386/no-callee-saved-3.c: Likewise. * gcc.target/i386/no-callee-saved-4.c: Likewise. * gcc.target/i386/no-callee-saved-5.c: Likewise. * gcc.target/i386/no-callee-saved-6.c: Likewise. * gcc.target/i386/no-callee-saved-7.c: Likewise. * gcc.target/i386/no-callee-saved-8.c: Likewise. * gcc.target/i386/no-callee-saved-9.c: Likewise. * gcc.target/i386/no-callee-saved-10.c: Likewise. * gcc.target/i386/no-callee-saved-11.c: Likewise. * gcc.target/i386/no-callee-saved-12.c: Likewise. * gcc.target/i386/no-callee-saved-13.c: Likewise. * gcc.target/i386/no-callee-saved-14.c: Likewise. 
* gcc.target/i386/no-callee-saved-15.c: Likewise. * gcc.target/i386/no-callee-saved-16.c: Likewise. * gcc.target/i386/no-callee-saved-17.c: Likewise. * gcc.target/i386/no-callee-saved-18.c: Likewise.	2024-01-27 04:10:49 -08:00
Jakub Jelinek	a12b0e9360	lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614] The following testcase is miscompiled, because some narrower value is sign-extended to wider unsigned _BitInt used as division operand. handle_operand_addr for that case returns the narrower value and precision -prec_of_narrower_value. That works fine for multiplication (at least, normal multiplication, but we don't merge casts with .MUL_OVERFLOW or the ubsan multiplication right now), because the result is the same whether we treat the arguments as signed or unsigned. But is completely wrong for division/modulo or conversions to floating-point, if we pass negative prec for an input operand of a libgcc handler, those treat it like a negative number, not an unsigned one sign-extended from something smaller (and it doesn't know to what precision it has been extended). So, the following patch fixes it by making sure we don't merge such sign-extensions to unsigned _BitInt type with division, modulo or conversions to floating point. 2024-01-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113614 * gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge widening casts from signed to unsigned types with TRUNC_DIV_EXPR, TRUNC_MOD_EXPR or FLOAT_EXPR uses. * gcc.dg/torture/bitint-54.c: New test.	2024-01-27 13:06:55 +01:00
Jakub Jelinek	3f5ac46963	lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568] We generally allow merging mergeable stmts with some final cast (but not further casts or mergeable operations after the cast). As some casts are handled conditionally, if (idx < cst) handle_operand (idx); else if (idx == cst) handle_operand (cst); else ..., we must make sure that e.g. the mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in handle_operand called from such casts, because it ICEs on invalid SSA_NAME form (that part could be fixable by adding further PHIs) but also because we'd need to correctly propagate the overflow flags from the if to else if. So, instead lower_mergeable_stmt handles an outermost widening cast (or widening cast feeding outermost store) specially. The problem was similar to PR113408, that VIEW_CONVERT_EXPR tree is present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR, so the checks whether the outermost cast should be handled didn't handle the VCE case and so handle_plus_minus was called from the conditional handle_cast. 2024-01-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113568 * gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1 in the widening extension checks. * gcc.dg/bitint-78.c: New test.	2024-01-27 13:06:17 +01:00
Jakub Jelinek	675e522903	lower-bitint: Add debugging dump of SSA_NAME -> decl mappings While the SSA coalescing performed by lower bitint prints some information if -fdump-tree-bitintlower-details, it is really hard to read and doesn't contain the most important information which one looks for when debugging bitint lowering issues, namely what VAR_DECLs (or PARM_DECLs/RESULT_DECLs) each SSA_NAME in large_huge.m_names bitmap maps to. So, the following patch adds dumping of that, so that we know that say _3 -> bitint.3 _8 -> bitint.7 _16 -> bitint.7 etc. 2024-01-27 Jakub Jelinek <jakub@redhat.com> * gimple-lower-bitint.cc (gimple_lower_bitint): For TDF_DETAILS dump mapping of SSA_NAMEs to decls.	2024-01-27 13:05:30 +01:00
Lewis Hyatt	5200ef26ac	c-family: Fix ICE with large column number after restoring a PCH [PR105608] Users are allowed to define macros prior to restoring a precompiled header file, as long as those macros are not defined (or are defined identically) in the PCH. However, the PCH restoration process destroys all the macro definitions, so libcpp has to record them before restoring the PCH and then redefine them afterward. This process does not currently assign great locations to the macros after redefining them. Some work is needed to also remember the original locations and get the line_maps instance in the right state (since, like all other data structures, the line_maps instance is also reset after restoring a PCH). The new testcase line-map-3.C contains XFAILed examples where the locations are wrong. This patch addresses a more pressing issue, which is that we ICE in some cases since GCC 11, hitting an assert in line-maps.cc. It happens if the first line encountered after the PCH restore requires an LC_RENAME map, such as will happen if the line is sufficiently long. This is much easier to fix, since we just need to call linemap_line_start before asking libcpp to redefine the stored macros, instead of afterward, to avoid the unexpected need for an LC_RENAME before an LC_ENTER has been seen. gcc/c-family/ChangeLog: PR preprocessor/105608 * c-pch.cc (c_common_read_pch): Start a new line map before asking libcpp to restore macros defined prior to reading the PCH, instead of afterward. gcc/testsuite/ChangeLog: PR preprocessor/105608 * g++.dg/pch/line-map-1.C: New test. * g++.dg/pch/line-map-1.Hs: New test. * g++.dg/pch/line-map-2.C: New test. * g++.dg/pch/line-map-2.Hs: New test. * g++.dg/pch/line-map-3.C: New test. * g++.dg/pch/line-map-3.Hs: New test.	2024-01-26 23:27:53 -05:00
GCC Administrator	ce9dae5640	Daily bump.	2024-01-27 00:18:16 +00:00
Hans-Peter Nilsson	4eb8367042	c/c++: Tweak warning for 'always_inline function might not be inlinable' When you're not regularly exposed to this warning, it is easy to be misled by its wording, believing that there's something else in the function that stops it from being inlined, something other than the lack of also being declared inline. Also, clang does not warn. It's just a warning: without the inline directive, there has to be a secondary reason for the function to be inlined, other than the always_inline attribute, a reason that may be in effect despite the warning. Whenever the text is quoted in inline-related bugzilla entries, there seems to often have been an initial step of confusion that has to be cleared, for example in PR55830. A file in the powerpc-specific parts of the test-suite, gcc.target/powerpc/vec-extract-v16qiu-v2.h, has a comment and seems to be another example, and I testify as the first-hand third "experience". The wording has been the same since the warning was added. Let's just tweak the wording, adding the cause, so that the reason for the warning is clearer. This hopefully stops the user from immediately asking "'Might'? Because why?" and then going off looking at the function body - or grepping the gcc source or documentation, or enter a bug-report subsequently closed as resolved/invalid. Since the message is only appended with additional information, no test-case actually required adjustment. I still changed them, so the message is covered. gcc: * cgraphunit.cc (process_function_and_variable_attributes): Tweak the warning for an attribute-always_inline without inline declaration. gcc/testsuite: * g++.dg/Wattributes-3.C: Adjust expected warning. * gcc.dg/fail_always_inline.c: Ditto.	2024-01-27 00:55:01 +01:00
Nathaniel Shead	ec57d183d3	c++: Stream additional fields for DECL_STRUCT_FUNCTION [PR113580] Currently the DECL_STRUCT_FUNCTION for a declaration is always reconstructed from scratch. This causes issues though, as some fields used by other parts of the compiler (in this case, specifically 'function_{start,end}_locus') are then not correctly initialised. This patch makes sure that these fields are also read and written. PR c++/113580 gcc/cp/ChangeLog: * module.cc (struct post_process_data): Create. (trees_in::post_decls): Use. (trees_in::post_process): Return entire vector at once. Change overload to take post_process_data instead of tree. (trees_out::write_function_def): Write needed flags from DECL_STRUCT_FUNCTION. (trees_in::read_function_def): Read them and pass to post_process. (module_state::read_cluster): Write flags into cfun. gcc/testsuite/ChangeLog: * g++.dg/modules/pr113580_a.C: New test. * g++.dg/modules/pr113580_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>	2024-01-27 09:29:05 +11:00
Maciej W. Rozycki	5a874dec60	RISC-V/testsuite: Add RTL cset-sext.c testcase variants Add RTL tests, for RV64 and RV32 where appropriate, corresponding to the existing cset-sext.c tests. They have been produced from RTL code as at the entry of the "ce1" pass for the respective cset-sext.c tests built at -O3. gcc/testsuite/ * gcc.target/riscv/cset-sext-rtl.c: New file. * gcc.target/riscv/cset-sext-rtl32.c: New file. * gcc.target/riscv/cset-sext-sfb-rtl.c: New file. * gcc.target/riscv/cset-sext-sfb-rtl32.c: New file. * gcc.target/riscv/cset-sext-thead-rtl.c: New file. * gcc.target/riscv/cset-sext-ventana-rtl.c: New file. * gcc.target/riscv/cset-sext-zicond-rtl.c: New file. * gcc.target/riscv/cset-sext-zicond-rtl32.c: New file.	2024-01-26 21:47:40 +00:00
Maciej W. Rozycki	d4e15084e2	RISC-V/testsuite: Add RTL pr105314.c testcase variants Add a pair of RTL tests, for RV64 and RV32 respectively, corresponding to the existing pr105314.c test. They have been produced from RTL code as at the entry of the "ce1" pass for pr105314.c compiled at -O3. gcc/testsuite/ * gcc.target/riscv/pr105314-rtl.c: New file. * gcc.target/riscv/pr105314-rtl32.c: New file.	2024-01-26 21:47:40 +00:00
Maciej W. Rozycki	3e3b9b708d	RISC-V/testsuite: Also verify if-conversion runs for pr105314.c Verify that if-conversion succeeded through noce_try_store_flag_mask, as per PR rtl-optimization/105314, tightening the test case and making it explicit. gcc/testsuite/ * gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.	2024-01-26 21:47:40 +00:00
Maciej W. Rozycki	a0596173c8	RISC-V/testsuite: Widen coverage for pr105314.c The optimization levels pr105314.c is iterated over are needlessly overridden with "-O2", limiting the coverage of the test case to that level, perhaps with additional options the original optimization level has been supplied with. We could prevent the extra iterations other than "-O2" from being run, but the transformation made by if-conversion is also expected to happen at other optimization levels, so include them all, and also make sure no reverse-condition branch appears in output, moving the `dg-final' command to the bottom, as with most test cases. gcc/testsuite/ * gcc.target/riscv/pr105314.c: Replace `dg-options' command with `dg-skip-if'. Also reject "bne" with `dg-final'.	2024-01-26 21:47:40 +00:00
Robin Dapp	861997a9c7	genopinit: Split init_all_optabs [PR113575]. init_all_optabs initializes > 10000 patterns for riscv targets. This leads to pathological situations in dataflow analysis (which can occur with many adjacent stores). To alleviate this, this patch makes genopinit split the init_all_optabs function into several init_optabs_xx functions that each initialize 1000 patterns. With this change insn-opinit.cc's compilation time is reduced from 4+ minutes to 1:30 and memory consumption decreases from 1.2G to 630M. gcc/ChangeLog: PR other/113575 * genopinit.cc (main): Split init_all_optabs into functions of 1000 patterns each.	2024-01-26 22:12:06 +01:00
Gaius Mulley	eb619490b0	modula2: detect string and pointer formal and actual parameter incompatibility This patch improves the location accuracy of parameters and fixes bugs in parameter checking in M2Check. It also corrects the location of constant declarations. gcc/m2/ChangeLog: * gm2-compiler/M2Check.mod (dumpIndice): New procedure. (dumpIndex): New procedure. (dumptInfo): New procedure. (buildError4): Add comment and pass formal and actual to MetaError4. Improve text describing error. (buildError2): Generate different error descriptions for the three error kinds. (checkConstMeta): Add block comment. Add more meta checks and call doCheckPair to complete string const checking. Add tinfo parameter. (checkConstEquivalence): Add tinfo parameter. * gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList): Print the length of a const string. * gm2-compiler/M2GenGCC.mod (CodeParam): Remove parameters op1, op2 and op3. (doParam): Add paramtok parameter. Use paramtok instead rather than CurrentQuadToken. (CodeParam): Rewrite. * gm2-compiler/M2Quads.mod (CheckProcedureParameters): Add comments explaining that const strings are not checked in M2Quads.mod. (FailParameter): Use MetaErrorT2 with tokpos rather than MetaError2. (doBuildBinaryOp): Assign OldPos and OperatorPos before the IF block. * gm2-compiler/SymbolTable.mod (PutConstString): Add call to InitWhereDeclaredTok. gcc/testsuite/ChangeLog: * gm2/pim/fail/badpointer4.mod: New test. * gm2/pim/fail/strconst.def: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>	2024-01-26 19:04:48 +00:00
Richard Biener	c34ab549d8	Avoid registering unsupported OMP offload devices The following avoids registering unsupported GCN offload devices when iterating over available ones. With a Zen4 desktop CPU you will have an IGPU (unsupported) which will otherwise be made available. This causes testcases like libgomp.c-c++-common/non-rect-loop-1.c which iterate over all devices to FAIL. libgomp/ * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out agents with unsupported ISA.	2024-01-26 15:36:35 +01:00
Richard Biener	209ed06c3a	Fix architecture support in OMP_OFFLOAD_init_device for gcn The following makes the existing architecture support check work instead of being optimized away (enum vs. -1). This avoids later asserts when we assume such devices are never actually used. libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add. (isa_code): Return that instead of -1. (GOMP_OFFLOAD_init_device): Adjust.	2024-01-26 15:36:35 +01:00
Tobias Burnus	56d0aba11a	amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs gcc/ChangeLog: * config.gcc (amdgcn--): Add gfx1030 and gfx1100 to TM_MULTILIB_CONFIG. * doc/install.texi (Configuration amdgcn--): Mention gfx1030/gfx1100. * doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to -march/-mtune. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4.h: Add variant functions for gfx1030 and gfx1100. * testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-26 15:11:09 +01:00
Andrew Stubbs	99890e1552	amdgcn: additional gfx1030/gfx1100 support This is enough to get gfx1030 and gfx1100 working; there are still some test failures to investigate, and probably some tuning to do. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. * config/gcn/gcn-valu.md (all_convert): New iterator. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New define_expand, and rename the old one to ... (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. (<u>mulqihi3_scalar): Likewise. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. libgomp/ChangeLog: * config/gcn/time.c (RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs <ams@baylibre.com>	2024-01-26 11:38:47 +00:00
Nathaniel Shead	a0dde47f84	c++: Emit definitions of ODR-used static members imported from modules [PR112899] Static data members marked 'inline' should be emitted in TUs where they are ODR-used. We need to make sure that inlines imported from modules are correctly added to the 'pending_statics' map so that they get emitted if needed, otherwise the attached testcase fails to link. PR c++/112899 gcc/cp/ChangeLog: * cp-tree.h (note_variable_template_instantiation): Rename to... (note_vague_linkage_variable): ...this. * decl2.cc (note_variable_template_instantiation): Rename to... (note_vague_linkage_variable): ...this. * pt.cc (instantiate_decl): Rename usage of above function. * module.cc (trees_in::read_var_def): Remember pending statics that we stream in. gcc/testsuite/ChangeLog: * g++.dg/modules/init-4_a.C: New test. * g++.dg/modules/init-4_b.C: New test. * g++.dg/modules/init-6_a.H: New test. * g++.dg/modules/init-6_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jason Merrill <jason@redhat.com>	2024-01-26 22:28:45 +11:00
Richard Biener	f9b143d239	tree-optimization/113602 - datarefs of non-addressables We can end up creating ADDR_EXPRs of non-addressable entities during for example vectorization. The following plugs this in data-ref analysis when that would create such invalid ADDR_EXPR as part of analyzing the ref structure. PR tree-optimization/113602 * tree-data-ref.cc (dr_analyze_innermost): Fail when the base object isn't addressable. * gcc.dg/pr113602.c: New testcase.	2024-01-26 11:25:05 +01:00
Tobias Burnus	4b5650acb3	gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC Since LLVM commit 082f87c9d418 (Pull Req. #79038; will become LLVM 18) "[AMDGPU] Change default AMDHSA Code Object version to 5" the default - when no --amdhsa-code-object-version= is used - was bumped. Using --amdhsa-code-object-version=5 is supported (with unknown limitations) since LLVM 14. GCC required for proper support at least LLVM 13.0.1 such that explicitly using COV5 is not possible. Unfortunately, the COV number matters for debugging ("-g") as mkoffload.cc extracts debugging data from the host's object file and writes into an AMD GPU object file it creates. And all object files linked together must have the same ABI version. gcc/ChangeLog: * config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): New; creates the "--amdhsa-code-object-version=" argument. (ASM_SPEC): Use it; replace previous version of it. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>	2024-01-26 10:14:09 +01:00
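
Note on a12b0e9360 (PR113614) above: the message argues that a narrower signed value sign-extended to a wider unsigned type may be treated as either signed or unsigned for multiplication, but not for division. The sketch below illustrates that arithmetic only. It models plain 32-bit two's-complement integers in Python instead of _BitInt, it does not call any GCC or libgcc code, and the helper names (sign_extend_to_unsigned, trunc_div) are hypothetical.

```python
BITS = 32                      # assumed width of the wider unsigned type
MASK = (1 << BITS) - 1


def sign_extend_to_unsigned(value: int, from_bits: int) -> int:
    """Sign-extend a from_bits-wide value, then view it as unsigned BITS-wide."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit set: value is negative
        value -= 1 << from_bits
    return value & MASK


def trunc_div(a: int, b: int) -> int:
    """C-style integer division that truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


narrow = -2                                   # an 8-bit signed value
wide = sign_extend_to_unsigned(narrow, 8)     # the same bits seen as unsigned

# Multiplication: the low BITS bits agree for both interpretations.
mul_as_signed = (narrow * 3) & MASK
mul_as_unsigned = (wide * 3) & MASK

# Division: the two interpretations give different results.
div_as_signed = trunc_div(narrow, 3) & MASK
div_as_unsigned = wide // 3

print(hex(wide), hex(mul_as_signed), hex(mul_as_unsigned))
print(div_as_signed, div_as_unsigned)
```

Here the two products match while the two quotients do not, which is consistent with the message's point that passing the narrower operand with a negative precision to a division handler yields the signed result instead of the unsigned one.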

208627 commits