The following implements storing to a non-MEM_P with a variable
offset. We usually avoid this by forcing expansion to memory but
this doesn't work for hard register variables. The solution is
to spill and operate on the stack.
PR middle-end/113622
* expr.cc (expand_assignment): Spill hard registers if
we index them with a variable offset.
* gcc.target/i386/pr113622-2.c: New testcase.
* gcc.target/i386/pr113622-3.c: Likewise.
The following expands .VEC_SET and .VEC_EXTRACT instruction selection
to global hard registers, not only automatic variables (possibly)
promoted to registers. This can avoid some ICEs later and create
better code.
PR middle-end/113622
* gimple-isel.cc (gimple_expand_vec_set_extract_expr):
Also allow DECL_HARD_REGISTER variables.
* gcc.target/i386/pr113622-1.c: New testcase.
The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop. In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.
This patch fixes that by introducing a general mechanism to avoid this
sort of problem. We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value. It then
pre-computes the next iterator value at all iterations, so it doesn't
matter if the original iterator got invalidated during the loop body, we
can still move safely to the next iteration.
We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616. While doing this, I
remembered that cleanup_tombstones () had the same problem. I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.
While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast<set_info *> to as_a<set_info *>,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.
gcc/ChangeLog:
PR target/113616
* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
Use iterate_safely when iterating over debug uses.
(fixup_debug_uses): Likewise.
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
over nondebug insns instead of manually maintaining the next insn.
* iterator-utils.h (class safe_iterator): New.
(iterate_safely): New.
gcc/testsuite/ChangeLog:
PR target/113616
* gcc.c-torture/compile/pr113616.c: New test.
Save callee-saved registers in noreturn functions for -O0/-Og so that
debugger can restore callee-saved registers in caller's frame.
Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute
lookup.
gcc/
PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Save
callee-saved registers in noreturn functions for -O0/-Og.
gcc/testsuite/
PR target/38534
* gcc.target/i386/pr38534-5.c: New file.
* gcc.target/i386/pr38534-6.c: Likewise.
gcc/testsuite/ChangeLog:
PR target/112950
* gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive
and instead use #pragma GCC for including arm_sve.h.
This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts. We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount. But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.
We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.
The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges. This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
pr113281-1.c is the original testcase. pr113281-[23].c failed
before the patch due to overly optimistic narrowing. pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.
gcc/
PR target/113281
* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
workaround for right shifts.
(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
(vect_determine_precisions_from_range): Be more selective about
which codes can be narrowed based on their input and output ranges.
For shifts, require at least one more bit of precision than the
maximum shift amount.
gcc/testsuite/
PR target/113281
* gcc.dg/vect/pr113281-1.c: New test.
* gcc.dg/vect/pr113281-2.c: Likewise.
* gcc.dg/vect/pr113281-3.c: Likewise.
* gcc.dg/vect/pr113281-4.c: Likewise.
* gcc.dg/vect/pr113281-5.c: Likewise.
The -march-map= options maps the compute capability to the closest
lower compute capability that has been implemented; for sm_89 and
sm_90a, that were previously missing, that's currently -march=sm_80
alias -misa=sm_80.
gcc/ChangeLog:
* config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
Some more '-g' fixes as the .mkoffload.dbg.o debug file's has elf flags
which did not match those generated for the compilation, leading to linker
errors. For .mkoffload.dbg.o, the elf flags are generated by mkoffload
itself - while for the other .o files, that's done by the compiler via
setting default and mainly via the ASM_SPEC.
This is a follow up to r14-8332-g13127dac106724 which fixed an issue
caused by the default arch. In this patch, it is mainly for gfx1100
and gfx1030 which always failed. It also affects gfx906 and possibly
gfx900 but only when using the -mxnack/-msram-ecc flags explicitly.
What happens on the compiler side is mainly determined by gcn-hsa.h's
and otherwise by some default setting. In particular for xnack and
sram_ecc, there is:
For gfx1100 and gfx1030, neither xnack nor sram_ecc is set (only
'+wavefrontsize64').
For fiji, gfx900, gfx906 and gfx908 there is always -mattr=-xnack and
for all but gfx908 also -msram-ecc=no - independent of what has been
passed to the compiler. However, on the elf flags, the result differs:
For fiji, due to the HSACOv3, it is always set to 0 via
copy_early_debug_info; for gfx900, gfx906 and gfx908, xnack is OFF.
For sram-ecc, it is 'unset' for gfx900, 'any' for gfx906 and for
gfx908 it is 'any' unless overridden.
For gfx90a, the -msram-ecc= and -mxnack= are passed on, or if not present,
...=any is passed on. Note that this "any" is different from argument
nor present at elf flag level:
For XNACK: unset/unsupported is 0, any = 0x100, off = 0x200, on = 0x300.
For SRAMECC: unset/unsupported is 0, any = 0x400, off = 0x800, on = 0xc00.
The obstack_ptr_grow changes are more to avoid confusion than having an
actual effect as they would overwise be filtered out via the ASM_SPEC.
gcc/ChangeLog:
PR other/111966
* config/gcn/mkoffload.cc (SET_XNACK_UNSET, TEST_SRAM_ECC_UNSET): New.
(SET_SRAM_ECC_UNSUPPORTED): Renamed to ...
(SET_SRAM_ECC_UNSET): ... this.
(copy_early_debug_info): Remove gfx900 special case, now handled as
part of the generic handling.
(main): Update SRAM_ECC and XNACK for the -march as done in gcn-hsa.h.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4.h: Use gfx1100/gfx1030
function not gfx90a for gfx1100/gfx1030 context selector.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
On the following testcase we emit an invalid range of [2, 1] due to
UB in the source. Older VRP code silently swapped the boundaries and
made [1, 2] range out of it, but newer code just ICEs on it.
The reason for pdata->minlen 2 is that we see a memcpy in this case
setting both elements of the array to non-zero value, so strlen (a)
can't be smaller than 2. The reason for pdata->maxlen 1 is that in
char a[2] array without UB there can be at most 1 non-zero character
because there needs to be '\0' termination in the buffer too.
IMHO we shouldn't create invalid ranges like that and even creating
for that case a range [1, 2] looks wrong to me, so the following patch
just doesn't set maxlen in that case to the array size - 1, matching
what will really happen at runtime when triggering such UB (strlen will
be at least 2, perhaps more or will crash).
This is what the second hunk of the patch does.
The first hunk fixes a fortunately harmless thinko.
If the strlen pass knows the string length (i.e. get_string_length
function returns non-NULL), we take a different path, we get to this
only if all we know is that there are certain number of non-zero
characters but we don't know what it is followed with, whether further
non-zero characters or zero termination or either of that.
If we know exactly how many non-zero characters it is, such as
char a[42];
...
memcpy (a, "01234567890123456789", 20);
then we take an earlier if for the INTEGER_CST case and set correctly
just pdata->minlen to 20 in that case, but if we have something like
int len;
...
if (len < 15 || len > 32) return;
memcpy (a, "0123456789012345678901234567890123456789", len);
then we have [15, 32] range for the nonzero_chars and we set pdata->minlen
correctly to 15, but incorrectly set also pdata->maxlen to 32. That is
not what the above implies, it just means that in some cases we know that
there are at least 32 non-zero characters, followed by something we don't
know. There is no guarantee that there is '\0' right after it, so it
means nothing.
The reason this is harmless, just confusing, is that the code a few lines
later fortunately overwrites this incorrect pdata->maxlen value with
something different (either array length - 1 or all ones etc.).
2024-01-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110603
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
setting of pdata->maxlen to vr.upper_bound (which is unconditionally
overwritten anyway). Avoid creating invalid range with minlen
larger than maxlen. Formatting fix.
* gcc.c-torture/compile/pr110603.c: New test.
The inliner puts variables for parameters of the inlined functions
in the inline scope in reverse order. The following reverses them
again so that we get consistent ordering between the
DW_TAG_subprogram DW_TAG_formal_parameter and the
DW_TAG_inlined_subroutine DW_TAG_formal_parameter set.
I failed to create a testcase with regexps since the inline
instances have just abstract origins and so I can't match them up.
PR debug/103047
* tree-inline.cc (initialize_inlined_parameters): Reverse
the decl chain of inlined parameters.
As PR109705#c17, commit r14-7270 missed to consider long
type is 32bit with option -m32. This patch is take care of
it accordingly.
Note that the vect_long_mult is supposed to be only used in
vect/ (generic), powerpc_altivec_ok would be guaranteed.
PR testsuite/109705
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_vect_long_mult):
Fix powerpc*-*-* checks by considering ilp32.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
At present, Evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register. This instruction, when issued from user-space
results in a trap by the kernel which then returns the value read in
by the system register. Given the undesirable nature of the
computational expense associated with the context switch, it is
important to implement mechanisms to, wherever possible, forgo the
operation.
In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers. Where one of these early tests fail, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary expensive kernel-mediated access to system
registers.
libatomic/ChangeLog:
* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
(has_lse128): Add test for LSE2.
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of a location with 128-bit value held
in a pair of registers, with original data loaded into the same 2
registers.
* SWPP - Atomic swap of one 128-bit value with 128-bit value held
in a pair of registers.
It is worth noting that in keeping with existing 128-bit atomic
operations in `atomic_16.S', we have chosen to merge certain
less-restrictive orderings into more restrictive ones. This is done
to minimize the number of branches in the atomic functions, minimizing
both the likelihood of branch mispredictions and, in keeping code
small, limit the need for extra fetch cycles.
Past benchmarking has revealed that acquire is typically slightly
faster than release (5-10%), such that for the most frequently used
atomics (CAS and SWP) it makes sense to add support for acquire, as
well as release.
Likewise, it was identified that combining acquire and release typically
results in little to no penalty, such that it is of negligible benefit
to distinguish between release and acquire-release, making the
combining release/acq_rel/seq_cst a worthwhile design choice.
This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.
In order to do this, the following changes are made:
1. Add a configure-time check to check for LSE128 support in the
assembler.
2. Edit host-config.h so that when N == 16, nifunc = 2.
3. Where available due to LSE128, implement the second ifunc, making
use of the novel instructions.
4. For atomic functions unable to make use of these new
instructions, define a new alias which causes the _i1 function
variant to point ahead to the corresponding _i2 implementation.
libatomic/ChangeLog:
* Makefile.am (AM_CPPFLAGS): add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HWCAP2_LSE128): Likewise.
* configure.ac: Add call to
LIBAT_TEST_FEAT_AARCH64_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* Makefile.in: Likewise.
* auto-config.h.in: Likewise.
With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.
We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:
bool
resolver (unsigned long hwcap, const __ifunc_arg_t *features);
{
return (features->hwcap2 & HWCAP2_<FEAT_NAME>);
}
libatomic/ChangeLog:
* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Conditionally-defined if `sys/ifunc.h' not found.
(_IFUNC_ARG_HWCAP): Likewise.
(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
(ifunc1): Modify function signature to accept __ifunc_arg_t
argument.
* configure.tgt: Add second `const __ifunc_arg_t *features'
argument to IFUNC_RESOLVER_ARGS.
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.
This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:
#define LSE(NAME) NAME##_i1
Where we wish to generate ifunc names with the pre-processor's token
concatenation feature, we add a level of indirection to previous macro
calls. If before we would have had`MACRO(<name>_i<n>)', we now have
`MACRO_FEAT(name, feature)'. Where we wish to refer to base
functionality (i.e., functions where ifunc suffixes are absent), the
original `MACRO(<name>)' may be used to bypass suffixing.
Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same. For example, the entry
and endpoints of `libat_store_16' remain defined by:
ENTRY (libat_store_16)
and
END (libat_store_16)
For the LSE2 implementation of the same 16-byte atomic store, we now
have:
ENTRY_FEAT (libat_store_16, LSE2)
and
END_FEAT (libat_store_16, LSE2)
For the aliasing of function names, we define the following new
implementation of the ALIAS macro:
ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)
Defining the `CORE(NAME)' macro to be the identity operator, it
returns the base function name unaltered and allows us to alias
target-specific ifuncs to the corresponding base implementation.
For example, we'd alias the LSE2 `libat_exchange_16' to it base
implementation with:
ALIAS (libat_exchange_16, LSE2, CORE)
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
(ALIAS1): New.
gcc/fortran/ChangeLog:
PR fortran/113377
* trans-expr.cc (conv_dummy_value): Treat NULL actual argument to
optional dummy with the VALUE attribute as not present.
(gfc_conv_procedure_call): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/113377
* gfortran.dg/optional_absent_11.f90: New test.
We have reports of regressions in both Objective-C and Objective-C++ on
Darwin23 (macOS 14). In some cases, these are linker warnings about the
alignment of CFString constants; in other cases the built executables
crash during runtime initialization. The underlying issue is the same in
both cases; since the objects (CFStrings, Objective-C meta-data) are TU-
local, we are choosing to increase their alignment for efficiency - to
values greater than ABI alignment.
However, although these objects are TU-local, they are also visible to the
linker (since they are placed in specific named sections). In many cases
the metadata can be regarded as tables of data, and thus it is expected
that these sections can be concatenated from multiple TUs and the data
treated as tabular. In order for this to work the data cannot be allowed
to exceed ABI alignment - which leads to the crashes.
For GCC-15+ it would be nice to find a more elegant solution to this issue
(perhaps by adjusting the concept of binds-locally to exclude specific
named sections) - but I do not want to do that in stage 4.
The solution here is to force the alignment to be preserved as created by
setting DECL_USER_ALIGN on the relevant objects.
gcc/ChangeLog:
* config/darwin.cc (darwin_build_constant_cfstring): Prevent over-
alignment of CFString constants by setting DECL_USER_ALIGN.
gcc/objc/ChangeLog:
* objc-next-runtime-abi-02.cc (build_v2_address_table): Prevent
over-alignment of Objective-C metadata by setting DECL_USER_ALIGN
on relevant variables.
(build_v2_protocol_list_address_table): Likewise.
(generate_v2_protocol_list): Likewise.
(generate_v2_meth_descriptor_table): Likewise.
(generate_v2_meth_type_list): Likewise.
(generate_v2_property_table): Likewise.
(generate_v2_dispatch_table): Likewise.
(generate_v2_ivars_list): Likewise.
(generate_v2_class_structs): Likewise.
(build_ehtype): Likewise.
* objc-runtime-shared-support.cc (generate_strings): Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Two of the encode testcases include '-lobjc' as their dg-options.
Since the library is already appended as part of the generic testsuite
handling, this means that two instances appear on the link line leading
to spurious warnings from Darwin's new linker.
gcc/testsuite/ChangeLog:
* obj-c++.dg/encode-10.mm: Remove unneeded '-lobjc' option addition.
* obj-c++.dg/encode-9.mm: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.
As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*. In carrying our the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.
PR libgcc/113402
gcc/ChangeLog:
* builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED
and BUILT_IN_GCC_NESTED_PTR_DELETED.
* builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED,
BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
rename the library fallbacks to __gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
and __gcc_nested_func_ptr_deleted.
* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED.
* tree.cc (build_common_builtin_nodes): Build the
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local
builtins only for non-explicit.
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
Currently when a test fails, we print out a lot of information,
this includes items that are not stable between invocations (e.g.
the PID for the executable). That makes automated comparisons
between test runs flag any persistent fails as new ones each time
which is not usually what is wanted.
This patch amends the error output to drop the variable portion
of the message and retain items that should only change if the
failure mode changes.
gcc/testsuite/ChangeLog:
* jit.dg/jit.exp: Filter error output to remove per-run
variable content.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
The purpose of this test is to make sure that constant propagation is
achieved with the proper optimization level, so a BPF call instruction
to a kernel helper is generated. This patch updates the patch so it
also covers kernel helpers defined with constant static pointers.
The motivation for this patch is:
https://lore.kernel.org/bpf/20240127185031.29854-1-jose.marchesi@oracle.com/T/#u
Tested in bpf-unknown-none target x86_64-linux-gnu host.
gcc/testsuite/ChangeLog
* gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Add constant
version of kernel helper static pointer.
Commit r11-1235 addressed issues with bounds of unlimited polymorphic array
dummies. However, using the descriptor from sym->backend_decl does break
the case of CLASS array dummies. The obvious solution is to restrict the
fix to the unlimited polymorphic case, thus keeping the original descriptor
in the ordinary case.
gcc/fortran/ChangeLog:
PR fortran/104908
* trans-array.cc (gfc_conv_array_ref): Restrict use of transformed
descriptor (sym->backend_decl) to the unlimited polymorphic case.
gcc/testsuite/ChangeLog:
PR fortran/104908
* gfortran.dg/pr104908.f90: New test.
There is no need to save callee-saved registers in noreturn functions
if they don't throw nor support exceptions. We can treat them the same
as functions with no_callee_saved_registers attribute.
Adjust stack-check-17.c for noreturn function which no longer saves any
registers.
With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from
__libc_start_main:
endbr64
push %r15
push %r14
mov %rcx,%r14
push %r13
push %r12
push %rbp
mov %esi,%ebp
push %rbx
mov %rdx,%rbx
sub $0x28,%rsp
mov %rdi,(%rsp)
mov %fs:0x28,%rax
mov %rax,0x18(%rsp)
xor %eax,%eax
test %r9,%r9
to
__libc_start_main:
endbr64
sub $0x28,%rsp
mov %esi,%ebp
mov %rdx,%rbx
mov %rcx,%r14
mov %rdi,(%rsp)
mov %fs:0x28,%rax
mov %rax,0x18(%rsp)
xor %eax,%eax
test %r9,%r9
In Linux kernel 6.7.0 on x86-64, do_exit is changed from
do_exit:
endbr64
call <do_exit+0x9>
push %r15
push %r14
push %r13
push %r12
mov %rdi,%r12
push %rbp
push %rbx
mov %gs:0x0,%rbx
sub $0x28,%rsp
mov %gs:0x28,%rax
mov %rax,0x20(%rsp)
xor %eax,%eax
call *0x0(%rip) # <do_exit+0x39>
test $0x2,%ah
je <do_exit+0x8d3>
to
do_exit:
endbr64
call <do_exit+0x9>
sub $0x28,%rsp
mov %rdi,%r12
mov %gs:0x28,%rax
mov %rax,0x20(%rsp)
xor %eax,%eax
mov %gs:0x0,%rbx
call *0x0(%rip) # <do_exit+0x2f>
test $0x2,%ah
je <do_exit+0x8c9>
I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch. The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.
GCC master branch build time in seconds:
before after improvement
30043.75user 30013.16user 0%
1274.85system 1243.72system 2.4%
GCC master branch test time in seconds (new tests added):
before after improvement
216035.90user 216547.51user 0
27365.51system 26658.54system 2.6%
gcc/
PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Don't
save and restore callee saved registers for a noreturn function
with nothrow or compiled with -fno-exceptions.
gcc/testsuite/
PR target/38534
* gcc.target/i386/pr38534-1.c: New file.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/stack-check-17.c: Updated.
When an interrupt handler is implemented by an assembly stub which does:
1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.
it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.
Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers. Such a function won't save and
restore any registers. Classify function call-saved register handling
type with:
1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.
Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function. Otherwise,
callee-saved registers won't be preserved.
After a no_callee_saved_registers function is called, all registers may
be clobbered. If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
gcc/
PR target/103503
PR target/113312
* config/i386/i386-expand.cc (ix86_expand_call): Replace
no_caller_saved_registers check with call_saved_registers check.
Clobber all registers that are not used by the callee with
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Set
call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
noreturn function. Disallow no_callee_saved_registers with
interrupt or no_caller_saved_registers attributes together.
(ix86_set_current_function): Replace no_caller_saved_registers
check with call_saved_registers check.
(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
(ix86_handle_call_saved_registers_attribute): This.
(ix86_gnu_attributes): Add
ix86_handle_call_saved_registers_attribute.
* config/i386/i386.cc (ix86_conditional_register_usage): Replace
no_caller_saved_registers check with call_saved_registers check.
(ix86_function_ok_for_sibcall): Don't allow callee with
no_callee_saved_registers attribute when the calling function
has callee-saved registers.
(ix86_comp_type_attributes): Also check
no_callee_saved_registers.
(ix86_epilogue_uses): Replace no_caller_saved_registers check
with call_saved_registers check.
(ix86_hard_regno_scratch_ok): Likewise.
(ix86_save_reg): Replace no_caller_saved_registers check with
call_saved_registers check. Don't save any registers for
TYPE_NO_CALLEE_SAVED_REGISTERS. Save all registers with
TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
no_callee_saved_registers attribute is called.
(find_drap_reg): Replace no_caller_saved_registers check with
call_saved_registers check.
* config/i386/i386.h (call_saved_registers_type): New enum.
(machine_function): Replace no_caller_saved_registers with
call_saved_registers.
* doc/extend.texi: Document no_callee_saved_registers attribute.
gcc/testsuite/
PR target/103503
PR target/113312
* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
* gcc.target/i386/no-callee-saved-1.c: Likewise.
* gcc.target/i386/no-callee-saved-2.c: Likewise.
* gcc.target/i386/no-callee-saved-3.c: Likewise.
* gcc.target/i386/no-callee-saved-4.c: Likewise.
* gcc.target/i386/no-callee-saved-5.c: Likewise.
* gcc.target/i386/no-callee-saved-6.c: Likewise.
* gcc.target/i386/no-callee-saved-7.c: Likewise.
* gcc.target/i386/no-callee-saved-8.c: Likewise.
* gcc.target/i386/no-callee-saved-9.c: Likewise.
* gcc.target/i386/no-callee-saved-10.c: Likewise.
* gcc.target/i386/no-callee-saved-11.c: Likewise.
* gcc.target/i386/no-callee-saved-12.c: Likewise.
* gcc.target/i386/no-callee-saved-13.c: Likewise.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-16.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/no-callee-saved-18.c: Likewise.
The following testcase is miscompiled, because some narrower value
is sign-extended to wider unsigned _BitInt used as division operand.
handle_operand_addr for that case returns the narrower value and
precision -prec_of_narrower_value. That works fine for multiplication
(at least, normal multiplication, but we don't merge casts with
.MUL_OVERFLOW or the ubsan multiplication right now), because the
result is the same whether we treat the arguments as signed or unsigned.
But is completely wrong for division/modulo or conversions to
floating-point, if we pass negative prec for an input operand of a libgcc
handler, those treat it like a negative number, not an unsigned one
sign-extended from something smaller (and it doesn't know to what precision
it has been extended).
So, the following patch fixes it by making sure we don't merge such
sign-extensions to unsigned _BitInt type with division, modulo or
conversions to floating point.
2024-01-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113614
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
TRUNC_MOD_EXPR or FLOAT_EXPR uses.
* gcc.dg/torture/bitint-54.c: New test.
We generally allow merging mergeable stmts with some final cast (but not
further casts or mergeable operations after the cast). As some casts
are handled conditionally, if (idx < cst) handle_operand (idx); else if
idx == cst) handle_operand (cst); else ..., we must sure that e.g. the
mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in handle_operand
called from such casts, because it ICEs on invalid SSA_NAME form (that part
could be fixable by adding further PHIs) but also because we'd need to
correctly propagate the overflow flags from the if to else if.
So, instead lower_mergeable_stmt handles an outermost widening cast (or
widening cast feeding outermost store) specially.
The problem was similar to PR113408, that VIEW_CONVERT_EXPR tree is
present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR,
so the checks whether the outermost cast should be handled didn't handle
the VCE case and so handle_plus_minus was called from the conditional
handle_cast.
2024-01-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113568
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
in the widening extension checks.
* gcc.dg/bitint-78.c: New test.
While the SSA coalescing performed by lower bitint prints some information
if -fdump-tree-bitintlower-details, it is really hard to read and doesn't
contain the most important information which one looks for when debugging
bitint lowering issues, namely what VAR_DECLs (or PARM_DECLs/RESULT_DECLs)
each SSA_NAME in large_huge.m_names bitmap maps to.
So, the following patch adds dumping of that, so that we know that say
_3 -> bitint.3
_8 -> bitint.7
_16 -> bitint.7
etc.
2024-01-27 Jakub Jelinek <jakub@redhat.com>
* gimple-lower-bitint.cc (gimple_lower_bitint): For
TDF_DETAILS dump mapping of SSA_NAMEs to decls.
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
The new testcase line-map-3.C contains XFAILed examples where the locations
are wrong.
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
* g++.dg/pch/line-map-3.C: New test.
* g++.dg/pch/line-map-3.Hs: New test.
When you're not regularly exposed to this warning, it is
easy to be misled by its wording, believing that there's
something else in the function that stops it from being
inlined, something other than the lack of also being
*declared* inline. Also, clang does not warn.
It's just a warning: without the inline directive, there has
to be a secondary reason for the function to be inlined,
other than the always_inline attribute, a reason that may be
in effect despite the warning.
Whenever the text is quoted in inline-related bugzilla
entries, there seems to often have been an initial step of
confusion that has to be cleared, for example in PR55830.
A file in the powerpc-specific parts of the test-suite,
gcc.target/powerpc/vec-extract-v16qiu-v2.h, has a comment
and seems to be another example, and I testify as the
first-hand third "experience". The wording has been the
same since the warning was added.
Let's just tweak the wording, adding the cause, so that the
reason for the warning is clearer. This hopefully stops the
user from immediately asking "'Might'? Because why?" and
then going off looking at the function body - or grepping
the gcc source or documentation, or enter a bug-report
subsequently closed as resolved/invalid.
Since the message is only appended with additional
information, no test-case actually required adjustment.
I still changed them, so the message is covered.
gcc:
* cgraphunit.cc (process_function_and_variable_attributes): Tweak
the warning for an attribute-always_inline without inline declaration.
gcc/testsuite:
* g++.dg/Wattributes-3.C: Adjust expected warning.
* gcc.dg/fail_always_inline.c: Ditto.
Currently the DECL_STRUCT_FUNCTION for a declaration is always
reconstructed from scratch. This causes issues though, as some fields
used by other parts of the compiler (in this case, specifically
'function_{start,end}_locus') are then not correctly initialised. This
patch makes sure that these fields are also read and written.
PR c++/113580
gcc/cp/ChangeLog:
* module.cc (struct post_process_data): Create.
(trees_in::post_decls): Use.
(trees_in::post_process): Return entire vector at once.
Change overload to take post_process_data instead of tree.
(trees_out::write_function_def): Write needed flags from
DECL_STRUCT_FUNCTION.
(trees_in::read_function_def): Read them and pass to
post_process.
(module_state::read_cluster): Write flags into cfun.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr113580_a.C: New test.
* g++.dg/modules/pr113580_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Add RTL tests, for RV64 and RV32 where appropriate, corresponding to the
existing cset-sext.c tests. They have been produced from RTL code as at
the entry of the "ce1" pass for the respective cset-sext.c tests built
at -O3.
gcc/testsuite/
* gcc.target/riscv/cset-sext-rtl.c: New file.
* gcc.target/riscv/cset-sext-rtl32.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl32.c: New file.
* gcc.target/riscv/cset-sext-thead-rtl.c: New file.
* gcc.target/riscv/cset-sext-ventana-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl32.c: New file.
Add a pair of RTL tests, for RV64 and RV32 respectively, corresponding
to the existing pr105314.c test. They have been produced from RTL code
as at the entry of the "ce1" pass for pr105314.c compiled at -O3.
gcc/testsuite/
* gcc.target/riscv/pr105314-rtl.c: New file.
* gcc.target/riscv/pr105314-rtl32.c: New file.
Verify that if-conversion succeeded through noce_try_store_flag_mask, as
per PR rtl-optimization/105314, tightening the test case and making it
explicit.
gcc/testsuite/
* gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.
The optimization levels pr105314.c is iterated over are needlessly
overridden with "-O2", limiting the coverage of the test case to that
level, perhaps with additional options the original optimization level
has been supplied with. We could prevent the extra iterations other
than "-O2" from being run, but the transformation made by if-conversion
is also expected to happen at other optimization levels, so include them
all, and also make sure no reverse-condition branch appears in output,
moving the `dg-final' command to the bottom, as with most test cases.
gcc/testsuite/
* gcc.target/riscv/pr105314.c: Replace `dg-options' command with
`dg-skip-if'. Also reject "bne" with `dg-final'.
init_all_optabs initializes > 10000 patterns for riscv targets. This
leads to pathological situations in dataflow analysis (which can occur
with many adjacent stores).
To alleviate this this patch makes genopinit split the init_all_optabs
function into several init_optabs_xx functions that each initialize 1000
patterns.
With this change insn-opinit.cc's compilation time is reduced from 4+
minutes to 1:30 and memory consumption decreases from 1.2G to 630M.
gcc/ChangeLog:
PR other/113575
* genopinit.cc (main): Split init_all_optabs into functions
of 1000 patterns each.
This patch improves the location accuracy of parameters and fixes bugs
in parameter checking in M2Check. It also corrects the location
of constant declarations.
gcc/m2/ChangeLog:
* gm2-compiler/M2Check.mod (dumpIndice): New procedure.
(dumpIndex): New procedure.
(dumptInfo): New procedure.
(buildError4): Add comment and pass formal and actual to
MetaError4. Improve text describing error.
(buildError2): Generate different error descriptions for
the three error kinds.
(checkConstMeta): Add block comment. Add more meta checks
and call doCheckPair to complete string const checking.
Add tinfo parameter.
(checkConstEquivalence): Add tinfo parameter.
* gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList):
Print the length of a const string.
* gm2-compiler/M2GenGCC.mod (CodeParam): Remove parameters
op1, op2 and op3.
(doParam): Add paramtok parameter. Use paramtok instead rather
than CurrentQuadToken.
(CodeParam): Rewrite.
* gm2-compiler/M2Quads.mod (CheckProcedureParameters):
Add comments explaining that const strings are not checked
in M2Quads.mod.
(FailParameter): Use MetaErrorT2 with tokpos rather than
MetaError2.
(doBuildBinaryOp): Assign OldPos and OperatorPos before the
IF block.
* gm2-compiler/SymbolTable.mod (PutConstString): Add call to
InitWhereDeclaredTok.
gcc/testsuite/ChangeLog:
* gm2/pim/fail/badpointer4.mod: New test.
* gm2/pim/fail/strconst.def: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
The following avoids registering unsupported GCN offload devices
when iterating over available ones. With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available. This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.
libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
The following makes the existing architecture support check work
instead of being optimized away (enum vs. -1). This avoids
later asserts when we assume such devices are never actually
used.
libgomp/
* plugin/plugin-gcn.c
(EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add.
(isa_code): Return that instead of -1.
(GOMP_OFFLOAD_init_device): Adjust.
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
define_expand, and rename the old one to ...
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
(<u>mulqihi3_scalar): Likewise.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
libgomp/ChangeLog:
* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
(omp_get_wtime): Add RDNA3-compatible variant.
* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
Signed-off-by: Andrew Stubbs <ams@baylibre.com>
Static data members marked 'inline' should be emitted in TUs where they
are ODR-used. We need to make sure that inlines imported from modules
are correctly added to the 'pending_statics' map so that they get
emitted if needed, otherwise the attached testcase fails to link.
PR c++/112899
gcc/cp/ChangeLog:
* cp-tree.h (note_variable_template_instantiation): Rename to...
(note_vague_linkage_variable): ...this.
* decl2.cc (note_variable_template_instantiation): Rename to...
(note_vague_linkage_variable): ...this.
* pt.cc (instantiate_decl): Rename usage of above function.
* module.cc (trees_in::read_var_def): Remember pending statics
that we stream in.
gcc/testsuite/ChangeLog:
* g++.dg/modules/init-4_a.C: New test.
* g++.dg/modules/init-4_b.C: New test.
* g++.dg/modules/init-6_a.H: New test.
* g++.dg/modules/init-6_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com
We can end up creating ADDR_EXPRs of non-addressable entities during
for example vectorization. The following plugs this in data-ref
analysis when that would create such invalid ADDR_EXPR as part of
analyzing the ref structure.
PR tree-optimization/113602
* tree-data-ref.cc (dr_analyze_innermost): Fail when
the base object isn't addressable.
* gcc.dg/pr113602.c: New testcase.
Since LLVM commit 082f87c9d418 (Pull Req. #79038; will become LLVM 18)
"[AMDGPU] Change default AMDHSA Code Object version to 5"
the default - when no --amdhsa-code-object-version= is used - was bumped.
Using --amdhsa-code-object-version=5 is supported (with unknown limitations)
since LLVM 14. GCC required for proper support at least LLVM 13.0.1 such
that explicitly using COV5 is not possible.
Unfortunately, the COV number matters for debugging ("-g") as mkoffload.cc
extracts debugging data from the host's object file and writes into an
an AMD GPU object file it creates. And all object files linked together
must have the same ABI version.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): New; creates the
"--amdhsa-code-object-version=" argument.
(ASM_SPEC): Use it; replace previous version of it.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>