Commit graph

208627 commits

Author SHA1 Message Date
Richard Biener
0f7945417f middle-end/113622 - handle store with variable index to register
The following implements storing to a non-MEM_P with a variable
offset.  We usually avoid this by forcing expansion to memory but
this doesn't work for hard register variables.  The solution is
to spill and operate on the stack.

	PR middle-end/113622
	* expr.cc (expand_assignment): Spill hard registers if
	we index them with a variable offset.

	* gcc.target/i386/pr113622-2.c: New testcase.
	* gcc.target/i386/pr113622-3.c: Likewise.
2024-01-29 14:25:10 +01:00
Richard Biener
96bc048d78 middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs
The following expands .VEC_SET and .VEC_EXTRACT instruction selection
to global hard registers, not only automatic variables (possibly)
promoted to registers.  This can avoid some ICEs later and create
better code.

	PR middle-end/113622
	* gimple-isel.cc (gimple_expand_vec_set_extract_expr):
	Also allow DECL_HARD_REGISTER variables.

	* gcc.target/i386/pr113622-1.c: New testcase.
2024-01-29 14:25:10 +01:00
Alex Coplan
d41a1873f3 aarch64: Ensure iterator validity when updating debug uses [PR113616]
The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop.  In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.

This patch fixes that by introducing a general mechanism to avoid this
sort of problem.  We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value.  It then
pre-computes the next iterator value at all iterations, so it doesn't
matter if the original iterator got invalidated during the loop body, we
can still move safely to the next iteration.

We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
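
The idea can be sketched outside of GCC as follows.  This is only a
minimal illustration, not the code added to iterator-utils.h: the
names node, node_list, safe_iterator_sketch and iterate_safely_sketch
are hypothetical, and the toy list clears the next pointer of a removed
node so that a plain traversal stops early instead of crashing.

```cpp
#include <cassert>

// Toy singly linked list; removing a node clears its next pointer.
struct node
{
  int value;
  node *next;
};

struct node_iterator
{
  node *m_node;
  node *operator* () const { return m_node; }
  node_iterator &operator++ () { m_node = m_node->next; return *this; }
  bool operator!= (const node_iterator &other) const
  { return m_node != other.m_node; }
};

struct node_list
{
  node *head;
  node_iterator begin () const { return node_iterator { head }; }
  node_iterator end () const { return node_iterator { nullptr }; }

  // Unlink N, which must be in the list.
  void remove (node *n)
  {
    node **link = &head;
    while (*link != n)
      {
        assert (*link != nullptr);
        link = &(*link)->next;
      }
    *link = n->next;
    n->next = nullptr;
  }
};

// Wraps an iterator and pre-computes its successor, so the loop body
// may invalidate the current position.
template <typename Iter>
class safe_iterator_sketch
{
public:
  safe_iterator_sketch (Iter cur, Iter end)
    : m_cur (cur), m_end (end), m_next (cur)
  { advance_next (); }

  auto operator* () const { return *m_cur; }
  safe_iterator_sketch &operator++ ()
  {
    m_cur = m_next;   // never touches the (possibly stale) old m_cur
    advance_next ();
    return *this;
  }
  bool operator!= (const safe_iterator_sketch &other) const
  { return m_cur != other.m_cur; }

private:
  void advance_next () { if (m_next != m_end) ++m_next; }
  Iter m_cur, m_end, m_next;
};

template <typename Iter>
struct safe_range_sketch
{
  Iter first, last;
  safe_iterator_sketch<Iter> begin () const { return { first, last }; }
  safe_iterator_sketch<Iter> end () const { return { last, last }; }
};

template <typename Container>
auto iterate_safely_sketch (Container &c)
{
  return safe_range_sketch<decltype (c.begin ())> { c.begin (), c.end () };
}

inline node_list link_nodes (node *nodes, int count)
{
  for (int i = 0; i < count; ++i)
    nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : nullptr;
  return node_list { count > 0 ? nodes : nullptr };
}

// Both helpers remove even-valued nodes while iterating and return the
// number of nodes visited.
inline int visit_plain (node_list &list)
{
  int visited = 0;
  for (node *n : list)
    {
      ++visited;
      if (n->value % 2 == 0)
        list.remove (n);
    }
  return visited;
}

inline int visit_safely (node_list &list)
{
  int visited = 0;
  for (node *n : iterate_safely_sketch (list))
    {
      ++visited;
      if (n->value % 2 == 0)
        list.remove (n);
    }
  return visited;
}
```

In this toy setup the plain loop stops right after the first removal,
mirroring the premature termination described above, while the wrapped
loop reaches every node.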

We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616.  While doing this, I
remembered that cleanup_tombstones () had the same problem.  I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.

While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast<set_info *> to as_a<set_info *>,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.

gcc/ChangeLog:

	PR target/113616
	* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
	Use iterate_safely when iterating over debug uses.
	(fixup_debug_uses): Likewise.
	(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
	over nondebug insns instead of manually maintaining the next insn.
	* iterator-utils.h (class safe_iterator): New.
	(iterate_safely): New.

gcc/testsuite/ChangeLog:

	PR target/113616
	* gcc.c-torture/compile/pr113616.c: New test.
2024-01-29 13:29:54 +00:00
H.J. Lu
291f75fa1b x86: Save callee-saved registers in noreturn functions for -O0/-Og
Save callee-saved registers in noreturn functions for -O0/-Og so that
the debugger can restore callee-saved registers in the caller's frame.

Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute
lookup.

gcc/

	PR target/38534
	* config/i386/i386-options.cc (ix86_set_func_type): Save
	callee-saved registers in noreturn functions for -O0/-Og.

gcc/testsuite/

	PR target/38534
	* gcc.target/i386/pr38534-5.c: New file.
	* gcc.target/i386/pr38534-6.c: Likewise.
2024-01-29 05:29:01 -08:00
Prathamesh Kulkarni
8a48723dac PR112950: Use #pragma GCC for including arm_sve.h.
gcc/testsuite/ChangeLog:
	PR target/112950
	* gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive
	and instead use #pragma GCC for including arm_sve.h.
2024-01-29 18:42:44 +05:30
Tobias Burnus
7cc2262ec9 gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]
gcc/ChangeLog:

	PR target/113615
	* config/gcn/gcn-valu.md (fold_left_plus_<mode>): Only
	define for !TARGET_RDNA2_PLUS.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29 13:51:25 +01:00
Richard Sandiford
1a8261e047 vect: Tighten vect_determine_precisions_from_range [PR113281]
This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts.  We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount.  But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.
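
The arithmetic behind this can be checked with a small standalone
sketch.  It assumes unsigned operands (the signedness is not stated
above), and the helper names shift32, shift16 and
any_16bit_shift_gives_zero are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit evaluation of 32768 >> x; well defined for x in [0, 31].
constexpr std::uint32_t shift32 (unsigned x)
{
  return UINT32_C (32768) >> x;
}

// What a 16-bit lane computes for an in-range shift amount y in [0, 15].
constexpr std::uint16_t shift16 (unsigned y)
{
  return static_cast<std::uint16_t> (std::uint16_t (32768) >> y);
}

// True if some well-defined 16-bit shift amount produces zero.
constexpr bool any_16bit_shift_gives_zero ()
{
  for (unsigned y = 0; y < 16; ++y)
    if (shift16 (y) == 0)
      return true;
  return false;
}
```

Since 32768 is 1 << 15, every 16-bit shift by at most 15 leaves a set
bit, so no adjusted shift amount can reproduce the 32-bit result of 0.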

We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.

The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges.  This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
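
A small sketch of the two equivalences, and of why both operations
survive truncation, is given below.  It uses plain unsigned 32-bit and
16-bit integers as a stand-in for the vectorizer's types; narrow and
identities_hold are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 16 bits.
constexpr std::uint16_t narrow (std::uint32_t v)
{
  return static_cast<std::uint16_t> (v);
}

constexpr bool identities_hold (std::uint32_t x)
{
  // BIT_NOT_EXPR is a BIT_XOR_EXPR with -1 (all ones).
  bool not_is_xor = (~x) == (x ^ UINT32_MAX);
  // NEGATE_EXPR is a BIT_NOT_EXPR followed by a PLUS_EXPR of 1.
  bool neg_is_not_plus_one = (0u - x) == (~x + 1u);
  // Truncating the input first gives the same low 16 bits of the result.
  bool not_truncates
    = narrow (~x) == narrow (~std::uint32_t (narrow (x)));
  bool neg_truncates
    = narrow (0u - x) == narrow (0u - std::uint32_t (narrow (x)));
  return not_is_xor && neg_is_not_plus_one && not_truncates && neg_truncates;
}
```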

pr113281-1.c is the original testcase.  pr113281-[23].c failed
before the patch due to overly optimistic narrowing.  pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.

gcc/
	PR target/113281
	* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
	workaround for right shifts.
	(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
	(vect_determine_precisions_from_range): Be more selective about
	which codes can be narrowed based on their input and output ranges.
	For shifts, require at least one more bit of precision than the
	maximum shift amount.

gcc/testsuite/
	PR target/113281
	* gcc.dg/vect/pr113281-1.c: New test.
	* gcc.dg/vect/pr113281-2.c: Likewise.
	* gcc.dg/vect/pr113281-3.c: Likewise.
	* gcc.dg/vect/pr113281-4.c: Likewise.
	* gcc.dg/vect/pr113281-5.c: Likewise.
2024-01-29 12:33:08 +00:00
Tobias Burnus
9e89b5e925 nvptx.opt: Add sm_89 and sm_90a to -march-map=
The -march-map= option maps the compute capability to the closest
lower compute capability that has been implemented; for sm_89 and
sm_90a, that were previously missing, that's currently -march=sm_80
alias -misa=sm_80.
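
As an illustration only, the "closest lower implemented" rule could be
sketched like this.  The list of implemented values is an assumption
made for the example rather than something taken from nvptx.opt, and
the function name is hypothetical; a suffix such as the "a" in sm_90a
is ignored here.

```cpp
#include <cassert>

// Assumed set of implemented -march=sm_NN values (illustrative).
const int implemented_sm[] = { 30, 35, 53, 70, 75, 80 };

// Return the largest implemented value not above REQUESTED, or -1 if
// there is none.
inline int map_compute_capability (int requested)
{
  int best = -1;
  for (int sm : implemented_sm)
    if (sm <= requested && sm > best)
      best = sm;
  return best;
}
```

Under that assumed list, both 89 and 90 map to 80, matching the
behaviour described above.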

gcc/ChangeLog:

	* config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29 13:06:27 +01:00
Tobias Burnus
c1a38cd67e install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled
gcc/ChangeLog:

	* doc/install.texi (amdgcn): Recommend LLVM 15+ and newlib 4.4+,
	but keep requiring only newlib 4.3+ and, if gfx1100 is disabled,
	LLVM 13.0.1+.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29 11:20:49 +01:00
Tobias Burnus
ef5ccdbbc6 gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]
Some more '-g' fixes as the .mkoffload.dbg.o debug file has elf flags
which did not match those generated for the compilation, leading to linker
errors.  For .mkoffload.dbg.o, the elf flags are generated by mkoffload
itself - while for the other .o files, that's done by the compiler via
setting default and mainly via the ASM_SPEC.

This is a follow up to r14-8332-g13127dac106724 which fixed an issue
caused by the default arch.  In this patch, it is mainly for gfx1100
and gfx1030 which always failed.  It also affects gfx906 and possibly
gfx900 but only when using the -mxnack/-msram-ecc flags explicitly.

What happens on the compiler side is mainly determined by gcn-hsa.h's
ASM_SPEC and otherwise by some default setting.  In particular for xnack and
sram_ecc, there is:

For gfx1100 and gfx1030, neither xnack nor sram_ecc is set (only
'+wavefrontsize64').

For fiji, gfx900, gfx906 and gfx908 there is always -mattr=-xnack and
for all but gfx908 also -msram-ecc=no - independent of what has been
passed to the compiler. However, on the elf flags, the result differs:
For fiji, due to the HSACOv3, it is always set to 0 via
copy_early_debug_info; for gfx900, gfx906 and gfx908, xnack is OFF.
For sram-ecc, it is 'unset' for gfx900, 'any' for gfx906 and for
gfx908 it is 'any' unless overridden.

For gfx90a, the -msram-ecc= and -mxnack= are passed on, or if not present,
...=any is passed on.  Note that this "any" is different from the argument
not being present at elf flag level:
For XNACK: unset/unsupported is 0, any = 0x100, off = 0x200, on = 0x300.
For SRAMECC: unset/unsupported is 0, any = 0x400, off = 0x800, on = 0xc00.
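
The listed values fit two 2-bit fields, which the following sketch
encodes and decodes.  Only the numeric values come from the text above;
the enum, the shift constants and the function names are hypothetical
and are not the macros used in mkoffload.cc.

```cpp
#include <cassert>
#include <cstdint>

// The four states share the same encoding in both fields.
enum class feature_setting : std::uint32_t
{
  unset = 0,  // also used for "unsupported"
  any = 1,
  off = 2,
  on = 3
};

constexpr unsigned xnack_shift = 8;      // any == 0x100 == 1 << 8
constexpr unsigned sram_ecc_shift = 10;  // any == 0x400 == 1 << 10

constexpr std::uint32_t
encode_flags (feature_setting xnack, feature_setting sram_ecc)
{
  return (static_cast<std::uint32_t> (xnack) << xnack_shift)
         | (static_cast<std::uint32_t> (sram_ecc) << sram_ecc_shift);
}

constexpr feature_setting decode_xnack (std::uint32_t flags)
{
  return static_cast<feature_setting> ((flags >> xnack_shift) & 3u);
}

constexpr feature_setting decode_sram_ecc (std::uint32_t flags)
{
  return static_cast<feature_setting> ((flags >> sram_ecc_shift) & 3u);
}
```

The point of the distinction is visible here: "any" (0x100 or 0x400)
and "unset" (0) are different bit patterns, so objects built with one
do not carry the same flags as objects built with the other.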

The obstack_ptr_grow changes are more to avoid confusion than having an
actual effect as they would otherwise be filtered out via the ASM_SPEC.

gcc/ChangeLog:

	PR other/111966
	* config/gcn/mkoffload.cc (SET_XNACK_UNSET, TEST_SRAM_ECC_UNSET): New.
	(SET_SRAM_ECC_UNSUPPORTED): Renamed to ...
	(SET_SRAM_ECC_UNSET): ... this.
	(copy_early_debug_info): Remove gfx900 special case, now handled as
	part of the generic handling.
	(main): Update SRAM_ECC and XNACK for the -march as done in gcn-hsa.h.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29 11:10:33 +01:00
Tobias Burnus
cb366731e7 libgomp.c/declare-variant-4.h: Fix used variant function for gfx1030/gfx1100
libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4.h: Use gfx1100/gfx1030
	function not gfx90a for gfx1100/gfx1030 context selector.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29 11:06:15 +01:00
Jakub Jelinek
b338fdbc2b tree-ssa-strlen: Fix pdata->maxlen computation [PR110603]
On the following testcase we emit an invalid range of [2, 1] due to
UB in the source.  Older VRP code silently swapped the boundaries and
made [1, 2] range out of it, but newer code just ICEs on it.

The reason for pdata->minlen 2 is that we see a memcpy in this case
setting both elements of the array to non-zero value, so strlen (a)
can't be smaller than 2.  The reason for pdata->maxlen 1 is that in
char a[2] array without UB there can be at most 1 non-zero character
because there needs to be '\0' termination in the buffer too.

IMHO we shouldn't create invalid ranges like that and even creating
for that case a range [1, 2] looks wrong to me, so the following patch
just doesn't set maxlen in that case to the array size - 1, matching
what will really happen at runtime when triggering such UB (strlen will
be at least 2, perhaps more or will crash).
This is what the second hunk of the patch does.

The first hunk fixes a fortunately harmless thinko.
If the strlen pass knows the string length (i.e. get_string_length
function returns non-NULL), we take a different path, we get to this
only if all we know is that there are a certain number of non-zero
characters but we don't know what they are followed by, whether further
non-zero characters or zero termination.
If we know exactly how many non-zero characters it is, such as
char a[42];
...
  memcpy (a, "01234567890123456789", 20);
then we take an earlier if for the INTEGER_CST case and set correctly
just pdata->minlen to 20 in that case, but if we have something like
  int len;
  ...
  if (len < 15 || len > 32) return;
  memcpy (a, "0123456789012345678901234567890123456789", len);
then we have [15, 32] range for the nonzero_chars and we set pdata->minlen
correctly to 15, but incorrectly set also pdata->maxlen to 32.  That is
not what the above implies, it just means that in some cases we know that
there are at least 32 non-zero characters, followed by something we don't
know.  There is no guarantee that there is '\0' right after it, so it
means nothing.
The reason this is harmless, just confusing, is that the code a few lines
later fortunately overwrites this incorrect pdata->maxlen value with
something different (either array length - 1 or all ones etc.).
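
A much simplified model of the resulting behaviour might look like the
sketch below.  It is not the code in get_range_strlen_dynamic: the
struct, the function and the use of an all-ones value for "unknown" are
assumptions made for illustration.

```cpp
#include <cassert>

struct strlen_range_sketch
{
  unsigned long minlen;
  unsigned long maxlen;
};

// All ones stands for "no known upper bound" in this sketch.
constexpr unsigned long unknown_maxlen = ~0UL;

// MIN_NONZERO is the number of characters known to be non-zero and
// ARRAY_SIZE the size of the char array holding the string.
constexpr strlen_range_sketch
compute_range (unsigned long min_nonzero, unsigned long array_size)
{
  strlen_range_sketch r = { min_nonzero, unknown_maxlen };
  // Only use array size - 1 as the upper bound when that does not
  // create an invalid range with minlen larger than maxlen.
  if (array_size > 0 && min_nonzero <= array_size - 1)
    r.maxlen = array_size - 1;
  return r;
}
```

For the char a[2] case with two known non-zero characters this yields
[2, unknown] rather than the invalid [2, 1].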

2024-01-29  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/110603
	* tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
	setting of pdata->maxlen to vr.upper_bound (which is unconditionally
	overwritten anyway).  Avoid creating invalid range with minlen
	larger than maxlen.  Formatting fix.

	* gcc.c-torture/compile/pr110603.c: New test.
2024-01-29 10:20:32 +01:00
Richard Biener
b702dc9802 debug/103047 - argument order of inlined functions
The inliner puts variables for parameters of the inlined functions
in the inline scope in reverse order.  The following reverses them
again so that we get consistent ordering between the
DW_TAG_subprogram DW_TAG_formal_parameter and the
DW_TAG_inlined_subroutine DW_TAG_formal_parameter set.
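
Reversing a chain of declarations in place can be sketched as below.
The decl_sketch type and reverse_chain function are hypothetical
stand-ins for illustration and do not reflect the actual tree
representation used by the inliner.

```cpp
#include <cassert>

// Stand-in for a declaration linked through a chain pointer.
struct decl_sketch
{
  const char *name;
  decl_sketch *chain;
};

// Reverse the chain starting at HEAD and return the new head.
inline decl_sketch *reverse_chain (decl_sketch *head)
{
  decl_sketch *prev = nullptr;
  while (head != nullptr)
    {
      decl_sketch *next = head->chain;
      head->chain = prev;
      prev = head;
      head = next;
    }
  return prev;
}
```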

I failed to create a testcase with regexps since the inline
instances have just abstract origins and so I can't match them up.

	PR debug/103047
	* tree-inline.cc (initialize_inlined_parameters): Reverse
	the decl chain of inlined parameters.
2024-01-29 08:41:20 +01:00
Andrew Pinski
5b393ac7f1 testsuite: Fix vect_long_mult for 32-bit Power [PR109705]
As noted in PR109705#c17, commit r14-7270 failed to consider that the long
type is 32 bits wide with option -m32.  This patch takes care of
that accordingly.

Note that the vect_long_mult is supposed to be only used in
vect/ (generic), powerpc_altivec_ok would be guaranteed.

	PR testsuite/109705

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (check_effective_target_vect_long_mult):
	Fix powerpc*-*-* checks by considering ilp32.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-28 20:35:05 -06:00
GCC Administrator
91b3da6f11 Daily bump. 2024-01-29 00:18:44 +00:00
Victor Do Nascimento
7dd4466b39 Libatomic: Add checks in ifunc selectors for LSE/LSE2 requirements.
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register.  This instruction, when issued from user-space
results in a trap by the kernel which then returns the value read in
by the system register.  Given the undesirable nature of the
computational expense associated with the context switch, it is
important to implement mechanisms to, wherever possible, forgo the
operation.

In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers.  Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary expensive kernel-mediated access to system
registers.
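
The early-exit pattern can be sketched portably as follows.  Everything
here is illustrative: hwcap_lse_sketch is a placeholder bit value,
has_lse2_sketch is not the function in host-config.h, and the expensive
system register read is modelled by a callable passed in by the caller.

```cpp
#include <cassert>

// Placeholder for the HWCAP bit advertising the prerequisite feature.
constexpr unsigned long hwcap_lse_sketch = 1UL << 8;

// READ_FEATURE_REGISTER stands in for the trapping `mrs' access and
// returns non-zero if the feature of interest is present.
template <typename RegisterReader>
bool has_lse2_sketch (unsigned long hwcap,
                      RegisterReader &&read_feature_register)
{
  // Cheap test first: without the prerequisite the feature cannot be
  // present, so the expensive read is skipped entirely.
  if ((hwcap & hwcap_lse_sketch) == 0)
    return false;
  return read_feature_register () != 0;
}
```

With a counting reader it is easy to confirm that the expensive path is
only taken when the inexpensive HWCAP test passes.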

libatomic/ChangeLog:

	* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
	(has_lse128): Add test for LSE2.
2024-01-28 20:02:17 +00:00
Victor Do Nascimento
5ad64d76c0 libatomic: Enable LSE128 128-bit atomics for Armv9.4-a
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.

It is worth noting that in keeping with existing 128-bit atomic
operations in `atomic_16.S', we have chosen to merge certain
less-restrictive orderings into more restrictive ones.  This is done
to minimize the number of branches in the atomic functions, minimizing
both the likelihood of branch mispredictions and, in keeping code
small, limit the need for extra fetch cycles.

Past benchmarking has revealed that acquire is typically slightly
faster than release (5-10%), such that for the most frequently used
atomics (CAS and SWP) it makes sense to add support for acquire, as
well as release.

Likewise, it was identified that combining acquire and release typically
results in little to no penalty, such that it is of negligible benefit
to distinguish between release and acquire-release, making the
combining of release/acq_rel/seq_cst a worthwhile design choice.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

  1. Add a configure-time check to check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where available due to LSE128, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

	* Makefile.am (AM_CPPFLAGS): add conditional setting of
	-DHAVE_FEAT_LSE128.
	* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
	* config/linux/aarch64/atomic_16.S (LSE128): New macro
	definition.
	(libat_exchange_16): New LSE128 variant.
	(libat_fetch_or_16): Likewise.
	(libat_or_fetch_16): Likewise.
	(libat_fetch_and_16): Likewise.
	(libat_and_fetch_16): Likewise.
	* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
	(IFUNC_NCOND): Add operand size checking.
	(has_lse2): Renamed from `ifunc1`.
	(has_lse128): New.
	(HWCAP2_LSE128): Likewise.
	* configure.ac: Add call to
	LIBAT_TEST_FEAT_AARCH64_LSE128.
	* configure (ac_subst_vars): Regenerated via autoreconf.
	* Makefile.in: Likewise.
	* auto-config.h.in: Likewise.
2024-01-28 20:02:01 +00:00
Victor Do Nascimento
a899a1f2f3 libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.

We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:

  bool
  resolver (unsigned long hwcap, const __ifunc_arg_t *features)
  {
    return (features->hwcap2 & HWCAP2_<FEAT_NAME>);
  }

libatomic/ChangeLog:

	* config/linux/aarch64/host-config.h (__ifunc_arg_t):
	Conditionally-defined if `sys/ifunc.h' not found.
	(_IFUNC_ARG_HWCAP): Likewise.
	(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
	(ifunc1): Modify function signature to accept __ifunc_arg_t
	argument.
	* configure.tgt: Add second `const __ifunc_arg_t *features'
	argument to IFUNC_RESOLVER_ARGS.
2024-01-28 19:52:42 +00:00
Victor Do Nascimento
e64602c025 libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE(NAME) NAME##_i1

Where we wish to generate ifunc names with the pre-processor's token
concatenation feature, we add a level of indirection to previous macro
calls.  If before we would have had `MACRO(<name>_i<n>)', we now have
`MACRO_FEAT(name, feature)'.  Where we wish to refer to base
functionality (i.e., functions where ifunc suffixes are absent), the
original `MACRO(<name>)' may be used to bypass suffixing.

Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

  ENTRY (libat_store_16)

and

  END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

  ENTRY_FEAT (libat_store_16, LSE2)

and

  END_FEAT (libat_store_16, LSE2)

For the aliasing of function names, we define the following new
implementation of the ALIAS macro:

  ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

Defining the `CORE(NAME)' macro to be the identity operator, it
returns the base function name unaltered and allows us to alias
target-specific ifuncs to the corresponding base implementation.
For example, we'd alias the LSE2 `libat_exchange_16' to its base
implementation with:

  ALIAS (libat_exchange_16, LSE2, CORE)
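
The extra level of indirection can be tried out with the preprocessor
alone.  In this sketch CORE follows the description above, while the
`_i1' suffix chosen for LSE2 and the helper macros STRINGIFY,
STRINGIFY1 and SYMBOL_FEAT are assumptions made purely so the expanded
names can be inspected as strings; they are not part of atomic_16.S.

```cpp
#include <cassert>
#include <cstring>

// Feature macros map a base name to a suffixed ifunc name.
#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1   /* suffix assumed for illustration */

// Two levels so that FEAT (NAME) is fully expanded before it is
// turned into a string.
#define STRINGIFY1(X) #X
#define STRINGIFY(X) STRINGIFY1 (X)
#define SYMBOL_FEAT(NAME, FEAT) STRINGIFY (FEAT (NAME))

inline const char *lse2_store_symbol ()
{
  return SYMBOL_FEAT (libat_store_16, LSE2);
}

inline const char *core_store_symbol ()
{
  return SYMBOL_FEAT (libat_store_16, CORE);
}
```

Changing which suffix a feature uses then only requires editing the
one-line feature macro rather than every function name.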

libatomic/ChangeLog:
	* config/linux/aarch64/atomic_16.S (CORE): New macro.
	(LSE2): Likewise.
	(ENTRY_FEAT): Likewise.
	(ENTRY_FEAT1): Likewise.
	(END_FEAT): Likewise.
	(END_FEAT1): Likewise.
	(ALIAS): Modify macro to take in `arch' arguments.
	(ALIAS1): New.
2024-01-28 19:52:41 +00:00
Harald Anlauf
c4773944bb Fortran: NULL actual to optional dummy with VALUE attribute [PR113377]
gcc/fortran/ChangeLog:

	PR fortran/113377
	* trans-expr.cc (conv_dummy_value): Treat NULL actual argument to
	optional dummy with the VALUE attribute as not present.
	(gfc_conv_procedure_call): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/113377
	* gfortran.dg/optional_absent_11.f90: New test.
2024-01-28 20:06:37 +01:00
Iain Sandoe
f74f840d35 Objective-C, Darwin: Do not overalign CFStrings and Objective-C metadata.
We have reports of regressions in both Objective-C and Objective-C++ on
Darwin23 (macOS 14).  In some cases, these are linker warnings about the
alignment of CFString constants; in other cases the built executables
crash during runtime initialization.  The underlying issue is the same in
both cases; since the objects (CFStrings, Objective-C meta-data) are TU-
local, we are choosing to increase their alignment for efficiency - to
values greater than ABI alignment.

However, although these objects are TU-local, they are also visible to the
linker (since they are placed in specific named sections).  In many cases
the metadata can be regarded as tables of data, and thus it is expected
that these sections can be concatenated from multiple TUs and the data
treated as tabular.  In order for this to work the data cannot be allowed
to exceed ABI alignment - which leads to the crashes.

For GCC-15+ it would be nice to find a more elegant solution to this issue
(perhaps by adjusting the concept of binds-locally to exclude specific
named sections) - but I do not want to do that in stage 4.

The solution here is to force the alignment to be preserved as created by
setting DECL_USER_ALIGN on the relevant objects.

gcc/ChangeLog:

	* config/darwin.cc (darwin_build_constant_cfstring): Prevent over-
	alignment of CFString constants by setting DECL_USER_ALIGN.

gcc/objc/ChangeLog:

	* objc-next-runtime-abi-02.cc (build_v2_address_table): Prevent
	over-alignment of Objective-C metadata by setting DECL_USER_ALIGN
	on relevant variables.
	(build_v2_protocol_list_address_table): Likewise.
	(generate_v2_protocol_list): Likewise.
	(generate_v2_meth_descriptor_table): Likewise.
	(generate_v2_meth_type_list): Likewise.
	(generate_v2_property_table): Likewise.
	(generate_v2_dispatch_table): Likewise.
	(generate_v2_ivars_list): Likewise.
	(generate_v2_class_structs): Likewise.
	(build_ehtype): Likewise.
	* objc-runtime-shared-support.cc (generate_strings): Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-01-28 11:46:27 +00:00
Iain Sandoe
30d9e81c19 testsuite, Objective-C: Fix duplicate libobjc cases.
Two of the encode testcases include '-lobjc' as their dg-options.
Since the library is already appended as part of the generic testsuite
handling, this means that two instances appear on the link line leading
to spurious warnings from Darwin's new linker.

gcc/testsuite/ChangeLog:

	* obj-c++.dg/encode-10.mm: Remove unneeded '-lobjc' option addition.
	* obj-c++.dg/encode-9.mm: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-01-28 11:03:26 +00:00
Iain Sandoe
837827f8f2 Fix __builtin_nested_func_ptr_{created,deleted} symbol versions [PR113402]
The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.

As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*.  In carrying out the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.

	PR libgcc/113402

gcc/ChangeLog:

	* builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED
	and BUILT_IN_GCC_NESTED_PTR_DELETED.
	* builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED,
	BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
	rename the library fallbacks to __gcc_nested_func_ptr_created and
	__gcc_nested_func_ptr_deleted.
	* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
	and __gcc_nested_func_ptr_deleted.
	* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
	BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED.
	* tree.cc (build_common_builtin_nodes): Build the
	BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local
	builtins only for non-explicit.

libgcc/ChangeLog:

	* config/aarch64/heap-trampoline.c: Rename
	__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
	__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
	* config/i386/heap-trampoline.c: Likewise.
	* libgcc2.h: Likewise.
	* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
	__gcc_nested_func_ptr_created and
	__gcc_nested_func_ptr_deleted from this symbol version to ...
	(GCC_14.0.0): ... this one.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jakub Jelinek  <jakub@redhat.com>
2024-01-28 10:59:34 +00:00
Iain Sandoe
557bea3d2e testsuite, jit: Stabilize error output.
Currently when a test fails, we print out a lot of information,
this includes items that are not stable between invocations (e.g.
the PID for the executable).  That makes automated comparisons
between test runs flag any persistent fails as new ones each time
which is not usually what is wanted.

This patch amends the error output to drop the variable portion
of the message and retain items that should only change if the
failure mode changes.

gcc/testsuite/ChangeLog:

	* jit.dg/jit.exp: Filter error output to remove per-run
	variable content.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-01-28 10:58:33 +00:00
YunQiang Su
46df13697a doc/invoke: Remove duplicate explicit-relocs entry of MIPS
When the new-style -mexplicit-relocs= was added, the old-style entry was not removed.

gcc
	* doc/invoke.texi: Remove duplicate MIPS explicit-relocs option.
2024-01-28 13:21:41 +08:00
GCC Administrator
3979171d8e Daily bump. 2024-01-28 00:16:34 +00:00
Jose E. Marchesi
f64448f4ff bpf: add constant pointer to helper-skb-ancestor-cgroup-id.c test
The purpose of this test is to make sure that constant propagation is
achieved with the proper optimization level, so a BPF call instruction
to a kernel helper is generated.  This patch updates the test so it
also covers kernel helpers defined with constant static pointers.

The motivation for this patch is:

  https://lore.kernel.org/bpf/20240127185031.29854-1-jose.marchesi@oracle.com/T/#u

Tested in bpf-unknown-none target x86_64-linux-gnu host.

gcc/testsuite/ChangeLog

	* gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Add constant
	version of kernel helper static pointer.
2024-01-27 20:08:12 +01:00
Harald Anlauf
ce61de1b8a Fortran: fix bounds-checking errors for CLASS array dummies [PR104908]
Commit r11-1235 addressed issues with bounds of unlimited polymorphic array
dummies.  However, using the descriptor from sym->backend_decl does break
the case of CLASS array dummies.  The obvious solution is to restrict the
fix to the unlimited polymorphic case, thus keeping the original descriptor
in the ordinary case.

gcc/fortran/ChangeLog:

	PR fortran/104908
	* trans-array.cc (gfc_conv_array_ref): Restrict use of transformed
	descriptor (sym->backend_decl) to the unlimited polymorphic case.

gcc/testsuite/ChangeLog:

	PR fortran/104908
	* gfortran.dg/pr104908.f90: New test.
2024-01-27 17:41:43 +01:00
H.J. Lu
7cc9adc62c x86: Don't save callee-saved registers in noreturn functions
There is no need to save callee-saved registers in noreturn functions
if they neither throw nor support exceptions.  We can treat them the same
as functions with no_callee_saved_registers attribute.

Adjust stack-check-17.c for noreturn function which no longer saves any
registers.

With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
	endbr64
	push   %r15
	push   %r14
	mov    %rcx,%r14
	push   %r13
	push   %r12
	push   %rbp
	mov    %esi,%ebp
	push   %rbx
	mov    %rdx,%rbx
	sub    $0x28,%rsp
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

to

__libc_start_main:
	endbr64
	sub    $0x28,%rsp
	mov    %esi,%ebp
	mov    %rdx,%rbx
	mov    %rcx,%r14
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call   <do_exit+0x9>
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        # <do_exit+0x39>
        test   $0x2,%ah
        je     <do_exit+0x8d3>

to

do_exit:
        endbr64
        call   <do_exit+0x9>
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        # <do_exit+0x2f>
        test   $0x2,%ah
        je     <do_exit+0x8c9>

I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before                after                  improvement
30043.75user          30013.16user           0%
1274.85system         1243.72system          2.4%

GCC master branch test time in seconds (new tests added):

before                after                  improvement
216035.90user         216547.51user          0%
27365.51system        26658.54system         2.6%

gcc/

	PR target/38534
	* config/i386/i386-options.cc (ix86_set_func_type): Don't
	save and restore callee saved registers for a noreturn function
	with nothrow or compiled with -fno-exceptions.

gcc/testsuite/

	PR target/38534
	* gcc.target/i386/pr38534-1.c: New file.
	* gcc.target/i386/pr38534-2.c: Likewise.
	* gcc.target/i386/pr38534-3.c: Likewise.
	* gcc.target/i386/pr38534-4.c: Likewise.
	* gcc.target/i386/stack-check-17.c: Updated.
2024-01-27 04:10:49 -08:00
H.J. Lu
a96549dce7 x86: Add no_callee_saved_registers function attribute
When an interrupt handler is implemented by an assembly stub which does:

1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers.  Such a function won't save and
restore any registers.  Classify function call-saved register handling
type with:

1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.

Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function.  Otherwise,
callee-saved registers won't be preserved.

After a no_callee_saved_registers function is called, all registers may
be clobbered.  If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
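
As a rough illustration of the attribute described above, the following
sketch marks a C function meant to be called from an assembly stub that has
already saved every register.  The names NO_CALLEE_SAVED, irq_count and
handle_irq are hypothetical and do not come from the patch; the guard falls
back to an ordinary function on compilers or targets that lack the
attribute.

```c
/* Hypothetical helper macro: use the attribute only where the compiler
   reports support for it (an x86 target with a GCC that has this patch).  */
#if defined(__has_attribute)
# if __has_attribute(no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED /* attribute unavailable: ordinary function */
#endif

/* Hypothetical state updated by the handler.  */
static volatile unsigned long irq_count;

/* C body of an interrupt handler.  The assembly stub (not shown) is assumed
   to save and restore all registers around this call, so the function
   itself does not need to preserve any callee-saved register.  */
NO_CALLEE_SAVED void
handle_irq (unsigned int vector)
{
  irq_count += vector;
}
```

When such a function is called from ordinary C code instead of a stub, the
caller is the one that has to assume all registers may be clobbered, as the
paragraph above explains.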

gcc/

	PR target/103503
	PR target/113312
	* config/i386/i386-expand.cc (ix86_expand_call): Replace
	no_caller_saved_registers check with call_saved_registers check.
	Clobber all registers that are not used by the callee with
	no_callee_saved_registers attribute.
	* config/i386/i386-options.cc (ix86_set_func_type): Set
	call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
	noreturn function.  Disallow no_callee_saved_registers with
	interrupt or no_caller_saved_registers attributes together.
	(ix86_set_current_function): Replace no_caller_saved_registers
	check with call_saved_registers check.
	(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
	(ix86_handle_call_saved_registers_attribute): This.
	(ix86_gnu_attributes): Add
	ix86_handle_call_saved_registers_attribute.
	* config/i386/i386.cc (ix86_conditional_register_usage): Replace
	no_caller_saved_registers check with call_saved_registers check.
	(ix86_function_ok_for_sibcall): Don't allow callee with
	no_callee_saved_registers attribute when the calling function
	has callee-saved registers.
	(ix86_comp_type_attributes): Also check
	no_callee_saved_registers.
	(ix86_epilogue_uses): Replace no_caller_saved_registers check
	with call_saved_registers check.
	(ix86_hard_regno_scratch_ok): Likewise.
	(ix86_save_reg): Replace no_caller_saved_registers check with
	call_saved_registers check.  Don't save any registers for
	TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
	TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
	no_callee_saved_registers attribute is called.
	(find_drap_reg): Replace no_caller_saved_registers check with
	call_saved_registers check.
	* config/i386/i386.h (call_saved_registers_type): New enum.
	(machine_function): Replace no_caller_saved_registers with
	call_saved_registers.
	* doc/extend.texi: Document no_callee_saved_registers attribute.

gcc/testsuite/

	PR target/103503
	PR target/113312
	* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
	* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
	* gcc.target/i386/no-callee-saved-1.c: Likewise.
	* gcc.target/i386/no-callee-saved-2.c: Likewise.
	* gcc.target/i386/no-callee-saved-3.c: Likewise.
	* gcc.target/i386/no-callee-saved-4.c: Likewise.
	* gcc.target/i386/no-callee-saved-5.c: Likewise.
	* gcc.target/i386/no-callee-saved-6.c: Likewise.
	* gcc.target/i386/no-callee-saved-7.c: Likewise.
	* gcc.target/i386/no-callee-saved-8.c: Likewise.
	* gcc.target/i386/no-callee-saved-9.c: Likewise.
	* gcc.target/i386/no-callee-saved-10.c: Likewise.
	* gcc.target/i386/no-callee-saved-11.c: Likewise.
	* gcc.target/i386/no-callee-saved-12.c: Likewise.
	* gcc.target/i386/no-callee-saved-13.c: Likewise.
	* gcc.target/i386/no-callee-saved-14.c: Likewise.
	* gcc.target/i386/no-callee-saved-15.c: Likewise.
	* gcc.target/i386/no-callee-saved-16.c: Likewise.
	* gcc.target/i386/no-callee-saved-17.c: Likewise.
	* gcc.target/i386/no-callee-saved-18.c: Likewise.
2024-01-27 04:10:49 -08:00
Jakub Jelinek
a12b0e9360 lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614]
The following testcase is miscompiled, because some narrower value
is sign-extended to wider unsigned _BitInt used as division operand.
handle_operand_addr for that case returns the narrower value and
precision -prec_of_narrower_value.  That works fine for multiplication
(at least, normal multiplication, but we don't merge casts with
.MUL_OVERFLOW or the ubsan multiplication right now), because the
result is the same whether we treat the arguments as signed or unsigned.
But it is completely wrong for division/modulo or conversions to
floating-point: if we pass a negative prec for an input operand of a libgcc
handler, the handler treats it as a negative number, not an unsigned one
sign-extended from something smaller (and it doesn't know to what precision
it has been extended).

So, the following patch fixes it by making sure we don't merge such
sign-extensions to unsigned _BitInt type with division, modulo or
conversions to floating point.
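
The arithmetic behind this can be sketched with ordinary fixed-width
integers instead of _BitInt (an assumption made only to keep the example
small; the function names are hypothetical).  A narrow signed value that has
been sign-extended to a wider unsigned type divides differently from the
same value treated as a negative number, while truncated multiplication
yields the same bits either way.

```c
#include <stdint.h>

/* Division as the program intends it: the narrow signed value is first
   sign-extended to the wide unsigned type, then divided as unsigned.  */
uint32_t
div_as_extended_unsigned (int8_t narrow, uint32_t divisor)
{
  uint32_t wide = (uint32_t) (int32_t) narrow;   /* -3 becomes 0xfffffffd */
  return wide / divisor;
}

/* Division as a handler would compute it if it only knew "signed narrow
   operand": the value is treated as a negative number.  */
uint32_t
div_as_narrow_signed (int8_t narrow, uint32_t divisor)
{
  return (uint32_t) ((int32_t) narrow / (int32_t) divisor);
}

/* Multiplication truncated to the wide type gives the same bits either
   way, which is why merging the cast is harmless there.  */
uint32_t
mul_as_extended_unsigned (int8_t narrow, uint32_t factor)
{
  return (uint32_t) (int32_t) narrow * factor;
}

uint32_t
mul_as_narrow_signed (int8_t narrow, uint32_t factor)
{
  return (uint32_t) ((int32_t) narrow * (int32_t) factor);
}
```

For narrow = -3 and a divisor of 2, the first division yields 2147483646
and the second yields -1 converted to unsigned, whereas the two
multiplications agree.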

2024-01-27  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113614
	* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
	widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
	TRUNC_MOD_EXPR or FLOAT_EXPR uses.

	* gcc.dg/torture/bitint-54.c: New test.
2024-01-27 13:06:55 +01:00
Jakub Jelinek
3f5ac46963 lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568]
We generally allow merging mergeable stmts with some final cast (but not
further casts or mergeable operations after the cast).  As some casts
are handled conditionally, if (idx < cst) handle_operand (idx); else if
(idx == cst) handle_operand (cst); else ..., we must make sure that e.g. the
mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in handle_operand
called from such casts, because it ICEs on invalid SSA_NAME form (that part
could be fixable by adding further PHIs) but also because we'd need to
correctly propagate the overflow flags from the if to else if.
So, instead lower_mergeable_stmt handles an outermost widening cast (or
widening cast feeding outermost store) specially.
The problem was similar to PR113408, that VIEW_CONVERT_EXPR tree is
present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR,
so the checks whether the outermost cast should be handled didn't handle
the VCE case and so handle_plus_minus was called from the conditional
handle_cast.

2024-01-27  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113568
	* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
	For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
	in the widening extension checks.

	* gcc.dg/bitint-78.c: New test.
2024-01-27 13:06:17 +01:00
Jakub Jelinek
675e522903 lower-bitint: Add debugging dump of SSA_NAME -> decl mappings
While the SSA coalescing performed by lower bitint prints some information
if -fdump-tree-bitintlower-details, it is really hard to read and doesn't
contain the most important information which one looks for when debugging
bitint lowering issues, namely what VAR_DECLs (or PARM_DECLs/RESULT_DECLs)
each SSA_NAME in large_huge.m_names bitmap maps to.

So, the following patch adds dumping of that, so that we know that say
_3 -> bitint.3
_8 -> bitint.7
_16 -> bitint.7
etc.

2024-01-27  Jakub Jelinek  <jakub@redhat.com>

	* gimple-lower-bitint.cc (gimple_lower_bitint): For
	TDF_DETAILS dump mapping of SSA_NAMEs to decls.
2024-01-27 13:05:30 +01:00
Lewis Hyatt
5200ef26ac c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH.  However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.

This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
The new testcase line-map-3.C contains XFAILed examples where the locations
are wrong.

This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long.  This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.

gcc/c-family/ChangeLog:

	PR preprocessor/105608
	* c-pch.cc (c_common_read_pch): Start a new line map before asking
	libcpp to restore macros defined prior to reading the PCH, instead
	of afterward.

gcc/testsuite/ChangeLog:

	PR preprocessor/105608
	* g++.dg/pch/line-map-1.C: New test.
	* g++.dg/pch/line-map-1.Hs: New test.
	* g++.dg/pch/line-map-2.C: New test.
	* g++.dg/pch/line-map-2.Hs: New test.
	* g++.dg/pch/line-map-3.C: New test.
	* g++.dg/pch/line-map-3.Hs: New test.
2024-01-26 23:27:53 -05:00
GCC Administrator
ce9dae5640 Daily bump. 2024-01-27 00:18:16 +00:00
Hans-Peter Nilsson
4eb8367042 c/c++: Tweak warning for 'always_inline function might not be inlinable'
When you're not regularly exposed to this warning, it is
easy to be misled by its wording, believing that there's
something else in the function that stops it from being
inlined, something other than the lack of also being
*declared* inline.  Also, clang does not warn.

It's just a warning: without the inline directive, there has
to be a secondary reason for the function to be inlined,
other than the always_inline attribute, a reason that may be
in effect despite the warning.

Whenever the text is quoted in inline-related bugzilla
entries, there seems to often have been an initial step of
confusion that has to be cleared, for example in PR55830.
A file in the powerpc-specific parts of the test-suite,
gcc.target/powerpc/vec-extract-v16qiu-v2.h, has a comment
and seems to be another example, and I testify as the
first-hand third "experience".  The wording has been the
same since the warning was added.

Let's just tweak the wording, adding the cause, so that the
reason for the warning is clearer.  This hopefully stops the
user from immediately asking "'Might'?  Because why?"  and
then going off looking at the function body - or grepping
the gcc source or documentation, or entering a bug report
that is subsequently closed as resolved/invalid.

Since the message is only appended with additional
information, no test-case actually required adjustment.
I still changed them, so the message is covered.
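
A minimal sketch of the situation, with hypothetical function names: the
warning concerns a function that carries the always_inline attribute but is
not also declared inline.  The form below has both and is the one the
warning steers users toward.

```c
/* Declared inline as well as always_inline, so the warning discussed
   here does not apply.  */
static inline __attribute__ ((always_inline)) int
add_one (int x)
{
  return x + 1;
}

/* Dropping the 'inline' keyword above, i.e.
     static __attribute__ ((always_inline)) int add_one (int x)
   is the case in which GCC issues the "might not be inlinable" warning.  */

int
call_add_one (int x)
{
  return add_one (x);
}
```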

gcc:
	* cgraphunit.cc (process_function_and_variable_attributes): Tweak
	the warning for an attribute-always_inline without inline declaration.

gcc/testsuite:
	* g++.dg/Wattributes-3.C: Adjust expected warning.
	* gcc.dg/fail_always_inline.c: Ditto.
2024-01-27 00:55:01 +01:00
Nathaniel Shead
ec57d183d3 c++: Stream additional fields for DECL_STRUCT_FUNCTION [PR113580]
Currently the DECL_STRUCT_FUNCTION for a declaration is always
reconstructed from scratch. This causes issues though, as some fields
used by other parts of the compiler (in this case, specifically
'function_{start,end}_locus') are then not correctly initialised. This
patch makes sure that these fields are also read and written.

	PR c++/113580

gcc/cp/ChangeLog:

	* module.cc (struct post_process_data): Create.
	(trees_in::post_decls): Use.
	(trees_in::post_process): Return entire vector at once.
	Change overload to take post_process_data instead of tree.
	(trees_out::write_function_def): Write needed flags from
	DECL_STRUCT_FUNCTION.
	(trees_in::read_function_def): Read them and pass to
	post_process.
	(module_state::read_cluster): Write flags into cfun.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/pr113580_a.C: New test.
	* g++.dg/modules/pr113580_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2024-01-27 09:29:05 +11:00
Maciej W. Rozycki
5a874dec60 RISC-V/testsuite: Add RTL cset-sext.c testcase variants
Add RTL tests, for RV64 and RV32 where appropriate, corresponding to the
existing cset-sext.c tests.  They have been produced from RTL code as at
the entry of the "ce1" pass for the respective cset-sext.c tests built
at -O3.

	gcc/testsuite/
	* gcc.target/riscv/cset-sext-rtl.c: New file.
	* gcc.target/riscv/cset-sext-rtl32.c: New file.
	* gcc.target/riscv/cset-sext-sfb-rtl.c: New file.
	* gcc.target/riscv/cset-sext-sfb-rtl32.c: New file.
	* gcc.target/riscv/cset-sext-thead-rtl.c: New file.
	* gcc.target/riscv/cset-sext-ventana-rtl.c: New file.
	* gcc.target/riscv/cset-sext-zicond-rtl.c: New file.
	* gcc.target/riscv/cset-sext-zicond-rtl32.c: New file.
2024-01-26 21:47:40 +00:00
Maciej W. Rozycki
d4e15084e2 RISC-V/testsuite: Add RTL pr105314.c testcase variants
Add a pair of RTL tests, for RV64 and RV32 respectively, corresponding
to the existing pr105314.c test.  They have been produced from RTL code
as at the entry of the "ce1" pass for pr105314.c compiled at -O3.

	gcc/testsuite/
	* gcc.target/riscv/pr105314-rtl.c: New file.
	* gcc.target/riscv/pr105314-rtl32.c: New file.
2024-01-26 21:47:40 +00:00
Maciej W. Rozycki
3e3b9b708d RISC-V/testsuite: Also verify if-conversion runs for pr105314.c
Verify that if-conversion succeeded through noce_try_store_flag_mask, as
per PR rtl-optimization/105314, tightening the test case and making it
explicit.
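
The kind of source pattern this transformation targets can be sketched as
follows.  This is an illustrative function with hypothetical names, not
necessarily the contents of pr105314.c; the second function spells out by
hand the branch-free, mask-based form that if-conversion is expected to
produce.

```c
/* Branchy form: clear 'a' when 'c' is nonzero.  */
long
clear_if_set (long a, long c)
{
  if (c)
    a = 0;
  return a;
}

/* Hand-written mask form: the mask is all ones when c == 0 and zero
   otherwise, so the AND either keeps or clears 'a' without a branch.  */
long
clear_if_set_masked (long a, long c)
{
  long mask = -(long) (c == 0);
  return a & mask;
}
```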

	gcc/testsuite/
	* gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.
2024-01-26 21:47:40 +00:00
Maciej W. Rozycki
a0596173c8 RISC-V/testsuite: Widen coverage for pr105314.c
The optimization levels pr105314.c is iterated over are needlessly
overridden with "-O2", limiting the coverage of the test case to that
level, perhaps with additional options the original optimization level
has been supplied with.  We could prevent the extra iterations other
than "-O2" from being run, but the transformation made by if-conversion
is also expected to happen at other optimization levels, so include them
all, and also make sure no reverse-condition branch appears in output,
moving the `dg-final' command to the bottom, as with most test cases.

	gcc/testsuite/
	* gcc.target/riscv/pr105314.c: Replace `dg-options' command with
	`dg-skip-if'.  Also reject "bne" with `dg-final'.
2024-01-26 21:47:40 +00:00
Robin Dapp
861997a9c7 genopinit: Split init_all_optabs [PR113575].
init_all_optabs initializes > 10000 patterns for riscv targets.  This
leads to pathological situations in dataflow analysis (which can occur
with many adjacent stores).
To alleviate this, this patch makes genopinit split the init_all_optabs
function into several init_optabs_xx functions that each initialize 1000
patterns.

With this change insn-opinit.cc's compilation time is reduced from 4+
minutes to 1:30 and memory consumption decreases from 1.2G to 630M.
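
The splitting idea can be sketched as a small generator.  This is not
genopinit's actual code: emit_split_init, the init_pattern placeholder and
the numbering of the emitted functions are assumptions made for
illustration.

```c
#include <stdio.h>

/* Emit C source that splits one large initializer into several smaller
   functions of at most 'chunk' patterns each, plus a driver that calls
   them in order.  Returns the number of split functions written.
   The emitted body lines are placeholders, not genopinit's real output.  */
unsigned
emit_split_init (FILE *out, unsigned n_patterns, unsigned chunk)
{
  unsigned n_funcs = 0;

  if (chunk == 0)
    return 0;   /* a zero chunk size would never make progress */

  for (unsigned first = 0; first < n_patterns; first += chunk)
    {
      unsigned last = first + chunk < n_patterns ? first + chunk : n_patterns;
      fprintf (out, "static void\ninit_optabs_%02u (void)\n{\n", n_funcs);
      for (unsigned i = first; i < last; i++)
        fprintf (out, "  init_pattern (%u);\n", i);
      fprintf (out, "}\n\n");
      n_funcs++;
    }

  /* The original entry point now just calls each piece in turn.  */
  fprintf (out, "void\ninit_all_optabs (void)\n{\n");
  for (unsigned f = 0; f < n_funcs; f++)
    fprintf (out, "  init_optabs_%02u ();\n", f);
  fprintf (out, "}\n");
  return n_funcs;
}
```

Keeping each emitted function small bounds the amount of code the compiler
has to analyze at once, which is the effect the commit message reports.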

gcc/ChangeLog:

	PR other/113575

	* genopinit.cc (main): Split init_all_optabs into functions
	of 1000 patterns each.
2024-01-26 22:12:06 +01:00
Gaius Mulley
eb619490b0 modula2: detect string and pointer formal and actual parameter incompatibility
This patch improves the location accuracy of parameters and fixes bugs
in parameter checking in M2Check.  It also corrects the location
of constant declarations.

gcc/m2/ChangeLog:

	* gm2-compiler/M2Check.mod (dumpIndice): New procedure.
	(dumpIndex): New procedure.
	(dumptInfo): New procedure.
	(buildError4): Add comment and pass formal and actual to
	MetaError4.  Improve text describing error.
	(buildError2): Generate different error descriptions for
	the three error kinds.
	(checkConstMeta): Add block comment.  Add more meta checks
	and call doCheckPair to complete string const checking.
	Add tinfo parameter.
	(checkConstEquivalence): Add tinfo parameter.
	* gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList):
	Print the length of a const string.
	* gm2-compiler/M2GenGCC.mod (CodeParam): Remove parameters
	op1, op2 and op3.
	(doParam): Add paramtok parameter.  Use paramtok rather
	than CurrentQuadToken.
	(CodeParam): Rewrite.
	* gm2-compiler/M2Quads.mod (CheckProcedureParameters):
	Add comments explaining that const strings are not checked
	in M2Quads.mod.
	(FailParameter): Use MetaErrorT2 with tokpos rather than
	MetaError2.
	(doBuildBinaryOp): Assign OldPos and OperatorPos before the
	IF block.
	* gm2-compiler/SymbolTable.mod (PutConstString): Add call to
	InitWhereDeclaredTok.

gcc/testsuite/ChangeLog:

	* gm2/pim/fail/badpointer4.mod: New test.
	* gm2/pim/fail/strconst.def: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-01-26 19:04:48 +00:00
Richard Biener
c34ab549d8 Avoid registering unsupported OMP offload devices
The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unsupported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
devices to FAIL.

libgomp/
	* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
	agents with unsupported ISA.
2024-01-26 15:36:35 +01:00
Richard Biener
209ed06c3a Fix architecture support in OMP_OFFLOAD_init_device for gcn
The following makes the existing architecture support check work
instead of being optimized away (enum vs. -1).  This avoids
later asserts when we assume such devices are never actually
used.
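
The general pattern of the fix, an explicit "unsupported" enumerator
instead of -1, can be sketched as below.  The enum, its values and the
function names are hypothetical stand-ins, not the definitions in
plugin-gcn.c, and the commit message does not spell out how the original
check was folded away; the comment mentions one plausible case.

```c
#include <string.h>

/* Hypothetical stand-in for an ISA enum; the values are arbitrary.  */
typedef enum
{
  /* Explicit "not supported" enumerator, used instead of returning -1.  */
  MACH_UNSUPPORTED = 0,
  MACH_EXAMPLE_A = 1,
  MACH_EXAMPLE_B = 2
} mach_t;

/* Map a device name to its ISA code, or to the sentinel when unknown.  */
mach_t
isa_code_sketch (const char *name)
{
  if (strcmp (name, "example-a") == 0)
    return MACH_EXAMPLE_A;
  if (strcmp (name, "example-b") == 0)
    return MACH_EXAMPLE_B;
  return MACH_UNSUPPORTED;
}

/* The support check compares against the sentinel.  A test such as
   'code < 0' can be folded to false when the compiler picks an unsigned
   underlying type for the enum, which is one way a -1 check can vanish.  */
int
device_supported_p (const char *name)
{
  return isa_code_sketch (name) != MACH_UNSUPPORTED;
}
```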

libgomp/
	* plugin/plugin-gcn.c
	(EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add.
	(isa_code): Return that instead of -1.
	(GOMP_OFFLOAD_init_device): Adjust.
2024-01-26 15:36:35 +01:00
Tobias Burnus
56d0aba11a amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
gcc/ChangeLog:

	* config.gcc (amdgcn-*-*): Add gfx1030 and gfx1100 to
	TM_MULTILIB_CONFIG.
	* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1030/gfx1100.
	* doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to
	-march/-mtune.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
	for gfx1030 and gfx1100.
	* testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-26 15:11:09 +01:00
Andrew Stubbs
99890e1552 amdgcn: additional gfx1030/gfx1100 support
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.

gcc/ChangeLog:

	* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
	* config/gcn/gcn-valu.md (all_convert): New iterator.
	(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
	define_expand, and rename the old one to ...
	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
	(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
	(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
	* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
	(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
	* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
	(<u>mulqihi3_scalar): Likewise.

libgcc/ChangeLog:

	* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.

libgomp/ChangeLog:

	* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
	(omp_get_wtime): Add RDNA3-compatible variant.
	* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.

Signed-off-by: Andrew Stubbs <ams@baylibre.com>
2024-01-26 11:38:47 +00:00
Nathaniel Shead
a0dde47f84 c++: Emit definitions of ODR-used static members imported from modules [PR112899]
Static data members marked 'inline' should be emitted in TUs where they
are ODR-used.  We need to make sure that inlines imported from modules
are correctly added to the 'pending_statics' map so that they get
emitted if needed, otherwise the attached testcase fails to link.

	PR c++/112899

gcc/cp/ChangeLog:

	* cp-tree.h (note_variable_template_instantiation): Rename to...
	(note_vague_linkage_variable): ...this.
	* decl2.cc (note_variable_template_instantiation): Rename to...
	(note_vague_linkage_variable): ...this.
	* pt.cc (instantiate_decl): Rename usage of above function.
	* module.cc (trees_in::read_var_def): Remember pending statics
	that we stream in.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/init-4_a.C: New test.
	* g++.dg/modules/init-4_b.C: New test.
	* g++.dg/modules/init-6_a.H: New test.
	* g++.dg/modules/init-6_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
2024-01-26 22:28:45 +11:00
Richard Biener
f9b143d239 tree-optimization/113602 - datarefs of non-addressables
We can end up creating ADDR_EXPRs of non-addressable entities during
for example vectorization.  The following plugs this in data-ref
analysis when that would create such invalid ADDR_EXPR as part of
analyzing the ref structure.

	PR tree-optimization/113602
	* tree-data-ref.cc (dr_analyze_innermost): Fail when
	the base object isn't addressable.

	* gcc.dg/pr113602.c: New testcase.
2024-01-26 11:25:05 +01:00
Tobias Burnus
4b5650acb3 gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC
Since LLVM commit 082f87c9d418 (Pull Req. #79038; will become LLVM 18)
  "[AMDGPU] Change default AMDHSA Code Object version to 5"
the default - when no --amdhsa-code-object-version= is used - was bumped.

Using --amdhsa-code-object-version=5 is supported (with unknown limitations)
since LLVM 14.  GCC requires at least LLVM 13.0.1 for proper support, so
explicitly using COV5 is not possible.

Unfortunately, the COV number matters for debugging ("-g") as mkoffload.cc
extracts debugging data from the host's object file and writes it into
an AMD GPU object file it creates.  And all object files linked together
must have the same ABI version.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): New; creates the
	"--amdhsa-code-object-version=" argument.
	(ASM_SPEC): Use it; replace previous version of it.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-26 10:14:09 +01:00