Commit graph

206130 commits

Author SHA1 Message Date
Alexandre Oliva
f0a90c7d73 Introduce strub: machine-independent stack scrubbing
This patch adds the strub attribute for function and variable types,
command-line options, passes and adjustments to implement it,
documentation, and tests.

Stack scrubbing is implemented in a machine-independent way: functions
with strub enabled are modified so that they take an extra stack
watermark argument, that they update with their stack use, and the
caller can then zero it out once it regains control, whether by return
or exception.  There are two ways to go about it: at-calls, that
modifies the visible interface (signature) of the function, and
internal, in which the body is moved to a clone, the clone undergoes
the interface change, and the function becomes a wrapper, preserving
its original interface, that calls the clone and then clears the stack
used by it.
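A toy model may help make the watermark protocol concrete. The sketch below is a model of the idea, not real stack scrubbing:

- It simulates a downward-growing stack in an array.
- Every name in it (`SimStack`, `sim_strub_enter`, `sim_strub_update`, `sim_strub_leave`, `sum_secret*`) is hypothetical. None of them is a GCC builtin or a libgcc function.
- The callee lowers the watermark as it uses stack. Whoever owns the watermark then zeroes the range between it and the caller's stack pointer.
- Exceptions, which the real mechanism also handles, are not modelled.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only: a simulated stack that grows downward from index 256.
// None of these names are GCC builtins or libgcc functions.
struct SimStack {
  std::array<unsigned char, 256> mem{};
  std::size_t sp = 256;  // simulated stack pointer (index into mem)
};

// The watermark records the lowest stack index used by the scrubbed callee.
void sim_strub_enter(const SimStack& s, std::size_t& watermark) {
  watermark = s.sp;
}

void sim_strub_update(const SimStack& s, std::size_t& watermark) {
  if (s.sp < watermark)
    watermark = s.sp;
}

// Zero everything between the watermark and the caller's stack pointer.
void sim_strub_leave(SimStack& s, std::size_t watermark, std::size_t caller_sp) {
  assert(watermark <= caller_sp && caller_sp <= s.mem.size());
  std::fill(s.mem.begin() + watermark, s.mem.begin() + caller_sp,
            static_cast<unsigned char>(0));
}

// "at-calls" shape: the watermark is an extra parameter in the signature.
int sum_secret_at_calls(SimStack& s, std::size_t& watermark, int a, int b) {
  const std::size_t frame = 16;
  s.sp -= frame;                                    // allocate a frame
  sim_strub_update(s, watermark);                   // report stack use
  s.mem[s.sp] = static_cast<unsigned char>(a);      // spill "sensitive" values
  s.mem[s.sp + 1] = static_cast<unsigned char>(b);
  const int result = s.mem[s.sp] + s.mem[s.sp + 1];
  s.sp += frame;                                    // pop; bytes stay in mem
  return result;
}

// "internal" shape: a wrapper keeps the original signature, calls the
// modified clone, and scrubs the stack the clone used.
int sum_secret(SimStack& s, int a, int b) {
  const std::size_t caller_sp = s.sp;
  std::size_t watermark = 0;
  sim_strub_enter(s, watermark);
  const int result = sum_secret_at_calls(s, watermark, a, b);
  sim_strub_leave(s, watermark, caller_sp);
  return result;
}
```

Calling `sum_secret_at_calls` directly, without the leave step, leaves the spilled values behind in `mem`. Going through `sum_secret` leaves the whole simulated stack zeroed.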

Variables can also be annotated with the strub attribute, so that
functions that read from them get stack scrubbing enabled implicitly,
whether at-calls, for functions only usable within a translation unit,
or internal, for functions whose interfaces must not be modified.
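Read literally, the paragraph above describes a small decision rule. Here is a sketch of it, with hypothetical names (`StrubMode`, `implicit_mode`) that are not taken from ipa-strub.cc:

```cpp
#include <cassert>

// Hypothetical names that only mirror the modes described in the text.
enum class StrubMode { none, at_calls, internal };

// Reading a strub-annotated variable enables scrubbing implicitly.  at-calls
// changes the signature, so it is only chosen when the function is usable
// only within its translation unit; otherwise the interface-preserving
// internal mode is used.
StrubMode implicit_mode(bool reads_strub_variable, bool local_to_tu) {
  if (!reads_strub_variable)
    return StrubMode::none;
  return local_to_tu ? StrubMode::at_calls : StrubMode::internal;
}
```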

There is a strict mode, in which functions that have their stack
scrubbed can only call other functions with stack-scrubbing
interfaces, or those explicitly marked as callable from strub
contexts, so that an entire call chain gets scrubbing, at once or
piecemeal depending on optimization levels.  In the default mode,
relaxed, this requirement is not enforced by the compiler.

The implementation adds two IPA passes, one that assigns strub modes
early on, another that modifies interfaces and adds calls to the
builtins that jointly implement stack scrubbing.  Another builtin,
that obtains the stack pointer, is added for use in the implementation
of the builtins, whether expanded inline or called in libgcc.

There are new command-line options to change operation modes and to
force the feature disabled; it is enabled by default, but it has no
effect and is implicitly disabled if the strub attribute is never
used.  There are also options meant to be used for testing the feature,
enabling different strubbing modes for all (viable) functions.


for  gcc/ChangeLog

	* Makefile.in (OBJS): Add ipa-strub.o.
	(GTFILES): Add ipa-strub.cc.
	* builtins.def (BUILT_IN_STACK_ADDRESS): New.
	(BUILT_IN___STRUB_ENTER): New.
	(BUILT_IN___STRUB_UPDATE): New.
	(BUILT_IN___STRUB_LEAVE): New.
	* builtins.cc: Include ipa-strub.h.
	(STACK_STOPS, STACK_UNSIGNED): Define.
	(expand_builtin_stack_address): New.
	(expand_builtin_strub_enter): New.
	(expand_builtin_strub_update): New.
	(expand_builtin_strub_leave): New.
	(expand_builtin): Call them.
	* common.opt (fstrub=*): New options.
	* doc/extend.texi (strub): New type attribute.
	(__builtin_stack_address): New function.
	(Stack Scrubbing): New section.
	* doc/invoke.texi (-fstrub=*): New options.
	(-fdump-ipa-*): New passes.
	* gengtype-lex.l: Ignore multi-line pp-directives.
	* ipa-inline.cc: Include ipa-strub.h.
	(can_inline_edge_p): Test strub_inlinable_to_p.
	* ipa-split.cc: Include ipa-strub.h.
	(execute_split_functions): Test strub_splittable_p.
	* ipa-strub.cc, ipa-strub.h: New.
	* passes.def: Add strub_mode and strub passes.
	* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
	* tree-pass.h (make_pass_ipa_strub_mode): Declare.
	(make_pass_ipa_strub): Declare.
	(make_pass_ipa_function_and_variable_visibility): Fix
	formatting.
	* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
	before strub leave.
	* attribs.cc: Include ipa-strub.h.
	(decl_attributes): Support applying attributes to function
	type, rather than pointer type, at handler's request.
	(comp_type_attributes): Combine strub_comptypes and target
	comp_type results.
	* doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
	(TARGET_STRUB_MAY_USE_MEMSET): New.
	* doc/tm.texi: Rebuilt.
	* cgraph.h (symtab_node::reset): Add preserve_comdat_group
	param, with a default.
	* cgraphunit.cc (symtab_node::reset): Use it.

for  gcc/c-family/ChangeLog

	* c-attribs.cc: Include ipa-strub.h.
	(handle_strub_attribute): New.
	(c_common_attribute_table): Add strub.

for  gcc/ada/ChangeLog

	* gcc-interface/trans.cc: Include ipa-strub.h.
	(gigi): Make internal decls for targets of compiler-generated
	calls strub-callable too.
	(build_raise_check): Likewise.
	* gcc-interface/utils.cc: Include ipa-strub.h.
	(handle_strub_attribute): New.
	(gnat_internal_attribute_table): Add strub.

for  gcc/testsuite/ChangeLog

	* c-c++-common/strub-O0.c: New.
	* c-c++-common/strub-O1.c: New.
	* c-c++-common/strub-O2.c: New.
	* c-c++-common/strub-O2fni.c: New.
	* c-c++-common/strub-O3.c: New.
	* c-c++-common/strub-O3fni.c: New.
	* c-c++-common/strub-Og.c: New.
	* c-c++-common/strub-Os.c: New.
	* c-c++-common/strub-all1.c: New.
	* c-c++-common/strub-all2.c: New.
	* c-c++-common/strub-apply1.c: New.
	* c-c++-common/strub-apply2.c: New.
	* c-c++-common/strub-apply3.c: New.
	* c-c++-common/strub-apply4.c: New.
	* c-c++-common/strub-at-calls1.c: New.
	* c-c++-common/strub-at-calls2.c: New.
	* c-c++-common/strub-defer-O1.c: New.
	* c-c++-common/strub-defer-O2.c: New.
	* c-c++-common/strub-defer-O3.c: New.
	* c-c++-common/strub-defer-Os.c: New.
	* c-c++-common/strub-internal1.c: New.
	* c-c++-common/strub-internal2.c: New.
	* c-c++-common/strub-parms1.c: New.
	* c-c++-common/strub-parms2.c: New.
	* c-c++-common/strub-parms3.c: New.
	* c-c++-common/strub-relaxed1.c: New.
	* c-c++-common/strub-relaxed2.c: New.
	* c-c++-common/strub-short-O0-exc.c: New.
	* c-c++-common/strub-short-O0.c: New.
	* c-c++-common/strub-short-O1.c: New.
	* c-c++-common/strub-short-O2.c: New.
	* c-c++-common/strub-short-O3.c: New.
	* c-c++-common/strub-short-Os.c: New.
	* c-c++-common/strub-strict1.c: New.
	* c-c++-common/strub-strict2.c: New.
	* c-c++-common/strub-tail-O1.c: New.
	* c-c++-common/strub-tail-O2.c: New.
	* c-c++-common/torture/strub-callable1.c: New.
	* c-c++-common/torture/strub-callable2.c: New.
	* c-c++-common/torture/strub-const1.c: New.
	* c-c++-common/torture/strub-const2.c: New.
	* c-c++-common/torture/strub-const3.c: New.
	* c-c++-common/torture/strub-const4.c: New.
	* c-c++-common/torture/strub-data1.c: New.
	* c-c++-common/torture/strub-data2.c: New.
	* c-c++-common/torture/strub-data3.c: New.
	* c-c++-common/torture/strub-data4.c: New.
	* c-c++-common/torture/strub-data5.c: New.
	* c-c++-common/torture/strub-indcall1.c: New.
	* c-c++-common/torture/strub-indcall2.c: New.
	* c-c++-common/torture/strub-indcall3.c: New.
	* c-c++-common/torture/strub-inlinable1.c: New.
	* c-c++-common/torture/strub-inlinable2.c: New.
	* c-c++-common/torture/strub-ptrfn1.c: New.
	* c-c++-common/torture/strub-ptrfn2.c: New.
	* c-c++-common/torture/strub-ptrfn3.c: New.
	* c-c++-common/torture/strub-ptrfn4.c: New.
	* c-c++-common/torture/strub-pure1.c: New.
	* c-c++-common/torture/strub-pure2.c: New.
	* c-c++-common/torture/strub-pure3.c: New.
	* c-c++-common/torture/strub-pure4.c: New.
	* c-c++-common/torture/strub-run1.c: New.
	* c-c++-common/torture/strub-run2.c: New.
	* c-c++-common/torture/strub-run3.c: New.
	* c-c++-common/torture/strub-run4.c: New.
	* c-c++-common/torture/strub-run4c.c: New.
	* c-c++-common/torture/strub-run4d.c: New.
	* c-c++-common/torture/strub-run4i.c: New.
	* g++.dg/strub-run1.C: New.
	* g++.dg/torture/strub-init1.C: New.
	* g++.dg/torture/strub-init2.C: New.
	* g++.dg/torture/strub-init3.C: New.
	* gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New.
	* gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New.

for  libgcc/ChangeLog

	* Makefile.in (LIB2ADD): Add strub.c.
	* libgcc2.h (__strub_enter, __strub_update, __strub_leave):
	Declare.
	* strub.c: New.
	* libgcc-std.ver.in (__strub_enter): Add to GCC_14.0.0.
	(__strub_update, __strub_leave): Likewise.
2023-12-05 21:07:36 -03:00
Jonathan Wakely
08448dc146 libstdc++: Add workaround to std::ranges::subrange [PR111948]
libstdc++-v3/ChangeLog:

	PR libstdc++/111948
	* include/bits/ranges_util.h (subrange): Add constructor to
	_Size to avoid setting member in constructor.
	* testsuite/std/ranges/subrange/111948.cc: New test.
2023-12-05 23:34:12 +00:00
Jonathan Wakely
45630fbcf7 libstdc++: Implement LWG 4016 for std::ranges::to
This implements the proposed resolution of LWG 4016, so that
std::ranges::to does not use std::back_inserter and std::inserter.
Instead it inserts at the back of the container directly, using
the first supported one of emplace_back, push_back, emplace, and insert.

Using emplace avoids creating a temporary that has to be moved into the
container, for cases where the source range and the destination
container do not have the same value type.
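The preference order can be modelled in C++17 with priority-tagged overloads. Each overload drops out by SFINAE when its member call is ill-formed, and a higher tag wins when several remain viable.

This is a hypothetical sketch, not the libstdc++ code. The names `append_element`, `priority` and `InsertOnly` are invented. The return value only records which member was used, so that the choice can be checked.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Priority tags: priority<3> converts to priority<2>, and so on, so the
// overload taking the highest tag is preferred when several are viable.
template<int N> struct priority : priority<N - 1> {};
template<> struct priority<0> {};

// 3 = emplace_back, 2 = push_back, 1 = emplace(end, x), 0 = insert(end, x).
template<class C, class R>
auto append_impl(C& c, R&& r, priority<3>)
    -> decltype((void)c.emplace_back(std::forward<R>(r)), int())
{ c.emplace_back(std::forward<R>(r)); return 3; }

template<class C, class R>
auto append_impl(C& c, R&& r, priority<2>)
    -> decltype((void)c.push_back(std::forward<R>(r)), int())
{ c.push_back(std::forward<R>(r)); return 2; }

template<class C, class R>
auto append_impl(C& c, R&& r, priority<1>)
    -> decltype((void)c.emplace(c.end(), std::forward<R>(r)), int())
{ c.emplace(c.end(), std::forward<R>(r)); return 1; }

template<class C, class R>
auto append_impl(C& c, R&& r, priority<0>)
    -> decltype((void)c.insert(c.end(), std::forward<R>(r)), int())
{ c.insert(c.end(), std::forward<R>(r)); return 0; }

// Append one element using the first supported member, in LWG 4016 order.
template<class C, class R>
int append_element(C& c, R&& r) {
  return append_impl(c, std::forward<R>(r), priority<3>{});
}

// A container that only offers insert(pos, value), for illustration.
struct InsertOnly {
  std::vector<int> items;
  std::vector<int>::iterator end() { return items.end(); }
  std::vector<int>::iterator insert(std::vector<int>::iterator pos, int v) {
    return items.insert(pos, v);
  }
};
```

Appending a `const char*` to a `std::vector<std::string>` picks `emplace_back`, which constructs the string in place rather than moving in a temporary.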

libstdc++-v3/ChangeLog:

	* include/std/ranges (__detail::__container_insertable): Remove.
	(__detail::__container_inserter): Remove.
	(ranges::to): Use emplace_back or emplace, as per LWG 4016.
	* testsuite/std/ranges/conv/1.cc (Cont4, test_2_1_4): Check for
	use of emplace_back and emplace.
2023-12-05 23:34:12 +00:00
Jonathan Wakely
5e8a30d8b8 libstdc++: Redefine __glibcxx_assert to work in C++23 constexpr
The changes in r14-5979 to support unknown references in constant
expressions caused some test regressions. The way that __glibcxx_assert
is defined for constant evaluation no longer works when
_GLIBCXX_ASSERTIONS is defined.

This change simplifies __glibcxx_assert so that there is only one check,
rather than a constexpr one and a conditionally-enabled runtime one. The
constexpr one does not need to use __builtin_unreachable to cause a
compilation failure, because __glibcxx_assert_fail is not usable in
constant expressions, so that will cause a failure too.
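The single-check pattern can be sketched as follows, assuming C++14 or later. The names `my_assert_fail`, `MY_ASSERT` and `checked_at` are hypothetical stand-ins, not the real libstdc++ macros.

The point is only this: reaching a non-constexpr function during constant evaluation is itself a compile-time error. So one runtime-style check also works as a constexpr check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Not constexpr: evaluating a call to it during constant evaluation makes
// the expression non-constant, so compilation fails at that point.
[[noreturn]] inline void my_assert_fail(const char* cond) {
  std::fprintf(stderr, "assertion failed: %s\n", cond);
  std::abort();
}

// One check for both contexts (hypothetical macro name).
#define MY_ASSERT(cond) \
  do { if (!(cond)) my_assert_fail(#cond); } while (false)

constexpr int checked_at(const int* p, std::size_t n, std::size_t i) {
  MY_ASSERT(i < n);
  return p[i];
}

constexpr int table[] = {10, 20, 30};
static_assert(checked_at(table, 3, 1) == 20, "in-bounds access is constant");
// static_assert(checked_at(table, 3, 5) == 0, "");  // would not compile
```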

As well as fixing the regressions, this makes the code for the
assertions shorter and simpler, so should be quicker to compile, and
might inline better too.

libstdc++-v3/ChangeLog:

	* include/bits/c++config (__glibcxx_assert_fail): Declare even
	when assertions are not enabled.
	(__glibcxx_constexpr_assert): Remove macro.
	(__glibcxx_assert_impl): Remove macro.
	(_GLIBCXX_ASSERT_FAIL): New macro.
	(_GLIBCXX_DO_ASSERT): New macro.
	(__glibcxx_assert): Simplify to a single definition that works
	at runtime and during constant evaluation.
	* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
	Adjust expected errors.
	* testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc:
	Likewise.
	* testsuite/23_containers/span/back_neg.cc: Likewise.
	* testsuite/23_containers/span/front_neg.cc: Likewise.
	* testsuite/23_containers/span/index_op_neg.cc: Likewise.
	* testsuite/26_numerics/lcm/105844.cc: Likewise.
2023-12-05 23:33:22 +00:00
Juzhe-Zhong
2e7abd0962 RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR
This patch fixes ICE mentioned on PR112851 and PR112852.
These ICEs actually happen many times in full-coverage testing.

The ICE happens on:

bug.c:84:1: internal compiler error: in partial_subreg_p, at rtl.h:3187
   84 | }
      | ^
0x11a7271 partial_subreg_p(machine_mode, machine_mode)
        ../../../../gcc/gcc/rtl.h:3187

gcc_checking_assert (ordered_p (outer_prec, inner_prec));

outer_prec is the PRECISION of RVVM1SImode
inner_prec is the PRECISION of V64SImode

when it is zvl512b.

outer_prec is VLA mode with size (512, 512)
inner_prec is VLS mode with size (2048, 0)

Their precision/size relationship is not certain (they are not ordered).
So block VLS modes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR; then we never reach
a comparison of the precision/size of a VLA mode and a VLS mode whose size is greater than coeffs[0] of the VLA mode.
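The unordered comparison can be reproduced with a simplified model of GCC's poly_int. The model assumes one runtime indeterminate x >= 0, so that a value is c0 + c1*x. Under that assumption, a <= b holds for every x exactly when both coefficients of a are <= those of b.

`Poly2`, `known_le` and `ordered_p` here are illustrative re-implementations, not the real templates.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a poly_int with one indeterminate: value = c0 + c1 * x,
// where x >= 0 is only known at run time.
struct Poly2 {
  std::int64_t c0, c1;
};

// a <= b for every x >= 0 iff each coefficient of a is <= that of b.
constexpr bool known_le(Poly2 a, Poly2 b) {
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// Two values are ordered if one is known to be <= the other for all x.
constexpr bool ordered_p(Poly2 a, Poly2 b) {
  return known_le(a, b) || known_le(b, a);
}
```

(512, 512) and (2048, 0) are unordered. 512 + 512x is smaller than 2048 for x < 3 and larger for x > 3. A VLS size no larger than coeffs[0] = 512, by contrast, is always ordered with respect to the VLA mode.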

Note that this patch causes the following regressions:

FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize  scan-assembler-not vset
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize  scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2

FAIL: gcc.target/riscv/rvv/base/cpymem-1.c check-function-bodies f3
FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f2
FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f3

1. The cpymem check FAILs should be fixed in the testcases, which are fragile and should be made more robust.

2. pr111751.c is a vector cost model issue, and I will fix it in a following patch.

For now, we should land this patch first (highest priority) since it fixes an ICE.

	PR target/112851
	PR target/112852

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (vls_mode_valid_p): Block VLSmodes according
	to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: Add LMUL = 8 option.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mod-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-10.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-11.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-12.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-13.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-14.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-15.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-17.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-5.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-7.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/mov-9.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve32f-1.c: Adapt test.
	* gcc.target/riscv/rvv/autovec/pr112851.c: New test.
	* gcc.target/riscv/rvv/autovec/pr112852.c: New test.
2023-12-06 07:30:28 +08:00
Jakub Jelinek
c73cc6fe62 libiberty: Fix build with GCC < 7
Tobias reported on IRC that the linker fails to build with GCC 4.8.5.
In configure I've tried to use everything actually used in the sha1.c
x86 hw implementation, but unfortunately I forgot about implicit function
declarations.  GCC before 7 did have <cpuid.h> header and bit_SHA define
and __get_cpuid function defined inline, but it didn't define
__get_cpuid_count, which compiled fine (and the configure test is
intentionally compile time only) due to implicit function declaration,
but then failed to link when linking the linker, because
__get_cpuid_count wasn't defined anywhere.

The following patch fixes that by using what autoconf uses in AC_CHECK_DECL
to make sure the functions are declared.

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

	* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): Verify __get_cpuid and
	__get_cpuid_count are not implicitly declared.
	* configure: Regenerated.
2023-12-05 23:32:19 +01:00
David Faust
b8cf266f4c btf: avoid wrong DATASEC entries for extern vars [PR112849]
The process of creating BTF_KIND_DATASEC records involves iterating
through variable declarations, determining which section they will be
placed in, and creating an entry in the appropriate DATASEC record
accordingly.

For variables without e.g. an explicit __attribute__((section)), we use
categorize_decl_for_section () to identify the appropriate named section
and corresponding BTF_KIND_DATASEC record.

This was incorrectly being done for 'extern' variable declarations as
well as non-extern ones, which meant that extern variable declarations
could result in BTF_KIND_DATASEC entries claiming the variable is
allocated in some section such as '.bss' without any knowledge whether
that is actually true. That resulted in errors building the Linux kernel
BPF selftests.

This patch corrects btf_collect_datasec () to avoid assuming a section
for extern variables, and only emit BTF_KIND_DATASEC entries for them if
they have a known section.
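Here is a sketch of the corrected decision, with hypothetical types:

- `VarDecl` stands in for a variable declaration.
- `guess_section` stands in for categorize_decl_for_section (). It uses a toy .data/.bss rule rather than GCC's real categories.

An extern declaration contributes a DATASEC entry only when it carries an explicit section.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable declaration.
struct VarDecl {
  std::string name;
  bool is_extern;
  std::optional<std::string> explicit_section;  // __attribute__((section))
  bool initialized;
};

// Toy stand-in for categorize_decl_for_section ().
std::string guess_section(const VarDecl& v) {
  return v.initialized ? ".data" : ".bss";
}

// Map each DATASEC (section name) to the variables it lists.
std::map<std::string, std::vector<std::string>>
collect_datasec(const std::vector<VarDecl>& vars) {
  std::map<std::string, std::vector<std::string>> secs;
  for (const VarDecl& v : vars) {
    std::string section;
    if (v.explicit_section)
      section = *v.explicit_section;
    else if (v.is_extern)
      continue;  // never guess a section for an extern declaration
    else
      section = guess_section(v);
    secs[section].push_back(v.name);
  }
  return secs;
}
```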

gcc/
	PR debug/112849
	* btfout.cc (btf_collect_datasec): Avoid incorrectly creating an
	entry in a BTF_KIND_DATASEC record for extern variable decls without
	a known section.

gcc/testsuite/
	PR debug/112849
	* gcc.dg/debug/btf/btf-datasec-3.c: New test.
2023-12-05 14:02:16 -08:00
Jakub Jelinek
9610ba7b6f libgfortran: Fix -Wincompatible-pointer-types errors
As reported, libgfortran fails to build on targets where int32_t and int
are different types, because it uses int and GFC_INTEGER_4 (under the hood
int32_t) interchangeably.

The following patch fixes that.

2023-12-05  Florian Weimer  <fweimer@redhat.com>
	    Jakub Jelinek  <jakub@redhat.com>

	* io/list_read.c (list_formatted_read_scalar) <case BT_CLASS>:
	Change types of unit and noiostat to GFC_INTEGER_4 from int, change
	type of child_iostat to GFC_INTEGER_4 * from int *, formatting
	fixes.
	(nml_read_obj): Likewise.
	* io/write.c (list_formatted_write_scalar) <case BT_CLASS>: Likewise.
	(nml_write_obj): Likewise.
	* io/transfer.c (unformatted_read, unformatted_write): Likewise.
2023-12-05 22:56:41 +01:00
Jakub Jelinek
59be79fd59 c++: Further #pragma GCC unroll C++ fix [PR112795]
When committing the #pragma GCC unroll patch, I found I forgot one spot
for diagnosing invalid unrolls: if the #pragma GCC unroll argument is
dependent and the pragma is before a range for loop, the unroll tree (now
the tree itself; before, one converted from ushort) is saved into
RANGE_FOR_UNROLL and tsubst_stmt was RECURing on it, but didn't diagnose
if it was invalid, and so we ICEd later in the middle-end when
ANNOTATE_EXPR had an unexpected argument.

The following patch fixes that.  So that the diagnostics aren't done in 3
different places, the patch introduces a new function that
cp_parser_pragma_unroll and the instantiation of both ANNOTATE_EXPR and
RANGE_FOR_STMT can use.
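The shape of such a shared helper can be sketched with hypothetical names: one checker that every path calls, so the range-for path can no longer skip validation.

Both the accepted range [0, 65534] and the diagnostic wording are assumptions made for this sketch. The real cp_check_pragma_unroll works on trees rather than on a plain integer.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a shared unroll checker: the parser and both
// template-instantiation paths would call this one function.
// Assumed rule: the argument must be an integral constant in [0, 65534].
std::optional<unsigned short> check_pragma_unroll(long long value,
                                                  std::string& diag) {
  if (value < 0 || value > 65534) {
    diag = "'#pragma GCC unroll' requires a non-negative integral "
           "constant less than 65535";
    return std::nullopt;
  }
  return static_cast<unsigned short>(value);
}
```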

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/112795
	* cp-tree.h (cp_check_pragma_unroll): Declare.
	* semantics.cc (cp_check_pragma_unroll): New function.
	* parser.cc (cp_parser_pragma_unroll): Use cp_check_pragma_unroll.
	* pt.cc (tsubst_expr) <case ANNOTATE_EXPR>: Likewise.
	(tsubst_stmt) <case RANGE_FOR_STMT>: Likewise.

	* g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for
	-std=gnu++98.
	* g++.dg/ext/unroll-3.C: Likewise.
	* g++.dg/ext/unroll-7.C: New test.
	* g++.dg/ext/unroll-8.C: New test.
2023-12-05 22:54:08 +01:00
Jakub Jelinek
58d5546af9 rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]
The middle-end has been changed quite recently to canonicalize
-abs (x) to copysign (x, -1) rather than the other way around.
While I agree with that at the GIMPLE level, since it matches the GIMPLE
goal of as few operations as possible for a canonical form (-abs (x)
is 2 GIMPLE statements, copysign (x, -1) is just one), I must say
I don't really like that being done on RTL as well (or at least
not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))),
because on most targets most floating-point constants need to be loaded
from memory; there are a few exceptions, but -1 is often not one of them.

Anyway, the following patch fixes the rs6000 regression caused by the
change in GIMPLE canonicalization (i.e. the desirable one).  rs6000
clearly prefers the -abs (x) form: it has a single instruction for that,
and while it also has a copysign instruction, that requires loading the
-1 from memory.  So the patch just ensures the copysign expander can
actually see the floating-point constant, and in that case emits the
-abs (x) code (or, in the hypothetical case of copysign with a
non-negative constant, abs (x); but there copysign (x, 1) in GIMPLE is
canonicalized to abs (x)).  Otherwise it forces the operand to be the
expected gpc_reg_operand and does what it did before.
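The expander relies on an identity: copysign (x, c) == -abs (x) whenever c has its sign bit set (including -0.0), and abs (x) otherwise. A hypothetical C++ model of the decision follows. These functions are not part of rs6000.md, and the returned strings only name the chosen form.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Which form would be emitted: with a constant sign operand only its sign
// bit matters; otherwise the operand goes into a register for copysign.
std::string copysign_form(bool sign_is_constant, double sign) {
  if (!sign_is_constant)
    return "copysign";
  return std::signbit(sign) ? "neg abs" : "abs";
}

// Numeric result of each form; all three agree with std::copysign.
double expand_copysign(double x, bool sign_is_constant, double sign) {
  if (sign_is_constant)
    return std::signbit(sign) ? -std::fabs(x) : std::fabs(x);
  return std::copysign(x, sign);
}
```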

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

	PR target/112606
	* config/rs6000/rs6000.md (copysign<mode>3): Change predicate
	of the last argument from gpc_reg_operand to any_operand.  If
	operands[2] is CONST_DOUBLE, emit abs or neg abs depending on
	its sign, otherwise if it doesn't satisfy gpc_reg_operand,
	force it to REG using copy_to_mode_reg.
2023-12-05 21:39:31 +01:00
Harald Anlauf
9c3a880fee Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]
gcc/fortran/ChangeLog:

	PR fortran/100988
	* gfortran.h (IS_PROC_POINTER): New macro.
	* trans-types.cc (gfc_sym_type): Use macro in determination if the
	restrict qualifier can be used for a dummy variable.  Fix logic to
	allow the restrict qualifier also for optional arguments, and to
	not apply it to pointer or proc_pointer arguments.

gcc/testsuite/ChangeLog:

	PR fortran/100988
	* gfortran.dg/coarray_poly_6.f90: Adjust pattern.
	* gfortran.dg/coarray_poly_7.f90: Likewise.
	* gfortran.dg/coarray_poly_8.f90: Likewise.
	* gfortran.dg/missing_optional_dummy_6a.f90: Likewise.
	* gfortran.dg/pr100988.f90: New test.

Co-authored-by: Tobias Burnus  <tobias@codesourcery.com>
2023-12-05 19:16:19 +01:00
Richard Sandiford
1dad3df1e7 Restore build with GCC 4.8 to GCC 5
GCC 5 and earlier applied array-to-pointer decay too early,
which affected the new attribute namespace code.  A reduced
example of the construct that the attribute code uses is:

    struct S { template<__SIZE_TYPE__ N> S(int (&)[N]); };
    struct T { int a; S b; };
    int a[] = { 1 };
    T t = { 1, a };

This was fixed by f85e1317f8
(PR 16333 et al).

This patch tries to add a minimally-invasive workaround.
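Here is a compilable version of the reduced example, together with the extra braces that the ChangeLog entries below describe. Placing the braces around the array operand is an assumption about what "extra braces" means here. The global is renamed `arr` for readability.

With a current compiler both forms deduce N = 1. Per the description above, GCC 5 and earlier mishandle the first form.

```cpp
#include <cassert>
#include <cstddef>

struct S {
  // Converting constructor that deduces the array bound.
  template<std::size_t N>
  S(int (&array)[N]) : data(array), size(N) {}
  int* data;
  std::size_t size;
};

struct T { int a; S b; };

int arr[] = { 1 };

T t1 = { 1, arr };      // the reduced construct
T t2 = { 1, { arr } };  // the same initialization with extra braces
```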

gcc/ada/
	* gcc-interface/utils.cc (gnat_internal_attribute_table): Add extra
	braces to work around PR 16333 in older compilers.

gcc/
	* attribs.cc (handle_ignored_attributes_option): Add extra
	braces to work around PR 16333 in older compilers.
	* config/aarch64/aarch64.cc (aarch64_gnu_attribute_table): Likewise.
	(aarch64_arm_attribute_table): Likewise.
	* config/arm/arm.cc (arm_gnu_attribute_table): Likewise.
	* config/i386/i386-options.cc (ix86_gnu_attribute_table): Likewise.
	* config/ia64/ia64.cc (ia64_gnu_attribute_table): Likewise.
	* config/rs6000/rs6000.cc (rs6000_gnu_attribute_table): Likewise.
	* target-def.h (TARGET_GNU_ATTRIBUTES): Likewise.
	* genhooks.cc (emit_init_macros): Likewise, when emitting the
	instantiation of TARGET_ATTRIBUTE_TABLE.
	* langhooks-def.h (LANG_HOOKS_INITIALIZER): Likewise, when
	instantiating LANG_HOOKS_ATTRIBUTE_TABLE.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Define to be empty by default.
	* target.def (attribute_table): Likewise.

gcc/c-family/
	* c-attribs.cc (c_common_gnu_attribute_table): Add extra
	braces to work around PR 16333 in older compilers.

gcc/c/
	* c-decl.cc (std_attribute_table): Add extra braces to work
	around PR 16333 in older compilers.

gcc/cp/
	* tree.cc (cxx_gnu_attribute_table): Add extra braces to work
	around PR 16333 in older compilers.

gcc/d/
	* d-attribs.cc (d_langhook_common_attribute_table): Add extra braces
	to work around PR 16333 in older compilers.
	(d_langhook_gnu_attribute_table): Likewise.

gcc/fortran/
	* f95-lang.cc (gfc_gnu_attribute_table): Add extra braces to work
	around PR 16333 in older compilers.

gcc/jit/
	* dummy-frontend.cc (jit_gnu_attribute_table): Add extra braces
	to work around PR 16333 in older compilers.
	(jit_format_attribute_table): Likewise.

gcc/lto/
	* lto-lang.cc (lto_gnu_attribute_table): Add extra braces to work
	around PR 16333 in older compilers.
	(lto_format_attribute_table): Likewise.
2023-12-05 17:53:50 +00:00
Jonathan Wakely
3cd73543a1 libstdc++: Disable std::formatter::set_debug_format [PR112832]
All set_debug_format member functions should be guarded by the
__cpp_lib_formatting_ranges macro (which is not defined yet).

libstdc++-v3/ChangeLog:

	PR libstdc++/112832
	* include/std/format (formatter::set_debug_format): Ensure this
	member is defined conditionally for all specializations.
	* testsuite/std/format/formatter/112832.cc: New test.
2023-12-05 16:40:43 +00:00
Will Hawkins
9fff752695 libstdc++: Add test for LWG Issue 3897
Add a test to verify that the implementation of inout_ptr is not
vulnerable to LWG Issue 3897.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/smartptr.adapt/inout_ptr/2.cc: Add check
	for LWG Issue 3897.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
2023-12-05 16:40:43 +00:00
Jakub Jelinek
e5153e7d63 c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]
It seems that in 2017 attribute-specifier-seq[opt] was added to
asm-declaration and the change was voted in as a DR.

The following patch implements it by parsing the attributes and warning
about them.

I found one attribute parsing bug I'll send a fix for momentarily.

And there is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or how the attribute chain var is called) is non-NULL,
but if it is non-NULL and contains at least one non-attribute_ignored_p
attribute?  cp_parser_declaration at least tries:
      if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
        warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
                    OPT_Wattributes, "attribute ignored");
but attribute_ignored_p here checks the first attribute rather than the
whole chain.  So it will incorrectly not warn if there is an ignored
attribute followed by non-ignored.
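The difference between checking only the head of the chain and checking the whole chain can be sketched with plain strings. `AttrChain`, `IgnoredSet` and both functions are hypothetical. The real code walks a TREE_LIST and calls attribute_ignored_p.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using AttrChain = std::vector<std::string>;   // e.g. {"foo::bar", "baz::qux"}
using IgnoredSet = std::set<std::string>;     // from -Wno-attributes=

// Current behaviour described above: only the first attribute is consulted.
bool should_warn_first_only(const AttrChain& attrs, const IgnoredSet& ignored) {
  return !attrs.empty() && ignored.count(attrs.front()) == 0;
}

// Suggested behaviour: warn if any attribute in the chain is not ignored.
bool should_warn_any(const AttrChain& attrs, const IgnoredSet& ignored) {
  for (const std::string& a : attrs)
    if (ignored.count(a) == 0)
      return true;
  return false;
}
```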

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/110734
	* parser.cc (cp_parser_block_declaration): Implement C++ DR 2262
	- Attributes for asm-definition.  Call cp_parser_asm_definition
	even if RID_ASM token is only seen after sequence of standard
	attributes.
	(cp_parser_asm_definition): Parse standard attributes before
	RID_ASM token and warn for them with -Wattributes.

	* g++.dg/DRs/dr2262.C: New test.
	* g++.dg/cpp0x/gen-attrs-76.C (foo, bar): Don't expect errors
	on attributes on asm definitions.
	* g++.dg/gomp/attrs-11.C: Remove 2 expected errors.
2023-12-05 17:38:46 +01:00
Richard Biener
d9403153f9 middle-end/112860 - -fgimple can skip ISEL
The following makes sure we don't skip ISEL.

	PR middle-end/112860
	* passes.cc (should_skip_pass_p): Do not skip ISEL.
2023-12-05 15:54:29 +01:00
Gaius Mulley
805be8fbea PR modula2/112865 IM and RE fails to skip type equivalences
This patch skips type equivalences when checking the complex data type
operands of the ISO M2 standard functions IM and RE.

gcc/m2/ChangeLog:

	PR modula2/112865
	* gm2-compiler/M2Quads.mod (BuildReFunction): Use
	GetDType to retrieve the type of the operand when
	converting the complex type to its scalar equivalent.
	(BuildImFunction): Use GetDType to retrieve the type of the
	operand when converting the complex type to its scalar
	equivalent.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-12-05 14:54:00 +00:00
Richard Biener
7e40497805 sanitizer/111736 - skip ASAN for globals in alternate address-space
	PR sanitizer/111736
	* asan.cc (asan_protect_global): Do not protect globals
	in non-generic address-space.
2023-12-05 15:07:49 +01:00
Richard Biener
1e6c4aa479 ipa/92606 - IPA ICF merging variables in different address-space
The following avoids merging variables that are put in different
address-spaces.

	PR ipa/92606
	* ipa-icf.cc (sem_variable::equals_wpa): Compare address-spaces.
2023-12-05 14:55:15 +01:00
Richard Biener
68d32d0203 middle-end/112830 - avoid gimplifying non-default addr-space assign to memcpy
The following avoids turning aggregate copies involving non-default
address-spaces into memcpy, since memcpy is not prepared for that.

GIMPLE verification no longer accepts WITH_SIZE_EXPR in aggregate
copies; the following re-allows that for the RHS.  I also needed
to adjust one assert in DCE.

get_memory_address is used for string builtin expansion, so instead
of fixing that up for non-generic address-spaces I've put an assert
there.

I'll note that the same issue exists for initialization from an
empty CTOR, which we gimplify to a memset call.  However, we are
not prepared to handle RTL expansion of the original VLA init,
I failed to provide test coverage (without extending the GNU C
extension for VLA structs), and the Ada frontend (like other frontends)
does not seem to have address-space support, so the patch instead
asserts that we only see generic address-spaces there.

	PR middle-end/112830
	* gimplify.cc (gimplify_modify_expr): Avoid turning aggregate
	copy of non-generic address-spaces to memcpy.
	(gimplify_modify_expr_to_memcpy): Assert we are dealing with
	a copy inside the generic address-space.
	(gimplify_modify_expr_to_memset): Likewise.
	* tree-cfg.cc (verify_gimple_assign_single): Allow
	WITH_SIZE_EXPR as part of the RHS of an assignment.
	* builtins.cc (get_memory_address): Assert we are dealing
	with the generic address-space.
	* tree-ssa-dce.cc (ref_may_be_aliased): Handle WITH_SIZE_EXPR.

	* gcc.target/avr/pr112830.c: New testcase.
	* gcc.target/i386/pr112830.c: Likewise.
2023-12-05 14:51:34 +01:00
Richard Biener
8ff02df629 tree-optimization/112856 - fix LC SSA after loop header copying
When loop header copying unloops loops we may have to fix up
LC SSA.  I've taken the opportunity to streamline the unloop_loops
API, removing the use of an ivcanon-local global variable.

	PR tree-optimization/109689
	PR tree-optimization/112856
	* cfgloopmanip.h (unloop_loops): Adjust API.
	* tree-ssa-loop-ivcanon.cc (unloop_loops): Take edges_to_remove
	as parameter.
	(canonicalize_induction_variables): Adjust.
	(tree_unroll_loops_completely): Likewise.
	* tree-ssa-loop-ch.cc (ch_base::copy_headers): Rewrite into
	LC SSA if we unlooped some loops and we are in LC SSA.

	* gcc.dg/torture/pr109689.c: New testcase.
	* gcc.dg/torture/pr112856.c: Likewise.
2023-12-05 14:12:12 +01:00
Jakub Jelinek
e0786ca9a1 i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
I've added a while back to use smaller code than movabsq if possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates
an invalid insn (as 0xfa1e0ff3 CONST_INT is not allowed as
x86_64_immediate_operand nor x86_64_zext_immediate_operand), the peephole2
even triggers on it again and again (this time with shift 0) until it gives
up.

The following patch fixes that.  As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).
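Below is a hypothetical model of the peephole's decision. It rests on two assumptions:

- The 32-bit immediate is treated as zero-extended.
- The only rejected value is exactly 0xfa1e0ff3, the little-endian reading of the endbr64 bytes f3 0f 1e fa.

The real ix86_endbr_immediate_operand check and the peephole's other conditions are not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Assumed: endbr64 is f3 0f 1e fa, i.e. 0xfa1e0ff3 read as little-endian.
constexpr std::uint32_t kEndbr64Imm = 0xfa1e0ffu * 16u + 3u;  // 0xfa1e0ff3

// Try to express a 64-bit constant as (imm32 << shift); nullopt means the
// rewrite is not done (the FAIL path).
std::optional<std::pair<std::uint32_t, int>> split_movabs(std::uint64_t value) {
  if (value == 0)
    return std::nullopt;
  int shift = 0;
  while ((value & 1) == 0) {   // count trailing zero bits
    value >>= 1;
    ++shift;
  }
  if (value > 0xffffffffu)     // does not fit in 32 bits
    return std::nullopt;
  if (value == kEndbr64Imm)    // would materialize an endbr64 pattern
    return std::nullopt;
  return std::make_pair(static_cast<std::uint32_t>(value), shift);
}
```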

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

	PR target/112845
	* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
	if the new immediate is ix86_endbr_immediate_operand.
2023-12-05 13:17:57 +01:00
Richard Sandiford
c1c267dfcd aarch64: Add support for SME2 intrinsics
This patch adds support for the SME2 <arm_sme.h> intrinsics.  The
convention I've used is to put stuff in aarch64-sve-builtins-sme.*
if it relates to ZA, ZT0, the streaming vector length, or other
such SME state.  Things that operate purely on predicates and
vectors go in aarch64-sve-builtins-sve2.* instead.  Some of these
will later be picked up for SVE2p1.

We previously used Uph internally as a constraint for 16-bit
immediates to atomic instructions.  However, we need a user-facing
constraint for the upper predicate registers (already available as
PR_HI_REGS), and Uph makes a natural pair with the existing Upl.

gcc/
	* config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro.
	(P_ALIASES): Likewise.
	(REGISTER_NAMES): Add pn aliases of the predicate registers.
	(W8_W11_REGNUM_P): New macro.
	(W8_W11_REGS): New register class.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly.
	* config/aarch64/aarch64.cc (aarch64_print_operand): Add support
	for %K, which prints a predicate as a counter.  Handle tuples of
	predicates.
	(aarch64_regno_regclass): Handle W8_W11_REGS.
	(aarch64_class_max_nregs): Likewise.
	* config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints.
	(x, y): Move further up the file.
	(Uph): Redefine as the high predicate registers, renaming the old
	constraint to...
	(Uih): ...this.
	* config/aarch64/predicates.md (const_0_to_7_operand): New predicate.
	(const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise.
	(const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise.
	(aarch64_simd_shift_imm_qi): Use const_0_to_7_operand.
	* config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY)
	(VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4)
	(SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24)
	(SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24)
	(SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24)
	(SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators.
	(UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs.
	(UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN)
	(UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN)
	(UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB)
	(UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise.
	(UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise.
	(UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise.
	(UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT)
	(UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ)
	(UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS)
	(UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise.
	(UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise.
	(UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise.
	(UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise.
	(Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv)
	(VSINGLE, vsingle, b): Add tuple modes.
	(v2xwide, za32_offset_range, za64_offset_range, za32_long)
	(za32_last_offset, vg_modifier, z_suffix, aligned_operand)
	(aligned_fpr): New mode attributes.
	(SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE, SVE_INT_BINARY_MULTI)
	(SVE_FP_BINARY_MULTI): New int iterators.
	(SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT.
	(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
	(SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN)
	(SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ)
	(UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI)
	(SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD)
	(SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE)
	(SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS)
	(LUTI_BITS): New int iterators.
	(optab, sve_int_op): Handle the new unspecs.
	(sme_int_op, has_16bit_form): New int attributes.
	(bits_etype): Handle 64.
	* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec.
	(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
	(UNSPEC_STNT1_SVE_COUNT): Likewise.
	* config/aarch64/atomics.md (cas_short_expected_imm): Use Uih
	rather than Uph for HImode immediates.
	* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
	(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
	(@aarch64_stnt1<SVE_FULLx24:mode>): New patterns.
	(@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to...
	(@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>)
	(@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>):
	...these new patterns.
	(SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants.  Add
	SVE_WHILE_B to existing while patterns.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>)
	(@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2)
	(@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus)
	(@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2)
	(<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>)
	(@aarch64_sve_<sve_int_op><mode>): New patterns.
	(@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>)
	(*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>)
	(@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x)
	(@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2)
	(@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns.
	(@aarch64_sve_<maxmin_uns_op><mode>): Likewise.
	(*aarch64_sve_<maxmin_uns_op><mode>): Likewise.
	(@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise.
	(aarch64_sve_fdotvnx4sfvnx8hf): Likewise.
	(aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise.
	(@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise.
	(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise.
	(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise.
	(truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise.
	(<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise.
	(@aarch64_sve_sel<mode>): Likewise.
	(@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise.
	(@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise.
	(@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise.
	(@aarch64_sve_<optab><mode>): Likewise.
	* config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>)
	(*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>)
	(*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns.
	(*aarch64_sme_write<mode>_plus, aarch64_sme_zero_zt0): Likewise.
	(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
	(@aarch64_sme_single_<optab><mode>): Likewise.
	(*aarch64_sme_single_<optab><mode>_plus): Likewise.
	(@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>)
	(*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus)
	(@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
	(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
	(@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>)
	(*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus)
	(@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
	(*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
	(@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
	(*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
	(*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
	(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
	(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
	(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
	(@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
	(@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
	(@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
	(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
	(@aarch64_sme_lut<LUTI_BITS><mode>): Likewise.
	(UNSPEC_SME_LUTI): New unspec.
	* config/aarch64/aarch64-sve-builtins.def (single): New mode suffix.
	(c8, c16, c32, c64): New type suffixes.
	(vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2)
	(vg4x4): New group suffixes.
	* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0)
	(CP_WRITE_ZT0): New constants.
	(get_svbool_t): Delete.
	(function_resolver::report_mismatched_num_vectors): New member
	function.
	(function_resolver::resolve_conversion): Likewise.
	(function_resolver::infer_predicate_type): Likewise.
	(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
	(function_resolver::require_matching_predicate_type): Likewise.
	(function_resolver::require_nonscalar_type): Likewise.
	(function_resolver::finish_opt_single_resolution): Likewise.
	(function_resolver::require_derived_vector_type): Add an
	expected_num_vectors parameter.
	(function_expander::map_to_rtx_codes): Add an extra parameter
	for unconditional FP unspecs.
	(function_instance::gp_type_index): New member function.
	(function_instance::gp_type): Likewise.
	(function_instance::gp_mode): Handle multi-vector operations.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count)
	(TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen)
	(TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2)
	(TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4)
	(TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu)
	(TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer)
	(TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned)
	(TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type
	macros.
	(groups_x2, groups_x12, groups_x4, groups_x24, groups_x124)
	(groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4)
	(groups_vg24): New group arrays.
	(function_instance::reads_global_state_p): Handle CP_READ_ZT0.
	(function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0.
	(add_shared_state_attribute): Handle zt0 state.
	(function_builder::add_overloaded_functions): Skip MODE_single
	for non-tuple groups.
	(function_resolver::report_mismatched_num_vectors): New function.
	(function_resolver::resolve_to): Add a fallback error message for
	the general two-type case.
	(function_resolver::resolve_conversion): New function.
	(function_resolver::infer_predicate_type): Likewise.
	(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
	(function_resolver::require_matching_predicate_type): Likewise.
	(function_resolver::require_matching_vector_type): Specifically
	diagnose mismatched vector counts.
	(function_resolver::require_derived_vector_type): Add an
	expected_num_vectors parameter.  Extend to handle cases where
	tuples are expected.
	(function_resolver::require_nonscalar_type): New function.
	(function_resolver::check_gp_argument): Use gp_type_index rather
	than hard-coding VECTOR_TYPE_svbool_t.
	(function_resolver::finish_opt_single_resolution): New function.
	(function_checker::require_immediate_either_or): Remove hard-coded
	constants.
	(function_expander::direct_optab_handler): New function.
	(function_expander::use_pred_x_insn): Only add a strictness flag
	if the insn has an operand for it.
	(function_expander::map_to_rtx_codes): Take an unconditional
	FP unspec as an extra parameter.  Handle tuples and MODE_single.
	(function_expander::map_to_unspecs): Handle tuples and MODE_single.
	* config/aarch64/aarch64-sve-builtins-functions.h (read_zt0)
	(write_zt0): New typedefs.
	(full_width_access::memory_vector): Use the function's
	vectors_per_tuple.
	(rtx_code_function_base): Add an optional unconditional FP unspec.
	(rtx_code_function::expand): Update accordingly.
	(rtx_code_function_rotated::expand): Likewise.
	(unspec_based_function_exact_insn::expand): Use tuple_mode instead
	of vector_mode.
	(unspec_based_uncond_function): New typedef.
	(cond_or_uncond_unspec_function): New class.
	(sme_1mode_function::expand): Handle single forms.
	(sme_2mode_function_t): Likewise, adding a template parameter for them.
	(sme_2mode_function): Update accordingly.
	(sme_2mode_lane_function): New typedef.
	(multireg_permute): New class.
	(class integer_conversion): Likewise.
	(while_comparison::expand): Handle svcount_t and svboolx2_t results.
	* config/aarch64/aarch64-sve-builtins-shapes.h
	(binary_int_opt_single_n, binary_opt_single_n, binary_single)
	(binary_za_slice_lane, binary_za_slice_int_opt_single)
	(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
	(binaryxn, clamp, compare_scalar_count, count_pred_c)
	(dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane)
	(extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice)
	(select_pred, shift_right_imm_narrowxn, storexn, str_zt)
	(unary_convertxn, unary_za_slice, unaryxn, write_za)
	(write_za_slice): Declare.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(za_group_is_pure_overload): New function.
	(apply_predication): Use the function's gp_type for the predicate,
	instead of hard-coding the use of svbool_t.
	(parse_element_type): Add support for "c" (svcount_t).
	(parse_type): Add support for "c0" and "c1" (conversion destination
	and source types).
	(binary_za_slice_lane_base): New class.
	(binary_za_slice_opt_single_base): Likewise.
	(load_contiguous_base::resolve): Pass the group suffix to r.resolve.
	(luti_lane_zt_base): New class.
	(binary_int_opt_single_n, binary_opt_single_n, binary_single)
	(binary_za_slice_lane, binary_za_slice_int_opt_single)
	(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
	(binaryxn, clamp): New shapes.
	(compare_scalar_def::build): Allow the return type to be a tuple.
	(compare_scalar_def::expand): Pass the group suffix to r.resolve.
	(compare_scalar_count, count_pred_c, dot_za_slice_int_lane)
	(dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt)
	(ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn)
	(storexn, str_zt): New shapes.
	(ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with...
	(ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these
	new classes.  Allow a second suffix that specifies the type of the
	second vector argument, and that is used to derive the third.
	(unary_def::build): Extend to handle tuple types.
	(unary_convert_def::build): Use the new c0 and c1 format specifiers.
	(unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes.
	(write_za_slice): Likewise.
	* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand)
	(svext_bhw_impl::expand): Update call to map_to_rtx_codes.
	(svcntp_impl::expand): Handle svcount_t variants.
	(svcvt_impl::expand): Handle unpredicated conversions separately,
	dealing with tuples.
	(svdot_impl::expand): Handle 2-way dot products.
	(svdotprod_lane_impl::expand): Likewise.
	(svld1_impl::fold): Punt on tuple loads.
	(svld1_impl::expand): Handle tuple loads.
	(svldnt1_impl::expand): Likewise.
	(svpfalse_impl::fold): Punt on svcount_t forms.
	(svptrue_impl::fold): Likewise.
	(svptrue_impl::expand): Handle svcount_t forms.
	(svrint_impl): New class.
	(svsel_impl::fold): Punt on tuple forms.
	(svsel_impl::expand): Handle tuple forms.
	(svst1_impl::fold): Punt on tuple stores.
	(svst1_impl::expand): Handle tuple stores.
	(svstnt1_impl::expand): Likewise.
	(svwhilelx_impl::fold): Punt on tuple forms.
	(svdot_lane): Use UNSPEC_FDOT.
	(svmax, svmaxnm, svmin, svminnm): Add unconditional FP unspecs.
	(svrinta, svrinti, svrintm, svrintn, svrintp, svrintx, svrintz):
	Use svrint_impl.
	* config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2)
	(svset2, svundef2): Add _b variants.
	(svcvt): Use unary_convertxn.
	(svdot): Use ternary_qq_opt_n_or_011.
	(svdot_lane): Use ternary_qq_or_011_lane.
	(svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n.
	(svpfalse): Add a form that returns svcount_t results.
	(svrinta, svrintm, svrintn, svrintp): Use unaryxn.
	(svsel): Use binaryxn.
	(svst1, svstnt1): Use storexn.
	* config/aarch64/aarch64-sve-builtins-sme.h
	(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
	(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
	(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
	(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
	(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
	(svvdot_lane_za, svwrite_za, svzero_zt): Declare.
	* config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base):
	Rename to...
	(load_store_za_zt0_base): ...this and extend to tuples.
	(load_za_base, store_za_base): Update accordingly.
	(expand_ldr_str_zt0): New function.
	(svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl)
	(svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes.
	(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
	(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
	(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
	(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
	(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
	(svvdot_lane_za, svwrite_za, svzero_zt): New functions.
	* config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics.
	* config/aarch64/aarch64-sve-builtins-sve2.h
	(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
	(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
	(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
	(svzipq): Declare.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl)
	(svcvtn_impl, svpext_impl, svpsel_impl): New classes.
	(svqrshl_impl::fold): Update for change to svrshl shape.
	(svrshl_impl::fold): Punt on tuple forms.
	(svsqadd_impl::expand): Update call to map_to_rtx_codes.
	(svunpk_impl): New class.
	(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
	(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
	(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
	(svzipq): New functions.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics.
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
	or undefine __ARM_FEATURE_SME2.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way
	for test functions to share ZT0.
	(ATTR): Update accordingly.
	(TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN)
	(TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C)
	(TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE)
	(TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW)
	(TEST_X4_NARROW): New macros.
	* gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove
	test for svmopa that becomes valid with SME2.
	* gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for
	existence of svboolx2_t version of svcreate2.
	* gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error
	messages to account for svcount_t predication.
	* gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust
	error messages to account for new SME2 variants.
	* gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise.
2023-12-05 10:24:02 +00:00
Richard Sandiford
8d29b7aca1 aarch64: Add ZT0
SME2 adds a 512-bit lookup table called ZT0.  It is enabled
and disabled by PSTATE.ZA, just like ZA itself.  This patch
adds support for the register, including saving and restoring
contents.

The code reuses the V8DI that was added for LS64, including
the associated memory classification rules.  (The ZT0 range
is more restricted than the LS64 range, but that's enforced
by predicates and constraints.)
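
The call-site rules for ZT0 listed in the ChangeLog below (for aarch64_start_call_args, aarch64_expand_call, aarch64_end_call_args and aarch64_mode_emit_local_sme_state) can be summarized as a small C model. This is an illustrative sketch, not GCC code: the struct, enum and function names are hypothetical, and rules that do not involve ZT0 are ignored.

```c
#include <stdbool.h>

/* Hypothetical summaries of the two sides of a call.  */
struct caller_state { bool has_za; bool has_zt0; };
struct callee_sharing { bool shares_za; bool shares_zt0; };

enum zt0_call_action { ZT0_CALL_REJECT, ZT0_CALL_PLAIN, ZT0_CALL_SAVE_RESTORE };

static enum zt0_call_action
classify_zt0_call (struct caller_state caller, struct callee_sharing callee)
{
  /* Calls to functions that share ZT0 are rejected from functions
     that have no ZT0 state.  */
  if (callee.shares_zt0 && !caller.has_zt0)
    return ZT0_CALL_REJECT;
  /* Calls to functions that share ZT0 but not ZA are rejected from
     functions with ZA state.  */
  if (callee.shares_zt0 && !callee.shares_za && caller.has_za)
    return ZT0_CALL_REJECT;
  /* ZT0 contents that are not shared with the callee are saved before
     and restored after the call: this covers shared-ZA callees that do
     not share ZT0, and private-ZA callees.  */
  if (caller.has_zt0 && !callee.shares_zt0)
    return ZT0_CALL_SAVE_RESTORE;
  return ZT0_CALL_PLAIN;
}
```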

gcc/
	* config/aarch64/aarch64.md (ZT0_REGNUM): New constant.
	(LAST_FAKE_REGNUM): Bump to include it.
	* config/aarch64/aarch64.h (FIXED_REGISTERS): Add an entry for ZT0.
	(CALL_REALLY_USED_REGISTERS, REGISTER_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	(machine_function): Add zt0_save_buffer.
	(CUMULATIVE_ARGS): Add shared_zt0_flags.
	* config/aarch64/aarch64.cc (aarch64_check_state_string): Handle zt0.
	(aarch64_fntype_pstate_za, aarch64_fndecl_pstate_za): Likewise.
	(aarch64_function_arg): Add the shared ZT0 flags as an extra
	limb of the parallel.
	(aarch64_init_cumulative_args): Initialize shared_zt0_flags.
	(aarch64_extra_live_on_entry): Handle ZT0_REGNUM.
	(aarch64_epilogue_uses): Likewise.
	(aarch64_get_zt0_save_buffer, aarch64_save_zt0): New functions.
	(aarch64_restore_zt0): Likewise.
	(aarch64_start_call_args): Reject calls to functions that share
	ZT0 from functions that have no ZT0 state.  Save ZT0 around shared-ZA
	calls that do not share ZT0.
	(aarch64_expand_call): Handle ZT0.  Reject calls to functions that
	share ZT0 but not ZA from functions with ZA state.
	(aarch64_end_call_args): Restore ZT0 after calls to shared-ZA functions
	that do not share ZT0.
	(aarch64_set_current_function): Require +sme2 for functions that
	have ZT0 state.
	(aarch64_function_attribute_inlinable_p): Don't allow functions to
	be inlined if they have local zt0 state.
	(AARCH64_IPA_CLOBBERS_ZT0): New constant.
	(aarch64_update_ipa_fn_target_info): Record asms that clobber ZT0.
	(aarch64_can_inline_p): Don't inline callees that clobber ZT0
	into functions that have ZT0 state.
	(aarch64_comp_type_attributes): Check for compatible ZT0 sharing.
	(aarch64_optimize_mode_switching): Use mode switching if the
	function has ZT0 state.
	(aarch64_mode_emit_local_sme_state): Save and restore ZT0 around
	calls to private-ZA functions.
	(aarch64_mode_needed_local_sme_state): Require ZA to be active
	for instructions that access ZT0.
	(aarch64_mode_entry): Mark ZA as dead on entry if the function
	only shares state other than "za" itself.
	(aarch64_mode_exit): Likewise mark ZA as dead on return.
	(aarch64_md_asm_adjust): Extend handling of ZA clobbers to ZT0.
	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Define __ARM_STATE_ZT0.
	* config/aarch64/aarch64-sme.md (UNSPECV_ASM_UPDATE_ZT0): New unspecv.
	(aarch64_asm_update_zt0): New insn.
	(UNSPEC_RESTORE_ZT0): New unspec.
	(aarch64_sme_ldr_zt0, aarch64_restore_zt0): New insns.
	(aarch64_sme_str_zt0): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sme/zt0_state_1.c: New test.
	* gcc.target/aarch64/sme/zt0_state_2.c: Likewise.
	* gcc.target/aarch64/sme/zt0_state_3.c: Likewise.
	* gcc.target/aarch64/sme/zt0_state_4.c: Likewise.
	* gcc.target/aarch64/sme/zt0_state_5.c: Likewise.
	* gcc.target/aarch64/sme/zt0_state_6.c: Likewise.
2023-12-05 10:24:01 +00:00
Richard Sandiford
724a873b14 aarch64: Add svboolx2_t
SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.

The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work.  At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.

We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework.  All that changes on the PCS side is
that we now have an associated mode.

gcc/
	* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
	* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
	* config/aarch64/aarch64-sve-builtins.cc
	(register_tuple_type): Handle tuples of predicates.
	(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
	* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
	* config/aarch64/aarch64.cc
	(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
	of predicates.
	(pure_scalable_type_info::add_piece): Don't try to form pairs of
	predicates.
	(VEC_STRUCT): Generalize comment.
	(aarch64_classify_vector_mode): Handle VNx32BI.
	(aarch64_array_mode): Likewise.  Return BLKmode for arrays of
	predicates that have no associated mode, rather than allowing
	an integer mode to be chosen.
	(aarch64_hard_regno_nregs): Handle VNx32BI.
	(aarch64_hard_regno_mode_ok): Likewise.
	(aarch64_split_double_move): New function, split out from...
	(aarch64_split_128bit_move): ...here.
	(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
	(aarch64_pfalse_reg): Likewise.
	(aarch64_sve_same_pred_for_ptest_p): Likewise.
	(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
	(aarch64_expand_mov_immediate): Restrict handling of boolean vector
	constants to single-predicate modes.
	(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
	can be addressed.
	(aarch64_class_max_nregs): Handle VNx32BI.
	(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
	(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
	VNx32BI.
	(aarch64_mov_operand_p): Restrict predicate constant canonicalization
	to single-predicate modes.
	(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
	(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
	* config/aarch64/constraints.md (PR_REGS): New predicate.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust
	stack offsets.
	(ret_nonpst3): Remove XFAIL.
	* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.
2023-12-05 10:24:01 +00:00
Richard Sandiford
37be343727 aarch64: Add svcount_t
Some SME2 instructions interpret predicates as counters, rather than
as bit-per-byte masks.  The SME2 ACLE defines an svcount_t type for
this interpretation.

I don't think we have a better way of representing counters than
the VNx16BI that we use for masks.  The patch therefore doesn't
add a new mode for this representation.  It's just something that
is interpreted in context, a bit like signed vs. unsigned integers.

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svreinterpret_impl::fold): Handle reinterprets between svbool_t
	and svcount_t.
	(svreinterpret_impl::expand): Likewise.
	* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Add
	b<->c forms.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_reinterpret_b): New
	type suffix list.
	(wrap_type_in_struct, register_type_decl): New functions, split out
	from...
	(register_tuple_type): ...here.
	(register_builtin_types): Handle svcount_t.
	(handle_arm_sve_h): Don't create tuples of svcount_t.
	* config/aarch64/aarch64-sve-builtins.def (svcount_t): New type.
	(c): New type suffix.
	* config/aarch64/aarch64-sve-builtins.h (TYPE_count): New type class.

gcc/testsuite/
	* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add test
	for svcount_t.
	* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise.
	* g++.target/aarch64/sve/acle/general-c++/svcount_1.C: New test.
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_P)
	(TEST_DUAL_P_REV): New macros.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_b.c: New test.
	* gcc.target/aarch64/sve/acle/general-c/load_1.c: Test passing
	an svcount_t.
	* gcc.target/aarch64/sve/acle/general-c/svcount_1.c: New test.
	* gcc.target/aarch64/sve/acle/general-c/unary_convert_1.c: Test
	reinterprets involving svcount_t.
	* gcc.target/aarch64/sve/acle/general/attributes_7.c: Test svcount_t.
	* gcc.target/aarch64/sve/pcs/annotate_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_12.c: New test.
2023-12-05 10:24:00 +00:00
Richard Sandiford
3b58b2205f aarch64: Add +sme2
gcc/
	* doc/invoke.texi: Document +sme2.
	* doc/sourcebuild.texi: Document aarch64_sme2.
	* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
	Add sme2.
	* config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_sme2): New
	target test.
	(check_effective_target_aarch64_asm_sme2_ok): Likewise.
2023-12-05 10:24:00 +00:00
Richard Sandiford
0e7fee57c0 aarch64: Update sibcall handling for SME
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").

Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function.  Any function can tail-call a streaming-compatible function.
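
A sketch of these rules as a C predicate, classifying each function by its type. The enum, struct and function names are hypothetical, and the model covers only the PSTATE.ZA and PSTATE.SM checks described above, not the other conditions that aarch64_function_ok_for_sibcall applies.

```c
#include <stdbool.h>

/* Hypothetical classification of a function type.  */
enum za_kind { ZA_PRIVATE, ZA_SHARED };
enum sm_kind { SM_NON_STREAMING, SM_STREAMING, SM_COMPATIBLE };
struct fn_kind { enum za_kind za; enum sm_kind sm; };

static bool sibcall_ok (struct fn_kind caller, struct fn_kind callee)
{
  /* PSTATE.ZA: private-ZA to private-ZA, or shared-ZA to shared-ZA.  */
  if (caller.za != callee.za)
    return false;
  /* Any function can tail-call a streaming-compatible function.  */
  if (callee.sm == SM_COMPATIBLE)
    return true;
  /* Otherwise the PSTATE.SM settings must match exactly, so a
     streaming-compatible caller cannot tail-call either kind.  */
  return caller.sm == callee.sm;
}
```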

gcc/
	* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
	Enforce PSTATE.SM and PSTATE.ZA restrictions.
	(aarch64_expand_epilogue): Save and restore the arguments
	to a sibcall around any change to PSTATE.SM.

gcc/testsuite/
	* gcc.target/aarch64/sme/sibcall_1.c: New test.
	* gcc.target/aarch64/sme/sibcall_2.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_3.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_4.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_5.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_6.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_7.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_8.c: Likewise.
2023-12-05 10:11:30 +00:00
Richard Sandiford
0e9aa05df6 aarch64: Enforce inlining restrictions for SME
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.

A function whose body directly clobbers ZA state cannot be inlined into
a function with ZA state.

A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
The callee's function type doesn't matter here: one locally-streaming
function can be inlined into another.
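
These restrictions, together with the ZA-state check listed for aarch64_can_inline_p below, can be sketched as one C predicate. The structure is hypothetical: GCC derives this information from function types and IPA summaries (AARCH64_IPA_SM_FIXED, AARCH64_IPA_CLOBBERS_ZA), not from a struct like this one.

```c
#include <stdbool.h>

/* PSTATE.SM as seen by a function body (hypothetical names).  */
enum body_sm { BODY_SM_OFF, BODY_SM_ON, BODY_SM_ANY };

struct inline_info {
  bool has_local_za;   /* Creates its own ZA state.  */
  bool has_za_state;   /* Has ZA state, whether local or shared.  */
  bool clobbers_za;    /* Body directly clobbers ZA, e.g. via an asm.  */
  enum body_sm sm;     /* Callee: the PSTATE.SM value its body requires.
                          Caller: the value its body guarantees.  */
};

static bool can_inline (struct inline_info caller, struct inline_info callee)
{
  /* ZA switches are only managed at function scope.  */
  if (callee.has_local_za)
    return false;
  if (callee.has_za_state && !caller.has_za_state)
    return false;
  if (callee.clobbers_za && caller.has_za_state)
    return false;
  /* A fixed PSTATE.SM requirement must be guaranteed by the caller.  */
  if (callee.sm != BODY_SM_ANY && callee.sm != caller.sm)
    return false;
  return true;
}
```

Two locally-streaming functions both have BODY_SM_ON bodies in this model, so one can be inlined into the other regardless of their non-streaming function types.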

gcc/
	* config/aarch64/aarch64.cc: Include symbol-summary.h, ipa-prop.h,
	and ipa-fnsummary.h.
	(aarch64_function_attribute_inlinable_p): New function.
	(AARCH64_IPA_SM_FIXED, AARCH64_IPA_CLOBBERS_ZA): New constants.
	(aarch64_need_ipa_fn_target_info): New function.
	(aarch64_update_ipa_fn_target_info): Likewise.
	(aarch64_can_inline_p): Restrict the previous ISA flag checks
	to non-modal features.  Prevent callees that require a particular
	PSTATE.SM state from being inlined into callers that can't guarantee
	that state.  Also prevent callees that have ZA state from being
	inlined into callers that don't.  Finally, prevent callees that
	clobber ZA from being inlined into callers that have ZA state.
	(TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): Define.
	(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
	(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sme/inlining_1.c: New test.
	* gcc.target/aarch64/sme/inlining_2.c: Likewise.
	* gcc.target/aarch64/sme/inlining_3.c: Likewise.
	* gcc.target/aarch64/sme/inlining_4.c: Likewise.
	* gcc.target/aarch64/sme/inlining_5.c: Likewise.
	* gcc.target/aarch64/sme/inlining_6.c: Likewise.
	* gcc.target/aarch64/sme/inlining_7.c: Likewise.
	* gcc.target/aarch64/sme/inlining_8.c: Likewise.
2023-12-05 10:11:30 +00:00
Richard Sandiford
275706fc59 aarch64: Handle PSTATE.SM across abnormal edges
PSTATE.SM is always off on entry to an exception handler, and on entry
to a nonlocal goto receiver.  Those entry points need to switch
PSTATE.SM back to the appropriate state for the current function.
In the case of streaming-compatible functions, they need to restore
the mode that the caller was originally using.

The requirement on nonlocal goto receivers means that nonlocal
jumps need to ensure that PSTATE.SM is zero.
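A rough model of what receivers and jumps must do for each kind of function. This is a hypothetical C sketch: `fn_kind`, `sm_action`, `receiver_action` and `nonlocal_jump_action` are illustrative names, and the real pass emits instructions rather than returning an enum.

```c
#include <assert.h>

/* Hypothetical classification of the function that contains the
   exception handler, nonlocal goto receiver or nonlocal jump.  */
enum fn_kind { FN_NON_STREAMING, FN_STREAMING, FN_STREAMING_COMPATIBLE };

/* Hypothetical actions; the _IF_ variants are conditional at run time.  */
enum sm_action
{
  SM_NONE,
  SM_START,            /* SMSTART SM */
  SM_START_IF_OLD_ON,  /* SMSTART SM if the caller's saved PSTATE.SM was 1 */
  SM_STOP,             /* SMSTOP SM */
  SM_STOP_IF_ON        /* SMSTOP SM if PSTATE.SM is currently 1 */
};

/* PSTATE.SM is 0 on entry to a receiver, so restore the function's mode.  */
static enum sm_action
receiver_action (enum fn_kind kind)
{
  switch (kind)
    {
    case FN_STREAMING:
      return SM_START;
    case FN_STREAMING_COMPATIBLE:
      return SM_START_IF_OLD_ON;
    default:
      assert (kind == FN_NON_STREAMING);
      return SM_NONE;
    }
}

/* A nonlocal jump must arrive at its receiver with PSTATE.SM == 0.  */
static enum sm_action
nonlocal_jump_action (enum fn_kind kind)
{
  switch (kind)
    {
    case FN_STREAMING:
      return SM_STOP;
    case FN_STREAMING_COMPATIBLE:
      return SM_STOP_IF_ON;
    default:
      assert (kind == FN_NON_STREAMING);
      return SM_NONE;
    }
}
```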

gcc/
	* config/aarch64/aarch64.cc: Include except.h.
	(aarch64_sme_mode_switch_regs::add_call_preserved_reg): New function.
	(aarch64_sme_mode_switch_regs::add_call_preserved_regs): Likewise.
	(aarch64_need_old_pstate_sm): Return true if the function has
	a nonlocal-goto or exception receiver.
	(aarch64_switch_pstate_sm_for_landing_pad): New function.
	(aarch64_switch_pstate_sm_for_jump): Likewise.
	(pass_switch_pstate_sm::gate): Enable the pass for all
	streaming and streaming-compatible functions.
	(pass_switch_pstate_sm::execute): Handle non-local gotos and their
	receivers.  Handle exception handler entry points.

gcc/testsuite/
	* g++.target/aarch64/sme/exceptions_2.C: New test.
	* gcc.target/aarch64/sme/nonlocal_goto_1.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_2.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_3.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_4.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_5.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_6.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_7.c: Likewise.
2023-12-05 10:11:29 +00:00
Richard Sandiford
3f6e5991fa aarch64: Add support for __arm_locally_streaming
This patch adds support for the __arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI.  The attribute is valid but redundant for
__arm_streaming functions.
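For a locally-streaming function, the mode switch moves from the call sites into the function's own prologue and epilogue. The C sketch below models that decision; the names are illustrative, not GCC's. It assumes that a streaming-compatible function only switches if it was entered with PSTATE.SM == 0, which is why such a function needs the old PSTATE.SM value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a function's ABI-level PSTATE.SM.  */
enum abi_kind { ABI_NON_STREAMING, ABI_STREAMING, ABI_STREAMING_COMPATIBLE };

struct sm_plan
{
  bool smstart_in_prologue;  /* enter streaming mode before the body */
  bool smstop_in_epilogue;   /* leave it again before returning */
  bool conditional;          /* only if PSTATE.SM was 0 on entry */
};

static struct sm_plan
locally_streaming_plan (enum abi_kind abi, bool locally_streaming)
{
  struct sm_plan plan = { false, false, false };

  assert (abi == ABI_NON_STREAMING || abi == ABI_STREAMING
          || abi == ABI_STREAMING_COMPATIBLE);

  /* Not locally streaming, or already streaming (attribute redundant).  */
  if (!locally_streaming || abi == ABI_STREAMING)
    return plan;

  plan.smstart_in_prologue = true;
  plan.smstop_in_epilogue = true;
  /* A streaming-compatible function might already be in streaming mode.  */
  plan.conditional = (abi == ABI_STREAMING_COMPATIBLE);
  return plan;
}
```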

gcc/
	* config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add
	arm::locally_streaming.
	(aarch64_fndecl_is_locally_streaming): New function.
	(aarch64_fndecl_sm_state): Handle locally-streaming functions.
	(aarch64_cfun_enables_pstate_sm): New function.
	(aarch64_add_offset): Add an argument that specifies whether
	the streaming vector length should be used instead of the
	prevailing one.
	(aarch64_split_add_offset, aarch64_add_sp, aarch64_sub_sp): Likewise.
	(aarch64_allocate_and_probe_stack_space): Likewise.
	(aarch64_expand_mov_immediate): Update calls accordingly.
	(aarch64_need_old_pstate_sm): Return true for locally-streaming
	streaming-compatible functions.
	(aarch64_layout_frame): Force all call-preserved Z and P registers
	to be saved and restored if the function switches PSTATE.SM in the
	prologue.
	(aarch64_get_separate_components): Disable shrink-wrapping of
	such Z and P saves and restores.
	(aarch64_use_late_prologue_epilogue): New function.
	(aarch64_expand_prologue): Measure SVE lengths in the streaming
	vector length for locally-streaming functions, then emit code
	to enable streaming mode.
	(aarch64_expand_epilogue): Likewise in reverse.
	(TARGET_USE_LATE_PROLOGUE_EPILOGUE): Define.
	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Define __arm_locally_streaming.

gcc/testsuite/
	* gcc.target/aarch64/sme/locally_streaming_1.c: New test.
	* gcc.target/aarch64/sme/locally_streaming_2.c: Likewise.
	* gcc.target/aarch64/sme/locally_streaming_3.c: Likewise.
	* gcc.target/aarch64/sme/locally_streaming_4.c: Likewise.
	* gcc.target/aarch64/sme/keyword_macros_1.c: Add
	__arm_locally_streaming.
	* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
2023-12-05 10:11:29 +00:00
Richard Sandiford
4f6ab95370 aarch64: Add support for <arm_sme.h>
This adds support for the SME parts of arm_sme.h.

gcc/
	* doc/invoke.texi: Document +sme-i16i64 and +sme-f64f64.
	* config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers
	to install and aarch64-sve-builtins-sme.o to the list of objects
	to build.
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
	or undefine TARGET_SME, TARGET_SME_I16I64 and TARGET_SME_F64F64.
	(aarch64_pragma_aarch64): Handle arm_sme.h.
	* config/aarch64/aarch64-option-extensions.def (sme-i16i64)
	(sme-f64f64): New extensions.
	* config/aarch64/aarch64-protos.h (aarch64_sme_vq_immediate)
	(aarch64_addsvl_addspl_immediate_p, aarch64_output_addsvl_addspl)
	(aarch64_output_sme_zero_za): Declare.
	(aarch64_output_move_struct): Delete.
	(aarch64_sme_ldr_vnum_offset): Declare.
	(aarch64_sve::handle_arm_sme_h): Likewise.
	* config/aarch64/aarch64.h (AARCH64_ISA_SM_ON): New macro.
	(AARCH64_ISA_SME_I16I64, AARCH64_ISA_SME_F64F64): Likewise.
	(TARGET_STREAMING, TARGET_STREAMING_SME): Likewise.
	(TARGET_SME_I16I64, TARGET_SME_F64F64): Likewise.
	* config/aarch64/aarch64.cc (aarch64_sve_rdvl_factor_p): Rename to...
	(aarch64_sve_rdvl_addvl_factor_p): ...this.
	(aarch64_sve_rdvl_immediate_p): Update accordingly.
	(aarch64_rdsvl_immediate_p, aarch64_add_offset): Likewise.
	(aarch64_sme_vq_immediate): Likewise.  Make public.
	(aarch64_sve_addpl_factor_p): New function.
	(aarch64_sve_addvl_addpl_immediate_p): Use
	aarch64_sve_rdvl_addvl_factor_p and aarch64_sve_addpl_factor_p.
	(aarch64_addsvl_addspl_immediate_p): New function.
	(aarch64_output_addsvl_addspl): Likewise.
	(aarch64_cannot_force_const_mem): Return true for RDSVL immediates.
	(aarch64_classify_index): Handle .Q scaling for VNx1TImode.
	(aarch64_classify_address): Likewise for vnum offsets.
	(aarch64_output_sme_zero_za): New function.
	(aarch64_sme_ldr_vnum_offset_p): Likewise.
	* config/aarch64/predicates.md (aarch64_addsvl_addspl_immediate):
	New predicate.
	(aarch64_pluslong_operand): Include it for SME.
	* config/aarch64/constraints.md (Ucj, Uav): New constraints.
	* config/aarch64/iterators.md (VNx1TI_ONLY): New mode iterator.
	(SME_ZA_I, SME_ZA_SDI, SME_ZA_SDF_I, SME_MOP_BHI): Likewise.
	(SME_MOP_HSDF): Likewise.
	(UNSPEC_SME_ADDHA, UNSPEC_SME_ADDVA, UNSPEC_SME_FMOPA)
	(UNSPEC_SME_FMOPS, UNSPEC_SME_LD1_HOR, UNSPEC_SME_LD1_VER)
	(UNSPEC_SME_READ_HOR, UNSPEC_SME_READ_VER, UNSPEC_SME_SMOPA)
	(UNSPEC_SME_SMOPS, UNSPEC_SME_ST1_HOR, UNSPEC_SME_ST1_VER)
	(UNSPEC_SME_SUMOPA, UNSPEC_SME_SUMOPS, UNSPEC_SME_UMOPA)
	(UNSPEC_SME_UMOPS, UNSPEC_SME_USMOPA, UNSPEC_SME_USMOPS)
	(UNSPEC_SME_WRITE_HOR, UNSPEC_SME_WRITE_VER): New unspecs.
	(elem_bits): Handle x2 and x4 structure modes, plus VNx1TI.
	(Vetype, Vesize, VPRED): Handle VNx1TI.
	(b): New mode attribute.
	(SME_LD1, SME_READ, SME_ST1, SME_WRITE, SME_BINARY_SDI, SME_INT_MOP)
	(SME_FP_MOP): New int iterators.
	(optab): Handle SME unspecs.
	(hv): New int attribute.
	* config/aarch64/aarch64.md (*add<mode>3_aarch64): Handle ADDSVL
	and ADDSPL.
	* config/aarch64/aarch64-sme.md (UNSPEC_SME_LDR): New unspec.
	(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
	(aarch64_sme_ldr0, @aarch64_sme_ldrn<mode>): New patterns.
	(UNSPEC_SME_STR): New unspec.
	(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
	(aarch64_sme_str0, @aarch64_sme_strn<mode>): New patterns.
	(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
	(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
	(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
	(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
	(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
	(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
	(UNSPEC_SME_ZERO): New unspec.
	(aarch64_sme_zero): New pattern.
	(@aarch64_sme_<SME_BINARY_SDI:optab><mode>): Likewise.
	(@aarch64_sme_<SME_INT_MOP:optab><mode>): Likewise.
	(@aarch64_sme_<SME_FP_MOP:optab><mode>): Likewise.
	* config/aarch64/aarch64-sve-builtins.def: Add ZA type suffixes.
	Include aarch64-sve-builtins-sme.def.
	(DEF_SME_ZA_FUNCTION): New macro.
	* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZA): New call
	property.
	(CP_WRITE_ZA): Likewise.
	(PRED_za_m): New predication type.
	(type_suffix_index): Handle DEF_SME_ZA_SUFFIX.
	(type_suffix_info): Add vector_p and za_p fields.
	(function_instance::num_za_tiles): New member function.
	(function_builder::get_attributes): Add an aarch64_feature_flags
	argument.
	(function_expander::get_contiguous_base): Take a base argument
	number, a vnum argument number, and an argument that indicates
	whether the vnum parameter is a factor of the SME vector length
	or the prevailing vector length.
	(function_expander::add_integer_operand): Take a poly_int64.
	(sve_switcher::sve_switcher): Take a base set of flags.
	(sme_switcher): New class.
	(scalar_types): Add a null entry for NUM_VECTOR_TYPES.
	* config/aarch64/aarch64-sve-builtins.cc: Include
	aarch64-sve-builtins-sme.h.
	(pred_suffixes): Add an entry for PRED_za_m.
	(type_suffixes): Initialize vector_p and za_p.  Handle ZA suffixes.
	(TYPES_all_za, TYPES_d_za, TYPES_za_bhsd_data, TYPES_za_all_data)
	(TYPES_za_s_integer, TYPES_za_d_integer, TYPES_mop_base)
	(TYPES_mop_base_signed, TYPES_mop_base_unsigned, TYPES_mop_i16i64)
	(TYPES_mop_i16i64_signed, TYPES_mop_i16i64_unsigned, TYPES_za): New
	type suffix macros.
	(preds_m, preds_za_m): New predication lists.
	(function_groups): Handle DEF_SME_ZA_FUNCTION.
	(scalar_types): Add an entry for NUM_VECTOR_TYPES.
	(find_type_suffix_for_scalar_type): Check positively for vectors
	rather than negatively for predicates.
	(check_required_extensions): Handle PSTATE.SM and PSTATE.ZA
	requirements.
	(report_out_of_range): Handle the case where the minimum and
	maximum are the same.
	(function_instance::reads_global_state_p): Return true for functions
	that read ZA.
	(function_instance::modifies_global_state_p): Return true for functions
	that write to ZA.
	(sve_switcher::sve_switcher): Add a base flags argument.
	(function_builder::get_name): Handle "__arm_" prefixes.
	(add_attribute): Add an overload that takes a namespace.
	(add_shared_state_attribute): New function.
	(function_builder::get_attributes): Take the required feature flags
	as argument.  Add streaming and ZA attributes where appropriate.
	(function_builder::add_unique_function): Update calls accordingly.
	(function_resolver::check_gp_argument): Assert that the predication
	isn't ZA _m predication.
	(function_checker::function_checker): Don't bias the argument
	number for ZA _m predication.
	(function_expander::get_contiguous_base): Add arguments that
	specify the base argument number, the vnum argument number,
	and an argument that indicates whether the vnum parameter is
	a factor of the SME vector length or the prevailing vector length.
	Handle the SME case.
	(function_expander::add_input_operand): Handle pmode_register_operand.
	(function_expander::add_integer_operand): Take a poly_int64.
	(init_builtins): Call handle_arm_sme_h for LTO.
	(handle_arm_sve_h): Skip SME intrinsics.
	(handle_arm_sme_h): New function.
	* config/aarch64/aarch64-sve-builtins-functions.h
	(read_write_za, write_za): New classes.
	(unspec_based_sme_function, za_arith_function): New using aliases.
	(quiet_za_arith_function): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.h
	(binary_za_int_m, binary_za_m, binary_za_uint_m, bool_inherent)
	(inherent_za, inherent_mask_za, ldr_za, load_za, read_za_m, store_za)
	(str_za, unary_za_m, write_za_m): Declare.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication):
	Expect za_m functions to have an existing governing predicate.
	(binary_za_m_base, binary_za_int_m_def, binary_za_m_def): New classes.
	(binary_za_uint_m_def, bool_inherent_def, inherent_za_def): Likewise.
	(inherent_mask_za_def, ldr_za_def, load_za_def, read_za_m_def)
	(store_za_def, str_za_def, unary_za_m_def, write_za_m_def): Likewise.
	* config/aarch64/arm_sme.h: New file.
	* config/aarch64/aarch64-sve-builtins-sme.h: Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.cc: Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
	* config/aarch64/t-aarch64 (aarch64-sve-builtins.o): Depend on
	aarch64-sve-builtins-sme.def and aarch64-sve-builtins-sme.h.
	(aarch64-sve-builtins-sme.o): New rule.

gcc/testsuite/
	* lib/target-supports.exp: Add sme and sme-i16i64 features.
	* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test __ARM_FEATURE_SME*
	macros.
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Allow functions
	to be marked as __arm_streaming, __arm_streaming_compatible, and
	__arm_inout("za").
	* g++.target/aarch64/sve/acle/general-c++/func_redef_4.c: Mark the
	function as __arm_streaming_compatible.
	* g++.target/aarch64/sve/acle/general-c++/func_redef_5.c: Likewise.
	* g++.target/aarch64/sve/acle/general-c++/func_redef_7.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/func_redef_4.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/func_redef_5.c: Likewise.
	* g++.target/aarch64/sme/aarch64-sme-acle-asm.exp: New test harness.
	* gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_int_m_1.c: New test.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_m_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_uint_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/read_za_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/unary_za_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/write_za_m_1.c: Likewise.
2023-12-05 10:11:28 +00:00
Richard Sandiford
8de9304d94 aarch64: Generalise _m rules for SVE intrinsics
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.

That rule began to break down in SVE2, and it continues to do
so in SME.  This patch therefore adds a virtual function to
specify whether the separate initial argument is present or not.
The old rule is still the default.
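The change turns a fixed rule into an overridable hook whose default is the old rule. In C terms it might look like the sketch below. `struct shape` and the function names are hypothetical stand-ins for the C++ virtual function, `nargs` is assumed to count only the data operands, and the override is an invented example rather than any particular real shape.

```c
#include <assert.h>
#include <stdbool.h>

enum pred_type { PRED_none, PRED_m, PRED_x, PRED_z };

/* Hypothetical stand-in for a function shape: a "virtual" hook that may
   be left null to get the default behaviour.  */
struct shape
{
  bool (*has_merge_argument_p) (enum pred_type pred, unsigned int nargs);
};

/* The old SVE rule: only unary _m intrinsics take a separate initial
   argument that supplies the values of inactive lanes.  */
static bool
default_has_merge_argument_p (enum pred_type pred, unsigned int nargs)
{
  return pred == PRED_m && nargs == 1;
}

static bool
shape_has_merge_argument_p (const struct shape *s, enum pred_type pred,
                            unsigned int nargs)
{
  assert (s);
  if (s->has_merge_argument_p)
    return s->has_merge_argument_p (pred, nargs);
  return default_has_merge_argument_p (pred, nargs);
}

/* Hypothetical override: a shape that always takes a separate merge
   argument when merging, however many data operands it has.  */
static bool
always_merge_argument_p (enum pred_type pred, unsigned int nargs)
{
  (void) nargs;
  return pred == PRED_m;
}
```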

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_shape::has_merge_argument_p): New member function.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::check_gp_argument): Use it.
	(function_expander::get_fallback_value): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(apply_predication): Likewise.
	(unary_convert_narrowt_def::has_merge_argument_p): New function.
2023-12-05 10:11:28 +00:00
Richard Sandiford
1ec23d5a29 aarch64: Generalise unspec_based_function_base
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.

gcc/
	* config/aarch64/aarch64-sve-builtins-functions.h
	(unspec_based_function_base): Allow type suffix 1 to determine
	the mode of the operation.
	(unspec_based_function): Update accordingly.
	(unspec_based_fused_function): Likewise.
	(unspec_based_fused_lane_function): Likewise.
2023-12-05 10:11:27 +00:00
Richard Sandiford
80fc055cf0 aarch64: Add a VNx1TI mode
Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others.  It's purely an RTL
convenience and isn't (yet) a valid storage mode.

gcc/
	* config/aarch64/aarch64-modes.def: Add VNx1TI.
2023-12-05 10:11:27 +00:00
Richard Sandiford
084122adb5 aarch64: Add a register class for w12-w15
Some SME instructions use w12-w15 to index ZA.  This patch
adds a register class for that range.
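Restricting an operand to W12_W15_REGS means the register allocator may only choose from those four registers. The toy C sketch below illustrates that constraint. It assumes that the general registers x0-x30 (w0-w30) are numbered 0-30 as in the backend; `MODEL_W12_W15_REGNUM_P` and `pick_za_index_reg` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: x0-x30 (w0-w30) are register numbers 0-30.  */
enum { R12_REGNUM = 12, R15_REGNUM = 15, NUM_GP_REGS = 31 };

/* Hypothetical stand-in for W12_W15_REGNUM_P.  */
#define MODEL_W12_W15_REGNUM_P(REGNO) \
  ((REGNO) >= R12_REGNUM && (REGNO) <= R15_REGNUM)

/* Choose a register for a ZA slice index from the w12-w15 class only.
   Returns -1 if all four are live; a real allocator would then have to
   spill or move something.  */
static int
pick_za_index_reg (const bool live[NUM_GP_REGS])
{
  for (int regno = R12_REGNUM; regno <= R15_REGNUM; regno++)
    if (!live[regno])
      {
        assert (MODEL_W12_W15_REGNUM_P (regno));
        return regno;
      }
  return -1;
}
```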

gcc/
	* config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro.
	(W12_W15_REGS): New register class.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
	* config/aarch64/aarch64.cc (aarch64_regno_regclass)
	(aarch64_class_max_nregs, aarch64_register_move_cost): Handle
	W12_W15_REGS.
2023-12-05 10:11:26 +00:00
Richard Sandiford
3af9ceb631 aarch64: Add support for SME ZA attributes
SME has an array called ZA that can be enabled and disabled separately
from streaming mode.  A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.

In C and C++, the state of PSTATE.ZA is controlled using function
attributes.  There are four attributes that can be attached to
function types to indicate that the function shares ZA with its
caller.  These are:

- arm::in("za")
- arm::out("za")
- arm::inout("za")
- arm::preserves("za")

If a function's type has one of these shared-ZA attributes,
PSTATE.ZA is specified to be 1 on entry to the function and on return
from the function.  Otherwise, the caller and callee have separate
ZA contexts; they do not use ZA to share data.
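A sketch of how the four attributes might map to sharing flags, loosely modelled on the AARCH64_STATE_SHARED, AARCH64_STATE_IN and AARCH64_STATE_OUT constants listed in the ChangeLog below. The `STATE_*` values and the helper names are assumptions made for this example. arm::new("za") is not a sharing attribute and is handled separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values for this sketch.  */
#define STATE_SHARED (1u << 0)  /* caller and callee share ZA */
#define STATE_IN     (1u << 1)  /* callee reads the caller's ZA contents */
#define STATE_OUT    (1u << 2)  /* callee may change the ZA contents */

static unsigned int
shared_state_flags (const char *attr)
{
  assert (attr);
  if (strcmp (attr, "in") == 0)
    return STATE_SHARED | STATE_IN;
  if (strcmp (attr, "out") == 0)
    return STATE_SHARED | STATE_OUT;
  if (strcmp (attr, "inout") == 0)
    return STATE_SHARED | STATE_IN | STATE_OUT;
  if (strcmp (attr, "preserves") == 0)
    return STATE_SHARED;
  return 0;  /* not a shared-ZA attribute: separate ZA contexts */
}

/* Shared-ZA functions have PSTATE.ZA == 1 on entry and on return.  */
static bool
pstate_za_on_entry (unsigned int flags)
{
  return (flags & STATE_SHARED) != 0;
}
```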

Although normal non-shared-ZA functions have a separate ZA context
from their callers, nested uses of ZA are expected to be rare.
The ABI therefore defines a cooperative lazy saving scheme that
allows saves and restores of ZA to be kept to a minimum.
(Callers still have the option of doing a full save and restore
if they prefer.)

Functions that want to use ZA internally have an arm::new("za")
attribute, which tells the compiler to enable PSTATE.ZA for
the duration of the function body.  It also tells the compiler
to commit any lazy save initiated by a caller.
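The lazy saving scheme can be simulated in plain C. This is a toy model under stated assumptions: `tpidr2` stands in for the TPIDR2_EL0 register, `za` for the ZA array, the block holds only a save-buffer pointer, and PSTATE.ZA itself is not tracked. A callee with arm::new("za") commits the pending save; the caller restores ZA only if that happened, and a callee that ignores ZA costs nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ZA_BYTES 16  /* toy size; the real ZA size depends on the SVL */

/* Hypothetical TPIDR2 block: only the save-buffer pointer is modelled.  */
struct tpidr2_block
{
  unsigned char *za_save_buffer;
};

static unsigned char za[ZA_BYTES];   /* stands in for the ZA array */
static struct tpidr2_block *tpidr2;  /* stands in for TPIDR2_EL0 */

/* A callee with arm::new("za"): commit any lazy save set up by its
   caller, then use ZA for its own purposes.  */
static void
new_za_callee (void)
{
  if (tpidr2 != NULL)
    {
      memcpy (tpidr2->za_save_buffer, za, ZA_BYTES);  /* commit the save */
      tpidr2 = NULL;
    }
  memset (za, 0xff, ZA_BYTES);  /* clobber ZA */
}

/* A normal private-ZA callee that never touches ZA: no extra work.  */
static void
plain_callee (void)
{
}

/* A caller with live ZA contents calling a private-ZA function.
   Returns true if it had to restore ZA afterwards.  */
static bool
caller_with_live_za (void (*callee) (void))
{
  unsigned char buffer[ZA_BYTES];
  struct tpidr2_block block = { buffer };
  bool restored;

  assert (tpidr2 == NULL);
  tpidr2 = &block;              /* set up the lazy save; nothing copied yet */
  callee ();
  restored = (tpidr2 == NULL);  /* null means the save was committed */
  if (restored)
    memcpy (za, buffer, ZA_BYTES);
  tpidr2 = NULL;
  return restored;
}
```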

The patch uses various abstract hard registers to track dataflow
relating to ZA.  See the comments in the patch for details.

The lazy save scheme is intended to be transparent to most normal
functions, so that they don't need to be recompiled for SME.
This is reflected in the way that most normal functions ignore
the new hard registers added in the patch.

As with arm::streaming and arm::streaming_compatible, the attributes are
also available as __arm_<attr>.  This has two advantages: it triggers an
error on compilers that don't understand the attributes, and it eases
use in C, where [[...]] attributes were only added in C23.

gcc/
	* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
	* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
	(aarch64_output_rdsvl, aarch64_optimize_mode_switching)
	(aarch64_restore_za): Declare.
	* config/aarch64/constraints.md (UsR): New constraint.
	* config/aarch64/aarch64.md (LOWERING_REGNUM, TPIDR_BLOCK_REGNUM)
	(SME_STATE_REGNUM, TPIDR2_SETUP_REGNUM, ZA_FREE_REGNUM)
	(ZA_SAVED_REGNUM, ZA_REGNUM, FIRST_FAKE_REGNUM): New constants.
	(LAST_FAKE_REGNUM): Likewise.
	(UNSPEC_SAVE_NZCV, UNSPEC_RESTORE_NZCV, UNSPEC_SME_VQ): New unspecs.
	(arches): Add sme.
	(arch_enabled): Handle it.
	(*cb<optab><mode>1): Rename to...
	(aarch64_cb<optab><mode>1): ...this.
	(*movsi_aarch64): Add an alternative for RDSVL.
	(*movdi_aarch64): Likewise.
	(aarch64_save_nzcv, aarch64_restore_nzcv): New insns.
	* config/aarch64/aarch64-sme.md (UNSPEC_SMSTOP_ZA)
	(UNSPEC_INITIAL_ZERO_ZA, UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE)
	(UNSPEC_READ_TPIDR2, UNSPEC_WRITE_TPIDR2, UNSPEC_SETUP_LOCAL_TPIDR2)
	(UNSPEC_RESTORE_ZA, UNSPEC_START_PRIVATE_ZA_CALL): New unspecs.
	(UNSPEC_END_PRIVATE_ZA_CALL, UNSPEC_COMMIT_LAZY_SAVE): Likewise.
	(UNSPECV_ASM_UPDATE_ZA): New unspecv.
	(aarch64_tpidr2_save, aarch64_smstart_za, aarch64_smstop_za)
	(aarch64_initial_zero_za, aarch64_setup_local_tpidr2)
	(aarch64_clear_tpidr2, aarch64_write_tpidr2, aarch64_read_tpidr2)
	(aarch64_tpidr2_restore, aarch64_restore_za, aarch64_asm_update_za)
	(aarch64_start_private_za_call, aarch64_end_private_za_call)
	(aarch64_commit_lazy_save): New patterns.
	* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
	(FIXED_REGISTERS, REGISTER_NAMES): Add the new fake ZA registers.
	(CALL_USED_REGISTERS): Replace with...
	(CALL_REALLY_USED_REGISTERS): ...this and add the fake ZA registers.
	(FIRST_PSEUDO_REGISTER): Bump to include the fake ZA registers.
	(FAKE_REGS): New register class.
	(REG_CLASS_NAMES): Update accordingly.
	(REG_CLASS_CONTENTS): Likewise.
	(machine_function::tpidr2_block): New member variable.
	(machine_function::tpidr2_block_ptr): Likewise.
	(machine_function::za_save_buffer): Likewise.
	(machine_function::next_asm_update_za_id): Likewise.
	(CUMULATIVE_ARGS::shared_za_flags): Likewise.
	(aarch64_mode_entity, aarch64_local_sme_state): New enums.
	(aarch64_tristate_mode): Likewise.
	(OPTIMIZE_MODE_SWITCHING, NUM_MODES_FOR_MODE_SWITCHING): Define.
	* config/aarch64/aarch64.cc (AARCH64_STATE_SHARED, AARCH64_STATE_IN)
	(AARCH64_STATE_OUT): New constants.
	(aarch64_attribute_shared_state_flags): New function.
	(aarch64_lookup_shared_state_flags, aarch64_fndecl_has_new_state)
	(aarch64_check_state_string, cmp_string_csts): Likewise.
	(aarch64_merge_string_arguments, aarch64_check_arm_new_against_type)
	(handle_arm_new, handle_arm_shared): Likewise.
	(handle_arm_new_za_attribute): New function.
	(aarch64_arm_attribute_table): Add new, preserves, in, out, and inout.
	(aarch64_hard_regno_nregs): Handle FAKE_REGS.
	(aarch64_hard_regno_mode_ok): Likewise.
	(aarch64_fntype_shared_flags, aarch64_fntype_pstate_za): New functions.
	(aarch64_fntype_isa_mode): Include aarch64_fntype_pstate_za.
	(aarch64_fndecl_has_state, aarch64_fndecl_pstate_za): New functions.
	(aarch64_fndecl_isa_mode): Include aarch64_fndecl_pstate_za.
	(aarch64_cfun_incoming_pstate_za, aarch64_cfun_shared_flags)
	(aarch64_cfun_has_new_state, aarch64_cfun_has_state): New functions.
	(aarch64_sme_vq_immediate, aarch64_sme_vq_unspec_p): Likewise.
	(aarch64_rdsvl_immediate_p, aarch64_output_rdsvl): Likewise.
	(aarch64_expand_mov_immediate): Handle RDSVL immediates.
	(aarch64_function_arg): Add the ZA sharing flags as a third limb
	of the PARALLEL.
	(aarch64_init_cumulative_args): Record the ZA sharing flags.
	(aarch64_extra_live_on_entry): New function.  Handle the new
	ZA-related fake registers.
	(aarch64_epilogue_uses): Handle the new ZA-related fake registers.
	(aarch64_cannot_force_const_mem): Handle UNSPEC_SME_VQ constants.
	(aarch64_get_tpidr2_block, aarch64_get_tpidr2_ptr): New functions.
	(aarch64_init_tpidr2_block, aarch64_restore_za): Likewise.
	(aarch64_layout_frame): Check whether the current function creates
	new ZA state.  Record that it clobbers LR if so.
	(aarch64_expand_prologue): Handle functions that create new ZA state.
	(aarch64_expand_epilogue): Likewise.
	(aarch64_create_tpidr2_block): New function.
	(aarch64_restore_za): Likewise.
	(aarch64_start_call_args): Disallow calls to shared-ZA functions
	from functions that have no ZA state.  Emit a marker instruction
	before calls to private-ZA functions from functions that have
	SME state.
	(aarch64_expand_call): Add return registers for state that is
	managed via attributes.  Record the use and clobber information
	for the ZA registers.
	(aarch64_end_call_args): New function.
	(aarch64_regno_regclass): Handle FAKE_REGS.
	(aarch64_class_max_nregs): Likewise.
	(aarch64_override_options_internal): Require TARGET_SME for
	functions that have ZA state.
	(aarch64_conditional_register_usage): Handle FAKE_REGS.
	(aarch64_mov_operand_p): Handle RDSVL immediates.
	(aarch64_comp_type_attributes): Check that the ZA sharing flags
	are equal.
	(aarch64_merge_decl_attributes): New function.
	(aarch64_optimize_mode_switching, aarch64_mode_emit_za_save_buffer)
	(aarch64_mode_emit_local_sme_state, aarch64_mode_emit): Likewise.
	(aarch64_insn_references_sme_state_p): Likewise.
	(aarch64_mode_needed_local_sme_state): Likewise.
	(aarch64_mode_needed_za_save_buffer, aarch64_mode_needed): Likewise.
	(aarch64_mode_after_local_sme_state, aarch64_mode_after): Likewise.
	(aarch64_local_sme_confluence, aarch64_mode_confluence): Likewise.
	(aarch64_one_shot_backprop, aarch64_local_sme_backprop): Likewise.
	(aarch64_mode_backprop, aarch64_mode_entry): Likewise.
	(aarch64_mode_exit, aarch64_mode_eh_handler): Likewise.
	(aarch64_mode_priority, aarch64_md_asm_adjust): Likewise.
	(TARGET_END_CALL_ARGS, TARGET_MERGE_DECL_ATTRIBUTES): Define.
	(TARGET_MODE_EMIT, TARGET_MODE_NEEDED, TARGET_MODE_AFTER): Likewise.
	(TARGET_MODE_CONFLUENCE, TARGET_MODE_BACKPROP): Likewise.
	(TARGET_MODE_ENTRY, TARGET_MODE_EXIT): Likewise.
	(TARGET_MODE_EH_HANDLER, TARGET_MODE_PRIORITY): Likewise.
	(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
	(TARGET_MD_ASM_ADJUST): Use aarch64_md_asm_adjust.
	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Define __arm_new, __arm_preserves, __arm_in, __arm_out, and __arm_inout.

gcc/testsuite/
	* gcc.target/aarch64/sme/za_state_1.c: New test.
	* gcc.target/aarch64/sme/za_state_2.c: Likewise.
	* gcc.target/aarch64/sme/za_state_3.c: Likewise.
	* gcc.target/aarch64/sme/za_state_4.c: Likewise.
	* gcc.target/aarch64/sme/za_state_5.c: Likewise.
	* gcc.target/aarch64/sme/za_state_6.c: Likewise.
	* g++.target/aarch64/sme/exceptions_1.C: Likewise.
	* gcc.target/aarch64/sme/keyword_macros_1.c: Add ZA macros.
	* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
2023-12-05 10:11:26 +00:00
Richard Sandiford
dd8090f400 aarch64: Switch PSTATE.SM around calls
This patch adds support for switching to the appropriate SME mode
for each call.  Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction.  If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.
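The per-call decision can be summarised as follows. This hypothetical C sketch (`sm_mode`, `call_switch` and `call_switch_for` are illustrative names) only decides what needs to be emitted around a call, not how or when GCC emits it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PSTATE.SM classification of a function type.  */
enum sm_mode { NON_STREAMING, STREAMING, STREAMING_COMPATIBLE };

struct call_switch
{
  bool switch_around_call;  /* switch before the call and back after it */
  bool to_streaming;        /* SMSTART SM before the call (else SMSTOP SM) */
  bool conditional;         /* guarded by a run-time test of PSTATE.SM */
};

static struct call_switch
call_switch_for (enum sm_mode caller, enum sm_mode callee)
{
  struct call_switch s = { false, false, false };

  assert (caller <= STREAMING_COMPATIBLE && callee <= STREAMING_COMPATIBLE);

  /* A streaming-compatible callee runs in whatever mode it is called in.  */
  if (callee == STREAMING_COMPATIBLE || caller == callee)
    return s;

  s.switch_around_call = true;
  s.to_streaming = (callee == STREAMING);
  /* Streaming-compatible code only switches if it is in the wrong mode.  */
  s.conditional = (caller == STREAMING_COMPATIBLE);
  return s;
}
```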

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late prologue/
epilogue insertion.  (It doesn't use md_reorg because later additions
need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.

Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance-critical,
so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.

gcc/
	* config/aarch64/aarch64-passes.def (pass_switch_pstate_sm): New pass,
	run next to pass_late_thread_prologue_and_epilogue.
	* config/aarch64/aarch64-sme.md: New file.
	* config/aarch64/aarch64.md: Include it.
	(*tb<optab><mode>1): Rename to...
	(@aarch64_tb<optab><mode>): ...this.
	(call, call_value, sibcall, sibcall_value): Don't require operand 2
	to be a CONST_INT.
	* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
	the insn.
	(make_pass_switch_pstate_sm): Declare.
	* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
	(CALL_USED_REGISTERS): Mark VG as call-preserved.
	(aarch64_frame::old_svcr_offset): New member variable.
	(machine_function::call_switches_pstate_sm): Likewise.
	(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
	(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
	* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
	(aarch64_cfun_incoming_pstate_sm): New function.
	(aarch64_call_switches_pstate_sm): Likewise.
	(aarch64_reg_save_mode): Return DImode for VG_REGNUM.
	(aarch64_callee_isa_mode): New function.
	(aarch64_insn_callee_isa_mode): Likewise.
	(aarch64_guard_switch_pstate_sm): Likewise.
	(aarch64_switch_pstate_sm): Likewise.
	(aarch64_sme_mode_switch_regs): New class.
	(aarch64_record_sme_mode_switch_args): New function.
	(aarch64_finish_sme_mode_switch_args): Likewise.
	(aarch64_function_arg): Handle the end marker by returning a
	PARALLEL that contains the ABI cookie that we used previously
	alongside the result of aarch64_finish_sme_mode_switch_args.
	(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
	(aarch64_function_arg_advance): If a call would switch SM state,
	record all argument registers that would need to be saved around
	the mode switch.
	(aarch64_need_old_pstate_sm): New function.
	(aarch64_layout_frame): Decide whether the frame needs to store the
	incoming value of PSTATE.SM and allocate a save slot for it if so.
	If a function switches SME state, arrange to save the old value
	of the DWARF VG register.  Handle the case where this is the only
	register save slot above the FP.
	(aarch64_save_callee_saves): Handle saves of the DWARF VG register.
	(aarch64_get_separate_components): Prevent such saves from being
	shrink-wrapped.
	(aarch64_old_svcr_mem): New function.
	(aarch64_read_old_svcr): Likewise.
	(aarch64_guard_switch_pstate_sm): Likewise.
	(aarch64_expand_prologue): Handle saves of the DWARF VG register.
	Initialize any SVCR save slot.
	(aarch64_expand_call): Allow the cookie to be a PARALLEL that contains
	both the UNSPEC_CALLEE_ABI value and a list of registers that need
	to be preserved across a change to PSTATE.SM.  If the call does
	involve such a change to PSTATE.SM, record the registers that
	would be clobbered by this process.  Also emit an instruction
	to mark the temporary change in VG.  Update call_switches_pstate_sm.
	(aarch64_emit_call_insn): Return the emitted instruction.
	(aarch64_frame_pointer_required): New function.
	(aarch64_conditional_register_usage): Prevent VG_REGNUM from being
	treated as a register operand.
	(aarch64_switch_pstate_sm_for_call): New function.
	(pass_data_switch_pstate_sm): New pass variable.
	(pass_switch_pstate_sm): New pass class.
	(make_pass_switch_pstate_sm): New function.
	(TARGET_FRAME_POINTER_REQUIRED): Define.
	* config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.

gcc/testsuite/
	* gcc.target/aarch64/sme/call_sm_switch_1.c: New test.
	* gcc.target/aarch64/sme/call_sm_switch_2.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_4.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_6.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_7.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_9.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_10.c: Likewise.
2023-12-05 10:11:25 +00:00
Richard Sandiford
983b436502 aarch64: Mark relevant SVE instructions as non-streaming
Following on from the previous Advanced SIMD patch, this one
divides SVE instructions into non-streaming and streaming-
compatible groups.
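One way to picture the enforcement is to treat "PSTATE.SM is known to be 0" as one more feature bit that only non-streaming functions have. The C sketch below is hypothetical: the bit values and helper names are invented, and `FL_SM_OFF` merely stands in for AARCH64_FL_SM_OFF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values.  FL_SM_OFF stands in for AARCH64_FL_SM_OFF.  */
#define FL_SVE    (1u << 0)
#define FL_SM_OFF (1u << 1)  /* "PSTATE.SM is known to be 0" */

enum fn_mode { MODE_NON_STREAMING, MODE_STREAMING, MODE_STREAMING_COMPATIBLE };

/* Only a non-streaming function is known to run with PSTATE.SM == 0;
   a streaming-compatible one might be called in either mode.  */
static unsigned int
enabled_flags (unsigned int isa_features, enum fn_mode mode)
{
  assert ((isa_features & FL_SM_OFF) == 0);
  return mode == MODE_NON_STREAMING ? isa_features | FL_SM_OFF : isa_features;
}

/* An intrinsic is usable if every flag it requires is enabled.  */
static bool
intrinsic_allowed_p (unsigned int required, unsigned int enabled)
{
  return (required & ~enabled) == 0;
}
```

With this framing, a streaming-compatible function is rejected for the same reason as a streaming one: neither can prove that PSTATE.SM is 0.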

gcc/
	* config/aarch64/aarch64.h (TARGET_NON_STREAMING): New macro.
	(TARGET_SVE2_AES, TARGET_SVE2_BITPERM): Use it.
	(TARGET_SVE2_SHA3, TARGET_SVE2_SM4): Likewise.
	* config/aarch64/aarch64-sve-builtins-base.def: Separate out
	the functions that require PSTATE.SM to be 0 and guard them
	with AARCH64_FL_SM_OFF.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
	* config/aarch64/aarch64-sve-builtins.cc (check_required_extensions):
	Enforce AARCH64_FL_SM_OFF requirements.
	* config/aarch64/aarch64-sve.md (aarch64_wrffr): Require
	TARGET_NON_STREAMING.
	(aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest): Likewise.
	(*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc)
	(@aarch64_ld<fn>f1<mode>): Likewise.
	(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>)
	(gather_load<mode><v_int_container>): Likewise.
	(mask_gather_load<mode><v_int_container>): Likewise.
	(mask_gather_load<mode><v_int_container>): Likewise.
	(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
	(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
	(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
	(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>)
	(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>): Likewise.
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked)
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>_sxtw): Likewise.
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>_uxtw): Likewise.
	(@aarch64_ldff1_gather<mode>, @aarch64_ldff1_gather<mode>): Likewise.
	(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
	(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
	(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
	<VNx4_NARROW:mode>): Likewise.
	(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
	<VNx2_NARROW:mode>): Likewise.
	(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
	<VNx2_NARROW:mode>_sxtw): Likewise.
	(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
	<VNx2_NARROW:mode>_uxtw): Likewise.
	(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>)
	(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>)
	(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw)
	(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw)
	(scatter_store<mode><v_int_container>): Likewise.
	(mask_scatter_store<mode><v_int_container>): Likewise.
	(*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked)
	(*mask_scatter_store<mode><v_int_container>_sxtw): Likewise.
	(*mask_scatter_store<mode><v_int_container>_uxtw): Likewise.
	(@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
	(@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
	(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw)
	(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw)
	(@aarch64_sve_ld1ro<mode>, @aarch64_adr<mode>): Likewise.
	(*aarch64_adr_sxtw, *aarch64_adr_uxtw_unspec): Likewise.
	(*aarch64_adr_uxtw_and, @aarch64_adr<mode>_shift): Likewise.
	(*aarch64_adr<mode>_shift, *aarch64_adr_shift_sxtw): Likewise.
	(*aarch64_adr_shift_uxtw, @aarch64_sve_add_<optab><vsi2qi>): Likewise.
	(@aarch64_sve_<sve_fp_op><mode>, fold_left_plus_<mode>): Likewise.
	(mask_fold_left_plus_<mode>, @aarch64_sve_compact<mode>): Likewise.
	* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>)
	(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
	<SVE_PARTIAL_I:mode>): Likewise.
	(@aarch64_sve2_histcnt<mode>, @aarch64_sve2_histseg<mode>): Likewise.
	(@aarch64_pred_<SVE2_MATCH:sve_int_op><mode>): Likewise.
	(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_cc): Likewise.
	(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_ptest): Likewise.
	* config/aarch64/iterators.md (SVE_FP_UNARY_INT): Make FEXPA
	depend on TARGET_NON_STREAMING.
	(SVE_BFLOAT_TERNARY_LONG): Likewise BFMMLA.
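As a rough illustration of what check_required_extensions has to enforce once AARCH64_FL_SM_OFF exists, here is a minimal sketch under stated assumptions. The FL_* names and bit values are hypothetical stand-ins for the real AARCH64_FL_* constants; the point is only that "needs PSTATE.SM == 0" is tracked as one more required feature bit, so streaming-compatible code (which has neither SM bit) cannot use such functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real AARCH64_FL_* constants differ.
   Streaming-compatible code has neither FL_SM_ON nor FL_SM_OFF set.  */
typedef uint64_t feature_flags;
#define FL_SM_ON  ((feature_flags) 1 << 0)  /* PSTATE.SM == 1.  */
#define FL_SM_OFF ((feature_flags) 1 << 1)  /* PSTATE.SM == 0.  */
#define FL_SVE    ((feature_flags) 1 << 2)
#define FL_SVE2   ((feature_flags) 1 << 3)

/* Return true if every feature in REQUIRED is present in AVAILABLE.
   Because FL_SM_OFF is treated like any other feature bit, a function
   that needs PSTATE.SM == 0 (such as svadrb) is rejected in both
   streaming and streaming-compatible code.  */
static bool
required_extensions_ok (feature_flags required, feature_flags available)
{
  /* A function cannot be both streaming and non-streaming.  */
  assert ((available & (FL_SM_ON | FL_SM_OFF)) != (FL_SM_ON | FL_SM_OFF));
  return (required & ~available) == 0;
}
```

For example, a requirement of FL_SVE | FL_SM_OFF passes for non-streaming SVE code but fails both for FL_SVE alone (streaming-compatible) and for FL_SVE | FL_SM_ON (streaming).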

gcc/testsuite/
	* g++.target/aarch64/sve/aarch64-ssve.exp: New harness.
	* g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add
	-DSTREAMING_COMPATIBLE to the list of options.
	* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
	* gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
	* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
	Fix pasto in variable name.
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Mark functions
	as streaming-compatible if STREAMING_COMPATIBLE is defined.
	* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Disable for
	streaming-compatible code.
	* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrb.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrd.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrh.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrw.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/expa_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/expa_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/expa_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfb_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfd_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfh_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfw_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/rdffr_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tmad_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tmad_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tmad_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tsmul_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tsmul_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tsmul_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tssel_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tssel_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tssel_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histseg_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histseg_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rax1_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rax1_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c: Likewise.
2023-12-05 10:11:24 +00:00
Richard Sandiford
c86ee4f683 aarch64: Distinguish streaming-compatible AdvSIMD insns
The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are.  This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.
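The relationship between the two macros and the choice of vector move can be sketched as follows. This is a hypothetical model, not GCC's data structures: struct target_state and the helper names are invented for illustration, and the fallback case stands in for the existing handling used when Advanced SIMD is unavailable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the target state; not GCC's real structures.  */
struct target_state
{
  bool advsimd;  /* Advanced SIMD (+simd) enabled.  */
  bool sve;      /* SVE enabled.  */
  bool sm_off;   /* Known to run with PSTATE.SM == 0.  */
};

/* Loosely mirrors TARGET_BASE_SIMD: the streaming-compatible subset
   (some loads, stores and moves such as UMOV).  */
static bool
target_base_simd (const struct target_state *t)
{
  return t->advsimd;
}

/* Loosely mirrors TARGET_SIMD, which now also requires PSTATE.SM == 0.  */
static bool
target_simd (const struct target_state *t)
{
  return target_base_simd (t) && t->sm_off;
}

enum vec_move_kind { MOVE_ADVSIMD, MOVE_SVE, MOVE_FALLBACK };

/* Pick a strategy for a 128-bit vector-to-vector register move.  */
static enum vec_move_kind
choose_vector_move (const struct target_state *t)
{
  if (target_simd (t))
    return MOVE_ADVSIMD;  /* e.g. "mov v0.16b, v1.16b".  */
  if (t->sve)
    return MOVE_SVE;      /* e.g. "mov z0.d, z1.d".  */
  return MOVE_FALLBACK;   /* Handled as if Advanced SIMD were disabled.  */
}
```

In streaming code with SVE enabled, TARGET_BASE_SIMD still holds but TARGET_SIMD does not, so the sketch picks the SVE move.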

I haven't found a good way of testing the SVE EXT alternative
in aarch64_simd_mov_from_<mode>high, but I'd rather provide it
than not.

gcc/
	* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
	(TARGET_SIMD): Require PSTATE.SM to be 0.
	(AARCH64_ISA_SM_OFF): New macro.
	* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
	Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
	(aarch64_print_operand): Support '%Z'.
	(aarch64_secondary_reload): Expect SVE moves to be used for
	Advanced SIMD modes if SVE is enabled and non-streaming
	Advanced SIMD isn't.
	(aarch64_register_move_cost): Likewise.
	(aarch64_simd_container_mode): Extend Advanced SIMD mode
	handling to TARGET_BASE_SIMD.
	(aarch64_expand_cpymem): Expand commentary.
	* config/aarch64/aarch64.md (arches): Add base_simd and nobase_simd.
	(arch_enabled): Handle them.
	(*mov<mode>_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
	(*movti_aarch64): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(*mov<TFD:mode>_aarch64): Likewise.
	(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
	(store_pair_dw_tftf): Likewise.
	(loadwb_pair<TX:mode>_<P:mode>): Likewise.
	(storewb_pair<TX:mode>_<P:mode>): Likewise.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
	Allow UMOV in streaming mode.
	(*aarch64_simd_mov<VQMOV:mode>): Use an SVE move instruction
	if non-streaming SIMD isn't available.
	(aarch64_store_lane0<mode>): Depend on TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_simd_mov_from_<mode>low): Likewise.  Use fmov if
	Advanced SIMD is completely disabled.
	(aarch64_simd_mov_from_<mode>high): Use SVE EXT instructions if
	non-streaming SIMD isn't available.

gcc/testsuite/
	* gcc.target/aarch64/movdf_2.c: New test.
	* gcc.target/aarch64/movdi_3.c: Likewise.
	* gcc.target/aarch64/movhf_2.c: Likewise.
	* gcc.target/aarch64/movhi_2.c: Likewise.
	* gcc.target/aarch64/movqi_2.c: Likewise.
	* gcc.target/aarch64/movsf_2.c: Likewise.
	* gcc.target/aarch64/movsi_2.c: Likewise.
	* gcc.target/aarch64/movtf_3.c: Likewise.
	* gcc.target/aarch64/movtf_4.c: Likewise.
	* gcc.target/aarch64/movti_3.c: Likewise.
	* gcc.target/aarch64/movti_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_5.c: Likewise.
	* gcc.target/aarch64/movv8qi_4.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.
2023-12-05 10:11:24 +00:00
Richard Sandiford
7e04bd1fad aarch64: Add +sme
This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code.  (arm_streaming_compatible code
does not necessarily assume the presence of SME.  It just has to
work when SME is present and streaming mode is enabled.)

gcc/
	* doc/invoke.texi: Document SME.
	* doc/sourcebuild.texi: Document aarch64_sme.
	* config/aarch64/aarch64-option-extensions.def (sme): Define.
	* config/aarch64/aarch64.h (AARCH64_ISA_SME): New macro.
	(TARGET_SME): Likewise.
	* config/aarch64/aarch64.cc (aarch64_override_options_internal):
	Ensure that SME is present when compiling streaming code.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_sme): New
	target test.
	* gcc.target/aarch64/sme/aarch64-sme.exp: Force SME to be enabled
	if it isn't by default.
	* g++.target/aarch64/sme/aarch64-sme.exp: Likewise.
	* gcc.target/aarch64/sme/streaming_mode_3.c: New test.
2023-12-05 10:11:23 +00:00
Richard Sandiford
2c9a54b423 aarch64: Add arm_streaming(_compatible) attributes
This patch adds support for recognising the SME arm::streaming
and arm::streaming_compatible attributes.  arm::streaming says that
the processor is definitely in "streaming mode" (PSTATE.SM==1),
arm::streaming_compatible says that we don't know at compile time
either way, and the default (neither attribute) says that the
processor is definitely not in streaming mode (PSTATE.SM==0).

As far as the compiler is concerned, this effectively creates three
ISA submodes: streaming mode enables things that are not available
in non-streaming mode, non-streaming mode enables things that are not
available in streaming mode, and streaming-compatible mode has to stick
to the common subset.  This means that some instructions are conditional
on PSTATE.SM==1 and some are conditional on PSTATE.SM==0.
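The three submodes can be modelled with a small sketch. All names here (sm_state, fntype_attrs, insn_usable) are hypothetical and only illustrate the rule described above: an instruction is usable when its PSTATE.SM requirement is known to hold, so streaming-compatible code is limited to the common subset.

```c
#include <assert.h>
#include <stdbool.h>

/* What the compiler knows about PSTATE.SM inside a function.  */
enum sm_state { SM_KNOWN_OFF, SM_KNOWN_ON, SM_UNKNOWN };

/* Hypothetical summary of a function type's attributes.  */
struct fntype_attrs
{
  bool streaming;             /* arm::streaming  */
  bool streaming_compatible;  /* arm::streaming_compatible  */
};

static enum sm_state
fntype_sm_state (struct fntype_attrs a)
{
  /* The two attributes are mutually exclusive.  */
  assert (!(a.streaming && a.streaming_compatible));
  if (a.streaming)
    return SM_KNOWN_ON;
  if (a.streaming_compatible)
    return SM_UNKNOWN;
  return SM_KNOWN_OFF;  /* Default: an ordinary non-streaming function.  */
}

/* Which PSTATE.SM value an instruction needs.  */
enum sm_requirement { NEEDS_ANY, NEEDS_SM_ON, NEEDS_SM_OFF };

/* An instruction is usable only if its requirement is known to hold.  */
static bool
insn_usable (enum sm_state state, enum sm_requirement req)
{
  switch (req)
    {
    case NEEDS_SM_ON:
      return state == SM_KNOWN_ON;
    case NEEDS_SM_OFF:
      return state == SM_KNOWN_OFF;
    default:
      return true;
    }
}
```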

I wondered about recording the streaming state in a new variable.
However, the set of available instructions is also influenced by
PSTATE.ZA (added later), so I think it makes sense to view this
as an instance of a more general mechanism.  Also, keeping the
PSTATE.SM state in the same flag variable as the other ISA
features makes it possible to sum up the requirements of an
ACLE function in a single value.

The patch therefore adds a new set of feature flags called "ISA modes".
Unlike the other two sets of flags (optional features and architecture-
level features), these ISA modes are not controlled directly by
command-line parameters or "target" attributes.

arm::streaming and arm::streaming_compatible are function type attributes
rather than function declaration attributes.  This means that we need
to find somewhere to copy the type information across to a function's
target options.  The patch does this in aarch64_set_current_function.

We also need to record which ISA mode a callee expects/requires
to be active on entry.  (The same mode is then active on return.)
The patch extends the current UNSPEC_CALLEE_ABI cookie to include
this information, as well as the PCS variant that it recorded
previously.
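To show what "extending the cookie" means in practice, here is a minimal sketch that packs an ISA mode and a PCS variant into one integer and unpacks them again. The field layout, widths and helper names are assumptions for illustration only; they are not the actual encoding used by aarch64_gen_callee_cookie. The enum is loosely modelled on GCC's arm_pcs variants.

```c
#include <assert.h>
#include <stdint.h>

/* Loosely modelled on GCC's arm_pcs variants; values are illustrative.  */
enum pcs_variant { PCS_AAPCS64, PCS_SIMD, PCS_SVE, PCS_TLSDESC };

/* Hypothetical layout: the low ISA_MODE_BITS bits hold the ISA mode
   expected on entry (and on return); the PCS variant sits above them.  */
#define ISA_MODE_BITS 2u
#define ISA_MODE_MASK ((1u << ISA_MODE_BITS) - 1)

static uint32_t
make_callee_cookie (uint32_t isa_mode, enum pcs_variant pcs)
{
  /* The ISA mode must fit in its field.  */
  assert ((isa_mode & ~ISA_MODE_MASK) == 0);
  return isa_mode | ((uint32_t) pcs << ISA_MODE_BITS);
}

static uint32_t
cookie_isa_mode (uint32_t cookie)
{
  return cookie & ISA_MODE_MASK;
}

static enum pcs_variant
cookie_pcs (uint32_t cookie)
{
  return (enum pcs_variant) (cookie >> ISA_MODE_BITS);
}
```

Keeping both pieces of information in one value means a call insn still carries a single operand that describes everything the callee expects.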

The attributes can also be written __arm_streaming and
__arm_streaming_compatible.  This has two advantages: it triggers
an error on compilers that don't understand the attributes, and it
eases use in C, where [[...]] attributes were only added in C23.

gcc/
	* config/aarch64/aarch64-isa-modes.def: New file.
	* config/aarch64/aarch64.h: Include it in the feature enumerations.
	(AARCH64_FL_SM_STATE, AARCH64_FL_ISA_MODES): New constants.
	(AARCH64_FL_DEFAULT_ISA_MODE): Likewise.
	(AARCH64_ISA_MODE): New macro.
	(CUMULATIVE_ARGS): Add an isa_mode field.
	* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Declare.
	(aarch64_tlsdesc_abi_id): Return an arm_pcs.
	* config/aarch64/aarch64.cc (attr_streaming_exclusions)
	(aarch64_gnu_attributes, aarch64_gnu_attribute_table)
	(aarch64_arm_attributes, aarch64_arm_attribute_table): New tables.
	(aarch64_attribute_table): Redefine to include the gnu and arm
	attributes.
	(aarch64_fntype_pstate_sm, aarch64_fntype_isa_mode): New functions.
	(aarch64_fndecl_pstate_sm, aarch64_fndecl_isa_mode): Likewise.
	(aarch64_gen_callee_cookie, aarch64_callee_abi): Likewise.
	(aarch64_insn_callee_cookie, aarch64_insn_callee_abi): Use them.
	(aarch64_function_arg, aarch64_output_mi_thunk): Likewise.
	(aarch64_init_cumulative_args): Initialize the isa_mode field.
	(aarch64_output_mi_thunk): Use aarch64_gen_callee_cookie to get
	the ABI cookie.
	(aarch64_override_options): Add the ISA mode to the feature set.
	(aarch64_temporary_target::copy_from_fndecl): Likewise.
	(aarch64_fndecl_options, aarch64_handle_attr_arch): Likewise.
	(aarch64_set_current_function): Maintain the correct ISA mode.
	(aarch64_tlsdesc_abi_id): Return an arm_pcs.
	(aarch64_comp_type_attributes): Handle arm::streaming and
	arm::streaming_compatible.
	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Define __arm_streaming and __arm_streaming_compatible.
	* config/aarch64/aarch64.md (tlsdesc_small_<mode>): Use
	aarch64_gen_callee_cookie to get the ABI cookie.
	* config/aarch64/t-aarch64 (TM_H): Add all feature-related .def files.

gcc/testsuite/
	* gcc.target/aarch64/sme/aarch64-sme.exp: New harness.
	* gcc.target/aarch64/sme/streaming_mode_1.c: New test.
	* gcc.target/aarch64/sme/streaming_mode_2.c: Likewise.
	* gcc.target/aarch64/sme/keyword_macros_1.c: Likewise.
	* g++.target/aarch64/sme/aarch64-sme.exp: New harness.
	* g++.target/aarch64/sme/streaming_mode_1.C: New test.
	* g++.target/aarch64/sme/streaming_mode_2.C: Likewise.
	* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
	* gcc.target/aarch64/auto-init-1.c: Only expect the call insn
	to contain 1 (const_int 0), not 2.
2023-12-05 10:11:23 +00:00
Richard Sandiford
1ce9dc263c aarch64: Add tuple forms of svreinterpret
SME2 adds a number of intrinsics that operate on tuples of 2 and 4
vectors.  The ACLE therefore extends the existing svreinterpret
intrinsics to handle tuples as well.

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svreinterpret_impl::fold): Punt on tuple forms.
	(svreinterpret_impl::expand): Use tuple_mode instead of vector_mode.
	* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret):
	Extend to x1234 groups.
	* config/aarch64/aarch64-sve-builtins-functions.h
	(multi_vector_function::vectors_per_tuple): If the function has
	a group suffix, get the number of vectors from there.
	* config/aarch64/aarch64-sve-builtins-shapes.h (reinterpret): Declare.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (reinterpret_def)
	(reinterpret): New function shape.
	* config/aarch64/aarch64-sve-builtins.cc (function_groups): Handle
	DEF_SVE_FUNCTION_GS.
	* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_FUNCTION_GS): New
	macro.
	(DEF_SVE_FUNCTION): Forward to DEF_SVE_FUNCTION_GS by default.
	* config/aarch64/aarch64-sve-builtins.h
	(function_instance::tuple_mode): New member function.
	(function_base::vectors_per_tuple): Take the function instance
	as argument and get the number from the group suffix.
	(function_instance::vectors_per_tuple): Update accordingly.
	* config/aarch64/iterators.md (SVE_FULLx2, SVE_FULLx3, SVE_FULLx4)
	(SVE_ALL_STRUCT): New mode iterators.
	(SVE_STRUCT): Redefine in terms of SVE_FULL*.
	* config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret<mode>)
	(*aarch64_sve_reinterpret<mode>): Extend to SVE structure modes.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_XN):
	New macro.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Add tests for
	tuple forms.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c: Likewise.
2023-12-05 10:11:22 +00:00
Richard Sandiford
5ce2e22b7e aarch64: Tweak error message for (tuple,vector) pairs
SME2 adds more intrinsics that take a tuple of vectors followed
by a single vector, with the two arguments expected to have the
same element type.  Unlike with the existing svset* intrinsics,
the size of the tuple is not fixed by the overloaded function name.

This patch adds an error message that (hopefully) copes better
with that combination.

gcc/
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::require_derived_vector_type): Add a specific
	error message for the case in which the caller wants a single
	vector whose element type matches a previous tuple argument.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/general-c/set_1.c: Tweak expected
	error message.
	* gcc.target/aarch64/sve/acle/general-c/set_3.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/set_5.c: Likewise.
2023-12-05 10:11:22 +00:00
Richard Sandiford
1f7f076ad6 aarch64: Make more use of sve_type in ACLE code
This patch makes some functions operate on sve_type, rather than just
on type suffixes.  It also allows an overload to be resolved based on
a mode and sve_type.  In this case the sve_type is used to derive the
group size as well as a type suffix.

This is needed for the SME2 intrinsics and the new tuple forms of
svreinterpret.  No functional change intended on its own.
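The idea of resolving from an sve_type, rather than from a type suffix alone, can be sketched as below. The enums are small hypothetical subsets, and num_vectors_to_group here is a simplified stand-in for the helper of that name, not its actual definition: the vector count selects the group suffix, and together with the type suffix it forms the lookup key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsets of the real type and group suffix enums.  */
enum type_suffix { TYPE_s8, TYPE_s32, TYPE_f32, NUM_TYPE_SUFFIXES };
enum group_suffix { GROUP_none, GROUP_x2, GROUP_x3, GROUP_x4,
                    NUM_GROUP_SUFFIXES };

/* An SVE type: an element type plus the number of vectors in it
   (1 for a single vector, 2-4 for a tuple).  */
struct sve_type
{
  enum type_suffix type;
  unsigned int num_vectors;
};

/* Map a vector count to a group suffix; NUM_GROUP_SUFFIXES means that
   the count has no corresponding group.  */
static enum group_suffix
num_vectors_to_group (unsigned int nvectors)
{
  switch (nvectors)
    {
    case 1: return GROUP_none;
    case 2: return GROUP_x2;
    case 3: return GROUP_x3;
    case 4: return GROUP_x4;
    default: return NUM_GROUP_SUFFIXES;
    }
}

/* Derive both halves of the lookup key from one sve_type.  */
static bool
sve_type_lookup_key (struct sve_type t, enum type_suffix *type,
                     enum group_suffix *group)
{
  assert (t.type < NUM_TYPE_SUFFIXES);
  *type = t.type;
  *group = num_vectors_to_group (t.num_vectors);
  return *group != NUM_GROUP_SUFFIXES;
}
```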

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_resolver::lookup_form): Add an overload that takes
	an sve_type rather than type and group suffixes.
	(function_resolver::resolve_to): Likewise.
	(function_resolver::infer_vector_or_tuple_type): Return an sve_type.
	(function_resolver::infer_tuple_type): Likewise.
	(function_resolver::require_matching_vector_type): Take an sve_type
	rather than a type_suffix_index.
	(function_resolver::require_derived_vector_type): Likewise.
	* config/aarch64/aarch64-sve-builtins.cc (num_vectors_to_group):
	New function.
	(function_resolver::lookup_form): Add an overload that takes
	an sve_type rather than type and group suffixes.
	(function_resolver::resolve_to): Likewise.
	(function_resolver::infer_vector_or_tuple_type): Return an sve_type.
	(function_resolver::infer_tuple_type): Likewise.
	(function_resolver::infer_vector_type): Update accordingly.
	(function_resolver::require_matching_vector_type): Take an sve_type
	rather than a type_suffix_index.
	(function_resolver::require_derived_vector_type): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (get_def::resolve)
	(set_def::resolve, store_def::resolve, tbl_tuple_def::resolve): Update
	calls accordingly.
2023-12-05 10:11:21 +00:00
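The new `num_vectors_to_group` helper in 1f7f076ad6 derives a group suffix from an `sve_type`'s vector count. The sketch below shows what such a mapping plausibly looks like. The enumerators are hypothetical stand-ins for GCC's `group_suffix_index`. Returning a sentinel for unsupported counts is an assumption; the real function may treat that case as unreachable.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's group_suffix_index; the real enum is
// generated from aarch64-sve-builtins.def.
enum group_suffix_index { GROUP_none, GROUP_x2, GROUP_x3, GROUP_x4,
                          NUM_GROUP_SUFFIXES };

// Map a vector count to the group suffix that names tuples of that size.
// Single vectors have no group suffix; unsupported counts give a sentinel.
group_suffix_index num_vectors_to_group(unsigned int nvectors) {
  switch (nvectors) {
  case 1: return GROUP_none;
  case 2: return GROUP_x2;
  case 3: return GROUP_x3;
  case 4: return GROUP_x4;
  default: return NUM_GROUP_SUFFIXES;
  }
}
```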
Richard Sandiford
1b52d4b66e aarch64: Replace vague "previous arguments" message
If an SVE ACLE intrinsic requires two arguments to have the
same type, the C resolver would report mismatches as "argument N
has type T2, but previous arguments had type T1".  This patch makes
the message say which argument had type T1.

This is needed to give decent error messages for some SME cases.

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_resolver::require_matching_vector_type): Add a parameter
	that specifies the number of the earlier argument that is being
	matched against.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::require_matching_vector_type): Likewise.
	(function_resolver::require_derived_vector_type): Update calls accordingly.
	(function_resolver::resolve_unary): Likewise.
	(function_resolver::resolve_uniform): Likewise.
	(function_resolver::resolve_uniform_opt_n): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(binary_long_lane_def::resolve): Likewise.
	(clast_def::resolve, ternary_uint_def::resolve): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/general-c/*: Replace "but previous
	arguments had" with "but argument N had".
2023-12-05 10:11:21 +00:00
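The diagnostic change in 1b52d4b66e comes down to passing one more argument number into the message. The sketch below contrasts the two forms. It uses the wording paraphrased in the commit message, not GCC's exact diagnostic text, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Old style (paraphrased): does not say which earlier argument had type T1.
std::string old_mismatch_message(unsigned argno, const std::string &actual,
                                 const std::string &expected) {
  return "argument " + std::to_string(argno) + " has type " + actual +
         ", but previous arguments had type " + expected;
}

// New style: also names the earlier argument that was matched against.
std::string new_mismatch_message(unsigned argno, const std::string &actual,
                                 unsigned first_argno,
                                 const std::string &expected) {
  return "argument " + std::to_string(argno) + " has type " + actual +
         ", but argument " + std::to_string(first_argno) + " had type " +
         expected;
}
```

For example, `new_mismatch_message(3, "svuint8_t", 1, "svint8_t")` names argument 1 explicitly. The old form would only say "previous arguments".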
Richard Sandiford
bb01ef94ff aarch64: Generalise some SVE ACLE error messages
The current SVE ACLE function-resolution diagnostics assume
that a function has a fixed choice between vectors or tuples
of vectors.  If an argument was not an SVE type at all, the
error message said the function "expects an SVE vector type"
or "expects an SVE tuple type".

This patch generalises the error to cope with cases where
an argument can be either a vector or a tuple.  It also splits
out the diagnostics for mismatched tuple sizes, so that they
can be reused by later patches.

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_resolver::infer_sve_type): New member function.
	(function_resolver::report_incorrect_num_vectors): Likewise.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::infer_sve_type): New function.
	(function_resolver::report_incorrect_num_vectors): New function,
	split out from...
	(function_resolver::infer_vector_or_tuple_type): ...here.  Use
	infer_sve_type.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/general-c/*: Update expected error
	messages.
2023-12-05 10:11:20 +00:00
Richard Sandiford
7f6de9861e aarch64: Add sve_type to SVE builtins code
Until now, the SVE ACLE code had mostly been able to represent
individual SVE arguments with just an element type suffix (s32, u32,
etc.).  However, the SME2 ACLE provides many overloaded intrinsics
that operate on tuples rather than single vectors.  This patch
therefore adds a new type (sve_type) that combines an element
type suffix with a vector count.  This is enough to uniquely
represent all SVE ACLE types.

gcc/
	* config/aarch64/aarch64-sve-builtins.h (sve_type): New struct.
	(sve_type::operator==): New function.
	(function_resolver::get_vector_type): Delete.
	(function_resolver::report_no_such_form): Take an sve_type rather
	than a type_suffix_index.
	* config/aarch64/aarch64-sve-builtins.cc (get_vector_type): New
	function.
	(function_resolver::get_vector_type): Delete.
	(function_resolver::report_no_such_form): Take an sve_type rather
	than a type_suffix_index.
	(find_sve_type): New function, split out from...
	(function_resolver::infer_vector_or_tuple_type): ...here.
2023-12-05 10:11:20 +00:00
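The `sve_type` struct added in 7f6de9861e pairs an element type suffix with a vector count. The sketch below is a simplified model of that idea, not GCC's definition. The `type_suffix_index` enumerators are a hypothetical subset. Using a default-constructed value to mean "no valid type" is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical subset of GCC's type_suffix_index.
enum type_suffix_index { TYPE_SUFFIX_s32, TYPE_SUFFIX_u32, TYPE_SUFFIX_f32,
                         NUM_TYPE_SUFFIXES };

// Simplified sketch: element type suffix plus vector count, where
// num_vectors == 1 is a single vector and 2..4 are x2..x4 tuples.
struct sve_type {
  sve_type() = default;
  sve_type(type_suffix_index t, unsigned int n = 1) : type(t), num_vectors(n) {}

  // A default-constructed sve_type represents "no valid type".
  explicit operator bool() const { return type != NUM_TYPE_SUFFIXES; }

  // Two types are equal only if both the suffix and the count match.
  bool operator==(const sve_type &other) const {
    return type == other.type && num_vectors == other.num_vectors;
  }
  bool operator!=(const sve_type &other) const { return !(*this == other); }

  type_suffix_index type = NUM_TYPE_SUFFIXES;
  unsigned int num_vectors = 0;
};
```

Under this model, `sve_type (TYPE_SUFFIX_s32)` (an svint32_t vector) and `sve_type (TYPE_SUFFIX_s32, 2)` (an svint32x2_t tuple) compare unequal. A suffix alone cannot express that difference.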
Richard Sandiford
7b607f1979 aarch64: Add group suffixes to SVE intrinsics
The SME2 ACLE adds a new "group" suffix component to the naming
convention for SVE intrinsics.  This is also used in the new tuple
forms of the svreinterpret intrinsics.

This patch adds support for group suffixes and defines the
x2, x3 and x4 suffixes that are needed for the svreinterprets.

gcc/
	* config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Take
	a group suffix index parameter.
	(build_32_64, build_all): Update accordingly.  Iterate over all
	group suffixes.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svqrshl_impl::fold)
	(svqshl_impl::fold, svrshl_impl::fold): Update function_instance
	constructors.
	* config/aarch64/aarch64-sve-builtins.cc (group_suffixes): New array.
	(groups_none): New constant.
	(function_groups): Initialize the groups field.
	(function_instance::hash): Hash the group index.
	(function_builder::get_name): Add the group suffix.
	(function_builder::add_overloaded_functions): Iterate over all
	group suffixes.
	(function_resolver::lookup_form): Take a group suffix parameter.
	(function_resolver::resolve_to): Likewise.
	* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_GROUP_SUFFIX): New
	macro.
	(x2, x3, x4): New group suffixes.
	* config/aarch64/aarch64-sve-builtins.h (group_suffix_index): New enum.
	(group_suffix_info): New structure.
	(function_group_info::groups): New member variable.
	(function_instance::group_suffix_id): Likewise.
	(group_suffixes): New array.
	(function_instance::operator==): Compare the group suffixes.
	(function_instance::group_suffix): New function.
2023-12-05 10:11:19 +00:00
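With group suffixes in place (7b607f1979), `function_builder::get_name` also appends the group suffix when building a name. The sketch below illustrates that composition. It assumes the group suffix follows the type suffixes, as in a tuple reinterpret name like `svreinterpret_s8_u8_x2`. The `build_name` helper is hypothetical and ignores mode and predication suffixes and overloaded forms, which the real code handles.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: base name, then type suffixes, then the optional
// group suffix, joined with underscores. Empty components are skipped
// (e.g. single-vector forms have no group suffix).
std::string build_name(const std::string &base,
                       const std::vector<std::string> &type_suffixes,
                       const std::string &group_suffix) {
  std::string name = base;
  for (const std::string &suffix : type_suffixes)
    if (!suffix.empty())
      name += "_" + suffix;
  if (!group_suffix.empty())
    name += "_" + group_suffix;
  return name;
}
```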
Richard Sandiford
dd7aaef62a aarch64: Make AARCH64_FL_SVE requirements explicit
So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code.  It's therefore necessary to make
the implicit SVE requirement explicit.

gcc/
	* config/aarch64/aarch64-sve-builtins.cc (function_groups): Remove
	implied requirement on SVE.
	* config/aarch64/aarch64-sve-builtins-base.def: Explicitly require SVE.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
2023-12-05 10:11:19 +00:00
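The effect of dd7aaef62a can be pictured as a change in how a function group's required feature flags are checked. This is a sketch under stated assumptions. `FL_SVE` and `FL_SME` are hypothetical bits standing in for `AARCH64_FL_SVE` and related flags. The two helpers are illustrative, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits standing in for AARCH64_FL_SVE and friends.
using feature_flags = std::uint64_t;
constexpr feature_flags FL_SVE = feature_flags(1) << 0;
constexpr feature_flags FL_SME = feature_flags(1) << 1;

// Before: SVE was implied for every group on top of its listed flags.
bool available_implicit(feature_flags group_required, feature_flags enabled) {
  feature_flags required = group_required | FL_SVE;
  return (enabled & required) == required;
}

// After: each group lists all of its requirements, so a group with no
// requirements (like the arm_sme.h intrinsics callable by any code) is
// available even when SVE is disabled.
bool available_explicit(feature_flags group_required, feature_flags enabled) {
  return (enabled & group_required) == group_required;
}
```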