There is code duplication between riscv_set_arch_by_subset_list() and
riscv_parse_arch_string(): the latter parses an ISA string into a
subset_list and then does the same as the former.
riscv_parse_arch_string() is used to process command line options and
riscv_set_arch_by_subset_list() processes target attributes,
so both functions should behave identically.
Let's deduplicate the code to enforce this.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_set_arch_by_subset_list):
Fix overlong line.
(riscv_parse_arch_string): Replace duplicated code by a call to
riscv_set_arch_by_subset_list.
(cherry picked from commit 85fa334fbcaa8e4b98ab197a8c9410dde87f0ae3)
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
There are two test cases with the following skip directive:
dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" }
This reads as: skip if both '-flto' and '-fno-fat-lto-objects'
are present. This is not the case if only '-flto' is present.
Since both tests depend on instruction sequences (one does
check-function-bodies, the other tests for an assembler error
message), they won't work reliably with fat LTO objects.
Let's change the skip line to gate the test on '-flto'
to avoid failing tests like this:
FAIL: gcc.target/riscv/interrupt-misaligned.c -O2 -flto check-function-bodies interrupt
FAIL: gcc.target/riscv/interrupt-misaligned.c -O2 -flto -flto-partition=none check-function-bodies interrupt
FAIL: gcc.target/riscv/pr93202.c -O2 -flto (test for errors, line 10)
FAIL: gcc.target/riscv/pr93202.c -O2 -flto (test for errors, line 9)
FAIL: gcc.target/riscv/pr93202.c -O2 -flto -flto-partition=none (test for errors, line 10)
FAIL: gcc.target/riscv/pr93202.c -O2 -flto -flto-partition=none (test for errors, line 9)
gcc/testsuite/ChangeLog:
* gcc.target/riscv/interrupt-misaligned.c: Remove
"-fno-fat-lto-objects" from skip condition.
* gcc.target/riscv/pr93202.c: Likewise.
(cherry picked from commit 0717d50fc4ff983b79093bdef43b04e4584cc3cd)
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target. The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:
- ix86_recompute_optlev_based_flags computes and sets a
-f[no-]omit-frame-pointer default depending on
USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size
- ix86_option_override_internal enables flag_omit_frame_pointer for
-momit-leaf-frame-pointer to take effect
ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer. It is called during global process_options.
But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes. If they
differ, the testcase fails.
In order to fix this, we need to process all overriders of this option
whenever we process any of them. Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.
for gcc/ChangeLog
PR target/113719
* config/i386/i386-options.cc (ix86_option_override_internal):
Move flag_omit_frame_pointer final overrider...
(ix86_recompute_optlev_based_flags): ... here.
(cherry picked from commit bf8e80f9d164f8778d86a3dc50e501cf19a9eff1)
The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
toolchains configured to --enable-frame-pointer, because the
optimization node created within handle_optimize_attribute had
flag_omit_frame_pointer incorrectly set, whereas
default_optimization_node didn't. With this difference,
can_inline_edge_by_limits_p flagged an optimization mismatch and we
refused to inline the function that had a redundant optimization flag
into one that didn't, which is exactly what is tested for there.
This patch restores the calls to ix86_default_align and
ix86_recompute_optlev_based_flags that used to be, and ought to be,
issued during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the
intent of the original change, of having those functions called at
different spots within ix86_option_override_internal. To that end,
the remaining bits were refactored into a separate function, that was
in turn adjusted to operate on explicitly-passed opts and opts_set,
rather than going for their global counterparts.
for gcc/ChangeLog
PR target/113719
* config/i386/i386-options.cc
(ix86_override_options_after_change_1): Add opts and opts_set
parms, operate on them, after factoring out of...
(ix86_override_options_after_change): ... this. Restore calls
of ix86_default_align and ix86_recompute_optlev_based_flags.
(ix86_option_override_internal): Call the factored-out bits.
(cherry picked from commit bf2fc0a27b35de039c3d45e6d7ea9ad0a8a305ba)
According to Intel® 64 and IA-32 Architectures Optimization Reference
Manual[1], Branch Hint is updated for Redwood Cove.
--------cut from [1]-------------------------
Starting with the Redwood Cove microarchitecture, if the predictor has
no stored information about a branch, the branch has the Intel® SSE2
branch taken hint (i.e., instruction prefix 3EH), When the codec
decodes the branch, it flips the branch’s prediction from not-taken to
taken. It then flushes the pipeline in front of it and steers this
pipeline to fetch the taken path of the branch.
--------cut end -----------------------------
Split tune branch_prediction_hints into branch_prediction_hints_taken
and branch_prediction_hints_not_taken, and always generate branch hints
for conditional branches. Both tunes are disabled by default.
[1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
gcc/
* config/i386/i386.cc (ix86_print_operand): Always generate
branch hint for conditional branches.
* config/i386/i386.h (TARGET_BRANCH_PREDICTION_HINTS): Split
into ..
(TARGET_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
(TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
* config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS):
Split into ..
(X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
(X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
(cherry picked from commit a910c30c7c27cd0f6d2d2694544a09fb11d611b9)
gcc/fortran/ChangeLog:
PR fortran/93635
* symbol.cc (conflict_std): Helper function for reporting attribute
conflicts depending on the Fortran standard version.
(conf_std): Helper macro for checking standard-dependent conflicts.
(gfc_check_conflict): Use it.
gcc/testsuite/ChangeLog:
PR fortran/93635
* gfortran.dg/c-interop/c1255-2.f90: Adjust pattern.
* gfortran.dg/pr87907.f90: Likewise.
* gfortran.dg/pr93635.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
(cherry picked from commit 9561cf550a66a89e7c8d31202a03c4fddf82a3f2)
This prevents a premature release of memory with procedure symbols from
submodules, causing random compiler crashes.
The problem is a fragile detection of cyclic references, which can match
with procedures host-associated from a module in submodules, in cases where it
shouldn't. The formal namespace is released, and with it the dummy arguments
symbols of the procedure. But there is no cyclic reference, so the procedure
symbol itself is not released and remains, with pointers to its dummy arguments
now dangling.
The fix adds a condition to avoid that case and, while at it, factors the
check out into a new predicate. Part of the original condition is also
removed, for lack of a reason to keep it.
PR fortran/99798
gcc/fortran/ChangeLog:
* symbol.cc (gfc_release_symbol): Move the condition guarding
the handling of cyclic references...
(cyclic_reference_break_needed): ... here as a new predicate.
Remove superfluous parts. Add a condition preventing any premature
release with submodule symbols.
gcc/testsuite/ChangeLog:
* gfortran.dg/submodule_33.f08: New test.
(cherry picked from commit 38d1761c0c94b77a081ccc180d6e039f7a670468)
Add the preliminary code that the generated expression for MASK may depend
on when generating the inline code to evaluate MINLOC or MAXLOC with a
scalar MASK.
The generated code was only keeping the generated expression but not the
preliminary code, which was sufficient for simple cases such as data
references or simple (scalar) function calls, but was bogus with more
complicated ones.
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Add the
preliminary code generated for MASK to the preliminary code of
MINLOC/MAXLOC.
gcc/testsuite/ChangeLog:
* gfortran.dg/minmaxloc_17.f90: New test.
(cherry picked from commit d211100903d4d532d989451243ea00d7fa2e9d5e)
Although for instructions MVI and MVIY it does not make a difference
whether the immediate is interpreted as signed or unsigned, GAS expects
unsigned immediates for instruction format SI_URD.
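As a minimal sketch of why the two interpretations describe the same byte, the snippet below converts a signed 8-bit value to the unsigned form an assembler would accept. The helper name as_unsigned_byte is hypothetical and is not taken from the s390 back end.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reinterpret a signed 8-bit immediate as the
// unsigned value in the range 0..255 with the same bit pattern.
constexpr unsigned as_unsigned_byte(std::int8_t value) {
  return static_cast<std::uint8_t>(value);
}

// The bit pattern is unchanged; only the printed number differs.
static_assert(as_unsigned_byte(-1) == 255);
static_assert(as_unsigned_byte(-128) == 128);
static_assert(as_unsigned_byte(5) == 5);
```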
gcc/ChangeLog:
* config/s390/vector.md (mov<mode>): Fix output template for
movv1qi.
(cherry picked from commit e6680d3f392f7f7cc2a1515276213e21e9eeab1c)
When we rebased the PSTL on upstream, in r14-2109-g3162ca09dbdc2e, a
change to how _PSTL_USAGE_WARNINGS is set was missed out, but the change
to how it's tested was included. This means that the macro is always
defined, so testing it with #ifdef (instead of using #if to test its
value) doesn't work as intended.
Revert the test to use #if again, since that part of the upstream change
was unnecessary in the first place (the macro is always defined, so
there's no need to use #ifdef to avoid -Wundef warnings).
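The difference between the two tests can be seen in a minimal sketch. DEMO_USAGE_WARNINGS is a hypothetical stand-in for _PSTL_USAGE_WARNINGS, assumed here to be always defined and set to 0.

```cpp
#include <cassert>

// Hypothetical stand-in for _PSTL_USAGE_WARNINGS: always defined, with
// the value 0 meaning "disabled".
#define DEMO_USAGE_WARNINGS 0

// #ifdef only asks whether the macro exists, so this branch is taken
// even though the value is 0.
#ifdef DEMO_USAGE_WARNINGS
constexpr bool enabled_per_ifdef = true;
#else
constexpr bool enabled_per_ifdef = false;
#endif

// #if evaluates the value, so a macro defined to 0 counts as disabled.
#if DEMO_USAGE_WARNINGS
constexpr bool enabled_per_if = true;
#else
constexpr bool enabled_per_if = false;
#endif

static_assert(enabled_per_ifdef, "#ifdef sees a defined macro");
static_assert(!enabled_per_if, "#if sees the value 0");
```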
libstdc++-v3/ChangeLog:
PR libstdc++/113376
* include/pstl/pstl_config.h: Use #if instead of #ifdef to test
the _PSTL_USAGE_WARNINGS macro.
(cherry picked from commit 99a1fe6c12c733fe4923a75a79d09a66ff8abcec)
Due to PR c++/85723 the std::is_trivial trait is true for types with a
deleted default constructor, so the use of std::is_trivial in
std::to_array is not sufficient to ensure the type can be trivially
default constructed then filled using memcpy.
I also forgot that a type with a deleted assignment operator can still
be trivial, so we also need to check that it's assignable because the
is_constant_evaluated() path can't use memcpy.
Replace the uses of std::is_trivial with std::is_trivially_copyable
(needed for memcpy), std::is_trivially_default_constructible (needed so
that the default construction is valid and does no work) and
std::is_copy_assignable (needed for the constant evaluation case).
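A minimal sketch of the combined check follows. The variable template can_fill_with_memcpy and the three example types are hypothetical and only illustrate how the traits named above interact; this is not the code in include/std/array.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper combining the three traits named above.
template <typename T>
constexpr bool can_fill_with_memcpy =
    std::is_trivially_copyable_v<T>
    && std::is_trivially_default_constructible_v<T>
    && std::is_copy_assignable_v<T>;

// Satisfies all three traits.
struct Plain { int value; };

// Trivially copyable, but cannot be default constructed.
struct NoDefault {
  NoDefault() = delete;
  int value;
};

// Copy assignment is deleted, so element-wise assignment is impossible.
struct NoAssign {
  NoAssign& operator=(const NoAssign&) = delete;
  int value;
};

static_assert(can_fill_with_memcpy<Plain>);
static_assert(!can_fill_with_memcpy<NoDefault>);
static_assert(!can_fill_with_memcpy<NoAssign>);
```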
libstdc++-v3/ChangeLog:
PR libstdc++/115522
* include/std/array (to_array): Workaround the fact that
std::is_trivial is not sufficient to check that a type is
trivially default constructible and assignable.
* testsuite/23_containers/array/creation/115522.cc: New test.
(cherry picked from commit 510ce5eed69ee1bea9c2c696fe3b2301e16d1486)
PR target/115840.
In riscv_preferred_else_value, we create an uninitialized tmp var
for the else value, instead of 0 (as default_preferred_else_value does)
or the pre-existing VAR (as aarch64 does), so that we can use the
agnostic policy.
The problem is that `warn_uninit` will emit a warning:
'({anonymous})' may be used uninitialized
Let's mark this tmp var as NO_WARNING.
This problem was found when I tried to build glibc with the V extension.
gcc
PR target/115840
* config/riscv/riscv.cc (riscv_preferred_else_value): Mark
tmp_var as NO_WARNING.
gcc/testsuite
* gcc.dg/vect/pr115840.c: New testcase.
(cherry picked from commit c6f38e5e6d900b8ed6a4f5c126d3197946cad4dd)
2024-05-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/103312
* dependency.cc (gfc_dep_compare_expr): Handle component call
expressions. Return -2 as default and return 0 if compared with
a function expression that is from an interface body and has
the same name.
* expr.cc (gfc_reduce_init_expr): If the expression is a comp
call do not attempt to reduce, defer to resolution and return
false.
* trans-types.cc (gfc_get_dtype_rank_type,
gfc_get_nodesc_array_type): Fix whitespace.
gcc/testsuite/
PR fortran/103312
* gfortran.dg/pr103312.f90: New test.
(cherry picked from commit 2ce90517ed75c4af9fc0616f2670cf6dfcfa8a91)
This patch fixes the backend pattern that was printing the wrong input
scalar register pair when inserting into lane 1.
Add a new test that forces float-abi=hard so we can use scan-assembler to
check for correct codegen.
gcc/ChangeLog:
PR target/115611
* config/arm/mve.md (mve_vec_setv2di_internal): Fix printing of input
scalar register pair when lane = 1.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/intrinsics/vsetq_lane_su64.c: New test.
(cherry picked from commit 7c11fdd2cc11a7058e9643b6abf27831970ad2c9)
When duplicate_decls finds a match with an existing imported
declaration, it clears DECL_LANG_SPECIFIC of the olddecl and replaces it
with the contents of newdecl; this clears DECL_MODULE_ENTITY_P causing
an ICE if the same declaration is imported again later.
This fixes the issue by ensuring that the flag is transferred to newdecl
before clearing so that it ends up on olddecl again.
For future-proofing we also do the same with DECL_MODULE_KEYED_DECLS_P,
though because we don't yet support textual redefinition merging we
can't yet test this works as intended. I don't expect it's possible for
a new declaration already to have extra keyed decls mismatching that of
the old declaration though, so I don't do anything with 'keyed_map' at
this time.
PR c++/99241
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Merge module entity information.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr99241_a.H: New test.
* g++.dg/modules/pr99241_b.H: New test.
* g++.dg/modules/pr99241_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
(cherry picked from commit f04f9714fca40315360af109b9e5ca2305fd75db)
Since r13-1006-g2005b9b888eeac, the test case copysign_softfloat_1.c
no longer contains any lsr instruction, so drop the check as per
comment 9 in PR105090.
gcc/testsuite/ChangeLog:
PR target/105090
* gcc.target/arm/copysign_softfloat_1.c: Drop check for lsr
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 4865a92b35054fdfaa1318a4c1f56d95d44012a2)
emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable. However,
the code variable may change before the site where scode is used,
resulting in an invalid, stale scode value.
Move the calculation of scode to just before its only usage site to
avoid a stale scode value.
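The ordering problem can be sketched in isolation. The Cond enum, swap_condition and canonicalize below are hypothetical stand-ins, not GCC's rtx codes or helpers; canonicalize represents any step that may rewrite the code between the two points.

```cpp
#include <cassert>

// Hypothetical condition codes, not GCC's rtx_code.
enum class Cond { LT, GT, LE, GE };

// Condition that holds once the two operands are exchanged.
constexpr Cond swap_condition(Cond c) {
  switch (c) {
    case Cond::LT: return Cond::GT;
    case Cond::GT: return Cond::LT;
    case Cond::LE: return Cond::GE;
    case Cond::GE: return Cond::LE;
  }
  return c;
}

// Hypothetical rewrite step that may change the code in between.
constexpr Cond canonicalize(Cond c) {
  if (c == Cond::LE) return Cond::LT;
  if (c == Cond::GE) return Cond::GT;
  return c;
}

// Problematic order: the swapped code is derived before the rewrite,
// so it no longer corresponds to the final value of `code`.
constexpr Cond swapped_early(Cond code) {
  Cond scode = swap_condition(code);
  code = canonicalize(code);
  (void) code;
  return scode;
}

// Fixed order: derive the swapped code right where it is needed.
constexpr Cond swapped_late(Cond code) {
  code = canonicalize(code);
  return swap_condition(code);
}

static_assert(swapped_early(Cond::LE) == Cond::GE);  // stale
static_assert(swapped_late(Cond::LE) == Cond::GT);   // matches final code
```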
PR middle-end/115836
gcc/ChangeLog:
* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.
(cherry picked from commit 44933fdeb338e00c972e42224b9a83d3f8f6a757)
Root cause:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864
The commit above tries, in targetm.gen_epilogue (), to detect whether
there is a li a0,0 insn at the end of the insn chain; if so, cm.popret
is replaced by cm.popretz and the li a0,0 insn is deleted.
The generated epilogue sequence is not inserted
into the insn chain at this point.
If shrink-wrap later decides NOT to insert the epilogue sequence at the
end of the insn chain, the li a0,0 insn has already been mistakenly
removed.
Fix this issue by removing generation of cm.popretz in epilogue,
leaving the assignment to a0 and use insn with cm.popret.
That's likely going to result in some kind of code size regression,
but not a correctness regression.
Optimization can be done in future.
Signed-off-by: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:
PR target/113715
* config/riscv/riscv.cc (riscv_zcmp_can_use_popretz): Removed.
(riscv_gen_multi_pop_insn): Remove generation of cm.popretz.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv32e_zcmp.c: Adapt TC.
* gcc.target/riscv/rv32i_zcmp.c: Likewise.
The definition of the _Atomic(T) macro needs to refer to ::std::atomic,
not some other std::atomic relative to the current namespace.
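The lookup difference can be reproduced with a small sketch. The two macros and the namespace other are hypothetical and exist only to contrast qualified and unqualified lookup; they are not the definitions used in stdatomic.h.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical macros contrasting the two spellings.
#define DEMO_ATOMIC_UNQUALIFIED(T) std::atomic<T>
#define DEMO_ATOMIC_QUALIFIED(T) ::std::atomic<T>

namespace other {
// A nested namespace that happens to be called std.
namespace std {
template <typename T> struct atomic {};
}  // namespace std

// Here "std" is found as other::std, so this is other::std::atomic<int>.
using unqualified = DEMO_ATOMIC_UNQUALIFIED(int);

// The leading "::" forces lookup to start at the global namespace.
using qualified = DEMO_ATOMIC_QUALIFIED(int);
}  // namespace other

static_assert(!::std::is_same_v<other::unqualified, ::std::atomic<int>>);
static_assert(::std::is_same_v<other::qualified, ::std::atomic<int>>);
```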
libstdc++-v3/ChangeLog:
PR libstdc++/115807
* include/c_compatibility/stdatomic.h (_Atomic): Ensure it
refers to std::atomic in the global namespace.
* testsuite/29_atomics/headers/stdatomic.h/115807.cc: New test.
(cherry picked from commit 40d234dd6439e8c8cfbf3f375a61906aed35c80d)
When the library is configured with --disable-libstdcxx-verbose the
assertions just abort instead of calling __glibcxx_assert_fail, and so I
didn't export that function for the non-verbose build. However, that
option is documented to not change the library ABI, so we still need to
export the symbol from the library. It could be needed by programs
compiled against the headers from a verbose build.
The non-verbose definition can just call abort so that it doesn't pull
in I/O symbols, which are unwanted in a non-verbose build.
libstdc++-v3/ChangeLog:
PR libstdc++/115585
* src/c++11/assert_fail.cc (__glibcxx_assert_fail): Add
definition for non-verbose builds.
(cherry picked from commit 52370c839edd04df86d3ff2b71fcdca0c7376a7f)
This change removes code that erroneously switches the operands in
big-endian mode.
This also fixes the related test.
gcc/ChangeLog:
PR target/114890
* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.
gcc/testsuite/ChangeLog:
PR target/114890
* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.
(cherry picked from commit 11049cdf204bc96bc407e5dd44ed3b8a492f405a)
The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.
gcc:
PR target/115153
* config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before
NEON.
(thumb2_legitimate_index_p): Update comments.
(output_move_neon): Use DFmode for vldr/vstr and non-checking
adjust_address.
gcc/testsuite:
PR target/115153
* gcc.target/arm/pr115153.c: Add new test.
* lib/target-supports.exp: Add arm_arch_v7ve_neon target support.
(cherry picked from commit 44e5ecfd261afe72aa04eba4bf1a9ec782579cab)
The AVX10 documentation specifies an ecx value of 0 for the AVX10 version
and vector size under subleaf 0x24. Although the bits for ecx=1 are all
reserved for now, we still need to set ecx to 0 to avoid a dirty
value in ecx.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features): Correct
AVX10 CPUID emulation to specify ecx value.
According to the ISA, the zvfhmin sub extension should only contain
conversion insns. Thus, the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.
This patch fixes that by splitting the pred_broadcast define_insn
into a zvfhmin part and a zvfh part. Given the example below:
void test (_Float16 *dest, _Float16 bias) {
dest[0] = bias;
dest[1] = bias;
}
when compiled with -march=rv64gcv_zfh_zvfhmin
Before this patch:
test:
vsetivli zero,2,e16,mf4,ta,ma
vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
vse16.v v1,0(a0)
ret
After this patch:
test:
addi sp,sp,-16
fsh fa0,14(sp)
addi a5,sp,14
vsetivli zero,2,e16,mf4,ta,ma
vlse16.v v1,0(a5),zero
vse16.v v1,0(a0)
addi sp,sp,16
jr ra
PR target/115763
gcc/ChangeLog:
* config/riscv/vector.md (*pred_broadcast<mode>): Split into
zvfh and zvfhmin part.
(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit de9254e224eb3d89303cb9b3ba50b4c479c55f7c)
The following fixes an ICE with a .COND_ADD discovered as reduction
even though its else value isn't the reduction chain link but a
constant. This would be wrong-code with --disable-checking I think.
PR tree-optimization/115723
* tree-vect-loop.cc (check_reduction_path): For a .COND_ADD
verify the else value also refers to the reduction chain op.
* gcc.dg/vect/pr115723.c: New testcase.
(cherry picked from commit 286cda3461d6f5ce7d911d3f26bd4975ea7ea11d)
The following adds a missed check when forwprop attempts to rewrite
a complex store.
PR tree-optimization/115694
* tree-ssa-forwprop.cc (pass_forwprop::execute): Check the
store is complex before rewriting it.
* g++.dg/torture/pr115694.C: New testcase.
(cherry picked from commit 543a5b9da964f821b9e723ed9c93d6cdca464d47)
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.
Once we are SLP-only we can move and update this meta-data.
PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.
* gcc.dg/vect/pr115669.c: New testcase.
(cherry picked from commit 7886830bb45c4f5dca0496d4deae9a45204d78f5)
The following makes analysis and transform agree on constraints.
PR tree-optimization/115646
* tree-call-cdce.cc (check_pow): Check for bit_sz values
as allowed by transform.
* gcc.dg/pr115646.c: New testcase.
(cherry picked from commit 453b1d291d1a0f89087ad91cf6b1bed1ec68eff3)
2024-05-12 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84006
PR fortran/100027
PR fortran/98534
* iresolve.cc (gfc_resolve_transfer): Emit a TODO error for
unlimited polymorphic mold.
* trans-expr.cc (gfc_resize_class_size_with_len): Use the fold
even if a block is not available in which to fix the result.
(trans_class_assignment): Enable correct assignment of
character expressions to unlimited polymorphic variables using
lhs _len field and rse string_length.
* trans-intrinsic.cc (gfc_conv_intrinsic_storage_size): Extract
the class expression so that the unlimited polymorphic class
expression can be used in gfc_resize_class_size_with_len to
obtain the storage size for character payloads. Guard the use
of GFC_DECL_SAVED_DESCRIPTOR by testing for DECL_LANG_SPECIFIC
to prevent the ICE. Also, invert the order to use the class
expression extracted from the argument.
(gfc_conv_intrinsic_transfer): In the same way as 'storage_size',
use the _len field to obtain the correct length for arg 1.
Add a branch for the element size in bytes of class expressions
with provision to make use of the unlimited polymorphic _len
field. Again, the class references are explicitly identified.
'mold_expr' was already declared. Use it instead of 'arg'. Do
not fix 'dest_word_len' for deferred character sources because
reallocation on assign makes use of it before it is assigned.
gcc/testsuite/
PR fortran/84006
PR fortran/100027
* gfortran.dg/storage_size_7.f90: New test.
PR fortran/98534
* gfortran.dg/transfer_class_4.f90: New test.
(cherry picked from commit b9294757f82aae8de6d98c122cd4e3b98f685217)
gcc/fortran/ChangeLog:
PR fortran/115700
* trans-stmt.cc (trans_associate_var): When the associate target
is an array-valued character variable, the length is known at entry
of the associate block. Move setting of string length of the
selector to the initialization part of the block.
gcc/testsuite/ChangeLog:
PR fortran/115700
* gfortran.dg/associate_69.f90: New test.
(cherry picked from commit 7b7f203472d07a05d959a29638c7c95d98bf0c1c)
This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE
of a global variable-length array.
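The guard pattern can be sketched outside the compiler as follows. SizeExpr, fits_unsigned and size_or_fallback are hypothetical names, and the fallback value is an assumption made for illustration; what the RISC-V back end does when the size does not fit is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a size expression: empty when the size is
// not a compile-time constant, as for a variable-length array.
using SizeExpr = std::optional<std::uint64_t>;

// Plays the role of the "fits" test that must precede the conversion.
constexpr bool fits_unsigned(const SizeExpr& size) {
  return size.has_value();
}

// Convert only after the check succeeded; otherwise use a
// caller-chosen fallback (an assumption of this sketch).
constexpr std::uint64_t size_or_fallback(const SizeExpr& size,
                                         std::uint64_t fallback) {
  return fits_unsigned(size) ? *size : fallback;
}

static_assert(size_or_fallback(SizeExpr{64}, 128) == 64);
static_assert(size_or_fallback(std::nullopt, 128) == 128);
```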
gcc/
PR target/115591
* config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on
tree_fits_uhwi_p before calling tree_to_uhwi.
gcc/testsuite/
* gnat.dg/array41.ads, gnat.dg/array41.adb: New test.
A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected. To work around this, use the 'Uw'
constraint that blocks writeback. Also use a new 'mem_and_no_t1_wback_op'
which is a general memory operand that disallows writeback in Thumb-1.
A few other patterns were using 'm' for Thumb-1 in a similar way; update these
to also use 'mem_and_no_t1_wback_op' and 'Uw'.
gcc:
PR target/115188
* config/arm/arm.md (unaligned_loadsi): Use 'Uw' constraint and
'mem_and_no_t1_wback_op'.
(unaligned_loadhiu): Likewise.
(unaligned_storesi): Likewise.
(unaligned_storehi): Likewise.
* config/arm/predicates.md (mem_and_no_t1_wback_op): Add new predicate.
* config/arm/sync.md (arm_atomic_load<mode>): Use 'Uw' constraint.
(arm_atomic_store<mode>): Likewise.
gcc/testsuite:
PR target/115188
* gcc.target/arm/pr115188.c: Add new test.
(cherry picked from commit d04c5537f5ae4a3acd3f5135347d7e2d8c218811)
The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1])
where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference. Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-bit moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.
This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander codes from avr-dimode.md are run.
PR target/87376
gcc/
* config/avr/avr-dimode.md: Use "nop_general_operand" instead
of "general_operand" as predicate for all input operands.
gcc/testsuite/
* gcc.target/avr/torture/pr87376.c: New test.
(cherry picked from commit 23a0935262d6817097406578b1c70563f424804b)
The ACLE requires __ARM_FEATURE_SVE_BF16 to be enabled when SVE and BF16
and the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115475
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.
gcc/testsuite/
PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit 6492c7130d6ae9992298fc3d072e2589d1131376)