Some targets do not provide a prototype for fork, and compilation now
fails with an implicit-function-declaration error.
libgcc/
* libgcov-interface.c (__gcov_fork): Use __builtin_fork instead
of fork.
This commit adds -fopenmp-allocators, which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown
unless the -fopenmp-allocators flag is present.
With -fopenmp -fopenmp-allocators, those constructs are turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures that deallocation and reallocation (via intrinsic assignment) are
properly directed to GOMP_free/omp_realloc, whereas normal Fortran
allocations are still handled by free/realloc.
In order to distinguish 'malloc'ed from 'GOMP_alloc'ed memory, the
version field of the Fortran array descriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) adds the
pointer address to a splay_tree, while GOMP_is_alloc(ptr) returns
true if it was previously added and also removes it from the list.
Besides the Fortran FE work, BUILT_IN_GOMP_REALLOC is now part of
omp-builtins.def and libgomp gains the two new functions mentioned above.
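To illustrate the scalar record keeping, here is a minimal sketch; this is
not the libgomp implementation, and the exact prototypes (in particular
GOMP_free's allocator argument) are assumptions based on the description
above:

  #include <cstdlib>

  extern "C" void GOMP_add_alloc (void *ptr);  /* remember a GOMP_alloc'ed pointer */
  extern "C" bool GOMP_is_alloc (void *ptr);   /* true once if recorded; removes it */
  extern "C" void GOMP_free (void *ptr, unsigned long allocator);

  static void
  deallocate_scalar (void *p)
  {
    /* Conceptual shape of the generated deallocation code for a scalar.  */
    if (GOMP_is_alloc (p))
      GOMP_free (p, 0);     /* memory came from GOMP_alloc  */
    else
      std::free (p);        /* normal Fortran allocation  */
  }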
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
* builtins.cc (builtin_fnspec): Handle it.
* gimple-ssa-warn-access.cc (fndecl_alloc_p,
matching_alloc_calls_p): Likewise.
* gimple.cc (nonfreeing_call_p): Likewise.
* predict.cc (expr_expected_value_1): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
* tree.cc (fndecl_dealloc_argno): Likewise.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
and EXEC_OMP_ALLOCATORS.
* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
(-fopenmp-simd): New, based on C version.
(-fopenmp-allocators): New.
* lang.opt (fopenmp-allocators): Add.
* openmp.cc (resolve_omp_clauses): For the allocators/allocate directive,
add diagnostics for 'target' without 'dynamic_allocators' and more
invalid-usage diagnostics.
* parse.cc (decode_omp_directive): Set contains_teams_construct.
* trans-array.h (gfc_array_allocate): Update prototype.
(gfc_conv_descriptor_version): New prototype.
* trans-decl.cc (gfc_init_default_dt): Fix comment.
* trans-array.cc (gfc_conv_descriptor_version): New.
(gfc_array_allocate): Support GOMP_alloc allocation.
(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
Handle GOMP_free/omp_realloc as needed.
* trans-expr.cc (gfc_conv_procedure_call): Likewise.
(alloc_scalar_allocatable_for_assignment): Likewise.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
* trans-openmp.cc (gfc_trans_omp_allocators,
gfc_trans_omp_directive): Handle allocators/allocate directive.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
* trans-stmt.h (gfc_trans_allocate): Update prototype.
* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
Update to handle GOMP_alloc.
(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
Handle GOMP_free.
(trans_code): Update call.
* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
Update prototype.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
libgomp/ChangeLog:
* allocator.c (struct fort_alloc_splay_tree_key_s,
fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
(GOMP_5.1.1): ... here.
* libgomp.texi (Impl. Status, Memory management): Update for
allocators/allocate directives.
* splay-tree.c: Handle splay_tree_static define to declare all
functions as static.
(splay_tree_lookup_node): New.
* splay-tree.h: Handle splay_tree_decl_only define.
(splay_tree_lookup_node): New prototype.
* target.c: Define splay_tree_static for 'reverse'.
* testsuite/libgomp.fortran/allocators-1.f90: New test.
* testsuite/libgomp.fortran/allocators-2.f90: New test.
* testsuite/libgomp.fortran/allocators-3.f90: New test.
* testsuite/libgomp.fortran/allocators-4.f90: New test.
* testsuite/libgomp.fortran/allocators-5.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-14.f90: Add coarray and
not-listed tests.
* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
* gfortran.dg/bind_c_array_params_2.f90: Update expected
dump for dtype '.version=0'.
* gfortran.dg/gomp/allocate-16.f90: New test.
* gfortran.dg/gomp/allocators-3.f90: New test.
* gfortran.dg/gomp/allocators-4.f90: New test.
libgcc/config.in was updated incorrectly in
commit dbbfb52b0e
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
CommitDate: 2023-12-08 11:29:06 +0000
libgcc: aarch64: Configure check for __getauxval
so regenerate it.
libgcc/ChangeLog:
* config.in: Regenerate.
To support the ZA lazy save scheme, the PCS requires the unwinder to
reset the SME state to PSTATE.SM=0, PSTATE.ZA=0, TPIDR2_EL0=0 on entry
to an exception handler. We use the __arm_za_disable SME runtime call
unconditionally to achieve this.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#exceptions
The hidden alias is used to avoid a PLT and avoid inconsistent VPCS
marking (we don't rely on a special PCS at the call site). In case of
static linking, the SME runtime init code is linked into code that raises
exceptions.
libgcc/ChangeLog:
* config/aarch64/__arm_za_disable.S: Add hidden alias.
* config/aarch64/aarch64-unwind.h: Reset the SME state before
EH return via the _Unwind_Frames_Extra hook.
The call ABI for SME (Scalable Matrix Extension) requires a number of
helper routines which are added to libgcc so they are tied to the
compiler version instead of the libc version. See
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines
The routines are in shared libgcc and static libgcc eh, even though
they are not related to exception handling. This is to avoid linking
a copy of the routines into dynamically linked binaries, because the
TPIDR2_EL0 block can be extended in the future, which is better handled
in a single place per process.
The support routines have to decide whether SME is accessible. Linux
tells userspace whether SME is accessible via AT_HWCAP2; otherwise a new
__aarch64_sme_accessible symbol is introduced that a libc can define.
Due to the libgcc and libc build order, the symbol's availability cannot be
checked, so a unistd.h feature test macro is used for __aarch64_sme_accessible,
while no such detection mechanism is available for __getauxval,
so we rely on configure checks based on the target triplet.
Asm helper code is added to make writing the routines easier.
libgcc/ChangeLog:
* config/aarch64/t-aarch64: Add sources to the build.
* config/aarch64/__aarch64_have_sme.c: New file.
* config/aarch64/__arm_sme_state.S: New file.
* config/aarch64/__arm_tpidr2_restore.S: New file.
* config/aarch64/__arm_tpidr2_save.S: New file.
* config/aarch64/__arm_za_disable.S: New file.
* config/aarch64/aarch64-asm.h: New file.
* config/aarch64/libgcc-sme.ver: New file.
Add configure check for the __getauxval ABI symbol, which is always
available on aarch64 glibc, and may be available on other linux C
runtimes. For now it is only enabled on glibc; others have to override it with
target_configargs=libgcc_cv_have___getauxval=yes
This is deliberately obscure as it should be auto detected, ideally
via a feature test macro in unistd.h (link time detection is not
possible since the libc may not be installed at libgcc build time),
but currently there is no such feature test mechanism.
Without __getauxval, libgcc cannot do runtime CPU feature detection
and has to assume only the build time known features are available.
libgcc/ChangeLog:
* config.in: Undef HAVE___GETAUXVAL.
* configure: Regenerate.
* configure.ac: Check for __getauxval.
Ideally the SME support routines in libgcc are marked as variant PCS symbols,
so check whether the assembler (as) supports the .variant_pcs directive.
libgcc/ChangeLog:
* config.in: Undef HAVE_AS_VARIANT_PCS.
* configure: Regenerate.
* configure.ac: Check for .variant_pcs.
The following avoids spurious uninit diagnostics for SSA name
copies, which mostly appear when the source is marked as abnormal,
which prevents copy propagation.
To prevent regressions I remove the bail-out for anonymous SSA
names in the PHI argument place from warn_uninitialized_phi, leaving
that to warn_uninit, where I handle SSA copies from an SSA name
which isn't anonymous. In theory this might cause more
valid and more false-positive diagnostics to pop up.
PR tree-optimization/112909
* tree-ssa-uninit.cc (find_uninit_use): Look through a
single level of SSA name copies with single use.
* gcc.dg/uninit-pr112909.c: New testcase.
loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which are not supported
in GCC; this causes an ICE:
ice.c:55:1: error: unrecognizable insn:
55 | }
| ^
(insn 63 62 64 8 (set (reg:V4DI 278)
(subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1
(nil))
during RTL pass: vregs
ice.c:55:1: internal compiler error: in extract_insn, at recog.cc:2804
Last time, Ruoyao fixed a similar ICE:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636156.html
This patch fixes the ICE and uses simplify_gen_subreg instead of gen_rtx_SUBREG as much as
possible to avoid the same ICE happening again.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const): Use
simplify_gen_subreg instead of gen_rtx_SUBREG.
(loongarch_expand_vec_perm_const_2): Ditto.
(loongarch_expand_vec_cond_expr): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/pr112476-3.c: New test.
* gcc.target/loongarch/pr112476-4.c: New test.
For [x]vshuf instructions, if the index value in the selector exceeds 63, it triggers
undefined behavior on LA464, but not on LA664. To ensure compatibility of these two
tests on both LA464 and LA664, we have modified both tests to ensure that the index
value in the selector does not exceed 63.
gcc/testsuite/ChangeLog:
PR target/112611
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Make sure the index is less than 64.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Ditto.
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number
of generated instructions is close to or exceeds the maximum number of instructions LoongArch can
issue per cycle, so vectorized loop unrolling is not performed on them.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_vector_costs::determine_suggested_unroll_factor):
Return an unroll factor of 1 if m_has_recip is true.
(loongarch_vector_costs::add_stmt_cost): Detect the use of approximate instruction sequence.
When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal
instructions and approximate reciprocal square root instructions with additional
Newton-Raphson steps to implement single precision floating-point division, square
root and reciprocal square root operations, for better performance.
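For illustration, the refinement applied on top of the hardware approximation
is the standard Newton-Raphson step; the scalar sketch below shows the math
only and is not the exact instruction sequence the compiler emits:

  /* One Newton-Raphson step, given a hardware approximation x0.  */
  static float refine_recip (float a, float x0)   /* x0 ~= 1/a */
  { return x0 * (2.0f - a * x0); }

  static float refine_rsqrt (float a, float x0)   /* x0 ~= 1/sqrtf (a) */
  { return x0 * (1.5f - 0.5f * a * x0 * x0); }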
gcc/ChangeLog:
* config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable.
(-mrecip, -mrecip=): New options.
* config/loongarch/lasx.md (div<mode>3): New expander.
(*div<mode>3): Rename.
(sqrt<mode>2): New expander.
(*sqrt<mode>2): Rename.
(rsqrt<mode>2): New expander.
* config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.cc (loongarch_option_override_internal): Set
recip_mask for -mrecip and -mrecip= options.
(loongarch_emit_swrsqrtsf): New function.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, RECIP_MASK_SQRT,
RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_VEC_RSQRT,
RECIP_MASK_ALL): New bitmasks.
(TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, TARGET_RECIP_VEC_DIV,
TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests.
* config/loongarch/loongarch.md (sqrt<mode>2): New expander.
(*sqrt<mode>2): Rename.
(rsqrt<mode>2): New expander.
* config/loongarch/loongarch.opt (recip_mask): New variable.
(-mrecip, -mrecip=): New options.
* config/loongarch/lsx.md (div<mode>3): New expander.
(*div<mode>3): Rename.
(sqrt<mode>2): New expander.
(*sqrt<mode>2): Rename.
(rsqrt<mode>2): New expander.
* config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate.
* doc/invoke.texi (LoongArch Options): Document new options.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/divf.c: New test.
* gcc.target/loongarch/recip-divf.c: New test.
* gcc.target/loongarch/recip-sqrtf.c: New test.
* gcc.target/loongarch/sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test.
Redefine the pattern for [x]vfrecip instructions using an rtx code instead of an unspec, and enable
[x]vfrecip instructions to be generated during auto-vectorization.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrecip_<flasxfmt>): Renamed to ..
(recip<mode>3): .. this.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine
to new pattern name.
(CODE_FOR_lsx_vfrecip_s): Ditto.
(CODE_FOR_lasx_xvfrecip_d): Ditto.
(CODE_FOR_lasx_xvfrecip_s): Ditto.
(loongarch_expand_builtin_direct): For the vector recip instructions, construct a
temporary parameter const1_vector.
* config/loongarch/lsx.md (lsx_vfrecip_<flsxfmt>): Renamed to ..
(recip<mode>3): .. this.
* config/loongarch/predicates.md (const_vector_1_operand): New predicate.
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt<mode>2 to align with the standard
pattern name. Define the function use_rsqrt_p to decide when to use the rsqrt optab.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrsqrt_<flasxfmt>): Renamed to ..
(rsqrt<mode>2): .. this.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name.
(CODE_FOR_lsx_vfrsqrt_s): Ditto.
(CODE_FOR_lasx_xvfrsqrt_d): Ditto.
(CODE_FOR_lasx_xvfrsqrt_s): Ditto.
* config/loongarch/loongarch.cc (use_rsqrt_p): New function.
(loongarch_optab_supported_p): Ditto.
(TARGET_OPTAB_SUPPORTED_P): New hook.
* config/loongarch/loongarch.md (*rsqrt<mode>a): Remove.
(*rsqrt<mode>2): New insn pattern.
(*rsqrt<mode>b): Remove.
* config/loongarch/lsx.md (lsx_vfrsqrt_<flsxfmt>): Renamed to ..
(rsqrt<mode>2): .. this.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test.
The following removes the second GIMPLE function dump after
remove_ssa_form, which used to rewrite the IL with the coalescing
result but hasn't done so for a long time now.
* tree-outof-ssa.cc (rewrite_out_of_ssa): Dump GIMPLE once only,
after final IL adjustments.
The mode attr V_F2DI_CONVERT_BRIDGE is meant to convert the floating-point mode
to the widened floating-point mode by design. But we took (RVVM1HF "RVVM2SI") by
mistake.
This patch fixes it by replacing
(RVVM1HF "RVVM2SI") with (RVVM1HF "RVVM2SF") as designed.
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Replace RVVM2SI with RVVM2SF
for mode attr V_F2DI_CONVERT_BRIDGE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
This patch adds support for the xorsign pattern for scalar fp and vectors. With the
new expanders, vector bitwise logical operations are used uniformly to handle xorsign.
On LoongArch64, floating-point registers and vector registers share the same register file,
so this patch also allows conversion between LSX vector modes and scalar fp modes to
avoid unnecessary instruction generation.
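For reference, a scalar sketch of what xorsign computes (the patch implements
it with vector bitwise operations, not with this scalar code):

  #include <cstdint>
  #include <cstring>

  static float xorsign (float x, float y)   /* == x * copysignf (1.0f, y) */
  {
    std::uint32_t xi, yi;
    std::memcpy (&xi, &x, sizeof xi);
    std::memcpy (&yi, &y, sizeof yi);
    xi ^= yi & 0x80000000u;                 /* xor x's sign with y's sign bit */
    std::memcpy (&x, &xi, sizeof x);
    return x;
  }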
gcc/ChangeLog:
* config/loongarch/lasx.md (xorsign<mode>3): New expander.
* config/loongarch/loongarch.cc (loongarch_can_change_mode_class): Allow
conversion between LSX vector mode and scalar fp mode.
* config/loongarch/loongarch.md (@xorsign<mode>3): New expander.
* config/loongarch/lsx.md (@xorsign<mode>3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-xorsign.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign.c: New test.
* gcc.target/loongarch/xorsign-run.c: New test.
* gcc.target/loongarch/xorsign.c: New test.
Before bitint lowering, the IL has:
b.0_1 = b;
_2 = -b.0_1;
_3 = (unsigned _BitInt(512)) _2;
a.1_4 = a;
a.2_5 = (unsigned _BitInt(512)) a.1_4;
_6 = _3 * a.2_5;
on the first function. Now, gimple_lower_bitint has an optimization
(when not -O0) that it avoids assigning underlying VAR_DECLs for certain
SSA_NAMEs where it is possible to lower it in a single loop (or straight
line code) rather than in multiple loops.
So, e.g. the multiplication above uses handle_operand_addr, which can deal
with INTEGER_CST arguments, loads but also casts, so it is fine
not to assign an underlying VAR_DECL for SSA_NAMEs a.1_4 and a.2_5, as
the multiplication can handle it fine.
The more problematic case is the other multiplication operand.
It is again a result of a (in this case narrowing) cast, so it is fine
not to assign VAR_DECL for _3. Normally we can merge the load (b.0_1)
with the negation (_2) and even with the following cast (_3). If _3
was used in a mergeable operation like addition, subtraction, negation,
&|^ or equality comparison, all of b.0_1, _2 and _3 could be without
underlying VAR_DECLs.
The problem is that the current code does that even when the cast is used
by a non-mergeable operation, and handle_operand_addr certainly can't handle
the mergeable operations feeding the rhs1 of the cast, for multiplication
we don't emit any loop in which it could appear, for other operations like
shifts or non-equality comparisons we emit loops, but either in the reverse
direction or with unpredictable indexes (for shifts).
So, in order to lower the above correctly, we need to have an underlying
VAR_DECL for either _2 or _3; if we choose _2, then the load and negation
would be done in one loop and extension handled as part of the
multiplication, if we choose _3, then the load, negation and cast are done
in one loop and the multiplication just uses the underlying VAR_DECL
computed by that.
It is far easier to do this for _3, which is what the following patch
implements.
It actually already had code for most of it, it just did that for widening
casts only (optimize unless the cast rhs1 is not a SSA_NAME, or is a SSA_NAME
defined in some other bb, or with more than one use, etc.).
This patch falls through into such code even for the narrowing or same precision
casts, unless the cast is used in a mergeable operation.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112902
* gimple-lower-bitint.cc (gimple_lower_bitint): For a narrowing
or same precision cast don't set SSA_NAME_VERSION in m_names only
if use_stmt is mergeable_op or fall through into the check that
use is a store or rhs1 is not mergeable or other reasons prevent
merging.
* gcc.dg/bitint-52.c: New test.
For casts from integers to floating point,
simplify_float_conversion_using_ranges uses SCALAR_INT_TYPE_MODE
and queries optabs on the optimization it wants to make.
That doesn't really work for large/huge BITINT_TYPE, which has BLKmode,
and BLKmode is not a scalar int mode. Querying an optab is not useful for that
either.
I think it is best to just skip this optimization for those bitints,
after all, bitint lowering uses ranges already to determine minimum
precision for bitint operands of the integer to float casts.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112901
* vr-values.cc
(simplify_using_ranges::simplify_float_conversion_using_ranges):
Return false if rhs1 has BITINT_TYPE type with BLKmode TYPE_MODE.
* gcc.dg/bitint-51.c: New test.
On Thu, Dec 07, 2023 at 09:36:23AM +0100, Jakub Jelinek wrote:
> Without the dg-skip-if I got on 64-bit host with
> -O3 --param min-nondebug-insn-uid=0x40000000:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
I've looked at this and the problem is in haifa-sched.cc:
9047 h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
get_max_uid () is 0x4000024d with the --param min-nondebug-insn-uid=0x40000000
and so 3 * get_max_uid () / 2 actually overflows to -536870028 but as vec.h
then treats the value as unsigned, it attempts to allocate
0xe0000374U * 152UL bytes, i.e. those 532GB. If the above is fixed to do
3U * get_max_uid () / 2 instead, it will get slightly better and will only
need 0x60000373U * 152UL bytes, i.e. 228GB.
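A quick standalone check of that arithmetic (illustrative only; the real code
computes 3 * get_max_uid () in int, which is signed overflow, so the snippet
models the wraparound explicitly):

  #include <cstdio>

  int main ()
  {
    unsigned uid = 0x4000024d;                    /* get_max_uid () in the report */
    std::printf ("%d\n", (int) (3 * uid) / 2);    /* -536870028  */
    std::printf ("%#x\n", 3U * uid / 2);          /* 0x60000373  */
  }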
Perhaps more could be helped by making the vector indirect (containing pointers
to haifa_insn_data_def rather than the structures themselves) and pool allocating
those, but the more important question is how sparse uids are in normal
compilations without those large --param min-nondebug-insn-uid= parameters.
Because if they aren't sparse enough, such a change would increase compile-time
memory just to help the unusual case.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112411
* haifa-sched.cc (extend_h_i_d): Use 3U instead of 3 in
3 * get_max_uid () / 2 calculation.
The instructions defined in the LoongArch Reference Manual v1.1 do not form a
distinct v1.1 version of the instruction set. A CPU defined later may only
support some of the instructions in LoongArch Reference Manual v1.1. Therefore,
the macro ISA_BASE_LA64V110 and related definitions are removed here.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch-strings: Delete STR_ISA_BASE_LA64V110.
* config/loongarch/genopts/loongarch.opt.in: Likewise.
* config/loongarch/loongarch-cpu.cc (ISA_BASE_LA64V110_FEATURES): Delete macro.
(fill_native_cpu_config): Define a new variable hw_isa_evolution to record the
extended instruction set support read from cpucfg.
* config/loongarch/loongarch-def.cc: Set evolution at initialization.
* config/loongarch/loongarch-def.h (ISA_BASE_LA64V100): Delete.
(ISA_BASE_LA64V110): Likewise.
(N_ISA_BASE_TYPES): Likewise.
(defined): Likewise.
* config/loongarch/loongarch-opts.cc: Likewise.
* config/loongarch/loongarch-opts.h (TARGET_64BIT): Likewise.
(ISA_BASE_IS_LA64V110): Likewise.
* config/loongarch/loongarch-str.h (STR_ISA_BASE_LA64V110): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
We'll use HOST_WIDE_INT in LoongArch static properties in the following patches.
To keep the same readability as C99 designated initializers, create a
std::array-like data structure with a position setter function, and add
field setter functions for the structs used in loongarch-def.cc.
Remove unneeded guards #if
!defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
in loongarch-def.h and loongarch-opts.h.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h: Remove extern "C".
(loongarch_isa_base_strings): Declare as loongarch_def_array
instead of plain array.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(loongarch_cpu_strings): Likewise.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_isa): Add a constructor and field setter functions.
* config/loongarch/loongarch-opts.h (loongarch-defs.h): Do not
include for target libraries.
* config/loongarch/loongarch-opts.cc: Comment code that doesn't
run and causes compilation errors.
* config/loongarch/loongarch-tune.h (LOONGARCH_TUNE_H): Likewise.
(struct loongarch_rtx_cost_data): Likewise.
(struct loongarch_cache): Likewise.
(struct loongarch_align): Likewise.
* config/loongarch/t-loongarch: Compile loongarch-def.cc with the
C++ compiler.
* config/loongarch/loongarch-def-array.h: New file for a
std::array-like data structure with position setter function.
* config/loongarch/loongarch-def.c: Rename to ...
* config/loongarch/loongarch-def.cc: ... here.
(loongarch_cpu_strings): Define as loongarch_def_array instead
of plain array.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_isa_base_strings): Likewise.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(abi_minimal_isa): Likewise.
(loongarch_rtx_cost_optimize_size): Use field setter functions
instead of designated initializers.
(loongarch_rtx_cost_data): Implement default constructor.
As documented, --param min-nondebug-insn-uid= is very useful in debugging
-fcompare-debug issues in RTL dumps; without it, it is really hard to
find differences. With it, DEBUG_INSNs generally use low INSN_UIDs
(1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
For good results, the parameter should be larger than the number of
DEBUG_INSNs in all or at least problematic functions, so I typically
use --param min-nondebug-insn-uid=10000 or --param
min-nondebug-insn-uid=1000.
The PR is about using --param min-nondebug-insn-uid=2147483647 (or
similar behavior can be achieved with that minus some epsilon):
INSN_UIDs for the non-debug insns then wrap around and, as they are signed,
all kinds of things break. Obviously, that can happen even without that
option, but functions containing more than 2147483647 insns usually fail to
compile much earlier due to running out of memory.
As it is a debugging option, I'd prefer not to impose any drastically small
limits on it, because if a function has a lot of DEBUG_INSNs, it is useful
to still start above them; otherwise the allocation of uids will DTRT
even for DEBUG_INSNs, but there will then be differences in non-DEBUG_INSN
allocations.
So, the following patch uses a 0x40000000 limit, half of the maximum amount for
DEBUG_INSNs and half for non-DEBUG_INSNs, which will still make overflows very
unlikely in the real world.
Note, using large min-nondebug-insn-uid is very expensive for compile time
memory and compile time, because DF as well as various RTL passes use
arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
ditto df (df->insns).
Now, in LRA I've run into ICEs already with
--param min-nondebug-insn-uid=0x2aaaaaaa
on a 64-bit host. It uses a custom vector management and wants to grow the
allocation 1.5x when growing, but all this computation is done in int,
so already 0x2aaaaaab * 3 / 2 + 1 overflows to a negative value. And,
unlike the vec.cc growing code, which uses an unsigned int type for the above
(and the + 1 is not there), it also doesn't make sure that, if there is an
overflow, it allocates at least as much as needed; vec.cc
does
if ...
else
/* Grow slower when large. */
alloc = (alloc * 3 / 2);
/* If this is still too small, set it to the right size. */
if (alloc < desired)
alloc = desired;
so even if there is overflow during the * 1.5 computation, as long as
desired is still representable in the range of the alloc counter
(31 bits in both vec.h and LRA), it doesn't grow exponentially but
at least works for the current value.
The patch now uses there
lra_insn_recog_data_len = index * 3U / 2;
if (lra_insn_recog_data_len <= index)
lra_insn_recog_data_len = index + 1;
basically do what vec.cc does. I thought we could do better for
both vec.cc and LRA on 64-bit hosts even without growing the allocated
counters, but now that I look at it again, perhaps we can't.
The above overflows already with original alloc or lra_insn_recog_data_len
0x55555556, where 0x55555555 * 3U / 2 is still 0x7fffffff
and so representable in 32 bits, but 0x55555556 * 3U / 2 is
1. I thought that we could use alloc * (size_t) 3 / 2 so that on 64-bit
hosts it wouldn't overflow that quickly, but 0x55555556 * (size_t) 3 / 2
there is 0x80000001 which is still ok in unsigned, but given that vec.h
then stores the counter into unsigned m_alloc:31; bit-field, it is too much.
With the lra.cc change, one can actually compile a simple function
with -O0 on a 64-bit host with --param min-nondebug-insn-uid=0x40000000
(i.e. the new limit), but it already needed quite a big part of my 32GB
RAM + 24GB swap.
The patch adds a dg-skip-if for that case though, because such an option
is way too much for 32-bit hosts even at -O0 with an empty function,
and with -O3 on a longer function it is too much for an average 64-bit host
as well. Without the dg-skip-if I got on a 64-bit host:
cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
and
cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
on a 32-bit host. A test requiring more than 532GB of RAM on 64-bit hosts
is just too much for our testsuite.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112411
* params.opt (-param=min-nondebug-insn-uid=): Add
IntegerRange(0, 1073741824).
* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
in * 3 / 2 computation and if the result is smaller or equal to
index, use index + 1.
* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
--param min-nondebug-insn-uid=1073741824.
Since the loop vectorizer won't call better_main_loop_than_p if
!flag_vect_cost_model, the check is redundant.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p):
Remove redundant check.
The flag is defined as CHREC_NOWRAP(tree), and a chrec with the flag set is
dumped as "{offset, +, 1}<nw>_1" instead of "{offset, +, 1}_1" (nw is short
for nonwrapping).
Two SCEV interfaces, record_nonwrapping_chrec and nonwrapping_chrec_p, are
added to set and check the flag respectively.
As resetting the SCEV cache (i.e., the chrec trees) may not reset the
loop->estimate_state, free_numbers_of_iterations_estimates is called
explicitly in loop vectorization to make sure the flag can be
calculated properly by niter.
gcc/ChangeLog:
PR tree-optimization/112774
* tree-pretty-print.cc: If the nonwrapping flag is set, the chrec is
printed with additional <nw> info.
* tree-scalar-evolution.cc: Add record_nonwrapping_chrec and
nonwrapping_chrec_p to set and check the new flag respectively.
* tree-scalar-evolution.h: Likewise.
* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
scev_probably_wraps_p): Call record_nonwrapping_chrec before
record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is
set and return false from scev_probably_wraps_p.
* tree-vect-loop.cc (vect_analyze_loop): Call
free_numbers_of_iterations_estimates explicitly.
* tree-core.h: Document the nothrow_flag usage in CHREC_NOWRAP.
* tree.h: Add CHREC_NOWRAP(NODE); base.nothrow_flag is used to
represent the nonwrapping info.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/scev-16.c: New test.
Concrete bindings were using -1 and -2 in the offset field to signify
deleted and empty hash slots, but these are valid values, leading to
assertion failures inside hash_map::put on a debug build, and probable
bugs in a release build.
(gdb) call k.dump(true)
start: -2, size: 1, next: -1
(gdb) p k.is_empty()
$6 = true
Fix by using the size field rather than the offset.
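The idea, as a minimal sketch (member and type names here are illustrative,
not the actual gcc/analyzer/store.h code):

  /* Sentinel states live in the size, which must be positive for any
     real binding; the start offset may legitimately be -1 or -2.  */
  struct concrete_binding_sketch
  {
    long m_start_bit_offset;
    long m_size_in_bits;                 /* > 0 for live entries */

    void mark_deleted () { m_size_in_bits = -1; }
    void mark_empty () { m_size_in_bits = -2; }
    bool is_deleted () const { return m_size_in_bits == -1; }
    bool is_empty () const { return m_size_in_bits == -2; }
  };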
gcc/analyzer/ChangeLog:
PR analyzer/112889
* store.h (concrete_binding::concrete_binding): Strengthen
assertion to require the size to be positive, rather than just
non-zero.
(concrete_binding::mark_deleted): Use size rather than start bit
offset.
(concrete_binding::mark_empty): Likewise.
(concrete_binding::is_deleted): Likewise.
(concrete_binding::is_empty): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/112889
* c-c++-common/analyzer/ice-pr112889.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This patch fixes 64 ICEs in full coverage testing since they happen due to the same reason.
Before this patch:
internal compiler error: in expand_const_vector, at config/riscv/riscv-v.cc:1270
appears 400 times in full coverage testing report.
The root cause is that we didn't support interleaved vectors with different steps.
Here is the story:
We already supported interleaving with a single, common step, that is:
e.g. v = { 0, 100, 2, 102, 4, 104, ... }
This sequence can be interpreted as an interleaved vector of 2 separate sequences:
sequence1 = { 0, 2, 4, ... } and sequence2 = { 100, 102, 104, ... }.
Their steps are both 2.
However, we didn't support interleaved vectors whose sequences have different steps,
which caused an ICE in such situations.
This patch supports interleaved vectors with different steps for the following 2 situations:
1. When vector can be extended EEW:
Case 1: { 0, 0, 1, 0, 2, 0, ... }
It's interleaved by sequence1 = { 0, 1, 2, ... } and sequence2 = { 0, 0, 0, ... }
Suppose the original vector can be extended EEW, e.g. mode = RVVM1SImode.
Then such interleaved vector can be achieved with { 1, 2, 3, ... } with RVVM1DImode.
So, for this situation the codegen is pretty efficient and clean:
.MASK_LEN_STORE (&s, 32B, { -1, ... }, 16, 0, { 0, 0, 1, 0, 2, 0, ... });
->
vsetvli a5,zero,e64,m8,ta,ma
vid.v v8
vsetivli zero,16,e32,m8,ta,ma
vse32.v v8,0(a4)
Case 2: { 0, 100, 1, 100, 2, 100, ... }
.MASK_LEN_STORE (&s, 32B, { -1, ... }, 16, 0, { 0, 100, 1, 100, 2, 100, ... });
->
vsetvli a1,zero,e64,m8,ta,ma
vid.v v8
li a7,100
vand.vx v8,v8,a4
vsetivli zero,16,e32,m8,ta,ma
vse32.v v8,0(a5)
2. When vector can't be extended EEW:
Since we can't use EEW = 64, for example, RVVM1SImode in -march=rv32gc_zve32f,
we use vmerge to combine the sequence.
.MASK_LEN_STORE (&s, 32B, { -1, ... }, 16, 0, { 200, 100, 201, 103, 202, 106, ... });
1. Generate sequence1 = { 200, 200, 201, 201, 202, 202, ... } and sequence2 = { 100, 100, 103, 103, 106, 106, ... }
2. Merge sequence1 and sequence2 with mask { 0, 1, 0, 1, ... }
gcc/ChangeLog:
* config/riscv/riscv-protos.h (expand_vec_series): Adapt function.
* config/riscv/riscv-v.cc (rvv_builder::double_steps_npatterns_p): New function.
(expand_vec_series): Adapt function.
(expand_const_vector): Support new interleave vector with different step.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/slp-interleave-1.c: New test.
* gcc.target/riscv/rvv/autovec/slp-interleave-2.c: New test.
* gcc.target/riscv/rvv/autovec/slp-interleave-3.c: New test.
* gcc.target/riscv/rvv/autovec/slp-interleave-4.c: New test.
We can use the existing _Partial range adaptor closure object for
ranges::to instead of essentially reimplementing it.
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::_ToClosure): Replace with ...
(__detail::_To): ... this.
(__detail::_ToClosure2): Replace with ...
(__detail::_To2): ... this.
(to): Simplify using the existing _Partial range adaptor
closure object.
This local typedef for uintptr_t was accidentally named uint64_t,
probably from a careless code completion shortcut. We don't need the
typedef at all since it's only used once. Just use __UINTPTR_TYPE__
directly instead.
libstdc++-v3/ChangeLog:
* include/std/format (_Iter_sink<charT, contiguous_iterator>):
Remove uint64_t local type.
In r14-5922-g6c8f2d3a08bc01 I added <stdint.h> to <bits/atomic_wait.h>,
so that uintptr_t is declared if that header is compiled as a header
unit. I used <stdint.h> because that's what <atomic> already includes,
so it seemed simpler to be consistent. However, this means that name
lookup for uintptr_t in <bits/atomic_wait.h> depends on whether
<cstdint> has been included by another header first. Whether name lookup
finds std::uintptr_t or ::uintptr_t will depend on include order. This
causes problems when compiling modules with Clang:
bits/atomic_wait.h:251:7: error: 'std::__detail::__waiter_pool_base' has different definitions in different modules; first difference is defined here found method '_S_for' with body
_S_for(const void* __addr) noexcept
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bits/atomic_wait.h:251:7: note: but in 'tm.<global>' found method '_S_for' with different body
_S_for(const void* __addr) noexcept
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By including <cstdint> we would ensure that name lookup always finds the
name in namespace std. Alternatively, we can stop including <stdint.h>
for those types, so that we don't declare the entire contents of
<stdint.h> when we only need a couple of types from it. This patch does
the former, which is appropriate for backporting.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h: Include <cstdint> instead of
<stdint.h>.
The changes in r14-6198-g5e8a30d8b8f4d7 were broken, as I used
_GLIBCXX17_CONSTEXPR for the 'if _GLIBCXX17_CONSTEXPR (true)' condition,
forgetting that it would also be used for the is_constant_evaluated()
check. Using 'if constexpr (std::is_constant_evaluated())' is a bug.
Additionally, relying on __glibcxx_assert_fail to give a "not a constant
expression" error is a problem because at -O0 an undefined reference to
__glibcxx_assert_fail is present in the compiled code. This means you
can't use libstdc++ headers without also linking to libstdc++ for the
symbol definition.
This fix rewrites the __glibcxx_assert macro again. This still avoids
doing the duplicate checks, once for constexpr and once at runtime (if
_GLIBCXX_ASSERTIONS is defined). When _GLIBCXX_ASSERTIONS is defined we
still rely on __glibcxx_assert_fail to give a "not a constant
expression" error during constant evaluation (because when assertions
are defined it's not a problem to emit a reference to the symbol). But
when that macro is not defined, we use a new inline (but not constexpr)
overload of __glibcxx_assert_fail to cause compilation to fail. That
inline function doesn't cause an undefined reference to a symbol in the
library (and will be optimized away anyway).
We can also add always_inline to the __is_constant_evaluated function,
although this doesn't actually matter for -O0 and it's always inlined
with any optimization enabled.
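The pattern, reduced to a sketch (the names below are made up, C++20's
std::is_constant_evaluated is used instead of the internal builtin, and this
is not the actual libstdc++ macro definition):

  #include <type_traits>

  inline void assertion_failed () { }   /* deliberately not constexpr */

  constexpr void
  my_assert (bool cond)
  {
    if (std::is_constant_evaluated ())
      {
        if (!cond)
          assertion_failed ();   /* not a constant expression -> error */
      }
    /* Runtime checking would only be added when assertions are enabled.  */
  }

  constexpr int half (int n) { my_assert (n % 2 == 0); return n / 2; }
  static_assert (half (4) == 2);
  /* constexpr int bad = half (3);   // error: call to non-constexpr function */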
libstdc++-v3/ChangeLog:
PR libstdc++/112882
* include/bits/c++config (__is_constant_evaluated): Add
always_inline attribute.
(_GLIBCXX_DO_ASSERT): Remove macro.
(__glibcxx_assert): Define separately for assertions-enabled and
constexpr-only cases.
This pass adds a simple register allocator for FP & SIMD registers.
Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
instructions, which require a very specific grouping structure,
and so would be difficult to exploit with general allocation.
The allocator is very simple. It gives up on anything that would
require spilling, or that it might not handle well for other reasons.
The allocator needs to track liveness at the level of individual FPRs.
Doing that fixes a lot of the PRs relating to redundant moves caused by
structure loads and stores. That particular problem is going to be
fixed more generally for GCC 15 by Lehua's RA patches.
However, the early-RA pass runs before scheduling, so it has a chance
to bag a spill-free allocation of vector code before the scheduler moves
things around. It could therefore still be useful for non-SME code
(e.g. for hand-scheduled ACLE code) even after Lehua's patches are in.
The pass is controlled by a tristate switch:
- -mearly-ra=all: run on all functions
- -mearly-ra=strided: run on functions that have access to strided registers
- -mearly-ra=none: don't run on any function
The patch makes -mearly-ra=all the default at -O2 and above for now.
We can revisit this for GCC 15 once Lehua's patches are in;
-mearly-ra=strided might then be more appropriate.
As said previously, the pass is very naive. There's much more that we
could do, such as handling invariants better. The main focus is on not
committing to a bad allocation, rather than on handling as much as
possible.
gcc/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* config.gcc: Add aarch64-early-ra.o for AArch64 targets.
* config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
* config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
* config/aarch64/aarch64.opt (mearly_ra): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc
(aarch_option_optimization_table): Use -mearly-ra=all by
default for -O2 and above.
* config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New pass.
* config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
(make_pass_aarch64_early_ra): Declare.
* config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>):
Add a stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
(svldnt1_impl::expand, svst1_impl::expand, svstnt1_impl::expand): Handle
new way of defining multi-register loads and stores.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
* config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
(@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
(@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
* config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
function.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
(stride_type): New attribute.
* config/aarch64/constraints.md (Uwd, Uwt): New constraints.
* config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
(UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
(optab): Handle them.
(LD1_COUNT, ST1_COUNT): New iterators.
* config/aarch64/aarch64-early-ra.cc: New file.
gcc/testsuite/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
output test.
* gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s
as well as .d.
* gcc.target/aarch64/sme/strided_1.c: New test.
* gcc.target/aarch64/pr109078.c: Likewise.
* gcc.target/aarch64/pr109391.c: Likewise.
* gcc.target/aarch64/sve/pr106694.c: Likewise.
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x4 variants of the vld1 intrinsic.
The previous vld1_x4 has been updated to vld1q_x4 to take into
account that it works with 4-word-length types. vld1_x4 is now
only for 2-word-length types.
ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_u64_x4): New.
(vld1_s8_x4, vld1_s16_x4, vld1_s32_x4, vld1_s64_x4): New.
(vld1_f16_x4, vld1_f32_x4): New.
(vld1_p8_x4, vld1_p16_x4, vld1_p64_x4): New.
(vld1_bf16_x4): New.
(vld1q_types_x4): Updated to use vld1q_x4
from arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x4): Updated entries.
(vld1q_x4): New entries, but comes from the old vld1_x4.
* config/arm/neon.md (neon_vld1q_x4<mode>):
Updated from neon_vld1_x4<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x3 variants of the vld1 intrinsic.
The previous vld1_x3 has been updated to vld1q_x3 to take into
account that it works with 4-word-length types. vld1_x3 is now
only for 2-word-length types.
ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x3, vld1_u16_x3, vld1_u32_x3, vld1_u64_x3): New.
(vld1_s8_x3, vld1_s16_x3, vld1_s32_x3, vld1_s64_x3): New.
(vld1_f16_x3, vld1_f32_x3): New.
(vld1_p8_x3, vld1_p16_x3, vld1_p64_x3): New.
(vld1_bf16_x3): New.
(vld1q_types_x3): Updated to use vld1q_x3 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x3): Updated entries.
(vld1q_x3): New entries, but comes from the old vld1_x3.
* config/arm/neon.md (neon_vld1q_x3<mode>): Updated from
neon_vld1_x3<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x2 variants of the vld1 intrinsic.
The previous vld1_x2 has been updated to vld1q_x2 to take into
account that it works with 4-word-length types. vld1_x2 is now
only for 2-word-length types.
ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
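As a usage example (assumes a compiler and Arm target providing Advanced SIMD
and these _x2 intrinsics; the function name is made up):

  #include <arm_neon.h>

  uint8x8x2_t
  load_pair (const uint8_t *p)
  {
    return vld1_u8_x2 (p);   /* one intrinsic loading two D registers */
  }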
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x2, vld1_u16_x2, vld1_u32_x2, vld1_u64_x2): New.
(vld1_s8_x2, vld1_s16_x2, vld1_s32_x2, vld1_s64_x2): New.
(vld1_f16_x2, vld1_f32_x2): New.
(vld1_p8_x2, vld1_p16_x2, vld1_p64_x2): New.
(vld1_bf16_x2): New.
(vld1q_types_x2): Updated to use vld1q_x2 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x2): Updated entries.
(vld1q_x2): New entries, but comes from the old vld1_x2.
* config/arm/neon.md
(neon_vld1<VMEMX2_q>_x2<VDQX:mode>): Updated
from neon_vld1_x2<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x4 variants of the vst1q intrinsic.
ACLE:
https://developer.arm.com/documentation/ihi0053/latest/
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1q_u8_x4, vst1q_u16_x4, vst1q_u32_x4, vst1q_u64_x4): New.
(vst1q_s8_x4, vst1q_s16_x4, vst1q_s32_x4, vst1q_s64_x4): New.
(vst1q_f16_x4, vst1q_f32_x4): New.
(vst1q_p8_x4, vst1q_p16_x4, vst1q_p64_x4): New.
(vst1q_bf16_x4): New.
* config/arm/arm_neon_builtins.def (vst1q_x4): New entries.
* config/arm/neon.md (neon_vst1q_x4<mode>): New.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x3 variants of the vst1q intrinsic.
ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1q_u8_x3, vst1q_u16_x3, vst1q_u32_x3, vst1q_u64_x3): New.
(vst1q_s8_x3, vst1q_s16_x3, vst1q_s32_x3, vst1q_s64_x3): New.
(vst1q_f16_x3, vst1q_f32_x3): New.
(vst1q_p8_x3, vst1q_p16_x3, vst1q_p64_x3): New.
(vst1q_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vst1q_x3): New entries.
* config/arm/neon.md (neon_vst1q_x3<mode>): New.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.