fr30 is the only target defining GO_IF_LEGITIMATE_ADDRESS right now, in
which case the `code_helper ch` argument to memory_address_addr_space_p()
is unused and emits a new warning.
gcc/ChangeLog:
* recog.cc (memory_address_addr_space_p): Mark possibly unused
argument as unused.
"#error Feature macro not defined" is required to test the existence of an
extension through the preprocessor. However, multiple occurrences of the
exact same error message will confuse the developer once an error is
encountered.
This commit replaces such error messages with
"#error Feature macro for `EXT' not defined" to make clear which
macro is missing.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zvkn.c: Deduplicate #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.
The following guards the bit test merging code in if-combine against
the appearance of SSA names used in abnormal PHIs.
PR tree-optimization/111039
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Check for
SSA_NAME_OCCURS_IN_ABNORMAL_PHI.
* gcc.dg/pr111039.c: New testcase.
The documentation requires that numa_available() is called first and that
other libnuma functions are called only if it succeeds. Internally,
it does a syscall to get_mempolicy with flag=0 (which would return
the default policy if mode were not NULL). If this returns -1 (and
not 0) and errno == ENOSYS, the Linux kernel does not have the
get_mempolicy syscall function; if so, numa_available() returns -1
(otherwise: 0).
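The check described above can be modelled roughly as follows. This is a
sketch of the described logic only, not libnuma's source: probe_fn,
model_numa_available, may_use_libnuma and the two stub probes are
hypothetical names, and the probe stands in for the get_mempolicy syscall.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for the get_mempolicy probe: returns the syscall
// result and sets errno on failure.
using probe_fn = long (*)();

// Model of the described check: report -1 only when the probe fails with
// ENOSYS (the kernel lacks the syscall), otherwise report 0.
int model_numa_available(probe_fn probe)
{
  errno = 0;
  if (probe() < 0 && errno == ENOSYS)
    return -1;
  return 0;
}

// Stub: kernel without the syscall.
long probe_missing() { errno = ENOSYS; return -1; }
// Stub: syscall present and succeeding.
long probe_ok() { return 0; }

// Caller in the style of the fix: use libnuma only if the check returns 0.
bool may_use_libnuma(probe_fn probe)
{
  return model_numa_available(probe) == 0;
}

void check_numa_model()
{
  assert(!may_use_libnuma(probe_missing));
  assert(may_use_libnuma(probe_ok));
}
```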
libgomp/
PR libgomp/111024
* allocator.c (gomp_init_libnuma): Call numa_available; if
not available or not returning 0, disable libnuma usage.
This patch fixes up the code examples in the RTL-SSA documentation (the
sections on making insn changes) to reflect the current API.
The main issues are as follows:
- rtl_ssa::recog takes an obstack_watermark & as the first parameter.
Presumably this is intended to be the change attempt, so I've updated
the examples to pass this through.
- The variants of recog and restrict_movement that take an ignore
predicate have been renamed with an _ignoring suffix, so I've
updated callers to use those names.
- A couple of minor "obvious" fixes to add a missing address-of
operator and correct a variable name.
gcc/ChangeLog:
* doc/rtl.texi: Fix up sample code for RTL-SSA insn changes.
The kernel selftests and other BPF programs make extensive use of the
`naked' function attribute with bodies written using basic inline
assembly. This patch adds support for the attribute to
bpf-unknown-none, makes it inhibit warnings due to the lack of an explicit
`return' statement, and updates the documentation and testsuite
accordingly.
Tested on x86_64-linux-gnu host and bpf-unknown-none target.
gcc/ChangeLog
PR target/111046
* config/bpf/bpf.cc (bpf_attribute_table): Add entry for the
`naked' function attribute.
(bpf_warn_func_return): New function.
(TARGET_WARN_FUNC_RETURN): Define.
(bpf_expand_prologue): Add preventive comment.
(bpf_expand_epilogue): Likewise.
* doc/extend.texi (BPF Function Attributes): Document the `naked'
function attribute.
gcc/testsuite/ChangeLog
* gcc.target/bpf/naked-1.c: New test.
std::format was treating {:f} and {:F} identically on the basis that for
the fixed 1.234567 format there are no alphabetical characters that need
to be in uppercase. But that's wrong for infinities and NaNs, which
should be formatted as "INF" and "NAN" for {:F}.
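The lowercase/uppercase distinction is the same one printf makes between
%f and %F, so it can be illustrated without depending on a fixed
std::format. The sketch below uses snprintf as an analogy, not std::format
itself; fixed_lower and fixed_upper are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper: format with the lowercase fixed conversion.
std::string fixed_lower(double value)
{
  char buf[64];
  std::snprintf(buf, sizeof buf, "%f", value);
  return buf;
}

// Hypothetical helper: format with the uppercase fixed conversion.
std::string fixed_upper(double value)
{
  char buf[64];
  std::snprintf(buf, sizeof buf, "%F", value);
  return buf;
}

void check_upper_lower()
{
  const double inf = std::numeric_limits<double>::infinity();
  // Finite values contain no letters, so both forms agree.
  assert(fixed_lower(1.234567) == fixed_upper(1.234567));
  // Infinity is where the two forms differ.
  assert(fixed_lower(inf) != fixed_upper(inf));
  assert(fixed_lower(inf).rfind("inf", 0) == 0);
  assert(fixed_upper(inf).rfind("INF", 0) == 0);
}
```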
libstdc++-v3/ChangeLog:
* include/std/format (__format::_Pres_type): Add _Pres_F.
(__formatter_fp::parse): Use _Pres_F for 'F'.
(__formatter_fp::format): Set __upper for _Pres_F.
* testsuite/std/format/functions/format.cc: Check formatting of
infinity and NaN for each presentation type.
The following changes the gate to perform vectorization of BB reductions
to use needs_fold_left_reduction_p which in turn requires handling
TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by
promoting any operations generated there to use unsigned arithmetic.
The following does this. There is currently only v16qi where x86
supports a .REDUC_PLUS reduction for integral modes, so I had to
add an x86-specific testcase using GIMPLE IL.
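Why the promotion to unsigned matters can be seen in a scalar sketch.
Reassociating a signed sum can introduce an intermediate overflow that the
in-order sum never hits; doing the additions in unsigned keeps the
wraparound defined. The function name and the four-lane shape are
assumptions made for illustration, not the vectorizer's generated code.

```cpp
#include <cassert>
#include <climits>

// Sum four lanes in the reassociated order (a0 + a2) + (a1 + a3), doing
// the arithmetic in unsigned so that intermediate wraparound is defined.
int reassociated_sum(const int (&a)[4])
{
  unsigned even = static_cast<unsigned>(a[0]) + static_cast<unsigned>(a[2]);
  unsigned odd = static_cast<unsigned>(a[1]) + static_cast<unsigned>(a[3]);
  // The result fits in int whenever the original in-order sum did.
  return static_cast<int>(even + odd);
}

void check_reassociated_sum()
{
  // In order: INT_MAX - 1, INT_MAX, INT_MAX - 1, which never overflows.
  // Reassociated: a0 + a2 is INT_MAX + 1, which would overflow in int.
  const int a[4] = { INT_MAX, -1, 1, -1 };
  assert(reassociated_sum(a) == INT_MAX - 1);
}
```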
* tree-vect-slp.cc (vect_slp_check_for_roots): Use
!needs_fold_left_reduction_p to decide whether we can
handle the reduction with association.
(vectorize_slp_instance_root_stmt): For TYPE_OVERFLOW_UNDEFINED
reductions perform all arithmetic in an unsigned type.
* gcc.target/i386/vect-reduc-2.c: New testcase.
Test case g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
introduced by patch ce8cdf5bcf
emitted a warning for an unused dg-line variable.
This fixes up the blunder.
Signed-off-by: benjamin priour <vultkayn@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C:
Remove dg-line var declare_a.
On macOS 14, a guard in <math.h> changed:
--- MacOSX13.3.sdk/usr/include/math.h 2023-04-19 01:54:44
+++ MacOSX14.0.sdk/usr/include/math.h 2023-08-01 08:42:43
@@ -22,0 +23 @@
+
@@ -43 +44 @@
-#if __FLT_EVAL_METHOD__ == 0
+#if __FLT_EVAL_METHOD__ == 0 || __FLT_EVAL_METHOD__ == -1
@@ -49 +50 @@
-#elif __FLT_EVAL_METHOD__ == 2 || __FLT_EVAL_METHOD__ == -1
+#elif __FLT_EVAL_METHOD__ == 2
Therefore the darwin_flt_eval_method fixincludes fix doesn't match any
longer, leading to a large number of testsuite failures like
/private/var/gcc/regression/master/14-gcc/build/gcc/include-fixed/math.h:69:5:
error: #error "Unsupported value of __FLT_EVAL_METHOD__."
where __FLT_EVAL_METHOD__ = 16.
This patch adjusts the fix to allow for both forms.
Tested with make check in fixincludes on x86_64-apple-darwin23.0.0 and
verifying that <math.h> has indeed been fixed as expected.
2023-08-16 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
fixincludes:
* inclhack.def (darwin_flt_eval_method): Handle macOS 14 guard
variant.
* fixincl.x: Regenerate.
* tests/base/math.h [DARWIN_FLT_EVAL_METHOD_CHECK]: Update test.
Since Xcode 15 beta 6, ld -v output differs from previous versions:
* macOS 13/Xcode 14:
@(#)PROGRAM:ld PROJECT:ld64-857.1
* macOS 14/Xcode 15:
@(#)PROGRAM:ld PROJECT:dyld-1015.1
configure cannot handle the new form, so LD64_VERSION isn't set.
This patch fixes this. The autoconf manual states that sed doesn't
portably support alternation, so I'm using two separate expressions to
extract the version number.
Tested on x86_64-apple-darwin23.0.0.
2023-08-16 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* configure.ac (gcc_cv_ld64_version): Allow for dyld in ld -v
output.
* configure: Regenerate.
These tests expect to be able to #undef a feature test macro and then
include <version> to get it redefined. But if <version> has already been
included by the <bits/stdc++.h> PCH then including it again does nothing
and the macro remains undefined.
libstdc++-v3/ChangeLog:
* testsuite/24_iterators/move_iterator/p2520r0.cc: Add no_pch.
* testsuite/std/format/functions/format.cc: Likewise.
* testsuite/std/format/functions/format_c++23.cc: Likewise.
The { dg-add-options no_pch } directive is supposed to add a macro
definition that invalidates the PCH file, and ensures that the #include
directives in the test file are processed as written. But the proc that
adds the options actually removes all existing options, cancelling out
any previous dg-options directive.
This means that using no_pch will cause FAILs in a file that relies on
other options set by an earlier dg-options.
The no_pch directive was added for PR libstdc++/21769 where Janis
suggested adding it as return "$flags -D__GLIBCXX__=99999999" but what
was actually committed didn't include the $flags so replaced them.
Additionally, using no_pch only prevents the precompiled version of
<bits/stdc++.h> from being included, it doesn't prevent the
non-precompiled version being included by -include bits/stdc++.h in the
test flags. Use regsub to filter that out of the options as well.
libstdc++-v3/ChangeLog:
* testsuite/lib/dg-options.exp (add_options_for_no_pch): Remove
any "-include bits/stdc++.h" from options and add the macro to
the existing options instead of replacing them.
This patch adds support for the rounding mode API for
VFWREDOSUM.VS, as in the samples below.
* __riscv_vfwredosum_vs_f32m1_f64m1_rm
* __riscv_vfwredosum_vs_f32m1_f64m1_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(widen_freducop): Add frm_op_type template arg.
(vfwredosum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredosum_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-wredosum.c: New test.
This patch adds support for the rounding mode API for
VFREDOSUM.VS, as in the samples below.
* __riscv_vfredosum_vs_f32m1_f32m1_rm
* __riscv_vfredosum_vs_f32m1_f32m1_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(vfredosum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfredosum_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-redosum.c: New test.
This patch adds support for the rounding mode API for
VFREDUSUM.VS, as in the samples below.
* __riscv_vfredusum_vs_f32m1_f32m1_rm
* __riscv_vfredusum_vs_f32m1_f32m1_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class freducop): Add frm_op_type template arg.
(vfredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfredusum_frm): New intrinsic function def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct reduc_alu_frm_def): New class for frm shape.
(SHAPE): New declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-redusum.c: New test.
This patch adds support for the rounding mode API for
VFNCVT.F.{X|XU|F}.W, as in the samples below.
* __riscv_vfncvt_f_x_w_f32m1_rm
* __riscv_vfncvt_f_x_w_f32m1_rm_m
* __riscv_vfncvt_f_xu_w_f32m1_rm
* __riscv_vfncvt_f_xu_w_f32m1_rm_m
* __riscv_vfncvt_f_f_w_f32m1_rm
* __riscv_vfncvt_f_f_w_f32m1_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class vfncvt_f): Add frm_op_type template arg.
(vfncvt_f_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_f_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-ncvt-f.c: New test.
This patch adds support for the rounding mode API for
VFNCVT.XU.F.W, as in the samples below.
* __riscv_vfncvt_xu_f_w_u16mf2_rm
* __riscv_vfncvt_xu_f_w_u16mf2_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(vfncvt_xu_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_xu_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-ncvt-xu.c: New test.
This patch adds support for the rounding mode API for
VFNCVT.X.F.W, as in the samples below.
* __riscv_vfncvt_x_f_w_i16mf2_rm
* __riscv_vfncvt_x_f_w_i16mf2_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class vfncvt_x): Add frm_op_type template arg.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_x_frm): New intrinsic function def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct narrow_alu_frm_def): New shape function for frm.
(SHAPE): New declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-ncvt-x.c: New test.
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(ix86_check_avx10_vector_width): New function to check isa_flags
to emit a warning when there is a conflict in AVX10 options for
vector width.
(ix86_handle_option): Add check for avx10.1-256 and avx10.1-512.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append -mno-avx10-max-512bit for -march=native.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_1-15.c: New test.
* gcc.target/i386/avx10_1-16.c: Ditto.
* gcc.target/i386/avx10_1-17.c: Ditto.
* gcc.target/i386/avx10_1-18.c: Ditto.
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(ix86_check_avx10): New function to check isa_flags and
isa_flags_explicit to emit warning when AVX10 is enabled
by "-m" option.
(ix86_check_avx512): New function to check isa_flags and
isa_flags_explicit to emit warning when AVX512 is enabled
by "-m" option.
(ix86_handle_option): Do not change the flags when warning
is emitted.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append -mno-avx10.1 for -march=native.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_1-11.c: New test.
* gcc.target/i386/avx10_1-12.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-14.c: Ditto.
From: Yanzhang Wang <yanzhang.wang@intel.com>
The pattern is enabled for scalar but not for vector. This patch tries to
make it consistent and will convert the code below,
shortcut_for_riscv_vrsub_case_1_32:
vl1re32.v v1,0(a1)
vsetvli zero,a2,e32,m1,ta,ma
vrsub.vi v1,v1,-1
vs1r.v v1,0(a0)
ret
to,
shortcut_for_riscv_vrsub_case_1_32:
vl1re32.v v1,0(a1)
vsetvli zero,a2,e32,m1,ta,ma
vnot.v v1,v1
vs1r.v v1,0(a0)
ret
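The transformation rests on the two's complement identity -1 - x == ~x.
A scalar sketch of that identity follows; the function names are
hypothetical and the code models one vector element, not the RTL
simplification itself.

```cpp
#include <cassert>
#include <cstdint>

// Per-element effect of vrsub.vi vd, vs, -1: computes (-1) - x.
constexpr std::int32_t rsub_minus_one(std::int32_t x) { return -1 - x; }

// Per-element effect of vnot.v vd, vs: computes ~x.
constexpr std::int32_t bit_not(std::int32_t x) { return ~x; }

static_assert(rsub_minus_one(0) == bit_not(0), "0 maps to -1");
static_assert(rsub_minus_one(INT32_MAX) == bit_not(INT32_MAX), "max");
static_assert(rsub_minus_one(INT32_MIN) == bit_not(INT32_MIN), "min");

void check_rsub_is_not()
{
  for (std::int32_t x = -1000; x <= 1000; ++x)
    assert(rsub_minus_one(x) == bit_not(x));
}
```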
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Use
CONSTM1_RTX.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/simplify-vrsub.c: New test.
Like the support for conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71),
this just adds conditional not too.
Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional
not.
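The equivalence behind the new pattern can be checked on scalars: XOR with
an all-ones mask inverts every bit, and XOR with zero leaves the value
unchanged. The function names below are hypothetical, and the sketch models
a single element rather than the match.pd pattern.

```cpp
#include <cassert>
#include <cstdint>

// The form matched by the pattern: (a ? -1 : 0) ^ b.
constexpr std::int32_t mask_xor(bool a, std::int32_t b)
{
  return (a ? -1 : 0) ^ b;
}

// The conditional not it is turned into: a ? ~b : b.
constexpr std::int32_t cond_not(bool a, std::int32_t b)
{
  return a ? ~b : b;
}

static_assert(mask_xor(true, 0x1234) == cond_not(true, 0x1234), "inverted");
static_assert(mask_xor(false, 0x1234) == cond_not(false, 0x1234), "kept");

void check_cond_not()
{
  for (std::int32_t b = -100; b <= 100; ++b)
    {
      assert(mask_xor(true, b) == cond_not(true, b));
      assert(mask_xor(false, b) == cond_not(false, b));
    }
}
```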
OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* internal-fn.def (COND_NOT): New internal function.
* match.pd (UNCOND_UNARY, COND_UNARY): Add bit_not/not
to the lists.
(`vec (a ? -1 : 0) ^ b`): New pattern to convert
into conditional not.
* optabs.def (cond_one_cmpl): New optab.
(cond_len_one_cmpl): Likewise.
gcc/testsuite/ChangeLog:
PR target/110986
* gcc.target/aarch64/sve/cond_unary_9.c: New test.
This adds libstdc++-v3/include/bits/version.h so it has the correct timestamp.
Committed as obvious after running contrib/gcc_update --touch
contrib/ChangeLog:
* gcc_update: Add libstdc++-v3/include/bits/version.h.
Testcase gfortran.dg/bind_c_usage_13.f03 exhibited a memleak in the frontend
occurring when passing a character literal to a character,value dummy of a
bind(c) procedure, due to a missing cleanup in the conversion of the actual
argument expression. Reduced testcase:
program p
interface
subroutine val_c (c) bind(c)
use iso_c_binding, only: c_char
character(len=1,kind=c_char), value :: c
end subroutine val_c
end interface
call val_c ("A")
end
gcc/fortran/ChangeLog:
PR fortran/110360
* trans-expr.cc (conv_scalar_char_value): Use gfc_replace_expr to
avoid leaking replaced gfc_expr.
The callable used for resize_and_overwrite was being passed the string's
expanded capacity, which might be greater than the new size being
requested. This is not conforming, as the standard requires the same n
to be passed to the callable that the user passed to
resize_and_overwrite.
The existing tests didn't catch this because they all used a value which
was more than twice the existing capacity, so the _M_create call
allocated exactly what was requested, and the value passed to the
callable was correct. But when the requested size is greater than the
current capacity but smaller than twice the current capacity, _M_create
will allocate twice the current capacity and then that value was being
passed to the callable.
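The growth rule in the paragraph above can be modelled with a little
arithmetic. This is a sketch of the described policy only, not the
libstdc++ implementation; model_allocated is a hypothetical name and the
capacity of 15 is just an example value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Model of the described policy for a growing request (requested >
// capacity): allocate at least twice the current capacity.
constexpr std::size_t model_allocated(std::size_t requested,
                                      std::size_t capacity)
{
  return std::max(requested, 2 * capacity);
}

void check_growth_model()
{
  const std::size_t capacity = 15;
  // More than twice the capacity: allocation equals the request, so the
  // old code happened to pass the right n (the case the tests covered).
  assert(model_allocated(40, capacity) == 40);
  // Between capacity and twice the capacity: allocation is 30, but the
  // callable must still be given the requested n of 20.
  assert(model_allocated(20, capacity) == 30);
  assert(model_allocated(20, capacity) != 20);
}
```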
I noticed this because std::format(L"{}", 0.25) was producing L"0.25XX"
where the XX characters were whatever happened to be on the stack before
the call. When std::format used resize_and_overwrite to widen a string
it was copying too many characters into the destination and setting the
result's length too long. I've added a test for this case, and a new
test that doesn't hardcode -std=gnu++20 so can be used to test
std::format in C++23 and C++26 modes.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.tcc (resize_and_overwrite): Invoke
the callable with the same size as resize_and_overwrite was
called with.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Check with small values for the new size.
* testsuite/std/format/functions/format.cc: Check wide
formatting of double values that produce small strings.
* testsuite/std/format/functions/format_c++23.cc: New test.
The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.
If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.
But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.
2023-07-21 Surya Kumari Jangala <jskumari@linux.ibm.com>
gcc/
PR rtl-optimization/110254
* ira-color.cc (improve_allocation): Update array
allocated_hardreg_p.
These tests were derived from set.pass.cpp not set.pass.cc, specifically
pstl/test/std/algorithms/alg.sorting/alg.set.operations/set.pass.cpp in
the LLVM repo.
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/pstl/alg_sorting/set_difference.cc:
Fix name of upstream file this was derived from.
* testsuite/25_algorithms/pstl/alg_sorting/set_intersection.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_union.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h: Likewise.
Porting LRA to AVR revealed that creating a stack slot can make fp->sp
elimination impossible. The previous patches undid the fp assignment after
the stack slot creation but calculated live info incorrectly afterwards. This
resulted in wrong code generation by deleting some still-live insns. This
patch fixes this problem.
gcc/ChangeLog:
* lra-int.h (lra_update_fp2sp_elimination): Change the prototype.
* lra-eliminations.cc (spill_pseudos): Record spilled pseudos.
(lra_update_fp2sp_elimination): Ditto.
(update_reg_eliminate): Adjust spill_pseudos call.
* lra-spills.cc (lra_spill): Assign stack slots to pseudos spilled
in lra_update_fp2sp_elimination.
This commit replaces the ad-hoc logic in <version> with an AutoGen
database that (mostly) declaratively generates a version.h bit which
combines all of the FTM logic across all headers together.
This generated header defines macros of the form __glibcxx_foo,
equivalent to their __cpp_lib_foo variants, according to rules specified
in version.def and, optionally, if __glibcxx_want_foo or
__glibcxx_want_all are defined, also defines __cpp_lib_foo forms with
the same definition.
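The shape of one generated entry might look roughly like the sketch below.
It uses the placeholder name foo from the description; the value 202301L is
made up, and the real generated header may contain additional guards that
are not shown here.

```cpp
#include <cassert>

// A header that needs the standard-named macro requests it first.
#define __glibcxx_want_foo

// Assumed shape of the entry for the placeholder macro "foo".
// The value 202301L is invented for this sketch.
#define __glibcxx_foo 202301L
#if defined(__glibcxx_want_all) || defined(__glibcxx_want_foo)
# define __cpp_lib_foo __glibcxx_foo
#endif

#ifndef __cpp_lib_foo
# error "__cpp_lib_foo should have been requested above"
#endif

void check_ftm_sketch()
{
  static_assert(__cpp_lib_foo == __glibcxx_foo, "same definition");
  assert(__glibcxx_foo == 202301L);
}
```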
libstdc++-v3/ChangeLog:
* include/Makefile.am (bits_freestanding): Add version.h.
(allcreated): Add version.h.
(${bits_srcdir}/version.h): New rule. Regenerates
version.h out of version.{def,tpl}.
* include/Makefile.in: Regenerate.
* include/bits/version.def: New file. Declares a list of
all feature test macros, their values and their preconditions.
* include/bits/version.tpl: New file. Turns version.def
into a sequence of #if blocks.
* include/bits/version.h: New file. Generated from
version.def.
* include/std/version: Replace with a __glibcxx_want_all define
and bits/version.h include.
This patch adds support for the Cortex-A720 CPU to GCC.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-A720 CPU.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document Cortex-A720 CPU.
This patch adds vector average patterns
op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;
If there is no direct support, the vectorizer can synthesize the pattern
but, presumably, due to lack of narrowing operation support, won't try a
narrowing shift. Therefore, this patch implements the expanders
instead.
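In scalar terms, the two patterns compute a floor and a ceiling average
without losing the carry bit. The sketch below shows this for 8-bit
elements widened to 16 bits; the function names and the element width are
assumptions for illustration, not the expander code.

```cpp
#include <cassert>
#include <cstdint>

// Floor average: widen, add, shift right by one, then narrow.
constexpr std::uint8_t avg_floor(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>(
    (static_cast<std::uint16_t>(a) + static_cast<std::uint16_t>(b)) >> 1);
}

// Ceiling average: as above, but add one before shifting.
constexpr std::uint8_t avg_ceil(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>(
    (static_cast<std::uint16_t>(a) + static_cast<std::uint16_t>(b) + 1) >> 1);
}

static_assert(avg_floor(1, 2) == 1, "rounds down");
static_assert(avg_ceil(1, 2) == 2, "rounds up");

void check_avg()
{
  // The widened sum keeps the carry that an 8-bit addition would drop.
  assert(avg_floor(255, 255) == 255);
  assert(avg_floor(255, 254) == 254);
  assert(avg_ceil(255, 254) == 255);
}
```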
gcc/ChangeLog:
* config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor):
Implement expander.
(<u>avg<v_double_trunc>3_ceil): Ditto.
* config/riscv/vector-iterators.md (ashiftrt): New iterator.
(ASHIFTRT): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-template.h: New test.
This patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value). This is achieved by using
expand_convert_optab_fn instead of a separate expander function.
gcc/ChangeLog:
* internal-fn.cc (vec_extract_direct): Change type argument
numbers.
(expand_vec_extract_optab_fn): Call convert_optab_fn.
(direct_vec_extract_optab_supported_p): Use
convert_optab_supported_p.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.
The patch extends fold_vec_perm to fold VLA vector_csts.
For example:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { 0, len, ...} npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x
res = VEC_PERM_EXPR<arg0, arg1, sel>
--> { arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1
Eg 2:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
sel = {0, 1, 2, ...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
For this case the index 2 in sel is ambiguous for len 2 + 2x:
if x = 0, runtime vector length = 2 and sel[i] will choose arg1[0]
if x > 0, runtime vector length > 2 and sel[i] will choose arg0[2].
So we return NULL_TREE for this case.
This leads us to defining a constraint that a stepped sequence in sel,
should only select a particular pattern from a particular input vector.
Eg 3:
arg0 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { len, 0, 2, ... } npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel contains a single pattern with stepped sequence: {0, 2, ...}.
Let, a1 = the first element of stepped part of sequence, which is 0.
Let esel = number of total elements in stepped sequence.
Thus,
esel = len / sel_npatterns
= (4 + 4x) / 1
= 4 + 4x
Let S = step of the sequence, which is 2 in this case.
Let ae = last element of the stepped sequence.
Thus,
ae = a1 + (esel - 2) * S
= 0 + (4 + 4x - 2) * 2
= 4 + 8x
To ensure that we select elements from the same input vector,
a1 /trunc len = ae /trunc len.
Let, q1 = a1 /trunc len = 0 / (4 + 4x) = 0
Let, qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1
Since q1 != qe, we cross input vectors, and return NULL_TREE for this case.
However, if sel was:
sel = {len, 0, 1, ...}
The only change in this case is S = 1.
So,
ae = a1 + (esel - 2) * S
= 0 + (4 + 4x - 2) * 1
= 2 + 4x
In this case, a1/len == ae/len == 0, and the stepped sequence chooses all elements
from arg0.
Thus,
res = {arg1[0], arg0[0], arg0[1], ...}
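The arithmetic in Eg 3 can be replayed for concrete runtime lengths. The
sketch below evaluates the check for chosen values of x, whereas the patch
has to reason about all x at once; stays_in_one_input and its parameters
are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate the stepped-sequence check for one concrete vector length.
// a1: first element of the stepped part, step: S, len: vector length,
// sel_npatterns: number of patterns in sel.
constexpr bool stays_in_one_input(std::int64_t a1, std::int64_t step,
                                  std::int64_t len,
                                  std::int64_t sel_npatterns)
{
  const std::int64_t esel = len / sel_npatterns;
  const std::int64_t ae = a1 + (esel - 2) * step;
  return a1 / len == ae / len;
}

void check_stepped_sequence()
{
  for (std::int64_t x = 0; x < 4; ++x)
    {
      const std::int64_t len = 4 + 4 * x;
      // sel = { len, 0, 2, ... }: S = 2 crosses into the second input.
      assert(!stays_in_one_input(0, 2, len, 1));
      // sel = { len, 0, 1, ... }: S = 1 stays within arg0.
      assert(stays_in_one_input(0, 1, len, 1));
    }
}
```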
For VLA folding, sel has to conform to constraints imposed in
valid_mask_for_fold_vec_perm_cst_p.
test_fold_vec_perm_cst defines several unit-tests for VLA folding.
gcc/ChangeLog:
* fold-const.cc (INCLUDE_ALGORITHM): Add Include.
(valid_mask_for_fold_vec_perm_cst_p): New function.
(fold_vec_perm_cst): Likewise.
(fold_vec_perm): Adjust assert and call fold_vec_perm_cst.
(test_fold_vec_perm_cst): New namespace.
(test_fold_vec_perm_cst::build_vec_cst_rand): New function.
(test_fold_vec_perm_cst::validate_res): Likewise.
(test_fold_vec_perm_cst::validate_res_vls): Likewise.
(test_fold_vec_perm_cst::builder_push_elems): Likewise.
(test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise.
(test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise.
(test_fold_vec_perm_cst::test_all_nunits): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_2): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_4): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_8): Likewise.
(test_fold_vec_perm_cst::test_nunits_max_4): Likewise.
(test_fold_vec_perm_cst::is_simple_vla_size): Likewise.
(test_fold_vec_perm_cst::test): Likewise.
(fold_const_cc_tests): Call test_fold_vec_perm_cst::test.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
This patch adds support for the rounding mode API for
VFWCVT.XU.F.V, as in the samples below.
* __riscv_vfwcvt_xu_f_v_u64m2_rm
* __riscv_vfwcvt_xu_f_v_u64m2_rm_m
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_xu_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-wcvt-xu.c: New test.
With some combinations of build options, the default value may result in
the error below. This patch fixes it by passing an explicit
argument.
riscv-vector-builtins-bases.cc:2495:24: error: invalid use of template-name \
‘riscv_vector::vfcvt_f’ without an argument list
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Use explicit argument.