A number of methods were still not using the small size optimization,
which is to prefer an O(N) linear search to a hash computation as long
as N is small.
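As a rough illustration, the idea is the following (a minimal sketch in
C++, not the actual libstdc++ code; the node layout, helper name and
threshold value are all made up for the example):

#include <cstddef>

// Below a small threshold, scan the elements with key equality instead
// of paying for a hash computation plus a bucket lookup.
struct _Node { int key; _Node* next; };

_Node*
small_find (_Node* head, std::size_t count, int key)
{
  const std::size_t threshold = 20;		// assumed value
  if (count <= threshold)
    {
      for (_Node* n = head; n; n = n->next)	// O(N) linear search
	if (n->key == key)
	  return n;
      return nullptr;
    }
  // Otherwise fall back to the usual hash + bucket lookup (not shown).
  return nullptr;
}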
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h: Move comment about all equivalent values
being next to each other in the class documentation header.
(_M_reinsert_node, _M_merge_unique): Implement small size optimization.
(_M_find_tr, _M_count_tr, _M_equal_range_tr): Likewise.
Add benches on insert with hint and before begin cache.
libstdc++-v3/ChangeLog:
* testsuite/performance/23_containers/insert/54075.cc: Add lookup on unknown entries
w/o copy to see potential impact of memory fragmentation enhancements.
* testsuite/performance/23_containers/insert/unordered_multiset_hint.cc: Enhance hash
functor to make it perfect, exactly 1 entry per bucket. Also use hash functor tagged as
slow or not to bench w/o hash code cache.
* testsuite/performance/23_containers/insert/unordered_set_hint.cc: New test case. Like
previous one but using std::unordered_set.
* testsuite/performance/23_containers/insert/unordered_set_range_insert.cc: New test case.
Check performance of range-insertion compared to individual insertions.
* testsuite/performance/23_containers/insert_erase/unordered_small_size.cc: Add same bench
but after a copy to demonstrate impact of enhancements regarding memory fragmentation.
This fixes the test gcc.dg/gnu23-tag-4.c introduced by commit
23fee88f, which fails for -march=... because DECL_FIELD_BIT_OFFSET is
set inconsistently for types with and without a variable-sized field.
This is fixed by testing for DECL_ALIGN instead. The code is further
simplified by removing some unnecessary conditions, i.e. anon_field is
set unconditionally and all fields are assumed to be FIELD_DECLs.
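For reference, the kind of construct involved looks roughly like this
(a hedged sketch, not the actual c23-tag-9.c test):

/* With -std=c23, an identical redefinition of a tagged type is
   allowed and the two definitions are compatible; the bug made the
   compatibility check inconsistent for types containing a
   variable-sized field.  */
struct s { int a; char tail[]; };  /* flexible (variable-sized) field */
struct s { int a; char tail[]; };  /* identical redefinition: OK in C23 */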
gcc/c:
* c-typeck.cc (tagged_types_tu_compatible_p): Revise.
gcc/testsuite:
* gcc.dg/c23-tag-9.c: New test.
This patch disables use of FMA in the matrix multiplication loop for
generic (for x86-64-v3) and zen4. I tested this on zen4 and a Xeon
Gold 6212U.
For Intel this is neutral both on the matrix multiplication microbenchmark
(attached) and spec2k17 where the difference was within noise for Core.
On Core the micro-benchmark runs as follows:
With FMA:
     578,500,241      cycles:u         # 3.645 GHz            ( +- 0.12% )
     753,318,477      instructions:u   # 1.30 insn per cycle  ( +- 0.00% )
     125,417,701      branches:u       # 790.227 M/sec        ( +- 0.00% )
        0.159146 +- 0.000363 seconds time elapsed             ( +- 0.23% )
No FMA:
     577,573,960      cycles:u         # 3.514 GHz            ( +- 0.15% )
     878,318,479      instructions:u   # 1.52 insn per cycle  ( +- 0.00% )
     125,417,702      branches:u       # 763.035 M/sec        ( +- 0.00% )
        0.164734 +- 0.000321 seconds time elapsed             ( +- 0.19% )
So the cycle count is unchanged and the discrete multiply+add takes
the same time as FMA.
While on zen:
With FMA:
       484875179      cycles:u         # 3.599 GHz            ( +- 0.05% )  (82.11%)
       752031517      instructions:u   # 1.55 insn per cycle
       125106525      branches:u       # 928.712 M/sec        ( +- 0.03% )  (85.09%)
          128356      branch-misses:u  # 0.10% of all branches ( +- 0.06% ) (83.58%)
No FMA:
       375875209      cycles:u         # 3.592 GHz            ( +- 0.08% )  (80.74%)
       875725341      instructions:u   # 2.33 insn per cycle
       124903825      branches:u       # 1.194 G/sec          ( +- 0.04% )  (84.59%)
        0.105203 +- 0.000188 seconds time elapsed             ( +- 0.18% )
The difference is that Core understands that fmadd does not need all
three parameters to start computation, while Zen cores don't.
Since this seems a noticeable win on zen and no loss on Core, it seems
like a good default for generic.
#include <stdio.h>
#include <time.h>

#ifndef SIZE
#define SIZE 1000 /* assumed; the original posting does not show the value used */
#endif

float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];

void init(void)
{
  int i, j;
  for(i=0; i<SIZE; ++i)
    {
      for(j=0; j<SIZE; ++j)
        {
          a[i][j] = (float)i + j;
          b[i][j] = (float)i - j;
          c[i][j] = 0.0f;
        }
    }
}

void mult(void)
{
  int i, j, k;
  for(i=0; i<SIZE; ++i)
    {
      for(j=0; j<SIZE; ++j)
        {
          for(k=0; k<SIZE; ++k)
            {
              c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
}

int main(void)
{
  clock_t s, e;
  init();
  s=clock();
  mult();
  e=clock();
  printf(" mult took %10d clocks\n", (int)(e-s));
  return 0;
}
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS,
X86_TUNE_AVOID_256FMA_CHAINS): Enable for znver4 and Core.
In gimple the operation
short _8;
double _9;
_9 = (double) _8;
denotes two operations on AArch64. First we have to widen from short to
long and then convert this integer to a double.
Currently however we only count the widen/truncate operations:
(double) _5 6 times vec_promote_demote costs 12 in body
(double) _5 12 times vec_promote_demote costs 24 in body
but not the actual conversion operation, which needs an additional 12
instructions in the attached testcase. Without this the attached testcase ends
up incorrectly thinking that it's beneficial to vectorize the loop at a very
high VF = 8 (4x unrolled).
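A sketch of the shape of the loop affected (the attached testcase is
not reproduced here; the names and types below are assumed):

/* Each (double) conversion of a short element needs an integer widen
   followed by an int->FP convert, i.e. two vector operations.  */
void
f (double *restrict out, short *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (double) in[i];
}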
Because we can't change the mid-end to account for this, the costing
code in the backend now keeps track of whether the previous operation
was a promotion/demotion and adjusts the expected number of
instructions as follows:
1. If it's the first FLOAT_EXPR and the precision of the lhs and rhs
are different, double it, since we need to convert and promote.
2. If the previous operation was a demotion/promotion, reduce the cost
of the current operation by the amount added extra for the last one.
With the patch we get:
(double) _5 6 times vec_promote_demote costs 24 in body
(double) _5 12 times vec_promote_demote costs 36 in body
which correctly accounts for 30 operations.
This fixes the 16% regression in imagick in SPECCPU 2017 reported on Neoverse N2
and using the new generic Armv9-a cost model.
gcc/ChangeLog:
PR target/110625
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
Adjust throughput and latency calculations for vector conversions.
(class aarch64_vector_costs): Add m_num_last_promote_demote.
gcc/testsuite/ChangeLog:
PR target/110625
* gcc.target/aarch64/pr110625_4.c: New test.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Add
--param aarch64-sve-compare-costs=0.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
gcc/ChangeLog:
* config/loongarch/loongarch.md (bstrins_<mode>_for_ior_mask):
For the condition, remove unneeded trailing "\" and move "&&" to
follow GNU coding style. NFC.
The problem with peephole2 is that it uses a naive sliding-window
algorithm and misses many cases. For example:
float a[10000];
float t() { return a[0] + a[8000]; }
is compiled to:
la.local $r13,a
la.local $r12,a+32768
fld.s $f1,$r13,0
fld.s $f0,$r12,-768
fadd.s $f0,$f1,$f0
by trunk. But as we've explained in r14-4851, the following would be
better with -mexplicit-relocs=auto:
pcalau12i $r13,%pc_hi20(a)
pcalau12i $r12,%pc_hi20(a+32000)
fld.s $f1,$r13,%pc_lo12(a)
fld.s $f0,$r12,%pc_lo12(a+32000)
fadd.s $f0,$f1,$f0
However, the sliding-window algorithm just won't detect the
pcalau12i/fld pair to be optimized. Using a define_insn_and_rewrite in
the combine pass works around the issue.
gcc/ChangeLog:
* config/loongarch/predicates.md
(symbolic_pcrel_offset_operand): New define_predicate.
(mem_simple_ldst_operand): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_rewrite_mem_for_simple_ldst): Declare.
* config/loongarch/loongarch.cc
(loongarch_rewrite_mem_for_simple_ldst): Implement.
* config/loongarch/loongarch.md (simple_load<mode>): New
define_insn_and_rewrite.
(simple_load_<su>ext<SUBDI:mode><GPR:mode>): Likewise.
(simple_store<mode>): Likewise.
(define_peephole2): Remove la.local/[f]ld peepholes.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c:
New test.
The post-reload splitter currently allows xmm16+ registers with TARGET_EVEX512.
The splitter changes SFmode of the output operand to V4SFmode, but the vector
mode is currently unsupported in xmm16+ without TARGET_AVX512VL. lowpart_subreg
returns NULL_RTX in this case and the compilation fails with invalid RTX.
The patch removes support for x/ymm16+ registers with TARGET_EVEX512. The
support should be restored once ix86_hard_regno_mode_ok is fixed to allow
16-byte modes in x/ymm16+ with TARGET_EVEX512.
PR target/113133
gcc/ChangeLog:
* config/i386/i386.md
(TARGET_USE_VECTOR_FP_CONVERTS SF->DF float_extend splitter):
Do not handle xmm16+ with TARGET_EVEX512.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113133-1.c: New test.
* gcc.target/i386/pr113133-2.c: New test.
This fixes the gcc.dg/tree-ssa/gen-vect-26.c testcase by adding
`#pragma GCC novector` in front of the loop that is doing the checking
of the result. We only want to test the first loop to see if it can be
vectorized.
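The change itself is mechanical; the checking loop now looks roughly
like this (a sketch, the array and variable names are assumed):

  /* Only the first loop is under test, so keep the vectorizer away
     from the result check.  */
  #pragma GCC novector
  for (i = 0; i < N; i++)
    if (a[i] != expected[i])
      abort ();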
Committed as obvious after testing on x86_64-linux-gnu with -m32.
gcc/testsuite/ChangeLog:
PR testsuite/113167
* gcc.dg/tree-ssa/gen-vect-26.c: Mark the test/check loop
as novector.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
The redundant dump check is fragile, easily broken by unrelated
changes, and not necessary.
Tested on both RV32/RV64 with no regressions.
Removed it and committed.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks.
Notice we have the following situation:
vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5
The root cause is that we blindly transform COND_LEN_xxx into VLMAX
AVL when len == NUNITS. However, we don't need to transform all of
them, since when len is in the range [0,31] it fits the vsetivli
immediate and we don't need to consume a scalar register (a sketch of
the new check follows the code below).
After this patch:
vsetivli zero,4,e32,m1,tu,ma
addi a4,a5,400
vlseg4e32.v v12,(a3)
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vlseg4e32.v v4,(a4)
vfadd.vf v2,v14,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5
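The new check, roughly (a sketch; the exact predicate in riscv-v.cc
may differ in details such as how the immediate range is tested):

/* LEN is a VLMAX AVL only if it is a known constant equal to the
   mode's NUNITS and does not fit the 5-bit unsigned immediate range
   [0,31] of vsetivli.  */
static bool
is_vlmax_len_p (machine_mode mode, rtx len)
{
  poly_int64 value;
  return poly_int_rtx_p (len, &value)
	 && known_eq (value, GET_MODE_NUNITS (mode))
	 && !(CONST_INT_P (len) && IN_RANGE (INTVAL (len), 0, 31));
}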
Tested on both RV32 and RV64 with no regressions.
OK for trunk?
gcc/ChangeLog:
* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
(expand_load_store): Disallow transformation into VLMAX when len is in
the range [0,31].
(expand_cond_len_op): Ditto.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.
(expand_fold_extract_last): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.
Separate out -fdump-* options to the new section. Sort by option name.
While there, document -save-temps intermediates.
gcc/fortran/ChangeLog:
PR fortran/81615
* invoke.texi: Add Developer Options section. Move '-fdump-*'
to it. Add small examples about changed '-save-temps' behavior.
Signed-off-by: Rimvydas Jasinskas <rimvydas.jas@gmail.com>
The template linkage2.C and linkage3.C testcases expect a
decoration that does not match AIX assembler syntax. Expect failure.
gcc/testsuite/ChangeLog:
* g++.dg/template/linkage2.C: XFAIL on AIX.
* g++.dg/template/linkage3.C: Same.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
Move ix86_expand_unary_operator from i386.cc to i386-expand.cc, re-arrange
prototypes and do some cosmetic changes with the usage of TARGET_APX_NDD.
No functional changes.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_unary_operator_ok): Move from here...
* config/i386/i386-expand.cc (ix86_unary_operator_ok): ... to here.
* config/i386/i386-protos.h: Re-arrange ix86_{unary|binary}_operator_ok
and ix86_expand_{unary|binary}_operator prototypes.
* config/i386/i386.md: Cosmetic changes with the usage of
TARGET_APX_NDD in ix86_expand_{unary|binary}_operator
and ix86_{unary|binary}_operator_ok function calls.
The GCC internal doc says:
X might be a pseudo-register or a 'subreg' of a pseudo-register,
which could either be in a hard register or in memory. Use
'true_regnum' to find out; it will return -1 if the pseudo is in
memory and the hard register number if it is in a register.
So "MEM_P (x)" is not enough for checking if we are reloading from/to
the memory. This bug has caused reload pass to stall and finally ICE
complaining with "maximum number of generated reload insns per insn
achieved", since r14-6814.
Check if "true_regnum (x)" is -1 besides "MEM_P (x)" to fix the issue.
gcc/ChangeLog:
PR target/113148
* config/loongarch/loongarch.cc (loongarch_secondary_reload):
Check if regno == -1 besides MEM_P (x) for reloading FCCmode
from/to FPR to/from memory.
gcc/testsuite/ChangeLog:
PR target/113148
* gcc.target/loongarch/pr113148.c: New test.
gcc/ChangeLog:
* config/loongarch/loongarch.md (rotl<mode>3):
New define_expand.
* config/loongarch/simd.md (vrotl<mode>3): Likewise.
(rotl<mode>3): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/rotl-with-rotr.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-d.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-d.c: New test.
Following code will cause ICE on LoongArch target:
#include <lsxintrin.h>
extern void bar (__m128i, __m128i);
__m128i a;
void
foo ()
{
  bar (a, a);
}
It is caused by a missing constraint definition in mov<mode>_lsx.
This patch fixes the template and removes the unnecessary processing
from the loongarch_split_move () function.
This patch also cleans up the redundant definitions in
loongarch_split_move () and loongarch_split_move_p ().
gcc/ChangeLog:
* config/loongarch/lasx.md: Use loongarch_split_move and
loongarch_split_move_p directly.
* config/loongarch/loongarch-protos.h
(loongarch_split_move): Remove unnecessary argument.
(loongarch_split_move_insn_p): Delete.
(loongarch_split_move_insn): Delete.
* config/loongarch/loongarch.cc
(loongarch_split_move_insn_p): Delete.
(loongarch_load_store_insns): Use loongarch_split_move_p
directly.
(loongarch_split_move): Remove the unnecessary processing.
(loongarch_split_move_insn): Delete.
* config/loongarch/lsx.md: Use loongarch_split_move and
loongarch_split_move_p directly.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lsx/lsx-mov-1.c: New test.
When investigating the failure of gcc.dg/vect/slp-reduc-sad.c, the
following instruction block was found to be generated by
vec_concatv32qi (which is generated by vec_initv32qiv16qi) at the
entrance of the foo() function:
vldx $vr3,$r5,$r6
vld $vr2,$r5,0
xvpermi.q $xr2,$xr3,0x20
which reverses the high and low 128-bit parts of the
vec_initv32qiv16qi operation.
According to other targets' similar implementations, and the LSX
implementation for the following RTL representation, the current
definition of "vec_concat<mode>" in lasx.md is wrong:
(set (op0) (vec_concat (op1) (op2)))
For correct behavior, the last argument of xvpermi.q should be 0x02
instead of 0x20. This patch fixes the issue and cleans up the
vec_concat template implementation.
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_concatv4di): Delete.
(vec_concatv8si): Delete.
(vec_concatv16hi): Delete.
(vec_concatv32qi): Delete.
(vec_concatv4df): Delete.
(vec_concatv8sf): Delete.
(vec_concat<mode>): New template with insn output fixed.
We found that using the latest compiled gcc causes a miscompare error
when running the spec2006 400.perlbench test with -flto turned on.
After testing, it was found that only the LoongArch architecture
reports the error.
The first bad commit was located through the git bisect command as
r14-3773-g5b857e87201335. Through debugging, it was found that the
problem was that the split condition of the
*bstrins_<mode>_for_ior_mask template was empty, when it should
actually be consistent with the insn condition.
gcc/ChangeLog:
* config/loongarch/loongarch.md: Adjust.
Remove the P7 CPU test, as only P7 and above can enter this function
and P7 LE is excluded by the check of targetm.slow_unaligned_access on
word_mode. Also, performance testing shows that the expansion of block
compare is better than the library call on P7 BE when the length is
from 16 bytes to 64 bytes.
gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Assert
that only P7 and above can enter this function. Remove the P7 CPU
test and let P7 BE do the expansion.
gcc/testsuite/
* gcc.target/powerpc/block-cmp-4.c: New.
The macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in
rs6000-string.cc to guard platforms that are efficient on fixed-point
unaligned load/store. It's originally defined by
TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled from P8 and can be
disabled by the -mno-vsx option, so the definition is improper. This
patch corrects it to call slow_unaligned_access to judge whether
fixed-point unaligned load/store is efficient or not.
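The shape of the replacement check, as a sketch (the call-site context
and the 'align' variable are assumed):

  /* Before: guarded by the TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
     macro.  After: ask the target hook directly whether an unaligned
     access in this mode at this alignment is slow.  */
  if (!targetm.slow_unaligned_access (word_mode, align))
    {
      /* ... emit the overlapping unaligned load/store sequence ...  */
    }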
gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Remove.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
targetm.slow_unaligned_access.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.
gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: New.
* gcc.target/powerpc/block-cmp-2.c: New.
AIX sections use the csect directive to name a section. Check for the
csect name in the attr-section testcases.
gcc/testsuite/ChangeLog:
* g++.dg/ext/attr-section1.C: Test for csect section directive.
* g++.dg/ext/attr-section1a.C: Same.
* g++.dg/ext/attr-section2.C: Same.
* g++.dg/ext/attr-section2a.C: Same.
* g++.dg/ext/attr-section2b.C: Same.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
The out-of-bounds diagram tests fail on AIX.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-17.c: Skip on AIX.
* gcc.dg/analyzer/out-of-bounds-diagram-18.c: Same.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
AIX does not support stack scrubbing. Set strub as unsupported on AIX
and ensure that testcases check for strub support.
gcc/testsuite/ChangeLog:
* c-c++-common/strub-unsupported-2.c: Require strub.
* c-c++-common/strub-unsupported-3.c: Same.
* c-c++-common/strub-unsupported.c: Same.
* lib/target-supports.exp (check_effective_target_strub): Return 0
for AIX.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
The two testcases are for targets that support FMA, and pr110279-2.c
assumes the reassoc_width of FMUL to be 4.
This patch adds the missing options to fix regression test failures on
nvptx/GCN (where the default reassoc_width of FMUL is 1) and x86_64
(which needs "-mfma").
gcc/testsuite/ChangeLog:
* gcc.dg/pr110279-1.c: Add "-mcpu=generic" for aarch64; add
"-mfma" for x86_64.
* gcc.dg/pr110279-2.c: Replace "-march=armv8.2-a" with
"-mcpu=generic"; limit the check to be on aarch64.
Add a dg-require-effective-target directive so that the test case runs
only on powerpc_pcrel targets.
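Concretely, the added directive has the following form (its placement
within the file is assumed):

/* { dg-require-effective-target powerpc_pcrel } */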
2023-12-26 Jeevitha Palanisamy <jeevitha@linux.ibm.com>
gcc/testsuite/
PR target/110320
* gcc.target/powerpc/pr110320-1.c: Add dg-require-effective-target powerpc_pcrel.
Currently, we compute RVV V_REGS liveness during
better_main_loop_than_p, which is not an appropriate time to do that:
for example, when the code will finally pick an LMUL = 8 vectorization
factor, we compute liveness for LMUL = 8 multiple times, which is
redundant.
Since we leverage the current ARM SVE cost model:
  /* Do one-time initialization based on the vinfo.  */
  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
  if (!m_analyzed_vinfo)
    {
      if (loop_vinfo)
	analyze_loop_vinfo (loop_vinfo);
      m_analyzed_vinfo = true;
    }
which analyzes the cost model only once for each vinfo.
So here we move dynamic LMUL liveness information into analyze_loop_vinfo.
/* Do one-time initialization of the costs given that we're
   costing the loop vectorization described by LOOP_VINFO.  */
void
costs::analyze_loop_vinfo (loop_vec_info loop_vinfo)
{
  ...
  /* Detect whether the LOOP has unexpected spills.  */
  record_potential_unexpected_spills (loop_vinfo);
}
This avoids redundant computations, and the dynamic LMUL cost model
flow is now much more reasonable and consistent with the others.
Tested on RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Allow
fractional vector.
(preferred_new_lmul_p): Move RVV V_REGS liveness computation into analyze_loop_vinfo.
(has_unexpected_spills_p): New function.
(costs::record_potential_unexpected_spills): Ditto.
(costs::better_main_loop_than_p): Move RVV V_REGS liveness computation into
analyze_loop_vinfo.
* config/riscv/riscv-vector-costs.h: New functions and variables.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: Robustify test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Ditto.
When configured with --enable-checking=release we get a false positive
on the use of vec_stmts, as the compiler seems unable to notice that
it gets initialized through the pass-by-reference.
This explicitly initializes the local.
gcc/ChangeLog:
PR bootstrap/113132
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize vec_stmts.
Normally, GPR2 is the TOC pointer and is defined as a fixed and
non-volatile register. However, it can be used as volatile for PCREL
addressing. Therefore, r2 is changed to non-fixed in FIXED_REGISTERS
and set back to fixed if PCREL is not in use, and also when the user
explicitly requests the TOC or fixed registers. If r2 is fixed, it is
made non-volatile. Changes in register preservation roles can be
accomplished with the help of the available target hook
(TARGET_CONDITIONAL_REGISTER_USAGE).
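A sketch of the idea in the hook (the predicate name and placement are
assumptions, not the committed code):

static void
rs6000_conditional_register_usage (void)
{
  /* ... existing code ...  */
  /* With PCREL there is no TOC pointer to preserve, so r2 can stay
     non-fixed and volatile; otherwise re-fix it here.  */
  if (!rs6000_pcrel_p ())	/* assumed predicate name */
    fixed_regs[2] = 1;		/* r2 is the TOC pointer */
}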
2023-12-24 Jeevitha Palanisamy <jeevitha@linux.ibm.com>
gcc/
PR target/110320
* config/rs6000/rs6000.cc (rs6000_conditional_register_usage): Change
GPR2 to volatile and non-fixed register for PCREL.
* config/rs6000/rs6000.h (FIXED_REGISTERS): Modify GPR2 to not fixed.
gcc/testsuite/
PR target/110320
* gcc.target/powerpc/pr110320-1.c: New testcase.
* gcc.target/powerpc/pr110320-2.c: New testcase.
* gcc.target/powerpc/pr110320-3.c: New testcase.
Co-authored-by: Peter Bergner <bergner@linux.ibm.com>
In the testcase provided, we would match f_plus but not g_plus
due to a missing `:c` on the plus operator. This fixes the oversight
there.
This was noted in https://github.com/llvm/llvm-project/issues/76318 .
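For illustration, the shape of the two functions (a sketch; the actual
bodies in phi-opt-same-2.c may differ):

int f_plus (int a, int b) { return a != b ? a + b : 2 * a; }
/* Operands swapped; without `:c` on the plus only f_plus matched,
   even though both fold to a + b.  */
int g_plus (int a, int b) { return a != b ? b + a : 2 * a; }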
Committed as obvious after bootstrap/test on x86_64-linux-gnu.
PR tree-optimization/19832
gcc/ChangeLog:
* match.pd (`(a != b) ? (a + b) : (2 * a)`): Add `:c`
on the plus operator.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-same-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
The following three tests now work correctly for targets that have an
implementation of cbranch for vectors, so the XFAILs are conditionally
removed, gated on vect_early_break support.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break
supported.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.
This adds new tests to check all of the early break functionality.
It includes a number of codegen and runtime tests checking the values at
different needles in the array.
They also check the values on different array sizes and peeling positions,
datatypes, VL, ncopies and every other variant I could think of.
Additionally it also contains reduced cases from issues found running over
various codebases.
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Also regtested with:
-march=armv8.3-a+sve
-march=armv8.3-a+nosve
-march=armv9-a
Bootstrapped and regtested on x86_64-pc-linux-gnu with no issues.
On the tests where I have disabled x86_64, it's because the target is
missing cbranch for all types. I think it should be possible to add
them for the missing types, since all we care about is whether a bit
is set or not.
Bootstrap and regtest on arm-none-linux-gnueabihf are still running,
as is testing on arm-none-eabi with -march=armv8.1-m.main+mve
-mfpu=auto.
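For what it's worth, a sketch of how an "any bit set" vector branch
could be done on x86 independent of the element type (an illustration
only, not a proposed patch; requires SSE4.1):

#include <immintrin.h>

/* PTEST sets ZF iff (mask & mask) == 0, so a single instruction
   answers "is any bit of the vector set?" for every element type.  */
int
any_lane_set (__m128i mask)
{
  return !_mm_testz_si128 (mask, mask);
}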
gcc/ChangeLog:
* doc/sourcebuild.texi (check_effective_target_vect_early_break_hw,
check_effective_target_vect_early_break): Document.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (add_options_for_vect_early_break,
check_effective_target_vect_early_break_hw,
check_effective_target_vect_early_break): New.
* g++.dg/vect/vect-early-break_1.cc: New test.
* g++.dg/vect/vect-early-break_2.cc: New test.
* g++.dg/vect/vect-early-break_3.cc: New test.
* gcc.dg/vect/vect-early-break-run_1.c: New test.
* gcc.dg/vect/vect-early-break-run_10.c: New test.
* gcc.dg/vect/vect-early-break-run_2.c: New test.
* gcc.dg/vect/vect-early-break-run_3.c: New test.
* gcc.dg/vect/vect-early-break-run_4.c: New test.
* gcc.dg/vect/vect-early-break-run_5.c: New test.
* gcc.dg/vect/vect-early-break-run_6.c: New test.
* gcc.dg/vect/vect-early-break-run_7.c: New test.
* gcc.dg/vect/vect-early-break-run_8.c: New test.
* gcc.dg/vect/vect-early-break-run_9.c: New test.
* gcc.dg/vect/vect-early-break-template_1.c: New test.
* gcc.dg/vect/vect-early-break-template_2.c: New test.
* gcc.dg/vect/vect-early-break_1.c: New test.
* gcc.dg/vect/vect-early-break_10.c: New test.
* gcc.dg/vect/vect-early-break_11.c: New test.
* gcc.dg/vect/vect-early-break_12.c: New test.
* gcc.dg/vect/vect-early-break_13.c: New test.
* gcc.dg/vect/vect-early-break_14.c: New test.
* gcc.dg/vect/vect-early-break_15.c: New test.
* gcc.dg/vect/vect-early-break_16.c: New test.
* gcc.dg/vect/vect-early-break_17.c: New test.
* gcc.dg/vect/vect-early-break_18.c: New test.
* gcc.dg/vect/vect-early-break_19.c: New test.
* gcc.dg/vect/vect-early-break_2.c: New test.
* gcc.dg/vect/vect-early-break_20.c: New test.
* gcc.dg/vect/vect-early-break_21.c: New test.
* gcc.dg/vect/vect-early-break_22.c: New test.
* gcc.dg/vect/vect-early-break_23.c: New test.
* gcc.dg/vect/vect-early-break_24.c: New test.
* gcc.dg/vect/vect-early-break_25.c: New test.
* gcc.dg/vect/vect-early-break_26.c: New test.
* gcc.dg/vect/vect-early-break_27.c: New test.
* gcc.dg/vect/vect-early-break_28.c: New test.
* gcc.dg/vect/vect-early-break_29.c: New test.
* gcc.dg/vect/vect-early-break_3.c: New test.
* gcc.dg/vect/vect-early-break_30.c: New test.
* gcc.dg/vect/vect-early-break_31.c: New test.
* gcc.dg/vect/vect-early-break_32.c: New test.
* gcc.dg/vect/vect-early-break_33.c: New test.
* gcc.dg/vect/vect-early-break_34.c: New test.
* gcc.dg/vect/vect-early-break_35.c: New test.
* gcc.dg/vect/vect-early-break_36.c: New test.
* gcc.dg/vect/vect-early-break_37.c: New test.
* gcc.dg/vect/vect-early-break_38.c: New test.
* gcc.dg/vect/vect-early-break_39.c: New test.
* gcc.dg/vect/vect-early-break_4.c: New test.
* gcc.dg/vect/vect-early-break_40.c: New test.
* gcc.dg/vect/vect-early-break_41.c: New test.
* gcc.dg/vect/vect-early-break_42.c: New test.
* gcc.dg/vect/vect-early-break_43.c: New test.
* gcc.dg/vect/vect-early-break_44.c: New test.
* gcc.dg/vect/vect-early-break_45.c: New test.
* gcc.dg/vect/vect-early-break_46.c: New test.
* gcc.dg/vect/vect-early-break_47.c: New test.
* gcc.dg/vect/vect-early-break_48.c: New test.
* gcc.dg/vect/vect-early-break_49.c: New test.
* gcc.dg/vect/vect-early-break_5.c: New test.
* gcc.dg/vect/vect-early-break_50.c: New test.
* gcc.dg/vect/vect-early-break_51.c: New test.
* gcc.dg/vect/vect-early-break_52.c: New test.
* gcc.dg/vect/vect-early-break_53.c: New test.
* gcc.dg/vect/vect-early-break_54.c: New test.
* gcc.dg/vect/vect-early-break_55.c: New test.
* gcc.dg/vect/vect-early-break_56.c: New test.
* gcc.dg/vect/vect-early-break_57.c: New test.
* gcc.dg/vect/vect-early-break_58.c: New test.
* gcc.dg/vect/vect-early-break_59.c: New test.
* gcc.dg/vect/vect-early-break_6.c: New test.
* gcc.dg/vect/vect-early-break_60.c: New test.
* gcc.dg/vect/vect-early-break_61.c: New test.
* gcc.dg/vect/vect-early-break_62.c: New test.
* gcc.dg/vect/vect-early-break_63.c: New test.
* gcc.dg/vect/vect-early-break_64.c: New test.
* gcc.dg/vect/vect-early-break_65.c: New test.
* gcc.dg/vect/vect-early-break_66.c: New test.
* gcc.dg/vect/vect-early-break_67.c: New test.
* gcc.dg/vect/vect-early-break_68.c: New test.
* gcc.dg/vect/vect-early-break_69.c: New test.
* gcc.dg/vect/vect-early-break_7.c: New test.
* gcc.dg/vect/vect-early-break_70.c: New test.
* gcc.dg/vect/vect-early-break_71.c: New test.
* gcc.dg/vect/vect-early-break_72.c: New test.
* gcc.dg/vect/vect-early-break_73.c: New test.
* gcc.dg/vect/vect-early-break_74.c: New test.
* gcc.dg/vect/vect-early-break_75.c: New test.
* gcc.dg/vect/vect-early-break_76.c: New test.
* gcc.dg/vect/vect-early-break_77.c: New test.
* gcc.dg/vect/vect-early-break_78.c: New test.
* gcc.dg/vect/vect-early-break_79.c: New test.
* gcc.dg/vect/vect-early-break_8.c: New test.
* gcc.dg/vect/vect-early-break_80.c: New test.
* gcc.dg/vect/vect-early-break_81.c: New test.
* gcc.dg/vect/vect-early-break_82.c: New test.
* gcc.dg/vect/vect-early-break_83.c: New test.
* gcc.dg/vect/vect-early-break_84.c: New test.
* gcc.dg/vect/vect-early-break_85.c: New test.
* gcc.dg/vect/vect-early-break_86.c: New test.
* gcc.dg/vect/vect-early-break_87.c: New test.
* gcc.dg/vect/vect-early-break_88.c: New test.
* gcc.dg/vect/vect-early-break_89.c: New test.
* gcc.dg/vect/vect-early-break_9.c: New test.
* gcc.dg/vect/vect-early-break_90.c: New test.
* gcc.dg/vect/vect-early-break_91.c: New test.
* gcc.dg/vect/vect-early-break_92.c: New test.
* gcc.dg/vect/vect-early-break_93.c: New test.
This adds an implementation of the conditional branch optab for AArch64.
For example:
void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}
For 128-bit vectors we generate:
cmgt v1.4s, v1.4s, #0
umaxp v1.4s, v1.4s, v1.4s
fmov x3, d1
cbnz x3, .L8
and for 64-bit vectors we can omit the compression:
cmgt v1.2s, v1.2s, #0
fmov x2, d1
cbz x2, .L13
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/vect-early-break-cbranch.c: New test.
* gcc.target/aarch64/vect-early-break-cbranch.c: New test.