GCC modified for the FreeChainXenon project
![]() This patch fixes the known issues on SLP cases: ble a2,zero,.L11 addiw t1,a2,-1 li a5,15 bleu t1,a5,.L9 srliw a7,t1,4 slli a7,a7,7 lui t3,%hi(.LANCHOR0) lui a6,%hi(.LANCHOR0+128) addi t3,t3,%lo(.LANCHOR0) li a4,128 addi a6,a6,%lo(.LANCHOR0+128) add a7,a7,a0 addi a3,a1,37 mv a5,a0 vsetvli zero,a4,e8,m8,ta,ma vle8.v v24,0(t3) vle8.v v16,0(a6) .L4: li a6,128 vle8.v v0,0(a3) vrgather.vv v8,v0,v24 vadd.vv v8,v8,v16 vse8.v v8,0(a5) add a5,a5,a6 add a3,a3,a6 bne a5,a7,.L4 andi a5,t1,-16 mv t1,a5 .L3: subw a2,a2,a5 li a4,1 beq a2,a4,.L5 slli a5,a5,32 srli a5,a5,32 addiw a2,a2,-1 slli a5,a5,3 csrr a4,vlenb slli a6,a2,32 addi t3,a5,37 srli a3,a6,29 slli a4,a4,2 add t3,a1,t3 add a5,a0,a5 mv t5,a3 bgtu a3,a4,.L14 .L6: li a4,50790400 addi a4,a4,1541 li a6,67633152 addi a6,a6,513 slli a4,a4,32 add a4,a4,a6 vsetvli t4,zero,e64,m4,ta,ma vmv.v.x v16,a4 vsetvli a6,zero,e16,m8,ta,ma vid.v v8 vsetvli zero,t5,e8,m4,ta,ma vle8.v v20,0(t3) vsetvli a6,zero,e16,m8,ta,ma csrr a7,vlenb vand.vi v8,v8,-8 vsetvli zero,zero,e8,m4,ta,ma slli a4,a7,2 vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,t5,e8,m4,ta,ma vse8.v v4,0(a5) bgtu a3,a4,.L15 .L7: addw t1,a2,t1 .L5: slliw a5,t1,3 add a1,a1,a5 lui a4,%hi(.LC2) add a0,a0,a5 lbu a3,37(a1) addi a5,a4,%lo(.LC2) vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a3 vle8.v v2,0(a5) vadd.vv v1,v1,v2 vse8.v v1,0(a0) .L11: ret .L15: sub a3,a3,a4 bleu a3,a4,.L8 mv a3,a4 .L8: li a7,50790400 csrr a4,vlenb slli a4,a4,2 addi a7,a7,1541 li t4,67633152 add t3,t3,a4 vsetvli zero,a3,e8,m4,ta,ma slli a7,a7,32 addi t4,t4,513 vle8.v v20,0(t3) add a4,a5,a4 add a7,a7,t4 vsetvli a5,zero,e64,m4,ta,ma vmv.v.x v16,a7 vsetvli a6,zero,e16,m8,ta,ma vid.v v8 vand.vi v8,v8,-8 vsetvli zero,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,a3,e8,m4,ta,ma vse8.v v4,0(a4) j .L7 .L14: mv t5,a4 j .L6 .L9: li a5,0 li t1,0 j .L3 The vectorization codegen is quite inefficient since we choose a VLS modes to vectorize the loop body with epilogue choosing a VLA modes. cost.c:6:21: note: ***** Choosing vector mode V128QI cost.c:6:21: note: ***** Choosing epilogue vector mode RVVM4QI As we known, in RVV side, we have VLA modes and VLS modes. VLAmodes support partial vectors wheras VLSmodes support full vectors. The goal we add VLSmodes is to improve the codegen of known NITERS or SLP codes. If NITERS is unknown, that is i < n, n is unknown. We will always have partial vectors vectorization. It can be loop body or epilogue. In this case, It's always more efficient to apply VLA partial vectorization on loop body which doesn't have epilogue. After this patch: f: ble a2,zero,.L7 li a5,1 beq a2,a5,.L5 li a6,50790400 addi a6,a6,1541 li a4,67633152 addi a4,a4,513 csrr a5,vlenb addiw a2,a2,-1 slli a6,a6,32 add a6,a6,a4 slli a5,a5,2 slli a4,a2,32 vsetvli t1,zero,e64,m4,ta,ma srli a3,a4,29 neg t4,a5 addi a7,a1,37 mv a4,a0 vmv.v.x v12,a6 vsetvli t3,zero,e16,m8,ta,ma vid.v v16 vand.vi v16,v16,-8 .L4: minu a6,a3,a5 vsetvli zero,a6,e8,m4,ta,ma vle8.v v8,0(a7) vsetvli t3,zero,e8,m4,ta,ma mv t1,a3 vrgatherei16.vv v4,v8,v16 vsetvli zero,a6,e8,m4,ta,ma vadd.vv v4,v4,v12 vse8.v v4,0(a4) add a7,a7,a5 add a4,a4,a5 add a3,a3,t4 bgtu t1,a5,.L4 .L3: slliw a2,a2,3 add a1,a1,a2 lui a5,%hi(.LC0) lbu a4,37(a1) add a0,a0,a2 addi a5,a5,%lo(.LC0) vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a4 vle8.v v2,0(a5) vadd.vv v1,v1,v2 vse8.v v1,0(a0) .L7: ret Tested on both RV32 and RV64 no regression. Ok for trunk ? gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): VLA preempt VLS on unknown NITERS loop. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-1.c: Remove xfail. * gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto. |
||
---|---|---|
.github | ||
c++tools | ||
config | ||
contrib | ||
fixincludes | ||
gcc | ||
gnattools | ||
gotools | ||
include | ||
INSTALL | ||
libada | ||
libatomic | ||
libbacktrace | ||
libcc1 | ||
libcody | ||
libcpp | ||
libdecnumber | ||
libffi | ||
libgcc | ||
libgfortran | ||
libgm2 | ||
libgo | ||
libgomp | ||
libgrust | ||
libiberty | ||
libitm | ||
libobjc | ||
libphobos | ||
libquadmath | ||
libsanitizer | ||
libssp | ||
libstdc++-v3 | ||
libvtv | ||
lto-plugin | ||
maintainer-scripts | ||
zlib | ||
.dir-locals.el | ||
.gitattributes | ||
.gitignore | ||
ABOUT-NLS | ||
ar-lib | ||
ChangeLog | ||
ChangeLog.jit | ||
ChangeLog.tree-ssa | ||
compile | ||
config-ml.in | ||
config.guess | ||
config.rpath | ||
config.sub | ||
configure | ||
configure.ac | ||
COPYING | ||
COPYING.LIB | ||
COPYING.RUNTIME | ||
COPYING3 | ||
COPYING3.LIB | ||
depcomp | ||
install-sh | ||
libtool-ldflags | ||
libtool.m4 | ||
ltgcc.m4 | ||
ltmain.sh | ||
ltoptions.m4 | ||
ltsugar.m4 | ||
ltversion.m4 | ||
lt~obsolete.m4 | ||
MAINTAINERS | ||
Makefile.def | ||
Makefile.in | ||
Makefile.tpl | ||
missing | ||
mkdep | ||
mkinstalldirs | ||
move-if-change | ||
multilib.am | ||
README | ||
SECURITY.txt | ||
symlink-tree | ||
test-driver | ||
ylwrap |
This directory contains the GNU Compiler Collection (GCC). The GNU Compiler Collection is free software. See the files whose names start with COPYING for copying permission. The manuals, and some of the runtime libraries, are under different terms; see the individual source files for details. The directory INSTALL contains copies of the installation information as HTML and plain text. The source of this information is gcc/doc/install.texi. The installation information includes details of what is included in the GCC sources and what files GCC installs. See the file gcc/doc/gcc.texi (together with other files that it includes) for usage and porting information. An online readable version of the manual is in the files gcc/doc/gcc.info*. See http://gcc.gnu.org/bugs/ for how to report bugs usefully. Copyright years on GCC source files may be listed using range notation, e.g., 1987-2012, indicating that every year in the range, inclusive, is a copyrightable year that could otherwise be listed individually.