GCC modified for the FreeChainXenon project
Tamar Christina c98aabc142 AArch64: Add implementation for pow2 bitmask division.
This adds an AArch64 implementation of the new optab for unsigned division
by a pow2 bitmask (a constant of the form 2^n - 1, such as 0xff).

The implementation rewrites:

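   x = y / (2 ^ (bitsize (y) / 2) - 1)

where ^ denotes exponentiation and bitsize (y) is the width of y in bits,
into, for example for bytes (a 16-bit y divided by 0xff):

   (y + ((y + 257) >> 8)) >> 8

The additions must be done in double the precision of x so that no bits
are lost when the intermediate sums overflow x's type.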

Essentially, the sequence decomposes the division into two smaller
divisions, one for the top half and one for the bottom half of the number,
and adds the results back together.

Because a shift by 8 divides by 256 rather than 255, we add 1 to both
halves of y (257 is 0x0101) so that an input of 255 still yields 1.
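A short derivation shows why adding 257 makes the byte case exact. It is a sketch that assumes 0 ≤ y ≤ 65535 and writes y = 255q + r; t is the inner term ((y + 257) >> 8), and d is a name introduced only for this derivation:

```latex
\begin{aligned}
&y = 255q + r,\quad 0 \le r \le 254,\quad 0 \le y \le 65535
  \;\Rightarrow\; 0 \le q \le 257 \\
&y + 257 = 256q + (r + 257 - q),\qquad 0 \le r + 257 - q \le 511 \\
&t = \left\lfloor \tfrac{y + 257}{256} \right\rfloor = q + d,\qquad d \in \{0, 1\} \\
&y + t = 256q + (r + d),\qquad 0 \le r + d \le 255 \\
&\left\lfloor \tfrac{y + t}{256} \right\rfloor = q = \left\lfloor \tfrac{y}{255} \right\rfloor
\end{aligned}
```

The inner shift gives an estimate of the quotient that is off by at most one, and the outer shift absorbs that error because r + d never reaches 256.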

Because the shift amount is half the width of the original data type, we
can use the high-half narrowing instructions the ISA provides (such as
ADDHN, with UZP2 to pick out the high bytes) instead of actual shifts.

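On AArch64, this means that for:

#include <stdint.h>

/* Scale each pixel by level/255; the loop bound is rounded down to a
   multiple of 16.  */
void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

we now generate: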

	movi    v3.16b, 0x1
	umull2  v1.8h, v0.16b, v2.16b
	umull   v0.8h, v0.8b, v2.8b
	addhn   v5.8b, v1.8h, v3.8h
	addhn   v4.8b, v0.8h, v3.8h
	uaddw   v1.8h, v1.8h, v5.8b
	uaddw   v0.8h, v0.8h, v4.8b
	uzp2    v0.16b, v0.16b, v1.16b

instead of:

	umull   v2.8h, v1.8b, v5.8b
	umull2  v1.8h, v1.16b, v5.16b
	umull   v0.4s, v2.4h, v3.4h
	umull2  v2.4s, v2.8h, v3.8h
	umull   v4.4s, v1.4h, v3.4h
	umull2  v1.4s, v1.8h, v3.8h
	uzp2    v0.8h, v0.8h, v2.8h
	uzp2    v1.8h, v4.8h, v1.8h
	shrn    v0.8b, v0.8h, 7
	shrn2   v0.16b, v1.8h, 7

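This results in significantly faster code: the four 32-bit widening
multiplies (umull/umull2 into .4s lanes) are replaced by cheap
add-and-narrow and widening-add instructions.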

Thanks to Wilco for the concept.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv<mode>3): New.
	* config/aarch64/aarch64.cc (aarch64_vectorize_can_special_div_by_constant): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/div-by-bitmask.c: New test.
2022-11-14 17:41:33 +00:00
c++tools
config Revert "sphinx: support Sphinx in build system" 2022-11-14 09:35:06 +01:00
contrib gcc-changelog: temporarily disable check_line_start 2022-11-14 03:52:14 +01:00
fixincludes Daily bump. 2022-10-08 00:17:29 +00:00
gcc AArch64: Add implementation for pow2 bitmask division. 2022-11-14 17:41:33 +00:00
gnattools Daily bump. 2022-09-01 00:17:39 +00:00
gotools Daily bump. 2022-08-31 00:16:45 +00:00
include Manually add ChangeLog entries from r13-3652-ge4cba49413ca429dc82f6aa2e88129ecb3fdd943 2022-11-06 12:12:47 +01:00
INSTALL
intl
libada Daily bump. 2022-08-26 00:16:21 +00:00
libatomic Daily bump. 2022-10-20 00:17:52 +00:00
libbacktrace Daily bump. 2022-10-13 00:17:37 +00:00
libcc1 Daily bump. 2022-11-02 00:17:38 +00:00
libcody
libcpp Daily bump. 2022-11-06 11:05:22 +00:00
libdecnumber Daily bump. 2022-10-08 00:17:29 +00:00
libffi Daily bump. 2022-10-13 00:17:37 +00:00
libgcc Daily bump. 2022-11-06 11:05:22 +00:00
libgfortran Daily bump. 2022-10-13 00:17:37 +00:00
libgo runtime: use _libgo_off_t_type when calling C mmap 2022-10-27 17:12:57 -07:00
libgomp Revert "sphinx: copy files from texi2rst-generated repository" 2022-11-14 09:35:07 +01:00
libiberty Revert "sphinx: copy files from texi2rst-generated repository" 2022-11-14 09:35:07 +01:00
libitm Revert "sphinx: copy files from texi2rst-generated repository" 2022-11-14 09:35:07 +01:00
libobjc Daily bump. 2022-10-21 00:17:52 +00:00
libphobos Daily bump. 2022-11-06 11:05:22 +00:00
libquadmath Revert "sphinx: copy files from texi2rst-generated repository" 2022-11-14 09:35:07 +01:00
libsanitizer Daily bump. 2022-10-20 00:17:52 +00:00
libssp Daily bump. 2022-10-13 00:17:37 +00:00
libstdc++-v3 libstdc++: Fix installation of python files for debug lib 2022-11-14 15:59:50 +00:00
libvtv Daily bump. 2022-11-01 00:19:02 +00:00
lto-plugin Daily bump. 2022-10-13 00:17:37 +00:00
maintainer-scripts Revert "sphinx: add update_web_docs_git.py script" 2022-11-14 09:35:05 +01:00
zlib Daily bump. 2022-10-13 00:17:37 +00:00
.dir-locals.el
.gitattributes
.gitignore
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2022-11-14 00:17:08 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess
config.rpath
config.sub
configure Revert "sphinx: support Sphinx in build system" 2022-11-14 09:35:06 +01:00
configure.ac Revert "sphinx: support Sphinx in build system" 2022-11-14 09:35:06 +01:00
COPYING
COPYING.LIB
COPYING.RUNTIME
COPYING3
COPYING3.LIB
depcomp
install-sh
libtool-ldflags
libtool.m4 Generic configury support for shared libs on VxWorks 2022-10-11 07:31:07 +00:00
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
lt~obsolete.m4
MAINTAINERS Update email address 2022-10-31 11:15:45 +00:00
Makefile.def Remove support for Intel MIC offloading 2022-11-04 10:51:01 +01:00
Makefile.in Remove support for Intel MIC offloading 2022-11-04 10:51:01 +01:00
Makefile.tpl
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.