gcc/libgcc/soft-fp/brain.h

middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support

Here is a complete patch to add std::bfloat16_t support on x86 (AArch64 and ARM left for later). Almost no BFmode optabs are added by the patch, so for binops/unops it extends to SFmode first and then truncates back to BFmode.

For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations of all those conversions so that we avoid double rounding, for BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much it emits BFmode -> SFmode conversion first and then converts to the even wider mode, neither step should be imprecise. For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion and then SFmode -> HFmode, because neither format is subset or superset of the other, while SFmode is superset of both.

expr.cc then contains a -ffast-math optimization of the BF -> SF and SF -> BF conversions if we don't optimize for space (and for the latter if -frounding-math isn't enabled either).

For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16 but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math || !flag_unsafe_math_optimizations, because I think the insn doesn't raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.

By default (unless x86 -fexcess-precision=16) we use float excess precision for BFmode, so truncate only on explicit casts and assignments.

The patch introduces a single __bf16 builtin - __builtin_nansf16b, because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN, and uses f16b suffix instead of bf16 because there would be ambiguity on log vs. logb - __builtin_logbf16 could be either log with bf16 suffix or logb with f16 suffix. In other cases libstdc++ should mostly use __builtin_*f for std::bfloat16_t overloads (we have a problem with std::nextafter though but that one we have also for std::float16_t).

2022-10-14 Jakub Jelinek <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode like float16_type_mode.
	(build_common_tree_nodes): Initialize bfloat16_type_node if BFmode is supported.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF conversions. If there is no optab, handle BF -> {DF,XF,TF,HF} conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add -ffast-math generic implementation for BF -> SF and SF -> BF conversions.
	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
	* builtins.def (BUILT_IN_NANSF16B): New builtin.
	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode for -msse2.
	(ix86_mangle_type): Mangle BFmode as DF16b.
	(ix86_invalid_conversion, ix86_invalid_unary_op, ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than ix86_bf16_type_node, only create it if still NULL.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.

gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node, predefine __BFLT16_*__ macros and for C++23 also __STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related macros for -fbuilding-libgcc.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.

gcc/c/
	* c-typeck.cc (convert_arguments): Don't promote __bf16 to double.

gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_bfloat16, check_effective_target_bfloat16_runtime, add_options_for_bfloat16): New.
	* gcc.dg/torture/bfloat16-basic.c: New test.
	* gcc.dg/torture/bfloat16-builtin.c: New test.
	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
	* gcc.dg/torture/bfloat16-complex.c: New test.
	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable from bfloat16-builtin-issignaling-1.c.
	* gcc.dg/torture/floatn-basic.h: Allow to be includable from bfloat16-basic.c.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.

libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for C++.

libgcc/
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c, CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add -msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export __extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.

libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t entry.
	(cplus_demangle_type): Demangle DF16b.
	* testsuite/demangle-expected (_Z3xxxDF16b): New test.

2022-10-14 09:37:01 +02:00
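
The commit message calls the BFmode -> SFmode step precise and says the narrowing direction must avoid double rounding. The sketch below illustrates why on raw bit patterns: a bfloat16 value is the upper 16 bits of an IEEE binary32, so widening is a shift, while narrowing has to round the discarded 16 bits. The helper names `bf16_bits_to_float` and `float_to_bf16_bits` are hypothetical and are not part of libgcc. The narrowing shown is a plain round-to-nearest-even that ignores the dynamic rounding mode and raises no exceptions, so it is not the soft-fp implementation added by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libgcc symbol): widen a bfloat16 bit
   pattern to float.  The 16 bits become the high half of a binary32,
   so no rounding can occur.  */
static float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical helper (not a libgcc symbol): narrow a float to a
   bfloat16 bit pattern using round-to-nearest-even.  Assumption: the
   rounding mode is ignored and no exception flags are raised.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;

  memcpy (&u, &f, sizeof u);

  /* NaN: keep the top payload bits and set the quiet bit so that the
     result cannot collapse into an infinity.  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);

  /* Add just under half an ulp, plus one more when the kept part is
     odd, so that exact ties round to even.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

For example, `float_to_bf16_bits (1.0f)` gives `0x3f80`, and the exact tie `1.00390625f` (bit pattern `0x3f808000`) also rounds down to the even value `0x3f80`.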
/* Software floating-point emulation.
Definitions for Brain Floating Point format (bfloat16).
Copyright (C) 1997-2022 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
In addition to the permissions in the GNU Lesser General Public
License, the Free Software Foundation gives you unlimited
permission to link the compiled version of this file into
combinations with other programs, and to distribute those
combinations without any restriction coming from the use of this
file. (The Lesser General Public License restrictions do apply in
other respects; for example, they cover modification of the file,
and distribution when not linked into a combine executable.)
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */
#ifndef SOFT_FP_BRAIN_H
#define SOFT_FP_BRAIN_H 1
#if _FP_W_TYPE_SIZE < 32
# error "Here's a nickel kid. Go buy yourself a real computer."
#endif
#define _FP_FRACTBITS_B (_FP_W_TYPE_SIZE)
#define _FP_FRACTBITS_DW_B (_FP_W_TYPE_SIZE)
#define _FP_FRACBITS_B 8
#define _FP_FRACXBITS_B (_FP_FRACTBITS_B - _FP_FRACBITS_B)
#define _FP_WFRACBITS_B (_FP_WORKBITS + _FP_FRACBITS_B)
#define _FP_WFRACXBITS_B (_FP_FRACTBITS_B - _FP_WFRACBITS_B)
#define _FP_EXPBITS_B 8
#define _FP_EXPBIAS_B 127
#define _FP_EXPMAX_B 255
#define _FP_QNANBIT_B ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
#define _FP_QNANBIT_SH_B ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
#define _FP_IMPLBIT_B ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
#define _FP_IMPLBIT_SH_B ((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
#define _FP_OVERFLOW_B ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
#define _FP_WFRACBITS_DW_B (2 * _FP_WFRACBITS_B)
#define _FP_WFRACXBITS_DW_B (_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
#define _FP_HIGHBIT_DW_B \
  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)

/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
   chosen by the target machine.  */

typedef float BFtype __attribute__ ((mode (BF)));
union _FP_UNION_B
{
  BFtype flt;
  struct _FP_STRUCT_LAYOUT
  {
#if __BYTE_ORDER == __BIG_ENDIAN
    unsigned sign : 1;
    unsigned exp  : _FP_EXPBITS_B;
    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
#else
    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
    unsigned exp  : _FP_EXPBITS_B;
    unsigned sign : 1;
#endif
  } bits;
};

#define FP_DECL_B(X) _FP_DECL (1, X)
#define FP_UNPACK_RAW_B(X, val) _FP_UNPACK_RAW_1 (B, X, (val))
#define FP_UNPACK_RAW_BP(X, val) _FP_UNPACK_RAW_1_P (B, X, (val))
#define FP_PACK_RAW_B(val, X) _FP_PACK_RAW_1 (B, (val), X)
#define FP_PACK_RAW_BP(val, X) \
  do \
    { \
      if (!FP_INHIBIT_RESULTS) \
        _FP_PACK_RAW_1_P (B, (val), X); \
    } \
  while (0)

#define FP_UNPACK_B(X, val) \
  do \
    { \
      _FP_UNPACK_RAW_1 (B, X, (val)); \
      _FP_UNPACK_CANONICAL (B, 1, X); \
    } \
  while (0)

#define FP_UNPACK_BP(X, val) \
  do \
    { \
      _FP_UNPACK_RAW_1_P (B, X, (val)); \
      _FP_UNPACK_CANONICAL (B, 1, X); \
    } \
  while (0)

#define FP_UNPACK_SEMIRAW_B(X, val) \
  do \
    { \
      _FP_UNPACK_RAW_1 (B, X, (val)); \
      _FP_UNPACK_SEMIRAW (B, 1, X); \
    } \
  while (0)

#define FP_UNPACK_SEMIRAW_BP(X, val) \
  do \
    { \
      _FP_UNPACK_RAW_1_P (B, X, (val)); \
      _FP_UNPACK_SEMIRAW (B, 1, X); \
    } \
  while (0)

#define FP_PACK_B(val, X) \
  do \
    { \
      _FP_PACK_CANONICAL (B, 1, X); \
      _FP_PACK_RAW_1 (B, (val), X); \
    } \
  while (0)

#define FP_PACK_BP(val, X) \
  do \
    { \
      _FP_PACK_CANONICAL (B, 1, X); \
      if (!FP_INHIBIT_RESULTS) \
        _FP_PACK_RAW_1_P (B, (val), X); \
    } \
  while (0)

#define FP_PACK_SEMIRAW_B(val, X) \
  do \
    { \
      _FP_PACK_SEMIRAW (B, 1, X); \
      _FP_PACK_RAW_1 (B, (val), X); \
    } \
  while (0)

#define FP_PACK_SEMIRAW_BP(val, X) \
  do \
    { \
      _FP_PACK_SEMIRAW (B, 1, X); \
      if (!FP_INHIBIT_RESULTS) \
        _FP_PACK_RAW_1_P (B, (val), X); \
    } \
  while (0)

#define FP_TO_INT_B(r, X, rsz, rsg) _FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
#define FP_TO_INT_ROUND_B(r, X, rsz, rsg) \
  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
#define FP_FROM_INT_B(X, r, rs, rt) _FP_FROM_INT (B, 1, X, (r), (rs), rt)
/* BFmode arithmetic is not implemented. */
#define _FP_FRAC_HIGH_B(X) _FP_FRAC_HIGH_1 (X)
#define _FP_FRAC_HIGH_RAW_B(X) _FP_FRAC_HIGH_1 (X)
#define _FP_FRAC_HIGH_DW_B(X) _FP_FRAC_HIGH_1 (X)
#define FP_CMP_EQ_B(r, X, Y, ex) _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
#endif /* !SOFT_FP_BRAIN_H */
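
As a reading aid for the constants above, here is a minimal standalone sketch of the storage layout that `union _FP_UNION_B` describes: 1 sign bit, `_FP_EXPBITS_B` = 8 exponent bits, and `_FP_FRACBITS_B - 1` = 7 stored fraction bits (the implicit bit is not stored). The `BF16_*` names, `struct bf16_fields`, and the functions are hypothetical and do not exist in soft-fp. The signaling-NaN test assumes the usual IEEE convention in which a set quiet bit, the counterpart of `_FP_QNANBIT_B`, marks a quiet NaN.

```c
#include <stdint.h>

/* Hypothetical mirrors of the header's constants (not soft-fp names).  */
#define BF16_EXPBITS   8                /* _FP_EXPBITS_B */
#define BF16_STOREDFRAC 7               /* _FP_FRACBITS_B - 1 */
#define BF16_EXPBIAS   127              /* _FP_EXPBIAS_B */
#define BF16_EXPMAX    255              /* _FP_EXPMAX_B */
#define BF16_QNANBIT   (1u << 6)        /* 1 << (_FP_FRACBITS_B - 2) */

enum bf16_class
{
  BF16_ZERO, BF16_SUBNORMAL, BF16_NORMAL, BF16_INF, BF16_NAN
};

struct bf16_fields
{
  unsigned sign;   /* 1 bit */
  unsigned exp;    /* biased exponent, 8 bits */
  unsigned frac;   /* stored fraction, 7 bits */
};

/* Split a raw 16-bit pattern into its three fields.  */
static struct bf16_fields
bf16_unpack_raw (uint16_t bits)
{
  struct bf16_fields f;

  f.sign = bits >> (BF16_EXPBITS + BF16_STOREDFRAC);
  f.exp = (bits >> BF16_STOREDFRAC) & BF16_EXPMAX;
  f.frac = bits & ((1u << BF16_STOREDFRAC) - 1);
  return f;
}

/* Classify a raw pattern from its exponent and fraction fields.  */
static enum bf16_class
bf16_classify (uint16_t bits)
{
  struct bf16_fields f = bf16_unpack_raw (bits);

  if (f.exp == BF16_EXPMAX)
    return f.frac == 0 ? BF16_INF : BF16_NAN;
  if (f.exp == 0)
    return f.frac == 0 ? BF16_ZERO : BF16_SUBNORMAL;
  return BF16_NORMAL;
}

/* Assumption: a NaN with the quiet bit clear is signaling.  */
static int
bf16_is_signaling_nan (uint16_t bits)
{
  return bf16_classify (bits) == BF16_NAN && (bits & BF16_QNANBIT) == 0;
}
```

With these definitions, `0x3f80` (the value 1.0) unpacks to sign 0, biased exponent 127 and fraction 0, while `0x7f81` is classified as a NaN with the quiet bit clear.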