gcc/libgcc/config/i386/sfp-machine.h
Jakub Jelinek c2565a31c1 middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
Here is a complete patch to add std::bfloat16_t support on
x86 (AArch64 and ARM left for later).  Almost no BFmode optabs
are added by the patch, so for binops/unops it extends to SFmode
first and then truncates back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
of all those conversions so that we avoid double rounding, for
BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
it emits BFmode -> SFmode conversion first and then converts to the even
wider mode, neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is subset or superset
of the other, while SFmode is superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions if we don't optimize for space (and for the latter
if -frounding-math isn't enabled either).
For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
By default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so truncate only on explicit casts and assignments.
The patch introduces a single __bf16 builtin - __builtin_nansf16b,
because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN,
and uses f16b suffix instead of bf16 because there would be ambiguity on
log vs. logb - __builtin_logbf16 could be either log with bf16 suffix
or logb with f16 suffix.  In other cases libstdc++ should mostly use
__builtin_*f for std::bfloat16_t overloads (we have a problem with
std::nextafter though but that one we have also for std::float16_t).

2022-10-14  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	(build_common_tree_nodes): Initialize bfloat16_type_node if
	BFmode is supported.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
	* builtins.def (BUILT_IN_NANSF16B): New builtin.
	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
	for -msse2.
	(ix86_mangle_type): Mangle BFmode as DF16b.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node, only create it if still NULL.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
	macros for -fbuilding-libgcc.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
	double.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_bfloat16,
	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
	New.
	* gcc.dg/torture/bfloat16-basic.c: New test.
	* gcc.dg/torture/bfloat16-builtin.c: New test.
	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
	* gcc.dg/torture/bfloat16-complex.c: New test.
	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
	from bfloat16-builtin-issignaling-1.c.
	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
	bfloat16-basic.c.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
	diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
	-msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DF16b.
	* testsuite/demangle-expected (_Z3xxxDF16b): New test.
2022-10-14 09:37:01 +02:00

101 lines
3.2 KiB
C

#ifdef __MINGW32__
/* Make sure we are using gnu-style bitfield handling. */
#define _FP_STRUCT_LAYOUT __attribute__ ((gcc_struct))
#endif
/* The type of the result of a floating point comparison. This must
match `__libgcc_cmp_return__' in GCC for the target. */
typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
#define CMPtype __gcc_CMPtype
#ifdef __x86_64__
#include "config/i386/64/sfp-machine.h"
#else
#include "config/i386/32/sfp-machine.h"
#endif
#define _FP_KEEPNANFRACP 1
#define _FP_QNANNEGATEDP 0
#define _FP_NANSIGN_H 1
#define _FP_NANSIGN_B 1
#define _FP_NANSIGN_S 1
#define _FP_NANSIGN_D 1
#define _FP_NANSIGN_E 1
#define _FP_NANSIGN_Q 1
/* Here is something Intel misdesigned: the specs don't define
the case where we have two NaNs with same mantissas, but
different sign. Different operations pick up different NaNs. */
#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP) \
do { \
if (_FP_FRAC_GT_##wc(X, Y) \
|| (_FP_FRAC_EQ_##wc(X,Y) && (OP == '+' || OP == '*'))) \
{ \
R##_s = X##_s; \
_FP_FRAC_COPY_##wc(R,X); \
} \
else \
{ \
R##_s = Y##_s; \
_FP_FRAC_COPY_##wc(R,Y); \
} \
R##_c = FP_CLS_NAN; \
} while (0)
#ifndef _SOFT_FLOAT
#define FP_EX_INVALID 0x01
#define FP_EX_DENORM 0x02
#define FP_EX_DIVZERO 0x04
#define FP_EX_OVERFLOW 0x08
#define FP_EX_UNDERFLOW 0x10
#define FP_EX_INEXACT 0x20
#define FP_EX_ALL \
(FP_EX_INVALID | FP_EX_DENORM | FP_EX_DIVZERO | FP_EX_OVERFLOW \
| FP_EX_UNDERFLOW | FP_EX_INEXACT)
void __sfp_handle_exceptions (int);
#define FP_HANDLE_EXCEPTIONS \
do { \
if (__builtin_expect (_fex, 0)) \
__sfp_handle_exceptions (_fex); \
} while (0)
#define FP_TRAPPING_EXCEPTIONS ((~_fcw >> FP_EX_SHIFT) & FP_EX_ALL)
#define FP_ROUNDMODE (_fcw & FP_RND_MASK)
#endif
#define _FP_TININESS_AFTER_ROUNDING 1
#define __LITTLE_ENDIAN 1234
#define __BIG_ENDIAN 4321
#define __BYTE_ORDER __LITTLE_ENDIAN
/* Define ALIASNAME as a strong alias for NAME. */
#if defined __APPLE__
/* Mach-O doesn't support aliasing, so we build a secondary function for
the alias - we need to do a bit of a dance to find out what the type of
the arguments is and then apply that to the secondary function.
If these functions ever return anything but CMPtype we need to revisit
this... */
typedef float alias_HFtype __attribute__ ((mode (HF)));
typedef float alias_SFtype __attribute__ ((mode (SF)));
typedef float alias_DFtype __attribute__ ((mode (DF)));
typedef float alias_TFtype __attribute__ ((mode (TF)));
#define ALIAS_SELECTOR \
CMPtype (*) (alias_HFtype, alias_HFtype): (alias_HFtype) 0, \
CMPtype (*) (alias_SFtype, alias_SFtype): (alias_SFtype) 0, \
CMPtype (*) (alias_DFtype, alias_DFtype): (alias_DFtype) 0, \
CMPtype (*) (alias_TFtype, alias_TFtype): (alias_TFtype) 0
#define strong_alias(name, aliasname) \
CMPtype aliasname (__typeof (_Generic (name, ALIAS_SELECTOR)) a, \
__typeof (_Generic (name, ALIAS_SELECTOR)) b) \
{ return name (a, b); }
#else
# define strong_alias(name, aliasname) _strong_alias(name, aliasname)
# define _strong_alias(name, aliasname) \
extern __typeof (name) aliasname __attribute__ ((alias (#name)));
#endif