A 2013 paper [1] proposed 5 simple tests for evaluating the
effectiveness of static analysis tools at detecting
CWE-121 ("Stack-based Buffer Overflow").
The tests can be found in:
https://samate.nist.gov/SARD/test-suites/81
This patch adds these 5 tests to -fanalyzer's testsuite, lightly
modified to add DejaGnu directives.
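For illustration, a test in this style might look like the sketch below.
It is modeled on the simplest kind of CWE-121 case (a constant
out-of-bounds index into a stack buffer) and is not copied from the
suite. The function names and the exact diagnostic text matched by the
dg-warning directive are assumptions.

```c
/* { dg-do compile } */

/* Hypothetical CWE-121 case: writes one element past the end of a
   10-element stack buffer.  Meant to be analyzed, never called.  */
void
bad_write (void)
{
  char buf[10];
  buf[10] = 'A'; /* { dg-warning "stack-based buffer overflow" } */
}

/* The matching "good" variant stays inside the buffer, so no
   diagnostic is expected on the store.  */
char
good_write (void)
{
  char buf[10];
  buf[9] = 'A'; /* { dg-bogus "buffer overflow" } */
  return buf[9];
}
```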
This is for unit-testing; for broader testing of -fanalyzer I'm working
on a separate integration testing suite that builds various real-world C
projects with -fanalyzer, currently here:
https://github.com/davidmalcolm/gcc-analyzer-integration-tests
[1] Black, P., Koo, H. and Irish, T. (2013), A Basic CWE-121 Buffer Overflow Effectiveness Test Suite, Proc. 6th Latin-American Symposium on Dependable Computing, Rio de Janeiro, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=913117 (Accessed January 17, 2023)
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/SARD-tc117-basic-00001-min.c: New test, adapted
from https://samate.nist.gov/SARD/test-suites/81.
* gcc.dg/analyzer/SARD-tc1909-stack_overflow_loop.c: Likewise.
* gcc.dg/analyzer/SARD-tc249-basic-00034-min.c: Likewise.
* gcc.dg/analyzer/SARD-tc293-basic-00045-min.c: Likewise.
* gcc.dg/analyzer/SARD-tc841-basic-00182-min.c: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
The code removing function bodies when the last call graph clone of a
node is removed is too aggressive when there are nodes up the
clone_of chain which still need them. Fixed by expanding the check.
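The expanded check can be pictured with the simplified model below. It
is only a sketch: struct node and its fields are hypothetical stand-ins
for the real call graph node, and the actual condition in
cgraph_node::remove tests more than a single flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call graph node; only the fields needed
   for the illustration are modeled.  */
struct node
{
  struct node *clone_of;  /* Node this one was cloned from, or NULL.  */
  bool needs_body;        /* Whether this node still needs the body.  */
};

/* Return true if the function body can be released when N is removed:
   nothing up the clone_of chain may still need it.  */
bool
can_release_body (const struct node *n)
{
  for (const struct node *p = n->clone_of; p != NULL; p = p->clone_of)
    if (p->needs_body)
      return false;
  return true;
}
```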
gcc/ChangeLog:
2023-01-18 Martin Jambor <mjambor@suse.cz>
PR ipa/107944
* cgraph.cc (cgraph_node::remove): Check whether nodes up the
clone_of chain also do not need the body.
r13-4743 exposed more tree sharing which runs into a latent issue
with LTO decl wrapping during streaming. The following adds a
testcase triggering the issue.
PR lto/108445
* gcc.dg/lto/pr108445_0.c: New testcase.
* gcc.dg/lto/pr108445_1.c: Likewise.
A recent change only initializes regs.how[] during DWARF unwinding,
which resulted in an uninitialized offset being used in return address
signing and random failures during unwinding. The fix is to encode the return
address signing state in REG_UNSAVED and a new state REG_UNSAVED_ARCHEXT.
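One way to picture the encoding: the signing state becomes a two-valued
flag kept in the per-register "how" slot, so no separate offset field
has to hold a defined value. The sketch below is a reduced model, not
the libgcc code. The enum is cut down to the two relevant states and
both helper names are hypothetical.

```c
#include <stdbool.h>

/* Reduced model of the per-register "how" state; the real enum has
   more members.  */
enum reg_how
{
  REG_UNSAVED,          /* Default state: return address not signed.  */
  REG_UNSAVED_ARCHEXT   /* Target-specific state: return address signed.  */
};

/* Hypothetical helper: flip the signing state, as a CFA operation that
   negates the return address state would.  */
void
toggle_ra_state (enum reg_how *how)
{
  *how = (*how == REG_UNSAVED) ? REG_UNSAVED_ARCHEXT : REG_UNSAVED;
}

/* Hypothetical helper: query the signing state.  */
bool
ra_is_signed (enum reg_how how)
{
  return how == REG_UNSAVED_ARCHEXT;
}
```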
libgcc/
PR target/107678
* unwind-dw2.h (REG_UNSAVED_ARCHEXT): Add new enum.
* unwind-dw2.c (uw_update_context_1): Add REG_UNSAVED_ARCHEXT case.
* unwind-dw2-execute_cfa.h: Use REG_UNSAVED_ARCHEXT/REG_UNSAVED to
encode the return address signing state.
* config/aarch64/aarch64-unwind.h (aarch64_demangle_return_addr):
Check current return address signing state.
(aarch64_frob_update_context): Remove.
The MVE ACLE allows for __ARM_MVE_PRESERVE_USER_NAMESPACE to be defined,
which removes definitions for intrinsic functions without the __arm_
prefix. __arm_vld1q_z* and __arm_vst1q_p* are currently implemented via
calls to vldr* and vstr*, which results in several compile-time errors when
__ARM_MVE_PRESERVE_USER_NAMESPACE is defined. This patch replaces these
with calls to their prefixed counterparts, __arm_vldr* and __arm_vstr*,
and adds a test covering the definition of __ARM_MVE_PRESERVE_USER_NAMESPACE.
gcc/ChangeLog:
PR target/108442
* config/arm/arm_mve.h (__arm_vst1q_p_u8): Use prefixed intrinsic
function.
(__arm_vst1q_p_s8): Likewise.
(__arm_vld1q_z_u8): Likewise.
(__arm_vld1q_z_s8): Likewise.
(__arm_vst1q_p_u16): Likewise.
(__arm_vst1q_p_s16): Likewise.
(__arm_vld1q_z_u16): Likewise.
(__arm_vld1q_z_s16): Likewise.
(__arm_vst1q_p_u32): Likewise.
(__arm_vst1q_p_s32): Likewise.
(__arm_vld1q_z_u32): Likewise.
(__arm_vld1q_z_s32): Likewise.
(__arm_vld1q_z_f16): Likewise.
(__arm_vst1q_p_f16): Likewise.
(__arm_vld1q_z_f32): Likewise.
(__arm_vst1q_p_f32): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/general/preserve_user_namespace_1.c: New test.
Inverting the MSB of a 32-bit value can be done with either bitwise-XOR or
addition with -2147483648, but the latter is one byte shorter if TARGET_DENSITY.
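The equivalence holds because -2147483648 has the 32-bit pattern
0x80000000: adding it modulo 2^32 can only change the top bit, since any
carry out of bit 31 is discarded. A small sketch in C, with hypothetical
function names:

```c
#include <stdint.h>

/* Invert the most significant bit with bitwise-XOR.  */
uint32_t
flip_msb_xor (uint32_t x)
{
  return x ^ UINT32_C (0x80000000);
}

/* Same result with addition: unsigned arithmetic wraps modulo 2^32,
   so only bit 31 changes.  */
uint32_t
flip_msb_add (uint32_t x)
{
  return x + UINT32_C (0x80000000);
}
```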
gcc/ChangeLog:
* config/xtensa/xtensa.md (xorsi3_internal):
Rename from the original of "xorsi3".
(xorsi3): New expansion pattern that emits addition rather than
bitwise-XOR when the second source is a constant of -2147483648
if TARGET_DENSITY.
As Andrew pointed out in PR108396, there is one typo in
rs6000-overload.def on built-in function vec_vsubcuq:
[VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq]
"vec_vsubcuqP" should be "vec_vsubcuq". Because of this typo,
rs6000-vecdefines.h defines vec_vsubcuqP instead
of vec_vsubcuq, so the compiler no longer recognizes
the built-in function name vec_vsubcuq.
Co-authored-By: Andrew Pinski <apinski@marvell.com>
PR target/108396
gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo
vec_vsubcuqP with vec_vsubcuq.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108396.c: New test.
PR108348 shows a special case in which MMA opaque types are
used as function arguments and treated as pass by reference,
which results in a copy from the argument to a temporary variable.
Since this copy happens before the rs6000_function_arg check,
it can cause an ICE when MMA support is not available. This patch
teaches the function rs6000_opaque_type_invalid_use_p to check whether
any function argument in a gcall stmt has an invalid use of
MMA opaque types.
By the way, I checked the handling of return values: it does not have
this kind of issue because its checking and error emission happen quite
early, so this patch does not handle function return values.
PR target/108348
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.
Patches [1] and [2] fixed PR55522 for x86-linux but left all other x86
targets unfixed (x86-cygwin, x86-darwin and x86-mingw32).
This patch applies a similar change to other specs using crtfastmath.o.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608528.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608529.html
gcc/ChangeLog:
PR target/55522
* config/i386/cygwin.h (ENDFILE_SPEC): Link crtfastmath.o
whenever -mdaz-ftz is specified. Don't link crtfastmath.o when
-shared or -mno-daz-ftz is specified.
* config/i386/darwin.h (ENDFILE_SPEC): Ditto.
* config/i386/mingw32.h (ENDFILE_SPEC): Ditto.
gcc/fortran/ChangeLog:
PR fortran/108421
* interface.cc (get_expr_storage_size): Check that we actually have
an integer value before trying to extract it with mpz_get_si.
gcc/testsuite/ChangeLog:
PR fortran/108421
* gfortran.dg/pr108421.f90: New test.
The stack protector is not supported in BPF. This patch disables
-fstack-protector in bpf-* targets, along with the emission of a note
indicating that the feature is not supported on this platform.
Regtested in bpf-unknown-none.
gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_option_override): Disable
-fstack-protector.
Obfuscate the copyright text in gcc/m2/mc/mcOptions.mod so that the
year change script does not attempt to modify the text. The year
is determined at runtime and therefore the text requires
no modification. The middle printf (C) can be replaced by
a unicode character in the future.
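The idea can be sketched in C as follows. This is an illustration, not
the generated GmcOptions.c: the function name, the trailing text and the
use of a single snprintf (so the result can be checked) are assumptions.
The point is only that no source line contains the complete
"Copyright (C) <year>" text that the year change script looks for.

```c
#include <stdio.h>

/* Hypothetical version banner: the notice is assembled from separate
   pieces and the year is supplied at run time.  Returns the number of
   characters written, or -1 if OUT is too small.  */
int
format_banner (char *out, size_t size, int year)
{
  int n = snprintf (out, size, "%s%s %d the project authors\n",
                    "Copyright ", "(C)", year);
  return (n >= 0 && (size_t) n < size) ? n : -1;
}
```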
gcc/m2/ChangeLog:
* mc-boot/GM2RTS.c: Rebuilt.
* mc-boot/GM2RTS.h: Rebuilt.
* mc-boot/Gdecl.c: Rebuilt.
* mc-boot/GmcOptions.c: Rebuilt.
* mc/mcOptions.mod (displayVersion):
Split first printf into three components
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
The config for --with-libstdcxx-zoneinfo=yes was comparing the target
triplet to "gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu" which is only
the last component of the triplet, so failed to match and always used
the zoneinfo_dir=none default. Check $target_os instead.
There was also an error in the check for native builds that tzdata.zi is
actually present in the configured directory. That meant a warning was
printed even when the file was present:
configure: zoneinfo data directory: /usr/share/zoneinfo
configure: WARNING: "/usr/share/zoneinfo does not contain tzdata.zi file"
configure: static tzdata.zi file will be compiled into the library
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Check $target_os instead
of $host. Fix check for file being present during native build.
* configure: Regenerate.
PR-108404 occurs because the C prototype does not match the Modula-2
procedure M2RTS_Halt. This patch provides a new procedure M2RTS_HaltC
which saves the C/C++ code from having to fabricate a Modula-2 string.
gcc/m2/ChangeLog:
* gm2-libs-iso/M2RTS.def (Halt): Parameter file renamed to filename.
(HaltC): New procedure declaration.
(ErrorMessage): Parameter file renamed to filename.
* gm2-libs-iso/M2RTS.mod (Halt): Parameter file renamed to
filename.
(HaltC): New procedure implementation.
(ErrorStringC): New procedure implementation.
(ErrorMessageC): New procedure implementation.
* gm2-libs/M2RTS.def (Halt): Parameter file renamed to filename.
(HaltC): New procedure declaration.
(ErrorMessage): Parameter file renamed to filename.
* gm2-libs/M2RTS.mod (Halt): Parameter file renamed to filename.
(HaltC): New procedure implementation.
(ErrorStringC): New procedure implementation.
(ErrorMessageC): New procedure implementation.
libgm2/ChangeLog:
* libm2iso/RTco.cc (_M2_RTco_fini): Call M2RTS_HaltC.
(newSem): Call M2RTS_HaltC.
(currentThread): Call M2RTS_HaltC.
(never): Call M2RTS_HaltC.
(defined): Call M2RTS_HaltC.
(initThread): Call M2RTS_HaltC.
(RTco_transfer): Call M2RTS_HaltC.
* libm2iso/m2rts.h (M2RTS_Halt): Provide parameter names.
(M2RTS_HaltC): New procedure declaration.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
The comment above simplify_rotate roughly describes what patterns
are matched into what:
We are looking for X with unsigned type T with bitsize B, OP being
+, | or ^, some type T2 wider than T. For:
(X << CNT1) OP (X >> CNT2) iff CNT1 + CNT2 == B
((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2)) iff CNT1 + CNT2 == B
transform these into:
X r<< CNT1
Or for:
(X << Y) OP (X >> (B - Y))
(X << (int) Y) OP (X >> (int) (B - Y))
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
(X << Y) | (X >> ((-Y) & (B - 1)))
(X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into (last 2 only if ranger can prove Y < B):
X r<< Y
Or for:
(X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
(X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) (Y & (B - 1)))) \
| ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into:
X r<< (Y & (B - 1))
The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which use (-Y) & (B - 1) on one
of the shift counts but Y on the other can do something different from
rotate. E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
unsigned int t = x;
return (t << y) | (t >> ((-y) & 7));
}
If y is in [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there. Say for y 5 and x 0xaa the expression gives
0x55, which is the same thing as rotate, while for y 19 and x 0xaa
it gives 0x5, which is different.
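The two values quoted above can be reproduced with the sketch below,
which pairs the expression from f7 with a reference 8-bit rotate. The
function names are hypothetical and unsigned int is assumed to be 32
bits wide, so the shift counts used here are all defined.

```c
/* The expression from f7 above: shifts are done in the wider type
   unsigned int (T2), the result is truncated to unsigned char (T).  */
unsigned char
widened_shift_or (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (unsigned char) ((t << y) | (t >> ((-y) & 7)));
}

/* Reference 8-bit rotate left, with the count reduced modulo 8.  */
unsigned char
rotl8 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  y &= 7;
  return (unsigned char) ((t << y) | (t >> ((8 - y) & 7)));
}
```

With x 0xaa, both functions return 0x55 for y 5, while for y 19 the
first returns 0x5 and the rotate returns 0x55.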
Now, I believe the
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.
The following patch disables the
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
forms unless ranger says Y is not in the [B, B2 - 1] range, where B2 is the
bitsize of T2.
And, looking at it again this morning, actually the Y equal to B
case is still fine, if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X. So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1]. And, if we wanted to go further, even multiples
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.
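The claim about which shift counts are safe can be checked exhaustively
for B = 8 and B2 = 32. The sketch below is only a verification aid with
a hypothetical function name; it assumes unsigned int is 32 bits wide
and must not be called with y >= 32.

```c
#include <stdbool.h>

/* Return true if, for every 8-bit x, the widened expression with
   shift count Y equals an 8-bit rotate left by Y.  */
bool
matches_rotate (unsigned int y)
{
  for (unsigned int x = 0; x < 256; x++)
    {
      /* Shifts done in the wide type, then truncated to 8 bits.  */
      unsigned int expr = ((x << y) | (x >> ((-y) & 7))) & 0xff;
      /* True rotate: the left shift count is reduced modulo 8 too.  */
      unsigned int rot = ((x << (y & 7)) | (x >> ((-y) & 7))) & 0xff;
      if (expr != rot)
        return false;
    }
  return true;
}
```

Under these assumptions the function returns true for y in [0, 8] and
for y equal to 16 or 24, and false for every other y below 32.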
2023-01-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106523
* tree-ssa-forwprop.cc (simplify_rotate): For the
patterns with (-Y) & (B - 1) in one operand's shift
count and Y in another, if T2 has wider precision than T,
punt if Y could have a value in [B, B2 - 1] range.
* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-4b.c: New test.
* gcc.c-torture/execute/pr106523.c: New test.
When using GNU ld on Solaris, a large number of asan tests SEGV, while
Solaris ld is fine. This happens inside the __tls_get_addr interceptor,
which is highly glibc-specific. Therefore this patch disables that
interceptor.
Posted upstream at https://reviews.llvm.org/D141385.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.
2023-01-17 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libsanitizer:
* sanitizer_common/sanitizer_platform_interceptors.h: Cherry-pick
llvm-project revision 951cf656b2faaf6fc0baa867293c0cb0ab131951.
Since r5-172-gd9f069ab4f6450, the code no longer matches the
comment as the code for Solaris 9 support was removed.
This just updates the comment to reference AIX only as
the code does.
Committed as obvious.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (add_options_for_tls): Remove
reference to Solaris 9 in comments.
-mforce-indirect-call generates invalid instruction in 32-bit MI thunk
since there are no available scratch registers in 32-bit PIC mode.
Disable -mforce-indirect-call for PIC in 32-bit mode when generating
MI thunk.
gcc/
PR target/105980
* config/i386/i386.cc (x86_output_mi_thunk): Disable
-mforce-indirect-call for PIC in 32-bit mode.
gcc/testsuite/
PR target/105980
* g++.target/i386/pr105980.C: New test.
This patch removes the hard coded constant YEAR and replaces
its use by a call to a new procedure function getYear.
It also emits a GPL v3 boilerplate.
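A C analogue of such a procedure, built on the same two libc entry
points that the ChangeLog below wraps (time and localtime), could look
like this sketch. The name get_year and the -1 error convention are
assumptions, not taken from mcOptions.mod.

```c
#include <time.h>

/* Hypothetical C analogue of getYear: return the current calendar
   year from the system clock, or -1 if it cannot be determined.  */
int
get_year (void)
{
  time_t now = time (NULL);
  if (now == (time_t) -1)
    return -1;
  const struct tm *tm = localtime (&now);
  /* tm_year counts years since 1900.  */
  return tm != NULL ? tm->tm_year + 1900 : -1;
}
```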
gcc/m2/ChangeLog:
* mc-boot-ch/Glibc.c (libc_time): New function.
(libc_localtime): New function.
* mc-boot/GDynamicStrings.c: Regenerate.
* mc-boot/GFIO.c: Regenerate.
* mc-boot/GFormatStrings.c: Regenerate.
* mc-boot/GIndexing.c: Regenerate.
* mc-boot/GM2Dependent.c: Regenerate.
* mc-boot/GM2EXCEPTION.c: Regenerate.
* mc-boot/GPushBackInput.c: Regenerate.
* mc-boot/GRTExceptions.c: Regenerate.
* mc-boot/GRTint.c: Regenerate.
* mc-boot/GStdIO.c: Regenerate.
* mc-boot/GStringConvert.c: Regenerate.
* mc-boot/GSysStorage.c: Regenerate.
* mc-boot/Gdecl.c: Regenerate.
* mc-boot/GmcComment.c: Regenerate.
* mc-boot/GmcComp.c: Regenerate.
* mc-boot/GmcDebug.c: Regenerate.
* mc-boot/GmcMetaError.c: Regenerate.
* mc-boot/GmcOptions.c: Regenerate.
* mc-boot/GmcStack.c: Regenerate.
* mc-boot/GnameKey.c: Regenerate.
* mc-boot/GsymbolKey.c: Regenerate.
* mc-boot/Gkeyc.c: Regenerate.
* mc/decl.mod (putFieldRecord): Change NulName to NulKey
and fix type comparison.
* mc/mcOptions.mod (YEAR): Remove.
(getYear): New procedure function.
(displayVersion): Use result from getYear instead of YEAR.
Emit boilerplate for GPL v3.
(gplBody): Use result from getYear instead of YEAR.
(glplBody): Use result from getYear instead of YEAR.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
Attempting to dereference an undeclared variable causes an ICE.
Attempting to pass an undeclared variable as an array of type
also causes an ICE. This patch detects both conditions and
generates an appropriate error.
gcc/m2/ChangeLog:
* gm2-compiler/M2Quads.mod (AssignUnboundedVar): Check Type
against NulSym and call MetaErrorT1 if necessary.
(AssignUnboundedNonVar): Check Type against NulSym and
call MetaErrorT1 if necessary.
(BuildDesignatorPointer): Check Type1 against NulSym and
call MetaErrorT1 if necessary.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
Fix wrong code issues in ipa-sra where we are trying to prove that on every
execution of a given function a call to another function will happen. The code
uses post-dominators and makes a wrong query (which passes only for the first BB
in the function). However, post-dominators are only valid if fake edges for every
possible reason for function execution to terminate are added.
Fixing this using post-dominators is somewhat costly since one needs to walk
the whole body and add a lot of fake edges. I ended up implementing a special
purpose function for this which is also useful in ipa-modref and other places
that do similar analysis. One does not need to modify the CFG to use it and
moreover for complex functions it usually stops on the first unanalyzed function
call and ends up being relatively cheap.
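A small example of why plain post-dominance is not enough: in the
sketch below the call to second () post-dominates the entry block of
caller () in the ordinary CFG, yet it is not reached on every execution
because first () may never return. All names are hypothetical, and
longjmp merely stands in for exit, abort, a throw or an infinite loop.

```c
#include <setjmp.h>

static jmp_buf escape;     /* Escape route used only by this demo.  */
static int second_calls;   /* Counts how often second () actually runs.  */

/* A call that may terminate execution of its caller.  */
static void
first (int bail_out)
{
  if (bail_out)
    longjmp (escape, 1);
}

static void
second (void)
{
  second_calls++;
}

/* second () post-dominates the entry block here, but only when no fake
   edge models the possibility that first () does not return.  */
static void
caller (int bail_out)
{
  first (bail_out);
  second ();
}

/* Run caller () once and report how often second () has run so far.  */
int
run_caller (int bail_out)
{
  if (setjmp (escape) == 0)
    caller (bail_out);
  return second_calls;
}
```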
Bootstrapped/regtested x86_64-linux, plan to commit it shortly.
gcc/ChangeLog:
2023-01-16 Jan Hubicka <hubicka@ucw.cz>
PR ipa/106077
* ipa-modref.cc (modref_access_analysis::analyze): Use
find_always_executed_bbs.
* ipa-sra.cc (process_scan_results): Likewise.
* ipa-utils.cc (stmt_may_terminate_function_p): New function.
(find_always_executed_bbs): New function.
* ipa-utils.h (stmt_may_terminate_function_p): Declare.
(find_always_executed_bbs): Declare.
gcc/testsuite/ChangeLog:
2023-01-16 Jan Hubicka <hubicka@ucw.cz>
* g++.dg/tree-ssa/pr106077.C: New test.
When building src/c++20/tzdb.cc we currently get a build error for
--with-default-libstdcxx-abi=gcc4-compatible because std::chrono::tzdb
and related types are not declared for the gcc4-compatible ABI (unless
--disable-libstdcxx-dual-abi is also used, so that the gcc4-compatible
ABI is the only one built).
Define _GLIBCXX_USE_CXX11_ABI in tzdb.cc so that for a dual-abi build we
always build it for the cxx11 ABI.
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (_GLIBCXX_USE_CXX11_ABI): Define to 1.
When the type of the return object is a constrained array, there may be an
implicit sliding that needs to be preserved during the expansion.
gcc/ada/
* exp_ch3.adb (Make_Allocator_For_Return): Convert the expression
to the return object's type in the constrained array case as well.
The recent removal of the unconditional call to Remove_Side_Effects on the
expression of an object declaration or an allocator with a class-wide type
has introduced a pessimization in the former case for function calls that
return a specific tagged type, because the object ultimately created on the
primary stack has changed from being of a specific tagged type to being of
the class-wide type, the latter type always formally requiring finalization.
With the current finalization machinery, this means that a dispatching call
to the Deep_Finalize routine is generated, which is unnecessary. Although
this is a generic finalization issue with class-wide objects, this restores
the previous behavior in this case to fix the pessimization for now.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): For a class-wide non-
interface stand-alone object initialized by a function call, call
Remove_Side_Effects on the expression to capture the result.
This extends the use of static references to the interface tag in more cases
for (class-wide) interface objects, e.g. for initialization expressions that
are qualified aggregates or nondispatching calls returning a specific tagged
type implementing the interface.
gcc/ada/
* exp_util.ads (Has_Tag_Of_Type): Declare.
* exp_util.adb (Has_Tag_Of_Type): Move to package level. Recurse on
qualified expressions.
* exp_ch3.adb (Expand_N_Object_Declaration): Use a static reference
to the interface tag in more cases for class-wide interface objects.
This restores the proper finalization of temporaries for interface objects
in the case where the initializing expression is not of an interface type.
It turns out that neither Is_Temporary_For_Interface_Object nor its previous
incarnation are sufficient to catch all the various cases, so it is replaced
by a small enhancement to Is_Aliased, which is more robust.
gcc/ada/
* exp_util.adb (Is_Temporary_For_Interface_Object): Delete.
(Is_Finalizable_Transient.Is_Aliased): Deal with the specific case
of temporaries generated for interface objects.
This further optimizes the usual case of (class-wide) interface objects that
are initialized with calls to functions whose result type is the type of the
objects (this is not necessary as any result type implementing the interface
would do) by avoiding a back-and-forth displacement of the objects' address.
This exposed a latent issue whereby the displacement was missing in the case
of a simple return statement whose expression is a call to a function whose
result type is a specific tagged type that needs finalization.
And, in order to avoid pessimizing the expanded code, this in turn required
no longer creating temporaries for allocators by calling Remove_Side_Effects
up front, in the common cases where they are not necessary.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): Do not generate a back-
and-forth displacement of the object's address when using a renaming
for an interface object with an expression of the same type.
* exp_ch4.adb (Expand_Allocator_Expression): Do not remove the side
effects of the expression up front for the simple allocators. Do not
call the Adjust primitive if the expression is a function call.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Do not expand the call
unnecessarily for a special return object.
(Expand_Simple_Function_Return): Restore the displacement of the
return object's address in the case where the expression is the call
to a function whose result type is a type that needs finalization.
* exp_util.adb (Expand_Subtype_From_Expr): Do not remove the side
effects of the expression before calling Make_Subtype_From_Expr.
(Make_CW_Equivalent_Type): If the expression has the tag of its type
and this type has a uniform size, use 'Object_Size of this type in
lieu of 'Size of the expression to compute the expression's size.
This needs to be done for all expressions with class-wide type.
gcc/ada/
* exp_ch3.adb (Make_Allocator_For_Return): Put back an interface
conversion for expressions with non-interface class-wide type.
It turns out that the only blocking case is an aliased object whose nominal
subtype is an unconstrained array because the bounds must be allocated.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): Also optimize aliased
objects if their nominal subtype is not an unconstrained array.
This optimizes the implementation of (class-wide) interface objects that are
initialized with function calls, by avoiding an unnecessary copy operation.
This also removes useless access checks generated by the expansion of return
statements involving class-wide types.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): Factor out conditions
needed for an initializing expression that is a function call to
be renamable into the Is_Renamable_Function_Call predicate.
Use it to implement the renaming in the case of class-wide interface
objects. Remove an interface conversion on all paths, separate and
optimize the renaming path in the special expansion for interfaces.
(Is_Renamable_Function_Call): New predicate.
(Make_Allocator_For_Return): Put back an interface conversion.
* exp_ch6.adb (Apply_CW_Accessibility_Check): Remove useless access
checks on RE_Tag_Ptr.
This patch adds more tunes for zen4:
- new tunes for avx512 scatter instructions.
In micro benchmarks these seem to be a consistent loss compared to open-coded code.
- disable use of gather for zen4.
While gathers are a win in micro benchmarks (based on TSVC), enabling gather
is a loss for parest. So for now it seems safe to keep it off.
- disable the pass to avoid FMA chains for znver4 since fmadd was optimized and
does not seem to cause regressions.
* config/i386/i386.cc (ix86_vectorize_builtin_scatter): Guard scatter
by TARGET_USE_SCATTER.
* config/i386/i386.h (TARGET_USE_SCATTER_2PARTS,
TARGET_USE_SCATTER_4PARTS, TARGET_USE_SCATTER): New macros.
* config/i386/x86-tune.def (TARGET_USE_SCATTER_2PARTS,
TARGET_USE_SCATTER_4PARTS, TARGET_USE_SCATTER): New tunes.
(X86_TUNE_AVOID_256FMA_CHAINS, X86_TUNE_AVOID_512FMA_CHAINS): Disable
for znver4. (X86_TUNE_USE_GATHER): Disable for zen4.
Don't add crtfastmath.o for -shared to avoid altering the FP
environment when loading a shared library.
PR target/55522
* config/sol2.h (ENDFILE_SPEC): Don't add crtfastmath.o for -shared.
With these previous patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606586.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html
we enabled the MVE overloaded _Generic associations to handle more
scalar types, however at PR 107515 we found a new regression that
wasn't detected in our testing:
With glibc's posix/types.h:
```
typedef signed int __int32_t;
...
typedef __int32_t int32_t;
```
We would get an `error: '_Generic' specifies two compatible types`
from `__ARM_mve_coerce3` because of the `type: param` association when `type`
is `int`, since it and `int32_t: param` name the same type under the hood.
The same did not happen with Newlib's header sys/_stdint.h:
```
typedef long int __int32_t;
...
typedef __int32_t int32_t ;
```
which worked fine, because it uses `long int`.
The same could feasibly happen in `__ARM_mve_coerce2` between
`__fp16` and `float16_t`.
The solution here is to break the _Generic down so that the similar
types don't appear at the same level, as is done in `__ARM_mve_typeid`.
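The splitting technique can be shown outside of arm_mve.h with the
sketch below. KIND_OF is a hypothetical macro, not one from the header,
and the integer results only mark which association was selected.
```
#include <stdint.h>

/* Flat form, shown only as a comment because it is rejected wherever
   int32_t is a typedef for int:
     _Generic ((x), int: 1, int32_t: 2, default: 0)  */

/* Split form: int32_t is tested at the outer level and int only in the
   nested default branch, so the two never meet in one association
   list.  */
#define KIND_OF(x) \
  _Generic ((x), \
            int32_t: 2, \
            default: _Generic ((x), int: 1, default: 0))
```
With glibc's typedef an `int` argument selects the outer `int32_t`
association, while with Newlib's it falls through to the nested `int`
one. Either way the macro compiles.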
gcc/ChangeLog:
PR target/96795
PR target/107515
* config/arm/arm_mve.h (__ARM_mve_coerce2): Split types.
(__ARM_mve_coerce3): Likewise.
gcc/testsuite/ChangeLog:
PR target/96795
PR target/107515
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: New test.
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: New test.
Annual update of dates. Also change the GPL boilerplate
emitted to GPL v3.
gcc/m2/ChangeLog:
* mc/mcOptions.mod (displayVersion): Change GPLv2 to GPLv3.
(YEAR) set to 2023.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
gcc/ChangeLog:
* tree-ssa-loop-niter.cc (build_popcount_expr): Add IFN support.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/pr86544.C: Add .POPCOUNT to tree scan regex.
* gcc.dg/tree-ssa/popcount.c: Likewise.
* gcc.dg/tree-ssa/popcount2.c: Likewise.
* gcc.dg/tree-ssa/popcount3.c: Likewise.
* gcc.target/aarch64/popcount4.c: Likewise.
* gcc.target/i386/pr95771.c: Likewise, and...
* gcc.target/i386/pr95771-2.c: ...split int128 test from above,
since this would emit just a single IFN if a TI optab is added.
This recognises the patterns of the form:
while (n & 1) { n >>= 1 }
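For a loop of this shape the iteration count has a closed form in terms
of a count-trailing-zeros operation, which is presumably what makes the
idiom worth recognising. The sketch below compares the two; the
function names are hypothetical and __builtin_ctz and
__builtin_popcount are GCC builtins.

```c
/* Loop of the recognised shape: count how many times n can be shifted
   right while its lowest bit is set.  */
unsigned int
trailing_ones_loop (unsigned int n)
{
  unsigned int count = 0;
  while (n & 1)
    {
      n >>= 1;
      count++;
    }
  return count;
}

/* Closed form.  __builtin_ctz (0) is undefined, so the all-ones input
   is handled separately: its answer is the full bit width.  */
unsigned int
trailing_ones_ctz (unsigned int n)
{
  if (~n == 0)
    return (unsigned int) __builtin_popcount (n);
  return (unsigned int) __builtin_ctz (~n);
}
```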
Unfortunately there are currently two issues relating to this patch.
Firstly, simplify_using_initial_conditions does not recognise that
(n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0).
These preconditions arise following the loop copy-header pass, and the
assumptions returned by number_of_iterations_exit_assumptions then
prevent final value replacement from using the niter result.
I'm not sure what is the best way to fix this - one approach could be to
modify simplify_using_initial_conditions to handle this sort of case,
but it seems that it basically wants the information that ranger could
give anyway, so would something like that be a better option?
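The implication itself is easy to justify: a nonzero n with a clear
lowest bit must have some higher bit set, and that bit survives the
shift right by one. The sketch below checks it by brute force over all
16-bit values; the function name is hypothetical and this is only a
sanity check, not a proof for wider types.

```c
#include <stdbool.h>

/* Check exhaustively over all 16-bit values that
   (n != 0) && ((n & 1) == 0) implies (n >> 1) != 0.  */
bool
implication_holds_16bit (void)
{
  for (unsigned int n = 0; n <= 0xffff; n++)
    if (n != 0 && (n & 1) == 0 && (n >> 1) == 0)
      return false;
  return true;
}
```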
The second issue arises in the vectoriser, which is able to determine
that the niter->assumptions are always true.
When building with -march=armv8.4-a+sve -S -O3, we get this codegen:
foo (unsigned int b) {
int c = 0;
if (b == 0)
return PREC;
while (!(b & (1 << (PREC - 1)))) {
b <<= 1;
c++;
}
return c;
}
foo:
.LFB0:
.cfi_startproc
cmp w0, 0
cbz w0, .L6
blt .L7
lsl w1, w0, 1
clz w2, w1
cmp w2, 14
bls .L8
mov x0, 0
cntw x3
add w1, w2, 1
index z1.s, #0, #1
whilelo p0.s, wzr, w1
.L4:
add x0, x0, x3
mov p1.b, p0.b
mov z0.d, z1.d
whilelo p0.s, w0, w1
incw z1.s
b.any .L4
add z0.s, z0.s, #1
lastb w0, p1, z0.s
ret
.p2align 2,,3
.L8:
mov w0, 0
b .L3
.p2align 2,,3
.L13:
lsl w1, w1, 1
.L3:
add w0, w0, 1
tbz w1, #31, .L13
ret
.p2align 2,,3
.L6:
mov w0, 32
ret
.p2align 2,,3
.L7:
mov w0, 0
ret
.cfi_endproc
In essence, the vectoriser uses the niter information to determine
exactly how many iterations of the loop it needs to run. It then uses
SVE whilelo instructions to run this number of iterations. The original
loop counter is also vectorised, despite only being used in the final
iteration, and then the final value of this counter is used as the
return value (which is the same as the number of iterations it computed
in the first place).
This vectorisation is obviously bad, and I think it exposes a latent
bug in the vectoriser, rather than being an issue caused by this
specific patch.
gcc/ChangeLog:
* tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
(number_of_iterations_bitcount): Add call to the above.
(number_of_iterations_exit_assumptions): Add EQ_EXPR case for
c[lt]z idiom recognition.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/cltz-max.c: New test.
* gcc.dg/tree-ssa/clz-char.c: New test.
* gcc.dg/tree-ssa/clz-int.c: New test.
* gcc.dg/tree-ssa/clz-long-long.c: New test.
* gcc.dg/tree-ssa/clz-long.c: New test.
* gcc.dg/tree-ssa/ctz-char.c: New test.
* gcc.dg/tree-ssa/ctz-int.c: New test.
* gcc.dg/tree-ssa/ctz-long-long.c: New test.
* gcc.dg/tree-ssa/ctz-long.c: New test.