I originally defined std::remove_cvref_t in terms of the internal
__remove_cvref_t trait, to avoid instantiating the remove_cvref class
template. However, as described in P1715R0 that is observable by users
and is thus non-conforming.
This defines remove_cvref_t as specified in the standard.
libstdc++-v3/ChangeLog:
* include/std/type_traits (remove_cvref_t): Define in terms of
remove_cvref.
* testsuite/20_util/remove_cvref/value.cc: Check alias.
Various functions in phiopt are also called with a gphi * but use
gimple * argument for it.
2021-05-06 Jakub Jelinek <jakub@redhat.com>
* tree-ssa-phiopt.c (value_replacement, minmax_replacement,
abs_replacement, xor_replacement,
cond_removal_in_popcount_clz_ctz_pattern,
replace_phi_edge_with_variable): Change type of phi argument from
gimple * to gphi *.
We already take care to not apply loop splitting to IL produced
by splitting so we should be able to delay updating SSA and
loop-closed SSA that was left broken after loop versioning
until after we processed all opportunities.
2021-05-06 Richard Biener <rguenther@suse.de>
* tree-ssa-loop-split.c (split_loop): Delay updating SSA form.
Output an opt-info message.
(do_split_loop_on_cond): Likewise.
(tree_ssa_split_loops): Update SSA form here.
While doing bogus call LHS removal I noticed that cloning with
dropping a return value creates a bogus replacement for a
DECL_BY_REFERENCE DECL_RESULT, resulting in MEM_REFs of
aggregates rather than pointers. The following fixes this
latent issue.
2021-05-06 Richard Biener <rguenther@suse.de>
* tree-inline.c (tree_function_versioning): Fix DECL_BY_REFERENCE
return variable removal.
The new test gcc.c-torture/execute/ieee/cdivchkld.c needs fmaxl(),
which may not be available, for instance on aarch64-elf with newlib.
As discussed in the PR, requiring c99_runtime enables to skip the test
in this case.
2021-05-06 Christophe Lyon <christophe.lyon@linaro.org>
PR testsuite/100355
gcc/testsuite/
* gcc.c-torture/execute/ieee/cdivchkld.x: New.
The builtin vec_permi is peculiar in that its immediate operand is
encoded differently than the immediate operand that is backing the
builtin. This fixes the check for the immediate operand, adding a
regression test in the process.
This partially reverts commit 3191c1f448
2021-05-06 Marius Hillenbrand <mhillen@linux.ibm.com>
gcc/ChangeLog:
* config/s390/s390-builtins.def (O_M5, O1_M5, ...): Remove unused macros.
(s390_vec_permi_s64, s390_vec_permi_b64, s390_vec_permi_u64)
(s390_vec_permi_dbl, s390_vpdi): Use the O3_U2 type for the immediate
operand.
* config/s390/s390.c (s390_const_operand_ok): Remove unused
values.
gcc/testsuite/ChangeLog:
* gcc.target/s390/zvector/imm-range-error-1.c: Fix test for
__builtin_s390_vpdi.
* gcc.target/s390/zvector/vec-permi.c: New test for builtin
vec_permi.
genericize_spaceship genericizes i <=> j to approximately
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
for strong ordering and
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; else c = 2; c; })
for partial ordering.
The C++ standard supports then == or != comparisons of that against
strong/partial ordering enums, or </<=/==/!=/>/>= comparisons of <=> result
against literal 0.
In some cases we already optimize that but in many cases we keep performing
all the 2 or 3 comparisons, compute the spaceship value and then compare
that.
The following patch recognizes those patterns if the <=> operands are
integral types or floating point (the latter only for -ffast-math) and
optimizes it to the single comparison that is needed (plus adds debug stmts
if needed for the spaceship result).
There is one thing I'd like to address in a follow-up: the pr94589-2.C
testcase should be matching just 12 times each, but runs
into operator>=(partial_ordering, unspecified) being defined as
(_M_value&1)==_M_value
rather than _M_value>=0. When not honoring NaNs, the 2 case should be
unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
but is not a single use but two uses. I'll need to pattern match that case
specially.
2021-05-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94589
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
spaceship_replacement.
(cond_only_block_p, spaceship_replacement): New functions.
* gcc.dg/pr94589-1.c: New test.
* gcc.dg/pr94589-2.c: New test.
* gcc.dg/pr94589-3.c: New test.
* gcc.dg/pr94589-4.c: New test.
* g++.dg/opt/pr94589-1.C: New test.
* g++.dg/opt/pr94589-2.C: New test.
* g++.dg/opt/pr94589-3.C: New test.
* g++.dg/opt/pr94589-4.C: New test.
emutls figured that tls uses in debug-insns need lowering but
that obviously has effects on code-generation as can be seen
in the following IL diff with the new testcase:
<bb 2> [local count: 1073741824]:
- a = 0;
+ # DEBUG BEGIN_STMT
_4 = __builtin___emutls_get_address (&__emutls_v.b);
+ # DEBUG D#1 => *_4
+ # DEBUG d => (long int) D#1
+ # DEBUG BEGIN_STMT
+ a = 0;
+ # DEBUG BEGIN_STMT
*_4 = 0;
return;
where it figured the debug use of b in the original
<bb 2> [local count: 1073741824]:
# DEBUG BEGIN_STMT
# DEBUG D#1 => b
# DEBUG d => (long int) D#1
# DEBUG BEGIN_STMT
a = 0;
needs lowering (it maybe does when we want to produce perfect
debug but that's just bad luck).
The following patch fixes this by avoiding to create a new
emutls address when visiting debug stmts and instead resets them.
Another option might be to simply not lower debug stmt uses
but I have no way to verify actual debug info for this.
2021-05-05 Richard Biener <rguenther@suse.de>
PR ipa/100373
* tree-emutls.c (gen_emutls_addr): Pass in whether we're
dealing with a debug use and only query existing addresses
if so.
(lower_emutls_1): Avoid splitting out addresses for debug
stmts, reset the debug stmt when we fail to find existing
lowered addresses.
(lower_emutls_phi_arg): Set wi.stmt.
* gcc.dg/pr100373.c: New testcase.
gcc/ada/
* exp_disp.adb (Build_Class_Wide_Check): Extending the
functionality of this routine to climb to the ancestors
searching for the enclosing overridden dispatching primitive
that has a class-wide precondition to generate the check.
gcc/ada/
* exp_ch4.adb (Expand_N_If_Expression):
Apply_Arithmetic_Overflow_Check will not deal with
Then/Else_Actions so skip minimizing overflow checks if any
actions are present.
gcc/ada/
* sem_res.adb (First_Last_Ref): Simplify "if [condition] then
return True" in "return [condition]".
(Resolve_Range): Remove calls appearing in IF condition from the
THEN statements.
gcc/ada/
* sem_case.adb (Missing_Choice): Fix typo in comment.
(Lit_Of): Simplify with Make_Character_Literal.
(Check_Choices): Remove extra spaces in parameter
specifications.
* sem_case.ads: Same reformatting.
gcc/ada/
* exp_aggr.adb (Expand_Array_Aggregate): If the expression in an
Others_Clause has not been analyzed because previous analysis of
the enclosing aggregate showed the clause to be ineffective i.e.
cover a null range, analyze it now to detect a possible type
illegality.
gcc/ada/
* sem_ch4.adb (Analyze_Selected_Component): Remove explicit call
to Set_Raises_Constraint_Error on statically missing component.
* sem_eval.adb (Eval_Arithmetic_Op): Likewise for static
divisions by integer and real zeros.
* sem_util.adb (Apply_Compile_Time_Constraint_Error): Call
Set_Raises_Constraint_Error before exiting early in GNATprove
mode.
gcc/ada/
* doc/gnat_rm/implementation_defined_characteristics.rst (3.5.7):
Mention the IEEE standard explicitly. Use current format names.
Document assumed rounding mode and new features of I/O support.
* gnat_rm.texi: Regenerate.
gcc/ada/
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Add
mention of underscore and fix grammar error in doc for -gnatd.
* gnat_ugn.texi: Regenerate.
gcc/ada/
* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add s-exponr, s-exnflt
and s-exnlfl.
* exp_ch4.adb (Expand_N_Op_Expon): Use RE_Exn_Float for Short_Float.
* rtsfind.ads (RTU_Id): Add System_Exn_Flt and System_Exn_LFlt.
(RE_Id): Adjust entries for RE_Exn_Float and RE_Exn_Long_Float.
(RE_Unit_Table): Likewise.
* libgnat/s-exnflt.ads: New file.
* libgnat/s-exnlfl.ads: Likewise.
* libgnat/s-exnllf.ads: Change to mere instantiation.
* libgnat/s-exnllf.adb: Move implementation to...
* libgnat/s-exponr.ads: New generic unit.
* libgnat/s-exponr.adb: ...here and also make it generic.
(Expon): Do the computation in double precision internally.
gcc/ada/
* sem_res.adb (Resolve_If_Expression): If the context of the
expression is an indexed_component, resolve the expression and
its dependent_expressions with the base type of the index, to
ensure that an index check is generated when resolving the
enclosing indexxed_component, and avoid an improper use of
discriminants out of scope, when the index type is
discriminant-dependent.
On RISC-V we are facing the fact, that our conditional branches
require Pmode conditions. Currently, we generate them explicitly
with a check for Pmode and then calling the proper generator
(i.e. gen_cbranchdi4 on RV64 and gen_cbranchsi4 on RV32).
Let's simplify this code by generating the INSN helpers
and use gen_cbranch4 (Pmode).
gcc/
PR target/100266
* config/riscv/riscv.c (riscv_block_move_loop): Use cbranch helper.
* config/riscv/riscv.md (cbranch<mode>4): Generate helpers.
(stack_protect_test): Use cbranch helper.
This is a regression for 64-bit Windows present from mainline down to the 9
branch and introduced by the fix for PR target/99234. Again SEH, but with
a twist related to the way MinGW implements setjmp/longjmp, which turns out
to be piggybacked on SEH with recent versions of MinGW, i.e. the longjmp
performs a bona-fide unwinding of the stack, because it calls RtlUnwindEx
with the second argument initially passed to setjmp, which is the result of
__builtin_frame_address (0) in the MinGW header file:
define setjmp(BUF) _setjmp((BUF), __builtin_frame_address (0))
This means that we directly expose the frame pointer to the SEH machinery
here (unlike with regular exception handling where we use an intermediate
CFA) and thus that we cannot do whatever we want with it. The old code
would leave it unaligned, i.e. not multiple of 16, whereas the new code
aligns it, but this breaks for some reason; at least it appears that a
.seh_setframe directive with 0 as second argument always works, so the
fix aligns it this way.
gcc/
PR target/100402
* config/i386/i386.c (ix86_compute_frame_layout): For a SEH target,
always return the establisher frame for __builtin_frame_address (0).
gcc/testsuite/
* gcc.c-torture/execute/20210505-1.c: New test.
GCC -O2 generated quite bad code for this function:
bool
f (void)
{
return __builtin_cpu_supports("popcnt")
&& __builtin_cpu_supports("ssse3");
}
f:
movl __cpu_model+12(%rip), %edx
movl %edx, %eax
shrl $6, %eax
andl $1, %eax
andl $4, %edx
movl $0, %edx
cmove %edx, %eax
ret
The problem was caused by the fact that internally every invocation of
__builtin_cpu_supports built a new variable __cpu_model and a new type
__processor_model. Because of this, GIMPLE level optimizers weren't able
to CSE the loads of __cpu_model and optimize bit-operations properly.
Improve GCC -O2 code generation by caching __cpu_model and__cpu_features2
variables as well as their types:
f:
movl __cpu_model+12(%rip), %eax
andl $68, %eax
cmpl $68, %eax
sete %al
ret
2021-05-05 Ivan Sorokin <vanyacpp@gmail.com>
H.J. Lu <hjl.tools@gmail.com>
gcc/
PR target/91400
* config/i386/i386-builtins.c (ix86_cpu_model_type_node): New.
(ix86_cpu_model_var): Likewise.
(ix86_cpu_features2_type_node): Likewise.
(ix86_cpu_features2_var): Likewise.
(fold_builtin_cpu): Cache __cpu_model and __cpu_features2 with
their types.
gcc/testsuite/
PR target/91400
* gcc.target/i386/pr91400-1.c: New test.
* gcc.target/i386/pr91400-2.c: Likewise.
These constraints are already present on the template we're partially
specializing for.
libstdc++-v3/ChangeLog:
* include/bits/ranges_util.h (enable_borrowed_range<subrange>):
Remove constraints on this partial specialization.
* include/std/ranges (enable_borrowed_range<iota_view>):
Likewise.
libstdc++-v3/ChangeLog:
* include/std/ranges (transform_view::_Iterator::iter_swap):
Remove as per LWG 3520.
(join_view::_Iterator::iter_swap): Add indirectly_swappable
constraint as per LWG 3517.
For move2add_valid_value_p we also have to ask the target whether a
register can be accessed in a different mode than it was set before.
gcc/ChangeLog:
PR rtl-optimization/100263
* postreload.c (move2add_valid_value_p): Ensure register can
change mode.
This is the bootstrap failure of GCC 11 on MinGW64 configured with --enable-
tune=nocona. The bottom line is that SEH does not support CFI for epilogues
but the x86 back-end nevertheless attaches it to instructions, so we have to
filter it out and this is done by detecting the end of the prologue by means
of the NOTE_INSN_PROLOGUE_END note.
But the compiler manages to generate a second epilogue before this note in
the RTL stream and this fools the aforementioned logic. The root cause is
cross-jumping, which inserts a jump before the end of the prologue, in fact
just before the note; the rest (CFG cleanup, BB reordering, etc) is downhill
from there.
gcc/
PR rtl-optimization/100411
* cfgcleanup.c (try_crossjump_to_edge): Also skip end of prologue
and beginning of function markers.