Instead of selecting bits 62 to (wraparound) 59 from r2 and inserting them
into r3, we now select bits 60 to 62 from r3 and insert them into r2.
Adjust the test accordingly.
gcc/testsuite/ChangeLog:
* gcc.target/s390/risbg-ll-3.c: Change match pattern.
When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions. But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration. The pointer
to the task reduction private variables is reused: on input it is the alignment
and only on output it is the pointer, so in the case of a taskloop with no
iterations the caller attempts to dereference the alignment value as if it were
a pointer and crashes. We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.
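
The failing shape can be sketched as follows. This is a minimal
illustration, not the committed task-reduction-4.c test; the function
name sum_first is hypothetical. When n is 0 the taskloop has no
iterations, so the reduction variable must simply keep its initial value.
Built without -fopenmp the pragmas are ignored and the loop runs serially.

```cpp
// Sum 0 .. n-1 with a taskloop reduction.  For n <= 0 the loop body never
// runs, which is the early-return path in GOMP_taskloop described above.
long sum_first(int n)
{
  long sum = 0;
#pragma omp parallel
#pragma omp single
  {
#pragma omp taskloop reduction(+ : sum)
    for (int i = 0; i < n; i++)
      sum += i;
  }
  return sum;
}
```

Calling sum_first(0) exercises the zero-iteration path; sum_first(5)
exercises the normal reduction path.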
2021-05-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100471
* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
is 0, bypass the reduction loop including
GOMP_taskgroup_reduction_unregister call.
* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
reduction pointer.
* testsuite/libgomp.c/task-reduction-4.c: New test.
This patch teaches rs6000_density_test to only care about the vector
version cost calculation and to return early when calculating the single
scalar iteration cost.
Bootstrapped/regtested on powerpc64le-linux-gnu P9.
gcc/ChangeLog:
* config/rs6000/rs6000.c (struct rs6000_cost_data): New member
costing_for_scalar.
(rs6000_density_test): Early return if costing_for_scalar is true.
(rs6000_init_cost): Init costing_for_scalar of rs6000_cost_data.
The rs6000 port function rs6000_density_test wants to distinguish whether
the current cost model is for the scalar version of a loop or block, or
for the vector version. As Richi suggested, this patch introduces a
new parameter costing_for_scalar to the init_cost hook to pass this
information down explicitly.
gcc/ChangeLog:
* doc/tm.texi: Regenerated.
* target.def (init_cost): Add new parameter costing_for_scalar.
* targhooks.c (default_init_cost): Adjust for new parameter.
* targhooks.h (default_init_cost): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
(vect_compute_single_scalar_iteration_cost): Likewise.
(vect_analyze_loop_2): Likewise.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
* tree-vectorizer.h (init_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
* config/i386/i386.c (ix86_init_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
This patch moves rs6000_vect_nonmem (target cost_data related
information) into the target cost_data struct.
As Richi pointed out, we can gather data from add_stmt_cost
invocations. This is a preparatory step towards centralizing
target cost_data related information.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_vect_nonmem): Renamed to
vect_nonmem and moved into...
(struct rs6000_cost_data): ...here.
(rs6000_init_cost): Use vect_nonmem of cost_data instead.
(rs6000_add_stmt_cost): Likewise.
(rs6000_finish_cost): Likewise.
This unconditionally enables the maybe_save_operator_binding mechanism
for all function templates, so that when resolving a dependent operator
expression from a function template we ignore later-declared
namespace-scope bindings that weren't visible at template definition
time. This patch additionally makes the mechanism apply to dependent
comma and compound-assignment operator expressions.
Note that this doesn't fix the testcases in PR83035 or PR99692 because
there the dependent operator expressions aren't at function scope. I'm
not sure how to adapt this mechanism for these testcases, since although
we'll in both testcases have a TEMPLATE_DECL to associate the lookup
result with, at instantiation time we won't have an appropriate binding
level to push to.
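
For contrast, here is a sketch of the case that must keep working: an
operator declared after the template but in a namespace associated with
the argument type is still found by argument-dependent lookup at
instantiation time. The names N, S and sum_v are hypothetical and are not
taken from operator-3.C. A later operator declared in an unrelated
namespace (for example at global scope) is the kind of binding this patch
makes GCC ignore.

```cpp
namespace N {
  struct S { int v; };
}

// 'a + b' is a dependent operator expression: it is resolved when the
// template is instantiated, using ADL on the argument types.
template <typename T>
int sum_v(T a, T b)
{
  return (a + b).v;
}

namespace N {
  // Declared after the template, but in S's own namespace, so ADL at the
  // point of instantiation still finds it.
  S operator+(S a, S b) { return S{a.v + b.v}; }
}
```

With these declarations, sum_v(N::S{1}, N::S{2}) resolves to
N::operator+.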
gcc/cp/ChangeLog:
PR c++/51577
* name-lookup.c (maybe_save_operator_binding): Unconditionally
enable for all function templates, not just generic lambdas.
Handle compound-assignment operator expressions.
* typeck.c (build_x_compound_expr): Call maybe_save_operator_binding
in the type-dependent case.
(build_x_modify_expr): Likewise. Move declaration of 'op' closer
to its first use.
gcc/testsuite/ChangeLog:
PR c++/51577
* g++.dg/lookup/operator-3.C: New test.
This PR is about CTAD but the underlying problems are more general;
CTAD is a good trigger for them because of the necessary substitution
into constraints that deduction guide generation entails.
In the testcase below, when generating the implicit deduction guide for
the constrained constructor template for A, we substitute the generic
flattening map 'tsubst_args' into the constructor's constraints. During
this substitution, tsubst_pack_expansion returns a rebuilt pack
expansion for sizeof...(xs), but doesn't carry over the
PACK_EXPANSION_LOCAL_P (and PACK_EXPANSION_SIZEOF_P) flag from the
original tree to the rebuilt one. The flag is otherwise unset on the
original tree but gets set for the rebuilt tree from make_pack_expansion
since at_function_scope_p() is true (we're inside main). This leads to
a crash during satisfaction when substituting into the pack expansion
because we don't have local_specializations set up (and it'd be set up
for us if PACK_EXPANSION_LOCAL_P is unset).
Similarly, tsubst_constraint needs to set cp_unevaluated so that the
substitution performed therein doesn't rely on local_specializations.
This avoids a crash during CTAD for C below.
gcc/cp/ChangeLog:
PR c++/100138
* constraint.cc (tsubst_constraint): Set up cp_unevaluated.
(satisfy_atom): Set up iloc_sentinel before calling
cxx_constant_value.
* pt.c (tsubst_pack_expansion): When returning a rebuilt pack
expansion, carry over PACK_EXPANSION_LOCAL_P and
PACK_EXPANSION_SIZEOF_P from the original pack expansion.
gcc/testsuite/ChangeLog:
PR c++/100138
* g++.dg/cpp2a/concepts-ctad4.C: New test.
gcc/ada/
PR bootstrap/100506
* Make-generated.in: Replace version.c with ada/version.c.
* gcc-interface/Make-lang.in: Add version.o to GNAT1_C_OBJS.
Add version.o to GNAT_ADA_OBJS and GNATBIND_OBJS.
* gcc-interface/Makefile.in: Add version.o to TOOLS_LIBS.
* gnatvsn.adb: Start using a new C symbol gnat_version_string.
* version.c: New file.
This pragma is relatively recent and may be problematic for the bootstrap.
gcc/ada/
* atree.ads (Slot): Remove pragma Provide_Shift_Operators.
(Shift_Left): New intrinsic function.
(Shift_Right): Likewise.
* atree.adb (Get_1_Bit_Val): Use Natural instead of Integer.
(Get_2_Bit_Val): Likewise.
(Get_4_Bit_Val): Likewise.
(Get_8_Bit_Val): Likewise.
(Set_1_Bit_Val): Likewise.
(Set_2_Bit_Val): Likewise.
(Set_4_Bit_Val): Likewise.
(Set_8_Bit_Val): Likewise.
This uses the same mechanism as for ada/snames and ada/sdefault to avoid
spurious rebuild actions in the ada/gen_il directory. This also avoids
copying some files into the generated directory, which is unnecessary.
gcc/ada/
* Make-generated.in (do_gen_il): Replace with...
(ada/stamp-gen_il): ...this. Do not copy files into generated/.
The Ada testcase happens to stumble on the call to gcc_unreachable in
operator_bitwise_xor::op1_range, but there is nothing wrong going on
and it's safe to let it go through.
gcc/
* range-op.cc (get_bool_state): Adjust head comment.
(operator_not_equal::op1_range): Fix comment.
(operator_bitwise_xor::op1_range): Remove call to gcc_unreachable.
gcc/testsuite/
* gnat.dg/specs/opt5.ads: New test.
* gnat.dg/specs/opt5_pkg.ads: New helper.
We have a comment saying to replace the simple binary_semaphore type
with std::binary_semaphore, which has been done. However, that isn't
defined on all targets. So keep the simple one here that just implements
the parts of the API needed by <stop_token>, and remove the comment
suggesting it should be replaced.
libstdc++-v3/ChangeLog:
* include/std/stop_token: Remove TODO comment.
This has been tentatively approved by LWG. The deleter from a unique_ptr
can be moved into the shared_ptr (at least, since LWG 2802). This uses
std::forward<_Del>(__r.get_deleter()) not std::move(__r.get_deleter())
because we don't want to convert the deleter to an rvalue when _Del is
an lvalue reference type.
This also adds a missing is_move_constructible_v<D> constraint to the
shared_ptr(unique_ptr<Y, D>&&) constructor, which is inherited from the
shared_ptr(Y*, D) constructor due to the use of "equivalent to" in the
specified effects.
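
The distinction between std::forward and std::move here can be sketched
with a small user-level example. CountingDeleter and the two run_*
functions are hypothetical names for illustration; this is not the
libstdc++ implementation or the lwg3548.cc test. The static assertions
show that forwarding with a reference deleter type yields an lvalue,
while a non-reference deleter type yields an rvalue.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

struct CountingDeleter {
  int* calls;  // counts how many times the deleter ran
  void operator()(int* p) const { ++*calls; delete p; }
};

// _Del = CountingDeleter&: forward keeps an lvalue, so nothing is moved from.
static_assert(std::is_lvalue_reference<
                decltype(std::forward<CountingDeleter&>(
                  std::declval<CountingDeleter&>()))>::value,
              "reference deleter stays an lvalue");
// _Del = CountingDeleter: forward yields an rvalue, so the deleter is moved.
static_assert(std::is_rvalue_reference<
                decltype(std::forward<CountingDeleter>(
                  std::declval<CountingDeleter&>()))>::value,
              "value deleter becomes an rvalue");

int run_value_deleter()
{
  int calls = 0;
  {
    std::unique_ptr<int, CountingDeleter> up(new int(1),
                                             CountingDeleter{&calls});
    std::shared_ptr<int> sp(std::move(up));  // deleter goes into sp
  }
  return calls;
}

int run_reference_deleter()
{
  int calls = 0;
  CountingDeleter d{&calls};
  {
    std::unique_ptr<int, CountingDeleter&> up(new int(2), d);
    std::shared_ptr<int> sp(std::move(up));  // sp refers to d, d is untouched
  }
  return calls;
}
```

In both functions the deleter runs exactly once, when the shared_ptr
releases the object.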
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (__shared_count(unique_ptr&&)):
Initialize a non-reference deleter from an rvalue, as per LWG
3548.
(__shared_ptr::_UniqCompatible): Add missing constraint.
* testsuite/20_util/shared_ptr/cons/lwg3548.cc: New test.
* testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc: Check
constraints.
Code that has heavy register pressure on Altivec registers can suffer from
over-aggressive scheduling during sched1, which then leads to increased
register spill. This is due to the fact that registers that prefer
ALTIVEC_REGS are currently assigned an allocno class of VSX_REGS. This then
misleads the scheduler to think there are 64 regs available, when in reality
there are only 32 Altivec regs. This patch fixes the problem by assigning an
allocno class of ALTIVEC_REGS and adding ALTIVEC_REGS as a pressure class.
2021-05-10 Pat Haugen <pthaugen@linux.ibm.com>
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_ira_change_pseudo_allocno_class):
Return ALTIVEC_REGS if that is best_class.
(rs6000_compute_pressure_classes): Add ALTIVEC_REGS.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/fold-vec-insert-float-p9.c: Adjust counts.
* gcc.target/powerpc/vec-rlmi-rlnm.c: Likewise.
The node and edge summaries defined in ipa-prop.h are probably the
oldest in GCC and so it happened that they are the only ones using
macros to look them up and create them. With Honza and Martin we
agreed it is ugly and the macros should be removed and the ipa-prop
summaries should be accessed like all the other ones but somehow I
never got to it until now.
The patch is mostly mechanical. Because the lookup machinery was much
simpler in the old times (something like the fast summaries we have
today), a lot of code queried for the summary multiple times for no
good reasons and I fixed that in places where it was easy.
Also, before we switched to hash based summaries, new summary pointers
had to be obtained whenever the underlying array could be reallocated
because of new cgraph nodes/edges. This is no longer necessary and so
I removed the instances which I found.
Both kinds of these non-mechanical changes should be specifically called
out in the ChangeLog.
I also removed the IS_VALID_JUMP_FUNC_INDEX macro because it is not used
anywhere.
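
The access pattern the patch moves to can be sketched with a toy
hash-based summary. Node, NodeInfo, Summary and describe are hypothetical
stand-ins and do not reproduce GCC's real summary classes; the sketch only
shows a single lookup whose result is reused, and that pointers stay valid
when further entries are added.

```cpp
#include <memory>
#include <unordered_map>

struct Node { int id; };  // stand-in for a call-graph node

struct NodeInfo {
  int param_count = 0;
  bool analyzed = false;
};

class Summary {
public:
  // Returns the existing summary or nullptr; never allocates.
  NodeInfo* get(const Node* n)
  {
    auto it = map_.find(n);
    return it == map_.end() ? nullptr : it->second.get();
  }

  // Returns the existing summary, creating an empty one if needed.
  NodeInfo* get_create(const Node* n)
  {
    std::unique_ptr<NodeInfo>& slot = map_[n];
    if (!slot)
      slot.reset(new NodeInfo());
    return slot.get();
  }

private:
  // Each summary lives in its own allocation, so rehashing the map does
  // not invalidate pointers handed out earlier.
  std::unordered_map<const Node*, std::unique_ptr<NodeInfo>> map_;
};

// Query once and reuse the pointer instead of looking it up per field.
int describe(Summary& sum, const Node* n)
{
  NodeInfo* info = sum.get(n);
  if (!info)
    return -1;
  return info->analyzed ? info->param_count : 0;
}
```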
gcc/ChangeLog:
2021-05-07 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (IPA_NODE_REF): Removed.
(IPA_NODE_REF_GET_CREATE): Likewise.
(IPA_EDGE_REF): Likewise.
(IPA_EDGE_REF_GET_CREATE): Likewise.
(IS_VALID_JUMP_FUNC_INDEX): Likewise.
* ipa-cp.c (print_all_lattices): Replaced IPA_NODE_REF with a direct
use of ipa_node_params_sum.
(ipcp_versionable_function_p): Likewise.
(push_node_to_stack): Likewise.
(pop_node_from_stack): Likewise.
(set_single_call_flag): Replaced two IPA_NODE_REF with one single
direct use of ipa_node_params_sum.
(initialize_node_lattices): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(ipa_context_from_jfunc): Replaced IPA_EDGE_REF with a direct use of
ipa_edge_args_sum.
(ipcp_verify_propagated_values): Replaced IPA_NODE_REF with a direct
use of ipa_node_params_sum.
(self_recursively_generated_p): Likewise.
(propagate_scalar_across_jump_function): Likewise.
(propagate_context_across_jump_function): Replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum, moved the lookup after the early
exit. Replaced IPA_NODE_REF with a direct use of ipa_node_params_sum.
(propagate_bits_across_jump_function): Replaced IPA_NODE_REF with
direct uses of ipa_node_params_sum.
(propagate_vr_across_jump_function): Likewise.
(propagate_aggregate_lattice): Likewise.
(propagate_aggs_across_jump_function): Likewise.
(propagate_constants_across_call): Likewise, also replaced
IPA_EDGE_REF with a direct use of ipa_edge_args_sum.
(good_cloning_opportunity_p): Replaced IPA_NODE_REF with a direct use
of ipa_node_params_sum.
(estimate_local_effects): Likewise.
(add_all_node_vals_to_toposort): Likewise.
(propagate_constants_topo): Likewise.
(ipcp_propagate_stage): Likewise.
(ipcp_discover_new_direct_edges): Likewise.
(calls_same_node_or_its_all_contexts_clone_p): Likewise.
(cgraph_edge_brings_value_p): Likewise (in both overloaded functions).
(get_info_about_necessary_edges): Likewise.
(want_remove_some_param_p): Likewise.
(create_specialized_node): Likewise.
(self_recursive_pass_through_p): Likewise.
(self_recursive_agg_pass_through_p): Likewise.
(find_more_scalar_values_for_callers_subset): Likewise and also
replaced IPA_EDGE_REF with direct uses of ipa_edge_args_sum, in one
case replacing two of those with a single query.
(find_more_contexts_for_caller_subset): Likewise for the
ipa_polymorphic_call_context overload.
(intersect_aggregates_with_edge): Replaced IPA_EDGE_REF with a direct
use of ipa_edge_args_sum. Replaced IPA_NODE_REF with direct uses of
ipa_node_params_sum.
(find_aggregate_values_for_callers_subset): Likewise, also reusing
results of ipa_edge_args_sum->get.
(cgraph_edge_brings_all_scalars_for_node): Replaced IPA_NODE_REF with
direct uses of ipa_node_params_sum, replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum.
(cgraph_edge_brings_all_agg_vals_for_node): Likewise, moved node
summary query after the early exit and reused the result later.
(decide_about_value): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(decide_whether_version_node): Likewise. Removed re-querying for
summaries after cloning.
(spread_undeadness): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(has_undead_caller_from_outside_scc_p): Likewise, reusing results of
some queries.
(identify_dead_nodes): Likewise.
(ipcp_store_bits_results): Replaced IPA_NODE_REF with direct uses of
ipa_node_params_sum.
(ipcp_store_vr_results): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(analyze_function_body): Likewise.
(estimate_calls_size_and_time): Likewise.
(ipa_cached_call_context::duplicate_from): Likewise.
(ipa_call_context::equal_to): Likewise.
(remap_edge_params): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(inline_read_section): Likewise.
* ipa-icf.c (sem_function::param_used_p): Likewise.
* ipa-modref.c (compute_parm_map): Likewise.
(compute_parm_map): Replaced IPA_EDGE_REF with a direct use of
ipa_edge_args_sum.
(get_access_for_fnspec): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum and replaced IPA_EDGE_REF with a direct use of
ipa_edge_args_sum.
* ipa-profile.c (check_argument_count): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Replaced IPA_NODE_REF_GET_CREATE
with a direct use of ipa_node_params_sum.
(ipa_initialize_node_params): Likewise.
(ipa_print_node_jump_functions_for_edge): Replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum and reused the query result.
(ipa_compute_jump_functions_for_edge): Replaced IPA_NODE_REF with a
direct use of ipa_node_params_sum and replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum.
(ipa_note_param_call): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum and reused the result of the query.
(ipa_analyze_node): Likewise.
(ipa_analyze_controlled_uses): Replaced IPA_NODE_REF with a direct use
of ipa_node_params_sum.
(update_jump_functions_after_inlining): Replaced IPA_EDGE_REF with
direct uses of ipa_edge_args_sum.
(update_indirect_edges_after_inlining): Replaced IPA_NODE_REF with
direct uses of ipa_node_params_sum and replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum. Removed superfluous re-querying of the
top edge summary.
(propagate_controlled_uses): Replaced IPA_NODE_REF with direct uses of
ipa_node_params_sum and replaced IPA_EDGE_REF with a direct use of
ipa_edge_args_sum.
(ipa_propagate_indirect_call_infos): Replaced IPA_EDGE_REF with a
direct use of ipa_edge_args_sum.
(ipa_edge_args_sum_t::duplicate): Replaced IPA_NODE_REF with a direct
use of ipa_node_params_sum.
(ipa_print_node_params): Likewise.
(ipa_write_node_info): Likewise and also replaced IPA_EDGE_REF with
direct uses of ipa_edge_args_sum.
(ipa_read_edge_info): Replaced IPA_EDGE_REF with a direct use of
ipa_edge_args_sum.
(ipa_read_node_info): Replaced IPA_NODE_REF with a direct use of
ipa_node_params_sum.
(ipa_prop_write_jump_functions): Likewise. Move variable node to the
scopes where it is used.
For some reason the middle-end does not enforce operand
predicates for vcond patterns.
2021-05-10 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* config/i386/i386-expand.c (ix86_expand_sse_movcc)
<case E_V2SImode>: Force op_true to register.
When PCH is enabled this test file includes <any> and so the
using-directive brings std::any into the global scope. It isn't
currently a problem, because the -std option in the dg-options means
that PCH is not used. If that option is removed, the test fails with PCH
and passes without.
This just renames the type to avoid the name clash (and also the 'none'
type for consistency).
libstdc++-v3/ChangeLog:
* testsuite/20_util/variant/compile.cc: Rename 'any' to avoid
clash with std::any.
contrib/ChangeLog:
* gcc-changelog/git_check_commit.py (__Main__): State in --help
the default value for 'revisions'.
* gcc-changelog/git_email.py (show_help): Add.
(__main__): Handle -h and --help.
After removing the signed and unsigned suffixes in the previous
patches, we can now factorize the vcmp* patterns: there is no longer
an asymmetry where operators do not have the same set of signed and
unsigned variants.
This will make maintenance easier.
MVE has a different set of vector comparison operators than Neon,
so we have to introduce dedicated iterators.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (MVE_COMPARISONS): New.
(mve_cmp_op): New.
(mve_cmp_type): New.
* config/arm/mve.md (mve_vcmp<mve_cmp_op>q_<mode>): New, merge all
mve_vcmp patterns.
(mve_vcmpneq_<mode>, mve_vcmpcsq_n_<mode>, mve_vcmpcsq_<mode>)
(mve_vcmpeqq_n_<mode>, mve_vcmpeqq_<mode>, mve_vcmpgeq_n_<mode>)
(mve_vcmpgeq_<mode>, mve_vcmpgtq_n_<mode>, mve_vcmpgtq_<mode>)
(mve_vcmphiq_n_<mode>, mve_vcmphiq_<mode>, mve_vcmpleq_n_<mode>)
(mve_vcmpleq_<mode>, mve_vcmpltq_n_<mode>, mve_vcmpltq_<mode>)
(mve_vcmpneq_n_<mode>, mve_vcmpltq_n_<mode>, mve_vcmpltq_<mode>)
(mve_vcmpneq_n_<mode>): Remove.
This patch brings more unification in the vector comparison builtins,
by removing the useless 's' (signed) suffix since we no longer need
unsigned versions.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve.h (__arm_vcmp*): Remove 's' suffix.
* config/arm/arm_mve_builtins.def (vcmp*): Remove 's' suffix.
* config/arm/mve.md (mve_vcmp*): Remove 's' suffix in pattern
names.
After the previous patch, we no longer need to emit the unsigned
variants of vcmpneq/vcmpeqq. This patch removes them as well as the
corresponding iterator entries.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve_builtins.def (vcmpneq_u): Remove.
(vcmpneq_n_u): Likewise.
(vcmpeqq_u): Likewise.
(vcmpeqq_n_u): Likewise.
* config/arm/iterators.md (supf): Remove VCMPNEQ_U, VCMPEQQ_U,
VCMPEQQ_N_U and VCMPNEQ_N_U.
* config/arm/mve.md (mve_vcmpneq): Remove <supf> iteration.
(mve_vcmpeqq_n): Likewise.
(mve_vcmpeqq): Likewise.
(mve_vcmpneq_n): Likewise.
There is no need to have a signed and an unsigned version of these
builtins. This is similar to what we do for Neon in arm_neon.h.
This mechanical patch enables later cleanup patches.
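
The reason equality needs no separate unsigned variant can be checked in
plain scalar code. This sketch does not use the MVE intrinsics; the helper
names are hypothetical. It compares every pair of 8-bit patterns under
both interpretations and also shows that ordering comparisons do depend
on signedness.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the same 8 bits as a signed value.
static std::int8_t as_signed(std::uint8_t x)
{
  std::int8_t s;
  std::memcpy(&s, &x, sizeof s);
  return s;
}

// True if == gives the same answer for both interpretations of all inputs.
bool equality_ignores_signedness()
{
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b) {
      std::uint8_t ua = static_cast<std::uint8_t>(a);
      std::uint8_t ub = static_cast<std::uint8_t>(b);
      if ((ua == ub) != (as_signed(ua) == as_signed(ub)))
        return false;
    }
  return true;
}

// True if < disagrees between the two interpretations for 0x80 and 0x01.
bool ordering_depends_on_signedness()
{
  std::uint8_t a = 0x80, b = 0x01;
  return (a < b) != (as_signed(a) < as_signed(b));
}
```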
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve.h (__arm_vcmpeq*u*, __arm_vcmpne*u*): Call
the 's' version of the builtin.
Support for vmul has been present for a while, but it was lacking a
test for the scalar variant.
This patch adds one, precisely noting that we do not yet use the T2
variants of vmul, which take a scalar as final argument.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
* gcc.target/arm/simd/mve-vmul-scalar-1.c: New.
This patch adds a test similar to mve-vsub_1.c, but it operates on a
scalar as second argument. For the moment we do not select the T2 vsub
variant operating on a scalar final argument, and we use vadd of the
opposite.
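
The equivalence behind "vadd of the opposite" can be sketched in scalar
code. The function names are hypothetical and the loops are not the
contents of mve-vsub-scalar-1.c; unsigned arithmetic is used so that the
negation wraps in a well-defined way.

```cpp
#include <cstddef>
#include <cstdint>

// dst[i] = a[i] - s, the form written in the source.
void sub_scalar(std::uint32_t* dst, const std::uint32_t* a,
                std::uint32_t s, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = a[i] - s;
}

// dst[i] = a[i] + (-s), the form the compiler currently selects.
void add_negated_scalar(std::uint32_t* dst, const std::uint32_t* a,
                        std::uint32_t s, std::size_t n)
{
  const std::uint32_t neg = 0u - s;  // two's complement negation
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = a[i] + neg;
}
```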
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
* gcc.target/arm/simd/mve-vsub-scalar-1.c: New test.
contrib/ChangeLog:
* gcc-changelog/git_commit.py (Error.__repr__): Add space after the colon.
(GitCommit.check_mentioned_files): Check whether the same file has been
specified multiple times.
* gcc-changelog/test_email.py (TestGccChangelog.test_multi_same_file): New.
* gcc-changelog/test_patches.txt (0001-OpenMP-Fix-SIMT): New test.
This makes sure to align data so targets without unaligned
accesses can vectorize it.
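
One way to request such alignment in C++ is alignas. This is a sketch
only; the committed change to slp-pr99971.cc may use a different spelling
(for example a GNU attribute), and Samples and is_aligned are hypothetical
names.

```cpp
#include <cstddef>
#include <cstdint>

// Force the array to start on a 16-byte boundary so that aligned vector
// loads and stores are possible.
struct Samples {
  alignas(16) int data[8];
};

// True if p is a multiple of 'alignment' bytes.
bool is_aligned(const void* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```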
2021-05-10 Richard Biener <rguenther@suse.de>
PR testsuite/100452
* g++.dg/vect/slp-pr99971.cc: Align data.
When we distribute away a condition we rely on the ability to
change it to either 1 != 0 or 0 != 0 depending on the direction
of the exit branch in the respective loop. But when the loop
contains an irreducible sub-region then for the conditions inside
it this fails and can lead to infinite loops being generated.
Avoid distributing loops with irreducible sub-regions.
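
For readers unfamiliar with the term, an irreducible region is a cycle in
the control-flow graph with more than one entry. The sketch below is a
hypothetical illustration and is not the pr100492.c testcase: the cycle
formed by the labels top and middle can be entered at either label.

```cpp
// The cycle top -> middle -> top has two entries: fall-through into 'top'
// and the goto into 'middle', so it has no single loop header.
int irreducible_count(bool enter_in_middle, int n)
{
  int i = 0, steps = 0;
  if (enter_in_middle)
    goto middle;
top:
  ++steps;
middle:
  ++i;
  if (i < n)
    goto top;
  return steps;
}
```

Entering at middle skips the first increment of steps, so the two entry
points give different counts for the same n.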
2021-05-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/100492
* tree-loop-distribution.c (find_seed_stmts_for_distribution):
Find nothing when the loop contains an irreducible region.
* gcc.dg/torture/pr100492.c: New testcase.
Fixes regression where the qualifier was ignored in an alias definition
if parentheses were not present.
Reviewed-on: https://github.com/dlang/dmd/pull/12504
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd b7d146c4c.