Most uses of rs6000_pcrel_p are called for the current function.
A specialized version for cfun is more efficient for these uses.
2020-09-16 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/predicates.md (current_file_function_operand):
Remove argument from rs6000_pcrel_p call.
* config/rs6000/rs6000-logue.c (rs6000_decl_ok_for_sibcall):
Likewise.
(rs6000_global_entry_point_prologue_needed_p): Likewise.
(rs6000_output_function_prologue): Likewise.
* config/rs6000/rs6000-protos.h (rs6000_function_pcrel_p): New
prototype.
(rs6000_pcrel_p): Remove argument.
* config/rs6000/rs6000.c (rs6000_legitimize_tls_address): Remove
argument from rs6000_pcrel_p call.
(rs6000_call_template_1): Likewise.
(rs6000_indirect_call_template_1): Likewise.
(rs6000_longcall_ref): Likewise.
(rs6000_call_aix): Likewise.
(rs6000_sibcall_aix): Likewise.
(rs6000_function_pcrel_p): Rename from rs6000_pcrel_p.
(rs6000_pcrel_p): Rewrite.
* config/rs6000/rs6000.md (*pltseq_plt_pcrel<mode>): Remove
argument from rs6000_pcrel_p call.
(*call_local<mode>): Likewise.
(*call_value_local<mode>): Likewise.
(*call_nonlocal_aix<mode>): Likewise.
(*call_value_nonlocal_aix<mode>): Likewise.
(*call_indirect_pcrel<mode>): Likewise.
(*call_value_indirect_pcrel<mode>): Likewise.
Here we ICE in char_span::subspan because the offset it gets is -1.
It's -1 because get_substring_ranges_for_loc gets a location whose
column was 0. That only happens in testcases like the attached where
we're dealing with extremely long lines (at least 4065 chars it seems).
This does happen in practice, though, so it's not just a theoretical
problem (e.g. when building the SU2 suite).
Fixed by checking that the column get_substring_ranges_for_loc gets is
sane, akin to other checks in that function.
gcc/ChangeLog:
PR preprocessor/96935
* input.c (get_substring_ranges_for_loc): Return if start.column
is less than 1.
gcc/testsuite/ChangeLog:
PR preprocessor/96935
* gcc.dg/format/pr96935.c: New test.
Resolves:
PR middle-end/96295 - -Wmaybe-uninitialized warning for range operator with
reference to an empty struct
gcc/ChangeLog:
PR middle-end/96295
* tree-ssa-uninit.c (maybe_warn_operand): Work harder to avoid
warning for objects of empty structs
gcc/testsuite/ChangeLog:
PR middle-end/96295
* g++.dg/warn/Wuninitialized-11.C: New test.
This corrects the earlier problems with removing the template header
from local omp reductions. And it uncovered a latent bug. When we
tsubst such a decl, we immediately tsubst its body.
cp_check_omp_declare_reduction gets a success return value to gate
that instantiation.
udr-2.C got a further error, as the omp checking machinery doesn't
appear to turn the reduction into an error mark when failing. I
didn't dig into that further. udr-3.C appears to have been invalid
and accidentally worked.
gcc/cp/
* cp-tree.h (cp_check_omp_declare_reduction): Return bool.
* semantics.c (cp_check_omp_declare_reduction): Return true on for
success.
* pt.c (push_template_decl_real): OMP reductions do not get a
template header.
(tsubst_function_decl): Remove special casing for local decl omp
reductions.
(tsubst_expr): Call instantiate_body for a local omp reduction.
(instantiate_body): Add nested_p parm, and deal with such
instantiations.
(instantiate_decl): Reject FUNCTION_SCOPE entities, adjust
instantiate_body call.
gcc/testsuite/
* g++.dg/gomp/udr-2.C: Add additional expected error.
libgomp/
* testsuite/libgomp.c++/udr-3.C: Add missing ctor.
This adds the PPC architecture variants for Mach-O libbacktrace.
With this (as for X86 and Arm) when dsymutil is run on the binary
we get a basic usable backtrace.
Testsuite results on powerpc-apple-darwin9 are the same as for X86:
* btest fails (TBC why)
* dwarf5 tests fail because dsymutil does not handle that so far.
libbacktrace/ChangeLog:
* macho.c (MACH_O_CPU_TYPE_PPC): New.
(MACH_O_CPU_TYPE_PPC64): New.
Add compile-tests for powerpc to the Mach-O variants.
This restores the post-order traversal done by cleanup_all_empty_eh in
order to eliminate empty landing pads and also contains a small tweak
to the line debug info to avoid a problematic inheritance for coverage
measurement.
gcc/ChangeLog:
* tree-eh.c (lower_try_finally_dup_block): Backward propagate slocs
to stack restore builtin calls.
(cleanup_all_empty_eh): Do again a post-order traversal of the EH
region tree.
gcc/testsuite/ChangeLog:
* gnat.dg/concat4.adb: New test.
instantiate_body has a local var call 'nested', which indicates that
this instantiation was caused during the body of some function -- not
necessarily its containing scope. That's confusing, let's just use
'current_function_decl' directly. Then we can also simplify the
push_to_top_level logic, which /does/ indicate whether this is an
actual nested function. (C++ does not have nested functions, but OMP
ODRs fall into that category. A follow up patch will use that more
usual meaning of 'nested' wrt to functions.)
gcc/cp/
* pt.c (instantiate_body): Remove 'nested' var, simplify
push_to_top logic.
This refactors instantiate_decl, breaking out the actual instantiation
work to instantiate_body. That'll allow me to address the OMP UDR
issue, but it also means we have slightly neater code in
instantiate_decl anyway.
gcc/cp/
* pt.c (instantiate_body): New, broken out of ..
(instantiate_decl): ... here. Call it.
gcc/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
function.
(vect_analyze_loop_2): Make use of it not to select partial
vectors if no peel is required.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_partial_vectors_p'.
gcc/testsuite/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.
gcc/
PR target/97032
* cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
to true if the stack pointer is clobbered by asm statement.
* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
if the stack pointer is clobbered by asm statement.
gcc/testsuite/
PR target/97032
* gcc.target/i386/pr97032.c: New test.
The following testcases will be simplified by the new rule
(T)(A) +- (T)(B) -> (T)(A +- B), so could not keep code pattern
expected by test-check. Adjust test code to suppress simplification.
2020-09-16 Feng Xue <fxue@os.amperecomputing.com>
gcc/testsuite/
PR testsuite/97066
* gcc.dg/ifcvt-3.c: Modified to suppress simplification.
* gcc.dg/tree-ssa/20030807-10.c: Likewise.
Certain alternatives of *vec_tf_to_v1tf use "v" constraint for its
TFmode source operand. Therefore it is assigned to VEC_REGS class,
and when it is reloaded using *movtf_64, whose relevant alternatives
need FP_REGS, LRA loops and ICE happens. The reason is that register
class mismatch causes LRA to emit another reload, which triggers this
issue again.
Fix by using "f" constraint, which is more appropriate for FP register
pairs anyway.
gcc/ChangeLog:
2020-09-02 Ilya Leoshkevich <iii@linux.ibm.com>
* config/s390/vector.md(*vec_tf_to_v1tf): Use "f" instead of "v"
for the source operand.
This removes STMT_VINFO_NUM_SLP_USES by pushing the setting of
the shared stmt_vec_info vector type to where we actually need it
which is alignment analysis and vectorizable_* analysis (where
we could eventually elide it for non-load/store operations).
In particular "uses" in the cache and in disqualified SLP
subgraphs should no longer provide conflicting vector types
this way.
2020-09-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::num_slp_uses): Remove.
(STMT_VINFO_NUM_SLP_USES): Likewise.
(vect_free_slp_instance): Adjust.
(vect_update_shared_vectype): Declare.
* tree-vectorizer.c (vec_info::~vec_info): Adjust.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vectorizable_live_operation): Use vector type from
SLP_TREE_REPRESENTATIVE.
(vect_transform_loop): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_alignment):
Set the shared vector type.
* tree-vect-slp.c (vect_free_slp_tree): Remove final_p
parameter, remove STMT_VINFO_NUM_SLP_USES updating.
(vect_free_slp_instance): Adjust.
(vect_create_new_slp_node): Remove STMT_VINFO_NUM_SLP_USES
updating.
(vect_update_shared_vectype): Always compare with the
present vector type, update if NULL.
(vect_build_slp_tree_1): Do not update the shared vector
type here.
(vect_build_slp_tree_2): Adjust.
(slp_copy_subtree): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Likewise.
(vect_slp_analyze_node_operations_1): Update the shared
vector type.
(vect_slp_analyze_operations): Adjust.
(vect_slp_analyze_bb_1): Likewise.
2020-09-16 Jakub Jelinek <jakub@redhat.com>
* config/arm/arm.c (arm_option_restore): Comment out opts argument
name to avoid unused parameter warnings.
When working on the previous patch, I've noticed that all cl_optimization
fields appart from strings are streamed with bp_pack_value (..., 64); so we
waste quite a lot of space, given that many of the options are just booleans
or char options and there are 450-ish of them.
Fixed by streaming the number of bits the corresponding fields have.
While for char fields we have also range information, except for 3
it is either -128, 127 or 0, 255, so it didn't seem worth it to bother
with using range-ish packing.
2020-09-16 Jakub Jelinek <jakub@redhat.com>
* optc-save-gen.awk: In cl_optimization_stream_out use
bp_pack_var_len_{int,unsigned} instead of bp_pack_value. In
cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
instead of bp_unpack_value. Formatting fix.
As the testcases show, if we have something like:
MEM <char[12]> [&b + 8B] = {};
MEM[(short *) &b] = 5;
_5 = *x_4(D);
MEM <long long unsigned int> [&b + 2B] = _5;
MEM[(char *)&b + 16B] = 88;
MEM[(int *)&b + 20B] = 1;
then in sort_by_bitpos the stores are almost like in the given order,
except the first store is after the = _5; store.
We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
while the former INTEGER_CST, and we can't coalesce the = _5 store with
the = {} store because the former is MEM_REF, the latter INTEGER_CST.
But we happily coalesce the remaining 3 stores, which is wrong, because the
= _5; store overlaps those and is in between them in the program order.
We already have code to deal with similar cases in check_no_overlap, but we
deal only with the following stores in sort_by_bitpos order, not the earlier
ones.
The following patch checks also the earlier ones. In coalesce_immediate_stores
it computes the first one that needs to be checked (all the ones whose
bitpos + bitsize is smaller or equal to merged_store->start don't need to be
checked and don't need to be checked even for any following attempts because
of the sort_by_bitpos sorting) and the end of that (that is the first store
in the merged_store).
2020-09-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97053
* gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if
any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
be merged store.
(imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
Adjust check_no_overlap caller.
(imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
and last_earlier variables, adjust them during iterations. Adjust
check_no_overlap callers, call check_no_overlap even when extending
overlapping stores by extra INTEGER_CST stores.
* gcc.dg/store_merging_31.c: New test.
* gcc.dg/store_merging_32.c: New test.
This patch is to extend the existing function find_alignment_op to
check all defintions of base_reg are AND operations with mask -16B
to force the alignment. If all are satifised, it passes all AND
operations and instructions to function recombine_lvx_pattern
and recombine_stvx_pattern, they can remove all useless ANDs
further.
Bootstrapped/regtested on powerpc64le-linux-gnu P8.
gcc/ChangeLog:
PR target/97019
* config/rs6000/rs6000-p8swap.c (find_alignment_op): Adjust to
support multiple defintions which are all AND operations with
the mask -16B.
(recombine_lvx_pattern): Adjust to handle multiple AND operations
from find_alignment_op.
(recombine_stvx_pattern): Likewise.
gcc/testsuite/ChangeLog:
PR target/97019
* gcc.target/powerpc/pr97019.c: New test.
The description in rs6000-builtin.def provides for a builtin named
__builtin_altivec_xst_len_r. However, it is hand-defined in
altivec_init_builtins as __builtin_xst_len_r, against the usual naming
practice. Fix that.
2020-09-15 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (altivec_init_builtins): Fix name
of __builtin_altivec_xst_len_r.
PR analyzer/96650 reports an assertion failure when merging the
intersection of two sets of constraints, due to the resulting
constraints being infeasible.
It turns out that the two input sets were each infeasible if
transitivity were considered, but -fanalyzer-transitivity was off.
However for this case, the merging code was "discovering" the
transitive infeasibility of the intersection of the constraints even
when -fanalyzer-transitivity is off, triggering an assertion failure.
I attempted various fixes for this, but each of them would have
introduced O(N^2) logic into the constraint-handling code into the
-fno-analyzer-transitivity case (with N == the number of constraints).
This patch fixes the ICE by tweaking the assertion, so that we
silently drop such constraints if -fanalyzer-transitivity is off.
gcc/analyzer/ChangeLog:
PR analyzer/96650
* constraint-manager.cc (merger_fact_visitor::on_fact): Replace
assertion that add_constraint succeeded with an assertion that
if it fails, -fanalyzer-transitivity is off.
gcc/testsuite/ChangeLog:
PR analyzer/96650
* gcc.dg/analyzer/pr96650-1-notrans.c: New test.
* gcc.dg/analyzer/pr96650-1-trans.c: New test.
* gcc.dg/analyzer/pr96650-2-notrans.c: New test.
* gcc.dg/analyzer/pr96650-2-trans.c: New test.
The following s390 rtx is errneously considered a no-op:
(set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
Here, SET_DEST is a second register in a floating-point register pair,
and SET_SRC is the second half of a vector register, so they refer to
different bits.
Fix by treating subregs of registers in different modes conservatively.
gcc/ChangeLog:
2020-09-11 Ilya Leoshkevich <iii@linux.ibm.com>
* rtlanal.c (set_noop_p): Treat subregs of registers in
different modes conservatively.
Since some time we're only using this argument to communicate from
vect_build_slp_tree_1 to vect_get_and_check_slp_defs. This makes
the direction of information flow clear.
2020-09-15 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): Make swap
argument by-value and do not change it.
(vect_build_slp_tree_2): Adjust, set swap to NULL after last
use.
Turns out I didn't get OMP reductions correct. To address those I
need to do some reorganization, so this patch just reverts the
OMP-specific pieces of the local decl changes.
gcc/cp/
* pt.c (push_template_decl_real): OMP reductions retain a template
header.
(tsubst_function_decl): Likewise.
Add a rule (T)(A) +- (T)(B) -> (T)(A +- B), which works only when (A +- B)
could be folded to a simple value. By this rule, a plusminus-mult-with-convert
expression could be handed over to the rule (A * C) +- (B * C) -> (A +- B).
2020-09-15 Feng Xue <fxue@os.amperecomputing.com>
gcc/
PR tree-optimization/94234
* match.pd (T)(A) +- (T)(B) -> (T)(A +- B): New simplification.
gcc/testsuite/
PR tree-optimization/94234
* gcc.dg/pr94234-3.c: New test.
After the previous patch we are left with an unreachable BB. This will
ICE if either we have -fschedule-fusion, or we do not have peephole2.
2020-09-15 Segher Boessenkool <segher@kernel.crashing.org>
PR rtl-optimization/96475
* bb-reorder.c (duplicate_computed_gotos): If we did anything, run
cleanup_cfg.
The following allows more BB vectorization by generally building leafs
from scalars rather than giving up. Note this is only a first step
towards this and as can be seen with the exception for node splitting
it is generally hard to get this heuristic sound. I've added variants
of the bb-slp-48.c testcase to make sure we still try permuting for
example.
2020-09-15 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_build_slp_tree_2): Also consider
building an operand from scalars when building it did not
fail fatally but avoid messing with the upcall splitting
of groups.
* gcc.dg/vect/bb-slp-48.c: New testcase.
* gcc.dg/vect/bb-slp-7.c: Adjust.
This patch changes the test to use the effective-target machinery disables the
error message "ARMv8-M Security Extensions incompatible with selected FPU" when
-mfloat-abi=soft.
Further changes 'asm' to '__asm__' to avoid failures with '-std=' options.
gcc/ChangeLog:
2020-07-06 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/arm.c (arm_options_perform_arch_sanity_checks): Do not
check +D32 for CMSE if -mfloat-abi=soft
gcc/testsuite/ChangeLog:
2020-07-06 Andre Vieira <andre.simoesdiasvieira@arm.com>
* gcc.target/arm/pr95646.c: Fix testism.
gcc/ChangeLog:
PR target/96744
* config/i386/x86-tune-costs.h (struct processor_costs):
Increase mask <-> integer cost for non AVX512 target to avoid
spill gpr to mask. Also retune mask <-> integer and
mask_load/store for skylake_cost.
These patterns printed bogus <>s around the {1to16} and similar strings.
2020-09-15 Jakub Jelinek <jakub@redhat.com>
PR target/97028
* config/i386/sse.md (mul<mode>3<mask_name>_bcs,
<avx512>_div<mode>3<mask_name>_bcst): Use <avx512bcst> instead of
<<avx512bcst>>.
* gcc.target/i386/avx512f-pr97028.c: Untested fix.
2020-09-03 Feng Xue <fxue@os.amperecomputing.com>
gcc/
PR tree-optimization/94234
* genmatch.c (dt_simplify::gen_1): Emit check on final simplification
result when "!" is specified on toplevel output expr.
* match.pd ((A * C) +- (B * C) -> (A +- B) * C): Allow folding on expr
with multi-use operands if final result is a simple gimple value.
gcc/testsuite/
PR tree-optimization/94234
* gcc.dg/pr94234-2.c: New test.
The BPF ISA doesn't have a no-operation instruction, but in practice
the Linux kernel verifier performs some optimizations that rely on
these instructions to be encoded in a particular way. As it turns
out, we were using the "wrong" instruction in GCC.
This patch makes GCC to generate the expected instruction for NOP (a
`ja 0') and also adds a test to make sure this is the case.
Tested in bpf-unknown-none targets.
2020-09-14 Jose E. Marchesi <jose.marchesi@oracle.com>
gcc/
* config/bpf/bpf.md ("nop"): Re-define as `ja 0'.
gcc/testsuite/
* gcc.target/bpf/nop-1.c: New test.
The tests assume non-PIC for m32 (which means that they fail on default
PIC targets, like Darwin). There is also a missing space before the
closing '}' on one of the dg-require-effective-target which means that
test fails on machines without avx512f.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Make the test
run as non-dynamic for m32 Darwin.
* gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-6.c: Adjust dg-requires
clause.
Now we consistently mark local externs with DECL_LOCAL_DECL_P, we can
teach the template machinery not to give them a TEMPLATE_DECL head,
and the instantiation machinery treat them as the local specialiations
they are. (openmp UDRs also fall into this category, and are dealt
with similarly.)
gcc/cp/
* pt.c (push_template_decl_real): Don't attach a template head to
local externs.
(tsubst_function_decl): Add support for headless local extern
decls.
(tsubst_decl): Add support for headless local extern decls.
On attempting to run the full test suite with -fanalyzer via
make check RUNTESTFLAGS="-v -v --target_board=unix/-fanalyzer"
I saw it get stuck on:
gcc.c-torture/compile/20001226-1.c
It turns out this was on a debug build, rather than a release build;
but a release build with -fanalyzer took:
real 1m33.689s
user 1m30.239s
sys 0m2.727s
as compared to:
real 0m2.361s
user 0m2.107s
sys 0m0.214s
without -fanalyzer.
This torture test performs 64 * 64 uniqely-coded comparisons between
elements of a pair of arrays until it finds an element that's different,
leading to an accumulation of 4096 constraints along the path where
no difference is found.
"perf" shows most of the time is spent in canonicalizing and copying
constraint_manager instances, presumably as it copies and merges states
with increasingly more complex sets of constraints as it analyzes
further along the "no differences yet" path.
This patch crudely works around this by adding a
-param=analyzer-max-constraints=
limit, defaulting to 20, above which constraints will be silently
dropped. With this -fanalyzer takes:
real 0m6.935s
user 0m6.413s
sys 0m0.396s
on the above case.
gcc/analyzer/ChangeLog:
* analyzer.opt (-param=analyzer-max-constraints=): New param.
* constraint-manager.cc
(constraint_manager::add_constraint_internal): Silently reject
attempts to add constraints when the above limit is reached.