This patch adds support for generating LDPs and STPs of Q-registers.
This allows for more compact code generation and makes better use of the ISA.
It's implemented in a straightforward way by allowing 16-byte modes in the
sched-fusion machinery and adding appropriate peepholes in aarch64-ldpstp.md
as well as the patterns themselves in aarch64-simd.md.
It adds a new no_ldp_stp_qregs tuning flag.
I use it to restrict the peepholes in aarch64-ldpstp.md from merging the
operations together into PARALLELs. I also use it to restrict the sched fusion
check that brings such loads and stores together. This is enough to avoid
forming the pairs when the tuning flag is set.
I didn't see any non-noise performance effect on SPEC2017 on Cortex-A72 and Cortex-A53.
* config/aarch64/aarch64-tuning-flags.def (no_ldp_stp_qregs): New.
* config/aarch64/aarch64.c (xgene1_tunings): Add
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS to tune_flags.
(aarch64_mode_valid_for_sched_fusion_p):
Allow 16-byte modes.
(aarch64_classify_address): Allow 16-byte modes for load_store_pair_p.
* config/aarch64/aarch64-ldpstp.md: Add peepholes for LDP STP of
128-bit modes.
* config/aarch64/aarch64-simd.md (load_pair<VQ:mode><VQ2:mode>):
New pattern.
(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
* config/aarch64/iterators.md (VQ2): New mode iterator.
* gcc.target/aarch64/ldp_stp_q.c: New test.
* gcc.target/aarch64/stp_vec_128_1.c: Likewise.
* gcc.target/aarch64/ldp_stp_q_disable.c: Likewise.
From-SVN: r261796
2018-06-20 Martin Liska <mliska@suse.cz>
* tree-switch-conversion.c (jump_table_cluster::can_be_handled):
Change default ratio from 10 to 8.
From-SVN: r261795
This patch makes pattern recognisers do their own checking for vector
types and target support. Previously some recognisers did this
themselves and some left it to vect_pattern_recog_1.
Doing this means we can get rid of the type_in argument, which was
ignored if the recogniser did its own checking. It also means
we create fewer junk statements.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (NUM_PATTERNS, vect_recog_func_ptr): Move to
tree-vect-patterns.c.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): New function.
(vect_recog_dot_prod_pattern): Use it. Remove the type_in argument.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.
(vect_recog_pow_pattern): Likewise. Check for a null vectype.
(vect_recog_widen_shift_pattern): Remove the type_in argument.
(vect_recog_rotate_pattern): Likewise.
(vect_recog_mult_pattern): Likewise.
(vect_recog_vector_vector_shift_pattern): Likewise.
(vect_recog_divmod_pattern): Likewise.
(vect_recog_mixed_size_cond_pattern): Likewise.
(vect_recog_bool_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_widen_mult_pattern): Likewise. Check for a null vectype.
(vect_recog_over_widening_pattern): Likewise.
(vect_recog_gather_scatter_pattern): Likewise.
(vect_recog_func_ptr): Move from tree-vectorizer.h
(vect_vect_recog_func_ptrs): Move further down the file.
(vect_recog_func): Likewise. Remove the third argument.
(NUM_PATTERNS): Define based on vect_vect_recog_func_ptrs.
(vect_pattern_recog_1): Expect the pattern function to do any
necessary target tests. Also expect it to provide a vector type.
Remove the type_in handling.
From-SVN: r261791
This message is a long write-up for a patch that simply adds a common
routine for printing the "vector_foo_pattern: detected:" messages.
The reason for doing this is that some routines check for target support
themselves and some leave it to vect_pattern_recog_1. Those that leave
it to vect_pattern_recog_1 currently print these "detected:" messages if
the statements have the right form, even if the pattern is eventually
discarded. IMO that's useful, and a lot of existing scan tests rely on it.
However, a later patch makes patterns do their own testing, and stops
them creating pattern statements until the tests have passed. This means
(a) they need to print the "detected:" message earlier and (b) the pattern
statement won't be around to print.
The patch therefore makes all routines print the original statement
rather than the pattern one. That information isn't obvious otherwise,
whereas vect_pattern_recog_1 already prints the pattern statement
in the case of a successful match. This also avoids the previous
situation in which a routine could print "detected:" and then
silently bail out before saying what had been detected.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_pattern_detected): New function.
(vect_recog_dot_prod_patternm, vect_recog_sad_pattern)
(vect_recog_widen_mult_pattern, vect_recog_widen_sum_pattern)
(vect_recog_over_widening_pattern, vect_recog_widen_shift_pattern
(vect_recog_rotate_pattern, vect_recog_vector_vector_shift_pattern)
(vect_recog_mult_pattern, vect_recog_divmod_pattern)
(vect_recog_mixed_size_cond_pattern, vect_recog_bool_pattern)
(vect_recog_mask_conversion_pattern)
(vect_try_gather_scatter_pattern): Likewise.
From-SVN: r261790
This patch adds a helper for pattern code that wants to find an
internal (vectorisable) definition of an SSA name.
A later patch will make more use of this, and alter the definition.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_get_internal_def): New function.
(vect_recog_dot_prod_pattern, vect_recog_sad_pattern)
(vect_recog_vector_vector_shift_pattern, check_bool_pattern)
(search_type_for_mask_1): Use it.
From-SVN: r261789
vect_recog_dot_prod_pattern and vect_recog_sad_pattern both checked
whether the statement passed in had already been recognised as a
WIDEN_SUM_EXPR pattern. That isn't possible (any more?), since the
first recognised pattern wins, and since vect_recog_widen_sum_pattern
never matches a later statement than the one it's given.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Remove
redundant WIDEN_SUM_EXPR handling.
(vect_recog_sad_pattern): Likewise.
From-SVN: r261788
tree-vect-patterns.c checked that operands to primitive arithmetic ops
are compatible with each other and with the result. The checks date
back years and have long been redundant with verify_gimple_stmt.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Remove
redundant check that the types of a PLUS_EXPR or MULT_EXPR agree.
(vect_recog_sad_pattern): Likewise PLUS_EXPR, ABS_EXPR and MINUS_EXPR.
(vect_recog_widen_mult_pattern): Likewise MULT_EXPR.
(vect_recog_widen_sum_pattern): Likewise PLUS_EXPR.
From-SVN: r261787
vectorizable_call stubs out the original scalar statement with
a dummy assignment to the same lhs, so that we don't leave any bogus
scalar calls around. If the call is actually a pattern statement,
the code rightly took the lhs of the original bb statement:
if (is_pattern_stmt_p (stmt_info))
lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info));
else
lhs = gimple_call_lhs (stmt);
But it then associated the new statement with the stmt_vec_info of the
pattern statement rather than the bb statement, which meant we had two
stmt_vec_infos assigning to the same lhs. This seems to be latent at
the moment but caused problems further into the series.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vectorizable_call): Make sure that we
use the stmt_vec_info of the original bb statement for the
new zero assignment, even if the call is part of a pattern.
From-SVN: r261786
A pattern's PATTERN_DEF_SEQ was attached to both the original statement
and the main pattern statement, which made it harder to update later.
This patch attaches it to just the original statement. In practice,
anything that cared had ready access to both.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (_stmt_vec_info): Note above pattern_def_seq
that the sequence is attached to the original statement rather
than the pattern statement.
* tree-vect-loop.c (vect_determine_vf_for_stmt): Take the
PATTERN_DEF_SEQ from the original statement rather than
the main pattern statement.
* tree-vect-stmts.c (free_stmt_vec_info): Likewise.
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Likewise.
(vect_mark_pattern_stmts): Don't copy the PATTERN_DEF_SEQ.
From-SVN: r261785
This patch is the first part of a series to fix to PR85694.
Later patches can make the pattern for a statement S2 reuse the
results of a PATTERN_DEF_SEQ statement attached to an earlier
statement S1. Although vect_mark_stmts_to_be_vectorized handled
this fine, vect_analyze_stmt and vect_transform_loop both skipped the
PATTERN_DEF_SEQ for S1 if S1's main pattern wasn't live or relevant.
I couldn't wrap my head around the flow in vect_transform_loop,
so ended up moving the per-statement handling into a subroutine.
That makes the patch look bigger than it actually is.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vect_analyze_stmt): Move the handling of pattern
definition statements before the early exit for statements that aren't
live or relevant.
* tree-vect-loop.c (vect_transform_loop_stmt): New function,
split out from...
(vect_transform_loop): ...here. Process pattern definition
statements without first checking whether the main pattern
statement is live or relevant.
From-SVN: r261784
* tree-cfgcleanup.c (tree_forwarder_block_p): Do not return false at
-O0 if the locus represent UNKNOWN_LOCATION but have different values.
From-SVN: r261770
The issue is caused by reordering of stack pointer update after stack
space allocation with instructions that write to the allocated stack
space. In windowed ABI register spill area for the previous call frame
is located just below the stack pointer and may be reloaded back into
the register file on movsp.
Implement allocate_stack pattern for windowed ABI configuration and
insert an instruction that prevents reordering of frame memory access
and stack pointer update.
gcc/
2018-06-19 Max Filippov <jcmvbkbc@gmail.com>
* config/xtensa/xtensa.md (UNSPEC_FRAME_BLOCKAGE): New unspec
constant.
(allocate_stack, frame_blockage, *frame_blockage): New patterns.
From-SVN: r261755
This header was needed for the declaration of std::terminate but the
calls to it were removed in r242401.
* include/std/utility: Remove unused <exception> header.
From-SVN: r261749
* zlib/configure.ac: Restore old behaviour of only enabling
multilibs when a target subdirectory is defined. This allows
building with srcdir == builddir.
* zlib/configure: Regenerate.
From-SVN: r261739
The existing code allows only 4 vectors worth of ieee128 homogeneous
aggregates, but it should be 8. This happens because at one spot it
is mistakenly qualified as being passed in floating point registers.
PR target/86197
* config/rs6000/rs6000.md (rs6000_discover_homogeneous_aggregate): An
ieee128 argument takes up only one (vector) register, not two (floating
point) registers.
From-SVN: r261738
2018-06-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/86179
* tree-vect-patterns.c (vect_pattern_recog_1): Clean up
after failed recognition.
* gcc.dg/pr86179.c: New testcase.
From-SVN: r261731
* include/std/scoped_allocator (__not_pair): Define SFINAE helper.
(construct(_Tp*, _Args&&...)): Remove from overload set when _Tp is
a specialization of std::pair.
* testsuite/20_util/scoped_allocator/construct_pair.cc: Ensure
pair elements are constructed correctly.
From-SVN: r261716
[gcc]
2018-06-18 Michael Meissner <meissner@linux.ibm.com>
PR target/85358
* config/rs6000/rs6000-modes.def (toplevel): Rework the 128-bit
floating point modes, so that IFmode is numerically greater than
TFmode, which is greater than KFmode using FRACTIONAL_FLOAT_MODE
to declare the ordering. This prevents IFmode from being
converted to TFmode when long double is IEEE 128-bit on an ISA 3.0
machine. Include rs6000-modes.h to share the fractional values
between genmodes* and the rest of the compiler.
(IFmode): Likewise.
(KFmode): Likewise.
(TFmode): Likewise.
* config/rs6000/rs6000-modes.h: New file.
* config/rs6000/rs6000.c (rs6000_debug_reg_global): Change the
meaning of rs6000_long_double_size so that 126..128 selects an
appropriate 128-bit floating point type.
(rs6000_option_override_internal): Likewise.
* config/rs6000/rs6000.h (toplevel): Include rs6000-modes.h.
(TARGET_LONG_DOUBLE_128): Change the meaning of
rs6000_long_double_size so that 126..128 selects an appropriate
128-bit floating point type.
(LONG_DOUBLE_TYPE_SIZE): Update comment.
* config/rs6000/rs6000.md (trunciftf2): Correct the modes of the
source and destination to match the standard usage.
(truncifkf2): Likewise.
(copysign<mode>3, IEEE iterator): Rework copysign of float128 on
ISA 2.07 to use an explicit clobber, instead of passing in a
temporary.
(copysign<mode>3_soft): Likewise.
[libgcc]
2018-06-18 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/t-float128 (FP128_CFLAGS_SW): Compile float128
support modules with -mno-gnu-attribute.
* config/rs6000/t-float128-hw (FP128_CFLAGS_HW): Likewise.
From-SVN: r261712
By only defining these operators as friends (with no namespace-scope
declaration) they can only be found by ADL and do not participate in
overload resolution for arguments of types other than path.
LWG 2989 hide path iostream operators from normal lookup
* include/bits/fs_path.h (operator<<, operator>>): Define inline as
friends.
* testsuite/27_io/filesystem/path/io/dr2989.cc: New.
From-SVN: r261711