Commit graph

180835 commits

Author SHA1 Message Date
Dennis Zhang
3553c65853 aarch64: intrinsics extract half of bf16 vector
This patch implements ACLE intrinsics vget_low_bf16 and vget_high_bf16
to extract lower or higher half from a bfloat16x8 vector. The
vget_high_bf16 is done by 'dup' instruction. The vget_low_bf16 is just
to return the lower half of a vector register. Tests include both big-
and little-endian cases.

gcc/ChangeLog:

2020-11-03  Dennis Zhang  <dennis.zhang@arm.com>

	* config/aarch64/aarch64-simd-builtins.def (vget_lo_half): New entry.
	(vget_hi_half): Likewise.
	* config/aarch64/aarch64-simd.md (aarch64_vget_lo_halfv8bf): New entry.
	(aarch64_vget_hi_halfv8bf): Likewise.
	* config/aarch64/arm_neon.h (vget_low_bf16): New intrinsic.
	(vget_high_bf16): Likewise.

gcc/testsuite/ChangeLog

	* gcc.target/aarch64/advsimd-intrinsics/bf16_get.c: New test.
	* gcc.target/aarch64/advsimd-intrinsics/bf16_get-be.c: New test.
2020-11-03 16:56:02 +00:00
Nathan Sidwell
cee45e4912 c++: Directly fixup deferred eh-specs
eh-specifiers in a class definition are complete-definition contexts,
and we sometimes need to defer their parsing.  We create a deferred
eh specifier, which can end up persisting in the type system due to
variants being created before the deferred parse.  This causes
problems in modules handling.

This patch adds fixup_deferred_exception_variants, which directly
modifies the variants of such an eh spec once parsed.  As commented,
the general case is quite hard, so it doesn't deal with everything.
But I do catch the cases I encountered (from the std library).

	gcc/cp/
	* cp-tree.h (fixup_deferred_exception_variants): Declare.
	* parser.c (cp_parser_class_specifier_1): Call it when
	completing deferred parses rather than creating a variant.
	(cp_parser_member_declaration): Move comment from ...
	(cp_parser_noexcept_specification_opt): ... here.  Refactor the
	deferred parse.
	* tree.c (fixup_deferred_exception_variants): New.
2020-11-03 08:49:27 -08:00
Nathan Sidwell
1c8b8efa5b c++: A couple of template instantiation cleanups
I noticed that we were handling lambda extra scope during template
instantiation in a different order to how we handle the non-template
case.  Reordered that for consistency.  Also some more RAII during
template instantiation.

	gcc/cp/
	* pt.c (tsubst_lambda_expr): Reorder extra-scope handling to match
	the non-template case.
	(instantiate_body): Move a couple of declarations to their
	initializers.
2020-11-03 08:49:26 -08:00
Nathan Sidwell
770ec066b8 c++: Make extern-C mismatch an error
duplicate_decls was being lenient about extern-c mismatches, allowing
you to have two declarations in the symbol table after emitting an
error.  This resulted in duplicate error messages in modules, when we
find the same problem multiple times.  Let's just not let that happen.

	gcc/cp/
	* decl.c (duplicate_decls): Return error_mark_node for extern-c
	mismatch.
2020-11-03 08:49:26 -08:00
Nathan Sidwell
082a7b2390 cpplib: Fix off-by-one error
I noticed a fencepost error in the preprocessor.  We should be
checking if the next char is at the limit, not the current char (which
can't be, because we're looking at it).
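
The fencepost described above can be sketched in C.  This is a
minimal illustration, not the libcpp code: starts_crlf and its
parameters are hypothetical names, and the sketch assumes a buffer
whose limit points one past the last valid character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the libcpp code: report whether the
   character at S starts a DOS "\r\n" pair.  LIMIT points one past
   the last valid character of the buffer.  */
static bool
starts_crlf (const char *s, const char *limit)
{
  /* The caller is looking at *S, so S itself cannot be at the limit.  */
  assert (s < limit);

  /* The fencepost: it is the NEXT character that may sit at the
     limit, so that is the position to test before reading s[1].  */
  if (s + 1 == limit)
    return false;

  return s[0] == '\r' && s[1] == '\n';
}
```

Testing s against the limit instead of s + 1 would let the s[1] read
run one character past the buffer.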

	libcpp/
	* lex.c (_cpp_clean_line): Fix DOS off-by-one error.
2020-11-03 08:49:25 -08:00
Tobias Burnus
84ed8d2c88 gcc-changelog/git_email.py: Support older unidiff modules
contrib/ChangeLog:

	* gcc-changelog/git_email.py: Add unidiff_supports_renaming check.
2020-11-03 17:46:36 +01:00
Martin Liska
19859d6ba6 Add setup.cfg for pytest.
contrib/ChangeLog:

	* gcc-changelog/setup.cfg: New file.
2020-11-03 17:32:10 +01:00
Yang Yang
abe93733a2 PR target/96342 Change field "simdlen" into poly_uint64
This is the first patch for PR96342.  To add support for
"omp declare simd", change the type of the field "simdlen" of
struct cgraph_simd_clone from unsigned int to poly_uint64 and
adapt the related code, since the length might be variable in the
SVE cases.

2020-11-03  Yang Yang  <yangyang305@huawei.com>

gcc/ChangeLog:

	* cgraph.h (struct cgraph_simd_clone): Change field "simdlen" of
	struct cgraph_simd_clone from unsigned int to poly_uint64.
	* config/aarch64/aarch64.c
	(aarch64_simd_clone_compute_vecsize_and_simdlen): adaptation of
	operations on "simdlen".
	* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
	Printf formats update.
	* gengtype.c (main): Handle poly_uint64.
	* omp-simd-clone.c (simd_clone_mangle): Likewise.
	(simd_clone_adjust_return_type): Likewise.
	(create_tmp_simd_array): Likewise.
	(simd_clone_adjust_argument_types): Likewise.
	(simd_clone_init_simd_arrays): Likewise.
	(ipa_simd_modify_function_body): Likewise.
	(simd_clone_adjust): Likewise.
	(expand_simd_clones): Likewise.
	* poly-int-types.h (vector_unroll_factor): New macro.
	* poly-int.h (constant_multiple_p): Add two-argument versions.
	* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
2020-11-03 16:13:47 +00:00
Richard Biener
c5b49c3e09 tree-optimization/97623 - limit PRE hoist insertion
This limits insert iteration caused by PRE insertions generating
hoist insertion opportunities and vice versa.  The patch limits
the hoist insertion iterations to three by default.

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97623
	* params.opt (-param=max-pre-hoist-insert-iterations): New.
	* doc/invoke.texi (max-pre-hoist-insert-iterations): Document.
	* tree-ssa-pre.c (insert): Do at most max-pre-hoist-insert-iterations
	hoist insert iterations.
2020-11-03 16:23:06 +01:00
Richard Biener
d0d8a16580 middle-end/97579 - fix VEC_COND_EXPR ISEL optab query
This fixes a mistake in the optab query done by ISEL.  It
doesn't fix the PR but shifts the ICE elsewhere.

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR middle-end/97579
	* gimple-isel.cc (gimple_expand_vec_cond_expr): Use
	the correct types for the vcond_mask/vec_cmp optab queries.
2020-11-03 16:23:06 +01:00
Andrew MacLeod
ea7df355ca More Ranger cache tweaks
This patch splits the individual value propagation out from fill_block_cache,
and calls it from set_global_value when the global value is updated.
This ensures the "current" global value is reflected in the on-entry cache.

	* gimple-range-cache.cc (ssa_global_cache::get_global_range): Return
	true if there was a previous range set.
	(ranger_cache::ranger_cache): Take a gimple_ranger parameter.
	(ranger_cache::set_global_range): Propagate the value if updating.
	(ranger_cache::propagate_cache): Renamed from iterative_cache_update.
	(ranger_cache::propagate_updated_value): New.  Split from:
	(ranger_cache::fill_block_cache): Split out value propagator.
	* gimple-range-cache.h (ssa_global_cache): Update prototypes.
	(ranger_cache): Update prototypes.
2020-11-03 10:17:39 -05:00
Andrew MacLeod
220929c067 Tweaks to ranger cache
Add some bounds checking to ssa_block_ranges, and privatize the
ranges block cache and global cache, adding API points for accessing them.

	* gimple-range-cache.h (block_range_cache): Add new entry point.
	(ranger_cache): Privatize global and block cache members.
	* gimple-range-cache.cc (ssa_block_ranges::set_bb_range): Add bounds
	check.
	(ssa_block_ranges::set_bb_varying): Ditto.
	(ssa_block_ranges::get_bb_range): Ditto.
	(ssa_block_ranges::bb_range_p): Ditto.
	(block_range_cache::get_block_ranges): Fix formatting.
	(block_range_cache::query_block_ranges): New.
	(block_range_cache::get_bb_range): Use query_block_ranges.
	(block_range_cache::bb_range_p): Ditto.
	(ranger_cache::dump): New.
	(ranger_cache::get_global_range): New.
	(ranger_cache::set_global_range): New.
	* gimple-range.cc (gimple_ranger::range_of_expr): Use new API.
	(gimple_ranger::range_of_stmt): Ditto.
	(gimple_ranger::export_global_ranges): Ditto.
	(gimple_ranger::dump): Ditto.
2020-11-03 10:17:39 -05:00
Marek Polacek
c2856ceec2 c++: Tweaks for value_dependent_expression_p.
We may not call value_dependent_expression_p on expressions that are
not potential constant expressions, otherwise value_d could crash,
as I saw recently (in C++98).  So beef up the checking in i_d_e_p.

This revealed a curious issue: when we have __PRETTY_FUNCTION__ in
a template function, we set its DECL_VALUE_EXPR to error_mark_node
(cp_make_fname_decl), so potential_c_e returns false when it gets it,
but value_dependent_expression_p handles it specially and says true.
This broke lambda-generic-pretty1.C.  So take care of that.

And then also tweak uses_template_parms.

gcc/cp/ChangeLog:

	* constexpr.c (potential_constant_expression_1): Treat
	__PRETTY_FUNCTION__ inside a template function as
	potentially-constant.
	* pt.c (uses_template_parms): Call
	instantiation_dependent_expression_p instead of
	value_dependent_expression_p.
	(instantiation_dependent_expression_p): Check
	potential_constant_expression before calling
	value_dependent_expression_p.
2020-11-03 10:09:53 -05:00
Marek Polacek
f620e64a6f c++: Disable -Winit-list-lifetime in unevaluated operand [PR97632]
Jon suggested turning this warning off when we're not actually
evaluating the operand.  This patch does that.

gcc/cp/ChangeLog:

	PR c++/97632
	* init.c (build_new_1): Disable -Winit-list-lifetime for an unevaluated
	operand.

gcc/testsuite/ChangeLog:

	PR c++/97632
	* g++.dg/warn/Winit-list4.C: New test.
2020-11-03 10:09:00 -05:00
Bernd Edlinger
6ff95a6eef Cleanup of a merge mistake in fold-const.c
This removes a duplicated statement.
It was apparently introduced due to a merge mistake.

2020-11-03  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* fold-const.c (getbyterep): Remove duplicated statement.
2020-11-03 15:39:15 +01:00
Bernd Edlinger
23ac7a009e Fix PR97205
This makes sure that stack allocated SSA_NAMEs are
at least MODE_ALIGNED.  Also increase the MEM_ALIGN
for the corresponding rtl objects.

gcc:
2020-11-03  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/97205
	* cfgexpand.c (align_local_variable): Make SSA_NAMEs
	at least MODE_ALIGNED.
	(expand_one_stack_var_at): Increase MEM_ALIGN for SSA_NAMEs.

gcc/testsuite:
2020-11-03  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/97205
	* gcc.c-torture/compile/pr97205.c: New test.
2020-11-03 15:07:25 +01:00
Nathan Sidwell
d8909271a2 libcpp: unbreak bootstrap
This fixes the bootstrap breakage I caused.  Sorry about that.

	libcpp/
	* init.c (cpp_read_main_file): Use cpp_get_deps result.
2020-11-03 06:03:11 -08:00
zhengnannan
60be12c32c AArch64: Add FLAG for AES/SHA/SM3/SM4 intrinsics [PR94442]
2020-11-03  Zhiheng Xie  <xiezhiheng@huawei.com>
	    Nannan Zheng  <zhengnannan@huawei.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
	for AES/SHA/SM3/SM4 intrinsics.
2020-11-03 13:56:39 +00:00
zhengnannan
c229693ba6 AArch64: Add FLAG for compare intrinsics [PR94442]
2020-11-03  Zhiheng Xie  <xiezhiheng@huawei.com>
	    Nannan Zheng  <zhengnannan@huawei.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
	for compare intrinsics.
2020-11-03 13:56:36 +00:00
Richard Biener
104ca9cfa6 Save some memory at debug stream-in time
This allows us to release references to BLOCKs by not keeping
them rooted in the external_die_map but instead removing them from
there as soon as we have created the corresponding stub DIE.  For
decls it doesn't help since we still keep the decl_die_table.

2020-11-03  Richard Biener  <rguenther@suse.de>

	* dwarf2out.c (maybe_create_die_with_external_ref): Remove
	hashtable entry.
2020-11-03 14:51:39 +01:00
Andrea Corallo
ed62f3668b arm: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics
gcc/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm_neon.h (vst2_lane_bf16, vst2q_lane_bf16)
	(vst3_lane_bf16, vst3q_lane_bf16, vst4_lane_bf16)
	(vst4q_lane_bf16): New intrinsics.
	* config/arm/arm_neon_builtins.def: Touch it for:
	__builtin_neon_vst2_lanev4bf, __builtin_neon_vst2_lanev8bf,
	__builtin_neon_vst3_lanev4bf, __builtin_neon_vst3_lanev8bf,
	__builtin_neon_vst4_lanev4bf, __builtin_neon_vst4_lanev8bf.

gcc/testsuite/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c:
	Run it also for arm-*-*.
	* gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/arm/simd/vstn_lane_bf16_1.c: New test.
2020-11-03 14:23:08 +01:00
Andrea Corallo
1528f34341 arm: Add vldN_lane_bf16 + vldNq_lane_bf16 intrinsics
gcc/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm_neon.h (vld2_lane_bf16, vld2q_lane_bf16)
	(vld3_lane_bf16, vld3q_lane_bf16, vld4_lane_bf16)
	(vld4q_lane_bf16): Add intrinsics.
	* config/arm/arm_neon_builtins.def: Touch for:
	__builtin_neon_vld2_lanev4bf, __builtin_neon_vld2_lanev8bf,
	__builtin_neon_vld3_lanev4bf, __builtin_neon_vld3_lanev8bf,
	__builtin_neon_vld4_lanev4bf, __builtin_neon_vld4_lanev8bf.
	* config/arm/iterators.md (VQ_HS): Add V8BF to the iterator.

gcc/testsuite/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_bf16_indices_1.c:
	Run it also for the arm backend.
	* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_bf16_indices_1.c:
	Likewise.
	* gcc.target/arm/simd/vldn_lane_bf16_1.c: New test.
2020-11-03 14:23:09 +01:00
Andrea Corallo
6170a793b7 arm: Add vst1_bf16 + vst1q_bf16 intrinsics
gcc/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm_neon.h (vst1_bf16, vst1q_bf16): Add intrinsics.
	* config/arm/arm_neon_builtins.def: Touch for:
	__builtin_neon_vst1v4bf, __builtin_neon_vst1v8bf.

gcc/testsuite/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/arm/simd/vst1_bf16_1.c: New test.
2020-11-03 14:21:27 +01:00
Andrea Corallo
890076673d arm: Add vld1_bf16 + vld1q_bf16 intrinsics
gcc/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm-builtins.c (VAR14): Define macro.
	* config/arm/arm_neon_builtins.def: Touch for:
	__builtin_neon_vld1v4bf, __builtin_neon_vld1v8bf.
	* config/arm/arm_neon.h (vld1_bf16, vld1q_bf16): Add intrinsics.

gcc/testsuite/ChangeLog

2020-10-29  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/arm/simd/vld1_bf16_1.c: New test.
2020-11-03 14:21:27 +01:00
Andrea Corallo
d65303b699 arm: Add vst1_lane_bf16 + vst1q_lane_bf16 intrinsics
gcc/ChangeLog

2020-10-23  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm_neon.h (vst1_lane_bf16, vst1q_lane_bf16): Add
	intrinsics.
	* config/arm/arm_neon_builtins.def (STORE1LANE): Add v4bf, v8bf.

gcc/testsuite/ChangeLog

2020-10-23  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/arm/simd/vst1_lane_bf16_1.c: New testcase.
	* gcc.target/arm/simd/vstq1_lane_bf16_indices_1.c: Likewise.
	* gcc.target/arm/simd/vst1_lane_bf16_indices_1.c: Likewise.
2020-11-03 14:21:27 +01:00
Andrea Corallo
c9a0276840 arm: Add vld1_lane_bf16 + vld1q_lane_bf16 intrinsics
gcc/ChangeLog

2020-10-21  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm_neon_builtins.def: Add to LOAD1LANE v4bf, v8bf.
	* config/arm/arm_neon.h (vld1_lane_bf16, vld1q_lane_bf16): Add
	intrinsics.

gcc/testsuite/ChangeLog

2020-10-21  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/arm/simd/vld1_lane_bf16_1.c: New testcase.
	* gcc.target/arm/simd/vld1_lane_bf16_indices_1.c: Likewise.
	* gcc.target/arm/simd/vld1q_lane_bf16_indices_1.c: Likewise.
2020-11-03 14:19:52 +01:00
Nathan Sidwell
444655b6f0 c++: cp_tree_equal cleanups
A couple of small fixes.  I noticed bind_template_template_parms was
not marking the parm a template parm (this broke some module
handling).  Debugging CALL_EXPR comparisons led me to refactor
cp_tree_equal's CALL_EXPR code (and my recent fix to debug printing of
same).  Finally TREE_VECS are best compared by comp_template_args.  I
recall that last piece being a left over from fixes during gcc-10.
I've been using it on the modules branch since then.

	gcc/cp/
	* tree.c (bind_template_template_parm): Mark the parm as a
	template parm.
	(cp_tree_equal): Refactor CALL_EXPR.  Use comp_template_args for
	TREE_VECs.
2020-11-03 05:16:31 -08:00
Nathan Sidwell
fbc3f84743 c++: rtti cleanups
Here are a few cleanups from the modules branch.  Generally some RAII,
and a bit of lazy namespace pushing.

	gcc/cp/
	* rtti.c (init_rtti_processing): Move var decl to its init.
	(get_tinfo_decl): Likewise.  Break out creation to called helper
	...
	(get_tinfo_decl_direct): ... here.
	(build_dynamic_cast_1): Move var decls to their initializers.
	(tinfo_base_init): Set decl's location to BUILTINS_LOCATION.
	(get_tinfo_desc): Only push ABI namespace when needed.  Set type's
	context.
2020-11-03 05:16:31 -08:00
Nathan Sidwell
918e8b10a7 libcpp: dependency emission tidying
This patch cleans up the interface to the dependency generation a
little.  We now only check the option in one place, and the
cpp_get_deps function returns nullptr if there are no dependencies.  I
also reworded the -MT and -MQ help text to be make-agnostic, as
there are ideas about emitting, say, JSON.

	libcpp/
	* include/mkdeps.h: Include cpplib.h
	(deps_write): Adjust first parm type.
	* mkdeps.c: Include internal.h
	(make_write): Adjust first parm type.  Check phony option
	directly.
	(deps_write): Adjust first parm type.
	* init.c (cpp_read_main_file): Use get_deps.
	* directives.c (cpp_get_deps): Check option before initializing.
	gcc/c-family/
	* c.opt (MQ,MT): Reword description to be make-agnostic.
	gcc/fortran/
	* cpp.c (gfc_cpp_add_dep): Only add dependency if we're recording
	them.
	(gfc_cpp_init): Likewise for target.
2020-11-03 05:16:19 -08:00
Dennis Zhang
f7d6961126 aarch64: ACLE intrinsics convert BF16 to Float32
This patch enables intrinsics to convert BFloat16 scalar and vector
operands to Float32 modes. The intrinsics are implemented by shifting
each BFloat16 item left by 16 bits using shl/shll/shll2 instructions.

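The shift-based conversion can be illustrated with a portable scalar
sketch.  This is not the intrinsic implementation: bf16_bits_to_f32 is
a hypothetical helper, and the sketch assumes that float is an IEEE 754
binary32 value, so that a BFloat16 pattern is simply the upper 16 bits
of the corresponding Float32 pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float is 32 bits wide (IEEE 754 binary32).  */
static_assert (sizeof (float) == sizeof (uint32_t),
	       "float must be 32 bits wide");

/* Hypothetical helper: widen the raw bits of a BFloat16 value to a
   float by shifting them into the upper half of a 32-bit word.  */
static float
bf16_bits_to_f32 (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float result;

  /* Copy the bit pattern; memcpy avoids type-punning through pointers.  */
  memcpy (&result, &wide, sizeof result);
  return result;
}
```

For example, the BFloat16 pattern 0x3F80 widens to 0x3F800000, which is
1.0f.
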
gcc/ChangeLog:

2020-11-03  Dennis Zhang  <dennis.zhang@arm.com>

	* config/aarch64/aarch64-simd-builtins.def (vbfcvt): New entry.
	(vbfcvt_high, bfcvt): Likewise.
	* config/aarch64/aarch64-simd.md (aarch64_vbfcvt<mode>): New entry.
	(aarch64_vbfcvt_highv8bf, aarch64_bfcvtsf): Likewise.
	* config/aarch64/arm_bf16.h (vcvtah_f32_bf16): New intrinsic.
	* config/aarch64/arm_neon.h (vcvt_f32_bf16): Likewise.
	(vcvtq_low_f32_bf16, vcvtq_high_f32_bf16): Likewise.

gcc/testsuite/ChangeLog

	* gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
	(test_vcvt_f32_bf16, test_vcvtq_low_f32_bf16): New tests.
	(test_vcvtq_high_f32_bf16, test_vcvth_f32_bf16): Likewise.
2020-11-03 13:00:51 +00:00
Richard Biener
9d1b813d0f bootstrap/97666 - fix array of bool allocation
This fixes the bad assumption that sizeof (bool) == 1
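
The general pattern can be sketched as follows.  This is an
illustration only, not the tree-vect-slp.c code: alloc_flags is a
hypothetical helper that uses malloc, where the point is that the byte
count is scaled by sizeof (bool) rather than assumed to equal the
element count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate COUNT flags, all initially false.
   Returns NULL if the allocation fails.  */
static bool *
alloc_flags (size_t count)
{
  assert (count > 0);

  /* Scale by sizeof (bool) instead of assuming it is 1.  */
  size_t bytes = count * sizeof (bool);
  bool *flags = malloc (bytes);

  if (flags)
    memset (flags, 0, bytes);
  return flags;
}
```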

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR bootstrap/97666
	* tree-vect-slp.c (vect_build_slp_tree_2): Scale
	allocation of skip_args by sizeof (bool).
2020-11-03 13:33:37 +01:00
Richard Biener
ac6affba97 tree-optimization/80928 - SLP vectorize nested loop induction
This adds SLP vectorization of nested inductions.

2020-11-03  Richard Biener <rguenther@suse.de>

	PR tree-optimization/80928
	* tree-vect-loop.c (vectorizable_induction): SLP vectorize
	nested inductions.

	* gcc.dg/vect/vect-outer-slp-2.c: New testcase.
	* gcc.dg/vect/vect-outer-slp-3.c: Likewise.
2020-11-03 13:33:37 +01:00
Uros Bizjak
a562d44924 testsuite: Fix gcc.target/i386/zero-scratch-regs-*.c scan-asm directives
Improve zero-scratch-regs-*.c scan-asm regexps
and add target selectors for 32bit targets.

2020-11-03  Uroš Bizjak  <ubizjak@gmail.com>

gcc/testsuite/ChangeLog:

	* gcc.target/i386/zero-scratch-regs-1.c: Add ia32 target
	selector where appropriate.  Improve scan-assembler regexp.
	* gcc.target/i386/zero-scratch-regs-2.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-3.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-4.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-5.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-6.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-7.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-8.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-9.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-10.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-13.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-14.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-15.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-16.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-17.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-18.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-19.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-20.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-21.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-22.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-23.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-24.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-25.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-26.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-27.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-28.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-29.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-30.c: Ditto.
	* gcc.target/i386/zero-scratch-regs-31.c: Ditto.
2020-11-03 13:08:04 +01:00
Olivier Hainque
87a9861b06 Add missing require-effective-target lto
This prevents failure of an lto test in configurations
missing LTO support, such as VxWorks for kernel mode.

2020-11-02  Olivier Hainque  <hainque@adacore.com>

gcc/testsuite/
	* gcc.dg/tree-ssa/pr71077.c: Add
	dg-require-effective-target lto.
2020-11-03 11:31:27 +00:00
Olivier Hainque
aa23a2dd53 Add dg-require-effective-target fpic to gcc i386 tests
This change adds

 /* { dg-require-effective-target fpic } */

to tests in gcc.target/i386 that do use -fpic or -fPIC
but don't currently query the target support.

This corresponds to what many other fpic tests do
and helps the VxWorks ports at least, as -fpic is
typically not supported in at least one of the two
major modes of such a port (kernel vs RTP).

2020-11-03  Olivier Hainque  <hainque@adacore.com>

gcc/testsuite/

	* gcc.target/i386/pr45352-1.c: Add dg-require-effective-target fpic.
	* gcc.target/i386/pr47602.c: Likewise.
	* gcc.target/i386/pr55151.c: Likewise.
	* gcc.target/i386/pr55458.c: Likewise.
	* gcc.target/i386/pr56348.c: Likewise.
	* gcc.target/i386/pr57097.c: Likewise.
	* gcc.target/i386/pr65753.c: Likewise.
	* gcc.target/i386/pr65915.c: Likewise.
	* gcc.target/i386/pr66232-5.c: Likewise.
	* gcc.target/i386/pr66334.c: Likewise.
	* gcc.target/i386/pr66819-2.c: Likewise.
	* gcc.target/i386/pr67265.c: Likewise.
	* gcc.target/i386/pr81481.c: Likewise.
	* gcc.target/i386/pr83994.c: Likewise.
2020-11-03 11:13:11 +00:00
Jan Hubicka
f89dcf9334 Avoid recursion in tree-inline
gcc/ChangeLog:

2020-11-03  Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/97578
	* ipa-inline-transform.c (maybe_materialize_called_clones): New
	function.
	(inline_transform): Use it.

gcc/testsuite/ChangeLog:

2020-11-03  Jan Hubicka  <hubicka@ucw.cz>

	* gcc.c-torture/compile/pr97578.c: New test.
2020-11-03 11:56:05 +01:00
Richard Biener
8414529156 testsuite/97688 - fix check_vect () with __AVX2__
This fixes the cpuid check to always specify a subleaf zero
which is required to detect AVX2 and doesn't hurt for level one.
Without this fix we get zero runtime coverage when -mavx2 is
specified.
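
A query with an explicit subleaf can be sketched like this.  It is a
minimal sketch, not the check_vect code from tree-vect.h:
demo_cpu_has_avx2 is a hypothetical name, and the sketch assumes the
__get_cpuid_count helper and the bit_AVX2 macro from the compiler's
<cpuid.h>.  It only reads the CPUID feature bit; a complete check
would also confirm that the OS saves the AVX register state.

```c
#include <stdbool.h>

#if defined (__x86_64__) || defined (__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: report whether CPUID advertises AVX2.  */
static bool
demo_cpu_has_avx2 (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* AVX2 is reported in leaf 7, and that leaf is only well defined
     when the subleaf (ECX) is explicitly set to zero.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return (ebx & bit_AVX2) != 0;
#else
  /* Not an x86 target, so there is no CPUID to ask.  */
  return false;
#endif
}
```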

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR testsuite/97688
	* gcc.dg/vect/tree-vect.h (check_vect): Fix the x86 cpuid
	check to always specify subleaf zero.
2020-11-03 11:14:01 +01:00
Richard Biener
f53e9d40de tree-optimization/97678 - fix SLP induction epilogue vectorization
This restores not tracking SLP nodes for induction initial values
in a non-nested context because tracking them interferes with
peeling and epilogue vectorization.

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97678
	* tree-vect-slp.c (vect_build_slp_tree_2): Do not track
	the initial values of inductions when not nested.
	* tree-vect-loop.c (vectorizable_induction): Look at
	PHI node initial values again for SLP and not nested
	inductions.  Handle LOOP_VINFO_MASK_SKIP_NITERS and cost
	invariants.

	* gcc.dg/vect/pr97678.c: New testcase.
2020-11-03 09:56:40 +01:00
Tobias Burnus
0caf400a86 Fortran: Add !GCC$ attributes DEPRECATED
gcc/fortran/ChangeLog:

	* decl.c (ext_attr_list): Add EXT_ATTR_DEPRECATED.
	* gfortran.h (ext_attr_id_t): Ditto.
	* gfortran.texi (GCC$ ATTRIBUTES): Document it.
	* resolve.c (resolve_variable, resolve_function,
	resolve_call, resolve_values): Show -Wdeprecated-declarations warning.
	* trans-decl.c (add_attributes_to_decl): Skip those
	with no middle_end_name.

gcc/testsuite/ChangeLog:

	* gfortran.dg/attr_deprecated.f90: New test.
2020-11-03 09:55:58 +01:00
Uros Bizjak
682ed7ad23 x86: Optimize aes<aeswideklvariant>u8 a bit, fix whitespace
2020-11-03  Uroš Bizjak  <ubizjak@gmail.com>

gcc/

	* config/i386/sse.md (aes<aeswideklvariant>u8):
	Do not use xmm_regs array.  Fix whitespace.
2020-11-03 09:51:01 +01:00
Uros Bizjak
db3f0d218c x86: Fix comment in ix86_expand_builtin
2020-11-03  Uroš Bizjak  <ubizjak@gmail.com>

gcc/

	* config/i386/i386-expand.c (ix86_expand_builtin): Fix comment.
2020-11-03 09:46:59 +01:00
Thomas Schwinge
64dc14b1a7 [OpenACC] Enable inconsistent nested 'reduction' clauses checking for OpenACC 'kernels'
gcc/
	* omp-low.c (scan_omp_for) <OpenACC>: Move earlier inconsistent
	nested 'reduction' clauses checking.
	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-1-kernels.c: Extend.
	* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
2020-11-03 09:35:33 +01:00
Thomas Schwinge
fedf3e94ef [OpenACC] Split up testcases for inconsistent nested 'reduction' clauses checking
gcc/testsuite/
	* c-c++-common/goacc/nested-reductions.c: Split file into...
	* c-c++-common/goacc/nested-reductions-1-kernels.c: ... this...
	* c-c++-common/goacc/nested-reductions-1-parallel.c: ..., this...
	* c-c++-common/goacc/nested-reductions-1-routine.c: ..., and this.
	* c-c++-common/goacc/nested-reductions-warn.c: Split file into...
	* c-c++-common/goacc/nested-reductions-2-kernels.c: ... this...
	* c-c++-common/goacc/nested-reductions-2-parallel.c: ..., this...
	* c-c++-common/goacc/nested-reductions-2-routine.c: ..., and this.
	* gfortran.dg/goacc/nested-reductions.f90: Split file into...
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: ... this...
	* gfortran.dg/goacc/nested-reductions-1-parallel.f90: ..., this...
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: ..., and
	this.
	* gfortran.dg/goacc/nested-reductions-warn.f90: Split file into...
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: ... this...
	* gfortran.dg/goacc/nested-reductions-2-parallel.f90: ..., this...
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: ..., and
	this.
2020-11-03 09:35:33 +01:00
Jonathan Yong
08fca4df1d libstdc++: use lt_host_flags for libstdc++.la
For platforms like Mingw and Cygwin, cygwin refuses to generate the
shared library without using -no-undefined.

Attached patch makes sure the right flags are used, since libtool is
already used to link libstdc++.

libstdc++-v3/ChangeLog:

	* src/Makefile.am (libstdc___la_LINK): Add lt_host_flags.
	* src/Makefile.in: Regenerate.
2020-11-03 08:22:53 +00:00
Thomas Schwinge
41f7f6178e [Fortran] More precise location information for OpenACC 'gang', 'worker', 'vector' clauses with argument [PR92793]
gcc/fortran/
	PR fortran/92793
	* trans-openmp.c (gfc_trans_omp_clauses): More precise location
	information for OpenACC 'gang', 'worker', 'vector' clauses with
	argument.
	gcc/testsuite/
	PR fortran/92793
	* gfortran.dg/goacc/pr92793-1.f90: Adjust.
2020-11-03 09:13:07 +01:00
Thomas Schwinge
beddd1762a [OpenACC] More precise diagnostics for 'gang', 'worker', 'vector' clauses with arguments on 'loop' only allowed in 'kernels' regions
Instead of at the location of the 'loop' directive, 'error_at' the location of
the improper clause, and 'inform' at the location of the enclosing parent
compute construct/routine.

The Fortran testcases come with some XFAILing, to be resolved later.

	gcc/
	* omp-low.c (scan_omp_for) <OpenACC>: More precise diagnostics for
	'gang', 'worker', 'vector' clauses with arguments only allowed in
	'kernels' regions.
	gcc/testsuite/
	* c-c++-common/goacc/pr92793-1.c: Extend.
	* gfortran.dg/goacc/pr92793-1.f90: Likewise.
2020-11-03 09:13:07 +01:00
Kewen Lin
f5e18dd9c7 pass: Run cleanup passes before SLP [PR96789]
As discussed in PR96789, some scalar stmts can be eliminated by
passes that run after SLP, but we still model their costs when
trying to SLP, which can impact the vectorizer's decision.  One
typical case is the one in PR96789 on target Power.

As Richard suggested there, this patch introduces a pass called
pre_slp_scalar_cleanup that runs some secondary cleanup passes, for
now FRE and DSE.  It also introduces a new group of TODO flags called
pending TODO flags.  Unlike normal TODO flags, the pending TODO flags
are passed down in the pipeline until one of their consumers can
perform the requested action.  Consumers should then clear the flags
for the actions that they have taken.
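
The producer/consumer protocol for pending flags can be sketched in a
few lines of C.  This is a simplified model, not the pass manager code:
struct demo_function, DEMO_PENDING_CLEANUP and both helpers are
hypothetical stand-ins for struct function, its pending_TODOs member
and PENDING_TODO_force_next_scalar_cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for a pending TODO flag.  */
#define DEMO_PENDING_CLEANUP (1u << 0)

/* Hypothetical stand-in for the per-function state.  */
struct demo_function
{
  unsigned int pending_todos;
};

/* Producer: request the cleanup without running it.  */
static void
demo_request_cleanup (struct demo_function *fn)
{
  assert (fn);
  fn->pending_todos |= DEMO_PENDING_CLEANUP;
}

/* Consumer: act only if the flag is pending, then clear it so that
   later consumers do not repeat the work.  Returns true if it ran.  */
static bool
demo_consume_cleanup (struct demo_function *fn)
{
  assert (fn);
  if (!(fn->pending_todos & DEMO_PENDING_CLEANUP))
    return false;
  fn->pending_todos &= ~DEMO_PENDING_CLEANUP;
  return true;
}
```

The flag survives any number of passes between the request and the
first consumer, which is what distinguishes it from a normal TODO flag
that is handled right after the pass that set it.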

Some compilation time statistics on all SPEC2017 INT bmks were
collected on one Power9 machine for several option sets below:
  A1: -Ofast -funroll-loops
  A2: -O1
  A3: -O1 -funroll-loops
  A4: -O2
  A5: -O2 -funroll-loops

The corresponding compile-time increase is negligible:
  A1       A2       A3        A4        A5
  0.08%    0.00%    -0.38%    -0.10%    -0.05%

Bootstrapped/regtested on powerpc64le-linux-gnu P8.

gcc/ChangeLog:

	PR tree-optimization/96789
	* function.h (struct function): New member unsigned pending_TODOs.
	* passes.c (class pass_pre_slp_scalar_cleanup): New class.
	(make_pass_pre_slp_scalar_cleanup): New function.
	(pass_data_pre_slp_scalar_cleanup): New pass data.
	* passes.def (pass_pre_slp_scalar_cleanup): New pass, add
	pass_fre and pass_dse as its children.
	* timevar.def (TV_SCALAR_CLEANUP): New timevar.
	* tree-pass.h (PENDING_TODO_force_next_scalar_cleanup): New
	pending TODO flag.
	(make_pass_pre_slp_scalar_cleanup): New declare.
	* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1):
	Once any outermost loop gets unrolled, flag cfun pending_TODOs
	PENDING_TODO_force_next_scalar_cleanup on.

gcc/testsuite/ChangeLog:

	PR tree-optimization/96789
	* gcc.dg/tree-ssa/ssa-dse-28.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
	* gcc.dg/vect/bb-slp-41.c: Likewise.
	* gcc.dg/tree-ssa/pr96789.c: New test.
2020-11-02 20:55:48 -06:00
Martin Storsjö
bd6ecbe48a libgcc: Expose the instruction pointer and stack pointer in SEH _Unwind_Backtrace
Previously, the SEH version of _Unwind_Backtrace did unwind
the stack and call the provided callback function as intended,
but there was little the caller could do within the callback to
actually get any info about that particular level in the unwind.

Set the ra and cfa pointers, which are used by _Unwind_GetIP
and _Unwind_GetCFA, to allow using these functions from the
callback to inspect the state at each stack frame.
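
A callback that uses these accessors might look like the sketch below.
It is an illustration of the caller side, not part of the patch:
struct demo_trace, demo_record_frame and demo_count_frames are
hypothetical names, and the sketch assumes a toolchain whose
<unwind.h> declares _Unwind_Backtrace, _Unwind_GetIP and
_Unwind_GetCFA.

```c
#include <assert.h>
#include <stdint.h>
#include <unwind.h>

/* Hypothetical accumulator for this sketch.  */
struct demo_trace
{
  int frames;
  uintptr_t last_ip;
  uintptr_t last_cfa;
};

/* Callback invoked once per stack frame during the backtrace.  */
static _Unwind_Reason_Code
demo_record_frame (struct _Unwind_Context *context, void *arg)
{
  struct demo_trace *trace = arg;

  assert (trace);
  /* These accessors are what give the callback per-frame information.  */
  trace->last_ip = (uintptr_t) _Unwind_GetIP (context);
  trace->last_cfa = (uintptr_t) _Unwind_GetCFA (context);
  trace->frames++;
  return _URC_NO_REASON;
}

/* Walk the current stack and return the number of frames visited.  */
static int
demo_count_frames (void)
{
  struct demo_trace trace = { 0, 0, 0 };

  _Unwind_Backtrace (demo_record_frame, &trace);
  return trace.frames;
}
```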

2020-09-08  Martin Storsjö  <martin@martin.st>

	libgcc/
	* unwind-seh.c (_Unwind_Backtrace): Set the ra and cfa pointers
	before calling the callback.
2020-11-03 00:30:35 +00:00
GCC Administrator
18f8fc9329 Daily bump.
2020-11-03 00:16:23 +00:00
Alan Modra
18963d3bee can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check
This moves an #ifdef block of code from calls.c to
targetm.function_ok_for_sibcall.  Only two targets, x86 and rs6000,
define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
that might vary depending on the called function.  Macros like
UNITS_PER_WORD don't change over a function boundary, nor does the
MIPS ABI, nor does TARGET_64BIT on PA-RISC.  Other targets are even
more trivially proven to not need the calls.c code.

Besides cleaning up a small piece of #ifdef code, the motivation for
this patch is to allow tail calls on PowerPC for functions that
require less reg_parm_stack_space than their caller.  The original
code in calls.c only permitted tail calls when exactly equal, but on
PowerPC we can tail call if the callee has less or equal
REG_PARM_STACK_SPACE than the caller, as demonstrated by the
testcase.  So we should use

  /* If reg parm stack space increases, we cannot sibcall.  */
  if (REG_PARM_STACK_SPACE (decl ? decl : fntype)
      > INCOMING_REG_PARM_STACK_SPACE (current_function_decl))

and note the change to use INCOMING_REG_PARM_STACK_SPACE.
REG_PARM_STACK_SPACE has always been wrong there for PowerPC.  See
https://gcc.gnu.org/pipermail/gcc-patches/2014-May/389867.html for why
if you're curious.  Not that it matters, because PowerPC can do
without this check entirely, relying on a stack slot test in generic
code.

a) The generic code checks that arg passing stack in the callee is not
   greater than that in the caller, and,
b) ELFv2 only allocates reg_parm_stack_space when some parameter is
   passed on the stack.
Point (b) means that zero reg_parm_stack_space implies zero stack
space, and non-zero reg_parm_stack_space implies non-zero stack
space.  So the case of 0 reg_parm_stack_space in the caller and 64 in
the callee will be caught by (a).
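
The difference between the old and the relaxed rule can be sketched as
two predicates.  This is a simplified model, not the calls.c or rs6000
code: both function names are hypothetical, and the arguments stand
for the callee's REG_PARM_STACK_SPACE and the caller's
INCOMING_REG_PARM_STACK_SPACE in bytes.

```c
#include <stdbool.h>

/* Hypothetical model of the original generic rule: a sibcall is only
   allowed when both sizes are exactly equal.  */
static bool
demo_sibcall_ok_exact (unsigned int callee_space, unsigned int caller_space)
{
  return callee_space == caller_space;
}

/* Hypothetical model of the relaxed rule: a sibcall is refused only
   if the reg parm stack space would increase.  */
static bool
demo_sibcall_ok_relaxed (unsigned int callee_space, unsigned int caller_space)
{
  return callee_space <= caller_space;
}
```

With a caller that has 64 bytes and a callee that needs none, the
exact rule refuses the sibcall while the relaxed rule permits it.  The
reverse case, 0 in the caller and 64 in the callee, is refused by both.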

gcc/
	PR middle-end/97267
	* calls.h (maybe_complain_about_tail_call): Declare.
	* calls.c (maybe_complain_about_tail_call): Make global.
	(can_implement_as_sibling_call_p): Delete reg_parm_stack_space
	param.  Adjust caller.  Move REG_PARM_STACK_SPACE check to..
	* config/i386/i386.c (ix86_function_ok_for_sibcall): ..here.

gcc/testsuite/
	PR middle-end/97267
	* gcc.target/powerpc/pr97267.c: New test.
2020-11-03 09:36:40 +10:30