Commit graph

190528 commits

Author SHA1 Message Date
GCC Administrator
ad6091d1b8 Daily bump. 2021-12-27 00:16:20 +00:00
H.J. Lu
d87483015d i386: Check AX input in any_mul_highpart peepholes
When applying peephole optimization to transform

	mov imm, %reg0
	mov %reg1, %AX_REG
	imul %reg0

to

	mov imm, %AX_REG
	imul %reg1

disable peephole optimization if reg1 == AX_REG.
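
Why the reg1 == AX_REG case must be excluded can be seen in a small register model. The sketch below is illustrative only: the register indices, the `imul` helper and both function names are hypothetical and are not taken from i386.md. It relies on the GCC `__int128` extension.

```c
#include <stdint.h>

/* Toy register file (hypothetical model): AX and DX plus two scratch regs. */
enum { AX = 0, DX = 1, R8 = 2, R9 = 3, NREGS = 4 };

/* One-operand signed multiply: DX:AX = AX * regs[src]. */
static void imul(int64_t regs[NREGS], int src)
{
  __int128 prod = (__int128) regs[AX] * regs[src];
  regs[AX] = (int64_t) prod;          /* low 64 bits */
  regs[DX] = (int64_t) (prod >> 64);  /* high 64 bits */
}

/* Original sequence: mov imm, %reg0; mov %reg1, %AX_REG; imul %reg0. */
int64_t highpart_original(int64_t regs[NREGS], int64_t imm, int reg0, int reg1)
{
  regs[reg0] = imm;
  regs[AX] = regs[reg1];
  imul(regs, reg0);
  return regs[DX];
}

/* Transformed sequence: mov imm, %AX_REG; imul %reg1.
   If reg1 is AX itself, the first move destroys the multiplicand. */
int64_t highpart_peephole(int64_t regs[NREGS], int64_t imm, int reg1)
{
  regs[AX] = imm;
  imul(regs, reg1);
  return regs[DX];
}
```

With the input value in a register other than AX both sequences agree; with the input already in AX the transformed sequence computes imm * imm instead, which is the situation the added check rules out.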

gcc/

	PR target/103785
	* config/i386/i386.md: Swap operand order in comments and check
	AX input in any_mul_highpart peepholes.

gcc/testsuite/

	PR target/103785
	* gcc.target/i386/pr103785.c: New test.
2021-12-26 05:09:22 -08:00
Francois-Xavier Coudert
9525c26bf1 Fortran: speed up decimal output of integers
libgfortran/ChangeLog:

	PR libfortran/98076
	* runtime/string.c (itoa64, itoa64_pad19): New helper functions.
	(gfc_itoa): On targets with 128-bit integers, call fast
	64-bit functions to avoid many slow divisions.
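
The general idea of avoiding repeated 128-bit divisions can be sketched as follows. This is not the libgfortran code: `put19` and `u128_to_dec` are hypothetical names, and the chunking by 10^19 is an assumption suggested by the `itoa64_pad19` helper name. It relies on the GCC `unsigned __int128` extension.

```c
#include <stdint.h>
#include <string.h>

/* Write exactly 19 zero-padded decimal digits of v (v < 10^19) to out,
   using only 64-bit divisions. */
static void put19(uint64_t v, char *out)
{
  for (int i = 18; i >= 0; i--)
    {
      out[i] = (char) ('0' + v % 10);
      v /= 10;
    }
}

/* Convert n to decimal in buf (at least 40 bytes); returns buf. */
char *u128_to_dec(unsigned __int128 n, char *buf)
{
  const uint64_t ten19 = 10000000000000000000ULL;  /* 10^19 fits in 64 bits */
  uint64_t chunk[3];
  char digits[58];                                 /* 3 * 19 digits + NUL */
  int nchunks = 0;

  /* At most three 128-bit divisions peel off 19-digit chunks. */
  do
    {
      chunk[nchunks++] = (uint64_t) (n % ten19);
      n /= ten19;
    }
  while (n != 0);

  /* Emit the chunks most-significant first. */
  for (int i = 0; i < nchunks; i++)
    put19(chunk[nchunks - 1 - i], digits + 19 * i);
  digits[19 * nchunks] = '\0';

  /* Strip leading zeros, keeping at least one digit. */
  char *p = digits;
  while (p[0] == '0' && p[1] != '\0')
    p++;
  strcpy(buf, p);
  return buf;
}
```

A naive digit-by-digit loop would need up to 39 divisions of a 128-bit value; here the per-digit work is done on 64-bit values.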

gcc/testsuite/ChangeLog:

	PR libfortran/98076
	* gfortran.dg/pr98076.f90: New test.
2021-12-26 12:00:00 +01:00
GCC Administrator
10ae9946dc Daily bump. 2021-12-26 00:16:17 +00:00
Francois-Xavier Coudert
4ae906e46c Fortran: simplify library code for integer-to-decimal conversion
libgfortran/ChangeLog:

	PR libfortran/81986
	PR libfortran/99191

	* libgfortran.h: Remove gfc_xtoa(), adjust gfc_itoa() and
	GFC_ITOA_BUF_SIZE.
	* io/write.c (write_decimal): conversion parameter is always
	gfc_itoa(), so remove it. Protect from overflow.
	(xtoa): Move gfc_xtoa and update its name.
	(xtoa_big): Renamed from ztoa_big for consistency.
	(write_z): Adjust to new function names.
	(write_i, write_integer): Remove last arg of write_decimal.
	* runtime/backtrace.c (error_callback): Comment on the use of
	gfc_itoa().
	* runtime/error.c (gfc_xtoa): Move to io/write.c.
	* runtime/string.c (gfc_itoa): Take an unsigned argument,
	remove the handling of negative values.
2021-12-25 15:07:12 +01:00
GCC Administrator
ffb5418fb7 Daily bump. 2021-12-25 00:16:18 +00:00
Uros Bizjak
8f921393e3 i386: Add V2SFmode DIV insn pattern [PR95046, PR103797]
Use V4SFmode "DIVPS X,Y" with [y0, y1, 1.0f, 1.0f] as a divisor
to avoid division by zero.
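
The padding trick can be modelled in plain C. This is only a sketch: `div4` stands in for a four-lane DIVPS, `div_v2sf` is a hypothetical name, and zero-filling the upper dividend lanes is an assumption made for the illustration.

```c
/* Four-lane divide standing in for DIVPS (illustrative model only). */
static void div4(const float x[4], const float y[4], float q[4])
{
  for (int i = 0; i < 4; i++)
    q[i] = x[i] / y[i];
}

/* Two-lane divide built on the four-lane one.  The unused upper divisor
   lanes are set to 1.0f so they can never divide by zero. */
void div_v2sf(const float x[2], const float y[2], float q[2])
{
  float x4[4] = { x[0], x[1], 0.0f, 0.0f };  /* upper lanes: assumed zero */
  float y4[4] = { y[0], y[1], 1.0f, 1.0f };  /* [y0, y1, 1.0f, 1.0f] */
  float q4[4];

  div4(x4, y4, q4);
  q[0] = q4[0];
  q[1] = q4[1];
}
```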

2021-12-24  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/95046
	PR target/103797
	* config/i386/mmx.md (divv2sf3): New instruction pattern.

gcc/testsuite/ChangeLog:

	PR target/95046
	PR target/103797
	* gcc.target/i386/pr95046-1.c (test_div): Add.
	(dg-options): Add -mno-recip.
2021-12-24 17:09:36 +01:00
Iain Sandoe
43dadcf3e7 Darwin: Amend a comment to be more inclusive [NFC].
As per title.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config/darwin.c (darwin_override_options): Make a comment
	more inclusive.
2021-12-24 10:59:35 +00:00
Iain Sandoe
19bf83a9a0 Darwin: Update rules for handling alignment of globals.
The current rule was too strict and has not been required since Darwin11.

This relaxes the constraint to allow up to 2^28 alignment for non-common
entities.  Common is still restricted to a maximum alignment of 2^15.

When the host is an older version of Darwin (earlier than 11) then the
existing constraint is still applied.  Note that this is a host constraint
not a target one (so that a compilation on 10.7 targeting 10.6 is allowed
to use a greater alignment than the tools on 10.6 support).  This matches
the behaviour of clang.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config.gcc: Emit L2_MAX_OFILE_ALIGNMENT with suitable
	values for the host.
	* config/darwin.c (darwin_emit_common): Error for alignment
	values > 32768.
	* config/darwin.h (MAX_OFILE_ALIGNMENT): Rework to use the
	configured L2_MAX_OFILE_ALIGNMENT.

gcc/testsuite/ChangeLog:

	* gcc.dg/darwin-aligned-globals.c: New test.
	* gcc.dg/darwin-comm-1.c: New test.
	* gcc.dg/attr-aligned.c: Amend for new alignment values on
	Darwin.
	* gcc.target/i386/pr89261.c: Likewise.
2021-12-24 10:47:05 +00:00
Iain Sandoe
8381075ff3 Darwin: Check for that flag-reorder-and-partition.
We were checking whether the flag had been set by the user, but not whether
it was set to true, which means that the check fails in its intent when
the user passes -fno-reorder-and-partition.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config/darwin.c (darwin_override_options): When checking for the
	flag-reorder-and-partition case, also check that it is set on.
2021-12-24 10:42:35 +00:00
Iain Sandoe
9a4a29eaf2 Darwin: Define OBJECT_FORMAT_MACHO.
There are places where we need to generate different code depending
on the object format rather than on the arch.  We already have
definitions for ELF, COFF etc.; this adds one for MACHO.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config/darwin.h (OBJECT_FORMAT_MACHO): New.
2021-12-24 10:39:25 +00:00
GCC Administrator
7d01da81b8 Daily bump. 2021-12-24 00:16:27 +00:00
H.J. Lu
8f34344ec6 smuldi3_highpart.c: Replace long with long long for -mx32
* gcc.target/i386/smuldi3_highpart.c: Replace long with long long.
2021-12-23 10:07:25 -08:00
Roger Sayle
ef26c151c1 x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
This is a fix to PR target/103773 where -Oz shouldn't use push/pop
on x86 to shrink writing small integer constants to memory.
Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem"
when writing -1 to memory when using -Oz.  This patch implements this
via peephole2 where we can confirm that it's OK to clobber the flags.
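
The identities behind the AND/OR forms can be written out in C. This is only an illustration of the arithmetic; the function names are hypothetical, and nothing here implies that a compiler will emit `andl`/`orl` for this source.

```c
#include <stdint.h>

/* x & 0 is 0 for every x, so a read-modify-write AND stores zero. */
void store_zero(int32_t *mem)
{
  *mem &= 0;
}

/* x | -1 is -1 (all bits set) for every x, so an OR stores -1. */
void store_minus_one(int32_t *mem)
{
  *mem |= -1;
}
```

Both forms modify the flags on x86, which is why the transformation is only valid where the flags are known to be dead.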

2021-12-23  Roger Sayle  <roger@nextmovesoftware.com>
	    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
	PR target/103773
	* config/i386/i386.md (*mov<mode>_and): New define_insn for
	writing a zero to memory using AND.
	(*mov<mode>_or): Extend to allow memory destination and HImode.
	(*movdi_internal): Remove -Oz push/pop optimization from here.
	(*movsi_internal): Likewise.
	(peephole2): Perform -Oz push/pop optimization here, only for
	register destinations, values other than zero, and in functions
	that don't use the red zone.
	(peephole2): With -Oz, convert writes of 0 or -1 to memory into
	their clobber forms, i.e. *mov<mode>_and and *mov<mode>_or resp.

gcc/testsuite/ChangeLog
	PR target/103773
	* gcc.target/i386/pr103773-2.c: New test case.
	* gcc.target/i386/pr103773.c: New test case.
2021-12-23 12:35:22 +00:00
konglin1
61e53698a0 i386: Enable intrinsics that convert float and bf16 data to each other.
gcc/ChangeLog:

	* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic.
	(_mm512_cvtpbh_ps): Likewise.
	(_mm512_maskz_cvtpbh_ps): Likewise.
	(_mm512_mask_cvtpbh_ps): Likewise.
	* config/i386/avx512bf16vlintrin.h (_mm_cvtness_sbh): Likewise.
	(_mm_cvtpbh_ps): Likewise.
	(_mm256_cvtpbh_ps): Likewise.
	(_mm_maskz_cvtpbh_ps): Likewise.
	(_mm256_maskz_cvtpbh_ps): Likewise.
	(_mm_mask_cvtpbh_ps): Likewise.
	(_mm256_mask_cvtpbh_ps): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: New test.
	* gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c: Ditto.
	* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Ditto.
	* gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c: Ditto.
2021-12-23 17:32:51 +08:00
Feng Xue
9ac0730c25 Fix typo in type verification.
PR ipa/103786

gcc/ChangeLog:

	* tree.c (verify_type): Fix typo.
2021-12-23 09:22:06 +01:00
liuhongt
1a7ce85709 Combine vpcmpuw + zero_extend to vpcmpuw.
vcmp{ps,ph,pd} and vpcmp{,u}{b,w,d,q} implicitly clear the upper bits
of dest.

gcc/ChangeLog:

	PR target/103750
	* config/i386/sse.md
	(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
	New pre_reload define_insn_and_split.
	(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
	Ditto.
	(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
	Ditto.
	(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
	Ditto.
	(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
	Ditto.
	(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
	Ditto.
	(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
	Ditto.
	(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
	Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr103750-1.c: New test.
	* gcc.target/i386/avx512bw-pr103750-2.c: New test.
	* gcc.target/i386/avx512f-pr103750-1.c: New test.
	* gcc.target/i386/avx512f-pr103750-2.c: New test.
	* gcc.target/i386/avx512fp16-pr103750-1.c: New test.
	* gcc.target/i386/avx512fp16-pr103750-2.c: New test.
2021-12-23 13:42:55 +08:00
GCC Administrator
9f9bc0bf0d Daily bump. 2021-12-23 00:16:29 +00:00
Harald Anlauf
ff0ad4b5e1 Fortran: BOZ literal constants are not interoperable
gcc/fortran/ChangeLog:

	PR fortran/103778
	* check.c (is_c_interoperable): A BOZ literal constant is not
	interoperable.

gcc/testsuite/ChangeLog:

	PR fortran/103778
	* gfortran.dg/illegal_boz_arg_3.f90: New test.
2021-12-22 19:34:20 +01:00
Harald Anlauf
5474092c9a Fortran: CASE selector expressions must be scalar
gcc/fortran/ChangeLog:

	PR fortran/103776
	* match.c (match_case_selector): Reject expressions in CASE
	selector which are not scalar.

gcc/testsuite/ChangeLog:

	PR fortran/103776
	* gfortran.dg/select_10.f90: New test.
2021-12-22 19:34:20 +01:00
Murray Steele
9c1ce17bc4 arm: Declare MVE types internally via pragma
Move the implementation of MVE ACLE types from arm_mve_types.h to
inside GCC via a new pragma, which replaces the prior type
definitions.  This allows for the types to be used internally for
intrinsic function definitions.

gcc/ChangeLog:

	* config.gcc (arm*-*-*): Add arm-mve-builtins.o to extra_objs.
	* config/arm/arm-c.c (arm_pragma_arm): Handle "#pragma GCC arm".
	(arm_register_target_pragmas): Register it.
	* config/arm/arm-protos.h: (arm_mve::arm_handle_mve_types_h): New
	prototype.
	* config/arm/arm_mve_types.h: Replace MVE type definitions with
	new pragma.
	* config/arm/t-arm: (arm-mve-builtins.o): New target rule.
	* config/arm/arm-mve-builtins.cc: New file.
	* config/arm/arm-mve-builtins.def: New file.
	* config/arm/arm-mve-builtins.h: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/mve.exp: Add new subdirectories.
	* gcc.target/arm/mve/general-c/type_redef_1.c: New test.
	* gcc.target/arm/mve/general/double_pragmas_1.c: New test.
	* gcc.target/arm/mve/general/nomve_1.c: New test.
2021-12-22 14:57:46 +00:00
Murray Steele
8c61cefe2b arm: Move arm_simd_info array declaration into header
Move the arm_simd_type and arm_type_qualifiers enums, and
arm_simd_info struct from arm-builtins.c into arm-builtins.h header.

This is a first step towards internalising the type definitions for
MVE predicate, vector, and tuple types.  By moving arm_simd_types into
a header, we allow future patches to use these type trees externally
to arm-builtins.c, which is a crucial step towards developing an MVE
intrinsics framework similar to the current SVE implementation.

gcc/ChangeLog:

	* config/arm/arm-builtins.c (enum arm_type_qualifiers): Move to
	arm_builtins.h.
	(enum arm_simd_type): Move to arm-builtins.h.
	(struct arm_simd_type_info): Move to arm-builtins.h.
	* config/arm/arm-builtins.h (enum arm_simd_type): Move from
	arm-builtins.c.
	(enum arm_type_qualifiers): Move from arm-builtins.c.
	(struct arm_simd_type_info): Move from arm-builtins.c.
2021-12-22 14:57:29 +00:00
Francois-Xavier Coudert
228173565e Fortran: allow __float128 on targets where long double is not REAL(KIND=10)
The logic for detection of REAL(KIND=16) in kinds-override.h made
assumptions:

    -- if real(kind=10) exists, i.e. if HAVE_GFC_REAL_10 is defined,
       then it is necessarily the "long double" type
    -- if real(kind=16) exists, then:
       * if HAVE_GFC_REAL_10, real(kind=16) is "__float128"
       * otherwise, real(kind=16) is "long double"

This may not always be true.  Take the aarch64-apple-darwin port:
it has double == long double == binary64, and __float128 == binary128.

We already have more fine-grained logic in the mk-kinds-h.sh script,
where we actually check the Fortran kind corresponding to C’s long
double. So let's use it, and emit the GFC_REAL_16_IS_FLOAT128 /
GFC_REAL_16_IS_LONG_DOUBLE macros there.

libgfortran/ChangeLog:

	* kinds-override.h: Move GFC_REAL_16_IS_* macros...
	* mk-kinds-h.sh: ... here.
2021-12-22 12:46:07 +01:00
Martin Liska
63eb073efb docs: docs: use ';' for function declarations. (part 3)
gcc/ChangeLog:

	* doc/extend.texi: Unify all function declarations in examples
	where some miss trailing ';'.
2021-12-22 12:17:25 +01:00
Martin Liska
3892cfee77 docs: docs: use ';' for function declarations. (part 2)
gcc/ChangeLog:

	* doc/extend.texi: Unify all function declarations in examples
	where some miss trailing ';'.
2021-12-22 12:07:41 +01:00
Martin Liska
1a6592ff65 docs: use ';' for function declarations.
gcc/ChangeLog:

	* doc/extend.texi: Unify all function declarations in examples
	where some miss trailing ';'.
2021-12-22 11:59:28 +01:00
Martin Liska
3e1a06ec94 docs: Unify instruct set name.
gcc/ChangeLog:

	* doc/extend.texi: Use uppercase letters for SSEx.
2021-12-22 11:20:42 +01:00
GCC Administrator
aa17859b68 Daily bump. 2021-12-22 00:16:30 +00:00
Iain Buclaw
7c6ae994fb config: Add check whether D compiler works (PR103528)
As well as checking for the existence of a GDC compiler, also validate
that it has been built with libphobos; otherwise warn or fail with
the message that GDC is required to build D.

config/ChangeLog:

	PR d/103528
	* acx.m4 (ACX_PROG_GDC): Add check whether D compiler works.

ChangeLog:

	* configure: Regenerate.
2021-12-21 21:29:35 +01:00
Iain Buclaw
0c3fc06c30 libphobos: Add power*-*-freebsd* as supported target
This has been tested on powerpc64-freebsd13 and powerpc64le-freebsd13,
and used to build dub, along with some D tools from ports.

libphobos/ChangeLog:

	* configure.tgt: Add power*-*-freebsd* as a supported target.
2021-12-21 16:07:08 +01:00
Jiang Haochen
d22907975b i386: Add missing BMI intrinsic to align with clang
gcc/ChangeLog:

	* config/i386/bmiintrin.h (_tzcnt_u16): New intrinsic.
	(_andn_u32): Ditto.
	(_andn_u64): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/bmi-1.c: Add test for new intrinsic.
	* gcc.target/i386/bmi-2.c: Ditto.
	* gcc.target/i386/bmi-3.c: Ditto.
2021-12-21 16:30:16 +08:00
Martin Liska
6fad101f30 config.sub: change mode to 755.
ChangeLog:

	* config.sub: Change mode back to 755.
2021-12-21 09:10:57 +01:00
Xionghu Luo
51a24e4a98 Don't move cold code out of loop by checking bb count
v8 changes:
1. Use hotter_than_inner_loop instead of colder to store a hotter loop
nearest to loop.
2. Update the logic in fill_coldest_and_hotter_out_loop and
get_coldest_out_loop to make common case O(1).
3. Update function argument bb_colder_than_loop_preheader.
4. Make cached array to vec<class *loop> for index checking.

v7 changes:
1. Refine get_coldest_out_loop to replace loop with checking
pre-computed coldest_outermost_loop and colder_than_inner_loop.
2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
colder_than_inner_loop recursively without loop.

v6 changes:
1. Add function fill_coldest_out_loop to pre compute the coldest
outermost loop for each loop.
2. Rename find_coldest_out_loop to get_coldest_out_loop.
3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.

v5 changes:
1. Refine comments for new functions.
2. Use basic_block instead of count in bb_colder_than_loop_preheader
to align with function name.
3. Refine with simpler implementation for get_coldest_out_loop and
ref_in_loop_hot_body::operator for better understanding.

v4 changes:
1. Sort out profile_count comparison to function bb_cold_than_loop_preheader.
2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
3. Split RTL invariant motion part out.
4. Remove aux changes.

v3 changes:
1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
2. Remove unnecessary changes.
3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
4. "gsi_next (&bsi);" in move_computations_worker is kept: without it v1
ran into an infinite loop, because the iterator update was actually
missing.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
...
v8: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586209.html

There was an earlier patch trying to avoid moving cold blocks out of loops:

https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html

Richard suggested to "never hoist anything from a bb with lower execution
frequency to a bb with higher one in LIM invariantness_dom_walker
before_dom_children".

In gimple LIM analysis, add get_coldest_out_loop to move invariants to
the expected target loop: if the profile count of the loop bb is colder
than the target loop preheader, the invariant won't be hoisted out of
the loop.  Likewise for store motion: if all locations of the REF in the
loop are cold, don't do store motion on it.
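
The kind of loop this targets can be sketched as below. The function is a hypothetical example, not one of the ssa-lim tests, and whether GCC actually declines to hoist depends on the profile estimate; `__builtin_expect` is used here only to mark the branch as unlikely.

```c
/* The product k * m is loop-invariant, but it is only needed on a path
   that is expected to run rarely.  Hoisting it into the preheader would
   make every call pay for it, even when no element is negative. */
int sum_with_rare_scale(const int *a, int n, int k, int m)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      if (__builtin_expect (a[i] < 0, 0))
        sum += k * m;   /* invariant computation in a cold block */
      else
        sum += a[i];    /* hot path */
    }
  return sum;
}
```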

SPEC2017 performance evaluation shows a 1% improvement in the intrate
GEOMEAN and no obvious regression elsewhere.  In particular,
500.perlbench_r improves by 7.52% (perf shows function S_regtry of
perlbench is largely improved), 548.exchange2_r by 1.98%, and
526.blender_r by 1.00% on P8LE.

gcc/ChangeLog:

2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>

	* tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
	function.
	(get_coldest_out_loop): New function.
	(determine_max_movement): Use get_coldest_out_loop.
	(move_computations_worker): Adjust and fix iteration update.
	(class ref_in_loop_hot_body): New functor.
	(ref_in_loop_hot_body::operator): New.
	(can_sm_ref_p): Use for_all_locs_in_loop.
	(fill_coldest_and_hotter_out_loop): New.
	(tree_ssa_lim_finalize): Free coldest_outermost_loop and
	hotter_than_inner_loop.
	(loop_invariant_motion_in_fun): Call fill_coldest_and_hotter_out_loop.

gcc/testsuite/ChangeLog:

2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>

	* gcc.dg/tree-ssa/recip-3.c: Adjust.
	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
	* gcc.dg/tree-ssa/ssa-lim-20.c: New test.
	* gcc.dg/tree-ssa/ssa-lim-21.c: New test.
	* gcc.dg/tree-ssa/ssa-lim-22.c: New test.
	* gcc.dg/tree-ssa/ssa-lim-23.c: New test.
2021-12-20 21:12:50 -06:00
Xionghu Luo
cd5ae148c4 Fix loop split incorrect count and probability
In tree-ssa-loop-split.c, split_loop and split_loop_on_cond perform two
kinds of split.  split_loop only works on a single loop and inserts an
edge at the exit when splitting, while split_loop_on_cond is not limited
to a single loop and inserts an edge at the latch.  Both kinds of split
should update the loop count and probability.  For split_loop, the loop
split condition is moved in front of loop1 and loop2, whereas
split_loop_on_cond moves the condition between loop1 and loop2.  This
patch does:
 1) profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
 2) probability update in the two loops and between the two loops.
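
What the split_loop style of transformation does at source level can be sketched by hand. The two functions below are hypothetical and only mirror the shape of the loop visible in the dumps (an induction pair i, j and a guard on j); they say nothing about how the pass updates counts and probabilities.

```c
/* Original loop: the guard j < c is re-evaluated on every iteration. */
long sum_original(int beg, int end, int beg2, int c)
{
  long sum = 0;

  for (int i = beg, j = beg2; i < end; i++, j++)
    if (j < c)
      sum += i;
  return sum;
}

/* Hand-split version: loop1 runs while the guard is known to be true,
   loop2 runs the remaining iterations where it is known to be false. */
long sum_split(int beg, int end, int beg2, int c)
{
  long sum = 0;
  int i = beg, j = beg2;

  for (; i < end && j < c; i++, j++)   /* loop1: guard always true */
    sum += i;
  for (; i < end; i++, j++)            /* loop2: guarded body removed */
    ;
  return sum;
}
```

Because the iterations are divided between the two loops, each loop should receive only its share of the original profile count, which is the proportion the patch computes.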

Regression testing passes.

Changes diff for split_loop and split_loop_on_cond cases:

1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
...
   <bb 2> [local count: 118111600]:
   if (beg_5(D) < end_8(D))
     goto <bb 14>; [89.00%]
   else
     goto <bb 6>; [11.00%]

   <bb 14> [local count: 105119324]:
   if (beg2_6(D) < c_9(D))
-    goto <bb 15>; [100.00%]
+    goto <bb 15>; [33.00%]
   else
-    goto <bb 16>; [100.00%]
+    goto <bb 16>; [67.00%]

-  <bb 15> [local count: 105119324]:
+  <bb 15> [local count: 34689377]:
   _25 = beg_5(D) + 1;
   _26 = end_8(D) - beg_5(D);
   _27 = beg2_6(D) + _26;
   _28 = MIN_EXPR <c_9(D), _27>;

-  <bb 3> [local count: 955630225]:
+  <bb 3> [local count: 315357973]:
   # i_16 = PHI <i_11(8), beg_5(D)(15)>
   # j_17 = PHI <j_12(8), beg2_6(D)(15)>
   printf ("a: %d %d\n", i_16, j_17);
   i_11 = i_16 + 1;
   j_12 = j_17 + 1;
   if (j_12 < _28)
-    goto <bb 8>; [89.00%]
+    goto <bb 8>; [29.37%]
   else
-    goto <bb 17>; [11.00%]
+    goto <bb 17>; [70.63%]

-  <bb 8> [local count: 850510901]:
+  <bb 8> [local count: 280668596]:
   goto <bb 3>; [100.00%]

-  <bb 16> [local count: 105119324]:
+  <bb 16> [local count: 70429947]:
   # i_22 = PHI <beg_5(D)(14), i_29(17)>
   # j_23 = PHI <beg2_6(D)(14), j_30(17)>

   <bb 10> [local count: 955630225]:
   # i_2 = PHI <i_22(16), i_20(13)>
   # j_1 = PHI <j_23(16), j_21(13)>
   i_20 = i_2 + 1;
   j_21 = j_1 + 1;
   if (end_8(D) > i_20)
-    goto <bb 13>; [89.00%]
+    goto <bb 13>; [59.63%]
   else
-    goto <bb 9>; [11.00%]
+    goto <bb 9>; [40.37%]

-  <bb 13> [local count: 850510901]:
+  <bb 13> [local count: 569842305]:
   goto <bb 10>; [100.00%]

   <bb 17> [local count: 105119324]:
   # i_29 = PHI <i_11(3)>
   # j_30 = PHI <j_12(3)>
   if (end_8(D) > i_29)
     goto <bb 16>; [80.00%]
   else
     goto <bb 9>; [20.00%]

   <bb 9> [local count: 105119324]:

   <bb 6> [local count: 118111600]:
   return 0;

 }
   <bb 2> [local count: 118111600]:
-  if (beg_5(D) < end_8(D))
+  _1 = end_6(D) - beg_7(D);
+  j_9 = _1 + beg2_8(D);
+  if (end_6(D) > beg_7(D))
     goto <bb 14>; [89.00%]
   else
     goto <bb 6>; [11.00%]

   <bb 14> [local count: 105119324]:
-  if (beg2_6(D) < c_9(D))
-    goto <bb 15>; [100.00%]
+  if (j_9 >= c_11(D))
+    goto <bb 15>; [33.00%]
   else
-    goto <bb 16>; [100.00%]
+    goto <bb 16>; [67.00%]

-  <bb 15> [local count: 105119324]:
-  _25 = beg_5(D) + 1;
-  _26 = end_8(D) - beg_5(D);
-  _27 = beg2_6(D) + _26;
-  _28 = MIN_EXPR <c_9(D), _27>;
-
-  <bb 3> [local count: 955630225]:
-  # i_16 = PHI <i_11(8), beg_5(D)(15)>
-  # j_17 = PHI <j_12(8), beg2_6(D)(15)>
-  printf ("a: %d %d\n", i_16, j_17);
-  i_11 = i_16 + 1;
-  j_12 = j_17 + 1;
-  if (j_12 < _28)
-    goto <bb 8>; [89.00%]
+  <bb 15> [local count: 34689377]:
+  _27 = end_6(D) + -1;
+  _28 = beg_7(D) - end_6(D);
+  _29 = j_9 + _28;
+  _30 = _29 + 1;
+  _31 = MAX_EXPR <c_11(D), _30>;
+
+  <bb 3> [local count: 315357973]:
+  # i_18 = PHI <i_13(8), end_6(D)(15)>
+  # j_19 = PHI <j_14(8), j_9(15)>
+  printf ("a: %d %d\n", i_18, j_19);
+  i_13 = i_18 + -1;
+  j_14 = j_19 + -1;
+  if (j_14 >= _31)
+    goto <bb 8>; [29.37%]
   else
-    goto <bb 17>; [11.00%]
+    goto <bb 17>; [70.63%]

-  <bb 8> [local count: 850510901]:
+  <bb 8> [local count: 280668596]:
   goto <bb 3>; [100.00%]

-  <bb 16> [local count: 105119324]:
-  # i_22 = PHI <beg_5(D)(14), i_29(17)>
-  # j_23 = PHI <beg2_6(D)(14), j_30(17)>
+  <bb 16> [local count: 70429947]:
+  # i_24 = PHI <end_6(D)(14), i_32(17)>
+  # j_25 = PHI <j_9(14), j_33(17)>

   <bb 10> [local count: 955630225]:
-  # i_2 = PHI <i_22(16), i_20(13)>
-  # j_1 = PHI <j_23(16), j_21(13)>
-  i_20 = i_2 + 1;
-  j_21 = j_1 + 1;
-  if (end_8(D) > i_20)
+  # i_3 = PHI <i_24(16), i_22(13)>
+  # j_2 = PHI <j_25(16), j_23(13)>
+  i_22 = i_3 + -1;
+  j_23 = j_2 + -1;
+  if (beg_7(D) < i_22)
     goto <bb 13>; [89.00%]
   else
     goto <bb 9>; [11.00%]

-  <bb 13> [local count: 850510901]:
+  <bb 13> [local count: 569842305]:
   goto <bb 10>; [100.00%]

   <bb 17> [local count: 105119324]:
-  # i_29 = PHI <i_11(3)>
-  # j_30 = PHI <j_12(3)>
-  if (end_8(D) > i_29)
+  # i_32 = PHI <i_13(3)>
+  # j_33 = PHI <j_14(3)>
+  if (beg_7(D) < i_32)
     goto <bb 16>; [80.00%]
   else
     goto <bb 9>; [20.00%]

   <bb 9> [local count: 105119324]:

   <bb 6> [local count: 118111600]:
   return 0;

 }

2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

   <bb 6> [local count: 315357972]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>

	* tree-ssa-loop-split.c (split_loop): Fix incorrect
	profile_count and probability.
	(do_split_loop_on_cond): Likewise.
2021-12-20 21:12:05 -06:00
Xionghu Luo
46bfe1b0e1 Fix incorrect loop exit edge probability [PR103270]
r12-4526 cancelled jump thread paths that rotate loops.  This exposes an
issue in profile-estimate: in predict_extra_loop_exits, the outer loop's
exit edge is marked as an extra loop exit of the inner loop and given an
incorrect prediction, so a hot inner loop eventually becomes a cold loop
through later optimizations.  This patch adds a loop check when searching
for extra exit edges to avoid an unexpected predict_edge from
predict_paths_for_bb.

Regression tested on P8LE.

gcc/ChangeLog:

2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>

	PR middle-end/103270
	* predict.c (predict_extra_loop_exits): Add loop parameter.
	(predict_loops): Call with loop argument.

gcc/testsuite/ChangeLog:

2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>

	PR middle-end/103270
	* gcc.dg/pr103270.c: New test.
2021-12-20 21:10:46 -06:00
Xionghu Luo
460d53f816 rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus
It seems these four UNSPECs could be replaced with native RTL.

For
"(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))":

Quoting David's explanation:

"The design came from the early implementation of Altivec:

https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html

If one later checks for saturation (reads VSCR), one needs a
corresponding SET of the value.  It's set in an architecture-specific
manner that isn't described to GCC, but it's set, not just clobbered
and in an undefined state.

The RTL does not describe that VSCR is set to the value 0.  The
(const_int 0) is not the value set.  You can think of the (const_int
0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
least one argument and the pattern doesn't try to express the
argument, so it uses a dummy RTL constant.  It's part of a PARALLEL
and the plus or minus already expresses the data dependency of the
pattern on the input operands."
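
The RTL codes named in this change (us_plus, ss_plus, us_minus, ss_minus) denote saturating arithmetic. A scalar sketch of those semantics on 8-bit elements follows; the function names are hypothetical, and the VSCR saturation flag discussed above is deliberately not modelled.

```c
#include <stdint.h>

/* Unsigned saturating add: clamp to [0, 255]. */
uint8_t us_plus8(uint8_t a, uint8_t b)
{
  unsigned sum = (unsigned) a + b;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Signed saturating add: clamp to [-128, 127]. */
int8_t ss_plus8(int8_t a, int8_t b)
{
  int sum = a + b;
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* Unsigned saturating subtract: results below zero clamp to 0. */
uint8_t us_minus8(uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}

/* Signed saturating subtract: clamp to [-128, 127]. */
int8_t ss_minus8(int8_t a, int8_t b)
{
  int diff = a - b;
  if (diff > INT8_MAX)
    return INT8_MAX;
  if (diff < INT8_MIN)
    return INT8_MIN;
  return (int8_t) diff;
}
```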

gcc/ChangeLog:

2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>

	* config/rs6000/altivec.md (altivec_vaddu<VI_char>s): Replace
	UNSPEC_VADDU with us_plus.
	(altivec_vadds<VI_char>s): Replace UNSPEC_VADDS with ss_plus.
	(altivec_vsubu<VI_char>s): Replace UNSPEC_VSUBU with us_minus.
	(altivec_vsubs<VI_char>s): Replace UNSPEC_VSUBS with ss_minus.
	(altivec_abss_<mode>): Likewise.
2021-12-20 21:02:50 -06:00
GCC Administrator
7631a4d1de Daily bump. 2021-12-21 00:16:24 +00:00
Joseph Myers
bb42d680d5 Update cpplib es.po
* es.po: Update.
2021-12-20 23:09:37 +00:00
Uros Bizjak
72c68d7ad9 i386: Fix <sse2p4_1>_pinsr<ssemodesuffix> and its splitters [PR103772]
The clever trick to duplicate the value of the input operand into itself
proved not so clever after all.  The splitter should not clobber the input
operand in any case, since the register can hold the value outside the HImode
lowpart when accessed as subreg.  Use the standard earlyclobber approach
instead.

The testcase fails with the avx2 ISA, but I was not able to create a testcase
that doesn't require the -mavx512fp16 compile flag.

2021-12-20  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/103772
	* config/i386/sse.md (<sse2p4_1>_pinsr<ssemodesuffix>): Add
	earlyclobber to (x,x,x,i) alternative.
	(<sse2p4_1>_pinsr<ssemodesuffix> peephole2): Remove.
	(<sse2p4_1>_pinsr<ssemodesuffix> splitter): Use output
	operand as a temporary register.  Split after reload_completed.
2021-12-20 21:16:39 +01:00
Patrick Palka
ab85331c58 c++: memfn lookup consistency in incomplete-class ctx
When instantiating a call to a member function of a class template, we
repeat the member function lookup in order to obtain the corresponding
partially instantiated functions.  Within an incomplete-class context
however, we need to be more careful when repeating the lookup because we
don't want to introduce later-declared member functions that weren't
visible at template definition time.  We're currently not careful enough
in this respect, which causes us to reject memfn1.C below.

This patch fixes this issue by making tsubst_baselink filter out from
the instantiation-time lookup those member functions that were invisible
at template definition time.  This is really only necessary within an
incomplete-class context, so this patch adds a heuristic flag to BASELINK
to help us avoid needlessly performing this filtering step (which would
be a no-op) in complete-class contexts.

This is also necessary for the ahead-of-time overload set pruning
implemented in r12-6075 to be effective for member functions within
class templates.

gcc/cp/ChangeLog:

	* call.c (build_new_method_call): Set
	BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P on the pruned baselink.
	* cp-tree.h (BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P): Define.
	* pt.c (filter_memfn_lookup): New subroutine of tsubst_baselink.
	(tsubst_baselink): Use filter_memfn_lookup on the new lookup
	result when BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P is set on the
	old baselink.  Remove redundant BASELINK_P check.
	* search.c (build_baselink): Set
	BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P appropriately.

gcc/testsuite/ChangeLog:

	* g++.dg/lookup/memfn1.C: New test.
	* g++.dg/template/non-dependent16b.C: New test.
2021-12-20 15:02:40 -05:00
Iain Buclaw
b3f58f87d7 d: Merge upstream dmd ad8412530, druntime fd9a4544, phobos 495e835c2.
D front-end changes:

    - Import dmd v2.098.1
    - Remove calling of _d_delstruct from code generator.

Druntime changes:

    - Import druntime v2.098.1

Phobos changes:

    - Import phobos v2.098.1

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd ad8412530.
	* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove code generation
	of _d_delstruct.
	* runtime.def (DELSTRUCT): Remove.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime fd9a4544.
	* src/MERGE: Merge upstream phobos 495e835c2.
2021-12-20 19:29:43 +01:00
Olivier Hainque
7d5d5032c7 Fix static array size in gcc.dg/vect/vect-simd-20.c
10000 / 78 is strictly greater than 128 so we will
actually do 128+1 strides in foo() for s == 78 and p[]
needs to be dimensioned accordingly.
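
The arithmetic can be checked with a small counting loop. The loop shape below (start at 0, step s, stop before n) is an assumption about foo() made for illustration; `count_strides` is a hypothetical helper.

```c
/* Count how many iterations a strided loop over [0, n) performs. */
int count_strides(int n, int s)
{
  int count = 0;

  for (int i = 0; i < n; i += s)
    count++;
  return count;
}
```

Since 78 * 128 = 9984 is still below 10000, the index 9984 is visited as well, so `count_strides (10000, 78)` is 129 rather than 128.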

2021-12-20  Olivier Hainque  <hainque@adacore.com>

gcc/testsuite/
	* gcc.dg/vect/vect-simd-20.c: Fix size of p[]
	to accommodate the number of strides performed
	by foo() for s == 78.
2021-12-20 16:41:09 +00:00
Roger Sayle
c9c466ea33 x86_64: Improve code expanded for highpart multiplications.
While working on a middle-end patch to more aggressively use highpart
multiplications on targets that support them, I noticed that the RTL
expanded by the x86 backend interacts poorly with register allocation
leading to suboptimal code.

For the testcase,
typedef int __attribute ((mode(TI))) ti_t;
long foo(long x)
{
  return ((ti_t)x * 19065) >> 64;
}

we'd like to avoid:
foo:	movq    %rdi, %rax
        movl    $19065, %edx
        imulq   %rdx
        movq    %rdx, %rax
        ret

and would prefer:
foo:	movl    $19065, %eax
        imulq   %rdi
        movq    %rdx, %rax
        ret

This patch provides a pair of peephole2 transformations to tweak the
spills generated by reload, and at the same time replaces the current
define_expand with a define_insn pattern using the new [su]mul_highpart
RTX codes.
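
For reference, the operation that the [su]mul_highpart codes describe can be
sketched in portable-ish C as follows.  This is only an illustration: the
helper names smul_highpart and umul_highpart are hypothetical, and the code
assumes a compiler that provides the __int128 extension and performs an
arithmetic right shift on negative signed values (as GCC does).

```c
#include <stdint.h>

/* High 64 bits of the signed 128-bit product x * y.
   Assumes __int128 support and arithmetic right shift of negative
   values, both of which are GCC behaviour rather than ISO C.  */
static int64_t smul_highpart(int64_t x, int64_t y)
{
    return (int64_t)(((__int128)x * y) >> 64);
}

/* High 64 bits of the unsigned 128-bit product x * y.  */
static uint64_t umul_highpart(uint64_t x, uint64_t y)
{
    return (uint64_t)(((unsigned __int128)x * y) >> 64);
}
```

With these helpers, the testcase's foo(x) corresponds to
smul_highpart(x, 19065) on a target where long is 64 bits wide.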

2021-12-20  Roger Sayle  <roger@nextmovesoftware.com>
	    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
	* config/i386/i386.md (any_mul_highpart): New code iterator.
	(sgnprefix, s): Add attribute support for [su]mul_highpart.
	(<s>mul<mode>3_highpart): Delete expander.
	(<s>mul<mode>3_highpart, <s>mulsi32_highpart_zext):
	New define_insn patterns.
	(define_peephole2): Tweak the register allocation for the above
	instructions after reload.

gcc/testsuite/ChangeLog
	* gcc.target/i386/smuldi3_highpart.c: New test case.
2021-12-20 15:22:18 +00:00
Joel Sherrill
1f56dbe2da Obsolete m32c-rtems target
2021-12-20  Joel Sherrill <joel@rtems.org>

gcc/
	* config.gcc: Obsolete m32c-*-rtems* target.
2021-12-20 08:49:18 -06:00
Patrick Palka
2decd2cabe c++: ahead-of-time overload set pruning for non-dep calls
This patch makes us remember the function selected by overload resolution
during ahead of time processing of a non-dependent call expression, so
that at instantiation time we avoid repeating some of the work of overload
resolution for the call.  Note that we already do this for non-dependent
operator expressions via build_min_non_dep_op_overload.

Some caveats:

 * When processing ahead of time a non-dependent call to a member
   function template of a currently open class template (as in
   g++.dg/template/deduce4.C), we end up generating an "inside-out"
   partial instantiation such as S<T>::foo<int, int>(), the likes of
   which we're apparently not prepared to fully instantiate.  So in this
   situation, we prune to the selected template instead of the
   specialization.

 * This change triggered a latent FUNCTION_DECL pretty printing issue
   in cpp0x/error2.C -- since we now resolve the call to foo<0> ahead
   of time, the error now looks like:

     error: expansion pattern ‘foo()()=0’ contains no parameter pack

   where the FUNCTION_DECL for foo<0> is clearly misprinted.  But this
   pretty-printing issue could be reproduced without this patch if
   we define foo as a non-template function.  Since this testcase was
   added to verify pretty printing of TEMPLATE_ID_EXPR, I work around
   this test failure by making the call to foo type-dependent and thus
   immune to this ahead of time pruning.

 * We now reject parts of cpp0x/fntmp-equiv1.C because we notice that
   the non-dependent call d(f, b) in

     int d(int, int);
     template <unsigned long f, unsigned b, typename> e<d(f, b)> d();

   is non-constexpr.  Since this testcase is about equivalency of
   dependent names in the context of declaration matching, it seems the
   best fix here is to make the calls to d, d2 and d3 within the
   function signatures dependent.

gcc/cp/ChangeLog:

	* call.c (build_new_method_call): For a non-dependent call
	expression inside a template, return a templated tree
	whose overload set contains just the selected function.
	* semantics.c (finish_call_expr): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/error2.C: Make the call to foo type-dependent in
	order to avoid latent pretty-printing issue for FUNCTION_DECL
	inside MODOP_EXPR.
	* g++.dg/cpp0x/fntmp-equiv1.C: Make the calls to d, d2 and d3
	within the function signatures dependent.
	* g++.dg/template/non-dependent16.C: New test.
	* g++.dg/template/non-dependent16a.C: New test.
	* g++.dg/template/non-dependent17.C: New test.
2021-12-20 09:28:20 -05:00
Martin Liska
7424323bd5 jit: Fix -Wodr warning
gcc/jit/libgccjit.c:3957:8: warning: type 'struct version_info' violates the C++ One Definition Rule [-Wodr]

../../gcc/jit/libgccjit.c:3957:8: warning: type 'struct version_info' violates the C++ One Definition Rule [-Wodr]
 3957 | struct version_info

../../gcc/tree-ssa-loop-ivopts.c:181: note: a different type is defined in another translation unit
  181 | struct version_info

gcc/jit/ChangeLog:

	* libgccjit.c (struct version_info): Rename to jit_version_info.
	(struct jit_version_info): Likewise.
	(gcc_jit_version_major): Likewise.
	(gcc_jit_version_minor): Likewise.
	(gcc_jit_version_patchlevel): Likewise.
2021-12-20 12:35:28 +01:00
Martin Liska
8d081c0093 opts: Support -Oz in -Ox option hints.
gcc/ChangeLog:

	* opts.c (default_options_optimization): Support -Oz in -Ox option hints.
2021-12-20 12:35:24 +01:00
Jan Hubicka
8d1e342b4a Fix handling of deferred SSA names in modref dataflow
In the testcase we fail to analyze an SSA name because the do_dataflow flag
is set, which triggers an early exit in analyze_ssa_name.  Fixed by disabling
early exits when handling deferred names.

gcc/ChangeLog:

2021-12-20  Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/103669
	* ipa-modref.c (modref_eaf_analysis::analyze_ssa_name): Add deferred
	parameter.
	(modref_eaf_analysis::propagate): Use it.

gcc/testsuite/ChangeLog:

2021-12-20  Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/103669
	* g++.dg/torture/pr103669.C: New test.
2021-12-20 08:43:13 +01:00
liuhongt
19dcecd963 Optimize bit_and op1 float_vector_all_ones_operands to op1.
gcc/ChangeLog:

	PR target/98468
	* config/i386/sse.md (*bit_and_float_vector_all_ones): New
	pre-reload splitter.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr98468.c: New test.
2021-12-20 09:53:06 +08:00
GCC Administrator
29309f6e29 Daily bump. 2021-12-20 00:16:21 +00:00