Commit graph

204994 commits

Andrew Pinski
a5e69e9459 MATCH: Simplify (X &| Y) CMP X if possible [PR 101590]
I noticed we were missing these simplifications so let's add them.

This adds the following simplifications:
U & N <= U  -> true
U & N >  U  -> false
when U is known to be non-negative.

When N is also known to be non-negative, these also hold:
U | N <  U  -> false
U | N >= U  -> true

When N is a negative integer, the result flips and we get:
U | N <  U  -> true
U | N >= U  -> false

We could extend this later to cover the case where N is nonconstant
but known to be negative.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/101590
	PR tree-optimization/94884

gcc/ChangeLog:

	* match.pd (`(X BIT_OP Y) CMP X`): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bitcmp-1.c: New test.
	* gcc.dg/tree-ssa/bitcmp-2.c: New test.
	* gcc.dg/tree-ssa/bitcmp-3.c: New test.
	* gcc.dg/tree-ssa/bitcmp-4.c: New test.
	* gcc.dg/tree-ssa/bitcmp-5.c: New test.
	* gcc.dg/tree-ssa/bitcmp-6.c: New test.
2023-10-27 00:49:30 -07:00
liuhongt
7eed861e8c Support vec_cmpmn/vcondmn for v2hf/v4hf.
gcc/ChangeLog:

	PR target/103861
	* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
	V2HF/V2BF/V4HF/V4BFmode.
	* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
	data_mode is V4HF/V2HFmode.
	* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
	(vcond_mask_<mode>v4hi): Ditto.
	(vcond_mask_<mode>qi): Ditto.
	(vec_cmpv2hfqi): Ditto.
	(vcond_mask_<mode>v2hi): Ditto.
	(mmx_plendvb_<mode>): Add 2 combine splitters after the
	patterns.
	(mmx_pblendvb_v8qi): Ditto.
	(<code>v2hi3): Add a combine splitter after the pattern.
	(<code><mode>3): Ditto.
	(<code>v8qi3): Ditto.
	(<code><mode>3): Ditto.
	* config/i386/sse.md (vcond<mode><mode>): Merge this with ..
	(vcond<sseintvecmodelower><mode>): .. this into ..
	(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this,
	and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

	* g++.target/i386/part-vect-vcondhf.C: New test.
	* gcc.target/i386/part-vect-vec_cmphf.c: New test.
2023-10-27 10:55:50 +08:00
GCC Administrator
ecca503bf4 Daily bump. 2023-10-27 00:17:12 +00:00
Juzhe-Zhong
446efa52a8 RISC-V: Move lmul calculation into macro
We calculate LMUL according to --param=riscv-autovec-lmul in multiple
places: int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;

Create a new macro for it to ease maintenance.

gcc/ChangeLog:

	* config/riscv/riscv-opts.h (TARGET_MAX_LMUL): New macro.
	* config/riscv/riscv-v.cc (preferred_simd_mode): Adapt macro.
	(autovectorize_vector_modes): Ditto.
	(can_find_related_mode_p): Ditto.
2023-10-27 07:03:32 +08:00
Juzhe-Zhong
e37bc2cf00 RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization,
which has been a known issue for a long time; I finally found the time to address it.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a,
     int *__restrict b,
     int n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling happens because we are missing LEN information in the simple PLUS_EXPR GIMPLE assignment:

vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

GCC applies partially predicated loads/stores and un-predicated full vector operations for partial vectorization.
This flow is used by all other targets such as ARM SVE (RVV also uses it):

ARM SVE:

.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store

Such a vectorization flow causes AVL/VL toggling on RVV, so we need an AVL propagation pass for it.

Also, it's very unlikely that we can apply predicated operations to all
vectorization, for the following reasons:

1. Supporting them for all vectorization is a heavy workload, and we don't
   see any benefit over handling it in the target backend.
2. Changing the loop vectorizer for it would make the code base ugly and
   hard to maintain.
3. We would need patterns for every operation: not only COND_LEN_ADD,
   COND_LEN_SUB, ..., but also COND_LEN_EXTEND, ..., COND_LEN_CEIL, ...
   Over 100 patterns, an unreasonable number.

To conclude, we prefer un-predicated operations here, and design a clean
AVL propagation pass that elides the redundant vsetvls caused by AVL/VL
toggling.

The second question is why we add a separate pass called AVL propagation
rather than optimizing AVL in the VSETVL pass (we definitely could
optimize it there).

Frankly, I was planning to address this issue in the VSETVL pass; that's
why we recently refactored it. However, I changed my mind after several
experiments and tries.

The reasons are as follows:

1. Code base management and maintainability. The current VSETVL pass is
   complicated enough and already has aggressive, fancy optimizations that
   generate optimal codegen in most cases. It's not a good idea to keep
   adding features until the VSETVL pass becomes heavyweight again and
   needs another refactoring. Actually, the VSETVL pass is very stable and
   optimal after the recent refactoring; we should not change it further
   except for minor fixes.

2. vsetvl insertion (what the VSETVL pass does) and AVL propagation are
   two different things; I don't think we should fuse them into the same
   pass.

3. The VSETVL pass is a post-RA pass, whereas AVL propagation should be
   done before RA, where it can reduce register pressure.

4. This patch's AVL propagation pass only does AVL propagation for RVV
   partial auto-vectorization situations. It is only a few hundred lines,
   which is very manageable, and more AVL propagation can easily be added
   in this clean, separate pass in the future. (Doing it in the VSETVL
   pass would further complicate a pass that is already complicated.)

Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];

      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}

1. Loop Body:

Before this patch:                                          After this patch:

	      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli	a4,t1,e32,m1,ta,ma
        vle32.v v2,0(a2)                                     vle32.v	v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v	v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v	v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v	v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv	v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv	v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv	v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv	v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv	v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv	v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv	v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v	v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv	v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv	v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli	a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv	v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v	v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v	v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v	v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v	v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv	v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv	v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v	v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv	v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v	v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v	v1,0(a6)
        vse32.v v2,0(a0)
        vse32.v v3,0(t3)
        vle32.v v2,0(t0)
        vsetvli a7,zero,e32,m1,ta,ma
        vadd.vv v3,v3,v1
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v3,0(t4)
        vsetvli a7,zero,e32,m1,ta,ma
        slli    a7,a4,2
        vadd.vv v1,v1,v2
        sub     t1,t1,a4
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v1,0(a6)

It's quite obvious: all heavy and redundant vsetvls inside the loop body are eliminated.

2. Epilogue:
    Before this patch:                                          After this patch:

     .L5:                                                      .L5:
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16
        jr      ra

This is the benefit of doing AVL propagation before RA: we eliminate the use of the 'a7' register,
which was used by the redundant AVL/VL toggling instruction 'vsetvli a7,zero,e32,m1,ta,ma'.

The final codegen after this patch:

foo2:
	lw	t1,56(sp)
	ld	t6,0(sp)
	ld	t3,8(sp)
	ld	t0,16(sp)
	ld	t2,24(sp)
	ld	t4,32(sp)
	ld	t5,40(sp)
	ble	t1,zero,.L5
.L3:
	vsetvli	a4,t1,e32,m1,ta,ma
	vle32.v	v2,0(a2)
	vle32.v	v3,0(t2)
	vle32.v	v4,0(a1)
	vle32.v	v1,0(t0)
	vadd.vv	v4,v2,v4
	vadd.vv	v1,v3,v1
	vadd.vv	v1,v1,v4
	vadd.vv	v1,v1,v4
	vadd.vv	v1,v1,v4
	vadd.vv	v1,v1,v2
	vadd.vv	v2,v1,v2
	vse32.v	v2,0(t5)
	vadd.vv	v2,v2,v1
	vadd.vv	v2,v2,v1
	slli	a7,a4,2
	vadd.vv	v3,v1,v3
	vle32.v	v5,0(a5)
	vle32.v	v6,0(t6)
	vse32.v	v3,0(t3)
	vse32.v	v2,0(a0)
	vadd.vv	v3,v3,v1
	vadd.vv	v2,v1,v5
	vse32.v	v3,0(t4)
	vadd.vv	v1,v1,v6
	vse32.v	v2,0(a3)
	vse32.v	v1,0(a6)
	sub	t1,t1,a4
	add	a1,a1,a7
	add	a2,a2,a7
	add	a5,a5,a7
	add	t6,t6,a7
	add	t0,t0,a7
	add	t2,t2,a7
	add	t5,t5,a7
	add	a3,a3,a7
	add	a6,a6,a7
	add	t3,t3,a7
	add	t4,t4,a7
	add	a0,a0,a7
	bne	t1,zero,.L3
.L5:
	ret

	PR target/111318
	PR target/111888

gcc/ChangeLog:

	* config.gcc: Add AVL propagation pass.
	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
	* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
	* config/riscv/t-riscv: Ditto.
	* config/riscv/riscv-avlprop.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
	* gcc.target/riscv/rvv/autovec/pr111888.c: New test.
Tested-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-27 07:01:55 +08:00
Jonathan Wakely
0c305f3dec libstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089]
The incorrect errc constant here looks like a copy&paste error.

libstdc++-v3/ChangeLog:

	PR libstdc++/112089
	* include/std/shared_mutex (shared_lock::unlock): Change errc
	constant to operation_not_permitted.
	* testsuite/30_threads/shared_lock/locking/112089.cc: New test.
2023-10-26 21:10:47 +01:00
Jonathan Wakely
7d06b29f81 libstdc++: Add dg-timeout-factor to <chrono> IO tests
This avoids failures due to compilation timeouts when testing with a low
tool_timeout value.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/duration/io.cc: Double timeout using
	dg-timeout-factor.
	* testsuite/std/time/day/io.cc: Likewise.
	* testsuite/std/time/format.cc: Likewise.
	* testsuite/std/time/hh_mm_ss/io.cc: Likewise.
	* testsuite/std/time/month/io.cc: Likewise.
	* testsuite/std/time/month_day/io.cc: Likewise.
	* testsuite/std/time/month_day_last/io.cc: Likewise.
	* testsuite/std/time/month_weekday/io.cc: Likewise.
	* testsuite/std/time/month_weekday_last/io.cc: Likewise.
	* testsuite/std/time/weekday/io.cc: Likewise.
	* testsuite/std/time/weekday_indexed/io.cc: Likewise.
	* testsuite/std/time/weekday_last/io.cc: Likewise.
	* testsuite/std/time/year/io.cc: Likewise.
	* testsuite/std/time/year_month/io.cc: Likewise.
	* testsuite/std/time/year_month_day/io.cc: Likewise.
	* testsuite/std/time/year_month_day_last/io.cc: Likewise.
	* testsuite/std/time/year_month_weekday/io.cc: Likewise.
	* testsuite/std/time/year_month_weekday_last/io.cc: Likewise.
	* testsuite/std/time/zoned_time/io.cc: Likewise.
2023-10-26 21:10:47 +01:00
David Malcolm
cd7dadcd27 Add attribute((null_terminated_string_arg(PARAM_IDX)))
This patch adds a new function attribute to GCC for marking that an
argument is expected to be a null-terminated string.

For example, consider:

  void test_a (const char *p)
    __attribute__((null_terminated_string_arg (1)));

which would indicate to humans and compilers that argument 1 of "test_a"
is expected to be a null-terminated string, with the idea:

- we should complain if it's not valid to read from *p up to the first
  '\0' character in the buffer

- we should complain if *p is not terminated, or if it's uninitialized
  before the first '\0' character

This is independent of the nonnull-ness of the pointer: if you also want
to express that the argument must be non-null, we already have
__attribute__((nonnull (N))), so the user can write e.g.:

  void test_b (const char *p)
    __attribute__((null_terminated_string_arg (1))
    __attribute__((nonnull (1)));

which can also be spelled as:

  void test_b (const char *p)
     __attribute__((null_terminated_string_arg (1),
                    nonnull (1)));

For a function similar to strncpy, we can use the "access" attribute to
express a maximum size of the read:

  void test_c (const char *p, size_t sz)
     __attribute__((null_terminated_string_arg (1),
                    nonnull (1),
                    access (read_only, 1, 2)));

The patch implements:
(a) C/C++ frontends: recognition of this attribute
(b) analyzer: usage of this attribute

gcc/analyzer/ChangeLog:
	* region-model.cc
	(region_model::check_external_function_for_access_attr): Split
	out, replacing with...
	(region_model::check_function_attr_access): ...this new function
	and...
	(region_model::check_function_attrs): ...this new function.
	(region_model::check_one_function_attr_null_terminated_string_arg):
	New.
	(region_model::check_function_attr_null_terminated_string_arg):
	New.
	(region_model::handle_unrecognized_call): Update for renaming of
	check_external_function_for_access_attr to check_function_attrs.
	(region_model::check_for_null_terminated_string_arg): Add return
	value to one overload.  Make both overloads const.
	* region-model.h: Include "stringpool.h" and "attribs.h".
	(region_model::check_for_null_terminated_string_arg): Add return
	value to one overload.  Make both overloads const.
	(region_model::check_external_function_for_access_attr): Delete
	decl.
	(region_model::check_function_attr_access): New decl.
	(region_model::check_function_attr_null_terminated_string_arg):
	New decl.
	(region_model::check_one_function_attr_null_terminated_string_arg):
	New decl.
	(region_model::check_function_attrs): New decl.

gcc/c-family/ChangeLog:
	* c-attribs.cc (c_common_attribute_table): Add
	"null_terminated_string_arg".
	(handle_null_terminated_string_arg_attribute): New.

gcc/ChangeLog:
	* doc/extend.texi (Common Function Attributes): Add
	null_terminated_string_arg.

gcc/testsuite/ChangeLog:
	* c-c++-common/analyzer/attr-null_terminated_string_arg-access-read_write.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-access-without-size.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-multiple.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-2.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-sized.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable-sized.c:
	New test.
	* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable.c:
	New test.
	* c-c++-common/attr-null_terminated_string_arg.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-10-26 15:57:40 -04:00
Iain Sandoe
46f51bd73b testsuite, aarch64: Normalise options to aarch64.exp.
When the compiler is configured --with-cpu= and that differs from the
assumed baselines, we see excess test failures (primarily in body code
scans, which are necessarily sensitive to costs).  To stabilize the
testsuite against such changes, use aarch64-with-arch-dg-options ()
to provide suitable, consistent defaults.

e.g. for --with-cpu=xgene1 we see over 100 excess fails which are
removed by this change.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/aarch64.exp: Use aarch64-with-arch-dg-options
	to normalize the options to the tests in aarch64.exp.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-26 20:41:22 +01:00
Iain Sandoe
8a1fcca720 testsuite, Darwin: Adjust target test for modern OS.
The same conditions on use of DYLD_LIBRARY_PATH apply to OS versions
11 to 14, so make the test general.

gcc/testsuite/ChangeLog:

	* lib/target-libpath.exp: Skip DYLD_LIBRARY_PATH for all
	current OS versions > 10.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-26 20:23:46 +01:00
Andrew Pinski
662655e22d match: Simplify a != C1 ? abs(a) : C2 when C2 == abs(C1) [PR111957]
This adds a match pattern for `a != C1 ? abs(a) : C2`, which gets simplified
to `abs(a)`.  If C1 was originally *_MIN, then change it over to use absu
instead of abs.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/111957

gcc/ChangeLog:

	* match.pd (`a != C1 ? abs(a) : C2`): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phi-opt-40.c: New test.
2023-10-26 18:58:40 +00:00
Paul-Antoine Arras
abd78dc610 Add effective target to OpenMP tests
This adds an effective target DejaGnu directive to prevent these testcases from
failing on GCC configurations that do not support OpenMP.
This fixes 8d2130a4e5.

gcc/testsuite/ChangeLog:

	* gfortran.dg/c_ptr_tests_20.f90: Add "fopenmp" effective target.
	* gfortran.dg/c_ptr_tests_21.f90: Add "fopenmp" effective target.
2023-10-26 18:58:54 +02:00
Aldy Hernandez
3c8abcedaa [range-op] Remove unused variable in fold_range.
gcc/ChangeLog:

	* range-op-float.cc (range_operator::fold_range): Delete unused
	variable.
2023-10-26 12:56:20 -04:00
Aldy Hernandez
848b5f3ab7 [range-ops] Remove unneeded parameters from rv_fold.
Now that the floating point version of rv_fold calculates its result
in an frange, we can remove the superfluous LB, UB, and MAYBE_NAN
arguments.

gcc/ChangeLog:

	* range-op-float.cc (range_operator::fold_range): Remove
	superfluous code.
	(range_operator::rv_fold): Remove unneeded arguments.
	(operator_plus::rv_fold): Same.
	(operator_minus::rv_fold): Same.
	(operator_mult::rv_fold): Same.
	(operator_div::rv_fold): Same.
	* range-op-mixed.h: Remove lb, ub, and maybe_nan arguments from
	rv_fold methods.
	* range-op.h: Same.
2023-10-26 12:52:15 -04:00
Aldy Hernandez
24e97ac46c [range-ops] Add frange& argument to rv_fold.
The floating point version of rv_fold returns its result in 3 pieces:
the lower bound, the upper bound, and a maybe_nan bit.  It is cleaner
to return everything in an frange, thus bringing the floating point
version of rv_fold in line with the integer version.

This first patch adds an frange argument, while keeping the current
functionality, and asserting that we get the same results.  In a
follow-up patch I will nuke the now useless 3 arguments.  Splitting
this into two patches makes it easier to bisect any problems if any
should arise.

gcc/ChangeLog:

	* range-op-float.cc (range_operator::fold_range): Pass frange
	argument to rv_fold.
	(range_operator::rv_fold): Add frange argument.
	(operator_plus::rv_fold): Same.
	(operator_minus::rv_fold): Same.
	(operator_mult::rv_fold): Same.
	(operator_div::rv_fold): Same.
	* range-op-mixed.h: Add frange argument to rv_fold methods.
	* range-op.h: Same.
2023-10-26 12:52:14 -04:00
Patrick O'Neill
4d49685d67 RISC-V: Pass abi to g++ rvv testsuite
On rv32gcv, testcases like g++.target/riscv/rvv/base/bug-22.C fail with:
FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors)
Excess errors:
cc1plus: error: ABI requires '-march=rv32'

This patch adds the -mabi argument to g++ rvv tests.

gcc/testsuite/ChangeLog:

	* g++.target/riscv/rvv/rvv.exp: Add -mabi argument to CFLAGS.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-26 09:20:25 -07:00
Thomas Schwinge
d8ff4b96b4 libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]
Similar to commit fb5d27be27
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.

	PR testsuite/109951
	libatomic/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libatomic.exp (libatomic_init): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
	* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
	set.
	(SYSROOT_CFLAGS_FOR_TARGET): Set.
2023-10-26 18:04:11 +02:00
Thomas Schwinge
967d4171b2 libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]
Similar to commit fb5d27be27
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.

	PR testsuite/109951
	libffi/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	<local.exp>: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
	set 'SYSROOT_CFLAGS_FOR_TARGET'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* include/Makefile.in: Likewise.
	* man/Makefile.in: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libffi.exp (libffi_target_compile): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
2023-10-26 18:03:07 +02:00
Richard Sandiford
8a0fceee46 testsuite: Allow general skips/requires in PCH tests
dg-pch.exp handled dg-require-effective-target pch_supported_debug
as a special case, by grepping the source code.  This patch tries
to generalise it to other dg-require-effective-targets, and to
dg-skip-if.

There also seemed to be some errors in check-flags.  It used:

    lappend $args [list <elt>]

which treats the contents of args as a variable name.  I think
it was supposed to be "lappend args" instead.  From the later
code, the element was supposed to be <elt> itself, rather than
a singleton list containing <elt>.

We can also save some time by doing the common early-exit first.

Doing this removes the need to specify the dg-require-effective-target
in both files.  Tested by faking unsupported debug and checking that
the tests were still correctly skipped.

gcc/testsuite/
	* lib/target-supports-dg.exp (check-flags): Move default argument
	handling further up.  Fix a couple of issues in the lappends.
	Avoid frobbing the compiler flags if the return value is already
	known to be 1.
	* lib/dg-pch.exp (dg-flags-pch): Process the dg-skip-if and
	dg-require-effective-target directives to see whether the
	assembly test should be skipped.
	* gcc.dg/pch/valid-1.c: Remove dg-require-effective-target.
	* gcc.dg/pch/valid-1b.c: Likewise.
2023-10-26 16:35:47 +01:00
Richard Ball
7006e5d2d7 arm: Use deltas for Arm switch tables
For normal optimization for the Arm state in gcc we get an uncompressed
table of jump targets. This is in the middle of the text segment
far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.

Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT, new assembly generation options have been
added to utilise the different machine modes. Additional changes were made
to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER, all to
accommodate this new approach to switch statement generation.

New tests have been added and no regressions on arm-none-eabi.

gcc/ChangeLog:

	* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
	for different machine modes for arm.
	* config/arm/arm-protos.h (arm_output_casesi): New prototype.
	* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
	ASM_OUTPUT_ADDR_DIFF_ELT.
	(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
	TARGET_ARM.
	(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
	for TARGET_ARM.
	* config/arm/arm.cc (arm_output_casesi): New function.
	* config/arm/arm.md (arm_casesi_internal): Change casesi expand
	and insn for arm to use new function arm_output_casesi.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/arm-switchstatement.c: New test.
2023-10-26 16:18:50 +01:00
Iain Sandoe
2ae00adb32 Darwin: Make metadata symbol labels linker-visible for GNU objc.
Now that we have shifted to using the same relocation mechanism as clang
for Objective-C type info, the static linker needs a linker-visible
symbol for metadata names (this is only needed for GNU Objective-C; for
NeXT, the names are in separate sections).

gcc/ChangeLog:

	* config/darwin.h
	(darwin_label_is_anonymous_local_objc_name): Make metadata names
	linker-visible for GNU objective C.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-26 15:59:34 +01:00
Vladimir N. Makarov
f55cdce3f8 [RA]: Modify cost calculation for dealing with equivalences
RISCV target developers reported that pseudos with equivalence used in
a loop can be spilled.  Simple changes of heuristics of cost
calculation of pseudos with equivalence or even ignoring equivalences
resulted in numerous testsuite failures on different targets or worse
spec2017 performance.  This patch implements more sophisticated cost
calculations of pseudos with equivalences.  The patch does not change
RA behaviour for targets still using the old reload pass instead of
LRA.  The patch solves the reported problem and improves x86-64
specint2017 a bit (specfp2017 performance stays the same).  The patch
takes into account how the equivalence will be used: will it be
integrated into the user insns or require an input reload insn.  It
requires an additional pass over insns.  To compensate for the RA slowdown,
the patch removes a pass over insns in the reload pass that IRA used before.
This also decouples IRA from reload more and will help to remove the
reload pass in the future if it ever happens.

gcc/ChangeLog:

	* dwarf2out.cc (reg_loc_descriptor): Use lra_eliminate_regs when
	LRA is used.
	* ira-costs.cc: Include regset.h.
	(equiv_can_be_consumed_p, get_equiv_regno, calculate_equiv_gains):
	New functions.
	(find_costs_and_classes): Call calculate_equiv_gains and redefine
	mem_cost of pseudos with equivs when LRA is used.
	* var-tracking.cc: Include ira.h and lra.h.
	(vt_initialize): Use lra_eliminate_regs when LRA is used.
2023-10-26 09:52:14 -04:00
Paul-Antoine Arras
8d2130a4e5 Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)
In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such a situation from turning into a compile error.

2023-10-20  Paul-Antoine Arras  <pa@codesourcery.com>
	    Tobias Burnus  <tobias@codesourcery.com>

gcc/fortran/ChangeLog:

	* interface.cc (gfc_compare_types): Return true if one type is C_PTR
	and the other is a compatible INTEGER(8).
	* misc.cc (gfc_typename): Handle the case where an INTEGER(8) actually
	holds a TYPE(C_PTR).

gcc/testsuite/ChangeLog:

	* gfortran.dg/c_ptr_tests_20.f90: New test, checking that INTEGER(8)
	and TYPE(C_PTR) are recognised as compatible.
	* gfortran.dg/c_ptr_tests_21.f90: New test, exercising the error
	detection for C_FUNPTR.
2023-10-26 15:12:37 +02:00
Juzhe-Zhong
a4ca869133 DOC: Update COND_LEN document
gcc/ChangeLog:

	* doc/md.texi: Adapt COND_LEN pseudo code.
2023-10-26 18:01:10 +08:00
Roger Sayle
d1bb9569d7 PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
2023-10-26 10:06:59 +01:00
liuhongt
2f592b7b55 Pass type of comparison operands instead of comparison result to truth_type_for in build_vec_cmp.
gcc/c/ChangeLog:

	* c-typeck.cc (build_vec_cmp): Pass type of arg0 to
	truth_type_for.

gcc/cp/ChangeLog:

	* typeck.cc (build_vec_cmp): Pass type of arg0 to
	truth_type_for.
2023-10-26 16:36:06 +08:00
Jiahao Xu
60c11c9a23 LoongArch:Enable vcond_mask_mn expanders for SF/DF modes.
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.

gcc/ChangeLog:

	* config/loongarch/lasx.md (vcond_mask_<ILASX:mode><ILASX:mode>): Change to
	(vcond_mask_<mode><mode256_i>): this.
	* config/loongarch/lsx.md (vcond_mask_<ILSX:mode><ILSX:mode>): Change to
	(vcond_mask_<mode><mode_i>): this.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: New test.
2023-10-26 15:03:58 +08:00
Stefan Schulze Frielinghaus
88df58b7ee testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c
Currently _BitInt is only supported on x86_64, which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target
  237 | _BitInt(32) b32_v;
      | ^~~~~~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

gcc/testsuite/ChangeLog:

	* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
	* gcc.misc-tests/godump-2.c: New test.
2023-10-26 08:41:24 +02:00
Thomas Schwinge
3dfe7e2d55 More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.
Per commit a8b522b483 (Subversion r251048)
"Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or
may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent
of '#ifdef ASM_OUTPUT_DEF', and in particular depending not just on static
but also on dynamic (run-time) configuration.  A few instances still
remained where we assumed that 'TARGET_SUPPORTS_ALIASES' follows from
'#ifdef ASM_OUTPUT_DEF'.  Change these to 'if (TARGET_SUPPORTS_ALIASES)',
similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.

	gcc/
	* ipa-icf.cc (sem_item::target_supports_symbol_aliases_p):
	'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);' before
	'return true;'.
	* ipa-visibility.cc (function_and_variable_visibility): Change
	'#ifdef ASM_OUTPUT_DEF' to 'if (TARGET_SUPPORTS_ALIASES)'.
	* varasm.cc (output_constant_pool_contents)
	[#ifdef ASM_OUTPUT_DEF]:
	'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
	(do_assemble_alias) [#ifdef ASM_OUTPUT_DEF]:
	'if (!TARGET_SUPPORTS_ALIASES)',
	'gcc_checking_assert (seen_error ());'.
	(assemble_alias): Change '#if !defined (ASM_OUTPUT_DEF)' to
	'if (!TARGET_SUPPORTS_ALIASES)'.
	(default_asm_output_anchor):
	'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
2023-10-26 08:37:25 +02:00
Alexandre Oliva
33d38b431c set hardcmp eh probs
Set execution count of EH blocks, and probability of EH edges.


for  gcc/ChangeLog

	PR tree-optimization/111520
	* gimple-harden-conditionals.cc
	(pass_harden_compares::execute): Set EH edge probability and
	EH block execution count.

for  gcc/testsuite/ChangeLog

	PR tree-optimization/111520
	* g++.dg/torture/harden-comp-pr111520.cc: New.
2023-10-26 03:19:29 -03:00
Alexandre Oliva
2f398d148a rename make_eh_edges to make_eh_edge
Since make_eh_edges creates at most one edge, rename it to
make_eh_edge.


for  gcc/ChangeLog

	* tree-eh.h (make_eh_edges): Rename to...
	(make_eh_edge): ... this.
	* tree-eh.cc: Likewise.  Adjust all callers...
	* gimple-harden-conditionals.cc: ... here, ...
	* gimple-harden-control-flow.cc: ... here, ...
	* tree-cfg.cc: ... here, ...
	* tree-inline.cc: ... and here.
2023-10-26 03:06:05 -03:00
GCC Administrator
f75fc1f083 Daily bump. 2023-10-26 00:17:43 +00:00
Iain Sandoe
da9e72f80f Darwin: Handle the fPIE option specially.
For Darwin, PIE requires PIC codegen, but otherwise is only a link-time
change. For almost all Darwin versions, we do not report __PIE__; the
exception is 32-bit X86, and only from Darwin12 to Darwin17 (32-bit is no
longer supported after Darwin17).

gcc/ChangeLog:

	* config/darwin.cc (darwin_override_options): Handle fPIE.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-25 20:46:59 +01:00
Iain Sandoe
8f62ce10bc config, aarch64: Use a more compatible sed invocation.
Currently, the sed commands used to parse --with-{cpu,tune,arch} are
using a GNU-specific extension (automatically recognising extended REs).

This is failing on Darwin, which defaults to Posix behaviour.
However '-E' is accepted to indicate an extended RE.  Strictly, this
is also not really sufficient, since we should only require a Posix
sed.
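
A short shell illustration of the portability point; the pattern is
hypothetical, not the exact one used in config.gcc.

```shell
# BSD (Posix) sed defaults to basic REs; '-E' requests extended REs and
# is accepted by both GNU and BSD sed.  Example: strip a '+feature'
# suffix from a --with-cpu value.
cpu=$(echo "neoverse-v1+sve" | sed -E 's/\+.*$//')
echo "$cpu"   # prints "neoverse-v1"
```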

gcc/ChangeLog:

	* config.gcc: Use -E with sed to indicate that we are using
	extended REs.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-25 20:44:20 +01:00
Jason Merrill
1aa9f1cc98 tree: update address_space comment
Mention front-end uses of the address_space bit-field, and remove the
inaccurate "only".

gcc/ChangeLog:

	* tree-core.h (struct tree_base): Update address_space comment.
2023-10-25 15:24:30 -04:00
Wilco Dijkstra
668c4c3783 AArch64: Improve immediate generation
Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.

Reviewed-by: Richard Earnshaw <Richard.Earnshaw@arm.com>

gcc/ChangeLog:
	* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
	Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
	* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
	* gcc.target/aarch64/moveor_imm.c: Add new test.
	* gcc.target/aarch64/pr106583.c: Change tests.
2023-10-25 16:25:29 +01:00
Jason Merrill
406709b1c7 c++: improve comment
It's incorrect to say that the address of an OFFSET_REF is always a
pointer-to-member; if it represents an overload set with both static and
non-static member functions that ends up resolving to a static one, the
address is a normal pointer.  And let's go ahead and mention explicit object
member functions even though the patch hasn't landed yet.

gcc/cp/ChangeLog:

	* cp-tree.def: Improve OFFSET_REF comment.
	* cp-gimplify.cc (cp_fold_immediate): Add to comment.
2023-10-25 11:02:31 -04:00
Uros Bizjak
678e6c328c i386: Narrow test instructions with immediate operands [PR111698]
Narrow test instructions with immediate operand that test memory location
for zero.  E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after a large write to the same address causes store-to-load forwarding stall.

	PR target/111698

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
	New tune.
	* config/i386/i386.h (TARGET_PARTIAL_MEMORY_READ_STALL): New macro.
	* config/i386/i386.md: New peephole pattern to narrow test
	instructions with immediate operands that test memory locations
	for zero.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr111698.c: New test.
2023-10-25 16:28:09 +02:00
Andrew MacLeod
f7dbf62304 Faster irange union for appending ranges.
A common pattern is to append a range to an existing range via union.
This patch optimizes that process.

	* value-range.cc (irange::union_append): New.
	(irange::union_): Call union_append when appropriate.
	* value-range.h (irange::union_append): New prototype.
2023-10-25 09:49:02 -04:00
Chenghui Pan
4912418dc1 LoongArch: Fix vfrint-releated comments in lsxintrin.h and lasxintrin.h
The comments of the vfrint-related intrinsic functions do not match the
return value types in their definitions. This patch fixes these comments.

gcc/ChangeLog:

	* config/loongarch/lasxintrin.h (__lasx_xvftintrnel_l_s): Fix comments.
	(__lasx_xvfrintrne_s): Ditto.
	(__lasx_xvfrintrne_d): Ditto.
	(__lasx_xvfrintrz_s): Ditto.
	(__lasx_xvfrintrz_d): Ditto.
	(__lasx_xvfrintrp_s): Ditto.
	(__lasx_xvfrintrp_d): Ditto.
	(__lasx_xvfrintrm_s): Ditto.
	(__lasx_xvfrintrm_d): Ditto.
	* config/loongarch/lsxintrin.h (__lsx_vftintrneh_l_s): Ditto.
	(__lsx_vfrintrne_s): Ditto.
	(__lsx_vfrintrne_d): Ditto.
	(__lsx_vfrintrz_s): Ditto.
	(__lsx_vfrintrz_d): Ditto.
	(__lsx_vfrintrp_s): Ditto.
	(__lsx_vfrintrp_d): Ditto.
	(__lsx_vfrintrm_s): Ditto.
	(__lsx_vfrintrm_d): Ditto.
2023-10-25 21:13:27 +08:00
chenxiaolong
1b30ef7cea LoongArch: Implement __builtin_thread_pointer for TLS.
gcc/ChangeLog:

	* config/loongarch/loongarch.md (get_thread_pointer<mode>): Add the
	instruction template corresponding to the __builtin_thread_pointer
	function.
	* doc/extend.texi: Add the __builtin_thread_pointer function support
	description to the documentation.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/builtin_thread_pointer.c: New test.
2023-10-25 21:11:16 +08:00
Patrick Palka
fb28d5c6b0 c++: add fixed testcase [PR99804]
We accept the non-dependent call f(e) here ever since the
NON_DEPENDENT_EXPR removal patch r14-4793-gdad311874ac3b3.
I haven't looked closely into why but I suspect wrapping 'e'
in a NON_DEPENDENT_EXPR was causing the argument conversion
to misbehave.

	PR c++/99804

gcc/testsuite/ChangeLog:

	* g++.dg/template/enum9.C: New test.
2023-10-25 09:03:52 -04:00
Vibhav Pant
ac66744d94 jit: dump string literal initializers correctly
Signed-off-by: David Malcolm <dmalcolm@redhat.com>

gcc/jit/ChangeLog:
	* jit-recording.cc (recording::global::write_to_dump): Fix
	dump of string literal initializers.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-10-25 08:35:47 -04:00
Jonathan Wakely
f32c1e1e96 libstdc++: Build libstdc++_libbacktrace.a as PIC [PR111936]
In order for std::stacktrace to be used in a shared library, the
libbacktrace symbols need to be built with -fPIC. Add the libtool
-prefer-pic flag to the commands in src/libbacktrace/Makefile so that
the archive contains PIC objects.

libstdc++-v3/ChangeLog:

	PR libstdc++/111936
	* src/libbacktrace/Makefile.am: Add -prefer-pic to libtool
	compile commands.
	* src/libbacktrace/Makefile.in: Regenerate.
2023-10-25 11:08:57 +01:00
Gaius Mulley
8bb655d0c5 PR modula2/111955 introduce isnan support to Builtins.def
This patch introduces isnan, isnanf and isnanl to Builtins.def.
It requires fallback functions isnan, isnanf, isnanl to be implemented in
libgm2/libm2pim/wrapc.cc and gm2-libs-ch/wrapc.c.
Access to the GCC builtin isnan tree is provided by adding
an isnan definition and support functions to gm2-gcc/m2builtins.cc.

gcc/m2/ChangeLog:

	PR modula2/111955
	* gm2-gcc/m2builtins.cc (gm2_isnan_node): New tree.
	(DoBuiltinIsnan): New function.
	(m2builtins_BuiltInIsnan): New function.
	(m2builtins_init): Initialize gm2_isnan_node.
	(list_of_builtins): Add define for __builtin_isnan.
	* gm2-libs-ch/wrapc.c (wrapc_isnan): New function.
	(wrapc_isnanf): New function.
	(wrapc_isnanl): New function.
	* gm2-libs/Builtins.def (isnanf): New procedure function.
	(isnan): New procedure function.
	(isnanl): New procedure function.
	* gm2-libs/Builtins.mod:
	* gm2-libs/wrapc.def (isnan): New function.
	(isnanf): New function.
	(isnanl): New function.

libgm2/ChangeLog:

	PR modula2/111955
	* libm2pim/wrapc.cc (isnan): Export new function.
	(isnanf): Export new function.
	(isnanl): Export new function.

gcc/testsuite/ChangeLog:

	PR modula2/111955
	* gm2/pimlib/run/pass/testnan.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-10-25 11:04:12 +01:00
Richard Sandiford
cfb7755d10 rtl-ssa: Add new helper functions
This patch adds some RTL-SSA helper functions.  They will be
used by the upcoming late-combine pass.

The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc.  I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)

gcc/
	* Makefile.in (OBJS): Add rtl-ssa/movement.o.
	* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
	(single_set_info): New functions.
	(remove_uses_of_def, accesses_reference_same_resource): Declare.
	(insn_clobbers_resources): Likewise.
	* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
	(rtl_ssa::accesses_reference_same_resource): Likewise.
	(rtl_ssa::insn_clobbers_resources): Likewise.
	* rtl-ssa/movement.h (can_move_insn_p): Declare.
	* rtl-ssa/movement.cc: New file.
2023-10-25 10:39:53 +01:00
Richard Sandiford
39cac7c314 rtl-ssa: Extend make_uses_available
The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible.  The switch to RTL-SSA was supposed to be a pure
infrastructure change.  So RTL-SSA has various FIXMEs for things
that were artificially limited to facilitate the old-fwprop vs.
new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly.  This patch does that, and
removes the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.

gcc/
	* rtl-ssa/functions.h (function_info::remains_available_at_insn):
	New member function.
	* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
	Likewise.
	(function_info::make_use_available): Avoid false negatives for
	queries within an EBB.
2023-10-25 10:39:53 +01:00
Richard Sandiford
d7266f655e rtl-ssa: Use frequency-weighted insn costs
rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall.  But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency.  (We already do something similar for SLP layouts.)

gcc/
	* rtl-ssa/changes.cc: Include sreal.h.
	(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
	scale the cost of each instruction by its execution frequency.
2023-10-25 10:39:52 +01:00
Richard Sandiford
cc15a0f49d rtl-ssa: Handle call clobbers in more places
In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.

This patch takes these call clobbers into account in a couple
more routines.  I don't think this will have any effect on
existing users, since it's only necessary for hard registers.

gcc/
	* rtl-ssa/access-utils.h (next_call_clobbers): New function.
	(is_single_dominating_def, remains_available_on_exit): Replace with...
	* rtl-ssa/functions.h (function_info::is_single_dominating_def)
	(function_info::remains_available_on_exit): ...these new member
	functions.
	(function_info::m_clobbered_by_calls): New member variable.
	* rtl-ssa/functions.cc (function_info::function_info): Explicitly
	initialize m_clobbered_by_calls.
	* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
	m_clobbered_by_calls for each call-clobber note.
	* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
	New function.  Check for call clobbers.
	* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
	Likewise.
2023-10-25 10:39:52 +01:00
Richard Sandiford
ba97d0e3b9 rtl-ssa: Calculate dominance frontiers for the exit block
The exit block can have multiple predecessors, for example if the
function calls __builtin_eh_return.  We might then need PHI nodes
for values that are live on exit.

RTL-SSA uses the normal dominance frontiers approach for calculating
where PHI nodes are needed.  However, dominance.cc only calculates
dominators for normal blocks, not the exit block.
calculate_dominance_frontiers likewise only calculates dominance
frontiers for normal blocks.

This patch fills in the “missing” frontiers manually.

gcc/
	* rtl-ssa/internals.h (build_info::exit_block_dominator): New
	member variable.
	* rtl-ssa/blocks.cc (build_info::build_info): Initialize it.
	(bb_walker::bb_walker): Use it, moving the computation of the
	dominator to...
	(function_info::process_all_blocks): ...here.
	(function_info::place_phis): Add dominance frontiers for the
	exit block.
2023-10-25 10:39:51 +01:00