This patch improves the code generated for x << 1 (and for x + x) when
x is 64-bit DImode, using the same two-instruction code sequence used
for DImode addition.
For the test case:
long long foo(long long x) { return x << 1; }
GCC -O2 currently generates the following code:
foo: lsr r2,r0,31
asl_s r1,r1,1
asl_s r0,r0,1
j_s.d [blink]
or_s r1,r1,r2
and on a CPU without a barrel shifter, i.e. -mcpu=em:
foo: add.f 0,r0,r0
asl_s r1,r1
rlc r2,0
asl_s r0,r0
j_s.d [blink]
or_s r1,r1,r2
with this patch (both with and without a barrel shifter):
foo: add.f r0,r0,r0
j_s.d [blink]
adc r1,r1,r1
A similar optimization is also applicable to H8300H, which could also use
a two-instruction sequence (plus rts), but GCC currently generates 16
instructions (plus an rts) for foo above.
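The equivalence the patch relies on, x << 1 == x + x computed as a low-word
add followed by a high-word add-with-carry, can be sketched in C. This is
only an illustrative model: the type u64_halves and both function names are
hypothetical, and the carry is computed explicitly where the hardware would
use the flag set by add.f.

```c
#include <stdint.h>

/* Hypothetical model of a DImode value held in two 32-bit registers. */
typedef struct {
    uint32_t lo;   /* r0 in the listings above */
    uint32_t hi;   /* r1 in the listings above */
} u64_halves;

/* x << 1 computed as x + x, one 32-bit word at a time. */
u64_halves shl1_add_adc(u64_halves x)
{
    u64_halves r;
    r.lo = x.lo + x.lo;                 /* add.f r0,r0,r0 (wraps mod 2^32) */
    uint32_t carry = (r.lo < x.lo);     /* carry flag: old bit 31 of lo */
    r.hi = x.hi + x.hi + carry;         /* adc r1,r1,r1 */
    return r;
}

/* Convenience wrapper so the model can be compared with a plain shift. */
uint64_t shl1_model(uint64_t x)
{
    u64_halves in = { (uint32_t)x, (uint32_t)(x >> 32) };
    u64_halves out = shl1_add_adc(in);
    return ((uint64_t)out.hi << 32) | out.lo;
}
```

For any 64-bit input, shl1_model (x) should agree with x << 1, including the
case where bit 31 of the low word carries into the high word.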
2023-11-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.md (addsi3): Fix GNU-style code formatting.
(adddi3): Change define_expand to generate a *adddi3.
(*adddi3): New define_insn_and_split to lower DImode additions
during the split1 pass (after combine and before reload).
(ashldi3): New define_expand to (only) generate *ashldi3_cnt1
for DImode left shifts by a single bit.
(*ashldi3_cnt1): New define_insn_and_split to lower DImode
left shifts by one bit to an *adddi3.
gcc/testsuite/ChangeLog
* gcc.target/arc/adddi3-1.c: New test case.
* gcc.target/arc/ashldi3-1.c: Likewise.
This patch removes a can_create_pseudo_p condition from
*cmov_uxtw_insn_insv, bringing it in line with *cmov<mode>_insn_insv.
The constraints correctly describe the requirements.
gcc/
* config/aarch64/aarch64.md (*cmov_uxtw_insn_insv): Remove
can_create_pseudo_p condition.
This patch fixes the following FAILs for RVV:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
Bootstrapped on x86 and regtested.
OK for trunk?
PR tree-optimization/111721
gcc/ChangeLog:
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for dummy mask -1.
* tree-vect-stmts.cc (vectorizable_load): Ditto.
The following removes a bogus assert constraining the uses that
could appear when an SLP node built from scalar defs constrains
code generation in such a way that earlier uses of the vector CTOR
components fail to get vectorized. We can't really constrain the
operation in which such a use appears.
PR tree-optimization/112366
* tree-vect-loop.cc (vectorizable_live_operation): Remove
assert.
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
..., which generally means:
-PASS: [...] (test for excess errors)
+UNSUPPORTED: [...]: exception handling disabled
However, this doesn't work for 'g++.dg/tree-prof/' test cases. For example:
[-PASS:-]{+UNSUPPORTED:+} g++.dg/tree-prof/indir-call-prof-2.C [-compilation, -fprofile-generate -D_PROFILE_GENERATE-]{+compilation: exception handling disabled+}
[-PASS:-]{+UNRESOLVED:+} g++.dg/tree-prof/indir-call-prof-2.C execution, -fprofile-generate -D_PROFILE_GENERATE
[-PASS:-]{+UNRESOLVED:+} g++.dg/tree-prof/indir-call-prof-2.C compilation, -fprofile-use -D_PROFILE_USE
[-PASS:-]{+UNRESOLVED:+} g++.dg/tree-prof/indir-call-prof-2.C execution, -fprofile-use -D_PROFILE_USE
Dependent tests turn UNRESOLVED if the first "compilation" runs into the
expected 'UNSUPPORTED: [...] compile: exception handling disabled'.
Specify 'dg-require-effective-target exceptions_enabled' for those test cases.
gcc/testsuite/
* g++.dg/tree-prof/indir-call-prof-2.C: Specify
'dg-require-effective-target exceptions_enabled'.
* g++.dg/tree-prof/partition1.C: Likewise.
* g++.dg/tree-prof/partition2.C: Likewise.
* g++.dg/tree-prof/partition3.C: Likewise.
* g++.dg/tree-prof/pr51719.C: Likewise.
* g++.dg/tree-prof/pr57451.C: Likewise.
* g++.dg/tree-prof/pr59255.C: Likewise.
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
..., which generally means:
-PASS: [...] (test for excess errors)
+UNSUPPORTED: [...]: exception handling disabled
However, this doesn't work for "split files" test cases. For example:
[-PASS:-]{+UNSUPPORTED:+} g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o [-assemble, -fPIC -flto -flto-partition=1to1-]{+assemble: exception handling disabled+}
[-PASS:-]{+UNRESOLVED:+} g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o-cp_lto_20081109-1_0.o [-link,-]{+link+} -fPIC -flto -flto-partition=1to1
{+UNRESOLVED: g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o-cp_lto_20081109-1_0.o execute -fPIC -flto -flto-partition=1to1+}
The "compile"/"assemble" tests (either continue to work, or) result in the
expected 'UNSUPPORTED: [...] compile: exception handling disabled', but
dependent "link" and "execute" tests then turn UNRESOLVED.
Specify 'dg-require-effective-target exceptions_enabled' for those test cases.
gcc/testsuite/
* g++.dg/lto/20081109-1_0.C: Specify
'dg-require-effective-target exceptions_enabled'.
* g++.dg/lto/20081109_0.C: Likewise.
* g++.dg/lto/20091026-1_0.C: Likewise.
* g++.dg/lto/pr87906_0.C: Likewise.
* g++.dg/lto/pr88046_0.C: Likewise.
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
..., which generally means:
-PASS: [...] (test for excess errors)
+UNSUPPORTED: [...]: exception handling disabled
However, this doesn't work for "split files" test cases. For example:
PASS: g++.dg/compat/eh/ctor1 cp_compat_main_tst.o compile
[-PASS:-]{+UNSUPPORTED:+} g++.dg/compat/eh/ctor1 cp_compat_x_tst.o [-compile-]{+compile: exception handling disabled+}
[-PASS:-]{+UNSUPPORTED:+} g++.dg/compat/eh/ctor1 cp_compat_y_tst.o [-compile-]{+compile: exception handling disabled+}
[-PASS:-]{+UNRESOLVED:+} g++.dg/compat/eh/ctor1 cp_compat_x_tst.o-cp_compat_y_tst.o link
[-PASS:-]{+UNRESOLVED:+} g++.dg/compat/eh/ctor1 cp_compat_x_tst.o-cp_compat_y_tst.o execute
The "compile"/"assemble" tests (either continue to work, or) result in the
expected 'UNSUPPORTED: [...] compile: exception handling disabled', but
dependent "link" and "execute" tests then turn UNRESOLVED.
Specify 'dg-require-effective-target exceptions_enabled' for those test cases.
gcc/testsuite/
* g++.dg/compat/eh/ctor1_main.C: Specify
'dg-require-effective-target exceptions_enabled'.
* g++.dg/compat/eh/ctor2_main.C: Likewise.
* g++.dg/compat/eh/dtor1_main.C: Likewise.
* g++.dg/compat/eh/filter1_main.C: Likewise.
* g++.dg/compat/eh/filter2_main.C: Likewise.
* g++.dg/compat/eh/new1_main.C: Likewise.
* g++.dg/compat/eh/nrv1_main.C: Likewise.
* g++.dg/compat/eh/spec3_main.C: Likewise.
* g++.dg/compat/eh/template1_main.C: Likewise.
* g++.dg/compat/eh/unexpected1_main.C: Likewise.
* g++.dg/compat/init/array5_main.C: Likewise.
The following avoids hoisting expressions that may invoke undefined
behavior and are not computed on all paths. This is realized by
noting that we have to avoid materializing expressions as part
of hoisting that are not part of the set of expressions we have
found eligible for hoisting. Instead of picking the expression
corresponding to the hoistable values from the first successor
we now keep a union of the expressions so that hoisting can pick
the expression that has its dependences fully hoistable.
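Why an expression that may invoke undefined behavior must not be hoisted
above the branch that guards it can be shown with a small C sketch. The
functions below are hypothetical illustrations, not the PR testcase:
guarded_div evaluates the division only on the path where it is safe, while
hoisted_div shows what an invalid hoist would amount to.

```c
/* The division is evaluated only on the path where b != 0. */
int guarded_div(int a, int b)
{
    if (b != 0)
        return a / b;
    return 0;
}

/* What an invalid hoist would look like: a / b is now computed on
   every path, so calling this with b == 0 is undefined behavior. */
int hoisted_div(int a, int b)
{
    int t = a / b;          /* hoisted above the guard */
    if (b != 0)
        return t;
    return 0;
}
```

The two functions agree whenever b is nonzero; only guarded_div is safe to
call with b == 0.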
PR tree-optimization/112310
* tree-ssa-pre.cc (do_hoist_insertion): Keep the union
of expressions, validate dependences are contained within
the hoistable set before hoisting.
* gcc.dg/torture/pr112310.c: New testcase.
2023-11-03 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/98498
* interface.cc (upoly_ok): Defined operators using unlimited
polymorphic formal arguments must not override the intrinsic
operator use.
gcc/testsuite/
PR fortran/98498
* gfortran.dg/interface_50.f90: New test.
Update in v2:
* Add a mode size equality check to reject differing mode sizes at expand
time, because the underlying codegen is not implemented yet.
Original log:
The previous rounding APIs starting with i/l/ll only work between modes
of the same size, for example as below, and we arranged the iterator
similarly to fcvt.
* SF => SI
* DF => DI
After this limitation was lifted in the middle-end, these APIs can also
be vectorized with different type sizes, namely:
* HF => SI, HF => DI
* SF => DI, SF => SI
* DF => SI, DF => DI
The existing iterator cannot handle this simply, so this patch
rearranges the iterator into two:
* V_VLS_F_CONVERT_SI: handle (HF, SF, DF) => SI
* V_VLS_F_CONVERT_DI: handle (HF, SF, DF) => DI
The related mode_attrs are updated to match the new iterators.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><v_i_l_ll_convert>2): Remove.
(lround<mode><v_i_l_ll_convert>2): Ditto.
(lceil<mode><v_i_l_ll_convert>2): Ditto.
(lfloor<mode><v_i_l_ll_convert>2): Ditto.
(lrint<mode><v_f2si_convert>2): New pattern for cvt from
FP to SI.
(lround<mode><v_f2si_convert>2): Ditto.
(lceil<mode><v_f2si_convert>2): Ditto.
(lfloor<mode><v_f2si_convert>2): Ditto.
(lrint<mode><v_f2di_convert>2): New pattern for cvt from
FP to DI.
(lround<mode><v_f2di_convert>2): Ditto.
(lceil<mode><v_f2di_convert>2): Ditto.
(lfloor<mode><v_f2di_convert>2): Ditto.
* config/riscv/vector-iterators.md: Renew iterators for both
the SI and DI.
Signed-off-by: Pan Li <pan2.li@intel.com>
With compile option --param=riscv-autovec-preference=fixed-vlmax, we have
redundant AVL/VL toggling:
vsetvli a5,a3,e8,mf4,ta,ma -> should be changed into e32m1
vle32.v v1,0(a1)
vle32.v v2,0(a0)
vsetivli zero,4,e32,m1,ta,ma -> redundant
slli a2,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma -> redundant
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3
The root cause is that we simplify the AVL into an immediate AVL too early
in the FIXED-VLMAX situation. The later avlprop pass then fails to propagate
the AVL generated by (SELECT_VL/vsetvl VL, AVL) into the normal RVV
instructions. So we need to remove the immediate AVL simplification in the
'expand' stage.
After this patch:
vsetvli a5,a3,e32,m1,ta,ma
slli a2,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a0)
sub a3,a3,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3
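For orientation, the loop in the listings above is a strip-mined vector add.
A scalar C sketch of the same control flow follows. VLMAX_E32, select_vl and
vadd_i32 are hypothetical names, the value 4 is an assumption taken from the
"vsetivli zero,4" in the first listing, and select_vl models vsetvli as a
plain min (AVL, VLMAX), which simplifies what the instruction may return.

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX_E32 4   /* assumed: four 32-bit elements per vector register */

/* Simplified model of vsetvli: process at most VLMAX elements per pass. */
size_t select_vl(size_t avl)
{
    return avl < VLMAX_E32 ? avl : VLMAX_E32;
}

void vadd_i32(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    while (n != 0) {                       /* loop back-edge: bne a3,zero,.L3 */
        size_t vl = select_vl(n);          /* vsetvli a5,a3,e32,m1,ta,ma */
        for (size_t i = 0; i < vl; i++)    /* vle32.v x2, vadd.vv, vse32.v */
            out[i] = a[i] + b[i];
        n -= vl;                           /* sub a3,a3,a5 */
        a += vl;                           /* pointer bumps by vl * 4 bytes */
        b += vl;
        out += vl;
    }
}
```

With a single vsetvli per iteration, as in the "after" listing, the element
count chosen at the top of the loop is the one every vector instruction in
the body uses.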
With that simplification removed, the following case still needs to be handled:
typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
vnx2qi v = {a, b};
*(vnx2qi *) out = v;
}
We should use vsetivli zero, 2 instead of vsetvl a5,zero.
This simplification is now done in the avlprop pass, which is also included
in this patch to avoid regressing such cases.
PR target/112326
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): New function.
(simplify_replace_vlmax_avl): Ditto.
(pass_avlprop::execute): Add immediate AVL simplification.
* config/riscv/riscv-protos.h (imm_avl_p): Rename.
* config/riscv/riscv-v.cc (const_vlmax_p): Ditto.
(imm_avl_p): Ditto.
(emit_vlmax_insn): Adapt for new interface name.
* config/riscv/vector.md (mode_idx): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112326.c: New test.
Now that all insns are guaranteed to have a type, ensure every insn
is associated with a cpu unit/insn reservation.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_variable_issue): Add disabled assert.
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
2023-11-02 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/112316
* parse.cc (parse_associate): Remove condition that caused this
regression.
gcc/testsuite/
PR fortran/112316
* gfortran.dg/pr112316.f90: New test.
I noticed we were using a hash_table directly here instead of the simpler
hash_set interface. Also, let's check for the variable itself and repeats
earlier, since they should happen more often than any of the other cases.
gcc/cp/ChangeLog:
* semantics.cc (nrv_data): Change visited to hash_set.
(finalize_nrv_r): Reorganize.
In r12-6333 for PR33799, I fixed the example in [except.ctor]/2. In that
testcase, the exception is caught and the function returns again,
successfully.
In this testcase, however, the exception is rethrown, and hits two separate
cleanups: one in the try block and the other in the function body. So we
destroy twice an object that was only constructed once.
Fortunately, the fix for the normal case is easy: we just need to clear the
"return value constructed by return" flag when we do it the first time.
This gets more complicated with the named return value optimization, since
we don't want to destroy the return value while the NRV variable is still in
scope.
PR c++/112301
PR c++/102191
PR c++/33799
gcc/cp/ChangeLog:
* except.cc (maybe_splice_retval_cleanup): Clear
current_retval_sentinel when destroying retval.
* semantics.cc (nrv_data): Add in_nrv_cleanup.
(finalize_nrv): Set it.
(finalize_nrv_r): Fix handling of throwing cleanups.
gcc/testsuite/ChangeLog:
* g++.dg/eh/return1.C: Add more cases.
Fix ICE because of forgotten checks for pointers to void
and incomplete arrays.
Committed as obvious.
PR c/112347
gcc/c:
* c-typeck.cc (convert_for_assignment): Add missing check.
gcc/testsuite:
* gcc.dg/Walloc-size-3.c: New test.
The checks for snprintf give a -Wformat warning due to a missing
argument.
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_ENABLE_C99): Fix snprintf checks.
* configure: Regenerate.
Spurred by Roger's recent work on ARC, this patch improves the code we
generate for single bit sign extractions.
The basic idea is to get the bit we want into C, then use a subx;ext.w;ext.l
sequence to sign extend it in a GPR.
For bits 0..15 we can use a bld instruction to get the bit we want into C. For
bits 16..31, we can move the high word into the low word, then use bld.
There are a couple of special cases where we can shift the bit we want from
the high word into C, which is one instruction shorter.
Not surprisingly most cases seen in newlib and the test suite are extractions
from the low byte, HImode sign bit and top two bits of SImode.
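In C terms, a single-bit sign extraction yields 0 when the selected bit is
clear and -1 (all ones) when it is set. The sketch below models the sequence
described above. The function name is hypothetical, bit is assumed to be in
the range 0..31, and the three widening steps only mirror subx/ext.w/ext.l
conceptually; they are not an exact description of the H8 instructions.

```c
#include <stdint.h>

int32_t sign_extract_bit(uint32_t x, unsigned bit)
{
    unsigned carry = (x >> bit) & 1u;   /* bld: copy the chosen bit into C */
    int8_t  b = (int8_t)-(int)carry;    /* subx: r - r - C gives 0 or -1 */
    int16_t w = b;                      /* ext.w: sign-extend byte to word */
    int32_t l = w;                      /* ext.l: sign-extend word to long */
    return l;
}
```

For example, extracting bit 15 (the HImode sign bit) of 0x8000 gives -1,
while the same extraction from 0x7FFF gives 0.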
Regression tested on the H8 with no regressions. Installing on the trunk.
gcc/
* config/h8300/combiner.md: Add new patterns for single bit
sign extractions.
The previous rounding APIs starting with i/l/ll only work between modes
of the same size, for example as below, and we arranged the iterator
similarly to fcvt.
* SF => SI
* DF => DI
After this limitation was lifted in the middle-end, these APIs can also
be vectorized with different type sizes, namely:
* HF => SI, HF => DI
* SF => DI, SF => SI
* DF => SI, DF => DI
The existing iterator cannot handle this simply, so this patch
rearranges the iterator into two:
* V_VLS_F_CONVERT_SI: handle (HF, SF, DF) => SI
* V_VLS_F_CONVERT_DI: handle (HF, SF, DF) => DI
The related mode_attrs are updated to match the new iterators.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><v_i_l_ll_convert>2): Remove.
(lround<mode><v_i_l_ll_convert>2): Ditto.
(lceil<mode><v_i_l_ll_convert>2): Ditto.
(lfloor<mode><v_i_l_ll_convert>2): Ditto.
(lrint<mode><v_f2si_convert>2): New pattern for cvt from
FP to SI.
(lround<mode><v_f2si_convert>2): Ditto.
(lceil<mode><v_f2si_convert>2): Ditto.
(lfloor<mode><v_f2si_convert>2): Ditto.
(lrint<mode><v_f2di_convert>2): New pattern for cvt from
FP to DI.
(lround<mode><v_f2di_convert>2): Ditto.
(lceil<mode><v_f2di_convert>2): Ditto.
(lfloor<mode><v_f2di_convert>2): Ditto.
* config/riscv/vector-iterators.md: Renew iterators for both
the SI and DI.
Signed-off-by: Pan Li <pan2.li@intel.com>
Say 'memory lifetime' rather than 'memory life' as lifetime is the more
standard term nowadays (indeed we have e.g. -fno-lifetime-dse).
It's also easier to grep for if someone is looking for the documentation on
where we do that.
gcc/ChangeLog:
* doc/passes.texi (Dead code elimination): Explicitly say 'lifetime'
as this has become the standard term for what we're doing here.
Signed-off-by: Sam James <sam@gentoo.org>
A run FAIL suddenly showed up for me today:
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
that I didn't have before.
After investigation, I realized that there is a bug in the AVL propagation pass.
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc
(pass_avlprop::get_vlmax_ta_preferred_avl): Don't allow
non-real insn AVL propagation.
As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions. This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.
It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.
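Why the neutral element for PLUS has to be -0.0 when signed zeros are
honored can be illustrated with a scalar C sketch. The function names are
hypothetical and the loop only mimics the "every lane adds something" shape
of a masked reduction; it is not the code the vectorizer emits.

```c
#include <stddef.h>

/* Conditional sum in masked-add shape: every iteration performs an
   addition, and inactive elements contribute the neutral value. */
double cond_sum(const double *a, const int *cond, size_t n,
                double init, double neutral)
{
    double sum = init;
    for (size_t i = 0; i < n; i++)
        sum += cond[i] ? a[i] : neutral;
    return sum;
}

/* Nonzero only for -0.0, since 1.0 / -0.0 is negative infinity. */
int is_negative_zero(double v)
{
    return v == 0.0 && 1.0 / v < 0.0;
}
```

Under IEEE round-to-nearest, x + (-0.0) equals x for every x, including
x == -0.0, whereas (-0.0) + (+0.0) is +0.0. Using +0.0 as the neutral value
would therefore change the sign of an all-inactive reduction that starts at
-0.0.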
gcc/ChangeLog:
PR middle-end/111401
* internal-fn.cc (internal_fn_else_index): New function.
* internal-fn.h (internal_fn_else_index): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
The following addresses wrong debug IL created by SCCP rewriting stmts
to defined overflow. I addressed another inefficiency there but
needed to adjust the API of rewrite_to_defined_overflow for this
which is now taking a stmt iterator for in-place operation and a
stmt for sequence producing because gsi_for_stmt doesn't work for
stmts not in the IL.
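As background, "rewriting to defined overflow" means performing a signed
operation in the corresponding unsigned type so that wraparound is well
defined. A source-level C sketch of the idea follows; the function name is
hypothetical, and the real transformation operates on GIMPLE statements
rather than on C source.

```c
#include <limits.h>

/* Signed addition carried out in unsigned arithmetic. */
int add_defined_overflow(int a, int b)
{
    unsigned int ua = (unsigned int)a;
    unsigned int ub = (unsigned int)b;
    /* Unsigned addition wraps modulo 2^N.  Converting an out-of-range
       result back to int is implementation-defined in ISO C; GCC
       documents it as reduction modulo 2^N. */
    return (int)(ua + ub);
}
```

With GCC, add_defined_overflow (INT_MAX, 1) yields INT_MIN instead of
invoking undefined behavior.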
PR tree-optimization/112320
* gimple-fold.h (rewrite_to_defined_overflow): New overload
for in-place operation.
* gimple-fold.cc (rewrite_to_defined_overflow): Add stmt
iterator argument to worker, define separate API for
in-place and not in-place operation.
* tree-if-conv.cc (predicate_statements): Simplify.
* tree-scalar-evolution.cc (final_value_replacement_loop):
Likewise.
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute): Adjust.
* tree-ssa-reassoc.cc (update_range_test): Likewise.
* gcc.dg/pr112320.c: New testcase.
The following patch implements C++26 unevaluated-string.
As it seems to me to be just extra pedantry, it is implemented only for
-std=c++26 or -std=gnu++26 and later, and only with -pedantic/-pedantic-errors.
Nothing is done for inline asm; while the spec changes those, it changes them
to a balanced token sequence with implementation-defined rules on what is
and isn't allowed (so pedantically, accepting asm ("" : "+m" (x));
was accepts-invalid before C++26, but we didn't diagnose anything).
For the other spots mentioned in the paper (static_assert message,
linkage specification, deprecated/nodiscard attributes) it enforces the
requirements: no prefixes, no udlit suffixes, no octal/hexadecimal escapes
(conditional escape sequences were already rejected with -pedantic before).
For the deprecated operator "" identifier case I've kept things as is,
because everything seems to have been diagnosed already (a lot being implied
from the string having to be empty).
2023-11-02 Jakub Jelinek <jakub@redhat.com>
PR c++/110342
gcc/cp/
* parser.cc: Implement C++26 P2361R6 - Unevaluated strings.
(uneval_string_attr): New enumerator.
(cp_parser_string_literal_common): Add UNEVAL argument. If true,
pass CPP_UNEVAL_STRING rather than CPP_STRING to
cpp_interpret_string_notranslate.
(cp_parser_string_literal, cp_parser_userdef_string_literal): Adjust
callers of cp_parser_string_literal_common.
(cp_parser_unevaluated_string_literal): New function.
(cp_parser_parenthesized_expression_list): Handle uneval_string_attr.
(cp_parser_linkage_specification): Use
cp_parser_unevaluated_string_literal for C++26.
(cp_parser_static_assert): Likewise.
(cp_parser_std_attribute): Use uneval_string_attr for standard
deprecated and nodiscard attributes.
gcc/testsuite/
* g++.dg/cpp26/unevalstr1.C: New test.
* g++.dg/cpp26/unevalstr2.C: New test.
* g++.dg/cpp0x/udlit-error1.C (lol): Expect an error for C++26
about user-defined literal in deprecated attribute.
libcpp/
* include/cpplib.h (TTYPE_TABLE): Add CPP_UNEVAL_STRING literal
entry. Use C++11 instead of C++-0x in comments.
* charset.cc (convert_escape): Add UNEVAL argument, if true,
pedantically diagnose numeric escape sequences.
(cpp_interpret_string_1): Formatting fix. Adjust convert_escape
caller.
(cpp_interpret_string): Formatting fix.
(cpp_interpret_string_notranslate): Pass type through to
cpp_interpret_string if it is CPP_UNEVAL_STRING.
Notice that there are some redundant 'vimov' codes in the attribute.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/vector.md: Fix redundant codes in attributes.
Update in v4:
* Append the check to vectorizable_internal_function.
Update in v3:
* Add a function to check whether the type size is legal for a vectorizer call.
Update in v2:
* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.
Original log:
vectorizable_call has a restriction on the size of the data types:
DF to DI is allowed but SF to DI isn't. You may see the message below
when trying to vectorize a function call like lrintf.
void
test_lrintf (long *out, float *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}
lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type
As a result, a standard name pattern like lrintmn2 cannot work for differing
data type sizes such as SF => DI. This patch refines the data type size
check and unblocks standard names like lrintmn2 under certain conditions.
The type size of vectype_out needs to be exactly the same as the type
size of vectype_in when vectype_out does not participate in the optab
selection. There is no such restriction when vectype_out is part of the
optab query.
The tests below pass with this patch.
* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.
The tests below are ongoing.
* The x86 bootstrap and regression test.
* The aarch64 regression test.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_internal_function): Add type
size check for when vectype_out doesn't participate in the optab query.
(vectorizable_call): Remove the type size check.
Signed-off-by: Pan Li <pan2.li@intel.com>
Currently there is an unofficial mirror of GCC on GitHub that people
sometimes submit pull requests to:
https://github.com/gcc-mirror/gcc
However, this is not the proper way to contribute to GCC, so that means
that someone (usually Jonathan Wakely) has to go through the PRs and
manually tell people that they're sending their PRs to the wrong place.
One thing that would help mitigate this problem would be files in a
special .github directory that GitHub would automatically open when
contributors attempt to open a PR, that would then tell them the proper
way to contribute instead. This patch attempts to add two such files.
They are written in Markdown, which I'm realizing might require some
special handling in this repository, since the ".md" extension is also
used for GCC's "Machine Description" files here, but I'm not quite sure
how to go about handling that. Also note that I adapted these files from
equivalent files in the git repository for Git itself:
https://github.com/git/git/blob/master/.github/CONTRIBUTING.md
https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
What do people think?
ChangeLog:
* .github/CONTRIBUTING.md: New file.
* .github/PULL_REQUEST_TEMPLATE.md: New file.
This patch is a follow-up to my previous PR target/110551 patch, this
time to address the additional move after mulx, seen on TARGET_BMI2
architectures (such as -march=haswell). The complication here is
that the flexible multiple-set mulx instruction is introduced into
RTL after reload, by split2, and therefore can't benefit from register
preferencing. This results in RTL like the following:
(insn 32 31 17 2 (parallel [
(set (reg:DI 4 si [orig:101 r ] [101])
(mult:DI (reg:DI 1 dx [109])
(reg:DI 5 di [109])))
(set (reg:DI 5 di [ r+8 ])
(umul_highpart:DI (reg:DI 1 dx [109])
(reg:DI 5 di [109])))
]) "pr110551-2.c":8:17 -1
(nil))
(insn 17 32 9 2 (set (reg:DI 0 ax [107])
(reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
(expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
(nil)))
Here insn 32, the mulx instruction, places its results in si and di,
and then immediately after decides to move di to ax, with di now dead.
This can be trivially cleaned up by a peephole2. I've added an
additional constraint that the two SET_DESTs can't be the same
register to avoid confusing the middle-end, but this has well-defined
behaviour on x86_64/BMI2, encoding a umul_highpart.
For the new test case, compiled on x86_64 with -O2 -march=haswell:
Before:
mulx64: movabsq $-7046029254386353131, %rdx
mulx %rdi, %rsi, %rdi
movq %rdi, %rax
xorq %rsi, %rax
ret
After:
mulx64: movabsq $-7046029254386353131, %rdx
mulx %rdi, %rsi, %rax
xorq %rsi, %rax
ret
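For reference, a C function with the behavior of the assembly above can be
reconstructed as follows. This is a reading of the listing, not a copy of
pr110551-2.c, and it assumes a compiler that provides unsigned __int128
(a GCC/Clang extension on 64-bit targets).

```c
#include <stdint.h>

/* -7046029254386353131 is 0x9E3779B97F4A7C15 when viewed as an
   unsigned 64-bit value. */
uint64_t mulx64(uint64_t x)
{
    unsigned __int128 r = (unsigned __int128)x * 0x9E3779B97F4A7C15ULL;
    uint64_t lo = (uint64_t)r;           /* low half, %rsi after mulx */
    uint64_t hi = (uint64_t)(r >> 64);   /* high half: %rdi before, %rax after */
    return hi ^ lo;                      /* xorq %rsi, %rax */
}
```

The peephole2 only changes which register receives the high half; the value
computed is the same in both listings.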
2023-11-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110551
* config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
as operands[2] with predicate register_operand must be !MEM_P.
(peephole2): Optimize a mulx followed by a register-to-register
move, to place result in the correct destination if possible.
gcc/testsuite/ChangeLog
PR target/110551
* gcc.target/i386/pr110551-2.c: New test case.
Other subword atomic patterns use riscv_subword_address to calculate
the aligned address, shift amount, mask and !mask. atomic_test_and_set
was implemented before the common function was added. After this patch
all subword atomic patterns use riscv_subword_address.
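The four quantities mentioned above can be modeled in plain C for a
byte-sized access inside a 32-bit word. This is only a sketch of the
arithmetic: subword_info and subword_address_model are hypothetical names,
little-endian byte order is assumed, and the real riscv_subword_address
emits RTL rather than computing values directly.

```c
#include <stdint.h>

typedef struct {
    uintptr_t aligned_addr;  /* address of the containing 32-bit word */
    unsigned  shift;         /* bit offset of the byte within that word */
    uint32_t  mask;          /* selects the byte */
    uint32_t  not_mask;      /* selects everything else */
} subword_info;

subword_info subword_address_model(uintptr_t addr)
{
    subword_info s;
    s.aligned_addr = addr & ~(uintptr_t)3;   /* clear the low two bits */
    s.shift = (unsigned)(addr & 3) * 8;      /* little-endian assumed */
    s.mask = (uint32_t)0xFF << s.shift;
    s.not_mask = ~s.mask;
    return s;
}
```

For address 0x1003 this gives an aligned address of 0x1000, a shift of 24
and a mask of 0xFF000000.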
gcc/ChangeLog:
* config/riscv/sync.md: Use riscv_subword_address function to
calculate the address and shift in atomic_test_and_set.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Fixes: 3496ca4e65 ("RISC-V: Add runtime invariant support")
riscv_promote_function_mode doesn't promote SI to DI in the libcall
case. It intends to do that, but the code is broken (a regression).
The fix does what the generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping through it in
the debugger shows that the old code doesn't and the fixed code does.
This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
returned for libcall case.
Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #526
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Add option Walloc-size that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to. Added to Wextra.
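A minimal C sketch of the situation described above follows. struct point
and make_point are hypothetical; the undersized allocation is shown only in
a comment so that the example itself compiles cleanly.

```c
#include <stdlib.h>

struct point { int x, y, z; };

/* The kind of allocation the option is described as warning about:

       struct point *p = malloc (sizeof (int));

   sizeof (int) is smaller than sizeof (struct point), so the storage
   is insufficient for the type the pointer refers to. */
struct point *make_point(int x, int y, int z)
{
    struct point *p = malloc(sizeof *p);   /* sized from the target type */
    if (p != NULL) {
        p->x = x;
        p->y = y;
        p->z = z;
    }
    return p;
}
```

Sizing the request with sizeof *p keeps the allocation in step with the
pointer's target type if that type changes later.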
PR c/71219
gcc:
* doc/invoke.texi: Document -Walloc-size option.
gcc/c-family:
* c.opt (Walloc-size): New option.
gcc/c:
* c-typeck.cc (convert_for_assignment): Add warning.
gcc/testsuite:
* gcc.dg/Walloc-size-1.c: New test.
* gcc.dg/Walloc-size-2.c: New test.
genautomata was writing the insn_has_dfa_reservation_p function
inside of the CPU_UNITS_QUERY conditional when it shouldn't have.
Move insn_has_dfa_reservation_p outside of conditional group.
gcc/ChangeLog:
* genautomata.cc (write_automata): Move endif.
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the arguments
and return types have been transformed into vector types. It also constructs
the adjustments and retval modifications after this call, allowing targets to
alter the types of the arguments and return of the clone prior to the
modifications to the function definition.
gcc/ChangeLog:
* omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out code to
create return array and don't return new type.
(simd_clone_adjust_argument_types): Hoist out code that creates
ipa_param_body_adjustments and don't return them.
(simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return and
argument types have been vectorized, create adjustments and return array
after the hook.
(expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return and
argument types have been vectorized.
Improve stack protector patterns and peephole2s to substitute stack
protector scratch register clear with unrelated subsequent register
initialization in several ways:
a. Explicitly generate scratch register as named pseudo. This allows
optimizers to eventually reuse the zero value in the register.
b. Allow scratch register in different mode (SWI48) than PTR mode:
d000: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
d007: 00 00
d009: 48 89 44 24 08 mov %rax,0x8(%rsp)
d00e: 8b 87 e0 01 00 00 mov 0x1e0(%rdi),%eax
SImode moves on x86 zero-extend to the whole DImode register,
so stack protector paranoia is not compromised.
c. Relax peephole2 constraint that stack protector scratch register
must match new initialized register. This relaxation substantially
improves peephole2 opportunities, and generates sequences like:
a310: 65 4c 8b 34 25 28 00 mov %gs:0x28,%r14
a317: 00 00
a319: 4c 89 74 24 08 mov %r14,0x8(%rsp)
a31e: 4c 8b b7 98 00 00 00 mov 0x98(%rdi),%r14
We have to ensure the new scratch is dead in front of the sequence.
The patch also fixes omission of earlyclobbers for all alternatives of
new initialized register in *stack_protect_set_3, avoiding the need for
reg_overlap_mentioned_p constraint. Earlyclobbers are per alternative,
not per operand.
Also, instructions are already valid in peephole2 pass, so we don't
have to explicitly re-check their operands for validity.
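The claim in item b, that an SImode move leaves nothing of the previous
64-bit contents behind, can be modeled in C. write_reg32 is a hypothetical
helper that stands in for "mov 0x1e0(%rdi),%eax" overwriting %rax; it is a
model of the architectural rule, not code from the patch.

```c
#include <stdint.h>

/* Model of writing a 64-bit register through its 32-bit name: the
   upper 32 bits are cleared, so nothing of the old value survives. */
uint64_t write_reg32(uint64_t old_value, uint32_t new_low)
{
    (void)old_value;              /* no bits of the old contents are kept */
    return (uint64_t)new_low;     /* zero-extended to 64 bits */
}
```

If old_value held the stack protector canary, the result contains no bits
of it, which is the property the scratch register clear was there to
guarantee.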
gcc/ChangeLog:
* config/i386/i386.md (stack_protect_set): Explicitly
generate scratch register in word mode.
(@stack_protect_set_1_<mode>): Rename to ...
(@stack_protect_set_1_<PTR:mode>_<SWI48:mode>): ... this.
Use SWI48 mode iterator to match scratch register.
(stack_protect_set_1 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence. Use general_operand
predicate for operand 4. Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole sequence. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.
(*stack_protect_set_2_<mode>): Rename to ...
(*stack_protect_set_2_<mode>_si): ... this.
(*stack_protect_set_3): Rename to ...
(*stack_protect_set_2_<mode>_di): ... this.
Use PTR mode iterator to match stack protector memory move.
Use earlyclobber for all alternatives of operand 1.
(stack_protect_set_2 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence. Use general_operand
predicate for operand 4. Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole sequence. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.