Now that the floating point version of rv_fold calculates its result
in an frange, we can remove the superfluous LB, UB, and MAYBE_NAN
arguments.
gcc/ChangeLog:
* range-op-float.cc (range_operator::fold_range): Remove
superfluous code.
(range_operator::rv_fold): Remove unneeded arguments.
(operator_plus::rv_fold): Same.
(operator_minus::rv_fold): Same.
(operator_mult::rv_fold): Same.
(operator_div::rv_fold): Same.
* range-op-mixed.h: Remove lb, ub, and maybe_nan arguments from
rv_fold methods.
* range-op.h: Same.
The floating point version of rv_fold returns its result in 3 pieces:
the lower bound, the upper bound, and a maybe_nan bit. It is cleaner
to return everything in an frange, thus bringing the floating point
version of rv_fold in line with the integer version.
This first patch adds an frange argument, while keeping the current
functionality, and asserting that we get the same results. In a
follow-up patch I will nuke the now useless 3 arguments. Splitting
this into two patches makes it easier to bisect any problems if any
should arise.
gcc/ChangeLog:
* range-op-float.cc (range_operator::fold_range): Pass frange
argument to rv_fold.
(range_operator::rv_fold): Add frange argument.
(operator_plus::rv_fold): Same.
(operator_minus::rv_fold): Same.
(operator_mult::rv_fold): Same.
(operator_div::rv_fold): Same.
* range-op-mixed.h: Add frange argument to rv_fold methods.
* range-op.h: Same.
On rv32gcv testcases like g++.target/riscv/rvv/base/bug-22.C fail with:
FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors)
Excess errors:
cc1plus: error: ABI requires '-march=rv32'
This patch adds the -mabi argument to g++ rvv tests.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/rvv.exp: Add -mabi argument to CFLAGS.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Similar to commit fb5d27be27
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.
PR testsuite/109951
libatomic/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
* Makefile.in: Regenerate.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libatomic.exp (libatomic_init): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.
* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
set.
(SYSROOT_CFLAGS_FOR_TARGET): Set.
Similar to commit fb5d27be27
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.
PR testsuite/109951
libffi/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
<local.exp>: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
set 'SYSROOT_CFLAGS_FOR_TARGET'.
* Makefile.in: Regenerate.
* configure: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libffi.exp (libffi_target_compile): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.
dg-pch.exp handled dg-require-effective-target pch_supported_debug
as a special case, by grepping the source code. This patch tries
to generalise it to other dg-require-effective-targets, and to
dg-skip-if.
There also seemed to be some errors in check-flags. It used:
lappend $args [list <elt>]
which treats the contents of args as a variable name. I think
it was supposed to be "lappend args" instead. From the later
code, the element was supposed to be <elt> itself, rather than
a singleton list containing <elt>.
We can also save some time by doing the common early-exit first.
Doing this removes the need to specify the dg-require-effective-target
in both files. Tested by faking unsupported debug and checking that
the tests were still correctly skipped.
gcc/testsuite/
* lib/target-supports-dg.exp (check-flags): Move default argument
handling further up. Fix a couple of issues in the lappends.
Avoid frobbing the compiler flags if the return value is already
known to be 1.
* lib/dg-pch.exp (dg-flags-pch): Process the dg-skip-if and
dg-require-effective-target directives to see whether the
assembly test should be skipped.
* gcc.dg/pch/valid-1.c: Remove dg-require-effective-target.
* gcc.dg/pch/valid-1b.c: Likewise.
For normal optimization for the Arm state in gcc we get an uncompressed
table of jump targets. This is in the middle of the text segment
far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.
Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accomodate this new approach to switch statement generation.
New tests have been added and no regressions on arm-none-eabi.
gcc/ChangeLog:
* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn.
for arm to use new function arm_output_casesi.
gcc/testsuite/ChangeLog:
* gcc.target/arm/arm-switchstatement.c: New test.
Now we have shifted to using the same relocation mechanism as clang for
objective-c typeinfo the static linker needs to have a linker-visible
symbol for metadata names (this is only needed for GNU objective C, for
NeXT the names are in separate sections).
gcc/ChangeLog:
* config/darwin.h
(darwin_label_is_anonymous_local_objc_name): Make metadata names
linker-visibile for GNU objective C.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
RISCV target developers reported that pseudos with equivalence used in
a loop can be spilled. Simple changes of heuristics of cost
calculation of pseudos with equivalence or even ignoring equivalences
resulted in numerous testsuite failures on different targets or worse
spec2017 performance. This patch implements more sophisticated cost
calculations of pseudos with equivalences. The patch does not change
RA behaviour for targets still using the old reload pass instead of
LRA. The patch solves the reported problem and improves x86-64
specint2017 a bit (specfp2017 performance stays the same). The patch
takes into account how the equivalence will be used: will it be
integrated into the user insns or require an input reload insn. It
requires additional pass over insns. To compensate RA slow down, the
patch removes a pass over insns in the reload pass used by IRA before.
This also decouples IRA from reload more and will help to remove the
reload pass in the future if it ever happens.
gcc/ChangeLog:
* dwarf2out.cc (reg_loc_descriptor): Use lra_eliminate_regs when
LRA is used.
* ira-costs.cc: Include regset.h.
(equiv_can_be_consumed_p, get_equiv_regno, calculate_equiv_gains):
New functions.
(find_costs_and_classes): Call calculate_equiv_gains and redefine
mem_cost of pseudos with equivs when LRA is used.
* var-tracking.cc: Include ira.h and lra.h.
(vt_initialize): Use lra_eliminate_regs when LRA is used.
In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such situation from turning into a compile error.
2023-10-20 Paul-Antoine Arras <pa@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/fortran/ChangeLog:
* interface.cc (gfc_compare_types): Return true if one type is C_PTR
and the other is a compatible INTEGER(8).
* misc.cc (gfc_typename): Handle the case where an INTEGER(8) actually
holds a TYPE(C_PTR).
gcc/testsuite/ChangeLog:
* gfortran.dg/c_ptr_tests_20.f90: New test, checking that INTEGER(8)
and TYPE(C_PTR) are recognised as compatible.
* gfortran.dg/c_ptr_tests_21.f90: New test, exercising the error
detection for C_FUNPTR.
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
For the new test case:
const int table[2] = {1, 2};
int foo (char i) { return table[i]; }
compiling with -O2 -mlarge on msp430 we currently see:
Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
which results in the following code:
foo: AND #0xff, R12
RLAM.A #4, R12 { RRAM.A #4, R12
RLAM.A #1, R12
MOVX.W table(R12), R12
RETA
With this patch, we now see:
Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8
foo: MOV.B R12, R12
RLAM.A #1, R12
MOVX.W table(R12), R12
RETA
2023-10-26 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR rtl-optimization/91865
* combine.cc (make_compound_operation): Avoid creating a
ZERO_EXTEND of a ZERO_EXTEND.
gcc/testsuite/ChangeLog
PR rtl-optimization/91865
* gcc.target/msp430/pr91865.c: New test case.
gcc/c/ChangeLog:
* c-typeck.cc (build_vec_cmp): Pass type of arg0 to
truth_type_for.
gcc/cp/ChangeLog:
* typeck.cc (build_vec_cmp): Pass type of arg0 to
truth_type_for.
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md (vcond_mask_<ILASX:mode><ILASX:mode>): Change to
(vcond_mask_<mode><mode256_i>): this.
* config/loongarch/lsx.md (vcond_mask_<ILSX:mode><ILSX:mode>): Change to
(vcond_mask_<mode><mode_i>): this.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: New test.
Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.
gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target
237 | _BitInt(32) b32_v;
| ^~~~~~~
Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.
gcc/testsuite/ChangeLog:
* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.
Per commit a8b522b483 (Subversion r251048)
"Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or
may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent
of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not just on static but
instead on dynamic (run-time) configuration. There did remain a few instances
where we currently still assume that from '#ifdef ASM_OUTPUT_DEF' follows
'TARGET_SUPPORTS_ALIASES'. Change these to 'if (TARGET_SUPPORTS_ALIASES)',
similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
gcc/
* ipa-icf.cc (sem_item::target_supports_symbol_aliases_p):
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);' before
'return true;'.
* ipa-visibility.cc (function_and_variable_visibility): Change
'#ifdef ASM_OUTPUT_DEF' to 'if (TARGET_SUPPORTS_ALIASES)'.
* varasm.cc (output_constant_pool_contents)
[#ifdef ASM_OUTPUT_DEF]:
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
(do_assemble_alias) [#ifdef ASM_OUTPUT_DEF]:
'if (!TARGET_SUPPORTS_ALIASES)',
'gcc_checking_assert (seen_error ());'.
(assemble_alias): Change '#if !defined (ASM_OUTPUT_DEF)' to
'if (!TARGET_SUPPORTS_ALIASES)'.
(default_asm_output_anchor):
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
Set execution count of EH blocks, and probability of EH edges.
for gcc/ChangeLog
PR tree-optimization/111520
* gimple-harden-conditionals.cc
(pass_harden_compares::execute): Set EH edge probability and
EH block execution count.
for gcc/testsuite/ChangeLog
PR tree-optimization/111520
* g++.dg/torture/harden-comp-pr111520.cc: New.
For Darwin, PIE requires PIC codegen, but otherwise is only a link-time
change. For almost all Darwin, we do not report __PIE__; the exception is
32bit X86 and from Darwin12 to 17 only (32 bit is no longer supported
after Darwin17).
gcc/ChangeLog:
* config/darwin.cc (darwin_override_options): Handle fPIE.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Currently, the sed command used to parse --with-{cpu,tune,arch} are
using GNU-specific extension (automatically recognising extended regex).
This is failing on Darwin, which defualts to Posix behaviour.
However '-E' is accepted to indicate an extended RE. Strictly, this
is also not really sufficient, since we should only require a Posix
sed.
gcc/ChangeLog:
* config.gcc: Use -E to to sed to indicate that we are using
extended REs.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Mention front-end uses of the address_space bit-field, and remove the
inaccurate "only".
gcc/ChangeLog:
* tree-core.h (struct tree_base): Update address_space comment.
Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates. This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.
Reviewed-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
Add support for immediates using MOV/EOR bitmask.
gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
* gcc.target/aarch64/moveor_imm.c: Add new test.
* gcc.target/aarch64/pr106583.c: Change tests.
It's incorrect to say that the address of an OFFSET_REF is always a
pointer-to-member; if it represents an overload set with both static and
non-static member functions that ends up resolving to a static one, the
address is a normal pointer. And let's go ahead and mention explicit object
member functions even though the patch hasn't landed yet.
gcc/cp/ChangeLog:
* cp-tree.def: Improve OFFSET_REF comment.
* cp-gimplify.cc (cp_fold_immediate): Add to comment.
Narrow test instructions with immediate operand that test memory location
for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after a large write to the same address causes store-to-load forwarding stall.
PR target/111698
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
New tune.
* config/i386/i386.h (TARGET_PARTIAL_MEMORY_READ_STALL): New macro.
* config/i386/i386.md: New peephole pattern to narrow test
instructions with immediate operands that test memory locations
for zero.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr111698.c: New test.
A common pattern to to append a range to an existing range via union.
This optimizes that process.
* value-range.cc (irange::union_append): New.
(irange::union_): Call union_append when appropriate.
* value-range.h (irange::union_append): New prototype.
gcc/ChangeLog:
* config/loongarch/loongarch.md (get_thread_pointer<mode>):Adds the
instruction template corresponding to the __builtin_thread_pointer
function.
* doc/extend.texi:Add the __builtin_thread_pointer function support
description to the documentation.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/builtin_thread_pointer.c: New test.
We accept the non-dependent call f(e) here ever since the
NON_DEPENDENT_EXPR removal patch r14-4793-gdad311874ac3b3.
I haven't looked closely into why but I suspect wrapping 'e'
in a NON_DEPENDENT_EXPR was causing the argument conversion
to misbehave.
PR c++/99804
gcc/testsuite/ChangeLog:
* g++.dg/template/enum9.C: New test.
In order for std::stacktrace to be used in a shared library, the
libbacktrace symbols need to be built with -fPIC. Add the libtool
-prefer-pic flag to the commands in src/libbacktrace/Makefile so that
the archive contains PIC objects.
libstdc++-v3/ChangeLog:
PR libstdc++/111936
* src/libbacktrace/Makefile.am: Add -prefer-pic to libtool
compile commands.
* src/libbacktrace/Makefile.in: Regenerate.
This patch introduces isnan, isnanf and isnanl to Builtins.def.
It requires fallback functions isnan, isnanf, isnanl to be implemented in
libgm2/libm2pim/wrapc.cc and gm2-libs-ch/wrapc.c.
Access to the GCC builtin isnan tree is provided by adding
an isnan definition and support functions to gm2-gcc/m2builtins.cc.
gcc/m2/ChangeLog:
PR modula2/111955
* gm2-gcc/m2builtins.cc (gm2_isnan_node): New tree.
(DoBuiltinIsnan): New function.
(m2builtins_BuiltInIsnan): New function.
(m2builtins_init): Initialize gm2_isnan_node.
(list_of_builtins): Add define for __builtin_isnan.
* gm2-libs-ch/wrapc.c (wrapc_isnan): New function.
(wrapc_isnanf): New function.
(wrapc_isnanl): New function.
* gm2-libs/Builtins.def (isnanf): New procedure function.
(isnan): New procedure function.
(isnanl): New procedure function.
* gm2-libs/Builtins.mod:
* gm2-libs/wrapc.def (isnan): New function.
(isnanf): New function.
(isnanl): New function.
libgm2/ChangeLog:
PR modula2/111955
* libm2pim/wrapc.cc (isnan): Export new function.
(isnanf): Export new function.
(isnanl): Export new function.
gcc/testsuite/ChangeLog:
PR modula2/111955
* gm2/pimlib/run/pass/testnan.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
This patch adds some RTL-SSA helper functions. They will be
used by the upcoming late-combine pass.
The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc. I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)
gcc/
* Makefile.in (OBJS): Add rtl-ssa/movement.o.
* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
(single_set_info): New functions.
(remove_uses_of_def, accesses_reference_same_resource): Declare.
(insn_clobbers_resources): Likewise.
* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
(rtl_ssa::accesses_reference_same_resource): Likewise.
(rtl_ssa::insn_clobbers_resources): Likewise.
* rtl-ssa/movement.h (can_move_insn_p): Declare.
* rtl-ssa/movement.cc: New file.
The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible. The switch to RTL-SSA was supposed to be a pure
infrastructure change. So RTL-SSA has various FIXMEs for things
that were artifically limited to faciliate the old-fwprop vs.
new-fwprop comparison.
One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.
Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly. This patch does that, and
removes the associated FIXME.
To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive. The only target that seemed to benefit
significantly was i686-apple-darwin.
The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.
gcc/
* rtl-ssa/functions.h (function_info::remains_available_at_insn):
New member function.
* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
Likewise.
(function_info::make_use_available): Avoid false negatives for
queries within an EBB.
rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall. But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency. (We already do something similar for SLP layouts.)
gcc/
* rtl-ssa/changes.cc: Include sreal.h.
(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
scale the cost of each instruction by its execution frequency.
In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.
This patch takes these call clobbers into account in a couple
more routines. I don't think this will have any effect on
existing users, since it's only necessary for hard registers.
gcc/
* rtl-ssa/access-utils.h (next_call_clobbers): New function.
(is_single_dominating_def, remains_available_on_exit): Replace with...
* rtl-ssa/functions.h (function_info::is_single_dominating_def)
(function_info::remains_available_on_exit): ...these new member
functions.
(function_info::m_clobbered_by_calls): New member variable.
* rtl-ssa/functions.cc (function_info::function_info): Explicitly
initialize m_clobbered_by_calls.
* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
m_clobbered_by_calls for each call-clobber note.
* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
New function. Check for call clobbers.
* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
Likewise.
The exit block can have multiple predecessors, for example if the
function calls __builtin_eh_return. We might then need PHI nodes
for values that are live on exit.
RTL-SSA uses the normal dominance frontiers approach for calculating
where PHI nodes are needed. However, dominannce.cc only calculates
dominators for normal blocks, not the exit block.
calculate_dominance_frontiers likewise only calculates dominance
frontiers for normal blocks.
This patch fills in the “missing” frontiers manually.
gcc/
* rtl-ssa/internals.h (build_info::exit_block_dominator): New
member variable.
* rtl-ssa/blocks.cc (build_info::build_info): Initialize it.
(bb_walker::bb_walker): Use it, moving the computation of the
dominator to...
(function_info::process_all_blocks): ...here.
(function_info::place_phis): Add dominance frontiers for the
exit block.
If an optimisation removes the last real use of a definition,
there can still be artificial uses left. This patch removes
those uses too.
These artificial uses exist because RTL-SSA is only an SSA-like
view of the existing RTL IL, rather than a native SSA representation.
It effectively treats RTL registers like gimple vops, but with the
addition of an RPO view of the register's lifetime(s). Things are
structured to allow most operations to update this RPO view in
amortised sublinear time.
gcc/
* rtl-ssa/functions.h (function_info::process_uses_of_deleted_def):
New member function.
* rtl-ssa/changes.cc (function_info::process_uses_of_deleted_def):
Likewise.
(function_info::change_insns): Use it.
Sometimes an optimisation can remove a clobber of scratch registers
or scratch memory. We then need to update the DU chains to reflect
the removed clobber.
For registers this isn't a problem. Clobbers of registers are just
momentary blips in the register's lifetime. They act as a barrier for
moving uses later or defs earlier, but otherwise they have no effect on
the semantics of other instructions. Removing a clobber is therefore a
cheap, local operation.
In contrast, clobbers of memory are modelled as full sets.
This is because (a) a clobber of memory does not invalidate
*all* memory and (b) it's a common idiom to use (clobber (mem ...))
in stack barriers. But removing a set and redirecting all uses
to a different set is a linear operation. Doing it for potentially
every optimisation could lead to quadratic behaviour.
This patch therefore refrains from removing sets of memory that appear
to be redundant. There's an opportunity to clean this up in linear time
at the end of the pass, but as things stand, nothing would benefit from
that.
This is also a very rare event. Usually we should try to optimise the
insn before the scratch memory has been allocated.
gcc/
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
If a change describes a set of memory, ensure that that set
is kept, regardless of the insn pattern.
Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of
false positives by all passes. function_info::change_insns
does this by removing all REG_UNUSED notes, and then using
add_reg_unused_notes to add notes back (or create new ones)
where appropriate.
The problem was that it called add_reg_unused_notes on the fly
while updating each instruction, which meant that the information
for later instructions in the change set wasn't up to date.
This patch does it in a separate loop instead.
gcc/
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove
call to add_reg_unused_notes and instead...
(function_info::change_insns): ...use a separate loop here.
RTL-SSA mostly relies on DF for block-level register liveness
information, including artificial uses and defs at the beginning
and end of blocks. But one case was missing. DF does not add
artificial uses of global registers to the beginning or end
of a block. Instead it marks them as used within every block
when computing LR and LIVE problems.
For RTL-SSA, global registers behave like memory, which in
turn behaves like gimple vops. We need to ensure that they
are live on exit so that final definitions do not appear
to be unused.
Also, the previous live-on-exit handling only considered the exit
block itself. It needs to consider non-local gotos as well, since
they jump directly to some code in a parent function and so do
not have a path to the exit block.
gcc/
* rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force
global registers to be live on exit. Handle any block with zero
successors like an exit block.
As noted in recent commit 3a3596389c
"OpenACC 2.7: Implement self clause for compute constructs", the OpenACC 'self'
clause very much relates to the 'if' clause, and therefore copies a lot of the
latter's handling. Therefore it makes sense to also place this handling in
proximity to that of the 'if' clause, which was done in a lot but not all
instances.
gcc/
* tree-core.h (omp_clause_code): Move 'OMP_CLAUSE_SELF' after
'OMP_CLAUSE_IF'.
* tree-pretty-print.cc (dump_omp_clause): Adjust.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Likewise.
* tree.h: Likewise.
'gcc/c-family/c-pragma.h:pragma_omp_clause' already defines
'PRAGMA_OACC_CLAUSE_SELF', but it has no longer been used for the 'update'
directive's 'self' clause as of 2018
commit 829c6349e9 (Subversion r261813)
"Update OpenACC data clause semantics to the 2.5 behavior". That one instead
mapped the 'self' pragma token to the 'host' one (same semantics). That means
that we're later not able to tell whether originally we had seen 'self' or
'host', which was OK as long as only the 'update' directive had a 'self'
clause. However, as of recent commit 3a3596389c
"OpenACC 2.7: Implement self clause for compute constructs", also OpenACC
compute constructs may have a 'self' clause -- with different semantics. That
means, we need to know which OpenACC directive we're parsing clauses for, which
can be done in a simpler way than in that commit, similar to how the OpenMP
'to' clause is handled.
While at that, clarify that (already in OpenACC 2.0a)
"The 'host' clause is a synonym for the 'self' clause." -- not the other way
round.
gcc/c/
* c-parser.cc (c_parser_omp_clause_name): Return
'PRAGMA_OACC_CLAUSE_SELF' for "self".
(c_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
(c_parser_oacc_all_clauses): Remove 'bool compute_p' formal
parameter, and instead locally determine whether we're called for
an OpenACC compute construct or OpenACC 'update' directive.
(c_parser_oacc_compute): Adjust.
gcc/cp/
* parser.cc (cp_parser_omp_clause_name): Return
'PRAGMA_OACC_CLAUSE_SELF' for "self".
(cp_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
(cp_parser_oacc_all_clauses): Remove 'bool compute_p' formal
parameter, and instead locally determine whether we're called for
an OpenACC compute construct or OpenACC 'update' directive.
(cp_parser_oacc_compute): Adjust.
gcc/fortran/
* openmp.cc (omp_mask2): Split 'OMP_CLAUSE_HOST_SELF' into
'OMP_CLAUSE_SELF', 'OMP_CLAUSE_HOST'.
(gfc_match_omp_clauses, OACC_UPDATE_CLAUSES): Adjust.
As discovered via recent
commit 3a3596389c
"OpenACC 2.7: Implement self clause for compute constructs",
'c-c++-common/goacc/if-clause-1.c', which the new
'c-c++-common/goacc/self-clause-1.c' was copied from, was not enabled for C++.
gcc/testsuite/
* c-c++-common/goacc/if-clause-1.c: Enable for C++
* c-c++-common/goacc/self-clause-1.c: Likewise.
This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host mult-core CPU) as the executing device of the compute region.
The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is still not yet completed, so the libgomp
side is still implemented the exact same as host-fallback mode. (so as of now,
it essentially behaves like the 'if' clause with the condition inverted)
gcc/c/ChangeLog:
* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise,
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_oacc_compute_clause_self): New function.
(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise,
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
* semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
with OMP_CLAUSE_IF case.
gcc/fortran/ChangeLog:
* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
(OACC_KERNELS_CLAUSES): Likewise.
(OACC_SERIAL_CLAUSES): Likewise.
(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
(gfc_split_omp_clauses): Add handling of self_expr field copy.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
(gimplify_adjust_omp_clauses): Likewise.
* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code,
* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
case.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/self-clause-1.c: New test.
* c-c++-common/goacc/self-clause-2.c: New test.
* gfortran.dg/goacc/self.f95: New test.
include/ChangeLog:
* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.
libgomp/ChangeLog:
* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
GOACC_FLAG_LOCAL_DEVICE case.
* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
The 'if' clause with modifier was introduced in
commit b4c3a85be9 (Subversion r242037)
"Partial OpenMP 4.5 fortran support", but -- in some instances -- didn't place
it next to the existing handling of 'if' clause without modifier. Unify that;
no change in behavior.
gcc/fortran/
* dump-parse-tree.cc (show_omp_clauses): Group handling of 'if'
clause without and with modifier.
* frontend-passes.cc (gfc_code_walker): Likewise.
* gfortran.h (gfc_omp_clauses): Likewise.
* openmp.cc (gfc_free_omp_clauses): Likewise.