This change is motivated by a patchset that adds bfloat16 debugging
support for new AVX512 instructions to GDB. The GDB thread can be found
here: https://sourceware.org/pipermail/gdb-patches/2020-July/170820.html
include:
2020-08-17 Felix Willgerodt <felix.willgerodt@intel.com>
* floatformat.h (floatformat_bfloat16_big): New.
(floatformat_bfloat16_little): New.
libiberty:
2020-08-17 Felix Willgerodt <felix.willgerodt@intel.com>
* floatformat.c (floatformat_bfloat16_big): New.
(floatformat_bfloat16_little): New.
PR analyzer/96949 reports an ICE with
--param analyzer-max-svalue-depth=0, where the param value leads
to INTEGER_CST values in a RANGE_EXPR being treated as unknown
symbolic values.
This patch removes the implicit assumption that these values are
concrete (and thus have concrete bit offsets), adding error-handling
for the symbolic cases instead of assertions.
gcc/analyzer/ChangeLog:
PR analyzer/96949
* store.cc (binding_map::apply_ctor_val_to_range): Add
error-handling for the cases where we have symbolic offsets.
gcc/testsuite/ChangeLog:
PR analyzer/96949
* gfortran.dg/analyzer/pr96949.f90: New test.
gcc/analyzer/ChangeLog:
PR analyzer/96950
* store.cc (binding_map::apply_ctor_to_region): Handle RANGE_EXPR
where min_index == max_index.
(binding_map::apply_ctor_val_to_range): Replace assertion that we
don't have a CONSTRUCTOR value with error-handling.
In g:ee7bfbe5eb70a23bbf3a2cedfdcbd2ea1a20c3f2 I added a
switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
to region_model::on_call_pre guarded by
fndecl_built_in_p (callee_fndecl).
I meant to handle only normal built-ins, whereas this
single-argument overload of fndecl_built_in_p returns true for any
kind of built-in.
PR analyzer/96962 reports a case where this matches for a
machine-specific builtin, leading to an ICE. Fixed thusly.
gcc/analyzer/ChangeLog:
PR analyzer/96962
* region-model.cc (region_model::on_call_pre): Fix guard on switch
on built-ins to only consider BUILT_IN_NORMAL, rather than other
kinds of built-ins.
The assembly code ".mspabi_attribute 4,1" uses the object attribute
mechanism to indicate that the 430 ISA is in use. However, the default
ISA is 430X, so GAS fails to assemble this since the ISA wasn't also set
to 430 on the command line.
gcc/ChangeLog:
* config/msp430/msp430.c (msp430_file_end): Fix jumbled
HAVE_AS_MSPABI_ATTRIBUTE and HAVE_AS_GNU_ATTRIBUTE checks.
* configure: Regenerate.
* configure.ac: Use ".mspabi_attribute 4,2" to check for assembler
support for this object attribute directive.
Rather than implementing support within the D runtime itself, use the libc
getcontext/swapcontext functions if CET is enabled.
This removes the CET support code from the switchContext routine in the x86
D runtime, and stops setting version AsmExternal, so that the fallback
ucontext_t implementation, which is capable of doing shadow stack handling,
is used.
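For reference, a minimal standalone sketch of the libc ucontext API the
fallback relies on (plain C/C++ against <ucontext.h>, not D runtime code;
the fiber routine and stack size are made up for illustration):

#include <ucontext.h>
#include <cstdio>

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[64 * 1024];

static void
fiber_fn (void)
{
  std::puts ("in fiber");
  /* Yield back to main; with CET enabled, glibc's swapcontext takes care
     of the shadow stack, which is why the runtime defers to it.  */
  swapcontext (&fiber_ctx, &main_ctx);
}

int
main (void)
{
  getcontext (&fiber_ctx);
  fiber_ctx.uc_stack.ss_sp = fiber_stack;
  fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
  fiber_ctx.uc_link = &main_ctx;
  makecontext (&fiber_ctx, fiber_fn, 0);
  swapcontext (&main_ctx, &fiber_ctx);  /* switch into the fiber */
  std::puts ("back in main");
  return 0;
}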
libphobos/ChangeLog:
PR d/95680
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac (DCFG_ENABLE_CET): Substitute.
* libdruntime/Makefile.in: Regenerate.
* libdruntime/config/x86/switchcontext.S: Remove CET support code.
* libdruntime/core/thread.d: Import gcc.config. Don't set version
AsmExternal when GNU_Enable_CET is true.
* libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
The -mcpu= option accepts only a handful of string values.
Using enums instead of strings to handle the accepted values removes the
need to have specific processing of the strings in the backend, and
simplifies any comparisons which need to be performed on the value.
It also allows the default value to have semantic equivalence to a
user-set value, whilst retaining the ability to differentiate between them.
Practically, this allows a user-set -mcpu= value to override the ISA set by
-mmcu, whilst the default -mcpu= value can still have an explicit meaning.
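As a rough sketch of the idea (the enumerator names below are illustrative,
not necessarily those added to msp430-opts.h):

/* Hypothetical sketch: a dedicated enumerator marks the default choice so
   -mmcu= handling can tell it apart from an explicit -mcpu=, while both
   still mean the same ISA.  */
enum msp430_cpu_type
{
  MSP430_CPU_430,
  MSP430_CPU_430X,
  MSP430_CPU_430XV2,
  MSP430_CPU_430X_DEFAULT  /* 430X ISA, chosen by default rather than by the user.  */
};

/* With an enum, ISA checks become plain comparisons instead of strcmp calls.  */
static bool
uses_430x_isa (enum msp430_cpu_type cpu)
{
  return cpu != MSP430_CPU_430;
}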
gcc/ChangeLog:
* common/config/msp430/msp430-common.c (msp430_handle_option): Remove
OPT_mcpu_ handling.
Set target_cpu value to new enum values when parsing certain -mmcu=
values.
* config/msp430/msp430-opts.h (enum msp430_cpu_types): New.
* config/msp430/msp430.c (msp430_option_override): Handle new
target_cpu enum values.
Set target_cpu using extracted value for given MCU when -mcpu=
option is not passed by the user.
* config/msp430/msp430.opt: Handle -mcpu= values using enums.
gcc/testsuite/ChangeLog:
* gcc.target/msp430/mcpu-is-430.c: New test.
* gcc.target/msp430/mcpu-is-430x.c: New test.
* gcc.target/msp430/mcpu-is-430xv2.c: New test.
Running the libiberty testsuite
  ./test-demangle < libiberty/testsuite/d-demangle-expected
reports
  libiberty/d-demangle.c:214:14: runtime error: signed integer overflow: 922337203 * 10 cannot be represented in type 'long int'
On looking at silencing ubsan, I found a real bug in dlang_number.
For a 32-bit long, some overflows won't be detected. For example,
21474836480. Why? Well 214748364 * 10 is 0x7FFFFFF8 (no overflow so
far). Adding 8 gives 0x80000000 (which does overflow but there is no
test for that overflow in the code). Then multiplying 0x80000000 * 10
= 0x500000000 = 0 won't be caught by the multiplication overflow test.
The same holds for a 64-bit long using similarly crafted digit
sequences.
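A standalone sketch of the kind of overflow test the fix uses (check before
each multiply/add, accumulate in an unsigned long, and restrict the result
to [0, UINT_MAX]); the function below is illustrative, not the actual
d-demangle.c code:

#include <climits>

/* Illustrative only: accumulate a decimal number without intermediate
   wrap-around, failing if the value would exceed UINT_MAX.  */
static bool
parse_decimal (const char *&p, unsigned long &ret)
{
  if (*p < '0' || *p > '9')
    return false;
  unsigned long val = 0;
  while (*p >= '0' && *p <= '9')
    {
      unsigned long digit = (unsigned long) (*p - '0');
      /* Reject before multiplying/adding, so nothing ever overflows.  */
      if (val > (ULONG_MAX - digit) / 10)
        return false;
      val = val * 10 + digit;
      ++p;
    }
  if (val > UINT_MAX)
    return false;
  ret = val;  /* Written only on success, mirroring the patch.  */
  return true;
}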
* d-demangle.c: Include limits.h.
(ULONG_MAX, UINT_MAX): Provide fall-back definition.
(dlang_number): Simplify and correct overflow test. Only
write *ret on returning non-NULL. Make "ret" an unsigned long*.
Only succeed for result of [0,UINT_MAX].
(dlang_decode_backref): Simplify and correct overflow test.
Only write *ret on returning non-NULL. Only succeed for
result in [1, LONG_MAX].
(dlang_backref): Remove now unnecessary range check.
(dlang_symbol_name_p): Likewise.
(string_need): Take a size_t n arg, and use size_t tem.
(string_append): Use size_t n.
(string_appendn, string_prependn): Take a size_t n arg.
(TEMPLATE_LENGTH_UNKNOWN): Define as -1UL.
(dlang_lname, dlang_parse_template): Take an unsigned long len
arg.
(dlang_symbol_backref, dlang_identifier, dlang_parse_integer),
(dlang_parse_integer, dlang_parse_string),
(dlang_parse_arrayliteral, dlang_parse_assocarray),
(dlang_parse_structlit, dlang_parse_tuple),
(dlang_template_symbol_param, dlang_template_args): Use
unsigned long variables.
* testsuite/d-demangle-expected: Add new tests.
When rounding a real to the nearest integer, temporarily convert the real
argument to a longer real kind when the result is of type/kind integer(16).
gcc/fortran/ChangeLog:
* trans-intrinsic.c (build_round_expr): Use temporary with
appropriate kind for conversion before rounding to nearest
integer when the result precision is 128 bits.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr96711.f90: New test.
This PR is about LRA cycling for a reload of the form:
----------------------------------------------------------------------------
Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
103: r203:SI=r288:SI<<0x1+r196:DI#0
REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
316: r287:DI=[r105:DI*0x8+r140:DI]
317: r288:SI=r287:DI#0
----------------------------------------------------------------------------
The problem is with r287. We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).
I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:
(1) a restricted class is specified when the pseudo is created
This happens for input address reloads, where the class is taken
from the target's chosen base register class. It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.
(2) uses of the reload pseudo as a direct input operand
In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.
However, neither occurs here. As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1). And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):
----------------------------------------------------------------------------
Choosing alt 13 in insn 317: (0) r (1) w {*movsi_aarch64}
Creating newreg=291, assigning class FP_REGS to r291
317: r288:SI=r291:SI
Inserting insn reload before:
320: r291:SI=r287:DI#0
----------------------------------------------------------------------------
IMO, in this case we should rely on reload insn 316 to narrow
down the class of r287.  Currently we do:
----------------------------------------------------------------------------
Choosing alt 7 in insn 316: (0) r (1) m {*movdi_aarch64}
Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
318: r287:DI=r289:DI
----------------------------------------------------------------------------
i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead. This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.
But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above). And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.
The patch is quite aggressive in that it does this for all reload
pseudos in all reload instructions. I wondered about reusing the
condition for a reload move in in_class_p:
INSN_UID (curr_insn) >= new_insn_uid_start
&& curr_insn_set != NULL
&& ((OBJECT_P (SET_SRC (curr_insn_set))
&& ! CONSTANT_P (SET_SRC (curr_insn_set)))
|| (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
&& OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
&& ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))))))
but I can't really justify that on first principles. I think we
should apply the rule consistently until we have a specific reason
for doing otherwise.
gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter. Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.
gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
We can simplify this constexpr function further because we know that
period::num >= 1 and period::den >= 1, so only the remainder can ever be
zero.
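A sketch of the resulting shape of the function, assuming (as holds for a
duration's period) that both inputs are at least 1, so only the remainder
needs a zero test; this is illustrative, not the libstdc++ source:

#include <cstdint>

constexpr std::intmax_t
gcd_of_positives (std::intmax_t m, std::intmax_t n)
{
  /* m >= 1 and n >= 1 on entry, so neither operand needs its own zero
     check; the loop stops once the remainder reaches zero.  */
  do
    {
      std::intmax_t rem = m % n;
      m = n;
      n = rem;
    }
  while (n != 0);
  return m;
}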
libstdc++-v3/ChangeLog:
* include/std/chrono (duration::_S_gcd): Use invariant that
neither value is zero initially.
gcc/ChangeLog
2020-09-07 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
dead-code removal introduced by 09fa6acd8d + add a comment to
clarify.
In d8487c949a, MODE_PARTIAL_INT modes were changed from having an
unknown number of undefined bits to having a known number of undefined
bits; however, the documentation on using SUBREG expressions with
MODE_PARTIAL_INT modes was not updated to reflect this.
gcc/ChangeLog:
* doc/rtl.texi (subreg): Fix documentation to state there is a known
number of undefined bits in regs and subregs of MODE_PARTIAL_INT modes.
430X is the default ISA under normal operation, so even when the MCU name
passed to -mmcu= is unrecognized, it should not be overridden.
gcc/ChangeLog:
* config/msp430/msp430.c (msp430_option_override): Don't set the
ISA to 430 when the MCU is unrecognized.
gcc/testsuite/ChangeLog:
* gcc.target/msp430/430x-default-isa.c: New test.
Recent changes in debug output have resulted in a change
in the length of the pub types info. This updates the tests to
reflect the new length.
gcc/testsuite/ChangeLog:
* gcc.dg/pubtypes-2.c: Amend Pub Info Length.
* gcc.dg/pubtypes-3.c: Likewise.
* gcc.dg/pubtypes-4.c: Likewise.
Following on from the previous commit to fix up the syntax for
add/sub/adds/subs and friends with a sign/zero-extended operand, this
patch removes the "mult" variants of these patterns which are all
redundant.
This patch removes the following patterns from the AArch64 backend:
*adds_mul_imm_<mode>
*subs_mul_imm_<mode>
*adds_<optab><mode>_multp2
*subs_<optab><mode>_multp2
*add_mul_imm_<mode>
*add_<optab><ALLX:mode>_mult_<GPI:mode>
*add_<optab><SHORT:mode>_mult_si_uxtw
*add_<optab><mode>_multp2
*add_<optab>si_multp2_uxtw
*add_uxt<mode>_multp2
*add_uxtsi_multp2_uxtw
*sub_mul_imm_<mode>
*sub_mul_imm_si_uxtw
*sub_<optab><mode>_multp2
*sub_<optab>si_multp2_uxtw
*sub_uxt<mode>_multp2
*sub_uxtsi_multp2_uxtw
*neg_mul_imm_<mode>2
*neg_mul_imm_si2_uxtw
Together with the following predicates which were used only by these
patterns:
aarch64_pwr_imm3
aarch64_pwr_2_si
aarch64_pwr_2_di
These patterns are all redundant since multiplications by powers of two
should be represented as shifts outside a (mem).
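For example, in the function below the scaling by 8 happens outside a memory
reference, so it is canonically a shift and only the shift-based patterns are
needed (illustrative; compiled at -O2 for aarch64):

/* p + x scales x by 8; outside a (mem) this is represented as a shift,
   so the addition matches the shift patterns, e.g.
     add x0, x0, x1, lsl 3  */
long *
g (long *p, unsigned long x)
{
  return p + x;
}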
gcc/ChangeLog:
* config/aarch64/aarch64.md (*adds_mul_imm_<mode>): Delete.
(*subs_mul_imm_<mode>): Delete.
(*adds_<optab><mode>_multp2): Delete.
(*subs_<optab><mode>_multp2): Delete.
(*add_mul_imm_<mode>): Delete.
(*add_<optab><ALLX:mode>_mult_<GPI:mode>): Delete.
(*add_<optab><SHORT:mode>_mult_si_uxtw): Delete.
(*add_<optab><mode>_multp2): Delete.
(*add_<optab>si_multp2_uxtw): Delete.
(*add_uxt<mode>_multp2): Delete.
(*add_uxtsi_multp2_uxtw): Delete.
(*sub_mul_imm_<mode>): Delete.
(*sub_mul_imm_si_uxtw): Delete.
(*sub_<optab><mode>_multp2): Delete.
(*sub_<optab>si_multp2_uxtw): Delete.
(*sub_uxt<mode>_multp2): Delete.
(*sub_uxtsi_multp2_uxtw): Delete.
(*neg_mul_imm_<mode>2): Delete.
(*neg_mul_imm_si2_uxtw): Delete.
* config/aarch64/predicates.md (aarch64_pwr_imm3): Delete.
(aarch64_pwr_2_si): Delete.
(aarch64_pwr_2_di): Delete.
Given the following C function:
double *f(double *p, unsigned x)
{
return p + x;
}
prior to this patch, GCC at -O2 would generate:
f:
add x0, x0, x1, uxtw 3
ret
but this add instruction uses architecturally-invalid syntax: the width
of the third operand conflicts with the width of the extension
specifier. The third operand is only permitted to be an x register when
the extension specifier is (u|s)xtx.
This instruction, and analogous insns for adds, sub, subs, and cmp, are
rejected by clang, but accepted by binutils. Assembling and
disassembling such an insn with binutils gives the architecturally-valid
version in the disassembly:
0: 8b214c00 add x0, x0, w1, uxtw #3
This patch fixes several patterns in the AArch64 backend to use the
standard syntax as specified in the Arm ARM such that GCC's output can
be assembled by assemblers other than GAS.
gcc/ChangeLog:
* config/aarch64/aarch64.md
(*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
agrees with width of extension specifier.
(*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
(*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
(*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
(*add_uxt<mode>_shift2): Likewise.
(*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
(*sub_uxt<mode>_shift2): Likewise.
(*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
(*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
* gcc.target/aarch64/cmp.c: Likewise.
* gcc.target/aarch64/subs3.c: Likewise.
* gcc.target/aarch64/subsp.c: Likewise.
* gcc.target/aarch64/extend-syntax.c: New test.
This adds additional dumping that helps in particular with reading
basic-block vectorization SLP dumps, plus it shows what we actually
generate code from.
2020-09-07 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Dump
stmts we start SLP analysis from, failure and splitting.
(vect_schedule_slp): Dump SLP graph entry and root stmt
we are about to emit code for.
This fixes compilation of the code paths for DOS-like filesystems
with Clang.  Clang treats C input files as C++ when the compiler
driver is invoked in C++ mode, which triggers errors when the return
value of strchr() on a pointer to const is assigned to a pointer to
non-const variable.
This matches similar variables outside of the ifdefs for DOS-like
path handling.
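A small standalone example of the issue (the variable names are made up;
the point is the const-correct strchr overloads that <cstring> provides
in C++):

#include <cstring>

void
split_path (const char *path)
{
  /* In C++, strchr on a pointer-to-const returns a pointer-to-const,
     so the result must be stored in a const char *.  */
  const char *sep = std::strchr (path, '\\');  /* OK */
  /* char *bad = std::strchr (path, '\\');        error in C++ mode */
  (void) sep;
}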
2020-09-07 Martin Storsjö <martin@martin.st>
gcc/
* dwarf2out.c (file_name_acquire): Make a strchr return value
pointer to const.
libcpp/
* files.c (remap_filename): Make a strchr return value pointer
to const.
gcc/fortran/ChangeLog:
PR fortran/96896
* resolve.c (get_temp_from_expr): Also reset proc_pointer +
use_assoc attribute.
(resolve_ptr_fcn_assign): Use information from the LHS.
gcc/testsuite/ChangeLog:
PR fortran/96896
* gfortran.dg/ptr_func_assign_4.f08: Update dg-error.
* gfortran.dg/ptr-func-3.f90: New test.
When compiling atomic-generic.c from the libatomic testsuite, we run into:
...
$ gcc src/libatomic/testsuite/libatomic.c/atomic-generic.c -latomic
src/libatomic/testsuite/libatomic.c/atomic-generic.c: In function ‘main’:
src/libatomic/testsuite/libatomic.c/atomic-generic.c:31:7: warning: \
implicit declaration of function ‘memcmp’ [-Wimplicit-function-declaration]
if (memcmp (&a, &zero, size))
^~~~~~
...
Fix this by adding the missing string.h include.
Tested on x86_64.
libatomic/ChangeLog:
* testsuite/libatomic.c/atomic-generic.c: Include string.h.
The following patch adds streaming of edge goto_locus (both LOCATION_LOCUS
and LOCATION_BLOCK from it); the PR shows a testcase (inappropriate for the
gcc testsuite) where the lack of goto_locus streaming results in worse
debug info.
An earlier version of the patch (without the output_function changes) failed
miserably because of the order mismatch: input_function would
first call input_cfg, then input_eh_regions and then input_bb (all of which now
have locations), while output_function used output_eh_regions, then output_bb
and then output_cfg.  *_cfg went to a separate stream...
Now, is there a reason why the order is different?
If the intent is that the cfg could be read separately from the rest of the
function or vice versa, we would instead need to call clear_line_info ()
before output_eh_regions and before/after output_cfg to make them
independent.
2020-09-07 Jakub Jelinek <jakub@redhat.com>
PR debug/94235
* lto-streamer-out.c (output_cfg): Also stream goto_locus for edges.
Use bp_pack_var_len_unsigned instead of streamer_write_uhwi to stream
e->dest->index and e->flags.
(output_function): Call output_cfg before output_ssa_name, rather than
after streaming all bbs.
* lto-streamer-in.c (input_cfg): Stream in goto_locus for edges.
Use bp_unpack_var_len_unsigned instead of streamer_read_uhwi to stream
in dest_index and edge_flags.
The following adds the capability to code-generate live lanes in
basic-block vectorization using lane extracts from vector stmts
rather than keeping the original scalar code around for those.
This eventually makes previously unprofitable vectorizations
profitable (the live scalar code was appropriately costed, as
are the lane extracts now); without considering the cost model,
this patch doesn't add or remove any basic-block vectorization
capabilities.
The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
mode to tell whether a live lane is vectorized or whether it is
provided by means of keeping the scalar code live.
The patch is a first step towards vectorizing sequences of
stmts that do not end up in stores or vector constructors though.
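A hypothetical example of the situation this enables (not one of the new
testcases):

/* The four adds form a basic-block SLP group feeding the stores, while t0
   is also needed for the return value, i.e. it is "live" outside the SLP
   instance.  With this change that live use can be code-generated as a
   lane extract from the vector result instead of keeping the scalar add.  */
int a[4], b[4], c[4];

int
f (void)
{
  int t0 = a[0] + b[0];
  c[0] = t0;
  c[1] = a[1] + b[1];
  c[2] = a[2] + b[2];
  c[3] = a[3] + b[3];
  return t0;
}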
Bootstrapped and tested on x86_64-unknown-linux-gnu.
2020-09-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_live_operation): Adjust.
* tree-vect-loop.c (vectorizable_live_operation): Vectorize
live lanes out of basic-block vectorization nodes.
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
(vect_slp_analyze_operations): Analyze live lanes and their
vectorization possibility after the whole SLP graph is final.
(vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
* tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
(vect_transform_stmt): Call can_vectorize_live_stmts also for
basic-block vectorization.
* gcc.dg/vect/bb-slp-46.c: New testcase.
* gcc.dg/vect/bb-slp-47.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Adjust.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr92658-avx512bw-trunc.c: Add
-mprefer-vector-width=512 to avoid the impact of the different default
tuning that gcc is built with.
Array concatenation expressions were creating more SAVE_EXPRs than
necessary.  The internal error itself was the result of a forced
temporary being made on a TREE_ADDRESSABLE type.
gcc/d/ChangeLog:
PR d/96924
* expr.cc (ExprVisitor::visit (CatAssignExp *)): Don't force
temporaries needlessly.
gcc/testsuite/ChangeLog:
PR d/96924
* gdc.dg/simd13927b.d: Removed.
* gdc.dg/pr96924.d: New test.
This refines the previous fix for PR96698 by re-doing how and where
we arrange for setting vectorized cycle PHI backedge values.
2020-09-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized
cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New
helper to set vectorized cycle PHI latch values.
(vect_transform_loop): Walk over all PHIs again after
vectorizing them, calling maybe_set_vectorized_backedge_value.
Call maybe_set_vectorized_backedge_value for each vectorized
stmt. Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize
SLP instance reduc_phis member.
(vect_schedule_slp): Set vectorized cycle PHI latch values.
* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.
gcc/ChangeLog
2020-09-04 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Remove
dead code as LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) is
always verified.
The testcase shows that we fail to clear gimple_call_ctrl_altering_p
when the last abnormal edge goes away, which causes an edge insertion
on a loop header edge to split that edge unnecessarily even though we
have preheaders.
The following addresses this by more aggressively clearing the
flag in cleanup_call_ctrl_altering_flag.
2020-09-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/96931
* tree-cfgcleanup.c (cleanup_call_ctrl_altering_flag): If
there's a fallthru edge and no abnormal edge the call is
no longer control-altering.
(cleanup_control_flow_bb): Pass down the BB to
cleanup_call_ctrl_altering_flag.
* gcc.dg/pr96931.c: New testcase.
As discussed yesterday, stream_input_location_now is still used in 3
places.  For ERT_MUST_NOT_THROW, I believe the failure_loc
location is stable at least until the apply_cache after the bbs are all
read, and the locations do not include BLOCK, so we can use normal
stream_input_location.  The two locations in input_struct_function_base also
shouldn't include BLOCK and are stable at least until that same apply_cache
after reading all bbs, so again we can use the location cache.
2020-09-04 Jakub Jelinek <jakub@redhat.com>
* lto-streamer.h (stream_input_location_now): Remove declaration.
* lto-streamer-in.c (stream_input_location_now): Remove.
(input_eh_region, input_struct_function_base): Use
stream_input_location instead of stream_input_location_now.
As discussed yesterday: on the streamer-out side, we call clear_line_info
in multiple spots, which resets the current_* values to something, but on the
reader side we don't have corresponding resets in the same locations; we just
have the stream_* static variables that keep the current values through the
entire stream-in (so across all the clear_line_info spots in a single LTO
object, but also across jumping from one LTO object to another one).
Now, an earlier version of my patch actually broke LTO bootstrap
(and a lot of LTO testcases), so for the BLOCK case I've solved it by having
clear_line_info set current_block to something that should never appear,
which means that in the LTO stream, after the clear_line_info spots (including
the start of the LTO stream), we force the block-change bit to be set and thus
BLOCK to be streamed, and therefore the earlier stream_block to be
ignored.  But for the rest I think that is not the case, so I wonder whether we
don't sometimes end up with wrong line/column info because of that; if not,
please tell me what prevents that.
clear_line_info does:
ob->current_file = NULL;
ob->current_line = 0;
ob->current_col = 0;
ob->current_sysp = false;
While I think a NULL current_file is something that should likely be different
from expanded_location (...).file (UNKNOWN_LOCATION/BUILTINS_LOCATION are
handled separately and do not go through the caching), I think line number 0
can sometimes occur, and especially column 0 occurs frequently if we have run
out of location_t values with column info.  But then we do:
bp_pack_value (bp, ob->current_file != xloc.file, 1);
bp_pack_value (bp, ob->current_line != xloc.line, 1);
bp_pack_value (bp, ob->current_col != xloc.column, 1);
and stream the details only if the != is true.  If that happens immediately
after clear_line_info and e.g. xloc.column is 0, we would stream a 0 bit and
not stream the actual value, so on read-in it would reuse whatever
stream_col etc. were before.  Shouldn't we set some new ob->current_* bit
that would signal we are immediately past clear_line_info, which would force
all these != checks to be true?  Either by oring something into those
tests, or perhaps:
if (ob->current_reset)
{
if (xloc.file == NULL)
ob->current_file = "";
if (xloc.line == 0)
ob->current_line = 1;
if (xloc.column == 0)
ob->current_col = 1;
ob->current_reset = false;
}
before doing those bp_pack_value calls with a comment, effectively forcing
all 6 != comparisons to be true?
2020-09-04 Jakub Jelinek <jakub@redhat.com>
* lto-streamer.h (struct output_block): Add reset_locus member.
* lto-streamer-out.c (clear_line_info): Set reset_locus to true.
(lto_output_location_1): If reset_locus, clear it and ensure
current_{file,line,col} is different from xloc members.
This patch updates the BPF back end to generate indirect calls via
the 'call %reg' instruction when targeting xBPF.
Additionally, the BPF ASM_SPEC is updated to pass along -mxbpf to
gas, where it is now supported.
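For instance, a call through a function pointer like the one below can now
be emitted as a 'call %reg' instruction when compiling with -mxbpf
(a testcase-style sketch, not the new test itself):

/* With -mxbpf, the indirect call below can be emitted via "call %reg";
   plain eBPF has no encoding for indirect calls.  */
int
apply (int (*fn) (int), int x)
{
  return fn (x);
}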
2020-09-03 David Faust <david.faust@oracle.com>
gcc/
* config/bpf/bpf.h (ASM_SPEC): Pass -mxbpf to gas, if specified.
* config/bpf/bpf.c (bpf_output_call): Support indirect calls in xBPF.
gcc/testsuite/
* gcc.target/bpf/xbpf-indirect-call-1.c: New test.
This patch cleans up the existing rs6000 test targets p8 and p9+,
replacing them with a combination of the existing has_arch_pwr8 and
has_arch_pwr9 targets, or just one of them.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr92398.p9+.c: Replace p9+ with has_arch_pwr9.
* gcc.target/powerpc/pr92398.p9-.c: Replace p9+ with has_arch_pwr9,
and replace p8 with has_arch_pwr8 && !has_arch_pwr9.
* lib/target-supports.exp (check_effective_target_p8): Remove.
(check_effective_target_p9+): Remove.