This partially fixes PR target/53440, at least in ARM and
Thumb2 state. I haven't yet managed to get my head around
rewriting the Thumb1 support.
Tested on armhf with a bootstrap and regression test
with no regressions.
Queued for stage1 now as it isn't technically a regression.
regards
Ramana
2016-05-13 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
PR target/53440
* config/arm/arm.c (arm32_output_mi_thunk): New.
(arm_output_mi_thunk): Rename to arm_thumb1_mi_thunk. Rework
to split Thumb1 vs TARGET_32BIT functionality.
(arm_thumb1_mi_thunk): New.
* g++.dg/inherit/thunk1.C: Support arm / aarch64.
From-SVN: r236198
The reason this caught my eye on aarch64 is that
the return value register (x0) is not identical to the register in which
the hidden parameter is passed on AArch64 (x8). Thus setting this to true
seems quite reasonable, and it shaves off 100-odd "mov x0, x8" instructions
from cc1 in a bootstrap build.
I don't expect this to have a huge impact on performance, but as they
say, every little bit counts. The AAPCS64 is quite explicit about not
requiring that the contents of x8 be kept live.
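For illustration, a minimal sketch of the kind of function affected (the
struct and function names are made up, not from the patch):

    /* big_t is larger than 16 bytes, so under AAPCS64 the caller passes the
       address of the return slot in x8.  With TARGET_OMIT_STRUCT_RETURN_REG
       set to true, the epilogue no longer copies x8 back into x0, since the
       ABI does not require that address to be live in any register on
       return.  */
    struct big_t { long a, b, c; };  /* 24 bytes: returned via memory.  */

    struct big_t
    make_big (long x)
    {
      struct big_t r = { x, x + 1, x + 2 };
      return r;  /* This used to end with a redundant "mov x0, x8".  */
    }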
Bootstrapped and regression tested on aarch64.
OK to apply?
Ramana
gcc/
* config/aarch64/aarch64.c (TARGET_OMIT_STRUCT_RETURN_REG): Set to
true.
gcc/testsuite/
* gcc.target/aarch64/struct_return.c: New test.
From-SVN: r236197
* config/i386/i386.md (*call_got_x32): Change operand 0 to
DImode before it is passed to ix86_output_call_operand.
(*call_value_got_x32): Ditto for operand 1.
From-SVN: r236182
2016-05-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/71059
* tree-ssa-pre.c (phi_translate_1): Fully fold translated
nary before looking up or entering the expression into the VN
hashes.
* tree-ssa-sccvn.c (vn_nary_build_or_lookup): Fix comment typo.
Make sure to re-use NARYs without result as inserted by
phi-translation.
* gcc.dg/torture/pr71059.c: New testcase.
From-SVN: r236175
2016-05-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/71062
* tree-ssa-alias.h (struct pt_solution): Add vars_contains_restrict
field.
* tree-ssa-structalias.c (set_uids_in_ptset): Set vars_contains_restrict
if the var is a restrict tag.
* tree-ssa-alias.c (ptrs_compare_unequal): If vars_contains_restrict
do not disambiguate pointers against it.
(dump_points_to_solution): Re-structure and adjust for new
vars_contains_restrict flag.
* gimple-pretty-print.c (pp_points_to_solution): Likewise.
* gcc.dg/torture/pr71062.c: New testcase.
From-SVN: r236174
PR target/70830
* config/arm/arm.c (arm_output_multireg_pop): Avoid POP instruction
when popping the PC and within an interrupt handler routine.
Add missing tab to output of "ldmfd".
(output_return_instruction): Output LDMFD with SP update rather
than POP when returning from interrupt handler.
* gcc.target/arm/interrupt-1.c: Change dg-compile to dg-assemble.
Add -save-temps to dg-options.
Scan for ldmfd rather than pop instruction.
* gcc.target/arm/interrupt-2.c: Likewise.
* gcc.target/arm/pr70830.c: New test.
From-SVN: r236169
* config/i386/i386.md (isa): Add x64_avx512dq, enable if
TARGET_64BIT && TARGET_AVX512DQ.
* config/i386/sse.md (*vec_extract<mode>): Add avx512bw alternatives.
(*vec_extract<PEXTR_MODE12:mode>_zext): Add avx512bw alternative.
(*vec_extract<ssevecmodelower>_0, *vec_extractv4si_0_zext,
*vec_extractv2di_0_sse): Use v constraint instead of x constraint.
(*vec_extractv4si): Add avx512dq and avx512bw alternatives.
(*vec_extractv4si_zext): Add avx512dq alternative.
(*vec_extractv2di_1): Add x64_avx512dq and avx512bw alternatives,
use v instead of x constraint in other alternatives where possible.
* gcc.target/i386/avx512bw-vpextr-1.c: New test.
* gcc.target/i386/avx512dq-vpextr-1.c: New test.
From-SVN: r236167
* config/i386/sse.md (pinsr_evex_isa): New mode attr.
(<sse2p4_1>_pinsr<ssemodesuffix>): Add 2 alternatives with
v constraints instead of x and <pinsr_evex_isa> isa attribute.
* gcc.target/i386/avx512bw-vpinsr-1.c: New test.
* gcc.target/i386/avx512dq-vpinsr-1.c: New test.
* gcc.target/i386/avx512vl-vpinsr-1.c: New test.
From-SVN: r236165
PR target/71019
* config/i386/sse.md (<sse2_avx2>_packssdw<mask_name>,
<sse4_1_avx2>_packusdw<mask_name>): Make sure EVEX encoded insn
is not emitted unless TARGET_AVX512BW.
(<sse2_avx2>_packuswb<mask_name>, <sse2_avx2>_packsswb<mask_name>):
Likewise. For TARGET_AVX512BW, use "=v" constraint instead of "=x"
for the result operand.
* gcc.target/i386/avx512vl-pack-1.c: New test.
* gcc.target/i386/avx512vl-pack-2.c: New test.
* gcc.target/i386/avx512bw-pack-2.c: New test.
From-SVN: r236163
* config/i386/sse.md (*vec_setv4sf_sse4_1, sse4_1_insertps): Use v
constraint instead of x in avx alternatives. Use maybe_evex instead
of vex prefix.
* gcc.target/i386/avx512vl-vinsertps-1.c: New test.
From-SVN: r236162
* config/i386/constraints.md (Yv): New constraint.
* config/i386/i386.h (VALID_AVX512VL_128_REG_MODE): Allow
TFmode and V1TImode in xmm16+ registers for TARGET_AVX512VL.
* config/i386/i386.md (avx512fvecmode): New mode attr.
(*pushtf): Use v constraint instead of x.
(*movtf_internal): Likewise. For TARGET_AVX512VL and
xmm16+ registers, use vmovdqu64 or vmovdqa64 instructions.
(*absneg<mode>2): Use Yv constraint instead of x constraint.
(*absnegtf2_sse): Likewise.
(copysign<mode>3_const, copysign<mode>3_var): Likewise.
* config/i386/sse.md (*andnot<mode>3): Add avx512vl and
avx512f alternatives.
(*andnottf3, *<code><mode>3, *<code>tf3): Likewise.
* gcc.target/i386/avx512dq-abs-copysign-1.c: New test.
* gcc.target/i386/avx512vl-abs-copysign-1.c: New test.
* gcc.target/i386/avx512vl-abs-copysign-2.c: New test.
From-SVN: r236161
2016-05-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/71060
* tree-data-ref.c (initialize_data_dependence_relation): Do not
require exact match of DR_BASE_OBJECT but only matching address and
type.
From-SVN: r236159
2016-05-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/pr70963.c: Require at least power8 at both
compile and run time.
From-SVN: r236146
[gcc]
2016-05-11 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/predicates.md (quad_memory_operand): Move most of
the code into quad_address_p and call it to share code with
vsx_quad_dform_memory_operand.
(vsx_quad_dform_memory_operand): New predicate for ISA 3.0 vector
d-form support.
* config/rs6000/rs6000.opt (-mlra): Switch to being an option mask
bit instead of being a separate word. Split -mpower9-dform into
two switches, -mpower9-dform-scalar and -mpower9-dform-vector.
* config/rs6000/rs6000.c (RELOAD_REG_QUAD_OFFSET): New addr_mask
for the register class supporting 128-bit quad word memory
offsets.
(mode_supports_vsx_dform_quad): Helper function to return if the
register class uses quad word memory offsets.
(rs6000_debug_addr_mask): Add support for quad word memory
offsets.
(rs6000_debug_reg_global): Always print whether we are using LRA or
not.
(rs6000_setup_reg_addr_masks): If ISA 3.0 vector d-form
instructions are enabled, set up the appropriate addr_masks for
128-bit types.
(rs6000_init_hard_regno_mode_ok): wb constraint is now based on
-mpower9-dform-scalar, instead of -mpower9-dform.
(rs6000_option_override_internal): Split -mpower9-dform into two
switches, -mpower9-dform-scalar and -mpower9-dform-vector. The
-mpower9-dform switch sets or clears both. If we are not using
the LRA register allocator, do not enable -mpower9-dform-vector by
default. If we are using LRA, enable -mpower9-dform-vector and
-mvsx-timode if it is appropriate. Issue a warning if either
-mpower9-dform-vector or -mvsx-timode is explicitly used without
enabling LRA.
(quad_address_offset_p): New helper function to return if the
offset is legal for quad word memory instructions.
(quad_address_p): New function to determine if GPR or vector
register quad word memory addresses are legal.
(mem_operand_gpr): Validate quad word address offsets.
(reg_offset_addressing_ok_p): Add support for ISA 3.0 vector
d-form (register + offset) instructions.
(offsettable_ok_by_alignment): Likewise.
(rs6000_legitimate_offset_address_p): Likewise.
(legitimate_lo_sum_address_p): Likewise.
(rs6000_legitimize_address): Likewise.
(rs6000_legitimize_reload_address): Add more debug statements for
-mdebug=addr.
(rs6000_legitimate_address_p): Add support for ISA 3.0 vector
d-form instructions.
(rs6000_secondary_reload_memory): Add support for ISA 3.0 vector
d-form instructions. Distinguish different cases in debug
output.
(rs6000_secondary_reload_inner): Add support for ISA 3.0 vector
d-form instructions.
(rs6000_preferred_reload_class): Likewise.
(rs6000_output_move_128bit): Add support for ISA 3.0 d-form
instructions. If ISA 3.0 is available, generate lxvx/stxvx instead
of the ISA 2.06 indexed memory instructions.
(rs6000_emit_prologue): If we have ISA 3.0 d-form instructions,
use them to save/restore the saved vector registers instead of
using Altivec instructions.
(rs6000_emit_epilogue): Likewise.
(rs6000_lra_p): Use TARGET_LRA instead of the old option word.
(rs6000_opt_masks): Split -mpower9-dform into
-mpower9-dform-scalar and -mpower9-dform-vector.
(rs6000_print_options_internal): Print -mno-<switch> if <switch>
was not selected.
* config/rs6000/vsx.md (p9_vecload_<mode>): Delete hack to emit
ISA 3.0 vector indexed memory instructions, and fold the code into
the normal mov<mode> patterns.
(p9_vecstore_<mode>): Likewise.
(vsx_mov<mode>): Add support for ISA 3.0 vector d-form
instructions.
(vsx_movti_64bit): Likewise.
(vsx_movti_32bit): Likewise.
* config/rs6000/constraints.md (wO constraint): New constraint for
ISA 3.0 vector d-form support.
* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Use
-mpower9-dform-scalar instead of -mpower9-dform. Add note not to
include -mpower9-dform-vector until we switch over to LRA.
(POWERPC_MASKS): Add -mlra. Split -mpower9-dform into two
switches, -mpower9-dform-scalar and -mpower9-dform-vector.
* config/rs6000/rs6000-protos.h (quad_address_p): Add declaration.
* doc/invoke.texi (RS/6000 and PowerPC Options): Add documentation
for -mpower9-dform and -mlra.
* doc/md.texi (wO constraint): Document wO constraint.
[gcc/testsuite]
2016-05-11 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/dform-3.c: New test for ISA 3.0 vector d-form
support.
* gcc.target/powerpc/dform-1.c: Add -mlra option to silence
warning when using -mvsx-timode.
* gcc.target/powerpc/p8vector-int128-1.c: Likewise.
* gcc.target/powerpc/dform-2.c: Likewise.
* gcc.target/powerpc/pr68805.c: Likewise.
From-SVN: r236133
2016-05-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/71055
* tree-ssa-sccvn.c (vn_reference_lookup_3): When native-interpreting
something with precision not equal to the access size, verify we don't
chop off bits.
* gcc.dg/torture/pr71055.c: New testcase.
From-SVN: r236122
2016-05-11 Richard Biener <rguenther@suse.de>
PR middle-end/71002
* alias.c (reference_alias_ptr_type): Preserve alias-set zero
if the langhook insists on it.
* fold-const.c (make_bit_field_ref): Add arg for the original
reference and preserve its alias-set.
(decode_field_reference): Take exp by reference and adjust it
to the original memory reference.
(optimize_bit_field_compare): Adjust callers.
(fold_truth_andor_1): Likewise.
* gimplify.c (gimplify_expr): Adjust in-SSA form test.
* g++.dg/torture/pr71002.C: New testcase.
From-SVN: r236117
Revision 235794 regressed compat/scalar-by-value-6 for powerpc-linux
-m32 due to accidentally changing the ABI. By another historical
accident, complex long double is stupidly passed in gprs for -m32.
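As a hedged illustration (the function name is made up), this is the kind
of argument the change is about:

    /* Under the 32-bit SVR4 ABI (ABI_V4), a _Complex long double argument
       like this one is passed in general purpose registers rather than
       FPRs, which is why its alignment and FPR classification need the
       special casing added below.  */
    _Complex long double
    cld_passthrough (_Complex long double x)
    {
      return x;
    }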
* config/rs6000/rs6000.c (is_complex_IBM_long_double,
abi_v4_pass_in_fpr): New functions.
(rs6000_function_arg_boundary): Exclude complex IBM long double
from 64-bit alignment when ABI_V4.
(rs6000_function_arg, rs6000_function_arg_advance_1,
rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr.
From-SVN: r236111
If we have a conditional jump that has only a return in both the branch
path and the fallthrough path, and the return on the branch path cannot
be made a conditional return, we will try to make a conditional return
from the fallthrough path. That does not work, because we then try to
redirect the (new) jump in the fallthrough block to the original
destination in the branch path, which is the exit block.
For the testcase on ARM we end up in this situation because, before the
jump2 pass, there are some other insns in the return blocks as well, the
same insns in both, so those are moved above the conditional jump.
Only later (in the ce3 pass) are the conditional jump and the two returns
melded into one return, so we need to handle this strange case here.
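A minimal sketch of the shape of source that can lead to this situation
(not the actual pr71028 testcase, which is ARM specific):

    /* Both arms end with the same insns followed by a return; once the
       common insns are hoisted above the conditional jump, the branch and
       fallthrough paths each contain only a return.  */
    extern int g;

    int
    f (int x)
    {
      if (x)
        {
          g = 1;
          return 0;
        }
      g = 1;
      return 0;
    }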
PR rtl-optimization/71028
* cfgcleanup.c (try_optimize_cfg): Do not flip a conditional
jump with just a return in the fallthrough block if the branch
block contains just a return as well.
From-SVN: r236106
read-md.c and read-rtl.c repeatedly use this pattern:
c = read_skip_spaces ();
if (c != ')')
fatal_expected_char (')', c);
Simplify them by introducing a helper function to do this.
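The helper looks roughly like this (a sketch based on the ChangeLog below;
see read-md.c for the real definition):

    /* Read the next non-whitespace character and complain if it is not
       EXPECTED.  */
    void
    require_char_ws (char expected)
    {
      int ch = read_skip_spaces ();
      if (ch != expected)
        fatal_expected_char (expected, ch);
    }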
gcc/ChangeLog:
* read-md.c (require_char_ws): New function.
(read_string): Simplify using require_char_ws.
(handle_constants): Likewise.
(handle_enum): Likewise.
(handle_file): Likewise.
* read-md.h (require_char_ws): New declaration.
* read-rtl.c (read_conditions): Simplify using require_char_ws.
(read_mapping): Likewise.
(read_rtx_code): Likewise.
(read_nested_rtx): Likewise.
From-SVN: r236101
* config/i386/i386.c (legitimize_pic_address): Merge 64-bit and 32-bit
gotoff_operand code paths. Use copy_to_suggested_reg and
expand_simple_binop where appropriate. Clean up.
From-SVN: r236096