Makefile.in (tree-vect-patterns.o): Add rule for new file.

* Makefile.in (tree-vect-patterns.o): Add rule for new file.
        * tree-vect-analyze.c (vect_determine_vectorization_factor): Use
        existing STMT_VINFO_VECTYPE if available.
        (vect_mark_relevant): Add special handling for stmts that are
        marked as STMT_VINFO_IN_PATTERN_P.
        (vect_analyze_loop): Call vect_pattern_recog.
        * tree-vectorizer.c (new_stmt_vec_info): Initialize new fields.
        * tree-vectorizer.h (in_pattern_p, related_stmt): New fields in
        stmt_info.
        (STMT_VINFO_IN_PATTERN_P, STMT_VINFO_RELATED_STMT): New macros.
        (vect_recog_func_ptr): New function-pointer type.
        * tree-vect-patterns.c: New file.
        (vect_recog_widen_sum_pattern, vect_recog_widen_mult_pattern,
        vect_recog_dot_prod_pattern, vect_pattern_recog,
        vect_pattern_recog_1): New functions.
        (vect_vect_recog_func_ptrs): New array of function pointers.

        * tree-vectorizer.h (ternary_op): New enum value.
        * tree-vect-transform.c (vect_create_epilog_for_reduction): Added
        declaration. Revised documentation. Removed redundant dump prints.
        Removed redundant argument. Added support for reduction patterns.
        (vectorizable_reduction): Added support for reduction patterns.
        (vect_transform_stmt): Added support for patterns.

        * expr.c (expand_expr_real_1): Added case for DOT_PROD_EXPR.
        * genopinit.c (udot_prod_optab, sdot_prod_optab): Initialize.
        * optabs.c (optab_for_tree_code): Added case for DOT_PROD_EXPR.
        (expand_widen_pattern_expr): New function.
        (init_optabs): Initialize new optabs udot_prod_optab,
        sdot_prod_optab.
        * optabs.h (OTI_sdot_prod, OTI_udot_prod): New.
        (sdot_prod_optab, udot_prod_optab): Define new optabs.
        (expand_widen_pattern_expr): New function declaration.
        * tree.def (DOT_PROD_EXPR, WIDEN_SUM_EXPR, WIDEN_MULT_EXPR): New
        tree-codes.
        * tree-inline.c (estimate_num_insns_1): Added cases for new
        tree-codes DOT_PROD_EXPR, WIDEN_SUM_EXPR, WIDEN_MULT_EXPR.
        * tree-pretty-print.c (dump_generic_node): Likewise.
        (op_prio): Likewise.
        (op_symbol): Added cases for WIDEN_SUM_EXPR, WIDEN_MULT_EXPR.
        * tree-ssa-operands.c (get_expr_operands): Added case for
        DOT_PROD_EXPR.
        * tree-vect-patterns.c (widened_name_p): New function.
        (vect_recog_dot_prod_pattern): Added function implementation.
        * tree-vect-transform.c (get_initial_def_for_reduction): Added
        cases for DOT_PROD_EXPR, WIDEN_SUM_EXPR.
        * config/rs6000/altivec.md (udot_prod<mode>, sdot_prodv8hi): New.
        * config/i386/sse.md (sdot_prodv8hi, udot_prodv4si): New.

        * expr.c (expand_expr_real_1): Added case for WIDEN_SUM_EXPR.
        * genopinit.c (widen_ssum_optab, widen_usum_optab): Initialize.
        * optabs.c (optab_for_tree_code): Added case for WIDEN_SUM_EXPR.
        (init_optabs): Initialize new optabs widen_ssum_optab,
        widen_usum_optab.
        * optabs.h (OTI_widen_ssum, OTI_widen_usum): New.
        (widen_ssum_optab, widen_usum_optab): Define new optabs.
        * tree-vect-generic.c (expand_vector_operations_1): Check type of
        use instead of type of def.
        * tree-vect-patterns.c (vect_recog_widen_sum_pattern): Added
        function implementation.
        * config/rs6000/altivec.md (widen_usum<mode>3, widen_ssumv16qi3,
        widen_ssumv8hi3): New.

        * doc/tm.texi (ssum_widen, usum_widen, sdot_prod, udot_prod): New
        patterns.

From-SVN: r109954
Author: Dorit Nuzman <dorit@il.ibm.com>, 2006-01-19 10:24:00 +00:00
Commit: 20f0622174 (parent: 681f47f25d)
30 changed files with 2283 additions and 167 deletions
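For orientation, this is the kind of reduction loop the new dot-product
pattern targets (an illustrative sketch distilled from the vect-reduc-dot-*
testcases added below, not code from this commit):

signed char x[64], y[64];

int
dot_prod (int len)
{
  int sum = 0;
  int i;
  for (i = 0; i < len; i++)
    {
      short prod = x[i] * y[i];   /* widening multiply: char -> short */
      sum += prod;                /* widening add: short -> int */
    }
  return sum;
}

vect_recog_dot_prod_pattern rewrites the loop body into a single
DOT_PROD_EXPR <x[i], y[i], sum>, which the new sdot_prod/udot_prod optabs
can then map onto one machine instruction per vector of elements.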

gcc/Makefile.in

@@ -967,6 +967,7 @@ OBJS-common = \
tree-vect-generic.o tree-ssa-loop.o tree-ssa-loop-niter.o \
tree-ssa-loop-manip.o tree-ssa-threadupdate.o \
tree-vectorizer.o tree-vect-analyze.o tree-vect-transform.o \
tree-vect-patterns.o \
tree-ssa-loop-ivcanon.o tree-ssa-propagate.o tree-ssa-address.o \
tree-ssa-math-opts.o \
tree-ssa-loop-ivopts.o tree-if-conv.o tree-ssa-loop-unswitch.o \
@@ -2065,6 +2066,10 @@ tree-vect-analyze.o: tree-vect-analyze.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) \
$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
tree-vectorizer.h $(TREE_DATA_REF_H) $(SCEV_H) $(EXPR_H) tree-chrec.h
tree-vect-patterns.o: tree-vect-patterns.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) errors.h $(GGC_H) $(OPTABS_H) $(TREE_H) $(RTL_H) $(BASIC_BLOCK_H) \
diagnostic.h $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) cfgloop.h \
tree-vectorizer.h tree-data-ref.h $(EXPR_H)
tree-vect-transform.o: tree-vect-transform.c $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(TM_H) $(GGC_H) $(OPTABS_H) $(RECOG_H) $(TREE_H) $(RTL_H) \
$(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \

gcc/config/i386/sse.md

@@ -1,5 +1,5 @@
;; GCC machine description for SSE instructions
;; Copyright (C) 2005
;; Copyright (C) 2005, 2006
;; Free Software Foundation, Inc.
;;
;; This file is part of GCC.
@@ -2700,6 +2700,48 @@
DONE;
})
(define_expand "sdot_prodv8hi"
[(match_operand:V4SI 0 "register_operand" "")
(match_operand:V8HI 1 "nonimmediate_operand" "")
(match_operand:V8HI 2 "nonimmediate_operand" "")
(match_operand:V4SI 3 "register_operand" "")]
"TARGET_SSE2"
{
rtx t = gen_reg_rtx (V4SImode);
emit_insn (gen_sse2_pmaddwd (t, operands[1], operands[2]));
emit_insn (gen_addv4si3 (operands[0], operands[3], t));
DONE;
})
(define_expand "udot_prodv4si"
[(match_operand:V2DI 0 "register_operand" "")
(match_operand:V4SI 1 "register_operand" "")
(match_operand:V4SI 2 "register_operand" "")
(match_operand:V2DI 3 "register_operand" "")]
"TARGET_SSE2"
{
rtx t1, t2, t3, t4;
t1 = gen_reg_rtx (V2DImode);
emit_insn (gen_sse2_umulv2siv2di3 (t1, operands[1], operands[2]));
emit_insn (gen_addv2di3 (t1, t1, operands[3]));
t2 = gen_reg_rtx (V4SImode);
t3 = gen_reg_rtx (V4SImode);
emit_insn (gen_sse2_lshrti3 (gen_lowpart (TImode, t2),
gen_lowpart (TImode, operands[1]),
GEN_INT (32)));
emit_insn (gen_sse2_lshrti3 (gen_lowpart (TImode, t3),
gen_lowpart (TImode, operands[2]),
GEN_INT (32)));
t4 = gen_reg_rtx (V2DImode);
emit_insn (gen_sse2_umulv2siv2di3 (t4, t2, t3));
emit_insn (gen_addv2di3 (operands[0], t1, t4));
DONE;
})
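For reference, a scalar model of what this expander computes (an editorial
sketch with a hypothetical function name; the lane pairing follows the
expansion above, where the first pmuludq multiplies the even 32-bit lanes
and the second multiplies the odd lanes exposed by the 32-bit shift):

#include <stdint.h>

/* Editorial model of udot_prodv4si: each 64-bit lane of the result
   accumulates the products of two adjacent unsigned 32-bit lanes.  */
static void
udot_prodv4si_model (uint64_t out[2], const uint32_t x[4],
                     const uint32_t y[4], const uint64_t acc[2])
{
  int j;
  for (j = 0; j < 2; j++)
    out[j] = acc[j]
             + (uint64_t) x[2*j] * y[2*j]            /* pmuludq on even lanes */
             + (uint64_t) x[2*j + 1] * y[2*j + 1];   /* pmuludq after shift */
}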
(define_insn "ashr<mode>3"
[(set (match_operand:SSEMODE24 0 "register_operand" "=x")
(ashiftrt:SSEMODE24

gcc/config/rs6000/altivec.md

@@ -1,5 +1,5 @@
;; AltiVec patterns.
;; Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
;; Copyright (C) 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
;; Contributed by Aldy Hernandez (aldy@quesejoda.com)
;; This file is part of GCC.
@@ -2150,6 +2150,77 @@
DONE;
}")
(define_expand "udot_prod<mode>"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
(unspec:V4SI [(match_operand:VIshort 1 "register_operand" "v")
(match_operand:VIshort 2 "register_operand" "v")]
UNSPEC_VMSUMU)))]
"TARGET_ALTIVEC"
"
{
emit_insn (gen_altivec_vmsumu<VI_char>m (operands[0], operands[1], operands[2], operands[3]));
DONE;
}")
(define_expand "sdot_prodv8hi"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
(match_operand:V8HI 2 "register_operand" "v")]
UNSPEC_VMSUMSHM)))]
"TARGET_ALTIVEC"
"
{
emit_insn (gen_altivec_vmsumshm (operands[0], operands[1], operands[2], operands[3]));
DONE;
}")
(define_expand "widen_usum<mode>3"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(plus:V4SI (match_operand:V4SI 2 "register_operand" "v")
(unspec:V4SI [(match_operand:VIshort 1 "register_operand" "v")]
UNSPEC_VMSUMU)))]
"TARGET_ALTIVEC"
"
{
rtx vones = gen_reg_rtx (GET_MODE (operands[1]));
emit_insn (gen_altivec_vspltis<VI_char> (vones, const1_rtx));
emit_insn (gen_altivec_vmsumu<VI_char>m (operands[0], operands[1], vones, operands[2]));
DONE;
}")
(define_expand "widen_ssumv16qi3"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(plus:V4SI (match_operand:V4SI 2 "register_operand" "v")
(unspec:V4SI [(match_operand:V16QI 1 "register_operand" "v")]
UNSPEC_VMSUMM)))]
"TARGET_ALTIVEC"
"
{
rtx vones = gen_reg_rtx (V16QImode);
emit_insn (gen_altivec_vspltisb (vones, const1_rtx));
emit_insn (gen_altivec_vmsummbm (operands[0], operands[1], vones, operands[2]));
DONE;
}")
(define_expand "widen_ssumv8hi3"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(plus:V4SI (match_operand:V4SI 2 "register_operand" "v")
(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
UNSPEC_VMSUMSHM)))]
"TARGET_ALTIVEC"
"
{
rtx vones = gen_reg_rtx (V8HImode);
emit_insn (gen_altivec_vspltish (vones, const1_rtx));
emit_insn (gen_altivec_vmsumshm (operands[0], operands[1], vones, operands[2]));
DONE;
}")
(define_expand "negv4sf2"
[(use (match_operand:V4SF 0 "register_operand" ""))
(use (match_operand:V4SF 1 "register_operand" ""))]

gcc/doc/tm.texi

@@ -1,5 +1,5 @@
@c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1996, 1998, 1999, 2000, 2001,
@c 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
@c 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@@ -3099,6 +3099,25 @@ Compute the sum of the unsigned elements of a vector. The vector is operand 1,
and the scalar result is stored in the least significant bits of operand 0
(also a vector). The output and input vector should have the same modes.
@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}
@cindex @code{udot_prod@var{m}} instruction pattern
@item @samp{udot_prod@var{m}}
Compute the sum of the products of two signed/unsigned elements.
Operand 1 and operand 2 are of the same mode. Their product, which is of a
wider mode, is computed and added to operand 3. Operand 3 is of a mode equal to or
wider than the mode of the product. The result is placed in operand 0, which
is of the same mode as operand 3.
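Concretely, one legal per-lane reading of this contract (an editorial
sketch matching the sdot_prodv8hi expanders elsewhere in this commit; the
documentation deliberately leaves the lane grouping to the target):

#include <stdint.h>

/* Editorial model of sdot_prodv8hi (V8HI x V8HI -> V4SI): operand 0 and
   the accumulator operand 3 are in the wider V4SI mode.  */
static void
sdot_prodv8hi_model (int32_t out[4], const int16_t op1[8],
                     const int16_t op2[8], const int32_t op3[4])
{
  int j;
  for (j = 0; j < 4; j++)
    out[j] = op3[j] + op1[2*j] * op2[2*j] + op1[2*j + 1] * op2[2*j + 1];
}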
@cindex @code{ssum_widen@var{m3}} instruction pattern
@item @samp{ssum_widen@var{m3}}
@cindex @code{usum_widen@var{m3}} instruction pattern
@item @samp{usum_widen@var{m3}}
Operands 0 and 2 are of the same mode, which is wider than the mode of
operand 1. Add operand 1 to operand 2 and place the widened result in
operand 0. (This is used to express accumulation of elements into an accumulator
of a wider mode.)
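In source terms, these patterns back widening-sum loops of the following
shape (an editorial sketch; the vect-reduc-pattern testcases below exercise
exactly this):

unsigned char a[16];

unsigned int
usum (void)
{
  unsigned int sum = 0;
  int i;
  for (i = 0; i < 16; i++)
    sum += a[i];   /* narrow elements accumulated into a wider sum */
  return sum;
}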
@cindex @code{vec_shl_@var{m}} instruction pattern
@cindex @code{vec_shr_@var{m}} instruction pattern
@item @samp{vec_shl_@var{m}}, @samp{vec_shr_@var{m}}

gcc/expr.c

@@ -8553,6 +8553,31 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode,
return temp;
}
case DOT_PROD_EXPR:
{
tree oprnd0 = TREE_OPERAND (exp, 0);
tree oprnd1 = TREE_OPERAND (exp, 1);
tree oprnd2 = TREE_OPERAND (exp, 2);
rtx op2;
expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, 0);
op2 = expand_expr (oprnd2, NULL_RTX, VOIDmode, 0);
target = expand_widen_pattern_expr (exp, op0, op1, op2,
target, unsignedp);
return target;
}
case WIDEN_SUM_EXPR:
{
tree oprnd0 = TREE_OPERAND (exp, 0);
tree oprnd1 = TREE_OPERAND (exp, 1);
expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, 0);
target = expand_widen_pattern_expr (exp, op0, NULL_RTX, op1,
target, unsignedp);
return target;
}
case REDUC_MAX_EXPR:
case REDUC_MIN_EXPR:
case REDUC_PLUS_EXPR:

gcc/genopinit.c

@@ -1,6 +1,6 @@
/* Generate code to initialize optabs from machine description.
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
This file is part of GCC.
@@ -203,6 +203,10 @@ static const char * const optabs[] =
"vec_realign_load_optab->handlers[$A].insn_code = CODE_FOR_$(vec_realign_load_$a$)",
"vcond_gen_code[$A] = CODE_FOR_$(vcond$a$)",
"vcondu_gen_code[$A] = CODE_FOR_$(vcondu$a$)",
"ssum_widen_optab->handlers[$A].insn_code = CODE_FOR_$(widen_ssum$I$a3$)",
"usum_widen_optab->handlers[$A].insn_code = CODE_FOR_$(widen_usum$I$a3$)",
"udot_prod_optab->handlers[$A].insn_code = CODE_FOR_$(udot_prod$I$a$)",
"sdot_prod_optab->handlers[$A].insn_code = CODE_FOR_$(sdot_prod$I$a$)",
"reduc_smax_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_smax_$a$)",
"reduc_umax_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_umax_$a$)",
"reduc_smin_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_smin_$a$)",

gcc/optabs.c

@@ -294,6 +294,12 @@ optab_for_tree_code (enum tree_code code, tree type)
case REALIGN_LOAD_EXPR:
return vec_realign_load_optab;
case WIDEN_SUM_EXPR:
return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
case DOT_PROD_EXPR:
return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
case REDUC_MAX_EXPR:
return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab;
@@ -337,6 +343,154 @@ optab_for_tree_code (enum tree_code code, tree type)
}
/* Expand vector widening operations.

   There are two different classes of operations handled here:

   1) Operations whose result is wider than all the arguments to the operation.
      Examples: VEC_UNPACK_HI/LO_EXPR, VEC_WIDEN_MULT_HI/LO_EXPR
      In this case OP0 and optionally OP1 would be initialized,
      but WIDE_OP wouldn't (not relevant for this case).

   2) Operations whose result is of the same size as the last argument to the
      operation, but wider than all the other arguments to the operation.
      Examples: WIDEN_SUM_EXPR, DOT_PROD_EXPR.
      In this case WIDE_OP, OP0 and optionally OP1 would be initialized.

   E.g., when called to expand the following operations, this is how
   the arguments will be initialized:

                                nops    OP0     OP1     WIDE_OP
   widening-sum                 2       oprnd0  -       oprnd1
   widening-dot-product         3       oprnd0  oprnd1  oprnd2
   widening-mult                2       oprnd0  oprnd1  -
   type-promotion (vec-unpack)  1       oprnd0  -       -   */
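/* Editorial example (not in the original comment): for the WIDEN_SUM_EXPR
   case expanded in expr.c above, the call is
     expand_widen_pattern_expr (exp, op0, NULL_RTX, op1, target, unsignedp);
   i.e. nops == 2, OP1 is unused, and the accumulator arrives as WIDE_OP.  */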
rtx
expand_widen_pattern_expr (tree exp, rtx op0, rtx op1, rtx wide_op, rtx target,
int unsignedp)
{
tree oprnd0, oprnd1, oprnd2;
enum machine_mode wmode = 0, tmode0, tmode1 = 0;
optab widen_pattern_optab;
int icode;
enum machine_mode xmode0, xmode1 = 0, wxmode = 0;
rtx temp;
rtx pat;
rtx xop0, xop1, wxop;
int nops = TREE_CODE_LENGTH (TREE_CODE (exp));
oprnd0 = TREE_OPERAND (exp, 0);
tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
widen_pattern_optab =
optab_for_tree_code (TREE_CODE (exp), TREE_TYPE (oprnd0));
icode = (int) widen_pattern_optab->handlers[(int) tmode0].insn_code;
gcc_assert (icode != CODE_FOR_nothing);
xmode0 = insn_data[icode].operand[1].mode;
if (nops >= 2)
{
oprnd1 = TREE_OPERAND (exp, 1);
tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
xmode1 = insn_data[icode].operand[2].mode;
}
/* The last operand is of a wider mode than the rest of the operands. */
if (nops == 2)
{
wmode = tmode1;
wxmode = xmode1;
}
else if (nops == 3)
{
gcc_assert (tmode1 == tmode0);
gcc_assert (op1);
oprnd2 = TREE_OPERAND (exp, 2);
wmode = TYPE_MODE (TREE_TYPE (oprnd2));
wxmode = insn_data[icode].operand[3].mode;
}
if (!wide_op)
wmode = wxmode = insn_data[icode].operand[0].mode;
if (!target
|| ! (*insn_data[icode].operand[0].predicate) (target, wmode))
temp = gen_reg_rtx (wmode);
else
temp = target;
xop0 = op0;
xop1 = op1;
wxop = wide_op;
/* In case the insn wants input operands in modes different from
those of the actual operands, convert the operands. It would
seem that we don't need to convert CONST_INTs, but we do, so
that they're properly zero-extended, sign-extended or truncated
for their mode. */
if (GET_MODE (op0) != xmode0 && xmode0 != VOIDmode)
xop0 = convert_modes (xmode0,
GET_MODE (op0) != VOIDmode
? GET_MODE (op0)
: tmode0,
xop0, unsignedp);
if (op1)
if (GET_MODE (op1) != xmode1 && xmode1 != VOIDmode)
xop1 = convert_modes (xmode1,
GET_MODE (op1) != VOIDmode
? GET_MODE (op1)
: tmode1,
xop1, unsignedp);
if (wide_op)
if (GET_MODE (wide_op) != wxmode && wxmode != VOIDmode)
wxop = convert_modes (wxmode,
GET_MODE (wide_op) != VOIDmode
? GET_MODE (wide_op)
: wmode,
wxop, unsignedp);
/* Now, if insn's predicates don't allow our operands, put them into
pseudo regs. */
if (! (*insn_data[icode].operand[1].predicate) (xop0, xmode0)
&& xmode0 != VOIDmode)
xop0 = copy_to_mode_reg (xmode0, xop0);
if (op1)
{
if (! (*insn_data[icode].operand[2].predicate) (xop1, xmode1)
&& xmode1 != VOIDmode)
xop1 = copy_to_mode_reg (xmode1, xop1);
if (wide_op)
{
if (! (*insn_data[icode].operand[3].predicate) (wxop, wxmode)
&& wxmode != VOIDmode)
wxop = copy_to_mode_reg (wxmode, wxop);
pat = GEN_FCN (icode) (temp, xop0, xop1, wxop);
}
else
pat = GEN_FCN (icode) (temp, xop0, xop1);
}
else
{
if (wide_op)
{
if (! (*insn_data[icode].operand[2].predicate) (wxop, wxmode)
&& wxmode != VOIDmode)
wxop = copy_to_mode_reg (wxmode, wxop);
pat = GEN_FCN (icode) (temp, xop0, wxop);
}
else
pat = GEN_FCN (icode) (temp, xop0);
}
emit_insn (pat);
return temp;
}
/* Generate code to perform an operation specified by TERNARY_OPTAB
on operands OP0, OP1 and OP2, with result having machine-mode MODE.
@@ -5139,6 +5293,11 @@ init_optabs (void)
reduc_splus_optab = init_optab (UNKNOWN);
reduc_uplus_optab = init_optab (UNKNOWN);
ssum_widen_optab = init_optab (UNKNOWN);
usum_widen_optab = init_optab (UNKNOWN);
sdot_prod_optab = init_optab (UNKNOWN);
udot_prod_optab = init_optab (UNKNOWN);
vec_extract_optab = init_optab (UNKNOWN);
vec_set_optab = init_optab (UNKNOWN);
vec_init_optab = init_optab (UNKNOWN);

gcc/optabs.h

@@ -1,5 +1,6 @@
/* Definitions for code generation pass of GNU compiler.
Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006
Free Software Foundation, Inc.
This file is part of GCC.
@@ -241,6 +242,14 @@ enum optab_index
OTI_reduc_splus,
OTI_reduc_uplus,
/* Summation, with result machine mode one or more wider than args. */
OTI_ssum_widen,
OTI_usum_widen,
/* Dot product, with result machine mode one or more wider than args. */
OTI_sdot_prod,
OTI_udot_prod,
/* Set specified field of vector operand. */
OTI_vec_set,
/* Extract specified field of vector operand. */
@@ -367,6 +376,11 @@ extern GTY(()) optab optab_table[OTI_MAX];
#define reduc_umin_optab (optab_table[OTI_reduc_umin])
#define reduc_splus_optab (optab_table[OTI_reduc_splus])
#define reduc_uplus_optab (optab_table[OTI_reduc_uplus])
#define ssum_widen_optab (optab_table[OTI_ssum_widen])
#define usum_widen_optab (optab_table[OTI_usum_widen])
#define sdot_prod_optab (optab_table[OTI_sdot_prod])
#define udot_prod_optab (optab_table[OTI_udot_prod])
#define vec_set_optab (optab_table[OTI_vec_set])
#define vec_extract_optab (optab_table[OTI_vec_extract])
@@ -495,6 +509,9 @@ extern enum insn_code sync_lock_release[NUM_MACHINE_MODES];
/* Define functions given in optabs.c. */
extern rtx expand_widen_pattern_expr (tree exp, rtx op0, rtx op1, rtx wide_op,
rtx target, int unsignedp);
extern rtx expand_ternary_op (enum machine_mode mode, optab ternary_optab,
rtx op0, rtx op1, rtx op2, rtx target,
int unsignedp);

gcc/testsuite/ChangeLog

@@ -1,3 +1,22 @@
2006-01-19 Dorit Nuzman <dorit@il.ibm.com>
* lib/target-supports.exp (check_effective_target_vect_sdot_qi): New.
(check_effective_target_vect_udot_qi): New.
(check_effective_target_vect_sdot_hi): New.
(check_effective_target_vect_udot_hi): New.
* gcc.dg/vect/vect.exp: Use dump-details, and compile testcases
prefixed with "wrapv-" with -fwrapv.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8.c: New.
* gcc.dg/vect/vect-reduc-dot-u8.c: New.
* gcc.dg/vect/vect-reduc-dot-u16.c: New.
* gcc.dg/vect/vect-reduc-dot-s8.c: New.
* gcc.dg/vect/vect-reduc-dot-s16.c: New.
* lib/target-supports.exp (check_effective_target_vect_widen_sum): New.
* gcc.dg/vect/vect-reduc-pattern-1.c: New.
* gcc.dg/vect/vect-reduc-pattern-2.c: New.
* gcc.dg/vect/wrapv-vect-reduc-pattern-2.c: New.
2006-01-19 Volker Reichelt <reichelt@igpm.rwth-aachen.de>
PR c++/16829

gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16.c

@@ -0,0 +1,70 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 64
#define DOT1 43680
#define DOT2 43680
signed short X[N] __attribute__ ((__aligned__(16)));
signed short Y[N] __attribute__ ((__aligned__(16)));
/* short->short->int dot product.
Not detected as a dot-product pattern.
Currently fails to be vectorized due to presence of type conversions. */
int
foo1(int len) {
int i;
int result = 0;
short prod;
for (i=0; i<len; i++) {
prod = X[i] * Y[i];
result += prod;
}
return result;
}
/* short->int->int dot product.
Detected as a dot-product pattern.
Vectorized on targets that support dot-product for signed shorts. */
int
foo2(int len) {
int i;
int result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
int main (void)
{
int i, dot1, dot2;
check_vect ();
for (i=0; i<N; i++) {
X[i] = i;
Y[i] = 64-i;
}
dot1 = foo1 (N);
if (dot1 != DOT1)
abort ();
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_hi } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8.c

@@ -0,0 +1,111 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 64
#define DOT1 43680
#define DOT2 -21856
#define DOT3 43680
signed char X[N] __attribute__ ((__aligned__(16)));
signed char Y[N] __attribute__ ((__aligned__(16)));
/* char->short->int dot product.
The dot-product pattern should be detected.
Vectorizable on vect_sdot_qi targets (targets that support dot-product of
signed chars).
In the future could also be vectorized as widening-mult + widening-summation,
or with type-conversion support.
*/
int
foo1(int len) {
int i;
int result = 0;
short prod;
for (i=0; i<len; i++) {
prod = X[i] * Y[i];
result += prod;
}
return result;
}
/* char->short->short dot product.
The dot-product pattern should be detected.
The reduction is currently not vectorized because of the
signed->unsigned->signed casts introduced by this patch:
2005-12-26 Kazu Hirata <kazu@codesourcery.com>
PR tree-optimization/25125
When the dot-product is detected, the loop should be vectorized on vect_sdot_qi
targets (targets that support dot-product of signed char).
This test would currently fail to vectorize on targets that support
dot-product of chars when the accumulator is int.
In the future could also be vectorized as widening-mult + summation,
or with type-conversion support.
*/
short
foo2(int len) {
int i;
short result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
/* char->int->int dot product.
Not detected as a dot-product pattern.
Currently fails to be vectorized due to presence of type conversions. */
int
foo3(int len) {
int i;
int result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
int main (void)
{
int i, dot1, dot3;
short dot2;
check_vect ();
for (i=0; i<N; i++) {
X[i] = i;
Y[i] = 64-i;
}
dot1 = foo1 (N);
if (dot1 != DOT1)
abort ();
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
dot3 = foo3 (N);
if (dot3 != DOT3)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16.c

@@ -0,0 +1,77 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 64
#define DOT1 43680
#define DOT2 43680
unsigned short X[N] __attribute__ ((__aligned__(16)));
unsigned short Y[N] __attribute__ ((__aligned__(16)));
/* short->short->int dot product.
Not detected as a dot-product pattern.
Not vectorized due to presence of type-conversions. */
unsigned int
foo1(int len) {
int i;
unsigned int result = 0;
unsigned short prod;
for (i=0; i<len; i++) {
prod = X[i] * Y[i];
result += prod;
}
return result;
}
/* short->int->int dot product.
Currently not detected as a dot-product pattern: the multiplication
promotes the ushorts to int, and then the product is promoted to unsigned
int for the addition. This results in an int->unsigned int cast which,
since no bits are modified, should be trivially vectorizable. */
unsigned int
foo2(int len) {
int i;
unsigned int result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
int main (void)
{
unsigned int dot1, dot2;
int i;
check_vect ();
for (i=0; i<N; i++) {
X[i] = i;
Y[i] = 64-i;
}
dot1 = foo1 (N);
if (dot1 != DOT1)
abort ();
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
/* Once the dot-product pattern is detected in the second loop, we expect
that loop to be vectorized on vect_udot_hi targets (targets that support
dot-product of unsigned shorts). */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8.c

@@ -0,0 +1,101 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 64
#define DOT1 43680
#define DOT2 43680
#define DOT3 43680
unsigned char X[N] __attribute__ ((__aligned__(16)));
unsigned char Y[N] __attribute__ ((__aligned__(16)));
/* char->short->int dot product.
Detected as a dot-product pattern.
Should be vectorized on targets that support dot-product for unsigned chars.
*/
unsigned int
foo1(int len) {
int i;
unsigned int result = 0;
unsigned short prod;
for (i=0; i<len; i++) {
prod = X[i] * Y[i];
result += prod;
}
return result;
}
/* char->short->short dot product.
Detected as a dot-product pattern.
Should be vectorized on targets that support dot-product for unsigned chars.
This test currently fails to vectorize on targets that support dot-product
of chars only when the accumulator is int.
*/
unsigned short
foo2(int len) {
int i;
unsigned short result = 0;
for (i=0; i<len; i++) {
result += (unsigned short)(X[i] * Y[i]);
}
return result;
}
/* char->int->int dot product.
Not detected as a dot-product.
Doesn't get vectorized due to the presence of type conversions. */
unsigned int
foo3(int len) {
int i;
unsigned int result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
int main (void)
{
unsigned int dot1, dot3;
unsigned short dot2;
int i;
check_vect ();
for (i=0; i<N; i++) {
X[i] = i;
Y[i] = 64-i;
}
dot1 = foo1 (N);
if (dot1 != DOT1)
abort ();
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
dot3 = foo3 (N);
if (dot3 != DOT3)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 2 "vect" } } */
/* When the vectorizer is enhanced to vectorize foo2 (accumulation into short) for
targets that support accumulation into int (powerpc, ia64) we'd have:
dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_udot_qi } }
*/
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_udot_qi } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1.c

@@ -0,0 +1,60 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 16
#define SH_SUM 210
#define CH_SUM 120
int main1 ()
{
int i;
unsigned short udata_sh[N] = {0,2,4,6,8,10,12,14,16,18,20,22,24,26,28};
unsigned char udata_ch[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
unsigned int intsum = 0;
unsigned short shortsum = 0;
/* widening sum: sum shorts into int. */
for (i = 0; i < N; i++){
intsum += udata_sh[i];
}
/* check results: */
if (intsum != SH_SUM)
abort ();
/* widening sum: sum chars into int. */
intsum = 0;
for (i = 0; i < N; i++){
intsum += udata_ch[i];
}
/* check results: */
if (intsum != CH_SUM)
abort ();
/* widening sum: sum chars into short.
Pattern detected, but not vectorized yet. */
for (i = 0; i < N; i++){
shortsum += udata_ch[i];
}
/* check results: */
if (shortsum != CH_SUM)
abort ();
return 0;
}
int main (void)
{
check_vect ();
return main1 ();
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 3 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_widen_sum } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2.c

@@ -0,0 +1,67 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 16
#define SH_SUM 210
#define CH_SUM 120
int main1 ()
{
int i;
signed short data_sh[N] = {0,2,4,6,8,10,12,14,16,18,20,22,24,26,28};
signed char data_ch[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
signed int intsum = 0;
signed short shortsum = 0;
/* widening sum: sum shorts into int. */
for (i = 0; i < N; i++){
intsum += data_sh[i];
}
/* check results: */
if (intsum != SH_SUM)
abort ();
/* widening sum: sum chars into int. */
intsum = 0;
for (i = 0; i < N; i++){
intsum += data_ch[i];
}
/* check results: */
if (intsum != CH_SUM)
abort ();
/* widening sum: sum chars into short.
The widening-summation pattern is currently not detected because of this
patch:
2005-12-26 Kazu Hirata <kazu@codesourcery.com>
PR tree-optimization/25125
*/
for (i = 0; i < N; i++){
shortsum += data_ch[i];
}
/* check results: */
if (shortsum != CH_SUM)
abort ();
return 0;
}
int main (void)
{
check_vect ();
return main1 ();
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 3 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 2 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_widen_sum } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/vect.exp

@@ -1,4 +1,4 @@
# Copyright (C) 1997, 2004 Free Software Foundation, Inc.
# Copyright (C) 1997, 2004, 2005, 2006 Free Software Foundation, Inc.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -78,7 +78,7 @@ dg-init
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/nodump-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-ftree-vectorizer-verbose=4" "-fdump-tree-vect-stats"
lappend DEFAULT_VECTCFLAGS "-fdump-tree-vect-details"
# Main loop.
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/pr*.\[cS\]]] \
@@ -96,6 +96,12 @@ lappend DEFAULT_VECTCFLAGS "-ffast-math"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-vect*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fwrapv tests
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fwrapv"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/wrapv-vect*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -ftrapv tests
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-ftrapv"

gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8.c

@@ -0,0 +1,108 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 64
#define DOT1 43680
#define DOT2 -21856
#define DOT3 43680
signed char X[N] __attribute__ ((__aligned__(16)));
signed char Y[N] __attribute__ ((__aligned__(16)));
/* char->short->int dot product.
The dot-product pattern should be detected.
Vectorizable on vect_sdot_qi targets (targets that support dot-product of
signed chars).
In the future could also be vectorized as widening-mult + widening-summation,
or with type-conversion support.
*/
int
foo1(int len) {
int i;
int result = 0;
short prod;
for (i=0; i<len; i++) {
prod = X[i] * Y[i];
result += prod;
}
return result;
}
/* char->short->short dot product.
The dot-product pattern should be detected.
Should be vectorized on vect_sdot_qi targets (targets that support
dot-product of signed char).
This test currently fails to vectorize on targets that support
dot-product of chars when the accumulator is int.
In the future could also be vectorized as widening-mult + summation,
or with type-conversion support.
*/
short
foo2(int len) {
int i;
short result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
/* char->int->int dot product.
Not detected as a dot-product pattern.
Currently fails to be vectorized due to presence of type conversions. */
int
foo3(int len) {
int i;
int result = 0;
for (i=0; i<len; i++) {
result += (X[i] * Y[i]);
}
return result;
}
int main (void)
{
int i, dot1, dot3;
short dot2;
check_vect ();
for (i=0; i<N; i++) {
X[i] = i;
Y[i] = 64-i;
}
dot1 = foo1 (N);
if (dot1 != DOT1)
abort ();
dot2 = foo2 (N);
if (dot2 != DOT2)
abort ();
dot3 = foo3 (N);
if (dot3 != DOT3)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 2 "vect" } } */
/* When the vectorizer is enhanced to vectorize foo2 (accumulation into short) for targets
that support accumulation into int (ia64) we'd have:
dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_sdot_qi } }
*/
/* In the meantime expect: */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-pattern-2.c

@@ -0,0 +1,59 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 16
#define SH_SUM 210
#define CH_SUM 120
int main1 ()
{
int i;
signed short data_sh[N] = {0,2,4,6,8,10,12,14,16,18,20,22,24,26,28};
signed char data_ch[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
signed int intsum = 0;
signed short shortsum = 0;
/* widening sum: sum shorts into int. */
for (i = 0; i < N; i++){
intsum += data_sh[i];
}
/* check results: */
if (intsum != SH_SUM)
abort ();
/* widening sum: sum chars into int. */
intsum = 0;
for (i = 0; i < N; i++){
intsum += data_ch[i];
}
/* check results: */
if (intsum != CH_SUM)
abort ();
/* widening sum: sum chars into short. */
for (i = 0; i < N; i++){
shortsum += data_ch[i];
}
/* check results: */
if (shortsum != CH_SUM)
abort ();
return 0;
}
int main (void)
{
check_vect ();
return main1 ();
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 3 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_widen_sum } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

gcc/testsuite/lib/target-supports.exp

@@ -1364,6 +1364,112 @@ proc check_effective_target_vect_no_bitwise { } {
return $et_vect_no_bitwise_saved
}
# Return 1 if the target plus current options supports a vector
# widening summation, 0 otherwise.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_widen_sum { } {
global et_vect_widen_sum
if [info exists et_vect_widen_sum_saved] {
verbose "check_effective_target_vect_widen_sum: using cached result" 2
} else {
set et_vect_widen_sum_saved 0
if { [istarget powerpc*-*-*]
|| [istarget ia64-*-*] } {
set et_vect_widen_sum_saved 1
}
}
verbose "check_effective_target_vect_widen_sum: returning $et_vect_widen_sum_saved" 2
return $et_vect_widen_sum_saved
}
# Return 1 if the target plus current options supports a vector
# dot-product of signed chars, 0 otherwise.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_sdot_qi { } {
global et_vect_sdot_qi
if [info exists et_vect_sdot_qi_saved] {
verbose "check_effective_target_vect_sdot_qi: using cached result" 2
} else {
set et_vect_sdot_qi_saved 0
if { [istarget ia64-*-*] } {
set et_vect_sdot_qi_saved 1
}
}
verbose "check_effective_target_vect_sdot_qi: returning $et_vect_sdot_qi_saved" 2
return $et_vect_sdot_qi_saved
}
# Return 1 if the target plus current options supports a vector
# dot-product of unsigned chars, 0 otherwise.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_udot_qi { } {
global et_vect_udot_qi
if [info exists et_vect_udot_qi_saved] {
verbose "check_effective_target_vect_udot_qi: using cached result" 2
} else {
set et_vect_udot_qi_saved 0
if { [istarget powerpc*-*-*]
|| [istarget ia64-*-*] } {
set et_vect_udot_qi_saved 1
}
}
verbose "check_effective_target_vect_udot_qi: returning $et_vect_udot_qi_saved" 2
return $et_vect_udot_qi_saved
}
# Return 1 if the target plus current options supports a vector
# dot-product of signed shorts, 0 otherwise.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_sdot_hi { } {
global et_vect_sdot_hi
if [info exists et_vect_sdot_hi_saved] {
verbose "check_effective_target_vect_sdot_hi: using cached result" 2
} else {
set et_vect_sdot_hi_saved 0
if { [istarget powerpc*-*-*]
|| [istarget i?86-*-*]
|| [istarget x86_64-*-*]
|| [istarget ia64-*-*] } {
set et_vect_sdot_hi_saved 1
}
}
verbose "check_effective_target_vect_sdot_hi: returning $et_vect_sdot_hi_saved" 2
return $et_vect_sdot_hi_saved
}
# Return 1 if the target plus current options supports a vector
# dot-product of unsigned shorts, 0 otherwise.
#
# This won't change for different subtargets so cache the result.
proc check_effective_target_vect_udot_hi { } {
global et_vect_udot_hi
if [info exists et_vect_udot_hi_saved] {
verbose "check_effective_target_vect_udot_hi: using cached result" 2
} else {
set et_vect_udot_hi_saved 0
if { [istarget powerpc*-*-*] } {
set et_vect_udot_hi_saved 1
}
}
verbose "check_effective_target_vect_udot_hi: returning $et_vect_udot_hi_saved" 2
return $et_vect_udot_hi_saved
}
# Return 1 if the target plus current options does not support a vector
# alignment mechanism, 0 otherwise.
#

gcc/tree-inline.c

@@ -1,5 +1,5 @@
/* Tree inlining.
Copyright 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
Copyright 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
Contributed by Alexandre Oliva <aoliva@redhat.com>
This file is part of GCC.
@@ -1728,6 +1728,10 @@ estimate_num_insns_1 (tree *tp, int *walk_subtrees, void *data)
case REDUC_MAX_EXPR:
case REDUC_MIN_EXPR:
case REDUC_PLUS_EXPR:
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case WIDEN_MULT_EXPR:
case RESX_EXPR:
*count += 1;

gcc/tree-pretty-print.c

@@ -1,5 +1,6 @@
/* Pretty formatting of GENERIC trees in C syntax.
Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006
Free Software Foundation, Inc.
Adapted from c-pretty-print.c by Diego Novillo <dnovillo@redhat.com>
This file is part of GCC.
@@ -1168,6 +1169,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
break;
/* Binary arithmetic and logic expressions. */
case WIDEN_SUM_EXPR:
case WIDEN_MULT_EXPR:
case MULT_EXPR:
case PLUS_EXPR:
case MINUS_EXPR:
@@ -1686,6 +1689,16 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
pp_string (buffer, " > ");
break;
case DOT_PROD_EXPR:
pp_string (buffer, " DOT_PROD_EXPR < ");
dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
pp_string (buffer, " , ");
dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
pp_string (buffer, " , ");
dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
pp_string (buffer, " > ");
break;
case OMP_PARALLEL:
pp_string (buffer, "#pragma omp parallel");
dump_omp_clauses (buffer, OMP_PARALLEL_CLAUSES (node), spc, flags);
@@ -2105,10 +2118,13 @@ op_prio (tree op)
case RROTATE_EXPR:
return 11;
case WIDEN_SUM_EXPR:
case PLUS_EXPR:
case MINUS_EXPR:
return 12;
case WIDEN_MULT_EXPR:
case DOT_PROD_EXPR:
case MULT_EXPR:
case TRUNC_DIV_EXPR:
case CEIL_DIV_EXPR:
@@ -2263,6 +2279,12 @@ op_symbol_1 (enum tree_code code)
case REDUC_PLUS_EXPR:
return "r+";
case WIDEN_SUM_EXPR:
return "w+";
case WIDEN_MULT_EXPR:
return "w*";
case NEGATE_EXPR:
case MINUS_EXPR:
return "-";

gcc/tree-ssa-operands.c

@@ -1,5 +1,5 @@
/* SSA operands management for trees.
Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc.
Copyright (C) 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
This file is part of GCC.
@@ -1273,6 +1273,7 @@ get_expr_operands (tree stmt, tree *expr_p, int flags)
return;
}
case DOT_PROD_EXPR:
case REALIGN_LOAD_EXPR:
{
get_expr_operands (stmt, &TREE_OPERAND (expr, 0), flags);

gcc/tree-vect-analyze.c

@@ -1,5 +1,5 @@
/* Analysis Utilities for Loop Vectorization.
Copyright (C) 2003,2004,2005 Free Software Foundation, Inc.
Copyright (C) 2003,2004,2005,2006 Free Software Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
This file is part of GCC.
@@ -142,35 +142,46 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
return false;
}
if (STMT_VINFO_DATA_REF (stmt_info))
scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
else if (TREE_CODE (stmt) == MODIFY_EXPR)
scalar_type = TREE_TYPE (TREE_OPERAND (stmt, 0));
else
scalar_type = TREE_TYPE (stmt);
if (STMT_VINFO_VECTYPE (stmt_info))
{
vectype = STMT_VINFO_VECTYPE (stmt_info);
scalar_type = TREE_TYPE (vectype);
}
else
{
if (STMT_VINFO_DATA_REF (stmt_info))
scalar_type =
TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
else if (TREE_CODE (stmt) == MODIFY_EXPR)
scalar_type = TREE_TYPE (TREE_OPERAND (stmt, 0));
else
scalar_type = TREE_TYPE (stmt);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "get vectype for scalar type: ");
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "get vectype for scalar type: ");
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
}
vectype = get_vectype_for_scalar_type (scalar_type);
if (!vectype)
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
fprintf (vect_dump,
"not vectorized: unsupported data-type ");
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
}
return false;
}
STMT_VINFO_VECTYPE (stmt_info) = vectype;
}
vectype = get_vectype_for_scalar_type (scalar_type);
if (!vectype)
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
fprintf (vect_dump, "not vectorized: unsupported data-type ");
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
}
return false;
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "vectype: ");
print_generic_expr (vect_dump, vectype, TDF_SLIM);
}
STMT_VINFO_VECTYPE (stmt_info) = vectype;
nunits = TYPE_VECTOR_SUBPARTS (vectype);
if (vect_print_dump_info (REPORT_DETAILS))
@@ -1439,6 +1450,24 @@ vect_mark_relevant (VEC(tree,heap) **worklist, tree stmt,
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "mark relevant %d, live %d.",relevant_p, live_p);
if (STMT_VINFO_IN_PATTERN_P (stmt_info))
{
tree pattern_stmt;
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
as relevant/live because it's not going to be vectorized.
Instead mark the pattern-stmt that replaces it. */
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant_p = STMT_VINFO_RELEVANT_P (stmt_info);
save_live_p = STMT_VINFO_LIVE_P (stmt_info);
stmt = pattern_stmt;
}
STMT_VINFO_LIVE_P (stmt_info) |= live_p;
STMT_VINFO_RELEVANT_P (stmt_info) |= relevant_p;
@@ -2002,6 +2031,8 @@ vect_analyze_loop (struct loop *loop)
vect_analyze_scalar_cycles (loop_vinfo);
vect_pattern_recog (loop_vinfo);
/* Data-flow analysis to detect stmts that do not need to be vectorized. */
ok = vect_mark_stmts_to_be_vectorized (loop_vinfo);

gcc/tree-vect-generic.c

@@ -1,5 +1,5 @@
/* Lower vector operations to scalar operations.
Copyright (C) 2004, 2005 Free Software Foundation, Inc.
Copyright (C) 2004, 2005, 2006 Free Software Foundation, Inc.
This file is part of GCC.
@@ -411,6 +411,11 @@ expand_vector_operations_1 (block_stmt_iterator *bsi)
gcc_assert (code != CONVERT_EXPR);
op = optab_for_tree_code (code, type);
/* For widening vector operations, the relevant type is of the arguments,
not the widened result. */
if (code == WIDEN_SUM_EXPR)
type = TREE_TYPE (TREE_OPERAND (rhs, 0));
/* Optabs will try converting a negation into a subtraction, so
look for it as well. TODO: negation of floating-point vectors
might be turned into an exclusive OR toggling the sign bit. */

gcc/tree-vect-patterns.c (new file, 637 lines)

@@ -0,0 +1,637 @@
/* Analysis Utilities for Loop Vectorization.
Copyright (C) 2006 Free Software Foundation, Inc.
Contributed by Dorit Nuzman <dorit@il.ibm.com>
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING. If not, write to the Free
Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA. */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "ggc.h"
#include "tree.h"
#include "target.h"
#include "basic-block.h"
#include "diagnostic.h"
#include "tree-flow.h"
#include "tree-dump.h"
#include "timevar.h"
#include "cfgloop.h"
#include "expr.h"
#include "optabs.h"
#include "params.h"
#include "tree-data-ref.h"
#include "tree-vectorizer.h"
#include "recog.h"
#include "toplev.h"
/* Function prototypes */
static void vect_pattern_recog_1
(tree (* ) (tree, tree *, tree *), block_stmt_iterator);
static bool widened_name_p (tree, tree, tree *, tree *);
/* Pattern recognition functions */
static tree vect_recog_widen_sum_pattern (tree, tree *, tree *);
static tree vect_recog_widen_mult_pattern (tree, tree *, tree *);
static tree vect_recog_dot_prod_pattern (tree, tree *, tree *);
static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_widen_mult_pattern,
vect_recog_widen_sum_pattern,
vect_recog_dot_prod_pattern};
/* Function widened_name_p
Check whether NAME, an ssa-name used in USE_STMT,
is a result of a type-promotion, such that:
DEF_STMT: NAME = NOP (name0)
where the type of name0 (HALF_TYPE) is smaller than the type of NAME.
*/
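/* Editorial example (not in the original source): given
     S1: s_2 = ...;                 (short)
     S2: i_1 = (int) s_2;           (the type promotion)
     S3: sum_4 = i_1 + sum_3;       (USE_STMT)
   widened_name_p (i_1, S3, &half_type, &def_stmt) returns true, setting
   *HALF_TYPE to 'short' and *DEF_STMT to S2.  */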
static bool
widened_name_p (tree name, tree use_stmt, tree *half_type, tree *def_stmt)
{
tree dummy;
loop_vec_info loop_vinfo;
stmt_vec_info stmt_vinfo;
tree expr;
tree type = TREE_TYPE (name);
tree oprnd0;
enum vect_def_type dt;
tree def;
stmt_vinfo = vinfo_for_stmt (use_stmt);
loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
if (!vect_is_simple_use (name, loop_vinfo, def_stmt, &def, &dt))
return false;
if (dt != vect_loop_def
&& dt != vect_invariant_def && dt != vect_constant_def)
return false;
if (! *def_stmt)
return false;
if (TREE_CODE (*def_stmt) != MODIFY_EXPR)
return false;
expr = TREE_OPERAND (*def_stmt, 1);
if (TREE_CODE (expr) != NOP_EXPR)
return false;
oprnd0 = TREE_OPERAND (expr, 0);
*half_type = TREE_TYPE (oprnd0);
if (!INTEGRAL_TYPE_P (type) || !INTEGRAL_TYPE_P (*half_type)
|| (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (*half_type))
|| (TYPE_PRECISION (type) < (TYPE_PRECISION (*half_type) * 2)))
return false;
if (!vect_is_simple_use (oprnd0, loop_vinfo, &dummy, &dummy, &dt))
return false;
if (dt != vect_invariant_def && dt != vect_constant_def
&& dt != vect_loop_def)
return false;
return true;
}
/* Function vect_recog_dot_prod_pattern
Try to find the following pattern:
type x_t, y_t;
TYPE1 prod;
TYPE2 sum = init;
loop:
sum_0 = phi <init, sum_1>
S1 x_t = ...
S2 y_t = ...
S3 x_T = (TYPE1) x_t;
S4 y_T = (TYPE1) y_t;
S5 prod = x_T * y_T;
[S6 prod = (TYPE2) prod; #optional]
S7 sum_1 = prod + sum_0;
where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is of the
same size as 'TYPE1' or bigger. This is a special case of a reduction
computation.
Input:
* LAST_STMT: A stmt from which the pattern search begins. In the example,
when this function is called with S7, the pattern {S3,S4,S5,S6,S7} will be
detected.
Output:
* TYPE_IN: The type of the input arguments to the pattern.
* TYPE_OUT: The type of the output of this pattern.
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
*/
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
{
tree stmt, expr;
tree oprnd0, oprnd1;
tree oprnd00, oprnd01;
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
tree type, half_type;
tree pattern_expr;
tree prod_type;
if (TREE_CODE (last_stmt) != MODIFY_EXPR)
return NULL;
expr = TREE_OPERAND (last_stmt, 1);
type = TREE_TYPE (expr);
/* Look for the following pattern
DX = (TYPE1) X;
DY = (TYPE1) Y;
DPROD = DX * DY;
DDPROD = (TYPE2) DPROD;
sum_1 = DDPROD + sum_0;
In which
- DX is double the size of X
- DY is double the size of Y
- DX, DY, DPROD all have the same type
- sum is the same size as DPROD or bigger
- sum has been recognized as a reduction variable.
This is equivalent to:
DPROD = X w* Y; #widen mult
sum_1 = DPROD w+ sum_0; #widen summation
or
DPROD = X w* Y; #widen mult
sum_1 = DPROD + sum_0; #summation
*/
/* Starting from LAST_STMT, follow the defs of its uses in search
of the above pattern. */
if (TREE_CODE (expr) != PLUS_EXPR)
return NULL;
if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
{
/* Has been detected as widening-summation? */
stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo);
expr = TREE_OPERAND (stmt, 1);
type = TREE_TYPE (expr);
if (TREE_CODE (expr) != WIDEN_SUM_EXPR)
return NULL;
oprnd0 = TREE_OPERAND (expr, 0);
oprnd1 = TREE_OPERAND (expr, 1);
half_type = TREE_TYPE (oprnd0);
}
else
{
tree def_stmt;
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
return NULL;
oprnd0 = TREE_OPERAND (expr, 0);
oprnd1 = TREE_OPERAND (expr, 1);
if (TYPE_MAIN_VARIANT (TREE_TYPE (oprnd0)) != TYPE_MAIN_VARIANT (type)
|| TYPE_MAIN_VARIANT (TREE_TYPE (oprnd1)) != TYPE_MAIN_VARIANT (type))
return NULL;
stmt = last_stmt;
if (widened_name_p (oprnd0, stmt, &half_type, &def_stmt))
{
stmt = def_stmt;
expr = TREE_OPERAND (stmt, 1);
oprnd0 = TREE_OPERAND (expr, 0);
}
else
half_type = type;
}
/* So far so good. Since last_stmt was detected as a (summation) reduction,
we know that oprnd1 is the reduction variable (defined by a loop-header
phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
Left to check that oprnd0 is defined by a (widen_)mult_expr */
prod_type = half_type;
stmt = SSA_NAME_DEF_STMT (oprnd0);
gcc_assert (stmt);
stmt_vinfo = vinfo_for_stmt (stmt);
gcc_assert (stmt_vinfo);
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_loop_def);
expr = TREE_OPERAND (stmt, 1);
if (TREE_CODE (expr) != MULT_EXPR)
return NULL;
if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
{
/* Has been detected as a widening multiplication? */
stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo);
expr = TREE_OPERAND (stmt, 1);
if (TREE_CODE (expr) != WIDEN_MULT_EXPR)
return NULL;
stmt_vinfo = vinfo_for_stmt (stmt);
gcc_assert (stmt_vinfo);
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_loop_def);
oprnd00 = TREE_OPERAND (expr, 0);
oprnd01 = TREE_OPERAND (expr, 1);
}
else
{
tree half_type0, half_type1;
tree def_stmt;
tree oprnd0, oprnd1;
oprnd0 = TREE_OPERAND (expr, 0);
oprnd1 = TREE_OPERAND (expr, 1);
if (TYPE_MAIN_VARIANT (TREE_TYPE (oprnd0))
!= TYPE_MAIN_VARIANT (prod_type)
|| TYPE_MAIN_VARIANT (TREE_TYPE (oprnd1))
!= TYPE_MAIN_VARIANT (prod_type))
return NULL;
if (!widened_name_p (oprnd0, stmt, &half_type0, &def_stmt))
return NULL;
oprnd00 = TREE_OPERAND (TREE_OPERAND (def_stmt, 1), 0);
if (!widened_name_p (oprnd1, stmt, &half_type1, &def_stmt))
return NULL;
oprnd01 = TREE_OPERAND (TREE_OPERAND (def_stmt, 1), 0);
if (TYPE_MAIN_VARIANT (half_type0) != TYPE_MAIN_VARIANT (half_type1))
return NULL;
if (TYPE_PRECISION (prod_type) != TYPE_PRECISION (half_type0) * 2)
return NULL;
}
half_type = TREE_TYPE (oprnd00);
*type_in = half_type;
*type_out = type;
/* Pattern detected. Create a stmt to be used to replace the pattern: */
pattern_expr = build3 (DOT_PROD_EXPR, type, oprnd00, oprnd01, oprnd1);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "vect_recog_dot_prod_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
return pattern_expr;
}
/* Function vect_recog_widen_mult_pattern
Try to find the following pattern:
type a_t, b_t;
TYPE a_T, b_T, prod_T;
S1 a_t = ;
S2 b_t = ;
S3 a_T = (TYPE) a_t;
S4 b_T = (TYPE) b_t;
S5 prod_T = a_T * b_T;
where type 'TYPE' is at least double the size of type 'type'.
Input:
* LAST_STMT: A stmt from which the pattern search begins. In the example,
when this function is called with S5, the pattern {S3,S4,S5} will be detected.
Output:
* TYPE_IN: The type of the input arguments to the pattern.
* TYPE_OUT: The type of the output of this pattern.
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_MULT <a_t, b_t>
*/
static tree
vect_recog_widen_mult_pattern (tree last_stmt ATTRIBUTE_UNUSED,
tree *type_in ATTRIBUTE_UNUSED,
tree *type_out ATTRIBUTE_UNUSED)
{
/* Yet to be implemented. */
return NULL;
}
/* Function vect_recog_widen_sum_pattern
Try to find the following pattern:
type x_t;
TYPE x_T, sum = init;
loop:
sum_0 = phi <init, sum_1>
S1 x_t = *p;
S2 x_T = (TYPE) x_t;
S3 sum_1 = x_T + sum_0;
where type 'TYPE' is at least double the size of type 'type', i.e., we're
summing elements of type 'type' into an accumulator of type 'TYPE'. This is
a special case of a reduction computation.
Input:
* LAST_STMT: A stmt from which the pattern search begins. In the example,
when this function is called with S3, the pattern {S2,S3} will be detected.
Output:
* TYPE_IN: The type of the input arguments to the pattern.
* TYPE_OUT: The type of the output of this pattern.
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
*/
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
{
tree stmt, expr;
tree oprnd0, oprnd1;
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
tree type, half_type;
tree pattern_expr;
if (TREE_CODE (last_stmt) != MODIFY_EXPR)
return NULL;
expr = TREE_OPERAND (last_stmt, 1);
type = TREE_TYPE (expr);
/* Look for the following pattern
DX = (TYPE) X;
sum_1 = DX + sum_0;
In which DX is at least double the size of X, and sum_1 has been
recognized as a reduction variable.
*/
/* Starting from LAST_STMT, follow the defs of its uses in search
of the above pattern. */
if (TREE_CODE (expr) != PLUS_EXPR)
return NULL;
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
return NULL;
oprnd0 = TREE_OPERAND (expr, 0);
oprnd1 = TREE_OPERAND (expr, 1);
if (TYPE_MAIN_VARIANT (TREE_TYPE (oprnd0)) != TYPE_MAIN_VARIANT (type)
|| TYPE_MAIN_VARIANT (TREE_TYPE (oprnd1)) != TYPE_MAIN_VARIANT (type))
return NULL;
/* So far so good. Since last_stmt was detected as a (summation) reduction,
we know that oprnd1 is the reduction variable (defined by a loop-header
phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
Left to check that oprnd0 is defined by a cast from type 'type' to type
'TYPE'. */
if (!widened_name_p (oprnd0, last_stmt, &half_type, &stmt))
return NULL;
oprnd0 = TREE_OPERAND (TREE_OPERAND (stmt, 1), 0);
*type_in = half_type;
*type_out = type;
/* Pattern detected. Create a stmt to be used to replace the pattern: */
pattern_expr = build2 (WIDEN_SUM_EXPR, type, oprnd0, oprnd1);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "vect_recog_widen_sum_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
return pattern_expr;
}
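/* Illustration only (not part of the patch): a source loop matching the
widen-sum pattern documented above, assuming shorts summed into an int
accumulator (names invented; S2/S3 as in the documentation).

   int
   widen_sum_example (short *p, int n)
   {
     int i, sum = 0;
     for (i = 0; i < n; i++)
       sum += (int) p[i];                   (S2: cast; S3: summation)
     return sum;
   }
*/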
/* Function vect_pattern_recog_1
Input:
PATTERN_RECOG_FUNC: A pointer to a function that detects a certain
computation pattern.
STMT: A stmt from which the pattern search should start.
If PATTERN_RECOG_FUNC successfully detected the pattern, it creates an
expression that computes the same functionality and can be used to
replace the sequence of stmts that are involved in the pattern.
Output:
This function checks if the expression returned by PATTERN_RECOG_FUNC is
supported in vector form by the target. We use 'TYPE_IN' to obtain the
relevant vector type. If 'TYPE_IN' is already a vector type, then this
indicates that target support had already been checked by PATTERN_RECOG_FUNC.
If 'TYPE_OUT' is also returned by PATTERN_RECOG_FUNC, we check that it fits
to the available target pattern.
This function also does some bookkeeping, as explained in the documentation
for vect_pattern_recog. */
static void
vect_pattern_recog_1 (
tree (* vect_recog_func_ptr) (tree, tree *, tree *),
block_stmt_iterator si)
{
tree stmt = bsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
stmt_vec_info pattern_stmt_info;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
tree pattern_expr;
tree pattern_vectype;
tree type_in, type_out;
tree pattern_type;
enum tree_code code;
tree var, var_name;
stmt_ann_t ann;
pattern_expr = (* vect_recog_func_ptr) (stmt, &type_in, &type_out);
if (!pattern_expr)
return;
if (VECTOR_MODE_P (TYPE_MODE (type_in)))
{
/* No need to check target support (already checked by the pattern
recognition function). */
pattern_vectype = type_in;
}
else
{
enum machine_mode vec_mode;
enum insn_code icode;
optab optab;
/* Check target support */
pattern_vectype = get_vectype_for_scalar_type (type_in);
optab = optab_for_tree_code (TREE_CODE (pattern_expr), pattern_vectype);
vec_mode = TYPE_MODE (pattern_vectype);
if (!optab
|| (icode = optab->handlers[(int) vec_mode].insn_code) ==
CODE_FOR_nothing
|| (type_out
&& (insn_data[icode].operand[0].mode !=
TYPE_MODE (get_vectype_for_scalar_type (type_out)))))
return;
}
/* Found a vectorizable pattern. */
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "pattern recognized: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
/* Mark the stmts that are involved in the pattern,
create a new stmt to express the pattern and insert it. */
code = TREE_CODE (pattern_expr);
pattern_type = TREE_TYPE (pattern_expr);
var = create_tmp_var (pattern_type, "patt");
add_referenced_tmp_var (var);
var_name = make_ssa_name (var, NULL_TREE);
pattern_expr = build2 (MODIFY_EXPR, void_type_node, var_name, pattern_expr);
SSA_NAME_DEF_STMT (var_name) = pattern_expr;
bsi_insert_before (&si, pattern_expr, BSI_SAME_STMT);
ann = stmt_ann (pattern_expr);
set_stmt_info ((tree_ann_t)ann, new_stmt_vec_info (pattern_expr, loop_vinfo));
pattern_stmt_info = vinfo_for_stmt (pattern_expr);
STMT_VINFO_RELATED_STMT (pattern_stmt_info) = stmt;
STMT_VINFO_DEF_TYPE (pattern_stmt_info) = STMT_VINFO_DEF_TYPE (stmt_info);
STMT_VINFO_VECTYPE (pattern_stmt_info) = pattern_vectype;
STMT_VINFO_IN_PATTERN_P (stmt_info) = true;
STMT_VINFO_RELATED_STMT (stmt_info) = pattern_expr;
return;
}
/* Function vect_pattern_recog
Input:
LOOP_VINFO - a struct_loop_info of a loop in which we want to look for
computation idioms.
Output - for each computation idiom that is detected we insert a new stmt
that provides the same functionality and that can be vectorized. We
also record some information in the struct_stmt_info of the relevant
stmts, as explained below:
At the entry to this function we have the following stmts, with the
following initial value in the STMT_VINFO fields:
stmt in_pattern_p related_stmt vec_stmt
S1: a_i = .... - - -
S2: a_2 = ..use(a_i).. - - -
S3: a_1 = ..use(a_2).. - - -
S4: a_0 = ..use(a_1).. - - -
S5: ... = ..use(a_0).. - - -
Say the sequence {S1,S2,S3,S4} was detected as a pattern that can be
represented by a single stmt. We then:
- create a new stmt S6 that will replace the pattern.
- insert the new stmt S6 before the last stmt in the pattern
- fill in the STMT_VINFO fields as follows:
in_pattern_p related_stmt vec_stmt
S1: a_i = .... - - -
S2: a_2 = ..use(a_i).. - - -
S3: a_1 = ..use(a_2).. - - -
> S6: a_new = .... - S4 -
S4: a_0 = ..use(a_1).. true S6 -
S5: ... = ..use(a_0).. - - -
(the last stmt in the pattern (S4) and the new pattern stmt (S6) point
to each other through the RELATED_STMT field).
S6 will be marked as relevant in vect_mark_stmts_to_be_vectorized instead
of S4 because it will replace all its uses. Stmts {S1,S2,S3} will
remain irrelevant unless used by stmts other than S4.
If vectorization succeeds, vect_transform_stmt will skip over {S1,S2,S3}
(because they are marked as irrelevant). It will vectorize S6, and record
a pointer to the new vector stmt VS6 both from S6 (as usual), and also
from S4. We do that so that when we get to vectorizing stmts that use the
def of S4 (like S5 that uses a_0), we'll know where to take the relevant
vector-def from. S4 will be skipped, and S5 will be vectorized as usual:
in_pattern_p related_stmt vec_stmt
S1: a_i = .... - - -
S2: a_2 = ..use(a_i).. - - -
S3: a_1 = ..use(a_2).. - - -
> VS6: va_new = .... - - -
S6: a_new = .... - S4 VS6
S4: a_0 = ..use(a_1).. true S6 VS6
> VS5: ... = ..vuse(va_new).. - - -
S5: ... = ..use(a_0).. - - -
DCE could then get rid of {S1,S2,S3,S4,S5,S6} (if their defs are not used
elsewhere), and we'll end up with:
VS6: va_new = ....
VS5: ... = ..vuse(va_new)..
If vectorization does not succeed, DCE will clean S6 away (its def is
not used), and we'll end up with the original sequence.
*/
void
vect_pattern_recog (loop_vec_info loop_vinfo)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
unsigned int nbbs = loop->num_nodes;
block_stmt_iterator si;
tree stmt;
unsigned int i, j;
tree (* vect_recog_func_ptr) (tree, tree *, tree *);
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_pattern_recog ===");
/* Scan through the loop stmts, applying the pattern recognition
functions starting at each stmt visited: */
for (i = 0; i < nbbs; i++)
{
basic_block bb = bbs[i];
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
stmt = bsi_stmt (si);
/* Scan over all generic vect_recog_xxx_pattern functions. */
for (j = 0; j < NUM_PATTERNS; j++)
{
vect_recog_func_ptr = vect_vect_recog_func_ptrs[j];
vect_pattern_recog_1 (vect_recog_func_ptr, si);
}
}
}
}
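/* For reference: the NUM_PATTERNS recognizers scanned above come from
vect_vect_recog_func_ptrs, an array defined earlier in this file. A sketch
of what it is expected to contain (the exact order is not shown in this
excerpt):

   static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
     vect_recog_widen_mult_pattern,
     vect_recog_widen_sum_pattern,
     vect_recog_dot_prod_pattern};
*/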

@@ -1,5 +1,5 @@
/* Transformation Utilities for Loop Vectorization.
Copyright (C) 2003,2004,2005,2006 Free Software Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
This file is part of GCC.
@@ -59,6 +59,7 @@ static void vect_finish_stmt_generation
(tree stmt, tree vec_stmt, block_stmt_iterator *bsi);
static bool vect_is_simple_cond (tree, loop_vec_info);
static void update_vuses_to_preheader (tree, struct loop*);
static void vect_create_epilog_for_reduction (tree, tree, enum tree_code, tree);
static tree get_initial_def_for_reduction (tree, tree, tree *);
/* Utility function dealing with loop peeling (not peeling itself). */
@@ -656,6 +657,8 @@ get_initial_def_for_reduction (tree stmt, tree init_val, tree *scalar_def)
switch (code)
{
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
if (INTEGRAL_TYPE_P (type))
def = build_int_cst (type, 0);
@@ -711,66 +714,66 @@ get_initial_def_for_reduction (tree stmt, tree init_val, tree *scalar_def)
}
/* Function vect_create_epilog_for_reduction
Create code at the loop-epilog to finalize the result of a reduction
computation.
VECT_DEF is a vector of partial results.
REDUC_CODE is the tree-code for the epilog reduction.
STMT is the scalar reduction stmt that is being vectorized.
REDUCTION_PHI is the phi-node that carries the reduction computation.
This function:
1. Creates the reduction def-use cycle: sets the arguments for
REDUCTION_PHI:
The loop-entry argument is the vectorized initial-value of the reduction.
The loop-latch argument is VECT_DEF - the vector of partial sums.
2. "Reduces" the vector of partial results VECT_DEF into a single result,
by applying the operation specified by REDUC_CODE if available, or by
other means (whole-vector shifts or a scalar loop).
The function also creates a new phi node at the loop exit to preserve
loop-closed form, as illustrated below.
The flow at the entry to this function:
loop:
vec_def = phi <null, null> # REDUCTION_PHI
VECT_DEF = vector_stmt # vectorized form of STMT
s_loop = scalar_stmt # (scalar) STMT
loop_exit:
s_out0 = phi <s_loop> # (scalar) EXIT_PHI
use <s_out0>
use <s_out0>
The above is transformed by this function into:
loop:
vec_def = phi <vec_init, VECT_DEF> # REDUCTION_PHI
VECT_DEF = vector_stmt # vectorized form of STMT
s_loop = scalar_stmt # (scalar) STMT
loop_exit:
s_out0 = phi <s_loop> # (scalar) EXIT_PHI
v_out1 = phi <VECT_DEF> # NEW_EXIT_PHI
v_out2 = reduce <v_out1>
s_out3 = extract_field <v_out2, 0>
s_out4 = adjust_result <s_out3>
use <s_out4>
use <s_out4>
*/
static void
vect_create_epilog_for_reduction (tree vect_def, tree stmt,
enum tree_code reduc_code, tree reduction_phi)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype;
enum machine_mode mode;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block exit_bb;
tree scalar_dest;
tree scalar_type;
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
@@ -786,7 +789,16 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
imm_use_iterator imm_iter;
use_operand_p use_p;
bool extract_scalar_result;
tree reduction_op;
tree orig_stmt;
tree operation = TREE_OPERAND (stmt, 1);
int op_type;
op_type = TREE_CODE_LENGTH (TREE_CODE (operation));
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
mode = TYPE_MODE (vectype);
/*** 1. Create the reduction def-use cycle ***/
/* 1.1 set the loop-entry arg of the reduction-phi: */
@@ -797,7 +809,6 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
&scalar_initial_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
add_phi_arg (reduction_phi, vect_def, loop_latch_edge (loop));
@@ -810,7 +821,32 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
}
/*** 2. Create epilog code
The reduction epilog code operates across the elements of the vector
of partial results computed by the vectorized loop.
The reduction epilog code consists of:
step 1: compute the scalar result in a vector (v_out2)
step 2: extract the scalar result (s_out3) from the vector (v_out2)
step 3: adjust the scalar result (s_out3) if needed.
Step 1 can be accomplished using one of the following three schemes:
(scheme 1) using reduc_code, if available.
(scheme 2) using whole-vector shifts, if available.
(scheme 3) using a scalar loop. In this case steps 1+2 above are
combined.
The overall epilog code looks like this:
s_out0 = phi <s_loop> # original EXIT_PHI
v_out1 = phi <VECT_DEF> # NEW_EXIT_PHI
v_out2 = reduce <v_out1> # step 1
s_out3 = extract_field <v_out2, 0> # step 2
s_out4 = adjust_result <s_out3> # step 3
(step 3 is optional, and steps 1 and 2 may be combined).
Lastly, the uses of s_out0 are replaced by s_out4.
***/
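/* A worked illustration of scheme 2 (invented values): reducing a V4SI
vector of partial sums {s0,s1,s2,s3}, 32-bit elements, result expected in
element 0 ('_' marks lanes whose contents no longer matter):
   va  = {s0, s1, s2, s3}
   va' = vec_shift <va, 64>     =>  {s2, s3, _, _}
   va  = va + va'               =>  {s0+s2, s1+s3, _, _}
   va' = vec_shift <va, 32>     =>  {s1+s3, _, _, _}
   va  = va + va'               =>  {s0+s1+s2+s3, _, _, _}
   s   = extract_field <va, 0>      (step 2) */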
/* 2.1 Create new loop-exit-phi to preserve loop-closed form:
v_out1 = phi <v_loop> */
@@ -818,15 +854,39 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
exit_bb = loop->single_exit->dest;
new_phi = create_phi_node (SSA_NAME_VAR (vect_def), exit_bb);
SET_PHI_ARG_DEF (new_phi, loop->single_exit->dest_idx, vect_def);
exit_bsi = bsi_start (exit_bb);
/* 2.2 Get the relevant tree-code to use in the epilog for schemes 2,3
(i.e. when reduc_code is not available) and in the final adjustment code
(if needed). Also get the original scalar reduction variable as
defined in the loop. In case STMT is a "pattern-stmt" (i.e., it
represents a reduction pattern), the tree-code and scalar-def are
taken from the original stmt that the pattern-stmt (STMT) replaces.
Otherwise (it is a regular reduction) - the tree-code and scalar-def
are taken from STMT. */
orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
if (!orig_stmt)
{
/* Regular reduction */
orig_stmt = stmt;
}
else
{
/* Reduction pattern */
stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt);
gcc_assert (STMT_VINFO_IN_PATTERN_P (stmt_vinfo));
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt);
}
code = TREE_CODE (TREE_OPERAND (orig_stmt, 1));
scalar_dest = TREE_OPERAND (orig_stmt, 0);
scalar_type = TREE_TYPE (scalar_dest);
new_scalar_dest = vect_create_destination_var (scalar_dest, NULL);
bitsize = TYPE_SIZE (scalar_type);
bytesize = TYPE_SIZE_UNIT (scalar_type);
/* 2.3 Create the reduction code, using one of the three schemes described
above. */
if (reduc_code < NUM_TREE_CODES)
{
@@ -849,16 +909,11 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
{
enum tree_code shift_code = 0;
bool have_whole_vector_shift = true;
int bit_offset;
int element_bitsize = tree_low_cst (bitsize, 1);
int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
tree vec_temp;
if (vec_shr_optab->handlers[mode].insn_code != CODE_FOR_nothing)
shift_code = VEC_RSHIFT_EXPR;
else
@@ -881,7 +936,7 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
if (have_whole_vector_shift)
{
/*** Case 2: Create:
for (offset = VS/2; offset >= element_size; offset/=2)
{
Create: va' = vec_shift <va, offset>
@@ -905,17 +960,12 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
new_name = make_ssa_name (vec_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_name;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
epilog_stmt = build2 (MODIFY_EXPR, vectype, vec_dest,
build2 (code, vectype, new_name, new_temp));
new_temp = make_ssa_name (vec_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
}
extract_scalar_result = true;
@@ -924,10 +974,11 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
{
tree rhs;
/*** Case 3: Create:
s = extract_field <v_out2, 0>
for (offset = element_size;
offset < vector_size;
offset += element_size;)
{
Create: s' = extract_field <v_out2, offset>
Create: s = op <s, s'>
@@ -938,18 +989,13 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
vec_temp = PHI_RESULT (new_phi);
vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
bitsize_zero_node);
BIT_FIELD_REF_UNSIGNED (rhs) = TYPE_UNSIGNED (scalar_type);
epilog_stmt = build2 (MODIFY_EXPR, scalar_type, new_scalar_dest, rhs);
new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
for (bit_offset = element_bitsize;
bit_offset < vec_size_in_bits;
@@ -965,25 +1011,19 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
new_name = make_ssa_name (new_scalar_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_name;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
epilog_stmt = build2 (MODIFY_EXPR, scalar_type, new_scalar_dest,
build2 (code, scalar_type, new_name, new_temp));
new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
}
extract_scalar_result = false;
}
}
/* 2.4 Extract the final scalar result. Create:
s_out3 = extract_field <v_out2, bitpos> */
if (extract_scalar_result)
@@ -993,7 +1033,6 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "extract scalar result");
if (BYTES_BIG_ENDIAN)
bitpos = size_binop (MULT_EXPR,
bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
@@ -1007,17 +1046,14 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
}
/* 2.4 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
'scalar_initial_def' is zero).
Create:
s_out4 = scalar_expr <s_out3, scalar_initial_def> */
if (scalar_initial_def)
{
@@ -1026,18 +1062,13 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
TREE_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
}
/* 2.5 Replace uses of s_out0 with uses of s_out3 */
/* Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
{
@@ -1047,9 +1078,10 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
break;
}
}
/* We expect to have found an exit_phi because of loop-closed-ssa form. */
gcc_assert (exit_phi);
/* Replace the uses: */
orig_name = PHI_RESULT (exit_phi);
FOR_EACH_IMM_USE_SAFE (use_p, imm_iter, orig_name)
SET_USE (use_p, new_temp);
}
@@ -1060,33 +1092,69 @@ vect_create_epilog_for_reduction (tree vect_def, tree stmt, tree reduction_op,
Check if STMT performs a reduction operation that can be vectorized.
If VEC_STMT is also passed, vectorize the STMT: create a vectorized
stmt to replace it, put it in VEC_STMT, and insert it at BSI.
Return FALSE if not a vectorizable STMT, TRUE otherwise.
This function also handles reduction idioms (patterns) that have been
recognized in advance during vect_pattern_recog. In this case, STMT may be
of this form:
X = pattern_expr (arg0, arg1, ..., X)
and its STMT_VINFO_RELATED_STMT points to the last stmt in the original
sequence that had been detected and replaced by the pattern-stmt (STMT).
In some cases of reduction patterns, the type of the reduction variable X is
different than the type of the other arguments of STMT.
In such cases, the vectype that is used when transforming STMT into a vector
stmt is different than the vectype that is used to determine the
vectorization factor, because it consists of a different number of elements
than the actual number of elements that are being operated upon in parallel.
For example, consider an accumulation of shorts into an int accumulator.
On some targets it's possible to vectorize this pattern operating on 8
shorts at a time (hence, the vectype for purposes of determining the
vectorization factor should be V8HI); on the other hand, the vectype that
is used to create the vector form is actually V4SI (the type of the result).
Upon entry to this function, STMT_VINFO_VECTYPE records the vectype that
indicates what is the actual level of parallelism (V8HI in the example), so
that the right vectorization factor would be derived. This vectype
corresponds to the type of arguments to the reduction stmt, and should *NOT*
be used to create the vectorized stmt. The right vectype for the vectorized
stmt is obtained from the type of the result X:
get_vectype_for_scalar_type (TREE_TYPE (X))
This means that, contrary to "regular" reductions (or "regular" stmts in
general), the following equation:
STMT_VINFO_VECTYPE == get_vectype_for_scalar_type (TREE_TYPE (X))
does *NOT* necessarily hold for reduction patterns. */
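/* To make the example above concrete (illustrative, assuming a 128-bit
vector unit), for the accumulation of shorts into an int accumulator:
   STMT: int_acc = widen_sum <short_x, int_acc>
   STMT_VINFO_VECTYPE (drives the vectorization factor) = V8HI => VF = 8
   vectype used to create the vector stmt =
       get_vectype_for_scalar_type (int) = V4SI */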
bool
vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
{
tree vec_dest;
tree scalar_dest;
tree op;
tree loop_vec_def0, loop_vec_def1;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree operation;
enum tree_code code, orig_code, epilog_reduc_code = 0;
enum machine_mode vec_mode;
int op_type;
optab optab, reduc_optab;
tree new_temp;
tree def, def_stmt;
enum vect_def_type dt;
tree new_phi;
tree scalar_type;
bool is_simple_use;
tree orig_stmt;
stmt_vec_info orig_stmt_info;
tree expr = NULL_TREE;
int i;
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
if (STMT_VINFO_RELEVANT_P (stmt_info))
@@ -1095,43 +1163,68 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
if (!STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)
return false;
/* 2. Has this been recognized as a reduction pattern?
Check if STMT represents a pattern that has been recognized
in earlier analysis stages. For stmts that represent a pattern,
the STMT_VINFO_RELATED_STMT field records the last stmt in
the original sequence that constitutes the pattern. */
orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
if (orig_stmt)
{
orig_stmt_info = vinfo_for_stmt (orig_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt);
gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
}
/* 3. Check the operands of the operation. The first operands are defined
inside the loop body. The last operand is the reduction variable,
which is defined by the loop-header-phi. */
gcc_assert (TREE_CODE (stmt) == MODIFY_EXPR);
operation = TREE_OPERAND (stmt, 1);
code = TREE_CODE (operation);
op_type = TREE_CODE_LENGTH (code);
if (op_type != binary_op && op_type != ternary_op)
return false;
scalar_dest = TREE_OPERAND (stmt, 0);
scalar_type = TREE_TYPE (scalar_dest);
/* All uses but the last are expected to be defined in the loop.
The last use is the reduction variable. */
for (i = 0; i < op_type-1; i++)
{
op = TREE_OPERAND (operation, i);
is_simple_use = vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt);
gcc_assert (is_simple_use);
gcc_assert (dt == vect_loop_def || dt == vect_invariant_def ||
dt == vect_constant_def);
}
op = TREE_OPERAND (operation, i);
is_simple_use = vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt);
gcc_assert (is_simple_use);
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
gcc_assert (orig_stmt == vect_is_simple_reduction (loop, def_stmt));
else
gcc_assert (stmt == vect_is_simple_reduction (loop, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
/* 4. Supportable by target? */
/* 4.1. check support for the operation in the loop */
optab = optab_for_tree_code (code, vectype);
if (!optab)
{
@@ -1162,21 +1255,69 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
return false;
}
/* 4.2. Check support for the epilog operation.
If STMT represents a reduction pattern, then the type of the
reduction variable may be different than the type of the rest
of the arguments. For example, consider the case of accumulation
of shorts into an int accumulator; the original code:
S1: int_a = (int) short_a;
orig_stmt-> S2: int_acc = plus <int_a ,int_acc>;
was replaced with:
STMT: int_acc = widen_sum <short_a, int_acc>
This means that:
1. The tree-code that is used to create the vector operation in the
epilog code (that reduces the partial results) is not the
tree-code of STMT, but is rather the tree-code of the original
stmt from the pattern that STMT is replacing. I.e., in the example
above we want to use 'widen_sum' in the loop, but 'plus' in the
epilog.
2. The type (mode) we use to check available target support
for the vector operation to be created in the *epilog*, is
determined by the type of the reduction variable (in the example
above we'd check this: plus_optab[vect_int_mode]).
However the type (mode) we use to check available target support
for the vector operation to be created *inside the loop*, is
determined by the type of the other arguments to STMT (in the
example we'd check this: widen_sum_optab[vect_short_mode]).
This is contrary to "regular" reductions, in which the types of all
the arguments are the same as the type of the reduction variable.
For "regular" reductions we can therefore use the same vector type
(and also the same tree-code) when generating the epilog code and
when generating the code inside the loop. */
if (orig_stmt)
{
/* This is a reduction pattern: get the vectype from the type of the
reduction variable, and get the tree-code from orig_stmt. */
orig_code = TREE_CODE (TREE_OPERAND (orig_stmt, 1));
vectype = get_vectype_for_scalar_type (TREE_TYPE (def));
vec_mode = TYPE_MODE (vectype);
}
else
{
/* Regular reduction: the vectype and tree-code used for the vector code
inside the loop can also be used for the epilog code. */
orig_code = code;
}
if (!reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
return false;
reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype);
if (!reduc_optab)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "no optab for reduction.");
epilog_reduc_code = NUM_TREE_CODES;
}
if (reduc_optab->handlers[(int) vec_mode].insn_code == CODE_FOR_nothing)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "reduc op not supported by target.");
epilog_reduc_code = NUM_TREE_CODES;
}
if (!vec_stmt) /* transformation not required. */
@@ -1193,25 +1334,31 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
/* Create the destination vector */
vec_dest = vect_create_destination_var (scalar_dest, vectype);
/* Create the reduction-phi that defines the reduction-operand. */
new_phi = create_phi_node (vec_dest, loop->header);
/* Prepare the operand that is defined inside the loop body */
op = TREE_OPERAND (operation, 0);
loop_vec_def0 = vect_get_vec_def_for_operand (op, stmt, NULL);
if (op_type == binary_op)
expr = build2 (code, vectype, loop_vec_def0, PHI_RESULT (new_phi));
else if (op_type == ternary_op)
{
op = TREE_OPERAND (operation, 1);
loop_vec_def1 = vect_get_vec_def_for_operand (op, stmt, NULL);
expr = build3 (code, vectype, loop_vec_def0, loop_vec_def1,
PHI_RESULT (new_phi));
}
/* Create the vectorized operation that computes the partial results */
*vec_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, expr);
new_temp = make_ssa_name (vec_dest, *vec_stmt);
TREE_OPERAND (*vec_stmt, 0) = new_temp;
vect_finish_stmt_generation (stmt, *vec_stmt, bsi);
/* Finalize the reduction-phi (set its arguments) and create the
epilog reduction code. */
vect_create_epilog_for_reduction (new_temp, stmt, epilog_reduc_code, new_phi);
return true;
}
@@ -2040,6 +2187,7 @@ vect_transform_stmt (tree stmt, block_stmt_iterator *bsi)
bool is_store = false;
tree vec_stmt = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree orig_stmt_in_pattern;
bool done;
if (STMT_VINFO_RELEVANT_P (stmt_info))
@@ -2078,7 +2226,25 @@ vect_transform_stmt (tree stmt, block_stmt_iterator *bsi)
gcc_unreachable ();
}
gcc_assert (vec_stmt);
STMT_VINFO_VEC_STMT (stmt_info) = vec_stmt;
orig_stmt_in_pattern = STMT_VINFO_RELATED_STMT (stmt_info);
if (orig_stmt_in_pattern)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt_in_pattern);
if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
{
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt);
/* STMT was inserted by the vectorizer to replace a computation
idiom. ORIG_STMT_IN_PATTERN is a stmt in the original
sequence that computed this idiom. We need to record a pointer
to VEC_STMT in the stmt_info of ORIG_STMT_IN_PATTERN. See more
detail in the documentation of vect_pattern_recog. */
STMT_VINFO_VEC_STMT (stmt_vinfo) = vec_stmt;
}
}
}
if (STMT_VINFO_LIVE_P (stmt_info))

@@ -1,5 +1,5 @@
/* Loop Vectorization
Copyright (C) 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
This file is part of GCC.
@@ -1361,6 +1361,8 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
STMT_VINFO_LIVE_P (res) = 0;
STMT_VINFO_VECTYPE (res) = NULL;
STMT_VINFO_VEC_STMT (res) = NULL;
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
if (TREE_CODE (stmt) == PHI_NODE)
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;

@@ -1,5 +1,5 @@
/* Loop Vectorization
Copyright (C) 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
This file is part of GCC.
@@ -43,10 +43,11 @@ enum vect_var_kind {
vect_scalar_var
};
/* Defines type of operation. */
enum operation_type {
unary_op = 1,
binary_op,
ternary_op
};
/* Define type of available alignment support. */
@@ -204,6 +205,20 @@ typedef struct _stmt_vec_info {
/* Information about the data-ref (access function, etc). */
struct data_reference *data_ref_info;
/* Stmt is part of some pattern (computation idiom) */
bool in_pattern_p;
/* Used for various bookkeeping purposes, generally holding a pointer to
some other stmt S that is in some way "related" to this stmt.
Current use of this field is:
If this stmt is part of a pattern (i.e. the field 'in_pattern_p' is
true): S is the "pattern stmt" that represents (and replaces) the
sequence of stmts that constitutes the pattern. Similarly, the
related_stmt of the "pattern stmt" points back to this stmt (which is
the last stmt in the original sequence of stmts that constitutes the
pattern). */
tree related_stmt;
/* List of datarefs that are known to have the same alignment as the dataref
of this stmt. */
VEC(dr_p,heap) *same_align_refs;
@@ -222,6 +237,8 @@ typedef struct _stmt_vec_info {
#define STMT_VINFO_VECTYPE(S) (S)->vectype
#define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
#define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
#define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
#define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
#define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs
#define STMT_VINFO_DEF_TYPE(S) (S)->def_type
@@ -312,7 +329,6 @@ extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
extern bool reduction_code_for_scalar_code (enum tree_code, enum tree_code *);
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
extern void destroy_loop_vec_info (loop_vec_info);
@@ -320,10 +336,21 @@ extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
/* Main driver. */
extern void vectorize_loops (struct loops *);
/** In tree-vect-analyze.c **/
/* Driver for analysis stage. */
extern loop_vec_info vect_analyze_loop (struct loop *);
/** In tree-vect-patterns.c **/
/* Pattern recognition functions.
Additional pattern recognition functions can (and will) be added
in the future. */
typedef tree (* vect_recog_func_ptr) (tree, tree *, tree *);
#define NUM_PATTERNS 3
void vect_pattern_recog (loop_vec_info);
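/* A hypothetical skeleton for a future recognizer, following the
vect_recog_func_ptr contract above. Everything here is illustrative; a real
recognizer would also require bumping NUM_PATTERNS and adding an entry to
the recognizers array in tree-vect-patterns.c.

   static tree
   vect_recog_example_pattern (tree last_stmt, tree *type_in, tree *type_out)
   {
     (void) last_stmt;
     (void) type_in;
     (void) type_out;
     return NULL;    (no match: recognizers return NULL; on a match they
                      set *TYPE_IN/*TYPE_OUT and return the replacement expr)
   }
*/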
/** In tree-vect-transform.c **/
extern bool vectorizable_load (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_store (tree, block_stmt_iterator *, tree *);

@@ -1,7 +1,7 @@
/* This file contains the definitions and documentation for the
tree codes used in GCC.
Copyright (C) 1987, 1988, 1993, 1995, 1997, 1998, 2000, 2001, 2004, 2005,
2006 Free Software Foundation, Inc.
This file is part of GCC.
@@ -1073,6 +1073,33 @@ DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
/* Widening dot-product.
The first two arguments are of type t1.
The third argument and the result are of type t2, such that t2 is at least
twice the size of t1. DOT_PROD_EXPR(arg1,arg2,arg3) is equivalent to:
tmp = WIDEN_MULT_EXPR(arg1, arg2);
arg3 = PLUS_EXPR (tmp, arg3);
or:
tmp = WIDEN_MULT_EXPR(arg1, arg2);
arg3 = WIDEN_SUM_EXPR (tmp, arg3); */
DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
/* Widening summation.
The first argument is of type t1.
The second argument is of type t2, such that t2 is at least twice
the size of t1. The type of the entire expression is also t2.
WIDEN_SUM_EXPR is equivalent to first widening (promoting)
the first argument from type t1 to type t2, and then summing it
with the second argument. */
DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2)
/* Widening multiplication.
The two arguments are of type t1.
The result is of type t2, such that t2 is at least twice
the size of t1. WIDEN_MULT_EXPR is equivalent to first widening (promoting)
the arguments from type t1 to type t2, and then multiplying them. */
DEFTREECODE (WIDEN_MULT_EXPR, "widen_mult_expr", tcc_binary, 2)
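/* Scalar arithmetic illustration of the three codes above (invented values,
with t1 = short and t2 = int):
   WIDEN_MULT_EXPR ((short) 3, (short) 4)            == (int) 12
   WIDEN_SUM_EXPR  ((short) 3, (int) 100)            == (int) 103
   DOT_PROD_EXPR   ((short) 3, (short) 4, (int) 100) == (int) 112
matching the equivalences documented above (3*4 + 100 = 112). */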
/* Whole vector left/right shift in bits.
Operand 0 is a vector to be shifted.
Operand 1 is an integer shift amount in bits. */