Verify, for Ventana and Zicond targets and the equality conditional-move
operations, that if-conversion does *not* trigger at the respective
sufficiently low `-mbranch-cost=' settings that make original branched
code sequences cheaper than their branchless equivalents if-conversion
would emit.
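For illustration, each of these tests follows roughly the shape below (a
hypothetical sketch; the exact dg- directives, cost settings and scan
patterns of the committed tests may differ):

/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=1" } */

long
movdibeq (long w, long x, long y, long z)
{
  return w == x ? y : z;
}

/* A branch is expected to be retained...  */
/* { dg-final { scan-assembler "\\mbeq|\\mbne" } } */
/* ... and no conditional-zero sequence emitted.  */
/* { dg-final { scan-assembler-not "\\mczero\\.eqz|\\mczero\\.nez" } } */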
gcc/testsuite/
* gcc.target/riscv/movdibeq-ventana.c: New test.
* gcc.target/riscv/movdibeq-zicond.c: New test.
* gcc.target/riscv/movdibne-ventana.c: New test.
* gcc.target/riscv/movdibne-zicond.c: New test.
* gcc.target/riscv/movsibeq-ventana.c: New test.
* gcc.target/riscv/movsibeq-zicond.c: New test.
* gcc.target/riscv/movsibne-ventana.c: New test.
* gcc.target/riscv/movsibne-zicond.c: New test.
In the non-zero case there is no need for the conditional value used by
Ventana and Zicond integer conditional operations to be specifically 1.
Regardless, we canonicalize it by producing an extraneous
conditional-set operation, such as with the sequence below:
(insn 22 6 23 2 (set (reg:DI 141)
(minus:DI (reg/v:DI 135 [ w ])
(reg/v:DI 136 [ x ]))) 11 {subdi3}
(nil))
(insn 23 22 24 2 (set (reg:DI 140)
(ne:DI (reg:DI 141)
(const_int 0 [0]))) 307 {*sne_zero_didi}
(nil))
(insn 24 23 25 2 (set (reg:DI 143)
(if_then_else:DI (eq:DI (reg:DI 140)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 13 a3 [ z ]))) 27913 {*czero.eqz.didi}
(nil))
(insn 25 24 26 2 (set (reg:DI 142)
(if_then_else:DI (ne:DI (reg:DI 140)
(const_int 0 [0]))
(const_int 0 [0])
(reg/v:DI 137 [ y ]))) 27914 {*czero.nez.didi}
(nil))
(insn 26 25 18 2 (set (reg/v:DI 138 [ z ])
(ior:DI (reg:DI 142)
(reg:DI 143))) 105 {iordi3}
(nil))
where insn 23 can well be removed without changing the semantics of the
sequence.  This is actually fixed up later on by combine, and the insn
does not make it to output, meaning no SNEZ (or SEQZ in the reverse
case) appears in the assembly produced.  However, it still counts
towards the cost of the sequence calculated by if-conversion, raising
the trigger level at which the branchless sequence is chosen.  Arguably,
emitting this extraneous operation can also be considered rather sloppy
of our backend.
Remove the check for operand 1 being constant 0 in the Ventana/Zicond
case for equality comparisons then, observing that `riscv_zero_if_equal'
called via `riscv_emit_int_compare' will canonicalize the comparison if
required, which removes the extraneous insn from output:
(insn 22 6 23 2 (set (reg:DI 142)
(minus:DI (reg/v:DI 135 [ w ])
(reg/v:DI 136 [ x ]))) 11 {subdi3}
(nil))
(insn 23 22 24 2 (set (reg:DI 141)
(if_then_else:DI (eq:DI (reg:DI 142)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 13 a3 [ z ]))) 27913 {*czero.eqz.didi}
(nil))
(insn 24 23 25 2 (set (reg:DI 140)
(if_then_else:DI (ne:DI (reg:DI 142)
(const_int 0 [0]))
(const_int 0 [0])
(reg/v:DI 137 [ y ]))) 27914 {*czero.nez.didi}
(nil))
(insn 25 24 18 2 (set (reg/v:DI 138 [ z ])
(ior:DI (reg:DI 140)
(reg:DI 141))) 105 {iordi3}
(nil))
while keeping actual assembly produced the same.
Adjust branch costs across the test cases affected accordingly.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Remove
the check for operand 1 being constant 0 in the Ventana/Zicond
case for equality comparisons.
gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Lower `-mbranch-cost=' setting.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
Verify, for Ventana and Zicond targets and the GEU and LEU
conditional-move operations, that if-conversion triggers via
`noce_try_cmove' at the `-mbranch-cost=4' setting, which makes branchless
code sequences produced by if-conversion cheaper than their original
branched equivalents, and that extraneous instructions such as SEQZ,
etc. are not present in output.
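For reference, these tests are hypothetical variations on the following
sketch (the actual directives in the committed files may differ):

/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=4" } */

int
movsigtu (unsigned int w, unsigned int x, int y, int z)
{
  return w > x ? y : z;
}

/* The branchless sequence is expected, with no stray SEQZ/SNEZ.  */
/* { dg-final { scan-assembler "\\mczero\\.eqz|\\mczero\\.nez" } } */
/* { dg-final { scan-assembler-not "\\mseqz|\\msnez" } } */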
gcc/testsuite/
* gcc.target/riscv/movdigtu-ventana.c: New test.
* gcc.target/riscv/movdigtu-zicond.c: New test.
* gcc.target/riscv/movdiltu-ventana.c: New test.
* gcc.target/riscv/movdiltu-zicond.c: New test.
* gcc.target/riscv/movsigtu-ventana.c: New test.
* gcc.target/riscv/movsigtu-zicond.c: New test.
* gcc.target/riscv/movsiltu-ventana.c: New test.
* gcc.target/riscv/movsiltu-zicond.c: New test.
Verify, for Ventana and Zicond targets and the GEU and LEU
conditional-move operations, that if-conversion does *not* trigger at
the `-mbranch-cost=3' setting, which makes original branched code sequences
cheaper than their branchless equivalents if-conversion would emit.
gcc/testsuite/
* gcc.target/riscv/movdibgtu-ventana.c: New test.
* gcc.target/riscv/movdibgtu-zicond.c: New test.
* gcc.target/riscv/movdibltu-ventana.c: New test.
* gcc.target/riscv/movdibltu-zicond.c: New test.
* gcc.target/riscv/movsibgtu-ventana.c: New test.
* gcc.target/riscv/movsibgtu-zicond.c: New test.
* gcc.target/riscv/movsibltu-ventana.c: New test.
* gcc.target/riscv/movsibltu-zicond.c: New test.
Update `riscv_expand_conditional_move' and handle the missing GEU and
LEU operators there, avoiding an extraneous conditional set operation,
such as with this output:
sgtu a0,a0,a1
seqz a1,a0
czero.eqz a3,a3,a1
czero.nez a1,a2,a1
or a0,a1,a3
produced when optimizing for Zicond targets from:
int
movsigtu (int w, int x, int y, int z)
{
return w > x ? y : z;
}
These operators can be inverted, producing optimal code such as this:
sgtu a1,a0,a1
czero.nez a3,a3,a1
czero.eqz a1,a2,a1
or a0,a1,a3
which this change causes to happen.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Also
invert the condition for GEU and LEU.
Verify, for short forward branch, T-Head, Ventana and Zicond targets and
the ordered floating-point conditional-move operations that already work
as expected, that if-conversion triggers via `noce_try_cmove' at the
respective sufficiently high `-mbranch-cost=' settings that make
branchless code sequences produced by if-conversion cheaper than their
original branched equivalents, and that extraneous instructions such as
SNEZ, etc. are not present in output. Cover all ordered floating-point
relational operations to make sure no corner case escapes.
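As an illustration, the Zicond variants are along these lines (a
hypothetical sketch; the committed tests' directives and cost settings
may differ):

/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=3" } */

int
movsifge (double w, double x, int y, int z)
{
  return w >= x ? y : z;
}

/* { dg-final { scan-assembler-not "\\mseqz|\\msnez" } } */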
gcc/testsuite/
* gcc.target/riscv/movdifge-sfb.c: New test.
* gcc.target/riscv/movdifge-thead.c: New test.
* gcc.target/riscv/movdifge-ventana.c: New test.
* gcc.target/riscv/movdifge-zicond.c: New test.
* gcc.target/riscv/movdifgt-sfb.c: New test.
* gcc.target/riscv/movdifgt-thead.c: New test.
* gcc.target/riscv/movdifgt-ventana.c: New test.
* gcc.target/riscv/movdifgt-zicond.c: New test.
* gcc.target/riscv/movdifle-sfb.c: New test.
* gcc.target/riscv/movdifle-thead.c: New test.
* gcc.target/riscv/movdifle-ventana.c: New test.
* gcc.target/riscv/movdifle-zicond.c: New test.
* gcc.target/riscv/movdiflt-sfb.c: New test.
* gcc.target/riscv/movdiflt-thead.c: New test.
* gcc.target/riscv/movdiflt-ventana.c: New test.
* gcc.target/riscv/movdiflt-zicond.c: New test.
* gcc.target/riscv/movdifne-sfb.c: New test.
* gcc.target/riscv/movdifne-thead.c: New test.
* gcc.target/riscv/movdifne-ventana.c: New test.
* gcc.target/riscv/movdifne-zicond.c: New test.
* gcc.target/riscv/movsifge-sfb.c: New test.
* gcc.target/riscv/movsifge-thead.c: New test.
* gcc.target/riscv/movsifge-ventana.c: New test.
* gcc.target/riscv/movsifge-zicond.c: New test.
* gcc.target/riscv/movsifgt-sfb.c: New test.
* gcc.target/riscv/movsifgt-thead.c: New test.
* gcc.target/riscv/movsifgt-ventana.c: New test.
* gcc.target/riscv/movsifgt-zicond.c: New test.
* gcc.target/riscv/movsifle-sfb.c: New test.
* gcc.target/riscv/movsifle-thead.c: New test.
* gcc.target/riscv/movsifle-ventana.c: New test.
* gcc.target/riscv/movsifle-zicond.c: New test.
* gcc.target/riscv/movsiflt-sfb.c: New test.
* gcc.target/riscv/movsiflt-thead.c: New test.
* gcc.target/riscv/movsiflt-ventana.c: New test.
* gcc.target/riscv/movsiflt-zicond.c: New test.
* gcc.target/riscv/movsifne-sfb.c: New test.
* gcc.target/riscv/movsifne-thead.c: New test.
* gcc.target/riscv/movsifne-ventana.c: New test.
* gcc.target/riscv/movsifne-zicond.c: New test.
Verify, for Ventana and Zicond targets and the ordered floating-point
conditional-move operations that already work as expected, that
if-conversion does *not* trigger at the `-mbranch-cost=2' setting, which
makes original branched code sequences cheaper than their branchless
equivalents if-conversion would emit. Cover all ordered floating-point
relational operations to make sure no corner case escapes.
gcc/testsuite/
* gcc.target/riscv/movdibfge-ventana.c: New test.
* gcc.target/riscv/movdibfge-zicond.c: New test.
* gcc.target/riscv/movdibfgt-ventana.c: New test.
* gcc.target/riscv/movdibfgt-zicond.c: New test.
* gcc.target/riscv/movdibfle-ventana.c: New test.
* gcc.target/riscv/movdibfle-zicond.c: New test.
* gcc.target/riscv/movdibflt-ventana.c: New test.
* gcc.target/riscv/movdibflt-zicond.c: New test.
* gcc.target/riscv/movdibfne-ventana.c: New test.
* gcc.target/riscv/movdibfne-zicond.c: New test.
* gcc.target/riscv/movsibfge-ventana.c: New test.
* gcc.target/riscv/movsibfge-zicond.c: New test.
* gcc.target/riscv/movsibfgt-ventana.c: New test.
* gcc.target/riscv/movsibfgt-zicond.c: New test.
* gcc.target/riscv/movsibfle-ventana.c: New test.
* gcc.target/riscv/movsibfle-zicond.c: New test.
* gcc.target/riscv/movsibflt-ventana.c: New test.
* gcc.target/riscv/movsibflt-zicond.c: New test.
* gcc.target/riscv/movsibfne-ventana.c: New test.
* gcc.target/riscv/movsibfne-zicond.c: New test.
Verify, for T-Head, Ventana and Zicond targets and the integer
conditional-move operations that already work as expected, that
if-conversion triggers via `noce_try_cmove' at the respective
sufficiently high
`-mbranch-cost=' settings that make branchless code sequences produced
by if-conversion cheaper than their original branched equivalents, and
that extraneous instructions such as SNEZ, etc. are not present in
output. Cover all integer relational operations to make sure no corner
case escapes.
gcc/testsuite/
* gcc.target/riscv/movdieq-thead.c: New test.
* gcc.target/riscv/movdige-ventana.c: New test.
* gcc.target/riscv/movdige-zicond.c: New test.
* gcc.target/riscv/movdigeu-ventana.c: New test.
* gcc.target/riscv/movdigeu-zicond.c: New test.
* gcc.target/riscv/movdigt-ventana.c: New test.
* gcc.target/riscv/movdigt-zicond.c: New test.
* gcc.target/riscv/movdile-ventana.c: New test.
* gcc.target/riscv/movdile-zicond.c: New test.
* gcc.target/riscv/movdileu-ventana.c: New test.
* gcc.target/riscv/movdileu-zicond.c: New test.
* gcc.target/riscv/movdilt-ventana.c: New test.
* gcc.target/riscv/movdilt-zicond.c: New test.
* gcc.target/riscv/movdine-thead.c: New test.
* gcc.target/riscv/movsieq-thead.c: New test.
* gcc.target/riscv/movsige-ventana.c: New test.
* gcc.target/riscv/movsige-zicond.c: New test.
* gcc.target/riscv/movsigeu-ventana.c: New test.
* gcc.target/riscv/movsigeu-zicond.c: New test.
* gcc.target/riscv/movsigt-ventana.c: New test.
* gcc.target/riscv/movsigt-zicond.c: New test.
* gcc.target/riscv/movsile-ventana.c: New test.
* gcc.target/riscv/movsile-zicond.c: New test.
* gcc.target/riscv/movsileu-ventana.c: New test.
* gcc.target/riscv/movsileu-zicond.c: New test.
* gcc.target/riscv/movsilt-ventana.c: New test.
* gcc.target/riscv/movsilt-zicond.c: New test.
* gcc.target/riscv/movsine-thead.c: New test.
Verify, for T-Head, Ventana and Zicond targets and the integer
conditional-move operations that already work as expected, that
if-conversion does *not* trigger at the respective sufficiently low
`-mbranch-cost=' settings that make original branched code sequences
cheaper than their branchless equivalents if-conversion would emit.
Cover all integer relational operations to make sure no corner case
escapes.
The reason for XFAILing movdibne-thead.c and movsibne-thead.c is the
branchless T-Head sequence:
sub a1,a0,a1
th.mveqz a2,a3,a1
mv a0,a2
ret
produced rather than its original branched counterpart:
beq a0,a1,.L3
mv a0,a2
ret
.L3:
mv a0,a3
ret
at `-mbranch-cost=1', even though under this setting the latter sequence
is obviously cheaper performance-wise. This is because the final move
instruction in the branchless sequence is not counted towards its cost
and consequently the cost of both sequences works out at 8 each, making
if-conversion prefer the branchless variant. Use the XFAIL mark to keep
track of these cases for future consideration.
gcc/testsuite/
* gcc.target/riscv/movdibeq-thead.c: New test.
* gcc.target/riscv/movdibge-ventana.c: New test.
* gcc.target/riscv/movdibge-zicond.c: New test.
* gcc.target/riscv/movdibgeu-ventana.c: New test.
* gcc.target/riscv/movdibgeu-zicond.c: New test.
* gcc.target/riscv/movdibgt-ventana.c: New test.
* gcc.target/riscv/movdibgt-zicond.c: New test.
* gcc.target/riscv/movdible-ventana.c: New test.
* gcc.target/riscv/movdible-zicond.c: New test.
* gcc.target/riscv/movdibleu-ventana.c: New test.
* gcc.target/riscv/movdibleu-zicond.c: New test.
* gcc.target/riscv/movdiblt-ventana.c: New test.
* gcc.target/riscv/movdiblt-zicond.c: New test.
* gcc.target/riscv/movdibne-thead.c: New test.
* gcc.target/riscv/movsibeq-thead.c: New test.
* gcc.target/riscv/movsibge-ventana.c: New test.
* gcc.target/riscv/movsibge-zicond.c: New test.
* gcc.target/riscv/movsibgeu-ventana.c: New test.
* gcc.target/riscv/movsibgeu-zicond.c: New test.
* gcc.target/riscv/movsibgt-ventana.c: New test.
* gcc.target/riscv/movsibgt-zicond.c: New test.
* gcc.target/riscv/movsible-ventana.c: New test.
* gcc.target/riscv/movsible-zicond.c: New test.
* gcc.target/riscv/movsibleu-ventana.c: New test.
* gcc.target/riscv/movsibleu-zicond.c: New test.
* gcc.target/riscv/movsiblt-ventana.c: New test.
* gcc.target/riscv/movsiblt-zicond.c: New test.
* gcc.target/riscv/movsibne-thead.c: New test.
The generic branch costing model for if-conversion assumes a fixed cost
of COSTS_N_INSNS (2) for a conditional branch, and that one half of that
cost comes from a preceding condition-set instruction, such as with
MODE_CC targets, and then the other half of that cost is for the actual
branch instruction. This is hardcoded for `if_info.original_cost' in
`noce_find_if_block', regardless of the cost set for branches via
BRANCH_COST.
Then `default_max_noce_ifcvt_seq_cost' instructs if-conversion to accept
a branchless sequence costing as much as triple the BRANCH_COST value
set. This is apparently to make up for the inability to accurately
guess the branch penalty.
Consequently for the BRANCH_COST of 3 we commonly set for tuning,
if-conversion will consider branchless sequences costing 3 * 3 - 2 = 7
instruction units more than a corresponding branch sequence. For the
BRANCH_COST of 4 such as with `sifive-7-series' tuning this is even
worse, at 3 * 4 - 2 = 10. Effectively it means a branchless sequence
will always be chosen if available, even a very inefficient one.
Rework the branch costing model to better match our architecture,
observing in particular that we have no preparatory instructions for
branches, so the cost of a branch is the naked BRANCH_COST plus any
extra overhead the processing of a branch's source RTX might incur.
Provide TARGET_INSN_COST and TARGET_MAX_NOCE_IFCVT_SEQ_COST handlers
that return suitable costs based on BRANCH_COST.  The latter hook
usually returns a value that is lower than the cost of the corresponding
branched sequence. This is because we don't really want to produce a
branchless sequence that is more expensive than the original branched
sequence. If this turns out too conservative for some corner case, then
this choice might be revisited.
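For reference, the hook parallels `default_max_noce_ifcvt_seq_cost' but
bases the returned cost directly on BRANCH_COST rather than tripling it;
the following is a minimal sketch, not necessarily the exact committed
code:

static unsigned int
riscv_max_noce_ifcvt_seq_cost (edge e)
{
  bool predictable_p = predictable_edge_p (e);

  if (predictable_p)
    {
      if (OPTION_SET_P (param_max_rtl_if_conversion_predictable_cost))
	return param_max_rtl_if_conversion_predictable_cost;
    }
  else
    {
      if (OPTION_SET_P (param_max_rtl_if_conversion_unpredictable_cost))
	return param_max_rtl_if_conversion_unpredictable_cost;
    }

  /* Accept branchless sequences costing up to the branch itself.  */
  return COSTS_N_INSNS (BRANCH_COST (true, predictable_p));
}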
Then we don't want to fiddle with `noce_find_if_block' without a lot of
cross-target verification, so add TARGET_NOCE_CONVERSION_PROFITABLE_P
defined such that it subtracts the fixed COSTS_N_INSNS (2) cost from the
cost of the original branched sequence supplied and instead adds actual
branch cost calculated from the conditional branch instruction used. It
is then further tweaked according to simple analysis of the replacement
branchless sequence produced so as to cancel the cost of an extraneous
zero-extend operation produced by `noce_try_store_flag_mask', as observed
with gcc/testsuite/gcc.target/riscv/pr105314.c.
Tweak the testsuite accordingly and set `-mbranch-cost=' explicitly for
the relevant cases so that the expected if-conversion transformation is
made regardless of the default BRANCH_COST value of tuning in effect.
Some of these settings will be lowered later on, once deficiencies in
branchless sequence generation have been fixed that lower the cost
calculated by if-conversion.
gcc/
* config/riscv/riscv.cc (riscv_insn_cost): New function.
(riscv_max_noce_ifcvt_seq_cost): Likewise.
(riscv_noce_conversion_profitable_p): Likewise.
(TARGET_INSN_COST): New macro.
(TARGET_MAX_NOCE_IFCVT_SEQ_COST): New macro.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): New macro.
gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Explicitly set the branch cost.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
Just choose between EQ and NE at the `gen_rtx_fmt_ee' invocation,
removing an extraneous variable referred to only once and improving code
clarity.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Remove
extraneous variable for EQ vs NE operation selection.
Use `nullptr' for consistency rather than 0 to initialize `invert_ptr'.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use
`nullptr' rather than 0 to initialize a pointer.
Use `mode0' and `mode1' shorthands respectively for `GET_MODE (op0)' and
`GET_MODE (op1)' to improve code readability.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use
`mode0' and `mode1' for `GET_MODE (op0)' and `GET_MODE (op1)'.
In `riscv_expand_conditional_move' `mode' is initialized right away from
`GET_MODE (dest)', so remove the needless `GET_MODE (dest)' references
in favor of the local variable.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use
`mode' for `GET_MODE (dest)' throughout.
For the NEED_EQ_NE_P case `riscv_emit_int_compare' is documented to only
emit EQ or NE comparisons against zero; however, it does not catch
incorrect use where a non-equality comparison has been requested and
falls through to the general case.  Add a safety guard to catch such a
case.
Arguably the NEED_EQ_NE_P case would best be moved into a function of
its own, but let's leave it for a separate cleanup.
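The guard amounts to something like the following sketch (its exact form
and placement within `riscv_emit_int_compare' are per the patch itself):

/* NEED_EQ_NE_P callers must only request EQ or NE against zero;
   catch misuse rather than silently falling through to the general
   case.  */
if (need_eq_ne_p && *code != EQ && *code != NE)
  gcc_unreachable ();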
gcc/
* config/riscv/riscv.cc (riscv_emit_int_compare): Bail out if
NEED_EQ_NE_P but the comparison is neither EQ nor NE.
Our `mov<mode>cc' expander is no longer specific to short forward branch
targets, so move its associated comment accordingly.
gcc/
* config/riscv/riscv.md (mov<mode>cc): Move comment on SFB
patterns over to...
(*mov<GPR:mode><X:mode>cc): ... here.
Verify, for short forward branch targets and the conditional-move
operations that already work as expected, that if-conversion triggers
via `noce_try_cmove' already at `-mbranch-cost=1' and that extraneous
instructions such as SNEZ, etc. are not present in output. Cover all
integer relational operations to make sure no corner case escapes.
gcc/testsuite/
* gcc.target/riscv/movdieq-sfb.c: New test.
* gcc.target/riscv/movdige-sfb.c: New test.
* gcc.target/riscv/movdigeu-sfb.c: New test.
* gcc.target/riscv/movdigt-sfb.c: New test.
* gcc.target/riscv/movdigtu-sfb.c: New test.
* gcc.target/riscv/movdile-sfb.c: New test.
* gcc.target/riscv/movdileu-sfb.c: New test.
* gcc.target/riscv/movdilt-sfb.c: New test.
* gcc.target/riscv/movdiltu-sfb.c: New test.
* gcc.target/riscv/movdine-sfb.c: New test.
* gcc.target/riscv/movsieq-sfb.c: New test.
* gcc.target/riscv/movsige-sfb.c: New test.
* gcc.target/riscv/movsigeu-sfb.c: New test.
* gcc.target/riscv/movsigt-sfb.c: New test.
* gcc.target/riscv/movsigtu-sfb.c: New test.
* gcc.target/riscv/movsile-sfb.c: New test.
* gcc.target/riscv/movsileu-sfb.c: New test.
* gcc.target/riscv/movsilt-sfb.c: New test.
* gcc.target/riscv/movsiltu-sfb.c: New test.
* gcc.target/riscv/movsine-sfb.c: New test.
Add generic execution tests for expressions that are expected to expand
to conditional-move and conditional-add operations where supported. To
ensure no corner case escapes, all relational operators are extensively
covered for integer comparisons and all ordered operators are covered
for floating-point comparisons. Unordered operators are not covered at
this point as they'd require a different input data set.
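Each test follows roughly this pattern (a hypothetical sketch of a
movieq.c-style test; the committed tests use more extensive data sets):

int __attribute__ ((noinline))
movieq (int w, int x, int y, int z)
{
  return w == x ? y : z;
}

int
main (void)
{
  if (movieq (0, 0, 5, 7) != 5)
    __builtin_abort ();
  if (movieq (0, 1, 5, 7) != 7)
    __builtin_abort ();
  return 0;
}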
gcc/testsuite/
* gcc.dg/torture/addieq.c: New test.
* gcc.dg/torture/addifeq.c: New test.
* gcc.dg/torture/addifge.c: New test.
* gcc.dg/torture/addifgt.c: New test.
* gcc.dg/torture/addifle.c: New test.
* gcc.dg/torture/addiflt.c: New test.
* gcc.dg/torture/addifne.c: New test.
* gcc.dg/torture/addige.c: New test.
* gcc.dg/torture/addigeu.c: New test.
* gcc.dg/torture/addigt.c: New test.
* gcc.dg/torture/addigtu.c: New test.
* gcc.dg/torture/addile.c: New test.
* gcc.dg/torture/addileu.c: New test.
* gcc.dg/torture/addilt.c: New test.
* gcc.dg/torture/addiltu.c: New test.
* gcc.dg/torture/addine.c: New test.
* gcc.dg/torture/addleq.c: New test.
* gcc.dg/torture/addlfeq.c: New test.
* gcc.dg/torture/addlfge.c: New test.
* gcc.dg/torture/addlfgt.c: New test.
* gcc.dg/torture/addlfle.c: New test.
* gcc.dg/torture/addlflt.c: New test.
* gcc.dg/torture/addlfne.c: New test.
* gcc.dg/torture/addlge.c: New test.
* gcc.dg/torture/addlgeu.c: New test.
* gcc.dg/torture/addlgt.c: New test.
* gcc.dg/torture/addlgtu.c: New test.
* gcc.dg/torture/addlle.c: New test.
* gcc.dg/torture/addlleu.c: New test.
* gcc.dg/torture/addllt.c: New test.
* gcc.dg/torture/addlltu.c: New test.
* gcc.dg/torture/addlne.c: New test.
* gcc.dg/torture/movieq.c: New test.
* gcc.dg/torture/movifeq.c: New test.
* gcc.dg/torture/movifge.c: New test.
* gcc.dg/torture/movifgt.c: New test.
* gcc.dg/torture/movifle.c: New test.
* gcc.dg/torture/moviflt.c: New test.
* gcc.dg/torture/movifne.c: New test.
* gcc.dg/torture/movige.c: New test.
* gcc.dg/torture/movigeu.c: New test.
* gcc.dg/torture/movigt.c: New test.
* gcc.dg/torture/movigtu.c: New test.
* gcc.dg/torture/movile.c: New test.
* gcc.dg/torture/movileu.c: New test.
* gcc.dg/torture/movilt.c: New test.
* gcc.dg/torture/moviltu.c: New test.
* gcc.dg/torture/movine.c: New test.
* gcc.dg/torture/movleq.c: New test.
* gcc.dg/torture/movlfeq.c: New test.
* gcc.dg/torture/movlfge.c: New test.
* gcc.dg/torture/movlfgt.c: New test.
* gcc.dg/torture/movlfle.c: New test.
* gcc.dg/torture/movlflt.c: New test.
* gcc.dg/torture/movlfne.c: New test.
* gcc.dg/torture/movlge.c: New test.
* gcc.dg/torture/movlgeu.c: New test.
* gcc.dg/torture/movlgt.c: New test.
* gcc.dg/torture/movlgtu.c: New test.
* gcc.dg/torture/movlle.c: New test.
* gcc.dg/torture/movlleu.c: New test.
* gcc.dg/torture/movllt.c: New test.
* gcc.dg/torture/movlltu.c: New test.
* gcc.dg/torture/movlne.c: New test.
... added in recent commit 53ba8d6695
"inter-procedural value range propagation", fixed in
commit 878a860cae
"Fix 'gcc.dg/tree-ssa/return-value-range-1.c'".
gcc/testsuite/
* gcc.dg/tree-ssa/return-value-range-1.c: Fix.
In PR112406 Tamar found another problem with COND_OP reductions.
I wrongly assumed that the reduction variable would always remain in
operand 1, just as we create the COND_OP in ifcvt. But of course,
addition being commutative, we are free to swap operand 1 and 2 and we
end up with e.g.
_ifc__60 = .COND_ADD (_2, _6, MADPictureC1_lsm.10_25, MADPictureC1_lsm.10_25);
which does not pass the asserts I put in place.
This patch removes this restriction and allows the reduction index to be
2 as well.
gcc/ChangeLog:
PR middle-end/112406
* tree-vect-loop.cc (vectorize_fold_left_reduction): Allow
reduction index != 1.
(vect_transform_reduction): Handle reduction index != 1.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr112406-2.c: New test.
Due to Jakub's recent middle-end changes we now vectorize some more
popcount instances. This patch just adjusts the dump check.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/popcount.c: Adjust check.
* lib/target-supports.exp: Add riscv_zbb.
This adds an effective target requirement to compile the tests. Since
we disabled 64-bit indices on rv32 targets, those tests should be
unsupported on rv32.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/base/bug-14.C: Add
dg-require-effective-target rv64.
* g++.target/riscv/rvv/base/bug-9.C: Ditto.
This removes setting of the default arch and abi in the testsuite. We
should directly use what the target provides.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Remove -march and -mabi from
default CFLAGS.
Testcases in gfortran.dg/vect/vect.exp rely on
check_vect_support_and_set_flags to set dg-do-what-default and avoid
running vector tests on non-vector targets. The three testcases in this
patch overwrite the default with `dg-do run', which causes issues
for non-vector targets.
Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).
gcc/testsuite/ChangeLog:
* gfortran.dg/vect/pr107254.f90: Remove dg-do run directive.
* gfortran.dg/vect/pr85853.f90: Ditto.
* gfortran.dg/vect/vect-alias-check-1.F90: Ditto.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
This was recently approved for C++26.
We should define the __cpp_lib_freestanding_cstring macro in <string.h>
as well as <cstring>, but we do not currently install our own <string.h>
for most targets.
libstdc++-v3/ChangeLog:
* include/bits/version.def (freestanding_cstring): Add.
* include/bits/version.h: Regenerate.
* include/c_compatibility/string.h (strtok): Do not declare for
C++26 freestanding.
* include/c_global/cstring (strtok): Likewise.
* testsuite/21_strings/headers/cstring/version.cc: New test.
This C++26 change makes several classes "partially freestanding", but we
already fully supported them in freestanding mode. All we need to do is
define the new feature test macros and add tests for them.
libstdc++-v3/ChangeLog:
* include/bits/version.def (freestanding_algorithm)
(freestanding_array, freestanding_optional)
(freestanding_string_view, freestanding_variant): Add.
* include/bits/version.h: Regenerate.
* include/std/algorithm (__glibcxx_want_freestanding_algorithm):
Define.
* include/std/array (__glibcxx_want_freestanding_array):
Define.
* include/std/optional (__glibcxx_want_freestanding_optional):
Define.
* include/std/string_view
(__glibcxx_want_freestanding_string_view): Define.
* include/std/variant (__glibcxx_want_freestanding_variant):
Define.
* testsuite/20_util/optional/version.cc: Add checks for
__cpp_lib_freestanding_optional.
* testsuite/20_util/variant/version.cc: Add checks for
__cpp_lib_freestanding_variant.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.
* testsuite/21_strings/basic_string_view/requirements/version.cc:
New test.
* testsuite/23_containers/array/requirements/version.cc: New
test.
* testsuite/25_algorithms/fill_n/requirements/version.cc: New
test.
* testsuite/25_algorithms/swap_ranges/requirements/version.cc:
New test.
Also define the new feature test macros from P2833R2, indicating that
std::span and std::expected are supported for freestanding mode.
libstdc++-v3/ChangeLog:
* include/bits/version.def (freestanding_expected): New macro.
(span): Add C++26 value.
* include/bits/version.h: Regenerate.
* include/std/expected (__glibcxx_want_freestanding_expected):
Define.
* include/std/span (span::at): New member function.
* testsuite/20_util/expected/version.cc: Add checks for
__cpp_lib_freestanding_expected.
* testsuite/23_containers/span/2.cc: Moved to...
* testsuite/23_containers/span/version.cc: ...here. Add checks
for __cpp_lib_span in <span> as well as in <version>.
* testsuite/23_containers/span/1.cc: Removed.
* testsuite/23_containers/span/at.cc: New test.
libstdc++-v3/ChangeLog:
* include/tr2/dynamic_bitset (dynamic_bitset): Pass zero and one
characters to _M_copy_from_string.
* testsuite/tr2/dynamic_bitset/string.cc: New test.
The buildstat.html pages have not existed since gcc-8, so remove
references to them in the libstdc++ manual.
libstdc++-v3/ChangeLog:
* doc/html/*: Regenerate.
* doc/xml/faq.xml: Remove reference to buildstat.html pages.
* doc/xml/manual/test.xml: Likewise.
This patch adds a target-independent aligned_register_operand
predicate, for use with register constraints that use filters
to impose an alignment.  The definition deliberately jettisons
some of the historical baggage in general_operand.
gcc/
* common.md (aligned_register_operand): New predicate.
This patch makes IRA apply register filters when picking hard registers.
All the new code should be optimised away on targets that don't use
register filters. On targets that do use them, the new register_filters
bitfield is expected to be only a handful of bits.
Information about register filters is recorded in process_bb_node_lives.
The information isn't really related to liveness, but it's a convenient
point because (a) we've already built the allocno structures and
(b) we've already extracted the insn and preprocessed the constraints.
gcc/
* ira-int.h (ira_allocno): Add a register_filters field.
(ALLOCNO_REGISTER_FILTERS): New macro.
(ALLOCNO_SET_REGISTER_FILTERS): Likewise.
* ira-build.cc (ira_create_allocno): Initialize register_filters.
(create_cap_allocno): Propagate register_filters.
(propagate_allocno_info): Likewise.
(propagate_some_info_from_allocno): Likewise.
* ira-lives.cc (process_register_constraint_filters): New function.
(process_bb_node_lives): Use it to record register filter
information.
* ira-color.cc (assign_hard_reg): Check register filters.
(improve_allocation, fast_allocation): Likewise.
This patch makes LRA apply register filters. This plus the recog
change is enough for correct code generation, but a follow-on IRA
patch improves the allocation.
All the new code should be optimised away on targets that don't
use register filters. That's because get_register_filter just
wraps "return nullptr" on those targets.
gcc/
* lra-constraints.cc (process_alt_operands): Check register filters.
The main (but simplest) part of this patch makes constrain_operands
take register filters into account.
The rest of the patch adds register filter information to
operand_alternative. Generally, if two register constraints
have different register filters, it's better if they're in separate
alternatives. However, the syntax doesn't enforce that, and we can't
assert it due to inline asms. So it's a choice between (a) adding
code to enforce consistent filters or (b) dealing with mixes of filters
in a conservatively correct way (in the sense of not allowing invalid
operands). The latter seems much easier.
The patch therefore adds a mask of the filters that apply
to at least one constraint in a given operand alternative.
A register is OK if it passes all of the filters in the mask.
gcc/
* recog.h (operand_alternative): Add a register_filters field.
(alternative_register_filters): New function.
* recog.cc (preprocess_constraints): Calculate the filters field.
(constrain_operands): Check register filters.
The main way of enforcing register alignment is through
HARD_REGNO_MODE_OK. But this is a global property that applies
to all operands. A given (regno, mode) pair is either globally
valid or globally invalid.
This patch instead adds a way of specifying that individual operands
must be aligned. More generally, it allows constraints to specify
a C++ condition that the operand's REGNO must satisfy. The condition
must be invariant for a given set of target options, so that it can
be precomputed and cached as a HARD_REG_SET.
This information will be used in very compile-time-sensitive
parts of the compiler. A lot of the complication is in allowing
the information to be stored and tested without much memory cost,
and without impacting targets that don't use the feature.
Specifically:
- Constraints are encouraged to test the absolute REGNO rather than
an offset from the start of the containing class. For example,
all constraints for even registers should use the same condition,
such as "regno % 2 == 0". This requires the classes to start at
even register boundaries, but that's already an implicit
requirement due to things like the ira-costs.cc code that begins:
/* Some targets allow pseudos to be allocated to unaligned sequences
of hard registers. However, selecting an unaligned sequence can
unnecessarily restrict later allocations. So increase the cost of
unaligned hard regs to encourage the use of aligned hard regs. */
- Each unique condition is given a "filter identifier".
- The total number of filters is given by NUM_REGISTER_FILTERS,
defined automatically in insn-config.h. Structures can therefore use
a bitfield of NUM_REGISTER_FILTERS to represent a mask of filters.
- There is a new target global, target_constraints, that caches the
HARD_REG_SET for each filter.
- There is a function for looking up the HARD_REG_SET filter for a given
constraint and one for looking up the filter id. Both simply return
a constant on targets that don't use the feature.
- There are functions for testing a register against a specific filter,
or against a mask of filters.
This patch just adds the information. Later ones make use of it.
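As a hypothetical illustration of the lookup (the names generated into
the target headers, including the `filters' field, are illustrative
only, not the actual generated interface):

/* Test whether hard register REGNO satisfies the filter with the
   given id, using the HARD_REG_SET precomputed for the current set
   of target options.  */
inline bool
test_register_filter (unsigned int filter_id, unsigned int regno)
{
  return TEST_HARD_REG_BIT (this_target_constraints->filters[filter_id],
			    regno);
}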
gcc/
* rtl.def (DEFINE_REGISTER_CONSTRAINT): Add an optional filter
operand.
* doc/md.texi (define_register_constraint): Document it.
* doc/tm.texi.in: Reference it in discussion about aligned registers.
* doc/tm.texi: Regenerate.
* gensupport.h (register_filters, get_register_filter_id): Declare.
* gensupport.cc (register_filter_map, register_filters): New variables.
(get_register_filter_id): New function.
(process_define_register_constraint): Likewise.
(process_rtx): Pass define_register_constraints to
process_define_register_constraint.
* genconfig.cc (main): Emit a definition of NUM_REGISTER_FILTERS.
* genpreds.cc (constraint_data): Add a filter field.
(add_constraint): Update accordingly.
(process_define_register_constraint): Pass the filter operand.
(write_init_reg_class_start_regs): New function.
(write_get_register_filter): Likewise.
(write_get_register_filter_id): Likewise.
(write_tm_preds_h): Write a definition of target_constraints,
plus helpers to test its contents. Write the get_register_filter*
functions.
(write_insn_preds_c): Write init_reg_class_start_regs.
* reginfo.cc (init_reg_class_start_regs): Declare.
(init_reg_sets): Call it.
* target-globals.h (this_target_constraints): Declare.
(target_globals): Add a constraints field.
(restore_target_globals): Update accordingly.
* target-globals.cc: Include tm_p.h.
(default_target_globals): Initialize the constraints field.
(save_target_globals): Handle the constraints field.
(target_globals::~target_globals): Likewise.
For vec_pack_trunc patterns there can be an ambiguity in the
source mode between BFmode and HFmode.  The vectorizer checks
the insn's operand mode for this; the following makes forwprop
do the same. That of course doesn't help if the target supports
both conversions.
PR tree-optimization/112623
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Check the source mode of the insn for vector pack/unpacks.
* gcc.target/i386/pr112623.c: New testcase.
The following moves the check whether the maximum vectorization
factor determined by data dependence analysis is in conflict with
the chosen vectorization factor to after the point where we applied
both the SLP and the unrolling adjustment to the vectorization
factor. We check the latter before applying unrolling, but the
SLP adjustment can result in both missed optimization and wrong-code.
* tree-vect-loop.cc (vect_analyze_loop_2): Move check
of VF against max_vf until VF is final.
This patch speeds up push_back at -O3 significantly by making the
reallocation be inlined by default.  _M_realloc_insert is a general
insertion routine that takes an iterator pointing to the location where
the value should be inserted.  As such it contains quite a lot of code
to move other entries around.
Since appending to the end of the array is a common operation, I think
we should have specialized code for that.  Sadly it is really hard to
work this out from IPA passes, since we basically care whether the
iterator points to the same place as the end pointer, and both are
passed by reference.  This is inter-procedural value numbering that is
quite out of reach.
I also added an extra check making it clear that the new length of the
vector is non-zero.  This saves extra conditionals.  Again it is quite a
hard case since _M_check_len seems to be able to return 0 if its
parameter is 0.  This never happens here, but we are not able to
propagate this early nor at the IPA stage.
libstdc++-v3/ChangeLog:
PR libstdc++/110287
PR middle-end/109811
PR middle-end/109849
* include/bits/stl_vector.h (_M_realloc_append): New member function.
(push_back): Use it.
* include/bits/vector.tcc (emplace_back): Use it.
(_M_realloc_insert): Let compiler know that new vector size is non-zero.
(_M_realloc_append): New member function.
This bug is exposed when testing on a zvl512b RV32 system.
The root cause is that RA reloads a DI CONST_VECTOR into vmv.v.x, which
then ICEs.  So disallow DI CONST_VECTOR on RV32.
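The rejection amounts to something like this sketch in
`riscv_const_insns' (the exact condition in the committed patch may
differ):

/* On RV32 a DImode element cannot be broadcast with vmv.v.x, so do
   not claim such a CONST_VECTOR can be synthesized.  */
if (!TARGET_64BIT && GET_MODE_INNER (GET_MODE (x)) == DImode)
  return 0;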
PR target/112598
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Disallow DI CONST_VECTOR on RV32.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112598-1.c: New test.
At the moment we emit a warning whenever you specify both -march and -mcpu
and their architectures differ.  The idea originally was that the user may
not be aware of this change.
However this has a few problems:
1. Architecture revisions are not an observable part of the architecture;
extensions are.  Starting with GCC 14 we have therefore relaxed the rule so
that all extensions can be enabled at any architecture level.  Therefore it's
incorrect, or at least not useful, to keep the check on the architecture.
2. It's problematic in Makefiles and other build systems, where you want to
enable CPU-specific builds for certain files, i.e. you may by default be
building for -march=armv8-a but build some file with -mcpu=neoverse-n1.  Since
there's no easy way to remove the earlier options, we end up warning, and
there's no way to disable just this warning.  Build systems compiling with
-Werror then find that compiling with GCC is needlessly hard.
3. It doesn't actually warn for cases that may lead to issues, so e.g.
-march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
be disabled.
For these reasons I have two proposals:
1. Just remove this warning altogether.
2. Rework the warning based on extensions and only warn when features would be
disabled by the presence of the -mcpu. This is the approach this patch has
taken.
As examples:
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being added .arch armv8.2-a+crc+sve
> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2
<no warning>
The one remaining issue here is that if both -march and -mcpu are specified we
pick the -march.  This is not particularly obvious, and for the use case to
be more useful I think it makes sense to pick the CPU's arch?
I did not make that change in the patch as it changes semantics.
Note that I can't write a test for this because dg-warning expects warnings to
be at a particular line and doesn't support warnings at the "global" level.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.
This patch adds a new generic scheduling model "generic-armv9-a" and makes it
the default for all Armv9 architectures.
-mcpu=generic and -mtune=generic are kept around for those who really want the
previous cost model.
gcc/ChangeLog:
PR target/111370
* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
armv9.3-a): Update to generic-armv9-a.
* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
* config/aarch64/tuning_models/generic_armv9_a.h: New file.
This patch adds a new generic scheduling model "generic-armv8-a" and makes it
the default for all Armv8 architectures.
-mcpu=generic and -mtune=generic are kept around for those who really want the
previous cost model.
This shows on SPECCPU 2017 the following:
generic: SPECINT 1.0% improvement in geomean, SPECFP -0.6%. The SPECFP is due
to fotonik3d_r where we vectorize an FP calculation that only ever
needs one lane of the result. This I believe is a generic costing bug
but at the moment we can't change costs of FP and INT independently.
So I will defer updating that cost to stage3 after Richard's other
costing updates land.
generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.
gcc/ChangeLog:
PR target/111370
* config/aarch64/aarch64-arches.def (armv8-9, armv8-a, armv8.1-a,
armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
armv8.8-a): Update to generic_armv8_a.
* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv8_a.h
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
TARGET_CPU_generic_armv8_a.
* config/aarch64/tuning_models/generic_armv8_a.h: New file.
gcc/testsuite/ChangeLog:
PR target/111370
* gcc.target/aarch64/sve/cond_asrd_1.c: Updated.
* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
* gcc.target/aarch64/target_attr_13.c: Likewise.
* gcc.target/aarch64/target_attr_15.c: Likewise.
In anticipation of adding new generic tuning values, this removes the hardcoding
of the "generic" CPU and instead just specifies it as a normal CPU.
No change in behavior is expected.
gcc/ChangeLog:
PR target/111370
* config/aarch64/aarch64-cores.def: Add generic.
* config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (all_cores): Remove generic.
* config/aarch64/aarch64.h (enum target_cpus): Remove
TARGET_CPU_generic.
This patch series attempts to move the generic cost model in AArch64 to a new
and modern generic standard. The current standard is quite old and generates
very suboptimal code out of the box for users of GCC.
The goal is for the new cost model to be beneficial on newer/current Arm
microarchitectures while not being too negative for older ones.
It does not change any core specific optimization. The final changes reflect
both performance optimizations and size optimizations.
This first patch just re-organizes the cost structures into their own
files.  The aarch64.cc file has gotten very big and is hard to follow.
No functional changes are expected from this change. Note that since all the
structures have private visibility I've put them in header files instead.
gcc/ChangeLog:
PR target/111370
* config/aarch64/aarch64.cc (generic_addrcost_table,
exynosm1_addrcost_table,
xgene1_addrcost_table,
thunderx2t99_addrcost_table,
thunderx3t110_addrcost_table,
tsv110_addrcost_table,
qdf24xx_addrcost_table,
a64fx_addrcost_table,
neoversev1_addrcost_table,
neoversen2_addrcost_table,
neoversev2_addrcost_table,
generic_regmove_cost,
cortexa57_regmove_cost,
cortexa53_regmove_cost,
exynosm1_regmove_cost,
thunderx_regmove_cost,
xgene1_regmove_cost,
qdf24xx_regmove_cost,
thunderx2t99_regmove_cost,
thunderx3t110_regmove_cost,
tsv110_regmove_cost,
a64fx_regmove_cost,
neoversen2_regmove_cost,
neoversev1_regmove_cost,
neoversev2_regmove_cost,
generic_vector_cost,
a64fx_vector_cost,
qdf24xx_vector_cost,
thunderx_vector_cost,
tsv110_vector_cost,
cortexa57_vector_cost,
exynosm1_vector_cost,
xgene1_vector_cost,
thunderx2t99_vector_cost,
thunderx3t110_vector_cost,
ampere1_vector_cost,
generic_branch_cost,
generic_tunings,
cortexa35_tunings,
cortexa53_tunings,
cortexa57_tunings,
cortexa72_tunings,
cortexa73_tunings,
exynosm1_tunings,
thunderxt88_tunings,
thunderx_tunings,
tsv110_tunings,
xgene1_tunings,
emag_tunings,
qdf24xx_tunings,
saphira_tunings,
thunderx2t99_tunings,
thunderx3t110_tunings,
neoversen1_tunings,
ampere1_tunings,
ampere1a_tunings,
neoversev1_vector_cost,
neoversev1_tunings,
neoverse512tvb_vector_cost,
neoverse512tvb_tunings,
neoversen2_vector_cost,
neoversen2_tunings,
neoversev2_vector_cost,
neoversev2_tunings
a64fx_tunings): Split into own files.
* config/aarch64/tuning_models/a64fx.h: New file.
* config/aarch64/tuning_models/ampere1.h: New file.
* config/aarch64/tuning_models/ampere1a.h: New file.
* config/aarch64/tuning_models/cortexa35.h: New file.
* config/aarch64/tuning_models/cortexa53.h: New file.
* config/aarch64/tuning_models/cortexa57.h: New file.
* config/aarch64/tuning_models/cortexa72.h: New file.
* config/aarch64/tuning_models/cortexa73.h: New file.
* config/aarch64/tuning_models/emag.h: New file.
* config/aarch64/tuning_models/exynosm1.h: New file.
* config/aarch64/tuning_models/generic.h: New file.
* config/aarch64/tuning_models/neoverse512tvb.h: New file.
* config/aarch64/tuning_models/neoversen1.h: New file.
* config/aarch64/tuning_models/neoversen2.h: New file.
* config/aarch64/tuning_models/neoversev1.h: New file.
* config/aarch64/tuning_models/neoversev2.h: New file.
* config/aarch64/tuning_models/qdf24xx.h: New file.
* config/aarch64/tuning_models/saphira.h: New file.
* config/aarch64/tuning_models/thunderx.h: New file.
* config/aarch64/tuning_models/thunderx2t99.h: New file.
* config/aarch64/tuning_models/thunderx3t110.h: New file.
* config/aarch64/tuning_models/thunderxt88.h: New file.
* config/aarch64/tuning_models/tsv110.h: New file.
* config/aarch64/tuning_models/xgene1.h: New file.
This changes unpack instructions to use zip{1,2} when doing a zero-extending
widening operation. Permutes generally have a higher throughput than the
widening operations. Zeros are shuffled into the top half of the registers.
The testcase
void d2 (unsigned * restrict a, unsigned short *b, int n)
{
for (int i = 0; i < (n & -8); i++)
a[i] = b[i];
}
now generates:
movi v1.4s, 0
.L3:
ldr q0, [x1], 16
zip1 v2.8h, v0.8h, v1.8h
zip2 v0.8h, v0.8h, v1.8h
stp q2, q0, [x0]
add x0, x0, 32
cmp x1, x2
bne .L3
instead of:
.L3:
ldr q0, [x1], 16
uxtl v1.4s, v0.4h
uxtl2 v0.4s, v0.8h
stp q1, q0, [x0]
add x0, x0, 32
cmp x1, x2
bne .L3
Since we need the extra 0 register we do this only for the vectorizer's lo/hi
pairs when we know the 0 will be floated outside of the loop.
This gives an 8% speed-up in Imagick in SPECCPU 2017 on Neoverse V2.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (vec_unpack<su>_hi_<mode>,
vec_unpack<su>_lo_<mode>): Split into...
(vec_unpacku_hi_<mode>, vec_unpacks_hi_<mode>,
vec_unpacku_lo_<mode>, vec_unpacks_lo_<mode>): ...These.
(aarch64_usubw<mode>_<PERM_EXTEND:perm_hilo>_zip): New.
(aarch64_uaddw<mode>_<PERM_EXTEND:perm_hilo>_zip): New.
* config/aarch64/iterators.md (PERM_EXTEND, perm_index): New.
(perm_hilo): Add UNSPEC_ZIP1, UNSPEC_ZIP2.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/vmovl_high_1.c: Update codegen.
* gcc.target/aarch64/uxtl-combine-1.c: New test.
* gcc.target/aarch64/uxtl-combine-2.c: New test.
* gcc.target/aarch64/uxtl-combine-3.c: New test.
* gcc.target/aarch64/uxtl-combine-4.c: New test.
* gcc.target/aarch64/uxtl-combine-5.c: New test.
* gcc.target/aarch64/uxtl-combine-6.c: New test.
In testcases gcc.dg/tree-ssa/slsr-19.c and gcc.dg/tree-ssa/slsr-20.c we have a
fairly simple computation. On the current generic costing we generate:
f:
add w0, w0, 2
madd w1, w0, w1, w1
lsl w0, w1, 1
ret
but on any cost model other than generic (including the new upcoming generic)
we generate:
f:
adrp x2, .LC0
dup v31.2s, w0
fmov s30, w1
ldr d29, [x2, #:lo12:.LC0]
add v31.2s, v31.2s, v29.2s
mul v31.2s, v31.2s, v30.s[0]
addp v31.2s, v31.2s, v31.2s
fmov w0, s31
ret
.LC0:
.word 2
.word 4
This seems to be because the vectorizer thinks the vector transfers are free:
x1_4 + x2_6 1 times vector_stmt costs 0 in body
x1_4 + x2_6 1 times vec_to_scalar costs 0 in body
This happens because the stmt it's using to get the cost of register transfers
for the given type happens to be one feeding into a MUL.  We incorrectly
discount the + for the register transfer.
This is fixed by guarding the check for aarch64_multiply_add_p with a kind
check and only doing it for scalar_stmt and vector_stmt.
I'm sending this separately from my patch series, but it's required for it.
It also seems to fix overvectorization cases in fotonik3d_r in SPECCPU 2017.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_adjust_stmt_cost): Guard mla.
(aarch64_vector_costs::count_ops): Likewise.
Notice the dump check is missing; add it.
Committed as it is obvious.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112438.c: Add missing dump check.