Commit graph

206237 commits

Author SHA1 Message Date
Victor Do Nascimento
142abf03bc aarch64: rcpc3: Add Neon ACLE intrinsics
Register the target specific builtins in `aarch64-simd-builtins.def'
and implement their associated backend patterns in `aarch64-simd.md'.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def
	(vec_ldap1_lane): New.
	(vec_stl1_lane): Likewise.
	* config/aarch64/aarch64-simd.md
	(aarch64_vec_stl1_lanes<mode>_lane<Vel>): New.
	(aarch64_vec_stl1_lane<mode>): Likewise.
	(aarch64_vec_ldap1_lanes<mode>_lane<Vel>): Likewise.
	(aarch64_vec_ldap1_lane<mode>): Likewise.
	* config/aarch64/aarch64.md (UNSPEC_LDAP1_LANE): New.
	(UNSPEC_STL1_LANE): Likewise.
2023-12-07 03:26:27 +00:00
Victor Do Nascimento
1750c038f9 aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics
The LDAP1 and STL1 Neon ACLE intrinsics operate on 64-bit data values
held in single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD register
configurations, in either DI or DF mode.  This calls for a mode
iterator covering the V1DI, V1DF, V2DI and V2DF modes.

This patch therefore introduces the new V12DIF mode iterator, used to
generate functions operating on signed 64-bit integer and float
values, and V12DUP, used to generate their unsigned and
polynomial-type counterparts.  Along with this, we modify the
associated mode attributes accordingly to allow the relevant backend
patterns for the intrinsics to be implemented.

gcc/ChangeLog:

	* config/aarch64/iterators.md (V12DIF): New.
	(V12DUP): Likewise.
	(VEL): Add support for all V12DIF-associated modes.
	(Vetype): Add support for V1DI and V1DF.
	(Vel): Likewise.
2023-12-07 03:14:18 +00:00
Victor Do Nascimento
df193bda74 aarch64: rcpc3: Add +rcpc3 extension
Given the optional LRCPC3 target support for Armv8.2-a cores onwards,
the +rcpc3 arch feature modifier is added to GCC's command-line options.

gcc/ChangeLog:

	* config/aarch64/aarch64-option-extensions.def (rcpc3): New.
	* config/aarch64/aarch64.h (AARCH64_ISA_RCPC3): Likewise.
	(TARGET_RCPC3): Likewise.
	* doc/invoke.texi (rcpc3): Document feature in AArch64 Options.
2023-12-07 03:14:18 +00:00
Hongyu Wang
3ba505c7b1 [APX NDD] Support TImode shift for NDD
TImode shifts are split by splitter functions, which assume that
operands[0] and operands[1] are the same.  For the NDD alternative this
assumption may not hold, so add NDD split functions that emit the NDD
form instructions, and omit the handling of the !64bit target split.

Although the NDD form allows a memory source, a post-reload splitter has
no extra register available for an NDD form shift, especially shld/shrd.
So only accept the register alternative for the shift source under NDD.
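A minimal C sketch of the arithmetic behind a doubleword (TImode) left shift split into 64-bit halves. The u128_pair type and shl128 name are hypothetical; the sketch models only the values computed, not ix86_split_ashl_ndd itself. The result is built in a separate object rather than in place, which is the situation where operands[0] and operands[1] differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 128-bit value held as two 64-bit halves,
   like a TImode value split across a register pair. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_pair;

/* Left shift by n (0 <= n < 128) using only 64-bit operations.
   The result is returned as a new value; src is never modified. */
static u128_pair shl128(u128_pair src, unsigned n)
{
    u128_pair dst;

    assert(n < 128);
    if (n == 0) {
        /* Avoid src.lo >> 64, which is undefined in C. */
        dst = src;
    } else if (n < 64) {
        /* The high half combines both halves, as shld does. */
        dst.hi = (src.hi << n) | (src.lo >> (64 - n));
        dst.lo = src.lo << n;
    } else {
        /* The low half moves entirely into the high half. */
        dst.hi = src.lo << (n - 64);
        dst.lo = 0;
    }
    return dst;
}
```

The n < 64 branch is where an shld-style combine is needed; shifts of 64 or more reduce to a single 64-bit shift plus clearing the low half.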

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_split_ashl_ndd): New
	function to split NDD form lshift.
	(ix86_split_rshift_ndd): Likewise for l/ashiftrt.
	* config/i386/i386-protos.h (ix86_split_ashl_ndd): New
	prototype.
	(ix86_split_rshift_ndd): Likewise.
	* config/i386/i386.md (ashl<mode>3_doubleword): Add NDD
	alternative, call ndd split function when operands[0]
	not equal to operands[1].
	(define_split for doubleword lshift): Likewise.
	(define_peephole for doubleword lshift): Likewise.
	(<insn><mode>3_doubleword): Likewise for l/ashiftrt.
	(define_split for doubleword l/ashiftrt): Likewise.
	(define_peephole for doubleword l/ashiftrt): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-ti-shift.c: New test.
2023-12-07 09:32:33 +08:00
Hongyu Wang
42cb34f94b [APX NDD] Support APX NDD for cmove insns
gcc/ChangeLog:

	* config/i386/i386.md (*mov<mode>cc_noc): Extend with new constraints
	to support NDD.
	(*movsicc_noc_zext): Likewise.
	(*movsicc_noc_zext_1): Likewise.
	(*movqicc_noc): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-cmov.c: New test.
2023-12-07 09:31:15 +08:00
Hongyu Wang
5fb807e1e8 [APX NDD] Support APX NDD for shld/shrd insns
For shld/shrd insns, the old pattern uses match_dup 0 as its shift source and
uses +r*m as its constraint.  To support NDD, add new define_insns to handle
the NDD form pattern, with an extra input and the destination operand fixed
to a register.
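For reference, the semantics of 64-bit shld/shrd can be modeled in C as below. This is a sketch with hypothetical names (shld64, shrd64), not the define_insn patterns. In the legacy form the result overwrites the first operand; an NDD form computes the same value into a separate destination register.

```c
#include <assert.h>
#include <stdint.h>

/* SHLD model: shift dst left by count, filling the vacated low bits
   from the high bits of src.  Counts are masked to 6 bits, as for
   64-bit operands; a zero count leaves the value unchanged. */
static uint64_t shld64(uint64_t dst, uint64_t src, unsigned count)
{
    count &= 63;
    if (count == 0)
        return dst;
    return (dst << count) | (src >> (64 - count));
}

/* SHRD model: shift dst right by count, filling the vacated high bits
   from the low bits of src. */
static uint64_t shrd64(uint64_t dst, uint64_t src, unsigned count)
{
    count &= 63;
    if (count == 0)
        return dst;
    return (dst >> count) | (src << (64 - count));
}
```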

gcc/ChangeLog:

	* config/i386/i386.md (x86_64_shld_ndd): New define_insn.
	(x86_64_shld_ndd_1): Likewise.
	(*x86_64_shld_ndd_2): Likewise.
	(x86_shld_ndd): Likewise.
	(x86_shld_ndd_1): Likewise.
	(*x86_shld_ndd_2): Likewise.
	(x86_64_shrd_ndd): Likewise.
	(x86_64_shrd_ndd_1): Likewise.
	(*x86_64_shrd_ndd_2): Likewise.
	(x86_shrd_ndd): Likewise.
	(x86_shrd_ndd_1): Likewise.
	(*x86_shrd_ndd_2): Likewise.
	(*x86_64_shld_shrd_1_nozext): Adjust codegen under TARGET_APX_NDD.
	(*x86_shld_shrd_1_nozext): Likewise.
	(*x86_64_shrd_shld_1_nozext): Likewise.
	(*x86_shrd_shld_1_nozext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-shld-shrd.c: New test.
2023-12-07 09:31:15 +08:00
Hongyu Wang
d1dea413ef [APX NDD] Support APX NDD for rotate insns
gcc/ChangeLog:

	* config/i386/i386.md (*<insn><mode>3_1): Extend with a new
	alternative to support NDD for SI/DI rotate, and adjust output
	template.
	(*<insn>si3_1_zext): Likewise.
	(*<insn><mode>3_1): Likewise for QI/HI modes.
	(rcrsi2): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternative.
	(rcrdi2): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
2023-12-07 09:31:15 +08:00
Hongyu Wang
16172db2df [APX NDD] Support APX NDD for right shift insns
Similar to LSHIFT, rshift does not need to omit $1 for the NDD form.

gcc/ChangeLog:

	* config/i386/i386.md (ashr<mode>3_cvt): Extend with new
	alternatives to support NDD, and adjust output templates.
	(*ashr<mode>3_1): Likewise for SI/DI mode.
	(*lshr<mode>3_1): Likewise.
	(*<insn>si3_1_zext): Likewise.
	(*ashr<mode>3_1): Likewise for QI/HI mode.
	(*lshrqi3_1): Likewise.
	(*lshrhi3_1): Likewise.
	(<insn><mode>3_cmp): Likewise.
	(*<insn><mode>3_cconly): Likewise.
	(*ashrsi3_cvt_zext): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*highpartdisi2): Likewise.
	(*<insn>si3_cmp_zext): Likewise.
	(<insn><mode>3_carry): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add l/ashiftrt tests.
2023-12-07 09:31:15 +08:00
Hongyu Wang
03655cd427 [APX NDD] Support APX NDD for left shift insns
For left shift, the TARGET_DOUBLE_WITH_ADD optimization turns shl by 1
into add.  Because the NDD form of add requires its source operand to be
a register (NDD cannot take two memory sources), we currently keep using
the NDD form shift instead of add.

The TARGET_SHIFT1 optimization tries to drop the constant 1 to use a
shorter opcode, but under NDD the assembler selects that encoding
automatically whether or not $1 is present, so NDD is not involved with it.

The doubleword insns for left shift call ix86_expand_ashl, which assumes
that all shift-related patterns have the same operands[0] and operands[1].
These patterns will be supported in a standalone patch.

gcc/ChangeLog:

	* config/i386/i386.md (*ashl<mode>3_1): Extend with new
	alternatives to support NDD, limit the new alternative to
	generate sal only, and adjust output template for NDD.
	(*ashlsi3_1_zext): Likewise.
	(*ashlhi3_1): Likewise.
	(*ashlqi3_1): Likewise.
	(*ashl<mode>3_cmp): Likewise.
	(*ashlsi3_cmp_zext): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*ashl<mode>3_cconly): Likewise.
	(*ashl<dwi>3_doubleword_highpart): Adjust codegen for NDD.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add tests for sal.
2023-12-07 09:31:14 +08:00
Kong Lingling
c95f67b896 [APX NDD] Support APX NDD for or/xor insn
Similar to the AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.

Also adjust *one_cmplsi2_2_zext and its corresponding splitter, which
generates an xor insn.

gcc/ChangeLog:

	* config/i386/i386.md (<code><mode>3): Add new alternative for NDD
	and adjust output templates.
	(*<code><mode>_1): Likewise.
	(*<code>qi_1): Likewise.
	(*notxor<mode>_1): Likewise.
	(*<code>si_1_zext): Likewise.
	(*notxorqi_1): Likewise.
	(*<code><mode>_2): Likewise.
	(*<code>si_2_zext): Likewise.
	(*<code>si_2_zext_imm): Likewise.
	(*<code>si_1_zext_imm): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*one_cmplsi2_2_zext): Likewise.
	(define_split for *one_cmplsi2_2_zext): Use nonimmediate_operand for
	operands[3].
	(*<code><dwi>3_doubleword): Add NDD constraints, adopt '&' to NDD dest
	and emit move for optimized case if operands[0] != operands[1] or
	operands[4] != operands[5].
	(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
	form OR/XOR insn to <any_logic:code>qi_ext<mode>_3.
	(define_split for QI strict_lowpart optimization): Prohibit splitter to
	split NDD form AND insn to *<code><mode>3_1_slp.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add or and xor test.
2023-12-07 09:31:14 +08:00
Kong Lingling
7463df5c2a [APX NDD] Support APX NDD for and insn
For the NDD form AND insn, three splitter fixes are needed after extending
the legacy patterns.

1. APX NDD does not support the high QImode registers ah, bh, ch and dh, so
some optimization splitters that generate a highpart zero_extract for QImode
need to be prohibited for NDD patterns.

2. The legacy AND insn uses the r/qm/L constraint, and a post-reload splitter
transforms it into a zero_extend move.  For the NDD form AND, however, the
splitter is not strict enough: it assumes such an AND has a const_int operand
matching the "L" constraint, whereas the NDD form AND allows a const_int with
any QI value.  Restrict the splitter condition to match the "L" constraint,
which strictly matches zero-extend semantics.

3. The legacy AND insn adopts the r/0/Z constraint, and a splitter tries to
optimize that form into a strict_lowpart QImode AND when the 7th bit is not
set.  But the splitter wrongly converts the non-zext form of NDD AND with a
memory source; the strict_lowpart transform then matches alternative 1 of
*<code><mode>_slp_1 and generates *movstrict<mode>_1, so the zext semantics
are lost.  This can leave the highpart of the destination uncleared and
generate wrong code.  Disable the splitter when NDD is adopted and
operands[0] and operands[1] are not equal.
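Point 2 hinges on which AND masks are genuinely zero extensions. A small C sketch, assuming our reading that the i386 "L" constraint covers 0xff, 0xffff and 0xffffffff; is_zext_mask and zext_low are hypothetical names. An arbitrary QI constant such as 0x7f does not qualify, which is why the splitter must not fire on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks for which "x & mask" is exactly a zero extension of the low
   8, 16 or 32 bits of x (assumed to match the "L" constraint). */
static bool is_zext_mask(uint64_t mask)
{
    return mask == 0xffu || mask == 0xffffu || mask == 0xffffffffu;
}

/* Zero-extend the low bits of x selected by a zext mask, the way a
   zero_extend move would. */
static uint64_t zext_low(uint64_t x, uint64_t mask)
{
    assert(is_zext_mask(mask));
    switch (mask) {
    case 0xffu:   return (uint8_t)x;
    case 0xffffu: return (uint16_t)x;
    default:      return (uint32_t)x;
    }
}
```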

gcc/ChangeLog:

	* config/i386/i386.md (and<mode>3): Add NDD alternatives and adjust
	output template.
	(*anddi_1): Likewise.
	(*and<mode>_1): Likewise.
	(*andqi_1): Likewise.
	(*andsi_1_zext): Likewise.
	(*anddi_2): Likewise.
	(*andsi_2_zext): Likewise.
	(*andqi_2_maybe_si): Likewise.
	(*and<mode>_2): Likewise.
	(*and<dwi>3_doubleword): Add NDD alternative, adopt '&' to NDD dest and
	emit move for optimized case if operands[0] not equal to operands[1].
	(define_split for QI highpart AND): Prohibit splitter to split NDD
	form AND insn to <any_logic:code>qi_ext<mode>_3.
	(define_split for QI strict_lowpart optimization): Prohibit splitter to
	split NDD form AND insn to *<code><mode>3_1_slp.
	(define_split for zero_extend and optimization): Prohibit splitter to
	split NDD form AND insn to zero_extend insn.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add and test.
2023-12-07 09:31:14 +08:00
Kong Lingling
c778241dbd [APX NDD] Support APX NDD for not insn
*one_cmplsi2_2_zext is split to xor, so its NDD form will be added together
with xor NDD support.

gcc/ChangeLog:

	* config/i386/i386.md (one_cmpl<mode>2): Add new constraints for NDD
	and adjust output template.
	(*one_cmpl<mode>2_1): Likewise.
	(*one_cmplqi2_1): Likewise.
	(*one_cmpl<dwi>2_doubleword): Likewise, and adopt '&' to NDD dest.
	(*one_cmpl<mode>2_2): Likewise.
	(*one_cmplsi2_1_zext): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add not test.
2023-12-07 09:31:14 +08:00
Kong Lingling
042519b617 [APX NDD] Support APX NDD for neg insn
gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
	parameter and adjust for NDD.
	* config/i386/i386-protos.h: Add use_ndd parameter for
	ix86_unary_operator_ok and ix86_expand_unary_operator.
	* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
	and adjust for NDD.
	* config/i386/i386.md (neg<mode>2): Add new constraint for NDD and
	adjust output template.
	(*neg<mode>_1): Likewise.
	(*neg<dwi>2_doubleword): Likewise and adopt '&' to NDD dest.
	(*neg<mode>_2): Likewise.
	(*neg<mode>_ccc_1): Likewise.
	(*neg<mode>_ccc_2): Likewise.
	(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternatives.
	(*negsi_2_zext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add neg test.
2023-12-07 09:31:14 +08:00
Kong Lingling
57fdb5c244 [APX NDD] Support APX NDD for sbb insn
Similar to *add<dwi>3_doubleword, operands[1] may not equal operands[0], so
an extra move and earlyclobber are required.
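The sub/sbb pairing behind a doubleword subtraction can be sketched in C on 64-bit halves (u128_parts and sub128 are hypothetical names, and this models values only, not the insn patterns). The result is written to a separate object, so it does not rely on the destination being the first source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: low and high 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* 128-bit subtraction: SUB on the low halves produces a borrow,
   which SBB then consumes on the high halves. */
static u128_parts sub128(u128_parts a, u128_parts b)
{
    u128_parts d;
    d.lo = a.lo - b.lo;               /* sub */
    uint64_t borrow = a.lo < b.lo;    /* CF after the sub */
    d.hi = a.hi - b.hi - borrow;      /* sbb */
    return d;
}
```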

gcc/ChangeLog:

	* config/i386/i386.md (*sub<dwi>3_doubleword): Add new alternative for
	NDD, adopt '&' modifier to NDD dest and emit move when operands[0] not
	equal to operands[1].
	(*sub<dwi>3_doubleword_zext): Likewise.
	(*subv<dwi>4_doubleword): Likewise.
	(*subv<dwi>4_doubleword_1): Likewise.
	(*subv<mode>4_overflow_1): Add NDD alternatives and adjust output
	templates.
	(*subv<mode>4_overflow_2): Likewise.
	(@sub<mode>3_carry): Likewise.
	(*addsi3_carry_zext_0r): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*subsi3_carry_zext): Likewise.
	(subborrow<mode>): Pass TARGET_APX_NDD to ix86_binary_operator_ok.
	(subborrow<mode>_0): Likewise.
	(*sub<mode>3_eq): Likewise.
	(*sub<mode>3_ne): Likewise.
	(*sub<mode>3_eq_1): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-sbb.c: New test.
2023-12-07 09:31:14 +08:00
Kong Lingling
c601744469 [APX NDD] Support APX NDD for sub insns
gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
	Add use_ndd parameter and parse it.
	* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
	Update declaration.
	* config/i386/i386.md (sub<mode>3): Add new alternatives for NDD
	and adjust output templates.
	(*sub<mode>_1): Likewise.
	(*sub<mode>_2): Likewise.
	(subv<mode>4): Likewise.
	(*subv<mode>4): Likewise.
	(subv<mode>4_1): Likewise.
	(usubv<mode>4): Likewise.
	(*sub<mode>_3): Likewise.
	(*subsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternatives.
	(*subsi_2_zext): Likewise.
	(*subsi_3_zext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add test for ndd sub.
2023-12-07 09:31:13 +08:00
Kong Lingling
592dc08e05 [APX NDD] Support APX NDD for adc insns
Legacy adc patterns are commonly used for TImode add.  When extending TImode
add to the NDD version, operands[0] and operands[1] can be different, so an
extra move should be emitted if those patterns optimize the case of adding
const0_rtx.

For TImode insns, registers can overlap between operands[0] and operands[1],
as x86 allocates TImode registers sequentially, like rax:rdi, rdi:rdx.  After
the post-reload split for TImode, the write to the 1st highpart rdi will be
overridden by the 2nd lowpart rdi if the 2nd lowpart rdi has a different
source as input; the write to the 1st highpart rdi is then lost, causing a
miscompilation.  In addition, when input operands contain memory, the address
register may also overlap with the destination register if it is marked dead
after one of the highpart/lowpart operations is done.
So the earlyclobber modifier '&' should be added to the NDD destination to
avoid overlap between destination and source operands.

NDD instructions automatically zero-extend the destination register to 64
bits, so zext patterns can adopt all NDD forms that have a memory source
input.
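A toy C model of one way the overlap goes wrong, assuming a hypothetical register file in which a TImode value occupies two consecutive slots and the split emits the low-part add before reading the high-part source. If the destination's low register is the source's high register, the high-part add reads an already clobbered value; an earlyclobber destination forbids that allocation. The register numbering is illustrative, not the rax:rdi/rdi:rdx case above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: a TImode value occupies r[i] (lo) and r[i+1] (hi). */
enum { NREGS = 8 };

/* dst = src + (k_hi:k_lo), split into a low-part ADD followed by a
   high-part ADC.  The high half of src is read only after the low
   half of dst has been written. */
static void add_ti_split(uint64_t r[NREGS], int dst, int src,
                         uint64_t k_lo, uint64_t k_hi)
{
    assert(dst >= 0 && dst + 1 < NREGS && src >= 0 && src + 1 < NREGS);
    uint64_t lo = r[src] + k_lo;
    uint64_t carry = lo < k_lo;                /* CF from the add */
    r[dst] = lo;                               /* low-part write */
    r[dst + 1] = r[src + 1] + k_hi + carry;    /* may read a clobbered slot */
}
```

With dst = 2 and src = 0 the result is correct; with dst = 1 and src = 0 the low-part write lands in the source's high slot before it is read.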

gcc/ChangeLog:

	* config/i386/i386.md (*add<dwi>3_doubleword): Add ndd alternatives,
	adopt '&' to ndd dest and move operands[1] to operands[0] when they are
	not equal.
	(*add<dwi>3_doubleword_cc_overflow_1): Likewise.
	(*addv<dwi>4_doubleword): Likewise.
	(*addv<dwi>4_doubleword_1): Likewise.
	(*add<dwi>3_doubleword_zext): Likewise.
	(addv<mode>4_overflow_1): Add ndd alternatives.
	(*addv<mode>4_overflow_2): Likewise.
	(@add<mode>3_carry): Likewise.
	(*add<mode>3_carry_0): Likewise.
	(*addsi3_carry_zext): Likewise.
	(addcarry<mode>): Likewise.
	(addcarry<mode>_0): Likewise.
	(*addcarry<mode>_1): Likewise.
	(*add<mode>3_eq): Likewise.
	(*add<mode>3_ne): Likewise.
	(*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-adc.c: New test.
2023-12-07 09:31:13 +08:00
Hongyu Wang
d564198f96 [APX NDD] Disable seg_prefixed memory usage for NDD add
NDD uses evex prefix, so when segment prefix is also applied, the instruction
could excceed its 15byte limit, especially adding immediates. This could happen
when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
add the offset to segment register, which will be encoded using segment prefix.
Disable those *POFF constant usage in NDD add alternatives with new constraint.
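A rough byte count shows how the limit can be exceeded. The field sizes below are our assumptions about a worst-case encoding (segment prefix, 4-byte EVEX prefix, SIB byte for an absolute disp32 in 64-bit mode, and a 32-bit immediate), not figures taken from the patch.

```c
#include <assert.h>

/* Assumed field sizes for an NDD add of an imm32 to a %fs-relative
   TLS memory operand; illustrative only. */
enum {
    SEG_PREFIX = 1,        /* e.g. 0x64 for %fs */
    EVEX_PREFIX = 4,       /* 0x62 plus three payload bytes */
    OPCODE = 1,            /* add r/m, imm32 */
    MODRM = 1,
    SIB = 1,               /* absolute disp32 in 64-bit mode uses a SIB byte */
    DISP32 = 4,            /* the @tpoff/@ntpoff offset */
    IMM32 = 4,
    X86_MAX_INSN_LEN = 15  /* architectural instruction length limit */
};

static int ndd_add_tls_imm_len(void)
{
    return SEG_PREFIX + EVEX_PREFIX + OPCODE + MODRM + SIB + DISP32 + IMM32;
}
```

Under these assumptions the total is 16 bytes, one more than the limit.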

gcc/ChangeLog:

	* config/i386/constraints.md (je): New constraint.
	* config/i386/i386-protos.h (x86_poff_operand_p): New prototype.
	* config/i386/i386.cc (x86_poff_operand_p): New function to
	check any *POFF constant in operand.
	* config/i386/i386.md (*add<mode>_1): Split out je alternative for add.
2023-12-07 09:31:13 +08:00
Kong Lingling
7abcef725e [APX NDD] Support APX NDD for optimization patterns of add
gcc/ChangeLog:

	* config/i386/i386.md (addsi_1_zext): Add new alternatives for
	NDD and adjust output templates.
	(*add<mode>_2): Likewise.
	(*addsi_2_zext): Likewise.
	(*add<mode>_3): Likewise.
	(*addsi_3_zext): Likewise.
	(*adddi_4): Likewise.
	(*add<mode>_4): Likewise.
	(*add<mode>_5): Likewise.
	(*addv<mode>4): Likewise.
	(*addv<mode>4_1): Likewise.
	(*add<mode>3_cconly_overflow_1): Likewise.
	(*add<mode>3_cc_overflow_1): Likewise.
	(*addsi3_zext_cc_overflow_1): Likewise.
	(*add<mode>3_cconly_overflow_2): Likewise.
	(*add<mode>3_cc_overflow_2): Likewise.
	(*addsi3_zext_cc_overflow_2): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add more tests.
2023-12-07 09:31:13 +08:00
Kong Lingling
e21b2caf6d [APX NDD] Support Intel APX NDD for legacy add insn
APX NDD provides an extra destination register operand for several
GPR-related legacy insns, so a new alternative with an "r" constraint can be
adopted for operand 1.

This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers, since lea has a shorter encoding.  For add
operations with a memory operand, NDD is adopted to save an extra move.

The legacy x86 binary operation expander forces operands[0] and operands[1]
to be the same, so add a helper function to allow NDD form patterns in which
operands[0] and operands[1] differ.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_fixup_binary_operands): Add
	new use_ndd flag to check whether ndd can be used for this binop
	and adjust operand emit.
	(ix86_binary_operator_ok): Likewise.
	(ix86_expand_binary_operator): Likewise, and avoid generating the lea
	pattern in postreload expand when use_ndd is explicitly passed.
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Prohibit apx subfeatures when not in 64bit mode.
	* config/i386/i386-protos.h (ix86_binary_operator_ok):
	Add use_ndd flag.
	(ix86_fixup_binary_operands): Likewise.
	(ix86_expand_binary_operator): Likewise.
	* config/i386/i386.md (*add<mode>_1): Extend with new alternatives
	to support NDD, and adjust output template.
	(*addhi_1): Likewise.
	(*addqi_1): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: New test.
2023-12-07 09:31:13 +08:00
David Malcolm
08b7462d3a analyzer: fix taint false positives with UNKNOWN [PR112850]
PR analyzer/112850 reports a false positive from
-Wanalyzer-tainted-allocation-size on the Linux kernel [1] where
-fanalyzer complains that an allocation size is attacker-controlled
despite the value being correctly sanitized against upper and lower
limits.

The root cause is that the expression is sufficiently complex
to exceed the -param=analyzer-max-svalue-depth= threshold,
currently at 12, with depth 13, and so it is treated as UNKNOWN.
Hence the sanitizations are seen as comparisons of an UNKNOWN
symbolic value against constants, and these were being ignored
by the taint state machine.

The expression in question is relatively typical for those seen in
Linux kernel ioctl handlers, and I was surprised that it had exceeded
the analyzer's default expression complexity limit.

This patch addresses this problem in three ways:
(a) the default value of the threshold parameter is increased from 12
to 18, so that such expressions are handled precisely;
(b) a new -Wanalyzer-symbol-too-complex warning is added, emitted when
the symbol complexity limit is reached.  This is off by default for
users, and on by default in the test suite;
(c) the taint state machine handles comparisons against UNKNOWN svalues
by dropping all taint information on that execution path, so that if
the complexity limit has been exceeded we don't generate false positives.
As well as fixing the taint false positive (PR analyzer/112850), the
patch also fixes a couple of leak false positives seen on flex-generated
scanners (PR analyzer/103546).

[1] specifically, in sound/core/rawmidi.c's handler for
SNDRV_RAWMIDI_STREAM_OUTPUT.

gcc/ChangeLog:
	PR analyzer/103546
	PR analyzer/112850
	* doc/invoke.texi: Add -Wanalyzer-symbol-too-complex.

gcc/analyzer/ChangeLog:
	PR analyzer/103546
	PR analyzer/112850
	* analyzer.opt (-param=analyzer-max-svalue-depth=): Increase from
	12 to 18.
	(Wanalyzer-symbol-too-complex): New.
	* diagnostic-manager.cc
	(null_assignment_sm_context::clear_all_per_svalue_state): New.
	* engine.cc (impl_sm_context::clear_all_per_svalue_state): New.
	* program-state.cc (sm_state_map::clear_all_per_svalue_state):
	New.
	* program-state.h (sm_state_map::clear_all_per_svalue_state): New
	decl.
	* region-model-manager.cc
	(region_model_manager::reject_if_too_complex): Add
	-Wanalyzer-symbol-too-complex.
	* sm-taint.cc (taint_state_machine::on_condition): Handle
	comparisons against UNKNOWN.
	* sm.h (sm_context::clear_all_per_svalue_state): New.

gcc/testsuite/ChangeLog:
	PR analyzer/103546
	PR analyzer/112850
	* c-c++-common/analyzer/call-summaries-pr107158-2.c: Add
	-Wno-analyzer-symbol-too-complex.
	* c-c++-common/analyzer/call-summaries-pr107158.c: Likewise.
	* c-c++-common/analyzer/deref-before-check-pr109060-haproxy-cfgparse.c:
	Likewise.
	* c-c++-common/analyzer/feasibility-3.c: Add
	-Wno-analyzer-too-complex and -Wno-analyzer-symbol-too-complex.
	* c-c++-common/analyzer/flex-with-call-summaries.c: Add
	-Wno-analyzer-symbol-too-complex.  Remove xfail for
	PR analyzer/103546 leak false positive.
	* c-c++-common/analyzer/flex-without-call-summaries.c: Remove
	xfail for PR analyzer/103546 leak false positive.
	* c-c++-common/analyzer/infinite-recursion-3.c: Add
	-Wno-analyzer-symbol-too-complex.
	* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
	Likewise.
	* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c:
	Likewise.
	* c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c:
	Likewise.
	* c-c++-common/analyzer/null-deref-pr108806-qemu.c: Likewise.
	* c-c++-common/analyzer/null-deref-pr108830.c: Likewise.
	* c-c++-common/analyzer/pr94596.c: Likewise.
	* c-c++-common/analyzer/strtok-2.c: Likewise.
	* c-c++-common/analyzer/strtok-4.c: Add -Wno-analyzer-too-complex
	and -Wno-analyzer-symbol-too-complex.
	* c-c++-common/analyzer/strtok-cppreference.c: Likewise.
	* gcc.dg/analyzer/analyzer.exp: Add -Wanalyzer-symbol-too-complex
	to DEFAULT_CFLAGS.
	* gcc.dg/analyzer/attr-const-3.c: Add
	-Wno-analyzer-symbol-too-complex.
	* gcc.dg/analyzer/call-summaries-pr107072.c: Likewise.
	* gcc.dg/analyzer/doom-s_sound-pr108867.c: Likewise.
	* gcc.dg/analyzer/explode-4.c: Likewise.
	* gcc.dg/analyzer/null-deref-pr102671-1.c: Likewise.
	* gcc.dg/analyzer/null-deref-pr105755.c: Likewise.
	* gcc.dg/analyzer/out-of-bounds-curl.c: Likewise.
	* gcc.dg/analyzer/pr101503.c: Likewise.
	* gcc.dg/analyzer/pr103892.c: Add -Wno-analyzer-too-complex and
	-Wno-analyzer-symbol-too-complex.
	* gcc.dg/analyzer/pr94851-4.c: Add
	-Wno-analyzer-symbol-too-complex.
	* gcc.dg/analyzer/pr96860-1.c: Likewise.
	* gcc.dg/analyzer/pr96860-2.c: Likewise.
	* gcc.dg/analyzer/pr98918.c: Likewise.
	* gcc.dg/analyzer/pr99044-2.c: Likewise.
	* gcc.dg/analyzer/uninit-pr108806-qemu.c: Likewise.
	* gcc.dg/analyzer/use-after-free.c: Add -Wno-analyzer-too-complex
	and -Wno-analyzer-symbol-too-complex.
	* gcc.dg/plugin/plugin.exp: Add new tests for
	analyzer_kernel_plugin.c.
	* gcc.dg/plugin/taint-CVE-2011-0521-4.c: Update expected results.
	* gcc.dg/plugin/taint-CVE-2011-0521-5.c: Likewise.
	* gcc.dg/plugin/taint-CVE-2011-0521-6.c: Likewise.
	* gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c: Remove xfail.
	* gcc.dg/plugin/taint-pr112850-precise.c: New test.
	* gcc.dg/plugin/taint-pr112850-too-complex.c: New test.
	* gcc.dg/plugin/taint-pr112850-unsanitized.c: New test.
	* gcc.dg/plugin/taint-pr112850.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-12-06 19:25:26 -05:00
GCC Administrator
ae9e48e5c0 Daily bump.
2023-12-07 00:17:06 +00:00
Juzhe-Zhong
db642d60ee RISC-V: Fix PR112888 ICE
Committed as it is obvious.

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (extract_single_source): new function.
	(pre_vsetvl::compute_lcm_local_properties): Fix ICE.
2023-12-06 14:41:24 -08:00
Victor Do Nascimento
09a08df719 aarch64: Add rsr128 and wsr128 ACLE tests
Extend existing unit tests for the ACLE system register manipulation
functions to include 128-bit tests.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
	(set_wsr128): Likewise.
2023-12-06 21:22:11 +00:00
Victor Do Nascimento
88157c8817 aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins
Implement the ACLE builtins for 128-bit system register manipulation:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New
	`enum aarch64_builtins' value.
	(AARCH64_WSR128): Likewise.
	(aarch64_init_rwsr_builtins): Init `__builtin_aarch64_rsr128'
	and `__builtin_aarch64_wsr128' builtins.
	(aarch64_expand_rwsr_builtin): Extend function to handle
	`__builtin_aarch64_{rsr|wsr}128'.
	* config/aarch64/aarch64-protos.h (aarch64_retrieve_sysreg):
	Update function signature.
	* config/aarch64/aarch64.cc (F_REG_128): New.
	(aarch64_retrieve_sysreg): Add 128-bit register mode check.
	* config/aarch64/aarch64.md (UNSPEC_SYSREG_RTI): New.
	(UNSPEC_SYSREG_WTI): Likewise.
	(aarch64_read_sysregti): Likewise.
	(aarch64_write_sysregti): Likewise.
	* config/aarch64/arm_acle.h (__arm_rsr128): New.
	(__arm_wsr128): Likewise.
2023-12-06 21:20:36 +00:00
Victor Do Nascimento
eac59af05a aarch64: Sync `aarch64-sys-regs.def' with Binutils.
This patch updates `aarch64-sys-regs.def', bringing it into sync with
the Binutils source.

gcc/ChangeLog:

	* config/aarch64/aarch64-sys-regs.def: Copy from Binutils.
2023-12-06 21:19:53 +00:00
Victor Do Nascimento
3aba045882 aarch64: Add support for GCS system registers with the +gcs modifier
Given the introduction of system registers associated with the Guarded
Control Stack extension to Armv9.4-a in Binutils and their reliance on
the `+gcs' modifier, we implement the necessary changes in GCC to
allow for them to be recognized by the compiler.

gcc/ChangeLog:

	* config/aarch64/aarch64-option-extensions.def (gcs): New.
	* config/aarch64/aarch64.h (AARCH64_ISA_GCS): New.
	(TARGET_THE):  Likewise.
	* doc/invoke.texi (AArch64 Options): Describe GCS.
2023-12-06 21:19:53 +00:00
Victor Do Nascimento
16a05fac33 aarch64: Add march flags for +the and +d128 arch extensions
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.

The `+d128' -march modifier enables the use of the following ACLE
builtin functions:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);

and defines the __ARM_FEATURE_SYSREG128 macro to 1.

Finally, the `rcwmask_el1' and `rcwsmask_el1' 128-bit system register
implementations are also reliant on the enablement of the `+the' flag,
which is thus also implemented in this patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-c.cc (__ARM_FEATURE_SYSREG128): New.
	* config/aarch64/aarch64-arches.def (armv8.9-a): New.
	(armv9.4-a): Likewise.
	* config/aarch64/aarch64-option-extensions.def (d128): Likewise.
	(the): Likewise.
	* config/aarch64/aarch64.h (AARCH64_ISA_V9_4A): Likewise.
	(AARCH64_ISA_V8_9A): Likewise.
	(TARGET_ARMV9_4): Likewise.
	(AARCH64_ISA_D128): Likewise.
	(AARCH64_ISA_THE): Likewise.
	(TARGET_D128): Likewise.
	* doc/invoke.texi (AArch64 Options): Document new -march flags
	and extensions.
2023-12-06 21:18:29 +00:00
Edwin Lu
1bd15d8703 RISC-V: Remove xfail from ssa-fre-3.c testcase
Ran the test case at 122e7b4f9d, where the xfail was introduced.  The
test passed at that hash and has continued to pass since then.  Remove
the xfail.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-fre-3.c: Remove xfail.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
2023-12-06 10:54:37 -08:00
Eric Gallager
ec266cbb85 remove qmtest-related Makefile targets
On GitHub, Joseph Myers (@jsm28 there) says in MentorEmbedded/qmtest#1
that the qmtest-related targets should have been removed long ago. This
patch does so.

Ref:
https://github.com/MentorEmbedded/qmtest/issues/1

gcc/ChangeLog:

	* Makefile.in: Remove qmtest-related targets.
2023-12-06 13:42:20 -05:00
Yang Yujie
72bfb4a2d0 [PATCH] testsuite: Adjust for the new permerror -Wincompatible-pointer-types
r14-6037 turned -Wincompatible-pointer-types into a permerror,
which causes the following tests to fail.

gcc/testsuite/ChangeLog:

	* gcc.dg/fixed-point/composite-type.c: Replace dg-warning with dg-error.
2023-12-06 10:47:16 -07:00
David Malcolm
3bd8241a1f diagnostics: prettify JSON output formats
Previously our JSON output emitted the JSON all on one line, with
no indentation to show the structure of the values.

Although it's easy to reformat such output (e.g. with
"python -m json.tool"), I've found it's a pain to need to do so;
e.g. my text editor sometimes hangs when opening a multimegabyte
JSON file all on one line.  Similarly, diffing is easier if the
JSON is already formatted.

This patch adds whitespace to the JSON output to show the structure.
It turned out to be fairly easy to implement using pretty_printer's
existing indentation machinery.

The patch uses this formatting for the various JSON-based diagnostic
output formats.

For example, with this patch, the output from
-fdiagnostics-format=json-stderr looks like:

[{"kind": "warning",
  "message": "stack-based buffer overflow",
  "option": "-Wanalyzer-out-of-bounds",
  "option_url": "https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-out-of-bounds",
  "children": [{"kind": "note",
                "message": "write of 350 bytes to beyond the end of ‘buf’",
                "locations": [{"caret": {"file": "../../src/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-19.c",
                                         "line": 20,
                                         "display-column": 3,
                                         "byte-column": 3,
                                         "column": 3},
                               "finish": {"file": "../../src/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-19.c",
                                          "line": 20,
                                          "display-column": 27,
                                          "byte-column": 27,
                                          "column": 27}}],
                "escape-source": false},
               {"kind": "note",
                "message": "valid subscripts for ‘buf’ are ‘[0]’ to ‘[99]’",
                "locations": [{"caret": {"file": "../../src/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-19.c",
                                         "line": 20,
                                         "display-column": 3,
                                         "byte-column": 3,
                                         "column": 3},
                               "finish": {"file": "../../src/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-19.c",
                                          "line": 20,
                                          "display-column": 27,
                                          "byte-column": 27,
                                          "column": 27}}],
                "escape-source": false}],
  "column-origin": 1,
...snip...]
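
The layout above can be produced by threading a single "formatted" flag
through recursive print functions, in the spirit of the "formatted"
parameters that the ChangeLog below adds to json.cc.  The following is a
hypothetical, self-contained C sketch, not GCC's code: `jval', `jprint'
and the column-tracking `out' buffer are illustrative assumptions
standing in for json::value and pretty_printer's indentation machinery.
Each member after the first starts on a new line, indented to the column
just after its opening bracket, as in the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature JSON tree; this is not GCC's json::value API.  */
enum jkind { J_INT, J_STR, J_ARR, J_OBJ };

struct jval {
  enum jkind kind;
  long num;                   /* J_INT payload */
  const char *str;            /* J_STR payload (printed without escaping) */
  const char **keys;          /* J_OBJ member names */
  const struct jval **items;  /* J_ARR elements or J_OBJ member values */
  size_t n;                   /* number of elements/members */
};

/* Fixed-size output buffer that remembers the current column.  */
struct out { char buf[1024]; size_t len; int col; };

static void emit (struct out *o, const char *s)
{
  for (; *s; s++)
    {
      if (o->len + 1 < sizeof o->buf)
        {
          o->buf[o->len++] = *s;
          o->buf[o->len] = '\0';
        }
      o->col = (*s == '\n') ? 0 : o->col + 1;
    }
}

/* When FORMATTED is nonzero, every element after the first starts on a
   new line, indented to the column just after the enclosing bracket.  */
static void jprint (struct out *o, const struct jval *v, int formatted)
{
  char tmp[32];
  switch (v->kind)
    {
    case J_INT:
      snprintf (tmp, sizeof tmp, "%ld", v->num);
      emit (o, tmp);
      break;
    case J_STR:
      emit (o, "\"");
      emit (o, v->str);
      emit (o, "\"");
      break;
    case J_ARR:
    case J_OBJ:
      {
        emit (o, v->kind == J_ARR ? "[" : "{");
        int indent = o->col;
        for (size_t i = 0; i < v->n; i++)
          {
            if (i > 0)
              {
                emit (o, ",");
                if (formatted)
                  {
                    emit (o, "\n");
                    for (int c = 0; c < indent; c++)
                      emit (o, " ");
                  }
                else
                  emit (o, " ");
              }
            if (v->kind == J_OBJ)
              {
                emit (o, "\"");
                emit (o, v->keys[i]);
                emit (o, "\": ");
              }
            jprint (o, v->items[i], formatted);
          }
        emit (o, v->kind == J_ARR ? "]" : "}");
        break;
      }
    }
}
```

With formatted set to 0 the same tree prints on a single line, which
corresponds to the "false" the patch passes for gcov and
optimization-record output.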

I was able to update almost all of our DejaGnu test cases for JSON to
handle this format tweak, and IMHO it improved the readability of these
test cases, but a couple were more awkward.  Hence I added
-fno-diagnostics-json-formatting as an option to disable this
formatting.

The formatting does not affect the output of -fsave-optimization-record
or the JSON output from gcov (but this could be enabled if desirable).

gcc/analyzer/ChangeLog:
	* engine.cc (dump_analyzer_json): Use
	flag_diagnostics_json_formatting.

gcc/ChangeLog:
	* common.opt (fdiagnostics-json-formatting): New.
	* diagnostic-format-json.cc: Add "formatted" boolean
	to json_output_format and subclasses, and to the
	diagnostic_output_format_init_json_* functions.  Use it when
	printing JSON.
	* diagnostic-format-sarif.cc: Likewise for sarif_builder,
	sarif_output_format, and the various
	diagnostic_output_format_init_sarif_* functions.
	* diagnostic.cc (diagnostic_output_format_init): Add
	"json_formatting" boolean and pass on to the various cases.
	* diagnostic.h (diagnostic_output_format_init): Add
	"json_formatted" param.
	(diagnostic_output_format_init_json_stderr): Add "formatted" param.
	(diagnostic_output_format_init_json_file): Likewise.
	(diagnostic_output_format_init_sarif_stderr): Likewise.
	(diagnostic_output_format_init_sarif_file): Likewise.
	(diagnostic_output_format_init_sarif_stream): Likewise.
	* doc/invoke.texi (-fdiagnostics-format=json): Remove discussion
	about JSON output needing formatting.
	(-fno-diagnostics-json-formatting): Add.
	* gcc.cc (driver_handle_option): Use
	opts->x_flag_diagnostics_json_formatting.
	* gcov.cc (generate_results): Pass "false" for new formatting
	option when printing json.
	* json.cc (value::dump): Add new "formatted" param.
	(object::print): Likewise, using it to add whitespace to format
	the JSON output.
	(array::print): Likewise.
	(float_number::print): Add new "formatted" param.
	(integer_number::print): Likewise.
	(string::print): Likewise.
	(literal::print): Likewise.
	(selftest::assert_print_eq): Add "formatted" param.
	(ASSERT_PRINT_EQ): Add "FORMATTED" param.
	(selftest::test_writing_objects): Test both formatted and
	unformatted printing.
	(selftest::test_writing_arrays): Likewise.
	(selftest::test_writing_float_numbers): Update for new param of
	ASSERT_PRINT_EQ.
	(selftest::test_writing_integer_numbers): Likewise.
	(selftest::test_writing_strings): Likewise.
	(selftest::test_writing_literals): Likewise.
	(selftest::test_formatting): New.
	(selftest::json_cc_tests): Call it.
	* json.h (value::print): Add "formatted" param.
	(value::dump): Likewise.
	(object::print): Likewise.
	(array::print): Likewise.
	(float_number::print): Likewise.
	(integer_number::print): Likewise.
	(string::print): Likewise.
	(literal::print): Likewise.
	* optinfo-emit-json.cc (optrecord_json_writer::write): Pass
	"false" for new formatting option when printing json.
	(selftest::test_building_json_from_dump_calls): Likewise.
	* opts.cc (common_handle_option): Use
	opts->x_flag_diagnostics_json_formatting.

gcc/testsuite/ChangeLog:
	* c-c++-common/diagnostic-format-json-1.c: Update expected JSON
	output to reflect whitespace.
	* c-c++-common/diagnostic-format-json-2.c: Likewise.
	* c-c++-common/diagnostic-format-json-3.c: Likewise.
	* c-c++-common/diagnostic-format-json-4.c: Likewise.
	* c-c++-common/diagnostic-format-json-5.c: Likewise.
	* c-c++-common/diagnostic-format-json-stderr-1.c: Likewise.
	* g++.dg/pr90462.C: Add -fno-diagnostics-json-formatting.
	* gcc.dg/analyzer/malloc-sarif-1.c: Likewise.
	* gcc.dg/plugin/diagnostic-test-paths-3.c: Update expected JSON
	output to reflect whitespace.
	* gfortran.dg/diagnostic-format-json-1.F90: Likewise.
	* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
	* gfortran.dg/diagnostic-format-json-3.F90: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-12-06 12:35:44 -05:00
David Malcolm
8fc4e6c397 diagnostics: use const and references for diagnostic_info
No functional change intended.

gcc/c-family/ChangeLog:
	* c-opts.cc (c_diagnostic_finalizer): Make "diagnostic" param
	const.

gcc/cp/ChangeLog:
	* cp-tree.h (cxx_print_error_function): Make diagnostic_info param
	const.
	* error.cc (cxx_print_error_function): Likewise.
	(cp_diagnostic_starter): Likewise.
	(cp_print_error_function): Likewise.

gcc/ChangeLog:
	* diagnostic-format-json.cc (on_begin_diagnostic): Convert param
	to const reference.
	(on_end_diagnostic): Likewise.
	(json_output_format::on_end_diagnostic): Likewise.
	* diagnostic-format-sarif.cc
	(sarif_invocation::add_notification_for_ice): Likewise.
	(sarif_result::on_nested_diagnostic): Likewise.
	(sarif_ice_notification::sarif_ice_notification): Likewise.
	(sarif_builder::end_diagnostic): Likewise.
	(sarif_builder::make_result_object): Likewise.
	(make_reporting_descriptor_object_for_warning): Likewise.
	(sarif_builder::make_locations_arr): Likewise.
	(sarif_output_format::on_begin_diagnostic): Likewise.
	(sarif_output_format::on_end_diagnostic): Likewise.
	* diagnostic.cc (default_diagnostic_starter): Make diagnostic_info
	param const.
	(default_diagnostic_finalizer): Likewise.
	(diagnostic_context::report_diagnostic): Pass diagnostic by
	reference to on_{begin,end}_diagnostic.
	(diagnostic_text_output_format::on_begin_diagnostic): Convert
	param to const reference.
	(diagnostic_text_output_format::on_end_diagnostic): Likewise.
	* diagnostic.h (diagnostic_starter_fn): Make diagnostic_info param
	const.
	(diagnostic_finalizer_fn): Likewise.
	(diagnostic_output_format::on_begin_diagnostic): Convert param to
	const reference.
	(diagnostic_output_format::on_end_diagnostic): Likewise.
	(diagnostic_text_output_format::on_begin_diagnostic): Likewise.
	(diagnostic_text_output_format::on_end_diagnostic): Likewise.
	(default_diagnostic_starter): Make diagnostic_info param const.
	(default_diagnostic_finalizer): Likewise.
	* langhooks-def.h (lhd_print_error_function): Make diagnostic_info
	param const.
	* langhooks.cc (lhd_print_error_function): Likewise.
	* langhooks.h (lang_hooks::print_error_function): Likewise.
	* tree-diagnostic.cc (diagnostic_report_current_function):
	Likewise.
	(default_tree_diagnostic_starter): Likewise.
	(virt_loc_aware_diagnostic_finalizer): Likewise.
	* tree-diagnostic.h (diagnostic_report_current_function):
	Likewise.
	(virt_loc_aware_diagnostic_finalizer): Likewise.

gcc/fortran/ChangeLog:
	* error.cc (gfc_diagnostic_starter): Make diagnostic_info param
	const.
	(gfc_diagnostic_finalizer): Likewise.

gcc/jit/ChangeLog:
	* dummy-frontend.cc (jit_begin_diagnostic): Make diagnostic_info
	param const.
	(jit_end_diagnostic): Likewise.  Pass to add_diagnostic by
	reference.
	* jit-playback.cc (jit::playback::context::add_diagnostic):
	Convert diagnostic_info to const reference.
	* jit-playback.h (jit::playback::context::add_diagnostic):
	Likewise.

gcc/testsuite/ChangeLog:
	* g++.dg/plugin/show_template_tree_color_plugin.c
	(noop_starter_fn): Make diagnostic_info param const.
	* gcc.dg/plugin/diagnostic_group_plugin.c
	(test_diagnostic_starter): Likewise.
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(custom_diagnostic_finalizer): Likewise.
	* gcc.dg/plugin/location_overflow_plugin.c
	(verify_unpacked_ranges): Likewise.
	(verify_no_columns): Likewise.

libcc1/ChangeLog:
	* context.cc (plugin_print_error_function): Make diagnostic_info
	param const.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-12-06 12:35:08 -05:00
Andrew Stubbs
e7d6c277fa amdgcn, libgomp: low-latency allocator
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).

Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible.  This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories).  A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
	* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
	addressing.
	(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
	(TEAM_ARENA_FREE): Likewise.
	(TEAM_ARENA_END): Likewise.
	(GCN_LOWLAT_HEAP): New.
	* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
	(__gcn_lowlat_init): New prototype.
	(gomp_gcn_enter_kernel): Initialize the low-latency heap.
	* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
	(TEAM_ARENA_FREE): Likewise.
	(TEAM_ARENA_END): Likewise.
	* plugin/plugin-gcn.c (lowlat_size): New variable.
	(print_kernel_dispatch): Label the group_segment_size purpose.
	(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
	(create_kernel_dispatch): Pass low-latency heap allocation to kernel.
	(run_kernel): Use shadow; don't assume values.
	* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
	* config/gcn/allocator.c: New file.
	* libgomp.texi: Document low-latency implementation details.
2023-12-06 16:48:57 +00:00
Andrew Stubbs
e9a19ead49 openmp, nvptx: low-lat memory access traits
The NVPTX low-latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".  This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).
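
The rule can be pictured as a small validation hook consulted when an
allocator is created.  In the hypothetical sketch below, the enums and
`lowlat_memspace_validate' are stand-ins invented for illustration
(libgomp's real hook is the nvptx_memspace_validate named in the
ChangeLog, whose signature is not reproduced here).  Because OpenMP's
default value for the access trait is "all", an allocator built on the
low-latency space with default traits is rejected, while one whose
access is restricted to the contention group is accepted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for OpenMP memory-space and access-trait values.  */
enum memspace { MS_DEFAULT, MS_LOW_LAT };
enum access_trait { ACCESS_ALL, ACCESS_CGROUP, ACCESS_PTEAM, ACCESS_THREAD };

/* Low-latency memory is private to the allocating team, so an allocator
   promising access from "all" threads cannot be backed by it.  */
static bool lowlat_memspace_validate (enum memspace ms, enum access_trait access)
{
  return !(ms == MS_LOW_LAT && access == ACCESS_ALL);
}
```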

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_VALIDATE): New macro.
	(omp_init_allocator): Use MEMSPACE_VALIDATE.
	(omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
	(MEMSPACE_VALIDATE): New macro.
	(OMP_LOW_LAT_MEM_ALLOC_INVALID): New define.
	* libgomp.texi: Document low-latency implementation details.
	* testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
	* testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
	* testsuite/libgomp.c/omp_alloc-traits.c: New test.
2023-12-06 16:48:57 +00:00
Andrew Stubbs
30486fab71 libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm;
the allocator is thread-safe, and the size of the low-latency heap can be
configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.
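
The commit only characterizes the algorithm as "basic but fast" and
thread-safe.  As an illustration of that kind of design, and not a copy
of libgomp's basic-allocator.c, the hypothetical sketch below manages a
fixed static pool (standing in for the per-team ".shared" heap) with an
address-ordered first-fit free list, splits chunks on allocation,
coalesces neighbours on free, and serializes both operations with a C11
atomic_flag spinlock.  The pool size, 16-byte alignment, header layout
and the function names are all assumptions; realloc is omitted.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical pool standing in for the per-team low-latency heap.  */
#define POOL_SIZE 4096
#define ALIGN 16                 /* blocks and headers are 16-byte aligned */
static alignas (ALIGN) unsigned char pool[POOL_SIZE];

/* A free chunk starts with this header; an allocated chunk keeps only
   its size in the ALIGN bytes in front of the user pointer.  */
struct chunk { size_t size; struct chunk *next; };

static struct chunk *free_list;  /* sorted by address */
static int initialized;
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void acquire (void)
{
  while (atomic_flag_test_and_set_explicit (&lock, memory_order_acquire))
    ;  /* spin */
}

static void release (void)
{
  atomic_flag_clear_explicit (&lock, memory_order_release);
}

void *lowlat_alloc (size_t size)
{
  if (size > POOL_SIZE)
    return NULL;
  size_t need = ((size + ALIGN - 1) & ~(size_t) (ALIGN - 1)) + ALIGN;
  void *result = NULL;
  acquire ();
  if (!initialized)
    {
      free_list = (struct chunk *) pool;
      free_list->size = POOL_SIZE;
      free_list->next = NULL;
      initialized = 1;
    }
  /* First fit: take the first chunk that is large enough.  */
  for (struct chunk **pp = &free_list; *pp; pp = &(*pp)->next)
    {
      struct chunk *c = *pp;
      if (c->size < need)
        continue;
      if (c->size - need >= sizeof (struct chunk) + ALIGN)
        {
          /* Split; the tail stays on the free list.  */
          struct chunk *rest = (struct chunk *) ((unsigned char *) c + need);
          rest->size = c->size - need;
          rest->next = c->next;
          *pp = rest;
          c->size = need;
        }
      else
        *pp = c->next;           /* hand out the whole chunk */
      result = (unsigned char *) c + ALIGN;
      break;
    }
  release ();
  return result;
}

void lowlat_free (void *p)
{
  if (p == NULL)
    return;
  struct chunk *c = (struct chunk *) ((unsigned char *) p - ALIGN);
  acquire ();
  struct chunk *prev = NULL, *cur = free_list;
  while (cur != NULL && cur < c)
    {
      prev = cur;
      cur = cur->next;
    }
  c->next = cur;
  if (prev != NULL)
    prev->next = c;
  else
    free_list = c;
  /* Coalesce with the following chunk, then with the preceding one.  */
  if (cur != NULL && (unsigned char *) c + c->size == (unsigned char *) cur)
    {
      c->size += cur->size;
      c->next = cur->next;
    }
  if (prev != NULL && (unsigned char *) prev + prev->size == (unsigned char *) c)
    {
      prev->size += c->size;
      prev->next = c->next;
    }
  release ();
}
```

A first-fit list keeps per-call work small and the code compact, which
suits a small fixed-size heap like the sketch's 4 KiB pool.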

The use of the PTX dynamic_smem_size feature means that the low-latency allocator
will not work with the PTX 3.1 multilib.

For now, the omp_low_lat_mem_alloc allocator also works, but that will change
when I implement the access traits.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): New macro.
	(MEMSPACE_CALLOC): New macro.
	(MEMSPACE_REALLOC): New macro.
	(MEMSPACE_FREE): New macro.
	(predefined_alloc_mapping): New array.  Add _Static_assert to match.
	(ARRAY_SIZE): New macro.
	(omp_aligned_alloc): Use MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.  Simplify existing
	fall-backs.
	(omp_free): Use MEMSPACE_FREE.
	(omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for
	predefined allocators.  Simplify existing fall-backs.
	(omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE.
	Implement fall-backs for predefined allocators.  Simplify existing
	fall-backs.
	* config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable.
	(__nvptx_lowlat_init): New prototype.
	(gomp_nvptx_main): Call __nvptx_lowlat_init.
	* libgomp.texi: Update memory space table.
	* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
	(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
	(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
	* basic-allocator.c: New file.
	* config/nvptx/allocator.c: New file.
	* testsuite/libgomp.c/omp_alloc-1.c: New test.
	* testsuite/libgomp.c/omp_alloc-2.c: New test.
	* testsuite/libgomp.c/omp_alloc-3.c: New test.
	* testsuite/libgomp.c/omp_alloc-4.c: New test.
	* testsuite/libgomp.c/omp_alloc-5.c: New test.
	* testsuite/libgomp.c/omp_alloc-6.c: New test.

Co-authored-by: Kwok Cheung Yeung  <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-12-06 16:48:57 +00:00
John David Anglin
458e7c9379 Fix c-c++-common/fhardened-[12].c test fails on hppa
The -fstack-protector and -fstack-protector-strong options are
not supported on hppa since the stack grows up.

2023-12-06  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

	* c-c++-common/fhardened-1.c: Ignore __SSP_STRONG__ define
	if __hppa__ is defined.
	* c-c++-common/fhardened-2.c: Ignore __SSP__ define
	if __hppa__ is defined.
2023-12-06 15:38:50 +00:00
Juzhe-Zhong
c9d5b46a25 RISC-V: Fix VSETVL PASS bug
As PR112855 mentioned, the VSETVL PASS inserts vsetvli in an unexpected location.

This is due to 2 reasons:
1. Incorrect computation of the transparent LCM data.  We need to check the defs and uses of the VL operand.
2. Incorrect fusion on an unrelated edge, i.e. an edge that never reaches the vsetvl expression.

	PR target/112855

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc
	(pre_vsetvl::compute_lcm_local_properties): Fix transparent LCM data.
	(pre_vsetvl::earliest_fuse_vsetvl_info): Disable earliest fusion for unrelated edge.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr112855.c: New test.
2023-12-06 22:35:18 +08:00
Jason Merrill
c1e54c82a9 c++: partial ordering of object parameter [PR53499]
Looks like we implemented option 1 (skip the object parameter) for CWG532
before the issue was resolved, and never updated to the final resolution of
option 2 (model it as a reference).  More recently CWG2445 extended this
handling to static member functions; I think that's wrong, and have
opened CWG2834 to address that and how explicit object member functions
interact with it.

The FIXME comments are to guide how the explicit object member function
support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P.

The library testsuite changes are to make partial ordering work again
between the generic operator- in the testcase and
_Pointer_adapter::operator-.

	DR 532
	PR c++/53499

gcc/cp/ChangeLog:

	* pt.cc (more_specialized_fn): Fix object parameter handling.

gcc/testsuite/ChangeLog:

	* g++.dg/template/partial-order4.C: New test.
	* g++.dg/template/spec26.C: Adjust for CWG532.

libstdc++-v3/ChangeLog:

	* testsuite/23_containers/vector/ext_pointer/types/1.cc
	* testsuite/23_containers/vector/ext_pointer/types/2.cc
	(N::operator-): Make less specialized.
2023-12-06 09:02:01 -05:00
Marek Polacek
e0eca4a55b build: unbreak bootstrap on uclinux targets [PR112762]
Currently, cross-compiling with --target=c6x-uclinux (and several other targets)
fails due to:

../../src/gcc/config/linux.h:221:45: error: 'linux_fortify_source_default_level' was not declared in this scope
 #define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level

In the PR Andrew mentions that another fix would be in config.gcc,
but really, here I meant to use the target hook for glibc only, not
uclibc.  This trivial patch fixes the build problem.  It means that
-fhardened with uclibc will use -D_FORTIFY_SOURCE=2 and not =3.

	PR target/112762

gcc/ChangeLog:

	* config/linux.h: Redefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL for
	glibc only.
2023-12-06 08:34:26 -05:00
Thomas Schwinge
fbacdeff97 Modula-2: Support '-isysroot [...]'
In GCC cross configurations (tested '--target=amdgcn-amdhsa' and
'--target=nvptx-none') with a sysroot configured, the 'gm2' driver invocations
are passed '--sysroot=[...]', which is translated into '-isysroot [...]' for
the 'cc1gm2' compiler invocation.  The latter, however, gets complained about:

    cc1gm2: warning: command-line option ‘-isysroot [...]’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2

..., and therefore a ton of FAILs.

Reproducer (also for non-cross, native configurations):

    $ build-gcc/gcc/gm2 -Bbuild-gcc/gcc -v --sysroot=/tmp -x modula-2 /dev/null
    [...]
     build-gcc/gcc/cc1gm2 [...] -isysroot [...]/tmp [...]
    cc1gm2: warning: command-line option ‘-isysroot /tmp’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
    [...]

	gcc/m2/
	* lang.opt (-isysroot): New.
2023-12-06 12:36:15 +01:00
Jakub Jelinek
d7ceffab96 libgcc: Avoid -Wbuiltin-declaration-mismatch warnings in emutls.c
When libgcc is being built in --disable-tls configuration or on
a target without native TLS support, one gets annoying warnings:
../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
   61 | void *__emutls_get_address (struct __emutls_object *);
      |       ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int,  unsigned int,  void *)’ [-Wbuiltin-declaration-mismatch]
   63 | void __emutls_register_common (struct __emutls_object *, word, word, void *);
      |      ^~~~~~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
  140 | __emutls_get_address (struct __emutls_object *obj)
      | ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int,  unsigned int,  void *)’ [-Wbuiltin-declaration-mismatch]
  204 | __emutls_register_common (struct __emutls_object *obj,
      | ^~~~~~~~~~~~~~~~~~~~~~~~

The thing is that in that case __emutls_get_address and
__emutls_register_common are builtins, and are declared with void *
arguments rather than struct __emutls_object *.
Now, struct __emutls_object is a type private to libgcc/emutls.c, and
when calling the builtins the middle-end creates a similar structure on
demand (with small differences, like not having the union in there).

We have a precedent for this, e.g. the fprintf or strftime builtins,
which are created with the magic fileptr_type_node or
const_tm_ptr_type_node types and then matched against user definitions
that use pointers to some structure.  But in this case users should
never define or call these functions themselves, and having special
types for them in the compiler would mean extra compile time spent
during compiler initialization and more GC data, so I think it is better
to keep the compiler as is.

On the library side, there is an option to just follow what the
compiler is doing and do
 EMUTLS_ATTR void
-__emutls_register_common (struct __emutls_object *obj,
+__emutls_register_common (void *xobj,
                           word size, word align, void *templ)
 {
+  struct __emutls_object *obj = (struct __emutls_object *) xobj;
but that will make e.g. libabigail complain about an ABI change in libgcc.

So, the patch just turns the warning off.

2023-12-06  Thomas Schwinge  <thomas@codesourcery.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR libgcc/109289
	* emutls.c: Add GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
	pragma.
2023-12-06 12:27:12 +01:00
Victor Do Nascimento
9a8fdade94 aarch64: Add system register duplication check selftest
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.

Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by their `CPENC'
field) and where the relationship between the two does not fit into
one of the following categories:

  * Simple aliasing: In some cases, it is observed that one
  register name serves as an alias to another.  One example of
  this is where TRCEXTINSELR aliases TRCEXTINSELR0.
  * Expressing intent: It is possible that when a given register
  serves two distinct functions depending on how it is used, it
  is given two distinct names whose use should match the context
  under which it is being used.  Example:  Debug Data Transfer
  Register. When used to receive data, it should be accessed as
  DBGDTRRX_EL0 while when transmitting data it should be
  accessed via DBGDTRTX_EL0.
  * Register deprecation: Some register names have been
  deprecated and should no longer be used, but backwards-
  compatibility requires that such names continue to be
  recognized, as is the case for the SPSR_EL1 register, whose
  access via the SPSR_SVC name is now deprecated.
  * Same encoding different target: Some encodings are given
  different meaning depending on the target architecture and, as
  such, are given different names in each of these contexts.
  We see an example of this for CPENC(3,4,2,0,0), which
  corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
  in Armv8-R targets.

A consequence of these observations is that `CPENC' duplication is
acceptable iff at least one of the `properties' or `arch_reqs' fields
of the `sysreg_t' structs associated with the two registers in
question differ and it's this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.
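
The acceptance rule reduces to a pairwise scan of the table.  In this
hypothetical C sketch, the `struct sysreg' layout and field types are
simplified assumptions rather than GCC's `sysreg_t': a pair counts as a
clash only when the encodings match and both `properties' and
`arch_reqs' are identical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a system-register table entry.  */
struct sysreg {
  const char *name;
  uint32_t encoding;    /* packed op0/op1/CRn/CRm/op2 */
  uint32_t properties;  /* F_* flags */
  uint64_t arch_reqs;   /* required architecture feature bits */
};

/* Count entry pairs that share an encoding while agreeing on both
   properties and arch_reqs, i.e. the unacceptable duplicates.  */
static size_t count_sysreg_clashes (const struct sysreg *regs, size_t n)
{
  size_t clashes = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (regs[i].encoding == regs[j].encoding
          && regs[i].properties == regs[j].properties
          && regs[i].arch_reqs == regs[j].arch_reqs)
        clashes++;
  return clashes;
}
```

The quadratic scan keeps the sketch short; sorting a copy of the table
by encoding first would bring it down to O(n log n).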

gcc/ChangeLog:

	* config/aarch64/aarch64.cc
	(aarch64_test_sysreg_encoding_clashes): New.
	(aarch64_run_selftests): Add call to
	aarch64_test_sysreg_encoding_clashes selftest.
2023-12-06 10:39:55 +00:00
Victor Do Nascimento
5af697d72d aarch64: Add front-end argument type checking for target builtins
In implementing the ACLE read/write system register builtins, it was
observed that leaving argument type checking until expand time meant
that poorly-formed function calls were being "fixed" by certain
optimization passes, so bad code was not being properly picked up
in checking.

Example:

  const char *regname = "amcgcr_el0";
  long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

  long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The new `aarch64_general_check_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc (aarch64_general_check_builtin_call):
	New.
	* config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
	Add `aarch64_general_check_builtin_call' call.
	* config/aarch64/aarch64-protos.h (aarch64_general_check_builtin_call):
	New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/rwsr-3.c: New.
2023-12-06 10:39:14 +00:00
Victor Do Nascimento
fc42900d21 aarch64: Implement system register r/w arm ACLE intrinsic functions
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

	uint32_t __arm_rsr(const char *special_register);
	uint64_t __arm_rsr64(const char *special_register);
	void* __arm_rsrp(const char *special_register);
	float __arm_rsrf(const char *special_register);
	double __arm_rsrf64(const char *special_register);
	void __arm_wsr(const char *special_register, uint32_t value);
	void __arm_wsr64(const char *special_register, uint64_t value);
	void __arm_wsrp(const char *special_register, const void *value);
	void __arm_wsrf(const char *special_register, float value);
	void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
	Add enums for new builtins.
	(aarch64_init_rwsr_builtins): New.
	(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
	(aarch64_expand_rwsr_builtin):  New.
	(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
	* config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
	(write_sysregdi): Likewise.
	* config/aarch64/arm_acle.h (__arm_rsr): New.
	(__arm_rsrp): Likewise.
	(__arm_rsr64): Likewise.
	(__arm_rsrf): Likewise.
	(__arm_rsrf64): Likewise.
	(__arm_wsr): Likewise.
	(__arm_wsrp): Likewise.
	(__arm_wsr64): Likewise.
	(__arm_wsrf): Likewise.
	(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/rwsr.c: New.
	* gcc.target/aarch64/acle/rwsr-1.c: Likewise.
	* gcc.target/aarch64/acle/rwsr-2.c: Likewise.
	* gcc.dg/pch/rwsr-pch.c: Likewise.
	* gcc.dg/pch/rwsr-pch.hs: Likewise.
2023-12-06 10:39:14 +00:00
Victor Do Nascimento
7d36ea7057 aarch64: Implement system register validation tools
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler.  In particular, this involves:

  1. Ensuring a supplied string corresponds to a known system
     register name.  System registers can be accessed either via their
     name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
     Register names are validated using a hash map, mapping known
     system register names to their corresponding `sysreg_t' structs,
     which are populated from the `aarch64-sys-regs.def' file.
     Register name validation is done via `lookup_sysreg_map', while
     the encoding naming convention is validated via a parser
     implemented in this patch, `is_implem_def_reg'.
  2. Once a given register name is deemed to be valid, it is checked
     against 2 further criteria:
       a. Is the referenced register implemented in the target
          architecture?  This is achieved by comparing the ARCH field
          in the relevant SYSREG entry from `aarch64-sys-regs.def'
          against the `aarch64_feature_flags' flags set at compile-time.
       b. Is the register being used correctly?  Check the requested
          operation against the FLAGS specified in SYSREG.
          This prevents operations like writing to a read-only system
	  register.
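
The generic encoding spelling mentioned in point 1 can be checked
without any table lookup.  The hypothetical parser below is not GCC's
`is_implem_def_reg'; it accepts S<op0>_<op1>_C<n>_C<m>_<op2> in either
case and, as an assumption, bounds each field only by its width in the
instruction encoding (op0: 2 bits, op1 and op2: 3 bits, CRn and CRm:
4 bits).

```c
#include <ctype.h>
#include <stdbool.h>

/* Parse an unsigned decimal field no larger than MAX, advancing *P.  */
static bool parse_field (const char **p, unsigned max, unsigned *out)
{
  unsigned v = 0;
  if (!isdigit ((unsigned char) **p))
    return false;
  while (isdigit ((unsigned char) **p))
    {
      v = v * 10 + (unsigned) (**p - '0');
      if (v > max)
        return false;
      (*p)++;
    }
  *out = v;
  return true;
}

/* Hypothetical check for names like "S3_0_C4_C0_0"; on success ENC
   receives {op0, op1, CRn, CRm, op2}.  */
static bool parse_sysreg_encoding (const char *s, unsigned enc[5])
{
  static const char prefix[5] = { 's', 0, 'c', 'c', 0 };  /* 0 = no letter */
  static const unsigned max[5] = { 3, 7, 15, 15, 7 };
  const char *p = s;
  for (int i = 0; i < 5; i++)
    {
      if (i > 0 && *p++ != '_')
        return false;
      if (prefix[i] && tolower ((unsigned char) *p++) != prefix[i])
        return false;
      if (!parse_field (&p, max[i], &enc[i]))
        return false;
    }
  return *p == '\0';  /* reject trailing characters */
}
```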

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): New.
	(aarch64_retrieve_sysreg): Likewise.
	* config/aarch64/aarch64.cc (is_implem_def_reg): Likewise.
	(aarch64_valid_sysreg_name_p): Likewise.
	(aarch64_retrieve_sysreg): Likewise.
	(aarch64_register_sysreg): Likewise.
	(aarch64_init_sysregs): Likewise.
	(aarch64_lookup_sysreg_map): Likewise.
	* config/aarch64/predicates.md (aarch64_sysreg_string): New.
2023-12-06 10:39:14 +00:00
Victor Do Nascimento
76d3114b04 aarch64: Add support for aarch64-sys-regs.def
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-sys-regs.def file should be as follows:

  SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
  - NAME:  The system register name, as used in the assembly language.
  - CPENC: The system register encoding, mapping to:

               s<sn>_<op1>_c<cn>_c<cm>_<op2>

  - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
          encode extra information required to ensure proper use of
          the system register.  For example, a read-only system
          register will have the flag F_REG_READ, while write-only
          registers will be labeled F_REG_WRITE.  Such flags are
          tested against at compile-time.
  - ARCH: The architectural features the system register is associated
          with.  This is encoded via one of three possible macros:
          1. When a system register is universally implemented, we say
             it has no feature requirements, so we tag it with the
             AARCH64_NO_FEATURES macro.
          2. When a register is only implemented for a single
             architectural extension EXT, the AARCH64_FEATURE (EXT)
             macro is used.
          3. When a given system register is made available by any of N
             possible architectural extensions, the AARCH64_FEATURES(N, ...)
             macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.
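
A .def file of this shape is normally consumed with the X-macro
pattern: each consumer defines SYSREG to pick out the fields it needs,
includes the file, then undefines the macro.  The sketch below inlines a
four-entry list instead of including the real file; the flag values, the
16-bit CPENC packing and the helper names are assumptions for
illustration.  It shows the read/write check the FLAG field enables
(under the convention described above, F_REG_READ marks a read-only
register and F_REG_WRITE a write-only one) and spells an encoding in the
generic s<op0>_<op1>_c<n>_c<m>_<op2> form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values; GCC defines its own in aarch64.h.  */
#define F_REG_READ  (1u << 0)  /* register is read-only */
#define F_REG_WRITE (1u << 1)  /* register is write-only */

/* Assumed packing: op0[15:14] op1[13:11] CRn[10:7] CRm[6:3] op2[2:0].  */
#define CPENC(op0, op1, crn, crm, op2) \
  (((op0) << 14) | ((op1) << 11) | ((crn) << 7) | ((crm) << 3) | (op2))

/* Inline stand-in for the .def file; ARCH is a plain mask, unchecked here.  */
#define SYSREG_LIST                                              \
  SYSREG ("spsr_el1",     CPENC (3, 0, 4, 0, 0), 0,           0) \
  SYSREG ("ttbr0_el2",    CPENC (3, 4, 2, 0, 0), 0,           0) \
  SYSREG ("dbgdtrrx_el0", CPENC (2, 3, 0, 5, 0), F_REG_READ,  0) \
  SYSREG ("dbgdtrtx_el0", CPENC (2, 3, 0, 5, 0), F_REG_WRITE, 0)

struct sysreg { const char *name; unsigned enc; unsigned flags; unsigned long arch; };

/* This consumer keeps every field.  */
static const struct sysreg sysregs[] = {
#define SYSREG(NAME, ENC, FLAGS, ARCH) { NAME, ENC, FLAGS, ARCH },
  SYSREG_LIST
#undef SYSREG
};

/* Find NAME; reject writes to read-only and reads of write-only regs.  */
static const struct sysreg *lookup_sysreg (const char *name, bool is_write)
{
  unsigned forbidden = is_write ? F_REG_READ : F_REG_WRITE;
  for (size_t i = 0; i < sizeof sysregs / sizeof sysregs[0]; i++)
    if (strcmp (name, sysregs[i].name) == 0)
      return (sysregs[i].flags & forbidden) ? NULL : &sysregs[i];
  return NULL;
}

/* Spell ENC in the generic "s<op0>_<op1>_c<n>_c<m>_<op2>" form.  */
static void generic_name (unsigned enc, char *buf, size_t len)
{
  snprintf (buf, len, "s%u_%u_c%u_c%u_%u", (enc >> 14) & 3u, (enc >> 11) & 7u,
            (enc >> 7) & 15u, (enc >> 3) & 15u, enc & 7u);
}
```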

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (sysreg_t): New.
	(aarch64_sysregs): Likewise.
	(AARCH64_FEATURE): Likewise.
	(AARCH64_FEATURES): Likewise.
	(AARCH64_NO_FEATURES): Likewise.
	* config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
	ISA flag.
	(AARCH64_ISA_V8_1A): Likewise.
	(AARCH64_ISA_V8_7A): Likewise.
	(AARCH64_ISA_V8_8A): Likewise.
	(AARCH64_NO_FEATURES): Likewise.
	(AARCH64_FL_RAS): New ISA flag alias.
	(AARCH64_FL_LOR): Likewise.
	(AARCH64_FL_PAN): Likewise.
	(AARCH64_FL_AMU): Likewise.
	(AARCH64_FL_SCXTNUM): Likewise.
	(AARCH64_FL_ID_PFR2): Likewise.
	(F_DEPRECATED): New.
	(F_REG_READ): Likewise.
	(F_REG_WRITE): Likewise.
	(F_ARCHEXT): Likewise.
	(F_REG_ALIAS): Likewise.
2023-12-06 10:39:13 +00:00
Victor Do Nascimento
b3df847026 aarch64: Sync system register information with Binutils
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain.  By keeping both copies of the file in sync,
any `SYSREG (...)' that is added in one project is automatically added
to its counterpart.  This being the case, no change should be made in
the GCC copy of the file.  Any modifications should first be made in
Binutils and the resulting file copied over to GCC.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-sys-regs.def file.  This is done
in the next patch in the series.

Because of this aliasing, the emitted assembly files may lack the
`.arch' directives that the assembler would otherwise need.  The
compiler accounts for this by using the generic
S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when issuing
mrs/msr instructions.  This design choice ensures the assembler will
accept anything that was deemed acceptable by the compiler.
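
The following small C helper makes the generic spelling concrete. It is hypothetical and is not GCC's implementation. It prints a register's fields in the s<op0>_<op1>_c<n>_c<m>_<op2> form. The range check mirrors the widths of the MRS/MSR operand fields: 2, 3, 4, 4 and 3 bits.

```c
#include <assert.h>
#include <stdio.h>

/* Write the generic s<op0>_<op1>_c<n>_c<m>_<op2> spelling of a system
   register into BUF (lowercase in this sketch).  Returns the value of
   snprintf, i.e. the length of the full name.  */
static int
format_generic_sysreg (char *buf, size_t len, unsigned op0, unsigned op1,
                       unsigned crn, unsigned crm, unsigned op2)
{
  /* op0 is 2 bits, op1 and op2 are 3 bits, CRn and CRm are 4 bits.  */
  assert (op0 <= 3 && op1 <= 7 && crn <= 15 && crm <= 15 && op2 <= 7);
  return snprintf (buf, len, "s%u_%u_c%u_c%u_%u", op0, op1, crn, crm, op2);
}
```

For example, NZCV has op0 = 3, op1 = 3, CRn = 4, CRm = 2 and op2 = 0, so the helper yields s3_3_c4_c2_0. That name can be written as `mrs x0, s3_3_c4_c2_0`.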

gcc/ChangeLog:

	* config/aarch64/aarch64-sys-regs.def: New.
2023-12-06 10:39:02 +00:00
Robin Dapp
056cce4128 RISC-V: Add vec_init expander for masks [PR112854].
PR112854 shows a problem on rv32 with zvl1024b.  During the course of
expand_constructor we try to overlay/subreg a 64-element mask with a
scalar (Pmode) register.  This works for zvl512b and its maximum of
32 elements but fails for rv32 with 64 elements, because a 32-bit
Pmode register cannot hold 64 mask bits.

To circumvent this, this patch adds a vec_init expander for vector
masks that initializes a QImode vector and compares it against 0.
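
The sketch below is a scalar C model of the failure and of the workaround. The real expander emits RVV instructions; this portable code only mirrors the logic, and the function names are hypothetical. Overlaying a mask on one scalar register needs one bit per element. Building a byte vector and comparing it with 0 needs no such register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On rv32, Pmode (the pointer-sized integer mode) is 32 bits wide.  */
#define RV32_PMODE_BITS 32

/* Overlaying an NELTS-element mask on a single scalar register only
   works if every mask bit fits in that register.  */
static bool
mask_fits_in_scalar_reg (size_t nelts, size_t reg_bits)
{
  return nelts <= reg_bits;
}

/* Model of the new strategy: the elements are first built as a byte
   (QImode) vector, and each mask bit is the result of comparing the
   corresponding byte against 0.  No scalar register has to hold the
   whole mask, so the element count is not limited by Pmode.  */
static void
init_mask_via_qimode (const uint8_t *qi_elts, bool *mask, size_t nelts)
{
  for (size_t i = 0; i < nelts; i++)
    mask[i] = qi_elts[i] != 0;
}
```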

gcc/ChangeLog:

	PR target/112854
	PR target/112872

	* config/riscv/autovec.md (vec_init<mode>qi): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr112854.c: New test.
	* gcc.target/riscv/rvv/autovec/pr112872.c: New test.
2023-12-06 10:27:48 +01:00
Jakub Jelinek
e44ed92dbb i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]
Regardless of the outcome of the REG_UNUSED discussions, I think
it is a good idea to move the vzeroupper pass one pass later.
As can be seen in the multiple PRs and as postreload.cc documents,
reload/LRA is known to create dead statements quite often, which
is the reason why we have the postreload_cse pass at all.
Running the vzeroupper pass before that cleanup means the pass,
including its df_analyze, has to process more instructions than
needed and, because mode switching adds the note problem, there is
also a higher chance of stale REG_UNUSED notes.
And I really don't see why vzeroupper can't wait until those cleanups
are done.
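
The change is purely one of ordering within the RTL pipeline. Here is a toy C model of it, which is not GCC's pass manager API; the pass names and helpers are hypothetical. It shows vzeroupper being placed directly after postreload_cse, so that it sees the instruction stream only after the dead statements left by reload/LRA have been cleaned up. In GCC itself, the position is declared in i386-passes.def with INSERT_PASS_AFTER.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PASSES 16

/* Toy model of an ordered pass pipeline (names only).  */
struct pass_list
{
  const char *names[MAX_PASSES];
  size_t n;
};

/* Insert NEW_PASS directly after the first pass named AFTER.
   Returns 0 on success, -1 if AFTER is missing or the list is full.  */
static int
insert_pass_after (struct pass_list *pl, const char *after,
                   const char *new_pass)
{
  if (pl->n >= MAX_PASSES)
    return -1;
  for (size_t i = 0; i < pl->n; i++)
    if (strcmp (pl->names[i], after) == 0)
      {
        /* Shift the passes that follow AFTER one slot to the right.  */
        memmove (&pl->names[i + 2], &pl->names[i + 1],
                 (pl->n - i - 1) * sizeof pl->names[0]);
        pl->names[i + 1] = new_pass;
        pl->n++;
        return 0;
      }
  return -1;
}
```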

2023-12-06  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/112760
	* config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
	after pass_postreload_cse rather than pass_reload.
	* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
	Adjust comment for it.

	* gcc.dg/pr112760.c: New test.
2023-12-06 09:59:12 +01:00
Jakub Jelinek
0ca64f846e lower-bitint: Fix arithmetics followed by extension by many bits [PR112809]
A zero or sign extension of the result of some upwards_2limb operation
is implemented in lower_mergeable_stmt as an extra loop which fills in
the extra bits with 0s or 1s.
If the delta between the extended and unextended bit counts is small,
the code doesn't use a loop and emits up to a couple of stores to
constant indexes, but if the delta is large, it splits the work into
cnt parts, where
          cnt = (bo_bit != 0) + 1 + (rem != 0);
bo_bit is non-zero for bit-field loads, and in that case its part is
done as straight-line code; the unconditional 1 is for a loop which
handles most of the limbs in the delta; and finally (rem != 0) is for
the case when the extended precision is not a multiple of limb_prec,
which is again done in straight-line code (after the loop).
The testcase ICEs because the choice of which idx to use was incorrect
for kind == bitint_prec_huge (i.e. when the precision delta is very
large) and rem == 0 (i.e. the extended precision is a multiple of
limb_prec).  In that case cnt is either 1 (if bo_bit == 0) or 2, and
idx should be either size_int (start) followed by the result of
create_loop (for bo_bit != 0), or just the result of create_loop.  By
mistake, however, the last part used size_int (end).  When the
precision is a multiple of limb_prec, this means storing above the
precision (which ICEs) and also not emitting the loop which is needed.
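
The following is a simplified C model of how each of the cnt parts picks its idx. It is not the code in gimple-lower-bitint.cc; the enum and function names are hypothetical, and the buggy variant is reconstructed only from the description above. When rem == 0, the fixed rule makes the last part the loop. The buggy rule instead always treats the last part as the size_int (end) tail.

```c
/* Which kind of code handles a given part of the extension.  */
enum ext_part { PART_START, PART_LOOP, PART_END };

/* Number of parts used for a large extension delta.  */
static unsigned
ext_part_count (unsigned bo_bit, unsigned rem)
{
  return (bo_bit != 0) + 1 + (rem != 0);
}

/* Fixed selection: the last part uses size_int (end) only when there
   is a partial limb (rem != 0); otherwise the last part is the loop.  */
static enum ext_part
ext_part_kind (unsigned i, unsigned cnt, unsigned bo_bit, unsigned rem)
{
  if (i == 0 && bo_bit != 0)
    return PART_START;  /* straight-line code, idx = size_int (start) */
  if (i == cnt - 1 && rem != 0)
    return PART_END;    /* straight-line code, idx = size_int (end) */
  return PART_LOOP;     /* idx is the result of create_loop */
}

/* Model of the bug: the last part always took size_int (end), even
   when rem == 0, so no loop was ever created in that case.  */
static enum ext_part
ext_part_kind_buggy (unsigned i, unsigned cnt, unsigned bo_bit,
                     unsigned rem)
{
  (void) rem;
  if (i == 0 && bo_bit != 0)
    return PART_START;
  if (i == cnt - 1)
    return PART_END;
  return PART_LOOP;
}
```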

2023-12-06  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/112809
	* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For
	separate_ext in kind == bitint_prec_huge mode if rem == 0, create for
	i == cnt - 1 the loop rather than using size_int (end).

	* gcc.dg/bitint-48.c: New test.
2023-12-06 09:55:30 +01:00