A friend declaration can only have constraints if it is defined. If
multiple instantiations of a class template define the same friend function
signature, it's an error, but that shouldn't happen if the friend is
constrained to be declared in only one instantiation.
Currently we don't mangle requirements, so the foos all mangle the same,
and actually instantiating #1 will break; but for now we can test that
they're considered distinct.
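As a rough illustration (not the actual new test, which may differ), the
kind of code this enables looks like:

template <int N>
struct A
{
  // A constrained friend must be a definition.  Without the constraint,
  // A<1> and A<2> would both define ::foo() with the same signature,
  // which is an error; with it, only A<1> is expected to declare foo.
  friend void foo () requires (N == 1) { }   // #1
};

A<1> a1;   // satisfies the constraint; declares and defines foo()
A<2> a2;   // constraint unsatisfied; no conflicting definition of foo()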
gcc/cp/ChangeLog:
* pt.cc (tsubst_friend_function): Check satisfaction.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-friend11.C: New test.
The parameter of `create_insn_allocnos' for any parent expression of `x'
has always been called `outer' rather than `parent', ever since it was
added by commit d1bb282efb ("Fix for "FAIL: tmpdir-gcc.dg-struct-layout-1/t028
c_compat_x_tst.o compile, (internal compiler error)""),
<https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02611.html>. Correct
inline documentation accordingly.
gcc/
* ira-build.cc (create_insn_allocnos): Fix documentation.
My recent patch had return statements in the match.pd expressions, which
were recently outlawed.  Unfortunately I didn't rebase this patch before
committing, so this broke the build.
I've just reflowed the conditions to avoid the returns.
gcc/ChangeLog:
* match.pd: Remove returns.
As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available. This adds orc.b as an
unspec, so we can expand to it.
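For reference, a scalar model of orc.b's semantics (illustration only: each
result byte is 0xff if the corresponding source byte is non-zero, else 0x00):

#include <cstdint>

// Scalar model of RISC-V Zbb orc.b ("OR-combine, byte granule").
uint64_t orc_b_model (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i += 8)
    if ((x >> i) & 0xff)
      r |= (uint64_t) 0xff << i;
  return r;
}

// A register-sized chunk contains a NUL byte iff orc.b of it is not
// all-ones, which is what by-pieces string routines can exploit.
bool has_zero_byte (uint64_t chunk)
{
  return orc_b_model (chunk) != ~(uint64_t) 0;
}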
gcc/ChangeLog:
* config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an
unspec.
* config/riscv/riscv.md: Add UNSPEC_ORC_B.
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
The Ventana VT1 core supports quad-issue and instruction fusion.
This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
together and adds idiom matching for the supported fusion cases.
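As a purely illustrative model (not GCC internals) of what the idiom matcher
behind TARGET_SCHED_MACRO_FUSION_PAIR_P decides, assuming lui+addi is one of
the supported pairs:

#include <string>

// Hypothetical, simplified instruction descriptor; the real hook inspects
// RTL insns instead.
struct insn
{
  std::string opcode;
  int rd;    // destination register
  int rs1;   // first source register
};

// Return true if PREV followed by CURR should be kept adjacent so the
// hardware can fuse them, e.g. lui rd, hi20 ; addi rd, rd, lo12.
bool fusible_pair_p (const insn &prev, const insn &curr)
{
  return prev.opcode == "lui"
         && curr.opcode == "addi"
         && curr.rd == prev.rd
         && curr.rs1 == prev.rd;
}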
gcc/ChangeLog:
* config/riscv/riscv.cc (enum riscv_fusion_pairs): Add symbolic
constants to identify supported fusion patterns.
(struct riscv_tune_param): Add fusible_op field.
(riscv_macro_fusion_p): Implement.
(riscv_fusion_enabled_p): Implement.
(riscv_macro_fusion_pair_p): Implement and recognize fusible
idioms for Ventana VT1.
(TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to
riscv_macro_fusion_pair_p.
The Ventana-VT1 core is compatible with rv64gc, Zb[abcs], Zifencei and
XVentanaCondOps.
This introduces a placeholder -mcpu=ventana-vt1, so tooling and
scripts don't need to change once full support (pipeline, tuning,
etc.) becomes public later.
gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Add ventana-vt1.
(RISCV_CORE): Ditto.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Ditto.
* config/riscv/riscv.cc: Add tune_info for ventana-vt1.
* doc/invoke.texi: Document -mcpu= and -mtune with ventana-vt1.
This patch adds codegen for FEAT_CSSC from the 2022 Architecture extensions.
It fits various existing optabs in GCC quite well.
There are instructions for scalar signed/unsigned min/max, abs, ctz, popcount.
We have expanders for these already, so they are wired up to emit single-insn
patterns for the new TARGET_CSSC.
These instructions are enabled by the +cssc command-line extension.
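A sketch of the kind of scalar code expected to map to single CSSC
instructions under +cssc (function names are just for illustration; the
instruction names in the comments are the expected mappings, not verified
output):

#include <stdint.h>
#include <stdlib.h>

int popcnt32 (uint32_t x) { return __builtin_popcount (x); }        // cnt
int ctz32 (uint32_t x) { return __builtin_ctz (x); }                // ctz
int abs32 (int32_t x) { return abs (x); }                           // abs
int32_t smax32 (int32_t a, int32_t b) { return a > b ? a : b; }     // smax
uint32_t umin32 (uint32_t a, uint32_t b) { return a < b ? a : b; }  // umin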
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def (cssc): Define.
* config/aarch64/aarch64.h (AARCH64_ISA_CSSC): Define.
(TARGET_CSSC): Likewise.
* config/aarch64/aarch64.md (*aarch64_abs<mode>2_cssc_ins): New define_insn.
(abs<mode>2): Adjust for the above.
(aarch64_umax<mode>3_insn): New define_insn.
(umax<mode>3): Adjust for the above.
(*aarch64_popcount<mode>2_cssc_insn): New define_insn.
(popcount<mode>2): Adjust for the above.
(<optab><mode>3): New define_insn.
* config/aarch64/constraints.md (Usm): Define.
(Uum): Likewise.
* doc/invoke.texi (AArch64 options): Document +cssc.
* config/aarch64/iterators.md (MAXMIN_NOUMAX): New code iterator.
* config/aarch64/predicates.md (aarch64_sminmax_immediate): Define.
(aarch64_sminmax_operand): Likewise.
(aarch64_uminmax_immediate): Likewise.
(aarch64_uminmax_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cssc_1.c: New test.
* gcc.target/aarch64/cssc_2.c: New test.
* gcc.target/aarch64/cssc_3.c: New test.
* gcc.target/aarch64/cssc_4.c: New test.
* gcc.target/aarch64/cssc_5.c: New test.
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.
This patch adds a named function to allow us to emit an optimized sequence
when doing an unsigned division that is equivalent to:
x = y / (2 ^ (bitsize (y)/2)-1)
For SVE2 this means we generate for:
void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}
the following:
mov z3.b, #1
.L3:
ld1b z0.h, p0/z, [x0, x3]
mul z0.h, p1/m, z0.h, z2.h
addhnb z1.b, z0.h, z3.h
addhnb z0.b, z0.h, z1.h
st1b z0.h, p0, [x0, x3]
inch x3
whilelo p0.h, w3, w2
b.any .L3
instead of:
.L3:
ld1b z0.h, p1/z, [x0, x3]
mul z0.h, p0/m, z0.h, z1.h
umulh z0.h, p0/m, z0.h, z2.h
lsr z0.h, z0.h, #7
st1b z0.h, p1, [x0, x3]
inch x3
whilelo p1.h, w3, w2
b.any .L3
Which results in significantly faster code.
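Read scalar-wise, and assuming addhnb behaves here as "add, then keep the
high byte of each 16-bit sum", the two addhnb steps compute
(x + ((x + 257) >> 8)) >> 8, i.e. the usual divide-by-255 rewrite; the 257
comes from viewing the byte-splatted 1s in z3 as 16-bit lanes (0x0101).
A minimal scalar rendering:

#include <cstdint>

// Hedged scalar model of addhnb: add, then keep the high byte of the sum.
static uint8_t addhnb_model (uint16_t a, uint16_t b)
{
  return (uint16_t) (a + b) >> 8;
}

// x is the widened (16-bit) product of a pixel and the level.
uint8_t div255 (uint16_t x)
{
  uint8_t t = addhnb_model (x, 257);   // addhnb z1.b, z0.h, z3.h
  return addhnb_model (x, t);          // addhnb z0.b, z0.h, z1.h
}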
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
This adds an AArch64 implementation of the new optab for unsigned division
by a pow2 bitmask (2^n - 1).
The implementation rewrites:
x = y / (2 ^ (bitsize (y)/2) - 1)
into e.g. (for bytes)
(x + ((x + 257) >> 8)) >> 8
where the additions must be done in double the precision of x so that we
don't lose any bits to overflow.
Essentially the sequence decomposes the division into two smaller divisions,
one for the top part of the number and one for the bottom part, and adds
the results back together.
To account for the fact that a shift by 8 would be a division by 256, we add
1 to both parts of x so that when x is 255 we still get 1 as the answer.
Because the amount we shift by is half the width of the original data type,
we can use the halving instructions the ISA provides to do the operation
instead of using actual shifts.
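A quick scalar check (illustration only) that the rewritten form matches
division by 255 for every product of two unsigned bytes, as long as the
arithmetic is done in wider precision:

#include <cassert>
#include <cstdint>

int main ()
{
  for (uint32_t a = 0; a <= 255; ++a)
    for (uint32_t b = 0; b <= 255; ++b)
      {
        uint32_t x = a * b;                        // widened product
        uint32_t q = (x + ((x + 257) >> 8)) >> 8;  // rewritten form
        assert (q == x / 255);
      }
  return 0;
}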
For AArch64 this means we generate for:
void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}
the following:
movi v3.16b, 0x1
umull2 v1.8h, v0.16b, v2.16b
umull v0.8h, v0.8b, v2.8b
addhn v5.8b, v1.8h, v3.8h
addhn v4.8b, v0.8h, v3.8h
uaddw v1.8h, v1.8h, v5.8b
uaddw v0.8h, v0.8h, v4.8b
uzp2 v0.16b, v0.16b, v1.16b
instead of:
umull v2.8h, v1.8b, v5.8b
umull2 v1.8h, v1.16b, v5.16b
umull v0.4s, v2.4h, v3.4h
umull2 v2.4s, v2.8h, v3.8h
umull v4.4s, v1.4h, v3.4h
umull2 v1.4s, v1.8h, v3.8h
uzp2 v0.8h, v0.8h, v2.8h
uzp2 v1.8h, v4.8h, v1.8h
shrn v0.8b, v0.8h, 7
shrn2 v0.16b, v1.8h, 7
Which results in significantly faster code.
Thanks to Wilco for the concept.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv<mode>3): New.
* config/aarch64/aarch64.cc (aarch64_vectorize_can_special_div_by_constant): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/div-by-bitmask.c: New test.
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.
e.g.:
x = y / (2 ^ (bitsize (y)/2) - 1)
This patch adds a new target hook, can_special_div_by_const, similar to
can_vec_perm, which can be called to check whether a target will handle a
particular division in a special way in the back-end.
The vectorizer will then vectorize the division using the standard tree code
and at expansion time the hook is called again to generate the code for the
division.
A lot of the changes in the patch are to pass down the tree operands in all paths
that can lead to the divmod expansion so that the target hook always has the
type of the expression you're expanding since the types can change the
expansion.
gcc/ChangeLog:
* expmed.h (expand_divmod): Pass tree operands down in addition to RTX.
* expmed.cc (expand_divmod): Likewise.
* explow.cc (round_push, align_dynamic_address): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
Likewise.
* target.h: Include tree-core.
* target.def (can_special_div_by_const): New.
* targhooks.cc (default_can_special_div_by_const): New.
* targhooks.h (default_can_special_div_by_const): New.
* tree-vect-generic.cc (expand_vector_operation): Use it.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support.
* tree-vect-stmts.cc (vectorizable_operation): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-div-bitmask-1.c: New test.
* gcc.dg/vect/vect-div-bitmask-2.c: New test.
* gcc.dg/vect/vect-div-bitmask-3.c: New test.
* gcc.dg/vect/vect-div-bitmask.h: New file.
For IEEE 754 floating point formats we can replace a sequence of alternating
+/- with an fneg of a wider type followed by an fadd.  This eliminates the
need for using a permutation.  This patch adds a match.pd rule to recognize
and do this rewriting.
For
void f (float *restrict a, float *restrict b, float *res, int n)
{
  for (int i = 0; i < (n & -4); i+=2)
    {
      res[i+0] = a[i+0] + b[i+0];
      res[i+1] = a[i+1] - b[i+1];
    }
}
we generate:
.L3:
ldr q1, [x1, x3]
ldr q0, [x0, x3]
fneg v1.2d, v1.2d
fadd v0.4s, v0.4s, v1.4s
str q0, [x2, x3]
add x3, x3, 16
cmp x3, x4
bne .L3
now instead of:
.L3:
ldr q1, [x0, x3]
ldr q2, [x1, x3]
fadd v0.4s, v1.4s, v2.4s
fsub v1.4s, v1.4s, v2.4s
tbl v0.16b, {v0.16b - v1.16b}, v3.16b
str q0, [x2, x3]
add x3, x3, 16
cmp x3, x4
bne .L3
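The rewrite is valid because, for the little-endian layout targeted here,
negating each 64-bit lane flips only bit 63, which is the sign bit of the
odd-indexed float in that lane, so the subsequent fadd subtracts in the odd
lanes and adds in the even ones.  A scalar illustration:

#include <cassert>
#include <cstdint>
#include <cstring>

int main ()
{
  // Two little-endian floats viewed as one 64-bit lane.
  float pair[2] = { 1.5f, 2.5f };
  uint64_t bits;
  std::memcpy (&bits, pair, sizeof bits);
  bits ^= (uint64_t) 1 << 63;          // what fneg does per 64-bit lane
  std::memcpy (pair, &bits, sizeof bits);
  assert (pair[0] == 1.5f && pair[1] == -2.5f);
  return 0;
}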
Thanks to George Steed for the idea.
gcc/ChangeLog:
* generic-match-head.cc: Include langhooks.h.
* gimple-match-head.cc: Likewise.
* match.pd: Add fneg/fadd rule.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/addsub_1.c: New test.
* gcc.target/aarch64/sve/addsub_1.c: New test.
I noticed that the opindex for the -m80387
option was wrong.  It was just 80387, which
was not consistent with the rest of the options.
This fixes that and uses "@opindex m80387".
Committed as obvious after "make html" and checking
the option index page.
gcc/ChangeLog:
* doc/invoke.texi: Fix @opindex
for m80387 option.
During the conversion of the docs to Sphinx I noticed
that some options in the option index had a leading -
in front of them in the texinfo docs.  When the Sphinx
conversion was reverted, I thought I would fix the texinfo
documentation for these options.
Committed as obvious after doing "make html" to check
the resulting option index page.
gcc/ChangeLog:
* doc/invoke.texi: Remove the leading - from
some @opindex entries.
This patch adds support for the Ampere-1A CPU:
- recognizes the name of the core and provides detection for -mcpu=native,
- updates extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).
Ampere-1A and Ampere-1 differ in timing by more than the extra costs
indicate, but these differences don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.
Cleanup only; no change in behavior.
This patch removes and rewrites some comments regarding initialization.
These initializations are needed, so there's no need to apologize for
initializing these variables.
Note that -gnatVa is not relevant; reads of uninitialized variables
are wrong, whether or not we get caught.
gcc/ada/
* atree.ads: Remove some comments.
* err_vars.ads: Likewise.
* scans.ads: Likewise.
* sinput.ads: Likewise.
* checks.ads: Likewise. Also add a "???" comment indicating an
obsolete comment that is too difficult to correct at this time.
* sem_attr.adb: Minor comment rewrite.
Previously, control flow redundancy only checked the visited bitmap
against the control flow graph at return points and before mandatory
tail calls, missing various other possibilities of exiting a
subprogram, such as by raising or propagating exceptions, and calling
noreturn functions. The checks inserted before returns also prevented
potential tail-call optimizations.
This incremental change introduces options to control checking at each
of these previously-missed checkpoints. Unless disabled, a cleanup is
introduced to check when an exception escapes a subprogram.  To avoid
disrupting sibcall optimizations, when they are enabled, checks are
introduced before calls whose results are immediately returned,
whether or not they are ultimately optimized. If enabled, checks are
introduced before noreturn calls and exception raises, or only before
nothrow noreturn calls.
Add examples of code transformations to the GNAT RM.
gcc/ada/
* doc/gnat_rm/security_hardening_features.rst: Document optional
hardcfr checkpoints.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
The compiler crashes when trying to do a static check for a range violation
in a type conversion of a Pos attribute applied to a prefix of a type derived
from a generic formal discrete type. This optimization was suppressed in the
case of formal types, because the upper bound may not be known, but it also
needs to be suppressed for types derived from formal types.
gcc/ada/
* checks.adb
(Apply_Type_Conversion_Checks): Apply Root_Type to the type of the
prefix of a Pos attribute when checking whether the type is a
formal discrete type.
Before this patch, expressions in non-capturing parentheses with more
than one branch were processed incorrectly when they were part of a
branch followed by another branch.  This patch fixes this by aligning
the handling of non-capturing parentheses with the handling of regular
parentheses.
gcc/ada/
* libgnat/s-regpat.adb
(Parse): Fix handling of non-capturing parentheses.
When explicitly applying SPARK_Mode to a separate library-level spec
and body for which a contract needs to be checked, compilation with
-gnata was failing on a spurious error related to SPARK_Mode
placement. Now fixed.
gcc/ada/
* sem_prag.adb (Analyze_Pragma): Add special case for the special
local subprogram created for contracts.
When instantiating a generic that has a formal subprogram parameter with
contracts, e.g.:
generic
   with procedure P with Pre => ..., Post => ...;
...
we create a wrapper that executes Pre/Post contracts before/after
calling the actual subprogram. Errors emitted for these contracts
will now have locations of the instance and not just of the generic.
gcc/ada/
* sem_ch12.adb (Build_Subprogram_Wrappers): Adjust slocs of the
copied aspects, just like we do in Build_Class_Wide_Expression for
inherited class-wide contracts.
Code cleanup related to expansion of generic formal subprograms with
contracts for GNATprove.
gcc/ada/
* inline.adb (Replace_Formal): Tune whitespace.
* sem_ch12.adb (Check_Overloaded_Formal_Subprogram): Refine type
of a formal parameter and local variable; this routine operates on
nodes and not entities.
* sem_ch12.ads: Tune whitespace.
In GNATprove mode generic formal subprograms with Pre/Post contracts are
now expanded into wrappers, just like in ordinary compilation.
gcc/ada/
* sem_ch12.adb (Analyze_Associations): Expand wrappers for
GNATprove.
QNX and RTEMS support 64-bit atomic primitives.
gcc/ada/
* libgnat/system-qnx-arm.ads: Set Support_Atomic_Primitives to
True.
* libgnat/system-rtems.ads: Add Support_Atomic_Primitives.
When flag -gnatdF is used, source code lines are displayed to point at
the location of errors.  For errors inside generic instances, the code
of the instantiation was displayed, which was not precise.  Now the code
inside the generic is displayed.
gcc/ada/
* errout.adb (Error_Msg_Internal): Store span for Optr field, and
adapt to new type of Optr.
(Finalize, Output_JSON_Message, Remove_Warning_Messages): Adapt to
new type of Optr.
(Output_Messages): Use Optr instead of Sptr to display code
snippet closer to error.
* erroutc.adb (dmsg): Adapt to new type of Optr.
* erroutc.ads (Error_Msg_Object): Make Optr a span like Sptr.
* errutil.adb (Error_Msg): Likewise.
The following merges match.pd patterns that cause genmatch complaints
about duplicates when in-order isn't enforced (you have to edit
genmatch.cc to do a full duplicate check).
gcc/ChangeLog:
* match.pd: Remove duplicates.
gcc/fortran/ChangeLog:
PR fortran/107444
* trans-openmp.cc (gfc_omp_check_optional_argument): Adjust to change
of prefix of internal symbol for presence status to '.'.
This target should have been changed by r13-3918-gba7551485bc576 and now
fails.
libstdc++-v3/ChangeLog:
* src/Makefile.am (install-debug): Remove use of $(debugdir).
* src/Makefile.in: Regenerate.