Later patches allow using SVE modes in ldp/stp with -msve-vector-bits=128,
so we need to make sure that we don't use SVE addressing modes when
printing the address for the ldp/stp.
This patch does that.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_print_address_internal): Handle SVE
modes when printing ldp/stp addresses.
This adjusts aarch64_print_operand to recognize zero rtxes in modes other than
VOIDmode. This allows us to use xzr/wzr for zero vectors, for example.
We extract the test into a helper function, aarch64_const_zero_rtx_p, since this
predicate is needed by later patches.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_const_zero_rtx_p): New.
* config/aarch64/aarch64.cc (aarch64_const_zero_rtx_p): New.
Use it ...
(aarch64_print_operand): ... here. Recognize CONST0_RTXes in
modes other than VOIDmode.
This disables scheduling in the pr103147-10 tests. The tests use
check-function-bodies, and upcoming changes lead to a different
schedule.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/pr103147-10.C: Add -fno-schedule-insns{,2}
to dg-options.
* gcc.target/aarch64/pr103147-10.c: Likewise.
Later patches in the series allow ldp and stp to use SVE modes if
-msve-vector-bits=128 is provided. This patch therefore adjusts tests
that pass -msve-vector-bits=128 to allow ldp/stp to save/restore SVE
registers.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pcs/stack_clash_1_128.c: Allow ldp/stp saves
of SVE registers.
* gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
The tests currently depend on memcpy lowering forming stps at -O0, but
we no longer want to form stps during memcpy lowering; instead they will
be formed by the upcoming load/store pair fusion pass.
This patch therefore tweaks affected tests to enable optimizations
(-O1), and adjusts the tests to avoid parts of the structures being
optimized away where necessary.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/auto-init-padding-1.c: Add -O to options,
adjust test to work with optimizations enabled.
* gcc.target/aarch64/auto-init-padding-2.c: Add -O to options.
* gcc.target/aarch64/auto-init-padding-3.c: Add -O to options,
adjust test to work with optimizations enabled.
* gcc.target/aarch64/auto-init-padding-4.c: Likewise.
* gcc.target/aarch64/auto-init-padding-9.c: Likewise.
This patch adds a new sub-extension, Zvfbfmin, to the -march= option.
It introduces a new data type, BF16.
Depending on the usage scenario, the Zvfbfmin extension may depend on
'V' or 'Zve32f'. This patch only implements the dependency for the
Embedded Processor scenario; for the Application Processor scenario,
the dependent 'V' extension must be specified explicitly.
More information about Zvfbfmin can be found in the spec below:
https://github.com/riscv/riscv-bfloat16/releases/download/20231027/riscv-bfloat16.pdf
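As a rough sketch of what the new predef tests check (the macro name and
arch string below are assumptions following the usual __riscv_<ext>
predefine convention, not copied from the tests):

  /* Built with e.g. -march=rv64gcv_zvfbfmin, the extension's
     feature-test macro should be predefined.  */
  #ifndef __riscv_zvfbfmin
  #error "Zvfbfmin not enabled"
  #endif

  int main () { return 0; }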
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfbfmin item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt:
(MASK_ZVFBFMIN): New macro.
(MASK_VECTOR_ELEN_BF_16): Ditto.
(TARGET_ZVFBFMIN): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-31.c: New test.
* gcc.target/riscv/arch-32.c: New test.
* gcc.target/riscv/predef-32.c: New test.
* gcc.target/riscv/predef-33.c: New test.
This patch introduces type checking during FoldBecomes and also
adds set/string/enum checking to the type checker. FoldBecomes
has been re-written, tidied up and re-factored.
gcc/m2/ChangeLog:
PR modula2/112946
* gm2-compiler/M2Check.mod (checkConstMeta): New procedure
function.
(checkConstEquivalence): New procedure function.
(doCheckPair): Add call to checkConstEquivalence.
* gm2-compiler/M2GenGCC.mod (ResolveConstantExpressions): Call
FoldBecomes with reduced parameters.
(FoldBecomes): Re-write.
(TryDeclareConst): New procedure.
(RemoveQuads): New procedure.
(DeclaredOperandsBecomes): New procedure function.
(TypeCheckBecomes): New procedure function.
(PerformFoldBecomes): New procedure.
* gm2-compiler/M2Range.mod (FoldAssignment): Call
AssignmentTypeCompatible to check des expr compatibility.
* gm2-compiler/M2SymInit.mod (CheckReadBeforeInitQuad): Remove
parameter lst.
(FilterCheckReadBeforeInitQuad): Remove parameter lst.
(CheckReadBeforeInitFirstBasicBlock): Remove parameter lst.
Call FilterCheckReadBeforeInitQuad without lst.
gcc/testsuite/ChangeLog:
PR modula2/112946
* gm2/iso/fail/badassignment.mod: New test.
* gm2/iso/fail/badexpression.mod: New test.
* gm2/iso/fail/badexpression2.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
The section attribute currently has no effect on templates because the
call to set_decl_section_name only happens at parse time (on the
dependent decl) and not also at instantiation time. This patch fixes
this by propagating the section name from the template to the
instantiation.
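An illustrative example of the fixed behaviour (a sketch, not one of the
new attr-section* tests):

  // With the fix, the instantiation f<int> is placed in .text.hot as
  // requested on the template; previously the attribute was lost because
  // set_decl_section_name was only called on the dependent decl.
  template<typename T>
  __attribute__((section (".text.hot")))
  void f (T) { }

  template void f<int> (int);   // explicit instantiation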
PR c++/70435
PR c++/88061
gcc/cp/ChangeLog:
* pt.cc (tsubst_function_decl): Propagate DECL_SECTION_NAME
via set_decl_section_name.
(tsubst_decl) <case VAR_DECL>: Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/ext/attr-section1.C: New test.
* g++.dg/ext/attr-section1a.C: New test.
* g++.dg/ext/attr-section2.C: New test.
* g++.dg/ext/attr-section2a.C: New test.
* g++.dg/ext/attr-section2b.C: New test.
We need to look through TEMPLATE_DECL when looking up the abi_tag
attribute (as with other function/variable declaration attributes).
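For example (an illustrative sketch, not the actual abi-tag25.C test):

  // The tag must end up in the mangled name of the instantiation;
  // stripping TEMPLATE_DECL lets get_abi_tags see the attribute on the
  // template.
  template<typename T>
  [[gnu::abi_tag ("mytag")]]
  void f (T) { }

  template void f<int> (int);   // mangled name should carry B5mytag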
PR c++/109715
gcc/cp/ChangeLog:
* mangle.cc (get_abi_tags): Strip TEMPLATE_DECL before looking
up the abi_tag attribute.
gcc/testsuite/ChangeLog:
* g++.dg/abi/abi-tag25.C: New test.
* g++.dg/abi/abi-tag25a.C: New test.
libstdc++-v3/ChangeLog:
* src/c++23/print.cc (__write_to_terminal) [_WIN32]: If handle
does not refer to the console then just write to it using normal
file I/O.
* testsuite/27_io/print/2.cc (as_printed_to_terminal): Print
error message on failure.
(test_utf16_transcoding): Adjust for as_printed_to_terminal
modifying its argument.
Since we don't need to do anything special to print Unicode on
non-Windows targets, we might as well just use std::vprint_nonunicode to
implement std::vprint_unicode. Removing the duplicated code should
reduce code size in cases where those calls aren't inlined.
Also use an RAII type for the unused case where a non-Windows target
calls __open_terminal(streambuf*) and needs to fclose the result. This
makes the code future-proof in case we ever start using the
__write_to_terminal function for non-Windows targets.
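The guard is just the usual close-on-destruction idiom; a generic,
self-contained sketch (the names here are illustrative, not the actual
class in <ostream>):

  #include <cstdio>

  struct scoped_file            // illustrative stand-in for the RAII guard
  {
    std::FILE* f;
    ~scoped_file () { if (f) std::fclose (f); }
  };

  // Usage shape: wrap the FILE* returned by __open_terminal so that every
  // early return still closes it.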
libstdc++-v3/ChangeLog:
* include/std/ostream (vprint_unicode) [_WIN32]: Use RAII guard.
(vprint_unicode) [!_WIN32]: Just call vprint_nonunicode.
* include/std/print (vprint_unicode) [!_WIN32]: Likewise.
Tim Song pointed out that although std::print behaves as a formatted
output function, it does not "determine padding" using the stream's
flags.
libstdc++-v3/ChangeLog:
* include/std/ostream (vprint_nonunicode, vprint_unicode): Do
not insert padding.
* testsuite/27_io/basic_ostream/print/1.cc: Adjust expected
behaviour.
Enable lock-free 128-bit atomics on AArch64. This is backwards compatible with
existing binaries (as for these GCC always calls into libatomic, so all 128-bit
atomic uses in a process are switched), gives better performance than locking
atomics and is what most users expect.
128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the
given address is writeable (which will be true when using atomics in real code).
This doesn't yet change __atomic_is_lock_free even though all atomics
are finally lock-free on AArch64.
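For reference, a user-level sketch of the affected operations (GCC on
AArch64 always routes 16-byte atomics through libatomic, which is what
makes the switch safe for existing binaries):

  // 16-byte atomic load; without LSE2, libatomic now implements this with
  // an LDXP/STXP loop (the "implicit store" mentioned above), and with
  // LSE2 it can use an ordinary LDP.
  unsigned __int128
  load16 (unsigned __int128 *p)
  {
    unsigned __int128 v;
    __atomic_load (p, &v, __ATOMIC_ACQUIRE);
    return v;
  }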
libatomic:
* config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
(libat_exchange_16): Merge RELEASE and ACQ_REL/SEQ_CST cases.
* config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
Add support for inline memmove expansions. The generated code is
identical to that for memcpy, except that all loads are emitted before
stores rather than being interleaved. The maximum size is 256 bytes,
which requires at most 16 registers.
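For example (illustrative only), a small fixed-size, possibly overlapping
copy that can now be open-coded:

  #include <cstring>

  void
  shift_down (char *buf)
  {
    // 96 bytes <= 256, so this is expanded inline: all loads are emitted
    // first, then all stores, so the overlap cannot clobber bytes that
    // still need to be read.
    std::memmove (buf, buf + 1, 96);
  }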
gcc/ChangeLog:
* config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
Change default.
* config/aarch64/aarch64.md (cpymemdi): Add a parameter.
(movmemdi): Call aarch64_expand_cpymem.
* config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
simplify, support storing generated loads/stores.
(aarch64_expand_cpymem): Support expansion of memmove.
* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/memmove.c: Add new test.
* gcc.target/aarch64/memmove2.c: Likewise.
..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.
Minor fix-up for commit f5745dc142
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".
libgomp/
* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
'dg-output's to 'target offload_device_nonshared_as'.
Given what I saw in the aarch64/arm psABIs for BITINT_TYPE, as I said
earlier I'm afraid we need to differentiate between the limb mode/precision
specified in the psABIs (what is used to decide how it is actually passed,
aligned or what size it has) vs. what limb mode/precision should be used
during bitint lowering and in the libgcc bitint APIs.
While in the x86_64 psABI a limb is 64-bit, which is perfect for both
purposes (it is a word size we can perform operations in natively),
aarch64 e.g. wants 128-bit limbs for alignment/sizing purposes, but on
the bitint lowering side I believe those would result in terribly bad
code and on the libgcc side they wouldn't work at all (because libgcc
relies on longlong.h support there).
So, the following patch makes it possible for aarch64 to use TImode
as abi_limb_mode for _BitInt(129) and larger, while using DImode as
limb_mode.
2023-12-15 Jakub Jelinek <jakub@redhat.com>
* target.h (struct bitint_info): Add abi_limb_mode member, adjust
comment.
* target.def (bitint_type_info): Mention abi_limb_mode instead of
limb_mode.
* varasm.cc (output_constant): Use abi_limb_mode rather than
limb_mode.
* stor-layout.cc (finish_bitfield_representative): Likewise. Assert
that if precision is smaller or equal to abi_limb_mode precision or
if info.big_endian is different from WORDS_BIG_ENDIAN, info.limb_mode
must be the same as info.abi_limb_mode.
(layout_type): Use abi_limb_mode rather than limb_mode.
* gimple-fold.cc (clear_padding_bitint_needs_padding_p): Likewise.
(clear_padding_type): Likewise.
* config/i386/i386.cc (ix86_bitint_type_info): Also set
info->abi_limb_mode.
* doc/tm.texi: Regenerated.
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which cannot be
sorted into order at compile time. Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.
This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time. Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).
For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c). That could perhaps be addressed with a follow-up
patch, if necessary.
This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).
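A C/C++ sketch of the diagnosed pattern (illustrative; the actual cases
live in the new map-arrayofstruct-* and map-subarray-5 tests):

  struct st { int *p; };

  void
  f (st *a, int i, int j)
  {
    // If i != j at run time, the new GOMP_MAP_STRUCT_UNORD handling
    // reports an error instead of silently mapping the wrong data; if the
    // indices happen to be equal, the mapping still works even though
    // equality could not be proven at compile time.
    #pragma omp target map(a[i].p, a[j].p[0:8])
    a[i].p[0] = 1;
  }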
2023-08-18 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.
libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
This patch uses the new force_reload_address routine added by the
previous patch to fix PR112906.
gcc/ChangeLog:
PR target/112906
* config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq<mode>_le):
Use force_reload_address to reload addresses that aren't suitable for
ld1rq in the pre-RA splitter.
gcc/testsuite/ChangeLog:
PR target/112906
* gcc.target/aarch64/sve/acle/general/pr112906.c: New test.
In PR112906 we ICE because we try to use force_reg to reload an
auto-increment address, but force_reg can't do this.
With the aim of fixing the PR by supporting reloading arbitrary
addresses in pre-RA splitters, this patch generalizes
lra-constraints.cc:emit_inc and makes it available to the rest of the
compiler by moving the generalized version to emit-rtl.cc.
We observe that the separate IN parameter to LRA's emit_inc is
redundant, since the function is static and is only (statically) called
once in lra-constraints.cc, with in == value. As such, we drop the IN
parameter and simplify the code accordingly.
We wrap the emit_inc code in a virtual class to allow LRA to override
how reload pseudos are created, thereby preserving the existing LRA
behaviour as much as possible.
We then add a second (higher-level) routine to emit-rtl.cc,
force_reload_address, which can reload arbitrary addresses. This uses
the generalized emit_inc code to handle the RTX_AUTOINC case. The
second patch in this series uses force_reload_address to fix PR112906.
Since we intend to call address_reload_context::emit_autoinc from within
splitters, and the code lifted from LRA calls recog, we have to avoid
clobbering recog_data. We do this by introducing a new RAII class for
saving/restoring recog_data on the stack.
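The saver follows the usual copy-on-construction / restore-on-destruction
pattern; a generic sketch of the idiom (the real class is
recog_data_saver in recog.h and operates on recog_data; the names below
are stand-ins):

  struct recog_state { int n_operands; /* ... */ };
  recog_state g_state;                 // stand-in for the global recog_data

  class state_saver                    // stand-in for recog_data_saver
  {
    recog_state m_saved;
  public:
    state_saver () : m_saved (g_state) {}
    ~state_saver () { g_state = m_saved; }
  };

  // In a splitter: open a scope, construct a state_saver, then freely call
  // code that re-runs recognition; the caller's state is restored on exit.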
gcc/ChangeLog:
PR target/112906
* emit-rtl.cc (address_reload_context::emit_autoinc): New.
(force_reload_address): New.
* emit-rtl.h (struct address_reload_context): Declare.
(force_reload_address): Declare.
* lra-constraints.cc (class lra_autoinc_reload_context): New.
(emit_inc): Drop IN parameter, invoke
code moved to emit-rtl.cc:address_reload_context::emit_autoinc.
(curr_insn_transform): Drop redundant IN parameter in call to
emit_inc.
* recog.h (class recog_data_saver): New.
For example, for GCN or nvptx target configurations, using newlib:
FAIL: gcc.dg/pr110279-2.c (test for excess errors)
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-not reassoc2 "was chosen for reassociation"
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-times optimized "\\.FMA " 3
[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:11:1: error: unknown type name '__attribute_noinline__'
[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:12:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'foo'
We cannot assume 'stdio.h' to define '__attribute_noinline__' -- but then, that
also isn't necessary for this test case (there is nothing to inline into).
gcc/testsuite/
* gcc.dg/pr110279-2.c: Don't '#include <stdio.h>'. Remove
'__attribute_noinline__'.
While looking at a bitint ICE, I've noticed we don't optimize the 2 casts
in the f1 and f5 functions below into just one at GIMPLE, even though we
do optimize it in convert_to_integer if it appears in the same stmt.
The large match.pd simplification of two conversions in a row has many
complex rules and, as the testcase shows, everything else in the
narrowest -> widest -> prec_in_between chain of integer conversions is
already handled, either because the inside_unsignedp == inter_unsignedp
rule kicks in, or the
&& ((inter_unsignedp && inter_prec > inside_prec)
    == (final_unsignedp && final_prec > inter_prec))
one does, but there is no reason why a sign extension from the narrowest
to the widest type followed by truncation to something in between can't
be done just as a sign extension from the narrowest to the final type.
After all, if the widest type is signed rather than unsigned, regardless
of the final type's signedness we already handle it that way.
And since PR93044 we also handle it if the final precision is not wider
than the inside precision.
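A sketch of the newly folded shape (the real cases are the f1/f5
functions in gcc.dg/tree-ssa/pr113024.c; this is just the
narrowest -> widest -> in-between pattern described above):

  int
  f (signed char c)
  {
    long long wide = c;        // sign extension to the widest type
    return (int) wide;         // truncation to a type in between
  }
  // ... which can now be simplified at GIMPLE into the single sign
  // extension "return (int) c;".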
2023-12-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113024
* match.pd (two conversions in a row): Simplify scalar integer
sign-extension followed by truncation.
* gcc.dg/tree-ssa/pr113024.c: New test.
As shown in the testcase, .{ADD,SUB,MUL}_OVERFLOW calls are another
exception to the middle/large/huge _BitInt discovery through SSA_NAMEs,
besides stores of INTEGER_CSTs to memory and their conversions to
floating point.
The calls can have normal COMPLEX_TYPE with INTEGER_TYPE elts return type
(or BITINT_TYPE with small precision) and one of the arguments can be
SSA_NAME with an INTEGER_TYPE or small BITINT_TYPE as well; still, when
there is an INTEGER_CST argument with large/huge BITINT_TYPE, we need to
lower it that way.
2023-12-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113003
* gimple-lower-bitint.cc (arith_overflow_arg_kind): New function.
(gimple_lower_bitint): Use it to catch .{ADD,SUB,MUL}_OVERFLOW
calls with large/huge INTEGER_CST arguments.
* gcc.dg/bitint-54.c: New test.
This test fails on darwin as it does not support _Decimal64,
so require dfp for it.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112943.c: Require dfp.
Currently move_max follows the tuning feature first, but ideally it
should sync with prefer-vector-width when that is explicitly set, so
that vector moves and operations use the same vector size.
gcc/ChangeLog:
PR target/112824
* config/i386/i386-options.cc (ix86_option_override_internal):
Sync ix86_move_max/ix86_store_max with prefer_vector_width when
it is explicitly set.
gcc/testsuite/ChangeLog:
PR target/112824
* gcc.target/i386/pieces-memset-45.c: Remove
-mprefer-vector-width=256.
* g++.target/i386/pr112824-1.C: New test.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu): Do not
set Grand Ridge depending on RAO-INT.
* config/i386/i386.h: Remove PTA_RAOINT from PTA_GRANDRIDGE.
* doc/invoke.texi: Adjust documentation.
The current generic vector cost model makes PR112387 fail to vectorize.
Adapt it to match the ARM SVE generic vector cost model, which fixes this.
Committed as an obvious fix.
PR target/112387
gcc/ChangeLog:
* config/riscv/riscv.cc: Adapt the generic cost model to match ARM SVE.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr112387-1.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr112387-2.c: New test.
Following Richard's suggestion, we should not model the address cost in
the loop vectorizer for select_vl or decrement IV, since the other
vectorization styles don't do that; this keeps the cost model comparison
apples to apples.
This patch sets the cost from 2 to 1, which turns out to give better
codegen for RVV in various cases.
OK for trunk?
PR target/111153
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Remove address cost for select_vl/decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr111153-1.c: New test.
This adds the C++23 std::print functions, which use std::format to write
to a FILE stream or std::ostream (defaulting to stdout).
The new extern symbols are in the libstdc++exp.a archive, so we aren't
committing to stable symbols in the DSO yet. There's a UTF-8 validating
and transcoding function added by this change. That can certainly be
optimized, but it's internal to libstdc++exp.a so can be tweaked later
at leisure.
Currently the external symbols work for all targets, but are only
actually used for Windows, where it's necessary to transcode to UTF-16
to write to the console. The standard seems to encourage us to also
diagnose invalid UTF-8 for non-Windows targets when writing to a
terminal (and only when writing to a terminal), but I'm reliably
informed that that wasn't the intent of the wording. Checking for
invalid UTF-8 sequences only needs to happen for Windows, which is good
as checking for a terminal requires a call to isatty, and on Linux that
uses an ioctl syscall, which would make std::print ten times slower!
Testing the std::print behaviour is difficult if it depends on whether
the output stream is connected to a Windows console or not, as we can't
(as far as I know) do that non-interactively in DejaGNU. One of the new
tests uses the internal __write_to_terminal function directly. That
allows us to verify its UTF-8 error handling on POSIX targets, even
though that's not actually used by std::print. For Windows, that
__write_to_terminal function transcodes to UTF-16 but then uses
WriteConsoleW which fails unless it really is writing to the console.
That means the 27_io/print/2.cc test FAILs on Windows. The UTF-16
transcoding has been manually tested using mingw-w64 and Wine, and
appears to work.
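Basic usage, for reference (since the symbols currently live in
libstdc++exp.a, linking with -lstdc++exp may be required):

  #include <print>

  int main ()
  {
    std::print ("{} + {} = {}\n", 1, 2, 1 + 2);
    std::println ("pi is roughly {:.5f}", 3.14159);  // println adds the '\n'
  }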
libstdc++-v3/ChangeLog:
PR libstdc++/107760
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (__cpp_lib_print): Define.
* include/bits/version.h: Regenerate.
* include/std/format (__literal_encoding_is_utf8): New function.
(_Seq_sink::view()): New member function.
* include/std/ostream (vprint_nonunicode, vprint_unicode)
(print, println): New functions.
* include/std/print: New file.
* src/c++23/Makefile.am: Add new source file.
* src/c++23/Makefile.in: Regenerate.
* src/c++23/print.cc: New file.
* testsuite/27_io/basic_ostream/print/1.cc: New test.
* testsuite/27_io/print/1.cc: New test.
* testsuite/27_io/print/2.cc: New test.
Fix an incorrect call to _Sink::_M_reserve() which should have passed
the __n parameter. This was not actually a problem, because it was in a
discarded statement: only the _Seq_sink<basic_string<C>> specialization
was used.
Also add some branch prediction hints, explanatory comments, and debug
mode assertions to _Seq_sink.
libstdc++-v3/ChangeLog:
* include/std/format (_Seq_sink): Fix missing argument in
discarded statement. Add comments, likely/unlikely attributes
and debug assertions as sanity checks.
These tests are expected to run interactively, with the output checked
by eye. Nobody ever does that, but we can at least use dg-output to
check that the output is as expected.
libstdc++-v3/ChangeLog:
* testsuite/27_io/objects/char/2.cc: Use dg-output.
* testsuite/27_io/objects/wchar_t/2.cc: Use dg-output.
I got the order of arguments to std::format_to wrong. It was in a
discarded statement, for a case which wasn't being tested.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_S): Fix order
of arguments to std::format_to.
* testsuite/20_util/duration/io.cc: Test subsecond duration with
floating-point rep.
This is sort of like r14-5514, but at block scope. Consider
struct A { A(int, int); };
void
g (int a)
{
  A bar(auto(a), 42); // not a fn decl
}
where we emit "error: 'auto' parameter not permitted in this context",
which is bogus -- bar doesn't declare a function, so the auto is OK,
but we don't know that until we've seen the second argument. The error
comes from grokdeclarator invoked just after we've parsed the auto(a).
A possible approach seems to be to delay the auto parameter checking
and only check once we know we are indeed dealing with a function
declaration. For tparms, we should still emit the error right away.
PR c++/112482
gcc/cp/ChangeLog:
* decl.cc (grokdeclarator): Do not issue the auto parameter error while
tentatively parsing a function parameter.
* parser.cc (cp_parser_parameter_declaration_clause): Check it here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast15.C: New test.
After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant to a register
if it is shared with one of the other operands. The problem is that we
used the comparison mode for the register, but that could be different
from the operand mode. This causes some issues on some targets.
To fix it, we need to make sure the mode of the comparison matches the
mode of the other operands before we can compare the constants (a
CONST_INT has no mode, so the rtx comparison treats two constants as
equal whenever they have the same value, even if they are used in
different modes).
Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.
PR middle-end/111260
gcc/ChangeLog:
* optabs.cc (emit_conditional_move): Change the modes to be
equal before forcing the constant to a register.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/condmove-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that
checks even e.g. sk_template_parms, and, as the comment says, affects
cp_fold_r as well. Here we had an expression with
min ((long int) VIEW_CONVERT_EXPR<long unsigned int>(bytecount), (long int) <<< Unknown tree: sizeof_expr
(int) <<< error >>> >>>)
as its sub-expression, and we never evaluated that into
min ((long int) bytecount, 4)
so the SIZEOF_EXPR leaked into the middle end. We need to make sure
we are calling cp_fold on the SIZEOF_EXPR.
PR c++/112869
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_immediate_r): Check cp_unevaluated_operand
and DECL_IMMEDIATE_FUNCTION_P rather than in_immediate_context.
gcc/testsuite/ChangeLog:
* g++.dg/template/sizeof18.C: New test.
Recent commit f5fc001a84
"aarch64: enable mixed-types for aarch64 simdclones" added lines to those
test cases, so the GCN-specific 'dg-warning' line numbers -- originally
added in commit b73c49f6f8
"amdgcn: OpenMP SIMD routine support" -- got out of sync.
gcc/testsuite/
* gcc.dg/vect/vect-simd-clone-1.c: Update GCN 'dg-warning's.
* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-3.c: Likewise.
* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
Add a new parameter param_fully_pipelined_fma. If it is non-zero,
reassociation considers the benefit of parallelizing FMA's
multiplication part and addition part, assuming FMUL and FMA use the
same units that can also do FADD.
With the patch and new option, there's ~2% improvement in spec2017
508.namd on AmpereOne. (The other options are "-Ofast -mcpu=ampere1
-flto".)
PR tree-optimization/110279
gcc/ChangeLog:
* doc/invoke.texi: New parameter fully-pipelined-fma.
* params.opt: New parameter fully-pipelined-fma.
* tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
the latency of MULT_EXPRs that can't be hidden by the FMAs.
(get_reassociation_width): Search for a smaller width
considering the benefit of fully pipelined FMA.
(rank_ops_for_fma): Return the number of MULT_EXPRs.
(reassociate_bb): Pass the number of MULT_EXPRs to
get_reassociation_width; avoid calling
get_reassociation_width twice.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110279-2.c: New test.
PR fortran/112873
gcc/fortran/ChangeLog:
* gfortran.texi: Update to reflect the changes.
* intrinsic.cc (add_functions): Update the standard that the
various degree trigonometric functions have been described in.
(gfc_check_intrinsic_standard): Add an error string for F2023.
* intrinsic.texi: Update accordingly.
The test says that CTAD from inherited constructors doesn't work
before C++23 so we should use c++20_down for the error.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction67.C: Correct dg-error target.
In extract_bit_field_1 we try to get a better vector mode before
extracting from it. Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent tieable vector mode with a fitting inner mode.
On riscv this triggered an ICE (PR112999) because we would take the
detour of extracting from a mask-mode vector via a vector integer mode.
One element of that mode could be subreg-punned with TImode which, in
turn, would need to be operated on in DImode chunks.
This patch adds
&& known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
&& multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))
to the list of criteria for a better mode.
gcc/ChangeLog:
PR target/112999
* expmed.cc (extract_bit_field_1): Ensure better mode
has fitting unit_precision.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112999.c: New test.
This changes the vec_extract path of extract_bit_field to use
GET_MODE_PRECISION instead of GET_MODE_BITSIZE and uses
the mode obtained from insn_data[icode].operand[0] as target mode.
Also, it adds a vec_extract<mode>bi expander for riscv that maps
to vec_extract<mode>qi. This fixes an ICE on riscv where we did
not find a vec_extract optab and continued with the generic code
which requires 1-byte alignment that riscv mask modes do not provide.
Apart from that it adds poly_int support to riscv's vec_extract
expander and makes the RVV..BImode -> QImode expander call
emit_vec_extract in order not to duplicate code.
gcc/ChangeLog:
PR target/112773
* config/riscv/autovec.md (vec_extract<mode>bi): New expander
calling vec_extract<mode>qi.
* config/riscv/riscv-protos.h (riscv_legitimize_poly_move):
Export.
(emit_vec_extract): Change argument from poly_int64 to rtx.
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Export.
(riscv_legitimize_move): Use rtx instead of poly_int64.
* expmed.cc (store_bit_field_1): Change BITSIZE to PRECISION.
(extract_bit_field_1): Change BITSIZE to PRECISION and use
return mode from insn_data as target mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/pr112773.c: New test.
As it stands, GCC doesn't document any public AArch64-specific operand
modifiers for use in inline asm. This patch fixes that by documenting
an initial set of public AArch64-specific operand modifiers.
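For example, one modifier in that set is 'w', which prints the 32-bit
(w-register) name of a general-purpose register operand:

  int
  add32 (int a, int b)
  {
    int res;
    // %w0/%w1/%w2 print w-register names rather than the default
    // x-register names, as needed for a 32-bit add.
    asm ("add %w0, %w1, %w2" : "=r" (res) : "r" (a), "r" (b));
    return res;
  }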
gcc/ChangeLog:
* doc/extend.texi: Document AArch64 Operand Modifiers.
This makes constexpr std::vector (mostly) work in Debug Mode. All safe
iterator instrumentation and checking is disabled during constant
evaluation, because it requires mutex locks and calls to non-inline
functions defined in libstdc++.so. It should be OK to disable the safety
checks, because most UB should be detected during constant evaluation
anyway.
We could try to enable the full checking in constexpr, but it would mean
wrapping all the non-inline functions like _M_attach with an inline
_M_constexpr_attach that does the iterator housekeeping inline without
mutex locks when called for constant evaluation, and calls the
non-inline function at runtime. That could be done in future if we find
that we've lost safety or useful checking by disabling the safe
iterators.
There are a few test failures in C++20 mode, which I'm unable to
explain. The _Safe_iterator::operator++() member gives errors for using
non-constexpr functions during constant evaluation, even though those
functions are guarded by std::is_constant_evaluated() checks. The same
code works fine for C++23 and up.
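For instance, code along these lines now also builds with
-D_GLIBCXX_DEBUG (illustrative; C++23 is the safe mode given the
unexplained C++20 failures above):

  #include <cstddef>
  #include <vector>

  constexpr int
  sum_to (int n)
  {
    std::vector<int> v;                    // constexpr allocation
    for (int i = 1; i <= n; ++i)
      v.push_back (i);
    int s = 0;
    for (std::size_t i = 0; i < v.size (); ++i)
      s += v[i];                           // debug checks skipped in constexpr
    return s;
  }

  static_assert (sum_to (4) == 10);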
libstdc++-v3/ChangeLog:
PR libstdc++/109536
* include/bits/c++config (__glibcxx_constexpr_assert): Remove
macro.
* include/bits/stl_algobase.h (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr to overloads for
debug mode iterators.
* include/debug/helper_functions.h (__unsafe): Add constexpr.
* include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
macro, folding it into ...
(_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
__glibcxx_constexpr_assert.
* include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
to some member functions. Omit attaching, detaching and checking
operations during constant evaluation.
* include/debug/safe_container.h (_Safe_container): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator): Likewise.
* include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr.
* include/debug/vector (_Safe_vector, vector): Add constexpr.
Omit safe iterator operations during constant evaluation.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc:
Remove dg-xfail-if for debug mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/bool/cons/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/1.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/capacity/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
* testsuite/23_containers/vector/data_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.