Starting with GCC 14 we have the nice URLification of the options printed
in diagnostics, say for
test.c:4:23: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long int’ [-Wformat=]
the -Wformat= is underlined in some terminals and hovering on it shows
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wformat
link.
This works nicely on the GCC trunk, where the online documentation is
regenerated every day from a cron job and more importantly, people rarely
use the trunk snapshots for too long, so it is unlikely that further changes
in the documentation will make too many links stale, because users will
simply regularly update to newer snapshots.
I think it doesn't work properly on release branches though.
Some users only use the released versions (i.e. MAJOR.MINOR.0) from tarballs
but can use them for a couple of years, others use snapshots from the
release branches, but again they could be in use for months or years and
the above mentioned online docs which represent just the GCC trunk might
diverge significantly.
Now, for the releases we always also publish online docs for the release,
which unlike the trunk online docs will not change further, under
e.g.
https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wformat
or
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Warning-Options.html#index-Wformat
etc.
So, I think at least for the MAJOR.MINOR.0 releases we want to use
URLs like above rather than the trunk ones and we can use the same process
of updating *.opt.urls as well for that.
For the snapshots from release branches, we don't have such docs.
One option (implemented in the patch below for the URL printing side) is
to point to the MAJOR.MINOR.0 docs even for MAJOR.MINOR.1 snapshots.
Most of the links will work fine; options newly added on the release
branches (a rare thing, but it still happens) would have no URLs until
the next point release and get them with that release.
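A minimal sketch of that option, not the actual opts.cc or gcc-urlifier.cc code. The helper name release_doc_url and its parameters are hypothetical, and the rule "MINOR == 0 means trunk or a prerelease" is an assumption based on GCC's version numbering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the real get_option_url.  Writes the URL for
   PAGE into BUF and returns the snprintf result.  MINOR == 0 is assumed
   to mean trunk or a prerelease (no infix); any other MINOR selects the
   gcc-MAJOR.MINOR.0/ docs, whatever the patchlevel is.  */
int
release_doc_url (char *buf, size_t len, const char *root,
                 int major, int minor, const char *page)
{
  size_t n = strlen (root);

  /* ROOT must end in '/', as https://gcc.gnu.org/onlinedocs/ does.  */
  assert (n > 0 && root[n - 1] == '/');

  if (minor == 0)
    return snprintf (buf, len, "%s%s", root, page);
  return snprintf (buf, len, "%sgcc-%d.%d.0/%s", root, major, minor, page);
}
```

With this rule a 14.1.1 snapshot passes major 14 and minor 1 and gets the same gcc-14.1.0/ links as the 14.1.0 release.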
The question is what to do about make regenerate-opt-urls for the release
branch snapshots. One possibility is to just document that users shouldn't
run make regenerate-opt-urls on release branches (and should filter out
*.opt.urls changes from their commits), make running regenerate-opt-urls
the RM's responsibility before making the first release candidate from a
branch, and adjust the autoregen CI to know about that. Another is to add
a separate goal which, instead of relying on the files created by make html,
would download a copy of the html files from the last release from the web
(kind of mirroring the https://gcc.gnu.org/onlinedocs/gcc-14.1.0/ subtree
locally) and run regenerate-opt-urls on top of that. But how do we catch
the point when the first release candidate is made and we want to update to
what will be the URLs once the release is made (but will be stale URLs
for a week or so)?
Another option would be to add to cron daily regeneration of the online
docs for the release branches. I don't think that is a good idea though,
because as I wrote earlier, not all users update to the latest snapshot
frequently, so there can be users that use gcc 13.1.1 20230525 for months
or years, and other users which use gcc 13.1.1 20230615 for years etc.
Another question is what is most sensible for users who want to override
the default root and use the --with-documentation-root-url= configure
option. Do we expect them to grab the whole onlinedocs tree or for release
branches at least include gcc-14.1.0/ subdirectory under the root?
If so, the patch below deals with that. Or should we just change the
default documentation root url, so if user doesn't specify
--with-documentation-root-url= and we are on a release branch, default that
to https://gcc.gnu.org/onlinedocs/gcc-14.1.0/ or
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/ etc. and don't add any infix in
get_option_url/make_doc_url, but when people supply their own, let them
point to the root of the tree which contains the right docs?
Then such changes would go into gcc/configure.ac, some case based on
"$gcc_version", from that decide if it is a release branch or trunk.
2024-04-17 Jakub Jelinek <jakub@redhat.com>
PR other/114738
* opts.cc (get_option_url): On release branches append
gcc-MAJOR.MINOR.0/ after DOCUMENTATION_ROOT_URL.
* gcc-urlifier.cc (gcc_urlifier::make_doc_url): Likewise.
As discussed in the PR, aclocal.m4 and configure were incorrectly
regenerated at some point.
2024-04-17 Christophe Lyon <christophe.lyon@linaro.org>
PR preprocessor/114748
libcpp/
* aclocal.m4: Regenerate.
* configure: Regenerate.
The following makes sure to reset LOOP_VINFO_USING_PARTIAL_VECTORS_P
to its default of false when re-trying without SLP as otherwise
analysis may run into bogus asserts.
PR tree-optimization/114749
* tree-vect-loop.cc (vect_analyze_loop_2): Reset
LOOP_VINFO_USING_PARTIAL_VECTORS_P when re-trying without SLP.
... as made apparent by a number of unexpectedly UNSUPPORTED test cases, which
now all turn into PASS, with just one exception:
PASS: gcc.dg/vect/vect-early-break_124-pr114403.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_124-pr114403.c execution test
FAIL: gcc.dg/vect/vect-early-break_124-pr114403.c scan-tree-dump vect "LOOP VECTORIZED"
..., which needs to be looked into, separately.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_long_long):
Enable for GCN.
This resolves failing tests in check-simd.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/114750
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_load, _S_store): Fall back to copying
scalars if the memory type cannot be vectorized for the target.
.ABNORMAL_DISPATCHER is currently the only internal function with
ECF_NORETURN, and asan likes to instrument ECF_NORETURN calls by adding
some builtin call before them, which breaks the .ABNORMAL_DISPATCHER
discovery added in gsi_safe_*.
The following patch fixes asan not to instrument .ABNORMAL_DISPATCHER
calls, like it doesn't instrument a couple of specific builtin calls
as well.
2024-04-17 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/114743
* asan.cc (maybe_instrument_call): Don't instrument calls to
.ABNORMAL_DISPATCHER.
* gcc.dg/asan/pr112709-2.c (freddy): New function from
gcc.dg/ubsan/pr112709-2.c version of the test.
The testcase had the wrong indices in the buffer check loop.
gcc/testsuite/ChangeLog:
PR tree-optimization/114403
* gcc.dg/vect/vect-early-break_124-pr114403.c: Fix check loop.
F2008 requires for ALLOCATE with SOURCE= or MOLD= specifier that the kind
type parameters of allocate-object and source-expr have the same values.
Add compile-time diagnostics for different character length and a runtime
check (under -fcheck=bounds). Use length from allocate-object to prevent
heap corruption and to allow string padding or truncation on assignment.
gcc/fortran/ChangeLog:
PR fortran/113793
* resolve.cc (resolve_allocate_expr): Reject ALLOCATE with SOURCE=
or MOLD= specifier for unequal length.
* trans-stmt.cc (gfc_trans_allocate): If an allocatable character
variable has fixed length, use it and do not use the source length.
With bounds-checking enabled, add a runtime check for same length.
gcc/testsuite/ChangeLog:
PR fortran/113793
* gfortran.dg/allocate_with_source_29.f90: New test.
* gfortran.dg/allocate_with_source_30.f90: New test.
* gfortran.dg/allocate_with_source_31.f90: New test.
This just adds a clause to make it more obvious that the vector_size
attribute extension works with typedefs.
Note this whole section needs a rewrite to be in a format similar to the
other extensions. But that is for another day.
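As a hedged illustration of the documented behaviour (the names elem_t, v4, sum_lanes and demo_sum are made up for this sketch), the base type of a vector_size type can itself be a typedef. This relies on the GCC vector extension, so it is not portable ISO C.

```c
#include <assert.h>

/* A typedef of a base type, rather than the base type itself.  */
typedef int elem_t;

/* GCC vector extension: a vector of four elem_t lanes.  */
typedef elem_t v4 __attribute__ ((vector_size (4 * sizeof (elem_t))));

static_assert (sizeof (v4) == 4 * sizeof (elem_t), "v4 has four lanes");

/* Add up the four lanes using vector subscripting.  */
int
sum_lanes (v4 v)
{
  return v[0] + v[1] + v[2] + v[3];
}

/* Element-wise addition of two vectors, then a lane sum.  */
int
demo_sum (void)
{
  v4 a = { 1, 2, 3, 4 };
  v4 b = { 10, 20, 30, 40 };
  v4 c = a + b;
  return sum_lanes (c);
}
```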
gcc/ChangeLog:
PR c/92880
* doc/extend.texi (Using Vector Instructions): Add that
the base_types could be a typedef of them.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
The following fixes a DFS walk issue when identifying to be ignored
latch edges. We have (bogus) SLP_TREE_REPRESENTATIVEs for VEC_PERM
nodes so those have to be explicitly ignored as possibly being PHIs.
PR tree-optimization/114736
* tree-vect-slp.cc (vect_optimize_slp_pass::is_cfg_latch_edge):
Do not consider VEC_PERM_EXPRs as PHI use.
* gfortran.dg/vect/pr114736.f90: New testcase.
The neg induction vectorization code isn't prepared to deal with
single element vectors.
PR tree-optimization/114733
* tree-vect-loop.cc (vectorizable_nonlinear_induction): Reject
neg induction vectorization of single element vectors.
* gcc.dg/vect/pr114733.c: New testcase.
This patch adjusts the implementation of the acc_map_data/acc_unmap_data API library
routines to better fit the description in the OpenACC 2.7 specification.
Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings. Adjustments to
mapping-related code to respect OpenACC semantics are also added.
libgomp/ChangeLog:
* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA.
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
* target.c (gomp_increment_refcount): Return early for
REFCOUNT_ACC_MAP_DATA case.
(gomp_decrement_refcount): Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
testcase error output scan test.
While studying the TYPE_CANONICAL/TYPE_STRUCTURAL_EQUALITY_P stuff,
I've noticed some nits in comments, the following patch fixes them.
2024-04-16 Jakub Jelinek <jakub@redhat.com>
* tree.cc (array_type_nelts): Ensure 2 spaces after . in comment
instead of just one.
(build_variant_type_copy): Likewise.
(tree_check_failed): Likewise.
(build_atomic_base): Likewise.
* ipa-free-lang-data.cc (fld_incomplete_type_of): Use an indefinite
article rather than a.
..., until <https://github.com/Rust-GCC/gccrs/issues/2898>
"'cargo' should build for the host system" is resolved.
Follow-up to commit 3e1e73fc99
"build: Check for cargo when building rust language".
* configure.ac (have_cargo): Force to "no" in Canadian cross
configurations.
* configure: Regenerate.
Follow-up to commit 3e1e73fc99
"build: Check for cargo when building rust language":
On 2024-04-15T13:14:42+0200, I wrote:
> I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
> 'AC_CHECK_PROG'? (We always want plain 'cargo', not host-prefixed
> 'aarch64-linux-gnu-cargo' etc., right?) I'll look into changing this.
* configure: Regenerate.
config/
* acx.m4 (ACX_PROG_CARGO): Use 'AC_CHECK_PROGS'.
https://eel.is/c++draft/bit.cast#3 says that std::bit_cast isn't constexpr
if To, From and the types of all subobjects have certain properties which the
check_bit_cast_type checks (such as it isn't a pointer, reference, union,
member pointer, volatile). The function doesn't cp_walk_tree though, so
I've missed one important case, for ARRAY_TYPEs we need to recurse on the
element type. I think we don't need to handle VECTOR_TYPEs/COMPLEX_TYPEs,
because those will not have a pointer/reference/union/member pointer in
the element type and if the element type is volatile, I think the whole
derived type is volatile as well.
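The missing recursion can be pictured with a toy model. Everything here (struct type, enum kind, bit_cast_ok) is hypothetical and much simpler than GCC's tree types or the real check_bit_cast_type; it only shows why an array has to be judged by its element type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy type representation, not GCC's tree codes.  */
enum kind { K_SCALAR, K_POINTER, K_UNION, K_ARRAY };

struct type
{
  enum kind kind;
  bool is_volatile;
  const struct type *elem;  /* Element type for K_ARRAY, else NULL.  */
};

/* Return true if T contains nothing that blocks a constexpr bit_cast
   in this simplified model.  */
bool
bit_cast_ok (const struct type *t)
{
  if (t->is_volatile)
    return false;
  switch (t->kind)
    {
    case K_POINTER:
    case K_UNION:
      return false;
    case K_ARRAY:
      /* The step that was missing: look through to the element type.  */
      assert (t->elem != NULL);
      return bit_cast_ok (t->elem);
    default:
      return true;
    }
}
```

Without the K_ARRAY case an array of pointers would fall through to the default and be accepted.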
2024-04-16 Jakub Jelinek <jakub@redhat.com>
PR c++/114706
* constexpr.cc (check_bit_cast_type): Handle ARRAY_TYPE.
* g++.dg/cpp2a/bit-cast17.C: New test.
When one of the two input operands is 0, ADD and IOR are functionally
equivalent.
ADD is slightly preferred over IOR because ADD has a higher likelihood
of being implemented as a compressed instruction when compared to IOR.
C.ADD uses the CR format with any of the 32 RVI registers available,
while C.OR uses the CA format, which is limited to just 8 of them.
Conditional select, if zero case:
rd = (rc == 0) ? rs1 : rs2
before patch:
czero.nez rd, rs1, rc
czero.eqz rtmp, rs2, rc
or rd, rd, rtmp
after patch:
czero.eqz rd, rs1, rc
czero.nez rtmp, rs2, rc
add rd, rd, rtmp
Same trick applies for the conditional select, if non-zero case:
rd = (rc != 0) ? rs1 : rs2
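The equivalence can be checked with a small C model of the Zicond semantics. This is a sketch, not the code in riscv.cc; the function names are made up here. It models the "before patch" operand order for the if-zero select and only swaps the final combining operation.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs, rc: rd is 0 when rc is zero, else rs.  */
uint64_t
czero_eqz (uint64_t rs, uint64_t rc)
{
  return rc == 0 ? 0 : rs;
}

/* czero.nez rd, rs, rc: rd is 0 when rc is nonzero, else rs.  */
uint64_t
czero_nez (uint64_t rs, uint64_t rc)
{
  return rc != 0 ? 0 : rs;
}

/* rd = (rc == 0) ? rs1 : rs2, combining the two halves with IOR.  */
uint64_t
select_if_zero_or (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  return czero_nez (rs1, rc) | czero_eqz (rs2, rc);
}

/* The same select, combining with ADD instead.  */
uint64_t
select_if_zero_add (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  return czero_nez (rs1, rc) + czero_eqz (rs2, rc);
}

/* One of the two halves is always 0, so IOR and ADD must agree.  */
void
check_select (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  assert (select_if_zero_or (rs1, rs2, rc)
          == select_if_zero_add (rs1, rs2, rc));
}
```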
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_conditional_move):
replace or with add when expanding zicond if possible.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zicond-prefer-add-to-or.c: New test.
The earlier patch for PR112938 arranged for volatile parms to be made
indirect in internal strub wrapped bodies.
The first problem that remained, more evident, was that the indirected
parameter remained volatile, despite the indirection, but it wasn't
regimplified, so indirecting it was malformed gimple.
Regimplifying turned out not to be needed. The best course of action
was to drop the volatility from the by-reference parm, that was being
unexpectedly inherited from the original volatile parm.
That exposed another problem: the dereferences would then lose their
volatile status, so we had to bring volatile back to them.
for gcc/ChangeLog
PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Drop volatility from
indirected parm.
(maybe_make_indirect): Restore volatility in dereferences.
for gcc/testsuite/ChangeLog
PR middle-end/112938
* g++.dg/strub-internal-pr112938.cc: New.
The regen bot recently flagged a difference in gotools/Makefile.in.
Trying it locally, it seems pretty random:
for i in `seq 20`; do PATH=~/automake-1.15.1/bin:~/autoconf-2.69/bin:$PATH automake; echo -n `git diff Makefile.in | wc -l`" "; done; echo
for i in `seq 20`; do PATH=~/automake-1.15.1/bin:~/autoconf-2.69/bin:$PATH setarch x86_64 -R automake; echo -n `git diff Makefile.in | wc -l`" "; done; echo
14 14 14 0 0 0 14 0 14 0 14 14 14 14 0 14 14 0 0 0
14 0 14 0 0 14 14 14 0 14 14 0 0 14 14 14 0 0 0 14
The 14 line git diff is
diff --git a/gotools/Makefile.in b/gotools/Makefile.in
index 36c2ec2abd3..f40883c39be 100644
--- a/gotools/Makefile.in
+++ b/gotools/Makefile.in
@@ -704,8 +704,8 @@ distclean-generic:
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
-@NATIVE_FALSE@install-exec-local:
@NATIVE_FALSE@uninstall-local:
+@NATIVE_FALSE@install-exec-local:
clean: clean-am
clean-am: clean-binPROGRAMS clean-generic clean-noinstPROGRAMS \
so whether it is
@NATIVE_FALSE@install-exec-local:
@NATIVE_FALSE@uninstall-local:
or
@NATIVE_FALSE@uninstall-local:
@NATIVE_FALSE@install-exec-local:
depends on some hash table traversal or what.
I'm not familiar with automake/m4 enough to debug that, so I'm
instead offering a workaround, with this patch the order is deterministic.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
* Makefile.am (install-exec-local, uninstall-local): Add goals
on the else branch of if NATIVE to ensure reproducibility.
* Makefile.in: Regenerate.
We can replace "GCC <next>" with "GCC 14.1.0" now that we're nearing the
release.
libstdc++-v3/ChangeLog:
* doc/xml/manual/abi.xml: Replace "<next>" with "14.1.0".
* doc/html/manual/abi.html: Regenerate.
This C++26 change was just approved in Tokyo, in P2944R3. It adds
operator== and operator<=> overloads to std::reference_wrapper.
The operator<=> overloads in the paper cause compilation errors for any
type without <=> so they're implemented here with deduced return types
and constrained by a requires clause.
libstdc++-v3/ChangeLog:
* include/bits/refwrap.h (reference_wrapper): Add comparison
operators as proposed by P2944R3.
* include/bits/version.def (reference_wrapper): Define.
* include/bits/version.h: Regenerate.
* include/std/functional: Enable feature test macro.
* testsuite/20_util/reference_wrapper/compare.cc: New test.
I'm only treating this as a DR for C++20 for now, because it's less work
and only requires changes to operator== and operator<=>. To do this for
older standards would require changes to the six relational operators
used pre-C++20.
libstdc++-v3/ChangeLog:
PR libstdc++/113386
* include/bits/stl_pair.h (operator==, operator<=>): Support
heterogeneous comparisons, as per LWG 3865.
* testsuite/20_util/pair/comparison_operators/lwg3865.cc: New
test.
A negative delim value passed to std::istream::ignore can never match
any character in the stream, because the comparison is done using
traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
negative values (except at EOF). The optimized version of ignore for the
std::istream specialization uses traits_type::find to locate the delim
character in the streambuf, which _can_ match a negative delim on
platforms where char is signed, but then we do another comparison using
eq_int_type which fails. The code then keeps looping forever, with
traits_type::find locating the character and traits_type::eq_int_type
saying it's not a match, so traits_type::find is used again and finds
the same character again.
A possible fix would be to check with eq_int_type after a successful
find, to see whether we really have a match. However, that would be
suboptimal since we know that a negative delimiter will never match
using eq_int_type. So a better fix is to adjust the check at the top of
the function that handles delim==eof(), so that we treat all negative
delim values as equivalent to EOF. That way we don't bother using find
to search for something that will never match with eq_int_type.
The version of ignore in the primary template doesn't need a change,
because it doesn't use traits_type::find, instead characters are
extracted one-by-one and always matched using eq_int_type. That avoids
the inconsistency between find and eq_int_type. The specialization for
std::wistream does use traits_type::find, but traits_type::to_int_type
is equivalent to an implicit conversion from wchar_t to wint_t, so
passing a wchar_t directly to ignore without using to_int_type works.
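The disagreement between the two comparisons can be modelled in plain C. This is only an illustration with made-up function names; signed char stands in for char on a platform where char is signed, and the casts mirror what narrowing the delimiter to a character and widening a character via unsigned char do.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the find-style test: the delimiter is narrowed to the
   character type before comparing.  */
bool
matches_as_char (signed char c, int delim)
{
  return c == (signed char) delim;
}

/* Model of the eq_int_type-style test: the character is widened via
   unsigned char, so the left-hand side is never negative.  */
bool
matches_as_int (signed char c, int delim)
{
  return (int) (unsigned char) c == delim;
}

/* A negative delimiter can match by the first rule but never by the
   second, which is the inconsistency behind the endless loop.  */
void
check_delim_models (void)
{
  signed char c = -1;                /* The byte 0xFF.  */
  assert (matches_as_char (c, -1));
  assert (!matches_as_int (c, -1));
  assert (matches_as_int (c, 255));  /* A to_int_type-style delimiter.  */
}
```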
libstdc++-v3/ChangeLog:
PR libstdc++/93672
* src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
Treat all negative delimiter values as eof().
* testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
* testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
test.
cppcheck apparently warns on the | !!sticky part of the expression and
using | (!!sticky) quiets it up (it is correct as is).
The following patch adds the ()s, and also adds them around mant >> 1 just
in case it makes it clearer to all readers that the expression is parsed
that way already.
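A quick check that the parentheses are purely cosmetic. The two functions below are hypothetical stand-ins, not the code in fpgnulib.c: >> binds tighter than |, and unary !! binds tighter than both.

```c
#include <assert.h>

/* Without the extra parentheses.  */
unsigned long
round_plain (unsigned long mant, unsigned long sticky)
{
  return mant >> 1 | !!sticky;
}

/* With the parentheses added for cppcheck and for readers.  */
unsigned long
round_parenthesized (unsigned long mant, unsigned long sticky)
{
  return (mant >> 1) | (!!sticky);
}

/* Both spellings parse the same way, so they must agree.  */
void
check_same_parse (unsigned long mant, unsigned long sticky)
{
  assert (round_plain (mant, sticky) == round_parenthesized (mant, sticky));
}
```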
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114689
* config/m68k/fpgnulib.c (__truncdfsf2): Add parentheses around
!!sticky bitwise or operand to quiet up cppcheck. Add parentheses
around mant >> 1 bitwise or operand.
Add a minimal description for pragma and aspect Exceptional_Cases, based
on similarly minimal descriptions for other SPARK contracts.
gcc/ada/
* doc/gnat_rm/implementation_defined_aspects.rst
(Exceptional_Cases): Add description for aspect.
* doc/gnat_rm/implementation_defined_pragmas.rst
(Exceptional_Cases): Add description for pragma.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Guard the longjmp to not infinitely loop. The longjmp (jump) function is
called unconditionally to make test flow simpler, but the jump
destination would return to a point in main that would call longjmp
again. The longjmp is really there to exercise the then-branch of
setjmp, to verify coverage is accurately counted in the presence of
complex edges.
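A minimal sketch of such a guard, not the actual gcov-22.c test. The function run_guarded and its variables are made up. The jump is attempted on every pass through the code, but a flag set in the then-branch of setjmp keeps the second pass from jumping again.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Returns how many times the code after setjmp ran.  */
int
run_guarded (void)
{
  /* volatile: these are modified between setjmp and longjmp.  */
  volatile int jumped = 0;
  volatile int passes = 0;

  if (setjmp (env))
    jumped = 1;         /* Then-branch, only reached via longjmp.  */

  passes++;

  if (!jumped)          /* The guard: without it this loops forever.  */
    longjmp (env, 1);

  assert (jumped);
  return passes;
}
```

The first pass jumps, the second pass sees the flag and falls through, so the function returns 2.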
PR gcov-profile/114720
gcc/testsuite/ChangeLog:
* gcc.misc-tests/gcov-22.c: Guard longjmp to not loop.
This adds the missing VLS modes to the mask extract expanders.
gcc/ChangeLog:
PR target/114668
* config/riscv/autovec.md: Add VLS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr114668.c: New test.
The following avoids missing coverage for the line of a switch statement
which happens when gimplification emits a BIND_EXPR wrapping the switch
as that prevents us from setting locations on the containing statements
via annotate_all_with_location. Instead set the location of the GIMPLE
switch directly.
PR gcov-profile/114715
* gimplify.cc (gimplify_switch_expr): Set the location of the
GIMPLE switch.
* gcc.misc-tests/gcov-24.c: New testcase.
The x86 instruction size limit is 15 bytes. If a NDD instruction has
a segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte,
a 4-byte displacement and a 4-byte immediate, adding an address size
prefix will exceed the size limit. Change TImode ADD, AND, OR and XOR
to allow offsettable memory only with 8-bit signed integer constant,
which is encoded with a 1-byte immediate, if the address size prefix
is used.
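The byte count can be tallied as below. This sketch adds up only the components that the message lists, with made-up names; it is not an encoder and says nothing about other parts of an instruction.

```c
#include <assert.h>

enum { X86_MAX_INSN_LEN = 15 };

/* Sum the components named above for an immediate of IMM_BYTES bytes,
   optionally with a 1-byte address size prefix.  */
int
ndd_insn_len (int imm_bytes, int addr_size_prefix)
{
  int seg_prefix = 1;     /* Segment prefix byte.  */
  int opcode_prefix = 4;  /* 4-byte opcode prefix.  */
  int modrm = 1;          /* MODRM byte.  */
  int sib = 1;            /* SIB byte.  */
  int disp = 4;           /* 4-byte displacement.  */

  assert (imm_bytes == 1 || imm_bytes == 4);
  return (seg_prefix + opcode_prefix + modrm + sib + disp
          + imm_bytes + (addr_size_prefix ? 1 : 0));
}
```

With a 4-byte immediate the tally is 15 and the address size prefix pushes it to 16, while a 1-byte immediate leaves it at 13 even with the prefix.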
gcc/
PR target/114696
* config/i386/i386.md (isa): Add apx_ndd_64.
(enabled): Likewise.
(*add<dwi>3_doubleword): Change rjO to r,ro,jO with 8-bit
signed integer constant and enable jO only for apx_ndd_64.
(*add<dwi>3_doubleword_cc_overflow_1): Likewise.
(*and<dwi>3_doubleword): Likewise.
(*<code><dwi>3_doubleword): Likewise.
gcc/testsuite/
PR target/114696
* gcc.target/i386/apx-ndd-x32-2a.c: New test.
* gcc.target/i386/apx-ndd-x32-2b.c: Likewise.
* gcc.target/i386/apx-ndd-x32-2c.c: Likewise.
* gcc.target/i386/apx-ndd-x32-2d.c: Likewise.
This fixes a bug with the interaction between peeling for gaps and early break.
Before I go further, I'll first explain how I understand this to work for loops
with a single exit.
When peeling for gaps we peel N < VF iterations to scalar.
This happens by removing N iterations from the calculation of niters such that
vect_iters * VF == niters is always false.
In other words, when we exit the vector loop we always fall to the scalar loop.
The loop bounds adjustment guarantees this. Because of this we potentially
execute a vector loop iteration less. That is, if you're at the boundary
condition where niters % VF == 0, then by peeling one or more scalar iterations
the vector loop executes one iteration less.
This is accounted for by the adjustments in vect_transform_loop. This
adjustment happens differently based on whether the vector loop can be
partial or not:
Peeling for gaps sets the bias to 0 and then:
when not partial: we take the floor of (scalar_upper_bound / VF) - 1 to get the
vector latch iteration count.
when loop is partial: For a single exit this means the loop is masked, we take
the ceil to account for the fact that the loop can handle
the final partial iteration using masking.
Note that there's no difference between ceil and floor on the boundary condition.
There is a difference however when you're slightly above it, i.e. if scalar
iterates 14 times and VF = 4 and we peel 1 iteration for gaps.
The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations, and in effect
the partial iteration is ignored and it's done as scalar.
This is fine because the niters modification has capped the vector iteration at
2. So that when we reduce the induction values you end up entering the scalar
code with ind_var.2 = ind_var.1 + 2 * VF.
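The floor and ceil variants can be compared with a few lines of C. This models only the arithmetic described above, with hypothetical function names. BOUND is the scalar bound after removing the peeled iterations (13 in the example) and all values are assumed non-negative.

```c
#include <assert.h>

/* Latch iteration count when the loop is not partial: floor.  */
long
latch_iters_floor (long bound, long bias, long vf)
{
  return (bound + bias) / vf - 1;
}

/* Latch iteration count when the loop is partial (masked): ceil.  */
long
latch_iters_ceil (long bound, long bias, long vf)
{
  return (bound + bias + vf - 1) / vf - 1;
}

/* On a multiple of VF both roundings agree.  */
void
check_boundary (long bound, long vf)
{
  assert (bound % vf == 0);
  assert (latch_iters_floor (bound, 0, vf) == latch_iters_ceil (bound, 0, vf));
}
```

For the example, latch_iters_floor (13, 0, 4) gives 2 while latch_iters_ceil (13, 0, 4) gives 3; at the boundary value 12 both give 2.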
Now let's look at early breaks. To make it easier I'll focus on the specific
testcase:
char buffer[64];
__attribute__ ((noipa))
buff_t *copy (buff_t *first, buff_t *last)
{
  char *buffer_ptr = buffer;
  char *const buffer_end = &buffer[SZ-1];
  int store_size = sizeof(first->Val);
  while (first != last && (buffer_ptr + store_size) <= buffer_end)
    {
      const char *value_data = (const char *)(&first->Val);
      __builtin_memcpy(buffer_ptr, value_data, store_size);
      buffer_ptr += store_size;
      ++first;
    }
  if (first == last)
    return 0;
  return first;
}
Here the first, early exit is on the condition:
(buffer_ptr + store_size) <= buffer_end
and the main exit is on condition:
first != last
This is important, as this bug only manifests itself when the first exit has a
known constant iteration count that's lower than the latch exit count.
Because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up processing 16
bytes per iteration. So the exit has a known bound of 8 + 1.
The vectorizer correctly analyzes this:
Statement (exit)if (ivtmp_21 != 0)
is executed at most 8 (bounded by 8) + 1 times in loop 1.
and as a consequence the IV is bound by 9:
# vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
...
vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615 };
mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
goto <bb 3>; [88.89%]
else
goto <bb 30>; [11.11%]
The important bits are these:
In this example the value of last - first = 416.
The calculated vector iteration count is:
x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27
The bounds generated, adjusting for gaps:
x == (((x - 1) >> 2) << 2)
which means we'll always fall through to the scalar code, as intended.
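Why the comparison can never be true is easy to check: shifting right and then left by 2 rounds x - 1 down to a multiple of 4, so the result is at most x - 1. A small sketch, with a made-up function name:

```c
#include <assert.h>

/* The right-hand side of the generated bound check.  */
unsigned long
gap_adjusted_bound (unsigned long x)
{
  return ((x - 1) >> 2) << 2;
}

/* For every x >= 1 the adjusted bound is strictly below x, so
   x == gap_adjusted_bound (x) is always false.  */
void
check_never_equal (void)
{
  for (unsigned long x = 1; x <= 1000; x++)
    assert (gap_adjusted_bound (x) < x);
}
```

For x = 27 the adjusted bound is 24, the vector iteration count used in the discussion.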
Here are two key things to note:
1. In this loop, the early exit will always be the one taken. When it's taken
we enter the scalar loop with the correct induction value to apply the gap
peeling.
2. If the main exit is taken, the induction values assumes you've finished all
vector iterations. i.e. it assumes you have completed 24 iterations, as we
treat the main exit the same for normal loop vect and early break when not
PEELED.
This means the induction value is adjusted to ind_var.2 = ind_var.1 + 24 * VF;
So what's going wrong? The vectorizer's codegen is correct and efficient,
however when we adjust the upper bounds, that code knows that the loop's upper
bound is based on the early exit, i.e. 8 latch iterations, or in other words,
it thinks the loop iterates once.
This is incorrect as the vector loop iterates twice, as it has set up the
induction value such that it exits at the early exit. So it in effect iterates
2.5x times.
Because the upper bound is incorrect, when we unroll it now exits from the main
exit which uses the incorrect induction value.
So there are three ways to fix this:
1. If we take the position that the main exit should support both premature
exits and final exits then vect_update_ivs_after_vectorizer needs to be
skipped for this case, and vectorizable_induction updated with third case
where we reduce with LAST reduction based on the IVs instead of assuming
you're at the end of the vector loop.
I don't like this approach. I don't think we should add a third induction
style to cover up an issue introduced by unrolling. It makes the code
harder to follow and makes main exits harder to reason about.
2. We could say that vec_init_loop_exit_info should pick the exit which has the
smallest known iteration count. This would turn this case into a PEELED case
and the induction values would be correct as we'd always recalculate them
from a reduction. This is suboptimal though as the reason we pick the latch
exit as the IV one is to prevent having to rotate the loop. This results
in more efficient code for what we assume is the common case, i.e. the main
exit.
3. In PR113734 we've established that for vectorization of early breaks that we
must always treat the loop as partial. Here partiality means that we have
enough vector elements to start the iteration, but we may take an early exit
and so never reach the latch/main exit.
This requirement is overwritten by the peeling for gaps adjustment of the
upper bound. I believe the bug is simply that this shouldn't be done.
The adjustment here is to indicate that the main exit always leads to the
scalar loop when peeling for gaps.
But this invariant is already always true for all early exits. Remember that
early exits restart the scalar loop at the start of the vector iteration, so
the induction values will start it where we want to do the gaps peeling.
I think option 3 is the correct fix, and also one that doesn't degrade code quality.
gcc/ChangeLog:
PR tree-optimization/114403
* tree-vect-loop.cc (vect_transform_loop): Adjust upper bounds for when
peeling for gaps and early break.
gcc/testsuite/ChangeLog:
PR tree-optimization/114403
* gcc.dg/vect/vect-early-break_124-pr114403.c: New test.
* gcc.dg/vect/vect-early-break_125-pr114403.c: New test.
Prevent rust language from building when cargo is
missing.
config/ChangeLog:
* acx.m4: Add a macro to check for rust
components.
ChangeLog:
* configure: Regenerate.
* configure.ac: Emit an error message when cargo
is missing.
Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>
This isn't necessary, as the full path to 'libproc_macro_internal.a' is
specified elsewhere.
gcc/rust/
* Make-lang.in (RUST_LDFLAGS): Remove
'libgrust/libproc_macro_internal'.
The new gcc.target/i386/fhardened-1.c etc. tests FAIL on Solaris/x86 and
Darwin/x86:
FAIL: gcc.target/i386/fhardened-1.c (test for excess errors)
FAIL: gcc.target/i386/fhardened-2.c (test for excess errors)
Excess errors:
cc1: warning: '-fhardened' not supported for this target
Support for -fhardened is restricted to HAVE_FHARDENED_SUPPORT in
toplev.cc (process_options) which again is only defined for linux*|gnu*
targets in gcc/configure.ac.
Accordingly, this patch restricts the tests to those two, as is already
done in gcc.target/i386/cf_check-6.c.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
2024-04-15 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.target/i386/fhardened-1.c: Restrict to Linux/GNU.
* gcc.target/i386/fhardened-2.c: Likewise.
The enumerator still doesn't have TREE_TYPE set but diag_attr_exclusions
assumes that all decls must have types.
I think it is better in something as unimportant as diag_attr_exclusions
to be more robust, if there is no type, it can just diagnose exclusions
on the DECL_ATTRIBUTES, like for types it only diagnoses it on
TYPE_ATTRIBUTES.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR c++/114634
* attribs.cc (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
decls with NULL TREE_TYPE.
* g++.dg/ext/attrib68.C: New test.
A typo in r14-6978 made us emit too many things. This ensures that we
don't emit using-declarations from the GMF that we don't need to.
PR c++/114600
gcc/cp/ChangeLog:
* module.cc (depset::hash::add_binding_entity): Require both
WMB_Using and WMB_Export for GMF entities.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-14.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Co-authored-by: Patrick Palka <ppalka@redhat.com>
I wonder if more generally we need to be doing more work when importing
definitions from header units especially to handle all the work that
'make_rtl_for_nonlocal_decl' and 'rest_of_decl_compilation' would have
been performing. But this patch fixes at least one missing step.
PR c++/106820
gcc/cp/ChangeLog:
* module.cc (trees_in::decl_value): Assemble alias when needed.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr106820_a.H: New test.
* g++.dg/modules/pr106820_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>