Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
The unwinding mechanism registers both the code range and the unwind
table itself within a b-tree lookup structure. That data structure
assumes that it consists of non-overlapping intervals. This
becomes a problem if the unwinding table is embedded within the
code itself, as now the intervals do overlap.
To fix this problem we now keep the unwind tables in a separate
b-tree, which prevents the overlap.
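For illustration, here is a minimal conceptual sketch of the invariant that
gets violated (the helper and bounds are hypothetical, not taken from
unwind-dw2-fde.c):
```
#include <stdint.h>

/* The lookup tree assumes disjoint [begin, end) ranges; when the unwind
   table is placed inside the code range it describes, the two registered
   entries overlap in exactly this sense.  */
static int
ranges_overlap (uintptr_t begin1, uintptr_t end1,
                uintptr_t begin2, uintptr_t end2)
{
  return begin1 < end2 && begin2 < end1;
}
```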
libgcc/ChangeLog:
PR libgcc/111731
* unwind-dw2-fde.c: Split unwind ranges if they contain the
unwind table.
Knuth's division algorithm relies on the number of dividend limbs
being greater than or equal to the number of divisor limbs, which is why
I've added a special case for un < vn at the start of __divmodbitint4.
Unfortunately, my assumption that this implies abs(v) > abs(u), and so the
quotient must be 0 and the remainder the same as the dividend, is incorrect,
because this check is done before negation of the operands.
While bitint_reduce_prec reduces precision from clearly useless limbs,
the problematic case is when the dividend is unsigned or non-negative
and the divisor is negative. We can have limbs (from MS to LS):
dividend: 0 M ?...
divisor: -1 -N ?...
where M has the most significant bit set and M >= N (if M == N then
the following limbs also matter) and the most significant limbs can
even be partial. In this case, the quotient should be -1 rather than
0. bitint_reduce_prec will reduce the precision of the dividend so
that M is the most significant limb, but can't reduce precision of the
divisor to more than having the -1 as most significant limb, because
-N doesn't have the most significant bit set.
The following patch fixes it by detecting this problematic case in the
un < vn handling and, instead of assuming q is 0 and r is u, decreasing
vn by 1, because the later code will negate the divisor and after negation
it can be expressed in one fewer limb.
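As an illustration, a minimal sketch of the problematic shape (assumed; the
committed test is gcc.dg/torture/bitint-65.c and is not reproduced here, and
the function name is made up):
```
/* With 64-bit limbs the unsigned dividend reduces to the single limb
   M = 0xf000000000000000 with its MSB set (un == 1), while the negative
   divisor -0x9000000000000000 still needs two limbs before negation
   (vn == 2).  Despite un < vn, abs(u) > abs(v), so the quotient must be
   -1 and the remainder 0x6000000000000000, not 0 and u.  */
#if __BITINT_MAXWIDTH__ >= 129
__attribute__((noipa)) _BitInt(129)
quot (unsigned _BitInt(128) u, _BitInt(129) v)
{
  /* e.g. quot (0xf000000000000000uwb, -0x9000000000000000wb) == -1 */
  return u / v;
}
#endif
```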
2024-03-21 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114397
* libgcc2.c (__divmodbitint4): Don't assume un < vn always means
abs(v) > abs(u), check for a special case of un + 1 == vn where
u is non-negative and v negative and after v's negation vn could
be reduced by 1.
* gcc.dg/torture/bitint-65.c: New test.
Tested with some simple toy examples where an exception is thrown in the
signal handler.
libgcc/ChangeLog:
* config/i386/gnu-unwind.h: Support unwinding x86_64 signal frames.
Signed-off-by: Flavio Cruz <flaviocruz@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
While for __mulbitint3 we actually don't negate anything and perform the
multiplication in unsigned style always, for __divmodbitint4 if the operands
aren't unsigned and are negative, we negate them first and then
negate the results as needed at the end.
The quotient is negated if exactly one of the operands was negated,
and the remainder is negated if the first operand was negated.
The case which doesn't work correctly is when, due to the limited range of
the operands, we perform the division/modulo in some smaller number of limbs
and then extend the result to the desired precision of the quotient and/or
remainder. If they aren't negated, the extension is done with
memset to 0; if they are negated, the extension is done with memset
to -1. The problem is that if the quotient or remainder is zero,
then bitint_negate negates it again to zero (that is ok), but we should
then extend with memset to 0, not memset to -1.
The following patch achieves that by letting bitint_negate also check if
the negated operand is zero and changing the memset argument based on that.
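An illustrative sketch of the failing shape (assumed; the committed test is
gcc.dg/torture/bitint-63.c and is not reproduced here, and the function name
is made up):
```
/* Exactly one operand is negative, so the zero quotient gets "negated";
   the extension of its upper limbs must then use memset to 0, not -1,
   otherwise a zero quotient turns into a huge negative value.  */
#if __BITINT_MAXWIDTH__ >= 256
__attribute__((noipa)) _BitInt(256)
quot (_BitInt(256) a, _BitInt(256) b)
{
  /* e.g. quot (-5wb, 7wb) must be 0, with all upper limbs zero.  */
  return a / b;
}
#endif
```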
2024-03-15 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114327
* libgcc2.c (bitint_negate): Return UWtype bitwise or of all the limbs
before negation rather than void.
(__divmodbitint4): Determine whether to fill in the upper limbs after
negation based on whether bitint_negate returned 0 or non-zero, rather
than always filling with -1.
* gcc.dg/torture/bitint-63.c: New test.
This arranges that the byte order of the instruction sequences is
independent of the byte order of memory.
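For illustration, a minimal sketch of the encoding idea (the instruction and
array name are examples only, not the actual trampoline contents):
```
#include <stdint.h>

/* Encoding an A64 instruction as explicit bytes fixes its in-memory
   layout regardless of host endianness; a uint32_t initializer such as
   0xd503201f would instead be stored in host byte order.  */
static const uint8_t insn_nop[4] = { 0x1f, 0x20, 0x03, 0xd5 };  /* nop */
```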
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c
(aarch64_trampoline_insns): Arrange to encode instructions as a
byte array so that the order is independent of memory byte order.
(struct aarch64_trampoline): Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
This allows the same trampoline pattern to be used on all linux variants
rather than restricting it to linux-gnu.
PR target/113971
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Allow all linux variants.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Fix a typo in __gthr_win32_abs_to_rel_time that caused it to return a
relative time in seconds instead of milliseconds. As a consequence,
__gthr_win32_cond_timedwait called SleepConditionVariableCS with a
1000x shorter timeout; this caused ~1000x more spurious wakeups in
CV timed waits such as std::condition_variable::wait_for or wait_until,
resulting generally in much higher CPU usage.
This can be demonstrated by this sample program:
```
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::condition_variable cv;
    std::mutex mx;
    bool pass = false;
    auto thread_fn = [&](bool timed) {
        int wakeups = 0;
        using sc = std::chrono::system_clock;
        auto before = sc::now();
        std::unique_lock<std::mutex> ml(mx);
        if (timed) {
            cv.wait_for(ml, std::chrono::seconds(2), [&]{
                ++wakeups;
                return pass;
            });
        } else {
            cv.wait(ml, [&]{
                ++wakeups;
                return pass;
            });
        }
        printf("pass: %d; wakeups: %d; elapsed: %d ms\n", pass, wakeups,
               int((sc::now() - before) / std::chrono::milliseconds(1)));
        pass = false;
    };
    {
        // timed wait, let expire
        std::thread t(thread_fn, true);
        t.join();
    }
    {
        // timed wait, wake up explicitly after 1 second
        std::thread t(thread_fn, true);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::unique_lock<std::mutex> ml(mx);
            pass = true;
        }
        cv.notify_all();
        t.join();
    }
    {
        // non-timed wait, wake up explicitly after 1 second
        std::thread t(thread_fn, false);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::unique_lock<std::mutex> ml(mx);
            pass = true;
        }
        cv.notify_all();
        t.join();
    }
    return 0;
}
```
On builds based on non-affected threading models (e.g. POSIX on Linux,
or winpthreads or MCF on Win32) the output is something like
```
pass: 0; wakeups: 2; elapsed: 2000 ms
pass: 1; wakeups: 2; elapsed: 991 ms
pass: 1; wakeups: 2; elapsed: 996 ms
```
while with the Win32 threading model we get
```
pass: 0; wakeups: 1418; elapsed: 2000 ms
pass: 1; wakeups: 479; elapsed: 988 ms
pass: 1; wakeups: 2; elapsed: 992 ms
```
(notice the huge number of wakeups in the timed wait cases only).
This commit fixes the conversion, adjusting the final division by
NSEC100_PER_SEC to use NSEC100_PER_MSEC instead (which was already defined
in the file and not used anywhere else, so this was probably just a typo).
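A minimal sketch of the corrected conversion (assumed, simplified shape, not
the actual gthr-win32-cond.c code; only the arithmetic meaning of the
constants is taken from the description above):
```
/* 100-ns units per second and per millisecond.  */
#define NSEC100_PER_SEC  10000000LL
#define NSEC100_PER_MSEC 10000LL

/* Convert an absolute time to the relative millisecond count that
   SleepConditionVariableCS expects; the bug divided by NSEC100_PER_SEC,
   yielding seconds and thus ~1000x shorter timeouts.  */
static long long
abs_to_rel_ms (long long abs_time_100ns, long long now_100ns)
{
  long long delta = abs_time_100ns - now_100ns;
  return delta > 0 ? delta / NSEC100_PER_MSEC : 0;
}
```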
libgcc/ChangeLog:
PR libgcc/113850
* config/i386/gthr-win32-cond.c (__gthr_win32_abs_to_rel_time):
Fix absolute timespec to relative milliseconds count
conversion (it incorrectly returned seconds instead of
milliseconds); this avoids spurious wakeups in
__gthr_win32_cond_timedwait.
Add x32 and IBT support to x86 heap trampoline implementation with a
testcase.
2024-02-13 Jakub Jelinek <jakub@redhat.com>
H.J. Lu <hjl.tools@gmail.com>
libgcc/
PR target/113855
* config/i386/heap-trampoline.c (trampoline_insns): Add IBT
support and pad to the multiple of 4 bytes. Use movabsq
instead of movabs in comments. Add -mx32 variant.
gcc/testsuite/
PR target/113855
* gcc.dg/heap-trampoline-1.c: New test.
* lib/target-supports.exp (check_effective_target_heap_trampoline):
New.
As I wrote earlier, I was seeing
FAIL: gcc.dg/torture/bitint-24.c -O0 execution test
FAIL: gcc.dg/torture/bitint-24.c -O2 execution test
with the ia32 _BitInt enablement patch on i686-linux. I thought
floatbitintxf.c was miscompiled with -O2 -march=i686 -mtune=generic, but it
turned out to be UB in it.
If a signed _BitInt to be converted to binary floating point has
(after sign extension from possible partial limb to full limb) one or
more most significant limbs equal to all ones, and the limb below them
(the most significant non-~(UBILtype)0 limb) has its most significant bit
clear, like for 32-bit limbs
0x81582c05U, 0x0a8b01e4U, 0xc1b8b18fU, 0x2aac2a08U, -1U, -1U
then bitint_reduce_prec can't reduce it to that 0x2aac2a08U limb, so
msb is all ones and precision is negative (so it reduced precision from
161 to 192 bits down to 160 bits; in theory it could go as low as 129 bits
but that wouldn't change anything about the following behavior).
But still iprec is negative, -160 here.
For that case (i.e. where we are dealing with a negative input), the
code was using 65 - __builtin_clzll (~msb) to compute how many relevant
bits we have in the msb. Unfortunately that invokes UB when msb is all ones.
The right number of relevant bits in that case is 1 though (just as for
-2 it is 2, and for -4 or -3 it is 3, as already computed); all we care about
is that the most significant bit is set (i.e. the number is negative) and that
the bits below it should be supplied from the limbs below.
So, the following patch fixes it by special casing it not to invoke UB.
For msb 0 we already have a special case from before (but that is also
different because msb 0 implies the whole number is 0 given the way
bitint_reduce_prec works - even if we have limbs like ..., 0x80000000U, 0U
the reduction can skip the most significant limb and msb then would be
the one below it), so if iprec > 0, we already don't call __builtin_clzll
on 0.
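A self-contained sketch of the fixed computation (not the actual
FP_FROM_BITINT macro body; the helper name is made up):
```
#include <stdint.h>

/* Number of relevant bits in the most significant limb of a negative
   reduced value; __builtin_clzll (0) is undefined, so the all-ones limb
   is special-cased to 1 (only the sign bit matters there).  */
static inline int
msb_relevant_bits (uint64_t msb)
{
  if (msb == ~(uint64_t) 0)
    return 1;
  return 65 - __builtin_clzll (~msb);  /* e.g. -2 -> 2, -3 or -4 -> 3 */
}
```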
2024-02-13 Jakub Jelinek <jakub@redhat.com>
* soft-fp/bitint.h (FP_FROM_BITINT): If iprec < 0 and msb is all ones,
just set n to 1 instead of using __builtin_clzll (~msb).
The initial heap trampoline implementation was targeting 64b
platforms. As the PR demonstrates, this creates an issue where the
same symbols are expected to be exported for 32 and 64b.
Rather than conditionalizing the exports and code-gen on x86_64,
this patch provides a basic implementation of the IA32 trampoline.
This also avoids potential user confusion when a 32b target has
64b multilibs, and vice versa, which is the case for Darwin.
PR target/113855
gcc/ChangeLog:
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): Moved to be
available to all sub-targets.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): Delete.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): Delete.
libgcc/ChangeLog:
* config.host: Add trampoline support to x?86-linux.
* config/i386/heap-trampoline.c (trampoline_insns): Provide
a variant for IA32.
(union ix86_trampoline): Likewise.
(__gcc_nested_func_ptr_created): Implement a basic trampoline
for IA32.
The ia32 _BitInt support revealed a bug in floatbitint?d.c.
As can even be guessed from how the code is written in the loop,
the intention was to set inexact to non-zero whenever the remainder
after division wasn't zero, but I ended up just checking whether
the 2 least significant limbs of the remainder were non-zero.
Now, in the dfp/bitint-4.c test, in one case the remainder happens
to have its least significant 64 bits zero while the higher limbs are
non-zero; with 32-bit limbs that means the 2 least significant limbs are zero
and so the code acted as if the value was exactly divisible.
Fixed thusly.
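A self-contained sketch of the intended check (not the actual soft-fp loop;
the names are illustrative):
```
#include <stddef.h>
#include <stdint.h>

/* "inexact" must reflect every remainder limb, not just the first two;
   with 32-bit limbs a remainder whose low 64 bits are zero would
   otherwise look exactly divisible.  */
static inline int
remainder_is_nonzero (const uint32_t *rem, size_t nlimbs)
{
  uint32_t inexact = 0;
  for (size_t i = 0; i < nlimbs; ++i)
    inexact |= rem[i];
  return inexact != 0;
}
```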
2024-02-10 Jakub Jelinek <jakub@redhat.com>
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): Or in all remainder
limbs into inexact rather than just first two.
* soft-fp/floatbitintsd.c (__bid_floatbitintsd): Likewise.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
Last night I tried to enable _BitInt support for i?86-linux, and
a few spots in libgcc emitted -Wshift-count-overflow warnings and clearly
didn't do what they were supposed to do.
Fixed thusly.
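For illustration only, the general class of bug such warnings point at
(generic C, not the soft-fp sources):
```
#include <stdint.h>

/* With 32-bit limbs, shifting a 32-bit value by 32 or more is undefined
   and triggers -Wshift-count-overflow; a 64-bit quantity has to be
   split across two limbs instead of shifted within one.  */
static inline void
store64 (uint32_t *limbs, uint64_t v)
{
  limbs[0] = (uint32_t) v;
  limbs[1] = (uint32_t) (v >> 32);  /* not ((uint32_t) v) >> 32, which is UB */
}
```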
2024-02-10 Jakub Jelinek <jakub@redhat.com>
* soft-fp/fixddbitint.c (__bid_fixddbitint): Fix up
BIL_TYPE_SIZE == 32 shifts.
* soft-fp/fixsdbitint.c (__bid_fixsdbitint): Likewise.
* soft-fp/fixtdbitint.c (__bid_fixtdbitint): Likewise.
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): Likewise.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
Some exports were missed in the GCC-13 cycle; these are added here
along with the bitint-related ones added in GCC-14.
libgcc/ChangeLog:
* config/i386/libgcc-darwin.ver: Export bf and bitint-related
symbols.
As reported in the PR, all libgcc x86 symbol versions added after
GCC_7.0.0 were only added to i386/libgcc-glibc.ver, missing all of
libgcc-sol2.ver, libgcc-bsd.ver, and libgcc-darwin.ver.
This patch fixes this for Solaris/x86, adding all of them
(GCC_1[234].0.0) as GCC_14.0.0 to not retroactively change history.
Since this isn't the first time this has happened, I've added a note to the
end of libgcc-glibc.ver to request notifying other maintainers in case
of additions.
Tested on i386-pc-solaris2.11.
2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libgcc:
PR target/113700
* config/i386/libgcc-sol2.ver (GCC_14.0.0): Added all symbols from
i386/libgcc-glibc.ver (GCC_12.0.0, GCC_13.0.0, GCC_14.0.0).
* config/i386/libgcc-glibc.ver: Request notifications on updates.
SEH _Unwind_Resume_or_Rethrow invokes abort directly if
_Unwind_RaiseException doesn't manage to find a handler for the rethrown
exception; this is incorrect, as in this case std::terminate should be
invoked, allowing an application-provided terminate handler to handle
the situation instead of crashing the application outright through
abort.
The bug can be demonstrated with this simple test case:
===
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <exception>

static void custom_terminate_handler() {
    fprintf(stderr, "custom_terminate_handler invoked\n");
    std::exit(1);
}

int main(int argc, char *argv[]) {
    std::set_terminate(&custom_terminate_handler);
    if (argc < 2) return 1;
    const char *mode = argv[1];
    fprintf(stderr, "%s\n", mode);
    if (strcmp(mode, "throw") == 0) {
        throw std::exception();
    } else if (strcmp(mode, "rethrow") == 0) {
        try {
            throw std::exception();
        } catch (...) {
            throw;
        }
    } else {
        return 1;
    }
    return 0;
}
===
On all gcc builds with non-SEH exceptions, this will print
"custom_terminate_handler invoked" whether launched as ./a.out throw or
as ./a.out rethrow; on SEH builds instead it will work as expected only
with ./a.exe throw, but will crash with the "built-in" abort message
with ./a.exe rethrow.
This patch fixes the problem by forwarding the error code back to the
caller (__cxa_rethrow), which calls std::terminate if
_Unwind_Resume_or_Rethrow returns.
The change makes the code path consistent with SEH _Unwind_RaiseException,
and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc
(used for the SjLj and Dw2 exception backends).
libgcc/ChangeLog:
PR libgcc/113337
* unwind-seh.c (_Unwind_Resume_or_Rethrow): Forward
_Unwind_RaiseException return code back to the caller instead of
calling abort, allowing __cxa_rethrow to invoke std::terminate
in case of an uncaught rethrown exception.
The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.
The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
  // Compute estimate qhat of q[j].
  qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
  rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
again:
  if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
    { qhat = qhat - 1;
      rhat = rhat + vn[n-1];
      if (rhat < b) goto again;
    }
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient. And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1. The algorithm performs an earlier canonicalization
where both the divisor and dividend are shifted left such that the divisor
has its msb set. If it already had its msb set before, no shifting occurs but
we start with an added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and we can't get a too-high qhat. If shifting occurs, the first
limb of the dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration uv1 in the uv1:uv0 double-word
doesn't have its msb set while vv1 does, and qhat has to fit into a word.
In the following iterations, the previous iteration should guarantee that
the previous quotient digit is correct. Even if the divisor were the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
were larger than or equal to the divisor, the previous quotient digit
would increase and the divisor would be subtracted once more, which I think
implies that in the next iteration uv1 <= vv1 in the uv1:uv0 double-word,
but uv0 could be up to all ones, e.g. when all lower limbs
of the divisor are all ones and at least one dividend limb below uv0
is not all ones. So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into the UWtype result;
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into the UWtype result.
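For a concrete, self-contained illustration of those two extra cases (not
libgcc code; unsigned __int128 is used only to show the textbook estimate):
```
#include <stdint.h>

/* The textbook two-word-by-one-word estimate; it does not fit into a
   64-bit word in the uv1 == vv1 cases described above.  */
static unsigned __int128
textbook_qhat (uint64_t uv1, uint64_t uv0, uint64_t vv1)
{
  return (((unsigned __int128) uv1 << 64) | uv0) / vv1;
}
/* textbook_qhat (0x8000000000000000, 0xffffffffffffffff, 0x8000000000000000)
     == 0x10000000000000001  (uv1 == vv1 && uv0 >= uv1),
   textbook_qhat (0x8000000000000000, 42, 0x8000000000000000)
     == 0x10000000000000000  (uv1 == vv1 && uv0 < uv1).  */
```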
Because we only have udiv_qrnnd, which can't deal with those too-large
cases (it SIGFPEs or otherwise invokes undefined behavior on them), I
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for another
I actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take the
if (qhat >= b) branch, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits, we can go to the again label in the uv1 < vv1
code). rhat in this case is uv0, and rhat + vv1 can but doesn't
have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take the if (qhat >= b) branch, decrement qhat by 1 (it becomes b) and add
vn[n-1] aka vv1 to rhat. But because vv1 has its msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take the if (qhat >= b) branch and
decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again. But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has its msb set, so already vv1 + vv1 must overflow. And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from the uv1:uv0 double-word once, then checking
again whether uv1 >= vv1. If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat; no __builtin_add_overflow is needed
because we know it always overflows and so we won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow
when adding vv1 to rhat, because it can but doesn't have to overflow.
I've added an extra testcase which tests the behavior of all the changed
cases, so it has a case where uv1:uv0 / vv1 is 1:1, where it is
1:0 and rhat + vv1 overflows and where it is 1:0 and rhat + vv1 does not
overflow, and includes tests also from Zdenek's other failing tests.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113604
* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
vv1 from uv1:uv0 once or twice as needed, rather than
subtracting vv1:vv1.
* gcc.dg/torture/bitint-53.c: New test.
* gcc.dg/torture/bitint-55.c: New test.
Rainer pointed out that __PFX__ and __FIXPTPFX__ prefix replacement is done
solely for libgcc-std.ver.in and not for the *.ver files in config.
I've used the __PFX__ prefix even in config/i386/libgcc-glibc.ver because it
was used for similar symbols in libgcc-std.ver.in, and that results in those
symbols being STB_LOCAL in libgcc_s.so.1. Tests still work because gcc by
default uses -static-libgcc when linking (unlike g++ etc.), but they would
fail when using -shared-libgcc (though I see nothing in the testsuite
actually testing with -shared-libgcc, so I am not adding tests).
With the patch, libgcc_s.so.1 now exports
__fixtfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__fixxfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintbf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinthf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinttf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintxf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
on x86_64-linux which it wasn't before.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR target/113700
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Remove __PFX prefixes
from symbol names.
I'm seeing hundreds of
In file included from ../../../libgcc/libgcc2.c:56:
../../../libgcc/libgcc2.h:32:13: warning: conflicting types for built-in function ‘__gcc_nested_func_ptr_created’; expected ‘void(void *, void *, void *)’ [-Wbuiltin-declaration-mismatch]
32 | extern void __gcc_nested_func_ptr_created (void *, void *, void **);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warnings.
Either we need to add, like in r14-6218,
#pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
(but in that case, because of the libgcc2.h prototype (why is it there?),
it would also need #pragma GCC diagnostic push/pop around it),
or we could just follow how the builtins are prototyped on the
compiler side and only cast to void ** when dereferencing (which happens in
a single spot in each TU).
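A sketch of the second option's shape (assumed; the parameter names and the
placeholder body are made up, see the ChangeLog below for where the real
changes went):
```
/* Prototype matches the builtin, void (void *, void *, void *); the cast
   to void ** happens only at the single dereference site.  */
void
__gcc_nested_func_ptr_created (void *chain, void *func, void *dst)
{
  void *trampoline = func;    /* placeholder for the allocated trampoline */
  (void) chain;
  *(void **) dst = trampoline;
}
```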
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113402
* libgcc2.h (__gcc_nested_func_ptr_created): Change type of last
argument from void ** to void *.
* config/i386/heap-trampoline.c (__gcc_nested_func_ptr_created):
Change type of dst from void ** to void * and cast dst to void **
before dereferencing it.
* config/aarch64/heap-trampoline.c (__gcc_nested_func_ptr_created):
Likewise.
I'm seeing
../../../libgcc/shared-object.mk:14: warning: overriding recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:14: warning: ignoring old recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:17: warning: overriding recipe for target 'heap-trampoline_s.o'
../../../libgcc/shared-object.mk:17: warning: ignoring old recipe for target 'heap-trampoline_s.o'
This patch fixes that.
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113403
* config/i386/t-heap-trampoline: Add to LIB2ADDEHSHARED
i386/heap-trampoline.c rather than aarch64/heap-trampoline.c.
Use aarch64-asm.h in asm code consistently; this was started in
commit c608ada288
Author: Zac Walker <zacwalker@microsoft.com>
CommitDate: 2024-01-23 15:32:30 +0000
Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target
But that commit failed to remove some existing markings from asm files,
which means some objects got double marked with gnu property notes.
libgcc/ChangeLog:
* config/aarch64/crti.S: Remove stack marking.
* config/aarch64/crtn.S: Remove stack marking, include aarch64-asm.h
* config/aarch64/lse.S: Remove stack and GNU property markings.
In order to handle system security constraints during GCC build
and test, and the fact that most platform versions cannot link to libgcc_eh
(since the unwinder there is incompatible with the system one), we do the
following:
1. We make the support functions weak definitions.
2. We include them as a CRT for platform conditions that do not
allow libgcc_eh.
3. We ensure that the weak symbols are exported from DSOs (which
includes exes on Darwin) so that the dynamic linker will
pick one instance (which avoids duplication of trampoline
caches).
PR libgcc/113403
gcc/ChangeLog:
* config/darwin.h (DARWIN_SHARED_WEAK_ADDS, DARWIN_WEAK_CRTS): New.
(REAL_LIBGCC_SPEC): Move weak CRT handling to separate spec.
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/rs6000/darwin.h (DARWIN_HEAP_T_LIB): New.
libgcc/ChangeLog:
* config.host: Build libheap_t.a for i686/x86_64 Darwin.
* config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/i386/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/t-darwin: Build libheap_t.a (a CRT with heap trampoline
support).
This removes the heap trampoline support functions from libgcc.a and
adds them to libgcc_eh.a. They are also present in libgcc_s.
PR libgcc/113403
libgcc/ChangeLog:
* config/aarch64/t-heap-trampoline: Move the heap trampoline
support functions from libgcc.a to libgcc_eh.a.
* config/i386/t-heap-trampoline: Likewise.
The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version; the following patch fixes that.
As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*. In carrying out the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.
PR libgcc/113402
gcc/ChangeLog:
* builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED
and BUILT_IN_GCC_NESTED_PTR_DELETED.
* builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED,
BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
rename the library fallbacks to __gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
and __gcc_nested_func_ptr_deleted.
* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED.
* tree.cc (build_common_builtin_nodes): Build the
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local
builtins only for non-explicit.
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
define_expand, and rename the old one to ...
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
(<u>mulqihi3_scalar): Likewise.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
libgomp/ChangeLog:
* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
(omp_get_wtime): Add RDNA3-compatible variant.
* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
Signed-off-by: Andrew Stubbs <ams@baylibre.com>
A recent
change (https://gcc.gnu.org/pipermail/gcc-cvs/2023-December/394915.html)
added generic SME support using `.hidden`, `.type`, and `.size`
pseudo-ops in the assembly sources, but `aarch64-w64-mingw32` does not
support those pseudo-ops. This patch wraps the usage of those
pseudo-ops in macros and ifdefs them on the `__ELF__` define.
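A sketch of the macro shape (assumed; the real definitions live in
config/aarch64/aarch64-asm.h and may differ):
```
#ifdef __ELF__
# define HIDDEN(name)       .hidden name
# define SYMBOL_TYPE(name)  .type name, %function
# define SYMBOL_SIZE(name)  .size name, .-name
#else
# define HIDDEN(name)
# define SYMBOL_TYPE(name)
# define SYMBOL_SIZE(name)
#endif
```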
libgcc/
* config/aarch64/aarch64-asm.h (HIDDEN, SYMBOL_SIZE, SYMBOL_TYPE)
(ENTRY_ALIGN, GNU_PROPERTY): New macros.
* config/aarch64/__arm_sme_state.S: Use them.
* config/aarch64/__arm_tpidr2_save.S: Likewise.
* config/aarch64/__arm_za_disable.S: Likewise.
* config/aarch64/crti.S: Likewise.
* config/aarch64/lse.S: Likewise.
As discussed on IRC, the following patch uses the may_alias attribute, so that
on targets like aarch64, where abi_limb_mode != limb_mode, the library
accesses the limbs (half limbs of the ABI) in the arrays with a conservative
alias set.
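A minimal sketch of the typedef, assuming UWtype is the underlying limb type
(the real declaration is in libgcc2.h):
```
/* Accesses through UBILtype * use the conservative alias set, so limb
   loads and stores may alias objects of any other type.  */
typedef UWtype UBILtype __attribute__ ((__may_alias__));
```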
2024-01-12 Jakub Jelinek <jakub@redhat.com>
* libgcc2.h (UBILtype): New typedef with may_alias attribute.
(__mulbitint3, __divmodbitint4): Use UBILtype * instead of
UWtype * and const UBILtype * instead of const UWtype *.
* libgcc2.c (bitint_reduce_prec, bitint_mul_1, bitint_addmul_1,
__mulbitint3, bitint_negate, bitint_submul_1, __divmodbitint4):
Likewise.
* soft-fp/bitint.h (UBILtype): Change define into a typedef with
may_alias attribute.