This modifies the special __tls_get_addr stub that checks for a
tlsdesc style __tls_index entry and returns early. Not using r11
isn't much benefit at the moment but a followup patch will preserve
regs around the first call to __tls_get_addr when the __tls_index
entry isn't yet set up for an early return.
bfd/
* elf64-ppc.c (LD_R11_0R3, CMPDI_R11_0, STD_R11_0R1, LD_R11_0R1),
(MTLR_R11): Don't define.
(LD_R0_0R3, CMPDI_R0_0): Define.
(build_tls_get_addr_stub): Don't use r11 in stub.
ld/
* testsuite/ld-powerpc/tlsexe.d: Match new __tls_get_addr stub.
* testsuite/ld-powerpc/tlsexeno.d: Likewise.
* testsuite/ld-powerpc/tlsexetoc.d: Likewise.
* testsuite/ld-powerpc/tlsexetocno.d: Likewise.
* testsuite/ld-powerpc/tlsopt5.d: Likewise.
This is in preparation for the next patch adding Spectre variant 2
mitigation for PowerPC and PowerPC64. Besides tidying code involved
in stub output (to reduce the number of places where bctr is output),
the patch adds some user visible features:
1) PowerPC64 ELFv2 global entry stubs now are aligned under the
control of --plt-align, with a default alignment of 32 bytes.
2) PowerPC64 __glink_PLTresolve is no longer padded out with nops.
3) PowerPC32 PLT stubs are aligned under the control of --plt-align,
with the default alignment being 16 bytes as before.
4) The PowerPC32 branch/nop table emitted before __glink_PLTresolve
is now smaller in many cases. It was sized incorrectly when the
__tls_get_addr_opt stub was used, and unnecessarily included space
for local ifuncs.
bfd/
* elf32-ppc.c (GLINK_ENTRY_SIZE): Add parameters, handle
__tls_get_addr_opt, and alignment sizing.
(TLS_GET_ADDR_GLINK_SIZE): Delete.
(is_nonpic_glink_stub): Don't use GLINK_ENTRY_SIZE.
(ppc_elf_get_synthetic_symtab): Recognize stubs spaced at 4, 6,
or 8 insns.
(ppc_elf_link_hash_table_create): Init new ppc_elf_params field.
(allocate_dynrelocs): Use new GLINK_ENTRY_SIZE.
(ppc_elf_size_dynamic_sections): Likewise. Size branch table
by PLT reloc count.
(write_glink_stub): Handle __tls_get_addr_opt stub.
Pad out to size given by GLINK_ENTRY_SIZE.
(ppc_elf_relocate_section): Adjust write_glink_stub call.
(ppc_elf_finish_dynamic_symbol): Likewise.
(ppc_elf_finish_dynamic_sections): Write PLTresolve without using
insn array since so many need rewriting.
* elf32-ppc.h (struct ppc_elf_params): Add plt_stub_align.
* elf64-ppc.c (GLINK_PLTRESOLVE_SIZE): Rename from
GLINK_CALL_STUB_SIZE. Add htab param and evaluate to size without
nops. Adjust all uses.
(ppc64_elf_get_synthetic_symtab): Don't use GLINK_CALL_STUB_SIZE
in glink_vma calculation.
(struct ppc_link_hash_table): Add global_entry section pointer.
(create_linkage_sections): Create separate section for global
entry stubs.
(PPC_LO, PPC_HI, PPC_HA): Move earlier.
(size_global_entry_stubs): Handle sizing for aligned stubs.
(ppc64_elf_size_dynamic_sections): Handle global_entry alloc,
and don't stash end of glink branch table in rawsize.
(ppc_build_one_stub): Rewrite stub size calculations.
(build_global_entry_stubs): Use new section.
(ppc64_elf_build_stubs): Don't pad __glink_PLTresolve with nops.
Build lazy link stubs out to end of section. Build global entry
stubs in new section.
gold/
* options.h (plt_align): Support for PowerPC32 too.
* powerpc.cc (Stub_table::stub_align): Heed --plt-align for 32-bit.
(Stub_table::plt_call_size, branch_stub_size): Tidy.
(Stub_table::plt_call_align): Implement using stub_align.
(Output_data_glink::global_entry_align): New function.
(Output_data_glink::global_entry_off): New function.
(Output_data_glink::global_entry_address): Use global_entry_off.
(Output_data_glink::pltresolve_size): New function, replacing
pltresolve_size_ constant. Update all uses.
(Output_data_glink::add_global_entry): Align offset.
(Output_data_glink::set_final_data_size): Use global_entry_align.
(Stub_table::do_write): Don't pad __glink_PLTrelsolve with nops.
Tidy stub output. Use global_entry_off.
ld/
* emultempl/ppc32elf.em (params): Init new field.
(enum ppc32_opt): New enum to define OPTION_* values. Add
OPTION_PLT_ALIGN and OPTION_NO_PLT_ALIGN.
(PARSE_AND_LIST_LONGOPTS): Handle new options.
(PARSE_AND_LIST_ARGS_CASES): Likewise.
(PARSE_AND_LIST_OPTIONS): Likewise. Break up help output.
* emultempl/ppc64elf.em (ppc_add_stub_section): Init alignment
correctly for negative --plt-stub-align.
* testsuite/ld-powerpc/elfv2exe.d,
* testsuite/ld-powerpc/elfv2so.d,
* testsuite/ld-powerpc/relbrlt.d,
* testsuite/ld-powerpc/relbrlt.s,
* testsuite/ld-powerpc/tlsexe.d,
* testsuite/ld-powerpc/tlsexe.r,
* testsuite/ld-powerpc/tlsexe32.d,
* testsuite/ld-powerpc/tlsexe32.g,
* testsuite/ld-powerpc/tlsexe32.r,
* testsuite/ld-powerpc/tlsexetoc.d,
* testsuite/ld-powerpc/tlsexetoc.r,
* testsuite/ld-powerpc/tlsopt5_32.d,
* testsuite/ld-powerpc/tlsso.d,
* testsuite/ld-powerpc/tlstocso.d: Update for changed stub order.
In the TLS GD/LD to LE optimization, ld replaces a sequence like
addi 3,2,x@got@tlsgd R_PPC64_GOT_TLSGD16 x
bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x
R_PPC64_REL24 __tls_get_addr
nop
with
addis 3,13,x@tprel@ha R_PPC64_TPREL16_HA x
addi 3,3,x@tprel@l R_PPC64_TPREL16_LO x
nop
When the tprel offset is small, this can be further optimized to
nop
addi 3,13,x@tprel
nop
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add do_tls_opt.
(ppc64_elf_tls_optimize): Set it.
(ppc64_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO and TPREL16_LO_DS relocs to use r13 when
addis would add zero.
* elf32-ppc.c (struct ppc_elf_link_hash_table): Add do_tls_opt.
(ppc_elf_tls_optimize): Set it.
(ppc_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO relocs to use r2 when addis would add zero.
gold/
* powerpc.cc (Target_powerpc::Relocate::relocate): Nop addis on
TPREL16_HA, and convert insn on TPREL16_LO and TPREL16_LO_DS
relocs to use r2/r13 when addis would add zero.
ld/
* testsuite/ld-powerpc/tls.s: Add calls with tls markers.
* testsuite/ld-powerpc/tls32.s: Likewise.
* testsuite/ld-powerpc/powerpc.exp: Run tls marker tests.
* testsuite/ld-powerpc/tls.d: Adjust for TPREL16_HA/LO optimization.
* testsuite/ld-powerpc/tlsexe.d: Likewise.
* testsuite/ld-powerpc/tlsexetoc.d: Likewise.
* testsuite/ld-powerpc/tlsld.d: Likewise.
* testsuite/ld-powerpc/tlsmark.d: Likewise.
* testsuite/ld-powerpc/tlsopt4.d: Likewise.
* testsuite/ld-powerpc/tlstoc.d: Likewise.
There isn't a good reason for ld.bfd to behave differently from gold
in the code generated by TLS GD/LD to LE optimization.
bfd/
* elf64-ppc.c (ppc64_elf_relocate_section): When optimizing
__tls_get_addr call sequences to LE, don't move the addi down
to the nop. Replace the bl with addi and leave the nop alone.
ld/
* testsuite/ld-powerpc/tls.d: Update.
* testsuite/ld-powerpc/tlsexe.d: Update.
* testsuite/ld-powerpc/tlsexetoc.d: Update.
* testsuite/ld-powerpc/tlsld.d: Update.
* testsuite/ld-powerpc/tlsmark.d: Update.
* testsuite/ld-powerpc/tlsopt4.d: Update.
* testsuite/ld-powerpc/tlstoc.d: Update.
This patch fixes a number of issues with powerpc dynamic relocations.
1) Both ppc and ppc64 were emitting more dynamic symbols and
relocations than necessary, due to not supporting static linker
resolution of tls_index entries for __tls_get_addr_opt. This meant
that any @got@tlsgd or @got@tlsld reloc needed to make their symbols
dynamic and generate dptmod and dtprel relocs for the dynamic linker.
That would have been passable, but what happened was that practically
all @got relocations resulted in their symbols being made dynamic and
dynamic relocations emitted against the GOT entries. (Mostly visible
on ppc32 executables since ppc64 gcc really only uses @got style
relocs for TLS.)
2) The PowerOpen syntax was not supported with __tls_get_addr_opt.
DTPMOD/DTPREL relocs on tls_index TOC entries did not use the trick of
forcing dynamic symbols and relocations so those entries always
resulted in the full __tls_get_addr processing. gcc doesn't use the
PowerOpen syntax for TLS, and normally such code would be optimized to
TLS IE or LE so the impact of missing this support was minimal.
3) In an executable, relocations against GNU indirect functions always
used the value of their PLT stub. While this is correct, it is
better in some cases to use a dynamic relocation. An extra dynamic
relocation can mean that calls via function pointers need not bounce
through the PLT stub at runtime.
The patch also tidies the PLT handling code in ppc32
allocate_dynrelocs. Allocating PLT entries after other dynamic relocs
allows the PLT loop to omit special handling for undefined weak
symbols, and that in turn allows the loop to be simplified.
bfd/
* elf32-ppc.c (ppc_elf_adjust_dynamic_symbol): Merge two cases
where dynamic relocs are preferable. Allow ifunc too.
(ensure_undefweak_dynamic): New function.
(allocate_dynrelocs): Use it here. Move plt handling last and
don't make symbols dynamic, simplifying loop. Only make undef
weak symbols with GOT entries dynamic. Correct condition
for GOT relocs. Handle dynamic relocs on ifuncs. Correct
comments. Remove goto.
(ppc_elf_relocate_section): Correct test for using dynamic
symbol on GOT relocs. Rearrange test for emitting GOT relocs
to suit. Set up explicit tls_index entries and implicit GOT
tls_index entries resolvable at link time for
__tls_get_addr_opt. Simplify test to clear mem for prelink.
* elf64-ppc.c (allocate_got): Correct condition for GOT relocs.
(ensure_undefweak_dynamic): New function.
(allocate_dynrelocs): Use it here. Only make undef weak symbols
with GOT entries dynamic. Remove unnecessary test of
WILL_CALL_FINISH_DYNAMIC_SYMBOL in PLT handling.
(ppc64_elf_relocate_section): Correct test for using dynamic
symbol on GOT relocs. Rearrange test for emitting GOT relocs
to suit. Set up explicit tls_index entries and implicit GOT
tls_index entries resolvable at link time for __tls_get_addr_opt.
Simplify expression to clear mem for prelink.
ld/
* testsuite/ld-powerpc/tlsexe.r: Update for fewer dynamic relocs
and symbols.
* testsuite/ld-powerpc/tlsexe.d: Likewise.
* testsuite/ld-powerpc/tlsexe.g: Likewise.
This change is to support the new ELFv2 ABI, which uses the value in
r12 on function entry to calculate the got/toc pointer.
bfd/
* elf64-ppc.c (build_plt_stub): Switch stubs to use r11 as base
reg and r12 as destination.
(ppc_build_one_stub): Likewise.
(ppc64_elf_build_stubs): Likewise for glink.
ld/testsuite/
* ld-powerpc/tls.s: Add proper .opd entry for _start.
* ld-powerpc/tlstoc.s: Likewise.
* ld-powerpc/relbrlt.d: Update for changed stubs.
* ld-powerpc/tls.d: Update for changed stubs and _start .opd entry.
* ld-powerpc/tls.g: Likewise.
* ld-powerpc/tlsexe.d: Likewise.
* ld-powerpc/tlsexe.g: Likewise.
* ld-powerpc/tlsexe.r: Likewise.
* ld-powerpc/tlsexetoc.d: Likewise.
* ld-powerpc/tlsexetoc.g: Likewise.
* ld-powerpc/tlsexetoc.r: Likewise.
* ld-powerpc/tlsso.d: Likewise.
* ld-powerpc/tlsso.g: Likewise.
* ld-powerpc/tlsso.r: Likewise.
* ld-powerpc/tlstoc.d: Likewise.
* ld-powerpc/tlstoc.g: Likewise.
* ld-powerpc/tlstocso.d: Likewise.
* ld-powerpc/tlstocso.g: Likewise.
* ld-powerpc/tlstocso.r: Likewise.