The extended instructions implemented in powerpc_macros aren't used by
the disassembler. That means instructions like "sldi r3,r3,2" appear
in disassembly as "rldicr r3,r3,2,61", which is annoying since many
other extended instructions are shown.
Note that some of the instructions moved out of the macro table to the
opcode table won't appear in disassembly, because they are aliases
rather than a subset of the underlying raw instruction. If enabled,
rotrdi, extrdi, extldi, clrlsldi, and insrdi would replace all
occurrences of rotldi, rldicl, rldicr, rldic and rldimi. (Or many
occurrences in the case of clrlsldi if n <= b was added to the extract
functions.)
The patch also fixes a small bug in opcode sanity checking.
include/
* opcode/ppc.h (PPC_OPSHIFT_SH6): Define.
opcodes/
* ppc-opc.c (insert_erdn, extract_erdn, insert_eldn, extract_eldn),
(insert_crdn, extract_crdn, insert_rrdn, extract_rrdn),
(insert_sldn, extract_sldn, insert_srdn, extract_srdn),
(insert_erdb, extract_erdb, insert_csldn, extract_csldb),
(insert_irdb, extract_irdn): New functions.
(ELDn, ERDn, ERDn, RRDn, SRDn, ERDb, CSLDn, CSLDb, IRDn, IRDb):
Define and add associated powerpc_operands entries.
(powerpc_opcodes): Add "rotrdi", "srdi", "extrdi", "clrrdi",
"sldi", "extldi", "clrlsldi", "insrdi" and corresponding record
(ie. dot suffix) forms.
(powerpc_macros): Delete same from here.
gas/
* config/tc-ppc.c (insn_validate): Don't modify value passed
to operand->insert for PPC_OPERAND_PLUS1 when calculating mask.
Handle PPC_OPSHIFT_SH6.
* testsuite/gas/ppc/prefix-reloc.d: Update.
* testsuite/gas/ppc/simpshft.d: Update.
ld/
* testsuite/ld-powerpc/elfv2so.d: Update.
* testsuite/ld-powerpc/notoc.d: Update.
* testsuite/ld-powerpc/notoc3.d: Update.
* testsuite/ld-powerpc/tlsdesc2.d: Update.
* testsuite/ld-powerpc/tlsget.d: Update.
* testsuite/ld-powerpc/tlsget2.d: Update.
* testsuite/ld-powerpc/tlsopt5.d: Update.
* testsuite/ld-powerpc/tlsopt6.d: Update.
The save of r2 in __glink_PLTresolve is the culprit. Remove it,
unless we know we need it for --plt-localentry. --plt-localentry
should not be used with power10 pc-relative code that makes tail
calls.
The patch also removes use of r2 as a scratch reg in the ELFv2
__glink_PLTresolve. Using r2 isn't a problem, this is just reducing
the number of scratch regs.
bfd/
* elf64-ppc.c (GLINK_PLTRESOLVE_SIZE): Depend on has_plt_localentry0.
(LD_R0_0R11, ADD_R11_R0_R11): Define.
(ppc64_elf_tls_setup): Disable params->plt_localentry0 when power10
code detected.
(ppc64_elf_size_stubs): Update __glink_PLTresolve eh_frame.
(ppc64_elf_build_stubs): Move r2 save to start of __glink_PLTresolve,
and only emit for has_plt_localentry0. Don't use r2 in the stub.
ld/
* testsuite/ld-powerpc/elfv2so.d,
* testsuite/ld-powerpc/notoc2.d,
* testsuite/ld-powerpc/tlsdesc.wf,
* testsuite/ld-powerpc/tlsdesc2.d,
* testsuite/ld-powerpc/tlsdesc2.wf,
* testsuite/ld-powerpc/tlsopt5.d,
* testsuite/ld-powerpc/tlsopt5.wf,
* testsuite/ld-powerpc/tlsopt6.d,
* testsuite/ld-powerpc/tlsopt6.wf: Update __glink_PLTresolve.
While the skiboot linker script bears some culpability in this PR,
it's also true that the GOT indirect to GOT relative optimisation for
16-bit offsets isn't safe. At least, it isn't safe to remove the GOT
entry based on distance between the GOT pointer and symbol calculated
from the preliminary layout. So this patch removes that optimisation,
and reduces the range allowed for 32-bit and 34-bit offsets.
PR 24704
bfd/
* elf64-ppc.c (R_PPC64_GOT16_DS): Don't set has_gotrel.
(ppc64_elf_edit_toc): Don't remove R_PPC64_GOT16_DS got entries.
Reduce range of offsets allowed for other GOT relocs.
ld/
* testsuite/ld-powerpc/elfv2exe.d: Update.
* testsuite/ld-powerpc/elfv2so.d: Update.
This adds support for ".localentry 1", a new st_other
STO_PPC64_LOCAL_MASK encoding that signifies a function with a single
entry point like ".localentry 0", but unlike a ".localentry 0"
function does not preserve r2.
include/
* elf/ppc64.h: Specify byte offset to local entry for values
of two to six in STO_PPC64_LOCAL_MASK. Clarify r2 return
value for such functions when entering via global entry point.
Specify meaning of a value of one in STO_PPC64_LOCAL_MASK.
bfd/
* elf64-ppc.c (ppc64_elf_size_stubs): Use a ppc_stub_long_branch_r2off
for calls to symbols with STO_PPC64_LOCAL_MASK bits set to 1.
gas/
* config/tc-ppc.c (ppc_elf_localentry): Allow .localentry values
of 1 and 7 to directly set value into STO_PPC64_LOCAL_MASK bits.
ld/testsuite/
* ld-powerpc/elfv2.s: Add .localentry f5,1 testcase.
* ld-powerpc/elfv2exe.d: Update.
* ld-powerpc/elfv2so.d: Update.
Necessary if gcc is to use PLT16 relocs to implement -mlongcall, and
there isn't a good technical reason why local symbols should be
excluded from PLT16 support. Non-ifunc local symbol PLT entries go in
a separate section to other PLT entries. In a fixed position
executable they won't need to be relocated, and in a PIE or shared
library I chose to not implement lazy relocation.
bfd/
* elf64-ppc.c (LOCAL_PLT_ENTRY_SIZE): Define.
(struct ppc_stub_hash_entry): Add symtype field.
(PLT_KEEP): Define.
(struct ppc_link_hash_table): Add pltlocal and relpltlocal.
(create_linkage_sections): Create pltlocal and relpltlocal.
(ppc64_elf_check_relocs): Allow PLT relocs on local symbols.
Set PLT_KEEP.
(ppc64_elf_adjust_dynamic_symbol): Keep PLT entries for inline calls.
(allocate_dynrelocs): Allocate pltlocal and relpltlocal.
(ppc64_elf_size_dynamic_sections): Size pltlocal and relpltlocal.
Keep PLT entries for inline calls against locals.
(ppc_build_one_stub): Use pltlocal as appropriate.
(ppc_size_one_stub): Likewise.
(ppc64_elf_size_stubs): Set symtype.
(build_global_entry_stubs_and_plt): Init pltlocal and write
relpltlocal for globals.
(write_plt_relocs_for_local_syms): Likewise for local syms.
(ppc64_elf_relocate_section): Support PLT for local syms.
* elf32-ppc.c (PLT_KEEP): Define.
(struct ppc_elf_link_hash_table): Add pltlocal and relpltlocal.
(ppc_elf_create_glink): Create pltlocal and relpltlocal.
(ppc_elf_check_relocs): Allow PLT relocs on local symbols.
Set PLT_KEEP. Adjust update_local_sym_info call.
(ppc_elf_adjust_dynamic_symbol): Keep PLT entries for inline calls.
(allocate_dynrelocs): Allocate pltlocal and relpltlocal.
(ppc_elf_size_dynamic_sections): Size pltlocal and relpltlocal.
(ppc_elf_relocate_section): Support PLT16 relocs for local syms.
(write_global_sym_plt): Init pltlocal and write relpltlocal.
(ppc_finish_symbols): Likewise for locals.
ld/
* emulparams/elf32ppc.sh (OTHER_RELRO_SECTIONS_2): Add .branch_lt.
(OTHER_GOT_RELOC_SECTIONS): Add .rela.branch_lt.
* testsuite/ld-powerpc/elfv2so.d: Update for symbol/stub reordering.
* testsuite/ld-powerpc/relbrlt.d: Likewise.
* testsuite/ld-powerpc/relbrlt.s: Likewise.
* testsuite/ld-powerpc/tlsso.r: Likewise.
* testsuite/ld-powerpc/tlstocso.r: Likewise.
gold/
* powerpc.cc (Target_powerpc::lplt_): New variable.
(Target_powerpc::lplt_section): Associated accessor.
(Target_powerpc::plt_off): Handle local non-ifunc symbols.
(Target_powerpc::make_lplt_section): New function.
(Target_powerpc::make_local_plt_entry): New function.
(Powerpc_relobj::do_relocate_sections): Write out lplt.
(Output_data_plt_powerpc::first_plt_entry_offset): Zero for lplt.
(Output_data_plt_powerpc::add_local_entry): New function.
(Output_data_plt_powerpc::do_write): Ignore lplt.
(Target_powerpc::make_iplt_section): Make lplt first.
(Target_powerpc::make_brlt_section): Make .branch_lt relro.
(Target_powerpc::Scan::local): Handle PLT16 relocs.
This reverts most of commit 1be5d8d3bb.
Left in place are addition of --no-plt-align to some ppc32 ld tests
and the ld.texinfo --no-plt-thread-safe fix.
This is in preparation for the next patch adding Spectre variant 2
mitigation for PowerPC and PowerPC64. Besides tidying code involved
in stub output (to reduce the number of places where bctr is output),
the patch adds some user visible features:
1) PowerPC64 ELFv2 global entry stubs now are aligned under the
control of --plt-align, with a default alignment of 32 bytes.
2) PowerPC64 __glink_PLTresolve is no longer padded out with nops.
3) PowerPC32 PLT stubs are aligned under the control of --plt-align,
with the default alignment being 16 bytes as before.
4) The PowerPC32 branch/nop table emitted before __glink_PLTresolve
is now smaller in many cases. It was sized incorrectly when the
__tls_get_addr_opt stub was used, and unnecessarily included space
for local ifuncs.
bfd/
* elf32-ppc.c (GLINK_ENTRY_SIZE): Add parameters, handle
__tls_get_addr_opt, and alignment sizing.
(TLS_GET_ADDR_GLINK_SIZE): Delete.
(is_nonpic_glink_stub): Don't use GLINK_ENTRY_SIZE.
(ppc_elf_get_synthetic_symtab): Recognize stubs spaced at 4, 6,
or 8 insns.
(ppc_elf_link_hash_table_create): Init new ppc_elf_params field.
(allocate_dynrelocs): Use new GLINK_ENTRY_SIZE.
(ppc_elf_size_dynamic_sections): Likewise. Size branch table
by PLT reloc count.
(write_glink_stub): Handle __tls_get_addr_opt stub.
Pad out to size given by GLINK_ENTRY_SIZE.
(ppc_elf_relocate_section): Adjust write_glink_stub call.
(ppc_elf_finish_dynamic_symbol): Likewise.
(ppc_elf_finish_dynamic_sections): Write PLTresolve without using
insn array since so many need rewriting.
* elf32-ppc.h (struct ppc_elf_params): Add plt_stub_align.
* elf64-ppc.c (GLINK_PLTRESOLVE_SIZE): Rename from
GLINK_CALL_STUB_SIZE. Add htab param and evaluate to size without
nops. Adjust all uses.
(ppc64_elf_get_synthetic_symtab): Don't use GLINK_CALL_STUB_SIZE
in glink_vma calculation.
(struct ppc_link_hash_table): Add global_entry section pointer.
(create_linkage_sections): Create separate section for global
entry stubs.
(PPC_LO, PPC_HI, PPC_HA): Move earlier.
(size_global_entry_stubs): Handle sizing for aligned stubs.
(ppc64_elf_size_dynamic_sections): Handle global_entry alloc,
and don't stash end of glink branch table in rawsize.
(ppc_build_one_stub): Rewrite stub size calculations.
(build_global_entry_stubs): Use new section.
(ppc64_elf_build_stubs): Don't pad __glink_PLTresolve with nops.
Build lazy link stubs out to end of section. Build global entry
stubs in new section.
gold/
* options.h (plt_align): Support for PowerPC32 too.
* powerpc.cc (Stub_table::stub_align): Heed --plt-align for 32-bit.
(Stub_table::plt_call_size, branch_stub_size): Tidy.
(Stub_table::plt_call_align): Implement using stub_align.
(Output_data_glink::global_entry_align): New function.
(Output_data_glink::global_entry_off): New function.
(Output_data_glink::global_entry_address): Use global_entry_off.
(Output_data_glink::pltresolve_size): New function, replacing
pltresolve_size_ constant. Update all uses.
(Output_data_glink::add_global_entry): Align offset.
(Output_data_glink::set_final_data_size): Use global_entry_align.
(Stub_table::do_write): Don't pad __glink_PLTrelsolve with nops.
Tidy stub output. Use global_entry_off.
ld/
* emultempl/ppc32elf.em (params): Init new field.
(enum ppc32_opt): New enum to define OPTION_* values. Add
OPTION_PLT_ALIGN and OPTION_NO_PLT_ALIGN.
(PARSE_AND_LIST_LONGOPTS): Handle new options.
(PARSE_AND_LIST_ARGS_CASES): Likewise.
(PARSE_AND_LIST_OPTIONS): Likewise. Break up help output.
* emultempl/ppc64elf.em (ppc_add_stub_section): Init alignment
correctly for negative --plt-stub-align.
* testsuite/ld-powerpc/elfv2exe.d,
* testsuite/ld-powerpc/elfv2so.d,
* testsuite/ld-powerpc/relbrlt.d,
* testsuite/ld-powerpc/relbrlt.s,
* testsuite/ld-powerpc/tlsexe.d,
* testsuite/ld-powerpc/tlsexe.r,
* testsuite/ld-powerpc/tlsexe32.d,
* testsuite/ld-powerpc/tlsexe32.g,
* testsuite/ld-powerpc/tlsexe32.r,
* testsuite/ld-powerpc/tlsexetoc.d,
* testsuite/ld-powerpc/tlsexetoc.r,
* testsuite/ld-powerpc/tlsopt5_32.d,
* testsuite/ld-powerpc/tlsso.d,
* testsuite/ld-powerpc/tlstocso.d: Update for changed stub order.
This changes the PowerPC64 --plt-align option to perform the usual
alignment of code as suggested by its name, as well as the previous
behaviour of padding so as to reduce boundary crossing. The old
behaviour is had by using a negative parameter.
The default is also changed to align plt stub code by default to 32
byte boundaries, the point being to get better bctr branch prediction
on power8 and power9 hardware.
bfd/
* elf64-ppp.c (plt_stub_pad): Handle positive and negative
plt_stub_align.
ld/
* ld.texinfo (--plt-align): Describe new behaviour of option.
* emultempl/ppc64elf.em (params): Default plt_stub_align to 5.
* testsuite/ld-powerpc/powerpc.exp: Pass --no-plt-align for
selected tests.
* testsuite/ld-powerpc/relbrlt.d: Pass --no-plt-align.
* testsuite/ld-powerpc/elfv2so.d: Adjust expected output.
ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, and that have no requirement on r2 or
r12, and guarantee r2 is unchanged on return. Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions. The
optimization is attractive. The TOC pointer load-hit-store is a major
reason why calls to small functions that need no register saves, or
with shrink-wrap, no register saves on a fast path, are slow on
powerpc64le.
To be safe, this optimization needs ld.so support to check that the
run-time matches link-time function implementation. If a function
in a shared library with st_other localentry non-zero is called
without saving and restoring r2, r2 will be trashed on return, leading
to segfaults. For that reason the optimization does not happen for
weak functions since a weak definition is a fairly solid hint that the
function will likely be overridden. I'm also not enabling the
optimization by default unless glibc-2.26 is detected, which should
have the ld.so checks implemented.
bfd/
* elf64-ppc.c (struct ppc_link_hash_table): Add has_plt_localentry0.
(ppc64_elf_merge_symbol_attribute): Merge localentry bits from
dynamic objects.
(is_elfv2_localentry0): New function.
(ppc64_elf_tls_setup): Default params->plt_localentry0.
(plt_stub_size): Adjust size for tls_get_addr_opt stub.
(build_tls_get_addr_stub): Use a simpler stub when r2 is not saved.
(ppc64_elf_size_stubs): Leave stub_type as ppc_stub_plt_call for
optimized localentry:0 stubs.
(ppc64_elf_build_stubs): Save r2 in ELFv2 __glink_PLTresolve.
(ppc64_elf_relocate_section): Leave nop unchanged for optimized
localentry:0 stubs.
(ppc64_elf_finish_dynamic_sections): Set PPC64_OPT_LOCALENTRY in
DT_PPC64_OPT.
* elf64-ppc.h (struct ppc64_elf_params): Add plt_localentry0.
include/
* elf/ppc64.h (PPC64_OPT_LOCALENTRY): Define.
ld/
* emultempl/ppc64elf.em (params): Init plt_localentry0 field.
(enum ppc64_opt): New, replacing OPTION_* defines. Add
OPTION_PLT_LOCALENTRY, and OPTION_NO_PLT_LOCALENTRY.
(PARSE_AND_LIST_*): Support --plt-localentry and --no-plt-localentry.
* testsuite/ld-powerpc/elfv2so.d: Update.
* testsuite/ld-powerpc/powerpc.exp (TLS opt 5): Use --no-plt-localentry.
* testsuite/ld-powerpc/tlsopt5.d: Update.
Commit 23283c1b changed the layout of some bss style sections on
powerpc64, but neglected to add a page gap before the third PT_LOAD
segment created by this reording. Without a page gap we get two
PT_LOAD headers that overlap by one page in memory. That shouldn't be
allowed because the dynamic loader will load garbage from the first
page of the last segment over the last page of the previous segment.
bfd/
* elf.c (_bfd_elf_map_sections_to_segments): Do not make a new
segment for loaded sections after nonloaded sections if the
sections are on the same page.
ld/testsuite/
* ld-powerpc/elfv2so.d: Update
This moves .got too, which requires .sdata and .sbss to move with it,
because these sections share addressing via the toc pointer and with
small-model code must be within a 16-bit signed offset. .plt, .iplt
and .branch_lt must also be moved since they are addressed via a
32-bit offset from the toc pointer, and we might have a very large
.data section.
This change means we may have some bss style sections before the data
segment, necessitating another PT_LOAD header. Also, since _edata is
defined at the end of the data segment it's possible with an empty
.data to have _edata at the end of .plt which looks a little unusual
since .plt is a bss style section. That should only happen rarely in
real world binaries, but does occur in the ld testsuite.
ld/
* emulparams/elf64ppc.sh (BSS_PLT): Don't define.
(OTHER_READWRITE_SECTIONS): Move .branch_lt to..
(OTHER_RELRO_SECTIONS_2): ..here.
(DATA_GOT, SEPARATE_GOTPLT, DATA_SDATA, DATA_PLT,
PLT_BEFORE_GOT): Define.
* scripttempl/elf.sc: Handle DATA_SDATA and DATA_GOT/DATA_PLT/
PLT_BEFORE_GOT combination.
(DATA_GOT, SDATA_GOT): Don't define if either is already defined.
ld/testsuite/
* ld-powerpc/ambiguousv1.d,
* ld-powerpc/ambiguousv1b.d,
* ld-powerpc/ambiguousv2.d,
* ld-powerpc/ambiguousv2b.d,
* ld-powerpc/elfv2exe.d,
* ld-powerpc/elfv2so.d,
* ld-powerpc/tlsexe.r,
* ld-powerpc/tlsexetoc.r,
* ld-powerpc/tlsso.r,
* ld-powerpc/tlstocso.r: Update.