ppc64 ld optimises sequences like the following
addis 3,13,wot@tprel@ha
lwz 3,wot@tprel@l(3)
to
nop
lwz 3,wot@tprel(13)
when "wot" is located near enough to the thread pointer.
However, the ABI doesn't require that R_PPC64_TPREL16_HA always be on
an addis rt,13,imm instruction, and while ld checked for that on the
high-part instruction it didn't disable the optimisation on the
low-part instruction. This patch fixes that problem, disabling the
tprel optimisation globally if high-part instructions don't pass
sanity checks. The optimisation is also enabled for ppc32, where
before ld.bfd had the code in the wrong place and ld.gold had it in a
block only enabled for ppc64.
bfd/
* elf32-ppc.c (ppc_elf_check_relocs): Set has_tls_reloc for
high part tprel16 relocs.
(ppc_elf_tls_optimize): Sanity check high part tprel16 relocs.
Clear do_tls_opt on odd instructions.
(ppc_elf_relocate_section): Move TPREL16_HA/LO optimisation later.
Don't sanity check them here.
* elf64-ppc.c (ppc64_elf_check_relocs): Set has_tls_reloc for
high part tprel16 relocs.
(ppc64_elf_tls_optimize): Sanity check high part tprel16 relocs.
Clear do_tls_opt on odd instructions.
(ppc64_elf_relocate_section): Don't sanity check TPREL16_HA.
ld/
* testsuite/ld-powerpc/tls32.d: Update for TPREL_HA/LO optimisation.
* testsuite/ld-powerpc/tlsexe32.d: Likewise.
* testsuite/ld-powerpc/tlsldopt32.d: Likewise.
* testsuite/ld-powerpc/tlsmark32.d: Likewise.
* testsuite/ld-powerpc/tlsopt4_32.d: Likewise.
* testsuite/ld-powerpc/tprel.s,
* testsuite/ld-powerpc/tprel.d,
* testsuite/ld-powerpc/tprel32.d: New tests.
* testsuite/ld-powerpc/tprelbad.s,
* testsuite/ld-powerpc/tprelbad.d: New test.
* testsuite/ld-powerpc/powerpc.exp: Run them.
gold/
* powerpc.cc (Target_powerpc): Add tprel_opt_ and accessors.
(Target_powerpc::Scan::local): Sanity check tprel high relocs.
(Target_powerpc::Scan::global): Likewise.
(Target_powerpc::Relocate::relocate): Control tprel optimisation
with tprel_opt_ and enable for 32-bit.
When an instruction has operands, the PowerPC disassembler prints
spaces after the opcode so as to line up operands. If the operands
are all optional and all default value, then no operands are printed,
leaving trailing spaces. This patch fixes that.
opcodes/
* ppc-dis.c (print_insn_powerpc): Delay printing spaces after
opcode until first operand is output.
gas/
* testsuite/gas/ppc/476.d: Remove trailing spaces.
* testsuite/gas/ppc/a2.d: Likewise.
* testsuite/gas/ppc/booke.d: Likewise.
* testsuite/gas/ppc/booke_xcoff.d: Likewise.
* testsuite/gas/ppc/e500.d: Likewise.
* testsuite/gas/ppc/e500mc.d: Likewise.
* testsuite/gas/ppc/e6500.d: Likewise.
* testsuite/gas/ppc/htm.d: Likewise.
* testsuite/gas/ppc/power6.d: Likewise.
* testsuite/gas/ppc/power8.d: Likewise.
* testsuite/gas/ppc/power9.d: Likewise.
* testsuite/gas/ppc/vle.d: Likewise.
ld/
* testsuite/ld-powerpc/tlsexe32.d: Remove trailing spaces.
* testsuite/ld-powerpc/tlsopt5.d: Likewise.
* testsuite/ld-powerpc/tlsopt5_32.d: Likewise.
This is in preparation for the next patch adding Spectre variant 2
mitigation for PowerPC and PowerPC64. Besides tidying code involved
in stub output (to reduce the number of places where bctr is output),
the patch adds some user visible features:
1) PowerPC64 ELFv2 global entry stubs now are aligned under the
control of --plt-align, with a default alignment of 32 bytes.
2) PowerPC64 __glink_PLTresolve is no longer padded out with nops.
3) PowerPC32 PLT stubs are aligned under the control of --plt-align,
with the default alignment being 16 bytes as before.
4) The PowerPC32 branch/nop table emitted before __glink_PLTresolve
is now smaller in many cases. It was sized incorrectly when the
__tls_get_addr_opt stub was used, and unnecessarily included space
for local ifuncs.
bfd/
* elf32-ppc.c (GLINK_ENTRY_SIZE): Add parameters, handle
__tls_get_addr_opt, and alignment sizing.
(TLS_GET_ADDR_GLINK_SIZE): Delete.
(is_nonpic_glink_stub): Don't use GLINK_ENTRY_SIZE.
(ppc_elf_get_synthetic_symtab): Recognize stubs spaced at 4, 6,
or 8 insns.
(ppc_elf_link_hash_table_create): Init new ppc_elf_params field.
(allocate_dynrelocs): Use new GLINK_ENTRY_SIZE.
(ppc_elf_size_dynamic_sections): Likewise. Size branch table
by PLT reloc count.
(write_glink_stub): Handle __tls_get_addr_opt stub.
Pad out to size given by GLINK_ENTRY_SIZE.
(ppc_elf_relocate_section): Adjust write_glink_stub call.
(ppc_elf_finish_dynamic_symbol): Likewise.
(ppc_elf_finish_dynamic_sections): Write PLTresolve without using
insn array since so many need rewriting.
* elf32-ppc.h (struct ppc_elf_params): Add plt_stub_align.
* elf64-ppc.c (GLINK_PLTRESOLVE_SIZE): Rename from
GLINK_CALL_STUB_SIZE. Add htab param and evaluate to size without
nops. Adjust all uses.
(ppc64_elf_get_synthetic_symtab): Don't use GLINK_CALL_STUB_SIZE
in glink_vma calculation.
(struct ppc_link_hash_table): Add global_entry section pointer.
(create_linkage_sections): Create separate section for global
entry stubs.
(PPC_LO, PPC_HI, PPC_HA): Move earlier.
(size_global_entry_stubs): Handle sizing for aligned stubs.
(ppc64_elf_size_dynamic_sections): Handle global_entry alloc,
and don't stash end of glink branch table in rawsize.
(ppc_build_one_stub): Rewrite stub size calculations.
(build_global_entry_stubs): Use new section.
(ppc64_elf_build_stubs): Don't pad __glink_PLTresolve with nops.
Build lazy link stubs out to end of section. Build global entry
stubs in new section.
gold/
* options.h (plt_align): Support for PowerPC32 too.
* powerpc.cc (Stub_table::stub_align): Heed --plt-align for 32-bit.
(Stub_table::plt_call_size, branch_stub_size): Tidy.
(Stub_table::plt_call_align): Implement using stub_align.
(Output_data_glink::global_entry_align): New function.
(Output_data_glink::global_entry_off): New function.
(Output_data_glink::global_entry_address): Use global_entry_off.
(Output_data_glink::pltresolve_size): New function, replacing
pltresolve_size_ constant. Update all uses.
(Output_data_glink::add_global_entry): Align offset.
(Output_data_glink::set_final_data_size): Use global_entry_align.
(Stub_table::do_write): Don't pad __glink_PLTrelsolve with nops.
Tidy stub output. Use global_entry_off.
ld/
* emultempl/ppc32elf.em (params): Init new field.
(enum ppc32_opt): New enum to define OPTION_* values. Add
OPTION_PLT_ALIGN and OPTION_NO_PLT_ALIGN.
(PARSE_AND_LIST_LONGOPTS): Handle new options.
(PARSE_AND_LIST_ARGS_CASES): Likewise.
(PARSE_AND_LIST_OPTIONS): Likewise. Break up help output.
* emultempl/ppc64elf.em (ppc_add_stub_section): Init alignment
correctly for negative --plt-stub-align.
* testsuite/ld-powerpc/elfv2exe.d,
* testsuite/ld-powerpc/elfv2so.d,
* testsuite/ld-powerpc/relbrlt.d,
* testsuite/ld-powerpc/relbrlt.s,
* testsuite/ld-powerpc/tlsexe.d,
* testsuite/ld-powerpc/tlsexe.r,
* testsuite/ld-powerpc/tlsexe32.d,
* testsuite/ld-powerpc/tlsexe32.g,
* testsuite/ld-powerpc/tlsexe32.r,
* testsuite/ld-powerpc/tlsexetoc.d,
* testsuite/ld-powerpc/tlsexetoc.r,
* testsuite/ld-powerpc/tlsopt5_32.d,
* testsuite/ld-powerpc/tlsso.d,
* testsuite/ld-powerpc/tlstocso.d: Update for changed stub order.
* elf32-ppc.c (ppc_elf_create_linker_section): Set SEC_LINKER_CREATED
on section. Correct comment, and add FIXME.
(ppc_elf_additional_program_headers): Don't bump header count for
interp. Test SEC_ALLOC, not SEC_LOAD, and don't test size.
(ppc_elf_size_dynamic_sections): Don't strip sdata and sdata2, but
do allocate memory if they need it.
ld/
* emulparams/elf32ppclinux.sh (OTHER_READWRITE_SECTION): Delete.
(OTHER_RELRO_SECTIONS): Set this instead.
ld/testsuite/
* ld-powerpc/tlsexe32.d: Update.
* ld-powerpc/tlsexe32.g: Update.
* ld-powerpc/tlsexe32.r: Update.
* ld-powerpc/tlsexe32.t: Update.
* ld-powerpc/tlsso32.d: Update.
* ld-powerpc/tlsso32.g: Update.
* ld-powerpc/tlsso32.r: Update.
* ld-powerpc/tlsso32.t: Update.