libctf, ld: fix symtypetab and var section population under ld -r

The variable section in a CTF dict is meant to contain the types of
variables that do not appear in the symbol table (mostly file-scope
static declarations).  We implement this by having the compiler emit
all potential data symbols into both sections, then delete those
symbols from the variable section that correspond to data symbols the
linker has reported.

Unfortunately, the check for this in ctf_serialize is wrong.  Rather
than checking the set of linker-reported symbols, we check the set of
names in the data object symtypetab section.  If the linker has reported
no symbols at all (usually because ld -r has been run, or because a
non-linker program that does not use symbol tables is calling ctf_link),
this set includes every single symbol, and the variable section is
emptied completely.
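
To make the failure mode concrete, here is a minimal standalone model of
the pruning step.  It is not libctf code: the helper names and the use of
plain string arrays are assumptions made purely for illustration.  The
same pruning function is called once with the symtypetab name set (the
buggy check) and once with the linker-reported set (the intended check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is NAME present in the N-element array NAMES?  */
static bool
name_in_set (const char *name, const char *const *names, size_t n)
{
  assert (names != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (strcmp (names[i], name) == 0)
      return true;
  return false;
}

/* Hypothetical model of variable-section pruning: count the variables
   that survive when every variable named in DROP_SET is deleted.  */
static size_t
surviving_vars (const char *const *vars, size_t nvars,
                const char *const *drop_set, size_t ndrop)
{
  size_t kept = 0;

  for (size_t i = 0; i < nvars; i++)
    if (!name_in_set (vars[i], drop_set, ndrop))
      kept++;
  return kept;
}
```

Because the compiler emits every potential data symbol into both
sections, passing the symtypetab names as DROP_SET always deletes every
variable, while passing an empty reported set deletes none.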

Worse, when ld -r is in use, we want to force writeout of every
symtypetab entry on the inputs, in an indexed section, whether or not
the linker has reported them, since this isn't a final link yet and the
symbol table is not finalized (and may grow more symbols than the linker
has yet reported).  But the check for this is flawed too: we were
relying on ctf_link_shuffle_syms not having been called if no symbols
exist, but that function is *always* called by ld even when ld -r is in
use: ctf_link_add_linker_symbol is the one that's not called when there
are no symbols.

We clearly need to rethink this.  Using the emptiness of the set of
reported symbols as a test for ld -r is just ugly: the linker already
knows if ld -r is underway and can just tell us.  So add a new linker
flag, CTF_LINK_NO_FILTER_REPORTED_SYMS, which stops the symbols in the
symtypetab sections being filtered using the set that the linker has
reported.  The presence or absence of this flag determines whether we
emit unindexed symtabs.  We only remove entries from the variable
section when filtering symbols, and then only those entries that are in
the reported symbol set, which fixes the case where no symbols are
reported by the linker at all.
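
The decision described above can be sketched as a small standalone
function.  This is only a model under stated assumptions: the SKETCH_
flag value, the struct, and the function name are hypothetical and do not
reproduce the real CTF_LINK_NO_FILTER_REPORTED_SYMS value or the logic in
ctf_serialize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CTF_LINK_NO_FILTER_REPORTED_SYMS.  The value
   is illustrative, not the one in ctf-api.h.  */
#define SKETCH_NO_FILTER_REPORTED_SYMS 0x1

struct sketch_writeout
{
  bool filter_syms;      /* Keep only symtypetab entries the linker reported.  */
  bool allow_unindexed;  /* An unindexed symtypetab may be emitted.  */
  bool prune_vars;       /* Delete reported symbols from the variable section.  */
};

/* Model of the writeout plan, given the link flags and the number of
   symbols the linker has reported.  */
static struct sketch_writeout
sketch_plan (int link_flags, size_t nreported)
{
  struct sketch_writeout w;

  w.filter_syms = !(link_flags & SKETCH_NO_FILTER_REPORTED_SYMS);
  /* Without filtering (ld -r), every entry is written to an indexed section.  */
  w.allow_unindexed = w.filter_syms;
  /* Only prune when filtering, and only if something was reported.  */
  w.prune_vars = w.filter_syms && nreported > 0;
  return w;
}
```

The point of the model is that nothing is inferred from an empty symbol
set any more: the flag alone decides whether filtering happens, and the
symbol count only guards the variable-section pruning.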

(The negative sense of the new CTF_LINK flag is intentional: the common
case, both for ld and for simple tools that want to do a ctf_link with
no ELF symbol table in sight, is probably to filter out symbols that no
linker has reported: i.e., for the simple tools, all of them.)

There's another wrinkle, though.  It is quite possible for a non-linker
to add symbols to a dict via ctf_add_*_sym and then write it out via the
ctf_write APIs: perhaps it's preparing a dict for a later linker
invocation.  Right now this would not lead to anything terribly
meaningful happening: ctf_serialize just assumes it was called via
ctf_link if symbols are present.  So add an (internal-to-libctf) flag
that indicates that a writeout is happening via ctf_link_write, and set
it there (propagating it to child dicts as needed).  ctf_serialize can
then spot when it is not being called by a linker, and arrange to always
write out an indexed, sorted symtypetab for fastest possible future
symbol lookup by name in that case.  (The writeouts done by ld -r are
unsorted, because the only thing likely to use those symtabs is the
linker, which doesn't benefit from symtypetab sorting.)
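
As a rough illustration of why a sorted, indexed symtypetab helps later
lookups by name, the sketch below sorts a name-to-type index once and
then finds entries by binary search.  The struct and function names are
hypothetical, and the layout is a simplification, not the on-disk CTF
format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: a symbol name and the ID of its type.  */
struct sketch_sym
{
  const char *name;
  unsigned long type;
};

/* Order entries lexicographically by name.  */
static int
sketch_sym_cmp (const void *a, const void *b)
{
  const struct sketch_sym *sa = a;
  const struct sketch_sym *sb = b;

  return strcmp (sa->name, sb->name);
}

/* Sort the index once, at writeout time.  */
static void
sketch_sort_index (struct sketch_sym *syms, size_t n)
{
  qsort (syms, n, sizeof (*syms), sketch_sym_cmp);
}

/* Look NAME up in a sorted index by binary search.  Returns NULL if the
   name is not present.  */
static const struct sketch_sym *
sketch_lookup (const struct sketch_sym *syms, size_t n, const char *name)
{
  struct sketch_sym key = { name, 0 };

  return bsearch (&key, syms, n, sizeof (*syms), sketch_sym_cmp);
}
```

A consumer that only ever walks the whole table, as the linker does,
gains nothing from the sort, which is consistent with leaving the ld -r
writeouts unsorted.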

Tests added for all three linking cases (ld -r, ld -shared, ld), with a
bit of testsuite framework enhancement to stop it unconditionally
linking the CTF to be checked by the lookup program with -shared, so
tests can now examine CTF linked with -r or indeed with no flags at all,
though the output filename is still foo.so even in this case.

Another test added for the non-linker case that endeavours to determine
whether the symtypetab is sorted by examining the order of entries
returned from ctf_symbol_next: nobody outside libctf should rely on
this ordering, but this test is not outside libctf :)

include/ChangeLog
2021-01-26  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-api.h (CTF_LINK_NO_FILTER_REPORTED_SYMS): New.

ld/ChangeLog
2021-01-26  Nick Alcock  <nick.alcock@oracle.com>

	* ldlang.c (lang_merge_ctf): Set CTF_LINK_NO_FILTER_REPORTED_SYMS
	when appropriate.

libctf/ChangeLog
2021-01-27  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-impl.c (_libctf_nonnull_): Add parameters.
	(LCTF_LINKING): New flag.
	(ctf_dict_t) <ctf_link_flags>: Mention it.
	* ctf-link.c (ctf_link): Keep LCTF_LINKING set across call.
	(ctf_write): Likewise, including in child dictionaries.
	(ctf_link_shuffle_syms): Make sure ctf_dynsyms is NULL if there
	are no reported symbols.
	* ctf-create.c (symtypetab_delete_nonstatic_vars): Make sure
	the variable has been reported as a symbol by the linker.
	(symtypetab_skippable): Mention relationship between SYMFP and the
	flags.
	(symtypetab_density): Adjust nonnullity.  Exit early if no symbols
	were reported and force-indexing is off (i.e., we are doing a
	final link).
	(ctf_serialize): Handle the !LCTF_LINKING case by writing out an
	indexed, sorted symtypetab (and allow SYMFP to be NULL in this
	case).  Turn sorting off if this is a non-final link.  Only delete
	nonstatic vars if we are filtering symbols and the linker has
	reported some.
	* testsuite/libctf-regression/nonstatic-var-section-ld-r*:
	New test of variable and symtypetab section population when
	ld -r is used.
	* testsuite/libctf-regression/nonstatic-var-section-ld-executable.lk:
	Likewise, when ld of an executable is used.
	* testsuite/libctf-regression/nonstatic-var-section-ld.lk:
	Likewise, when ld -shared alone is used.
	* testsuite/libctf-regression/nonstatic-var-section-ld*.c:
	Lookup programs for the above.
	* testsuite/libctf-writable/symtypetab-nonlinker-writeout.*: New
	test, testing survival of symbols across ctf_write paths.
	* testsuite/lib/ctf-lib.exp (run_lookup_test): New option,
	nonshared, suppressing linking of the SOURCE with -shared.
Nick Alcock 2021-01-16 16:49:29 +00:00
parent 1a2f1b54a5
commit 35a01a0454
17 changed files with 612 additions and 57 deletions

@ -0,0 +1,218 @@
/* Make sure that writing out a dict with a symtypetab without going via
   ctf_link_write (as a compiler might do to generate input destined for a
   linker) always writes out a complete indexed, sorted symtypetab, ignoring the
   set of symbols reported (if any).  Also a test of dynamic dict sym
   iteration.  */

#include <ctf-api.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int
report_sym (ctf_dict_t *fp, ctf_link_sym_t *sym, const char *name,
            uint32_t idx, uint32_t st_type)
{
  sym->st_name = name;
  sym->st_symidx = idx;
  sym->st_type = st_type;
  return ctf_link_add_linker_symbol (fp, sym);
}

static void
try_maybe_reporting (int report)
{
  ctf_dict_t *fp;
  ctf_id_t func, func2, func3, base, base2, base3;
  ctf_encoding_t e = { CTF_INT_SIGNED, 0, sizeof (long) };
  ctf_id_t dummy;
  ctf_funcinfo_t fi;
  ctf_next_t *i = NULL;
  ctf_id_t symtype;
  const char *symname;
  unsigned char *buf;
  size_t bufsiz;
  int err;

  if ((fp = ctf_create (&err)) == NULL)
    goto create_err;

  /* Add a couple of sets of types to hang symbols off.  We use multiple
     identical types so we can distinguish between distinct func / data symbols
     later on.  */

  if (((base = ctf_add_integer (fp, CTF_ADD_ROOT, "long int", &e)) == CTF_ERR) ||
      ((base2 = ctf_add_integer (fp, CTF_ADD_ROOT, "long int", &e)) == CTF_ERR) ||
      ((base3 = ctf_add_integer (fp, CTF_ADD_ROOT, "long int", &e)) == CTF_ERR))
    goto create_types_err;

  fi.ctc_return = base;
  fi.ctc_argc = 0;
  fi.ctc_flags = 0;
  if (((func = ctf_add_function (fp, CTF_ADD_ROOT, &fi, &dummy)) == CTF_ERR) ||
      ((func2 = ctf_add_function (fp, CTF_ADD_ROOT, &fi, &dummy)) == CTF_ERR) ||
      ((func3 = ctf_add_function (fp, CTF_ADD_ROOT, &fi, &dummy)) == CTF_ERR))
    goto create_types_err;

  /* Add some function and data symbols.  We intentionally add the symbols in
     near-inverse order by symbol name, so that we can tell whether the
     (necessarily indexed) section was sorted (since the sort is always in
     lexicographical sort order by name).  */

  if ((ctf_add_objt_sym (fp, "data_c", base) < 0) ||
      (ctf_add_objt_sym (fp, "data_a", base2) < 0) ||
      (ctf_add_objt_sym (fp, "data_b", base3) < 0))
    goto create_syms_err;

  if ((ctf_add_func_sym (fp, "func_c", func) < 0) ||
      (ctf_add_func_sym (fp, "func_a", func2) < 0) ||
      (ctf_add_func_sym (fp, "func_b", func3) < 0))
    goto create_syms_err;

  /* Make sure we can iterate over them in a dynamic dict and that they have the
     right types.  We don't care about their order at this stage, which makes
     the validation here a bit more verbose than it is below.  */

  while ((symtype = ctf_symbol_next (fp, &i, &symname, 0)) != CTF_ERR)
    {
      if (symtype == base && strcmp (symname, "data_c") == 0)
        continue;
      if (symtype == base2 && strcmp (symname, "data_a") == 0)
        continue;
      if (symtype == base3 && strcmp (symname, "data_b") == 0)
        continue;
      goto iter_compar_err;
    }
  if (ctf_errno (fp) != ECTF_NEXT_END)
    goto iter_err;

  while ((symtype = ctf_symbol_next (fp, &i, &symname, 1)) != CTF_ERR)
    {
      if (symtype == func && strcmp (symname, "func_c") == 0)
        continue;
      if (symtype == func2 && strcmp (symname, "func_a") == 0)
        continue;
      if (symtype == func3 && strcmp (symname, "func_b") == 0)
        continue;
      goto iter_compar_err;
    }
  if (ctf_errno (fp) != ECTF_NEXT_END)
    goto iter_err;

  /* Possibly report some but not all of the symbols, as if we are a linker (no
     real program would do this without using the ctf_link APIs, but it's not
     *prohibited*, just useless, and if they do we don't want things to
     break.  In particular we want all the symbols written out, reported or no,
     ignoring the reported symbol set entirely.)  */

  if (report)
    {
      ctf_link_sym_t sym;
      sym.st_nameidx_set = 0;
      sym.st_nameidx = 0;
      sym.st_shndx = 404; /* Arbitrary, not SHN_UNDEF or SHN_EXTABS.  */
      sym.st_value = 404; /* Arbitrary, nonzero.  */

      /* STT_OBJECT: 1.  Don't rely on the #define being visible: this may be a
         non-ELF platform!  */
      if (report_sym (fp, &sym, "data_c", 2, 1) < 0 ||
          report_sym (fp, &sym, "data_a", 3, 1) < 0)
        goto report_err;

      /* STT_FUNC: 2.  */
      if (report_sym (fp, &sym, "func_c", 4, 2) < 0 ||
          report_sym (fp, &sym, "func_a", 5, 2) < 0)
        goto report_err;
    }

  /* Write out, to memory.  */

  if ((buf = ctf_write_mem (fp, &bufsiz, 4096)) == NULL)
    goto write_err;
  ctf_file_close (fp);

  /* Read back in.  */

  if ((fp = ctf_simple_open ((const char *) buf, bufsiz, NULL, 0, 0, NULL,
                             0, &err)) == NULL)
    goto open_err;

  /* Verify symbol order against the order we expect if this dict is sorted and
     indexed.  */

  struct ctf_symtype_expected
  {
    const char *name;
    ctf_id_t id;
  } *expected;
  struct ctf_symtype_expected expected_obj[] = { { "data_a", base2 },
                                                 { "data_b", base3 },
                                                 { "data_c", base },
                                                 { NULL, 0 } };
  struct ctf_symtype_expected expected_func[] = { { "func_a", func2 },
                                                  { "func_b", func3 },
                                                  { "func_c", func },
                                                  { NULL, 0 } };

  expected = expected_obj;
  while ((symtype = ctf_symbol_next (fp, &i, &symname, 0)) != CTF_ERR)
    {
      /* The terminating entry has a NULL name: reaching it means the dict
         returned more symbols than we expect.  */
      if (expected->name == NULL)
        goto expected_overshoot_err;
      if (symtype != expected->id || strcmp (symname, expected->name) != 0)
        goto expected_compar_err;
      printf ("Seen: %s\n", symname);
      expected++;
    }

  expected = expected_func;
  while ((symtype = ctf_symbol_next (fp, &i, &symname, 1)) != CTF_ERR)
    {
      if (expected->name == NULL)
        goto expected_overshoot_err;
      if (symtype != expected->id || strcmp (symname, expected->name) != 0)
        goto expected_compar_err;
      printf ("Seen: %s\n", symname);
      expected++;
    }

  ctf_file_close (fp);

  return;

 create_err:
  fprintf (stderr, "Creation failed: %s\n", ctf_errmsg (err));
  exit (1);
 open_err:
  fprintf (stderr, "Reopen failed: %s\n", ctf_errmsg (err));
  exit (1);
 create_types_err:
  fprintf (stderr, "Cannot create types: %s\n", ctf_errmsg (ctf_errno (fp)));
  exit (1);
 create_syms_err:
  fprintf (stderr, "Cannot create syms: %s\n", ctf_errmsg (ctf_errno (fp)));
  exit (1);
 iter_compar_err:
  fprintf (stderr, "Dynamic iteration comparison failure: %s "
           "(reported type: %lx)\n", symname, symtype);
  exit (1);
 iter_err:
  fprintf (stderr, "Cannot iterate: %s\n", ctf_errmsg (ctf_errno (fp)));
  exit (1);
 report_err:
  fprintf (stderr, "Cannot report symbol: %s\n", ctf_errmsg (ctf_errno (fp)));
  exit (1);
 write_err:
  fprintf (stderr, "Cannot write out: %s\n", ctf_errmsg (ctf_errno (fp)));
  exit (1);
 expected_overshoot_err:
  fprintf (stderr, "Too many symbols in post-writeout comparison\n");
  exit (1);
 expected_compar_err:
  fprintf (stderr, "Non-dynamic iteration comparison failure: %s "
           "(type %lx): expected %s (type %lx)\n", symname, symtype,
           expected->name, expected->id);
  exit (1);
}

int
main (int argc, char *argv[])
{
  try_maybe_reporting (0);
  try_maybe_reporting (1);
}

@ -0,0 +1,12 @@
Seen: data_a
Seen: data_b
Seen: data_c
Seen: func_a
Seen: func_b
Seen: func_c
Seen: data_a
Seen: data_b
Seen: data_c
Seen: func_a
Seen: func_b
Seen: func_c