Most of these are harmless, but some of the type confusions and especially
a missing ctf_strerror() on an error path were actual bugs that could
have resulted in test failures crashing rather than printing an error
message.
libctf/
* testsuite/libctf-lookup/enumerator-iteration.c: Fix type
confusion, signedness confusion and a missing ctf_errmsg().
* testsuite/libctf-regression/libctf-repeat-cu-main.c: Return 0 from
the test function.
* testsuite/libctf-regression/open-error-free.c: Fix signedness
confusion.
* testsuite/libctf-regression/zrewrite.c: Remove unused label.
Three new functions for looking up the enum type containing a given
enumeration constant, and optionally that constant's value.
The simplest, ctf_lookup_enumerator, looks up a root-visible enumerator by
name in one dict: if the dict contains multiple such constants (which is
possible for dicts created by older versions of the libctf deduplicator),
ECTF_DUPLICATE is returned.
The next simplest, ctf_lookup_enumerator_next, is an iterator which returns
all enumerators with a given name in a given dict, whether root-visible or
not.
The most elaborate, ctf_arc_lookup_enumerator_next, finds all
enumerators with a given name across all dicts in an entire CTF archive,
whether root-visible or not, starting looking in the shared parent dict;
opened dicts are cached (as with all other ctf_arc_*lookup functions) so
that repeated use does not incur repeated opening costs.
All three of these return enumerator values as int64_t: unfortunately, API
compatibility concerns prevent us from doing the same with the other older
enum-related functions, which all return enumerator constant values as ints.
We may be forced to add symbol-versioning compatibility aliases that fix the
other functions in due course, bumping the soname for platforms that do not
support such things.
ctf_arc_lookup_enumerator_next is implemented as a nested ctf_archive_next
iterator, and inside that, a nested ctf_lookup_enumerator_next iterator
within each dict. To aid in this, add support to ctf_next_t iterators for
iterators that are implemented in terms of two simultaneous nested iterators
at once. (It has always been possible for callers to use as many nested or
semi-overlapping ctf_next_t iterators as they need, which is one of the
advantages of this style over the _iter style that calls a function for each
thing iterated over: the iterator change here permits *ctf_next_t iterators
themselves* to be implemented by iterating using multiple other iterators as
part of their internal operation, transparently to the caller.)
Also add a testcase that tests all these functions (which is fairly easy
because ctf_arc_lookup_enumerator_next is implemented in terms of
ctf_lookup_enumerator_next) in addition to enumeration addition in
ctf_open()ed dicts, ctf_add_enumerator duplicate enumerator addition, and
conflicting enumerator constant deduplication.
include/
* ctf-api.h (ctf_lookup_enumerator): New.
(ctf_lookup_enumerator_next): Likewise.
(ctf_arc_lookup_enumerator_next): Likewise.
libctf/
* libctf.ver: Add them.
* ctf-impl.h (ctf_next_t) <ctn_next_inner>: New.
* ctf-util.c (ctf_next_copy): Copy it.
(ctf_next_destroy): Destroy it.
* ctf-lookup.c (ctf_lookup_enumerator): New.
(ctf_lookup_enumerator_next): New.
* ctf-archive.c (ctf_arc_lookup_enumerator_next): New.
* testsuite/libctf-lookup/enumerator-iteration.*: New test.
* testsuite/libctf-lookup/enum-ctf-2.c: New test CTF, used by the
above.
This was always an error, because the ctn_fp routinely has errors set on it,
which is not something you can (or should) do to a const object.
libctf/
* ctf-impl.h (ctf_next_) <cu.ctn_fp>: Make non-const.
libctf has long prohibited addition of enums with overlapping constants in a
single enum, but now that we are properly considering enums with overlapping
constants to be conflciting types, we can go further and prohibit addition
of enumeration constants to a dict if they already exist in any enum in that
dict: the same rules as C itself.
We do this in a fashion vaguely similar to what we just did in the
deduplicator, by considering enumeration constants as identifiers and adding
them to the core type/identifier namespace, ctf_dict_t.ctf_names. This is a
little fiddly, because we do not want to prohibit opening of existing dicts
into which the deduplicator has stuffed enums with overlapping constants!
We just want to prohibit the addition of *new* enumerators that violate that
rule. Even then, it's fine to add overlapping enumerator constants as long
as at least one of them is in a non-root type. (This is essential for
proper deduplicator operation in cu-mapped mode, where multiple compilation
units can be smashed into one dict, with conflicting types marked as
hidden: these types may well contain overlapping enumerators.)
So, at open time, keep track of all enums observed, then do a third pass
through the enums alone, adding each enumerator either to the ctf_names
table as a mapping from the enumerator name to the enum it is part of (if
not already present), or to a new ctf_conflicting_enums hashtable that
tracks observed duplicates. (The latter is not used yet, but will be soon.)
(We need to do a third pass because it's quite possible to have an enum
containing an enumerator FOO followed by a type FOO: since they're processed
in order, the enumerator would be processed before the type, and at that
stage it seems nonconflicting. The easiest fix is to run through the
enumerators after all type names are interned.)
At ctf_add_enumerator time, if the enumerator to which we are adding a type
is root-visible, check for an already-present name and error out if found,
then intern the new name in the ctf_names table as is done at open time.
(We retain the existing code which scans the enum itself for duplicates
because it is still an error to add an enumerator twice to a
non-root-visible enum type; but we only need to do this if the enum is
non-root-visible, so the cost of enum addition is reduced.)
Tested in an upcoming commit.
libctf/
* ctf-impl.h (ctf_dict_t) <ctf_names>: Augment comment.
<ctf_conflicting_enums>: New.
(ctf_dynset_elements): New.
* ctf-hash.c (ctf_dynset_elements): Implement it.
* ctf-open.c (init_static_types): Split body into...
(init_static_types_internal): ... here. Count enumerators;
keep track of observed enums in pass 2; populate ctf_names and
ctf_conflicting_enums with enumerators in a third pass.
(ctf_dict_close): Free ctf_conflicting_enums.
* ctf-create.c (ctf_add_enumerator): Prohibit addition of duplicate
enumerators in root-visible enum types.
include/
* ctf-api.h (CTF_ADD_NONROOT): Describe what non-rootness
means for enumeration constants.
(ctf_add_enumerator): The name is not a misnomer.
We now require that enumerators have unique names.
Document the non-rootness of enumerators.
The libctf-regression/open-error-free.c test works by interposing malloc
and counting mallocs and frees across libctf operations. This only
works under suitably-interposable mallocs on systems supporting
dlsym (RTLD_NEXT, ...), so its operation is restricted to glibc
systems for now, but also it interacts badly with valgrind, which
interposes malloc itself. Detect a running valgrind and skip the test.
Add new facilities allowing libctf lookup tests to declare themselves
unsupported, by printing "UNSUPPORTED: " and then some meaningful
message instead of their normal output.
libctf/
* configure.ac: Check for <valgrind/valgrind.h>.
* config.h.in: Regenerate.
* configure: Likewise.
* testsuite/lib/ctf-lib.exp (run_lookup_test): Add support for
UNSUPPORTED tests.
* testsuite/libctf-regression/open-error-free.c: When running
under valgrind, this test is unsupported.
If a lookup fails for a reason unrelated to a lack of type data for this
symbol, we return with an error; but we fail to close the dict we opened
most recently, which is leaked.
libctf/
* ctf-archive.c (ctf_arc_lookup_sym_or_name): Close dict.
If ctf_add_type failed in the middle of enumerator addition, the
destination would end up containing the source enum type and some
but not all of its enumerator constants.
Use snapshots to roll back the enum addition as a whole if this happens.
Before now, it's been pretty unlikely, but in an upcoming commit we will ban
addition of enumerators that already exist in a given dict, making failure
of ctf_add_enumerator and thus of this part of ctf_add_type much more
likely.
libctf/
* ctf-create.c (ctf_add_type_internal): Roll back if enum or
enumerator addition fails.
The CTF deduplicator was not considering enumerators inside enum types to be
things that caused type conflicts, so if the following two TUs were linked
together, you would end up with the following in the resulting dict:
1.c:
enum foo { A, B };
2.c:
enum bar { A, B };
linked:
enum foo { A, B };
enum bar { A, B };
This does work -- but it's not something that's valid C, and the general
point of the shared dict is that it is something that you could potentially
get from any valid C TU.
So consider such types to be conflicting, but obviously don't consider
actually identical enums to be conflicting, even though they too have (all)
their identifiers in common. This involves surprisingly little code. The
deduplicator detects conflicting types by counting types in a hash table of
hash tables:
decorated identifier -> (type hash -> count)
where the COUNT is the number of times a given hash has been observed: any
name with more than one hash associated with it is considered conflicting
(the count is used to identify the most common such name for promotion to
the shared dict).
Before now, those identifiers were all the identifiers of types (possibly
decorated with their namespace on the front for enumerator identifiers), but
we can equally well put *enumeration constant names* in there, undecorated
like the identifiers of types in the global namespace, with the type hash
being the hash of each enum containing that enumerator. The existing
conflicting-type-detection code will then accurately identify distinct enums
with enumeration constants in common. The enum that contains the most
commonly-appearing enumerators will be promoted to the shared dict.
libctf/
* ctf-impl.h (ctf_dedup_t) <cd_name_counts>: Extend comment.
* ctf-dedup.c (ctf_dedup_count_name): New, split out of...
(ctf_dedup_populate_mappings): ... here. Call it for all
* enumeration constants in an enum as well as types.
ld/
* testsuite/ld-ctf/enum-3.c: New test CTF.
* testsuite/ld-ctf/enum-4.c: Likewise.
* testsuite/ld-ctf/overlapping-enums.d: New test.
* testsuite/ld-ctf/overlapping-enums-2.d: Likewise.
ctf_str_add_ref and ctf_str_add_movable_ref take a string and are supposed
to return a strtab offset. These offsets are "provisional": the ref
mechanism records the address of the location in which the ref is stored and
modifies it when the strtab is finally written out. Provisional refs in new
dicts start at 0 and go up via strlen() as new refs are added: this is fine,
because the strtab is empty and none of these values will overlap any
existing string offsets (since there are none). Unfortunately, when a dict
is ctf_open()ed, we fail to set the initial provisional strtab offset to a
higher value than any existing string offset: it starts at zero again!
It's a shame that we already *have* strings at those offsets...
This is all fixed up once the string is reserialized, but if you look up
newly-added strings before serialization, you get corrupted partial string
results from the existing ctf_open()ed dict.
Observed (and thus regtested) by an upcoming test (in this patch series).
Exposed by the recently-introduced series that permits modification of
ctf_open()ed dicts, which has not been released anywhere. Before that,
any attempt to do such things would fail with ECTF_RDONLY.
libctf/
* ctf-string.c (ctf_str_create_atoms): Initialize
ctf_str_prov_offset.
Ever since commit 1fa7a0c24e ("libctf: sort out potential refcount
loops") ctf_dict_close has only freed anything if the refcount on entry
to the function is precisely 1. >1 obviously just decrements the
refcount, but the linker machinery can sometimes cause freeing to recurse
from a dict to another dict and then back to the first dict again, so
we interpret a refcount of 0 as an indication that this is a recursive call
and we should just return, because a caller is already freeing this dict.
Unfortunately there is one situation in which this is not true: the bad:
codepath in ctf_bufopen entered when opening fails. Because the refcount is
bumped only at the very end of ctf_bufopen, any failure causes
ctf_dict_close to be entered with a refcount of zero, and it frees nothing
and we leak the entire dict.
The solution is to bump the refcount to 1 right before freeing... but this
codepath is clearly delicate enough that we need to properly validate it,
so we add a test that uses malloc interposition to count allocations and
frees, creates a dict, writes it out, intentionally corrupts it (by setting
a bunch of bytes after the header to a value high enough that it is
definitely not a valid CTF type kind), then tries to open it again and
counts the malloc/free pairs to make sure they're matched. (Test run only
on *-linux-gnu, because malloc interposition is not a thing you can rely
upon working everywhere, and this test is not arch-dependent so if it
passes on one arch it can be assumed to pass on all of them.)
libctf/
* ctf-open.c (ctf_bufopen): Bump the refcount on failure.
* testsuite/libctf-regression/open-error-free.*: New test.
This .lk option lets you run the lookup program via a wrapper executable.
For example, to run under valgrind and check for leaks (albeit noisily
because of the libtool shell script wrapper):
libctf/
* testsuite/lib/ctf-lib.exp (run_lookup_test): Add wrapper.
This .lk option lets you execute particular tests only on specific host
architectures.
libctf/
* testsuite/lib/ctf-lib.exp (run_lookup_test): Add host.
This .lk option lets you link the lookup program with extra libraries
in addition to -lctf.
libctf/
* testsuite/lib/ctf-lib.exp (run_lookup_test): Add lookup_link.
If iteration fails because opening a dict has failed, ctf_archive_next does
not destroy the iterator, so the caller can keep going and try to open other
dicts further into the archive. ctf_archive_iter just returns, though, so
it should free the iterator rather than leaking it.
libctf/
* ctf-archive.c (ctf_archive_iter): Don't leak the iterator on
failure.
CTF archive member opening (via ctf_arc_open_by_name, ctf_archive_iter, et
al) attempts to be helpful and auto-open and import any needed parent dict
in the same archive. But if this fails, the error is not reported but
simply discarded, and you silently get back a dict with no parent, that
*you* suddenly have to remember to import.
This is not helpful behaviour: if the parent is corrupted or we run out of
memory or something, the caller is going to want to know! Split it in two:
if the dict cites a parent that doesn't exist at all (a lot of historic
dicts name "PARENT" as their parent, even when they're not even children, or
perhaps the parent dict is stored separately and you plan to manually
associate it), we skip it as now, but if the import fails with an actual
error other than ECTF_ARNNAME, return the error and fail the open.
libctf/
* ctf-archive.c (ctf_arc_import_parent): Return failure if
parent opening fails for reasons other thnn nonexistence.
(ctf_dict_open_sections): Adjust.
Seen on 64-bit targets.
ERROR: compilation of lookup program .../libctf-regression/gzrewrite.c failed
* testsuite/libctf-regression/gzrewrite.c (main): Use %zu to
print size_t values.
* testsuite/libctf-regression/zrewrite.c (main): Likewise.
libctf's version script is applied to two libraries: libctf.so,
and libctf-nobfd.so. The latter library is a subset of the former
which does not link to libbfd and does not include a few public
entry points that use it (found in libctf-open-bfd.c). This means
that some of the symbols in this version script only exist in one
of the libraries it's applied to.
A number of linkers dislike this: before now, only Solaris's linker
caused serious problems, introducing NOTYPE-typed symbols when such
things were found, but now LLD has started to complain as well:
ld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_arc_open' failed: symbol not defined
ld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_fdopen' failed: symbol not defined
ld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_open' failed: symbol not defined
ld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_bfdopen' failed: symbol not defined
ld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_bfdopen_ctfsect' failed: symbol not defined
Rather than adding more and more whack-a-mole fixes for every
linker we encounter that does this, simply exclude such symbols
unconditionally, using the same trick we used to use for Solaris.
(Well, unconditionally if we can use version scripts with this
linker at all, which is not always the case.)
Thanks to Nicholas Vinson for the original report and a fix very
similar to this one (but not quite identical).
libctf/
* configure.ac: Always exclude libctf symbols from
libctf-nobfd's version script.
* configure: Regenerated.
Starting with ld.lld-17, ld.lld is invoked with the option
--no-undefined-version enabled by default. Furthermore, The functions
ctf_label_set() and ctf_label_get() are not defined. Their inclusion in
libctf/libctf.ver causes ld.lld-17 to fail emitting the following error
messages:
ld.lld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_label_set' failed: symbol not defined
ld.lld: error: version script assignment of 'LIBCTF_1.0' to symbol 'ctf_label_get' failed: symbol not defined
This patch fixes the issue by removing the symbol names from
libctf/libctf.ver.
[nca: fused in later commit that marked ctf_arc_open as libctf
only as well. Added ChangeLog entry.]
Signed-off-by: Nicholas Vinson <nvinson234@gmail.com>
libctf/
* libctf.ver: drop nonexistent label functions: mark
ctf_arc_open as libctf-only.
The libctf-internal warning function ctf_err_warn() can be passed a libctf
errno as a parameter, and will add its textual errmsg form to the passed-in
error message. But if there is an error on the fp already, and this is
specifically an error and not a warning, ctf_err_warn() will print the error
out regardless: there's no need to pass in anything but 0.
There are still a lot of places where we do
ctf_err_warn (fp, 0, EFOO, ...);
return ctf_set_errno (fp, 0, EFOO);
I've left all of those alone, because fixing it makes the code a bit longer:
but fixing the cases where no return is involved and the error has just been
set on the fp itself costs nothing and reduces redundancy a bit.
libctf/
* ctf-dedup.c (ctf_dedup_walk_output_mapping): Drop the errno arg.
(ctf_dedup_emit): Likewise.
(ctf_dedup_type_mapping): Likewise.
* ctf-link.c (ctf_create_per_cu): Likewise.
(ctf_link_deduplicating_close_inputs): Likewise.
(ctf_link_deduplicating_one_symtypetab): Likewise.
(ctf_link_deduplicating_per_cu): Likewise.
* ctf-lookup.c (ctf_lookup_symbol_idx): Likewise.
* ctf-subr.c (ctf_assert_fail_internal): Likewise.
This purely serves to make it easier to interpret valgrind output.
No functional effect.
libctf/
* testsuite/libctf-lookup/conflicting-type-syms.c: Free everything.
Now there's a chance of it actually working, we can add more tests for
the long-broken dict read-and-rewrite cases. This is the first ever
test for the (rarely-used, unpleasant, and until recently completely
broken) ctf_gzwrite function.
libctf/
* testsuite/libctf-regression/gzrewrite*: New test.
* testsuite/libctf-regression/zrewrite*: Likewise.
In particular, we don't need a symbol table if we're looking up a
symbol by name and that type of symbol has an indexed symtypetab,
since in that case we get the name from the symtypetab index, not
from the symbol table.
This lets you do symbol lookups in unlinked object files and unlinked
dicts written out via libctf's writeout functions.
libctf/
* ctf-lookup.c (ctf_lookup_by_sym_or_name): Allow lookups
by index even when there is no symtab.
When dumping a type fails with an error, we want to emit a warning noting
this: a warning because it's not fatal and we can continue. But warnings
don't automatically print out the ctf_errno (because not all cases causing
warnings set the errno at all), so we must do it at warning-emission time or
lose track of what's gone wrong.
libctf/
* ctf-dump.c (ctf_dump_format_type): Dump the underlying error on
type dump failure.
Without this, you might get things like this in the output:
Flags: 0xa (CTF_F_NEWFUNCINFO, , CTF_F_DYNSTR)
Note the spurious comma.
libctf/
* ctf-dump.c (ctf_dump_header): Fix comma emission.
ctf_serialize() evolved from the old ctf_update(), which mutated the
in-memory CTF dict to make all the dynamic in-memory types into static,
unchanging written-to-the-dict types (by deserializing and reserializing
it): back in the days when you could only do type lookups on static types,
this meant you could see all the types you added recently, at the small,
small cost of making it impossible to change those older types ever again
and inducing an amortized O(n^2) cost if you actually wanted to add
references to types you added at arbitrary times to later types.
It also reset things so that ctf_discard() would throw away only types you
added after the most recent ctf_update() call.
Some time ago this was all changed so that you could look up dynamic types
just as easily as static types: ctf_update() changed so that only its
visible side-effect of affecting ctf_discard() remained: the old
ctf_update() was renamed to ctf_serialize(), made internal to libctf, and
called from the various functions that wrote files out.
... but it was still working by serializing and deserializing the entire
dict, swapping out its guts with the newly-serialized copy in an invasive
and horrible fashion that coupled ctf_serialize() to almost every field in
the ctf_dict_t. This is totally useless, and fixing it is easy: just rip
all that code out and have ctf_serialize return a serialized representation,
and let everything use that directly. This simplifies most of its callers
significantly.
(It also points up another bug: ctf_gzwrite() failed to call ctf_serialize()
at all, so it would only ever work for a dict you just ctf_write_mem()ed
yourself, just for its invisible side-effect of serializing the dict!)
This lets us simplify away a bunch of internal-only open-side functionality
for overriding the syn_ext_strtab and some just-added functionality for
forcing in an existing atoms table, without loss of functionality, and lets
us lift the restriction on reserializing a dict that was ctf_open()ed rather
than being ctf_create()d: it's now perfectly OK to open a dict, modify it
(except for adding members to existing structs, unions, or enums, which
fails with -ECTF_RDONLY), and write it out again, just as one would expect.
libctf/
* ctf-serialize.c (ctf_symtypetab_sect_sizes): Fix typos.
(ctf_type_sect_size): Add static type sizes too.
(ctf_serialize): Return the new dict rather than updating the
existing dict. No longer fail for dicts with static types;
copy them onto the start of the new types table.
(ctf_gzwrite): Actually serialize before gzwriting.
(ctf_write_mem): Improve forced (test-mode) endian-flipping:
flip dicts even if they are too small to be compressed.
Improve confusing variable naming.
* ctf-archive.c (arc_write_one_ctf): Don't bother to call
ctf_serialize: both the functions we call do so.
* ctf-string.c (ctf_str_create_atoms): Drop serializing case
(atoms arg).
* ctf-open.c (ctf_simple_open): Call ctf_bufopen directly.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Delete/rename to ctf_bufopen: no
longer bother with syn_ext_strtab or forced atoms table,
serialization no longer needs them.
* ctf-create.c (ctf_create): Call ctf_bufopen directly.
* ctf-impl.h (ctf_str_create_atoms): Drop atoms arg.
(ctf_simple_open_internal): Delete.
(ctf_bufopen_internal): Likewise.
(ctf_serialize): Adjust.
* testsuite/libctf-lookup/add-to-opened.c: Adjust now that
this is supposed to work.
This commit finally adjusts strtab writeout so that repeated writeouts, or
writeouts of a dict that was read in earlier, only sorts the portion of the
strtab that was newly added.
There are three intertwined changes here:
- pull the contents of strtabs from newly ctf_bufopened dicts into the
atoms table, so that future additions will reuse the existing offset etc
rather than adding new identical strings
- allow the internal ctf_bufopen done by serialization to contribute its
existing atoms table, so that existing atoms can be used for the
remainder of the open process (like name table construction): this atoms
table currente gets thrown away in the mass reassignment done later in
ctf_serialize in any case, but it needs to be there during the open.
- rewrite ctf_str_write_strtab so that a) it uses iterators rather than
ctf_*_iter, reducing pointless structures which serve no other purpose
than to implement ordinary variable scope, but more clunkily, and b)
retains the existing strtab on the front of the new one, with its sort
retained, rather than resorting, so all existing already-written strtab
offsets remain valid across the call.
This latter change finally permits repeated serializations, and
reserializations of ctf_open()ed dicts, to work, but for now we keep the
code that prevents that because serialization is about to change again in a
way that will make it more obvious that doing such things is safe, and we
can take it out then.
(There are also some smaller changes like moving the purge of the refs table
into ctf_str_write_strtab(), since that's where the changes happen that
invalidate it, rather than doing it in ctf_serialize(). We also prohibit
something that has never worked, opening a dict and then reporting symbols
to it via ctf_link_add_strtab() et al: you must do that to newly-created
dicts which have had stuff ctf_link()ed into them. This is very unlikely
ever to be a problem in practice: linkers just don't do that sort of thing.)
libctf/
* ctf-create.c (ctf_create): Add (temporary) atoms arg.
* ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
(ctf_str_create_atoms): Adjust.
(ctf_str_write_strtab): Likewise.
(ctf_simple_open_internal): Likewise.
* ctf-open.c (ctf_simple_open_internal): Add atoms arg.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Initialize just enough of an
atoms table: pre-init from the atoms arg if supplied.
(ctf_simple_open): Adjust.
* ctf-serialize.c (ctf_serialize): Constify the strtab.
Move ref list purging into ctf_str_write_strtab.
Initialize the new dict with the old dict's atoms table.
Accept the new strtab from ctf_str_write_strtab.
Adjust for addition of ctf_dynstrtab.
* ctf-string.c (ctf_strraw_explicit): Improve comments.
(ctf_str_create_atoms): Prepopulate from an existing atoms table,
or alternatively pull in all strings from the strtab and turn
them into atoms.
(ctf_str_free_atoms): Free the dynstrtab and its strtab.
(struct ctf_strtab_write_state): Remove.
(ctf_str_count_strtab): Fold this...
(ctf_str_populate_sorttab): ... and this...
(ctf_str_write_strtab): ... into this. Prepend existing strings
to the strtab rather than resorting them (and wrecking their
offsets). Keep the dynstrtab updated. Update refs for all
atoms with refs, whether or not they are strings newly added
to the strtab.
A few years ago we introduced a 'pending refs' abstraction to fix one
problem: serializing a dict, then changing it would tend to corrupt the dict
because the strtab sort we do on strtab writeout (to improve compression
efficiency) would modify the offset of any strings that sorted
lexicographically earlier in the strtab: so we added a new restriction that
all strings are added only at serialization time, and maintained a set of
'pending' refs that were added earlier, whose offsets we could update (like
other refs) at writeout time.
This was in hindsight seriously problematic for maintenance (because
serialization has to traverse all strings in all datatypes in the entire
dict), and has become impossible to sustain now that we can read in existing
dicts, modify them, and reserialize them again. We really don't want to
have to dig through the entire dict we jut read in just in order to dig out
all its strtab offsets, then *change* it, just for the sake of a sort that
adds a frankly trivial amount of compression efficiency.
Sorting *is* still worthwhile -- but it sacrifices very little to only sort
newly-added portions of the strtab, reusing older portions as necessary.
As a first stage in this, discard the whole "pending refs" abstraction and
replace it with "movable" refs, which are exactly like all other refs
(addresses containing the strtab offset of some string, which are updated
wiht the final strtab offset on serialization) except that we track them in
a reverse dict so that we can move the refs around (which we do whenever we
realloc() a buffer containing a bunch of structure members or something when
we add members to the structure).
libctf/
* ctf-create.c (ctf_add_enumerator): Call ctf_str_move_refs; add
a movable ref.
(ctf_add_member_offset): Likewise.
* ctf-util.c (ctf_realloc): Delete.
* ctf-serialize.c (ctf_serialize): No longer use it. Adjust to
new fields.
* ctf-string.c (ctf_str_purge_atom_refs): Purge movable refs.
(ctf_str_free_atom): Free freeable atoms' strings.
(ctf_str_create_atoms): Create the movable refs dynhash if needed.
(ctf_str_free_atoms): Destroy it.
(CTF_STR_MOVABLE): Switch (back) from ints to flags (see previous
reversion). Add new flag.
(aref_create): New, populate movable refs if need be.
(ctf_str_add_ref_internal): Switch back to flags, update refs
directly for nonprovisional strings (with already-known fixed offsets);
create refs via aref_create. Allocate strings only if not within an
mmapped strtab.
(ctf_str_add_movable_ref): New.
(ctf_str_add): Adjust to CTF_STR_* reintroduction.
(ctf_str_add_external): LIkewise.
(ctf_str_move_refs): New, move refs via ctf_str_movable_refs
backpointer.
(ctf_str_purge_refs): Drop ctf_str_num_refs.
(ctf_str_update_refs): Fix indentation.
* ctf-impl.h (struct ctf_str_atom_movable): New.
(struct ctf_dict.ctf_str_num_refs): Drop.
(struct ctf_dict.ctf_str_movable_refs): New.
(ctf_str_add_movable_ref): Declare.
(ctf_str_move_refs): Likewise.
(ctf_realloc): Drop.
This reverts commit 986e9e3aa0.
(We do not revert the testcase -- it remains valid -- but we are
taking a different, less complex and more robust approach.)
This also deletes the pending refs abstraction without (yet)
replacing it, so some tests will fail for a commit or two.
These two fields are constantly confusing because CTF dicts contain both
a symtypetab and strtab, but these fields are not that: they are the
symtab and strtab from the ELF file. We have enough string tables now
(internal, external, synthetic external, dynamic) that we need to at
least name them better than this to avoid getting totally confused.
Rename them to ctf_ext_symtab and ctf_ext_strtab.
libctf/
* ctf-dump.c (ctf_dump_objts): Rename ctf_symtab -> ctf_ext_symtab.
* ctf-impl.h (struct ctf_dict.ctf_symtab): Rename to...
(struct ctf_dict.ctf_ext_strtab): ... this.
(struct ctf_dict.ctf_strtab): Rename to...
(struct ctf_dict.ctf_ext_strtab): ... this.
* ctf-lookup.c (ctf_lookup_symbol_name): Adapt.
(ctf_lookup_symbol_idx): Adapt.
(ctf_lookup_by_sym_or_name): Adapt.
* ctf-open.c (ctf_bufopen_internal): Adapt.
(ctf_dict_close): Adapt.
(ctf_getsymsect): Adapt.
(ctf_getstrsect): Adapt.
(ctf_symsect_endianness): Adapt.
This flag was meant as an optimization to avoid reserializing dicts
unnecessarily. It was critically necessary back when serialization was
done by ctf_update() and you had to call that every time you wanted any
new modifications to the type table to be usable by other types, but
that has been unnecessary for years now, and serialization is only done
once when writing out, which one would naturally assume would always
serialize the dict. Worse, it never really worked: it only tracked
newly-added types, not things like added symbols which might equally
well require reserialization, and it gets in the way of an upcoming
change. Delete entirely.
libctf/
* ctf-create.c (ctf_create): Drop LCTF_DIRTY.
(ctf_discard): Likewise.
(ctf_rollback): Likewise.
(ctf_add_generic): Likewise.
(ctf_set_array): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_variable_forced): Likewise.
* ctf-link.c (ctf_link_intern_extern_string): Likewise.
(ctf_link_add_strtab): Likewise.
* ctf-serialize.c (ctf_serialize): Likewise.
* ctf-impl.h (LCTF_DIRTY): Likewise.
(LCTF_LINKING): Renumber.
A mistaken "not" in ctf_err_warn made it seem like we only extracted
error messages if this was not an error.
libctf/
* ctf-subr.c (ctf_err_warn): Fix comment.
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
The intent of the name lookup code was for lookups to yield non-bitfield
basic types except if none existed with a given name, and only then
return bitfield types with that name. Unfortunately, the code as
written only does this if the base type has a type ID higher than all
bitfield types, which is most unlikely (the opposite is almost always
the case).
Adjust it so that what ends up in the name table is the highest-width
zero-offset type with a given name, if any such exist, and failing that
the first type with that name we see, no matter its offset. (We don't
define *which* bitfield type you get, after all, so we might as well
just stuff in the first we find.)
Reported by Stephen Brennan <stephen.brennan@oracle.com>.
libctf/
* ctf-open.c (init_types): Modify to allow some lookups during open;
detect bitfield name reuse and prefer less bitfieldy types.
* testsuite/libctf-writable/libctf-bitfield-name-lookup.*: New test.
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).
Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.
This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.
That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.
(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)
libctf/
* ctf-create.c (ctf_create): Eliminate ctn_writable.
(ctf_dtd_insert): Likewise.
(ctf_dtd_delete): Likewise.
(ctf_rollback): Likewise.
(ctf_name_table): Eliminate ctf_names_t.
* ctf-hash.c (ctf_dynhash_create): Comment update.
Reimplement in terms of...
(ctf_dynhash_create_sized): ... this new function.
(ctf_hash_create): Remove.
(ctf_hash_size): Remove.
(ctf_hash_define_type): Remove.
(ctf_hash_destroy): Remove.
(ctf_hash_lookup_type): Rename to...
(ctf_dynhash_lookup_type): ... this.
(ctf_hash_insert_type): Rename to...
(ctf_dynhash_insert_type): ... this, moving validation to...
* ctf-string.c (ctf_strptr_validate): ... this new function.
* ctf-impl.h (struct ctf_names): Extirpate.
(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
(ctf_name_table): Now returns a ctf_dynhash_t.
(ctf_lookup_by_rawhash): Remove.
(ctf_hash_create): Likewise.
(ctf_hash_insert_type): Likewise.
(ctf_hash_define_type): Likewise.
(ctf_hash_lookup_type): Likewise.
(ctf_hash_size): Likewise.
(ctf_hash_destroy): Likewise.
(ctf_dynhash_create_sized): New.
(ctf_dynhash_insert_type): New.
(ctf_dynhash_lookup_type): New.
(ctf_strptr_validate): New.
* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
* ctf-open.c (init_types): Adapt.
(ctf_set_ctl_hashes): Adapt.
(ctf_dict_close): Adapt.
* ctf-serialize.c (ctf_serialize): Adapt.
* ctf-types.c (ctf_lookup_by_rawhash): Remove.
This cache replaced a cache of symbol index->ctf_id_t. That cache was
just an array, so it could get away with just being free()d, but the
ctfi_symnamedicts cache that replaced it is a full dynhash with a
dynamically-allocated string as the key. As such, it needs freeing with
ctf_dynhash_destroy(), not just free(), or we leak parts of the
underlying hashtab, and all the keys.
libctf/ChangeLog:
* ctf-archive.c (ctf_arc_flush_caches): Fix leak.
Seen with every compiler I have if using -fno-inline:
home/alan/src/binutils-gdb/libctf/ctf-create.c: In function ‘ctf_add_encoded’:
/home/alan/src/binutils-gdb/libctf/ctf-create.c:555:3: warning: ‘encoding’ may be used uninitialized [-Wmaybe-uninitialized]
555 | memcpy (dtd->dtd_vlen, &encoding, sizeof (encoding));
Seen with gcc-4.9 and probably others at lower optimisation levels:
home/alan/src/binutils-gdb/libctf/ctf-serialize.c: In function 'symtypetab_density':
/home/alan/src/binutils-gdb/libctf/ctf-serialize.c:211:18: warning: 'sym' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (*max < sym->st_symidx)
Seen with gcc-4.5 and probably others at lower optimisation levels:
/home/alan/src/binutils-gdb/libctf/ctf-types.c:1649:21: warning: 'tp' may be used uninitialized in this function
/home/alan/src/binutils-gdb/libctf/ctf-link.c:765:16: warning: 'parent_i' may be used uninitialized in this function
Also with gcc-4.5:
In file included from /home/alan/src/binutils-gdb/libctf/ctf-endian.h:25:0,
from /home/alan/src/binutils-gdb/libctf/ctf-archive.c:24:
/home/alan/src/binutils-gdb/libctf/swap.h:70:0: warning: "_Static_assert" redefined
/usr/include/sys/cdefs.h:568:0: note: this is the location of the previous definition
* swap.h (_Static_assert): Don't define if already defined.
* ctf-serialize.c (symtypetab_density): Merge two
CTF_SYMTYPETAB_FORCE_INDEXED blocks.
* ctf-create.c (ctf_add_encoded): Avoid "encoding" may be used
uninitialized warning.
* ctf-link.c (ctf_link_deduplicating_open_inputs): Avoid
"parent_i" may be used uninitialized warning.
* ctf-types.c (ctf_type_rvisit): Avoid "tp" may be used
uninitialized warning.
Just because a path is an error path doesn't mean the program terminates
there if you don't ask it to. And we don't want to -- but that means
we need to initialize the variables that are missed if an error happens to
*something*. Type ID 0 (unimplemented) will do: it'll induce further
ECTF_BADID errors, but that's no bad thing.
libctf/ChangeLog:
* testsuite/libctf-writable/libctf-errors.c: Initialize variables.