Commit graph

10 commits

Author SHA1 Message Date
Joel Brobecker
213516ef31 Update copyright year range in header of all files managed by GDB
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
2023-01-01 17:01:16 +04:00
Tom Tromey
55fc1623f9 Add name canonicalization for C
PR symtab/29105 shows a number of situations where symbol lookup can
result in the expansion of too many CUs.

What happens is that lookup_signed_typename will try to look up a type
like "signed int".  In cooked_index_functions::expand_symtabs_matching,
when looping over languages, the C++ case will canonicalize this type
name to be "int" instead.  Then this method will proceed to expand
every CU that has an entry for "int" -- i.e., nearly all of them.  A
crucial component of this is that the caller, objfile::lookup_symbol,
does not do this canonicalization, so when it tries to find the symbol
for "signed int", it fails -- causing the loop to continue.

This patch fixes the problem by introducing name canonicalization for
C.  The idea here is that, by making C and C++ agree on the canonical
name when a symbol name can have multiple spellings, we avoid the bad
behavior in objfile::lookup_symbol (and any other such code -- I don't
know if there is any).

Unlike C++, C only has a few situations where canonicalization is
needed.  And, in particular, due to the lack of overloading (thus
avoiding any issues in linespec) and due to the way c-exp.y works, I
think that no canonicalization is needed during symbol lookup -- only
during symtab construction.  This explains why lookup_name_info is not
touched.

The stabs reader is modified on a "best effort" basis.

The DWARF reader needed one small tweak in dwarf2_name to avoid a
regression in dw2-unusual-field-names.exp.  I think this is adequately
explained by the comment, but basically this is a scenario that should
not occur in real code, only the gdb test suite.

lookup_signed_typename is simplified.  It used to search for two
different type names, but now gdb can search just for the canonical
form.

gdb.dwarf2/enum-type.exp needed a small tweak, because the
canonicalizer turns "unsigned integer" into "unsigned int integer".
It seems better here to use the correct C type name.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
Tested-by: Simon Marchi <simark@simark.ca>
Reviewed-by: Andrew Burgess <aburgess@redhat.com>
2022-12-01 11:16:41 -07:00
Tom Tromey
bed34ce705 Refactor cooked_index::do_finalize
This refactors cooked_index::do_finalize, reordering an 'if' to make
it a little less redundant.  This change makes a subsequent patch
easier to read.

Reviewed-by: Andrew Burgess <aburgess@redhat.com>
2022-12-01 11:16:41 -07:00
Tom de Vries
2c474c4694 [gdb/symtab] Add get/set functions for per_cu->lang/unit_type
The dwarf2_per_cu_data fields lang and unit_type both have a dont-know
initial value (respectively language_unknown and (dwarf_unit_type)0), which
allows us to add certain checks, f.i. checking that that a field is not read
before written.

Add get/set member functions for the two fields as a convenient location to
add such checks, make the fields private to enforce using the member
functions, and add the m_ prefix.

Tested on x86_64-linux.
2022-07-04 10:28:42 +02:00
Tom Tromey
20a26f4e01 Finalize each cooked index separately
After DWARF has been scanned, the cooked index code does a
"finalization" step in a worker thread.  This step combines all the
index entries into a single master list, canonicalizes C++ names, and
splits Ada names to synthesize package names.

While this step is run in the background, gdb will wait for the
results in some situations, and it turns out that this step can be
slow.  This is PR symtab/29105.

This can be sped up by parallelizing, at a small memory cost.  Now
each index is finalized on its own, in a worker thread.  The cost
comes from name canonicalization: if a given non-canonical name is
referred to by multiple indices, there will be N canonical copies (one
per index) rather than just one.

This requires changing the users of the index to iterate over multiple
results.  However, this is easily done by introducing a new "chained
range" class.

When run on gdb itself, the memory cost seems rather low -- on my
current machine, "maint space 1" reports no change due to the patch.

For performance testing, using "maint time 1" and "file" will not show
correct results.  That approach measures "time to next prompt", but
because the patch only affects background work, this shouldn't (and
doesn't) change.  Instead, a simple way to make gdb wait for the
results is to set a breakpoint.

Before:

    $ /bin/time -f%e ~/gdb/install/bin/gdb -nx -q -batch \
        -ex 'break main' /tmp/gdb
    Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28.
    2.00

After:

    $ /bin/time -f%e ./gdb/gdb -nx -q -batch \
        -ex 'break main' /tmp/gdb
    Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28.
    0.65

Regression tested on x86-64 Fedora 34.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
2022-05-26 07:35:30 -06:00
Tom Tromey
72b580b8f4 Micro-optimize cooked_index_entry::full_name
I noticed that cooked_index_entry::full_name can return the canonical
string when there is no parent entry.

Regression tested on x86-64 Fedora 34.
2022-04-20 06:21:06 -06:00
Simon Marchi
a8b7a13911 gdb: fix "passing NULL to memcpy" UBsan error in dwarf2/cooked-index.c
Reading a simple file compiled with :

    $ gcc -DONE=1 -gdwarf-4 -g3  test.c
    $ gcc --version
    gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0

I get:

    Reading symbols from /tmp/cwd/a.out...
    /home/smarchi/src/binutils-gdb/gdb/dwarf2/cooked-index.c:332:11: runtime error: null pointer passed as argument 2, which is declared to never be null

It looks like even if the size is 0 (the size of the `entries` vector is
0), we shouldn't be passing a NULL pointer to memcpy.  And
`entries.data ()` returns NULL.

Fix that by using std::vector::insert to insert the items of entries
into m_entries.  I haven't checked, but it should essentially compile
down to a memcpy, since the vector elements are trivially copyiable.

Change-Id: I75f1c901e9b522e42e89eb5936e2c70d68eb21e5
2022-04-12 14:42:02 -04:00
Tom Tromey
7e75279093 "Finalize" the DWARF index in the background
After scanning the CUs, the DWARF indexer merges all the data into a
single vector, canonicalizing C++ names as it proceeds.  While not
necessarily single-threaded, this process is currently done in just
one thread, to keep memory costs lower.

However, this work is all done without reference to any data outside
of the indexes.  This patch improves the apparent performance of GDB
by moving it to the background.  All uses of the index are then made
to wait for this process to complete.

In our ongoing example, this reduces the scanning time on gdb itself
to 0.173937 (wall).  Recall that before this patch, the time was
0.668923; and psymbol reader does this in 1.598869.  That is, at the
end of this series, we see about a 10x speedup.
2022-04-12 09:31:16 -06:00
Tom Tromey
46114cb7be Parallelize DWARF indexing
This parallelizes the new DWARF indexer.  The indexer's storage was
designed so that each storage object and each indexer is fully
independent.  This setup makes it simple to scan different CUs
independently.

This patch creates a new cooked index storage object per thread, and
then scans a subset of all the CUs in each such thread, using gdb's
existing thread pool.

In the ongoing "gdb gdb" example, this patch reduces the wall time
down to 0.668923, from 0.903534.  (Note that the 0.903534 is the time
for the new index -- that is, when the "enable the new index" patch is
rebased to before this one.  However, in the final series, that patch
appears toward the end.  Hopefully this isn't too confusing.)
2022-04-12 09:31:16 -06:00
Tom Tromey
51f5a4b8e9 Introduce the new DWARF index class
This patch introduces the new DWARF index class.  It is called
"cooked" to contrast against a "raw" index, which is mapped from disk
without extra effort.

Nothing constructs a cooked index yet.  The essential idea here is
that index entries are created via the "add" method; then when all the
entries have been read, they are "finalize"d -- name canonicalization
is performed and the entries are added to a sorted vector.

Entries use the DWARF name (DW_AT_name) or linkage name, not the full
name as is done for partial symbols.

These two facets -- the short name and the deferred canonicalization
-- help improve the performance of this approach.  This will become
clear in later patches, when parallelization is added.

Some special code is needed for Ada, because GNAT only emits mangled
("encoded", in the Ada lingo) names, and so we reconstruct the
hierarchical structure after the fact.  This is also done in the
finalization phase.

One other aspect worth noting is that the way the "main" function is
found is different in the new code.  Currently gdb will notice
DW_AT_main_subprogram, but won't recognize "main" during reading --
this is done later, via explicit symbol lookup.  This is done
differently in the new code so that finalization can be done in the
background without then requiring a synchronization to look up the
symbol.
2022-04-12 09:31:16 -06:00