binutils-gdb

Author	SHA1	Message	Date
Joel Brobecker	213516ef31	Update copyright year range in header of all files managed by GDB This commit is the result of running the gdb/copyright.py script, which automated the update of the copyright year range for all source files managed by the GDB project to be updated to include year 2023.	2023-01-01 17:01:16 +04:00
Tom Tromey	55fc1623f9	Add name canonicalization for C PR symtab/29105 shows a number of situations where symbol lookup can result in the expansion of too many CUs. What happens is that lookup_signed_typename will try to look up a type like "signed int". In cooked_index_functions::expand_symtabs_matching, when looping over languages, the C++ case will canonicalize this type name to be "int" instead. Then this method will proceed to expand every CU that has an entry for "int" -- i.e., nearly all of them. A crucial component of this is that the caller, objfile::lookup_symbol, does not do this canonicalization, so when it tries to find the symbol for "signed int", it fails -- causing the loop to continue. This patch fixes the problem by introducing name canonicalization for C. The idea here is that, by making C and C++ agree on the canonical name when a symbol name can have multiple spellings, we avoid the bad behavior in objfile::lookup_symbol (and any other such code -- I don't know if there is any). Unlike C++, C only has a few situations where canonicalization is needed. And, in particular, due to the lack of overloading (thus avoiding any issues in linespec) and due to the way c-exp.y works, I think that no canonicalization is needed during symbol lookup -- only during symtab construction. This explains why lookup_name_info is not touched. The stabs reader is modified on a "best effort" basis. The DWARF reader needed one small tweak in dwarf2_name to avoid a regression in dw2-unusual-field-names.exp. I think this is adequately explained by the comment, but basically this is a scenario that should not occur in real code, only the gdb test suite. lookup_signed_typename is simplified. It used to search for two different type names, but now gdb can search just for the canonical form. gdb.dwarf2/enum-type.exp needed a small tweak, because the canonicalizer turns "unsigned integer" into "unsigned int integer". It seems better here to use the correct C type name. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105 Tested-by: Simon Marchi <simark@simark.ca> Reviewed-by: Andrew Burgess <aburgess@redhat.com>	2022-12-01 11:16:41 -07:00
Tom Tromey	bed34ce705	Refactor cooked_index::do_finalize This refactors cooked_index::do_finalize, reordering an 'if' to make it a little less redundant. This change makes a subsequent patch easier to read. Reviewed-by: Andrew Burgess <aburgess@redhat.com>	2022-12-01 11:16:41 -07:00
Tom de Vries	2c474c4694	[gdb/symtab] Add get/set functions for per_cu->lang/unit_type The dwarf2_per_cu_data fields lang and unit_type both have a dont-know initial value (respectively language_unknown and (dwarf_unit_type)0), which allows us to add certain checks, f.i. checking that that a field is not read before written. Add get/set member functions for the two fields as a convenient location to add such checks, make the fields private to enforce using the member functions, and add the m_ prefix. Tested on x86_64-linux.	2022-07-04 10:28:42 +02:00
Tom Tromey	20a26f4e01	Finalize each cooked index separately After DWARF has been scanned, the cooked index code does a "finalization" step in a worker thread. This step combines all the index entries into a single master list, canonicalizes C++ names, and splits Ada names to synthesize package names. While this step is run in the background, gdb will wait for the results in some situations, and it turns out that this step can be slow. This is PR symtab/29105. This can be sped up by parallelizing, at a small memory cost. Now each index is finalized on its own, in a worker thread. The cost comes from name canonicalization: if a given non-canonical name is referred to by multiple indices, there will be N canonical copies (one per index) rather than just one. This requires changing the users of the index to iterate over multiple results. However, this is easily done by introducing a new "chained range" class. When run on gdb itself, the memory cost seems rather low -- on my current machine, "maint space 1" reports no change due to the patch. For performance testing, using "maint time 1" and "file" will not show correct results. That approach measures "time to next prompt", but because the patch only affects background work, this shouldn't (and doesn't) change. Instead, a simple way to make gdb wait for the results is to set a breakpoint. Before: $ /bin/time -f%e ~/gdb/install/bin/gdb -nx -q -batch \ -ex 'break main' /tmp/gdb Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28. 2.00 After: $ /bin/time -f%e ./gdb/gdb -nx -q -batch \ -ex 'break main' /tmp/gdb Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28. 0.65 Regression tested on x86-64 Fedora 34. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105	2022-05-26 07:35:30 -06:00
Tom Tromey	72b580b8f4	Micro-optimize cooked_index_entry::full_name I noticed that cooked_index_entry::full_name can return the canonical string when there is no parent entry. Regression tested on x86-64 Fedora 34.	2022-04-20 06:21:06 -06:00
Simon Marchi	a8b7a13911	gdb: fix "passing NULL to memcpy" UBsan error in dwarf2/cooked-index.c Reading a simple file compiled with : $ gcc -DONE=1 -gdwarf-4 -g3 test.c $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 I get: Reading symbols from /tmp/cwd/a.out... /home/smarchi/src/binutils-gdb/gdb/dwarf2/cooked-index.c:332:11: runtime error: null pointer passed as argument 2, which is declared to never be null It looks like even if the size is 0 (the size of the `entries` vector is 0), we shouldn't be passing a NULL pointer to memcpy. And `entries.data ()` returns NULL. Fix that by using std::vector::insert to insert the items of entries into m_entries. I haven't checked, but it should essentially compile down to a memcpy, since the vector elements are trivially copyiable. Change-Id: I75f1c901e9b522e42e89eb5936e2c70d68eb21e5	2022-04-12 14:42:02 -04:00
Tom Tromey	7e75279093	"Finalize" the DWARF index in the background After scanning the CUs, the DWARF indexer merges all the data into a single vector, canonicalizing C++ names as it proceeds. While not necessarily single-threaded, this process is currently done in just one thread, to keep memory costs lower. However, this work is all done without reference to any data outside of the indexes. This patch improves the apparent performance of GDB by moving it to the background. All uses of the index are then made to wait for this process to complete. In our ongoing example, this reduces the scanning time on gdb itself to 0.173937 (wall). Recall that before this patch, the time was 0.668923; and psymbol reader does this in 1.598869. That is, at the end of this series, we see about a 10x speedup.	2022-04-12 09:31:16 -06:00
Tom Tromey	46114cb7be	Parallelize DWARF indexing This parallelizes the new DWARF indexer. The indexer's storage was designed so that each storage object and each indexer is fully independent. This setup makes it simple to scan different CUs independently. This patch creates a new cooked index storage object per thread, and then scans a subset of all the CUs in each such thread, using gdb's existing thread pool. In the ongoing "gdb gdb" example, this patch reduces the wall time down to 0.668923, from 0.903534. (Note that the 0.903534 is the time for the new index -- that is, when the "enable the new index" patch is rebased to before this one. However, in the final series, that patch appears toward the end. Hopefully this isn't too confusing.)	2022-04-12 09:31:16 -06:00
Tom Tromey	51f5a4b8e9	Introduce the new DWARF index class This patch introduces the new DWARF index class. It is called "cooked" to contrast against a "raw" index, which is mapped from disk without extra effort. Nothing constructs a cooked index yet. The essential idea here is that index entries are created via the "add" method; then when all the entries have been read, they are "finalize"d -- name canonicalization is performed and the entries are added to a sorted vector. Entries use the DWARF name (DW_AT_name) or linkage name, not the full name as is done for partial symbols. These two facets -- the short name and the deferred canonicalization -- help improve the performance of this approach. This will become clear in later patches, when parallelization is added. Some special code is needed for Ada, because GNAT only emits mangled ("encoded", in the Ada lingo) names, and so we reconstruct the hierarchical structure after the fact. This is also done in the finalization phase. One other aspect worth noting is that the way the "main" function is found is different in the new code. Currently gdb will notice DW_AT_main_subprogram, but won't recognize "main" during reading -- this is done later, via explicit symbol lookup. This is done differently in the new code so that finalization can be done in the background without then requiring a synchronization to look up the symbol.	2022-04-12 09:31:16 -06:00

10 commits