There is a problem with how GDB handles a vfork happening in a
multi-threaded program. This problem was reported to me by somebody not
using vfork directly, but using system(3) in a multi-threaded program,
which may be implemented using vfork.
This patch only deals about the follow-fork-mode=parent,
detach-on-fork=on case, because it would be too much to chew at once to
fix the bugs in the other cases as well (I tried).
The problem
-----------
When a program vforks, the parent thread is suspended by the kernel
until the child process exits or execs. Specifically, in a
multi-threaded program, only the thread that called vfork is suspended,
other threads keep running freely. This is documented in the vfork(2)
man page ("Caveats" section).
Let's suppose GDB is handling a vfork and the user's desire is to detach
from the child. Before detaching the child, GDB must remove the software
breakpoints inserted in the shared parent/child address space, in case
there's a breakpoint in the path the child is going to take before
exec'ing or exit'ing (unlikely, but possible). Otherwise the child could
hit a breakpoint instruction while running outside the control of GDB,
which would make it crash. GDB must also avoid re-inserting breakpoints
in the parent as long as it didn't receive the "vfork done" event (that
is, when the child has exited or execed): since the address space is
shared with the child, that would re-insert breakpoints in the child
process also. So what GDB does is:
1. Receive "vfork" event for the parent
2. Remove breakpoints from the (shared) address space and set
program_space::breakpoints_not_allowed to avoid re-inserting them
3. Detach from the child thread
4. Resume the parent
5. Wait for and receive "vfork done" event for the parent
6. Clean program_space::breakpoints_not_allowed and re-insert
breakpoints
7. Resume the parent
Resuming the parent at step 4 is necessary in order for the kernel to
report the "vfork done" event. The kernel won't report a ptrace event
for a thread that is ptrace-stopped. But the theory behind this is that
between steps 4 and 5, the parent won't actually do any progress even
though it is ptrace-resumed, because the kernel keeps it suspended,
waiting for the child to exec or exit. So it doesn't matter for that
thread if breakpoints are not inserted.
The problem is when the program is multi-threaded. In step 4, GDB
resumes all threads of the parent. The thread that did the vfork stays
suspended by the kernel, so that's fine. But other threads are running
freely while breakpoints are removed, which is a problem because they
could miss a breakpoint that they should have hit.
The problem is present with all-stop and non-stop targets. The only
difference is that with an all-stop targets, the other threads are
stopped by the target when it reports the vfork event and are resumed by
the target when GDB resumes the parent. With a non-stop target, the
other threads are simply never stopped.
The fix
-------
There many combinations of settings to consider (all-stop/non-stop,
target-non-stop on/off, follow-fork-mode parent/child, detach-on-fork
on/off, schedule-multiple on/off), but for this patch I restrict the
scope to follow-fork-mode=parent, detach-on-fork=on. That's the
"default" case, where we detach the child and keep debugging the
parent. I tried to fix them all, but it's just too much to do at once.
The code paths and behaviors for when we don't detach the child are
completely different.
The guiding principle for this patch is that all threads of the vforking
inferior should be stopped as long as breakpoints are removed. This is
similar to handling in-line step-overs, in a way.
For non-stop targets (the default on Linux native), this is what
happens:
- In follow_fork, we call stop_all_threads to stop all threads of the
inferior
- In follow_fork_inferior, we record the vfork parent thread in
inferior::thread_waiting_for_vfork_done
- Back in handle_inferior_event, we call keep_going, which resumes only
the event thread (this is already the case, with a non-stop target).
This is the thread that will be waiting for vfork-done.
- When we get the vfork-done event, we go in the (new) handle_vfork_done
function to restart the previously stopped threads.
In the same scenario, but with an all-stop target:
- In follow_fork, no need to stop all threads of the inferior, the
target has stopped all threads of all its inferiors before returning
the event.
- In follow_fork_inferior, we record the vfork parent thread in
inferior::thread_waiting_for_vfork_done.
- Back in handle_inferior_event, we also call keep_going. However, we
only want to resume the event thread here, not all inferior threads.
In internal_resume_ptid (called by resume_1), we therefore now check
whether one of the inferiors we are about to resume has
thread_waiting_for_vfork_done set. If so, we only resume that
thread.
Note that when resuming multiple inferiors, one vforking and one not
non-vforking, we could resume the vforking thread from the vforking
inferior plus all threads from the non-vforking inferior. However,
this is not implemented, it would require more work.
- When we get the vfork-done event, the existing call to keep_going
naturally resumes all threads.
Testing-wise, add a test that tries to make the main thread hit a
breakpoint while a secondary thread calls vfork. Without the fix, the
main thread keeps going while breakpoints are removed, resulting in a
missed breakpoint and the program exiting.
Change-Id: I20eb78e17ca91f93c19c2b89a7e12c382ee814a1
I noticed that the gdb.server/server-pipe.exp test would sometimes
timeout when my machine was more heavily loaded. Turns out the test
is reading all the shared libraries over GDB's remote protocol, which
can be slow.
We avoid this in other tests by setting the sysroot in GDBFLAGS,
something which is missing from the gdb.server/server-pipe.exp test.
Fix the timeouts by setting sysroot in GDBFLAGS, after this the shared
libraries are no longer copied over the remote protocol, and I no
longer see the test timeout.
The previous patch added support for the DWARF prologue-end flag in line
table. This flag can be used by DWARF producers to indicate where to
place breakpoints past a function prologue. However, this takes
precedence over prologue analyzers. So if we have to debug a program
with erroneous debug information, the overall debugging experience will
be degraded.
This commit proposes to add a maintenance command to instruct GDB to
ignore the prologue_end flag.
Tested on x86_64-gnu-linux.
Change-Id: Idda6d1b96ba887f4af555b43d9923261b9cc6f82
Add support for DW_LNS_set_prologue_end when building line-tables. This
attribute can be set by the compiler to indicate that an instruction is
an adequate place to set a breakpoint just after the prologue of a
function.
The compiler might set multiple prologue_end, but considering how
current skip_prologue_using_sal works, this commit modifies it to accept
the first instruction with this marker (if any) to be the place where a
breakpoint should be placed to be at the end of the prologue.
The need for this support came from a problematic usecase generated by
hipcc (i.e. clang). The problem is as follows: There's a function
(lets call it foo) which covers PC from 0xa800 to 0xa950. The body of
foo begins with a call to an inlined function, covering from 0xa800 to
0xa94c. The issue is that when placing a breakpoint at 'foo', GDB
inserts the breakpoint at 0xa818. The 0x18 offset is what GDB thinks is
foo's first address past the prologue.
Later, when hitting the breakpoint, GDB reports the stop within the
inlined function because the PC falls in its range while the user
expects to stop in FOO.
Looking at the line-table for this location, we have:
INDEX LINE ADDRESS IS-STMT
[...]
14 293 0x000000000000a66c Y
15 END 0x000000000000a6e0 Y
16 287 0x000000000000a800 Y
17 END 0x000000000000a818 Y
18 287 0x000000000000a824 Y
[...]
For comparison, let's look at llvm-dwarfdump's output for this CU:
Address Line Column File ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
[...]
0x000000000000a66c 293 12 2 0 0 is_stmt
0x000000000000a6e0 96 43 82 0 0 is_stmt
0x000000000000a6f8 102 18 82 0 0 is_stmt
0x000000000000a70c 102 24 82 0 0
0x000000000000a710 102 18 82 0 0
0x000000000000a72c 101 16 82 0 0 is_stmt
0x000000000000a73c 2915 50 83 0 0 is_stmt
0x000000000000a74c 110 1 1 0 0 is_stmt
0x000000000000a750 110 1 1 0 0 is_stmt end_sequence
0x000000000000a800 107 0 1 0 0 is_stmt
0x000000000000a800 287 12 2 0 0 is_stmt prologue_end
0x000000000000a818 114 59 81 0 0 is_stmt
0x000000000000a824 287 12 2 0 0 is_stmt
0x000000000000a828 100 58 82 0 0 is_stmt
[...]
The main difference we are interested in here is that llvm-dwarfdump's
output tells us that 0xa800 is an adequate place to place a breakpoint
past a function prologue. Since we know that foo covers from 0xa800 to
0xa94c, 0xa800 is the address at which the breakpoint should be placed
if the user wants to break in foo.
This commit proposes to add support for the prologue_end flag in the
line-program processing.
The processing of this prologue_end flag is made in skip_prologue_sal,
before it calls gdbarch_skip_prologue_noexcept. The intent is that if
the compiler gave information on where the prologue ends, we should use
this information and not try to rely on architecture dependent logic to
guess it.
The testsuite have been executed using this patch on GNU/Linux x86_64.
Testcases have been compiled with both gcc/g++ (verison 9.4.0) and
clang/clang++ (version 10.0.0) since at the time of writing GCC does not
set the prologue_end marker. Tests done with GCC 11.2.0 (not over the
entire testsuite) show that it does not emit this flag either.
No regression have been observed with GCC or Clang. Note that when
using Clang, this patch fixes a failure in
gdb.opt/inline-small-func.exp.
Change-Id: I720449a8a9b2e1fb45b54c6095d3b1e9da9152f8
This updates the Ada expression parser to implement context-sensitive
field name completion. This is PR ada/28727.
This is somewhat complicated due to some choices in the Ada lexer --
it chooses to represent a sequence of "."-separated identifiers as a
single token, so the parser must partially recreate the completer's
logic to find the completion word boundaries.
Despite the minor warts in this patch, though, it is a decent
improvement. It's possible that the DWARF reader rewrite will help
fix the package completion problem pointed out in this patch as well.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28727
This adds a completer for Ada attributes. Some work in the lexer is
required in order to match end-of-input correctly, as flex does not
have a general-purpose way of doing this. (The approach taken here is
recommended in the flex manual.)
The Ada lexer allows whitespace between the apostrophe and the
attribute text, but processAttribute does not handle this. This patch
fixes the problem and introduces a regression test.
Normally, SPARK ghost entities are removed from the executable.
However, with -gnata, they will be preserved. In this situation, it's
handy to be able to inspect them. This patch allows this by removing
the "___ghost_" prefix in the appropriate places.
On openSUSE Leap 15.3 I run into:
...
KPASS: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr.all \
(PRMS minimal encodings)
KPASS: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr(3) \
(PRMS minimal encodings)
KPASS: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr.all(3) \
(PRMS minimal encodings)
...
The test-case KFAILs some tests. However, the analysis in the corresponding
PR talks of a compiler problem, so it should use XFAILs instead.
The KFAILs are setup for pre-gcc-12, but apparantly the fix has been
backported to system compiler 7.5.0, hence the KPASS.
Fix this by:
- using an XFAIL instead of a KFAIL
- matching the specific gdb output that corresponds to the XFAILs
(reproduced on Fedora 34).
Tested on x86_64-linux, specifically openSUSE Leap 15.3 and Fedora 34.
This patch adds a new dynamic property DYN_PROP_RANK, this property is
read from the DW_AT_rank attribute and stored within the type just
like other dynamic properties.
As arrays with dynamic ranks make use of a single
DW_TAG_generic_subrange to represent all ranks of the array, support
for this tag has been added to dwarf2/read.c.
The final piece of this puzzle is to add support in gdbtypes.c so that
we can resolve an array type with dynamic rank. To do this the
existing resolve_dynamic_array_or_string function is split into two,
there's a new resolve_dynamic_array_or_string_1 core that is
responsible for resolving each rank of the array, while the now outer
resolve_dynamic_array_or_string is responsible for figuring out the
array rank (which might require resolving a dynamic property) and then
calling the inner core.
The resolve_dynamic_range function now takes a rank, which is passed
on to the dwarf expression evaluator. This rank will only be used in
the case where the array itself has dynamic rank, but we now pass the
rank in all cases, this should be harmless if the rank is not needed.
The only small nit is that resolve_dynamic_type_internal actually
handles resolving dynamic ranges itself, which now obviously requires
us to pass a rank value. But what rank value to use? In the end I
just passed '1' through here as a sane default, my thinking is that if
we are in resolve_dynamic_type_internal to resolve a range, then the
range isn't part of an array with dynamic rank, and so the range
should actually be using the rank value at all.
An alternative approach would be to make the rank value a
gdb::optional, however, this ends up adding a bunch of complexity to
the code (e.g. having to conditionally build the array to pass to
dwarf2_evaluate_property, and handling the 'rank - 1' in
resolve_dynamic_array_or_string_1) so I haven't done that, but could,
if people think that would be a better approach.
Finally, support for assumed rank arrays was only fixed very recently
in gcc, so you'll need the latest gcc in order to run the tests for
this.
Here's an example test program:
PROGRAM arank
REAL :: a1(10)
CALL sub1(a1)
CONTAINS
SUBROUTINE sub1(a)
REAL :: a(..)
PRINT *, RANK(a)
END SUBROUTINE sub1
END PROGRAM arank
Compiler Version:
gcc (GCC) 12.0.0 20211122 (experimental)
Compilation command:
gfortran assumedrank.f90 -gdwarf-5 -o assumedrank
Without Patch:
gdb -q assumedrank
Reading symbols from assumedrank...
(gdb) break sub1
Breakpoint 1 at 0x4006ff: file assumedrank.f90, line 10.
(gdb) run
Starting program: /home/rupesh/STAGING-BUILD-2787/bin/assumedrank
Breakpoint 1, arank::sub1 (a=<unknown type in /home/rupesh/STAGING-BUILD-2787
/bin/assumedrank, CU 0x0, DIE 0xd5>) at assumedrank.f90:10
10 PRINT *, RANK(a)
(gdb) print RANK(a)
'a' has unknown type; cast it to its declared type
With patch:
gdb -q assumedrank
Reading symbols from assumedrank...
(gdb) break sub1
Breakpoint 1 at 0x4006ff: file assumedrank.f90, line 10.
(gdb) run
Starting program: /home/rupesh/STAGING-BUILD-2787/bin/assumedrank
Breakpoint 1, arank::sub1 (a=...) at assumedrank.f90:10
10 PRINT *, RANK(a)
(gdb) print RANK(a)
$1 = 1
(gdb) ptype a
type = real(kind=4) (10)
(gdb)
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
This commit resolves all the duplicate test names that I see in the
script:
gdb.base/attach-pie-misread.exp
The duplicate names all come from a second call to
build_executable_own_libs, so in this commit I've places the second
call inside a with_test_prefix block.
While I was making this change I've also modified the value being
passed as the testname for the second build_executable_own_libs call.
Previously we used ${test}, however, I think this was likely a
mistake, the 'test' variable is setup for the previous test. I
suspect that ${testfile} is a better choice - especially now we have a
testname prefix.
As the testname is only used (after various calls) from within
build_executable_from_specs should the build fail, I don't think this
change really makes much difference though.
There should be no change in what is tested after this commit.
Solve two duplicate test names in the test script:
gdb.mi/mi-catch-cpp-exceptions.exp
by moving the call to restart_for_test inside the with_test_prefix
block. There should be no difference in what is tested after this
commit.
Currently, in master gdb, when a tui window is changed in size, the
screen delta is mostly just added to the next available window. We
do take care to respect the min/max size, but in most cases, these
limits are just "the terminal size", and so, we end up placing the
whole delta on the next window.
Consider these steps in an 80 column, 24 line terminal:
(gdb) tui enable
(gdb) layout src
(gdb) layout split
(gdb) info win
Name Lines Columns Focus
src 8 80 (has focus)
asm 8 80
status 1 80
cmd 8 80
(gdb) winheight cmd +2
(gdb) info win
Name Lines Columns Focus
src 6 80 (has focus)
asm 8 80
status 1 80
cmd 10 80
Notice that initially, the windows were balanced, 8 lines each for the
major windows. Then, when the cmd window was adjusted, the extra two
lines were given to the asm window.
I think it would be nicer if the delta was spread more evenly over the
available windows. In the example above, after the adjustment the
layout now looks like:
(gdb) info win
Name Lines Columns Focus
src 7 80 (has focus)
asm 7 80
status 1 80
cmd 10 80
This is achieved within tui_layout_split::set_size, by just handing
out the delta in increments of 1 to each window (except for the window
the user adjusted), until there's no more delta left. Of course, we
continue to respect the min/max window sizes.
This commit removes some arbitrary adjustments made in
tui_cmd_window::max_height, tui_win_info::max_height, and
tui_win_info::max_width.
These member functions all subtract some constant from the theoretical
maximum height or width. I've looked back through the history a
little and can see no real reason why these adjustments should be
needed, with these adjustments removed all the existing tui tests
still pass.
However, retaining these restrictions causes some bugs, consider:
(gdb) tui new-layout hsrc {-horizontal src 1 cmd 1} 1
When this layout is selected with current master, gdb will leave a 4
line gap at the bottom of the terminal.
The problem is that the maximum height is restricted, for the cmd
window, to 4 less than the terminal height.
By removing this restriction gdb is able to size the windows to the
complete terminal height, and the layout is done correctly.
This 4 line restriction is also what prevents this layout from working
correctly:
(gdb) tui new-layout conly cmd 1
Previously, this layout would present a cmd window only, but there
would be a 4 line gap at the bottom of the terminal. This issue was
mentioned in an earlier commit in this series (when a different bug
was fixed), but with this commit, the above layout now correctly fills
the terminal. The associated test is updated.
After removing the adjustment in tui_cmd_window::max_height, the
implementation is now the same as the implementation in the parent
class tui_win_info, so I've completely removed the max_height call
from tui_cmd_window.
This commit just adds an extra check of the src window size prior to
sending all the commands to gdb. We also set the cmd window height to
its existing height, this (obviously) shouldn't change the window
layout, which we check.
My main motivation was adding the initial window layout check, the
winheight and recheck are just extras. All of these test pass both
before and after this commit.
This commit allows the user to place the cmd window within horizontal
tui layouts. Consider this set of steps, carried out in an 80 columns
by 24 lines terminal, using current master gdb:
(gdb) tui new-layout hsrc { -horizontal src 1 cmd 1 } 1 status 1
(gdb) tui layout hsrc
What you end up with is a full width cmd window with the status bar
beneath. Where's the src window gone? We then try:
(gdb) info win
Name Lines Columns Focus
src 23 3 (has focus)
cmd 23 80
status 1 80
(gdb)
Something weird has gone on, gdb has overlapped the cmd window with
the src window. If we trigger the src window to redraw is content,
for example, 'list main', then we see corruption in the cmd window as
the src window overwrites it.
So, what's going on?
The problem is some code in tui_layout_split::apply, in tui-layout.c.
Within 'Step 1', there is a loop that calculates the min/max window
sizes for all windows within a tui_layout_split. However, there's a
special case for the cmd window.
This special case is trying to have the cmd window retain its current
size when a layout is re-applied, or a new layout is applied. This
makes sense, consider moving from the 'src' layout to the 'asm'
layout, this looks something like this (status window removed):
.-------. .-------.
| src | | asm |
|-------| ====> |-------|
| cmd | | cmd |
'-------' '-------'
If the user has gone to the effort of adjusting the cmd window size,
then, the thinking goes, we shouldn't reset the cmd window size when
switching layouts like this.
The problem though, is that when we do a switch more like this:
.-----------. .-----------.
| src | | | |
|-----------| ====> | asm | cmd |
| cmd | | | |
'-----------' '-----------'
Now retaining the cmd window width makes no sense; the new layout has
a completely different placement for the cmd window, instead of sizing
by height, we're now sizing by width. The existing code doesn't
understand this though, and tried to retain the full width for the cmd
window.
To solve this problem, I propose we introduce the idea of a layout
"fingerprint". The fingerprint tries to capture, in an abstract way,
where the cmd window lives within the layout.
Only when two layouts have the same fingerprint will we attempt to
retain the cmd window size.
The fingerprint for a layout is represented as a string, the string is
a series of 'V' or 'H' characters, ending with a single 'C'
character. The series of 'V' and 'H' characters represent the
vertical or horizontal layouts that must be passed through to find the
cmd window.
Here are a few examples:
# This layout is equivalent to the builtin 'src' layout.
# Fingerprint: VC
tui new-layout example1 src 2 status 0 cmd 1
# This layout is equivalent to the builtin 'split' layout.
# Fingerprint: VC
tui new-layout example2 src 1 asm 1 status 0 cmd 1
# This is the same layout that was given at the top.
# Fingerprint: VHC
tui new-layout hsrc { -horizontal src 1 cmd 1 } 1 status 1
And so, when switching between example1 and example2, gdb knows that
the cmd window is, basically, in the same sort of position within the
layout, and will retain the cmd window size.
In contrast, when switching to the hsrc layout, gdb understands that
the position of the cmd window is different, and does not try to
retain the cmd window size.
When we switch layouts we call the tui_layout_split::apply member
function to reapply the layout, and recalculate all the window sizes.
One special case is the cmd window, which we try to keep at its
existing size.
However, in some cases it is not appropriate to keep the cmd window at
its existing size. I will describe two such cases here, in one, we
want the cmd window to reduce in size, and in the other, we want the
cmd window to grow in size.
Try these steps in a 80 columns, by 24 lines terminal:
(gdb) tui enable
(gdb) layout src
(gdb) winheight cmd 20
(gdb) layout split
You should see that the status window is missing from the new layout,
and that the cmd window has been placed over the border of the asm
window. The 'info win' output is:
(gdb) info win
Name Lines Columns Focus
src 3 80 (has focus)
asm 3 80
status 1 80
cmd 20 80
Notice that gdb has assigned 27 lines of screen space, even with the
border overlap between the src and asm windows, this is still 2 lines
too many.
The problem here is that after switching layouts, gdb has forced the
cmd window to retain its 20 line height. Really, we want the cmd
window to reduce in height so that the src and asm windows can occupy
their minimum required space.
This commit allows this (details on how are below). After this
commit, in the above situation, we now see the status window displayed
correctly, and the 'info win' output is:
(gdb) info win
Name Lines Columns Focus
src 3 80 (has focus)
asm 3 80
status 1 80
cmd 18 80
The cmd window has been reduced in size by 2 lines so that everything
can fit on the screen.
The second example is one which was discussed in a recent commit,
consider this case (still in the 80 column, 24 line terminal):
(gdb) tui enable
(gdb) tui new-layout conly cmd 1
(gdb) layout conly
(gdb) info win
Name Lines Columns Focus
cmd 8 80 (has focus)
(gdb)
This layout only contains a cmd window, which we would expect to
occupy the entire terminal. But instead, the cmd window only occupies
the first 8 lines, and the rest of the terminal is unused!
The reason is, again, that the cmd window is keeping its previous
size (8 lines).
After this commit things are slightly different, the 'info win' output
is now:
(gdb) info win
Name Lines Columns Focus
cmd 20 80 (has focus)
Which is a little better, but why only 20 lines? Turns out there's
yet another bug hitting this case. That bug will be addressed in a
later commit, so, for now, we're accepting the 20 lines.
What this commit does is modify the phase of tui_layout_split::apply
that handles any left over space. Usually, in "Step 2", each
sub-layout has a size calculated. As the size is an integer, then,
when all sizes are calculated we may have some space left over.
This extra space is then distributed between all the windows fairly
until all the space is used up.
When we consider windows minimum size, or fixed size windows, then it
is possible that we might try to use more space than is available,
this was our first example above. The same code that added extra
space to the windows, can also be used to reclaim space (in the over
allocation case) to allow all windows to fit.
The problem then is the cmd window, which we often force to a fixed
size. Inside the loop that handles the allocation of excess space, if
we find that we have tried every window, and still have space either
left to give, or we need to claim back more space, then, if the cmd
window was changed to a fixed size, we can change the cmd window back
to a non-fixed-size window, and proceed to either give, or take space
from the cmd window as needed.
When applying layouts gdb computes the size of each window (or rather,
each sub-layout within a layout) using integer arithmetic. As this
rounds down the results, then, when all sub-layouts are sized, there
is the possibility that we have some space left over.
Currently, this space is just assigned to an arbitrary sub-layout.
This can result in some unbalanced results. Consider this set of
steps with current master:
(gdb) tui enable
(gdb) layout regs
(gdb) info win
Name Lines Columns Focus
regs 7 80
src 9 80 (has focus)
status 1 80
cmd 8 80
Notice the weird split between the src and regs windows, the original
layout specification has these windows given equal weight. The
problem is that, with rounding, both the regs and src windows are
initially sized to 7, the extra 2 lines are then arbitrarily added to
the src window.
In this commit, rather than add all the extra space to one single
window, I instead hand out the extra space 1 line at a time, looping
over all the sub-layouts. We take care to respect the min/max sizes,
and so, we now get this result:
(gdb) tui enable
(gdb) layout regs
(gdb) info win
Name Lines Columns Focus
regs 8 80
src 8 80 (has focus)
status 1 80
cmd 8 80
This looks more natural to me.
This is obviously a change in behaviour, and so, lots of the existing
tests need to be updated to take this into account. None of the
changes are huge, it's just a line or two (or column for width) moved
between windows.
Consider:
(gdb) tui enable
(gdb) layout src
(gdb) tui new-layout conly cmd 1
(gdb) layout conly
After this, with current master, gdb crashes with a floating-point
exception.
The problem is that in tui_layout_split::apply, when we switch from
'src' to 'conly', we will try to retain the cmd window height. As
such, the cmd window will become a fixed size window, which decreases
the available_size, but doesn't count towards the total_weight.
As the cmd window is the only window, the total_weight stays at zero,
and, when we move into step 2, where we attempt to size the windows,
we perform a divide by zero, and crash.
After this commit we avoid the divide by zero, and just directly set
the window size based on the fixed size.
There is still a problem after this commit, when the conly layout is
selected the cmd window retains its original height, which will only
be part of the terminal. The rest of the terminal is left unused.
This issue will be addressed in a later commit, this commit is just
about the floating-point exception.
This commit changes the gdb.tui/new-layout.exp test to make use of a
list of test descriptions, and a loop to check each description in
turn. There's no change to what is actually tested after this commit.
In future commits I plan to add additional tests to this file, and
this will be easier now that all I have to do is add a new test
description to the list.
This commit adds a new command 'tui window width', and an alias
'winwidth'. This command is equivalent to the old 'winheight'
command (which was recently renamed 'tui window height').
Even though I recently moved the old tui commands under the tui
namespace, and I would strongly encourage all new tui commands to be
added as 'tui ....' only (users can create their own top-level aliases
if they want), I'm breaking that suggestion here, and adding a
'winwidth' alias.
Given that we already have 'winheight' and have done for years, it
just didn't seem right to no have the matching 'winwidth'.
You might notice in the test that the window resizing doesn't quite
work right. I setup a horizontal layout, then grow and shrink the
windows. At the end of the test the windows should be back to their
original size...
... they are not. This isn't my fault, honest! GDB's window resizing
is a little ... temperamental, and is prone to getting things slightly
wrong during resizes, off by 1 type things. This is true for height
resizing, as well as the new width resizing.
Later patches in this series will rework the resizing algorithm, which
should improve things in this area. For now, I'm happy that the width
resizing is as good as the height resizing, given the existing quirks.
For the docs side I include a paragraph that explains how multiple
windows are required before the width can be adjusted. For
completeness, I've added the same paragraph to the winheight
description. With the predefined layouts this extra paragraph is not
really needed for winheight, as there are always multiple windows on
the screen. However, with custom layouts, this might not be true, so
adding the paragraph seems like a good idea.
As for the changes in gdb itself, I've mostly just taken the existing
height adjustment code, changed the name to make it generic 'size'
adjustment, and added a boolean flag to indicate if we are adjusting
the width or the height.
I noticed that GDB will display URLs in a few spots. This changes
them to be styled. Originally I thought I'd introduce a new "url"
style, but there aren't many places to use this, so I just reused
filename styling instead. This patch also changes the debuginfod URL
list to be printed one URL per line. I think this is probably a bit
easier to read.
The check removed by this patch, using current_inferior, looks wrong.
When debugging multiple inferiors with the Linux native target and
linux_handle_extended_wait is called, there's no guarantee about which
is the current inferior. The vfork-done event we receive could be for
any inferior. If the vfork-done event is for a non-current inferior, we
end up wrongfully ignoring it. As a result, the core never processes a
TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed
is never cleared, and breakpoints are never reinserted. However,
because the Linux native target decided to ignore the event, it resumed
the thread - while breakpoints out. And that's bad.
The proposed fix is to remove this check. Always report vfork-done
events and let infrun's logic decide if it should be ignored. We don't
save much cycles by filtering the event here.
Add a test that replicates the situation described above. See comments
in the test for more details.
Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
Since commit 3cd5229387 ("Change the pager to a ui_file"), I see these
errors when running gdb.tui/scroll.exp:
ERROR: invalid command name "_csi_P"
while executing
"::gdb_tcl_unknown _csi_P 2"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::gdb_tcl_unknown $args"
(procedure "::unknown" line 5)
invoked from within
"_csi_P 2"
("eval" body line 1)
invoked from within
"eval _csi_$cmd $params"
It looks like GDB is emitting a CSI that it did not emit before, the
"Delete character" one:
https://vt100.net/docs/vt510-rm/DCH.html
Implement it.
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I5bf86b6104d51b0623a26a69df83d1ca9a4851b7
The particular behavior we have when using that combination of settings
doesn't seem tested at all (at least, I don't find it if I grep for "Can
not resume the parent process"). Add a simple test for that.
Change-Id: Ib9454a615abba661b42f1b15056df73ed1bcd4c5
While trying to review Andrew's patch here [1], I thought I spotted a
bug in the handling of a CSI, but I had no way to know for sure. So I
thought it would be useful to have unit tests for the handling of
control characters and control sequences of our toy terminal
implementation. It might help avoid chasing bugs in the GDB TUI when in
reality it's a problem with the testsuite's terminal implementation.
Add the gdb.tui/tuiterm.exp file to do that. All currently supported
control sequences and characters are tested, except _csi_m (the one that
handles colors and stuff). _csi_m should probably be tested too, but it
will require more work.
Fix a few issues that the tests spotted:
- backspace: according to [3] (table 4-1), a backspace when the cursor
is at the beginning of a line should have no effect. Our
implementation did wrap to the end of the previous line. Change our
implementation to match the doc (and the test).
- insert character: this control sequence is supposed to insert blank
characters, shifting all the rest of the line right. The current
implementation moves N characters right, but it overwrites the
characters on the right instead of shifting them. It also doesn't
insert blank characters at the cursor.
- Cursor down, forward, next line: off-by-one error when reaching the
end of the display.
- erase in display, line: off-by-one errors.
- vertical line position absolute: allowed setting the cursor outside
the display, when it should clamp it to the display size.
I found that this web page [2] gave some good clues on the expected
behavior of some control characters or sequences that some other pages
didn't.
[1] https://sourceware.org/pipermail/gdb-patches/2022-March/186433.html
[2] https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences
[3] https://vt100.net/docs/vt510-rm/chapter4.html#S4.3.3
Change-Id: Iab4141fdcfb7459d1b7c45cc63bd1fcb50a78d5d
In Ada, if a class implements an interface and has a dynamic
superclass, then the "offset to top" -- the offset that says how to
turn a pointer to the interface into a pointer to the whole object --
is stored in the object itself. This patch changes GDB to understand
this.
Because this only touches Ada code, and because Joel already reviewed
it internally, I am checking it in.
When parsing the ptid out of a reply package, if the multi-process
extensions are not supported, use current_inferior's pid as the pid of
the reported thread, instead of inferior_ptid. This is needed because
the inferior_ptid may be null_ptid although a legit context exists,
due to a prior context switch via switch_to_inferior_no_thread.
Below is a scenario that illustrates what could go wrong. First,
setup a multi-target scenario. This is needed, because in a
multi-target setting, the inferior_ptid is cleared out before waiting
on targets. The second inferior below sits on top of a remote target.
Multi-process packets are disabled.
$ # First, spawn a process with PID 26253 to attach to later.
$ gdb-up a.out
Reading symbols from a.out...
(gdb) maint set target-non-stop on
(gdb) set remote multiprocess-feature-packet off
(gdb) start
...
(gdb) add-inferior -no-connection
[New inferior 2]
Added inferior 2
(gdb) inferior 2
[Switching to inferior 2 [<null>] (<noexec>)]
(gdb) target extended-remote | gdbserver --multi -
Remote debugging using | gdbserver --multi -
Remote debugging using stdio
(gdb) attach 26253
Attaching to Remote target
Attached; pid = 26253
[New Thread 26253]
[New inferior 3]
Reading /tmp/a.out from remote target...
...
[New Thread 26253]
...
Reading /usr/local/lib/debug/....debug from remote target...
>>> GDB seems to hang here.
After attaching to a process and reading some library files, GDB
seems to hang. One interesting thing to note is that
[New Thread 26253]
appears twice. We also see
[New inferior 3]
Running the same scenario with "debug infrun on" reveals more details.
...
(gdb) attach 26253
[infrun] scoped_disable_commit_resumed: reason=attaching
Attaching to Remote target
Attached; pid = 26253
[New Thread 26253]
[infrun] infrun_async: enable=1
[infrun] attach_command: immediately after attach:
[infrun] attach_command: thread 26253.26253.0, executing = 1, resumed = 0, state = RUNNING
[infrun] clear_proceed_status_thread: 26253.26253.0
[infrun] reset: reason=attaching
[infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
[infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote
[infrun] fetch_inferior_event: enter
[infrun] scoped_disable_commit_resumed: reason=handling event
[infrun] do_target_wait: Found 2 inferiors, starting at #1
[infrun] random_pending_event_thread: None found.
[infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) =
[infrun] print_target_wait_results: 26253.26253.0 [Thread 26253],
[infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_0
[infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_0
[infrun] start_step_over: enter
[infrun] start_step_over: stealing global queue of threads to step, length = 0
[infrun] operator(): step-over queue now empty
[infrun] start_step_over: exit
[infrun] context_switch: Switching context from 0.0.0 to 26253.26253.0
[infrun] handle_signal_stop: stop_pc=0x7f849d8cf151
[infrun] stop_waiting: stop_waiting
[infrun] stop_all_threads: starting
[infrun] stop_all_threads: pass=0, iterations=0
[New inferior 3]
Reading /tmp/a.out from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /tmp/a.out from remote target...
Reading symbols from target:/tmp/a.out...
[New Thread 26253]
[infrun] stop_all_threads: 4723.4723.0 not executing
[infrun] stop_all_threads: 26253.26253.0 not executing
[infrun] stop_all_threads: 42000.26253.0 executing, need stop
[infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) =
[infrun] print_target_wait_results: -1.0.0 [Thread 0],
[infrun] print_target_wait_results: status->kind = IGNORE
[infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) =
[infrun] print_target_wait_results: -1.0.0 [Thread 0],
[infrun] print_target_wait_results: status->kind = IGNORE
GDB tried to stop Thread 42000.26253.0, which does not exist, and we
are waiting for a stop event that will never happen. The PID in
'42000.26253.0', namely 42000, is the PID of magic_null_ptid.
It comes from gdb/remote.c:read_ptid:
/* Since the stub is not sending a process id, then default to
what's in inferior_ptid, unless it's null at this point. If so,
then since there's no way to know the pid of the reported
threads, use the magic number. */
if (inferior_ptid == null_ptid)
pid = magic_null_ptid.pid ();
else
pid = inferior_ptid.pid ();
if (obuf)
*obuf = pp;
return ptid_t (pid, tid);
Because multi-process was turned off, GDB did not parse an explicitly
specified PID. Furthermore, inferior_ptid == null_ptid, and
eventually GDB picked the PID from magic_null_ptid.
If target-non-stop is not turned on at the beginning, the same bug
reveals itself as a duplicated thread as shown below.
# Same setup as above, without 'maint set target-non-stop on'.
...
(gdb) attach 26253
Attaching to Remote target
Attached; pid = 26253
[New inferior 3]
...
[New Thread 26253]
...
(gdb) info threads
Id Target Id Frame
1.1 process 13517 "a.out" main () at test.c:3
* 2.1 Thread 26253 "a.out" 0x00007f12750c5151 in read () from target:/lib/x86_64-linux-gnu/libc.so.6
3.1 Thread 26253 "a.out" Remote 'g' packet reply is too long (expected 560 bytes, got 2496 bytes): 00feffffffffffff000a3a75127f000051510c75127f0000000400000000000060d24ef6af5500000000000000000000680d000000000000b85b31e3fc7f0000c0283a75127f000000e55b75127f000010d04ef6af550000460200000000000060c73975127f0000a0d23975127f0000000a3a75127f0000000000000000000051510c75127f000046020000330000002b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000ffff0000000000000000000000000000000000000000000080143a75127f000080143a75127f000040143a75127f000040143a75127f00007d0000007e0000007f00000080000000300c3a75127f0000300c3a75127f00000e000000000000000e0000000000000000000000000000000000000000000000ffffffffffffffffffffffffffffffff0400000004000000040000000400000020143a75127f000020143a75127f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000000000000000000000e55b75127f0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(gdb)
Fix the problem by preferring current_inferior()'s pid instead of
magic_null_ptid.
Regression-tested on X86-64 Linux.
Co-authored-by: Aleksandar Paunovic <aleksandar.paunovic@intel.com>
The test added in the commit:
commit a6b413d24c
Date: Fri Mar 11 14:44:03 2022 +0000
gdb: work around prompt corruption caused by bracketed-paste-mode
Was not written with readline 7 in mind, only readline 8+. Between
readline 7 and 8 the escape sequence used to disable bracketed paste
mode changed, an additional '\r' character was added to the end. In
fact, it was the addition of this '\r' character that triggered the
issue for which the above commit is part of the solution.
Anyway, the test tries to spot the case where the output from GDB is
not perfect, but does have the above work around applied. However,
the pattern in the test assumes that the problematic '\r' will be
present, and this is only true for readline 8+. With readline 7 the
test was failing.
In this commit I generalise the pattern a little so that the test will
still KFAIL with readline 7.
It's a little unfortunate that the test is KFAILing with readline 7,
as without the problematic '\r' there's actually no reason that GDB
couldn't "do the right thing" in this case, in which case, the test
would PASS, but that would require changes within GDB itself.
My preference then is that initially we patch the test to get it
KFAILing, then in a separate commit I can modify GDB so that it can
PASS with readline 7.
The test gdb.python/py-format-address.exp, added in commit:
commit 25209e2c69
Date: Sat Oct 23 09:59:25 2021 +0100
gdb/python: add gdb.format_address function
included 3 copy & paste errors where the wrong address was used in the
expected output patterns.
The test compiles two almost identical test binaries (one function
changes its name, that's the only difference), two inferiors are
created, each inferior using one of the test binaries.
We then take the address of the name changing function in both
inferiors ('foo' in inferior 1 and 'bar' in inferior 2) and the tests
are carried out using these addresses.
What we're checking for is that symbols 'foo' and 'bar' show up in the
correct inferior, and that (as this test is for a Python API feature),
the user can have one inferior selected, but ask about the other
inferior, and see the correct symbol in the result.
The hope is that the two binaries will be laid out identically by the
compiler, and that 'foo' and 'bar' will be at the same address. This
is fine, unless the executable is compiled as PIE (position
independent executable), in which case there is a problem.
The problem is that though inferior 1 is set running, the inferior 2
never is. If the executables are compiled as PIE, then the address in
the inferior 2 will not have been resolved, while the address in the
inferior 1 will have been, and so the two addresses we use in the
tests will be different.
This issue was reported here:
https://sourceware.org/pipermail/gdb-patches/2022-March/186911.html
The first part of the fix is to use the correct address variable in
the expected output patterns, with this change the tests pass even
when the executables are compiled as PIE.
A second part of this fix is to pass the 'nopie' option when we
compile the tests, this should ensure that the address obtained in
inferior 2 is the same as the address from inferior 1, which makes the
test more useful.
I noticed that "print 5," passed in Rust -- the parser wasn't checking
that the entire input was used. This patch fixes the problem. This
in turn pointed out another bug in the parser, namely that it didn't
lex the next token after handling a string token. This is also fixed
here.
call_site_find_chain_1 has a comment claiming that recursive calls
would be too expensive. However, I doubt this is so expensive; and
furthermore the explicit state management approach here is difficult
both to understand and to modify. This patch changes this code to use
explicit recursion, so that a subsequent patch can generalize this
code without undue trauma.
Additionally, I think this patch detects a latent bug in the recursion
code. (It's hard for me to be completely certain.) The bug is that
when a new target_call_site is entered, the code does:
if (target_call_site)
{
if (addr_hash.insert (target_call_site->pc ()).second)
{
/* Successfully entered TARGET_CALL_SITE. */
chain.push_back (target_call_site);
break;
}
}
Here, if entering the target_call_site fails, then any tail_call_next
elements in this call site are not visited. However, if this code
does happen to enter a call site, then the tail_call_next elements
will be visited during backtracking. This applies when doing the
backtracking as well -- it will only continue through a given chain as
long as each element in the chain can successfully be visited.
I'd appreciate some review of this. If this behavior is intentional,
it can be added to the new implementation.
This test was added without a corresponding fix, with some setup_kfails.
However, it results in UNRESOLVED results when GDB is built with ASan.
ERROR: GDB process no longer exists
GDB process exited with wait status 1946871 exp7 0 1
UNRESOLVED: gdb.python/pretty-print-call-by-hand.exp: frame print: backtrace test (PRMS gdb/28856)
Remove the test from the tree, I'll attach it to the Bugzilla bug
instead [1].
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28856
Change-Id: Id95d8949fb8742874bd12aeac758aa4d7564d678
The gdb.mi/mi-multi-commands.exp test was added in commit:
commit d08cbc5d32
Date: Wed Dec 22 12:57:44 2021 +0000
gdb: unbuffer all input streams when not using readline
And then tweaked in commit:
commit 144459531d
Date: Mon Feb 7 20:35:58 2022 +0000
gdb/testsuite: relax pattern in new gdb.mi/mi-multi-commands.exp test
The second of these commits was intended to address periodic test
failures that I was seeing, and this change did fix some problems,
but, unfortunately, introduced other issues.
The problem is that the test relies on sending two commands to GDB in
a single write. As the characters that make these two commands arrive
they are echoed to GDB's console. However, there is a race between
how quickly the characters are echoed and how quickly GDB decides to
act on the incoming commands.
Usually, both commands are echoed in full before GDB acts on the first
command, but sometimes this is not the case, and GDB can execute the
first command before both commands are fully echoed to the console.
In this case, the output of the first command will be mixed in with
the echoing of the second command.
This mixing of the command echoing and the first command output is
what was causing failures in the original version of the test.
The second commit relaxed the expected output pattern a little, but
was still susceptible to failures, so this commit further relaxes the
pattern.
Now, we look for the first command output with no regard to what is
before, or after the command. Then we look for the first mi prompt to
indicate that the first command has completed.
I believe that this change should make the test more stable than it
was before.
New in this version:
- Add a PY_MAJOR_VERSION check in configure.ac / AC_TRY_LIBPYTHON. If
the user passes --with-python=python2, this will cause a configure
failure saying that GDB only supports Python 3.
Support for Python 2 is a maintenance burden for any patches touching
Python support. Among others, the differences between Python 2 and 3
string and integer types are subtle. It requires a lot of effort and
thinking to get something that behaves correctly on both. And that's if
the author and reviewer of the patch even remember to test with Python
2.
See this thread for an example:
https://sourceware.org/pipermail/gdb-patches/2021-December/184260.html
So, remove Python 2 support. Update the documentation to state that GDB
can be built against Python 3 (as opposed to Python 2 or 3).
Update all the spots that use:
- sys.version_info
- IS_PY3K
- PY_MAJOR_VERSION
- gdb_py_is_py3k
... to only keep the Python 3 portions and drop the use of some
now-removed compatibility macros.
I did not update the configure script more than just removing the
explicit references to Python 2. We could maybe do more there, like
check the Python version and reject it if that version is not
supported. Otherwise (with this patch), things will only fail at
compile time, so it won't really be clear to the user that they are
trying to use an unsupported Python version. But I'm a bit lost in the
configure code that checks for Python, so I kept that for later.
Change-Id: I75b0f79c148afbe3c07ac664cfa9cade052c0c62
If /proc/sys/kernel/yama/ptrace_scope is 1, when execute the following
command without superuser:
make check-gdb TESTS="gdb.base/jit-elf.exp"
we can see the following messages in gdb/testsuite/gdb.log:
(gdb) attach 1650108
Attaching to program: /home/yangtiezhu/build/gdb/testsuite/outputs/gdb.base/jit-elf/jit-elf-main, process 1650108
ptrace: Operation not permitted.
(gdb) FAIL: gdb.base/jit-elf.exp: attach: one_jit_test-2: break here 1: attach
use gdb_attach to fix the above issue, at the same time, the clean_reattach
proc should return a value to indicate whether it worked, and the callers
should return early as well on failure.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
If /proc/sys/kernel/yama/ptrace_scope is 1, when execute the following
command without superuser:
make check-gdb TESTS="gdb.base/attach-pie-noexec.exp"
we can see the following messages in gdb/testsuite/gdb.log:
(gdb) attach 6500
Attaching to process 6500
ptrace: Operation not permitted.
(gdb) PASS: gdb.base/attach-pie-noexec.exp: attach
It is obviously wrong, the expected result should be UNSUPPORTED in such
a case.
With this patch, we can see "Operation not permitted" in the log info,
and then we can do the following processes to test:
(1) set ptrace_scope as 0
$ echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
$ make check-gdb TESTS="gdb.base/attach-pie-noexec.exp"
(2) use sudo
$ sudo make check-gdb TESTS="gdb.base/attach-pie-noexec.exp"
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
This commit adds new gdb_attach to centralize the failure checking of
"attach" command. Return 0 if attach failed, otherwise return 1.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
As Pedro Alves said, caching procs should not issue pass/fail [1],
this commit removes attach test from can_spawn_for_attach, at the
same time, use "verbose -log" instead of "unsupported" to get a
trace about why a test run doesn't support spawning for attach.
[1] https://sourceware.org/pipermail/gdb-patches/2022-March/186311.html
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Add a new function, gdb.format_address, which is a wrapper around
GDB's print_address function.
This method takes an address, and returns a string with the format:
ADDRESS <SYMBOL+OFFSET>
Where, ADDRESS is the original address, formatted as hexadecimal,
SYMBOL is a symbol with an address lower than ADDRESS, and OFFSET is
the offset from SYMBOL to ADDRESS in decimal.
If there's no SYMBOL suitably close to ADDRESS then the
<SYMBOL+OFFSET> part is not included.
This is useful if a user wants to write a Python script that
pretty-prints addresses, the user no longer needs to do manual symbol
lookup, or worry about correctly formatting addresses.
Additionally, there are some settings that effect how GDB picks
SYMBOL, and whether the file name and line number should be included
with the SYMBOL name, the gdb.format_address function ensures that the
users Python script also benefits from these settings.
The gdb.format_address by default selects SYMBOL from the current
inferiors program space, and address is formatted using the
architecture for the current inferior. However, a user can also
explicitly pass a program space and architecture like this:
gdb.format_address(ADDRESS, PROGRAM_SPACE, ARCHITECTURE)
In order to format an address for a different inferior.
Notes on the implementation:
In py-arch.c I extended arch_object_to_gdbarch to add an assertion for
the type of the PyObject being worked on. Prior to this commit all
uses of arch_object_to_gdbarch were guaranteed to pass this function a
gdb.Architecture object, but, with this commit, this might not be the
case.
So, with this commit I've made it a requirement that the PyObject be a
gdb.Architecture, and this is checked with the assert. And in order
that callers from other files can check if they have a
gdb.Architecture object, I've added the new function
gdbpy_is_architecture.
In py-progspace.c I've added two new function, the first
progspace_object_to_program_space, converts a PyObject of type
gdb.Progspace to the associated program_space pointer, and
gdbpy_is_progspace checks if a PyObject is a gdb.Progspace or not.
Add debuginfod support to core_target::build_file_mappings and
locate_exec_from_corefile_build_id to enable the downloading of
missing executables and shared libraries referenced in core files.
Also add debuginfod support to solib_map_sections so that previously
downloaded shared libraries can be retrieved from the local debuginfod
cache.
When core file shared libraries are found locally, verify that their
build-ids match the corresponding build-ids found in the core file.
If there is a mismatch, attempt to query debuginfod for the correct
build and print a warning if unsuccessful:
warning: Build-id of /lib64/libc.so.6 does not match core file.
Also disable debuginfod when gcore invokes gdb. Debuginfo is not
needed for core file generation so debuginfod queries will slow down
gcore unnecessarily.
If GDB reports a watchpoint hit, and then the next event is not
TARGET_WAITKIND_STOPPED, but instead some event for which there's a
catchpoint, such that GDB calls bpstat_stop_status, GDB mistakenly
thinks the watchpoint triggered. Vis, using foll-fork.c:
(gdb) awatch v
Hardware access (read/write) watchpoint 2: v
(gdb) catch fork
Catchpoint 3 (fork)
(gdb) c
Continuing.
Hardware access (read/write) watchpoint 2: v
Old value = 0
New value = 5
main () at gdb.base/foll-fork.c:16
16 pid = fork ();
(gdb)
Continuing.
Hardware access (read/write) watchpoint 2: v <<<<
<<<< these lines are spurious
Value = 5 <<<<
Catchpoint 3 (forked process 1712369), arch_fork (ctid=0x7ffff7fa4810) at arch-fork.h:49
49 arch-fork.h: No such file or directory.
(gdb)
The problem is that when we handle the fork event, nothing called
watchpoints_triggered before calling bpstat_stop_status. Thus, each
watchpoint's watchpoint_triggered field was still set to
watch_triggered_yes from the previous (real) watchpoint stop.
watchpoint_triggered is only current called in the handle_signal_stop
path, when handling TARGET_WAITKIND_STOPPED.
This fixes it by adding watchpoint_triggered calls in the other events
paths that call bpstat_stop_status. But instead of adding them
explicitly, it adds a new function bpstat_stop_status_nowatch that
wraps bpstat_stop_status and calls watchpoint_triggered, and then
replaces most calls to bpstat_stop_status with calls to
bpstat_stop_status_nowatch.
This required constifying watchpoints_triggered.
New test included, which fails without the fix.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28621
Change-Id: I282b38c2eee428d25319af3bc842f9feafed461c
On x86 machines with xmm register, and with recent versions of
systemtap (and gcc?), it can occur that stap probe arguments will be
placed into xmm registers.
I notice this happening on a current Fedora Rawhide install with the
following package versions installed:
$ rpm -q glibc systemtap gcc
glibc-2.35.9000-10.fc37.x86_64
systemtap-4.7~pre16468670g9f253544-1.fc37.x86_64
gcc-12.0.1-0.12.fc37.x86_64
If I check the probe data in libc, I see this:
$ readelf -n /lib64/libc.so.6
...
stapsdt 0x0000004d NT_STAPSDT (SystemTap probe descriptors)
Provider: libc
Name: pthread_start
Location: 0x0000000000090ac3, Base: 0x00000000001c65c4, Semaphore: 0x0000000000000000
Arguments: 8@%xmm1 8@1600(%rbx) 8@1608(%rbx)
stapsdt 0x00000050 NT_STAPSDT (SystemTap probe descriptors)
Provider: libc
Name: pthread_create
Location: 0x00000000000912f1, Base: 0x00000000001c65c4, Semaphore: 0x0000000000000000
Arguments: 8@%xmm1 8@%r13 8@8(%rsp) 8@16(%rsp)
...
Notice that for both of these probes, the first argument is a uint64_t
stored in the xmm1 register.
Unfortunately, if I try to use this probe within GDB, then I can't
view the first argument. Here's an example session:
$ gdb $(which gdb)
(gdb) start
...
(gdb) info probes stap libc pthread_create
...
(gdb) break *0x00007ffff729e2f1 # Use address of probe.
(gdb) continue
...
(gdb) p $_probe_arg0
Invalid cast.
What's going wrong? If I re-run my session, but this time use 'set
debug stap-expression 1', this is what I see:
(gdb) set debug stap-expression 1
(gdb) p $_probe_arg0
Operation: UNOP_CAST
Operation: OP_REGISTER
String: xmm1
Type: uint64_t
Operation: UNOP_CAST
Operation: OP_REGISTER
String: r13
Type: uint64_t
Operation: UNOP_CAST
Operation: UNOP_IND
Operation: UNOP_CAST
Operation: BINOP_ADD
Operation: OP_LONG
Type: long
Constant: 0x0000000000000008
Operation: OP_REGISTER
String: rsp
Type: uint64_t *
Type: uint64_t
Operation: UNOP_CAST
Operation: UNOP_IND
Operation: UNOP_CAST
Operation: BINOP_ADD
Operation: OP_LONG
Type: long
Constant: 0x0000000000000010
Operation: OP_REGISTER
String: rsp
Type: uint64_t *
Type: uint64_t
Invalid cast.
(gdb)
The important bit is this:
Operation: UNOP_CAST
Operation: OP_REGISTER
String: xmm1
Type: uint64_t
Which is where we cast the xmm1 register to uint64_t. And the final
piece of the puzzle is:
(gdb) ptype $xmm1
type = union vec128 {
v8bf16 v8_bfloat16;
v4f v4_float;
v2d v2_double;
v16i8 v16_int8;
v8i16 v8_int16;
v4i32 v4_int32;
v2i64 v2_int64;
uint128_t uint128;
}
So, we are attempting to cast a union type to a scalar type, which is
not supporting in C/C++, and as a consequence GDB's expression
evaluator throws an error when we attempt to do this.
The first approach I considered for solving this problem was to try
and make use of gdbarch_stap_adjust_register. We already have a
gdbarch method (gdbarch_stap_adjust_register) that allows us to tweak
the name of the register that we access. Currently only x86
architectures use this to transform things like ax to eax in some
cases.
I wondered, what if we change gdbarch_stap_adjust_register to do more
than just change the register names? What if this method instead
became gdbarch_stap_read_register. This new method would return a
operation_up, and would take the register name, and the type we are
trying to read from the register, and return the operation that
actually reads the register.
The default implementation of this method would just use
user_reg_map_name_to_regnum, and then create a register_operation,
like we already do in stap_parse_register_operand. But, for x86
architectures this method would fist possibly adjust the register
name, then do the default action to read the register. Finally, for
x86 this method would spot when we were accessing an xmm register,
and, based on the type being pulled from the register, would extract
the correct field from the union.
The benefit of this approach is that it would work with the expression
types that GDB currently supports. The draw back would be that this
approach would not be very generic. We'd need code to handle each
sub-field size with an xmm register. If other architectures started
using vector registers for probe arguments, those architectures would
have to create their own gdbarch_stap_read_register method. And
finally, the type of the xmm registers comes from the type defined in
the target description, there's a risk that GDB might end up
hard-coding the names of type sub-fields, then if a target uses a
different target description, with different field names for xmm
registers, the stap probes would stop working.
And so, based on all the above draw backs, I rejected this first
approach.
My second plan involves adding a new expression type to GDB called
unop_extract_operation. This new expression takes a value and a type,
during evaluation the value contents are fetched, and then a new value
is extracted from the value contents (based on type). This is similar
to the following C expression:
result_value = *((output_type *) &input_value);
Obviously we can't actually build this expression in this case, as the
input_value is in a register, but hopefully the above makes it clearer
what I'm trying to do.
The benefit of the new expression approach is that this code can be
shared across all architectures, and it doesn't care about sub-field
names within the union type.
The draw-backs that I see are potential future problems if arguments
are not stored within the least significant bytes of the register.
However if/when that becomes an issue we can adapt the
gdbarch_stap_read_register approach to allow architectures to control
how a value is extracted.
For testing, I've extended the existing gdb.base/stap-probe.exp test
to include a function that tries to force an argument into an xmm
register. Obviously, that will only work on a x86 target, so I've
guarded the new function with an appropriate GCC define. In the exp
script we use readelf to check if the probe exists, and is using the
xmm register.
If the probe doesn't exist then the associated tests are skipped.
If the probe exists, put isn't using the xmm register (which will
depend on systemtap/gcc versions), then again, the tests are skipped.
Otherwise, we can run the test. I think the cost of running readelf
is pretty low, so I don't feel too bad making all the non-xmm targets
running this step.
I found that on a Fedora 35 install, with these packages installed, I
was able to run this test and have the probe argument be placed in an
xmm register:
$ rpm -q systemtap gcc glibc
systemtap-4.6-4.fc35.x86_64
gcc-11.2.1-9.fc35.x86_64
glibc-2.34-7.fc35.x86_64
Finally, as this patch adds a new operation type, then I need to
consider how to generate an agent expression for the new operation
type.
I have kicked the can down the road a bit on this. In the function
stap_parse_register_operand, I only create a unop_extract_operation in
the case where the register type is non-scalar, this means that in
most cases I don't need to worry about generating an agent expression
at all.
In the xmm register case, when an unop_extract_operation will be
created, I have sketched out how the agent expression could be
handled, however, this code is currently not reached. When we try to
generate the agent expression to place the xmm register on the stack,
GDB hits this error:
(gdb) trace -probe-stap test:xmmreg
Tracepoint 1 at 0x401166
(gdb) actions
Enter actions for tracepoint 1, one per line.
End with a line saying just "end".
>collect $_probe_arg0
Value not scalar: cannot be an rvalue.
This is because GDB doesn't currently support placing non-scalar types
on the agent expression evaluation stack. Solving this is clearly
related to the original problem, but feels a bit like a second
problem. I'd like to get feedback on whether my approach to solving
the original problem is acceptable or not before I start looking at
how to handle xmm registers within agent expressions.
The test case added here is testing the bug gdb/28856, where calling a
function by hand from a pretty printer makes GDB crash. There are 6
mechanisms to trigger this crash in the current test, using the commands
backtrace, up, down, finish, step and continue. Since the failure happens
because of use-after-free (more details below) the tests will always
have a chance of passing through sheer luck, but anecdotally they seem
to fail all of the time.
The reason GDB is crashing is a use-after-free problem. The above
mentioned functions save a pointer to the current frame's information,
then calls the pretty printer, and uses the saved pointer for different
reasons, depending on the function. The issue happens because
call_function_by_hand needs to reset the obstack to get the current
frame, invalidating the saved pointer.
In testsuite/README, we suggest that you can run the testsuite against
some other GDB binary by using:
make check RUNTESTFLAGS=GDB=/usr/bin/gdb
However, that example isn't fully correct, because with that command
line, the testsuite will still pass
-data-directory=[pwd]/../data-directory
to /usr/bin/gdb, like e.g.:
...
builtin_spawn /usr/bin/gdb -nw -nx -data-directory /home/pedro/gdb/build/gdb/testsuite/../data-directory -iex set height 0 -iex set width 0
...
while if you're testing an installed GDB (the system GDB being the
most usual scenario), then you should normally let it use its own
configured directory, not the just-built GDB's data directory.
This commit improves the status quo with the following two changes:
- if the user specifies GDB on the command line, then by default,
don't start GDB with the -data-directory command line option.
I.e., let the tested GDB use its own configured data directory.
- let the user override the data directory, via a new
GDB_DATA_DIRECTORY global. This replaces the existing
BUILD_DATA_DIRECTORY variable in testsuite/lib/gdb.exp, which
wasn't overridable, and was a bit misnamed for the new purpose.
So after this, the following commands I believe behave intuitively:
# Test the non-installed GDB in some build dir:
make check \
RUNTESTFLAGS="GDB=/path/to/other/build/gdb \
GDB_DATA_DIRECTORY=/path/to/other/build/gdb/data-directory"
# Test the GDB installed in some prefix:
make check \
RUNTESTFLAGS="GDB=/opt/gdb/bin/gdb"
# Test the built GDB with some alternative data directory, e.g., the
system GDB's data directory:
make check \
RUNTESTFLAGS="GDB_DATA_DIRECTORY=/usr/share/gdb"
Change-Id: Icdc21c85219155d9564a9900961997e6624b78fb
Now that the GDB 12 branch has been created,
this commit bumps the version number in gdb/version.in to
13.0.50.DATE-git
For the record, the GDB 12 branch was created
from commit 2be64de603.
Also, as a result of the version bump, the following changes
have been made in gdb/testsuite:
* gdb.base/default.exp: Change $_gdb_major to 13.