Not a big deal, but it seems strange to check errno instead of the
ptrace return value to know whether it succeeded.
Change-Id: If0a6d0280ab0e5ecb077e546af0d6fe489c5b9fd
No caller cares about the value of *SIGINFO on failure. It's also
documented in the function doc that *SIGINFO is uninitialized (I
understand "untouched") on failure.
Change-Id: I5ef38a5f58e3635e109b919ddf6f827f38f1225a
Add a pid parameter to linux_proc_xfer_memory_partial, making the
inferior_ptid reference bubble up close to the target_ops::xfer_partial
boundary. No behavior change expected.
Change-Id: I58171b00ee1bba1ea22efdbb5dcab8b1ab3aac4c
linux_handle_extended_wait calls target_post_attach if we're handling
a PTRACE_EVENT_CLONE, and libthread_db.so isn't active.
target_post_attach just calls linux_init_ptrace_procfs to set the
lwp's ptrace options. However, this is completely unnecessary,
because, as man ptrace [1] says, options are inherited:
"Flags are inherited by new tracees created and "auto-attached" via
active PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, or PTRACE_O_TRACECLONE
options."
This removes the unnecessary call.
[1] - https://man7.org/linux/man-pages/man2/ptrace.2.html
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I533eaa60b700f7e40760311fc0d344d0b3f19a78
Currently, every internal_error call must be passed __FILE__/__LINE__
explicitly, like:
internal_error (__FILE__, __LINE__, "foo %d", var);
The need to pass in explicit __FILE__/__LINE__ is there probably
because the function predates widespread and portable variadic macros
availability. We can use variadic macros nowadays, and in fact, we
already use them in several places, including the related
gdb_assert_not_reached.
So this patch renames the internal_error function to something else,
and then reimplements internal_error as a variadic macro that expands
__FILE__/__LINE__ itself.
The result is that we now should call internal_error like so:
internal_error ("foo %d", var);
Likewise for internal_warning.
The patch adjusts all calls sites. 99% of the adjustments were done
with a perl/sed script.
The non-mechanical changes are in gdbsupport/errors.h,
gdbsupport/gdb_assert.h, and gdb/gdbarch.py.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
Converting from free-form macros to an enum gives a bit of type-safety.
This caught places where we would assign host error numbers to what
should contain a target fileio error number, for instance in
target_fileio_pread.
I added the FILEIO_SUCCESS enumerator, because
remote.c:remote_hostio_parse_result initializes the remote_errno output
variable to 0. It seems better to have an explicit enumerator than to
assign a value for which there is no enumerator. I considered
initializing this variable to FILEIO_EUNKNOWN instead, such that if the
remote side replies with an error and omits the errno value, we'll get
an errno that represents an error instead of 0 (which reprensents no
error). But it's not clear what the consequences of that change would
be, so I prefer to err on the side of caution and just keep the existing
behavior (there is no intended change in behavior with this patch).
Note that remote_hostio_parse_resul still reads blindly what the remote
side sends as a target errno into this variable, so we can still end up
with a nonsensical value here. It's not good, but out of the scope of
this patch.
Convert host_to_fileio_error and fileio_errno_to_host to return / accept
a fileio_error instead of an int, and cascade the change in the whole
chain that uses that.
Change-Id: I454b0e3fcf0732447bc872252fa8e57d138b0e03
Commit 05c06f318f enabled GDB to access
memory while threads are running. It did this by accessing
/proc/PID/task/LWP/mem.
Unfortunately, this interface is not implemented for writing in older
kernels (such as RHEL6). This means that GDB is unable to insert
breakpoints on these hosts:
$ ./gdb -q gdb -ex start
Reading symbols from gdb...
Temporary breakpoint 1 at 0x40fdd5: file ../../src/gdb/gdb.c, line 28.
Starting program: /home/rhel6/fsf/linux/gdb/gdb
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x40fdd5
(gdb)
Before this patch, linux_proc_xfer_memory_partial (previously called
linux_proc_xfer_partial) would return TARGET_XFER_EOF if the write to
/proc/PID/mem failed. [More specifically, linux_proc_xfer_partial
would not "bother for one word," but the effect is the essentially
same.]
This status was checked by linux_nat_target::xfer_partial, which would
then fallback to using ptrace to perform the operation.
This is the specific hunk that removed the fallback:
- xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf,
- offset, len, xfered_len);
- if (xfer != TARGET_XFER_EOF)
- return xfer;
+ return linux_proc_xfer_memory_partial (readbuf, writebuf,
+ offset, len, xfered_len);
+ }
return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf,
offset, len, xfered_len);
This patch makes linux_nat_target::xfer_partial go straight to writing
memory via ptrace if writing via /proc/pid/mem is not possible in the
running kernel, enabling GDB to insert breakpoints on these older
kernels. Note that a recent patch changed the return status from
TARGET_XFER_EOF to TARGET_XFER_E_IO.
Tested on {unix,native-gdbserver,native-extended-gdbserver}/-m{32,64}
on x86_64, s390x, aarch64, and ppc64le.
Change-Id: If1d884278e8c4ea71d8836bedd56e6a6c242a415
Probe whether /proc/pid/mem is writable, by using it to write to a GDB
variable. This will be used in the following patch to avoid falling
back to writing to inferior memory with ptrace if /proc/pid/mem _is_
writable.
Change-Id: If87eff0b46cbe5e32a583e2977a9e17d29d0ed3e
For every stop, Linux GDB and GDBserver save the stopped thread's PC,
in lwp->stop_pc. This is done in save_stop_reason, in both
gdb/linux-nat.c and gdbserver/linux-low.cc. However, while we're
going through the shell after "run", in startup_inferior, we shouldn't
be reading registers, as we haven't yet determined the target's
architecture -- the shell's architecture may not even be the same as
the final inferior's.
In gdb/linux-nat.c, lwp->stop_pc is only needed when the thread has
stopped for a breakpoint, and since when going through the shell, no
breakpoint is going to hit, we could simply teach save_stop_reason to
only record the stop pc when the thread stopped for a breakpoint.
However, in gdbserver/linux-low.cc, lwp->stop_pc is used in more cases
than breakpoint hits (e.g., it's used in tracepoints & the
"while-stepping" feature).
So to avoid GDB vs GDBserver divergence, we apply the same approach to
both implementations.
We set a flag in the inferior (process in GDBserver) whenever it is
being nursed through the shell, and when that flag is set,
save_stop_reason bails out early. While going through the shell,
we'll only ever get process exits (normal or signalled), random
signals, and exec events, so nothing is lost.
Change-Id: If0f01831514d3a74d17efd102875de7d2c6401ad
When accessing /proc/PID/mem, if pread64/pwrite64/read/write encounters
an error and return -1, linux_proc_xfer_memory_partial return
TARGET_XFER_EOF.
I think it should return TARGET_XFER_E_IO in this case. TARGET_XFER_EOF
is returned when pread64/pwrite64/read/frite returns 0, which indicates
that the address space is gone and the whole process has exited or
execed.
This patch makes this change.
Regression tested on x86_64-linux-gnu.
Change-Id: I6030412459663b8d7933483fdda22a6c2c5d7221
This changes target_pid_to_exec_file and target_ops::pid_to_exec_file
to return a "const char *". I couldn't build many of these targets,
but did examine the code by hand -- also, as this only affects the
return type, it's normally pretty safe. This brings gdb and gdbserver
a bit closer, and allows for the removal of a const_cast as well.
The current target_resume interface is a bit odd & non-intuitive.
I've found myself explaining it a couple times the recent past, while
reviewing patches that assumed STEP/SIGNAL always applied to the
passed in PTID. It goes like this today:
- if the passed in PTID is a thread, then the step/signal request is
for that thread.
- otherwise, if PTID is a wildcard (all threads or all threads of
process), the step/signal request is for inferior_ptid, and PTID
indicates which set of threads run free.
Because GDB always switches the current thread to "leader" thread
being resumed/stepped/signalled, we can simplify this a bit to:
- step/signal are always for inferior_ptid.
- PTID indicates the set of threads that run free.
Still not ideal, but it's a minimal change and at least there are no
special cases this way.
That's what this patch does. It renames the PTID parameter to
SCOPE_PTID, adds some assertions to target_resume, and tweaks
target_resume's description. In addition, it also renames PTID to
SCOPE_PTID in the remote and linux-nat targets, and simplifies their
implementation a little bit. Other targets could do the same, but
they don't have to.
Change-Id: I02a2ec2ab3a3e9b191de1e9a84f55c17cab7daaf
The check removed by this patch, using current_inferior, looks wrong.
When debugging multiple inferiors with the Linux native target and
linux_handle_extended_wait is called, there's no guarantee about which
is the current inferior. The vfork-done event we receive could be for
any inferior. If the vfork-done event is for a non-current inferior, we
end up wrongfully ignoring it. As a result, the core never processes a
TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed
is never cleared, and breakpoints are never reinserted. However,
because the Linux native target decided to ignore the event, it resumed
the thread - while breakpoints out. And that's bad.
The proposed fix is to remove this check. Always report vfork-done
events and let infrun's logic decide if it should be ignored. We don't
save much cycles by filtering the event here.
Add a test that replicates the situation described above. See comments
in the test for more details.
Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
Now that filtered and unfiltered output can be treated identically, we
can unify the printf family of functions. This is done under the name
"gdb_printf". Most of this patch was written by script.
A number of spots call printf_unfiltered only because they are in code
that should not be interrupted by the pager. However, I believe these
cases are all handled by infrun's blanket ban on paging, and so can be
converted to the default (_filtered) API.
After this patch, I think all the remaining _unfiltered calls are ones
that really ought to be. A few -- namely in complete_command -- could
be replaced by a scoped assignment to pagination_enabled, but for the
remainder, the code seems simple enough like this.
The current zombie leader detection code in linux-nat.c has a race --
if a multi-threaded inferior exits just before check_zombie_leaders
finds that the leader is now zombie via checking /proc/PID/status,
check_zombie_leaders deletes the leader, assuming we won't get an
event for that exit (which we won't in some scenarios, but not in this
one). That might seem mostly harmless, but it has some downsides:
- later when we continue pulling events out of the kernel, we will
collect the exit event of the non-leader threads, and once we see
the last lwp in our list exit, we return _that_ lwp's exit code as
whole-process exit code to infrun, instead of the leader's exit
code.
- this can cause a hang in stop_all_threads in infrun.c. Say there
are 2 threads in the process. stop_all_threads stops each of those
threads, and then waits for two stop or exit events, one for each
thread. If the whole process exits, and check_zombie_leaders hits
the false-positive case, linux-nat.c will only return one event to
GDB (the whole-process exit returned when we see the last thread,
the non-leader thread, exit), making stop_all_threads hang forever
waiting for a second event that will never come.
However, in this false-positive scenario, where the whole process is
exiting, as opposed to just the leader (with pthread_exit(), for
example), we _will_ get an exit event shortly for the leader, after we
collect the exit event of all the other non-leader threads. Or put
another way, we _always_ get an event for the leader after we see it
become zombie.
I tried a number of approaches to fix this:
#1 - My first thought to address the race was to make GDB always
report the whole-process exit status for the leader thread, not for
whatever is the last lwp in the list. We _always_ get a final exit
(or exec) event for the leader, and when the race triggers, we're not
collecting it.
#2 - My second thought was to try to plug the race in the first place.
I thought of making GDB call waitpid/WNOHANG for all non-leader
threads immediately when the zombie leader is detected, assuming there
would be an exit event pending for each of them waiting to be
collected. Turns out that that doesn't work -- you can see the leader
become zombie _before_ the kernel kills all other threads. Waitpid in
that small time window returns 0, indicating no-event. Thankfully we
hit that race window all the time, which avoided trading one race for
another. Looking at the non-leader thread's status in /proc doesn't
help either, the threads are still in running state for a bit, for the
same reason.
#3 - My next attempt, which seemed promising, was to synchronously
stop and wait for the stop for each of the non-leader threads. For
the scenario in question, this will collect all the exit statuses of
the non-leader threads. Then, if we are left with only the zombie
leader in the lwp list, it means we either have a normal while-process
exit or an exec, in which case we should not delete the leader. If
_only_ the leader exited, like in gdb.threads/leader-exit.exp, then
after pausing threads, we will still have at least one live non-leader
thread in the list, and so we delete the leader lwp. I got this
working and polished, and it was only after staring at the kernel code
to convince myself that this would really work (and it would, for the
scenario I considered), that I realized I had failed to account for
one scenario -- if any non-leader thread is _already_ stopped when
some thread triggers a group exit, like e.g., if you have some threads
stopped and then resume just one thread with scheduler-locking or
non-stop, and that thread exits the process. I also played with
PTRACE_EVENT_EXIT, see if it would help in any way to plug the race,
and I couldn't find a way that it would result in any practical
difference compared to looking at /proc/PID/status, with respect to
having a race.
So I concluded that there's no way to plug the race, we just have to
deal with it. Which means, going back to approach #1. That is the
approach taken by this patch.
Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979
Reorganize linux-nat.c:linux_nat_filter_event such that all the
handling for events for LWPs not in the LWP list is together. This
helps make a following patch clearer. The comments and debug messages
have also been tweaked - the end goal is to have them synchronized
with the gdbserver counterpart.
Change-Id: I8586d8dcd76d8bd3795145e3056fc660e3b2cd22
Subclasses of inf_ptrace_target have to opt-in to using the event_pipe
by implementing the can_async_p and async methods. For subclasses
which do this, inf_ptrace_target provides is_async_p, async_wait_fd
and closes the pipe in the close target method.
inf_ptrace_target also provides wrapper routines around the event pipe
(async_file_open, async_file_close, async_file_flush, and
async_file_mark) for use in target methods such as async.
inf_ptrace_target also exports a static async_file_mark_if_open
function which can be used in SIGCHLD signal handlers.
Now that target_resume always enables async mode after target::resume
returns, these calls are redundant.
The other place that target resume methods are invoked outside of
target_resume are as the beneath target in record_full_wait_1. In
this case, async mode should already be enabled when supported by the
target before the resume method is invoked due to the following:
In general, targets which support async mode run as async until
::wait returns TARGET_WAITKIND_NO_RESUMED to indicate that there are
no unwaited for children (either they have exited or are stopped).
When that occurs, the loop in wait_one disables async mode. Later
if a stopped child is resumed, async mode is re-enabled in
do_target_resume before waiting for the next event.
In the case of record_full_wait_1, this function is invoked from the
::wait target method when fetching an event. If the underlying
target supports async mode, then an earlier call to do_target_resume
to resume the child reporting an event in the loop in
record_full_wait_1 would have already enabled async mode before
::wait was invoked. In addition, nothing in the code executed in
the loop in record_full_wait_1 disables async mode. Async mode is
only disabled higher in the call stack in wait_one after ::wait
returns.
It is also true that async mode can be disabled by an
INF_EXEC_COMPLETE event passed to inferior_event_handle, but all of
the places that invoke that are in the gdb core which is "above" a
target ::wait method.
Note that there is an earlier call to enable async mode in
linux_nat_target::resume. That call also marks the async event pipe
to report an existing event after enabling async mode, so it needs to
stay.
Same idea as 0fab795564 ("gdb: use ptid_t::to_string in infrun debug
messages"), but throughout GDB.
Change-Id: I62ba36eaef29935316d7187b9b13d7b88491acc1
Rename 'set debug lin-lwp' to 'set debug linux-nat' and 'show debug
lin-lwp' to 'show debug linux-nat'.
I've updated the documentation and help text to match, as well as
making it clear that the debug that is coming out relates to all
aspects of Linux native inferior support, not just the LWP aspect of
it.
The boundary between general "native" target debug, and the lwp
specific part of that debug was always a little blurry, but the actual
debug variable inside GDB is debug_linux_nat, and the print routine
linux_nat_debug_printf, is used throughout the linux-nat.c file, not
just for lwp related debug, so the new name seems to make more sense.
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commits were
performed by the script.
Convert the 'set debug lin-lwp' command to a boolean. Adds a new
LINUX_NAT_SCOPED_DEBUG_ENTER_EXIT macro, and makes use of it in one
place (linux_nat_target::stop).
The manual entry for 'set debug lin-lwp' is already vague about
exactly what arguments this command takes, and the description talks
about turning debug on and off, so I don't think there's any updates
required there.
I have updated the doc strings shown when the users enters 'help show
debug lin-lwp' or 'help show debug lin-lwp'. The old title lines used
to talk about the 'GNU/Linux lwp module', but this debug flag is now
used for any native linux target debug, so we now talk about
'GNU/Linux native target'. The body string for this setting has been
changed from 'Enables printf debugging output.' to 'When on, print
debug messages relating to the GNU/Linux native target.', the old
value looks like a cut&paste error to me.
While working on a later patch that required me to understand how GDB
starts up inferiors, I was confused by the
target_ops::post_startup_inferior method.
The post_startup_inferior target function is only called from
inf_ptrace_target::create_inferior.
Part of the target class hierarchy looks like this:
inf_child_target
|
'-- inf_ptrace_target
|
|-- linux_nat_target
|
|-- fbsd_nat_target
|
|-- nbsd_nat_target
|
|-- obsd_nat_target
|
'-- rs6000_nat_target
Every sub-class of inf_ptrace_target, except rs6000_nat_target,
implements ::post_startup_inferior. The rs6000_nat_target picks up
the implementation of ::post_startup_inferior not from
inf_ptrace_target, but from inf_child_target.
No descendent of inf_child_target, outside the inf_ptrace_target
sub-tree, implements ::post_startup_inferior, which isn't really
surprising, as they would never see the method called (remember, the
method is only called from inf_ptrace_target::create_inferior).
What I find confusing is the role inf_child_target plays in
implementing, what is really a helper function for just one of its
descendents.
In this commit I propose that we formally make ::post_startup_inferior
a helper function of inf_ptrace_target. To do this I will remove the
::post_startup_inferior from the target_ops API, and instead make this
a protected, pure virtual function on inf_ptrace_target.
I'll remove the empty implementation of ::post_startup_inferior from
the inf_child_target class, and add a new empty implementation to the
rs6000_nat_target class.
All the other descendents of inf_ptrace_target already provide an
implementation of this method and so don't need to change beyond
making the method protected within their class declarations.
To me, this makes much more sense now. The helper function, which is
only called from within the inf_ptrace_target class, is now a part of
the inf_ptrace_target class.
The only way in which this change is visible to a user is if the user
turns on 'set debug target 1'. With this debug flag on, prior to this
patch the user would see something like:
-> native->post_startup_inferior (...)
<- native->post_startup_inferior (2588939)
After this patch these lines are no longer present, as the
post_startup_inferior is no longer a top level target method. For me,
this is an acceptable change.
While working with pending fork events, I wondered what would happen if
the user detached an inferior while a thread of that inferior had a
pending fork event. What happens with the fork child, which is
ptrace-attached by the GDB process (or by GDBserver), but not known to
the core? Sure enough, neither the core of GDB or the target detach the
child process, so GDB (or GDBserver) just stays ptrace-attached to the
process. The result is that the fork child process is stuck, while you
would expect it to be detached and run.
Make GDBserver detach of fork children it knows about. That is done in
the generic handle_detach function. Since a process_info already exists
for the child, we can simply call detach_inferior on it.
GDB-side, make the linux-nat and remote targets detach of fork children
known because of pending fork events. These pending fork events can be
stored in:
- thread_info::pending_waitstatus, if the core has consumed the event
but then saved it for later (for example, because it got the event
while stopping all threads, to present an all-stop stop on top of a
non-stop target)
- thread_info::pending_follow: if we ran to a "catch fork" and we
detach at that moment
Additionally, pending fork events can be in target-specific fields:
- For linux-nat, they can be in lwp_info::status and
lwp_info::waitstatus.
- For the remote target, they could be stored as pending stop replies,
saved in `remote_state::notif_state::pending_event`, if not
acknowledged yet, or in `remote_state::stop_reply_queue`, if
acknowledged. I followed the model of remove_new_fork_children for
this: call remote_notif_get_pending_events to process /
acknowledge any unacknowledged notification, then look through
stop_reply_queue.
Update the gdb.threads/pending-fork-event.exp test (and rename it to
gdb.threads/pending-fork-event-detach.exp) to try to detach the process
while it is stopped with a pending fork event. In order to verify that
the fork child process is correctly detached and resumes execution
outside of GDB's control, make that process create a file in the test
output directory, and make the test wait $timeout seconds for that file
to appear (it happens instantly if everything goes well).
This test catches a bug in linux-nat.c, also reported as PR 28512
("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig()
const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind ==
TARGET_WAITKIND_SIGNALLED' failed.). When detaching a thread with a
pending event, get_detach_signal unconditionally fetches the signal
stored in the waitstatus (`tp->pending_waitstatus ().sig ()`). However,
that is only valid if the pending event is of type
TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (iit
would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean
the thread does not exist anymore, so we wouldn't be detaching it). Add
a condition in get_detach_signal to access the signal number only if the
wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0
instead (since the thread was not stopped with a signal to begin with).
Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to
verify that we consider events in pending stop replies in the remote
target. This test has many threads constantly forking, and we detach
from the program while the program is executing. That gives us some
chance that we detach while a fork stop reply is stored in the remote
target. To verify that we correctly detach all fork children, we ask
the parent to exit by sending it a SIGUSR1 signal and have it write a
file to the filesystem before exiting. Because the parent's main thread
joins the forking threads, and the forking threads wait for their fork
children to exit, if some fork child is not detach by GDB, the parent
will not write the file, and the test will time out. If I remove the
new remote_detach_pid calls in remote.c, the test fails eventually if I
run it in a loop.
There is a known limitation: we don't remove breakpoints from the
children before detaching it. So the children, could hit a trap
instruction after being detached and crash. I know this is wrong, and
it should be fixed, but I would like to handle that later. The current
patch doesn't fix everything, but it's a step in the right direction.
Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
The following patch will add some code paths that need to ptrace-detach
a given PID. Factor out the code that does this and put it in its own
function, so that it can be re-used.
Change-Id: Ie65ca0d89893b41aea0a23d9fc6ffbed042a9705
store_waitstatus is basically a translation function between a status
integer and an equivalent target_waitstatus object. It would make sense
for it to take the integer as a parameter and return the
target_waitstatus by value. Do that, and rename to
host_status_to_waitstatus. Users can then do:
ws = host_status_to_waitstatus (status)
which does the right thing, given the move constructor of
target_waitstatus.
Change-Id: I7a07d59d3dc19d3ed66929642f82f44f3e85d61b
This commit moves the target_async_permitted check out of each targets
::can_async_p method and into the target_can_async_p wrapper function.
I've left some asserts in the two ::can_async_p methods that I
changed, which will hopefully catch any direct calls to these methods
that might be added in the future.
There should be no user visible changes after this commit.
PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent
failure) shows that GDB can hit an unexpected scenario -- it can
happen that the kernel manages to open a /proc/PID/task/LWP/mem file,
but then reading from the file returns 0/EOF, even though the process
hasn't exited or execed.
"0" out of read/write is normally what you get when the address space
of the process the file was open for is gone, because the process
execed or exited. So when GDB gets the 0, it returns memory access
failure. In the bad case in question, the process hasn't execed or
exited, so GDB fails a memory access when the access should have
worked.
GDB has code in place to gracefully handle the case of opening the
/proc/PID/task/LWP/mem just while the LWP is exiting -- most often the
open fails with EACCES or ENOENT. When it happens, GDB just tries
opening the file for a different thread of the process. The testcase
is written such that it stresses GDB's logic of closing/reopening the
/proc/PID/task/LWP/mem file, by constantly spawning short lived
threads.
However, there's a window where the kernel manages to find the thread,
but the thread exits just after and clears its address space pointer.
In this case, the kernel creates a file successfully, but the file
ends up with no address space associated, so a subsequent read/write
returns 0/EOF too, just like if the whole process had execed or
exited. This is the case in question that GDB does not handle.
Oleg Nesterov gave this suggestion as workaround for that race:
gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm.
If statm reports something non-zero, then open() was "successfull".
I think that might work. However, I didn't try it, because I realized
we have another nasty race that that wouldn't fix.
The other race I realized is that because we close/reopen the
/proc/PID/task/LWP/mem file when GDB switches to a different inferior,
then it can happen that GDB reopens /proc/PID/task/LWP/mem just after
a thread execs, and before GDB has seen the corresponding exec event.
I.e., we can open a /proc/PID/task/LWP/mem file accessing the
post-exec address space thinking we're accessing the pre-exec address
space.
A few months back, Simon, Oleg and I discussed a similar race:
[Bug gdb/26754] Race condition when resuming threads and one does an exec
https://sourceware.org/bugzilla/show_bug.cgi?id=26754
The solution back then was to make the kernel fail any ptrace
operation until the exec event is consumed, with this kernel commit:
commit dbb5afad100a828c97e012c6106566d99f041db6
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed May 12 15:33:08 2021 +0200
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed May 12 10:45:22 2021 -0700
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
This however, only applies to ptrace, not to the /proc/pid/mem file
opening case. Also, even if it did apply to the file open case, we
would want to support current kernels until such a fix is more wide
spread anyhow.
So all in all, this commit gives up on the idea of only ever keeping
one /proc/pid/mem file descriptor open. Instead, make GDB open a
/proc/pid/mem per inferior, and keep it open until the inferior exits,
is detached or execs. Make GDB open the file right after the inferior
is created or is attached to or forks, at which point we know the
inferior is stable and stopped and isn't thus going to exec, or have a
thread exit, and so the file open won't fail (unless the whole process
is SIGKILLed from outside GDB, at which point it doesn't matter
whether we open the file).
This way, we avoid both races described above, at the expense of using
more file descriptors (one per inferior).
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419
I stumbled on a bug caused by the fact that a code path read
target_waitstatus::value::sig (expecting it to contain a gdb_signal
value) while target_waitstatus::kind was TARGET_WAITKIND_FORKED. This
meant that the active union field was in fact
target_waitstatus::value::related_pid, and contained a ptid. The read
signal value was therefore garbage, and that caused GDB to crash soon
after. Or, since that GDB was built with ubsan, this nice error
message:
/home/simark/src/binutils-gdb/gdb/linux-nat.c:1271:12: runtime error: load of value 2686365, which is not a valid value for type 'gdb_signal'
Despite being a large-ish change, I think it would be nice to make
target_waitstatus safe against that kind of bug. As already done
elsewhere (e.g. dynamic_prop), validate that the type of value read from
the union matches what is supposed to be the active field.
- Make the kind and value of target_waitstatus private.
- Make the kind initialized to TARGET_WAITKIND_IGNORE on
target_waitstatus construction. This is what most users appear to do
explicitly.
- Add setters, one for each kind. Each setter takes as a parameter the
data associated to that kind, if any. This makes it impossible to
forget to attach the associated data.
- Add getters, one for each associated data type. Each getter
validates that the data type fetched by the user matches the wait
status kind.
- Change "integer" to "exit_status", "related_pid" to "child_ptid",
just because that's more precise terminology.
- Fix all users.
That last point is semi-mechanical. There are a lot of obvious changes,
but some less obvious ones. For example, it's not possible to set the
kind at some point and the associated data later, as some users did.
But in any case, the intent of the code should not change in this patch.
This was tested on x86-64 Linux (unix, native-gdbserver and
native-extended-gdbserver boards). It was built-tested on x86-64
FreeBSD, NetBSD, MinGW and macOS. The rest of the changes to native
files was done as a best effort. If I forgot any place to update in
these files, it should be easy to fix (unless the change happens to
reveal an actual bug).
Change-Id: I0ae967df1ff6e28de78abbe3ac9b4b2ff4ad03b7
Make gdb_open_cloexec return a scoped_fd, to encourage using automatic
management of the file descriptor closing. Except in the most trivial
cases, I changed the callers to just release the fd, which retains their
existing behavior. That will allow the transition to using scoped_fd
more to go gradually, one caller at a time.
Change-Id: Ife022b403f96e71d5ebb4f1056ef6251b30fe554
Replace the manually maintained linked list of lwp_info objects with
intrusive_list. Replace the ALL_LWPS macro with all_lwps, which returns
a range. Add all_lwps_safe as well, for use in iterate_over_lwps, which
currently iterates in a safe manner.
Change-Id: I355313502510acc0103f5eaf2fbde80897d6376c
Replace the lwp_free function with a destructor. Make lwp_info
non-copyable, since there is now a destructor (we wouldn't want an
lwp_info object getting copied and this->arch_private getting deleted
twice).
Change-Id: I09fcbe967e362566d3a06fed2abca2a9955570fa
Initialize all fields in the class declaration directly. This opens the
door to using intrusive_list, done in the following patch.
Change-Id: I38bb27410cd9ebf511d310bb86fe2ea1872c3b05
I wanted to find, and potentially modify, all the spots where the
'tid' parameter to the ptid_t constructor was used. So, I temporarily
removed this parameter and then rebuilt.
In order to make it simpler to search through the "real" (nonzero)
uses of this parameter, something I knew I'd have to do multiple
times, I removed any ", 0" from constructor calls.
Co-Authored-By: John Baldwin <jhb@FreeBSD.org>
Rename thread_info::executing to thread_info::m_executing, and make it
private. Add a new get/set member functions, and convert GDB to make
use of these.
The only real change of interest in this patch is in thread.c where I
have deleted the helper function set_executing_thread, and now just
use the new set function thread_info::set_executing. However, the old
helper function set_executing_thread included some code to reset the
thread's stop_pc, so I moved this code into the new function
thread_info::set_executing. However, I don't believe there is
anywhere that this results in a change of behaviour, previously the
executing flag was always set true through a call to
set_executing_thread anyway.
I spotted a couple of stray newlines that were left at the end of
debug message during conversion to the new debug output scheme. These
messages are part of the 'set debug lin-lwp 1' output.
In the context of ROCm-gdb [1], the ROCm target sits on top of the
linux-nat target. when a process forks, it needs to carry over some
data from the forking inferior to the fork child inferior. Ideally, the
ROCm target would implement the follow_fork target_ops method, but there
are some small problems. This patch fixes these, which helps the ROCm
target, but also makes things more consistent and a bit nicer in
general, I believe.
The main problem is: when follow-fork-mode is "parent",
target_follow_fork is called with the parent as the current inferior.
When it's "child", target_follow_fork is called with the child as the
current inferior. This means that target_follow_fork is sometimes
called on the parent's target stack and sometimes on the child's target
stack.
The parent's target stack may contain targets above the process target,
such as the ROCm target. So if follow-fork-child is "parent", the ROCm
target would get notified of the fork and do whatever is needed. But
the child's target stack, at that moment, only contains the exec and
process target copied over from the parent. The child's target stack is
set up by follow_fork_inferior, before calling target_follow_fork. In
that case, the ROCm target wouldn't get notified of the fork.
For consistency, I think it would be good to always call
target_follow_fork on the parent inferior's target stack. I think it
makes sense as a way to indicate "this inferior has called fork, do
whatever is needed". The desired outcome of the fork (whether an
inferior is created for the child, do we need to detach from the child)
can be indicated by passed parameter.
I therefore propose these changes:
- make follow_fork_inferior always call target_follow_fork with the
parent as the current inferior. That lets all targets present on the
parent's target stack do some fork-related handling and push
themselves on the fork child's target stack if needed.
For this purpose, pass the child inferior down to target_follow_fork
and follow_fork implementations. This is nullptr if no inferior is
created for the child, because we want to detach from it.
- as a result, in follow_fork_inferior, detach from the parent inferior
(if needed) only after the target_follow_fork call. This is needed
because we want to call target_follow_fork before the parent's
target stack is torn down.
- hand over to the targets in the parent's target stack (including the
process target) the responsibility to push themselves, if needed, to
the child's target stack. Also hand over the responsibility to the
process target, at the same time, to create the child's initial
thread (just like we do for follow_exec).
- pass the child inferior to exec_on_vfork, so we don't need to swap
the current inferior between parent and child. Nothing in
exec_on_vfork depends on the current inferior, after this change.
Although this could perhaps be replaced with just having the exec
target implement follow_fork and push itself in the child's target
stack, like the process target does... We would just need to make
sure the process target calls beneath()->follow_fork(...). I'm not
sure about this one.
gdb/ChangeLog:
* target.h (struct target_ops) <follow_fork>: Add inferior*
parameter.
(target_follow_fork): Likewise.
* target.c (default_follow_fork): Likewise.
(target_follow_fork): Likewise.
* fbsd-nat.h (class fbsd_nat_target) <follow_fork>: Likewise.
(fbsd_nat_target::follow_fork): Likewise, and call
inf_ptrace_target::follow_fork.
* linux-nat.h (class linux_nat_target) <follow_fork>: Likewise.
* linux-nat.c (linux_nat_target::follow_fork): Likewise, and
call inf_ptrace_target::follow_fork.
* obsd-nat.h (obsd_nat_target) <follow_fork>: Likewise.
* obsd-nat.c (obsd_nat_target::follow_fork): Likewise, and call
inf_ptrace_target::follow_fork.
* remote.c (class remote_target) <follow_fork>: Likewise.
(remote_target::follow_fork): Likewise, and call
process_stratum_target::follow_fork.
* process-stratum-target.h (class process_stratum_target)
<follow_fork>: New.
* process-stratum-target.c
(process_stratum_target::follow_fork): New.
* target-delegates.c: Re-generate.
[1] https://github.com/ROCm-Developer-Tools/ROCgdb
Change-Id: I460bd0af850f0485e8aed4b24c6d8262a4c69929
GDB doesn't handle well the case of an inferior using the JIT interface
to register JIT-ed objfiles and forking. If an inferior registers a
code object using the JIT interface and then forks, the child process
conceptually has the same code object loaded, so GDB should look it up
and learn about it (it currently doesn't).
To achieve this, I think it would make sense to have the
inferior_created observable called when an inferior is created due to a
fork in follow_fork_inferior. The inferior_created observable is
currently called both after starting a new inferior and after attaching
to an inferior, allowing various sub-components to learn about that new
executing inferior. We can see handling a fork child just like
attaching to it, so any work done when attaching should also be done in
the case of a fork child.
Instead of just calling the inferior_created observable, this patch
makes follow_fork_inferior call the whole post_create_inferior function.
This way, the attach and follow-fork code code paths are more alike.
Given that post_create_inferior calls solib_create_inferior_hook,
follow_fork_inferior doesn't need to do it itself, so those calls to
solib_create_inferior_hook are removed.
One question you may have: why not just call post_create_inferior at the
places where solib_create_inferior_hook is currently called, instead of
after target_follow_fork?
- there's something fishy for the second solib_create_inferior_hook
call site: at this point we have switched the current program space
to the child's, but not the current inferior nor the current thread.
So solib_create_inferior_hook (and everything under, including
check_for_thread_db, for example) is called with inferior 1 as the
current inferior and inferior 2's program space as the current
program space. I think that's wrong, because at this point we are
setting up inferior 2, and all that code relies on the current
inferior. We could just add a switch_to_thread call before it to
make inferior 2 the current one, but there are other problems (see
below).
- solib_create_inferior_hook is currently not called on the
`follow_child && detach_fork` path. I think we need to call it,
because we still get a new inferior in that case (even though we
detach the parent). If we only call post_create_inferior where
solib_create_inferior_hook used to be called, then the JIT
subcomponent doesn't get informed about the new inferior, and that
introduces a failure in the new gdb.base/jit-elf-fork.exp test.
- if we try to put the post_create_inferior just after the
switch_to_thread that was originally at line 662, or just before the
call to target_follow_fork, we introduce a subtle failure in
gdb.threads/fork-thread-pending.exp. What happens then is that
libthread_db gets loaded (somewhere under post_create_inferior)
before the linux-nat target learns about the LWPs (which happens in
linux_nat_target::follow_fork). As a result, the ALL_LWPS loop in
try_thread_db_load_1 doesn't see the child LWP, and the thread-db
target doesn't have the chance to fill in thread_info::priv. A bit
later, when the test does "info threads", and
thread_db_target::pid_to_str is called, the thread-db target doesn't
recognize the thread as one of its own, and delegates the request to
the target below. Because the pid_to_str output is not the expected
one, the test fails.
This tells me that we need to call the process target's follow_fork
first, to make the process target create the necessary LWP and thread
structures. Then, we can call post_create_inferior to let the other
components of GDB do their thing.
But then you may ask: check_for_thread_db is already called today,
somewhere under solib_create_inferior_hook, and that is before
target_follow_fork, why don't we see this ordering problem!? Well,
because of the first bullet point: when check_for_thread_db /
thread_db_load are called, the current inferior is (erroneously)
inferior 1, the parent. Because libthread_db is already loaded for
the parent, thread_db_load early returns. check_for_thread_db later
gets called by linux_nat_target::follow_fork. At this point, the
current inferior is the correct one and the child's LWP exists, so
all is well.
Since we now call post_create_inferior after target_follow_fork, which
calls the inferior_created observable, which calls check_for_thread_db,
I don't think linux_nat_target needs to explicitly call
check_for_thread_db itself, so that is removed.
In terms of testing, this patch adds a new gdb.base/jit-elf-fork.exp
test. It makes an inferior register a JIT code object and then fork.
It then verifies that whatever the detach-on-fork and follow-fork-child
parameters are, GDB knows about the JIT code object in all the inferiors
that survive the fork. It verifies that the inferiors can unload that
code object.
There isn't currently a way to get visibility into GDB's idea of the JIT
code objects for each inferior. For the purpose of this test, add the
"maintenance info jit" command. There isn't much we can print about the
JIT code objects except their load address. So the output looks a bit
bare, but it's good enough for the test.
gdb/ChangeLog:
* NEWS: Mention "maint info jit" command.
* infrun.c (follow_fork_inferior): Don't call
solib_create_inferior_hook, call post_create_inferior if a new
inferior was created.
* jit.c (maint_info_jit_cmd): New.
(_initialize_jit): Register new command.
* linux-nat.c (linux_nat_target::follow_fork): Don't call
check_for_thread_db.
* linux-nat.h (check_for_thread_db): Remove declaration.
* linux-thread-db.c (check_thread_signals): Make static.
gdb/doc/ChangeLog:
* gdb.texinfo (Maintenance Commands): Mention "maint info jit".
gdb/testsuite/ChangeLog:
* gdb.base/jit-elf-fork-main.c: New test.
* gdb.base/jit-elf-fork-solib.c: New test.
* gdb.base/jit-elf-fork.exp: New test.
Change-Id: I9a192e55b8a451c00e88100669283fc9ca60de5c
A following patch will want to take some action when a pending wait
status is set on or removed from a thread. Add a getter and a setter on
thread_info for the pending waitstatus, so that we can add some code in
the setter later.
The thing is, the pending wait status field is in the
thread_suspend_state, along with other fields that we need to backup
before and restore after the thread does an inferior function call.
Therefore, make the thread_suspend_state member private
(thread_info::suspend becomes thread_info::m_suspend), and add getters /
setters for all of its fields:
- pending wait status
- stop signal
- stop reason
- stop pc
For the pending wait status, add the additional has_pending_waitstatus
and clear_pending_waitstatus methods.
I think this makes the thread_info interface a bit nicer, because we
now access the fields as:
thread->stop_pc ()
rather than
thread->suspend.stop_pc
The stop_pc field being in the `suspend` structure is an implementation
detail of thread_info that callers don't need to be aware of.
For the backup / restore of the thread_suspend_state structure, add
save_suspend_to and restore_suspend_from methods. You might wonder why
`save_suspend_to`, as opposed to a simple getter like
thread_suspend_state &suspend ();
I want to make it clear that this is to be used only for backing up and
restoring the suspend state, _not_ to access fields like:
thread->suspend ()->stop_pc
Adding some getters / setters allows adding some assertions. I find
that this helps understand how things are supposed to work. Add:
- When getting the pending status (pending_waitstatus method), ensure
that there is a pending status.
- When setting a pending status (set_pending_waitstatus method), ensure
there is no pending status.
There is one case I found where this wasn't true - in
remote_target::process_initial_stop_replies - which needed adjustments
to respect that contract. I think it's because
process_initial_stop_replies is kind of (ab)using the
thread_info::suspend::waitstatus to store some statuses temporarily, for
its internal use (statuses it doesn't intent on leaving pending).
process_initial_stop_replies pulls out stop replies received during the
initial connection using target_wait. It always stores the received
event in `evthread->suspend.waitstatus`. But it only sets
waitstatus_pending_p, if it deems the event interesting enough to leave
pending, to be reported to the core:
if (ws.kind != TARGET_WAITKIND_STOPPED
|| ws.value.sig != GDB_SIGNAL_0)
evthread->suspend.waitstatus_pending_p = 1;
It later uses this flag a bit below, to choose which thread to make the
"selected" one:
if (selected == NULL
&& thread->suspend.waitstatus_pending_p)
selected = thread;
And ultimately that's used if the user-visible mode is all-stop, so that
we print the stop for that interesting thread:
/* In all-stop, we only print the status of one thread, and leave
others with their status pending. */
if (!non_stop)
{
thread_info *thread = selected;
if (thread == NULL)
thread = lowest_stopped;
if (thread == NULL)
thread = first;
print_one_stopped_thread (thread);
}
But in any case (all-stop or non-stop), print_one_stopped_thread needs
to access the waitstatus value of these threads that don't have a
pending waitstatus (those that had TARGET_WAITKIND_STOPPED +
GDB_SIGNAL_0). This doesn't work with the assertions I've
put.
So, change the code to only set the thread's wait status if it is an
interesting one that we are going to leave pending. If the thread
stopped due to a non-interesting event (TARGET_WAITKIND_STOPPED +
GDB_SIGNAL_0), don't store it. Adjust print_one_stopped_thread to
understand that if a thread has no pending waitstatus, it's because it
stopped with TARGET_WAITKIND_STOPPED + GDB_SIGNAL_0.
The call to set_last_target_status also uses the pending waitstatus.
However, given that the pending waitstatus for the thread may have been
cleared in print_one_stopped_thread (and that there might not even be a
pending waitstatus in the first place, as explained above), it is no
longer possible to do it at this point. To fix that, move the call to
set_last_target_status in print_one_stopped_thread. I think this will
preserve the existing behavior, because set_last_target_status is
currently using the current thread's wait status. And the current
thread is the last one for which print_one_stopped_thread is called. So
by calling set_last_target_status in print_one_stopped_thread, we'll get
the same result. set_last_target_status will possibly be called
multiple times, but only the last call will matter. It just means
possibly more calls to set_last_target_status, but those are cheap.
Change-Id: Iedab9653238eaf8231abcf0baa20145acc8b77a7