gdb: avoid premature dummy frame garbage collection

Consider the following chain of events:

  * GDB is performing an inferior call, and

  * the inferior calls longjmp, and

  * GDB detects that the longjmp has completed, stops, and enters
    check_longjmp_breakpoint_for_call_dummy (in breakpoint.c), and

  * GDB tries to unwind the stack in order to check that the dummy
    frame (set up for the inferior call) is still on the stack, but

  * the unwind fails, possibly due to missing debug information, so

  * GDB incorrectly concludes that the inferior has longjmp'd past the
    dummy frame, and so deletes the dummy frame, including the dummy
    frame breakpoint, but then

  * the inferior continues, and eventually returns to the dummy frame.
    Because the dummy frame is usually (always?) on the stack, the
    inferior then starts trying to execute the random contents of the
    stack, which results in undefined behaviour.

This situation is already warned about in the comment on the function
check_longjmp_breakpoint_for_call_dummy where we say:

   You should call this function only at places where it is safe to
   currently unwind the whole stack.  Failed stack unwind would discard
   live dummy frames.

The warning here is fine.  The problem is that, even though we call the
function from a location within GDB where we hope to be able to unwind,
sometimes the state of the inferior means that the unwind will not
succeed.

This commit tries to improve the situation by adding the following
additional check: when GDB fails to find the dummy frame on the stack,
instead of just assuming that the dummy frame can be garbage collected,
first find the stop_reason for the last frame on the stack.  If this
stop_reason indicates that the stack unwinding may have failed then we
assume that the dummy frame is still in use.  However, if the last
frame's stop_reason indicates that the stack unwind completed
successfully then we can be confident that the dummy frame is no longer
in use, and we garbage collect it.

Tested on x86-64 GNU/Linux.

gdb/ChangeLog:

	* breakpoint.c (check_longjmp_breakpoint_for_call_dummy): Add
	check for why the backtrace stopped.

gdb/testsuite/ChangeLog:

	* gdb.base/premature-dummy-frame-removal.c: New file.
	* gdb.base/premature-dummy-frame-removal.exp: New file.
	* gdb.base/premature-dummy-frame-removal.py: New file.

Change-Id: I8f330cfe0f3f33beb3a52a36994094c4abada07e
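
The decision described in the commit message can be illustrated with a small standalone model. This is only a sketch and not GDB code: ToyFrame, StopReason and can_discard_dummy are hypothetical names invented for the illustration, and the three enum values merely stand in for the stop reasons the real unwinder reports (the actual check uses get_frame_unwind_stop_reason with UNWIND_NO_REASON and UNWIND_OUTERMOST).

```cpp
#include <vector>

// Hypothetical stand-ins for the unwinder's stop reasons.  Only the
// values needed for this illustration are modelled.
enum class StopReason { NoReason, Outermost, Unavailable };

// Hypothetical toy frame: an id, plus the reason the unwinder could not
// find a caller for it.  The reason only matters for the last frame.
struct ToyFrame
{
  int id;
  StopReason stop_reason;
};

// Return true if the dummy frame DUMMY_ID may be garbage collected,
// given STACK listed innermost frame first.
bool
can_discard_dummy (const std::vector<ToyFrame> &stack, int dummy_id)
{
  // The dummy frame is still on the stack, so it is still live.
  for (const ToyFrame &frame : stack)
    if (frame.id == dummy_id)
      return false;

  // Nothing could be unwound at all, so there is no evidence either way.
  if (stack.empty ())
    return false;

  // The dummy frame was not found.  Only trust that result if the
  // unwind of the last frame stopped for an expected reason.
  StopReason reason = stack.back ().stop_reason;
  return reason == StopReason::NoReason || reason == StopReason::Outermost;
}
```

In this model a stack that ends with StopReason::Outermost and does not contain the dummy frame allows the discard, while the same stack ending with StopReason::Unavailable keeps the dummy frame, which mirrors the conservative choice made by the patch.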
2019-08-29 12:37:00 +01:00 · 2019-08-29 12:37:00 +01:00 · b4b3e2dee2
commit b4b3e2dee2
parent a2cf3633b3
6 changed files with 238 additions and 4 deletions
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -7357,9 +7357,10 @@ set_longjmp_breakpoint_for_call_dummy (void)
   TP.  Remove those which can no longer be found in the current frame
   stack.

-   You should call this function only at places where it is safe to currently
-   unwind the whole stack.  Failed stack unwind would discard live dummy
-   frames.  */
+   If the unwind fails then there is not sufficient information to discard
+   dummy frames.  In this case, elide the clean up and the dummy frames will
+   be cleaned up next time this function is called from a location where
+   unwinding is possible.  */

 void
 check_longjmp_breakpoint_for_call_dummy (struct thread_info *tp)
@@ -7371,12 +7372,55 @@ check_longjmp_breakpoint_for_call_dummy (struct thread_info *tp)
      {
 	struct breakpoint *dummy_b = b->related_breakpoint;

+	/* Find the bp_call_dummy breakpoint in the list of breakpoints
+	   chained off b->related_breakpoint.  */
 	while (dummy_b != b && dummy_b->type != bp_call_dummy)
 	  dummy_b = dummy_b->related_breakpoint;
+
+	/* If there was no bp_call_dummy breakpoint then there's nothing
+	   more to do.  Or, if the dummy frame associated with the
+	   bp_call_dummy is still on the stack then we need to leave this
+	   bp_call_dummy in place.  */
 	if (dummy_b->type != bp_call_dummy
 	    || frame_find_by_id (dummy_b->frame_id) != NULL)
 	  continue;
-	
+
+	/* We didn't find the dummy frame on the stack, this could be
+	   because we have longjmp'd to a stack frame that is previous to
+	   the dummy frame, or it could be because the stack unwind is
+	   broken at some point between the longjmp frame and the dummy
+	   frame.
+
+	   Next we figure out why the stack unwind stopped.  If it looks
+	   like the unwind is complete then we assume the dummy frame has
+	   been jumped over, however, if the unwind stopped for an
+	   unexpected reason then we assume the stack unwind is currently
+	   broken, and that we will (eventually) return to the dummy
+	   frame.
+
+	   It might be tempting to consider using frame_id_inner here, but
+	   that is not safe.   There is no guarantee that the stack frames
+	   we are looking at here are even on the same stack as the
+	   original dummy frame, hence frame_id_inner can't be used.  See
+	   the comments on frame_id_inner for more details.  */
+	bool unwind_finished_unexpectedly = false;
+	for (struct frame_info *fi = get_current_frame (); fi != nullptr; )
+	  {
+	    struct frame_info *prev = get_prev_frame (fi);
+	    if (prev == nullptr)
+	      {
+		/* FI is the last stack frame.  Why did this frame not
+		   unwind further?  */
+		auto stop_reason = get_frame_unwind_stop_reason (fi);
+		if (stop_reason != UNWIND_NO_REASON
+		    && stop_reason != UNWIND_OUTERMOST)
+		  unwind_finished_unexpectedly = true;
+	      }
+	    fi = prev;
+	  }
+	if (unwind_finished_unexpectedly)
+	  continue;
+
 	dummy_frame_discard (dummy_b->frame_id, tp);

 	while (b->related_breakpoint != b)