gdb/python: improve formatting of help text for user defined commands

Consider this command defined in Python (in the file test-cmd.py): class test_cmd (gdb.Command): """ This is the first line. Indented second line. This is the third line. """ def __init__ (self): super ().__init__ ("test-cmd", gdb.COMMAND_OBSCURE) def invoke (self, arg, from_tty): print ("In test-cmd") test_cmd() Now, within a GDB session: (gdb) source test-cmd.py (gdb) help test-cmd This is the first line. Indented second line. This is the third line. (gdb) I think there's three things wrong here: 1. The leading blank line, 2. The trailing blank line, and 3. Every line is indented from the left edge slightly. The problem of course, is that GDB is using the Python doc string verbatim as its help text. While the user has formatted the help text so that it appears clear within the .py file, this means that the text appear less well formatted when displayed in the "help" output. The same problem can be observed for gdb.Parameter objects in their set/show output. In this commit I aim to improve the "help" output for commands and parameters. To do this I have added gdbpy_fix_doc_string_indentation, a new function that rewrites the doc string text following the following rules: 1. Leading blank lines are removed, 2. Trailing blank lines are removed, and 3. Leading whitespace is removed in a "smart" way such that the relative indentation of lines is retained. With this commit in place the above example now looks like this: (gdb) source ~/tmp/test-cmd.py (gdb) help test-cmd This is the first line. Indented second line. This is the third line. (gdb) Which I think is much neater. Notice that the indentation of the second line is retained. Any blank lines within the help text (not leading or trailing) will be retained. I've added a NEWS entry to note that there has been a change in behaviour, but I didn't update the manual. The existing manual is suitably vague about how the doc string is used, so I think the new behaviour is covered just as well by the existing text.
2022-05-16 19:26:54 +01:00 · 2022-05-16 19:26:54 +01:00 · 51e8dbe1fb
commit 51e8dbe1fb
parent 0e77ff2c86
6 changed files with 507 additions and 0 deletions
--- a/gdb/python/py-cmd.c
+++ b/gdb/python/py-cmd.c
@ -493,6 +493,7 @@ cmdpy_init (PyObject *self, PyObject *args, PyObject *kw)
 	  docstring = python_string_to_host_string (ds_obj.get ());
 	  if (docstring == nullptr)
 	    return -1;
+	  docstring = gdbpy_fix_doc_string_indentation (std::move (docstring));
 	}
    }
  if (docstring == nullptr)
--- a/gdb/python/py-param.c
+++ b/gdb/python/py-param.c
@ -385,6 +385,8 @@ get_doc_string (PyObject *object, enum doc_string_type doc_type,
 	  result = python_string_to_host_string (ds_obj.get ());
 	  if (result == NULL)
 	    gdbpy_print_stack ();
+	  else if (doc_type == doc_string_description)
+	    result = gdbpy_fix_doc_string_indentation (std::move (result));
 	}
    }

--- a/gdb/python/py-utils.c
+++ b/gdb/python/py-utils.c
@ -400,3 +400,197 @@ gdbpy_handle_exception ()
  else
    error ("%s", msg.get ());
 }
+
+/* See python-internal.h.  */
+
+gdb::unique_xmalloc_ptr<char>
+gdbpy_fix_doc_string_indentation (gdb::unique_xmalloc_ptr<char> doc)
+{
+  /* A structure used to track the white-space information on each line of
+     DOC.  */
+  struct line_whitespace
+  {
+    /* Constructor.  OFFSET is the offset from the start of DOC, WS_COUNT
+       is the number of whitespace characters starting at OFFSET.  */
+    line_whitespace (size_t offset, int ws_count)
+      : m_offset (offset),
+	m_ws_count (ws_count)
+    { /* Nothing.  */ }
+
+    /* The offset from the start of DOC.  */
+    size_t offset () const
+    { return m_offset; }
+
+    /* The number of white-space characters at the start of this line.  */
+    int ws () const
+    { return m_ws_count; }
+
+  private:
+    /* The offset from the start of DOC to the first character of this
+       line.  */
+    size_t m_offset;
+
+    /* White space count on this line, the first character of this
+       whitespace is at OFFSET.  */
+    int m_ws_count;
+  };
+
+  /* Count the number of white-space character starting at TXT.  We
+     currently only count true single space characters, things like tabs,
+     newlines, etc are not counted.  */
+  auto count_whitespace = [] (const char *txt) -> int
+  {
+    int count = 0;
+
+    while (*txt == ' ')
+      {
+	++txt;
+	++count;
+      }
+
+    return count;
+  };
+
+  /* In MIN_WHITESPACE we track the smallest number of whitespace
+     characters seen at the start of a line (that has actual content), this
+     is the number of characters that we can delete off all lines without
+     altering the relative indentation of all lines in DOC.
+
+     The first line often has no indentation, but instead starts immediates
+     after the 3-quotes marker within the Python doc string, so, if the
+     first line has zero white-space then we just ignore it, and don't set
+     MIN_WHITESPACE to zero.
+
+     Lines without any content should (ideally) have no white-space at
+     all, but if they do then they might have an artificially low number
+     (user left a single stray space at the start of an otherwise blank
+     line), we don't consider lines without content when updating the
+     MIN_WHITESPACE value.  */
+  gdb::optional<int> min_whitespace;
+
+  /* The index into WS_INFO at which the processing of DOC can be
+     considered "all done", that is, after this point there are no further
+     lines with useful content and we should just stop.  */
+  gdb::optional<size_t> all_done_idx;
+
+  /* White-space information for each line in DOC.  */
+  std::vector<line_whitespace> ws_info;
+
+  /* Now look through DOC and collect the required information.  */
+  const char *tmp = doc.get ();
+  while (*tmp != '\0')
+    {
+      /* Add an entry for the offset to the start of this line, and how
+	 much white-space there is at the start of this line.  */
+      size_t offset = tmp - doc.get ();
+      int ws_count = count_whitespace (tmp);
+      ws_info.emplace_back (offset, ws_count);
+
+      /* Skip over the white-space.  */
+      tmp += ws_count;
+
+      /* Remember where the content of this line starts, and skip forward
+	 to either the end of this line (newline) or the end of the DOC
+	 string (null character), whichever comes first.  */
+      const char *content_start = tmp;
+      while (*tmp != '\0' && *tmp != '\n')
+	++tmp;
+
+      /* If this is not the first line, and if this line has some content,
+	 then update MIN_WHITESPACE, this reflects the smallest number of
+	 whitespace characters we can delete from all lines without
+	 impacting the relative indentation of all the lines of DOC.  */
+      if (offset > 0 && tmp > content_start)
+	{
+	  if (!min_whitespace.has_value ())
+	    min_whitespace = ws_count;
+	  else
+	    min_whitespace = std::min (*min_whitespace, ws_count);
+	}
+
+      /* Each time we encounter a line that has some content we update
+	 ALL_DONE_IDX to be the index of the next line.  If the last lines
+	 of DOC don't contain any content then ALL_DONE_IDX will be left
+	 pointing at an earlier line.  When we rewrite DOC, when we reach
+	 ALL_DONE_IDX then we can stop, the allows us to trim any blank
+	 lines from the end of DOC.  */
+      if (tmp > content_start)
+	all_done_idx = ws_info.size ();
+
+      /* If we reached a newline then skip forward to the start of the next
+	 line.  The other possibility at this point is that we're at the
+	 very end of the DOC string (null terminator).  */
+      if (*tmp == '\n')
+	++tmp;
+    }
+
+  /* We found no lines with content, fail safe by just returning the
+     original documentation string.  */
+  if (!all_done_idx.has_value () || !min_whitespace.has_value ())
+    return doc;
+
+  /* Setup DST and SRC, both pointing into the DOC string.  We're going to
+     rewrite DOC in-place, as we only ever make DOC shorter (by removing
+     white-space), thus we know this will not overflow.  */
+  char *dst = doc.get ();
+  char *src = doc.get ();
+
+  /* Array indices used with DST, SRC, and WS_INFO respectively.  */
+  size_t dst_offset = 0;
+  size_t src_offset = 0;
+  size_t ws_info_offset = 0;
+
+  /* Now, walk over the source string, this is the original DOC.  */
+  while (src[src_offset] != '\0')
+    {
+      /* If we are at the start of the next line (in WS_INFO), then we may
+	 need to skip some white-space characters.  */
+      if (src_offset == ws_info[ws_info_offset].offset ())
+	{
+	  /* If a line has leading white-space then we need to skip over
+	     some number of characters now.  */
+	  if (ws_info[ws_info_offset].ws () > 0)
+	    {
+	      /* If the line is entirely white-space then we skip all of
+		 the white-space, the next character to copy will be the
+		 newline or null character.  Otherwise, we skip the just
+		 some portion of the leading white-space.  */
+	      if (src[src_offset + ws_info[ws_info_offset].ws ()] == '\n'
+		  || src[src_offset + ws_info[ws_info_offset].ws ()] == '\0')
+		src_offset += ws_info[ws_info_offset].ws ();
+	      else
+		src_offset += std::min (*min_whitespace,
+					ws_info[ws_info_offset].ws ());
+
+	      /* If we skipped white-space, and are now at the end of the
+		 input, then we're done.  */
+	      if (src[src_offset] == '\0')
+		break;
+	    }
+	  if (ws_info_offset < (ws_info.size () - 1))
+	    ++ws_info_offset;
+	  if (ws_info_offset > *all_done_idx)
+	    break;
+	}
+
+      /* Don't copy a newline to the start of the DST string, this would
+	 result in a leading blank line.  But in all other cases, copy the
+	 next character into the destination string.  */
+      if ((dst_offset > 0 || src[src_offset] != '\n'))
+	{
+	  dst[dst_offset] = src[src_offset];
+	  ++dst_offset;
+	}
+
+      /* Move to the next source character.  */
+      ++src_offset;
+    }
+
+  /* Remove the trailing newline character(s), and ensure we have a null
+     terminator in place.  */
+  while (dst_offset > 1 && dst[dst_offset - 1] == '\n')
+    --dst_offset;
+  dst[dst_offset] = '\0';
+
+  return doc;
+}
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@ -822,4 +822,25 @@ extern bool gdbpy_is_architecture (PyObject *obj);

 extern bool gdbpy_is_progspace (PyObject *obj);

+/* Take DOC, the documentation string for a GDB command defined in Python,
+   and return an (possibly) modified version of that same string.
+
+   When a command is defined in Python, the documentation string will
+   usually be indented based on the indentation of the surrounding Python
+   code.  However, the documentation string is a literal string, all the
+   white-space added for indentation is included within the documentation
+   string.
+
+   This indentation is then included in the help text that GDB displays,
+   which looks odd out of the context of the original Python source code.
+
+   This function analyses DOC and tries to figure out what white-space
+   within DOC was added as part of the indentation, and then removes that
+   white-space from the copy that is returned.
+
+   If the analysis of DOC fails then DOC will be returned unmodified.  */
+
+extern gdb::unique_xmalloc_ptr<char> gdbpy_fix_doc_string_indentation
+  (gdb::unique_xmalloc_ptr<char> doc);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */