Currently, Python code can use event registries to detect when gdb
loads a new objfile, and when gdb clears the objfile list. However,
there's no way to detect the removal of an objfile, say when the
inferior calls dlclose.
This patch adds a gdb.free_objfile event registry and arranges for an
event to be emitted in this case.
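As a rough sketch of the intended use (the event's 'objfile' attribute
is an assumption here, mirroring the existing new_objfile event):
def on_free_objfile(event):
    # Assumed: the event carries the gdb.Objfile being removed.
    print("objfile removed: %s" % event.objfile.filename)

gdb.events.free_objfile.connect(on_free_objfile)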
When I rebased and updated the print_options patch, I forgot to update
print_options to add the new 'nibbles' feature to the result. This
patch fixes the oversight. I'm checking this in.
This adds a 'summary' mode to Value.format_string and to
gdb.print_options. For the former, it lets Python code format values
using this mode. For the latter, it lets a printer potentially detect
if it is being called in a backtrace with 'set print frame-arguments'
set to 'scalars'.
I considered adding a new mode here to let a pretty-printer see
whether it was being called in a 'backtrace' context at all, but I'm
not sure if this is really desirable.
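As a rough sketch of both halves (assuming 'summary' is exposed as a
Value.format_string keyword and as a key in the gdb.print_options
dictionary):
# Format a value in summary mode from Python.
v = gdb.parse_and_eval("some_struct")   # 'some_struct' is hypothetical
print(v.format_string(summary=True))

# Inside a printer, detect summary mode, e.g. during 'backtrace' with
# 'set print frame-arguments scalars' in effect.
if gdb.print_options().get("summary"):
    pass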
PR python/17291 asks for access to the current print options. While I
think this need is largely satisfied by the existence of
Value.format_string, it seemed to me that a bit more could be done.
First, while Value.format_string uses the user's settings, it does not
react to temporary settings such as "print/x". This patch changes
this.
Second, there is no good way to examine the current settings (in
particular the temporary ones in effect for just a single "print").
This patch adds this as well.
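As a sketch of what this enables (the exact dictionary keys returned
by gdb.print_options are an assumption here):
# Inside a pretty-printer's to_string method:
opts = gdb.print_options()
if opts.get("format") == "x":
    # The user invoked e.g. "print/x"; Value.format_string will now
    # honour this temporary setting too.
    pass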
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17291
PR python/27000 points out that gdb.block_for_pc will accept a Python
integer, but not a gdb.Value. This patch corrects this oversight.
I looked at all uses of GDB_PY_LLU_ARG and fixed these up to use
get_addr_from_python instead. I also looked at uses of GDB_PY_LL_ARG,
but those seemed relatively unlikely to be useful with a gdb.Value, so
I didn't change them. My thinking here is that a Value will typically
come from inferior memory, and something like a line number is not too
likely to be found this way.
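For example, after this change both of the following should work (a
minimal sketch):
pc = gdb.parse_and_eval("$pc")    # a gdb.Value
gdb.block_for_pc(int(pc))         # always worked
gdb.block_for_pc(pc)              # previously raised a TypeError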
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27000
PR python/29217 points out that gdb.parameter will return bool values,
but gdb.set_parameter will not properly accept them. This patch fixes
the problem by adding a special case to set_parameter.
I looked at a fix involving rewriting set_parameter in C++. However,
this one is simpler.
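For example, using the boolean 'confirm' parameter, this is a minimal
sketch of the round trip the PR describes:
val = gdb.parameter("confirm")      # returns a Python bool
gdb.set_parameter("confirm", val)   # previously rejected bool values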
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29217
Pierre-Marie noticed that, while gdb.events is a Python module, it
can't be imported. This patch changes how this module is created, so
that it can be imported, while also ensuring that the module is always
visible, just as it was in the past.
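So, for example, both of these now work, where previously the import
failed:
import gdb.events
gdb.events.stop.connect(lambda event: None)   # works as before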
This new approach required one non-obvious change -- when running
gdb.base/warning.exp, where --data-directory is intentionally not
found, the event registries can now be nullptr. Consequently, this
patch probably also requires
https://sourceware.org/pipermail/gdb-patches/2022-June/189796.html
Note that this patch obsoletes
https://sourceware.org/pipermail/gdb-patches/2022-June/189797.html
With python 3.4, I run into:
...
Traceback (most recent call last):^M
File "<string>", line 1, in <module>^M
File
"outputs/gdb.python/py-send-packet/py-send-packet.py", line 128, in \
run_set_global_var_test^M
res = conn.send_packet(b"X%x,4:\x02\x02\x02\x02" % addr)^M
TypeError: Could not convert Python object: b'X%x,4:\x02\x02\x02\x02'.^M
Error while executing Python code.^M
...
while with python 3.6 this works fine.
The type of addr is <class 'gdb.Value'>, so the first thing to try is whether
changing it into a string works:
...
addr_str = "%x" % addr
res = conn.send_packet(b"X%s,4:\x02\x02\x02\x02" % addr_str)
...
which gets us the more detailed:
...
TypeError: unsupported operand type(s) for %: 'bytes' and 'str'
...
Fix this by avoiding the '%' operator in the byte literal, and use instead:
...
def xpacket_header (addr):
    return ("X%x,4:" % addr).encode('ascii')
...
res = conn.send_packet(xpacket_header(addr) + b"\x02\x02\x02\x02")
...
Tested on x86_64-linux, with python 3.4 and 3.6, and a backported version was
tested on the gdb-12-branch in combination with python 2.7.
It is not necessary to call get_compiler_info before calling
test_compiler_info, and, after recent commits that removed setting up
the gcc_compiled, true, and false globals from get_compiler_info,
there is now no longer any need for any test script to call
get_compiler_info directly.
As a result every call to get_compiler_info outside of lib/gdb.exp is
redundant, and this commit removes them all.
There should be no change in what is tested after this commit.
This patch makes it possible for Value.format_string() to return
nibbles output.
When we set the nibbles parameter to True, binary values are displayed
in groups of four bits.
Here's an example:
(gdb) py print (gdb.Value (1230).format_string (format='t', nibbles=True))
0100 1100 1110
(gdb)
Note that the parameter nibbles is only useful if format='t' is also used.
This patch also includes update to the relevant testcase and
documentation.
Tested on x86_64 openSUSE Tumbleweed.
This commit extends the Python API to include disassembler support.
The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.
To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one. This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.
The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:
DisassembleInfo
Similar to the libopcodes disassemble_info structure, it has read-only
properties: address, architecture, and progspace.  And it has methods:
__init__, read_memory, and is_valid.
Each time GDB wants an instruction disassembled, an instance of
this class is passed to a user-written disassembler function; by
reading the properties and calling the methods (and other support
methods in the gdb.disassembler module) the user can perform and
return the disassembly.
Disassembler
This is a base-class which user written disassemblers should
inherit from. This base class provides base implementations of
__init__ and __call__ which the user written disassembler should
override.
DisassemblerResult
This class can be used to hold the result of a call to the
disassembler; it's really just a wrapper around a string (the text
of the disassembled instruction) and a length (in bytes). The user
can return an instance of this class from Disassembler.__call__ to
represent the newly disassembled instruction.
The gdb.disassembler module also provides the following functions:
register_disassembler
This function registers an instance of a Disassembler sub-class
as a disassembler, either for one specific architecture, or, as a
global disassembler for all architectures.
builtin_disassemble
This provides access to GDB's builtin disassembler. A common
use case that I see is augmenting the existing disassembler output.
The user code can call this function to have GDB disassemble the
instruction in the normal way. The user gets back a
DisassemblerResult object, which they can then read in order to
augment the disassembler output in any way they wish.
This function also provides a mechanism to intercept the
disassembler's reads of memory, thus the user can adjust what GDB
sees when it is disassembling.
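Putting these pieces together, a user disassembler that augments GDB's
builtin output might look roughly like the sketch below. This is an
illustration based on the API described above, not code from the patch
itself; the exact argument order of DisassemblerResult is an
assumption:
import gdb
import gdb.disassembler

class CommentingDisassembler(gdb.disassembler.Disassembler):
    def __init__(self):
        # The base-class __init__ takes a name for this disassembler.
        super().__init__("CommentingDisassembler")

    def __call__(self, info):
        # Have GDB disassemble the instruction in the normal way...
        result = gdb.disassembler.builtin_disassemble(info)
        # ...then augment the single line of output, keeping the length.
        return gdb.disassembler.DisassemblerResult(
            result.length, result.string + "\t## annotated from Python")

# Passing None registers this as a global disassembler for all
# architectures.
gdb.disassembler.register_disassembler(CommentingDisassembler(), None)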
The included documentation provides a more detailed description of the
API.
There is also a new CLI command added:
maint info python-disassemblers
This command is defined in the Python gdb.disassemblers module, and
can be used to list the currently registered Python disassemblers.
When running test-case gdb.python/py-mi-cmd.exp on openSUSE Leap 42.3 with
python 3.4, I occasionally run into:
...
Expecting: ^(-pycmd dct[^M
]+)?(\^done,result={hello="world",times="42"}[^M
]+[(]gdb[)] ^M
[ ]*)
-pycmd dct^M
^done,result={times="42",hello="world"}^M
(gdb) ^M
FAIL: gdb.python/py-mi-cmd.exp: -pycmd dct (unexpected output)
...
The problem is that the data type used here in py-mi-cmd.py:
...
elif argv[0] == "dct":
    return {"result": {"hello": "world", "times": 42}}
...
is a dictionary, and only starting with version 3.6 are dictionaries
insertion-ordered, so using PyDict_Next in serialize_mi_result doesn't
guarantee a fixed order.
Fix this by allowing the alternative order.
Tested on x86_64-linux.
In commit:
commit 51e8dbe1fb
Date: Mon May 16 19:26:54 2022 +0100
gdb/python: improve formatting of help text for user defined commands
the test that was added (gdb.python/py-doc-reformat.exp) was missing a
call to skip_python_tests. As a result, this test would fail for any
GDB built without Python support.
This commit adds a call to skip_python_tests.
This adds the gdb.current_language function, which can be used to find
the current language without (1) ever having the value "auto" or (2)
having to parse the output of "show language".
It also adds gdb.Frame.language, which can be used to find the
language of a given frame. This is normally preferable if one has a
Frame object handy.
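A quick sketch of the two additions:
print(gdb.current_language())   # e.g. "c" or "ada", never "auto"
frame = gdb.selected_frame()
print(frame.language())         # the language of this particular frame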
Consider this command defined in Python (in the file test-cmd.py):
class test_cmd (gdb.Command):
    """
    This is the first line.
      Indented second line.
    This is the third line.
    """

    def __init__ (self):
        super ().__init__ ("test-cmd", gdb.COMMAND_OBSCURE)

    def invoke (self, arg, from_tty):
        print ("In test-cmd")

test_cmd()
Now, within a GDB session:
(gdb) source test-cmd.py
(gdb) help test-cmd

    This is the first line.
      Indented second line.
    This is the third line.

(gdb)
I think there are three things wrong here:
1. The leading blank line,
2. The trailing blank line, and
3. Every line is indented from the left edge slightly.
The problem, of course, is that GDB is using the Python doc string
verbatim as its help text. While the user has formatted the help text
so that it appears clear within the .py file, this means that the text
appears less well formatted when displayed in the "help" output.
The same problem can be observed for gdb.Parameter objects in their
set/show output.
In this commit I aim to improve the "help" output for commands and
parameters.
To do this I have added gdbpy_fix_doc_string_indentation, a new
function that rewrites the doc string text according to the following
rules:
1. Leading blank lines are removed,
2. Trailing blank lines are removed, and
3. Leading whitespace is removed in a "smart" way such that the
relative indentation of lines is retained.
With this commit in place the above example now looks like this:
(gdb) source ~/tmp/test-cmd.py
(gdb) help test-cmd
This is the first line.
  Indented second line.
This is the third line.
(gdb)
Which I think is much neater. Notice that the indentation of the
second line is retained. Any blank lines within the help text (not
leading or trailing) will be retained.
I've added a NEWS entry to note that there has been a change in
behaviour, but I didn't update the manual. The existing manual is
suitably vague about how the doc string is used, so I think the new
behaviour is covered just as well by the existing text.
After the patch to make gdb_test's question non-optional when
specified, gdb.python/py-connection.exp started failing like so:
$ make check TESTS="gdb.python/py-connection.exp" RUNTESTFLAGS="--target_board=native-gdbserver"
(gdb) PASS: gdb.python/py-connection.exp: info connections while the connection is still around
disconnect^M
Ending remote debugging.^M
(gdb) FAIL: gdb.python/py-connection.exp: kill the inferior
The problem is that "disconnect" when debugging with the native target
asks the user whether to kill the program, while with remote targets,
it doesn't.
Fix it by explicitly killing before disconnecting.
Tested with --target_board unix, native-gdbserver, and native-extended-gdbserver.
Change-Id: Icd85015c76deb84b71894715d43853c1087eba0b
A following patch will make gdb_test error out if bogus arguments are
passed, which exposed bugs in a few testcases:
- gdb.python/py-parameter.exp, passing a spurious "1" as extra
parameter, resulting in:
ERROR: Unexpected arguments: {set test-file-param bar.txt} {The name of the file has been changed to bar.txt} {set new file parameter} 1
- gdb.python/py-xmethods.exp, a missing test message, resulting in
the next gdb_test being interpreted as message, question and
response! With the enforcing patch, this was caught with:
ERROR: Unexpected arguments: {p g.mul<char>('a')} {From Python G<>::mul.*} gdb_test {p g_ptr->mul<char>('a')} {From Python G<>::mul.*} {after: g_ptr->mul<char>('a')}
- gdb.base/pointers.exp, missing a quote.
Change-Id: I66f2db4412025a64121db7347dfb0b48240d46d4
Many test cases had a few lines in the beginning that look like:
if { condition } {
    continue
}
Where conditions varied, but were mostly in the form of ![runto_main] or
[skip_*_tests], making it quite clear that this code block was supposed
to finish the test if it entered the code block. This generates Tcl
errors, as most of these tests are not inside loops. All cases where
this was an obvious mistake are changed in this patch.
Commit c67f4e538 ("gdb/testsuite: make gdb.ada/mi_prot.exp stop at
expected location") introduced some DUPLICATEs in MI tests using
mi_continue_to_line, for example:
DUPLICATE: gdb.ada/mi_ref_changeable.exp: mi_continue_to_line: set temporary breakpoint
These test names were previously differentiated by the location passed
to mi_continue_to_line. Since the location can contain a path, that
commit removed the location from the test name, in favor of a hardcoded
string "set temporary breakpoint", hence removing the differentiator.
mi_continue_to_line receives a "test" parameter, containing a test
name. Add a "with_test_prefix" with that name, so that all tests
recorded during mi_continue_to_line have this in their name.
mi_continue_to_line passes that "test" string to mi_get_stop_line, which
is a bit superfluous. mi_get_stop_line only uses that string in case of
failures (it doesn't record a pass if everything goes fine). Since it's
not crucial, just remove it, and adjust all callers.
Adjust three gdb.mi/mi-var-*.exp tests to use prefixes to differentiate
the multiple calls to mi_run_inline_test (which calls
mi_continue_to_line).
Change-Id: I511c6caa70499f8657b1cde37d71068d74d56a74
We currently only test decimal and hexadecimal for the
gdb.Value.format_string() interface; this patch adds testcases for
binary format.
Tested on x86_64 openSUSE Tumbleweed (VERSION_ID="20220413").
Bug 28980 shows that trying to value_copy an entirely optimized out
value causes an internal error. The original bug report involves MI and
some Python pretty printer, and is quite difficult to reproduce, but
another easy way to reproduce (that is believed to be equivalent) was
proposed:
$ ./gdb -q -nx --data-directory=data-directory -ex "py print(gdb.Value(gdb.Value(5).type.optimized_out()))"
/home/smarchi/src/binutils-gdb/gdb/value.c:1731: internal-error: value_copy: Assertion `arg->contents != nullptr' failed.
This is caused by 5f8ab46bc6 ("gdb: constify parameter of
value_copy"). It added an assertion that the contents buffer is
allocated if the value is not lazy:
if (!value_lazy (val))
  {
    gdb_assert (arg->contents != nullptr);
This was based on the comment on value::contents, which suggests that
this is the case:
/* Actual contents of the value.  Target byte-order.  NULL or not
   valid if lazy is nonzero.  */
gdb::unique_xmalloc_ptr<gdb_byte> contents;
However, it turns out that it can also be nullptr if the value is
entirely optimized out, for example on exit of
allocate_optimized_out_value. That function creates a lazy value, marks
the entire value as optimized out, and then clears the lazy flag. But
contents remains nullptr.
This wasn't a problem for value_copy before, because it was calling
value_contents_all_raw on the input value, which caused contents to be
allocated before doing the copy. This means that the input value to
value_copy did not have its contents allocated on entry, but had it
allocated on exit. The result value had it allocated on exit. And that
we copied bytes for an entirely optimized out value (i.e. meaningless
bytes).
From here I see two choices:
1. respect the documented invariant that contents is nullptr if and
only if the value is lazy, which means making
allocate_optimized_out_value allocate contents
2. extend the cases where contents can be nullptr to also include
values that are entirely optimized out (note that you could still
have some entirely optimized out values that do have contents
allocated, it depends on how they were created) and adjust
value_copy accordingly
Choice #1 is safe, but less efficient: it's not very useful to allocate
a buffer for an entirely optimized out value. It's even a bit less
efficient than what we had initially, because values coming out of
allocate_optimized_out_value would now always get their contents
allocated.
Choice #2 would be more efficient than what we had before: giving an
optimized out value without allocated contents to value_copy would
result in an optimized out value without allocated contents (and the
input value would still be without allocated contents on exit). But
it's more risky, since it's difficult to ensure that all users of the
contents (through the various value_contents* accessors) are all fine with
that new invariant.
In this patch, I opt for choice #2, since I think it is a better
direction than choice #1. #1 would be a pessimization, and if we go
this way, I doubt that it will ever be revisited, it will just stay that
way forever.
Add a selftest to test this. I initially started to write it as a
Python test (since the reproducer is in Python), but a selftest is more
straightforward.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28980
Change-Id: I6e2f5c0ea804fafa041fcc4345d47064b5900ed7
The test gdb.python/py-format-address.exp, added in commit:
commit 25209e2c69
Date: Sat Oct 23 09:59:25 2021 +0100
gdb/python: add gdb.format_address function
included 3 copy & paste errors where the wrong address was used in the
expected output patterns.
The test compiles two almost identical test binaries (one function
changes its name, that's the only difference); two inferiors are
created, each using one of the test binaries.
We then take the address of the name changing function in both
inferiors ('foo' in inferior 1 and 'bar' in inferior 2) and the tests
are carried out using these addresses.
What we're checking for is that symbols 'foo' and 'bar' show up in the
correct inferior, and that (as this test is for a Python API feature),
the user can have one inferior selected, but ask about the other
inferior, and see the correct symbol in the result.
The hope is that the two binaries will be laid out identically by the
compiler, and that 'foo' and 'bar' will be at the same address. This
is fine, unless the executable is compiled as PIE (position
independent executable), in which case there is a problem.
The problem is that though inferior 1 is set running, inferior 2
never is. If the executables are compiled as PIE, then the address in
inferior 2 will not have been resolved, while the address in
inferior 1 will have been, and so the two addresses we use in the
tests will be different.
This issue was reported here:
https://sourceware.org/pipermail/gdb-patches/2022-March/186911.html
The first part of the fix is to use the correct address variable in
the expected output patterns, with this change the tests pass even
when the executables are compiled as PIE.
A second part of this fix is to pass the 'nopie' option when we
compile the tests, this should ensure that the address obtained in
inferior 2 is the same as the address from inferior 1, which makes the
test more useful.
This test was added without a corresponding fix, with some setup_kfails.
However, it results in UNRESOLVED results when GDB is built with ASan.
ERROR: GDB process no longer exists
GDB process exited with wait status 1946871 exp7 0 1
UNRESOLVED: gdb.python/pretty-print-call-by-hand.exp: frame print: backtrace test (PRMS gdb/28856)
Remove the test from the tree, I'll attach it to the Bugzilla bug
instead [1].
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28856
Change-Id: Id95d8949fb8742874bd12aeac758aa4d7564d678
New in this version:
- Add a PY_MAJOR_VERSION check in configure.ac / AC_TRY_LIBPYTHON. If
the user passes --with-python=python2, this will cause a configure
failure saying that GDB only supports Python 3.
Support for Python 2 is a maintenance burden for any patches touching
Python support. Among others, the differences between Python 2 and 3
string and integer types are subtle. It requires a lot of effort and
thinking to get something that behaves correctly on both. And that's if
the author and reviewer of the patch even remember to test with Python
2.
See this thread for an example:
https://sourceware.org/pipermail/gdb-patches/2021-December/184260.html
So, remove Python 2 support. Update the documentation to state that GDB
can be built against Python 3 (as opposed to Python 2 or 3).
Update all the spots that use:
- sys.version_info
- IS_PY3K
- PY_MAJOR_VERSION
- gdb_py_is_py3k
... to only keep the Python 3 portions and drop the use of some
now-removed compatibility macros.
I did not update the configure script more than just removing the
explicit references to Python 2. We could maybe do more there, like
check the Python version and reject it if that version is not
supported. Otherwise (with this patch), things will only fail at
compile time, so it won't really be clear to the user that they are
trying to use an unsupported Python version. But I'm a bit lost in the
configure code that checks for Python, so I kept that for later.
Change-Id: I75b0f79c148afbe3c07ac664cfa9cade052c0c62
Add a new function, gdb.format_address, which is a wrapper around
GDB's print_address function.
This function takes an address, and returns a string with the format:
ADDRESS <SYMBOL+OFFSET>
Where ADDRESS is the original address, formatted as hexadecimal,
SYMBOL is a symbol with an address lower than ADDRESS, and OFFSET is
the offset from SYMBOL to ADDRESS in decimal.
If there's no SYMBOL suitably close to ADDRESS then the
<SYMBOL+OFFSET> part is not included.
This is useful if a user wants to write a Python script that
pretty-prints addresses; the user no longer needs to do manual symbol
lookup, or worry about correctly formatting addresses.
Additionally, there are some settings that affect how GDB picks
SYMBOL, and whether the file name and line number should be included
with the SYMBOL name; the gdb.format_address function ensures that the
user's Python script also benefits from these settings.
The gdb.format_address function by default selects SYMBOL from the
current inferior's program space, and the address is formatted using
the architecture of the current inferior. However, in order to format
an address for a different inferior, a user can also explicitly pass a
program space and architecture like this:
gdb.format_address(ADDRESS, PROGRAM_SPACE, ARCHITECTURE)
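For example (a hedged sketch; the second inferior, and the use of
Inferior.progspace and Inferior.architecture() to fetch the arguments,
are assumptions for illustration):
addr = int(gdb.parse_and_eval("&main"))
print(gdb.format_address(addr))
# e.g. "0x401136 <main>"

# Format the same address as another inferior would see it:
other = gdb.inferiors()[1]
print(gdb.format_address(addr, other.progspace, other.architecture()))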
Notes on the implementation:
In py-arch.c I extended arch_object_to_gdbarch to add an assertion for
the type of the PyObject being worked on. Prior to this commit all
uses of arch_object_to_gdbarch were guaranteed to pass this function a
gdb.Architecture object, but, with this commit, this might not be the
case.
So, with this commit I've made it a requirement that the PyObject be a
gdb.Architecture, and this is checked with the assert. And in order
that callers from other files can check if they have a
gdb.Architecture object, I've added the new function
gdbpy_is_architecture.
In py-progspace.c I've added two new functions: the first,
progspace_object_to_program_space, converts a PyObject of type
gdb.Progspace to the associated program_space pointer, and the second,
gdbpy_is_progspace, checks if a PyObject is a gdb.Progspace or not.
The test case added here is testing the bug gdb/28856, where calling a
function by hand from a pretty printer makes GDB crash. There are 6
mechanisms to trigger this crash in the current test, using the commands
backtrace, up, down, finish, step and continue. Since the failure happens
because of use-after-free (more details below) the tests will always
have a chance of passing through sheer luck, but anecdotally they seem
to fail all of the time.
The reason GDB is crashing is a use-after-free problem. The above
mentioned functions save a pointer to the current frame's information,
then call the pretty printer, and use the saved pointer for different
reasons, depending on the function.
call_function_by_hand needs to reset the obstack to get the current
frame, invalidating the saved pointer.
The motivation for this patch is the fact that py-micmd.c doesn't build
with Python 2, due to PyDict_GetItemWithError being a Python 3-only
function:
CXX python/py-micmd.o
/home/smarchi/src/binutils-gdb/gdb/python/py-micmd.c: In function ‘int micmdpy_uninstall_command(micmdpy_object*)’:
/home/smarchi/src/binutils-gdb/gdb/python/py-micmd.c:430:20: error: ‘PyDict_GetItemWithError’ was not declared in this scope; did you mean ‘PyDict_GetItemString’?
430 | PyObject *curr = PyDict_GetItemWithError (mi_cmd_dict.get (),
| ^~~~~~~~~~~~~~~~~~~~~~~
| PyDict_GetItemString
A first solution to fix this would be to try to replace
PyDict_GetItemWithError with equivalent Python 2 code.  But I looked at why
we are doing this in the first place: it is to maintain the
`gdb._mi_commands` Python dictionary that we use as a `name ->
gdb.MICommand object` map. Since the `gdb._mi_commands` dictionary is
never actually used in Python, it seems like a lot of trouble to use a
Python object for this.
My first idea was to replace it with a C++ map
(std::unordered_map<std::string, gdbpy_ref<micmdpy_object>>). While
implementing this, I realized we don't really need this map at all. The
mi_command_py objects registered in the main MI command table can own
their backing micmdpy_object (that's a gdb.MICommand, but seen from the
C++ code). To know whether an mi_command is an mi_command_py, we can
use a dynamic cast. Since there's one less data structure to maintain,
there are less chances of messing things up.
- Change mi_command_py::m_pyobj to a gdbpy_ref, the mi_command_py is
now what keeps the MICommand alive.
- Set micmdpy_object::mi_command in the constructor of mi_command_py.
If mi_command_py manages setting/clearing that field in
swap_python_object, I think it makes sense that it also takes care of
setting it initially.
- Move a bunch of checks from micmdpy_install_command to
swap_python_object, and make them gdb_asserts.
- In micmdpy_install_command, start by doing an mi_cmd_lookup. This is
needed to know whether there's a Python MI command already registered
with that name. But we can already tell if there's a non-Python
command registered with that name. Return an error if that happens,
rather than waiting for insert_mi_cmd_entry to fail. Change the
error message to "name is already in use" rather than "may already be
in use", since it's more precise.
I asked Andrew about the original intent of using a Python dictionary
object to hold the command objects. The reason was to make sure the
objects get destroyed when the Python runtime gets finalized, not later.
Holding the objects in global C++ data structures and not doing anything
more means that the held Python objects will be decref'd after the
Python interpreter has been finalized. That's not desirable. I tried
it and it indeed segfaults.
Handle this by adding a gdbpy_finalize_micommands function called in
finalize_python. This is the mirror of gdbpy_initialize_micommands
called in do_start_initialization. In there, delete all Python MI
commands. I think it makes sense to do it this way: if it was somehow
possible to unload Python support from GDB in the middle of a session
we'd want to unregister any Python MI command. Otherwise, these MI
commands would be backed with a stale PyObject or simply nothing.
Delete tests that were related to `gdb._mi_commands`.
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I060d5ebc7a096c67487998a8a4ca1e8e56f12cd3
This commit allows a user to create custom MI commands using Python
similarly to what is possible for Python CLI commands.
A new subclass of mi_command is defined for Python MI commands,
mi_command_py. A new file, gdb/python/py-micmd.c contains the logic
for Python MI commands.
This commit is based on work linked too from this mailing list thread:
https://sourceware.org/pipermail/gdb/2021-November/049774.html
Which has also been previously posted to the mailing list here:
https://sourceware.org/pipermail/gdb-patches/2019-May/158010.html
And was recently reposted here:
https://sourceware.org/pipermail/gdb-patches/2022-January/185190.html
The version in this patch takes some core code from the previously
posted patches, but also has some significant differences, especially
after the feedback given here:
https://sourceware.org/pipermail/gdb-patches/2022-February/185767.html
A new MI command can be implemented in Python like this:
class echo_args(gdb.MICommand):
    def invoke(self, args):
        return { 'args': args }

echo_args("-echo-args")
The 'args' parameter (to the invoke method) is a list
containing (almost) all command line arguments passed to the MI
command (--thread and --frame are handled before the Python code is
called, and removed from the args list). This list can be empty if
the MI command was passed no arguments.
When used within gdb the above command produced output like this:
(gdb)
-echo-args a b c
^done,args=["a","b","c"]
(gdb)
The 'invoke' method of the new command must return a dictionary. The
keys of this dictionary are then used as the field names in the mi
command output (e.g. 'args' in the above).
The values of the result returned by invoke can be dictionaries,
lists, iterators, or an object that can be converted to a string.
These are processed recursively to create the mi output. And so, this
is valid:
class new_command(gdb.MICommand):
    def invoke(self, args):
        return { 'result_one': { 'abc': 123, 'def': 'Hello' },
                 'result_two': [ { 'a': 1, 'b': 2 },
                                 { 'c': 3, 'd': 4 } ] }
Which produces output like:
(gdb)
-new-command
^done,result_one={abc="123",def="Hello"},result_two=[{a="1",b="2"},{c="3",d="4"}]
(gdb)
I have required that the field names used in mi result output must
match the regexp: "^[a-zA-Z][-_a-zA-Z0-9]*$" (without the quotes).
This restriction was never written down anywhere before, but seems
sensible to me, and we can always loosen this rule later if it proves
to be a problem. Much harder to try and add a restriction later, once
people are already using the API.
What follows are some details about how this implementation differs
from the original patch that was posted to the mailing list.
In this patch, I have changed how the lifetime of the Python
gdb.MICommand objects is managed. In the original patch, these objects
were kept alive by an owned reference within the mi_command_py object.
As such, the Python object would not be deleted until the
mi_command_py object itself was deleted.
This caused a problem: the mi_command_py objects were held in the global mi
command table (in mi/mi-cmds.c), which, as a global, was not cleared
until program shutdown. By this point the Python interpreter has
already been shut down. Attempting to delete the mi_command_py object
at this point was causing GDB to try and invoke Python code after
finalising the Python interpreter, and we would crash.
To work around this problem, the original patch added code in
python/python.c that would search the mi command table, and delete the
mi_command_py objects before the Python environment was finalised.
In contrast, in this patch, I have added a new global dictionary to
the gdb module, gdb._mi_commands. We already have several such global
data stores related to pretty printers, and frame unwinders.
The MICommand objects are placed into the new gdb._mi_commands
dictionary, and it is this reference that keeps the objects alive.
When GDB's Python interpreter is shut down gdb._mi_commands is deleted,
and any MICommand objects within it are deleted at this point.
This change avoids having to make the mi_cmd_table global, and walk
over it from within GDB's python related code.
This patch handles command redefinition entirely within GDB's python
code, though this does impose one small restriction which is not
present in the original code (detailed below), I don't think this is a
big issue. However, the original patch relied on being able to
finish executing the mi_command::do_invoke member function after the
mi_command object had been deleted. Though continuing to execute a
member function after an object is deleted is well defined, it is
also (IMHO) risky; it's too easy for someone to later add a use of the
object without realising that the object might sometimes have been
deleted. The new patch avoids this issue.
The one restriction that is added to avoid this, is that an MICommand
object can't be reinitialised with a different command name, so:
(gdb) python cmd = MyMICommand("-abc")
(gdb) python cmd.__init__("-def")
can't reinitialize object with a different command name
This feels like a pretty weird edge case, and I'm happy to live with
this restriction.
I have also changed how the memory is managed for the command name.
In the most recently posted patch series, the command name is moved
into a subclass of mi_command, the Python mi_command_py, which
inherits from mi_command and is then free to use a smart pointer to
manage the memory for the name.
In this patch, I leave the mi_command class unchanged, and instead
hold the memory for the name within the Python object, as the lifetime
of the Python object always exceeds the c++ object stored in the
mi_cmd_table. This adds a little more complexity in py-micmd.c, but
leaves the mi_command class nice and simple.
Next, this patch adds some extra functionality: there's a
MICommand.name read-only attribute containing the name of the command,
and a read-write MICommand.installed attribute that can be used to
install (make the command available for use) and uninstall (remove the
command from the mi_cmd_table so it can't be used) the command. This
attribute will be automatically updated if a second command replaces
an earlier command.
This patch adds additional error handling, and makes more use of the
gdbpy_handle_exception function.
Co-Authored-By: Jan Vrany <jan.vrany@labware.com>
There's an interesting property of the 'char' type in C and C++: the
three types 'char', 'unsigned char', and 'signed char' are all
considered distinct.
In contrast, 'int' is signed by default, and so 'int' and 'signed
int' are considered the same type.
This commit adds a test to ensure that this edge case is visible to a
user from Python.
It is worth noting that for any particular compiler implementation (or
the flags a compiler was invoked with), a 'char' will be either signed
or unsigned; it has to be one or the other, and a user can access this
information by using the Type.is_signed property. However, for
something like function overload resolution, the 'char' type is
considered distinct from the signed and unsigned variants.
There's no change to GDB with this commit, this is just adding a new
test to guard some existing functionality.
Add a new read-only property, Type.is_signed, which is True for signed
types, and False otherwise.
This property should only be read on types for which Type.is_scalar is
true, attempting to read this property for non-scalar types will raise
a ValueError.
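For example:
gdb.lookup_type("int").is_signed            # True
gdb.lookup_type("unsigned int").is_signed   # False
gdb.lookup_type("void").is_signed           # raises ValueError (not scalar)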
I chose 'is_signed' rather than 'is_unsigned' in order to match the
existing Architecture.integer_type method, which takes a 'signed'
parameter. As far as I could find, that was the only existing
signed/unsigned selector in the Python API, so it seemed reasonable to
stay consistent.
This adds a new read-only attribute gdb.InferiorThread.details, this
attribute contains a string, the results of target_extra_thread_info
for the thread, or None, if target_extra_thread_info returns nullptr.
As the string returned by target_extra_thread_info is unstructured,
this attribute is only really useful for echoing straight through to
the user, but, if a user wants to write a command that displays the
same, or a similar 'Thread Id' to the one seen in 'info threads', then
they need access to this string.
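For example, to echo it straight through:
details = gdb.selected_thread().details
if details is not None:
    print("Extra thread info: %s" % details)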
Given that the string produced by target_extra_thread_info varies by
target, there's only minimal testing of this attribute: I check that
the attribute can be accessed, and that the return value is either
None, or a string.
Add a new argument to the gdb.Value.format_string method, 'styling'.
This argument is False by default.
When this argument is True, then the returned string can contain output
styling escape sequences.
When this argument is False, then the returned string will not contain
any styling escape sequences.
If the returned string is going to be printed to the user, then it is
often nice to retain the GDB styling.
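A small sketch:
v = gdb.parse_and_eval("main")
plain = v.format_string()                 # never contains escape sequences
styled = v.format_string(styling=True)    # may contain escape sequences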
For the testing, we need to adjust the TERM environment variable, as
we do for all the styling tests. I'm now running all of the C tests
in gdb.python/py-format-string.exp in an environment where styling
could be generated, but only my new test should actually produce
styled output; hopefully this will catch the case where a bug might
cause format_string to always produce styled output.
This commit adds support for source files that contain non utf-8
characters when performing source styling using the Python pygments
package. This does not change the behaviour of GDB when the GNU
Source Highlight library is used.
For the following problem description, assume that either GDB is built
without GNU Source Highlight support, or that it has been disabled
using 'maintenance set gnu-source-highlight enabled off'.
The initial problem reported was that a source file containing non
utf-8 characters would cause GDB to print a Python exception, and then
display the source without styling, e.g.:
Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xc0 in position 142: invalid start byte
/* Source code here, without styling... */
Further, as the user steps through different source files, each time
the problematic source file was evicted from the source cache, and
then later reloaded, the exception would be printed again.
Finally, this problem is only present when using Python 3; this issue
is not present for Python 2.
What makes this especially frustrating is that GDB can clearly print
the source file contents, they're right there... If we disable
styling completely, or make use of the GNU Source Highlight library,
then everything is fine. So why is there an error when we try to
apply styling using Python?
The problem is the use of PyString_FromString (which is an alias for
PyUnicode_FromString in Python 3); this function converts a C string
into either a Unicode object (Py3) or a str object (Py2). For
Python 2 there is no unicode encoding performed during this function
call, but for Python 3 the input is assumed to be a utf-8 encoded
string for the purpose of the conversion. And here, of course, is the
problem: if the source file contains non utf-8 characters, then it
should not be treated as utf-8, but that's what we do, and that's why
we get an error.
My first thought when looking at this was to spot when the
PyString_FromString call failed with a UnicodeDecodeError and silently
ignore the error. This would mean that GDB would print the source
without styling, but would also avoid the annoying exception message.
However, I also make use of `pygmentize`, a command line wrapper
around the Python pygments module, which I use to apply syntax
highlighting in the output of `less`. And this command line wrapper
is quite happy to syntax highlight my source file that contains non
utf-8 characters, so it feels like the problem should be solvable.
It turns out that inside the pygments module there is already support
for guessing the encoding of the incoming file content, if the
incoming content is not already a Unicode string. This is what
happens for Python 2 where the incoming content is of `str` type.
We could try and make GDB smarter when it comes to converting C
strings into Python Unicode objects; this would probably require us to
just try a couple of different encoding schemes rather than just
giving up after utf-8.
However, I figure, why bother? The pygments module already does this
for us, and the colorize API is not part of the documented external
API of GDB. So, why not just change the colorize API: instead of the
content being a Unicode string (for Python 3), let's just make the
content be a bytes object. The pygments module can then take
responsibility for guessing the encoding.
So, currently, the colorize API receives a unicode object, and returns
a unicode object. I propose that the colorize API receive a bytes
object, and return a bytes object.
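For illustration only, a pygments-based hook under this proposed
bytes-in/bytes-out API might look roughly like the following; the hook
name, its exact signature, and the output encoding are assumptions
here, not the committed code:
from pygments import formatters, highlight, lexers

def colorize(filename, contents):
    # 'contents' is a bytes object; given bytes rather than a str,
    # pygments takes responsibility for guessing the encoding.
    try:
        lexer = lexers.get_lexer_for_filename(filename, stripnl=False)
        formatter = formatters.TerminalFormatter()
        # highlight() returns a str; hand bytes back to GDB.
        return highlight(contents, lexer, formatter).encode(
            "utf-8", "backslashreplace")
    except Exception:
        return None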
This commit attempts to improve the help text that is generated for
gdb.Parameter objects when the user fails to provide their own
documentation.
Documentation for a gdb.Parameter is currently pulled from two
sources: the class documentation string, and the set_doc/show_doc
class attributes. Thus, a fully documented parameter might look like
this:
class Param_All (gdb.Parameter):
    """This is the class documentation string."""

    show_doc = "Show the state of this parameter"
    set_doc = "Set the state of this parameter"

    def get_set_string (self):
        val = "on"
        if (self.value == False):
            val = "off"
        return "Test Parameter has been set to " + val

    def __init__ (self, name):
        super (Param_All, self).__init__ (name, gdb.COMMAND_DATA, gdb.PARAM_BOOLEAN)
        self._value = True

Param_All ('param-all')
Then in GDB we see this:
(gdb) help set param-all
Set the state of this parameter
This is the class documentation string.
Which is fine. But, if the user skips both of the documentation parts
like this:
class Param_None (gdb.Parameter):
    def get_set_string (self):
        val = "on"
        if (self.value == False):
            val = "off"
        return "Test Parameter has been set to " + val

    def __init__ (self, name):
        super (Param_None, self).__init__ (name, gdb.COMMAND_DATA, gdb.PARAM_BOOLEAN)
        self._value = True

Param_None ('param-none')
Now in GDB we see this:
(gdb) help set param-none
This command is not documented.
This command is not documented.
That's not great, the duplicated text looks a bit weird. If we drop
different parts we get different results. Here's what we get if the
user drops the set_doc and show_doc attributes:
(gdb) help set param-doc
This command is not documented.
This is the class documentation string.
That kind of sucks, we say it's undocumented, then proceed to print
the documentation. Finally, if we drop the class documentation but
keep the set_doc and show_doc:
(gdb) help set param-set-show
Set the state of this parameter
This command is not documented.
That seems OK.
So, I think there's room for improvement.
With this patch, for the four cases above we now see this:
# All values provided by the user, no change in this case:
(gdb) help set param-all
Set the state of this parameter
This is the class documentation string.
# Nothing provided by the user, the first string is now different:
(gdb) help set param-none
Set the current value of 'param-none'.
This command is not documented.
# Only the class documentation is provided, the first string is
# changed as in the previous case:
(gdb) help set param-doc
Set the current value of 'param-doc'.
This is the class documentation string.
# Only the set_doc and show_doc are provided, this case is unchanged
# from before the patch:
(gdb) help set param-set-show
Set the state of this parameter
This command is not documented.
The one place where this change might be considered a negative is when
dealing with prefix commands. If we create a prefix command but don't
supply the set_doc / show_doc strings, then this is what we saw before
my patch:
(gdb) python Param_None ('print param-none')
(gdb) help set print
set print, set pr, set p
Generic command for setting how things print.
List of set print subcommands:
... snip ...
set print param-none -- This command is not documented.
... snip ...
And after my patch:
(gdb) python Param_None ('print param-none')
(gdb) help set print
set print, set pr, set p
Generic command for setting how things print.
List of set print subcommands:
... snip ...
set print param-none -- Set the current value of 'print param-none'.
... snip ...
This seems slightly less helpful than before, but I don't think it's
terrible.
Additionally, I've changed what we print when the get_show_string
method is not provided in Python.
Back when gdb.Parameter was first added to GDB, we didn't provide a
show function when registering the internal command object within
GDB. As a result, GDB would make use of its "magic" mangling of the
show_doc string to create a sentence that would display the current
value (see deprecated_show_value_hack in cli/cli-setshow.c).
However, when we added support for the get_show_string method to
gdb.Parameter, there was an attempt to maintain backward compatibility
by displaying the show_doc string with the current value appended, see
get_show_value in py-param.c. Unfortunately, this isn't anywhere
close to what deprecated_show_value_hack does, and the results are
pretty poor, for example, this is GDB before my patch:
(gdb) show param-none
This command is not documented. off
I think we can all agree that this is pretty bad.
After my patch, we now show this:
(gdb) show param-none
The current value of 'param-none' is "off".
Which at least is a real sentence, even if it's not very informative.
This patch does change the way that the Python API behaves slightly,
but only in the cases when the user has missed providing GDB with some
information. In most cases I think the new behaviour is a lot better,
there's the one case (noted above) which is a bit iffy, but I think is
still OK.
I've updated the existing gdb.python/py-parameter.exp test to cover
the modified behaviour.
Finally, I've updated the documentation to (I hope) make it clearer
how the various bits of help text come together.
Add a new function gdb.history_count to the Python API; this function
returns an integer, the number of items in GDB's value history.
This is useful if you want to pull items from the history by their
absolute number, for example, if you wanted to show a complete history
list. Previously we could figure out how many items are in the
history list by trying to fetch the items, and then catching the
exception when the item is not available, but having this function
seems nicer.
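For example, to walk the whole history by absolute number:
# Value history numbers start at 1 ($1, $2, ...).
for i in range(1, gdb.history_count() + 1):
    print("$%d = %s" % (i, gdb.history(i)))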
It's sometimes useful to temporarily set some gdb parameter from
Python. Now that the 'endian' crash is fixed, and now that the
current language is no longer captured by the Python layer, it seems
reasonable to add a helper function for this situation.
This adds a new gdb.with_parameter function. This creates a context
manager which temporarily sets some parameter to a specified value.
The old value is restored when the context is exited. This is most
useful with the Python "with" statement:
with gdb.with_parameter('language', 'ada'):
    ... do Ada stuff
This also adds a simple function to set a parameter,
gdb.set_parameter, as suggested by Andrew.
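The non-context-manager form is simply:
gdb.set_parameter('language', 'ada')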
This is PR python/10790.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=10790
While looking into the language-capturing issue, I found another way
to crash gdb using parameters from Python:
(gdb) python print(gdb.parameter('endian'))
(This is related to PR python/12188, though this patch isn't going to
fix what that bug is really about.)
The problem here is that the global variable that underlies the
"endian" parameter is initialized to NULL. However, that's not a
valid value for an "enum" set/show parameter.
My understanding is that, in gdb, an "enum" parameter's underlying
variable must have a value that is "==" (not just strcmp-equal) to one
of the values coming from the enum array. This invariant is relied on
in various places.
I started this patch by fixing the problem with "endian". Then I
added some assertions to add_setshow_enum_cmd to try to catch other
problems of the same type.
This patch fixes all the problems that I found. I also looked at all
the calls to add_setshow_enum_cmd to ensure that they were all
included in the gdb I tested. I think they are: there are no calls in
nat-* files, or in remote-sim.c; and I was trying a build with all
targets, Python, and Guile enabled.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=12188
Currently, gdb's Python layer captures the current architecture and
language when "entering" Python code. This has some undesirable
effects, and so this series changes how this is handled.
First, there is code like this:
gdbpy_enter enter_py (python_gdbarch, python_language);
This is incorrect, because both of these are NULL when not otherwise
assigned. This can cause crashes in some cases -- I've added one to
the test suite. (Note that this crasher is just an example, other
ones along the same lines are possible.)
Second, when the language is captured in this way, it means that
Python code cannot affect the current language for its own purposes.
It's reasonable to want to write code like this:
gdb.execute('set language mumble')
... stuff using the current language
gdb.execute('set language previous-value')
However, this won't actually work, because the language is captured on
entry. I've added a test to show this as well.
This patch changes gdb to try to avoid capturing the current values.
The Python concept of the current gdbarch is only set in those few
cases where a non-default value is computed or needed; and the
language is not captured at all -- instead, in the cases where it's
required, the current language is temporarily changed.
When executed with --target_board=native-extended-gdbserver, the
gdb.python/py-events.exp test errors out with
ERROR: tcl error sourcing /path/to/gdb/testsuite/gdb.python/py-events.exp.
ERROR: can't read "process_id": no such variable
while executing
"lappend expected "ptid: \\($process_id, $process_id, 0\\)" "address: $addr""
(file "/path/to/gdb/testsuite/gdb.python/py-events.exp" line 103)
invoked from within
"source /path/to/gdb/testsuite/gdb.python/py-events.exp"
("uplevel" body line 1)
invoked from within
"uplevel #0 source /path/to/gdb/testsuite/gdb.python/py-events.exp"
invoked from within
"catch "uplevel #0 source $test_file_name""
There are multiple problems around this:
1. The process_id variable is not initialized to a default value.
2. The test attempts to find the PID of the current thread, but the
regexp that it uses is not tailored for the output printed by the
remote target.
3. The test uses "info threads" to find the current thread PID.
Using the "thread" command instead is simpler.
Fix these problems.
Commit 72ee03ff58 fixed a use-after-move bug in add_thread_object, but
it changed the inferior_thread attribute to contain the inferior instead
of the actual thread.
This now uses the thread_obj in its new location instead.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28429
When running test-case gdb.threads/schedlock-thread-exit.exp on a system with
system compiler gcc 4.8.5, I run into:
...
src/gdb/testsuite/gdb.threads/schedlock-thread-exit.c:33:3: error: \
'for' loop initial declarations are only allowed in C99 mode
...
Fix this by:
- using -std=c99, or
- using -std=gnu99, in case that's required, or
- in the case of the jit test-cases, rewriting the for loops.
Tested on x86_64-linux, both with gcc 4.8.5 and gcc 7.5.0.
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commit were
performed by the script.
The documentation suggests that we implement gdb.Value.__init__,
however, this is not currently true, we really implement
gdb.Value.__new__. This will cause confusion if a user tries to
sub-class gdb.Value. They might write:
class MyVal (gdb.Value):
    def __init__ (self, val):
        gdb.Value.__init__(self, val)

obj = MyVal(123)
print ("Got: %s" % obj)
But, when they source this code they'll see:
(gdb) source ~/tmp/value-test.py
Traceback (most recent call last):
  File "/home/andrew/tmp/value-test.py", line 7, in <module>
    obj = MyVal(123)
  File "/home/andrew/tmp/value-test.py", line 5, in __init__
    gdb.Value.__init__(self, val)
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
(gdb)
The reason for this is that, as we don't implement __init__ for
gdb.Value, Python ends up calling object.__init__ instead, which
doesn't expect any arguments.
The Python docs suggest that the reason why we might take this
approach is because we want gdb.Value to be immutable:
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_new
But I don't see any reason why we should require gdb.Value to be
immutable when other types defined in GDB are not. This current
immutability can be seen in this code:
obj = gdb.Value(1234)
print("Got: %s" % obj)
obj.__init__ (5678)
print("Got: %s" % obj)
Which currently runs without error, but prints:
Got: 1234
Got: 1234
In this commit I propose that we switch to using __init__ to
initialize gdb.Value objects.
This does introduce some additional complexity: during the __init__
call a gdb.Value might already be associated with a gdb value object,
in which case we need to cleanly break that association before
installing the new gdb value object. However, the cost of doing this
is not great, and the benefit (being able to easily sub-class
gdb.Value) seems worth it.
After this commit the first example above works without error, while
the second example now prints:
Got: 1234
Got: 5678
In order to make it easier to override the gdb.Value.__init__ method,
I have tweaked the definition of gdb.Value.__init__. The second,
optional argument to __init__ is a gdb.Type, if this argument is not
present then GDB figures out a suitable type.
However, if we want to override the __init__ method in a sub-class,
and still support the default argument, it is easier to write:
class MyVal (gdb.Value):
    def __init__ (self, val, type=None):
        gdb.Value.__init__(self, val, type)
Currently, passing None for the Type will result in an error:
TypeError: type argument must be a gdb.Type.
After this commit I now allow the type argument to be None, in which
case GDB figures out a suitable type just as if the type had not been
passed at all.
Unless a user is trying to reinitialize a value, or create sub-classes
of gdb.Value, there should be no user visible changes after this
commit.
A test in gdb.python/py-send-packet.exp added in this commit:
commit 24b2de7b77
Date: Tue Aug 31 14:04:36 2021 +0100
gdb/python: add gdb.RemoteTargetConnection.send_packet
included a large amount of binary data in the command sent to GDB. As
this test didn't have a real test name the binary data was included in
the gdb.sum file. The contents of the binary data could change
between different runs of GDB, and this makes comparing results
harder.
This commit gives the test a real test name.
When running the gdb.python/py-arch.exp tests on a GDB built
against Python 2 I ran into some errors. The problem is that this
test script exercises the gdb.Architecture.integer_type method, and
this method uses 'p' as an argument format specifier in a call to
gdb_PyArg_ParseTupleAndKeywords.
Unfortunately this specifier was only added in Python 3.3, so will
cause an error for earlier versions of Python.
This commit switches to use the 'O' specifier to collect a PyObject,
and then uses PyObject_IsTrue to convert the object to a boolean.
An earlier version of this patch incorrectly switched from using 'p'
to use 'i', however, it was pointed out during review that this would
cause some changes in behaviour, for example both of these will work
with 'p', but not with 'i':
gdb.selected_inferior().architecture().integer_type(32, None)
gdb.selected_inferior().architecture().integer_type(32, "foo")
The new approach of using 'O' works fine with these cases. I've added
some new tests to cover both of the above.
There should be no user visible changes after this commit.