This implements support for the OpenMP 5.1 'present' modifier, which can be
used in map clauses in the 'target', 'target data', 'target data enter' and
'target data exit' constructs, and in the 'to' and 'from' clauses of the
'target update' construct. It is also supported in defaultmap.
The modifier triggers a fatal runtime error if the data specified by the
clause is not already present on the target device. It can also be combined
with 'always' in map clauses.
2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/c/
* c-parser.cc (c_parser_omp_clause_defaultmap,
c_parser_omp_clause_map): Parse 'present'.
(c_parser_omp_clause_to, c_parser_omp_clause_from): Remove.
(c_parser_omp_clause_from_to): New; parse to/from clauses with
optional present modifer.
(c_parser_omp_all_clauses): Update call.
(c_parser_omp_target_data, c_parser_omp_target_enter_data,
c_parser_omp_target_exit_data): Handle new map enum values
for 'present' mapping.
gcc/cp/
* parser.cc (cp_parser_omp_clause_defaultmap,
cp_parser_omp_clause_map): Parse 'present'.
(cp_parser_omp_clause_from_to): New; parse to/from
clauses with optional 'present' modifier.
(cp_parser_omp_all_clauses): Update call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data,
cp_parser_omp_target_exit_data): Handle new enum value for
'present' mapping.
* semantics.cc (finish_omp_target): Likewise.
gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Display 'present' map
modifier.
(show_omp_clauses): Display 'present' motion modifier for 'to'
and 'from' clauses.
* gfortran.h (enum gfc_omp_map_op): Add entries with 'present'
modifiers.
(struct gfc_omp_namelist): Add 'present_modifer'.
* openmp.cc (gfc_match_motion_var_list): New, handles optional
'present' modifier for to/from clauses.
(gfc_match_omp_clauses): Call it for to/from clauses; parse 'present'
in defaultmap and map clauses.
(resolve_omp_clauses): Allow 'present' modifiers on 'target',
'target data', 'target enter' and 'target exit' directives.
* trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers
to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for
defaultmap.
gcc/
* gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag
and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag
set.
(omp_get_attachment): Handle map clauses with 'present' modifier.
(omp_group_base): Likewise.
(gimplify_scan_omp_clauses): Reorder present maps to come first.
Set GOVD flags for present defaultmaps.
(gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps.
* omp-low.cc (scan_sharing_clauses): Handle 'always, present' map
clauses.
(lower_omp_target): Handle map clauses with 'present' modifier.
Handle 'to' and 'from' clauses with 'present'.
* tree-core.h (enum omp_clause_defaultmap_kind): Add
OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind.
* tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and
'from' clauses with 'present' modifier. Handle present defaultmap.
* tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New.
(GOMP_MAP_FLAG_FORCE): Redefine.
(GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New.
(enum gomp_map_kind): Add map kinds with 'present' modifiers.
(GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for
map variants with 'present'
(GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true
for map variants with 'always, present' modifiers.
(GOMP_MAP_ALWAYS): Redefine.
(GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New.
libgomp/
* libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for
defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses.
* target.c (gomp_to_device_kind_p): Add map kinds with 'present'
modifier.
(gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro.
(gomp_map_vars_internal, gomp_update, gomp_target_rev):
Emit runtime error if memory region not present.
* testsuite/libgomp.c-c++-common/target-present-1.c: New test.
* testsuite/libgomp.c-c++-common/target-present-2.c: New test.
* testsuite/libgomp.c-c++-common/target-present-3.c: New test.
* testsuite/libgomp.fortran/target-present-1.f90: New test.
* testsuite/libgomp.fortran/target-present-2.f90: New test.
* testsuite/libgomp.fortran/target-present-3.f90: New test.
gcc/testsuite/
* c-c++-common/gomp/map-6.c: Update dg-error, extend to test for
duplicated 'present' and extend scan-dump tests for 'present'.
* gfortran.dg/gomp/defaultmap-1.f90: Update dg-error.
* gfortran.dg/gomp/map-7.f90: Extend parse and dump test for
'present'.
* gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present'
modifier checking.
* c-c++-common/gomp/defaultmap-4.c: New test.
* c-c++-common/gomp/map-9.c: New test.
* c-c++-common/gomp/target-update-1.c: New test.
* gfortran.dg/gomp/defaultmap-8.f90: New test.
* gfortran.dg/gomp/map-11.f90: New test.
* gfortran.dg/gomp/map-12.f90: New test.
* gfortran.dg/gomp/target-update-1.f90: New test.
... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.
Follow-up to commit 131d18e928
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c8
"libgomp: Handle OpenMP's reverse offloads".
libgomp/
* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
* libgomp.h (gomp_target_rev): Adjust.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
(rev_off_host_to_dev_cpy): Remove.
(GOMP_OFFLOAD_run): Adjust.
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.
This depends on commit 2b2340e236
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.
PR libgomp/90596
libgomp/
* target.c (gomp_map_vars_internal): Allow for
'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'.
* oacc-parallel.c (GOACC_parallel_keyed): Pass
'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'.
* plugin/plugin-gcn.c (alloc_by_agent, gcn_exec)
(GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec):
Adjust, simplify.
(gomp_offload_free): Remove.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify.
(cuda_free_argmem): Remove.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Adjust.
This does *allow*, but under no circumstances is this currently going to be
used: all potentially applicable data is non-'ephemeral', and thus not
considered for 'gomp_coalesce_buf_add' for OpenACC 'async'. (But a use will
emerge later.)
Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe
"Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this
TODO comment:
TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question:
is it more performant to use libgomp CBUF buffering or individual device
asyncronous copying?)
Ephemeral data is small, and therefore individual device asyncronous copying
does seem dubious -- in particular given that for all those, we'd individually
have to allocate and queue for deallocation a temporary buffer to capture the
ephemeral data. Instead, just let the 'cbuf' *be* the temporary buffer.
libgomp/
* target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow
libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral'
data.
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then
simplify the device plugins accordingly.
This is a follow-up to
Subversion r279551 (Git commit a6163563f2)
"Add OpenACC 2.6's no_create",
Subversion r279622 (Git commit 5bcd470bf0)
"Use gomp_map_val for OpenACC host-to-device address translation".
libgomp/
* target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for
'GOMP_MAP_IF_PRESENT'.
* plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Likewise.
* testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async'
testing.
* testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
libgomp/
* target.c (gomp_target_rev): Dereference ptr
to get device address.
* testsuite/libgomp.fortran/reverse-offload-5.f90: Add test
for unallocated allocatable.
As GOMP_MAP_ALWAYS_POINTER operates on the previous map item, ensure that
with 'target enter data' both are passed together to gomp_map_vars_internal.
libgomp/ChangeLog:
* target.c (gomp_map_vars_internal): Add 'i > 0' before doing a
kind check.
(GOMP_target_enter_exit_data): If the next map item is
GOMP_MAP_ALWAYS_POINTER map it together with the current item.
* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
If there is nothing to map, skip the mapping and avoid attempting to
copy 0 bytes from addrs, sizes and kinds.
Additionally, it could happen that a non-allocated address was deallocated,
such as a pointer set, leading to a free for the actual data.
libgomp/
* target.c (gomp_target_rev): Handle mapnum == 0 and avoid
freeing not allocated memory.
* testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.
The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.
For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
libgomp/ChangeLog:
* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.
Additionally, a limitation of the number of teams on gcn offload devices is
implemented. The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit). This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory. Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).
gcc/ChangeLog:
* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.
include/ChangeLog:
* cuda/cuda.h (enum CUdevice_attribute): Add
CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
(CU_MEMHOSTALLOC_DEVICEMAP): Define.
(cuMemHostAlloc): Add prototype.
libgomp/ChangeLog:
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
'static' for this variable.
* config/nvptx/libgomp-nvptx.h: New file.
* config/nvptx/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare extern var.
(GOMP_REV_OFFLOAD_VAR): Declare var.
(GOMP_target_ext): Handle reverse offload.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
* target.c (gomp_target_rev): ... this new stub function.
* libgomp.h (gomp_target_rev): Declare.
* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
* plugin/cuda-lib.def (cuMemHostAlloc): Add.
* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
(struct ptx_device): Add rev_data member.
(nvptx_open_device): Remove async_engines query, last used in
r10-304-g1f4c5b9b; add unified-address assert check.
(GOMP_OFFLOAD_get_num_devices): Claim unified address
support.
(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
offload functions exist. Make offload var available
on host and device.
(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
(GOMP_OFFLOAD_run): Handle reverse offload.
When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes). In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s). The code would destroy/free the previous
task and replace it with the new implicit task. If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.
The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
'uint64_t **rev_fn_table' argument.
* oacc-host.c (host_load_image): Likewise.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
currently unused.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
* target.c (gomp_load_image_to_device): Update call but pass
NULL for now.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
Add (unused) uint64_t **rev_fn_table argument.
This patch considers the environment variable syntax extension for
device-specific variants of environment variables from OpenMP 5.1 (see
OpenMP 5.1 specification, p. 75 and p. 639). An environment variable (e.g.
OMP_NUM_TEAMS) can have different suffixes:
_DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host.
_DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with
number <device>.
no suffix (e.g. OMP_NUM_TEAMS): affects only the host.
In future OpenMP versions also suffix _ALL will be introduced (see discussion
https://github.com/OpenMP/spec/issues/3179). This is also considered in this
patch:
_ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host.
The precedence is as follows (descending). For the host:
1. no suffix
2. _ALL
For devices:
1. _DEV_<device>
2. _DEV
3. _ALL
That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if
available, and at last _ALL. If there is no value for any of the variable
variants, default values are used as already implemented before.
This patch concerns parsing (a), storing (b), output (c) and transmission to the
device (d):
(a) The actual number of devices and the numbering are not known when parsing
the environment variables. Thus all environment variables are iterated and
searched for device-specific ones.
(b) Only configured device-specific variables are stored. Thus, a linked list
is used.
(c) The output is done in omp_display_env (see specification p. 468f). Global
ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179.
ICVs which are not global but aren't handled device-specific yet are tagged
with [host]. omp_display_env outputs the initial values of the ICVs. That is
why a dedicated data structure is introduced for the inital values only
(gomp_initial_icv_list).
(d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_default_device): Return device-
specific ICV.
(omp_get_max_teams): Added for GCN devices.
(omp_set_num_teams): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_default_device): Return device-
specific ICV.
(omp_get_max_teams): Added for NVPTX devices.
(omp_set_num_teams): Likewise.
(ialias): Likewise.
* env.c (struct gomp_icv_list): New struct to store entries of initial
ICV values.
(struct gomp_offload_icv_list): New struct to store entries of device-
specific ICV values that are copied to the device and back.
(struct gomp_default_icv_values): New struct to store default values of
ICVs according to the OpenMP standard.
(parse_schedule): Generalized for different variants of OMP_SCHEDULE.
(print_env_var_error): Function that prints an error for invalid values
for ICVs.
(parse_unsigned_long_1): Removed getenv. Generalized.
(parse_unsigned_long): Likewise.
(parse_int_1): Likewise.
(parse_int): Likewise.
(parse_int_secure): Likewise.
(parse_unsigned_long_list): Likewise.
(parse_target_offload): Likewise.
(parse_bind_var): Likewise.
(parse_stacksize): Likewise.
(parse_boolean): Likewise.
(parse_wait_policy): Likewise.
(parse_allocator): Likewise.
(omp_display_env): Extended to output different variants of environment
variables.
(print_schedule): New helper function for omp_display_env which prints
the values of run_sched_var.
(print_proc_bind): New helper function for omp_display_env which prints
the values of proc_bind_var.
(enum gomp_parse_type): Collection of types used for parsing environment
variables.
(ENTRY): Preprocess string lengths of environment variables.
(OMP_VAR_CNT): Preprocess table size.
(OMP_HOST_VAR_CNT): Likewise.
(INT_MAX_STR_LEN): Constant for the maximal number of digits of a device
number.
(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
(gomp_set_icv_flag): Sets a flag for a particular ICV.
(print_device_specific_icvs): New helper function for omp_display_env to
print device specific ICV values.
(get_device_num): New helper function for parse_device_specific.
Extracts the device number from an environment variable name.
(get_icv_member_addr): Gets the memory address for a particular member
of an ICV struct.
(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
(initialize_icvs): New function to initialize a gomp_initial_icvs
struct.
(add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list.
(startswith): Checks if a string starts with a given prefix.
(initialize_env): Extended to parse the new syntax of environment
variables.
* icv-device.c (omp_get_max_teams): Added.
(ialias): Likewise.
(omp_set_num_teams): Likewise.
* icv.c (omp_set_num_teams): Moved to icv-device.c.
(omp_get_max_teams): Likewise.
(ialias): Likewise.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed.
(GOMP_ADDITIONAL_ICVS): New target-side struct that
holds the designated ICVs of the target device.
* libgomp.h (enum gomp_icvs): Collection of ICVs.
(enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and
no suffix.
(enum gomp_env_suffix): Collection of possible suffixes of environment
variables.
(struct gomp_initial_icvs): Contains all ICVs for which we need to store
initial values.
(struct gomp_default_icv):New struct to hold ICVs for which we need
to store initial values.
(struct gomp_icv_list): Definition of a linked list that is used for
storing ICVs for the devices and also for _DEV, _ALL, and without
suffix.
(struct gomp_offload_icvs): New struct to hold ICVs that are copied to
a device.
(struct gomp_offload_icv_list): Definition of a linked list that holds
device-specific ICVs that are copied to devices.
(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
* libgomp.texi: Updated.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read
further ICVs from the offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
* target.c (gomp_get_offload_icv_item): Get a list item of
gomp_offload_icv_list.
(get_gomp_offload_icvs): New. Returns the ICV values
depending on the device num and the variable hierarchy.
(gomp_load_image_to_device): Extended to copy further ICVs to a device.
* testsuite/libgomp.c-c++-common/icv-5.c: New test.
* testsuite/libgomp.c-c++-common/icv-6.c: New test.
* testsuite/libgomp.c-c++-common/icv-7.c: New test.
* testsuite/libgomp.c-c++-common/icv-8.c: New test.
* testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test.
* testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
Contrary to gomp_{error,warning,fatal}, no tailing '\n' is added with
gomp_debug; only affected was a 'requires'-related output.
libgomp/ChangeLog:
* target.c (gomp_target_init): Added tailing '\n' to gomp_debug.
Similar to how the other 'mkoffload's got changed in
recent commit 683f118439
"OpenMP: Move omp requires checks to libgomp".
This also means finally switching Intel MIC 'mkoffload' to
'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver',
making 'GOMP_offload_register', 'GOMP_offload_unregister'
legacy entry points.
gcc/
* config/i386/intelmic-mkoffload.cc (generate_host_descr_file)
(prepare_target_image, main): Handle OpenMP 'requires'.
(generate_host_descr_file): Switch to 'GOMP_offload_register_ver',
'GOMP_offload_unregister_ver'.
libgomp/
* target.c (GOMP_offload_register, GOMP_offload_unregister):
Denote as legacy entry points.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_any): New proc.
* testsuite/libgomp.c-c++-common/requires-1.c: Enable for
'offload_target_any'.
* testsuite/libgomp.c-c++-common/requires-3.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
* testsuite/libgomp.fortran/requires-1.f90: Likewise.
Handle reverse_offload, unified_address, and unified_shared_memory
requirements in libgomp by saving them alongside the offload table.
When the device lto1 runs, it extracts the data for mkoffload. The
latter than passes the value on to GOMP_offload_register_ver.
lto1 (either the host one, with -flto [+ ENABLE_OFFLOADING], or in the
offload-device lto1) also does the the consistency check is done,
erroring out when the 'omp requires' clause use is inconsistent.
For all in-principle supported devices, if a requirement cannot be fulfilled,
the device is excluded from the (supported) devices list. Currently, none of
those requirements are marked as supported for any of the non-host devices.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Set
OMP_REQUIRES_TARGET_USED.
(c_parser_omp_requires): Remove sorry.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (process_asm): Write '#include <stdint.h>'.
(process_obj): Pass omp_requires_mask to GOMP_offload_register_ver.
(main): Ask lto1 to obtain omp_requires_mask and pass it on.
* config/nvptx/mkoffload.cc (process, main): Likewise.
* lto-cgraph.cc (omp_requires_to_name): New.
(input_offload_tables): Save omp_requires_mask.
(output_offload_tables): Read it, check for consistency,
save value for mkoffload.
* omp-low.cc (lower_omp_target): Force output_offloadtables
call for OMP_REQUIRES_TARGET_USED.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_target_data,
cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
cp_parser_omp_target_update): Set OMP_REQUIRES_TARGET_USED.
(cp_parser_omp_requires): Remove sorry.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_requires): Remove sorry.
* parse.cc (decode_omp_directive): Don't regard 'declare target'
as target usage for 'omp requires'; add more flags to
omp_requires_mask.
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION): Bump to 2.
(GOMP_REQUIRES_UNIFIED_ADDRESS, GOMP_REQUIRES_UNIFIED_SHARED_MEMORY,
GOMP_REQUIRES_REVERSE_OFFLOAD, GOMP_REQUIRES_TARGET_USED):
New defines.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add
omp_requires_mask arg.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise;
return -1 when device available but omp_requires_mask != 0.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
* oacc-host.c (host_get_num_devices, host_openacc_get_property):
Update call.
* oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1,
goacc_attach_host_thread_to_device, acc_get_num_devices,
acc_set_device_num, get_property_any): Likewise.
* target.c (omp_requires_mask): New global var.
(gomp_requires_to_name): New.
(GOMP_offload_register_ver): Handle passed omp_requires_mask.
(gomp_target_init): Handle omp_requires_mask.
* libgomp.texi (OpenMP 5.0): Update requires impl. status.
(OpenMP 5.1): Add a missed item.
(OpenMP 5.2): Mark linear-clause change as supported in C/C++.
* testsuite/libgomp.c-c++-common/requires-1-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-1.c: New test.
* testsuite/libgomp.c-c++-common/requires-2-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-2.c: New test.
* testsuite/libgomp.c-c++-common/requires-3-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-3.c: New test.
* testsuite/libgomp.c-c++-common/requires-4-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-4.c: New test.
* testsuite/libgomp.c-c++-common/requires-5-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-5.c: New test.
* testsuite/libgomp.c-c++-common/requires-6.c: New test.
* testsuite/libgomp.c-c++-common/requires-7-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-7.c: New test.
* testsuite/libgomp.fortran/requires-1-aux.f90: New test.
* testsuite/libgomp.fortran/requires-1.f90: New test.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices):
Return -1 when device available but omp_requires_mask != 0.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/requires-4.c: Update dg-*.
* c-c++-common/gomp/reverse-offload-1.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-2.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-5.c: Likewise.
* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move
post-FE checks to ...
* gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file.
* gfortran.dg/gomp/requires-8.f90: Update as we don't regard
'declare target' for the 'requires' usage requirement.
Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 makes also -1 valid (calls it omp_initial_device), which is equivalent
in behavior to omp_get_num_devices() number but has the advantage that it
is a constant. And it also introduces omp_invalid_device which is
also a constant with implementation defined value < -1. That value should
act like sNaN, any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is all but [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.
For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
#pragma omp target if (cond)
where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it, advantage of -1 is that one can say iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one.
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.
So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.
This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory , the spec says that the inital ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).
2022-06-13 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-expand.cc (expand_omp_target): Remap user provided
device clause arguments, -1 to -2 and -2 to -3, either
at compile time if constant, or at runtime.
include/
* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
parameters.
* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
* target.c (resolve_device): Add remapped argument, handle
GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
if remapped, otherwise treat omp_initial_device that way. For
omp_invalid_device, always emit gomp_fatal, even when
OMP_TARGET_OFFLOAD isn't mandatory.
(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
GOMP_target_update, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Pass true as remapped argument to
resolve_device.
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy_check, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_get_mapped_ptr,
omp_target_is_accessible): Pass false as remapped argument to
resolve_device. Treat omp_initial_device the same as
gomp_get_num_devices (). Don't bypass resolve_device calls if
device_num is negative.
(omp_pause_resource): Treat omp_initial_device the same as
gomp_get_num_devices (). Call resolve_device.
* icv-device.c (omp_set_default_device): Always set to device_num
even when it is negative.
* libgomp.texi: Document that Conforming device numbers,
omp_initial_device and omp_invalid_device is implemented.
* testsuite/libgomp.c/target-41.c (main): Add test with
omp_initial_device.
* testsuite/libgomp.c/target-45.c: New test.
* testsuite/libgomp.c/target-46.c: New test.
* testsuite/libgomp.c/target-47.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
test with omp_initial_device. Use -5 instead of -1 for negative value
test.
* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
Likewise. Reorder stop numbers.
For allocatable/pointer arrays, a firstprivate to a device
not only needs to privatize the descriptor but also the actual
data. This is implemented as:
firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
where the address of x in device memory is saved in hostaddrs[i]
by libgomp and the middle end actually passes hostaddrs[i]' to
attach.
As side effect, has_device_addr(array_desc) had to be changed:
before, it was converted to firstprivate in the front end; now
it is handled in omp-low.cc as has_device_addr requires a shallow
firstprivate (not touching the data pointer) while the normal
firstprivate requires (now) a deep firstprivate.
gcc/fortran/ChangeLog:
PR fortran/104949
* f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine.
* trans-openmp.cc (gfc_omp_array_size): New.
(gfc_trans_omp_variable_list): Never turn has_device_addr
to firstprivate.
* trans.h (gfc_omp_array_size): New.
gcc/ChangeLog:
PR fortran/104949
* langhooks-def.h (lhd_omp_array_size): New.
(LANG_HOOKS_OMP_ARRAY_SIZE): Define.
(LANG_HOOKS_DECLS): Add it.
* langhooks.cc (lhd_omp_array_size): New.
* langhooks.h (struct lang_hooks_for_decls): Add hook.
* omp-low.cc (scan_sharing_clauses, lower_omp_target):
Handle GOMP_MAP_FIRSTPRIVATE for array descriptors.
libgomp/ChangeLog:
PR fortran/104949
* target.c (gomp_map_vars_internal, copy_firstprivate_data):
Support attach for GOMP_MAP_FIRSTPRIVATE.
* testsuite/libgomp.fortran/target-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.
This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.
In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:
int depobj_count
omp_depend_t *depobj_list
integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)
The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a new created task. The
sequential part (a) takes into account the requirements for the return values:
"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)
"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)
Due to asynchronicity an error is thrown if the asynchronous memcpy is not
successful (in contrast to the synchronous functions which use a return
value unequal to zero).
gcc/ChangeLog:
* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.
libgomp/ChangeLog:
* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):
has_device_addr(list)
"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)
"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)
"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
clause.
(c_parser_omp_variable_list): Handle array sections.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
OMP_CLAUSE_MASK.
* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
(c_finish_omp_clauses): Handle array sections.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_var_list_no_open): Handle array sections.
(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
(finish_omp_clauses): Handle array sections.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
case.
* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
(resolve_omp_clauses): Same.
* trans-openmp.cc (gfc_trans_omp_variable_list): Added
OMP_LIST_HAS_DEVICE_ADDR case.
(gfc_trans_omp_clauses): Firstprivatize of array descriptors.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
OMP_CLAUSE_HAS_DEVICE_ADDR
and handle array sections.
(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
(lower_omp_target): Same.
* tree-core.h (enum omp_clause_code): Same.
* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
(convert_local_omp_clauses): Same.
* tree-pretty-print.cc (dump_omp_clause): Same.
* tree.cc: Same.
libgomp/ChangeLog:
* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
* target.c (copy_firstprivate_data): Copy only if host address is not
NULL.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-has-device-addr-2.c: New test.
* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
This patch implements three pieces of functionality:
(1) Adjust array section mapping to have standards conforming behavior,
mapping array sections should *NOT* also map the base-pointer:
struct S { int *ptr; ... };
struct S s;
Instead of generating this during gimplify:
map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
Now, adjust to:
(i.e. do not map the base-pointer together. The attach operation is still
generated, and if s.ptr is already mapped prior, attachment will happen)
The correct way of achieving the base-pointer-also-mapped behavior would be to
use:
(A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section
is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for
array types, which appears incorrect and causes a regression in
libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90)
(2) Related to the first item above, are fixes in libgomp/target.c to not
overwrite attached pointers when handling device<->host copies, mainly for the
"always" case.
(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. These changes are enabled for both
OpenACC and OpenMP.
gcc/c/ChangeLog:
* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/cp/ChangeLog:
* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.
gcc/ChangeLog:
* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accomodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting
phase where we make sure pointers with an attach/detach map are ordered
correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating
GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/target-enter-data-1.c: New testcase.
* c-c++-common/gomp/target-implicit-map-2.c: New testcase.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Make sure attached pointer is
not overwritten during cross-host/device copying.
(gomp_update): Likewise.
(gomp_exit_data): Likewise.
* testsuite/libgomp.c++/target-11.C: Adjust testcase.
* testsuite/libgomp.c++/target-12.C: Likewise.
* testsuite/libgomp.c++/target-15.C: Likewise.
* testsuite/libgomp.c++/target-16.C: Likewise.
* testsuite/libgomp.c++/target-17.C: Likewise.
* testsuite/libgomp.c++/target-21.C: Likewise.
* testsuite/libgomp.c++/target-23.C: Likewise.
* testsuite/libgomp.c/target-23.c: Likewise.
* testsuite/libgomp.c/target-29.c: Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
This patch implements several C++ specific mapping capabilities introduced for
OpenMP 5.0, including implicit mapping of this[:1] for non-static member
functions, zero-length array section mapping of pointer-typed members,
lambda captured variable access in target regions, and use of lambda objects
inside target regions.
Several adjustments to the C/C++ front-ends to allow more member-access syntax
as valid is also included.
PR middle-end/92120
gcc/cp/ChangeLog:
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new
function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL. Remove case of rejecting
'this' when not in declare simd.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses. Handle 'A->member' case in map clauses. Remove case of
rejecting 'this' when not in declare simd.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.
gcc/ChangeLog:
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map<tree_operand, tree> *. Adjust struct map handling to handle
cases of *A and A->B expressions. Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/target-3.c: New testcase.
* g++.dg/gomp/target-3.C: New testcase.
* g++.dg/gomp/target-lambda-1.C: New testcase.
* g++.dg/gomp/target-lambda-2.C: New testcase.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.
* g++.dg/gomp/target-this-5.C: New testcase.
* g++.dg/gomp/this-2.C: Adjust testcase.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.
libgomp/ChangeLog:
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
* testsuite/libgomp.c++/target-23.C: New testcase.
* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New testcase.
* testsuite/libgomp.c++/target-this-4.C: New testcase.
* testsuite/libgomp.c++/target-this-5.C: New testcase.
OpenMP 5.1 says that thread_limit clause can also appear on target,
and similarly to teams should affect the thread-limit-var ICV.
On combined target teams, the clause goes to both.
We actually passed thread_limit internally on target already before,
but only used it for gcn/ptx offloading to hint how many threads should be
created and for ptx didn't set thread_limit_var in that case.
Similarly for host fallback.
Also, I found that we weren't copying the args array that contains encoded
thread_limit and num_teams clause for target (etc.) for async target.
2021-11-15 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT
to OMP_TARGET_CLAUSES if it isn't there already.
gcc/c-family/
* c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>:
Duplicate to both OMP_TARGET and OMP_TEAMS.
gcc/c/
* c-parser.c (OMP_TARGET_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
gcc/cp/
* parser.c (OMP_TARGET_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
libgomp/
* task.c (gomp_create_target_task): Copy args array as well.
* target.c (gomp_target_fallback): Add args argument.
Set gomp_icv (true)->thread_limit_var if thread_limit is present.
(GOMP_target): Adjust gomp_target_fallback caller.
(GOMP_target_ext): Likewise.
(gomp_target_task_fn): Likewise.
* config/nvptx/team.c (gomp_nvptx_main): Set
gomp_global_icv.thread_limit_var.
* testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):
"If a single contiguous part of the original storage of a list item with an
implicit data-mapping attribute has corresponding storage in the device data
environment prior to a task encountering the construct that is associated with
the map clause, only that part of the original storage will have corresponding
storage in the device data environment as a result of the map clause."
2021-11-12 Chung-Lin Tang <cltang@codesourcery.com>
include/ChangeLog:
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
(GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.
gcc/ChangeLog:
* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
'implicit' bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
implicit map handling to allow a "superset" existing map as valid case.
(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
(get_implicit): New function to extract implicit status.
(gomp_map_fields_existing): Adjust arguments in calls to
gomp_map_vars_existing, and add uses of get_implicit.
(gomp_map_vars_internal): Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
The following patch implements what I've been talking about earlier,
honor that for explicit num_teams clause we create at least the
lower-bound (if not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.
2021-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
(BUILT_IN_GOMP_TEAMS4): New.
* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
GOMP_teams, pass to it also num_teams lower-bound expression
or a dup of upper-bound if it is missing and a flag whether
it is the first call or not.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
* libgomp_g.h (GOMP_teams4): Declare.
* libgomp.map (GOMP_5.1): Export GOMP_teams4.
* target.c (GOMP_teams4): New function.
* config/nvptx/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* config/gcn/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
teams instead of <= 2.
* testsuite/libgomp.c-c++-common/teams-2.c: New test.
This patch releases the device lock on a sanity-checking error path in
transfer combining (cbuf) handling in libgomp:target.c. This shouldn't
happen when handling well-formed mapping clauses, but erroneous clauses
can currently cause a hang if the condition triggers.
2021-12-10 Julian Brown <julian@codesourcery.com>
libgomp/
* target.c (gomp_copy_host2dev): Release device lock on cbuf
error path.
This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.
GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number"
variable, is defined on the device-side libgomp, has it's address returned to
host-side libgomp during device initialization, and the host libgomp then
sets its value to the designated device number.
libgomp/ChangeLog:
* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.
* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_intelmic): New function for
testing for intelmic offloading.
* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.
The host data might not be computed yet (by an earlier asynchronous compute
region, for example.
libgomp/
* target.c (gomp_coalesce_buf_add): Update comment.
(gomp_copy_host2dev, gomp_map_vars_internal): Don't expect to see
'aq && cbuf'.
(gomp_map_vars_internal): Only 'if (!aq)', do
'gomp_coalesce_buf_add'.
* testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Remove
XFAIL.
Co-Authored-By: Julian Brown <julian@codesourcery.com>
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.
An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html
and previous versions of this patch were posted here (for mainline/og9):
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.htmlhttps://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html
libgomp/
* libgomp.h (gomp_copy_host2dev): Update prototype.
* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
argument to gomp_copy_host2dev (false).
* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
(copy_data): Don't free src.
(queue_push_copy): Remove free_src handling.
(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
snapshotting.
(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
queue_push_copy.
* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
(gomp_copy_host2dev): Add EPHEMERAL parameter. Snapshot source
data when true, and set up deferred freeing of temporary buffer.
(gomp_copy_dev2host): Update call to goacc_device_copy_async.
(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
calls to gomp_copy_host2dev with appropriate ephemeral argument.
* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
XFAIL.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
This patch implement OpenMP 5.0 requirements of incrementing/decrementing
the reference count of a mapped structure at most once (across all elements)
on a construct.
This is implemented by pulling in libgomp/hashtab.h and using htab_t as a
pointer set. Structure element list siblings also have pointers-to-refcounts
linked together, to naturally achieve uniform increment/decrement without
repeating.
There are still some questions on whether using such a htab_t based set is
faster/slower than using a sorted pointer array based implementation. This
is to be researched on later.
libgomp/ChangeLog:
* hashtab.h (htab_clear): New function with initialization code
factored out from...
(htab_create): ...here, adjust to use htab_clear function.
* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
special refcount values, add comments.
(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
(REFCOUNT_LINK): Likewise.
(REFCOUNT_STRUCTELEM): New special refcount range for structure
element siblings.
(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
sibling maps.
(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
(REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling.
(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
(struct splay_tree_key_s): Add structelem_refcount and
structelem_refcount_ptr fields into a union with dynamic_refcount.
Add comments.
(gomp_map_vars): Delete declaration.
(gomp_map_vars_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_unmap_vars_async): Likewise.
(goacc_map_vars): New declaration.
(goacc_unmap_vars): Likewise.
* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
(goacc_enter_datum): Likewise.
(goacc_enter_data_internal): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
and goacc_unmap_vars.
(GOACC_data_start): Adjust to use goacc_map_vars.
(GOACC_data_end): Adjust to use goacc_unmap_vars.
* target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h.
(htab_free): Likewise.
(htab_hash): Likewise.
(htab_eq): Likewise.
(hashtab.h): Add file include.
(gomp_increment_refcount): New function.
(gomp_decrement_refcount): Likewise.
(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
gomp_increment_refcount.
(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
to gomp_map_vars_existing.
(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
variable to guard OpenMP specific paths, adjust calls to
gomp_map_vars_existing, add structure element sibling splay_tree_key
sequence creation code, adjust Fortran map case to avoid increment
under OpenMP.
(gomp_map_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_map_vars_internal.
(gomp_map_vars_async): Adjust and rename into...
(goacc_map_vars): ...this new function, adjust call to
gomp_map_vars_internal.
(gomp_remove_splay_tree_key): New function with code factored out from
gomp_remove_var_internal.
(gomp_remove_var_internal): Add code to handle removing multiple
splay_tree_key sequence for structure elements, adjust code to use
gomp_remove_splay_tree_key for splay-tree key removal.
(gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use
gomp_decrement_refcount.
(gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_unmap_vars_internal.
(gomp_unmap_vars_async): Adjust and rename into...
(goacc_unmap_vars): ...this new function, adjust call to
gomp_unmap_vars_internal.
(GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and
gomp_unmap_vars.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Adjust call to gomp_map_vars.
(GOMP_target_data): Likewise.
(GOMP_target_data_ext): Likewise.
(GOMP_target_end_data): Adjust call to gomp_unmap_vars.
(gomp_exit_data): Add refcount_set parameter, adjust to use
gomp_decrement_refcount, adjust to queue splay-tree keys for removal
after main loop.
(GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to
gomp_map_vars and gomp_exit_data.
(gomp_target_task_fn): Likewise.
* testsuite/libgomp.c-c++-common/refcount-1.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-4.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-5.c: New testcase.
If configured with --enable-offload-defaulted, configured but not installed
offload compilers and libgomp plugins are silently ignored. Useful for
distribution compilers where those are in separate optional packages.
2021-04-28 Jakub Jelinek <jakub@redhat.com>
Tobias Burnus <tobias@codesourcery.com>
ChangeLog:
* configure.ac (--enable-offload-defaulted): New.
* configure: Regenerate.
gcc/ChangeLog:
* configure.ac (OFFLOAD_DEFAULTED): AC_DEFINE if offload-defaulted.
* gcc.c (process_command): New variable.
(driver::maybe_putenv_OFFLOAD_TARGETS): If OFFLOAD_DEFAULTED,
set it if -foffload is defaulted.
* lto-wrapper.c (OFFLOAD_TARGET_DEFAULT_ENV): Define.
(compile_offload_image): If OFFLOAD_DEFAULTED and
OFFLOAD_TARGET_DEFAULT is in the environment, don't fail
if corresponding mkoffload can't be found.
(compile_images_for_offload_targets): Likewise. Free and clear
offload_names if no valid offload is found.
* config.in: Regenerate.
* configure: Regenerate.
libgomp/ChangeLog:
* configure.ac (OFFLOAD_DEFAULTED): AC_DEFINE if offload-defaulted.
* target.c (gomp_load_plugin_for_device): If set and if a plugin
can't be dlopened, silently assume it has no devices.
* Makefile.in: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
This patch implements some parts of the target variable mapping changes
specified in OpenMP 5.0, including base-pointer attachment/detachment
behavior for array section list-items in map clauses, and ordering of
map clauses according to map kind.
2020-11-10 Chung-Lin Tang <cltang@codesourcery.com>
gcc/c-family/ChangeLog:
* c-common.h (c_omp_adjust_map_clauses): New declaration.
* c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses.
(c_omp_adjust_map_clauses): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.
* c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
(c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_target): Likewise.
* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
interaction between reference case and attach/detach.
(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/ChangeLog:
* gimplify.c (is_or_contains_p): New static helper function.
(omp_target_reorder_clauses): New function.
(gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to
reorder clause list according to OpenMP 5.0 rules. Add handling of
GOMP_MAP_ATTACH_DETACH for OpenMP cases.
* omp-low.c (is_omp_target): New static helper function.
(scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH
for OpenMP cases.
(lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for
OpenMP cases.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid.
* gfortran.dg/gomp/map-2.f90: Likewise.
* c-c++-common/gomp/map-5.c: New testcase.
libgomp/ChangeLog:
* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_internal):
Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use
of gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
> Therefore, I think until omp_get_initial_device () value is changed, we
The following so far untested patch implements that change.
OpenMP 4.5 said for omp_get_initial_device:
The value of the device number is implementation defined. If it is between 0 and one less than
omp_get_num_devices() then it is valid for use with all device constructs and routines; if it is
outside that range, then it is only valid for use with the device memory routines and not in the
device clause.
and OpenMP 5.0 similarly, but OpenMP 5.1 says:
The value of the device number is the value returned by the omp_get_num_devices routine.
As the new value is compatible with what has been required earlier, I think
we can change it already now.
2020-10-22 Jakub Jelinek <jakub@redhat.com>
* icv.c (omp_get_initial_device): Remove including corresponding
ialias.
* icv-device.c (omp_get_initial_device): New function. Return
gomp_get_num_devices (). Add ialias.
* target.c (resolve_device): Don't fail with
OMP_TARGET_OFFLOAD=mandatory if device_id is equal to
gomp_get_num_devices ().
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy, omp_target_memcpy_rect, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_pause_resource): Use
gomp_get_num_devices () instead of GOMP_DEVICE_HOST_FALLBACK on the
first use in the functions, in uses dominated by the
gomp_get_num_devices call use num_devices_openmp instead.
* libgomp.texi (omp_get_initial_device): Document.
* config/gcn/icv-device.c (omp_get_initial_device): New function.
Add ialias.
* config/nvptx/icv-device.c (omp_get_initial_device): Likewise.
* testsuite/libgomp.c/target-40.c: New test.
> On 10/20/20 2:11 PM, Tobias Burnus wrote:
>
> > Unfortunately, the committed patch
> > (r11-4121-g1bfc07d150790fae93184a79a7cce897655cb37b)
> > causes build errors.
> >
> > The error seems to be provoked by function cloning – as the code
> > itself looks fine:
> > ...
> > struct gomp_device_descr *devices_s
> > = malloc (num_devices * sizeof (struct gomp_device_descr));
> > ...
> > for (i = 0; i < num_devices; i++)
> > if (!(devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> > devices_s[num_devices_after_openmp++] = devices[i];
>
> gomp_target_init.part.0 ()
> {
> ...
> <bb 2>
> devices_s_1 = malloc (0);
> ...
> num_devices.16_67 = num_devices;
> ...
> if (num_devices.16_67 > 0)
> goto <bb 3>; [89.00%]
> else
> goto <bb 18>; [11.00%]
>
> Which seems to have an ordering problem.
This patch fixes the warning that breaks the bootstrap.
2020-10-20 Jakub Jelinek <jakub@redhat.com>
* target.c (gomp_target_init): Inside of the function, use automatic
variables corresponding to num_devices, num_devices_openmp and devices
global variables and update the globals only at the end of the
function.
This implements support for the OMP_TARGET_OFFLOAD environment variable
introduced in the OpenMP 5.0 standard, which controls how offloading
is handled. It may be set to MANDATORY (abort if offloading cannot be
performed), DISABLED (no offloading to devices) or DEFAULT (offload to
device if possible, fall back to host if not).
2020-10-20 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* env.c (gomp_target_offload_var): New.
(parse_target_offload): New.
(handle_omp_display_env): Print value of OMP_TARGET_OFFLOAD.
(initialize_env): Parse OMP_TARGET_OFFLOAD.
* libgomp.h (gomp_target_offload_t): New.
(gomp_target_offload_var): New.
* libgomp.texi (OMP_TARGET_OFFLOAD): New section.
* target.c (resolve_device): Generate error if device not found and
offloading is mandatory.
(gomp_target_fallback): Generate error if offloading is mandatory.
(GOMP_target): Add argument in call to gomp_target_fallback.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Generate error if offloading is mandatory.
(GOMP_target_data): Add argument in call to gomp_target_data_fallback.
(GOMP_target_data_ext): Likewise.
(gomp_target_task_fn): Add argument in call to gomp_target_fallback.
(gomp_target_init): Return early if offloading is disabled.
Attach and detach operations are not supposed to affect structural or
dynamic reference counts for OpenACC. Previously they did so, which led to
subtle problems in some circumstances. We can avoid reference-counting
attach/detach operations by extending and slightly repurposing the
do_detach field in target_var_desc. It is now called is_attach to better
reflect its new role.
2020-07-27 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* libgomp.h (struct target_var_desc): Rename do_detach field to
is_attach.
* oacc-mem.c (goacc_exit_datum_1): Add assert. Don't set finalize for
GOMP_MAP_FORCE_DETACH. Update checking to use is_attach field.
(goacc_enter_data_internal): Don't affect reference counts
for attach mappings.
(goacc_exit_data_internal): Don't affect reference counts for detach
mappings.
* target.c (gomp_map_vars_existing): Don't affect reference counts for
attach mappings.
(gomp_map_vars_internal): Set renamed is_attach flag unconditionally to
mark attach mappings.
(gomp_unmap_vars_internal): Use is_attach flag to prevent affecting
reference count for attach mappings.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
test as shouldfail.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
gracefully in no-finalize mode.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
The call to gomp_detach_pointer in gomp_unmap_vars_internal does not
need to force finalization, and doing so may mask mismatched pointer
attachments/detachments. This patch removes the forcing.
2020-07-16 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* target.c (gomp_unmap_vars_internal): Remove unnecessary forcing of
finalization for detach operation.
* testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c:
New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
This patch adjusts how dynamic reference counts work so that they match
the semantics of the source program more closely, instead of representing
"excess" reference counts beyond those that represent pointers in the
internal libgomp splay-tree data structure. This allows some corner
cases to be handled more gracefully.
2020-07-10 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* libgomp.h (struct splay_tree_key_s): Change virtual_refcount to
dynamic_refcount.
(struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA.
* oacc-mem.c (acc_map_data): Substitute virtual_refcount for
dynamic_refcount.
(acc_unmap_data): Update comment.
(goacc_map_var_existing, goacc_enter_datum): Adjust for
dynamic_refcount semantics.
(goacc_exit_datum_1, goacc_exit_datum): Re-add some error checking.
Adjust for dynamic_refcount semantics.
(goacc_enter_data_internal): Implement "present" case of dynamic
memory-map handling here. Update "non-present" case for
dynamic_refcount semantics.
(goacc_exit_data_internal): Use goacc_exit_datum_1.
* target.c (gomp_map_vars_internal): Remove
GOMP_MAP_VARS_OPENACC_ENTER_DATA handling. Update for dynamic_refcount
handling.
(gomp_unmap_vars_internal): Remove virtual_refcount handling.
(gomp_load_image_to_device): Substitute dynamic_refcount for
virtual_refcount.
* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Remove XFAILs.
* testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Remove XFAILs and
trace output.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Remove
trace output.
* testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90: New
test.
* testsuite/libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c:
Remove stale comment.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-1.f90: Remove XFAILs.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-2.F90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-3-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust XFAIL.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>