For the type of the target callbacks we use elsehwere void (*) (void *) and
IMHO should use that for the reverse offload fallback as well (where the actual
callback is emitted using the same code as for host fallback or device kernel
entry routines), even when it is also ok to use void (*) () before C23 and
we aren't building libgomp with C23 yet. On some arches perhaps void (*) ()
could result in worse code generation because calls in that case like casts
to unprototyped functions need to sometimes pass argument in two different spots
etc. so that it deals with both passing it through ... and as a named argument.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR libgomp/114216
* target.c (gomp_target_rev): Change host_fn type and corresponding
cast from void (*)() to void (*) (void *).
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalants for OpenACC), which are not able
to be sorted into order at compile time. Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.
This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time. Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).
For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c). That could perhaps be addressed with a follow-up
patch, if necessary.
This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).
2023-08-18 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.
libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
This patch reworks clause expansion in the C, C++ and (to a lesser
extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in
GPU offloading support.
At present a single clause may be turned into several mapping nodes,
or have its mapping type changed, in several places scattered through
the front- and middle-end. The analysis relating to which particular
transformations are needed for some given expression has become quite hard
to follow. Briefly, we manipulate clause types in the following places:
1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of
rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into
ATTACH_DETACH, or mark the decl addressable.
2. In semantics.cc or c-typeck.cc, clauses are expanded in
handle_omp_array_sections (called via {c_}finish_omp_clauses, or in
finish_omp_clauses itself. The two cases are for processing array
sections (the former), or non-array sections (the latter).
3. In gimplify.cc, we build sibling lists for struct accesses, which
groups and sorts accesses along with their struct base, creating
new ALLOC/RELEASE nodes for pointers.
4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be
adjusted or created.
This patch doesn't completely disrupt this scheme, though clause
types are no longer adjusted in c_omp_adjust_map_clauses (step 1).
Clause expansion in step 2 (for C and C++) now uses a single, unified
mechanism, parts of which are also reused for analysis in step 3.
Rather than the kind-of "ad-hoc" pattern matching on addresses used to
expand clauses used at present, a new method for analysing addresses is
introduced. This does a recursive-descent tree walk on expression nodes,
and emits a vector of tokens describing each "part" of the address.
This tokenized address can then be translated directly into mapping nodes,
with the assurance that no part of the expression has been inadvertently
skipped or misinterpreted. In this way, all the variations of ways
pointers, arrays, references and component accesses might be combined
can be teased apart into easily-understood cases - and we know we've
"parsed" the whole address before we start analysis, so the right code
paths can easily be selected.
For example, a simple access "arr[idx]" might parse as:
base-decl access-indexed-array
or "mystruct->foo[x]" with a pointer "foo" component might parse as:
base-decl access-pointer component-selector access-pointer
A key observation is that support for "array" bases, e.g. accesses
whose root nodes are not structures, but describe scalars or arrays,
and also *one-level deep* structure accesses, have first-class support
in gimplify and beyond. Expressions that use deeper struct accesses
or e.g. multiple indirections were more problematic: some cases worked,
but lots of cases didn't. This patch reimplements the support for those
in gimplify.cc, again using the new "address tokenization" support.
An expression like "mystruct->foo->bar[0:10]" used in a mapping node will
translate the right-hand access directly in the front-end. The base for
the access will be "mystruct->foo". This is handled recursively in
gimplify.cc -- there may be several accesses of "mystruct"'s members
on the same directive, so the sibling-list building machinery can be
used again. (This was already being done for OpenACC, but the new
implementation differs somewhat in details, and is more robust.)
For OpenMP, in the case where the base pointer itself,
i.e. "mystruct->foo" here, is NOT mapped on the same directive, we
create a "fragile" mapping. This turns the "foo" component access
into a zero-length allocation (which is a new feature for the runtime,
so support has been added there too).
A couple of changes have been made to how mapping clauses are turned
into mapping nodes:
The first change is based on the observation that it is probably never
correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for
references), because if the containing struct is already mapped on the
target then the host version of the pointer in question will be corrupted
if the struct is copied back from the target. This patch removes all
such uses, across each of C, C++ and Fortran.
The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes
are processed during sibling-list creation. For OpenMP, for pointer
components, we must map the base pointer separately from an array section
that uses the base pointer, so e.g. we must have both "map(mystruct.base)"
and "map(mystruct.base[0:10])" mappings. These create nodes such as:
GOMP_MAP_TOFROM mystruct.base
G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base
Instead of using the first of these directly when building the struct
sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH,
leading to:
GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base
we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that
drops the GOMP_MAP_TOFROM for the base pointer, marks the second group
as having had a base-pointer mapping, then omp_build_struct_sibling_lists
can create:
GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize]
This ends up working better in many cases, particularly those involving
references. (The "alloc" space is immediately overwritten by a pointer
attachment, so this is mildly more efficient than a redundant TO mapping
at runtime also.)
There is support in the address tokenizer for "arbitrary" base expressions
which aren't rooted at a decl, but that is not used as present because
such addresses are disallowed at parse time.
In the front-ends, the address tokenization machinery is mostly only
used for clause expansion and not for diagnostics at present. It could
be used for those too, which would allow more of my previous "address
inspector" implementation to be removed.
The new bits in gimplify.cc work with OpenACC also.
This version of the patch addresses several first-pass review comments
from Tobias, and fixes a few previously-missed cases for manually-managed
ragged array mappings (including cases using references). Some arbitrary
differences between handling of clause expansion for C vs. C++ have also
been fixed, and some fragments from later in the patch series have been
moved forward (where they were useful for fixing bugs). Several new
test cases have been added.
2023-11-29 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET.
(omp_addr_token): Add forward declaration.
(c_omp_address_inspector): New class.
* c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but
do not change any mapping node types.
(c_omp_address_inspector::unconverted_ref_origin,
c_omp_address_inspector::component_access_p,
c_omp_address_inspector::check_clause,
c_omp_address_inspector::get_root_term,
c_omp_address_inspector::map_supported_p,
c_omp_address_inspector::get_origin,
c_omp_address_inspector::maybe_unconvert_ref,
c_omp_address_inspector::maybe_zero_length_array_section,
c_omp_address_inspector::expand_array_base,
c_omp_address_inspector::expand_component_selector,
c_omp_address_inspector::expand_map_clause): New methods.
(omp_expand_access_chain): New function.
gcc/c/
* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for c_finish_omp_clauses call.
(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
(c_parser_oacc_compute): Likewise.
(c_parser_omp_target_data, c_parser_omp_target_enter_data): Support
ATTACH kind.
(c_parser_omp_target_exit_data): Support DETACH kind.
(check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here.
* c-typeck.cc (handle_omp_array_sections_1,
handle_omp_array_sections, c_finish_omp_clauses): Use
c_omp_address_inspector class and OMP address tokenizer to analyze and
expand map clause expressions. Fix some diagnostics. Fix "is OpenACC"
condition for C_ORT_ACC_TARGET addition.
gcc/cp/
* parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for finish_omp_clauses call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support
GOMP_MAP_ATTACH kind.
(cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind.
(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
(cp_parser_oacc_compute): Likewise.
* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
tsubst_omp_clauses for OpenACC compute regions.
* semantics.cc (cp_omp_address_inspector): New class, derived from
c_omp_address_inspector.
(handle_omp_array_sections_1, handle_omp_array_sections,
finish_omp_clauses): Use cp_omp_address_inspector class and OMP address
tokenizer to analyze and expand OpenMP map clause expressions. Fix
some diagnostics. Support C_ORT_ACC_TARGET.
(finish_omp_target): Handle GOMP_MAP_POINTER.
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter.
Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for
derived type components.
(gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section.
gcc/
* gimplify.cc (build_struct_comp_nodes): Don't process
GOMP_MAP_ATTACH_DETACH "middle" nodes here.
(omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for
nested struct handling.
(omp_strip_components_and_deref, omp_strip_indirections): Remove
functions.
(omp_get_attachment): Handle GOMP_MAP_DETACH here.
(omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH,
GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer
component array sections.
(omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile
fields.
(omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT.
(omp_index_mapping_groups_1): Skip reprocess_struct groups.
(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly,
omp_resolve_clause_dependencies, omp_first_chained_access_token): New
functions.
(omp_check_mapping_compatibility): Adjust accepted node combinations
for "from" clauses using release instead of alloc.
(omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P,
REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer
to analyze addresses. Reimplement nested struct handling, and
implement "fragile groups".
(omp_build_struct_sibling_lists): Adjust for changes to
omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes
after GOMP_MAP_STRUCT nodes.
(gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use
OMP address tokenizer.
(gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc
instead of build_simple_mem_ref_loc.
* omp-general.cc (omp-general.h, tree-pretty-print.h): Include.
(omp_addr_tokenizer): New namespace.
(omp_addr_tokenizer::omp_addr_token): New.
(omp_addr_tokenizer::omp_parse_component_selector,
omp_addr_tokenizer::omp_parse_ref,
omp_addr_tokenizer::omp_parse_pointer,
omp_addr_tokenizer::omp_parse_access_method,
omp_addr_tokenizer::omp_parse_access_methods,
omp_addr_tokenizer::omp_parse_structure_base,
omp_addr_tokenizer::omp_parse_structured_expr,
omp_addr_tokenizer::omp_parse_array_expr,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New functions.
(omp_parse_expr, debug_omp_tokenized_addr): New functions.
* omp-general.h (omp_addr_tokenizer::access_method_kinds,
omp_addr_tokenizer::structure_base_kinds,
omp_addr_tokenizer::token_type,
omp_addr_tokenizer::omp_addr_token,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New.
(omp_addr_token, omp_parse_expr): New.
* omp-low.cc (scan_sharing_clauses): Skip error check for references
to pointers.
* tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro.
gcc/testsuite/
* c-c++-common/gomp/clauses-2.c: Fix error output.
* c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output.
* c-c++-common/gomp/target-50.c: Adjust scan output.
* c-c++-common/gomp/target-enter-data-1.c: Adjust scan output.
* g++.dg/gomp/static-component-1.C: New test.
* gcc.dg/gomp/target-3.c: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
libgomp/
* target.c (gomp_map_pointer): Modify zero-length array section
pointer handling.
(gomp_attach_pointer): Likewise.
(gomp_map_fields_existing): Use gomp_map_0len_lookup.
(gomp_attach_pointer): Allow attaching null pointers (or Fortran
"unassociated" pointers).
(gomp_map_vars_internal): Handle zero-sized struct members. Add
diagnostic for unmapped struct pointer members.
* testsuite/libgomp.c-c++-common/baseptrs-1.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-2.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-6.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-7.c: New test.
* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c++/class-array-1.C: New test.
* testsuite/libgomp.c++/baseptrs-3.C: New test.
* testsuite/libgomp.c++/baseptrs-4.C: New test.
* testsuite/libgomp.c++/baseptrs-5.C: New test.
* testsuite/libgomp.c++/baseptrs-8.C: New test.
* testsuite/libgomp.c++/baseptrs-9.C: New test.
* testsuite/libgomp.c++/ref-mapping-1.C: New test.
* testsuite/libgomp.c++/target-48.C: New test.
* testsuite/libgomp.c++/target-49.C: New test.
* testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test.
* testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2
semantics.
* testsuite/libgomp.c++/target-this-3.C: Likewise.
* testsuite/libgomp.c++/target-this-4.C: Likewise.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL.
* testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
This commit adds -fopenmp-allocators which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown,
unless the -fopenmp-allocators flag is present.
With -fopenmp -fopenmp-allocators, those constructs get turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures deallocation and reallocation (via intrinsic assignments) are
properly directed to GOMP_free/omp_realloc - while normal Fortran
allocations are processed by free/realloc.
In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
version field of the Fortran array discriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
true it was previously added but also removes it from the list.
Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of
omp-builtins.def and libgomp gains the mentioned two new function.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
* builtins.cc (builtin_fnspec): Handle it.
* gimple-ssa-warn-access.cc (fndecl_alloc_p,
matching_alloc_calls_p): Likewise.
* gimple.cc (nonfreeing_call_p): Likewise.
* predict.cc (expr_expected_value_1): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
* tree.cc (fndecl_dealloc_argno): Likewise.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
and EXEC_OMP_ALLOCATORS.
* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
(-fopenmp-simd): New, based on C version.
(-fopenmp-allocators): New.
* lang.opt (fopenmp-allocators): Add.
* openmp.cc (resolve_omp_clauses): For allocators/allocate directive,
add target and no dynamic_allocators diagnostic and more invalid
diagnostic.
* parse.cc (decode_omp_directive): Set contains_teams_construct.
* trans-array.h (gfc_array_allocate): Update prototype.
(gfc_conv_descriptor_version): New prototype.
* trans-decl.cc (gfc_init_default_dt): Fix comment.
* trans-array.cc (gfc_conv_descriptor_version): New.
(gfc_array_allocate): Support GOMP_alloc allocation.
(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
Handle GOMP_free/omp_realloc as needed.
* trans-expr.cc (gfc_conv_procedure_call): Likewise.
(alloc_scalar_allocatable_for_assignment): Likewise.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
* trans-openmp.cc (gfc_trans_omp_allocators,
gfc_trans_omp_directive): Handle allocators/allocate directive.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
* trans-stmt.h (gfc_trans_allocate): Update prototype.
* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
Update to handle GOMP_alloc.
(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
Handle GOMP_free.
(trans_code): Update call.
* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
Update prototype.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
libgomp/ChangeLog:
* allocator.c (struct fort_alloc_splay_tree_key_s,
fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
(GOMP_5.1.1): ... here.
* libgomp.texi (Impl. Status, Memory management): Update for
allocators/allocate directives.
* splay-tree.c: Handle splay_tree_static define to declare all
functions as static.
(splay_tree_lookup_node): New.
* splay-tree.h: Handle splay_tree_decl_only define.
(splay_tree_lookup_node): New prototype.
* target.c: Define splay_tree_static for 'reverse'.
* testsuite/libgomp.fortran/allocators-1.f90: New test.
* testsuite/libgomp.fortran/allocators-2.f90: New test.
* testsuite/libgomp.fortran/allocators-3.f90: New test.
* testsuite/libgomp.fortran/allocators-4.f90: New test.
* testsuite/libgomp.fortran/allocators-5.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-14.f90: Add coarray and
not-listed tests.
* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
* gfortran.dg/bind_c_array_params_2.f90: Update expected
dump for dtype '.version=0'.
* gfortran.dg/gomp/allocate-16.f90: New test.
* gfortran.dg/gomp/allocators-3.f90: New test.
* gfortran.dg/gomp/allocators-4.f90: New test.
This adds support for the 'indirect' clause in the 'declare target'
directive. Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.
Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.
2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add attribute for
indirect functions.
* c-pragma.h (enum parma_omp_clause): Add entry for indirect clause.
gcc/c/
* c-decl.cc (c_decl_attributes): Add attribute for indirect
functions.
* c-lang.h (c_omp_declare_target_attr): Add indirect field.
* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
(c_parser_omp_clause_indirect): New.
(c_parser_omp_all_clauses): Handle indirect clause.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_begin): Handle indirect clause.
* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.
gcc/cp/
* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
* decl2.cc (cplus_decl_attributes): Add attribute for indirect
functions.
* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
(cp_parser_omp_clause_indirect): New.
(cp_parser_omp_all_clauses): Handle indirect clause.
(handle_omp_declare_target_clause): Add extra parameter. Add
indirect attribute for indirect functions.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_begin): Handle indirect clause.
* semantics.cc (finish_omp_clauses): Handle indirect clause.
gcc/
* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
functions.
(output_offload_tables): Write indirect functions.
(input_offload_tables): read indirect functions.
* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
* omp-offload.cc (offload_ind_funcs): New.
(omp_discover_implicit_declare_target): Add functions marked with
'omp declare target indirect' to indirect functions list.
(omp_finish_file): Add indirect functions to section for offload
indirect functions.
(execute_omp_device_lower): Redirect indirect calls on target by
passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
(pass_omp_device_lower::gate): Run pass_omp_device_lower if
indirect functions are present on an accelerator device.
* omp-offload.h (offload_ind_funcs): New.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
section. Count number of indirect functions.
(process_obj): Emit number of indirect functions.
* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
(process): Emit offload_ind_func_table in PTX code. Emit indirect
function names and count in image.
* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
indirect functions in PTX code with IND_FUNC_MAP.
gcc/testsuite/
* c-c++-common/gomp/declare-target-7.c: Update expected error message.
* c-c++-common/gomp/declare-target-indirect-1.c: New.
* c-c++-common/gomp/declare-target-indirect-2.c: New.
* g++.dg/gomp/attrs-21.C (v12): Update expected error message.
* g++.dg/gomp/declare-target-indirect-1.C: New.
* gcc.dg/gomp/attrs-21.c (v12): Update expected error message.
include/
* gomp-constants.h (GOMP_VERSION): Increment to 3.
(GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New.
libgcc/
* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
(__offload_ind_func_table): New.
(__offload_ind_funcs_end): New.
(__OFFLOAD_TABLE__): Add entries for indirect functions.
libgomp/
* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
* Makefile.in: Regenerate.
* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
(GOMP_OFFLOAD_load_image): Add extra argument.
* libgomp.h (struct indirect_splay_tree_key_s): New.
(indirect_splay_tree_node, indirect_splay_tree,
indirect_splay_tree_key): New.
(indirect_splay_compare): New.
* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
* libgomp.texi (OpenMP 5.1): Update documentation on indirect
calls in target region and on indirect clause.
(Other new OpenMP 5.2 features): Add entry for virtual function calls.
* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
* oacc-host.c (host_load_image): Add extra argument.
* target.c (gomp_load_image_to_device): If the GOMP_VERSION is high
enough, read host indirect functions table and pass to
load_image_func.
* config/accel/target-indirect.c: New.
* config/linux/target-indirect.c: New.
* config/gcn/team.c (build_indirect_map): Add prototype.
(gomp_gcn_enter_kernel): Initialize support for indirect
function calls on GCN target.
* config/nvptx/team.c (build_indirect_map): Add prototype.
(gomp_nvptx_main): Initialize support for indirect function
calls on NVPTX target.
* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
indirect functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, build address translation table and copy it to target
memory.
* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, Build address translation table and copy it to target
memory.
* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
* testsuite/libgomp.c++/declare-target-indirect-1.C: New.
This reverts commit ffffffffffffffffffffffffffffffffffffffff.
This should get rejected because of the invalid hash.
If it still is accepted, it does something sensible:
It removes tailing white space from a line in libgomp/target.c.
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894
"OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect",
namely:
In that commit, the code was changed to handle shared-memory devices;
however, as pointed out, omp_target_memcpy_check already set the pointer
to NULL in that case. Hence, this commit reverts to the prior version.
In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry for cuMemcpy3DPeer
(all currently unused) and in three structs, fixes reserved-member names
and remove a bogus 'const' in three structs.
And it changes a DLSYM to DLSYM_OPT as not all plugins support the new
functions, yet.
include/ChangeLog:
* cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER):
Remove bogus 'const' from 'const void *dst' and fix reserved-name
name in those structs.
(cuMemcpyPeer, cuMemcpyPeerAsync): Add.
libgomp/ChangeLog:
* target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for
GOMP_OFFLOAD_CAP_SHARED_MEM.
(omp_target_memcpy_rect_copy): Likewise for lock condition.
(gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for
memcpy3d/memcpy2d.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and
unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
When copying a 2D or 3D rectangular memmory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.
Additionally, it permits device-to-device copies, if neccessary using a
temporary variable on the host.
include/ChangeLog:
* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
CUDA_MEMCPY3D_PEER): New typdefs.
(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
cuMemcpy3DPeerAsync): New prototypes.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New prototypes.
* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
and memcpy3d_func.
* libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used.
* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
cuMemcpy3D): Invoke via CUDA_ONE_CALL.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New.
* target.c (omp_target_memcpy_rect_worker):
(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
Permit all device-to-device copyies; invoke new plugins for
2D and 3D copying when available.
(gomp_load_plugin_for_device): DLSYM the new plugin functions.
* testsuite/libgomp.c/target-12.c: Fix dimension bug.
* testsuite/libgomp.fortran/target-12.f90: Likewise.
* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
For C/C++ pointers, default implicit mapping firstprivatizes the pointer
but if the memory it points to is mapped, the it is updated to point to
the device memory (by attaching a zero sized array section of the pointed-to
storage).
However, if the pointed-to storage wasn't mapped, the pointer was set to
NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the
pointer retains the on-host address in that case (OpenMP 5.2 semantic).
The new semantic avoids an explicit map/firstprivate/is_device_ptr in the
following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.),
explicitly device allocated memory (e.g. omp_target_alloc), and with
(unified) shared memory.
(Note: With (U)SM, mappings still must be tracked, at least when
omp_target_associate_ptr does not fail when passing in two destinct pointers.)
libgomp/
PR middle-end/110270
* target.c (gomp_map_vars_internal): Copy host value instead of NULL
for GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped.
* libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'.
* testsuite/libgomp.c/target-19.c: Update expected value.
* testsuite/libgomp.c++/target-18.C: Likewise.
* testsuite/libgomp.c++/target-19.C: Likewise.
* testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.
It turned out that gomp_init_targets_once() was not run when directly
calling 'omp target' or 'omp target (enter/exit) data' causing an
abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no
device is available. It was called a tiny bit later but few lines too
late for updating the default-device-var.
libgomp/ChangeLog:
* target.c (resolve_device): Call gomp_get_num_devices early to ensure
gomp_init_targets_once was called before using default-device-var.
* testsuite/libgomp.c/target-55.c: New test.
* testsuite/libgomp.c/target-55a.c: New test.
On 2023-06-14T11:42:22+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> On 14.06.23 10:09, Thomas Schwinge wrote:
>> Let me know if I should also adjust the new 'target { ! offload_device }'
>> diagnostic "[...] MANDATORY but only the host device is available" to
>> include a comma before 'but', for consistency with the other existing
>> diagnostics (cited above)?
>
> I think it makes sense to be consistent. Thus: Yes, please add the commas.
Fix-up for recent commit 18c8b56c7d
"OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory".
libgomp/
* target.c (resolve_device): Align a
'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others.
* testsuite/libgomp.c/target-51.c: Adjust.
OMP_TARGET_OFFLOAD=mandatory handling was before inconsistent. Hence, in
OpenMP 5.2 it was clarified/extended by having implications on the
default-device-var; additionally, omp_initial_device and omp_invalid_device
enum values/PARAMETERs were added; support for it was added
in r13-1066-g1158fe43407568 including aborting for omp_invalid_device and
non-conforming device numbers. Only the mandatory handling was missing.
Namely, while the default-device-var is usually initialized to value 0,
with 'mandatory' it must have the value 'omp_invalid_device' if and only if
zero non-host devices are available. (The OMP_DEFAULT_DEVICE env var
overrides this as it comes semantically after the initialization.)
To achieve this, default-device-var is now initialized to MIN_INT. If
there is no 'mandatory', it is set to 0 directly after env var parsing.
Otherwise, it is updated in gomp_target_init to either 0 or
omp_invalid_device. To ensure INT_MIN is never seen by the user, both
the omp_get_default_device API routine and omp_display_env (user call
and OMP_DISPLAY_ENV env var) call gomp_init_targets_once() in that case.
libgomp/ChangeLog:
* env.c (gomp_default_icv_values): Init default_device_var to
an nonconforming value - INT_MIN.
(initialize_env): After env-var parsing, set default_device_var to
device 0 unless OMP_TARGET_OFFLOAD=mandatory.
(omp_display_env): If default_device_var is INT_MIN, call
gomp_init_targets_once.
* icv-device.c (omp_get_default_device): Likewise.
* libgomp.texi (OMP_DEFAULT_DEVICE): Update init description.
(OpenMP 5.2 Impl. Status): Mark OMP_TARGET_OFFLOAD=mandatory as 'Y'.
* target.c (resolve_device): Improve error message device-num < 0
with 'mandatory' and no no-host devices available.
(gomp_target_init): Set default-device-var if INT_MIN.
* testsuite/libgomp.c/target-48.c: New test.
* testsuite/libgomp.c/target-49.c: New test.
* testsuite/libgomp.c/target-50.c: New test.
* testsuite/libgomp.c/target-50a.c: New test.
* testsuite/libgomp.c/target-51.c: New test.
* testsuite/libgomp.c/target-52.c: New test.
* testsuite/libgomp.c/target-53.c: New test.
* testsuite/libgomp.c/target-54.c: New test.
Reduce number of enum values passed to libgomp as
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as
GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore);
that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also
abort if not present but copy data when present. This is is a follow-up to
the commit r14-1579-g4ede915d5dde93 done 6 days ago.
Additionally, the commit improves a libgomp run-time and a C/C++ compile-time
error wording and extends testcases a tiny bit.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_map): Reword error message for
clearness especially with 'omp target (enter/exit) data.'
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_map): Reword error message for
clearness especially with 'omp target (enter/exit) data.'
* semantics.cc (handle_omp_array_sections): Handle
GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values.
gcc/ChangeLog:
* gimplify.cc (gimplify_adjust_omp_clauses_1): Use
GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping.
(gimplify_adjust_omp_clauses): Change
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent
GOMP_MAP_FORCE_PRESENT.
* omp-low.cc (lower_omp_target): Remove handling of no-longer valid
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for
to/from clauses with present modifier.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind): Change the enum values
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only.
(GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT.
libgomp/ChangeLog:
* target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace
GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ACLLOC} by GOMP_MAP_FORCE_PRESENT.
(gomp_map_vars_internal, gomp_update): Likewise; unify and improve
error message.
* testsuite/libgomp.c-c++-common/target-present-2.c: Update for
changed error message.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and
extend testcase to check that data is copied when needed.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump.
* c-c++-common/gomp/map-9.c: Likewise.
* gfortran.dg/gomp/defaultmap-8.f90: Likewise.
* gfortran.dg/gomp/map-11.f90: Likewise.
* gfortran.dg/gomp/target-update-1.f90: Likewise.
* gfortran.dg/gomp/map-12.f90: Likewise; also check original dump.
* c-c++-common/gomp/map-6.c: Update dg-error and also check
clause error with 'target (enter/exit) data'.
This implements support for the OpenMP 5.1 'present' modifier, which can be
used in map clauses in the 'target', 'target data', 'target data enter' and
'target data exit' constructs, and in the 'to' and 'from' clauses of the
'target update' construct. It is also supported in defaultmap.
The modifier triggers a fatal runtime error if the data specified by the
clause is not already present on the target device. It can also be combined
with 'always' in map clauses.
2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/c/
* c-parser.cc (c_parser_omp_clause_defaultmap,
c_parser_omp_clause_map): Parse 'present'.
(c_parser_omp_clause_to, c_parser_omp_clause_from): Remove.
(c_parser_omp_clause_from_to): New; parse to/from clauses with
optional present modifer.
(c_parser_omp_all_clauses): Update call.
(c_parser_omp_target_data, c_parser_omp_target_enter_data,
c_parser_omp_target_exit_data): Handle new map enum values
for 'present' mapping.
gcc/cp/
* parser.cc (cp_parser_omp_clause_defaultmap,
cp_parser_omp_clause_map): Parse 'present'.
(cp_parser_omp_clause_from_to): New; parse to/from
clauses with optional 'present' modifier.
(cp_parser_omp_all_clauses): Update call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data,
cp_parser_omp_target_exit_data): Handle new enum value for
'present' mapping.
* semantics.cc (finish_omp_target): Likewise.
gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Display 'present' map
modifier.
(show_omp_clauses): Display 'present' motion modifier for 'to'
and 'from' clauses.
* gfortran.h (enum gfc_omp_map_op): Add entries with 'present'
modifiers.
(struct gfc_omp_namelist): Add 'present_modifer'.
* openmp.cc (gfc_match_motion_var_list): New, handles optional
'present' modifier for to/from clauses.
(gfc_match_omp_clauses): Call it for to/from clauses; parse 'present'
in defaultmap and map clauses.
(resolve_omp_clauses): Allow 'present' modifiers on 'target',
'target data', 'target enter' and 'target exit' directives.
* trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers
to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for
defaultmap.
gcc/
* gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag
and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag
set.
(omp_get_attachment): Handle map clauses with 'present' modifier.
(omp_group_base): Likewise.
(gimplify_scan_omp_clauses): Reorder present maps to come first.
Set GOVD flags for present defaultmaps.
(gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps.
* omp-low.cc (scan_sharing_clauses): Handle 'always, present' map
clauses.
(lower_omp_target): Handle map clauses with 'present' modifier.
Handle 'to' and 'from' clauses with 'present'.
* tree-core.h (enum omp_clause_defaultmap_kind): Add
OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind.
* tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and
'from' clauses with 'present' modifier. Handle present defaultmap.
* tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New.
(GOMP_MAP_FLAG_FORCE): Redefine.
(GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New.
(enum gomp_map_kind): Add map kinds with 'present' modifiers.
(GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for
map variants with 'present'
(GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true
for map variants with 'always, present' modifiers.
(GOMP_MAP_ALWAYS): Redefine.
(GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New.
libgomp/
* libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for
defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses.
* target.c (gomp_to_device_kind_p): Add map kinds with 'present'
modifier.
(gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro.
(gomp_map_vars_internal, gomp_update, gomp_target_rev):
Emit runtime error if memory region not present.
* testsuite/libgomp.c-c++-common/target-present-1.c: New test.
* testsuite/libgomp.c-c++-common/target-present-2.c: New test.
* testsuite/libgomp.c-c++-common/target-present-3.c: New test.
* testsuite/libgomp.fortran/target-present-1.f90: New test.
* testsuite/libgomp.fortran/target-present-2.f90: New test.
* testsuite/libgomp.fortran/target-present-3.f90: New test.
gcc/testsuite/
* c-c++-common/gomp/map-6.c: Update dg-error, extend to test for
duplicated 'present' and extend scan-dump tests for 'present'.
* gfortran.dg/gomp/defaultmap-1.f90: Update dg-error.
* gfortran.dg/gomp/map-7.f90: Extend parse and dump test for
'present'.
* gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present'
modifier checking.
* c-c++-common/gomp/defaultmap-4.c: New test.
* c-c++-common/gomp/map-9.c: New test.
* c-c++-common/gomp/target-update-1.c: New test.
* gfortran.dg/gomp/defaultmap-8.f90: New test.
* gfortran.dg/gomp/map-11.f90: New test.
* gfortran.dg/gomp/map-12.f90: New test.
* gfortran.dg/gomp/target-update-1.f90: New test.
... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.
Follow-up to commit 131d18e928
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c8
"libgomp: Handle OpenMP's reverse offloads".
libgomp/
* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
* libgomp.h (gomp_target_rev): Adjust.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
(rev_off_host_to_dev_cpy): Remove.
(GOMP_OFFLOAD_run): Adjust.
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.
This depends on commit 2b2340e236
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.
PR libgomp/90596
libgomp/
* target.c (gomp_map_vars_internal): Allow for
'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'.
* oacc-parallel.c (GOACC_parallel_keyed): Pass
'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'.
* plugin/plugin-gcn.c (alloc_by_agent, gcn_exec)
(GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec):
Adjust, simplify.
(gomp_offload_free): Remove.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify.
(cuda_free_argmem): Remove.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Adjust.
This does *allow*, but under no circumstances is this currently going to be
used: all potentially applicable data is non-'ephemeral', and thus not
considered for 'gomp_coalesce_buf_add' for OpenACC 'async'. (But a use will
emerge later.)
Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe
"Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this
TODO comment:
TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question:
is it more performant to use libgomp CBUF buffering or individual device
asyncronous copying?)
Ephemeral data is small, and therefore individual device asyncronous copying
does seem dubious -- in particular given that for all those, we'd individually
have to allocate and queue for deallocation a temporary buffer to capture the
ephemeral data. Instead, just let the 'cbuf' *be* the temporary buffer.
libgomp/
* target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow
libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral'
data.
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then
simplify the device plugins accordingly.
This is a follow-up to
Subversion r279551 (Git commit a6163563f2)
"Add OpenACC 2.6's no_create",
Subversion r279622 (Git commit 5bcd470bf0)
"Use gomp_map_val for OpenACC host-to-device address translation".
libgomp/
* target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for
'GOMP_MAP_IF_PRESENT'.
* plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Likewise.
* testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async'
testing.
* testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
libgomp/
* target.c (gomp_target_rev): Dereference ptr
to get device address.
* testsuite/libgomp.fortran/reverse-offload-5.f90: Add test
for unallocated allocatable.
As GOMP_MAP_ALWAYS_POINTER operates on the previous map item, ensure that
with 'target enter data' both are passed together to gomp_map_vars_internal.
libgomp/ChangeLog:
* target.c (gomp_map_vars_internal): Add 'i > 0' before doing a
kind check.
(GOMP_target_enter_exit_data): If the next map item is
GOMP_MAP_ALWAYS_POINTER map it together with the current item.
* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
If there is nothing to map, skip the mapping and avoid attempting to
copy 0 bytes from addrs, sizes and kinds.
Additionally, it could happen that a non-allocated address was deallocated,
such as a pointer set, leading to a free for the actual data.
libgomp/
* target.c (gomp_target_rev): Handle mapnum == 0 and avoid
freeing not allocated memory.
* testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.
The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.
For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
libgomp/ChangeLog:
* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.
Additionally, a limitation of the number of teams on gcn offload devices is
implemented. The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit). This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory. Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).
gcc/ChangeLog:
* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.
include/ChangeLog:
* cuda/cuda.h (enum CUdevice_attribute): Add
CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
(CU_MEMHOSTALLOC_DEVICEMAP): Define.
(cuMemHostAlloc): Add prototype.
libgomp/ChangeLog:
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
'static' for this variable.
* config/nvptx/libgomp-nvptx.h: New file.
* config/nvptx/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare extern var.
(GOMP_REV_OFFLOAD_VAR): Declare var.
(GOMP_target_ext): Handle reverse offload.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
* target.c (gomp_target_rev): ... this new stub function.
* libgomp.h (gomp_target_rev): Declare.
* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
* plugin/cuda-lib.def (cuMemHostAlloc): Add.
* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
(struct ptx_device): Add rev_data member.
(nvptx_open_device): Remove async_engines query, last used in
r10-304-g1f4c5b9b; add unified-address assert check.
(GOMP_OFFLOAD_get_num_devices): Claim unified address
support.
(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
offload functions exist. Make offload var available
on host and device.
(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
(GOMP_OFFLOAD_run): Handle reverse offload.
When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes). In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s). The code would destroy/free the previous
task and replace it with the new implicit task. If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.
The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
'uint64_t **rev_fn_table' argument.
* oacc-host.c (host_load_image): Likewise.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
currently unused.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
* target.c (gomp_load_image_to_device): Update call but pass
NULL for now.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
Add (unused) uint64_t **rev_fn_table argument.
This patch considers the environment variable syntax extension for
device-specific variants of environment variables from OpenMP 5.1 (see
OpenMP 5.1 specification, p. 75 and p. 639). An environment variable (e.g.
OMP_NUM_TEAMS) can have different suffixes:
_DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host.
_DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with
number <device>.
no suffix (e.g. OMP_NUM_TEAMS): affects only the host.
In future OpenMP versions also suffix _ALL will be introduced (see discussion
https://github.com/OpenMP/spec/issues/3179). This is also considered in this
patch:
_ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host.
The precedence is as follows (descending). For the host:
1. no suffix
2. _ALL
For devices:
1. _DEV_<device>
2. _DEV
3. _ALL
That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if
available, and at last _ALL. If there is no value for any of the variable
variants, default values are used as already implemented before.
This patch concerns parsing (a), storing (b), output (c) and transmission to the
device (d):
(a) The actual number of devices and the numbering are not known when parsing
the environment variables. Thus all environment variables are iterated and
searched for device-specific ones.
(b) Only configured device-specific variables are stored. Thus, a linked list
is used.
(c) The output is done in omp_display_env (see specification p. 468f). Global
ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179.
ICVs which are not global but aren't handled device-specific yet are tagged
with [host]. omp_display_env outputs the initial values of the ICVs. That is
why a dedicated data structure is introduced for the inital values only
(gomp_initial_icv_list).
(d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_default_device): Return device-
specific ICV.
(omp_get_max_teams): Added for GCN devices.
(omp_set_num_teams): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_default_device): Return device-
specific ICV.
(omp_get_max_teams): Added for NVPTX devices.
(omp_set_num_teams): Likewise.
(ialias): Likewise.
* env.c (struct gomp_icv_list): New struct to store entries of initial
ICV values.
(struct gomp_offload_icv_list): New struct to store entries of device-
specific ICV values that are copied to the device and back.
(struct gomp_default_icv_values): New struct to store default values of
ICVs according to the OpenMP standard.
(parse_schedule): Generalized for different variants of OMP_SCHEDULE.
(print_env_var_error): Function that prints an error for invalid values
for ICVs.
(parse_unsigned_long_1): Removed getenv. Generalized.
(parse_unsigned_long): Likewise.
(parse_int_1): Likewise.
(parse_int): Likewise.
(parse_int_secure): Likewise.
(parse_unsigned_long_list): Likewise.
(parse_target_offload): Likewise.
(parse_bind_var): Likewise.
(parse_stacksize): Likewise.
(parse_boolean): Likewise.
(parse_wait_policy): Likewise.
(parse_allocator): Likewise.
(omp_display_env): Extended to output different variants of environment
variables.
(print_schedule): New helper function for omp_display_env which prints
the values of run_sched_var.
(print_proc_bind): New helper function for omp_display_env which prints
the values of proc_bind_var.
(enum gomp_parse_type): Collection of types used for parsing environment
variables.
(ENTRY): Preprocess string lengths of environment variables.
(OMP_VAR_CNT): Preprocess table size.
(OMP_HOST_VAR_CNT): Likewise.
(INT_MAX_STR_LEN): Constant for the maximal number of digits of a device
number.
(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
(gomp_set_icv_flag): Sets a flag for a particular ICV.
(print_device_specific_icvs): New helper function for omp_display_env to
print device specific ICV values.
(get_device_num): New helper function for parse_device_specific.
Extracts the device number from an environment variable name.
(get_icv_member_addr): Gets the memory address for a particular member
of an ICV struct.
(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
(initialize_icvs): New function to initialize a gomp_initial_icvs
struct.
(add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list.
(startswith): Checks if a string starts with a given prefix.
(initialize_env): Extended to parse the new syntax of environment
variables.
* icv-device.c (omp_get_max_teams): Added.
(ialias): Likewise.
(omp_set_num_teams): Likewise.
* icv.c (omp_set_num_teams): Moved to icv-device.c.
(omp_get_max_teams): Likewise.
(ialias): Likewise.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed.
(GOMP_ADDITIONAL_ICVS): New target-side struct that
holds the designated ICVs of the target device.
* libgomp.h (enum gomp_icvs): Collection of ICVs.
(enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and
no suffix.
(enum gomp_env_suffix): Collection of possible suffixes of environment
variables.
(struct gomp_initial_icvs): Contains all ICVs for which we need to store
initial values.
(struct gomp_default_icv):New struct to hold ICVs for which we need
to store initial values.
(struct gomp_icv_list): Definition of a linked list that is used for
storing ICVs for the devices and also for _DEV, _ALL, and without
suffix.
(struct gomp_offload_icvs): New struct to hold ICVs that are copied to
a device.
(struct gomp_offload_icv_list): Definition of a linked list that holds
device-specific ICVs that are copied to devices.
(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
* libgomp.texi: Updated.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read
further ICVs from the offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
* target.c (gomp_get_offload_icv_item): Get a list item of
gomp_offload_icv_list.
(get_gomp_offload_icvs): New. Returns the ICV values
depending on the device num and the variable hierarchy.
(gomp_load_image_to_device): Extended to copy further ICVs to a device.
* testsuite/libgomp.c-c++-common/icv-5.c: New test.
* testsuite/libgomp.c-c++-common/icv-6.c: New test.
* testsuite/libgomp.c-c++-common/icv-7.c: New test.
* testsuite/libgomp.c-c++-common/icv-8.c: New test.
* testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test.
* testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
Contrary to gomp_{error,warning,fatal}, no tailing '\n' is added with
gomp_debug; only affected was a 'requires'-related output.
libgomp/ChangeLog:
* target.c (gomp_target_init): Added tailing '\n' to gomp_debug.
Similar to how the other 'mkoffload's got changed in
recent commit 683f118439
"OpenMP: Move omp requires checks to libgomp".
This also means finally switching Intel MIC 'mkoffload' to
'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver',
making 'GOMP_offload_register', 'GOMP_offload_unregister'
legacy entry points.
gcc/
* config/i386/intelmic-mkoffload.cc (generate_host_descr_file)
(prepare_target_image, main): Handle OpenMP 'requires'.
(generate_host_descr_file): Switch to 'GOMP_offload_register_ver',
'GOMP_offload_unregister_ver'.
libgomp/
* target.c (GOMP_offload_register, GOMP_offload_unregister):
Denote as legacy entry points.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_any): New proc.
* testsuite/libgomp.c-c++-common/requires-1.c: Enable for
'offload_target_any'.
* testsuite/libgomp.c-c++-common/requires-3.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
* testsuite/libgomp.fortran/requires-1.f90: Likewise.
Handle reverse_offload, unified_address, and unified_shared_memory
requirements in libgomp by saving them alongside the offload table.
When the device lto1 runs, it extracts the data for mkoffload. The
latter than passes the value on to GOMP_offload_register_ver.
lto1 (either the host one, with -flto [+ ENABLE_OFFLOADING], or in the
offload-device lto1) also does the the consistency check is done,
erroring out when the 'omp requires' clause use is inconsistent.
For all in-principle supported devices, if a requirement cannot be fulfilled,
the device is excluded from the (supported) devices list. Currently, none of
those requirements are marked as supported for any of the non-host devices.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Set
OMP_REQUIRES_TARGET_USED.
(c_parser_omp_requires): Remove sorry.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (process_asm): Write '#include <stdint.h>'.
(process_obj): Pass omp_requires_mask to GOMP_offload_register_ver.
(main): Ask lto1 to obtain omp_requires_mask and pass it on.
* config/nvptx/mkoffload.cc (process, main): Likewise.
* lto-cgraph.cc (omp_requires_to_name): New.
(input_offload_tables): Save omp_requires_mask.
(output_offload_tables): Read it, check for consistency,
save value for mkoffload.
* omp-low.cc (lower_omp_target): Force output_offloadtables
call for OMP_REQUIRES_TARGET_USED.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_target_data,
cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
cp_parser_omp_target_update): Set OMP_REQUIRES_TARGET_USED.
(cp_parser_omp_requires): Remove sorry.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_requires): Remove sorry.
* parse.cc (decode_omp_directive): Don't regard 'declare target'
as target usage for 'omp requires'; add more flags to
omp_requires_mask.
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION): Bump to 2.
(GOMP_REQUIRES_UNIFIED_ADDRESS, GOMP_REQUIRES_UNIFIED_SHARED_MEMORY,
GOMP_REQUIRES_REVERSE_OFFLOAD, GOMP_REQUIRES_TARGET_USED):
New defines.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add
omp_requires_mask arg.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise;
return -1 when device available but omp_requires_mask != 0.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
* oacc-host.c (host_get_num_devices, host_openacc_get_property):
Update call.
* oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1,
goacc_attach_host_thread_to_device, acc_get_num_devices,
acc_set_device_num, get_property_any): Likewise.
* target.c (omp_requires_mask): New global var.
(gomp_requires_to_name): New.
(GOMP_offload_register_ver): Handle passed omp_requires_mask.
(gomp_target_init): Handle omp_requires_mask.
* libgomp.texi (OpenMP 5.0): Update requires impl. status.
(OpenMP 5.1): Add a missed item.
(OpenMP 5.2): Mark linear-clause change as supported in C/C++.
* testsuite/libgomp.c-c++-common/requires-1-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-1.c: New test.
* testsuite/libgomp.c-c++-common/requires-2-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-2.c: New test.
* testsuite/libgomp.c-c++-common/requires-3-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-3.c: New test.
* testsuite/libgomp.c-c++-common/requires-4-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-4.c: New test.
* testsuite/libgomp.c-c++-common/requires-5-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-5.c: New test.
* testsuite/libgomp.c-c++-common/requires-6.c: New test.
* testsuite/libgomp.c-c++-common/requires-7-aux.c: New test.
* testsuite/libgomp.c-c++-common/requires-7.c: New test.
* testsuite/libgomp.fortran/requires-1-aux.f90: New test.
* testsuite/libgomp.fortran/requires-1.f90: New test.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices):
Return -1 when device available but omp_requires_mask != 0.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/requires-4.c: Update dg-*.
* c-c++-common/gomp/reverse-offload-1.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-2.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-5.c: Likewise.
* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move
post-FE checks to ...
* gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file.
* gfortran.dg/gomp/requires-8.f90: Update as we don't regard
'declare target' for the 'requires' usage requirement.
Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 makes also -1 valid (calls it omp_initial_device), which is equivalent
in behavior to omp_get_num_devices() number but has the advantage that it
is a constant. And it also introduces omp_invalid_device which is
also a constant with implementation defined value < -1. That value should
act like sNaN, any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is all but [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.
For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
#pragma omp target if (cond)
where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it, advantage of -1 is that one can say iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one.
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.
So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.
This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory , the spec says that the inital ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).
2022-06-13 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-expand.cc (expand_omp_target): Remap user provided
device clause arguments, -1 to -2 and -2 to -3, either
at compile time if constant, or at runtime.
include/
* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
parameters.
* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
* target.c (resolve_device): Add remapped argument, handle
GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
if remapped, otherwise treat omp_initial_device that way. For
omp_invalid_device, always emit gomp_fatal, even when
OMP_TARGET_OFFLOAD isn't mandatory.
(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
GOMP_target_update, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Pass true as remapped argument to
resolve_device.
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy_check, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_get_mapped_ptr,
omp_target_is_accessible): Pass false as remapped argument to
resolve_device. Treat omp_initial_device the same as
gomp_get_num_devices (). Don't bypass resolve_device calls if
device_num is negative.
(omp_pause_resource): Treat omp_initial_device the same as
gomp_get_num_devices (). Call resolve_device.
* icv-device.c (omp_set_default_device): Always set to device_num
even when it is negative.
* libgomp.texi: Document that Conforming device numbers,
omp_initial_device and omp_invalid_device is implemented.
* testsuite/libgomp.c/target-41.c (main): Add test with
omp_initial_device.
* testsuite/libgomp.c/target-45.c: New test.
* testsuite/libgomp.c/target-46.c: New test.
* testsuite/libgomp.c/target-47.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
test with omp_initial_device. Use -5 instead of -1 for negative value
test.
* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
Likewise. Reorder stop numbers.
For allocatable/pointer arrays, a firstprivate to a device
not only needs to privatize the descriptor but also the actual
data. This is implemented as:
firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
where the address of x in device memory is saved in hostaddrs[i]
by libgomp and the middle end actually passes hostaddrs[i]' to
attach.
As side effect, has_device_addr(array_desc) had to be changed:
before, it was converted to firstprivate in the front end; now
it is handled in omp-low.cc as has_device_addr requires a shallow
firstprivate (not touching the data pointer) while the normal
firstprivate requires (now) a deep firstprivate.
gcc/fortran/ChangeLog:
PR fortran/104949
* f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine.
* trans-openmp.cc (gfc_omp_array_size): New.
(gfc_trans_omp_variable_list): Never turn has_device_addr
to firstprivate.
* trans.h (gfc_omp_array_size): New.
gcc/ChangeLog:
PR fortran/104949
* langhooks-def.h (lhd_omp_array_size): New.
(LANG_HOOKS_OMP_ARRAY_SIZE): Define.
(LANG_HOOKS_DECLS): Add it.
* langhooks.cc (lhd_omp_array_size): New.
* langhooks.h (struct lang_hooks_for_decls): Add hook.
* omp-low.cc (scan_sharing_clauses, lower_omp_target):
Handle GOMP_MAP_FIRSTPRIVATE for array descriptors.
libgomp/ChangeLog:
PR fortran/104949
* target.c (gomp_map_vars_internal, copy_firstprivate_data):
Support attach for GOMP_MAP_FIRSTPRIVATE.
* testsuite/libgomp.fortran/target-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.
This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.
In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:
int depobj_count
omp_depend_t *depobj_list
integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)
The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a new created task. The
sequential part (a) takes into account the requirements for the return values:
"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)
"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)
Due to asynchronicity an error is thrown if the asynchronous memcpy is not
successful (in contrast to the synchronous functions which use a return
value unequal to zero).
gcc/ChangeLog:
* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.
libgomp/ChangeLog:
* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):
has_device_addr(list)
"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)
"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)
"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
clause.
(c_parser_omp_variable_list): Handle array sections.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
OMP_CLAUSE_MASK.
* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
(c_finish_omp_clauses): Handle array sections.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_var_list_no_open): Handle array sections.
(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
(finish_omp_clauses): Handle array sections.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
case.
* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
(resolve_omp_clauses): Same.
* trans-openmp.cc (gfc_trans_omp_variable_list): Added
OMP_LIST_HAS_DEVICE_ADDR case.
(gfc_trans_omp_clauses): Firstprivatize of array descriptors.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
OMP_CLAUSE_HAS_DEVICE_ADDR
and handle array sections.
(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
(lower_omp_target): Same.
* tree-core.h (enum omp_clause_code): Same.
* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
(convert_local_omp_clauses): Same.
* tree-pretty-print.cc (dump_omp_clause): Same.
* tree.cc: Same.
libgomp/ChangeLog:
* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
* target.c (copy_firstprivate_data): Copy only if host address is not
NULL.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-has-device-addr-2.c: New test.
* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
This patch implements three pieces of functionality:
(1) Adjust array section mapping to have standards conforming behavior,
mapping array sections should *NOT* also map the base-pointer:
struct S { int *ptr; ... };
struct S s;
Instead of generating this during gimplify:
map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
Now, adjust to:
(i.e. do not map the base-pointer together. The attach operation is still
generated, and if s.ptr is already mapped prior, attachment will happen)
The correct way of achieving the base-pointer-also-mapped behavior would be to
use:
(A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section
is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for
array types, which appears incorrect and causes a regression in
libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90)
(2) Related to the first item above, are fixes in libgomp/target.c to not
overwrite attached pointers when handling device<->host copies, mainly for the
"always" case.
(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. These changes are enabled for both
OpenACC and OpenMP.
gcc/c/ChangeLog:
* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/cp/ChangeLog:
* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.
gcc/ChangeLog:
* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accomodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting
phase where we make sure pointers with an attach/detach map are ordered
correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating
GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/target-enter-data-1.c: New testcase.
* c-c++-common/gomp/target-implicit-map-2.c: New testcase.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Make sure attached pointer is
not overwritten during cross-host/device copying.
(gomp_update): Likewise.
(gomp_exit_data): Likewise.
* testsuite/libgomp.c++/target-11.C: Adjust testcase.
* testsuite/libgomp.c++/target-12.C: Likewise.
* testsuite/libgomp.c++/target-15.C: Likewise.
* testsuite/libgomp.c++/target-16.C: Likewise.
* testsuite/libgomp.c++/target-17.C: Likewise.
* testsuite/libgomp.c++/target-21.C: Likewise.
* testsuite/libgomp.c++/target-23.C: Likewise.
* testsuite/libgomp.c/target-23.c: Likewise.
* testsuite/libgomp.c/target-29.c: Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
This patch implements several C++ specific mapping capabilities introduced for
OpenMP 5.0, including implicit mapping of this[:1] for non-static member
functions, zero-length array section mapping of pointer-typed members,
lambda captured variable access in target regions, and use of lambda objects
inside target regions.
Several adjustments to the C/C++ front-ends to allow more member-access syntax
as valid is also included.
PR middle-end/92120
gcc/cp/ChangeLog:
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new
function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL. Remove case of rejecting
'this' when not in declare simd.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses. Handle 'A->member' case in map clauses. Remove case of
rejecting 'this' when not in declare simd.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.
gcc/ChangeLog:
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map<tree_operand, tree> *. Adjust struct map handling to handle
cases of *A and A->B expressions. Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/target-3.c: New testcase.
* g++.dg/gomp/target-3.C: New testcase.
* g++.dg/gomp/target-lambda-1.C: New testcase.
* g++.dg/gomp/target-lambda-2.C: New testcase.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.
* g++.dg/gomp/target-this-5.C: New testcase.
* g++.dg/gomp/this-2.C: Adjust testcase.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.
libgomp/ChangeLog:
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
* testsuite/libgomp.c++/target-23.C: New testcase.
* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New testcase.
* testsuite/libgomp.c++/target-this-4.C: New testcase.
* testsuite/libgomp.c++/target-this-5.C: New testcase.
OpenMP 5.1 says that thread_limit clause can also appear on target,
and similarly to teams should affect the thread-limit-var ICV.
On combined target teams, the clause goes to both.
We actually passed thread_limit internally on target already before,
but only used it for gcn/ptx offloading to hint how many threads should be
created and for ptx didn't set thread_limit_var in that case.
Similarly for host fallback.
Also, I found that we weren't copying the args array that contains encoded
thread_limit and num_teams clause for target (etc.) for async target.
2021-11-15 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT
to OMP_TARGET_CLAUSES if it isn't there already.
gcc/c-family/
* c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>:
Duplicate to both OMP_TARGET and OMP_TEAMS.
gcc/c/
* c-parser.c (OMP_TARGET_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
gcc/cp/
* parser.c (OMP_TARGET_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
libgomp/
* task.c (gomp_create_target_task): Copy args array as well.
* target.c (gomp_target_fallback): Add args argument.
Set gomp_icv (true)->thread_limit_var if thread_limit is present.
(GOMP_target): Adjust gomp_target_fallback caller.
(GOMP_target_ext): Likewise.
(gomp_target_task_fn): Likewise.
* config/nvptx/team.c (gomp_nvptx_main): Set
gomp_global_icv.thread_limit_var.
* testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):
"If a single contiguous part of the original storage of a list item with an
implicit data-mapping attribute has corresponding storage in the device data
environment prior to a task encountering the construct that is associated with
the map clause, only that part of the original storage will have corresponding
storage in the device data environment as a result of the map clause."
2021-11-12 Chung-Lin Tang <cltang@codesourcery.com>
include/ChangeLog:
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
(GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.
gcc/ChangeLog:
* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
'implicit' bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
implicit map handling to allow a "superset" existing map as valid case.
(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
(get_implicit): New function to extract implicit status.
(gomp_map_fields_existing): Adjust arguments in calls to
gomp_map_vars_existing, and add uses of get_implicit.
(gomp_map_vars_internal): Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
The following patch implements what I've been talking about earlier,
honor that for explicit num_teams clause we create at least the
lower-bound (if not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.
2021-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
(BUILT_IN_GOMP_TEAMS4): New.
* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
GOMP_teams, pass to it also num_teams lower-bound expression
or a dup of upper-bound if it is missing and a flag whether
it is the first call or not.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
* libgomp_g.h (GOMP_teams4): Declare.
* libgomp.map (GOMP_5.1): Export GOMP_teams4.
* target.c (GOMP_teams4): New function.
* config/nvptx/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* config/gcn/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
teams instead of <= 2.
* testsuite/libgomp.c-c++-common/teams-2.c: New test.
This patch releases the device lock on a sanity-checking error path in
transfer combining (cbuf) handling in libgomp:target.c. This shouldn't
happen when handling well-formed mapping clauses, but erroneous clauses
can currently cause a hang if the condition triggers.
2021-12-10 Julian Brown <julian@codesourcery.com>
libgomp/
* target.c (gomp_copy_host2dev): Release device lock on cbuf
error path.
This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.
GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number"
variable, is defined on the device-side libgomp, has it's address returned to
host-side libgomp during device initialization, and the host libgomp then
sets its value to the designated device number.
libgomp/ChangeLog:
* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.
* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_intelmic): New function for
testing for intelmic offloading.
* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.
The host data might not be computed yet (by an earlier asynchronous compute
region, for example.
libgomp/
* target.c (gomp_coalesce_buf_add): Update comment.
(gomp_copy_host2dev, gomp_map_vars_internal): Don't expect to see
'aq && cbuf'.
(gomp_map_vars_internal): Only 'if (!aq)', do
'gomp_coalesce_buf_add'.
* testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Remove
XFAIL.
Co-Authored-By: Julian Brown <julian@codesourcery.com>
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.
An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html
and previous versions of this patch were posted here (for mainline/og9):
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.htmlhttps://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html
libgomp/
* libgomp.h (gomp_copy_host2dev): Update prototype.
* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
argument to gomp_copy_host2dev (false).
* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
(copy_data): Don't free src.
(queue_push_copy): Remove free_src handling.
(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
snapshotting.
(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
queue_push_copy.
* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
(gomp_copy_host2dev): Add EPHEMERAL parameter. Snapshot source
data when true, and set up deferred freeing of temporary buffer.
(gomp_copy_dev2host): Update call to goacc_device_copy_async.
(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
calls to gomp_copy_host2dev with appropriate ephemeral argument.
* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
XFAIL.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>