Commit graph

105 commits

Author SHA1 Message Date
Chung-Lin Tang
a7578a077e OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction with reference counters
This patch adjusts the implementation of acc_map_data/acc_unmap_data API library
routines to more fit the description in the OpenACC 2.7 specification.

Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings. Adjustment around
mapping related code to respect OpenACC semantics are also added.

libgomp/ChangeLog:

	* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
	* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
	initialize dynamic_refcount as 1.
	(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
	(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
	(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
	REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
	dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
	(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
	* target.c (gomp_increment_refcount): Return early for
	REFCOUNT_ACC_MAP_DATA case.
	(gomp_decrement_refcount): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase.
	* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
	testcase error output scan test.
2024-04-16 09:04:11 +00:00
Jakub Jelinek
a945c346f5 Update copyright years. 2024-01-03 12:19:35 +01:00
Tobias Burnus
d4b6d14792 OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables
This commit adds -fopenmp-allocators which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown,
unless the -fopenmp-allocators flag is present.

With -fopenmp -fopenmp-allocators, those constructs get turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures deallocation and reallocation (via intrinsic assignments) are
properly directed to GOMP_free/omp_realloc - while normal Fortran
allocations are processed by free/realloc.

In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
version field of the Fortran array discriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
true it was previously added but also removes it from the list.

Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of
omp-builtins.def and libgomp gains the mentioned two new function.

gcc/ChangeLog:

	* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
	* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
	* builtins.cc (builtin_fnspec): Handle it.
	* gimple-ssa-warn-access.cc (fndecl_alloc_p,
	matching_alloc_calls_p): Likewise.
	* gimple.cc (nonfreeing_call_p): Likewise.
	* predict.cc (expr_expected_value_1): Likewise.
	* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
	* tree.cc (fndecl_dealloc_argno): Likewise.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
	and EXEC_OMP_ALLOCATORS.
	* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
	Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
	(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
	* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
	* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
	(-fopenmp-simd): New, based on C version.
	(-fopenmp-allocators): New.
	* lang.opt (fopenmp-allocators): Add.
	* openmp.cc (resolve_omp_clauses): For allocators/allocate directive,
	add target and no dynamic_allocators diagnostic and more invalid
	diagnostic.
	* parse.cc (decode_omp_directive): Set contains_teams_construct.
	* trans-array.h (gfc_array_allocate): Update prototype.
	(gfc_conv_descriptor_version): New prototype.
	* trans-decl.cc (gfc_init_default_dt): Fix comment.
	* trans-array.cc (gfc_conv_descriptor_version): New.
	(gfc_array_allocate): Support GOMP_alloc allocation.
	(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
	Handle GOMP_free/omp_realloc as needed.
	* trans-expr.cc (gfc_conv_procedure_call): Likewise.
	(alloc_scalar_allocatable_for_assignment): Likewise.
	* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
	* trans-openmp.cc (gfc_trans_omp_allocators,
	gfc_trans_omp_directive): Handle allocators/allocate directive.
	(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
	* trans-stmt.h (gfc_trans_allocate): Update prototype.
	* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
	* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
	* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
	Update to handle GOMP_alloc.
	(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
	Handle GOMP_free.
	(trans_code): Update call.
	* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
	Update prototype.
	(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
	* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.

libgomp/ChangeLog:

	* allocator.c (struct fort_alloc_splay_tree_key_s,
	fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
	* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
	* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
	GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
	(GOMP_5.1.1): ... here.
	* libgomp.texi (Impl. Status, Memory management): Update for
	allocators/allocate directives.
	* splay-tree.c: Handle splay_tree_static define to declare all
	functions as static.
	(splay_tree_lookup_node): New.
	* splay-tree.h: Handle splay_tree_decl_only define.
	(splay_tree_lookup_node): New prototype.
	* target.c: Define splay_tree_static for 'reverse'.
	* testsuite/libgomp.fortran/allocators-1.f90: New test.
	* testsuite/libgomp.fortran/allocators-2.f90: New test.
	* testsuite/libgomp.fortran/allocators-3.f90: New test.
	* testsuite/libgomp.fortran/allocators-4.f90: New test.
	* testsuite/libgomp.fortran/allocators-5.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-14.f90: Add coarray and
	not-listed tests.
	* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
	* gfortran.dg/bind_c_array_params_2.f90: Update expected
	dump for dtype '.version=0'.
	* gfortran.dg/gomp/allocate-16.f90: New test.
	* gfortran.dg/gomp/allocators-3.f90: New test.
	* gfortran.dg/gomp/allocators-4.f90: New test.
2023-12-08 15:18:25 +01:00
Andrew Stubbs
e7d6c277fa amdgcn, libgomp: low-latency allocator
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).

Since addresses can now refer to LDS space, the "Global" address space is
no-longer compatible.  This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories).  A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
	* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
	addressing.
	(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
	(TEAM_ARENA_FREE): Likewise.
	(TEAM_ARENA_END): Likewise.
	(GCN_LOWLAT_HEAP): New.
	* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
	(__gcn_lowlat_init): New prototype.
	(gomp_gcn_enter_kernel): Initialize the low-latency heap.
	* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
	(TEAM_ARENA_FREE): Likewise.
	(TEAM_ARENA_END): Likewise.
	* plugin/plugin-gcn.c (lowlat_size): New variable.
	(print_kernel_dispatch): Label the group_segment_size purpose.
	(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
	(create_kernel_dispatch): Pass low-latency head allocation to kernel.
	(run_kernel): Use shadow; don't assume values.
	* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
	* config/gcn/allocator.c: New file.
	* libgomp.texi: Document low-latency implementation details.
2023-12-06 16:48:57 +00:00
Kwok Cheung Yeung
a49c7d3193 openmp: Add support for the 'indirect' clause in C/C++
This adds support for the 'indirect' clause in the 'declare target'
directive.  Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.

Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.

2023-11-07  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c-family/
	* c-attribs.cc (c_common_attribute_table): Add attribute for
	indirect functions.
	* c-pragma.h (enum parma_omp_clause): Add entry for indirect clause.

gcc/c/
	* c-decl.cc (c_decl_attributes): Add attribute for indirect
	functions.
	* c-lang.h (c_omp_declare_target_attr): Add indirect field.
	* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
	(c_parser_omp_clause_indirect): New.
	(c_parser_omp_all_clauses): Handle indirect clause.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(c_parser_omp_declare_target): Handle indirect clause.  Emit error
	message if device_type or indirect clauses used alone.  Emit error
	if indirect clause used with device_type that is not 'any'.
	(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(c_parser_omp_begin): Handle indirect clause.
	* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.

gcc/cp/
	* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
	* decl2.cc (cplus_decl_attributes): Add attribute for indirect
	functions.
	* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
	(cp_parser_omp_clause_indirect): New.
	(cp_parser_omp_all_clauses): Handle indirect clause.
	(handle_omp_declare_target_clause): Add extra parameter.  Add
	indirect attribute for indirect functions.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(cp_parser_omp_declare_target): Handle indirect clause.  Emit error
	message if device_type or indirect clauses used alone.  Emit error
	if indirect clause used with device_type that is not 'any'.
	(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(cp_parser_omp_begin): Handle indirect clause.
	* semantics.cc (finish_omp_clauses): Handle indirect clause.

gcc/
	* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
	functions.
	(output_offload_tables): Write indirect functions.
	(input_offload_tables): read indirect functions.
	* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
	* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
	* omp-offload.cc (offload_ind_funcs): New.
	(omp_discover_implicit_declare_target): Add functions marked with
	'omp declare target indirect' to indirect functions list.
	(omp_finish_file): Add indirect functions to section for offload
	indirect functions.
	(execute_omp_device_lower): Redirect indirect calls on target by
	passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
	(pass_omp_device_lower::gate): Run pass_omp_device_lower if
	indirect functions are present on an accelerator device.
	* omp-offload.h (offload_ind_funcs): New.
	* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
	* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
	* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
	section.  Count number of indirect functions.
	(process_obj): Emit number of indirect functions.
	* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
	(process): Emit offload_ind_func_table in PTX code.  Emit indirect
	function names and count in image.
	* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
	indirect functions in PTX code with IND_FUNC_MAP.

gcc/testsuite/
	* c-c++-common/gomp/declare-target-7.c: Update expected error message.
	* c-c++-common/gomp/declare-target-indirect-1.c: New.
	* c-c++-common/gomp/declare-target-indirect-2.c: New.
	* g++.dg/gomp/attrs-21.C (v12): Update expected error message.
	* g++.dg/gomp/declare-target-indirect-1.C: New.
	* gcc.dg/gomp/attrs-21.c (v12): Update expected error message.

include/
	* gomp-constants.h (GOMP_VERSION): Increment to 3.
	(GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New.

libgcc/
	* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
	(__offload_ind_func_table): New.
	(__offload_ind_funcs_end): New.
	(__OFFLOAD_TABLE__): Add entries for indirect functions.

libgomp/
	* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
	* Makefile.in: Regenerate.
	* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
	(GOMP_OFFLOAD_load_image): Add extra argument.
	* libgomp.h (struct indirect_splay_tree_key_s): New.
	(indirect_splay_tree_node, indirect_splay_tree,
	indirect_splay_tree_key): New.
	(indirect_splay_compare): New.
	* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
	* libgomp.texi (OpenMP 5.1): Update documentation on indirect
	calls in target region and on indirect clause.
	(Other new OpenMP 5.2 features): Add entry for virtual function calls.
	* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
	* oacc-host.c (host_load_image): Add extra argument.
	* target.c (gomp_load_image_to_device): If the GOMP_VERSION is high
	enough, read host indirect functions table and pass to
	load_image_func.
	* config/accel/target-indirect.c: New.
	* config/linux/target-indirect.c: New.
	* config/gcn/team.c (build_indirect_map): Add prototype.
	(gomp_gcn_enter_kernel): Initialize support for indirect
	function calls on GCN target.
	* config/nvptx/team.c (build_indirect_map): Add prototype.
	(gomp_nvptx_main): Initialize support for indirect function
	calls on NVPTX target.
	* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
	indirect functions count.
	(GOMP_OFFLOAD_load_image): Add extra argument.  If the GOMP_VERSION
	is high enough, build address translation table and copy it to target
	memory.
	* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
	functions count.
	(GOMP_OFFLOAD_load_image): Add extra argument.  If the GOMP_VERSION
	is high enough, Build address translation table and copy it to target
	memory.
	* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
	* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
	* testsuite/libgomp.c++/declare-target-indirect-1.C: New.
2023-11-07 15:44:50 +00:00
Tobias Burnus
25072a477a OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
When copying a 2D or 3D rectangular memmory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.

Additionally, it permits device-to-device copies, if neccessary using a
temporary variable on the host.

include/ChangeLog:

	* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
	CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
	(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
	CUDA_MEMCPY3D_PEER): New typdefs.
	(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
	cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
	cuMemcpy3DPeerAsync): New prototypes.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New prototypes.
	* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
	and memcpy3d_func.
	* libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used.
	* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
	* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
	cuMemcpy3D): Invoke via CUDA_ONE_CALL.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New.
	* target.c (omp_target_memcpy_rect_worker):
	(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
	Permit all device-to-device copyies; invoke new plugins for
	2D and 3D copying when available.
	(gomp_load_plugin_for_device): DLSYM the new plugin functions.
	* testsuite/libgomp.c/target-12.c: Fix dimension bug.
	* testsuite/libgomp.fortran/target-12.f90: Likewise.
	* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
2023-07-26 16:22:35 +02:00
Thomas Schwinge
130c2f3c3a libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation
... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.

Follow-up to commit 131d18e928
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c8
"libgomp: Handle OpenMP's reverse offloads".

	libgomp/
	* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
	'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
	* libgomp.h (gomp_target_rev): Adjust.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
	* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
	* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
	(rev_off_host_to_dev_cpy): Remove.
	(GOMP_OFFLOAD_run): Adjust.
2023-05-08 15:58:05 +02:00
Andrew Stubbs
f6fff8a6fc amdgcn, libgomp: Manually allocated stacks
Switch from using stacks in the "private segment" to using a memory block
allocated on the host side.  The primary reason is to permit the reverse
offload implementation to access values located on the device stack, but
there may also be performance benefits, especially with repeated kernel
invocations.

This implementation unifies the stacks with the "team arena" optimization
feature, and now allows both to have run-time configurable sizes.

A new ABI is needed, so all libraries must be rebuilt, and newlib must be
version 4.3.0.20230120 or newer.

gcc/ChangeLog:

	* config/gcn/gcn-run.cc: Include libgomp-gcn.h.
	(struct kernargs): Replace the common content with kernargs_abi.
	(struct heap): Delete.
	(main): Read GCN_STACK_SIZE envvar.
	Allocate space for the device stacks.
	Write the new kernargs fields.
	* config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt.
	(default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and
	PRIVATE_SEGMENT_WAVE_OFFSET_ARG.
	(gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content.
	(gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top.
	Set up the stacks from the values in the kernargs, not private.
	(gcn_expand_builtin_1): Match the stack configuration in the prologue.
	(gcn_hsa_declare_function_name): Turn off the private segment.
	(gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed.
	* config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register.
	* config/gcn/gcn.opt (mstack-size): Change the description.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION_GCN): Bump.

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define.
	(DEFAULT_TEAM_ARENA_SIZE): New define.
	(struct heap): Move to this file.
	(struct kernargs_abi): Likewise.
	* config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from
	the kernargs.
	* libgomp.h: Include libgomp-gcn.h.
	(TEAM_ARENA_SIZE): Remove.
	(team_malloc): Update the error message.
	* plugin/plugin-gcn.c (struct kernargs): Move common content to
	struct kernargs_abi.
	(struct agent_info): Rename team arenas to ephemeral memories.
	(struct team_arena_list): Rename ....
	(struct ephemeral_memories_list): to this.
	(struct heap): Delete.
	(team_arena_size): New variable.
	(stack_size): New variable.
	(print_kernel_dispatch): Update debug messages.
	(init_environment_variables): Read GCN_TEAM_ARENA_SIZE.
	Read GCN_STACK_SIZE.
	(get_team_arena): Rename ...
	(configure_ephemeral_memories): ... to this, and set up stacks.
	(release_team_arena): Rename ...
	(release_ephemeral_memories): ... to this.
	(destroy_team_arenas): Rename ...
	(destroy_ephemeral_memories): ... to this.
	(create_kernel_dispatch): Add num_threads parameter.
	Adjust for kernargs_abi refactor and ephemeral memories.
	(release_kernel_dispatch): Adjust for ephemeral memories.
	(run_kernel): Pass thread-count to create_kernel_dispatch.
	(GOMP_OFFLOAD_init_device): Adjust for ephemeral memories.
	(GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr47237.c: Xfail on amdgcn.
	* gcc.dg/builtin-apply3.c: Xfail for amdgcn.
	* gcc.dg/builtin-apply4.c: Xfail for amdgcn.
	* gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn.
	* gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
2023-02-02 11:47:03 +00:00
Jakub Jelinek
83ffe9cde7 Update copyright years. 2023-01-16 11:52:17 +01:00
Tobias Burnus
ea4b23d9c8 libgomp: Handle OpenMP's reverse offloads
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called.  And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.

libgomp/ChangeLog:

	* libgomp.h (struct target_mem_desc): Predeclare; move
	below after 'reverse_splay_tree_node' and add rev_array
	member.
	(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
	(reverse_splay_tree_node, reverse_splay_tree,
	reverse_splay_tree_key): New typedef.
	(struct gomp_device_descr): Add mem_map_rev member.
	* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
	support for GOMP_REQUIRES_REVERSE_OFFLOAD.
	* splay-tree.h (splay_tree_callback_stop): New typedef; like
	splay_tree_callback but returning int not void.
	(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
	taking splay_tree_callback_stop as argument.
	* splay-tree.c (splay_tree_foreach_internal_lazy,
	splay_tree_foreach_lazy): New; but early exit if callback returns
	nonzero.
	* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
	(gomp_map_lookup_rev): New.
	(gomp_load_image_to_device): Handle reverse-offload function
	lookup table.
	(gomp_unload_image_from_device): Free devicep->mem_map_rev.
	(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
	gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
	gomp_map_cdata_lookup): New auxiliary structs and functions for
	gomp_target_rev.
	(gomp_target_rev): Implement reverse offloading and its mapping.
	(gomp_target_init): Init current_device.mem_map_rev.root.
	* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
	mapping of on-device allocated variables.
2022-12-10 13:42:08 +01:00
Tobias Burnus
131d18e928 libgomp/nvptx: Prepare for reverse-offload callback handling
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.

include/ChangeLog:

	* cuda/cuda.h (enum CUdevice_attribute): Add
	CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
	(CU_MEMHOSTALLOC_DEVICEMAP): Define.
	(cuMemHostAlloc): Add prototype.

libgomp/ChangeLog:

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
	'static' for this variable.
	* config/nvptx/libgomp-nvptx.h: New file.
	* config/nvptx/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare extern var.
	(GOMP_REV_OFFLOAD_VAR): Declare var.
	(GOMP_target_ext): Handle reverse offload.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
	* target.c (gomp_target_rev): ... this new stub function.
	* libgomp.h (gomp_target_rev): Declare.
	* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
	* plugin/cuda-lib.def (cuMemHostAlloc): Add.
	* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
	(struct ptx_device): Add rev_data member.
	(nvptx_open_device): Remove async_engines query, last used in
	r10-304-g1f4c5b9b; add unified-address assert check.
	(GOMP_OFFLOAD_get_num_devices): Claim unified address
	support.
	(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
	offload functions exist. Make offload var available
	on host and device.
	(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
	(GOMP_OFFLOAD_run): Handle reverse offload.
2022-10-24 17:04:08 +02:00
Marcel Vollweiler
9f2fca5659 OpenMP, libgomp: Environment variable syntax extension
This patch considers the environment variable syntax extension for
device-specific variants of environment variables from OpenMP 5.1 (see
OpenMP 5.1 specification, p. 75 and p. 639).  An environment variable (e.g.
OMP_NUM_TEAMS) can have different suffixes:

_DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host.
_DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with
number <device>.
no suffix (e.g. OMP_NUM_TEAMS): affects only the host.

In future OpenMP versions also suffix _ALL will be introduced (see discussion
https://github.com/OpenMP/spec/issues/3179). This is also considered in this
patch:

_ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host.

The precedence is as follows (descending). For the host:

	1. no suffix
	2. _ALL

For devices:

	1. _DEV_<device>
	2. _DEV
	3. _ALL

That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if
available, and at last _ALL.  If there is no value for any of the variable
variants, default values are used as already implemented before.

This patch concerns parsing (a), storing (b), output (c) and transmission to the
device (d):

(a) The actual number of devices and the numbering are not known when parsing
the environment variables.  Thus all environment variables are iterated and
searched for device-specific ones.
(b) Only configured device-specific variables are stored.  Thus, a linked list
is used.
(c) The output is done in omp_display_env (see specification p. 468f).  Global
ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179.
ICVs which are not global but aren't handled device-specific yet are tagged
with [host].  omp_display_env outputs the initial values of the ICVs.  That is
why a dedicated data structure is introduced for the inital values only
(gomp_initial_icv_list).
(d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_default_device): Return device-
	specific ICV.
	(omp_get_max_teams): Added for GCN devices.
	(omp_set_num_teams): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_default_device): Return device-
	specific ICV.
	(omp_get_max_teams): Added for NVPTX devices.
	(omp_set_num_teams): Likewise.
	(ialias): Likewise.
	* env.c (struct gomp_icv_list): New struct to store entries of initial
	ICV values.
	(struct gomp_offload_icv_list): New struct to store entries of device-
	specific ICV values that are copied to the device and back.
	(struct gomp_default_icv_values): New struct to store default values of
	ICVs according to the OpenMP standard.
	(parse_schedule): Generalized for different variants of OMP_SCHEDULE.
	(print_env_var_error): Function that prints an error for invalid values
	for ICVs.
	(parse_unsigned_long_1): Removed getenv.  Generalized.
	(parse_unsigned_long): Likewise.
	(parse_int_1): Likewise.
	(parse_int): Likewise.
	(parse_int_secure): Likewise.
	(parse_unsigned_long_list): Likewise.
	(parse_target_offload): Likewise.
	(parse_bind_var): Likewise.
	(parse_stacksize): Likewise.
	(parse_boolean): Likewise.
	(parse_wait_policy): Likewise.
	(parse_allocator): Likewise.
	(omp_display_env): Extended to output different variants of environment
	variables.
	(print_schedule): New helper function for omp_display_env which prints
	the values of run_sched_var.
	(print_proc_bind): New helper function for omp_display_env which prints
	the values of proc_bind_var.
	(enum gomp_parse_type): Collection of types used for parsing environment
	variables.
	(ENTRY): Preprocess string lengths of environment variables.
	(OMP_VAR_CNT): Preprocess table size.
	(OMP_HOST_VAR_CNT): Likewise.
	(INT_MAX_STR_LEN): Constant for the maximal number of digits of a device
	number.
	(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
	(gomp_set_icv_flag): Sets a flag for a particular ICV.
	(print_device_specific_icvs): New helper function for omp_display_env to
	print device specific ICV values.
	(get_device_num): New helper function for parse_device_specific.
	Extracts the device number from an environment variable name.
	(get_icv_member_addr): Gets the memory address for a particular member
	of an ICV struct.
	(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
	(initialize_icvs): New function to initialize a gomp_initial_icvs
	struct.
	(add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list.
	(startswith): Checks if a string starts with a given prefix.
	(initialize_env): Extended to parse the new syntax of environment
	variables.
	* icv-device.c (omp_get_max_teams): Added.
	(ialias): Likewise.
	(omp_set_num_teams): Likewise.
	* icv.c (omp_set_num_teams): Moved to icv-device.c.
	(omp_get_max_teams): Likewise.
	(ialias): Likewise.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed.
	(GOMP_ADDITIONAL_ICVS): New target-side struct that
	holds the designated ICVs of the target device.
	* libgomp.h (enum gomp_icvs): Collection of ICVs.
	(enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and
	no suffix.
	(enum gomp_env_suffix): Collection of possible suffixes of environment
	variables.
	(struct gomp_initial_icvs): Contains all ICVs for which we need to store
	initial values.
	(struct gomp_default_icv):New struct to hold ICVs for which we need
	to store initial values.
	(struct gomp_icv_list): Definition of a linked list that is used for
	storing ICVs for the devices and also for _DEV, _ALL, and without
	suffix.
	(struct gomp_offload_icvs): New struct to hold ICVs that are copied to
	a device.
	(struct gomp_offload_icv_list): Definition of a linked list that holds
	device-specific ICVs that are copied to devices.
	(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
	(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
	* libgomp.texi: Updated.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read
	further ICVs from the offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_get_offload_icv_item): Get a list item of
	gomp_offload_icv_list.
	(get_gomp_offload_icvs): New. Returns the ICV values
	depending on the device num and the variable hierarchy.
	(gomp_load_image_to_device): Extended to copy further ICVs to a device.
	* testsuite/libgomp.c-c++-common/icv-5.c: New test.
	* testsuite/libgomp.c-c++-common/icv-6.c: New test.
	* testsuite/libgomp.c-c++-common/icv-7.c: New test.
	* testsuite/libgomp.c-c++-common/icv-8.c: New test.
	* testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test.
	* testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
2022-09-08 10:19:37 -07:00
Jakub Jelinek
42fd2cd932 libgomp: Don't define GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC for _aligned_malloc [PR105745]
since apparently _aligned_malloc requires freeing with _aligned_free and:
 /* Defined if gomp_aligned_alloc doesn't use fallback version
    and free can be used instead of gomp_aligned_free.  */
 #define GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC 1
so the second condition isn't satisfied.  For uses inside of the OpenMP
allocators we can still use _aligned_malloc but we need to call _aligned_free
in gomp_aligned_free.

2022-05-28  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/105745
	* libgomp.h (GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC): Don't define for
	defined(HAVE__ALIGNED_MALLOC) case.
	* alloc.c (gomp_aligned_alloc): Move defined(HAVE__ALIGNED_MALLOC)
	handling as last option before fallback instead of first.
	(gomp_aligned_free): For defined(HAVE__ALIGNED_MALLOC) call
	_aligned_free.
2022-05-28 08:30:47 +02:00
Jakub Jelinek
2c16eb3157 openmp: Add support for inoutset depend-kind
This patch adds support for inoutset depend-kind in depend
clauses.  It is very similar to the in depend-kind in that
a task with a dependency with that depend-kind is dependent
on all previously created sibling tasks with matching address
unless they have the same depend-kind.
In the in depend-kind case everything is dependent except
for in -> in dependency, for inoutset everything is
dependent except for inoutset -> inoutset dependency.
mutexinoutset is also similar (everything is dependent except
for mutexinoutset -> mutexinoutset dependency), but there is
also the additional restriction that only one task with
mutexinoutset for each address can be scheduled at once (i.e.
mutual exclusitivty).  For now we support mutexinoutset
the same as inout/out, but the inoutset support is full.

In order not to bump the ABI for dependencies each time
(we've bumped it already once, the old ABI supports only
inout/out and in depend-kind, the new ABI supports
inout/out, mutexinoutset, in and depobj), this patch arranges
for inoutset to be at least for the time being always handled
as if it was specified through depobj even when it is not.
So it uses the new ABI for that and inoutset are represented
like depobj - pointer to a pair of pointers where the first one
will be the actual address of the object mentioned in depend
clause and second pointer will be (void *) GOMP_DEPEND_INOUTSET.

2022-05-17  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum omp_clause_depend_kind): Add
	OMP_CLAUSE_DEPEND_INOUTSET.
	* tree-pretty-print.cc (dump_omp_clause): Handle
	OMP_CLAUSE_DEPEND_INOUTSET.
	* gimplify.cc (gimplify_omp_depend): Likewise.
	* omp-low.cc (lower_depend_clauses): Likewise.
gcc/c-family/
	* c-omp.cc (c_finish_omp_depobj): Handle
	OMP_CLAUSE_DEPEND_INOUTSET.
gcc/c/
	* c-parser.cc (c_parser_omp_clause_depend): Parse
	inoutset depend-kind.
	(c_parser_omp_depobj): Likewise.
gcc/cp/
	* parser.cc (cp_parser_omp_clause_depend): Parse
	inoutset depend-kind.
	(cp_parser_omp_depobj): Likewise.
	* cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle
	OMP_CLAUSE_DEPEND_INOUTSET.
gcc/testsuite/
	* c-c++-common/gomp/all-memory-1.c (boo): Add test with
	inoutset depend-kind.
	* c-c++-common/gomp/all-memory-2.c (boo): Likewise.
	* c-c++-common/gomp/depobj-1.c (f1): Likewise.
	(f2): Adjusted expected diagnostics.
	* g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics.
include/
	* gomp-constants.h (GOMP_DEPEND_INOUTSET): Define.
libgomp/
	* libgomp.h (struct gomp_task_depend_entry): Change is_in type
	from bool to unsigned char.
	* task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET.
	Ignore dependencies where
	task->depend[i].is_in && task->depend[i].is_in == ent->is_in
	rather than just task->depend[i].is_in && ent->is_in.  Remember
	whether GOMP_DEPEND_IN loop is needed and guard the loop with that
	conditional.
	(gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET.
	Ignore dependencies where elem.is_in && elem.is_in == ent->is_in
	rather than just elem.is_in && ent->is_in.
	* testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with
	inoutset depend-kind.
	* testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise.
	* testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise.
	* testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.
2022-05-17 15:40:27 +02:00
Jakub Jelinek
7f78783dbe openmp: Add omp_all_memory support (C/C++ only so far)
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier
which isn't allowed to be used anywhere but in the depend clause, this is
against how everything else has been handled in OpenMP so far (where
some identifiers could have special meaning in some OpenMP clauses or
pragmas but not elsewhere).
The patch handles it by making it a conditional keyword (for -fopenmp
only) and emitting a better diagnostics when it is used in a primary
expression.  Having a nicer diagnostics when e.g. trying to do
int omp_all_memory;
or
int *omp_all_memory[10];
etc. would mean changing too many spots and hooking into name lookups
to reject declaring any such symbols would be too ugly and I'm afraid
there are way too many spots where one can introduce a name
(variables, functions, namespaces, struct, enum, enumerators, template
arguments, ...).

Otherwise, the handling is quite simple, normal depend clauses lower
into addresses of variables being handed over to the library, for
omp_all_memory I'm using NULL pointers.  omp_all_memory can only be
used with inout or out depend kinds and means that a task is dependent
on all previously created sibling tasks that have any dependency (of
any depend kind) and that any later created sibling tasks will be
dependent on it if they have any dependency.

2022-05-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr
	if null_pointer_node.
	(gimplify_scan_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node
	as omp_all_memory.
gcc/c-family/
	* c-common.h (enum rid): Add RID_OMP_ALL_MEMORY.
	* c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr
	if null_pointer_node.
gcc/c/
	* c-parser.cc (c_parse_init): Register omp_all_memory as keyword
	if flag_openmp.
	(c_parser_postfix_expression): Diagnose uses of omp_all_memory
	in postfix expressions.
	(c_parser_omp_variable_list): Handle omp_all_memory in depend
	clause.
	* c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory
	keyword in depend clause as null_pointer_node, diagnose invalid
	uses.
gcc/cp/
	* lex.cc (init_reswords): Register omp_all_memory as keyword
	if flag_openmp.
	* parser.cc (cp_parser_primary_expression): Diagnose uses of
	omp_all_memory in postfix expressions.
	(cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend
	clause.
	* semantics.cc (finish_omp_clauses): Handle omp_all_memory
	keyword in depend clause as null_pointer_node, diagnose invalid
	uses.
	* pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory.
gcc/testsuite/
	* c-c++-common/gomp/all-memory-1.c: New test.
	* c-c++-common/gomp/all-memory-2.c: New test.
	* c-c++-common/gomp/all-memory-3.c: New test.
	* g++.dg/gomp/all-memory-1.C: New test.
	* g++.dg/gomp/all-memory-2.C: New test.
libgomp/
	* libgomp.h (struct gomp_task): Add depend_all_memory member.
	* task.c (gomp_init_task): Initialize depend_all_memory.
	(gomp_task_handle_depend): Handle omp_all_memory.
	(gomp_task_run_post_handle_depend_hash): Clear
	parent->depend_all_memory if equal to current task.
	(gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory.
	* testsuite/libgomp.c-c++-common/depend-1.c: New test.
	* testsuite/libgomp.c-c++-common/depend-2.c: New test.
	* testsuite/libgomp.c-c++-common/depend-3.c: New test.
2022-05-12 08:31:20 +02:00
Jakub Jelinek
7adcbafe45 Update copyright years. 2022-01-03 10:42:10 +01:00
Chung-Lin Tang
0ab29cf0bb openmp: Improve OpenMP target support for C++ (PR92120)
This patch implements several C++ specific mapping capabilities introduced for
OpenMP 5.0, including implicit mapping of this[:1] for non-static member
functions, zero-length array section mapping of pointer-typed members,
lambda captured variable access in target regions, and use of lambda objects
inside target regions.

Several adjustments to the C/C++ front-ends to allow more member-access syntax
as valid is also included.

	PR middle-end/92120

gcc/cp/ChangeLog:

	* cp-tree.h (finish_omp_target): New declaration.
	(finish_omp_target_clauses): Likewise.
	* parser.c (cp_parser_omp_clause_map): Adjust call to
	cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
	(cp_parser_omp_target): Factor out code, adjust into calls to new
	function finish_omp_target.
	* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
	OMP_TARGET case.
	* semantics.c (handle_omp_array_sections_1): Add handling to create
	'this->member' from 'member' FIELD_DECL. Remove case of rejecting
	'this' when not in declare simd.
	(handle_omp_array_sections): Likewise.
	(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
	map clauses. Handle 'A->member' case in map clauses. Remove case of
	rejecting 'this' when not in declare simd.
	(struct omp_target_walk_data): New struct for walking over
	target-directive tree body.
	(finish_omp_target_clauses_r): New function for tree walk.
	(finish_omp_target_clauses): New function.
	(finish_omp_target): New function.

gcc/c/ChangeLog:

	* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
	call to c_parser_omp_variable_list to 'true'.
	* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
	array base handling.
	(c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/ChangeLog:

	* gimplify.c ("tree-hash-traits.h"): Add include.
	(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
	hash_map<tree_operand, tree> *. Adjust struct map handling to handle
	cases of *A and A->B expressions. Under !DECL_P case of
	GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
	struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
	handling component_ref_p case. Add unshare_expr and gimplification
	when created GOMP_MAP_STRUCT is not a DECL. Add code to add
	firstprivate pointer for *pointer-to-struct case.
	(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
	exit data directives code to earlier position.
	* omp-low.c (lower_omp_target):
	Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
	GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
	* tree-pretty-print.c (dump_omp_clause): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/target-3.c: New testcase.
	* g++.dg/gomp/target-3.C: New testcase.
	* g++.dg/gomp/target-lambda-1.C: New testcase.
	* g++.dg/gomp/target-lambda-2.C: New testcase.
	* g++.dg/gomp/target-this-1.C: New testcase.
	* g++.dg/gomp/target-this-2.C: New testcase.
	* g++.dg/gomp/target-this-3.C: New testcase.
	* g++.dg/gomp/target-this-4.C: New testcase.
	* g++.dg/gomp/target-this-5.C: New testcase.
	* g++.dg/gomp/this-2.C: Adjust testcase.

include/ChangeLog:

	* gomp-constants.h (enum gomp_map_kind):
	Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
	GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
	(GOMP_MAP_POINTER_P):
	Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.

libgomp/ChangeLog:

	* libgomp.h (gomp_attach_pointer): Add bool parameter.
	* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
	(goacc_enter_data_internal): Likewise.
	* target.c (gomp_map_vars_existing): Update assert condition to
	include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
	(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
	parameter, add support for mapping a pointer with NULL target.
	(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
	parameter, add support for attaching a pointer with NULL target.
	(gomp_map_vars_internal): Update calls to gomp_map_pointer and
	gomp_attach_pointer, add handling for
	GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
	GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
	* testsuite/libgomp.c++/target-23.C: New testcase.
	* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
	* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
	* testsuite/libgomp.c++/target-this-1.C: New testcase.
	* testsuite/libgomp.c++/target-this-2.C: New testcase.
	* testsuite/libgomp.c++/target-this-3.C: New testcase.
	* testsuite/libgomp.c++/target-this-4.C: New testcase.
	* testsuite/libgomp.c++/target-this-5.C: New testcase.
2021-12-08 22:29:06 +08:00
Jakub Jelinek
17da2c7425 libgomp: Ensure that either gomp_team is properly aligned [PR102838]
struct gomp_team has struct gomp_work_share array inside of it.
If that latter structure has 64-byte aligned member in the middle,
the whole struct gomp_team needs to be 64-byte aligned, but we weren't
allocating it using gomp_aligned_alloc.

This patch fixes that, except that on gcn team_malloc is special, so
I've instead decided at least for now to avoid using aligned member
and use the padding instead on gcn.

2021-11-18  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102838
	* libgomp.h (GOMP_USE_ALIGNED_WORK_SHARES): Define if
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined and __AMDGCN__ is not.
	(struct gomp_work_share): Use GOMP_USE_ALIGNED_WORK_SHARES instead of
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
	* work.c (alloc_work_share, gomp_work_share_start): Likewise.
	* team.c (gomp_new_team): If GOMP_USE_ALIGNED_WORK_SHARES, use
	gomp_aligned_alloc instead of team_malloc.
2021-11-18 09:10:40 +01:00
Jakub Jelinek
fa4fcb111a libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two, one is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard says obviously nothing about it.  Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4).  With target fallback being invoked from parallel
regions global vars simply can't work right on the host.

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
	* team.c (struct gomp_thread_start_data): Likewise.
	(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
	(gomp_team_start): Initialize start_data->num_teams and
	start_data->team_num.  Update nthr->num_teams and nthr->team_num.
	* teams.c (gomp_num_teams, gomp_team_num): Remove.
	(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
	instead of gomp_num_teams and gomp_team_num.
	(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
	(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
	* testsuite/libgomp.c/teams-4.c: New test.
2021-11-11 13:57:31 +01:00
Jakub Jelinek
c7abdf46fb openmp: Fix up struct gomp_work_share handling [PR102838]
If GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is not defined, the intent was to
treat the split of the structure between first cacheline (64 bytes)
as mostly write-once, use afterwards and second cacheline as rw just
as an optimization.  But as has been reported, with vectorization enabled
at -O2 it can now result in aligned vector 16-byte or larger stores.
When not having posix_memalign/aligned_alloc/memalign or other similar API,
alloc.c emulates it but it needs to allocate extra memory for the dynamic
realignment.
So, for the GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC not defined case, this patch
stops using aligned (64) attribute in the middle of the structure and instead
inserts padding that puts the second half of the structure at offset 64 bytes.

And when GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, usually it was allocated
as aligned, but for the orphaned case it could still be allocated just with
gomp_malloc without guaranteed proper alignment.

2021-10-20  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102838
	* libgomp.h (struct gomp_work_share_1st_cacheline): New type.
	(struct gomp_work_share): Only use aligned(64) attribute if
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, otherwise just
	add padding before lock to ensure lock is at offset 64 bytes
	into the structure.
	(gomp_workshare_struct_check1, gomp_workshare_struct_check2):
	New poor man's static assertions.
	* work.c (gomp_work_share_start): Use gomp_aligned_alloc instead of
	gomp_malloc if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
2021-10-20 09:34:51 +02:00
Jakub Jelinek
07dd3bcda1 openmp: Add omp_set_num_teams, omp_get_max_teams, omp_[gs]et_teams_thread_limit
OpenMP 5.1 adds env vars and functions to set and query new ICVs used
as fallback if thread_limit or num_teams clauses aren't specified on
teams construct.

The following patch implements those, though further work will be needed:
1) OpenMP 5.1 also changed the num_teams clause, so that it can specify
   both lower and upper limit for how many teams should be created and
   changed the meaning when only one expression is provided, instead of
   num_teams(expr) in 5.0 meaning num_teams(1:expr) in 5.1, it now means
   num_teams(expr:expr), i.e. while previously we could create 1 to expr
   teams, in 5.1 we have some low limit by default equal to the single
   expression provided and may not create fewer teams.
   For host teams (which we don't currently implement efficiently for
   NUMA hosts) we trivially satisfy it now by always honoring what the
   user asked for, but for the offloading teams I think we'll need to
   rethink the APIs; currently teams construct is just a call that returns
   and possibly lowers the number of teams; and whenever possible we try
   to evaluate num_teams/thread_limit already on the target construct
   and the GOMP_teams call just sets the number of teams to the minimum
   of provided and requested teams; for some cases e.g. where target
   is not combined with teams and num_teams expression calls some functions
   etc., we need to call those functions in the target region and so it is
   late to figure number of teams, but also hw could just limit what it
   is willing to create; in that case I'm afraid we need to run the target
   body multiple times and arrange for omp_get_team_num () returning the
   right values
2) we need to finally implement the NUMA handling for GOMP_teams_reg
3) I now realize I haven't added some testcase coverage, will do that
   incrementally
4) libgomp.texi needs updates for these new APIs, but also others like
   the allocator

2021-10-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-low.c (omp_runtime_api_call): Handle omp_get_max_teams,
	omp_[sg]et_teams_thread_limit and omp_set_num_teams.
libgomp/
	* omp.h.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* omp_lib.f90.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* omp_lib.h.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* libgomp.h (gomp_nteams_var, gomp_teams_thread_limit_var): Declare.
	* libgomp.map (OMP_5.1): Export omp_get_max_teams{,_},
	omp_get_teams_thread_limit{,_}, omp_set_num_teams{,_,_8_} and
	omp_set_teams_thread_limit{,_,_8_}.
	* icv.c (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): New
	functions.
	* env.c (gomp_nteams_var, gomp_teams_thread_limit_var): Define.
	(omp_display_env): Print OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT.
	(initialize_env): Handle OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env
	vars.
	* teams.c (GOMP_teams_reg): If thread_limit is not specified, use
	gomp_teams_thread_limit_var as fallback if not zero.  If num_teams
	is not specified, use gomp_nteams_var.
	* fortran.c (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Add
	ialias_redirect.
	(omp_set_num_teams_, omp_set_num_teams_8_, omp_get_max_teams_,
	omp_set_teams_thread_limit_, omp_set_teams_thread_limit_8_,
	omp_get_teams_thread_limit_): New functions.
2021-10-11 12:20:22 +02:00
Julian Brown
9c41f5b9cd Fix OpenACC "ephemeral" asynchronous host-to-device copies
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.

An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html

and previous versions of this patch were posted here (for mainline/og9):

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html

libgomp/
	* libgomp.h (gomp_copy_host2dev): Update prototype.
	* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
	argument to gomp_copy_host2dev (false).
	* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
	(copy_data): Don't free src.
	(queue_push_copy): Remove free_src handling.
	(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
	(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
	snapshotting.
	(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
	queue_push_copy.
	* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
	(gomp_copy_host2dev): Add EPHEMERAL parameter.  Snapshot source
	data when true, and set up deferred freeing of temporary buffer.
	(gomp_copy_dev2host): Update call to goacc_device_copy_async.
	(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
	(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
	calls to gomp_copy_host2dev with appropriate ephemeral argument.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
	XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-07-27 11:16:27 +02:00
Thomas Schwinge
8168338684 [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' some more [PR101484]
With yesterday's commit 9f2bc5077d "[gcn]
Work-around libgomp 'error: array subscript 0 is outside array bounds of
‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' [PR101484]",
I did defuse the "unexpected" '-Werror=array-bounds' diagnostics that we see
as of commit a110855667 "Correct handling of
variable offset minus constant in -Warray-bounds [PR100137]".  However, these
'#pragma GCC diagnostic [...]' directives cause some code generation changes
(that seems unexpected, problematic!), which results in a lot (ten thousands)
of 'GCN team arena exhausted' run-time diagnostics, also leading to a few
FAILs:

    PASS: libgomp.c/../libgomp.c-c++-common/for-11.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-11.c execution test

    PASS: libgomp.c/../libgomp.c-c++-common/for-12.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-12.c execution test

    PASS: libgomp.c/../libgomp.c-c++-common/for-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-3.c execution test

    PASS: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-5.c execution test

    PASS: libgomp.c/../libgomp.c-c++-common/for-6.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-6.c execution test

    PASS: libgomp.c/../libgomp.c-c++-common/for-9.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-9.c execution test

Same for 'libgomp.c++'.

It remains to be analyzed how '#pragma GCC diagnostic [...]' directives can
cause code generation changes; for now I'm working around the "unexpected"
'-Werror=array-bounds' diagnostics differently.

Overall, still awaiting a different solution, of course.

	libgomp/
	PR target/101484
	* configure.tgt [amdgcn*-*-*] (XCFLAGS): Add
	'-Wno-error=array-bounds'.
	* config/gcn/team.c: Remove '-Werror=array-bounds' work-around.
	* libgomp.h [__AMDGCN__]: Likewise.
2021-07-20 09:14:28 +02:00
Thomas Schwinge
9f2bc5077d [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' [PR101484]
... seen as of commit a110855667 "Correct
handling of variable offset minus constant in -Warray-bounds [PR100137]".

Awaiting a different solution, of course.

	libgomp/
	PR target/101484
	* config/gcn/team.c: Apply '-Werror=array-bounds' work-around.
	* libgomp.h [__AMDGCN__]: Likewise.
2021-07-19 10:26:12 +02:00
Chung-Lin Tang
275c736e73 libgomp: Structure element mapping for OpenMP 5.0
This patch implement OpenMP 5.0 requirements of incrementing/decrementing
the reference count of a mapped structure at most once (across all elements)
on a construct.

This is implemented by pulling in libgomp/hashtab.h and using htab_t as a
pointer set. Structure element list siblings also have pointers-to-refcounts
linked together, to naturally achieve uniform increment/decrement without
repeating.

There are still some questions on whether using such a htab_t based set is
faster/slower than using a sorted pointer array based implementation. This
is to be researched on later.

libgomp/ChangeLog:

	* hashtab.h (htab_clear): New function with initialization code
	factored out from...
	(htab_create): ...here, adjust to use htab_clear function.

	* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
	special refcount values, add comments.
	(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
	(REFCOUNT_LINK): Likewise.
	(REFCOUNT_STRUCTELEM): New special refcount range for structure
	element siblings.
	(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
	sibling maps.
	(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
	(REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
	(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
	(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
	(struct splay_tree_key_s): Add structelem_refcount and
	structelem_refcount_ptr fields into a union with dynamic_refcount.
	Add comments.
	(gomp_map_vars): Delete declaration.
	(gomp_map_vars_async): Likewise.
	(gomp_unmap_vars): Likewise.
	(gomp_unmap_vars_async): Likewise.
	(goacc_map_vars): New declaration.
	(goacc_unmap_vars): Likewise.

	* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
	(goacc_enter_datum): Likewise.
	(goacc_enter_data_internal): Likewise.
	* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
	and goacc_unmap_vars.
	(GOACC_data_start): Adjust to use goacc_map_vars.
	(GOACC_data_end): Adjust to use goacc_unmap_vars.

	* target.c (hash_entry_type): New typedef.
	(htab_alloc): New function hook for hashtab.h.
	(htab_free): Likewise.
	(htab_hash): Likewise.
	(htab_eq): Likewise.
	(hashtab.h): Add file include.
	(gomp_increment_refcount): New function.
	(gomp_decrement_refcount): Likewise.
	(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
	gomp_increment_refcount.
	(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
	to gomp_map_vars_existing.

	(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
	variable to guard OpenMP specific paths, adjust calls to
	gomp_map_vars_existing, add structure element sibling splay_tree_key
	sequence creation code, adjust Fortran map case to avoid increment
	under OpenMP.
	(gomp_map_vars): Adjust to static, add refcount_set parameter, manage
	local refcount_set if caller passed in NULL, adjust call to
	gomp_map_vars_internal.
	(gomp_map_vars_async): Adjust and rename into...
	(goacc_map_vars): ...this new function, adjust call to
	gomp_map_vars_internal.

	(gomp_remove_splay_tree_key): New function with code factored out from
	gomp_remove_var_internal.
	(gomp_remove_var_internal): Add code to handle removing multiple
	splay_tree_key sequence for structure elements, adjust code to use
	gomp_remove_splay_tree_key for splay-tree key removal.
	(gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use
	gomp_decrement_refcount.
	(gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage
	local refcount_set if caller passed in NULL, adjust call to
	gomp_unmap_vars_internal.
	(gomp_unmap_vars_async): Adjust and rename into...
	(goacc_unmap_vars): ...this new function, adjust call to
	gomp_unmap_vars_internal.
	(GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and
	gomp_unmap_vars.
	(GOMP_target_ext): Likewise.
	(gomp_target_data_fallback): Adjust call to gomp_map_vars.
	(GOMP_target_data): Likewise.
	(GOMP_target_data_ext): Likewise.
	(GOMP_target_end_data): Adjust call to gomp_unmap_vars.
	(gomp_exit_data): Add refcount_set parameter, adjust to use
	gomp_decrement_refcount, adjust to queue splay-tree keys for removal
	after main loop.
	(GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to
	gomp_map_vars and gomp_exit_data.
	(gomp_target_task_fn): Likewise.

	* testsuite/libgomp.c-c++-common/refcount-1.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-4.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-5.c: New testcase.
2021-06-17 21:34:59 +08:00
Kwok Cheung Yeung
d656bfda2d openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
This adds support for the task detach clause to taskwait and taskgroup, and
simplifies the handling of the detach clause by moving most of the extra
handling required for detach tasks to omp_fulfill_event.

2021-02-25  Kwok Cheung Yeung  <kcy@codesourcery.com>
	    Jakub Jelinek  <jakub@redhat.com>

	libgomp/

	PR libgomp/98738
	* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
	(struct gomp_task): Replace detach and completion_sem fields with
	union containing completion_sem and detach_team.  Add deferred_p
	field.
	(struct gomp_team): Remove task_detach_queue.
	* task.c: Include assert.h.
	(gomp_init_task): Initialize deferred_p and completion_sem fields.
	Rearrange initialization order of fields.
	(task_fulfilled_p): Delete.
	(GOMP_task): Use address of task as the event handle.  Remove
	initialization of detach field.  Initialize deferred_p field.
	Use automatic local for completion_sem.  Initialize detach_team field
	for deferred tasks.
	(gomp_barrier_handle_tasks): Remove handling of task_detach_queue.
	Set kind of suspended detach task to GOMP_TASK_DETACHED and
	decrement task_running_count.  Move finish_cancelled block out of
	else branch.  Relocate call to gomp_team_barrier_done.
	(GOMP_taskwait): Handle tasks with completion events that have not
	been fulfilled.
	(GOMP_taskgroup_end): Likewise.
	(omp_fulfill_event): Use address of task as event handle.  Post to
	completion_sem for undeferred tasks.  Clear detach_team if task
	has not finished.  For finished tasks, handle post-execution tasks,
	call gomp_team_barrier_wake if necessary, and free task.
	* team.c (gomp_new_team): Remove initialization of task_detach_queue.
	(free_team): Remove free of task_detach_queue.
	* testsuite/libgomp.c-c++-common/task-detach-1.c: Fix formatting.
	* testsuite/libgomp.c-c++-common/task-detach-2.c: Fix formatting.
	* testsuite/libgomp.c-c++-common/task-detach-3.c: Fix formatting.
	* testsuite/libgomp.c-c++-common/task-detach-4.c: Fix formatting.
	* testsuite/libgomp.c-c++-common/task-detach-5.c: Fix formatting.
	Change data-sharing of detach events on enclosing parallel to private.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise.  Remove
	taskwait directive.
	* testsuite/libgomp.c-c++-common/task-detach-7.c: New.
	* testsuite/libgomp.c-c++-common/task-detach-8.c: New.
	* testsuite/libgomp.c-c++-common/task-detach-9.c: New.
	* testsuite/libgomp.c-c++-common/task-detach-10.c: New.
	* testsuite/libgomp.c-c++-common/task-detach-11.c: New.
	* testsuite/libgomp.fortran/task-detach-1.f90: Fix formatting.
	* testsuite/libgomp.fortran/task-detach-2.f90: Fix formatting.
	* testsuite/libgomp.fortran/task-detach-3.f90: Fix formatting.
	* testsuite/libgomp.fortran/task-detach-4.f90: Fix formatting.
	* testsuite/libgomp.fortran/task-detach-5.f90: Fix formatting.
	Change data-sharing of detach events on enclosing parallel to private.
	* testsuite/libgomp.fortran/task-detach-6.f90: Likewise.  Remove
	taskwait directive.
	* testsuite/libgomp.fortran/task-detach-7.f90: New.
	* testsuite/libgomp.fortran/task-detach-8.f90: New.
	* testsuite/libgomp.fortran/task-detach-9.f90: New.
	* testsuite/libgomp.fortran/task-detach-10.f90: New.
	* testsuite/libgomp.fortran/task-detach-11.f90: New.
2021-02-25 14:47:11 -08:00
Kwok Cheung Yeung
a6d22fb21c openmp: Add support for the OpenMP 5.0 task detach clause
2021-01-16  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* builtin-types.def
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
	to...
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR):
	...this.  Add extra argument.
	* gimplify.c (omp_default_clause): Ensure that event handle is
	firstprivate in a task region.
	(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to
	BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR.
	* omp-expand.c (expand_task_call): Add GOMP_TASK_FLAG_DETACH to flags
	if detach clause specified.  Add detach argument when generating
	call to	GOMP_task.
	* omp-low.c (scan_sharing_clauses): Setup data environment for detach
	clause.
	(finish_taskreg_scan): Move field for variable containing the event
	handle to the front of the struct.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH.  Fix
	ordering.
	* tree-nested.c (convert_nonlocal_omp_clauses): Handle
	OMP_CLAUSE_DETACH clause.
	(convert_local_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH.
	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH.
	Fix ordering.
	(omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH.  Fix
	ordering.
	(walk_tree_1): Handle OMP_CLAUSE_DETACH.

	gcc/c-family/
	* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH.
	Redefine PRAGMA_OACC_CLAUSE_DETACH.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_detach): New.
	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
	* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
	clause.  Prevent use of detach with mergeable and overriding the
	data sharing mode of the event handle.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_detach): New.
	(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH.
	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
	* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
	Prevent use of detach with mergeable and overriding the	data sharing
	mode of the event handle.

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Handle detach clause.
	* frontend-passes.c (gfc_code_walker): Walk detach expression.
	* gfortran.h (struct gfc_omp_clauses): Add detach field.
	(gfc_c_intptr_kind): New.
	* openmp.c (gfc_free_omp_clauses): Free detach clause.
	(gfc_match_omp_detach): New.
	(enum omp_mask1): Add OMP_CLAUSE_DETACH.
	(enum omp_mask2): Remove OMP_CLAUSE_DETACH.
	(gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP.
	(OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH.
	(resolve_omp_clauses): Prevent use of detach with mergeable and
	overriding the data sharing mode of the event handle.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause.
	* trans-types.c (gfc_c_intptr_kind): New.
	(gfc_init_kinds): Initialize gfc_c_intptr_kind.
	* types.def
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
	to...
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR):
	...this.  Add extra argument.

	gcc/testsuite/
	* c-c++-common/gomp/task-detach-1.c: New.
	* g++.dg/gomp/task-detach-1.C: New.
	* gcc.dg/gomp/task-detach-1.c: New.
	* gfortran.dg/gomp/task-detach-1.f90: New.

	include/
	* gomp-constants.h (GOMP_TASK_FLAG_DETACH): New.

	libgomp/
	* fortran.c (omp_fulfill_event_): New.
	* libgomp.h (struct gomp_task): Add detach and completion_sem fields.
	(struct gomp_team): Add task_detach_queue and task_detach_count
	fields.
	* libgomp.map (OMP_5.0.1): Add omp_fulfill_event and omp_fulfill_event_.
	* libgomp_g.h (GOMP_task): Add extra argument.
	* omp.h.in (enum omp_event_handle_t): New.
	(omp_fulfill_event): New.
	* omp_lib.f90.in (omp_event_handle_kind): New.
	(omp_fulfill_event): New.
	* omp_lib.h.in (omp_event_handle_kind): New.
	(omp_fulfill_event): Declare.
	* priority_queue.c (priority_tree_find): New.
	(priority_list_find): New.
	(priority_queue_find): New.
	* priority_queue.h (priority_queue_predicate): New.
	(priority_queue_find): New.
	* task.c (gomp_init_task): Initialize detach field.
	(task_fulfilled_p): New.
	(GOMP_task): Add detach argument.  Ignore detach argument if
	GOMP_TASK_FLAG_DETACH not set in flags.  Initialize completion_sem
	field.	Copy address of completion_sem into detach argument and
	into the start of the data record.  Wait for detach event if task
	not deferred.
	(gomp_barrier_handle_tasks): Queue tasks with unfulfilled events.
	Remove completed tasks and requeue dependent tasks.
	(omp_fulfill_event): New.
	* team.c (gomp_new_team): Initialize task_detach_queue and
	task_detach_count fields.
	(free_team): Free task_detach_queue field.
	* testsuite/libgomp.c-c++-common/task-detach-1.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-2.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-3.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-4.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-5.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: New testcase.
	* testsuite/libgomp.fortran/task-detach-1.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-2.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-3.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-4.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-5.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-6.f90: New testcase.
2021-01-16 12:58:13 -08:00
Jakub Jelinek
99dee82307 Update copyright years. 2021-01-04 10:26:59 +01:00
Kwok Cheung Yeung
6fae7eda96 openmp: Retire nest-var ICV for OpenMP 5.1
This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead.  The max-active-levels-var ICV
is now per data environment rather than per device.

2020-11-18  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/
	* env.c (gomp_global_icv): Remove nest_var field.  Add
	max_active_levels_var field.
	(gomp_max_active_levels_var): Remove.
	(parse_boolean): Return true on success.
	(handle_omp_display_env): Express OMP_NESTED in terms of
	max_active_levels_var.  Change format specifier for
	max_active_levels_var.
	(initialize_env): Set max_active_levels_var from
	OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and
	OMP_PROC_BIND.
	* icv.c (omp_set_nested): Express in terms of
	max_active_levels_var.
	(omp_get_nested): Likewise.
	(omp_set_max_active_levels): Use max_active_levels_var field instead
	of gomp_max_active_levels_var.
	(omp_get_max_active_levels): Likewise.
	* libgomp.h (struct gomp_task_icv): Remove nest_var field.  Add
	max_active_levels_var field.
	(gomp_supported_active_levels): Set to UCHAR_MAX.
	(gomp_max_active_levels_var): Delete.
	* libgomp.texi (omp_get_nested): Update documentation.
	(omp_set_nested): Likewise.
	(OMP_MAX_ACTIVE_LEVELS): Likewise.
	(OMP_NESTED): Likewise.
	(OMP_NUM_THREADS): Likewise.
	(OMP_PROC_BIND): Likewise.
	* parallel.c (gomp_resolve_num_threads): Replace reference
	to nest_var with max_active_levels_var.  Use max_active_levels_var
	field instead of gomp_max_active_levels_var.
2020-11-18 11:24:36 -08:00
Chung-Lin Tang
9e62802422 openmp: Implement OpenMP 5.0 base-pointer attachement and clause ordering
This patch implements some parts of the target variable mapping changes
specified in OpenMP 5.0, including base-pointer attachment/detachment
behavior for array section list-items in map clauses, and ordering of
map clauses according to map kind.

2020-11-10  Chung-Lin Tang  <cltang@codesourcery.com>

gcc/c-family/ChangeLog:

	* c-common.h (c_omp_adjust_map_clauses): New declaration.
	* c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses.
	(c_omp_adjust_map_clauses): New function.

gcc/c/ChangeLog:

	* c-parser.c (c_parser_omp_target_data): Add use of
	new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
	handled map clause kind.
	(c_parser_omp_target_enter_data): Likewise.
	(c_parser_omp_target_exit_data): Likewise.
	(c_parser_omp_target): Likewise.
	* c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
	use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
	(c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
	same struct field access to co-exist on OpenMP construct.

gcc/cp/ChangeLog:

	* parser.c (cp_parser_omp_target_data): Add use of
	new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
	handled map clause kind.
	(cp_parser_omp_target_enter_data): Likewise.
	(cp_parser_omp_target_exit_data): Likewise.
	(cp_parser_omp_target): Likewise.
	* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
	use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
	interaction between reference case and attach/detach.
	(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
	same struct field access to co-exist on OpenMP construct.

gcc/ChangeLog:

	* gimplify.c (is_or_contains_p): New static helper function.
	(omp_target_reorder_clauses): New function.
	(gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to
	reorder clause list according to OpenMP 5.0 rules. Add handling of
	GOMP_MAP_ATTACH_DETACH for OpenMP cases.
	* omp-low.c (is_omp_target): New static helper function.
	(scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH
	for OpenMP cases.
	(lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for
	OpenMP cases.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid.
	* gfortran.dg/gomp/map-2.f90: Likewise.
	* c-c++-common/gomp/map-5.c: New testcase.

libgomp/ChangeLog:

	* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
	usable.
	* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
	'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
	(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
	(goacc_enter_data_internal): Likewise.
	* target.c (gomp_map_vars_internal):
	Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use
	of gomp_attach_pointer for OpenMP cases.
	(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
	(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
	* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
2020-11-10 03:36:58 -08:00
Kwok Cheung Yeung
1bfc07d150 openmp: Implement support for OMP_TARGET_OFFLOAD environment variable
This implements support for the OMP_TARGET_OFFLOAD environment variable
introduced in the OpenMP 5.0 standard, which controls how offloading
is handled.  It may be set to MANDATORY (abort if offloading cannot be
performed), DISABLED (no offloading to devices) or DEFAULT (offload to
device if possible, fall back to host if not).

2020-10-20  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/
	* env.c (gomp_target_offload_var): New.
	(parse_target_offload): New.
	(handle_omp_display_env): Print value of OMP_TARGET_OFFLOAD.
	(initialize_env): Parse OMP_TARGET_OFFLOAD.
	* libgomp.h (gomp_target_offload_t): New.
	(gomp_target_offload_var): New.
	* libgomp.texi (OMP_TARGET_OFFLOAD): New section.
	* target.c (resolve_device): Generate error if device not found and
	offloading is mandatory.
	(gomp_target_fallback): Generate error if offloading is mandatory.
	(GOMP_target): Add argument in call to gomp_target_fallback.
	(GOMP_target_ext): Likewise.
	(gomp_target_data_fallback): Generate error if offloading is mandatory.
	(GOMP_target_data): Add argument in call to gomp_target_data_fallback.
	(GOMP_target_data_ext): Likewise.
	(gomp_target_task_fn): Add argument in call to gomp_target_fallback.
	(gomp_target_init): Return early if offloading is disabled.
2020-10-20 04:16:26 -07:00
Kwok Cheung Yeung
8949b985db openmp: Add support for the omp_get_supported_active_levels runtime library routine
This patch implements the omp_get_supported_active_levels runtime routine
from the OpenMP 5.0 specification, which returns the maximum number of
active nested parallel regions supported by this implementation.  The
current maximum (set using the omp_set_max_active_levels routine or the
OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number.

2020-10-13  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/
	* env.c (gomp_max_active_levels_var): Initialize to
	gomp_supported_active_levels.
	(initialize_env): Limit gomp_max_active_levels_var to be at most
	equal to gomp_supported_active_levels.
	* fortran.c (omp_get_supported_active_levels): Add ialias_redirect.
	(omp_get_supported_active_levels_): New.
	* icv.c (omp_set_max_active_levels): Limit gomp_max_active_levels_var
	to at most equal to gomp_supported_active_levels.
	(omp_get_supported_active_levels): New.
	* libgomp.h (gomp_supported_active_levels): New.
	* libgomp.map (OMP_5.0.1): Add omp_get_supported_active_levels and
	omp_get_supported_active_levels_.
	* libgomp.texi (omp_get_supported_active_levels): New.
	(omp_set_max_active_levels): Update.  Add reference to
	omp_get_supported_active_levels.
	* omp.h.in (omp_get_supported_active_levels): New.
	* omp_lib.f90.in (omp_get_supported_active_levels): New.
	* omp_lib.h.in (omp_get_supported_active_levels): New.
	* testsuite/libgomp.c/lib-2.c (main): Check omp_get_max_active_levels
	against omp_get_supported_active_levels.
	* testsuite/libgomp.fortran/lib4.f90 (lib4): Likewise.
2020-10-13 13:21:02 -07:00
Tobias Burnus
972da55746 OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]
gcc/cp/ChangeLog:

	PR fortran/96668
	* cp-gimplify.c (cxx_omp_finish_clause): Add bool openacc arg.
	* cp-tree.h (cxx_omp_finish_clause): Likewise
	* semantics.c (handle_omp_for_class_iterator): Update call.

gcc/fortran/ChangeLog:

	PR fortran/96668
	* trans.h (gfc_omp_finish_clause): Add bool openacc arg.
	* trans-openmp.c (gfc_omp_finish_clause): Ditto. Use
	GOMP_MAP_ALWAYS_POINTER with PSET for pointers.
	(gfc_trans_omp_clauses): Like the latter and also if the always
	modifier is used.

gcc/ChangeLog:

	PR fortran/96668
	* gimplify.c (gimplify_omp_for): Add 'bool openacc' argument;
	update omp_finish_clause calls.
	(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses,
	gimplify_expr, gimplify_omp_loop): Update omp_finish_clause
	and/or gimplify_for calls.
	* langhooks-def.h (lhd_omp_finish_clause): Add bool openacc arg.
	* langhooks.c (lhd_omp_finish_clause): Likewise.
	* langhooks.h (lhd_omp_finish_clause): Likewise.
	* omp-low.c (scan_sharing_clauses): Keep GOMP_MAP_TO_PSET cause for
	'declare target' vars.

include/ChangeLog:

	PR fortran/96668
	* gomp-constants.h (GOMP_MAP_ALWAYS_POINTER_P): Define.

libgomp/ChangeLog:

	PR fortran/96668
	* libgomp.h (struct target_var_desc): Add has_null_ptr_assoc member.
	* target.c (gomp_map_vars_existing): Add always_to_flag flag.
	(gomp_map_vars_existing): Update call to it.
	(gomp_map_fields_existing): Likewise
	(gomp_map_vars_internal): Update PSET handling such that if a nullptr is
	now allocated or if GOMP_MAP_POINTER is used PSET is updated and pointer
	remapped.
	(GOMP_target_enter_exit_data): Hanlde GOMP_MAP_ALWAYS_POINTER like
	GOMP_MAP_POINTER.
	* testsuite/libgomp.fortran/map-alloc-ptr-1.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-ptr-2.f90: New test.
2020-09-15 09:24:47 +02:00
Julian Brown
bc4ed079dc openacc: Deep copy attach/detach should not affect reference counts
Attach and detach operations are not supposed to affect structural or
dynamic reference counts for OpenACC. Previously they did so, which led to
subtle problems in some circumstances. We can avoid reference-counting
attach/detach operations by extending and slightly repurposing the
do_detach field in target_var_desc. It is now called is_attach to better
reflect its new role.

2020-07-27  Julian Brown  <julian@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>

libgomp/
	* libgomp.h (struct target_var_desc): Rename do_detach field to
	is_attach.
	* oacc-mem.c (goacc_exit_datum_1): Add assert.  Don't set finalize for
	GOMP_MAP_FORCE_DETACH. Update checking to use is_attach field.
	(goacc_enter_data_internal): Don't affect reference counts
	for attach mappings.
	(goacc_exit_data_internal): Don't affect reference counts for detach
	mappings.
	* target.c (gomp_map_vars_existing): Don't affect reference counts for
	attach mappings.
	(gomp_map_vars_internal): Set renamed is_attach flag unconditionally to
	mark attach mappings.
	(gomp_unmap_vars_internal): Use is_attach flag to prevent affecting
	reference count for attach mappings.
	* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
	* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
	test as shouldfail.
	* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
	gracefully in no-finalize mode.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2020-07-27 09:16:57 -07:00
Julian Brown
6f5b4b64d2 openacc: Adjust dynamic reference count semantics
This patch adjusts how dynamic reference counts work so that they match
the semantics of the source program more closely, instead of representing
"excess" reference counts beyond those that represent pointers in the
internal libgomp splay-tree data structure. This allows some corner
cases to be handled more gracefully.

2020-07-10  Julian Brown  <julian@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>

	libgomp/
	* libgomp.h (struct splay_tree_key_s): Change virtual_refcount to
	dynamic_refcount.
	(struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA.
	* oacc-mem.c (acc_map_data): Substitute virtual_refcount for
	dynamic_refcount.
	(acc_unmap_data): Update comment.
	(goacc_map_var_existing, goacc_enter_datum): Adjust for
	dynamic_refcount semantics.
	(goacc_exit_datum_1, goacc_exit_datum): Re-add some error checking.
	Adjust for dynamic_refcount semantics.
	(goacc_enter_data_internal): Implement "present" case of dynamic
	memory-map handling here.  Update "non-present" case for
	dynamic_refcount semantics.
	(goacc_exit_data_internal): Use goacc_exit_datum_1.
	* target.c (gomp_map_vars_internal): Remove
	GOMP_MAP_VARS_OPENACC_ENTER_DATA handling.  Update for dynamic_refcount
	handling.
	(gomp_unmap_vars_internal): Remove virtual_refcount handling.
	(gomp_load_image_to_device): Substitute dynamic_refcount for
	virtual_refcount.
	* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Remove XFAILs.
	* testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c: New test.
	* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Remove XFAILs and
	trace output.
	* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Remove
	trace output.
	* testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c:
	Remove stale comment.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-1.f90: Remove XFAILs.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-2.F90: Likewise.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-3-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2020-07-10 08:07:12 -07:00
Jakub Jelinek
800bcc8c00 openmp: Add basic library allocator support.
This patch adds very basic allocator support (omp_{init,destroy}_allocator,
omp_{alloc,free}, omp_[sg]et_default_allocator).
The plan is to use memkind (likely dlopened) for high bandwidth memory, but
that part isn't implemented yet, probably mlock for pinned memory and see
what other options there are for other kinds of memory.
For offloading targets, we need to decide if we want to support the
dynamic allocators (and on which targets), or if e.g. all we do is at compile
time replace omp_alloc/omp_free calls with constexpr predefined allocators
with something special.

And allocate directive and allocator/uses_allocators clauses are future work
too.

2020-05-19  Jakub Jelinek  <jakub@redhat.com>

	* omp.h.in (omp_uintptr_t): New typedef.
	(__GOMP_UINTPTR_T_ENUM): Define.
	(omp_memspace_handle_t, omp_allocator_handle_t, omp_alloctrait_key_t,
	omp_alloctrait_value_t, omp_alloctrait_t): New typedefs.
	(__GOMP_DEFAULT_NULL_ALLOCATOR): Define.
	(omp_init_allocator, omp_destroy_allocator, omp_set_default_allocator,
	omp_get_default_allocator, omp_alloc, omp_free): Declare.
	* libgomp.h (struct gomp_team_state): Add def_allocator field.
	(gomp_def_allocator): Declare.
	* libgomp.map (OMP_5.0.1): Export omp_set_default_allocator,
	omp_get_default_allocator, omp_init_allocator, omp_destroy_allocator,
	omp_alloc and omp_free.
	* team.c (gomp_team_start): Copy over ts.def_allocator.
	* env.c (gomp_def_allocator): New variable.
	(parse_wait_policy): Adjust function comment.
	(parse_allocator): New function.
	(handle_omp_display_env): Print OMP_ALLOCATOR.
	(initialize_env): Call parse_allocator.
	* Makefile.am (libgomp_la_SOURCES): Add allocator.c.
	* allocator.c: New file.
	* icv.c (omp_set_default_allocator, omp_get_default_allocator): New
	functions.
	* testsuite/libgomp.c-c++-common/alloc-1.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-2.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-3.c: New test.
	* Makefile.in: Regenerated.
2020-05-19 10:11:01 +02:00
Thomas Schwinge
6fc0385c0c OpenACC 'acc_get_property' cleanup
include/
	* gomp-constants.h (enum gomp_device_property): Remove.
	libgomp/
	* libgomp-plugin.h (enum goacc_property): New.  Adjust all users
	to use this instead of 'enum gomp_device_property'.
	(GOMP_OFFLOAD_get_property): Rename to...
	(GOMP_OFFLOAD_openacc_get_property): ... this.  Adjust all users.
	* libgomp.h (struct gomp_device_descr): Move
	'GOMP_OFFLOAD_openacc_get_property'...
	(struct acc_dispatch_t): ... here.  Adjust all users.
	* plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): Remove.
	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property):
	Remove.

From-SVN: r280150
2020-01-10 23:24:36 +01:00
Jakub Jelinek
91df4397a1 re PR libgomp/93219 (unused return value in affinity-fmt.c)
PR libgomp/93219
	* libgomp.h (gomp_print_string): Change return type from void to int.
	* affinity-fmt.c (gomp_print_string): Likewise.  Return true if
	not all characters have been written.

From-SVN: r280137
2020-01-10 21:42:00 +01:00
Jakub Jelinek
8d9254fc8a Update copyright years.
From-SVN: r279813
2020-01-01 12:51:42 +01:00
Maciej W. Rozycki
6c84c8bf9b Add OpenACC 2.6 `acc_get_property' support
Add generic support for the OpenACC 2.6 `acc_get_property' and
`acc_get_property_string' routines, as well as full handlers for the
host and the NVPTX offload targets and minimal handlers for the HSA,
Intel MIC, and AMD GCN offload targets.

Included are C/C++ and Fortran tests that, in particular, print
the property values for acc_property_vendor, acc_property_memory,
acc_property_free_memory, acc_property_name, and acc_property_driver.
The output looks as follows:

Vendor: GNU
Name: GOMP
Total memory: 0
Free memory: 0
Driver: 1.0

with the host driver (where the memory related properties are not
supported for the host device and yield 0, conforming to the standard)
and output like:

Vendor: Nvidia
Total memory: 12651462656
Free memory: 12202737664
Name: TITAN V
Driver: CUDA Driver 9.1

with the NVPTX driver.

2019-12-22  Maciej W. Rozycki  <macro@codesourcery.com>
	    Frederik Harwath  <frederik@codesourcery.com>
	    Thomas Schwinge  <tschwinge@codesourcery.com>

	include/
	* gomp-constants.h (gomp_device_property): New enum.

	libgomp/
	* libgomp.h (gomp_device_descr): Add `get_property_func' member.
	* libgomp-plugin.h (gomp_device_property_value): New union.
	(gomp_device_property_value): New prototype.
	* openacc.h (acc_device_t): Add `acc_device_current' enumeration
	constant.
	(acc_device_property_t): New enum.
	(acc_get_property, acc_get_property_string): New prototypes.
	* oacc-init.c (acc_get_device_type): Also assert that result
	is not `acc_device_current'.
	(get_property_any, acc_get_property, acc_get_property_string):
	New functions.
	* openacc.f90 (openacc_kinds): Add `acc_device_current' and
	`acc_property_memory', `acc_property_free_memory',
	`acc_property_name', `acc_property_vendor' and
	`acc_property_driver' constants.  Add `acc_device_property' data
	type.
	(openacc_internal): Add `acc_get_property' and
	`acc_get_property_string' interfaces.  Add `acc_get_property_h',
	`acc_get_property_string_h', `acc_get_property_l' and
	`acc_get_property_string_l'.
	* oacc-host.c (host_get_property): New function.
	(host_dispatch): Wire it.
	* target.c (gomp_load_plugin_for_device): Handle `get_property'.
	* libgomp.map (OACC_2.6): Add `acc_get_property', `acc_get_property_h_',
	`acc_get_property_string' and `acc_get_property_string_h_' symbols.
	* libgomp.texi (OpenACC Runtime Library Routines): Add
	`acc_get_property'.
	(acc_get_property): New node.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_property): New
	function (stub).
	* plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function.
	* plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName',
	`cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo'
	calls.
	(GOMP_OFFLOAD_get_property): New function.
	(struct ptx_device): Add new field "name".
	(cuda_driver_version_s): Add new static variable ...
	(nvptx_init): ... and init from here.

	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: New file
	with test helper functions.

	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: New test.

	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property):
	New function.

Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>


Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <tschwinge@codesourcery.com>

From-SVN: r279710
2019-12-22 19:54:09 +00:00
Julian Brown
8e7e71ff24 OpenACC 2.6 deep copy: libgomp parts
include/
	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_4, GOMP_MAP_DEEP_COPY):
	Define.
	(gomp_map_kind): Add GOMP_MAP_ATTACH, GOMP_MAP_DETACH,
	GOMP_MAP_FORCE_DETACH.

	libgomp/
	* libgomp.h (struct target_var_desc): Add do_detach flag.
	* oacc-init.c (acc_shutdown_1): Free aux block if present.
	* oacc-mem.c (find_group_last): Add SIZES parameter. Support
	struct components.  Tidy up and add some new checks.
	(goacc_enter_data_internal): Update call to find_group_last.
	(goacc_exit_data_internal): Support detach operations and
	GOMP_MAP_STRUCT.
	(GOACC_enter_exit_data): Handle initial GOMP_MAP_STRUCT or
	GOMP_MAP_FORCE_PRESENT in finalization detection code.  Handle
	attach/detach in enter/exit data detection code.
	* target.c (gomp_map_vars_existing): Initialise do_detach field of
	tgt_var_desc.
	(gomp_map_vars_internal): Support attach.
	(gomp_unmap_vars_internal): Support detach.

From-SVN: r279625
2019-12-20 01:20:30 +00:00
Julian Brown
5d5be7bfb5 OpenACC 2.6 deep copy: attach/detach API routines
libgomp/
	* libgomp.h (struct splay_tree_aux): Add attach_count field.
	(gomp_attach_pointer, gomp_detach_pointer): Add prototypes.
	* libgomp.map (OACC_2.6): New section. Add acc_attach,
	acc_attach_async, acc_detach, acc_detach_async, acc_detach_finalize,
	acc_detach_finalize_async.
	* oacc-mem.c (acc_attach_async, acc_attach, goacc_detach_internal,
	acc_detach, acc_detach_async, acc_detach_finalize,
	acc_detach_finalize_async): New functions.
	* openacc.h (acc_attach, acc_attach_async, acc_detach,
	(acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): Add
	prototypes.
	* target.c (gomp_attach_pointer, gomp_detach_pointer): New functions.
	(gomp_remove_var_internal): Free attachment counts if present.
	* testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

From-SVN: r279624
2019-12-20 01:20:27 +00:00
Julian Brown
5bcd470bf0 Use gomp_map_val for OpenACC host-to-device address translation
libgomp/
	* libgomp.h (gomp_map_val): Add prototype.
	* oacc-parallel.c (GOACC_parallel_keyed): Use gomp_map_val instead of
	open-coding device-address calculation.
	* target.c (gomp_map_val): Make global. Use OFFSET_POINTER in
	non-present case.

Co-Authored-By: Cesar Philippidis <cesar@codesourcery.com>

From-SVN: r279622
2019-12-20 01:20:19 +00:00
Julian Brown
378da98fcc OpenACC reference count overhaul
libgomp/
	* libgomp.h (struct splay_tree_key_s): Substitute dynamic_refcount
	field for virtual_refcount.
	(enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
	(gomp_free_memmap): Remove prototype.
	* oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var
	instead of calling gomp_free_memmap.
	* oacc-mem.c (acc_map_data): Use virtual_refcount instead of
	dynamic_refcount.
	(acc_unmap_data): Open code instead of forcing target_mem_desc's
	to_free field to NULL then calling gomp_unmap_vars.  Handle
	REFCOUNT_INFINITY on target blocks.
	(goacc_enter_data): Rename to...
	(goacc_enter_datum): ...this.  Remove MAPNUM parameter and special
	handling for mapping groups.  Use virtual_refcount instead of
	dynamic_refcount.  Use GOMP_MAP_VARS_OPENACC_ENTER_DATA for
	map_map_vars_async call.  Re-do lookup for target pointer return value.
	(acc_create, acc_create_async, acc_copyin, acc_copyin_async): Call
	renamed goacc_enter_datum function.
	(goacc_exit_data): Rename to...
	(goacc_exit_datum): ...this.  Update for virtual_refcount semantics.
	(acc_delete, acc_delete_async, acc_delete_finalize,
	acc_delete_finalize_async, acc_copyout, acc_copyout_async,
	acc_copyout_finalize, acc_copyout_finalize_async): Call renamed
	goacc_exit_datum function.
	(gomp_acc_remove_pointer, find_pointer): Remove functions.
	(find_group_last, goacc_enter_data_internal, goacc_exit_data_internal):
	New functions.
	(GOACC_enter_exit_data): Use goacc_enter_data_internal and
	goacc_exit_data_internal helper functions.
	* target.c (gomp_map_vars_internal): Handle
	GOMP_MAP_VARS_OPENACC_ENTER_DATA.  Update for virtual_refcount
	semantics.
	(gomp_unmap_vars_internal): Update for virtual_refcount semantics.
	(gomp_load_image_to_device, omp_target_associate_ptr): Zero-initialise
	virtual_refcount field instead of dynamic_refcount.
	(gomp_free_memmap): Remove function.
	* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: New test.
	* testsuite/libgomp.c-c++-common/unmap-infinity-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Add XFAIL.

From-SVN: r279621
2019-12-20 01:20:16 +00:00
Julian Brown
2a656a9359 Use aux struct in libgomp for infrequently-used/API-specific data
libgomp/
	* libgomp.h (struct splay_tree_aux): New.
	(struct splay_tree_key_s): Replace link_key field with aux pointer.
	* target.c (gomp_map_vars_internal): Adjust for link_key being moved
	to aux struct.
	(gomp_remove_var_internal): Free aux block if present.
	(gomp_load_image_to_device): Zero-initialise aux field instead of
	link_key field.
	(omp_target_associate_pointer): Zero-initialise aux field.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

From-SVN: r279620
2019-12-20 01:20:13 +00:00
Thomas Schwinge
6278b54922 Make 'libgomp/target.c:gomp_unmap_tgt' 'static' again
This got changed to 'attribute_hidden' in r271128, but it's not actually used
outside of 'libgomp/target.c'.

	libgomp/
	* target.c (gomp_unmap_tgt): Make it 'static'.
	* libgomp.h (gomp_unmap_tgt): Remove.

From-SVN: r279529
2019-12-18 18:00:28 +01:00
Julian Brown
1cbd94e834 Fix potential race condition in OpenACC "exit data" operations
PR libgomp/92881

	libgomp/
	* libgomp.h (gomp_remove_var_async): Add prototype.
	* oacc-mem.c (delete_copyout): Call gomp_remove_var_async instead of
	gomp_remove_var.
	* target.c (gomp_unref_tgt): Change return type to bool, indicating
	whether target_mem_desc was unmapped.
	(gomp_unref_tgt_void): New.
	(gomp_remove_var): Reimplement in terms of...
	(gomp_remove_var_internal): ...this new helper function.
	(gomp_remove_var_async): New, implemented using above helper function.
	(gomp_unmap_vars_internal): Use gomp_unref_tgt_void instead of
	gomp_unref_tgt.

Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>

From-SVN: r279388
2019-12-13 23:14:15 +00:00
Thomas Schwinge
57963e3934 [OpenACC] Consolidate 'GOACC_enter_exit_data' and its helper functions in 'libgomp/oacc-mem.c'
libgomp/
	* oacc-parallel.c (find_pointer, GOACC_enter_exit_data): Move...
	* oacc-mem.c: ... here.
	(gomp_acc_insert_pointer, gomp_acc_remove_pointer): Rename to
	'goacc_insert_pointer', 'goacc_remove_pointer', and make 'static'.
	* libgomp.h (gomp_acc_insert_pointer, gomp_acc_remove_pointer):
	Remove.
	* libgomp_g.h: Update.

From-SVN: r279233
2019-12-11 17:49:17 +01:00
Thomas Schwinge
47afc7b4dd [PR92116, PR92877] [OpenACC] Replace 'openacc.data_environ' by standard libgomp mechanics
libgomp/
	PR libgomp/92116
	PR libgomp/92877
	* oacc-mem.c (lookup_dev): Reimplement.  Adjust all users.
	* libgomp.h (struct acc_dispatch_t): Remove 'data_environ' member.
	Adjust all users.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c:
	Remove XFAIL.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr92877-1.c: New file.

Co-Authored-By: Julian Brown <julian@codesourcery.com>

From-SVN: r279147
2019-12-09 23:52:56 +01:00
Andrew Stubbs
cee1645106 Optimize GCN OpenMP malloc performance
2019-11-13  Andrew Stubbs  <ams@codesourcery.com>

	libgomp/
	* config/gcn/team.c (gomp_gcn_enter_kernel): Set up the team arena
	and use team_malloc variants.
	(gomp_gcn_exit_kernel): Use team_free.
	* libgomp.h (TEAM_ARENA_SIZE): Define.
	(TEAM_ARENA_START): Define.
	(TEAM_ARENA_FREE): Define.
	(TEAM_ARENA_END): Define.
	(team_malloc): New function.
	(team_malloc_cleared): New function.
	(team_free): New function.
	* team.c (gomp_new_team): Initialize and use team_malloc.
	(free_team): Use team_free.
	(gomp_free_thread): Use team_free.
	(gomp_pause_host): Use team_free.
	* work.c (gomp_init_work_share): Use team_malloc.
	(gomp_fini_work_share): Use team_free.

From-SVN: r278136
2019-11-13 12:38:09 +00:00