Commit graph

156 commits

Author SHA1 Message Date
Thomas Schwinge
130c2f3c3a libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation
... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.

Follow-up to commit 131d18e928
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c8
"libgomp: Handle OpenMP's reverse offloads".

	libgomp/
	* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
	'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
	* libgomp.h (gomp_target_rev): Adjust.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
	* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
	* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
	(rev_off_host_to_dev_cpy): Remove.
	(GOMP_OFFLOAD_run): Adjust.
2023-05-08 15:58:05 +02:00
Thomas Schwinge
f8332e52a4 Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.

This depends on commit 2b2340e236
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.

	PR libgomp/90596
	libgomp/
	* target.c (gomp_map_vars_internal): Allow for
	'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'.
	* oacc-parallel.c (GOACC_parallel_keyed): Pass
	'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'.
	* plugin/plugin-gcn.c (alloc_by_agent, gcn_exec)
	(GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec):
	Adjust, simplify.
	(gomp_offload_free): Remove.
	* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
	(GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify.
	(cuda_free_argmem): Remove.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Adjust.
2023-03-10 18:05:27 +01:00
Thomas Schwinge
199867d07b Simplify OpenACC 'no_create' clause implementation
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then
simplify the device plugins accordingly.

This is a follow-up to
Subversion r279551 (Git commit a6163563f2)
"Add OpenACC 2.6's no_create",
Subversion r279622 (Git commit 5bcd470bf0)
"Use gomp_map_val for OpenACC host-to-device address translation".

	libgomp/
	* target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for
	'GOMP_MAP_IF_PRESENT'.
	* plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec)
	(GOMP_OFFLOAD_openacc_async_exec): Adjust.
	* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
	(GOMP_OFFLOAD_openacc_async_exec): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async'
	testing.
	* testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
2023-03-10 15:48:43 +01:00
Thomas Schwinge
649f1939ba Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position
For an OpenACC compute construct, we've currently got:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - free memory
  - acc_ev_free
  - acc_ev_enqueue_launch_end

This confused another thing that I'm working on, so I adjusted that to:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - acc_ev_enqueue_launch_end
  - free memory
  - acc_ev_free

Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in
'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.

	libgomp/
	* plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end'
	position.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Verify 'acc_ev_alloc', 'acc_ev_free'.
2023-03-10 15:05:01 +01:00
Tobias Burnus
f84fdb134d libgomp: enable reverse offload for AMDGCN
libgomp/ChangeLog:

	* libgomp.texi (5.0 Impl. Status, gcn specifics): Update for
	reverse offload.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept
	reverse-offload requirement.
2023-02-03 08:33:17 +01:00
Andrew Stubbs
f6fff8a6fc amdgcn, libgomp: Manually allocated stacks
Switch from using stacks in the "private segment" to using a memory block
allocated on the host side.  The primary reason is to permit the reverse
offload implementation to access values located on the device stack, but
there may also be performance benefits, especially with repeated kernel
invocations.

This implementation unifies the stacks with the "team arena" optimization
feature, and now allows both to have run-time configurable sizes.

A new ABI is needed, so all libraries must be rebuilt, and newlib must be
version 4.3.0.20230120 or newer.

gcc/ChangeLog:

	* config/gcn/gcn-run.cc: Include libgomp-gcn.h.
	(struct kernargs): Replace the common content with kernargs_abi.
	(struct heap): Delete.
	(main): Read GCN_STACK_SIZE envvar.
	Allocate space for the device stacks.
	Write the new kernargs fields.
	* config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt.
	(default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and
	PRIVATE_SEGMENT_WAVE_OFFSET_ARG.
	(gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content.
	(gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top.
	Set up the stacks from the values in the kernargs, not private.
	(gcn_expand_builtin_1): Match the stack configuration in the prologue.
	(gcn_hsa_declare_function_name): Turn off the private segment.
	(gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed.
	* config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register.
	* config/gcn/gcn.opt (mstack-size): Change the description.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION_GCN): Bump.

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define.
	(DEFAULT_TEAM_ARENA_SIZE): New define.
	(struct heap): Move to this file.
	(struct kernargs_abi): Likewise.
	* config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from
	the kernargs.
	* libgomp.h: Include libgomp-gcn.h.
	(TEAM_ARENA_SIZE): Remove.
	(team_malloc): Update the error message.
	* plugin/plugin-gcn.c (struct kernargs): Move common content to
	struct kernargs_abi.
	(struct agent_info): Rename team arenas to ephemeral memories.
	(struct team_arena_list): Rename ....
	(struct ephemeral_memories_list): to this.
	(struct heap): Delete.
	(team_arena_size): New variable.
	(stack_size): New variable.
	(print_kernel_dispatch): Update debug messages.
	(init_environment_variables): Read GCN_TEAM_ARENA_SIZE.
	Read GCN_STACK_SIZE.
	(get_team_arena): Rename ...
	(configure_ephemeral_memories): ... to this, and set up stacks.
	(release_team_arena): Rename ...
	(release_ephemeral_memories): ... to this.
	(destroy_team_arenas): Rename ...
	(destroy_ephemeral_memories): ... to this.
	(create_kernel_dispatch): Add num_threads parameter.
	Adjust for kernargs_abi refactor and ephemeral memories.
	(release_kernel_dispatch): Adjust for ephemeral memories.
	(run_kernel): Pass thread-count to create_kernel_dispatch.
	(GOMP_OFFLOAD_init_device): Adjust for ephemeral memories.
	(GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr47237.c: Xfail on amdgcn.
	* gcc.dg/builtin-apply3.c: Xfail for amdgcn.
	* gcc.dg/builtin-apply4.c: Xfail for amdgcn.
	* gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn.
	* gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
2023-02-02 11:47:03 +00:00
Jakub Jelinek
83ffe9cde7 Update copyright years. 2023-01-16 11:52:17 +01:00
Tobias Burnus
ea4b23d9c8 libgomp: Handle OpenMP's reverse offloads
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called.  And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.

libgomp/ChangeLog:

	* libgomp.h (struct target_mem_desc): Predeclare; move
	below after 'reverse_splay_tree_node' and add rev_array
	member.
	(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
	(reverse_splay_tree_node, reverse_splay_tree,
	reverse_splay_tree_key): New typedef.
	(struct gomp_device_descr): Add mem_map_rev member.
	* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
	support for GOMP_REQUIRES_REVERSE_OFFLOAD.
	* splay-tree.h (splay_tree_callback_stop): New typedef; like
	splay_tree_callback but returning int not void.
	(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
	taking splay_tree_callback_stop as argument.
	* splay-tree.c (splay_tree_foreach_internal_lazy,
	splay_tree_foreach_lazy): New; but early exit if callback returns
	nonzero.
	* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
	(gomp_map_lookup_rev): New.
	(gomp_load_image_to_device): Handle reverse-offload function
	lookup table.
	(gomp_unload_image_from_device): Free devicep->mem_map_rev.
	(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
	gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
	gomp_map_cdata_lookup): New auxiliary structs and functions for
	gomp_target_rev.
	(gomp_target_rev): Implement reverse offloading and its mapping.
	(gomp_target_init): Init current_device.mem_map_rev.root.
	* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
	mapping of on-device allocated variables.
2022-12-10 13:42:08 +01:00
Marcel Vollweiler
81476bc4f4 OpenMP: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.

Additionally, a limitation of the number of teams on gcn offload devices is
implemented.  The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit).  This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory.  Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).

gcc/ChangeLog:

	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
	to "-2" instead of "1" for non-existing num_teams clause in order to
	disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
	allow processing of device-specific values.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* icv-device.c (omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	* icv.c (omp_set_teams_thread_limit): Removed.
	(omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* libgomp.texi: Updated documentation for nvptx and gcn corresponding
	to the limitation of the number of teams.
	* plugin/plugin-gcn.c (limit_teams): New helper function that limits
	the number of teams by twice the number of compute units.
	(parse_target_attributes): Limit the number of teams on gcn offload
	devices.
	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
	handling.
	(gomp_load_image_to_device): Added a size check for the ICVs struct
	variable.
	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
	copy back the ICV values from device to host.
	(GOMP_target_ext): Update the number of teams and threads in the kernel
	args also considering device-specific values.
	* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
	of OMP_TEAMS_THREAD_LIMIT from the environment.
	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
	* testsuite/libgomp.fortran/icv-5.f90: New test.
	* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
	num_teams from "1" to "-2" in cases without num_teams clause.
	* g++.dg/gomp/target-teams-1.C: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
2022-12-06 06:03:50 -08:00
Tobias Burnus
9f9d128f45 libgomp: Add no-target-region rev offload test + fix plugin-nvptx
OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case.  This commit adds a testcase for this.

In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that
it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
	for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
	as valid and the code having no reverse-offload code.
	* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.
2022-11-25 13:48:17 +01:00
Tobias Burnus
6edcb5dc42 libgomp/gcn: fix/improve struct output
output.printf_data.(value union) contains text[128], which has the size
of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2]
could be extended to value_u64[6] - sufficient for all required arguments
to gomp_target_rev.  Additionally, next_output.printf_data.(msg union)
contained msg_u64 which then is no longer needed and also caused 32bit
vs 64bit alignment issues.

libgomp/
	* config/gcn/libgomp-gcn.h (struct output):
	Remove 'msg_u64' from the union, change
	value_u64[2] to value_u64[6].
	* config/gcn/target.c (GOMP_target_ext): Update accordingly.
	* plugin/plugin-gcn.c (process_reverse_offload, console_output):
	Likewise.
2022-11-21 15:22:35 +01:00
Tobias Burnus
8c05d8cd43 libgomp/gcn: Prepare for reverse-offload callback handling
libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h: New file; contains
	struct output, declared previously in plugin-gcn.c.
	* config/gcn/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare as extern var.
	(GOMP_target_ext): Handle reverse offload.
	* plugin/plugin-gcn.c: Include libgomp-gcn.h.
	(struct kernargs): Replace struct def by the one
	from libgomp-gcn.h for output_data.
	(process_reverse_offload): New.
	(console_output): Call it.
2022-11-19 10:36:27 +01:00
Thomas Schwinge
e4cba49413 Remove support for Intel MIC offloading
... after its deprecation in GCC 12.

	* Makefile.def: Remove module 'liboffloadmic'.
	* Makefile.in: Regenerate.
	* configure.ac: Remove 'liboffloadmic' handling.
	* configure: Regenerate.
	contrib/
	* gcc-changelog/git_commit.py (default_changelog_locations):
	Remove 'liboffloadmic'.
	* gcc_update (files_and_dependencies): Remove 'liboffloadmic'
	files.
	* update-copyright.py (GCCCmdLine): Remove 'liboffloadmic'
	comment.
	gcc/
	* config.gcc [target *-intelmic-* | *-intelmicemul-*]: Remove.
	* config/i386/i386-options.cc (ix86_omp_device_kind_arch_isa)
	[ACCEL_COMPILER]: Remove.
	* config/i386/intelmic-mkoffload.cc: Remove.
	* config/i386/intelmic-offload.h: Likewise.
	* config/i386/t-intelmic: Likewise.
	* config/i386/t-omp-device: Likewise.
	* configure.ac [target *-intelmic-* | *-intelmicemul-*]: Remove.
	* configure: Regenerate.
	* doc/install.texi (--enable-offload-targets=[...]): Update.
	* doc/sourcebuild.texi: Remove 'liboffloadmic' documentation.
	include/
	* gomp-constants.h (GOMP_DEVICE_INTEL_MIC): Comment out.
	(GOMP_VERSION_INTEL_MIC): Remove.
	libgomp/
	* libgomp-plugin.h (OFFLOAD_TARGET_TYPE_INTEL_MIC): Remove.
	* libgomp.texi (OpenMP Context Selectors): Remove Intel MIC
	documentation.
	* plugin/configfrag.ac <enable_offload_targets>
	[*-intelmic-* | *-intelmicemul-*]: Remove.
	* configure: Regenerate.
	* testsuite/lib/libgomp.exp (libgomp_init): Remove 'liboffloadmic'
	handling.
	(offload_target_to_openacc_device_type)
	[$offload_target = *-intelmic*]: Remove.
	(check_effective_target_offload_device_intel_mic)
	(check_effective_target_offload_device_any_intel_mic): Remove.
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(device_arch_intel_mic, on_device_arch_intel_mic, any_device_arch)
	(any_device_arch_intel_mic): Remove.
	* testsuite/libgomp.c-c++-common/target-45.c: Remove
	'offload_device_any_intel_mic' XFAIL.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
	liboffloadmic/
	* ChangeLog: Remove.
	* Makefile.am: Likewise.
	* Makefile.in: Likewise.
	* aclocal.m4: Likewise.
	* configure: Likewise.
	* configure.ac: Likewise.
	* configure.tgt: Likewise.
	* doc/doxygen/config: Likewise.
	* doc/doxygen/header.tex: Likewise.
	* include/coi/common/COIEngine_common.h: Likewise.
	* include/coi/common/COIEvent_common.h: Likewise.
	* include/coi/common/COIMacros_common.h: Likewise.
	* include/coi/common/COIPerf_common.h: Likewise.
	* include/coi/common/COIResult_common.h: Likewise.
	* include/coi/common/COISysInfo_common.h: Likewise.
	* include/coi/common/COITypes_common.h: Likewise.
	* include/coi/sink/COIBuffer_sink.h: Likewise.
	* include/coi/sink/COIPipeline_sink.h: Likewise.
	* include/coi/sink/COIProcess_sink.h: Likewise.
	* include/coi/source/COIBuffer_source.h: Likewise.
	* include/coi/source/COIEngine_source.h: Likewise.
	* include/coi/source/COIEvent_source.h: Likewise.
	* include/coi/source/COIPipeline_source.h: Likewise.
	* include/coi/source/COIProcess_source.h: Likewise.
	* liboffloadmic_host.spec.in: Likewise.
	* liboffloadmic_target.spec.in: Likewise.
	* plugin/Makefile.am: Likewise.
	* plugin/Makefile.in: Likewise.
	* plugin/aclocal.m4: Likewise.
	* plugin/configure: Likewise.
	* plugin/configure.ac: Likewise.
	* plugin/libgomp-plugin-intelmic.cpp: Likewise.
	* plugin/offload_target_main.cpp: Likewise.
	* runtime/cean_util.cpp: Likewise.
	* runtime/cean_util.h: Likewise.
	* runtime/coi/coi_client.cpp: Likewise.
	* runtime/coi/coi_client.h: Likewise.
	* runtime/coi/coi_server.cpp: Likewise.
	* runtime/coi/coi_server.h: Likewise.
	* runtime/compiler_if_host.cpp: Likewise.
	* runtime/compiler_if_host.h: Likewise.
	* runtime/compiler_if_target.cpp: Likewise.
	* runtime/compiler_if_target.h: Likewise.
	* runtime/dv_util.cpp: Likewise.
	* runtime/dv_util.h: Likewise.
	* runtime/emulator/coi_common.h: Likewise.
	* runtime/emulator/coi_device.cpp: Likewise.
	* runtime/emulator/coi_device.h: Likewise.
	* runtime/emulator/coi_host.cpp: Likewise.
	* runtime/emulator/coi_host.h: Likewise.
	* runtime/emulator/coi_version_asm.h: Likewise.
	* runtime/emulator/coi_version_linker_script.map: Likewise.
	* runtime/liboffload_error.c: Likewise.
	* runtime/liboffload_error_codes.h: Likewise.
	* runtime/liboffload_msg.c: Likewise.
	* runtime/liboffload_msg.h: Likewise.
	* runtime/mic_lib.f90: Likewise.
	* runtime/offload.h: Likewise.
	* runtime/offload_common.cpp: Likewise.
	* runtime/offload_common.h: Likewise.
	* runtime/offload_engine.cpp: Likewise.
	* runtime/offload_engine.h: Likewise.
	* runtime/offload_env.cpp: Likewise.
	* runtime/offload_env.h: Likewise.
	* runtime/offload_host.cpp: Likewise.
	* runtime/offload_host.h: Likewise.
	* runtime/offload_iterator.h: Likewise.
	* runtime/offload_omp_host.cpp: Likewise.
	* runtime/offload_omp_target.cpp: Likewise.
	* runtime/offload_orsl.cpp: Likewise.
	* runtime/offload_orsl.h: Likewise.
	* runtime/offload_table.cpp: Likewise.
	* runtime/offload_table.h: Likewise.
	* runtime/offload_target.cpp: Likewise.
	* runtime/offload_target.h: Likewise.
	* runtime/offload_target_main.cpp: Likewise.
	* runtime/offload_timer.h: Likewise.
	* runtime/offload_timer_host.cpp: Likewise.
	* runtime/offload_timer_target.cpp: Likewise.
	* runtime/offload_trace.cpp: Likewise.
	* runtime/offload_trace.h: Likewise.
	* runtime/offload_util.cpp: Likewise.
	* runtime/offload_util.h: Likewise.
	* runtime/ofldbegin.cpp: Likewise.
	* runtime/ofldend.cpp: Likewise.
	* runtime/orsl-lite/include/orsl-lite.h: Likewise.
	* runtime/orsl-lite/lib/orsl-lite.c: Likewise.
	* runtime/orsl-lite/version.txt: Likewise.
2022-11-04 10:51:01 +01:00
Thomas Schwinge
205538832b libgomp/nvptx: Prepare for reverse-offload callback handling, resolve spurious SIGSEGVs
Per commit r13-3460-g131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
I'm seeing a lot of libgomp execution test regressions.  Random
example, 'libgomp.c-c++-common/error-1.c':

    [...]
      GOMP_OFFLOAD_run: kernel main$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1]

    Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
    0x00007ffff793b87d in GOMP_OFFLOAD_run (ord=<optimized out>, tgt_fn=<optimized out>, tgt_vars=<optimized out>, args=<optimized out>) at [...]/source-gcc/libgomp/plugin/plugin-nvptx.c:2127
    2127            if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0)
    (gdb) print ptx_dev
    $1 = (struct ptx_device *) 0x6a55a0
    (gdb) print ptx_dev->rev_data
    $2 = (struct rev_offload *) 0xffffffff00000000
    (gdb) print ptx_dev->rev_data->fn
    Cannot access memory at address 0xffffffff00000000

	libgomp/
	* plugin/plugin-nvptx.c (nvptx_open_device): Initialize
	'ptx_dev->rev_data'.
2022-10-24 21:47:05 +02:00
Tobias Burnus
131d18e928 libgomp/nvptx: Prepare for reverse-offload callback handling
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.

include/ChangeLog:

	* cuda/cuda.h (enum CUdevice_attribute): Add
	CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
	(CU_MEMHOSTALLOC_DEVICEMAP): Define.
	(cuMemHostAlloc): Add prototype.

libgomp/ChangeLog:

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
	'static' for this variable.
	* config/nvptx/libgomp-nvptx.h: New file.
	* config/nvptx/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare extern var.
	(GOMP_REV_OFFLOAD_VAR): Declare var.
	(GOMP_target_ext): Handle reverse offload.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
	* target.c (gomp_target_rev): ... this new stub function.
	* libgomp.h (gomp_target_rev): Declare.
	* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
	* plugin/cuda-lib.def (cuMemHostAlloc): Add.
	* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
	(struct ptx_device): Add rev_data member.
	(nvptx_open_device): Remove async_engines query, last used in
	r10-304-g1f4c5b9b; add unified-address assert check.
	(GOMP_OFFLOAD_get_num_devices): Claim unified address
	support.
	(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
	offload functions exist. Make offload var available
	on host and device.
	(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
	(GOMP_OFFLOAD_run): Handle reverse offload.
2022-10-24 17:04:08 +02:00
Tobias Burnus
50be486dff nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member.
	(record_id): Store func name without quotes, store dim separately.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public attribute to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.
2022-09-09 17:58:02 +02:00
Tobias Burnus
dfd75bf7e9 GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to GCN for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
	similar to pre-existing .offload_var_table.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
	.offload_func_table to populate rev_fn_table when requested.
2022-09-09 17:43:12 +02:00
Tobias Burnus
0fcc0cf9dc libgomp: Prepare for reverse offload fn lookup
Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.

The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
	'uint64_t **rev_fn_table' argument.
	* oacc-host.c (host_load_image): Likewise.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
	currently unused.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Update call but pass
	NULL for now.

liboffloadmic/ChangeLog:

	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
	Add (unused) uint64_t **rev_fn_table argument.
2022-09-09 17:37:09 +02:00
Marcel Vollweiler
9f2fca5659 OpenMP, libgomp: Environment variable syntax extension
This patch considers the environment variable syntax extension for
device-specific variants of environment variables from OpenMP 5.1 (see
OpenMP 5.1 specification, p. 75 and p. 639).  An environment variable (e.g.
OMP_NUM_TEAMS) can have different suffixes:

_DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host.
_DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with
number <device>.
no suffix (e.g. OMP_NUM_TEAMS): affects only the host.

In future OpenMP versions also suffix _ALL will be introduced (see discussion
https://github.com/OpenMP/spec/issues/3179). This is also considered in this
patch:

_ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host.

The precedence is as follows (descending). For the host:

	1. no suffix
	2. _ALL

For devices:

	1. _DEV_<device>
	2. _DEV
	3. _ALL

That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if
available, and at last _ALL.  If there is no value for any of the variable
variants, default values are used as already implemented before.

This patch concerns parsing (a), storing (b), output (c) and transmission to the
device (d):

(a) The actual number of devices and the numbering are not known when parsing
the environment variables.  Thus all environment variables are iterated and
searched for device-specific ones.
(b) Only configured device-specific variables are stored.  Thus, a linked list
is used.
(c) The output is done in omp_display_env (see specification p. 468f).  Global
ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179.
ICVs which are not global but aren't handled device-specific yet are tagged
with [host].  omp_display_env outputs the initial values of the ICVs.  That is
why a dedicated data structure is introduced for the inital values only
(gomp_initial_icv_list).
(d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_default_device): Return device-
	specific ICV.
	(omp_get_max_teams): Added for GCN devices.
	(omp_set_num_teams): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_default_device): Return device-
	specific ICV.
	(omp_get_max_teams): Added for NVPTX devices.
	(omp_set_num_teams): Likewise.
	(ialias): Likewise.
	* env.c (struct gomp_icv_list): New struct to store entries of initial
	ICV values.
	(struct gomp_offload_icv_list): New struct to store entries of device-
	specific ICV values that are copied to the device and back.
	(struct gomp_default_icv_values): New struct to store default values of
	ICVs according to the OpenMP standard.
	(parse_schedule): Generalized for different variants of OMP_SCHEDULE.
	(print_env_var_error): Function that prints an error for invalid values
	for ICVs.
	(parse_unsigned_long_1): Removed getenv.  Generalized.
	(parse_unsigned_long): Likewise.
	(parse_int_1): Likewise.
	(parse_int): Likewise.
	(parse_int_secure): Likewise.
	(parse_unsigned_long_list): Likewise.
	(parse_target_offload): Likewise.
	(parse_bind_var): Likewise.
	(parse_stacksize): Likewise.
	(parse_boolean): Likewise.
	(parse_wait_policy): Likewise.
	(parse_allocator): Likewise.
	(omp_display_env): Extended to output different variants of environment
	variables.
	(print_schedule): New helper function for omp_display_env which prints
	the values of run_sched_var.
	(print_proc_bind): New helper function for omp_display_env which prints
	the values of proc_bind_var.
	(enum gomp_parse_type): Collection of types used for parsing environment
	variables.
	(ENTRY): Preprocess string lengths of environment variables.
	(OMP_VAR_CNT): Preprocess table size.
	(OMP_HOST_VAR_CNT): Likewise.
	(INT_MAX_STR_LEN): Constant for the maximal number of digits of a device
	number.
	(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
	(gomp_set_icv_flag): Sets a flag for a particular ICV.
	(print_device_specific_icvs): New helper function for omp_display_env to
	print device specific ICV values.
	(get_device_num): New helper function for parse_device_specific.
	Extracts the device number from an environment variable name.
	(get_icv_member_addr): Gets the memory address for a particular member
	of an ICV struct.
	(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
	(initialize_icvs): New function to initialize a gomp_initial_icvs
	struct.
	(add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list.
	(startswith): Checks if a string starts with a given prefix.
	(initialize_env): Extended to parse the new syntax of environment
	variables.
	* icv-device.c (omp_get_max_teams): Added.
	(ialias): Likewise.
	(omp_set_num_teams): Likewise.
	* icv.c (omp_set_num_teams): Moved to icv-device.c.
	(omp_get_max_teams): Likewise.
	(ialias): Likewise.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed.
	(GOMP_ADDITIONAL_ICVS): New target-side struct that
	holds the designated ICVs of the target device.
	* libgomp.h (enum gomp_icvs): Collection of ICVs.
	(enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and
	no suffix.
	(enum gomp_env_suffix): Collection of possible suffixes of environment
	variables.
	(struct gomp_initial_icvs): Contains all ICVs for which we need to store
	initial values.
	(struct gomp_default_icv):New struct to hold ICVs for which we need
	to store initial values.
	(struct gomp_icv_list): Definition of a linked list that is used for
	storing ICVs for the devices and also for _DEV, _ALL, and without
	suffix.
	(struct gomp_offload_icvs): New struct to hold ICVs that are copied to
	a device.
	(struct gomp_offload_icv_list): Definition of a linked list that holds
	device-specific ICVs that are copied to devices.
	(gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list.
	(gomp_get_icv_flag): Returns if a flag for a particular ICV is set.
	* libgomp.texi: Updated.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read
	further ICVs from the offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_get_offload_icv_item): Get a list item of
	gomp_offload_icv_list.
	(get_gomp_offload_icvs): New. Returns the ICV values
	depending on the device num and the variable hierarchy.
	(gomp_load_image_to_device): Extended to copy further ICVs to a device.
	* testsuite/libgomp.c-c++-common/icv-5.c: New test.
	* testsuite/libgomp.c-c++-common/icv-6.c: New test.
	* testsuite/libgomp.c-c++-common/icv-7.c: New test.
	* testsuite/libgomp.c-c++-common/icv-8.c: New test.
	* testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test.
	* testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
2022-09-08 10:19:37 -07:00
Tobias Burnus
683f118439 OpenMP: Move omp requires checks to libgomp
Handle reverse_offload, unified_address, and unified_shared_memory
requirements in libgomp by saving them alongside the offload table.
When the device lto1 runs, it extracts the data for mkoffload. The
latter than passes the value on to GOMP_offload_register_ver.

lto1 (either the host one, with -flto [+ ENABLE_OFFLOADING], or in the
offload-device lto1) also does the the consistency check is done,
erroring out when the 'omp requires' clause use is inconsistent.

For all in-principle supported devices, if a requirement cannot be fulfilled,
the device is excluded from the (supported) devices list. Currently, none of
those requirements are marked as supported for any of the non-host devices.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
	c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Set
	OMP_REQUIRES_TARGET_USED.
	(c_parser_omp_requires): Remove sorry.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): Write '#include <stdint.h>'.
	(process_obj): Pass omp_requires_mask to GOMP_offload_register_ver.
	(main): Ask lto1 to obtain omp_requires_mask and pass it on.
	* config/nvptx/mkoffload.cc (process, main): Likewise.
	* lto-cgraph.cc (omp_requires_to_name): New.
	(input_offload_tables): Save omp_requires_mask.
	(output_offload_tables): Read it, check for consistency,
	save value for mkoffload.
	* omp-low.cc (lower_omp_target): Force output_offloadtables
	call for OMP_REQUIRES_TARGET_USED.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_target_data,
	cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
	cp_parser_omp_target_update): Set OMP_REQUIRES_TARGET_USED.
	(cp_parser_omp_requires): Remove sorry.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_requires): Remove sorry.
	* parse.cc (decode_omp_directive): Don't regard 'declare target'
	as target usage for 'omp requires'; add more flags to
	omp_requires_mask.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION): Bump to 2.
	(GOMP_REQUIRES_UNIFIED_ADDRESS, GOMP_REQUIRES_UNIFIED_SHARED_MEMORY,
	GOMP_REQUIRES_REVERSE_OFFLOAD, GOMP_REQUIRES_TARGET_USED):
	New defines.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add
	omp_requires_mask arg.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise;
	return -1 when device available but omp_requires_mask != 0.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
	* oacc-host.c (host_get_num_devices, host_openacc_get_property):
	Update call.
	* oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1,
	goacc_attach_host_thread_to_device, acc_get_num_devices,
	acc_set_device_num, get_property_any): Likewise.
	* target.c (omp_requires_mask): New global var.
	(gomp_requires_to_name): New.
	(GOMP_offload_register_ver): Handle passed omp_requires_mask.
	(gomp_target_init): Handle omp_requires_mask.
	* libgomp.texi (OpenMP 5.0): Update requires impl. status.
	(OpenMP 5.1): Add a missed item.
	(OpenMP 5.2): Mark linear-clause change as supported in C/C++.
	* testsuite/libgomp.c-c++-common/requires-1-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-1.c: New test.
	* testsuite/libgomp.c-c++-common/requires-2-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-2.c: New test.
	* testsuite/libgomp.c-c++-common/requires-3-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-3.c: New test.
	* testsuite/libgomp.c-c++-common/requires-4-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-4.c: New test.
	* testsuite/libgomp.c-c++-common/requires-5-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-5.c: New test.
	* testsuite/libgomp.c-c++-common/requires-6.c: New test.
	* testsuite/libgomp.c-c++-common/requires-7-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-7.c: New test.
	* testsuite/libgomp.fortran/requires-1-aux.f90: New test.
	* testsuite/libgomp.fortran/requires-1.f90: New test.

liboffloadmic/ChangeLog:

	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices):
	Return -1 when device available but omp_requires_mask != 0.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/requires-4.c: Update dg-*.
	* c-c++-common/gomp/reverse-offload-1.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-2.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-5.c: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move
	post-FE checks to ...
	* gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file.
	* gfortran.dg/gomp/requires-8.f90: Update as we don't regard
	'declare target' for the 'requires' usage requirement.

Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-07-04 13:52:02 +02:00
Thomas Schwinge
1459b55d24 libgomp nvptx plugin: Remove '--with-cuda-driver=[...]' etc. configuration option
That means, exposing to the user only the '--without-cuda-driver' behavior:
including the GCC-shipped 'include/cuda/cuda.h' (not system <cuda.h>), and
'dlopen'ing the CUDA Driver library (not linking it).

For development purposes, the libgomp nvptx plugin developer may still manually
override that, to get the previous '--with-cuda-driver' behavior.

	libgomp/
	* plugin/Makefrag.am: Evaluate 'if PLUGIN_NVPTX_DYNAMIC' to true.
	* plugin/configfrag.ac (--with-cuda-driver)
	(--with-cuda-driver-include, --with-cuda-driver-lib)
	(CUDA_DRIVER_INCLUDE, CUDA_DRIVER_LIB, PLUGIN_NVPTX_CPPFLAGS)
	(PLUGIN_NVPTX_LDFLAGS, PLUGIN_NVPTX_LIBS, PLUGIN_NVPTX_DYNAMIC):
	Remove.
	* testsuite/libgomp-test-support.exp.in (cuda_driver_include)
	(cuda_driver_lib): Remove.
	* testsuite/lib/libgomp.exp (libgomp_init): Don't consider these.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-06-10 17:08:57 +02:00
Andrew Stubbs
cde52d3a2d amdgcn: Add gfx90a support
This adds architecture options and multilibs for the AMD GFX90a GPUs.
It also tidies up some of the ISA selection code, and corrects a few small
mistake in the gfx908 naming.

gcc/ChangeLog:

	* config.gcc (amdgcn): Accept --with-arch=gfx908 and gfx90a.
	* config/gcn/gcn-opts.h (enum gcn_isa): New.
	(TARGET_GCN3): Use enum gcn_isa.
	(TARGET_GCN3_PLUS): Likewise.
	(TARGET_GCN5): Likewise.
	(TARGET_GCN5_PLUS): Likewise.
	(TARGET_CDNA1): New.
	(TARGET_CDNA1_PLUS): New.
	(TARGET_CDNA2): New.
	(TARGET_CDNA2_PLUS): New.
	(TARGET_M0_LDS_LIMIT): New.
	(TARGET_PACKED_WORK_ITEMS): New.
	* config/gcn/gcn.cc (gcn_isa): Change to enum gcn_isa.
	(gcn_option_override): Recognise CDNA ISA variants.
	(gcn_omp_device_kind_arch_isa): Support gfx90a.
	(gcn_expand_prologue): Make m0 init optional.
	Add support for packed work items.
	(output_file_start): Support gfx90a.
	(gcn_hsa_declare_function_name): Support gfx90a metadata.
	* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS):Add __CDNA1__ and
	__CDNA2__.
	* config/gcn/gcn.md (<su>mulsi3_highpart): Use TARGET_GCN5_PLUS.
	(<su>mulsi3_highpart_imm): Likewise.
	(<su>mulsidi3): Likewise.
	(<su>mulsidi3_imm): Likewise.
	* config/gcn/gcn.opt (gpu_type): Add gfx90a.
	* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90a): New.
	(main): Support gfx90a.
	* config/gcn/t-gcn-hsa: Add gfx90a multilib.
	* config/gcn/t-omp-device: Add gfx90a isa.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add
	EF_AMDGPU_MACH_AMDGCN_GFX90a.
	(gcn_gfx90a_s): New.
	(isa_hsa_name): Support gfx90a.
	(isa_code): Likewise.
2022-05-24 16:18:14 +01:00
Thomas Schwinge
1f89e48789 libgomp nvptx plugin: Only consider '--with-cuda-driver=[...]' when applicable
They're not applicable in 'PLUGIN_NVPTX_DYNAMIC' configurations.

	libgomp/
	* plugin/Makefrag.am (libgomp_plugin_nvptx_la_CPPFLAGS)
	[PLUGIN_NVPTX_DYNAMIC]: Don't append '$(PLUGIN_NVPTX_CPPFLAGS)'.
	(libgomp_plugin_nvptx_la_LDFLAGS) [PLUGIN_NVPTX_DYNAMIC]: Don't
	append '$(PLUGIN_NVPTX_LDFLAGS)'.
	* Makefile.in: Regenerate.
2022-05-13 14:01:01 +02:00
Thomas Schwinge
dcc266796a Refactor '-ldl' handling for libgomp proper and plugins
Instead of implicit global 'LIBS="-ldl $LIBS"' via 'AC_CHECK_LIB', make
'-ldl' explicit for libgomp proper, and clean up 'PLUGIN_GCN_LIBS',
'PLUGIN_NVPTX_LIBS' accordingly.

	libgomp/
	* Makefile.am (libgomp_la_LIBADD): Initialize.
	* plugin/configfrag.ac (DL_LIBS): New.
	(PLUGIN_GCN_LIBS): Remove.
	(PLUGIN_NVPTX_LIBS): Don't set in the 'PLUGIN_NVPTX_DYNAMIC' case.
	* plugin/Makefrag.am (libgomp_la_LIBADD)
	(libgomp_plugin_gcn_la_LIBADD): Consider '$(DL_LIBS)'.
	(libgomp_plugin_nvptx_la_LIBADD) <PLUGIN_NVPTX_DYNAMIC>: Likewise.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-12 15:11:30 +02:00
Thomas Schwinge
cd644ce8be libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'
Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h> and
'dlopen'ing the CUDA Driver library vs. linking it are separate concerns.

	libgomp/
	* plugin/Makefrag.am: Handle 'PLUGIN_NVPTX_DYNAMIC'.
	* plugin/configfrag.ac (PLUGIN_NVPTX_DYNAMIC): Change
	'AC_DEFINE_UNQUOTED' into 'AM_CONDITIONAL'.
	* plugin/plugin-nvptx.c: Split 'PLUGIN_NVPTX_DYNAMIC' into
	'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and
	'PLUGIN_NVPTX_LINK_LIBCUDA'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
2022-05-12 14:14:13 +02:00
Thomas Schwinge
edbd2b1caa libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX'
Nothing ever used these.

	libgomp/
	* plugin/configfrag.ac: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED'
	for 'PLUGIN_GCN', 'PLUGIN_NVPTX'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-12 13:28:23 +02:00
Thomas Schwinge
876ac21b7e libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib'
With recent commit 2e309a4eff "libgomp testsuite:
Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library",
and commit d6adba3075 "libgomp GCN plugin:
 Clean up unused references to system-provided HSA Runtime library", the last
uses of '--with-hsa-runtime' etc. are gone.

	gcc/
	* doc/install.texi: Don't document '--with-hsa-runtime',
	'--with-hsa-runtime-include', '--with-hsa-runtime-lib'.
	libgomp/
	* plugin/configfrag.ac: Remove '--with-hsa-runtime',
	'--with-hsa-runtime-include', '--with-hsa-runtime-lib' processing.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-11 14:27:42 +02:00
Thomas Schwinge
91a6dcd149 libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'
After recent commit d6adba3075
"libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime
library", these aren't set anymore.

	libgomp/
	* plugin/Makefrag.am (libgomp_plugin_gcn_la_CPPFLAGS): Don't
	consider 'PLUGIN_GCN_CPPFLAGS'.
	(libgomp_plugin_gcn_la_LDFLAGS): Don't consider
	'PLUGIN_GCN_LDFLAGS'.
	* plugin/configfrag.ac (PLUGIN_GCN_CPPFLAGS, PLUGIN_GCN_LDFLAGS):
	Remove.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-11 14:25:58 +02:00
Thomas Schwinge
d6adba3075 libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which
nobody really is doing, as far as I can tell.

Originally changed for the libgomp HSA plugin in
commit b8d89b03db (r242749)
"Remove build dependence on HSA run-time", and later propagated into the GCN
plugin, these are no longer built against system-provided HSA Runtime library.
Instead, unconditionally built against the GCC-shipped 'include/hsa*.h' header
files, and at run time does 'dlopen("libhsa-runtime64.so.1")'.  It thus doesn't
make sense to consider references to system-provided HSA Runtime library during
libgomp GCN plugin build.

	libgomp/
	* plugin/configfrag.ac (HSA_RUNTIME_CPPFLAGS)
	(HSA_RUNTIME_LDFLAGS): Remove.
	* configure: Regenerate.
2022-05-11 14:24:55 +02:00
Tobias Burnus
4a20616107 libgomp/plugin/plugin-gcn.c: Use -foffload-options= in err msg
While -foffload=-<flag> works (never documented legacy feature),
the documented way is to use -foffload-options=.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (isa_matches_agent): Suggest -foffload-options.
2022-05-04 18:40:03 +02:00
Thomas Schwinge
5e431ae4cc Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'
... so that it may be used by other projects that inherit GCC's 'include'
directory.

	include/
	* cuda/cuda.h: New file.
	libgomp/
	* plugin/cuda/cuda.h: Remove file.
	* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
	"cuda/cuda.h" instead of <cuda.h>.
	* plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set
	'PLUGIN_NVPTX_CPPFLAGS'.
	* configure: Regenerate.
2022-04-06 22:30:14 +02:00
Tom de Vries
52f42dce15 [libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac
When building an nvptx offloading configuration on openSUSE Leap 15.3, the
site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting
libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec:
...
| # If user did not specify libexecdir, set the correct target:
| # Nor FHS nor openSUSE allow prefix/libexec. Let's default to prefix/lib.
|
| if test "$libexecdir" = '${exec_prefix}/libexec' ; then
|       libexecdir='${exec_prefix}/lib'
| fi
...

However, in libgomp libgomp/plugin/configfrag.ac we hardcode libexec:
...
    # Configure additional search paths.
    if test x"$tgt_dir" != x; then
      offload_additional_options="$offload_additional_options \
        -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) \
	-B$tgt_dir/bin"
...

Fix this by using /$(libexecdir:\$(exec_prefix)/%=%)/ instead of /libexec/.

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-03-28  Tom de Vries  <tdevries@suse.de>

	* plugin/configfrag.ac: Use /$(libexecdir:\$(exec_prefix)/%=%)/
	instead of /libexec/.
	* configure: Regenerate.
2022-03-28 14:09:02 +02:00
Kwok Cheung Yeung
a78b1ab1df amdgcn: Tune default OpenMP/OpenACC GPU utilization
libgomp/
	* plugin/plugin-gcn.c (parse_target_attributes): Automatically set
	the number of teams and threads if necessary.
	(gcn_exec): Automatically set the number of gangs and workers if
	necessary.

Co-Authored-By: Andrew Stubbs  <ams@codesourcery.com>
2022-01-16 17:25:36 +01:00
Chung-Lin Tang
fbb592407c libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load
In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
	into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
2022-01-04 17:26:23 +08:00
Jakub Jelinek
7adcbafe45 Update copyright years. 2022-01-03 10:42:10 +01:00
Andrew Stubbs
4a87a8e4b1 amdgcn: Change offload variable table discovery
Up to now the libgomp GCN plugin has been finding the offload variables
by using a symbol lookup, but the AMD runtime requires that the symbols are
global for that to work. This was ensured by mkoffload as a post-procssing
step, but the LLVM 13 assembler no longer accepts this in the case where the
variable was previously declared differently.

This patch switches to locating the symbols directly from the
offload_var_table, which means that only one symbol needs to be forced
global.

This changes breaks the libgomp image compatibility so GOMP_VERSION_GCN has
also been bumped.

gcc/ChangeLog:

	* config/gcn/mkoffload.c (process_asm): Process the variable table
	completely differently.
	(process_obj): Encode the varaible data differently.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION_GCN): Bump.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (struct gcn_image_desc): Remove global_variables.
	(GOMP_OFFLOAD_load_image): Locate the offload variables via the
	table, not individual symbols.
2021-12-10 10:45:09 +00:00
Julian Brown
c408512e1f amdgcn: Enable OpenACC worker partitioning for AMD GCN
gcc/
	* config/gcn/gcn.c (gcn_init_builtins): Override decls for
	BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
	BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
	(gcn_goacc_validate_dims): Turn on worker partitioning unconditionally.
	(gcn_fork_join): Update comment.
	* config/gcn/gcn.opt (flag_worker_partitioning): Remove.
	(macc_experimental_workers): Remove unused option.
	libgomp/
	* plugin/plugin-gcn.c (gcn_exec): Change default number of workers to
	16.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
	[acc_device_radeon]: Update.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c
	[ACC_DEVICE_TYPE_radeon]: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
	[acc_device_radeon]: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
	[ACC_DEVICE_TYPE_radeon]: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: XFAIL for
	'openacc_radeon_accel_selected' and '-O0'.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-08-09 15:08:44 +02:00
Chung-Lin Tang
0bac793ed6 openmp: Implement omp_get_device_num routine
This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number"
variable, is defined on the device-side libgomp, has it's address returned to
host-side libgomp during device initialization, and the host libgomp then
sets its value to the designated device number.

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
	omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): New function for
	testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.
2021-08-05 23:29:03 +08:00
Julian Brown
9c41f5b9cd Fix OpenACC "ephemeral" asynchronous host-to-device copies
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.

An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html

and previous versions of this patch were posted here (for mainline/og9):

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html

libgomp/
	* libgomp.h (gomp_copy_host2dev): Update prototype.
	* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
	argument to gomp_copy_host2dev (false).
	* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
	(copy_data): Don't free src.
	(queue_push_copy): Remove free_src handling.
	(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
	(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
	snapshotting.
	(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
	queue_push_copy.
	* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
	(gomp_copy_host2dev): Add EPHEMERAL parameter.  Snapshot source
	data when true, and set up deferred freeing of temporary buffer.
	(gomp_copy_dev2host): Update call to goacc_device_copy_async.
	(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
	(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
	calls to gomp_copy_host2dev with appropriate ephemeral argument.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
	XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-07-27 11:16:27 +02:00
Thomas Schwinge
30656822b3 [GCN] Fix run-time variable 'num_workers'
... which currently has *not* been forced to 'num_workers (1)'.

In addition to the testcases modified here, this also fixes:

    FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/mode-transitions.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
    [Etc.]

    mode-transitions.exe: [...]/libgomp.oacc-c-c++-common/mode-transitions.c:702: t17: Assertion `arr_b[i] == (i ^ 31) * 8' failed.

	libgomp/
	* plugin/plugin-gcn.c (gcn_exec): Force 'num_workers (1)'
	unconditionally.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
2021-06-08 12:00:15 +02:00
Thomas Schwinge
7c1e856bed libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'
For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-gcn.c (init_environment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* config.h.in: Regenerate.
	* configure: Likewise.
2021-03-25 14:11:50 +01:00
Jakub Jelinek
f65e551f73 libgomp: Use sizeof(void*) based checks instead of looking through $CC $CFLAGS for -m32/-mx32
Some gcc configurations default to -m32 but support -m64 too.  This patch
just makes the ILP32 tests more reliable by following what e.g. libsanitizer
configury does.

2021-03-04  Jakub Jelinek  <jakub@redhat.com>

	* configure.ac: Add AC_CHECK_SIZEOF([void *]).
	* plugin/configfrag.ac: Check $ac_cv_sizeof_void_p value instead of
	checking of -m32 or -mx32 options on the command line.
	* config.h.in: Regenerated.
	* configure: Regenerated.
2021-03-04 09:43:34 +01:00
Andrew Stubbs
3535402e20 amdgcn: Add gfx908 support
gcc/

	* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX908.
	* config/gcn/gcn.c (gcn_omp_device_kind_arch_isa): Add gfx908.
	(output_file_start): Add gfx908.
	* config/gcn/gcn.opt (gpu_type): Add gfx908.
	* config/gcn/t-gcn-hsa (MULTILIB_OPTIONS): Add march=gfx908.
	(MULTILIB_DIRNAMES): Add gfx908.
	* config/gcn/mkoffload.c (EF_AMDGPU_MACH_AMDGCN_GFX908): New define.
	(main): Recognize gfx908.
	* config/gcn/t-omp-device: Add gfx908.

libgomp/

	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add
	EF_AMDGPU_MACH_AMDGCN_GFX908.
	(gcn_gfx908_s): New constant string.
	(isa_hsa_name): Add gfx908.
	(isa_code): Add gfx908.
2021-02-03 14:25:33 +00:00
Thomas Schwinge
6106dfb9f7 [nvptx libgomp plugin] Build only in supported configurations
As recently again discussed in <https://gcc.gnu.org/PR97436> "[nvptx] -m32
support", nvptx offloading other than for 64-bit host has never been
implemented, tested, supported.  So we simply should buildn't the nvptx libgomp
plugin in this case.

This avoids build problems if, for example, in a (standard) bi-arch
x86_64-pc-linux-gnu '-m64'/'-m32' build, libcuda is available only in a 64-bit
variant but not in a 32-bit one, which, for example, is the case if you build
GCC against the CUDA toolkit's 'stubs/libcuda.so' (see
<https://stackoverflow.com/a/52784819>).

This amends PR65099 commit a92defdab7 (r225560)
"[nvptx offloading] Only 64-bit configurations are currently supported" to
match the way we're doing this for the HSA/GCN plugins.

	libgomp/
	PR libgomp/65099
	* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
	configurations.
	* configure: Regenerate.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
	check.
2021-01-14 18:48:00 +01:00
Julian Brown
6b577a17b2 nvptx: Cache stacks block for OpenMP kernel launch
2021-01-05  Julian Brown  <julian@codesourcery.com>

libgomp/
	* plugin/plugin-nvptx.c (SOFTSTACK_CACHE_LIMIT): New define.
	(struct ptx_device): Add omp_stacks struct.
	(nvptx_open_device): Initialise cached-stacks housekeeping info.
	(nvptx_close_device): Free cached stacks block and mutex.
	(nvptx_stacks_free): New function.
	(nvptx_alloc): Add SUPPRESS_ERRORS parameter.
	(GOMP_OFFLOAD_alloc): Add strategies for freeing soft-stacks block.
	(nvptx_stacks_alloc): Rename to...
	(nvptx_stacks_acquire): This.  Cache stacks block between runs if same
	size or smaller is required.
	(nvptx_stacks_free): Remove.
	(GOMP_OFFLOAD_run): Call nvptx_stacks_acquire and lock stacks block
	during kernel execution.
2021-01-05 09:56:36 -08:00
Jakub Jelinek
99dee82307 Update copyright years. 2021-01-04 10:26:59 +01:00
Andrew Stubbs
85f0a4d982 Import HSA header files from AMD
These are the same header files that exist in the Radeon Open Compute Runtime
project (as of October 2020), but they have been specially relicensed by AMD
for use in GCC.

The header files retain AMD copyright.

include/ChangeLog:

	* hsa.h: Replace whole file.
	* hsa_ext_amd.h: New file.
	* hsa_ext_image.h: New file.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c: Include hsa_ext_amd.h.
	(HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT): Delete redundant definition.
2020-12-09 11:10:40 +00:00
Andrew Stubbs
97981e13b7 Tweak plugin-gcn.c defines
Ensure the code will continue to compile when elf.h gets these definitions.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c: Don't redefine relocations if elf.h has them.
	(reserved): Delete unused define.
2020-11-24 14:56:38 +00:00
Tom de Vries
7345ef6c2a [libgomp, nvptx] Report launch dimensions in GOMP_OFFLOAD_run
Using this patch, when using GOMP_DEBUG=1 and launching a kernel in
GOMP_OFFLOAD_run (used by the omp implementation), we see the kernel launch
dimensions:
...
  GOMP_OFFLOAD_run: kernel main$_omp_fn$0: \
    launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 1), 1]
...

Build on x86_64-linux with nvptx accelerator, tested libgomp.

libgomp/ChangeLog:

2020-10-08  Tom de Vries  <tdevries@suse.de>

	PR libgomp/81802
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_run): Report launch
	dimensions.
2020-10-08 11:03:29 +02:00
Tom de Vries
c0e9cee285 [libgomp, nvptx] Print error log for link error
By running libgomp test-case libgomp.c/target-28.c with GOMP_NVPTX_PTXRW=w
(using a maintenance patch that adds support for this env var), we dump the
ptx in target-28.exe to file.  By editing one ptx file to rename
gomp_nvptx_main to gomp_nvptx_main2 in both declaration and call, and
running with GOMP_NVPTX_PTXRW=r, we trigger a link error:
...
$ GOMP_NVPTX_PTXRW=r ./target-28.exe
libgomp: cuLinkComplete error: unknown error
...
The error is somewhat uninformative.

Fix this by dumping the error log returned by the failing cuda call, such
that we have instead:
...
$ GOMP_NVPTX_PTXRW=r ./target-28.exe
libgomp: Link error log error   : \
  Undefined reference to 'gomp_nvptx_main2' in ''
libgomp: cuLinkComplete error: unknown error
...

Build on x86_64 with nvptx accelerator, tested libgomp.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (link_ptx): Print elog if cuLinkComplete call
	fails.
2020-09-22 13:38:00 +02:00