Commit graph

247 commits

Author SHA1 Message Date
Thomas Schwinge
30e42bb66d libgomp testsuite: Have each '*.exp' file specify the compiler to use [PR91884]
..., which is still 'GCC_UNDER_TEST' for all of them; no change in behavior.

	PR testsuite/91884
	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_target_compile): Don't
	specify compiler.
	* testsuite/libgomp.c++/c++.exp (ALWAYS_CFLAGS): Specify compiler.
	* testsuite/libgomp.c/c.exp (ALWAYS_CFLAGS): Likewise.
	* testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.
	* testsuite/libgomp.graphite/graphite.exp (ALWAYS_CFLAGS):
	Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Likewise.
	* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS):
	Likewise.
2023-05-15 12:11:17 +02:00
Thomas Schwinge
b1cdda9f36 libgomp testsuite: Localize 'lang_[...]' etc.
..., instead of letting them bleed into the next '*.exp' file, requiring
clean-up there.

	libgomp/
	* testsuite/libgomp.c++/c++.exp: Localize 'lang_[...]' etc.
	* testsuite/libgomp.c/c.exp: Likewise.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
	* testsuite/libgomp.graphite/graphite.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
2023-05-12 09:15:31 +02:00
Thomas Schwinge
2ed5ceba0f libgomp testsuite: Use 'lang_test_file_found' instead of 'lang_test_file'
The value of 'lang_test_file' isn't actually used anywhere.

	libgomp/
	* testsuite/libgomp.c++/c++.exp: Don't set 'lang_test_file'.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
	* testsuite/libgomp.c/c.exp: Unset 'lang_test_file_found' instead of
	'lang_test_file'.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.graphite/graphite.exp: Likewise.
	* testsuite/lib/libgomp.exp (libgomp_target_compile): Look for
	'lang_test_file_found' instead of 'lang_test_file'.
2023-05-09 14:32:45 +02:00
Tobias Burnus
1c101fcfaa 'omp scan' struct block seq update for OpenMP 5.x
While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence,
denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or
more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but
till requires the '{' ... '}' for the final-loop-body) and updates Fortran
to accept zero or more than one statements.

If there is no preceeding or succeeding executable statement, a warning is
shown.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_scan_loop_body): Handle
	zero exec statements before/after 'omp scan'.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_scan_loop_body): Handle
	zero exec statements before/after 'omp scan'.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
	or more than one exec statements before/after 'omp scan'.
	* trans-openmp.cc (gfc_trans_omp_do): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/scan-1.c: New test.
	* testsuite/libgomp.c/scan-23.c: New test.
	* testsuite/libgomp.fortran/scan-2.f90: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning.
	* gfortran.dg/gomp/loop-2.f90: Likewise.
	* gfortran.dg/gomp/reduction5.f90: Likewise.
	* gfortran.dg/gomp/reduction6.f90: Likewise.
	* gfortran.dg/gomp/scan-1.f90: Likewise.
	* gfortran.dg/gomp/taskloop-2.f90: Likewise.
	* c-c++-common/gomp/scan-6.c: New test.
	* gfortran.dg/gomp/scan-8.f90: New test.
2023-04-25 16:29:14 +02:00
Kwok Cheung Yeung
ce9cd7258d amdgcn: Enable SIMD vectorization of math functions
Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.

2023-03-02  Kwok Cheung Yeung  <kcy@codesourcery.com>
	    Paul-Antoine Arras  <pa@codesourcery.com>

	gcc/
	* builtins.cc (mathfn_built_in_explicit): New.
	* config/gcn/gcn.cc: Include case-cfn-macros.h.
	(mathfn_built_in_explicit): Add prototype.
	(gcn_vectorize_builtin_vectorized_function): New.
	(gcn_libc_has_function): New.
	(TARGET_LIBC_HAS_FUNCTION): Define.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.

	gcc/testsuite/
	* gcc.target/gcn/simd-math-1.c: New testcase.
	* gcc.target/gcn/simd-math-2.c: New testcase.

	libgomp/
	* testsuite/libgomp.c/simd-math-1.c: New testcase.
2023-03-02 20:56:53 +00:00
Jakub Jelinek
46644ec99c openmp: Fix up OpenMP expansion of non-rectangular loops [PR108459]
expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...).  This
will just return NULL if it can't be folded, we need fold_build1
instead.

2023-01-19  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/108459
	* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
	than fold_unary for NEGATE_EXPR.

	* testsuite/libgomp.c/pr108459.c: New test.
2023-01-19 21:00:08 +01:00
Jakub Jelinek
83ffe9cde7 Update copyright years. 2023-01-16 11:52:17 +01:00
Paul-Antoine Arras
1fd508744e amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors
Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
	* config/gcn/t-omp-device: Add gfx803.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
	* testsuite/libgomp.c/declare-variant-4.h: New header file.
2022-11-30 10:51:42 +01:00
Sandra Loosemore
309e2d95e3 OpenMP: Generate SIMD clones for functions with "declare target"
This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.

gcc/ChangeLog:

	* common.opt (fopenmp-target-simd-clone): New option.
	(target_simd_clone_device): New enum to go with it.
	* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
	* flag-types.h (enum omp_target_simd_clone_device_kind): New.
	* omp-simd-clone.cc (auto_simd_fail): New function.
	(auto_simd_check_stmt): New function.
	(plausible_type_for_simd_clone): New function.
	(ok_for_auto_simd_clone): New function.
	(simd_clone_create): Add force_local argument, make the symbol
	have internal linkage if it is true.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass explicit_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool explicit_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/target-simd-clone-1.C: New.
	* g++.dg/gomp/target-simd-clone-2.C: New.
	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
	* gcc.dg/gomp/target-simd-clone-5.c: New.
	* gcc.dg/gomp/target-simd-clone-6.c: New.
	* gcc.dg/gomp/target-simd-clone-7.c: New.
	* gcc.dg/gomp/target-simd-clone-8.c: New.
	* lib/scanoffloadipa.exp: New.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp: Load scanoffloadipa.exp library.
	* testsuite/libgomp.c/target-simd-clone-1.c: New.
	* testsuite/libgomp.c/target-simd-clone-2.c: New.
	* testsuite/libgomp.c/target-simd-clone-3.c: New.
2022-11-25 18:13:22 +00:00
Thomas Schwinge
b61796663b Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c'
That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

	libgomp/
	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
	'-foffload-options' syntax.
2022-10-17 13:50:57 +02:00
Jakub Jelinek
a58a965eb7 libgomp: Fix up creation of artificial teams
When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes).  In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s).  The code would destroy/free the previous
task and replace it with the new implicit task.  If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

	* task.c (gomp_create_artificial_team): Fix up handling of invocations
	from within explicit task.
	* target.c (GOMP_target_ext): Likewise.
	* testsuite/libgomp.c/task-7.c: New test.
	* testsuite/libgomp.c/task-8.c: New test.
	* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
	* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
2022-10-12 17:54:08 +02:00
Tobias Burnus
6b43f556f3 nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible
Reverse offload requests at least -misa=sm_35; with this patch, a warning
instead of an error is shown, still permitting reverse offload for all
other configured device types. This is achieved by not calling
GOMP_offload_register_ver (and stopping generating pointless 'static const char'
variables, once known.)

The tool_name as progname changes adds "nvptx " and "gcn " to the
"mkoffload: warning/error:" diagnostic.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (process): Replace a fatal_error by
	a warning + not enabling offloading if -misa=sm_30 prevents
	reverse offload.
	(main): Use tool_name as progname for diagnostic.
	* config/gcn/mkoffload.cc (main): Likewise.

libgomp/ChangeLog:

	* libgomp.texi (Offload-Target Specifics: nvptx): Document
	that reverse offload requires >= -march=sm_35.
	* testsuite/libgomp.c-c++-common/requires-4.c: Build for nvptx
	with -misa=sm_35.
	* testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
	* testsuite/libgomp.c-c++-common/requires-6.c: Likewise.
	* testsuite/libgomp.c-c++-common/reverse-offload-1.c: Likewise.
	* testsuite/libgomp.fortran/reverse-offload-1.f90: Likewise.
	* testsuite/libgomp.c/reverse-offload-sm30.c: New test.
2022-09-12 15:25:13 +02:00
Jakub Jelinek
f25a6767ec openmp: Implement doacross(sink: omp_cur_iteration - 1)
This patch implements doacross(sink: omp_cur_iteration - 1) that the
previous patchset emitted a sorry on during omp expansion.
It can be implemented with existing library functions.

To recap, depend(source)/doacross(source:)/doacross(source:omp_cur_iteration)
is implemented calling GOMP_doacross_post or GOMP_doacross_ull_post,
called with an array of long or unsigned long long elements, one for
all collapsed loops together and one for each further ordered loop if any.
We initialize that array in each thread when grabbing further set of iterations
and update it at the end of loops, so that it represents the current iteration
(as 0 based counters).  When the worksharing loop is created, we tell the
library through another similar array the counts (the loop needs to be
rectangular) in each dimension, first element is count of all logical iterations
in the collapsed loops.

depend(sink:v1 op N1, v2 op N2, ...) is then implemented by conditionally calling
GOMP_doacross_wait/GOMP_doacross_ull_wait.  For N? of 0 there is no check,
otherwise if it wants to wait in a particular dimension for a previous iteration,
we check that the corresponding iterator isn't the first one (or first few),
where the previous iterator in that dimension would be out of range, and similarly
for checking of next iteration in a dimension that it isn't the last one (or last few)
where it would be similarly out of bounds.  Then the collapsed loop counters are
folded into a single 0 based counter (first argument) and then other 0 based
iterations counters on what iteration it should wait for.

Now, doacross(sink: omp_cur_iteration - 1) is supposed to wait for the previous
logical iteration in the combined iteration space of all ordered loops.
For the very first iteration in that combined iteration space it does nothing,
there is no previous iteration.  And similarly it does nothing if there
are more ordered loops than collapsed loop and it isn't the first logical
iteration of the combined loops inside of the collapsed loops, because as implemented
we know the previous iteration in that case is always executed by the same thread
as the current one.
In the implementation, we use the same value as is stored in the first element
of the array for GOMP_doacross_post/GOMP_doacross_ull_post, if that value is 0,
we do nothing.  The rest is different based on if ordered argument is equal to
collapse or not.  If it is, then we otherwise call
GOMP_doacross_wait/GOMP_doacross_ull_wait with a single argument, one less than
that counter we compare against 0.
If ordered argument is bigger than collapse, we add a per-thread boolean variable
.first.N, which we set to true at the start of the outermost ordered loop inside
of the collapsed set of loops and set to false at the end of the innermost
ordered loop.  If .first.N is false, we don't do anything (we know the previous
iteration was handled by the current thread and by my reading of the spec we don't
need to emit even a memory barrier in that case, because it is just synchronization
with the same thread), otherwise we call GOMP_doacross_wait/GOMP_doacross_ull_wait
with the first argument one less than the counter we compare against 0, and then
one less than 2nd and following counts if iterations we pass to the workshare
initialization.  If say .counts.N passed to the workshare initialization is
{ 256, 13, 5, 2 } for collapse(3) ordered(6) loop, then
GOMP_doacross_post/GOMP_doacross_ull_post is called with arguments equal to
.ordereda.N[0] - 1, 12, 4, 1.

2022-09-08  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.cc (expand_omp_ordered_sink): Add CONT_BB argument.
	Add doacross(sink:omp_cur_iteration-1) support.
	(expand_omp_ordered_source_sink): Clear counts[fd->ordered + 1].
	Adjust expand_omp_ordered_sink caller.
	(expand_omp_for_ordered_loops): If counts[fd->ordered + 1] is
	non-NULL, set that variable to true at the start of outermost
	non-collapsed loop and set it to false at the end of innermost
	ordered loop.
	(expand_omp_for_generic): If fd->ordered, allocate
	1 + (fd->ordered - fd->collapse) further elements in counts array.
	Copy to counts + 2 + fd->ordered the counts of fd->collapse ..
	fd->ordered - 1 loop if any.
gcc/testsuite/
	* c-c++-common/gomp/doacross-7.c: New test.
libgomp/
	* libgomp.texi (OpenMP 5.2): Mention that omp_cur_iteration is now
	fully supported.
	* testsuite/libgomp.c/doacross-4.c: New test.
	* testsuite/libgomp.c/doacross-5.c: New test.
	* testsuite/libgomp.c/doacross-6.c: New test.
	* testsuite/libgomp.c/doacross-7.c: New test.
2022-09-08 13:32:51 +02:00
Tobias Burnus
d9c9424d2c OpenMP: Fix var replacement with 'simd' and linear-step vars [PR106548]
gcc/ChangeLog:

	PR middle-end/106548
	* omp-low.cc (lower_rec_input_clauses): Use build_outer_var_ref
	for 'simd' linear-step values that are variable.

libgomp/ChangeLog:

	PR middle-end/106548
	* testsuite/libgomp.c/linear-2.c: New test.
2022-08-17 15:45:56 +02:00
Jakub Jelinek
85d613da34 libgomp: Fix up target-31.c test [PR106045]
The i variable is used inside of the parallel in:
      #pragma omp simd safelen(32) private (v)
      for (i = 0; i < 64; i++)
        {
          v = 3 * i;
          ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
        }
where i is predetermined linear (so while inside of the body
it is safe, private per SIMD lane var) the final value is written to
the shared variable, and in:
      for (i = 0; i < 64; i++)
        if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
          #pragma omp atomic write
            err = 1;
which is a normal loop and so it isn't in any way privatized there.
So we have a data race, fixed by adding private (i) clause to the
parallel.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>
	    Paul Iannetta  <piannetta@kalrayinc.com>

	PR libgomp/106045
	* testsuite/libgomp.c/target-31.c: Add private (i) clause.
2022-06-21 17:51:08 +02:00
Jakub Jelinek
1158fe4340 openmp: Conforming device numbers and omp_{initial,invalid}_device
OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 makes also -1 valid (calls it omp_initial_device), which is equivalent
in behavior to omp_get_num_devices() number but has the advantage that it
is a constant.  And it also introduces omp_invalid_device which is
also a constant with implementation defined value < -1.  That value should
act like sNaN, any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is all but [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.

For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
		     omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
			  #pragma omp target if (cond)
			  where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it, advantage of -1 is that one can say iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one.
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.

So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.

This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory , the spec says that the inital ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).

2022-06-13  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.cc (expand_omp_target): Remap user provided
	device clause arguments, -1 to -2 and -2 to -3, either
	at compile time if constant, or at runtime.
include/
	* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
	* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
	* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
	parameters.
	* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
	* target.c (resolve_device): Add remapped argument, handle
	GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
	for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
	if remapped, otherwise treat omp_initial_device that way.  For
	omp_invalid_device, always emit gomp_fatal, even when
	OMP_TARGET_OFFLOAD isn't mandatory.
	(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
	GOMP_target_update, GOMP_target_update_ext,
	GOMP_target_enter_exit_data): Pass true as remapped argument to
	resolve_device.
	(omp_target_alloc, omp_target_free, omp_target_is_present,
	omp_target_memcpy_check, omp_target_associate_ptr,
	omp_target_disassociate_ptr, omp_get_mapped_ptr,
	omp_target_is_accessible): Pass false as remapped argument to
	resolve_device.  Treat omp_initial_device the same as
	gomp_get_num_devices ().  Don't bypass resolve_device calls if
	device_num is negative.
	(omp_pause_resource): Treat omp_initial_device the same as
	gomp_get_num_devices ().  Call resolve_device.
	* icv-device.c (omp_set_default_device): Always set to device_num
	even when it is negative.
	* libgomp.texi: Document that Conforming device numbers,
	omp_initial_device and omp_invalid_device is implemented.
	* testsuite/libgomp.c/target-41.c (main): Add test with
	omp_initial_device.
	* testsuite/libgomp.c/target-45.c: New test.
	* testsuite/libgomp.c/target-46.c: New test.
	* testsuite/libgomp.c/target-47.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
	test with omp_initial_device.  Use -5 instead of -1 for negative value
	test.
	* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
	Likewise.  Reorder stop numbers.
2022-06-13 14:02:37 +02:00
Jakub Jelinek
0ccba4ed85 openmp: Add support for enter clause on declare target
OpenMP 5.1 and earlier had 2 different uses of to clause, one for target
update construct with one semantics, and one for declare target directive
with a different semantics.
Under the hood we were using OMP_CLAUSE_TO_DECLARE to represent the latter.
OpenMP 5.2 renamed the declare target clause to to enter, the old one is
kept as a deprecated alias.

As we are far from having full OpenMP 5.2 support, this patch adds support
for the enter clause (and renames OMP_CLAUSE_TO_DECLARE to OMP_CLAUSE_ENTER
with a flag to tell the spelling of the clause for better diagnostics),
but doesn't deprecate the to clause on declare target just yet (that
should be done as one of the last steps in 5.2 support).

2022-05-27  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TO_DECLARE
	to OMP_CLAUSE_ENTER.
	* tree.h (OMP_CLAUSE_ENTER_TO): Define.
	* tree.cc (omp_clause_num_ops, omp_clause_code_name): Rename
	OMP_CLAUSE_TO_DECLARE to OMP_CLAUSE_ENTER.
	* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ENTER
	instead of OMP_CLAUSE_TO_DECLARE, if OMP_CLAUSE_ENTER_TO, print
	"to" instead of "enter".
	* tree-nested.cc (convert_nonlocal_omp_clauses,
	convert_local_omp_clauses): Handle OMP_CLAUSE_ENTER instead of
	OMP_CLAUSE_TO_DECLARE.
gcc/c-family/
	* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ENTER.
gcc/c/
	* c-parser.cc (c_parser_omp_clause_name): Parse enter clause.
	(c_parser_omp_all_clauses): For to clause on declare target, use
	OMP_CLAUSE_ENTER clause with OMP_CLAUSE_ENTER_TO instead of
	OMP_CLAUSE_TO_DECLARE clause.  Handle PRAGMA_OMP_CLAUSE_ENTER.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Add enter clause.
	(c_parser_omp_declare_target): Use OMP_CLAUSE_ENTER instead of
	OMP_CLAUSE_TO_DECLARE.
	* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_ENTER instead
	of OMP_CLAUSE_TO_DECLARE, to OMP_CLAUSE_ENTER_TO use "to" as clause
	name in diagnostics instead of
	omp_clause_code_name[OMP_CLAUSE_CODE (c)].
gcc/cp/
	* parser.cc (cp_parser_omp_clause_name): Parse enter clause.
	(cp_parser_omp_all_clauses): For to clause on declare target, use
	OMP_CLAUSE_ENTER clause with OMP_CLAUSE_ENTER_TO instead of
	OMP_CLAUSE_TO_DECLARE clause.  Handle PRAGMA_OMP_CLAUSE_ENTER.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Add enter clause.
	(cp_parser_omp_declare_target): Use OMP_CLAUSE_ENTER instead of
	OMP_CLAUSE_TO_DECLARE.
	* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_ENTER instead
	of OMP_CLAUSE_TO_DECLARE, to OMP_CLAUSE_ENTER_TO use "to" as clause
	name in diagnostics instead of
	omp_clause_code_name[OMP_CLAUSE_CODE (c)].
gcc/testsuite/
	* c-c++-common/gomp/clauses-3.c: Add tests with enter clause instead
	of to or modify some existing to clauses to enter.
	* c-c++-common/gomp/declare-target-1.c: Likewise.
	* c-c++-common/gomp/declare-target-2.c: Likewise.
	* c-c++-common/gomp/declare-target-3.c: Likewise.
	* g++.dg/gomp/attrs-9.C: Likewise.
	* g++.dg/gomp/declare-target-1.C: Likewise.
libgomp/
	* testsuite/libgomp.c-c++-common/target-40.c: Modify some existing to
	clauses to enter.
	* testsuite/libgomp.c/target-41.c: Likewise.
2022-05-27 12:48:48 +02:00
Tom de Vries
a624388b95 [nvptx] Add warp sync at simt exit
Consider this code (with N defined to 1024):
...
  float v = 0.0;
  #pragma omp target map(tofrom: v)
  #pragma omp parallel for simd
  for (int i = 0 ; i < N; i++)
    {
      #pragma omp atomic update
      v = v + 1.0;
    }
...

It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).

I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).

The problem -muniform-simt is trying to address is to make sure that a
register produced outside an openmp simd region is available when used in any
lane inside an simd region.

The solution is to, outside an simd region, execute in all warp lanes, thus
producing consistent values in result registers in each warp thread.

This approach doesn't work when executing in all warp lanes multiplies the
side effects from 1 to 32 separate side effects, which is the case for atomic
insns.  So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]

Now, consider a non-atomic update: ld, add, store.  The store has side
effects, are those multiplied or not?

Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step.  So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
  at once, having the result of a single store.
So, no side-effect multiplication (well, at least that's the observation).

Starting sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence.  There's a "Convergence Optimizer" that can can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".

Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects can have multiplied, and the registers are
no longer guarantueed to be in sync.

The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably
because the atom.cas counts as a "synchronizing operation".  So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.

Fix this by adding an explicit warp sync at simt exit.

Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-09  Tom de Vries  <tdevries@suse.de>

	PR target/104916
	PR target/104783
	* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
	sync (or uniform warp check for mptx < 6.0).

libgomp/ChangeLog:

2022-03-15  Tom de Vries  <tdevries@suse.de>

	PR target/104916
	PR target/104783
	* testsuite/libgomp.c/pr104783-2.c: New test.
2022-03-22 14:35:34 +01:00
Tom de Vries
093cdadbce [openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR
Consider test-case pr104952-1.c, included in this commit, containing:
...
  #pragma omp target map(tofrom:result) map(to:arr)
  #pragma omp simd reduction(||: result)
...

When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.

The reduction clause is translated by the SIMT code (active for nvptx) as a
butterfly reduction loop with this butterfly shuffle / update pair:
...
  D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
in the loop body.

The problem is that the butterfly shuffle is possibly not executed, while it
needs to be executed unconditionally.

Fix this by translating instead as:
...
  D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
  D.2163 = D.2163 || D.tmp_bfly
...

Tested on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2022-03-17  Tom de Vries  <tdevries@suse.de>

	PR target/104952
	* omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY
	is executed unconditionally.

libgomp/ChangeLog:

2022-03-17  Tom de Vries  <tdevries@suse.de>

	PR target/104952
	* testsuite/libgomp.c/pr104952-1.c: New test.
	* testsuite/libgomp.c/pr104952-2.c: New test.
2022-03-18 15:45:13 +01:00
Tom de Vries
f07178ca3c [nvptx] Disable warp sync in simt region
I ran into a hang for this code:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
    {
      #pragma omp atomic update
      counter_N0 = counter_N0 + 1 ;
    }
...

This has to do with the nature of -muniform-simt.  It has two modes of
operation: inside and outside an SIMT region.

Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent.  This approach works unless the insn that is executed is a syscall
or an atomic insn.  In that case, the insn is predicated, such that it
executes in only one thread.  If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.

Inside an SIMT region, a warp executes in all threads.  However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code.  Care has been taken
though to ensure that the predication and propagation is a nop.  That is,
inside an SIMT region:
- the predicate evalutes to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.

That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
  @%r33 atom.add.u32		_, [%r29], 1;
	shfl.sync.idx.b32	%r30, %r30, %r32, 31, 0xffffffff;
...

The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.

Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-08  Tom de Vries  <tdevries@suse.de>

	PR target/104783
	* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
	(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
	(nvptx_get_unisimt_outside_simt_predicate): New function.
	(predicate_insn): New function, factored out of ...
	(nvptx_reorg_uniform_simt): ... here.  Predicate all emitted insns.
	* config/nvptx/nvptx.h (struct machine_function): Add
	unisimt_outside_simt_predicate field.
	* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
	(define_insn "nvptx_uniform_warp_check"): Make predicable.

libgomp/ChangeLog:

2022-03-10  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/pr104783.c: New test.
2022-03-10 12:20:44 +01:00
Tom de Vries
f485b0ed7d [libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c
When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
  1 exit status^M
compilation terminated.^M
  ...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...

Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-28  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
2022-02-28 10:10:51 +01:00
Tom de Vries
59b8ade887 [libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c
Add openmp test-cases that test the omp declare variant construct:
...
  #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-24  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
	* testsuite/libgomp.c/declare-variant-3.h: New header file.
2022-02-24 11:41:03 +01:00
Tom de Vries
5ed77fb3ed [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
2022-02-22 15:48:03 +01:00
Marcel Vollweiler
bbb7f8604e C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):

	has_device_addr(list)

"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)

"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)

"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
	(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
	clause.
	(c_parser_omp_variable_list): Handle array sections.
	(c_parser_omp_clause_has_device_addr): Added.
	(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
	OMP_CLAUSE_MASK.
	* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
	(c_finish_omp_clauses): Handle array sections.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
	(cp_parser_omp_var_list_no_open): Handle array sections.
	(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
	* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
	(finish_omp_clauses): Handle array sections.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
	case.
	* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
	* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
	(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
	(resolve_omp_clauses): Same.
	* trans-openmp.cc (gfc_trans_omp_variable_list): Added
	OMP_LIST_HAS_DEVICE_ADDR case.
	(gfc_trans_omp_clauses): Firstprivatize of array descriptors.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
	OMP_CLAUSE_HAS_DEVICE_ADDR
	and handle array sections.
	(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
	(lower_omp_target): Same.
	* tree-core.h (enum omp_clause_code): Same.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
	(convert_local_omp_clauses): Same.
	* tree-pretty-print.cc (dump_omp_clause): Same.
	* tree.cc: Same.

libgomp/ChangeLog:

	* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
	* target.c (copy_firstprivate_data): Copy only if host address is not
	NULL.
	* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
	* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
	* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
	* c-c++-common/gomp/target-has-device-addr-1.c: New test.
	* c-c++-common/gomp/target-has-device-addr-2.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
	* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-02-09 23:47:12 -08:00
Jakub Jelinek
0af7ef050a libgomp: Fix segfault with posthumous orphan tasks [PR104385]
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/104385
	* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
	clear task->parent.
	* testsuite/libgomp.c/pr104385.c: New test.
2022-02-08 09:30:17 +01:00
Thomas Schwinge
9fcc3a1dd2 Host and offload targets have no common meaning of address spaces
gcc/
	* tree-streamer-out.c (pack_ts_base_value_fields): Don't pack
	'TYPE_ADDR_SPACE' for offloading.
	* tree-streamer-in.c (unpack_ts_base_value_fields): Don't unpack
	'TYPE_ADDR_SPACE' for offloading.
	libgomp/
	* testsuite/libgomp.c/address-space-1.c: Remove 'dg-xfail-run-if'
	for 'offload_device_intel_mic'.
2022-01-13 11:16:20 +01:00
Jakub Jelinek
7adcbafe45 Update copyright years. 2022-01-03 10:42:10 +01:00
Chung-Lin Tang
6c0399378e OpenMP 5.0: Remove array section base-pointer mapping semantics and other front-end adjustments
This patch implements three pieces of functionality:

(1) Adjust array section mapping to have standards conforming behavior,
mapping array sections should *NOT* also map the base-pointer:

struct S { int *ptr; ... };
struct S s;

Instead of generating this during gimplify:
                              map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

Now, adjust to:

(i.e. do not map the base-pointer together. The attach operation is still
generated, and if s.ptr is already mapped prior, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to
use:

(A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section
 is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for
 array types, which appears incorrect and causes a regression in
 libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90)

(2) Related to the first item above, are fixes in libgomp/target.c to not
overwrite attached pointers when handling device<->host copies, mainly for the
"always" case.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. These changes are enabled for both
OpenACC and OpenMP.

gcc/c/ChangeLog:

	* c-parser.c (struct omp_dim): New struct type for use inside
	c_parser_omp_variable_list.
	(c_parser_omp_variable_list): Allow multiple levels of array and
	component accesses in array section base-pointer expression.
	(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
	c_parser_omp_var_list_parens.
	(c_parser_omp_clause_from): Likewise.
	* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
	of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
	POINTER_PLUS_EXPR.
	(c_finish_omp_clauses): Extend allowed ranged of expressions
	involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

	* parser.c (struct omp_dim): New struct type for use inside
	cp_parser_omp_var_list_no_open.
	(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
	component accesses in array section base-pointer expression.
	(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
	cp_parser_omp_var_list for to/from clauses.
	* semantics.c (handle_omp_array_sections_1): Extend allowed range
	of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
	POINTER_PLUS_EXPR.
	(handle_omp_array_sections): Adjust pointer map generation of
	references.
	(finish_omp_clauses): Extend allowed ranged of expressions
	involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
	GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:

	* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
	accomodate case where 'offset' return of get_inner_reference is
	non-NULL.
	(is_or_contains_p): Further robustify conditions.
	(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
	move following GOMP_MAP_ALWAYS_POINTER maps along.  Add new sorting
	phase where we make sure pointers with an attach/detach map are ordered
	correctly.
	(gimplify_scan_omp_clauses): Add modifications to avoid creating
	GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
	* c-c++-common/gomp/target-enter-data-1.c: New testcase.
	* c-c++-common/gomp/target-implicit-map-2.c: New testcase.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_existing): Make sure attached pointer is
	not overwritten during cross-host/device copying.
	(gomp_update): Likewise.
	(gomp_exit_data): Likewise.
	* testsuite/libgomp.c++/target-11.C: Adjust testcase.
	* testsuite/libgomp.c++/target-12.C: Likewise.
	* testsuite/libgomp.c++/target-15.C: Likewise.
	* testsuite/libgomp.c++/target-16.C: Likewise.
	* testsuite/libgomp.c++/target-17.C: Likewise.
	* testsuite/libgomp.c++/target-21.C: Likewise.
	* testsuite/libgomp.c++/target-23.C: Likewise.
	* testsuite/libgomp.c/target-23.c: Likewise.
	* testsuite/libgomp.c/target-29.c: Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
2021-12-09 00:01:10 +08:00
Jakub Jelinek
5bca26742c openmp: Fix up handling of kind(host) and kind(nohost) in ACCEL_COMPILERs [PR103384]
As the testcase shows, we weren't handling kind(host) and kind(nohost) properly
in the ACCEL_COMPILERs, the code written in there is valid for the host
compiler only, where if we are maybe offloaded, we defer resolution after IPA,
otherwise return 0 for kind(nohost) and accept it for kind(host).  Note,
omp_maybe_offloaded is false after IPA.  If ACCEL_COMPILER is defined, it is
the other way around, but also we know we are after IPA.

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/103384
gcc/
	* omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER,
	return 0 for kind(host) and continue for kind(nohost).
libgomp/
	* testsuite/libgomp.c/declare-variant-2.c: New test.
2021-11-24 10:30:32 +01:00
Jakub Jelinek
d294459720 libgomp: Add a testcase for omp_get_num_teams inside of target inside of host teams
This is https://github.com/OpenMP/spec/issues/3183
There is an agreement that we should return 1 team inside of target,
even if that target is inside of host teams.  We were doing that
when offloading and not during host fallback, r12-5151 should fix that
even for host fallback.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/teams-5.c: New test.
2021-11-15 08:58:39 +01:00
Jakub Jelinek
7d6da11fce openmp: Honor OpenMP 5.1 num_teams lower bound
The following patch implements what I've been talking about earlier,
honor that for explicit num_teams clause we create at least the
lower-bound (if not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
	(BUILT_IN_GOMP_TEAMS4): New.
	* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
	* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
	GOMP_teams, pass to it also num_teams lower-bound expression
	or a dup of upper-bound if it is missing and a flag whether
	it is the first call or not.
gcc/fortran/
	* types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
	* libgomp_g.h (GOMP_teams4): Declare.
	* libgomp.map (GOMP_5.1): Export GOMP_teams4.
	* target.c (GOMP_teams4): New function.
	* config/nvptx/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* config/gcn/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
	teams instead of <= 2.
	* testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-11-12 12:41:22 +01:00
Jakub Jelinek
fa4fcb111a libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two, one is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard says obviously nothing about it.  Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4).  With target fallback being invoked from parallel
regions global vars simply can't work right on the host.

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
	* team.c (struct gomp_thread_start_data): Likewise.
	(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
	(gomp_team_start): Initialize start_data->num_teams and
	start_data->team_num.  Update nthr->num_teams and nthr->team_num.
	* teams.c (gomp_num_teams, gomp_team_num): Remove.
	(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
	instead of gomp_num_teams and gomp_team_num.
	(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
	(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
	* testsuite/libgomp.c/teams-4.c: New test.
2021-11-11 13:57:31 +01:00
Tobias Burnus
948d461954 OpenMP: Add strictly nested API call check [PR102972]
The teams construct only permits omp_get_num_teams and omp_get_team_num
as API call in strictly nested regions - check for it.

Additionally, for Fortran, using DECL_NAME does not show the mangled
name, hence, DECL_ASSEMBLER_NAME had to be used to.

Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.

	PR middle-end/102972
gcc/ChangeLog:

	* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
	internal Fortran name; new permit_num_teams arg to permit
	omp_get_num_teams and omp_get_team_num.
	(scan_omp_1_stmt): Update call to it, add missing call for
	reverse offload, and check for strictly nested API calls in teams.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
	routine test.
	* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
	* c-c++-common/gomp/teams-3.c: New test.
	* gfortran.dg/gomp/teams-3.f90: New test.
	* gfortran.dg/gomp/teams-4.f90: New test.

libgomp/ChangeLog:
	* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
	parallel construct.
	* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
	* testsuite/libgomp.c/target-3.c: Likewise.
	* testsuite/libgomp.c/target-5.c: Likewise.
	* testsuite/libgomp.c/target-6.c: Likewise.
	* testsuite/libgomp.c/target-teams-1.c: Likewise.
	* testsuite/libgomp.c/teams-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-2.c: Likewise.
	* testsuite/libgomp.c/thread-limit-3.c: Likewise.
	* testsuite/libgomp.c/thread-limit-4.c: Likewise.
	* testsuite/libgomp.c/thread-limit-5.c: Likewise.
	* testsuite/libgomp.fortran/icv-3.f90: Likewise.
	* testsuite/libgomp.fortran/icv-4.f90: Likewise.
	* testsuite/libgomp.fortran/teams1.f90: Likewise.
2021-10-30 23:45:32 +02:00
Jakub Jelinek
2084b5f42a openmp: Allow non-rectangular loops with pointer iterators
This patch handles pointer iterators for non-rectangular loops.  They are
more limited than integral iterators of non-rectangular loops, in particular
only var-outer, var-outer + a2, a2 + var-outer or var-outer - a2 can appear
in lb or ub where a2 is some integral loop invariant expression, so no e.g.
multiplication etc.

2021-10-27  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular
	iterators with pointer types.
	(expand_omp_for_init_vars, extract_omp_for_update_vars): Likewise.
gcc/c-family/
	* c-omp.c (c_omp_check_loop_iv_r): Don't clear 3rd bit for
	POINTER_PLUS_EXPR.
	(c_omp_check_nonrect_loop_iv): Handle POINTER_PLUS_EXPR.
	(c_omp_check_loop_iv): Set kind even if the iterator is non-integral.
gcc/testsuite/
	* c-c++-common/gomp/loop-8.c: New test.
	* c-c++-common/gomp/loop-9.c: New test.
libgomp/
	* testsuite/libgomp.c/loop-26.c: New test.
	* testsuite/libgomp.c/loop-27.c: New test.
2021-10-27 09:22:07 +02:00
Jakub Jelinek
a10794eafb openmp: Improve testsuite/libgomp.c/affinity-1.c testcase
I've noticed that while I have added hopefully sufficient test coverage
for the case where one uses simple number or !number as p-interval,
I haven't added any coverage for number:len:stride or number:len.

This patch adds that.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/affinity-1.c (struct places): Change name field
	type from char [50] to const char *.
	(places_array): Add a testcase for simplified syntax place followed
	by length or length and stride.
2021-10-15 17:19:54 +02:00
Jakub Jelinek
4a0fed0c0c openmp: Handle OpenMP 5.1 simplified OMP_PLACES syntax
In addition to adding ll_caches and numa_domain abstract names
to OMP_PLACES syntax, OpenMP 5.1 also added one syntax simplification:
https://github.com/OpenMP/spec/issues/2080
https://github.com/OpenMP/spec/pull/2081
in particular that in the grammar place non-terminal is now
not only { res-list } but also res (i.e. a non-negative integer),
which stands as a shortcut for { res }
So, one can specify OMP_PLACES=0,4,8,12 with the meaning
OMP_PLACES={0},{4},{8},{12} or OMP_PLACES=0:4 instead of OMP_PLACES={0}:4
or OMP_PLACES={0},{1},{2},{3} etc.

This patch implements that.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_one_place): Handle non-negative-number the same
	as { non-negative-number }.  Reject even !number:1 and
	!number:1:stride or !place:1 or !place:1:stride instead of just
	length other than 1.
	* libgomp.texi (OpenMP 5.1): Document OMP_PLACES syntax extensions
	and OMP_NUM_TEAMS/OMP_TEAMS_THREAD_LIMIT and
	omp_{set_num,get_max}_teams/omp_{s,g}et_teams_thread_limit features
	as implemented.
	* testsuite/libgomp.c/affinity-1.c: Add a test for the 5.1 place
	simplified syntax.
2021-10-15 16:35:57 +02:00
Jakub Jelinek
4764049dd6 openmp: Fix up handling of OMP_PLACES=threads(1)
When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core.  It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
	after creating count places clean up and return immediately.
	* testsuite/libgomp.c/places-6.c: New test.
	* testsuite/libgomp.c/places-7.c: New test.
	* testsuite/libgomp.c/places-8.c: New test.
	* testsuite/libgomp.c/places-9.c: New test.
	* testsuite/libgomp.c/places-10.c: New test.
2021-10-15 16:25:25 +02:00
Jakub Jelinek
e7ce32c783 openmp: Add support for OMP_PLACES=numa_domains
This adds support for numa_domains abstract name in OMP_PLACES, also new
in OpenMP 5.1.

Way to test this is
OMP_PLACES=numa_domains OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on OMP_PLACES line.
For non-NUMA machines it should print a single place that covers all CPUs,
for NUMA machine one place for each NUMA node with corresponding CPUs.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_places_var): Handle numa_domains as level 5.
	* config/linux/affinity.c (gomp_affinity_init_numa_domains): New
	function.
	(gomp_affinity_init_level): Use it instead of
	gomp_affinity_init_level_1 for level == 5.
	* testsuite/libgomp.c/places-5.c: New test.
2021-10-15 12:16:50 +02:00
Jakub Jelinek
5809be05a2 openmp: Add support for OMP_PLACES=ll_caches
This patch implements support for ll_caches abstract name in OMP_PLACES,
which stands for places where logical cpus in each place share the last
level cache.

This seems to work fine for me on x86 and kernel sources show that it is
in common code, but on some machines on CompileFarm the files I'm using,
i.e.
/sys/devices/system/cpu/cpuN/cache/indexN/level
/sys/devices/system/cpu/cpuN/cache/indexN/shared_cpu_list
don't exist, is that because they have too old kernel and newer kernels
are fine or should I implement some fallback methods (which)?
E.g. on gcc112.fsffrance.org I see just shared_cpu_map and not shared_cpu_list
(with shared_cpu_map being harder to parse) and on another box I didn't even
see the cache subdirectories.

Way to test this is
OMP_PLACES=ll_caches OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on OMP_PLACES line.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_places_var): Handle ll_caches as level 4.
	* config/linux/affinity.c (gomp_affinity_find_last_cache_level): New
	function.
	(gomp_affinity_init_level_1): Handle level 4 as logical cpus sharing
	last level cache.
	(gomp_affinity_init_level): Likewise.
	* testsuite/libgomp.c/places-1.c: New test.
	* testsuite/libgomp.c/places-2.c: New test.
	* testsuite/libgomp.c/places-3.c: New test.
	* testsuite/libgomp.c/places-4.c: New test.
2021-10-15 12:06:51 +02:00
Jakub Jelinek
fab2f61dc1 vectorizer: Fix up -fsimd-cost-model= handling
>	* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.

I don't think this is the right thing to do.
This just means that at some point between 2013 when -fsimd-cost-model has
been introduced and now -fsimd-cost-model= option at least partially stopped
working properly.
As documented, -fsimd-cost-model= overrides the -fvect-cost-model= setting
for OpenMP simd loops (loop->force_vectorize is true) if specified differently
from default.
In tree-vectorizer.h we have:
static inline bool
unlimited_cost_model (loop_p loop)
{
  if (loop != NULL && loop->force_vectorize
      && flag_simd_cost_model != VECT_COST_MODEL_DEFAULT)
    return flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED;
  return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
}
and use it in various places, but we also just use flag_vect_cost_model
in lots of places (and in one spot use flag_simd_cost_model, not sure if
we are sure it is a force_vectorize loop or what).

So, IMHO we should change the above inline function to
loop_cost_model and let it return the cost model and then just
reimplement unlimited_cost_model as
return loop_cost_model (loop) == VECT_COST_MODEL_UNLIMITED;
and then adjust the direct uses of the flag and revert these changes.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-vectorizer.h (loop_cost_model): New function.
	(unlimited_cost_model): Use it.
	* tree-vect-loop.c (vect_analyze_loop_costing): Use loop_cost_model
	call instead of flag_vect_cost_model.
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
	(vect_prune_runtime_alias_test_list): Likewise.  Also use it instead
	of flag_simd_cost_model.
gcc/testsuite/
	* gcc.dg/gomp/simd-2.c: Remove option -fvect-cost-model=cheap.
	* gcc.dg/gomp/simd-3.c: Likewise.
libgomp/
	* testsuite/libgomp.c/scan-11.c: Remove option -fvect-cost-model=cheap.
	* testsuite/libgomp.c/scan-12.c: Likewise.
	* testsuite/libgomp.c/scan-13.c: Likewise.
	* testsuite/libgomp.c/scan-14.c: Likewise.
	* testsuite/libgomp.c/scan-15.c: Likewise.
	* testsuite/libgomp.c/scan-16.c: Likewise.
	* testsuite/libgomp.c/scan-17.c: Likewise.
	* testsuite/libgomp.c/scan-18.c: Likewise.
	* testsuite/libgomp.c/scan-19.c: Likewise.
	* testsuite/libgomp.c/scan-20.c: Likewise.
	* testsuite/libgomp.c/scan-21.c: Likewise.
	* testsuite/libgomp.c/scan-22.c: Likewise.
	* testsuite/libgomp.c++/scan-9.C: Likewise.
	* testsuite/libgomp.c++/scan-10.C: Likewise.
	* testsuite/libgomp.c++/scan-11.C: Likewise.
	* testsuite/libgomp.c++/scan-12.C: Likewise.
	* testsuite/libgomp.c++/scan-13.C: Likewise.
	* testsuite/libgomp.c++/scan-14.C: Likewise.
	* testsuite/libgomp.c++/scan-15.C: Likewise.
	* testsuite/libgomp.c++/scan-16.C: Likewise.
2021-10-12 09:28:10 +02:00
liuhongt
b4e81f6dd4 Adjust more testcases for O2 vectorization enabling.
libgomp/ChangeLog:

	* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.
	* testsuite/libgomp.c++/scan-11.C: Ditto.
	* testsuite/libgomp.c++/scan-12.C: Ditto.
	* testsuite/libgomp.c++/scan-13.C: Ditto.
	* testsuite/libgomp.c++/scan-14.C: Ditto.
	* testsuite/libgomp.c++/scan-15.C: Ditto.
	* testsuite/libgomp.c++/scan-16.C: Ditto.
	* testsuite/libgomp.c++/scan-9.C: Ditto.
	* testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c: Ditto.
	* testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c: Ditto.
	* testsuite/libgomp.c/scan-11.c: Ditto.
	* testsuite/libgomp.c/scan-12.c: Ditto.
	* testsuite/libgomp.c/scan-13.c: Ditto.
	* testsuite/libgomp.c/scan-14.c: Ditto.
	* testsuite/libgomp.c/scan-15.c: Ditto.
	* testsuite/libgomp.c/scan-16.c: Ditto.
	* testsuite/libgomp.c/scan-17.c: Ditto.
	* testsuite/libgomp.c/scan-18.c: Ditto.
	* testsuite/libgomp.c/scan-19.c: Ditto.
	* testsuite/libgomp.c/scan-20.c: Ditto.
	* testsuite/libgomp.c/scan-21.c: Ditto.
	* testsuite/libgomp.c/scan-22.c: Ditto.

gcc/testsuite/ChangeLog:

	* g++.dg/tree-ssa/pr94403.C: Add -fno-tree-vectorize
	* gcc.dg/optimize-bswapsi-5.c: Ditto.
	* gcc.dg/optimize-bswapsi-6.c: Ditto.
	* gcc.dg/Warray-bounds-51.c: Add additional option
	-mtune=generic for target x86/i?86
	* gcc.dg/Wstringop-overflow-14.c: Ditto.
2021-10-09 16:28:11 +08:00
Thomas Schwinge
086bb917d6 'libgomp.c/target-43.c': '-latomic' for nvptx offloading
... to avoid a regression with recent
commit 090f0d78f1
"openmp: Improve expand_omp_atomic_pipeline":

    unresolved symbol __atomic_compare_exchange_1
    collect2: error: ld returned 1 exit status
    mkoffload: fatal error: [...]/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status

	libgomp/
	* testsuite/libgomp.c/target-43.c: '-latomic' for nvptx offloading.
2021-09-06 11:51:13 +02:00
Thomas Schwinge
29c355f76c Add 'libgomp.c/address-space-1.c'
Intel MIC (emulated) offloading execution failure remains to be analyzed.

	libgomp/
	* testsuite/libgomp.c/address-space-1.c: New file.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2021-08-23 17:46:08 +02:00
Tobias Burnus
432de08498 OpenMP 5.1: Add proc-bind 'primary' support
In OpenMP 5.1 "master thread" was changed to "primary thread" and
the proc_bind clause and the OMP_PROC_BIND environment variable
now take 'primary' as argument as alias for 'master', while the
latter is deprecated.
This commit accepts 'primary' and adds the named constant
omp_proc_bind_primary and changes 'master thread' in the
documentation; however, given that not even OpenMP 5.0 is
fully supported, omp_display_env and the dumps currently
still output 'master' and there is no deprecation warning
when using the 'master' in the proc_bind clause.

gcc/c/ChangeLog:

	* c-parser.c (c_parser_omp_clause_proc_bind): Accept
	'primary' as alias for 'master'.

gcc/cp/ChangeLog:

	* parser.c (cp_parser_omp_clause_proc_bind): Accept
	'primary' as alias for 'master'.

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_omp_proc_bind_kind): Add OMP_PROC_BIND_PRIMARY.
	* dump-parse-tree.c (show_omp_clauses): Add TODO comment to
	change 'master' to 'primary' in proc_bind for OpenMP 5.1.
	* intrinsic.texi (OMP_LIB): Mention OpenMP 5.1; add
	omp_proc_bind_primary.
	* openmp.c (gfc_match_omp_clauses): Accept
	'primary' as alias for 'master'.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle
	OMP_PROC_BIND_PRIMARY.

gcc/ChangeLog:

	* tree-core.h (omp_clause_proc_bind_kind): Add
	OMP_CLAUSE_PROC_BIND_PRIMARY.
	* tree-pretty-print.c (dump_omp_clause): Add TODO comment to
	change 'master' to 'primary' in proc_bind for OpenMP 5.1.

libgomp/ChangeLog:

	* env.c (parse_bind_var): Accept 'primary' as alias for
	'master'.
	(omp_display_env): Add TODO comment to
	change 'master' to 'primary' in proc_bind for OpenMP 5.1.
	* libgomp.texi: Change 'master thread' to 'primary thread'
	in line with OpenMP 5.1.
	(omp_get_proc_bind): Add omp_proc_bind_primary and note that
	omp_proc_bind_master is an alias of it.
	(OMP_PROC_BIND): Mention 'PRIMARY'.
	* omp.h.in (__GOMP_DEPRECATED_5_1): Define.
	(omp_proc_bind_primary): Add.
	(omp_proc_bind_master): Deprecate for OpenMP 5.1.
	* omp_lib.f90.in (omp_proc_bind_primary): Add.
	(omp_proc_bind_master): Deprecate for OpenMP 5.1.
	* omp_lib.h.in (omp_proc_bind_primary): Add.
	* testsuite/libgomp.c/affinity-1.c: Check that
	'primary' works and is identical to 'master'.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/pr61486-2.c: Duplicate one proc_bind(master)
	testcase and test proc_bind(primary) instead.
	* gfortran.dg/gomp/affinity-1.f90: Likewise.
2021-08-12 15:49:49 +02:00
Tobias Burnus
33c4e46624 Add 'default' to -foffload=; document that flag [PR67300]
As -foffload={options,targets,targets=options} is very convoluted,
it has been split into -foffload=targets (supporting the old syntax
for backward compatibilty) and -foffload-options={options,target=options}.

Only the new syntax is documented.

Additionally, -foffload=default is supported, which can reset the
devices after -foffload=disable / -foffload=targets to the default,
if needed.

gcc/ChangeLog:

	PR other/67300
	* common.opt (-foffload=): Update description.
	(-foffload-options=): New.
	* doc/invoke.texi (C Language Options): Document
	-foffload and -foffload-options.
	* gcc.c (check_offload_target_name): New, split off from
	handle_foffload_option.
	(check_foffload_target_names): New.
	(handle_foffload_option): Handle -foffload=default.
	(driver_handle_option): Update for -foffload-options.
	* lto-opts.c (lto_write_options): Use -foffload-options
	instead of -foffload.
	* lto-wrapper.c (merge_and_complain, append_offload_options):
	Likewise.
	* opts.c (common_handle_option): Likewise.

libgomp/ChangeLog:

	PR other/67300
	* testsuite/libgomp.c-c++-common/reduction-16.c: Replace
	-foffload=nvptx-none= by -foffload-options=nvptx-none= to
	avoid disabling other offload targets.
	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
	* testsuite/libgomp.c/target-44.c: Likewise.
2021-06-29 16:00:04 +02:00
Thomas Schwinge
abf937ac00 'libgomp.c/target-44.c': Restrict '-latomic' to nvptx offloading compilation
Fix-up for recent commit f87990a2a8
"[openmp, simt] Disable SIMT for user-defined reduction"; see commit
d42088e453 "Avoid -latomic for amdgcn
offloading".

	libgomp/
	* testsuite/libgomp.c/target-44.c: Restrict '-latomic' to nvptx
	offloading compilation.
2021-05-18 12:57:35 +02:00
Martin Liska
810afb0b5f testsuite: prune new LTO warning
libgomp/ChangeLog:

	PR testsuite/100569
	* testsuite/libgomp.c/omp-nested-3.c: Prune new LTO warning.
	* testsuite/libgomp.c/pr46032-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c: Likewise.

gcc/testsuite/ChangeLog:

	PR testsuite/100569
	* gcc.dg/atomic/c11-atomic-exec-2.c: Prune new LTO warning.
	* gcc.dg/torture/pr94947-1.c: Likewise.
2021-05-13 09:24:23 +02:00
Jakub Jelinek
98acbb3111 openmp: Fix up taskloop reduction ICE if taskloop has no iterations [PR100471]
When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions.  But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration.  The pointer
to the task reduction private variables is reused, on input it is the alignment
and only on output it is the pointer, so in the case taskloop with no iterations
the caller attempts to dereference the alignment value as if it was a pointer
and crashes.  We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.

2021-05-11  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/100471
	* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
	is 0, bypass the reduction loop including
	GOMP_taskgroup_reduction_unregister call.

	* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
	GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
	reduction pointer.
	* testsuite/libgomp.c/task-reduction-4.c: New test.
2021-05-11 09:07:47 +02:00
Tom de Vries
f87990a2a8 [openmp, simt] Disable SIMT for user-defined reduction
The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
    counter_N0.i += 1;
...

When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but appearantly not user-defined reductions.

For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.

Tested libgomp on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
	reduction.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* testsuite/libgomp.c/target-44.c: New test.
2021-05-03 23:13:59 +02:00
Tom de Vries
fc14ff6111 [omp, simt] Handle alternative IV
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
  int s = 1;
  #pragma omp target simd
  for (int i = N - 1; i > -1; i -= s)
    a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.

After omp-expand, this looks like:
...
  <bb 5> :
  simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
  .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
  D.3193 = -s;
  s.9 = s;
  D.3204 = .GOMP_SIMT_LANE ();
  D.3205 = -s.9;
  D.3206 = (int) D.3204;
  D.3207 = D.3205 * D.3206;
  i = D.3207 + 31;
  D.3209 = 0;
  D.3210 = -s.9;
  D.3211 = D.3210 - i;
  D.3210 = -s.9;
  D.3212 = D.3211 / D.3210;
  D.3213 = (unsigned int) D.3212;
  D.3213 = i >= 0 ? D.3213 : 0;

  <bb 19> :
  if (D.3209 < D.3213)
    goto <bb 6>; [87.50%]
  else
    goto <bb 7>; [12.50%]

  <bb 6> :
  a[i] = 1;
  D.3215 = -s.9;
  D.3219 = .GOMP_SIMT_VF ();
  D.3216 = (int) D.3219;
  D.3220 = D.3215 * D.3216;
  i = D.3220 + i;
  D.3209 = D.3209 + 1;
  goto <bb 19>; [100.00%]
...

On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.

So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
  to true
- bb6 is once more executed, which should not happen because all the elements
  that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
  ...
  libgomp: cuCtxSynchronize error: an illegal memory access was encountered
  ...
  and with CANARY_SIZE unmodified, we run into:
  ...
  Expected 0, got 1 at base[-961]
  Aborted (core dumped)
  ...

The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
  IV (D.3209 in our example) is generated in expand_omp_simd, and the
  loop condition is tested in terms of the alternative IV rather than
  the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
  modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
  by the SIMT code, but afterwards there are uses of both step and
  fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
  which should use step instead.

Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.

Build on x86_64-linux with nvptx accelerator, tested libgomp.

This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.

gcc/ChangeLog:

2020-10-02  Tom de Vries  <tdevries@suse.de>

	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
	fd->loop.step by either step or orig_step.

libgomp/ChangeLog:

2020-10-02  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/pr81778.c: New test.
2021-04-29 14:37:32 +02:00