While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence',
denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or
more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but
still requires the '{' ... '}' for the final-loop-body) and updates Fortran
to accept zero statements or more than one statement.
If there is no preceding or succeeding executable statement, a warning is
shown.
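As a hypothetical illustration (not one of the new testcases), the following
exclusive scan has zero executable statements before the 'omp scan' directive
and is now accepted in C/C++, with the new warning pointing at the directive:
...
void
exclusive_scan (int *a, int n)
{
  int r = 0;
  #pragma omp parallel for reduction (inscan, + : r)
  for (int i = 0; i < n; i++)
    {
      /* Zero executable statements before the directive: now accepted,
         but diagnosed with a warning.  */
      #pragma omp scan exclusive (r)
      r += a[i];
    }
}
...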
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
or more than one exec statements before/after 'omp scan'.
* trans-openmp.cc (gfc_trans_omp_do): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/scan-1.c: New test.
* testsuite/libgomp.c/scan-23.c: New test.
* testsuite/libgomp.fortran/scan-2.f90: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning.
* gfortran.dg/gomp/loop-2.f90: Likewise.
* gfortran.dg/gomp/reduction5.f90: Likewise.
* gfortran.dg/gomp/reduction6.f90: Likewise.
* gfortran.dg/gomp/scan-1.f90: Likewise.
* gfortran.dg/gomp/taskloop-2.f90: Likewise.
* c-c++-common/gomp/scan-6.c: New test.
* gfortran.dg/gomp/scan-8.f90: New test.
Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.
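For illustration only (the routine name apply_sin is not part of the patch),
a loop of this shape is a candidate: when it is vectorized for an offload
target, the sinf calls may be replaced by calls into the vectorized math
library:
...
#include <math.h>

void
apply_sin (float *restrict out, const float *restrict in, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = sinf (in[i]);
}
...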
2023-03-02 Kwok Cheung Yeung <kcy@codesourcery.com>
Paul-Antoine Arras <pa@codesourcery.com>
gcc/
* builtins.cc (mathfn_built_in_explicit): New.
* config/gcn/gcn.cc: Include case-cfn-macros.h.
(mathfn_built_in_explicit): Add prototype.
(gcn_vectorize_builtin_vectorized_function): New.
(gcn_libc_has_function): New.
(TARGET_LIBC_HAS_FUNCTION): Define.
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.
gcc/testsuite/
* gcc.target/gcn/simd-math-1.c: New testcase.
* gcc.target/gcn/simd-math-2.c: New testcase.
libgomp/
* testsuite/libgomp.c/simd-math-1.c: New testcase.
For the case where a collapse(2) inner loop has an init expression dependent
on a non-constant multiple of the outer iterator and the condition upper bound
expression doesn't depend on the outer iterator, expand_omp_for_init_counts
was using fold_unary (NEGATE_EXPR, ...). That just returns NULL if the
expression can't be folded; we need fold_build1 instead.
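A hedged sketch of the affected loop shape (illustrative only; the real
reproducer is the new pr108459.c test): the inner loop's initial value is a
non-constant multiple of the outer iterator, while its upper bound is not:
...
long
foo (int n, int m, int k)
{
  long sum = 0;
  /* 'm * i' makes the inner lower bound depend on a non-constant multiple
     of the outer iterator; 'k' does not depend on 'i'.  */
  #pragma omp parallel for collapse(2) reduction(+: sum)
  for (int i = 0; i < n; i++)
    for (int j = m * i; j < k; j++)
      sum += i + j;
  return sum;
}
...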
2023-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108459
* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
than fold_unary for NEGATE_EXPR.
* testsuite/libgomp.c/pr108459.c: New test.
Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
* config/gcn/t-omp-device: Add gfx803.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
* testsuite/libgomp.c/declare-variant-4.h: New header file.
This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution. The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.
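A hypothetical example of a function that can now be auto-cloned: it carries
"omp declare target", writes no memory and calls nothing unsafe, so with
-fopenmp-target-simd-clone the pass treats it as if it had "omp declare simd":
...
#pragma omp declare target
/* Pure computation, no memory writes, no calls: a candidate for the
   automatic SIMD cloning described above.  */
int
scale (int x)
{
  return 3 * x + 1;
}
#pragma omp end declare target
...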
gcc/ChangeLog:
* common.opt (fopenmp-target-simd-clone): New option.
(target_simd_clone_device): New enum to go with it.
* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
* flag-types.h (enum omp_target_simd_clone_device_kind): New.
* omp-simd-clone.cc (auto_simd_fail): New function.
(auto_simd_check_stmt): New function.
(plausible_type_for_simd_clone): New function.
(ok_for_auto_simd_clone): New function.
(simd_clone_create): Add force_local argument, make the symbol
have internal linkage if it is true.
(expand_simd_clones): Also check for cloneable functions with
"omp declare target". Pass explicit_p argument to
simd_clone.compute_vecsize_and_simdlen target hook.
* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
Add bool explicit_p argument.
* doc/tm.texi: Regenerated.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
* config/gcn/gcn.cc
(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
* config/i386/i386.cc
(ix86_simd_clone_compute_vecsize_and_simdlen): Update.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/target-simd-clone-1.C: New.
* g++.dg/gomp/target-simd-clone-2.C: New.
* gcc.dg/gomp/target-simd-clone-1.c: New.
* gcc.dg/gomp/target-simd-clone-2.c: New.
* gcc.dg/gomp/target-simd-clone-3.c: New.
* gcc.dg/gomp/target-simd-clone-4.c: New.
* gcc.dg/gomp/target-simd-clone-5.c: New.
* gcc.dg/gomp/target-simd-clone-6.c: New.
* gcc.dg/gomp/target-simd-clone-7.c: New.
* gcc.dg/gomp/target-simd-clone-8.c: New.
* lib/scanoffloadipa.exp: New.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp: Load scanoffloadipa.exp library.
* testsuite/libgomp.c/target-simd-clone-1.c: New.
* testsuite/libgomp.c/target-simd-clone-2.c: New.
* testsuite/libgomp.c/target-simd-clone-3.c: New.
That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.
Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".
libgomp/
* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
'-foffload-options' syntax.
When not in an explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes). In those cases, the code handled an artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, it didn't take into account the
possibility of this being done from within explicit task(s). The code would
destroy/free the previous task and replace it with the new implicit task.
If the task is an explicit task (when teams is NULL, all explicit tasks behave
like if (0)), it is a pointer to a local stack variable, so freeing it doesn't
work, and additionally we shouldn't lose the explicit tasks - the new implicit
task should instead replace the ancestor task which is the first implicit one.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
Reverse offload requires at least -misa=sm_35; with this patch, a warning
instead of an error is shown, still permitting reverse offload for all
other configured device types. This is achieved by not calling
GOMP_offload_register_ver (and by no longer generating the then-pointless
'static const char' variables, once this is known).
The tool_name-as-progname change adds "nvptx " and "gcn " to the
"mkoffload: warning/error:" diagnostics.
gcc/ChangeLog:
* config/nvptx/mkoffload.cc (process): Replace a fatal_error by
a warning + not enabling offloading if -misa=sm_30 prevents
reverse offload.
(main): Use tool_name as progname for diagnostic.
* config/gcn/mkoffload.cc (main): Likewise.
libgomp/ChangeLog:
* libgomp.texi (Offload-Target Specifics: nvptx): Document
that reverse offload requires >= -march=sm_35.
* testsuite/libgomp.c-c++-common/requires-4.c: Build for nvptx
with -misa=sm_35.
* testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-6.c: Likewise.
* testsuite/libgomp.c-c++-common/reverse-offload-1.c: Likewise.
* testsuite/libgomp.fortran/reverse-offload-1.f90: Likewise.
* testsuite/libgomp.c/reverse-offload-sm30.c: New test.
This patch implements doacross(sink: omp_cur_iteration - 1) that the
previous patchset emitted a sorry on during omp expansion.
It can be implemented with existing library functions.
To recap, depend(source)/doacross(source:)/doacross(source:omp_cur_iteration)
is implemented calling GOMP_doacross_post or GOMP_doacross_ull_post,
called with an array of long or unsigned long long elements, one for
all collapsed loops together and one for each further ordered loop if any.
We initialize that array in each thread when grabbing a further set of iterations
and update it at the end of loops, so that it represents the current iteration
(as 0 based counters). When the worksharing loop is created, we tell the
library through another similar array the counts (the loop needs to be
rectangular) in each dimension; the first element is the count of all logical
iterations in the collapsed loops.
depend(sink:v1 op N1, v2 op N2, ...) is then implemented by conditionally calling
GOMP_doacross_wait/GOMP_doacross_ull_wait. For N? of 0 there is no check;
otherwise, if it wants to wait in a particular dimension for a previous iteration,
we check that the corresponding iterator isn't the first one (or first few),
where the previous iterator in that dimension would be out of range, and similarly,
when checking a next iteration in a dimension, that it isn't the last one (or last
few), where it would likewise be out of bounds. Then the collapsed loop counters
are folded into a single 0 based counter (the first argument), followed by the
other 0 based iteration counters indicating which iteration to wait for.
Now, doacross(sink: omp_cur_iteration - 1) is supposed to wait for the previous
logical iteration in the combined iteration space of all ordered loops.
For the very first iteration in that combined iteration space it does nothing,
there is no previous iteration. And similarly it does nothing if there
are more ordered loops than collapsed loops and it isn't the first logical
iteration of the combined loops inside of the collapsed loops, because as implemented
we know the previous iteration in that case is always executed by the same thread
as the current one.
In the implementation, we use the same value as is stored in the first element
of the array for GOMP_doacross_post/GOMP_doacross_ull_post, if that value is 0,
we do nothing. The rest is different based on if ordered argument is equal to
collapse or not. If it is, then we otherwise call
GOMP_doacross_wait/GOMP_doacross_ull_wait with a single argument, one less than
that counter we compare against 0.
If ordered argument is bigger than collapse, we add a per-thread boolean variable
.first.N, which we set to true at the start of the outermost ordered loop inside
of the collapsed set of loops and set to false at the end of the innermost
ordered loop. If .first.N is false, we don't do anything (we know the previous
iteration was handled by the current thread and by my reading of the spec we don't
need to emit even a memory barrier in that case, because it is just synchronization
with the same thread), otherwise we call GOMP_doacross_wait/GOMP_doacross_ull_wait
with the first argument one less than the counter we compare against 0, and then
one less than the 2nd and following counts of iterations we pass to the workshare
initialization. If say .counts.N passed to the workshare initialization is
{ 256, 13, 5, 2 } for a collapse(3) ordered(6) loop, then
GOMP_doacross_wait/GOMP_doacross_ull_wait is called with arguments equal to
.ordereda.N[0] - 1, 12, 4, 1.
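A minimal C sketch (not one of the new tests) of the construct this patch
enables:
...
void
wavefront (int *a, int n)
{
  #pragma omp parallel for ordered(1)
  for (int i = 0; i < n; i++)
    {
      /* Wait for the previous logical iteration, if any.  */
      #pragma omp ordered doacross(sink: omp_cur_iteration - 1)
      a[i] = (i ? a[i - 1] : 0) + 1;
      /* Post completion of the current iteration.  */
      #pragma omp ordered doacross(source: omp_cur_iteration)
    }
}
...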
2022-09-08 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-expand.cc (expand_omp_ordered_sink): Add CONT_BB argument.
Add doacross(sink:omp_cur_iteration-1) support.
(expand_omp_ordered_source_sink): Clear counts[fd->ordered + 1].
Adjust expand_omp_ordered_sink caller.
(expand_omp_for_ordered_loops): If counts[fd->ordered + 1] is
non-NULL, set that variable to true at the start of outermost
non-collapsed loop and set it to false at the end of innermost
ordered loop.
(expand_omp_for_generic): If fd->ordered, allocate
1 + (fd->ordered - fd->collapse) further elements in counts array.
Copy to counts + 2 + fd->ordered the counts of fd->collapse ..
fd->ordered - 1 loop if any.
gcc/testsuite/
* c-c++-common/gomp/doacross-7.c: New test.
libgomp/
* libgomp.texi (OpenMP 5.2): Mention that omp_cur_iteration is now
fully supported.
* testsuite/libgomp.c/doacross-4.c: New test.
* testsuite/libgomp.c/doacross-5.c: New test.
* testsuite/libgomp.c/doacross-6.c: New test.
* testsuite/libgomp.c/doacross-7.c: New test.
gcc/ChangeLog:
PR middle-end/106548
* omp-low.cc (lower_rec_input_clauses): Use build_outer_var_ref
for 'simd' linear-step values that are variable.
libgomp/ChangeLog:
PR middle-end/106548
* testsuite/libgomp.c/linear-2.c: New test.
The i variable is used inside of the parallel in:
#pragma omp simd safelen(32) private (v)
for (i = 0; i < 64; i++)
{
v = 3 * i;
ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
}
where i is predetermined linear (so while it is safe inside of the body as a
private per-SIMD-lane variable), the final value is written to
the shared variable, and in:
for (i = 0; i < 64; i++)
if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
#pragma omp atomic write
err = 1;
which is a normal loop and so it isn't in any way privatized there.
So we have a data race, fixed by adding private (i) clause to the
parallel.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
Paul Iannetta <piannetta@kalrayinc.com>
PR libgomp/106045
* testsuite/libgomp.c/target-31.c: Add private (i) clause.
OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 also makes -1 valid (calling it omp_initial_device), which is equivalent
in behavior to the omp_get_num_devices() number but has the advantage that it
is a constant. And it also introduces omp_invalid_device, which is
also a constant with implementation defined value < -1. That value should
act like sNaN, any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is all but [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.
For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
#pragma omp target if (cond)
where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it; an advantage of -1 is that one can iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one).
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.
So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.
This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory; the spec says that the initial ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).
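A sketch of the remapping as a standalone helper (hypothetical name; the real
code emits the equivalent expression inline in expand_omp_target at compile
time for constant arguments, or at runtime otherwise):
...
/* Map user-visible device numbers to the libgomp-internal ones:
   -1 (omp_initial_device) -> -2 (GOMP_DEVICE_FALLBACK, host fallback),
   -2                      -> -3 (a non-conforming device number),
   everything else, including omp_invalid_device (-4), is unchanged.  */
static int
remap_device_num (int user_device_num)
{
  unsigned int u = (unsigned int) user_device_num;
  return (int) (u >= -2U ? u - 1 : u);
}
...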
2022-06-13 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-expand.cc (expand_omp_target): Remap user provided
device clause arguments, -1 to -2 and -2 to -3, either
at compile time if constant, or at runtime.
include/
* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
parameters.
* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
* target.c (resolve_device): Add remapped argument, handle
GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
if remapped, otherwise treat omp_initial_device that way. For
omp_invalid_device, always emit gomp_fatal, even when
OMP_TARGET_OFFLOAD isn't mandatory.
(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
GOMP_target_update, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Pass true as remapped argument to
resolve_device.
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy_check, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_get_mapped_ptr,
omp_target_is_accessible): Pass false as remapped argument to
resolve_device. Treat omp_initial_device the same as
gomp_get_num_devices (). Don't bypass resolve_device calls if
device_num is negative.
(omp_pause_resource): Treat omp_initial_device the same as
gomp_get_num_devices (). Call resolve_device.
* icv-device.c (omp_set_default_device): Always set to device_num
even when it is negative.
* libgomp.texi: Document that Conforming device numbers,
omp_initial_device and omp_invalid_device are implemented.
* testsuite/libgomp.c/target-41.c (main): Add test with
omp_initial_device.
* testsuite/libgomp.c/target-45.c: New test.
* testsuite/libgomp.c/target-46.c: New test.
* testsuite/libgomp.c/target-47.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
test with omp_initial_device. Use -5 instead of -1 for negative value
test.
* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
Likewise. Reorder stop numbers.
OpenMP 5.1 and earlier had 2 different uses of to clause, one for target
update construct with one semantics, and one for declare target directive
with a different semantics.
Under the hood we were using OMP_CLAUSE_TO_DECLARE to represent the latter.
OpenMP 5.2 renamed the to clause on declare target to enter; the old spelling is
kept as a deprecated alias.
As we are far from having full OpenMP 5.2 support, this patch adds support
for the enter clause (and renames OMP_CLAUSE_TO_DECLARE to OMP_CLAUSE_ENTER
with a flag to tell the spelling of the clause for better diagnostics),
but doesn't deprecate the to clause on declare target just yet (that
should be done as one of the last steps in 5.2 support).
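For illustration, both spellings below are now parsed into the same
OMP_CLAUSE_ENTER clause, with the flag recording which spelling was used so
diagnostics can print the right name:
...
int x, y;
/* OpenMP 5.2 spelling.  */
#pragma omp declare target enter (x)
/* Pre-5.2 spelling, still accepted and not yet deprecated.  */
#pragma omp declare target to (y)
...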
2022-05-27 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TO_DECLARE
to OMP_CLAUSE_ENTER.
* tree.h (OMP_CLAUSE_ENTER_TO): Define.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Rename
OMP_CLAUSE_TO_DECLARE to OMP_CLAUSE_ENTER.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ENTER
instead of OMP_CLAUSE_TO_DECLARE, if OMP_CLAUSE_ENTER_TO, print
"to" instead of "enter".
* tree-nested.cc (convert_nonlocal_omp_clauses,
convert_local_omp_clauses): Handle OMP_CLAUSE_ENTER instead of
OMP_CLAUSE_TO_DECLARE.
gcc/c-family/
* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ENTER.
gcc/c/
* c-parser.cc (c_parser_omp_clause_name): Parse enter clause.
(c_parser_omp_all_clauses): For to clause on declare target, use
OMP_CLAUSE_ENTER clause with OMP_CLAUSE_ENTER_TO instead of
OMP_CLAUSE_TO_DECLARE clause. Handle PRAGMA_OMP_CLAUSE_ENTER.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add enter clause.
(c_parser_omp_declare_target): Use OMP_CLAUSE_ENTER instead of
OMP_CLAUSE_TO_DECLARE.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_ENTER instead
of OMP_CLAUSE_TO_DECLARE, to OMP_CLAUSE_ENTER_TO use "to" as clause
name in diagnostics instead of
omp_clause_code_name[OMP_CLAUSE_CODE (c)].
gcc/cp/
* parser.cc (cp_parser_omp_clause_name): Parse enter clause.
(cp_parser_omp_all_clauses): For to clause on declare target, use
OMP_CLAUSE_ENTER clause with OMP_CLAUSE_ENTER_TO instead of
OMP_CLAUSE_TO_DECLARE clause. Handle PRAGMA_OMP_CLAUSE_ENTER.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add enter clause.
(cp_parser_omp_declare_target): Use OMP_CLAUSE_ENTER instead of
OMP_CLAUSE_TO_DECLARE.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_ENTER instead
of OMP_CLAUSE_TO_DECLARE, to OMP_CLAUSE_ENTER_TO use "to" as clause
name in diagnostics instead of
omp_clause_code_name[OMP_CLAUSE_CODE (c)].
gcc/testsuite/
* c-c++-common/gomp/clauses-3.c: Add tests with enter clause instead
of to or modify some existing to clauses to enter.
* c-c++-common/gomp/declare-target-1.c: Likewise.
* c-c++-common/gomp/declare-target-2.c: Likewise.
* c-c++-common/gomp/declare-target-3.c: Likewise.
* g++.dg/gomp/attrs-9.C: Likewise.
* g++.dg/gomp/declare-target-1.C: Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/target-40.c: Modify some existing to
clauses to enter.
* testsuite/libgomp.c/target-41.c: Likewise.
Consider this code (with N defined to 1024):
...
float v = 0.0;
#pragma omp target map(tofrom: v)
#pragma omp parallel for simd
for (int i = 0 ; i < N; i++)
{
#pragma omp atomic update
v = v + 1.0;
}
...
It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).
I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).
The problem -muniform-simt is trying to address is to make sure that a
register produced outside an openmp simd region is available when used in any
lane inside an simd region.
The solution is to, outside an simd region, execute in all warp lanes, thus
producing consistent values in result registers in each warp thread.
This approach doesn't work when executing in all warp lanes multiplies a
single side effect into 32 separate side effects, which is the case for atomic
insns. So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]
Now, consider a non-atomic update: ld, add, store. The store has side
effects, are those multiplied or not?
Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step. So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
at once, having the result of a single store.
So, no side-effect multiplication (well, at least that's the observation).
Starting with sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence. There's a "Convergence Optimizer" that can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".
Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects can have multiplied, and the registers are
no longer guaranteed to be in sync.
The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably
because the atom.cas counts as a "synchronizing operation". So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.
Fix this by adding an explicit warp sync at simt exit.
Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-09 Tom de Vries <tdevries@suse.de>
PR target/104916
PR target/104783
* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
sync (or uniform warp check for -mptx < 6.0).
libgomp/ChangeLog:
2022-03-15 Tom de Vries <tdevries@suse.de>
PR target/104916
PR target/104783
* testsuite/libgomp.c/pr104783-2.c: New test.
Consider test-case pr104952-1.c, included in this commit, containing:
...
#pragma omp target map(tofrom:result) map(to:arr)
#pragma omp simd reduction(||: result)
...
When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.
The reduction clause is translated by the SIMT code (active for nvptx) as a
butterfly reduction loop with this butterfly shuffle / update pair:
...
D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
in the loop body.
The problem is that the butterfly shuffle is possibly not executed, while it
needs to be executed unconditionally.
Fix this by translating instead as:
...
D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
D.2163 = D.2163 || D.tmp_bfly
...
Tested on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2022-03-17 Tom de Vries <tdevries@suse.de>
PR target/104952
* omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY
is executed unconditionally.
libgomp/ChangeLog:
2022-03-17 Tom de Vries <tdevries@suse.de>
PR target/104952
* testsuite/libgomp.c/pr104952-1.c: New test.
* testsuite/libgomp.c/pr104952-2.c: New test.
I ran into a hang for this code:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
This has to do with the nature of -muniform-simt. It has two modes of
operation: inside and outside an SIMT region.
Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent. This approach works unless the insn that is executed is a syscall
or an atomic insn. In that case, the insn is predicated, such that it
executes in only one thread. If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.
Inside an SIMT region, a warp executes in all threads. However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code. Care has been taken
though to ensure that the predication and propagation is a nop. That is,
inside an SIMT region:
- the predicate evaluates to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.
That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.
Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.
libgomp/ChangeLog:
2022-03-10 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr104783.c: New test.
When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
1 exit status^M
compilation terminated.^M
...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...
Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
Add openmp test-cases that test the omp declare variant construct:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.
Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-24 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
* testsuite/libgomp.c/declare-variant-3.h: New header file.
Consider the following omp fragment.
...
#pragma omp target
#pragma omp parallel num_threads (2)
#pragma omp task
;
...
This hangs at -O0 for nvptx.
Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
- deposit a task, and
- execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
calls gomp_barrier_handle_tasks, where it:
- executes both tasks and marks the team barrier done
- executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
the team barrier.
- thread 0 hangs.
To understand why there is a hang here, it's good to understand how things
are setup for nvptx. The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
if (bar->total > 1)
asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...
The point where thread 0 goes to wait at the team barrier corresponds, in
the linux implementation, to a do_wait. In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.
In the nvptx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.
The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}. That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.
Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.
The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.
This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2021-04-20 Tom de Vries <tdevries@suse.de>
PR target/99555
* config/nvptx/bar.c (generation_to_barrier): New function, copied
from config/rtems/bar.c.
(futex_wait, futex_wake): New function.
(do_spin, do_wait): New function, copied from config/linux/wait.h.
(gomp_barrier_wait_end, gomp_barrier_wait_last)
(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
and replace with include of config/linux/bar.c.
* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
(gomp_barrier_init): Init new fields.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
workarounds.
* testsuite/libgomp.c/pr99555-1.c: Same.
* testsuite/libgomp.fortran/task-detach-6.f90: Same.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):
has_device_addr(list)
"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)
"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)
"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
clause.
(c_parser_omp_variable_list): Handle array sections.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
OMP_CLAUSE_MASK.
* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
(c_finish_omp_clauses): Handle array sections.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_var_list_no_open): Handle array sections.
(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
(finish_omp_clauses): Handle array sections.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
case.
* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
(resolve_omp_clauses): Same.
* trans-openmp.cc (gfc_trans_omp_variable_list): Added
OMP_LIST_HAS_DEVICE_ADDR case.
(gfc_trans_omp_clauses): Firstprivatize of array descriptors.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
OMP_CLAUSE_HAS_DEVICE_ADDR
and handle array sections.
(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
(lower_omp_target): Same.
* tree-core.h (enum omp_clause_code): Same.
* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
(convert_local_omp_clauses): Same.
* tree-pretty-print.cc (dump_omp_clause): Same.
* tree.cc: Same.
libgomp/ChangeLog:
* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
* target.c (copy_firstprivate_data): Copy only if host address is not
NULL.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-has-device-addr-2.c: New test.
* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits. Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed. The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.
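A hedged sketch of the failing pattern (the real reproducer is the new
pr104385.c test): the parent task can finish before its second child, which is
still waiting on the depend clause, ever becomes runnable:
...
#include <unistd.h>

int x;

void
orphan (void)
{
  #pragma omp parallel
  #pragma omp single
  #pragma omp task                      /* parent task */
  {
    #pragma omp task depend(out: x)
    {
      usleep (1000);
      x = 1;
    }
    #pragma omp task depend(in: x)
    x = 2;
    /* No taskwait: the parent may finish first, leaving the second child
       as a "posthumous orphan".  */
  }
}
...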
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.
This patch implements three pieces of functionality:
(1) Adjust array section mapping to have standards conforming behavior,
mapping array sections should *NOT* also map the base-pointer:
struct S { int *ptr; ... };
struct S s;
Instead of generating this during gimplify:
map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
Now, adjust to:
(i.e. do not map the base-pointer together. The attach operation is still
generated, and if s.ptr is already mapped prior, attachment will happen)
The correct way of achieving the base-pointer-also-mapped behavior would be to
use:
(A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section
is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for
array types, which appears incorrect and causes a regression in
libgomp.fortran/struct-elem-map-1.f90)
(2) Related to the first item above, are fixes in libgomp/target.c to not
overwrite attached pointers when handling device<->host copies, mainly for the
"always" case.
(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. These changes are enabled for both
OpenACC and OpenMP.
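A small illustration of item (1) (hypothetical code, 100 ints matching the
400-byte map above): only the pointed-to data is mapped and an attach
operation is generated; the base pointer s.ptr itself is no longer mapped as
a side effect:
...
struct S { int *ptr; };

void
f (struct S s)
{
  /* Maps the 100 ints s.ptr points to; s.ptr is attached only if it is
     already present in the device data environment.  */
  #pragma omp target enter data map(to: s.ptr[:100])
}
...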
gcc/c/ChangeLog:
* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/cp/ChangeLog:
* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.
gcc/ChangeLog:
* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accommodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting
phase where we make sure pointers with an attach/detach map are ordered
correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating
GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/target-enter-data-1.c: New testcase.
* c-c++-common/gomp/target-implicit-map-2.c: New testcase.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Make sure attached pointer is
not overwritten during cross-host/device copying.
(gomp_update): Likewise.
(gomp_exit_data): Likewise.
* testsuite/libgomp.c++/target-11.C: Adjust testcase.
* testsuite/libgomp.c++/target-12.C: Likewise.
* testsuite/libgomp.c++/target-15.C: Likewise.
* testsuite/libgomp.c++/target-16.C: Likewise.
* testsuite/libgomp.c++/target-17.C: Likewise.
* testsuite/libgomp.c++/target-21.C: Likewise.
* testsuite/libgomp.c++/target-23.C: Likewise.
* testsuite/libgomp.c/target-23.c: Likewise.
* testsuite/libgomp.c/target-29.c: Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
As the testcase shows, we weren't handling kind(host) and kind(nohost) properly
in the ACCEL_COMPILERs; the code written there is valid for the host
compiler only, where, if we are maybe offloaded, we defer resolution until after
IPA, and otherwise return 0 for kind(nohost) and accept it for kind(host). Note,
omp_maybe_offloaded is false after IPA. If ACCEL_COMPILER is defined, it is
the other way around, but also we know we are after IPA.
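For illustration (names hypothetical), the selection that the fix makes work
in the offload compiler:
...
int host_impl (void) { return 1; }
int nohost_impl (void) { return 2; }

#pragma omp declare variant (host_impl) match (device={kind(host)})
#pragma omp declare variant (nohost_impl) match (device={kind(nohost)})
int base_fn (void) { return 0; }

/* In an offloaded target region, calls to base_fn should resolve to
   nohost_impl; on the host, to host_impl.  */
...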
2021-11-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/103384
gcc/
* omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER,
return 0 for kind(host) and continue for kind(nohost).
libgomp/
* testsuite/libgomp.c/declare-variant-2.c: New test.
This is https://github.com/OpenMP/spec/issues/3183
There is an agreement that we should return 1 team inside of target,
even if that target is inside of host teams. We were doing that
when offloading and not during host fallback, r12-5151 should fix that
even for host fallback.
2021-11-15 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.c/teams-5.c: New test.
The following patch implements what I've been talking about earlier:
honor that for an explicit num_teams clause we create at least the
lower-bound (if not specified, the upper-bound) number of teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.
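A small sketch of the guarantee (hypothetical test-style code): with an
explicit num_teams(2), the league now contains exactly two teams even on
host fallback, where one thread runs them sequentially:
...
#include <omp.h>

int
league_size (void)
{
  int n = 0;
  #pragma omp teams num_teams(2)
  if (omp_get_team_num () == 0)
    n = omp_get_num_teams ();
  return n;   /* With the patch this is exactly 2 (previously it could be 1).  */
}
...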
2021-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
(BUILT_IN_GOMP_TEAMS4): New.
* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
GOMP_teams, pass to it also num_teams lower-bound expression
or a dup of upper-bound if it is missing and a flag whether
it is the first call or not.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
* libgomp_g.h (GOMP_teams4): Declare.
* libgomp.map (GOMP_5.1): Export GOMP_teams4.
* target.c (GOMP_teams4): New function.
* config/nvptx/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* config/gcn/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
teams instead of <= 2.
* testsuite/libgomp.c-c++-common/teams-2.c: New test.
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two. One is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard obviously says nothing about it. Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4). With target fallback being invoked from parallel
regions global vars simply can't work right on the host.
So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads. For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.
2021-11-11 Jakub Jelinek <jakub@redhat.com>
* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
* team.c (struct gomp_thread_start_data): Likewise.
(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
(gomp_team_start): Initialize start_data->num_teams and
start_data->team_num. Update nthr->num_teams and nthr->team_num.
* teams.c (gomp_num_teams, gomp_team_num): Remove.
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
instead of gomp_num_teams and gomp_team_num.
(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
* testsuite/libgomp.c/teams-4.c: New test.
The teams construct only permits omp_get_num_teams and omp_get_team_num
as API calls in strictly nested regions - check for it.
Additionally, for Fortran, using DECL_NAME does not show the mangled
name; hence, DECL_ASSEMBLER_NAME had to be used, too.
Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.
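A sketch of what the new check diagnoses (illustrative only):
...
#include <omp.h>

void
f (int *p)
{
  #pragma omp teams
  {
    *p = omp_get_num_teams ();      /* OK: permitted API call.  */
    *p += omp_get_thread_num ();    /* Now diagnosed: not permitted
                                       strictly nested in teams.  */
  }
}
...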
PR middle-end/102972
gcc/ChangeLog:
* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
internal Fortran name; new permit_num_teams arg to permit
omp_get_num_teams and omp_get_team_num.
(scan_omp_1_stmt): Update call to it, add missing call for
reverse offload, and check for strictly nested API calls in teams.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
routine test.
* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
* c-c++-common/gomp/teams-3.c: New test.
* gfortran.dg/gomp/teams-3.f90: New test.
* gfortran.dg/gomp/teams-4.f90: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
parallel construct.
* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
* testsuite/libgomp.c/target-3.c: Likewise.
* testsuite/libgomp.c/target-5.c: Likewise.
* testsuite/libgomp.c/target-6.c: Likewise.
* testsuite/libgomp.c/target-teams-1.c: Likewise.
* testsuite/libgomp.c/teams-1.c: Likewise.
* testsuite/libgomp.c/thread-limit-2.c: Likewise.
* testsuite/libgomp.c/thread-limit-3.c: Likewise.
* testsuite/libgomp.c/thread-limit-4.c: Likewise.
* testsuite/libgomp.c/thread-limit-5.c: Likewise.
* testsuite/libgomp.fortran/icv-3.f90: Likewise.
* testsuite/libgomp.fortran/icv-4.f90: Likewise.
* testsuite/libgomp.fortran/teams1.f90: Likewise.
This patch handles pointer iterators for non-rectangular loops. They are
more limited than integral iterators of non-rectangular loops, in particular
only var-outer, var-outer + a2, a2 + var-outer or var-outer - a2 can appear
in lb or ub where a2 is some integral loop invariant expression, so no e.g.
multiplication etc.
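A hedged sketch of the newly supported shape (illustrative only), with both
iterators of pointer type and the inner lower bound of the var-outer + a2
form:
...
long
count_pairs (int *p, int *end, int a2)
{
  long n = 0;
  #pragma omp parallel for collapse(2) reduction(+: n)
  for (int *q = p; q < end; q++)
    for (int *r = q + a2; r < end; r++)   /* lb is var-outer + a2 */
      n++;
  return n;
}
...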
2021-10-27 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular
iterators with pointer types.
(expand_omp_for_init_vars, extract_omp_for_update_vars): Likewise.
gcc/c-family/
* c-omp.c (c_omp_check_loop_iv_r): Don't clear 3rd bit for
POINTER_PLUS_EXPR.
(c_omp_check_nonrect_loop_iv): Handle POINTER_PLUS_EXPR.
(c_omp_check_loop_iv): Set kind even if the iterator is non-integral.
gcc/testsuite/
* c-c++-common/gomp/loop-8.c: New test.
* c-c++-common/gomp/loop-9.c: New test.
libgomp/
* testsuite/libgomp.c/loop-26.c: New test.
* testsuite/libgomp.c/loop-27.c: New test.
I've noticed that while I have added hopefully sufficient test coverage
for the case where one uses simple number or !number as p-interval,
I haven't added any coverage for number:len:stride or number:len.
This patch adds that.
2021-10-15 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.c/affinity-1.c (struct places): Change name field
type from char [50] to const char *.
(places_array): Add a testcase for simplified syntax place followed
by length or length and stride.
In addition to adding ll_caches and numa_domain abstract names
to OMP_PLACES syntax, OpenMP 5.1 also added one syntax simplification:
https://github.com/OpenMP/spec/issues/2080 and
https://github.com/OpenMP/spec/pull/2081
in particular that in the grammar place non-terminal is now
not only { res-list } but also res (i.e. a non-negative integer),
which stands as a shortcut for { res }
So, one can specify OMP_PLACES=0,4,8,12 with the meaning
OMP_PLACES={0},{4},{8},{12} or OMP_PLACES=0:4 instead of OMP_PLACES={0}:4
or OMP_PLACES={0},{1},{2},{3} etc.
This patch implements that.
2021-10-15 Jakub Jelinek <jakub@redhat.com>
* env.c (parse_one_place): Handle non-negative-number the same
as { non-negative-number }. Reject even !number:1 and
!number:1:stride or !place:1 or !place:1:stride instead of just
length other than 1.
* libgomp.texi (OpenMP 5.1): Document OMP_PLACES syntax extensions
and OMP_NUM_TEAMS/OMP_TEAMS_THREAD_LIMIT and
omp_{set_num,get_max}_teams/omp_{s,g}et_teams_thread_limit features
as implemented.
* testsuite/libgomp.c/affinity-1.c: Add a test for the 5.1 place
simplified syntax.
When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core. It then happily ignores the maximum count
and, for the remaining hw threads in a core, overwrites further places that
haven't been allocated.
2021-10-15 Jakub Jelinek <jakub@redhat.com>
* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
after creating count places clean up and return immediately.
* testsuite/libgomp.c/places-6.c: New test.
* testsuite/libgomp.c/places-7.c: New test.
* testsuite/libgomp.c/places-8.c: New test.
* testsuite/libgomp.c/places-9.c: New test.
* testsuite/libgomp.c/places-10.c: New test.
This adds support for numa_domains abstract name in OMP_PLACES, also new
in OpenMP 5.1.
Way to test this is
OMP_PLACES=numa_domains OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on OMP_PLACES line.
For non-NUMA machines it should print a single place that covers all CPUs,
for NUMA machine one place for each NUMA node with corresponding CPUs.
2021-10-15 Jakub Jelinek <jakub@redhat.com>
* env.c (parse_places_var): Handle numa_domains as level 5.
* config/linux/affinity.c (gomp_affinity_init_numa_domains): New
function.
(gomp_affinity_init_level): Use it instead of
gomp_affinity_init_level_1 for level == 5.
* testsuite/libgomp.c/places-5.c: New test.
This patch implements support for ll_caches abstract name in OMP_PLACES,
which stands for places where logical cpus in each place share the last
level cache.
This seems to work fine for me on x86 and kernel sources show that it is
in common code, but on some machines on CompileFarm the files I'm using,
i.e.
/sys/devices/system/cpu/cpuN/cache/indexN/level
/sys/devices/system/cpu/cpuN/cache/indexN/shared_cpu_list
don't exist. Is that because they have too old a kernel and newer kernels
are fine, or should I implement some fallback methods (which)?
E.g. on gcc112.fsffrance.org I see just shared_cpu_map and not shared_cpu_list
(with shared_cpu_map being harder to parse) and on another box I didn't even
see the cache subdirectories.
Way to test this is
OMP_PLACES=ll_caches OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on OMP_PLACES line.
2021-10-15 Jakub Jelinek <jakub@redhat.com>
* env.c (parse_places_var): Handle ll_caches as level 4.
* config/linux/affinity.c (gomp_affinity_find_last_cache_level): New
function.
(gomp_affinity_init_level_1): Handle level 4 as logical cpus sharing
last level cache.
(gomp_affinity_init_level): Likewise.
* testsuite/libgomp.c/places-1.c: New test.
* testsuite/libgomp.c/places-2.c: New test.
* testsuite/libgomp.c/places-3.c: New test.
* testsuite/libgomp.c/places-4.c: New test.
> * testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.
I don't think this is the right thing to do.
This just means that at some point between 2013 when -fsimd-cost-model has
been introduced and now -fsimd-cost-model= option at least partially stopped
working properly.
As documented, -fsimd-cost-model= overrides the -fvect-cost-model= setting
for OpenMP simd loops (loop->force_vectorize is true) if specified differently
from default.
In tree-vectorizer.h we have:
static inline bool
unlimited_cost_model (loop_p loop)
{
if (loop != NULL && loop->force_vectorize
&& flag_simd_cost_model != VECT_COST_MODEL_DEFAULT)
return flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED;
return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
}
and use it in various places, but we also just use flag_vect_cost_model
in lots of places (and in one spot use flag_simd_cost_model, not sure if
we are sure it is a force_vectorize loop or what).
So, IMHO we should change the above inline function to
loop_cost_model and let it return the cost model and then just
reimplement unlimited_cost_model as
return loop_cost_model (loop) == VECT_COST_MODEL_UNLIMITED;
and then adjust the direct uses of the flag and revert these changes.
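A sketch of that suggestion, spelled out against the declarations quoted
above (the committed version lives in tree-vectorizer.h):
...
static inline enum vect_cost_model
loop_cost_model (loop_p loop)
{
  /* -fsimd-cost-model= overrides -fvect-cost-model= for OpenMP simd
     (force_vectorize) loops when set to something non-default.  */
  if (loop != NULL
      && loop->force_vectorize
      && flag_simd_cost_model != VECT_COST_MODEL_DEFAULT)
    return flag_simd_cost_model;
  return flag_vect_cost_model;
}

static inline bool
unlimited_cost_model (loop_p loop)
{
  return loop_cost_model (loop) == VECT_COST_MODEL_UNLIMITED;
}
...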
2021-10-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-vectorizer.h (loop_cost_model): New function.
(unlimited_cost_model): Use it.
* tree-vect-loop.c (vect_analyze_loop_costing): Use loop_cost_model
call instead of flag_vect_cost_model.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise. Also use it instead
of flag_simd_cost_model.
gcc/testsuite/
* gcc.dg/gomp/simd-2.c: Remove option -fvect-cost-model=cheap.
* gcc.dg/gomp/simd-3.c: Likewise.
libgomp/
* testsuite/libgomp.c/scan-11.c: Remove option -fvect-cost-model=cheap.
* testsuite/libgomp.c/scan-12.c: Likewise.
* testsuite/libgomp.c/scan-13.c: Likewise.
* testsuite/libgomp.c/scan-14.c: Likewise.
* testsuite/libgomp.c/scan-15.c: Likewise.
* testsuite/libgomp.c/scan-16.c: Likewise.
* testsuite/libgomp.c/scan-17.c: Likewise.
* testsuite/libgomp.c/scan-18.c: Likewise.
* testsuite/libgomp.c/scan-19.c: Likewise.
* testsuite/libgomp.c/scan-20.c: Likewise.
* testsuite/libgomp.c/scan-21.c: Likewise.
* testsuite/libgomp.c/scan-22.c: Likewise.
* testsuite/libgomp.c++/scan-9.C: Likewise.
* testsuite/libgomp.c++/scan-10.C: Likewise.
* testsuite/libgomp.c++/scan-11.C: Likewise.
* testsuite/libgomp.c++/scan-12.C: Likewise.
* testsuite/libgomp.c++/scan-13.C: Likewise.
* testsuite/libgomp.c++/scan-14.C: Likewise.
* testsuite/libgomp.c++/scan-15.C: Likewise.
* testsuite/libgomp.c++/scan-16.C: Likewise.
... to avoid a regression with recent
commit 090f0d78f1
"openmp: Improve expand_omp_atomic_pipeline":
unresolved symbol __atomic_compare_exchange_1
collect2: error: ld returned 1 exit status
mkoffload: fatal error: [...]/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
libgomp/
* testsuite/libgomp.c/target-43.c: '-latomic' for nvptx offloading.
Intel MIC (emulated) offloading execution failure remains to be analyzed.
libgomp/
* testsuite/libgomp.c/address-space-1.c: New file.
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
In OpenMP 5.1, "master thread" was changed to "primary thread", and the
proc_bind clause and the OMP_PROC_BIND environment variable now take
'primary' as an argument, as an alias for 'master', while the latter is
deprecated.
This commit accepts 'primary', adds the named constant omp_proc_bind_primary,
and changes 'master thread' to 'primary thread' in the documentation;
however, given that not even OpenMP 5.0 is fully supported yet,
omp_display_env and the dumps currently still output 'master', and there is
no deprecation warning when using 'master' in the proc_bind clause.
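For illustration only (this is not one of the added testcases), the accepted
spelling and the new named constant look like:

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  /* 'primary' is now accepted wherever 'master' was.  */
  #pragma omp parallel num_threads (2) proc_bind (primary)
  printf ("hello from thread %d\n", omp_get_thread_num ());

  /* The new constant is an alias of the deprecated one.  */
  return omp_proc_bind_primary == omp_proc_bind_master ? 0 : 1;
}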
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_clause_proc_bind): Accept
'primary' as alias for 'master'.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_clause_proc_bind): Accept
'primary' as alias for 'master'.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_omp_proc_bind_kind): Add OMP_PROC_BIND_PRIMARY.
* dump-parse-tree.c (show_omp_clauses): Add TODO comment to
change 'master' to 'primary' in proc_bind for OpenMP 5.1.
* intrinsic.texi (OMP_LIB): Mention OpenMP 5.1; add
omp_proc_bind_primary.
* openmp.c (gfc_match_omp_clauses): Accept
'primary' as alias for 'master'.
* trans-openmp.c (gfc_trans_omp_clauses): Handle
OMP_PROC_BIND_PRIMARY.
gcc/ChangeLog:
* tree-core.h (omp_clause_proc_bind_kind): Add
OMP_CLAUSE_PROC_BIND_PRIMARY.
* tree-pretty-print.c (dump_omp_clause): Add TODO comment to
change 'master' to 'primary' in proc_bind for OpenMP 5.1.
libgomp/ChangeLog:
* env.c (parse_bind_var): Accept 'primary' as alias for
'master'.
(omp_display_env): Add TODO comment to
change 'master' to 'primary' in proc_bind for OpenMP 5.1.
* libgomp.texi: Change 'master thread' to 'primary thread'
in line with OpenMP 5.1.
(omp_get_proc_bind): Add omp_proc_bind_primary and note that
omp_proc_bind_master is an alias of it.
(OMP_PROC_BIND): Mention 'PRIMARY'.
* omp.h.in (__GOMP_DEPRECATED_5_1): Define.
(omp_proc_bind_primary): Add.
(omp_proc_bind_master): Deprecate for OpenMP 5.1.
* omp_lib.f90.in (omp_proc_bind_primary): Add.
(omp_proc_bind_master): Deprecate for OpenMP 5.1.
* omp_lib.h.in (omp_proc_bind_primary): Add.
* testsuite/libgomp.c/affinity-1.c: Check that
'primary' works and is identical to 'master'.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/pr61486-2.c: Duplicate one proc_bind(master)
testcase and test proc_bind(primary) instead.
* gfortran.dg/gomp/affinity-1.f90: Likewise.
As -foffload={options,targets,targets=options} is very convoluted,
it has been split into -foffload=targets (supporting the old syntax
for backward compatibility) and -foffload-options={options,target=options}.
Only the new syntax is documented.
Additionally, -foffload=default is supported, which can reset the
devices after -foffload=disable / -foffload=targets to the default,
if needed.
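For example (file name and options chosen purely for illustration), the new
syntax separates target selection from target-specific options:
  gcc -fopenmp -foffload=nvptx-none -foffload-options=nvptx-none=-latomic foo.c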
gcc/ChangeLog:
PR other/67300
* common.opt (-foffload=): Update description.
(-foffload-options=): New.
* doc/invoke.texi (C Language Options): Document
-foffload and -foffload-options.
* gcc.c (check_offload_target_name): New, split off from
handle_foffload_option.
(check_foffload_target_names): New.
(handle_foffload_option): Handle -foffload=default.
(driver_handle_option): Update for -foffload-options.
* lto-opts.c (lto_write_options): Use -foffload-options
instead of -foffload.
* lto-wrapper.c (merge_and_complain, append_offload_options):
Likewise.
* opts.c (common_handle_option): Likewise.
libgomp/ChangeLog:
PR other/67300
* testsuite/libgomp.c-c++-common/reduction-16.c: Replace
-foffload=nvptx-none= by -foffload-options=nvptx-none= to
avoid disabling other offload targets.
* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
* testsuite/libgomp.c/target-44.c: Likewise.
When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return: it doesn't create any tasks and, more importantly, doesn't create
a taskgroup and doesn't register task reductions.  But the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration.  The pointer
to the task reduction private variables is reused: on input it is the
alignment and only on output it is the pointer, so in the case of a taskloop
with no iterations the caller attempts to dereference the alignment value
as if it were a pointer and crashes.  We could, in the early returns,
register the task reductions only to have them looped over and unregistered
in the caller, but I think it is better to tell the caller there is nothing
to task reduce and bypass all that.
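An illustrative sketch of the failing shape (not the committed
task-reduction-4.c testcase):

#include <stdlib.h>

int
main (void)
{
  int r = 0, n = 0;
  #pragma omp parallel
  #pragma omp single
  #pragma omp taskloop reduction (+: r)
  for (int i = 0; i < n; i++)  /* zero iterations -> early return  */
    r++;
  /* The caller of GOMP_taskloop still runs the reduction epilogue here.  */
  if (r != 0)
    abort ();
  return 0;
}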
2021-05-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100471
* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
is 0, bypass the reduction loop including
GOMP_taskgroup_reduction_unregister call.
* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
reduction pointer.
* testsuite/libgomp.c/task-reduction-4.c: New test.
The test-case included in this patch contains this target region:
...
for (int i0 = 0 ; i0 < N0 ; i0++ )
  counter_N0.i += 1;
...
When running with the nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.
This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but apparently not user-defined reductions.
For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.
Tested libgomp on x86_64-linux with nvptx accelerator.
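For context, 'user-defined reduction' here means one introduced via
'declare reduction'.  A minimal sketch of the shape involved (declaration
and construct details assumed, not copied from the testcase):

#include <stdlib.h>

#define N0 1024

struct counter { int i; };

/* Hypothetical user-defined reduction over a struct type.  */
#pragma omp declare reduction (addc : struct counter : omp_out.i += omp_in.i) \
  initializer (omp_priv = { 0 })

int
main (void)
{
  struct counter counter_N0 = { 0 };
  #pragma omp target parallel for simd map (tofrom: counter_N0) \
    reduction (addc : counter_N0)
  for (int i0 = 0; i0 < N0; i0++)
    counter_N0.i += 1;
  if (counter_N0.i != N0)
    abort ();
  return 0;
}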
gcc/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
reduction.
libgomp/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* testsuite/libgomp.c/target-44.c: New test.
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
int s = 1;
#pragma omp target simd
for (int i = N - 1; i > -1; i -= s)
  a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.
After omp-expand, this looks like:
...
  <bb 5> :
  simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
  .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
  D.3193 = -s;
  s.9 = s;
  D.3204 = .GOMP_SIMT_LANE ();
  D.3205 = -s.9;
  D.3206 = (int) D.3204;
  D.3207 = D.3205 * D.3206;
  i = D.3207 + 31;
  D.3209 = 0;
  D.3210 = -s.9;
  D.3211 = D.3210 - i;
  D.3210 = -s.9;
  D.3212 = D.3211 / D.3210;
  D.3213 = (unsigned int) D.3212;
  D.3213 = i >= 0 ? D.3213 : 0;

  <bb 19> :
  if (D.3209 < D.3213)
    goto <bb 6>; [87.50%]
  else
    goto <bb 7>; [12.50%]

  <bb 6> :
  a[i] = 1;
  D.3215 = -s.9;
  D.3219 = .GOMP_SIMT_VF ();
  D.3216 = (int) D.3219;
  D.3220 = D.3215 * D.3216;
  i = D.3220 + i;
  D.3209 = D.3209 + 1;
  goto <bb 19>; [100.00%]
...
On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.
So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and the if condition (D.3209 < D.3213) == (1 < 32) evaluates
to true
- bb6 is once more executed, which should not happen because all the elements
that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
and with CANARY_SIZE unmodified, we run into:
...
Expected 0, got 1 at base[-961]
Aborted (core dumped)
...
The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
IV (D.3209 in our example) is generated in expand_omp_simd, and the
loop condition is tested in terms of the alternative IV rather than
the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
modified by the SIMT code and then used thereafter.
- The step fd->loop.step is loaded into a variable step, which is modified
by the SIMT code, but afterwards there are uses of both step and
fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
which should use step instead.
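To make the numbers concrete (lane 0, s == 1, VF == 32, shown only for
illustration): i starts at 31, and the bound D.3213 is computed from the
unadjusted step as (-1 - 31) / -1 == 32, even though i is advanced by
-s.9 * VF == -32 per iteration, so only one iteration should run.  Computing
the bound from the SIMT-adjusted step instead gives (-32 - 31) / -32 == 1,
and the (D.3209 < D.3213) test then correctly fails after the first iteration.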
Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code, and replacing all remaining uses of fd->loop.step
with either step or orig_step.
Built on x86_64-linux with nvptx accelerator, tested libgomp.
This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.
gcc/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
fd->loop.step by either step or orig_step.
libgomp/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr81778.c: New test.