This reverts commit ffffffffffffffffffffffffffffffffffffffff.
This should get rejected because of the invalid hash.
If it still is accepted, it does something sensible:
It removes tailing white space from a line in libgomp/target.c.
For nvptx offloading, it'll FAIL its execution test until nvptx-tools updated
to include commit 1b5946d78ef5dcfb640e9f545a7c791b7f623911
"Merge commit '26095fd01232061de9f79decb3e8222ef7b46191' into HEAD [#29]",
<1b5946d78e>.
libgomp/
* testsuite/libgomp.c-c++-common/pr100059-1.c: New.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.
In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.
gcc/fortran/ChangeLog
* gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(find_nested_loop_in_chain): New.
(find_nested_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(make_structured_block): New.
(restructure_intervening_code): New.
(is_outer_iteration_variable): Do not assume loops are perfectly
nested.
(check_nested_loop_in_chain): New.
(check_nested_loop_in_block_state): New.
(check_nested_loop_in_block_symbol): New.
(check_nested_loop_in_block): New.
(expr_uses_intervening_var): New.
(is_intervening_var): New.
(expr_is_invariant): Do not assume loops are perfectly nested.
(resolve_omp_do): Handle imperfectly-nested loops.
* trans-stmt.cc (gfc_trans_block_construct): Generate
OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.
gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/imperfect-gotos.f90: New.
* gfortran.dg/gomp/imperfect-invalid-scope.f90: New.
* gfortran.dg/gomp/imperfect1.f90: New.
* gfortran.dg/gomp/imperfect2.f90: New.
* gfortran.dg/gomp/imperfect3.f90: New.
* gfortran.dg/gomp/imperfect4.f90: New.
* gfortran.dg/gomp/imperfect5.f90: New.
libgomp/ChangeLog
* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
* testsuite/libgomp.fortran/imperfect1.f90: New.
* testsuite/libgomp.fortran/imperfect2.f90: New.
* testsuite/libgomp.fortran/imperfect3.f90: New.
* testsuite/libgomp.fortran/imperfect4.f90: New.
* testsuite/libgomp.fortran/target-imperfect1.f90: New.
* testsuite/libgomp.fortran/target-imperfect2.f90: New.
* testsuite/libgomp.fortran/target-imperfect3.f90: New.
* testsuite/libgomp.fortran/target-imperfect4.f90: New.
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.
This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements. Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.
gcc/cp/ChangeLog
* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
* parser.cc (struct omp_for_parse_data): New.
(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
in intervening code.
(check_omp_intervening_code): New.
(cp_parser_statement_seq_opt): Special-case nested loops, blocks,
and other constructs for OpenMP loops.
(cp_parser_iteration_statement): Reject loops in intervening code.
(cp_parser_omp_for_loop_init): Expand comments and tweak the
interface slightly to better distinguish input/output parameters.
(cp_convert_omp_range_for): Likewise.
(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
and largely rewritten. Add more comments.
(insert_structured_blocks): New.
(find_structured_blocks): New.
(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
New.
(fixup_blocks_walker): New.
(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
of a loop. Add logic to reshuffle the bits of code collected
during parsing so intervening code gets moved to the loop body.
(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
is now redundant.
(cp_parser_omp_simd): Likewise.
(cp_parser_omp_for): Likewise.
(cp_parser_omp_distribute): Likewise.
(cp_parser_oacc_loop): Likewise.
(cp_parser_omp_taskloop): Likewise.
(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
* parser.h (struct cp_parser): Add omp_for_parse_state field.
* pt.cc (tsubst_omp_for_iterator): Adjust call to
cp_convert_omp_range_for.
* semantics.cc (finish_omp_for): Try harder to preserve location
of loop variable init expression for use in diagnostics.
(struct fofb_data, finish_omp_for_block_walker): New.
(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
nested inside BIND instead of directly in BIND itself.
gcc/testsuite/ChangeLog
* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
* g++.dg/gomp/attrs-imperfect1.C: New test.
* g++.dg/gomp/attrs-imperfect2.C: New test.
* g++.dg/gomp/attrs-imperfect3.C: New test.
* g++.dg/gomp/attrs-imperfect4.C: New test.
* g++.dg/gomp/attrs-imperfect5.C: New test.
* g++.dg/gomp/pr41967.C: Adjust expected error patterns.
* g++.dg/gomp/tpl-imperfect-gotos.C: New test.
* g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test.
libgomp/ChangeLog
* testsuite/libgomp.c++/attrs-imperfect1.C: New test.
* testsuite/libgomp.c++/attrs-imperfect2.C: New test.
* testsuite/libgomp.c++/attrs-imperfect3.C: New test.
* testsuite/libgomp.c++/attrs-imperfect4.C: New test.
* testsuite/libgomp.c++/attrs-imperfect5.C: New test.
* testsuite/libgomp.c++/attrs-imperfect6.C: New test.
* testsuite/libgomp.c++/imperfect-class-1.C: New test.
* testsuite/libgomp.c++/imperfect-class-2.C: New test.
* testsuite/libgomp.c++/imperfect-class-3.C: New test.
* testsuite/libgomp.c++/imperfect-destructor.C: New test.
* testsuite/libgomp.c++/imperfect-template-1.C: New test.
* testsuite/libgomp.c++/imperfect-template-2.C: New test.
* testsuite/libgomp.c++/imperfect-template-3.C: New test.
The following functions are not standard, and not always available
(e.g., on darwin). They should not be called unless available: gamma,
gammaf, scalb, scalbf, significand, and significandf.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp: Add effective target.
* testsuite/libgomp.c/simd-math-1.c: Avoid calling nonstandard
functions.
Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in
expand_omp_for_* were inserted with
gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);
except the one dealing with the multiplicative factor that was
gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);
That commit for PR103208 fixed the issue of some missing regimplify of
operands of GIMPLE_CONDs by moving the condition handling to the new function
expand_omp_build_cond. While that function has an 'bool after = false'
argument to switch between the two variants.
However, all callers ommited this argument. This commit reinstates the
prior behavior by passing 'true' for the factor != 0 condition, fixing
the included testcase.
PR middle-end/111017
gcc/
* omp-expand.cc (expand_omp_for_init_vars): Pass after=true
to expand_omp_build_cond for 'factor != 0' condition, resulting
in pre-r12-5295-g47de0b56ee455e code for the gimple insert.
libgomp/
* testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.
The documentation requires that numa_available() is called and only
when successful, other libnuma function may be called. Internally,
it does a syscall to get_mempolicy with flag=0 (which would return
the default policy if mode were not NULL). If this returns -1 (and
not 0) and errno == ENOSYS, the Linux kernel does not have the
get_mempolicy syscall function; if so, numa_available() returns -1
(otherwise: 0).
libgomp/
PR libgomp/111024
* allocator.c (gomp_init_libnuma): Call numa_available; if
not available or not returning 0, disable libnuma usage.
These are the os support patches we have been grooming and maintaining
for quite a few years over on git.haiku-os.org. All of these
architectures are working and most have been stable for quite some time.
ChangeLog:
* configure: Regenerate.
* configure.ac: Add Haiku to list of ELF OSes
* libtool.m4: Update sys_lib_dlsearch_path_spec on Haiku.
gcc/ChangeLog:
* configure: Regenerate.
libatomic/ChangeLog:
* configure: Regenerate.
libbacktrace/ChangeLog:
* configure: Regenerate.
libcc1/ChangeLog:
* configure: Regenerate.
libffi/ChangeLog:
* configure: Regenerate.
libgfortran/ChangeLog:
* configure: Regenerate.
libgm2/ChangeLog:
* configure: Regenerate.
libgomp/ChangeLog:
* configure: Regenerate.
libitm/ChangeLog:
* configure: Regenerate.
libobjc/ChangeLog:
* configure: Regenerate.
libphobos/ChangeLog:
* configure: Regenerate.
libquadmath/ChangeLog:
* configure: Regenerate.
libsanitizer/ChangeLog:
* configure: Regenerate.
libssp/ChangeLog:
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
libvtv/ChangeLog:
* configure: Regenerate.
lto-plugin/ChangeLog:
* configure: Regenerate.
zlib/ChangeLog:
* configure: Regenerate.
My previous nm patch handled all cases but one -- if the user set NM in
the environment to a path which contained an option, libtool's nm
detection tries to run nm against a copy of nm with the options in it:
e.g. if NM was set to "nm --blargle", and nm was found in /usr/bin, the
test would try to run "/usr/bin/nm --blargle /usr/bin/nm --blargle".
This is unlikely to be desirable: in this case we should run
"/usr/bin/nm --blargle /usr/bin/nm".
Furthermore, as part of this nm has to detect when the passed-in $NM
contains a path, and in that case avoid doing a path search itself.
This too was thrown off if an option contained something that looked
like a path, e.g. NM="nm -B../prev-gcc"; libtool then tries to run
"nm -B../prev-gcc nm" which rarely works well (and indeed it looks
to see whether that nm exists, finds it doesn't, and wrongly concludes
that nm -p or whatever does not work).
Fix all of these by clipping all options (defined as everything
including and after the first " -") before deciding whether nm
contains a path (but not using the clipped value for anything else),
and then removing all options from the path-modified nm before
looking to see whether that nm existed.
NM=my-nm now does a path search and runs e.g.
/usr/bin/my-nm -B /usr/bin/my-nm
NM=/usr/bin/my-nm now avoids a path search and runs e.g.
/usr/bin/my-nm -B /usr/bin/my-nm
NM="my-nm -p../wombat" now does a path search and runs e.g.
/usr/bin/my-nm -p../wombat -B /usr/bin/my-nm
NM="../prev-binutils/new-nm -B../prev-gcc" now avoids a path search:
../prev-binutils/my-nm -B../prev-gcc -B ../prev-binutils/my-nm
This seems to be all combinations, including those used by GCC bootstrap
(which, before this commit, fails to bootstrap when configured
--with-build-config=bootstrap-lto, because the lto plugin is now using
--export-symbols-regex, which requires libtool to find a working nm,
while also using -B../prev-gcc to point at the lto plugin associated
with the GCC just built.)
Regenerate all affected configure scripts.
ChangeLog:
* libtool.m4 (LT_PATH_NM): Handle user-specified NM with
options, including options containing paths.
gcc/ChangeLog:
* configure: Regenerate.
libatomic/ChangeLog:
* configure: Regenerate.
libbacktrace/ChangeLog:
* configure: Regenerate.
libcc1/ChangeLog:
* configure: Regenerate.
libffi/ChangeLog:
* configure: Regenerate.
libgfortran/ChangeLog:
* configure: Regenerate.
libgm2/ChangeLog:
* configure: Regenerate.
libgomp/ChangeLog:
* configure: Regenerate.
libitm/ChangeLog:
* configure: Regenerate.
libobjc/ChangeLog:
* configure: Regenerate.
libphobos/ChangeLog:
* configure: Regenerate.
libquadmath/ChangeLog:
* configure: Regenerate.
libsanitizer/ChangeLog:
* configure: Regenerate.
libssp/ChangeLog:
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
libvtv/ChangeLog:
* configure: Regenerate.
lto-plugin/ChangeLog:
* configure: Regenerate.
zlib/ChangeLog:
* configure: Regenerate.
Libtool needs to get BSD-format (or MS-format) output out of the system
nm, so that it can scan generated object files for symbol names for
-export-symbols-regex support. Some nms need specific flags to turn on
BSD-formatted output, so libtool checks for this in its AC_PATH_NM.
Unfortunately the code to do this has a pair of interlocking flaws:
- it runs the test by doing an nm of /dev/null. Some platforms
reasonably refuse to do an nm on a device file, but before now this
has only been worked around by assuming that the error message has a
specific textual form emitted by Tru64 nm, and that getting this
error means this is Tru64 nm and that nm -B would work to produce
BSD-format output, even though the test never actually got anything
but an error message out of nm -B. This is fixable by nm'ing *nm
itself* (since we necessarily have a path to it).
- the test is entirely skipped if NM is set in the environment, on the
grounds that the user has overridden the test: but the user cannot
reasonably be expected to know that libtool wants not only nm but
also flags forcing BSD-format output. Worse yet, one such "user" is
the top-level Cygnus configure script, which neither tests for
nor specifies any BSD-format flags. So platforms needing BSD-format
flags always fail to set them when run in a Cygnus tree, breaking
-export-symbols-regex on such platforms. Libtool also needs to
augment $LD on some platforms, but this is done unconditionally,
augmenting whatever the user specified: the nm check should do the
same.
One wrinkle: if the user has overridden $NM, a path might have been
provided: so we use the user-specified path if there was one, and
otherwise do the path search as usual. (If the nm specified doesn't
work, this might lead to a few extra pointless path searches -- but
the test is going to fail anyway, so that's not a problem.)
(Tested with NM unset, and set to nm, /usr/bin/nm, my-nm where my-nm is a
symlink to /usr/bin/nm on the PATH, and /not-on-the-path/my-nm where
*that* is a symlink to /usr/bin/nm.)
ChangeLog:
* libtool.m4 (LT_PATH_NM): Try BSDization flags with a user-provided
NM, if there is one. Run nm on itself, not on /dev/null, to avoid
errors from nms that refuse to work on non-regular files. Remove
other workarounds for this problem. Strip out blank lines from the
nm output.
fixincludes/ChangeLog:
* configure: Regenerate.
gcc/ChangeLog:
* configure: Regenerate.
libatomic/ChangeLog:
* configure: Regenerate.
libbacktrace/ChangeLog:
* configure: Regenerate.
libcc1/ChangeLog:
* configure: Regenerate.
libffi/ChangeLog:
* configure: Regenerate.
libgfortran/ChangeLog:
* configure: Regenerate.
libgm2/ChangeLog:
* configure: Regenerate.
libgomp/ChangeLog:
* configure: Regenerate.
libitm/ChangeLog:
* configure: Regenerate.
libobjc/ChangeLog:
* configure: Regenerate.
libphobos/ChangeLog:
* configure: Regenerate.
libquadmath/ChangeLog:
* configure: Regenerate.
libsanitizer/ChangeLog:
* configure: Regenerate.
libssp/ChangeLog:
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
libvtv/ChangeLog:
* configure: Regenerate.
lto-plugin/ChangeLog:
* configure: Regenerate.
zlib/ChangeLog:
* configure: Regenerate.
AR from older binutils doesn't work with --plugin and rc:
[hjl@gnu-cfl-2 bin]$ touch foo.c
[hjl@gnu-cfl-2 bin]$ ar --plugin /usr/libexec/gcc/x86_64-redhat-linux/10/liblto_plugin.so rc libfoo.a foo.c
[hjl@gnu-cfl-2 bin]$ ./ar --plugin /usr/libexec/gcc/x86_64-redhat-linux/10/liblto_plugin.so rc libfoo.a foo.c
./ar: no operation specified
[hjl@gnu-cfl-2 bin]$ ./ar --version
GNU ar (Linux/GNU Binutils) 2.29.51.0.1.20180112
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
[hjl@gnu-cfl-2 bin]$
Check if AR works with --plugin and rc before passing --plugin to AR and
RANLIB.
ChangeLog:
* configure: Regenerated.
* libtool.m4 (_LT_CMD_OLD_ARCHIVE): Check if AR works with
--plugin and rc before enabling --plugin.
config/ChangeLog:
* gcc-plugin.m4 (GCC_PLUGIN_OPTION): Check if AR works with
--plugin and rc before enabling --plugin.
gcc/ChangeLog:
* configure: Regenerate.
libatomic/ChangeLog:
* configure: Regenerate.
libbacktrace/ChangeLog:
* configure: Regenerate.
libcc1/ChangeLog:
* configure: Regenerate.
libffi/ChangeLog:
* configure: Regenerate.
libgfortran/ChangeLog:
* configure: Regenerate.
libgm2/ChangeLog:
* configure: Regenerate.
libgomp/ChangeLog:
* configure: Regenerate.
libiberty/ChangeLog:
* configure: Regenerate.
libitm/ChangeLog:
* configure: Regenerate.
libobjc/ChangeLog:
* configure: Regenerate.
libphobos/ChangeLog:
* configure: Regenerate.
libquadmath/ChangeLog:
* configure: Regenerate.
libsanitizer/ChangeLog:
* configure: Regenerate.
libssp/ChangeLog:
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
libvtv/ChangeLog:
* configure: Regenerate.
lto-plugin/ChangeLog:
* configure: Regenerate.
zlib/ChangeLog:
* configure: Regenerate.
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894
"OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect",
namely:
In that commit, the code was changed to handle shared-memory devices;
however, as pointed out, omp_target_memcpy_check already set the pointer
to NULL in that case. Hence, this commit reverts to the prior version.
In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry for cuMemcpy3DPeer
(all currently unused) and in three structs, fixes reserved-member names
and remove a bogus 'const' in three structs.
And it changes a DLSYM to DLSYM_OPT as not all plugins support the new
functions, yet.
include/ChangeLog:
* cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER):
Remove bogus 'const' from 'const void *dst' and fix reserved-name
name in those structs.
(cuMemcpyPeer, cuMemcpyPeerAsync): Add.
libgomp/ChangeLog:
* target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for
GOMP_OFFLOAD_CAP_SHARED_MEM.
(omp_target_memcpy_rect_copy): Likewise for lock condition.
(gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for
memcpy3d/memcpy2d.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and
unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
When copying a 2D or 3D rectangular memmory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.
Additionally, it permits device-to-device copies, if neccessary using a
temporary variable on the host.
include/ChangeLog:
* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
CUDA_MEMCPY3D_PEER): New typdefs.
(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
cuMemcpy3DPeerAsync): New prototypes.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New prototypes.
* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
and memcpy3d_func.
* libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used.
* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
cuMemcpy3D): Invoke via CUDA_ONE_CALL.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New.
* target.c (omp_target_memcpy_rect_worker):
(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
Permit all device-to-device copyies; invoke new plugins for
2D and 3D copying when available.
(gomp_load_plugin_for_device): DLSYM the new plugin functions.
* testsuite/libgomp.c/target-12.c: Fix dimension bug.
* testsuite/libgomp.fortran/target-12.f90: Likewise.
* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
The previous list of OpenMP routines was rather lengthy and the order seemed
to be rather random - especially for outputs which did not have @menu as then
the sectioning was not visible.
The OpenMP specification split in 5.1 the lengthy list by adding
sections to the chapter and grouping the routines under them.
This patch follow suite and uses the same sections and order. The commit also
prepares for adding not-yet-documented routines by listening those in the
@menu (@c commented - both for just undocumented and for also unimplemented
routines). See also PR 110364.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Runtime Library Routines):
Split long list by adding sections and moving routines there.
(OMP_ALLOCATORS): Fix typo.
Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2'
the code
for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
i = count.0 * 2 + 1;
While such an inner loop can be collapsed, a non-rectangular could not.
With this commit and for all constant loop steps, a simple loop such
as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before only for the
constant steps of 1 and -1.)
The constant step permits to know the direction (increasing/decreasing)
that is required for the loop condition.
The new code is only valid if one assumes no overflow of the loop variable.
However, the Fortran standard can be read that this must be ensured by
the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
"The execution of any numeric operation whose result is not defined by
the arithmetic used by the processor is prohibited."
And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
following: The number of loop iterations handled by an iteration count,
which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
in step (3), this count is not only decremented by one but also:
"... The DO variable, if any, is incremented by the value of the
incrementation parameter m3."
And for the example above, 'i' would be 'huge(i)+3' in the last
execution cycle, which exceeds the largest model number and should
render the example as invalid.
PR fortran/107424
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
constant loop steps.
(gfc_trans_omp_do): Likewise; use sign to determine
loop direction.
libgomp/ChangeLog:
* libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
commented tests.
* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
test file; tests are in non-rectangular-loop-1.f90.
* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
testcase to use a non-constant step to retain the 'sorry' test.
* testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/linear-2.f90: Update dump to remove
the additional count variable.
The 'uses_allocators' clause to the 'target' construct accepts predefined
allocators and can also be used to define a new allocator for a target region.
As predefined allocators in GCC do not require special handling, those can and
are ignored after parsing, such that this feature now works. On the other hand,
defining a new allocator will fail for now with a 'sorry, unimplemented'.
Note that both the OpenMP 5.0/5.1 and 5.2 syntax for uses_allocators
is supported by this commit.
2023-07-17 Tobias Burnus <tobias@codesoucery.com>
Chung-Lin Tang <cltang@codesourcery.com>
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_namelist, show_omp_clauses): Dump
uses_allocators clause.
* gfortran.h (gfc_free_omp_namelist): Add memspace_sym to u union
and traits_sym to u2 union.
(OMP_LIST_USES_ALLOCATORS): New enum value.
(gfc_free_omp_namelist): Add 'bool free_mem_traits_space' arg.
* match.cc (gfc_free_omp_namelist): Likewise.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list,
gfc_match_omp_to_link, gfc_match_omp_doacross_sink,
gfc_match_omp_clause_reduction, gfc_match_omp_allocate,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Likewise. Parse uses_allocators clause.
(gfc_match_omp_clause_uses_allocators): New.
(enum omp_mask2): Add new OMP_CLAUSE_USES_ALLOCATORS.
(OMP_TARGET_CLAUSES): Accept it.
(resolve_omp_clauses): Resolve uses_allocators clause
* st.cc (gfc_free_statement): Update gfc_free_omp_namelist call.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle
OMP_LIST_USES_ALLOCATORS; fail with sorry unless predefined allocator.
(gfc_split_omp_clauses): Handle uses_allocators.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/uses_allocators_1.f90: New test.
* testsuite/libgomp.fortran/uses_allocators_2.f90: New test.
Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
libgomp/
* libgomp.texi (OMP_ALLOCATOR): Document the default values for
the traits. Add crossref to 'Memory allocation'.
(Memory allocation): Refer to OMP_ALLOCATOR for the available
traits and allocators/mem spaces; document the default value
for the pool_size trait.
Follow up to r14-2462-g450b05ce54d3f0. The case that libnuma was not
available at runtime was not properly handled; now it falls back to
the normal malloc.
libgomp/
* allocator.c (omp_init_allocator): Check whether symbol from
dlopened libnuma is available before using libnuma for
allocations.
Some test cases in libgomp testsuite pass -flto as an option, but
the testcases do not require LTO target support. This patch adds
the necessary DejaGNU requirement for LTO support to the testcases..
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-class-2.C: Require LTO.
* testsuite/libgomp.c-c++-common/requires-4.c: Require LTO.
* testsuite/libgomp.c-c++-common/requires-4a.c: Require LTO.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
libgomp/
* libgomp.texi (OpenMP 5.0): Replace '... stub' by @ref to
'Memory allocation' section which contains the full status.
(TR11): Remove differently worded duplicated entry.
As with the memkind library, it is only used when found at runtime;
it does not need to be present when building GCC.
The included testcase does not check whether the memory has been placed
on the nearest node as the Linux kernel memory handling too often ignores
that hint, using a different node for the allocation. However, when
running with 'numactl --preferred=<node> ./executable', it is clearly
visible that the feature works by comparing malloc/default vs. nearest
placement (using get_mempolicy to obtain the node for a mem addr).
libgomp/ChangeLog:
* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
add GOMP_MEMKIND_LIBNUMA.
(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
(omp_init_allocator): Handle partition=nearest with libnuma if avail.
(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
needed.
* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define
* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
(Memory allocation): Renamed from 'Memory allocation with libmemkind';
updated for libnuma usage.
* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
* testsuite/libgomp.c-c++-common/alloc-12.c: New test.
libgomp/
* allocator.c (omp_init_allocator): Use malloc for
omp_high_bw_mem_space when the memkind lib is unavailable
instead of returning omp_null_allocator.
* libgomp.texi (OpenMP 5.0): Fix typo.
(Memory allocation with libmemkind): Document implementation
in more detail.
Use @var{} instead of @emph{} - for semantic texinfo formatting; the result
is similar: slanted instead of italic in PDF, still italic in HTML, albeit
in info is is now uppercase instead of '_' as pre/suffix.
The patch also documents the newer _ALL/_DEV/_DEV_<no> env var suffixes
and as it refers to the ICV vars and their scope, those were added to the
OMP_ env vars for reference. For OMP_NESTING, a note that those were
deprecated was added plus a bunch of cross references. For OMP_ALLOCATOR,
add note about the lack of per-device env vars support.
A new section, consisting mostly of cross references was added to document
the implementation-defined ICV initialization, especially as OpenMP demands
that implementations document what they do for 'implementation defined'.
For nvptx, the implementation-defined used stack size was documented
libgomp/
* libgomp.texi: Use @var for ICV vars.
(OpenMP Environment Variables): Mention _ALL/_DEV/_DEV_<no> variants,
document which ICV is set and which scope the ICV has; extend/cleanup
some @ref.
(Implementation-defined ICV Initialization): New.
(nvptx): Document the implementation-defined used per-warp stack size.
Depending on the details, the testcase can fail with different but
related messages; all of the following all could be observed for this
testcase:
libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading
libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found
libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available
Before, the last two were tested for with 'target offload_device' and
'! offload_device', respectively. Now, all three are accepted by matching
'.*' already after 'but' and without distinguishing whether the effective
target is an offload_device or not.
(For completeness, there is a fourth error that follows this pattern:
'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.)
libgomp/
* testsuite/libgomp.c/target-51.c: Accept more error msg variants
as expected dg-output.