Commit graph

1171 commits

Author SHA1 Message Date
Julian Brown
1413af02d6 OpenMP: lvalue parsing for map/to/from clauses (C++)
This patch supports "lvalue" parsing (or "locator list item type" parsing)
for several OpenMP clause types for C++, as required for OpenMP 5.0
and above.

This version has been rebased -- some things have changed around
template handling recently, e.g. removal of build_non_dependent_expr and
tsubst_copy.  A new potential corner-case issue has shown up regarding
implicit mapping of references to pointer to pointers -- an interaction
with the post-review fixes/rework for the patch here:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638602.html

Which fixed the (new) tests baseptrs-[6789].C.  I've noted that for now in
the patch, and adjusted the baseptrs-[46].C tests slightly to accommodate.

2024-01-08  Julian Brown  <julian@codesourcery.com>

gcc/c-family/
	* c-common.h (c_omp_address_inspector): Remove static from get_origin
	and maybe_unconvert_ref methods.
	* c-omp.cc (c_omp_split_clauses): Support OMP_ARRAY_SECTION.
	(c_omp_address_inspector::map_supported_p): Handle OMP_ARRAY_SECTION.
	(c_omp_address_inspector::get_origin): Avoid dereferencing possibly
	NULL type when processing template decls.
	(c_omp_address_inspector::maybe_unconvert_ref): Likewise.

gcc/cp/
	* constexpr.cc (potential_consant_expression_1): Handle
	OMP_ARRAY_SECTION.
	* cp-tree.h (grok_omp_array_section, build_omp_array_section): Add
	prototypes.
	* decl2.cc (grok_omp_array_section): New function.
	* error.cc (dump_expr): Handle OMP_ARRAY_SECTION.
	* parser.cc (cp_parser_new): Initialize parser->omp_array_section_p.
	(cp_parser_statement_expr): Disallow array sections.
	(cp_parser_postfix_open_square_expression): Support OMP_ARRAY_SECTION
	parsing.
	(cp_parser_parenthesized_expression_list, cp_parser_lambda_expression,
	cp_parser_braced_list): Disallow array sections.
	(cp_parser_omp_var_list_no_open): Remove ALLOW_DEREF parameter, add
	MAP_LVALUE in its place.  Support generalised lvalue parsing for
	OpenMP map, to and from clauses.  Use OMP_ARRAY_SECTION
	code instead of TREE_LIST to represent OpenMP array sections.
	(cp_parser_omp_var_list): Remove ALLOW_DEREF parameter, add MAP_LVALUE.
	Pass to cp_parser_omp_var_list_no_open.
	(cp_parser_oacc_data_clause): Update call to cp_parser_omp_var_list.
	(cp_parser_omp_clause_map): Add sk_omp scope around
	cp_parser_omp_var_list_no_open call.
	* parser.h (cp_parser): Add omp_array_section_p field.
	* pt.cc (tsubst, tsubst_copy, tsubst_omp_clause_decl,
	tsubst_copy_and_build): Add OMP_ARRAY_SECTION support.
	* semantics.cc (handle_omp_array_sections_1, handle_omp_array_sections,
	cp_oacc_check_attachments, finish_omp_clauses): Use OMP_ARRAY_SECTION
	instead of TREE_LIST where appropriate.  Handle more types of map
	expression.
	* typeck.cc (build_omp_array_section): New function.

gcc/
	* gimplify.cc (gimplify_expr): Ensure OMP_ARRAY_SECTION has been
	processed out before gimplification.
	* tree-pretty-print.cc (dump_generic_node): Support OMP_ARRAY_SECTION.
	* tree.def (OMP_ARRAY_SECTION): New tree code.

gcc/testsuite/
	* c-c++-common/gomp/map-6.c: Update expected output.
	* c-c++-common/gomp/target-enter-data-1.c: Update scan test.
	* g++.dg/gomp/array-section-1.C: New test.
	* g++.dg/gomp/array-section-2.C: New test.
	* g++.dg/gomp/bad-array-section-1.C: New test.
	* g++.dg/gomp/bad-array-section-2.C: New test.
	* g++.dg/gomp/bad-array-section-3.C: New test.
	* g++.dg/gomp/bad-array-section-4.C: New test.
	* g++.dg/gomp/bad-array-section-5.C: New test.
	* g++.dg/gomp/bad-array-section-6.C: New test.
	* g++.dg/gomp/bad-array-section-7.C: New test.
	* g++.dg/gomp/bad-array-section-8.C: New test.
	* g++.dg/gomp/bad-array-section-9.C: New test.
	* g++.dg/gomp/bad-array-section-10.C: New test.
	* g++.dg/gomp/bad-array-section-11.C: New test.
	* g++.dg/gomp/has_device_addr-non-lvalue-1.C: New test.
	* g++.dg/gomp/pr67522.C: Update expected output.
	* g++.dg/gomp/ind-base-3.C: New test.
	* g++.dg/gomp/map-assignment-1.C: New test.
	* g++.dg/gomp/map-inc-1.C: New test.
	* g++.dg/gomp/map-lvalue-ref-1.C: New test.
	* g++.dg/gomp/map-ptrmem-1.C: New test.
	* g++.dg/gomp/map-ptrmem-2.C: New test.
	* g++.dg/gomp/map-static-cast-lvalue-1.C: New test.
	* g++.dg/gomp/map-ternary-1.C: New test.
	* g++.dg/gomp/member-array-2.C: New test.

libgomp/
	* testsuite/libgomp.c++/baseptrs-4.C: Remove commented-out cases that
	now work.
	* testsuite/libgomp.c++/baseptrs-6.C: New test.
	* testsuite/libgomp.c++/ind-base-1.C: New test.
	* testsuite/libgomp.c++/ind-base-2.C: New test.
	* testsuite/libgomp.c++/lvalue-tofrom-1.C: New test.
	* testsuite/libgomp.c++/lvalue-tofrom-2.C: New test.
	* testsuite/libgomp.c++/map-comma-1.C: New test.
	* testsuite/libgomp.c++/map-rvalue-ref-1.C: New test.
	* testsuite/libgomp.c++/struct-ref-1.C: New test.
	* testsuite/libgomp.c-c++-common/array-field-1.c: New test.
	* testsuite/libgomp.c-c++-common/array-of-struct-1.c: New test.
	* testsuite/libgomp.c-c++-common/array-of-struct-2.c: New test.
2024-01-09 10:08:18 +00:00
Jakub Jelinek
a945c346f5 Update copyright years. 2024-01-03 12:19:35 +01:00
Julian Brown
144c531fe2 OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc
This patch has been separated out from the C++ "declare mapper"
support patch.  It contains just the gimplify.cc rearrangement
work, mostly moving gimplification from gimplify_scan_omp_clauses
to gimplify_adjust_omp_clauses for map clauses.

The motivation for doing this was that we don't know if we need to
instantiate mappers implicitly until the body of an offload region has
been scanned, i.e. in gimplify_adjust_omp_clauses, but we also need the
un-gimplified form of clauses to sort by base-pointer dependencies after
mapper instantiation has taken place.

The patch also reimplements the "present" clause sorting code to avoid
another sorting pass on mapping nodes.

This version of the patch is based on the version posted for og13, and
additionally incorporates a follow-on fix for DECL_VALUE_EXPR handling
in gimplify_adjust_omp_clauses:

"OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622223.html

Parts of:
"OpenMP: OpenMP 5.2 semantics for pointers with unmapped target"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623351.html

2023-12-16  Julian Brown  <julian@codesourcery.com>

gcc/
	* gimplify.cc (omp_segregate_mapping_groups): Handle "present" groups.
	(gimplify_scan_omp_clauses): Use mapping group functionality to
	iterate through mapping nodes.  Remove most gimplification of
	OMP_CLAUSE_MAP nodes from here, but still populate ctx->variables
	splay tree.
	(gimplify_adjust_omp_clauses): Move most gimplification of
	OMP_CLAUSE_MAP nodes here.

libgomp/
	* testsuite/libgomp.fortran/target-enter-data-6.f90: Remove XFAIL.
2023-12-21 13:12:12 +00:00
Julian Brown
d7e9ae4fa9 OpenMP, NVPTX: memcpy[23]D bias correction
This patch works around behaviour of the 2D and 3D memcpy operations in
the CUDA driver runtime.  Particularly in Fortran, the "base pointer"
of an array (used for either source or destination of a host/device copy)
may lie outside of data that is actually stored on the device.  The fix
is to make sure that we use the first element of data to be transferred
instead, and adjust parameters accordingly.

2023-10-02  Julian Brown  <julian@codesourcery.com>

libgomp/
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
	avoid out-of-bounds array checks in CUDA runtime.
	(GOMP_OFFLOAD_memcpy3d): Likewise.
	* testsuite/libgomp.c-c++-common/memcpyxd-bias-1.c: New test.
2023-12-20 21:35:36 +00:00
Jakub Jelinek
000155e8ee libgomp: Make libgomp.c/declare-variant-1.c test x86 specific
As written earlier, this test was written with the x86 specifics in mind
and adding dg-final directives for it for other arches makes it unreadable.
If a declare variant call can be resolved in gimple already as in the
aarch64 or gcn cases, it can be done in gcc.dg/gomp/ and I believe we have
tests like that already, the point of the test is that it is not known
during gimplification time which exact call should be chosen as it depends
on which declare simd clone it will be in.

2023-12-18  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/declare-variant-1.c: Restrict the test to x86,
	drop because of that unneeded target selector from other directives
	and remove the aarch64 specific ones.
2023-12-18 11:57:39 +01:00
Andre Vieira
863df360fb Fix tests for gomp
This is to fix testisms initially introduced by:
commit f5fc001a84
Author: Andre Vieira <andre.simoesdiasvieira@arm.com>
Date:   Mon Dec 11 14:24:41 2023 +0000

    aarch64: enable mixed-types for aarch64 simdclones

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/pr87887-1.c: Fixed test.
	* gcc.dg/gomp/pr89246-1.c: Likewise.
	* gcc.dg/gomp/simd-clones-2.c: Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-1.c: Fixed test.
	* testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.
2023-12-15 13:48:08 +00:00
Thomas Schwinge
bc7546e32c In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target offload_device_nonshared_as'
..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.

Minor fix-up for commit f5745dc142
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".

	libgomp/
	* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
	'dg-output's to 'target offload_device_nonshared_as'.
2023-12-15 13:58:53 +01:00
Julian Brown
f5745dc142 OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalants for OpenACC), which are not able
to be sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time.  Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.

This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).

2023-08-18  Julian Brown  <julian@codesourcery.com>

gcc/
	* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
	(omp_get_attachment, omp_group_last, omp_group_base,
	omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
	(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
	Support GOMP_MAP_STRUCT_UNORD.
	(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
	gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
	GOMP_MAP_STRUCT_UNORD support.
	* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
	* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
	* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
	* oacc-mem.c (find_group_last, goacc_enter_data_internal,
	goacc_exit_data_internal, GOACC_enter_exit_data): Add
	GOMP_MAP_STRUCT_UNORD support.
	* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
	Detect incorrect use of variable indexing of arrays of structs.
	(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
	GOMP_MAP_STRUCT_UNORD support.
	* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
	* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
	* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
	* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
2023-12-15 10:33:52 +00:00
Julian Brown
7362543f00 OpenMP: Pointers and member mappings
This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate for changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".

We change the mapping nodes used for a derived-type mapping like this:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

So that the nodes used look like this:

  1) map(to: tvar%arrptr)   -->
  GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
  GOMP_MAP_TO_PSET        tvar%arrptr        (the descriptor)
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8)   -->
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
  GOMP_MAP_TO_PSET        tvar%arrptr
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data      (bias 3, etc.)

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer
-- so we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT   tvar     (len: 1)
  GOMP_MAP_TO_PSET  tvar%arr (size: 64, etc.)  <--. moved here
  [...]                                           |
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3) ___|
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i) and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT         dtvar  (len: 2)
  GOMP_MAP_TO_PSET        dtvar(i)%arrptr
  GOMP_MAP_TO_PSET        dtvar(j)%arrptr
  [...]
  GOMP_MAP_TOFROM         *dtvar(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH  dtvar(j)%arrptr%data
  GOMP_MAP_TO [implicit]  *dtvar(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH  dtvar(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

  #pragma omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO            dv

  GOMP_MAP_TO            *dv%arr%data
  GOMP_MAP_TO_PSET       dv%arr <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH dv%arr%data

To accommodate for recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.  A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.

This version of the patch fixes "omp target exit data" handling
for GOMP_MAP_DELETE, and adds pretty-printing dump output
for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra
clarity).

Also I noticed the handling of descriptors on *OpenACC*
exit-data directives was inconsistent, so I've made those use
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as
OpenMP too.  In the end it doesn't actually matter to the runtime,
which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for
array descriptors on OpenACC "exit data" directives the same, anyway,
and doing it this way in the FE avoids needless divergence.

I've added a couple of new tests (gomp/target-enter-exit-data.f90 and
goacc/enter-exit-data-2.f90).

2023-12-07  Julian Brown  <julian@codesourcery.com>

gcc/fortran/
	* dependency.cc (gfc_omp_expr_prefix_same): New function.
	* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
	* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
	union.
	* trans-openmp.cc (dependency.h): Include.
	(gfc_trans_omp_array_section): Adjust mapping node arrangement for
	array descriptors.  Use GOMP_MAP_TO_PSET or
	GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR
	flag set.
	(gfc_symbol_rooted_namelist): New function.
	(gfc_trans_omp_clauses): Check subcomponent and subarray/element
	accesses elsewhere in the clause list for pointers to derived types or
	array descriptors, and adjust or drop mapping nodes appropriately.
	Adjust for changes to mapping node arrangement.
	(gfc_trans_oacc_executable_directive): Pass code op through.

gcc/
	* gimplify.cc (omp_map_clause_descriptor_p): New function.
	(build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use
	above function.
	(omp_tsort_mapping_groups): Process nodes that have
	OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't.  Add
	enter_exit_data parameter.
	(omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if
	we're mapping the whole containing derived-type variable.
	(omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
	Remove GOMP_MAP_ALWAYS_POINTER handling.
	(gimplify_scan_omp_clauses): Pass enter_exit argument to
	omp_tsort_mapping_groups.  Don't adjust/remove GOMP_MAP_TO_PSET
	mappings for derived-type components here.
	* tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro.
	* tree-pretty-print.cc (dump_omp_clause): Show
	OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with
	GOMP_MAP_TO_PSET-like syntax).

gcc/testsuite/
	* gfortran.dg/goacc/enter-exit-data-2.f90: New test.
	* gfortran.dg/goacc/finalize-1.f: Adjust scan output.
	* gfortran.dg/gomp/map-9.f90: Adjust scan output.
	* gfortran.dg/gomp/map-subarray-2.f90: New test.
	* gfortran.dg/gomp/map-subarray.f90: New test.
	* gfortran.dg/gomp/target-enter-exit-data.f90: New test.

libgomp/
	* testsuite/libgomp.fortran/map-subarray.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-4.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-6.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-7.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-8.f90: New test.
	* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
	* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
	descriptor-mapping changes.  Remove XFAIL.
2023-12-13 20:30:49 +00:00
Julian Brown
5fdb150cd4 OpenMP/OpenACC: Rework clause expansion and nested struct handling
This patch reworks clause expansion in the C, C++ and (to a lesser
extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in
GPU offloading support.

At present a single clause may be turned into several mapping nodes,
or have its mapping type changed, in several places scattered through
the front- and middle-end.  The analysis relating to which particular
transformations are needed for some given expression has become quite hard
to follow.  Briefly, we manipulate clause types in the following places:

 1. During parsing, in c_omp_adjust_map_clauses.  Depending on a set of
    rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into
    ATTACH_DETACH, or mark the decl addressable.

 2. In semantics.cc or c-typeck.cc, clauses are expanded in
    handle_omp_array_sections (called via {c_}finish_omp_clauses, or in
    finish_omp_clauses itself.  The two cases are for processing array
    sections (the former), or non-array sections (the latter).

 3. In gimplify.cc, we build sibling lists for struct accesses, which
    groups and sorts accesses along with their struct base, creating
    new ALLOC/RELEASE nodes for pointers.

 4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be
    adjusted or created.

This patch doesn't completely disrupt this scheme, though clause
types are no longer adjusted in c_omp_adjust_map_clauses (step 1).
Clause expansion in step 2 (for C and C++) now uses a single, unified
mechanism, parts of which are also reused for analysis in step 3.

Rather than the kind-of "ad-hoc" pattern matching on addresses used to
expand clauses used at present, a new method for analysing addresses is
introduced.  This does a recursive-descent tree walk on expression nodes,
and emits a vector of tokens describing each "part" of the address.
This tokenized address can then be translated directly into mapping nodes,
with the assurance that no part of the expression has been inadvertently
skipped or misinterpreted.  In this way, all the variations of ways
pointers, arrays, references and component accesses might be combined
can be teased apart into easily-understood cases - and we know we've
"parsed" the whole address before we start analysis, so the right code
paths can easily be selected.

For example, a simple access "arr[idx]" might parse as:

  base-decl access-indexed-array

or "mystruct->foo[x]" with a pointer "foo" component might parse as:

  base-decl access-pointer component-selector access-pointer

A key observation is that support for "array" bases, e.g. accesses
whose root nodes are not structures, but describe scalars or arrays,
and also *one-level deep* structure accesses, have first-class support
in gimplify and beyond.  Expressions that use deeper struct accesses
or e.g. multiple indirections were more problematic: some cases worked,
but lots of cases didn't.  This patch reimplements the support for those
in gimplify.cc, again using the new "address tokenization" support.

An expression like "mystruct->foo->bar[0:10]" used in a mapping node will
translate the right-hand access directly in the front-end.  The base for
the access will be "mystruct->foo".  This is handled recursively in
gimplify.cc -- there may be several accesses of "mystruct"'s members
on the same directive, so the sibling-list building machinery can be
used again.  (This was already being done for OpenACC, but the new
implementation differs somewhat in details, and is more robust.)

For OpenMP, in the case where the base pointer itself,
i.e. "mystruct->foo" here, is NOT mapped on the same directive, we
create a "fragile" mapping.  This turns the "foo" component access
into a zero-length allocation (which is a new feature for the runtime,
so support has been added there too).

A couple of changes have been made to how mapping clauses are turned
into mapping nodes:

The first change is based on the observation that it is probably never
correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for
references), because if the containing struct is already mapped on the
target then the host version of the pointer in question will be corrupted
if the struct is copied back from the target.  This patch removes all
such uses, across each of C, C++ and Fortran.

The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes
are processed during sibling-list creation.  For OpenMP, for pointer
components, we must map the base pointer separately from an array section
that uses the base pointer, so e.g. we must have both "map(mystruct.base)"
and "map(mystruct.base[0:10])" mappings.  These create nodes such as:

  GOMP_MAP_TOFROM mystruct.base
  G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base

Instead of using the first of these directly when building the struct
sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH,
leading to:

  GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base

we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that
drops the GOMP_MAP_TOFROM for the base pointer, marks the second group
as having had a base-pointer mapping, then omp_build_struct_sibling_lists
can create:

  GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize]

This ends up working better in many cases, particularly those involving
references.  (The "alloc" space is immediately overwritten by a pointer
attachment, so this is mildly more efficient than a redundant TO mapping
at runtime also.)

There is support in the address tokenizer for "arbitrary" base expressions
which aren't rooted at a decl, but that is not used as present because
such addresses are disallowed at parse time.

In the front-ends, the address tokenization machinery is mostly only
used for clause expansion and not for diagnostics at present.  It could
be used for those too, which would allow more of my previous "address
inspector" implementation to be removed.

The new bits in gimplify.cc work with OpenACC also.

This version of the patch addresses several first-pass review comments
from Tobias, and fixes a few previously-missed cases for manually-managed
ragged array mappings (including cases using references).  Some arbitrary
differences between handling of clause expansion for C vs. C++ have also
been fixed, and some fragments from later in the patch series have been
moved forward (where they were useful for fixing bugs).  Several new
test cases have been added.

2023-11-29  Julian Brown  <julian@codesourcery.com>

gcc/c-family/
	* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
	C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET.
	(omp_addr_token): Add forward declaration.
	(c_omp_address_inspector): New class.
	* c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but
	do not change any mapping node types.
	(c_omp_address_inspector::unconverted_ref_origin,
	c_omp_address_inspector::component_access_p,
	c_omp_address_inspector::check_clause,
	c_omp_address_inspector::get_root_term,
	c_omp_address_inspector::map_supported_p,
	c_omp_address_inspector::get_origin,
	c_omp_address_inspector::maybe_unconvert_ref,
	c_omp_address_inspector::maybe_zero_length_array_section,
	c_omp_address_inspector::expand_array_base,
	c_omp_address_inspector::expand_component_selector,
	c_omp_address_inspector::expand_map_clause): New methods.
	(omp_expand_access_chain): New function.

gcc/c/
	* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use
	to select region type for c_finish_omp_clauses call.
	(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
	(c_parser_oacc_compute): Likewise.
	(c_parser_omp_target_data, c_parser_omp_target_enter_data): Support
	ATTACH kind.
	(c_parser_omp_target_exit_data): Support DETACH kind.
	(check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here.
	* c-typeck.cc (handle_omp_array_sections_1,
	handle_omp_array_sections, c_finish_omp_clauses): Use
	c_omp_address_inspector class and OMP address tokenizer to analyze and
	expand map clause expressions.  Fix some diagnostics.  Fix "is OpenACC"
	condition for C_ORT_ACC_TARGET addition.

gcc/cp/
	* parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use
	to select region type for finish_omp_clauses call.
	(cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support
	GOMP_MAP_ATTACH kind.
	(cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind.
	(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
	(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
	(cp_parser_oacc_compute): Likewise.
	* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
	tsubst_omp_clauses for OpenACC compute regions.
	* semantics.cc (cp_omp_address_inspector): New class, derived from
	c_omp_address_inspector.
	(handle_omp_array_sections_1, handle_omp_array_sections,
	finish_omp_clauses): Use cp_omp_address_inspector class and OMP address
	tokenizer to analyze and expand OpenMP map clause expressions.  Fix
	some diagnostics.  Support C_ORT_ACC_TARGET.
	(finish_omp_target): Handle GOMP_MAP_POINTER.

gcc/fortran/
	* trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter.
	Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for
	derived type components.
	(gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section.

gcc/
	* gimplify.cc (build_struct_comp_nodes): Don't process
	GOMP_MAP_ATTACH_DETACH "middle" nodes here.
	(omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for
	nested struct handling.
	(omp_strip_components_and_deref, omp_strip_indirections): Remove
	functions.
	(omp_get_attachment): Handle GOMP_MAP_DETACH here.
	(omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH,
	GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer
	component array sections.
	(omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile
	fields.
	(omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT.
	(omp_index_mapping_groups_1): Skip reprocess_struct groups.
	(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly,
	omp_resolve_clause_dependencies, omp_first_chained_access_token): New
	functions.
	(omp_check_mapping_compatibility): Adjust accepted node combinations
	for "from" clauses using release instead of alloc.
	(omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P,
	REPROCESSING_STRUCT, ADDED_TAIL parameters.  Use OMP address tokenizer
	to analyze addresses.  Reimplement nested struct handling, and
	implement "fragile groups".
	(omp_build_struct_sibling_lists): Adjust for changes to
	omp_accumulate_sibling_list.  Recalculate bias for ATTACH_DETACH nodes
	after GOMP_MAP_STRUCT nodes.
	(gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies.  Use
	OMP address tokenizer.
	(gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc
	instead of build_simple_mem_ref_loc.
	* omp-general.cc (omp-general.h, tree-pretty-print.h): Include.
	(omp_addr_tokenizer): New namespace.
	(omp_addr_tokenizer::omp_addr_token): New.
	(omp_addr_tokenizer::omp_parse_component_selector,
	omp_addr_tokenizer::omp_parse_ref,
	omp_addr_tokenizer::omp_parse_pointer,
	omp_addr_tokenizer::omp_parse_access_method,
	omp_addr_tokenizer::omp_parse_access_methods,
	omp_addr_tokenizer::omp_parse_structure_base,
	omp_addr_tokenizer::omp_parse_structured_expr,
	omp_addr_tokenizer::omp_parse_array_expr,
	omp_addr_tokenizer::omp_access_chain_p,
	omp_addr_tokenizer::omp_accessed_addr): New functions.
	(omp_parse_expr, debug_omp_tokenized_addr): New functions.
	* omp-general.h (omp_addr_tokenizer::access_method_kinds,
	omp_addr_tokenizer::structure_base_kinds,
	omp_addr_tokenizer::token_type,
	omp_addr_tokenizer::omp_addr_token,
	omp_addr_tokenizer::omp_access_chain_p,
	omp_addr_tokenizer::omp_accessed_addr): New.
	(omp_addr_token, omp_parse_expr): New.
	* omp-low.cc (scan_sharing_clauses): Skip error check for references
	to pointers.
	* tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro.

gcc/testsuite/
	* c-c++-common/gomp/clauses-2.c: Fix error output.
	* c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output.
	* c-c++-common/gomp/target-50.c: Adjust scan output.
	* c-c++-common/gomp/target-enter-data-1.c: Adjust scan output.
	* g++.dg/gomp/static-component-1.C: New test.
	* gcc.dg/gomp/target-3.c: Adjust scan output.
	* gfortran.dg/gomp/map-9.f90: Adjust scan output.

libgomp/
	* target.c (gomp_map_pointer): Modify zero-length array section
	pointer handling.
	(gomp_attach_pointer): Likewise.
	(gomp_map_fields_existing): Use gomp_map_0len_lookup.
	(gomp_attach_pointer): Allow attaching null pointers (or Fortran
	"unassociated" pointers).
	(gomp_map_vars_internal): Handle zero-sized struct members.  Add
	diagnostic for unmapped struct pointer members.
	* testsuite/libgomp.c-c++-common/baseptrs-1.c: New test.
	* testsuite/libgomp.c-c++-common/baseptrs-2.c: New test.
	* testsuite/libgomp.c-c++-common/baseptrs-6.c: New test.
	* testsuite/libgomp.c-c++-common/baseptrs-7.c: New test.
	* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
	* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
	"free".
	* testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test.
	* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
	* testsuite/libgomp.c++/class-array-1.C: New test.
	* testsuite/libgomp.c++/baseptrs-3.C: New test.
	* testsuite/libgomp.c++/baseptrs-4.C: New test.
	* testsuite/libgomp.c++/baseptrs-5.C: New test.
	* testsuite/libgomp.c++/baseptrs-8.C: New test.
	* testsuite/libgomp.c++/baseptrs-9.C: New test.
	* testsuite/libgomp.c++/ref-mapping-1.C: New test.
	* testsuite/libgomp.c++/target-48.C: New test.
	* testsuite/libgomp.c++/target-49.C: New test.
	* testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test.
	* testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2
	semantics.
	* testsuite/libgomp.c++/target-this-3.C: Likewise.
	* testsuite/libgomp.c++/target-this-4.C: Likewise.
	* testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL.
	* testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
2023-12-13 20:30:49 +00:00
Andrew Stubbs
348874f0ba libgomp: basic pinned memory on Linux
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	(MEMSPACE_VALIDATE): Add PIN.
	(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
	(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	(omp_free): Likewise.
	* config/linux/allocator.c: New file.
	* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	(MEMSPACE_VALIDATE): Add PIN.
	* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	* libgomp.texi: Switch pinned trait to supported.
	(MEMSPACE_VALIDATE): Add PIN.
	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
	* testsuite/libgomp.c/alloc-pinned-2.c: New test.
	* testsuite/libgomp.c/alloc-pinned-3.c: New test.
	* testsuite/libgomp.c/alloc-pinned-4.c: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-12-13 14:27:07 +00:00
Lipeng Zhu
b806c88fab libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

	* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
	(__gthrw): New function.
	(__gthread_rwlock_rdlock): New function.
	(__gthread_rwlock_tryrdlock): New function.
	(__gthread_rwlock_wrlock): New function.
	(__gthread_rwlock_trywrlock): New function.
	(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

	* io/async.c (DEBUG_LINE): New macro.
	* io/async.h (RWLOCK_DEBUG_ADD): New macro.
	(CHECK_RDLOCK): New macro.
	(CHECK_WRLOCK): New macro.
	(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
	(IN_RWLOCK_DEBUG_QUEUE): New macro.
	(RDLOCK): New macro.
	(WRLOCK): New macro.
	(RWUNLOCK): New macro.
	(RD_TO_WRLOCK): New macro.
	(INTERN_RDLOCK): New macro.
	(INTERN_WRLOCK): New macro.
	(INTERN_RWUNLOCK): New macro.
	* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
	a comment.
	(unit_lock): Remove including associated internal_proto.
	(unit_rwlock): New declarations including associated internal_proto.
	(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
	instead of __gthread_mutex_lock and __gthread_mutex_unlock on
	unit_lock.
	* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
	unit_rwlock instead of LOCK and UNLOCK on unit_lock.
	(st_write_done_worker): Likewise.
	* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
	comment. Use unit_rwlock variable instead of unit_lock variable.
	(get_gfc_unit_from_unit_root): New function.
	(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
	instead of LOCK and UNLOCK on unit_lock.
	(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
	LOCK and UNLOCK on unit_lock.
	(close_units): Likewise.
	(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
	unit_lock.
	* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
	instead of LOCK and UNLOCK on unit_lock.
	(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
	of LOCK and UNLOCK on unit_lock.
2023-12-11 09:43:59 -08:00
Andre Vieira
f5fc001a84 aarch64: enable mixed-types for aarch64 simdclones
This patch enables the use of mixed-types for simd clones for AArch64, adds
aarch64 as a target_vect_simd_clones and corrects the way the simdlen is chosen
for non-specified simdlen clauses according to the 'Vector Function Application
Binary Interface Specification for AArch64'.

Additionally this patch also restricts combinations of simdlen and
return/argument types that map to vectors larger than 128 bits as we currently
do not have a way to represent these types in a way that is consistent
internally and externally.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (lane_size): New function.
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Determine simdlen according to NDS rule
	and reject combination of simdlen and types that lead to vectors larger than 128bits.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
	* c-c++-common/gomp/declare-variant-14.c: Adapt test for aarch64.
	* c-c++-common/gomp/pr60823-1.c: Likewise.
	* c-c++-common/gomp/pr60823-2.c: Likewise.
	* c-c++-common/gomp/pr60823-3.c: Likewise.
	* g++.dg/gomp/attrs-10.C: Likewise.
	* g++.dg/gomp/declare-simd-1.C: Likewise.
	* g++.dg/gomp/declare-simd-3.C: Likewise.
	* g++.dg/gomp/declare-simd-4.C: Likewise.
	* g++.dg/gomp/declare-simd-7.C: Likewise.
	* g++.dg/gomp/declare-simd-8.C: Likewise.
	* g++.dg/gomp/pr88182.C: Likewise.
	* gcc.dg/declare-simd.c: Likewise.
	* gcc.dg/gomp/declare-simd-1.c: Likewise.
	* gcc.dg/gomp/declare-simd-3.c: Likewise.
	* gcc.dg/gomp/pr87887-1.c: Likewise.
	* gcc.dg/gomp/pr87895-1.c: Likewise.
	* gcc.dg/gomp/pr89246-1.c: Likewise.
	* gcc.dg/gomp/pr99542.c: Likewise.
	* gcc.dg/gomp/simd-clones-2.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-1.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-6.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-7.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
	* gfortran.dg/gomp/declare-simd-2.f90: Likewise.
	* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
	* gfortran.dg/gomp/pr79154-1.f90: Likewise.
	* gfortran.dg/gomp/pr83977.f90: Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-1.c: Adapt test for aarch64.
	* testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.
2023-12-11 14:51:14 +00:00
Tobias Burnus
d4b6d14792 OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables
This commit adds -fopenmp-allocators which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown,
unless the -fopenmp-allocators flag is present.

With -fopenmp -fopenmp-allocators, those constructs get turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures deallocation and reallocation (via intrinsic assignments) are
properly directed to GOMP_free/omp_realloc - while normal Fortran
allocations are processed by free/realloc.

In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
version field of the Fortran array discriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
true it was previously added but also removes it from the list.

Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of
omp-builtins.def and libgomp gains the mentioned two new function.

gcc/ChangeLog:

	* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
	* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
	* builtins.cc (builtin_fnspec): Handle it.
	* gimple-ssa-warn-access.cc (fndecl_alloc_p,
	matching_alloc_calls_p): Likewise.
	* gimple.cc (nonfreeing_call_p): Likewise.
	* predict.cc (expr_expected_value_1): Likewise.
	* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
	* tree.cc (fndecl_dealloc_argno): Likewise.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
	and EXEC_OMP_ALLOCATORS.
	* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
	Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
	(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
	* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
	* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
	(-fopenmp-simd): New, based on C version.
	(-fopenmp-allocators): New.
	* lang.opt (fopenmp-allocators): Add.
	* openmp.cc (resolve_omp_clauses): For allocators/allocate directive,
	add target and no dynamic_allocators diagnostic and more invalid
	diagnostic.
	* parse.cc (decode_omp_directive): Set contains_teams_construct.
	* trans-array.h (gfc_array_allocate): Update prototype.
	(gfc_conv_descriptor_version): New prototype.
	* trans-decl.cc (gfc_init_default_dt): Fix comment.
	* trans-array.cc (gfc_conv_descriptor_version): New.
	(gfc_array_allocate): Support GOMP_alloc allocation.
	(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
	Handle GOMP_free/omp_realloc as needed.
	* trans-expr.cc (gfc_conv_procedure_call): Likewise.
	(alloc_scalar_allocatable_for_assignment): Likewise.
	* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
	* trans-openmp.cc (gfc_trans_omp_allocators,
	gfc_trans_omp_directive): Handle allocators/allocate directive.
	(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
	* trans-stmt.h (gfc_trans_allocate): Update prototype.
	* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
	* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
	* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
	Update to handle GOMP_alloc.
	(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
	Handle GOMP_free.
	(trans_code): Update call.
	* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
	Update prototype.
	(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
	* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.

libgomp/ChangeLog:

	* allocator.c (struct fort_alloc_splay_tree_key_s,
	fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
	* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
	* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
	GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
	(GOMP_5.1.1): ... here.
	* libgomp.texi (Impl. Status, Memory management): Update for
	allocators/allocate directives.
	* splay-tree.c: Handle splay_tree_static define to declare all
	functions as static.
	(splay_tree_lookup_node): New.
	* splay-tree.h: Handle splay_tree_decl_only define.
	(splay_tree_lookup_node): New prototype.
	* target.c: Define splay_tree_static for 'reverse'.
	* testsuite/libgomp.fortran/allocators-1.f90: New test.
	* testsuite/libgomp.fortran/allocators-2.f90: New test.
	* testsuite/libgomp.fortran/allocators-3.f90: New test.
	* testsuite/libgomp.fortran/allocators-4.f90: New test.
	* testsuite/libgomp.fortran/allocators-5.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-14.f90: Add coarray and
	not-listed tests.
	* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
	* gfortran.dg/bind_c_array_params_2.f90: Update expected
	dump for dtype '.version=0'.
	* gfortran.dg/gomp/allocate-16.f90: New test.
	* gfortran.dg/gomp/allocators-3.f90: New test.
	* gfortran.dg/gomp/allocators-4.f90: New test.
2023-12-08 15:18:25 +01:00
Andrew Stubbs
e7d6c277fa amdgcn, libgomp: low-latency allocator
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).

Since addresses can now refer to LDS space, the "Global" address space is
no-longer compatible.  This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories).  A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
	* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
	addressing.
	(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
	(TEAM_ARENA_FREE): Likewise.
	(TEAM_ARENA_END): Likewise.
	(GCN_LOWLAT_HEAP): New.
	* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
	(__gcn_lowlat_init): New prototype.
	(gomp_gcn_enter_kernel): Initialize the low-latency heap.
	* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
	(TEAM_ARENA_FREE): Likewise.
	(TEAM_ARENA_END): Likewise.
	* plugin/plugin-gcn.c (lowlat_size): New variable.
	(print_kernel_dispatch): Label the group_segment_size purpose.
	(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
	(create_kernel_dispatch): Pass low-latency head allocation to kernel.
	(run_kernel): Use shadow; don't assume values.
	* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
	* config/gcn/allocator.c: New file.
	* libgomp.texi: Document low-latency implementation details.
2023-12-06 16:48:57 +00:00
Andrew Stubbs
e9a19ead49 openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".  This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_VALIDATE): New macro.
	(omp_init_allocator): Use MEMSPACE_VALIDATE.
	(omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
	(MEMSPACE_VALIDATE): New macro.
	(OMP_LOW_LAT_MEM_ALLOC_INVALID): New define.
	* libgomp.texi: Document low-latency implementation details.
	* testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
	* testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat.
	* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
	* testsuite/libgomp.c/omp_alloc-traits.c: New test.
2023-12-06 16:48:57 +00:00
Andrew Stubbs
30486fab71 libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that low-latency allocator
will not work with the PTX 3.1 multilib.

For now, the omp_low_lat_mem_alloc allocator also works, but that will change
when I implement the access traits.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): New macro.
	(MEMSPACE_CALLOC): New macro.
	(MEMSPACE_REALLOC): New macro.
	(MEMSPACE_FREE): New macro.
	(predefined_alloc_mapping): New array.  Add _Static_assert to match.
	(ARRAY_SIZE): New macro.
	(omp_aligned_alloc): Use MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.  Simplify existing
	fall-backs.
	(omp_free): Use MEMSPACE_FREE.
	(omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for
	predefined allocators.  Simplify existing fall-backs.
	(omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE.
	Implement fall-backs for predefined allocators.  Simplify existing
	fall-backs.
	* config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable.
	(__nvptx_lowlat_init): New prototype.
	(gomp_nvptx_main): Call __nvptx_lowlat_init.
	* libgomp.texi: Update memory space table.
	* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
	(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
	(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
	* basic-allocator.c: New file.
	* config/nvptx/allocator.c: New file.
	* testsuite/libgomp.c/omp_alloc-1.c: New test.
	* testsuite/libgomp.c/omp_alloc-2.c: New test.
	* testsuite/libgomp.c/omp_alloc-3.c: New test.
	* testsuite/libgomp.c/omp_alloc-4.c: New test.
	* testsuite/libgomp.c/omp_alloc-5.c: New test.
	* testsuite/libgomp.c/omp_alloc-6.c: New test.

Co-authored-by: Kwok Cheung Yeung  <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-12-06 16:48:57 +00:00
Thomas Schwinge
aae57a9e19 Fix 'libgomp.c/declare-variant-4-*.c', add 'libgomp.c/declare-variant-4.c'
These test cases being 'dg-skip-if [...] { ! amdgcn-*-* }' meant they were
only ever considered for 'amdgcn-*-*' target -- that is, GCN *target*, not
GCN *offload target*, what is intended to be tested here, but for which they
thus always were UNSUPPORTED.  Use the same style of 'dg-[...]' directives
as for the nvptx offloading test cases, 'libgomp.c/declare-variant-3*.c'.

Fix-up for commit 1fd508744e
"amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors".

	libgomp/
	* testsuite/libgomp.c/declare-variant-4-fiji.c: Adjust.
	* testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4.h: Likewise.
	* testsuite/libgomp.c/declare-variant-4.c: New.
2023-11-30 15:42:57 +01:00
Thomas Schwinge
95e6e32a85 Spin 'dg-do run' part of 'libgomp.c/declare-variant-3-sm30.c' off into new 'libgomp.c/declare-variant-3.c'
Having nvptx offloading configured doesn't imply being able to run nvptx
offloading test cases on the test host.

Also, make 'libgomp.c/declare-variant-3.c' work for all non-offloading and
offloading cases.

Fix-up for commit 59b8ade887
"[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c".

	libgomp/
	* testsuite/libgomp.c/declare-variant-3-sm30.c: Turn 'dg-do run'
	into 'dg-do link'.
	* testsuite/libgomp.c/declare-variant-3.c: New.
	* testsuite/libgomp.c/declare-variant-3.h: Extend.
2023-11-30 15:42:57 +01:00
Thomas Schwinge
186e22c5de In 'libgomp.c/declare-variant-{3,4}-*.c', restrict 'scan-offload-tree-dump's to 'only_for_offload_target [...]'
... to care for the case where not just one but both of GCN and nvptx
offloading are enabled.  In that case, we currently get:

    UNRESOLVED: libgomp.c/declare-variant-3-sm30.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= f30 \\(\\);"

... in addition to:

    PASS: libgomp.c/declare-variant-3-sm30.c scan-nvptx-none-offload-tree-dump optimized "= f30 \\(\\);"

Etc.

Fix-up for commit 59b8ade887
"[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c",
and commit 1fd508744e
"amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors".

	libgomp/
	* testsuite/libgomp.c/declare-variant-3-sm30.c: Restrict
	'scan-offload-tree-dump' to 'only_for_offload_target nvptx-none'.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-fiji.c: Restrict
	'scan-offload-tree-dump' to
	'only_for_offload_target amdgcn-amdhsa'.
	* testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise.
	* testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise.
2023-11-30 15:42:57 +01:00
Thomas Schwinge
3f5a3b7539 Fix 'libgomp.c/declare-variant-3-*.c' compilation for configurations where GCN offloading is enabled in addition to nvptx
The GCN offloading compiler doesn't like '-misa=sm_30' etc.; restrict to
'-foffload=nvptx-none' compilation only.

Fix-up for commit 59b8ade887
"[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c".

	libgomp/
	* testsuite/libgomp.c/declare-variant-3-sm30.c:
	'dg-additional-options -foffload=nvptx-none'.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.
2023-11-30 15:42:57 +01:00
Thomas Schwinge
4c909c6ee3 In 'libgomp.c/target-simd-clone-{1,2,3}.c', restrict 'scan-offload-ipa-dump's to 'only_for_offload_target amdgcn-amdhsa'
This gets rid of UNRESOLVEDs if nvptx offloading compilation is enabled in
addition to GCN:

     PASS: libgomp.c/target-simd-clone-1.c (test for excess errors)
     PASS: libgomp.c/target-simd-clone-1.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "Generated local clone _ZGV.*N.*_addit"
    -UNRESOLVED: libgomp.c/target-simd-clone-1.c scan-nvptx-none-offload-ipa-dump simdclone "Generated local clone _ZGV.*N.*_addit"
     PASS: libgomp.c/target-simd-clone-1.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "Generated local clone _ZGV.*M.*_addit"
    -UNRESOLVED: libgomp.c/target-simd-clone-1.c scan-nvptx-none-offload-ipa-dump simdclone "Generated local clone _ZGV.*M.*_addit"
     PASS: libgomp.c/target-simd-clone-2.c (test for excess errors)
     PASS: libgomp.c/target-simd-clone-2.c scan-amdgcn-amdhsa-offload-ipa-dump-not simdclone "Generated .* clone"
    -UNRESOLVED: libgomp.c/target-simd-clone-2.c scan-nvptx-none-offload-ipa-dump-not simdclone "Generated .* clone"
     PASS: libgomp.c/target-simd-clone-3.c (test for excess errors)
     PASS: libgomp.c/target-simd-clone-3.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "device doesn't match"
    -UNRESOLVED: libgomp.c/target-simd-clone-3.c scan-nvptx-none-offload-ipa-dump simdclone "device doesn't match"
     PASS: libgomp.c/target-simd-clone-3.c scan-amdgcn-amdhsa-offload-ipa-dump-not simdclone "Generated .* clone"
    -UNRESOLVED: libgomp.c/target-simd-clone-3.c scan-nvptx-none-offload-ipa-dump-not simdclone "Generated .* clone"

Minor fix-up for commit 309e2d95e3
'OpenMP: Generate SIMD clones for functions with "declare target"'.

	libgomp/
	* testsuite/libgomp.c/target-simd-clone-1.c: Restrict
	'scan-offload-ipa-dump's to
	'only_for_offload_target amdgcn-amdhsa'.
	* testsuite/libgomp.c/target-simd-clone-2.c: Likewise.
	* testsuite/libgomp.c/target-simd-clone-3.c: Likewise.
2023-11-29 15:10:01 +01:00
Thomas Schwinge
a53da3a213 Adjust 'libgomp.c/declare-variant-{3,4}-[...]' for inter-procedural value range propagation
..., that is, commit 53ba8d6695
"inter-procedural value range propagation", after which we see:

    [-PASS:-]{+FAIL:+} libgomp.c/declare-variant-3-sm30.c scan-nvptx-none-offload-tree-dump optimized "= f30 \\(\\);"

Etc.  That's due to:

    @@ -144,13 +144,11 @@
     __attribute__((omp target entrypoint, noclone))
     void main._omp_fn.0 (const struct .omp_data_t.3 & restrict .omp_data_i)
     {
    -  int _3;
       int * _5;

       <bb 2> [local count: 1073741824]:
    -  _3 = f30 ();
       _5 = *.omp_data_i_4(D).v;
    -  *_5 = _3;
    +  *_5 = 30;
       return;

It's nice to see this optimization work here, too, but it does interfere with
how we're currently testing OpenMP 'declare variant'.

	libgomp/
	* testsuite/libgomp.c/declare-variant-3.h (f30, f35, f53, f70)
	(f75, f80, f): Add '__attribute__ ((noipa))'.
	* testsuite/libgomp.c/declare-variant-4.h (gfx803, gfx900, gfx906)
	(gfx908, gfx90a, f): Likewise.
2023-11-22 17:54:59 +01:00
Kwok Cheung Yeung
a49c7d3193 openmp: Add support for the 'indirect' clause in C/C++
This adds support for the 'indirect' clause in the 'declare target'
directive.  Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.

Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.

2023-11-07  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c-family/
	* c-attribs.cc (c_common_attribute_table): Add attribute for
	indirect functions.
	* c-pragma.h (enum parma_omp_clause): Add entry for indirect clause.

gcc/c/
	* c-decl.cc (c_decl_attributes): Add attribute for indirect
	functions.
	* c-lang.h (c_omp_declare_target_attr): Add indirect field.
	* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
	(c_parser_omp_clause_indirect): New.
	(c_parser_omp_all_clauses): Handle indirect clause.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(c_parser_omp_declare_target): Handle indirect clause.  Emit error
	message if device_type or indirect clauses used alone.  Emit error
	if indirect clause used with device_type that is not 'any'.
	(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(c_parser_omp_begin): Handle indirect clause.
	* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.

gcc/cp/
	* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
	* decl2.cc (cplus_decl_attributes): Add attribute for indirect
	functions.
	* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
	(cp_parser_omp_clause_indirect): New.
	(cp_parser_omp_all_clauses): Handle indirect clause.
	(handle_omp_declare_target_clause): Add extra parameter.  Add
	indirect attribute for indirect functions.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(cp_parser_omp_declare_target): Handle indirect clause.  Emit error
	message if device_type or indirect clauses used alone.  Emit error
	if indirect clause used with device_type that is not 'any'.
	(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
	(cp_parser_omp_begin): Handle indirect clause.
	* semantics.cc (finish_omp_clauses): Handle indirect clause.

gcc/
	* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
	functions.
	(output_offload_tables): Write indirect functions.
	(input_offload_tables): read indirect functions.
	* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
	* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
	* omp-offload.cc (offload_ind_funcs): New.
	(omp_discover_implicit_declare_target): Add functions marked with
	'omp declare target indirect' to indirect functions list.
	(omp_finish_file): Add indirect functions to section for offload
	indirect functions.
	(execute_omp_device_lower): Redirect indirect calls on target by
	passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
	(pass_omp_device_lower::gate): Run pass_omp_device_lower if
	indirect functions are present on an accelerator device.
	* omp-offload.h (offload_ind_funcs): New.
	* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
	* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
	* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
	section.  Count number of indirect functions.
	(process_obj): Emit number of indirect functions.
	* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
	(process): Emit offload_ind_func_table in PTX code.  Emit indirect
	function names and count in image.
	* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
	indirect functions in PTX code with IND_FUNC_MAP.

gcc/testsuite/
	* c-c++-common/gomp/declare-target-7.c: Update expected error message.
	* c-c++-common/gomp/declare-target-indirect-1.c: New.
	* c-c++-common/gomp/declare-target-indirect-2.c: New.
	* g++.dg/gomp/attrs-21.C (v12): Update expected error message.
	* g++.dg/gomp/declare-target-indirect-1.C: New.
	* gcc.dg/gomp/attrs-21.c (v12): Update expected error message.

include/
	* gomp-constants.h (GOMP_VERSION): Increment to 3.
	(GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New.

libgcc/
	* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
	(__offload_ind_func_table): New.
	(__offload_ind_funcs_end): New.
	(__OFFLOAD_TABLE__): Add entries for indirect functions.

libgomp/
	* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
	* Makefile.in: Regenerate.
	* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
	(GOMP_OFFLOAD_load_image): Add extra argument.
	* libgomp.h (struct indirect_splay_tree_key_s): New.
	(indirect_splay_tree_node, indirect_splay_tree,
	indirect_splay_tree_key): New.
	(indirect_splay_compare): New.
	* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
	* libgomp.texi (OpenMP 5.1): Update documentation on indirect
	calls in target region and on indirect clause.
	(Other new OpenMP 5.2 features): Add entry for virtual function calls.
	* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
	* oacc-host.c (host_load_image): Add extra argument.
	* target.c (gomp_load_image_to_device): If the GOMP_VERSION is high
	enough, read host indirect functions table and pass to
	load_image_func.
	* config/accel/target-indirect.c: New.
	* config/linux/target-indirect.c: New.
	* config/gcn/team.c (build_indirect_map): Add prototype.
	(gomp_gcn_enter_kernel): Initialize support for indirect
	function calls on GCN target.
	* config/nvptx/team.c (build_indirect_map): Add prototype.
	(gomp_nvptx_main): Initialize support for indirect function
	calls on NVPTX target.
	* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
	indirect functions count.
	(GOMP_OFFLOAD_load_image): Add extra argument.  If the GOMP_VERSION
	is high enough, build address translation table and copy it to target
	memory.
	* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
	functions count.
	(GOMP_OFFLOAD_load_image): Add extra argument.  If the GOMP_VERSION
	is high enough, Build address translation table and copy it to target
	memory.
	* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
	* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
	* testsuite/libgomp.c++/declare-target-indirect-1.C: New.
2023-11-07 15:44:50 +00:00
Thomas Schwinge
3e888f9462 Add OpenACC 'acc_map_data' variant to 'libgomp.oacc-c-c++-common/deep-copy-8.c'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Add OpenACC
	'acc_map_data' variant.
2023-10-31 14:54:41 +01:00
Thomas Schwinge
7b2ae64b68 Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' decomposition
... to fix up recent commit 3a3596389c
"OpenACC 2.7: Implement self clause for compute constructs" for that case.

	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Handle 'OMP_CLAUSE_SELF' like 'OMP_CLAUSE_IF'.
	* omp-expand.cc (expand_omp_target): Handle 'OMP_CLAUSE_SELF' for
	'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
	gcc/testsuite/
	* c-c++-common/goacc/self-clause-2.c: Verify
	'--param=openacc-kernels=decompose'.
	* gfortran.dg/goacc/kernels-tree.f95: Adjust.
	libgomp/
	* oacc-parallel.c (GOACC_data_start): Handle
	'GOACC_FLAG_LOCAL_DEVICE'.
	(GOACC_parallel_keyed): Simplify accordingly.
	* testsuite/libgomp.oacc-fortran/self-1.f90: Adjust.
2023-10-25 11:30:36 +02:00
Thomas Schwinge
047841a68e Extend test suite coverage for OpenACC 'self' clause for compute constructs
... on top of what was provided in recent
commit 3a3596389c
"OpenACC 2.7: Implement self clause for compute constructs".

	gcc/testsuite/
	* c-c++-common/goacc/if-clause-2.c: Enhance.
	* c-c++-common/goacc/self-clause-1.c: Likewise.
	* c-c++-common/goacc/self-clause-2.c: Likewise.
	* gfortran.dg/goacc/if.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/self.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance.
	* testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New.
	* testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.
2023-10-25 11:24:29 +02:00
Chung-Lin Tang
3a3596389c OpenACC 2.7: Implement self clause for compute constructs
This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host mult-core CPU) as the executing device of the compute region.

The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is still not yet completed, so the libgomp
side is still implemented the exact same as host-fallback mode. (so as of now,
it essentially behaves like the 'if' clause with the condition inverted)

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
	(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise,
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_compute_clause_self): New function.
	(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise,
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
	* semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
	with OMP_CLAUSE_IF case.

gcc/fortran/ChangeLog:

	* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
	(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
	(OACC_KERNELS_CLAUSES): Likewise.
	(OACC_SERIAL_CLAUSES): Likewise.
	(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
	* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
	clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
	(gfc_split_omp_clauses): Add handling of self_expr field copy.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code,
	* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
	case.
	(convert_local_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/self-clause-1.c: New test.
	* c-c++-common/goacc/self-clause-2.c: New test.
	* gfortran.dg/goacc/self.f95: New test.

include/ChangeLog:

	* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.

libgomp/ChangeLog:

	* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
	GOACC_FLAG_LOCAL_DEVICE case.
	* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
2023-10-25 10:49:55 +02:00
Tobias Burnus
fd6b17a489 libgomp.fortran/allocate-6.f90: Run with -fdump-tree-gimple
libgomp/
	* testsuite/libgomp.fortran/allocate-6.f90: Add missing
	dg-additional-options "-fdump-tree-gimple"; fix scan.
2023-10-14 20:09:34 +02:00
Tobias Burnus
969f5c3eaa Fortran: Support OpenMP's 'allocate' directive for stack vars
gcc/fortran/ChangeLog:

	* gfortran.h (ext_attr_t): Add omp_allocate flag.
	* match.cc (gfc_free_omp_namelist): Void deleting same
	u2.allocator multiple times now that a sequence can use
	the same one.
	* openmp.cc (gfc_match_omp_clauses, gfc_match_omp_allocate): Use
	same allocator expr multiple times.
	(is_predefined_allocator): Make static.
	(gfc_resolve_omp_allocate): Update/extend restriction checks;
	remove sorry message.
	(resolve_omp_clauses): Reject corarrays in allocate/allocators
	directive.
	* parse.cc (check_omp_allocate_stmt): Permit procedure pointers
	here (rejected later) for less misleading diagnostic.
	* trans-array.cc (gfc_trans_auto_array_allocation): Propagate
	size for GOMP_alloc and location to which it should be added to.
	* trans-decl.cc (gfc_trans_deferred_vars): Handle 'omp allocate'
	for stack variables; sorry for static variables/common blocks.
	* trans-openmp.cc (gfc_trans_omp_clauses): Evaluate 'allocate'
	clause's allocator only once; fix adding expressions to the
	block.
	(gfc_trans_omp_single): Pass a block to gfc_trans_omp_clauses.

gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Handle Fortran's
	'omp allocate' for stack variables.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Impl. Status): Mention that Fortran now
	supports the allocate directive for stack variables.
	* testsuite/libgomp.fortran/allocate-5.f90: New test.
	* testsuite/libgomp.fortran/allocate-6.f90: New test.
	* testsuite/libgomp.fortran/allocate-7.f90: New test.
	* testsuite/libgomp.fortran/allocate-8.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-14.c: Fix directive name.
	* c-c++-common/gomp/allocate-15.c: Likewise.
	* c-c++-common/gomp/allocate-9.c: Fix comment typo.
	* gfortran.dg/gomp/allocate-4.f90: Remove sorry dg-error.
	* gfortran.dg/gomp/allocate-7.f90: Likewise.
	* gfortran.dg/gomp/allocate-10.f90: New test.
	* gfortran.dg/gomp/allocate-11.f90: New test.
	* gfortran.dg/gomp/allocate-12.f90: New test.
	* gfortran.dg/gomp/allocate-13.f90: New test.
	* gfortran.dg/gomp/allocate-14.f90: New test.
	* gfortran.dg/gomp/allocate-15.f90: New test.
	* gfortran.dg/gomp/allocate-8.f90: New test.
	* gfortran.dg/gomp/allocate-9.f90: New test.
2023-10-14 11:07:47 +02:00
Tobias Burnus
6a8edd50a1 Fortran/OpenMP: Fix handling of strictly structured blocks
For strictly structured blocks, a BLOCK was created but the code
was placed after the block the outer structured block. Additionally,
labelled blocks were mishandled. As the code is now properly in a
BLOCK, it solves additional issues.

gcc/fortran/ChangeLog:

	* parse.cc (parse_omp_structured_block): Make the user code end
	up inside of BLOCK construct for strictly structured blocks;
	fix fallout for 'section' and 'teams'.
	* openmp.cc (resolve_omp_target): Fix changed BLOCK handling
	for teams in target checking.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/block_17.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-5.f90: New test.
2023-10-08 11:54:07 +02:00
Tobias Burnus
1a554a2c9f OpenMP: Add ME support for 'omp allocate' stack variables
Call GOMP_alloc/free for 'omp allocate' allocated variables. This is
for C only as C++ and Fortran show a sorry already in the FE. Note that
this only applies to stack variables as the C FE shows a sorry for
static variables.

gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Call GOMP_alloc/free for
	'omp allocate' variables; move stack cleanup after other
	cleanup.
	(omp_notice_variable): Process original decl when decl
	of the value-expression for a 'omp allocate' variable is passed.
	* omp-low.cc (scan_omp_1_op): Handle 'omp allocate' variables

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.1 Impl.): Mark 'omp allocate' as
	implemented for C only.
	* testsuite/libgomp.c/allocate-4.c: New test.
	* testsuite/libgomp.c/allocate-5.c: New test.
	* testsuite/libgomp.c/allocate-6.c: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-11.c: Remove C-only dg-message
	for 'sorry, unimplemented'.
	* c-c++-common/gomp/allocate-12.c: Likewise.
	* c-c++-common/gomp/allocate-15.c: Likewise.
	* c-c++-common/gomp/allocate-9.c: Likewise.
	* c-c++-common/gomp/allocate-10.c: New test.
	* c-c++-common/gomp/allocate-17.c: New test.
2023-09-20 16:03:19 +02:00
Thomas Schwinge
fb5d27be27 libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]
This is commit c8e759b421 (Subversion r279708)
"libgomp/test: Fix compilation for build sysroot" and follow-up
commit 749bd22ddc
"libgomp/test: Remove a build sysroot fix regression" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX', 'FC'.

	PR testsuite/91884
	PR testsuite/109951
	libgomp/
	* configure.ac: Revert earlier changes, instead
	'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libgomp.exp (libgomp_init): Remove
	"Fix up '-funconfigured-libstdc++-v3' in 'GXX_UNDER_TEST'" code.
	If '--with-build-sysroot=[...]' was specified, use it for
	build-tree testing.
	* testsuite/libgomp-site-extra.exp.in (GCC_UNDER_TEST)
	(GXX_UNDER_TEST, GFORTRAN_UNDER_TEST): Don't set.
	(SYSROOT_CFLAGS_FOR_TARGET): Set.
	* testsuite/libgomp.c++/c++.exp (lang_source_re)
	(lang_include_flags): Set for build-tree testing.
	* testsuite/libgomp.oacc-c++/c++.exp (lang_source_re)
	(lang_include_flags): Likewise.

Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
2023-09-12 11:30:37 +02:00
Tobias Burnus
fe0f9e0941 Add 'libgomp.c-c++-common/pr100059-1.c'
For nvptx offloading, it'll FAIL its execution test until nvptx-tools updated
to include commit 1b5946d78ef5dcfb640e9f545a7c791b7f623911
"Merge commit '26095fd01232061de9f79decb3e8222ef7b46191' into HEAD [#29]",
<1b5946d78e>.

	libgomp/
	* testsuite/libgomp.c-c++-common/pr100059-1.c: New.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2023-09-04 14:49:50 +02:00
Sandra Loosemore
b7c4a12a9d OpenMP: Fortran support for imperfectly-nested loops
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.

gcc/fortran/ChangeLog
	* gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
	* openmp.cc: Include omp-api.h.
	(resolve_omp_clauses): Consolidate inscan reduction clause conflict
	checking here.
	(find_nested_loop_in_chain): New.
	(find_nested_loop_in_block): New.
	(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
	Handle imperfectly-nested loops when looking for nested omp scan.
	Refactor to move inscan reduction clause conflict checking to
	resolve_omp_clauses.
	(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
	(struct icode_error_state): New.
	(icode_code_error_callback): New.
	(icode_expr_error_callback): New.
	(diagnose_intervening_code_errors_1): New.
	(diagnose_intervening_code_errors): New.
	(make_structured_block): New.
	(restructure_intervening_code): New.
	(is_outer_iteration_variable): Do not assume loops are perfectly
	nested.
	(check_nested_loop_in_chain): New.
	(check_nested_loop_in_block_state): New.
	(check_nested_loop_in_block_symbol): New.
	(check_nested_loop_in_block): New.
	(expr_uses_intervening_var): New.
	(is_intervening_var): New.
	(expr_is_invariant): Do not assume loops are perfectly nested.
	(resolve_omp_do): Handle imperfectly-nested loops.
	* trans-stmt.cc (gfc_trans_block_construct): Generate
	OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.

gcc/testsuite/ChangeLog
	* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
	* gfortran.dg/gomp/collapse2.f90: Likewise.
	* gfortran.dg/gomp/imperfect-gotos.f90: New.
	* gfortran.dg/gomp/imperfect-invalid-scope.f90: New.
	* gfortran.dg/gomp/imperfect1.f90: New.
	* gfortran.dg/gomp/imperfect2.f90: New.
	* gfortran.dg/gomp/imperfect3.f90: New.
	* gfortran.dg/gomp/imperfect4.f90: New.
	* gfortran.dg/gomp/imperfect5.f90: New.

libgomp/ChangeLog
	* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
	* testsuite/libgomp.fortran/imperfect1.f90: New.
	* testsuite/libgomp.fortran/imperfect2.f90: New.
	* testsuite/libgomp.fortran/imperfect3.f90: New.
	* testsuite/libgomp.fortran/imperfect4.f90: New.
	* testsuite/libgomp.fortran/target-imperfect1.f90: New.
	* testsuite/libgomp.fortran/target-imperfect2.f90: New.
	* testsuite/libgomp.fortran/target-imperfect3.f90: New.
	* testsuite/libgomp.fortran/target-imperfect4.f90: New.
2023-08-25 19:42:51 +00:00
Sandra Loosemore
410df0843d OpenMP: New C/C++ testcases for imperfectly nested loops.
gcc/testsuite/ChangeLog
	* c-c++-common/gomp/imperfect-attributes.c: New.
	* c-c++-common/gomp/imperfect-badloops.c: New.
	* c-c++-common/gomp/imperfect-blocks.c: New.
	* c-c++-common/gomp/imperfect-extension.c: New.
	* c-c++-common/gomp/imperfect-gotos.c: New.
	* c-c++-common/gomp/imperfect-invalid-scope.c: New.
	* c-c++-common/gomp/imperfect-labels.c: New.
	* c-c++-common/gomp/imperfect-legacy-syntax.c: New.
	* c-c++-common/gomp/imperfect-pragmas.c: New.
	* c-c++-common/gomp/imperfect1.c: New.
	* c-c++-common/gomp/imperfect2.c: New.
	* c-c++-common/gomp/imperfect3.c: New.
	* c-c++-common/gomp/imperfect4.c: New.
	* c-c++-common/gomp/imperfect5.c: New.

libgomp/ChangeLog
	* testsuite/libgomp.c-c++-common/imperfect1.c: New.
	* testsuite/libgomp.c-c++-common/imperfect2.c: New.
	* testsuite/libgomp.c-c++-common/imperfect3.c: New.
	* testsuite/libgomp.c-c++-common/imperfect4.c: New.
	* testsuite/libgomp.c-c++-common/imperfect5.c: New.
	* testsuite/libgomp.c-c++-common/imperfect6.c: New.
	* testsuite/libgomp.c-c++-common/target-imperfect1.c: New.
	* testsuite/libgomp.c-c++-common/target-imperfect2.c: New.
	* testsuite/libgomp.c-c++-common/target-imperfect3.c: New.
	* testsuite/libgomp.c-c++-common/target-imperfect4.c: New.
2023-08-25 19:42:50 +00:00
Sandra Loosemore
53891f18f3 OpenMP: C++ support for imperfectly-nested loops
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements.  Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.

gcc/cp/ChangeLog
	* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
	* parser.cc (struct omp_for_parse_data): New.
	(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
	in intervening code.
	(check_omp_intervening_code): New.
	(cp_parser_statement_seq_opt): Special-case nested loops, blocks,
	and other constructs for OpenMP loops.
	(cp_parser_iteration_statement): Reject loops in intervening code.
	(cp_parser_omp_for_loop_init): Expand comments and tweak the
	interface slightly to better distinguish input/output parameters.
	(cp_convert_omp_range_for): Likewise.
	(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
	and largely rewritten.  Add more comments.
	(insert_structured_blocks): New.
	(find_structured_blocks): New.
	(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
	New.
	(fixup_blocks_walker): New.
	(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
	of a loop.  Add logic to reshuffle the bits of code collected
	during parsing so intervening code gets moved to the loop body.
	(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
	is now redundant.
	(cp_parser_omp_simd): Likewise.
	(cp_parser_omp_for): Likewise.
	(cp_parser_omp_distribute): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_taskloop): Likewise.
	(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
	* parser.h (struct cp_parser): Add omp_for_parse_state field.
	* pt.cc (tsubst_omp_for_iterator): Adjust call to
	cp_convert_omp_range_for.
	* semantics.cc (finish_omp_for): Try harder to preserve location
	of loop variable init expression for use in diagnostics.
	(struct fofb_data, finish_omp_for_block_walker): New.
	(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
	nested inside BIND instead of directly in BIND itself.

gcc/testsuite/ChangeLog
	* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
	* g++.dg/gomp/attrs-imperfect1.C: New test.
	* g++.dg/gomp/attrs-imperfect2.C: New test.
	* g++.dg/gomp/attrs-imperfect3.C: New test.
	* g++.dg/gomp/attrs-imperfect4.C: New test.
	* g++.dg/gomp/attrs-imperfect5.C: New test.
	* g++.dg/gomp/pr41967.C: Adjust expected error patterns.
	* g++.dg/gomp/tpl-imperfect-gotos.C: New test.
	* g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test.

libgomp/ChangeLog
	* testsuite/libgomp.c++/attrs-imperfect1.C: New test.
	* testsuite/libgomp.c++/attrs-imperfect2.C: New test.
	* testsuite/libgomp.c++/attrs-imperfect3.C: New test.
	* testsuite/libgomp.c++/attrs-imperfect4.C: New test.
	* testsuite/libgomp.c++/attrs-imperfect5.C: New test.
	* testsuite/libgomp.c++/attrs-imperfect6.C: New test.
	* testsuite/libgomp.c++/imperfect-class-1.C: New test.
	* testsuite/libgomp.c++/imperfect-class-2.C: New test.
	* testsuite/libgomp.c++/imperfect-class-3.C: New test.
	* testsuite/libgomp.c++/imperfect-destructor.C: New test.
	* testsuite/libgomp.c++/imperfect-template-1.C: New test.
	* testsuite/libgomp.c++/imperfect-template-2.C: New test.
	* testsuite/libgomp.c++/imperfect-template-3.C: New test.
2023-08-25 19:42:50 +00:00
Francois-Xavier Coudert
0ccfbe6431 libgomp, testsuite: Do not call nonstandard functions
The following functions are not standard, and not always available
(e.g., on darwin). They should not be called unless available: gamma,
gammaf, scalb, scalbf, significand, and significandf.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp: Add effective target.
	* testsuite/libgomp.c/simd-math-1.c: Avoid calling nonstandard
	functions.
2023-08-23 00:49:08 +02:00
Tobias Burnus
1dc65003b6 omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]
Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in
expand_omp_for_* were inserted with
  gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);
except the one dealing with the multiplicative factor that was
  gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);

That commit for PR103208 fixed the issue of some missing regimplify of
operands of GIMPLE_CONDs by moving the condition handling to the new function
expand_omp_build_cond. While that function has an 'bool after = false'
argument to switch between the two variants.

However, all callers ommited this argument. This commit reinstates the
prior behavior by passing 'true' for the factor != 0 condition, fixing
the included testcase.

	PR middle-end/111017
gcc/
	* omp-expand.cc (expand_omp_for_init_vars): Pass after=true
	to expand_omp_build_cond for 'factor != 0' condition, resulting
	in pre-r12-5295-g47de0b56ee455e code for the gimple insert.

libgomp/
	* testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.
2023-08-19 07:49:06 +02:00
Tobias Burnus
25072a477a OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
When copying a 2D or 3D rectangular memmory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.

Additionally, it permits device-to-device copies, if neccessary using a
temporary variable on the host.

include/ChangeLog:

	* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
	CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
	(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
	CUDA_MEMCPY3D_PEER): New typdefs.
	(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
	cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
	cuMemcpy3DPeerAsync): New prototypes.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New prototypes.
	* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
	and memcpy3d_func.
	* libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used.
	* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
	* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
	cuMemcpy3D): Invoke via CUDA_ONE_CALL.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New.
	* target.c (omp_target_memcpy_rect_worker):
	(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
	Permit all device-to-device copyies; invoke new plugins for
	2D and 3D copying when available.
	(gomp_load_plugin_for_device): DLSYM the new plugin functions.
	* testsuite/libgomp.c/target-12.c: Fix dimension bug.
	* testsuite/libgomp.fortran/target-12.f90: Likewise.
	* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
2023-07-26 16:22:35 +02:00
Tobias Burnus
85da0b4053 OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]
Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2'
the code
  for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
    i = count.0 * 2 + 1;

While such an inner loop can be collapsed, a non-rectangular could not.
With this commit and for all constant loop steps, a simple loop such
as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before only for the
constant steps of 1 and -1.)

The constant step permits to know the direction (increasing/decreasing)
that is required for the loop condition.

The new code is only valid if one assumes no overflow of the loop variable.
However, the Fortran standard can be read that this must be ensured by
the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
"The execution of any numeric operation whose result is not defined by
the arithmetic used by the processor is prohibited."

And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
following: The number of loop iterations handled by an iteration count,
which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
in step (3), this count is not only decremented by one but also:
  "... The DO variable, if any, is incremented by the value of the
  incrementation parameter m3."
And for the example above, 'i' would be 'huge(i)+3' in the last
execution cycle, which exceeds the largest model number and should
render the example as invalid.

	PR fortran/107424

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
	constant loop steps.
	(gfc_trans_omp_do): Likewise; use sign to determine
	loop direction.

libgomp/ChangeLog:

	* libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
	* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
	commented tests.
	* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
	test file; tests are in non-rectangular-loop-1.f90.
	* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
	testcase to use a non-constant step to retain the 'sorry' test.
	* testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/linear-2.f90: Update dump to remove
	the additional count variable.
2023-07-19 10:18:49 +02:00
Tobias Burnus
89d0f082b3 OpenMP/Fortran: Parsing support for 'uses_allocators'
The 'uses_allocators' clause to the 'target' construct accepts predefined
allocators and can also be used to define a new allocator for a target region.
As predefined allocators in GCC do not require special handling, those can and
are ignored after parsing, such that this feature now works. On the other hand,
defining a new allocator will fail for now with a 'sorry, unimplemented'.

Note that both the OpenMP 5.0/5.1 and 5.2 syntax for uses_allocators
is supported by this commit.

2023-07-17  Tobias Burnus  <tobias@codesoucery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist, show_omp_clauses): Dump
	uses_allocators clause.
	* gfortran.h (gfc_free_omp_namelist): Add memspace_sym to u union
	and traits_sym to u2 union.
	(OMP_LIST_USES_ALLOCATORS): New enum value.
	(gfc_free_omp_namelist): Add 'bool free_mem_traits_space' arg.
	* match.cc (gfc_free_omp_namelist): Likewise.
	* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list,
	gfc_match_omp_to_link, gfc_match_omp_doacross_sink,
	gfc_match_omp_clause_reduction, gfc_match_omp_allocate,
	gfc_match_omp_flush): Update call.
	(gfc_match_omp_clauses): Likewise. Parse uses_allocators clause.
	(gfc_match_omp_clause_uses_allocators): New.
	(enum omp_mask2): Add new OMP_CLAUSE_USES_ALLOCATORS.
	(OMP_TARGET_CLAUSES): Accept it.
	(resolve_omp_clauses): Resolve uses_allocators clause
	* st.cc (gfc_free_statement): Update gfc_free_omp_namelist call.
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle
	OMP_LIST_USES_ALLOCATORS; fail with sorry unless predefined allocator.
	(gfc_split_omp_clauses): Handle uses_allocators.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/uses_allocators_1.f90: New test.
	* testsuite/libgomp.fortran/uses_allocators_2.f90: New test.

Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
2023-07-17 15:13:44 +02:00
David Edelsohn
6f0b0cdb8a testsuite: dg-require LTO for libgomp LTO tests
Some test cases in libgomp testsuite pass -flto as an option, but
the testcases do not require LTO target support.  This patch adds
the necessary DejaGNU requirement for LTO support to the testcases..

libgomp/ChangeLog:
	* testsuite/libgomp.c++/target-map-class-2.C: Require LTO.
	* testsuite/libgomp.c-c++-common/requires-4.c: Require LTO.
	* testsuite/libgomp.c-c++-common/requires-4a.c: Require LTO.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
2023-07-13 08:54:20 -04:00
Tobias Burnus
450b05ce54 libgomp: Use libnuma for OpenMP's partition=nearest allocation trait
As with the memkind library, it is only used when found at runtime;
it does not need to be present when building GCC.

The included testcase does not check whether the memory has been placed
on the nearest node as the Linux kernel memory handling too often ignores
that hint, using a different node for the allocation.  However, when
running with 'numactl --preferred=<node> ./executable', it is clearly
visible that the feature works by comparing malloc/default vs. nearest
placement (using get_mempolicy to obtain the node for a mem addr).

libgomp/ChangeLog:

	* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
	(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
	add GOMP_MEMKIND_LIBNUMA.
	(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
	(omp_init_allocator): Handle partition=nearest with libnuma if avail.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
	numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
	needed.
	* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define
	* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
	(Memory allocation): Renamed from 'Memory allocation with libmemkind';
	updated for libnuma usage.
	* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-12.c: New test.
2023-07-12 13:50:21 +02:00
Thomas Schwinge
de2d3b69ee Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'
ERROR: libgomp.c/target-51.c: unknown dg option: \} for "}"

Fix-up for recent commit 01fe115ba7
"libgomp.c/target-51.c: Accept more error-msg variants in dg-output".

	libgomp/
	* testsuite/libgomp.c/target-51.c: Fix DejaGnu directive syntax
	error.
2023-06-19 12:22:29 +02:00
Tobias Burnus
01fe115ba7 libgomp.c/target-51.c: Accept more error-msg variants in dg-output
Depending on the details, the testcase can fail with different but
related messages; all of the following all could be observed for this
testcase:

  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading
  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found
  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available

Before, the last two were tested for with 'target offload_device' and
'! offload_device', respectively. Now, all three are accepted by matching
'.*' already after 'but' and without distinguishing whether the effective
target is an offload_device or not.

(For completeness, there is a fourth error that follows this pattern:
'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.)

libgomp/

	* testsuite/libgomp.c/target-51.c: Accept more error msg variants
	as expected dg-output.
2023-06-19 09:57:34 +02:00
Tobias Burnus
b25ea7ab78 OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping [PR110270]
For C/C++ pointers, default implicit mapping firstprivatizes the pointer
but if the memory it points to is mapped, the it is updated to point to
the device memory (by attaching a zero sized array section of the pointed-to
storage).

However, if the pointed-to storage wasn't mapped, the pointer was set to
NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the
pointer retains the on-host address in that case (OpenMP 5.2 semantic).

The new semantic avoids an explicit map/firstprivate/is_device_ptr in the
following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.),
explicitly device allocated memory (e.g. omp_target_alloc), and with
(unified) shared memory.
(Note: With (U)SM, mappings still must be tracked, at least when
omp_target_associate_ptr does not fail when passing in two destinct pointers.)

libgomp/

	PR middle-end/110270
	* target.c (gomp_map_vars_internal): Copy host value instead of NULL
	for  GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped.
	* libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'.
	* testsuite/libgomp.c/target-19.c: Update expected value.
	* testsuite/libgomp.c++/target-18.C: Likewise.
	* testsuite/libgomp.c++/target-19.C: Likewise.
	* testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test.
	* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
	* testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.
2023-06-19 09:08:51 +02:00
Tobias Burnus
8216ca8503 libgomp: Fix OMP_TARGET_OFFLOAD=mandatory
It turned out that gomp_init_targets_once() was not run when directly
calling 'omp target' or 'omp target (enter/exit) data' causing an
abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no
device is available. It was called a tiny bit later but few lines too
late for updating the default-device-var.

libgomp/ChangeLog:

	* target.c (resolve_device): Call gomp_get_num_devices early to ensure
	gomp_init_targets_once was called before using default-device-var.
	* testsuite/libgomp.c/target-55.c: New test.
	* testsuite/libgomp.c/target-55a.c: New test.
2023-06-16 17:48:09 +02:00
Tobias Burnus
73a0d3bf89 libgomp: Extend OMP_ALLOCATOR, add affinity env var doc
Support OpenMP 5.1's syntax for OMP_ALLOCATOR as well,
which permits besides predefined allocators also
predefined memspaces optionally followed by traits.

Additionally, this commit adds the previously lacking
documentation for OMP_ALLOCATOR, OMP_AFFINITY_FORMAT
and OMP_DISPLAY_AFFINITY.

libgomp/ChangeLog:

	* env.c (gomp_def_allocator_envvar): New var.
	(parse_allocator): Handle OpenMP 5.1 syntax.
	(cleanup_env): New.
	(omp_display_env): Output gomp_def_allocator_envvar
	for an allocator with traits.
	* libgomp.texi (OMP_ALLOCATOR, OMP_AFFINITY_FORMAT,
	OMP_DISPLAY_AFFINITY): New.
	* testsuite/libgomp.c/allocator-1.c: New test.
	* testsuite/libgomp.c/allocator-2.c: New test.
	* testsuite/libgomp.c/allocator-3.c: New test.
	* testsuite/libgomp.c/allocator-4.c: New test.
	* testsuite/libgomp.c/allocator-5.c: New test.
	* testsuite/libgomp.c/allocator-6.c: New test.
2023-06-15 12:55:58 +02:00
Thomas Schwinge
f2ef1dabbc Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others
On 2023-06-14T11:42:22+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> On 14.06.23 10:09, Thomas Schwinge wrote:
>> Let me know if I should also adjust the new 'target { ! offload_device }'
>> diagnostic "[...] MANDATORY but only the host device is available" to
>> include a comma before 'but', for consistency with the other existing
>> diagnostics (cited above)?
>
> I think it makes sense to be consistent. Thus: Yes, please add the commas.

Fix-up for recent commit 18c8b56c7d
"OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory".

	libgomp/
	* target.c (resolve_device): Align a
	'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others.
	* testsuite/libgomp.c/target-51.c: Adjust.
2023-06-14 12:56:35 +02:00