2016-01-19 Martin Jambor <mjambor@suse.cz>
Martin Liska <mliska@suse.cz>
Michael Matz <matz@suse.de>
libgomp/
* plugin/Makefrag.am: Add HSA plugin requirements.
* plugin/configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_target_task): New fields firstprivate_copies and
args.
(bool gomp_create_target_task): Updated.
(gomp_device_descr): Extra parameter of run_func and async_run_func,
new field can_run_func.
* libgomp_g.h (GOMP_target_ext): Update prototype.
* oacc-host.c (host_run): Added a new parameter args.
* target.c (calculate_firstprivate_requirements): New function.
(copy_firstprivate_data): Likewise.
(gomp_target_fallback_firstprivate): Use them.
(gomp_target_unshare_firstprivate): New function.
(gomp_get_target_fn_addr): Allow returning NULL for shared memory
devices.
(GOMP_target): Do host fallback for all shared memory devices. Do not
pass any args to plugins.
(GOMP_target_ext): Introduce device-specific argument parameter args.
Allow host fallback if device shares memory. Do not remap data if
device has shared memory.
(gomp_target_task_fn): Likewise. Also treat shared memory devices
like host fallback for mappings.
(GOMP_target_data): Treat shared memory devices like host fallback.
(GOMP_target_data_ext): Likewise.
(GOMP_target_update): Likewise.
(GOMP_target_update_ext): Likewise. Also pass NULL as args to
gomp_create_target_task.
(GOMP_target_enter_exit_data): Likewise.
(omp_target_alloc): Treat shared memory devices like host fallback.
(omp_target_free): Likewise.
(omp_target_is_present): Likewise.
(omp_target_memcpy): Likewise.
(omp_target_memcpy_rect): Likewise.
(omp_target_associate_ptr): Likewise.
(gomp_load_plugin_for_device): Also load can_run.
* task.c (GOMP_PLUGIN_target_task_completion): Free
firstprivate_copies.
(gomp_create_target_task): Accept new argument args and store it to
ttask.
* plugin/plugin-hsa.c: New file.
gcc/
* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* common.opt (disable_hsa): New variable.
(-Whsa): New warning.
* config.in (ENABLE_HSA): New.
* configure.ac: Treat hsa differently from other accelerators.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* doc/install.texi (Configuration): Document --with-hsa-runtime,
--with-hsa-runtime-include, --with-hsa-runtime-lib and
--with-hsa-kmt-lib.
* doc/invoke.texi (-Whsa): Document.
(hsa-gen-debug-stores): Likewise.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa acclerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.
* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
GF_OMP_FOR_KIND_GRID_LOOP.
(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_grid_body): New function.
(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
* gimple.def (GIMPLE_OMP_GRID_BODY): New.
* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
GF_OMP_TEAMS_GRID_PHONY.
(gimple_statement_omp_single_layout): Updated comments.
(gimple_build_omp_grid_body): New function.
(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
(gimple_omp_for_grid_phony): New function.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_parallel_grid_phony): Likewise.
(gimple_omp_parallel_set_grid_phony): Likewise.
(gimple_omp_teams_grid_phony): Likewise.
(gimple_omp_teams_set_grid_phony): Likewise.
(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
(BUILT_IN_GOMP_TARGET): Updated type.
* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
(adjust_for_condition): New function.
(get_omp_for_step_from_incr): Likewise.
(extract_omp_for_data): Moved parts to adjust_for_condition and
get_omp_for_step_from_incr.
(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
(fixup_child_record_type): Bail out if receiver_decl is NULL.
(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
(scan_omp_parallel): Do not create child functions for phony
constructs.
(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
(scan_omp_1_op): Checking assert we are not remapping to
ERROR_MARK. Also also handle GIMPLE_OMP_GRID_BODY.
(parallel_needs_hsa_kernel_p): New function.
(expand_parallel_call): Register apprpriate parallel child
functions as HSA kernels.
(grid_launch_attributes_trees): New type.
(grid_attr_trees): New variable.
(grid_create_kernel_launch_attr_types): New function.
(grid_insert_store_range_dim): Likewise.
(grid_get_kernel_launch_attributes): Likewise.
(get_target_argument_identifier_1): Likewise.
(get_target_argument_identifier): Likewise.
(get_target_argument_value): Likewise.
(push_target_argument_according_to_value): Likewise.
(get_target_arguments): Likewise.
(expand_omp_target): Call get_target_arguments instead of looking
up for teams and thread limit.
(grid_expand_omp_for_loop): New function.
(grid_arg_decl_map): New type.
(grid_remap_kernel_arg_accesses): New function.
(grid_expand_target_kernel_body): New function.
(expand_omp): Call it.
(lower_omp_for): Do not emit phony constructs.
(lower_omp_taskreg): Do not emit phony constructs but create for them
a temporary variable receiver_decl.
(lower_omp_taskreg): Do not emit phony constructs.
(lower_omp_teams): Likewise.
(lower_omp_grid_body): New function.
(lower_omp_1): Call it.
(grid_reg_assignment_to_local_var_p): New function.
(grid_seq_only_contains_local_assignments): Likewise.
(grid_find_single_omp_among_assignments_1): Likewise.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Likewise.
(grid_target_follows_gridifiable_pattern): Likewise.
(grid_remap_prebody_decls): Likewise.
(grid_copy_leading_local_assignments): Likewise.
(grid_process_kernel_body_copy): Likewise.
(grid_attempt_target_gridification): Likewise.
(grid_gridify_all_targets_stmt): Likewise.
(grid_gridify_all_targets): Likewise.
(execute_lower_omp): Call grid_gridify_all_targets.
(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
(tree_omp_clause): Added union field dimension.
* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
* tree.c (omp_clause_num_ops): Added number of arguments of
OMP_CLAUSE__GRIDDIM_.
(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.
* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
* tree-pass.h (make_pass_gen_hsail): Declare.
(make_pass_ipa_hsa): Likewise.
* ipa-hsa.c: New file.
* lto-section-in.c (lto_section_name): Add hsa section name.
* lto-streamer.h (lto_section_type): Add hsa section.
* timevar.def (TV_IPA_HSA): New.
* hsa-brig-format.h: New file.
* hsa-brig.c: New file.
* hsa-dump.c: Likewise.
* hsa-gen.c: Likewise.
* hsa.c: Likewise.
* hsa.h: Likewise.
* toplev.c (compile_file): Call hsa_output_brig.
* hsa-regalloc.c: New file.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
gcc/lto/
* lto-partition.c: Include "hsa.h"
(add_symbol_to_partition_1): Put hsa implementations into the
same partition as host implementations.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New
unused parameter.
(GOMP_OFFLOAD_run): Likewise.
include/
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
(GOMP_VERSION_HSA): Likewise.
(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
(GOMP_TARGET_ARG_ID_MASK): Likewise.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.
From-SVN: r232549
When link_ptx runs, a CUDA device is already bound to current thread, so the
driver library knows the target architecture. There isn't any benefit from
forcing a specific target here; on the contrary, hardcoding sm_30 breaks
offloading on later (Maxwell, sm_5x) devices.
* plugin/plugin-nvptx.c (link_ptx): Do not set CU_JIT_TARGET.
From-SVN: r232227
gcc/
* config/nvptx/mkoffload.c (Kind, Vis): Remove enums.
(Token, Stmt): Remove structs.
(decls, vars, fns): Remove variables.
(alloc_comment, append_stmt, is_keyword): Remove macros.
(tokenize, write_token, write_tokens, alloc_stmt, rev_stmts)
(write_stmt, write_stmts, parse_insn, parse_list_nosemi)
(parse_init, parse_file): Remove functions.
(read_file): Accept a pointer to a length and store into it.
(process): Don't try to parse the input file, just write it out as
a string, but looking for maps. Also write out the length.
(main): Don't use "-S" to compile PTX code.
libgomp/
* oacc-ptx.h: Remove file, moving its content into...
* config/nvptx/fortran.c: ... here...
* config/nvptx/oacc-init.c: ..., here...
* config/nvptx/oacc-parallel.c: ..., and here.
* config/nvptx/openacc.f90: New file.
* plugin/plugin-nvptx.c: Don't include "oacc-ptx.h".
(link_ptx): Don't link in predefined bits of PTX code.
Co-Authored-By: Bernd Schmidt <bernds@codesourcery.com>
From-SVN: r228418
gcc/
* configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
offload targets by commas, not colons.
* config.in: Regenerate.
* configure: Likewise.
* gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
instead of setting up the default offload targets here...
(process_command): ..., do it here.
libgomp/
* plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
targets are separated by commas.
* config.h.in: Regenerate.
From-SVN: r228053
PR libgomp/65099
gcc/
* config/nvptx/mkoffload.c (main): Create an offload image only in
64-bit configurations.
libgomp/
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
in a 64-bit configuration.
* testsuite/libgomp.oacc-c++/c++.exp: Don't attempt nvidia
offloading testing if no such device is available.
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
From-SVN: r225560
PR libgomp/65742
gcc/
* builtins.c (expand_builtin_acc_on_device): Don't use open-coded
sequence for !ACCEL_COMPILER.
libgomp/
* oacc-init.c (plugin/plugin-host.h): Include.
(acc_on_device): Check whether we're in an offloaded region for
host_nonshm
plugin. Don't use __builtin_acc_on_device.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
nonshm_exec flag in thread-local data.
(GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
data for host_nonshm plugin.
(GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
for host_nonshm plugin.
* plugin/plugin-host.h: New.
From-SVN: r223801
gcc/
* config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call
GOMP_offload_unregister from the destructor.
libgomp/
* libgomp-plugin.h (struct mapping_table): Replace with addr_pair.
* libgomp.h (struct gomp_memory_mapping): Remove.
(struct target_mem_desc): Change type of mem_map from
gomp_memory_mapping * to splay_tree_s *.
(struct gomp_device_descr): Remove register_image_func, get_table_func.
Add load_image_func, unload_image_func.
Change type of mem_map from gomp_memory_mapping to splay_tree_s.
Remove offload_regions_registered.
(gomp_init_tables): Remove.
(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
to splay_tree_s *.
* libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister.
* oacc-host.c (host_dispatch): Do not initialize register_image_func,
get_table_func, mem_map.is_initialized, mem_map.splay_tree.root,
offload_regions_registered.
Initialize load_image_func, unload_image_func, mem_map.root.
(goacc_host_init): Do not initialize host_dispatch.mem_map.lock.
* oacc-init.c (lazy_open): Don't call gomp_init_tables.
(acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's.
* oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of
gomp_memory_mapping *. Use dev's lock and splay_tree.
(lookup_dev): Use dev's lock.
(acc_deviceptr): Pass dev to lookup_host instead of mem_map.
(acc_is_present): Likewise.
(acc_map_data): Likewise.
(acc_unmap_data): Likewise. Use dev's lock.
(present_create_copy): Likewise.
(delete_copyout): Pass dev to lookup_host instead of mem_map.
(update_dev_host): Likewise.
(gomp_acc_remove_pointer): Likewise. Use dev's lock.
* oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree.
* plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove.
(GOMP_OFFLOAD_get_table): Remove
(GOMP_OFFLOAD_load_image): New function.
(GOMP_OFFLOAD_unload_image): New function.
* target.c (register_lock): New mutex for offload image registration.
(num_devices): Do not guard with PLUGIN_SUPPORT.
(gomp_realloc_unlock): New static function.
(gomp_map_vars_existing): Add device descriptor argument. Unlock mutex
before gomp_fatal.
(gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's.
Pass devicep to gomp_map_vars_existing. Unlock mutex before gomp_fatal.
(gomp_copy_from_async): Use dev's lock and splay_tree instead of
mem_map's.
(gomp_unmap_vars): Likewise.
(gomp_update): Remove gomp_memory_mapping argument. Use dev's lock and
splay_tree instead of mm's. Unlock mutex before gomp_fatal.
(gomp_offload_image_to_device): New static function.
(GOMP_offload_register): Add mutex lock.
Call gomp_offload_image_to_device for all initialized devices.
Replace gomp_realloc with gomp_realloc_unlock.
(GOMP_offload_unregister): New function.
(gomp_init_tables): Replace with gomp_init_device. Replace a call to
get_table_func from the plugin with calls to init_device_func and
gomp_offload_image_to_device.
(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
to splay_tree_s *.
(GOMP_target): Do not call gomp_init_tables. Use dev's lock and
splay_tree instead of mem_map's. Unlock mutex before gomp_fatal.
(GOMP_target_data): Do not call gomp_init_tables.
(GOMP_target_update): Likewise. Remove argument from gomp_update.
(gomp_load_plugin_for_device): Replace register_image and get_table
with load_image and unload_image in DLSYM ().
(gomp_register_images_for_device): Remove function.
(gomp_target_init): Do not initialize current_device.mem_map.*,
current_device.offload_regions_registered.
Remove call to gomp_register_images_for_device.
Do not free offload_images and num_offload_images.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp: Include map.
(AddrVect, DevAddrVect, ImgDevAddrMap): New typedefs.
(num_devices, num_images, address_table): New static vars.
(num_libraries, lib_descrs): Remove static vars.
(set_mic_lib_path): Rename to ...
(init): ... this. Allocate address_table and get num_devices.
(GOMP_OFFLOAD_get_num_devices): return num_devices.
(load_lib_and_get_table): Remove static function.
(offload_image): New static function.
(GOMP_OFFLOAD_get_table): Remove function.
(GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New functions.
From-SVN: r221878