
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.

include/ChangeLog:

	* cuda/cuda.h (enum CUdevice_attribute): Add
	CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
	(CU_MEMHOSTALLOC_DEVICEMAP): Define.
	(cuMemHostAlloc): Add prototype.

libgomp/ChangeLog:

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
	'static' for this variable.
	* config/nvptx/libgomp-nvptx.h: New file.
	* config/nvptx/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare extern var.
	(GOMP_REV_OFFLOAD_VAR): Declare var.
	(GOMP_target_ext): Handle reverse offload.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
	* target.c (gomp_target_rev): ... this new stub function.
	* libgomp.h (gomp_target_rev): Declare.
	* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
	* plugin/cuda-lib.def (cuMemHostAlloc): Add.
	* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
	(struct ptx_device): Add rev_data member.
	(nvptx_open_device): Remove async_engines query, last used in
	r10-304-g1f4c5b9b; add unified-address assert check.
	(GOMP_OFFLOAD_get_num_devices): Claim unified address support.
	(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
	offload functions exist.  Make offload var available
	on host and device.
	(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
	(GOMP_OFFLOAD_run): Handle reverse offload.
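
The "set values in a struct on the device, query it on the host" handshake described above can be pictured with a small single-threaded simulation. This is only a sketch under stated assumptions: the struct layout and every demo_* name are hypothetical and are not taken from libgomp-nvptx.h or plugin-nvptx.c. A real implementation would also need atomic accesses or memory fences between device and host, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mailbox shared by "device" and "host".  The field names
   are illustrative only; they are not the layout used by libgomp.  */
struct demo_rev_request
{
  uint64_t fn;      /* Nonzero means a request is pending.  */
  uint64_t arg;     /* Payload written by the device.  */
  uint64_t result;  /* Filled in by the host handler.  */
};

/* "Device" side: write the payload first and publish 'fn' last, so the
   host never sees a pending request with a stale payload.  */
static void
demo_device_post (volatile struct demo_rev_request *box,
                  uint64_t fn, uint64_t arg)
{
  box->arg = arg;
  box->fn = fn;
}

/* Stand-in for the host-side handler (the role gomp_target_rev plays in
   the description above); here it just adds its two inputs.  */
static uint64_t
demo_host_handler (uint64_t fn, uint64_t arg)
{
  return fn + arg;
}

/* "Host" side: poll the mailbox once.  Returns 1 if a request was
   served and 0 if nothing was pending.  */
static int
demo_host_poll (volatile struct demo_rev_request *box)
{
  uint64_t fn = box->fn;
  if (fn == 0)
    return 0;
  box->result = demo_host_handler (fn, box->arg);
  box->fn = 0;  /* Mark the request as completed.  */
  return 1;
}
```

Polling an empty mailbox returns 0. After demo_device_post, a single demo_host_poll call serves the request and clears 'fn' so the device side could post again.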
55 lines
1.8 KiB
Modula-2
CUDA_ONE_CALL (cuCtxCreate)
CUDA_ONE_CALL (cuCtxDestroy)
CUDA_ONE_CALL (cuCtxGetCurrent)
CUDA_ONE_CALL (cuCtxGetDevice)
CUDA_ONE_CALL (cuCtxPopCurrent)
CUDA_ONE_CALL (cuCtxPushCurrent)
CUDA_ONE_CALL (cuCtxSynchronize)
CUDA_ONE_CALL (cuDeviceGet)
CUDA_ONE_CALL (cuDeviceGetAttribute)
CUDA_ONE_CALL (cuDeviceGetCount)
CUDA_ONE_CALL (cuDeviceGetName)
CUDA_ONE_CALL (cuDeviceTotalMem)
CUDA_ONE_CALL (cuDriverGetVersion)
CUDA_ONE_CALL (cuEventCreate)
CUDA_ONE_CALL (cuEventDestroy)
CUDA_ONE_CALL (cuEventElapsedTime)
CUDA_ONE_CALL (cuEventQuery)
CUDA_ONE_CALL (cuEventRecord)
CUDA_ONE_CALL (cuEventSynchronize)
CUDA_ONE_CALL (cuFuncGetAttribute)
CUDA_ONE_CALL_MAYBE_NULL (cuGetErrorString)
CUDA_ONE_CALL (cuInit)
CUDA_ONE_CALL (cuLaunchKernel)
CUDA_ONE_CALL (cuLinkAddData)
CUDA_ONE_CALL_MAYBE_NULL (cuLinkAddData_v2)
CUDA_ONE_CALL (cuLinkComplete)
CUDA_ONE_CALL (cuLinkCreate)
CUDA_ONE_CALL_MAYBE_NULL (cuLinkCreate_v2)
CUDA_ONE_CALL (cuLinkDestroy)
CUDA_ONE_CALL (cuMemAlloc)
CUDA_ONE_CALL (cuMemAllocHost)
CUDA_ONE_CALL (cuMemHostAlloc)
CUDA_ONE_CALL (cuMemcpy)
CUDA_ONE_CALL (cuMemcpyDtoDAsync)
CUDA_ONE_CALL (cuMemcpyDtoH)
CUDA_ONE_CALL (cuMemcpyDtoHAsync)
CUDA_ONE_CALL (cuMemcpyHtoD)
CUDA_ONE_CALL (cuMemcpyHtoDAsync)
CUDA_ONE_CALL (cuMemFree)
CUDA_ONE_CALL (cuMemFreeHost)
CUDA_ONE_CALL (cuMemGetAddressRange)
CUDA_ONE_CALL (cuMemGetInfo)
CUDA_ONE_CALL (cuMemHostGetDevicePointer)
CUDA_ONE_CALL (cuModuleGetFunction)
CUDA_ONE_CALL (cuModuleGetGlobal)
CUDA_ONE_CALL (cuModuleLoad)
CUDA_ONE_CALL (cuModuleLoadData)
CUDA_ONE_CALL (cuModuleUnload)
CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
CUDA_ONE_CALL (cuStreamAddCallback)
CUDA_ONE_CALL (cuStreamCreate)
CUDA_ONE_CALL (cuStreamDestroy)
CUDA_ONE_CALL (cuStreamQuery)
CUDA_ONE_CALL (cuStreamSynchronize)
CUDA_ONE_CALL (cuStreamWaitEvent)
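
A list in this form is typically consumed with the X-macro technique: the including file defines CUDA_ONE_CALL and CUDA_ONE_CALL_MAYBE_NULL, includes the .def file, and undefines them again. The following is a minimal sketch of that pattern, not the code in plugin-nvptx.c. To stay self-contained it uses a hypothetical DEMO_CALLS list macro in place of `#include "cuda-lib.def"`, and the demo_* names and the "required" flag are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the .def file.  A real consumer would
   #include the .def file at the point where DEMO_CALLS is used.  */
#define DEMO_CALLS \
  DEMO_ONE_CALL (cuInit) \
  DEMO_ONE_CALL_MAYBE_NULL (cuGetErrorString) \
  DEMO_ONE_CALL (cuMemHostAlloc)

struct demo_entry
{
  const char *name;  /* Symbol name, produced by stringizing.  */
  int required;      /* 1: must be present, 0: may be missing.  */
};

/* Expand every list entry into one table row.  */
#define DEMO_ONE_CALL(call) { #call, 1 },
#define DEMO_ONE_CALL_MAYBE_NULL(call) { #call, 0 },
static const struct demo_entry demo_table[] = { DEMO_CALLS };
#undef DEMO_ONE_CALL
#undef DEMO_ONE_CALL_MAYBE_NULL

/* Number of entries generated from the list.  */
static size_t
demo_count (void)
{
  return sizeof demo_table / sizeof demo_table[0];
}

/* Returns 1 if NAME is a required entry, 0 if it is an optional
   (MAYBE_NULL) entry, and -1 if it is not in the list.  */
static int
demo_is_required (const char *name)
{
  for (size_t i = 0; i < demo_count (); i++)
    if (strcmp (demo_table[i].name, name) == 0)
      return demo_table[i].required;
  return -1;
}
```

The same list could be expanded a second time with different macro definitions, for example to declare one function-pointer member per entry, which is the main appeal of keeping the names in a separate .def file.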