Explanations of some important OpenCL concepts:
Compute unit - An OpenCL* device has one or more compute units. A work-group executes on a single compute unit. A compute unit is composed of one or more processing elements and local memory. A compute unit may also include dedicated texture filter units that can be accessed by its processing elements.
Device - A device is a collection of compute units. OpenCL* devices typically correspond to a GPU, a multi-core CPU, or other processors such as DSPs and the Cell/B.E. processor.
Command-queue - A command-queue is used to queue commands to a device. Examples of commands include executing kernels and reading or writing memory objects.
Kernel - A kernel is a function declared in a program and executed on an OpenCL* device. A kernel is identified by the __kernel or kernel qualifier applied to any function defined in a program.
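To make the kernel definition concrete, here is a minimal sketch of a vector-add kernel. The kernel body is written as it would appear in an OpenCL C program, but everything around it is an assumption made for illustration: the empty kernel and global macros, the get_global_id stand-in, and the run_vec_add helper are hypothetical host-side substitutes so the example compiles as plain C. A real implementation would build the kernel from source and enqueue it on a command-queue.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins (assumptions of this sketch). In OpenCL C, kernel and
   global are language qualifiers and get_global_id() is a built-in. */
#define kernel
#define global
static size_t current_global_id; /* set by the emulated launch loop below */
static size_t get_global_id(unsigned dim) { (void)dim; return current_global_id; }

/* The kernel: each work-item adds one pair of elements, selected by its global ID. */
kernel void vec_add(global const float *a, global const float *b, global float *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] + b[i];
}

/* Hypothetical emulated launch: runs the kernel once per work-item, serially.
   A real device is free to run these invocations in parallel and in any order. */
void run_vec_add(const float *a, const float *b, float *out, size_t global_size)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (current_global_id = 0; current_global_id < global_size; ++current_global_id)
        vec_add(a, b, out);
}
```

Calling run_vec_add(a, b, out, 4) on four-element arrays invokes the kernel four times, once for each global ID from 0 to 3.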
Work-item - one of a collection of parallel executions of a kernel invoked on a device by a command. A work-item is executed by one or more processing elements as part of a work-group executing on a compute unit. A work-item is distinguished from other executions within the collection by its global ID and local ID.
Work-group - a collection of related work-items that execute on a single compute unit. The work-items in the group execute the same kernel and share local memory and work-group barriers.
Each work-group has the following properties:
- Data sharing between work-items via local memory
- Synchronization between work-items via barriers and memory fences
- Special work-group level built-in functions, such as async_work_group_copy.
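The first two properties usually appear together: work-items write to local memory, wait at a barrier, and then read what their neighbours wrote. The sketch below emulates that pattern for a sum reduction inside one work-group. It is plain C, not device code: WG_SIZE and work_group_sum are hypothetical names, the local_mem array stands in for local memory, and each inner loop over lid stands in for all work-items executing one phase. The comments mark where a real kernel would call barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 8 /* hypothetical work-group size; must be a power of two */

/* Emulates one work-group summing WG_SIZE inputs through shared scratch memory. */
float work_group_sum(const float *in)
{
    float local_mem[WG_SIZE]; /* stands in for the work-group's local memory */

    assert(in != NULL);

    /* Phase 0: every work-item copies one element into local memory. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        local_mem[lid] = in[lid];
    /* A real kernel needs barrier(CLK_LOCAL_MEM_FENCE) here. */

    /* Tree reduction: in each phase the first `stride` work-items add in the
       value held `stride` slots away, halving the number of active items. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            local_mem[lid] += local_mem[lid + stride];
        /* A real kernel needs another barrier here before the next phase. */
    }

    return local_mem[0]; /* work-item 0 holds the group's total */
}
```

Without the barriers, a work-item on a real device could read a slot before its neighbour has finished writing it, because the execution order of work-items within a group is not defined.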
A multi-core CPU or multiple CPUs (in a multi-socket machine) constitute a single OpenCL* device. The separate cores are its compute units. The Device Fission extension enables you to control compute unit utilization within a compute device. You can find more information in the ‘Device Fission Extension Support’ section of the Intel® SDK for OpenCL* User’s Guide (see Related Documents).
When launching the kernel for execution, the host code defines the grid dimensions, or the global work size. The host code can also define how the grid is partitioned into work-groups, or leave that to the implementation. During execution, the implementation runs a single work-item for each point on the grid. It also groups the execution on compute units according to the work-group size.
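The relationship between the grid and its work-groups comes down to simple index arithmetic. The sketch below shows it for the simplest case, assuming a one-dimensional grid, a zero global offset, and a global work size that is an exact multiple of the work-group size. The work_item_ids struct and the ids_for and num_groups functions are hypothetical helpers, not part of the OpenCL API; inside a real kernel the same values come from get_global_id, get_group_id, and get_local_id.

```c
#include <assert.h>
#include <stddef.h>

/* The three IDs that locate one work-item in a 1-D grid. */
typedef struct {
    size_t global_id; /* position in the whole grid */
    size_t group_id;  /* which work-group the item belongs to */
    size_t local_id;  /* position inside that work-group */
} work_item_ids;

/* Number of work-groups the grid is partitioned into. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Splits a global ID into its group ID and local ID. */
work_item_ids ids_for(size_t global_id, size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    assert(global_id < global_size);

    work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id  = global_id / local_size;
    ids.local_id  = global_id % local_size;
    return ids;
}
```

For a global work size of 16 and a work-group size of 4, the grid splits into 4 work-groups, and the work-item with global ID 10 is local item 2 of group 2, since 2 * 4 + 2 = 10.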
The order of execution of work-items within a work-group, as well as the order of work-groups, is implementation-specific.
Task-Parallel programming model - the OpenCL* programming model that runs a single work-group with a single work-item.