The CUPTI installation includes several samples that demonstrate the use of the CUPTI APIs. Refer to these samples for usage of the different APIs that CUPTI supports. The samples are:
Activity API
- activity_trace_async
- This sample shows how to collect a trace of CPU and GPU activity using the new asynchronous activity buffer APIs.
- callback_timestamp
- This sample shows how to use the callback API to record a trace of API start and stop times.
- cuda_graphs_trace
- This sample shows how to collect the trace of CUDA graphs and correlate the graph node launch to the node creation API using CUPTI callbacks.
- cuda_memory_trace
- This sample shows how to collect the trace of CUDA memory operations. The sample also traces CUDA memory operations done via the default memory pool.
- cupti_correlation
- This sample shows how to correlate CUDA APIs with the corresponding GPU activities.
- cupti_external_correlation
- This sample shows how to correlate CUDA API activity records with external APIs.
- cupti_finalize
- This sample shows how to use the cuptiFinalize() API to dynamically detach and attach CUPTI.
- cupti_nvtx
- This sample shows how to receive NVTX callbacks and collect NVTX records in CUPTI.
- cupti_trace_injection
- This sample shows how to build an injection library using the CUPTI activity and callback APIs. It can be used to trace CUDA APIs and GPU activities for any CUDA application. It does not require the CUDA application to be modified.
- nvlink_bandwidth
- This sample shows how to collect NVLink topology and NVLink throughput metrics in continuous mode.
- openacc_trace
- This sample shows how to use CUPTI APIs for OpenACC data collection.
- pc_sampling
- This sample shows how to collect PC Sampling profiling information for a kernel using the PC Sampling Activity APIs.
- sass_source_map
- This sample shows how to generate CUpti_ActivityInstructionExecution records and how to map SASS assembly instructions to CUDA C source.
- unified_memory
- This sample shows how to collect information about page transfers for unified memory.
Event and Metric APIs
- callback_event
- This sample shows how to use both the callback and event APIs to record the events that occur during the execution of a simple kernel. The sample shows the required ordering for synchronization, and for event group enabling, disabling, and reading.
- callback_metric
- This sample shows how to use both the callback and metric APIs to record the metric's events during the execution of a simple kernel, and then use those events to calculate the metric value.
- cupti_query
- This sample shows how to query CUDA-enabled devices for their event domains, events, and metrics.
- event_multi_gpu
- This sample shows how to use the CUPTI event and CUDA APIs to sample events on a setup with multiple GPUs. The sample shows the required ordering for synchronization, and for event group enabling, disabling, and reading.
- event_sampling
- This sample shows how to use the event APIs to sample events using a separate host thread.
Profiling API
- extensions
- This includes utilities used in some of the samples.
- autorange_profiling
- This sample shows how to use profiling APIs to collect metrics in autorange mode.
- callback_profiling
- This sample shows how to use the callback and profiling APIs to collect metrics during the execution of a kernel. It shows how to use the different phases of profiling, i.e., enumeration, configuration, collection, and evaluation, in the appropriate callbacks.
- concurrent_profiling
- This sample shows how to use the profiling API to record metrics from concurrent kernels launched in two different ways: using multiple streams on a single device, and using multiple threads with multiple devices.
- cupti_metric_properties
- This sample shows how to query various properties of metrics using the Profiling APIs. The sample shows the collection method (hardware or software) and the number of passes required to collect a list of metrics.
- nested_range_profiling
- This sample shows how to profile nested ranges using the Profiling APIs.
- profiling_injection
- This sample for Linux systems shows how to build an injection library which can automatically enable CUPTI's Profiling API using Auto Ranges with Kernel Replay mode. It can attach to an application which was not instrumented using CUPTI and profile any kernel launches.
- userrange_profiling
- This sample shows how to use profiling APIs to collect metrics in user-specified range mode.
PC Sampling API
- pc_sampling_continuous
- This injection sample shows how to collect PC Sampling profiling information using the PC Sampling APIs. A Perl script, libpc_sampling_continuous.pl, is provided to run the CUDA application with different PC sampling options. Use the command './libpc_sampling_continuous.pl --help' to list all the options. The CUDA application code does not need to be modified. Refer to the README.txt file shipped with the sample for instructions to build and use the injection library.
- pc_sampling_start_stop
- This sample shows how to collect PC Sampling profiling information for kernels within a range using the PC Sampling start/stop APIs.
- pc_sampling_utility
- This utility takes the PC sampling data file generated by the pc_sampling_continuous injection library as input. It prints the stall reason counter values at the GPU assembly instruction level. It also correlates GPU assembly to CUDA-C source and shows the CUDA-C source file name and line number. Refer to the README.txt file shipped with the sample for instructions to build and run the utility.
SASS Metric API
- sass_metric
- This sample shows how to use the SASS metric API to enumerate metrics supported by a device and how to collect metrics at the source level using SASS patching.
Checkpoint API