By Dong-hyeon Park on 14 April 2014

Sorry for the delay, but I finally modified the GPGPU-sim code to print early cache statistics so we can adjust for compulsory misses. The original plan was to have all our benchmarks have some sort of a dry-run initially to fill up the cache before collecting data, but looking at the source code, trying to manipulate when/how the benchmarks are run is going to be pretty tricky. So intead, we are settling for printing the cache statistics early (like after first 10,000 cycles), then comparing it with the final cache statistics once everything finishes running.

Read on for details on how the soruce code was modified.

GPGPU-Sim Source Code Changes

These two files were modified:

  • gpu-sim.cc
  • gpu-sim.h

In gpgpu-sim.cc

I added a print_l1_cache_statistics() function to the gpgpu_sim object with the following code:

void gpgpu_sim::print_l1_cache_statistics()
{
   printf("======= INTERMEDIARY CACHE STATS ====== \n");
   shader_print_cache_stats(stdout);
   cache_stats core_cache_stats;
   //core_cache_stats.clear();
   for(unsigned i=0; i<m_config.num_cluster(); i++){
       m_cluster[i]->get_cache_stats(core_cache_stats);
   }
   printf("\nTotal_core_cache_stats:\n");
   core_cache_stats.print_stats(stdout, "Total_core_cache_stats_breakdown");
   printf("======================================== \n");
}

This code basically tells the gpgpu-sim object to print the current L1 cache statistics for each core. I had the cycle() command call this method. Write now, cycle() method simulates a single cycle of gpu execuation, then prints a one-line status every 500-1000 cycles. I inserted this line when in the block of code that prints these statistics so the cache statics get printed after the first 10,000 cycles:

if (gpu_sim_cycle == CACHE_PRINT_CYCLE_COUNT)
    print_l1_cache_statistics();

CACHE_PRINT_CYCLE_COUNT is set to 10,000 right now.

In gpu-sim.h

The only change made in this file is to add the function header for the print_l1_cache_statistics amongst the public declarations for gpgpu_sim object.

New Printout:

I tested this using the default GTX480 configuration for CP benchmark.

Here's what we see from the cache statics at 10,000 cycle point:

GPGPU-Sim uArch: cycles simulated: 10000  inst.: 5276160 (ipc=527.6) sim_rate=439680 (inst/sec) elapsed = 0:0:00:12 / Mon Apr 14 22:23:55 2014
======= INTERMEDIARY CACHE STATS ====== 

========= Core cache stats =========
L1I_cache:
PRINT FORMAT:L1I_cache_core[ Cluster_Index][ Core_index ]: Access, Miss, Miss_rate, Pending_hits, Reservation_fails 

    L1I_cache_core[0][0]: Access = 5943, Miss = 114, Miss_rate = 0.01918, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[1][0]: Access = 5966, Miss = 114, Miss_rate = 0.01911, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[2][0]: Access = 5938, Miss = 113, Miss_rate = 0.01903, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[3][0]: Access = 5969, Miss = 114, Miss_rate = 0.01910, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[4][0]: Access = 5978, Miss = 115, Miss_rate = 0.01924, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[5][0]: Access = 5964, Miss = 114, Miss_rate = 0.01911, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[6][0]: Access = 5934, Miss = 113, Miss_rate = 0.01904, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[7][0]: Access = 5941, Miss = 113, Miss_rate = 0.01902, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[8][0]: Access = 5910, Miss = 104, Miss_rate = 0.01760, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[9][0]: Access = 5895, Miss = 107, Miss_rate = 0.01815, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[10][0]: Access = 5894, Miss = 100, Miss_rate = 0.01697, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[11][0]: Access = 5904, Miss = 105, Miss_rate = 0.01778, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[12][0]: Access = 5902, Miss = 103, Miss_rate = 0.01745, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[13][0]: Access = 5885, Miss = 100, Miss_rate = 0.01699, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[14][0]: Access = 5890, Miss = 97, Miss_rate = 0.01647, Pending_hits = 0, Reservation_fails = 0
    L1I_total_cache_accesses = 88913
    L1I_total_cache_misses = 1626
    L1I_total_cache_miss_rate = 0.01829
    L1I_total_cache_pending_hits = 0
    L1I_total_cache_reservation_fails = 0
L1D_cache:
    L1D_cache_core[0]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[1]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[2]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[3]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[4]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[5]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[6]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[7]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[8]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[9]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[10]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[11]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[12]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[13]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[14]: Access = 0, Miss = 0, Miss_rate = -nan, Pending_hits = 0, Reservation_fails = 0
    L1D_total_cache_accesses = 0
    L1D_total_cache_misses = 0
    L1D_total_cache_pending_hits = 0
    L1D_total_cache_reservation_fails = 0
    L1D_cache_data_port_util = 0.000
    L1D_cache_fill_port_util = 0.000
L1C_cache:
    L1C_total_cache_accesses = 41886
    L1C_total_cache_misses = 3620
    L1C_total_cache_miss_rate = 0.0864
    L1C_total_cache_pending_hits = 0
    L1C_total_cache_reservation_fails = 4679
L1T_cache:
    L1T_total_cache_accesses = 0
    L1T_total_cache_misses = 0
    L1T_total_cache_pending_hits = 0
    L1T_total_cache_reservation_fails = 0

Total_core_cache_stats:
    Total_core_cache_stats_breakdown[CONST_ACC_R][HIT] = 38266
    Total_core_cache_stats_breakdown[CONST_ACC_R][MISS] = 3620
    Total_core_cache_stats_breakdown[CONST_ACC_R][RESERVATION_FAIL] = 4679
    Total_core_cache_stats_breakdown[INST_ACC_R][HIT] = 87287
    Total_core_cache_stats_breakdown[INST_ACC_R][MISS] = 1626
======================================== 
GPGPU-Sim PTX: 5400000 instructions simulated : ctaid=(7,11,0) tid=(15,1,0)

It's a bit concerning that there's no data access during the first 10,000 cycles(???) And this is the final cache statistic after benchmark is completed

========= Core cache stats =========
L1I_cache:
PRINT FORMAT:L1I_cache_core[ Cluster_Index][ Core_index ]: Access, Miss, Miss_rate, Pending_hits, Reservation_fails 

    L1I_cache_core[0][0]: Access = 137754, Miss = 122, Miss_rate = 0.00089, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[1][0]: Access = 137754, Miss = 122, Miss_rate = 0.00089, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[2][0]: Access = 137754, Miss = 122, Miss_rate = 0.00089, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[3][0]: Access = 137753, Miss = 121, Miss_rate = 0.00088, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[4][0]: Access = 137756, Miss = 124, Miss_rate = 0.00090, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[5][0]: Access = 137754, Miss = 122, Miss_rate = 0.00089, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[6][0]: Access = 137753, Miss = 121, Miss_rate = 0.00088, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[7][0]: Access = 137753, Miss = 121, Miss_rate = 0.00088, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[8][0]: Access = 137744, Miss = 112, Miss_rate = 0.00081, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[9][0]: Access = 137746, Miss = 114, Miss_rate = 0.00083, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[10][0]: Access = 137740, Miss = 108, Miss_rate = 0.00078, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[11][0]: Access = 145841, Miss = 113, Miss_rate = 0.00077, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[12][0]: Access = 137743, Miss = 111, Miss_rate = 0.00081, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[13][0]: Access = 137739, Miss = 107, Miss_rate = 0.00078, Pending_hits = 0, Reservation_fails = 0
    L1I_cache_core[14][0]: Access = 137737, Miss = 105, Miss_rate = 0.00076, Pending_hits = 0, Reservation_fails = 0
    L1I_total_cache_accesses = 2074321
    L1I_total_cache_misses = 1745
    L1I_total_cache_miss_rate = 0.00084
    L1I_total_cache_pending_hits = 0
    L1I_total_cache_reservation_fails = 0
L1D_cache:
    L1D_cache_core[0]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[1]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[2]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[3]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[4]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[5]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[6]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[7]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[8]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[9]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[10]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 5
    L1D_cache_core[11]: Access = 576, Miss = 288, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[12]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[13]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_cache_core[14]: Access = 544, Miss = 272, Miss_rate = 0.500, Pending_hits = 0, Reservation_fails = 0
    L1D_total_cache_accesses = 8192
    L1D_total_cache_misses = 4096
    L1D_total_cache_miss_rate = 0.5000
    L1D_total_cache_pending_hits = 0
    L1D_total_cache_reservation_fails = 5
    L1D_cache_data_port_util = 0.001
    L1D_cache_fill_port_util = 0.001
L1C_cache:
    L1C_total_cache_accesses = 1028096
    L1C_total_cache_misses = 22056
    L1C_total_cache_miss_rate = 0.0215
    L1C_total_cache_pending_hits = 0
    L1C_total_cache_reservation_fails = 5482
L1T_cache:
    L1T_total_cache_accesses = 0
    L1T_total_cache_misses = 0
    L1T_total_cache_pending_hits = 0
    L1T_total_cache_reservation_fails = 0

Total_core_cache_stats:
    Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 28
    Total_core_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 4068
    Total_core_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 5
    Total_core_cache_stats_breakdown[CONST_ACC_R][HIT] = 1006040
    Total_core_cache_stats_breakdown[CONST_ACC_R][MISS] = 22056
    Total_core_cache_stats_breakdown[CONST_ACC_R][RESERVATION_FAIL] = 5482
    Total_core_cache_stats_breakdown[GLOBAL_ACC_W][HIT] = 4068
    Total_core_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 28
    Total_core_cache_stats_breakdown[INST_ACC_R][HIT] = 2072576
    Total_core_cache_stats_breakdown[INST_ACC_R][MISS] = 1745
category: tags:


blog comments powered by Disqus