cacti
Integration of CACTI into Sphynx
I have been working on setting up CACTI on charlab in order to get more reliable power consumption results than the simulator used by GP-GPUSIM. The process of installing all the dependencies was slightly annoyinng, but now there is a working copy of CACTI 6.0 in the gpgpu-sim folder. It can be run to generate a report using standard compile and run instructions (make and then run).
All cache configuration is set in the config file (which seems well documented!). Alternatively, you can provide command line arguments to run a simulation.
C B A Tech NoBanks and / or -weight <delay> <dynamic> <leakage> <cycle> <area>
and / or -deviate <delay> <dynamic> <leakage> <cycle> <area>
C - Cache size in bytes
B - Block size in bytes
A - Associativity
Tech - Process technology in microns or nano-meter
NoBanks - No. of UCA banksC B A Tech NoBanks
and / or
-weight <delay> <dynamic> <leakage> <cycle> <area>
and / or
-deviate <delay> <dynamic> <leakage> <cycle> <area>
C - Cache size in bytes
B - Block size in bytes
A - Associativity
Tech - Process technology in microns or nano-meter
NoBanks - No. of UCA banks
Remember that the command line arguments are optional, and it is prefferable to use the config file.
CACTI Terminology
Here are some terms that appear in CACTI that we need to be familiar with
• Bank - A memory structure that consists of a data and a tag array. A cache is typically split into multiple banks and CACTI assumes enough bandwidth so that these banks can be accessed simultaneously. The network topology that interconnects these banks can vary depending on the cache model (UCA or NUCA).
• Sub-arrays - A data or tag array is divided into a number of sub-arrays to reduce the delay due to wordline and bitline. Unlike banks, at any given time, these sub-arrays support only one single access. The total number of sub-arrays in a cache is equal to the product of Ndwl and Ndbl.
• Mat - A group of four sub-arrays (2x2) that share a common central predecoder. CACTI’s exhaustive search starts from a minimum of at least one mat.
• Sub-bank - In a typical cache, a cache block is scattered across multiple sub-arrays to improve the reliability of a cache. Irrespective of the cache organization, CACTI assumes that every cache block in a cache is distributed across an entire row of mats and the row number corresponding to a particular block is determined based on the block address. Each row (of mats) in an array is referred to as a sub-bank.
• Ntwl/Ndwl - Number of horizontal partitions in a tag or data array i.e., the number of segments that a single wordline is partitioned into.
• Ntbl/Ndbl - Number of vertical partitions in a tag or data array i.e., the number of segments that a single bitline is partitioned into.
• Ntspd/Nspd - Number of sets stored in each row of a sub-array. For a given Ndwl and Ndbl values, Nspd decides the aspect ratio of the sub-array.
• Ntcm/Ndcm - Degree of bitline multiplexing.
• Ntsam/Ndsam - Degree of sense-amplifier multiplexing.
Power Output
Here is the output for running a simulation with an associativity of 8 and a block size of 128.
Power Components:
Data array: Total dynamic read energy/access (nJ): 0.104652
Total leakage read/write power all banks at maximum frequency (mW): 1152.16
Total energy in H-tree (that includes both address and data transfer) (nJ): 0.0947923
Decoder (nJ): 0.000141478
Wordline (nJ): 0.000422674
Bitline mux & associated drivers (nJ): 0.000255944
Sense amp mux & associated drivers (nJ): 0.000124171
Bitlines (nJ): 0.00627315
Sense amplifier energy (nJ): 0.00110592
Sub-array output driver (nJ): 0.00128588
Tag array: Total dynamic read energy/access (nJ): 0.0165109
Total leakage read/write power all banks at maximum frequency (mW): 40.1795
Total energy in H-tree (nJ): 0.0111652
Decoder (nJ): 5.22869e-05
Wordline (nJ): 6.28864e-05
Bitline mux & associated drivers (nJ): 3.49353e-05
Sense amp mux & associated drivers (nJ): 0
Bitlines (nJ): 0.00154038
Sense amplifier energy (nJ): 0.00055296
Sub-array output driver (nJ): 0.00291354