2014-2-11-Minutes
Start time: around 12:10pm
In attendance: Prof Spjut, Fabiha, Erik, Paul, Akhil, Sami, Donghyeon
__
Paul:
-Paul thinks he is oversimplifying his project
-Testing with more instructions would be ideal
-Instructions should branch forward as much as possible; if there is a backward branch, there needs to be some code to make it branch forward again
-Getting the Raspberry Pi to work is the hard part, so he is fine; getting to that ASAP would be good
-Prof Harris' implementation does have a bug; testing should catch it (Paul should test Prof Harris' implementation against his own version)
__
Sami:
-Sami was not able to run GPGPU-Sim
-GPGPU-Sim itself is working; Akhil says it is working
-By default, the environment setup is sourced from .bashrc
__
Fabiha:
-One thing to note is that there are no reservation fails recorded in the output file we are currently looking at
-The miss rates for these simulations are very low; we may want to decrease the cache size to increase the miss rate (and argue in the paper that this is good for some reason, such as saving area, though it saves very little)
-Ideally, we would want fewer instruction caches (rather than smaller ones) and figure out how to share them
-We want the static number of instructions
-NOT the dynamic number of instructions (e.g., accesses + misses + reservation fails)
-We need a way to record this
-We also need a way to figure out when the banks want to access the same instruction (this will require adding a print statement in the simulator source)
-"inputfile.readlines()[x:y]" This Python function will allow only certain lines (x through y) to be read and this can be a variable inputted as an argument to the parser function
-What I need to do for next week: a table listing the benchmarks and the number of instructions per benchmark (ideally we want SASS and not PTX)
-PTX: like LLVM IR in that it is a high-level assembly, but it is symbolic assembly
-SASS: like MIPS or ARM, actual instructions that get executed
-Compiling with nvcc compiles the code down to PTX, which is then loaded and compiled into SASS at run time
-SASS is different depending on which GPU you have
-Commit to PTX, but not SASS (we will probably get PTX instruction counts and that's fine, but SASS would be ideal); see the sketches right after this list
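A minimal sketch of the line-range idea above, in Python. The file name, the x/y arguments, and the "name = value" stat format are placeholder assumptions, not the actual GPGPU-Sim output format:

```python
# Sketch: parse only lines x..y-1 of a simulator output file using
# readlines()[x:y]. The "name = value" stat-line format below is an
# assumption; adjust it to the real output file.
def parse_lines(path, x, y):
    with open(path) as inputfile:
        lines = inputfile.readlines()[x:y]  # slice is end-exclusive
    stats = {}
    for line in lines:
        if "=" in line:
            name, value = line.split("=", 1)
            stats[name.strip()] = value.strip()
    return stats

# Example: parse lines 100 through 199 of a (hypothetical) output file
# stats = parse_lines("benchmark_output.txt", 100, 200)
```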
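For the benchmarks-vs-instruction-counts table, one possible approach is to count static instructions in each benchmark's PTX file (nvcc -ptx emits a .ptx file from a .cu source). The directory layout and the counting heuristic below are assumptions, not an exact PTX parser:

```python
import glob
import os

def count_static_ptx_instructions(ptx_path):
    """Rough static instruction count: non-comment lines ending in ';' that are
    not directives (lines starting with '.') -- a heuristic, not a PTX parser."""
    count = 0
    with open(ptx_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("//"):
                continue
            if line.startswith("."):   # directives like .reg, .param, .visible
                continue
            if line.endswith(";"):
                count += 1
    return count

# Hypothetical layout: one .ptx file per benchmark under benchmarks/<name>/
for ptx_file in sorted(glob.glob("benchmarks/*/*.ptx")):
    bench = os.path.basename(os.path.dirname(ptx_file))
    print(bench, count_static_ptx_instructions(ptx_file))
```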
__
Donghyeon&Akhil:
-They had questions to ask about their project
-A warp is 32 threads
-Each SIMD unit is 16 or 32 lanes wide (the number of SIMD units is the total number of shader cores divided by 16 or 32)
-SIMD width * number of SIMD units = total number of shader cores, which indicates how powerful the GPU is (see the worked example after this list)
-execution is outside of it
-4 CTAs per core (CTA is NVIDIA's term for a thread block), an indication of how much parallelism your core can handle
-CTA = cooperative thread array, an array of threads that executes a kernel concurrently or in parallel
-Shader cores are tightly coupled ("shader" is the graphics term for something programmable, e.g., vertex shaders used to build an illumination model, but shader cores here do general computation)
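A worked example of the SIMD-width relation above; the numbers are made up for illustration and do not come from any particular GPGPU-Sim config:

```python
# Illustrative numbers only -- not from a real config file.
simd_width = 32              # lanes per SIMD unit (same as the warp size here)
num_simd_units = 15          # SIMD units in the GPU
total_shader_cores = simd_width * num_simd_units
print(total_shader_cores)    # 480 -- a rough indication of how powerful the GPU is
```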