Blog
Game controller development board
This post also represents contributions from Kirk Lau. The following is a response to a question from Josef about why the Arduino Pro Micro is the best choice. The core issue with the Pro Mini is that, much like other boards in the $10 range, it lacks USB communication. To communicate with the board, which a USB controller requires, we need either a USB-to-serial converter or a microcontroller capable of using the protocol directly. Although these $10 boards can afford the processor chip, LEDs, resistors, and such, the USB-to-serial...
Welcome New Members
I'd like to welcome a set of new student researchers to the Jetson
project.
The people joining Charlab are as follows:
Amy Ngai
Da Eun Shim
Ramy Elminyawi
Andrew Fishberg
Richard Piersall
Kirklann Lau
The tentative plan is for each person to take direct responsibility
for a single lab assignment, while working with the rest of the
group on the bigger issues.
Lab   Person
1     Richard
2     Kirk
3     Andrew
4     Ramy
5     Da Eun
6     Amy
I'm excited to be working with all of you and hope we'll do wonderful
things this semester!
CACTI
Integration of CACTI into Sphynx
I have been working on setting up CACTI on charlab in order to get more reliable power consumption results than those from the simulator used by GPGPU-Sim. Installing all the dependencies was slightly annoying, but there is now a working copy of CACTI 6.0 in the gpgpu-sim folder. It can be run to generate a report using standard compile and run instructions (make and then run). All cache configuration is set in the config file (which seems well documented!). Alternatively, you can provide command line arguments to run a simulation. C B A Tech...
Ray Characterization
Last week, I characterized how parallel CP and RAY were by examining the CUDA files for both benchmarks. I found that CP was more parallel than RAY, which made it an interesting benchmark to analyze despite its small size. This week, we wanted to finish our characterization of RAY by making it more parallel in software. This entailed changing the image size that the Ray Tracer would operate on and the number of thread blocks (and consequently the number of threads in the queue at a given time). The following are the calculations related to the tweaked version...
Example Blog Post Errors
This morning I noticed an issue preventing some of the blog posts from showing up correctly on the live website. These errors happened mostly because some of the blog authors use the GitHub in-browser editor to create posts rather than testing locally with Jekyll before pushing the updates to the live site. This time, there were 3 errors with 3 different levels of severity.
Blog post location
This was the most critical error, and it caused Jekyll to fail. The issue was that one of the blog posts had been placed in the root directory instead of being placed inside...
Benchmark Parallelism
Analysis of Parallelism in Ray Tracing and Coulombic Potential Benchmarks
Last week, Fahiba and DH generated some very interesting plots which gave us a lot of insight into the different benchmarks that we are using in our project. So this week, I set out to understand the reasons behind some of the trends they discussed in their blog by examining an important characteristic of these benchmark programs: their degree of parallelism, or in other words, the number of software work elements that each program issues to the GPU when executing in hardware. The following calculations have been derived...
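As a rough illustration of the "degree of parallelism" idea in this excerpt, the number of threads a CUDA kernel launch issues is the product of its grid and block dimensions. The sketch below is not taken from the post; the function name and the launch dimensions are hypothetical values chosen only for the example.

```python
import math

def total_threads(grid_dim, block_dim):
    """Threads issued by one kernel launch: blocks in the grid times threads per block."""
    blocks = math.prod(grid_dim)
    threads_per_block = math.prod(block_dim)
    return blocks * threads_per_block

# Hypothetical launch configuration, not the benchmarks' real values.
grid = (16, 16, 1)    # 256 thread blocks
block = (16, 8, 1)    # 128 threads per block
print(total_threads(grid, block))
```

Comparing this count across benchmarks is one simple way to rank them by how much work they expose to the GPU at once.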
Working set size for traces
This week, I modified the simulation to work on L1, L2, and L3 data caches, instead of just the L1 cache. After doing some research, I found the typical size and associativity of these caches in an i7 processor. After simulating this, I found that the traces were all resident in the cache. This means that there was no recaching in the L3 cache. To determine a good cache configuration moving forward, it will be useful to know the working set size, which is the amount of memory needed by the trace.
Trace    Unique Addresses
is_omp   89677...
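The working set size described in this excerpt can be estimated by counting the distinct memory locations a trace touches. The following is a minimal sketch, assuming the trace is available as a list of integer byte addresses and that counting at cache-line granularity (64 bytes here) is acceptable; the function name, the line size, and the sample addresses are all assumptions, not the post's actual tooling or data.

```python
def working_set_size(addresses, line_size=64):
    """Return (unique cache lines, bytes) touched by a trace of byte addresses."""
    # Map each address to its cache line so nearby addresses are counted once.
    lines = {addr // line_size for addr in addresses}
    return len(lines), len(lines) * line_size

# Made-up trace for illustration only.
trace = [0x1000, 0x1004, 0x1040, 0x2000, 0x1000]
unique_lines, size_bytes = working_set_size(trace)
print(unique_lines, size_bytes)
```

Passing `line_size=1` counts unique byte addresses instead, which is closer to a plain "unique addresses" tally.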
Early Cache Stats Printing in GPGPU-Sim
Sorry for the delay, but I finally modified the GPGPU-Sim code to print early cache statistics so we can adjust for compulsory misses. The original plan was to have all our benchmarks do some sort of dry run initially to fill up the cache before collecting data, but looking at the source code, trying to manipulate when and how the benchmarks are run is going to be pretty tricky. So instead, we are settling for printing the cache statistics early (for example, after the first 10,000 cycles), then comparing them with the final cache statistics once everything finishes running. Read on for details on...
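One way to use an early snapshot like this is to subtract it from the final counters, so that the warm-up period (where compulsory misses dominate) is excluded from the miss rate. The sketch below assumes the statistics have already been parsed into dictionaries with `accesses` and `misses` fields; those field names and the numbers are hypothetical and do not reflect GPGPU-Sim's actual output format.

```python
def post_warmup_miss_rate(early, final):
    """Miss rate over the interval between the early snapshot and the end of the run."""
    accesses = final["accesses"] - early["accesses"]
    misses = final["misses"] - early["misses"]
    if accesses <= 0:
        raise ValueError("no accesses recorded after the early snapshot")
    return misses / accesses

# Made-up counters for illustration only.
early_stats = {"accesses": 1000, "misses": 400}
final_stats = {"accesses": 11000, "misses": 900}
print(post_warmup_miss_rate(early_stats, final_stats))
```

This only approximates removing compulsory misses, since lines first touched after the snapshot still miss once.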