Cache to RAM Ratio
A processor might have 512 KB of Cache and 512 MB of RAM.
There may be 1000 times more RAM than cache.
The cache algorithms have to carefully select the 0.1% of the memory that is likely to be most accessed.
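The arithmetic behind these figures can be checked quickly. This is only a sketch using the sizes quoted above, and it assumes binary units (1 KB = 1024 bytes):

```python
# Sizes from the example above, assuming binary units (1 KB = 1024 bytes).
cache_bytes = 512 * 1024
ram_bytes = 512 * 1024 * 1024

ratio = ram_bytes // cache_bytes                 # how many times larger RAM is
cached_percent = 100 * cache_bytes / ram_bytes   # share of RAM that fits in cache

print(ratio)                     # 1024
print(round(cached_percent, 3))  # 0.098
```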
A cache line contains two fields:
Data from RAM
The tag field, which holds the address of the block
Mapping:
The memory system has to quickly determine if a given address is in the cache.
There are three popular methods of mapping addresses to cache locations:
-- Fully Associative
Search the entire cache for an address.
-- Direct
Each address has a specific place in the cache.
-- Set Associative
Each address can be in any of a small set of cache locations.
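For the direct case, "a specific place in the cache" usually means the line index is computed from the address itself. The sketch below is an illustration only: the 64-byte line size and the function name split_address are assumptions, not something these notes specify.

```python
LINE_SIZE = 64                        # bytes per cache line (assumed)
NUM_LINES = 512 * 1024 // LINE_SIZE   # 8192 lines in a 512 KB cache

def split_address(addr):
    """Split a byte address into (tag, line index, byte offset)."""
    offset = addr % LINE_SIZE
    block = addr // LINE_SIZE     # block number in RAM
    index = block % NUM_LINES     # the one line this block may occupy
    tag = block // NUM_LINES      # distinguishes blocks that share that line
    return tag, index, offset

a = 0x12345
b = a + 512 * 1024                # exactly one cache size further on
print(split_address(a))           # (0, 1165, 5)
print(split_address(b))           # (1, 1165, 5)
```

The two addresses land on the same line and differ only in the tag, which is why a direct-mapped cache can hold just one of them at a time.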
Searching Problem
Knowledge of searching
Linear Search O(n)
Binary Search O(log2 (n))
Hashing O(1)
Parallel Search O(n/p)
Associative Mapping
The data from any location in RAM can be stored in any location in cache.
When the processor wants an address, all tag fields in the cache are checked to determine if the data is already in the cache.
Each tag line requires circuitry to compare the desired address with the tag field.
All tag fields are checked in parallel.
Set Associative Mapping
Set associative mapping is a mixture of direct and associative mapping.
The cache lines are grouped into sets.
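A toy model may make the mixture concrete: the set is chosen as in direct mapping, and the tags inside that set are searched as in associative mapping. The class name, the parameters, and the FIFO eviction are assumptions made for this sketch. Real hardware compares the tags of a set in parallel rather than in a loop.

```python
class SetAssociativeCache:
    """Toy model: tracks only which block tags are resident, not the data."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # each set holds up to `ways` tags

    def access(self, block):
        """Return True on a hit; on a miss, load the block and return False."""
        index = block % self.num_sets   # direct-mapped step: pick the set
        tag = block // self.num_sets
        tags = self.sets[index]
        if tag in tags:                 # associative step: search within the set
            return True
        if len(tags) == self.ways:
            tags.pop(0)                 # evict the oldest entry (FIFO, for brevity)
        tags.append(tag)
        return False

cache = SetAssociativeCache(num_sets=4, ways=2)
hits = [cache.access(b) for b in [0, 4, 0, 8, 4]]
print(hits)   # [False, False, True, False, True]
```

With ways=1 the model behaves like a direct-mapped cache, and with num_sets=1 it behaves like a fully associative one.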
Replacement policy
When a cache miss occurs, data is copied into some location in cache.
With Set Associative or Fully Associative mapping, the system must decide where to put the data and what values will be replaced.
Cache performance is greatly affected by properly choosing data that is unlikely to be referenced again.
Replacement Options
First In First Out (FIFO)
Least Recently Used (LRU)
Pseudo LRU
Random
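To see how two of these options differ, the sketch below counts hits for FIFO and LRU on a small fully associative cache. The function name count_hits and the access trace are made up for illustration.

```python
from collections import OrderedDict

def count_hits(blocks, capacity, policy):
    """Count hits for a fully associative cache using 'fifo' or 'lru'."""
    cache = OrderedDict()   # keys ordered from next-to-evict to safest
    hits = 0
    for block in blocks:
        if block in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(block)   # a hit refreshes recency under LRU only
            continue
        if len(cache) == capacity:
            cache.popitem(last=False)      # evict the entry at the front
        cache[block] = True
    return hits

trace = [1, 2, 3, 1, 4, 1, 5, 1]
print(count_hits(trace, 3, "fifo"))   # 2
print(count_hits(trace, 3, "lru"))    # 3
```

On this trace block 1 is reused often. LRU keeps it resident, while FIFO evicts it simply because it was loaded first.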
Comparison of Mapping: Fully Associative
Associative mapping works the best, but is complex to implement. Each tag line requires circuitry to compare the desired address with the tag field.
Some special-purpose caches, such as the virtual memory Translation Lookaside Buffer (TLB), are associative caches.
Comparison of Mapping: Direct
Has the lowest performance, but is easiest to implement. Direct is often used for instruction cache.
Sequential addresses fill a cache line and then go to the next cache line.
Sunday, October 28, 2012
Interleaving & Memory Interleaving
Computer Memory, Communication system, error correction,
http://en.wikipedia.org/wiki/Interleaving
http://fourier.eng.hmc.edu/e85/lectures/memory/node2.html
Issues Related to Cache Memory
- Load-Through: When the CPU needs to read a word from the memory, the block containing the word is brought from main memory (MM) to cache memory (CM), while at the same time the word is forwarded to the CPU.
- Store-Through: If store-through is used, a word to be stored from CPU to memory is written to both CM (if the word is in there) and MM. By doing so, a CM block to be replaced can be overwritten by an incoming block without being saved to MM.
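The store-through rule can be sketched with two dictionaries standing in for CM and MM. This is a simplification that works on single words rather than blocks, and the names store_through and evict are hypothetical.

```python
main_memory = {addr: 0 for addr in range(8)}   # MM: address -> word
cache = {2: 0}                                  # CM: only address 2 is cached

def store_through(addr, value):
    """Write to CM if the word is cached, and always write to MM."""
    if addr in cache:
        cache[addr] = value
    main_memory[addr] = value

def evict(addr):
    """Drop a cached word; no write-back is needed because MM is current."""
    cache.pop(addr, None)

store_through(2, 42)   # cached: both copies are updated
store_through(5, 7)    # not cached: only MM is updated
evict(2)
print(main_memory[2], main_memory[5], cache)   # 42 7 {}
```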
CUDA Shared Memory broadcast
Multiple addresses map to same memory bank
Accesses are serialized
Hardware splits request into as many separate conflict-free requests as necessary
Exception: if all access the same address: broadcast
However, recent large improvements in CUBLAS and CUFFT performance were achieved by avoiding shared memory in favor of registers -- so try to use registers whenever possible.
If all threads read from the same shared memory address then a broadcast mechanism is automatically invoked and serialization is avoided. Shared memory broadcasts are an excellent and high-performance way to get data to many threads simultaneously.
It is worthwhile trying to exploit this feature whenever you use shared memory.
----Rob Farber
www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/208801731
CUDA, Supercomputing for the masses: Part 5
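The serialization and broadcast rules above can be approximated with a small host-side model. It is a sketch under stated assumptions, not a description of any particular GPU: it assumes 32 banks of 4-byte words with successive words in successive banks, and serialized_passes is a hypothetical helper.

```python
from collections import defaultdict

NUM_BANKS = 32    # assumed; the bank count depends on the GPU generation
WORD_BYTES = 4    # assumed bank width

def serialized_passes(byte_addresses):
    """Estimate how many conflict-free requests one warp-wide access needs."""
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)   # same word counts once (broadcast)
    return max(len(words) for words in words_per_bank.values())

threads = range(32)
print(serialized_passes([4 * t for t in threads]))       # stride of 1 word: 1
print(serialized_passes([4 * 2 * t for t in threads]))   # stride of 2 words: 2
print(serialized_passes([0 for _ in threads]))           # all read one address: 1
```

In this model a stride of two words makes pairs of threads hit the same bank with different addresses, so the access splits into two passes, while all threads reading one address costs a single pass.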
Vim cheatsheet
http://www.worldtimzone.com/res/vi.html
Tuesday, October 23, 2012
Sunday, October 21, 2012
Friday, October 19, 2012
5.1 Cost of Solving a System of Linear Equations Using Gaussian Elimination
http://ceee.rice.edu/Books/CS/chapter5/cost1.html
Thursday, October 18, 2012
Cost of Solving a System of Linear Equations Using Gaussian Elimination
http://ceee.rice.edu/Books/CS/chapter5/cost1.html
Monday, October 15, 2012
Today's reading: Computer Architecture: Appendix B1
Reviews of Memory Hierarchy
Page fault: When the processor references an item within a page that is not present in cache or main memory, a page fault occurs.
Syntax error: Bad for loop variable
With sh or ksh, you must use a while statement.
Jean-Pierre.
http://www.unix.com/shell-programming-scripting/60860-loop-syntax-trouble.html
chmod 755 xxxxx.sh
./xxxxx.sh
Bash/Shell Programming – Binary Operator Expected
http://digitalvectorz.wordpress.com/2009/12/10/bashshell-programming-binary-operator-expected/
if [[ -n `ls | grep something` ]]; then
echo "Success";
fi
Nested if/then condition:
http://tldp.org/LDP/abs/html/nestedifthen.html
if [ "$a" -gt 0 ]
then
  if [ "$a" -lt 5 ]
  then
    echo "The value of \"a\" lies somewhere between 0 and 5."
  fi
fi
Linux C exit function
http://en.wikipedia.org/wiki/Exit_%28operating_system%29
http://linux.about.com/library/cmd/blcmdl3_exit.htm
http://stackoverflow.com/questions/2007558/exit-function-on-linux
#include <stdlib.h> (C)
#include <cstdlib> (C++)
Sunday, October 14, 2012
How to use string as data for plotting in Matlab?
http://stackoverflow.com/questions/3672637/how-to-use-string-as-data-for-plotting-in-matlab
x = yourXdata;
y = yourYdata;
labels = {'A' 'B' 'C'};
plot(x, y); set(gca, 'XTick', 1:3, 'XTickLabel', labels);
How to Set the Tick Locations and Labels
http://www.math.ufl.edu/help/matlab/tec2.10.html
Thread Subject: convert matrix (double) to cell array (string) without for loop
a = 1:3 ;
b = strread(num2str(a),'%s')
http://www.mathworks.com/matlabcentral/newsreader/view_thread/156758
Inside Nehalem: Intel’s Future Processor and System
http://www.realworldtech.com/nehalem/7/
L1D Cache?
Inclusive caches are forced by design to replicate data, which implies certain relationships between the sizes of the various levels of the cache. In the case of Nehalem, each core contains 64KB of data in the L1 caches and 256KB in the L2 cache (there may or may not be data that is in both the L1 and L2 caches).
This means that 1-1.25MB of the 8MB L3 cache in Nehalem is filled with data that is also in other caches. What this means is that inclusive caches should only really be used where there is a fairly substantial size difference between the two levels. Nehalem has about an 8X difference between the sum of the four L2 caches and the L3, while Barcelona’s L3 cache is the same size as the total of the L2 caches.
Nehalem’s cache hierarchy has also been made more flexible by increasing support for unaligned accesses.
As a result, an unaligned SSE load or store will always have the same latency as an aligned memory access, so there is no particular reason to use aligned SSE memory accesses.
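The 1-1.25MB figure follows from the per-core sizes quoted above. A quick check, assuming four cores (the quote mentions four L2 caches):

```python
CORES = 4
L1_KB = 64       # per core, as quoted above
L2_KB = 256      # per core
L3_KB = 8 * 1024

low = CORES * L2_KB              # if every L1 line is also held in L2
high = CORES * (L1_KB + L2_KB)   # if L1 and L2 hold entirely different data

print(low / 1024, high / 1024)   # 1.0 1.25
print(L3_KB / low)               # 8.0
```

The last line is the roughly 8X ratio between the L3 and the sum of the four L2 caches.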
Saturday, October 13, 2012
TLB Translation Lookaside Buffer
A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. The virtual memory is the space seen from a process. This space is segmented in pages of a fixed size. The page table (generally loaded in memory) keeps track of where the virtual pages are loaded in the physical memory. The TLB is a cache of the page table; that is, only a subset of its contents is stored.
The TLB references physical memory addresses in its table.
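A minimal sketch of a translation through a TLB, assuming 4 KB pages, a two-slot TLB, and a made-up page table. The function name translate and the oldest-first eviction are assumptions for illustration; page faults are not modelled.

```python
PAGE_SIZE = 4096    # assumed page size in bytes
TLB_SLOTS = 2       # deliberately tiny so eviction is visible

page_table = {0: 7, 1: 3, 2: 9}   # virtual page -> physical frame (made-up values)
tlb = {}                          # cached subset of page_table

def translate(vaddr):
    """Return (physical address, whether the TLB hit)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    hit = vpn in tlb
    if not hit:
        if len(tlb) == TLB_SLOTS:
            tlb.pop(next(iter(tlb)))   # evict the oldest entry
        tlb[vpn] = page_table[vpn]     # TLB miss: consult the page table
    return tlb[vpn] * PAGE_SIZE + offset, hit

print(translate(0x1004))   # (12292, False)
print(translate(0x1008))   # (12296, True)
```

The second access falls in the same page as the first, so the mapping is already in the TLB and the page table is not consulted.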
bash for loop example
http://www.thegeekstuff.com/2011/07/bash-for-loop-examples/
General solution of output redirection
http://askubuntu.com/questions/75327/redirection-doesnt-work
http://www.mathinfo.u-picardie.fr/asch/f/MeCS/courseware/users/help/general/unix/redirection.html
>> Append standard output
Sometimes shell redirecting does not work (specifically - when one shell spawns another shell, I think:). Above is the generic solution that simply grabs all the shell output and places it into the file. In your case this should work as well, since you're expecting output on stdout/stderr.
Wire Parameter Calculator
http://circuitcalculator.com/wordpress/2007/09/20/wire-parameter-calculator
Friday, October 12, 2012
zssh and scp solution
http://www.hypexr.org/linux_scp_help.php
zssh
http://askubuntu.com/questions/13382/download-a-file-over-an-active-ssh-session
scp
http://www.howtogeek.com/66776/how-to-remotely-copy-files-over-ssh-without-entering-your-password/
Wednesday, October 10, 2012
Matlab libc.so.6 permission denied
cannot mmap file
http://www.turnkeylinux.org/forum/support/20110216/cannot-mmap-file
rm /usr/lib/libGuestLib.so
ln -s /usr/lib/vmware-tools/lib32/libvmGuestLib.so/libvmGuestLib.so /usr/lib/libGuestLib.so
http://www.mathworks.com/support/solutions/en/data/1-ONA55/index.html?solution=1-ONA55
chmod 755 libc.so.6
Matlab
http://www.gatsby.ucl.ac.uk/~qhuys/matlab.html
Symbolic Link
http://www.mathworks.com/matlabcentral/answers/10134-usr-local-matlab-r2011a-bin-util-oscheck-sh-605-lib64-libc-so-6-not-found
http://askubuntu.com/questions/189318/missing-lib-libc-so-6
Tuesday, October 9, 2012
Homework LaTeX Template I use
https://gist.github.com/1278588
http://tex.stackexchange.com/questions/31183/class-file-for-homework-assignments
Saturday, October 6, 2012
Update Manager error : Requires installation of untrusted packages
http://ubuntuforums.org/showthread.php?t=1828748
sudo rm -r /var/lib/apt/lists
sudo mkdir -p /var/lib/apt/lists/partial
sudo aptitude update
Friday, October 5, 2012
Drawing Sexy graphs in Matlab
http://quantombone.blogspot.com/2012/01/drawing-sexy-graphs-in-matlab.html
From Tombone's blog
(Tomasz Malisiewicz)
MatLab GraphViz Interface
http://www.mathworks.com/matlabcentral/fileexchange/4518
From Leon Peshkin
Drawing Beautiful Explicite and Implicite Functions using Matlab
http://www.falkoschindler.de/pub/2011/08/20-drawing-beautiful-explicite-and-implicite-functions-using-matlab/
2-D line plot
http://www.mathworks.com/help/matlab/ref/plot.html
Setting Properties
Improving Your MATLAB Figures
Micah Kimo Johnson
http://www.mit.edu/~kimo/blog/improving_figures.html
Making pretty graphs
http://blogs.mathworks.com/loren/2007/12/11/making-pretty-graphs/
Setting figure size:
http://stackoverflow.com/questions/5183047/matlab-setting-graph-figure-size
hFig = figure(1); set(hFig, 'Position', [x y width height])
Infinite Geometric Series
http://www.intmath.com/series-binomial-theorem/3-infinite-geometric-series.php
If $-1 < r < 1$, then the infinite geometric series
$$a_1 + a_1 r + a_1 r^2 + a_1 r^3 + \dots + a_1 r^{n-1} + \dots$$
converges to a particular value. The series converges because each term gets smaller and smaller (since $-1 < r < 1$).
This value is given by:
$$S_\infty = \frac{a_1}{1 - r} \quad (|r| < 1)$$
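A quick numerical check of the closed form a1 / (1 - r): the partial sums should approach it as more terms are added. The values a1 = 3 and r = 0.5 are arbitrary choices for illustration.

```python
def partial_sum(a1, r, n):
    """Sum of the first n terms a1 + a1*r + ... + a1*r**(n-1)."""
    return sum(a1 * r**k for k in range(n))

a1, r = 3.0, 0.5
limit = a1 / (1 - r)   # closed form, valid only for |r| < 1

for n in (1, 5, 10, 30):
    print(n, partial_sum(a1, r, n))   # partial sums creep up towards the limit
print(limit)                          # 6.0
```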
Thursday, October 4, 2012
Daily Reading: Software stack
http://www.pcmag.com/encyclopedia_term/0,2542,t=software+stack&i=51702,00.asp
PC Mag
A set of programs that work together to produce a result; for example, an operating system and its applications. It may refer to any group of applications that work in sequence toward a common result or to any set of utilities or routines that work as a group.
Solution Stack
http://en.wikipedia.org/wiki/Solution_stack
A comprehensive definition from IBM:
http://pic.dhe.ibm.com/infocenter/tivihelp/v28r1/index.jsp?topic=%2Fcom.ibm.tivoli.tpm.scenario.doc%2Fsoftware%2Fcsfm_sftstack.html
Discussion on differences between OS and software stack
http://stackoverflow.com/questions/10283725/what-is-difference-between-software-stack-and-os-why-android-is-not-an-os-but
How to find/display your MAC address
Unix/Linux
http://www.coffer.com/mac_info/locate-unix.html
ifconfig -a
HWaddr
http://www.jonathanmoeller.com/screed/?p=3420
ifconfig | grep HWaddr
Monday, October 1, 2012
Keeping a reading journal
http://cseweb.ucsd.edu/classes/fa12/cse260-b/Summaries.html
How to read a research paper:
In addition to summarizing the basic facts, your writeup should
- discuss the contributions of the paper, reflecting, analyzing or criticizing the ideas presented
- offer insight into the authors' motivation
- explore open issues or ideas that you are wondering about after reading the paper
How to read a research paper:
with contributions of Bill Griswold, Gail Murphy, Cristina Conati, Erica Melis
http://www.cs.brandeis.edu/~cs227b/papers/introduction/howToRead.txt
What are motivations for this work?
The paper should describe why the problem is important and why it does not have a trivial solution; that is, why a new solution may be required.
What is the proposed solution?
There should also be an argument about why the solution solves the problem better than previous solutions. There should also be a discussion about how the solution is achieved (designed and implemented) or is at least achievable. Are all concepts and notations introduced before their first usage?
What is the evaluation of the proposed solution?
What argument and/or experiment is made to make a case for the value of the ideas? What benefits or problems are identified? Are they convincing?
What alternative solutions exist?
Read a paper critically.
What are the contributions?
The contributions in a paper may be many and varied. Ideas, software, experimental techniques, and area survey are a few key possibilities.
What are future directions for this research?
Not only what future directions do the authors identify, but what ideas did you come up with while reading the paper?
You may find it productive to try to answer each question in turn, writing your answer down. In practice, you are not done reading a paper until you can answer all the questions.
Exascale Computing