Sunday, October 28, 2012

Cache Mapping

Cache to RAM Ratio

A processor might have 512 KB of Cache and 512 MB of RAM.

There may be 1000 times more RAM than cache.

The cache algorithms have to carefully select the 0.1% of the memory that is likely to be most accessed.
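The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes binary units (1 KB = 1024 bytes); the variable names are illustrative only.

```python
# Sizes from the example above, assuming binary units (1 KB = 1024 bytes).
cache_bytes = 512 * 1024          # 512 KB of cache
ram_bytes = 512 * 1024 * 1024     # 512 MB of RAM

ratio = ram_bytes // cache_bytes              # how many times larger RAM is
cache_fraction = cache_bytes / ram_bytes      # share of RAM that fits in cache

print(ratio)                        # 1024
print(f"{cache_fraction:.4%}")      # 0.0977%
```

With these sizes the ratio is 1024, which is where the rough "1000 times" and "0.1%" figures come from.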

A cache line contains two fields:

Data from RAM
The address of the block, called the tag field.

Mapping:

The memory system has to quickly determine if a given address is in the cache.

There are three popular methods of mapping addresses to cache locations.

-- Fully Associative
Search the entire cache for an address.
-- Direct
Each address has a specific place in the cache.
-- Set Associative
Each address can be in any of a small set of cache locations.
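The three methods can be viewed as one scheme with a different number of "ways" per set. The sketch below is an illustration under that assumption; the function name candidate_lines, the toy cache size, and the layout of sets as contiguous lines are all made up for this example.

```python
def candidate_lines(block_number, num_lines, ways):
    """Return the cache line indices where a memory block may be placed.

    ways == 1          -> direct mapped (exactly one candidate line)
    ways == num_lines  -> fully associative (any line)
    otherwise          -> set associative (any line within one set)
    """
    if num_lines % ways != 0:
        raise ValueError("num_lines must be a multiple of ways")
    num_sets = num_lines // ways
    set_index = block_number % num_sets      # which set the block maps to
    first_line = set_index * ways            # sets are laid out contiguously here
    return list(range(first_line, first_line + ways))


# A toy cache with 8 lines; block number 13 is an arbitrary example.
print(candidate_lines(13, num_lines=8, ways=1))   # direct: [5]
print(candidate_lines(13, num_lines=8, ways=2))   # 2-way set associative: [2, 3]
print(candidate_lines(13, num_lines=8, ways=8))   # fully associative: all 8 lines
```

The more candidate lines there are, the more tags have to be compared on each lookup, which is the trade-off the rest of these notes discuss.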

Searching Problem
Knowledge of searching

Linear Search O(n)
Binary Search O(log2 (n))
Hashing O(1)
Parallel Search O(n/p)
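To make the difference between the first two costs concrete, the sketch below counts how many comparisons each search makes. It is only an illustration; the function names and the list of 1024 hypothetical tag values are assumptions, not part of the original notes.

```python
def linear_search_steps(items, target):
    """Count comparisons made by a linear scan (O(n))."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count probes made by binary search on sorted data (O(log2 n))."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


tags = list(range(1024))                 # 1024 sorted, hypothetical tag values
print(linear_search_steps(tags, 1023))   # 1024 comparisons in the worst case
print(binary_search_steps(tags, 1023))   # 11 probes for the same target
```

One way to read the mapping methods is that direct mapping behaves like hashing (one computed location), while fully associative caches rely on parallel search with one comparator per line.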

Associative Mapping
The data from any location in RAM can be stored in any location in cache.

When the processor wants an address, all tag fields in the cache are checked to determine if the data is already in the cache.

Each tag line requires circuitry to compare the desired address with the tag field.

All tag fields are checked in parallel.

Set Associative Mapping

Set associative mapping is a mixture of direct and associative mapping.

The cache lines are grouped into sets.

Replacement policy

When a cache miss occurs, data is copied into some location in cache.

With Set Associative or Fully Associative mapping, the system must decide where to put the data and what values will be replaced.

Cache performance is greatly affected by properly choosing data that is unlikely to be referenced again.

Replacement Options
First In First Out (FIFO)
Least Recently Used (LRU)
Pseudo LRU
Random
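As an illustration of one of these options, here is a minimal sketch of LRU replacement within a single cache set. The class name LRUSet and the string tags are hypothetical; real hardware usually approximates LRU (hence Pseudo LRU) rather than tracking an exact order.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set that evicts the least recently used tag when full."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> data, ordered oldest use first

    def access(self, tag):
        """Return True on a hit, False on a miss (which loads the tag)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = None               # data payload omitted in this sketch
        return False


cache_set = LRUSet(ways=2)
hits = [cache_set.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)   # [False, False, True, False, False]
```

Removing the move_to_end call would turn this into FIFO, since entries would then be evicted in insertion order regardless of later hits.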

Comparison of Mapping: Fully Associative

Associative mapping works the best, but is complex to implement. Each tag line requires circuitry to compare the desired address with the tag field.

Some special-purpose caches, such as the virtual memory Translation Lookaside Buffer (TLB), are associative caches.

Comparison of Mapping: Direct

Direct mapping has the lowest performance, but is the easiest to implement. It is often used for instruction caches.

Sequential addresses fill a cache line and then go to the next cache line.
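The way sequential addresses walk through a direct-mapped cache can be sketched by splitting an address into tag, line index, and byte offset. The line size and number of lines below are assumed values chosen only to keep the numbers small.

```python
LINE_SIZE = 64      # bytes per cache line (assumed for illustration)
NUM_LINES = 8       # lines in this toy direct-mapped cache (assumed)


def split_address(address):
    """Split a byte address into (tag, line index, byte offset)."""
    offset = address % LINE_SIZE
    block = address // LINE_SIZE
    index = block % NUM_LINES
    tag = block // NUM_LINES
    return tag, index, offset


for address in (0, 63, 64, 128, 512):
    print(address, split_address(address))
# split_address(0)   == (0, 0, 0)
# split_address(63)  == (0, 0, 63)   still in line 0
# split_address(64)  == (0, 1, 0)    the next sequential byte moves to line 1
# split_address(128) == (0, 2, 0)
# split_address(512) == (1, 0, 0)    wraps back to line 0 with a different tag
```

Addresses 0 and 512 compete for the same line, which is the kind of conflict that makes direct mapping the lowest-performing option.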

Interleaving & Memory Interleaving

Used in computer memory, communication systems, and error correction.
http://en.wikipedia.org/wiki/Interleaving



http://fourier.eng.hmc.edu/e85/lectures/memory/node2.html



Issues Related to Cache Memory



  • Load-Through: When the CPU needs to read a word from the memory, the block containing the word is brought from MM to CM, while at the same time the word is forwarded to the CPU.
  • Store-Through: If store-through is used, a word to be stored from CPU to memory is written to both CM (if the word is in there) and MM. By doing so, a CM block to be replaced can be overwritten by an in-coming block without being saved to MM.
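The two policies can be sketched with a toy model. This is a simplification that works on single words instead of blocks and does the load steps one after another rather than simultaneously; the class name WriteThroughCache and the dictionaries standing in for CM and MM are assumptions for illustration.

```python
class WriteThroughCache:
    """Toy store-through cache: every store also updates main memory."""

    def __init__(self, main_memory):
        self.mm = main_memory    # dict: address -> value (stands in for MM)
        self.cm = {}             # dict: address -> value (stands in for CM)

    def load(self, address):
        if address not in self.cm:
            self.cm[address] = self.mm[address]   # bring the word into CM
        return self.cm[address]                   # and forward it to the CPU

    def store(self, address, value):
        if address in self.cm:
            self.cm[address] = value   # update CM only if the word is cached
        self.mm[address] = value       # always write through to MM

    def evict(self, address):
        self.cm.pop(address, None)     # safe: MM already holds the latest value


memory = {0x10: 1, 0x20: 2}
cache = WriteThroughCache(memory)
cache.load(0x10)
cache.store(0x10, 99)
cache.evict(0x10)          # nothing needs to be saved back
print(memory[0x10])        # 99
```

Because MM is updated on every store, eviction never needs a write-back, at the cost of more memory traffic.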

CUDA Shared Memory broadcast

Multiple addresses map to same memory bank

Accesses are serialized
Hardware splits request into as many separate conflict-free requests as necessary
Exception: if all threads access the same address, the value is broadcast
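A simplified host-side model of these rules is sketched below. The bank count and word width are assumptions (they differ between GPU generations), and the model treats any set of threads reading the same word as one broadcast, which may be more generous than some hardware.

```python
from collections import defaultdict

NUM_BANKS = 32    # assumed bank count; it differs between GPU generations
WORD_BYTES = 4    # assumed bank width of one 32-bit word


def serialized_passes(byte_addresses):
    """Estimate how many conflict-free passes one request needs.

    Threads reading the same word share one broadcast; threads reading
    different words in the same bank must be serviced one after another.
    """
    words_per_bank = defaultdict(set)
    for address in byte_addresses:
        word = address // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


stride_1 = [4 * t for t in range(32)]        # consecutive words
stride_2 = [8 * t for t in range(32)]        # every second word
same_word = [0 for _ in range(32)]           # all threads read one address

print(serialized_passes(stride_1))   # 1: every thread hits a different bank
print(serialized_passes(stride_2))   # 2: two distinct words per used bank
print(serialized_passes(same_word))  # 1: broadcast, no serialization
```

The stride-2 pattern shows a two-way conflict, while the same-address case costs a single pass.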

However, recent large improvements in CUBLAS and CUFFT performance were achieved by avoiding shared memory in favor of registers -- so try to use registers whenever possible.

If all threads read from the same shared memory address then a broadcast mechanism is automatically invoked and serialization is avoided. Shared memory broadcasts are an excellent and high-performance way to get data to many threads simultaneously.

It is worthwhile trying to exploit this feature whenever you use shared memory.

----Rob Farber
www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/208801731

CUDA, Supercomputing for the masses: Part 5 

C preprocessor #if defined

#if defined MACRO is precisely equivalent to #ifdef MACRO

Vim cheatsheet

http://www.worldtimzone.com/res/vi.html

Monday, October 15, 2012

Today's reading: Computer Architecture: Appendix B1

Reviews of Memory Hierarchy

Page fault: occurs when the processor references an item within a page that is not present in cache or main memory.


Syntax error: Bad for loop variable

With sh or ksh, you must use a while statement.

Jean-Pierre.

http://www.unix.com/shell-programming-scripting/60860-loop-syntax-trouble.html

chmod 755 xxxxx.sh

./xxxxx.sh

Bash/Shell Programming – Binary Operator Expected

http://digitalvectorz.wordpress.com/2009/12/10/bashshell-programming-binary-operator-expected/


if [[ -n `ls | grep something` ]]; then
echo "Success";
fi




Nested if/then condition:
http://tldp.org/LDP/abs/html/nestedifthen.html


if [ "$a" -gt 0 ]
then
  if [ "$a" -lt 5 ]
  then
    echo "The value of \"a\" lies somewhere between 0 and 5."
  fi
fi


Linux C exit function

http://en.wikipedia.org/wiki/Exit_%28operating_system%29

http://linux.about.com/library/cmd/blcmdl3_exit.htm

http://stackoverflow.com/questions/2007558/exit-function-on-linux

#include <stdlib.h> (C)
#include <cstdlib> (C++)

Sunday, October 14, 2012

How to use string as data for plotting in Matlab?

http://stackoverflow.com/questions/3672637/how-to-use-string-as-data-for-plotting-in-matlab
x = yourXdata;
y = yourYdata;
labels = {'A' 'B' 'C'};
plot(x, y);
set(gca, 'XTick', 1:3, 'XTickLabel', labels);

How to Set the Tick Locations and Labels
http://www.math.ufl.edu/help/matlab/tec2.10.html


Thread Subject: convert matrix (double) to cell array (string) without for loop
a = 1:3 ;
b = strread(num2str(a),'%s')

http://www.mathworks.com/matlabcentral/newsreader/view_thread/156758

Inside Nehalem: Intel’s Future Processor and System

http://www.realworldtech.com/nehalem/7/

L1D Cache?

Inclusive caches are forced by design to replicate data, which implies certain relationships between the sizes of the various levels of the cache. In the case of Nehalem, each core contains 64KB of data in the L1 caches and 256KB in the L2 cache (there may or may not be data that is in both the L1 and L2 caches).


This means that 1-1.25MB of the 8MB L3 cache in Nehalem is filled with data that is also in other caches. What this means is that inclusive caches should only really be used where there is a fairly substantial size difference between the two levels. Nehalem has about an 8X difference between the sum of the four L2 caches and the L3, while Barcelona’s L3 cache is the same size as the total of the L2 caches.
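The 1-1.25MB range and the 8X figure follow from the sizes quoted above. The sketch below reproduces the arithmetic, assuming four cores (as implied by "the four L2 caches") and binary units; the variable names are illustrative.

```python
KB = 1024
MB = 1024 * KB

cores = 4                  # assumed from "the four L2 caches" in the text
l1_per_core = 64 * KB      # total L1 per core, per the text above
l2_per_core = 256 * KB
l3_shared = 8 * MB

total_l2 = cores * l2_per_core
total_l1_l2 = cores * (l1_per_core + l2_per_core)

print(total_l2 / MB)          # 1.0  -> lower bound, when L1 contents are also in L2
print(total_l1_l2 / MB)       # 1.25 -> upper bound, when L1 and L2 hold different data
print(l3_shared // total_l2)  # 8    -> L3 is 8 times the combined L2 capacity
```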

Nehalem’s cache hierarchy has also been made more flexible by increasing support for unaligned accesses.

As a result, an unaligned SSE load or store will always have the same latency as an aligned memory access, so there is no particular reason to use aligned SSE memory accesses. 

Saturday, October 13, 2012

TLB Translation Lookaside Buffer

A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. Virtual memory is the address space seen by a process. This space is divided into pages of a fixed size. The page table (generally held in memory) keeps track of where the virtual pages are loaded in physical memory. The TLB is a cache of the page table; that is, only a subset of its contents is stored.

The TLB references physical memory addresses in its table.
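The translation path can be sketched as follows. This is a toy model, not how any particular processor implements it: the page size, the class name TLB, the FIFO eviction, and the page table contents are all assumptions made for illustration.

```python
PAGE_SIZE = 4096   # assumed page size in bytes


class TLB:
    """Toy TLB: a small cache of virtual page -> physical frame entries."""

    def __init__(self, page_table, slots):
        self.page_table = page_table   # dict: virtual page number -> frame number
        self.slots = slots             # fixed number of entries
        self.entries = {}              # the cached subset of the page table
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.entries) >= self.slots:
                # Evict the oldest inserted entry (FIFO, chosen for brevity).
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = self.page_table[vpn]   # consult the page table
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3, 2: 9}          # hypothetical mappings
tlb = TLB(page_table, slots=2)
print(hex(tlb.translate(0x0010)))        # 0x7010 (miss, then cached)
print(hex(tlb.translate(0x0020)))        # 0x7020 (hit on the same page)
print(tlb.hits, tlb.misses)              # 1 1
```

Only the page number is looked up; the offset within the page is carried over unchanged into the physical address.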

bash for loop example

http://www.thegeekstuff.com/2011/07/bash-for-loop-examples/

General solution of output redirection

http://askubuntu.com/questions/75327/redirection-doesnt-work


http://www.mathinfo.u-picardie.fr/asch/f/MeCS/courseware/users/help/general/unix/redirection.html

>> Append standard output


Sometimes shell redirection does not work (specifically, when one shell spawns another shell, I think). Below is the generic solution that simply grabs all the shell output and places it into the file. In your case this should work as well, since you're expecting output on stdout/stderr.



  
script -c "/path/prog" /path/log.txt
 
script -c "Your Command" Filename.txt
  



Wire Parameter Calculator

http://circuitcalculator.com/wordpress/2007/09/20/wire-parameter-calculator

Friday, October 12, 2012

zssh and scp solution

http://www.hypexr.org/linux_scp_help.php

zssh
http://askubuntu.com/questions/13382/download-a-file-over-an-active-ssh-session

scp
http://www.howtogeek.com/66776/how-to-remotely-copy-files-over-ssh-without-entering-your-password/

Matlab tricks errorbar

http://www.gatsby.ucl.ac.uk/~qhuys/matlab.html



Wednesday, October 10, 2012

Matlab libc.so.6 permission denied

cannot mmap file
http://www.turnkeylinux.org/forum/support/20110216/cannot-mmap-file

rm /usr/lib/libGuestLib.so
ln -s /usr/lib/vmware-tools/lib32/libvmGuestLib.so/libvmGuestLib.so /usr/lib/libGuestLib.so

http://www.mathworks.com/support/solutions/en/data/1-ONA55/index.html?solution=1-ONA55

chmod 755 libc.so.6
 
 
Matlab 
http://www.gatsby.ucl.ac.uk/~qhuys/matlab.html 


Symbolic Link
http://www.mathworks.com/matlabcentral/answers/10134-usr-local-matlab-r2011a-bin-util-oscheck-sh-605-lib64-libc-so-6-not-found
 
 
http://askubuntu.com/questions/189318/missing-lib-libc-so-6 

Tuesday, October 9, 2012

Homework LaTeX Template I use


https://gist.github.com/1278588


http://tex.stackexchange.com/questions/31183/class-file-for-homework-assignments

Friday, October 5, 2012

Drawing Sexy graphs in Matlab

http://quantombone.blogspot.com/2012/01/drawing-sexy-graphs-in-matlab.html

From Tombone's blog
(Tomasz Malisiewicz)

MatLab GraphViz Interface
http://www.mathworks.com/matlabcentral/fileexchange/4518

From Leon Peshkin

Drawing Beautiful Explicite and Implicite Functions using Matlab

http://www.falkoschindler.de/pub/2011/08/20-drawing-beautiful-explicite-and-implicite-functions-using-matlab/

2-D line plot
http://www.mathworks.com/help/matlab/ref/plot.html

Setting Properties

Improving Your MATLAB Figures

Micah Kimo Johnson

http://www.mit.edu/~kimo/blog/improving_figures.html


Making pretty graphs
http://blogs.mathworks.com/loren/2007/12/11/making-pretty-graphs/

Setting figure size:
http://stackoverflow.com/questions/5183047/matlab-setting-graph-figure-size

hFig = figure(1); set(hFig, 'Position', [x y width height]) 

Octave Plotting Function

http://sunsite.univie.ac.at/textbooks/octave/octave_15.html

Infinite Geometric Series

http://www.intmath.com/series-binomial-theorem/3-infinite-geometric-series.php

If $-1 < r < 1$, then the infinite geometric series
$a_1 + a_1 r + a_1 r^2 + a_1 r^3 + \dots + a_1 r^{n-1}$
converges to a particular value.
This value is given by:
$S = \frac{a_1}{1 - r} \quad (|r| < 1)$
The series converges because each term gets smaller and smaller (since -1 < r < 1).
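The convergence can be checked numerically by comparing partial sums with the limit a1 / (1 - r). The values a1 = 1 and r = 0.5 below are arbitrary examples, and the function name partial_sum is made up for this sketch.

```python
def partial_sum(a1, r, n):
    """Sum of the first n terms: a1 + a1*r + ... + a1*r**(n-1)."""
    return sum(a1 * r**k for k in range(n))


a1, r = 1.0, 0.5                      # example values with |r| < 1
limit = a1 / (1 - r)                  # closed form S = a1 / (1 - r)

for n in (1, 5, 10, 50):
    print(n, partial_sum(a1, r, n))   # approaches the limit as n grows
print(limit)                          # 2.0
```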

Thursday, October 4, 2012

Daily Reading: Software stack

http://www.pcmag.com/encyclopedia_term/0,2542,t=software+stack&i=51702,00.asp

PC Mag

A set of programs that work together to produce a result; for example, an operating system and its applications. It may refer to any group of applications that work in sequence toward a common result or to any set of utilities or routines that work as a group. 

Solution Stack 
http://en.wikipedia.org/wiki/Solution_stack 

A comprehensive definition from IBM:

http://pic.dhe.ibm.com/infocenter/tivihelp/v28r1/index.jsp?topic=%2Fcom.ibm.tivoli.tpm.scenario.doc%2Fsoftware%2Fcsfm_sftstack.html  
Discussion on differences between OS and software stack
http://stackoverflow.com/questions/10283725/what-is-difference-between-software-stack-and-os-why-android-is-not-an-os-but

How to find/display your MAC address

Unix/Linux

http://www.coffer.com/mac_info/locate-unix.html

ifconfig -a

HWaddr

http://www.jonathanmoeller.com/screed/?p=3420

ifconfig | grep HWaddr

Monday, October 1, 2012

Keeping a reading journal

http://cseweb.ucsd.edu/classes/fa12/cse260-b/Summaries.html

In addition to summarizing the basic facts, your writeup should
  • discuss the contributions of the paper, reflecting, analyzing or criticizing the ideas presented
  • offer insight into the authors' motivation
  • explore open issues or ideas that you are wondering about after reading the paper

How to read a research paper:
with contributions of Bill Griswold, Gail Murphy, Cristina Conati, Erica Melis
 
http://www.cs.brandeis.edu/~cs227b/papers/introduction/howToRead.txt 
 
 
What are motivations for this work?
 
The paper should describe why the problem
      is important and why it does not have a trivial solution; that is, why a
      new solution may be required.
 
What is the proposed solution?
 
There should also be an argument about why the solution solves the
      problem better than previous solutions. There should also be a discussion
      about how the solution is achieved (designed and implemented) or is at
      least achievable. Are all concepts and notations introduced before their
      first usage? 

What is the evaluation of the proposed solution?
What argument and/or
      experiment is made to make a case for the value of the ideas? What
      benefits or problems are identified? Are they convincing?
 
What alternative solutions exist? Read a paper critically.

What are the contributions? The contributions in a paper may be many and
      varied. Ideas, software, experimental techniques, and area survey are a
      few key possibilities.

What are future directions for this research? Not only what future
      directions do the authors identify, but what ideas did you come up with
      while reading the paper? 

You may find it productive to try to answer each question in turn, writing your answer
down. In practice, you are not done reading a paper until you can answer all the
questions.
 
 
 
 

Exascale Computing

Definition (Wikipedia):

Exascale computing refers to a computer system capable of reaching performance of at least one exaflops. Such capacity would represent a thousandfold increase over the currently existing petascale[1]