Texture memory speed test

Speed test program
http://forums.nvidia.com/index.php?showtopic=181432&st=0

"On Fermi, global memory loads are cached in L1 and L1 cache has higher bandwidth than the texture cache"
http://stackoverflow.com/questions/9893086/why-in-my-case-the-texture-memory-is-slower-than-the-global

Spatial Locality of texture memory usage.

Cache & Cache miss or hit

Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both instructions and data.

Cache Entries
Data moves between main memory and the cache in fixed-size blocks called cache lines. Each cache entry holds:
the requested memory location (now called a tag)
a copy of the data

When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache. The cache checks for the contents of the requested memory location in any cache lines that might contain that address. If the processor finds that the memory location is in the cache, a cache hit has occurred (otherwise, a cache miss).
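
The hit/miss check above, and why spatial locality matters for it, can be sketched with a tiny direct-mapped cache model. This is only an illustration: the line size, the number of lines, and the names DirectMappedCache and hit_rate are my own assumptions, not taken from any real CPU or GPU.

```python
# Minimal direct-mapped cache model (illustrative only; sizes are assumptions).
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_LINES = 8    # number of lines in the cache (assumed)


class DirectMappedCache:
    def __init__(self):
        # Each slot stores the tag of the line it currently holds, or None.
        self.tags = [None] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a miss (the line is then filled)."""
        line_number = address // LINE_SIZE  # which memory line holds the address
        index = line_number % NUM_LINES     # the only slot that line may occupy
        tag = line_number // NUM_LINES      # tells apart lines sharing that slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # miss: fetch the line from memory
        self.misses += 1
        return False


def hit_rate(addresses):
    """Run the addresses through a fresh cache and return hits / accesses."""
    cache = DirectMappedCache()
    for address in addresses:
        cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# 4-byte elements read back to back: 16 accesses share each 64-byte line.
sequential = [4 * i for i in range(256)]
# One element per line: every access touches a line not seen before.
strided = [LINE_SIZE * i for i in range(256)]

print(hit_rate(sequential))
print(hit_rate(strided))
```

With sequential reads only the first access to each line misses, while the strided pattern misses on every access. Real caches are usually set-associative rather than direct-mapped, so treat the numbers as a rough picture only.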

A cache miss refers to a failed attempt to read or write a piece of data in the cache, which results in a main memory access with much longer latency.

There are three kinds of cache misses: instruction read miss, data read miss, and data write miss.

A cache read miss from an instruction cache generally causes the most delay, because the processor, or at least the thread of execution, has to wait (stall) until the instruction is fetched from main memory.

A cache read miss from a data cache usually causes less delay, because instructions not dependent on the cache read can be issued and continue execution until the data is returned from main memory, and the dependent instructions can resume execution.

A cache write miss to a data cache generally causes the least delay, because the write can be queued and there are few limitations on the execution of subsequent instructions. The processor can continue until the queue is full.

Reference:
http://en.wikipedia.org/wiki/CPU_cache#Cache_miss

Friday, April 27, 2012

Translation lookaside buffer (wiki)

A translation lookaside buffer (TLB) is a cache that memory management hardware uses to improve virtual address translation speed. All current desktop, notebook, and server processors use a TLB to map virtual and physical address spaces, and it is nearly always present in any hardware which utilizes virtual memory.
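
As a rough sketch of the idea, a TLB can be modeled as a small cache sitting in front of the full page table. Everything here is an assumption for illustration: the 4096-byte page size, the capacity of four entries, the least-recently-used eviction, and the TLB class itself. Real hardware differs in all of these.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed)


class TLB:
    """Toy TLB: caches a few virtual-page -> physical-frame translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # full mapping, slow to consult
        self.capacity = capacity       # number of cached translations (assumed)
        self.entries = OrderedDict()   # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)        # mark as most recently used
        else:
            self.misses += 1
            # Slow path: look the page up in the page table (KeyError if unmapped).
            self.entries[page] = self.page_table[page]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[page] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}     # hypothetical mapping
tlb = TLB(page_table)
print(tlb.translate(0x0010))  # first touch of page 0: TLB miss
print(tlb.translate(0x0020))  # same page again: TLB hit
print(tlb.hits, tlb.misses)
```

Only the page number is translated; the offset within the page is carried over unchanged.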

Reference: Wikipedia

Saturday, April 21, 2012

Friday, April 20, 2012

Fixing problems while installing Gummi spell checking

gtkspell is needed; the missing packages below came up while setting it up

No package 'gtk+-2.0' found solution:
http://ubuntuforums.org/showthread.php?t=1255480

Installing libgtk2.0-dev fixed it

No package 'enchant' found solution:
Download enchant source
http://www.linuxfromscratch.org/blfs/view/cvs/general/enchant.html

Installation Method:
http://groups.google.com/group/pyenchant-users/browse_thread/thread/6be9036c488d37bb?pli=1

./configure
make
make install (may need sudo)

intltool installation (need intltool 0.35.0 or later)
./configure
make
make install

Wednesday, April 18, 2012

GPU: architecture and programming (NYU Course)

http://cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/index.html

The course page above links to some interesting GPU tools:
Multi2Sim Simulation Framework
http://www.multi2sim.org/

GPUocelot
http://code.google.com/p/gpuocelot/
Dynamic Compilation for PTX

Short CUDA tutorial from the Colorado School of Mines
http://geco.mines.edu/tesla/cuda_tutorial_mio/index.html

Notes on CUDA/C typecasting

Fast way to convert float4 to uchar4? Texture conversion
http://forums.nvidia.com/index.php?showtopic=166797

There is no Boolean type in C (true of C89; C99 added _Bool and <stdbool.h>)
http://www.cs.cf.ac.uk/Dave/C/node4.html

How to vectorize a vector type cast
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-February/047827.html

CUDA Example BoxFilter in OpenCL
http://rungpu.org/opencl/kernels/25/boxfilter.cl
Shows interoperability between CUDA and OpenGL

Contains functions for converting between packed RGB values and float values
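
To make the packed RGB / float conversion concrete, here is a small sketch of the idea in Python. The function names and the byte layout (red in the lowest byte, alpha in the highest) are my assumptions for illustration; the linked kernel may pack the channels differently.

```python
def rgba_float_to_int(r, g, b, a):
    """Pack four floats in [0, 1] into one 32-bit integer (assumed layout: 0xAABBGGRR)."""
    def to_byte(x):
        x = min(max(x, 0.0), 1.0)   # clamp to [0, 1]
        return int(x * 255.0 + 0.5) # scale to 0..255, rounding to nearest
    return (to_byte(a) << 24) | (to_byte(b) << 16) | (to_byte(g) << 8) | to_byte(r)


def rgba_int_to_float(packed):
    """Unpack a 32-bit integer into (r, g, b, a) floats in [0, 1]."""
    return tuple(((packed >> shift) & 0xFF) / 255.0 for shift in (0, 8, 16, 24))


packed = rgba_float_to_int(1.0, 0.0, 0.5, 1.0)
print(hex(packed))
print(rgba_int_to_float(packed))
```

Values outside [0, 1] are clamped before packing, so the conversion back to float only recovers the original to within 1/255 per channel.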

How to record video from a Logitech QuickCam Pro 9000

http://ubuntuforums.org/showthread.php?t=890526


ffmpeg -f audio_device -i /dev/dsp1  -f video4linux2 -s 640x480 -i /dev/video0 -f avi - | tee `date -I`.avi | mplayer -