Memory management is the heart of operating systems; it is crucial
for both programming and system administration. In the next few posts
I’ll cover memory with an eye towards practical aspects, but without
shying away from internals. While the concepts are generic, examples are
mostly from Linux and Windows on 32-bit x86. This first post describes
how programs are laid out in memory.
Each process in a multi-tasking OS runs in its own memory sandbox. This sandbox is the
virtual address space, which in 32-bit mode is
always a 4GB block of memory addresses. These virtual addresses are mapped to physical memory by
page tables,
which are maintained by the operating system kernel and consulted by
the processor. Each process has its own set of page tables, but there is
a catch. Once virtual addresses are enabled, they apply to
all software running in the machine,
including the kernel itself. Thus a portion of the virtual address space must be reserved for the kernel:
This does
not mean the kernel uses that much
physical memory, only that it has that portion of address space
available to map whatever physical memory it wishes. Kernel space is
flagged in the page tables as exclusive to
privileged code
(ring 2 or lower), hence a page fault is triggered if user-mode
programs try to touch it. In Linux, kernel space is constantly present
and maps the same physical memory in all processes. Kernel code and data
are always addressable, ready to handle interrupts or system calls at
any time. By contrast, the mapping for the user-mode portion of the
address space changes whenever a process switch happens:
Blue regions represent virtual addresses that are mapped to physical
memory, whereas white regions are unmapped. In the example above,
Firefox has used far more of its virtual address space due to its
legendary memory hunger. The distinct bands in the address space
correspond to
memory segments like the heap, stack, and so on. Keep in mind these segments are simply ranges of memory addresses and
have nothing to do with
Intel-style segments. Anyway, here is the standard segment layout in a Linux process:
When computing was happy and safe and cuddly, the starting virtual addresses for the segments shown above were
exactly the same
for nearly every process in a machine. This made it easy to exploit
security vulnerabilities remotely. An exploit often needs to reference
absolute memory locations: an address on the stack, the address for a
library function, etc. Remote attackers must choose this location
blindly, counting on the fact that address spaces are all the same. When
they are, people get pwned. Thus address space randomization has become
popular. Linux randomizes the
stack,
memory mapping segment, and
heap
by adding offsets to their starting addresses. Unfortunately the 32-bit
address space is pretty tight, leaving little room for randomization
and
hampering its effectiveness.
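If you want to see the randomization for yourself, here is a quick C sketch (the file name and output format are my own invention, not from any standard tool). Compile it and run it a few times; with randomization on, the stack, heap, and mapping addresses should bounce around between runs:

/* aslr_peek.c - print one address apiece from the stack, the heap, and
 * the memory mapping segment. With randomization enabled these should
 * differ from run to run. Build with: gcc aslr_peek.c -o aslr_peek */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int onStack;                   /* lives in a stack frame */
    void *onHeap = malloc(16);     /* small request: served from the heap */

    printf("stack: %p\n", (void *)&onStack);
    printf("heap:  %p\n", onHeap);
    printf("mmap:  %p\n", (void *)printf);  /* libc code in the mapping
                                               segment; the cast is a
                                               common POSIX-ism */
    free(onHeap);
    return 0;
}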
The topmost segment in the process address space is the stack, which
stores local variables and function parameters in most programming
languages. Calling a method or function pushes a new
stack frame
onto the stack. The stack frame is destroyed when the function returns.
This simple design, possible because the data obeys strict
LIFO
order, means that no complex data structure is needed to track stack
contents – a simple pointer to the top of the stack will do. Pushing and
popping are thus very fast and deterministic. Also, the constant reuse
of stack regions tends to keep active stack memory in the
CPU caches, speeding up access. Each thread in a process gets its own stack.
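Here's a little sketch to make frames visible (compile with -O0, or the compiler may flatten the calls away). Each nested call should print a lower address than the last, since the x86 stack grows downward:

/* frames.c - each call pushes a new frame; on x86 the stack grows
 * toward lower addresses, so deeper frames print smaller values. */
#include <stdio.h>

void descend(int depth)
{
    int local;  /* one local variable per frame */
    printf("depth %d: local at %p\n", depth, (void *)&local);
    if (depth < 3)
        descend(depth + 1);
}

int main(void)
{
    descend(0);
    return 0;
}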
It is possible to exhaust the area mapping the stack by pushing more
data than it can fit. This triggers a page fault that is handled in
Linux by
expand_stack(), which in turn calls
acct_stack_growth() to check whether it’s appropriate to grow the stack. If the stack size is below
RLIMIT_STACK
(usually 8MB), then normally the stack grows and the program continues
merrily, unaware of what just happened. This is the normal mechanism
whereby stack size adjusts to demand. However, if the maximum stack size
has been reached, we have a
stack overflow and the
program receives a Segmentation Fault. While the mapped stack area
expands to meet demand, it does not shrink back when the stack gets
smaller. Like the federal budget, it only expands.
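You can inspect the limit from a program via getrlimit(); it reports the same number as ulimit -s in the shell. A minimal sketch:

/* stacklimit.c - read RLIMIT_STACK, the cap the kernel checks before
 * growing the stack. The soft limit (rlim_cur) is the one that counts. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    return 0;
}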
Dynamic stack growth is the
only situation
in which access to an unmapped memory region, shown in white above,
might be valid. Any other access to unmapped memory triggers a page
fault that results in a Segmentation Fault. Some mapped areas are
read-only, hence write attempts to these areas also lead to segfaults.
Below the stack, we have the memory mapping segment. Here the kernel
maps contents of files directly to memory. Any application can ask for
such a mapping via the Linux
mmap() system call (
implementation) or
CreateFileMapping() /
MapViewOfFile()
in Windows. Memory mapping is a convenient and high-performance way to
do file I/O, so it is used for loading dynamic libraries. It is also
possible to create an
anonymous memory mapping that
does not correspond to any file and is used instead for program data.
In Linux, if you request a large block of memory via
malloc(), the C library will create such an anonymous mapping instead of using heap memory. ‘Large’ means larger than
MMAP_THRESHOLD bytes, 128 kB by default and adjustable via
mallopt().
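To make that concrete, here is a minimal sketch of an anonymous mapping, the same kind of thing malloc() does under the hood for big blocks (MAP_ANONYMOUS is the Linux spelling; some older systems map /dev/zero instead):

/* anon.c - ask the kernel for plain memory backed by no file at all,
 * like the mappings malloc() creates for large requests. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 256 * 1024;  /* comfortably above MMAP_THRESHOLD */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    strcpy(p, "no file behind this memory");
    printf("anonymous mapping at %p: %s\n", (void *)p, p);
    munmap(p, len);
    return 0;
}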
Speaking of the heap, it comes next in our plunge into address space.
Like the stack, the heap provides runtime memory allocation; unlike the stack, it is meant for data that must outlive the function doing the allocation. Most languages provide heap management to programs. Satisfying
memory requests is thus a joint affair between the language runtime and
the kernel. In C, the interface to heap allocation is
malloc() and friends, whereas in a garbage-collected language like C# the interface is the
new keyword.
If there is enough space in the heap to satisfy a memory request, it
can be handled by the language runtime without kernel involvement.
Otherwise the heap is enlarged via the
brk() system call (
implementation) to make room for the requested block. Heap management is
complex,
requiring sophisticated algorithms that strive for speed and efficient
memory usage in the face of our programs’ chaotic allocation patterns.
The time needed to service a heap request can vary substantially.
Real-time systems have
special-purpose allocators to deal with this problem. Heaps also become fragmented, as shown below:
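You can also watch the heap grow from a program. Below is a rough sketch built on sbrk(0), which reports the current program break; take it as a demo rather than gospel, since the allocator may already have slack in the heap, in which case the break won't budge:

/* brkwatch.c - watch the program break move as the heap grows.
 * Exact behavior depends on the allocator. */
#define _DEFAULT_SOURCE  /* for sbrk() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);
    void *block  = malloc(64 * 1024);  /* below MMAP_THRESHOLD: heap, not mmap */
    void *after  = sbrk(0);

    printf("break before malloc: %p\n", before);
    printf("break after  malloc: %p\n", after);
    printf("block lives at:      %p\n", block);

    free(block);
    return 0;
}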
Finally, we get to the lowest segments of memory: BSS, data, and
program text. Both BSS and data store contents for static (global)
variables in C. The difference is that BSS stores the contents of
uninitialized
static variables, whose values are not set by the programmer in source
code. The BSS memory area is anonymous: it does not map any file. If you
say
static int cntActiveUsers, the contents of
cntActiveUsers live in the BSS.
The data segment, on the other hand, holds the contents for static variables initialized in source code. This memory area
is not anonymous. It maps the part of the program’s binary image that contains the initial static values given in source code. So if you say
static int cntWorkerBees = 10, the contents of cntWorkerBees live in the data segment and start out as 10. Even though the data segment maps a file, it is a
private memory mapping,
which means that updates to memory are not reflected in the underlying
file. This must be the case, otherwise assignments to global variables
would change your on-disk binary image. Inconceivable!
The data example in the diagram is trickier because it uses a pointer. In that case, the
contents of pointer
gonzo
– a 4-byte memory address – live in the data segment. The actual string
it points to does not, however. The string lives in the
text
segment, which is read-only and stores all of your code in addition to
tidbits like string literals. The text segment also maps your binary
file in memory, but writes to this area earn your program a Segmentation
Fault. This helps prevent pointer bugs, though not as effectively as
avoiding C in the first place. Here’s a diagram showing these segments
and our example variables:
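You can verify the layout on your own machine with a sketch along these lines (the variable names match our examples; the string contents are just a stand-in). The text addresses should come out lowest and the stack highest, matching the diagram:

/* segments.c - print where each example variable lives; compare the
 * addresses against the layout above, or against /proc/self/maps. */
#include <stdio.h>
#include <stdlib.h>

static int cntActiveUsers;           /* uninitialized: lives in BSS */
static int cntWorkerBees = 10;       /* initialized: lives in data */
static char *gonzo = "gonzo string"; /* pointer in data, string in text */

int main(void)
{
    int onStack;
    void *onHeap = malloc(8);

    printf("text:  %p  (code of main)\n", (void *)main);
    printf("text:  %p  (string pointed to by gonzo)\n", (void *)gonzo);
    printf("data:  %p  (cntWorkerBees)\n", (void *)&cntWorkerBees);
    printf("bss:   %p  (cntActiveUsers)\n", (void *)&cntActiveUsers);
    printf("heap:  %p\n", onHeap);
    printf("stack: %p\n", (void *)&onStack);

    free(onHeap);
    return 0;
}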
You can examine the memory areas in a Linux process by reading the file
/proc/pid_of_process/maps.
Keep in mind that a segment may contain many areas. For example, each
memory mapped file normally has its own area in the mmap segment, and
dynamic libraries have extra areas similar to BSS and data. The next
post will clarify what ‘area’ really means. Also, sometimes people say
“data segment” meaning all of data + bss + heap.
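For example, here's a trivial sketch that dumps its own areas, one line apiece:

/* mymaps.c - print this process's memory areas straight from
 * /proc/self/maps. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}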
You can examine binary images using the
nm and
objdump
commands to display symbols, their addresses, segments, and so on.
Finally, the virtual address layout described above is the “flexible”
layout in Linux, which has been the default for a few years. It assumes
that we have a value for
RLIMIT_STACK. When that’s not the case, Linux reverts back to the “classic” layout shown below:
That’s it for virtual address space layout. The next post discusses
how the kernel keeps track of these memory areas. Coming up we’ll look
at memory mapping, how file reading and writing ties into all this, and
what memory usage figures mean.