Friday, July 17, 2009

virtual destructor

Ask any programmer, he'll immediately reply saying "A destructor is a member function of a class, which gets called when the object goes out of scope". This means all clean ups and final steps of class destruction are to be done in destructor. A virtual function is something which helps a derived class in overriding the implementation of a functionality of a base class.
The order of execution of destructor in an inherited class during a clean up is like this.
1. Derived class destructor
2. Base class destructor

A difference between a destructor (of course also the constructor) and other member functions is that, if a regular member function has a body at the derived class, only the version at Derived class gets executed. Whereas in case of destructors, both derived as well as base class versions get executed.

Now turning our attention to why a destructor has to be virtual, the reason is that we, programmers are very smart. We'll do days and nights of work to inherit and extend the functionality of an existing class which is being used, and say that we don't want to change the implementation/interface just for the sake of a new entrant. Let me explain this with an example.


--------------------------------------------------------------------------------
#include
class Base
{
public:
Base(){ cout<<"Constructor: Base"< ~Base(){ cout<<"Destructor : Base"<};
class Derived: public Base
{
//Doing a lot of jobs by extending the functionality
public:
Derived(){ cout<<"Constructor: Derived"< ~Derived(){ cout<<"Destructor : Derived"<> };
void main()
{
Base *Var = new Derived();
delete Var;
}

Try executing this code, you'll see the difference. To our observation, the constructors are getting called in the proper order. But to the dread of a programmer of a large project, the destructor of the derived class was not called at all.

This is where the virtual mechanism comes into our rescue. By making the Base class Destructor virtual, both the destructors will be called in order. The following is the corrected sample.

#include
class Base
{
public:
Base(){ cout<<"Constructor: Base"< virtual ~Base(){ cout<<"Destructor : Base"<};
class Derived: public Base
{
//Doing a lot of jobs by extending the functionality
public:
Derived(){ cout<<"Constructor: Derived"< ~Derived(){ cout<<"Destructor : Derived"<};
void main()
{
Base *Var = new Derived();
delete Var;
}


Note:
There is one more point to be noted regarding virtual destructor. We can't declare pure virtual destructor. Even if a virtual destructor is declared as pure, it will have to implement an empty body (at least) for the destructor.

Sunday, July 12, 2009

Intel interview question II

1. Write a function that will return the bit mirror image of inputData. (i.e. 11110010b => 01001111b)
unsigned short mirror (unsigned short inputData)
{

int numberByte = sizeof(unsigned short);//decide how many bytes in this processor.
int *temp_buffer,j,temp;
unsigned short mirrorOutput=0;
int numberBits = numberByte * 8; // how many bits for inputData
// allocate the memory
temp_buffer = (int*)malloc(sizeof(int)*numberBits);

// read bits information into temp_buffer.
for (j =0;j temp_buffer[j]= (inputData & (1<
// output the bits informaiton for inputData.
printf("output the bits informaiton for inputData\n");
for (j =0;j printf("%d ",temp_buffer[j]);


// inverse the buffer.
for (j=0;j temp = temp_buffer[j];
temp_buffer[j]= temp_buffer[numberBits-1-j];
temp_buffer[numberBits-1-j] = temp;
}
// output the mirror bits information for outputData
printf("\n\n output the bits informaiton for outputData\n");
for (j =0;j printf("%d ",temp_buffer[j]);

// output the integral number
for (j=0;j mirrorOutput = mirrorOutput + temp_buffer[j]*(1<
return mirrorOutput;
}




2. What is the output from the following code? (Assume 32-bit integers)

int x[] = {1,2,3,4,5};
int *p = x;
char *pc = (char *) p;
*(p++) = 6 ;
*(int *)(pc+4) = 7;
printf(”%d %d %d %d %d\n”, x[0], x[1], x[2], x[3], x[4]);


6,7,3,4,5.





3. What is the value of a 32-bit integer with bits 13:9 (9 through 13) set to 0x1A?
4. Given an array that contains N numbers, write a routine to determine if there are two numbers in the array whose sum equals the number K. For example, given the array {8, 4, 1, 6}, and K=10, then return true since 4+6=10. Describe a solution that would take O(N log N) time.
5. Answer the following questions relating to the code below:
class Buffer {
static int dataBuffer[100];
static int head = 0;
static int tail = 0;

public:
static void Produce(int data) {
dataBuffer[head++] = data;
if (head == 100) head = 0;
}
static bool CanConsume() {
if (head != tail) return true;
return false;
}
static int Consume() {
int pos = tail++;
if (tail == 100) tail = 0;
return dataBuffer[pos];
}
}
void Producer() {
while(1) {
… some stuff …
Buffer.Produce(x);
… some stuff …
}
}
void Consumer() {
while(1) {
… some stuff …
if(Buffer.CanConsume() ) {
int x = Buffer.Consume();
… do something with x …
}
… some stuff …
}
}

a. If we have a multithreaded system with one thread executing ‘Producer’ and one thread executing ‘Consumer’ is there anything wrong with the code above? Give details.
The major bug in this implementation is buffer is not protected. Two problems will occur: the producer may write into the buffer and index is not updated yet, consumer will get the wrong result. We can use the semaphore that limits access to that section of code to only one thread at a time.

b. If we have two threads running ‘Producer’ and one thread running ‘Consumer’ is there anything wrong with the code above? Give details.
If two threads running ‘producer’ and one thread running ‘consumer’, the buffer will be overflow because of the fast speed of producing data. We need another semaphore indicating the buffer is not full in order to produce the data into buffer.


c. If we have one thread Running ‘Producer’ and two threads running ‘Consumer’ is there anything wrong with the code above? Give details.
If we have one thread Running ‘Producer’ and two threads running ‘Consumer’, the buffer will be empty. The thread consumer will wake up, grab the lock and see that the buffer is empty or not, and then go back to sleep. We need set another semaphore indicating the buffer is not empty.



6. Write a function to return a nearest integer value given a float input.

int round (float f)
{
return nearIntegel = f>=0?(int)(x+0.5):(int)(x-0.5);
}



Intel graphic team

Intel Graphyics Team
One of the things with the questions below is that some of the
questions are not as straightforward as they look since they
may not work on all system architectures or maybe performance
sensitive. For additional brownie points, you are encouraged
to make additional comments about the questions/answers.
1. Consider the following structure definitions:
typedef struct {
int a;
int b;
int c;
} ABC;

typedef struct {
int d;
int e;
int f;
ABC *abc;
} DEF;

Write code for the following two functions:

// The create function uses a SINGLE MALLLOC to allocate for structure
// DEF and its child structure (i.e. at the end of the allocation,
// these two lines of code will be valid:
//
// DEF *pdef = CreateDEF ();
// pdef->abc->a = 10;
DEF *CreateDEF (void)
{


}

// Note: To the free function, we are passing a pointer to ABC.

void FreeDEF (ABC *pabc)
{

}

solution:
#include "stdafx.h"
typedef struct {
int a;
int b;
int c;
} ABC;

typedef struct {
int d;
int e;
int f;
ABC *abc;
} DEF;

DEF *CreateDEF (void)
{
// dynamicallly allocate the memory
DEF *temp_DEF = (DEF*)malloc(sizeof(DEF));
if(!temp_DEF){
printf("\n error allocating the memory for DEF\n");
return NULL ;
}
else
{
//allocate the memory for its child structure
temp_DEF->abc = (ABC*)malloc(sizeof(ABC));
if(!temp_DEF->abc){
printf("\n error allocating the memory for its child ABC\n");
return NULL ;
}
// return the valid memory allocation
return temp_DEF;
}
}
void FreeDEF (ABC *pabc)
{
//if memory is not empty
if(pabc!=NULL) free(pabc);

}

int _tmain(int argc, _TCHAR* argv[])
{
DEF *pdef = CreateDEF ();

//problem 2.
int value;

// get the address of pdef
value = (int) pdef;
// address + 8, moved to pdef->f
value += 8;

// then pass 10 to pdef->f.
*(int *) value = 10;

//now you can pass value to pdef.
pdef->abc->a = 10;
// print out the value on console.
printf("%d\n", pdef->abc->a);
printf("%d\n", pdef->abc-f);

//deallocate the memory
FreeDEF(pdef->abc);
if(pdef!=NULL) free(pdef);

return 0;
}


2. Is the following code OK? What does it do?

DEF *pdef = CreateDEF();
int value;

value = (int) pdef;
value += 8;

*(int *) value = 10;

solution:
it fristly get the address of pdef and add by 8 bypte, which
point to the f of DEF. therefore, the value 10 should pass to
pdef->f.
pdef->f =10




3. Consider 4 components of a color where:

unsigned char red = 0x10;
unsigned char green = 0xFF;
unsigned char blue = 0x0F;
unsigned char alpha = 0x44;

Generate a packed color ARGB which is a 32 bit integer such that A is in the
MSB of ARGB followed by red, green, blue

-----------------
A R G B
-----------------

unsigned int PackBytes (unsigned char red, .. green, ... blue, .. alpha)
{


}

solution:
#include "stdafx.h"
unsigned int PackBytes (unsigned char red, unsigned char green, unsigned char blue, unsigned char alpha)
{

unsigned int result;
// package the four elements using left shift and or.
result = (alpha<<24)(red<<16)(green<<8)(blue);
return result;
}
void UnPackByptes(unsigned int result)
{
unsigned char alpha,red,green, blue;
alpha = (result&0xFF000000)>>24;
red = (result&0x00FF0000)>>16;
green = (result&0x0000FF00)>>8;
blue = (result&0x000000FF);
printf("%d,%d,%d,%d\n",alpha,red,green,blue);
}
int _tmain(int argc, _TCHAR* argv[])
{
unsigned char red = 0x10;
unsigned char green = 0xFF;
unsigned char blue = 0x0F;
unsigned char alpha = 0x44;
unsigned int result;
//package these four components
result = PackBytes(red,green,blue,alpha);
printf("the packaged result is %d\n",result);
//unpack the result into four components for verification.
UnPackByptes(result);
return 0;
}


4. What is wrong with the following code?

a. How would you fix it by changing the code?
b. How would you fix it by changing the macro? (Not allowed to
collapse the three printf's into a single printf).


#define PRINT_THREE_TIMES \
printf ("one: %d\n", 1); \
printf ("two: %d\n", 2); \
printf ("three: %d\n", 3); \



if (i == 1)
....;
else if (i == 2)
....;
else if (i == 3)
PRINT_THREE_TIMES
else if (i == 4)
....;

Solution:
(a). adding bracket
if (i == 1)
....;
else if (i == 2)
....;
else if (i == 3)
{PRINT_THREE_TIMES}
else if (i == 4)
....;
(b).#define PRINT_THREE_TIMES_modify { printf ("one: %d\n", 1); printf ("two: %d\n", 2); printf ("three: %d\n", 3);}
5. Consider the following code:
int CheckForZero (int a[], int n)
{
int i;

for (i = 0; i < n; i++) {
if (a[i] == 0) {
return (TRUE);
}
}
return (FALSE);
}
This code works - but it does a check for every element in 'a' - i.e.
it does "n" compares and bails early if it finds even one zero element.
Our array 'a' generally does not have a zero. Optimize this code to
do only one compare. Feel free to instead use some mathematical
operations such as ADD, SUB etc.

Solution:
#include "stdafx.h"
int CheckForZero_Opt(int a[],int n)
{
int i,result=1;
//multiply all elements for once comparison, in fact, doing such way is not efficient, because each time
// multiplication is very expensive. however,in order to provide once comparison, this is one method.
for (i=0;i result*=a[i];
//if result is zero, it means that there is at least one zero element in the array.
if(result==0)
return 1;
else
return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
int a[4]={1,0,-2,4};
int flag;
flag = CheckForZero_Opt(a,4);
if(flag==1) printf("there is at least one zero element in the array\n");
else printf("there is no zero in the array\n");

return 0;
}
6. Consider the following function:
int SpecialFunction (int a)
{

if (a <= 1) {
return (1);
} else {
return (a * SpecialFunction (a - 2));
}
}

What is the value of x where "x = SpecialFunction (11)"


11*9*7*5*3*1=10395





7. Write code to see if a number is a power of 2

int Is2Power (unsigned int a)
{


}
solution:
#include "stdafx.h"
int Is2Power (unsigned int a)
{
int numOnes=0;
//calculate the total numbers of 1 in the integer number.
while(a)
{ a = a&(a-1);
numOnes++;
}
// if only oneOnes,then it is power of 2.
if(numOnes==1)
return 1;
else
return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
int a,flag;
printf("please input one integer number here:\n");
scanf("%d",&a);
flag = Is2Power(a);
if(flag==1) printf("this is a power of 2\n");
else printf("this is not a power of 2\n");
return 0;
}


8. What is the value of x where "x = sizeof (struct A)" on a 32 bit machine?
Show your calculations.

struct A {
char a;
char b;
int c;
short d;
int *p;
};

solution:
#include "stdafx.h"
struct A {
char a;//1 byte
char b;//1 byte
int c;//4 byte
short d;//4 byte
int *p; //4 byte.

};
/*according to the above label,the length of struct should 14.in fact the actual length should be 16, not 14, because of word alignment. The reason for this is that most compilers, by default, align complex data-structures to a word alignment boundary. In addition, the individual members are also aligned to their respective alignment boundaries. By this logic, the structure student gets aligned on a word boundary and the variable age within the structure is aligned with the next word address. This is accomplished by way of the compiler inserting "padding" space between two members or to the end of the structure to satisfy alignment requirements.This padding is inserted to align age with a word boundary. */
int _tmain(int argc, _TCHAR* argv[])
{
int length;
length = sizeof(struct A);
printf("%d\n",length);
return 0;
}


9. The operating system typically allocates memory in pages such that
the base address of the pages are 0, 4K, 8K, .... Given two
addresses, write a program to find if two pointers are on the
same page.

int AreOnSamePage (int *a, int *b)
{


}

solution:
#include "stdafx.h"
int AreOnSamePage (int *a, int *b)
{

int aAddress = int(a);
int bAddress = int(b);
int page_size = 4096;
int aPageNum, bPageNum;

//define the page number in the memory space
aPageNum = aAddress/page_size;
bPageNum = bAddress/page_size;

//if they are the same papge, return 1.
if(aPageNum==bPageNum&&abs(bAddress-aAddress) return 1;
else
return 0;

}
int _tmain(int argc, _TCHAR* argv[])
{
int *a,*b;
int c;
//dynamically allocat memory for a and b
a = (int*)malloc(sizeof(int));
b = (int*)malloc(sizeof(int));
// decide whether they are on the same page.
c = AreOnSamePage(a,b);
//print out result
if(c==1) printf("they are in the same page\n");
else printf("they are not in the same page\n");
//deallocate memory
free(a);
free(b);
return 0;
}

10. Write a function to generate a nearest integer value.

int round (float f)
{
}
solution:
#include "stdafx.h"
int round (float f)
{
return int(f+0.5);
}
int _tmain(int argc, _TCHAR* argv[])
{
float f;
int c;
printf("please input on float number here:\n");
scanf ("%f",&f);
c = round(f);
printf("round number is %d\n",c);
return 0;
}

Qualcomm Internship Interview questions

2) Tell me about the most enjoyable project you have completed while at work/school, and why
· I developed one robust face detection/tracking algorithm on TI OMAP3430 platform, and also based on face detection I can overlay 2D/3D graphics object on the Phone via OpenGL.
· I improved H.264 video encoding with my developed two algorithms.
· Recently I also developed one robust head pose estimation running on TI OMAP3430.
· TI has already shown my developed demos to their customers.


3) What are some questions you would ask yourself before deciding to use a LUT to complete some task?
· (1). Firstly we should think lookup table can be generalized to other conditions. For example, we have developed one lookup table for color-based on tracking. In order to robustly tracking the faces, I have developed two lookup tables for fluorescent and incandescent conditions. For other conditions between fluorescent and incandescent condition, you need to consider whether your developed lookup table still works.
· (2). How big is your lookup table, especially for mobile device.


4) What data structure would you use if you were developing a spell checker for a language that contained 10 words, 100 words, 50000 words?
For 50000 words,
Binary Tree for larger dictionary should be used, because it gives O(log2(n)).
For small dictionary, array should be used.


4) What is one way you could dramatically hurt the performance of an image rotation function?
The pixel at non-integer position, additonal interpolation filter should be used.


6) What was this programmer most likely trying to do, and fix his code:
int f(char * p)
{
int n = 0;
while(*p != 0) n = 10 *n + *++p - 0;
return &n;
}
Convert one string to integer number.
int f(char * p)
{
int n = 0;
while(*p != 0)
n = 10 *n + *p++ - 0; // should post pointer plus plus
return n; //return one value not address.
}

Thursday, July 9, 2009

List all file names in one folder using C.

FILE *fid;
char fileName[256];
//positive samples(fist)
system("dir /s/b C:\\jianfeng\\debug_pics\\*.* > fist_list.txt");
fid = fopen("fist_list.txt","r");
if ( fid != NULL )
{
while ( fgets (fileName, sizeof(fileName), fid ) != NULL ) /* read a line */
{
fputs ( fileName, stdout ); /* write the line */
}
fclose ( fid );
}
else
{
perror ( "error" ); /* why didn't the file open? */
}

Wednesday, July 8, 2009

Machine Learning: Expectation Maximization via OpenCV

#include
#include
int main( int argc, char** argv )
{
const int N = 4;
const int N1 = (int)sqrt((double)N);
const CvScalar colors[] = {{{0,0,255}},{{0,255,0}},{{0,255,255}},{{255,255,0}}};
int i, j;
int nsamples = 100;
CvRNG rng_state = cvRNG(-1);
CvMat* samples = cvCreateMat( nsamples, 2, CV_32FC1 );
CvMat* labels = cvCreateMat( nsamples, 1, CV_32SC1 );
IplImage* img = cvCreateImage( cvSize( 500, 500 ), 8, 3 );
float _sample[2];
CvMat sample = cvMat( 1, 2, CV_32FC1, _sample );
CvEM em_model;
CvEMParams params;
CvMat samples_part;
cvReshape( samples, samples, 2, 0 );
for( i = 0; i < N; i++ )
{
CvScalar mean, sigma;
// form the training samples
cvGetRows( samples, &samples_part, i*nsamples/N, (i+1)*nsamples/N );
mean = cvScalar(((i%N1)+1.)*img->width/(N1+1), ((i/N1)+1.)*img->height/(N1+1));
sigma = cvScalar(30,30);
cvRandArr(&rng_state, &samples_part, CV_RAND_NORMAL, mean, sigma );
}
cvReshape( samples, samples, 1, 0 );
// initialize model's parameters
params.covs = NULL;
params.means = NULL;
params.weights = NULL;
params.probs = NULL;
params.nclusters = N;
params.cov_mat_type = CvEM::COV_MAT_SPHERICAL;
params.start_step = CvEM::START_AUTO_STEP;
params.term_crit.max_iter = 10;
params.term_crit.epsilon = 0.1;
params.term_crit.type = CV_TERMCRIT_ITERCV_TERMCRIT_EPS;
// cluster the data
em_model.train( samples, 0, params, labels );
// classify every image pixel
cvZero( img );
for( i = 0; i <>height; i++ )
{
for( j = 0; j <>width; j++ )
{
CvPoint pt = cvPoint(j, i);
sample.data.fl[0] = (float)j;
sample.data.fl[1] = (float)i;
int response = cvRound(em_model.predict( &sample, NULL ));
CvScalar c = colors[response];
cvCircle( img, pt, 1, cvScalar(c.val[0]*0.75,c.val[1]*0.75,c.val[2]*0.75), CV_FILLED );
}
}
//draw the clustered samples
for( i = 0; i < nsamples; i++ )
{
CvPoint pt;
pt.x = cvRound(samples->data.fl[i*2]);
pt.y = cvRound(samples->data.fl[i*2+1]);
cvCircle( img, pt, 1, colors[labels->data.i[i]], CV_FILLED );
}
cvNamedWindow( "EM-clustering result", 1 );
cvShowImage( "EM-clustering result", img );
cvWaitKey(0);
cvReleaseMat( &samples );
cvReleaseMat( &labels );
return 0;
}

Thursday, January 8, 2009

summary of the interview

1. C++ not too much familar;
2. Video streaming should also consider some network protocal such as RTP/RSVP, if having the time, I should go to www.helixcommunity.com to down the source and study it. 
3. 3GPP standard to support mobile streaming.
4. Video codec still require more time involved such as entropy coding and decoding part; 
5. Any DSP optimizations should be explored  such as bit operation for embedded system developer.
6. English communication skills.

Interview at Ambrado.

1. typedef struct element
{
    unsigned int a;
unsigned int b;
unsigned int c;
unsigned long long d;
unsigned long long e;
unsigned char f;
}element;

what about sizeof(element)? 
Key: it should be 40. how to put them into 32, I do not how to figure it about.


2. give 16 unsigned integer number, computing the round average value, only limited unsigned integer number.

unsinged int avg(unsinged int arr[16])
{
   unsigned int sum1 = 0;
   unsigned int sum2 = 0;
   unsinged int avg_val;
   for(int i=0;i<16;i++)
  {
      sum1 += arr[i]>>4;
      sum2 + = arr[i]&0xf
  }
   avg_val = sum1+(sum2+8)>>4;
  return avg_val;
}

3. How about SIMD architecture, should be more familar with DSP optimizaiton.
 Key:

loop unrolling; 
DMA techniques to move data from memory;
software pipeline;
SIMD to take care of parallel processing;
some intrinsic instructions.

4. How about entropy coding? I should take some time to study about the CAVLC and CABAC in H.264.

Wednesday, January 7, 2009

Microsoft interview questions

1.why are you interested in join SDE?

2. given one continuous array, provide the factorial value for each element of the array.

Answers:
typedef struct element
{
  int x;
  struct element *next;
}element;

void insert_link_list(element ** head, int value)
{
  
  element * tmp = (element*)malloc(sizeof(element));

 tmp->x = value;
 tmp->next = *head;
*head = tmp;
}

int factorial(int value)
{
  int result = 0; 
   while(value)
  {
     result+= value*(value-1);
    value--;
 }
return result;
}

void print_link_list(element **head)
{
   element **node;
   for(node=head; (*node)!=NULL; node = &(*node)->next)
    printf("%d\n", (*node)->x);
}

void main()
{
   element *head=NULL;
   int result;  
    for(int index=1;index<100;index++)
     {
      result = factorial(index);
     insert_link_list(&head,result);
   }
   print_link_list(&head);
}

  
3. given one number, such as 100, please find  all prime numbers starting from 1 to 100.

typedef struct element
{
  int x;
  struct element *next;
}element;

int is_prime_number(int n)
{
long c;
if (n <= 2) 
   return 1;
for (c = 2; c <=n/2; c++)
{
    if ((n % c) == 0) 
       return 0;
}
return 1;
}
void insert_link_list(element **head,int index)
{
element *new_element;
new_element = (element*)malloc(sizeof(element));
new_element->x = index;
//new_element->next = NULL;
new_element->next = *head;
*head = new_element;
}

void insert_middle_link(element **head,int index)
{
  element **node;
  element *new_elem;
  for(node=head;(*node)!=NULL;node = &(*node)->next)
  {
 if( (*node)->x == index)
 {
         new_elem = (element*)malloc(sizeof(element));
new_elem->x = index+1;
new_elem->next = (*node)->next;
(*node)->next = new_elem;
 }
  }
}


void delete_middle_list(element **head,int index)
{
   element **node;
  for(node=head;(*node)!=NULL;node = &(*node)->next)
  {
 if( (*node)->x == index)
 {
 (*node)= (*node)->next;
 }
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
int index;
element *prime_list=NULL;
    element *new_element;

//element *new_element;
for(index=1;index<1000;index++)
{
    if(is_prime_number(index))
    {
      insert_link_list(&prime_list,index);
      printf("%d is prime number\n",index);
      } 
 }

insert_middle_link(&prime_list,3);
delete_middle_list(&prime_list,3);

while(prime_list!=NULL)
{
  printf("%d\n",(prime_list)->x);
  prime_list=(prime_list)->next;
}
return 0;
}

phone interview for real networks on Jan.05 2009

1. given one 8bit integer number, find the number of bit 1, and further optimizaiton.

2. what about singleton instance class?

3. virtual function, giving an example to describe.

4. mutlithread progamming in C++. how?

5. what about the technology COM?

6. network programming. give one server to stream one video to client, if the bandwidth is not enough, how does server knows this?

Tuesday, January 6, 2009

Implementing the Singleton Design Pattern by Danny Kalev

Singleton is probably the most widely used design pattern. Its intent is to ensure that a class has only one instance, and to provide a global point of access to it. There are many situations in which a singleton object is necessary: a GUI application must have a single mouse, an active modem needs one and only one telephone line, an operating system can only have one window manager, and a PC is connected to a single keyboard. I will show how to implement the singleton pattern in C++ and explain how you can optimize its design for single-threaded applications.

Design Considerations
Using a global object ensures that the instance is easily accessible but it doesn't keep you from instantiating multiple objects—you can still create a local instance of the same class in addition to the global one. The Singleton pattern provides an elegant solution to this problem by making the class itself responsible for managing its sole instance. The sole instance is an ordinary object of its class, but that class is written so that only one instance can ever be created. This way, you guarantee that no other instance can be created. Furthermore, you provide a global point of access to that instance. The Singleton class hides the operation that creates the instance behind a static member function. This member function, traditionally called Instance(), returns a pointer to the sole instance. Here's a declaration of such a class:

  class Singleton    {   public:       static Singleton* Instance();   protected:       Singleton();       Singleton(const Singleton&);       Singleton& operator= (const Singleton&);   private:       static Singleton* pinstance;   };

Instead of Singleton, you can name your class Mouse, FileManager, Scheduler, etc., and declare additional members accordingly. To ensure that users can't create local instances of the class, Singleton's constructor, assignment operator, and copy constructor are declared protected. The class also declares a private static pointer to its instance, pinstance. When the static function Instance() is called for the first time, it creates the sole instance, assigns its address to pinstance, and returns that address. In every subsequent invocation, Instance() will merely return that address.

The class's implementation looks like this:

  Singleton* Singleton::pinstance = 0;// initialize pointer   Singleton* Singleton::Instance ()    {     if (pinstance == 0)  // is it the first call?     {         pinstance = new Singleton; // create sole instance     }     return pinstance; // address of sole instance   }   Singleton::Singleton()    {      //... perform necessary instance initializations    }

Users access the sole instance through the Instance() member function exclusively. Any attempt to create an instance not through this function will fail because the class's constructor is protected. Instance() uses lazy initialization. This means the value it returns is created when the function is accessed for the first time. Note that this design is bullet-proof—all the following Instance() calls return a pointer to the same instance:

  Singleton *p1 = Singleton::Instance();   Singleton *p2 = p1->Instance();   Singleton & ref = * Singleton::Instance();

Although our example uses a single instance, with minor modifications to the function Instance(), this design pattern permits a variable number of instances. For example, you can design a class that allows up to five instances.

fast bit counter

Puzzle: Devise a fast algorithm for computing the number of 1-bits in an unsigned integer. If the machine supports n-bit integers, can we compute the number of 1-bits in O(log n) machine instructions? Can we compute the number of 1-bits in O(b) machine instructions where b is the number of 1-bits in the integer?

Source: Commonly asked in job interviews for Software Engineers. I first heard it in 1996 when I was a graduate student.

Solution: This article presents six solutions to this problem.

1) Iterated Count int bitcount (unsigned int n) {    int count = 0;    while (n) {       count += n & 0x1u;       n >>= 1;    }    return count; } 

Iterated Count runs in time proportional to the total number of bits. It simply loops through all the bits, terminating slightly earlier because of the while condition. Useful if 1’s are sparse and among the least significant bits.

2a) Sparse Ones int bitcount (unsigned int n)  {    int count = 0 ;    while (n)  {       count++ ;       n &= (n - 1) ;    }    return count ; } 

Sparse Ones runs in time proportional to the number of 1 bits. The mystical line n &= (n - 1) simply sets the rightmost 1 bit in n to 0.

2b) Dense Ones int bitcount (unsigned int n)   {    int count = 8 * sizeof(int) ;    n ^= (unsigned int) - 1 ;    while (n)  {       count-- ;       n &= (n - 1) ;    }    return count ; } 

Dense Ones runs in time proportional to the number of 0 bits. It is the same as Sparse Ones, except that it first toggles all bits (n ~= -1), and continually subtracts the number of 1 bits from sizeof(int).

Sparse Ones and Dense Ones were first described by Peter Wegner in “A Technique for Counting Ones in a Binary Computer”, Communications of the ACM, Volume 3 (1960) Number 5, page 322.

3a) Precompute_8bit static int bits_in_char [256] ;             int bitcount (unsigned int n)  {   // works only for 32-bit ints    return bits_in_char [n  & 0xffu]       +  bits_in_char [(n >>  8 ) & 0xffu]       +  bits_in_char [(n >> 16) & 0xffu]       +  bits_in_char [(n >> 24) & 0xffu] ; } 

Precompute_8bit assumes an array bits_in_char such that bits_in_char[i] contains the number of 1 bits in the binary representation for i. It repeatedly updates count by masking out the last eight bits in n, and indexing into bits_in_char.

3b) Precompute_16bit static char bits_in_16bits [0x1u <<>> 16) & 0xffffu] ; } 

Precompute_16bit is a variant of Precompute_8bit in that an array bits_in_16bits[] stores the number of 1 bits in successive 16 bit numbers (shorts).

4) Parallel Count #define TWO(c)     (0x1u << (c)) #define MASK(c)    (((unsigned int)(-1)) / (TWO(TWO(c)) + 1u)) #define COUNT(x,c) ((x) & MASK(c)) + (((x) >> (TWO(c))) & MASK(c))  int bitcount (unsigned int n)  {    n = COUNT(n, 0) ;    n = COUNT(n, 1) ;    n = COUNT(n, 2) ;    n = COUNT(n, 3) ;    n = COUNT(n, 4) ;    /* n = COUNT(n, 5) ;    for 64-bit integers */    return n ; } 

Parallel Count carries out bit counting in a parallel fashion. Consider n after the first line has finished executing. Imagine splitting n into pairs of bits. Each pair contains the number of ones in those two bit positions in the original n. After the second line has finished executing, each nibble contains the number of ones in those four bits positions in the original n. Continuing this for five iterations, the 64 bits contain the number of ones among these sixty-four bit positions in the original n. That is what we wanted to compute.

5) Nifty Parallel Count #define MASK_01010101 (((unsigned int)(-1))/3) #define MASK_00110011 (((unsigned int)(-1))/5) #define MASK_00001111 (((unsigned int)(-1))/17)  int bitcount (unsigned int n) {    n = (n & MASK_01010101) + ((n >> 1) & MASK_01010101) ;    n = (n & MASK_00110011) + ((n >> 2) & MASK_00110011) ;    n = (n & MASK_00001111) + ((n >> 4) & MASK_00001111) ;    return n % 255 ; } 

Nifty Parallel Count works the same way as Parallel Count for the first three iterations. At the end of the third line (just before the return), each byte of n contains the number of ones in those eight bit positions in the original n. A little thought then explains why the remainder modulo 255 works.

6) MIT HAKMEM Count int bitcount(unsigned int n) {    /* works for 32-bit numbers only    */    /* fix last line for 64-bit numbers */     register unsigned int tmp;     tmp = n - ((n >> 1) & 033333333333)            - ((n >> 2) & 011111111111);    return ((tmp + (tmp >> 3)) & 030707070707) % 63; } 

MIT HAKMEM Count is funky. Consider a 3 bit number as being 4a+2b+c. If we shift it right 1 bit, we have 2a+b. Subtracting this from the original gives 2a+b+c. If we right-shift the original 3-bit number by two bits, we get a, and so with another subtraction we have a+b+c, which is the number of bits in the original number. How is this insight employed? The first assignment statement in the routine computes tmp. Consider the octal representation of tmp. Each digit in the octal representation is simply the number of 1’s in the corresponding three bit positions in n. The last return statement sums these octal digits to produce the final answer. The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulus 63. This is accomplished by right-shifting tmp by three bits, adding it to tmp itself and ANDing with a suitable mask. This yields a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1’s among those six positions in n. This number modulo 63 yields the final answer. For 64-bit numbers, we would have to add triples of octal digits and use modulus 1023. This is HACKMEM 169, as used in X11 sources. Source: MIT AI Lab memo, late 1970’s.

7) Builtin Instructions

GNU compiler allows for

int __builtin_popcount (unsigned int x);

which translates into a single CPU instruction if the underlying machine architecture supports it. For example, Intel machines have POPCNT (SSE4 Instruction set announced in 2006). Many GCC builtin functions exist.

 No Optimization         Some Optimization       Heavy Optimization    Precomp_16 52.94 Mcps    Precomp_16 76.22 Mcps    Precomp_16 80.58 Mcps    Precomp_8 29.74 Mcps     Precomp_8 49.83 Mcps     Precomp_8 51.65 Mcps     Parallel 19.30 Mcps      Parallel 36.00 Mcps      Parallel 38.55 Mcps          MIT 16.93 Mcps           MIT 17.10 Mcps         Nifty 31.82 Mcps        Nifty 12.78 Mcps         Nifty 16.07 Mcps           MIT 29.71 Mcps       Sparse  5.70 Mcps        Sparse 15.01 Mcps        Sparse 14.62 Mcps        Dense  5.30 Mcps        Dense  14.11 Mcps         Dense 14.56 Mcps     Iterated  3.60 Mcps      Iterated  3.84 Mcps      Iterated  9.24 Mcps    Mcps = Million counts per second 

Which of the several bit counting routines is the fastest? Results of speed trials on an i686 are summarized in the table on left. “No Optimization” was compiled with plain gcc. “Some Optimizations” was gcc -O3. “Heavy Optimizations” corresponds to gcc -O3 -mcpu=i686 -march=i686 -fforce-addr -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4.