With recent breakthroughs in commodity 3D imaging solutions such as depth sensing, photogrammetry, stereoscopic vision, and structured light, 3D shape recognition is becoming an increasingly important problem. A longstanding question is what the format of the 3D shape should be (voxels, meshes, point clouds, etc.) and what a good generic feature representation for shape recognition might look like. This question is particularly important in the context of convolutional neural networks (CNNs), whose efficacy and complexity depend on the choice of input shape format and the design of the network. Both 3D voxel representations and collections of rendered 2D views have produced competitive results. Similarly, networks with a few million parameters and networks with several hundred million parameters have shown comparable performance. In this work we compare these solutions and analyze the factors that increase the parameter count without significantly improving accuracy. On the basis of this analysis, we propose a representation method (point cloud to 2D grid) and an architecture that require far fewer CNN parameters while achieving competitive accuracy.
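To make the point-cloud-to-2D-grid idea concrete, the following is a minimal sketch of one way such a mapping can be realized: an orthographic depth projection that rasterizes an (N, 3) point cloud onto a fixed-size 2D grid suitable as CNN input. This is a hypothetical illustration under simple assumptions (projection along the z-axis, nearest-point-per-cell), not necessarily the exact mapping proposed in this work.

```python
import numpy as np

def point_cloud_to_depth_grid(points, grid_size=32):
    """Rasterize an (N, 3) point cloud into a grid_size x grid_size
    depth image by orthographic projection along the z-axis.
    Illustrative sketch only; the paper's mapping may differ."""
    pts = np.asarray(points, dtype=float)
    # Normalize x, y coordinates into [0, 1)
    mins = pts[:, :2].min(axis=0)
    maxs = pts[:, :2].max(axis=0)
    span = np.where(maxs - mins > 0, maxs - mins, 1.0)
    uv = (pts[:, :2] - mins) / span
    # Map normalized coordinates to integer cell indices
    ij = np.minimum((uv * grid_size).astype(int), grid_size - 1)
    # Keep the largest z value per cell; cells with no points stay 0
    grid = np.zeros((grid_size, grid_size))
    for (i, j), z in zip(ij, pts[:, 2]):
        grid[j, i] = max(grid[j, i], z)
    return grid

# Example: project 1,000 random points from the unit cube onto an 8x8 grid
rng = np.random.default_rng(0)
depth = point_cloud_to_depth_grid(rng.random((1000, 3)), grid_size=8)
print(depth.shape)  # (8, 8)
```

Such a grid can be fed to a standard 2D CNN, avoiding the cubic memory growth of voxel grids and the multiple rendering passes required by view-based approaches.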