Ashif Ali


Big data is a prominent buzzword in the IT industry these days. It refers to large volumes of data in both structured and unstructured formats. This data is generated from many directions through various channels and is usually of great interest to us, because analyzing it yields insights that can improve decision making and support profitable business strategies. The volume of incoming data is so large that it cannot be processed using traditional procedures and technologies. Managing such volumes requires a standard framework such as Hadoop, which is open source and has attracted a wide audience through its data management capabilities and popularity. Along with Hadoop, technologies such as Pig and Hive and many other products have come into the picture. Newer technologies such as Spark, NoSQL databases and Google's MapReduce are also being taken up by the tech giants to solve complex problems. Many proprietary and open source technologies on the market can be used to handle data management problems in a big data environment. In this paper we discuss a few of these technologies and their level of application in small-scale, mid-scale and large-scale industries.

Keywords: Big data, Hive, Pig, Spark, Framework, Technologies





