A Review of Sentiment Analysis in Twitter Data Using Hadoop
DOI: https://doi.org/10.51983/arss-2015.4.2.2770
Keywords: Twitter, Sentiment Analysis, Hadoop, MapReduce, HDFS
Abstract
Twitter is an online social networking site that holds a rich volume of structured, semi-structured, and unstructured data. This work discusses a method for classifying the sentiment of tweets. To improve scalability and efficiency, the method is to be implemented on the Hadoop ecosystem, a widely adopted distributed processing platform built on the MapReduce parallel processing paradigm. Finally, extensive experiments will be conducted on real-world data sets, with the expectation of achieving accuracy comparable to or greater than that of techniques proposed in the literature.
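The abstract does not fix a particular classifier, so the following is only a minimal sketch under stated assumptions: it assumes multinomial Naive Bayes with add-one smoothing, a common fit for MapReduce because training reduces to counting (word, label) pairs in the mapper and summing them in the reducer. The map, shuffle, and reduce phases are simulated in-process with the Python standard library; on a real cluster they would run as a Hadoop job (for example through Hadoop Streaming). The tokenizer, the `DOC_KEY` marker, the labels, and the sample tweets are all hypothetical.

```python
import math
import re
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

TOKEN_RE = re.compile(r"[a-z']+")
DOC_KEY = "<doc>"  # hypothetical marker key used to count tweets per label (priors)


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes.
    return TOKEN_RE.findall(text.lower())


def mapper(record):
    # Map phase: record is (label, tweet_text).
    # Emit ((label, DOC_KEY), 1) once per tweet and ((label, word), 1) per token.
    label, text = record
    yield (label, DOC_KEY), 1
    for word in tokenize(text):
        yield (label, word), 1


def shuffle(pairs):
    # Shuffle/sort phase: group values by key, as Hadoop does between map and reduce.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    # Reduce phase: sum the partial counts for one (label, word) key.
    return key, sum(values)


def train(records):
    # Run map -> shuffle -> reduce, then split the totals into priors and word counts.
    mapped = (pair for record in records for pair in mapper(record))
    counts = dict(reducer(key, values) for key, values in shuffle(mapped))
    doc_counts = defaultdict(int)
    word_counts = {}
    for (label, word), n in counts.items():
        if word == DOC_KEY:
            doc_counts[label] += n
        else:
            word_counts.setdefault(label, {})[word] = n
    vocab = {w for per_label in word_counts.values() for w in per_label}
    return doc_counts, word_counts, vocab


def classify(text, model):
    # Pick the label with the highest log posterior; unseen words are ignored.
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        label_words = word_counts.get(label, {})
        total_words = sum(label_words.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:
                # Laplace (add-one) smoothing over the training vocabulary.
                score += math.log((label_words.get(word, 0) + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    training = [
        ("positive", "I love this phone, great battery"),
        ("positive", "great service and happy customers"),
        ("negative", "I hate waiting, terrible service"),
        ("negative", "awful battery, really bad phone"),
    ]
    model = train(training)
    print(classify("love the great battery", model))  # → positive
    print(classify("terrible awful wait", model))     # → negative
```

Because the model is nothing more than summed counts, the reduce step is associative and a combiner could pre-aggregate counts on each mapper node before the shuffle, which is one reason counting-based classifiers tend to scale well on Hadoop.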
License
Copyright (c) 2015 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.