Thesis Presentation V4
1 Cork Institute of Technology - Candidate for Master of Science Degree 1
Using Big Data Analytics in a Social Domain
Master’s in Cloud Computing 2013/2014
Ahmed Abdel-Aziz May 2015
EMCCAe, CISSP, PMP
Objective
1) Social Media, Analytics and the Marketing Campaign
2) Sentiment Analysis – Methodology & Techniques
3) The Need for Case-Study & an Analytics Prototype
4) Learning Outcomes & Future Work
Social Media & Social Analytics
Section 1 of 4
• Social media sites are the offspring of the Web 2.0 movement
– Based on the cloud computing model (Software-as-a-Service)
• 88% of companies use social media for marketing
Marketing Campaign Lifecycle
Section 1 of 4
• Consists of 5 phases
• Social analytics answers social questions for each phase
– Ex: What is the sentiment trend?
Sentiment Analysis Methodology & Techniques
Section 2 of 4
• Social analytics projects based on sentiment analysis benefit from a well-thought-out methodology
Sentiment Analysis Methodology & Techniques
Section 2 of 4
• Social sentiment analysis starts with social listening
– Social listening can be performed using a variety of open source tools such as PostgreSQL, R, Wordle, and Circos, as well as tools such as Attensity 360 and Analyze
• Social data comes from 3 main categories of sources
– Social user's account: analytic capability is limited by the social media provider (FB, Twitter, LinkedIn)
– Social APIs: the social media provider offers an API to tap into social data, which allows development of unique analytic programs
– 3rd party tools: provide very fast results but do not offer the same level of analytic capability as a custom program
Sentiment Analysis Methodology & Techniques
Section 2 of 4
• Sentiment analysis techniques are grouped into two main categories:
– Supervised machine learning method
– Unsupervised method
• The supervised learning method learns features/words that correlate with +ve/-ve sentiment and can then identify the sentiment of new text
• The unsupervised method uses a lexicon of words pre-scored for polarity values; the sum of the scores gives the sentiment
• Both techniques are widely used and offer comparable results
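The unsupervised (lexicon) method can be sketched in a few lines of R. This is only an illustration: the lexicon, its scores, and the function names `score_text` and `label_text` are made up for this example and do not come from the thesis.

```r
# Toy polarity lexicon: words pre-scored for polarity (illustrative values only).
lexicon <- c(good = 1, great = 2, love = 2, bad = -1, awful = -2, hate = -2)

# Score one text: sum the polarity of every word found in the lexicon.
score_text <- function(text, lex = lexicon) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words %in% names(lex)]   # ignore words the lexicon does not know
  sum(lex[words])
}

# Map the summed score to a sentiment label.
label_text <- function(text, lex = lexicon) {
  s <- score_text(text, lex)
  if (s > 0) "positive" else if (s < 0) "negative" else "neutral"
}

label_text("I love this great phone")
label_text("Awful battery, I hate it")
```

A real lexicon would hold thousands of scored words and would normally handle negation ("not good"), which this sketch ignores.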
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Company launched a new product to market
– Marketing campaign was launched long ago and is in the Account Performance Phase
• Marketing team needs to measure upticks in the sentiment trend regarding the new product to take appropriate actions
• Data science team believes continuous user surveys are ineffective and a computational approach is necessary -> Better results and much less intrusive
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Decision made to build a prototype of a tool to measure the sentiment trend, starting with Twitter specifically
• Twitter was found to be the social network of choice for brand and product sentiment topics -> Thus Twitter
• Data science team key objectives:
– Produce useful results quickly and cost efficiently (Cloud Computing Value Proposition!?)
– Get buy-in from marketing management to build the full app
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Applying the Analytic Project Lifecycle to the Prototype
– Data Preparation
  Input Data: raw tweets
  Output Data: clean tweet text ready for sentiment analysis
– Analytic Model Planning & Building
  Input Data: clean tweet text and learnt Naïve Bayesian model
  Output Data: sentiment of analyzed tweets
– Communicate Results
  Input Data: sentiment of analyzed tweets and tweets
  Output Data: sentiment trend graph for both +ve and -ve sentiments
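The data preparation step (raw tweets in, clean tweet text out) might look roughly like the R sketch below. The function name `clean_tweet_text` and the specific cleaning rules are assumptions for illustration; the slides do not show what the prototype's own cleaning routine removes.

```r
# Hypothetical cleaning routine: strips common tweet noise before sentiment analysis.
clean_tweet_text <- function(x) {
  x <- gsub("\\bRT\\b", " ", x, perl = TRUE)                   # retweet marker
  x <- gsub("http\\S+", " ", x, perl = TRUE)                   # URLs
  x <- gsub("@\\w+", " ", x, perl = TRUE)                      # user mentions
  x <- gsub("[^[:alpha:][:space:]]", " ", x, perl = TRUE)      # digits, punctuation, symbols
  x <- tolower(x)
  x <- gsub("\\s+", " ", x, perl = TRUE)                       # collapse repeated whitespace
  trimws(x)
}

raw <- c("RT @bob: Loving my new iPhone!!! http://t.co/abc123 #apple",
         "Battery died AGAIN... 2nd time today :(")
clean_tweet_text(raw)
```

The function is vectorised, so it can be applied directly to the character vector of tweet texts produced by the social listening step.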
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Technology decisions made by data science team
– R programming language for social listening
– Twitter Social APIs for source of social data
– Leverage ready-made R packages to accelerate building time
– R programming for data preparation
– Leverage analytics cloud services offered such as Datumbox: supervised machine learning method using Naïve Bayesian
– R programming to build main body for prototype analytics application
– Making use of R plotting capabilities to present easy to understand results for non-technical members of Marketing team
– Settling on the technologies to use to build the full blown application dealing with much larger data sets: GPText/Pivotal HD
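To give a feel for what a supervised Naïve Bayesian classifier does, here is a toy multinomial version with Laplace smoothing in base R. It is a local illustration only, not Datumbox's implementation; the training texts and the names `train_nb` and `predict_nb` are invented for this sketch.

```r
# Train: count how often each word appears in each class, plus class priors.
train_nb <- function(texts, labels) {
  labels  <- as.character(labels)
  tokens  <- strsplit(tolower(texts), "[^a-z]+")
  vocab   <- setdiff(unique(unlist(tokens)), "")
  classes <- unique(labels)
  counts  <- matrix(0, nrow = length(vocab), ncol = length(classes),
                    dimnames = list(vocab, classes))
  for (cl in classes) {
    words <- unlist(tokens[labels == cl])
    tab   <- table(words[words %in% vocab])
    counts[names(tab), cl] <- as.numeric(tab)
  }
  prior <- sapply(classes, function(cl) mean(labels == cl))
  list(vocab = vocab, counts = counts, prior = prior)
}

# Predict: pick the class with the highest log prior plus summed log likelihoods.
predict_nb <- function(model, text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words %in% model$vocab]            # unseen words are skipped
  # Laplace (add-one) smoothing so unseen word/class pairs never get probability 0
  smoothed <- sweep(model$counts + 1, 2,
                    colSums(model$counts) + length(model$vocab), "/")
  scores <- log(model$prior) + colSums(log(smoothed[words, , drop = FALSE]))
  names(which.max(scores))
}

# Made-up training data, for illustration only.
model <- train_nb(
  c("love this phone", "great camera love it", "awful battery", "hate the awful screen"),
  c("positive", "positive", "negative", "negative")
)
predict_nb(model, "love the camera")
predict_nb(model, "awful screen")
```

A hosted service saves building and training such a model in-house, which fits the team's objective of producing results quickly.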
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Snippet of R code for the analytics application – Main Loop

    possentiments = 0   # positive tweet counts per polling round (leading 0 is a placeholder)
    negsentiments = 0   # negative tweet counts per polling round
    for (i in 1:5)      # poll Twitter 5 times
    {
      tweets = searchTwitter("iPhone", n=5, lang="en")      # fetch 5 English tweets
      tweet_txt = sapply(tweets, function(x) x$getText())   # extract the text of each tweet
      tweet_clean = clean.text(tweet_txt)                   # data preparation
      tweet_num = length(tweet_clean)
      ……..
      for (j in 1:tweet_num)   # inner index renamed to j so it does not shadow the outer i
      {
        # second argument is presumably the API key, masked in the slides
        tmp = getSentiment(tweet_clean[j], "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
        tweet_df$sentiment[j] = tmp$sentiment
        ………..
      }
      possentiments <- c(possentiments, sum(tweet_df$sentiment=="positive"))
      negsentiments <- c(negsentiments, sum(tweet_df$sentiment=="negative"))
      Sys.sleep(5)   # wait 5 seconds before the next polling round
    }
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Plotting the trend of both positive and negative sentiments
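A trend plot of this kind can be produced with base R graphics along the following lines. The function name `plot_sentiment_trend`, the colours, and the sample counts are assumptions for illustration; the counts are not results from the thesis.

```r
# Plot positive and negative tweet counts per polling round on one chart.
plot_sentiment_trend <- function(possentiments, negsentiments) {
  trend <- data.frame(
    poll     = seq_along(possentiments),
    positive = possentiments,
    negative = negsentiments
  )
  ylim <- range(c(trend$positive, trend$negative))   # shared y-axis for both series
  plot(trend$poll, trend$positive, type = "b", col = "darkgreen", ylim = ylim,
       xlab = "Polling round", ylab = "Number of tweets", main = "Sentiment trend")
  lines(trend$poll, trend$negative, type = "b", col = "red")
  legend("topleft", legend = c("Positive", "Negative"),
         col = c("darkgreen", "red"), lty = 1, pch = 1)
  invisible(trend)   # return the plotted data for inspection
}

# Illustrative counts only.
trend <- plot_sentiment_trend(c(2, 3, 1, 4, 3), c(1, 0, 2, 1, 1))
```

If the count vectors are built by appending to an initial 0, as in the main loop, that leading placeholder should be dropped (for example with `possentiments[-1]`) before plotting.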
Learning Outcomes
Section 4 of 4
• Initial State
– Good foundation in cloud computing and data analytics
– Very little knowledge in social domain
– Not even an FB account :)
– Last coding experience was Java 13 years back
• Initial research project stages
– Social media university
– Addictive analytics workshop -> Introduction to Marketing domain
– Pivotal workshop to learn data analytics in social domain -> Relevant Pivotal Data Analytics Platforms: GPText and Pivotal HD
• Later research project stages – practical
– Learning enough about R to build small scale analytics application
– How to leverage Datumbox analytics-as-a-service offering
Summary
• Cloud, social, and data analytics synergy serves Marketing
• "Is there an uptick in +ve/-ve sentiments of my product?" is a strategically important question in the Account Performance phase of a Marketing Campaign
• The research answered the question using a cloud-based computational approach built on a supervised learning method for sentiment analysis
• Data source and data analytics are in the cloud; data preparation and data presentation are on-premise using R
• Future work: Optimize & Tune for Large Datasets -> Can be all Cloud