Summary
Over the past few years I have become a huge fan of the artist Divide Music and the music he releases. Some of my favorite songs include Drag Me Under, Stand Tall, and By My Side. I love how he carries the feeling and emotion of a movie or show into his songs, which he pairs with the music videos he creates for YouTube. I know his channel has been growing over the last few years, but I wanted to do this project to find out just how much.
In this project we aim to answer the following questions…
What is the average number of views and likes to expect per video?
What are the top songs based on number of likes?
How many likes can you expect on your next video based on the number of views?
Does the Life of the Channel have a significant effect on the number of views or likes they can expect per day?
To collect and clean the data I use Python and the Google API to pull data for the Divide Music channel and each of its videos. I then store this data in MySQL so that visualization and statistical software can access it easily. Finally, I create a view that joins the Channel data to the Video data and adds extra fields to help with the analysis.
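As a rough illustration of the join-and-derive step described above, here is a minimal sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL so it can run anywhere. Every table name, column name, and row is hypothetical, and the way `life_of_channel` and `views_per_day` are derived is an assumption about the project, not its actual schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here.
# All table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE channel (
    channel_id   TEXT PRIMARY KEY,
    title        TEXT,
    created_date TEXT      -- ISO date the channel was created
);
CREATE TABLE video (
    video_id       TEXT PRIMARY KEY,
    channel_id     TEXT REFERENCES channel(channel_id),
    title          TEXT,
    published_date TEXT,   -- ISO date the video was published
    views          INTEGER,
    likes          INTEGER,
    pulled_date    TEXT    -- ISO date the statistics were collected
);

-- Join channel to video and derive helper fields for analysis.
-- These derivations are assumptions, not the project's definitions.
CREATE VIEW video_analysis AS
SELECT
    v.video_id,
    v.title,
    v.views,
    v.likes,
    julianday(v.published_date) - julianday(c.created_date) AS life_of_channel,
    julianday(v.pulled_date) - julianday(v.published_date)  AS days_released,
    v.views * 1.0
        / (julianday(v.pulled_date) - julianday(v.published_date)) AS views_per_day
FROM video AS v
JOIN channel AS c ON c.channel_id = v.channel_id;
""")

# Made-up rows, only to exercise the view.
conn.execute("INSERT INTO channel VALUES ('ch1', 'Example Channel', '2020-01-01')")
conn.execute(
    "INSERT INTO video VALUES "
    "('v1', 'ch1', 'Example Song', '2020-01-11', 5000, 250, '2020-01-21')"
)

rows = conn.execute(
    "SELECT title, life_of_channel, days_released, views_per_day FROM video_analysis"
).fetchall()
print(rows)
```

Keeping the derived fields in a view means every downstream tool reads the same definitions instead of recomputing them.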
The data is visualized with R on this website and in Tableau. These visualizations try to answer the questions above through Descriptive, Diagnostic, Predictive, and Prescriptive analysis. This way we can describe what has happened, why it happened, what might happen in the future, and what courses of action the channel might take.
Discover New Music
Within Shinyapps, I have built an interactive survey that aims to introduce you to the amazing world of Divide Music. The survey asks about your musical tastes so that I can recommend videos that resonate with you. Because Divide Music’s catalog is extensive, this approach offers a variety of suggestions and helps you uncover hidden gems that align with your preferences. Whether you have already heard of Divide Music or are just beginning your journey, my survey is your gateway to discovering the music that truly speaks to you.
Predictive Analysis
Here we want to predict the number of views and likes the Divide Music YouTube channel will get on its next video; new videos are typically published every 2 weeks. To do this, we run a simple linear regression on how the life of the channel affects the number of views or likes the channel can expect per day.
Exploratory Analysis on Views Per Day
Using ggplot2 and plotly we graph the relationship between the Life of the Channel and the average number of views a video has received per day since its release. Because the graph shows an upward linear relationship, we can feel confident in moving forward with this explanatory variable.
Cleaning the Views Per Day Data
Within this dataset we can see several outliers. These occur when a video has either been out for a long time or has not been out long enough. To keep these observations from affecting the model, we remove them using a package I developed that cleans outliers based on the Interquartile Range (the spread between the 25th and 75th percentiles). This eliminates the following percentage of our data…
## [1] "12%"
Our data now looks like so…
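The cleaning step in this section can be sketched in Python as follows. This is not the package used above: the function name is hypothetical, the sample values are made up, and the common 1.5 × IQR fence is an assumption, since the exact cut-off is not stated here.

```python
import statistics


def remove_iqr_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence and is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Made-up views-per-day values with one obvious outlier.
sample = [100, 120, 130, 150, 160, 180, 200, 5000]
cleaned = remove_iqr_outliers(sample)

# Share of observations dropped by the filter.
removed_share = (len(sample) - len(cleaned)) / len(sample)
print(cleaned)
print(f"{removed_share:.1%} of observations removed")
```

A larger `k` keeps more of the data, so the percentage removed depends directly on that choice.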
Creating the Model for Views Per Day
In this model we use Linear Regression to make a prediction. The summary below shows how well the model fits the data provided, which helps us judge how much to trust the prediction of future views.
In this model the P-value is below .05, indicating that the Life of the Channel is statistically significant for the expected number of views gained per day. The R-squared and Adjusted R-squared values are around .52, which indicates a moderate fit: roughly 48% of the variability is not explained by the model. I believe the fit will improve as more videos are released.
##
## Call:
## lm(formula = views_to_life_ratio ~ life_of_channel, data = model.data.clean)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1101.77  -244.01   -73.48   175.60  1473.60
##
## Coefficients:
##                   Estimate Std. Error t value             Pr(>|t|)
## (Intercept)     -417.20931   97.81565  -4.265            0.0000358 ***
## life_of_channel    0.86508    0.06845  12.637 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 487 on 145 degrees of freedom
## Multiple R-squared: 0.5241, Adjusted R-squared: 0.5208
## F-statistic: 159.7 on 1 and 145 DF, p-value: < 0.00000000000000022
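For readers more at home in Python than R, the following sketch shows what a simple linear regression computes: a slope, an intercept, and an R-squared. It is a plain least-squares implementation, not the `lm` call used above, and the function name and toy data are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Toy data (made up): channel age in days vs. views per day.
days = [100, 200, 300, 400, 500]
views_per_day = [60, 150, 260, 330, 450]

slope, intercept, r2 = fit_simple_ols(days, views_per_day)
print(f"slope={slope:.4f} intercept={intercept:.2f} r_squared={r2:.4f}")
```

The slope plays the role of the `life_of_channel` estimate in the summary above, and `r_squared` corresponds to the Multiple R-squared line.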
Prediction on Views Per Day for the Next Video
Finally, now that our model is created, we can run a prediction to determine the average number of views to expect for the next video Divide Music launches. A new video is typically launched every 2 weeks on a Friday, and that is what my prediction is based on.
The expected number of views per day is…
## 1
## 1816.4
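The prediction above is the fitted line evaluated at a single value of `life_of_channel`. The sketch below redoes that arithmetic in Python with the coefficients from the model summary. The input used for the prediction is not stated here, so the value of 2,582 days is an assumption, chosen only because it approximately reproduces the reported figure; the function name is also hypothetical.

```python
# Coefficients copied from the views-per-day model summary.
INTERCEPT = -417.20931
SLOPE = 0.86508


def predict_views_per_day(life_of_channel_days):
    """Evaluate the fitted line: intercept + slope * life_of_channel."""
    return INTERCEPT + SLOPE * life_of_channel_days


# Assumed channel age in days at the next release (not given in the write-up).
assumed_life_of_channel = 2582

prediction = predict_views_per_day(assumed_life_of_channel)
print(round(prediction, 1))
```

Because the slope is about 0.87, each additional day of channel life adds a little under one expected view per day to the next video.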
Exploratory Analysis on Likes Per Day
Using ggplot2 and plotly we graph the relationship between the Life of the Channel and the average number of likes a video has received per day since its release. Because the graph shows an upward linear relationship, we can feel confident in moving forward with this explanatory variable.
Cleaning the Likes Per Day Data
Within this dataset we can see several outliers. These occur when a video has either been out for a long time or has not been out long enough. To keep these observations from affecting the model, we remove them using a package I developed that cleans outliers based on the Interquartile Range (the spread between the 25th and 75th percentiles). This eliminates the following percentage of our data…
## [1] "15%"
Our data now looks like so…
Creating the Model for Likes Per Day
In this model we use Linear Regression to make a prediction. The summary below shows how well the model fits the data provided, which helps us judge how much to trust the prediction of future likes.
In this model the P-value is below .05, indicating that the Life of the Channel is statistically significant for the expected number of likes gained per day. The R-squared and Adjusted R-squared values are around .65, which indicates a fairly strong fit, although about 35% of the variability is not explained by the model. This is a better-fitting model than the one for Views Per Day.
##
## Call:
## lm(formula = likes_to_life_ration ~ life_of_channel, data = model.data.clean)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -16.1370  -3.4166  -0.7227   2.8804  18.8222
##
## Coefficients:
##                   Estimate Std. Error t value             Pr(>|t|)
## (Intercept)     -7.5649288  1.2296156  -6.152        0.00000000756 ***
## life_of_channel  0.0146467  0.0009153  16.002 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.74 on 140 degrees of freedom
## Multiple R-squared: 0.6465, Adjusted R-squared: 0.644
## F-statistic: 256.1 on 1 and 140 DF, p-value: < 0.00000000000000022
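The figures in the summary above are related to one another, and checking those relationships is a quick way to read the output. The sketch below uses only numbers copied from the likes-per-day summary. It relies on two standard facts about a regression with one predictor: the t value is the estimate divided by its standard error, and the F-statistic equals the square of the slope's t value. The variable names are my own.

```python
# Values copied from the likes-per-day model summary.
estimate = 0.0146467      # slope for life_of_channel
std_error = 0.0009153     # standard error of the slope
r_squared = 0.6465        # Multiple R-squared
residual_df = 140         # residual degrees of freedom

# t value = estimate / standard error.
t_value = estimate / std_error

# With a single predictor, the F-statistic is the slope's t value squared.
f_statistic = t_value ** 2

# Share of variability the model leaves unexplained.
unexplained = 1 - r_squared

# Two parameters were estimated (intercept and slope), so the number of
# observations is the residual degrees of freedom plus 2.
n_observations = residual_df + 2

print(f"t = {t_value:.3f}, F = {f_statistic:.1f}")
print(f"unexplained = {unexplained:.1%}, n = {n_observations}")
```

The same checks applied to the Views Per Day summary imply that model was fit on 147 observations after cleaning.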
Prediction on Likes Per Day for the Next Video
Finally, now that our model is created, we can run a prediction to determine the average number of likes to expect for the next video Divide Music launches. A new video is typically launched every 2 weeks on a Friday, and that is what my prediction is based on.
The expected number of likes per day is…
## 1
## 30.3
Conclusion
Here we were able to predict the average number of views and likes per day that the Divide Music YouTube channel will get on its next video; new videos are typically published every 2 weeks. To do this we ran two linear regression models, using the Life of the Channel as the Explanatory Variable to produce the predictions above.
This variable proved significant in both models, and the models explained a moderate to fairly high share of our data’s variability. I am so excited to have used this data to show how much this channel has grown over the years. Their content has improved drastically and I look forward to their videos almost every 2 weeks. I invite you to check out their content on YouTube as well.
Explore Divide Music Visually
Tableau
Tableau Public is a platform created by Tableau for users to share interactive visualizations with a public community. For me, Tableau is where creativity meets data, and presenting data in a way that appeals to others has always been my goal. Here I published a dashboard as a tool to help others find specific information pertinent to this project. Please take a look!
Metabase
Metabase is an open-source business intelligence and data visualization tool that allows users to easily create interactive dashboards and analyze data without the need for extensive technical expertise. It provides a user-friendly interface for querying databases, exploring datasets, and generating visualizations such as charts and graphs. Metabase supports various data sources, including SQL databases, NoSQL databases, and cloud-based data warehouses. Its goal is to democratize data access within organizations, enabling non-technical users to make informed decisions based on data insights. Metabase is designed to be easy to install, configure, and use, making it a popular choice for organizations seeking a simple yet powerful business intelligence solution.
Project Repo
GitHub
I use GitHub to share personal projects, collaborate with others, and host static websites where these projects can be viewed. Within this repo you can view the tools, assets, and code used in this project, including but not limited to R Scripts, Tableau Workbooks, Image Files, and Data Files. Please feel free to take a look at the projects I have listed and test them out for yourself!