
This portfolio is a collection of data analysis and machine learning projects I have created to hone my skills as a data scientist and machine learning engineer. Projects are divided into several broad categories below. I recommend viewing the projects in Jupyter nbviewer when that option is available; links are provided below each project title.
View this project on Jupyter nbviewer or GitHub
This project involves the analysis of colleges across all 50 states between the years 2006 and 2016. The analysis focuses on the financial status and employment status of students after graduation, as well as the distribution of degree types, based on geography.
View this project on Jupyter nbviewer or GitHub
This project involves the analysis of responses to the National Youth Tobacco Survey (NYTS) distributed by the Centers for Disease Control and Prevention (CDC). The analysis focuses on the rate of smoking among young people in the United States and seeks to uncover whether the introduction of electronic cigarettes has really sparked a health crisis among the youth of our nation.
View this project on Jupyter nbviewer or GitHub
This project involves the analysis of raw text data in the form of reviews written by users of www.drugs.com. The analysis explores various characteristics of the brand names of the medications as well as the reviews of the drugs themselves. The main goal is to find ways in which drug manufacturers may improve the production quality and marketing of the medications they produce.
View this project on Jupyter nbviewer
This project involves the prediction of the critical temperature for various potential superconducting compounds using a Random Forest Regression model. The critical temperature of a material is notoriously difficult to predict. The model created here may aid researchers in inferring critical temperatures from a compound’s chemical structure and characteristics.
View this project on Rpubs
Investors often rely on intuition and past experience in order to judge whether a start-up company will be worth investing in. This project aims to put forth a rigorous analysis that can be used to identify relevant factors indicative of a successful start-up company. In addition, a machine learning model is created to predict whether or not a company is likely to succeed based on an array of company characteristics and economic measures.
View this project on Jupyter nbviewer or GitHub
In this project, a predictive classification model is built with the goal of predicting whether a child will be diagnosed with Autism Spectrum Disorder (ASD). The process for diagnosing ASD is currently lengthy and costly. A statistical model may relieve some of the shortcomings of the current process, making it more efficient and easier to implement.
View this project on Rpubs
Built a random forest model to predict likely customers from bank marketing data with 87% accuracy using over 30,000 rows of data collected by a Portuguese banking institution.
View this project on Rpubs
In this analysis, part-of-speech tagging and sentiment analysis using a Logistic Regression classification model are performed on Amazon Instant Video data to determine whether marketing of the Amazon Instant Video service can be further improved, leading to an increase in revenue.
View this project on Jupyter nbviewer or GitHub
In this project, a clustering analysis is performed on the expression data of over 20,000 genes across five different types of cancer. In addition, a classification model is built with the goal of predicting the type of cancer present in a patient based on patterns in the expression of their genes. Advances in biology and medicine seem to stagnate as a result of overwhelming amounts of data needing to be analyzed. Machine learning can be used to compress this data into smaller and more meaningful portions of information.
View this project on Jupyter nbviewer
In this project, the customers of a UK-based e-commerce retailer are clustered into groups based on their shopping behaviors, and a Logistic Regression model is created to classify new customers based on their purchase history. The data consists of over 500,000 user transactions spanning 2 years.
View this project on Jupyter nbviewer
In this project, an ARIMA model and Facebook’s Prophet model are used to forecast the future views of some of their most popular web pages with 95% accuracy.
View this project on my blog or Jupyter nbviewer
Used time series analysis and predictive modeling to forecast the trajectory of several emerging tech trends with 95% accuracy, using an ARIMA model and over 10 years of Google Trends time series data.
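The summary above does not say how forecast accuracy is measured. One common convention, assumed here purely for illustration and not confirmed by the project, is to report 100% minus the mean absolute percentage error (MAPE) on held-out data. The sketch below shows that calculation; the function names and the sample numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that
    no actual value is zero (MAPE is undefined in that case).
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def forecast_accuracy(actual, forecast):
    """Accuracy defined as 100 - MAPE (an assumed convention, not the project's stated metric)."""
    return 100.0 - mape(actual, forecast)


# Hypothetical held-out values and forecasts; each forecast is off by 5%.
actual = [100.0, 200.0, 400.0]
forecast = [95.0, 210.0, 380.0]
accuracy = forecast_accuracy(actual, forecast)
print(f"accuracy: {accuracy:.1f}%")
```

Under this convention, a forecast that misses every point by 5% scores roughly 95%. Other metrics (RMSE, coverage of a prediction interval) would give different numbers for the same forecast.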
View this project on Jupyter nbviewer or GitHub
This project explores the inner workings of Artificial Neural Networks. A simple neural network model is built from scratch and then again with the Keras package. Finally, it is turned into a classification agent for the recognition of handwritten digits using the famed MNIST dataset.
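To give a sense of what "from scratch" can mean, here is a minimal sketch of a single sigmoid neuron trained with gradient descent on the logical OR function. It is an illustration only, not the network from the project: the dataset, learning rate, epoch count, and function names are all assumptions, and it uses nothing beyond the Python standard library.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, lr=0.5):
    """Train one sigmoid neuron with per-sample gradient descent.

    samples: list of ((x1, x2), label) pairs with labels 0 or 1.
    Returns the learned weights and bias.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # gradient of cross-entropy loss w.r.t. the pre-activation
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Classify as 1 when the neuron's output is at least 0.5."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


# Toy dataset: logical OR (illustrative; the project itself uses MNIST).
OR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_GATE)
predictions = [predict(weights, bias, x) for x, _ in OR_GATE]
print(predictions)
```

A full network stacks many such units into layers and propagates the error backward through each layer; Keras automates that bookkeeping.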
View this project on Jupyter nbviewer or GitHub
This project probes the inner workings of Convolutional Neural Networks. The structure and function of CNNs are discussed first, followed by the creation of two simple CNNs. The first model attempts to classify images of dogs and cats, while the other distinguishes between three different types of tumors using brain MRI scans.
View this project on Jupyter nbviewer or GitHub
This post uncovers the secrets of the Recurrent Neural Network. The structure and functions of the model are discussed first, followed by a working example of an RNN in action. The goal of the RNN that is built is to predict the future open prices of Google’s stock for January of 2017, using the daily open price, close price, high, low, and volume of the stock between the years of 2012 and 2017.