Personal Portfolio Website

⊱ ⋅Projects⋅ ⊰

Explore my data science projects showcasing skills in machine learning, data analysis, and visualization. Each project demonstrates my approach to solving real-world problems with data. Hover over or click a photo to view its summary.

Featured Project

Predicting Diabetes Risk Through Deep Learning

I led the development of a deep learning solution to predict diabetes risk using uncorrelated health indicators. I evaluated both Feedforward Neural Networks (FNN) and Convolutional Neural Networks (CNN), utilizing Random Forest for feature selection. The FNN achieved 87.5% accuracy, outperforming CNN and demonstrating strong predictive power for independent health variables despite minor overfitting. This project showcases my skills in deep learning model evaluation, feature selection, and health data analysis.

View Project View Slides

Personal Portfolio Website for Data Science Projects
Applications: Responsive Web Design, Custom CSS Animations, SVG Integration, Mobile-First Layout

Responsive Personal Portfolio Website for Data Science Projects

Overview: Designed and developed a personal portfolio website from scratch to showcase data science projects, skills, and professional experience. Implemented a fully responsive layout optimized for both desktop and mobile viewing, with interactive navigation, section-based scrolling, and customized SVG elements.

Results: Successfully launched a fully responsive, visually cohesive portfolio site that effectively presents projects, skills, and professional background. Achieved enhanced user experience through smooth navigation, clear section organization, and mobile-optimized layouts while maintaining consistent branding across all devices.

View Website

Pima Indians Diabetes Predictive Analysis
Applications: XGBoost, SVM, MLPClassifier, Cross-Validation, Confusion Matrix

Overview: Conducted predictive analysis on the Pima Indians Diabetes dataset to detect early signs of diabetes using lab-derived indicators such as glucose, insulin, blood pressure, age, and BMI. Applied data mining pipeline steps including feature selection, correlation analysis, cross-validation, and multiple classification models to evaluate predictive performance. Results confirm that lab work and characteristic features can be used to identify predisposed individuals for early detection and intervention.

Results: Achieved 70.5% accuracy, 75.7% precision, and 77.2% F1-score using SVM, with Decision Tree and XGBoost further supporting that glucose, insulin, and blood pressure are key predictors of diabetes.
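
The metrics reported above all derive from a confusion matrix. As a minimal sketch of that relationship (the counts below are hypothetical placeholders, not the project's actual confusion matrix, and `classification_metrics` is an illustrative helper name):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so the helper never divides by zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (assumption, not project data)
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it can sit above precision whenever recall is the higher of the two.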

View Project

Monet GAN Image Generation
Applications: DCGAN, TPU, Image Augmentation

Overview: Trained a Deep Convolutional GAN (DCGAN) model using the "I'm Something of a Painter Myself" Kaggle dataset to generate Monet-style paintings from real-world photo inputs. Data preprocessing involved image augmentation, resizing, and scaling, with modeling supported by TPU acceleration for performance.

Results: Generated Monet-inspired image transformations, though full training was interrupted by TPU limitations. Despite this, the model demonstrated strong artistic potential.

View Project

Predicting Diabetes Risk Through Deep Learning
Applications: FNN, CNN, Random Forest (feature selection)

Overview: Evaluated two deep learning architectures, the Feedforward Neural Network (FNN) and the Convolutional Neural Network (CNN), to predict diabetes risk from uncorrelated health indicators. Feature selection was done via Random Forest, focusing on Polyuria, Polydipsia, Gender, Sudden Weight Loss, and Partial Paresis.

Results: Achieved 87.5% accuracy with FNN, which outperformed CNN, suggesting that FNN is better suited to independent features. Despite signs of overfitting, FNN demonstrated strong potential in modeling diabetes risk from uncorrelated health variables.

View Project View Slides

Predicting Diabetes Risk Through Unsupervised Learning
Applications: K-Means, Hierarchical Clustering, PCA, NMF, Silhouette Score

Overview: Applied K-Means and Hierarchical Clustering on patient data to identify individuals at risk for diabetes without using labeled outcomes. Reduced dimensionality with NMF before clustering and evaluated model performance using accuracy, precision, confusion matrices, and silhouette scores.

Results: Achieved 81.7% accuracy and 96.3% precision with Hierarchical Clustering, and 80.5% accuracy and 96.7% precision with K-Means Clustering, indicating that both are high-potential methods for evaluating diabetes risk. However, low silhouette scores (<0.45) indicated weak intra-cluster similarity.
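
For readers unfamiliar with the silhouette score mentioned above, here is a minimal pure-Python sketch of how it is computed. It assumes Euclidean distance and uses tiny made-up points; the project itself most likely relied on a library implementation, and `silhouette_score` here is only an illustrative helper.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up 2-D points (assumption): two tight groups far apart
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
well_separated = silhouette_score(points, [0, 0, 1, 1])
mixed_up = silhouette_score(points, [0, 1, 0, 1])
print(well_separated, mixed_up)
```

Scores near 1 mean points sit much closer to their own cluster than to the nearest other cluster; scores near 0 or below mean the clusters overlap.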

View Project View Slides

Predicting Diabetes Risk Through Supervised Learning
Applications: Logistic Regression, Decision Tree, Learning Curves, Confusion Matrix

Overview: Built two predictive models, a Logistic Regression and a Decision Tree Classifier, to identify diabetes risk from symptoms and health indicators such as age, gender, polyuria, and partial paresis. Evaluated performance using accuracy, precision, confusion matrices, and learning curves.

Results: Achieved 94.2% accuracy with Decision Tree and 93.3% with Logistic Regression, showing that both models can effectively predict diabetes risk.

View Project View Slides

Disaster Tweets Classification
Applications: CNN, RNN (LSTM)

Overview: Used NLP and deep learning to classify tweets as real disaster-related posts or not, comparing CNN and RNN (LSTM) models. Used Keras for model design and Tokenizer for preprocessing.
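
The tokenization step can be pictured with a small pure-Python stand-in. This is a simplified sketch of the general idea (map words to integer ids, then pad to a fixed length), not the Keras Tokenizer API; the helper names, the example tweets, and the choice to pad at the end are all assumptions made for illustration.

```python
from collections import Counter


def fit_word_index(texts):
    """Assign integer ids to words, most frequent first; 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_padded(texts, word_index, max_len):
    """Convert texts to fixed-length id sequences, truncating or zero-padding at the end."""
    sequences = []
    for text in texts:
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        ids = ids[:max_len]
        sequences.append(ids + [0] * (max_len - len(ids)))
    return sequences


# Hypothetical tweets, not taken from the Kaggle dataset
tweets = ["Forest fire near the town", "I love the fire emoji"]
word_index = fit_word_index(tweets)
padded = texts_to_padded(tweets, word_index, max_len=6)
print(padded)
```

Fixed-length integer sequences like these are what an embedding layer in a CNN or LSTM model typically consumes.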

Results: Achieved a Kaggle score of 0.56 with CNN, which slightly outperformed RNN. However, both models underperformed due to CNN overfitting and RNN failing to learn effectively. Future improvements include better hyperparameter tuning and more complex layer configurations.

View Project

NYPD Shooting Incident Data Analysis
Applications: Exploratory Data Analysis, Data Wrangling, Data Visualization

Overview: Analyzed 15 years of NYPD shooting incident data to investigate how borough location, race, and sex affect the likelihood of becoming a shooting victim in NYC. Built visualizations and performed trend modeling to assess patterns over time and across demographics.

Results: Found that sex was the most influential factor, with males consistently at higher risk, while race and location showed minimal predictive power. Although Brooklyn and the Bronx had more shootings, the pattern was proportional across boroughs and years.

View Project

Analysis of Covid-19 Death Rates by Continent
Applications: Exploratory Data Analysis, Data Wrangling

Overview: Analyzed Covid-19 death rates across continents using April 2021 data to explore the relationship between death rates and various socioeconomic and health indicators: population density, extreme poverty, elderly population, hospital beds, life expectancy, cardiovascular death rates, and diabetes prevalence. Investigated trends through visualizations including univariate, bivariate, and multivariate analyses.

Results: Explored 7 variables across continents and found that age (especially the population aged 70 and over), diabetes prevalence, and extreme poverty were most strongly correlated with Covid-19 death rates, based on correlation analysis by continent.

View Project View Slides

Katherine Nguyen

Data Scientist | Data Analyst

⊱ ⋅About⋅ ⊰

Get to Know Me

Curious by Nature, Data by Choice

⊱ ⋅Skills⋅ ⊰

Python

R

SQL

⊱ ⋅Certifications⋅ ⊰

IBM Data Analyst Certificate

Microsoft Power BI Data Analyst

Data Science Graduate Certificate

⊱ ⋅Work Experience⋅ ⊰

Evidence-Based Policy (EBP) Intern