Portfolio - Projects and Tools

This page lists the projects I have built, either on my own or as part of a hackathon team. All of these projects can be found on my GitHub page.

News Bias Detection with PyTorch and TensorFlow

This project focused on building several neural network models with PyTorch, Gensim, and TensorFlow that are capable of detecting bias in news articles. The project uses the "All the News" (2016) dataset from Kaggle, as is common in the literature. The dataset contains 144K articles from 15 different news sources. The articles do not come with bias labels, so I had to generate them. Each article lists its publisher, so I cross-referenced each publisher with its bias rating on mediabiasfactcheck, a source that is used heavily in the literature and is regarded as reputable for verifying news bias. Each article then received the bias label of its publisher.
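The labeling step described above can be illustrated with a minimal sketch. It assumes each article is a dictionary with a "publication" field; the publisher names, the bias labels, and the function name `label_articles` are placeholders for illustration, not the project's actual data or code.

```python
# Hypothetical publisher-to-bias lookup. The names and labels below are
# placeholders, not the values used in the project.
PUBLISHER_BIAS = {
    "Publisher A": "left",
    "Publisher B": "center",
    "Publisher C": "right",
}


def label_articles(articles, publisher_bias):
    """Attach the publisher's bias label to each article dict.

    Articles whose publisher is missing from the lookup are skipped
    rather than given a guessed label.
    """
    labeled = []
    for article in articles:
        bias = publisher_bias.get(article["publication"])
        if bias is None:
            continue
        labeled.append({**article, "bias": bias})
    return labeled


articles = [
    {"title": "Example headline 1", "publication": "Publisher A"},
    {"title": "Example headline 2", "publication": "Publisher C"},
    {"title": "Example headline 3", "publication": "Unknown Outlet"},
]

labeled = label_articles(articles, PUBLISHER_BIAS)
```

Because every article inherits its publisher's label, the labels describe the outlet rather than the individual article.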

Data pre-processing involved generating feature sets based on Moral Foundations Theory, Information Theory, and Semantic Analysis, as detailed in my write-up of the project linked at the end of this post. This resulted in 39 distinct features for each article. Note that I did not perform Principal Component Analysis to identify the best features to use; this was left for future work. Generating these features largely involved working with word vectors, using the GloVe 4B words pre-trained word vector model as a backing distribution. As it turns out, the Gensim library provides excellent tools for loading, generating, and working with word vectors. I recommend it to anyone who wants to get into natural language processing. The only downside is that it does not support GPU acceleration, though there are unofficial extensions for the library that add it.
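To give a feel for what a word-vector feature can look like, here is a minimal sketch that scores an article by the cosine similarity between its average word vector and the average vector of a small seed-word list. This is one plausible feature design, not necessarily any of the 39 features used in the project. The toy three-dimensional vectors, the seed words, and the function names are all assumptions; in practice the vectors would come from a pre-trained model loaded through a library such as Gensim.

```python
import math

# Toy 3-dimensional vectors standing in for pre-trained embeddings.
# These numbers are made up for illustration only.
TOY_VECTORS = {
    "care": [0.9, 0.1, 0.0],
    "harm": [0.8, 0.2, 0.1],
    "tax": [0.1, 0.9, 0.2],
    "policy": [0.0, 0.8, 0.3],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have an embedding."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def seed_similarity_feature(article_words, seed_words, vectors):
    """Similarity between an article's mean vector and a seed list's mean vector."""
    article_vec = mean_vector(article_words, vectors)
    seed_vec = mean_vector(seed_words, vectors)
    if article_vec is None or seed_vec is None:
        return 0.0
    return cosine_similarity(article_vec, seed_vec)


article = "the new tax policy".split()
care_score = seed_similarity_feature(article, ["care", "harm"], TOY_VECTORS)
policy_score = seed_similarity_feature(article, ["tax", "policy"], TOY_VECTORS)
```

Words without an embedding ("the", "new") are simply ignored, so the article above is represented by "tax" and "policy" alone and scores higher against the policy seed list than against the care seed list.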

The documentation and implementation for this project may be found at its GitHub repository.

Senslify - Real Time Sensor Monitoring

This project was conducted as part of the IoT Collaborative during Summer 2019. Senslify is a fully asynchronous, real-time sensor monitoring application. It is built in Python 3 and uses WebSockets, aiohttp, and MongoDB. Senslify allows users to organize sensors into deployment groups, assign sensors unique identifiers, monitor sensor readings in real time, view and download historical sensor data, view aggregated sensor statistics, and be alerted when a sensor reading exceeds a user-configurable threshold.
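The threshold alerting idea can be sketched with the standard library's asyncio alone. This is not Senslify's code: the reading format, the `monitor` coroutine, and the sample values are assumptions made for illustration, and the real application works over WebSockets with aiohttp and MongoDB.

```python
import asyncio


async def monitor(readings, threshold, alerts):
    """Consume readings from a queue and record those above the threshold."""
    while True:
        reading = await readings.get()
        if reading is None:  # sentinel: no more readings
            break
        if reading["value"] > threshold:
            alerts.append(reading)


async def main():
    readings = asyncio.Queue()
    alerts = []
    consumer = asyncio.create_task(
        monitor(readings, threshold=30.0, alerts=alerts)
    )
    # Hypothetical temperature readings from a single sensor.
    for value in (21.5, 31.2, 29.9, 35.0):
        await readings.put({"sensor_id": 1, "value": value})
    await readings.put(None)
    await consumer
    return alerts


alerts = asyncio.run(main())
```

The consumer never blocks the event loop while waiting for data, which is the property that lets an application of this kind serve many sensors and clients at once.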

Senslify was built from the ground up in conjunction with Devendra Waikul as a part of his Computer Engineering Master's thesis at Case Western Reserve University. Senslify is an open-source project. All of the documentation for Senslify, from the requirements engineering stage to the deployment and maintenance stage, is freely available online. Furthermore, Senslify was built using fully open-source, royalty-free software and is licensed under the MIT open-source license. If you are interested in contributing to its development, please fork it on GitHub and make a pull request.

The documentation and implementation for this project may be found at its GitHub repository.

Video Game Review Classification and Score Generation

This project focused on building two data processing pipelines capable of classifying game reviews and generating review scores from the review text alone. This project used Stanford CoreNLP and Google BERT as the classification and review score generation components of their respective pipelines. Data was politely sourced over the course of three days from Metacritic and consists of ~400K professional game reviews and ~600K user reviews.

Review classification and score generation each reached approximately 90% accuracy. Both pipelines were trained using an 80/20 split of training/validation data. In both pipelines, data was processed in an identical fashion.
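An 80/20 training/validation split can be produced in a few lines. The sketch below is a generic illustration rather than the project's code; the function name, the fixed seed, and the placeholder records are assumptions.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into two parts."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


reviews = [f"review {i}" for i in range(10)]
train, validation = train_validation_split(reviews)
```

Shuffling before cutting matters when the source data is ordered, for example by game or by date, since an unshuffled cut would put entire groups in only one of the two sets.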

The documentation and implementation for this project may be found at its GitHub repository.

Coursely - Full Stack Course Management Software

This project implements full-stack course management software from the ground up and was completed in approximately 300 hours. The web application divides users into three groups, each with its own set of permissions and use cases: Administrators, Professors, and Students. Administrators have unrestricted administrative access to the web application. As such, they are responsible for creating student and professor accounts, adding courses to the application, assigning advisors to advisees, archiving and unarchiving courses, and so on. Professors use the web application to schedule appointments with their advisees, list the courses they are scheduled to teach along with their enrollment, assign grades to students, and so on. Students are able to enroll in and drop classes by semester, view their grades by semester, see who their advisor is and when their appointments with them are, and so on.

This project was implemented with the following tech stack: ASP.NET Web Forms/Bootstrap, IIS, and MS SQL Server. For simplicity, user accounts are stored in the same database as other site data; however, all passwords are both hashed and salted in accordance with modern security standards. Courses may have prerequisites and co-requisites, which the database design accounts for and the web application enforces when students attempt to enroll in courses. Professors may not view each other's courses or advisees. Students may not view each other's grades or courses. The full design of this project, from requirements elicitation and analysis to maintenance and evolution, is entirely open-source and freely viewable.
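Salting and hashing passwords can be illustrated as follows. The project itself is an ASP.NET application, so this Python sketch only shows the general technique. The choice of PBKDF2 with SHA-256, the iteration count, and the function names are assumptions, since the page does not say which algorithm the project uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not taken from the project


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a salt if needed."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each account
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse battery staple")
```

Only the salt and the digest are stored. At login, `verify_password` repeats the derivation with the stored salt, so the plain-text password never needs to be kept in the database.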

The documentation and implementation for this project may be found at its GitHub repository.

Ohio Safe Businesses

Built as part of the 24-hour MakeUC 2020 hackathon, Ohio Safe Businesses is a COVID-19 compliance reporting tool for businesses across Ohio. This web application integrates with Google Maps, allowing users to search for businesses on a map and report their compliance with the 2019 state of Ohio COVID-19 guidelines on a scale of 1-10. Reviews are aggregated into an average score per guideline as well as an overall score for the establishment. These reviews can be used by potential customers to gauge whether they should patronize an establishment. They can also be used by business owners to gauge how well their employees adhere to the Ohio 2019 COVID-19 guidelines.
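The aggregation described above can be sketched in a few lines. The guideline names and scores are hypothetical, and defining the overall score as the mean of the per-guideline averages is an assumption; the application may compute it differently.

```python
from statistics import mean


def aggregate_reviews(reviews):
    """Return (per-guideline averages, overall score) for one business."""
    by_guideline = {}
    for review in reviews:
        for guideline, score in review.items():
            by_guideline.setdefault(guideline, []).append(score)
    per_guideline = {g: mean(scores) for g, scores in by_guideline.items()}
    # Assumed definition: overall score = mean of the guideline averages.
    overall = mean(list(per_guideline.values())) if per_guideline else None
    return per_guideline, overall


# Hypothetical reviews, each scoring two guidelines on a 1-10 scale.
reviews = [
    {"masks": 8, "distancing": 6},
    {"masks": 10, "distancing": 4},
]
per_guideline, overall = aggregate_reviews(reviews)
```

For the two reviews above, the guideline averages are 9 for "masks" and 5 for "distancing", giving an overall score of 7.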

This project placed 3rd overall among a swathe of projects that incorporated machine learning and artificial intelligence into their applications. It was built as a team effort by Nathan Dixon, Nicholas Cleary, and David Fu.

The documentation and implementation for this project may be found at its GitHub repository.

Gutenberg Search

This project implements a full-text search engine on top of the Project Gutenberg e-Text library. Search uses well-known metrics from Information Theory and is implemented with the aggregation feature in MongoDB. Data was sourced from the Project Gutenberg archive and preprocessed using a rather complex Python script. On a single node, the search function takes ~9s on average to complete. I did not have the hardware to shard my MongoDB database, but doing so would drastically improve user response time.
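The page does not name the metrics used, so the following sketch assumes TF-IDF, a common choice for ranking full-text search results. It runs in plain Python over a toy in-memory corpus rather than in a MongoDB aggregation pipeline, and the documents and function names are made up for illustration.

```python
import math
from collections import Counter


def build_index(documents):
    """Count terms per document and the number of documents containing each term."""
    term_counts = {
        doc_id: Counter(text.lower().split())
        for doc_id, text in documents.items()
    }
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, term_counts, doc_freq):
    """Rank documents by the summed TF-IDF of the query terms."""
    n_docs = len(term_counts)
    scores = {}
    for doc_id, counts in term_counts.items():
        total_terms = sum(counts.values())
        score = 0.0
        for term in query.lower().split():
            if term not in counts:
                continue
            tf = counts[term] / total_terms
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


documents = {
    "doc1": "the whale hunted the ship",
    "doc2": "the ship sailed at dawn",
    "doc3": "a garden at dawn",
}
term_counts, doc_freq = build_index(documents)
results = search("whale ship", term_counts, doc_freq)
```

Rare terms carry more weight than common ones, so "doc1", which contains the rarer word "whale", ranks ahead of "doc2", which only matches "ship".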

The documentation and implementation for this project may be found at its GitHub repository.

TinyOS Python 3 Library

This project focused on porting the tinyos Python 2 library to Python 3. During my PhD study, I was tasked with building a sensor network using TinyOS motes. I wanted to use Python for data collection and processing, but as it turned out, the official tinyos library for Python was only compatible with Python 2, so I set about porting it to Python 3. I first ran the Python tool 2to3 in verbose mode to figure out which files in the library would change and where. I then applied the tool to automate most of the changes. Afterwards, I only had to make minor changes to several files to complete the port. As it stands, however, it is largely untested outside of my own application of it. I consider this fine, as TinyOS itself is not used all that heavily today thanks to Arduino, Raspberry Pi, and similar platforms, so my port doesn't see much use. This was an excellent learning experience for me in porting Python applications from Python 2 to Python 3.

The documentation and implementation for this project may be found at its GitHub repository.

TinyOS HideMsg Application

This project focused on creating an inter-mote application that could be used to securely transfer arbitrary-length data between TinyOS motes. Sender and receiver node addresses are masked during transmission using a simple arithmetic function, while data is masked using a chunking function. These mechanisms improve the overall anonymity of both the sender and receiver and keep the data payload secured on its way to its final destination. This project was proven secure in a graduate-level laboratory study.
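The page does not give the actual masking or chunking functions, so the sketch below only illustrates the general shape of the idea. The modular-addition mask, the 16-bit address space, the shared key, the 8-byte chunk size, and all function names are assumptions. It is written in Python for readability and is not the application's own code.

```python
ADDRESS_SPACE = 2 ** 16  # assumes 16-bit node addresses


def mask_address(address, key):
    """Hide a node address with a simple reversible arithmetic function."""
    return (address + key) % ADDRESS_SPACE


def unmask_address(masked, key):
    """Recover the original address from its masked form."""
    return (masked - key) % ADDRESS_SPACE


def chunk_payload(data, chunk_size):
    """Split arbitrary-length data into fixed-size pieces for transmission."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks):
    """Join received chunks back into the original payload."""
    return b"".join(chunks)


key = 12345  # hypothetical value shared by sender and receiver
masked = mask_address(42, key)
chunks = chunk_payload(b"arbitrary length message", 8)
```

The receiver reverses both steps with `unmask_address(masked, key)` and `reassemble(chunks)`; the last chunk may be shorter than the others when the payload length is not a multiple of the chunk size.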

The documentation and implementation for this project can be found at its GitHub repository.