Code Projects
Code projects on GitHub
Natural Language Processing
Mainly from CSCI 5832 Natural Language Processing and previous research
- Text classification with XLNet, logistic regression, and BiLSTM/SVM with embeddings. The data include hotel reviews and SemEval shared task 4, ‘Don’t Patronize Me’ detection. Implemented with PyTorch, Keras, TensorFlow, and scikit-learn. [repo]
- Fine-tuning models on downstream tasks such as extractive Q&A, machine translation, masked language modeling, and token classification, as well as training a GPT-2 language model from scratch, all with HuggingFace Transformers. [repo]
- Named Entity Recognition with token-based tagging (BIO/BIOE)
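As a rough illustration of the token-based BIO tagging mentioned in the NER item above, here is a minimal sketch of converting entity spans to BIO tags and back. The function names, the span format `(start, end_exclusive, label)`, and the example sentence are assumptions made for this sketch; they are not taken from the project code.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, entity label).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Turn entity spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags (stray I- tags are ignored)."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


tokens = ["Barack", "Obama", "visited", "Boulder"]
tags = spans_to_bio(len(tokens), [(0, 2, "PER"), (3, 4, "LOC")])
```

Here `tags` lines up one-to-one with `tokens`, which is the shape a token classification model is typically trained to predict.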
Machine Learning
Mainly from CSCI 5622 Machine Learning
- Course Project: The bigger, the better? [Poster] [email me to access the code]
- Compare image classification performance on the CIFAR-10 dataset between a vanilla CNN and a CNN built on a pretrained ResNet-50 model.
- The experiments show that after preprocessing the noisy data with Deep Image Prior (DIP), a vanilla CNN performs at a level similar to a more complex CNN leveraging ResNet-50 without denoising. The vanilla CNN also trains faster and has fewer weights.
- We also find that variables such as the optimizer and upsampling affect different CNNs differently.
- Kaggle competition: Bike shared count [Kaggle page] [email me to access the code]
- Leverage exploratory data analysis, feature engineering for handling data format features, feature transformation and normalization, and XGBoost with parameter grid search.
- Ranked #3/52 on the public leaderboard with R^2 = 0.935 and #9/52 on the private leaderboard with R^2 = 0.939.
- Implementations from scratch with numpy to mimic scikit-learn: KNN, Naive Bayes, Decision Tree, Bagging and Boosting, K-means, logistic regression, MLP [email me to access the code]
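To give a flavor of what a from-scratch implementation like the KNN one listed above involves, here is a minimal sketch of a k-nearest-neighbors classifier. It is not the project code: it uses only the Python standard library rather than numpy so that it stays self-contained, and the function name `knn_predict` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(
    train_X: List[Sequence[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Predict a label by majority vote among the k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only, so labels never need to be comparable.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote; on a tie, the label seen first (i.e. nearest) wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_X, train_y, query=(0.2, 0.1), k=3)
```

A numpy version would typically replace the distance loop with a vectorized computation over the whole training matrix, but the voting logic stays the same.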
Computer Vision
Mainly from CSCI 5922 Neural Networks and Deep Learning
- MNIST recognition with scikit-learn [code]
- CIFAR-10 image classification with Keras (using L2 regularization, batch normalization, and dropout) [code]
- VizWiz challenge: VQA for blind people. Question and image features are extracted via BERT and VGG-16, and the flattened feature vector is then fed into an MLP. [code]
Software Development and Object-Oriented Design
Mainly from CSCI 5448 Object-Oriented Analysis and Design
- UMR writer: An online annotation platform for the UMR project, built on the Flask framework with PostgreSQL as the database and SQLAlchemy as the ORM. The front end uses vanilla JS, jQuery, HTML, and CSS, and focuses mainly on event listeners and DOM operations. The application is deployed on Heroku. [repo]
- A music store simulation with functionality such as placing orders, checking inventory, and selling and buying items. The code applies several design patterns: factory, strategy, decorator, command, observer, and singleton. The design is documented with a class UML diagram, a state diagram, and a sequence diagram. Written in Java. [repo]
- An online e-commerce shopping website. Implemented with Django 2, using JS and jQuery for the front end and Django admin for back-end management. [repo]
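Two of the design patterns named in the music store item above, strategy and observer, can be sketched in a few lines. The original project is written in Java; this sketch is in Python to match the other examples on this page, and every name in it (`Store`, `half_price_sale`, and so on) is hypothetical rather than taken from the repo.

```python
from typing import Callable, Dict, List

# Strategy: a pricing rule is any callable from base price to final price.
PricingStrategy = Callable[[float], float]


def regular_price(price: float) -> float:
    return price


def half_price_sale(price: float) -> float:
    return price * 0.5


class Store:
    """Toy store that takes a pricing strategy and notifies observers on sales."""

    def __init__(self, pricing: PricingStrategy = regular_price) -> None:
        self.pricing = pricing
        self.stock: Dict[str, int] = {}
        self.prices: Dict[str, float] = {}
        self._observers: List[Callable[[str], None]] = []

    def subscribe(self, observer: Callable[[str], None]) -> None:
        # Observer: any callable that accepts an event message.
        self._observers.append(observer)

    def add_item(self, name: str, price: float, quantity: int) -> None:
        self.prices[name] = price
        self.stock[name] = self.stock.get(name, 0) + quantity

    def sell(self, name: str) -> float:
        if self.stock.get(name, 0) <= 0:
            raise ValueError(f"{name} is out of stock")
        self.stock[name] -= 1
        for observer in self._observers:
            observer(f"sold {name}")
        # The store does not know which pricing rule it was given.
        return self.pricing(self.prices[name])


events: List[str] = []
store = Store(pricing=half_price_sale)
store.subscribe(events.append)
store.add_item("guitar", 200.0, 1)
charged = store.sell("guitar")
```

Swapping `half_price_sale` for `regular_price` changes the charged amount without touching `Store`, and any number of observers (a logger, an inventory tracker) can be attached the same way.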