Projects

1. Recommendation System using Collaborative Filtering: Using the Yelp dataset, a recommendation system for Toronto restaurants was built from users' past rating histories. It was implemented in Python using artificial neural networks.

2. Churn Prediction for a Business Entity using the Rossmann Kaggle Dataset: Customers were classified from their activity data by churn probability, so that discounts could be offered according to each customer's risk of churning. Ensemble and Naive Bayes classifiers were compared against a simple logistic regression baseline on overall accuracy, and the models were analysed with the ROC-AUC score along with precision, recall, and F1-score.

3. Detecting Heart-Disease-Prone Patients using Classification Techniques: The Cleveland dataset contains records of patients who may or may not have heart disease. A classifier that sacrifices some overall accuracy for a high rate of correctly flagging positive cases (high recall) might help prevent heart disease in the majority of at-risk patients. Such a classifier was modelled with simple logistic regression and an adjustable decision threshold.

4. Analysing Particulate Matter as a Possible Carrier of the COVID-19 Virus using Machine Learning (Ongoing): Weather and air-quality data for Delhi suburbs are collected and used to compute the Pearson correlation coefficient between particulate matter and COVID-19 spread. Random forest and gradient boosting regressors are trained on the data with hyperparameter tuning, and their outputs are used to estimate the increase in indoor mortality rate caused by infiltration of outdoor particulate matter.

5. Causal Inference Analysis of Data using a BERT Classifier (Ongoing): A pretrained BERT-base model is used as a classifier to identify statements that express causality. The annotated data is visualised to find which words are uniquely associated with impact or causality, and topic modelling is carried out to group the data into topic distributions. The dataset was found to be imbalanced, which was rectified by data augmentation using language translation techniques. Finally, the pretrained BERT classifier is fine-tuned on the stemmed data; it achieved 94% accuracy in detecting causality on the explicitly populated data.
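The collaborative-filtering idea behind project 1 can be illustrated with a minimal sketch. It swaps in plain matrix factorization (learned user and item embeddings plus biases, trained by stochastic gradient descent) as a simpler stand-in for the neural network the project used; the function names `train_mf` and `rmse`, the toy ratings, and every hyperparameter are hypothetical.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on (user, item, rating) triples.

    Hypothetical stand-in for a neural collaborative-filtering model;
    all hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users  # per-user bias
    bi = [0.0] * n_items  # per-item bias
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user embeddings
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item embeddings

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                p, q = P[u][f], Q[i][f]  # use pre-update values for both factors
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return predict

def rmse(predict, ratings):
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical (user, restaurant, stars) triples, not Yelp data
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
predict = train_mf(ratings, n_users=4, n_items=3)
print(round(rmse(predict, ratings), 3))
# Score a restaurant user 0 has not rated; ranking such scores yields recommendations
print(round(predict(0, 2), 2))
```

A neural collaborative-filtering model would replace the dot product with learned layers over the same embeddings; the training loop over observed ratings stays conceptually the same.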
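The evaluation metrics named in project 2 can be computed from scratch as below. This is a minimal sketch assuming binary labels (1 = churned) and model scores in [0, 1]; the helper names and toy values are illustrative, not taken from the project. ROC-AUC is computed through its pairwise interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting as one half.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = churned) and model scores
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # default 0.5 cut-off
print(precision_recall_f1(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise loop is O(positives × negatives), which is fine for illustration; on a full dataset a rank-based computation or a library implementation would be the practical choice. Unlike the other three metrics, ROC-AUC uses the raw scores and so does not depend on the 0.5 cut-off.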
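The adjustable threshold from project 3 can be sketched with a one-feature logistic regression fitted by batch gradient descent. The feature values and labels are hypothetical stand-ins, not the Cleveland data, and the helper names are illustrative. The point the sketch shows is that lowering the threshold below 0.5 can only add positive predictions, so recall never decreases, at the possible cost of precision and overall accuracy.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss for p = sigmoid(w * x + b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(xs, w, b, threshold=0.5):
    # Flag a patient as positive when the predicted probability reaches the threshold
    return [1 if sigmoid(w * x + b) >= threshold else 0 for x in xs]

def recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0

# Hypothetical single risk-score feature and labels (1 = heart disease)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
w, b = fit_logistic(xs, ys)
for t in (0.5, 0.3):
    print(t, recall(ys, predict(xs, w, b, threshold=t)))
```

In practice the threshold would be chosen on a validation split, for example the largest threshold that still meets a target recall.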
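The Pearson correlation coefficient used in project 4 is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). A minimal pure-Python sketch follows; the function name `pearson_r` is illustrative, and the check values are synthetic rather than Delhi measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)  # the 1/n normalisation factors cancel

# Synthetic sanity checks: perfectly linear series give r = +1 and r = -1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Applied to the project, the two inputs would be date-aligned daily series (for example a particulate-matter reading and new case counts for the same days). Note that r captures only linear association, which is one reason the project also fits non-linear tree-based regressors.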
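The class-balancing step in project 5 can be sketched as follows, assuming a hypothetical `paraphrase` function supplied by the caller (in practice a round trip through a translation model). The sketch covers only the bookkeeping of topping up minority classes to the majority count, not the translation itself; the function name, sentences, and labels are illustrative.

```python
from collections import Counter

def balance_by_augmentation(texts, labels, paraphrase):
    """Top up every minority class with paraphrased copies until all classes
    match the majority count.

    `paraphrase` is a hypothetical caller-supplied function, e.g. a round trip
    through a translation model; this sketch only handles the bookkeeping.
    """
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for k in range(target - count):
            # Cycle through the originals so augmentation is spread evenly
            out_texts.append(paraphrase(pool[k % len(pool)]))
            out_labels.append(label)
    return out_texts, out_labels

texts = ["smoking causes cancer", "rain led to flooding",
         "the sky is blue", "grass is green", "water is wet"]
labels = [1, 1, 0, 0, 0]  # 1 = causal statement (toy labels)
new_texts, new_labels = balance_by_augmentation(
    texts, labels, paraphrase=lambda t: t + " [aug]")  # stand-in for a translation round trip
print(Counter(new_labels))
```

Augmentation is best applied to the training split only, so that paraphrases of evaluation sentences do not leak into training and inflate the reported accuracy.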