About Me

I'm a full-time student and part-time chef (only in my own kitchen) who loves hosting dinners that somehow leave everyone overstuffed and overly impressed. Born into a family where music runs in our veins, I grew up surrounded by melodies that shaped my path, from family jam sessions to my current role at Bose, where I merge my passion for sound with cutting-edge technology.

When I’m not juggling assignments or diving into data science projects in the media industry, you’ll find me soaking up the sun, hitting the beach, or attempting to pet every cat and dog I spot on the street. A globetrotter at heart, I once managed to visit four countries in a single month—I still claim it was for “research purposes.”

And Sundays? Those are sacred. They’re reserved for my unique brand of therapy: grocery shopping at Trader Joe’s followed by intense gaming sessions, all while curating the perfect soundtrack, of course. Life’s all about finding harmony in every sense of the word, right?

Education

Northeastern University, Boston, MA
Master of Science in Data Science (Graduated May 2025)
Graduate Teaching Assistant, Khoury College of Computer Sciences
- Fall 2023: TA for DS 2000 Programming with Data, assisting 700+ students with foundational programming concepts, debugging, and solving real-world data challenges using Python.
- Fall 2024: TA for DS 3000 Advanced Programming with Data, guiding students through advanced topics such as APIs, data visualization, and efficient data manipulation.
- Mentored students during office hours and project consultations, enhancing their problem-solving skills and my ability to communicate complex concepts effectively.
- Collaborated with faculty on assignment design and grading, improving organizational and time management skills.
Relevant Courses: LLM Agents, Supervised & Unsupervised Machine Learning, Deep Learning, NLP, Database Systems, Data Processing & Visualization, Algorithms

Vellore Institute of Technology, Vellore, India
Bachelor of Technology in Computer Science and Engineering (June 2016 – June 2020)
Relevant Courses: Probability Theory, Linear Algebra, Statistics, Calculus, Differential Equations, AI, Data Mining

Work Experience

Bose Corporation (Boston, MA)

Data Scientist NLP Intern (January 2024 – August 2024)
Key Focus Areas:
LLM Optimization (GPT-4, Llama 3.1), RAG Systems (AWS Kendra), NLP Pipelines (BERTopic, RoBERTa), Streamlit Development

End-to-End ML Pipeline: Implemented unsupervised topic clustering to identify 120 key return drivers from customer feedback. Trained & deployed a RoBERTa model on AWS for multi-label classification (F1 score: 0.92), integrating Spark for data preprocessing, model inference, & result cleaning.
AI Interview Automation: Engineered OpenAI-powered assistant with Streamlit-Snowflake backend automating 200+ daily interviews, reducing candidate dropout by 30% while saving 50+ weekly hours
Enterprise Chatbot: Deployed AWS Kendra RAG system handling 2K+ daily interactions at 99.8% uptime, validated through red team testing and guardrail implementation
Cost-Optimized LLM Ops: Transitioned sentiment analysis from GPT-4 to Llama 3.1 via A/B testing, achieving 97% cost reduction while maintaining performance on 2M+ records
GenAI Analytics: Automated product review analysis using Claude Sonnet/AWS Bedrock, enabling 65% faster insights via Streamlit dashboard with 98% accuracy

West Pharmaceutical Services (Bangalore, IN)

Data Scientist (November 2020 – December 2022)
Key Focus Areas:
Computer Vision (XceptionNet), Cloud Deployment (Azure), Production ML, Data Visualization

Defect Detection: Architected XceptionNet model identifying 15 defect classes (92% accuracy, 37% improvement vs manual), deployed via Docker/Azure
Production Analytics: Built Power BI dashboards enabling 43% faster issue identification through visual trend analysis
Document Intelligence: Implemented BERT/ALBERT/RoBERTa pipelines improving document-keyword mapping by 23%

Associate Data Scientist (January 2020 – November 2020)
Key Focus Areas:
ETL Pipelines (Azure Data Factory), NLP (BERT/ALBERT), Data Storytelling

Enterprise Data Integration: Designed Azure Data Factory pipelines ingesting/transforming documents from 10+ sources, reducing data prep time by 35%
Document Intelligence: Implemented transformer-based NLP workflows improving document-keyword mapping by 23%, presenting insights through weekly stakeholder dashboards
Process Optimization: Migrated legacy manual workflows to automated ETL/NLP pipelines, collaborating with 5+ factory teams to ensure operational alignment

Info Origin (Bangalore, IN)

Data Scientist Intern (May 2018 – August 2018)
Key Focus Areas:
Time Series Forecasting, ETL Development, Data Visualization

Workplace Analytics: Built ETL pipeline for 10K+ employee conference room data, enabling ARIMA forecasts (R²: 0.90) with Tableau dashboards tracking utilization trends
Stakeholder Communication: Presented forecasting insights through weekly visual reports, helping facilities team optimize room allocation and reduce scheduling conflicts by 28%
Process Documentation: Created technical manuals for ARIMA model deployment, enabling knowledge transfer to operations team

Projects

Visual Mood-Based Music Recommendation (In progress)

Leveraged LLaVA for vision-language understanding, retrieved mood-aligned songs via RAG with contrastive learning on audio embeddings, and integrated personalized Spotify playlist filtering.
Tools: LLaVA, RAG, Contrastive Learning, Spotify API, Streamlit

Automating Job Applications (October 2024) Live App

Built an LLM agent to scrape job descriptions, match resume embeddings, and generate tailored resumes/cover letters.
Tools: OpenAI Embedding API, Groq Llama 3.1 API, Streamlit

This Week in Football (September 2024)

Developed a local LLM-powered agent to summarize and classify Reddit football data into short audio/text insights.
Tools: Reddit API, LangChain, Qwen 2.5, Llama 3.1, PostgreSQL

ArguSense: Argument Essay Evaluation (September 2023)

Developed an NLP pipeline using Longformer and BERT to classify argument structures in essays. Integrated MLflow on AWS for scalable model tracking, versioning, and containerized deployment.
Tools: Streamlit, HuggingFace, Named Entity Recognition, Longformer, BERT, MLflow, AWS, Git LFS

Football Match Analysis and Predictions (June 2023) Live App

Deployed a Streamlit app for interactive analysis and visualization of 500+ football matches, integrating an ensemble model with Markov Chains, XGBoost, and Logistic Regression to predict the “Expected Threat” metric.
Tools: Streamlit, Python, StatsBomb API, OpenAI API, PostgreSQL

Achievements

Most Impactful Business Project Award - Bose Hackathon: Developed a semantic search retrieval system for 1M+ records.

Technical Skills

Programming: Python, R, SQL, C#
Data Science: Machine Learning, NLP, Deep Learning, Feature Engineering
Libraries: NumPy, Pandas, Scikit-learn, TensorFlow, PyTorch, LangChain, HuggingFace
Platforms: AWS, Microsoft Azure, Snowflake, Databricks
Other: Docker, Kubernetes, LLMOps, Streamlit

Contact

GitHub: github.com/PratikHotchandani22
LinkedIn: linkedin.com/in/pratik-hotchandani
Email: pratikhotchandani22@gmail.com, hotchandani.p@northeastern.edu