Dhruvil Dave

dxd210049@utdallas.edu | Kaggle | LinkedIn | +1 551-888-6315

Education

University of Texas at Dallas | Dallas, Texas, USA
M.S. | May 2024
  • Major: Business Analytics (Data Science concentration)
  • Minor: Applied Machine Learning
  • Awarded Dean's Excellence Scholarship
Ahmedabad University | Ahmedabad, Gujarat, India
B.Tech. | May 2022
  • Additional coursework:
    • Data Engineering, Big Data, and Machine Learning on Google Cloud Platform Specialization (Coursera)
    • Introduction to Database Engineering (Udemy)

Technical Skills

Skills
  • Data Science and Machine Learning: Python, R, PostgreSQL, Tidyverse, NumPy, Pandas, Apache Spark, Apache Cassandra, Puppeteer, Google BigQuery, Plotly, Seaborn, Docker
  • Web Development: Golang, TypeScript, C, Linux, Node.js, Deno, Bash, HTML, CSS, Next.js
Interests
  • Advanced Sanskrit; conversational Gujarati and Hindi. Pursuing a degree in Indian Classical Music (vocals and harmonium). Enjoy exploring philosophy and history, and playing football.

Experience

Ahmedabad University | Ahmedabad, Gujarat, India
Teaching Assistant | Jul '21 - Jun '22
  • Served as a teaching assistant for the Advanced Statistics, Computer Networks, and Operating Systems courses, handling coursework and teaching a class of 150 students.
PedalsUp | Ahmedabad, Gujarat, India
Backend Golang Intern | Oct '20 - Nov '20
  • Helped port the Continuous Integration pipeline to Golang, cutting build time from 1 hour to 11 minutes (about 5.5x faster).
  • Helped migrate the 10-million-record production database from SQLite to PostgreSQL on Docker with zero downtime.

Projects

Kaggle Notebooks and Datasets Master
  • Ranked in the global top 300 in the Notebooks category and top 30 in the Datasets category among more than a million users on the platform.
  • Secured 5th place in the Song Popularity Contest and participated in various Machine Learning and Data Science competitions.
  • Strengthened skills in Data Preprocessing, Model Testing, Statistical Modelling, Hypothesis Testing, A/B Testing, and Feature Engineering.
  • Published more than 15 datasets and 20 articles, analyses, and tutorial notebooks.
Spotify Charts Dataset
  • A complete dataset of all “Top 200” and “Viral 50” charts published by Spotify, capturing daily statistics for the platform's top tracks. Scraped, processed, and curated the data entirely from scratch, and wrote an end-to-end pipeline in Apache Spark, TypeScript, and Golang to ingest and parse 40 GB of data (approximately 26 million rows), update it daily, host it on Kaggle and BigQuery, and generate PostgreSQL dumps.
Wikibooks Dataset
  • Created a complete dump of Wikibooks in 12 languages (over 12 GB), containing every page of every chapter along with metadata such as title and abstract, and the body in both plain text and HTML. Hosted on Kaggle, the dataset has been downloaded more than 3,000 times and has been selected as a research dataset by various institutes and universities.
Warehouse Storage Optimization
  • Developed a warehouse optimization system as part of a Machine Learning course, using scikit-learn, Pandas, and Optuna to explore how clustering and classification algorithms work.

Leadership Experience

  • IEEE Ahmedabad University Student Branch
  • Red-Black Decision Tree Machine Learning Group

Publications

  • Unicode Aware Sanskrit Transliteration: UAST (https://arxiv.org/abs/2203.14277)

Additional Information

Eligibility
    Eligible to work in the USA for internships and full-time employment for up to 36 months without sponsorship.