Data Scientist Intern
Department: Decision Science
Analyzed the distribution of an attribute using a massive dataset. Made recommendations and implemented a system that dynamically selects a cutoff for that attribute by optimizing the performance of a system via a simulator.
Impact of work
Time spent working
How did you find the job / apply?
Basic programming questions: No actual coding, just talking through problems. No trick questions or brain teasers; some focused on practical situations like data contained on multiple servers, i.e. data-science focused problems.
Mid-level data science questions: How to handle missing attributes, finding bias in missing attributes (missing at random assumption), useful summary statistics (why use mean vs median, variance vs IQR, etc.).
Mid-level machine-learning questions: Recommending types of classifiers and regressors for specific data sets and applications (logistic regression vs neural net vs decision tree), analyzing performance of classifiers (accuracy vs false positive vs false negative), explainability of classifiers (e.g. decision tree more explainable than NN).
Past experience: Describe previous work and projects, how data science or machine learning was used in them, experience with databases and distributed systems, research interests.
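As a rough illustration of two of the topics above (mean vs median, and accuracy vs false positives/negatives), here is a minimal Python sketch. All numbers are made up for the example, and the helper `confusion_counts` is a hypothetical name, not something from the interview itself.

```python
from statistics import mean, median

# Made-up values with one extreme outlier.
values = [10, 12, 11, 13, 1000]
print(mean(values))    # pulled far upward by the single outlier
print(median(values))  # stays near the typical value


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


# Made-up imbalanced data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts "negative".
y_pred = [0] * 100

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(accuracy)  # looks high even though every positive case is missed
print(fn)        # number of missed positives
```

The point of the second half is that on imbalanced data a high accuracy can hide a classifier that never detects the positive class, which is why false positives and false negatives are worth discussing separately.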
Advice on how to prepare
Past experience or research in data science and machine learning is a definite plus. Studying the fundamentals and the more application-focused sections of any data science or machine learning text would be helpful.