There were two rounds of phone interviews, 45 minutes each and about a week apart, with the hiring manager and a senior data scientist.
Round 1: How would you measure the difficulty of a Coursera course?
Follow-up question 1: If the characteristics of the course participants/students are not included in the model, what bias will there be? Answer: selection bias, since students choose which courses to take and are not a random sample.
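To make the selection-bias point concrete, here is a minimal sketch. It assumes a simple logistic model in which the chance of completing a course depends on student ability minus course difficulty; the model, the ability values, and the function names are all hypothetical illustrations, not something discussed in the interview.

```python
import math

def completion_prob(ability, difficulty):
    # Assumed logistic model: stronger students finish more often.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def mean_completion(abilities, difficulty):
    # Average completion rate for a given group of students.
    return sum(completion_prob(a, difficulty) for a in abilities) / len(abilities)

def fit_difficulty(abilities, observed_rate, lo=-10.0, hi=10.0):
    # Bisection: the completion rate falls as difficulty rises.
    for _ in range(60):
        mid = (lo + hi) / 2
        if mean_completion(abilities, mid) > observed_rate:
            lo = mid  # predicted rate too high, so the course must be harder
        else:
            hi = mid
    return (lo + hi) / 2

# Two courses with the SAME true difficulty but different (made-up) audiences.
true_difficulty = 0.5
experienced = [1.0, 1.5, 2.0]
beginners = [-1.0, -0.5, 0.0]

rate_a = mean_completion(experienced, true_difficulty)
rate_b = mean_completion(beginners, true_difficulty)

# Ignoring students, course B looks harder because its completion rate is lower.
# Once ability is accounted for, both courses recover the same difficulty.
adjusted_a = fit_difficulty(experienced, rate_a)
adjusted_b = fit_difficulty(beginners, rate_b)
print(rate_a, rate_b, adjusted_a, adjusted_b)
```

In this toy setup the raw completion rates differ only because of who enrolled, which is the bias the interviewer was asking about.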
Follow-up question 2: If a course is newly rolled out and there is not much data yet, how would you estimate its difficulty?
- Refit the model on historical courses using only the variables that are known at launch: prerequisites, subject, duration, keywords, etc.
- Ask people in academia, such as professors or PhDs with extensive teaching experience, to calibrate the estimate manually.
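The first idea can be sketched roughly as follows. This is only an illustration: it swaps in a nearest-neighbour average for "a model", and the `Course` fields, the distance weights, and all the numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Course:
    prerequisites: int          # number of prerequisite courses
    duration_weeks: float
    subject: str
    difficulty: Optional[float] = None  # unknown for a new course

def distance(a, b):
    # Hypothetical similarity measure built only from launch-time metadata.
    return (abs(a.prerequisites - b.prerequisites)
            + abs(a.duration_weeks - b.duration_weeks) / 4.0
            + (0.0 if a.subject == b.subject else 1.0))

def estimate_difficulty(new_course, history, k=3):
    # Average the known difficulty of the k most similar historical courses.
    nearest = sorted(history, key=lambda c: distance(new_course, c))[:k]
    return sum(c.difficulty for c in nearest) / len(nearest)

# Made-up historical courses with difficulty scores estimated from past data.
history = [
    Course(0, 4, "business", 2.0),
    Course(1, 6, "data science", 3.0),
    Course(3, 10, "data science", 4.5),
    Course(2, 8, "math", 4.0),
]
new_course = Course(2, 8, "data science")
estimate = estimate_difficulty(new_course, history)
print(round(estimate, 2))
```

A regression on the same features would serve the same purpose; the point is that every input is available on the day the course launches.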
Round 2: A product manager comes to you and asks what the relationship is between course length and purchase rate. How do you answer?
I answered the same way as before, focusing on whether the dependent variable should be a course-level observation (for example, course A has a purchase rate of 4% and course B has 2%) or a customer-level observation (1 if customer A bought the course, 0 if not) when it goes into the model, and on the pros and cons of the two approaches.
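One way to see the difference between the two setups is the sketch below. It assumes a plain linear regression of purchase on course length (a linear probability model, chosen here only for simplicity), and the course lengths and visitor counts are invented; only the 4% and 2% purchase rates echo the example above.

```python
def ols_slope(xs, ys, ws=None):
    # Slope of a (optionally weighted) least-squares line of y on x.
    if ws is None:
        ws = [1.0] * len(xs)
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return cov / var

# (course, length in hours, visitors, purchases): made-up numbers.
courses = [("A", 10.0, 100, 4), ("B", 20.0, 50, 1), ("C", 40.0, 1000, 30)]

# Course-level data: one row per course, outcome is the purchase rate.
course_x = [hours for _, hours, _, _ in courses]
course_rate = [buys / visitors for _, _, visitors, buys in courses]
course_visitors = [float(visitors) for _, _, visitors, _ in courses]

# Customer-level data: one row per visitor, outcome is 1 (bought) or 0 (did not).
customer_x, customer_y = [], []
for _, hours, visitors, buys in courses:
    customer_x += [hours] * visitors
    customer_y += [1.0] * buys + [0.0] * (visitors - buys)

unweighted_course_slope = ols_slope(course_x, course_rate)
weighted_course_slope = ols_slope(course_x, course_rate, course_visitors)
customer_slope = ols_slope(customer_x, customer_y)
print(unweighted_course_slope, weighted_course_slope, customer_slope)
```

In this setup the customer-level fit matches the course-level fit weighted by visitors, while the unweighted course-level fit treats a 50-visitor course the same as a 1,000-visitor one. The customer-level form also leaves room for customer covariates, at the cost of a much larger and noisier dataset.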