There were 2 rounds of phone interviews, 45 minutes each and about a week apart, with the hiring manager and a senior data scientist.

*Round 1:* How would you measure the difficulty of a Coursera course?

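*Follow-up question 1:* If student-level variables (characteristics of the course participants) are left out of the model, what bias results? Selection bias: students choose which courses to take, so the outcomes observed for a course reflect who enrolled as well as how hard the course is.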
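To illustrate the selection-bias answer, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the latent `ability` variable, the two courses with difficulty 0.0 and 1.0, the logistic enrollment and completion rules, and the use of completion rate as the difficulty signal. None of it comes from the interview itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical students with a latent ability score (synthetic data).
ability = rng.normal(0.0, 1.0, size=n)

# Assumed self-selection: stronger students are more likely to pick the hard course.
p_hard = 1.0 / (1.0 + np.exp(-2.0 * ability))
takes_hard = rng.random(n) < p_hard

def completes(ability, difficulty, rng):
    # Assumed completion rule: probability rises with ability, falls with difficulty.
    p = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
    return rng.random(ability.shape[0]) < p

# Observed data: each course is only taken by the students who chose it.
done_hard = completes(ability[takes_hard], 1.0, rng)
done_easy = completes(ability[~takes_hard], 0.0, rng)
naive_gap = done_easy.mean() - done_hard.mean()

# Counterfactual benchmark: the same full population takes each course.
all_easy = completes(ability, 0.0, rng)
all_hard = completes(ability, 1.0, rng)
true_gap = all_easy.mean() - all_hard.mean()

print(f"naive completion gap: {naive_gap:.3f}")
print(f"gap with identical students: {true_gap:.3f}")
```

Under these assumptions the naive gap should come out much smaller than the gap measured on identical students, so a model that ignores who enrolls would understate how much harder the second course is.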

*Follow-up question 2:* If a course is newly launched and has little data, how would you estimate its difficulty?

- Refit the model on historical data using only the variables that are known at launch: prerequisites, subject, duration, keywords, etc.
- Have academics calibrate the difficulty manually, such as professors or PhDs with extensive teaching experience.
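The first option could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the feature names (`n_prereqs`, `duration_h`, `is_technical`) and the helper `predict_difficulty` are hypothetical, and ordinary least squares stands in for whatever model would actually be used.

```python
import numpy as np

rng = np.random.default_rng(1)
n_courses = 200

# Hypothetical pre-launch features for historical courses (synthetic).
n_prereqs = rng.integers(0, 5, size=n_courses)      # number of prerequisites
duration_h = rng.uniform(5, 60, size=n_courses)     # total course hours
is_technical = rng.integers(0, 2, size=n_courses)   # subject flag

# Assumed historical difficulty score, e.g. derived earlier from learner data.
difficulty = (0.8 * n_prereqs + 0.03 * duration_h + 1.0 * is_technical
              + rng.normal(0.0, 0.3, size=n_courses))

# Fit a linear model that uses only the features available before launch.
X = np.column_stack([np.ones(n_courses), n_prereqs, duration_h, is_technical])
coef, *_ = np.linalg.lstsq(X, difficulty, rcond=None)

def predict_difficulty(n_prereqs, duration_h, is_technical):
    """Score a new course from its pre-launch features (hypothetical helper)."""
    return float(np.array([1.0, n_prereqs, duration_h, is_technical]) @ coef)

# A new course with 3 prerequisites, 40 hours, technical subject.
new_course_score = predict_difficulty(3, 40, 1)
print(f"estimated difficulty: {new_course_score:.2f}")
```

Expert calibration from the second option could then be used to sanity-check or adjust such a score until real learner data accumulates.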

*Round 2:* A product manager asks you what the relationship is between course length and purchase rate. How do you answer?

I answered along the same lines as before, focusing on whether the dependent variable should be a course-level observation (for example, course A has a purchase rate of 4% and course B a purchase rate of 2%) or a customer-level observation (1 if the customer bought the course, 0 if not), and on the pros and cons of the two approaches.
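A rough sketch of the two setups, on synthetic data, might look like this. The numbers, the assumed negative effect of length, and the helper `fit_logistic` (a plain Newton's-method logistic regression) are all illustrative assumptions, not something from the interview.

```python
import numpy as np

rng = np.random.default_rng(2)
n_courses, n_customers = 50, 400  # customers exposed to each course (assumed)

# Hypothetical course lengths and an assumed true effect on purchase odds.
length_h = rng.uniform(5, 60, size=n_courses)
p_buy = 1.0 / (1.0 + np.exp(-(-2.5 - 0.02 * length_h)))

# Customer-level data: one row per (customer, course), y = 1 if purchased.
course_id = np.repeat(np.arange(n_courses), n_customers)
x = length_h[course_id]
y = (rng.random(course_id.size) < p_buy[course_id]).astype(float)

# Option 1: course-level. One row per course, outcome = purchase rate.
rate = np.bincount(course_id, weights=y) / n_customers
A = np.column_stack([np.ones(n_courses), length_h])
ols_coef, *_ = np.linalg.lstsq(A, rate, rcond=None)

# Option 2: customer-level. Logistic regression on the 0/1 outcome.
def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton's method (illustrative helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

X = np.column_stack([np.ones(x.size), x])
logit_coef = fit_logistic(X, y)

print(f"course-level slope (rate per hour): {ols_coef[1]:.5f}")
print(f"customer-level slope (log-odds per hour): {logit_coef[1]:.4f}")
```

The course-level version is simpler to explain but treats every course equally regardless of traffic and cannot use customer attributes. The customer-level version can take customer covariates as extra columns in `X`, at the cost of a larger dataset and coefficients on the log-odds scale.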