Bringing It All Together
In this introductory chapter alone, we’ve already covered a substantial amount of material. We’ve discussed the importance of problem articulation, the idea that the way data scientists solve problems is by answering questions, and the three types of questions data scientists are likely to encounter.
It’s easy to see how this framework could suggest that a project develops sequentially. Suppose a hospital comes to you concerned about the cost of surgical complications. So you:
Work with them to more clearly define the problem (“Surgical complications are extremely costly to the hospital and harm patients. We want to reduce these complications in the most cost-effective manner possible.”).
Answer some Exploratory Questions (“Are all surgical complications equally costly, or are there some we should be most concerned about?”).
Develop a model to answer a Passive Prediction Question (“Given data in patient charts, can we predict which patients are most likely to experience complications?”) so the hospital can marshal its limited nursing resources more effectively.
Answer a Causal Question when the hospital comes back to you to ask, “Would a new program of post-discharge nurse home visits for patients identified as being at high risk of complications reduce complications?”
In practice, however, while it is important that some steps come before others (if you don’t start by defining your problem, where do you even start?), real projects are never so linear. You will constantly find yourself moving back and forth between different types of questions, using new insights gained from answering one question to refine your problem statement and articulate new questions.
Nevertheless, this framework is a useful starting point. If you use this taxonomy to recognize (a) the type of question you are asking and (b) the reason you are seeking to answer it, even as you iterate through a project, you will see tremendous gains in your ability to please your stakeholders by staying focused on the problems they need addressed.
Reading Reflection Questions
At the end of many readings in this book, you will find a set of “reading reflection questions.” As the name implies, these questions are meant to help readers reflect on what they’ve read and to draw attention to key points from the chapter.
What is the purpose of this book? What problem in data science education does it aim to address?
What is the most important task for a data scientist hoping to successfully help their stakeholder?
In the view of this book, all data science tools are tools for doing what? Do you agree?
What are the three types of questions a data scientist is likely to encounter? What is the primary purpose of each type of question?
Does one always move through the questions presented here in the same order?