Polya × Drivetrain

Understanding the Problem

First.

Define the Objective

You have to understand what you are trying to achieve. What is the goal? Whose goal is it?

You have to understand the problem.

Be precise. A goal stated as "build a good recommendation engine" is not a goal. It is a description of a tool. What does success look like for the person using your product? What specific outcome are you optimizing for?

What is the unknown? What is the condition? Is it possible to satisfy the condition? Is the condition sufficient to determine the unknown? Or is it insufficient? Or redundant? Or contradictory?

Separate the various parts of the condition. Can you write them down?

If you find yourself describing a model or a prediction, stop. What action should that prediction enable? The objective is the action, not the prediction.

Can you state the objective in one sentence, without reference to algorithms, models, or data?

Identify the Levers

Find what you can control to influence the objective. What knobs can you turn? What inputs to the system are under your command?

Separate the levers you control from the variables you cannot control. The latter are still inputs to your models, but you cannot turn them.

Are there levers you have not considered? Do not limit yourself to the levers that are obvious or traditional.

Draw a figure. Introduce suitable notation.
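One possible notation for the split between levers and uncontrollable variables is sketched below in Python. The field names (price, discount_pct, season, competitor_price) are hypothetical examples, not part of the method; substitute whatever your own problem provides.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Levers:
    """Inputs you can set directly (hypothetical examples)."""
    price: float
    discount_pct: float


@dataclass(frozen=True)
class Context:
    """Inputs the models see but you cannot turn (hypothetical examples)."""
    season: str
    competitor_price: float


def model_inputs(levers: Levers, context: Context) -> dict:
    """Merge both kinds of input into one dict, tagging where each came from."""
    features = {f"lever.{name}": value for name, value in asdict(levers).items()}
    features.update({f"context.{name}": value for name, value in asdict(context).items()})
    return features


inputs = model_inputs(
    Levers(price=9.99, discount_pct=10.0),
    Context(season="winter", competitor_price=11.50),
)
```

Keeping the two groups in separate types makes it hard to forget which inputs a later search is allowed to vary.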

Gather the Data

What are the data?

Determine what new data you need: data you do not already have. What information would allow you to connect each lever to the objective?

Do not assume existing data is sufficient. The data you have was likely collected for a different purpose. What would you measure if you could measure anything?

Can you run a randomized experiment? What is the cost of not knowing?
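A randomized experiment starts with random assignment of lever settings. The sketch below shows only that step, assuming a hypothetical price lever with three candidate values and integer unit IDs; the function name assign_arms is an illustration, not an existing API.

```python
import random
from collections import Counter


def assign_arms(unit_ids, arms, seed=0):
    """Assign each unit to one lever setting uniformly at random.

    A fixed seed keeps the assignment reproducible for later auditing.
    """
    rng = random.Random(seed)
    return {unit_id: rng.choice(arms) for unit_id in unit_ids}


# Hypothetical lever settings: three candidate prices.
arms = [8.99, 9.99, 10.99]
assignment = assign_arms(range(1000), arms)

# How many units landed in each arm.
counts = Counter(assignment.values())
```

Because the assignment ignores everything about the unit, differences in outcome between arms can be attributed to the lever rather than to who happened to receive it.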

Update the figure using suitable notation.

Devising a Plan

Second.

Find the connection between the data and the unknown.

You may be obliged to consider auxiliary problems if an immediate connection cannot be found.

You should obtain eventually a plan of the solution.

Have you seen it before? Or have you seen the same problem in a slightly different form?

Do you know of a related problem? Do you know a theorem that could be useful?

Look at the unknown! And try to think of a familiar problem having the same or a similar unknown.

Here is a problem related to yours and solved before. Could you use it? Could you use its result? Could you use its method? Should you introduce some auxiliary element in order to make its use possible?

Could you restate the problem? Could you restate it still differently? Go back to definitions.

If you cannot solve the proposed problem, try to solve first some related problem. Could you imagine a more accessible related problem? A more general problem? A more special problem? An analogous problem? Could you solve a part of the problem? Keep only a part of the condition, drop the other part; how far is the unknown then determined, how can it vary? Could you derive something useful from the data? Could you think of other data appropriate to determine the unknown? Could you change the unknown or the data, or both if necessary, so that the new unknown and the new data are nearer to each other?

Did you use all the data? Did you use the whole condition? Have you taken into account all essential notions involved in the problem?

Plan the Models

Plan not one model, but a model assembly line. Three machines, in sequence:

4a. The Modeler. What predictive models do you need? What is the causal relationship between each lever and the objective? Plan models that predict outcomes as a function of the levers you can pull.

Do not stop at one model. The modeler is a collection of component models, each capturing a different part of the system. Could you introduce an auxiliary model?
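A minimal sketch of a modeler built from component models follows. Both components are toy stand-ins chosen for illustration: the logistic curve, the unit cost of 6.0, and all function names are assumptions, not fitted models.

```python
import math


def purchase_probability(price, competitor_price):
    """Component model 1 (toy logistic curve): chance a customer buys.

    The probability falls as our price rises above the competitor's.
    """
    return 1.0 / (1.0 + math.exp(price - competitor_price))


def margin_per_sale(price, unit_cost=6.0):
    """Component model 2: profit earned on each sale (assumed unit cost)."""
    return price - unit_cost


def expected_profit(price, competitor_price):
    """The modeler: chain the components into one prediction per customer."""
    return purchase_probability(price, competitor_price) * margin_per_sale(price)


profit_at_parity = expected_profit(price=10.0, competitor_price=10.0)
```

Here price is a lever and competitor_price is an uncontrollable input. Each component can be replaced by a better model without touching the others.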

4b. The Simulator. Can you ask "what if"? Plan to feed the modeler a wide range of lever combinations. Explore the entire surface of possible outcomes, not just one slice. The simulator reveals the distribution of outcomes, including the catastrophic ones.

Does your simulator expose the shape of the problem? Can you see where the cliffs are? Where the plateaus lie?
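The simulator can be sketched as a sweep over lever combinations, with repeated random draws of what you cannot control. The toy_modeler function below is a hypothetical stand-in for a real modeler, and its coefficients are invented for illustration only.

```python
import itertools
import random
import statistics


def toy_modeler(price, discount_pct, demand_shock):
    """Hypothetical stand-in for the modeler: profit for one scenario."""
    effective_price = price * (1 - discount_pct / 100)
    demand = max(0.0, 100 - 8 * effective_price + demand_shock)
    return demand * (effective_price - 6.0)


def simulate(prices, discounts, n_draws=200, seed=0):
    """Evaluate every lever combination under many random demand shocks."""
    rng = random.Random(seed)
    surface = {}
    for price, discount in itertools.product(prices, discounts):
        outcomes = [
            toy_modeler(price, discount, rng.gauss(0, 10))
            for _ in range(n_draws)
        ]
        # Keep the typical outcome and the worst one seen for this combination.
        surface[(price, discount)] = {
            "mean": statistics.fmean(outcomes),
            "worst": min(outcomes),
        }
    return surface


surface = simulate(prices=[7, 9, 11, 13], discounts=[0, 10, 20])
```

Recording the worst draw next to the mean is what lets the cliffs show up: a combination can look fine on average and still hide a catastrophic tail.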

4c. The Optimizer. Where is the peak? Search across the surface the simulator generated. Find the combination of levers that best achieves the objective, subject to your constraints.

But also: what are you avoiding? The optimizer should identify not only the optimal outcome, but also the disastrous ones.
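A minimal optimizer sketch: an exhaustive search over a simulated surface that returns both the best feasible lever combination and the combinations to avoid. The surface values below are made-up illustrative numbers, and the worst-case floor is an assumed constraint.

```python
def optimize(surface, worst_case_floor):
    """Pick the best mean outcome among combinations whose worst case is acceptable.

    Returns the best lever combination (or None if nothing is feasible)
    and a sorted list of combinations that breach the floor.
    """
    feasible = {
        levers: stats
        for levers, stats in surface.items()
        if stats["worst"] >= worst_case_floor
    }
    disasters = sorted(
        levers
        for levers, stats in surface.items()
        if stats["worst"] < worst_case_floor
    )
    best = max(feasible, key=lambda levers: feasible[levers]["mean"]) if feasible else None
    return best, disasters


# Illustrative surface keyed by (price, discount_pct); numbers are invented.
surface = {
    (9, 0): {"mean": 80.0, "worst": 20.0},
    (11, 0): {"mean": 95.0, "worst": -40.0},
    (11, 10): {"mean": 90.0, "worst": 5.0},
}
best, disasters = optimize(surface, worst_case_floor=0.0)
```

In this toy surface the highest mean belongs to a combination with an unacceptable worst case, so the search settles on the next best one and reports the risky one separately.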

Carrying out the Plan

Third.

Carry out your plan.

Carrying out your plan of the solution, check each step. Can you see clearly that this step is correct? Can you prove that it is correct?

Does the optimal lever combination make intuitive sense? If not, is your objective wrong, or are your models wrong? Revisit earlier steps.

Looking Back

Fourth.

Examine the solution obtained.

Can you check the result? Can you check the argument?

Can you derive the result differently? Can you see it at a glance?

Can you use the result, or the method, for some other problem?

Did you start with the objective, or did you start with a model and retrofit a goal onto it?

What levers did you overlook? Return to Step 2, Identify the Levers. Are there controls you dismissed as impractical that technology now makes feasible?

Could you collect better data? The first version of your product will reveal gaps. What would you measure differently next time?

Can you apply this same structure, Objective → Levers → Data → Models, to a different problem?