Reach out for questions at [email protected]

Intro:

The two-step task is a classic way to distinguish model-based planning from model-free reinforcement. These two prototypes represent separate components of human learning and decision-making (but see Extra reading - 1). Model-based planning differs from model-free reinforcement (which you learned about before) in that it uses an internal representation of the environment to guide its learning and decision processes. Most commonly, this representation consists of the “state transitions”: how each action leads you from one state in the world to another (but see Extra reading - 2). You can therefore think of a model-based agent as an extension of the previous models, where the agent not only acts and learns but also plans future actions based on structural knowledge it has about the environment.

Key Concepts

Task explanation:

The two-step task allows us to dissociate the model-based (MB) planning component from the model-free (MF) reinforcement component. It does so by introducing another stage before the one you are familiar with. The decision in the first stage probabilistically determines the state the participant will reach in the second stage. The second-stage choice is similar to the one you previously saw: each arm leads to a binary reward with some drifting probability.

https://www.cell.com/neuron/fulltext/S0896-6273(11)00125-5?_returnURL=https://linkinghub.elsevier.com/retrieve/pii/S0896627311001255?showall=true

[Figure: two-step task structure (Daw et al., 2011)]

In this classic task by Daw et al. (2011), the first stage is marked in green and shows two Thai letters (unfamiliar to participants). After choosing in the first stage, participants arrive at either the pink or the blue state of the second stage. Importantly, the “model” here is defined by the state-transition probabilities. That is, choosing the left letter in the first stage leads participants to the pink second-stage state with a 70% chance and to the blue state with a 30% chance, and vice versa for choosing the right letter in the green state. Decisions in the second stage are followed by a probabilistic binary reward.
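The task structure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the state names, the drift parameters, and the clipping bounds on the reward probabilities are illustrative assumptions.

```python
import random

class TwoStepTask:
    """Minimal sketch of the two-step task (names/parameters are illustrative).

    A first-stage choice (0 or 1) leads to second-stage state "pink" or "blue"
    via a common (70%) or rare (30%) transition; each second-stage arm pays a
    binary reward whose probability drifts slowly across trials.
    """

    def __init__(self, common_prob=0.7, drift_sd=0.025, seed=None):
        self.rng = random.Random(seed)
        self.common_prob = common_prob
        self.drift_sd = drift_sd
        # one reward probability per second-stage arm, initialized at random
        self.reward_probs = {(s, a): self.rng.uniform(0.25, 0.75)
                             for s in ("pink", "blue") for a in (0, 1)}

    def first_stage(self, action):
        """Return the second-stage state and whether the transition was common."""
        common = self.rng.random() < self.common_prob
        # action 0 commonly leads to "pink", action 1 commonly to "blue"
        state = ("pink", "blue")[action] if common else ("blue", "pink")[action]
        return state, common

    def second_stage(self, state, action):
        """Return a binary reward, then let every reward probability drift."""
        reward = int(self.rng.random() < self.reward_probs[(state, action)])
        for key in self.reward_probs:
            p = self.reward_probs[key] + self.rng.gauss(0.0, self.drift_sd)
            self.reward_probs[key] = min(0.75, max(0.25, p))  # keep in bounds
        return reward
```

A trial is then one call to `first_stage` followed by one call to `second_stage` on the state it returned.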

[Figure: stay probability by previous reward and transition type; panels A–C (Daw et al., 2011)]

The empirical results show that participants tend to combine model-free reinforcement with model-based planning (but see Extra reading - 3). Panel A illustrates a prototypical MF-only agent. The y-axis indicates the agent’s tendency to repeat its first-stage choice as a function of two factors: the x-axis shows whether the previous trial was rewarded (i.e., whether the previous second-stage choice resulted in a golden coin), and the colors indicate whether the transition from the first to the second stage was the common one (70%) or the rare one (30%).

(A) The simulation results for the MF-only agent clearly show a main effect of reward: a previously rewarded trial increases the agent’s tendency to repeat, or “stay” with, the same first-stage choice. However, because a model-free agent has no representation of the transition structure, it is indifferent to whether the previous trial had a common or a rare transition.
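Why the MF agent is blind to transition type becomes clear from its update rule. Below is a simplified SARSA(λ)-style update in the spirit of the model-free learner Daw et al. describe; the function and variable names are ours, and details such as the eligibility weighting are deliberately simplified.

```python
def mf_update(q1, q2, a1, state2, a2, reward, alpha=0.3, lam=1.0):
    """One model-free (SARSA(lambda)-style) update after a two-step trial.

    q1: dict mapping first-stage action -> value
    q2: dict mapping (second-stage state, action) -> value
    a1, a2: the actions chosen at each stage; state2: the state reached.

    Note that the transition type (common vs. rare) never enters the update:
    the first-stage value is driven only by the second-stage value and the
    reward, which is why an MF-only agent shows a reward main effect but no
    reward-by-transition interaction.
    """
    # prediction errors, computed from the pre-update values
    delta1 = q2[(state2, a2)] - q1[a1]      # first-stage bootstrap error
    delta2 = reward - q2[(state2, a2)]      # second-stage reward error
    # the reward error is passed back to the first stage via the trace (lam)
    q1[a1] += alpha * (delta1 + lam * delta2)
    q2[(state2, a2)] += alpha * delta2
```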

(B) In contrast, an MB-only agent shows an interaction effect: a previous reward increases its tendency to stay with the same first-stage choice when it followed a common transition, but decreases it when the previous transition was a rare one. This results from the incorporation of state-transition knowledge into the model.
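The MB agent’s incorporation of transition knowledge amounts to a one-step lookahead: first-stage values are computed by weighting the best second-stage value in each state by the known transition probabilities. A minimal sketch, assuming the state names and action coding from the illustration above:

```python
def mb_first_stage_values(q2, common_prob=0.7):
    """Model-based first-stage values via a one-step Bellman lookahead.

    q2: dict mapping (second-stage state, action) -> learned value.
    Returns a dict mapping first-stage action -> model-based value.
    Action 0 commonly (70%) leads to "pink", action 1 commonly to "blue".
    """
    # value of each second-stage state = its best available action
    best = {s: max(q2[(s, 0)], q2[(s, 1)]) for s in ("pink", "blue")}
    return {
        0: common_prob * best["pink"] + (1 - common_prob) * best["blue"],
        1: common_prob * best["blue"] + (1 - common_prob) * best["pink"],
    }
```

Because a reward after a rare transition raises the value of the *other* first-stage action’s common destination, this lookahead produces exactly the interaction pattern in panel B.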

(C) Empirical data support the existence of both effects. Namely, we see both an increased overall tendency to stay following a reward (MF: reward main effect) and a dependency on the previous transition (MB: interaction effect).
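One common way to capture this combination, used by Daw et al. (2011), is to mix the two value estimates with a weight w that is fit per participant. A one-line sketch (names are ours):

```python
def hybrid_values(q_mb, q_mf, w=0.5):
    """Weighted mixture of model-based and model-free first-stage values.

    q_mb, q_mf: dicts mapping first-stage action -> value.
    w = 1 gives a pure MB agent, w = 0 a pure MF agent; intermediate
    values reproduce the blend of both effects seen in panel C.
    """
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}
```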

Model explanation: