Which learning method updates estimates after an entire episode has finished?

Get ready for the GARP Risk and AI Exam with flashcards and multiple choice questions. Each question comes with hints and explanations. Prepare for success!

Multiple Choice

Which learning method updates estimates after an entire episode has finished?

Explanation:
Monte Carlo methods update estimates after an entire episode because they rely on the actual total return observed from the start to the episode’s end. You wait until the episode finishes, compute the return Gt for each visited state (the sum of discounted rewards from that point forward), and then update the value estimates toward those realized returns. This makes the method sample-based and non-bootstrapping, meaning it uses the complete outcome of an episode rather than an estimate of future rewards. In contrast, temporal-difference methods update after each step, using a bootstrap target that combines the immediate reward with an estimate of the value of the next state. Q-learning is a TD method that updates per step using the observed reward plus the discounted best value of the next state. Deep reinforcement learning covers many techniques, but the characteristic described—updating only after the episode ends—is what defines Monte Carlo updates.

Monte Carlo methods update estimates after an entire episode because they rely on the actual total return observed from the start to the episode’s end. You wait until the episode finishes, compute the return Gt for each visited state (the sum of discounted rewards from that point forward), and then update the value estimates toward those realized returns. This makes the method sample-based and non-bootstrapping, meaning it uses the complete outcome of an episode rather than an estimate of future rewards.

In contrast, temporal-difference methods update after each step, using a bootstrap target that combines the immediate reward with an estimate of the value of the next state. Q-learning is a TD method that updates per step using the observed reward plus the discounted best value of the next state. Deep reinforcement learning covers many techniques, but the characteristic described—updating only after the episode ends—is what defines Monte Carlo updates.

Subscribe

Get the latest from Examzify

You can unsubscribe at any time. Read our privacy policy