Which parameter is used to gradually reduce epsilon over time, causing exploration to be heavy early and exploitation later?


Get ready for the GARP Risk and AI Exam with flashcards and multiple choice questions. Each question comes with hints and explanations. Prepare for success!

Multiple Choice


The shift from exploration to exploitation in an epsilon-greedy setup is controlled by a decay factor. Under epsilon-greedy, the agent picks a random action with probability epsilon and the best-known action (the one with the highest current value estimate) with probability 1 - epsilon. To explore heavily early and exploit later, epsilon is reduced over time by multiplying it by the decay factor, typically once per episode or once per step, with a floor so it never drops below a minimum:

epsilon := max(epsilon_min, epsilon * decay_factor)

Because the decay_factor is slightly less than 1 (for example 0.99), repeated application shrinks epsilon geometrically: after t updates, epsilon = max(epsilon_min, epsilon_0 * decay_factor^t), where epsilon_0 is the starting value. A value close to 1 gives slow decay and a longer exploration phase; a smaller value gives fast decay and an earlier switch to relying on learned estimates. The other options describe the strategy itself (epsilon-greedy), a learning algorithm (Q-learning), or the broader field (deep reinforcement learning), not the mechanism that reduces epsilon over time.
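A minimal Python sketch of epsilon-greedy action selection with multiplicative epsilon decay follows. It is illustrative only. The function names (select_action, decay_epsilon, epsilon_schedule) and the default values (decay_factor=0.995, epsilon_min=0.01) are hypothetical choices, not taken from any particular library or exam material. Decay is applied once per step here, and the same function could equally be called once per episode.

```python
import random
from typing import List, Sequence


def select_action(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    # With probability epsilon, explore: pick an action uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the action with the highest current estimate
    # (ties go to the lowest index, since max() keeps the first maximum).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon: float, decay_factor: float = 0.995,
                  epsilon_min: float = 0.01) -> float:
    # Hypothetical helper: multiplicative decay floored at epsilon_min,
    # i.e. epsilon := max(epsilon_min, epsilon * decay_factor).
    return max(epsilon_min, epsilon * decay_factor)


def epsilon_schedule(epsilon_start: float, decay_factor: float,
                     epsilon_min: float, steps: int) -> List[float]:
    # Record the epsilon value in effect at each step, then decay it.
    eps = epsilon_start
    schedule = []
    for _ in range(steps):
        schedule.append(eps)
        eps = decay_epsilon(eps, decay_factor, epsilon_min)
    return schedule


if __name__ == "__main__":
    # Compare a slow decay (0.999) with a fast one (0.9) over 100 steps.
    slow = epsilon_schedule(1.0, 0.999, 0.01, 100)
    fast = epsilon_schedule(1.0, 0.9, 0.01, 100)
    print(f"slow decay, last epsilon: {slow[-1]:.3f}")
    print(f"fast decay, last epsilon: {fast[-1]:.3f}")
```

With decay_factor = 0.999 the agent is still exploring more than 90% of the time after 100 steps, while with 0.9 epsilon reaches the 0.01 floor within about 45 steps. This shows how the value near 1 stretches out the exploration phase.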
