MCTS 를 여전히? 사용하여 (완전 랜덤은 아니지만) random 한 게임을 진행하는 것은 맞다.
마파고까지의 프로그램들은 승률 계산을 위해 rollouts(=빠르게 MCTS를 돌리는 것) 을 사용하였는데, 승률 계산을 다른 영역에 맏기므로 rollout을 할 필요가 없다.
이런 결론이네요.
Compared to the MCTS in AlphaGo Fan and AlphaGo Lee, the principal differences are that AlphaGo Zero does not use any rollouts; it uses a single neural network instead of separate policy and value networks; leaf nodes are always expanded, rather than using dynamic expansion; each search thread simply waits for the neural network evaluation, rather than performing evaluation and backup asynchronously; and there is no tree policy. A transposition table was also used in the large (40 block, 40 day) instance of AlphaGo Zero.
MCTS 를 여전히? 사용하여 (완전 랜덤은 아니지만) random 한 게임을 진행하는 것은 맞다.
마파고까지의 프로그램들은 승률 계산을 위해 rollouts(=빠르게 MCTS를 돌리는 것) 을 사용하였는데, 승률 계산을 다른 영역에 맏기므로 rollout을 할 필요가 없다.
이런 결론이네요.
Compared to the MCTS in AlphaGo Fan and AlphaGo Lee, the principal differences are that AlphaGo Zero does not use any rollouts; it uses a single neural network instead of separate policy and value networks; leaf nodes are always expanded, rather than using dynamic expansion; each search thread simply waits for the neural network evaluation, rather than performing evaluation and backup asynchronously; and there is no tree policy. A transposition table was also used in the large (40 block, 40 day) instance of AlphaGo Zero.