The authors use actions, rewards, states, and similar signals as labels for supervised training.
Shiyu Huang (黄世宇)'s Personal Page: https://huangshiyu13.github.io/
Loss is its own Reward: Self-Supervision for Reinforcement Learning
2022-07-18
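As a rough illustration of how actions, rewards, and states can serve as supervised labels, the sketch below turns a list of logged transitions into three labeled datasets. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Transition` type, the `build_auxiliary_datasets` helper, the three task definitions, and the three-way reward binning are all hypothetical choices made for this example.

```python
from typing import Dict, List, NamedTuple, Tuple

State = Tuple[float, ...]


class Transition(NamedTuple):
    """One logged environment step (hypothetical container for this sketch)."""
    state: State
    action: int
    reward: float
    next_state: State


def reward_label(reward: float) -> int:
    """Bin a scalar reward into a class: 0 = negative, 1 = zero, 2 = positive.

    The three-way binning is an assumption made for illustration.
    """
    if reward < 0:
        return 0
    if reward > 0:
        return 2
    return 1


def build_auxiliary_datasets(transitions: List[Transition]) -> Dict[str, list]:
    """Build (input, label) pairs where the label comes from the data itself."""
    action_task = []  # input: (state, next_state), label: action taken
    reward_task = []  # input: state,               label: binned reward
    state_task = []   # input: (state, action),     label: next state
    for t in transitions:
        action_task.append(((t.state, t.next_state), t.action))
        reward_task.append((t.state, reward_label(t.reward)))
        state_task.append(((t.state, t.action), t.next_state))
    return {"action": action_task, "reward": reward_task, "state": state_task}


# Toy transitions, invented purely for the example.
transitions = [
    Transition((0.0, 0.0), 1, 0.0, (0.1, 0.0)),
    Transition((0.1, 0.0), 0, 1.0, (0.1, -0.1)),
    Transition((0.1, -0.1), 1, -1.0, (0.2, -0.1)),
]
datasets = build_auxiliary_datasets(transitions)
```

Each resulting dataset could then be fed to an ordinary supervised learner, with no labels needed beyond what the agent already observes while acting.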