Reinforcement learning with supervision beyond environmental rewards