Utilizing negative policy information to accelerate reinforcement learning