High Performance Heterogeneous Acceleration: Exploiting Data Parallelism And Beyond