Techniques for enabling GPU code generation of low-level optimizations and dynamic parallelism from high-level abstractions