Programming Dense Linear Algebra Kernels On Vectorized Architectures