Learning Semantic Information from Multimodal Data using Deep Neural Networks