Visual Question Answering Through Adversarial Learning of Multi-modal Representation