A Dive into Vision-Language Models | Textpad