V3ALab

Peng Wang

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
FVQA: Fact-based Visual Question Answering
Image Captioning and Visual Question Answering Based on Attributes and Their Related External Knowledge
Visual Question Answering: A Survey of Methods and Datasets
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
Visual Question Answering with Memory-Augmented Networks
Explicit Knowledge-based Reasoning for Visual Question Answering
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources
