Monday, September 6, 2021

How can we visualize attention?

A nice recent paper from Lior Wolf's lab at Tel Aviv University: https://arxiv.org/pdf/2103.15679.pdf by Hila Chefer, Shir Gur and Lior Wolf. The problem is simple to state: given a transformer encoder/decoder network, we would like to visualize the effect of attention on the image. While the problem is simple, the answer is fairly complicated: we need to take into account attention matrices from multiple layers at once. The paper suggests an iterative way to combine all those attention layers into one coherent image.
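To make the idea of iteratively combining attention layers concrete, here is a minimal sketch in plain Python. It is a simplification under stated assumptions, not the authors' implementation: it covers self-attention layers only, it assumes the per-layer attention maps and their gradients with respect to the target output have already been extracted from the model, and the function names (`head_average`, `aggregate_relevance`) are made up for illustration. The update used is a gradient-weighted one: average the positive part of gradient times attention over heads, then add its product with the running relevance map.

```python
def identity(n):
    """n x n identity matrix: each token starts out relevant only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def head_average(attn, grad):
    """Average over heads of the positive part of grad * attn (element-wise).

    attn and grad are indexed as [head][query_token][key_token].
    """
    heads = len(attn)
    n = len(attn[0])
    return [
        [
            sum(max(grad[h][i][j] * attn[h][i][j], 0.0) for h in range(heads)) / heads
            for j in range(n)
        ]
        for i in range(n)
    ]


def aggregate_relevance(attentions, gradients):
    """Fold a list of per-layer attention maps into one relevance map.

    Hypothetical helper: starts from the identity and, for each layer in
    order, applies relevance <- relevance + a_bar @ relevance.
    """
    n = len(attentions[0][0])
    relevance = identity(n)
    for attn, grad in zip(attentions, gradients):
        a_bar = head_average(attn, grad)
        update = matmul(a_bar, relevance)
        relevance = [
            [relevance[i][j] + update[i][j] for j in range(n)] for i in range(n)
        ]
    return relevance


# Toy input: 2 layers, 1 head, 2 tokens, uniform attention, all-ones gradients.
uniform = [[[0.5, 0.5], [0.5, 0.5]]]
ones = [[[1.0, 1.0], [1.0, 1.0]]]
toy_relevance = aggregate_relevance([uniform, uniform], [ones, ones])
print(toy_relevance)
```

In a real model, a row of the final relevance map (for example the row of the classification token) would be reshaped to the image patch grid and upsampled to get the heat map. That step is omitted here.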

Figure 4 shows that the result is very compelling compared with prior work:

The top row is the new paper's method and the bottom row is the earlier work used for comparison.
