
Query regarding visualization of attention #6

Open

@Sowmya-R-Krishnan

Thank you @gordicaleksa for the fantastic code and detailed documentation! It has helped me a lot in understanding the details of GAT.
While looking at the visualization functions in the code, I understand that entropy is used because the softmax applied over the attention coefficients brings them into the range [0, 1], so each neighborhood's coefficients resemble a probability distribution. To obtain the attention coefficients from the GAT layer, you have used:

def visualize_entropy_histograms(model_name=r'gat_PPI_000000.pth', dataset_name=DatasetType.PPI.name):
    # Fetch the data we'll need to create visualizations
    all_nodes_unnormalized_scores, edge_index, node_labels, gat = gat_forward_pass(model_name, dataset_name)

all_nodes_unnormalized_scores comes from the GAT forward function:

out_nodes_features = self.skip_concat_bias(attentions_per_edge, in_nodes_features, out_nodes_features)
return (out_nodes_features, edge_index)
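For context, the entropy computation I have in mind only makes sense if its input behaves like a probability distribution, i.e. a neighborhood's softmax-normalized coefficients summing to 1. A minimal numpy sketch of what I mean (the function name is mine, not from the repo):

import numpy as np

def neighborhood_entropy(attn, eps=1e-12):
    # Shannon entropy of one neighborhood's attention distribution;
    # eps guards against log(0) for neighbors with ~zero attention.
    attn = np.asarray(attn, dtype=np.float64)
    return -np.sum(attn * np.log2(attn + eps))

# Uniform attention over 4 neighbors -> maximal entropy (2 bits),
# peaked attention -> entropy close to 0.
print(neighborhood_entropy([0.25, 0.25, 0.25, 0.25]))  # ~2.0
print(neighborhood_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24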

When reading the GAT paper (Petar Veličković et al.), the attention coefficients obtained after the softmax are used to compute the final output node features of the GAT layer. In the GAT implementation:

attentions_per_edge = self.neighborhood_aware_softmax(scores_per_edge, edge_index[self.trg_nodes_dim], num_of_nodes)

the above function gives the attention coefficients in the [0, 1] range. The subsequent functions (self.aggregate_neighbors and self.skip_concat_bias) then produce the final node features of the GAT layer. So is the "all_nodes_unnormalized_scores" variable used in the entropy histogram visualization function still in the range [0, 1]? Or is the entropy histogram used to visualize the output node features rather than the softmax-normalized attention coefficients?
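For reference, my mental model of neighborhood_aware_softmax is a separate softmax over the incoming edges of each target node, so that every neighborhood's coefficients sum to 1. A simplified single-head sketch of that idea (ignoring the numerical-stability details of the actual implementation; the names here are mine):

import torch

def naive_neighborhood_softmax(scores_per_edge, trg_index, num_of_nodes):
    # scores_per_edge: raw attention scores, shape (E,)
    # trg_index: target node id of each edge, shape (E,)
    exp_scores = scores_per_edge.exp()
    # Sum the exponentiated scores per target node (softmax denominator)
    denom = torch.zeros(num_of_nodes, dtype=exp_scores.dtype)
    denom.index_add_(0, trg_index, exp_scores)
    # Normalize each edge score by its target node's denominator
    return exp_scores / denom[trg_index]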

I also came across the entropy visualization in a DGL tutorial on GAT (https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html), and there the softmax-normalized attention coefficients are used for the visualization; the sketch below shows how I would reproduce that histogram. Sorry if the question is very naive; I'm trying to apply this visualization to one of my projects involving inductive learning. Let me know if I have misunderstood the information being extracted from the GAT layer. Thanks in advance!
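A rough sketch with my own variable names (alpha holds the softmax-normalized coefficients as a numpy array, trg_index maps each edge to its target node):

import numpy as np
import matplotlib.pyplot as plt

def plot_attention_entropy_histogram(alpha, trg_index, num_of_nodes, eps=1e-12):
    entropies = np.zeros(num_of_nodes)
    for node in range(num_of_nodes):
        p = alpha[trg_index == node]  # this node's neighborhood distribution
        entropies[node] = -np.sum(p * np.log2(p + eps))
    plt.hist(entropies, bins=50)
    plt.xlabel('neighborhood attention entropy (bits)')
    plt.ylabel('number of nodes')
    plt.show()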
