
Query regarding visualization of attention #6

Open

@Sowmya-R-Krishnan

Thank you @gordicaleksa for the fantastic code and detailed documentation! It has helped me a lot in understanding the details of GAT.
While looking at the visualization functions in the code, I understand that entropy is used because the softmax applied over the attention coefficients brings them into the range [0, 1], so each neighborhood's coefficients resemble a probability distribution. To obtain the attention coefficients from the GAT layer, you have used:

def visualize_entropy_histograms(model_name=r'gat_PPI_000000.pth', dataset_name=DatasetType.PPI.name):
    # Fetch the data we'll need to create visualizations
    all_nodes_unnormalized_scores, edge_index, node_labels, gat = gat_forward_pass(model_name, dataset_name)

all_nodes_unnormalized_scores comes from the GAT forward function:

out_nodes_features = self.skip_concat_bias(attentions_per_edge, in_nodes_features, out_nodes_features)
return (out_nodes_features, edge_index)
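For context, the entropy computation I have in mind only makes sense if its input behaves like a probability distribution, i.e. a neighborhood's softmax-normalized coefficients summing to 1. A minimal numpy sketch of what I mean (the function name is mine, not from the repo):

import numpy as np

def neighborhood_entropy(attn, eps=1e-12):
    # Shannon entropy of one neighborhood's attention distribution;
    # eps guards against log(0) for neighbors with ~zero attention.
    attn = np.asarray(attn, dtype=np.float64)
    return -np.sum(attn * np.log2(attn + eps))

# Uniform attention over 4 neighbors -> maximal entropy (2 bits),
# peaked attention -> entropy close to 0.
print(neighborhood_entropy([0.25, 0.25, 0.25, 0.25]))  # ~2.0
print(neighborhood_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24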

When reading the GAT paper (Petar Veličković et al.), the attention coefficients obtained after the softmax are used to compute the final output node features of the GAT layer. In the GAT implementation:

attentions_per_edge = self.neighborhood_aware_softmax(scores_per_edge, edge_index[self.trg_nodes_dim], num_of_nodes)

the above function gives the attention coefficients in the [0, 1] range. The subsequent functions (self.aggregate_neighbors and self.skip_concat_bias) then produce the final node features of the GAT layer. So is the "all_nodes_unnormalized_scores" variable used in the entropy histogram visualization function still in the range [0, 1]? Or is the entropy histogram used to visualize the output node features rather than the softmax-normalized attention coefficients?
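For reference, my mental model of neighborhood_aware_softmax is a separate softmax over the incoming edges of each target node, so that every neighborhood's coefficients sum to 1. A simplified single-head sketch of that idea (ignoring the numerical-stability details of the actual implementation; the names here are mine):

import torch

def naive_neighborhood_softmax(scores_per_edge, trg_index, num_of_nodes):
    # scores_per_edge: raw attention scores, shape (E,)
    # trg_index: target node id of each edge, shape (E,)
    exp_scores = scores_per_edge.exp()
    # Sum the exponentiated scores per target node (softmax denominator)
    denom = torch.zeros(num_of_nodes, dtype=exp_scores.dtype)
    denom.index_add_(0, trg_index, exp_scores)
    # Normalize each edge score by its target node's denominator
    return exp_scores / denom[trg_index]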

I also came across the entropy visualization in a DGL tutorial on GAT (https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html), and there the softmax-normalized attention coefficients are used for the visualization; the sketch below shows how I would reproduce that histogram. Sorry if the question is very naive; I'm trying to apply this visualization to one of my projects involving inductive learning. Let me know if I have misunderstood the information being extracted from the GAT layer. Thanks in advance!
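A rough sketch with my own variable names (alpha holds the softmax-normalized coefficients as a numpy array, trg_index maps each edge to its target node):

import numpy as np
import matplotlib.pyplot as plt

def plot_attention_entropy_histogram(alpha, trg_index, num_of_nodes, eps=1e-12):
    entropies = np.zeros(num_of_nodes)
    for node in range(num_of_nodes):
        p = alpha[trg_index == node]  # this node's neighborhood distribution
        entropies[node] = -np.sum(p * np.log2(p + eps))
    plt.hist(entropies, bins=50)
    plt.xlabel('neighborhood attention entropy (bits)')
    plt.ylabel('number of nodes')
    plt.show()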
