Laplacians and Middleschool
tl;dr: I visualize the connection between Graphs and their Laplacian Eigenvalues.
The below example is something I put together to supplement an extra credit assignment as part of my Applied Machine Learning for Engineering and Design course, though I'll post it here in case others find it useful.
%matplotlib inline import matplotlib.pyplot as plt import seaborn as sns import networkx as nx import numpy as np import scipy # Add a dark background #sns.set_style('dark') sns.set_context("poster")
A Relaxed Caveman graph basically just creates K groups of N members each, where each N member is connected to each other within its group, but to no-one outside of its group (this should look familiar to anyone who ever went to middleschool...). The "Relaxed part" controls whether these internal connects are randomly re-writen to other members at random (i.e., the nodes from one group will start to randomly connect with other groups as the probability increases.
number_of_clusters = 4 number_of_nodes_in_each_cluster = 5 probability_of_randomly_rewriting_edges = 0.01 G = nx.relaxed_caveman_graph(number_of_clusters, number_of_nodes_in_each_cluster, probability_of_randomly_rewriting_edges, seed=53) fig = plt.figure(figsize=(8,16)) ax = plt.subplot(2,1,1) nx.draw(G,pos=nx.spring_layout(G, iterations=500, weight=500), linewidths=1, width=2, with_labels=True, ax=ax, node_color = sns.color_palette(), node_size=500) plt.draw()
With this graph, G, (which, above, has
number_of_clusters components), we can plot its adjacency matrix, showing which nodes are corrected to which other nodes:
A = nx.adjacency_matrix(G).todense() plt.matshow(A,cmap = plt.cm.Blues)
<matplotlib.image.AxesImage at 0x129b1eb0>
We can now construct something called the Graph Laplacian, which is just: $$L = D - A$$ where A is the adjacency matrix above, and D is the Degree Matrix of the graph. The Laplacian is just a matrix, so we can take that matrix and find its Eigenvalues, and then plot the eigenvalues from smallest to largest:
D = np.diag(list(nx.degree(G).values())) L = D-A # For undirected graphs, the Laplacian is real-symmetric w = scipy.linalg.eigvalsh(L) plt.plot(w,'o-') plt.xlabel('Index of Eigenvalue') plt.ylabel('Value of Eigenvalue')
<matplotlib.text.Text at 0x14d191b0>
To help us understand the role of the laplacian here, let's take what we did above, and just put it in a function, so that we can look at different kinds of graphs and how things change.
def draw_relaxed_graph_and_spectrum(probability_of_randomly_rewriting_edges): number_of_clusters = 4 number_of_nodes_in_each_cluster = 5 G = nx.relaxed_caveman_graph(number_of_clusters, number_of_nodes_in_each_cluster, probability_of_randomly_rewriting_edges, seed=53) A = nx.adjacency_matrix(G).todense() D = np.diag(list(nx.degree(G).values())) L = D-A # Now Plot things fig = plt.figure(figsize=(16,8)) fig.patch.set_alpha(0) ax1 = plt.subplot(1,2,1) ax2 = plt.subplot(1,2,2) # Plot the graph nx.draw(G,pos=nx.spring_layout(G, iterations=500, weight=1, k=.17), linewidths=1, width=2, ax=ax1, node_color = sns.color_palette(), node_size=500) # Plot the eigenvalues ax2.plot(scipy.linalg.eigvalsh(L),'o-') plt.xlabel('Index of Eigenvalue') plt.ylabel('Value of Eigenvalue') plt.draw()
Specifically, let's look at different Relaxed Caveman graphs as we increase the relaxation:
# Low Probability of Re-writing (aka Middle School) draw_relaxed_graph_and_spectrum(0.01)
# Get some edges in there... draw_relaxed_graph_and_spectrum(0.04)
# Some Re-writing (though still pretty clustered) draw_relaxed_graph_and_spectrum(0.06)
# Now we're mixing! draw_relaxed_graph_and_spectrum(0.2)
# Edges now equally likely to mix vs stay in the community draw_relaxed_graph_and_spectrum(0.5)
# Completely random re-writing draw_relaxed_graph_and_spectrum(1.0)
Questions for students:¶
- What is the connection between the graph and the eigenvalues of the Laplacian (in terms of the smaller eigenvalues)?
- What does it mean if an eigenvalue is zero?
- Describe one possible way you might use the above as a way to "discover" communities and clusters in data, even when the communities aren't completely separable.
Note, most libraries will have a function for what we just did above: