Laplacians and Middleschool

  |   Source

tl;dr: I visualize the connection between Graphs and their Laplacian Eigenvalues.

The below example is something I put together to supplement an extra credit assignment as part of my Applied Machine Learning for Engineering and Design course, though I'll post it here in case others find it useful.

In [771]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
import numpy as np
import scipy
# Add a dark background
#sns.set_style('dark')
sns.set_context("poster")

A Relaxed Caveman graph basically just creates K groups of N members each, where each N member is connected to each other within its group, but to no-one outside of its group (this should look familiar to anyone who ever went to middleschool...). The "Relaxed part" controls whether these internal connects are randomly re-writen to other members at random (i.e., the nodes from one group will start to randomly connect with other groups as the probability increases.

In [772]:
number_of_clusters = 4
number_of_nodes_in_each_cluster = 5
probability_of_randomly_rewriting_edges = 0.01
G = nx.relaxed_caveman_graph(number_of_clusters,
                             number_of_nodes_in_each_cluster,
                             probability_of_randomly_rewriting_edges,
                             seed=53)
fig = plt.figure(figsize=(8,16))
ax = plt.subplot(2,1,1)
nx.draw(G,pos=nx.spring_layout(G, iterations=500, weight=500),
        linewidths=1, width=2, with_labels=True, ax=ax,
        node_color = sns.color_palette()[0], node_size=500)
plt.draw()

With this graph, G, (which, above, has number_of_clusters components), we can plot its adjacency matrix, showing which nodes are corrected to which other nodes:

In [773]:
A = nx.adjacency_matrix(G).todense()
plt.matshow(A,cmap = plt.cm.Blues)
Out[773]:
<matplotlib.image.AxesImage at 0x129b1eb0>

We can now construct something called the Graph Laplacian, which is just: $$L = D - A$$ where A is the adjacency matrix above, and D is the Degree Matrix of the graph. The Laplacian is just a matrix, so we can take that matrix and find its Eigenvalues, and then plot the eigenvalues from smallest to largest:

In [774]:
D = np.diag(list(nx.degree(G).values()))
L = D-A
# For undirected graphs, the Laplacian is real-symmetric
w = scipy.linalg.eigvalsh(L)
plt.plot(w,'o-')
plt.xlabel('Index of Eigenvalue')
plt.ylabel('Value of Eigenvalue')
Out[774]:
<matplotlib.text.Text at 0x14d191b0>

To help us understand the role of the laplacian here, let's take what we did above, and just put it in a function, so that we can look at different kinds of graphs and how things change.

In [775]:
def draw_relaxed_graph_and_spectrum(probability_of_randomly_rewriting_edges):
    number_of_clusters = 4
    number_of_nodes_in_each_cluster = 5
    G = nx.relaxed_caveman_graph(number_of_clusters,
                                 number_of_nodes_in_each_cluster,
                                 probability_of_randomly_rewriting_edges,
                                 seed=53)
    A = nx.adjacency_matrix(G).todense()
    D = np.diag(list(nx.degree(G).values()))
    L = D-A
    # Now Plot things
    fig = plt.figure(figsize=(16,8))
    fig.patch.set_alpha(0)
    ax1 = plt.subplot(1,2,1)
    ax2 = plt.subplot(1,2,2)
    # Plot the graph
    nx.draw(G,pos=nx.spring_layout(G, iterations=500, weight=1, k=.17),
            linewidths=1, width=2, ax=ax1,
            node_color = sns.color_palette()[0], node_size=500)
    # Plot the eigenvalues
    ax2.plot(scipy.linalg.eigvalsh(L),'o-')
    plt.xlabel('Index of Eigenvalue')
    plt.ylabel('Value of Eigenvalue')
    plt.draw()

Specifically, let's look at different Relaxed Caveman graphs as we increase the relaxation:

In [776]:
# Low Probability of Re-writing (aka Middle School)
draw_relaxed_graph_and_spectrum(0.01)
In [777]:
# Get some edges in there...
draw_relaxed_graph_and_spectrum(0.04)
In [778]:
# Some Re-writing (though still pretty clustered)
draw_relaxed_graph_and_spectrum(0.06)
In [779]:
# Now we're mixing!
draw_relaxed_graph_and_spectrum(0.2)
In [780]:
# Edges now equally likely to mix vs stay in the community
draw_relaxed_graph_and_spectrum(0.5)
In [781]:
# Completely random re-writing
draw_relaxed_graph_and_spectrum(1.0)

Questions for students:

  1. What is the connection between the graph and the eigenvalues of the Laplacian (in terms of the smaller eigenvalues)?
  2. What does it mean if an eigenvalue is zero?
  3. Describe one possible way you might use the above as a way to "discover" communities and clusters in data, even when the communities aren't completely separable.

Note, most libraries will have a function for what we just did above: nx.laplacian_spectrum(G)