Description
TL;DR: How do I compute the earth mover's distance between arbitrary vectors using DiffusionEMD?
I'm going through the two notebooks (Line and Swiss) as well as testing on my own data. Part of this relates to #4 and defining a function with similar input/output to e.g. sklearn distance metrics.
Let's say you have 10 points embedded in 3 dimensions. We'll call this data, and say that it has 10 rows and 3 columns.
data = np.random.randn(10, 3)
Based on the Jupyter notebooks, it seems that the first argument of DiffusionCheb() and DiffusionTree() is the adjacency matrix (I couldn't find any documentation on this otherwise).
The adjacency matrix has one row and one column per vertex of the graph, so I believe it should be built with e.g.
adj = graphtools.Graph(data, use_pygsp=True).W
which produces a 10 x 10 matrix.
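To make the shape claim concrete without depending on graphtools, here is a minimal NumPy sketch. It uses a plain dense Gaussian-kernel affinity as a stand-in; this is not what graphtools.Graph computes internally, and the helper name `gaussian_affinity` and the bandwidth `sigma` are hypothetical choices for illustration. The only point is that N points give an N x N symmetric weight matrix.

```python
import numpy as np


def gaussian_affinity(points, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix (illustrative stand-in, not graphtools)."""
    sq_norms = np.sum(points ** 2, axis=1)
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights


rng = np.random.default_rng(0)
data = rng.standard_normal((10, 3))  # 10 points in 3 dimensions
adj = gaussian_affinity(data)
print(adj.shape)  # (10, 10)
```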
The Line and Swiss examples show ds.labels as the 2nd argument to DiffusionCheb(). For Line, I notice that self.labels = np.eye(N), so with 100 points labels is just the 100 x 100 identity matrix. For the Swiss Roll, self.labels = np.repeat(np.eye(n_distributions), n_points_per_distribution, axis=0).
Assuming 5 points per distribution, the Swiss Roll labels look like:
1 0 0 ... 0
1 0 0 ... 0
1 0 0 ... 0
1 0 0 ... 0
1 0 0 ... 0
0 1 0 ... 0
0 1 0 ... 0
0 1 0 ... 0
0 1 0 ... 0
0 1 0 ... 0
0 0 1 ... 0
etc.
So as the name implies, it's labeling each point by which distribution it came from. Assuming there are 50 distributions, naturally there will be 50 columns.
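The Swiss Roll label layout can be reproduced with the same np.repeat expression quoted from the notebook. This is only a sketch of the matrix shape; the values 50 and 5 are the ones assumed in this issue, and the variable `membership` is introduced here for illustration.

```python
import numpy as np

n_distributions = 50
n_points_per_distribution = 5

# Each row is a point; each column is a distribution (one-hot membership).
labels = np.repeat(np.eye(n_distributions), n_points_per_distribution, axis=0)
print(labels.shape)  # (250, 50)

# Recover the distribution index of each point from its one-hot row.
membership = labels.argmax(axis=1)
print(membership[:12])  # [0 0 0 0 0 1 1 1 1 1 2 2]
```

Every row sums to 1 (each point belongs to exactly one distribution) and every column sums to 5 (each distribution has 5 points).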
For the 10 x 3 data I mentioned, I think I have the adjacency matrix correct, but what about the label matrix? Assuming it's unlabeled data, is it just np.eye(data.shape[0])? Or is it something else, such as np.ones([data.shape[0],1])? If I'm understanding correctly, these two cases would suggest that each point comes from a separate distribution and that each point comes from the same distribution, respectively. What I'm hoping to calculate is the earth mover's distance between data[0] and data[1], etc.
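To spell out the two candidates, here is a small sketch that reads each column of the label matrix as a distribution over the points. The helper `column_normalize` is hypothetical and only there for this reading; whether DiffusionEMD normalizes columns internally is an assumption I have not confirmed.

```python
import numpy as np

n_points = 10

labels_separate = np.eye(n_points)      # column j puts all its mass on point j
labels_single = np.ones((n_points, 1))  # one column spread over every point


def column_normalize(labels):
    """Scale each column to sum to 1 so it reads as a probability distribution."""
    return labels / labels.sum(axis=0, keepdims=True)


for name, labels in [("separate", labels_separate), ("single", labels_single)]:
    dists = column_normalize(labels)
    n_dists = dists.shape[1]
    n_pairs = n_dists * (n_dists - 1) // 2  # distinct pairs of distributions
    print(name, dists.shape, n_pairs)
# separate (10, 10) 45
# single (10, 1) 0
```

Under this reading, np.eye gives 10 point-mass distributions and 45 pairs to compare, and the EMD between two point masses is just the ground distance between the two points. The single column of ones gives one uniform distribution, so there is no pair to compare.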
Any help would be appreciated. Thanks!