Some issues in Tutorial_Multidimensional_Motif_Discovery and MDL #945

I have been reading various documents to better understand MDL, and I came across [this tutorial notebook](https://github.com/TDAmeritrade/stumpy/blob/main/docs/Tutorial_Multidimensional_Motif_Discovery.ipynb), which explains Multidimensional Motif Discovery. I found a few issues:

## (1) The locations of co-motifs do not match 
According to Fig. 2 in [Matrix Profile VI](https://www.cs.ucr.edu/~eamonn/Motif_Discovery_ICDM.pdf), the motifs in the first two dimensions are located at the same indices. I call these co-motifs: the motif pair `(A, A')` in one dimension and the motif pair `(B, B')` in another dimension start at the same indices (see also Definition 11).

The toy data provided in the notebook, however, does not result in matching indices for the motifs in the first two dimensions.
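To make concrete what I mean by co-motifs, here is a minimal sketch on synthetic data (not the notebook's toy data). The helper `naive_motif_pair` is a hypothetical brute-force search written only for this illustration, it is not part of stumpy, and the `m // 2` exclusion zone is my own arbitrary choice.

```python
import numpy as np


def naive_motif_pair(ts, m):
    """Brute-force search for the closest pair of z-normalized subsequences.

    Illustrative helper only (not part of stumpy).
    """
    subs = np.lib.stride_tricks.sliding_window_view(np.asarray(ts, dtype=float), m)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    # All pairwise Euclidean distances between subsequences
    dists = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    # Mask trivial matches (a subsequence compared with itself or a close neighbor)
    idx = np.arange(len(subs))
    dists[np.abs(idx[:, None] - idx[None, :]) <= m // 2] = np.inf
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return tuple(sorted((int(i), int(j))))


rng = np.random.default_rng(0)
n, m = 120, 16
T = rng.normal(size=(n, 2))  # columns are dimensions

# Plant a different pattern in each dimension, but at the SAME start indices
pattern_a = np.sin(np.linspace(0, 2 * np.pi, m))
pattern_b = np.linspace(-1, 1, m) ** 2
for start in (20, 70):
    T[start : start + m, 0] = pattern_a
    T[start : start + m, 1] = pattern_b

pairs = [naive_motif_pair(T[:, d], m) for d in range(2)]
print(pairs)
```

With co-motifs, both dimensions report the same pair of start indices even though the planted shapes differ. This is the behavior I expected from the first two dimensions of the toy data.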

## (2) With `normalize=True`, everything looks good. But if I set the last value of the time series in the last dimension to `1000`, MDL gives an inconclusive result.

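```
import numpy as np
import stumpy

# df is the toy data used in the tutorial notebook; m is the window size used there

# Replace the last value of the last dimension with an outlier
df.iloc[-1, 2] = 1000

normalize = True
mps, indices = stumpy.mstump(df, m, normalize=normalize)
# Motif start index and nearest-neighbor index for each k-dimensional profile
motifs_idx = np.argmin(mps, axis=1)
nn_idx = indices[np.arange(len(motifs_idx)), motifs_idx]

mdls, subspaces = stumpy.mdl(df, m, motifs_idx, nn_idx, normalize=normalize)
```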
Visualizing the MDL results gives this plot:

<img width="987" alt="image" src="https://github.com/TDAmeritrade/stumpy/assets/38519522/7d33aecf-d844-4d4d-a937-18451e6a4d3b">

In this case, the minimum is at index 2. However, we know that this is not correct. It is interesting that the elbow still indicates the correct result:
<img width="986" alt="image" src="https://github.com/TDAmeritrade/stumpy/assets/38519522/ed597b4c-d42c-47b2-a7c0-780e959d3a29">


## (3) Now let's set `normalize=False` and scale the time series in dimensions 0, 1, and 2 by 1000, 100, and 10, respectively.

```
# df is the toy data; m is the window size from the tutorial notebook
# Columns are dimensions, so scale each column (not each row)
df.iloc[:, 0] = df.iloc[:, 0] * 1000
df.iloc[:, 1] = df.iloc[:, 1] * 100
df.iloc[:, 2] = df.iloc[:, 2] * 10

normalize = False
mps, indices = stumpy.mstump(df, m, normalize=normalize)

motifs_idx = np.argmin(mps, axis=1)
nn_idx = indices[np.arange(len(motifs_idx)), motifs_idx]
```

And I get this:
```
>>> motifs_idx
array([ 65, 152, 151])

>>> nn_idx
array([477, 352, 351])
```

But I was expecting to get the same index for the first two dimensions. In this case, I think the reason is that we are just adding the distances across dimensions. See:
https://github.com/EitanHemed/stumpy/blob/d569c9adbb5f4fd3ba018661a78ac80cbb2d5808/stumpy/core.py#L3999-L4001

While this can make sense when `normalize=True`, it may not be appropriate to simply add the distances together when `normalize=False` **(but I do understand that we probably do not have any other choice here).** Note that if we apply the matrix profile to each dimension individually, we get the correct answer (although issue (1) still exists). With the multi-dimensional matrix profile, however, we get a strange result because the time series have different scales, and this affects the result when `normalize=False`.
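A toy illustration of the scale effect, assuming plain (non-normalized) Euclidean distances that are summed across dimensions. The data is random and the variable names are made up for this sketch; it does not reproduce stumpy's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 16
scales = np.array([1000.0, 100.0, 10.0])

# One pair of subsequences per dimension (rows are dimensions here)
a = rng.normal(size=(3, m))
b = rng.normal(size=(3, m))

d_unscaled = np.linalg.norm(a - b, axis=1)
d_scaled = np.linalg.norm(scales[:, None] * a - scales[:, None] * b, axis=1)

# Fraction of the summed distance contributed by each dimension
share = d_scaled / d_scaled.sum()
print(np.round(d_scaled / d_unscaled, 6))
print(np.round(share, 3))
```

Each per-dimension distance grows linearly with its scale factor, so the dimension scaled by 1000 contributes most of the sum and the other dimensions have little influence on where the minimum lands.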

Maybe this is not an issue(?!), but I still expected the correct answer, since applying the matrix profile to each time series individually reveals co-motifs in the first two dimensions. So maybe we could just add a note to the docstring saying that it is better to normalize the WHOLE time series in EACH dimension before passing it to `mstump(..., normalize=False)`.
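A minimal sketch of the kind of preprocessing such a note could describe, assuming z-normalization over the full length of each dimension. The helper name `zscore_each_dimension` is hypothetical, and the random array stands in for the toy data (rows are time steps, columns are dimensions, as in `df`).

```python
import numpy as np


def zscore_each_dimension(T):
    """Z-normalize every column (dimension) over its whole length.

    Works on a 2-D NumPy array or a pandas DataFrame whose columns are dimensions.
    """
    return (T - T.mean(axis=0)) / T.std(axis=0, ddof=0)


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
scaled = base * np.array([1000.0, 100.0, 10.0])  # same scaling as in (3)

z_base = zscore_each_dimension(base)
z_scaled = zscore_each_dimension(scaled)

# After whole-series normalization the per-dimension scale factors vanish
print(np.allclose(z_base, z_scaled))
```

The normalized data could then be passed to `stumpy.mstump(T_normalized, m, normalize=False)`, where `T_normalized` is the output of the helper. Whether this is the right recommendation for the docstring is of course up for discussion.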
