(FM/MVM, etc.): ArrayIndexOutOfBoundsException when scaling

I get the exception below whenever I run the FM code on any of my real datasets. It seems to break roughly when you have >100k training examples and >100 machines.

```
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.apply$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:64)
    at org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:91)
    at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
    at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:163)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

Here's the driver stacktrace:

```
org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706)
com.github.cloudml.zen.ml.partitioner.DBHPartitioner$.partitionByDBH(DBHPartitioner.scala:70)
com.github.cloudml.zen.ml.recommendation.FM$.initializeDataSet(FM.scala:498)
```

The odd thing is that the driver stacktrace shows the error happening in `initializeDataSet`, but it doesn't seem to occur until training is done. To speed reproduction of the problem I set `numIterations` to 1.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(FM/MVM, etc.): ArrayIndexOutOfBoundsException when scaling #62

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

(FM/MVM, etc.): ArrayIndexOutOfBoundsException when scaling #62

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions