nebula-algorithm is a Spark application based on GraphX. It currently provides the following algorithms:
| Name | Use Case |
|---|---|
| PageRank | page ranking, important node discovery |
| Louvain | community discovery, hierarchical clustering |
| KCore | community detection, financial risk control |
| LabelPropagation | community detection, information propagation, advertising recommendation |
| ConnectedComponent | community detection, isolated island detection |
| StronglyConnectedComponent | community detection |
| ShortestPath | path planning, network planning |
| TriangleCount | network structure analysis |
| GraphTriangleCount | network structure and tightness analysis |
| BetweennessCentrality | important node discovery, node influence calculation |
| DegreeStatic | graph structure analysis |
You can either submit the entire Spark application or call the algorithms in the `lib` library to apply graph algorithms to a DataFrame.
- Build Nebula Algorithm

      $ git clone https://github.com/vesoft-inc/nebula-algorithm.git
      $ cd nebula-algorithm
      $ mvn clean package -Dgpg.skip -Dmaven.javadoc.skip=true -Dmaven.test.skip=true

  After the build completes, the target file `nebula-algorithm-2.0.0.jar` is placed under `nebula-algorithm/target`.
- Download from the Maven repo

  Alternatively, the package can be downloaded from the following Maven repo:
  https://repo1.maven.org/maven2/com/vesoft/nebula-algorithm/2.0.0/
Limitation: Nebula Algorithm does not encode string IDs, so during algorithm execution the source and target of every edge must be of type Int. The `vid_type` of the Nebula space may be String, but the ID values themselves must be integers.
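If the edge data carries genuinely non-numeric string IDs, one possible workaround is to encode them yourself before calling an algorithm and to decode the results afterwards. The following is a minimal sketch that uses only standard Spark APIs; it is not part of nebula-algorithm. The object name `EncodeStringIds`, the column names `id`, `encodedId`, `srcId`, `dstId`, and the sample data are all hypothetical. It produces Long IDs (the type GraphX uses for vertex IDs); cast them if your version expects a different integer type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object EncodeStringIds {
  /** Replaces the vertex IDs in the first two columns with dense numeric IDs.
    * Returns the encoded edges and the (id, encodedId) lookup table, which is
    * needed later to translate algorithm results back to the original IDs.
    * Assumes the input has no columns named _srcKey, _dstKey, srcId or dstId. */
  def encode(edges: DataFrame, spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val Array(src, dst) = edges.columns.take(2)
    val rest = edges.columns.drop(2).map(col) // e.g. a weight column, kept as-is

    // Collect every distinct vertex ID and number them 0, 1, 2, ...
    val mapping = edges.select(col(src).cast("string").as("id"))
      .union(edges.select(col(dst).cast("string").as("id")))
      .distinct()
      .as[String]
      .rdd
      .zipWithIndex()
      .toDF("id", "encodedId")
      .cache()

    val srcMap = mapping.select(col("id").as("_srcKey"), col("encodedId").as("srcId"))
    val dstMap = mapping.select(col("id").as("_dstKey"), col("encodedId").as("dstId"))

    // Swap both endpoint columns for their numeric IDs.
    val encoded = edges
      .join(srcMap, col(src) === col("_srcKey"))
      .join(dstMap, col(dst) === col("_dstKey"))
      .select(Seq(col("srcId"), col("dstId")) ++ rest: _*)
    (encoded, mapping)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encode-ids").getOrCreate()
    import spark.implicits._
    // Hypothetical sample edges with string IDs and a weight.
    val edges = Seq(("a", "b", 1.0), ("b", "c", 2.0), ("c", "a", 1.0)).toDF("src", "dst", "weight")
    val (encoded, mapping) = encode(edges, spark)
    encoded.show()
    mapping.show()
    spark.stop()
  }
}
```

Keeping the `mapping` table around is the important design choice here: algorithm output refers to the encoded IDs, so a final join against `mapping` is required to report results in terms of the original string IDs.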
- Option 1: Submit the nebula-algorithm package
  - Configuration

    Refer to the configuration example.
  - Submit the Spark application

        ${SPARK_HOME}/bin/spark-submit --master <mode> --class com.vesoft.nebula.algorithm.Main nebula-algorithm-2.0.0.jar -p application.conf

- Option 2: Call the nebula-algorithm interface

  There are currently 10 algorithms provided in `lib` from `nebula-algorithm`, which can be invoked programmatically as follows:
  - Add the dependency to `pom.xml`.

        <dependency>
          <groupId>com.vesoft</groupId>
          <artifactId>nebula-algorithm</artifactId>
          <version>2.0.0</version>
        </dependency>

  - Instantiate the algorithm's config. Below is an example for `PageRank`.

        val prConfig = new PRConfig(5, 1.0)
        val prResult = PageRankAlgo.apply(spark, data, prConfig, false)

    For other algorithms, please refer to the test cases.

  Note: The first column of the DataFrame passed to an algorithm represents the source vertices, the second column the target vertices, and the third column the edge weights.
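Because the algorithms read the input DataFrame by column position, it can help to project your data into that layout explicitly before calling them. The sketch below uses only standard Spark APIs. The object name `EdgeFrame`, the helper `toEdgeFrame`, the output column names, and the sample data are hypothetical, and defaulting a missing weight to 1.0 is an assumption of this sketch, not documented nebula-algorithm behavior.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object EdgeFrame {
  /** Projects `raw` into the positional layout described in the note:
    * column 0 = source vertex, column 1 = target vertex, column 2 = edge weight.
    * When no weight column is named, every edge gets weight 1.0 (sketch assumption). */
  def toEdgeFrame(raw: DataFrame,
                  srcCol: String,
                  dstCol: String,
                  weightCol: Option[String] = None): DataFrame = {
    val weight = weightCol.map(c => col(c).cast("double")).getOrElse(lit(1.0))
    raw.select(
      col(srcCol).cast("long").as("src"),
      col(dstCol).cast("long").as("dst"),
      weight.as("weight")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("edge-frame").getOrCreate()
    import spark.implicits._
    // Hypothetical input whose columns are not in the expected order.
    val raw = Seq((0.5, 1L, 2L), (1.5, 2L, 3L), (1.0, 3L, 1L)).toDF("amount", "from", "to")
    val data = toEdgeFrame(raw, "from", "to", Some("amount"))
    data.printSchema()
    // `data` now has the (source, target, weight) layout and can be passed to an
    // algorithm, e.g. PageRankAlgo.apply(spark, data, prConfig, false) as in the PageRank example.
    spark.stop()
  }
}
```

Doing the projection in one place keeps the positional contract visible in your own code, so a reordered or renamed upstream column fails loudly in `select` instead of silently feeding weights in as vertex IDs.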
| Nebula Algorithm Version | Nebula Version |
|---|---|
| 2.0.0 | 2.0.0, 2.0.1 |
| 2.1.0 | 2.0.0, 2.0.1 |
| 2.5.0 | 2.5.0 |
| 2.5-SNAPSHOT | nightly |
Nebula Algorithm is open source, and you are more than welcome to contribute in the following ways:
- Discuss in the community via the forum or raise issues here.
- Write or improve our documentation.
- Submit pull requests to help improve the code itself here.