Getting Started

Note: This guide applies to HiBench 5.0.

System setup.

(1) Set up the JDK, Hadoop-YARN, and Spark runtime environments properly.

(2) HiBench 4.0 and later require Python 2.x (>= 2.6).

(3) Download or check out the HiBench benchmark suite.

(4) Build HiBench with Maven, specifying the Spark version and the MapReduce version. For example, for Spark 1.5 and MR2, run:
```
  cd src
  mvn clean package -D spark1.5 -D MR2
```
Optionally you can run <HiBench_Root>/bin/build-all.sh to build HiBench for all known Spark and MR versions.
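
Before building, it can help to confirm that the tools from steps (1), (2) and (4) are reachable from the shell. The following is a minimal sketch, not part of HiBench: the function name `missing_tools` is hypothetical, and it assumes the tools are invoked as `java`, `python` and `mvn` on your system.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HiBench):
# print the name of each listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed command names; adjust the list to your environment.
missing=$(missing_tools java python mvn)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All listed tools found."
fi

# Check that "python" is a 2.x interpreter at version 2.6 or later.
if command -v python >/dev/null 2>&1; then
  python -c 'import sys; sys.exit(0 if (2, 6) <= sys.version_info[:2] < (3, 0) else 1)' \
    || echo "python is not a 2.x (>= 2.6) interpreter"
fi
```

This only checks that the commands exist; it does not verify that Hadoop or Spark are configured correctly.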

HiBench Configurations.

At a minimum, create and edit conf/99-user_defined_properties.conf:
```
   cd conf
   cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf
```
Make sure the following properties have been set:
```
   hibench.hadoop.home      The Hadoop installation location
   hibench.spark.home       The Spark installation location
   hibench.hdfs.master      The HDFS master
   hibench.spark.master     The Spark master
```
Note: For YARN mode, set hibench.spark.master to yarn-client (yarn-cluster is not supported yet).
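
A quick way to confirm that the four required properties are present is to scan the configuration file for them. The sketch below is not part of HiBench: the function name `missing_props` is hypothetical, and it assumes each property is written as a key followed by whitespace and a value on one line, with `#` starting a comment.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HiBench):
# print each required property that has no "key value" line in the given file.
missing_props() {
  conf_file=$1
  for key in hibench.hadoop.home hibench.spark.home \
             hibench.hdfs.master hibench.spark.master; do
    # A property counts as set when the first field equals the key
    # and at least one more field (the value) follows it.
    awk -v k="$key" '$1 == k && NF >= 2 { found = 1 } END { exit !found }' \
      "$conf_file" || echo "$key"
  done
}

# Run from <HiBench_Root>; prints nothing when all four properties are set.
conf=conf/99-user_defined_properties.conf
if [ -f "$conf" ]; then
  missing_props "$conf"
fi
```

The check only looks for the presence of a value; it does not validate that the paths or master URLs are correct.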

If the Spark and Hadoop versions are not detected correctly, set hibench.hadoop.executable, hibench.hadoop.version, and hibench.spark.version in 99-user_defined_properties.conf.

To run HiBench on HDP, set hibench.hadoop.mapreduce.home to the MapReduce home, which is normally "/usr/hdp/current/hadoop-mapreduce-client", and set hibench.hadoop.release to "hdp".

Run. For example, to run the single workload wordcount on Spark:
```
    workloads/wordcount/prepare/prepare.sh
    workloads/wordcount/spark/scala/bin/run.sh
```
You can also try <HiBench_Root>/bin/run-all.sh to run all workloads. Note: The same configuration may not work for all workloads.
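
The two steps for a single workload can be wrapped so that the run step is skipped when data preparation fails. This is a minimal sketch, not part of HiBench: the function name `run_workload` is hypothetical, and it assumes other workloads follow the same directory layout as wordcount, which may not hold for every workload.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HiBench):
# run the prepare step, then the Spark/Scala run step, for one workload.
# The run step is skipped, and a non-zero status returned, if prepare fails.
run_workload() {
  root=$1   # HiBench root directory
  name=$2   # workload name, e.g. wordcount
  "$root/workloads/$name/prepare/prepare.sh" &&
    "$root/workloads/$name/spark/scala/bin/run.sh"
}

# Example (from <HiBench_Root>): run_workload . wordcount
```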

View the report:

Go to <HiBench_Root>/report to check the final report:
- report/hibench.report: Overall report for all workloads.
- report/<workload>/<language APIs>/bench.log: Raw logs on the client side.
- report/<workload>/<language APIs>/monitor.html: System utilization monitor results.
- report/<workload>/<language APIs>/conf/<workload>.conf: Generated environment variable configuration for this workload.
- report/<workload>/<language APIs>/conf/sparkbench/<workload>/sparkbench.conf: Generated configuration for this workload, which is used for mapping to environment variables.
- report/<workload>/<language APIs>/conf/sparkbench/<workload>/spark.conf: Generated configuration for Spark.

[Optional] Execute <HiBench_Root>/bin/report_gen_plot.py report/hibench.report to generate report figures.

Note: report_gen_plot.py requires Python 2.x and python-matplotlib.
