Getting Started
Note: this is for HiBench 5.0
System setup.
(1) Set up the JDK, Hadoop YARN, and Spark runtime environments properly.
(2) For HiBench V4.0 and later, Python 2.x (>= 2.6) is required.
(3) Download or check out the HiBench benchmark suite.
(4) Build HiBench with Maven, specifying the Spark version and the MapReduce version. For example, for Spark 1.5 and MR2, run
    cd src
    mvn clean package -D spark1.5 -D MR2
Optionally, you can run
    <HiBench_Root>/bin/build-all.sh
to build HiBench for all known Spark and MR versions.
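The Python requirement in step (2) can be checked before building. The following is a minimal sketch and not part of HiBench: `py_version_ok` is a hypothetical helper that takes a bare version string (for example, the number printed by `python -V`) and succeeds only for 2.x with a minor version of 6 or later.

```shell
#!/bin/sh
# Hypothetical helper (not part of HiBench): succeed only if the given
# version string is Python 2.x with x >= 6, as HiBench V4.0 and later require.
py_version_ok() {
    version=$1
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text before the next dot, if any
    # A non-numeric minor makes the numeric test fail, which counts as "not OK".
    [ "$major" = "2" ] && [ "$minor" -ge 6 ] 2>/dev/null
}

# Demonstration with literal version strings.
for v in 2.7.5 2.5.1 3.8.1; do
    if py_version_ok "$v"; then
        echo "$v: OK"
    else
        echo "$v: not supported"
    fi
done
```

Keeping the check as a function that takes a string means it can be tried without any particular Python installed; feeding it the real interpreter version is left to the caller.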
HiBench Configurations.
For minimum requirements, create and edit conf/99-user_defined_properties.conf:
    cd conf
    cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf
Make sure the following properties are set:
    hibench.hadoop.home     The Hadoop installation location
    hibench.spark.home      The Spark installation location
    hibench.hdfs.master     HDFS master
    hibench.spark.master    Spark master
Note: For YARN mode, set hibench.spark.master to yarn-client (yarn-cluster is not supported yet).
If the Spark and Hadoop versions are not probed correctly, set hibench.hadoop.executable, hibench.hadoop.version and hibench.spark.version in 99-user_defined_properties.conf.
To run HiBench on HDP, set hibench.hadoop.mapreduce.home to the MapReduce home, which is normally "/usr/hdp/current/hadoop-mapreduce-client", and set hibench.hadoop.release to "hdp".
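As a rough illustration of the minimum configuration, the sketch below checks a properties file for the four required settings. `missing_props` is a hypothetical helper, not something HiBench ships, and it assumes each setting is written on one line as the property name followed by whitespace and a value. The paths and host in the demonstration file are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of HiBench): print every required property
# that has no value in the given configuration file.
# Assumption: each setting is one line of the form "<name> <whitespace> <value>".
missing_props() {
    conf_file=$1
    for prop in hibench.hadoop.home hibench.spark.home \
                hibench.hdfs.master hibench.spark.master; do
        # A property counts as set when some line has it as the first field
        # and at least one more field after it.
        if ! awk -v p="$prop" '$1 == p && NF >= 2 { found = 1 } END { exit !found }' "$conf_file"; then
            echo "$prop"
        fi
    done
}

# Demonstration on a throwaway file; every value below is a placeholder.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
# hibench.spark.master is deliberately left out
hibench.hadoop.home     /opt/hadoop
hibench.spark.home      /opt/spark
hibench.hdfs.master     hdfs://localhost:8020
EOF
missing_props "$demo_conf"
rm -f "$demo_conf"
```

Run as written, the demonstration prints `hibench.spark.master`, the one property missing from the sample file. Pointing the function at conf/99-user_defined_properties.conf would apply the same check to a real setup.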
Run. For example, to run a single workload, wordcount, on Spark:
    workloads/wordcount/prepare/prepare.sh
    workloads/wordcount/spark/scala/bin/run.sh
You can also try
    <HiBench_Root>/bin/run-all.sh
to run all workloads. Note: The same configuration may not work for all workloads.
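The prepare-then-run sequence for a single workload could be wrapped roughly as follows. This is a sketch under stated assumptions: `workload_steps` and `run_workload` are hypothetical names, and the path layout is generalized from the wordcount example, so other workloads or language APIs may be laid out differently. By default the wrapper only prints what it would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of HiBench): print the two scripts for one
# workload, in execution order. The layout is generalized from the wordcount
# example and is an assumption for any other workload or API.
workload_steps() {
    workload=$1
    api=${2:-spark/scala}
    echo "workloads/$workload/prepare/prepare.sh"
    echo "workloads/$workload/$api/bin/run.sh"
}

# Run each step relative to the current directory (expected to be the
# HiBench root), stopping at the first failure.
# With DRY_RUN=1, the default here, the commands are only printed.
run_workload() {
    workload_steps "$@" | while read -r script; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $script"
        else
            "./$script" || exit 1
        fi
    done
}

run_workload wordcount
```

Setting `DRY_RUN=0` before calling `run_workload` from the HiBench root would execute the two scripts instead of printing them. Data preparation runs first, so a failed prepare step prevents the benchmark from starting on missing input.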
View the report:
Go to <HiBench_Root>/report to check the final report:
- report/hibench.report: Overall report covering all workloads.
- report/<workload>/<language APIs>/bench.log: Raw logs on the client side.
- report/<workload>/<language APIs>/monitor.html: System utilization monitor results.
- report/<workload>/<language APIs>/conf/<workload>.conf: Generated environment variable configuration for this workload.
- report/<workload>/<language APIs>/conf/sparkbench/<workload>/sparkbench.conf: Generated configuration for this workload, which is mapped to environment variables.
- report/<workload>/<language APIs>/conf/sparkbench/<workload>/spark.conf: Generated configuration for Spark.
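To see which of the per-workload files listed above were actually produced, a small check along these lines may help. It is a sketch only: `report_files` and `check_report` are hypothetical helpers, and using `spark/scala` for the `<language APIs>` part of the path is an assumption based on the wordcount run script location.

```shell
#!/bin/sh
# Hypothetical helper (not part of HiBench): print the per-workload report
# files for one workload and one language API.
report_files() {
    workload=$1
    api=$2
    base="report/$workload/$api"
    echo "$base/bench.log"
    echo "$base/monitor.html"
    echo "$base/conf/$workload.conf"
    echo "$base/conf/sparkbench/$workload/sparkbench.conf"
    echo "$base/conf/sparkbench/$workload/spark.conf"
}

# Report which of the expected files exist under the given HiBench root.
check_report() {
    root=$1
    shift
    report_files "$@" | while read -r f; do
        if [ -e "$root/$f" ]; then
            echo "found   $f"
        else
            echo "missing $f"
        fi
    done
}

# Assumption: "spark/scala" stands in for <language APIs>.
check_report . wordcount spark/scala
```

Run from the HiBench root after a workload finishes, every line should read `found`; a `missing` line points at a step that did not complete or at a different path layout than the one assumed here.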
[Optional] Execute
    <HiBench_Root>/bin/report_gen_plot.py report/hibench.report
to generate report figures. Note: report_gen_plot.py requires Python 2.x and python-matplotlib.