This pipeline is now at version 0.0.2
Target API version: 1.1.3.3
This section helps you get started with setting up this constituent in Eclipse.
- Clone the repository onto your machine as
git clone git@repo.haystack.one:server.tachyon/constituent.shell.git
- Move into the directory, and edit project/plugins.sbt as
vi project/plugins.sbt
- Add the sbteclipse plugin line and save the file.
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
- Issue
sbt eclipse
and wait for the project to be generated.
- Import the project into Eclipse.
You will have to import the following storage JARs into your project. Since we build from sources, these JARs can be found inside the dist/lib and dist/lib/spark directories of your PredictionIO source installation directory.
pio-assembly-0.13.0.jar
pio-data-elasticsearch-assembly-0.13.0.jar
pio-data-hbase-assembly-0.13.0.jar
pio-data-hdfs-assembly-0.13.0.jar
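As a sketch, you could pull these into a local jars folder in one go; the source path below is an assumption, so adjust it to wherever your PredictionIO checkout lives:

PIO_SRC=/path/to/predictionio   # assumption: your PredictionIO source installation
mkdir -p jars
cp "$PIO_SRC"/dist/lib/*assembly-0.13.0.jar jars/
cp "$PIO_SRC"/dist/lib/spark/*assembly-0.13.0.jar jars/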
You may hit Scala version compatibility issues. If so, go to the project's compiler settings and choose the Latest 2.11 bundle (dynamic).
Finally, edit build.sbt and add the following under libraryDependencies
"org.xerial.snappy" % "snappy-java" % "1.1.1.7"- Add this for debugging only. Do not commit it.
Helps us train and deploy locally.
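If you prefer doing this from the shell, a minimal sketch that appends the same line to build.sbt (sbt's += form is assumed here; adapt it if your build.sbt declares dependencies as a Seq):

cat >> build.sbt <<'EOF'

// debug only -- do not commit this line
libraryDependencies += "org.xerial.snappy" % "snappy-java" % "1.1.1.7"
EOF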
Create a new Run/Debug Configuration by going to Debug > Edit Configurations.... Choose Scala Application and click on the + button. Name it pio train and put in the following:
- Main class:
org.apache.predictionio.workflow.CreateWorkflow
- VM options:
-Dspark.master=local -Dlog4j.configuration=file:/usr/local/pio/conf/log4j.properties -Dpio.log.dir=/var/log/haystack/pio
- Program arguments:
--engine-id dummy --engine-version dummy --engine-variant engine.json --env dummy=dummy
Make sure working directory is set to the base directory of the template that you are working on.
Add environment variables to this configuration as indicated in the deployment document; you can find them at /usr/local/pio/conf/pio-env.sh. Other considerations are outlined below (a short sketch for pulling these variables out of pio-env.sh follows the list):
- You might encounter issues with HDFS if the configuration is not set right; if so, switch to local file system model storage like so:
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE: LOCALFS
PIO_STORAGE_SOURCES_LOCALFS_TYPE: localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH: /path/to/data/folder
- You might also have to do the following for smoother provisioning:
- Copy
/usr/local/pio/conf/
to the jars folder (the one into which you brought all the libraries).
- Add this folder as an external class path for the project.
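As a quick sketch for the run configuration's environment variables, you can list every PIO_* variable defined in pio-env.sh and copy the relevant ones across (path as used elsewhere in this guide):

# print the PIO_* variables defined in pio-env.sh; copy these into the
# run configuration's environment variables pane
grep -E '^[[:space:]]*(export[[:space:]]+)?PIO_' /usr/local/pio/conf/pio-env.sh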
For the deploy configuration, simply duplicate the previous configuration and replace the following:
- Main class:
org.apache.predictionio.workflow.CreateServer
- Program arguments:
--engineInstanceId <id_from_pio_train> --engine-variant engine.json
- You will find the
engineInstanceId
printed in the console at the end of train.
This section provides details on how to provision the HMLP onto HOLU. Refer to the individual constituent's git page for proposed port mappings and access keys. Do not forget to add access to the event server as follows:
- Edit
/etc/default/haystack
and add the base path for the event server at both announcer and consumer nodes:
holu.base=http://192.168.136.90:7070
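Before wiring holu.base, a quick sanity check (a sketch; IP and port are the ones above) is to confirm the event server answers from both announcer and consumer nodes:

# a healthy PredictionIO event server responds on its root path
curl -s http://192.168.136.90:7070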
The first step is to generate access tokens, denoted as prediction pipeline units.
- Execute the following to generate the skeleton unit
pio app new constituent.shell
- Add the
--access-key
parameter if you want to control the generated key (see the sketch after this list).
  - It should be a 64-char string of the form
abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ12
- Record the unit ID and access key. You will need them later.
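If you do want to control the key, a minimal sketch that generates a random 64-character alphanumeric key and passes it in:

# generate a 64-char alphanumeric access key and create the unit with it
ACCESS_KEY=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 64)
pio app new constituent.shell --access-key "$ACCESS_KEY"
echo "$ACCESS_KEY"   # record this together with the unit ID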
It is time to prepare the constituent unit files that eventually manifest as an HML pipeline.
- Retrieve the engine files by cloning the relevant git repository.
- Set up the folder as
mkdir -p /var/lib/haystack/pio/constituents/constituent.shell
- Go to the folder as
cd /var/lib/haystack/pio/constituents/constituent.shell
- Generate the structure folders as
mkdir bin conf pipeline
- Go to the pipeline folder and get all the files:
  - Either clone into pipeline as
git clone git@repo.haystack.one:server.tachyon/constituent.shell.git pipeline
  - Or, if it is zipped,
unzip constituent.shell.zip -d pipeline
- Make sure that the application name is set right in pipeline/engine.json. Change appName to constituent.shell.
- Copy all scripts to bin as
cp pipeline/src/main/resources/scripts/* bin/
- Copy configuration to conf as
cp pipeline/src/main/resources/configuration/* conf/
- Ensure that all scripts have execute permission as
chmod +x bin/*
- Get your configuration right as
vi conf/pipeline.conf
- Pay attention to
HOSTNAME, HOST, ACCESS_KEY, TRAIN_MASTER, DEPLOY_MASTER, X_CORES and Y_MEMORY
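After these steps the constituent folder should look roughly like this (a sketch; only the top level is shown):

ls /var/lib/haystack/pio/constituents/constituent.shell
# bin/       - executable helper scripts copied from pipeline/src/main/resources/scripts
# conf/      - pipeline.conf copied from pipeline/src/main/resources/configuration
# pipeline/  - the cloned (or unzipped) engine sources, containing engine.json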
- Edit
/etc/default/haystack
and add access keys to denote the addition of the HMLP.
  - For consumer nodes;
haystack.tachyon.events.dispatch.skeleton=<accesskey>
- Complete the events import by running the migration and turning on the concomitant consumer (see the event-fetch sketch below).
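Once the consumer is on and the migration has run, a sketch of checking that events actually landed in the event server (IP and access key are the ones configured above):

# fetch one event back from the event server for this unit
curl -s "http://192.168.136.90:7070/events.json?accessKey=<accesskey>&limit=1"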
It is important to complete at least one iteration of the build, train, and deploy cycle prior to consumption.
- Go to folder as
cd /var/lib/haystack/pio/constituents/constituent.shell/bin
- Build the prediction unit as,
./build
- Train the predictive model as (ensure events migration is complete),
./train
- Deploy the prediction unit as,
./deploy
- Do not kill the deployed process. Subsequent train and deploy runs will take care of provisioning it again.
- You can verify the deployed HMLP by visiting
http://192.168.136.90:17071/
and querying at
http://192.168.136.90:17071/queries.json
(see the query sketch after this list).
- Edit
/etc/default/haystack
and add URL keys to denote the addition of the HMLP.
  - For announcer nodes;
haystack.tachyon.pipeline.access.skeleton=http://192.168.136.90:17071
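A sketch of the verification calls; the query body is an assumption based on the recommend-contexts-to-user shape described at the end of this page, so match the field names to the engine's Query class:

# liveness check: the engine's status page
curl -s http://192.168.136.90:17071/
# sample query; field names are illustrative only
curl -s -H "Content-Type: application/json" \
     -d '{ "user": "u123", "num": 10 }' \
     http://192.168.136.90:17071/queries.json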
Now that we have successfully provisioned this HMLP, let us set it up for a periodic train-deploy cycle. Note that events are always consumed in real time but are not accounted for until the next train cycle rebuilds the model.
- Find the accompanying shell scripts of the constituent and modify them for consumption.
- Go to the constituent directory at
cd /var/lib/haystack/pio/constituents/constituent.shell/
- Verify the configuration is right as
vi conf/pipeline.conf
  - Adjust the Spark driver and executor settings as required (a sketch of pipeline.conf values follows this list).
- Do not forget to make the scripts executable;
chmod +x bin/*
- Ensure pio build is run at least once before enabling the cron job.
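For reference, a sketch of conf/pipeline.conf; the keys are the ones named earlier in this guide, while the values are placeholders and the exact format depends on the constituent's scripts:

HOSTNAME=holu-node-1             # placeholder
HOST=http://192.168.136.90       # placeholder
ACCESS_KEY=<accesskey>
TRAIN_MASTER=local[4]            # Spark master used for train; placeholder
DEPLOY_MASTER=local              # Spark master used for deploy; placeholder
X_CORES=4                        # placeholder
Y_MEMORY=8g                      # placeholder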
Finally, set up crontab to execute these scripts. mailutils is used in this script. On Ubuntu, you can run sudo update-alternatives --config mailx and check that /usr/bin/mail.mailutils is selected.
- Edit crontab file as;
crontab -e
for the user level.
- Add the entry as;
0 0,6,12,18 * * * /var/lib/haystack/pio/constituents/constituent.shell/bin/redeploy >/dev/null 2>/dev/null
  - Use man cron to check usage (a rough sketch of the redeploy wrapper follows this list).
  - Manage schedules in conjunction with all other HMLPs and ensure that trains do not overlap.
- Reload to take effect (optional)
sudo service cron reload
- Restart if needed;
sudo systemctl restart cron
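Since the cron entry calls bin/redeploy and the notification path relies on mail, here is a rough sketch of what such a wrapper typically chains; the actual script shipped in bin/ may differ, and the notification address is a placeholder:

#!/usr/bin/env bash
# rough sketch only -- the real bin/redeploy shipped with the constituent may differ
set -euo pipefail
cd /var/lib/haystack/pio/constituents/constituent.shell/bin
if ./train && ./deploy; then
    echo "constituent.shell retrained and redeployed" | mail -s "HMLP redeploy ok" ops@example.com
else
    echo "constituent.shell redeploy FAILED on $(hostname)" | mail -s "HMLP redeploy failed" ops@example.com
fi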
You are all set!
Use this structure for other constituents
- Recommend contexts to user
description: given a user id, recommends relevant contexts to the patron (context ids)
known as: constituent.recommend-contexts-to-user
takes: user and context entities; view, interest, disinterest events
queries with: user id
returns: list of recommended context ids
answers: at the feeds page, populate "Trending now"
works for: patron (consumer), publisher (creator)
example: detail out an example
- user view context events
- user interested context events
- user ID
- a ranked list of recommended contextIDs
- ALS
- A person viewed this and bought that, and hence might be interested in this item
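As a sketch of feeding this constituent, a user-view-context event can be posted to the event server; the entity and event names follow the "takes" line above, while the ids, timestamp, and access key are placeholders:

# post a "view" event: user u123 viewed context c42
curl -s -X POST "http://192.168.136.90:7070/events.json?accessKey=<accesskey>" \
     -H "Content-Type: application/json" \
     -d '{
           "event": "view",
           "entityType": "user",
           "entityId": "u123",
           "targetEntityType": "context",
           "targetEntityId": "c42",
           "eventTime": "2018-01-01T00:00:00.000Z"
         }'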
Current as of Apache PredictionIO 0.13.0