Merged
4 changes: 3 additions & 1 deletion packaging/src/docker/Dockerfile
@@ -69,7 +69,7 @@ RUN tar -xzv \
-f /opt/apache-tez-$TEZ_VERSION-bin.tar.gz \
-C /opt

FROM eclipse-temurin:21.0.3_9-jre-ubi9-minimal AS run
FROM eclipse-temurin:21-jdk-ubi9-minimal AS run

ARG UID=1000
ARG HADOOP_VERSION
@@ -101,6 +101,8 @@ COPY --chown=hive conf $HIVE_HOME/conf
RUN chmod +x /entrypoint.sh && \
mkdir -p $HIVE_HOME/data/warehouse && \
chown hive $HIVE_HOME/data/warehouse && \
mkdir -p $HIVE_HOME/scratch && \
chown hive $HIVE_HOME/scratch && \
mkdir -p /home/hive/.beeline && \
chown hive /home/hive/.beeline

38 changes: 33 additions & 5 deletions packaging/src/docker/README.md
@@ -56,15 +56,15 @@ docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --name metastore-standal

### Detailed Setup
---
#### Build image
#### Build images
Apache Hive relies on Hadoop, Tez and other projects to facilitate reading, writing, and managing large datasets.
The `build.sh` provides ways to build the image against specified version of the dependent, as well as build from source.
The `build.sh` script provides ways to build the main Hive image against specified versions of its dependencies, as well as to build it from source. An additional Dockerfile is provided for a specialized LLAP Daemon image.

##### Build from source
##### Build from source (main Hive image)
```shell
mvn clean package -pl packaging -DskipTests -Pdocker
```
##### Build with specified version
##### Build with specified versions (main Hive image)
There are several arguments for specifying the component versions:
```shell
-hadoop <hadoop version>
@@ -83,7 +83,7 @@ together with Hadoop 3.1.0 and Tez 0.10.1 to build the image,
```shell
./build.sh -hadoop 3.1.0 -tez 0.10.1
```
After building successfully, we can get a Docker image named `apache/hive` by default, the image is tagged by the provided Hive version.
After building successfully, you get a Docker image named `apache/hive` by default, tagged by the provided Hive version.

#### Run services

@@ -189,6 +189,34 @@ To stop/remove them all,
docker compose down
```

#### Starting an LLAP cluster with Docker Compose

The compose file `packaging/src/docker/docker-compose.yml` can start a cluster with LLAP daemons (discovered via Zookeeper) if the `llap` profile is activated.

Use the following workflow from `packaging/src/docker`:

```shell
docker compose down --rmi local # clean up previous containers and images

export POSTGRES_LOCAL_PATH=... # set the path to the postgres driver jar on the host machine
./build.sh -hive 4.2.0 -hadoop 3.4.1 -tez 0.10.5 # build image from the common Dockerfile
./start-hive.sh --llap
```

To view LLAP logs:

```shell
docker compose logs -f llapdaemon
```

To stop and remove the LLAP stack:

```shell
./stop-hive.sh --llap # to stop and delete containers
#OR
./stop-hive.sh --llap --cleanup # to remove volumes also
```
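As a hypothetical sketch of how a wrapper like `start-hive.sh` could translate the `--llap` flag into a Compose profile (the real script may differ; the function and variable names here are illustrative):

```shell
#!/bin/sh
# Build the docker compose command line, enabling the `llap` profile
# only when --llap is passed. Purely illustrative.
compose_cmd() {
  profile=""
  for arg in "$@"; do
    if [ "$arg" = "--llap" ]; then
      profile="--profile llap"
    fi
  done
  echo "docker compose $profile up -d"
}

compose_cmd --llap   # prints: docker compose --profile llap up -d
```

Compose only starts services guarded by `profiles:` when the matching profile is activated, so the Zookeeper and LLAP daemon services stay dormant in the default (non-LLAP) workflow.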

#### Usage

- HiveServer2 web
4 changes: 2 additions & 2 deletions packaging/src/docker/build.sh
@@ -23,7 +23,7 @@ TEZ_VERSION=
usage() {
cat <<EOF 1>&2
Usage: $0 [-h] [-hadoop <Hadoop version>] [-tez <Tez version>] [-hive <Hive version>] [-repo <Docker repo>]
Build the Hive Docker image
Build the Hive Docker image (reused for LLAP too)
-help Display help
-hadoop Build image with the specified Hadoop version
-tez Build image with the specified Tez version
@@ -129,6 +129,6 @@ docker build \
-t "$repo/hive:$HIVE_VERSION" \
--build-arg "HIVE_VERSION=$HIVE_VERSION" \
--build-arg "HADOOP_VERSION=$HADOOP_VERSION" \
--build-arg "TEZ_VERSION=$TEZ_VERSION" \
--build-arg "TEZ_VERSION=$TEZ_VERSION"

rm -r "${WORK_DIR}"
20 changes: 16 additions & 4 deletions packaging/src/docker/conf/hive-site.xml.template
@@ -28,10 +28,6 @@
<name>hive.tez.exec.print.summary</name>
<value>true</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/opt/hive/scratch_dir</value>
</property>
<property>
<name>hive.user.install.directory</name>
<value>/opt/hive/install_dir</value>
@@ -64,4 +60,20 @@
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.llap.execution.mode</name>
<value>all</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>${HIVE_SCRATCH_DIR}</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>${HIVE_SCRATCH_DIR}</value>
</property>
<property>
<name>hive.query.results.cache.directory</name>
<value>${HIVE_QUERY_RESULTS_CACHE_DIRECTORY}</value>
</property>
</configuration>
47 changes: 47 additions & 0 deletions packaging/src/docker/conf/llap-daemon-site.xml.template
@@ -0,0 +1,47 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>hive.zookeeper.quorum</name>
<value>${HIVE_ZOOKEEPER_QUORUM}</value>
</property>
<property>
<name>hive.llap.daemon.service.hosts</name>
<value>${HIVE_LLAP_DAEMON_SERVICE_HOSTS}</value>
</property>
<property>
<name>hive.llap.daemon.memory.per.instance.mb</name>
<value>${LLAP_MEMORY_MB}</value>
</property>
<property>
<name>hive.llap.daemon.num.executors</name>
<value>${LLAP_EXECUTORS}</value>
</property>
<property>
<name>hive.llap.daemon.web.port</name>
<value>${LLAP_WEB_PORT}</value>
</property>
<property>
<name>hive.llap.management.rpc.port</name>
<value>${LLAP_MANAGEMENT_RPC_PORT}</value>
</property>
<property>
<name>hive.llap.daemon.yarn.shuffle.port</name>
<value>${LLAP_SHUFFLE_PORT}</value>
</property>
</configuration>
79 changes: 77 additions & 2 deletions packaging/src/docker/docker-compose.yml
@@ -14,7 +14,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

version: '3.9'
name: hive

services:
postgres:
image: postgres
@@ -73,10 +74,23 @@ services:
- metastore
restart: unless-stopped
container_name: hiveserver2
hostname: hiveserver2
environment:
USER: hive
HADOOP_CLASSPATH: /opt/hadoop/share/hadoop/tools/lib/*
HIVE_SERVER2_THRIFT_PORT: 10000
SERVICE_OPTS: '-Xmx1G -Dhive.metastore.uris=thrift://metastore:9083'

# Directories shared between HiveServer2 and LLAP daemon
HIVE_SCRATCH_DIR: /opt/hive/scratch
HIVE_QUERY_RESULTS_CACHE_DIRECTORY: /opt/hive/scratch/_resultscache_

SERVICE_OPTS: >-
-Xmx1G
-Dhive.metastore.uris=thrift://metastore:9083

-Dhive.execution.mode=${HIVE_EXECUTION_MODE:-container}
-Dhive.zookeeper.quorum=${HIVE_ZOOKEEPER_QUORUM:-}
-Dhive.llap.daemon.service.hosts=${HIVE_LLAP_DAEMON_SERVICE_HOSTS:-}
IS_RESUME: 'true'
SERVICE_NAME: 'hiveserver2'

@@ -88,14 +102,75 @@ services:
- '10002:10002'
volumes:
- warehouse:/opt/hive/data/warehouse
- scratch:/opt/hive/scratch
# Mount local jars to a temporary staging area (Read-Only)
- ./jars:/tmp/ext-jars:ro
networks:
- hive

zookeeper:
profiles:
- llap
image: zookeeper:3.8.4
container_name: zookeeper
hostname: zookeeper
restart: unless-stopped
ports:
- '2181:2181'
networks:
- hive
volumes:
- zookeeper_data:/data
- zookeeper_datalog:/datalog
- zookeeper_logs:/logs

#TODO Tez AM container (in the meantime, the HS2 (with local Tez AM) + LLAP daemon setup works properly)
# 1. Define and use a Tez AM image from HIVE-29419 or TEZ-4682
# 2. Configure the Tez AM to use the Zookeeper LLAP registry to discover the LLAP daemon
# 3. Configure HiveServer2 to use the Tez AM Zookeeper registry to discover the Tez AM
# Prerequisites:
# - tez-api 1.0.0-SNAPSHOT jar injected into HiveServer2 until Tez 1.0.0 is released
# - make HIVE-29477 happen to let HiveServer2 use Tez external sessions
# 4. Define Hadoop components here to be used by all the containers (a working example can be found in TEZ-4682); currently a local volume

llapdaemon:
profiles:
- llap
image: apache/hive:${HIVE_VERSION}
depends_on:
- zookeeper
restart: unless-stopped
environment:
USER: hive
SERVICE_NAME: 'llap'

LLAP_MEMORY_MB: '1024'
LLAP_EXECUTORS: '1'

HIVE_SCRATCH_DIR: /opt/hive/scratch
HIVE_QUERY_RESULTS_CACHE_DIRECTORY: /opt/hive/scratch/_resultscache_

LOCAL_DIRS: /tmp/llap-local

LLAP_WEB_PORT: '15001'
LLAP_MANAGEMENT_RPC_PORT: '15004'
LLAP_SHUFFLE_PORT: '15551'
volumes:
- warehouse:/opt/hive/data/warehouse
- scratch:/opt/hive/scratch
networks:
- hive

volumes:
hive-db:
warehouse:
scratch:
zookeeper_data:
name: zookeeper_data
zookeeper_datalog:
name: zookeeper_datalog
zookeeper_logs:
name: zookeeper_logs

networks:
hive:
62 changes: 59 additions & 3 deletions packaging/src/docker/entrypoint.sh
@@ -37,8 +37,9 @@ fi
# =========================================================================
# REPLACE ${VARS} in the template
# =========================================================================
: "${HIVE_WAREHOUSE_PATH:=/opt/hive/data/warehouse}"
export HIVE_WAREHOUSE_PATH
export HIVE_WAREHOUSE_PATH="${HIVE_WAREHOUSE_PATH:-/opt/hive/data/warehouse}"
export HIVE_SCRATCH_DIR="${HIVE_SCRATCH_DIR:-/opt/hive/scratch}"
export HIVE_QUERY_RESULTS_CACHE_DIRECTORY="${HIVE_QUERY_RESULTS_CACHE_DIRECTORY:-/opt/hive/scratch/_resultscache_}"

envsubst < $HIVE_HOME/conf/core-site.xml.template > $HIVE_HOME/conf/core-site.xml
envsubst < $HIVE_HOME/conf/hive-site.xml.template > $HIVE_HOME/conf/hive-site.xml
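The `${VAR:-default}` expansions above assign a fallback only when the variable is unset or empty; a minimal standalone sketch (variable names illustrative):

```shell
# An unset variable falls back to the built-in default...
unset SCRATCH
SCRATCH="${SCRATCH:-/opt/hive/scratch}"
echo "$SCRATCH"   # prints: /opt/hive/scratch

# ...while a caller-provided value wins over the default.
SCRATCH=/mnt/fast-scratch
SCRATCH="${SCRATCH:-/opt/hive/scratch}"
echo "$SCRATCH"   # prints: /mnt/fast-scratch
```

This is why the compose file can override `HIVE_SCRATCH_DIR` per service while standalone `docker run` invocations still get working defaults.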
@@ -67,6 +68,59 @@ function initialize_hive
fi
}

function run_llap {
export HIVE_ZOOKEEPER_QUORUM="${HIVE_ZOOKEEPER_QUORUM:-zookeeper:2181}"
export HIVE_LLAP_DAEMON_SERVICE_HOSTS="${HIVE_LLAP_DAEMON_SERVICE_HOSTS:-@llap0}"
export LLAP_MEMORY_MB="${LLAP_MEMORY_MB:-1024}"
export LLAP_EXECUTORS="${LLAP_EXECUTORS:-1}"

envsubst < "$HIVE_HOME/conf/llap-daemon-site.xml.template" > "$HIVE_HOME/conf/llap-daemon-site.xml"

export LLAP_DAEMON_LOG_DIR="${LLAP_DAEMON_LOG_DIR:-/tmp/llapDaemonLogs}"
export LLAP_DAEMON_TMP_DIR="${LLAP_DAEMON_TMP_DIR:-/tmp/llapDaemonTmp}"
export LOCAL_DIRS="${LOCAL_DIRS:-/tmp/llap-local}"
mkdir -p "${LLAP_DAEMON_LOG_DIR}" "${LLAP_DAEMON_TMP_DIR}" "${LOCAL_DIRS}"

# runLlapDaemon.sh expects jars under ${LLAP_DAEMON_HOME}/lib.
# In this image, LLAP jars are under ${HIVE_HOME}/lib.
export LLAP_DAEMON_HOME="${LLAP_DAEMON_HOME:-$HIVE_HOME}"
export LLAP_DAEMON_CONF_DIR="${LLAP_DAEMON_CONF_DIR:-$HIVE_CONF_DIR}"
export LLAP_DAEMON_USER_CLASSPATH="${LLAP_DAEMON_USER_CLASSPATH:-$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*}"

JAVA_ADD_OPENS=(
"--add-opens=java.base/java.lang=ALL-UNNAMED"
"--add-opens=java.base/java.util=ALL-UNNAMED"
"--add-opens=java.base/java.io=ALL-UNNAMED"
"--add-opens=java.base/java.net=ALL-UNNAMED"
"--add-opens=java.base/java.nio=ALL-UNNAMED"
"--add-opens=java.base/java.util.concurrent=ALL-UNNAMED"
"--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED"
"--add-opens=java.base/java.util.regex=ALL-UNNAMED"
"--add-opens=java.base/java.lang.reflect=ALL-UNNAMED"
"--add-opens=java.sql/java.sql=ALL-UNNAMED"
"--add-opens=java.base/java.text=ALL-UNNAMED"
"-Dnet.bytebuddy.experimental=true"
)
for opt in "${JAVA_ADD_OPENS[@]}"; do
if [[ " ${LLAP_DAEMON_OPTS:-} " != *" ${opt} "* ]]; then
LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${opt}"
fi
done

if [[ -n "${LLAP_EXTRA_OPTS:-}" ]]; then
export LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${LLAP_EXTRA_OPTS}"
**Contributor:** We can remove `export` from L119 (`export LLAP_DAEMON_OPTS`) and move L116, which just does the export, after this 2nd `if` statement?

```shell
for opt in "${JAVA_ADD_OPENS[@]}"; do
    if [[ " ${LLAP_DAEMON_OPTS:-} " != *" ${opt} "* ]]; then
      LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${opt}"
    fi
  done

  if [[ -n "${LLAP_EXTRA_OPTS:-}" ]]; then
    LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${LLAP_EXTRA_OPTS}"
  fi

  export LLAP_DAEMON_OPTS
```

**Contributor Author:** ack

fi

export LLAP_DAEMON_OPTS

LLAP_RUN_SCRIPT="${HIVE_HOME}/scripts/llap/bin/runLlapDaemon.sh"
if [ ! -x "${LLAP_RUN_SCRIPT}" ]; then
echo "LLAP daemon launcher script not found at ${LLAP_RUN_SCRIPT}."
exit 1
fi
exec "${LLAP_RUN_SCRIPT}" run "$@"
}
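The append-if-absent loop used for `JAVA_ADD_OPENS` above can be exercised in isolation; this sketch uses a POSIX-portable `case` pattern match instead of bash's `[[`, with illustrative names:

```shell
OPTS=""
add_opt() {
  # Append $1 to OPTS only if it is not already present (space-delimited).
  case " ${OPTS} " in
    *" $1 "*) ;;                 # already present, skip
    *) OPTS="${OPTS} $1" ;;
  esac
}

add_opt "--add-opens=java.base/java.lang=ALL-UNNAMED"
add_opt "--add-opens=java.base/java.lang=ALL-UNNAMED"  # duplicate, skipped
add_opt "-Dnet.bytebuddy.experimental=true"
echo "${OPTS# }"
# prints: --add-opens=java.base/java.lang=ALL-UNNAMED -Dnet.bytebuddy.experimental=true
```

The guard keeps `LLAP_DAEMON_OPTS` idempotent across container restarts, where the entrypoint may be re-run with the options already set in the environment.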

export HIVE_CONF_DIR=$HIVE_HOME/conf
if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
@@ -76,7 +130,7 @@ if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
fi

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx1G $SERVICE_OPTS"
if [[ "${SKIP_SCHEMA_INIT}" == "false" ]]; then
if [[ "${SKIP_SCHEMA_INIT}" == "false" && ( "${SERVICE_NAME}" == "hiveserver2" || "${SERVICE_NAME}" == "metastore" ) ]]; then
# handles schema initialization
initialize_hive
fi
@@ -91,4 +145,6 @@ elif [ "${SERVICE_NAME}" == "metastore" ]; then
else
exec "$HIVE_HOME/bin/hive" --skiphadoopversion --skiphbasecp --service "$SERVICE_NAME"
fi
elif [ "${SERVICE_NAME}" == "llap" ]; then
run_llap "$@"
fi
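The `SERVICE_NAME` branching above amounts to a three-way dispatch; a minimal standalone sketch (messages and names illustrative, not the entrypoint's actual output):

```shell
# Route a service name to its launcher, mirroring the if/elif chain
# in entrypoint.sh.
dispatch() {
  case "$1" in
    hiveserver2|metastore) echo "launch $1 via hive --service" ;;
    llap)                  echo "launch llap via runLlapDaemon.sh" ;;
    *)                     echo "unknown service: $1"; return 1 ;;
  esac
}

dispatch llap          # prints: launch llap via runLlapDaemon.sh
dispatch hiveserver2   # prints: launch hiveserver2 via hive --service
```

Note that schema initialization is skipped for the `llap` branch, matching the tightened `SKIP_SCHEMA_INIT` condition above.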