
HIVE-29411: Provide docker image for LLAP #6390

Open
abstractdog wants to merge 6 commits into apache:master from abstractdog:HIVE-29411

@abstractdog (Contributor) commented Mar 25, 2026

What changes were proposed in this pull request?

Implemented an LLAP Docker image with all required files and configurations. Also added a Docker Compose setup with two daemons, to demonstrate multiple daemons working together (e.g., shuffle).

Why are the changes needed?

One step towards a fully-distributed Dockerized Hive setup.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Start the HS2 + HMS + 2 LLAP setup:

cd packaging/src/docker

# cleanup previous containers and images if any
docker-compose --profile llap -f docker-compose.yml down --rmi local

# set the path to the postgres driver jar on the host machine
export POSTGRES_LOCAL_PATH=~/.m2/repository/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar

# build all needed images 
./build.sh -hive 4.2.0 -hadoop 3.4.1 -tez 0.10.5

# start everything and follow logs
docker compose --profile llap -f docker-compose.yml up -d
docker compose --profile llap -f docker-compose.yml logs -f

Try it with beeline and see logs:

beeline -u 'jdbc:hive2://localhost:10000/' -n $USER
DROP table IF EXISTS iceberg_table; CREATE TABLE iceberg_table (id BIGINT) STORED BY iceberg; INSERT INTO iceberg_table VALUES(1);

Check LLAP daemons' web UI:
http://localhost:15001/
http://localhost:15002/

Proof that tasks went to different daemons:

llapdaemon2  | 2026-03-26T09:05:00,843  INFO [TezTR-852925_1_2_1_0_0 (1774515852925_0001_2_01_000000_0)] exec.SerializationUtilities: Deserializing ReduceWork using kryo

llapdaemon1  | 2026-03-26T09:05:00,637  INFO [TezTR-852925_1_2_0_0_0 (1774515852925_0001_2_00_000000_0)] exec.SerializationUtilities: Deserializing MapWork using kryo
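To check task placement across daemons without scanning the whole log, the compose output can be filtered for the SerializationUtilities lines shown above. This is a minimal sketch, not part of the PR: the helper name summarize_work_placement is hypothetical, and it assumes the "<service> | <log line>" prefix that docker compose logs prints, as in the two lines above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): report which LLAP daemon
# deserialized which kind of work, reading docker compose logs output on stdin.
summarize_work_placement() {
  # Keep only the lines where a task deserializes its MapWork/ReduceWork plan.
  grep -E 'Deserializing (MapWork|ReduceWork) using kryo' |
    awk -F'|' '{
      service = $1
      gsub(/[ \t]+/, "", service)               # trim padding around the service name
      match($0, /Deserializing [A-Za-z]+Work/)   # locate "Deserializing <Type>Work"
      # Drop the 14-character "Deserializing " prefix, keeping only the work type.
      print service, substr($0, RSTART + 14, RLENGTH - 14)
    }' |
    sort -u
}
```

Usage: docker compose --profile llap -f docker-compose.yml logs --no-color | summarize_work_placement. Fed the two log lines above, it prints "llapdaemon1 MapWork" and "llapdaemon2 ReduceWork".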

# limitations under the License.
services:
  zookeeper:
    image: zookeeper:3.9
Contributor

Better to use the same ZooKeeper version as in the pom, i.e. 3.8.4.

@Aggarwal-Raghav (Contributor)

It's working. I used the following URLs for faster downloads:

export HADOOP_URL=https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
export TEZ_URL=https://dlcdn.apache.org/tez/0.10.5/apache-tez-0.10.5-bin.tar.gz
export HIVE_URL=https://dlcdn.apache.org/hive/hive-4.2.0/apache-hive-4.2.0-bin.tar.gz
export POSTGRES_LOCAL_PATH=~/.m2/repository/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar
export HIVE_VERSION=4.2.0
[Screenshots: 2026-03-25 at 9:43:26 PM and 9:43:37 PM]

@Aggarwal-Raghav (Contributor)

During docker compose it gives warnings. I think default values can be provided for the HIVE_WAREHOUSE_PATH and DEFAULT_FS variables:

WARN[0000] The "S3_ENDPOINT_URL" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string.
WARN[0000] The "DEFAULT_FS" variable is not set. Defaulting to a blank string.
WARN[0000] The "HIVE_WAREHOUSE_PATH" variable is not set. Defaulting to a blank string.
WARN[0000] The "S3_ENDPOINT_URL" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_ACCESS_KEY_ID" variable is not set. Defaulting to a blank string.
WARN[0000] The "AWS_SECRET_ACCESS_KEY" variable is not set. Defaulting to a blank string.
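One way to act on the suggestion, sketched under assumptions: give the variables defaults before running docker compose, using the ${VAR:-default} form, which Compose's own variable interpolation also accepts inside docker-compose.yml (e.g. "${DEFAULT_FS:-...}"). The helper names and both default values (file:/// for DEFAULT_FS, /opt/hive/data/warehouse for HIVE_WAREHOUSE_PATH) are hypothetical placeholders, not values from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the PR).

apply_compose_defaults() {
  # ${VAR:-default} keeps a non-empty value and otherwise uses the default.
  # NOTE: both defaults below are illustrative assumptions, not PR values.
  export DEFAULT_FS="${DEFAULT_FS:-file:///}"
  export HIVE_WAREHOUSE_PATH="${HIVE_WAREHOUSE_PATH:-/opt/hive/data/warehouse}"
}

report_unset() {
  # Print every named variable that is unset or empty; return 1 if any were.
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "unset: ${name}"
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, run apply_compose_defaults and then report_unset S3_ENDPOINT_URL AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY DEFAULT_FS HIVE_WAREHOUSE_PATH before docker compose up, so the remaining blanks are listed explicitly instead of surfacing as Compose warnings.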

-p 15004:15004 \
-p 15551:15551 \
-p 15002:15002 \
-e HADOOP_CONF_DIR=/etc/hadoop \
@Aggarwal-Raghav (Contributor), Mar 25, 2026

Shouldn't the conf dir be one of these?

/opt/hadoop/etc/hadoop/
/opt/hive/conf/

Contributor Author

This whole section (starting an LLAP daemon on its own) doesn't make sense anymore; I'm removing it and letting users rely on docker-compose.

--exclude="*tests.jar" \
--exclude="*/webapps" \
-f /opt/hadoop-$HADOOP_VERSION.tar.gz \
-C /opt/ && \
Contributor

nit: maybe rm -rf the tarball. I know it won't reduce the Docker image size because of the multi-stage build.
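A minimal sketch of the nit, assuming the extraction runs as a single shell step like the one in the diff (in a Dockerfile, one RUN ending in && rm -f): extract and delete the tarball in the same command so the archive does not linger in the build stage. As noted, the final image is unaffected, since only the run stage is shipped. The function name extract_and_remove is hypothetical; only the two --exclude patterns come from the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract a .tar.gz into a directory, skipping test jars
# and webapps (patterns taken from the diff), then delete the archive.
extract_and_remove() {
  local tarball="$1" dest="$2"
  # Only remove the tarball if extraction succeeded.
  tar -xzf "${tarball}" \
      --exclude="*tests.jar" \
      --exclude="*/webapps" \
      -C "${dest}" &&
    rm -f "${tarball}"
}
```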

-f /opt/apache-tez-$TEZ_VERSION-bin.tar.gz \
-C /opt

FROM eclipse-temurin:21.0.3_9-jdk-ubi9-minimal AS run
Contributor

nit: maybe use eclipse-temurin:21-jdk-ubi9-minimal; it's a rolling tag pointing to the latest JDK 21.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
services:
Contributor

nit: name: llap-cluster can be added; currently the project shows up as "docker" in Docker Desktop.

Contributor Author

Finally, this has been moved into the original cluster, which now has the name "hive-cluster".


mkdir -p "${LLAP_DAEMON_LOG_DIR}" "${LLAP_DAEMON_TMP_DIR}" "${LOCAL_DIRS}"

cd "${HIVE_HOME}"
Contributor

Do we need this cd "${HIVE_HOME}", given that WORKDIR $HIVE_HOME is set in Dockerfile-llap?

@abstractdog (Contributor Author)

Thanks @Aggarwal-Raghav for your comments so far; addressed them in 3a31f13.

@deniskuzZ (Member) commented Mar 26, 2026

Instead of docker-compose-llap.yml, could we add llap as a param?

docker compose --profile llap up -d

services:
  llapdaemon1:
    image: apache/hive-llap:${HIVE_VERSION}
    container_name: llapdaemon1
    profiles:
      - llap

@abstractdog (Contributor Author)

sounds good, let me check

@abstractdog (Contributor Author)

@deniskuzZ: seems to work, see a115774.
A separate hiveserver2 was needed because of the config; I don't think it's a problem, it's a fair trade-off for supporting different setups.

@abstractdog (Contributor Author)

One more thing to check: let me see whether a single hive image works here with a different LLAP entrypoint, so that we won't need to maintain and deploy separate images like "hive" and "hive-llap".

@abstractdog (Contributor Author)

works like a charm with 0860209

@sonarqubecloud

export LLAP_DAEMON_OPTS

if [[ -n "${LLAP_EXTRA_OPTS:-}" ]]; then
export LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${LLAP_EXTRA_OPTS}"
Contributor

Can we remove the export from L119 (export LLAP_DAEMON_OPTS) and move L116, which just exports, after this second if statement?

  for opt in "${JAVA_ADD_OPENS[@]}"; do
    if [[ " ${LLAP_DAEMON_OPTS:-} " != *" ${opt} "* ]]; then
      LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${opt}"
    fi
  done

  if [[ -n "${LLAP_EXTRA_OPTS:-}" ]]; then
    LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${LLAP_EXTRA_OPTS}"
  fi

  export LLAP_DAEMON_OPTS
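The suggested ordering (deduplicate the --add-opens flags, append LLAP_EXTRA_OPTS, export once) can be exercised in isolation with a self-contained version of the snippet above. This is a sketch: the function name build_llap_daemon_opts is hypothetical, and the two --add-opens entries are illustrative stand-ins for the real JAVA_ADD_OPENS array, which is not shown in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the reviewed logic; the JAVA_ADD_OPENS entries
# are example values, not the list used by the actual entrypoint.
build_llap_daemon_opts() {
  local -a JAVA_ADD_OPENS=(
    "--add-opens=java.base/java.lang=ALL-UNNAMED"
    "--add-opens=java.base/java.util=ALL-UNNAMED"
  )
  local opt
  for opt in "${JAVA_ADD_OPENS[@]}"; do
    # Pad with spaces so only whole options match, not substrings of others.
    if [[ " ${LLAP_DAEMON_OPTS:-} " != *" ${opt} "* ]]; then
      LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${opt}"
    fi
  done

  if [[ -n "${LLAP_EXTRA_OPTS:-}" ]]; then
    LLAP_DAEMON_OPTS="${LLAP_DAEMON_OPTS:-} ${LLAP_EXTRA_OPTS}"
  fi

  # Single export after all modifications.
  export LLAP_DAEMON_OPTS
}
```

An option already present in LLAP_DAEMON_OPTS is not appended a second time, and LLAP_EXTRA_OPTS always lands last, so it can override earlier flags where the JVM honors the last occurrence.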

: "${LLAP_SERVICE_HOSTS:=@llap0}"
: "${LLAP_MEMORY_MB:=1024}"
: "${LLAP_EXECUTORS:=1}"
export HIVE_ZOOKEEPER_QUORUM
Contributor

nit: the exports can be combined, as done for L86-94:

export HIVE_ZOOKEEPER_QUORUM="${HIVE_ZOOKEEPER_QUORUM:-zookeeper:2181}"
export LLAP_SERVICE_HOSTS="${LLAP_SERVICE_HOSTS:-@llap0}"
export LLAP_MEMORY_MB="${LLAP_MEMORY_MB:-1024}"
export LLAP_EXECUTORS="${LLAP_EXECUTORS:-1}"
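Both idioms behave the same: := and :- each fall back to the default when the variable is unset or empty and keep a caller-supplied value otherwise, so combining them into a single export is purely stylistic. The sketch below compares them side by side; the function names are hypothetical and only two of the variables are shown.

```bash
#!/usr/bin/env bash
# Hypothetical comparison of the two defaulting idioms from the review.

# Current style: assign the default with :=, then export separately.
defaults_two_step() {
  : "${LLAP_MEMORY_MB:=1024}"
  : "${LLAP_EXECUTORS:=1}"
  export LLAP_MEMORY_MB LLAP_EXECUTORS
}

# Suggested style: default and export in one statement with :-.
defaults_one_step() {
  export LLAP_MEMORY_MB="${LLAP_MEMORY_MB:-1024}"
  export LLAP_EXECUTORS="${LLAP_EXECUTORS:-1}"
}
```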

@Aggarwal-Raghav (Contributor)

In the JIRA description, beeline -u 'jdbc:hive2://localhost:10000/' -n $USER should now point to port 10001 (localhost:10001), as that is the HS2 for LLAP.
[Screenshot: 2026-03-28 at 10:17:58 PM]

environment:
  USER: hive
  HADOOP_CLASSPATH: /opt/hadoop/share/hadoop/tools/lib/*
  HIVE_SERVER2_THRIFT_PORT: 10000
Contributor

Should HIVE_SERVER2_WEB_PORT: 10002 also be defined here, for consistency with the LLAP profile?
It's automatically picking up localhost:10002 and working, but it's better to define it explicitly.

@Aggarwal-Raghav (Contributor) commented Mar 28, 2026

@abstractdog, I'm seeing one error (metrics-related) when I restart the beeline session; attaching a screenshot and logs.
Steps:

docker compose --profile llap -f docker-compose.yml up -d
docker exec -it hiveserver2-llap bash
beeline -u 'jdbc:hive2://hiveserver2-llap:10001/'
DROP table IF EXISTS iceberg_table; CREATE TABLE iceberg_table (id BIGINT) STORED BY iceberg; INSERT INTO iceberg_table VALUES(1); 
✅
!q
beeline -u 'jdbc:hive2://hiveserver2-llap:10001/'
DROP table IF EXISTS iceberg_table; CREATE TABLE iceberg_table (id BIGINT) STORED BY iceberg; INSERT INTO iceberg_table VALUES(1);
❌
[Screenshot: 2026-03-28 at 10:38:29 PM]

logs.txt

@deniskuzZ (Member)

⠙ hiveserver2-llap Pulling
⠸ hiveserver2 Pulling

is that expected?

@Aggarwal-Raghav (Contributor) commented Mar 30, 2026

At what phase did you face that? I didn't. Please ensure you have:

export HIVE_VERSION=4.2.0
./build.sh -hive 4.2.0 -hadoop 3.4.1 -tez 0.10.5

@deniskuzZ (Member)

With docker compose, are we starting two HS2 instances?

526347bb9f27   apache/hive:4.2.0   "sh -c /entrypoint.sh"   40 seconds ago   Up 39 seconds                  0.0.0.0:10000->10000/tcp, 9083/tcp, 0.0.0.0:10002->10002/tcp                         hiveserver2
78b9cddef5cc   apache/hive:4.2.0   "sh -c /entrypoint.sh"   40 seconds ago   Up 39 seconds                  9083/tcp, 10000/tcp, 0.0.0.0:10001->10001/tcp, 10002/tcp, 0.0.0.0:10003->10003/tcp   hiveserver2-llap

i think we should move regular hs2 under tez profile

    networks:
      - hive

  hiveserver2-llap:
@deniskuzZ (Member), Mar 30, 2026

could we merge the definitions?

 hiveserver2:
  image: apache/hive:${HIVE_VERSION}
  container_name: hiveserver2
  hostname: hiveserver2
  depends_on:
    - metastore
  restart: unless-stopped

  environment:
    USER: hive
    HADOOP_CLASSPATH: /opt/hadoop/share/hadoop/tools/lib/*
    HIVE_SERVER2_THRIFT_PORT: 10000

    HIVE_SCRATCH_DIR: /opt/hive/scratch
    HIVE_QUERY_RESULTS_CACHE_DIRECTORY: /opt/hive/scratch/_resultscache_

    SERVICE_NAME: 'hiveserver2'
    IS_RESUME: 'true'

    # Base SERVICE_OPTS without LLAP
    SERVICE_OPTS: >-
      -Xmx1G
      -Dhive.metastore.uris=thrift://metastore:9083
      -Dhive.execution.engine=tez

    S3_ENDPOINT_URL: "${S3_ENDPOINT_URL}"
    AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
    AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"

  ports:
    - "10000:10000"
    - "10002:10002"

  volumes:
    - warehouse:/opt/hive/data/warehouse
    - scratch:/opt/hive/scratch
    - ./jars:/tmp/ext-jars:ro

  networks:
    - hive

  # Add profile-specific override for LLAP
  profiles: [default, llap]


hiveserver2-llap-env:
  profiles: [llap]
  environment:
    SERVICE_OPTS: >-
      -Xmx1G
      -Dhive.metastore.uris=thrift://metastore:9083
      -Dhive.execution.engine=tez
      -Dhive.execution.mode=llap
      -Dhive.llap.execution.mode=all
      -Dhive.zookeeper.quorum=zookeeper:2181
      -Dhive.llap.daemon.service.hosts=@llap0

Member

zookeeper:
  profiles: [llap]

llapdaemon1:
  profiles: [llap]

llapdaemon2:
  profiles: [llap]

@Aggarwal-Raghav (Contributor)

Maybe @abstractdog can give a proper reply, but my understanding is that before the llap profile was introduced, a regular HS2 was running, and with the llap profile a new hiveserver2-llap was introduced. If we move the regular HS2 under a tez profile, aren't we changing the existing behavior of how users run the docker command?
Also, I can't find any tez profile in docker-compose.yml; do you mean introducing a new one?


4 participants