This document assumes that you have an Ubuntu system installed and that your user has sudo permissions. Set the hostname and add it to the /etc/hosts file.
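For example, a minimal sketch assuming a hypothetical hostname hadoop-master and a host-only address of 192.168.56.10 (replace both with your own values):
sudo hostnamectl set-hostname hadoop-master
# Map the hostname so it resolves locally
echo "192.168.56.10 hadoop-master" | sudo tee -a /etc/hosts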
- Create a New Virtual Machine
- Create a New User
- Log in as the New User
- Install Java
- Set Environment Variables
- Set up SSH
- Enable Passwordless SSH
- Update the System
- Download and Install Hadoop
- Configure Hadoop
- Format Hadoop Namenode
- Start Hadoop Services
- Test the Hadoop Setup
Update the System
sudo apt update
sudo apt upgrade
Create a New Virtual Machine
Before starting, make the following changes:
Add two network adapters before starting the machine:
- 1st: NAT
- 2nd: Host-only
Then, start the machine.
Create a New User
sudo adduser hduser
Provide sudo Permissions to the User:
Open the sudoers file:
sudo visudo
Add the following line to the file:
hduser ALL=(ALL) NOPASSWD: ALL
Save and exit the file.
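To confirm the sudoers entry took effect, you can list the new user's sudo privileges (assuming the NOPASSWD line above was saved):
sudo -l -U hduser
# Should include: (ALL) NOPASSWD: ALL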
Log in as the New User
Log in as hduser using PuTTY or any other terminal client.
Install Java
Recent Hadoop releases support Java 8 or Java 11, but the 3.2.x release used in this guide targets Java 8, so we'll install Java 8:
sudo apt install openjdk-8-jdk
Verify the installation:
java -version
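If you want to double-check the path you will use for JAVA_HOME in the next step, the following should resolve into the OpenJDK 8 install directory (assuming the default Ubuntu package layout):
readlink -f "$(which java)"
# Typically prints /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java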
Set Environment Variables
Edit the .bashrc file:
nano ~/.bashrc
Add the following lines:
# Java Environment
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_MAPRED_HOME=$HADOOP_HOME
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PDSH_RCMD_TYPE=ssh
Reload .bashrc:
source ~/.bashrc
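A quick sanity check that the variables are in effect for the current shell:
echo $JAVA_HOME
echo $HADOOP_HOME
# PATH should now include the Hadoop bin/ and sbin/ directories
echo $PATH | tr ':' '\n' | grep hadoop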
Set up SSH
Install SSH and PDSH:
sudo apt install ssh -y
sudo apt install pdsh -y
Start and enable the SSH service:
sudo systemctl start ssh
sudo systemctl enable ssh
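To confirm the service is running (exact output wording varies by Ubuntu release):
sudo systemctl is-active ssh
# Should print "active"; for details use: sudo systemctl status ssh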
Enable Passwordless SSH
Generate SSH keys:
ssh-keygen -t rsa
Press Enter on all prompts. This will create a .ssh folder and keys.
Copy the public key:
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@localhost
Verify SSH access without a password:
ssh hduser@localhost
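If you are still prompted for a password, the usual cause is overly open permissions on the .ssh directory; this optional fix tightens them:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Remember to exit the test SSH session before continuing.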
Download and Install Hadoop
Download and extract Hadoop:
wget -c -O hadoop.tar.gz https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
tar xvzf hadoop.tar.gz
Move Hadoop to /usr/local/hadoop:
sudo mkdir /usr/local/hadoop
sudo mv hadoop-3.2.4/* /usr/local/hadoop/
sudo chown -R hduser:hduser /usr/local/hadoop
sudo chmod -R 755 /usr/local/hadoop
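With the PATH entries from .bashrc already loaded, Hadoop should now be callable from anywhere; a quick check (the version string assumes the 3.2.4 tarball downloaded above):
hadoop version
# Should report Hadoop 3.2.4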
Configure Hadoop
Edit the following configuration files, all of which live in $HADOOP_HOME/etc/hadoop.
hadoop-env.sh:
cd $HADOOP_HOME/etc/hadoop
nano hadoop-env.sh
Add/modify the following:
#JAVA
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
#Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HDFS_NAMENODE_USER=hduser
export HDFS_DATANODE_USER=hduser
export HDFS_SECONDARYNAMENODE_USER=hduser
export YARN_RESOURCEMANAGER_USER=hduser
export YARN_NODEMANAGER_USER=hduser
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
core-site.xml:
nano core-site.xml
Add:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hd_store/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
yarn-site.xml:
nano yarn-site.xml
Add:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
hdfs-site.xml:
nano hdfs-site.xml
Add:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hd_store/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hd_store/datanode</value>
</property>
</configuration>
mapred-site.xml:
nano mapred-site.xml
Add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
workers:
Add localhost if not present:
nano workers
localhost
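The configuration above points hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir at directories under /usr/local/hadoop/hd_store. The tarball does not create them, so it is safest to create them as hduser before formatting the namenode (HDFS can usually create them on its own, but doing it explicitly avoids permission surprises):
mkdir -p /usr/local/hadoop/hd_store/tmp
mkdir -p /usr/local/hadoop/hd_store/namenode
mkdir -p /usr/local/hadoop/hd_store/datanode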
Format Hadoop Namenode
cd $HADOOP_HOME/sbin
hdfs namenode -format
Start Hadoop Services
Start HDFS:
start-dfs.sh
Verify the running Java processes:
jps
Start YARN:
start-yarn.sh
Verify YARN is running:
jps
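You can also check the web UIs; with an unmodified Hadoop 3.x configuration the defaults are the NameNode UI on port 9870 and the ResourceManager UI on port 8088 (browse to them using the VM's host-only IP):
# Quick reachability check from inside the VM; expect HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088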
Test the Hadoop Setup
Create an input directory in HDFS:
hdfs dfs -mkdir /input
Create a test file:
nano ~/words.txt
Copy the file to HDFS:
hdfs dfs -copyFromLocal ~/words.txt /input/
List the contents of HDFS and of the input directory:
hdfs dfs -ls /
hdfs dfs -ls /input
Verify running processes:
jps
You should see processes such as NameNode, DataNode, ResourceManager, and NodeManager running.
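As a final end-to-end check, you can run the bundled word-count example against the file you uploaded; the jar path assumes the 3.2.4 tarball, and /output must not already exist:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000
# Prints the word counts computed from words.txt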
This setup allows you to run a Hadoop cluster on a single machine for testing or development.
👨‍💻 Crafted by: Suraj Kumar Choudhary | 📩 Feel free to DM for any help: csuraj982@gmail.com