This document assumes that you have an Ubuntu system installed and that your user has sudo permissions. Set the hostname and add it to the /etc/hosts file.
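For example, a minimal sketch assuming a hypothetical hostname hadoop-master and a host-only address of 192.168.56.10 (replace both with your own values):
sudo hostnamectl set-hostname hadoop-master
# Map the hostname so it resolves locally
echo "192.168.56.10 hadoop-master" | sudo tee -a /etc/hosts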
- Create a New Virtual Machine
- Create a New User
- Log in as the New User
- Install Java
- Set Environment Variables
- Set up SSH
- Enable Passwordless SSH
- Update the System
- Download and Install Hadoop
- Configure Hadoop
- Format Hadoop Namenode
- Start Hadoop Services
- Test the Hadoop Setup
Update the System
sudo apt update
sudo apt upgrade
Create a New Virtual Machine
Before starting, make the following changes:
Add two network adapters before starting the machine:
- 1st: NAT
- 2nd: Host-only
Then, start the machine.
Create a New User
sudo adduser hduser
Provide sudo Permissions to the User:
Open the sudoers file:
sudo visudo
Add the following line to the file:
hduser ALL=(ALL) NOPASSWD: ALL
Save and exit the file.
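To confirm the sudoers entry took effect, you can list the new user's sudo privileges (assuming the NOPASSWD line above was saved):
sudo -l -U hduser
# Should include: (ALL) NOPASSWD: ALL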
Log in as the New User
Log in as hduser using PuTTY or any other terminal client.
Install Java
Recent Hadoop releases support Java 8 or Java 11, but the 3.2.x release used in this guide targets Java 8, so we'll install Java 8:
sudo apt install openjdk-8-jdk
Verify the installation:
java -version
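If you want to double-check the path you will use for JAVA_HOME in the next step, the following should resolve into the OpenJDK 8 install directory (assuming the default Ubuntu package layout):
readlink -f "$(which java)"
# Typically prints /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java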
Set Environment Variables
Edit the .bashrc file:
nano ~/.bashrc
Add the following lines:
# Java Environment
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_MAPRED_HOME=$HADOOP_HOME
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PDSH_RCMD_TYPE=ssh
Reload .bashrc:
source ~/.bashrc
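A quick sanity check that the variables are in effect for the current shell:
echo $JAVA_HOME
echo $HADOOP_HOME
# PATH should now include the Hadoop bin/ and sbin/ directories
echo $PATH | tr ':' '\n' | grep hadoop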
Set up SSH
Install SSH and PDSH:
sudo apt install ssh -y
sudo apt install pdsh -y
Start and enable the SSH service:
sudo systemctl start ssh
sudo systemctl enable ssh
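To confirm the service is running (exact output wording varies by Ubuntu release):
sudo systemctl is-active ssh
# Should print "active"; for details use: sudo systemctl status ssh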
Enable Passwordless SSH
Generate SSH keys:
ssh-keygen -t rsa
Press Enter on all prompts. This will create a .ssh folder and keys.
Copy the public key:
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@localhost
Verify SSH access without a password:
ssh hduser@localhost
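If you are still prompted for a password, the usual cause is overly open permissions on the .ssh directory; this optional fix tightens them:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Remember to exit the test SSH session before continuing.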
Download and Install Hadoop
Download and extract Hadoop:
wget -c -O hadoop.tar.gz https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
tar xvzf hadoop.tar.gz
Move Hadoop to /usr/local/hadoop:
sudo mkdir /usr/local/hadoop
sudo mv hadoop-3.2.4/* /usr/local/hadoop/
sudo chown -R hduser:hduser /usr/local/hadoop
sudo chmod -R 755 /usr/local/hadoop
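With the PATH entries from .bashrc already loaded, Hadoop should now be callable from anywhere; a quick check (the version string assumes the 3.2.4 tarball downloaded above):
hadoop version
# Should report Hadoop 3.2.4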
Configure Hadoop
Edit the following configuration files, all of which live in $HADOOP_HOME/etc/hadoop.
hadoop-env.sh:
cd $HADOOP_HOME/etc/hadoop
nano hadoop-env.sh
Add/modify the following:
#JAVA
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
#Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HDFS_NAMENODE_USER=hduser
export HDFS_DATANODE_USER=hduser
export HDFS_SECONDARYNAMENODE_USER=hduser
export YARN_RESOURCEMANAGER_USER=hduser
export YARN_NODEMANAGER_USER=hduser
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
core-site.xml:
nano core-site.xml
Add:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hd_store/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
yarn-site.xml:
nano yarn-site.xml
Add:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
hdfs-site.xml:
nano hdfs-site.xml
Add:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hd_store/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hd_store/datanode</value>
</property>
</configuration>
mapred-site.xml:
nano mapred-site.xml
Add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
workers:
Add localhost if not present:
nano workers
localhost
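The configuration above points hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir at directories under /usr/local/hadoop/hd_store. The tarball does not create them, so it is safest to create them as hduser before formatting the namenode (HDFS can usually create them on its own, but doing it explicitly avoids permission surprises):
mkdir -p /usr/local/hadoop/hd_store/tmp
mkdir -p /usr/local/hadoop/hd_store/namenode
mkdir -p /usr/local/hadoop/hd_store/datanode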
Format Hadoop Namenode
cd $HADOOP_HOME/sbin
hdfs namenode -format
Start Hadoop Services
Start HDFS:
start-dfs.sh
Verify the running Java processes:
jps
Start YARN:
start-yarn.sh
Verify YARN is running:
jps
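You can also check the web UIs; with an unmodified Hadoop 3.x configuration the defaults are the NameNode UI on port 9870 and the ResourceManager UI on port 8088 (browse to them using the VM's host-only IP):
# Quick reachability check from inside the VM; expect HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088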
Test the Hadoop Setup
Create an input directory in HDFS:
hdfs dfs -mkdir /input
Create a test file:
nano ~/words.txt
Copy the file to HDFS:
hdfs dfs -copyFromLocal ~/words.txt /input/
List the contents of HDFS and of the input directory:
hdfs dfs -ls /
hdfs dfs -ls /input
Verify running processes:
jps
You should see processes such as NameNode, DataNode, ResourceManager, and NodeManager running.
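As a final end-to-end check, you can run the bundled word-count example against the file you uploaded; the jar path assumes the 3.2.4 tarball, and /output must not already exist:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000
# Prints the word counts computed from words.txt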
This setup allows you to run a Hadoop cluster on a single machine for testing or development.
👨‍💻 Crafted by: Suraj Kumar Choudhary | 📩 Feel free to DM for any help: csuraj982@gmail.com