Installing a Single-Node Hadoop Cluster

This document assumes that you have an Ubuntu system installed and that your user has sudo permissions. Set the hostname and add it to the /etc/hosts file, as in the example below.
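
For example, assuming a hostname of hadoop-node and a host-only IP of 192.168.56.10 (both placeholders; substitute your own values), the hostname and hosts entry can be set like this:

    # placeholder hostname and IP -- adjust to your environment
    sudo hostnamectl set-hostname hadoop-node
    echo "192.168.56.10 hadoop-node" | sudo tee -a /etc/hosts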

Steps to Install Hadoop

  1. Update the System
  2. Create a New Virtual Machine
  3. Create a New User
  4. Log in as the New User
  5. Install Java
  6. Set Environment Variables
  7. Set up SSH
  8. Enable Passwordless SSH
  9. Download and Install Hadoop
  10. Configure Hadoop
  11. Format the Hadoop NameNode
  12. Start Hadoop Services
  13. Test the Hadoop Setup

Let's Dive into Detail

  1. Update the System

    sudo apt update
    sudo apt upgrade
  2. Create a New Virtual Machine
    Before starting the machine, add two network adapters:

    • 1st: NAT
    • 2nd: Host-only

    Then start the machine.
  3. Create a New User

    sudo adduser hduser

    Provide sudo Permissions to the User:
    Open the sudoers file:

    sudo visudo

    Add the following line to the file:

    hduser ALL=(ALL) NOPASSWD: ALL

    Save and exit the file.

  4. Log in as the New User
    Log in as hduser using PuTTY or any other terminal client.
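
    If you are already at the machine's console, you can also switch to the new user directly; the host-only IP below is a placeholder:

    su - hduser
    # or, over the network (placeholder host-only adapter IP):
    ssh hduser@192.168.56.10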

  5. Install Java
    Hadoop 3.2.x (the release installed below) supports Java 8, so we'll install Java 8:

    sudo apt install openjdk-8-jdk

    Verify the installation:

    java -version
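
    To confirm where the JDK was installed (this path is used for JAVA_HOME in the next step), resolve the javac binary; on Ubuntu amd64 it typically lands under /usr/lib/jvm/java-8-openjdk-amd64:

    readlink -f "$(which javac)"
    # expected: /usr/lib/jvm/java-8-openjdk-amd64/bin/javac (on amd64)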
  6. Set Environment Variables
    Edit the .bashrc file:

    nano ~/.bashrc

    Add the following lines:

    # Java Environment
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
    export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
    
    # Hadoop Environment Variables
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_LOG_DIR=$HADOOP_HOME/logs
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export PDSH_RCMD_TYPE=ssh

    Reload .bashrc:

    source ~/.bashrc
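
    A quick sanity check that the variables are now set (the paths should match what was exported above):

    echo "$JAVA_HOME"    # /usr/lib/jvm/java-8-openjdk-amd64/
    echo "$HADOOP_HOME"  # /usr/local/hadoop
    echo "$PATH" | tr ':' '\n' | grep hadoop   # should list the Hadoop bin/ and sbin/ directories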
  7. Set up SSH
    Install SSH and PDSH:

    sudo apt install ssh -y
    sudo apt install pdsh -y

    Start and enable the SSH service:

    sudo systemctl start ssh
    sudo systemctl enable ssh
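
    Before moving on, it's worth confirming the service is active (output formatting varies by Ubuntu release):

    sudo systemctl status ssh --no-pager   # look for "active (running)"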
  8. Enable Passwordless SSH
    Generate SSH keys:

    ssh-keygen -t rsa

    Press Enter on all prompts. This will create a .ssh folder and keys.
    Copy the public key:

    ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@localhost

    Verify SSH access without a password:

    ssh hduser@localhost
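
    If the login still prompts for a password, the usual culprit is loose permissions on the .ssh directory; tightening them is a safe fix:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
    ssh hduser@localhost   # should now log in without a password prompt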
  9. Download and Install Hadoop
    Download Hadoop:

    wget -c -O hadoop.tar.gz https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
    tar xvzf hadoop.tar.gz

    Move Hadoop to /usr/local/hadoop:

    sudo mkdir /usr/local/hadoop
    sudo mv hadoop-3.2.4/* /usr/local/hadoop/
    sudo chown -R hduser:hduser /usr/local/hadoop
    sudo chmod 755 -R /usr/local/hadoop
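
    With the PATH entries from step 6 in place, the unpacked distribution can be sanity-checked:

    hadoop version   # should report Hadoop 3.2.4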
  10. Configure Hadoop
    Edit the following configuration files:

    • hadoop-env.sh:
      cd $HADOOP_HOME/etc/hadoop
      nano hadoop-env.sh
      Add/modify the following:
      # Java Environment
      export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
      export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

      # Hadoop Environment Variables
      export HADOOP_HOME=/usr/local/hadoop
      export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
      export HADOOP_LOG_DIR=$HADOOP_HOME/logs
      export HDFS_NAMENODE_USER=hduser
      export HDFS_DATANODE_USER=hduser
      export HDFS_SECONDARYNAMENODE_USER=hduser
      export YARN_RESOURCEMANAGER_USER=hduser
      export YARN_NODEMANAGER_USER=hduser

      # Add Hadoop bin/ directory to PATH
      export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
      export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
  • core-site.xml:
     nano core-site.xml

Add:

     <configuration>
       <property>
         <name>hadoop.tmp.dir</name>
         <value>/usr/local/hadoop/hd_store/tmp</value>
       </property>
       <property>
         <name>fs.defaultFS</name>
         <value>hdfs://localhost:9000</value>
       </property>
     </configuration>
  • yarn-site.xml:

       nano yarn-site.xml

    Add:

       <configuration>
         <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
         </property>
       </configuration>
  • hdfs-site.xml:

     nano hdfs-site.xml

Add:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hd_store/namenode</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hd_store/datanode</value>
      </property>
    </configuration>
  • mapred-site.xml:
     nano mapred-site.xml

Add:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
      </property>
      <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
      </property>
      <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
      </property>
    </configuration>
  • workers:
    Add localhost if not present:
        nano workers
        localhost
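
    The core-site.xml and hdfs-site.xml entries above point at directories under /usr/local/hadoop/hd_store. Hadoop will usually create them when formatting and starting up, but creating them up front as hduser (a minimal sketch) avoids permission surprises:

    mkdir -p /usr/local/hadoop/hd_store/tmp
    mkdir -p /usr/local/hadoop/hd_store/namenode
    mkdir -p /usr/local/hadoop/hd_store/datanode
    sudo chown -R hduser:hduser /usr/local/hadoop/hd_store   # keep ownership consistent with the install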
  11. Format the Hadoop NameNode

    cd $HADOOP_HOME/sbin
    hdfs namenode -format
  12. Start Hadoop Services
    Start HDFS:

    start-dfs.sh

    Verify the running Java processes:

    jps

    Start YARN:

    start-yarn.sh

    Verify YARN is running:

    jps
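
    On a healthy single-node setup the jps listing typically looks something like the following (PIDs are illustrative); by default in Hadoop 3.x the NameNode web UI is served on port 9870 and the ResourceManager UI on port 8088:

    2401 NameNode
    2563 DataNode
    2788 SecondaryNameNode
    3012 ResourceManager
    3190 NodeManager
    3350 Jps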
  13. Test the Hadoop Setup
    Create an input directory in HDFS:

    hdfs dfs -mkdir /input

    Create a test file:

    nano ~/words.txt

    Copy the file to HDFS:

    hdfs dfs -copyFromLocal words.txt /input/

    List the contents of the HDFS input directory:

    hdfs dfs -ls /input

    List HDFS directories:

    hdfs dfs -ls /
    hdfs dfs -ls /input
  • Verify running processes:

    jps

    You should see processes such as NameNode, DataNode, ResourceManager, and NodeManager running.
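
  • As an end-to-end check, run the bundled WordCount example against the /input directory created above (the jar path assumes the stock 3.2.4 tarball layout; /output must not already exist):

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar wordcount /input /output
    hdfs dfs -cat /output/part-r-00000   # word counts from words.txt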

This setup allows you to run a Hadoop cluster on a single machine for testing or development.
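
When you are done, the daemons can be stopped with the matching scripts (already on the PATH via the sbin/ entry added earlier):

    stop-yarn.sh
    stop-dfs.sh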






👨‍💻 𝓒𝓻𝓪𝓯𝓽𝓮𝓭 𝓫𝔂: Suraj Kumar Choudhary | 📩 𝓕𝓮𝓮𝓵 𝓯𝓻𝓮𝓮 𝓽𝓸 𝓓𝓜 𝓯𝓸𝓻 𝓪𝓷𝔂 𝓱𝓮𝓵𝓹: csuraj982@gmail.com

