Installing Hadoop 3.2.1 Single Node Cluster on Windows 10

Apr 18, 2020 · 7 min read

While working on a project two years ago, I wrote a step-by-step guide to install Hadoop 3.1.0 on the Ubuntu 16.04 operating system. Since we are currently working on a new project where we need to install a Hadoop cluster on Windows 10, I decided to write a guide for this process.

This article is part of a series we are publishing on TowardsDataScience.com that aims to illustrate how to install Big Data technologies on the Windows operating system.

Other published articles in this series:

    Installing Apache Pig 0.17.0 on Windows 10

1. Prerequisites

First, we need to make sure that the following prerequisites are installed:

1. Java 8 runtime environment (JRE): Hadoop 3 requires a Java 8 installation. I prefer using the offline installer.

2. Java 8 development kit (JDK)

3. To unzip the downloaded Hadoop binaries, we should install 7zip.

4. I will create a folder “E:\hadoop-env” on my local machine to store the downloaded files (a quick PowerShell sketch for this follows the list).
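
A minimal sanity check from PowerShell, assuming Java is already installed and on the PATH (the folder name is just the one used throughout this guide):

# Verify the Java 8 runtime is available (should report a 1.8.x version)
java -version

# Create the working folder used throughout this guide
New-Item -ItemType Directory -Path "E:\hadoop-env" -Force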

2. Download Hadoop binaries

The first step is to download Hadoop binaries from the official website. The binary package size is about 342 MB.
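
If you prefer the command line, the package can also be fetched with PowerShell. This is only a sketch: the Apache archive URL below is an assumption about the mirror layout, so double-check it against the link shown on the official download page.

# Download the Hadoop 3.2.1 binary package (URL assumed; the official
# download page may point to a closer mirror)
Invoke-WebRequest `
  -Uri "https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz" `
  -OutFile "E:\hadoop-env\hadoop-3.2.1.tar.gz"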


After the file download finishes, we should unpack the package using 7zip in two steps. First, we should extract the hadoop-3.2.1.tar.gz archive, and then we should unpack the extracted tar file.
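
Equivalently, if the 7zip command-line tool is available, both extraction steps can be scripted. This sketch assumes 7z.exe sits in its default installation folder:

# Step 1: decompress the .gz wrapper to obtain the .tar file
& "C:\Program Files\7-Zip\7z.exe" x "E:\hadoop-env\hadoop-3.2.1.tar.gz" -o"E:\hadoop-env"

# Step 2: unpack the tar file itself
& "C:\Program Files\7-Zip\7z.exe" x "E:\hadoop-env\hadoop-3.2.1.tar" -o"E:\hadoop-env"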


The tar file extraction may take a few minutes to finish. At the end, you may see some warnings about symbolic link creation; just ignore these warnings, as they are not relevant on Windows.


After unpacking the package, we should add the Hadoop native IO libraries, which can be found in the following GitHub repository: https://github.com/cdarlint/winutils.

Since we are installing Hadoop 3.2.1, we should download the files located in https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and copy them into the “hadoop-3.2.1\bin” directory.
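
Assuming the winutils repository was downloaded or cloned to a local folder (the source path below is only an example), the copy can be done from PowerShell:

# Copy winutils.exe, hadoop.dll and the other native binaries into Hadoop's bin folder
# (adjust the source path to wherever you placed the downloaded files)
Copy-Item -Path "E:\hadoop-env\winutils\hadoop-3.2.1\bin\*" -Destination "E:\hadoop-env\hadoop-3.2.1\bin" -Force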

3. Setting up environment variables

After installing Hadoop and its prerequisites, we should configure the environment variables to define Hadoop and Java default paths.

To edit environment variables, go to Control Panel > System and Security > System (or right-click > Properties on the My Computer icon) and click on the “Advanced system settings” link.


When the “Advanced system settings” dialog appears, go to the “Advanced” tab and click on the “Environment variables” button located at the bottom of the dialog.


In the “Environment Variables” dialog, press the “New” button to add a new variable.

Note: In this guide, we will add user variables since we are configuring Hadoop for a single user. If you are looking to configure Hadoop for multiple users, you can define System variables instead.

There are two variables to define:

1. JAVA_HOME: JDK installation folder path

2. HADOOP_HOME: Hadoop installation folder path


Now, we should edit the PATH variable to add the Java and Hadoop binaries paths (“%JAVA_HOME%\bin” and “%HADOOP_HOME%\bin”).
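
If you prefer the command line over the dialogs, a rough PowerShell equivalent that writes the same user-level variables is sketched below; the JDK folder is only an example and should match your actual installation:

# Define JAVA_HOME and HADOOP_HOME as user-level variables
# (the JDK folder is an example; see section 3.1 below if your JDK path contains spaces)
[Environment]::SetEnvironmentVariable("JAVA_HOME", "C:\Program Files\Java\jdk1.8.0_241", "User")
[Environment]::SetEnvironmentVariable("HADOOP_HOME", "E:\hadoop-env\hadoop-3.2.1", "User")

# Append both bin folders to the user-level PATH
$userPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$userPath;C:\Program Files\Java\jdk1.8.0_241\bin;E:\hadoop-env\hadoop-3.2.1\bin", "User")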


3.1. JAVA_HOME is incorrectly set error

Now, let’s open PowerShell and try to run the following command:

hadoop -version

In this example, since the JAVA_HOME path contains spaces, I received the following error:

JAVA_HOME is incorrectly set


To solve this issue, we should use the Windows 8.3 short path instead. For example:

    Use “Progra~1” instead of “Program Files”

After replacing “Program Files” with “Progra~1”, we closed and reopened PowerShell and tried the same command. This time it runs without errors.
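
If you are unsure what the 8.3 short name of a folder is, a classic command prompt can list it; the /X switch of dir prints short names next to the long ones:

# List short (8.3) names of the folders on C:\ — "Program Files" usually appears as PROGRA~1
cmd /c "dir /x C:\"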


4. Configuring Hadoop cluster

There are four files we should alter to configure the Hadoop cluster:

    %HADOOP_HOME%\etc\hadoop\hdfs-site.xml

    %HADOOP_HOME%\etc\hadoop\core-site.xml

    %HADOOP_HOME%\etc\hadoop\mapred-site.xml

    %HADOOP_HOME%\etc\hadoop\yarn-site.xml

4.1. HDFS site configuration

As we know, Hadoop is built using a master-slave paradigm. Before altering the HDFS configuration file, we should create a directory to store all master node (name node) data and another one to store the data node data. In this example, we created the following directories (the equivalent PowerShell commands are shown after the list):

    E:\hadoop-env\hadoop-3.2.1\data\dfs\namenode

    E:\hadoop-env\hadoop-3.2.1\data\dfs\datanode
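
Both folders can also be created from PowerShell; this is simply the scripted equivalent of creating them in Explorer:

# Create the name node and data node storage folders (parent folders are created as needed)
New-Item -ItemType Directory -Force -Path "E:\hadoop-env\hadoop-3.2.1\data\dfs\namenode"
New-Item -ItemType Directory -Force -Path "E:\hadoop-env\hadoop-3.2.1\data\dfs\datanode"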

Now, let’s open the “hdfs-site.xml” file located in the “%HADOOP_HOME%\etc\hadoop” directory, and add the following properties within the <configuration></configuration> element:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///E:/hadoop-env/hadoop-3.2.1/data/dfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///E:/hadoop-env/hadoop-3.2.1/data/dfs/datanode</value>
</property>

Note that we’ve set the replication factor to 1 since we’re creating a single node cluster.

4.2. Core site configuration

Now, we should configure the name node URL by adding the following XML code into the <configuration></configuration> element within “core-site.xml”:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9820</value>
</property>

4.3. MapReduce site configuration

Now, we should add the following XML code into the <configuration></configuration> element within “mapred-site.xml”:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>MapReduce framework name</description>
</property>

4.4. Yarn site configuration

Now, we should add the following XML code into the <configuration></configuration> element within “yarn-site.xml”:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>Yarn Node Manager Aux Service</description>
</property>

5. Formatting Name node

After finishing the configuration, let’s try to format the name node using the following command:

hdfs namenode -format

Due to a bug in the Hadoop 3.2.1 release, you will receive the following error:

2020-04-17 22:04:01,503 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsupportedOperationException
    at java.nio.file.Files.setPosixFilePermissions(Files.java:2044)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1649)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
2020-04-17 22:04:01,511 INFO util.ExitUtil: Exiting with status 1: java.lang.UnsupportedOperationException
2020-04-17 22:04:01,518 INFO namenode.NameNode: SHUTDOWN_MSG:

This issue will be solved in the next release. For now, you can fix it temporarily using the following steps (reference):

    Download the patched hadoop-hdfs-3.2.1.jar file from the following link and use it to replace the existing hadoop-hdfs-3.2.1.jar located in the “%HADOOP_HOME%\share\hadoop\hdfs” folder, keeping a backup of the original file (a scripted version of this swap is sketched below).
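
Assuming the patched jar was saved into the working folder used earlier (the source path is only an example), the swap can be scripted as follows:

# Back up the buggy jar shipped with Hadoop 3.2.1
Rename-Item -Path "E:\hadoop-env\hadoop-3.2.1\share\hadoop\hdfs\hadoop-hdfs-3.2.1.jar" -NewName "hadoop-hdfs-3.2.1.bak"

# Copy the patched jar into its place (adjust the source path to your download location)
Copy-Item -Path "E:\hadoop-env\hadoop-hdfs-3.2.1.jar" -Destination "E:\hadoop-env\hadoop-3.2.1\share\hadoop\hdfs\hadoop-hdfs-3.2.1.jar"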

Now, if we try to re-execute the format command (run the command prompt or PowerShell as administrator), you will need to approve the file system format.


And the command is executed successfully.


6. Starting Hadoop services

Now, we will open PowerShell and navigate to the “%HADOOP_HOME%\sbin” directory. Then we will run the following command to start the Hadoop nodes:

.\start-dfs.cmd


Two command prompt windows will open (one for the name node and one for the data node).


Next, we should start the Hadoop Yarn service using the following command:

.\start-yarn.cmd


Two command prompt windows will open (one for the resource manager and one for the node manager).

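
To verify that everything is up, you can list the running Java processes with the JDK's jps tool; on a single node cluster you should see the NameNode, DataNode, ResourceManager and NodeManager processes (the web UIs, normally at http://localhost:9870 for HDFS and http://localhost:8088 for Yarn in Hadoop 3.x, are another quick check):

# List the running Hadoop daemons (jps ships with the JDK)
jps

# Typical output (process ids will differ):
# 12345 NameNode
# 12346 DataNode
# 12347 ResourceManager
# 12348 NodeManager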
