Hadi Fadlallah
Apr 18, 2020 · 7 min read
While working on a project two years ago, I wrote a step-by-step guide to install Hadoop 3.1.0 on the Ubuntu 16.04 operating system. Since we are currently working on a new project where we need to install a Hadoop cluster on Windows 10, I decided to write a guide for this process.
This article is part of a series that we are publishing on TowardsDataScience.com, which aims to illustrate how to install Big Data technologies on the Windows operating system.
Other published articles in this series:
- Installing Apache Pig 0.17.0 on Windows 10
- Installing Apache Hive 3.1.2 on Windows 10
1. Prerequisites
First, we need to make sure that the following prerequisites are installed:
1. Java 8 runtime environment (JRE): Hadoop 3 requires a Java 8 installation. I prefer using the offline installer.
2. Java 8 development kit (JDK)
3. To unzip the downloaded Hadoop binaries, we should install 7zip.
4. I will create a folder “E:\hadoop-env” on my local machine to store the downloaded files.
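Before going further, it is worth checking that Java is reachable from a terminal. A minimal verification sketch (the exact version string depends on your installation):

```shell
# Both commands should report a 1.8.x version, which Hadoop 3 requires.
java -version
javac -version
```

If either command is not found, revisit the JRE/JDK installation before continuing.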
2. Download Hadoop binaries
The first step is to download Hadoop binaries from the official website. The binary package size is about 342 MB.
Since we are installing Hadoop 3.2.1, we should download the files located at https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and copy them into the “hadoop-3.2.1\bin” directory.
3. Setting up environment variables
After installing Hadoop and its prerequisites, we should configure the environment variables to define Hadoop and Java default paths.
To edit environment variables, go to Control Panel > System and Security > System (or right-click > Properties on the My Computer icon) and click on the “Advanced system settings” link.
Note: In this guide, we will add user variables since we are configuring Hadoop for a single user. If you are looking to configure Hadoop for multiple users, you can define System variables instead.
There are two variables to define:
1. JAVA_HOME: JDK installation folder path
2. HADOOP_HOME: Hadoop installation folder path
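If you prefer the command line, the same two user variables can be set from PowerShell with the built-in setx command. The paths below are examples only; adjust them to where your JDK and Hadoop were actually extracted:

```shell
# Set the two user variables (example paths; adjust to your machine).
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_201"
setx HADOOP_HOME "E:\hadoop-env\hadoop-3.2.1"
```

You should also append “%JAVA_HOME%\bin” and “%HADOOP_HOME%\bin” to the user Path variable; note that setx changes only take effect in newly opened terminals.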
Now, let’s open PowerShell and try to run the following command:
hadoop -version
In this example, since the JAVA_HOME path contains spaces, I received the following error:
JAVA_HOME is incorrectly set
- Use “Progra~1” instead of “Program Files”
- Use “Progra~2” instead of “Program Files (x86)”
After replacing “Program Files” with “Progra~1”, we closed and reopened PowerShell and tried the same command. As shown in the screenshot below, it runs without errors.
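“Progra~1” is the DOS 8.3 short name of the “Program Files” folder. If you want to confirm the short names on your own machine (they can occasionally differ), they can be listed like this:

```shell
# Show the 8.3 short names of the folders at the root of drive C:
# "Program Files" is usually listed as PROGRA~1.
cmd /c dir /x C:\
```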
4. Configuring the Hadoop cluster
There are four files we should alter to configure the Hadoop cluster:
- %HADOOP_HOME%\etc\hadoop\hdfs-site.xml
- %HADOOP_HOME%\etc\hadoop\core-site.xml
- %HADOOP_HOME%\etc\hadoop\mapred-site.xml
- %HADOOP_HOME%\etc\hadoop\yarn-site.xml
4.1. HDFS site configuration
As we all know, Hadoop is built using a master-slave paradigm. Before altering the HDFS configuration file, we should create a directory to store all master node (name node) data and another one to store data (data node). In this example, we created the following directories:
- E:\hadoop-env\hadoop-3.2.1\data\dfs\namenode
- E:\hadoop-env\hadoop-3.2.1\data\dfs\datanode
Now, let’s open the “hdfs-site.xml” file located in the “%HADOOP_HOME%\etc\hadoop” directory, and add the following properties within the <configuration></configuration> element:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///E:/hadoop-env/hadoop-3.2.1/data/dfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///E:/hadoop-env/hadoop-3.2.1/data/dfs/datanode</value>
</property>
Note that we set the replication factor to 1 since we are creating a single-node cluster.
4.2. Core site configuration
Now, we should configure the name node URL by adding the following XML code into the <configuration></configuration> element within “core-site.xml”:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9820</value>
</property>
4.3. MapReduce site configuration
Now, we should add the following XML code into the <configuration></configuration> element within “mapred-site.xml”:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>MapReduce framework name</description>
</property>
4.4. Yarn site configuration
Now, we should add the following XML code into the <configuration></configuration> element within “yarn-site.xml”:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>Yarn Node Manager Aux Service</description>
</property>
5. Formatting the name node
After finishing the configuration, let’s try to format the name node using the following command:
hdfs namenode -format
Due to a bug in the Hadoop 3.2.1 release, you will receive the following error:
2020-04-17 22:04:01,503 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsupportedOperationException
	at java.nio.file.Files.setPosixFilePermissions(Files.java:2044)
	at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
2020-04-17 22:04:01,511 INFO util.ExitUtil: Exiting with status 1: java.lang.UnsupportedOperationException
2020-04-17 22:04:01,518 INFO namenode.NameNode: SHUTDOWN_MSG:
This issue will be solved in the next release. For now, you can fix it temporarily using the following steps (reference):
- Download the hadoop-hdfs-3.2.1.jar file from the following link.
- Rename the file hadoop-hdfs-3.2.1.jar to hadoop-hdfs-3.2.1.bak in the folder %HADOOP_HOME%\share\hadoop\hdfs
- Copy the downloaded hadoop-hdfs-3.2.1.jar to the folder %HADOOP_HOME%\share\hadoop\hdfs
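The two steps above can also be run from PowerShell. This sketch assumes the patched jar was saved to the Downloads folder; use whatever location you actually downloaded it to:

```shell
# Back up the buggy jar shipped with Hadoop 3.2.1.
cd "$env:HADOOP_HOME\share\hadoop\hdfs"
Rename-Item hadoop-hdfs-3.2.1.jar hadoop-hdfs-3.2.1.bak
# Copy the patched jar (assumed to be in Downloads) into the same folder.
Copy-Item "$env:USERPROFILE\Downloads\hadoop-hdfs-3.2.1.jar" .
```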
Now, if we re-execute the format command (running Command Prompt or PowerShell as administrator), we need to approve formatting the file system.
6. Starting Hadoop services
Now, we will open PowerShell and navigate to the “%HADOOP_HOME%\sbin” directory. Then we will run the following commands to start the Hadoop nodes:
.\start-dfs.cmd
.\start-yarn.cmd
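Once both scripts finish, the HDFS and Yarn daemons should be running. A quick way to check is the JDK’s jps tool, which lists running Java processes; the web interfaces are also available on the Hadoop 3.x default ports:

```shell
# Expect NameNode, DataNode, ResourceManager and NodeManager in the output.
jps
# Hadoop 3.x default web UIs:
#   NameNode:        http://localhost:9870
#   ResourceManager: http://localhost:8088
```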