How To Download Hadoop On Mac

12/27/2020

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

etc/hadoop/core-site.xml:

etc/hadoop/hdfs-site.xml:
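
As a minimal sketch, the two files could carry the usual single-node values: a default filesystem at hdfs://localhost:9000 and a replication factor of 1. The function name `write_hdfs_config` and its directory argument are hypothetical helpers for illustration, not part of Hadoop, and the function overwrites any existing files of the same name.

```shell
# Hypothetical helper: writes minimal pseudo-distributed HDFS settings
# into the given Hadoop configuration directory (for example etc/hadoop).
# Existing core-site.xml and hdfs-site.xml files there are overwritten.
write_hdfs_config() {
  conf_dir="$1"
  mkdir -p "$conf_dir"

  # core-site.xml: address of the default filesystem (the NameNode).
  cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # hdfs-site.xml: keep one replica per block, since there is one DataNode.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
```

From the Hadoop installation directory this would be called as `write_hdfs_config etc/hadoop`.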

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

If you cannot ssh to localhost without a passphrase, execute the following commands:
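
The check is simply `ssh localhost`. If it asks for a password, the key setup can be sketched as below. This is a minimal sketch, assuming an RSA key with an empty passphrase is acceptable on your machine; the helper name `setup_passphraseless_ssh` and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: creates an RSA key with an empty passphrase (only if
# none exists yet) and authorizes it for login. Defaults to ~/.ssh.
setup_passphraseless_ssh() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

  # Generate a key pair only if there is not one already.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -P '' -f "$ssh_dir/id_rsa"
  fi

  # Authorize the public key once, then restrict the file permissions.
  touch "$ssh_dir/authorized_keys"
  if ! grep -qxF -e "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}
```

After calling `setup_passphraseless_ssh`, run `ssh localhost` again; it should now log in without prompting.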

Execution

The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

1. Format the filesystem.
2. Start the NameNode daemon and DataNode daemon.
   The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
3. Browse the web interface for the NameNode; by default it is available at:
   - NameNode - http://localhost:50070/
4. Make the HDFS directories required to execute MapReduce jobs.
5. Copy the input files into the distributed filesystem.
6. Run some of the examples provided.
7. Examine the output files: copy them from the distributed filesystem to the local filesystem and examine them there, or view them directly on the distributed filesystem.
8. When you’re done, stop the daemons.
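
The steps above can be sketched as one shell function. This is a minimal sketch, assuming it is called from the Hadoop installation directory. The names `run` and `run_hdfs_example` and the DRY_RUN switch are assumptions added for illustration, and the examples jar is matched with a glob because its version depends on your download.

```shell
# By default the commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the walkthrough; stops at the first failure.
run_hdfs_example() {
  user="${USER:-$(id -un)}"

  # Format the filesystem (this erases existing HDFS data), start daemons.
  run bin/hdfs namenode -format || return 1
  run sbin/start-dfs.sh || return 1

  # Create the HDFS home directory and copy the input files into it.
  run bin/hdfs dfs -mkdir -p "/user/$user" || return 1
  run bin/hdfs dfs -put etc/hadoop input || return 1

  # Run the bundled grep example over the input files.
  run bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      grep input output 'dfs[a-z.]+' || return 1

  # View the output on HDFS (the glob is quoted so HDFS expands it).
  run bin/hdfs dfs -cat 'output/*' || return 1

  # Stop the daemons.
  run sbin/stop-dfs.sh
}
```

Calling `run_hdfs_example` prints the seven commands for review; `DRY_RUN=0` followed by `run_hdfs_example` executes them in order.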

YARN on a Single Node

You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.

The following instructions assume that steps 1 to 4 of the instructions above have already been executed.

1. Configure the parameters in etc/hadoop/mapred-site.xml and etc/hadoop/yarn-site.xml.
2. Start the ResourceManager daemon and NodeManager daemon.
3. Browse the web interface for the ResourceManager; by default it is available at:
   - ResourceManager - http://localhost:8088/
4. Run a MapReduce job.
5. When you’re done, stop the daemons.
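
A minimal sketch of the two YARN-related files, assuming the usual single-node values: MapReduce is told to run on YARN, and the NodeManager loads the shuffle service. The helper name `write_yarn_config` is hypothetical, and it overwrites any existing files of the same name.

```shell
# Hypothetical helper: writes minimal YARN settings for a single node into
# the given Hadoop configuration directory (for example etc/hadoop).
write_yarn_config() {
  conf_dir="$1"
  mkdir -p "$conf_dir"

  # mapred-site.xml: run MapReduce jobs on YARN instead of locally.
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

  # yarn-site.xml: enable the shuffle service MapReduce depends on.
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

# The daemons are then started and stopped with the bundled scripts:
#   sbin/start-yarn.sh
#   sbin/stop-yarn.sh
```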

Hadoop performs best on a cluster of multiple nodes/servers. However, it also runs well on a single machine, even a Mac, which makes it convenient for development. Spark is a popular tool for processing data in Hadoop. The purpose of this blog is to show you the steps to install Hadoop and Spark on a Mac.

Operating System: Mac OS X El Capitan 10.11.3
Hadoop Version 2.7.2
Spark 1.6.1

Pre-requisites

1. Install Java

Open a terminal window to check what Java version is installed.
$ java -version

If Java is not installed, go to https://java.com/en/download/ to download and install the latest JDK. If Java is installed, use the following command in a terminal window to find the Java home path:
$ /usr/libexec/java_home

Next, set the JAVA_HOME environment variable:
$ echo export "JAVA_HOME=$(/usr/libexec/java_home)" >> ~/.bash_profile
$ source ~/.bash_profile

2. Enable SSH as Hadoop requires it.

Go to System Preferences -> Sharing -> and check “Remote Login”.

Generate SSH keys:
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then confirm that you can log in without being asked for a password:
$ ssh localhost

Download Hadoop Distribution

Download the latest hadoop distribution (2.7.2 at the time of writing)
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

Create Hadoop Folder

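Open a new terminal window, go to the download folder (let’s use “~/Downloads”), and find hadoop-2.7.2.tar.gz. If your browser already decompressed it, the file is named hadoop-2.7.2.tar instead; use that name in the tar command. Prefix the mv command with sudo if /usr/local is not writable by your user.

$ cd ~/Downloads
$ tar xzvf hadoop-2.7.2.tar.gz
$ mv hadoop-2.7.2 /usr/local/hadoop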

Hadoop Configuration Files

Go to the directory where your Hadoop distribution is installed.
$ cd /usr/local/hadoop

Then change the following files

$ vi etc/hadoop/hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

$ vi etc/hadoop/core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

$ vi etc/hadoop/yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

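Hadoop 2.7.2 ships this file only as a template, so copy it first:
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
$ vi etc/hadoop/mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>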

Start Hadoop Services

Format HDFS
$ cd /usr/local/hadoop
$ bin/hdfs namenode -format

Start HDFS
$ sbin/start-dfs.sh

Start YARN
$ sbin/start-yarn.sh
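
To confirm that both scripts brought everything up, the JDK's `jps` tool lists the running Java processes. A minimal sketch, assuming the five daemons that start-dfs.sh and start-yarn.sh normally launch on a single node; the helper name `missing_daemons` is hypothetical.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints the name of
# every expected Hadoop daemon that is not listed there.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <main class>", one process per line.
    if ! printf '%s\n' "$jps_output" | grep -q " $daemon\$"; then
      echo "$daemon"
    fi
  done
}

# Usage: prints nothing when all five daemons are running.
#   jps | missing_daemons
```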

Validation

Check HDFS file Directory
$ bin/hdfs dfs -ls /

If you’d rather not type bin/ every time you run a hadoop command, add the directory to your PATH:

$ vi ~/.bash_profile
Append this line to the end of the file:
export PATH=$PATH:/usr/local/hadoop/bin
$ source ~/.bash_profile
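
If you prefer not to edit the profile by hand, the append can be scripted so that running it twice does not duplicate the line. A minimal sketch; the helper name `append_once` is hypothetical.

```shell
# Hypothetical helper: appends a line to a file unless that exact line is
# already present, so repeated runs leave a single copy.
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH unexpanded until the profile is read):
#   append_once 'export PATH=$PATH:/usr/local/hadoop/bin' ~/.bash_profile
```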

Now add the following two folders in HDFS, which are needed for MapReduce jobs. This time, don’t include the bin/ prefix.

$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/{your username}

You can also open a browser and access Hadoop by using the following URL
http://localhost:50070/

Next: Spark

Installing Spark is a little easier. You can download the latest Spark here:
http://spark.apache.org/downloads.html

Choosing the package type is a little tricky. We want the “pre-build with user provided Hadoop [can use with most Hadoop distributions]” type; the downloaded file is named spark-1.6.1-bin-without-hadoop.tgz

After spark is downloaded, we need to untar it. Open a terminal window and do the following:

$ cd ~/Downloads
$ tar xzvf spark-1.6.1-bin-without-hadoop.tgz
$ mv spark-1.6.1-bin-without-hadoop /usr/local/spark

Add spark bin folder to PATH

$ vi ~/.bash_profile
Append this line to the end of the file:
export PATH=$PATH:/usr/local/spark/bin
$ source ~/.bash_profile
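
One caveat with this package type: because it ships without Hadoop's jars, Spark has to be told where to find them, which the Spark documentation does through the SPARK_DIST_CLASSPATH variable. Without it, spark-shell will likely fail with missing Hadoop classes. A minimal sketch, assuming the /usr/local/spark location used above and that the hadoop command is already on the PATH; the helper name `write_spark_env` is hypothetical.

```shell
# Hypothetical helper: appends the Hadoop classpath setting to Spark's
# conf/spark-env.sh. Defaults to the /usr/local/spark install location.
write_spark_env() {
  spark_home="${1:-/usr/local/spark}"
  mkdir -p "$spark_home/conf"

  # Quoted heredoc: $(hadoop classpath) is written literally and is only
  # evaluated when Spark sources spark-env.sh at startup.
  cat >> "$spark_home/conf/spark-env.sh" <<'EOF'
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
EOF
}
```

Call `write_spark_env` once after moving Spark into place; calling it again would append a second copy of the line.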

What about Scala?

Spark is written in Scala, so even though we can use Java to write Spark code, we want to install Scala as well.

Download Scala from here: http://www.scala-lang.org/download/
Choose the first option to download the Scala binaries. The downloaded file is scala-2.11.8.tar

Untar Scala and move it to a dedicated folder

$ cd ~/Downloads
$ tar xzvf scala-2.11.8.tar
$ mv scala-2.11.8 /usr/local/scala

Add Scala bin folder to PATH

$ vi ~/.bash_profile
Append this line to the end of the file:
export PATH=$PATH:/usr/local/scala/bin
$ source ~/.bash_profile

Now you should be able to do the following to access Spark shell for Scala

$ spark-shell

That’s it! Happy coding!
