Hadoop, Scala, Spark: Simple Installation and Configuration

As for the earlier post on JS `this`, an update is definitely coming…

For now, here is a quick, light post!

Prerequisites

  1. CentOS 7
  2. JDK 1.8, prepared in advance; not covered in this post
  3. Download a Hadoop 3.x release from https://hadoop.apache.org/releases.html
  4. Download Scala 2.12.x (.tgz file) from https://www.scala-lang.org/download/
  5. Download Spark 2.3.x from http://spark.apache.org/downloads.html
    • Choose a package type: Pre-built for Apache Hadoop 2.7 and later
  6. Once everything is downloaded, put the archives in /opt/sources, then extract each archive into /opt (see the sketch below).
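
For reference, the download and extraction step might look roughly like this (a minimal sketch; the archive file names are assumptions based on the versions used later in this post):

mkdir -p /opt/sources
cd /opt/sources
# assumed archive names; adjust to whatever you actually downloaded
tar -xzf hadoop-3.1.1.tar.gz -C /opt
tar -xzf scala-2.12.6.tgz -C /opt
tar -xzf spark-2.3.1-bin-hadoop2.7.tgz -C /opt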

Global Settings

Add the following to /etc/profile:

## Java
export JAVA_HOME=/opt/jdk1.8.0_181

## Hadoop
export HADOOP_HOME=/opt/hadoop-3.1.1

## HBase
export HBASE_HOME=/opt/hbase-1.4.6

## ZooKeeper
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.12
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH

## Scala
export SCALA_HOME=/opt/scala-2.12.6
export PATH=$PATH:$SCALA_HOME/bin

## Spark
export SPARK_HOME=/opt/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

After editing, remember to run: source /etc/profile
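
A quick sanity check that the variables took effect (a rough sketch; the expected values in the comments assume the directory layout above):

source /etc/profile
echo $JAVA_HOME       # expect /opt/jdk1.8.0_181
hadoop version        # expect Hadoop 3.1.1
echo $SPARK_HOME      # expect /opt/spark-2.3.1-bin-hadoop2.7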

Hadoop 3.x Pseudo-Distributed Configuration

  1. /opt/hadoop-3.1.1/etc/hadoop/hadoop-env.sh

    # The java implementation to use. By default, this environment
    # variable is REQUIRED on ALL platforms except OS X!
    export JAVA_HOME=/opt/jdk1.8.0_181
  2. /opt/hadoop-3.1.1/etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/users/hadoop/hadoop/tmp</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
  3. /opt/hadoop-3.1.1/etc/hadoop/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/users/hadoop/hadoop/data</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/users/hadoop/hadoop/name</value>
      </property>
      <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:8100</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
  4. Add the following to both /opt/hadoop-3.1.1/sbin/start-dfs.sh and /opt/hadoop-3.1.1/sbin/stop-dfs.sh:

    HDFS_DATANODE_USER=root
    HDFS_DATANODE_SECURE_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
  5. (Optional) Add the following to both /opt/hadoop-3.1.1/sbin/start-yarn.sh and /opt/hadoop-3.1.1/sbin/stop-yarn.sh:

    YARN_RESOURCEMANAGER_USER=root
    HDFS_DATANODE_SECURE_USER=yarn
    YARN_NODEMANAGER_USER=root
  6. hdfs namenode -format

  7. start-dfs.sh
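
Once start-dfs.sh finishes, a rough way to confirm the pseudo-distributed HDFS is alive (a sketch; process IDs and listing output will of course differ):

jps                             # should list NameNode, DataNode and SecondaryNameNode
hdfs dfs -mkdir -p /tmp/smoke   # create a test directory in HDFS
hdfs dfs -ls /                  # the new directory should show up

The NameNode web UI should also respond on port 8100, the dfs.http.address configured above.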

Verify that Scala works

[root@localhost ~]# scala -version

Scala code runner version 2.12.6 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
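
Beyond the version check, a short REPL session is a reasonable smoke test (a sketch; the startup banner is omitted):

[root@localhost ~]# scala
scala> println(s"2 + 2 = ${2 + 2}")
2 + 2 = 4
scala> :quit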

Verify that Spark works

[root@localhost ~]# spark-shell
2018-09-13 15:13:57 WARN Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.123.101 instead (on interface enp0s3)
2018-09-13 15:13:57 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-09-13 15:13:57 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.123.101:4040
Spark context available as 'sc' (master = local[*], app id = local-1536822855033).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
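
From here, a tiny job at the scala> prompt is enough to confirm the local Spark context actually executes (a sketch; the res values are what these expressions should return, not captured from the session above):

scala> spark.version
res0: String = 2.3.1

scala> sc.parallelize(1 to 100).reduce(_ + _)
res1: Int = 5050

scala> :quit

Note that spark-shell reports Scala 2.11.8 even though Scala 2.12.6 is installed system-wide: the pre-built Spark 2.3.x package bundles its own Scala 2.11 runtime, so the two do not conflict.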