This chapter mainly covers:
- Scala collection operations (overview chart)
- Installing and deploying Spark in standalone mode (assuming Java, Scala, and HDFS are already set up):
  - extract the release archive
  - edit the configuration files: slaves, spark-env.sh, spark-defaults.conf
  - start the cluster with sbin/start-all.sh
  - test by entering the interactive shell with bin/spark-shell
1. Scala collection operations
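The chart that originally illustrated this section did not survive extraction. As a stand-in, here is a minimal sketch of the collection operations most relevant to Spark; the values are illustrative, and the snippet can be pasted into a Scala REPL:

val nums = List(1, 2, 3, 4, 5)

val doubled  = nums.map(_ * 2)                // map: List(2, 4, 6, 8, 10)
val evens    = nums.filter(_ % 2 == 0)        // filter: List(2, 4)
val total    = nums.reduce(_ + _)             // reduce: 15
val pairs    = nums.flatMap(n => List(n, -n)) // flatMap: List(1, -1, 2, -2, ...)
val byParity = nums.groupBy(_ % 2)            // groupBy: Map(1 -> List(1, 3, 5), 0 -> List(2, 4))

The same operation names (map, filter, flatMap, reduce) reappear on Spark RDDs, which is why collections come before the installation steps.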
2. Spark installation and deployment
Spark has four deployment modes (a sketch of how each is selected follows the list):
- Local
- Standalone
- YARN
- Mesos
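Whichever mode is used, it is selected by the master URL passed to Spark, either with --master on the command line or programmatically via SparkConf. A minimal sketch under Spark 1.x; the host, port, and app name are illustrative:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("mode-demo")
  .setMaster("local[2]")              // Local: in-process, 2 threads
//  .setMaster("spark://zk1:7077")    // Standalone: connect to a Spark master
//  .setMaster("yarn-client")         // YARN client mode (Spark 1.x syntax)
//  .setMaster("mesos://host:5050")   // Mesos
val sc = new SparkContext(conf)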
2.1 Standalone installation
- Install the JDK (omitted)
- Install Scala 2.10.4 (omitted)
- Install Hadoop 2.x (omitted)
- Install Spark standalone:
tar -zxvf spark-1.3.0-bin-2.5.0
export SPARK_HOME=/opt/modules/spark-1.3.0-bin-2.5.0
slaves — lists the hosts that run workers:

zk1
spark-env.sh — environment settings for the Spark daemons:
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
JAVA_HOME=/usr/local/jdk
SCALA_HOME=/usr/local/scala
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
HADOOP_CONF_DIR=/opt/.../etc/hadoop
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
SPARK_MASTER_IP=zk1             # host running the master daemon
SPARK_MASTER_PORT=7077          # master RPC port
SPARK_MASTER_WEBUI_PORT=8080    # master web UI port
SPARK_WORKER_CORES=1            # cores each worker offers to executors
SPARK_WORKER_MEMORY=2g          # memory each worker offers to executors
SPARK_WORKER_PORT=7078          # worker RPC port; must differ from SPARK_MASTER_PORT since zk1 runs both daemons
SPARK_WORKER_WEBUI_PORT=8081    # worker web UI port
SPARK_WORKER_INSTANCES=1        # worker processes per node
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
- Start
Run sbin/start-all.sh to start the cluster; --help lists the available options.
- Verify
  - jps should show the Master and Worker daemons
  - Web UI: zk1:8080
- Enter the interactive shell:
bin/spark-shell
Pass --help for usage, or --master to pick the master to connect to (see the sketch after the test below).
- Test with the following commands:
val fd = sc.textFile("hdfs://zk1:8020/test_input")
fd.collect
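Building on fd from the test above, a minimal word-count sketch that can be pasted into the same shell (the space delimiter is an assumption about the format of test_input):

val counts = fd.flatMap(_.split(" "))   // break each line into words
  .map(word => (word, 1))               // pair each word with a count of 1
  .reduceByKey(_ + _)                   // sum the counts per word
counts.collect().foreach(println)       // bring the results to the driver and print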
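To use the --master flag mentioned above against the standalone master configured in spark-env.sh (host and port taken from this guide's settings):

// launch: bin/spark-shell --master spark://zk1:7077
// then, inside the shell, confirm which master the context is bound to:
sc.master   // should print spark://zk1:7077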
That's all for this article. Hopefully it helps with your study, and thanks for supporting 码农网.