Ambari 2.0.0 Documentation Suite

March 26, 2015
© 2012-2015 Hortonworks, Inc.
Hortonworks Data Platform
Table of Contents
Ambari User's Guide ................................................................................................................................. 11
Overview ................................................................................................................................................ 11
Architecture ........................................................................................................................................... 11
Sessions ............................................................................................................................................. 11
Accessing Ambari Web ......................................................................................................................... 12
Monitoring and Managing your HDP Cluster with Ambari .................................................................. 12
Viewing Metrics on the Dashboard ...................................................................................................... 13
Scanning System Metrics ................................................................................................................. 13
Viewing Heatmaps ............................................................................................................................. 17
Scanning Status................................................................................................................................. 18
Managing Hosts .................................................................................................................................... 19
Working with Hosts ........................................................................................................................... 19
Determining Host Status ................................................................................................................... 19
Filtering the Hosts List ...................................................................................................................... 20
Performing Host-Level Actions ......................................................................................................... 20
Viewing Components on a Host ....................................................................................................... 21
Decommissioning Masters and Slaves ............................................................................................ 22
Deleting a Host from a Cluster .......................................................................................................... 23
Setting Maintenance Mode ............................................................................................................... 24
Adding Hosts to a Cluster ................................................................................................................. 26
Managing Services ................................................................................................................................ 26
Starting and Stopping All Services ................................................................................................... 27
Selecting a Service ............................................................................................................................ 27
Editing Service Config Properties ..................................................................................................... 31
Viewing Summary, Alert, and Health Information ............................................................................ 31
Performing Service Actions .............................................................................................................. 32
Using Quick Links .............................................................................................................................. 34
Rolling Restarts ................................................................................................................................. 34
Refreshing YARN Capacity Scheduler ............................................................................................. 36
Rebalancing HDFS ............................................................................................................................ 36
Managing Service High Availability ...................................................................................................... 37
NameNode High Availability.............................................................................................................. 37
Resource Manager High Availability ................................................................................................. 49
HBase High Availability ..................................................................................................................... 50
Hive High Availability ......................................................................................................................... 50
Oozie High Availability ....................................................................................................................... 51
Managing Configurations ..................................................................................................................... 52
Configuring Services ......................................................................................................................... 52
Using Host Config Groups ................................................................................................................ 53
Customizing Log Settings ................................................................................................................. 54
Downloading Client Configs ............................................................................................................. 55
Service Configuration Versions ......................................................................................................... 55
Administering the Cluster ..................................................................................................................... 60
Managing Stack and Versions .......................................................................................................... 60
Service Accounts ............................................................................................................................... 63
Kerberos............................................................................................................................................. 63
Monitoring and Alerts............................................................................................................................ 64
Managing Alerts ................................................................................................................................. 64
Configuring Notifications .................................................................................................................. 66
List of Predefined Alerts .................................................................................................................... 67
Installing HDP Using Ambari .................................................................................................................... 78
Determine Stack Compatibility ............................................................................................................. 78
Meet Minimum System Requirements ................................................................................................. 78
Hardware Recommendations ........................................................................................................... 79
Operating Systems Requirements .................................................................................................... 79
Browser Requirements ...................................................................................................................... 79
Software Requirements ..................................................................................................................... 80
JDK Requirements............................................................................................................................. 80
Database Requirements .................................................................................................................... 81
Memory Requirements ...................................................................................................................... 81
Package Size and Inode Count Requirements ................................................................................ 82
Check the Maximum Open File Descriptors .................................................................................... 82
Collect Information ................................................................................................................................ 82
Prepare the Environment ...................................................................................................................... 83
Check Existing Package Versions .................................................................................................... 83
Set Up Password-less SSH ............................................................................................................... 84
Set up Service User Accounts .......................................................................................................... 85
Enable NTP on the Cluster and on the Browser Host ..................................................................... 85
Check DNS ......................................................................................................................................... 85
Configuring iptables .......................................................................................................................... 86
Disable SELinux and PackageKit and check the umask Value ....................................................... 87
Using a Local Repository ...................................................................................................................... 87
Obtaining the Repositories ............................................................................................................... 87
Setting Up a Local Repository .......................................................................................................... 91
Download the Ambari Repo.................................................................................................................. 97
Set Up the Ambari Server ................................................................................................................... 102
Setup Options .................................................................................................................................. 103
Start the Ambari Server ...................................................................................................................... 104
Install, Configure and Deploy a HDP Cluster ........................................................................................ 105
Log In to Apache Ambari .................................................................................................................... 105
Launching the Ambari Install Wizard ................................................................................................. 105
Name Your Cluster .............................................................................................................................. 106
Select Stack ........................................................................................................................................ 106
Install Options ..................................................................................................................................... 109
Confirm Hosts ..................................................................................................................................... 109
Choose Services ................................................................................................................................. 110
Assign Masters .................................................................................................................................... 110
Assign Slaves and Clients .................................................................................................................. 110
Customize Services ............................................................................................................................ 111
Review ................................................................................................................................................. 112
Install, Start and Test .......................................................................................................................... 112
Complete ............................................................................................................................................. 112
Upgrading Ambari ................................................................................................................................... 113
Ambari 2.0 Upgrade Guide ................................................................................................................. 113
Upgrading to Ambari 2.0 ................................................................................................................. 113
Planning for Ambari Alerts and Metrics in Ambari 2.0 ................................................................... 119
Upgrading Ambari with Kerberos-Enabled Cluster ....................................................................... 121
Upgrading the HDP Stack from 2.1 to 2.2.......................................................................................... 122
Prepare the 2.1 Stack for Upgrade ................................................................................................. 123
Upgrade the 2.1 Stack to 2.2 .......................................................................................................... 128
Complete the Upgrade of the 2.1 Stack to 2.2 ............................................................................... 132
Upgrading the HDP Stack from 2.0 to 2.2.......................................................................................... 154
Prepare the 2.0 Stack for Upgrade ................................................................................................. 155
Upgrade the 2.0 Stack to 2.2 .......................................................................................................... 160
Complete the Upgrade of the 2.0 Stack to 2.2 ............................................................................... 164
Automated HDP Stack Upgrade: HDP 2.2.0 to 2.2.4 ......................................................................... 187
Prerequisites .................................................................................................................................... 187
Preparing to Upgrade ...................................................................................................................... 188
Registering a New Version .............................................................................................................. 188
Installing a New Version on All Hosts ............................................................................................. 188
Performing an Upgrade ................................................................................................................... 189
Manual HDP Stack Upgrade: HDP 2.2.0 to 2.2.4 ............................................................................... 189
Registering a New Version .............................................................................................................. 189
Installing a New Version on All Hosts ............................................................................................. 190
Performing a Manual Upgrade ........................................................................................................ 190
Administering Ambari ............................................................................................................................. 193
Terms and Definitions ......................................................................................................................... 193
Logging in to Ambari ........................................................................................................................... 194
About the Ambari Administration Interface ....................................................................................... 194
Changing the Administrator Account Password ............................................................................... 195
Ambari Admin Tasks ........................................................................................................................... 195
Creating a Cluster ............................................................................................................................... 195
Setting Cluster Permissions ............................................................................................................... 196
Viewing the Cluster Dashboard .......................................................................................................... 197
Renaming a Cluster ............................................................................................................................. 197
Managing Users and Groups ................................................................................................................. 198
Users and Groups Overview ............................................................................................................... 198
Local and LDAP User and Group Types ........................................................................................ 198
Ambari Admin Privileges ................................................................................................................. 198
Creating a Local User ......................................................................................................................... 199
Setting User Status ............................................................................................................................. 199
Setting the Ambari Admin Flag ........................................................................................................... 199
Changing the Password for a Local User .......................................................................................... 200
Deleting a Local User .......................................................................................................................... 200
Creating a Local Group ....................................................................................................................... 200
Managing Group Membership ............................................................................................................ 201
Adding a User to a Group ............................................................................................................... 201
Modifying Group Membership ........................................................................................................ 201
Deleting a Local Group ....................................................................................................................... 201
Managing Views ...................................................................................................................................... 202
Terminology ......................................................................................................................................... 202
Basic Concepts ................................................................................................................................... 203
Deploying a View ................................................................................................................................. 204
Creating View Instances ..................................................................................................................... 204
Setting View Permissions ................................................................................................................... 205
Additional Information......................................................................................................................... 205
Ambari Security Guide ............................................................................................................................ 207
Configuring Ambari and Hadoop for Kerberos .................................................................................. 207
Kerberos Overview .......................................................................................................................... 207
Hadoop and Kerberos Principals .................................................................................................... 208
Installing and Configuring the KDC ................................................................................................ 209
Enabling Kerberos Security in Ambari ............................................................................................ 213
Running the Kerberos Wizard ......................................................................................................... 214
Kerberos Client Packages ............................................................................................................... 215
Post-Kerberos Wizard User/Group Mapping ................................................................................. 216
Advanced Security Options for Ambari ................................................................................................. 218
Configuring Ambari for LDAP or Active Directory Authentication .................................................... 218
Setting Up LDAP User Authentication ............................................................................................ 218
Optional: Encrypt Database and LDAP Passwords .......................................................................... 222
Reset Encryption ............................................................................................................................. 223
Optional: Set Up SSL for Ambari ........................................................................................................ 224
Set Up HTTPS for Ambari Server .................................................................................................... 224
Optional: Set Up Kerberos for Ambari Server .................................................................................... 225
Optional: Set Up Two-Way SSL Between Ambari Server and Ambari Agents................................. 226
Optional: Configure Ciphers and Protocols for Ambari Server ......................................................... 226
Troubleshooting Ambari Deployments .................................................................................................. 227
Introduction: Troubleshooting Ambari Issues ................................................................................... 227
Reviewing Ambari Log Files ............................................................................................................... 227
Resolving Ambari Installer Problems ................................................................................................. 227
Problem: Browser crashed before Install Wizard completes ........................................................ 227
Problem: Install Wizard reports that the cluster install has failed ................................................. 228
Problem: Ambari Agents May Fail to Register with Ambari Server............................................... 228
Problem: The “yum install ambari-server” Command Fails .......................................................... 229
Problem: HDFS Smoke Test Fails ................................................................................................... 229
Problem: yum Fails on Free Disk Space Check ............................................................................. 230
Problem: A service with a customized service user is not appearing properly in Ambari Web .. 230
Resolving Cluster Deployment Problems .......................................................................................... 230
Problem: Trouble Starting Ambari on System Reboot .................................................................. 230
Problem: Metrics and Host information display incorrectly in Ambari Web ................................. 231
Problem: On SUSE 11 Ambari Agent crashes within the first 24 hours ........................................ 231
Problem: Attempting to Start HBase REST server causes either REST server or Ambari Web to fail ..................................................................................................................................................... 231
Problem: Multiple Ambari Agent processes are running, causing re-register.............................. 231
Problem: Some graphs do not show a complete hour of data until the cluster has been running for an hour ........................................................................................................................................ 232
Problem: Ambari stops MySQL database during deployment, causing Ambari Server to crash ...... 232
Problem: Cluster Install Fails with Groupmod Error ...................................................................... 232
Problem: Host registration fails during Agent bootstrap on SLES due to timeout. ..................... 232
Problem: Host Check Fails if Transparent Huge Pages (THP) is not disabled. ............................ 233
Resolving General Problems .............................................................................................................. 233
During Enable Kerberos, the Check Kerberos operation fails. ..................................................... 233
Problem: Hive developers may encounter an exception error message during Hive Service Check .......................................................................................................................................................... 234
Problem: API calls for PUT, POST, DELETE respond with a "400 - Bad Request" ...................... 234
Ambari Reference Guide ........................................................................................................................ 235
Ambari Reference Topics ................................................................................................................... 235
Installing Ambari Agents Manually......................................................................................................... 236
Download the Ambari Repo................................................................................................................ 236
Install the Ambari Agents Manually .................................................................................................... 240
Configuring Ambari for Non-Root .......................................................................................................... 243
How to Configure Ambari Server for Non-Root ................................................................................. 243
How to Configure an Ambari Agent for Non-Root ............................................................................. 243
Sudoer Configuration ...................................................................................................................... 243
Customizable Users ........................................................................................................................ 243
Non-Customizable Users ................................................................................................................ 244
Commands ....................................................................................................................................... 244
Sudo Defaults .................................................................................................................................. 245
Customizing HDP Services .................................................................................................................... 246
Defining Service Users and Groups for a HDP 2.x Stack .................................................................. 246
Setting Properties That Depend on Service Usernames/Groups ..................................................... 247
Configuring Storm for Supervision ........................................................................................................ 248
Configuring Storm for Supervision ..................................................................................................... 248
Using Custom Host Names .................................................................................................................... 250
How to Customize the name of a host ............................................................................................... 250
Moving the Ambari Server ...................................................................................................................... 251
Back up Current Data ......................................................................................................................... 251
Update Agents .................................................................................................................................... 251
Install the New Server and Populate the Databases ......................................................................... 252
Configuring LZO Compression .............................................................................................................. 254
Configure core-site.xml for LZO ......................................................................................................... 254
Running Compression with Hive Queries .......................................................................................... 254
Create LZO Files .............................................................................................................................. 254
Write Custom Java to Create LZO Files ......................................................................................... 255
Using Non-Default Databases ................................................................................................................ 256
Using Non-Default Databases - Ambari ............................................................................................. 256
Using Ambari with Oracle................................................................................................................ 256
Using Ambari with MySQL .............................................................................................................. 257
Using Ambari with PostgreSQL ...................................................................................................... 258
Troubleshooting Ambari .................................................................................................................. 259
Using Non-Default Databases - Hive ................................................................................................. 260
Using Hive with Oracle .................................................................................................................... 260
Using Hive with MySQL ................................................................................................................... 261
Using Hive with PostgreSQL ........................................................................................................... 263
Troubleshooting Hive ...................................................................................................................... 264
Using Non-Default Databases - Oozie ............................................................................................... 266
Using Oozie with Oracle .................................................................................................................. 266
Using Oozie with MySQL................................................................................................................. 266
Using Oozie with PostgreSQL ......................................................................................................... 267
Troubleshooting Oozie .................................................................................................................... 268
Setting up an Internet Proxy Server for Ambari .................................................................................... 270
How To Set Up an Internet Proxy Server for Ambari ......................................................................... 270
Configuring Network Port Numbers....................................................................................................... 271
Default Network Port Numbers - Ambari ........................................................................................... 271
Optional: Changing the Default Ambari Server Port .......................................................................... 271
Changing the JDK Version on an Existing Cluster ................................................................................ 273
How to change the JDK Version for an Existing Cluster ................................................................... 273
Using Ambari Blueprints ......................................................................................................................... 274
Overview: Ambari Blueprints .............................................................................................................. 274
Configuring HDP Stack Repositories for Red Hat Satellite .................................................................. 275
How To Configure HDP Stack Repositories for Red Hat Satellite .................................................... 275
Tuning Ambari Performance .................................................................................................................. 276
How To Tune Ambari Performance .................................................................................................... 276
Using Ambari Views ................................................................................................................................ 277
Tez View............................................................................................................................................... 277
Configuring Tez in Your Cluster ...................................................................................................... 277
Deploying the Tez View ................................................................................................................... 278
Hive SQL on Tez - DAG, Vertex and Task ...................................................................................... 280
Using the Jobs View............................................................................................................................ 286
Deploying the Jobs View ................................................................................................................. 286
Using the Slider View .......................................................................................................................... 286
Deploying the Slider View ............................................................................................................... 286
Ambari User's Guide
Overview
Hadoop is a large-scale, distributed data storage and processing infrastructure using clusters of
commodity hosts networked together. Monitoring and managing such complex distributed systems
is a non-trivial task. To help you manage the complexity, Apache Ambari collects a wide range of
information from the cluster's nodes and services and presents it to you in Ambari Web, a
centralized web interface that is easy to read and use.
Ambari Web displays information such as service-specific summaries, graphs, and alerts. You use
Ambari Web to create and manage your HDP cluster and to perform basic operational tasks such as
starting and stopping services, adding hosts to your cluster, and updating service configurations.
You also use Ambari Web to perform administrative tasks for your cluster, such as managing users
and groups and deploying Ambari Views.
For more information on administering Ambari users, groups and views, refer to the Ambari
Administration Guide.
Architecture
The Ambari Server serves as the collection point for data from across your cluster. Each host has a
copy of the Ambari Agent - either installed automatically by the Install wizard or manually - which
allows the Ambari Server to control each host.
Ambari Server Architecture
Sessions
Ambari Web is a client-side JavaScript application, which calls the Ambari REST API (accessible
from the Ambari Server) to access cluster information and perform cluster operations. After
authenticating to Ambari Web, the application authenticates to the Ambari Server. Communication
between the browser and server occurs asynchronously via the REST API.
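As a rough illustration of the kind of REST call Ambari Web makes, the following Python sketch builds (but does not send) an authenticated GET request. The host name ambari.example.com and the helper name build_cluster_request are hypothetical, the /api/v1/clusters resource path and the use of HTTP Basic authentication are assumptions to verify against the REST API reference for your release, and port 8080 and the admin/admin credentials are the defaults mentioned in this guide.

```python
import base64
import urllib.request


def build_cluster_request(server, user, password, port=8080):
    """Build (but do not send) a GET request for the list of clusters."""
    # Assumed resource path; confirm it for your Ambari release.
    url = f"http://{server}:{port}/api/v1/clusters"
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical host name; admin/admin are the documented first-login defaults.
    req = build_cluster_request("ambari.example.com", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing the call at a live server.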
Ambari Web sessions do not time out. The Ambari Web application constantly accesses the
Ambari REST API, which resets the session timeout. Even during periods of user inactivity, the
Ambari Web user interface (UI) refreshes automatically. You must explicitly sign out of the Ambari
Web UI to destroy the Ambari session with the server.
Accessing Ambari Web
Typically, you start the Ambari Server and Ambari Web as part of the installation process. If Ambari
Server is stopped, you can start it from the command line on the Ambari Server host machine.
Enter the following command:
ambari-server start
To access Ambari Web, open a supported browser and enter the Ambari Web URL:
http://<your.ambari.server>:8080
Enter your user name and password. If this is the first time Ambari Web is accessed, use the default
values, admin/admin.
These values can be changed, and new users provisioned, using the Manage Ambari option.
For more information about managing users and other administrative tasks, see Administering
Ambari.
Monitoring and Managing your HDP Cluster with Ambari
This topic describes how to use Ambari Web features to monitor and manage your HDP cluster. To
navigate, select one of the following feature tabs located at the top of the Ambari main window. The
selected tab appears white.
• Viewing Metrics on the Dashboard
• Managing Services
• Managing Hosts
• Managing Service High Availability
• Managing Configurations
• Administering the Cluster
Viewing Metrics on the Dashboard
Ambari Web displays the Dashboard page as the home page. Use the Dashboard to view the
operating status of your cluster in the following three ways:
• Scanning System Metrics
• Scanning Status
• Viewing Heatmaps
Scanning System Metrics
View metrics that indicate the operating status of your cluster on the Ambari Dashboard. Each
metrics widget displays status information for a single service in your HDP cluster. The Ambari
Dashboard displays all metrics for the HDFS, YARN, HBase, and Storm services, and cluster-wide
metrics by default.
Metrics data for Storm is buffered and sent as a batch to Ambari every five minutes.
After adding the Storm service, anticipate a five-minute delay for Storm metrics to
appear.
You can add and remove individual widgets, and rearrange the dashboard by dragging and dropping
each widget to a new location in the dashboard.
Status information appears as simple pie and bar charts, more complex charts showing usage and
load, sets of links to additional data sources, and values for operating parameters such as uptime
and average RPC queue wait times. Most widgets display a single fact by default. For example,
HDFS Disk Usage displays a load chart and a percentage figure. The Ambari Dashboard includes
metrics for the following services:
Table 1. Ambari Service Metrics and Descriptions
Metric: Description
HDFS
HDFS Disk Usage: The percentage of DFS used, which is a combination of DFS and non-DFS used.
Data Nodes Live: The number of DataNodes live, as reported from the NameNode.
NameNode Heap: The percentage of NameNode JVM Heap used.
NameNode RPC: The average RPC queue latency.
NameNode CPU WIO: The percentage of CPU Wait I/O.
NameNode Uptime: The NameNode uptime calculation.
YARN (HDP 2.1 or later Stacks)
ResourceManager Heap: The percentage of ResourceManager JVM Heap used.
ResourceManager Uptime: The ResourceManager uptime calculation.
NodeManagers Live: The number of NodeManagers live, as reported from the ResourceManager.
YARN Memory: The percentage of available YARN memory (used vs. total available).
HBase
HBase Master Heap: The percentage of HBase Master JVM Heap used.
HBase Ave Load: The average load on the HBase server.
HBase Master Uptime: The HBase Master uptime calculation.
Region in Transition: The number of HBase regions in transition.
Storm (HDP 2.1 or later Stacks)
Supervisors Live: The number of Supervisors live, as reported from the Nimbus server.
Drilling Into Metrics for a Service
• To see more detailed information about a service, hover your cursor over a Metrics widget.
More detailed information about the service displays, as shown in the following example:
• To remove a widget from the dashboard, click the white X.
• To edit the display of information in a widget, click the pencil icon. For more information
about editing a widget, see Customizing Metrics Display.
Viewing Cluster-Wide Metrics
Cluster-wide metrics display information that represents your whole cluster. The Ambari Dashboard
shows the following cluster-wide metrics:
Table 2. Ambari Cluster-Wide Metrics and Descriptions
Metric: Description
Memory Usage: The cluster-wide memory utilization, including memory cached, swapped, used, and shared.
Network Usage: The cluster-wide network utilization, including in and out.
CPU Usage: Cluster-wide CPU information, including system, user, and wait I/O.
Cluster Load: Cluster-wide load information, including total number of nodes, total number of CPUs,
number of running processes, and 1-minute load.
• To remove a widget from the dashboard, click the white X.
• Hover your cursor over each cluster-wide metric to magnify the chart or itemize the widget
display.
• To remove or add metric items from each cluster-wide metric widget, select the item on the
widget legend.
• To see a larger view of the chart, select the magnifying glass icon.
Ambari displays a larger version of the widget in a pop-up window, as shown in the following
example:
Use the pop-up window in the same ways that you use cluster-wide metric widgets on the
dashboard.
To close the widget pop-up window, choose OK.
Adding a Widget to the Dashboard
To replace a widget that has been removed from the dashboard:
1. Select the Metrics drop-down, as shown in the following example:
2. Choose Add.
3. Select a metric, such as Region in Transition.
4. Choose Apply.
Resetting the Dashboard
To reset all widgets on the dashboard to display default settings:
1. Select the Metrics drop-down, as shown in the following example:
2. Choose Edit.
3. Choose Reset all widgets to default.
Customizing Metrics Display
To customize the way a service widget displays metrics information:
1. Hover your cursor over a service widget.
2. Select the pencil-shaped edit icon that appears in the upper-right corner.
The Customize Widget pop-up window displays properties that you can edit, as shown in the
following example.
3. Follow the instructions in the Customize Widget pop-up to customize widget appearance.
In this example, you can adjust the thresholds at which the HDFS Capacity bar chart changes
color, from green to orange to red.
4. To save your changes and close the editor, choose Apply.
5. To close the editor without saving any changes, choose Cancel.
Not all widgets support editing.
Viewing More Metrics for your HDP Stack
The HDFS Links and HBase Links widgets list HDP components for which links to more metrics
information, such as thread stacks, logs, and native component UIs, are available. For example, you
can link to NameNode, Secondary NameNode, and DataNode components for HDFS, using the links
shown in the following example:
Choose the More drop-down to select from the list of links available for each service. The Ambari
Dashboard includes additional links to metrics for the following services:
Table 3. Links to More Metrics for HDP Services
Service, then Metric: Description
HDFS
NameNode UI: Links to the NameNode UI.
NameNode Logs: Links to the NameNode logs.
NameNode JMX: Links to the NameNode JMX servlet.
Thread Stacks: Links to the NameNode thread stack traces.
HBase
HBase Master UI: Links to the HBase Master UI.
HBase Logs: Links to the HBase logs.
ZooKeeper Info: Links to ZooKeeper information.
HBase Master JMX: Links to the HBase Master JMX servlet.
Debug Dump: Links to debug information.
Thread Stacks: Links to the HBase Master thread stack traces.
Viewing Heatmaps
Heatmaps provides a graphical representation of your overall cluster utilization using simple color
coding.
A colored block represents each host in your cluster. To see more information about a specific host,
hover over the block representing the host in which you are interested. A pop-up window displays
metrics about HDP components installed on that host. Colors displayed in the block represent usage
in a unit appropriate for the selected set of metrics. If any data necessary to determine state is not
available, the block displays "Invalid Data". Changing the default maximum values for the heatmap
lets you fine tune the representation. Use the Select Metric drop-down to select the metric type.
Heatmaps supports the following metrics:
Metric: Uses
Host/Disk Space Used %: disk.disk_free and disk.disk_total
Host/Memory Used %: memory.mem_free and memory.mem_total
Host/CPU Wait I/O %: cpu.cpu_wio
HDFS/Bytes Read: dfs.datanode.bytes_read
HDFS/Bytes Written: dfs.datanode.bytes_written
HDFS/Garbage Collection Time: jvm.gcTimeMillis
HDFS/JVM Heap Memory Used: jvm.memHeapUsedM
YARN/Garbage Collection Time: jvm.gcTimeMillis
YARN/JVM Heap Memory Used: jvm.memHeapUsedM
YARN/Memory Used %: UsedMemoryMB and AvailableMemoryMB
HBase/RegionServer read request count: hbase.regionserver.readRequestsCount
HBase/RegionServer write request count: hbase.regionserver.writeRequestsCount
HBase/RegionServer compaction queue size: hbase.regionserver.compactionQueueSize
HBase/RegionServer regions: hbase.regionserver.regions
HBase/RegionServer memstore sizes: hbase.regionserver.memstoreSizeMB
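The percentage metrics listed above are each derived from a pair of raw values. The following Python sketch shows one plausible way to compute them. The formulas and the function names are assumptions made for illustration, not Ambari's actual implementation. Returning None stands in for the "Invalid Data" state that a heatmap block shows when the necessary data is unavailable.

```python
def used_percent(free, total):
    """Percentage used, given free and total amounts (for example disk or memory).

    Assumed formula: (total - free) / total * 100.
    Returns None when the inputs cannot produce a meaningful value.
    """
    if free is None or total is None or total <= 0:
        return None
    return 100.0 * (total - free) / total


def used_percent_from_used_available(used, available):
    """Percentage used, given used and still-available amounts.

    Assumed formula for a metric built from UsedMemoryMB and AvailableMemoryMB:
    used / (used + available) * 100.
    """
    if used is None or available is None or (used + available) <= 0:
        return None
    return 100.0 * used / (used + available)


if __name__ == "__main__":
    # Illustrative values only.
    print(used_percent(25.0, 100.0))
    print(used_percent_from_used_available(30.0, 70.0))
```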
Scanning Status
Notice the color of the dot appearing next to each component name in a list of components, services,
or hosts. The dot color and blinking action indicate the operating status of each component, service, or
host. For example, in the Summary View, notice the green dot next to each service name. The following
colors and actions indicate service status:
Table 4. Status Indicators
Color: Status
Solid Green: All masters are running
Blinking Green: Starting up
Solid Red: At least one master is down
Blinking Red: Stopping
Click the service name to open the Services screen, where you can see more detailed information
on each service.
Managing Hosts
Use Ambari Hosts to manage multiple HDP components such as DataNodes, NameNodes,
NodeManagers and RegionServers, running on hosts throughout your cluster. For example, you can
restart all DataNode components, optionally controlling that task with rolling restarts. Ambari Hosts
supports filtering your selection of host components, based on operating status, host health, and
defined host groupings.
Working with Hosts
Use Hosts to view hosts in your cluster on which Hadoop services run. Use options on Actions to
perform actions on one or more hosts in your cluster.
View individual hosts, listed by fully-qualified domain name, on the Hosts landing page.
Determining Host Status
A colored dot beside each host name indicates operating status of each host, as follows:
• Red - At least one master component on that host is down. Hover to see a tooltip that lists
affected components.
• Orange - At least one slave component on that host is down. Hover to see a tooltip that lists
affected components.
• Yellow - Ambari Server has not received a heartbeat from that host for more than 3 minutes.
• Green - Normal running state.
A red condition flag overrides an orange condition flag, which overrides a yellow condition flag. In
other words, a host having a master component down may also have other issues. The following
example shows three hosts, one having a master component down, one having a slave component
down, and one healthy. Warning indicators appear next to hosts having a component down.
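The precedence among the status colors can be sketched as a small Python function. This is only an illustration of the override rule described above, not Ambari's code. The function name, parameter names, and the representation of the heartbeat age in seconds are assumptions.

```python
# "More than 3 minutes" without a heartbeat, expressed in seconds.
HEARTBEAT_LIMIT_SECONDS = 3 * 60


def host_status_color(masters_down, slaves_down, seconds_since_heartbeat):
    """Return the status dot color for a host.

    Checks run from highest to lowest precedence, so red overrides orange,
    which overrides yellow.
    """
    if masters_down > 0:
        return "red"
    if slaves_down > 0:
        return "orange"
    if seconds_since_heartbeat > HEARTBEAT_LIMIT_SECONDS:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # A host with a master down shows red even if it also missed heartbeats.
    print(host_status_color(masters_down=1, slaves_down=2, seconds_since_heartbeat=600))
```

Because the checks return on the first match, a red host may still have slave components down or missed heartbeats that the dot color alone does not reveal.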
Filtering the Hosts List
Use Filters to limit listed hosts to only those having a specific operating status. The number of hosts
in your cluster having a listed operating status appears after each status name, in parentheses. For
example, the following cluster has one host having healthy status and three hosts having
Maintenance Mode turned on.
For example, to limit the list of hosts appearing on Hosts home to only those with Healthy status,
select Filters, then choose the Healthy option. In this case, one host name appears on Hosts home.
Alternatively, to limit the list of hosts appearing on Hosts home to only those having Maintenance
Mode on, select Filters, then choose the Maintenance Mode option. In this case, three host names
appear on Hosts home.
Use the general filter tool to apply specific search and sort criteria that limits the list of hosts
appearing on the Hosts page.
Performing Host-Level Actions
Use Actions to act on one or more hosts in your cluster. Actions performed on multiple hosts are
also known as bulk operations.
Actions comprises three menus that list the following option types:
• Hosts - lists selected, filtered or all hosts options, based on your selections made using
Hosts home and Filters.
• Objects - lists component objects that match your host selection criteria.
• Operations - lists all operations available for the component objects you selected.
For example, to restart DataNodes on one host:
1. In Hosts, select a host running at least one DataNode.
2. In Actions, choose Selected Hosts > DataNodes > Restart, as shown in the following
image.
3. Choose OK to confirm starting the selected operation.
4. Optionally, use Monitoring Background Operations to follow, diagnose or troubleshoot the
restart operation.
Viewing Components on a Host
To manage components running on a specific host, choose a FQDN on the Hosts page. For example,
choose c6403.ambari.apache.org in the default example shown. Summary-Components lists all
components installed on that host.
Choose options in Host Actions to start, stop, restart, delete, or turn on maintenance mode for all
components installed on the selected host.
Alternatively, choose action options from the drop-down menu next to an individual component on a
host. The drop-down menu shows current operation status for each component. For example, you
can decommission, restart, or stop the DataNode component (started) for HDFS by selecting one of
the options shown in the following example:
Decommissioning Masters and Slaves
Decommissioning is a process that supports removing a component from the cluster. You must
decommission a master or slave running on a host before removing the component or host from
service. Decommissioning helps prevent potential loss of data or service disruption.
Decommissioning is available for the following component types:
• DataNodes
• NodeManagers
• RegionServers
Decommissioning executes the following tasks:
• For DataNodes, safely replicates the HDFS data to other DataNodes in the cluster.
• For NodeManagers, stops accepting new job requests from the masters and stops the
component.
• For RegionServers, turns on drain mode and stops the component.
How to Decommission a Component
To decommission a component using Ambari Web, browse Hosts to find the host FQDN on which
the component resides.
Using Actions, select Hosts > Component Type, then choose Decommission.
For example:
The UI shows "Decommissioning" status while steps process, then "Decommissioned" when
complete.
How to Delete a Component
To delete a component using Ambari Web, on Hosts choose the host FQDN on which the
component resides.
1. In Components, find a decommissioned component.
2. Stop the component, if necessary.
A decommissioned slave component may restart in the decommissioned state.
3. For a decommissioned component, choose Delete from the component drop-down menu.
Restarting services enables Ambari to recognize and monitor the correct number of
components.
Deleting a slave component, such as a DataNode, does not automatically inform a master
component, such as a NameNode, to remove the slave component from its exclusion list.
Adding a deleted slave component back into the cluster therefore presents the following issue: the
added slave remains decommissioned from the master's perspective. As a workaround, restart the
master component.
Deleting a Host from a Cluster
Deleting a host removes the host from the cluster. Before deleting a host, you must complete the
following prerequisites:
• Stop all components running on the host.
• Decommission any DataNodes running on the host.
• Move from the host any master components, such as NameNode or ResourceManager,
running on the host.
• Turn Off Maintenance Mode, if necessary, for the host.
How to Delete a Host from a Cluster
1. In Hosts, click on a host name.
2. On the Host-Details page, select the Host Actions drop-down menu.
3. Choose Delete.
If you have not completed prerequisite steps, a warning message similar to the following one
appears:
Setting Maintenance Mode
Maintenance Mode supports suppressing alerts and skipping bulk operations for specific services,
components and hosts in an Ambari-managed cluster. You typically turn on Maintenance Mode when
performing hardware or software maintenance, changing configuration settings, troubleshooting,
decommissioning, or removing cluster nodes. You may place a service, component, or host object in
Maintenance Mode before you perform necessary maintenance or troubleshooting tasks.
Maintenance Mode affects a service, component, or host object in the following two ways:
• Maintenance Mode suppresses alerts, warnings and status change indicators generated for
the object
• Maintenance Mode exempts an object from host-level or service-level bulk operations
Explicitly turning on Maintenance Mode for a service implicitly turns on Maintenance Mode for
components and hosts that run the service. While Maintenance Mode On prevents bulk operations
being performed on the service, component, or host, you may explicitly start and stop a service,
component, or host having Maintenance Mode On.
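Maintenance Mode can also be toggled outside Ambari Web through the Ambari REST API. The following Python sketch only assembles the pieces of such a request for a host and does not send anything. The resource path, the maintenance_state property, the X-Requested-By header, the cluster name MyCluster, and the function name are all assumptions for illustration. Verify them against the REST API reference for your Ambari release before use.

```python
import json


def maintenance_request(cluster, host, turn_on):
    """Assemble (method, path, headers, body) for a host Maintenance Mode change.

    All endpoint and payload details here are assumptions, not confirmed by
    this guide.
    """
    state = "ON" if turn_on else "OFF"
    path = f"/api/v1/clusters/{cluster}/hosts/{host}"
    body = {
        # Free-text label for the operation.
        "RequestInfo": {"context": f"Turn {state} Maintenance Mode for host"},
        "Body": {"Hosts": {"maintenance_state": state}},
    }
    # Header assumed to be required for modifying requests.
    headers = {"X-Requested-By": "ambari"}
    return "PUT", path, headers, json.dumps(body)


if __name__ == "__main__":
    # "MyCluster" is a hypothetical cluster name; the host is from this guide's example.
    method, path, headers, body = maintenance_request(
        "MyCluster", "c6403.ambari.apache.org", turn_on=True
    )
    print(method, path)
    print(body)
```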
Setting Maintenance Mode for Services, Components, and Hosts
For example, examine using Maintenance Mode in a 3-node, Ambari-managed cluster installed using
default options. This cluster has one DataNode, on host c6403. This example describes how to
explicitly turn on Maintenance Mode for the HDFS service, alternative procedures for explicitly
turning on Maintenance Mode for a host, and the implicit effects of turning on Maintenance Mode for
a service, a component and a host.
How to Turn On Maintenance Mode for a Service
1. Using Services, select HDFS.
2. Select Service Actions, then choose Turn On Maintenance Mode.
3. Choose OK to confirm.
Notice, on Services Summary, that Maintenance Mode turns on for the NameNode and
SNameNode components.
How to Turn On Maintenance Mode for a Host
1. Using Hosts, select c6401.ambari.apache.org.
2. Select Host Actions, then choose Turn On Maintenance Mode.
3. Notice, on Components, that Maintenance Mode turns on for all components.
How to Turn On Maintenance Mode for a Host (alternative using filtering for hosts)
1. Using Hosts, select c6403.ambari.apache.org.
2. In Actions > Selected Hosts > Hosts, choose Turn On Maintenance Mode.
3. Notice that Maintenance Mode turns on for host c6403.ambari.apache.org.
Your list of Hosts now shows Maintenance Mode On for hosts c6401 and c6403.
• Hover your cursor over each Maintenance Mode icon appearing in the Hosts list.
• Notice that hosts c6401 and c6403 have Maintenance Mode On.
• Notice that on host c6401, HBase Master, HDFS client, NameNode, and ZooKeeper
Server have Maintenance Mode turned On.
• Notice that on host c6402, HDFS client and Secondary NameNode have Maintenance
Mode On.
• Notice that on host c6403, 15 components have Maintenance Mode On.
The following behavior also results:
• Alerts are suppressed for the DataNode.
• DataNode is skipped from HDFS Start/Stop/Restart All, Rolling Restart.
• DataNode is skipped from all Bulk Operations except Turn Maintenance Mode
ON/OFF.
• DataNode is skipped from Start All / Stop All components.
• DataNode is skipped from a host-level restart/restart all/stop all/start.
Maintenance Mode Use Cases
Four common Maintenance Mode use cases follow:
1. You want to perform hardware, firmware, or OS maintenance on a host.
You want to:
• Prevent alerts generated by all components on this host.
• Be able to stop, start, and restart each component on the host.
• Prevent host-level or service-level bulk operations from starting, stopping, or
restarting components on this host.
To achieve these goals, turn on Maintenance Mode explicitly for the host. Putting a host in
Maintenance Mode implicitly puts all components on that host in Maintenance Mode.
2. You want to test a service configuration change. You will stop, start, and restart the service
using a rolling restart to test whether restarting picks up the change.
You want:
• No alerts generated by any components in this service.
• To prevent host-level or service-level bulk operations from starting, stopping, or
restarting components in this service.
To achieve these goals, turn on Maintenance Mode explicitly for the service. Putting a service
in Maintenance Mode implicitly turns on Maintenance Mode for all components in the
service.
3. You turn off a service completely.
You want:
• The service to generate no warnings.
• To ensure that no components start, stop, or restart due to host-level actions or bulk
operations.
To achieve these goals, turn on Maintenance Mode explicitly for the service. Putting a
service in Maintenance Mode implicitly turns on Maintenance Mode for all components in the
service.
4. A host component is generating alerts.
You want to:
• Check the component.
• Assess warnings and alerts generated for the component.
• Prevent alerts generated by the component while you check its condition.
To achieve these goals, turn on Maintenance Mode explicitly for the host component. Putting a host
component in Maintenance Mode prevents host-level and service-level bulk operations from starting
or restarting the component. You can restart the component explicitly while Maintenance Mode is on.
Adding Hosts to a Cluster
To add new hosts to your cluster, browse to the Hosts page and select Actions > +Add New
Hosts. The Add Host Wizard provides a sequence of prompts similar to those in the Ambari Install
Wizard. Follow the prompts, providing information similar to that provided to define the first set of
hosts in your cluster.
Managing Services
Use Services to monitor and manage selected services running in your Hadoop cluster.
All services installed in your cluster are listed in the leftmost Services panel.
Services supports the following tasks:
• Starting and Stopping All Services
• Selecting a Service
• Editing Service Config Properties
• Performing Service Actions
• Viewing Summary, Alert, and Health Information
• Rolling Restarts
• Refreshing YARN Capacity Scheduler
• Rebalancing HDFS
Starting and Stopping All Services
To start or stop all listed services at once, select Actions, then choose Start All or Stop All.
Selecting a Service
Selecting a service name from the list shows current summary, alert, and health information for the
selected service. To refresh the monitoring panels and show information about a different service,
select a different service name from the list.
Notice the colored dot next to each service name, indicating service operating status, and a small,
red, numbered rectangle indicating any alerts generated for the service.
Adding a Service
The Ambari install wizard installs all available Hadoop services by default. You may choose to deploy
only some services initially, then add other services at later times. For example, many customers
deploy only core Hadoop services initially. Add Service supports deploying additional services
without interrupting operations in your Hadoop cluster. When you have deployed all available
services, Add Service appears disabled.
For example, if you are using HDP 2.2 Stack and did not install Falcon or Storm, you can use the
Add Service capability to add those services to your cluster.
To add a service, select Actions > Add Service, then complete the following procedure using the
Add Service Wizard.
Adding a Service to your Hadoop cluster
This example shows the Falcon service selected for addition.
1. Choose Services.
Choose an available service. Alternatively, choose all to add all available services to your
cluster. Then, choose Next. The Add Service wizard displays installed services highlighted
green and check-marked, not available for selection.
Ambari 2.0 supports adding Ranger and Spark services, using the Add Services
Wizard.
For more information about installing Ranger, see Installing Ranger.
For more information about installing Spark, see Installing Spark.
2. In Assign Masters, confirm the default host assignment. Alternatively, choose a different
host machine to which master components for your selected service will be added. Then,
choose Next.
The Add Services Wizard indicates hosts on which the master components for a chosen
service will be installed. A service chosen for addition shows a grey check mark.
Using the drop-down, choose an alternate host name, if necessary.
• A green label located on the host to which its master components will be added, or
• An active drop-down list on which available host names appear.
3. In Assign Slaves and Clients, accept the default assignment of slave and client
components to hosts. Then, choose Next.
Alternatively, select hosts on which you want to install slave and client components. You
must select at least one host for the slave of each service being added.
Table 5. Host Roles Required for Added Services
Service Added: Host Role Required
YARN: NodeManager
HBase: RegionServer
The Add Service Wizard skips and disables the Assign Slaves and Clients step for a service
requiring neither slave nor client assignment.
4
In Customize Services, accept the default configuration properties.
Alternatively, edit the default values for configuration properties, if necessary. Choose
Override to create a configuration group for this service. Then, choose Next.
5
In Review, make sure the configuration settings match your intentions. Then, choose Deploy.
6
Monitor the progress of installing, starting, and testing the service. When the service installs
and starts successfully, choose Next.
7
Summary displays the results of installing the service. Choose Complete.
8
Restart any other components having stale configurations.
Editing Service Config Properties
Select a service, then select Configs to view and update configuration properties for the selected
service. For example, select MapReduce2, then select Configs. Expand a config category to view
configurable service properties. For example, select General to configure Default virtual memory for a
job's map task.
Viewing Summary, Alert, and Health Information
After you select a service, the Summary tab displays basic information about the selected service.
Select one of the View Host links, as shown in the following example, to view components and the
host on which the selected service is running.
Alerts and Health Checks
On each Service page, in the Summary area, click Alerts to see a list of all health checks and their
status for the selected service. Critical alerts are shown first. Click the text title of each alert message
in the list to see the alert definition. For example, in Services > HBase, click Alerts. Then, in
Alerts for HBase, click HBase Master Process.
Analyzing Service Metrics
Review visualizations in Metrics that chart common metrics for a selected service. Services >
Summary displays metrics widgets for the HDFS, HBase, and Storm services. For more information about
using metrics widgets, see Scanning System Metrics.
Performing Service Actions
Manage a selected service on your cluster by performing service actions. In Services, select the
Service Actions drop-down menu, then choose an option. Available options depend on the service
you have selected. For example, HDFS service action options include:
Optionally, choose Turn On Maintenance Mode to suppress alerts generated by a service before
performing a service action. Maintenance Mode suppresses alerts and status indicator changes
generated by the service, while allowing you to start, stop, restart, move, or perform maintenance
tasks on the service. For more information about how Maintenance Mode affects bulk operations for
host components, see Setting Maintenance Mode.
Monitoring Background Operations
Optionally, use Background Operations to monitor progress and completion of bulk operations such
as rolling restarts.
Background Operations opens by default when you run a job that executes bulk operations.
1
Select the right-arrow for each operation to show restart operation progress on each host.
2
After restarts complete, select the right-arrow, or a host name, to view log files and any error
messages generated on the selected host.
3
Select links at the upper-right to copy or open text files containing log and error information.
Optionally, select the option to not show the bulk operations dialog.
Using Quick Links
Select Quick Links options to access additional sources of information about a selected service.
For example, HDFS Quick Links options include the native NameNode GUI, NameNode logs, the
NameNode JMX output, and thread stacks for the HDFS service. Quick Links are not available for
every service.
Rolling Restarts
When you restart multiple services, components, or hosts, use rolling restarts to distribute the task,
minimizing cluster downtime and service disruption. A rolling restart stops, then starts, multiple
running slave components such as DataNodes, NodeManagers, RegionServers, or Supervisors, in
a batch sequence. You set rolling restart parameter values to control the number of components per batch,
the time between batches, the tolerance for failures, and the limits for restarts of many components across
large clusters.
To run a rolling restart:
1
Select a Service, then follow the link to a list of specific components or hosts that Require Restart.
2
Select Restart, then choose a slave component option.
3
Review and set values for Rolling Restart Parameters.
4
Optionally, reset the flag to only restart components with changed configurations.
5
Choose Trigger Restart.
Use Monitor Background Operations to monitor progress of rolling restarts.
Setting Rolling Restart Parameters
When you choose to restart slave components, use parameters to control how restarts of
components roll. Default parameter values are based on ten percent of the total number of components in your
cluster. For example, the default settings for a rolling restart of components in a
3-node cluster restart one component at a time, wait two minutes between restarts, proceed if
only one failure occurs, and restart all existing components that run this service.
If you trigger a rolling restart of components, Restart components with stale configs defaults to true.
If you trigger a rolling restart of services, Restart services with stale configs defaults to false.
Rolling restart parameter values must satisfy the following criteria:
Table 6. Validation Rules for Rolling Restart Parameters
Parameter | Required | Value | Description
Batch Size | Yes | Must be an integer > 0 | Number of components to include in each restart batch.
Wait Time | Yes | Must be an integer >= 0 | Time (in seconds) to wait between queuing each batch of components.
Tolerate up to x failures | Yes | Must be an integer >= 0 | Total number of restart failures to tolerate, across all batches, before halting the restarts and not queuing batches.
Aborting a Rolling Restart
To abort future restart operations in the batch, choose Abort Rolling Restart.
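The validation rules in Table 6 can be sketched as a small shell check. This is a minimal sketch, not Ambari's own validation code; the function names is_integer and validate_rolling_restart are hypothetical and exist only for illustration.

```shell
# Hypothetical helpers that mirror the Table 6 rules; they are not part of Ambari.

# is_integer VALUE: succeeds when VALUE is a non-negative decimal integer.
is_integer() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# validate_rolling_restart BATCH_SIZE WAIT_TIME TOLERATED_FAILURES
# Prints one message per violated rule and returns non-zero if any rule fails.
validate_rolling_restart() {
  local batch_size="$1"
  local wait_time="$2"
  local tolerated="$3"
  local status=0
  if ! is_integer "$batch_size" || [ "$batch_size" -le 0 ]; then
    echo "Batch Size must be an integer > 0"
    status=1
  fi
  if ! is_integer "$wait_time"; then
    echo "Wait Time must be an integer >= 0"
    status=1
  fi
  if ! is_integer "$tolerated"; then
    echo "Tolerate up to x failures must be an integer >= 0"
    status=1
  fi
  return "$status"
}
```

For example, validate_rolling_restart 1 120 1 succeeds, while a Batch Size of 0 is rejected.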
Refreshing YARN Capacity Scheduler
After you modify the Capacity Scheduler configuration, YARN supports refreshing the queues without
requiring you to restart your ResourceManager. The “refresh” operation is valid if you have made no
destructive changes to your configuration. Removing a queue is an example of a destructive change.
How to refresh the YARN Capacity Scheduler
This topic describes how to refresh the Capacity Scheduler in cases where you have added or
modified existing queues.
•
In Ambari Web, browse to Services > YARN > Summary.
•
Select Service Actions, then choose Refresh YARN Capacity Scheduler.
•
Confirm you would like to perform this operation.
The refresh operation is submitted to the YARN ResourceManager.
The Refresh operation will fail with the following message: “Failed to re-init queues” if
you attempt to refresh queues in a case where you performed a destructive change,
such as removing a queue. In cases where you have made destructive changes, you
must perform a ResourceManager restart for the capacity scheduler change to take
effect.
Rebalancing HDFS
HDFS provides a “balancer” utility to help balance the blocks across DataNodes in the cluster.
How to rebalance HDFS
This topic describes how you can initiate an HDFS rebalance from Ambari.
1
In Ambari Web, browse to Services > HDFS > Summary.
2
Select Service Actions, then choose Rebalance HDFS.
3
Enter the Balance Threshold value as a percentage of disk capacity.
4
Click Start to begin the rebalance.
5
You can check rebalance progress or cancel a rebalance in progress by opening the
Background Operations dialog.
Managing Service High Availability
Ambari provides the ability to configure the High Availability features available with the HDP Stack
services. This section describes how to enable HA for the various Stack services.
•
NameNode High Availability
•
Resource Manager High Availability
•
HBase High Availability
•
Hive High Availability
•
Oozie High Availability
NameNode High Availability
To ensure that a NameNode in your cluster is always available if the primary NameNode host fails,
enable and set up NameNode High Availability on your cluster using Ambari Web.
In Ambari Web, browse to Services > HDFS > Summary, select Service Actions and then
choose Enable NameNode HA. Follow the steps in the Enable NameNode HA Wizard.
For more information about using the Enable NameNode HA Wizard, see How To Configure NameNode
High Availability.
How To Configure NameNode High Availability
1
Check to make sure you have at least three hosts in your cluster and are running at least
three ZooKeeper servers.
2
In Ambari Web, select Services > HDFS > Summary. Select Service Actions and
choose Enable NameNode HA.
3
The Enable HA Wizard launches. This wizard describes the set of automated and manual
steps you must take to set up NameNode high availability.
4
Get Started : This step gives you an overview of the process and allows you to select a
Nameservice ID. You use this Nameservice ID instead of the NameNode FQDN once HA has
been set up. Click Next to proceed.
5
Select Hosts : Select a host for the additional NameNode and the JournalNodes. The
wizard suggests options that you can adjust using the drop-down lists. Click Next to proceed.
6
Review : Confirm your host selections and click Next.
7
Create Checkpoints : Follow the instructions in the step. You need to log in to your
current NameNode host to run the commands to put your NameNode into safe mode and
create a checkpoint. When Ambari detects success, the message on the bottom of the
window changes. Click Next.
8
Configure Components : The wizard configures your components, displaying progress
bars to let you track the steps. Click Next to continue.
9
Initialize JournalNodes : Follow the instructions in the step. You need to log in to your
current NameNode host to run the command to initialize the JournalNodes. When Ambari
detects success, the message on the bottom of the window changes. Click Next.
10 Start Components : The wizard starts the ZooKeeper servers and the NameNode,
displaying progress bars to let you track the steps. Click Next to continue.
11 Initialize Metadata : Follow the instructions in the step. For this step you must log in to
both the current NameNode and the additional NameNode. Make sure you are logged in
to the correct host for each command. Click Next when you have completed the two
commands. A Confirmation pop-up window displays, reminding you to do both steps. Click
OK to confirm.
12 Finalize HA Setup : The wizard finalizes the setup, displaying progress bars to let you track the
steps. Click Done to finish the wizard. After the Ambari Web GUI reloads, you may see some
alert notifications. Wait a few minutes until the services come back up. If necessary, restart
any components using Ambari Web.
13 If you are using Hive, you must manually change the Hive Metastore FS root to point to the
Nameservice URI instead of the NameNode URI. You created the Nameservice ID in the Get
Started step.
1
Check the current FS root. On the Hive host:
hive --config /etc/hive/conf.server --service metatool -listFSRoot
The output looks similar to the following:
Listing FS Roots...
hdfs://<namenode-host>/apps/hive/warehouse
2
Use this command to change the FS root:
$ hive --config /etc/hive/conf.server --service metatool -updateLocation <new-location> <old-location>
For example, where the Nameservice ID is mycluster:
$ hive --config /etc/hive/conf.server --service metatool -updateLocation hdfs://mycluster/apps/hive/warehouse hdfs://c6401.ambari.apache.org/apps/hive/warehouse
The output looks similar to the following:
Successfully updated the following locations...
Updated X records in SDS table
14 Adjust the ZooKeeper Failover Controller retries setting for your environment.
•
Browse to Services > HDFS > Configs > core-site.
•
Set ha.failover-controller.active-standby-elector.zk.op.retries=120
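The Hive Metastore FS root change in step 13 can be sketched as a helper that builds the metatool command from the Nameservice ID and the old NameNode host. This is a minimal sketch under stated assumptions: the function name hive_update_fs_root_cmd is hypothetical, the warehouse path defaults to /apps/hive/warehouse as in the example output above, and the option is spelled -updateLocation as in the Hive metatool CLI. The function only prints the command so that you can review it before running it on the Hive host.

```shell
# Hypothetical helper: prints (does not run) the metatool command for step 13.
# Usage: hive_update_fs_root_cmd NAMESERVICE_ID OLD_NAMENODE_HOST [WAREHOUSE_PATH]
hive_update_fs_root_cmd() {
  local nameservice_id="$1"
  local old_host="$2"
  local warehouse_path="${3:-/apps/hive/warehouse}"   # assumed default path
  # New location comes first, old location second.
  printf 'hive --config /etc/hive/conf.server --service metatool -updateLocation hdfs://%s%s hdfs://%s%s\n' \
    "$nameservice_id" "$warehouse_path" "$old_host" "$warehouse_path"
}
```

For example, hive_update_fs_root_cmd mycluster c6401.ambari.apache.org prints the command for a Nameservice ID of mycluster.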
How to Roll Back NameNode HA
To roll back NameNode HA to the previous non-HA state use the following step-by-step manual
process, depending on your installation.
1
Stop HBase
2
Checkpoint the Active NameNode
3
Stop All Services
4
Prepare the Ambari Server Host for Rollback
5
Restore the HBase Configuration
6
Delete ZooKeeper Failover Controllers
7
Modify HDFS Configurations
8
Recreate the standby NameNode
9
Re-enable the standby NameNode
10 Delete All JournalNodes
11 Delete the Additional NameNode
12 Verify the HDFS Components
13 Start HDFS
Stop HBase
1
From Ambari Web, go to the Services view and select HBase.
2
Choose Service Actions > Stop.
3
Wait until HBase has stopped completely before continuing.
Checkpoint the Active NameNode
If HDFS has been in use after you enabled NameNode HA, but you wish to revert to a non-HA
state, you must checkpoint the HDFS state before proceeding with the rollback.
If the Enable NameNode HA wizard failed and you need to revert back, you can skip this step and
move on to Stop All Services.
•
If Kerberos security has not been enabled on the cluster:
On the Active NameNode host, execute the following commands to save the namespace. You
must be the HDFS service user to do this.
sudo su -l <HDFS_USER> -c 'hdfs dfsadmin -safemode enter'
sudo su -l <HDFS_USER> -c 'hdfs dfsadmin -saveNamespace'
•
If Kerberos security has been enabled on the cluster:
sudo su -l <HDFS_USER> -c 'kinit -kt /etc/security/keytabs/nn.service.keytab nn/<HOSTNAME>@<REALM>;hdfs dfsadmin -safemode enter'
sudo su -l <HDFS_USER> -c 'kinit -kt /etc/security/keytabs/nn.service.keytab nn/<HOSTNAME>@<REALM>;hdfs dfsadmin -saveNamespace'
Where <HDFS_USER> is the HDFS service user; for example hdfs, <HOSTNAME> is the Active
NameNode hostname, and <REALM> is your Kerberos realm.
Stop All Services
Browse to Ambari Web > Services, then choose Stop All in the Services navigation panel. You
must wait until all the services are completely stopped.
Prepare the Ambari Server Host for Rollback
Log into the Ambari server host and set the following environment variables to prepare for the
rollback procedure:
Variable | Value
export AMBARI_USER=AMBARI_USERNAME | Substitute the value of the administrative user for Ambari Web. The default value is admin.
export AMBARI_PW=AMBARI_PASSWORD | Substitute the value of the administrative password for Ambari Web. The default value is admin.
export AMBARI_PORT=AMBARI_PORT | Substitute the Ambari Web port. The default value is 8080.
export AMBARI_PROTO=AMBARI_PROTOCOL | Substitute the value of the protocol for connecting to Ambari Web. Options are http or https. The default value is http.
export CLUSTER_NAME=CLUSTER_NAME | Substitute the name of your cluster, set during the Ambari Install Wizard process. For example: mycluster.
export NAMENODE_HOSTNAME=NN_HOSTNAME | Substitute the FQDN of the host for the non-HA NameNode. For example: nn01.mycompany.com.
export ADDITIONAL_NAMENODE_HOSTNAME=ANN_HOSTNAME | Substitute the FQDN of the host for the additional NameNode in your HA setup.
export SECONDARY_NAMENODE_HOSTNAME=SNN_HOSTNAME | Substitute the FQDN of the host for the standby NameNode for the non-HA setup.
export JOURNALNODE1_HOSTNAME=JOUR1_HOSTNAME | Substitute the FQDN of the host for the first Journal Node.
export | Substitute the FQDN of the host for the second Journal Node.
export | Substitute the FQDN of the host for the third Journal Node.
Double check that these environment variables are set correctly.
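One way to double check the variables is a small shell function that reports any that are unset or empty. This is a minimal sketch, not an Ambari tool; the function name check_rollback_env is hypothetical. It covers only the variable names that appear in full in the table above, so the second and third JournalNode variables must be checked separately.

```shell
# Hypothetical helper: reports rollback variables that are unset or empty.
check_rollback_env() {
  local missing=0
  local name
  local value
  for name in AMBARI_USER AMBARI_PW AMBARI_PORT AMBARI_PROTO CLUSTER_NAME \
              NAMENODE_HOSTNAME ADDITIONAL_NAMENODE_HOSTNAME \
              SECONDARY_NAMENODE_HOSTNAME JOURNALNODE1_HOSTNAME; do
    # Indirect lookup of the variable named by $name (names come from the fixed list above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  # The protocol must be one of the two options listed in the table.
  case "${AMBARI_PROTO:-}" in
    http|https|'') ;;
    *) echo "AMBARI_PROTO must be http or https"; missing=1 ;;
  esac
  return "$missing"
}
```

Run check_rollback_env in the same shell session in which you exported the variables; it prints nothing and returns zero when every listed variable is set.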
Restore the HBase Configuration
If you have installed HBase, you may need to restore a configuration to its pre-HA state.
1
To check if your current HBase configuration needs to be restored, on the Ambari Server
host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hbase-site
Where the environment variables you set up in Prepare the Ambari Server Host for Rollback
substitute for the variable names.
Look for the configuration property hbase.rootdir. If the value is set to the NameService
ID you set up using the Enable NameNode HA wizard, you need to revert the hbase-site
configuration set up back to non-HA values. If it points instead to a specific NameNode host,
it does not need to be rolled back and you can go on to Delete ZooKeeper Failover
Controllers.
For example:
"hbase.rootdir":"hdfs://<name-service-id>:8020/apps/hbase/data"
The hbase.rootdir property points to the NameService ID and the value needs to be rolled
back
"hbase.rootdir":"hdfs://<nn01.mycompany.com>:8020/apps/hbase/data"
The hbase.rootdir property points to a specific NameNode host and not a NameService ID.
This does not need to be rolled back.
2
If you need to roll back the hbase.rootdir value, on the Ambari Server host, use the
configs.sh script to make the necessary change:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> set localhost <CLUSTER_NAME> hbase-site hbase.rootdir hdfs://<NAMENODE_HOSTNAME>:8020/apps/hbase/data
Where the environment variables you set up in Prepare the Ambari Server Host for Rollback
substitute for the variable names.
3
Verify that the hbase.rootdir property has been restored properly. On the Ambari Server
host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hbase-site
The hbase.rootdir property should now be set to the NameNode hostname, not the
NameService ID.
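The decision in step 1 (does hbase.rootdir point at the NameNode host or at a NameService ID?) can be sketched as a string comparison. This is a minimal sketch; the function names rootdir_authority and needs_rollback are hypothetical, and the check assumes an hdfs:// URI of the form shown in the examples above.

```shell
# Hypothetical helpers for inspecting an hbase.rootdir value.

# rootdir_authority URI: print the host (or NameService ID) part of an hdfs:// URI.
rootdir_authority() {
  local rest="${1#hdfs://}"     # drop the scheme
  rest="${rest%%/*}"            # drop the path
  printf '%s\n' "${rest%%:*}"   # drop an optional :port
}

# needs_rollback ROOTDIR NAMENODE_HOSTNAME
# Succeeds when ROOTDIR does not point at the given NameNode host,
# which is the case in which the value has to be rolled back.
needs_rollback() {
  [ "$(rootdir_authority "$1")" != "$2" ]
}
```

For example, needs_rollback "hdfs://mycluster:8020/apps/hbase/data" "nn01.mycompany.com" succeeds, because mycluster is not the NameNode host name.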
Delete ZooKeeper Failover Controllers
You may need to delete ZooKeeper (ZK) Failover Controllers.
1
To check if you need to delete ZK Failover Controllers, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=ZKFC
If this returns an empty items array, you may proceed to Modify HDFS Configuration.
Otherwise you must use the following DELETE commands:
2
To delete all ZK Failover Controllers, on the Ambari Server host:
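curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<NAMENODE_HOSTNAME>/host_components/ZKFC
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<ADDITIONAL_NAMENODE_HOSTNAME>/host_components/ZKFC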
3
Verify that the ZK Failover Controllers have been deleted. On the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=ZKFC
This command should return an empty items array.
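The REST calls in this procedure all share the same cluster-scoped base URL, so they can be sketched with two small shell functions. This is a minimal sketch, assuming the environment variables from Prepare the Ambari Server Host for Rollback are exported; the function names ambari_api_url and delete_host_component are hypothetical.

```shell
# Hypothetical helpers built on the rollback environment variables.

# ambari_api_url PATH: build a cluster-scoped REST URL.
ambari_api_url() {
  printf '%s://localhost:%s/api/v1/clusters/%s%s\n' \
    "$AMBARI_PROTO" "$AMBARI_PORT" "$CLUSTER_NAME" "$1"
}

# delete_host_component HOST COMPONENT: issue the DELETE call for one host component.
delete_host_component() {
  curl -u "$AMBARI_USER:$AMBARI_PW" -H "X-Requested-By: ambari" -i -X DELETE \
    "$(ambari_api_url "/hosts/$1/host_components/$2")"
}

# Usage (not executed here):
#   for host in "$NAMENODE_HOSTNAME" "$ADDITIONAL_NAMENODE_HOSTNAME"; do
#     delete_host_component "$host" ZKFC
#   done
```

Building the URL in one place keeps the check, delete, and verify calls consistent with each other.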
Modify HDFS Configurations
You may need to modify your hdfs-site configuration and/or your core-site configuration.
1
To check if you need to modify your hdfs-site configuration, on the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hdfs-site
If you see any of the following properties, you must delete them from your configuration.
•
dfs.nameservices
•
dfs.client.failover.proxy.provider.<NAMESERVICE_ID>
•
dfs.ha.namenodes.<NAMESERVICE_ID>
•
dfs.ha.fencing.methods
•
dfs.ha.automatic-failover.enabled
•
dfs.namenode.http-address.<NAMESERVICE_ID>.nn1
•
dfs.namenode.http-address.<NAMESERVICE_ID>.nn2
•
dfs.namenode.rpc-address.<NAMESERVICE_ID>.nn1
•
dfs.namenode.rpc-address.<NAMESERVICE_ID>.nn2
•
dfs.namenode.shared.edits.dir
•
dfs.journalnode.edits.dir
•
dfs.journalnode.http-address
•
dfs.journalnode.kerberos.internal.spnego.principal
•
dfs.journalnode.kerberos.principal
•
dfs.journalnode.keytab.file
Where <NAMESERVICE_ID> is the NameService ID you created when you ran the
Enable NameNode HA wizard.
2
To delete these properties, execute the following for each property you found. On the
Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> delete localhost <CLUSTER_NAME> hdfs-site property_name
Where you replace property_name with the name of each of the properties to be deleted.
3
Verify that all of the properties have been deleted. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hdfs-site
None of the properties listed above should be present.
4
To check if you need to modify your core-site configuration, on the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> core-site
5
If you see the property ha.zookeeper.quorum, it must be deleted. On the Ambari Server
host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> delete localhost <CLUSTER_NAME> core-site ha.zookeeper.quorum
6
If the property fs.defaultFS is set to the NameService ID, it must be reverted back to its
non-HA value. For example:
"fs.defaultFS":"hdfs://<name-service-id>"
The property fs.defaultFS needs to be modified as it points to a
NameService ID
"fs.defaultFS":"hdfs://<nn01.mycompany.com>"
The property fs.defaultFS does not need to be changed as it points to a specific
NameNode, not to a NameService ID
7
To revert the property fs.defaultFS to the NameNode host value, on the Ambari Server
host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> set localhost <CLUSTER_NAME> core-site fs.defaultFS hdfs://<NAMENODE_HOSTNAME>
8
Verify that the core-site properties are now properly set. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> core-site
The property fs.defaultFS should be set to point to the NameNode host and the property
ha.zookeeper.quorum should not be there.
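Because the hdfs-site cleanup in steps 1 through 3 repeats one delete command per property, it can be sketched as a dry run that prints the commands for review. This is a minimal sketch, assuming the rollback environment variables are exported; the function names ha_hdfs_site_properties and print_delete_commands are hypothetical. It lists every property named in step 1, so delete only those that actually appear in your hdfs-site output. The printed commands include the Ambari password, so avoid saving the output to a shared location.

```shell
# Hypothetical helpers: list the HA-only hdfs-site properties and print delete commands.

# ha_hdfs_site_properties NAMESERVICE_ID: one property name per line.
ha_hdfs_site_properties() {
  local ns="$1"
  cat <<EOF
dfs.nameservices
dfs.client.failover.proxy.provider.$ns
dfs.ha.namenodes.$ns
dfs.ha.fencing.methods
dfs.ha.automatic-failover.enabled
dfs.namenode.http-address.$ns.nn1
dfs.namenode.http-address.$ns.nn2
dfs.namenode.rpc-address.$ns.nn1
dfs.namenode.rpc-address.$ns.nn2
dfs.namenode.shared.edits.dir
dfs.journalnode.edits.dir
dfs.journalnode.http-address
dfs.journalnode.kerberos.internal.spnego.principal
dfs.journalnode.kerberos.principal
dfs.journalnode.keytab.file
EOF
}

# print_delete_commands NAMESERVICE_ID: dry run; prints the commands instead of running them.
print_delete_commands() {
  ha_hdfs_site_properties "$1" | while read -r prop; do
    printf '%s -u %s -p %s -port %s delete localhost %s hdfs-site %s\n' \
      /var/lib/ambari-server/resources/scripts/configs.sh \
      "$AMBARI_USER" "$AMBARI_PW" "$AMBARI_PORT" "$CLUSTER_NAME" "$prop"
  done
}
```

Printing the commands first lets you compare them against the properties you found before anything is deleted.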
Recreate the Standby NameNode
You may need to recreate your standby NameNode.
1
To check to see if you need to recreate the standby NameNode, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=SECONDARY_NAMENODE
If this returns an empty items array, you must recreate your standby NameNode. Otherwise
you can go on to Re-enable Standby NameNode.
2
Recreate your standby NameNode. On the Ambari Server host:
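curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"SECONDARY_NAMENODE"}}] }' <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts?Hosts/host_name=<SECONDARY_NAMENODE_HOSTNAME>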
3
Verify that the standby NameNode now exists. On the Ambari Server host:
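curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=SECONDARY_NAMENODE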
This should return a non-empty items array containing the standby NameNode.
Re-enable the Standby NameNode
To re-enable the standby NameNode, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X PUT -d '{"RequestInfo":{"context":"Enable Secondary NameNode"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<SECONDARY_NAMENODE_HOSTNAME>/host_components/SECONDARY_NAMENODE
•
If this returns 200, go to Delete All JournalNodes.
•
If this returns 202, wait a few minutes and run the following on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET "<AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=SECONDARY_NAMENODE&fields=HostRoles/state"
When "state" : "INSTALLED" is in the response, go on to the next step.
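The wait described for the 202 case can be sketched as a polling loop. This is a minimal sketch, assuming the rollback environment variables are exported; the function names extract_state and wait_for_installed, the 30-second interval, and the default of 20 attempts are hypothetical choices for illustration. The state value is extracted with sed rather than a JSON parser, which is adequate only for the simple response shape expected here.

```shell
# Hypothetical helpers for polling the SECONDARY_NAMENODE state.

# extract_state: read a JSON response on stdin and print the first "state" value found.
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# wait_for_installed [ATTEMPTS]: poll until the component reports INSTALLED.
wait_for_installed() {
  local attempts="${1:-20}"
  local state
  local url="$AMBARI_PROTO://localhost:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/host_components?HostRoles/component_name=SECONDARY_NAMENODE&fields=HostRoles/state"
  while [ "$attempts" -gt 0 ]; do
    state=$(curl -s -u "$AMBARI_USER:$AMBARI_PW" -H "X-Requested-By: ambari" "$url" | extract_state)
    if [ "$state" = "INSTALLED" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 30
  done
  return 1   # gave up without seeing INSTALLED
}
```

Call wait_for_installed after the re-enable request returns 202, then continue with Delete All JournalNodes once it succeeds.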
Delete All JournalNodes
You may need to delete any JournalNodes.
1
To check to see if you need to delete JournalNodes, on the Ambari Server host:
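curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=JOURNALNODE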
If this returns an empty items array, you can go on to Delete the Additional NameNode.
Otherwise you must delete the JournalNodes.
2
To delete the JournalNodes, on the Ambari Server host:
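curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<JOURNALNODE1_HOSTNAME>/host_components/JOURNALNODE
Repeat this command for the second and third JournalNode hosts.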
3
Verify that all the JournalNodes have been deleted. On the Ambari Server host:
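curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=JOURNALNODE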
This should return an empty items array.
Delete the Additional NameNode
You may need to delete your Additional NameNode.
1
To check to see if you need to delete your Additional NameNode, on the Ambari Server host:
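curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=NAMENODE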
If the items array contains two NameNodes, the Additional NameNode must be deleted.
2
To delete the Additional NameNode that was set up for HA, on the Ambari Server host:
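curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<ADDITIONAL_NAMENODE_HOSTNAME>/host_components/NAMENODE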
3
Verify that the Additional NameNode has been deleted:
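curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=NAMENODE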
This should return an items array that shows only one NameNode.
Verify the HDFS Components
Make sure you have the correct components showing in HDFS.
1
Go to Ambari Web UI > Services, then select HDFS.
2
Check the Summary panel and make sure that the first three lines look like this:
•
NameNode
•
SNameNode
•
DataNodes
You should not see any line for JournalNodes.
Start HDFS
1
In the Ambari Web UI, select Service Actions, then choose Start.
Wait until the progress bar shows that the service has completely started and has passed the
service checks.
If HDFS does not start, you may need to repeat the previous step.
2
To start all of the other services, select Actions > Start All in the Services navigation
panel.
Resource Manager High Availability
This feature is available with HDP Stack 2.2 or later.
The following topic explains How to Configure ResourceManager High Availability.
How to Configure ResourceManager High Availability
1
Check to make sure you have at least three hosts in your cluster and are running at least
three ZooKeeper servers.
2
In Ambari Web, browse to Services > YARN > Summary. Select Service Actions and
choose Enable ResourceManager HA.
3
The Enable ResourceManager HA Wizard launches. The wizard describes a set of automated
and manual steps you must take to set up ResourceManager High Availability.
4
Get Started: This step gives you an overview of enabling ResourceManager HA. Click Next
to proceed.
5
Select Host: The wizard shows you the host on which the current ResourceManager is
installed and suggests a default host on which to install an additional ResourceManager.
Accept the default selection, or choose an available host. Click Next to proceed.
6
Review Selections: The wizard shows you the host selections and configuration changes
that will occur to enable ResourceManager HA. Expand YARN, if necessary, to review all the
YARN configuration changes. Click Next to approve the changes and start automatically
configuring ResourceManager HA.
7
Configure Components: The wizard configures your components automatically,
displaying progress bars to let you track the steps. After all progress bars complete, click
Complete to finish the wizard.
HBase High Availability
During the HBase service install, depending on your component assignment, Ambari installs and
configures one HBase Master component and multiple RegionServer components. To set up high
availability for the HBase service, you can run two or more HBase Master components by adding an
HBase Master component. When two or more HBase Masters are running, HBase uses ZooKeeper to
coordinate the active Master.
Adding an HBase Master Component
1
In Ambari Web, browse to Services > HBase.
2
In Service Actions, select the + Add HBase Master option.
3
Choose the host to install the additional HBase Master, then choose Confirm Add.
Ambari installs the new HBase Master and reconfigures HBase to handle multiple Master instances.
Hive High Availability
The Hive service has multiple, associated components. The primary Hive components are Hive
Metastore and HiveServer2. To set up high availability for the Hive service, you can run two or more of
each of those components.
This feature is available with HDP 2.2 Stack.
The relational database that backs the Hive Metastore itself should also be made
highly available using best practices defined for the database system in use.
Adding a Hive Metastore Component
1
In Ambari Web, browse to Services > Hive.
2
In Service Actions, select the + Add Hive Metastore option.
3
Choose the host to install the additional Hive Metastore, then choose Confirm Add.
4
Ambari installs the component and reconfigures Hive to handle multiple Hive Metastore
instances.
Adding a HiveServer2 Component
1
In Ambari Web, browse to the host where you would like to install another HiveServer2.
2
On the Host page, choose +Add.
3
Select HiveServer2 from the list.
4
Ambari installs the new HiveServer2.
Ambari installs the component and reconfigures Hive to handle multiple Hive Metastore instances.
Oozie High Availability
To set up high availability for the Oozie service, you can run two or more instances of the Oozie
Server component.
This capability is available with HDP 2.2 Stack.
The relational database that backs the Oozie Server should also be made highly
available using best practices defined for the database system in use. The
default installed Derby database instance is not supported with multiple Oozie Server
instances; therefore, you must use an existing relational database. When using
Derby for the Oozie Server, you will not have an option to add Oozie Server
components to your cluster.
High availability for Oozie requires the use of an external Virtual IP Address or Load
Balancer to direct traffic to the Oozie servers.
Adding an Oozie Server Component
1
In Ambari Web, browse to the host where you would like to install another Oozie Server.
2
On the Host page, click the “+Add” button.
3
Select “Oozie Server” from the list and Ambari will install the new Oozie Server.
4
After configuring your external Load Balancer, update the Oozie configuration.
5
Browse to Services > Oozie > Configs and in oozie-site add the following:
Property | Value
oozie.zookeeper.connection.string | List of ZooKeeper hosts with ports. For example: c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181,c6403.ambari.apache.org:2181
oozie.services.ext | org.apache.oozie.service.ZKLocksService,org.apache.oozie.service.ZKXLogStreamingService,org.apache.oozie.service.ZKJobsConcurrencyService
oozie.base.url | http://<loadbalancer.hostname>:11000/oozie
6
In oozie-env, uncomment OOZIE_BASE_URL property and change value to point to the Load
Balancer. For example:
export OOZIE_BASE_URL="http://<loadbalancer.hostname>:11000/oozie"
7
Restart Oozie service for the changes to take effect.
8
Update HDFS configs for the Oozie proxy user. Browse to Services > HDFS > Configs and in
core-site update the hadoop.proxyuser.oozie.hosts property to include the newly added
Oozie Server host. Hosts should be comma separated.
9
Restart all needed services.
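The update to hadoop.proxyuser.oozie.hosts in step 8 amounts to appending a host to a comma-separated list. This is a minimal sketch of that edit; the function name add_proxy_host is hypothetical, and the special handling of a value of * (all hosts) is an assumption about how you would want a wildcard treated. The function only computes the new value, which you then enter in core-site through Ambari Web.

```shell
# Hypothetical helper: compute the new hadoop.proxyuser.oozie.hosts value.
# Usage: add_proxy_host CURRENT_LIST NEW_HOST
add_proxy_host() {
  local current="$1"
  local new_host="$2"
  case ",$current," in
    ",*,")           printf '%s\n' "$current" ;;            # wildcard already covers every host (assumption)
    *",$new_host,"*) printf '%s\n' "$current" ;;            # host already listed
    ",,")            printf '%s\n' "$new_host" ;;           # empty list
    *)               printf '%s\n' "$current,$new_host" ;;  # append, comma separated
  esac
}
```

For example, add_proxy_host "c6401.ambari.apache.org" "c6402.ambari.apache.org" prints both hosts separated by a comma.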
Managing Configurations
Use Ambari Web to manage your HDP component configurations. Select any of the following topics:
•
Configuring Services
•
Using Host Config Groups
•
Customizing Log Settings
•
Downloading Client Configs
•
Service Configuration Versions
Configuring Services
Select a service, then select Configs to view and update configuration properties for the selected
service. For example, select MapReduce2, then select Configs. Expand a config category to view
configurable service properties.
Updating Service Properties
1
Expand a configuration category.
2
Edit values for one or more properties that have the Override option.
Edited values, also called stale configs, show an Undo option.
3
Choose Save.
Restarting components
After editing and saving a service configuration, Restart indicates components that you must restart.
Select the Components or Hosts links to view details about components or hosts requiring a restart.
Then, choose an option appearing in Restart. For example, options to restart YARN components
include:
Using Host Config Groups
Ambari initially assigns all hosts in your cluster to one, default configuration group for each service
you install. For example, after deploying a three-node cluster with default configuration settings,
each host belongs to one configuration group that has default configuration settings for the HDFS
service. In Configs, select Manage Config Groups, to create new groups, re-assign hosts, and
override default settings for host components you assign to each group.
To create a Configuration Group:
1
Choose Add New Configuration Group.
2
Name and describe the group, then choose Save.
3
Select a Config Group, then choose Add Hosts to Config Group.
4
Select Components and choose from available Hosts to add hosts to the new group.
Select Configuration Group Hosts enforces host membership in each group, based on
installed components for the selected service.
5
Choose OK.
6
In Manage Configuration Groups, choose Save.
To edit settings for a configuration group:
1
In Configs, choose a Group.
2
Select a Config Group, then expand components to expose settings that allow Override.
3
Provide a non-default value, then choose Override or Save.
Configuration groups enforce configuration properties that allow override, based on installed
components for the selected service and group.
4
Override prompts you to choose one of the following options: select an existing configuration
group (to which the property value override provided in step 3 will apply), or create a new
configuration group (which will include default properties, plus the property override provided
in step 3). Then, choose OK.
5
In Configs, choose Save.
Customizing Log Settings
Ambari Web displays default logging properties in Service Configs > Custom log4j
Properties. Log4j properties control logging activities for the selected service.
Restarting components in the service pushes the configuration properties displayed in Custom log4j
Properties to each host running components for that service. If you have customized logging
properties that define how activities for each service are logged, you will see refresh indicators next
to each service name after upgrading to Ambari 1.5.0 or higher. Make sure that logging properties
displayed in Custom log4j Properties include any customization. Optionally, you can create
configuration groups that include custom logging properties. For more information about saving and
overriding configuration settings, see Editing Service Config Properties.
Downloading Client Configs
For Services that include client components (for example Hadoop Client or Hive Client), you can
download the client configuration files associated with that client from Ambari.
•
In Ambari Web, browse to the Service with the client for which you want the configurations.
•
Choose Service Actions.
•
Choose Download Client Configs. You are prompted for a location to save the client
configs bundle.
•
Save the bundle.
Service Configuration Versions
Ambari provides the ability to manage configurations associated with a Service. You can make
changes to configurations, see a history of changes, compare and revert changes, and push
configuration changes to the cluster hosts.
•
Basic Concepts
•
Terminology
•
Saving a Change
•
Viewing History
•
Comparing Versions
•
Reverting a Change
•
Versioning and Host Config Groups
Basic Concepts
It’s important to understand how service configurations are organized and stored in Ambari.
Properties are grouped into Configuration Types (config types). A set of config types makes up the
set of configurations for a service.
For example, the HDFS Service includes the following config types: hdfs-site, core-site, hdfs-log4j,
hadoop-env, hadoop-policy. If you browse to Services > HDFS > Configs, the configuration
properties for these config types are available for edit.
Versioning of configurations is performed at the service-level. Therefore, when you modify a
configuration property in a service, Ambari will create a Service Config Version. The figure below
shows V1 and V2 of a Service Configuration Version with a change to a property in Config Type A.
After making the property change to Config Type A in V1, V2 is created.
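The service-level versioning described above can be pictured with a small Python model: changing one property in one config type records a new version of the whole service configuration. The ServiceConfigHistory class and the property values are hypothetical and only illustrate the concept; they are not Ambari's implementation.

```python
import copy


class ServiceConfigHistory:
    """Hypothetical model of service-level configuration versioning."""

    def __init__(self, service, config_types):
        self.service = service
        # V1 is the initial snapshot of every config type in the service.
        self.versions = [copy.deepcopy(config_types)]

    @property
    def current_version(self):
        return len(self.versions)  # versions are numbered V1, V2, ...

    def save_change(self, config_type, prop, value):
        """Change one property and record a new service config version."""
        snapshot = copy.deepcopy(self.versions[-1])
        snapshot[config_type][prop] = value
        self.versions.append(snapshot)
        return self.current_version


history = ServiceConfigHistory(
    "HDFS",
    {"hdfs-site": {"dfs.replication": "3"}, "core-site": {"fs.trash.interval": "360"}},
)
new_version = history.save_change("hdfs-site", "dfs.replication", "2")
```

In this model V2 carries the changed hdfs-site value together with an unchanged copy of core-site, while V1 remains available in the history.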
Terminology
The following table lists configuration versioning terms and concepts that you should know.
Term | Description
Configuration Property | Configuration property managed by Ambari, such as NameNode heapsize or replication factor.
Configuration Type (Config Type) | Group of configuration properties. For example: hdfs-site is a Config Type.
Service Configurations | Set of configuration types for a particular service. For example: hdfs-site and core-site Config Types are part of the HDFS Service Configuration.
Change Notes | Optional notes to save with a service configuration change.
Service Config Version (SCV) | Particular version of configurations for a specific service. Ambari saves a history of service configuration versions.
Host Config Group (HCG) | Set of configuration properties to apply to a specific set of hosts. Each service has a default Host Config Group, and custom config groups can be created on top of the default configuration group to target property overrides to one or more hosts in the cluster. See Managing Configuration Groups for more information.
Saving a Change
1
Make the configuration property change.
2
Choose Save.
3
You are prompted to enter notes that describe the change.
4
Click Save to confirm your change.
To revert the changes you made and not save, choose Discard.
To return to the configuration page and continue editing without saving changes, choose
Cancel.
Viewing History
Service Config Version history is available from Ambari Web in two places: On the Dashboard page
under the Config History tab; and on each Service page under the Configs tab.
The Dashboard > Config History tab shows a list of all versions across services with each
version number and the date and time the version was created. You can also see which user
authored the change with the notes entered during save. Using this table, you can filter, sort and
search across versions.
The most recent configuration changes are shown on the Service > Configs tab. Users can
navigate the version scrollbar left-right to see earlier versions. This provides a quick way to access
the most recent changes to a service configuration.
Click on any version in the scrollbar to view it, and hover to display an option menu that allows you
to compare versions and perform a revert. Performing a revert makes any config version that you select
the current version.
Comparing Versions
When navigating the version scroll area on the Services > Configs tab, you can hover over a
version to display options to view, compare or revert.
To perform a compare between two service configuration versions:
1
Navigate to a specific configuration version. For example “V6”.
2
Using the version scrollbar, find the version you would like to compare against “V6”. For
example, if you want to compare V6 to V2, find V2 in the scrollbar.
3
Hover over the version to display the option menu. Click “Compare”.
4
Ambari displays a comparison of V6 to V2, with an option to revert to V2.
5
Ambari also filters the display by only “Changed properties”. This option is available under
the Filter control.
Reverting a Change
You can revert to an older service configuration version by using the “Make Current” feature.
“Make Current” creates a new service configuration version with the configuration
properties from the version you are reverting to; it is effectively a “clone”. After initiating the Make
Current operation, you are prompted to enter notes for the new version (i.e. the clone) and save. The
notes text will include text about the version being cloned.
There are multiple methods to revert to a previous configuration version:
•
View a specific version and click the “Make V* Current” button.
•
Use the version navigation dropdown and click the “Make Current” button.
•
Hover on a version in the version scrollbar and click the “Make Current” button.
•
Perform a comparison and click the “Make V* Current” button.
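The clone behavior of “Make Current” can be sketched in Python as follows. The make_current function, the dictionary layout, and the wording of the generated note are hypothetical; the sketch only shows that a revert appends a new version instead of rewriting history.

```python
def make_current(versions, target):
    """Revert by appending a clone of version `target` (1-based) as the newest version."""
    source = versions[target - 1]
    clone = {
        "properties": dict(source["properties"]),
        # Hypothetical note text; the real wording is chosen by Ambari and the user.
        "notes": f"Created from service config version V{target}",
    }
    versions.append(clone)
    return len(versions)  # number of the new current version


versions = [
    {"properties": {"dfs.replication": "3"}, "notes": "initial"},
    {"properties": {"dfs.replication": "2"}, "notes": "lower replication"},
]
current = make_current(versions, 1)
```

After the call, V3 holds the same properties as V1, and V1 and V2 are still present in the history.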
Versioning and Host Config Groups
Service configuration versions are scoped to a host config group. For example, changes made in the
default group can be compared and reverted in that config group. The same applies to custom config groups.
The following example describes a flow where you have multiple host config groups and create
service configuration versions in each config group.
Administering the Cluster
From the cluster dashboard, use the Admin options to view information about Managing Stack and
Versions, Service Accounts, and to Enable Kerberos security.
For more information about administering your Ambari Server, see the Ambari
Managing Stack and Versions
The Stack section includes information about the Services installed and available in the cluster
Stack. Browse the list of Services and click Add Service to start the wizard to install Services into
your cluster.
The Versions section shows what version of software is currently running and installed in the
cluster. This section also exposes the capability to perform an automated cluster upgrade for
maintenance and patch releases for the Stack. This capability is available for HDP 2.2 Stack only. If
you have a cluster running HDP 2.2, you can perform Stack upgrades to later maintenance and patch
releases. For example: you can upgrade from the GA release of HDP 2.2 (which is HDP 2.2.0.0) to the
first maintenance release of HDP 2.2 (which is HDP 2.2.4.2).
For more details on upgrading from HDP 2.2.0.0 to the latest HDP 2.2 maintenance
release, see the Ambari Upgrade Guide.
The process for managing versions and performing an upgrade consists of three main steps:
1
Register a Version into Ambari
2
Install the Version into the Cluster
3
Perform Upgrade to the New Version
Register a Version
Ambari can manage multiple versions of Stack software.
To register a new version:
1
On the Versions tab, click Manage Versions.
2
Proceed to register a new version by clicking + Register Version.
3
Enter a two-digit version number. For example, enter 4.2 (which makes the version HDP 2.2.4.2).
4
Select one or more OS families and enter the respective Base URLs.
5
Click Save.
6
You can click “Install On...” or you can browse back to Admin > Stack and Versions >
Versions tab. You will see the version currently running and the version you just registered.
Proceed to Install the Version.
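The relationship between the two-digit number entered during registration and the resulting full version can be sketched as below. The full_stack_version helper and its validation rule are hypothetical and are not part of Ambari.

```python
import re


def full_stack_version(stack_version, suffix):
    """Join a stack version such as "2.2" with a two-digit suffix such as "4.2"."""
    if not re.fullmatch(r"\d+\.\d+", suffix):
        raise ValueError("expected a two-digit version number such as 4.2")
    return f"{stack_version}.{suffix}"


registered = full_stack_version("2.2", "4.2")
```

With the example values from the steps above, the stack version 2.2 and the suffix 4.2 combine into 2.2.4.2.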
Install the Version
To install a version in the cluster:
1
On the Versions tab, click Install Packages.
2
Click OK to confirm.
3
The Install version operation will start and the new version will be installed on all hosts.
4
You can browse to Hosts and to each Host > Versions tab to see the new version is
installed. Proceed to Perform Upgrade.
Perform Upgrade
Once your target version has been registered into Ambari and installed on all hosts in the cluster, and you
meet the Prerequisites, you are ready to perform an upgrade.
The perform upgrade process switches over the services in the cluster to a new version in a rolling
fashion. The process follows the flow below, starting with ZooKeeper and the Core Master
components and ending with a Finalize step. To ensure the process runs smoothly, this process includes
some manual prompts for you to perform cluster verification and testing along the way. You will be
prompted when your input is required.
This process can take some time to complete. You should validate the upgrade
process in a dev/test environment prior to performing it in production, and plan a
block of time to monitor the progress. As always, be sure to perform backups of
your service metadata (you will be prompted during the first stages of the upgrade
process).
Upgrade Prerequisites
To perform an automated cluster upgrade from Ambari, your cluster must meet the following
prerequisites:
Item | Requirement | Description
Cluster | Stack Version | Must be running HDP 2.2 Stack. This capability is not available for HDP 2.0 or 2.1 Stacks.
Version | New Version | All hosts must have the new version installed.
HDFS | NameNode HA | NameNode HA must be enabled and working properly. See Configuring NameNode High Availability in the Ambari User’s Guide for more information.
HDFS | Decommission | No components should be in decommissioning or decommissioned state.
YARN | YARN WPR | Work Preserving Restart must be configured.
Hosts | Heartbeats | All Ambari Agents must be heartbeating to Ambari Server. Any hosts that are not heartbeating must be in Maintenance Mode.
Hosts | Maintenance Mode | Any hosts in Maintenance Mode must not be hosting any Service master components.
Services | Services Started | All Services must be started.
Services | Maintenance Mode | No Services can be in Maintenance Mode.
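The prerequisites above can be read as a checklist. The following Python sketch evaluates them against a hand-written cluster summary; the dictionary keys and the unmet_prerequisites function are hypothetical and do not correspond to an Ambari API. Ambari performs its own checks when you start the upgrade.

```python
def unmet_prerequisites(cluster):
    """Return the names of upgrade prerequisites that the cluster summary fails."""
    hosts = cluster["hosts"]
    services = cluster["services"]
    checks = {
        "Stack Version": cluster["stack"] == "HDP 2.2",
        "New Version": all(h["new_version_installed"] for h in hosts),
        "NameNode HA": cluster["namenode_ha_enabled"],
        "Decommission": not cluster["components_decommissioning"],
        "YARN WPR": cluster["yarn_work_preserving_restart"],
        # Hosts that are not heartbeating must be in Maintenance Mode.
        "Heartbeats": all(h["heartbeating"] or h["maintenance_mode"] for h in hosts),
        # Hosts in Maintenance Mode must not host master components.
        "Host Maintenance Mode": not any(
            h["maintenance_mode"] and h["has_master_component"] for h in hosts
        ),
        "Services Started": all(s["started"] for s in services),
        "Service Maintenance Mode": not any(s["maintenance_mode"] for s in services),
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative cluster summary: the second host has stopped heartbeating.
cluster = {
    "stack": "HDP 2.2",
    "namenode_ha_enabled": True,
    "components_decommissioning": False,
    "yarn_work_preserving_restart": True,
    "hosts": [
        {"new_version_installed": True, "heartbeating": True,
         "maintenance_mode": False, "has_master_component": True},
        {"new_version_installed": True, "heartbeating": False,
         "maintenance_mode": False, "has_master_component": False},
    ],
    "services": [{"started": True, "maintenance_mode": False}],
}
problems = unmet_prerequisites(cluster)
```

In this example only the Heartbeats requirement fails; placing the second host in Maintenance Mode would clear it, because that host runs no master components.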
To perform an upgrade to a new version:
1
On the Versions tab, click Perform Upgrade on the new version.
2
Follow the steps on the wizard.
Service Accounts
To view the list of users and groups used by the cluster services, choose Admin > Service
Accounts.
Kerberos
If Kerberos has not been enabled in your cluster, click the Enable Kerberos button to launch the
Kerberos wizard. For more information on configuring Kerberos in your cluster, see the Ambari
Security Guide. Once Kerberos is enabled, you can:
•
Regenerate Keytabs
•
Disable Kerberos
How To Regenerate Keytabs
1
Browse to Admin > Kerberos.
2
Click the Regenerate Kerberos button.
3
Confirm your selection to proceed.
4
Optionally, you can regenerate keytabs for only those hosts that are missing keytabs. For
example, hosts that were not online/available from Ambari when enabling Kerberos.
5
Once you confirm, Ambari will connect to the KDC and regenerate the keytabs for the Service
and Ambari principals in the cluster.
6
Once complete, you must restart all services for the new keytabs to be used.
Ambari requires the Kerberos Admin credentials in order to regenerate the keytabs. If
the credentials are not available to Ambari, you will be prompted to enter the KDC
Admin username and password. For more information on configuring Kerberos in your
cluster, see the Ambari Security Guide.
How To Disable Kerberos
1
Browse to Admin > Kerberos.
2
Click the Disable Kerberos button.
3
Confirm your selection to proceed. Cluster services will be stopped and the Ambari Kerberos
security settings will be reset.
4
To re-enable Kerberos, click Enable Kerberos and follow the wizard steps. For more
information on configuring Kerberos in your cluster, see the Ambari Security Guide.
Monitoring and Alerts
Ambari monitors cluster health and can alert you in the case of certain situations to help you identify
and troubleshoot problems. You manage how alerts are organized, under which conditions
notifications are sent, and by which method. This section provides information on:
•
Managing Alerts
•
Configuring Notifications
•
List of Predefined Alerts
Managing Alerts
Ambari predefines a set of alerts that monitor the cluster components and hosts. Each alert is
defined by an Alert Definition, which specifies the checking interval and thresholds (which are
dependent on the Alert Type). When a cluster is created or modified, Ambari reads the Alert
Definitions and creates Alert Instances for the specific components to watch.
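The relationship between Alert Definitions and Alert Instances can be sketched in Python as follows. The dictionary fields, component names, and host names are hypothetical; the sketch only shows one instance being created per host that runs the watched component, and none for a disabled definition.

```python
def create_alert_instances(definitions, component_hosts):
    """Expand each enabled alert definition into one instance per host running its component."""
    instances = []
    for definition in definitions:
        if not definition["enabled"]:
            continue  # disabled definitions produce no instances
        for host in component_hosts.get(definition["component"], []):
            instances.append({"definition": definition["name"], "host": host})
    return instances


# Illustrative data only.
definitions = [
    {"name": "DataNode process", "component": "DATANODE", "enabled": True},
    {"name": "NameNode Web UI", "component": "NAMENODE", "enabled": False},
]
component_hosts = {
    "DATANODE": ["dn1.example.com", "dn2.example.com", "dn3.example.com"],
    "NAMENODE": ["nn1.example.com"],
}
instances = create_alert_instances(definitions, component_hosts)
```

With three DataNodes, the DataNode process definition yields three instances, while the disabled NameNode Web UI definition yields none.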
Terms and Definitions
The following basic terms help describe the key concepts associated with Ambari Alerts:
Term | Definition
Alert Definition | Defines the alert including the description, check interval, type and thresholds.
Type | The type of alert, such as PORT or METRIC.
State | Indicates the state of an alert definition: enabled or disabled. When disabled, no alert instances are created.
Alert Instance | Represents the specific alert instances based on an alert definition. For example, the alert definition for DataNode process will have an alert instance per DataNode in the cluster.
Status | An alert instance status is defined by severity. The most common severity levels are OK, WARN, CRIT but there are also severities for UNKNOWN and NONE. See “Alert Instances” for more information.
Threshold | The thresholds assigned to each status.
Alert Group | Grouping of alert definitions, useful for handling notifications targets.
Notification | A notification target for when an alert instance status changes. Methods of notification include EMAIL and SNMP.
Table 7. Terminology
Alert Definitions and Instances
An Alert Definition includes name, description and check interval, as well as configurable thresholds
for each status (depending on the Alert Type).
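How configurable thresholds translate into a status can be sketched as below. The evaluate_status function and the example threshold values are hypothetical, and returning UNKNOWN when no measurement is available is an assumption about how that severity is used; Ambari's actual evaluation depends on the Alert Type.

```python
def evaluate_status(value, warn_threshold, crit_threshold):
    """Map a measured value to a status using two ascending thresholds."""
    if value is None:
        return "UNKNOWN"  # assumption: no measurement could be collected
    if value > crit_threshold:
        return "CRIT"
    if value > warn_threshold:
        return "WARN"
    return "OK"


# Hypothetical latency thresholds, in seconds.
statuses = [evaluate_status(v, 3.0, 5.0) for v in (1.0, 4.0, 6.0, None)]
```

Raising or lowering the two thresholds in the alert definition changes which measurements are reported as WARN or CRIT.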
The following table lists the types of alerts, their possible status and if the thresholds are
configurable:
Type | Description | Status | Thresholds Configurable | Units
PORT | Watches a port based on a configuration property as the URI. Example: Hive Metastore Process | OK, WARN, CRIT | Yes | seconds
METRIC | Watches a metric based on a configuration property. Example: ResourceManager RPC Latency | OK, WARN, CRIT | Yes | variable
AGGREGATE | Aggregate of status for another alert definition. Example: percentage NodeManagers Available | OK, WARN, CRIT | Yes | percentage
WEB | Watches a Web UI and adjusts status based on response. Example: App Timeline Web UI | OK, WARN, CRIT | No | n/a
SCRIPT | Uses a custom script to handle checking. Example: NodeManager Health Summary | OK, CRIT | No | n/a
Table 8. Alert Types
How To Change an Alert
1
Browse to the Alerts section in Ambari Web.
2
Find the alert definition to modify and click to view the definition details.
3
Click to Edit the description, check interval or thresholds.
4
Changes will take effect on all alert instances at the next interval check.
How To View a List of Alert Instances
1
Browse to the Alerts section in Ambari Web.
2
Find the alert definition and click to view the definition details.
3
The list of alert instances is shown.
4
Alternatively, you can browse to a specific host via the Hosts section of Ambari Web to view
the list of alert instances specific to that host.
How To Enable or Disable an Alert
1
Browse to the Alerts section in Ambari Web.
2
Find the alert definition. Click to enable/disable.
3
Alternatively, you can click to view the definition details and click to enable/disable.
4
When disabled, no alert instances are in effect, therefore no alerts will be reported or
dispatched for the alert definition.
Configuring Notifications
With Alert Groups and Notifications, you can create groups of alerts and set up notification targets for
each group. This way, you can notify different parties interested in certain sets of alerts via different
methods. For example, you might want your Hadoop Operations team to receive all alerts via EMAIL,
regardless of status, while your System Administration team receives only the Critical RPC
and CPU related alerts via SNMP. To achieve this scenario, you would have an
Alert Notification that handles EMAIL for all alert groups for all severity levels, and you would have a
different Alert Notification that handles SNMP on critical severity for an Alert Group that
contains the RPC and CPU alerts.
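The scenario above can be modeled with a short Python sketch that matches an alert status change against notification targets by group and severity. The targets_for function, the notification names, and the group name "RPC and CPU" are hypothetical and only illustrate the matching logic, not how Ambari dispatches notifications.

```python
def targets_for(alert, notifications):
    """Return the names of notifications that should fire for an alert status change."""
    matched = []
    for n in notifications:
        group_ok = n["groups"] == "all" or alert["group"] in n["groups"]
        severity_ok = alert["status"] in n["severities"]
        if group_ok and severity_ok:
            matched.append(n["name"])
    return matched


# Illustrative notification targets for the two teams in the example.
notifications = [
    {"name": "hadoop-ops-email", "method": "EMAIL", "groups": "all",
     "severities": {"OK", "WARN", "CRIT", "UNKNOWN"}},
    {"name": "sysadmin-snmp", "method": "SNMP", "groups": {"RPC and CPU"},
     "severities": {"CRIT"}},
]

critical_rpc = targets_for({"group": "RPC and CPU", "status": "CRIT"}, notifications)
warning_rpc = targets_for({"group": "RPC and CPU", "status": "WARN"}, notifications)
```

A critical RPC alert reaches both teams, while a warning on the same alert reaches only the Hadoop Operations email target.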
Ambari defines a set of default Alert Groups for each service installed in the cluster. For example, you
will see a group for HDFS Default. These groups cannot be deleted and the alerts in these groups are
not modifiable. If you choose not to use these groups, just do not set a notification target for them.
Creating or Editing Notifications
1
Browse to the Alerts section in Ambari Web.
2
Under the Actions menu, click Manage Notifications.
3
The list of existing notifications is shown.
4
Click + to “Create new Alert Notification”. The Create Alert Notification dialog is displayed.
5
Enter the notification name, select the groups the notification should be assigned to (all or a
specific set), select the Severity levels that this notification responds to, include a
description, and choose the method for notification (EMAIL or SNMP).
•
For EMAIL: you will need to provide information about your SMTP infrastructure such
as SMTP Server, Port, To/From address and if authentication is required to relay
messages through the server. You can add custom properties to the SMTP
configuration based on the Javamail SMTP options.
•
For SNMP: you will need to select the SNMP version, OIDs, community and port.
6
After completing the notification, click Save.
Creating or Editing Alert Groups
1
Browse to the Alerts section in Ambari Web.
2
From the Actions menu, choose Manage Alert Groups.
3
The list of existing groups (default and custom) is shown.
4
Choose + to “Create Alert Group”. Enter a name for the group and click Save.
5
By clicking on the custom group in the list, you can add or delete alert definitions from this
group, and change the notification targets for the group.
List of Predefined Alerts
•
HDFS Service Alerts
•
NameNode HA Alerts
•
YARN Alerts
•
MapReduce2 Alerts
•
HBase Service Alerts
•
Hive Alerts
•
Oozie Alerts
•
ZooKeeper Alerts
•
Ambari Alerts
HDFS Service Alerts
Alert: NameNode Blocks health
Description: This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold.
Potential Causes: Some DataNodes are down and the replicas that are missing blocks are only on those DataNodes. The corrupt/missing blocks are from files with a replication factor of 1. New replicas cannot be created because the only replica of the block is missing.
Possible Remedies: For critical data, use a replication factor of 3. Bring up the failed DataNodes with missing or corrupt blocks. Identify the files associated with the missing or corrupt blocks by running the Hadoop fsck command. Delete the corrupt files and recover them from backup, if it exists.
Alert: NameNode process
Description: This host-level alert is triggered if the NameNode process cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.
Potential Causes: The NameNode process is down on the HDFS master host. The NameNode process is up and running but not listening on the correct network port (default 8201).
Possible Remedies: Check for any errors in the logs (/var/log/hadoop/hdfs/) and restart the NameNode host/process using the HMC Manage Services tab. Run the netstat -tuplpn command to check if the NameNode process is bound to the correct network port.
Alert: DataNode Storage
Description: Triggered if storage capacity is full on the DataNode (90% critical). It checks the DataNode JMX Servlet for the Capacity and Remaining properties.
Potential Causes: Cluster storage is full. If cluster storage is not full, DataNode is full.
Possible Remedies: If cluster still has storage, use Balancer to distribute the data to relatively less-used DataNodes. If the cluster is full, delete unnecessary data or add additional storage by adding either more DataNodes or more or larger disks to the DataNodes. After adding more storage, run Balancer.
Alert: DataNode process
Description: Triggered if the individual DataNode processes cannot be established to be up and listening on the network for the configured critical threshold, given in seconds.
Potential Causes: The DataNode process is down or not responding. The DataNode is not down but is not listening to the correct network port/address.
Possible Remedies: Check for dead DataNodes in Ambari Web. Check for any errors in the DataNode logs (/var/log/hadoop/hdfs) and restart the DataNode, if necessary. Run the netstat -tuplpn command to check if the DataNode process is bound to the correct network port.
Alert: DataNode Web UI
Description: Triggered if the DataNode Web UI is unreachable.
Potential Causes: The DataNode process is not running.
Possible Remedies: Check whether the DataNode process is running.
Alert: NameNode host CPU utilization
Description: Triggered if CPU utilization of the NameNode exceeds certain thresholds (200% warning, 250% critical). It checks the NameNode JMX Servlet for the SystemCPULoad property. This information is only available if you are running JDK 1.7.
Potential Causes: Unusually high CPU utilization. This can be caused by a very unusual job/query workload, but it is generally the sign of an issue in the daemon.
Possible Remedies: Use the top command to determine which processes are consuming excess CPU. Reset the offending process.
Alert: NameNode Web UI
Description: Triggered if the NameNode Web UI is unreachable.
Potential Causes: The NameNode process is not running.
Possible Remedies: Check whether the NameNode process is running.
Alert: Percent DataNodes with Available Space
Description: Triggered if the storage is full on a certain percentage of DataNodes (10% warn, 30% critical). It aggregates the result from the check_datanode_storage.php plug-in.
Potential Causes: Cluster storage is full. If cluster storage is not full, DataNode is full.
Possible Remedies: If cluster still has storage, use Balancer to distribute the data to relatively less-used DataNodes. If the cluster is full, delete unnecessary data or add additional storage by adding either more DataNodes or more or larger disks to the DataNodes. After adding more storage, run Balancer.
Alert: Percent DataNodes Available
Description: This alert is triggered if the number of down DataNodes in the cluster is greater than the configured critical threshold. It uses the check_aggregate plug-in to aggregate the results of DataNode process checks.
Potential Causes: DataNodes are down. DataNodes are not down but are not listening to the correct network port/address.
Possible Remedies: Check for dead DataNodes in Ambari Web. Check for any errors in the DataNode logs (/var/log/hadoop/hdfs) and restart the DataNode hosts/processes. Run the netstat -tuplpn command to check if the DataNode process is bound to the correct network port.
Alert: NameNode RPC latency
Description: Triggered if the NameNode operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for NameNode operations.
Potential Causes: A job or an application is performing too many NameNode operations.
Possible Remedies: Review the job or the application for potential bugs causing it to perform too many NameNode operations.
Alert: NameNode Last Checkpoint
Description: This alert will trigger if the last time that the NameNode performed a checkpoint was too long ago or if the number of uncommitted transactions is beyond a certain threshold.
Potential Causes: Too much time elapsed since last NameNode checkpoint. Uncommitted transactions beyond threshold.
Possible Remedies: Set NameNode checkpoint. Review threshold for uncommitted transactions.
Alert: Secondary NameNode Process
Description: Triggered if the Secondary NameNode process cannot be confirmed to be up and listening on the network. This alert is not applicable when NameNode HA is configured.
Potential Causes: The Secondary NameNode is not running.
Possible Remedies: Check that the Secondary NameNode process is running.
Alert: NameNode Directory Status
Description: This alert checks if the NameNode NameDirStatus metric reports a failed directory.
Potential Causes: One or more of the directories are reporting as not healthy.
Possible Remedies: Check the NameNode UI for information about unhealthy directories.
Alert: HDFS capacity utilization
Description: Triggered if the HDFS capacity utilization exceeds the configured critical threshold (80% warn, 90% critical). It checks the NameNode JMX Servlet for the CapacityUsed and CapacityRemaining properties.
Potential Causes: Cluster storage is full.
Possible Remedies: Delete unnecessary data. Archive unused data. Add more DataNodes. Add more or larger disks to the DataNodes. After adding more storage, run Balancer.
Alert: DataNode Health Summary
Description: Triggered if there are unhealthy DataNodes.
Potential Causes: A DataNode is in an unhealthy state.
Possible Remedies: Check the NameNode UI for the list of dead DataNodes.
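The HDFS capacity utilization alert compares a percentage against the 80% warn and 90% critical thresholds. A minimal Python sketch of that arithmetic follows. It assumes that utilization is CapacityUsed divided by the sum of CapacityUsed and CapacityRemaining, which this guide does not state; the hdfs_capacity_status function is hypothetical, and the byte counts are illustrative.

```python
def hdfs_capacity_status(capacity_used, capacity_remaining, warn_pct=80.0, crit_pct=90.0):
    """Classify HDFS utilization from CapacityUsed and CapacityRemaining (bytes)."""
    total = capacity_used + capacity_remaining
    if total <= 0:
        return "UNKNOWN", 0.0  # assumption: nothing sensible can be computed
    # Assumed formula: used / (used + remaining), expressed as a percentage.
    used_pct = 100.0 * capacity_used / total
    if used_pct > crit_pct:
        return "CRIT", used_pct
    if used_pct > warn_pct:
        return "WARN", used_pct
    return "OK", used_pct


status, used_pct = hdfs_capacity_status(capacity_used=85, capacity_remaining=15)
```

With 85 units used and 15 remaining, the utilization is 85%, which is above the warn threshold but below the critical one.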
NameNode HA Alerts
Alert: JournalNode process
Description: Triggered if the individual JournalNode process cannot be established to be up and listening on the network for the configured critical threshold, given in seconds.
Potential Causes: The JournalNode process is down or not responding. The JournalNode is not down but is not listening to the correct network port/address.
Possible Remedies: Check if the JournalNode process is dead.
Alert: NameNode High Availability Health
Description: This service-level alert is triggered if either the Active NameNode or Standby NameNode is not running.
Potential Causes: The Active, Standby or both NameNode processes are down.
Possible Remedies: On each host running NameNode, check for any errors in the logs (/var/log/hadoop/hdfs/) and restart the NameNode host/process using Ambari Web. On each host running NameNode, run the netstat -tuplpn command to check if the NameNode process is bound to the correct network port.
Alert: ZooKeeper Failover Controller process
Description: Triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network.
Potential Causes: The ZKFC process is down or not responding.
Possible Remedies: Check if the ZKFC process is running.
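Several of the remedies above ask whether a process is listening on the expected network port. As a complement to netstat, the following Python sketch tests whether a TCP port accepts connections. The port_is_listening function is hypothetical and is not how Ambari implements its process or PORT alerts; the host and port you pass in are your own values.

```python
import socket


def port_is_listening(host, port, timeout=1.5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result does not say why the connection failed, so the logs named in the remedies are still the place to look for the cause.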
YARN Alerts
Alert: Percent NodeManagers Available
Description: Triggered if the number of down NodeManagers in the cluster is greater than the configured critical threshold. It aggregates the results of NodeManager process alert checks.
Potential Causes: NodeManagers are down. NodeManagers are not down but are not listening to the correct network port/address.
Possible Remedies: Check for dead NodeManagers. Check for any errors in the NodeManager logs (/var/log/hadoop/yarn) and restart the NodeManagers hosts/processes, as necessary. Run the netstat -tuplpn command to check if the NodeManager process is bound to the correct network port.
Alert: ResourceManager Web UI
Description: This host-level alert is triggered if the ResourceManager Web UI is unreachable.
Potential Causes: The ResourceManager process is not running.
Possible Remedies: Check if the ResourceManager process is running.
Alert: ResourceManager RPC latency
Description: Triggered if the ResourceManager operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for ResourceManager operations.
Potential Causes: A job or an application is performing too many ResourceManager operations.
Possible Remedies: Review the job or the application for potential bugs causing it to perform too many ResourceManager operations.
Alert: ResourceManager CPU utilization
Description: Triggered if CPU utilization of the ResourceManager exceeds certain thresholds (200% warning, 250% critical). It checks the ResourceManager JMX Servlet for the SystemCPULoad property. This information is only available if you are running JDK 1.7.
Potential Causes: Unusually high CPU utilization. This can be caused by a very unusual job/query workload, but it is generally the sign of an issue in the daemon.
Possible Remedies: Use the top command to determine which processes are consuming excess CPU. Reset the offending process.
Alert: NodeManager Web UI
Description: Triggered if the NodeManager process cannot be established to be up and listening on the network for the configured critical threshold, given in seconds.
Potential Causes: The NodeManager process is down or not responding. The NodeManager is not down but is not listening to the correct network port/address.
Possible Remedies: Check if the NodeManager is running. Check for any errors in the NodeManager logs and restart the NodeManager, if necessary.
Alert: NodeManager health
Description: This host-level alert checks the node health property available from the NodeManager component.
Potential Causes: Node Health Check script reports issues or is not configured.
Possible Remedies: Check in the NodeManager logs for health check errors and restart the NodeManager, if necessary. Check in the ResourceManager UI logs for health check errors.
MapReduce2 Alerts
Alert: HistoryServer Web UI
Description: Triggered if the HistoryServer Web UI is unreachable.
Potential Causes: The HistoryServer process is not running.
Possible Remedies: Check if the HistoryServer process is running.
Alert: HistoryServer RPC latency
Description: Triggered if the HistoryServer operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for HistoryServer operations.
Potential Causes: A job or an application is performing too many HistoryServer operations.
Possible Remedies: Review the job or the application for potential bugs causing it to perform too many HistoryServer operations.
Alert: HistoryServer CPU utilization
Description: Triggered if the percent of CPU utilization on the HistoryServer exceeds the configured critical threshold.
Potential Causes: Unusually high CPU utilization. This can be caused by a very unusual job/query workload, but it is generally the sign of an issue in the daemon.
Possible Remedies: Use the top command to determine which processes are consuming excess CPU. Reset the offending process.
Alert: HistoryServer process
Description: Triggered if the HistoryServer process cannot be established to be up and listening on the network for the configured critical threshold, given in seconds.
Potential Causes: The HistoryServer process is down or not responding. The HistoryServer is not down but is not listening to the correct network port/address.
Possible Remedies: Check that the HistoryServer is running. Check for any errors in the HistoryServer logs (/var/log/hadoop/mapred) and restart the HistoryServer, if necessary.
HBase Service Alerts
Alert
Percent
RegionServers
live
Description
triggered if the configured
percentage of Region Server
processes cannot be determined
to be up and listening on the
network for the configured critical
threshold. The default setting is
10% to produce a WARN alert
and 30% to produce a CRITICAL
alert. It aggregates the results of
RegionServer process down
checks.
Potential Causes
Misconfiguration
or less-than-ideal
configuration
caused the
RegionServers to
crash.
Cascading failures
brought on by
some workload
caused the
RegionServers to
crash.
The RegionServers
shut themselves
own because there
were problems in
the dependent
services,
ZooKeeper or
HDFS.
HBase Master
process
HBase
Master Web
UI
This alert is triggered if the HBase
master processes cannot be
confirmed to be up and listening
on the network for the configured
critical threshold, given in
seconds.
the HBase Master Web UI is
unreachable.
GC paused the
RegionServer for
too long and the
RegionServers lost
contact with
ZooKeeper.
The HBase master
process is down.
The HBase master
has shut itself
down because
there were
problems in the
dependent
services,
ZooKeeper or
HDFS.
The HBase Master
process is not
running.
Possible Remedies
Check the dependent
services to make sure
they are operating
correctly.
Look at the
RegionServer log files
(usually
/var/log/hbase/*.log) for
further information.
If the failure was
associated with a
particular workload, try
to understand the
workload better.
Restart the
RegionServers.
Check the dependent
services.
Look at the master log
files (usually
/var/log/hbase/*.log) for
further information.
Look at the
configuration files
(/etc/hbase/conf).
Restart the master.
Check if the Master
process is running.
HBase Master
CPU
utilization
RegionServer
process
CPU utilization of the HBase
Master exceeds certain
thresholds (200% warning, 250%
critical). It checks the HBase
Master JMX Servlet for the
SystemCPULoad property. This
information is only available if you
are running JDK 1.7.
the RegionServer processes
cannot be confirmed to be up and
given in seconds.
Unusually high
CPU utilization:
Can be caused by
a very unusual
job/query
workload, but this
is generally the
sign of an issue in
the daemon.
The RegionServer
process is down
on the host.
The RegionServer
process is up and
running but not
listening on the
correct network
port (default
60030).
Use the top command
to determine which
processes are
consuming excess CPU
Reset the offending
process.
Check for any errors in
the logs
(/var/log/hbase/) and
restart the
RegionServer process
using Ambari Web.
Run the netstat -tuplpn command to check if
the RegionServer
process is bound to the
correct network port.
Hive Alerts
Alert
HiveServer2
Process
Hive
Metastore
Process
WebHCat
Server
status
Description
the HiveServer cannot be
determined to be up and
responding to client requests.
the Hive Metastore process
cannot be determined to be up
and listening on the network for
the configured critical threshold,
given in seconds.
the WebHCat server cannot be
Potential Causes
HiveServer2 process
is not running.
HiveServer2 process
is not responding.
The Hive Metastore
service is down.
The database used by
the Hive Metastore is
down.
The Hive Metastore
host is not reachable
over the network.
The WebHCat server
is down.
The WebHCat server
is hung and not
responding.
The WebHCat server
is not reachable over
the network.
Possible Remedies
Using Ambari Web,
check status of
HiveServer2
component. Stop and
then restart.
Using Ambari Web,
stop the Hive service
and then restart it.
Restart the WebHCat
server using Ambari
Web.
Oozie Alerts
Alert
Oozie
status
Description
the Oozie server cannot be
Potential Causes
The Oozie server is
down.
Possible Remedies
Restart the Oozie
service using Ambari
Web.
The Oozie server is hung
and not responding.
The Oozie server is not
reachable over the
network.
ZooKeeper Alerts
Alert
Description
Percent
ZooKeeper
Servers
Available
This service-level alert is triggered
if the configured percentage of
ZooKeeper processes cannot be
determined to be up and listening
on the network for the configured
critical threshold, given in seconds.
It aggregates the results of
ZooKeeper process checks.
Potential
Causes
The
majority of
your
ZooKeeper
servers are
down and
not
responding.
Possible Remedies
Check the dependent services to
make sure they are operating
correctly.
Check the ZooKeeper logs
(/var/log/hadoop/zookeeper.log)
for further information.
If the failure was associated with
a particular workload, try to
understand the workload better.
ZooKeeper
Server
process
the ZooKeeper server process
cannot be determined to be up and
configured critical threshold, given
in seconds.
The
ZooKeeper
server
process is
down on
the host.
The
ZooKeeper
server
process is
up and
running but
not listening
on the
correct
network
port (default
2181).
Restart the ZooKeeper servers
from the Ambari UI.
ZooKeeper logs (/var/log/hbase/)
and restart the ZooKeeper
process using Ambari Web.
Run the netstat -tuplpn command
to check if the ZooKeeper server
process is bound to the correct
network port.
Ambari Alerts
Alert: Ambari Agent Disk Usage
Description: Triggered if the amount of disk space used on a host goes above specific
thresholds. The default values are 50% for WARNING and 80% for CRITICAL.
Potential Causes: The host is running out of disk space.
Possible Remedies: Check logs and temporary directories for items to remove. Add more disk space.
Installing HDP Using Ambari
This section describes the information and materials you should get ready to install an HDP cluster
using Ambari. Ambari provides an end-to-end management and monitoring solution for your HDP
cluster. Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration
changes, and monitor services for all nodes in your cluster from a central point.
• Determine Stack Compatibility
• Meet Minimum System Requirements
• Collect Information
• Prepare the Environment
• Optional: Configure Local Repositories for Ambari
Determine Stack Compatibility
Use this table to determine whether your Ambari and HDP stack versions are compatible.
Ambari
2.0.0
1.7.0
1.6.1
1.6.0
1.5.1
1.5.0
1.4.4.23
1.4.3.38
1.4.2.104
1.4.1.61
1.4.1.25
1.2.5.17
HDP 2.2 [1]
x
x
HDP 2.1 [2]
x
x
x
x
x
HDP 2.0 [3]
x
x
x
x
x
x
x
x
x
x
x
HDP1.3
x
x
x
x
x
x
x
x
x
x
x
[1] Ambari 2.0.x does not install Accumulo, Hue, or Solr services for the HDP 2.2 Stack.
[2] Ambari 2.0.x does not install Accumulo, Hue, Knox, or Solr services for the HDP 2.1 Stack.
[3] Ambari 2.0.x does not install Hue for the HDP 2.0 Stack.
For more information about:
• Installing Accumulo, Hue, and Solr services, see Installing HDP Manually.
• Installing Spark, see Installing Spark.
• Installing Ranger, see Installing Ranger.
Meet Minimum System Requirements
To run Hadoop, your system must meet the following minimum requirements:
• Hardware Recommendations
• Operating Systems Requirements
• Browser Requirements
• Software Requirements
• JDK Requirements
• Database Requirements
• Recommended Maximum Open File Descriptors
Hardware Recommendations
There is no single hardware requirement set for installing Hadoop.
For more information about hardware components that may affect your installation, see Hardware
Recommendations For Apache Hadoop.
Operating Systems Requirements
The following 64-bit operating systems are supported:
• Red Hat Enterprise Linux (RHEL) v6.x
• Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
• CentOS v6.x
• CentOS v5.x (deprecated)
• Oracle Linux v6.x
• Oracle Linux v5.x (deprecated)
• SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
• Ubuntu Precise v12.04
If you plan to install HDP Stack on SLES 11 SP3, be sure to refer to Configuring
Repositories in the HDP documentation for the HDP repositories specific for SLES 11
SP3. Or, if you plan to perform a Local Repository install, be sure to use the SLES 11
SP3 repositories.
The installer pulls many packages from the base OS repositories. If you do not have a
complete set of base OS repositories available to all your machines at the time of
installation you may run into issues.
If you encounter problems with base OS repositories being unavailable, please contact
your system administrator to arrange for these additional repositories to be proxied or
mirrored. For more information see Using a Local Repository.
Browser Requirements
The Ambari Install Wizard runs as a browser-based Web application. You must have a machine
capable of running a graphical browser to use this tool.
The minimum required browser versions are:
• Windows (Vista, 7, 8)
   • Internet Explorer 9.0
   • Firefox 18
   • Google Chrome 26
• Mac OS X (10.6 or later)
   • Firefox 18
   • Safari 5
   • Google Chrome 26
• Linux (RHEL, CentOS, SLES, Oracle Linux, Ubuntu)
   • Firefox 18
   • Google Chrome 26
On any platform, we recommend updating your browser to the latest, stable version.
Software Requirements
On each of your hosts:
• yum and rpm (RHEL/CentOS/Oracle Linux)
• zypper and php_curl (SLES)
• apt (Ubuntu)
• scp, curl, unzip, tar, and wget
• OpenSSL (v1.01, build 16 or later)
• python v2.6
• The Python version shipped with SUSE 11, 2.6.0-8.12.2, has a critical bug that
may cause the Ambari Agent to fail within the first 24 hours. If you are installing
on SUSE 11, please update all your hosts to Python version 2.6.8-0.15.1.
• Python v2.7.9 or later is not supported due to changes in how Python
performs certificate validation.
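The common tools in the list above can be checked on a host before installation. The following is a minimal sketch, not part of Ambari: the function name missing_tools is hypothetical, and it only tests that each command is on the PATH, not which version is installed.

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper, not an Ambari command.
# It prints the name of every given command that is not found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}
```

For example, running missing_tools scp curl unzip tar wget prints nothing when all five tools are installed, and prints one name per line for each tool that still needs to be installed.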
JDK Requirements
The following Java runtime environments are supported:
• Oracle JDK 1.7_67 64-bit (default)
• Oracle JDK 1.6_31 64-bit (DEPRECATED)
• OpenJDK 7 64-bit (not supported on SLES)
To install OpenJDK 7 for RHEL, run the following command on all hosts:
yum install java-1.7.0-openjdk
Database Requirements
Ambari requires a relational database to store information about the cluster configuration and
topology. If you install HDP Stack with Hive or Oozie, they also require a relational database. The
following table outlines these database requirements:
Component: Ambari
Description: By default, Ambari will install an instance of PostgreSQL on the Ambari Server host.
Optionally, you can use an existing instance of PostgreSQL, MySQL or Oracle. For further
information, see Using Non-Default Databases - Ambari.
Component: Hive
Description: By default (on RHEL/CentOS/Oracle Linux 6), Ambari will install an instance of MySQL
on the Hive Metastore host. Otherwise, you need to use an existing instance of
PostgreSQL, MySQL or Oracle. See Using Non-Default Databases - Hive for more
information.
Component: Oozie
Description: By default, Ambari will install an instance of Derby on the Oozie Server host.
Optionally, you can use an existing instance of PostgreSQL, MySQL or Oracle. See Using
Non-Default Databases - Oozie for more information.
For the Ambari database, if you use an existing Oracle database, make sure the Oracle
listener runs on a port other than 8080 to avoid conflict with the default Ambari port.
Alternatively, refer to the Ambari Reference Guide for information on Changing the
Default Ambari Server Port.
Memory Requirements
The Ambari host should have at least 1 GB RAM, with 500 MB free.
The Ambari Metrics Collector host should have the following memory and disk space available based
on cluster size:
Number of hosts   Memory Available (MB)   Disk Space
1                 1024                    10 GB
10                1024                    20 GB
50                2048                    50 GB
100               4096                    100 GB
300               4096                    100 GB
500               8096                    200 GB
1000              12288                   200 GB
2000              16384                   500 GB
To check available memory on any host, run
free -m
These values are offered as guidelines. Be sure to test for your particular environment. Also
refer to Package Size and Inode Count Requirements for more information on package
size and Inode counts.
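The Metrics Collector memory guideline can be turned into a simple lookup. The following is a sketch under stated assumptions: the function names are hypothetical, the memory values are taken to be megabytes (matching the output of free -m), and a cluster size that falls between two table rows is rounded up to the next row.

```shell
#!/usr/bin/env bash
# recommended_collector_mem_mb is a hypothetical helper. It returns the
# memory guideline for the first table row whose host count is at least
# the given cluster size. Rounding up between rows is an assumption, as is
# reusing the 2000-host value for larger clusters.
recommended_collector_mem_mb() {
  local hosts="$1"
  if   [ "$hosts" -le 10 ];   then echo 1024
  elif [ "$hosts" -le 50 ];   then echo 2048
  elif [ "$hosts" -le 300 ];  then echo 4096
  elif [ "$hosts" -le 500 ];  then echo 8096
  elif [ "$hosts" -le 1000 ]; then echo 12288
  else                             echo 16384
  fi
}

# available_mem_mb prints the "free" column of the "Mem:" row of free -m.
available_mem_mb() {
  free -m | awk '/^Mem:/ {print $4}'
}
```

On the Metrics Collector host, the two values can then be compared, for example with [ "$(available_mem_mb)" -ge "$(recommended_collector_mem_mb 100)" ].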
Package Size and Inode Count Requirements
                             Size     Inodes
Ambari Server                100MB    5,000
Ambari Agent                 8MB      1,000
Ambari Metrics Collector     225MB    4,000
Ambari Metrics Monitor       1MB      100
Ambari Metrics Hadoop Sink   8MB      100
After Ambari Server Setup    N/A      4,000
After Ambari Server Start    N/A      500
After Ambari Agent Start     N/A      200
Table 9. *Size and Inode values are approximate
Check the Maximum Open File Descriptors
The recommended maximum number of open file descriptors is 10000, or more.
To check the current value set for the maximum number of open file descriptors, execute the
following shell commands on each host:
ulimit -Sn
ulimit -Hn
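The two ulimit values can be compared against the recommended minimum automatically. This is a minimal sketch with hypothetical function names; it treats a value of "unlimited" as acceptable, which is an assumption rather than something this guide states.

```shell
#!/usr/bin/env bash
# fd_limit_ok is a hypothetical helper: it succeeds when a ulimit value
# is "unlimited" or at least the recommended minimum of 10000.
fd_limit_ok() {
  local limit="$1" minimum=10000
  [ "$limit" = "unlimited" ] || [ "$limit" -ge "$minimum" ]
}

# report_fd_limits checks the soft (S) and hard (H) open-file limits
# of the current shell and prints one line for each.
report_fd_limits() {
  local kind value
  for kind in S H; do
    value="$(ulimit -"${kind}"n)"
    if fd_limit_ok "$value"; then
      echo "${kind} limit ${value}: ok"
    else
      echo "${kind} limit ${value}: below 10000"
    fi
  done
}
```

Running report_fd_limits on each host gives a quick view of which hosts need their limits raised.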
Collect Information
Before deploying an HDP cluster, you should collect the following information:
• The fully qualified domain name (FQDN) of each host in your system.
The Ambari install wizard does not support using IP addresses. You can use hostname -f to check
or verify the FQDN of a host.
Deploying all HDP components on a single host is possible, but is appropriate only for
initial evaluation purposes. Typically, you set up at least three hosts: one master host
and two slaves, as a minimum cluster. For more information about deploying HDP
components, see the descriptions for a Typical Hadoop Cluster.
• A list of components you want to set up on each host.
• The base directories you want to use as mount points for storing:
   • NameNode data
   • DataNodes data
   • Secondary NameNode data
   • Oozie data
   • YARN data (Hadoop version 2.x)
   • ZooKeeper data, if you install ZooKeeper
   • Various log, pid, and db files, depending on your install type
You must use base directories that provide persistent storage locations for your HDP
components and your Hadoop data. Installing HDP components in locations that may
be removed from a host may result in cluster failure or data loss.
For example, do not use /tmp in a base directory path.
Prepare the Environment
To deploy your Hadoop instance, you need to prepare your deployment environment:
• Check Existing Package Versions
• Set up Password-less SSH
• Set up Service User Accounts
• Enable NTP on the Cluster
• Check DNS
• Configure iptables
• Disable SELinux, PackageKit and Check umask Value
Check Existing Package Versions
During installation, Ambari overwrites current versions of some packages required by Ambari to
manage a Hadoop cluster. Package versions other than those that Ambari installs can cause
problems running the installer. Remove any package versions that do not match the following ones:
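RHEL/CentOS/Oracle Linux 6
Component - Description: Ambari Server Database
Files and Versions: postgresql 8.4.13-1.el6_3, postgresql-libs 8.4.13-1.el6_3,
postgresql-server 8.4.13-1.el6_3
Component - Description: Ambari Agent - Installed on each host in your cluster.
Communicates with the Ambari Server to execute commands.
Files and Versions: None
SLES 11
Component - Description: Ambari Server Database
Files and Versions: postgresql 8.3.5-1, postgresql-server 8.3.5-1, postgresql-libs 8.3.5-1
Component - Description: Ambari Agent - Installed on each host in your cluster.
Communicates with the Ambari Server to execute commands.
Files and Versions: None
UBUNTU 12
Component - Description: Ambari Server Database
Files and Versions: libpq5 postgresql postgresql-9.1 postgresql-client-9.1
postgresql-client-common postgresql-common ssl-cert
Component - Description: Ambari Agent - Installed on each host in your cluster.
Communicates with the Ambari Server to execute commands.
Files and Versions: zlibc_0.9k-4.1_amd64
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
Component - Description: Ambari Server Database
Files and Versions: libffi 3.0.5-1.el5, python26 2.6.8-2.el5, python26-libs 2.6.8-2.el5,
postgresql 8.4.13-1.el6_3, postgresql-libs 8.4.13-1.el6_3, postgresql-server 8.4.13-1.el6_3
Component - Description: Ambari Agent - Installed on each host in your cluster.
Communicates with the Ambari Server to execute commands.
Files and Versions: libffi 3.0.5-1.el5, python26 2.6.8-2.el5, python26-libs 2.6.8-2.el5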
Set Up Password-less SSH
To have Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up
password-less SSH connections between the Ambari Server host and all other hosts in the cluster.
The Ambari Server host uses SSH public key authentication to remotely access and install the
Ambari Agent.
You can choose to manually install the Agents on each cluster host. In this case, you
do not need to generate and distribute SSH keys.
1. Generate public and private SSH keys on the Ambari Server host.
ssh-keygen
2. Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts.
.ssh/id_rsa
.ssh/id_rsa.pub
3. Add the SSH Public Key to the authorized_keys file on your target hosts.
cat id_rsa.pub >> authorized_keys
4. Depending on your version of SSH, you may need to set permissions on the .ssh directory (to
700) and the authorized_keys file in that directory (to 600) on the target hosts.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
5. From the Ambari Server, make sure you can connect to each host in the cluster using SSH,
without having to enter a password.
ssh root@<remote.target.host>
where <remote.target.host> has the value of each host name in your cluster.
6. If the following warning message displays during your first connection:
Are you sure you want to continue connecting (yes/no)?
Enter Yes.
7. Retain a copy of the SSH Private Key on the machine from which you will run the web-based
Ambari Install Wizard.
It is possible to use a non-root SSH account, if that account can execute sudo without
entering a password.
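Once the keys are distributed, the connection test can be repeated for every host in one pass. The following is a minimal sketch: the function name and the SSH_BIN override are hypothetical, and it assumes the root account is used as in the steps above. The BatchMode option makes ssh fail instead of prompting, so a host that still asks for a password is reported as a failure.

```shell
#!/usr/bin/env bash
# SSH_BIN is a hypothetical override, useful for dry runs; it defaults to ssh.
SSH_BIN="${SSH_BIN:-ssh}"

# check_passwordless_ssh is a hypothetical helper. For each host name given,
# it runs the no-op command "true" over SSH as root and reports the result.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes disables password prompts; ConnectTimeout bounds the wait.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true </dev/null; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "$failed"
}
```

The function returns a non-zero status if any host fails, so it can gate a later installation step in a script.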
Set up Service User Accounts
Each HDP service requires a service user account. The Ambari Install wizard creates new and
preserves any existing service user accounts, and uses these accounts when configuring Hadoop
services. Service user account creation applies to service user accounts on the local operating
system and to LDAP/AD accounts.
For more information about customizing service user accounts for each HDP service, see Defining
Service Users and Groups for an HDP 2.x Stack.
Enable NTP on the Cluster and on the Browser Host
The clocks of all the nodes in your cluster and the machine that runs the browser through which you
access the Ambari Web interface must be able to synchronize with each other.
To check that the NTP service is on, run the following command on each host:
chkconfig --list ntpd
To set the NTP service to start on reboot, run the following command on each host:
chkconfig ntpd on
To turn on the NTP service, run the following command on each host:
service ntpd start
Check DNS
All hosts in your system must be configured for both forward and reverse DNS.
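Forward and reverse resolution can be checked for a given host name with getent, which consults the same name service sources (including /etc/hosts) that other programs on the host use. This is a sketch with a hypothetical function name; it assumes the reverse lookup should return exactly the FQDN that was passed in.

```shell
#!/usr/bin/env bash
# check_dns is a hypothetical helper. It resolves an FQDN to an address
# (forward lookup), resolves that address back to a name (reverse lookup),
# and succeeds only if the name matches the original FQDN.
check_dns() {
  local fqdn="$1" ip name
  ip="$(getent hosts "$fqdn" | awk '{print $1; exit}')"
  if [ -z "$ip" ]; then
    echo "no forward record for ${fqdn}"
    return 1
  fi
  name="$(getent hosts "$ip" | awk '{print $2; exit}')"
  if [ "$name" != "$fqdn" ]; then
    echo "reverse lookup of ${ip} returned '${name}'"
    return 1
  fi
  echo "${fqdn} <-> ${ip}"
}
```

Running check_dns "$(hostname -f)" on each host is one way to confirm that the host resolves its own name consistently.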
If you are unable to configure DNS in this way, you should edit the /etc/hosts file on every host in
your cluster to contain the IP address and Fully Qualified Domain Name of each of your hosts. The
following instructions are provided as an overview and cover a basic network setup for generic Linux
hosts. Different versions and flavors of Linux might require slightly different commands and
procedures. Please refer to the documentation for the operating system(s) deployed in your
environment.
Edit the Host File
1. Using a text editor, open the hosts file on every host in your cluster. For example:
vi /etc/hosts
2. Add a line for each host in your cluster. The line should consist of the IP address and the
FQDN.
For example:
1.2.3.4 <fully.qualified.domain.name>
Do not remove the following two lines from your hosts file. Removing or editing the
following lines may cause various programs that require network functionality to fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
Set the Hostname
1. Use the "hostname" command to set the hostname on each host in your cluster.
For example:
hostname <fully.qualified.domain.name>
2. Confirm that the hostname is set by running the following command:
hostname -f
This should return the <fully.qualified.domain.name> you just set.
Edit the Network Configuration File
1. Using a text editor, open the network configuration file on every host and set the desired
network configuration for each host. For example:
vi /etc/sysconfig/network
2. Modify the HOSTNAME property to set the fully qualified domain name.
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=<fully.qualified.domain.name>
Configuring iptables
For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports
must be open and available. The easiest way to do this is to temporarily disable iptables, as follows:
chkconfig iptables off
/etc/init.d/iptables stop
You can restart iptables after setup is complete. If the security protocols in your environment prevent
disabling iptables, you can proceed with iptables enabled, if all required ports are open and
available. For more information about required ports, see Configuring Network Port Numbers.
Ambari checks whether iptables is running during the Ambari Server setup process. If iptables is
running, a warning displays, reminding you to check that required ports are open and available. The
Host Confirm step in the Cluster Install Wizard also issues a warning for each host that has iptables
running.
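If iptables stays enabled, individual ports can be probed from another host to confirm they are reachable. The following is a minimal sketch, not an Ambari tool: the function name is hypothetical, and it relies on the /dev/tcp pseudo-device provided by bash and on the coreutils timeout command.

```shell
#!/usr/bin/env bash
# port_open is a hypothetical helper. It succeeds if a TCP connection to
# host:port can be opened within three seconds, and fails otherwise.
port_open() {
  local host="$1" port="$2"
  # /dev/tcp/<host>/<port> is handled by bash itself, not by the filesystem.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, port_open <ambari.server.host> 8080 tests the default Ambari port mentioned under Database Requirements. A failure means only that the connection could not be opened; it does not say whether iptables or a stopped service is the reason.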
Disable SELinux and PackageKit and check the umask Value
1. You must temporarily disable SELinux for the Ambari setup to function.
On each host in your cluster, run:
setenforce 0
To permanently disable SELinux, set SELINUX=disabled in /etc/selinux/config.
This ensures that SELinux does not turn itself on after you reboot the machine.
2. On an installation host running RHEL/CentOS with PackageKit installed,
open /etc/yum/pluginconf.d/refresh-packagekit.conf using a text editor.
Make the following change: enabled=0
PackageKit is not enabled by default on SLES or Ubuntu systems. Unless you have
specifically enabled PackageKit, you may skip this step for a SLES or Ubuntu
installation host.
3. UMASK (User Mask or User file creation MASK) sets the default permissions or base
permissions granted when a new file or folder is created on a Linux machine. Most Linux
distros set 022 as the default umask value. A umask value of 022 results in permissions of
755 for new folders and 644 for new files. A umask value of 027 results in permissions of
750 for new folders and 640 for new files.
Ambari supports a umask value of 022 or 027.
For example, to set the umask value to 022, open /etc/profile as root on all hosts:
vi /etc/profile
then append the following line:
umask 022
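The relationship between a umask value and the resulting permissions is a bitwise operation: the bits set in the umask are cleared from a base mode, which is 777 for new directories and 666 for new regular files. The following sketch illustrates the arithmetic; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# perms_for_umask is a hypothetical helper. It clears the umask bits from a
# base mode and prints the result in octal. Both arguments are octal digits
# without a leading zero, for example: perms_for_umask 022 777
perms_for_umask() {
  local mask="$1" base="$2"
  # A leading 0 makes the shell read each value as an octal constant.
  printf '%03o\n' $(( 0$base & ~0$mask ))
}
```

With a umask of 022 this gives 755 for directories and 644 for regular files; with 027 it gives 750 and 640.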
Using a Local Repository
If your cluster is behind a firewall that prevents or limits Internet access, you can install Ambari and a
Stack using local repositories. This section describes how to:
• Obtain the repositories
• Set up a local repository having:
   • No Internet Access
   • Temporary Internet Access
• Prepare the Ambari repository configuration file
Obtaining the Repositories
This section describes how to obtain:
• Ambari Repositories
• HDP Stack Repositories
Ambari Repositories
If you do not have Internet access for setting up the Ambari repository, use the link appropriate for
your OS family to download a tarball that contains the software.
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/ambari-2.0.0-centos6.tar.gz
SLES 11
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/ambari-2.0.0-suse11.tar.gz
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/ambari-2.0.0-ubuntu12.tar.gz
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/ambari-2.0.0-centos5.tar.gz
If you have temporary Internet access for setting up the Ambari repository, use the link appropriate
for your OS family to download a repository that contains the software.
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo
SLES 11
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.0.0/ambari.repo
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.0/ambari.list
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/2.x/updates/2.0.0/ambari.repo
HDP Stack Repositories
If you do not have Internet access to set up the Stack repositories, use the link appropriate for your
OS family to download a tarball that contains the HDP Stack version you plan to install.
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/HDP-2.2.4.2-centos6-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos6/HDP-UTILS-1.1.0.20-centos6.tar.gz
SLES 11SP3
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/HDP-2.2.4.2-suse11sp3-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/suse11sp3/HDP-UTILS-1.1.0.20-suse11sp3.tar.gz
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/HDP-2.2.4.2-ubuntu12-deb.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/ubuntu12/HDP-UTILS-1.1.0.20-ubuntu12.tar.gz
SLES 11
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/HDP-2.1.10.0-suse11sp3-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.19/repos/suse11sp3/HDP-UTILS-1.1.0.19-suse11sp3.tar.gz
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/HDP-2.1.10.0-ubuntu12-tars-tarball.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.19/repos/ubuntu12/hdp.list
SLES 11
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11/HDP-2.0.13.0-suse11-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.17/repos/suse11/HDP-UTILS-1.1.0.17-suse11.tar.gz
If you have temporary Internet access for setting up the Stack repositories, use the link appropriate
for your OS family to download a repository that contains the HDP Stack version you plan to install.
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.2/hdp.repo -O /etc/yum.repos.d/HDP.repo
SLES 11SP3
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/2.x/updates/2.2.4.2/hdp.repo -O /etc/zypp/repos.d/HDP.repo
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.2.4.2/hdp.list -O /etc/apt/sources.list.d/HDP.list
SLES 11SP3
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/2.x/updates/2.1.10.0/hdp.repo -O
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.1.10.0/hdp.list -O /etc/apt/sources.list.d/HDP.list
/etc/yum.repos.d/hdp.repo
SLES 11
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11/2.x/updates/2.0.13.0/hdp.repo -O
RHEL/CentOS/ORACLE 5 (DEPRECATED)
Setting Up a Local Repository
Based on your Internet access, choose one of the following options:
• No Internet Access
This option involves downloading the repository tarball, moving the tarball to the selected
mirror server in your cluster, and extracting to create the repository.
• Temporary Internet Access
This option involves using your temporary Internet access to sync (using reposync) the
software packages to your selected mirror server and creating the repository.
Both options proceed in a similar, straightforward way. Setting up for each option presents
some key differences, as described in the following sections:
• Getting Started Setting Up a Local Repository
• Setting Up a Local Repository with No Internet Access
• Setting Up a Local Repository with Temporary Internet Access
Getting Started Setting Up a Local Repository
To get started setting up your local repository, complete the following prerequisites:
• Select an existing server in, or accessible to the cluster, that runs a supported operating
system
• Enable network access from all hosts in your cluster to the mirror server
• Ensure the mirror server has a package manager installed such as yum (RHEL / CentOS /
Oracle Linux), zypper (SLES), or apt-get (Ubuntu)
• Optional: If your repository has temporary Internet access, and you are using
RHEL/CentOS/Oracle Linux as your OS, install yum utilities:
yum install yum-utils createrepo
1. Create an HTTP server.
   1. On the mirror server, install an HTTP server (such as Apache httpd) using the
   instructions provided here.
   2. Activate this web server.
   3. Ensure that any firewall settings allow inbound HTTP access from your cluster nodes
   to your mirror server.
   If you are using Amazon EC2, make sure that SELinux is disabled.
2. On your mirror server, create a directory for your web server.
   • For example, from a shell window, type:
      • For RHEL/CentOS/Oracle Linux:
      mkdir -p /var/www/html/
      • For SLES:
      mkdir -p /srv/www/htdocs/rpms
      • For Ubuntu:
      mkdir -p /var/www/html/
   • If you are using a symlink, enable the followsymlinks on your web server.
After you have completed the steps in Getting Started Setting up a Local Repository,
move on to specific setup for your repository internet access type.
Setting Up a Local Repository with No Internet Access
After completing the Getting Started Setting up a Local Repository procedure, finish setting up your
repository by completing the following steps:
1. Obtain the tarball for the repository you would like to create. For options, see Obtaining the
Repositories.
2. Copy the repository tarballs to the web server directory and untar.
   1. Browse to the web server directory you created.
      • For RHEL/CentOS/Oracle Linux:
      cd /var/www/html/
      • For SLES:
      cd /srv/www/htdocs/rpms
      • For Ubuntu:
      cd /var/www/html/
   2. Untar the repository tarballs to the following locations:
   where <web.server>, <web.server.directory>, <OS>, <version>, and
   <latest.version> represent the name, home directory, operating system type,
   version, and most recent release version, respectively.
   Repository Content       Repository Location
   Ambari Repository        Untar under <web.server.directory>
   HDP Stack Repositories   Create directory and untar under <web.server.directory>/hdp
   Table 10. Untar Locations for a Local Repository - No Internet Access
3. Confirm you can browse to the newly created local repositories.
   Repository           URL
   Ambari Base URL      http://<web.server>/ambari/<OS>/2.x/updates/2.0.0
   HDP Base URL         http://<web.server>/hdp/HDP/<OS>/2.x/updates/<latest.version>
   HDP-UTILS Base URL   http://<web.server>/hdp/HDP-UTILS-<version>/repos/<OS>
   Table 11. URLs for a Local Repository - No Internet Access
   where <web.server> = FQDN of the web server host, and <OS> is centos5, centos6, sles11,
   or ubuntu12.
Be sure to record these Base URLs. You will need them when installing Ambari and the
cluster.
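The Base URL patterns for the no-Internet-access case can be filled in and probed from any cluster node. This is a sketch under stated assumptions: both function names are hypothetical, mirror.example.com in the usage note is a placeholder host, and the reachability check treats any HTTP error status as unreachable.

```shell
#!/usr/bin/env bash
# local_repo_urls is a hypothetical helper that fills in the three Base URL
# patterns listed for a local repository with no Internet access.
# Arguments: web server FQDN, OS, HDP version, HDP-UTILS version.
local_repo_urls() {
  local web="$1" os="$2" hdp_version="$3" utils_version="$4"
  echo "http://${web}/ambari/${os}/2.x/updates/2.0.0"
  echo "http://${web}/hdp/HDP/${os}/2.x/updates/${hdp_version}"
  echo "http://${web}/hdp/HDP-UTILS-${utils_version}/repos/${os}"
}

# check_repo_urls is a hypothetical helper. It reads URLs from standard input
# and reports whether each one answers an HTTP HEAD request without an error.
check_repo_urls() {
  local url
  while read -r url; do
    if curl --silent --fail --head --max-time 10 "$url" >/dev/null; then
      echo "reachable   $url"
    else
      echo "unreachable $url"
    fi
  done
}
```

For example, local_repo_urls mirror.example.com centos6 2.2.4.2 1.1.0.20 | check_repo_urls prints one line per Base URL. A web server that forbids directory listings may report a URL as unreachable even though the packages beneath it can be downloaded, so treat a failure as a prompt to look more closely rather than as proof of a problem.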
4. Optional: If you have multiple repositories configured in your environment, deploy the
following plug-in on all the nodes in your cluster.
   1. Install the plug-in.
      • For RHEL and CentOS 6:
      yum install yum-plugin-priorities
      • For RHEL and CentOS 5:
      yum install yum-priorities
   2. Edit the /etc/yum/pluginconf.d/priorities.conf file to add the following:
   [main]
   enabled=1
   gpgcheck=0
Setting up a Local Repository With Temporary Internet Access
After completing the Getting Started Setting up a Local Repository procedure, finish setting up your
repository by completing the following steps:
1. Put the repository configuration files for Ambari and the Stack in place on the host. For
options, see Obtaining the Repositories.
2. Confirm availability of the repositories.
For RHEL/CentOS/Oracle Linux:
yum repolist
For SLES:
zypper repos
For Ubuntu:
dpkg --list
3. Synchronize the repository contents to your mirror server.
   • Browse to the web server directory:
   For RHEL/CentOS/Oracle Linux:
   cd /var/www/html
   For SLES:
   cd /srv/www/htdocs/rpms
   For Ubuntu:
   cd /var/www/html
   • For Ambari, create ambari directory and reposync.
   mkdir -p ambari/<OS>
   cd ambari/<OS>
   reposync -r Updates-ambari-2.0.0
   where <OS> is centos5, centos6, sles11, or ubuntu12.
   • For HDP Stack Repositories, create hdp directory and reposync.
   mkdir -p hdp/<OS>
   cd hdp/<OS>
   reposync -r HDP-<latest.version>
   reposync -r HDP-UTILS-<version>
4. Generate the repository metadata.
   • For Ambari:
   createrepo <web.server.directory>/ambari/<OS>/Updates-ambari-2.0.0
   • For HDP Stack Repositories:
   createrepo <web.server.directory>/hdp/<OS>/HDP-<latest.version>
   createrepo <web.server.directory>/hdp/<OS>/HDP-UTILS-<version>
5. Confirm that you can browse to the newly created repository.
   Repository           URL
   Ambari Base URL      http://<web.server>/ambari/<OS>/Updates-ambari-2.0.0
   HDP Base URL         http://<web.server>/hdp/<OS>/HDP-<latest.version>
   HDP-UTILS Base URL   http://<web.server>/hdp/<OS>/HDP-UTILS-<version>
   Table 12. URLs for the New Repository
   where <web.server> = FQDN of the web server host, and <OS> is centos5, centos6, sles11,
   or ubuntu12.
Be sure to record these Base URLs. You will need them when installing Ambari and the
Cluster.
6. Optional: If you have multiple repositories configured in your environment, deploy the
following plug-in on all the nodes in your cluster.
   1. Install the plug-in.
      • For RHEL and CentOS 6:
      yum install yum-plugin-priorities
   2. Edit the /etc/yum/pluginconf.d/priorities.conf file to add the following:
   [main]
   enabled=1
   gpgcheck=0
Preparing The Ambari Repository Configuration File
1
Download the ambari.repo file from the mirror server you created in the preceding sections
or from the public repository.
•
From your mirror server:
http://<web.server>/ambari/<OS>/2.x/updates/2.0.0/ambari.repo
•
From the public repository:
http://public-repo1.hortonworks.com/ambari/<OS>/2.x/updates/2.0.0/ambari.repo
where <web.server> = FQDN of the web server host, and <OS> is CENTOS6, SLES11, or
UBUNTU12.
2
Edit the ambari.repo file using the Ambari repository Base URL obtained when setting up
your local repository. Refer to step 3 in Setting Up a Local Repository with No Internet
Access, or step 5 in Setting Up a Local Repository with Temporary Internet Access, if
necessary.
Repository         URL
Ambari Base URL    http://<web.server>/ambari/<OS>/2.x/updates/2.0.0
Table 13. Base URL for a Local Repository, where <web.server> = FQDN of the web server host, and <OS> is CENTOS6, SLES11, or UBUNTU12.
3
If this is an Ambari updates release, disable the GA repository definition.
[ambari-2.x]
name=Ambari 2.x
baseurl=http://public-repo-1.hortonworks.com/ambari/centos6/2.x/GA
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPGKEY/RPM-GPG-KEY-Jenkins
enabled=0
priority=1
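Disabling the GA definition amounts to setting enabled=0 in that one section while leaving any other section of ambari.repo alone. The sketch below shows one way to do that with awk. The function name disable_repo_section is hypothetical, and the sketch assumes the section already contains an enabled line; it does not add one.

```shell
# Set enabled=0 inside one named section of a yum .repo file and leave
# every other section untouched. The edited file is written to stdout.
disable_repo_section() {
  local section="$1" repo_file="$2"
  awk -v target="[$section]" '
    /^\[.*\]$/ { in_target = ($0 == target) }       # entering a new section
    in_target && /^enabled[ \t]*=/ { print "enabled=0"; next }
    { print }
  ' "$repo_file"
}

# Intended use (review the new file before replacing the original):
#   disable_repo_section ambari-2.x ambari.repo > ambari.repo.new
```

Writing to a separate file rather than editing in place keeps the downloaded ambari.repo available for comparison.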
4
Place the ambari.repo file on the machine you plan to use for the Ambari Server.
For RHEL/CentOS/Oracle Linux:
/etc/yum.repos.d/ambari.repo
For SLES:
/etc/zypp/repos.d/ambari.repo
For Ubuntu:
/etc/apt/sources.list.d/ambari.list
[main]
enabled=1
gpgcheck=0
5
Proceed to Installing Ambari Server to install and set up Ambari Server.
Download the Ambari Repo
Select one of the following tabs that shows the OS family running on your installation host.
Follow instructions in the section for the operating system that runs on your installation host.
Use a command line editor to perform each instruction.
1
Log in to your host as root. You may use sudo su if your environment requires such access.
For example, type:
ssh <username>@<hostname.FQDN>
sudo su
where <username> is your user name and <hostname.FQDN> is the fully qualified domain
name of your server host.
2
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo -O
Do not modify the ambari.repo file name. This file is expected to be available on the
Ambari Server host during Agent registration.
3
Confirm that the repository is configured by checking the repo list.
yum repolist
You should see values similar to the following for Ambari repositories in the list.
Version values vary, depending on the installation.
repo id            repo name             status
AMBARI.2.0.0-.x    Ambari 2.x            5
base               CentOS-6 - Base       6,518
extras             CentOS-6 - Extras     15
updates            CentOS-6 - Updates    209
4
Install the Ambari bits. This also installs the default PostgreSQL Ambari database.
yum install ambari-server
5
Enter y when prompted to confirm transaction and dependency checks.
A successful installation displays output similar to the following:
Installing : postgresql-libs-8.4.20-1.el6_5.x86_64       1/4
Installing : postgresql-8.4.20-1.el6_5.x86_64            2/4
Installing : postgresql-server-8.4.20-1.el6_5.x86_64     3/4
Installing : ambari-server-2.0.0-147.noarch              4/4
Verifying  : postgresql-server-8.4.20-1.el6_5.x86_64     1/4
Verifying  : postgresql-libs-8.4.20-1.el6_5.x86_64       2/4
Verifying  : ambari-server-2.0.0-147.noarch              3/4
Verifying  : postgresql-8.4.20-1.el6_5.x86_64            4/4
Installed:
ambari-server.noarch 0:1.7.0-135
Dependency Installed:
postgresql.x86_64 0:8.4.20-1.el6_5
postgresql-libs.x86_64 0:8.4.20-1.el6_5
postgresql-server.x86_64 0:8.4.20-1.el6_5
Complete!
Accept the warning about trusting the Hortonworks GPG Key. That key will be
automatically downloaded and used to validate packages from Hortonworks. You will
see the following message:
Importing GPG key 0x07513CAD:
Userid: "Jenkins (HDP Builds) <[email protected]>"
From : http://s3.amazonaws.com/dev.hortonworks.com/ambari/centos6/RPM-GPGKEY/RPM-GPG-KEY-Jenkins
SLES 11
1
For example, type:
2
3
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.0.0/ambari.repo -O
4
Confirm the downloaded repository is configured by checking the repo list.
zypper repos
You should see the Ambari repositories in the list.
Alias | Name | Enabled | Refresh
AMBARI.2.0.0-1.x | Ambari 2.x | Yes | No
http-demeter.uniregensburg.dec997c8f9 | SUSE-Linux-Enterprise-SoftwareDevelopment-Kit-11-SP1 11.1.1-1.57 | Yes | Yes
opensuse | OpenSuse | Yes | Yes
5
Install the Ambari bits. This also installs PostgreSQL.
zypper install ambari-server
6
Retrieving package postgresql-libs-8.3.5-1.12.x86_64 (1/4), 172.0 KiB
(571.0 KiB unpacked)
Retrieving: postgresql-libs-8.3.5-1.12.x86_64.rpm [done (47.3 KiB/s)]
Installing: postgresql-libs-8.3.5-1.12 [done]
Retrieving package postgresql-8.3.5-1.12.x86_64 (2/4), 1.0 MiB (4.2 MiB
unpacked)
Retrieving: postgresql-8.3.5-1.12.x86_64.rpm [done (148.8 KiB/s)]
Installing: postgresql-8.3.5-1.12 [done]
Retrieving package postgresql-server-8.3.5-1.12.x86_64 (3/4), 3.0 MiB
(12.6 MiB unpacked)
Retrieving: postgresql-server-8.3.5-1.12.x86_64.rpm [done (452.5
KiB/s)]
Installing: postgresql-server-8.3.5-1.12 [done]
Updating etc/sysconfig/postgresql...
Retrieving package ambari-server-1.7.0-135.noarch (4/4), 99.0 MiB
(126.3 MiB unpacked)
Retrieving: ambari-server-1.7.0-135.noarch.rpm [done (3.0 MiB/s)]
Installing: ambari-server-1.7.0-135 [done]
ambari-server 0:off 1:off 2:off 3:on 4:off 5:on 6:off
UBUNTU 12
1
For example, type:
2
3
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.0/ambari.list -O
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
Do not modify the ambari.list file name. This file is expected to be available on the
Ambari Server host during Agent registration.
4
Confirm that Ambari packages downloaded successfully by checking the package name list.
apt-cache pkgnames
You should see the Ambari packages in the list.
Alias             Name
AMBARI-dev-2.x    Ambari 2.x
5
apt-get install ambari-server
1
For example, type:
2
3
4
Confirm the repository is configured by checking the repo list.
yum repolist
AMBARI.2.0.0-1.x            | 951 B   00:00
AMBARI.2.0.0-1.x/primary    | 1.6 kB  00:00
AMBARI.2.0.0-1.x                      5/5
epel                        | 3.7 kB  00:00
epel/primary_db             | 3.9 MB  00:01
repo Id             repo Name                                         status
AMBARI.2.2.0-1.x    Ambari 2.x                                        5
base                CentOS-5 - Base                                   3,667
epel                Extra Packages for Enterprise Linux 5 - x86_64    7,614
puppet              Puppet                                            433
updates             CentOS-5 - Updates                                118
5
When deploying HDP on a cluster having limited or no Internet access, you should
provide access to the bits using an alternative method.
For more information about setting up local repositories, see Using a Local Repository.
Ambari Server by default uses an embedded PostgreSQL database. When you install
the Ambari Server, the PostgreSQL packages and dependencies must be available for
install. These packages are typically available as part of your Operating System
repositories. Please confirm you have the appropriate repositories available for the
postgresql-server packages.
Set Up the Ambari Server
The ambari-server command manages the setup process. Run the following command on the
Ambari server host:
ambari-server setup
You may append Setup Options to the command.
Respond to the following prompts:
1
If you have not temporarily disabled SELinux, you may get a warning. Accept the default (y),
and continue.
2
By default, Ambari Server runs under root. Accept the default (n) at the Customize user
account for ambari-server daemon prompt, to proceed as root.
If you want to create a different user to run the Ambari Server, or to assign a previously
created user, select y at the Customize user account for ambari-server daemon
prompt, then provide a user name.
3
If you have not temporarily disabled iptables you may get a warning. Enter y to continue.
4
Select a JDK version to download. Enter 1 to download Oracle JDK 1.7.
By default, Ambari Server setup downloads and installs Oracle JDK 1.7 and the
accompanying Java Cryptography Extension (JCE) Policy Files. If you plan to use a
different version of the JDK, see Setup Options for more information.
5
Accept the Oracle JDK license when prompted. You must accept this license to download
the necessary JDK from Oracle. The JDK is installed during the deploy phase.
6
Select n at Enter advanced database configuration to use the default, embedded
PostgreSQL database for Ambari. The default PostgreSQL database name is ambari. The
default user name and password are ambari/bigdata.
Otherwise, to use an existing PostgreSQL, MySQL or Oracle database with Ambari, select y.
•
If you are using an existing PostgreSQL, MySQL, or Oracle database instance, use
one of the following prompts:
You must prepare a non-default database instance, using the steps detailed in Using
Non-Default Databases-Ambari, before running setup and entering advanced database
configuration.
•
To use an existing Oracle 11g r2 instance, and select your own database name, user
name, and password for that database, enter 2.
Select the database you want to use and provide any information requested at the
prompts, including host name, port, Service Name or SID, user name, and password.
•
To use an existing MySQL 5.x database, and select your own database name, user
name, and password for that database, enter 3.
Select the database you want to use and provide any information requested at the
prompts, including host name, port, database name, user name, and password.
•
To use an existing PostgreSQL 9.x database, and select your own database name,
user name, and password for that database, enter 4.
Select the database you want to use and provide any information requested at the
prompts, including host name, port, database name, user name, and password.
7
At Proceed with configuring remote database connection properties [y/n] choose y.
8
Setup completes.
If your host accesses the Internet through a proxy server, you must configure Ambari
Server to use this proxy server. See How to Set Up an Internet Proxy Server for Ambari
for more information.
Setup Options
The following table describes options frequently used for Ambari Server setup.
Option: -j (or --java-home)
Description: Specifies the JAVA_HOME path to use on the Ambari Server and all hosts in the cluster.
By default, when you do not specify this option, Ambari Server setup downloads the
Oracle JDK 1.7 binary and accompanying Java Cryptography Extension (JCE) Policy
Files to /var/lib/ambari-server/resources. Ambari Server then installs the JDK to
/usr/jdk64.
Use this option when you plan to use a JDK other than the default Oracle JDK 1.7. See
JDK Requirements for more information on the supported JDKs. If you are using an
alternate JDK, you must manually install the JDK on all hosts and specify the Java
Home path during Ambari Server setup. If you plan to use Kerberos, you must also
install the JCE on all hosts.
This path must be valid on all hosts.
For example: ambari-server setup -j /usr/java/default
Option: --jdbc-driver
Description: Should be the path to the JDBC driver JAR file. Use this option to specify the location
of the JDBC driver JAR and to make that JAR available to Ambari Server for distribution
to cluster hosts during configuration. Use this option with the --jdbc-db option to
specify the database type.
Option: --jdbc-db
Description: Specifies the database type. Valid values are: [postgres | mysql | oracle]. Use this option
with the --jdbc-driver option to specify the location of the JDBC driver JAR file.
Option: -s (or --silent)
Description: Setup runs silently. Accepts all default prompt values.
Option: -v (or --verbose)
Description: Prints verbose info and warning messages to the console during Setup.
Option: -g (or --debug)
Description: Starts Ambari Server in debug mode.
Next Steps
Start the Ambari Server
Start the Ambari Server
•
Run the following command on the Ambari Server host:
ambari-server start
•
To check the Ambari Server processes:
ambari-server status
•
To stop the Ambari Server:
ambari-server stop
If you plan to use an existing database instance for Hive or for Oozie, you must
complete the preparations described in Using Non-Default Databases-Hive and Using
Non-Default Databases-Oozie before installing your Hadoop cluster.
Next Steps
Install, configure and deploy an HDP cluster
Install, Configure and Deploy a HDP Cluster
This section describes how to use the Ambari install wizard running in your browser to install,
configure, and deploy your cluster.
•
Log In to Apache Ambari
•
Name Your Cluster
•
Select Stack
•
Install Options
•
Confirm Hosts
•
Choose Services
•
Assign Masters
•
Assign Slaves and Clients
•
Customize Services
•
Review
•
Install, Start and Test
•
Complete
Log In to Apache Ambari
After starting the Ambari service, open Ambari Web using a web browser.
1
Point your browser to http://<your.ambari.server>:8080, where
<your.ambari.server> is the name of your ambari server host. For example, a default
Ambari server host is located at http://c6401.ambari.apache.org:8080.
2
Log in to the Ambari Server using the default user name/password: admin/admin. You can
change these credentials later.
For a new cluster, the Ambari install wizard displays a Welcome page from which you launch
the Ambari Install wizard.
Launching the Ambari Install Wizard
From the Ambari Welcome page, choose Launch Install Wizard.
Name Your Cluster
1
In Name your cluster, type a name for the cluster you want to create. Use no white spaces
or special characters in the name.
2
Choose Next.
Select Stack
The Service Stack (the Stack) is a coordinated and tested set of HDP components. Use a radio
button to select the Stack version you want to install. To install an HDP 2.x stack, select the HDP 2.2,
HDP 2.1, or HDP 2.0 radio button.
Expand Advanced Repository Options to select the Base URL of a repository from which Stack
software packages download. Ambari sets the default Base URL for each repository, depending on
the Internet connectivity available to the Ambari server host, as follows:
•
For an Ambari Server host having Internet connectivity, Ambari sets the repository Base URL
for the latest patch release for the HDP Stack version. For an Ambari Server having NO
Internet connectivity, the repository Base URL defaults to the latest patch release version
available at the time of Ambari release.
•
You can override the repository Base URL for the HDP Stack with an earlier patch release if
you want to install a specific patch release for a given HDP Stack version. For example, the
HDP 2.1 Stack will default to the HDP 2.1 Stack patch release 7, or HDP-2.1.7. If you want to
install HDP 2.1 Stack patch release 2, or HDP-2.1.2 instead, obtain the Base URL from the
HDP Stack documentation, then enter that location in Base URL.
•
If you are using a local repository, see Using a Local Repository for information about
configuring a local repository location, then enter that location as the Base URL instead of
the default, public-hosted HDP Stack repositories.
The UI displays repository Base URLs based on Operating System Family (OS Family).
Be sure to set the correct OS Family based on the Operating System you are running.
The following table maps the OS Family to the Operating Systems.
Table 14. Operating Systems mapped to each OS Family
OS Family    Operating Systems
redhat6      Red Hat 6, CentOS 6, Oracle Linux 6
suse11       SUSE Linux Enterprise Server 11
ubuntu12     Ubuntu Precise 12.04
redhat5      Red Hat 5, CentOS 5, Oracle Linux 5
Install Options
In order to build up the cluster, the install wizard prompts you for general information about how you
want to set it up. You need to supply the FQDN of each of your hosts. The wizard also needs to
access the private key file you created in Set Up Password-less SSH. Using the host names and key
file information, the wizard can locate, access, and interact securely with all hosts in the cluster.
1
Use the Target Hosts text box to enter your list of host names, one per line. You can use
ranges inside brackets to indicate larger sets of hosts. For example, for host01.domain
through host10.domain use host[01-10].domain
If you are deploying on EC2, use the internal Private DNS host names.
2
If you want to let Ambari automatically install the Ambari Agent on all your hosts using SSH,
select Provide your SSH Private Key and either use the Choose File button in the Host
Registration Information section to find the private key file that matches the public key
you installed earlier on all your hosts or cut and paste the key into the text box manually.
If you are using IE 9, the Choose File button may not appear. Use the text box to cut
and paste your private key manually.
Fill in the user name for the SSH key you have selected. If you do not want to use root, you
must provide the user name for an account that can execute sudo without entering a
password.
3
If you do not want Ambari to automatically install the Ambari Agents, select Perform
manual registration. For further information, see Installing Ambari Agents Manually.
4
Choose Register and Confirm to continue.
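To preview which hosts a bracketed range in Target Hosts refers to, you can expand the pattern yourself before pasting it into the wizard. The following sketch is one reading of the host[01-10].domain notation described in step 1. The function name expand_host_range is hypothetical, and the wizard's own parser may treat edge cases differently.

```shell
# Expand one Target Hosts pattern such as host[01-10].domain into
# individual host names, keeping the zero padding of the lower bound.
expand_host_range() {
  printf '%s\n' "$1" | awk '
    match($0, /\[[0-9]+-[0-9]+\]/) {
      prefix = substr($0, 1, RSTART - 1)
      suffix = substr($0, RSTART + RLENGTH)
      split(substr($0, RSTART + 1, RLENGTH - 2), bounds, "-")
      fmt = "%s%0" length(bounds[1]) "d%s\n"    # pad to the width of the lower bound
      for (i = bounds[1] + 0; i <= bounds[2] + 0; i++)
        printf fmt, prefix, i, suffix
      next
    }
    { print }                                   # no range present: pass the name through
  '
}

# Example: prints host01.domain, host02.domain, host03.domain on separate lines.
expand_host_range 'host[01-03].domain'
```

Comparing the expanded list with your inventory is a quick way to catch an off-by-one range before registration starts.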
Confirm Hosts
Confirm Hosts prompts you to confirm that Ambari has located the correct hosts for your cluster
and to check those hosts to make sure they have the correct directories, packages, and processes
required to continue the install.
If any hosts were selected in error, you can remove them by selecting the appropriate checkboxes
and clicking the grey Remove Selected button. To remove a single host, click the small white
Remove button in the Action column.
At the bottom of the screen, you may notice a yellow box that indicates some warnings were
encountered during the check process. For example, your host may have already had a copy of wget
or curl. Choose Click here to see the warnings to see a list of what was checked and what
caused the warning. The warnings page also provides access to a python script that can help you
clear any issues you may encounter and let you run Rerun Checks.
If you are deploying HDP using Ambari 1.4 or later on RHEL 6.5 you will likely see
Ambari Agents fail to register with Ambari Server during the Confirm Hosts step in
the Cluster Install wizard. Click the Failed link on the Wizard page to display the Agent
logs. The following log entry indicates the SSL connection between the Agent and
Server failed during registration:
INFO 2014-04-02 04:25:22,669 NetUtil.py:55
- Failed to connect to https://<ambari-server>:8440/cert/ca due to
[Errno 1] _ssl.c:492:
error:100AE081:elliptic curve
routines:EC_GROUP_new_by_curve_name:unknown group
For more information about this issue, see the Ambari Troubleshooting Guide.
When you are satisfied with the list of hosts, choose Next.
Choose Services
Based on the Stack chosen during Select Stack, you are presented with the choice of Services to
install into the cluster. HDP Stack comprises many services. You may choose to install any other
available services now, or to add services later. The install wizard selects all available services for
installation by default.
1
Choose none to clear all selections, or choose all to select all listed services.
2
Choose or clear individual checkboxes to define a set of services to install now.
3
After selecting the services to install now, choose Next.
Assign Masters
The Ambari install wizard assigns the master components for selected services to appropriate hosts
in your cluster and displays the assignments in Assign Masters. The left column shows services and
current hosts. The right column shows current master component assignments by host, indicating
the number of CPU cores and amount of RAM installed on each host.
1
To change the host assignment for a service, select a host name from the drop-down menu
for that service.
2
To remove a ZooKeeper instance, click the green minus icon next to the host address you
want to remove.
3
When you are satisfied with the assignments, choose Next.
Assign Slaves and Clients
The Ambari installation wizard assigns the slave components (DataNodes, NodeManagers, and
RegionServers) to appropriate hosts in your cluster. It also attempts to select hosts for installing the
appropriate set of clients.
1
Use all or none to select all of the hosts in the column or none of the hosts, respectively.
If a host has a red asterisk next to it, that host is also running one or more master
components. Hover your mouse over the asterisk to see which master components are on
that host.
2
Fine-tune your selections by using the checkboxes next to specific hosts.
As an option you can start the HBase REST server manually after the install process is
complete. It can be started on any host that has the HBase Master or the Region
Server installed. If you attempt to start it on the same host as the Ambari server,
however, you need to start it with the -p option, as its default port is 8080 and that
conflicts with the Ambari Web default port.
/usr/lib/hbase/bin/hbase-daemon.sh start rest -p
<custom_port_number>
3
When you are satisfied with your assignments, choose Next.
Customize Services
Customize Services presents you with a set of tabs that let you manage configuration settings for
HDP components. The wizard sets reasonable defaults for each of the options here, but you can use
this set of tabs to tweak those settings. You are strongly encouraged to do so, as your requirements
may be slightly different. Pay particular attention to the directories suggested by the installer.
To prevent out-of-memory errors during the install, at the Customize Services step in
the Cluster Install wizard browse to Hive > hive-site.xml, then modify the
following configuration settings:
Property Name                 Purpose                           Default Value    Required Value
fs.hdfs.impl.disable.cache    Disable HDFS filesystem cache     false            true
fs.file.impl.disable.cache    Disable local filesystem cache    false            true
For the HDFSServicesConfigsGeneral configuration property, make sure to enter
an integer value, in bytes, that sets the HDFS maximum edit log size for checkpointing.
A typical value is 500000000.
Hover your cursor over each of the properties to see a brief description of what it does. The number
of tabs you see is based on the type of installation you have decided to do. A typical installation has
at least ten groups of configuration properties and other related options, such as database settings
for Hive/HCat and Oozie, admin name/password, and alert email for Nagios.
The install wizard sets reasonable defaults for all properties. You must provide database passwords
for the Hive, Nagios, and Oozie services, the Master Secret for Knox, and a valid email address to
which system alerts will be sent. Select each service that displays a number highlighted red. Then, fill
in the required field on the Service Config tab. Repeat this until the red flags disappear.
For example, choose Hive. Expand the Hive Metastore section, if necessary. In Database Password,
provide a password, then retype to confirm it, in the fields marked red and "This is required."
For more information about customizing specific services for a particular HDP Stack, see
Customizing HDP Services.
After you complete Customizing Services, choose Next.
Review
The assignments you have made are displayed. Check to make sure everything is correct. If you
need to make changes, use the left navigation bar to return to the appropriate screen.
To print your information for later reference, choose Print.
When you are satisfied with your choices, choose Deploy.
Install, Start and Test
The progress of the install displays on the screen. Ambari installs, starts, and runs a simple test on
each component. Overall status of the process displays in the progress bar at the top of the screen and
host-by-host status displays in the main section. Do not refresh your browser during this process.
Refreshing the browser may interrupt the progress indicators.
To see specific information on what tasks have been completed per host, click the link in the
Message column for the appropriate host. In the Tasks pop-up, click the individual task to see the
related log files. You can select filter conditions by using the Show drop-down list. To see a larger
version of the log contents, click the Open icon. To copy the contents to the clipboard, use the
Copy icon.
When Successfully installed and started the services appears, choose Next.
Complete
The Summary page provides you a summary list of the accomplished tasks. Choose Complete.
Ambari Web GUI displays.
Upgrading Ambari
Ambari and the HDP Stack being managed by Ambari can be upgraded independently. This guide
provides information on:
•
Upgrading to Ambari 2.0
•
Planning for Ambari Alerts and Metrics in Ambari 2.0
•
Upgrading Ambari with Kerberos-Enabled Cluster
•
Upgrade HDP Stack from HDP 2.1 to 2.2
•
Upgrade HDP Stack from HDP 2.0 to 2.2
•
Automated HDP Stack Upgrade: HDP 2.2.0 to 2.2.4
•
Manual HDP Stack Upgrade: HDP 2.2.0 to 2.2.4
Ambari 2.0 does not include support for managing HDP 1.3 Stack. For more
information, see the Stack Compatibility Matrix. If you are using Ambari to manage an
HDP 1.3 Stack, prior to upgrading to Ambari 2.0, you must first upgrade your
Stack to HDP 2.0 or later. For more information about upgrading HDP 1.3 Stack to
HDP 2.0 or later, see the Ambari 1.7.0 upgrade instructions. Then, return to this guide
and perform your upgrade to Ambari 2.0.
Ambari 2.0 Upgrade Guide
Upgrading to Ambari 2.0
Use this procedure to upgrade Ambari 1.4.1 through 1.7.0 to Ambari 2.0.0. If your current Ambari
version is 1.4.1 or below, you must upgrade the Ambari Server version to 1.7.0 before upgrading to
version 2.0.0. Upgrading Ambari version does not change the underlying HDP Stack being managed
by Ambari.
Before Upgrading Ambari to 2.0.0, make sure that you perform the following actions:
•
You must have root, administrative, or root-equivalent authorization on the Ambari server
host and all servers in the cluster.
•
You must know the location of the Nagios server before you begin the upgrade process.
•
You must know the location of the Ganglia server before you begin the upgrade process.
•
You must back up the Ambari Server database.
•
You must make a safe copy of the Ambari Server configuration file found at /etc/ambari-server/conf/ambari.properties.
•
Plan to remove Nagios and Ganglia from your cluster and replace with Ambari
Alerts and Metrics. For more information, see Planning for Ambari Alerts and Metrics in
Ambari 2.0.
•
If you have a Kerberos-enabled cluster, you must review Upgrading Ambari with Kerberos-Enabled Cluster and be prepared to perform the required post-upgrade steps.
•
If you are using Ambari with Oracle, you must create an Ambari user in the Oracle database
and grant that user all required permissions. Specifically, you must alter the Ambari database
user and grant the SEQUENCE permission.
For more information about creating users and granting required user permissions, see Using
Ambari with Oracle.
•
If you plan to upgrade your HDP Stack, back up the configuration properties for your current
Hadoop services.
For more information about upgrading the Stack and locating the configuration files for your
current services, see one of the following topics:
•
Upgrade from HDP 2.1 to HDP 2.2, Getting Ready to Upgrade
•
Upgrade from HDP 2.0 to HDP 2.2, Getting Ready to Upgrade
1
Stop the Nagios and Ganglia services. In Ambari Web:
a
Browse to Services and select the Nagios service.
b
Use Service Actions to stop the Nagios service.
c
Wait for the Nagios service to stop.
d
Browse to Services and select the Ganglia service.
e
Use Service Actions to stop the Ganglia service.
f
Wait for the Ganglia service to stop.
2
Stop the Ambari Server. On the Ambari Server host,
ambari-server stop
3
Stop all Ambari Agents. On each Ambari Agent host,
ambari-agent stop
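With the server and agents stopped, this is a convenient point to take the safe copy of ambari.properties that the prerequisites call for, if you have not already done so. The sketch below is one way to make a timestamped copy. The function name backup_config_file and the backup directory in the usage comment are hypothetical.

```shell
# Copy a configuration file into a backup directory under a timestamped
# name, preserving mode and modification time, and print the new path.
backup_config_file() {
  local src="$1" backup_dir="$2"
  local stamp dest
  stamp="$(date +%Y%m%d-%H%M%S)"
  dest="$backup_dir/$(basename "$src").$stamp"
  mkdir -p "$backup_dir" || return 1
  cp -p "$src" "$dest" || return 1
  printf '%s\n' "$dest"
}

# Intended use on the Ambari Server host (backup directory is an assumption):
#   backup_config_file /etc/ambari-server/conf/ambari.properties /root/ambari-backup
```

This covers only the configuration file. The Ambari Server database still needs its own backup using the tools of your database.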
4
Fetch the new Ambari repo and replace the old repository file with the new repository file on
all hosts in your cluster.
Check your current directory before you download the new repository file to make sure
that there are no previous versions of the ambari.repo file. If you do not, and a
previous version exists, the new download will be saved with a numeric extension,
such as ambari.repo.1. Make sure that the version you copy is the new version.
Select the repository appropriate for your environment from the following list:
•
For RHEL/CentOS 6/Oracle Linux 6:
•
For SLES 11:
•
For Ubuntu 12:
/etc/apt/sources.list.d/ambari.list
•
For RHEL/CentOS 5/Oracle Linux 5: (DEPRECATED)
If your cluster does not have access to the Internet, set up a local repository with this
data before you continue. See Using a Local Repository for more information.
Ambari Server does not automatically turn off iptables. Check that your installation
setup does not depend on iptables being disabled. After upgrading the server, you
must either disable iptables manually or make sure that you have appropriate ports
available on all cluster hosts. For more information about ports, see Configuring
Network Port Numbers.
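The earlier caution about leftover copies of ambari.repo in the download directory can be checked before you fetch the new file. The following sketch lists any file whose name begins with ambari.repo, including numbered copies such as ambari.repo.1. The function name find_stale_repo_files is hypothetical.

```shell
# List files in a directory whose names start with the given repo file
# name (for example ambari.repo and ambari.repo.1). Returns 0 when at
# least one is found and 1 when the directory is clean.
find_stale_repo_files() {
  local dir="$1" name="${2:-ambari.repo}" found=1 f
  for f in "$dir"/"$name"*; do
    [ -e "$f" ] || continue        # an unmatched glob stays literal; skip it
    printf '%s\n' "$f"
    found=0
  done
  return "$found"
}

# Check the current directory before downloading:
#   if find_stale_repo_files .; then echo "move these aside first"; fi
```

On Ubuntu hosts, pass ambari.list as the second argument.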
5
Upgrade Ambari Server. On the Ambari Server host:
•
For RHEL/CentOS/Oracle Linux:
yum clean all
yum upgrade ambari-server ambari-log4j
•
For SLES:
zypper clean
zypper up ambari-server ambari-log4j
•
For Ubuntu:
apt-get clean all
apt-get install ambari-server ambari-log4j
When performing upgrade on SLES, you will see a message "There is an update
candidate for 'ambari-server', but it is from different vendor. Use 'zypper install ambari-server-2.0.0-101.noarch' to install this candidate". You will need to use yast to
update the package, as follows:
1
From the command line, run:
> yast
You will see the command line UI for the YaST program.
2
Choose Software > Software Management, then press Enter.
3
In the Search Phrase field, enter ambari-server, then press Enter.
4
On the right side you will see the search result ambari-server 2.0.0.
Click Actions, choose Update, then press Enter.
5
Go to Accept, and press Enter.
6
Check for upgrade success by noting progress during the Ambari server installation process
you started in Step 5.
•
As the process runs, the console displays output similar, although not identical, to
the following:
Setting up Upgrade Process
Resolving Dependencies
--> Running transaction check
---> Package ambari-log4j.noarch 0:1.7.0.169-1 will be updated ...
---> Package ambari-log4j.noarch 0:2.0.0.1129-1 will be an update ...
---> Package ambari-server.noarch 0:1.7.0-169 will be updated ...
---> Package ambari-log4j.noarch 0:2.0.0.1129 will be an update ...
•
If the upgrade fails, the console displays output similar to the following:
Setting up Upgrade Process
No Packages marked for Update
•
A successful upgrade displays the following output:
Updated: ambari-log4j.noarch 0:2.0.0.111-1 ambari-server.noarch 0:2.0.0-111
Complete!
Confirm there is only one ambari-server*.jar file in /usr/lib/ambari-server. If there
is more than one JAR file with name ambari-server*.jar, move all JARs except ambari-server-2.0.0.*.jar to /tmp before proceeding with upgrade.
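The JAR check can be scripted so that only the unexpected files are listed for review. This is a minimal sketch under the naming pattern given above. The function name list_extra_server_jars is hypothetical, and the sketch only prints candidates; it does not move anything.

```shell
# Print every ambari-server*.jar in a directory that does not match the
# expected 2.0.0 name, so the extras can be reviewed and moved aside.
list_extra_server_jars() {
  local dir="$1" jar
  for jar in "$dir"/ambari-server*.jar; do
    [ -e "$jar" ] || continue                 # no JARs at all: nothing to report
    case "$(basename "$jar")" in
      ambari-server-2.0.0.*.jar) ;;           # the JAR to keep
      *) printf '%s\n' "$jar" ;;              # candidate to move to /tmp
    esac
  done
}

# On the Ambari Server host, review the list before moving anything:
#   list_extra_server_jars /usr/lib/ambari-server
```

An empty result means the directory holds at most the 2.0.0 JAR and the upgrade can proceed.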
7
On the Ambari Server host:
If ambari-agent is also installed on this host, first run "yum upgrade ambari-agent" (or the
equivalent on other operating systems).
Then upgrade the server database schema by running:
ambari-server upgrade
8
Upgrade the Ambari Agent on each host. On each Ambari Agent host:
For RHEL/CentOS/Oracle Linux:
yum upgrade ambari-agent ambari-log4j
For SLES:
zypper up ambari-agent ambari-log4j
Ignore the warning that begins with "There are some running programs that use files
deleted by recent upgrade".
When performing upgrade on SLES, you will see a message "There is an update
candidate for 'ambari-agent', but it is from different vendor. Use 'zypper install ambari-agent-2.0.0-101.noarch' to install this candidate". You will need to use yast to
update the package, as follows:
1
From the command line, run:
> yast
You will see the command line UI for the YaST program.
2
Choose Software > Software Management, then press Enter.
3
In the Search Phrase field, enter ambari-agent, then press Enter.
4
On the right side you will see the search result ambari-agent 2.0.0. Click
Actions, choose Update, then press Enter.
5
Go to Accept, and press Enter.
For Ubuntu:
apt-get update
apt-get install ambari-agent ambari-log4j
9
After the upgrade process completes, check each host to make sure the new 2.0.0 files have
been installed:
rpm -qa | grep ambari
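On hosts with many packages it is easy to overlook one that was not upgraded. The sketch below reads the package list from standard input and reports any Ambari package whose version does not begin with the expected prefix. The function name check_ambari_versions is hypothetical, and the sketch assumes RPM-style names of the form ambari-<component>-<version>, so it applies to RHEL, CentOS, and SLES hosts rather than Ubuntu.

```shell
# Read package names (one per line, as printed by `rpm -qa | grep ambari`)
# on stdin and print any whose version does not start with the expected
# prefix. Returns 0 only when every Ambari package matches.
check_ambari_versions() {
  local expected="$1" pkg status=0
  while IFS= read -r pkg; do
    case "$pkg" in
      ambari-*-"$expected"*) ;;                               # version as expected
      ambari-*) printf 'unexpected: %s\n' "$pkg"; status=1 ;; # old or unknown version
    esac
  done
  return "$status"
}

# On an RPM-based host:
#   rpm -qa | grep ambari | check_ambari_versions 2.0.0
```

A non-zero exit status makes the check easy to use from a loop over all cluster hosts.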
10
Start the Ambari Server. On the Ambari Server host:
ambari-server start
11
Start the Ambari Agents on all hosts. On each Ambari Agent host:
ambari-agent start
12
Open Ambari Web.
Point your browser to http://<your.ambari.server>:8080
where <your.ambari.server> is the name of your Ambari server host. For example,
c6401.ambari.apache.org.
Refresh your browser so that it loads the new version of the Ambari Web code. If you
have problems, clear your browser cache manually, then restart Ambari Server.
13 Log in, using the Ambari administrator credentials that you have set up.
For example, the default name/password is admin/admin.
14 If you have customized logging properties, you will see a Restart indicator next to each
service name after upgrading to Ambari 2.0.0.
Restarting a service pushes the configuration properties displayed in Custom
log4j.properties to each host running components for that service.
To preserve any custom logging properties after upgrading, for each service:
1
Replace default logging properties with your custom logging properties, using
Service Configs > Custom
log4j.properties.
2
Restart all components in any services for which you have customized logging
properties.
15 Review the HDP-UTILS repository Base URL setting in Ambari.
If you are upgrading from Ambari 1.6.1 or earlier, the HDP-UTILS repository Base URL is no
longer set in the ambari.repo file.
If using HDP 2.2 Stack:
•
Browse to Ambari Web > Admin > Stack and Versions.
•
Click on the Versions tab.
•
You will see the current installed HDP Stack version displayed.
•
Click the Edit repositories icon in the upper-right of the version display and confirm
the value of the HDP-UTILS repository Base URL is correct for your environment.
•
If you are using a local repository for HDP-UTILS, be sure to confirm the Base URL is
correct for your locally hosted HDP-UTILS repository.
If using HDP 2.0 or 2.1 Stack:
•
Browse to Ambari Web > Admin > Stack and Versions.
•
Under the Services table, the current Base URL settings are displayed.
•
Confirm the value of the HDP-UTILS repository Base URL is correct for your
environment or click the Edit button to modify the HDP-UTILS Base URL.
•
If you are using a local repository for HDP-UTILS, be sure to confirm the Base URL is
correct for your locally hosted HDP-UTILS repository.
16 If using HDP 2.2 Stack, you must get the cluster hosts to advertise the "current version". To
do this, restart a master or slave component (such as a DataNode) on each host so that the
host advertises its version and Ambari can record it. For example, in
Ambari Web, navigate to the Hosts page and select any Host that has the DataNode
component, then restart that DataNode component on that single host.
17 If you have configured Ambari to authenticate against an external LDAP or Active Directory,
review your Ambari LDAP authentication settings. You must re-run "ambari-server setup-ldap". For more information, see Set Up LDAP or Active Directory Authentication.
18 If you have configured your cluster for Hive or Oozie with an external database (Oracle,
MySQL or PostgreSQL), you must re-run “ambari-server setup --jdbc-db and --jdbc-driver” to
get the JDBC driver JAR file in place. For more information, see Using Non-Default
Databases - Hive and Using Non-Default Databases - Oozie.
19 Adjust your cluster for Ambari Alerts and Metrics. For more information, see Planning for
Ambari Alerts and Metrics in Ambari 2.0.
20 Adjust your cluster for Kerberos (if already enabled). For more information, see Upgrading
Ambari with Kerberos-Enabled Cluster.
Planning for Ambari Alerts and Metrics in Ambari 2.0
As part of Ambari 2.0, Ambari includes built-in systems for alerting and metrics collection. Therefore,
when upgrading to Ambari 2.0, the legacy Nagios and Ganglia services must be removed and
replaced with the new systems.
We highly recommend that you perform and validate this procedure in a test
environment prior to attempting the Ambari upgrade on production systems.
Moving from Nagios to Ambari Alerts
After upgrading to Ambari 2.0, the Nagios service will be removed from the cluster. The Nagios server
and packages will remain on the existing installed host but Nagios itself is removed from Ambari
management.
Nagios used the operating system sendmail utility to dispatch email alerts on changes.
With Ambari Alerts, the email dispatch is handled from the Ambari Server via Javamail.
Therefore, you must provide SMTP information to Ambari for sending email alerts.
Have this information ready. You will use it after the Ambari 2.0 upgrade to get Ambari
Alert email notifications configured in the new Ambari Alerts system.
The Ambari Alerts system is configured automatically to replace Nagios but you must:
1
Configure email notifications in Ambari to handle dispatch of alerts. Browse to Ambari Web
> Alerts.
2
In the Actions menu, select Manage Notifications.
3
Click to Create a new Notification. Enter information about the SMTP host and port and the to
and from email addresses, then select the Alerts to receive notifications.
4
Click Save.
(Optional) Remove the Nagios packages (nagios, nagios-www) from the Nagios host.
For more information about Ambari Alerts, see Managing Alerts in the Ambari User’s Guide.
Moving from Ganglia to Ambari Metrics
After upgrading to Ambari 2.0, the Ganglia service stays intact in the cluster. You must perform the
following steps to remove Ganglia from the cluster and to move to the new Ambari Metrics system.
•
If you are using HDP 2.2 Stack, Storm metrics will not work with Ambari Metrics
until you are upgraded to HDP 2.2.4 or later.
•
Do not add the Ambari Metrics service to your cluster until you have removed
Ganglia using the steps below.
1
Stop Ganglia service via Ambari Web.
2
Using the Ambari REST API, remove the Ganglia service by executing the following:
curl -u <admin_user_name>:<admin_password> -H 'X-Requested-By:ambari' -X DELETE 'http://<ambari_server_host>:8080/api/v1/clusters/<cluster_name>/services/GANGLIA'
3
Refresh Ambari Web and make sure that Ganglia service is no longer visible.
4
In the Actions menu on the left beneath the list of Services, use the "Add Service" wizard to
add Ambari Metrics to the cluster.
5
This will install an Ambari Metrics Collector into the cluster, and an Ambari Metrics
Monitor on each host.
6
Pay careful attention to the following service configurations:
Section: Advanced ams-hbase-site
Property: hbase.rootdir
Description: Ambari Metrics service uses HBase as its default storage backend. Set the rootdir for HBase to
either a local filesystem path, if using Ambari Metrics in embedded mode, or to an HDFS directory.
For example: hdfs://namenode.example.org:8020/amshbase.
Default Value: file:///var/lib/ambari-metrics-collector/hbas
7
For the cluster services to start sending metrics to Ambari Metrics, restart all services. For
example, restart HDFS, YARN, HBase, Flume, Storm and Kafka.
(Optional) Remove the Ganglia packages (ganglia-gmetad and ganglia-gmond) from
the hosts.
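The REST call in step 2 above can also be issued from a script. The following is a minimal Python 3 sketch, not part of Ambari itself: the function name and the example host and cluster names are hypothetical, and it only builds the same DELETE request that the curl command sends (same URL shape, same X-Requested-By header, HTTP basic authentication).

```python
import base64
import urllib.request


def build_service_delete_request(server_host, cluster_name, service_name,
                                 user, password, port=8080):
    """Build (but do not send) a DELETE request for a service in an Ambari cluster."""
    url = "http://{0}:{1}/api/v1/clusters/{2}/services/{3}".format(
        server_host, port, cluster_name, service_name)
    request = urllib.request.Request(url, method="DELETE")
    # Same header that the curl command in step 2 passes with -H.
    request.add_header("X-Requested-By", "ambari")
    # Equivalent of curl's -u user:password (HTTP basic authentication).
    token = base64.b64encode("{0}:{1}".format(user, password).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


if __name__ == "__main__":
    # Hypothetical host and cluster names, for illustration only.
    req = build_service_delete_request(
        "ambari.example.org", "mycluster", "GANGLIA", "admin", "admin")
    print(req.get_method(), req.full_url)
```

To actually remove the service, the returned request would be passed to urllib.request.urlopen. Stop the Ganglia service first, as described in step 1.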
If you are managing a HDP 2.2 cluster that includes Kafka, you must adjust the Kafka
configuration to send metrics to the Ambari Metrics system.
From Ambari Web, browse to Services > Kafka > Configs and edit the kafka-env
template found under Advanced kafka-env to include the following:
# Add kafka sink to classpath and related dependencies
if [ -e "/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar" ]; then
export CLASSPATH=$CLASSPATH:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar
export CLASSPATH=$CLASSPATH:/usr/lib/ambari-metrics-kafka-sink/lib/*
fi
Upgrading Ambari with Kerberos-Enabled Cluster
If you are upgrading to Ambari 2.0 from an Ambari-managed cluster that is already Kerberos enabled,
you need to perform the following steps after the Ambari upgrade because of the new Ambari 2.0
Kerberos features.
1
Review the procedure for Configuring Ambari and Hadoop for Kerberos in the Ambari
Security Guide.
2
Have your Kerberos environment information readily available, including your KDC Admin
account credentials.
3
Take note of current Kerberos security settings for your cluster.
a
Browse to Services > HDFS > Configs.
b
Record the core-site auth-to-local property value.
4
Upgrade Ambari according to the steps in Upgrading to Ambari 2.0.
5
Ensure your cluster and the Services are healthy.
6
Browse to Admin > Kerberos and you’ll notice Ambari thinks that Kerberos is not enabled.
Run the Enable Kerberos Wizard, following the instructions in the Ambari Security Guide.
7
Ensure your cluster and the Services are healthy.
8
Verify the Kerberos security settings for your cluster are correct.
•
Browse to Services > HDFS > Configs.
•
Check the core-site auth-to-local property value.
•
Adjust as necessary, based on the pre-upgrade value recorded in Step 3.
Upgrading the HDP Stack from 2.1 to 2.2
The HDP Stack is the coordinated set of Hadoop components that you have installed on hosts in
your cluster. Your set of Hadoop components and hosts is unique to your cluster. Before upgrading
the Stack on your cluster, review all Hadoop services and hosts in your cluster. For example, use the
Hosts and Services views in Ambari Web, which summarize and list the components installed on
each Ambari host, to determine the components installed on each host. For more information about
using Ambari to view components in your cluster, see Working with Hosts, and Viewing Components
on a Host.
Upgrading the HDP Stack is a three-step procedure:
1
Prepare the 2.1 Stack for Upgrade
2
Upgrade the 2.1 Stack to 2.2
3
Complete the Upgrade of the 2.1 Stack to 2.2
If you plan to upgrade your existing JDK, do so after upgrading Ambari, before
upgrading the Stack. The upgrade steps require that you remove HDP v2.1
components and install HDP v2.2.0 components.
As noted in that section, you should remove and install on each host, only the
components on each host that you want to run on the HDP 2.2.0 stack. For example, if
you want to run Storm or Falcon components on the HDP 2.2.0 stack, you will install
those components and then configure their properties during the upgrade procedure.
In preparation for future HDP 2.2 releases to support rolling upgrades, the HDP RPM package
version naming convention has changed to include the HDP 2.2 product version in file and directory
names. HDP 2.2 marks the first release where HDP rpms, debs, and directories contain versions in
the names to permit side-by-side installations of later HDP releases. To transition between previous
releases and HDP 2.2, Hortonworks provides hdp-select, a script that symlinks your directories to
hdp/current and lets you maintain using the same binary and configuration paths that you were
using before.
The following instructions have you remove your older version HDP components, install hdp-select,
and install HDP 2.2 components to prepare for rolling upgrade.
Use this procedure for upgrading from HDP 2.1 to any of the HDP 2.2 maintenance
releases, for example, HDP 2.2.4. The instructions in this document refer to HDP
2.2.x.0 as a placeholder. To use an HDP 2.2.x.0 maintenance release, be sure to
replace 2.2.x.0 in the following instructions with the appropriate maintenance version,
such as 2.2.0.0 for the HDP 2.2 GA release, or 2.2.4.2 for an HDP 2.2 maintenance
release.
Refer to the HDP documentation for the information about the latest HDP 2.2
maintenance releases.
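Because the 2.2.x.0 placeholder appears in many commands, both in dotted form (paths such as /usr/hdp/2.2.x.0-...) and in underscored form (package names such as hadoop_2_2_x_0_*), substituting it by hand is error-prone. The following is a minimal Python sketch of the substitution; the function name is hypothetical and the helper is not shipped with Ambari or HDP.

```python
import re


def substitute_hdp_version(text, version):
    """Replace the 2.2.x.0 placeholder (dotted or underscored) with a real version."""
    if not re.fullmatch(r"2\.2\.\d+\.\d+", version):
        raise ValueError("expected an HDP 2.2 version such as 2.2.4.2")
    underscored = version.replace(".", "_")
    # Dotted form is used in paths and URLs, underscored form in package names.
    return text.replace("2.2.x.0", version).replace("2_2_x_0", underscored)


if __name__ == "__main__":
    print(substitute_hdp_version('yum install "hadoop_2_2_x_0_*"', "2.2.4.2"))
```

This sketch does not handle the GA versus updates portion of the repository URLs, and it leaves the 2.2.x.x form used in the hdp-select commands untouched; adjust those by hand.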
To prepare for upgrading the HDP Stack, perform the following tasks:
•
Disable Security.
If your Stack has Kerberos Security turned on, disable Kerberos before performing
the Stack upgrade. On Ambari Web UI > Admin > Security, click Disable
Kerberos. You can re-enable Kerberos Security after performing the upgrade.
•
Checkpoint user metadata and capture the HDFS operational state.
This step supports rollback and restore of the original state of HDFS data, if necessary.
•
Backup Hive and Oozie metastore databases.
This step supports rollback and restore of the original state of Hive and Oozie data, if
necessary.
•
Stop all HDP and Ambari services.
•
Make sure to finish all current jobs running on the system before upgrading the stack.
Libraries will change during the upgrade. Any jobs remaining active that use the older
version libraries will probably fail during the upgrade.
1
Using Ambari Web, browse to Services. Go through each service except HDFS and ZooKeeper and,
in the Service Actions menu, select Stop All.
2
Stop any client programs that access HDFS.
Perform steps 3 through 8 on the NameNode host. In a highly-available NameNode
configuration, execute the following procedure on the primary NameNode.
To locate the primary NameNode in an Ambari-managed HDP cluster, browse Ambari
Web > Services > HDFS. In Summary, click NameNode. Hosts > Summary
displays the host name FQDN.
3
If HDFS is in a non-finalized state from a prior upgrade operation, you must finalize HDFS
before upgrading further. Finalizing HDFS will remove all links to the metadata of the prior
HDFS version. Do this only if you do not want to rollback to that prior HDFS version.
On the NameNode host, as the HDFS user,
su -l <HDFS_USER>
hdfs dfsadmin -finalizeUpgrade
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
4
Check the NameNode directory to ensure that there is no snapshot of any prior HDFS
upgrade.
Specifically, using Ambari Web > HDFS > Configs > NameNode, examine the
<dfs.namenode.name.dir> or the <dfs.name.dir> directory in the NameNode
Directories property. Make sure that only a "current" directory and no "previous" directory
exists on the NameNode host.
5
Create the following logs and other files.
Creating these logs allows you to check the integrity of the file system, post-upgrade.
As the HDFS user,
su -l <HDFS_USER>
1
Run fsck with the following flags and send the results to a log.
The resulting file contains a complete block map of the file system. You use this log
later to confirm the upgrade.
hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
2
Optional: Capture the complete namespace of the file system.
The following command does a recursive listing of the root file system:
hadoop dfs -ls -R / > dfs-old-lsr-1.log
3
Create a list of all the DataNodes in the cluster.
hdfs dfsadmin -report > dfs-old-report-1.log
4
Optional: Copy all unrecoverable data stored in HDFS to a local file system or to a
backup instance of HDFS.
6
Save the namespace.
You must be the HDFS service user to do this and you must put the cluster in Safe Mode.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
In a highly-available NameNode configuration, the command hdfs dfsadmin -saveNamespace sets a checkpoint in the first NameNode specified in the
configuration, in dfs.ha.namenodes.[nameservice ID]. You can also use the
dfsadmin -fs option to specify which NameNode to connect to.
For example, to force a checkpoint in namenode 2:
hdfs dfsadmin -fs hdfs://namenode2-hostname:namenode2-port -saveNamespace
7
Copy the checkpoint files located in <dfs.name.dir>/current into a backup directory.
Find the directory, using Ambari Web > HDFS > Configs > NameNode > NameNode
Directories on your primary NameNode host.
In a highly-available NameNode configuration, the location of the checkpoint depends
on where the saveNamespace command is sent, as defined in the preceding step.
8
Store the layoutVersion for the NameNode.
Make a copy of the file at <dfs.name.dir>/current/VERSION, where <dfs.name.dir>
is the value of the config parameter NameNode directories. This file will be used later to verify
that the layout version is upgraded.
9
Stop HDFS.
10 Stop ZooKeeper.
11 Using Ambari Web > Services > <service.name> > Summary, review each service and
make sure that all services in the cluster are completely stopped.
12 At the Hive Metastore database host, stop the Hive metastore service, if you have not done
so already.
Make sure that the Hive metastore database is running. For more information about
Administering the Hive metastore database, see the Hive Metastore Administrator
documentation.
13 If you are upgrading Hive and Oozie, back up the Hive and Oozie metastore databases on the
Hive and Oozie database host machines, respectively.
Make sure that your Hive database is updated to the minimum recommended version.
If you are using Hive with MySQL, we recommend upgrading your MySQL
database to version 5.6.21 before upgrading the HDP Stack to v2.2.x.
For specific information, see Database Requirements.
1
Optional - Back up the Hive Metastore database.
These instructions are provided for your convenience. Please check your database
documentation for the latest back up instructions.
Table 15. Hive Metastore Database Backup and Restore
Database Type: MySQL
Backup: mysqldump <dbname> > <outputfilename.sql>
For example: mysqldump hive > /tmp/mydir/backup_hive.sql
Restore: mysql <dbname> < <inputfilename.sql>
For example: mysql hive <
Database Type: Postgres
Backup: sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
For example: sudo -u postgres pg_dump hive >
Restore: sudo -u <username> psql <databasename> < <inputfilename.sql>
For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql
Database Type: Oracle
Backup: Connect to the Oracle database using sqlplus, then export the database: exp username/password@database full=yes file=output_file.dmp
Restore: Import the database: imp username/password@database file=input_file.dmp
2
Optional - Back up the Oozie Metastore database.
Table 16. Oozie Metastore Database Backup and Restore
Database Type: MySQL
Backup: mysqldump <dbname> > <outputfilename.sql>
For example: mysqldump oozie > /tmp/mydir/backup_oozie.sql
Restore: mysql <dbname> < <inputfilename.sql>
For example: mysql oozie <
Database Type: Postgres
Backup: sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
For example: sudo -u postgres pg_dump oozie >
Restore: sudo -u <username> psql <databasename> < <inputfilename.sql>
For example: sudo -u postgres psql oozie <
14 Backup Hue.
If you are using the embedded SQLite database, you must perform a backup of the database
before you upgrade Hue to prevent data loss. To make a backup copy of the database, stop
Hue, then "dump" the database content to a file, as follows:
/etc/init.d/hue stop
su $HUE_USER
mkdir ~/hue_backup
cd /var/lib/hue
sqlite3 desktop.db .dump > ~/hue_backup/desktop.bak
For other databases, follow your vendor-specific instructions to create a backup.
15 Stage the upgrade script.
a
Create an "Upgrade Folder". For example, /work/upgrade_hdp_2, on a host that
can communicate with Ambari Server. The Ambari Server host is a suitable
candidate.
b
Copy the upgrade script to the Upgrade Folder. The script is available here:
/var/lib/ambari-server/resources/scripts/upgradeHelper.py on the
Ambari Server host.
c
Copy the upgrade catalog to the Upgrade Folder.
The catalog is available here:
/var/lib/ambari-server/resources/upgrade/catalog/UpgradeCatalog_2.1_to_2.2.2.json .
d
Make sure that Python is available on the host and that the version is 2.6 or higher:
python --version
For RHEL/CentOS/Oracle Linux 5, you must use Python 2.6.
16 Backup current configuration settings.
a
Go to the Upgrade Folder you just created in step 15.
b
Execute the backup-configs action:
python upgradeHelper.py --hostname <HOSTNAME> --user <USERNAME> --password <PASSWORD> --clustername <CLUSTERNAME> backup-configs
Where
<HOSTNAME> is the name of the Ambari Server host
<USERNAME> is the admin user for Ambari Server
<PASSWORD> is the password for the admin user
<CLUSTERNAME> is the name of the cluster
This step produces a set of files named TYPE_TAG, where TYPE is the configuration
type and TAG is the tag. These files contain copies of the various configuration
settings for the current (pre-upgrade) cluster. You can use these files as a reference
later.
17 On the Ambari Server host, stop Ambari Server and confirm that it is stopped.
ambari-server stop
18 Stop all Ambari Agents.
On every host in your cluster known to Ambari,
ambari-agent stop
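Step 8 above keeps a copy of the NameNode VERSION file so that the layout version can be compared after the upgrade. The following is a minimal Python sketch of reading that value, assuming the file is plain key=value text with optional # comment lines; the function name and the sample values are illustrative only.

```python
def read_layout_version(version_file_text):
    """Return the layoutVersion value from the text of a VERSION file as an int."""
    for line in version_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "layoutVersion":
            return int(value.strip())
    raise ValueError("layoutVersion not found")


if __name__ == "__main__":
    # Made-up sample content; real files contain values specific to your cluster.
    sample = "#sample comment\nnamespaceID=12345\nlayoutVersion=-1\n"
    print(read_layout_version(sample))
```

Running the function on the saved copy and on the post-upgrade file gives two integers that can be compared directly.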
1
Upgrade the HDP repository on all hosts and replace the old repository file with the new file:
Be sure to replace GA/2.2.x.0 in the following instructions with the appropriate
maintenance version, such as GA/2.2.0.0 for the HDP 2.2 GA release, or
updates/2.2.4.2 for an HDP 2.2 maintenance release.
•
For RHEL/CentOS/Oracle Linux 6:
wget -nv http://public-repo1.hortonworks.com/HDP/centos6/2.x/GA/2.2.x.0/hdp.repo -O
•
For SLES 11 SP3:
wget -nv http://public-repo1.hortonworks.com/HDP/suse11sp3/2.x/GA/2.2.x.0/hdp.repo -O
•
For SLES 11 SP1:
wget -nv http://public-repo1.hortonworks.com/HDP/sles11sp1/2.x/GA/2.2.x.0/hdp.repo -O
•
For Ubuntu12:
wget -nv http://public-repo1.hortonworks.com/HDP/ubuntu12/2.x/GA/2.2.x.0/hdp.list -O /etc/apt/sources.list.d/HDP.list
•
For RHEL/CentOS/Oracle Linux 5: (DEPRECATED)
Make sure to download the HDP.repo file under /etc/yum.repos.d on ALL hosts.
2
Update the Stack version in the Ambari Server database.
On the Ambari Server host, use the following command to update the Stack version to HDP-2.2:
ambari-server upgradestack HDP-2.2
3
Back up the files in following directories on the Oozie server host and make sure that all files,
including *site.xml files are copied.
mkdir oozie-conf-bak
cp -R /etc/oozie/conf/* oozie-conf-bak
4
Remove the old oozie directories on all Oozie server and client hosts.
•
rm -rf /etc/oozie/conf
•
rm -rf /usr/lib/oozie/
•
rm -rf /var/lib/oozie/
5
Upgrade the Stack on all Ambari Agent hosts.
For each host, identify the HDP components installed on that host. Use Ambari Web to
view components on each host in your cluster. Based on the HDP components
installed, edit the following upgrade commands for each host to upgrade only those
components residing on that host.
For example, if you know that a host has no HBase service or client packages
installed, then you can edit the command to not include HBase, as follows:
yum install "collectd*" "gccxml*" "pig*" "hdfs*" "sqoop*"
"zookeeper*" "hive*"
If you are writing to multiple systems using a script, do not use " " with the run
command. You can use " " with pdsh -y.
•
For RHEL/CentOS/Oracle Linux:
1
On all hosts, clean the yum repository.
yum clean all
2 Remove all HDP 2.1 components that you want to upgrade.
This command un-installs the HDP 2.1 component bits. It leaves the user data and metadata, but
removes your configurations.
yum erase "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*"
"zookeeper*" "hbase*" "hive*" "tez*" "storm*" "falcon*" "flume*" "phoenix*"
"accumulo*" "mahout*" "hue*" "hdp_mon_nagios_addons"
3
Install all HDP 2.2 components that you want to upgrade.
yum install "hadoop_2_2_x_0_*" "oozie_2_2_x_0_*" "pig_2_2_x_0_*"
"sqoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hbase_2_2_x_0_*" "hive_2_2_x_0_*"
"tez_2_2_x_0_*" "storm_2_2_x_0_*" "falcon_2_2_x_0_*" "flume_2_2_x_0_*"
"phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
rpm -e --nodeps hue-shell
yum install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
4
Verify that the components were upgraded.
yum list installed | grep HDP-<old.stack.version.number>
No component file names should appear in the returned list.
•
For SLES:
1
On all hosts, clean the zypper repository.
zypper clean --all
2 Remove all HDP 2.1 components that you want to upgrade.
This command un-installs the HDP 2.1 component bits. It leaves the user data and metadata, but
removes your configurations.
zypper remove "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*"
"sqoop*" "zookeeper*" "hbase*" "hive*" "tez*" "storm*" "falcon*" "flume*"
"phoenix*" "accumulo*" "mahout*" "hue*" "hdp_mon_nagios_addons"
3
Install all HDP 2.2 components that you want to upgrade.
zypper install "hadoop\_2_2_x_0_*" "oozie\_2_2_x_0_*" "pig\_2_2_x_0_*"
"sqoop\_2_2_x_0_*" "zookeeper\_2_2_x_0_*" "hbase\_2_2_x_0_*"
"hive\_2_2_x_0_*" "tez\_2_2_x_0_*" "storm\_2_2_x_0_*" "falcon\_2_2_x_0_*"
"flume\_2_2_x_0_*" "phoenix\_2_2_x_0_*" "accumulo\_2_2_x_0_*"
"mahout\_2_2_x_0_*"
zypper install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
4
Verify that the components were upgraded.
rpm -qa | grep hdfs && rpm -qa | grep hive && rpm -qa | grep hcatalog
No component file names should appear in the returned list.
5 If any components were not upgraded, upgrade them as follows:
yast --update hdfs hcatalog hive
6
Symlink directories, using hdp-select.
To prevent version-specific directory issues for your scripts and updates, Hortonworks
provides hdp-select, a script that symlinks directories to hdp/current and modifies
paths for configuration directories.
Check that the hdp-select package is installed:
rpm -qa | grep hdp-select
You should see: hdp-select-2.2.x.0-xxxx.el6.noarch for the HDP 2.2.x
release.
If not, then run:
yum install hdp-select
Run hdp-select as root, on every node.
hdp-select set all 2.2.x.x-<$version>
where $version is the build number. For the HDP 2.2.4.2 release <$version> = 2.
7
Verify that all components are on the new version. The output of this statement should be
empty,
hdp-select status | grep -v 2\.2\.x\.x | grep -v None
8
If you are using Hue, you must upgrade Hue manually. For more information, see Configure
and Start Hue.
9
On the Hive Metastore database host, stop the Hive Metastore service, if you have not done
so already. Make sure that the Hive Metastore database is running.
10 Upgrade the Hive metastore database schema from v13 to v14, using the following
instructions:
•
Set java home:
export JAVA_HOME=/path/to/java
•
Copy (rewrite) old Hive configurations to new conf dir:
cp -R /etc/hive/conf.server/* /etc/hive/conf/
•
Copy jdbc connector to /usr/hdp/<$version>/hive/lib, if it is not already in
that location.
•
<HIVE_HOME>/bin/schematool -upgradeSchema -dbType <databaseType>
where <HIVE_HOME> is the Hive installation directory.
For example, on the Hive Metastore host:
/usr/hdp/2.2.x.0-<$version>/hive/bin/schematool -upgradeSchema -dbType
<databaseType>
where <$version> is the 2.2.x build number and <databaseType> is derby, mysql,
oracle, or postgres.
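The hdp-select status check in step 7 above can also be scripted. The following is a minimal Python sketch that mirrors the two grep -v filters on captured output: it keeps only lines that mention neither the target version nor None. The function name is hypothetical, and the sample lines are an assumption about what the status output looks like, not a guaranteed format.

```python
def lines_not_on_version(status_output, version):
    """Return non-empty lines that contain neither the version string nor 'None'."""
    return [line for line in status_output.splitlines()
            if line.strip() and version not in line and "None" not in line]


if __name__ == "__main__":
    # Assumed sample output, for illustration only.
    sample = (
        "hadoop-client - 2.2.4.2-2\n"
        "hive-server2 - 2.1.7.0-784\n"
        "accumulo-client - None\n"
    )
    for line in lines_not_on_version(sample, "2.2.4.2"):
        print("still on an old version:", line)
```

An empty result corresponds to the empty grep output that step 7 expects.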
1
Start Ambari Server.
On the Ambari Server host,
ambari-server start
2
Start all Ambari Agents.
At each Ambari Agent host,
ambari-agent start
3
Update the repository Base URLs in Ambari Server for the HDP-2.2 stack.
Browse to Ambari Web > Admin > Repositories, then set the values for the HDP and
HDP-UTILS repository Base URLs. For more information about viewing and editing repository
Base URLs, see Viewing Cluster Stack Version and Repository URLs.
For a remote, accessible, public repository, the HDP and HDP-UTILS Base URLs are
the same as the baseurl= values in the HDP.repo file downloaded in Upgrade the 2.1
Stack to 2.2: Step 1. For a local repository, use the local repository Base URL that you
configured for the HDP Stack. For links to download the HDP repository files for your
version of the Stack, see HDP Stack Repositories.
4
Update the respective configurations.
a
Go to the Upgrade Folder you created when Preparing the 2.1 Stack for Upgrade.
b
Execute the update-configs action:
python UpgradeHelper_HDP2.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=$FROMSTACK --toStack=$TOSTACK --upgradeCatalog=$UPGRADECATALOG update-configs [configuration item]
Where
<FROMSTACK> is the version number of the pre-upgrade stack, for example 2.1
<TOSTACK> is the version number of the upgraded stack, for example 2.2.x
<UPGRADECATALOG> is the path to the upgrade catalog file, for example
UpgradeCatalog_2.1_to_2.2.x.json
For example,
To update all configuration items:
python UpgradeHelper_HDP2.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs
To update configuration item hive-site:
python UpgradeHelper_HDP2.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs hive-site
5
Using the Ambari Web UI, add the Tez service if it has not been installed already. For more
information about adding a service, see Adding a Service.
6
Using the Ambari Web UI, add any new services that you want to run on the HDP 2.2.x stack.
You must add a Service before editing configuration properties necessary to complete the
upgrade.
7
Using the Ambari Web UI > Services, start the ZooKeeper service.
8
Copy (rewrite) old hdfs configurations to new conf directory, on all DataNode and NameNode
hosts,
cp /etc/hadoop/conf.empty/hdfs-site.xml.rpmsave /etc/hadoop/conf/hdfs-site.xml;
cp /etc/hadoop/conf.empty/hadoop-env.sh.rpmsave /etc/hadoop/conf/hadoop-env.sh;
cp /etc/hadoop/conf.empty/log4j.properties.rpmsave /etc/hadoop/conf/log4j.properties;
cp /etc/hadoop/conf.empty/core-site.xml.rpmsave /etc/hadoop/conf/core-site.xml
9
If you are upgrading from an HA NameNode configuration, start all JournalNodes.
At each JournalNode host, run the following command:
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh start journalnode"
All JournalNodes must be running when performing the upgrade, rollback, or
finalization operations. If any JournalNodes are down when running any such
operation, the operation will fail.
10 Because the file system version has now changed, you must start the NameNode manually.
On the active NameNode host, as the HDFS user,
su -l <HDFS_USER> -c "export HADOOP_LIBEXEC_DIR=/usr/hdp/2.2.x.0-<$version>/hadoop/libexec && /usr/hdp/2.2.x.0-<$version>/hadoop/sbin/hadoop-daemon.sh start namenode -upgrade"
To check if the Upgrade is progressing, check that the "previous" directory has been
created in the NameNode and JournalNode directories. The "previous" directory contains
a snapshot of the data before upgrade.
In a NameNode HA configuration, this NameNode does not enter the standby state as
usual. Rather, this NameNode immediately enters the active state, upgrades its local
storage directories, and upgrades the shared edit log. At this point, the standby
NameNode in the HA pair is still down, and not synchronized with the upgraded, active
NameNode.
To re-establish HA, you must synchronize the active and standby NameNodes. To do
so, bootstrap the standby NameNode by running the NameNode with the '-bootstrapStandby' flag. Do NOT start the standby NameNode with the '-upgrade' flag.
At the Standby NameNode,
su -l <HDFS_USER> -c "hdfs namenode -bootstrapStandby -force"
The bootstrapStandby command downloads the most recent fsimage from the active
NameNode into the <dfs.name.dir> directory on the standby NameNode.
Optionally, you can access that directory to make sure the fsimage has been
successfully downloaded. After verifying, start the ZKFailoverController, then start the
standby NameNode using Ambari Web > Hosts > Components.
11 Start all DataNodes.
At each DataNode, as the HDFS user,
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.0-<$version>/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
The NameNode sends an upgrade command to DataNodes after receiving block reports.
12 Update HDFS Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > HDFS > Configs > core-site.xml:
• Add
Name: hadoop.http.authentication.simple.anonymous.allowed
Value: true
Using Ambari Web UI > Services > HDFS > Configs > hdfs-site.xml:
• Add
Name: dfs.namenode.startup.delay.block.deletion.sec
Value: 3600
• Modify
Name: dfs.datanode.max.transfer.threads
Value: 4096
13 Restart HDFS.
1
Open the Ambari Web GUI. If the browser in which Ambari is running has been open
throughout the process, clear the browser cache, then refresh the browser.
2
Choose Ambari Web > Services > HDFS > Service Actions > Restart All.
In a cluster configured for NameNode High Availability, use the following procedure to
restart NameNodes. Using the following procedure preserves HA when upgrading the
cluster.
1 Using Ambari Web > Services > HDFS, choose Active NameNode.
This shows the host name of the current, active NameNode.
2 Write down (or copy, or remember) the host name of the active NameNode.
You need this host name for step 4.
3 Using Ambari Web > Services > HDFS > Service Actions, choose
Stop.
This stops all of the HDFS Components, including both NameNodes.
4 Using Ambari Web > Hosts, choose the host name you noted in Step 2, then
start that NameNode component, using Host Actions > Start.
This causes the original, active NameNode to re-assume its role as the active
NameNode.
5
Using Ambari Web > Services > HDFS > Service Actions, choose Restart All.
3
Choose Service Actions > Run Service Check. Make sure the service check
passes.
14 After the DataNodes are started, HDFS exits SafeMode. To monitor the status, run the
following command, on each DataNode:
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -safemode get"
When HDFS exits SafeMode, the following message displays:
Safe mode is OFF
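Rather than re-running the status command by hand, the check can be wrapped in a small polling loop. The following is a minimal sketch, not part of the Ambari tooling: the function names, the retry count, and the sleep interval are illustrative assumptions. The only string it relies on from this guide is the "Safe mode is OFF" message.

```shell
# Sketch: poll the SafeMode status until HDFS reports that SafeMode is off.
# Function names, retry count and interval are illustrative assumptions.

# Return 0 when the given status text reports that SafeMode is off.
safemode_is_off() {
  case "$1" in
    *"Safe mode is OFF"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Arguments: number of attempts, seconds between attempts, and the
# status command to run (passed as a single string and evaluated).
wait_for_safemode_off() {
  wfs_attempts="$1"
  wfs_interval="$2"
  wfs_query="$3"
  wfs_i=1
  while [ "$wfs_i" -le "$wfs_attempts" ]; do
    wfs_output="$(eval "$wfs_query" 2>/dev/null)"
    if safemode_is_off "$wfs_output"; then
      echo "HDFS left SafeMode after $wfs_i check(s)"
      return 0
    fi
    sleep "$wfs_interval"
    wfs_i=$((wfs_i + 1))
  done
  echo "HDFS is still in SafeMode after $wfs_attempts check(s)" >&2
  return 1
}
```

A call might look like `wait_for_safemode_off 30 10 'sudo su -l hdfs -c "hdfs dfsadmin -safemode get"'`, assuming hdfs is the HDFS service user in your cluster.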
15 Make sure that the HDFS upgrade was successful.
Optionally, repeat step 5 in Prepare the 2.1 Stack for Upgrade to create new versions of the
logs and reports, substituting "-new" for "-old" in the file names as necessary.
• Compare the old and new versions of the following log files:
• dfs-old-fsck-1.log versus dfs-new-fsck-1.log.
The files should be identical unless the hadoop fsck reporting format has changed in the new
version.
• dfs-old-lsr-1.log versus dfs-new-lsr-1.log.
The files should be identical unless the format of hadoop fs -lsr reporting or the data structures have
changed in the new version.
• dfs-old-report-1.log versus dfs-new-report-1.log.
Make sure that all DataNodes that were in the cluster before upgrading are up and running.
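The fsck and lsr comparisons can be scripted. The sketch below assumes that the old and new logs sit side by side in one directory under the file names used above; the function name and the output wording are illustrative. It leaves the report logs out, because those are reviewed for DataNode availability rather than compared byte for byte.

```shell
# Sketch: compare the old and new fsck and lsr logs found in one directory.
# Prints one line per log pair; returns non-zero if a pair differs or is missing.
compare_upgrade_logs() {
  cul_dir="$1"
  cul_status=0
  for cul_kind in fsck lsr; do
    cul_old="$cul_dir/dfs-old-$cul_kind-1.log"
    cul_new="$cul_dir/dfs-new-$cul_kind-1.log"
    if [ ! -f "$cul_old" ] || [ ! -f "$cul_new" ]; then
      echo "$cul_kind: missing"
      cul_status=1
    elif cmp -s "$cul_old" "$cul_new"; then
      echo "$cul_kind: identical"
    else
      echo "$cul_kind: differs"
      cul_status=1
    fi
  done
  return "$cul_status"
}
```

When a pair is reported as differing, running `diff` on the two files shows whether the change is only in the reporting format or in the data itself.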
16 If YARN is installed in your HDP 2.1 stack, and the Application Timeline Server (ATS)
components are NOT, then you must create and install the ATS service and host components
using the API.
Run the following commands on the server that will host the YARN ATS in your cluster. Be
sure to replace <your_ATS_component_hostname> with a host name appropriate for your
environment.
1 Create the ATS Service Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X POST http://localhost:8080/api/v1/clusters/<your_cluster_name>/services/YARN/components/APP_TIMELINE_SERVER
2 Create the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X POST http://localhost:8080/api/v1/clusters/<your_cluster_name>/hosts/<your_ATS_component_hostname>/host_components/APP_TIMELINE_SERVER
3 Install the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X PUT -d '{"HostRoles": { "state": "INSTALLED"}}' http://localhost:8080/api/v1/clusters/<your_cluster_name>/hosts/<your_ATS_component_hostname>/host_components/APP_TIMELINE_SERVER
curl commands use the default username/password = admin/admin. To run the curl
commands using non-default credentials, modify the --user option to use your Ambari
administrator credentials.
For example: --user <ambari_admin_username>:<ambari_admin_password>.
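Because the same cluster name and host name recur in each request, it can help to assemble the endpoints once and reuse them. The sketch below only builds the two URL shapes that appear in this step; the function names and the argument order are illustrative assumptions, and nothing here contacts the Ambari server.

```shell
# Sketch: build the Ambari REST endpoints for the ATS component.
# Arguments: Ambari base URL, cluster name, and (for the host component) ATS host name.
ats_service_component_url() {
  echo "$1/api/v1/clusters/$2/services/YARN/components/APP_TIMELINE_SERVER"
}

ats_host_component_url() {
  echo "$1/api/v1/clusters/$2/hosts/$3/host_components/APP_TIMELINE_SERVER"
}
```

For example, `ats_host_component_url http://localhost:8080 <your_cluster_name> <your_ATS_component_hostname>` prints the host component endpoint, which can then be passed to curl together with your Ambari administrator credentials.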
17 Prepare MR2 and Yarn for work. Execute HDFS commands on any host.
• Create mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -mkdir -p /hdp/apps/2.2.x.0-<$version>/mapreduce/"
• Copy new mapreduce.tar.gz to HDFS mapreduce dir.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/hadoop/mapreduce.tar.gz /hdp/apps/2.2.x.0-<$version>/mapreduce/."
• Grant permissions for created mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -chown -R <HDFS_USER>:<HADOOP_GROUP> /hdp";
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 555 /hdp/apps/2.2.x.0-<$version>/mapreduce";
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 444 /hdp/apps/2.2.x.0-<$version>/mapreduce/mapreduce.tar.gz"
• Update YARN Configuration Properties for HDP 2.2.x
On the ambari-server host,
cd /var/lib/ambari-server/resources/scripts
Ä then run the following scripts:
./configs.sh set localhost <your.cluster.name> capacity-scheduler "yarn.scheduler.capacity.resource-calculator" "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator";
./configs.sh set localhost <your.cluster.name> capacity-scheduler "yarn.scheduler.capacity.root.accessible-node-labels" "*";
./configs.sh set localhost <your.cluster.name> capacity-scheduler "yarn.scheduler.capacity.root.accessible-node-labels.default.capacity" "-1";
./configs.sh set localhost <your.cluster.name> capacity-scheduler "yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity" "-1";
./configs.sh set localhost <your.cluster.name> capacity-scheduler "yarn.scheduler.capacity.root.default-node-label-expression" ""
Using Ambari Web UI > Services > YARN > Configs > Advanced > yarn-site:
Ä Add
Name    Value
yarn.application.classpath    $HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*
hadoop.registry.zk.quorum
hadoop.registry.rm.enabled    false
yarn.client.nodemanager-connect.max-wait-ms    900000
yarn.client.nodemanager-connect.retry-interval-ms    10000
yarn.node-labels.fs-store.retry-policy-spec    2000, 500
yarn.node-labels.fs-store.root-dir    /system/yarn/node-labels
yarn.node-labels.manager-class    org.apache.hadoop.yarn.server.resourcemanager.nodelabels.MemoryRMNodeLabelsManager
yarn.nodemanager.bind-host    0.0.0.0
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage    90
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb    1000
yarn.nodemanager.linux-container-executor.cgroups.hierarchy    hadoop-yarn
yarn.nodemanager.linux-container-executor.cgroups.mount    false
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage    false
yarn.nodemanager.linux-container-executor.resources-handler.class    org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler
yarn.nodemanager.log-aggregation.debug-enabled    false
yarn.nodemanager.log-aggregation.num-log-files-per-app    30
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds    -1
yarn.nodemanager.recovery.dir    /var/log/hadoop-yarn/nodemanager/recovery-state
yarn.nodemanager.recovery.enabled    false
yarn.nodemanager.resource.cpu-vcores    1
yarn.nodemanager.resource.percentage-physical-cpu-limit    100
yarn.resourcemanager.bind-host    0.0.0.0
yarn.resourcemanager.connect.max-wait.ms    900000
yarn.resourcemanager.connect.retry-interval.ms    30000
yarn.resourcemanager.fs.state-store.retry-policy-spec    2000, 500
yarn.resourcemanager.fs.state-store.uri    <enter a "space" as the property value>
yarn.resourcemanager.ha.enabled    false
yarn.resourcemanager.recovery.enabled    false
yarn.resourcemanager.state-store.max-completed-applications    ${yarn.resourcemanager.max-completed-applications}
yarn.resourcemanager.store.class    org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size    10
yarn.resourcemanager.system-metrics-publisher.enabled    true
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled    false
yarn.resourcemanager.work-preserving-recovery.enabled    false
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms    10000
yarn.resourcemanager.zk-acl    world:anyone:rwcda
yarn.resourcemanager.zk-address    localhost:2181
yarn.resourcemanager.zk-num-retries    1000
yarn.resourcemanager.zk-retry-interval-ms    1000
yarn.resourcemanager.zk-state-store.parent-path    /rmstore
yarn.resourcemanager.zk-timeout-ms    10000
yarn.timeline-service.bind-host    0.0.0.0
yarn.timeline-service.client.max-retries    30
yarn.timeline-service.client.retry-interval-ms    1000
yarn.timeline-service.http-authentication.simple.anonymous.allowed    true
yarn.timeline-service.http-authentication.type    simple
yarn.timeline-service.leveldb-timeline-store.read-cache-size    104857600
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size    10000
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size    10000
Ä Modify
Name    Value
yarn.timeline-service.webapp.address    <PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:8188
yarn.timeline-service.webapp.https.address
yarn.timeline-service.address
• Update MapReduce2 Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > MapReduce2 > Configs > mapred-site.xml:
Ä Add
Name    Value
mapreduce.job.emit-timeline-data    false
mapreduce.jobhistory.bind-host    0.0.0.0
mapreduce.reduce.shuffle.fetch.retry.enabled    1
mapreduce.reduce.shuffle.fetch.retry.interval-ms    1000
mapreduce.reduce.shuffle.fetch.retry.timeout-ms    30000
mapreduce.application.framework.path    /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework
Ä Modify
Name    Value
mapreduce.admin.map.child.java.opts    -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
mapreduce.admin.reduce.child.java.opts    -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
yarn.app.mapreduce.am.admin-command-opts    -Dhdp.version=${hdp.version}
yarn.app.mapreduce.am.command-opts    -Xmx546m -Dhdp.version=${hdp.version}
mapreduce.application.classpath    $PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure$
mapreduce.admin.user.env    LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64
• Update HBase Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > HBase > Configs > hbase-site.xml:
Ä Add
Name    Value
hbase.hregion.majorcompaction.jitter    0.50
Ä Modify
Name    Value
hbase.hregion.majorcompaction    604800000
hbase.hregion.memstore.block.multiplier    4
Ä Remove
Name    Value
hbase.hstore.flush.retries.number    120
• Update Hive Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Hive > Configs > hive-site.xml:
Ä Add
Name    Value
hive.cluster.delegation.token.store.zookeeper.connectString
hive.auto.convert.sortmerge.join.to.mapjoin    false
hive.cbo.enable    true
hive.cli.print.header    false
hive.cluster.delegation.token.store.class    org.apache.hadoop.hive.thrift.ZooKeeperTokenStore
hive.cluster.delegation.token.store.zookeeper.znode    /hive/cluster/delegation
hive.conf.restricted.list    hive.security.authenticator.manager,hive.security.authorization.manager,hive.users.in.admin.role
hive.convert.join.bucket.mapjoin.tez    false
hive.exec.compress.intermediate    false
hive.exec.compress.output    false
hive.exec.dynamic.partition    true
hive.exec.dynamic.partition.mode    nonstrict
hive.exec.max.created.files    100000
hive.exec.max.dynamic.partitions    5000
hive.exec.max.dynamic.partitions.pernode    2000
hive.exec.orc.compression.strategy    SPEED
hive.exec.orc.default.compress    ZLIB
hive.exec.orc.default.stripe.size    67108864
hive.exec.parallel    false
hive.exec.parallel.thread.number    8
hive.exec.reducers.bytes.per.reducer    67108864
hive.exec.reducers.max    1009
hive.exec.scratchdir    /tmp/hive
hive.exec.submit.local.task.via.child    true
hive.exec.submitviachild    false
hive.fetch.task.aggr    false
hive.fetch.task.conversion    more
hive.fetch.task.conversion.threshold    1073741824
hive.map.aggr.hash.force.flush.memory.threshold    0.9
hive.map.aggr.hash.min.reduction    0.5
hive.map.aggr.hash.percentmemory    0.5
hive.mapjoin.optimized.hashtable    true
hive.merge.mapfiles    true
hive.merge.mapredfiles    false
hive.merge.orcfile.stripe.level    true
hive.merge.rcfile.block.level    true
hive.merge.size.per.task    256000000
hive.merge.smallfiles.avgsize    16000000
hive.merge.tezfiles    false
hive.metastore.authorization.storage.checks    false
hive.metastore.client.connect.retry.delay    5s
hive.metastore.connect.retries    24
hive.metastore.failure.retries    24
hive.metastore.server.max.threads    100000
hive.optimize.constant.propagation    true
hive.optimize.metadataonly    true
hive.optimize.null.scan    true
hive.optimize.sort.dynamic.partition    false
hive.orc.compute.splits.num.threads    10
hive.prewarm.enabled    false
hive.prewarm.numcontainers    10
hive.security.metastore.authenticator.manager    org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
hive.security.metastore.authorization.auth.reads    true
hive.server2.allow.user.substitution    true
hive.server2.logging.operation.enabled    true
hive.server2.logging.operation.log.location    ${system:java.io.tmpdir}/${system:user.name}/operation_logs
hive.server2.table.type.mapping    CLASSIC
hive.server2.thrift.http.path    cliservice
hive.server2.thrift.http.port    10001
hive.server2.thrift.max.worker.threads    500
hive.server2.thrift.sasl.qop    auth
hive.server2.transport.mode    binary
hive.server2.use.SSL    false
hive.smbjoin.cache.rows    10000
hive.stats.dbclass    fs
hive.stats.fetch.column.stats    false
hive.stats.fetch.partition.stats    true
hive.support.concurrency    false
hive.tez.auto.reducer.parallelism    false
hive.tez.cpu.vcores    -1
hive.tez.dynamic.partition.pruning    true
hive.tez.dynamic.partition.pruning.max.data.size    104857600
hive.tez.dynamic.partition.pruning.max.event.size    1048576
hive.tez.log.level    INFO
hive.tez.max.partition.factor    2.0
hive.tez.min.partition.factor    0.25
hive.tez.smb.number.waves    0.5
hive.user.install.directory    /user/
hive.vectorized.execution.reduce.enabled    false
hive.zookeeper.client.port    2181
hive.zookeeper.namespace    hive_zookeeper_namespace
hive.zookeeper.quorum
Ä Modify
Name    Value
hive.metastore.client.socket.timeout    1800s
hive.optimize.reducededuplication.min.reducer    4
hive.security.authorization.manager    org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
hive.security.metastore.authorization.manager    org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider,org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly
hive.server2.support.dynamic.service.discovery    true
hive.vectorized.groupby.checkinterval    4096
fs.file.impl.disable.cache    true
fs.hdfs.impl.disable.cache    true
18 Using Ambari Web > Services > Service Actions, start YARN.
19 Using Ambari Web > Services > Service Actions, start MapReduce2.
20 Using Ambari Web > Services > Service Actions, start HBase and ensure the service
check passes.
21 Using Ambari Web > Services > Service Actions, start the Hive service.
22 Upgrade Oozie.
1 Perform the following preparation steps on each Oozie server host:
You must replace your Oozie configuration after upgrading.
1 Copy configurations from oozie-conf-bak to the /etc/oozie/conf directory on
each Oozie server and client.
2 Create /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22 directory.
mkdir /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22
3 Copy the JDBC jar of your Oozie database to both /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22 and /usr/hdp/2.2.x.0-<$version>/oozie/libtools.
For example, if you are using MySQL, copy your mysql-connector-java.jar.
4 Copy these files to /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22
directory
cp /usr/lib/hadoop/lib/hadoop-lzo*.jar /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22;
cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22;
cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.0-<$version>/oozie/libext
5 Grant read/write access to the Oozie user.
chmod -R 777 /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22
2 Upgrade steps:
1 On the Services view, make sure that YARN and MapReduce2 services are running.
2 Make sure that the Oozie service is stopped.
3 In /etc/oozie/conf/oozie-env.sh, comment out the CATALINA_BASE property. Do the
same using Ambari Web UI in Services > Oozie > Configs > Advanced
oozie-env.
4 Upgrade Oozie.
At the Oozie database host, as the Oozie service user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.0-<$version>/oozie/bin/ooziedb.sh upgrade -run"
where <OOZIE_USER> is the Oozie service user. For example, oozie.
Make sure that the output contains the string "Oozie DB has been upgraded to Oozie version
<OOZIE_Build_Version>".
5 Prepare the Oozie WAR file.
The Oozie server must not be running for this step. If you get the message "ERROR:
Stop Oozie first", it means the script still thinks it's running. Check, and if needed,
remove the process id (pid) file indicated in the output. You may see additional "File
Not Found" error messages during a successful upgrade of Oozie.
On the Oozie server, as the Oozie user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.0-<$version>/oozie/bin/oozie-setup.sh prepare-war -d /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22"
Make sure that the output contains the string "New Oozie WAR file added".
6 Using Ambari Web, choose Services > Oozie > Configs, expand oozie-log4j,
then add the following property:
log4j.appender.oozie.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L SERVER[${oozie.instance.id}] %m%n
where ${oozie.instance.id} is determined by Oozie, automatically.
7 Using Ambari Web > Services > Oozie > Configs, expand Advanced oozie-site, then edit the following properties:
A In oozie.service.coord.push.check.requeue.interval, replace the existing
property value with the following one:
30000
B In oozie.service.URIHandlerService.uri.handlers, append to the existing
property value the following string, if it is not already present:
org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler
C In oozie.services, make sure all the following properties are present:
org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.InstrumentationService,
org.apache.oozie.service.MemoryLocksService,
org.apache.oozie.service.UUIDService,
org.apache.oozie.service.ELService,
org.apache.oozie.service.AuthorizationService,
org.apache.oozie.service.UserGroupInformationService,
org.apache.oozie.service.HadoopAccessorService,
org.apache.oozie.service.JobsConcurrencyService,
org.apache.oozie.service.URIHandlerService,
org.apache.oozie.service.DagXLogInfoService,
org.apache.oozie.service.SchemaService,
org.apache.oozie.service.LiteWorkflowAppService,
org.apache.oozie.service.JPAService,
org.apache.oozie.service.StoreService,
org.apache.oozie.service.CoordinatorStoreService,
org.apache.oozie.service.SLAStoreService,
org.apache.oozie.service.DBLiteWorkflowStoreService,
org.apache.oozie.service.CallbackService,
org.apache.oozie.service.ActionService,
org.apache.oozie.service.ShareLibService,
org.apache.oozie.service.CallableQueueService,
org.apache.oozie.service.ActionCheckerService,
org.apache.oozie.service.RecoveryService,
org.apache.oozie.service.PurgeService,
org.apache.oozie.service.CoordinatorEngineService,
org.apache.oozie.service.BundleEngineService,
org.apache.oozie.service.DagEngineService,
org.apache.oozie.service.CoordMaterializeTriggerService,
org.apache.oozie.service.StatusTransitService,
org.apache.oozie.service.PauseTransitService,
org.apache.oozie.service.GroupsService,
org.apache.oozie.service.ProxyUserService,
org.apache.oozie.service.XLogStreamingService,
org.apache.oozie.service.JvmPauseMonitorService
D Add the oozie.service.AuthorizationService.security.enabled property
with the following property value: false
Specifies whether security (user name/admin role) is enabled or not. If disabled, any user can manage
the Oozie system and manage any job.
E Add the oozie.service.HadoopAccessorService.kerberos.enabled
property with the following property value: false
Indicates if Oozie is configured to use Kerberos.
F In oozie.services.ext, append to the existing property value the following
string, if it is not already present:
org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService
G After modifying all properties on the Oozie Configs page, choose Save to update
oozie-site.xml, using the modified configurations.
8 Replace the content of /user/oozie/share in HDFS.
On the Oozie server host:
1 Extract the Oozie sharelib into a tmp folder.
mkdir -p /tmp/oozie_tmp;
cp /usr/hdp/2.2.x.0-<$version>/oozie/oozie-sharelib.tar.gz /tmp/oozie_tmp;
cd /tmp/oozie_tmp;
tar xzvf oozie-sharelib.tar.gz;
2 Back up the /user/oozie/share folder in HDFS and then delete it.
If you have any custom files in this folder, back them up separately and then add them to the /share
folder after updating it.
mkdir /tmp/oozie_tmp/oozie_share_backup;
chmod 777 /tmp/oozie_tmp/oozie_share_backup;
su -l <HDFS_USER> -c "hdfs dfs -copyToLocal /user/oozie/share
/tmp/oozie_tmp/oozie_share_backup";
su -l <HDFS_USER> -c "hdfs dfs -rm -r /user/oozie/share";
where <HDFS_USER> is the HDFS service user. For example, hdfs.
3 Add the latest share libs that you extracted in step 1. After you have added the files,
modify ownership and acl.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /tmp/oozie_tmp/share
/user/oozie/.";
su -l <HDFS_USER> -c "hdfs dfs -chown -R <OOZIE_USER>:<HADOOP_GROUP>
/user/oozie";
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 755 /user/oozie";
3 Update Oozie Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Oozie > Configs > oozie-site.xml:
Ä Add
Name    Value
oozie.authentication.simple.anonymous.allowed    true
oozie.service.coord.check.maximum.frequency    false
oozie.service.HadoopAccessorService.kerberos.enabled    false
Ä Modify
Name    Value
oozie.service.SchemaService.wf.ext.schemas    shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,email-action-0.1.xsd,email-action-0.2.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd,distcp-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd
oozie.services.ext    org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService
23 Use the Ambari Web UI > Services view to start the Oozie service.
Make sure that ServiceCheck passes for Oozie.
24 Update WebHCat.
A Modify the webhcat-site config type.
Using Ambari Web > Services > WebHCat, modify the following configuration:
Action    Property Name    Property Value
Modify    templeton.storage.class    org.apache.hive.hcatalog.templeton.tool.ZooKeeperStorage
B Expand Advanced > webhcat-site.xml.
Check if property templeton.port exists. If not, then add it using the Custom
webhcat-site panel. The default value for templeton.port = 50111.
C On each WebHCat host, update the Pig and Hive tar bundles, by updating the
following files:
• /apps/webhcat/pig.tar.gz
• /apps/webhcat/hive.tar.gz
Find these files only on a host where WebHCat is installed.
For example, to update a *.tar.gz file:
1 Move the file to a local directory.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -copyToLocal /apps/webhcat/*.tar.gz <local_backup_dir>"
2 Remove the old file.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -rm /apps/webhcat/*.tar.gz"
3 Copy the new file.
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/hive/hive.tar.gz /apps/webhcat/";
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/pig/pig.tar.gz /apps/webhcat/";
where <HCAT_USER> is the HCatalog service user. For example, hcat.
D On each WebHCat host, update the /apps/webhcat/hadoop-streaming.jar file.
1 Move the file to a local directory.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -copyToLocal /apps/webhcat/hadoop-streaming*.jar <local_backup_dir>"
2 Remove the old file.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -rm /apps/webhcat/hadoop-streaming*.jar"
3 Copy the new hadoop-streaming.jar file.
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/hadoop-mapreduce/hadoop-streaming*.jar /apps/webhcat"
25 If Tez was not installed during the upgrade, you must prepare Tez for work, using the
following steps:
The Tez client should be available on the same host with Pig.
If you use Tez as the Hive execution engine, and if the variable hive.server2.enable.doAs is
set to true, you must create a scratch directory on the NameNode host for the username that
will run the HiveServer2 service. If you installed Tez before upgrading the Stack, use the
following commands:
sudo su -c "hdfs dfs -mkdir /tmp/hive-<username>"
sudo su -c "hdfs dfs -chmod 777 /tmp/hive-<username>"
where <username> is the name of the user that runs the HiveServer2 service.
• Update Tez Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Tez > Configs > tez-site.xml:
Ä Add
Name    Value
tez.am.container.idle.release-timeout-max.millis    20000
tez.am.container.idle.release-timeout-min.millis    10000
tez.am.launch.cluster-default.cmd-opts    -server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
tez.am.launch.cmd-opts    -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC
tez.am.launch.env    LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64
tez.am.max.app.attempts    2
tez.am.maxtaskfailures.per.node    10
tez.cluster.additional.classpath.prefix    /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure
tez.counters.max    2000
tez.counters.max.groups    1000
tez.generate.debug.artifacts    false
tez.grouping.max-size    1073741824
tez.grouping.min-size    16777216
tez.grouping.split-waves    1.7
tez.history.logging.service.class    org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
tez.runtime.compress    true
tez.runtime.compress.codec    org.apache.hadoop.io.compress.SnappyCodec
tez.runtime.io.sort.mb    272
tez.runtime.unordered.output.buffer.size-mb    51
tez.shuffle-vertex-manager.max-src-fraction    0.4
tez.shuffle-vertex-manager.min-src-fraction    0.2
tez.task.am.heartbeat.counter.interval-ms.max    4000
tez.task.launch.cluster-default.cmd-opts    -server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
tez.task.launch.cmd-opts    -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC
tez.task.launch.env    LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64
tez.task.max-events-per-heartbeat    500
tez.task.resource.memory.mb    682
Ä Modify
Name    Value
tez.am.container.reuse.non-local-fallback.enabled    false
tez.am.resource.memory.mb    1364
tez.lib.uris    /hdp/apps/${hdp.version}/tez/tez.tar.gz
tez.session.client.timeout.secs    -1
Ä Remove
Name    Value
tez.am.container.session.delay-allocation-millis    10000
tez.am.env    LD_LIBRARY_PATH=/usr/hdp/2.2.0.01947/hadoop/lib/native:/usr/hdp/2.2.0.01947/hadoop/lib/native/Linux-amd64-64
tez.am.grouping.max-size    1073741824
tez.am.grouping.min-size    16777216
tez.am.grouping.split-waves    1.4
tez.am.java.opt    -server -Xmx546m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC
tez.am.shuffle-vertex-manager.max-src-fraction    0.4
tez.am.shuffle-vertex-manager.min-src-fraction    0.2
tez.runtime.intermediate-input.compress.codec
tez.runtime.intermediate-input.is-compressed    false
tez.runtime.intermediate-output.compress.codec
tez.runtime.intermediate-output.should-compress    false
tez.yarn.ats.enabled    true
• Put Tez libraries in hdfs. Execute at any host:
su -l hdfs -c "hdfs dfs -mkdir -p /hdp/apps/2.2.x.0-<$version>/tez/"
su -l hdfs -c "hdfs dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/tez/lib/tez.tar.gz /hdp/apps/2.2.x.0-<$version>/tez/."
su -l hdfs -c "hdfs dfs -chown -R <HDFS_USER>:<HADOOP_GROUP> /hdp"
su -l hdfs -c "hdfs dfs -chmod -R 555 /hdp/apps/2.2.x.0-<$version>/tez"
su -l hdfs -c "hdfs dfs -chmod -R 444 /hdp/apps/2.2.x.0-<$version>/tez/tez.tar.gz"
26 Prepare the Storm service properties.
• Edit nimbus.childopts.
Using Ambari Web UI > Services > Storm > Configs > Nimbus > find
nimbus.childopts. Update the path for the jmxetric-1.0.4.jar to:
/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar.
If nimbus.childopts property value contains "-Djava.security.auth.login.config=/path/to/storm_jaas.conf", remove this text.
• Edit supervisor.childopts.
Using Ambari Web UI > Services > Storm > Configs > Supervisor > find
supervisor.childopts. Update the path for the jmxetric-1.0.4.jar to:
If supervisor.childopts property value contains "-Djava.security.auth.login.config=/etc/storm/conf/storm_jaas.conf", remove this text.
• Edit worker.childopts.
Using Ambari Web UI > Services > Storm > Configs > Advanced > storm-site find
worker.childopts. Update the path for the jmxetric-1.0.4.jar to:
Check if the _storm.thrift.nonsecure.transport property exists. If not, add it,
_storm.thrift.nonsecure.transport =
backtype.storm.security.auth.SimpleTransportPlugin, using the Custom storm-site
panel.
• Remove the storm.local.dir from every host where the Storm component is
installed.
You can find this property in the Storm > Configs > General tab.
rm -rf <storm.local.dir>
• If you are planning to enable secure mode, navigate to Ambari Web UI > Services
> Storm > Configs > Advanced storm-site and add the following property:
_storm.thrift.secure.transport=backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin
• Stop the Storm Rest_API Component.
curl -u admin:admin -X PUT -H 'X-Requested-By:1' -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://server:8080/api/v1/clusters/c1/hosts/host_name/host_components/STORM_REST_API
In HDP 2.2, the STORM_REST_API component was deleted because the service was
moved into STORM_UI_SERVER. When upgrading from HDP 2.1 to 2.2, you must
delete this component using the API as follows:
• Delete the Storm Rest_API Component.
curl -u admin:admin -X DELETE -H 'X-Requested-By:1' http://server:8080/api/v1/clusters/c1/hosts/host_name/host_components/STORM_REST_API
27 Upgrade Pig.
Copy the Pig configuration files to /etc/pig/conf.
cp /etc/pig/conf.dist/pig.properties.rpmsave /etc/pig/conf/pig.properties;
cp /etc/pig/conf.dist/pig-env.sh /etc/pig/conf/;
cp /etc/pig/conf.dist/log4j.properties.rpmsave /etc/pig/conf/log4j.properties
28 Using Ambari Web UI > Services > Storm, start the Storm service.
29 Prepare the Falcon service properties:
• Update Falcon Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Falcon > Configs > falcon startup
properties:
Ä Add
Name    Value
*.application.services    org.apache.falcon.security.AuthenticationInitializationService,\
org.apache.falcon.workflow.WorkflowJobEndNotificationService,\
org.apache.falcon.service.ProcessSubscriberService,\
org.apache.falcon.entity.store.ConfigurationStore,\
org.apache.falcon.rerun.service.RetryService,\
org.apache.falcon.rerun.service.LateRunService,\
org.apache.falcon.service.LogCleanupService
Using Ambari Web UI > Services > Falcon > Configs > advanced falcon-startup:
Ä Add
Name    Value
*.dfs.namenode.kerberos.principal    nn/[email protected]
*.falcon.enableTLS    false
*.falcon.http.authentication.cookie.domain    EXAMPLE.COM
*.falcon.http.authentication.kerberos.keytab    /etc/security/keytabs/spnego.service.keytab
*.falcon.http.authentication.kerberos.principal    HTTP/[email protected]
*.falcon.security.authorization.admin.groups    falcon
*.falcon.security.authorization.admin.users    falcon,ambari-qa
*.falcon.security.authorization.enabled    false
*.falcon.security.authorization.provider    org.apache.falcon.security.DefaultAuthorizationProvider
*.falcon.security.authorization.superusergroup    falcon
*.falcon.service.authentication.kerberos.keytab    /etc/security/keytabs/falcon.service.keytab
*.falcon.service.authentication.kerberos.principal    falcon/[email protected]
*.journal.impl    org.apache.falcon.transaction.SharedFileSystemJournal
prism.application.services    org.apache.falcon.entity.store.ConfigurationStore
prism.configstore.listeners    org.apache.falcon.entity.v0.EntityGraph,\
org.apache.falcon.entity.ColoClusterRelation,\
org.apache.falcon.group.FeedGroupMap
30 Using Ambari Web > Services > Service Actions, re-start all stopped services.
31 The upgrade is now fully functional but not yet finalized. Using the finalize command
removes the previous version of the NameNode and DataNode storage directories.
After the upgrade is finalized, the system cannot be rolled back. Usually this step is not
taken until a thorough testing of the upgrade has been performed.
The upgrade must be finalized before another upgrade can be performed.
Directories used by Hadoop 1 services set in
/etc/hadoop/conf/taskcontroller.cfg are not automatically deleted after
upgrade. Administrators can choose to delete these directories after the upgrade.
To finalize the upgrade, execute the following command once, on the primary NameNode
host in your HDP cluster,
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -finalizeUpgrade"
Upgrading the HDP Stack from 2.0 to 2.2
The HDP Stack is the coordinated set of Hadoop components that you have installed on hosts in
your cluster. Your set of Hadoop components and hosts is unique to your cluster. Before upgrading
the Stack on your cluster, review all Hadoop services and hosts in your cluster to confirm the
location of Hadoop components. For example, use the Hosts and Services views in Ambari Web,
which summarize and list the components installed on each Ambari host, to determine the
components installed on each host. For more information about using Ambari to view components in
your cluster, see Working with Hosts, and Viewing Components on a Host.
Complete the following procedures to upgrade the Stack from version 2.0 to version 2.2.x on your
current, Ambari-installed-and-managed cluster.
If you plan to upgrade your existing JDK, do so after upgrading Ambari, before
upgrading the Stack. The upgrade steps require that you remove HDP v2.0
components and install HDP v2.2 components. As noted in that section, you should
remove and install on each host, only the components on each host that you want to
run on the HDP 2.2 stack.
For example, if you want to run Storm or Falcon components on the HDP 2.2 stack,
you will install those components and then configure their properties during the
upgrade procedure.
In preparation for future HDP 2.2 releases to support rolling upgrades, the HDP RPM package
version naming convention has changed to include the HDP 2.2 product version in file and directory
names. HDP 2.2 marks the first release where HDP rpms, debs, and directories contain versions in
the names to permit side-by-side installations of later HDP releases. To transition between previous
releases and HDP 2.2, Hortonworks provides hdp-select, a script that symlinks your directories to
hdp/current and lets you maintain using the same binary and configuration paths that you were
using before.
Use this procedure to upgrade from HDP 2.0 to any of the HDP 2.2 maintenance
releases, for example HDP 2.2.4. The instructions in this document use HDP
2.2.x.0 as a placeholder. To use an HDP 2.2.x.0 maintenance release, be sure to
replace 2.2.x.0 in the following instructions with the appropriate maintenance version,
such as 2.2.0.0 for the HDP 2.2 GA release, or 2.2.4.2 for an HDP 2.2 maintenance
release.
Refer to the HDP documentation for information about the latest HDP 2.2
maintenance releases.
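The placeholder substitution described above can be sketched as a small shell helper. This is only an illustration: the function name fill_hdp_version is hypothetical, and the handling of the underscore form (2_2_x_0, as it appears in package names later in this procedure) is an assumption that the same substitution applies there.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDP or Ambari): replaces the 2.2.x.0
# placeholder in a command or path with a concrete maintenance version.
# $1 = target version, for example 2.2.4.2
# $2 = text containing the placeholder
fill_hdp_version() {
    version="$1"
    text="$2"
    # Package names use underscores instead of dots (assumption: 2_2_x_0
    # maps to the same version with dots replaced by underscores).
    underscored=$(printf '%s' "$version" | tr '.' '_')
    printf '%s\n' "$text" | sed -e "s/2\.2\.x\.0/$version/g" -e "s/2_2_x_0/$underscored/g"
}
```

For example, fill_hdp_version 2.2.4.2 '/usr/hdp/2.2.x.0-<$version>/hadoop' prints the path with 2.2.4.2 in place of 2.2.x.0. The helper does not change GA to updates in repository URLs; that part still has to be edited by hand.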
To prepare for upgrading the HDP Stack, this section describes how to perform the following tasks:
• Disable Security.
If your Stack has Kerberos Security turned on, turn it off before performing the
upgrade. In Ambari Web UI > Admin > Security, click Disable Security. You
can re-enable Security after performing the upgrade.
• Checkpoint user metadata and capture the HDFS operational state.
This step supports rollback and restore of the original state of HDFS data, if necessary.
• Back up Hive and Oozie metastore databases.
This step supports rollback and restore of the original state of Hive and Oozie data, if
necessary.
• Stop all HDP and Ambari services.
Make sure to finish all current jobs running on the system before upgrading the stack.
Libraries will change during the upgrade. Any jobs remaining active that use the older
version libraries will probably fail during the upgrade.
1 Use Ambari Web > Services > Service Actions to stop all services except HDFS and
ZooKeeper.
2 Stop any client programs that access HDFS.
Perform steps 3 through 8 on the NameNode host. In a highly-available NameNode
configuration, execute the following procedure on the primary NameNode.
To locate the primary NameNode in an Ambari-managed HDP cluster, browse Ambari
Web > Services > HDFS. In Summary, click NameNode. Hosts > Summary
displays the host name FQDN.
3 If HDFS is in a non-finalized state from a prior upgrade operation, you must finalize HDFS
before upgrading further. Finalizing HDFS removes all links to the metadata of the prior
HDFS version. Do this only if you do not want to roll back to that prior HDFS version.
On the NameNode host, as the HDFS user:
su -l <HDFS_USER>
hdfs dfsadmin -finalizeUpgrade
4 Check the NameNode directory to ensure that there is no snapshot of any prior HDFS
upgrade.
Specifically, using Ambari Web > HDFS > Configs > NameNode, examine the
<$dfs.namenode.name.dir> or the <$dfs.name.dir> directory in the NameNode
Directories property. Make sure that only a "current" directory and no "previous" directory
exists on the NameNode host.
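The directory check in step 4 can be scripted. The following is a minimal sketch, assuming you pass one path from the NameNode Directories property as the argument; the function name check_no_previous is hypothetical and not part of HDP or Ambari.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDP or Ambari): verifies that a NameNode
# directory contains a "current" directory and no "previous" directory.
# $1 = one path taken from the NameNode Directories property
check_no_previous() {
    name_dir="$1"
    if [ ! -d "$name_dir/current" ]; then
        echo "missing: $name_dir/current"
        return 1
    fi
    if [ -d "$name_dir/previous" ]; then
        echo "found snapshot of a prior upgrade: $name_dir/previous"
        return 1
    fi
    echo "ok: $name_dir"
}
```

Run the function once per configured NameNode directory. A non-zero return status means the directory is not in the expected state for an upgrade.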
5 Create the following logs and other files.
Creating these logs allows you to check the integrity of the file system, post-upgrade.
As the HDFS user:
su -l <HDFS_USER>
1 Run fsck with the following flags and send the results to a log. The resulting file
contains a complete block map of the file system. You use this log later to confirm
the upgrade.
hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
2 Optional: Capture the complete namespace of the filesystem.
The following command does a recursive listing of the root file system:
hadoop dfs -ls -R / > dfs-old-lsr-1.log
3 Create a list of all the DataNodes in the cluster.
hdfs dfsadmin -report > dfs-old-report-1.log
4 Optional: Copy all unrecoverable data stored in HDFS to a local file system or to a
backup instance of HDFS.
6 Save the namespace.
You must be the HDFS service user to do this, and you must put the cluster in Safe Mode.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
In a highly-available NameNode configuration, the command hdfs dfsadmin -saveNamespace sets a checkpoint in the first NameNode specified in the
configuration, in dfs.ha.namenodes.[nameservice ID]. You can also use the
dfsadmin -fs option to specify which NameNode to connect to.
For example, to force a checkpoint in namenode 2:
hdfs dfsadmin -fs hdfs://namenode2-hostname:namenode2-port -saveNamespace
7 Copy the checkpoint files located in <$dfs.name.dir/current> into a backup directory.
Find the directory, using Ambari Web > HDFS > Configs > NameNode > NameNode
Directories on your primary NameNode host.
In a highly-available NameNode configuration, the location of the checkpoint depends
on where the saveNamespace command is sent, as defined in the preceding step.
8 Store the layoutVersion for the NameNode.
Make a copy of the file at <dfs.name.dir>/current/VERSION where <dfs.name.dir> is
the value of the config parameter NameNode directories. This file will be used later to
verify that the layout version is upgraded.
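Step 8 can be sketched as follows, assuming the VERSION file uses the usual key=value layout with a layoutVersion line. The function name save_layout_version and the backup file name VERSION.pre-upgrade are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDP or Ambari): copies the NameNode
# VERSION file to a backup directory and prints the recorded layoutVersion.
# $1 = NameNode directory (<dfs.name.dir>)
# $2 = backup directory (created if it does not exist)
save_layout_version() {
    name_dir="$1"
    backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    cp "$name_dir/current/VERSION" "$backup_dir/VERSION.pre-upgrade" || return 1
    # Print only the value that follows "layoutVersion=".
    sed -n 's/^layoutVersion=//p' "$backup_dir/VERSION.pre-upgrade"
}
```

After the upgrade, running the same sed expression against the new VERSION file and comparing the two values shows whether the layout version changed.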
9 Stop HDFS.
10 Stop ZooKeeper.
11 Using Ambari Web > Services > <service.name> > Summary, review each service and
make sure that all services in the cluster are completely stopped.
12 On the Hive Metastore database host, stop the Hive metastore service, if you have not done
so already.
Make sure that the Hive metastore database is running. For more information about
Administering the Hive metastore database, see the Hive Metastore Administrator
documentation.
13 If you are upgrading Hive and Oozie, back up the Hive and Oozie metastore databases on the
Hive and Oozie database host machines, respectively.
Make sure that your Hive database is updated to the minimum recommended version.
If you are using Hive with MySQL, we recommend upgrading your MySQL
database version to 5.6.21 before upgrading the HDP Stack to v2.2.x.
For specific information, see Database Requirements.
1 Optional - Back up the Hive Metastore database.
Table 17. Hive Metastore Database Backup and Restore
Database Type: MySQL
Backup: For example: mysqldump hive >
Restore: mysql <dbname> < <inputfilename.sql>
For example: mysql hive <
Database Type: Postgres
Backup: sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
For example: sudo -u postgres pg_dump hive >
Restore: <databasename> < <inputfilename.sql>
hive < /tmp/mydir/backup_hive.sql
Database Type: Oracle
Backup: Connect to the Oracle database using sqlplus
export the database: exp username/password@database full=yes file=output_file.dmp
Restore: Import the database: imp username/password@database file=input_file.dmp
2 Optional - Back up the Oozie Metastore database.
Table 18. Oozie Metastore Database Backup and Restore
Database Type: MySQL
Backup: For example: mysqldump oozie >
Restore: mysql <dbname> < <inputfilename.sql>
For example: mysql oozie <
Database Type: Postgres
Backup: sudo -u <username> pg_dump <databasename> >
For example: sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql
Restore: <databasename> < <inputfilename.sql>
oozie < /tmp/mydir/backup_oozie.sql
14 Stage the upgrade script.
a Create an "Upgrade Folder", for example /work/upgrade_hdp_2, on a host that
can communicate with Ambari Server. The Ambari Server host is a suitable
candidate.
b Copy the upgrade script to the Upgrade Folder.
The script is available on the Ambari Server host in /var/lib/ambari-server/resources/scripts/upgradeHelper.py.
c Copy the upgrade catalog to the Upgrade Folder.
The catalog is available in /var/lib/ambari-server/resources/upgrade/catalog/UpgradeCatalog_2.0_to_2.2.0.2.json.
python --version
15 Back up current configuration settings:
a Go to the Upgrade Folder you created in step 14.
b Execute the backup-configs action:
python upgradeHelper.py --hostname <HOSTNAME> --user <USERNAME> --password <PASSWORD> --clustername <CLUSTERNAME> backup-configs
This step produces a set of files named TYPE_TAG, where TYPE is the configuration
type and TAG is the tag. These files contain copies of the various configuration
settings for the current (pre-upgrade) cluster. You can use these files as a reference
later.
16 On the Ambari Server host, stop Ambari Server and confirm that it is stopped.
ambari-server stop
17 Stop all Ambari Agents.
On every host in your cluster known to Ambari:
ambari-agent stop
1 Upgrade the HDP repository on all hosts and replace the old repository file with the new file:
Be sure to replace GA/2.2.x.0 in the following instructions with the appropriate
maintenance version, such as GA/2.2.0.0 for the HDP 2.2 GA release, or
updates/2.2.4.2 for an HDP 2.2 maintenance release.
• For SLES 11 SP3:
wget -nv http://public-repo1.hortonworks.com/HDP/suse11sp3/2.x/GA/2.2.x.0/hdp.repo -O
• For SLES 11 SP1:
wget -nv http://public-repo1.hortonworks.com/HDP/sles11sp1/2.x/GA/2.2.x.0/hdp.repo -O
• For UBUNTU:
wget -nv http://public-repo1.hortonworks.com/HDP/ubuntu12/2.x/GA/2.2.x.0/hdp.list -O /etc/apt/sources.list.d/HDP.list
Make sure to download the HDP.repo file under /etc/yum.repos on ALL hosts.
2 Update the Stack version in the Ambari Server database.
On the Ambari Server host, use the following command to update the Stack version to HDP-2.2:
ambari-server upgradestack HDP-2.2
3 Back up the files in the following directories on the Oozie server host and make sure that all files,
including *site.xml files, are copied.
mkdir oozie-conf-bak
cp -R /etc/oozie/conf/* oozie-conf-bak
4 Remove the old oozie directories on all Oozie server and client hosts.
• rm -rf /etc/oozie/conf
• rm -rf /usr/lib/oozie/
• rm -rf /var/lib/oozie/
5 Upgrade the Stack on all Ambari Agent hosts.
For each host, identify the HDP components installed on that host. Use Ambari Web
to view components on each host in your cluster.
Based on the HDP components installed, tailor the following upgrade commands for
each host to upgrade only components residing on that host. For example, if you know
that a host has no HBase service or client packages installed, then you can adapt the
command to not include HBase, as follows:
yum install "collectd*" "gccxml*" "pig*" "hadoop*" "sqoop*" "zookeeper*" "hive*"
If you are writing to multiple systems using a script, do not use " " with the run
command. You can use " " with pdsh -y.
• For RHEL/CentOS/Oracle Linux:
1 On all hosts, clean the yum repository.
yum clean all
2 Remove all components that you want to upgrade. At a minimum, remove the WebHCat, HCatalog, and
Oozie components.
This command uninstalls the HDP 2.0 component bits. It leaves the user data and
metadata, but removes your configurations.
yum erase "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*" "zookeeper*" "hbase*" "hive*" "phoenix*" "accumulo*" "mahout*" "hue*" "flume*" "hdp_mon_nagios_addons"
3 Install the following components:
yum install "hadoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hive_2_2_x_0_*" "flume_2_2_x_0_*" "phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
yum install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
4 Verify that the components were upgraded:
yum list installed | grep HDP-<old-stack-version-number>
Nothing should appear in the returned list.
• For SLES:
1 On all hosts, clean the zypper repository.
zypper clean --all
2 Remove WebHCat, HCatalog, and Oozie components.
This command uninstalls the HDP 2.0 component bits. It leaves the user data and
metadata, but removes your configurations.
zypper remove "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*" "zookeeper*" "hbase*" "hive*" "phoenix*" "accumulo*" "mahout*" "hue*" "flume*" "hdp_mon_nagios_addons"
3 Install the following components:
zypper install "hadoop_2_2_x_0_*" "oozie_2_2_x_0_*" "pig_2_2_x_0_*" "sqoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hbase_2_2_x_0_*" "hive_2_2_x_0_*" "flume_2_2_x_0_*" "phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
zypper install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
4 Verify that the components were upgraded:
rpm -qa | grep hadoop && rpm -qa | grep hive && rpm -qa | grep hcatalog
No 2.0 components should appear in the returned list.
5 If components were not upgraded, upgrade them as follows:
yast --update hadoop hcatalog hive
6 Symlink directories, using hdp-select.
To prevent version-specific directory issues for your scripts and updates, Hortonworks
provides hdp-select, a script that symlinks directories to hdp/current and modifies
paths for configuration directories.
You should see: hdp-select-2.2.4.2-2.el6.noarch
If not, then run:
Run hdp-select as root, on every node. In /usr/bin:
where <$version> is the build number. For the HDP 2.2.4.2 release <$version> = 2.
7 Verify that all components are on the new version. The output of this statement should be
empty:
hdp-select status | grep -v 2\.2\.x\.x | grep -v None
8 If you are using Hue, you must upgrade Hue manually. For more information, see Configure
and Start Hue.
9 On the Hive Metastore database host, stop the Hive Metastore service, if you have not done
so already. Make sure that the Hive Metastore database is running.
10 Upgrade the Hive metastore database schema from v12 to v14, using the following
instructions:
• Set java home:
export JAVA_HOME=/path/to/java
• Copy (rewrite) old Hive configurations to new conf dir:
cp -R /etc/hive/conf.server/* /etc/hive/conf/
• Copy the jdbc connector to /usr/hdp/<$version>/hive/lib, if it is not already there.
• <HIVE_HOME>/bin/schematool -upgradeSchema -dbType <databaseType>
where <HIVE_HOME> is the Hive installation directory.
For example, on the Hive Metastore host:
/usr/hdp/2.2.x.0-<$version>/hive/bin/schematool -upgradeSchema -dbType <databaseType>
where <$version> is the 2.2.x build number and <databaseType> is derby, mysql,
oracle, or postgres.
1 Start Ambari Server.
On the Ambari Server host:
ambari-server start
2 Start all Ambari Agents.
On each Ambari Agent host:
ambari-agent start
3 Update the repository Base URLs in the Ambari Server for the HDP 2.2.0 stack.
Browse to Ambari Web > Admin > Repositories, then set the value of the HDP and HDP-UTILS
repository Base URLs. For more information about viewing and editing repository Base
URLs, see Viewing Cluster Stack Version and Repository URLs.
For a remote, accessible, public repository, the HDP and HDP-UTILS Base URLs are
the same as the baseurl= values in the HDP.repo file downloaded in Upgrade the Stack:
Step 1. For a local repository, use the local repository Base URL that you configured
for the HDP Stack. For links to download the HDP repository files for your version of
the Stack, see HDP Stack Repositories.
4 Update the respective configurations.
a Go to the Upgrade Folder you created when Preparing the 2.0 Stack for Upgrade.
b Execute the update-configs action:
$USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=$FROMSTACK --toStack=$TOSTACK --upgradeCatalog=$UPGRADECATALOG update-configs [configuration item]
Where
<FROMSTACK> is the version number of the pre-upgraded stack, for example 2.0
<TOSTACK> is the version number of the upgraded stack, for example 2.2.x
<UPGRADECATALOG> is the path to the upgrade catalog file, for example
UpgradeCatalog_2.0_to_2.2.x.json
For example,
To update all configuration items:
To update configuration item hive-site:
hive-site
5 Using the Ambari Web UI > Services, start the ZooKeeper service.
6 On all DataNode and NameNode hosts, copy (rewrite) old hdfs configurations to the new conf
directory:
cp /etc/hadoop/conf.empty/hdfs-site.xml.rpmsave /etc/hadoop/conf/hdfs-site.xml;
cp /etc/hadoop/conf.empty/hadoop-env.sh.rpmsave /etc/hadoop/conf/hadoop-env.sh;
cp /etc/hadoop/conf.empty/log4j.properties.rpmsave /etc/hadoop/conf/log4j.properties;
cp /etc/hadoop/conf.empty/core-site.xml.rpmsave /etc/hadoop/conf/core-site.xml
7 If you are upgrading from an HA NameNode configuration, start all JournalNodes.
On each JournalNode host, run the following command:
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.0-<$version>/hadoop/sbin/hadoop-daemon.sh start journalnode"
All JournalNodes must be running when performing the upgrade, rollback, or
finalization operations. If any JournalNodes are down when running any such
operation, the operation will fail.
8 Because the file system version has now changed, you must start the NameNode manually.
On the active NameNode host, as the HDFS user:
su -l <HDFS_USER> -c "export HADOOP_LIBEXEC_DIR=/usr/hdp/2.2.x.0-<$version>/hadoop/libexec && /usr/hdp/2.2.x.0-<$version>/hadoop/sbin/hadoop-daemon.sh start namenode -upgrade"
To check whether the upgrade is in progress, check that the "previous" directory has been
created in the NameNode and JournalNode directories. The "previous" directory contains a
snapshot of the data before upgrade.
In a NameNode HA configuration, this NameNode will not enter the standby state as
usual. Rather, this NameNode will immediately enter the active state, perform an
upgrade of its local storage directories, and also perform an upgrade of the shared edit
log. At this point, the standby NameNode in the HA pair is still down. It will be out of
sync with the upgraded active NameNode.
To synchronize the active and standby NameNode, re-establishing HA, re-bootstrap
the standby NameNode by running the NameNode with the '-bootstrapStandby' flag.
Do NOT start this standby NameNode with the '-upgrade' flag.
As the HDFS user:
su -l <HDFS_USER> -c "hdfs namenode -bootstrapStandby -force"
The bootstrapStandby command will download the most recent fsimage from the
active NameNode into the <dfs.name.dir> directory of the standby NameNode. You
can enter that directory to make sure the fsimage has been successfully downloaded.
After verifying, start the ZKFailoverController via Ambari, then start the standby
NameNode via Ambari. You can check the status of both NameNodes using the Web
UI.
9 Start all DataNodes.
On each DataNode, as the HDFS user:
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.0-<$version>/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
The NameNode sends an upgrade command to DataNodes after receiving block reports.
10 Update HDFS Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > HDFS > Configs > core-site.xml:
• Add
hadoop.proxyuser.falcon.groups=users
hadoop.proxyuser.falcon.hosts=*
Using Ambari Web UI > Services > HDFS > Configs > hdfs-site.xml:
• Add
dfs.namenode.startup.delay.block.deletion.sec=3600
• Modify
dfs.datanode.max.transfer.threads=4096
11 Restart HDFS.
1 Open the Ambari Web GUI. If the browser in which Ambari is running has been open
throughout the process, clear the browser cache, then refresh the browser.
2 Choose Ambari Web > Services > HDFS > Service Actions > Restart All.
In a cluster configured for NameNode High Availability, use the following procedure to
restart NameNodes. Using the following procedure preserves HA when upgrading the
cluster.
1 Using Ambari Web > Services > HDFS, choose Active NameNode.
This shows the host name of the current, active NameNode.
2 Write down (or copy, or remember) the host name of the active NameNode.
You need this host name for step 4.
3 Using Ambari Web > Services > HDFS > Service Actions, choose Stop.
This stops all of the HDFS Components, including both NameNodes.
4 Using Ambari Web > Hosts, choose the host name you noted in Step 2, then
start that NameNode component, using Host Actions > Start.
This causes the original, active NameNode to re-assume its role as the active
NameNode.
5 Using Ambari Web > Services > HDFS > Service Actions, choose Restart All.
3 Choose Service Actions > Run Service Check. Make sure the service checks
pass.
12 After the DataNodes are started, HDFS exits safe mode. Monitor the status by running the
following command, as the HDFS user:
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -safemode get"
When HDFS exits safe mode, the following message displays:
Safe mode is OFF
13 Make sure that the HDFS upgrade was successful.
Compare the old and new versions of the following log files:
• dfs-old-fsck-1.log versus dfs-new-fsck-1.log.
The files should be identical unless the hadoop fsck reporting format has changed in the new
version.
• dfs-old-lsr-1.log versus dfs-new-lsr-1.log.
The files should be identical unless the format of hadoop fs -lsr reporting or the data structures have
changed in the new version.
• dfs-old-report-1.log versus dfs-new-report-1.log.
Make sure that all DataNodes that were in the cluster before upgrading are up and running.
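The fsck and lsr comparisons in step 13 can be sketched as a short script. This assumes the dfs-new-*.log files were produced after the upgrade by re-running the same commands that created the dfs-old-*.log files, and that both sets sit in one directory. The function name compare_upgrade_logs is hypothetical. The report logs are left out because they need a manual review of the DataNode list rather than a byte-for-byte comparison.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDP or Ambari): compares the pre-upgrade
# and post-upgrade fsck and lsr logs found in one directory.
# $1 = directory holding dfs-old-*-1.log and dfs-new-*-1.log
compare_upgrade_logs() {
    log_dir="$1"
    status=0
    for kind in fsck lsr; do
        old="$log_dir/dfs-old-$kind-1.log"
        new="$log_dir/dfs-new-$kind-1.log"
        if [ ! -f "$old" ] || [ ! -f "$new" ]; then
            echo "skipped: $kind (log not found)"
            continue
        fi
        # cmp -s is silent and returns 0 only when the files are identical.
        if cmp -s "$old" "$new"; then
            echo "identical: $kind"
        else
            echo "differs: $kind"
            status=1
        fi
    done
    return "$status"
}
```

A "differs" result is not necessarily a failure, since the reporting format may have changed between versions. Inspect the two files with diff before drawing a conclusion.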
14 Update HBase Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > HBase > Configs > hbase-site.xml:
• Add
hbase.hregion.majorcompaction.jitter=0.50
• Modify
hbase.hregion.majorcompaction=604800000
hbase.hregion.memstore.block.multiplier=4
• Remove
hbase.hstore.flush.retries.number=120
15 Using Ambari Web, navigate to Services > Hive > Configs > Advanced and verify that the
following properties are set to their default values:
Hive (Advanced)
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator
The Security Wizard enables Hive authorization. The default values for these properties
changed in Hive-0.12. If you are upgrading Hive from 0.12 to 0.13 in a secure cluster,
you should not need to change the values. If upgrading from a Hive version older than
0.12 to Hive-0.12 or greater in a secure cluster, you will need to correct the values.
16 Update Hive Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Hive > Configs > hive-site.xml:
• Add
hive.cluster.delegation.token.store.zookeeper.connectString=
datanucleus.cache.level2.type=none
hive.auto.convert.sortmerge.join.to.mapjoin=false
hive.cbo.enable=true
hive.cli.print.header=false
hive.cluster.delegation.token.store.class=org.apache.hadoop.hive.thrift.ZooKeeperTokenStore
hive.cluster.delegation.token.store.zookeeper.znode=/hive/cluster/delegation
hive.compactor.abortedtxn.threshold=1000
hive.compactor.check.interval=300L
hive.compactor.delta.num.threshold=10
hive.compactor.delta.pct.threshold=0.1f
hive.compactor.initiator.on=false
hive.compactor.worker.threads=0
hive.compactor.worker.timeout=86400L
hive.compute.query.using.stats=true
hive.conf.restricted.list=hive.security.authenticator.manager,hive.security.authorization.manager,hive.users.in.admin.role
hive.convert.join.bucket.mapjoin.tez=false
hive.enforce.sortmergebucketmapjoin=true
hive.exec.compress.intermediate=false
hive.exec.compress.output=false
hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict
hive.exec.max.created.files=100000
hive.exec.max.dynamic.partitions=5000
hive.exec.max.dynamic.partitions.pernode=2000
hive.exec.orc.compression.strategy=SPEED
hive.exec.orc.default.compress=ZLIB
hive.exec.orc.default.stripe.size=67108864
hive.exec.parallel=false
hive.exec.parallel.thread.number=8
hive.exec.reducers.bytes.per.reducer=67108864
hive.exec.reducers.max=1009
hive.exec.scratchdir=/tmp/hive
hive.exec.submit.local.task.via.child=true
hive.exec.submitviachild=false
hive.fetch.task.aggr=false
hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.limit.optimize.enable=true
hive.limit.pushdown.memory.usage=0.04
hive.map.aggr.hash.force.flush.memory.threshold=0.9
hive.map.aggr.hash.min.reduction=0.5
hive.map.aggr.hash.percentmemory=0.5
hive.mapjoin.optimized.hashtable=true
hive.merge.mapfiles=true
hive.merge.mapredfiles=false
hive.merge.orcfile.stripe.level=true
hive.merge.rcfile.block.level=true
hive.merge.size.per.task=256000000
hive.merge.smallfiles.avgsize=16000000
hive.merge.tezfiles=false
hive.metastore.authorization.storage.checks=false
hive.metastore.client.connect.retry.delay=5s
hive.metastore.connect.retries=24
hive.metastore.failure.retries=24
hive.metastore.kerberos.keytab.file=/etc/security/keytabs/hive.service.keytab
hive.metastore.kerberos.principal=hive/[email protected]
hive.metastore.server.max.threads=100000
hive.optimize.constant.propagation=true
hive.optimize.metadataonly=true
hive.optimize.null.scan=true
hive.optimize.sort.dynamic.partition=false
hive.orc.compute.splits.num.threads=10
hive.orc.splits.include.file.footer=false
hive.prewarm.enabled=false
hive.prewarm.numcontainers=10
hive.security.metastore.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
hive.security.metastore.authorization.auth.reads=true
hive.server2.allow.user.substitution=true
hive.server2.authentication.spnego.keytab=/etc/security/keytabs/spnego.service.keytab
hive.server2.authentication.spnego.principal=HTTP/[email protected]
hive.server2.logging.operation.enabled=true
hive.server2.logging.operation.log.location=${system:java.io.tmpdir}/${system:user.name}/operation_logs
hive.server2.table.type.mapping=CLASSIC
hive.server2.tez.default.queues=default
hive.server2.tez.sessions.per.default.queue=1
hive.server2.thrift.http.path=cliservice
hive.server2.thrift.http.port=10001
hive.server2.thrift.max.worker.threads=500
hive.server2.thrift.sasl.qop=auth
hive.server2.transport.mode=binary
hive.server2.use.SSL=false
hive.smbjoin.cache.rows=10000
hive.stats.autogather=true
hive.stats.dbclass=fs
hive.stats.fetch.column.stats=false
hive.stats.fetch.partition.stats=true
hive.support.concurrency=false
hive.tez.auto.reducer.parallelism=false
hive.tez.cpu.vcores=-1
hive.tez.dynamic.partition.pruning=true
x.data.size=104857600
x.event.size=1048576
hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
hive.tez.log.level=INFO
hive.tez.max.partition.factor=2.0
hive.tez.min.partition.factor=0.25
hive.tez.smb.number.waves=0.5
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
hive.txn.max.open.batch=1000
hive.txn.timeout=300
hive.user.install.directory=/user/
hive.vectorized.execution.reduce.enabled=false
hive.vectorized.groupby.checkinterval=4096
hive.vectorized.groupby.flush.percent=0.1
hive.vectorized.groupby.maxentries=100000
hive.zookeeper.client.port=2181
hive.zookeeper.namespace=hive_zookeeper_namespace
hive.zookeeper.quorum=
Ä Modify
171
Name
hive.auto.convert.joi
n.noconditionaltask.
size
hive.metastore.client
.socket.timeout
hive.optimize.reduce
deduplication.min.re
ducer
hive.security.authori
zation.manager
hive.security.metast
ore.authorization.ma
nager
hive.server2.support
.dynamic.service.dis
covery
hive.tez.container.si
ze
hive.tez.java.opts
fs.file.impl.disable.c
ache
fs.hdfs.impl.disable.
cache
March 26, 2015
Value
238026752
1800s
4
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfO
nlyAuthorizerFactory
org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorization
Provider,org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAP
IAuthorizerEmbedOnly
true
682
-server -Xmx546m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc XX:+PrintGCTimeStamps
true
true
17 If YARN is installed in your HDP 2.0 stack, and the Application Timeline Server (ATS)
components are NOT, then you must create and install ATS service and host components via
API by running the following commands on the server that will host the YARN application
timeline server in your cluster. Be sure to replace <your_ATS_component_hostname> with
a host name appropriate for your environment.
Ambari does not currently support ATS in a kerberized cluster. If you are upgrading
YARN in a kerberized cluster, skip this step.
1 Create the ATS Service Component.
http://localhost:8080/api/v1/clusters/<your_cluster_name>/services/YARN/components/APP_TIMELINE_SERVER
2 Create the ATS Host Component.
3 Install the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X PUT -d '{ "HostRoles": { "state": "INSTALLED"}}'
curl commands use the default username/password = admin/admin. To run the
curl commands using non-default credentials, modify the --user option to use your
Ambari administrator credentials. For example: --user
<ambari_admin_username>:<ambari_admin_password>.
18 Make the following config changes required for Application Timeline Server. Use the Ambari
web UI to navigate to the service dashboard and add/modify the following configurations:
YARN (Custom yarn-site.xml)
yarn.timeline-service.leveldb-timeline-store.path=/var/log/hadoop-yarn/timeline
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
yarn.timeline-service.ttl-enable=true
yarn.timeline-service.ttl-ms=2678400000
yarn.timeline-service.generic-application-history.store-class=org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
yarn.timeline-service.webapp.address=<PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:8188
yarn.timeline-service.webapp.https.address=<PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:8190
yarn.timeline-service.address=<PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:10200
HIVE (hive-site.xml)
hive.execution.engine=mr
hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.tez.container.size=<map-container-size>
*If mapreduce.map.memory.mb > 2GB then set it equal to
mapreduce.map.memory.mb. Otherwise, set it equal to
mapreduce.reduce.memory.mb*
hive.tez.java.opts="-server -Xmx" + Math.round(0.8 * map-container-size) + "m
-Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC"
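The two rules for hive.tez.container.size and hive.tez.java.opts can be worked through with a short sketch. The function names tez_container_size and tez_java_opts are hypothetical, all sizes are assumed to be whole megabytes, and 2GB is taken as 2048 MB.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HDP or Ambari) that apply the sizing
# rules stated for hive.tez.container.size and hive.tez.java.opts.

# $1 = mapreduce.map.memory.mb, $2 = mapreduce.reduce.memory.mb
# Prints the map value when it exceeds 2048 MB, otherwise the reduce value.
tez_container_size() {
    if [ "$1" -gt 2048 ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# $1 = container size in MB
# Prints the java opts string with -Xmx set to round(0.8 * container size).
tez_java_opts() {
    container_mb="$1"
    # Integer form of round(0.8 * n): multiply by 8, add 5, divide by 10.
    heap_mb=$(( (container_mb * 8 + 5) / 10 ))
    echo "-server -Xmx${heap_mb}m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC"
}
```

With a container size of 682, the heap works out to 546 MB, which matches the -Xmx546m value that appears alongside hive.tez.container.size=682 in the hive-site.xml properties in step 16.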
Use configuration values appropriate for your environment. For example, the value
"800" in the preceding example is shown only for illustration purposes.
19 Prepare MR2 and Yarn for work. Execute hdfs commands on any host.
• Create mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -mkdir -p /hdp/apps/2.2.x.0-<$version>/mapreduce/"
• Copy new mapreduce.tar.gz to hdfs mapreduce dir.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/hadoop/mapreduce.tar.gz /hdp/apps/2.2.x.0-<$version>/mapreduce/."
• Grant permissions for created mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -chown -R <HDFS_USER>:<HADOOP_GROUP> /hdp";
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 555 /hdp/apps/2.2.x.0-<$version>/mapreduce";
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 444 /hdp/apps/2.2.x.0-<$version>/mapreduce/mapreduce.tar.gz"
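The staging commands in this step differ only in the HDP build string, so they can be generated from one variable. The following is a dry-run sketch that only prints commands; HDP_VERSION, HDFS_USER and HADOOP_GROUP hold example values (assumptions), and mr_staging_commands is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the HDFS commands from this step for one HDP build.
# The three values below are examples only; substitute your own.
HDP_VERSION="2.2.4.2-2"
HDFS_USER="hdfs"
HADOOP_GROUP="hadoop"

mr_staging_commands() {
  local version=$1
  local dir="/hdp/apps/${version}/mapreduce"
  local tarball="/usr/hdp/${version}/hadoop/mapreduce.tar.gz"
  echo "hdfs dfs -mkdir -p ${dir}/"
  echo "hdfs dfs -copyFromLocal ${tarball} ${dir}/."
  echo "hdfs dfs -chown -R ${HDFS_USER}:${HADOOP_GROUP} /hdp"
  echo "hdfs dfs -chmod -R 555 ${dir}"
  echo "hdfs dfs -chmod -R 444 ${dir}/mapreduce.tar.gz"
}

# Wrap each command in "su -l", as the step does, and print it for review.
mr_staging_commands "$HDP_VERSION" | while read -r cmd; do
  echo "su -l ${HDFS_USER} -c \"${cmd}\""
done
```

Review the printed lines, then run them on a host with HDFS client access.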
• Using Ambari Web UI > Service > Mapreduce2 > Configs > Advanced > mapred-site:
• Add
Name                                              Value
mapreduce.job.emit-timeline-data                  false
mapreduce.jobhistory.bind-host                    0.0.0.0
mapreduce.reduce.shuffle.fetch.retry.enabled      1
mapreduce.reduce.shuffle.fetch.retry.interval-ms  1000
mapreduce.reduce.shuffle.fetch.retry.timeout-ms   30000
• Modify
Name                                      Value
mapreduce.admin.map.child.java.opts
mapreduce.admin.reduce.child.java.opts
mapreduce.map.java.opts                   -Xmx546m
mapreduce.map.memory.mb                   682
mapreduce.reduce.java.opts                -Xmx546m
mapreduce.task.io.sort.mb                 273
yarn.app.mapreduce.am.admin-command-opts  -Dhdp.version=${hdp.version}
yarn.app.mapreduce.am.command-opts        -Xmx546m -Dhdp.version=${hdp.version}
yarn.app.mapreduce.am.resource.mb         682
mapreduce.application.framework.path      /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework
mapreduce.application.classpath           $PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure
mapreduce.admin.user.env                  LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64
• Using Ambari Web UI > Service > Yarn > Configs > Advanced > yarn-site. Add/modify the following properties:
Name                                                                           Value
hadoop.registry.zk.quorum
hadoop.registry.rm.enabled                                                     false
yarn.client.nodemanager-connect.max-wait-ms                                    900000
yarn.client.nodemanager-connect.retry-interval-ms                              10000
yarn.node-labels.fs-store.retry-policy-spec                                    2000, 500
yarn.node-labels.fs-store.root-dir                                             /system/yarn/node-labels
yarn.node-labels.manager-class                                                 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.MemoryRMNodeLabelsManager
yarn.nodemanager.bind-host                                                     0.0.0.0
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage  90
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb                1000
yarn.nodemanager.linux-container-executor.cgroups.hierarchy                    hadoop-yarn
yarn.nodemanager.linux-container-executor.cgroups.mount                        false
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage        false
yarn.nodemanager.linux-container-executor.resources-handler.class              org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler
yarn.nodemanager.log-aggregation.debug-enabled                                 false
yarn.nodemanager.log-aggregation.num-log-files-per-app                         30
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds              -1
yarn.nodemanager.recovery.dir                                                  /var/log/hadoop-yarn/nodemanager/recovery-state
yarn.nodemanager.recovery.enabled                                              false
yarn.nodemanager.resource.cpu-vcores                                           1
yarn.nodemanager.resource.percentage-physical-cpu-limit                        100
yarn.resourcemanager.bind-host                                                 0.0.0.0
t.max-wait.ms                                                                  900000
t.retry-interval.ms                                                            30000
-store.retry-policy-spec                                                       2000, 500
-store.uri                                                                     <enter a "space" as the property value>
yarn.resourcemanager.ha.enabled                                                false
yarn.resourcemanager.recovery.enabled                                          false
yarn.resourcemanager.state-store.max-completed-applications                    ${yarn.resourcemanager.max-completed-applications}
yarn.resourcemanager.store.class                                               org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
-metricspublisher.dispatcher.pool-size                                         10
-metrics-publisher.enabled                                                     true
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled               false
yarn.resourcemanager.work-preserving-recovery.enabled                          false
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms               10000
yarn.resourcemanager.zk-acl                                                    world:anyone:rwcda
yarn.resourcemanager.zk-address                                                localhost:2181
yarn.resourcemanager.zk-num-retries                                            1000
yarn.resourcemanager.zk-retry-interval-ms                                      1000
yarn.resourcemanager.zk-state-store.parent-path                                /rmstore
yarn.resourcemanager.zk-timeout-ms                                             10000
yarn.timeline-service.bind-host                                                0.0.0.0
yarn.timeline-service.client.max-retries                                       30
yarn.timeline-service.client.retry-interval-ms                                 1000
yarn.timeline-service.enabled                                                  true
yarn.timeline-service.http-authentication.simple.anonymous.allowed             true
yarn.timeline-service.http-authentication.type                                 simple
yarn.timeline-service.leveldb-timeline-store.read-cache-size                   104857600
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size        10000
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size       10000
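Entering dozens of values by hand is error-prone. As an alternative to the Ambari Web UI, the properties could be applied with the configs.sh script that this guide uses for cluster-env in the Tez step. The sketch below is a dry run that only prints commands. It assumes configs.sh accepts yarn-site as the config type in the same way; CLUSTER_NAME is a placeholder and yarn_site_commands is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry-run sketch: emit one configs.sh call per property instead of typing
# each value into Ambari Web. CLUSTER_NAME is a placeholder (assumption).
CLUSTER_NAME="MyCluster"

yarn_site_commands() {
  # Reads "name=value" lines on stdin and prints a configs.sh command for each.
  local line name value
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    name=${line%%=*}     # text before the first "="
    value=${line#*=}     # text after the first "="
    printf './configs.sh set localhost %s yarn-site "%s" "%s"\n' \
      "$CLUSTER_NAME" "$name" "$value"
  done
}

# Three properties from the table, as an example.
yarn_site_commands <<'EOF'
yarn.timeline-service.enabled=true
yarn.timeline-service.bind-host=0.0.0.0
yarn.resourcemanager.zk-timeout-ms=10000
EOF
```

Run the printed commands from /var/lib/ambari-server/resources/scripts/ on the Ambari Server host after reviewing them.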
20 Using Ambari Web > Services > Service Actions, start YARN.
21 Using Ambari Web > Services > Service Actions, start MapReduce2.
22 Using Ambari Web > Services > Service Actions, start HBase and ensure the service
check passes.
23 Using Ambari Web > Services > Service Actions, start the Hive service.
24 Upgrade Oozie.
1 Perform the following preparation steps on each Oozie server host:
You must replace your Oozie configuration after upgrading.
1 Copy configurations from oozie-conf-bak to the /etc/oozie/conf directory on each Oozie server and client.
2 Create the /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22 directory.
mkdir /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22
3 Copy the JDBC jar of your Oozie database to both /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22 and /usr/hdp/2.2.x.0-<$version>/oozie/libtools.
For example, if you are using MySQL, copy your mysql-connector-java.jar.
4 Copy these files to the /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22 directory.
cp /usr/lib/hadoop/lib/hadoop-lzo*.jar /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22;
cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22;
cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.0-<$version>/oozie/libext
5 Grant read/write access to the Oozie user.
chmod -R 777 /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22
2 Upgrade steps:
1 On the Services view, make sure that YARN and MapReduce2 services are running.
2 Make sure that the Oozie service is stopped.
3 In oozie-env.sh, comment out the CATALINA_BASE property. Do the same using Ambari Web UI in Services > Oozie > Configs > Advanced oozie-env.
4 Upgrade Oozie. At the Oozie server host, as the Oozie service user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.0-<$version>/oozie/bin/ooziedb.sh upgrade -run"
Make sure that the output contains the string "Oozie DB has been upgraded to Oozie version <OOZIE_Build_Version>".
5 Prepare the Oozie WAR file.
The Oozie server must not be running for this step. If you get the message "ERROR:
Stop Oozie first", the script still thinks the server is running. Check, and if needed,
remove the process id (pid) file indicated in the output.
At the Oozie server, as the Oozie user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.0-<$version>/oozie/bin/oozie-setup.sh prepare-war -d /usr/hdp/2.2.x.0-<$version>/oozie/libext-upgrade22"
Make sure that the output contains the string "New Oozie WAR file added".
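Both Oozie steps ask you to confirm that a marker string appears in the script output. A small helper can make that check explicit. This is a sketch only: expect_in_output is a hypothetical function, and the sample output below is made up for the demonstration rather than captured from a real run.

```shell
#!/usr/bin/env bash
# Sketch: fail loudly when a command's output lacks the expected marker string.
# expect_in_output is a hypothetical helper, not part of Oozie or Ambari.
expect_in_output() {
  local expected=$1 output=$2
  if printf '%s\n' "$output" | grep -qF -- "$expected"; then
    echo "OK: found \"$expected\""
  else
    echo "MISSING: \"$expected\"" >&2
    return 1
  fi
}

# Made-up sample text; in practice capture the real output of the
# prepare-war or ooziedb.sh command into the variable instead.
output="(sample line, not real script output)
New Oozie WAR file added"
expect_in_output "New Oozie WAR file added" "$output"
```

The same helper works for the database upgrade by passing "Oozie DB has been upgraded to Oozie version" as the expected string.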
6 Using Ambari Web, choose Services > Oozie > Configs, expand oozie-log4j, then add the following property:
log4j.appender.oozie.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L SERVER[${oozie.instance.id}] %m%n
where ${oozie.instance.id} is determined by Oozie automatically.
7 Using Ambari Web, choose Services > Oozie > Configs, expand Advanced
oozie-site, then edit the following properties:
A In oozie.service.coord.push.check.requeue.interval, replace the existing
property value with the following one:
30000
B In oozie.service.SchemaService.wf.ext.schemas, append (using copy/paste) to the
existing property value the following string, if it is not already present:
shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,email-action-0.1.xsd,email-action-0.2.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd,distcp-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd
If you have customized schemas, append this string to your custom schema name
string.
Do not overwrite custom schemas.
If you have no customized schemas, you can replace the existing string with the
following one:
shell-action-0.1.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,sqoop-action-0.2.xsd,ssh-action-0.1.xsd,distcp-action-0.1.xsd,shell-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,hive-action-0.3.xsd
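The "append if not already present" rule can be applied mechanically so that custom schemas are never overwritten. The following is a minimal sketch: append_missing is a hypothetical helper, and my-custom-0.1.xsd stands in for a site-specific schema name.

```shell
#!/usr/bin/env bash
# Sketch: append comma-separated entries to an existing property value,
# skipping any entry that is already present. Existing entries are preserved.
# append_missing is a hypothetical helper name.
append_missing() {
  local current=$1 additions=$2 item
  local result=$current
  local IFS=','
  for item in $additions; do
    case ",$result," in
      *",$item,"*) ;;                        # already present, skip
      *) result=${result:+$result,}$item ;;  # append with a comma separator
    esac
  done
  echo "$result"
}

# Example: one custom schema plus one schema that is already listed.
append_missing "my-custom-0.1.xsd,shell-action-0.1.xsd" \
  "shell-action-0.1.xsd,shell-action-0.2.xsd,email-action-0.1.xsd"
```

Paste the resulting string back into the property field. The same approach fits the other properties in this step that are extended rather than replaced.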
C In oozie.service.URIHandlerService.uri.handlers, append to the existing
property value the following string, if it is not already present:
org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler
D In oozie.services, make sure all the following properties are present:
org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.InstrumentationService,
org.apache.oozie.service.MemoryLocksService,
org.apache.oozie.service.UUIDService,
org.apache.oozie.service.ELService,
org.apache.oozie.service.AuthorizationService,
org.apache.oozie.service.UserGroupInformationService,
org.apache.oozie.service.HadoopAccessorService,
org.apache.oozie.service.JobsConcurrencyService,
org.apache.oozie.service.URIHandlerService,
org.apache.oozie.service.DagXLogInfoService,
org.apache.oozie.service.SchemaService,
org.apache.oozie.service.LiteWorkflowAppService,
org.apache.oozie.service.JPAService,
org.apache.oozie.service.StoreService,
org.apache.oozie.service.CoordinatorStoreService,
org.apache.oozie.service.SLAStoreService,
org.apache.oozie.service.DBLiteWorkflowStoreService,
org.apache.oozie.service.CallbackService,
org.apache.oozie.service.ActionService,
org.apache.oozie.service.ShareLibService,
org.apache.oozie.service.CallableQueueService,
org.apache.oozie.service.ActionCheckerService,
org.apache.oozie.service.RecoveryService,
org.apache.oozie.service.PurgeService,
org.apache.oozie.service.CoordinatorEngineService,
org.apache.oozie.service.BundleEngineService,
org.apache.oozie.service.DagEngineService,
org.apache.oozie.service.CoordMaterializeTriggerService,
org.apache.oozie.service.StatusTransitService,
org.apache.oozie.service.PauseTransitService,
org.apache.oozie.service.GroupsService,
org.apache.oozie.service.ProxyUserService,
org.apache.oozie.service.XLogStreamingService,
org.apache.oozie.service.JvmPauseMonitorService
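Checking a list of this length by eye is easy to get wrong. The sketch below reports which required entries are absent from a comma-separated property value; missing_entries is a hypothetical helper, and the two-entry value shown is a shortened example rather than a real oozie.services setting.

```shell
#!/usr/bin/env bash
# Sketch: report which required entries are absent from a comma-separated
# property value. missing_entries is a hypothetical helper name.
missing_entries() {
  local current=$1; shift
  local required normalized
  # Strip whitespace and newlines so multi-line values compare cleanly.
  normalized=",$(printf '%s' "$current" | tr -d '[:space:]'),"
  for required in "$@"; do
    case "$normalized" in
      *",$required,"*) ;;        # present
      *) echo "$required" ;;     # absent: print it
    esac
  done
}

# Shortened example value (assumption); paste the real property value here.
current="org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.UUIDService"
missing_entries "$current" \
  org.apache.oozie.service.SchedulerService \
  org.apache.oozie.service.JvmPauseMonitorService
```

Any name the function prints needs to be added to the property.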
E Add the oozie.service.coord.check.maximum.frequency property with the
following property value: false
If you set this property to true, Oozie rejects any coordinators with a frequency faster than 5 minutes.
It is not recommended to disable this check or submit coordinators with frequencies faster than 5
minutes: doing so can cause unintended behavior and additional system stress.
F Add the oozie.service.AuthorizationService.security.enabled property
with the following property value: false
Specifies whether security (user name/admin role) is enabled or not. If disabled any user can manage
Oozie system and manage any job.
G Add the oozie.service.HadoopAccessorService.kerberos.enabled
property with the following property value: false
Indicates if Oozie is configured to use Kerberos.
H Add the oozie.authentication.simple.anonymous.allowed property with the
following property value: true
Indicates if anonymous requests are allowed. This setting is meaningful only when using 'simple'
authentication.
I In oozie.services.ext, append to the existing property value the following
string, if it is not already present:
org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService
J Update Oozie Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Oozie > Configs > oozie-site.xml:
Ä Add
Name
Value
oozie.authentication.simple.anonymo
true
us.allowed
oozie.service.coord.check.maximum.f false
requency
oozie.service.ELService.ext.functions. now=org.apache.oozie.extensions.OozieELExtensions#ph2_
coord-action-create
now,
today=org.apache.oozie.extensions.OozieELExtensions#ph
2_today,
yesterday=org.apache.oozie.extensions.OozieELExtensions
#ph2_yesterday,
currentMonth=org.apache.oozie.extensions.OozieELExtensi
ons#ph2_currentMonth,
lastMonth=org.apache.oozie.extensions.OozieELExtensions
#ph2_lastMonth,
currentYear=org.apache.oozie.extensions.OozieELExtensio
ns#ph2_currentYear,
lastYear=org.apache.oozie.extensions.OozieELExtensions#
ph2_lastYear,
latest=org.apache.oozie.coord.CoordELFunctions#ph2_coor
d_latest_echo,
future=org.apache.oozie.coord.CoordELFunctions#ph2_coo
rd_future_echo,
formatTime=org.apache.oozie.coord.CoordELFunctions#ph
2_coord_formatTime,
user=org.apache.oozie.coord.CoordELFunctions#coord_use
r
181
oozie.service.ELService.ext.functions.
coord-action-create-inst
coord-action-start
now=org.apache.oozie.extensions.OozieELExtensions#ph2_
now_inst,
2_today_inst,
#ph2_yesterday_inst,
ons#ph2_currentMonth_inst,
#ph2_lastMonth_inst,
ns#ph2_currentYear_inst,
ph2_lastYear_inst,
d_latest_echo,
rd_future_echo,
2_coord_formatTime,
r
now,
2_today,
#ph2_yesterday,
ons#ph2_currentMonth,
#ph2_lastMonth,
ns#ph2_currentYear,
ph2_lastYear,
d_latest,
rd_future,
dataIn=org.apache.oozie.extensions.OozieELExtensions#ph
3_dataIn,
instanceTime=org.apache.oozie.coord.CoordELFunctions#p
h3_coord_nominalTime,
dateOffset=org.apache.oozie.coord.CoordELFunctions#ph3
_coord_dateOffset,
3_coord_formatTime,
r
coord-job-submit-data
coord-job-submit-instances
coord-sla-create
coord-sla-submit
now_echo,
1_today_echo,
#ph1_yesterday_echo,
ons#ph1_currentMonth_echo,
#ph1_lastMonth_echo,
ns#ph1_currentYear_echo,
ph1_lastYear_echo,
dataIn=org.apache.oozie.extensions.OozieELExtensions#ph
1_dataIn_echo,
h1_coord_nominalTime_echo_wrap,
1_coord_formatTime_echo,
dateOffset=org.apache.oozie.coord.CoordELFunctions#ph1
_coord_dateOffset_echo,
r
now_echo,
1_today_echo,
#ph1_yesterday_echo,
ons#ph1_currentMonth_echo,
#ph1_lastMonth_echo,
ns#ph1_currentYear_echo,
ph1_lastYear_echo,
1_coord_formatTime_echo,
d_latest_echo,
rd_future_echo
h2_coord_nominalTime,
r
h1_coord_nominalTime_echo_fixed,
r
oozie.service.HadoopAccessorService.kerberos.enabled       false
oozie.service.HadoopAccessorService.supported.filesystems  *
• Modify
Name                                        Value
oozie.service.SchemaService.wf.ext.schemas  shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,email-action-0.1.xsd,email-action-0.2.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd,distcp-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd
oozie.services.ext                          org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService
C After modifying all properties on the Oozie Configs page, choose Save to update
oozie-site.xml, using the updated configurations.
4 Replace the content of /user/oozie/share in HDFS. On the Oozie server host:
1 Extract the Oozie sharelib into a tmp folder.
mkdir -p /tmp/oozie_tmp;
cp /usr/hdp/2.2.x.0-<$version>/oozie/oozie-sharelib.tar.gz /tmp/oozie_tmp;
cd /tmp/oozie_tmp;
tar xzvf oozie-sharelib.tar.gz;
2 Back up the /user/oozie/share folder in HDFS and then delete it. If you have any
custom files in this folder, back them up separately and then add them to the /share
folder after updating it.
mkdir /tmp/oozie_tmp/oozie_share_backup;
chmod 777 /tmp/oozie_tmp/oozie_share_backup;
su -l <HDFS_USER> -c "hdfs dfs -copyToLocal /user/oozie/share /tmp/oozie_tmp/oozie_share_backup";
su -l <HDFS_USER> -c "hdfs dfs -rm -r /user/oozie/share";
3 Add the latest share libs that you extracted in step 1. After you have added the files,
modify ownership and ACL.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /tmp/oozie_tmp/share /user/oozie/.";
su -l <HDFS_USER> -c "hdfs dfs -chown -R <OOZIE_USER>:<HADOOP_GROUP> /user/oozie";
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 755 /user/oozie";
4 Use the Ambari Web UI > Services view to start the Oozie service. Make sure that
ServiceCheck passes for Oozie.
25 Update WebHCat.
A Modify the webhcat-site config type.
Using Ambari Web, navigate to Services > WebHCat and modify the following
configuration:
Action  Property Name            Property Value
Modify  templeton.storage.class  org.apache.hive.hcatalog.templeton.tool.ZooKeeperStorage
B Expand Advanced > webhcat-site.xml.
Check if property templeton.port exists. If not, then add it using the Custom
webhcat-site panel. The default value for templeton.port is 50111.
C On each WebHCat host, update the Pig and Hive tar bundles by updating the
following files:
• /apps/webhcat/pig.tar.gz
• /apps/webhcat/hive.tar.gz
Find these files only on a host where WebHCat is installed.
For example, to update a *.tar.gz file:
1 Move the file to a local directory.
/apps/webhcat/*.tar.gz <local_backup_dir>"
2 Remove the old file.
/apps/webhcat/*.tar.gz"
3 Copy the new file.
/usr/hdp/2.2.x.0-<$version>/hive/hive.tar.gz /apps/webhcat/";
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.0-<$version>/pig/pig.tar.gz /apps/webhcat/";
D On each WebHCat host, update the /apps/webhcat/hadoop-streaming.jar file.
1 Move the file to a local directory.
/apps/webhcat/hadoop-streaming*.jar <local_backup_dir>"
2 Remove the old file.
/apps/webhcat/hadoop-streaming*.jar"
3 Copy the new hadoop-streaming.jar file.
/usr/hdp/2.2.x.0-<$version>/hadoop-mapreduce/hadoop-streaming*.jar /apps/webhcat"
26 Prepare Tez for work.
Add the Tez service to your cluster using the Ambari Web UI, if Tez was not installed earlier.
Configure Tez.
cd /var/lib/ambari-server/resources/scripts/;
./configs.sh set localhost <your-cluster-name> cluster-env "tez_tar_source" "/usr/hdp/current/tez-client/lib/tez.tar.gz";
./configs.sh set localhost <your-cluster-name> cluster-env "tez_tar_destination_folder" "hdfs:///hdp/apps/{{ hdp_stack_version }}/tez/"
27 Using the Ambari Web UI > Services > Hive, start the Hive service.
28 If you use Tez as the Hive execution engine, and if the variable
hive.server2.enable.doAs is set to true, you must create a scratch directory on the
NameNode host for the username that will run the HiveServer2 service. For example, use the
following commands:
sudo su -c "hdfs dfs -mkdir /tmp/hive-<username>"
sudo su -c "hdfs dfs -chmod 777 /tmp/hive-<username>"
29 Using Ambari Web > Services, restart the remaining services.
30 The upgrade is now fully functional but not yet finalized. Using the finalize command
removes the previous version of the NameNode and DataNode storage directories.
After the upgrade is finalized, the system cannot be rolled back. Usually this step is not
taken until a thorough testing of the upgrade has been performed.
The upgrade must be finalized before another upgrade can be performed.
Directories used by Hadoop 1 services set in /etc/hadoop/conf/taskcontroller.cfg are
not automatically deleted after upgrade. Administrators can choose to delete these
directories after the upgrade.
To finalize the upgrade, execute the following command once, on the primary NameNode
host in your HDP cluster:
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -finalizeUpgrade"
Automated HDP Stack Upgrade: HDP 2.2.0 to 2.2.4
Ambari 2.0 has the capability to perform an automated cluster upgrade for maintenance and patch
releases for the Stack. This capability is available for HDP 2.2 Stack only. If you have a cluster
running HDP 2.2, you can perform Stack upgrades to later maintenance and patch releases. For
example: you can upgrade from the GA release of HDP 2.2 (which is HDP 2.2.0.0) to the first
maintenance release of HDP 2.2 (which is HDP 2.2.4.2).
This section describes the steps to perform an upgrade from HDP 2.2.0 to HDP 2.2.4.
• Prerequisites
• Preparing to Upgrade
• Registering New Version
• Installing New Version
• Performing an Upgrade
Prerequisites
To perform an automated cluster upgrade from Ambari, your cluster must meet the following
prerequisites:
Item      Requirement       Description
Cluster   Stack Version     Must be running HDP 2.2 Stack. This capability is not available for HDP 2.0 or 2.1 Stacks.
Version   Target Version    All hosts must have the target version installed. See the Register Version and Install Version sections for more information.
HDFS      NameNode HA       NameNode HA must be enabled and working properly. See the Ambari User’s Guide for more information about configuring NameNode High Availability.
HDFS      Decommission      No components should be in decommissioning or decommissioned state.
YARN      YARN WPR          Work Preserving Restart must be configured.
Hosts     Heartbeats        All Ambari Agents must be heartbeating to Ambari Server. Any hosts that are not heartbeating must be in Maintenance Mode.
Hosts     Maintenance Mode  Any hosts in Maintenance Mode must not be hosting any Service master components.
Services  Services Up       All Services must be started.
Services  Maintenance Mode  No Services can be in Maintenance Mode.
If you do not meet the upgrade prerequisite requirements listed above, you can consider a Manual
Upgrade of the cluster.
Preparing to Upgrade
Back up your Hive Metastore and Oozie Server databases before you begin the
upgrade. This is highly recommended.
Registering a New Version
Register the HDP 2.2.4.2 Version
• Log in to Ambari.
• Browse to Admin > Stack and Versions.
• Click on the Versions tab. Click Manage Versions.
•
• Enter a two-digit version number. For example, enter 4.2 (which makes the version name
HDP-2.2.4.2).
• Select one or more OS families and enter the respective Base URLs.
• Click Save.
• You can click “Install On...MyCluster”, or you can browse back to Admin > Stack and
Versions. You will see the version currently running (HDP 2.2.0.0) and the version you just
registered (HDP 2.2.4.2). Proceed to Install a New Version on All Hosts.
Installing a New Version on All Hosts
Install HDP 2.2.4.2 on All Hosts
1 Log in to Ambari.
2
3
4 Click Install Packages and click OK to confirm.
5
6 You can browse to Hosts and to each host > Versions tab to see that the new version is installed.
Proceed to Perform Upgrade.
Performing an Upgrade
Perform the Upgrade to HDP 2.2.4.2
1 Log in to Ambari.
2
3
4 Click Perform Upgrade.
Manual HDP Stack Upgrade: HDP 2.2.0 to 2.2.4
The following sections describe the steps involved with performing a manual Stack upgrade:
•
•
• Performing a Manual Upgrade
This is an alternative to using the Automated Upgrade feature of Ambari when using
the HDP 2.2 Stack.
Register the HDP 2.2.4.2 Version
• Log in to Ambari.
•
• Click on the Versions tab. You will see the version currently running, HDP-2.2.0.0-2041.
• Click Manage Versions.
•
• Enter a two-digit version number. For example, enter 4.2 (which makes the version name
HDP-2.2.4.2).
• Select one or more OS families and enter the repository Base URLs for that OS.
• Click Save.
• Click Go to Dashboard and browse back to Admin > Stack and Versions > Versions.
You will see the currently running version, HDP-2.2.0.0-2041, and the version you just
registered, HDP-2.2.4.2. Proceed to Install a New Version on All Hosts.
Install HDP 2.2.4.2 on All Hosts
1 Log in to Ambari.
2
3
4 Click Install Packages and click OK to confirm.
5
6 You can browse to Hosts and to each host > Versions tab to see that the new version is installed.
Proceed to Perform Manual Upgrade.
Performing a Manual Upgrade
Perform the Manual Upgrade to HDP 2.2.4.2
1 Log in to Ambari.
2
3
4 Under the newly registered and installed version HDP-2.2.4.2 is the actual software
repository version in parentheses (Ambari determined this repository version during the
install). For example, the display name is HDP-2.2.4.2 and the repository version is
2.2.4.2-2. Record this repository version. You will use it later in the
manual upgrade process.
5 Stop all services from Ambari. On the Services tab, in the Service navigation area Actions
button, select Stop All to stop all services.
If you are upgrading a NameNode HA configuration, keep your JournalNodes running
while performing this upgrade procedure. Upgrade, rollback and finalization operations
on HA NameNodes must be performed with all JournalNodes running.
6 Go to the command line on each host and move the current HDP version to the newly
installed version using the hdp-select utility and repository version number (obtained in Step
4).
hdp-select set all {repository-version}
For example:
hdp-select set all 2.2.4.2-2
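Because the command must run on every host, it helps to generate the per-host invocations from a host list. The sketch below is a dry run that only prints commands. It assumes passwordless ssh to each host, which this guide does not require; the host names are placeholders and hdp_select_commands is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the hdp-select call for every host in a list.
# The host names and the use of ssh are assumptions for illustration.
REPO_VERSION="2.2.4.2-2"

hdp_select_commands() {
  local version=$1; shift
  local host
  for host in "$@"; do
    echo "ssh ${host} 'hdp-select set all ${version}'"
  done
}

hdp_select_commands "$REPO_VERSION" host1.example.com host2.example.com
```

Use the repository version recorded in Step 4 in place of the example value.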
7 Restart all services from Ambari. One by one, browse to each Service in Ambari Web, and in
the Service Actions menu select Restart All. Do not select Start All. You must use
Restart All. For example, browse to Ambari Web > Services > HDFS and select
Restart All.
8 During a manual upgrade, all components must advertise the version that they
are on. This is typically done by restarting an entire Service. However, client-only
services (e.g., Pig, Tez and Slider) do not have a Restart command. Instead, they need an
API call that triggers the same behavior. For each installed client-only service,
issue an Ambari REST API call that causes the hosts running these clients to advertise
their version. Perform this REST API call for each client-only service configured in your
cluster:
curl -X POST -u username:password -H 'X-Requested-By:ambari' \
http://ambari.server:8080/api/v1/clusters/MyCluster/requests -d '{
"RequestInfo":
{ "command":"RESTART",
"context":"Restart all components for TEZ_CLIENT",
"operation_level": {
"level":"SERVICE",
"cluster_name":"MyCluster",
"service_name":"TEZ"
}
},
"Requests/resource_filters": [{
"service_name":"TEZ",
"component_name":"TEZ_CLIENT",
"hosts":"c6401.ambari.apache.org,c6402.ambari.apache.org"}]
}'
Replace the Ambari Server username + password, Ambari Server hostname, your cluster
name, service name + component name (see the following table), and the list of hosts in your
cluster that are running the client.
Service  service_name  component_name
Tez      TEZ           TEZ_CLIENT
Pig      PIG           PIG
Slider   SLIDER        SLIDER
Sqoop    SQOOP         SQOOP
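Since the request body differs only in the service and component names, it can be generated for each row of the table. This is a sketch under stated assumptions: restart_payload is a hypothetical helper, the JSON layout copies the example request in this section, and the cluster and host names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: generate the request body for one client-only service.
# restart_payload is a hypothetical helper; the JSON layout follows the
# example request in this section.
restart_payload() {
  local cluster=$1 service=$2 component=$3 hosts=$4
  cat <<EOF
{
  "RequestInfo": {
    "command": "RESTART",
    "context": "Restart all components for ${component}",
    "operation_level": {
      "level": "SERVICE",
      "cluster_name": "${cluster}",
      "service_name": "${service}"
    }
  },
  "Requests/resource_filters": [{
    "service_name": "${service}",
    "component_name": "${component}",
    "hosts": "${hosts}"
  }]
}
EOF
}

# Service/component pairs from the table; cluster and host are placeholders.
for pair in TEZ:TEZ_CLIENT PIG:PIG SLIDER:SLIDER SQOOP:SQOOP; do
  restart_payload "MyCluster" "${pair%%:*}" "${pair##*:}" "c6401.ambari.apache.org"
done
```

Each generated body can be passed to the curl command in this step as the request data. Include only the services that are actually installed in your cluster.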
9 After all the services are confirmed to be started and healthy, go to the command line on the
Ambari Server and run the following to finalize the upgrade, which will move the current
version to the new version.
ambari-server set-current --cluster-name=MyCluster --version-display-name=HDP-2.2.4.2
Ambari Admin login: admin
Ambari Admin password: *****
10 If the ambari-server set-current command is not successful, try restarting the Ambari
Server and waiting for all agents to re-register before trying again.
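The "try again after waiting" advice can be expressed as a small retry loop. This is a generic sketch, not an Ambari feature: retry is a hypothetical helper, and the attempt count and delay are illustrative values that you would tune to how long your agents take to re-register.

```shell
#!/usr/bin/env bash
# Sketch: retry a command a fixed number of times with a pause between tries.
# retry is a hypothetical helper; attempts and delay are illustrative values.
retry() {
  local attempts=$1 delay=$2; shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example (not executed here): wrap the set-current command from step 9,
#   retry 3 60 <the ambari-server set-current command from step 9>
```

The loop does not restart the Ambari Server itself; do that manually between attempts if the command keeps failing.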
Administering Ambari
Apache Ambari is a system to help you provision, manage and monitor Hadoop clusters. This guide
is intended for Cluster Operators and System Administrators responsible for installing and
maintaining Ambari and the Hadoop clusters managed by Ambari. Installing Ambari creates a default
user with "Ambari Admin" privilege, with the following username/password: admin/admin.
When you sign into Ambari as Ambari Admin, you can:
• Perform Ambari Admin Tasks
• Create and Manage a Cluster
• Manage Stack and Versions
• Manage Users and Groups
• Manage Views
For specific information about provisioning an HDP cluster, see Install, Configure, and Deploy an
HDP Cluster.
Terms and Definitions
The following basic terms help describe the key concepts associated with Ambari Administration.
Term          Definition
Ambari Admin  Specific privilege granted to a user that enables the user to administer Ambari. The default user admin created by Ambari is flagged as an “Ambari Admin”. Users with the Ambari Admin privilege can grant or revoke this privilege on other users.
Account       User name, password and privileges.
Cluster       Installation of a Hadoop cluster, based on a particular Stack, that is managed by Ambari.
Group         Unique group of users in Ambari.
Group Type    Local and LDAP. Local groups are maintained in the Ambari database. LDAP groups are imported (and synchronized) with an external LDAP (if configured).
Permissions   Represents the permission that can be granted to a principal (user or group) on a particular resource. For example, cluster resources support Operator and Read-Only permissions.
Principal     User or group that can be authenticated by Ambari.
Privilege     Represents the mapping of a principal to a permission and a resource. For example: the user admin is granted the permission Operator on cluster DevCluster.
Resource      Represents the resource available and managed in Ambari. Ambari supports two types of resources: cluster and view. An Ambari Admin assigns permissions for a resource for users and groups.
User          Unique user in Ambari.
User Type     Local and LDAP. Local users are maintained in the Ambari database and authentication is performed against the Ambari database. LDAP users are imported (and synchronized) with an external LDAP (if configured).
Version       Represents a Stack version, which includes a set of repositories to install that version on a cluster. For more information about Stack versions, see Managing Stack and Versions.
View          Defines a user interface component that is available to Ambari.
Logging in to Ambari
After installing Ambari, you can log in to Ambari as follows:
1 Enter the following URL in a web browser:
http://<your.ambari.server>:8080
where <your.ambari.server> is the hostname for your Ambari server machine and 8080
is the default HTTP port.
2 Enter the user account credentials for the default administrative user automatically created
during install:
username/password = admin/admin
3 The Ambari Administration web page displays. From this page you can Manage Users and
Groups, Manage Views, Manage Stack and Versions, and Create a Cluster.
About the Ambari Administration Interface
When you log in to the Ambari Administration interface with "Ambari Admin" privilege, a landing page
displays links to the operations available. The operations are also available from the left menu for
clusters, views, users, and groups.
• Clusters displays a link to a cluster (if created) and links to manage access permissions for
that cluster. See Creating and Managing a Cluster for more information.
• User and Group Management provides the ability to create and edit users and groups.
See Managing Users and Groups for more information.
• Views lets you create and edit instances of deployed Views and manage access
permissions for those instances. See Managing Views for more information.
• Versions provides the ability to manage the Stack versions that are available for the clusters.
See Managing Stack and Versions for more information.
Changing the Administrator Account Password
During install and setup, the Cluster Installer wizard automatically creates a default user with "Ambari
Admin" privilege. You can change the password for this user (or other Local users in the system)
from the Ambari Administration interface. You can change the password for the default admin user
to create a unique administrator credential for your system.
To change the password for the default admin account:
1 Browse to the Users section.
2 Select the admin user.
3 Click the Change Password button.
4 Enter the current admin password and the new password twice.
5 Click OK to save the new password.
Ambari Admin Tasks
An "Ambari Admin" has administrator (or super-user) privilege. When logged into Ambari with the
"Ambari Admin" privilege, you can:
• Create a cluster
• Set access permissions for an existing cluster
• Create, delete, and edit view instances
• Manage permissions for view instances
• Create, edit, and delete users and user groups
For more information about creating Ambari users locally and importing Ambari LDAP users, see
Managing Users and Groups.
Creating a Cluster
As an Ambari Admin, you can launch the Cluster Install Wizard and create a cluster.
To create a cluster, from the Ambari Administration interface:
1
Click Install Cluster.
The Cluster Install Wizard displays.
2
Follow the steps in the wizard to install your cluster.
For more information about prerequisites and system requirements, see Installing HDP using Ambari.
Setting Cluster Permissions
After you create a cluster, users with Ambari Admin privileges automatically get Operator permission
on the cluster. By default, no other users have access to the cluster. You can grant permissions on the
cluster to other users and groups from the Ambari Administration interface.
Ambari manages the following permissions for a cluster: Operator and Read-Only. Users and
Groups with Operator permission are granted access to the cluster. Operator permission provides
full control of the following service operations:
•
Start
•
Stop
•
Restart
•
Add New
and of the following configuration operations:
•
Modify
•
Revert
Users and Groups with Read-Only permission can only view, not modify, services and
configurations.
Users with Ambari Admin privileges are implicitly granted Operator permission. Plus, Ambari Admin
users have access to the Ambari Administration interface which allows them to control permissions
for the cluster.
To modify user and group permissions for a cluster:
1
As an Ambari Admin, access the Ambari Administration interface.
2
Click Permissions, displayed under the cluster name.
3
The form showing the permissions Operator and Read-Only with users and groups is
displayed.
4
Modify the users and groups mapped to each permission and save.
For more information about managing users and groups, see Managing Users and Groups.
Assigning permissions to a group having no members is possible.
Verify user permissions, group membership, and group permissions to ensure that
each user and group has appropriate permissions.
Viewing the Cluster Dashboard
After you have created a cluster, select Clusters > Go to Dashboard to open the Dashboard
view. For more information about using Ambari to monitor and manage your cluster, see Monitoring
and Managing your HDP Cluster with Ambari.
Renaming a Cluster
A user with Ambari Admin privileges can rename a cluster using the Ambari Administration interface.
To rename a cluster:
1
In Clusters, click the Rename Cluster icon, next to the cluster name.
The cluster name becomes editable.
2
Enter alphanumeric characters as a cluster name.
3
Click the check mark.
4
Confirm.
Managing Users and Groups
An "Ambari Admin" can create and manage users and groups available to Ambari. An Ambari Admin
can also import user and group information into Ambari from external LDAP systems. This section
describes the specific tasks you perform when managing users and groups in Ambari.
•
Local and LDAP User Types
•
Ambari Admin Privileges
•
Creating a Local User
•
Setting User Status
•
Setting the Ambari Admin Flag
•
Changing the Password for a Local User
•
Deleting a Local User
•
Creating a Local Group
•
Managing Group Membership
•
Deleting a Local Group
Users and Groups Overview
Ambari supports two types of users and groups: Local and LDAP. The following topics describe how
Ambari Administration supports managing Local and LDAP users and groups.
•
Local and LDAP User and Group Types
Local and LDAP User and Group Types
Local users are stored in and authenticate against the Ambari database. LDAP users have basic
account information stored in the Ambari database. Unlike Local users, LDAP users authenticate
against an external LDAP system.
Local groups are stored in the Ambari database. LDAP groups have basic information stored in the
Ambari database, including group membership information. Unlike Local groups, LDAP groups are
imported and synchronized from an external LDAP system.
To use LDAP users and groups with Ambari, you must configure Ambari to authenticate against an
external LDAP system. For more information about running ambari-server setup-ldap, see Configure
Ambari to use LDAP Server. A new Ambari user or group, created either locally or by synchronizing
against LDAP, is granted no privileges by default. You, as an Ambari Admin, must explicitly grant
each user permissions to access clusters or views.
As an Ambari Admin, you can create new users, delete users, change user passwords and edit user
settings. You can control certain privileges for Local and LDAP users. The following table lists the
privileges available and those not available to the Ambari Admin for Local and LDAP Ambari users.
Table 19. Ambari Administrator Privileges for Ambari Local and LDAP Users
Administrator User Privilege | Local User | LDAP User
Change Password | Available | Not Available
Set Ambari Admin Flag | Available | Available
Change Group Membership | Available | Not Available
Delete User | Available | Not Available
Set Active / Inactive | Available | Available
Creating a Local User
To create a local user:
1
Browse to Users.
2
Click Create Local User.
3
Enter a unique user name.
4
Enter a password, then confirm that password.
5
Click Save.
Setting User Status
User status indicates whether the user is active and should be allowed to log into Ambari or should
be inactive and denied the ability to log in. By setting the Status flag as Active or Inactive, you can
effectively "disable" user account access to Ambari while preserving the user account information
related to permissions.
To set user Status:
1
On the Ambari Administration interface, browse to Users.
2
Click the user name of the user to modify.
3
Click the Status control to toggle between Active and Inactive.
4
Choose OK to confirm the change.
The change is saved immediately.
Setting the Ambari Admin Flag
You can elevate one or more users to have Ambari administrative privileges, by setting the Ambari
Admin flag. You must be logged in as an account that is an Ambari Admin to set or remove the
Ambari Admin flag.
To set the Ambari Admin Flag:
1
Browse to the Users section.
2
Click the user name you wish to modify.
3
Click on the Ambari Admin control.
4
Switch to Yes to set, or to No to remove, the Ambari Admin flag.
To prevent you from accidentally locking yourself out of the Ambari Administration user
interface, Ambari prevents setting the Ambari Admin flag for your own Ambari Admin
account to No.
Changing the Password for a Local User
An Ambari Administrator can change local user passwords. LDAP passwords are not managed by
Ambari since LDAP users authenticate to external LDAP. Therefore, LDAP user passwords cannot be
changed from Ambari.
To change the password for a local user:
1
Browse to the user.
2
Click Change password.
3
Enter YOUR administrator password to confirm that you have privileges required to change a
local user password.
4
Enter a password, then confirm that password.
5
Click Save.
Deleting a Local User
Deleting a local user removes the user account from the system, including all privileges associated
with the user. You can reuse the name of a local user that has been deleted.
To delete a local user:
1
Browse to the User.
2
Click Delete User.
3
Confirm.
If you want to disable user log in, set the user Status to Inactive.
Creating a Local Group
To create a local group:
1
Browse to Groups.
2
Click Create Local Group.
3
Enter a unique group name.
4
Click Save.
Managing Group Membership
You can manage group membership of Local groups by adding or removing users from groups.
•
Adding a User to a Group
•
Modifying Group Membership
Adding a User to a Group
To add a user to a group:
1
Browse to Groups.
2
Click a name in the Group Name list.
3
Choose the Local Members control to edit the member list.
4
In the empty space, type the first character in an existing user name.
5
From the list of available user names, choose a user name.
6
Click the check mark to save the current, displayed members as group members.
Modifying Group Membership
To modify Local group membership:
1
In the Ambari Administration interface, browse to Groups.
2
Click the name of the Group to modify.
3
Choose the Local Members control to edit the member list.
4
Click in the Local Members text area to modify the current membership.
5
Click the X to remove a user.
6
To save your changes, click the checkmark. To discard your changes, click the x.
Deleting a Local Group
Deleting a local group removes all privileges associated with the group.
To delete a local group:
1
Browse to the Group.
2
Click Delete Group.
3
Confirm. The group is deleted and the associated group membership information is removed.
Managing Views
The Ambari Views Framework offers a systematic way to plug in UI capabilities to surface custom
visualization, management and monitoring features in Ambari Web. The development and use of
Views allows you to extend and customize Ambari Web to meet your specific needs.
A View extends Ambari to let third parties plug in new resource types along with APIs, providers, and
UIs to support them. A View is deployed into the Ambari Server, and Ambari Admins can create View
instances and set access privileges for users and groups.
The following sections cover the basics of Views and how to deploy and manage View instances in
Ambari:
•
Terminology
•
Basic Concepts
•
Deploying Views
•
Creating View Instances
•
Setting View Permissions
•
Additional Information
Terminology
The following are Views terms and concepts you should be familiar with:
Term | Description
Views Framework | The core framework that is used to develop a View. This is very similar to a Java Web App.
View Definition | Describes the View resources and core View properties such as name, version and any necessary configuration properties. On deployment, the View definition is read by Ambari.
View Package | Packages the View client and server assets (and dependencies) into a bundle that is ready to deploy into Ambari.
View Deployment | Deploying a View into Ambari. This makes the View available to Ambari Admins for creating instances.
View Name | Unique identifier for a View. A View can have one or more versions. The name is defined in the View Definition (created by the View Developer) that is built into the View Package.
View Version | Specific version of a View. Multiple versions of a View (uniquely identified by View name) can be deployed into Ambari.
View Instance | Instantiation of a specific View version. Instances are created and configured by Ambari Admins and must have a unique View instance name.
View Instance Name | Unique identifier of a specific instance of a View.
Framework Services | View context, instance data, configuration properties and events are available from the Views Framework.
Basic Concepts
Views are basically Web applications that can be “plugged into” Ambari. Just like a typical web
application, a View can include server-side resources and client-side assets. Server-side resources,
which are written in Java, can integrate with external systems (such as cluster services) and expose
REST end-points that are used by the view. Client-side assets, such as HTML/JavaScript/CSS,
provide the UI for the view that is rendered in the Ambari Web interface.
Ambari exposes the Views Framework as the basis for View development. The Framework provides
the following:
•
Method for describing and packaging a View
•
Method for deploying a View
•
Framework services for a View to integrate with Ambari
•
Method for managing View versions, instances, and permissions
The Views Framework is separate from Views themselves. The Framework is a core feature of Ambari
and Views build on that Framework. Although Ambari does include some Views out-of-the-box, the
core feature is the Framework, which enables the development, deployment and creation of Views.
The development and delivery of a View follows this process flow:
•
Develop the View (similar to how you would build a Web application)
•
Package the View (similar to a WAR)
•
Deploy the View into Ambari (using the Ambari Administration interface)
•
Create and configure instances of the View (performed by Ambari Admins)
Considering the above, it is important to understand the different personas involved. The following
table describes the three personas:
Persona | Description
View Developer | Person who builds the front-end and back-end of a View and uses the Framework services available during development. The Developer creates the View, resulting in a View Package that is delivered to an Ambari Admin.
Ambari Admin | Ambari user that has Ambari Admin privilege and uses the Views Management section of the Ambari Administration interface to create and manage instances of Views. The Ambari Admin also deploys the View Packages delivered by the View Developer.
View User | Ambari user that has access to one or more Views in Ambari Web. Basically, this is the end user.
This document covers the tasks an Ambari Admin performs to use Views and make them
available to users in their Ambari deployment. This document does not cover View
development and packaging. See Additional Information for pointers to resources
about developing Views.
After a View is developed, it is identified by a unique View name. Each View can have one or
more View versions. Each View name + version combination is deployed as a single View package.
Once a View package is deployed, the Ambari Admin can create View instances, where each
instance is identified by a unique View instance name. The Ambari Admin can then set access
permissions for each View instance.
Deploying a View
Deploying a View involves obtaining the View Package and making the View available to the Ambari
Server. Each View deployed has a unique name. Multiple versions of a View can be deployed at the
same time. You can configure multiple versions of a View for your users, depending on their roles,
and deploy these versions at the same time.
For more information about building Views, see the Apache Ambari Wiki page.
1
Obtain the View package. For example, files-0.1.0.jar.
2
On the Ambari Server host, browse to the views directory.
cd /var/lib/ambari-server/resources/views
3
Copy the View package into place.
4
Restart Ambari Server.
ambari-server restart
5
The View is extracted, registered with Ambari, and displays in the Ambari Administration
interface as available to create instances.
/var/lib/ambari-server/resources/views is the default directory into which
Views are deployed. You can change the default location by editing the views.dir
property in ambari.properties.
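The copy step and the views.dir lookup described above can be sketched as two small shell functions. This is a minimal sketch under stated assumptions, not an Ambari tool: the function names views_dir and deploy_view are hypothetical, and the location of ambari.properties shown in the usage comment is an assumption (the text above names only the file, not its path).

```shell
# Print the Views directory: the views.dir value from the given
# ambari.properties file if it is set, otherwise the documented default.
views_dir() {
  props_file="$1"
  dir=""
  if [ -f "$props_file" ]; then
    # Use the last "views.dir=..." line, tolerating spaces around "=".
    dir=$(sed -n 's/^views\.dir[ ]*=[ ]*//p' "$props_file" | tail -n 1)
  fi
  if [ -n "$dir" ]; then
    printf '%s\n' "$dir"
  else
    printf '%s\n' "/var/lib/ambari-server/resources/views"
  fi
}

# Copy a View package into the Views directory.
# Restarting Ambari Server afterwards remains a separate, manual step.
deploy_view() {
  package="$1"
  target_dir="$2"
  [ -f "$package" ] || { echo "View package not found: $package" >&2; return 1; }
  [ -d "$target_dir" ] || { echo "Views directory not found: $target_dir" >&2; return 1; }
  cp "$package" "$target_dir/"
}

# Example usage on the Ambari Server host (the properties path is an assumption):
#   deploy_view files-0.1.0.jar "$(views_dir /etc/ambari-server/conf/ambari.properties)"
#   ambari-server restart
```

Keeping the restart out of the function is deliberate: it lets you copy several View packages first and restart Ambari Server once.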
Creating View Instances
To create a View instance:
1
Browse to a View and expand.
2
Click the “Create Instance” button.
3
Provide the following information:
Item | Required | Description
View Version | Yes | Select the version of the View to instantiate.
Instance Name | Yes | Must be unique for a given View.
Display Label | Yes | Readable display name used for the View instance when shown in Ambari Web.
Description | Yes | Readable description used for the View instance when shown in Ambari Web.
Visible | No | Designates whether the View is visible or not visible to the end-user in Ambari Web. Use this property to temporarily hide a view in Ambari Web from users.
Properties | Maybe | Depends on the View. If the View requires certain configuration properties, you are prompted to provide the required information.
Setting View Permissions
After a view instance has been created, an Ambari Admin can set which users and groups can
access the view by setting the Use permission. By default, after view instance creation, no
permissions are set on a view.
To set permissions on a view:
1
Browse to a view and expand. For example, browse to the Slider or Jobs view.
2
Click on the view instance you want to modify.
3
In the Permissions section, click the Users or Groups control.
4
Modify the user and group lists as appropriate.
5
Click the check mark to save changes.
The Framework provides a way for view developers to specify custom permissions,
beyond just the default Use permission. If custom permissions are specified, they
will show up in the Ambari Administration interface and the Ambari Admin can set
users and groups on these permissions. See Additional Information for more
information on developing with the Views framework.
Additional Information
To learn more about developing views and the views framework itself, refer to the following
resources:
Resource | Description | Link
Views Wiki | Learn about the Views Framework and Framework services available to views developers. | https://cwiki.apache.org/confluence/display/AMBARI/Views
Views API | Covers the Views REST API and associated framework Java classes. | https://github.com/apache/ambari/blob/trunk/ambari-views/docs/index.md
Views Examples | Code for example views that cover different areas of the framework and framework services. | https://github.com/apache/ambari/tree/trunk/ambari-views/examples
View Contributions | Views that are being developed and contributed to the Ambari community. | https://github.com/apache/ambari/tree/trunk/contrib/views
The Views in the community are not supported by Hortonworks.
Ambari Security Guide
Ambari and Hadoop have many advanced security options. This guide provides information on
configuring Ambari and Hadoop for strong authentication with Kerberos, as well as other security
options.
•
Configuring Ambari and Hadoop for Kerberos
•
Set Up Ambari for LDAP or AD Authentication
•
Encrypting Ambari Database and LDAP Passwords
•
Set Up SSL for Ambari Server
•
Set Up Two-Way SSL for Ambari Server and Agents
•
Configure Ciphers and Protocols for Ambari Server
Configuring Ambari and Hadoop for Kerberos
This topic describes how to configure Kerberos for strong authentication for Hadoop users and hosts
in an Ambari-managed cluster.
•
Kerberos Overview
•
Hadoop and Kerberos Principals
•
Installing and Configuring the KDC
•
Enabling Kerberos Security in Ambari
Kerberos Overview
Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop.
Users need to be able to reliably “identify” themselves and then have that identity propagated
throughout the Hadoop cluster. Once this is done, those users can access resources (such as files or
directories) or interact with the cluster (like running MapReduce jobs). Besides users, Hadoop cluster
resources themselves (such as Hosts and Services) need to authenticate with each other to avoid
potentially malicious systems or daemons “posing as” trusted components of the cluster to gain
access to data.
Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users
and services. Kerberos is a third party authentication mechanism, in which users and services rely
on a third party - the Kerberos server - to authenticate each to the other. The Kerberos server itself is
known as the Key Distribution Center, or KDC. At a high level, it has three parts:
•
A database of the users and services (known as principals) that it knows about and their
respective Kerberos passwords
•
An Authentication Server (AS) which performs the initial authentication and issues a
Ticket Granting Ticket (TGT)
•
A Ticket Granting Server (TGS) that issues subsequent service tickets based on the initial
TGT
A user principal requests authentication from the AS. The AS returns a TGT that is encrypted using
the user principal's Kerberos password, which is known only to the user principal and the AS. The
user principal decrypts the TGT locally using its Kerberos password, and from that point forward,
until the ticket expires, the user principal can use the TGT to get service tickets from the TGS.
Service tickets are what allow a principal to access various services.
Because cluster resources (hosts or services) cannot provide a password each time to decrypt the
TGT, they use a special file, called a keytab, which contains the resource principal's authentication
credentials. The set of hosts, users, and services over which the Kerberos server has control is called
a realm.
Table 20. Terminology
Term | Description
Key Distribution Center, or KDC | The trusted source for authentication in a Kerberos-enabled environment.
Kerberos KDC Server | The machine, or server, that serves as the Key Distribution Center (KDC).
Kerberos Client | Any machine in the cluster that authenticates against the KDC.
Principal | The unique name of a user or service that authenticates against the KDC.
Keytab | A file that includes one or more principals and their keys.
Realm | The Kerberos network that includes a KDC and a number of Clients.
KDC Admin Account | An administrative account used by Ambari to create principals and generate keytabs in the KDC.
Table 21. Terminology
Hadoop and Kerberos Principals
Each service and sub-service in Hadoop must have its own principal. A principal name in a given
realm consists of a primary name and an instance name; in this case, the instance name is the FQDN
of the host that runs that service. As services do not log in with a password to acquire their tickets,
their principal's authentication credentials are stored in a keytab file, which is extracted from the
Kerberos database and stored locally in a secured directory with the service principal on the service
component host.
Principals and Keytabs
Asset | Convention | Example
Principals | $service_component_name/$FQDN@EXAMPLE.COM | nn/c6401.ambari.apache.org@EXAMPLE.COM
Keytabs | $service_component_abbreviation.service.keytab | /etc/security/keytabs/nn.service.keytab
Table 22. Principal and Keytab Naming Conventions
In addition to the Hadoop Service Principals, Ambari itself also requires a set of
Ambari Principals to perform service “smoke” checks and alert health checks.
Keytab files for the Ambari, or headless, principals reside on each cluster host, just as
keytab files for the service principals.
Notice in the preceding example the primary name for each service principal. These primary names,
such as nn or hive for example, represent the NameNode or Hive service, respectively. Each primary
name has appended to it the instance name, the FQDN of the host on which it runs. This convention
provides a unique principal name for services that run on multiple hosts, like DataNodes and
NodeManagers. Adding the host name serves to distinguish, for example, a request from DataNode A
from a request from DataNode B. This is important for the following reasons:
•
Compromised Kerberos credentials for one DataNode do not automatically lead to
compromised Kerberos credentials for all DataNodes.
•
If multiple DataNodes have exactly the same principal and are simultaneously connecting to
the NameNode, and if the Kerberos authenticators being sent happen to have the same
timestamp, then the authentication is rejected as a replay request.
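The naming convention described above can be illustrated with a short shell sketch. The function names service_principal and service_keytab are hypothetical helpers, and the dn primary and the second host name are illustrative assumptions; only the nn example and the keytab path pattern come from the table above.

```shell
# Build a service principal name in the form primary/instance@REALM,
# where the instance is the FQDN of the host running the service.
service_principal() {
  primary="$1"
  fqdn="$2"
  realm="$3"
  printf '%s/%s@%s\n' "$primary" "$fqdn" "$realm"
}

# Build the keytab path for a service component abbreviation.
service_keytab() {
  abbreviation="$1"
  printf '/etc/security/keytabs/%s.service.keytab\n' "$abbreviation"
}

# Two DataNodes get distinct principals because the instance part differs.
# "dn" and the c6402 host name are assumptions used only for illustration.
service_principal dn c6401.ambari.apache.org EXAMPLE.COM
service_principal dn c6402.ambari.apache.org EXAMPLE.COM
service_keytab nn
```

Because the host name is part of the principal, each host holds credentials that are valid for that host only.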
Installing and Configuring the KDC
Ambari is able to configure Kerberos in the cluster to work with an existing MIT KDC, or existing
Active Directory installation. This section describes the steps necessary to prepare for this
integration.
If you do not have an existing KDC (MIT or Active Directory), install a new MIT KDC.
Please be aware that installing a KDC on a cluster host after installing the Kerberos
client may overwrite the krb5.conf file generated by Ambari.
•
Use an Existing MIT KDC
•
Use an Existing Active Directory
•
(Optional) Install a new MIT KDC
Use an Existing MIT KDC
To use an existing MIT KDC for the cluster, you must prepare the following:
•
Ambari Server and cluster hosts have network access to both the KDC and KDC admin
hosts.
•
KDC administrative credentials are on-hand.
Proceed with Enabling Kerberos Security in Ambari.
Use an Existing Active Directory Domain
To use an existing Active Directory domain for the cluster, you must prepare the following:
•
Ambari Server and cluster hosts have network access to, and are able to resolve the DNS
names of, the Domain Controllers.
•
Active Directory secure LDAP (LDAPS) connectivity has been configured.
•
Active Directory User container for principals has been created and is on-hand.
For example, "OU=Hadoop,OU=People,dc=apache,dc=org"
•
Active Directory administrative credentials with delegated control of “Create, delete, and
manage user accounts” on the previously mentioned User container are on-hand.
Proceed with Enabling Kerberos Security in Ambari.
(Optional) Install a new MIT KDC
The following gives a very high level description of the KDC installation process. For more
information, see the documentation for your operating system, such as the RHEL, CentOS,
or SLES documentation.
Because Kerberos is a time-sensitive protocol, all hosts in the realm must be time-synchronized, for example, by using the Network Time Protocol (NTP). If the local
system time of a client differs from that of the KDC by as little as 5 minutes (the
default), the client will not be able to authenticate.
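The clock-skew rule above amounts to a simple comparison of two timestamps. The following is a minimal sketch, assuming you can obtain the current time of both hosts as seconds since the epoch; the function name within_clock_skew is hypothetical, and the way the KDC time is fetched in the usage comment is an assumption about your environment.

```shell
# Return success (0) if two epoch timestamps differ by less than
# max_skew seconds. max_skew defaults to 300 seconds, which is the
# 5-minute default mentioned above.
within_clock_skew() {
  local_epoch="$1"
  kdc_epoch="$2"
  max_skew="${3:-300}"
  diff=$((local_epoch - kdc_epoch))
  # Take the absolute value of the difference.
  if [ "$diff" -lt 0 ]; then
    diff=$((-diff))
  fi
  [ "$diff" -lt "$max_skew" ]
}

# Example usage (assumes SSH access to the KDC host and a date command
# that supports +%s):
#   within_clock_skew "$(date +%s)" "$(ssh my.kdc.server date +%s)" \
#     || echo "Clock skew too large for Kerberos" >&2
```

A check like this only detects drift. Keeping the hosts synchronized, for example with NTP, is still required.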
Install the KDC Server
1
Install a new version of the KDC server:
RHEL/CentOS/Oracle Linux
yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation
SLES 11
zypper install krb5 krb5-server krb5-client
Ubuntu 12
apt-get install krb5-kdc krb5-admin-server
2
Using a text editor, open the KDC server configuration file, located by default here:
/etc/krb5.conf
3
Change the [realms] section of this file by replacing the default “kerberos.example.com”
setting for the kdc and admin_server properties with the Fully Qualified Domain Name of the
KDC server host. In the following example, “kerberos.example.com” has been replaced with
“my.kdc.server”.
[realms]
  EXAMPLE.COM = {
    kdc = my.kdc.server
    admin_server = my.kdc.server
  }
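The replacement described in this step can also be scripted. The sketch below is one possible approach, not a required procedure: the function name set_kdc_host is hypothetical, and it assumes the file still contains the default kerberos.example.com placeholder on the kdc and admin_server lines. It writes the result to standard output so that you can review it before replacing the real file.

```shell
# Replace the placeholder host in the kdc and admin_server settings of a
# krb5.conf-style file and print the result to standard output.
# The host name must not contain "/" or "&", which are special to sed.
set_kdc_host() {
  conf_file="$1"
  kdc_host="$2"
  sed -e "s/^\([[:space:]]*kdc[[:space:]]*=[[:space:]]*\)kerberos\.example\.com/\1$kdc_host/" \
      -e "s/^\([[:space:]]*admin_server[[:space:]]*=[[:space:]]*\)kerberos\.example\.com/\1$kdc_host/" \
      "$conf_file"
}

# Example usage: write a candidate file, review it, then copy it into place.
#   set_kdc_host /etc/krb5.conf my.kdc.server > /tmp/krb5.conf.new
```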
Create the Kerberos Database
•
Use the utility kdb5_util to create the Kerberos database.
kdb5_util create -s
SLES 11
kdb5_util create -s
Ubuntu 12
kdb5_util create -s
Start the KDC
•
Start the KDC server and the KDC admin server.
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start
SLES 11
rckrb5kdc start
rckadmind start
Ubuntu 12
service krb5-kdc start
service krb5-admin-server start
When installing and managing your own MIT KDC, it is very important to set up the
KDC server to auto-start on boot.
For example:
chkconfig krb5kdc on
chkconfig kadmin on
SLES 11
chkconfig rckrb5kdc on
chkconfig rckadmind on
Ubuntu 12
update-rc.d krb5-kdc defaults
update-rc.d krb5-admin-server defaults
Create a Kerberos Admin
Kerberos principals can be created either on the KDC machine itself or through the network, using an
“admin” principal. The following instructions assume you are using the KDC machine and using the
kadmin.local command line administration utility. Using kadmin.local on the KDC machine
allows you to create principals without needing to create a separate "admin" principal before you
start.
You will need to provide these admin account credentials to Ambari when enabling
Kerberos. This allows Ambari to connect to the KDC, create the cluster principals and
generate the keytabs.
1
Create a KDC admin.
kadmin.local -q "addprinc admin/admin"
SLES 11
kadmin.local -q "addprinc admin/admin"
Ubuntu 12
kadmin.local -q "addprinc admin/admin"
2
Confirm that this admin principal has permissions in the KDC ACL.
For example, on RHEL/CentOS, check that the /var/kerberos/krb5kdc/kadm5.acl file has an entry
like the following, which allows the */admin principal to administer the KDC for your specific realm. In this
case, for the EXAMPLE.COM realm: */admin@EXAMPLE.COM *. When using a realm that is
different than EXAMPLE.COM, ensure there is an entry for the realm you are using. If not
present, principal creation will fail. After editing the kadm5.acl, you must restart the kadmind
process.
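A quick way to confirm the ACL entry is to search the file for it. This is a minimal sketch: the function name has_admin_acl is hypothetical, and it recognizes only the exact two-field form shown above (the */admin principal for the realm, followed by *). Other valid ACL forms are not detected.

```shell
# Return success (0) if the ACL file contains a line whose first field is
# */admin@REALM and whose second field is *.
has_admin_acl() {
  acl_file="$1"
  realm="$2"
  awk -v principal="*/admin@$realm" '
    $1 == principal && $2 == "*" { found = 1 }
    END { exit !found }
  ' "$acl_file"
}

# Example usage on a RHEL/CentOS KDC host:
#   has_admin_acl /var/kerberos/krb5kdc/kadm5.acl EXAMPLE.COM \
#     || echo "No */admin entry for this realm" >&2
```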
Enabling Kerberos Security in Ambari
Ambari provides a wizard to help with enabling Kerberos in the cluster. This section provides
information on preparing Ambari before running the wizard, and the steps to run the wizard.
•
Installing the JCE
•
Running the Kerberos Wizard
Installing the JCE
Before enabling Kerberos in the cluster, you must deploy the Java Cryptography Extension (JCE)
security policy files on the Ambari Server and on all hosts in the cluster. Depending on your choice of
JDK and if your Ambari Server has Internet Access, Ambari has a few options and actions for you to
pursue.
Scenario | Action
If you have Internet Access and selected Oracle JDK 1.6 or Oracle JDK 1.7 during Ambari Server setup. | Ambari automatically downloaded the JCE policy files (that match the JDK) and installed the JCE onto the Ambari Server. You must distribute and install the JCE on all hosts.
If you have Internet Access and selected Custom JDK during Ambari Server setup. | The JCE has not been downloaded or installed on the Ambari Server or the hosts in the cluster. You must distribute and install the JCE on Ambari and all hosts.
If you do not have Internet Access and selected Custom JDK during Ambari Server setup. | You must distribute and install the JCE on Ambari and all hosts.
If you have a previous Ambari install and upgraded to Ambari 2.0.0. | You must distribute and install the JCE on Ambari and all hosts.
Table 23. JCE Options and Actions
Distribute and Install the JCE
1
On the Ambari Server, obtain the JCE policy file appropriate for the JDK version in your
cluster.
•
For Oracle JDK 1.6:
http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html
•
For Oracle JDK 1.7:
http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
2
Save the policy file archive in a temporary location.
3
On Ambari Server and on each host in the cluster, add the unlimited security policy JCE jars
to $JAVA_HOME/jre/lib/security/.
For example, run the following to extract the policy jars into the JDK installed on your host:
unzip -o -j -q UnlimitedJCEPolicyJDK7.zip -d /usr/jdk64/jdk1.7.0_67/jre/lib/security/
4
5
Proceed to Running the Kerberos Wizard.
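After copying the policy jars to a host, you may want to confirm that the installed files match the ones from the archive. The sketch below compares them byte for byte. It is an illustration under assumptions: the function name jce_policy_installed is hypothetical, and the two jar names are the files the Oracle policy archives are generally understood to contain, so verify them against your own download. A plain existence check would not be enough, because a JDK can already ship files with these names.

```shell
# Return success (0) if both policy jars in the JDK security directory are
# byte-for-byte identical to the jars unpacked from the policy archive.
jce_policy_installed() {
  extracted_dir="$1"   # directory where the policy archive was unpacked
  security_dir="$2"    # for example, $JAVA_HOME/jre/lib/security
  for jar in local_policy.jar US_export_policy.jar; do
    # cmp -s is silent and fails if the files differ or one is missing.
    cmp -s "$extracted_dir/$jar" "$security_dir/$jar" || return 1
  done
}

# Example usage (the extraction directory is an assumption):
#   jce_policy_installed /tmp/jce /usr/jdk64/jdk1.7.0_67/jre/lib/security \
#     || echo "JCE policy jars are missing or differ on this host" >&2
```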
Running the Kerberos Wizard
The Kerberos Wizard prompts for information related to the KDC, the KDC Admin Account and the
Service and Ambari principals. Once provided, Ambari will automatically create principals, generate
keytabs and distribute keytabs to the hosts in the cluster. The services will be configured for
Kerberos and the service components are restarted to authenticate against the KDC.
Since Ambari will automatically create principals in the KDC and generate keytabs, you
must have Kerberos Admin Account credentials available when running the wizard.
High-Level View of Principal Creation, Keytab Generation, and Distribution Flow
Launching the Kerberos Wizard
1
Be sure you've Installed and Configured your KDC and have prepared the JCE on each host
in the cluster.
2
Log in to Ambari Web and browse to Admin > Kerberos.
3
Click “Enable Kerberos” to launch the wizard.
4
Select the type of KDC you are using and confirm you have met the prerequisites.
5
Provide information about the KDC and admin account.
6
Proceed with the install.
(Optional) To manage your Kerberos client krb5.conf manually (and not have Ambari manage
the krb5.conf), expand the Advanced krb5-conf section and uncheck the "Manage" option.
(Optional) If you need to customize the attributes for the principals Ambari will create, see
Customizing the Attribute Template.
7
Ambari will install Kerberos clients on the hosts and test access to the KDC by testing that
Ambari can create a principal, generate a keytab and distribute that keytab.
8
Customize the Kerberos identities used by Hadoop and proceed to kerberize the cluster.
Pay particular attention to the Ambari principal names. For example, if you want the
Ambari Smoke User Principal name to be unique and include the cluster name, you
can append ${cluster_name} to the identity setting:
${cluster-env/smokeuser}-${cluster_name}@${realm}
9
After principals have been created and keytabs have been generated and distributed, Ambari
updates the cluster configurations, then starts and tests the Services in the cluster.
If your cluster includes Storm, after enabling Kerberos, you must also Set Up Ambari for
Kerberos for Storm Service Summary information to be displayed in Ambari Web.
Otherwise, you will see n/a for Storm information such as Slots, Tasks, Executors and
Topologies.
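The identity setting mentioned in step 8 is a template with ${...} placeholders. The following Python sketch only illustrates how such a template expands. It is not Ambari's implementation: the function name expand_identity and the sample values (ambari-qa, c1, EXAMPLE.COM) are assumptions chosen for the example.

```python
import re


def expand_identity(template, values):
    """Replace each ${name} placeholder with its value from the mapping."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: values[m.group(1)], template)


# Hypothetical values; substitute the ones used by your cluster.
values = {
    "cluster-env/smokeuser": "ambari-qa",
    "cluster_name": "c1",
    "realm": "EXAMPLE.COM",
}

principal = expand_identity("${cluster-env/smokeuser}-${cluster_name}@${realm}", values)
print(principal)
```

With these sample values, each cluster name yields a different smoke user principal, which is the uniqueness the step describes.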
Customizing the Attribute Template
Depending on your KDC policies, you can customize the attributes that Ambari sets when creating
principals. On the Configure Kerberos step of the wizard, in the Advanced kerberos-env section,
you have access to the Ambari Attribute Template. This template (which is based on the Apache
Velocity templating syntax) can be modified to adjust which attributes are set on the principals and
how those attribute values are derived.
The following table lists the set of computed attribute variables available if you choose to modify the
template:
Attribute Variables       Example
$normalized_principal     nn/c6401.ambari.apache.org@EXAMPLE.COM
$principal_name           nn/c6401.ambari.apache.org
$principal_primary        nn
$principal_digest         [[MD5 hash of the $normalized_principal]]
$principal_instance       c6401.ambari.apache.org
$realm                    EXAMPLE.COM
$password                 [[password]]
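As an illustration of how the computed variables relate to one another, the following Python sketch derives them from a single principal string. It is a minimal sketch, not Ambari code: the function name principal_variables is hypothetical, and rendering the digest as a hexadecimal string is an assumption (the table states only that it is an MD5 hash of $normalized_principal).

```python
import hashlib


def principal_variables(principal, password="[[password]]"):
    """Split primary/instance@REALM into the template variables listed above."""
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return {
        "normalized_principal": principal,
        "principal_name": name,
        "principal_primary": primary,
        "principal_instance": instance,
        # Assumption: the digest is shown as a hex string.
        "principal_digest": hashlib.md5(principal.encode("utf-8")).hexdigest(),
        "realm": realm,
        "password": password,
    }


variables = principal_variables("nn/c6401.ambari.apache.org@EXAMPLE.COM")
for key, value in variables.items():
    print(f"${key} = {value}")
```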
Kerberos Client Packages
As part of enabling Kerberos, Ambari installs the Kerberos clients on the cluster hosts.
Depending on your operating system, the following packages are installed:
Operating System                  Packages
RHEL/CentOS/Oracle Linux 5 + 6    krb5-workstation
SLES 11                           krb5-client
Ubuntu 12                         krb5-user, krb5-config
Table 24. Packages installed by Ambari for the Kerberos Client
Post-Kerberos Wizard User/Group Mapping
If you have chosen to use existing MIT or Active Directory Kerberos infrastructure with your cluster, it
is important to tell the cluster how to map usernames from those existing systems to principals
within the cluster. This is required to properly translate username syntaxes from existing systems to
Hadoop to ensure usernames can be mapped successfully.
Hadoop uses a rule-based system to create mappings between service principals and their related
UNIX usernames. The rules are specified using the configuration property
hadoop.security.auth_to_local as part of core-site.
The default rule is simply named DEFAULT. It translates all principals in your default domain to their
first component. For example, myusername@EXAMPLE.COM and
myusername/admin@EXAMPLE.COM both become myusername, assuming your default domain is
EXAMPLE.COM. In this case, EXAMPLE.COM represents the Kerberos realm, or Active Directory
Domain that is being used.
Creating Auth-to-Local Rules
To accommodate more complex translations, you can create a hierarchical set of rules to add to the
default. Each rule is divided into three parts: base, filter, and substitution.
•
The Base
The base begins with the number of components in the principal name (excluding the realm),
followed by a colon, and the pattern for building the username from the sections of the
principal name. In the pattern section $0 translates to the realm, $1 translates to the first
component and $2 to the second component.
For example:
[1:$1@$0] translates myusername@EXAMPLE.COM to myusername@EXAMPLE.COM
[2:$1] translates myusername/admin@EXAMPLE.COM to myusername
[2:$1%$2] translates myusername/admin@EXAMPLE.COM to myusername%admin
•
The Filter
The filter consists of a regular expression (regex) in parentheses. It must match the
generated string for the rule to apply.
For example:
(.*%admin) matches any string that ends in %admin
(.*@SOME.DOMAIN) matches any string that ends in @SOME.DOMAIN
•
The Substitution
The substitution is a sed rule that translates a regex into a fixed string.
For example:
s/@ACME\.COM// removes the first instance of @ACME.COM
s/@[A-Z]*\.COM// removes the first instance of @ followed by a name followed by .COM
s/X/Y/g replaces all of the X's in the name with Y
Examples
•
If your default realm was EXAMPLE.COM, but you also wanted to take all principals from
ACME.COM that had a single component (of the form <username>@ACME.COM), the following rule would do this:
RULE:[1:$1@$0](.*@ACME\.COM)s/@.*//
DEFAULT
•
To translate names with a second component, you could use these rules:
RULE:[1:$1@$0](.*@ACME\.COM)s/@.*//
RULE:[2:$1@$0](.*@ACME\.COM)s/@.*//
DEFAULT
•
To treat all principals from EXAMPLE.COM with the extension /admin as admin, your rules
would look like this:
RULE:[2:$1%$2@$0](.*%admin@EXAMPLE\.COM)s/.*/admin/
DEFAULT
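The way a rule's base, filter, and substitution combine can be sketched in Python. This is a simplified illustration, not Hadoop's implementation: the function name apply_rules, the parsing regex, and the sample principal joe@ACME.COM are assumptions made for the example, and Hadoop's additional validation of the resulting name is omitted.

```python
import re

# Parses RULE:[n:pattern](filter)s/regex/replacement/g; filter and substitution are optional.
RULE_RE = re.compile(
    r"RULE:\[(\d+):([^\]]*)\]"      # base: component count and pattern
    r"(?:\(([^)]*)\))?"             # optional filter regex
    r"(?:s/([^/]*)/([^/]*)/(g)?)?"  # optional sed-style substitution
)


def apply_rules(principal, rules, default_realm):
    """Return the short name produced by the first rule that applies."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    for rule in rules:
        if rule == "DEFAULT":
            # DEFAULT keeps the first component of principals in the default realm.
            if realm == default_realm:
                return components[0]
            continue
        parsed = RULE_RE.fullmatch(rule)
        if parsed is None:
            raise ValueError(f"cannot parse rule: {rule}")
        count, pattern, filt, regex, replacement, global_flag = parsed.groups()
        if int(count) != len(components):
            continue  # the base applies only to principals with this many components
        values = [realm] + components  # $0 is the realm, $1 the first component, ...
        built = re.sub(r"\$(\d)", lambda m: values[int(m.group(1))], pattern)
        if filt is not None and not re.fullmatch(filt, built):
            continue  # the filter must match the generated string
        if regex is not None:
            limit = 0 if global_flag else 1  # without g, replace the first match only
            built = re.sub(regex, lambda _m: replacement, built, count=limit)
        return built
    raise ValueError(f"no rule applies to {principal}")


RULES = [
    r"RULE:[1:$1@$0](.*@ACME\.COM)s/@.*//",
    r"RULE:[2:$1%$2@$0](.*%admin@EXAMPLE\.COM)s/.*/admin/",
    "DEFAULT",
]

for sample in ("joe@ACME.COM", "myusername/admin@EXAMPLE.COM", "myusername@EXAMPLE.COM"):
    print(sample, "->", apply_rules(sample, RULES, "EXAMPLE.COM"))
```

Rules are tried in order and the first one that applies wins, which is why DEFAULT is listed last.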
After your mapping rules have been configured and are in place, Hadoop uses those rules to map
principals to UNIX users. By default, Hadoop uses the UNIX shell to resolve a user’s UID, GID, and
list of associated groups for secure operation on every node in the cluster. This is because in a
kerberized cluster, individual tasks run as the user who submitted the application. In this case, the
user’s identity is propagated all the way down to local JVM processes to ensure tasks are run as the
user who submitted them. For this reason, typical enterprise customers choose to use technologies
such as PAM, SSSD, Centrify, or other solutions to integrate with a corporate directory. As Linux is
commonly used in the enterprise, there is most likely an existing enterprise solution that has been
adopted for your organization. The assumption going forward is that such a solution has been
integrated successfully, so logging into each individual DataNode using SSH can be accomplished
using LDAP credentials, and typing in id results in a UID, GID, and list of associated groups being
returned.
If you use Hue, you must install and configure Hue manually, after running the Kerberos
wizard. For information about installing Hue manually, see Installing Hue.
Advanced Security Options for Ambari
This section describes several security options for an Ambari-monitored-and-managed Hadoop
cluster.
•
Setting Up LDAP or Active Directory Authentication
•
Encrypt Database and LDAP Passwords
•
Set Up SSL for Ambari
•
Set Up Kerberos for Ambari Server
•
Set Up Two-Way SSL Between Ambari Server and Ambari Agents
Configuring Ambari for LDAP or Active Directory Authentication
By default Ambari uses an internal database as the user store for authentication and authorization. If
you want to configure LDAP or Active Directory (AD) external authentication, you need to collect the
following information and run a setup command.
Also, you must synchronize your LDAP users and groups into the Ambari DB to be able to manage
authorization and permissions against those users and groups.
Setting Up LDAP User Authentication
The following table details the properties and values you need to know to set up LDAP
authentication.
If you are going to set bindAnonymously to false (the default), you need to make sure
you have an LDAP Manager name and password set up. If you are going to use SSL,
you need to make sure you have already set up your certificate and keys.
Table 25. Ambari Server LDAP Properties
Property                                   Values                       Description
authentication.ldap.primaryUrl             server:port                  The hostname and port for the LDAP or AD server. Example: my.ldap.server:389
authentication.ldap.secondaryUrl           server:port                  The hostname and port for the secondary LDAP or AD server. Example: my.secondary.ldap.server:389 This is an optional value.
authentication.ldap.useSSL                 true or false                If true, use SSL when connecting to the LDAP or AD server.
authentication.ldap.usernameAttribute      [LDAP attribute]             The attribute for username. Example: uid
authentication.ldap.baseDn                 [Distinguished Name]         The root Distinguished Name to search in the directory for users. Example: ou=people,dc=hadoop,dc=apache,dc=org
authentication.ldap.referral               [Referral method]            Determines if LDAP referrals should be followed, or ignored.
authentication.ldap.bindAnonymously        true or false                If true, bind to the LDAP or AD server anonymously
authentication.ldap.managerDn              [Full Distinguished Name]    If Bind anonymous is set to false, the Distinguished Name (“DN”) for the manager. Example: uid=hdfs,ou=people,dc=hadoop,dc=apache,dc=org
authentication.ldap.managerPassword        [password]                   If Bind anonymous is set to false, the password for the manager
authentication.ldap.userObjectClass        [LDAP Object Class]          The object class that is used for users. Example: organizationalPerson
authentication.ldap.groupObjectClass       [LDAP Object Class]          The object class that is used for groups. Example: groupOfUniqueNames
authentication.ldap.groupMembershipAttr    [LDAP attribute]             The attribute for group membership. Example: uniqueMember
authentication.ldap.groupNamingAttr        [LDAP attribute]             The attribute for group name.
Configure Ambari to use LDAP Server
If you are using LDAPS and the LDAPS server certificate is signed by a
trusted Certificate Authority, there is no need to import the certificate into Ambari, so
this section does not apply to you. If the LDAPS server certificate is self-signed, or is
signed by an unrecognized certificate authority such as an internal certificate authority,
you must import the certificate and create a keystore file. The following example
creates a keystore file at /etc/ambari-server/keys/ldaps-keystore.jks, but you can create it anywhere in
the file system:
1
mkdir /etc/ambari-server/keys
Run this command only if the keys directory does not already exist.
2
$JAVA_HOME/bin/keytool -import -trustcacerts -alias root -file $PATH_TO_YOUR_LDAPS_CERT -keystore /etc/ambari-server/keys/ldaps-keystore.jks
3
Set a password when prompted. You will use this during ambari-server setup-ldap.
Run the LDAP setup command on the Ambari server and answer the prompts, using
the information you collected above:
ambari-server setup-ldap
1
At the Primary URL* prompt, enter the server URL and port you collected above. Prompts
marked with an asterisk are required values.
2
At the Secondary URL* prompt, enter the secondary server URL and port. This value is
optional.
3
At the Use SSL* prompt, enter your selection. If using LDAPS, enter true.
4
At the User object class* prompt, enter the object class that is used for users.
5
At the User name attribute* prompt, enter your selection. The default value is uid.
6
At the Group object class* prompt, enter the object class that is used for groups.
7
At the Group name attribute* prompt, enter the attribute for group name.
8
At the Group member attribute* prompt, enter the attribute for group membership.
9
At the Distinguished name attribute* prompt, enter the attribute that is used for the
distinguished name.
10 At the Base DN* prompt, enter your selection.
11 At the Referral method* prompt, enter whether to follow or ignore LDAP referrals.
12 At the Bind anonymously* prompt, enter your selection.
13 At the Manager DN* prompt, enter your selection if you have set bindAnonymously to
false.
14 At the Enter the Manager Password* prompt, enter the password for your LDAP
manager DN.
15 If you set Use SSL* = true in step 3, the following prompt appears: Do you want to
provide custom TrustStore for Ambari?
Consider the following options and respond as appropriate.
•
More secure option: If using a self-signed certificate that you do not want
imported to the existing JDK keystore, enter y.
For example, you want this certificate used only by Ambari, not by any other
applications run by JDK on the same host.
If you choose this option, additional prompts appear. Respond to the additional
prompts as follows:
1
At the TrustStore type prompt, enter jks.
2
At the Path to TrustStore file prompt, enter /keys/ldaps-keystore.jks (or
the actual path to your keystore file).
3
At the Password for TrustStore prompt, enter the password that you defined for
the keystore.
•
Less secure option: If using a self-signed certificate that you want to import and
store in the existing, default JDK keystore, enter n.
1
Convert the SSL certificate to X.509 format, if necessary, by executing the following
command:
openssl x509 -in slapd.pem -out <slapd.crt>
Where <slapd.crt> is the path to the X.509 certificate.
2
Import the SSL certificate to the existing keystore, for example the default jre
certificates storage, using the following instruction:
/usr/jdk64/jdk1.7.0_45/bin/keytool -import -trustcacerts -file slapd.crt -keystore /usr/jdk64/jdk1.7.0_45/jre/lib/security/cacerts
In this example, Ambari is set up to use JDK 1.7, so the certificate must be imported into the JDK 7
keystore.
16 Review your settings and if they are correct, select y.
17 Start or restart the Server
The users you have just imported are initially granted the Ambari User privilege. Ambari Users
can read metrics, view service status and configuration, and browse job information. For
these new users to be able to start or stop services, modify configurations, and run smoke
tests, they need to be Admins. To make this change, as an Ambari Admin, use Manage
Ambari > Users > Edit. For instructions, see Managing Users and Groups.
Example Active Directory Configuration
Directory Server implementations use specific object classes and attributes for storing identities. This
example shows only those properties that are specific to Active Directory.
Run ambari-server setup-ldap and provide the following information about your Domain.
Prompt                                 Example AD Values
User object class* (posixAccount)      user
User name attribute* (uid)             cn
Group object class* (posixGroup)       group
Group member attribute* (memberUid)    member
Synchronizing LDAP Users and Groups
Run the LDAP synchronize command and answer the prompts to initiate the sync:
ambari-server sync-ldap [option]
To perform this operation, your Ambari Server must be running.
• When prompted, you must provide credentials for an Ambari Admin.
• When syncing LDAP, local user accounts with a matching username will switch to
LDAP type, which means their authentication will be against the external LDAP
and not against the local Ambari user store.
• LDAP sync only syncs up to 1000 users. If your LDAP contains over 1000 users
and you plan to import over 1000 users, you must use the --users option when
syncing and specify a filtered list of users to perform the import in batches.
The utility provides three options for synchronization:
•
Specific set of users and groups, or
•
Synchronize the existing users and groups in Ambari with LDAP, or
•
All users and groups
Review log files for failed synchronization attempts, at /var/log/ambari-server/ambari-server.log on the Ambari Server host.
Specific Set of Users and Groups
ambari-server sync-ldap --users users.txt --groups groups.txt
Use this option to synchronize a specific set of users and groups from LDAP into Ambari. Provide the
command a text file of comma-separated users and groups, and those LDAP entities will be imported
and synchronized with Ambari.
Group membership is determined using the Group Membership Attribute specified
during setup-ldap.
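When more than 1000 users must be imported, the user list has to be split into several comma-separated files that are passed to --users one at a time. The following Python sketch shows one way to prepare those batches. The function names and the batch size of 1000 per run are assumptions for illustration; confirm the limit that applies in your environment.

```python
def batches(names, size=1000):
    """Yield consecutive slices of at most `size` names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def render_batch(batch):
    """Format one batch as the comma-separated content of a users file."""
    return ",".join(batch)


# Hypothetical user names standing in for an export from your directory.
all_users = [f"user{i}" for i in range(2500)]
sizes = [len(batch) for batch in batches(all_users)]
print(sizes)
```

Each rendered batch can be saved to its own text file and supplied in a separate ambari-server sync-ldap --users run.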
Existing Users and Groups
ambari-server sync-ldap --existing
After you have performed a synchronization of a specific set of users and groups, you use this option
to synchronize only those entities that are in Ambari with LDAP. Users will be removed from Ambari if
they no longer exist in LDAP, and group membership in Ambari will be updated to match LDAP.
Group membership is determined using the Group Membership Attribute specified
during setup-ldap.
All Users and Groups
Only use this option if you are sure you want to synchronize all users and groups from
LDAP into Ambari. If you only want to synchronize a subset of users and groups, use the
specific set of users and groups option.
ambari-server sync-ldap --all
This will import all entities with matching LDAP user and group object classes into Ambari.
Optional: Encrypt Database and LDAP Passwords
By default the passwords to access the Ambari database and the LDAP server are stored in a plain
text configuration file. To have those passwords encrypted, you need to run a special setup
command.
Ambari Server should not be running when you do this: either make the edits before you start Ambari
Server the first time or bring the server down to make the edits.
1
On the Ambari Server, run the special setup command and answer the prompts:
ambari-server setup-security
2
Select Option 2: Choose one of the following options:
[1] Enable HTTPS for Ambari server.
[2] Encrypt passwords stored in ambari.properties file.
[3] Setup Ambari kerberos JAAS configuration.
3
Provide a master key for encrypting the passwords. You are prompted to enter the
key twice for accuracy.
If your passwords are encrypted, you need access to the master key to start Ambari
Server.
4
You have three options for maintaining the master key:
•
Persist it to a file on the server by pressing y at the prompt.
•
Create an environment variable AMBARI_SECURITY_MASTER_KEY and set it to the
key.
•
Provide the key manually at the prompt on server start up.
5
Start or restart the Server
Reset Encryption
There may be situations in which you want to:
•
Remove encryption entirely
•
Change the current master key, either because the key has been forgotten or because you
want to change the current key as a part of a security routine.
Ambari Server should not be running when you do this.
Remove Encryption Entirely
To reset Ambari database and LDAP passwords to a completely unencrypted state:
1
On the Ambari host, open /etc/ambari-server/conf/ambari.properties with a text
editor and set this property
security.passwords.encryption.enabled=false
2
Delete /var/lib/ambari-server/keys/credentials.jceks
3
Delete /var/lib/ambari-server/keys/master
4
You must now reset the database password and, if necessary, the LDAP password. Run
ambari-server setup and ambari-server setup-ldap again.
Change the Current Master Key
To change the master key:
•
If you know the current master key or if the current master key has been persisted:
1
Re-run the encryption setup command and follow the prompts.
2
Select Option 2: Choose one of the following options:
[1] Enable HTTPS for Ambari server.
[2] Encrypt passwords stored in ambari.properties file.
[3] Setup Ambari kerberos JAAS configuration.
3
Enter the current master key when prompted if necessary (if it is not persisted or set
as an environment variable).
4
At the Do you want to reset Master Key prompt, enter yes.
5
At the prompt, enter the new master key and confirm.
•
If you do not know the current master key:
1
Remove encryption entirely, as described here.
2
Re-run ambari-server setup-security as described here.
3
Start or restart the Ambari Server.
Optional: Set Up SSL for Ambari
Set Up HTTPS for Ambari Server
If you want to limit access to the Ambari Server to HTTPS connections, you need to provide a
certificate. While it is possible to use a self-signed certificate for initial trials, self-signed certificates are not suitable for
production environments. After your certificate is in place, you must run a special setup command.
Ambari Server should not be running when you do this. Either make these changes before you start
Ambari the first time, or bring the server down before running the setup command.
1
Log into the Ambari Server host.
2
Locate your certificate. If you want to create a temporary self-signed certificate, use this as
an example:
openssl genrsa -out $wserver.key 2048
openssl req -new -key $wserver.key -out $wserver.csr
openssl x509 -req -days 365 -in $wserver.csr -signkey $wserver.key -out $wserver.crt
Where $wserver is the Ambari Server host name.
The certificate you use must be PEM-encoded, not DER-encoded. If you attempt to use a
DER-encoded certificate, you see the following error:
unable to load certificate
140109766494024:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:698:Expecting: TRUSTED CERTIFICATE
You can convert a DER-encoded certificate to a PEM-encoded certificate using the following
command:
openssl x509 -in cert.crt -inform der -outform pem -out cert.pem
where cert.crt is the DER-encoded certificate and cert.pem is the resulting PEM-encoded certificate.
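Before running the setup command, it can help to check which encoding a certificate file uses. The following Python sketch is one possible check, using only the standard library: a PEM file contains a "-----BEGIN" text header, while a DER file is raw binary. The function names is_pem and ensure_pem are hypothetical, and the sketch does not validate the certificate contents.

```python
import ssl

PEM_MARKER = b"-----BEGIN "


def is_pem(data: bytes) -> bool:
    """Return True if the bytes contain a PEM header line."""
    return PEM_MARKER in data


def ensure_pem(data: bytes) -> bytes:
    """Return PEM bytes, converting from DER when no PEM header is present."""
    if is_pem(data):
        return data
    # DER_cert_to_PEM_cert wraps the base64-encoded DER bytes in certificate headers.
    return ssl.DER_cert_to_PEM_cert(data).encode("ascii")


# A few placeholder DER-style bytes, used only to exercise the functions.
sample_der = bytes([0x30, 0x03, 0x02, 0x01, 0x01])
sample_pem = ensure_pem(sample_der)
print(is_pem(sample_der), is_pem(sample_pem))
```

In practice, the bytes would be read from the certificate file in binary mode, and the result written to a new .pem file.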
3
Run the special setup command (ambari-server setup-security) and answer the prompts:
1
Select 1 for Enable HTTPS for Ambari server.
2
Respond y to Do you want to configure HTTPS ?
3
Select the port you want to use for SSL. The default port number is 8443.
4
Provide the path to your certificate and your private key. For example, put your
certificate and private key in /etc/ambari-server/certs with root as the owner or
the non-root user you designated during Ambari Server setup for the ambari-server
daemon.
5
Provide the password for the private key.
6
Start or restart the Server
Optional: Set Up Kerberos for Ambari Server
When a cluster is enabled for Kerberos, the component REST endpoints (such as the YARN ATS
component) require SPNEGO authentication.
Depending on the Services in your cluster, Ambari Web needs access to these APIs. As well, views
such as the Jobs View and the Tez View need access to ATS. Therefore, the Ambari Server requires
a Kerberos principal in order to authenticate via SPNEGO against these APIs. This section describes
how to configure Ambari Server with a Kerberos principal and keytab to allow views to authenticate
via SPNEGO against cluster components.
1
Create a principal in your KDC for the Ambari Server. For example, using kadmin:
addprinc -randkey [email protected]
2
Generate a keytab for that principal.
xst -k ambari.server.keytab [email protected]
3
Place that keytab on the Ambari Server host.
/etc/security/keytabs/ambari.server.keytab
4
Stop the ambari server.
ambari-server stop
5
Run the setup-security command.
6
Select 3 for Setup Ambari kerberos JAAS configuration.
7
Enter the Kerberos principal name for the Ambari Server you set up earlier.
8
Enter the path to the keytab for the Ambari principal.
9
Optional: Set Up Two-Way SSL Between Ambari Server and Ambari Agents
Two-way SSL provides a way to encrypt communication between Ambari Server and Ambari Agents.
By default Ambari ships with Two-way SSL disabled. To enable Two-way SSL:
Ambari Server should not be running when you do this: either make the edits before you start Ambari
Server the first time or bring the server down to make the edits.
1
On the Ambari Server host, open /etc/ambari-server/conf/ambari.properties with a
text editor.
2
Add the following property:
security.server.two_way_ssl = true
3
Start or restart the Ambari Server.
The Agent certificates are downloaded automatically during Agent Registration.
Optional: Configure Ciphers and Protocols for Ambari Server
Ambari provides control of ciphers and protocols that are exposed via Ambari Server.
1
To disable specific ciphers, you can optionally add a list of the following format to
ambari.properties. If you specify multiple ciphers, separate each cipher using a vertical bar |.
security.server.disabled.ciphers=TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
2
To disable specific protocols, you can optionally add a list of the following format to
ambari.properties. If you specify multiple protocols, separate each protocol using a vertical
bar |.
security.server.disabled.protocols=SSL|SSLv2|SSLv3
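Both properties take a list separated by vertical bars. The following Python sketch assembles such a line and rejects entries that would break the format. The helper name disabled_property is hypothetical, and the second cipher name is included only to show the separator.

```python
def disabled_property(name, items):
    """Build a 'name=a|b|c' line for ambari.properties."""
    for item in items:
        if not item or "|" in item or " " in item:
            raise ValueError(f"invalid entry: {item!r}")
    return f"{name}={'|'.join(items)}"


protocols_line = disabled_property(
    "security.server.disabled.protocols", ["SSL", "SSLv2", "SSLv3"]
)
ciphers_line = disabled_property(
    "security.server.disabled.ciphers",
    # The second cipher is an illustrative addition, not a recommendation.
    ["TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA", "SSL_RSA_WITH_3DES_EDE_CBC_SHA"],
)
print(protocols_line)
print(ciphers_line)
```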
Troubleshooting Ambari Deployments
Introduction: Troubleshooting Ambari Issues
The first step in troubleshooting any problem in an Ambari-deployed Hadoop cluster is Reviewing
the Ambari Log Files.
Find a recommended solution to a troubleshooting problem in one of the following sections:
•
Resolving Ambari Installer Problems
•
Resolving Cluster Deployment Problems
•
Resolving General Problems
Reviewing Ambari Log Files
Find files that log activity on an Ambari host in the following locations:
•
Ambari Server logs
<your.Ambari.server.host>/var/log/ambari-server/ambari-server.log
•
Ambari Agent logs
<your.Ambari.agent.host>/var/log/ambari-agent/ambari-agent.log
•
Ambari Action logs
<your.Ambari.agent.host>/var/lib/ambari-agent/data/
This location contains logs for all tasks executed on an Ambari agent host.
Each log name includes:
•
command-N.json - the command file corresponding to a specific task.
•
output-N.txt - the output from the command execution.
•
errors-N.txt - error messages.
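Because the three files for a task share the same number N, they can be grouped to inspect one task at a time. The following Python sketch groups a list of file names by task number. The function name group_by_task is hypothetical, and the sample names are invented for the example.

```python
import re
from collections import defaultdict

# Matches command-N.json, output-N.txt, and errors-N.txt.
TASK_FILE = re.compile(r"^(command|output|errors)-(\d+)\.(?:json|txt)$")


def group_by_task(filenames):
    """Map each task number to the command, output, and errors files found for it."""
    tasks = defaultdict(dict)
    for filename in filenames:
        match = TASK_FILE.match(filename)
        if match:
            kind, task_id = match.group(1), int(match.group(2))
            tasks[task_id][kind] = filename
    return dict(tasks)


sample = ["command-12.json", "output-12.txt", "errors-12.txt", "command-3.json", "notes.txt"]
grouped = group_by_task(sample)
for task_id in sorted(grouped):
    print(task_id, grouped[task_id])
```

On an Agent host, the input list could come from os.listdir on the Action logs directory named above.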
Resolving Ambari Installer Problems
Try the recommended solution for each of the following problems:
Problem: Browser crashed before Install Wizard completes
Your browser crashes or you accidentally close your browser before the Install Wizard completes.
Solution
The response to a browser closure depends on where you are in the process:
•
The browser closes before you press the Deploy button.
Re-launch the same browser and continue the install process. Using a different browser
forces you to re-start the entire process.
•
The browser closes after you press Deploy, while or after the Install, Start, and Test
screen opens.
Re-launch the same browser and continue the process, or log in again, using a different
browser.
When the Install, Start, and Test screen displays, proceed.
Problem: Install Wizard reports that the cluster install has failed
The Install, Start, and Test screen reports that the cluster install has failed.
Solution
The response to a report of install failure depends on the cause of the failure:
•
The failure is due to intermittent network connection errors during software package installs.
Use the Retry button on the Install, Start, and Test screen.
•
The failure is due to misconfiguration or other setup errors.
1
Use the left navigation bar to go back to the appropriate screen. For example,
Customize Services.
2
Make your changes.
3
Continue in the normal way.
•
The failure occurs during the start/test sequence.
1
Click Next and Complete, then proceed to the Monitoring Dashboard.
2
Use the Services View to make your changes.
3
Re-start the service using Service Actions.
•
The failure is due to something else.
1
Open an SSH connection to the Ambari Server host.
2
Clear the database. At the command line, type:
ambari-server reset
3
Clear your browser cache.
4
Re-run the Install Wizard.
Problem: Ambari Agents May Fail to Register with Ambari Server.
When deploying HDP using Ambari 1.4.x or later on RHEL/CentOS 6.5, click the “Failed” link on the
Confirm Hosts page in the Cluster Install wizard to display the Agent logs. The following log entry
indicates the SSL connection between the Agent and Server failed during registration:
INFO 2014-04-02 04:25:22,669 NetUtil.py:55 - Failed to connect to
https://{ambari-server}:8440/cert/ca due to [Errno 1] _ssl.c:492:
error:100AE081:elliptic curve routines:EC_GROUP_new_by_curve_name:unknown
group
For more detailed information about this OpenSSL issue, see
https://bugzilla.redhat.com/show_bug.cgi?id=1025598
Solution:
Certain Linux distributions, such as RHEL/CentOS/Oracle Linux 6.x, ship an OpenSSL library
build that is affected by this issue. Check the installed version and upgrade it if necessary:
1
Check the OpenSSL library version installed on your host(s):
rpm -qa | grep openssl
openssl-1.0.1e-15.el6.x86_64
2
If the output reads openssl-1.0.1e-15.x86_64 (1.0.1 build 15), you must upgrade
the OpenSSL library. To upgrade the OpenSSL library, run the following command:
yum upgrade openssl
3
Verify you have the newer version of OpenSSL (1.0.1 build 16):
rpm -qa | grep openssl
openssl-1.0.1e-16.el6.x86_64
4
Restart Ambari Agent(s) and click Retry -> Failed in the wizard user interface.
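When many hosts need checking, the version comparison in the steps above can be scripted. The following Python sketch parses a package string as printed by rpm -qa and reports whether it is older than build 16 of 1.0.1e. The function name needs_upgrade is hypothetical, and treating every 1.0.1e build below 16 as affected is an assumption (the steps name only build 15).

```python
import re

# Matches names such as openssl-1.0.1e-15.el6.x86_64 (version, then build number).
PACKAGE = re.compile(r"^openssl-(?P<version>[0-9.]+[a-z]?)-(?P<build>\d+)\b")


def needs_upgrade(package):
    """Return True for openssl 1.0.1e packages with a build number below 16."""
    match = PACKAGE.match(package)
    if match is None:
        raise ValueError(f"not an openssl package string: {package}")
    return match["version"] == "1.0.1e" and int(match["build"]) < 16


for name in ("openssl-1.0.1e-15.el6.x86_64", "openssl-1.0.1e-16.el6.x86_64"):
    print(name, "upgrade needed" if needs_upgrade(name) else "ok")
```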
Problem: The “yum install ambari-server” Command Fails
You are unable to get the initial install command to run.
Solution:
You may have incompatible versions of some software components in your environment. See Meet
Minimum System Requirements in Installing HDP Using Ambari for more information, then make any
necessary changes.
Problem: HDFS Smoke Test Fails
If your DataNodes are incorrectly configured, the smoke tests fail and you get this error message in
the DataNode logs:
DisallowedDataNodeException
org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException
Solution:
1
Make sure that reverse DNS look-up is properly configured for all nodes in your cluster.
2
Make sure you have the correct FQDNs when specifying the hosts for your cluster. Do not
use IP addresses - they are not supported.
3
Restart the installation process.
Problem: yum Fails on Free Disk Space Check
If you boot your Hadoop DataNodes with/as a ramdisk, you must disable the free space check for
yum before doing the install. If you do not disable the free space check, yum will fail with the
following error:
Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install unzip' returned 1.
Error Downloading Packages: unzip-6.0-1.el6.x86_64: Insufficient space in
download directory /var/cache/yum/x86_64/6/base/packages
* free 0
* needed 149 k
Solution:
To disable free space check, update the DataNode image with a directive in /etc/yum.conf:
diskspacecheck=0
Problem: A service with a customized service user is not appearing properly in Ambari Web
You are unable to monitor or manage a service in Ambari Web when you have created a customized
service user name with a hyphen, for example, hdfs-user.
Solution
Hyphenated service user names are not supported. You must re-run the Ambari Install Wizard and
create a different name.
Resolving Cluster Deployment Problems
Try the recommended solution for each of the following problems:
Problem: Trouble Starting Ambari on System Reboot
If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.
Solution:
Log in to each machine in your cluster separately:
1
On the Ambari Server host machine:
ambari-server start
2
On each host in your cluster:
ambari-agent start
Problem: Metrics and Host information display incorrectly in Ambari Web
Charts appear incorrectly or not at all, or Host health status is displayed incorrectly.
Solution:
The clocks on all the hosts in your cluster and on the machine from which you browse to Ambari Web must be in sync
with each other. The easiest way to assure this is to enable NTP.
Problem: On SUSE 11 Ambari Agent crashes within the first 24 hours
SUSE 11 ships with Python version 2.6.0-8.12.2 which contains a known defect that causes this
crash.
Solution:
Upgrade to Python version 2.6.8-0.15.1.
Problem: Attempting to Start HBase REST server causes either REST server or Ambari Web to fail
As an option you can start the HBase REST server manually after the install process is complete. It
can be started on any host that has the HBase Master or the Region Server installed. If you install the
REST server on the same host as the Ambari server, the http ports will conflict.
Solution
In starting the REST server, use the -p option to set a custom port.
Use the following command to start the REST server.
/usr/lib/hbase/bin/hbase-daemon.sh start rest -p <custom_port_number>
Problem: Multiple Ambari Agent processes are running, causing re-register
On a cluster host ps aux | grep ambari-agent shows more than one agent process running.
This causes Ambari Server to get incorrect ids from the host and forces Agent to restart and re-register.
Solution
On the affected host, kill the processes and restart.
1
Kill the Agent processes and remove the Agent PID files found here: /var/run/ambari-agent/ambari-agent.pid.
2
Restart the Agent process:
ambari-agent start
Problem: Some graphs do not show a complete hour of data until the cluster has been running for an hour
When you start a cluster for the first time, some graphs, such as Services View > HDFS and
Services View > MapReduce, do not plot a complete hour of data. Instead, they show data only
for the length of time the service has been running. Other graphs display the run of a complete hour.
Solution
Let the cluster run. After an hour all graphs will show a complete hour of data.
Problem: Ambari stops MySQL database during deployment, causing Ambari Server to crash.
The Hive Service uses MySQL Server by default. If you choose the MySQL server on the
Ambari Server host as the managed server for Hive, Ambari stops this database during deployment
and Ambari Server crashes.
Solution
If you plan to use the default MySQL Server setup for Hive and use MySQL Server for Ambari - make
sure that the two MySQL Server instances are different.
If you plan to use the same MySQL Server for Hive and Ambari - make sure to choose the existing
database option for Hive.
Problem: Cluster Install Fails with Groupmod Error
The cluster fails to install with an error related to running groupmod. This can occur in environments
where groups are managed in LDAP, and not on local Linux machines.
You may see an error message similar to the following one:
Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop'
does not exist in /etc/group
Solution
When installing the cluster using the Cluster Installer Wizard, at the Customize Services step,
select the Misc tab and choose the Skip group modifications during install option.
Problem: Host registration fails during Agent bootstrap on SLES due to
timeout.
When using SLES and performing host registration using SSH, the Agent bootstrap may fail due to
timeout when running the setupAgent.py script. The host on which the timeout occurs will show
the following process hanging:
c6401.ambari.apache.org:/etc/ # ps -ef | grep zypper
root 18318 18317 5 03:15 pts/1 00:00:00 zypper -q search -s --match-exact ambari-agent
Solution
1
If you have a repository registered that is prompting to accept keys, via user interaction, you
may see the hang and timeout. In this case, run zypper refresh and confirm all repository
keys are accepted for the zypper command to work without user interaction.
2
Another alternative is to perform manual Agent setup and not use SSH for host registration.
This option does not require that Ambari call zypper without user interaction.
Problem: Host Check Fails if Transparent Huge Pages (THP) is not
disabled.
When installing Ambari on RHEL/CentOS 6 using the Cluster Installer Wizard at the Host Checks
step, one or more host checks may fail if you have not disabled Transparent Huge Pages on all hosts.
Host Checks will warn you when a failure occurs.
Solution
Disable THP.
On all hosts,
1
Add the following command to your /etc/rc.local file:
if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled;
then echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag;
then echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi
2
To confirm, reboot the host then run the following command:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
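In the output above, the active setting is the value shown in square brackets. As a minimal sketch, that value can be extracted as follows. The function name active_thp_mode is illustrative.

```python
def active_thp_mode(contents):
    """Return the bracketed (active) value from a THP 'enabled' file, or None."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


# Output copied from the confirmation step above.
mode = active_thp_mode("always madvise [never]\n")
print("THP disabled" if mode == "never" else "THP still active: %s" % mode)
```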
Resolving General Problems
During Enable Kerberos, the Check Kerberos operation fails.
When enabling Kerberos using the wizard, the Check Kerberos operation fails. In /var/log/ambari-server/ambari-server.log, you see a message similar to the following:
02:45:44,490 WARN [qtp567239306-238] MITKerberosOperationHandler:384 - Failed to execute kadmin:
Solution 1:
Check that NTP is running and confirm your hosts and the KDC times are in sync. A time skew as
little as 5 minutes can cause Kerberos authentication to fail.
Solution 2: (on RHEL/CentOS/Oracle Linux)
Check that the Kerberos Admin principal being used has the necessary KDC ACL rights as set in
/var/kerberos/krb5kdc/kadm5.acl.
Problem: Hive developers may encounter an exception error message
during Hive Service Check
MySQL is the default database used by the Hive metastore. Depending on several factors, such as
the version and configuration of MySQL, a Hive developer may see an exception message similar to
the following one:
An exception was thrown while adding/validating classes) : Specified key was
too long; max key length is 767 bytes
Solution
Administrators can resolve this issue by altering the Hive metastore database to use the Latin1
character set, as shown in the following example:
mysql> ALTER DATABASE <metastore.database.name> character set latin1;
Problem: API calls for PUT, POST, DELETE respond with a "400 - Bad
Request"
When attempting to perform a REST API call, you receive a 400 error response. REST API calls
require the "X-Requested-By" header.
Solution
Starting with Ambari 1.4.2, you must include the "X-Requested-By" header with the REST API calls.
For example, if using curl, include the -H "X-Requested-By: ambari" option.
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambarihost>:8080/api/v1/hosts/host1
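The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library. The host name ambari.example.com and the helper name build_ambari_request are assumptions for illustration. The request is built but not sent.

```python
import base64
import urllib.request


def build_ambari_request(url, method, user, password):
    """Build a request that carries basic auth and the X-Requested-By header."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method=method,
        headers={
            "Authorization": "Basic " + token,
            # Without this header, PUT, POST, and DELETE calls return 400.
            "X-Requested-By": "ambari",
        },
    )


# Hypothetical host name; substitute your Ambari Server host.
req = build_ambari_request(
    "http://ambari.example.com:8080/api/v1/hosts/host1", "DELETE", "admin", "admin"
)
headers = {name.lower(): value for name, value in req.header_items()}
# To send the request: urllib.request.urlopen(req)
```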
Ambari Reference Guide
Ambari Reference Topics
For more information about using Ambari 2.0, see the following topics:
Installing Ambari Agents Manually
Configuring Ambari for Non-Root
Customizing HDP Services
Configuring Storm for Supervision
Using Custom Host Names
Moving the Ambari Server
Configuring LZO Compression
Using Non-Default Databases
Setting up an Internet Proxy Server for Ambari
Configuring Network Port Numbers
Changing the JDK Version on an Existing Cluster
Using Ambari Blueprints
Configuring HDP Stack Repositories for Red Hat Satellite
Tuning Ambari Performance
Using Ambari Views
Installing Ambari Agents Manually
Involves two steps:
1
2
Install Ambari Agents
Select the OS family running on your installation host.
Follow instructions in the section for the operating system that runs on your installation host.
Use a command line editor to perform each instruction.
1
Log in to your host as root. For example, type:
ssh <username>@<fqdn>
sudo su where <username> is your user name and <fqdn> is the fully qualified domain name of your
server host.
2
3
Confirm that the repository is configured by checking the repo list.
yum repolist
You should see values similar to the following for Ambari repositories in the list.
repo id             repo name             status
AMBARI.2.0.0-2.x    Ambari 2.x            8
base                CentOS-6 - Base       6,518
extras              CentOS-6 - Extras     37
updates             CentOS-6 - Updates    785
4
Install the Ambari bits. This also installs the default PostgreSQL Ambari database.
5
Installing : postgresql-libs-8.4.20-1.el6_5.x86_64      1/4
Installing : postgresql-8.4.20-1.el6_5.x86_64           2/4
Installing : postgresql-server-8.4.20-1.el6_5.x86_64    3/4
Installing : ambari-server-2.0.0-147.noarch             4/4
Verifying : postgresql-server-8.4.20-1.el6_5.x86_64     1/4
Verifying : postgresql-libs-8.4.20-1.el6_5.x86_64       2/4
Verifying : postgresql-8.4.20-1.el6_5.x86_64            3/4
Verifying : ambari-server-2.0.0-147.noarch              4/4
Installed:
ambari-server.noarch 0:2.0.0-59
Dependency Installed:
postgresql.x86_64 0:8.4.20-1.el6_5
postgresql-libs.x86_64 0:8.4.20-1.el6_5
postgresql-server.x86_64 0:8.4.20-1.el6_5
Complete!
Accept the warning about trusting the Hortonworks GPG Key. That key will be
automatically downloaded and used to validate packages from Hortonworks. You will
see the following message:
Importing GPG key 0x07513CAD:
Userid: "Jenkins (HDP Builds) <[email protected]>"
From : http://s3.amazonaws.com/dev.hortonworks.com/ambari/centos6/RPM-GPGKEY/RPM-GPG-KEY-Jenkins
SLES 11
1
server host.
2
3
Confirm the downloaded repository is configured by checking the repo list.
zypper repos
Alias | Name | Enabled | Refresh
AMBARI.2.0.0-2.x | Ambari 2.x | Yes | No
http-demeter.uniregensburg.dec997c8f9 | SUSE-Linux-Enterprise-Software-DevelopmentKit-11-SP1 11.1.1-1.57 | Yes | Yes
opensuse | OpenSuse | Yes | Yes
4
zypper install ambari-server
5
Retrieving package postgresql-libs-8.3.5-1.12.x86_64 (1/4), 172.0 KiB (571.0 KiB unpacked)
Retrieving: postgresql-libs-8.3.5-1.12.x86_64.rpm [done (47.3 KiB/s)]
Installing: postgresql-libs-8.3.5-1.12 [done]
Retrieving package postgresql-8.3.5-1.12.x86_64 (2/4), 1.0 MiB (4.2 MiB unpacked)
Retrieving: postgresql-8.3.5-1.12.x86_64.rpm [done (148.8 KiB/s)]
Installing: postgresql-8.3.5-1.12 [done]
Retrieving package postgresql-server-8.3.5-1.12.x86_64 (3/4), 3.0 MiB (12.6 MiB unpacked)
Retrieving: postgresql-server-8.3.5-1.12.x86_64.rpm [done (452.5 KiB/s)]
Installing: postgresql-server-8.3.5-1.12 [done]
Updating etc/sysconfig/postgresql...
Retrieving package ambari-server-2.0.0-59.noarch (4/4), 99.0 MiB (126.3 MiB unpacked)
Retrieving: ambari-server-2.0.0-59.noarch.rpm [done (3.0 MiB/s)]
Installing: ambari-server-2.0.0-59 [done]
ambari-server 0:off 1:off 2:off 3:on 4:off 5:on 6:off
UBUNTU 12
1
server host.
2
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
Do not modify the ambari.list file name. This file is expected to be available on the
3
Confirm that Ambari packages downloaded successfully by checking the package name list.
apt-cache pkgnames
You should see the Ambari packages in the list.
Alias             Name
AMBARI-dev-2.x    Ambari 2.x
4
apt-get install ambari-server
1
server host.
2
3
Confirm the repository is configured by checking the repo list.
yum repolist
You should see listed values similar to the following:
repo Id             repo Name                                         status
AMBARI.2.0.0-2.x    Ambari 2.x                                        5
base                CentOS-5 - Base                                   3,667
epel                Extra Packages for Enterprise Linux 5 - x86_64    7,614
puppet              Puppet                                            433
updates             CentOS-5 - Updates                                118
4
When deploying HDP on a cluster having limited or no Internet access, you should
provide access to the bits using an alternative method.
•
For more information about setting up local repositories, see Using a Local
Repository.
•
For more information about obtaining JCE policy archives for secure
authentication, see Installing the JCE.
Ambari Server by default uses an embedded PostgreSQL database. When you install
the Ambari Server, the PostgreSQL packages and dependencies must be available for
install. These packages are typically available as part of your Operating System
repositories. Please confirm you have the appropriate repositories available for the
postgresql-server packages.
Install the Ambari Agents Manually
Use the instructions specific to the OS family running on your agent hosts.
1
Install the Ambari Agent on every host in your cluster.
yum install ambari-agent
2
Using a text editor, configure the Ambari Agent by editing the ambari-agent.ini file, as shown in the following example:
vi /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
3
Start the agent on every host in your cluster.
ambari-agent start
The agent registers with the Server on start.
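When many hosts need the same change, the edit to ambari-agent.ini can be scripted. The following is a minimal sketch that uses Python's configparser on the file contents. The function name set_server_hostname and the host name ambari.example.com are illustrative. Note that configparser drops comments when it rewrites a file, so review the result before replacing a real configuration.

```python
import configparser
import io


def set_server_hostname(ini_text, hostname):
    """Return ini_text with the hostname key in the [server] section set to hostname."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", hostname)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


original = "[server]\nhostname=localhost\nurl_port=8440\nsecured_url_port=8441\n"
updated = set_server_hostname(original, "ambari.example.com")
print(updated)
```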
SLES 11
1
2
[server]
url_port=8440
3
ambari-agent start
UBUNTU 12
1
2
[server]
url_port=8440
3
ambari-agent start
RHEL/CentOS/Oracle Linux 5 (DEPRECATED)
1
2
[server]
url_port=8440
3
ambari-agent start
Configuring Ambari for Non-Root
In most secure environments, restricting access to and limiting services that run as root is a hard
requirement. For these environments, Ambari can be configured to operate without direct root
access. Both Ambari Server and Ambari Agent components allow for non-root operation, and the
following sections will walk you through the process.
•
How to Configure Ambari Server for Non-Root
•
How to Configure an Ambari Agent for Non-Root
How to Configure Ambari Server for Non-Root
You can configure the Ambari Server to run as a non-root user. During the ambari-server setup
process, when prompted to Customize user account for ambari-server daemon?, choose
y. The setup process prompts you for the appropriate, non-root user to run the Ambari Server as; for
example: ambari.
How to Configure an Ambari Agent for Non-Root
You can configure the Ambari Agent to run as a non-privileged user as well. That user requires
specific sudo access in order to su to Hadoop service accounts and perform specific privileged
commands. Configuring Ambari Agents to run as non-root requires that you manually install agents
on all nodes in the cluster. For these details, see Installing Ambari Agents Manually. After installing
each agent, you must configure the agent to run as the desired, non-root user. In this example we will
use the ambari user.
Change the run_as_user property in the /etc/ambari-agent/conf/ambari-agent.ini file, as
illustrated below:
run_as_user=ambari
Once this change has been made, the ambari-agent must be restarted to begin running as the non-root user.
The non-root functionality relies on sudo to run specific commands that require elevated privileges
as defined in the Sudoer Configuration. The sudo configuration is split into four sections:
Customizable Users, Non-Customizable Users, Commands, and Sudo Defaults.
Sudoer Configuration
The Customizable Users, Non-Customizable Users, Commands, and Sudo Defaults sections will
cover how sudo should be configured to enable Ambari to run as a non-root user. Each of the
sections includes the specific sudo entries that should be placed in /etc/sudoers by running the
visudo command.
Customizable Users
This section contains the su commands and corresponding Hadoop service accounts that are
configurable on install:
# Ambari Customizable Users
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su hdfs *, /bin/su zookeeper *,
/bin/su knox *,/bin/su falcon *,/bin/su flume *,/bin/su hbase *,/bin/su hive
*, /bin/su hcat *,/bin/su kafka *,/bin/su mapred *,/bin/su oozie *,/bin/su
sqoop *,/bin/su storm *,/bin/su tez *,/bin/su yarn *,/bin/su ams *, /bin/su
ambari-qa *, /bin/su spark *
These user accounts must match the service user accounts referenced in the
Customize Services > Misc tab during the Install Wizard configuration step. For
example, if you customize YARN to run as xyz_yarn, modify the su command above to
be /bin/su xyz_yarn.
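If several service accounts are customized, the sudoers entry can be generated rather than edited by hand. The following is a minimal sketch. The default user list is copied from the entry above, and the function name customizable_users_entry and the overrides mapping are illustrative.

```python
# Default service accounts, in the order used by the entry above.
DEFAULT_SERVICE_USERS = [
    "hdfs", "zookeeper", "knox", "falcon", "flume", "hbase", "hive", "hcat",
    "kafka", "mapred", "oozie", "sqoop", "storm", "tez", "yarn", "ams",
    "ambari-qa", "spark",
]


def customizable_users_entry(agent_user, overrides=None):
    """Build the Customizable Users sudoers line, replacing any customized accounts."""
    overrides = overrides or {}
    commands = [
        "/bin/su {} *".format(overrides.get(user, user))
        for user in DEFAULT_SERVICE_USERS
    ]
    return "{} ALL=(ALL) NOPASSWD:SETENV: {}".format(agent_user, ", ".join(commands))


# Example: YARN was customized to run as xyz_yarn.
entry = customizable_users_entry("ambari", {"yarn": "xyz_yarn"})
print(entry)
```

Add the generated line to /etc/sudoers with visudo, as with the hand-written entry.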
Non-Customizable Users
This section contains the su commands for the system accounts that cannot be modified:
# Ambari Non-Customizable Users
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su mysql *
Commands
This section contains the specific commands that must be issued for standard agent operations:
# Ambari Commands
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum,/usr/bin/zypper,/usr/bin/apt-get,
/bin/mkdir, /bin/ln, /bin/chown, /bin/chmod, /bin/chgrp, /usr/sbin/groupadd,
/usr/sbin/groupmod, /usr/sbin/useradd, /usr/sbin/usermod, /bin/cp, /bin/sed,
/bin/rm, /bin/kill, /usr/bin/unzip, /bin/tar, /usr/bin/hdp-select,
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh,
/usr/lib/hadoop/bin/hadoop-daemon.sh, /usr/lib/hadoop/sbin/hadoop-daemon.sh,
/usr/sbin/service mysql *, /sbin/service mysqld *, /sbin/service mysql *,
/sbin/chkconfig gmond off, /sbin/chkconfig gmetad off, /etc/init.d/httpd *,
/sbin/service hdp-gmetad start, /sbin/service hdp-gmond start, /usr/bin/tee,
/usr/sbin/gmond, /usr/sbin/update-rc.d ganglia-monitor *,
/usr/sbin/update-rc.d gmetad *, /etc/init.d/apache2 *,
/usr/sbin/service hdp-gmond *, /usr/sbin/service hdp-gmetad *, /usr/bin/test,
/bin/touch, /usr/bin/stat, /usr/sbin/setenforce
Do not modify the command lists; only the usernames in the Customizable Users
section may be modified.
To reiterate, you must do this sudo configuration on every node in the cluster. To ensure that the
configuration has been done properly, you can su to the ambari user and run sudo -l. There, you can
double-check that there are no warnings, and that the configuration output matches what was just
applied.
Sudo Defaults
Some versions of sudo have a default configuration that prevents sudo from being invoked from a
non-interactive shell. In order for the agent to run its commands non-interactively, some defaults
need to be overridden.
Defaults exempt_group = ambari
Defaults !env_reset,env_delete-=PATH
Defaults: ambari !requiretty
To reiterate, this sudo configuration must be done on every node in the cluster. To ensure that the
configuration has been done properly, you can su to the ambari user and run sudo -l. There, you can
double-check that there are no warnings, and that the configuration output matches what was just
applied.
Customizing HDP Services
Defining Service Users and Groups for an HDP 2.x Stack
The individual services in Hadoop run under the ownership of their respective Unix accounts. These
accounts are known as service users. These service users belong to a special Unix group. "Smoke
Test" is a service user dedicated specifically for running smoke tests on components during
installation using the Services View of the Ambari Web GUI. You can also run service checks as the
"Smoke Test" user on-demand after installation. You can customize any of these users and groups
using the Misc tab during the Customize Services installation step.
Use the Skip Group Modifications option to not modify the Linux groups in the
cluster. Choosing this option is typically required if your environment manages groups
using LDAP and not on the local Linux machines.
If you choose to customize names, Ambari checks to see if these custom accounts already exist. If
they do not exist, Ambari creates them. The default accounts are always created during installation
whether or not custom accounts are specified. These default accounts are not used and can be
removed post-install.
All new service user accounts, and any existing user accounts used as service users,
must have a UID >= 1000.
Table 26. Service Users
Service* | Component | Default User Account
Ambari Metrics | Metrics Collector, Metrics Monitor | ams
Falcon | Falcon Server | falcon (Falcon is available with HDP 2.1 or 2.2 Stack.)
Flume | Flume Agents | flume
HBase | MasterServer RegionServer | hbase
HDFS | NameNode SecondaryNameNode DataNode | hdfs
Hive | Hive Metastore, HiveServer2 | hive
Kafka | Kafka Broker | kafka
Knox | Knox Gateway | knox
MapReduce2 | HistoryServer | mapred
Oozie | Oozie Server | oozie
PostgreSQL | PostgreSQL (with Ambari Server) | postgres (Created as part of installing the default PostgreSQL database with Ambari Server. If you are not using the Ambari PostgreSQL database, this user is not needed.)
Ranger | Ranger Admin, Ranger Usersync | ranger (Ranger is available with HDP 2.2 Stack)
Spark | Spark History Server | spark (Spark is available with HDP 2.2 Stack)
Sqoop | Sqoop | sqoop
Storm | Masters (Nimbus, DRPC Server, Storm REST API, Server, Storm UI Server) Slaves (Supervisors, Logviewers) | storm (Storm is available with HDP 2.1 or 2.2 Stack.)
Tez | Tez clients | tez (Tez is available with HDP 2.1 or 2.2 Stack.)
WebHCat | WebHCat Server | hcat
YARN | NodeManager ResourceManager | yarn
ZooKeeper | ZooKeeper | zookeeper
*For all components, the Smoke Test user performs smoke tests against cluster services as part of
the install process. It also can perform these on-demand, from the Ambari Web UI. The default user
account for the smoke test user is ambari-qa.
Table 27. Service Groups
Service | Components | Default Group Account
All | All | hadoop
Knox | Knox Gateway | knox
Ranger | Ranger Admin, Ranger Usersync | ranger
Spark | Spark History Server | spark
Setting Properties That Depend on Service
Usernames/Groups
Some properties must be set to match specific service user names or service groups. If you have set
up non-default, customized service user names for the HDFS or HBase service or the Hadoop group
name, you must edit the following properties, using Services > Service.Name > Configs >
Advanced:
Table 28. HDFS Settings: Advanced
Property Name | Value
dfs.permissions.superusergroup | The same as the HDFS username. The default is "hdfs".
dfs.cluster.administrators | A single space followed by the HDFS username.
dfs.block.local-path-access.user | The HBase username. The default is "hbase".
Table 29. MapReduce Settings: Advanced
Property Name | Value
mapreduce.cluster.administrators | A single space followed by the Hadoop group name.
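The leading space in two of these values is easy to miss. As a minimal sketch, the following shows the values that Tables 28 and 29 call for, given a set of service names. The function name user_dependent_properties and the xyz_ account names are illustrative.

```python
def user_dependent_properties(hdfs_user="hdfs", hbase_user="hbase", hadoop_group="hadoop"):
    """Return the property values that must follow the service user and group names."""
    return {
        "dfs.permissions.superusergroup": hdfs_user,
        # A single space followed by the HDFS username.
        "dfs.cluster.administrators": " " + hdfs_user,
        "dfs.block.local-path-access.user": hbase_user,
        # A single space followed by the Hadoop group name.
        "mapreduce.cluster.administrators": " " + hadoop_group,
    }


# Hypothetical customized names.
props = user_dependent_properties("xyz_hdfs", "xyz_hbase", "xyz_hadoop")
for name, value in props.items():
    print("%s=%r" % (name, value))
```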
Configuring Storm for Supervision
If you have installed a cluster with HDP 2.2 Stack that includes the Storm service, you can configure
the Storm components to operate under supervision. This section describes those steps:
1
Stop all Storm components.
Using Ambari Web, browse to Services > Storm > Service Actions, choose Stop. Wait
until the Storm service stop completes.
2
Stop Ambari Server.
ambari-server stop
3
Change Supervisor and Nimbus command scripts in the Stack definition.
On Ambari Server host, run:
sed -i -r "s/scripts\/supervisor.py/scripts\/supervisor_prod.py/g" /var/lib/ambari-server/resources/common-services/STORM/0.9.1.2.1/metainfo.xml
sed -i -r "s/scripts\/nimbus.py/scripts\/nimbus_prod.py/g" /var/lib/ambari-server/resources/common-services/STORM/0.9.1.2.1/metainfo.xml
4
Install supervisord on all Nimbus and Supervisor hosts.
•
Install EPEL repository.
yum install epel-release -y
•
Install supervisor package for supervisord.
yum install supervisor -y
•
Enable supervisord on autostart.
chkconfig supervisord on
•
Change supervisord configuration file permissions.
chmod 600 /etc/supervisord.conf
5
Configure supervisord to supervise Nimbus Server and Supervisors by appending the
following to /etc/supervisord.conf on all Supervisor hosts and Nimbus hosts, as appropriate.
[program:storm-nimbus]
command=env PATH=$PATH:/bin:/usr/bin/:/usr/jdk64/jdk1.7.0_67/bin/ JAVA_HOME=/usr/jdk64/jdk1.7.0_67 /usr/hdp/current/storm-nimbus/bin/storm nimbus
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/nimbus.out
logfile_maxbytes=20MB
logfile_backups=10
[program:storm-supervisor]
command=env PATH=$PATH:/bin:/usr/bin/:/usr/jdk64/jdk1.7.0_67/bin/ JAVA_HOME=/usr/jdk64/jdk1.7.0_67 /usr/hdp/current/storm-supervisor/bin/storm supervisor
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/supervisor.out
logfile_maxbytes=20MB
logfile_backups=10
Change /usr/jdk64/jdk1.7.0_67 to match the location of the JDK being used
by Ambari in your environment.
6
Start Supervisord service on all Supervisor and Nimbus hosts.
service supervisord start
7
Start Ambari Server.
ambari-server start
8
Start all the other Storm components.
Using Ambari Web, browse to Services > Storm > Service Actions, choose Start.
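Because the JDK location in step 5 varies by environment, the supervisord sections can be generated from a template. The following is a minimal sketch that reproduces the settings shown in step 5. The function name storm_program_section and the JDK path /usr/java/default are assumptions for illustration.

```python
def storm_program_section(component, java_home):
    """Render the supervisord section from step 5 for 'nimbus' or 'supervisor'."""
    if component not in ("nimbus", "supervisor"):
        raise ValueError("component must be 'nimbus' or 'supervisor'")
    command = (
        "env PATH=$PATH:/bin:/usr/bin/:{jh}/bin/ JAVA_HOME={jh} "
        "/usr/hdp/current/storm-{c}/bin/storm {c}"
    ).format(jh=java_home, c=component)
    lines = [
        "[program:storm-{}]".format(component),
        "command=" + command,
        "user=storm",
        "autostart=true",
        "autorestart=true",
        "startsecs=10",
        "startretries=999",
        "log_stdout=true",
        "log_stderr=true",
        "logfile=/var/log/storm/{}.out".format(component),
        "logfile_maxbytes=20MB",
        "logfile_backups=10",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical JDK location; use the JDK that Ambari uses in your environment.
section = storm_program_section("nimbus", "/usr/java/default")
print(section)
```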
Using Custom Host Names
You can customize the agent registration host name and the public host name used for each host in
Ambari. Use this capability when "hostname" does not return the public network host name for your
machines.
How to Customize the name of a host
1
At the Install Options step in the Cluster Installer wizard, select Perform Manual
Registration for Ambari Agents.
2
Install the Ambari Agents manually on each host, as described in Install the Ambari Agents
Manually.
3
To echo the customized name of the host to which the Ambari agent registers, for every host,
create a script like the following example, named /var/lib/ambari-agent/hostname.sh.
Be sure to chmod the script so it is executable by the Agent.
#!/bin/sh
echo <ambari_hostname>
where <ambari_hostname> is the host name to use for Agent registration.
4
Open /etc/ambari-agent/conf/ambari-agent.ini on every host, using a text editor.
5
Add to the [agent] section the following line:
hostname_script=/var/lib/ambari-agent/hostname.sh
where /var/lib/ambari-agent/hostname.sh is the name of your custom echo script.
6
To generate a public host name for every host, create a script like the following example,
named /var/lib/ambari-agent/public_hostname.sh to show the name for that host in
the UI. Be sure to chmod the script so it is executable by the Agent.
#!/bin/sh
<hostname> -f
where <hostname> is the host name to use for Agent registration.
7
Open /etc/ambari-agent/conf/ambari-agent.ini on every host, using a text editor.
8
Add to the [agent] section the following line:
public_hostname_script=/var/lib/ambari-agent/public_hostname.sh
9
If applicable, add the host names to /etc/hosts on every host.
10 Restart the Agent on every host for these changes to take effect.
ambari-agent restart
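The echo script from step 3 can be created and made executable in one pass. The following is a minimal sketch. The function name write_hostname_script and the host name host1.example.com are illustrative, and the example writes to a temporary directory rather than to /var/lib/ambari-agent.

```python
import os
import stat
import tempfile


def write_hostname_script(path, hostname):
    """Write a shell script that echoes hostname, then mark it executable."""
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\necho {}\n".format(hostname))
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demonstration in a temporary directory; on a real host the target
# would be /var/lib/ambari-agent/hostname.sh.
demo_dir = tempfile.mkdtemp()
script_path = os.path.join(demo_dir, "hostname.sh")
write_hostname_script(script_path, "host1.example.com")
```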
Moving the Ambari Server
To transfer an Ambari Server that uses the default PostgreSQL database to a new host, use the
following instructions:
1
Back up all current data - from the original Ambari Server and MapReduce databases.
2
Update all Agents - to point to the new Ambari Server.
3
Install the New Server - on a new host and populate databases with information from original
Server.
If your Ambari database is one of the non-default types, such as Oracle, adjust the
database backup, restore, and stop/start procedures to match that database type.
Back up Current Data
1
Stop the original Ambari Server.
ambari-server stop
2
Create a directory to hold the database backups.
cd /tmp
mkdir dbdumps
cd dbdumps/
3
Create the database backups.
pg_dump -U <AMBARI.SERVER.USERNAME> ambari > ambari.sql
Password: <AMBARI.SERVER.PASSWORD>
pg_dump -U <MAPRED.USERNAME> ambarirca > ambarirca.sql
Password: <MAPRED.PASSWORD>
where <AMBARI.SERVER.USERNAME>, <MAPRED.USERNAME>,
<AMBARI.SERVER.PASSWORD>, and <MAPRED.PASSWORD> are the user names and
passwords that you set up during installation. Default values are: ambari-server/bigdata
and mapred/mapred.
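The two backups can be driven from a short script. The following is a minimal sketch that builds the pg_dump commands shown above and defines a helper to run them. The function names are illustrative, the user names are the defaults stated above, and pg_dump still prompts for the password unless your environment supplies it another way.

```python
import subprocess


def pg_dump_command(db_user, database):
    """Return the pg_dump argument list for one database."""
    return ["pg_dump", "-U", db_user, database]


def run_backup(command, out_path):
    """Run a pg_dump command and write its output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.check_call(command, stdout=out)


# Default user names from this section; replace them with your own.
backups = {
    "ambari.sql": pg_dump_command("ambari-server", "ambari"),
    "ambarirca.sql": pg_dump_command("mapred", "ambarirca"),
}

# To run the backups from /tmp/dbdumps on the original Ambari Server host:
# for out_path, command in backups.items():
#     run_backup(command, out_path)
```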
Update Agents
1
On each agent host, stop the agent.
ambari-agent stop
2
Remove old agent certificates.
rm /var/lib/ambari-agent/keys/*
3
Using a text editor, edit /etc/ambari-agent/conf/ambari-agent.ini to point to the
new host.
[server]
hostname=<NEW FULLY.QUALIFIED.DOMAIN.NAME>
url_port=8440
Install the New Server and Populate the Databases
1
Install the Server on the new host.
2
Stop the Server so that you can copy the old database data to the new Server.
ambari-server stop
3
Restart the PostgreSQL instance.
service postgresql restart
4
Open the PostgreSQL interactive terminal.
su - postgres
psql
5
Using the interactive terminal, drop the databases created by the fresh install.
drop database ambari;
drop database ambarirca;
6
Check to make sure the databases have been dropped.
\list
The databases should not be listed.
7
Create new databases to hold the transferred data.
create database ambari;
create database ambarirca;
8
Exit the interactive terminal.
^d
9
Copy the saved data from Back up Current Data to the new Server.
cd /tmp
scp -i <ssh-key> root@<original.Ambari.Server>:/tmp/dbdumps/*.sql /tmp
(Compress, transfer, and uncompress as needed from source to destination.)
psql -d ambari -f /tmp/ambari.sql
psql -d ambarirca -f /tmp/ambarirca.sql
10 Start the new Server.
<exit to root>
ambari-server start
11 On each Agent host, start the Agent.
ambari-agent start
12 Open Ambari Web. Point your browser to:
<new.Ambari.Server>:8080
13 Go to Services > MapReduce and use the Management Header to Stop and Start the
MapReduce service.
14 Start other services as necessary.
The new Server is ready to use.
Configuring LZO Compression
LZO is a lossless data compression library that favors speed over compression ratio. Ambari
neither installs nor enables LZO Compression by default.
To enable LZO compression in your HDP cluster, you must Configure core-site.xml for LZO.
Optionally, you can implement LZO to optimize Hive queries in your cluster for speed.
For more information about using LZO compression with Hive, see Running Compression with Hive
Queries.
Configure core-site.xml for LZO
1
Browse to Ambari Web > Services > HDFS > Configs, then expand Advanced core-site.
2
Find the io.compression.codecs property key.
3
Append to the io.compression.codecs property key, the following value:
com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
4
Add a description of the config modification, then choose Save.
5
Expand the Custom core-site.xml section.
6
Select Add Property.
7
Add to Custom core-site.xml the following property key and value
Property Key: io.compression.codec.lzo.class
Property Value: com.hadoop.compression.lzo.LzoCodec
8
Choose Save.
9
Add a description of the config modification, then choose Save.
10 Restart the HDFS, MapReduce2 and YARN services.
If performing a Restart or a Restart All does not start the required package install, you
may need to stop, then start the HDFS service to install the necessary LZO packages.
Restart is only available for a service in the "Running" or "Started" state.
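Step 3 appends to an existing comma-separated value, so the codecs already listed must be kept. The following is a minimal sketch of that append. The function name append_codecs is illustrative, and the existing value shown is only an example of what io.compression.codecs might contain.

```python
LZO_CODECS = [
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def append_codecs(current_value, new_codecs=LZO_CODECS):
    """Append codecs to a comma-separated list, skipping any already present."""
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    for codec in new_codecs:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


# Example existing value; check the actual value in Advanced core-site.
existing = "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec"
new_value = append_codecs(existing)
print(new_value)
```

Running the function again on its own output leaves the value unchanged, so the codecs are not listed twice.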
Running Compression with Hive Queries
Running Compression with Hive Queries requires creating LZO files. To create LZO files, use one of
the following procedures:
Create LZO Files
1
Create LZO files as the output of the Hive query.
2
Use lzop command utility or your custom Java to generate lzo.index for the .lzo files.
Hive Query Parameters
Prefix the query string with these parameters:
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
For example:
hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true;"
Write Custom Java to Create LZO Files
1
Create text files as the output of the Hive query.
2
Write custom Java code to
•
convert Hive query generated text files to .lzo files
•
generate lzo.index files for the .lzo files
Hive Query Parameters
Prefix the query string with these parameters:
SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false
For example:
hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"
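Either set of parameters can be prefixed to a query programmatically. The following is a minimal sketch that builds the hive -e argument list. The function name hive_command and the sample query are illustrative. The command is built but not executed.

```python
LZO_OUTPUT_SETTINGS = [
    ("mapreduce.output.fileoutputformat.compress.codec",
     "com.hadoop.compression.lzo.LzopCodec"),
    ("hive.exec.compress.output", "true"),
    ("mapreduce.output.fileoutputformat.compress", "true"),
]

TEXT_OUTPUT_SETTINGS = [
    ("hive.exec.compress.output", "false"),
    ("mapreduce.output.fileoutputformat.compress", "false"),
]


def hive_command(settings, query):
    """Return the argument list for hive -e with SET statements prefixed to query."""
    prefix = "".join("SET {}={};".format(key, value) for key, value in settings)
    return ["hive", "-e", prefix + query]


# Hypothetical query, for illustration only.
lzo_cmd = hive_command(LZO_OUTPUT_SETTINGS, "SELECT * FROM sample_table")
text_cmd = hive_command(TEXT_OUTPUT_SETTINGS, "SELECT * FROM sample_table")
print(lzo_cmd[2])
```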
Using Non-Default Databases
Use the following instructions to prepare a non-default database for Ambari, Hive/HCatalog, or
Oozie. You must complete these instructions before you set up the Ambari Server by running
ambari-server setup.
•
Using Non-Default Databases - Ambari
•
Using Non-Default Databases - Hive
•
Using Non-Default Databases - Oozie
Using Non-Default Databases - Ambari
The following sections describe how to use Ambari with an existing database, other than the
embedded PostgreSQL database instance that Ambari Server uses by default.
•
Using Ambari with Oracle
•
Using Ambari with MySQL
•
Using Ambari with PostgreSQL
•
Troubleshooting Non-Default Databases with Ambari
Using Ambari with Oracle
To set up Oracle for use with Ambari:
1
On the Ambari Server host, install the appropriate JDBC.jar file.
1
Download the Oracle JDBC (OJDBC) driver from
http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.
2
Select Oracle Database 11g Release 2 - ojdbc6.jar.
3
Copy the .jar file to the Java share directory.
cp ojdbc6.jar /usr/share/java
4
Make sure the .jar file has the appropriate permissions - 644.
2
Create a user for Ambari and grant that user appropriate permissions.
For example, using the Oracle database admin utility, run the following commands:
# sqlplus sys/root as sysdba
CREATE USER <AMBARIUSER> IDENTIFIED BY <AMBARIPASSWORD> default tablespace "USERS" temporary tablespace "TEMP";
GRANT unlimited tablespace to <AMBARIUSER>;
GRANT create session to <AMBARIUSER>;
GRANT create TABLE to <AMBARIUSER>;
GRANT create SEQUENCE to <AMBARIUSER>;
QUIT;
Where <AMBARIUSER> is the Ambari user name and <AMBARIPASSWORD> is the Ambari user
password.
3
Load the Ambari Server database schema.
1
You must pre-load the Ambari database schema into your Oracle database using the
schema script.
sqlplus <AMBARIUSER>/<AMBARIPASSWORD> < Ambari-DDL-Oracle-CREATE.sql
2
Find the Ambari-DDL-Oracle-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed
Ambari Server.
4
When setting up the Ambari Server, select Advanced Database Configuration >
Option [2] Oracle and respond to the prompts using the username/password credentials
you created in step 2.
Using Ambari with MySQL
To set up MySQL for use with Ambari:
1
On the Ambari Server host, install the connector.
1
Install the connector
RHEL/CentOS/Oracle Linux
yum install mysql-connector-java
SLES
zypper install mysql-connector-java
Ubuntu
apt-get install mysql-connector-java
2
Confirm that .jar is in the Java share directory.
ls /usr/share/java/mysql-connector-java.jar
3
2
Create a user for Ambari and grant it permissions.
•
For example, using the MySQL database admin utility:
# mysql -u root -p
CREATE USER '<AMBARIUSER>'@'%' IDENTIFIED BY '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'%';
CREATE USER '<AMBARIUSER>'@'localhost' IDENTIFIED BY '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'localhost';
CREATE USER '<AMBARIUSER>'@'<AMBARISERVERFQDN>' IDENTIFIED BY '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'<AMBARISERVERFQDN>';
FLUSH PRIVILEGES;
•
Where <AMBARIUSER> is the Ambari user name, <AMBARIPASSWORD> is the Ambari
user password and <AMBARISERVERFQDN> is the Fully Qualified Domain Name of the
Ambari Server host.
3
Load the Ambari Server database schema.
•
You must pre-load the Ambari database schema into your MySQL database using the
schema script.
mysql -u <AMBARIUSER> -p
CREATE DATABASE <AMBARIDATABASE>;
USE <AMBARIDATABASE>;
SOURCE Ambari-DDL-MySQL-CREATE.sql;
•
Where <AMBARIUSER> is the Ambari user name and <AMBARIDATABASE> is the
Ambari database name.
•
Find the Ambari-DDL-MySQL-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed
Ambari Server.
4
When setting up the Ambari Server, select Advanced Database Configuration >
Option [3] MySQL and enter the credentials you defined in Step 2 for user name,
password and database name.
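The user-creation statements in the MySQL procedure repeat the same CREATE USER and GRANT pair for three host patterns. As a minimal sketch, assuming a POSIX shell, the statements can be generated rather than typed by hand. The function name ambari_mysql_grants and the sample values are hypothetical and not part of Ambari; the sketch does not escape quote characters inside passwords.

```shell
# Hypothetical helper: print the CREATE USER / GRANT statements for one
# user across the three host patterns used in the procedure:
# '%', 'localhost', and the Ambari Server FQDN.
# Arguments: user password fqdn
ambari_mysql_grants() {
  user=$1
  password=$2
  fqdn=$3
  for host in '%' localhost "$fqdn"; do
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$password"
    printf "GRANT ALL PRIVILEGES ON *.* TO '%s'@'%s';\n" "$user" "$host"
  done
  printf 'FLUSH PRIVILEGES;\n'
}
```

The output can be reviewed first and then piped to the MySQL client, for example: ambari_mysql_grants ambari 'secret' ambari.example.com | mysql -u root -p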
Using Ambari with PostgreSQL
To set up PostgreSQL for use with Ambari:
1
Create a user for Ambari and grant it permissions.
•
Using the PostgreSQL database admin utility:
# sudo -u postgres psql
CREATE DATABASE <AMBARIDATABASE>;
CREATE USER <AMBARIUSER> WITH PASSWORD '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON DATABASE <AMBARIDATABASE> TO <AMBARIUSER>;
\connect <AMBARIDATABASE>;
CREATE SCHEMA <AMBARISCHEMA> AUTHORIZATION <AMBARIUSER>;
ALTER SCHEMA <AMBARISCHEMA> OWNER TO <AMBARIUSER>;
ALTER ROLE <AMBARIUSER> SET search_path to '<AMBARISCHEMA>', 'public';
•
Where <AMBARIUSER> is the Ambari user name, <AMBARIPASSWORD> is the Ambari
user password, <AMBARIDATABASE> is the Ambari database name and
<AMBARISCHEMA> is the Ambari schema name.
2
Load the Ambari Server database schema.
•
You must pre-load the Ambari database schema into your PostgreSQL database
using the schema script.
# psql -U <AMBARIUSER> -d <AMBARIDATABASE>
\connect <AMBARIDATABASE>;
\i Ambari-DDL-Postgres-CREATE.sql;
•
Find the Ambari-DDL-Postgres-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed
Ambari Server.
3
When setting up the Ambari Server, select Advanced Database Configuration >
Option [4] PostgreSQL and enter the credentials you defined in Step 1 for user name,
password, and database name.
Troubleshooting Ambari
Use these topics to help troubleshoot any issues you might have installing Ambari with an existing
Oracle database.
Problem: Ambari Server Fails to Start: No Driver
Check /var/log/ambari-server/ambari-server.log for the following error:
Exception Description: Configuration error. Class [oracle.jdbc.driver.OracleDriver] not found.
The Oracle JDBC .jar file cannot be found.
Solution: Make sure the file is in the appropriate directory on the Ambari server and re-run ambari-server
setup. Review the load database procedure appropriate for your database type in Using Non-Default
Databases - Ambari.
Problem: Ambari Server Fails to Start: No Connection
The Network Adapter could not establish the connection Error Code: 17002
Ambari Server cannot connect to the database.
Solution: Confirm that the database host is reachable from the Ambari Server and is correctly configured by
reading /etc/ambari-server/conf/ambari.properties.
server.jdbc.url=jdbc:oracle:thin:@oracle.database.hostname:1521/ambaridb
server.jdbc.rca.url=jdbc:oracle:thin:@oracle.database.hostname:1521/ambari
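To check reachability, it helps to pull the host and port out of the server.jdbc.url value first. The following is a minimal sketch, assuming a POSIX shell and a URL in the jdbc:oracle:thin:@<host>:<port>/<service> form shown above. The function name jdbc_host_port is hypothetical and not part of Ambari.

```shell
# Hypothetical helper: print "<host> <port>" for an Oracle thin JDBC URL
# of the form jdbc:oracle:thin:@<host>:<port>/<service>.
jdbc_host_port() {
  url=$1
  hostport=${url#*@}         # drop everything up to and including "@"
  hostport=${hostport%%/*}   # drop the "/<service>" suffix
  host=${hostport%%:*}       # text before the first ":"
  port=${hostport##*:}       # text after the last ":"
  printf '%s %s\n' "$host" "$port"
}
```

The resulting host and port can then be passed to whatever connectivity tool is available on the Ambari Server host to confirm that the database listener is reachable.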
Problem: Ambari Server Fails to Start: Bad Username
Internal Exception: java.sql.SQLException:ORA-01017: invalid
username/password; logon denied
You are using an invalid username/password.
Solution: Confirm the user account is set up in the database and has the correct privileges. See Step 3 above.
Problem: Ambari Server Fails to Start: No Schema
Internal Exception: java.sql.SQLSyntaxErrorException: ORA-00942: table or
view does not exist
The schema has not been loaded.
Solution: Confirm you have loaded the database schema. Review the load database schema procedure
appropriate for your database type in Using Non-Default Databases - Ambari.
Using Non-Default Databases - Hive
The following sections describe how to use Hive with an existing database, other than the MySQL
database instance that Ambari installs by default.
•
Using Hive with Oracle
•
Using Hive with MySQL
•
Using Hive with PostgreSQL
•
Troubleshooting Non-Default Databases with Hive
Using Hive with Oracle
To set up Oracle for use with Hive:
1
On the Ambari Server host, stage the appropriate JDBC driver file for later deployment.
1
2
Select Oracle Database 11g Release 2 - ojdbc6.jar and download the file.
3
4
Execute the following command, adding the path to the downloaded .jar file:
ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/downloaded/ojdbc6.jar
2
Create a user for Hive and grant it permissions.
•
Using the Oracle database admin utility:
CREATE USER <HIVEUSER> IDENTIFIED BY <HIVEPASSWORD>;
GRANT SELECT_CATALOG_ROLE TO <HIVEUSER>;
GRANT CONNECT, RESOURCE TO <HIVEUSER>;
QUIT;
•
Where <HIVEUSER> is the Hive user name and <HIVEPASSWORD> is the Hive user
password.
3
Load the Hive database schema.
•
For a HDP 2.2 Stack
Ambari sets up the Hive Metastore database schema automatically.
You do not need to pre-load the Hive Metastore database schema into your Oracle
database for a HDP 2.2 Stack.
•
For a HDP 2.1 Stack
You must pre-load the Hive database schema into your Oracle database using the
schema script, as follows:
sqlplus <HIVEUSER>/<HIVEPASSWORD> < hive-schema-0.13.0.oracle.sql
Find the hive-schema-0.13.0.oracle.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.1/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
•
For a HDP 2.0 Stack
Find the hive-schema-0.12.0.oracle.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
•
For a HDP 1.3 Stack
Find the hive-schema-0.10.0.oracle.sql file in the /var/lib/ambari-server/resources/stacks/HDP/1.3.2/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
Using Hive with MySQL
To set up MySQL for use with Hive:
1
On the Ambari Server host, stage the appropriate MySQL connector for later deployment.
1
Install the connector.
RHEL/CentOS/Oracle Linux
yum install mysql-connector-java*
SLES
zypper install mysql-connector-java*
Ubuntu
apt-get install mysql-connector-java*
2
Confirm that mysql-connector-java.jar is in the Java share directory.
3
4
Execute the following command:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
2
Create a user for Hive and grant it permissions.
•
Using the MySQL database admin utility:
# mysql -u root -p
CREATE USER '<HIVEUSER>'@'localhost' IDENTIFIED BY '<HIVEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'localhost';
CREATE USER '<HIVEUSER>'@'%' IDENTIFIED BY '<HIVEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'%';
CREATE USER '<HIVEUSER>'@'<HIVEMETASTOREFQDN>' IDENTIFIED BY '<HIVEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'<HIVEMETASTOREFQDN>';
FLUSH PRIVILEGES;
•
Where <HIVEUSER> is the Hive user name, <HIVEPASSWORD> is the Hive user
password and <HIVEMETASTOREFQDN> is the Fully Qualified Domain Name of the
Hive Metastore host.
3
Create the Hive database.
The Hive database must be created before loading the Hive database schema.
# mysql -u root -p
CREATE DATABASE <HIVEDATABASE>;
Where <HIVEDATABASE> is the Hive database name.
4
Load the Hive database schema.
•
For a HDP 2.2 Stack:
You do not need to pre-load the Hive Metastore database schema into your MySQL
database for a HDP 2.2 Stack.
•
For a HDP 2.1 Stack:
You must pre-load the Hive database schema into your MySQL database using the
schema script, as follows:
mysql -u root -p <HIVEDATABASE> < hive-schema-0.13.0.mysql.sql
Find the hive-schema-0.13.0.mysql.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.1/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
Using Hive with PostgreSQL
To set up PostgreSQL for use with Hive:
1
On the Ambari Server host, stage the appropriate PostgreSQL connector for later
deployment.
1
Install the connector.
RHEL/CentOS/Oracle Linux
yum install postgresql-jdbc*
SLES
zypper install -y postgresql-jdbc
2
Copy the connector .jar file to the Java share directory.
cp /usr/share/pgsql/postgresql-*.jdbc3.jar /usr/share/java/postgresql-jdbc.jar
3
Confirm that the .jar file is in the Java share directory.
ls /usr/share/java/postgresql-jdbc.jar
4
Change the access mode of the .jar file to 644.
chmod 644 /usr/share/java/postgresql-jdbc.jar
5
Execute the following command:
ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
2
Create a user for Hive and grant it permissions.
•
Using the PostgreSQL database admin utility:
echo "CREATE DATABASE <HIVEDATABASE>;" | psql -U postgres
echo "CREATE USER <HIVEUSER> WITH PASSWORD '<HIVEPASSWORD>';" | psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE <HIVEDATABASE> TO <HIVEUSER>;" | psql -U postgres
•
Where <HIVEUSER> is the Hive user name, <HIVEPASSWORD> is the Hive user
password and <HIVEDATABASE> is the Hive database name.
3
Load the Hive database schema.
•
For a HDP 2.2 Stack:
You do not need to pre-load the Hive Metastore database schema into your
PostgreSQL database for a HDP 2.2 Stack.
•
For a HDP 2.1 Stack:
You must pre-load the Hive database schema into your PostgreSQL database using
the schema script, as follows:
# psql -U <HIVEUSER> -d <HIVEDATABASE>
\connect <HIVEDATABASE>;
\i hive-schema-0.13.0.postgres.sql;
Find the hive-schema-0.13.0.postgres.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.1/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
•
For a HDP 2.0 Stack:
Find the hive-schema-0.12.0.postgres.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
•
For a HDP 1.3 Stack:
Find the hive-schema-0.10.0.postgres.sql file in the /var/lib/ambari-server/resources/stacks/HDP/1.3.2/services/HIVE/etc/ directory of the
Ambari Server host after you have installed Ambari Server.
Troubleshooting Hive
Use these entries to help you troubleshoot any issues you might have installing Hive with non-default
databases.
Problem: Hive Metastore Install Fails Using Oracle
Check the install log:
cp /usr/share/java/${jdbc_jar_name} ${target}] has failures: true
The Oracle JDBC .jar file cannot be found.
Solution: Make sure the file is in the appropriate directory on the Hive Metastore server and click Retry.
Problem: Install Warning when "Hive Check Execute" Fails Using Oracle
java.sql.SQLSyntaxErrorException: ORA-01754:
a table may contain only one column of type LONG
The Hive Metastore schema was not properly loaded into the database.
Solution: Ignore the warning, and complete the install. Check your database to confirm the Hive Metastore
schema is loaded. In the Ambari Web GUI, browse to Services > Hive. Choose Service Actions >
Service Check to check that the schema is correctly in place.
Problem: Hive Check Execute may fail after completing an Ambari upgrade to version
1.4.2
For secure and non-secure clusters, with Hive security authorization enabled, the Hive service check
may fail. Hive security authorization may not be configured properly.
Solution: Two workarounds are possible. Using Ambari Web, in Hive > Configs > Advanced:
•
Disable hive.security.authorization, by setting the
hive.security.authorization.enabled value to false.
or
•
Properly configure Hive security authorization. For example, set the following properties:
Table 30. Hive Security Authorization Settings
Property | Value
hive.security.authorization.manager | org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.metastore.authorization.manager | org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.authenticator.manager | org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator
For more information about configuring Hive security, see Metastore Server Security in Hive
Authorization and the HCatalog document Storage Based Authorization.
Using Non-Default Databases - Oozie
The following sections describe how to use Oozie with an existing database, other than the Derby
database instance that Ambari installs by default.
•
Using Oozie with Oracle
•
Using Oozie with MySQL
•
Using Oozie with PostgreSQL
•
Troubleshooting Non-Default Databases with Oozie
Using Oozie with Oracle
To set up Oracle for use with Oozie:
1
On the Ambari Server host, stage the appropriate JDBC driver file for later deployment.
1
2
Select Oracle Database 11g Release 2 - ojdbc6.jar.
3
4
Execute the following command, adding the path to the downloaded .jar file:
ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/downloaded/ojdbc6.jar
2
Create a user for Oozie and grant it permissions.
Using the Oracle database admin utility, run the following commands:
CREATE USER <OOZIEUSER> IDENTIFIED BY <OOZIEPASSWORD>;
GRANT ALL PRIVILEGES TO <OOZIEUSER>;
GRANT CONNECT, RESOURCE TO <OOZIEUSER>;
QUIT;
Where <OOZIEUSER> is the Oozie user name and <OOZIEPASSWORD> is the Oozie user
password.
Using Oozie with MySQL
To set up MySQL for use with Oozie:
1
On the Ambari Server host, stage the appropriate MySQL connector for later deployment.
1
Install the connector.
RHEL/CentOS/Oracle Linux
yum install mysql-connector-java*
SLES
zypper install mysql-connector-java*
UBUNTU
apt-get install mysql-connector-java*
2
Confirm that mysql-connector-java.jar is in the Java share directory.
3
4
Execute the following command:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
2
Create a user for Oozie and grant it permissions.
•
Using the MySQL database admin utility:
# mysql -u root -p
CREATE USER '<OOZIEUSER>'@'%' IDENTIFIED BY '<OOZIEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<OOZIEUSER>'@'%';
FLUSH PRIVILEGES;
•
Where <OOZIEUSER> is the Oozie user name and <OOZIEPASSWORD> is the Oozie
user password.
3
Create the Oozie database.
•
The Oozie database must be created in advance.
# mysql -u root -p
CREATE DATABASE <OOZIEDATABASE>;
•
Where <OOZIEDATABASE> is the Oozie database name.
Using Oozie with PostgreSQL
To set up PostgreSQL for use with Oozie:
1
On the Ambari Server host, stage the appropriate PostgreSQL connector for later
deployment.
1
Install the connector.
RHEL/CentOS/Oracle Linux
yum install postgresql-jdbc
SLES
zypper install -y postgresql-jdbc
UBUNTU
apt-get install -y postgresql-jdbc
2
Copy the connector .jar file to the Java share directory.
cp /usr/share/pgsql/postgresql-*.jdbc3.jar /usr/share/java/postgresql-jdbc.jar
3
Confirm that the .jar file is in the Java share directory.
ls /usr/share/java/postgresql-jdbc.jar
4
Change the access mode of the .jar file to 644.
chmod 644 /usr/share/java/postgresql-jdbc.jar
5
Execute the following command:
ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
2
Create a user for Oozie and grant it permissions.
•
Using the PostgreSQL database admin utility:
echo "CREATE DATABASE <OOZIEDATABASE>;" | psql -U postgres
echo "CREATE USER <OOZIEUSER> WITH PASSWORD '<OOZIEPASSWORD>';" | psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE <OOZIEDATABASE> TO <OOZIEUSER>;" | psql -U postgres
•
Where <OOZIEUSER> is the Oozie user name, <OOZIEPASSWORD> is the Oozie user
password and <OOZIEDATABASE> is the Oozie database name.
Troubleshooting Oozie
Use these entries to help you troubleshoot any issues you might have installing Oozie with non-default databases.
Problem: Oozie Server Install Fails Using MySQL
cp /usr/share/java/mysql-connector-java.jar
usr/lib/oozie/libext/mysql-connector-java.jar
has failures: true
The MySQL JDBC .jar file cannot be found.
Solution: Make sure the file is in the appropriate directory on the Oozie server and click Retry.
Problem: Oozie Server Install Fails Using Oracle or MySQL
Exec[exec cd /var/tmp/oozie &&
/usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run ]
has failures: true
Oozie was unable to connect to the database or was unable to successfully set up the schema for
Oozie.
Solution: Check the database connection settings provided during the Customize Services step in the
install wizard by browsing back to Customize Services > Oozie. After confirming and adjusting
your database settings, proceed forward with the install wizard.
If the Install Oozie Server wizard continues to fail, get more information by connecting directly to the
Oozie server and executing the following command as <OOZIEUSER>:
su oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run
Setting up an Internet Proxy Server for Ambari
If you plan to use the public repositories for installing the Stack, Ambari Server must have Internet
access to confirm access to the repositories and validate the repositories. If your machine requires
use of a proxy server for Internet access, you must configure Ambari Server to use the proxy server.
How To Set Up an Internet Proxy Server for Ambari
1
On the Ambari Server host, add proxy settings to the following script: /var/lib/ambari-server/ambari-env.sh.
-Dhttp.proxyHost=<yourProxyHost> -Dhttp.proxyPort=<yourProxyPort>
2
Optionally, to prevent some host names from accessing the proxy server, define the list of
excluded hosts, as follows:
-Dhttp.nonProxyHosts=<pipe|separated|list|of|hosts>
3
If your proxy server requires authentication, add the user name and password, as follows:
-Dhttp.proxyUser=<username> -Dhttp.proxyPassword=<password>
4
Restart the Ambari Server to pick up this change.
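The options from steps 1 through 3 end up as one space-separated string of JVM flags. As a minimal sketch, assuming a POSIX shell, the string can be assembled as follows. The function name proxy_opts and the sample host names are hypothetical; which line of ambari-env.sh should receive the flags depends on your installation (an existing JVM arguments line is the usual place), so check the file before editing it.

```shell
# Hypothetical helper: assemble the JVM proxy flags described in the steps.
# Arguments: host port [nonProxyHosts] [user] [password]
proxy_opts() {
  opts="-Dhttp.proxyHost=$1 -Dhttp.proxyPort=$2"
  if [ -n "${3:-}" ]; then
    # Pipe-separated list of hosts that should bypass the proxy
    opts="$opts -Dhttp.nonProxyHosts=$3"
  fi
  if [ -n "${4:-}" ]; then
    # Credentials, only when the proxy requires authentication
    opts="$opts -Dhttp.proxyUser=$4 -Dhttp.proxyPassword=${5:-}"
  fi
  printf '%s\n' "$opts"
}
```

For example, proxy_opts proxy.example.com 3128 'localhost|*.example.com' prints the three flags for an unauthenticated proxy with an exclusion list. Quote the exclusion list so that the shell does not interpret the pipe characters.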
If you plan to use local repositories, see Using a Local Repository. Configuring Ambari
to use a proxy server and have Internet access is not required. The Ambari Server must
have access to your local repositories.
Configuring Network Port Numbers
This chapter lists port number assignments required to maintain communication between Ambari
Server, Ambari Agents, and Ambari Web.
•
Default Network Port Numbers - Ambari
•
Optional: Changing the Default Ambari Server Port
For more information about configuring port numbers for Stack components, see Configuring
Ports in the HDP Stack documentation.
Default Network Port Numbers - Ambari
The following table lists the default ports used by Ambari Server and Ambari Agent services.
Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters
Ambari Server | Ambari Server host | 8080 (see note 1) | http (see note 2) | Interface to Ambari Web and Ambari REST API | No |
Ambari Server | Ambari Server host | 8440 | https | Handshake Port for Ambari Agents to Ambari Server | No |
Ambari Server | Ambari Server host | 8441 | https | Registration and Heartbeat Port for Ambari Agents to Ambari Server | No |
Ambari Agent | All hosts running Ambari Agents | 8670 (see note 3) | tcp | Ping port used for alerts to check the health of the Ambari Agent | No |
Note 1: See Optional: Change the Ambari Server Port for instructions on changing the default port.
Note 2: See Optional: Set Up HTTPS for Ambari Server for instructions on enabling HTTPS.
Note 3: You can change the Ambari Agent ping port in the Ambari Agent configuration.
Optional: Changing the Default Ambari Server Port
By default, Ambari Server uses port 8080 to access the Ambari Web UI and the REST API. To change
the port number, you must edit the Ambari properties file.
Ambari Server should not be running when you change port numbers. Edit ambari.properties
before you start Ambari Server the first time or stop Ambari Server before editing properties.
1
On the Ambari Server host, open /etc/ambari-server/conf/ambari.properties with a
text editor.
2
Add the client API port property and set it to your desired port value:
client.api.port=<port_number>
3
Start or restart the Ambari Server. Ambari Web is now available on the newly
configured port:
http://<your.ambari.server>:<port_number>
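Step 2 can also be scripted. The following is a minimal sketch, assuming a POSIX shell with grep and sed; the function name set_prop is hypothetical and not an Ambari command. It replaces an existing key=value line or appends one, and leaves a .bak copy when it edits in place. The key is used as a regular expression, so the dots in it match any character, which is acceptable for a sketch but not strict.

```shell
# Hypothetical helper: set KEY=VALUE in a Java-style properties file.
# Replaces the line if the key already exists, otherwise appends it.
# Arguments: file key value
set_prop() {
  file=$1
  key=$2
  value=$3
  if grep -q "^${key}=" "$file"; then
    # Key present: rewrite the line in place, keeping a .bak backup
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Key absent: append a new line
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

With Ambari Server stopped, a call such as set_prop /etc/ambari-server/conf/ambari.properties client.api.port 8081 would apply the change; the port value 8081 is only an example.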
Changing the JDK Version on an Existing Cluster
During your initial Ambari Server Setup, you selected the JDK to use or provided a path to a custom
JDK already installed on your hosts. After setting up your cluster, you may change the JDK version
using the following procedure.
How to change the JDK Version for an Existing Cluster
1
Re-run Ambari Server Setup.
ambari-server setup
2
At the prompt to change the JDK, enter y.
Do you want to change Oracle JDK [y/n] (n)? y
3
At the prompt to choose a JDK, enter 1 to change the JDK to v1.7.
[1] Oracle JDK 1.7
[2] Oracle JDK 1.6
[3] Custom JDK
Enter choice: 3
If you choose Oracle JDK 1.7 or Oracle JDK 1.6, the JDK you choose downloads and installs
automatically.
4
If you choose Custom JDK, verify or add the custom JDK path on all hosts in the cluster.
5
After setup completes, you must restart each component for the new JDK to be used by the
Hadoop services.
Using the Ambari Web UI, do the following tasks:
•
Restart each component
•
Restart each host
•
Restart all services
For more information about managing services in your cluster, see Managing Services.
Using Ambari Blueprints
Overview: Ambari Blueprints
Ambari Blueprints provide an API to perform cluster installations. You can build a reusable
“blueprint” that defines which Stack to use, how Service Components should be laid out across a
cluster and what configurations to set.
After setting up a blueprint, you can call the API to instantiate the cluster by providing the list of
hosts to use. The Ambari Blueprint framework promotes reusability and facilitates automating cluster
installations without UI interaction.
Learn more about Ambari Blueprints API on the Ambari Wiki.
Configuring HDP Stack Repositories for Red Hat Satellite
As part of installing HDP Stack with Ambari, HDP.repo and HDP-UTILS.repo files are generated
and distributed to the cluster hosts based on the Base URL user input from the Cluster Install Wizard
during the Select Stack step. In cases where you are using Red Hat Satellite to manage your Linux
infrastructure, you can disable the repositories defined in the HDP Stack .repo files and instead
leverage Red Hat Satellite.
How To Configure HDP Stack Repositories for Red Hat Satellite
To disable the repositories defined in the HDP Stack .repo files:
1
Before starting the Ambari Server and installing a cluster, on the Ambari Server browse to the
Stacks definition directory.
cd /var/lib/ambari-server/resources/stacks/
2
Browse to the install hook directory:
For HDP 2.0 or HDP 2.1 Stack
cd HDP/2.0.6/hooks/before-INSTALL/templates
For HDP 1.3 Stack
cd HDP/1.3.2/hooks/before-INSTALL/templates
3
Modify the .repo template file.
vi repo_suse_rhel.j2
4
Set the enabled property to 0 to disable the repository.
enabled=0
5
Save and exit.
6
Start the Ambari Server and proceed with your install.
The .repo files will still be generated and distributed during cluster install but the repositories defined
in the .repo files will not be enabled.
You must configure Red Hat Satellite to define and enable the Stack repositories.
Please refer to the Red Hat Satellite documentation for more information.
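Steps 3 through 5 can be done non-interactively instead of in vi. The following is a minimal sketch, assuming a POSIX shell with sed and assuming the template contains a line that starts with enabled= (verify this in your copy of repo_suse_rhel.j2). The function name disable_repo_template is hypothetical.

```shell
# Hypothetical helper: set every "enabled=" line in a .repo template to 0.
# A .bak copy of the original file is left next to it.
# Arguments: path to the template file
disable_repo_template() {
  sed -i.bak 's/^enabled=.*/enabled=0/' "$1"
}
```

Run it from the templates directory reached in step 2, for example: disable_repo_template repo_suse_rhel.j2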
Tuning Ambari Performance
For clusters larger than 200 nodes, calculate and set a larger task cache size on the Ambari server.
How To Tune Ambari Performance
For clusters larger than 200 nodes:
1
Calculate the new, larger cache size, using the following relationship:
ecCacheSizeValue=60*<cluster_size>
where <cluster_size> is the number of nodes in the cluster.
2
On the Ambari Server host, in /etc/ambari-server/conf/ambari.properties, add the
following property and value:
server.ecCacheSize=<ecCacheSizeValue>
where <ecCacheSizeValue> is the value calculated previously, based on the number of
nodes in the cluster.
3
Restart Ambari Server.
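The relationship in step 1 is a single multiplication. As a worked example, a 500-node cluster gives 60*500 = 30000. The following is a minimal sketch of the same arithmetic, assuming a POSIX shell; the function names ec_cache_size and ec_cache_property are hypothetical.

```shell
# Hypothetical helper: cache size for a cluster of N nodes (60 * N).
ec_cache_size() {
  echo $(( 60 * $1 ))
}

# Hypothetical helper: print the property line to add to ambari.properties.
ec_cache_property() {
  printf 'server.ecCacheSize=%s\n' "$(ec_cache_size "$1")"
}
```

Calling ec_cache_property with the node count prints the line to add in step 2.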
Using Ambari Views
Ambari includes the Ambari Views Framework, which allows developers to create UI components
that “plug into” the Ambari Web interface. Ambari includes a built-in set of Views that are pre-deployed. This section describes the views that are included with Ambari and their configuration.
View | Description | HDP Stacks | Required Services
Tez | View information related to Tez jobs that are executing on the cluster. | HDP 2.2 or later | HDFS, YARN, Tez, Hive, Pig
Slider | A tool to help deploy and manage Slider-based applications. | HDP 2.1 or later | HDFS, YARN
Jobs | A visualization tool for Hive queries that execute on the Tez engine. | HDP 2.1 or later | HDFS, YARN, Tez, Hive
Topics
•
Using Tez View
•
Using Slider View
•
Using Jobs View
Learning More About Views
You can learn more about the Views Framework at the following resources:
Resource | URL
Administering Views | Ambari Administration Guide - Managing Views
Ambari Project Wiki | https://cwiki.apache.org/confluence/display/AMBARI/Views
Example Views | https://github.com/apache/ambari/tree/trunk/ambari-views/examples
View Contributions | https://github.com/apache/ambari/tree/trunk/contrib/views
Tez View
Tez is a general, next-generation execution engine like MapReduce that can efficiently execute jobs
from multiple applications such as Apache Hive and Apache Pig. When you run a job such as a Hive
query or Pig script using Tez, you can use the Tez View to track and debug the execution of that job.
Topics in this section describe how to configure, deploy and use the Tez View to execute jobs in your
cluster.
Configuring Tez in Your Cluster
In your cluster, confirm the following configurations are set:
Configuration | Property | Comments
yarn-site.xml | yarn.resourcemanager.system-metrics-publisher.enabled | Enable generic history service in timeline server. Verify that this property is set to true.
yarn-site.xml | yarn.timeline-service.enabled | Enable the timeline server for logging details.
yarn-site.xml | yarn.timeline-service.webapp.address | Value must be the IP:PORT on which the timeline server is running.
Deploying the Tez View
To deploy the Tez View, you must first configure Ambari for Tez, and then configure Tez to make use
of the Tez View in Ambari.
1
Configure Ambari for Tez.
a
From the Ambari Administration interface, browse to the Views section.
b
Click to expand the Tez view and click Create Instance.
c
Enter the instance name, the display name and description.
d
Enter the configuration properties for your cluster.
Property | Description | Example
YARN Timeline Server URL (required) | The URL to the YARN Application Timeline Server, used to provide Tez information. Typically this is the yarn.timeline-service.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. | http://yarn.timelineservice.hostname:8188
YARN ResourceManager URL (required) | The URL to the YARN ResourceManager, used to provide YARN Application data. Typically this is the yarn.resourcemanager.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. |
As the example shows, URLs that you provide as properties in the Views
configuration for both YARN Timeline Server and YARN ResourceManager must
include "http://".
e
Save the View.
2
Configure Tez to make use of the Tez View in Ambari:
a
From Ambari > Admin, open the Tez View, then choose "Go To Instance".
b
Copy the URL for the Tez View from your web browser's address bar.
c
Select Services > Tez > Configs.
d
In custom tez-site, add the following property:
Key: tez.tez-ui.history-url.base
Value: <Tez View URL>
where <Tez View URL> is the URL you copied from the browser session for the
open Tez View.
e
Restart Tez.
f
Restart Hive.
For more information about managing Ambari Views, see Managing Views in the Ambari
Administration Guide.
If your cluster is configured for Kerberos, you must set up Ambari Server for Kerberos
for the Tez view to access the ATS component. For more information, see Set Up
Kerberos for Ambari Server.
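Because the Tez View rejects URLs without a scheme, a quick check before pasting values into the view configuration can save a round trip. The following is a minimal sketch, assuming a POSIX shell; the function name has_http_scheme is hypothetical, and it only checks for the "http://" prefix that the procedure asks for.

```shell
# Hypothetical helper: succeed only if the argument starts with "http://".
has_http_scheme() {
  case $1 in
    http://*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, has_http_scheme "$url" || echo "add http:// to $url" prints a reminder when the scheme is missing.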
Hive SQL on Tez - DAG, Vertex and Task
In Hive, the user query written in SQL is compiled and converted for execution into a Tez execution
graph, or more precisely a Directed Acyclic Graph (DAG). A DAG is a collection of Vertices where
each Vertex executes a part, or fragment, of the user query. The directed connections between
Vertices determine the order in which they are executed. For example, the Vertex to read a table has
to be run before a filter can be applied to the rows of that table.
Let’s say that a Vertex reads a user table. This table can be very large and distributed across multiple
machines and multiple racks. So, this table read is achieved by running many tasks in parallel. Here
is a simplified example using a sample query that shows the execution of a SQL query in Hive.
Executing a SQL query in Hive
The Tez View tool lets you more easily understand and debug any submitted Tez job. Examples of
Tez jobs include a Hive query or Pig script executed using the Tez execution engine. Specifically,
the Tez View helps you do the following tasks:
•
Identify the Tez DAG for your Job
•
Better Understand How your Query is Being Executed
•
Identify the Cause of a Failed Job
•
Identify the Cause of a Slow-Performing Job
•
Monitor Task Progress for a Job
Identify the Tez DAG for your job
The Tez View displays a list of jobs sorted by time, latest first. You can search for a job using the
following fields:
•
DagID
•
User
•
Start Time
•
Job Status
Table 31. Tez Job Status Descriptions
Status | Description
Submitted | The DAG has been submitted to Tez but has not started running yet.
Running | The DAG is currently running.
Succeeded | The DAG completed successfully.
Failed | The DAG failed to complete successfully.
Killed | The DAG was stopped manually.
Error | An internal error occurred when executing the DAG.
The Tez View is the primary entry point for finding a Tez job. At this point, no other UI links to the Tez
View. To select columns shown in the Tez View, choose the wheel icon, select field names, then
choose OK.
Better Understand How your Job is being Executed
This is the primary use case, and it was not available earlier: users were not able to get insight into how
their tasks were running. The Tez View lets you identify the complexity and progress of a running job.
The View Tab shows the following:
•
The DAG graphical view
•
All Vertices
•
Tasks per Vertex on top right of Vertex
•
Failed Vertex displays red to provide visual contrast with successful vertices that display
green
•
Details of timelines are available on mouse-over on a Vertex
The View Tab provides a launching place to further investigate the Vertices that have failures or are
taking a long time.
Identify the Cause of a Failed Job
Previously, a Tez task that failed gave an error code such as 1. Someone familiar with Tez error logs
had to log in and review the logs to find why a particular task failed. The Tez View exposes errors in a
way that you can more easily find and report.
When a Tez task fails, you must be able to:
•
Identify the reason for task failure
•
Capture the reason for task failure
When a Tez task fails, the Tez Detail Tab shows the failure as follows:
See Details of Failing Tasks
Multiple task failures may occur. The Tez Tasks Tab lets you see all tasks that failed and examine
the reason and logs for each failure. Logs for genuine failures (not for killed tasks) are available to
download from the Tez Tasks Tab.
Identify the Cause of a Slow-Performing Job
The Tez View shows counters at the Vertex and Task levels that let you understand why a certain
task is performing more slowly than expected.
Counters at Vertex and Task Level
Counters are available at the DAG, Vertex, and Task levels. Counters help you understand the task
size better and find any anomalies. Elapsed time is one of the primary counters to look at.
DAG-level Counters
Vertex-level Counters
Task-level Counters
Monitor Task Progress for a Job
The Tez View shows task progress as an increasing count of completed tasks against total tasks. This
allows you to identify hung tasks and get insight into long-running tasks.
Using the Jobs View
The Jobs view provides a visualization for Hive queries that have executed on the Tez engine.
Deploying the Jobs View
Refer to the Ambari Administration guide for general information about Managing Views.
1
2
Click to expand the Jobs view and click Create Instance.
3
4
Property | Description | Example
yarn.ats.url (required) | The URL to the YARN Application Timeline Server, used to provide Tez information. Typically this is the yarn.timeline-service.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. |
yarn.resourcemanager.url (required) | The URL to the YARN ResourceManager, used to provide YARN Application data. Typically this is the yarn.resourcemanager.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. |
5
Save the view.
If your cluster is configured for Kerberos, you must set up the Ambari Server for
Kerberos to provide the Jobs view access to the ATS component. For more
information, see Set Up Kerberos for Ambari Server in the Ambari Security Guide.
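The two Jobs view properties can be derived from yarn-site.xml, as in the sketch below. It is a minimal example under stated assumptions: the host names in the sample are hypothetical, the http scheme is assumed (a cluster with HTTPS enabled would differ), and the function name jobs_view_urls is invented for illustration.

```python
import xml.etree.ElementTree as ET

def jobs_view_urls(yarn_site_xml, scheme="http"):
    """Derive the two Jobs view properties from yarn-site.xml content.

    The scheme is an assumption; the addresses in yarn-site.xml are
    host:port pairs without a scheme.
    """
    root = ET.fromstring(yarn_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    ats = props["yarn.timeline-service.webapp.address"]
    rm = props["yarn.resourcemanager.webapp.address"]
    return {
        "yarn.ats.url": f"{scheme}://{ats}",
        "yarn.resourcemanager.url": f"{scheme}://{rm}",
    }

# Hypothetical yarn-site.xml excerpt; host names are placeholders.
SAMPLE = """<configuration>
  <property>
    <name>yarn.timeline-service.webapp.address</name>
    <value>ats.example.com:8188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.com:8088</value>
  </property>
</configuration>"""

for key, url in jobs_view_urls(SAMPLE).items():
    print(key, "=", url)
```

Whatever values you enter, confirm that both URLs are reachable from the Ambari Server host, since that is where the view runs.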
Using the Slider View
Slider is a framework for deploying and managing long-running applications on YARN. When
applications are packaged using Slider for YARN, the Slider View can be used to help deploy and
manage those applications from Ambari.
Deploying the Slider View
Refer to the Ambari Administration guide for general information about Managing Views.
1
2
Click to expand the Slider view and click Create Instance.
3
4
Property: Ambari Server URL (required)
Description: The Ambari REST URL to the cluster resource.
Example: http://ambari.server:8080/api/v1/clusters/MyCluster
Property: Ambari Server Username (required)
Description: The username to connect to Ambari. Must be an Ambari Admin user.
Example: admin
Property: Ambari Server Password (required)
Description: The password for the Ambari user.
Example: password
Property: Slider User
Description: The user to deploy Slider applications as. By default, the applications will be deployed as the “yarn” service account user. To use the current logged-in Ambari user, enter ${username}.
Example: joe.user or ${username}
Property: Kerberos Principal
Description: The Kerberos principal for Ambari views. This principal identifies the process in which the view runs. Only required if your cluster is configured for Kerberos. Be sure to configure the view principal as a proxy user in core-site.
Example: [email protected]
Property: Kerberos Keytab
Description: The Kerberos keytab for Ambari views. Only required if your cluster is configured for Kerberos.
Example: /path/to/keytab/view-principal.headless.keytab
5
Save the view.
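Before saving a Slider view instance, it can help to check the values you plan to enter. The sketch below builds the cluster resource URL in the form shown in the example and lists any required properties left empty. The helper names, the default port 8080, and the http scheme are assumptions drawn from the example value, not a documented Ambari API.

```python
from urllib.parse import quote

# Property names marked (required) for the Slider view.
REQUIRED = (
    "Ambari Server URL",
    "Ambari Server Username",
    "Ambari Server Password",
)

def cluster_resource_url(server, cluster, port=8080, scheme="http"):
    """Build the Ambari REST URL to a cluster resource.

    Port and scheme defaults mirror the example value and are assumptions.
    """
    return f"{scheme}://{server}:{port}/api/v1/clusters/{quote(cluster, safe='')}"

def missing_required(settings):
    """Return the required property names that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]

# Hypothetical values matching the examples for each property.
settings = {
    "Ambari Server URL": cluster_resource_url("ambari.server", "MyCluster"),
    "Ambari Server Username": "admin",
    "Ambari Server Password": "",
    "Slider User": "${username}",
}
print(settings["Ambari Server URL"])
print(missing_required(settings))
```

The Kerberos properties are left out of the required check because they apply only when the cluster is configured for Kerberos.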