VMware Greenplum
Backup and Restore v1.25
Documentation
VMware Greenplum Backup and Restore 1.25
You can find the most up-to-date technical documentation on the VMware website at:
https://docs.vmware.com/
VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com
Copyright © 2023 VMware, Inc. All rights reserved. Copyright and trademark information.
Contents
Tanzu Greenplum® Backup and Restore
Getting More Information
Release Notes
Version 1.25.0
Platform Support
Software Component Versions
gpbackup v1.25.0
gpbackup_helper v1.25.0
gprestore v1.25.0
gpbackup_manager v1.5.1
Backup/Restore Plugin API
gpbackup_ddboost_plugin v1.7.0
gpbackup_s3_plugin v1.8.1
Data Domain Boost
Known Issues
Differences Compared to Open Source Greenplum Backup and Restore
Release Numbering Conventions
Backup and Restore Overview
Parallel Backup with gpbackup and gprestore
Non-Parallel Backup with pg_dump
Installation Guide
Installing the gppkg Distribution
Installing the tarball Distribution
Installing pgcrypto in Greenplum Database
Backing Up and Restoring Databases
Parallel Backup with gpbackup and gprestore
Requirements and Limitations
Objects Included in a Backup or Restore
Performing Basic Backup and Restore Operations
Restoring from Backup
Report Files
History File
Return Codes
Filtering the Contents of a Backup or Restore
Filtering by Leaf Partition
Filtering with gprestore
Configuring Email Notifications
gpbackup and gprestore Email File Format
Email YAML File Sections
Examples
Understanding Backup Files
Segment Data Files
Creating and Using Incremental Backups with gpbackup and gprestore
About Incremental Backup Sets
Using Incremental Backups
Example Using Incremental Backup Sets
Creating an Incremental Backup with gpbackup
Restoring from an Incremental Backup with gprestore
Incremental Backup Notes
Using gpbackup and gprestore with BoostFS
Installing BoostFS
Backing Up and Restoring with BoostFS
Using gpbackup Storage Plugins
Using the S3 Storage Plugin with gpbackup and gprestore
Prerequisites
Installing the S3 Storage Plugin
Using the S3 Storage Plugin
S3 Storage Plugin Configuration File Format
Using the DD Boost Storage Plugin with gpbackup, gprestore, and gpbackup_manager
DD Boost Storage Plugin Configuration File Format
Examples
Best Practices
Filtered Restore with the DD Boost Storage Plugin
Notes
Backup/Restore Storage Plugin API
Plugin Configuration File
Plugin API
Plugin Commands
Implementing a Backup/Restore Storage Plugin
Verifying a Backup/Restore Storage Plugin
Procedure
Packaging and Deploying a Backup/Restore Storage Plugin
backup_data
Synopsis
Description
Arguments
Exit Code
backup_file
Synopsis
Description
Arguments
Exit Code
cleanup_plugin_for_backup
Synopsis
Description
Arguments
Exit Code
cleanup_plugin_for_restore
Synopsis
Description
Arguments
Exit Code
delete_backup
Synopsis
Description
Arguments
Exit Code
Example
plugin_api_version
Synopsis
Description
Return Value
restore_data
Synopsis
Description
Arguments
Exit Code
See Also
restore_data_subset
Synopsis
Description
Arguments
Exit Code
See Also
restore_file
Synopsis
Description
Arguments
Exit Code
setup_plugin_for_backup
Synopsis
Description
Arguments
Exit Code
setup_plugin_for_restore
Synopsis
Description
Arguments
Exit Code
Backup Utility Reference
gpbackup
Synopsis
Description
Options
Return Codes
Schema and Table Names
Examples
See Also
gprestore
Synopsis
Description
Options
Return Codes
Examples
See Also
gpbackup_manager
Synopsis
Commands
Options
Description
Examples
See Also
Tanzu Greenplum® Backup and Restore
VMware provides a separate downloadable package with the following components:
gpbackup utility
gprestore utility
gpbackup_manager utility
Backup plugins for DD Boost and S3
Each Tanzu Greenplum Backup and Restore package is versioned independently of Greenplum
Database, but is backwards-compatible with earlier versions of Greenplum Database. See the
Release Notes for additional compatibility information.
Getting More Information
Use the reference page links on this site to view the exact syntax for specific versions of the Tanzu
Greenplum Backup and Restore software.
Release Notes
Release Date: June 24, 2022
Tanzu Greenplum Backup and Restore utilities are released separately from Tanzu Greenplum
Database (as of Backup and Restore v1.13.0), and are updated independently of the core server.
Version 1.25.0
Tanzu Greenplum Backup and Restore version 1.25.0 is a minor release that introduces new features
and resolves a number of issues.
Platform Support
Tanzu Greenplum Backup and Restore is compatible with these Greenplum Database versions:
Tanzu Greenplum Database 4.3.22 and later
Tanzu Greenplum Database 5.5.0 and later
Tanzu Greenplum Database 6.0.0 and later
Software Component Versions
This release includes the following utilities:
gpbackup v1.25.0
gpbackup_helper v1.25.0
gprestore v1.25.0
gpbackup_manager v1.5.1
Backup/Restore Plugin API
gpbackup_ddboost_plugin v1.7.0
gpbackup_s3_plugin v1.8.1
See Data Domain Boost for Data Domain Boost support information.
See Known Issues for a list of issues known to exist in this release.
See Differences Compared to Open Source Greenplum Backup and Restore for the list of features
that are only in Tanzu Greenplum Backup and Restore.
See Release Numbering Conventions for a description of the Tanzu Greenplum Backup and Restore
release numbering scheme.
gpbackup v1.25.0
New Feature
gpbackup can now export synchronized distributed snapshots, ensuring data consistency
across the parallel worker processes specified with the --jobs option. As a result of this
enhancement, the worker processes no longer need to hold ACCESS SHARE locks for the
entire duration of the backup operation.
NOTE: This gpbackup feature requires Greenplum Database version 6.21 or higher.
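For example, a backup that uses four parallel worker connections might be invoked as follows (the database name is illustrative):
$ gpbackup --dbname demo --jobs 4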
Resolved Issues
[32110] Resolves an issue where implicit casts could cause backups to fail.
[32118] Resolves an issue where applications that queried the gp_segment_configuration
table were blocked because they were competing for connections.
gpbackup_helper v1.25.0
Resolved Issue
[181670325] Resolves an issue where gpbackup_helper processes were lingering
unnecessarily, rather than terminating.
gprestore v1.25.0
No changes since VMware Tanzu Greenplum Backup and Restore version 1.24.0.
gpbackup_manager v1.5.1
No changes since VMware Tanzu Greenplum Backup and Restore version 1.24.0.
Backup/Restore Plugin API
No changes since Tanzu Greenplum Backup and Restore version 1.19.0.
gpbackup_ddboost_plugin v1.7.0
No changes since VMware Tanzu Greenplum Backup and Restore version 1.24.0.
gpbackup_s3_plugin v1.8.1
No changes since VMware Tanzu Greenplum Backup and Restore version 1.24.0.
Data Domain Boost
This release of gpbackup and gprestore supports Data Domain Boost for backup on Red Hat
Enterprise Linux. This table lists the supported versions of DDOS, Data Domain Boost SDK, and
BoostFS.
DDOS                 Data Domain Boost    BoostFS
7.7                  7.7                  7.7
7.6                  7.6                  7.6
7.5                  7.5                  7.5
7.4                  7.4                  7.4
7.2                  7.2                  7.2
7.1                  7.1                  7.0
6.2                  3.5                  1.3
6.1 (all versions)   3.4                  1.1
6.0 (all versions)   3.3                  n/a
Note: In addition to the DDOS versions listed in the previous table, gpbackup and gprestore support
all minor patch releases (fourth digit releases) later than the certified version.
Known Issues
Greenplum Backup and Restore versions 1.21.0 and 1.22.0 are incompatible with CentOS 6
(glibc 2.12) and SLES 11 (glibc 2.11). The workaround is to upgrade to version 1.23.0 or later.
When attempting to restore data from Greenplum 4.3.x/5.x/6.x (prior to Greenplum v6.14.1
and v5.28.6), views with anyarray typecasts are not restorable. From Tanzu Greenplum
Backup and Restore v1.20.3, gpbackup generates an error when such views are detected.
Workaround: On the original cluster, identify any views that exhibit this symptom, using a
query similar to:
SELECT relname AS anyarray_views
FROM pg_class
WHERE relkind = 'v' AND oid >= 16384
AND (pg_get_viewdef(oid) LIKE '%::anyarray%') IS TRUE;
Then select option 1 or 2:
1. Using the backup set with the errors, run gprestore with the --on-error-continue
flag. The affected views will not be re-created. After the restore completes, re-create
each VIEW using your original VIEW definition.
2. Drop each affected VIEW on the original Greenplum cluster. Take another backup of
the older cluster and run gprestore on the new cluster. Finally, recreate the VIEW on
the new cluster using the original VIEW definition.
When attempting to restore data from Greenplum 4.3.x/5.x into Greenplum 6.x, the restore
fails if it involves a table distributed by character(n) or char(n) using the legacy bpchar hash
operator. This issue has been resolved in Greenplum Database 6.12. Upgrade to the latest
Greenplum release to avoid this issue.
gprestore does not support restoring a backup that contains partitioned tables where the
table is created with a non-reserved keyword that is used as a partition name. For example,
the non-reserved keyword at is used as a partition name in this SUBPARTITION TEMPLATE
clause fragment:
...
SUBPARTITION TEMPLATE
( SUBPARTITION "at" VALUES ('usa'),
...
gpbackup backs up the partitioned table, but gprestore returns an error when attempting to
restore the table.
Before performing a backup with gpbackup, you must ensure that partitioned tables do not
use any of these non-reserved keywords as a partition name:
ADD, ALTER, ALWAYS, AT, ATTRIBUTE, CATALOG, COMMENTS, CONFIGURATION,
CONFLICT, CONTINUE, CURRENT, DATA, DAY, DENY, DEPENDS, DICTIONARY,
DISCARD, DOCUMENT, DXL, EVENT, EXTENSION, FAMILY, FILESPACE, FILTER,
FULLSCAN, FUNCTIONS, HOUR, IDENTITY, IGNORE, IMPORT, INITPLAN, INLINE, LABEL,
LEAKPROOF, LOCKED, LOGGED, MAPPING, MATERIALIZED, METHOD, MINUTE,
MONTH, NOCREATEEXTTABLE, OFF, ORDERED, ORDINALITY, OVER, PARALLEL,
PARSER, PASSING, PLANS, POLICY, PROGRAM, RANDOMLY, READABLE, READS,
RECURSIVE, REF, REFRESH, REJECT, REPLICATED, ROOTPARTITION, SECOND,
SEQUENCES, SERVER, SKIP, SNAPSHOT, SQL, STANDALONE, STRIP, TABLES, TEXT,
TRANSFORM, TYPES, UNLOGGED, VALIDATION, VARYING, VIEWS, WEB,
WHITESPACE, WITHIN, WITHOUT, WRAPPER, WRITABLE, XML, YEAR, YES
A gprestore operation using the --redirect-schema option fails if gprestore attempts to
restore an index in a schema and the name of the index is the name of the schema followed
by a '.' (period). For example, this CREATE INDEX command creates an index named test.
on a table in the schema test:
CREATE INDEX "test." ON test.mytbl USING btree (i);
If the index and table are backed up with gpbackup, restoring the backup with this gprestore
command fails because gprestore cannot restore the test. index:
gprestore --timestamp <timestamp> --redirect-schema foo2
Beginning with versions 4.3.33 and 5.19, Greenplum Database checks that the distribution
key for a table is a prefix of any unique index key. This policy is not in place for Greenplum
Database versions before 4.3.33 or 5.19, and it is possible to create a backup of a database
from one of these earlier versions with unique indexes that do not comply with the policy.
When you restore such a backup to a Greenplum Database version that does enforce the
policy:
If the unique index is for a primary key constraint, Greenplum Database automatically
modifies the table's distribution policy if the table has no data.
In other cases, creating a unique index with a key that does not begin with the table's
distribution key fails. This issue affects restoring backups made from Greenplum
Database 4.3.32.0, 5.18.0, or earlier to a Greenplum Database version 4.3.33.0,
5.19.0, 6.0, or later system. The issue affects the gprestore, gpdbrestore, and
pg_restore utilities.
Differences Compared to Open Source Greenplum Backup
and Restore
Tanzu Greenplum Backup and Restore includes all of the functionality in the open source Greenplum
Backup GitHub repository and S3 storage plugin repository and adds:
Greenplum backup plugin for DD Boost
Greenplum gpbackup_manager utility
Greenplum Backup and Restore installation file in gppkg format
Release Numbering Conventions
The Tanzu Greenplum Backup and Restore distribution release number indicates the type of the
release.
The first number is the gpbackup/gprestore major release number. For example, given the
release number 1.16.0, the major release number is 1.
The second number is the minor release number. Given the release number 1.16.0, the
minor release number is 16. This number increments when new features are added to the
gpbackup/gprestore utilities.
The third number is the maintenance release number. This number increments when the
gpbackup/gprestore utilities included in the distribution contain fixes without new features, or
if one or more components included in the distribution, such as gpbackup_manager and the
backup storage plugins, have been updated.
The release versions of the components included in the distribution, such as gpbackup_manager and
the backup storage plugins, are separate from the distribution version, but follow the same
numbering scheme as the distribution.
Backup and Restore Overview
Greenplum Database supports parallel and non-parallel methods for backing up and restoring
databases. Parallel operations scale regardless of the number of segments in your system, because
segment hosts each write their data to local disk storage simultaneously. With non-parallel backup
and restore operations, the data must be sent over the network from the segments to the master,
which writes all of the data to its storage. In addition to restricting I/O to one host, non-parallel
backup requires that the master have sufficient local disk storage to store the entire database.
Parallel Backup with gpbackup and gprestore
gpbackup and gprestore are the Greenplum Database backup and restore utilities. gpbackup utilizes
ACCESS SHARE locks at the individual table level, instead of EXCLUSIVE locks on the pg_class catalog
table. This enables you to execute DDL statements during the backup, such as CREATE, ALTER, DROP,
and TRUNCATE operations, as long as those operations do not target the current backup set.
Backup files created with gpbackup are designed to provide future capabilities for restoring individual
database objects along with their dependencies, such as functions and required user-defined
datatypes. See Parallel Backup with gpbackup and gprestore for more information.
Non-Parallel Backup with pg_dump
The PostgreSQL pg_dump and pg_dumpall non-parallel backup utilities can be used to create a single
dump file on the master host that contains all data from all active segments.
The PostgreSQL non-parallel utilities should be used only for special cases. They are much slower
than using the Greenplum backup utilities since all of the data must pass through the master.
Additionally, it is often the case that the master host has insufficient disk space to save a backup of an
entire distributed Greenplum database.
The pg_restore utility requires compressed dump files created by pg_dump or pg_dumpall. Before
starting the restore, you should modify the CREATE TABLE statements in the dump files to include the
Greenplum DISTRIBUTED clause. If you do not include the DISTRIBUTED clause, Greenplum Database
assigns default values, which may not be optimal. For details, see CREATE TABLE in the Greenplum
Database Reference Guide.
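For example, a CREATE TABLE statement in a dump file could be edited to add an explicit distribution key before restoring (the table definition below is illustrative):
CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id);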
To perform a non-parallel restore using parallel backup files, you can copy the backup files from
each segment host to the master host, and then load them through the master.
Another non-parallel method for backing up Greenplum Database data is to use the COPY TO SQL
command to copy all or a portion of a table out of the database to a delimited text file on the master
host.
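For example, the following COPY command writes the contents of a table to a delimited file on the master host (the table name and file path are illustrative):
demo=# COPY sales TO '/home/gpadmin/sales_export.csv' WITH CSV HEADER;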
Installation Guide
For RHEL/CentOS and Ubuntu, you can download a Tanzu Greenplum Backup and Restore software
distribution as a package for the Greenplum package manager (gppkg). You can also download and
install the software from a tarball.
Note: If you want to use the DD Boost plugin to back up to a Dell EMC Data Domain appliance, after
you install the Greenplum Backup and Restore software see Installing pgcrypto in Greenplum
Database.
Installing the gppkg Distribution
Installing the tarball Distribution
Installing pgcrypto in Greenplum Database
Installing the gppkg Distribution
The gppkg utility installs the Greenplum Backup and Restore software on all hosts in your Greenplum
Database system.
1. Download the latest Tanzu Greenplum Backup and Restore software distribution for your
Greenplum Database version and OS platform from VMware Tanzu Network.
2. Copy the gppkg file you downloaded to the gpadmin user's home directory on the
Greenplum Database master host.
$ scp pivotal_greenplum_backup_restore-<version>.gppkg gpadmin@mdw:
3. Install the package using the Greenplum Database gppkg utility.
$ gppkg -i pivotal_greenplum_backup_restore-<version>-<platform>.gppkg
Installing the tarball Distribution
Install the Greenplum Backup and Restore tarball distribution on every host in your Greenplum
System, including the master, standby master, and segment hosts.
1. Download the latest Backup and Restore compressed tarball distribution from VMware Tanzu
Network.
2. Copy the compressed tarball file to the Greenplum Database master host.
$ scp pivotal_greenplum_backup_restore-<version>.tar.gz gpadmin@mdw:
3. Log in to the Greenplum Database master host as the gpadmin user.
$ ssh gpadmin@mdw
4. Copy the Backup and Restore compressed tarball to the Greenplum Database installation
directory on the master, standby master, and every segment host.
Note: The hostfile_gpssh file contains a list of all Greenplum hosts, including the master and
standby master hosts.
$ gpscp -v -f hostfile_gpssh pivotal_greenplum_backup_restore-<version>.tar.gz =:/$GPHOME
5. Unpack the tarball in the $GPHOME directory on every Greenplum host.
$ gpssh -f hostfile_gpssh -v -e 'cd $GPHOME; tar -xzvf pivotal_greenplum_backup_restore-<version>.tar.gz'
6. Verify that the Backup and Restore version is installed on all of the hosts.
$ gpssh -f hostfile_gpssh -v -e 'gpbackup --version'
Installing pgcrypto in Greenplum Database
If you are using the DD Boost plugin to back up to a Dell EMC Data Domain appliance and you want
to secure Data Domain passwords in the DD Boost configuration file, you must install the pgcrypto
extension in the postgres database.
The method for installing pgcrypto differs for each Greenplum Database major version.
Greenplum Database 4.3.x - See Installing Greenplum Database Extensions in the Greenplum
Database Installation Guide.
Greenplum Database 5.x - See Installing Optional Extensions in the Greenplum Database
Installation Guide.
Greenplum Database 6.x - See Installing Additional Supplied Modules in the Greenplum
Database Installation Guide.
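For example, on Greenplum Database 6.x, one way to register the module in the postgres database is with CREATE EXTENSION (this sketch assumes the pgcrypto module files are already present in your Greenplum installation):
$ psql -d postgres -c 'CREATE EXTENSION pgcrypto;'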
You can verify that the pgcrypto functions are installed in the postgres database by listing a pgcrypto
function, for example the digest() function.
$ psql postgres
postgres=# \df digest
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+--------+------------------+---------------------+--------
public | digest | bytea | bytea, text | normal
public | digest | bytea | text, text | normal
(2 rows)
Backing Up and Restoring Databases
This topic describes how to use Greenplum backup and restore features.
Performing backups regularly ensures that you can restore your data or rebuild your Greenplum
Database system if data corruption or a system failure occurs. You can also use backups to migrate
data from one Greenplum Database system to another.
Parallel Backup with gpbackup and gprestore
Parallel Backup with gpbackup and gprestore
gpbackup and gprestore are Tanzu Greenplum utilities that create and restore backup sets for
Greenplum Database. By default, gpbackup stores only the object metadata files and DDL files for a
backup in the Greenplum Database master data directory. Greenplum Database segments use the
COPY ... ON SEGMENT command to store their data for backed-up tables in compressed CSV data
files, located in each segment's backups directory.
The backup metadata files contain all of the information that gprestore needs to restore a full backup
set in parallel. Backup metadata also provides the framework for restoring only individual objects in
the data set, along with any dependent objects, in future versions of gprestore. (See Understanding
Backup Files for more information.) Storing the table data in CSV files also provides opportunities for
using other restore utilities, such as gpload, to load the data either in the same cluster or another
cluster. By default, one file is created for each table on the segment. You can specify the
--leaf-partition-data option with gpbackup to create one data file per leaf partition of a partitioned table,
instead of a single file. This option also enables you to filter backup sets by leaf partitions.
Each gpbackup task uses a single transaction in Greenplum Database. During this transaction,
metadata is backed up on the master host, and data for each table on each segment host is written
to CSV backup files using COPY ... ON SEGMENT commands in parallel. The backup process acquires
an ACCESS SHARE lock on each table that is backed up.
For information about the gpbackup and gprestore utility options, see gpbackup and gprestore.
Requirements and Limitations
Objects Included in a Backup or Restore
Performing Basic Backup and Restore Operations
Filtering the Contents of a Backup or Restore
Configuring Email Notifications
Understanding Backup Files
Creating and Using Incremental Backups with gpbackup and gprestore
Using gpbackup and gprestore with BoostFS
Using gpbackup Storage Plugins
Backup/Restore Storage Plugin API
Parent topic:Backing Up and Restoring Databases
Requirements and Limitations
The gpbackup and gprestore utilities are compatible with these Greenplum Database versions:
Tanzu Greenplum 4.3.22 and later
Tanzu Greenplum 5.5.0 and later
Tanzu Greenplum 6.0.0 and later
gpbackup and gprestore have the following limitations:
If you create an index on a parent partitioned table, gpbackup does not back up that same
index on child partitioned tables of the parent, as creating the same index on a child would
cause an error. However, if you exchange a partition, gpbackup does not detect that the
index on the exchanged partition is inherited from the new parent table. In this case,
gpbackup backs up conflicting CREATE INDEX statements, which causes an error when you
restore the backup set.
You can execute multiple instances of gpbackup, but each execution requires a distinct
timestamp.
Database object filtering is currently limited to schemas and tables.
When backing up a partitioned table where some or all leaf partitions are in different
schemas from the root partition, the leaf partition table definitions, including the schemas, are
backed up as metadata. This occurs even if the backup operation specifies that schemas that
contain the leaf partitions should be excluded. To control data being backed up for this type
of partitioned table in this situation, use the --leaf-partition-data option.
If the --leaf-partition-data option is not specified, the leaf partition data is also
backed up even if the backup operation specifies that the leaf partition schemas
should be excluded.
If the --leaf-partition-data option is specified, the leaf partition data is not
backed up if the backup operation specifies that the leaf partition schemas should be
excluded. Only the metadata for the leaf partition tables is backed up.
If you specify a leaf partition name with --exclude-table or in a file used with
--exclude-table-file, gpbackup ignores the partition name. The leaf partition is not excluded from the
backup.
If you use the gpbackup --single-data-file option to combine table backups into a single
file per segment, you cannot perform a parallel restore operation with gprestore (cannot set
--jobs to a value higher than 1).
Backing up a database with gpbackup while simultaneously running DDL commands might
cause gpbackup to fail, in order to ensure consistency within the backup set. For example, if a
table is dropped after the start of the backup operation, gpbackup exits and displays the error
message ERROR: relation <schema.table> does not exist.
gpbackup might fail when a table is dropped during a backup operation due to table locking
issues. gpbackup generates a list of tables to back up and acquires an ACCESS SHARE lock on
the tables. If an EXCLUSIVE lock is held on a table, gpbackup acquires the ACCESS SHARE lock
after the existing lock is released. If the table no longer exists when gpbackup attempts to
acquire a lock on the table, gpbackup exits with the error message.
For tables that might be dropped during a backup, you can exclude the tables from a backup
with a gpbackup table filtering option such as --exclude-table or --exclude-schema.
A backup created with gpbackup can only be restored to a Greenplum Database cluster with
the same number of segment instances as the source cluster. If you run gpexpand to add
segments to the cluster, backups you made before starting the expand cannot be restored
after the expansion has completed.
Parent topic:Parallel Backup with gpbackup and gprestore
Objects Included in a Backup or Restore
The following table lists the objects that are backed up and restored with gpbackup and gprestore.
Database objects are backed up for the database you specify with the --dbname option.
Global objects (Greenplum Database system objects) are backed up by default, but they are not
restored by default. Use the gprestore --with-globals option to restore global objects. Or, use the
gpbackup --without-globals option to prevent backing up global objects.
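For example, the following command pair skips global objects during the backup and restores them explicitly during the restore (the database name and timestamp placeholder are illustrative):
$ gpbackup --dbname demo --without-globals
$ gprestore --timestamp <timestamp> --create-db --with-globals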
Table 1. Objects that are backed up and restored

Database Objects:
    Session-level configuration parameter settings (GUCs)
    Schemas (see Note)
    Procedural language extensions
    Sequences
    Comments
    Tables
    Indexes
    Owners
    Writable External Tables (DDL only)
    Readable External Tables (DDL only)
    Functions
    Aggregates
    Casts
    Types
    Views
    Materialized Views (DDL only)
    Protocols
    Triggers. (While Greenplum Database does not support triggers, any trigger definitions that are
    present are backed up and restored.)
    Rules
    Domains
    Operators, operator families, and operator classes
    Conversions
    Extensions
    Text search parsers, dictionaries, templates, and configurations
    Table statistics (when the --with-stats option is specified)

Global Objects:
    Tablespaces
    Database-wide configuration parameter settings (GUCs)
    Resource group definitions
    Resource queue definitions
    Roles
    GRANT assignments of roles to databases

Note: These schemas are not included in a backup.
    gp_toolkit
    information_schema
    pg_aoseg
    pg_bitmapindex
    pg_catalog
    pg_toast*
    pg_temp*
When restoring to an existing database, gprestore assumes the public schema exists when
restoring objects to the public schema. When restoring to a new database (with the --create-db
option), gprestore creates the public schema automatically when creating a database with the
CREATE DATABASE command. The command uses the template0 database that contains the public
schema.
See also Understanding Backup Files.
Parent topic:Parallel Backup with gpbackup and gprestore
Performing Basic Backup and Restore Operations
To perform a complete backup of a database, as well as Greenplum Database system metadata, use
the command:
$ gpbackup --dbname <database_name>
For example:
$ gpbackup --dbname demo
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Starting backup
of database demo
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup Timestamp
= 20180105112754
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup Database
= demo
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup Type = Un
filtered Compressed Full Backup
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Gathering list o
f tables for backup
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Acquiring ACCESS
SHARE locks on tables
Locks acquired: 6 / 6 [==============================================================
==] 100.00% 0s
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Gathering additi
onal table metadata
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing global d
atabase metadata
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Global database
metadata backup complete
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing pre-data
metadata
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Pre-data metadat
a backup complete
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing post-dat
a metadata
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Post-data metada
ta backup complete
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing data to
file
Tables backed up: 3 / 3 [============================================================
==] 100.00% 0s
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Data backup comp
lete
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Found neither /u
sr/local/greenplum-db/./bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts
.yaml
20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Email containing
gpbackup report /gpmaster/seg-1/backups/20180105/20180105112754/gpbackup_201801051127
54_report will not be sent
20180105:11:27:55 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup completed
successfully
The above command creates a file that contains global and database-specific metadata on the
Greenplum Database master host in the default directory,
$MASTER_DATA_DIRECTORY/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/. For example:
$ ls /gpmaster/gpsne-1/backups/20180105/20180105112754
gpbackup_20180105112754_config.yaml gpbackup_20180105112754_report
gpbackup_20180105112754_metadata.sql gpbackup_20180105112754_toc.yaml
By default, each segment stores each table's data for the backup in a separate compressed CSV file
in <seg_dir>/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/:
$ ls /gpdata1/gpsne0/backups/20180105/20180105112754/
gpbackup_0_20180105112754_17166.gz gpbackup_0_20180105112754_26303.gz
gpbackup_0_20180105112754_21816.gz
To consolidate all backup files into a single directory, include the --backup-dir option. Note that you
must specify an absolute path with this option:
$ gpbackup --dbname demo --backup-dir /home/gpadmin/backups
20171103:15:31:56 gpbackup:gpadmin:0ee2f5fb02c9:017586-[INFO]:-Starting backup of data
base demo
...
20171103:15:31:58 gpbackup:gpadmin:0ee2f5fb02c9:017586-[INFO]:-Backup completed succes
sfully
$ find /home/gpadmin/backups/ -type f
/home/gpadmin/backups/gpseg0/backups/20171103/20171103153156/gpbackup_0_20171103153156
_16543.gz
/home/gpadmin/backups/gpseg0/backups/20171103/20171103153156/gpbackup_0_20171103153156
_16524.gz
/home/gpadmin/backups/gpseg1/backups/20171103/20171103153156/gpbackup_1_20171103153156
_16543.gz
/home/gpadmin/backups/gpseg1/backups/20171103/20171103153156/gpbackup_1_20171103153156
_16524.gz
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_
config.yaml
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_
predata.sql
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_
global.sql
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_
postdata.sql
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_
report
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_
toc.yaml
When performing a backup operation, you can use the --single-data-file option in situations where
the additional overhead of multiple files might be prohibitive, for example, when you use a third-party
storage solution such as Data Domain for backups.
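For example (the database name is illustrative):
$ gpbackup --dbname demo --single-data-file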
Note: Backing up a materialized view does not back up the materialized view data. Only the
materialized view definition is backed up.
Restoring from Backup
To use gprestore to restore from a backup set, you must use the --timestamp option to specify the
exact timestamp value (YYYYMMDDHHMMSS) to restore. Include the --create-db option if the database
does not exist in the cluster. For example:
$ dropdb demo
$ gprestore --timestamp 20171103152558 --create-db
20171103:15:45:30 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restore Key = 20171103
152558
20171103:15:45:31 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Creating database
20171103:15:45:44 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Database creation comp
lete
20171103:15:45:44 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring pre-data met
adata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_p
redata.sql
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Pre-data metadata rest
ore complete
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring data
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Data restore complete
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring post-data me
tadata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_
postdata.sql
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Post-data metadata res
tore complete
If you specified a custom --backup-dir to consolidate the backup files, include the same
--backup-dir option when using gprestore to locate the backup files:
$ dropdb demo
$ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db
20171103:15:51:02 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Restore Key = 20171103
153156
...
20171103:15:51:17 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Post-data metadata res
tore complete
gprestore does not attempt to restore global metadata for the Greenplum System by default. If this is
required, include the --with-globals argument.
By default, gprestore uses 1 connection to restore table data and metadata. If you have a large
backup set, you can improve performance of the restore by increasing the number of parallel
connections with the --jobs option. For example:
$ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db
--jobs 8
Test the number of parallel connections with your backup set to determine the ideal number for fast
data recovery.
Note: You cannot perform a parallel restore operation with gprestore if the backup combined table
backups into a single file per segment with the gpbackup option --single-data-file.
Restoring a materialized view does not restore materialized view data. Only the materialized view
definition is restored. To populate the materialized view with data, use REFRESH MATERIALIZED VIEW.
The tables that are referenced by the materialized view definition must be available when you
refresh the materialized view. The gprestore log file lists the materialized views that were restored
and the REFRESH MATERIALIZED VIEW commands that are used to populate the materialized views
with data.
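For example, a refresh command looks like the following (the view name is illustrative):
demo=# REFRESH MATERIALIZED VIEW myschema.sales_summary;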
Report Files
When performing a backup or restore operation, gpbackup and gprestore generate a report file.
When email notification is configured, the email sent contains the contents of the report file. For
information about email notification, see Configuring Email Notifications.
The report file is placed in the Greenplum Database master backup directory. The report file name
contains the timestamp of the operation. These are the formats of the gpbackup and gprestore
report file names.
gpbackup_<backup_timestamp>_report
gprestore_<backup_timestamp>_<restore_timestamp>_report
For these example report file names, 20180213114446 is the timestamp of the backup and
20180213115426 is the timestamp of the restore operation.
gpbackup_20180213114446_report
gprestore_20180213114446_20180213115426_report
This backup directory on a Greenplum Database master host contains both a gpbackup and
gprestore report file.
$ ls -l /gpmaster/seg-1/backups/20180213/20180213114446
total 36
-r--r--r--. 1 gpadmin gpadmin 295 Feb 13 11:44 gpbackup_20180213114446_config.yaml
-r--r--r--. 1 gpadmin gpadmin 1855 Feb 13 11:44 gpbackup_20180213114446_metadata.sql
-r--r--r--. 1 gpadmin gpadmin 1402 Feb 13 11:44 gpbackup_20180213114446_report
-r--r--r--. 1 gpadmin gpadmin 2199 Feb 13 11:44 gpbackup_20180213114446_toc.yaml
-r--r--r--. 1 gpadmin gpadmin 404 Feb 13 11:54 gprestore_20180213114446_2018021311542
6_report
The contents of the report files are similar. This is an example of the contents of a gprestore report
file.
Greenplum Database Restore Report
Timestamp Key: 20180213114446
GPDB Version: 5.4.1+dev.8.g9f83645 build commit:9f836456b00f855959d52749d5790ed1c6efc0
42
gprestore Version: 1.0.0-alpha.3+dev.73.g0406681
Database Name: test
Command Line: gprestore --timestamp 20180213114446 --with-globals --createdb
Start Time: 2018-02-13 11:54:26
End Time: 2018-02-13 11:54:31
Duration: 0:00:05
Restore Status: Success
History File
When performing a backup operation, gpbackup appends backup information to the gpbackup
history file, gpbackup_history.yaml, in the Greenplum Database master data directory. The file
contains the backup timestamp, information about the backup options, and backup set information
for incremental backups. From gpbackup v1.19.0, this file also records failed backup operations. This
file is not backed up by gpbackup.
gpbackup uses the information in the file to find a matching backup for an incremental backup when
you run gpbackup with the --incremental option and do not specify the --from-timestamp option to
indicate the backup that you want to use as the latest backup in the incremental backup set. For
information about incremental backups, see Creating and Using Incremental Backups with gpbackup
and gprestore.
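For example, a follow-up incremental backup of a previously backed-up database might be taken with a command like the following; gpbackup uses the history file to find the latest matching backup, and the database name is illustrative:
$ gpbackup --dbname demo --leaf-partition-data --incremental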
Return Codes
One of these codes is returned after gpbackup or gprestore completes.
0 – Backup or restore completed with no problems
1 – Backup or restore completed with non-fatal errors. See log file for more information.
2 – Backup or restore failed with a fatal error. See log file for more information.
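For example, you can check the return code from the shell immediately after the utility runs; the 0 shown here indicates a successful run, and the database name is illustrative:
$ gpbackup --dbname demo
$ echo $?
0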
Parent topic:Parallel Backup with gpbackup and gprestore
Filtering the Contents of a Backup or Restore
gpbackup backs up all schemas and tables in the specified database, unless you exclude or include
individual schema or table objects with schema level or table level filter options.
The schema-level options are the --include-schema, --include-schema-file, --exclude-schema, and
--exclude-schema-file command-line options to gpbackup. For example, if the "demo" database
includes only two schemas, "wikipedia" and "twitter," both of the following commands back up only
the "wikipedia" schema:
$ gpbackup --dbname demo --include-schema wikipedia
$ gpbackup --dbname demo --exclude-schema twitter
You can include multiple --include-schema options or multiple --exclude-schema options in a
gpbackup command. For example:
$ gpbackup --dbname demo --include-schema wikipedia --include-schema twitter
If you have a large number of schemas, you can list the schemas in a text file and specify the file with
the --include-schema-file or --exclude-schema-file options in a gpbackup command. Each line in
the file must define a single schema, and the file cannot contain trailing lines. For example, this
command uses a file in the gpadmin home directory to include a set of schemas.
gpbackup --dbname demo --include-schema-file /users/home/gpadmin/backup-schemas
To filter the individual tables that are included in a backup set, or excluded from a backup set,
specify individual tables with the --include-table option or the --exclude-table option. The table
must be schema qualified, <schema-name>.<table-name>. The individual table filtering options can be
specified multiple times. However, --include-table and --exclude-table cannot both be used in
the same command.
You can create a list of qualified table names in a text file. When listing tables in a file, each line in
the text file must define a single table using the format <schema-name>.<table-name>. The file must
not include trailing lines. For example:
wikipedia.articles
twitter.message
If a table or schema name uses any character other than a lowercase letter, number, or an
underscore character, then you must include that name in double quotes. For example:
beer."IPA"
"Wine".riesling
"Wine"."sauvignon blanc"
water.tonic
After creating the file, you can use it either to include or exclude tables with the gpbackup options
--include-table-file or --exclude-table-file. For example:
$ gpbackup --dbname demo --include-table-file /home/gpadmin/table-list.txt
You can combine --include-schema with --exclude-table or --exclude-table-file for a backup.
This example uses --include-schema with --exclude-table to back up a schema except for a single
table.
$ gpbackup --dbname demo --include-schema mydata --exclude-table mydata.addresses
You cannot combine --include-schema with --include-table or --include-table-file, and you
cannot combine --exclude-schema with any table filtering option such as --exclude-table or
--include-table.
When you use --include-table or --include-table-file, dependent objects are not automatically
backed up or restored; you must explicitly specify the dependent objects that are required. For
example, if you back up or restore a view or materialized view, you must also specify the tables that
the view or the materialized view uses. If you back up or restore a table that uses a sequence, you
must also specify the sequence.
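For example, to back up a table together with the sequence it uses, list both objects explicitly (the object names here are illustrative):
$ gpbackup --dbname demo --include-table mydata.orders --include-table mydata.orders_id_seq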
Filtering by Leaf Partition
By default, gpbackup creates one file for each table on a segment. You can specify the
--leaf-partition-data option to create one data file per leaf partition of a partitioned table, instead of a
single file. You can also filter backups to specific leaf partitions by listing the leaf partition names in a
text file to include. For example, consider a table that was created using the statement:
demo=# CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan17 START (date '2017-01-01') INCLUSIVE ,
PARTITION Feb17 START (date '2017-02-01') INCLUSIVE ,
PARTITION Mar17 START (date '2017-03-01') INCLUSIVE ,
PARTITION Apr17 START (date '2017-04-01') INCLUSIVE ,
PARTITION May17 START (date '2017-05-01') INCLUSIVE ,
PARTITION Jun17 START (date '2017-06-01') INCLUSIVE ,
PARTITION Jul17 START (date '2017-07-01') INCLUSIVE ,
PARTITION Aug17 START (date '2017-08-01') INCLUSIVE ,
PARTITION Sep17 START (date '2017-09-01') INCLUSIVE ,
PARTITION Oct17 START (date '2017-10-01') INCLUSIVE ,
PARTITION Nov17 START (date '2017-11-01') INCLUSIVE ,
PARTITION Dec17 START (date '2017-12-01') INCLUSIVE
END (date '2018-01-01') EXCLUSIVE );
NOTICE: CREATE TABLE will create partition "sales_1_prt_jan17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_feb17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_mar17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_apr17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_may17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_jun17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_jul17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_aug17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_sep17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_oct17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_nov17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_dec17" for table "sales"
CREATE TABLE
To back up only data for the last quarter of the year, first create a text file that lists those leaf partition
names instead of the full table name:
public.sales_1_prt_oct17
public.sales_1_prt_nov17
public.sales_1_prt_dec17
Then specify the file with the --include-table-file option to generate one data file per leaf
partition:
$ gpbackup --dbname demo --include-table-file last-quarter.txt --leaf-partition-data
When you specify --leaf-partition-data, gpbackup generates one data file per leaf partition when
backing up a partitioned table. For example, this command generates one data file for each leaf
partition:
$ gpbackup --dbname demo --include-table public.sales --leaf-partition-data
When leaf partitions are backed up, the leaf partition data is backed up along with the metadata for
the entire partitioned table.
Filtering with gprestore
After creating a backup set with gpbackup, you can filter the schemas and tables that you want to
restore from the backup set using the gprestore --include-schema and --include-table-file
options. These options work in the same way as their gpbackup counterparts, but have the following
restrictions:
The tables that you attempt to restore must not already exist in the database.
If you attempt to restore a schema or table that does not exist in the backup set,
gprestore does not execute.
If you use the --include-schema option, gprestore cannot restore objects that have
dependencies on multiple schemas.
If you use the --include-table-file option, gprestore does not create roles or set the
owner of the tables. The utility restores table indexes and rules. Triggers are also restored
but are not supported in Greenplum Database.
The file that you specify with --include-table-file cannot include a leaf partition name, as it
can when you specify this option with gpbackup. If you specified leaf partitions in the backup
set, specify the partitioned table to restore the leaf partition data.
When restoring a backup set that contains data from some leaf partitions of a partitioned
table, the partitioned table is restored along with the data for the leaf partitions. For example,
you create a backup with the gpbackup option --include-table-file and the text file lists
some leaf partitions of a partitioned table. Restoring the backup creates the partitioned table
and restores the data only for the leaf partitions listed in the file.
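For example, the following command restores only one schema from an existing backup set (the schema name and timestamp placeholder are illustrative):
$ gprestore --timestamp <timestamp> --create-db --include-schema wikipedia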
gprestore performs a special filtered restore operation with a storage plugin that supports this
functionality when all of the following conditions hold:
You specify the --plugin-config config.yml option when you invoke both the gpbackup
and gprestore commands and the configuration file includes the restore_subset: "on"
setting.
The backup is an uncompressed, single-data-file backup (you invoked the gpbackup
command with the --no-compression and --single-data-file flags).
You specify filtering options (--include-table, --exclude-table, --include-table-file, or
--exclude-table-file) on the gprestore command line.
The storage plugin reads and restores only the relations that you specify from the backup file,
improving restore performance.
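For example, a backup and filtered restore that meet these conditions might look like the following; the plugin configuration file path, database name, and table name are illustrative, and the configuration file must contain the restore_subset: "on" setting:
$ gpbackup --dbname demo --no-compression --single-data-file --plugin-config /home/gpadmin/plugin_config.yaml
$ gprestore --timestamp <timestamp> --create-db --plugin-config /home/gpadmin/plugin_config.yaml --include-table public.sales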
Parent topic:Parallel Backup with gpbackup and gprestore
Configuring Email Notifications
gpbackup and gprestore can send email notifications after a back up or restore operation completes.
To have gpbackup or gprestore send out status email notifications, you must place a file named
gp_email_contacts.yaml in the home directory of the user running gpbackup or gprestore, or in the
same directory as the utilities ($GPHOME/bin). A utility issues a message if it cannot locate a
gp_email_contacts.yaml file in either location. If both locations contain a .yaml file, the utility uses
the file in user $HOME.
The email subject line includes the utility name, timestamp, job status (Success or Failure), and the
name of the Greenplum Database host gpbackup or gprestore is called from. These are example
subject lines for gpbackup emails.
gpbackup 20180202133601 on gp-master completed: Success
or
gpbackup 20200925140738 on mdw completed: Failure
The email contains summary information about the operation including options, duration, and
number of objects backed up or restored. For information about the contents of a notification email,
see Report Files.
Note: The UNIX mail utility must be running on the Greenplum Database host and must be
configured to allow the Greenplum superuser (gpadmin) to send email. Also ensure that the mail
program executable is locatable via the gpadmin user's $PATH.
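For example, you can verify that the mail program works for the gpadmin user by sending a test message (the recipient address is illustrative):
$ echo "test body" | mail -s "test subject" gpadmin@example.com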
Parent topic:Parallel Backup with gpbackup and gprestore
gpbackup and gprestore Email File Format
The gpbackup and gprestore email notification YAML file gp_email_contacts.yaml uses indentation
(spaces) to determine the document hierarchy and the relationships of the sections to one another.
The use of white space is significant. White space should not be used simply for formatting purposes,
and tabs should not be used at all.
Note: If the status parameters are not specified correctly, the utility does not issue a warning. For
example, if the success parameter is misspelled and is set to true, a warning is not issued and an
email is not sent to the email address after a successful operation. To ensure email notification is
configured correctly, run tests with email notifications configured.
This is the format of the gp_email_contacts.yaml YAML file for gpbackup email notifications:
contacts:
  gpbackup:
  - address: <user>@<domain>
    status:
      success: [true | false]
      success_with_errors: [true | false]
      failure: [true | false]
  gprestore:
  - address: <user>@<domain>
    status:
      success: [true | false]
      success_with_errors: [true | false]
      failure: [true | false]
Email YAML File Sections
contacts
    Required. The section that contains the gpbackup and gprestore sections. The YAML file can
    contain a gpbackup section, a gprestore section, or one of each.
gpbackup
    Optional. Begins the gpbackup email section.
    address
        Required. At least one email address must be specified. Multiple email address parameters
        can be specified. Each address requires a status section. <user>@<domain> is a single, valid
        email address.
    status
        Required. Specify when the utility sends an email to the specified email address. The default
        is to not send email notification. You specify sending email notifications based on the
        completion status of a backup or restore operation. At least one of these parameters must be
        specified, and each parameter can appear at most once.
        success
            Optional. Specify if an email is sent if the operation completes without errors. If the value
            is true, an email is sent if the operation completes without errors. If the value is false (the
            default), an email is not sent.
        success_with_errors
            Optional. Specify if an email is sent if the operation completes with errors. If the value is
            true, an email is sent if the operation completes with errors. If the value is false (the
            default), an email is not sent.
        failure
            Optional. Specify if an email is sent if the operation fails. If the value is true, an email is
            sent if the operation fails. If the value is false (the default), an email is not sent.
gprestore
    Optional. Begins the gprestore email section. This section contains the address and status
    parameters that are used to send an email notification after a gprestore operation. The syntax
    is the same as the gpbackup section.
Examples
This example YAML file specifies sending email to email addresses depending on the success or
failure of an operation. For a backup operation, an email is sent to a different address depending on
the success or failure of the backup operation. For a restore operation, an email is sent to
[email protected] only when the operation succeeds or completes with errors.
contacts:
  gpbackup:
  - address: [email protected]
    status:
      success: true
  - address: [email protected]
    status:
      success_with_errors: true
      failure: true
  gprestore:
  - address: [email protected]
    status:
      success: true
      success_with_errors: true
Understanding Backup Files
Warning: All gpbackup metadata files are created with read-only permissions. Never delete or modify
the metadata files for a gpbackup backup set. Doing so will render the backup files non-functional.
A complete backup set for gpbackup includes multiple metadata files, supporting files, and CSV data
files, each designated with the timestamp at which the backup was created.
By default, metadata and supporting files are stored on the Greenplum Database master host in the
directory $MASTER_DATA_DIRECTORY/backups/YYYYMMDD/YYYYMMDDHHMMSS/. If you
specify a custom backup directory, this same file path is created as a subdirectory of the backup
directory. The following table describes the names and contents of the metadata and supporting
files.
Table 2. gpbackup Metadata Files (master)
gpbackup_<YYYYMMDDHHMMSS>_metadata.sql
Contains global and database-specific metadata:
DDL for objects that are global to the Greenplum Database cluster, and not owned by a specific database within the cluster.
DDL for objects in the backed-up database (specified with --dbname) that must be created before restoring the actual data, and DDL for objects that must be created after restoring the data.
Global objects include:
Tablespaces
Databases
Database-wide configuration parameter settings (GUCs)
Resource group definitions
Resource queue definitions
Roles
GRANT assignments of roles to databases
Note: Global metadata is not restored by default. You must include the --with-globals option to the gprestore command to restore global metadata.
Database-specific objects that must be created before restoring the actual data include:
Session-level configuration parameter settings (GUCs)
Schemas
Procedural language extensions
Types
Sequences
Functions
Tables
Protocols
Operators and operator classes
Conversions
Aggregates
Casts
Views
Materialized Views (Note: Materialized view data is not restored, only the definition.)
Constraints
Database-specific objects that must be created after restoring the actual data include:
Indexes
Rules
Triggers. (While Greenplum Database does not support triggers, any trigger definitions that are present are backed up and restored.)
gpbackup_<YYYYMMDDHHMMSS>_toc.yaml
Contains metadata for locating object DDL in the _predata.sql and _postdata.sql files. This file also contains the table names and OIDs used for locating the corresponding table data in CSV data files that are created on each segment. See Segment Data Files.
gpbackup_<YYYYMMDDHHMMSS>_report
Contains information about the backup operation that is used to populate the email notice (if configured) that is sent after the backup completes. This file contains information such as:
Command-line options that were provided
Database that was backed up
Database version
Backup type
See Configuring Email Notifications.
gpbackup_<YYYYMMDDHHMMSS>_config.yaml
Contains metadata about the execution of the particular backup task, including:
gpbackup version
Database name
Greenplum Database version
Additional option settings such as --no-compression, --compression-level, --compression-type, --metadata-only, --data-only, and --with-stats.
gpbackup_<YYYYMMDDHHMMSS>_statistics.sql
Contains table statistics. Created when the gpbackup option --with-stats is specified. Statistics are restored when the gprestore option --with-stats is specified.
gpbackup_history.yaml
Contains information about options that were used when creating a backup with gpbackup, information about incremental backups, and information about failed backup operations. Stored on the Greenplum Database master host in the Greenplum Database master data directory. This file is not backed up by gpbackup. For information about incremental backups, see Creating and Using Incremental Backups with gpbackup and gprestore.
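For example, for a hypothetical backup taken at timestamp 20230601120000 (the timestamp is illustrative), the metadata and supporting files described above would appear on the master host as follows:
$ ls $MASTER_DATA_DIRECTORY/backups/20230601/20230601120000/
gpbackup_20230601120000_config.yaml
gpbackup_20230601120000_metadata.sql
gpbackup_20230601120000_report
gpbackup_20230601120000_toc.yaml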
Segment Data Files
By default, each segment creates one compressed CSV file for each table that is backed up on the
segment. You can optionally specify the --single-data-file option to create a single data file on
each segment. The files are stored in <seg_dir>/backups/YYYYMMDD/YYYYMMDDHHMMSS/.
If you specify a custom backup directory, segment data files are copied to this same file path as a
subdirectory of the backup directory. If you include the --leaf-partition-data option, gpbackup
creates one data file for each leaf partition of a partitioned table, instead of one data file for the entire table.
Each data file uses the file name format gpbackup_<content_id>_<timestamp>_<oid>.gz where:
<content_id> is the content ID of the segment.
<timestamp> is the timestamp of the gpbackup operation.
<oid> is the object ID of the table. The metadata file gpbackup_<timestamp>_toc.yaml references this object ID to locate the data for a specific table in a schema.
You can optionally specify the compression level (from 1-9) using the --compression-level option,
or disable compression entirely with --no-compression. If you do not specify a compression level,
gpbackup uses compression level 1 by default.
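For example, the following commands are a sketch (the database name demo is illustrative): the first backs up a database with a higher compression level, and the second disables compression entirely.
gpbackup --dbname demo --compression-level 5
gpbackup --dbname demo --no-compression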
Parent topic:Parallel Backup with gpbackup and gprestore
Creating and Using Incremental Backups with gpbackup and
gprestore
The gpbackup and gprestore utilities support creating incremental backups of append-optimized
tables and restoring from incremental backups. An incremental backup backs up all specified heap
tables and backs up append-optimized tables (including append-optimized, column-oriented tables)
only if the tables have changed. For example, if a row of an append-optimized table has changed,
the table is backed up. For partitioned append-optimized tables, only the changed leaf partitions are
backed up.
Incremental backups are efficient when the total amount of data in append-optimized tables or table
partitions that changed is small compared to the data that has not changed since the last backup.
An incremental backup backs up an append-optimized table only if one of the following operations
was performed on the table after the last full or incremental backup:
ALTER TABLE
DELETE
INSERT
TRUNCATE
UPDATE
DROP and then re-create the table
To restore data from incremental backups, you need a complete incremental backup set.
About Incremental Backup Sets
Using Incremental Backups
Parent topic:Parallel Backup with gpbackup and gprestore
About Incremental Backup Sets
An incremental backup set includes the following backups:
A full backup. This is the full backup that the incremental backups are based on.
The set of incremental backups that capture the changes to the database from the time of
the full backup.
For example, you can create a full backup and then create three daily incremental backups. The full
backup and all three incremental backups are the backup set. For information about using an
incremental backup set, see Example Using Incremental Backup Sets.
When you create or add to an incremental backup set, gpbackup ensures that the backups in the set
are created with a consistent set of backup options to ensure that the backup set can be used in a
restore operation. For information about backup set consistency, see Using Incremental Backups.
When you create an incremental backup you include these options with the other gpbackup options
to create a backup:
--leaf-partition-data - Required for all backups in the incremental backup set.
Required when you create a full backup that will be the base backup for an
incremental backup set.
Required when you create an incremental backup.
--incremental - Required when you create an incremental backup.
You cannot combine --data-only or --metadata-only with --incremental.
--from-timestamp - Optional. This option can be used with --incremental. The timestamp you specify must be the timestamp of an existing backup, either a full backup or an incremental backup. The backup being created must be compatible with the backup specified with the --from-timestamp option.
If you do not specify --from-timestamp, gpbackup attempts to find a compatible backup
based on information in the gpbackup history file. See Incremental Backup Notes.
Parent topic:Creating and Using Incremental Backups with gpbackup and gprestore
Using Incremental Backups
Example Using Incremental Backup Sets
Creating an Incremental Backup with gpbackup
Restoring from an Incremental Backup with gprestore
Incremental Backup Notes
When you add an incremental backup to a backup set, gpbackup ensures that the full backup and the
incremental backups are consistent by checking these gpbackup options:
--dbname - The database must be the same.
--backup-dir - The directory must be the same. The backup set (the full backup and the incremental backups) must be in the same location.
--single-data-file - This option must be either specified or absent for all backups in the
set.
--plugin-config - If this option is specified, it must be specified for all backups in the backup
set. The configuration must reference the same plugin binary.
--include-table-file, --include-schema, or any other options that filter tables and schemas
must be the same.
When checking schema filters, only the schema names are checked, not the objects
contained in the schemas.
--no-compression - If this option is specified, it must be specified for all backups in the
backup set.
If compression is used on the full backup, compression must be used on the
incremental backups. Different compression levels are allowed for the backups in the backup
set. For a backup, the default is compression level 1.
If you try to add an incremental backup to a backup set, the backup operation fails if the gpbackup
options are not consistent.
For information about the gpbackup and gprestore utility options, see the gpbackup and gprestore
reference documentation.
Example Using Incremental Backup Sets
Each backup has a timestamp taken when the backup is created. For example, if you create a
backup on May 14, 2017, the backup file names contain 20170514hhmmss. The hhmmss represents the
time: hour, minute, and second.
This example assumes that you have created two full backups and incremental backups of the database mytest. To create the full backups, you used this command:
gpbackup --dbname mytest --backup-dir /mybackup --leaf-partition-data
You created incremental backups with this command:
gpbackup --dbname mytest --backup-dir /mybackup --leaf-partition-data --incremental
When you specify the --backup-dir option, the backups are created in the /mybackup directory on
each Greenplum Database host.
In the example, the full backups have the timestamp keys 20170514054532 and 20171114064330. The
other backups are incremental backups. The example consists of two backup sets, the first with two
incremental backups, and second with one incremental backup. The backups are listed from earliest
to most recent.
20170514054532 (full backup)
20170714095512
20170914081205
20171114064330 (full backup)
20180114051246
To create a new incremental backup based on the latest incremental backup, you must include the
same --backup-dir option as the incremental backup as well as the options --leaf-partition-data
and --incremental.
gpbackup --dbname mytest --backup-dir /mybackup --leaf-partition-data --incremental
You can specify the --from-timestamp option to create an incremental backup based on an existing
incremental or full backup. Based on the example, this command adds a fourth incremental backup
to the backup set that includes 20170914081205 as an incremental backup and uses 20170514054532
as the full backup.
gpbackup --dbname mytest --backup-dir /mybackup --leaf-partition-data --incremental --from-timestamp 20170914081205
This command creates an incremental backup set based on the full backup 20171114064330 and is
separate from the backup set that includes the incremental backup 20180114051246.
gpbackup --dbname mytest --backup-dir /mybackup --leaf-partition-data --incremental --from-timestamp 20171114064330
To restore a database with the incremental backup 20170914081205, you need the incremental backups 20170914081205 and 20170714095512, and the full backup 20170514054532. This would be the gprestore command.
gprestore --backup-dir /mybackup --timestamp 20170914081205
Creating an Incremental Backup with gpbackup
The gpbackup output displays the timestamp of the backup on which the incremental backup is
based. In this example, the incremental backup is based on the backup with timestamp
20180802171642. The backup 20180802171642 can be an incremental or full backup.
$ gpbackup --dbname test --backup-dir /backups --leaf-partition-data --incremental
20180803:15:40:51 gpbackup:gpadmin:mdw:002907-[INFO]:-Starting backup of database test
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Backup Timestamp = 20180803154051
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Backup Database = test
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Gathering list of tables for backup
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Acquiring ACCESS SHARE locks on tables
Locks acquired: 5 / 5 [================================================================] 100.00% 0s
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Gathering additional table metadata
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Metadata will be written to /backups/gpseg-1/backups/20180803/20180803154051/gpbackup_20180803154051_metadata.sql
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Writing global database metadata
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Global database metadata backup complete
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Writing pre-data metadata
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Pre-data metadata backup complete
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Writing post-data metadata
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Post-data metadata backup complete
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Basing incremental backup off of backup with timestamp = 20180802171642
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Writing data to file
Tables backed up: 4 / 4 [==============================================================] 100.00% 0s
20180803:15:40:52 gpbackup:gpadmin:mdw:002907-[INFO]:-Data backup complete
20180803:15:40:53 gpbackup:gpadmin:mdw:002907-[INFO]:-Found neither /usr/local/greenplum-db/./bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20180803:15:40:53 gpbackup:gpadmin:mdw:002907-[INFO]:-Email containing gpbackup report /backups/gpseg-1/backups/20180803/20180803154051/gpbackup_20180803154051_report will not be sent
20180803:15:40:53 gpbackup:gpadmin:mdw:002907-[INFO]:-Backup completed successfully
Restoring from an Incremental Backup with gprestore
When restoring from an incremental backup, you can specify the --verbose option to display the backups that are used in the restore operation on the command line. For example, the following gprestore command restores a backup using the timestamp 20180807162904, an incremental backup. The output includes the backups that were used to restore the database data.
$ gprestore --create-db --timestamp 20180807162904 --verbose
...
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[INFO]:-Pre-data metadata restore complete
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Verifying backup file count
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Restoring data from backup with timestamp: 20180807162654
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Reading data for table public.tbl_ao from file (table 1 of 1)
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Checking whether segment agents had errors during restore
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Restoring data from backup with timestamp: 20180807162819
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Reading data for table public.test_ao from file (table 1 of 1)
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Checking whether segment agents had errors during restore
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Restoring data from backup with timestamp: 20180807162904
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Reading data for table public.homes2 from file (table 1 of 4)
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Reading data for table public.test2 from file (table 2 of 4)
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Reading data for table public.homes2a from file (table 3 of 4)
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Reading data for table public.test2a from file (table 4 of 4)
20180807:16:31:56 gprestore:gpadmin:mdw:008603-[DEBUG]:-Checking whether segment agents had errors during restore
20180807:16:31:57 gprestore:gpadmin:mdw:008603-[INFO]:-Data restore complete
20180807:16:31:57 gprestore:gpadmin:mdw:008603-[INFO]:-Restoring post-data metadata
20180807:16:31:57 gprestore:gpadmin:mdw:008603-[INFO]:-Post-data metadata restore complete
...
The output shows that the restore operation used three backups.
When restoring from an incremental backup, gprestore also lists the backups that are used in the restore operation in the gprestore log file.
During the restore operation, gprestore displays an error if the full backup or other required
incremental backup is not available.
Incremental Backup Notes
To create an incremental backup, or to restore data from an incremental backup set, you need the
complete backup set. When you archive incremental backups, the complete backup set must be
archived. You must archive all the files created on the master and all segments.
Each time gpbackup runs, the utility adds backup information to the history file
gpbackup_history.yaml in the Greenplum Database master data directory. The file includes backup
options and other backup information.
If you do not specify the --from-timestamp option when you create an incremental backup,
gpbackup uses the most recent backup with a consistent set of options. The utility checks the backup
history file to find the backup with a consistent set of options. If the utility cannot find a backup with a
consistent set of options or the history file does not exist, gpbackup displays a message stating that a
full backup must be created before an incremental can be created.
If you specify the --from-timestamp option when you create an incremental backup, gpbackup
ensures that the options of the backup that is being created are consistent with the options of the
specified backup.
The gpbackup option --with-stats is not required to be the same for all backups in the backup set.
However, to perform a restore operation with the gprestore option --with-stats to restore statistics, the backup you specify must have been created with the --with-stats option.
You can perform a restore operation from any backup in the backup set. However, changes captured in incremental backups later than the backup used to restore database data will not be restored.
When restoring from an incremental backup set, gprestore checks the backups and restores each
append-optimized table from the most recent version of the append-optimized table in the backup
set and restores the heap tables from the latest backup.
The incremental backup set, a full backup and its associated incremental backups, must be on a single device. For example, the backups in a backup set must all be on a file system or must all be on a Data Domain system.
If you specify the gprestore option --incremental to restore data from a specific incremental
backup, you must also specify the --data-only option. Before performing the restore operation,
gprestore ensures that the tables being restored exist. If a table does not exist, gprestore returns an
error and exits.
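For example, using the hypothetical timestamp 20180114051246 from the earlier example, the following command restores only the data from that specific incremental backup into existing tables:
gprestore --backup-dir /mybackup --timestamp 20180114051246 --incremental --data-only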
Warning: Changes to the Greenplum Database segment configuration invalidate incremental
backups. After you change the segment configuration (add or remove segment instances), you must
create a full backup before you can create an incremental backup.
Parent topic:Creating and Using Incremental Backups with gpbackup and gprestore
Using gpbackup and gprestore with BoostFS
You can use the Greenplum Database gpbackup and gprestore utilities with the Data Domain DD
Boost File System Plug-In (BoostFS) to access a Data Domain system. BoostFS leverages DD Boost
technology and helps reduce bandwidth usage, can improve backup times, offers load balancing and in-flight encryption, and supports the Data Domain multi-tenancy feature set.
You install the BoostFS plug-in on the Greenplum Database host systems to provide access to a Data
Domain system as a standard file system mount point. With direct access to a BoostFS mount point,
gpbackup and gprestore can leverage the storage and network efficiencies of the DD Boost protocol
for backup and recovery.
For information about configuring BoostFS, you can download the BoostFS for Linux Configuration Guide from the Dell support site https://www.dell.com/support (requires login). After logging into the
support site, you can find the guide by searching for "BoostFS for Linux Configuration Guide". You
can limit your search results by choosing to list only Manuals & Documentation as resources.
To back up or restore with BoostFS, you include the option --backup-dir with the gpbackup or
gprestore command to access the Data Domain system.
Installing BoostFS
Backing Up and Restoring with BoostFS
Parent topic:Parallel Backup with gpbackup and gprestore
Installing BoostFS
Download the latest BoostFS RPM from the Dell support site https://www.dell.com/support (requires
login).
After logging into the support site, you can find the RPM by searching for "boostfs". You can limit
your search results by choosing to list only Downloads & Drivers as resources. To list the most recent
RPM near the top of your search results, sort your results by descending date.
The RPM supports both RHEL and SuSE.
These steps install BoostFS and create a mounted directory that accesses a Data Domain system.
Perform the steps on all Greenplum Database hosts. The mounted directory you create must be the
same on all hosts.
1. Copy the BoostFS RPM to the host and install the RPM.
After installation, the DDBoostFS package files are located under /opt/emc/boostfs.
2. Set up the BoostFS lockbox for the storage unit with the boostfs utility. Enter the Data Domain user password at the prompts.
/opt/emc/boostfs/bin/boostfs lockbox set -d <Data_Domain_IP> -s <Storage_Unit> -u <Data_Domain_User>
The <Storage_Unit> is the Data Domain storage unit ID. The <Data_Domain_User> is a Data
Domain user with access to the storage unit.
3. Create the directory in the location you want to mount BoostFS.
mkdir <path_to_mount_directory>
4. Mount the Data Domain storage unit with the boostfs utility. Use the mount option allow-others=true to allow other users to write to the BoostFS mounted file system.
/opt/emc/boostfs/bin/boostfs mount <path_to_mount_directory> -d <Data_Domain_IP> -s <Storage_Unit> -o allow-others=true
5. Confirm that the mount was successful by running this command.
mountpoint <mounted_directory>
The command lists the directory as a mount point.
<mounted_directory> is a mountpoint
You can now run gpbackup and gprestore with the --backup-dir option to back up a database to
<mounted_directory> on the Data Domain system and restore data from the Data Domain system.
Parent topic:Using gpbackup and gprestore with BoostFS
Backing Up and Restoring with BoostFS
These are required gpbackup options when backing up data to a Data Domain system with BoostFS.
--backup-dir - Specify the mounted Data Domain storage unit.
--no-compression - Disable compression. Data compression interferes with DD Boost data
de-duplication.
--single-data-file - Create a single data file on each segment host. A single data file
avoids a BoostFS stream limitation.
When you use gprestore to restore a backup from a Data Domain system with BoostFS, you must
specify the mounted Data Domain storage unit with the option --backup-dir.
When you use the gpbackup option --single-data-file, you cannot specify the --jobs option to
perform a parallel restore operation with gprestore.
This example gpbackup command backs up the database test. The example assumes that the
directory /boostfs-test is the mounted Data Domain storage unit.
$ gpbackup --dbname test --backup-dir /boostfs-test/ --single-data-file --no-compression
These commands drop the database test and restore the database from the backup.
$ dropdb test
$ gprestore --backup-dir /boostfs-test/ --timestamp 20171103153156 --create-db
The value 20171103153156 is the timestamp of the gpbackup backup set to restore. For information about how gpbackup uses timestamps when creating backups, see Parallel Backup with gpbackup and gprestore. For information about the --timestamp option, see gprestore.
Parent topic:Using gpbackup and gprestore with BoostFS
Using gpbackup Storage Plugins
You can configure the Greenplum Database gpbackup and gprestore utilities to use a storage plugin
to process backup files during a backup or restore operation. For example, during a backup
operation, the plugin sends the backup files to a remote location. During a restore operation, the
plugin retrieves the files from the remote location.
You can also develop a custom storage plugin with the Greenplum Database Backup/Restore
Storage Plugin API. See Backup/Restore Storage Plugin API.
Using the S3 Storage Plugin with gpbackup and gprestore
Using the DD Boost Storage Plugin with gpbackup, gprestore, and gpbackup_manager
Parent topic:Parallel Backup with gpbackup and gprestore
Using the S3 Storage Plugin with gpbackup and gprestore
The S3 storage plugin application lets you use an Amazon Simple Storage Service (Amazon S3)
location to store and retrieve backups when you run gpbackup and gprestore. Amazon S3 provides
secure, durable, highly-scalable object storage. The S3 plugin streams the backup data from a
named pipe (FIFO) directly to the S3 bucket without generating local disk I/O.
The S3 storage plugin can also connect to an Amazon S3 compatible service such as Dell EMC
Elastic Cloud Storage (ECS), Minio, and Cloudian HyperStore.
Prerequisites
Using Amazon S3 to back up and restore data requires an Amazon AWS account with access to the
Amazon S3 bucket. These are the Amazon S3 bucket permissions required for backing up and
restoring data:
Upload/Delete for the S3 user ID that uploads the files
Open/Download and View for the S3 user ID that accesses the files
For information about Amazon S3, see Amazon S3. For information about Amazon S3 regions and
endpoints, see AWS service endpoints. For information about S3 buckets and folders, see the
Amazon S3 documentation.
Installing the S3 Storage Plugin
The S3 storage plugin is included with the Tanzu Greenplum Backup and Restore release. Use the latest S3 plugin release with the latest Tanzu Greenplum Backup and Restore to avoid incompatibilities.
Open Source Greenplum Backup and Restore customers may get the utility from gpbackup-s3-
plugin. Build the utility following the steps in Building and Installing the S3 plugin.
The S3 storage plugin application must be in the same location on every Greenplum Database host,
for example $GPHOME/bin/gpbackup_s3_plugin. The S3 storage plugin requires a configuration file,
installed only on the master host.
Using the S3 Storage Plugin
To use the S3 storage plugin application, specify the location of the plugin, the S3 login credentials,
and the backup location in a configuration file. For information about the configuration file, see S3
Storage Plugin Configuration File Format.
When running gpbackup or gprestore, specify the configuration file with the option --plugin-config.
gpbackup --dbname <database-name> --plugin-config /<path-to-config-file>/<s3-config-file>.yaml
When you perform a backup operation using gpbackup with the --plugin-config option, you must
also specify the --plugin-config option when restoring with gprestore.
gprestore --timestamp <YYYYMMDDHHMMSS> --plugin-config /<path-to-config-file>/<s3-config-file>.yaml
The S3 plugin stores the backup files in the S3 bucket, in a location similar to:
<folder>/backups/<datestamp>/<timestamp>
Where folder is the location you specified in the S3 configuration file, and datestamp and timestamp
are the backup date and time stamps.
The S3 storage plugin logs are in <gpadmin_home>/gpAdmin/gpbackup_s3_plugin_<timestamp>.log on each Greenplum host system. The timestamp format is YYYYMMDDHHMMSS.
Example
This is an example S3 storage plugin configuration file, named s3-test-config.yaml, that is used in
the next gpbackup example command.
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: us-west-2
  aws_access_key_id: test-s3-user
  aws_secret_access_key: asdf1234asdf
  bucket: gpdb-backup
  folder: test/backup3
This gpbackup example backs up the database demo using the S3 storage plugin configuration file at the absolute path /home/gpadmin/s3-test/s3-test-config.yaml.
gpbackup --dbname demo --plugin-config /home/gpadmin/s3-test/s3-test-config.yaml
The S3 storage plugin writes the backup files to this S3 location in the AWS region us-west-2.
gpdb-backup/test/backup3/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/
This example restores a specific backup set defined by the 20201206233124 timestamp, using the S3
plugin configuration file.
gprestore --timestamp 20201206233124 --plugin-config /home/gpadmin/s3-test/s3-test-config.yaml
S3 Storage Plugin Configuration File Format
The configuration file specifies the absolute path to the Greenplum Database S3 storage plugin
executable, connection credentials, and S3 location.
The S3 storage plugin configuration file uses the YAML 1.1 document format and implements its own
schema for specifying the location of the Greenplum Database S3 storage plugin, connection
credentials, and S3 location and login information.
The configuration file must be a valid YAML document. The gpbackup and gprestore utilities process
the control file document in order and use indentation (spaces) to determine the document
hierarchy and the relationships of the sections to one another. The use of white space is significant.
White space should not be used simply for formatting purposes, and tabs should not be used at all.
This is the structure of an S3 storage plugin configuration file.
executablepath: <absolute-path-to-gpbackup_s3_plugin>
options:
  region: <aws-region>
  endpoint: <S3-endpoint>
  aws_access_key_id: <aws-user-id>
  aws_secret_access_key: <aws-user-id-key>
  bucket: <s3-bucket>
  folder: <s3-location>
  encryption: [on|off]
  backup_max_concurrent_requests: [int]
  # default value is 6
  backup_multipart_chunksize: [string]
  # default value is 500MB
  restore_max_concurrent_requests: [int]
  # default value is 6
  restore_multipart_chunksize: [string]
  # default value is 500MB
  http_proxy:
    <http://<your_username>:<your_secure_password>@proxy.example.com:proxy_port>
Note: The S3 storage plugin does not support filtered restore operations and the associated
restore_subset plugin configuration property.
executablepath
Required. Absolute path to the plugin executable. For example, the Tanzu Greenplum
installation location is $GPHOME/bin/gpbackup_s3_plugin. The plugin must be in the same
location on every Greenplum Database host.
options
Required. Begins the S3 storage plugin options section.
region
Required for AWS S3. If connecting to an S3 compatible service, this option is not required, with one exception: if you are using Minio object storage and have specified a value for the Region setting on the Minio server side, you must set this region option to the same value.
endpoint
Required for an S3 compatible service. Specify this option to connect to an S3 compatible
service such as ECS. The plugin connects to the specified S3 endpoint (hostname or IP
address) to access the S3 compatible data store.
If this option is specified, the plugin ignores the region option and does not use AWS to
resolve the endpoint. When this option is not specified, the plugin uses the region to
determine AWS S3 endpoint.
aws_access_key_id
Optional. The S3 ID to access the S3 bucket location that stores backup files.
If this parameter is not specified, S3 authentication uses information from the session
environment. See aws_secret_access_key
aws_secret_access_key
Required only if you specify aws_access_key_id. The S3 passcode for the S3 ID to access the
S3 bucket location.
If aws_access_key_id and aws_secret_access_key are not specified in the configuration file,
the S3 plugin uses S3 authentication information from the system environment of the session
running the backup operation. The S3 plugin searches for the information in these sources,
using the first available source.
1. The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
2. The authentication information set with the AWS CLI command aws configure.
3. The credentials of the Amazon EC2 IAM role if the backup is run from an EC2
instance.
bucket
Required. The name of the S3 bucket in the AWS region or S3 compatible data store. The
bucket must exist.
folder
Required. The S3 location for backups. During a backup operation, the plugin creates the S3
location if it does not exist in the S3 bucket.
encryption
Optional. Enable or disable use of Secure Sockets Layer (SSL) when connecting to an S3
location. Default value is on, use connections that are secured with SSL. Set this option to off
to connect to an S3 compatible service that is not configured to use SSL.
Any value other than on or off is not accepted.
backup_max_concurrent_requests
Optional. The segment concurrency level for a file artifact within a single backup/upload request. The default value is 6. Use this parameter in conjunction with the gpbackup --jobs flag to increase your overall backup concurrency (see the example command after this list).
Example: In a 4 node cluster, with 12 segments (3 per node), if the --jobs flag is set to 10,
there could be 120 concurrent backup requests. With the backup_max_concurrent_requests
parameter set to 6, the total S3 concurrent upload threads during a single backup session
would reach 720 (120 x 6).
Note: If the upload artifact is 10MB (see backup_multipart_chunksize), the
backup_max_concurrent_requests parameter would not take effect since the file is smaller
than the chunk size.
backup_multipart_chunksize
Optional. The file chunk size of the S3 multipart upload request in Megabytes (for example 20MB), Gigabytes (for example 1GB), or bytes (for example 1048576B). The default value is 500MB and the minimum value is 5MB (or 5242880B). Use this parameter along with the --jobs flag and the backup_max_concurrent_requests parameter to fine tune your backups. Set the chunk size based on your individual segment file size. S3 supports up to 10,000 total parts for a single multipart file upload.
restore_max_concurrent_requests
Optional. The level of concurrency for downloading a file artifact within a single restore
request. The default value is set to 6.
restore_multipart_chunksize
Optional. The file chunk size of the S3 multipart download request in Megabytes (for example
20MB), Gigabytes (for example 1GB), or bytes (for example 1048576B). The default value is
500MB. Use this parameter along with the restore_max_concurrent_requests parameter to
fine tune your restores.
http_proxy
Optional. Allow AWS S3 access via a proxy server. The parameter should contain the proxy
url in the form of http://username:[email protected]:proxy_port or
http://proxy.example.com:proxy_port.
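For example, the following command is a sketch that assumes the sample s3-test-config.yaml file shown earlier; it combines the gpbackup --jobs flag with the plugin configuration to increase overall backup concurrency, as described for backup_max_concurrent_requests above:
gpbackup --dbname demo --jobs 10 --plugin-config /home/gpadmin/s3-test/s3-test-config.yaml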
Parent topic:Using gpbackup Storage Plugins
Using the DD Boost Storage Plugin with gpbackup, gprestore, and gpbackup_manager
Note: The DD Boost storage plugin is available only in the commercial release of Tanzu Greenplum
Backup and Restore.
The DD Boost Storage Plugin can be used with the gpbackup and gprestore utilities to perform faster
backups to the Dell EMC Data Domain storage appliance, which uses Dell EMC Data Domain Boost
(DD Boost) software. The DD Boost Storage Plugin supports filtered restore, which increases
performance by selectively reading and restoring only the subset of backup data that you specify via
gprestore filter options.
You can also create disaster recovery scenarios using gpbackup or gpbackup_manager by
replicating a backup on a separate, remote Data Domain appliance. See Replicating Backups for
more information.
The DD Boost storage plugin is installed in the $GPHOME/bin directory of your Greenplum master host
when you add the gpbackup package to your installation.
To use the DD Boost storage plugin application, you first create a configuration file to specify the
location of the plugin, the DD Boost login, and the backup location. For information about the
configuration file, see DD Boost Storage Plugin Configuration File Format.
To run gpbackup or gprestore with the plugin, specify the configuration file with the option --plugin-config.
If you perform a backup operation with the gpbackup option --plugin-config, you must also specify
the --plugin-config option when you restore the backup with gprestore.
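For example, a restore from a backup made with the DD Boost plugin might look like the following sketch, which assumes an illustrative timestamp and the ddboost-test-config.yaml configuration file shown in the Examples section below:
gprestore --timestamp 20230601120000 --create-db --plugin-config /home/gpadmin/ddboost-test-config.yaml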
DD Boost Storage Plugin Configuration File Format
The configuration file specifies the absolute path to the Greenplum Database DD Boost storage
plugin executable, DD Boost connection credentials, and Data Domain location. The configuration
file is required only on the master host. The DD Boost storage plugin application must be in the same
location on every Greenplum Database host.
The DD Boost storage plugin configuration file uses the YAML 1.1 document format and implements
its own schema for specifying the DD Boost information.
The configuration file must be a valid YAML document. The gpbackup and gprestore utilities process
the configuration file document in order and use indentation (spaces) to determine the document
hierarchy and the relationships of the sections to one another. The use of white space is significant.
White space should not be used simply for formatting purposes, and tabs should not be used at all.
This is the structure of a DD Boost storage plugin configuration file.
executablepath: <absolute-path-to-gpbackup_ddboost_plugin>
options:
  hostname: "<data-domain-host>"
  username: "<ddboost-ID>"
  password_encryption: "on" | "off"
  password: "<ddboost-pwd>"
  storage_unit: "<data-domain-id>"
  directory: "<data-domain-dir>"
  replication: "on" | "off"
  replication_streams: <integer>
  remote_hostname: "<remote-dd-host>"
  remote_username: "<remote-ddboost-ID>"
  remote_password_encryption: "on" | "off"
  remote_password: "<remote-dd-pwd>"
  remote_storage_unit: "<remote-dd-ID>"
  remote_directory: "<remote-dd-dir>"
  restore_subset: "on" | "off"
executablepath
Required. Absolute path to the plugin executable. For example, the Tanzu Greenplum
installation location is $GPHOME/bin/gpbackup_ddboost_plugin. The plugin must be in the same
location on every Greenplum Database host.
options
Required. Begins the DD Boost storage plugin options section.
hostname
Required. The IP address or hostname of the host. There is a 30-character limit.
username
Required. The Data Domain Boost user name. There is a 30-character limit.
password_encryption
Optional. Specifies whether the password option value is encrypted. Default value is off.
Use the gpbackup_manager encrypt-password command to encrypt the plain-text
password for the DD Boost user. If the replication option is on, gpbackup_manager also
encrypts the remote Data Domain user's password. Copy the encrypted password(s)
from the gpbackup_manager output to the password options in the configuration file.
password
Required. The passcode for the DD Boost user to access the Data Domain storage unit.
If the password_encryption option is on, this is an encrypted password.
storage_unit
Required. A valid storage unit name for the Data Domain system that is used for backup
and restore operations.
directory
Required. The location for the backup files, configuration files, and global objects on the
Data Domain system. The location on the system is /<data-domain-dir> in the storage
unit of the system.
: During a backup operation, the plugin creates the directory location if it does not exist in the
storage unit and stores the backup in this directory /<data-domain-
dir>/YYYYMMDD/YYYYMMDDHHMMSS/.
replication
Optional. Enables or disables backup replication with DD Boost managed file replication
when gpbackup performs a backup operation. Value is either on or off. Default value is
off, backup replication is disabled. When the value is on, the DD Boost plugin replicates
the backup on the Data Domain system that you specify with the remote_* options.
: The replication option and remote_* options are ignored when performing a restore
operation with gprestore. The remote_* options are ignored if replication is off.
: This option is ignored when you perform replication with the gpbackup_manager replicate-backup command. For information about replication, see Replicating Backups.
replication_streams
Optional. Used with the gpbackup_manager replicate-backup command, ignored
otherwise. Specifies the maximum number of Data Domain I/O streams that can be
used when replicating a backup set on a remote Data Domain server from the Data
Domain server that contains the backup. Default value is 1.
: This option is ignored when you perform replication with gpbackup. The default value is used.
remote_hostname
Required when performing replication. The IP address or hostname of the Data Domain
system that is used for remote backup storage. There is a 30-character limit.
remote_username
Required when performing replication. The Data Domain Boost user name that
accesses the remote Data Domain system. There is a 30-character limit.
remote_password_encryption
Optional when performing replication. Specifies whether the remote_password option
value is encrypted. The default value is off. To set up password encryption use the
gpbackup_manager encrypt-password command to encrypt the plain-text passwords
for the DD Boost user. If the replication parameter is on, gpbackup_manager also
encrypts the remote Data Domain user's password. Copy the encrypted passwords from
the gpbackup_manager output to the password options in the configuration file.
remote_password
Required when performing replication. The passcode for the DD Boost user to access
the Data Domain storage unit on the remote system. If the
remote_password_encryption option is on, this is an encrypted password.
remote_storage_unit
Required when performing replication. A valid storage unit name for the remote Data
Domain system that is used for backup replication.
remote_directory
Required when performing replication. The location for the replicated backup files,
configuration files, and global objects on the remote Data Domain system. The location
on the system is /<remote-dd-dir> in the storage unit of the remote system.
: During a backup operation, the plugin creates the directory location if it does not exist in the
storage unit of the remote Data Domain system and stores the replicated backup in this
directory /<remote-dd-dir>/YYYYMMDD/YYYYMMDDHHMMSS/.
restore_subset
Optional. When gpbackup and gprestore commands specify certain backup and filter
conditions (see Filtered Restore with the DD Boost Storage Plugin), specifies whether
gprestore should perform a filtered restore operation. The default value is on, perform
a filtered restore. Set restore_subset to off to disable this optimization.
Examples
This is an example DD Boost storage plugin configuration file that is used in the next gpbackup
example command. The name of the file is ddboost-test-config.yaml.
executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.2.230"
  username: "test-ddb-user"
  password: "asdf1234asdf"
  storage_unit: "gpdb-backup"
  directory: "test/backup"
This gpbackup example backs up the database demo using the DD Boost storage plugin. The absolute path to the DD Boost storage plugin configuration file is /home/gpadmin/ddboost-test-config.yaml.
gpbackup --dbname demo --single-data-file --no-compression --plugin-config /home/gpadmin/ddboost-test-config.yaml
The DD Boost storage plugin writes the backup files to this directory of the Data Domain storage unit
gpdb-backup.
<directory>/backups/<datestamp>/<timestamp>
Where:
<directory> is the location you specified in the DD Boost configuration file.
<datestamp> is the backup date stamp.
<timestamp> is the backup time stamp.
For example:
/test/backup/<YYYYMMDD>/<YYYYMMDDHHMMSS>/
This is an example DD Boost storage plugin configuration file that enables replication.
executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.2.230"
  username: "test-ddb-user"
  password: "asdf1234asdf"
  storage_unit: "gpdb-backup"
  directory: "test/backup"
  replication: "on"
  remote_hostname: "192.0.3.20"
  remote_username: "test-dd-remote"
  remote_password: "qwer2345erty"
  remote_storage_unit: "gpdb-remote"
  remote_directory: "test/replication"
To restore from the replicated backup in the previous example, you can run gprestore with the DD
Boost storage plugin and specify a configuration file with this information.
executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.3.20"
  username: "test-dd-remote"
  password: "qwer2345erty"
  storage_unit: "gpdb-remote"
  directory: "test/replication"
Best Practices
Include these recommended flags when using the DD Boost storage plugin:
--no-compression, because compressed data does not allow DD Boost to do any
deduplication.
--single-data-file, because multiple data files may cause additional overhead on the Data
Domain file system, resulting in slower than optimal backup speed.
Filtered Restore with the DD Boost Storage Plugin
Filtered restore increases performance by reading and restoring only a subset of the backup data
stored on the DD Boost storage system.
gprestore performs a filtered restore operation with the DD Boost Storage plugin when all of the
following conditions hold:
You specify the --plugin-config ddboost-config.yml option when you invoke both the
gpbackup and gprestore commands.
The backup is an uncompressed, single-data-file backup (you invoked the gpbackup
command with the --no-compression and --single-data-file flags).
You specify filtering options (--include-table, --exclude-table, --include-table-file, or --exclude-table-file) on the gprestore command line.
The DD Boost Storage Plugin reads only the relations that you specify from the backup file on the
DD Boost storage system, and restores them in Greenplum Database.
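For example, the following pair of commands is a sketch that satisfies these conditions and triggers a filtered restore; the table name and timestamp are illustrative:
gpbackup --dbname demo --single-data-file --no-compression --plugin-config /home/gpadmin/ddboost-test-config.yaml
gprestore --timestamp 20230601120000 --include-table public.sales --create-db --plugin-config /home/gpadmin/ddboost-test-config.yaml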
Notes
Dell EMC DD Boost is integrated with Tanzu Greenplum and requires a DD Boost license. Open
source Greenplum Database cannot use the DD Boost software, but can back up to a Dell EMC Data
Domain system mounted as an NFS share on the Greenplum master and segment hosts.
Parent topic:Using gpbackup Storage Plugins
Backup/Restore Storage Plugin API
This topic describes how to develop a custom storage plugin with the Greenplum Database
Backup/Restore Storage Plugin API.
The Backup/Restore Storage Plugin API provides a framework that you can use to develop and
integrate a custom backup storage system with the Greenplum Database gpbackup,
gpbackup_manager, and gprestore utilities.
The Backup/Restore Storage Plugin API defines a set of interfaces that a plugin must support. The
API also specifies the format and content of a configuration file for a plugin.
When you use the Backup/Restore Storage Plugin API, you create a plugin that the Greenplum
Database administrator deploys to the Greenplum Database cluster. Once deployed, the plugin is
available for use in certain backup and restore operations.
This topic includes the following subtopics:
Plugin Configuration File
Plugin API
Plugin Commands
Implementing a Backup/Restore Storage Plugin
Verifying a Backup/Restore Storage Plugin
Packaging and Deploying a Backup/Restore Storage Plugin
Parent topic:Parallel Backup with gpbackup and gprestore
Plugin Configuration File
Specifying the --plugin-config option to the gpbackup and gprestore commands instructs the
utilities to use the plugin specified in the configuration file for the operation.
The plugin configuration file provides information for both Greenplum Database and the plugin. The
Backup/Restore Storage Plugin API defines the format of, and certain keywords used in, the plugin
configuration file.
A plugin configuration file is a YAML file in the following format:
executablepath: <path_to_plugin_executable>
options:
restore_subset: "on" | "off"
<keyword1>: <value1>
<keyword2>: <value2>
...
<keywordN>: <valueN>
gpbackup and gprestore use the **executablepath** value to determine the file system location of
the plugin executable program.
gprestore uses the **restore_subset** configuration setting to determine if the plugin supports a filtered restore operation. The default value is "off", which indicates that the plugin does not support filtered restore.
The plugin configuration file may also include keywords and values specific to a plugin instance. A
backup/restore storage plugin can use the **options** block specified in the file to obtain
information from the user that may be required to perform its tasks. This information may include
location, connection, or authentication information, for example. The plugin should both specify and
consume the content of this information in keyword:value syntax.
A sample plugin configuration file for the Greenplum Database S3 backup/restore storage plugin
follows:
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: us-west-2
  aws_access_key_id: notarealID
  aws_secret_access_key: notarealkey
  bucket: gp_backup_bucket
  folder: greenplum_backups
Plugin API
The plugin that you implement when you use the Backup/Restore Storage Plugin API is an
executable program that supports specific commands invoked by gpbackup and gprestore at defined
points in their respective life cycle operations:
The Greenplum Database Backup/Restore Storage Plugin API provides hooks into the
gpbackup lifecycle at initialization, during backup, and at cleanup/exit time.
The API provides hooks into the gprestore lifecycle at initialization, during restore, and at
cleanup/exit time.
The API provides arguments that specify the execution scope (master host, segment host, or
segment instance) for a plugin setup or cleanup command. The scope can be one of these
values.
master - Execute the plugin once on the master host.
segment_host - Execute the plugin once on each of the segment hosts.
segment - Execute the plugin once for each active segment instance on the host running the segment instance. The Greenplum Database hosts and segment instances are based on the Greenplum Database configuration when the backup started. The values segment_host and segment are provided because a segment host can host multiple segment instances, and some setup or cleanup might be required at the segment host level rather than for each segment instance.
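For example, gpbackup might invoke a hypothetical plugin installed at /usr/local/bin/my_plugin for cleanup on a single segment instance as follows; the paths, scope, and content ID are illustrative, and the argument order follows the cleanup_plugin_for_backup synopsis later in this topic:
/usr/local/bin/my_plugin cleanup_plugin_for_backup /home/gpadmin/my_plugin_config.yaml /data/primary/gpseg1/backups/20230601/20230601120000 segment 1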
The Plugin API also defines the delete_backup command, which is called by the gpbackup_manager
utility. (The gpbackup_manager source code is proprietary and the utility is available only in the Tanzu
Greenplum Backup and Restore download from VMware Tanzu Network.)
The Backup/Restore Storage Plugin API defines the following call syntax for a backup/restore
storage plugin executable program:
<plugin_executable> <command> <config_file> <args>
where:
plugin_executable - The absolute path of the backup/restore storage plugin executable
program. This path is determined by the executablepath property value configured in the
plugin's configuration YAML file.
command - The name of a Backup/Restore Storage Plugin API command that identifies a
specific entry point to a gpbackup or gprestore lifecycle operation.
config_file - The absolute path of the plugin's configuration YAML file.
args - The command arguments; the actual arguments differ depending upon the command
specified.
Plugin Commands
The Greenplum Database Backup/Restore Storage Plugin API defines the following commands:
plugin_api_version - Return the version of the Backup/Restore Storage Plugin API supported by the plugin. The currently supported version is 0.4.0.
setup_plugin_for_backup - Initialize the plugin for a backup operation.
backup_file - Move a backup file to the remote storage system.
backup_data - Move streaming data from stdin to a file on the remote storage system.
delete_backup - Delete the directory specified by the given backup timestamp on the remote system.
cleanup_plugin_for_backup - Clean up after a backup operation.
setup_plugin_for_restore - Initialize the plugin for a restore operation.
restore_file - Move a backup file from the remote storage system to a designated location on the local host.
restore_data - Move a backup file from the remote storage system, streaming the data to stdout.
restore_data_subset - Use filtered restore to move selected relations from an uncompressed, single-data-file backup file from the remote storage system, streaming the data to stdout.
cleanup_plugin_for_restore - Clean up after a restore operation.
A backup/restore storage plugin must support every command identified above, even if it is a no-op.
Implementing a Backup/Restore Storage Plugin
You can implement a backup/restore storage plugin executable in any programming or scripting
language.
The tasks performed by a backup/restore storage plugin will be very specific to the remote storage
system. As you design the plugin implementation, you will want to:
Examine the connection and data transfer interface to the remote storage system.
Identify the storage path specifics of the remote system.
Identify configuration information required from the user.
Define the keywords and value syntax for information required in the plugin configuration
file.
Determine if, and how, the plugin will modify (compress, etc.) the data en route to/from the
remote storage system.
Define a mapping between a gpbackup file path and the remote storage system.
Identify how gpbackup options affect the plugin, as well as which are required and/or not
applicable. For example, if the plugin performs its own compression, gpbackup must be
invoked with the --no-compression option to prevent the utility from compressing the data.
A backup/restore storage plugin that you implement must:
Support all plugin commands identified in Plugin Commands. Each command must exit with
the values identified on the command reference page.
Refer to the gpbackup-s3-plugin github repository for an example plugin implementation.
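The following is a minimal sketch of such a plugin written as a bash script. It assumes hypothetical remote_cp and remote_rm helper commands that stand in for your storage system's transfer and delete tooling, and it omits restore_data_subset and real error handling; it is meant only to illustrate the command dispatch structure, not to serve as a working implementation.
#!/bin/bash
# Minimal backup/restore storage plugin sketch.
# remote_cp and remote_rm are hypothetical placeholders for your storage
# system's tooling; replace them with real commands.
command="$1"
config_file="$2"   # plugin configuration YAML file (not parsed in this sketch)
shift 2
case "$command" in
  plugin_api_version)
    echo "0.4.0" ;;                       # API version this plugin supports
  setup_plugin_for_backup|setup_plugin_for_restore|cleanup_plugin_for_backup|cleanup_plugin_for_restore)
    ;;                                    # no per-host setup or cleanup needed in this sketch
  backup_file)
    remote_cp "$1" "remote:$1" ;;         # copy the local backup file; keep the local copy
  restore_file)
    remote_cp "remote:$1" "$1" ;;         # copy the file back to the same local path
  backup_data)
    remote_cp - "remote:$1" ;;            # stream stdin to the destination keyed by $1
  restore_data)
    remote_cp "remote:$1" - ;;            # stream the keyed file to stdout
  delete_backup)
    remote_rm -r "remote:$1" ;;           # remove the backup directory for the given timestamp
  *)
    echo "unsupported command: $command" >&2
    exit 1 ;;
esac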
Verifying a Backup/Restore Storage Plugin
The Backup/Restore Storage Plugin API includes a test bench that you can run to ensure that a
plugin is well integrated with gpbackup and gprestore.
The test bench is a bash script that you run in a Greenplum Database installation. The script
generates a small (<1MB) data set in a Greenplum Database table, explicitly tests each command, and
runs a backup and restore of the data (file and streaming). The test bench invokes gpbackup and
gprestore, which in turn individually call/test each Backup/Restore Storage Plugin API command
implemented in the plugin.
The test bench program calling syntax is:
plugin_test_bench.sh <plugin_executable> <plugin_config>
Procedure
To run the Backup/Restore Storage Plugin API test bench against a plugin:
1. Log in to the Greenplum Database master host and set up your environment. For example:
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
2. Obtain a copy of the test bench from the gpbackup github repository. For example:
$ git clone git@github.com:greenplum-db/gpbackup.git
The clone operation creates a directory named gpbackup/ in the current working directory.
3. Locate the test bench program in the gpbackup/master/plugins directory. For example:
$ ls gpbackup/master/plugins/plugin_test_bench.sh
4. Copy the plugin executable program and the plugin configuration YAML file from your
development system to the Greenplum Database master host. Note the file system location
to which you copied the files.
5. Copy the plugin executable program from the Greenplum Database master host to the same
file system location on each segment host.
6. If required, edit the plugin configuration YAML file to specify the absolute path of the plugin
executable program that you just copied to the Greenplum segments.
7. Run the test bench program against the plugin. For example:
$ gpbackup/master/plugins/plugin_test_bench.sh /path/to/pluginexec /path/to/plugincfg.yaml
8. Examine the test bench output. Your plugin passed the test bench if all output messages
specify RUNNING and PASSED. For example:
# ----------------------------------------------
# Starting gpbackup plugin tests
# ----------------------------------------------
[RUNNING] plugin_api_version
[PASSED] plugin_api_version
[RUNNING] setup_plugin_for_backup
[RUNNING] backup_file
[RUNNING] setup_plugin_for_restore
[RUNNING] restore_file
[PASSED] setup_plugin_for_backup
[PASSED] backup_file
[PASSED] setup_plugin_for_restore
[PASSED] restore_file
[RUNNING] backup_data
[RUNNING] restore_data
[PASSED] backup_data
[PASSED] restore_data
[RUNNING] cleanup_plugin_for_backup
[PASSED] cleanup_plugin_for_backup
[RUNNING] cleanup_plugin_for_restore
[PASSED] cleanup_plugin_for_restore
[RUNNING] gpbackup with test database
[RUNNING] gprestore with test database
[PASSED] gpbackup and gprestore
# ----------------------------------------------
# Finished gpbackup plugin tests
# ----------------------------------------------
Packaging and Deploying a Backup/Restore Storage Plugin
Your backup/restore storage plugin is ready to be deployed to a Greenplum Database installation
after the plugin passes your testing and the test bench verification. When you package the
backup/restore storage plugin, consider the following:
The backup/restore storage plugin must be installed in the same file system location on
every host in the Greenplum Database cluster. Provide installation instructions for the plugin
that identify this location.
The gpadmin user must have permission to traverse the file system path to the
backup/restore plugin executable program.
Include a template configuration file with the plugin.
Document the valid plugin configuration keywords, making sure to include the syntax of
expected values.
Document required gpbackup options and how they affect plugin processing.
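For example, a template configuration file for a shell-based plugin might look like the following sketch. The executable path, the my_plugin name, and the remote_dir option are illustrative; the keys under options are defined entirely by your plugin.
executablepath: /usr/local/bin/my_plugin
options:
  remote_dir: /mnt/remote_backups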
backup_data
Plugin command to move streaming data from stdin to the remote storage system.
Synopsis
<plugin_executable> backup_data <plugin_config_file> <data_filenamekey>
Description
gpbackup invokes the backup_data plugin command on each segment host during a streaming
backup.
The backup_data implementation should read a potentially large stream of data from stdin and write
the data to a single file on the remote storage system. The data is sent to the command as a single
continuous stream per Greenplum Database segment. If backup_data modifies the data in any
manner (for example, compresses it), restore_data must perform the reverse operation.
Name the destination file data_filenamekey, or maintain a mapping from the destination file to data_filenamekey. This is the file key
used for the restore operation.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
data_filenamekey
The mapping key for a specially-named backup file for streamed data.
Exit Code
The backup_data command must exit with a value of 0 on success, non-zero if an error occurs. In
the case of a non-zero exit code, gpbackup displays the contents of stderr to the user.
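The following is a minimal sketch of a backup_data implementation written as a bash script. It assumes the hypothetical remote_dir option shown in the template configuration above and simply writes the stream to a file named for data_filenamekey; a real plugin would typically transfer the stream to its storage service.
#!/bin/bash
# Minimal backup_data sketch. Assumes the plugin configuration contains a
# hypothetical "remote_dir" option that points at a mounted backup location.
plugin_config_file=$1
data_filenamekey=$2

# Read the remote_dir value from the YAML configuration file.
remote_dir=$(awk '/^ *remote_dir:/ {print $2}' "$plugin_config_file")

# Stream stdin to a file keyed by data_filenamekey. Because this sketch does
# not modify the data, the matching restore_data sketch only streams it back.
mkdir -p "$remote_dir"
cat - > "${remote_dir}/${data_filenamekey}" || exit 1
exit 0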
backup_file
Plugin command to move a backup file to the remote storage system.
Synopsis
<plugin_executable> backup_file <plugin_config_file> <file_to_backup>
Description
gpbackup invokes the backup_file plugin command on the master and each segment host for each
file that gpbackup writes to a backup directory on local disk.
The backup_file implementation should process and copy the file to the remote storage system. Do
not remove the local copy of the file that you specify with file_to_backup.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
file_to_backup
The absolute path to a local backup file generated by gpbackup. Do not remove the local copy
of the file that you specify with file_to_backup.
Exit Code
The backup_file command must exit with a value of 0 on success, non-zero if an error occurs. In
the case of a non-zero exit code, gpbackup displays the contents of stderr to the user.
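A minimal backup_file sketch, using the same hypothetical remote_dir convention as the backup_data sketch above. It copies the local backup file to the remote location and leaves the local copy in place.
#!/bin/bash
# Minimal backup_file sketch. The local file must not be removed.
plugin_config_file=$1
file_to_backup=$2

remote_dir=$(awk '/^ *remote_dir:/ {print $2}' "$plugin_config_file")
mkdir -p "$remote_dir"
cp "$file_to_backup" "${remote_dir}/$(basename "$file_to_backup")" || exit 1
exit 0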
cleanup_plugin_for_backup
Plugin command to clean up a storage plugin after backup.
Synopsis
<plugin_executable> cleanup_plugin_for_backup <plugin_config_file> <local_backup_dir> <scope>
<plugin_executable> cleanup_plugin_for_backup <plugin_config_file> <local_backup_dir> <scope> <contentID>
Description
gpbackup invokes the cleanup_plugin_for_backup plugin command when a gpbackup operation
completes, both in success and failure cases. The scope argument specifies the execution scope.
gpbackup will invoke the command with each of the scope values.
The cleanup_plugin_for_backup command should perform the actions necessary to clean up the
remote storage system after a backup. Cleanup activities may include removing remote directories
or temporary files created during the backup, disconnecting from the backup service, etc.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
local_backup_dir
The local directory on the Greenplum Database host (master and segments) to which gpbackup
wrote backup files.
When scope is master, the local_backup_dir is the backup directory of the
Greenplum Database master.
When scope is segment, the local_backup_dir is the backup directory of a segment
instance. The contentID identifies the segment instance.
When the scope is segment_host, the local_backup_dir is an arbitrary backup
directory on the host.
scope
The execution scope value indicates the host and number of times the plugin command is
executed. scope can be one of these values:
master - Execute the plugin command once on the master host.
segment_host - Execute the plugin command once on each of the segment hosts.
segment - Execute the plugin command once for each active segment instance on the
host running the segment instance. The contentID identifies the segment instance.
The Greenplum Database hosts and segment instances are based on the Greenplum Database
configuration when the backup was first initiated.
contentID
The contentID of the Greenplum Database master or segment instance corresponding to the
scope. contentID is passed only when the scope is master or segment.
When scope is master, the contentID is -1.
When scope is segment, the contentID is the content identifier of an active segment
instance.
Exit Code
The cleanup_plugin_for_backup command must exit with a value of 0 on success, non-zero if an
error occurs. In the case of a non-zero exit code, gpbackup displays the contents of stderr to the
user.
cleanup_plugin_for_restore
Plugin command to clean up a storage plugin after restore.
Synopsis
<plugin_executable> cleanup_plugin_for_restore <plugin_config_file> <local_backup_dir> <scope>
<plugin_executable> cleanup_plugin_for_restore <plugin_config_file> <local_backup_dir> <scope> <contentID>
Description
gprestore invokes the cleanup_plugin_for_restore plugin command when a gprestore operation
completes, both in success and failure cases. The scope argument specifies the execution scope.
gprestore will invoke the command with each of the scope values.
The cleanup_plugin_for_restore implementation should perform the actions necessary to clean up
the remote storage system after a restore. Cleanup activities may include removing remote
directories or temporary files created during the restore, disconnecting from the backup service, etc.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
local_backup_dir
The local directory on the Greenplum Database host (master and segments) from which
gprestore reads backup files.
When scope is master, the local_backup_dir is the backup directory of the
Greenplum Database master.
When scope is segment, the local_backup_dir is the backup directory of a segment
instance. The contentID identifies the segment instance.
When the scope is segment_host, the local_backup_dir is an arbitrary backup
directory on the host.
scope
The execution scope value indicates the host and number of times the plugin command is
executed. scope can be one of these values:
master - Execute the plugin command once on the master host.
segment_host - Execute the plugin command once on each of the segment hosts.
segment - Execute the plugin command once for each active segment instance on the
host running the segment instance. The contentID identifies the segment instance.
The Greenplum Database hosts and segment instances are based on the Greenplum Database
configuration when the backup was first initiated.
contentID
The contentID of the Greenplum Database master or segment instance corresponding to the
scope. contentID is passed only when the scope is master or segment.
When scope is master, the contentID is -1.
When scope is segment, the contentID is the content identifier of an active segment
instance.
Exit Code
The cleanup_plugin_for_restore command must exit with a value of 0 on success, non-zero if an
error occurs. In the case of a non-zero exit code, gprestore displays the contents of stderr to the
user.
delete_backup
Plugin command to delete the directory for a given backup timestamp from a remote system.
Synopsis
<plugin_executable> delete_backup <plugin_config_file> <timestamp>
Description
gpbackup_manager invokes the delete_backup plugin command to delete the directory specified by
the backup timestamp on the remote system.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
timestamp
The timestamp for the backup to delete.
Exit Code
The delete_backup command must exit with a value of 0 on success, or a non-zero value if an error
occurs. In the case of a non-zero exit code, gpbackup_manager displays the contents of stderr to the
user.
Example
my_plugin delete_backup /home/my-plugin_config.yaml 20191208130802
plugin_api_version
Plugin command to display the supported Backup Storage Plugin API version.
Synopsis
<plugin_executable> plugin_api_version
Description
gpbackup and gprestore invoke the plugin_api_version plugin command before a backup or
restore operation to determine Backup Storage Plugin API version compatibility.
Return Value
The plugin_api_version command must return the Backup Storage Plugin API version number
supported by the storage plugin, "0.4.0".
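For example, a shell-based plugin might implement plugin_api_version as a short script similar to the following sketch:
#!/bin/bash
# plugin_api_version: print the supported Backup Storage Plugin API version.
echo "0.4.0"
exit 0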
restore_data
Plugin command to stream data from the remote storage system to stdout.
Synopsis
<plugin_executable> restore_data <plugin_config_file> <data_filenamekey>
Description
gprestore invokes a plugin's restore_data or restore_data_subset command to restore a backup.
gprestore invokes the restore_data plugin command on each segment host when restoring a
compressed, multiple-data-file, or non-filtered streaming backup, or when the plugin does not
support the restore_data_subset command.
The restore_data implementation should read a potentially large data file named or mapped to
data_filenamekey from the remote storage system and write the contents to stdout. If the
backup_data command modified the data in any way (for example, compressed it), restore_data should
perform the reverse operation.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
data_filenamekey
The mapping key to a backup file on the remote storage system. data_filenamekey is the
same key provided to the backup_data command.
Exit Code
The restore_data command must exit with a value of 0 on success, non-zero if an error occurs. In
the case of a non-zero exit code, gprestore displays the contents of stderr to the user.
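A minimal restore_data sketch, the counterpart of the backup_data sketch shown earlier. It assumes the same hypothetical remote_dir layout and streams the stored file back to stdout unchanged.
#!/bin/bash
# Minimal restore_data sketch: stream the file keyed by data_filenamekey
# back to stdout. It reverses whatever backup_data did (nothing, here).
plugin_config_file=$1
data_filenamekey=$2

remote_dir=$(awk '/^ *remote_dir:/ {print $2}' "$plugin_config_file")
cat "${remote_dir}/${data_filenamekey}" || exit 1
exit 0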
See Also
restore_data_subset
restore_data_subset
Plugin command to stream a filtered dataset from the remote storage system to stdout.
Synopsis
<plugin_executable> restore_data_subset <plugin_config_file> <data_filenamekey> <offsets_file>
Description
gprestore invokes a plugin's restore_data or restore_data_subset command to restore a backup.
gprestore invokes the more performant restore_data_subset plugin command on each segment
host to perform a filtered restore operation when all of the following conditions hold:
The backup is an uncompressed, single-data-file backup (the gpbackup command was
invoked with the --no-compression and --single-data-file flags).
Filtering options (--include-table, --exclude-table, --include-table-file, or --exclude-table-file) are specified on the gprestore command line.
The plugin_config_file specifies the restore_subset: "on" property setting.
gprestore invokes the restore_data_subset plugin command with an offsets_file that it automatically
generates based on the filters specified. The restore_data_subset implementation should extract
the start and end byte offsets for each relation specified in offsets_file, use this information to
selectively read from a potentially large data file named or mapped to data_filenamekey on the
remote storage system, and write the contents to stdout.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file. This file must specify the
restore_subset: "on" property setting.
data_filenamekey
The mapping key to a backup file on the remote storage system. data_filenamekey is the
same key provided to the backup_data command.
offsets_file
The absolute path to the relation offsets file generated by gprestore. This file specifies the
number of relations, and the start and end byte offsets for each relation, that the plugin should
restore. gprestore specifies this information on a single line in the file. For example, if the file
contents specified 2 1001 2007 4500 6000, the plugin restores two relations; relation 1 with
start offset 1001 and end offset 2007, and relation 2 with start offset 4500 and end offset
6000.
Exit Code
The restore_data_subset command must exit with a value of 0 on success, non-zero if an error
occurs. In the case of a non-zero exit code, gprestore displays the contents of stderr to the user.
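A minimal restore_data_subset sketch that parses the single-line offsets file and streams only the requested byte ranges to stdout. It reuses the hypothetical remote_dir layout of the earlier sketches and assumes 0-based start offsets with exclusive end offsets; confirm the exact offset semantics against gprestore before relying on this interpretation.
#!/bin/bash
# Minimal restore_data_subset sketch.
plugin_config_file=$1
data_filenamekey=$2
offsets_file=$3

remote_dir=$(awk '/^ *remote_dir:/ {print $2}' "$plugin_config_file")
datafile="${remote_dir}/${data_filenamekey}"

# The offsets file holds one line: <count> <start1> <end1> <start2> <end2> ...
read -r -a fields < "$offsets_file"
count=${fields[0]}

for ((i = 0; i < count; i++)); do
    start=${fields[$((1 + 2 * i))]}
    end=${fields[$((2 + 2 * i))]}
    # Stream bytes [start, end) of the data file to stdout.
    tail -c +$((start + 1)) "$datafile" | head -c $((end - start)) || exit 1
done
exit 0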
See Also
restore_data
restore_file
Plugin command to move a backup file from the remote storage system.
Synopsis
<plugin_executable> restore_file <plugin_config_file> <file_to_restore>
Description
gprestore invokes the restore_file plugin command on the master and each segment host for each
file that gprestore will read from a backup directory on local disk.
The restore_file command should process and move the file from the remote storage system to
file_to_restore on the local host.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
file_to_restore
The absolute path to which to move a backup file from the remote storage system.
Exit Code
The restore_file command must exit with a value of 0 on success, non-zero if an error occurs. In
the case of a non-zero exit code, gprestore displays the contents of stderr to the user.
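A minimal restore_file sketch, again assuming the hypothetical remote_dir layout used in the earlier sketches. It copies the requested file back to the path that gprestore expects on local disk.
#!/bin/bash
# Minimal restore_file sketch.
plugin_config_file=$1
file_to_restore=$2

remote_dir=$(awk '/^ *remote_dir:/ {print $2}' "$plugin_config_file")
cp "${remote_dir}/$(basename "$file_to_restore")" "$file_to_restore" || exit 1
exit 0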
setup_plugin_for_backup
Plugin command to initialize a storage plugin for the backup operation.
Synopsis
<plugin_executable> setup_plugin_for_backup <plugin_config_file> <local_backup_dir> <scope>
<plugin_executable> setup_plugin_for_backup <plugin_config_file> <local_backup_dir> <scope> <contentID>
Description
gpbackup invokes the setup_plugin_for_backup plugin command during the gpbackup initialization
phase. The scope argument specifies the execution scope. gpbackup will invoke the command with
each of the scope values.
The setup_plugin_for_backup command should perform the activities necessary to initialize the
remote storage system before backup begins. Setup activities may include creating remote
directories, validating connectivity to the remote storage system, checking disks, and so forth.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
local_backup_dir
The local directory on the Greenplum Database host (master and segments) to which gpbackup
will write backup files. gpbackup creates this local directory.
When scope is master, the local_backup_dir is the backup directory of the
Greenplum Database master.
When scope is segment, the local_backup_dir is the backup directory of a segment
instance. The contentID identifies the segment instance.
When the scope is segment_host, the local_backup_dir is an arbitrary backup
directory on the host.
scope
The execution scope value indicates the host and number of times the plugin command is
executed. scope can be one of these values:
master - Execute the plugin command once on the master host.
segment_host - Execute the plugin command once on each of the segment hosts.
segment - Execute the plugin command once for each active segment instance on the
host running the segment instance. The contentID identifies the segment instance.
The Greenplum Database hosts and segment instances are based on the Greenplum Database
configuration when the backup was first initiated.
contentID
The contentID of the Greenplum Database master or segment instance corresponding to the
scope. contentID is passed only when the scope is master or segment.
When scope is master, the contentID is -1.
When scope is segment, the contentID is the content identifier of an active segment
instance.
Exit Code
The setup_plugin_for_backup command must exit with a value of 0 on success, non-zero if an error
occurs. In the case of a non-zero exit code, gpbackup displays the contents of stderr to the user.
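Handling the scope argument is the subtle part of the setup commands. The following sketch, under the same hypothetical remote_dir assumption as the earlier examples, creates the remote location once on the master and only verifies it on the per-host and per-segment invocations.
#!/bin/bash
# Minimal setup_plugin_for_backup sketch.
plugin_config_file=$1
local_backup_dir=$2
scope=$3
content_id=$4   # passed only for the master and segment scopes

remote_dir=$(awk '/^ *remote_dir:/ {print $2}' "$plugin_config_file")

case "$scope" in
    master)
        mkdir -p "$remote_dir" || exit 1
        ;;
    segment_host|segment)
        # Verify that the remote location is reachable from this host.
        [ -d "$remote_dir" ] || exit 1
        ;;
    *)
        echo "unknown scope: $scope" >&2
        exit 1
        ;;
esac
exit 0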
setup_plugin_for_restore
Plugin command to initialize a storage plugin for the restore operation.
Synopsis
<plugin_executable> setup_plugin_for_restore <plugin_config_file> <local_backup_dir> <scope>
<plugin_executable> setup_plugin_for_restore <plugin_config_file> <local_backup_dir> <scope> <contentID>
Description
gprestore invokes the setup_plugin_for_restore plugin command during the gprestore initialization
phase. The scope argument specifies the execution scope. gprestore will invoke the command with
each of the scope values.
The setup_plugin_for_restore command should perform the activities necessary to initialize the
remote storage system before a restore operation begins. Setup activities may include creating
remote directories, validating connectivity to the remote storage system, etc.
Arguments
plugin_config_file
The absolute path to the plugin configuration YAML file.
local_backup_dir
The local directory on the Greenplum Database host (master and segments) from which
gprestore reads backup files. gprestore creates this local directory.
When scope is master, the local_backup_dir is the backup directory of the
Greenplum Database master.
When scope is segment, the local_backup_dir is the backup directory of a segment
instance. The contentID identifies the segment instance.
When the scope is segment_host, the local_backup_dir is an arbitrary backup
directory on the host.
scope
The execution scope value indicates the host and number of times the plugin command is
executed. scope can be one of these values:
master - Execute the plugin command once on the master host.
segment_host - Execute the plugin command once on each of the segment hosts.
segment - Execute the plugin command once for each active segment instance on the
host running the segment instance. The contentID identifies the segment instance.
The Greenplum Database hosts and segment instances are based on the Greenplum Database
configuration when the backup was first initiated.
contentID
The contentID of the Greenplum Database master or segment instance corresponding to the
scope. contentID is passed only when the scope is master or segment.
When scope is master, the contentID is -1.
When scope is segment, the contentID is the content identifier of an active segment
instance.
Exit Code
The setup_plugin_for_restore command must exit with a value of 0 on success, non-zero if an
error occurs. In the case of a non-zero exit code, gprestore displays the contents of stderr to the
user.
Backup Utility Reference
Reference information for backup related command-line utilities.
gpbackup
gprestore
gpbackup_manager
gpbackup
Create a Greenplum Database backup for use with the gprestore utility.
Synopsis
gpbackup --dbname <database_name>
[--backup-dir <directory>]
[--compression-level <level>]
[--compression-type <type>]
[--copy-queue-size <int>]
[--data-only]
[--debug]
[--exclude-schema <schema_name> [--exclude-schema <schema_name> ...]]
[--exclude-table <schema.table> [--exclude-table <schema.table> ...]]
[--exclude-schema-file <file_name>]
[--exclude-table-file <file_name>]
[--include-schema <schema_name> [--include-schema <schema_name> ...]]
[--include-table <schema.table> [--include-table <schema.table> ...]]
[--include-schema-file <file_name>]
[--include-table-file <file_name>]
[--incremental [--from-timestamp <backup-timestamp>]]
[--jobs <int>]
[--leaf-partition-data]
[--metadata-only]
[--no-compression]
[--plugin-config <config_file_location>]
[--quiet]
[--single-data-file]
[--verbose]
[--version]
[--with-stats]
[--without-globals]
gpbackup --help
Description
The gpbackup utility backs up the contents of a database into a collection of metadata files and data
files that can be used to restore the database at a later time using gprestore. When you back up a
database, you can specify table level and schema level filter options to back up specific tables. For
example, you can combine schema level and table level options to back up all the tables in a schema
except for a single table.
By default, gpbackup backs up objects in the specified database as well as global Greenplum
Database system objects. Use --without-globals to omit global objects. gprestore does not restore
global objects by default; use --with-globals to restore them. See Objects Included in a Backup or
Restore for additional information.
For materialized views, only the materialized view definition is backed up; the data is not backed up.
gpbackup stores the object metadata files and DDL files for a backup in the Greenplum Database
master data directory by default. Greenplum Database segments use the COPY ... ON SEGMENT
command to store their data for backed-up tables in compressed CSV data files, located in each
segment's data directory. See Understanding Backup Files for additional information.
You can add the --backup-dir option to copy all backup files from the Greenplum Database master
and segment hosts to an absolute path for later use. Additional options are provided to filter the
backup set in order to include or exclude specific tables.
You can create an incremental backup with the --incremental option. Incremental backups are
efficient when the total amount of data in append-optimized tables or table partitions that changed is
small compared to the data that has not changed. See Creating and Using Incremental Backups with
gpbackup and gprestore for information about incremental backups.
With the default --jobs option (1 job), each gpbackup operation uses a single transaction on the
Greenplum Database master host. The COPY ... ON SEGMENT command performs the backup task in
parallel on each segment host. The backup process acquires an ACCESS SHARE lock on each table
that is backed up. During the table locking process, the database should be in a quiescent state.
When a backup operation completes, gpbackup returns a status code. See Return Codes.
The gpbackup utility cannot be run while gpexpand is initializing new segments. Backups created
before the expansion cannot be restored with gprestore after the cluster expansion is completed.
gpbackup can send status email notifications after a backup operation completes. You specify when
the utility sends the mail and the email recipients in a configuration file. See Configuring Email
Notifications.
Note: This utility uses secure shell (SSH) connections between systems to perform its tasks. In large
Greenplum Database deployments, cloud deployments, or deployments with a large number of
segments per host, this utility may exceed the host's maximum threshold for unauthenticated
connections. Consider updating the SSH MaxStartups and MaxSessions configuration parameters to
increase this threshold. For more information about SSH configuration options, refer to the SSH
documentation for your Linux distribution.
Options
--dbname database_name
Required. Specifies the database to back up.
--backup-dir directory
Optional. Copies all required backup files (metadata files and data files) to the specified
directory. You must specify directory as an absolute path (not relative). If you do not supply
this option, metadata files are created on the Greenplum Database master host in the
$MASTER_DATA_DIRECTORY/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory.
Segment hosts create CSV data files in the
<seg_dir>/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory. When you specify a
custom backup directory, files are copied to these paths in subdirectories of the backup
directory.
You cannot combine this option with the option --plugin-config.
--compression-level level
Optional. Specifies the compression level (from 1 to 9) used to compress data files. The default
is 1. Note that gpbackup uses compression by default.
--compression-type type
Optional. Specifies the compression type (gzip or zstd) used to compress data files. The
default is gzip.
Note: In order to use the zstd compression type, Zstandard (http://facebook.github.io/zstd/)
must be installed in a $PATH accessible by the gpadmin user.
--copy-queue-size int
Optional. Specifies the number of COPY commands gpbackup should enqueue when backing
up using the --single-data-file option. This option optimizes backup performance by
reducing the amount of time spent initializing COPY commands. If you do not set this option to
2 or greater, gpbackup enqueues 1 COPY command at a time.
Note: This option must be used with the --single-data-file option and cannot be used with
the --jobs option.
--data-only
Optional. Backs up only the table data into CSV files, but does not back up the metadata files
needed to recreate the tables and other database objects.
--debug
Optional. Displays verbose debug messages during operation.
--exclude-schema schema_name
Optional. Specifies a database schema to exclude from the backup. You can specify this
option multiple times to exclude multiple schemas. You cannot combine this option with the
option --include-schema, --include-schema-file, or a table filtering option such as --
include-table.
See Filtering the Contents of a Backup or Restore for more information.
See Requirements and Limitations for limitations when leaf partitions of a partitioned table are
in different schemas from the root partition.
--exclude-schema-file file_name
Optional. Specifies a text file containing a list of schemas to exclude from the backup. Each
line in the text file must define a single schema. The file must not include trailing lines. If a
schema name uses any character other than a lowercase letter, number, or an underscore
character, then you must include that name in double quotes. You cannot combine this option
with the option --include-schema or --include-schema-file, or a table filtering option such as
--include-table.
See Filtering the Contents of a Backup or Restore for more information.
See Requirements and Limitations for limitations when leaf partitions of a partitioned table are
in different schemas from the root partition.
--exclude-table schema.table
Optional. Specifies a table to exclude from the backup. The table must be in the format
<schema-name>.<table-name>. If a table or schema name uses any character other than a
lowercase letter, number, or an underscore character, then you must include that name in
double quotes. You can specify this option multiple times. You cannot combine this option
with the option --exclude-schema, --exclude-schema-file, or another table filtering option
such as --include-table.
If you specify a leaf partition name, gpbackup ignores the partition name. The leaf partition is
not excluded.
See Filtering the Contents of a Backup or Restore for more information.
--exclude-table-file file_name
Optional. Specifies a text file containing a list of tables to exclude from the backup. Each line in
the text file must define a single table using the format <schema-name>.<table-name>. The file
must not include trailing lines. If a table or schema name uses any character other than a
lowercase letter, number, or an underscore character, then you must include that name in
double quotes. You cannot combine this option with the option --exclude-schema, --exclude-
schema-file, or another table filtering option such as --include-table.
If you specify leaf partition names in a file that is used with --exclude-table-file, gpbackup
ignores the partition names. The leaf partitions are not excluded.
See Filtering the Contents of a Backup or Restore for more information.
--include-schema schema_name
Optional. Specifies a database schema to include in the backup. You can specify this option
multiple times to include multiple schemas. If you specify this option, any schemas that are not
included in subsequent --include-schema options are omitted from the backup set. You
cannot combine this option with the options --exclude-schema, --exclude-schema-file,
--include-table, or --include-table-file. See Filtering the Contents
of a Backup or Restore for more information.
--include-schema-file file_name
Optional. Specifies a text file containing a list of schemas to back up. Each line in the text file
must define a single schema. The file must not include trailing lines. If a schema name uses
any character other than a lowercase letter, number, or an underscore character, then you
must include that name in double quotes. See Filtering the Contents of a Backup or Restore
for more information.
--include-table schema.table
Optional. Specifies a table to include in the backup. The table must be in the format <schema-
name>.<table-name>. For information on specifying special characters in schema and table
names, see Schema and Table Names.
You can specify this option multiple times. You cannot combine this option with a schema
filtering option such as --include-schema, or another table filtering option such as --exclude-
table-file.
You can also specify the qualified name of a sequence, a view, or a materialized view.
If you specify this option, the utility does not automatically back up dependent objects. You
must also explicitly specify dependent objects that are required. For example if you back up a
view or a materialized view, you must also back up the tables that the view or materialized
view uses. If you back up a table that uses a sequence, you must also back up the sequence.
You can optionally specify a table leaf partition name in place of the table name, to include
only specific leaf partitions in a backup with the --leaf-partition-data option. When a leaf
partition is backed up, the leaf partition data is backed up along with the metadata for the
partitioned table.
See Filtering the Contents of a Backup or Restore for more information.
--include-table-file file_name
Optional. Specifies a text file containing a list of tables to include in the backup. Each line in
the text file must define a single table using the format <schema-name>.<table-name>. The file
must not include trailing lines. For information on specifying special characters in schema and
table names, see Schema and Table Names.
Any tables not listed in this file are omitted from the backup set. You cannot combine this
option with a schema filtering option such as --include-schema, or another table filtering
option such as --exclude-table-file.
You can also specify the qualified name of a sequence, a view, or a materialized view.
If you specify this option, the utility does not automatically back up dependent objects. You
must also explicitly specify dependent objects that are required. For example if you back up a
view or a materialized view, you must also specify the tables that the view or the materialized
view uses. If you specify a table that uses a sequence, you must also specify the sequence.
You can optionally specify a table leaf partition name in place of the table name, to include
only specific leaf partitions in a backup with the --leaf-partition-data option. When a leaf
partition is backed up, the leaf partition data is backed up along with the metadata for the
partitioned table.
See Filtering the Contents of a Backup or Restore for more information.
--incremental
Specify this option to add an incremental backup to an incremental backup set. A backup set
is a full backup and one or more incremental backups. The backups in the set must be created
with a consistent set of backup options to ensure that the backup set can be used in a restore
operation.
By default, gpbackup attempts to find the most recent existing backup with a consistent set of
options. If the backup is a full backup, the utility creates a backup set. If the backup is an
incremental backup, the utility adds the backup to the existing backup set. The incremental
backup is added as the latest backup in the backup set. You can specify --from-timestamp to
override the default behavior.
--from-timestamp backup-timestamp
Optional. Specifies the timestamp of a backup. The specified backup must have backup
options that are consistent with the incremental backup that is being created. If the specified
backup is a full backup, the utility creates a backup set. If the specified backup is an
incremental backup, the utility adds the incremental backup to the existing backup set.
You must specify --leaf-partition-data with this option. You cannot combine this option
with --data-only or --metadata-only.
A backup is not created and the utility returns an error if it cannot add the backup to
an existing incremental backup set or cannot use the backup to create a backup set.
For information about creating and using incremental backups, see Creating and Using
Incremental Backups with gpbackup and gprestore.
--jobs int
Optional. Specifies the number of jobs to run in parallel when backing up tables. By default,
gpbackup uses 1 job (database connection). Increasing this number can improve the speed of
backing up data. When running multiple jobs, each job backs up tables in separate
transactions.
Important: If you specify a value higher than 1, the database should be in a quiescent state
while the utility acquires a lock on the tables that are being backed up. If the utility cannot
acquire a lock on a table being backed up it will exit.
You cannot use this option in combination with the options --metadata-only, --single-data-
file, or --plugin-config.
Note: When using the --jobs flag, there is a potential deadlock scenario that generates a
WARNING message in the log files. During the metadata portion of the backup, the main worker
process gathers Access Share locks on all the tables in the backup set. During the data portion
of the backup, based on the value of the --jobs flag, additional workers are created that
attempt to take additional Access Share locks on the tables they back up. Between the
metadata backup and the data backup, if a third party process (operations like TRUNCATE, DROP,
ALTER) attempts to access the same tables and obtain an Exclusive lock, the worker thread
identifies the potential deadlock and hands off the table backup responsibilities to the main
worker (that already has an Access Share lock on that particular table). A warning message is
logged, similar to: [WARNING]:-Worker 5 could not acquire AccessShareLock for table
public.foo.
--leaf-partition-data
Optional. For partitioned tables, creates one data file per leaf partition instead of one data file
for the entire table (the default). Using this option also enables you to specify individual leaf
partitions to include in or exclude from a backup, with the --include-table, --include-table-
file, --exclude-table, and --exclude-table-file options.
--metadata-only
Optional. Creates only the metadata files (DDL) needed to recreate the database objects, but
does not back up the actual table data.
--no-compression
Optional. Do not compress the table data CSV files.
--plugin-config config-file_location
Specify the location of the gpbackup plugin configuration file, a YAML-formatted text file. The
file contains configuration information for the plugin application that gpbackup uses during the
backup operation.
If you specify the --plugin-config option when you back up a database, you must specify this
option with configuration information for a corresponding plugin application when you restore
the database from the backup.
You cannot combine this option with the option --backup-dir.
For information about using storage plugin applications, see Using gpbackup Storage Plugins.
--quiet
Optional. Suppress all non-warning, non-error log messages.
--single-data-file
Optional. Create a single data file on each segment host for all tables backed up on that
segment. By default, gpbackup creates one compressed CSV file for each table that is
backed up on the segment.
Note: If you use the --single-data-file option to combine table backups into a single file per
segment, you cannot set the gprestore option --jobs to a value higher than 1 to perform a
parallel restore operation.
--verbose
Optional. Print verbose log messages.
--version
Optional. Print the version number and exit.
--with-stats
Optional. Include query plan statistics in the backup set.
--without-globals
Optional. Omit the global Greenplum Database system objects during backup.
--help
Displays the online help.
Return Codes
One of these codes is returned after gpbackup completes.
0 – Backup completed with no problems.
1 – Backup completed with non-fatal errors. See log file for more information.
2 – Backup failed with a fatal error. See log file for more information.
Schema and Table Names
When using the option --include-table or --include-table-file to filter backups, the schema or
table names may contain upper-case characters, space ( ), newline (\n), tab (\t), or any of these special
characters:
~ # $ % ^ & * ( ) _ - + [ ] { } > < \ | ; : / ? ! , " '
For example:
public.foo"bar
public.foo bar
public.foo\nbar
Note: The --include-table and --include-table-file options do not support schema or table
names that contain periods (.) or evaluated newlines.
When the table name has special characters, the name must be enclosed in single quotes:
gpbackup --dbname test --include-table 'my#1schema'.'my_$42_Table'
When the table name contains single quotes, use an escape character for each quote or encapsulate
the table name within double quotes. For example:
gpbackup --dbname test --include-table public.'foo\'bar'
gpbackup --dbname test --include-table public."foo'bar"
When using the option --include-table-file, the table names in the text file do not require single
quotes. For example, the contents of the text file could be similar to:
my#1schema.my_$42_Table
my#1schema.my_$590_Table
Examples
Back up all schemas and tables in the "demo" database, including global Greenplum Database system
objects:
$ gpbackup --dbname demo
Back up all schemas and tables in the "demo" database except for the "twitter" schema:
$ gpbackup --dbname demo --exclude-schema twitter
Back up only the "twitter" schema in the "demo" database:
$ gpbackup --dbname demo --include-schema twitter
Back up all schemas and tables in the "demo" database, including global Greenplum Database system
objects and query statistics, and copy all backup files to the /home/gpadmin/backup directory:
$ gpbackup --dbname demo --with-stats --backup-dir /home/gpadmin/backup
This example uses --include-schema with --exclude-table to back up a schema except for a single
table.
$ gpbackup --dbname demo --include-schema mydata --exclude-table mydata.addresses
You cannot use the option --exclude-schema with a table filtering option such as --include-table.
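The following additional examples are illustrative sketches that combine options described above.
Create an incremental backup set for the "demo" database by first taking a full backup and then an incremental backup:
$ gpbackup --dbname demo --leaf-partition-data
$ gpbackup --dbname demo --leaf-partition-data --incremental
Back up the "demo" database into a single data file per segment, enqueueing four COPY commands at a time:
$ gpbackup --dbname demo --single-data-file --copy-queue-size 4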
See Also
gprestore, Parallel Backup with gpbackup and gprestore and Using the S3 Storage Plugin with
gpbackup and gprestore
gprestore
Restore a Greenplum Database backup that was created using the gpbackup utility. By default
gprestore uses backed up metadata files and DDL files located in the Greenplum Database master
host data directory, with table data stored locally on segment hosts in CSV data files.
Synopsis
gprestore --timestamp <YYYYMMDDHHMMSS>
[--backup-dir <directory>]
[--copy-queue-size <int>]
[--create-db]
[--debug]
[--exclude-schema <schema_name> [--exclude-schema <schema_name> ...]]
[--exclude-table <schema.table> [--exclude-table <schema.table> ...]]
[--exclude-table-file <file_name>]
[--exclude-schema-file <file_name>]
[--include-schema <schema_name> [--include-schema <schema_name> ...]]
[--include-table <schema.table> [--include-table <schema.table> ...]]
[--include-schema-file <file_name>]
[--include-table-file <file_name>]
[--truncate-table]
[--redirect-schema <schema_name>]
[--data-only | --metadata-only]
[--incremental]
[--jobs <int>]
[--on-error-continue]
[--plugin-config <config_file_location>]
[--quiet]
[--redirect-db <database_name>]
[--verbose]
[--version]
[--with-globals]
[--with-stats]
[--run-analyze]
gprestore --help
Description
To use gprestore to restore from a backup set, you must include the --timestamp option to specify
the exact timestamp value (YYYYMMDDHHMMSS) of the backup set to restore. If you specified a custom -
-backup-dir to consolidate the backup files, include the same --backup-dir option with gprestore to
locate the backup files.
If the backup you specify is an incremental backup, you need a complete set of backup files (a full
backup and any required incremental backups). gprestore ensures that the complete backup set is
available before attempting to restore a backup.
Important: For incremental backup sets, the backups must be on a single device. For example, a
backup set must all be on a Data Domain system.
For information about incremental backups, see Creating and Using Incremental Backups with
gpbackup and gprestore.
When restoring from a backup set, gprestore restores to a database with the same name as the
name specified when creating the backup set. If the target database exists and a table being restored
exists in the database, the restore operation fails. Include the --create-db option if the target
database does not exist in the cluster. You can optionally restore a backup set to a different database
by using the --redirect-db option.
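For example, the following illustrative command (the timestamp value is hypothetical) restores a backup set into a new copy of the database named "demo_copy":
$ gprestore --timestamp 20230612123456 --create-db --redirect-db demo_copy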
When restoring a backup set that contains data from some leaf partitions of a partitioned table, the
partitioned table is restored along with the data for the leaf partitions. For example, you create a
backup with the gpbackup option --include-table-file and the text file lists some leaf partitions of a
partitioned table. Restoring the backup creates the partitioned table and restores the data only for
the leaf partitions listed in the file.
By default, only database objects in the backup set are restored. Greenplum Database system
objects are automatically included in a gpbackup backup set, but these objects are only restored if
you include the --with-globals option to gprestore.
During a restore operation, automatic updating of table statistics is disabled for the tables being
restored. If you backed up query plan statistics using the --with-stats option, you can restore those
statistics by providing --with-stats to gprestore. If you did not use --with-stats during a backup,
or you want to collect current statistics during the restore operation, you can use the --run-analyze
option to run ANALYZE on the restored tables.
When a materialized view is restored, the data is not restored. To populate the materialized view with
data, use REFRESH MATERIALIZED VIEW. The tables that are referenced by the materialized view
definition must be available. The gprestore log file lists the materialized views that were restored and
the REFRESH MATERIALIZED VIEW commands that are used to populate the materialized views with
data.
Performance of restore operations can be improved by creating multiple parallel connections to
restore table data and metadata. By default gprestore uses 1 connection, but you can increase this
number with the --jobs option for large restore operations.
When a restore operation completes, gprestore returns a status code. See Return Codes.
gprestore can send status email notifications after a restore operation completes. You specify when
the utility sends the mail and the email recipients in a configuration file. See Configuring Email
Notifications.
Note: This utility uses secure shell (SSH) connections between systems to perform its tasks. In large
Greenplum Database deployments, cloud deployments, or deployments with a large number of
segments per host, this utility may exceed the host's maximum threshold for unauthenticated
connections. Consider updating the SSH MaxStartups and MaxSessions configuration parameters to
increase this threshold. For more information about SSH configuration options, refer to the SSH
documentation for your Linux distribution.
Options
--timestamp YYYYMMDDHHMMSS
Required. Specifies the timestamp of the gpbackup backup set to restore. By default gprestore
tries to locate metadata files for the timestamp on the Greenplum Database master host in the
$MASTER_DATA_DIRECTORY/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory, and
CSV data files in the <seg_dir>/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory of each
segment host.
--backup-dir directory
Optional. Sources all backup files (metadata files and data files) from the specified directory.
You must specify directory as an absolute path (not relative). If you do not supply this option,
gprestore tries to locate metadata files for the timestamp on the Greenplum Database master
host in the $MASTER_DATA_DIRECTORY/backups/YYYYMMDD/YYYYMMDDhhmmss/
directory. CSV data files must be available on each segment in the
<seg_dir>/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory. Include this option when
you specify a custom backup directory with gpbackup.
You cannot combine this option with the option --plugin-config.
--create-db
Optional. Creates the database before restoring the database object metadata.
The database is created by cloning the empty standard system database template0.
--copy-queue-size int
Optional. Specifies the number of COPY commands gprestore should enqueue when restoring
a backup set. This option optimizes restore performance by reducing the amount of time
spent initializing COPY commands. If you do not set this option to 2 or greater, gprestore
enqueues 1 COPY command at a time.
--data-only
Optional. Restores table data from a backup created with the gpbackup utility, without creating
the database tables. This option assumes the tables exist in the target database. To restore
data for a specific set of tables from a backup set, you can specify an option to include tables
or schemas or exclude tables or schemas. Specify the --with-stats option to restore table
statistics from the backup.
The backup set must contain the table data to be restored. For example, a backup created
with the gpbackup option --metadata-only does not contain table data.
SEQUENCE values are updated to match the values taken at the time of the backup.
To restore only database tables, without restoring the table data, see the option --metadata-
only.
--debug
Optional. Displays verbose and debug log messages during a restore operation.
--exclude-schema schema_name
Optional. Specifies a database schema to exclude from the restore operation. You can specify
this option multiple times. You cannot combine this option with the option --include-schema,
--include-schema-file, or a table filtering option such as --include-table.
--exclude-schema-file file_name
Optional. Specifies a text file containing a list of schemas to exclude from the restore operation. Each
line in the text file must define a single schema. The file must not include trailing lines. If a
schema name uses any character other than a lowercase letter, number, or an underscore
character, then you must include that name in double quotes. You cannot combine this option
with the option --include-schema or --include-schema-file, or a table filtering option such as
--include-table.
--exclude-table schema.table
Optional. Specifies a table to exclude from the restore operation. You can specify this option
multiple times. The table must be in the format <schema-name>.<table-name>. If a table or
schema name uses any character other than a lowercase letter, number, or an underscore
character, then you must include that name in double quotes. If the table is not in the backup
set, the restore operation fails. You cannot specify a leaf partition of a partitioned table.
You cannot combine this option with the option --exclude-schema, --exclude-schema-file, or
another table filtering option such as --include-table.
--exclude-table-file file_name
Optional. Specifies a text file containing a list of tables to exclude from the restore operation.
Each line in the text file must define a single table using the format <schema-name>.<table-
name>. The file must not include trailing lines. If a table or schema name uses any character
other than a lowercase letter, number, or an underscore character, then you must include that
name in double quotes. If a table is not in the backup set, the restore operation fails. You
cannot specify a leaf partition of a partitioned table.
You cannot combine this option with the option --exclude-schema, --exclude-schema-file, or
another table filtering option such as --include-table.
--include-schema schema_name
Optional. Specifies a database schema to restore. You can specify this option multiple times. If
you specify this option, any schemas that you specify must be available in the backup set. Any
schemas that are not included in subsequent --include-schema options are omitted from the
restore operation.
If a schema that you specify for inclusion exists in the database, the utility issues an error and
continues the operation. The utility fails if a table being restored exists in the database.
You cannot use this option if objects in the backup set have dependencies on multiple
schemas.
See Filtering the Contents of a Backup or Restore for more information.
--include-schema-file file_name
Optional. Specifies a text file containing a list of schemas to restore. Each line in the text file
must define a single schema. The file must not include trailing lines. If a schema name uses
any character other than a lowercase letter, number, or an underscore character, then you
must include that name in double quotes.
The schemas must exist in the backup set. Any schemas not listed in this file are omitted from
the restore operation.
You cannot use this option if objects in the backup set have dependencies on multiple
schemas.
--include-table schema.table
Optional. Specifies a table to restore. The table must be in the format <schema-name>.<table-name>. You can specify this
option multiple times. You cannot specify a leaf partition of a partitioned table. For information
on specifying special characters in schema and table names, see the gpbackup Schema and
Table Names section.
You can also specify the qualified name of a sequence, a view, or a materialized view.
If you specify this option, the utility does not automatically restore dependent objects. You
must also explicitly specify the dependent objects that are required. For example if you
restore a view or a materialized view, you must also restore the tables that the view or the
materialized view uses. If you restore a table that uses a sequence, you must also restore the
sequence. The dependent objects must exist in the backup set.
You cannot combine this option with a schema filtering option such as --include-schema, or
another table filtering option such as --exclude-table-file.
--include-table-file file_name
Optional. Specifies a text file containing a list of tables to restore. Each line in the text file must
define a single table using the format <schema-name>.<table-name>. The file must not include
trailing lines. Any tables not listed in this file are omitted from the restore operation. You
cannot specify a leaf partition of a partitioned table. For information on specifying special
characters in schema and table names, see the gpbackup Schema and Table Names section.
You can also specify the qualified name of a sequence, a view, or a materialized view.
If you specify this option, the utility does not automatically restore dependent objects. You
must also explicitly specify dependent objects that are required. For example if you restore a
view or a materialized view, you must also specify the tables that the view or the materialized
view uses. If you specify a table that uses a sequence, you must also specify the sequence. The
dependent objects must exist in the backup set.
For a materialized view, the data is not restored. To populate the materialized view with data,
you must use REFRESH MATERIALIZED VIEW and the tables that are referenced by the
materialized view definition must be available.
If you use the --include-table-file option, gprestore does not create roles or set the owner
of the tables. The utility restores table indexes and rules. Triggers are also restored but are not
supported in Greenplum Database.
See Filtering the Contents of a Backup or Restore for more information.
--incremental (Beta)
Optional. Requires the --data-only option. Restores only the table data in the incremental
backup specified by the --timestamp option. Table data is not restored from previous
incremental backups in the backup set. For information about incremental backups, see
Creating and Using Incremental Backups with gpbackup and gprestore.
Warning: This is a Beta feature and is not supported in a production environment.
An incremental backup contains the following table data that can be restored.
Data from all heap tables.
Data from append-optimized tables that have been modified since the previous
backup.
Data from leaf partitions that have been modified from the previous backup.
When this option is specified, gprestore restores table data by truncating the table and
reloading data into the table. SEQUENCE values are then updated to match the values taken at
the time of the backup.
Before performing the restore operation, gprestore ensures that the tables being restored
exist. If a table does not exist, gprestore returns an error and exits. If the --on-error-
continue option is specified, gprestore logs missing tables and attempts to complete the
restore operation.
Warning: When this option is specified, gprestore assumes that no changes have been made
to the table definitions of the tables being restored, such as adding or removing columns.
--truncate-table
Optional. Truncate data from a set of tables before restoring the table data from a backup. This
option lets you replace table data with data from a backup. Otherwise, table data might be
duplicated.
You must specify the set of tables with either the option --include-table or --include-
table-file. You must also specify --data-only to restore table data without creating the
tables.
You can use this option with the --redirect-db option. You cannot use this option with --
redirect-schema.
--redirect-schema schema_name
Optional. Restore data in the specified schema instead of the original schemas. The specified
schema must already exist. If the data being restored is in multiple schemas, all the data is
redirected into the specified schema.
This option must be used with an option that includes tables or schemas: --include-table, --
include-table-file, --include-schema, or --include-schema-file.
You cannot use this option with an option that excludes schemas or tables such as --exclude-
schema or --exclude-table.
You can use this option with the --metadata-only or --data-only options.
--jobs int
Optional. Specifies the number of parallel connections to use when restoring table data and
metadata. By default, gprestore uses 1 connection. Increasing this number can improve the
speed of restoring data.
Note: If you used the gpbackup --single-data-file option to combine table backups into a
single file per segment, you cannot set --jobs to a value higher than 1 to perform a parallel
restore operation.
--metadata-only
Optional. Creates database tables from a backup created with the gpbackup utility, but does not
restore the table data. This option assumes the tables do not exist in the target database. To
create a specific set of tables from a backup set, you can specify an option to include tables or
schemas or exclude tables or schemas. Specify the option --with-globals to restore the
Greenplum Database system objects.
The backup set must contain the DDL for tables to be restored. For example, a backup
created with the gpbackup option --data-only does not contain the DDL for tables.
To restore table data after you create the database tables, see the option --data-only.
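For example, the following sketch recreates the database tables and system objects from a backup
without restoring any table data:
$ gprestore --timestamp 20171103152558 --create-db --metadata-only --with-globals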
--on-error-continue
Optional. Specify this option to continue the restore operation if an SQL error occurs when
creating database metadata (such as tables, roles, or functions) or restoring data. If another
type of error occurs, the utility exits. The default is to exit on the first error.
When this option is included, the utility displays an error summary and writes error information
to the gprestore log file and continues the restore operation. The utility also creates text files
in the backup directory that contain the list of tables that generated SQL errors.
Tables with metadata errors - gprestore_<backup-timestamp>_<restore-time>_error_tables_metadata
Tables with data errors - gprestore_<backup-timestamp>_<restore-time>_error_tables_data
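For example, the following sketch continues past SQL errors and records any failed tables in the
error files described above:
$ gprestore --timestamp 20171103152558 --create-db --on-error-continue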
--plugin-config config-file_location
Specify the location of the gpbackup plugin configuration file, a YAML-formatted text file. The
file contains configuration information for the plugin application that gprestore uses during the
restore operation.
If you specify the --plugin-config option when you back up a database, you must specify this
option with configuration information for a corresponding plugin application when you restore
the database from the backup.
You cannot combine this option with the option --backup-dir.
For information about using storage plugin applications, see Using gpbackup Storage Plugins.
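For example, a restore from a backup created with a storage plugin might pass the same plugin
configuration file that was used for the backup; the file path shown is illustrative:
$ gprestore --timestamp 20171103152558 --create-db --plugin-config /home/gpadmin/ddboost_config.yaml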
--quiet
Optional. Suppress all non-warning, non-error log messages.
--redirect-db database_name
Optional. Restore to the specified database_name instead of to the database that was backed
up.
--verbose
Optional. Displays verbose log messages during a restore operation.
--version
Optional. Print the version number and exit.
--with-globals
Optional. Restores Greenplum Database system objects in the backup set, in addition to
database objects. See Objects Included in a Backup or Restore.
--with-stats
Optional. Restore query plan statistics from the backup set. If the backup set was not created
with the --with-stats option, an error is returned. Restored tables will only have statistics
from the backup. You cannot use this option with --run-analyze.
To collect current statistics for the restored tables during the restore operation, use the --run-
analyze option. As an alternative, you can run the ANALYZE command on the tables after the
tables are restored.
--run-analyze
Optional. Run ANALYZE on the tables that are restored. For a partitioned table, ANALYZE is run
on the root partitioned table. If --with-stats was specified for the backup, those statistics are
ignored. You cannot use this option with --with-stats.
If the backup being restored used the gpbackup option --leaf-partition-data, gprestore runs
ANALYZE only on the individual leaf partitions that are restored, not the root partitioned table.
For Greenplum Database 5, ANALYZE updates the root partitioned table statistics by default when
all leaf partitions have statistics. For Greenplum Database 4, you must run ANALYZE on the root
partitioned table to update the root partition statistics.
Depending on the tables being restored, running ANALYZE on restored tables might increase the
duration of the restore operation.
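For example, the following sketch restores a database and collects fresh statistics on the
restored tables:
$ gprestore --timestamp 20171103152558 --create-db --run-analyze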
--help
Displays the online help.
Return Codes
One of these codes is returned after gprestore completes.
0 – Restore completed with no problems.
1 – Restore completed with non-fatal errors. See log file for more information.
2 – Restore failed with a fatal error. See log file for more information.
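As a minimal scripting sketch (the timestamp is illustrative), the return code can be checked with
the shell's $? variable to distinguish a fatal failure from a restore that completed with non-fatal
errors:
# Run the restore and inspect the gprestore return code
gprestore --timestamp 20171103152558 --create-db
rc=$?
if [ "$rc" -eq 2 ]; then
    echo "Restore failed with a fatal error; see the gprestore log file." >&2
elif [ "$rc" -eq 1 ]; then
    echo "Restore completed with non-fatal errors; see the gprestore log file." >&2
fi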
Examples
Create the demo database and restore all schemas and tables in the backup set for the indicated
timestamp:
$ dropdb demo
$ gprestore --timestamp 20171103152558 --create-db
Restore the backup set to the "demo2" database instead of the "demo" database that was backed
up:
$ createdb demo2
$ gprestore --timestamp 20171103152558 --redirect-db demo2
Restore global Greenplum Database metadata and query plan statistics in addition to the database
objects:
$ gprestore --timestamp 20171103152558 --create-db --with-globals --with-stats
Restore, using backup files that were created in the /home/gpadmin/backups directory, creating 8
parallel connections:
$ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db
--jobs 8
Restore only the "wikipedia" schema included in the backup set:
$ dropdb demo
$ gprestore --include-schema wikipedia --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db
If you restore from an incremental backup set, all the required files in the backup set must be
available to gprestore. For example, the following timestamp keys specify an incremental backup
set. 20170514054532 is the full backup and the others are incremental backups.
20170514054532 (full backup)
20170714095512
20170914081205
20171114064330
20180114051246
The following gprestore command specifies the timestamp 20171114064330. The incremental
backups with the timestamps 20170714095512 and 20170914081205 and the full backup must be
available to perform a restore.
gprestore --timestamp 20171114064330 --redirect-db mystest --create-db
See Also
gpbackup, Parallel Backup with gpbackup and gprestore and Using the S3 Storage Plugin with
gpbackup and gprestore
Parent topic: Backup Utility Reference
gpbackup_manager
Display information about existing backups, delete existing backups, or encrypt passwords for secure
storage in plugin configuration files.
Note: The gpbackup_manager utility is available only in the commercial release of Tanzu Greenplum
Backup and Restore.
Synopsis
gpbackup_manager [<command>]
where command is:
delete-backup <timestamp> [--plugin-config <config-file>]
| display-report <timestamp>
| encrypt-password --plugin-config <config-file>
| list-backups
| replicate-backup <timestamp> --plugin-config <config-file>
| help [<command>]
Commands
delete-backup timestamp
Deletes the backup set with the specified timestamp.
display-report timestamp
Displays the backup report for a specified timestamp.
encrypt-password
Encrypts plain-text passwords for storage in the DD Boost plugin configuration file.
list-backups
Displays a list of backups that have been taken. If the backup history file does not exist, the
command exits with an error message. See Table 1 for a description of the columns in this list.
replicate-backup timestamp
For a backup on a Data Domain server, replicates the backup to a second (remote) Data
Domain server. The timestamp is the timestamp of the backup on the Data Domain server.
The --plugin-config config-file option specifies the DD Boost configuration file that
contains the information to access the backup and the remote Data Domain server. For
information about the configuration file, see Using the DD Boost Storage Plugin with
gpbackup, gprestore, and gpbackup_manager. For information about replicating backups, see
Replicating Backups.
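For example, replicating the backup with timestamp 20210721191330 (one of the backups shown in
the examples below) to the remote Data Domain server defined in a DD Boost configuration file
might look like this; the configuration file path is illustrative:
$ gpbackup_manager replicate-backup 20210721191330 --plugin-config ~/ddboost_config.yaml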
help command
Displays a help message for the specified command.
Options
--plugin-config config-file
The delete-backup command requires this option if the backup is stored in S3 or a Data
Domain system. The encrypt-password command requires this option. Note: When you
delete backup sets stored in a Data Domain system, you must pass in the same configuration
file that was passed in when the backups were created. Otherwise, the gpbackup_manager
delete-backup command will exit with an error.
-h | --help
Displays a help message for the gpbackup_manager command. For help on a specific
gpbackup_manager command, enter gpbackup_manager help command. For example:
$ gpbackup_manager help encrypt-password
Description
The gpbackup_manager utility manages backup sets created using the gpbackup utility. You can list
backups, display a report for a backup, and delete a backup. gpbackup_manager can also encrypt
passwords to store in a DD Boost plugin configuration file.
Greenplum Database must be running to use the gpbackup_manager utility.
Backup history is saved on the Greenplum Database master host in the file
$MASTER_DATA_DIRECTORY/gpbackup_history.yaml. If no backups have been created yet, or if the
backup history has been deleted, gpbackup_manager commands that depend on the file will display
an error message and exit. If the backup history contains invalid YAML syntax, a yaml error message
is displayed.
Versions of gpbackup earlier than v1.13.0 did not save the backup duration in the backup history file.
The list-backups command duration column is empty for these backups.
The encrypt-password command is used to encrypt Data Domain user passwords that are saved in a
DD Boost plugin configuration file. To use this option, the pgcrypto extension must be enabled in
the Greenplum Database postgres database. See the Tanzu Greenplum Backup and Restore
installation instructions for help installing pgcrypto.
The encrypt-password command prompts you to enter and then re-enter the password to be
encrypted. To maintain password secrecy, characters entered are echoed as asterisks. If replication
is enabled in the specified DD Boost configuration file, the command also prompts for a password for
the remote Data Domain account. You must then copy the output of the command into the DD
Boost configuration file.
The following table describes the contents of the columns in the list that is output by the
gpbackup_manager list-backups command.
Table 1. Backup List Report
Column Description
timestamp Timestamp value (YYYYMMDDHHMMSS) that specifies the time the backup was
taken.
date Date the backup was taken.
status Status of the backup operation, Success or Failure.
database Name of the database backed up (specified on the gpbackup command line
with the --dbname option).
type Which classes of data are included in the backup. Can be one of the
following:
full - contains all global and local metadata, and user data for the
database. This kind of backup can be the base for an incremental
backup. Depending on the gpbackup options specified, some
objects could have been filtered from the backup.
incremental – contains all global and local metadata, and user data
changed since a previous full backup.
metadata-only – contains only the global and local metadata for
the database. Depending on the gpbackup options specified, some
objects could have been filtered from the backup.
data-only – contains only user data from the database. Depending
on the gpbackup options specified, some objects could have been
filtered from the backup.
object filtering The object filtering options that were specified at least once on the gpbackup
command line, or blank if no filtering operations were used. To see the
object filtering details for a specific backup, run the gpbackup_manager
display-report command for the backup.
include-schema – at least one --include-schema option was
specified.
exclude-schema – at least one --exclude-schema option was
specified.
include-table – at least one --include-table option was specified.
exclude-table – at least one --exclude-table option was specified.
plugin The name of the binary plugin file that was used to configure the backup
destination, excluding path information.
duration The amount of time (hh:mm:ss format) taken to complete the backup.
date deleted Indicates the status of the deletion. If blank, the backup still exists. Other
possible values include:
In progress - the deletion is in progress.
Plugin Backup Delete Failed - Last delete attempt failed to delete
backup from plugin storage.
Local Delete Failed - Last delete attempt failed to delete backup
from local storage.
A timestamp - the backup was deleted at the indicated date and time.
Examples
1. Display a list of the existing backups.
gpadmin@mdw:$ gpbackup_manager list-backups
timestamp       date                      status   database  type           object filtering  plugin                   duration  date deleted
20210721191330  Wed Jul 21 2021 19:13:30  Success  sales     full                             gpbackup_ddboost_plugin  00:20:25  In progress
20210721191201  Wed Jul 21 2021 19:12:01  Success  sales     full                             gpbackup_ddboost_plugin  00:15:21  Plugin Backup Delete Failed
20210721191041  Wed Jul 21 2021 19:10:41  Success  sales     full                             gpbackup_ddboost_plugin  00:10:25  Local Delete Failed
20210721191022  Wed Jul 21 2021 19:10:22  Success  sales     full           include-schema                             00:02:35  Wed Jul 21 2021 19:24:59
20210721190942  Wed Jul 21 2021 19:09:42  Success  sales     full           exclude-schema                             00:01:11
20210721190826  Wed Jul 21 2021 19:08:26  Success  sales     data-only                                                 00:05:17
20210721190818  Wed Jul 21 2021 19:08:18  Success  sales     metadata-only                                             00:01:01
20210721190727  Wed Jul 21 2021 19:07:27  Success  sales     full                                                      00:07:22
2. Display the backup report for the backup with timestamp 20190612154608.
$ gpbackup_manager display-report 20190612154608
Greenplum Database Backup Report
Timestamp Key: 20190612154608
GPDB Version: 5.14.0+dev.8.gdb327b2a3f build commit:db327b2a3f6f2b0673229e9aa164812e3bb56263
gpbackup Version: 1.11.0
Database Name: sales
Command Line: gpbackup --dbname sales
Compression: gzip
Plugin Executable: None
Backup Section: All Sections
Object Filtering: None
Includes Statistics: No
Data File Format: Multiple Data Files Per Segment
Incremental: False
Start Time: 2019-06-12 15:46:08
End Time: 2019-06-12 15:46:53
Duration: 0:00:45
Backup Status: Success
Database Size: 3306 MB
Count of Database Objects in Backup:
Aggregates 12
Casts 4
Constraints 0
Conversions 0
Database GUCs 0
Extensions 0
Functions 0
Indexes 0
Operator Classes 0
Operator Families 1
Operators 0
Procedural Languages 1
Protocols 1
Resource Groups 2
Resource Queues 6
Roles 859
Rules 0
Schemas 185
Sequences 207
Tables 431
Tablespaces 0
Text Search Configurations 0
Text Search Dictionaries 0
Text Search Parsers 0
Text Search Templates 0
Triggers 0
Types 2
Views 0
3. Delete the local backup with timestamp 20190620145126.
$ gpbackup_manager delete-backup 20190620145126
Are you sure you want to delete-backup 20190620145126? (y/n)y
Deletion of 20190620145126 in progress.
Deletion of 20190620145126 complete.
4. Delete a backup stored on a Data Domain system. The DD Boost plugin configuration file
must be specified with the --plugin-config option.
$ gpbackup_manager delete-backup 20190620160656 --plugin-config ~/ddboost_config.yaml
Are you sure you want to delete-backup 20190620160656? (y/n)y
Deletion of 20190620160656 in progress.
Deletion of 20190620160656 done.
5. Encrypt a password. A DD Boost plugin configuration file must be specified with the
--plugin-config option.
$ gpbackup_manager encrypt-password --plugin-config ~/ddboost_rep_on_config.yaml
Please enter your password ******
Please verify your password ******
Please enter your remote password ******
Please verify your remote password ******
Please copy/paste these lines into the plugin config file:
password: "c30d04090302a0ff861b823d71b079d23801ac367a74a1a8c088ed53beb62b7e190b
7110277ea5b51c88afcba41857d2900070164db5f3efda63745dfffc7f2026290a31e1a2035dac"
password_encryption: "on"
remote_password: "c30d04090302c764fd06bfa1dade62d2380160a8f1e4d1ff0a4bb25a542fb
1d31c7a19b98e9b2f00e7b1cf4811c6cdb3d54beebae67f605e6a9c4ec9718576769b20e5ebd0b9
f53221"
remote_password_encryption: "on"
See Also
gprestore, Parallel Backup with gpbackup and gprestore and Using the S3 Storage Plugin with
gpbackup and gprestore
Parent topic:Backup Utility Reference