© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stored procedures in Amazon Redshift
ANT337-R
Naresh Chainani, Senior Development Manager, Amazon Web Services
Joe Harris, Senior Database Engineer, Amazon Web Services
Agenda
Why stored procedures?
Feature adoption
Frequently asked questions
Best practices
Demos
Why stored procedures?
Wrap complex ETL logic in reusable stored procedures
Eliminate network round trips
Security: reduce the number of admin users
Simpler “lift & shift” legacy data warehouse (DW) migrations
Simplified migrations
Stored procedures enable simple “lift and shift” migrations
Customers have thousands of existing stored procedures that they do not want to rewrite in Java, Python, or an ETL tool
PL/pgSQL provides loops, conditionals, case statements, and IN/OUT/INOUT argument passing
Use AWS Schema Conversion Tool to easily migrate from SQL Server, Oracle, Netezza, Teradata, and Greenplum
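As a minimal sketch of these PL/pgSQL constructs (the procedure, tables, and columns here are hypothetical, invented purely for illustration):
-- Hypothetical: IN and OUT arguments with a conditional
CREATE OR REPLACE PROCEDURE merge_if_ready(run_date IN DATE, rows_in OUT INT)
AS $$
BEGIN
  SELECT COUNT(*) INTO rows_in FROM stg_sales WHERE sale_date = run_date;
  IF rows_in > 0 THEN
    INSERT INTO sales SELECT * FROM stg_sales WHERE sale_date = run_date;
  ELSE
    RAISE INFO 'No staged rows for %', run_date;
  END IF;
END;
$$ LANGUAGE plpgsql;
-- OUT arguments are not supplied in the CALL
CALL merge_if_ready('2019-12-02');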
Security
GRANT and REVOKE
Only the owner of a procedure can execute it
Permission must be explicitly granted for others to execute a procedure
SECURITY INVOKER
The caller executes the procedure as themselves and must have appropriate privileges to access/modify database objects
This is the default setting
SECURITY DEFINER
The caller executes the procedure with the owner’s privileges
Allows security admins to avoid granting broad access that a user would otherwise need
Security definer use cases
Destructive actions
ETL users can run DELETE or TRUNCATE on important tables without being granted table owner or superuser privileges
Access control
Provide fine-grained access to sensitive data
Revoke access to the underlying table and allow authorized users to access the data via a stored procedure
Use caution
Grant EXECUTE on such stored procedures to specific users, not to the public. This avoids potential misuse of a powerful feature.
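A minimal sketch of the pattern (procedure, table, and user names are hypothetical):
-- Owner (e.g., an admin) creates the procedure with SECURITY DEFINER
CREATE OR REPLACE PROCEDURE sp_truncate_staging()
AS $$
BEGIN
  TRUNCATE stg_sales;  -- runs with the owner's privileges
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER;
-- Grant EXECUTE to a specific ETL user only, never to PUBLIC
GRANT EXECUTE ON PROCEDURE sp_truncate_staging() TO etl_user;
-- etl_user can now clear the staging table without owning it:
-- CALL sp_truncate_staging();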
Modular code, clean abstractions
Maintaining table relationships and ensuring data integrity can lead to complex ETL logic
Stored procedures provide code encapsulation for SQL, making it easier to debug and understand when things do not work as expected
CREATE OR REPLACE PROCEDURE daily_ingest()
AS $$
BEGIN
CALL load_main_tables();
CALL fix_violations();
END;
$$ LANGUAGE plpgsql;
BEGIN; CALL daily_ingest(); COMMIT;
-- `psql` output
BEGIN
INFO: - Load Main -
INFO: `sales` table - 200000 rows loaded
WARN: `sales` table - 7 duplicates found
INFO: Details in `violations` table
INFO: - Fix PK Violations -
INFO: `sales` table - 7 duplicates found
INFO: `sales` table - 7 duplicates deleted
INFO: `violations` table updated
CALL
COMMIT
Performance
You can now apply complex procedural logic inside your Amazon Redshift cluster
A stored procedure’s loops and conditional logic run on the leader node of the Amazon Redshift cluster; any SQL within it is distributed to the compute nodes
Previously, you needed to extract the data, apply the procedural logic in Java, Python, or another tool on a separate server, and then reload the results
You may be able to speed up your processing significantly by using procedures to eliminate the extract > process > reload round trip
For maximum performance, avoid large loops that update millions of rows one at a time. Instead, use set-based SQL logic.
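As a rough illustration (the tables and columns are hypothetical), the same correction written as a per-row loop versus a single set-based statement:
-- Slow: one UPDATE per row, driven by a loop on the leader node
CREATE OR REPLACE PROCEDURE fix_prices_slow()
AS $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN SELECT item_id, correct_price FROM price_corrections
  LOOP
    UPDATE sales SET price = r.correct_price WHERE item_id = r.item_id;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Fast: one set-based UPDATE processed in parallel by the compute nodes
CREATE OR REPLACE PROCEDURE fix_prices_fast()
AS $$
BEGIN
  UPDATE sales
  SET price = pc.correct_price
  FROM price_corrections pc
  WHERE sales.item_id = pc.item_id;
END;
$$ LANGUAGE plpgsql;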
Feature adoption
Customers have enthusiastically adopted stored procedures
[Chart: Stored procedures created (tens of thousands), May–October]
[Chart: Daily call count (hundreds of thousands), May–October]
Frequently asked questions
How do I view a list of all stored procedures?
SELECT proname FROM pg_proc_info WHERE prolang = 100125;
How do I retrieve the DDL of a stored procedure?
SHOW PROCEDURE my_proc( arg1 INT, arg2 VARCHAR);
How do I return results from a stored procedure?
How do I use transactions within a stored procedure?
How do I use dynamic SQL in a stored procedure?
How do I debug a stored procedure?
Returning results
Results are returned with a REFCURSOR or a temp table
REFCURSOR
Limited to 1 REFCURSOR
Results are kept in memory on the leader node
Maximum cursor size is limited based on node type
TEMP TABLE
Many temp tables can be used
Results are distributed across compute node disks
Maximum temp table size is limited by available storage
CREATE PROCEDURE sp_result
( result_set INOUT REFCURSOR )
AS $$
BEGIN
OPEN result_set FOR
SELECT *
FROM my_table
LIMIT 100;
END;
$$ LANGUAGE plpgsql;
BEGIN;
CALL sp_result('result');
FETCH ALL FROM result;
END;
Returning results
CREATE PROCEDURE sp_result()
AS $$
BEGIN
DROP TABLE IF EXISTS my_temp;
CREATE TEMP TABLE my_temp AS
SELECT *
FROM my_table
LIMIT 100;
END;
$$ LANGUAGE plpgsql;
CALL sp_result();
-- Not bound to transaction scope
SELECT * FROM my_temp;
Using transactions
COMMIT, ROLLBACK, or TRUNCATE can be issued within a procedure
A call to a stored procedure is atomic; if any error is encountered, the current transaction context is rolled back
Procedures use the transaction context of the caller by default, so a procedure called from another runs in the same transaction. Transactions are not nested.
When a stored procedure is called within an explicit transaction block, your application’s logic determines when a procedure’s modifications are committed or rolled back
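A small sketch of the atomic behavior (the audit_log table and procedure are hypothetical):
CREATE TEMP TABLE audit_log (note VARCHAR, logged_at TIMESTAMP);
CREATE OR REPLACE PROCEDURE sp_atomic_demo()
AS $$
BEGIN
  INSERT INTO audit_log VALUES ('step 1 done', SYSDATE);
  RAISE EXCEPTION 'simulated failure';  -- any unhandled error aborts the CALL
END;
$$ LANGUAGE plpgsql;
CALL sp_atomic_demo();
-- The CALL fails and its transaction context is rolled back,
-- so the 'step 1 done' row is not kept:
SELECT COUNT(*) FROM audit_log;  -- returns 0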
Using transactions
COMMIT ends the current transaction, and a new transaction starts
CREATE PROCEDURE sp_commit()
AS $$
DECLARE
tlist RECORD;
BEGIN
FOR tlist IN EXECUTE 'SELECT tname FROM tbls;'
LOOP
EXECUTE 'INSERT INTO '|| tlist.tname
||' SELECT * FROM stg_'|| tlist.tname
||' ;';
COMMIT; --<<<<<
END LOOP;
END;
$$ LANGUAGE plpgsql;
BEGIN; CALL sp_commit(); END;
-- Effect of sp_commit()
BEGIN TRANSACTION
INSERT INTO tbl_1
COMMIT;
BEGIN TRANSACTION
INSERT INTO tbl_2
COMMIT;
BEGIN TRANSACTION
INSERT INTO tbl_3
COMMIT;
BEGIN TRANSACTION
INSERT INTO tbl_4
COMMIT;
…etc…
Dynamic SQL
Use EXECUTE and quote_ident/quote_literal to run composed queries
CREATE PROCEDURE sp_dynamic(loops INT, tbl_name VARCHAR)
AS $$
DECLARE
i_loop INT:=0;
BEGIN
EXECUTE 'CREATE TEMP TABLE '||quote_ident(tbl_name)
||' (loop INT, loop_ts TIMESTAMP);';
WHILE i_loop < loops
LOOP
i_loop = i_loop + 1;
RAISE INFO 'Loop pass: %', i_loop;
EXECUTE 'INSERT INTO '||quote_ident(tbl_name)
||' SELECT '||i_loop||', '''||SYSDATE||''';';
END LOOP;
END;
$$ LANGUAGE plpgsql;
BEGIN;
CALL sp_dynamic(15,'tmp_dynamic');
SELECT * FROM tmp_dynamic;
END;
-- `psql` output
loop | loop_ts
------+----------------------------
11 | 2019-10-17 16:27:07.365599
5 | 2019-10-17 16:27:07.365599
13 | 2019-10-17 16:27:07.365599
15 | 2019-10-17 16:27:07.365599
3 | 2019-10-17 16:27:07.365599
1 | 2019-10-17 16:27:07.365599
4 | 2019-10-17 16:27:07.365599
8 | 2019-10-17 16:27:07.365599
14 | 2019-10-17 16:27:07.365599
6 | 2019-10-17 16:27:07.365599
10 | 2019-10-17 16:27:07.365599
7 | 2019-10-17 16:27:07.365599
2 | 2019-10-17 16:27:07.365599
9 | 2019-10-17 16:27:07.365599
12 | 2019-10-17 16:27:07.365599
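The example above quotes the identifier; string values spliced into dynamic SQL should go through quote_literal. A small hypothetical sketch (table and column names are invented):
CREATE OR REPLACE PROCEDURE sp_archive_region(tbl_name VARCHAR, region VARCHAR)
AS $$
BEGIN
  EXECUTE 'INSERT INTO ' || quote_ident(tbl_name || '_archive')
       || ' SELECT * FROM ' || quote_ident(tbl_name)
       || ' WHERE region = ' || quote_literal(region) || ';';
END;
$$ LANGUAGE plpgsql;
-- CALL sp_archive_region('sales', 'us-east-1'); builds and runs:
-- INSERT INTO sales_archive SELECT * FROM sales WHERE region = 'us-east-1';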
Debugging
Use RAISE INFO statements to print variables for simple debugging
CREATE PROCEDURE sp_debug(loops INT)
AS $$
DECLARE
i_loop INT:=0;
BEGIN
RAISE INFO 'Starting - % loops',loops;
WHILE i_loop < loops
LOOP
i_loop = i_loop + 1;
RAISE INFO 'Loop pass - %', i_loop;
END LOOP;
RAISE INFO 'Finished - % loops', i_loop;
END;
$$ LANGUAGE plpgsql;
BEGIN; CALL sp_debug(10); END;
-- `psql` output
BEGIN
INFO: Starting - 10 loops
INFO: Loop pass - 1
INFO: Loop pass - 2
INFO: Loop pass - 3
INFO: Loop pass - 4
INFO: Loop pass - 5
INFO: Loop pass - 6
INFO: Loop pass - 7
INFO: Loop pass - 8
INFO: Loop pass - 9
INFO: Loop pass - 10
INFO: Finished - 10 loops
CALL
COMMIT
Debugging
Use a tool that displays info messages when debugging
[Screenshots: TablePlus and DataGrip query consoles showing the INFO messages emitted by sp_debug()]
Best practices summary
1. Use source code control to track changes and provide versioning
2. SECURITY DEFINER is powerful; use it with caution
3. Use loops and REFCURSOR for iterating over small result sets
4. Use temp tables and set logic when working with millions of rows
5. Add INFO messages; talkative procedures are easier to understand
6. KISS: Keep procedures small and nest them to do complex things
Demo stored procedures on GitHub
https://github.com/awslabs/amazon-redshift-utils/tree/master/src/StoredProcedures/
sp_controlled_access.sql
sp_check_primary_key.sql
sp_connect_by_prior.sql
sp_pivot_for.sql
Control queries on sensitive data
-- [3] Create User With No Privileges
CREATE USER user_no_priv WITH PASSWORD 'Password01';
GRANT ALL ON SCHEMA public TO user_no_priv;
GRANT EXECUTE ON PROCEDURE sp_controlled_access(
VARCHAR,VARCHAR,VARCHAR,INTEGER,REFCURSOR
) TO user_no_priv;
-- [4] Change session to the new user
SET SESSION AUTHORIZATION user_no_priv;
SELECT current_user;
-- user_no_priv
SELECT * FROM hr.employee;
-- ERROR: permission denied
-- [5] Call the stored procedure
BEGIN;
CALL sp_controlled_access (
'title,emp,mgr,dept'::VARCHAR
-- select_cols
, 'hr.employee'::VARCHAR
-- query_from
, 'dept = ''Board'''::VARCHAR
-- where_clause
, 10
-- max_rows
, 'employee_data' );
-- result_set
-- INFO: AUTHORIZED: Query on `hr.employee` completed
-- [6] Fetch the results
FETCH ALL FROM employee_data;
-- title | emp | mgr | dept
-- Chairman | 100 | | Board
-- CEO | 101 | 100 | Board
END;
-- [1] Create Example Data
CREATE SCHEMA hr;
CREATE TABLE hr.employee (
title VARCHAR, emp INT, mgr INT, dept VARCHAR);
INSERT INTO hr.employee VALUES
('Chairman', 100, NULL, 'Board' )
, ('CEO' , 101, 100 , 'Board' )
, ('CTO' , 102, 101 , 'IT' );
-- [2] Create Example Auth Table
CREATE TABLE hr.access_authority (
query_from VARCHAR, user_name VARCHAR
, valid_until TIMESTAMP, max_rows INT );
INSERT INTO hr.access_authority
VALUES ('hr.employee', 'user_no_priv', '2019-12-31', 99);
amazon-redshift-utils/src/StoredProcedures/sp_controlled_access.sql
Allow a user to perform limited queries on tables where they have no access privileges
sp_controlled_access() looks up the query permission and maximum row count in an authorization table (hr.access_authority in this example)
Uses SECURITY DEFINER to run the user’s query
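This is not the implementation in the repo, but a rough sketch of the pattern it follows, using the hr.access_authority table created above. Whether the calling user should be read via session_user or current_user under SECURITY DEFINER is an assumption to verify against the actual script.
-- Sketch only: verify access in hr.access_authority, then open a cursor
CREATE OR REPLACE PROCEDURE sp_controlled_access_sketch(
  p_query_from IN VARCHAR
, result_set INOUT REFCURSOR)
AS $$
DECLARE
  allowed_rows INT;
BEGIN
  SELECT max_rows INTO allowed_rows
  FROM hr.access_authority
  WHERE query_from = p_query_from
  AND user_name = session_user   -- assumed: the invoking session's user
  AND valid_until >= SYSDATE;
  IF allowed_rows IS NULL THEN
    RAISE EXCEPTION 'NOT AUTHORIZED: %', p_query_from;
  END IF;
  OPEN result_set FOR EXECUTE
    'SELECT * FROM ' || p_query_from || ' LIMIT ' || allowed_rows || ';';
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;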
Check PRIMARY KEY constraint
-- [4] Attempt to DELETE will error
SET SESSION AUTHORIZATION user_low_priv;
DELETE FROM tmp_pk_log WHERE pk_col = 100;
-- ERROR: permission denied for relation tmp_pk_log
-- [5] Call the SP with LOG action
CALL sp_check_primary_key(
SYSDATE,'tmp_pk_test','LOG','tmp_pk_log',0);
SELECT * FROM tmp_pk_log;
-- check_table | tmp_pk_test
-- check_time | 2019-10-31 00:01:33.849821
-- check_status | ERROR (LOG) - Duplicates found
-- error_count | 4
-- [6] Call the SP with FIX action and 3 rows max fix
CALL sp_check_primary_key(
SYSDATE,'tmp_pk_test','FIX','tmp_pk_log',3);
SELECT * FROM tmp_pk_log;
-- check_table | tmp_pk_test
-- check_time | 2019-10-31 00:01:34.479395
-- check_status | ERROR (FIX) - Duplicate count exceeds…
-- error_count | 4
-- [7] Call the SP with FIX action and 99 rows max fix
CALL sp_check_primary_key(
SYSDATE,'tmp_pk_test','FIX','tmp_pk_log',99);
SELECT * FROM tmp_pk_log;
-- check_table | tmp_pk_test
-- check_time | 2019-10-31 00:01:34.892679
-- check_status | SUCCESS (FIX) - Duplicates corrected.
-- error_count | 4
-- [2] Create Log Table
CREATE TABLE tmp_pk_log(
check_table VARCHAR
, check_time TIMESTAMP
, check_status VARCHAR
, error_count INT);
-- [3] Create A User
CREATE USER user_low_priv ;
-- [4] Grant Execute
GRANT EXECUTE ON PROCEDURE
sp_check_primary_key()
TO user_low_priv;
amazon-redshift-utils/src/StoredProcedures/sp_check_primary_key.sql
Checks that the declared PRIMARY KEY does not contain any duplicates
sp_check_primary_key() attempts to remove duplicate rows, up to the max_fix_rows value
Uses SECURITY DEFINER to allow an unprivileged user to safely DELETE from a table where they only have read privileges
-- [1] Create Example Table
CREATE TEMP TABLE tmp_pk_test (
pk_col INTEGER PRIMARY KEY
, entries INTEGER);
INSERT INTO tmp_pk_test VALUES
(100, 7001)
, (100, 7001) -- Duplicate
, (100, 7001) -- Duplicate
, (100, 7001) -- Duplicate
, (101, 20225)
, (102, 22772)
, (103, 4577);
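Not the repo’s implementation, but one common set-based way to correct the duplicates described above, shown against the example table:
-- Sketch: collect each duplicated key once, delete its rows, re-insert one copy
CREATE TEMP TABLE tmp_pk_dups AS
SELECT pk_col, MIN(entries) AS entries
FROM tmp_pk_test
GROUP BY pk_col
HAVING COUNT(*) > 1;

DELETE FROM tmp_pk_test
USING tmp_pk_dups
WHERE tmp_pk_test.pk_col = tmp_pk_dups.pk_col;

INSERT INTO tmp_pk_test
SELECT pk_col, entries FROM tmp_pk_dups;

DROP TABLE tmp_pk_dups;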
Emulate CONNECT BY PRIOR
-- [2] Example of CONNECT BY PRIOR syntax
SELECT dept
, emp_title
, LEVEL
, emp_id
, mgr_id
FROM tmp_employee
WHERE LEVEL <= 3
START WITH mgr_id IS NULL
CONNECT BY PRIOR emp_id = mgr_id;
-- [3] Syntax for Stored Procedure call
CALL sp_connect_by_prior (
'dept,emp_title'
-- select_cols
,'tmp_employee'
-- table_name
,'emp_id'
-- child_col
,'mgr_id'
-- parent_col
,'mgr_id IS NULL'
-- start_with
,''
-- where_clause
,3
-- max_level
,'tmp_result'
-- temp_tbl_n
);
-- [4] Stored Procedure output
SELECT * FROM tmp_result
ORDER BY level, mgr_id, emp_id;
-- dept | emp_title | level | emp_id | mgr_id
-- Board | Chairman | 1 | 100 |
-- Board | CEO | 2 | 101 | 100
-- IT | CTO | 3 | 102 | 101
-- Sales/Mkt | CMO | 3 | 103 | 101
-- [1] Create Example Table
CREATE TEMP TABLE tmp_employee (
emp_title VARCHAR
, emp_id INT
, mgr_id INT
, dept VARCHAR)
;
INSERT INTO tmp_employee VALUES
('Chairman' , 100, NULL,'Board' )
, ('CEO' , 101, 100 ,'Board' )
, ('CTO' , 102, 101 ,'IT' )
, ('CMO' , 103, 101 ,'Sales/Mkt')
, ('VP Analytics' , 104, 102 ,'IT' )
, ('VP Engineering' , 105, 102 ,'IT' )
, ('Sales Director' , 106, 103 ,'Sales/Mkt')
, ('Sales Mgr West' , 107, 106 ,'Sales/Mkt')
;
amazon-redshift-utils/src/StoredProcedures/sp_connect_by_prior.sql
Emulates the output of queries using CONNECT BY PRIOR syntax on parent-child hierarchies
sp_connect_by_prior() recursively queries the table to retrieve all levels of the hierarchy
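Not the repo implementation, but a sketch of the level-by-level loop such a procedure can use against the tmp_employee table above (the procedure and temp table names here are invented):
CREATE OR REPLACE PROCEDURE sp_connect_by_sketch(max_level INT)
AS $$
DECLARE
  lvl INT := 1;
  added INT;
BEGIN
  DROP TABLE IF EXISTS tmp_result_sketch;
  -- Level 1: the START WITH rows
  CREATE TEMP TABLE tmp_result_sketch AS
  SELECT dept, emp_title, 1 AS level, emp_id, mgr_id
  FROM tmp_employee
  WHERE mgr_id IS NULL;
  LOOP
    EXIT WHEN lvl >= max_level;
    -- Next level: children of the rows added at the previous level
    INSERT INTO tmp_result_sketch
    SELECT e.dept, e.emp_title, lvl + 1, e.emp_id, e.mgr_id
    FROM tmp_employee e
    JOIN tmp_result_sketch r ON e.mgr_id = r.emp_id  -- CONNECT BY PRIOR emp_id = mgr_id
    WHERE r.level = lvl;
    GET DIAGNOSTICS added := ROW_COUNT;
    EXIT WHEN added = 0;
    lvl := lvl + 1;
  END LOOP;
END;
$$ LANGUAGE plpgsql;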
Emulate PIVOT FOR
-- [2] Example of PIVOT FOR syntax
SELECT country
, ak, al, ar, az, ca, co
FROM tmp_state_metrics
PIVOT (SUM( entries )
FOR state IN ('ak','al','ar','az','ca','co')) pvt
WHERE state IS NOT NULL
GROUP BY country
;
-- [3] Syntax for Stored Procedure call
BEGIN;
CALL sp_pivot_for (
'country'
-- select_cols
,'tmp_state_metrics'
-- table_name
,'state'
-- pivot_src
,'SUM'
-- agg_func
,'entries'
-- metric_col
,'state IS NOT NULL'
-- where_clause
,'pivot_result'
-- result_set (cursor)
);
-- [4] Retrieve Stored Procedure output from cursor
FETCH ALL FROM pivot_result
;
-- country | ak | al | ar | az | ca | co
-- USA | 7002 | 20226 | 22773 | 4578 | 17878 | 19504
-- [5] Close transaction. Will release the cursor.
END;
-- [1] Create Example Table
CREATE TEMP TABLE tmp_state_metrics (
country VARCHAR
, state VARCHAR
, entries INTEGER
);
INSERT INTO tmp_state_metrics VALUES
('USA', NULL, 15177), ('USA', NULL, 1)
, ('USA', 'AK', 7001), ('USA', 'AK', 1)
, ('USA', 'AL', 20225), ('USA', 'AL', 1)
, ('USA', 'AR', 22772), ('USA', 'AR', 1)
, ('USA', 'AZ', 4577), ('USA', 'AZ', 1)
, ('USA', 'CA', 17877), ('USA', 'CA', 1)
, ('USA', 'CO', 19503), ('USA', 'CO', 1)
;
amazon-redshift-utils/src/StoredProcedures/sp_pivot_for.sql
Emulates the output of queries using PIVOT FOR syntax to transpose rows to columns
sp_pivot_for() creates an aggregated CASE statement for each pivot column value
The stored procedure version does not require the pivot values to be known in advance
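Roughly, the generated SQL is ordinary conditional aggregation; for the example data above it would look something like this (a sketch, not the procedure’s exact output):
SELECT country
, SUM(CASE WHEN state = 'AK' THEN entries END) AS ak
, SUM(CASE WHEN state = 'AL' THEN entries END) AS al
, SUM(CASE WHEN state = 'AR' THEN entries END) AS ar
, SUM(CASE WHEN state = 'AZ' THEN entries END) AS az
, SUM(CASE WHEN state = 'CA' THEN entries END) AS ca
, SUM(CASE WHEN state = 'CO' THEN entries END) AS co
FROM tmp_state_metrics
WHERE state IS NOT NULL
GROUP BY country;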
Learn big data with AWS Training and Certification
Visit aws.amazon.com/training/paths-specialty/
New free digital course, Data Analytics Fundamentals, introduces Amazon S3, Amazon Kinesis, Amazon EMR, AWS Glue, and Amazon Redshift
Validate expertise with the AWS Certified Big Data - Specialty exam or the new AWS Certified Data Analytics - Specialty beta exam
Resources created by the experts at AWS to help you build and validate data analytics skills
Classroom offerings, including Big Data on AWS, feature AWS expert instructors and hands-on labs
Thank you!
Naresh Chainani
nareshkc@amazon.com
Joe Harris
harrjose@amazon.com