Using Pivot Tables and Using Simple Functions to Analyze Salary

Using Pivot Tables and Using Simple Functions to

Analyze Salary Data

There was a time when Ontario was one of the few jurisdictions to

provide public access to the salaries of public-sector employees earning

more than $100,000 a year. We can debate whether that constitutes an

exorbitant wage in expensive markets such as Vancouver and Toronto,

especially when you factor in other variables such as inflation. Still, this

disclosure is welcome. An increasing number of jurisdictions are

following Ontario’s lead.

In Canada’s largest province, the list is released at the end of each fiscal

year (March) and typically leads to stories about the highest paid

employee.

Getting this data into a spreadsheet allows us to become more

ambitious, which is what we will do in this tutorial.

What you will learn in this tutorial

Download three years worth of data.

Reformat 2019 data using paste special.

Merge tables in the master file.

Use concatenate function to combine last and first names.

Create new columns that calculate total salary and taxable benefits as

percent of total salary.

Create pivot table that compares institutions with the highest number

employees on the list and how the totals compare year-to-year.

Use paste special to create a table for calculating percent difference.

Downloading data

Go to the Ontario’s Public sector salary disclosure site. Take a few

minutes to examine the contents. As we discussed in the first tutorial

involving the Global COVID-19 data, it’s always advisable to study what

you are about to download.

The explanation begins with the “Overview”.

Among other things, you’ll notice that the disclosure tables go all the

way back to 1996, ideal for ambitious folks who want to take a deeper

dive to, for instance, judge the progress Ontario’s public sector has

made in hiring and promoting more women to executive positions. The

archive also makes it possible track hiring at universities, colleges,

school boards, hospitals and municipalities.

Let’s begin by downloading the 2019 data.

Select the blue “Search the 2019 disclosure” tab you can see in the

screen grab above.

Next, you’ll see a preview page with three formats available for

download.

The preview table allows you to see what you’re about to download, as

well as conduct an online search for a specific name. While online

searches are useful, downloading the entire dataset is usually a better

way to go.

Under the “Download data” section, you’ll notice three datatypes:

Spreadsheets; CSV (comma separated value); and JSON (a file format

that stands for J ava Script O bject Notation.)

Download the spreadsheet version.

Save it in the folder you have created for this exercise.

Before downloading files for the previous two years, open the 2019

table.

This table contains some formatting that we will want to get rid of using

the paste-special option we learned in the second tutorial.

Select and copy the entire table.

Open a new worksheet.

Use the paste special.

Adjust the column widths and scroll to the bottom to make sure you

have all the data.

Return to the salary discosure website, and to the main page that we

saw on page two of the tutorial.

Scroll down to the disclosure tables (“all sectors and seconded

employees”).

You’ll also notice that for 2018 and the previous years, there’s an

option to download a table that contains “changes, additions and

deletions to the initial publication.”

Typically, we would have to use the information in this table to update

the material in the main table. If we were building a master table

beyond this tutorial, that’s exactly what we’d have to do. To save time,

we will skip this crucial verification step. It was worth noting before

going any further.

Download the 2018 data.

Thankfully, this table is in a more conventional format, as are all the

tables for the previous years.

Close this file and repeat the same process for 2017.

Merge tables in master file

Now we have three years’ worth of data, we can create a master file.

This is possible because the table for each year contains the identical

structure: identical column names in the same order.

The columns are Sector: Last Name; First Name; Salary Paid; Taxable

Benefits; Employer; Job Title.

If the tables were dissimilar, merging them to create a master file

would be impossible without additional work.

Keep all three tables open.

Open a fourth Excel file and call it something like

“MasterSalaries_2017-2019.”

Copy and paste the 2019 file into the worksheet of the fourth, newly

named Excel file.

Be sure to readjust the column widths either manually with your cursor,

or selecting column in question, right-clicking to obtain your short-cut

menu and readjusting the width to a more manageable length of 30 to

50 characters.

Scroll to the bottom to ensure you have all the data.

Return to the 2018 table, copy that table – minus the column titles

because we already have them – and place your cursor on cell

A166978.

Paste.

Scroll to the bottom of the 2018 data to repeat your verification.

Paste the 2017 table at the bottom of your merged (2019 and 2018 so

far) table using the same techniques.

You should have 450262 rows of salary data, almost half a million rows.

Save the table, calling the first worksheet something like

“Salaries_2017-19.” (NOTE: character limits mean we must be

economical when naming worksheets.) Also be sure to save as you go.

In the first two tutorials, you learned how to use sorting, filtering and

simple observation to interview the data to determine what you have

and what you don’t have. Take the time to do this before moving on to

the next step.

Use concatenate function to combine first and last name

Before we get this into a pivot table, we’ll use a concatenate function

to combine the last and first names in columns B and C into a new

column.

Insert a new column and call it something like “New Name”

We’ll use the concatenate operator “&” to combine the names with the

formula <<=B2&”, “&C2>>

Let’s translate this formula: Take B2 and combine it (&) with a comma

and space which we will put in between quotation marks (“, “) and

combine C2.

Hit enter.

Copy the formula to bottom of the column and adjust the column

length.

We’ll now create a column that adds the Salary Paid to Taxable Benefits

for a total salary.

Add a column to the right of Taxable Benefits and use the addition

operator “+” to obtain your total.

Copy the formula to the bottom of the column and format the numbers

as currency with no decimal places.

Good.

Now let’s create a new column which calculates the taxable benefit as a

percent of the total salary.

Insert a new column, and divide the taxable benefit by the total salary.

Reformat with as a percent with two decimal places.

It’s time to perform some analysis.

Filter the table for the year 2019. Sort the “Total Salary” column to see

who earned the most.

Then sort the “Taxable Benefit Percent” column in descending order.

You’ll be surprised at some of the results.

Before we even get into a pivot table, the new columns we’ve created

in this master table for the three years now convey a lot more

information that takes us beyone the typical, who earned the most in

2019. We can filter for universities, and professors to see which

institutions are doling out the most cash.

Simply filtering and sorting on the main table produces useful, and

perhaps newsworthy information.

Now you can use the pivot table to determine what’s happening at the

institutional level.

Create pivot table that compares institutions with the most

employees on the list and how those numbers compare year-to-year

Each row in our original table is an employee. The master table filtered

for 2019, contains 166,977 rows. Or put another way, 166,977 public-

sector employees who earned more than $100,000 that year.

Let’s create a pivot table to provide a closer look.

We can use the pivot table to group (Rows) the employees, count

(values) them, and subdivide (Columns) by year.

We can see that the 2019 number corresponds with the filtered

number in the main table.

It’s also clear that the number of employees on the list has been

increasing, an additional piece of information that pivot table has just

provided.

Now we can subdivide the table by employer (yes, we can count on the

employer field and group it in another section of the pivot table) and

filter it for the one we want.

Let’s do that for Humber.

Filter the Row Label for Humber.

The number of Humber employees on the list is growing. If you wanted

to see all the employees on list, click on any of the totals to see the

underlying data that will be presented in a separate, formatted

worksheet with drop-down menus.

Clear the Row Label Filter to return to the complete pivot table we’ve

just created.

Sort the Grand Total in descending order.

Sorting the Grand Total in descending order, we can easily see how

some of the top employers stack up. Ontario Power Generation, City of

Toronto, City of Toronto Police Service (interesting because of the

defund the police campaign), University of Toronto, etc. have all seen

an increase in numbers.

To compare more of the major employers and calculate the rate at

which the employees are growing, let’s use the paste special.

Use paste special to create a table for calculating percent difference

Copy the pivot table, go to a new worksheet and use the paste special

option.

Time for some clean-up. Delete first three rows.

Rename column A “Employer”

Delete the Grand Total column (E) and the Grand Total row (2473).

Create a new column “Percent Diff 2017-2019”.

We’ll use the formula for percent increase ( generically: =(NEW

NUMBER-OLD NUMBER)/OLD NUMBER)). We put the numbers in

brackets because Excel will perform that caculation before dividing.

Reformat the number as a percent with on decimal place and copy the

formula to the bottom of the row.

Dividing by zero produces an error message (#DIV/0!). Use the filter to

de-select the error message.

Sort the new column in descending order to see which employer had

the fastest growth rate.

Sort column E in descending order.

You can use the filters to compare some of the larger employers. Let’s

take Humber, Ryerson and Sheridan for instance.

Sheridan grew the fastest.

There are many ways to perform value-added analysis using many of

the techniques we have learned in this tutorial that take us beyond the

tried and true “who earned the most of all employers.”

The spreadsheet is a powerful tool, which forms the bedrock (for non-

programmers, at least) for data-driven stories.