PharmaSUG 2019 - Paper AP-212
Python-izing the SAS Programmer
Mike Molter, Wright Avenue
ABSTRACT
More and more, Biostats departments are asking whether certain clinical data and/or metadata
processing tasks should be converted to Python. For many who have spent a career writing SAS
code, the learning curve for this vast new frontier may appear daunting. In this paper we’ll
begin a gradual transition into Python data processing by looking at the DataFrame (provided by the
pandas library), and seeing how simple SAS data step tasks such as merging, sorting, and others
work in the Python environment.
INTRODUCTION
Like most papers presented at this conference, this paper is written for SAS programmers, by a SAS
programmer. So if you’re like me, whether you’re new to the trade or a seasoned veteran, you’ve
become comfortable with SAS staples such as the data step and all that it includes: the power of
the SET statement to both compile and iterate through data, functions, assignment statements, and
iterative DO loops. But if you’re also like me, maybe you’ve started to hear whispers of change coming.
As much as we love our SAS, what of this open source? What can this Python do?
The purpose of this paper is to give you, the SAS programmer, a lifejacket and help you dip a toe into
Lake Python. After an initial discussion about the programming environment, we’ll concentrate mostly on
basic data manipulation, selecting various familiar data step functionalities and observing how the same
concepts are applied in Python.
What will you not find in this paper? For starters, while you will find plenty of comparisons between SAS
and Python, and on occasion a statement about a convenience in one language that isn’t in the other, you
will find no attempt to convince you that one language is better than the other. The purpose of this paper
is to help you expand your knowledge, but not to convince you to choose one exclusively over the other.
You also won’t find every Python detail about any given functionality. No doubt, you will be left wondering
about some of the details. To satisfy such curiosities, you are encouraged to visit the extensive Python
documentation at docs.python.org.
THE ENVIRONMENT
Before we get into the specifics of data manipulation, let’s get to the point where we’re
ready to begin programming.
We can use any text editor to write SAS code, but to execute that code we need a SAS processor.
The same can be said of Python, but because of the nature of the two languages, these
environments are obtained in very different ways. We obtain a SAS processor
by purchasing a license from SAS Institute. This license is in effect for a finite period of time, after which
the purchaser has the opportunity to renew. SAS has a Base product that comes
with every purchase, as well as additional packages with additional functionality at extra cost.
On the other hand, Python is open source and freely available for download. Many operating systems
such as Linux and Mac OS X (but not Windows) have it installed by default. Python is installed with a
standard library that contains many modules, some written in C, some in Python, each of which
addresses a particular functionality. You can think of a module as similar to a PROC, although that
comparison might be a bit of a reach. Because Python is open source, community users can also build
their own modules and, through code-sharing platforms such as GitHub, contribute them to the whole
Python community. Additional packages not available with the standard library can be downloaded,
typically from the Python Package Index (PyPI). An import statement in the Python program then
gives access to a module’s functionality.
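To make the import statement a little more concrete, here is a minimal sketch. The statistics and
datetime modules come with the standard library, while pandas is a separately downloaded package,
so the sketch only checks whether it happens to be installed. The weights and dates are invented
purely for illustration.

```python
# Standard library modules are installed with Python itself,
# so these imports work without downloading anything.
import statistics
from datetime import date

# pandas is NOT part of the standard library. It must be downloaded
# separately, so we guard the import in case it is not installed.
try:
    import pandas as pd
    have_pandas = True
except ImportError:
    have_pandas = False

# Illustrative values only (not from any real study).
weights_kg = [70.5, 82.0, 65.3, 90.1]
first_dose = date(2019, 6, 1)
visit = date(2019, 6, 18)

# Once imported, a module's functions are reached through the module name,
# loosely comparable to calling on a PROC for a specific job.
mean_weight = statistics.mean(weights_kg)

# Subtracting two dates gives a timedelta; .days is the whole-day count.
study_day = (visit - first_dose).days + 1

print("Mean weight:", round(mean_weight, 3))
print("Study day:", study_day)
print("pandas available:", have_pandas)
```

Nothing is available until it is imported, which is why the import statements sit at the top of the
program, much as LIBNAME and OPTIONS statements tend to sit at the top of a SAS program.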
In addition to plain text editors available from third-party vendors, SAS gives us (with our purchased
license) the opportunity to write code with other interfaces. Display Manager has been around for a long
time, but newer interfaces such as SAS Studio and Enterprise Guide have improved the programming