arXiv:2207.09379v2 [cs.PL] 29 Jul 2022
To what extent can we analyze Kotlin programs
using existing Java taint analysis tools?
(Extended Version)
Ranjith Krishnamurthy
Fraunhofer IEM
ranjith.krishnamurth[email protected].de
Goran Piskachev
Fraunhofer IEM
Eric Bodden
Paderborn University & Fraunhofer IEM
eric.bodden@uni-paderborn.de
Abstract: As an alternative to Java, Kotlin has gained rapid
popularity since its introduction and has become the default
choice for developing Android apps. However, due to its
interoperability with Java, Kotlin programs may contain almost the
same security vulnerabilities as their Java counterparts. Hence,
we ask: to what extent can one use an existing Java static
taint analysis on Kotlin code? In this paper, we investigate the
challenges in implementing a taint analysis for Kotlin compared
to Java. To answer this question, we performed an exploratory
study in which each Kotlin construct was examined and compared
to its Java equivalent. We identified 18 engineering challenges
that static-analysis writers need to handle differently due to
Kotlin’s unique constructs or the differences in the generated
bytecode between the Kotlin and Java compilers. For eight of
them, we provide a conceptual solution, and we implemented six
of those as part of SECUCHECK-KOTLIN, an extension to
the existing Java taint analysis SECUCHECK.
Index Terms: static analysis, security, Kotlin, taint analysis
I. INTRODUCTION
In the ten years since its introduction, Kotlin has become one of
the fastest-growing programming languages (PLs). As of June
2022, it is the twelfth most popular PL by the PYPL index¹.
Additionally, over 60% of Android apps are written in
Kotlin, earning it the title of the default PL for the Android
framework². Among Kotlin’s advantages as a JVM-based PL are
its interoperability with Java and its unique constructs, such as
data classes, coroutines, null safety, and extensions.

Like Java code, Kotlin code may contain security vulnerabilities,
such as SQL injection [1]. Therefore, statically
analyzing Kotlin code can be a helpful method for detecting
bugs and security vulnerabilities as early as possible. Despite
Kotlin’s popularity, very few static-analysis tools can analyze
Kotlin code; examples are KtLint [2], Detekt [3], Diktat [4], and
SonarQube [5]. These tools only perform pattern-based analyses
using simple rules, such as the rules of SonarQube [6]. We are
not aware of any tool that performs deep data-flow analyses on
Kotlin code. For example, taint analysis has proven to be very
useful for detecting many prevalent security vulnerabilities [7]
such as injections [1], [8], [9] and XSS [10]. This versatility
of taint analysis is due to its capacity to accept various inputs
in the form of rules. At its core, the analysis follows the path
between so-called sources, where the taint is created, and
so-called sinks, where the taint is reported. The information for
the sources and sinks is often encoded in a rule via a
domain-specific language (DSL).

¹ https://pypl.github.io/PYPL.html
² http://surl.li/cfrcc
For Java, there are many existing taint analyses [11],
[12] that can be used to detect many taint-style security
vulnerabilities. Since Kotlin compiles to Java bytecode,
one can theoretically use existing Java taint analyses on Kotlin
code. However, the Kotlin compiler generates bytecode
differently from the Java compiler. This leads to the question: can
one use taint analysis tools intended for Java to analyze Kotlin
programs, or must one reinvent the wheel?
In this paper, we report the results of an exploratory study
that we conducted to address this question. We analyzed the
Kotlin-generated bytecode for each language construct and
compared it to the Java equivalent. We used the Jimple
intermediate representation generated by the Soot framework [13]
for this comparison. For completeness, we used the official
Kotlin documentation [14] and created a micro benchmark
with 294 simple Kotlin programs and 135 simple Java
programs, where each program demonstrates a single language
construct. When considering taint analysis, we found that
most Kotlin constructs can be analyzed the same way as
the Java equivalents. However, we also found 18 engineering
challenges that require a different approach. For example,
functions declared as top-level elements do not have a parent
class in the source code. However, the compiler generates a
parent class in the Java bytecode, which the taint analysis
should be aware of to locate the function correctly. We
propose solutions for eight of these challenges that analysis
writers can implement. As a proof of concept, we extended
an existing Java taint analysis tool, SECUCHECK [12], by
implementing six of our eight solutions, creating a taint
analysis tool, SECUCHECK-KOTLIN, that supports the standard
language constructs. Finally, we evaluated the applicability of
SECUCHECK-KOTLIN with the Kotlin version of the PetClinic
application³.

We present the details of our methodology in Section II.
Then, in Section III, we report on our findings from the
study. Next, we present details of our implementation of
SECUCHECK-KOTLIN in Section IV. Finally, we conclude and
present our future work in Section V.

³ https://github.com/spring-petclinic/spring-petclinic-kotlin
II. METHODOLOGY
We examined the intermediate representation (IR) of the
Kotlin code and, where one exists, of the equivalent Java code.
Our methodology consists of automatic IR generation, including
metadata useful for our examination, followed by a manual
examination step. We examined the following: (1) whether the
generated IR for Kotlin is valid and can be analyzed the
same way as the IR from equivalent Java code, (2) whether
there are difficulties due to the definition of sources and sinks,
and (3) whether there are language constructs in Kotlin that
the analysis needs to handle in a new, unique way
compared to Java. We did not consider challenges that can
occur due to the call-graph generation algorithms or the
algorithms that compute alias information.
We used Kotlin’s official documentation [14] to examine
each language construct. During the examination, we covered
all constructs from the “Concepts” section and a few from
the “Standard library” section (Collections, Iterators, Ranges,
and Progressions). We did not consider constructs that were
in the experimental stage at the time of this study. Table I
summarizes Kotlin’s constructs discussed in the official
documentation and those we manually examined.
| Constructs | #Sub-constructs | Supported |
|---|---|---|
| Types, Control flow, Packages & imports, Null safety, Equality, This expression, Destructuring declarations, Ranges and Progressions | 11 | |
| Classes and objects (except for Delegated properties) | 17 | |
| Functions (except for Builders) | 5 | |
| Asynchronous programming techniques, Coroutines, Annotations, and Reflections | 4 | |
| Collections and Iterators | 2 | |

Legend for the Supported column: examined all constructs in the category;
examined only the basic constructs; did not examine in this study.
Table I: List of Kotlin’s features discussed in Kotlin’s official
documentation.
Kotlin targets most Java Development Kit (JDK) versions.
However, the annual developer ecosystem survey conducted
by JetBrains in 2020 shows that 73% of Kotlin developers
target JDK 8 [15]. Furthermore, Kotlin targets JDK 8 by
default. Therefore, we consider JDK 8 for this exploratory
study. Additionally, we consider Kotlin version 1.5.10.
The Kotlin compiler has various options and annotations for
modifying the compilation process, which alter the compiler’s
output, the Java bytecode. For this study, we used the default
configuration of the compiler.
For the IR generation, we built a tool, JIMPLEPROVIDER, that
generates Jimple IR using the Soot framework [13]. The
Jimple code is organized based on the package name.
Furthermore, for each class, JIMPLEPROVIDER generates metadata in
a JSON file that contains information such as class name, super
class, implemented interfaces, method count, method
signatures, local variables, invoke expressions, etc. This metadata
helps to identify the challenges easily and quickly. For deeper
examination, we then examined the IR and the Java bytecode.
A. Micro benchmark
Using real-world projects for the manual examination is
infeasible because a real-world project has a complex mix of
many constructs, making it hard to identify them clearly in
Jimple. Therefore, we built a micro benchmark suite classified
into two groups: the Kotlin suite and the Java suite. The Kotlin
suite consists of small Kotlin programs, each focusing on one
particular Kotlin construct. If a corresponding feature exists in
Java, then an equivalent program is present in the Java suite.
The suites contain six main categories: basics (43 Kotlin & 36
Java files), classes and objects (118 Kotlin & 80 Java files),
functions (27 Kotlin & 4 Java files), generics (8 Kotlin & 10
Java files), unique to Kotlin (87 Kotlin files), and collection
(11 Kotlin & 5 Java files). Table II provides an overview of
the Kotlin suite and the important features in the six categories.
| Categories in Kotlin suite | Major features | #Kotlin files |
|---|---|---|
| basics | data types, control flow, package, import, exceptions, equality, operators, variables | 41 |
| classesAndObjects | classes, enum class, inline class, sealed class, nested / inner class, interface, functional interface (SAM), object expression, object declaration, delegation, qualified this, type aliases, visibility modifiers | 118 |
| functions | simple functions, default arguments, local functions, infix notations, tail recursive function, varargs | 27 |
| generics | simple generic type, generic functions, raw types, upper bounds | 8 |
| uniqueToKotlin | data class, destructuring declaration, extensions, higher-order functions, inline functions, null safety, operator overloading, primary constructor, properties, ranges, progressions, smart cast, string template, declaration site variance, type projection | 87 |
| collection | collection and iterators | 11 |

Table II: Overview of Kotlin suite.
B. Manual examination
The manual examination of the Jimple code was performed
by the first author, who has more than 4.5 years of
software development experience and is a Ph.D. student focusing
on programming languages and static analysis. The more
complex constructs, especially those specific to Kotlin or
with differences from Java, were discussed with the second
author, a Ph.D. student in the last year with expertise in
static analysis, and an external researcher with professional
experience in Kotlin development. The examiners used
JIMPLEPROVIDER to generate the IR for the entire micro
benchmark. Then, each construct was inspected manually.
First, the generated metadata that provides information related
to taint analysis was studied. Next, the generated IR was checked
for a deeper examination. If more information was needed,
the generated bytecode was examined. Based on this, the
examiner concluded whether a construct requires special handling
in Kotlin taint analysis compared to Java taint analysis.
C. Threats to Validity
Our study involves a manual step, making it possible that
some of the findings are incomplete or incorrect. Furthermore,
the programs written in the micro benchmark suite are based
on personal experience. Therefore, some advanced use cases
may be missing. As discussed earlier in this section, we
considered Kotlin version 1.5.10 and the target JDK 8.
However, there is a risk that for some of the constructs, the
Kotlin compiler may generate the bytecode differently for
different versions. Also, for some constructs, the compiler may
generate bytecode differently if some compiler options are
used. As stated earlier, we only used the default configuration.
III. FINDINGS
In Sub-Section III-A, we present the engineering challenges
we identified and for which we propose a solution. Then,
in Sub-Section III-B, we present the engineering challenges
that we leave as open issues. Finally, in Sub-Section III-C,
we answer two research questions for the exploratory study.
A. Engineering challenges with proposed solution
1) Data type mapping: On the bytecode level, some data
types in Kotlin are mapped to Java data types. For example, the
non-nullable kotlin.Int is mapped to Java’s int. Table IV
summarizes the data type mapping from Kotlin source
code to the Java bytecode. Similarly, the compiler maps
function types to kotlin.jvm.functions.Function*
in the Java bytecode, as described in Table III. This mapping
is only affected by the number of parameters taken by the
function type. The types of the parameters and the return type do
not affect the mapping. Note: the mapping described in Table
III is also valid for the respective nullable function types.
Due to this data type mapping, users must provide valid
method signatures based on the Java bytecode to specify the
source, sink, and other relevant method calls. However, finding
the valid method signatures in big projects is cumbersome for
users, which makes the tool hard to use.
| KOTLIN FUNCTION TYPE | TYPE IN JAVA BYTECODE |
|---|---|
| Function type with 0 parameters, e.g. () -> Int | kotlin.jvm.functions.Function0 |
| Function type with 1 parameter, e.g. (Byte) -> Unit | kotlin.jvm.functions.Function1 |
| Function type with 2 parameters, e.g. (Int, Int) -> Int | kotlin.jvm.functions.Function2 |
| ... | ... |
| Function type with 22 parameters | kotlin.jvm.functions.Function22 |
| Function type with more than 22 parameters | kotlin.jvm.functions.FunctionN |

Table III: Kotlin function type mapping.
Proposed solution: To handle this challenge, static-analysis
developers can implement a data type transformer,
which takes a method signature provided by the users as input.
The transformer then checks the parameter types and the return
type in the given method signature. If they are valid Kotlin
data types, the transformer replaces each Kotlin data type
with the respective Java data type.
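As an illustration, the transformer described above could be sketched as follows. This is a minimal sketch and not the SECUCHECK-KOTLIN implementation: the class `KotlinTypeTransformer`, its method names, and the example class `com.example.Repo` are hypothetical, and the lookup maps contain only a small excerpt of Table IV.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical data type transformer; the maps hold only an excerpt of Table IV. */
final class KotlinTypeTransformer {
    private static final Map<String, String> NON_NULLABLE = new HashMap<>();
    private static final Map<String, String> NULLABLE = new HashMap<>();

    static {
        NON_NULLABLE.put("Unit", "void");
        NON_NULLABLE.put("Int", "int");
        NON_NULLABLE.put("Boolean", "boolean");
        NON_NULLABLE.put("Any", "java.lang.Object");
        NON_NULLABLE.put("String", "java.lang.String");
        NON_NULLABLE.put("IntArray", "int[]");
        // Nullable basic types are boxed; Unit? keeps the Unit class.
        NULLABLE.put("Int", "java.lang.Integer");
        NULLABLE.put("Boolean", "java.lang.Boolean");
        NULLABLE.put("Unit", "kotlin.Unit");
    }

    /** Maps a single Kotlin type name; unknown names are returned unchanged. */
    static String toJvmType(String kotlinType) {
        String type = kotlinType.trim();
        if (!type.endsWith("?")) {
            return NON_NULLABLE.getOrDefault(type, type);
        }
        String base = type.substring(0, type.length() - 1);
        String special = NULLABLE.get(base);
        // Types without a special nullable form map like their non-nullable counterpart.
        return special != null ? special : NON_NULLABLE.getOrDefault(base, base);
    }

    /** Function types depend only on the number of parameters (Table III). */
    static String functionType(int parameterCount) {
        String suffix = parameterCount <= 22 ? String.valueOf(parameterCount) : "N";
        return "kotlin.jvm.functions.Function" + suffix;
    }

    /** Builds "<class>: <return type> <name>(<params>)" with Kotlin types replaced. */
    static String transform(String className, String returnType, String name, String... paramTypes) {
        String[] mapped = new String[paramTypes.length];
        for (int i = 0; i < paramTypes.length; i++) {
            mapped[i] = toJvmType(paramTypes[i]);
        }
        return className + ": " + toJvmType(returnType) + " " + name
                + "(" + String.join(",", mapped) + ")";
    }

    public static void main(String[] args) {
        System.out.println(transform("com.example.Repo", "Unit", "save", "String", "Int?"));
    }
}
```

A full transformer would cover every row of Table IV and would also need to parse the user-provided signature string, which this sketch sidesteps by taking the parts as separate arguments.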
2) Type alias: A type alias allows developers to give a
new name to an existing type. For example, in the Kotlin
standard library, ArrayList is defined as a type alias
to java.util.ArrayList. Therefore, Kotlin’s ArrayList does
not exist in the bytecode. Experts in the Kotlin
programming language know which types are defined as type
aliases in the Kotlin standard libraries. Likewise, domain experts
in custom libraries, such as cryptographic APIs, know what type
aliases are defined in their libraries. Users of existing Java
taint analysis tools, on the other hand, may not know such
type aliases and may give invalid method signatures.
Proposed solution: Static-analysis developers can implement
a feature as part of the DSL that allows domain experts
to specify type aliases (type alias specifications). The DSL
semantics replaces all the type aliases found in the given
method signatures with the original types specified in the given
type alias specifications.
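A type alias specification of this kind could look roughly like the following sketch. The class `TypeAliasResolver` and its methods are hypothetical names chosen for illustration; the only fact taken from Kotlin itself is that kotlin.collections.ArrayList is an alias of java.util.ArrayList.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical type alias specification: alias name -> original type. */
final class TypeAliasResolver {
    private final Map<String, String> aliases = new HashMap<>();

    /** Registers one alias; returns this so that specifications can be chained. */
    TypeAliasResolver alias(String aliasName, String originalType) {
        aliases.put(aliasName, originalType);
        return this;
    }

    /** Returns the original type for an alias, or the name itself if it is no alias. */
    String resolve(String typeName) {
        return aliases.getOrDefault(typeName, typeName);
    }

    /** Replaces aliases in the class, return, and parameter types of a signature. */
    String resolveSignature(String className, String returnType, String name, String... paramTypes) {
        String[] resolved = new String[paramTypes.length];
        for (int i = 0; i < paramTypes.length; i++) {
            resolved[i] = resolve(paramTypes[i]);
        }
        return resolve(className) + ": " + resolve(returnType) + " " + name
                + "(" + String.join(",", resolved) + ")";
    }

    public static void main(String[] args) {
        TypeAliasResolver resolver = new TypeAliasResolver()
                .alias("kotlin.collections.ArrayList", "java.util.ArrayList");
        System.out.println(resolver.resolveSignature(
                "kotlin.collections.ArrayList", "boolean", "add", "java.lang.Object"));
    }
}
```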
3) Property: In Kotlin, a property is a field with
accessors. By default, Kotlin provides a getter and a setter for
mutable properties and only a getter for immutable properties.
Whenever a property is accessed in Kotlin source code,
the Kotlin compiler uses the respective accessor method in
the Java bytecode. Similar to variables, properties can be
tainted. Therefore, the getters and setters of properties can be
source, sink, or propagator methods. Thus, the user needs
to be aware of these signatures.
Proposed solution: Static-analysis developers can provide a
feature in the DSL that enables users to specify a property by
providing the fully qualified name of the class in which the
property is defined, the property name, and the property’s type.
Then, the valid accessor method signature can be built automatically.
The pattern for the getter method is <given fully
qualified class name>: <given property’s
type> get<given property name with first
letter caps>(). Similarly, the setter method’s pattern
is <given fully qualified class name>: void
set<given property name with first letter
caps>(<given property’s type>).
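The two accessor patterns can be written down directly. The following is a minimal sketch with hypothetical names (`PropertyAccessors`, `com.example.User`); it implements only the patterns stated above and assumes the property type is already given as a Java bytecode type. It does not handle properties whose names start with "is", for which Kotlin keeps the property name as the getter name.

```java
/** Hypothetical helper that builds accessor signatures from a property description. */
final class PropertyAccessors {

    /** Upper-cases the first letter of the property name. */
    private static String capitalize(String propertyName) {
        return Character.toUpperCase(propertyName.charAt(0)) + propertyName.substring(1);
    }

    /** <class>: <type> get<Name>() */
    static String getter(String className, String propertyName, String jvmType) {
        return className + ": " + jvmType + " get" + capitalize(propertyName) + "()";
    }

    /** <class>: void set<Name>(<type>) */
    static String setter(String className, String propertyName, String jvmType) {
        return className + ": void set" + capitalize(propertyName) + "(" + jvmType + ")";
    }

    public static void main(String[] args) {
        System.out.println(getter("com.example.User", "name", "java.lang.String"));
        System.out.println(setter("com.example.User", "name", "java.lang.String"));
    }
}
```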
4) Top-level members: In Kotlin, top-level members are
defined in a Kotlin file under a package. Kotlin functions and
properties can be top-level members. These members are not
declared in any class, object, or interface. Therefore, in Kotlin
source code, top-level members can be accessed directly
without creating an object or using a class to access them. However,
the Kotlin compiler generates a class in the Java bytecode
and declares those top-level members as static members in
the generated class. Suppose a novice user wants to specify
top-level members as source, sanitizer, propagator, or sink
methods. In that case, the user must identify the valid class
name in the method signature of the top-level members.
| KOTLIN DATA TYPE T | TYPE IN JAVA BYTECODE, (a) non-nullable T | TYPE IN JAVA BYTECODE, (b) nullable T? |
|---|---|---|
| SPECIAL RETURN TYPES | | |
| Nothing | java.lang.Void | java.lang.Void |
| Unit | void | Unit |
| BASIC TYPES | | |
| Byte | byte | java.lang.Byte |
| Short | short | java.lang.Short |
| Int | int | java.lang.Integer |
| Long | long | java.lang.Long |
| Char | char | java.lang.Character |
| Float | float | java.lang.Float |
| Double | double | java.lang.Double |
| Boolean | boolean | java.lang.Boolean |
| FEW BUILT-IN CLASSES | | |
| Any | java.lang.Object | java.lang.Object |
| Cloneable | java.lang.Cloneable | java.lang.Cloneable |
| Comparable | java.lang.Comparable | java.lang.Comparable |
| Enum | java.lang.Enum | java.lang.Enum |
| Annotation | java.lang.annotation.Annotation | java.lang.annotation.Annotation |
| CharSequence | java.lang.CharSequence | java.lang.CharSequence |
| String | java.lang.String | java.lang.String |
| Number | java.lang.Number | java.lang.Number |
| Throwable | java.lang.Throwable | java.lang.Throwable |
| ARRAY TYPES | | |
| Array<Byte> | java.lang.Byte[] | java.lang.Byte[] |
| Array<Short> | java.lang.Short[] | java.lang.Short[] |
| Array<Int> | java.lang.Integer[] | java.lang.Integer[] |
| Array<Long> | java.lang.Long[] | java.lang.Long[] |
| Array<Char> | java.lang.Character[] | java.lang.Character[] |
| Array<Float> | java.lang.Float[] | java.lang.Float[] |
| Array<Double> | java.lang.Double[] | java.lang.Double[] |
| Array<Boolean> | java.lang.Boolean[] | java.lang.Boolean[] |
| Array<Any> | java.lang.Object[] | java.lang.Object[] |
| Array<*> | *[] | *[] |
| BASIC TYPES ARRAY | | |
| ByteArray | byte[] | byte[] |
| ShortArray | short[] | short[] |
| IntArray | int[] | int[] |
| LongArray | long[] | long[] |
| CharArray | char[] | char[] |
| FloatArray | float[] | float[] |
| DoubleArray | double[] | double[] |
| BooleanArray | boolean[] | boolean[] |
| IMMUTABLE COLLECTIONS | | |
| Collection<T> | java.util.Collection<T> | java.util.Collection<T> |
| List<T> | java.util.List<T> | java.util.List<T> |
| Set<T> | java.util.Set<T> | java.util.Set<T> |
| Map<K, V> | java.util.Map<K, V> | java.util.Map<K, V> |
| Map.Entry<K, V> | java.util.Map.Entry<K, V> | java.util.Map.Entry<K, V> |
| Iterator<T> | java.util.Iterator<T> | java.util.Iterator<T> |
| Iterable<T> | java.lang.Iterable<T> | java.lang.Iterable<T> |
| ListIterator<T> | java.util.ListIterator<T> | java.util.ListIterator<T> |
| MUTABLE COLLECTIONS | | |
| MutableCollection<T> | java.util.Collection<T> | java.util.Collection<T> |
| MutableList<T> | java.util.List<T> | java.util.List<T> |
| MutableSet<T> | java.util.Set<T> | java.util.Set<T> |
| MutableMap<K, V> | java.util.Map<K, V> | java.util.Map<K, V> |
| MutableMap.Entry<K, V> | java.util.Map.Entry<K, V> | java.util.Map.Entry<K, V> |
| MutableIterator<T> | java.util.Iterator<T> | java.util.Iterator<T> |
| MutableIterable<T> | java.lang.Iterable<T> | java.lang.Iterable<T> |
| MutableListIterator<T> | java.util.ListIterator<T> | java.util.ListIterator<T> |

Table IV: Data type mapping from Kotlin source code to the Java bytecode,
for (a) non-nullable types and (b) nullable types.

Proposed solution: To identify the valid class name of
top-level members, one needs the file name and the package name
in which the top-level members are defined. Therefore,
static-analysis developers can provide a feature in the DSL that
enables users to specify a function or a property as a
top-level member by providing the package name and the file
name in which the top-level member is defined. Then, the DSL
component can build a valid class name for a top-level function
or for the accessors of a top-level property. The rule to build
the valid class name is <given package name>.<given file
name>Kt.
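The class-name rule for top-level members can be sketched as below. The class `TopLevelMembers` and the method `facadeClass` are hypothetical names. The sketch assumes the default compiler configuration used in this study, in which the first letter of the file name is upper-cased in the generated class name; a class name customized in the source file is not covered.

```java
/** Hypothetical helper that derives the generated class name of top-level members. */
final class TopLevelMembers {

    /** Builds <package>.<FileName>Kt from a package name and a Kotlin file name. */
    static String facadeClass(String packageName, String fileName) {
        // Accept the file name with or without the ".kt" extension.
        String base = fileName.endsWith(".kt")
                ? fileName.substring(0, fileName.length() - 3)
                : fileName;
        String simpleName = Character.toUpperCase(base.charAt(0)) + base.substring(1) + "Kt";
        // Files in the default package have no package prefix.
        return packageName.isEmpty() ? simpleName : packageName + "." + simpleName;
    }

    public static void main(String[] args) {
        System.out.println(facadeClass("com.example", "utils.kt"));
    }
}
```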
5) Default arguments: In Java, default values for function
or constructor arguments can only be emulated through overloading.
However, this increases the number of overloads. Kotlin
avoids this problem by providing a default argument feature
for constructors and functions. For a function or constructor
with default arguments, the Kotlin compiler generates two
implementations in the Java bytecode. The first is the actual
implementation with all the parameters as defined in the source
code. The second is generated by the compiler with
additional arguments; it determines the default arguments’
values and calls the actual implementation. For a constructor, the
compiler adds two additional arguments at the end: int and
kotlin.jvm.internal.DefaultConstructorMarker.
Similarly, for a function, the compiler adds int and
java.lang.Object at the end. Additionally, if the function
is a member function, the compiler adds a first argument
of the type in which the function is defined. This added first
argument is the this-object of the member function. Furthermore,
for a default argument in a top-level function or member
function, the compiler adds the suffix $default to the function
name of the second implementation. If a developer does not
pass a value for a default argument, the compiler calls the
second implementation. Suppose users of taint analysis tools
specify a constructor or function with default arguments as a source
method. In that case, the analysis component should identify
the second implementation generated by the compiler as a
source method and track the variables correctly.
Proposed solution: If the analysis fails to identify a method
call as a source, sink, or other relevant method as specified
in the taint-flow specifications, then the analysis checks for the
second implementation of the default argument feature. For
each function or constructor in the taint-flow specifications, add
the additional arguments and modify the function name as
described in Sub-Section III-A5. Subsequently, if the method
signature matches the method call’s signature, track the
respective variables. For constructors and top-level functions,
track the variables based on the specified rules for the matched
method in the taint-flow specifications. However, for member
functions, since the compiler adds a parameter at the
beginning, the analysis should consider this added first argument
while tracking the variables. For example, if the this-object
is specified to be tracked, then track the first argument in the Java
bytecode. Likewise, track the second argument in the Java
bytecode if the first argument is specified to be tracked, and so forth.
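The signature rewriting and the index shift described above could be sketched as follows. The names `DefaultArgumentSignatures`, `function`, `constructor`, and `bytecodeIndex` are hypothetical; the constructor name `<init>` and the convention that index -1 denotes the this-object are assumptions of this sketch, not part of the study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder for the compiler-generated second implementation. */
final class DefaultArgumentSignatures {

    /** Signature of <name>$default for a top-level or member function. */
    static String function(String className, String returnType, String name,
                           boolean isMember, String... paramTypes) {
        List<String> params = new ArrayList<>();
        if (isMember) {
            params.add(className); // the this-object becomes the first argument
        }
        params.addAll(Arrays.asList(paramTypes));
        params.add("int");
        params.add("java.lang.Object");
        return className + ": " + returnType + " " + name + "$default("
                + String.join(",", params) + ")";
    }

    /** Signature of the additional constructor (constructor name assumed to be <init>). */
    static String constructor(String className, String... paramTypes) {
        List<String> params = new ArrayList<>(Arrays.asList(paramTypes));
        params.add("int");
        params.add("kotlin.jvm.internal.DefaultConstructorMarker");
        return className + ": void <init>(" + String.join(",", params) + ")";
    }

    /** Shifts a source-level parameter index; -1 denotes the this-object (assumption). */
    static int bytecodeIndex(int sourceIndex, boolean isMember) {
        return isMember ? sourceIndex + 1 : sourceIndex;
    }

    public static void main(String[] args) {
        System.out.println(function("com.example.Greeter", "java.lang.String", "greet",
                true, "java.lang.String"));
        System.out.println(constructor("com.example.User", "java.lang.String"));
    }
}
```

With this shift, a rule that tracks the this-object of a member function resolves to argument 0 of the generated method, and a rule that tracks the first declared parameter resolves to argument 1.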
6) Extensions: In Kotlin, the extension feature allows
extending an existing class with new members without using
inheritance. However, extensions do not modify the existing
class or add a new member to it; instead, the new member
is made accessible using the dot-notation on variables of
the type (receiver type) for which the extension member
is defined. In the Java bytecode for a top-level extension
function, the Kotlin compiler adds the receiver type as the
first argument, followed by the actual parameters defined in
the source code. Similarly, the compiler adds the receiver
type as the first argument to the getter method of a
top-level extension property. Note: The compiler generates only
the getter method for an extension property. Furthermore, for
top-level companion object extension members, the compiler
also adds the receiver type as the first argument, followed
by the actual arguments defined in the source code. However,
the type of the added first argument is the wrapper class generated
for the companion object. The companion object is discussed
in detail in Section III-B1. As for top-level extension members,
the compiler also adds the receiver type as the first argument
for an extension defined as a class member. Furthermore,
Kotlin supports a qualified this-object to access the outer
class’s this-object. For this, the compiler treats the actual
this-object in the Java bytecode as the
qualified outer class’s this-object in the source code, and the
first argument in the Java bytecode as the receiver this-object
in the source code. Suppose users want to specify an extension
member as a source or sink method. They might then give
an invalid method signature, since they might not be aware
of the first argument of the receiver type added by the compiler.
Furthermore, if users specify that the this-object in an
extension member should be tracked, then the analysis should track
the first argument. Likewise, the analysis should track the actual
this-object in the Java bytecode if the outer class’s this-object is
specified to be tracked. Similarly, if users specify that the first
argument in an extension function should be tracked, then the
analysis should track the second argument, and so forth.
Proposed solution: To handle extension functions and
extension properties, static-analysis developers should make their
taint-flow specifications aware of these. If this is done through
the DSL for taint-flow specifications, the DSL can build the
valid method signature by adding the given fully qualified class
name as the first argument. Furthermore, the users should not
be able to obtain a setter method from an extension property,
since an extension property cannot have a setter method.
To handle companion object extensions, static-analysis
developers can provide a feature in the DSL. This feature
enables the users to specify a function or property as a
companion object extension member by providing the fully
qualified class name and the name of the companion
object for which the extension is defined. If the name of the
companion object is not given, then by default, the name is
Companion. From these inputs, the generated wrapper class
for the companion object can be built as <given fully
qualified class name>$<given companion object name>.
Then, the valid method signature can be built
by adding this wrapper class as the first argument.
| BUILT-IN OPERATOR | MAPPED TO A FUNCTION |
|---|---|
| UNARY OPERATORS | |
| +obj | obj.unaryPlus() |
| -obj | obj.unaryMinus() |
| !obj | obj.not() |
| ++obj | obj.inc() |
| --obj | obj.dec() |
| obj++ | obj.inc() |
| obj-- | obj.dec() |
| ARITHMETIC OPERATORS | |
| obj1 + obj2 | obj1.plus(obj2) |
| obj1 - obj2 | obj1.minus(obj2) |
| obj1 * obj2 | obj1.times(obj2) |
| obj1 / obj2 | obj1.div(obj2) |
| obj1 % obj2 | obj1.rem(obj2) |
| obj1..obj2 | obj1.rangeTo(obj2) |
| AUGMENTED ASSIGNMENT OPERATORS | |
| obj1 += obj2 | obj1.plusAssign(obj2) |
| obj1 -= obj2 | obj1.minusAssign(obj2) |
| obj1 *= obj2 | obj1.timesAssign(obj2) |
| obj1 /= obj2 | obj1.divAssign(obj2) |
| obj1 %= obj2 | obj1.remAssign(obj2) |
| EQUALITY CHECK OPERATORS | |
| obj1 == obj2 | obj1.equals(obj2) |
| obj1 != obj2 | !(obj1.equals(obj2)) |
| IN OPERATORS | |
| obj1 in obj2 | obj2.contains(obj1) |
| obj1 !in obj2 | !(obj2.contains(obj1)) |
| INDEX OPERATORS | |
| obj[i] | obj.get(i) |
| obj[i, j] | obj.get(i, j) |
| obj[i, j, k] | obj.get(i, j, k) |
| obj[i1, ..., in] | obj.get(i1, ..., in) |
| obj[i] = obj2 | obj.set(i, obj2) |
| obj[i, j] = obj2 | obj.set(i, j, obj2) |
| obj[i, j, k] = obj2 | obj.set(i, j, k, obj2) |
| obj[i1, ..., in] = obj2 | obj.set(i1, ..., in, obj2) |
| INVOKE OPERATORS | |
| obj() | obj.invoke() |
| obj(i) | obj.invoke(i) |
| obj(i, j) | obj.invoke(i, j) |
| obj(i, j, k) | obj.invoke(i, j, k) |
| obj(i1, i2, ..., in) | obj.invoke(i1, i2, ..., in) |
| COMPARISON OPERATORS | |
| obj1 > obj2 | obj1.compareTo(obj2) > 0 |
| obj1 < obj2 | obj1.compareTo(obj2) < 0 |
| obj1 >= obj2 | obj1.compareTo(obj2) >= 0 |
| obj1 <= obj2 | obj1.compareTo(obj2) <= 0 |

Table V: Built-in operators and their corresponding functions in Kotlin.

To handle the qualified this-object in extensions declared as
members, the DSL should be able to track the this-object as
either the extension receiver or the dispatch receiver (outer
class’s this-object). If users specify that the this-object should
be tracked as the extension receiver, modify the taint-flow
specification to track the first parameter in the Java bytecode.
Similarly, if users specify that the this-object should be tracked
as the dispatch receiver, modify the taint-flow specification to
track the actual this-object in the Java bytecode. Likewise, for an
extension function, if users specify that the first parameter
should be tracked, then the analysis should track the
second parameter, and so forth.
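For extension members, both the signature rewriting and the parameter shift follow from the added receiver argument. The sketch below uses hypothetical names (`ExtensionSignatures`, `com.example.StringsKt`) and assumes that the declaring class of a top-level extension is the class generated for its file, and that the receiver type is given as a Java bytecode type. For a companion object extension, the receiver type passed in would be the generated wrapper class.

```java
/** Hypothetical helper for signatures of extension functions and properties. */
final class ExtensionSignatures {

    /** The receiver type is prepended to the parameters declared in the source code. */
    static String function(String declaringClass, String receiverType, String returnType,
                           String name, String... paramTypes) {
        String[] params = new String[paramTypes.length + 1];
        params[0] = receiverType;
        System.arraycopy(paramTypes, 0, params, 1, paramTypes.length);
        return declaringClass + ": " + returnType + " " + name
                + "(" + String.join(",", params) + ")";
    }

    /** Getter of an extension property: the receiver is its only argument. */
    static String propertyGetter(String declaringClass, String receiverType,
                                 String propertyType, String propertyName) {
        String capitalized = Character.toUpperCase(propertyName.charAt(0))
                + propertyName.substring(1);
        return declaringClass + ": " + propertyType + " get" + capitalized
                + "(" + receiverType + ")";
    }

    /** Source-level parameter i is bytecode parameter i + 1; the receiver is parameter 0. */
    static int bytecodeParameterIndex(int sourceIndex) {
        return sourceIndex + 1;
    }

    public static void main(String[] args) {
        System.out.println(function("com.example.StringsKt", "java.lang.String",
                "boolean", "hasPrefix", "java.lang.String"));
        System.out.println(propertyGetter("com.example.StringsKt", "java.lang.String",
                "int", "wordCount"));
    }
}
```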
7) Infix function: In Kotlin, infix functions are called
using the infix notation, i.e., without the dot notation and the
parentheses. An infix function must be a member function or
extension function and must have a single parameter without a
default value. Similar to a standard function, an infix function
can be a source, sink, or other relevant method. However, a
novice user of taint analysis tools may not know how an infix
function is represented in the Java bytecode and may provide invalid
method signatures.
Proposed solution: Static-analysis developers can provide a
DSL feature that enables users of taint analysis tools to specify
a function as an infix function by providing a function name,
receiver type, parameter type, and return type. Then, the DSL
can build a valid method signature as <given receiver
type>: <given return type> <given function
name>(<given parameter type>).
8) Operator overloading: Operator overloading redefines
the implementation of the built-in operators for specific
types. For example, one can overload the ++ operator by
defining the function inc on a custom class. The compiler
calls the implemented inc function in the Java bytecode.
Table V provides the mapping between the built-in operators
and the function names. An overloaded operator function can
be a sanitizer or propagator method. However, novice users
of taint analysis tools may not know the mapping of the
built-in operators to the function names and may provide invalid
method signatures.
Proposed solution: Static-analysis developers can provide a
feature in the DSL that enables users to specify an overloaded
operator by providing the symbol of the operator, the type of the
receiver, the return type, and the parameter type(s), depending on
the operator. Then, the DSL can build the valid method signature
by mapping the given operator symbol to the function as
described in Table V.
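Such a feature is essentially a lookup from operator symbol to function name. The following minimal sketch covers only part of Table V; the class `OperatorFunctions`, its methods, and the example classes are hypothetical. It treats an operator with no parameter types as unary, which is a simplification that leaves out the invoke and index operators.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical operator lookup; covers only an excerpt of Table V. */
final class OperatorFunctions {
    private static final Map<String, String> UNARY = new HashMap<>();
    private static final Map<String, String> BINARY = new HashMap<>();

    static {
        UNARY.put("+", "unaryPlus");
        UNARY.put("-", "unaryMinus");
        UNARY.put("!", "not");
        UNARY.put("++", "inc");
        UNARY.put("--", "dec");
        BINARY.put("+", "plus");
        BINARY.put("-", "minus");
        BINARY.put("*", "times");
        BINARY.put("/", "div");
        BINARY.put("%", "rem");
        BINARY.put("..", "rangeTo");
        BINARY.put("+=", "plusAssign");
        BINARY.put("==", "equals");
        BINARY.put("in", "contains"); // note: the receiver is the right-hand operand
        BINARY.put("<", "compareTo");
        BINARY.put(">", "compareTo");
    }

    /** Returns the function name for an operator symbol. */
    static String functionName(String symbol, boolean unary) {
        String name = (unary ? UNARY : BINARY).get(symbol);
        if (name == null) {
            throw new IllegalArgumentException("Unsupported operator: " + symbol);
        }
        return name;
    }

    /** Builds "<receiver>: <return type> <function>(<params>)" for an overloaded operator. */
    static String signature(String receiverType, String returnType, String symbol,
                            String... paramTypes) {
        boolean unary = paramTypes.length == 0;
        return receiverType + ": " + returnType + " " + functionName(symbol, unary)
                + "(" + String.join(",", paramTypes) + ")";
    }

    public static void main(String[] args) {
        System.out.println(signature("com.example.Money", "com.example.Money", "+",
                "com.example.Money"));
        System.out.println(signature("com.example.Counter", "com.example.Counter", "++"));
    }
}
```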
B. Engineering challenges without solution (open issues)
1) Companion object: In Kotlin, a c ompanio n object
binds members to a class rather than the instance of
a class. Kotlin ’s companion object is similar to Java’s
static members. However, the Kotlin compiler generates
a wrapper class for each companion object in the Java
bytecod e. The namin g scheme for that wrapper class is
<class name in which the companion object
is defined>$<companion object name>. If the
compan ion object name is not provided in Kotlin source code,
then by default the name is Companion. The c ompiler places
the implementation of that companion object’s members in
the generated wrap per class.
Furthermore, to allow the wrapper class to access the
private members of the actual class and vice versa, the
compiler generates additional functions for each private
member. For a private function, the naming scheme for the
generated function is access$<actual name of the private
function>. Similarly, the naming scheme for the accessors
of a private property is access$<accessor's method
name of a property>$cp. The accessors' method names
are discussed in Sub-Section III-A3.
Because companion objects are implemented this way in the
Java bytecode, users of taint analysis tools might find it
difficult to identify valid method signatures. Additionally,
for a function that takes a companion object as a parameter,
users must give the generated wrapper class as the parameter
type in the method signature, even though that class is not
visible in the source code. Furthermore, the analysis should
be aware of the functions generated for private members,
since each of them might be a source, sink, or propagator.
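The naming schemes described above are mechanical, so a tool could derive the bytecode names from what the user sees in the source. The following Java sketch illustrates this under the schemes stated in this sub-section; the class CompanionNames and its method names are hypothetical and do not exist in SECUCHECK.

```java
/** Hypothetical helper deriving bytecode names for companion objects. */
final class CompanionNames {
    /** Wrapper class name: <outer class>$<companion name>, defaulting to Companion. */
    static String wrapperClass(String outerClass, String companionName) {
        boolean unnamed = companionName == null || companionName.isEmpty();
        return outerClass + "$" + (unnamed ? "Companion" : companionName);
    }

    /** Generated function for a private function: access$<function name>. */
    static String privateFunctionBridge(String functionName) {
        return "access$" + functionName;
    }

    /** Generated function for a private property accessor: access$<accessor name>$cp. */
    static String privatePropertyBridge(String accessorName) {
        return "access$" + accessorName + "$cp";
    }
}
```

For example, an unnamed companion object in a class com.example.Config would be looked up as com.example.Config$Companion.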
2) Destructuring declaration: In Kotlin, an object can be
destructured into multiple variables in a single statement
using a destructuring declaration. To be destructurable,
a class must have componentN functions marked with the
operator keyword. These component functions return the
properties of the class. By convention, the order of the
componentN functions follows the order in which the
properties are defined in the class. However, this is not
mandatory, and developers can make component functions
return any property of a class.
Suppose the function component1 returns the first property
and the users of taint analysis tools specify the getter
method of the first property as a source method. In that
case, the analysis should be able to identify the component1
function as a source method. Therefore, the analysis must
know the mapping between the componentN functions and the
properties of a class to identify a taint-flow in a
destructuring declaration.
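For classes that follow the conventional declaration order, the required mapping can be computed directly. The Java sketch below is a simplification under that assumption: the class ComponentMapping is hypothetical, it ignores custom componentN implementations, and it uses the plain get<Name> getter scheme without special cases such as properties whose names start with is.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps componentN functions to property getters. */
final class ComponentMapping {
    /**
     * Assumes component functions follow the declaration order of the
     * (non-empty) property names, which is a convention and not a guarantee.
     */
    static Map<String, String> componentToGetter(List<String> propertiesInOrder) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < propertiesInOrder.size(); i++) {
            String property = propertiesInOrder.get(i);
            String getter = "get" + Character.toUpperCase(property.charAt(0))
                + property.substring(1);
            mapping.put("component" + (i + 1), getter);
        }
        return mapping;
    }
}
```

If the getter of the first property is a configured source, the analysis could treat the function mapped to it (component1) as a source as well.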
3) Internal modifier: In Kotlin, a member declared with
the internal modifier is only visible inside the module in
which the member is defined. Kotlin defines a module as a
group of Kotlin files that are compiled together. In the Java
bytecode, the Kotlin compiler appends the symbol hyphen
followed by the module name to the accessors of an internal
property and to an internal member function. However, we
did not observe this behavior for classes, interfaces,
top-level functions, or accessors of top-level properties
that are declared as internal. Suppose users of taint analysis
tools specify an internal member function or the accessors of
an internal property as a sink method. In that case, the
analysis component should identify the modified name, with
the appended module name, as a sink method. Otherwise, the
analysis component fails to detect taint-flows in internal
member functions and properties. Note: if the module name
contains a hyphen, the Kotlin compiler replaces it with an
underscore before appending the module name to the internal
member functions and accessors of internal properties in the
Java bytecode.
4) Inline class: Kotlin's inline class wraps an existing
class with better performance than a manually created
wrapper class. In the Java bytecode, the Kotlin compiler
generates several member functions for an inline class:
the constructor, the accessor for the property (the wrapped
class), toString, hashCode, and the equality check. These
functions are generated to support interoperability with
Java. In addition, the compiler generates an alternative
version of these functions that improves performance by
using the wrapped class directly in place of the wrapper
class. The compiler adds the suffix -impl to this improved
version of the functions and to the overridden functions of
an interface. Additionally, the compiler generates the
box-impl and unbox-impl functions for boxing and unboxing
the wrapped class. The Kotlin compiler calls the -impl
version of member functions wherever possible to improve
performance.
Suppose users of taint analysis tools specify a member
function of an inline class as a source. In that case, along
with the actual implementation, the analysis should identify
its -impl version as a source. Otherwise, the existing Java
taint analysis tools fail to detect taint-flows in an inline
class.
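One way an analysis could account for the -impl variants is to expand every user-specified member function of an inline class into both candidate names before matching call sites. The Java sketch below shows only this expansion step; the class InlineClassNames is hypothetical, and it assumes that the -impl suffix is appended directly to the method name as described above.

```java
import java.util.List;

/** Hypothetical helper: expands an inline-class method into its bytecode candidates. */
final class InlineClassNames {
    /** Returns the declared name followed by its -impl variant. */
    static List<String> candidates(String methodName) {
        return List.of(methodName, methodName + "-impl");
    }
}
```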
5) Function returning anonymous object: In Kotlin, object
expressions create objects of an anonymous class. Every
object expression has at least one base class. As in Java,
the Kotlin compiler generates a wrapper class for each
object expression in the Java bytecode. However, in
contrast to Java, specifying the return type of a Kotlin
function is not mandatory, because the compiler can infer
the type. Suppose a function is private and returns an
anonymous object. In that case, the compiler infers the
return type as the generated wrapper class, which is not
visible in the source code. This makes it challenging for
users to identify the valid method signature of a private
function that returns an anonymous object.
6) Local functions: Kotlin supports local functions, which
are functions inside other functions. These local functions
can access the outer function's local variables. For a local
function, the Kotlin compiler generates a static function in
the Java bytecode. The naming scheme and the parameters of
this static function are
<outer function name>$<local function name><-digits
starting from 0 if there are multiple local functions with
the same name>(<outer function's local variables if accessed
by the local function>, <this object if the outer function
is a member function>, <actual parameters as defined for the
local function in the Kotlin source code>).
Additionally, if a local function accesses a mutable
local variable of an outer function, the compiler passes
a reference type so that changes are reflected in the outer
function. For example, if the local function accesses a
mutable variable of type Int, then in the Java bytecode the
Kotlin compiler passes the kotlin.jvm.internal.Ref$IntRef
type to the generated static function as a parameter.
Because local functions are implemented this way in the Java
bytecode, it is challenging for users to identify the valid
method signature of a local function. Furthermore, the
analysis must handle the accessed local variables of the
outer function to track tainted variables.
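The scheme above can be turned into a small signature builder. The Java sketch below follows the naming and parameter order stated in this sub-section; the class LocalFunctionSignature and its methods are hypothetical. The Ref$...Ref names are the holder classes from kotlin.jvm.internal that the compiler uses for captured mutable variables, with Ref$ObjectRef assumed for all non-primitive types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper assembling the bytecode signature of a local function. */
final class LocalFunctionSignature {
    private static final Set<String> PRIMITIVES =
        Set.of("Int", "Long", "Short", "Byte", "Char", "Float", "Double", "Boolean");

    /** Name: <outer>$<local>, plus -<index> when several local functions share a name. */
    static String name(String outerFunction, String localFunction, Integer duplicateIndex) {
        String suffix = duplicateIndex == null ? "" : "-" + duplicateIndex;
        return outerFunction + "$" + localFunction + suffix;
    }

    /** Holder type passed for a captured mutable variable of the given Kotlin type. */
    static String capturedMutableType(String kotlinType) {
        String holder = PRIMITIVES.contains(kotlinType) ? kotlinType + "Ref" : "ObjectRef";
        return "kotlin.jvm.internal.Ref$" + holder;
    }

    /** Parameter order: captured variables, then this (if any), then declared parameters. */
    static List<String> parameters(List<String> capturedTypes, String thisType,
                                   List<String> declaredTypes) {
        List<String> result = new ArrayList<>(capturedTypes);
        if (thisType != null) {
            result.add(thisType);
        }
        result.addAll(declaredTypes);
        return result;
    }
}
```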
7) Higher-order functions: Kotlin provides function types,
which enable higher-order functions. These function
types are mapped to kotlin.jvm.functions.Function* types
in the Java bytecode as described in Table III.
Furthermore, there are five ways to create an instance of
a function type in Kotlin: lambda expression, anonymous
function, function literal with a receiver, callable
reference, and instances of a custom class that implements a
function type. The Kotlin compiler generates a wrapper class
for each instance of a function type in the Kotlin source
code. The naming scheme for this wrapper class is
<class name in which the lambda is declared>$<function name
in which the lambda is declared>$<variable name in which the
lambda expression is stored, if any; otherwise this part is
omitted>$<digits starting from 1>. This wrapper class
overrides the interface function invoke, in which the Kotlin
compiler places the implementation of the lambda expression.
Similar to local functions accessing the outer function's
local variables, as discussed in Sub-Section III-B6, lambda
expressions can also access the outer function's local
variables. All the accessed variables are passed to the
constructor of the wrapper class. The constructor then stores
these values in fields of the wrapper class, which can be
accessed in the invoke method. Furthermore, if the outer
function's local variable is mutable, the compiler passes a
reference type, e.g., kotlin.jvm.internal.Ref$IntRef. For an
anonymous function, the compiler generates Java bytecode
similar to that of a lambda expression. Similarly, for a
class implementing a function type, the compiler implements
the kotlin.jvm.functions.Function* interface in the Java
bytecode and implements the interface method invoke.
For a function literal with a receiver, the compiler generates
Java bytecode similar to that of a lambda expression, except
that the receiver object is passed as the first argument to
the invoke method. For a callable reference, the compiler
also generates Java bytecode similar to that of a lambda
expression. However, the receiver of a callable reference is
passed to the constructor of the generated wrapper class,
which stores the receiver in the superclass field called
receiver. Later, the function invoke accesses the field
receiver to call the respective member function.
Java uses the invokedynamic instruction for lambda
expressions. Therefore, the existing Java taint analysis
tools detect taint-flows in Java lambda expressions by
handling the invokedynamic instruction in the Java bytecode.
However, by default, the Kotlin compiler does not use the
invokedynamic instruction for an instance of a function
type, so the existing Java taint analysis tools fail to
detect taint-flows in higher-order functions. Therefore,
the analysis must handle the generated wrapper class for an
instance of a function type to track the tainted information.
Furthermore, the analysis should handle the receiver field
to track the tainted receiver object of a callable reference.
Finally, as with local functions (III-B6), the analysis
should handle the accessed local variables of the outer
function to track tainted variables.
Note: for a functional interface, i.e., a Single Abstract
Method (SAM) interface, the Kotlin compiler by default
generates Java bytecode similar to that of Java's lambda
expression, i.e., the invokedynamic instruction.
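To match the generated wrapper classes, an analysis first needs the interface that a given function type maps to. The Java sketch below derives that interface name from the number of parameters; the class FunctionTypes is hypothetical. It relies on the kotlin.jvm.functions.Function0 to Function22 interfaces and the kotlin.jvm.functions.FunctionN fallback of the Kotlin standard library, and it counts a receiver as the first invoke argument, as described above.

```java
/** Hypothetical helper: resolves the JVM interface behind a Kotlin function type. */
final class FunctionTypes {
    /**
     * @param parameterCount number of declared parameters of the function type
     * @param hasReceiver    true for a function type with a receiver, whose
     *                       receiver is passed as the first invoke argument
     */
    static String jvmInterface(int parameterCount, boolean hasReceiver) {
        if (parameterCount < 0) {
            throw new IllegalArgumentException("parameterCount must not be negative");
        }
        int arity = parameterCount + (hasReceiver ? 1 : 0);
        // Arities above 22 are represented by the vararg-based FunctionN interface.
        return "kotlin.jvm.functions.Function" + (arity <= 22 ? String.valueOf(arity) : "N");
    }
}
```

Under these assumptions, a propagation rule for a parameter of type (Int, Int) -> String would target the invoke method of kotlin.jvm.functions.Function2.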
8) Inline function: As discussed in Sub-Section III-B7,
the Kotlin compiler generates a wrapper class for each
instance of a function type and captures the outer function's
local variables. This leads to extra memory allocations, and
the extra virtual method call introduces runtime overhead.
In some scenarios, this overhead can be eliminated
by inlining the lambda expression rather than creating an
instance of a function type. For this purpose, Kotlin provides
inline functions. For example, the println function in
Kotlin is declared as inline and calls Java's function
System.out.println. Therefore, in the Java bytecode,
we find the System.out.println function call in place
of Kotlin's println call site. Similarly, custom higher-order
functions can also be declared as inline in Kotlin. Suppose
users of taint analysis tools specify an inline function as a
sink method. In that case, taint analysis tools fail to detect
a taint-flow that reaches this sink method, since there is no
actual method call of the inline function in the Java
bytecode. Therefore, taint analysis tools must know the
propagation rules for all the method calls in the body of
that inline function. Otherwise, they fail to detect
taint-flows in inline functions.
9) Sealed class: A sealed class restricts users from
inheriting a class or interface, and all the derived classes
are known at compile time. To achieve this, the Kotlin
compiler makes the constructor private and overloads the
constructor with an additional parameter at the end, of type
kotlin.jvm.internal.DefaultConstructorMarker.
This allows the compiler to call the overloaded constructor
for the known derived classes and prevents developers from
creating a new derived class. Suppose users of taint analysis
tools specify the constructor of a sealed class as a
propagator method. In that case, the analysis must identify
the overloaded constructor as a propagator. Otherwise, taint
analysis tools fail to detect taint-flows in a sealed class's
constructor.
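Deriving the overloaded constructor from the one written in the source is a matter of appending the marker parameter. The Java sketch below illustrates this step; the class SealedConstructor is hypothetical, and the Soot-style <init> signature layout is an assumption about how a tool might represent constructors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives the overloaded constructor of a sealed class. */
final class SealedConstructor {
    static final String MARKER = "kotlin.jvm.internal.DefaultConstructorMarker";

    /** Appends the marker type after the parameters declared in the Kotlin source. */
    static List<String> overloadedParameters(List<String> declaredParameters) {
        List<String> parameters = new ArrayList<>(declaredParameters);
        parameters.add(MARKER);
        return parameters;
    }

    /** Builds a constructor signature of the assumed form <Class: void <init>(P1,P2)>. */
    static String signature(String className, List<String> parameters) {
        return "<" + className + ": void <init>(" + String.join(",", parameters) + ")>";
    }
}
```

An analysis could then register both the declared and the overloaded constructor signature whenever a user marks a sealed class's constructor as a propagator.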
10) Package: In Java, the package name must match the
path of the Java file. In Kotlin, however, the package name
can differ from the path of the Kotlin file. Once the
analysis component completes and returns its results,
some existing Java taint analysis tools use the package name
to build the path of the Java file in order to display the
errors in an IDE. If the Kotlin file's path differs from its
package, such tools fail to display the found taint-flows
in the IDE.
C. Research Questions
In the previous two sub-sections, Sub-Section III-A and
Sub-Section III-B, we discussed the engineering challenges
that must be handled in the existing Java taint analysis
tools to support taint analysis on Kotlin code. In this
sub-section, we answer two research questions (RQ1 and RQ2),
which evaluate our exploratory study.
RQ1: Which Kotlin features can be analyzed by the existing
Java taint analysis tools without any engineering challenge?
To answer this research question, we list the Kotlin
features for which the Kotlin compiler generates Java
bytecode similar to that of the Java compiler. The existing
Java taint analysis tools can analyze Kotlin programs
containing these features without any engineering challenges.
For all the features listed under this research question,
Soot generates valid Jimple code. Furthermore, the analysis
component can perform taint analysis on these features and
requires no additional constructs in the DSL component to
handle them.
Kotlin's features | Similarity level | Similar to
Explicit conversion | * | typecasting and Java's methods like intValue, byteValue, etc.
Arithmetic operators, bitwise operators, comparison operators, assignment operators, unary operators, logical operators, equality check, literal constants, varargs | |
is operator, unsafe cast operator, safe cast operator | * | instance check (instanceof), typecasting
when construct | * | lookupswitch, tableswitch, comparison, goto and label
for construct | * | Java's for, iterators
while, do-while, if construct | |
return, break, continue, labeled break and labeled continue, and qualified this in nested / inner class | |
labeled return (non-local return) | * | goto and label statements
try-catch, finally, throw | |
import, named arguments | |
Open class | | non-final class
Abstract class, inheritance, overriding methods, calling super class implementations, multiple inheritance | |
Functional interface (SAM) | * | Java's lambda expression (invokedynamic)
Generics | |
Nested class, inner class, enum class | |
Object expression | | instance of anonymous class
Object declaration | * | singleton pattern
Delegation (in inheritance) | * | delegation pattern
varargs | |
Tail-recursive function | | normal function and loops
String template | * | Java's StringBuilder append, Kotlin's stringPlus methods
Smart cast | * | typecasting after instance check
lateinit | * | uninitialized field, null check, null pointer exception (NPE)
Null safety | * | null check, goto and label statements, NPE
Default implementation in interface | |
Legend (only the * symbol survived extraction; the other three level symbols are not recoverable):
(first level): similar to the respective feature in Java.
(second level): similar to Java, but the naming scheme of the generated wrapper class is different compared to Java.
(third level): completely different to Java, but there is no challenge concerning taint analysis in the DSL, analysis, or IR generator components.
*: similar to Java's features, some of which are not visible in the source code.
Table VI: List of Kotlin's features that the existing Java taint analysis tools can analyze without any challenge.
Table VI summarizes the features of Kotlin that can be
analyzed by the existing Java taint analysis tools. The first
similarity level in the legend marks Kotlin's features for
which the Kotlin compiler generates Java bytecode similar to
the respective features in Java. The level * marks Kotlin's
features for which the Kotlin compiler generates Java bytecode
similar to some features in Java (third column in the table)
that are not visible in the Kotlin source code. For example,
the explicit conversion from the Number type to Int in Kotlin
is performed using the method toInt. However, in some
scenarios, the Kotlin compiler uses the intValue method in the
Java bytecode. Furthermore, the Kotlin compiler uses the
StringBuilder append method for string templates, and in some
scenarios, it uses Kotlin's stringPlus method. Therefore, we
recommend using Java's general propagator methods while
analyzing Kotlin programs. The second similarity level marks
Kotlin's features for which the Kotlin compiler generates Java
bytecode similar to the respective features in Java, but with
a naming scheme for the generated wrapper class that differs
from that of the Java compiler. The third similarity level
covers the default implementation in an interface, for which
the Kotlin compiler generates Java bytecode differently from
Java. The Java compiler keeps the default implementation in
the interface. The Kotlin compiler, however, keeps only the
abstract methods in the interface and places the default
implementation in a generated wrapper class. For example,
suppose a developer uses the default implementation method in
a class that implements that interface. In that case, the
Kotlin compiler overrides that method automatically and calls
the default implementation present in the wrapper class.
Whenever that default method is called, the Kotlin compiler
calls the virtual method on the object of the derived class or
interface, similar to the Java compiler. Therefore, there is
no engineering challenge with this feature in the DSL,
analysis, and IR generator components. However, there may be
some challenges with this feature in other components, such as
the call graph generator component.
Kotlin's features | Engineering challenges | Note
Data types | Data type mapping (III-A1) |
Exception types and type alias | Type alias (III-A2) | Kotlin's exception types are defined as type aliases to Java's exception types.
Top-level functions and top-level properties | Top-level members (III-A4) |
Package | Package (III-B10) | This challenge can also be solved in the component that integrates the analysis with the IDE.
Constructor with default arguments and function with default arguments | Default argument (III-A5) |
Internal visibility modifier | Internal modifier (III-B3) |
Sealed class | Sealed class (III-B9) |
Inline class | Inline class (III-B4) |
Function returning anonymous object | Function returning anonymous object (III-B5) |
Companion object | Companion object (III-B1) |
Infix function | Infix function (III-A7) |
Local functions | Local functions (III-B6) |
Qualified this object | Qualified this object in extensions as members (III-A6), qualified this object in function with receiver type (III-B7) | Qualified this object in a nested / inner class is the same as Java's qualified this in a nested / inner class. Therefore, there is no challenge in this scenario.
Destructuring declaration | Destructuring declaration (III-B2) |
Properties accessors | Properties accessors (III-A3) |
Extension function, extension property, companion object extension, and extensions as members | Extensions (III-A6) |
Data class | Destructuring declaration (III-B2), default argument (III-A5) | For a data class, the Kotlin compiler automatically generates the componentN functions for the destructuring declaration. Additionally, it generates the copy function with default values. Therefore, this feature also has the challenge of default arguments for the copy function.
Lambda expression, anonymous function, function literal with a receiver, callable reference, class implementing function types | Higher-order function (III-B7), function type (III-A1) |
Inline function | Inline function (III-B8) |
Operator overloading | Operator overloading (III-A8) |
Ranges and Progressions | Top-level members (III-A4), infix function (III-A7), extensions (III-A6) | In Ranges and Progressions, the methods until, downTo, and step are defined as top-level, extension, infix functions.
Collections and Iterators | Data type mapping (III-A1), top-level members (III-A4), extensions (III-A6), and destructuring declarations (III-B2) | Kotlin uses other features, such as extensions and destructuring declarations, to define the members of collections and iterators. In addition, some of the collection types are mapped to Java's collection types, and some collection types are defined as type aliases to Java's collection types.
Legend for the column "can be solved in" (the column's symbols are not recoverable from the extracted text):
(first symbol): Engineering challenge(s) can be solved in the DSL component.
(second symbol): Engineering challenge(s) can be solved in the analysis component.
(third symbol): Engineering challenge(s) can be solved either in the DSL or analysis component (depends on the analysis designer's decision).
Table VII: List of Kotlin's features that require an extension in the DSL or analysis components of the existing Java taint
analysis tools to support taint analysis for Kotlin programs.
RQ2: For which Kotlin features do the existing Java taint
analysis tools need an extension to support taint analysis for
Kotlin programs?
To answer this research question, we list the Kotlin
features for which the Kotlin compiler generates Java
bytecode differently from the Java compiler. Such differences
create engineering challenges for the existing Java taint
analysis tools when analyzing Kotlin programs, as discussed in
Section III. Static-analysis developers must handle these
challenges in the DSL, analysis, or IR generator components.
Table VII summarizes the features of Kotlin that require an
extension in the existing Java taint analysis tools to support
taint analysis for Kotlin. The engineering challenges
associated with Kotlin's features are given in the second
column. If a Kotlin feature can be handled in the DSL
component, we categorize that feature as solvable in the DSL
component. Furthermore, if a challenge can be solved in the
analysis component without any input from the users of taint
analysis tools, we categorize that feature as solvable in the
analysis component. For instance, we can solve the default
argument challenge (Sub-Section III-A5) in the analysis
component without implementing additional constructs in the
DSL component.
We did not propose a solution for the challenges discussed
in Sub-Section III-B. However, these challenges can be handled
in the DSL or analysis component, depending on the solution
and the analysis designer's decision. Kotlin's features
associated with these challenges are categorized as solvable
in either the DSL or the analysis component. Additionally, for
all the features of Kotlin that we manually examined in this
exploratory research, Soot generates valid Jimple code.
IV. SECUCHECK-KOTLIN
As a proof of concept, we extended an existing Java taint
analysis tool called SECUCHECK [12] by implementing the
solutions for six of the engineering challenges discussed in
Sub-Section III-A. For the taint analysis, SECUCHECK uses the
Jimple IR [16] generated by SOOT [17]. Furthermore, SECUCHECK
provides a DSL called fluentTQL [7] for specifying taint
flows. First, we discuss the implementation of
SECUCHECK-KOTLIN in Sub-Section IV-A. Then, we evaluate the
applicability of SECUCHECK-KOTLIN in Sub-Section IV-B.
A. Implementation
Table VIII summarizes the challenges that we handled
in SECUCHECK-KOTLIN. We implemented the solutions for
these challenges without modifying the existing architecture
of SECUCHECK.

Challenges | Solved in | Newly added constructs in fluentTQL
Data type mapping (III-A1) | fluentTQL (*) | -
Type alias (III-A2) | fluentTQL DSL | TypeAliases class for experts in Kotlin and domain experts in custom libraries. Objects of TypeAliases are accepted in MethodSignatureBuilder, MethodSelector, and MethodConfigurator.
Property (III-A3) | fluentTQL DSL | property, getter, and setter methods
Top-level members (III-A4) | fluentTQL DSL | topLevelMember method
Extensions (III-A6) | fluentTQL DSL | extensionFunction and extensionProperty methods. For handling the qualified this object challenge, provides the constants QualifiedThis.DISPATCH_RECEIVER and QualifiedThis.EXTENSION_RECEIVER.
Default argument (III-A5) | analysis component | -
Legend
fluentTQL DSL: solved in the fluentTQL DSL.
*: solved in fluentTQL without implementing a new construct in the fluentTQL DSL.
analysis component: solved in the analysis component.
Table VIII: List of found engineering challenges handled in
SECUCHECK-KOTLIN.

For handling the data type mapping discussed in Sub-Section
III-A1, we implemented a data type transformer module in
fluentTQL, without adding a new DSL construct. This
transformer checks whether the given type in a method
signature is a valid Kotlin type or function type, as
described in Tables III and IV, and transforms the given type
into the respective Java data type. This allows users to
provide Kotlin types such as kotlin.Int, kotlin.Int?, etc., in
a method signature. In addition, users can also provide short
type names such as Int, Int?, etc. Furthermore, for a function
type, users can provide a regular expression or the function
type itself, such as "() -> String". The transformer looks for
the function type expression and transforms it into a valid
data type, as summarized in Table III. The limitation of the
current implementation is that users cannot provide complex
function types such as (Int) -> (Int, Int) -> String. For
such function types, users must use regular expressions.
Additionally, suppose users want to track a parameter of a
function type in fluentTQL. In that case, users must
explicitly specify the propagation rules for the invoke method
of the Function* class as discussed in Sub-Section III-B7.
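The core of such a transformer can be sketched as a lookup that accepts both qualified and short Kotlin names and distinguishes nullable from non-nullable types. The Java code below is an illustration only, not the fluentTQL implementation: the class KotlinTypeTransformer is hypothetical, it covers only the primitive-like types plus String and Any, and it leaves every other type name unchanged.

```java
import java.util.Map;

/** Hypothetical sketch of a Kotlin-to-Java type transformer. */
final class KotlinTypeTransformer {
    // Kotlin type -> {JVM primitive, boxed type used for the nullable variant}.
    private static final Map<String, String[]> PRIMITIVES = Map.of(
        "Int", new String[] {"int", "java.lang.Integer"},
        "Long", new String[] {"long", "java.lang.Long"},
        "Short", new String[] {"short", "java.lang.Short"},
        "Byte", new String[] {"byte", "java.lang.Byte"},
        "Char", new String[] {"char", "java.lang.Character"},
        "Float", new String[] {"float", "java.lang.Float"},
        "Double", new String[] {"double", "java.lang.Double"},
        "Boolean", new String[] {"boolean", "java.lang.Boolean"});

    private static final Map<String, String> REFERENCES = Map.of(
        "String", "java.lang.String",
        "Any", "java.lang.Object");

    /** Accepts names such as kotlin.Int, Int, or Int? and returns the Java type. */
    static String toJavaType(String kotlinType) {
        String type = kotlinType.trim();
        boolean nullable = type.endsWith("?");
        if (nullable) {
            type = type.substring(0, type.length() - 1);
        }
        String shortName = type.startsWith("kotlin.")
            ? type.substring("kotlin.".length()) : type;
        String[] primitive = PRIMITIVES.get(shortName);
        if (primitive != null) {
            return nullable ? primitive[1] : primitive[0];
        }
        String reference = REFERENCES.get(shortName);
        return reference != null ? reference : type; // unknown types pass through
    }
}
```

The nullable case matters because a non-nullable Int compiles to the primitive int, whereas Int? must be boxed as java.lang.Integer to represent null.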
For handling type alias (III-A2), property (III-A3),
top-level members (III-A4), and extensions (III-A6), we
implemented new constructs in fluentTQL that help users
specify the respective features. Listing 1 demonstrates how to
specify type aliases and an extension property in fluentTQL of
SECUCHECK-KOTLIN. For the type alias challenge, we implemented
the TypeAliases class in fluentTQL, which experts in the
Kotlin programming language or domain experts in custom
libraries can use to specify type aliases. For instance,
experts in the Kotlin programming language can specify the
type aliases defined in the Kotlin standard library as shown
in Lines 2-6. Then, the users of fluentTQL can use the
specified type aliases in MethodConfigurator (Line 14),
MethodSelector, or MethodSignatureBuilder, which replaces the
given type alias with the original type as specified by the
experts.
Note: SECUCHECK provides MethodSignatureBuilder
for novice users to build a method signature with a fluent
interface. Similarly, it provides MethodConfigurator and
MethodSelector for configuring methods with taint
information using a fluent interface.
For handling properties, top-level members, and extensions,
we provide methods in the fluent interface
of MethodSignatureBuilder. For example, for properties,
the methods are property, getter (Line 12), and
setter. If a property is an extension, then the method
is extensionProperty (Line 11). This function takes
three arguments: receiver type, property name, and property
type. From these inputs, fluentTQL builds the valid
method signature. Similarly, we provide the extensionFunction
method for specifying extension functions and the
topLevelMember method for specifying top-level members. For
handling the qualified this object in extensions, we provide
two constants, QualifiedThis.DISPATCH_RECEIVER
and QualifiedThis.EXTENSION_RECEIVER, which can
be used in the method thisObject (Line 15) of
MethodConfigurator to track the respective this
object. The limitation of this implementation is that
these methods are only available in the method chain
of MethodSignatureBuilder and not in
MethodConfigurator and MethodSelector. Similarly,
the qualified this constants are only available
for MethodConfigurator and not for
MethodSelector. Finally, we handled the challenge of
default arguments in the analysis component, as proposed
in Sub-Section III-A5.
B. Evaluation
To evaluate the applicability of SECUCHECK-KOTLIN, we
used a vulnerable version of the Spring PetClinic application
written in Kotlin (footnote 4). This project contains 27
Kotlin files with six known Hibernate injections, as
summarized in Table IX.
Project name | #Kotlin-files | #Taint-flows | #Queries | #Found-flows | Runtime (s) | Displays error messages? | Displays line numbers? | Displays file locations?
spring-petclinic-kotlin (vulnerable) | 27 | 6 | 5 | 6 | 11.05 | yes | yes | yes
#Kotlin-files: Number of Kotlin files in the project
#Taint-flows: Number of known taint-flows in the project
#Queries: Number of specified taint-flow queries in fluentTQL of SECUCHECK-KOTLIN
#Found-flows: Number of taint-flows found by SECUCHECK-KOTLIN
Runtime (s): Runtime in seconds (average of 10 runs)
Table IX: Overview of SECUCHECK-KOTLIN analysis results.
SECUCHECK-KOTLIN found all six taint-flows with a
runtime of 11.05 seconds (average of 10 runs). SECUCHECK-
KOTLIN successfully displayed the valid line numbers of
the source and sink methods. It also displayed the
customized error messages as well as the descriptive messages
from the fluentTQL2ENGLISH translator [12]. Additionally,
SECUCHECK-KOTLIN displayed the file locations of the
source and sink methods. However, SECUCHECK, when run from the
command prompt, displays the file locations of the classes
instead of the Java files in the Static Analysis Results
Interchange Format (SARIF) [18] output. Therefore,
SECUCHECK-KOTLIN has no problem displaying valid file
locations. However, suppose developers want to display the
file locations of the Kotlin files instead of the class files
in the SARIF output. In that case, the file locations of the
Kotlin files have to be identified based on the challenge we
discussed in Section III.
1 // Specified by Kotlin programming language experts.
2 static TypeAliases typeAliases = new TypeAliases(){{
3 add("ArrayList", "java.util.ArrayList");
4 add("HashSet", "java.util.HashSet");
5 ...
6 }};
7
8 // Specified by the users of fluentTQL
9 public MethodSignature signature = new MethodSignatureBuilder()
10 .atClass("de.fraunhofer.iem.EmployeePrinter")
11 .extensionProperty("de.fraunhofer.iem.Employee", "nameLength", "Int")
12 .getter();
13
14 public Method source1 = new MethodConfigurator(signature, typeAliases)
15 .in().thisObject(QualifiedThis.DISPATCH_RECEIVER)
16 .out().returnValue()
17 .configure();
Listing 1: Example of type alias and extension property in fluentTQL of SECUCHECK-KOTLIN.
4
https://shorturl.at/hvyRS
V. CONCLUSION AND FUTURE WORK
In this paper, we presented our exploratory study on Kotlin
taint analysis, which shows that most of the Kotlin constructs
can be analyzed by an existing Java taint analysis tool.
However, we found 18 engineering challenges that must be
handled differently than in Java taint analysis. For eight
of these challenges, we proposed solutions. Finally, as a
proof of concept, we extended an existing Java taint analysis,
SECUCHECK, by implementing six of these solutions, which
led to SECUCHECK-KOTLIN. We evaluated the applicability
of SECUCHECK-KOTLIN, which found all six expected
taint-flows. In the future, we plan to work on the open issues
from Sub-Section III-B and extend the implementation of
SECUCHECK-KOTLIN, after which a thorough evaluation with
real-world applications can be performed.
REFERENCES
[1] “CWE-89: Improper Neutralization of Special Elements used in
an SQL Command,” https://cwe.mitre.org/data/definitions/89.html,
accessed: 2021-June-22.
[2] “KTLINT: An anti-bikeshedding Kotlin linter with built-in formatter,”
https://github.com/pinterest/ktlint, accessed: 2021-December-14.
[3] “DETEKT: static analysis for Kotlin,” https://github.com/detekt/detekt,
accessed: 2021-December-14.
[4] “DIKTAT: Strict coding standard for Kotlin and a custom set of
rules for detecting code smells, code style issues and bugs,”
https://github.com/diktat-static-analysis/diKTat, accessed:
2021-December-14.
[5] “SONARQUBE: automatic code review tool to detect bugs,
vulnerabilities, and code smells,” https://docs.sonarqube.org/latest/,
accessed: 2021-December-14.
[6] “SONARQUBE: rules for Kotlin,” https://rules.sonarsource.com/kotlin,
accessed: 2021-December-14.
[7] G. Piskachev, J. Späth, I. Budde, and E. Bodden, “Fluently
specifying taint-flow queries with fluentTQL,” Empir. Softw.
Eng., vol. 27, no. 5, p. 104, 2022. [Online]. Available:
https://doi.org/10.1007/s10664-022-10165-y
[8] “CWE-77: Improper Neutralization of Special Elements used in a
Command ('Command Injection'),”
https://cwe.mitre.org/data/definitions/77.html, accessed:
2021-October-25.
[9] “CWE-476: NULL Pointer Dereference,”
https://cwe.mitre.org/data/definitions/476.html, accessed: 2021-June-18.
[10] D. Endler, “The evolution of cross site scripting attacks,”
iDEFENSE Labs, Tech. Rep., 2002.
[11] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein,
Y. Le Traon, D. Octeau, and P. McDaniel, “FlowDroid: Precise context,
flow, field, object-sensitive and lifecycle-aware taint analysis for
Android apps,” ACM SIGPLAN Notices, vol. 49, no. 6, pp. 259–269, 2014.
[12] G. Piskachev, R. Krishnamurthy, and E. Bodden, “SecuCheck:
Engineering configurable taint analysis for software developers,” in 2021
IEEE 21st International Working Conference on Source Code Analysis and
Manipulation (SCAM), 2021, pp. 24–29.
[13] P. Lam, E. Bodden, O. Lhoták, and L. Hendren, “The Soot framework
for Java program analysis: a retrospective,” in Cetus Users and Compiler
Infrastructure Workshop (CETUS 2011), vol. 15, no. 35, 2011.
[14] “Kotlin's official documentation,”
https://kotlinlang.org/docs/home.html, accessed: 2021-November-21.
[15] “The State of Developer Ecosystem 2020,”
https://www.jetbrains.com/lp/devecosystem-2020/kotlin/, accessed:
2021-October-27.
[16] R. Vallée-Rai and L. J. Hendren, “Jimple: Simplifying Java bytecode
for analyses and transformations,” 1998.
[17] R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and
V. Sundaresan, “Soot: A Java bytecode optimization framework,”
in CASCON First Decade High Impact Papers, ser. CASCON '10.
Riverton, NJ, USA: IBM Corp., 2010, pp. 214–224. [Online]. Available:
https://doi.org/10.1145/1925805.1925818
[18] S. Kummita and G. Piskachev, “Integration of the static analysis
results interchange format in CogniCrypt,” arXiv preprint
arXiv:1907.02558, 2019.