2 ZHANG, NIU, ZHANG: IMAGE COMPOSITION ASSESSMENT WITH SAMP
a new image Composition Assessment DataBase (CADB) on the basis of Aesthetics and
Attributes DataBase (AADB) dataset [22]. Our CADB dataset contains 9,497 images with
each image rated by 5 individual raters who specialize in fine art for the overall composition
quality. The details of our CADB dataset will be introduced in Section 3.
Figure 1: Evaluating composition quality
from the perspectives of different composition
patterns. The first (resp., second) row shows a
good example and a bad example considering
symmetrical (resp., radial) balance.
To the best of our knowledge, there
is no method specifically designed for im-
age composition assessment. However,
some previous aesthetic assessment meth-
ods also take composition into considera-
tion. We divide the existing composition-
relevant approaches into two groups. 1)
The composition-preserving methods [4,
32] can maintain image composition during
both training and testing. However, these
approaches fail to extract composition-relevant
features for the composition assessment
task. 2) The composition-aware approaches
[28, 31, 52] extract composition-relevant
features by modeling the mutual dependencies
between all pairs of objects or regions
in the image. However, redundant and noisy information is likely to be introduced dur-
ing this procedure, which may adversely affect the performance of composition assessment.
Moreover, there are some previous methods [1, 10, 29, 49, 54, 55] designed to model the
well-established photographic rules (e.g., rule of thirds and golden ratio [20]), which hu-
mans use in evaluating image composition quality. However, these rule-based methods have
two major limitations: 1) Hand-crafted feature extraction is tedious and laborious compared
with deep learning features [27]. 2) Each rule is valid only for specific scenes, and these methods
do not consider which rules are applicable to a given scene [47].
Interestingly, the composition pattern, an important aspect of composition assessment, is
not explicitly considered by the above methods. As shown in Figure 1, each composition
pattern divides the holistic image into multiple non-overlapping partitions, which can model
human perception of composition quality. In particular, by analyzing the visual layout (e.g.,
positions and sizes of visual elements) according to a composition pattern, i.e., comparing the
visual elements in various partitions, we can quantify the aesthetics of visual layout in terms
of visual balance (e.g., symmetrical balance and radial balance) [18, 23, 30], composition
rules (e.g., rule of thirds, diagonals and triangles) [24, 50], and so on. Different composition
patterns offer different perspectives to evaluate composition quality. For example, the com-
position pattern in the top (resp., bottom) row in Figure 1 can help judge the composition
quality in terms of symmetrical (resp., radial) balance.
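The idea of comparing visual elements across the partitions of a pattern can be illustrated with a minimal sketch. It is not the proposed SAMP module: the two patterns (`left_right_pattern`, `quadrant_pattern`), the average pooling, and the `imbalance` score are hypothetical choices made only for illustration, and the input is assumed to be a small grid of per-cell visual mass (e.g., saliency values).

```python
from typing import Dict, List

Grid = List[List[float]]   # H x W map of per-cell "visual mass" (assumed input)
Labels = List[List[int]]   # H x W map assigning each cell to exactly one partition


def left_right_pattern(h: int, w: int) -> Labels:
    """Hypothetical pattern: left half is partition 0, right half is partition 1."""
    return [[0 if x < w / 2 else 1 for x in range(w)] for _ in range(h)]


def quadrant_pattern(h: int, w: int) -> Labels:
    """Hypothetical pattern: four non-overlapping quadrants labelled 0..3."""
    return [[(2 if y >= h / 2 else 0) + (1 if x >= w / 2 else 0)
             for x in range(w)] for y in range(h)]


def pool_partitions(grid: Grid, labels: Labels) -> Dict[int, float]:
    """Average the grid values that fall inside each partition of a pattern."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for row, label_row in zip(grid, labels):
        for value, label in zip(row, label_row):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def imbalance(pooled: Dict[int, float]) -> float:
    """Toy score (an assumption): gap between the heaviest and lightest partition."""
    values = list(pooled.values())
    return max(values) - min(values)


# A 4x4 toy map whose visual mass lies entirely in the left half.
toy = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
lr = pool_partitions(toy, left_right_pattern(4, 4))
quad = pool_partitions(toy, quadrant_pattern(4, 4))
```

Under the left/right pattern the toy map is maximally unbalanced, while a map that mirrors its mass across the vertical axis would score zero; each pattern therefore exposes a different aspect of the same layout.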
To dissect visual layout based on different composition patterns, we propose a novel
multi-pattern pooling module at the end of the backbone to integrate the information extracted
from multiple patterns, in which each pattern provides a perspective to evaluate the compo-
sition quality. Considering that the sizes and locations of salient objects are representative of
visual layout and fundamental to image composition [30], we further integrate visual saliency
[17] into our multi-pattern pooling module to encode the spatial and geometric information
of salient objects, leading to our Saliency-Augmented Multi-pattern Pooling (SAMP) module.
Additionally, since some composition patterns may play more important roles, we design
a weighted multi-pattern aggregation to fuse multi-pattern features, which can adaptively