namely, taking the generated template as a reference, MGCM is able to distinguish the positive bottom b_j from the negative one for the given top t_i, and provides the correct m_ijk. In addition, as can be seen from Fig. 9(b), -noTemG fails to rank the positive bottoms in first place, which is corrected by the complete MGCM that takes the bottom template generation into account. Notably, the generated templates provide reasonable guidance for ranking the positive bottoms. Ultimately, these observations validate that both the proposed pixel-wise consistency regularization and the auxiliary multi-modal bottom template generation in our MGCM help to improve the model performance in different tasks.
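The two-perspective scoring discussed above can be sketched as follows. This is a minimal illustrative assumption, not MGCM's actual formulation: both compatibility terms are plain inner products over hypothetical embeddings, and the mixing weight `alpha` is invented for the sketch.

```python
import numpy as np

def compatibility_score(top, bottom, template, alpha=0.5):
    """Blend item-item and item-template compatibility.

    Hypothetical scoring: both terms are plain inner products and
    `alpha` is an assumed mixing weight; MGCM's actual scoring
    function is defined in the paper.
    """
    item_item = float(np.dot(top, bottom))           # top vs. candidate bottom
    item_template = float(np.dot(template, bottom))  # generated template vs. candidate
    return alpha * item_item + (1.0 - alpha) * item_template

# Toy embeddings: the positive bottom resembles the generated template.
top      = np.array([1.0, 0.0, 1.0, 0.0])
template = np.array([0.0, 1.0, 0.0, 1.0])
positive = np.array([0.0, 1.0, 0.0, 1.0])
negative = np.array([1.0, 0.0, 0.0, 0.0])

s_pos = compatibility_score(top, positive, template)  # 0.5*0 + 0.5*2 = 1.0
s_neg = compatibility_score(top, negative, template)  # 0.5*1 + 0.5*0 = 0.5
```

Here the item-template term is what ranks the positive bottom first; dropping it (alpha = 1) would let the negative bottom win, mirroring the -noTemG behavior described above.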
5. Conclusion
In this paper, we propose a multi-modal generative compatibil-
ity modeling (MGCM) network, which is able to boost the perfor-
mance of compatibility modeling between fashion items (e.g., a
top and a bottom) with the auxiliary template generation. Specifically, we introduce the multi-modal enhanced compatible template generation network to sketch a compatible template (e.g., a bottom template) for the given fashion item (e.g., a top) with the pixel-wise consistency and template compatibility regularization.
Our proposed MGCM is able to model the compatibility preference
from both the item-item and item-template perspectives. Exten-
sive experiments on two public real-world datasets show that (1)
the generated templates are indeed helpful in guiding the compat-
ibility modeling between complementary fashion items; and (2)
the pixel-wise consistency regularization does promote the com-
patibility modeling performance. Currently, our model only mea-
sures the compatibility between two fashion items. In the future,
we plan to devise a more advanced scheme to model the compatibility among multiple fashion items.
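As a concrete illustration of the pixel-wise consistency regularization highlighted above, the following is a minimal sketch that penalizes the mean absolute (L1) pixel difference between a generated template and the ground-truth item image; the paper's exact norm and weighting may differ, and the toy arrays are invented for the example.

```python
import numpy as np

def pixel_consistency_loss(generated, ground_truth):
    """Pixel-wise consistency: mean absolute difference between the
    generated template and the ground-truth image.

    A minimal L1 sketch; MGCM's exact regularization term (choice of
    norm, per-pixel weighting) is specified in the paper.
    """
    generated = np.asarray(generated, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    return float(np.mean(np.abs(generated - ground_truth)))

# Toy 2x2 "images" with pixel intensities in [0, 1].
gen = np.array([[0.0, 0.5], [1.0, 0.25]])
gt  = np.array([[0.0, 1.0], [0.5, 0.25]])
loss = pixel_consistency_loss(gen, gt)  # |diffs| = 0, 0.5, 0.5, 0 -> mean 0.25
```

An L1-style term like this is a common choice in conditional image generation because it pushes the generator toward outputs that agree with the reference pixel by pixel, complementing an adversarial or compatibility objective.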
CRediT authorship contribution statement
Jinhuan Liu: Conceptualization, Methodology, Software, Formal
analysis, Data curation, Investigation, Visualization, Writing - orig-
inal draft. Xuemeng Song: Methodology, Validation, Formal analy-
sis, Investigation, Writing - review & editing, Software, Funding
acquisition. Zhumin Chen: Validation, Investigation, Resources,
Supervision, Funding acquisition. Jun Ma: Writing - review & edit-
ing, Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
Acknowledgements
This work is supported by the National Natural Science Founda-
tion of China, No.: 61702300, 61672324, 61972234, 61902219 and
61672322; the Future Talents Research Funds of Shandong Univer-
sity, No.: 2018WLJH63.
J. Liu et al. / Neurocomputing 414 (2020) 215–224