ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018,
pages 6224–6228, April.
Black, A. W. (2019). CMU Wilderness multilingual speech
dataset. In ICASSP 2019 - 2019 IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP), pages 5971–5975, May.
Christodoulopoulos, C. and Steedman, M. (2015). A massively
parallel corpus: the Bible in 100 languages. Language
Resources and Evaluation, 49(2):375–395.
Chrupała, G., Gelderloos, L., and Alishahi, A. (2017).
Representations of language in a model of visually
grounded speech signal. In Proceedings of the 55th An-
nual Meeting of the Association for Computational Lin-
guistics (Volume 1: Long Papers), pages 613–622. Asso-
ciation for Computational Linguistics.
Chung, Y.-A., Weng, W.-H., Tong, S., and Glass, J. R.
(2018). Towards unsupervised speech-to-text translation.
In ICASSP 2019 - 2019 IEEE International Conference
on Acoustics, Speech and Signal Processing
(ICASSP), pages 7170–7174.
Di Gangi, M. A., Cattoni, R., Bentivogli, L., Negri, M., and
Turchi, M. (2019). MuST-C: a multilingual speech translation
corpus. In Proceedings of the 2019 Conference of
the North American Chapter of the Association for Com-
putational Linguistics: Human Language Technologies,
Volume 2 (Short Papers), Minneapolis, MN, USA, June.
Federmann, C. and Lewis, W. (2016). Microsoft speech
language translation (MSLT) corpus: The IWSLT 2016 release
for English, French and German. In Proceedings of
IWSLT 2016, December.
Harwath, D., Chuang, G., and Glass, J. R. (2018a). Vi-
sion as an interlingua: Learning multilingual semantic
embeddings of untranscribed speech. In 2018 IEEE In-
ternational Conference on Acoustics, Speech and Signal
Processing, ICASSP 2018, Calgary, AB, Canada, April
15-20, 2018, pages 4969–4973.
Harwath, D., Recasens, A., Surís, D., Chuang, G., Torralba,
A., and Glass, J. R. (2018b). Jointly discovering visual
objects and spoken words from raw sensory input. In
Vittorio Ferrari, et al., editors, ECCV (6), volume 11210
of Lecture Notes in Computer Science, pages 659–677.
Springer.
Hochreiter, S. and Schmidhuber, J. (1997). Long
short-term memory. Neural Comput., 9(8):1735–1780,
November.
Iranzo-Sánchez, J., Silvestre-Cerdà, J. A., Jorge, J.,
Roselló, N., Giménez, A., Sanchis, A., Civera, J., and
Juan, A. (2020). Europarl-ST: A Multilingual Corpus
For Speech Translation Of Parliamentary Debates. In
ICASSP 2020 - 2020 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP).
Jia, Y., Johnson, M., Macherey, W., Weiss, R., Cao, Y.,
Chiu, C.-C., Ari, N., Laurenzo, S., and Wu, Y. (2019a).
Leveraging weakly supervised data to improve end-to-
end speech-to-text translation. In ICASSP 2019 - 2019
IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pages 7180–7184, May.
Jia, Y., Weiss, R. J., Biadsy, F., Macherey, W., Johnson,
M., Chen, Z., and Wu, Y. (2019b). Direct Speech-to-
Speech Translation with a Sequence-to-Sequence Model.
In Proc. Interspeech 2019, pages 1123–1127.
Kisler, T., Reichel, U., and Schiel, F. (2017). Multilingual
processing of speech via web services. Computer Speech
&amp; Language, 45:326–347.
Kocabiyikoğlu, A. C., Besacier, L., and Kraif, O. (2018).
Augmenting LibriSpeech with French translations: A
multimodal corpus for direct speech translation evalua-
tion. In Proceedings of the Eleventh International Con-
ference on Language Resources and Evaluation (LREC
2018), Miyazaki, Japan, May. European Language Re-
sources Association (ELRA).
Lee, L.-s., Glass, J., Lee, H.-y., and Chan, C.-a. (2015).
Spoken content retrieval beyond cascading speech recog-
nition with text retrieval. IEEE/ACM Transactions on
Audio, Speech, and Language Processing, 23(9):1389–
1420.
Littell, P., Mortensen, D. R., Lin, K., Kairis, K., Turner, C.,
and Levin, L. (2017). URIEL and lang2vec: Representing
languages as typological, geographical, and phylogenetic
vectors. In Proceedings of the 15th Conference of the
European Chapter of the Association for Computational
Linguistics: Volume 2, Short Papers, pages
8–14.
Navigli, R. and Ponzetto, S. P. (2010). BabelNet: Build-
ing a very large multilingual semantic network. In Pro-
ceedings of the 48th Annual Meeting of the Association
for Computational Linguistics, pages 216–225, Uppsala,
Sweden, July. Association for Computational Linguis-
tics.
Nivre, J., de Marneffe, M., Ginter, F., Goldberg, Y., Hajič,
J., Manning, C. D., McDonald, R. T., Petrov, S., Pyysalo,
S., Silveira, N., Tsarfaty, R., and Zeman, D. (2016).
Universal dependencies v1: A multilingual treebank col-
lection. In Proceedings of the Tenth International Con-
ference on Language Resources and Evaluation LREC
2016, Portorož, Slovenia, May 23-28, 2016.
Post, M., Kumar, G., Lopez, A., Karakos, D., Callison-
Burch, C., and Khudanpur, S. (2013). Improved
speech-to-text translation with the Fisher and Callhome
Spanish-English speech translation corpus. In International
Workshop on Spoken Language Translation
(IWSLT 2013).
Sanabria, R., Caglayan, O., Palaskar, S., Elliott, D., Bar-
rault, L., Specia, L., and Metze, F. (2018). How2:
a large-scale dataset for multimodal language under-
standing. In Proceedings of the Workshop on Visually
Grounded Interaction and Language (ViGIL). NeurIPS.
Schultz, T. and Schlippe, T. (2014). Globalphone: Pro-
nunciation dictionaries in 20 languages. In Proceedings
of the Ninth International Conference on Language Re-
sources and Evaluation, LREC 2014, Reykjavik, Iceland,
May 26-31, 2014, pages 337–341.
Schwenk, H., Chaudhary, V., Sun, S., Gong, H., and
Guzmán, F. (2019). Wikimatrix: Mining 135m parallel
sentences in 1620 language pairs from Wikipedia.
Preprint, abs/1907.05791.
Sérasset, G. (2015). Dbnary: Wiktionary as a lemon-based
multilingual lexical resource in rdf. Semantic Web,