Deeply Audio Datasets
Machine / Deep Learning Training Data

Over the years, we have collected three types of large-scale audio datasets with different characteristics. They can be used for training and improving machine learning and deep learning AI models, and a pre-trained model is available for each dataset. Data collection was supported by the Korean government, and the data were double-checked by NIPA, a government agency. Each dataset is available for commercial or academic use at different prices. Detailed descriptions, statistics, and audio samples are on each dataset page below.

Please do not hesitate to contact us with price inquiries.

For a faster reply, please email contact@deeplyinc.com.



coughing sound sample

01

Nonverbal Vocalization Data

Nonverbal voice data contain no language. The dataset covers 16 types of nonverbal vocal sounds, including screams, laughter, cries, moans, and tickling sounds. It comprises 57 hours of data collected from 1,419 people, and its quality was confirmed through double inspection.

02

Parent-Child Vocal Interaction Data

This dataset contains various conversations between parents and children. It consists of voice interaction data in 8 classes, such as talking, singing, and crying. A total of 282 hours of data was confirmed through double inspection. In particular, the same sound was recorded on two types of cell phones (iPhone X and Samsung Galaxy S7) at distances of 0.4 m, 2.0 m, and 4.0 m. In addition, to account for the characteristics of the recording space, each piece of data was recorded in a room, a studio, or an anechoic room.
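As a rough illustration, the recording conditions described above can be enumerated when organizing or filtering files. This is only a sketch: the field names and the `recording_conditions` helper are hypothetical, and it assumes every device, distance, and space combination is present, which the description does not guarantee.

```python
from itertools import product

# Values taken from the dataset description; names are illustrative only.
DEVICES = ("iPhone X", "Samsung Galaxy S7")
DISTANCES_M = (0.4, 2.0, 4.0)
SPACES = ("room", "studio", "anechoic room")


def recording_conditions():
    """Return every device/distance/space combination as a dict."""
    return [
        {"device": device, "distance_m": distance, "space": space}
        for device, distance, space in product(DEVICES, DISTANCES_M, SPACES)
    ]


if __name__ == "__main__":
    for condition in recording_conditions():
        print(condition)
```

The actual file layout and metadata format may differ, so check the dataset page before relying on this structure.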


child singing sample


03

Emotional Speech Corpus


negative voice sample


This dataset contains voice data with various emotions. Some sentences with positive, neutral, or negative meanings were recorded in a neutral tone. Other sentences were recorded with the speaker expressing positive, neutral, or negative emotion. A total of 290 hours of data was confirmed through double inspection. As with the parent-child data, the same sound was recorded on two types of cell phones (iPhone X and Samsung Galaxy S7) at distances of 0.4 m, 2.0 m, and 4.0 m. In addition, to account for the characteristics of the recording space, each piece of data was recorded in a room, a studio, or an anechoic room.

These datasets are already being used by large companies, research institutes, and universities to improve speech, nonverbal, and emotional AI analysis.

More information can be found at the following links.

* GitHub link

* blog post


What are the strengths of the datasets?


World's largest nonverbal dataset

The world's largest dataset of nonverbal sounds, including screams, coughs, sneezes, laughter, crying, and more.


Office : E02, Space Sallim 2F, 10, Noryangjin-ro, Dongjak-gu, Seoul, Republic of Korea

Tel : +82 70-7459-0704

E-mail : contact@deeplyinc.com

Copyright © Deeply, Inc. All rights reserved.