The COVID-19 pandemic has resulted in more than 100 million infections and more than 2 million deaths, with the global crisis spanning 200 countries. Large-scale testing, social distancing, and face masks have been critical measures to help contain the spread of the infection. Even with the onset of vaccination programs, the WHO highlights that large-scale testing and precautionary measures must be followed for the next couple of years. While the list of symptoms is regularly updated, it is established that in symptomatic cases COVID-19 seriously impairs normal functioning of the respiratory system. Does this alter the acoustic characteristics of breath, cough, and speech sounds produced through the respiratory system? This is an open question waiting for scientific insights. A COVID-19 diagnosis methodology based on acoustic signal analysis, if successful, can provide a remote, scalable, and economical means for testing individuals. This can supplement existing nucleotide-based COVID-19 testing methods, such as RT-PCR and RAT.
The DiCOVA Challenge is designed to find scientific and engineering insights into this question by enabling participants to analyze an acoustic dataset gathered from COVID-19 positive and non-COVID-19 individuals. Selected findings will be presented in a special session at Interspeech 2021, the flagship conference of the global speech science and technology community, to be held in Brno from Aug 31 to Sept 3, 2021. The timeliness and global societal importance of the challenge warrant focused effort from researchers across the globe, including those from medical and respiratory sciences, signal processing, and machine learning. We look forward to your participation!


The DiCOVA Track-1 Challenge (COVID-19 detection from cough sounds) received registrations from 85 teams spread across the globe, coming from industry, academia, and independent individuals. All these teams were sent the challenge datasets. Of these, 29 teams participated in evaluating their systems against the blind test set (233 audio files). For this, a leaderboard was set up in Codalab and teams posted a COVID probability score for each test audio file. In response, they received the AUC (area under the ROC curve) computed over the 233 test audio files. A higher AUC (range 0-100%) implies better performance. Team T-1 posted an AUC of 87.04% and finished on top of the leaderboard. On the right you can see the classification performance of this team on the blind test set. Below we illustrate a few of our observations on the leaderboard activity.
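For reference, the leaderboard's AUC can be computed directly from submitted scores and ground-truth labels. Below is a minimal sketch using the rank-based (Mann-Whitney U) formulation, assuming binary labels (1 = COVID positive) and real-valued probability scores; the toy labels and scores at the end are made up for illustration, not challenge data:

```python
import numpy as np

def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive file
    receives a higher score than a randomly chosen negative file
    (ties counted as half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Pairwise comparison is fine at this scale (233 test files)
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example (not challenge data)
toy_auc = auc([0, 0, 1, 1, 0, 1], [0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
```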

The leaderboard saw participation from 29 teams.            


Each team was given a maximum of 25 attempts to evaluate their system performance against the hidden blind test labels. Many of these systems achieved AUCs better than the baseline system.


There was good diversity in the kinds of features used by the teams. These ranged from simple hand-crafted acoustic features (such as zero-crossing rate (ZCR) and energy) to advanced acoustic representations (embeddings) obtained from pre-trained DNNs.
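As an illustration of the simpler end of that spectrum, ZCR and short-time energy can be computed in a few lines of NumPy. This is only a sketch; the frame length and hop size below are arbitrary choices, not values prescribed by the challenge:

```python
import numpy as np

def frame_features(x, frame_len=1024, hop=512):
    """Per-frame zero-crossing rate (ZCR) and short-time energy
    for a mono signal x."""
    zcrs, energies = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        # ZCR: fraction of adjacent sample pairs whose sign differs
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        # Energy: mean squared amplitude of the frame
        energies.append(np.mean(frame ** 2))
    return np.array(zcrs), np.array(energies)

# Sanity check on a synthetic 440 Hz tone at 44.1 kHz
sr = 44100
t = np.arange(sr) / sr
zcr, energy = frame_features(np.sin(2 * np.pi * 440 * t))
```

For a pure tone, the mean energy should sit near 0.5 and the ZCR near 2 × 440 / 44100 per sample pair.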


The novelty of the task also led teams to experiment with diverse kinds of classifiers.


The challenge task required handling class imbalance in the data. For this, several teams experimented with data augmentation (adding noise, reverberation, pitch shifting, etc., or cough files from other public datasets, like COUGHVID) and system fusion.
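One common augmentation from that list, additive noise at a controlled signal-to-noise ratio, can be sketched as follows. White Gaussian noise and the random seed are used here purely for illustration; teams also used recorded noise, reverberation, and pitch shifting:

```python
import numpy as np

def add_noise(x, snr_db, seed=None):
    """Add white Gaussian noise to signal x at the target SNR (dB)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(x ** 2)
    # Scale noise power so that 10*log10(signal/noise) == snr_db
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

# Stand-in for a 1-second recording at 44.1 kHz (not real cough data)
x = np.random.default_rng(0).standard_normal(44100)
noisy = add_noise(x, snr_db=10, seed=1)
```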


The best performance was posted by team T-1 with an AUC of 87.04%, significantly improving over the baseline system performance (69.85%). This was followed by two close competitors: team T-2 posting 85.43% AUC and team T-3 posting 85.35% AUC. It was wonderful to see nine teams score above 80% AUC!


The evaluation was open for 22 days. In the initial days, only a few teams evaluated their systems. As days passed, leaderboard activity gained pace and teams started improving their AUCs.


How did the best AUC on the leaderboard change over evaluation days?            


Does more evaluation by a team imply a better AUC? There is some correlation :)!            


How does the performance on the test set compare against the performance on the val set?            


An important metric in evaluating a diagnosis tool is its specificity at a given sensitivity. For the challenge, we evaluated the specificity at 80% sensitivity. Below we show how different systems fared on this metric. The best specificity obtained was 83.33%, by team T-1.
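This metric can be computed by sweeping the decision threshold over the submitted scores and, among all operating points reaching at least 80% sensitivity, keeping the best specificity. A minimal sketch follows; the toy labels and scores are made up for illustration:

```python
import numpy as np

def specificity_at_sensitivity(labels, scores, target_sens=0.8):
    """Best specificity over all thresholds whose sensitivity
    (true positive rate) is at least target_sens."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    best_spec = 0.0
    for thr in np.unique(scores):
        pred = scores >= thr  # predict positive at or above threshold
        sens = np.sum(pred & (labels == 1)) / np.sum(labels == 1)
        spec = np.sum(~pred & (labels == 0)) / np.sum(labels == 0)
        if sens >= target_sens:
            best_spec = max(best_spec, spec)
    return best_spec

toy_spec = specificity_at_sensitivity(
    [0, 0, 1, 1, 0, 1], [0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
```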


And finally, here are the ROCs of the 29 systems corresponding to the best system of each team.            


Timeline (Tentative) [23:59hrs AOE]

Registrations Open :
4th Feb 2021
Registrations Close:
1st Mar 2021
Data Release (Train and Dev):
15th Feb 2021
Baseline System Release:
22nd Feb 2021
Evaluation data and Leaderboard active:
1st Mar 2021
Final evaluation Closes:
22nd Mar 2021
System Report submission:
23rd Mar 2021
Interspeech Abstract submission:
26th Mar 2021
Interspeech Paper submission:
2nd Apr 2021


This special session features two tracks, and you can participate in one or both of them. Track-1 focuses only on cough sound recordings, while Track-2 is open to broader sound categories, such as cough, breath, sustained phonation, and continuous speech.
You are encouraged to submit your findings to the DiCOVA Special Session at Interspeech 2021 for peer review and subsequent consideration for presentation (and publication) at the conference. For this, we require you to participate in one or both of the tracks.

Track-1: Cough Sound

  • a. The goal is to use cough sound recordings from COVID-19 and non-COVID-19 individuals for the task of COVID-19 detection.
  • b. You will be provided with a train/val audio dataset, train/val lists, and a baseline system to enable design of your own models.
  • c. Subsequently, a blind evaluation dataset will be provided to all participants. You will submit your model's scores on the blind set and the validation lists to a leaderboard interface (set up in Codalab), featuring the performance of other teams on the same dataset.
  • d. The performance metrics for evaluation will be the area under the ROC curve (AUC) and the specificity at 80% sensitivity.
  • e. All participants will be required to submit a system description report (2-4 pages) to the organizers. All participants are also encouraged to submit their findings to the DiCOVA Special Session, Interspeech 2021 for peer-review.
  • f. You are free to use any other dataset (excluding the Project Coswara dataset) for data augmentation.


Track-2: Multi Sound

  • a. The goal is to use breathing, sustained phonation, and speech sound recordings from COVID-19 and non-COVID-19 individuals for any kind of detailed analysis that can contribute towards COVID-19 detection.
  • b. You will be provided an audio dataset, and also train/val/eval lists which can be used to report results.
  • c. The participants are encouraged to do their own analysis and evaluation, and submit their findings to the DiCOVA Special Session, Interspeech 2021 for peer-review.
  • d. There will be no baseline system, blind test, or leaderboard in this track.
  • e. You are free to use any other dataset (excluding the Project Coswara dataset) for data augmentation.



Thank you for your interest! Below are the three quick steps to register your participation and get started in the challenge.
  • Step-1: One representative of the participating team fills the form at: click here
  • Step-2: Subsequently, we will send a Terms & Conditions document to your e-mail address. Fill it in and e-mail it to us at dicova2021@gmail.com.
  • Step-3: After a quick verification from our side, we will confirm your registration within 24 hrs and send you the access details to the dataset. That's it!


Sriram Ganapathy
Assistant Professor, Indian Institute of Science, Bangalore, India
Prasanta Kumar Ghosh
Associate Professor, Indian Institute of Science, Bangalore, India
Neeraj Kumar Sharma
CV Raman Postdoctoral Researcher, Indian Institute of Science, Bangalore, India
Srikanth Raj Chetupalli
Postdoctoral Researcher, Indian Institute of Science, Bangalore, India
Prashant Krishnan
Research Assistant, Indian Institute of Science, Bangalore, India
Rohit Kumar
Research Associate, Indian Institute of Science, Bangalore, India
Ananya Muguli
Research Assistant, Indian Institute of Science, Bangalore, India

Frequently Asked Questions

Q. Which programming languages can I use?

A. You are free to use any programming language you like. For system evaluation we will require you to submit the output decisions as a CSV/TXT file.
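For example, a scores file could be written like this. The file names and column layout here are illustrative assumptions only; follow the official submission instructions for the actual format:

```python
import csv

# Hypothetical per-file COVID probability scores (illustrative names)
scores = {"test_0001.flac": 0.82, "test_0002.flac": 0.13}

# Write one "<file>,<score>" row per test audio file
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for name, score in sorted(scores.items()):
        writer.writerow([name, f"{score:.4f}"])
```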
Q. How big is the Track-1 challenge dataset?

A. The train-val dataset for Track-1 contains a total of ~1.36 hrs of cough audio recordings from 75 COVID-19-positive subjects and 965 non-COVID-19 subjects. The compressed zip file is only 160 MB. The audio is stored as .FLAC files with a sampling rate of 44.1 kHz.
Q. How do I get the DiCOVA audio dataset?

A. It is simple - by registering for the challenge. Please see the registration section in this webpage (above).
Q. Can I re-distribute the data?

A. Yes, but only after obtaining the consent of the organizers.
Q. Are there other datasets I can use?

A. For both Track-1 and Track-2, you are not allowed to use the Project Coswara data. You can use any other data with proper citation of the source in the report and the Interspeech manuscript.
Q. How do I submit my findings obtained by participating in this challenge to Interspeech 2021?

A. That's great! You can follow the Interspeech 2021 paper submission portal here. Remember to select "Special Session DiCOVA" while uploading your paper there.
Q. Can I obtain/use the DiCOVA audio data without participating in the challenge?

A. No. We might reconsider this after the end of the challenge. Please contact us then.

Contact Us

You have more questions? Feel free to contact us at: x@y.com where x is dicova2021 and y is gmail.