Crowdsourcing Adverse Test Sets to Help Surface AI Blindspots
Scoring

For every participant, the leaderboard tracks the following (a rough data-model sketch follows the list):

  • Submitted Images: number of images the participant has submitted;
  • Remaining Quota: number of images the participant may still submit;
  • Adverse Examples: number of adverse examples (i.e. human-verified false positives or false negatives) the participant has identified;
  • Bonus Quota: number of additional images the participant is allowed to submit.
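
To make the bookkeeping concrete, the sketch below models one leaderboard row in Python. The class and field names are our own illustration, not the challenge's actual schema.

    from dataclasses import dataclass

    @dataclass
    class LeaderboardEntry:
        # One leaderboard row; all names are illustrative only.
        participant_id: str
        submitted_images: int   # images submitted so far
        remaining_quota: int    # submissions still allowed
        adverse_examples: int   # human-verified false positives/negatives found
        bonus_quota: int        # extra submissions earned (5 per adverse example)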

Bonus Quota: We multiply the number of Adverse Examples a participant has discovered by 5 to calculate that participant's bonus quota. In other words, for every example scored as adverse, the participant is allowed to submit 5 more image-label pairs.
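
A minimal sketch of the quota arithmetic, assuming a hypothetical starting allowance BASE_QUOTA (the actual allowance is set by the challenge rules, not by us):

    BONUS_PER_ADVERSE = 5   # fixed by the scoring rule above
    BASE_QUOTA = 100        # assumed starting allowance; illustrative only

    def bonus_quota(adverse_examples: int) -> int:
        # Extra submissions earned: 5 per human-verified adverse example.
        return BONUS_PER_ADVERSE * adverse_examples

    def remaining_quota(submitted_images: int, adverse_examples: int) -> int:
        # Submissions still allowed: base allowance plus bonus, minus usage.
        return BASE_QUOTA + bonus_quota(adverse_examples) - submitted_images

For example, a participant who has identified 3 adverse examples earns a bonus quota of 15 additional image-label pairs.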

Human verification: Human raters will continuously rate all image-label pairs in each participant's submission queue throughout the challenge.

Awarding points: If multiple participants submit the same image-label pair, the point is awarded to the first participant who submitted it, based on the timestamp in the submitted-images queue.
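
One way to implement this first-submitter-wins rule is to scan the queue in timestamp order and keep only the earliest submission per image-label pair. The sketch below assumes this approach and these type names; it is not the challenge's actual pipeline.

    from dataclasses import dataclass
    from datetime import datetime
    from operator import attrgetter

    @dataclass
    class Submission:
        participant_id: str
        image_id: str
        label: str
        timestamp: datetime

    def first_submitters(queue: list[Submission]) -> dict[tuple[str, str], Submission]:
        # Keep only the earliest submission for each (image, label) pair;
        # later duplicates are discarded.
        winners: dict[tuple[str, str], Submission] = {}
        for sub in sorted(queue, key=attrgetter("timestamp")):
            winners.setdefault((sub.image_id, sub.label), sub)  # first seen = earliest
        return winners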

Adverse examples: Image-label pairs for which the human verification disagrees with the machine prediction, e.g. (a code sketch follows these examples):

  • human verification = Y, machine prediction = N (false negative);
  • human verification = N, machine prediction = Y (false positive).
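
In code, the disagreement test reduces to comparing two binary judgments. The Y/N encoding below mirrors the examples above; the function name is our own.

    def classify_adverse(human: str, machine: str) -> str | None:
        # Returns the adverse-example type, or None when human and machine agree.
        if human == "Y" and machine == "N":
            return "false negative"   # the model missed a positive
        if human == "N" and machine == "Y":
            return "false positive"   # the model flagged a negative
        return None                   # agreement: not an adverse example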

Winner: The winner is the participant with the highest number of Adverse Examples when the competition closes.

Check Rules to learn the challenge conditions.

Check Participate to learn how to start contributing.