Suggestions for CiML 2018
• How to choose better topics for challenges, including topics faced by the local community?
• How to design better challenges, including AI and RL challenges?
• Can we organize challenges that generalize across domains (computer vision, computational biology, etc.)?
• How to teach challenge design? How to evaluate how good a challenge design really is?
• Coopetitions: how to foster the re-use of components and participant collaboration? What could the reward mechanisms be?
Summary of the 2017 discussions
• How do you use challenges in the classroom?
• How to evaluate students?
- Kaggle in Class is popular.
- Engage students to create their own challenges (Michigan http://midas.umich.edu/challenges/, data science game http://www.datasciencegame.com/, Paris-Saclay mini-challenge https://sites.google.com/a/chalearn.org/saclay/).
- Use challenges to teach STEM.
- Associating challenge ranking with the grade, or not: a mixed blessing; combine with other metrics, e.g. a report.
- Use applications popular with students: finance, social good, computer vision, etc.
- Attractive topics, e.g. deep learning.
- Teach them good challenge "hygiene" (dataset size, data leakage, etc.).
• Engaging women in AI/ML?
- Surprisingly few women participate.
- Women have less leisure time in their early career.
- Is it a myth that women are less competitive? Maybe they are engaged in competing in the marketplace?
- Coopetitions or training competitions that are less competitive may attract more women.
- Create leagues to lower the level of entry?
• What are good rewards?
- Cash is not the main motivator.
- Exchange badges for goods.
- Reward ideas: Coaching, food, collaboration, computing resources, data, jobs.
• Should chance play a role?
- No for serious players.
- Maybe for weaker players.
• Motivate people initially:
- Keep it simple; explain the task well.
- Lower the barrier of entry.
- Social good.
- Progressive difficulty.
• Keep people motivated:
- Organize recurring events.
- Give intermediate rewards.
- Organize tournaments.
- Diversity of prizes; sprints.
- Allow for mistakes/regressions.
- Increase number of trials as a function of performance.
- Points for "helping others": the more we share, the more we own.
• Other questions of interest to the audience but not really discussed:
• Reproducible research?
• Overfitting in challenges?
• What features and extensions would be interesting to add?
• Machine/human collaboration?
• Sponsorship (interest publishers?)
Summary of the 2016 discussions:
The 2016 CiML Workshop discussions focused on the open discussion from CiML 2015 (see bottom of page) and coalesced into four distinct themes (see also the full text of the minutes):
1.) Democratization – how to democratize Innovation via Challenges by:
- Lowering the bar to contribute to solving complex challenges requiring different levels of expertise in different scientific areas.
- Building upon previous organizers’ challenges.
2.) Incentives and Recognition – incentives required for people to collaborate and together solve complex problems via challenges, while acknowledging people’s individual contributions – make “the whole greater than the sum of the parts”.
3.) Theoretical vs. engineering decisions - leading edge ML tools & techniques for E2E solutions, with consistent pipeline, feedback loop and reproducibility.
4.) Vibrant ecosystem – how to build and grow a sustainable community with shared approaches and value added for all, bringing value back to the ML community.
Workshop participants shared their experiences, hopes, fears and aspirations for how Challenges can be used as an effective tool to drive open innovation across a variety of machine learning disciplines. While the discussion is ongoing, there was some consensus around key investment areas in the coming year:
1.) Democratization
o Audience Engagement
o Ease of Starting / Participating in a Challenge
2.) Incentives & Recognition
o Benefits for Hosts & Participants
o Use Cases
o Sustaining financing & staffing of ML Challenges
3.) Theoretical vs. engineering decisions
o Bridging the gap between ML Theory & Application
o Automated Approaches
4.) Vibrant ecosystem
o Publication & dissemination of Challenge Results
o Closing the loop between challenge results and new scientific inquiries
Questions raised in 2015 by the participants at the first CiML meeting:
• Is it possible to use a Kaggle-like platform to forecast sports scores, stock market moves, etc. (time-series data sets)?
• Is there an optimal method to prevent cheating in competitions that use time-series data (which is easy to introduce errors into that may give some participants an unfair advantage)?
• Is there a way to detect that participants are using extra data not in use by others?
• How can we take into account the uncertainty of models?
• Do a lot of people spend time over-fitting to the public leaderboard?
• Can we diagnose/prevent data leakage?
• Is there a way of factoring the elegance of technique into the scoring?
• How do you know whether your gold/silver/bronze standard is good/sufficient?
• How to index/share datasets?
• Licensing issues?
• Privacy issues?
• Data collection and annotation; crowdsourcing.
• How to reward/acknowledge data provider?
• How can we add the knowledge from forum posts to the competitions?
• How can we manage to publish the winners' solutions, especially when there is a substantial scientific contribution?
• What data should we collect during and after the contest to validate contest quality?
• How can we improve communication/sharing between organizers: yearly workshop?
• Do we want a magazine? Do we want a newsletter? Where should we post challenges (mailing list, Wiki)?
• How can we make sure challenges are analyzed in depth? Organize a meta-challenge for back-testing?
• How can we present results so that we gain more insight from competitions? Academic impact?
• How can we trace the impact of competitions down to applications and/or impact on people’s careers?
• Challenge taxonomy, challenge search engine (cross-platform).
• How to make competitions comparable?
• Standard metrics?
• Standard of competition “configuration file” (similar to Codalab bundles)?
• Is Kaggle the only platform?
• Is there a platform to share resources among challenge organizers?
• Unique identification of users (e.g. ORCID), cross-platform tracking of challenge performance.
• What features of platforms are the most difficult for organizers to use?
COOPETITIONS/NEW TYPES OF COMPETITIONS
KNOW THE “CUSTOMER” (PARTICIPANT), MORE MOTIVATING RULES, LOWER BARRIERS OF ENTRY
• How do we formulate hard tasks as challenges?
• How does organizing a company competition (private) differ from a public competition?
• What is the most important way to improve competitions?
• Are some competitions leading to tuning of existing methods rather than creation of new methods?
• How can people be recognized for small contributions to a team effort?
• Typically participants bring answers to questions; can we organize a competition where people bring additional data? (Nobel Game)
• What are the right incentives for people to collaborate?
• Would it be interesting to have a challenge on human-understandable knowledge?
• Tests of behavior of participants
• What could we do to stimulate innovative techniques?
• When are you going to host an agent-like competition?
• Can we evaluate solutions based upon user code submissions?
• Why not make it easy for participants to use an ensemble approach?
• How can we create graphs of models for deep learning solutions?
• Relative benefits of various settings? Issues of duration; hackathons?
• Can you organize a competition where both the public domain and the participants benefit equally?
• Effectiveness of challenges in teaching; use in classes?
• How can we use challenges to up-skill the workforce?
• How can we construct challenges to train people to conduct reproducible research?
• Can we get together to obtain Government funding to seed new challenge efforts (similar to the Pascal challenges)?
• Crowd-funding, benefactors?
• Sponsors, common fund for student grants?