For the second time, the International Conference on Software Maintenance and Evolution (ICSME) will host the Software Documentation Generation Challenge (DocGen). This time, DocGen is organized into three tracks: a competition with prize winners selected according to a suite of objective metrics (DeClutter); a tool demo similar to the 2018 challenge (ClassDoc); and an open call for tool demos on any documentation-related topic (Open Exhibition).
Decluttering Challenge (DeClutter)
NB: Technical details such as the data format and evaluation metrics are subject to change.
The goal of the DeClutter challenge is to build an automated tool that can identify unnecessary software documentation at the class or file level. For example, a comment such as "// loop from 1 to N" placed just before code implementing exactly that is not helpful; worse, it can drift out of date when the underlying code is later changed (for example, to loop to N-1).
The tool should take as input a CSV file in the format we will describe, and output a label for each row: either "Clutter" or "Non-Clutter". Tools will be ranked according to commonly accepted machine learning metrics, e.g., accuracy, precision, recall, and the Matthews correlation coefficient. We may also award a prize to the tool that does best in some hitherto unappreciated dimension. DeClutter will be run as automatically as possible, using Kaggle as the leaderboard mechanism. This is a great opportunity for new researchers to get involved in the topic.
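To make the scoring concrete, here is a minimal sketch of how the metrics named above can be computed from a tool's predicted labels. The official scoring scripts and CSV format will be released with the data, so the function name and the example labels below are illustrative assumptions, not the challenge's actual evaluation code.

```python
import math

def scores(gold, pred, positive="Clutter"):
    """Accuracy, precision, recall, and Matthews correlation coefficient
    for binary Clutter/Non-Clutter predictions (hypothetical helper)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    tn = sum(1 for g, p in zip(gold, pred) if g != positive and p != positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, recall, mcc

# Example: two of four predictions match the gold labels.
gold = ["Clutter", "Clutter", "Non-Clutter", "Non-Clutter"]
pred = ["Clutter", "Non-Clutter", "Non-Clutter", "Clutter"]
print(scores(gold, pred))  # (0.5, 0.5, 0.5, 0.0)
```

Note that accuracy alone can be misleading if the Clutter/Non-Clutter classes are imbalanced, which is one reason the Matthews correlation coefficient is included.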
Participants (either teams or individuals) will initially have access to the development data only. Later, the unlabeled test data will be released (see the tentative timeline below). After the assessment, the labels for the test data will be released as well. Both development and test data will be extracted from the JabRef project. However, participants may use additional labeled data from other sources to develop and implement their systems. Guidelines describing the annotation and providing detailed instructions for participants will be released with the development data.
After submitting the predictions for the test data, participants will be required to provide a short report including a brief description of their approach, an illustration of their experiments (in particular, the techniques and resources used, including any additional data used for building the model), and an analysis of their results. Formatting guidelines will be released with the test data.
We invite potential participants to register using the webform. Registration is compulsory to participate in the challenge, although it is not binding: independently of your registration, you can withdraw by simply not submitting system results during the evaluation stage. When registering, you will also have the option to subscribe to the Google group we will create to keep participants up to date with the latest news related to the challenge. Participants are encouraged to share comments and questions with the mailing list. The challenge co-chairs will assist you with any issues that arise.
Tentative timeline for the DeClutter Challenge (report submission deadline below):
- 4th November 2019: Development data available to participants
- 20th April 2020: Test data available, registration closes
- 4th May 2020: System results due to organizers
- 8th June 2020: Assessment returned to participants
Tool Demo (ClassDoc)
The ClassDoc Tool Demo is the successor of the DocGen competition held in 2018. The goal is to build an automated system that can create on-demand developer documentation for a Java class. The documentation should help developers answer specific questions about the class, for example:
- What does the class do?
- What is the purpose of the class?
- Why is the class implemented in this way?
- How am I supposed to use the class?
An entry to the ClassDoc Tool Demo is a program that takes as input the fully qualified name of a class from the JabRef project and generates the developer documentation for that class in HTML format. Examples of the types of information that the generated document could contain include key methods, usage constraints, usage examples, the role of the class in applicable design patterns, known issues and limitations, etc.
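To illustrate the expected input/output contract, here is a minimal sketch of a generator's final rendering step. This is not a real entry: the function name, its parameters, and the placeholder summary and method list are all assumptions; a genuine tool would mine this content from the data sources below (and should also HTML-escape it).

```python
def render_class_doc(fqn, summary, key_methods):
    """Render a minimal HTML documentation page for one Java class.

    fqn         -- fully qualified class name given as input to the tool
    summary     -- a description mined from whatever sources the tool uses
    key_methods -- method signatures the tool considers most important
    """
    simple_name = fqn.rsplit(".", 1)[-1]
    items = "".join(f"<li><code>{m}</code></li>" for m in key_methods)
    return (
        f"<html><head><title>{simple_name}</title></head><body>"
        f"<h1>{fqn}</h1>"
        f"<p>{summary}</p>"
        f"<h2>Key methods</h2><ul>{items}</ul>"
        f"</body></html>"
    )

# The challenge's example class; summary and methods are placeholders here.
page = render_class_doc(
    "org.jabref.model.entry.BibEntry",
    "Represents a single bibliography entry.",
    ["getField(Field field)", "setField(Field field, String value)"],
)
```

The interesting part of an actual entry is, of course, everything before this step: selecting and combining the data sources that fill in the summary and method list.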
Possible data sources for generating the documentation include, but are not limited to:
- Source code and existing comments
- Version control data
- Forum discussions
- Help pages
- Issue tracking data
- Related test cases
- Stack Overflow threads
Participants are required to submit a two-page description of the key idea(s) behind their document generator. The description must contain the following information:
- Data sources used
- Brief overview of documentation generation technique(s) used
- Preliminary results, including a link to an HTML file with the generated reference documentation for class org.jabref.model.entry.BibEntry as of release v5.0-alpha
- A DOI citation that points to a preserved archive containing the source code of the document generator together with instructions describing how to reproduce the documentation generation. Zenodo and Figshare accounts can be easily linked with GitHub repositories to automatically archive releases and make them citable.
Open Exhibition
We invite submissions in the area of automated documentation that do not fall into the preceding categories. Is there an aspect of documentation generation we are missing? Some cool dataset the community should consider? Techniques from NLP that might make the entire approach moot?
Teams can enter DocGen in one of three categories:
- The challenge category will only accept entries that address the specific DocGen2 challenge (DeClutter). Two-page summaries of the accepted competition entries will be published in the ICSME proceedings.
- The ClassDoc tool demo will only accept entries that address classes in the JabRef project, as outlined above. Two-page descriptions of the accepted tools will be published in the ICSME proceedings.
- The open exhibition will accept any demonstration of documentation generation technology, for any programming language, technology, or types of documents. Two-page descriptions of the accepted open exhibition submissions will be published online.
All submissions must be formatted according to the ICSME Formatting Instructions and must be submitted through EasyChair. Each submission will be evaluated by the program committee.
Accepted submissions of all three tracks will be invited to present at the workshop.
The best DeClutter tool(s), evaluated using the machine learning metrics outlined above, will receive the DocGen Award 2020. ClassDoc submissions will be ranked by JabRef developers, and the best tool(s) will also receive an award.
- 10th July 2020: Submission deadline for all three tracks
- 24th July 2020: Notification deadline
- 7th August 2020: Camera-ready deadline (tentative)
- September 2020: Workshop co-located with ICSME 2020
General and Tool Demo Chairs
- Andrian Marcus (The University of Texas at Dallas, USA)
- Christoph Treude (The University of Adelaide, Australia)
- Denys Poshyvanyk (College of William and Mary, USA)
- Hideto Ogawa (Hitachi Ltd, Japan)
- James Clause (University of Delaware, USA)
- Jin Guo (McGill University, Canada)
- Katsuhisa Maruyama (Ritsumeikan University, Japan)
- Laura Moreno (Colorado State University, USA)
- Michael Decker (Bowling Green State University, USA)
- Michele Lanza (Università della Svizzera italiana, Switzerland)
- Norihiro Yoshida (Nagoya University, Japan)
- Oscar Chaparro (College of William and Mary, USA)
- Shinpei Hayashi (Tokyo Institute of Technology, Japan)
- Sonia Haiduc (Florida State University, USA)
- Takashi Ishio (Osaka University, Japan)
- Vincent Ng (The University of Texas at Dallas, USA)
DocGen2 will take place in Adelaide, Australia. More details will come soon.