September 29th, 2020, Adelaide, Australia (to be held virtually)

Second Software Documentation Generation Challenge (DocGen2)

Hosted by the IEEE International Conference on Software Maintenance and Evolution (ICSME 2020)


Latest News:

  • (Oct 20th) The video and slides of the keynote talk are now available.
  • (Sept 24th) The program is now available.
  • (Aug 5th) Registration to DocGen2 is open now. Follow this link to register
  • (Aug 4th) The camera-ready deadline for the papers was updated to September 4th
  • Submit your papers now by following this link
  • The workshop will go virtual due to COVID-19
  • The important dates have been updated

For the second time, the International Conference on Software Maintenance and Evolution (ICSME) is going to host the Software Documentation Generation Challenge (DocGen). This time, DocGen is organized in three different tracks. One will be a competition, with prize winners selected according to a suite of objective metrics (DeClutter); one will be a tool demo similar to the challenge in 2018 (ClassDoc); and finally, we invite open tool demos on any documentation-related topic (Open Exhibition).

Decluttering Challenge (DeClutter)

The goal of the DeClutter challenge is to build an automated tool that can identify unnecessary software documentation at the class or file level. For example, a comment saying //loop from 1 to N just before code implementing exactly that is not helpful and, worse, could become outdated when the underlying code is later changed (say, to stop at N-1).

Such a comment is considered non-informative and is labeled "Non-information = Yes". The tool should take a CSV file in the described format as input and output a label for each row: "Non-information = Yes" for cluttering, non-informative comments and "Non-information = No" for all others.
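As a rough illustration of the input/output contract, the following Python sketch labels each row with a naive heuristic: a comment is flagged as non-informative when most of its words already appear in the adjacent code. The column names `comment` and `code`, the stopword list, and the 0.75 threshold are assumptions made for this example only; the actual CSV format is defined in the guidelines in the declutter repository, and a competitive entry would likely use a learned model instead.

```python
import csv
import io
import re

# Hypothetical stopword list; chosen for illustration only.
STOPWORDS = {"a", "an", "the", "to", "of", "from", "for", "in", "is", "and", "or", "this"}


def tokens(text):
    """Split text into lowercase word tokens, breaking camelCase identifiers."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def label_comment(comment, code, threshold=0.75):
    """Return "Yes" (non-informative) if the comment mostly restates the code."""
    words = tokens(comment)
    if not words:
        return "Yes"  # an empty comment carries no information
    overlap = len(words & tokens(code)) / len(words)
    return "Yes" if overlap >= threshold else "No"


def label_rows(csv_text):
    """Label every row of a CSV with assumed columns `comment` and `code`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [label_comment(row["comment"], row["code"]) for row in reader]


sample = (
    "comment,code\n"
    "// get the name,String getName()\n"
    "// workaround for a JDK bug,path.normalize()\n"
)
labels = label_rows(sample)
```

Here the first comment only repeats the method name, while the second adds a rationale that cannot be read off the code, so the heuristic treats them differently.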

To download the guidelines and dataset, please refer to the declutter repository. Participants (either teams or individuals) will initially have access to the development data only. Later, the unlabelled test data will be released (see the timeline below). After the assessment, the labels for the test data will be released as well. Both development and test data will be extracted from the JabRef project. However, participants may use additional labeled data from other sources in order to develop and implement their systems. Guidelines describing the annotation and providing detailed instructions for participants will be released with the development data.

After submitting the predictions for the test data, participants will be required to provide a short report including a brief description of their approach, an illustration of their experiments, in particular the techniques and resources used (including any additional data used for building the model), and an analysis of their results. Formatting guidelines will be released with the test data.

We invite potential participants to register using the webform. Registration is compulsory to participate in the challenge, although it is not binding. Regardless of your registration, you can withdraw from participating by not sending the system results during the evaluation stage. When registering, you will also have the option to subscribe to the Google group we will create to keep participants up to date with the latest news related to the challenge. Participants are encouraged to share comments and questions with the mailing list. The challenge co-chairs will assist you with any issues that arise.

Tentative timeline for the DeClutter Challenge (report submission deadline below):

  • 30th September 2019: Development data available to participants
  • 20th May 2020: Test data available to participants
  • 28th July 2020: Kaggle competition closes

Tool Demo (ClassDoc)

The ClassDoc Tool Demo is the successor of the DocGen competition held in 2018. The goal is to build an automated system that can create on-demand developer documentation for a Java class. The documentation should help developers in answering specific questions about the class, for example:

  • What does the class do?
  • What is the purpose of the class?
  • Why is the class implemented in this way?
  • How am I supposed to use the class?

An entry to the ClassDoc Tool Demo is a program that takes as input the fully qualified name of a class from the JabRef project and generates the developer documentation for the input class in HTML format. Examples of the types of information that the generated document could contain include key methods, usage constraints, usage examples, the role of the class in applicable design patterns, and known issues and limitations.
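The input/output contract of such an entry can be sketched in a few lines of Python. This is only a skeleton under stated assumptions: the function name `render_class_doc`, the page layout, and the `sections` mapping are hypothetical, and the step that actually mines the documentation content from the available data sources is deliberately left out.

```python
import html


def render_class_doc(fqn, sections):
    """Render an HTML page for a fully qualified class name.

    `sections` maps a heading (e.g. one of the developer questions) to
    already-generated text; producing that text is the actual challenge
    and is not covered by this sketch.
    """
    package, _, simple_name = fqn.rpartition(".")
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        f"<head><title>{html.escape(simple_name)}</title></head>",
        "<body>",
        f"<h1>{html.escape(simple_name)}</h1>",
        f"<p>Package: <code>{html.escape(package)}</code></p>",
    ]
    for heading, body in sections.items():
        # Escape generated text so it cannot break the page markup.
        parts.append(f"<h2>{html.escape(heading)}</h2>")
        parts.append(f"<p>{html.escape(body)}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


page = render_class_doc(
    "org.jabref.model.entry.BibEntry",
    {"What does the class do?": "(generated text goes here)"},
)
```

Keeping rendering separate from content generation makes it easy to swap in different generation techniques while producing the same HTML output.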

Possible data sources for generating the documentation include, but are not limited to:

Participants are required to submit a two-page description of the key idea(s) behind their document generator. The description must contain the following information:

  • Authors
  • Data sources used
  • Brief overview of documentation generation technique(s) used
  • Preliminary results, including a link to an HTML file with the generated reference documentation for class org.jabref.model.entry.BibEntry as of release v5.0-alpha
  • A DOI citation that points to a preserved archive containing the source code of the document generator together with instructions describing how to reproduce the documentation generation. Zenodo and Figshare accounts can be easily linked with GitHub repositories to automatically archive releases and make them citable.

Open Exhibition

We invite submissions in the area of automated documentation that do not fall into the preceding categories. Is there an aspect to documentation generation we are missing? Some cool dataset the community should consider? Techniques from NLP that might make the entire approach moot?


Teams can enter DocGen in one of three categories:

  • The challenge category will only accept entries that address the specific DocGen2 challenge (DeClutter). Two-page summaries of the accepted competition entries will be published in the ICSME proceedings.
  • The ClassDoc tool demo will only accept entries that address classes in the JabRef project, as outlined above. Two-page descriptions of the accepted tools will be published in the ICSME proceedings.
  • The open exhibition will accept any demonstration of documentation generation technology, for any programming language, technology, or type of document. Two-page descriptions of the accepted open exhibition submissions will be published online.

All submissions must be formatted according to the ICSME Formatting Instructions and must be submitted through EasyChair. No double-blind submissions are required. Each submission will be evaluated by the program committee.


Accepted submissions of all three tracks will be invited to present at the workshop.

The best DeClutter tool(s), evaluated using commonly accepted machine learning scoring as outlined above, will be awarded the DocGen Award 2020. The ClassDoc submissions will be ranked by JabRef developers, and the best tool(s) will also receive an award.
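This page does not name the exact metrics used for the DeClutter ranking. As one plausible example of commonly accepted scoring for a binary labeling task, the following Python sketch computes precision, recall, and F1 for the "Yes" (non-informative) class; the choice of these metrics and the function name are assumptions, not the official evaluation procedure.

```python
def precision_recall_f1(gold, predicted, positive="Yes"):
    """Compute precision, recall, and F1 for the positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


gold = ["Yes", "No", "Yes", "No"]
predicted = ["Yes", "Yes", "No", "No"]
scores = precision_recall_f1(gold, predicted)
```

With one true positive, one false positive, and one false negative, all three scores in this toy example are 0.5.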


September 29th, 2020

Session 1: Opening and Keynote

Oliver Kopp

Markdown Architectural Decision Records: Capturing Decisions Where the Developer is Working

[Video] [Slides]

More information below.


Session 2: Paper presentations


- Mingwei Liu, Xin Peng, Xiujie Meng, Huanjun Xu, Shuangshuang Xing, Xin Wang, Yang Liu and Gang Lv, Source Code based On-demand Class Documentation Generation (tool demo)

- Challenge recap

- Giuseppe Colavito, Pierpaolo Basile and Nicole Novielli, Leveraging Textual and Non-Textual Features for Documentation Decluttering (challenge)

- Mingwei Liu, Yanjun Yang, Xin Peng, Chong Wang, Chengyuan Zhao, Xin Wang and Shuangshuang Xing, Learning based and Context Aware Non-Informative Comment Detection (challenge)


Session 3: Outlook and Closing



Markdown Architectural Decision Records: Capturing Decisions Where the Developer is Working

An Architectural Decision (AD) is a software design choice that addresses a functional or non-functional requirement that is architecturally significant. This might, for instance, be a technology choice (e.g., Java vs. JavaScript), a choice of IDE (e.g., IntelliJ vs. Eclipse IDE), a choice between libraries (e.g., SLF4J vs. java.util.logging), or a decision on features (e.g., infinite undo vs. limited undo). It should be as easy as possible to a) write down the decisions and b) version the decisions. “Markdown and Architectural Decision Records” are one way to fulfill these requirements. This talk will outline MADR, present its current use, and outline next research and development steps.

Dr. Oliver Kopp studied Software Engineering at the University of Stuttgart. Wanting to learn and research more about business processes and software architectures, he completed his PhD and worked as a postdoc at the University of Stuttgart, resulting in more than 100 publications and multiple open source projects such as the cloud application modeling tool Eclipse Winery and the literature management software JabRef. Since mid-2018, he has worked for Mercedes-Benz AG in the field of software research.

Important Dates

  • 14th August 2020: Submission deadline for all three tracks
  • 28th August 2020: Notification deadline
  • 4th September 2020: Camera-ready deadline
  • 29th September 2020: (Virtual) Workshop co-located with ICSME 2020


General and Tool Demo Chairs


Sebastian Baltes QAware GmbH, Germany; The University of Adelaide, Australia


Hideaki Hata Nara Institute of Science and Technology, Japan

Challenge Chairs


Nicole Novielli University of Bari, Italy


Neil Ernst University of Victoria, Canada

Exhibition Chairs


Raula G. Kula Nara Institute of Science and Technology, Japan


Michael W. Godfrey University of Waterloo, Canada

Steering Committee


Martin Robillard McGill University, Canada


Takashi Kobayashi Tokyo Institute of Technology, Japan

Publicity Chairs


Oscar Chaparro College of William and Mary, USA


David Shepherd ABB Corporate Research, USA

Program committee


DocGen2 will take place in Adelaide, Australia. More details will come soon.