Improving Neural Abstractive Summarization with Sequence-level Models

The rise of deep learning since the mid-2010s has enabled researchers to train end-to-end sequence models for abstractive summarization, the task of writing a summary of an input document from scratch, as opposed to extractive summarization, which only highlights subsets of the input. Such models are trained with maximum-likelihood estimation (MLE), minimizing the negative log-likelihood (NLL) loss at the token level, and typically against a single reference summary due to a lack of annotation. However, there can be many good summaries for a single document, as summarization is by nature diverse and subjective. Sequence-level methods take a radically different optimization direction from token-level MLE and instead position themselves as a second stage: assuming an existing baseline summarization model that produces summary candidates, a second-stage model is trained to process these candidates and form a new, better output.

In the thesis, Mathieu first proposes a model that learns to re-rank different summary candidates in a supervised manner, then develops an approach that performs re-ranking in the unsupervised setup, and finally proposes a new approach that fuses the summary candidates together into a new, abstractive second-stage summary.
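For concreteness, the token-level objective mentioned above is the standard NLL loss. In generic notation (not specific to any particular model in the thesis), given an input document x and a single reference summary y = (y_1, ..., y_T), a model with parameters \theta is trained to minimize

\mathcal{L}_{\mathrm{NLL}}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}(y_t \mid y_{<t}, x),

i.e., the log-probability of each reference token given the preceding reference tokens and the document, which is what makes the supervision purely token-level and tied to one reference.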
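To make the second-stage re-ranking idea concrete, here is a minimal sketch, not the thesis's actual model: a baseline system has already produced several candidate summaries, and a scoring function picks the best one. The word-overlap scorer below is a toy stand-in for a learned scoring model, and all names are illustrative assumptions.

from typing import Callable, List


def overlap_score(document: str, candidate: str) -> float:
    """Toy scorer: fraction of candidate words that also appear in the document."""
    doc_words = set(document.lower().split())
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    return sum(w in doc_words for w in cand_words) / len(cand_words)


def rerank(document: str,
           candidates: List[str],
           scorer: Callable[[str, str], float] = overlap_score) -> str:
    """Return the candidate summary with the highest second-stage score."""
    return max(candidates, key=lambda c: scorer(document, c))


if __name__ == "__main__":
    doc = "The city council approved the new public transport budget on Tuesday."
    candidates = [
        "Council approves transport budget.",
        "A budget was discussed by officials.",
        "The weather was sunny on Tuesday.",
    ]
    print(rerank(doc, candidates))

In the supervised setting described in the thesis, the scorer would instead be trained against reference summaries; in the unsupervised setting, it must rank candidates without such references; and the fusion approach goes further by generating a new summary from the candidates rather than selecting one.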