Evaluate translation or summarization with ROUGE similarity score

The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm evaluates the similarity between a candidate document and a collection of reference documents. Use the ROUGE score to evaluate the quality of document translation and summarization models.

`score = rougeEvaluationScore(candidate,references)` returns the ROUGE score between the specified candidate document and the reference documents. The function, by default, computes unigram overlaps between `candidate` and `references`. This is also known as the ROUGE-N metric with n-gram length 1. For more information, see ROUGE Score.

`score = rougeEvaluationScore(candidate,references,Name,Value)` specifies additional options using one or more name-value pairs.

Specify the candidate document as a `tokenizedDocument` object.

```
str = "the fast brown fox jumped over the lazy dog";
candidate = tokenizedDocument(str)
```

candidate = tokenizedDocument: 9 tokens: the fast brown fox jumped over the lazy dog

Specify the reference documents as a `tokenizedDocument` array.

`str = ["the quick brown animal jumped over the lazy dog" "the quick brown fox jumped over the lazy dog"]; references = tokenizedDocument(str)`

references = 2x1 tokenizedDocument: 9 tokens: the quick brown animal jumped over the lazy dog 9 tokens: the quick brown fox jumped over the lazy dog

Calculate the ROUGE score between the candidate document and the reference documents.

`score = rougeEvaluationScore(candidate,references)`

score = 0.8889

Specify the candidate document as a `tokenizedDocument` object.

```
str = "a simple summary document containing some words";
candidate = tokenizedDocument(str)
```

candidate = tokenizedDocument: 7 tokens: a simple summary document containing some words

Specify the reference documents as a `tokenizedDocument` array.

`str = ["a simple document" "another document with some words"]; references = tokenizedDocument(str)`

references = 2x1 tokenizedDocument: 3 tokens: a simple document 5 tokens: another document with some words

Calculate the ROUGE score between the candidate document and the reference documents using the default options.

`score = rougeEvaluationScore(candidate,references)`

score = 1

The `rougeEvaluationScore` function, by default, compares unigram (single-token) overlaps between the candidate document and the reference documents. Because the ROUGE score is a recall-based measure, if one of the reference documents is made up entirely of unigrams that appear in the candidate document, the resulting ROUGE score is one. In this scenario, the output of the `rougeEvaluationScore` function is uninformative.

For a more meaningful result, calculate the ROUGE score again using bigrams by setting the `'NgramLength'` option to `2`. The resulting score is less than one, because every reference document contains bigrams that do not appear in the candidate document.

`score = rougeEvaluationScore(candidate,references,'NgramLength',2)`

score = 0.5000

`candidate` - Candidate document

`tokenizedDocument` scalar | string array | cell array of character vectors

Candidate document, specified as a `tokenizedDocument` scalar, a string array, or a cell array of character vectors. If `candidate` is not a `tokenizedDocument` scalar, then it must be a row vector representing a single document, where each element is a word.

`references` - Reference documents

`tokenizedDocument` array | string array | cell array of character vectors

Reference documents, specified as a `tokenizedDocument` array, a string array, or a cell array of character vectors. If `references` is not a `tokenizedDocument` array, then it must be a row vector representing a single document, where each element is a word. To evaluate against multiple reference documents, use a `tokenizedDocument` array.

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

```
scores = rougeEvaluationScore(candidate,references,'ROUGEMethod','weighted-subsequences')
```

This call specifies to use the weighted subsequences ROUGE method.

`ROUGEMethod` - ROUGE method

`'n-grams'` (default) | `'longest-common-subsequences'` | `'weighted-subsequences'` | `'skip-bigrams'` | `'skip-bigrams-and-unigrams'`

ROUGE method, specified as the comma-separated pair consisting of `'ROUGEMethod'` and one of the following:

- `'n-grams'` – Evaluate the ROUGE score using n-gram overlaps between the candidate document and the reference documents. This is also known as the ROUGE-N metric.
- `'longest-common-subsequences'` – Evaluate the ROUGE score using Longest Common Subsequence (LCS) statistics. This is also known as the ROUGE-L metric.
- `'weighted-subsequences'` – Evaluate the ROUGE score using weighted longest common subsequence statistics. This method favors consecutive LCSs. This is also known as the ROUGE-W metric.
- `'skip-bigrams'` – Evaluate the ROUGE score using skip-bigram (any pair of words in sentence order) co-occurrence statistics. This is also known as the ROUGE-S metric.
- `'skip-bigrams-and-unigrams'` – Evaluate the ROUGE score using skip-bigram and unigram co-occurrence statistics. This is also known as the ROUGE-SU metric.

`NgramLength` - N-gram length

1 (default) | positive integer

N-gram length used for the `'n-grams'` ROUGE method (ROUGE-N), specified as the comma-separated pair consisting of `'NgramLength'` and a positive integer.

If the `'ROUGEMethod'` option is not `'n-grams'`, then the `'NgramLength'` option has no effect.

**Tip**

If the longest document in `references` has fewer than `NgramLength` words, then the resulting ROUGE score is `NaN`. If `candidate` has fewer than `NgramLength` words, then the resulting ROUGE score is zero. To ensure that `rougeEvaluationScore` returns nonzero scores for very short documents, set `NgramLength` to a positive integer smaller than the length of `candidate` and the length of the longest document in `references`.
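The tip above amounts to capping the n-gram length by the document lengths. The following is a minimal sketch in base MATLAB, assuming the documents are held as plain cell arrays of words purely for illustration; the variable names (`requested`, `limit`, `ngramLength`) are hypothetical and not part of the toolbox. With `tokenizedDocument` inputs, the word counts would come from the documents themselves instead of `numel`.

```matlab
% Sketch: cap a requested n-gram length by the document lengths.
candidate  = {'a','short','summary'};                       % 3 words
references = {{'one','reference'}, ...
              {'another','longer','reference','document'}}; % 2 and 4 words
requested  = 4;                                             % desired n-gram length

longestRef  = max(cellfun(@numel, references));   % length of the longest reference
limit       = min(numel(candidate), longestRef) - 1;  % strictly smaller, per the tip
ngramLength = max(1, min(requested, limit));      % never go below unigrams
fprintf('Using NgramLength = %d\n', ngramLength);
```

The resulting value can then be passed as the `'NgramLength'` option.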

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`SkipDistance` - Skip distance

4 (default) | positive integer

Skip distance used for the `'skip-bigrams'` and `'skip-bigrams-and-unigrams'` ROUGE methods (ROUGE-S and ROUGE-SU), specified as the comma-separated pair consisting of `'SkipDistance'` and a positive integer.

If the `'ROUGEMethod'` option is not `'skip-bigrams'` or `'skip-bigrams-and-unigrams'`, then the `'SkipDistance'` option has no effect.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`score` - ROUGE score

scalar

ROUGE score, returned as a scalar value in the range [0,1] or `NaN`.

A ROUGE score close to zero indicates poor similarity between `candidate` and `references`. A ROUGE score close to one indicates strong similarity between `candidate` and `references`. If `candidate` is identical to one of the reference documents, then `score` is 1. If `candidate` and `references` are both empty documents, then the resulting ROUGE score is `NaN`.


The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm [1] calculates the similarity between a candidate document and a collection of reference documents. Use the ROUGE score to evaluate the quality of document translation and summarization models.

Given an n-gram length *n*, the ROUGE-N metric between a candidate
document and a *single* reference document is given by

$${\text{ROUGE-N}}_{\text{single}}\text{(candidate,reference})=\frac{{\displaystyle \sum _{{r}_{i}\in \text{reference}}{\displaystyle \sum _{\text{n-gram}\in {r}_{i}}\text{Count}(\text{n-gram,candidate})}}}{{\displaystyle \sum _{{r}_{i}\in \text{reference}}\text{numNgrams}({r}_{i})}},$$

where the elements $$r_i$$ are sentences in the reference document, $$\text{Count}(\text{n-gram},\text{candidate})$$ is the number of times the specified n-gram occurs in the candidate document, and $$\text{numNgrams}(r_i)$$ is the number of n-grams in the reference sentence $$r_i$$.

For sets of multiple reference documents, the ROUGE-N metric is given by

$$\text{ROUGE-N}(\text{candidate},\text{references})=\max_{k}\left\{{\text{ROUGE-N}}_{\text{single}}(\text{candidate},{\text{references}}_{k})\right\}.$$

To use the ROUGE-N metric, set the `'ROUGEMethod'` option to `'n-grams'`.
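The ROUGE-N definition can be checked by hand with base MATLAB. The following is a minimal sketch, not the toolbox implementation. It assumes single-sentence documents, whitespace tokenization, and clipped matching (each reference n-gram is matched at most as often as it occurs in the candidate, following [1]). The helper `ngrams` and the other variable names are illustrative only.

```matlab
% Sketch: ROUGE-N recall for single-sentence documents (illustrative only).
n = 2;                                             % n-gram length
candidate  = strsplit('a simple summary document containing some words');
references = {strsplit('a simple document'), ...
              strsplit('another document with some words')};

% Helper: join every run of n consecutive tokens into one n-gram string.
ngrams = @(t) arrayfun(@(i) strjoin(t(i:i+n-1), ' '), ...
    1:(numel(t)-n+1), 'UniformOutput', false);

candGrams = ngrams(candidate);
scores = zeros(1, numel(references));
for k = 1:numel(references)
    refGrams    = ngrams(references{k});
    uniqueGrams = unique(refGrams);
    matches = 0;
    for g = 1:numel(uniqueGrams)
        inRef   = sum(strcmp(refGrams,  uniqueGrams{g}));
        inCand  = sum(strcmp(candGrams, uniqueGrams{g}));
        matches = matches + min(inRef, inCand);    % clipped count (assumption)
    end
    scores(k) = matches / numel(refGrams);         % recall against reference k
end
score = max(scores);                               % best match over all references
fprintf('ROUGE-%d = %.4f\n', n, score);
```

For these documents the per-reference recalls are 1/2 and 1/4, so the sketch gives 0.5, which agrees with the bigram example earlier on this page.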

Given a sentence $$d=[{w}_{1},\dots ,{w}_{m}]$$ and a sentence *s*, where the elements $$s_i$$ correspond to words, the subsequence $$[{w}_{{i}_{1}},\dots ,{w}_{{i}_{k}}]$$ with $$i_1<\dots<i_k$$ is a *common subsequence* of *d* and *s* if the same words also occur in *s* in the same order. A common subsequence is a *longest common subsequence* (LCS) if its length *k* is maximal.

Given a candidate document and a single reference document, the *union* of the longest common subsequences is given by

$$\text{LCS}_{\cup}(\text{candidate},\text{reference})=\bigcup_{r_i\in \text{reference}}\left\{w \mid w\in \text{LCS}(\text{candidate},r_i)\right\},$$

where $$\text{LCS}(\text{candidate},{r}_{i})$$ is the set of longest common subsequences in the candidate document and the sentence $$r_i$$ from a reference document.

The ROUGE-L metric is an F-score measure. To calculate it, first calculate the recall and precision scores given by

$${R}_{\text{lcs}}(\text{candidate},\text{reference})=\frac{{\displaystyle \sum _{{r}_{i}\in \text{reference}}\left|{\text{LCS}}_{\cup}({\text{candidate,r}}_{i})\right|}}{\text{numWords}(\text{reference})}$$

$${P}_{\text{lcs}}(\text{candidate},\text{reference})=\frac{{\displaystyle \sum _{{r}_{i}\in \text{reference}}\left|{\text{LCS}}_{\cup}({\text{candidate,r}}_{i})\right|}}{\text{numWords}(\text{candidate})}.$$

Then, the ROUGE-L metric between a candidate document and a
*single* reference document is given by the F-score measure

$${\text{ROUGE-L}}_{\text{single}}(\text{candidate},\text{reference})=\frac{(1+{\beta}^{2}){R}_{\text{lcs}}(\text{candidate},\text{reference}){P}_{\text{lcs}}(\text{candidate},\text{reference})}{{R}_{\text{lcs}}(\text{candidate},\text{reference})+{\beta}^{2}{P}_{\text{lcs}}(\text{candidate},\text{reference})},$$

where the parameter $$\beta $$ controls the relative importance of the precision and recall. Because the ROUGE score favors recall, $$\beta $$ is typically set to a high value.

For sets of multiple reference documents, the ROUGE-L metric is given by

$$\text{ROUGE-L}(\text{candidate},\text{references})=\max_{k}\left\{{\text{ROUGE-L}}_{\text{single}}(\text{candidate},{\text{references}}_{k})\right\}.$$

To use the ROUGE-L metric, set the `'ROUGEMethod'` option to `'longest-common-subsequences'`.
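For documents that consist of a single sentence each, the union LCS reduces to an ordinary LCS, and the ROUGE-L formulas can be sketched in a few lines of base MATLAB. This is an illustration under stated assumptions rather than the toolbox implementation: the value `beta = 8` is hypothetical (the text above only says that the parameter is typically high), and tokenization is by whitespace.

```matlab
% Sketch: ROUGE-L for two single-sentence documents (illustrative only).
candidate = strsplit('a simple summary document containing some words');
reference = strsplit('a simple document');
beta = 8;   % assumed value; the source only says beta is typically high

% Dynamic-programming table: L(i+1,j+1) is the LCS length of the first
% i candidate words and the first j reference words.
m = numel(candidate);
n = numel(reference);
L = zeros(m+1, n+1);
for i = 1:m
    for j = 1:n
        if strcmp(candidate{i}, reference{j})
            L(i+1,j+1) = L(i,j) + 1;                    % extend the common subsequence
        else
            L(i+1,j+1) = max(L(i,j+1), L(i+1,j));       % carry over the best so far
        end
    end
end
lcsLength = L(m+1, n+1);

R = lcsLength / n;                                      % recall
P = lcsLength / m;                                      % precision
F = (1 + beta^2) * R * P / (R + beta^2 * P);            % F-score from the formula above
fprintf('LCS = %d, R = %.4f, P = %.4f, ROUGE-L = %.4f\n', lcsLength, R, P, F);
```

Here the LCS is "a simple document" (length 3), so recall is 1 and precision is 3/7. With a large `beta`, the F-score stays close to the recall, which is the recall-favoring behavior described above.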

Given a weighting function *f* such that *f* has the
property *f(x+y)>f(x)+f(y)* for any positive integers
*x* and *y*, define $$\text{WLCS}(\text{candidate},\text{reference})$$ to be the length of the longest consecutive matches encountered in the
candidate document and a single reference document scored by the weighting function
*f*. For more information about calculating this value, see [1].

The ROUGE-W metric is an F-score measure, which requires the recall and precision scores given by

$${R}_{\text{wlcs}}(\text{candidate},\text{reference})={f}^{-1}\left(\frac{\text{WLCS}(\text{candidate},\text{reference})}{f(\text{numWords}(\text{reference}))}\right)$$

$${P}_{\text{wlcs}}(\text{candidate},\text{reference})={f}^{-1}\left(\frac{\text{WLCS}(\text{candidate},\text{reference})}{f(\text{numWords}(\text{candidate}))}\right).$$

The ROUGE-W metric between a candidate document and a *single*
reference document is given by the F-score measure

$${\text{ROUGE-W}}_{\text{single}}(\text{candidate},\text{reference})=\frac{(1+{\beta}^{2}){R}_{\text{wlcs}}(\text{candidate},\text{reference}){P}_{\text{wlcs}}(\text{candidate},\text{reference})}{{R}_{\text{wlcs}}(\text{candidate},\text{reference})+{\beta}^{2}{P}_{\text{wlcs}}(\text{candidate},\text{reference})},$$

where the parameter $$\beta $$ controls the relative importance of the precision and recall. Because the ROUGE score favors recall, $$\beta $$ is typically set to a high value.

For multiple reference documents, the ROUGE-W metric is given by

$$\text{ROUGE-W}(\text{candidate},\text{references})=\max_{k}\left\{{\text{ROUGE-W}}_{\text{single}}(\text{candidate},{\text{references}}_{k})\right\}.$$

To use the ROUGE-W metric, set the `'ROUGEMethod'` option to `'weighted-subsequences'`.
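The effect of the weighting function is easiest to see on toy sequences. The sketch below implements the weighted-LCS dynamic program described in [1] in base MATLAB, assuming the weighting function f(k) = k^2 (which satisfies f(x+y) > f(x) + f(y)). The weighting that `rougeEvaluationScore` uses internally is not stated here, so treat both the choice of `f` and the variable names as assumptions.

```matlab
% Sketch: weighted LCS (WLCS) recall with the assumed weighting f(k) = k^2.
f    = @(k) k.^2;        % weighting function (assumption)
finv = @(x) sqrt(x);     % inverse of the weighting function
X  = {'A','B','C','D','E','F','G'};      % reference sentence
Y1 = {'A','B','C','D','H','I','K'};      % four consecutive matches
Y2 = {'A','H','B','K','C','I','D'};      % four scattered matches
candidates = {Y1, Y2};
wlcs   = zeros(1, numel(candidates));
recall = zeros(1, numel(candidates));
for t = 1:numel(candidates)
    Y = candidates{t};
    m = numel(X);
    n = numel(Y);
    c = zeros(m+1, n+1);   % c(i+1,j+1): WLCS score of the first i and j words
    w = zeros(m+1, n+1);   % w(i+1,j+1): length of the consecutive run ending at (i,j)
    for i = 1:m
        for j = 1:n
            if strcmp(X{i}, Y{j})
                k = w(i,j);                              % run length so far
                c(i+1,j+1) = c(i,j) + f(k+1) - f(k);     % reward extending the run
                w(i+1,j+1) = k + 1;
            else
                c(i+1,j+1) = max(c(i,j+1), c(i+1,j));    % no match: run resets to 0
            end
        end
    end
    wlcs(t)   = c(m+1, n+1);
    recall(t) = finv(wlcs(t) / f(m));                    % R_wlcs from the formula above
end
fprintf('WLCS recall: consecutive = %.3f, scattered = %.3f\n', recall(1), recall(2));
```

Both candidates share an ordinary LCS of length 4 with the reference, but the consecutive match scores 4/7 while the scattered match scores 2/7, which is how this method favors consecutive LCSs.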

A *skip-bigram* is an ordered pair of words in a sentence allowing for arbitrary gaps between them. That is, given a sentence $${c}_{i}=[{c}_{i1},\dots ,{c}_{im}]$$ from a candidate document, where the elements $$c_{ij}$$ correspond to the words in the sentence, the pair of words $$[{c}_{i{j}_{1}^{\prime}},{c}_{i{j}_{2}^{\prime}}]$$ is a skip-bigram if $${j}_{1}^{\prime}<{j}_{2}^{\prime}$$.

The ROUGE-S metric is an F-score measure. To calculate it, first calculate the recall and precision scores given by

$${R}_{\text{skip2}}(\text{candidate},\text{reference})=\frac{{\displaystyle \sum _{{r}_{i}\in \text{reference}}{\displaystyle \sum _{\text{skip-bigram}\in {r}_{i}}\text{Count}(\text{skip-bigram},\text{candidate})}}}{{\displaystyle \sum _{{r}_{i}\in \text{reference}}\text{numSkipBigrams}({r}_{i})}}$$

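$${P}_{\text{skip2}}(\text{candidate},\text{reference})=\frac{{\displaystyle \sum _{{r}_{i}\in \text{reference}}{\displaystyle \sum _{\text{skip-bigram}\in {r}_{i}}\text{Count}(\text{skip-bigram},\text{candidate})}}}{{\displaystyle \sum _{{c}_{i}\in \text{candidate}}\text{numSkipBigrams}({c}_{i})}},$$

where the elements $$r_i$$ and $$c_i$$ are sentences in the reference document and the candidate document, respectively, $$\text{Count}(\text{skip-bigram},\text{candidate})$$ is the number of times the specified skip-bigram occurs in the candidate document, and $$\text{numSkipBigrams}(s)$$ is the number of skip-bigrams in the sentence *s*.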

Then, the ROUGE-S metric between a candidate document and a
*single* reference document is given by the F-score measure

$${\text{ROUGE-S}}_{\text{single}}(\text{candidate},\text{reference})=\frac{(1+{\beta}^{2}){R}_{\text{skip2}}(\text{candidate},\text{reference}){P}_{\text{skip2}}(\text{candidate},\text{reference})}{{R}_{\text{skip2}}(\text{candidate},\text{reference})+{\beta}^{2}{P}_{\text{skip2}}(\text{candidate},\text{reference})}.$$

For sets of multiple reference documents, the ROUGE-S metric is given by

$$\text{ROUGE-S}(\text{candidate},\text{references})=\max_{k}\left\{{\text{ROUGE-S}}_{\text{single}}(\text{candidate},{\text{references}}_{k})\right\}.$$

To use the ROUGE-S metric, set the `'ROUGEMethod'` option to `'skip-bigrams'`.
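The skip-bigram recall and precision can be traced on a pair of short sentences. The following base MATLAB sketch is illustrative only. It assumes that the skip distance means "at most `maxGap` words between the two words of a pair", which may differ from how the `'SkipDistance'` option is interpreted by the toolbox. It also uses clipped counts and `beta = 1` for readability. The names `maxGap`, `refPairs`, and `candPairs` are hypothetical.

```matlab
% Sketch: ROUGE-S for single-sentence documents (illustrative only).
reference = strsplit('police killed the gunman');
candidate = strsplit('the gunman police killed');
maxGap = 4;    % assumed meaning: at most maxGap words between the pair
beta   = 1;    % equal weight on recall and precision, for readability

% Enumerate the skip-bigrams of each sentence as 'word1 word2' strings.
docs = {reference, candidate};
bigrams = cell(1, 2);
for d = 1:2
    t = docs{d};
    pairs = {};
    for i = 1:numel(t)-1
        for j = i+1:min(numel(t), i+maxGap+1)
            pairs{end+1} = [t{i} ' ' t{j}];
        end
    end
    bigrams{d} = pairs;
end
refPairs  = bigrams{1};
candPairs = bigrams{2};

% Count reference skip-bigrams that also occur in the candidate (clipped).
uniquePairs = unique(refPairs);
matches = 0;
for g = 1:numel(uniquePairs)
    matches = matches + min(sum(strcmp(refPairs,  uniquePairs{g})), ...
                            sum(strcmp(candPairs, uniquePairs{g})));
end
R = matches / numel(refPairs);                   % recall
P = matches / numel(candPairs);                  % precision
F = (1 + beta^2) * R * P / (R + beta^2 * P);     % F-score from the formula above
fprintf('matches = %d, ROUGE-S = %.4f\n', matches, F);
```

Each four-word sentence has six skip-bigrams, and the two sentences share "police killed" and "the gunman", so recall, precision, and the F-score are all 1/3.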

To also include unigram co-occurrence statistics in the ROUGE-S metric, introduce unigram counts into the recall and precision scores for ROUGE-S. This is equivalent to including start tokens in the candidate and reference documents, since

$$\sum _{\text{skip-bigram}\in {r}_{i}}\text{Count}(\text{skip-bigram},\text{candidate})+\sum _{\text{unigram}\in {r}_{i}}\text{Count}(\text{unigram},\text{candidate})=\sum _{\text{skip-bigram}\in {r}_{i}^{+}}\text{Count}(\text{skip-bigram},{\text{candidate}}^{+}),$$

where *Count(unigram,candidate)* is the number of
times the specified unigram appears in the candidate document, and $${r}_{i}^{+}$$ and $${\text{candidate}}^{+}$$ denote the reference sentence and the candidate document augmented with
start tokens, respectively.

For sets of multiple reference documents, the ROUGE-SU metric is given by

$$\text{ROUGE-SU}(\text{candidate},\text{references})=\max_{k}\left\{{\text{ROUGE-S}}_{\text{single}}({\text{candidate}}^{+},{\text{references}}_{k}^{+})\right\},$$

where $${\text{reference}}^{+}$$ is the reference document with sentences augmented with start tokens.

To use the ROUGE-SU metric, set the `'ROUGEMethod'` option to `'skip-bigrams-and-unigrams'`.
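The start-token identity can be verified numerically for a single reference sentence. The sketch below uses base MATLAB, an unlimited skip distance, and clipped counts. The start token `'<s>'` and the helpers `allPairs` and `clipped` are hypothetical names introduced for this illustration, not toolbox symbols.

```matlab
% Sketch: checking the start-token identity for one reference sentence,
% with unlimited skip distance (illustrative only).
reference = strsplit('police killed the gunman');
candidate = strsplit('the gunman police killed');
START = '<s>';   % hypothetical start token, not a toolbox symbol

% All ordered word pairs (skip-bigrams) of a token list, as 'w1|w2' strings.
joinPairs = @(t, idx) strcat(t(idx(:,1)), '|', t(idx(:,2)));
allPairs  = @(t) joinPairs(t, nchoosek(1:numel(t), 2));

% Clipped number of items of a that also occur in b.
clipped = @(a, b) sum(cellfun(@(g) min(sum(strcmp(a, g)), sum(strcmp(b, g))), ...
    unique(a)));

% Left side: skip-bigram matches plus unigram matches.
lhs = clipped(allPairs(reference), allPairs(candidate)) ...
    + clipped(reference, candidate);

% Right side: skip-bigram matches after prepending the start token.
refPlus  = [{START} reference];
candPlus = [{START} candidate];
rhs = clipped(allPairs(refPlus), allPairs(candPlus));

recallSU = rhs / numel(allPairs(refPlus));   % ROUGE-SU recall for this pair
fprintf('lhs = %d, rhs = %d, recall = %.2f\n', lhs, rhs, recallSU);
```

Both sides count six matches (two skip-bigrams plus four unigrams), and the augmented reference has ten skip-bigrams, so the recall term is 0.6.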

[1] Lin, Chin-Yew. "ROUGE: A Package for Automatic Evaluation of Summaries." In *Text Summarization Branches Out*, pp. 74-81. 2004.

`tokenizedDocument` | `bleuEvaluationScore` | `bm25Similarity` | `cosineSimilarity` | `textrankScores` | `lexrankScores` | `mmrScores` | `extractSummary`
