The following is a summary of the composition of the corpus. You can also download a file that lists all of the ~57,000 texts in the corpus.

The corpus is composed of more than 45 million words in nearly 57,000 texts. There are about 20 million words from the 1900s, 10 million from the 1800s, and 15 million from the 1200s-1700s. For the 1900s, there are roughly six million words each from fiction, from newspapers and magazines, and from academic texts, plus about two million from spoken language. For each of these four genres (and therefore overall), the texts from the 1900s are divided roughly evenly between Portugal and Brazil.

   # WORDS  CENTURY  COUNTRY            GENRE

Historical
   550,968  1200s    Portugal
 1,316,268  1300s    Portugal
 2,875,653  1400s    Portugal
 4,435,031  1500s    Portugal / Brazil
 3,407,741  1600s    Portugal / Brazil
 2,234,951  1700s    Portugal / Brazil
10,008,622  1800s    Portugal / Brazil

Modern Portuguese: Genres / Countries
 3,087,052  1900s    Portugal           Academic
 3,271,328  1900s    Portugal           Newspaper
 3,048,020  1900s    Portugal           Fiction
 1,100,303  1900s    Portugal           Spoken

 2,816,802  1900s    Brazil             Academic
 3,346,988  1900s    Brazil             Newspaper
 3,028,646  1900s    Brazil             Fiction
 1,078,586  1900s    Brazil             Spoken
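
The rounded totals in the summary can be checked against the table with a few lines of Python. This is only an illustrative sketch: the word counts are copied from the table, while the variable names (HISTORICAL, MODERN, by_genre, and so on) are invented for this example and are not part of the corpus or its interface.

```python
from collections import defaultdict

# Word counts copied from the table above. All names here are
# illustrative assumptions, not part of any corpus tool.
HISTORICAL = {
    "1200s": 550_968,
    "1300s": 1_316_268,
    "1400s": 2_875_653,
    "1500s": 4_435_031,
    "1600s": 3_407_741,
    "1700s": 2_234_951,
    "1800s": 10_008_622,
}

MODERN = {  # 1900s only, keyed by (country, genre)
    ("Portugal", "Academic"): 3_087_052,
    ("Portugal", "Newspaper"): 3_271_328,
    ("Portugal", "Fiction"): 3_048_020,
    ("Portugal", "Spoken"): 1_100_303,
    ("Brazil", "Academic"): 2_816_802,
    ("Brazil", "Newspaper"): 3_346_988,
    ("Brazil", "Fiction"): 3_028_646,
    ("Brazil", "Spoken"): 1_078_586,
}

# Everything before the 1800s, the 1900s, and the whole corpus.
early_total = sum(n for century, n in HISTORICAL.items() if century != "1800s")
modern_total = sum(MODERN.values())
corpus_total = sum(HISTORICAL.values()) + modern_total

# Break the 1900s down by genre and by country.
by_genre = defaultdict(int)
by_country = defaultdict(int)
for (country, genre), n in MODERN.items():
    by_genre[genre] += n
    by_country[country] += n

print(f"pre-1800s: {early_total:,}")
print(f"1800s:     {HISTORICAL['1800s']:,}")
print(f"1900s:     {modern_total:,}")
print(f"total:     {corpus_total:,}")
for genre, n in by_genre.items():
    print(f"1900s {genre}: {n:,}")
for country, n in by_country.items():
    print(f"1900s {country}: {n:,} ({n / modern_total:.1%})")
```

The exact sums differ slightly from the rounded figures in the prose: the 1900s genre totals are each close to, but not exactly, six million (two million for spoken), and the split between Portugal and Brazil is close to even rather than exact.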