O Corpus do Português

Related resources

There are several resources that are based on the older version of the Corpus do Português (which was released in 2006), such as:

The older Corpus do Português was quite small, however (only 20 million words for the 1900s). As a result, many types of resources that we have created for English could not be created for Portuguese until a much larger corpus was available. With the new one-billion-word corpus, we can create many of these resources, including:

  • Full-text data, which means that you'd have nearly the entire one billion words of data on your machine

  • Updated data similar to the word frequency, collocates, and n-grams data (including the top 40,000 lemmas of Portuguese)

  • WordAndPhrase for Portuguese, which allows you to browse through the top 40,000 lemmas to see frequency information, definitions, collocates, concordances, and synonyms, all on one page. In addition, you can input your own texts and analyze them with the corpus data.