o corpus do português


Several resources are based on the older version of the Corpus do Português (released in 2006).

The older Corpus do Português was quite small, however (only 20 million words for the 1900s). As a result, many types of resources that we've created for English couldn't be created for Portuguese until a much larger corpus was available. With the new one billion word corpus, we can create many of these resources, including:

  • Full-text data, which means that you'd have nearly the entire one billion words of data on your machine

  • Updated data similar to the word frequency, collocates, and n-grams data (including the top 40,000 lemmas of Portuguese)

  • WordAndPhrase for Portuguese, which lets you browse the top 40,000 lemmas and see frequency information, definitions, collocates, concordances, and synonyms, all on one page. You can also input your own texts and analyze them with the corpus data.
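
As a rough illustration of what word frequency and n-grams data contain, here is a minimal Python sketch that counts words and two-word sequences in a short text. It is only a sketch under stated assumptions: the sample sentence is made up, the `tokenize` and `ngrams` helpers are hypothetical names introduced for this example, and nothing here reflects the actual file format or processing used for the Corpus do Português data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (accented letters included)."""
    # [^\W\d_] matches any Unicode letter: a word character that is not a digit or underscore.
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens, n):
    """Return every sequence of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample text for illustration; not taken from the corpus.
sample = "O gato viu o rato. O rato viu o gato e o gato fugiu."

tokens = tokenize(sample)

# Word frequency: how often each word form occurs.
word_freq = Counter(tokens)

# Bigram frequency: how often each pair of adjacent words occurs.
# Simplification: pairs are allowed to span sentence boundaries.
bigram_freq = Counter(ngrams(tokens, 2))

print(word_freq.most_common(3))
print(bigram_freq.most_common(3))
```

Real frequency lists are usually built over lemmas rather than raw word forms, which requires a lemmatizer or tagged text; this sketch counts surface forms only.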