Several resources are based on the older
version of the Corpus do Português (which was released in 2006).
The older Corpus do Português was quite small,
however (only 20 million words for the 1900s). As a result, many
types of resources that we've created for English couldn't be created for
Portuguese until a much larger corpus was available. With the new
one-billion-word
corpus, we can create many of these resources, including:
- Full-text data, which means that you'd have nearly the entire one
billion words of data on your machine
- Updated data similar to the word frequency, collocates, and
n-grams data (including the top 40,000 lemmas of Portuguese)
- WordAndPhrase for Portuguese, which allows you to browse through the
top 40,000 lemmas to see frequency information, definitions, collocates,
concordances, and synonyms, all on one page. In addition, you can input your own texts and analyze them with the corpus data.