
This website allows you to quickly and easily search more than 45 million words in almost 57,000 Portuguese texts from the 1300s to the 1900s. The interface allows you to search for exact words or phrases, wildcards, lemmas, parts of speech, or any combination of these. You can also search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near cadeia, all adjectives near mulher, or all nouns near girar).
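
To make the idea of a collocate window concrete, here is a minimal Python sketch of how words near a search term could be counted. It is an illustration only and not the corpus's own implementation: the function name `collocates`, the sample sentence, and the choice to apply the window on each side of the search word are all assumptions.

```python
from collections import Counter

def collocates(tokens, node, window=10):
    """Count the words found within `window` tokens on either side of `node`.

    Hypothetical helper for illustration; the corpus interface may define
    its window differently.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)                  # left edge of the window
        hi = min(len(tokens), i + window + 1)    # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                           # skip the search word itself
                counts[tokens[j]] += 1
    return counts

# Made-up sample sentence, with a one-word window to keep the result small.
tokens = "a mulher bonita viu outra mulher alta".split()
near_mulher = collocates(tokens, "mulher", window=1)
```

A real query would also filter the counted words by part of speech (for example, keeping only adjectives), which this sketch leaves out.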

The corpus also allows you to easily compare the frequency and distribution of words, phrases, and grammatical constructions across texts, in at least three ways:

  • By register: compare spoken, fiction, newspaper, and academic texts

  • By dialect: compare European and Brazilian Portuguese

  • By historical period: compare different centuries from the 1300s to the 1900s
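
Comparisons like these are usually only meaningful when raw counts are scaled by the size of each section, since sections differ in length. The sketch below shows one common way to do that, a rate per million words. The function name `per_million`, the section names chosen, and all of the numbers are invented for illustration; they are not figures from the corpus.

```python
def per_million(hits, section_size):
    """Convert a raw hit count into a rate per million words."""
    return hits * 1_000_000 / section_size

# Invented (hits, section size in words) pairs, for illustration only.
sections = {
    "spoken": (30, 2_000_000),
    "fiction": (120, 6_000_000),
}

rates = {name: per_million(hits, size) for name, (hits, size) in sections.items()}
```

With these invented figures, the fiction section has four times as many raw hits as the spoken section, but its normalized rate is only a third higher, which is why normalized rates are the safer basis for comparison.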

You can also easily carry out semantically based queries of the corpus. For example, you can compare and contrast the collocates of two related words to determine the difference in meaning between them. You can find the frequency and distribution of synonyms for more than 20,000 words, compare their frequency in different registers, countries, and historical periods, and use these word lists as part of other queries. Finally, you can easily create your own lists of semantically related words and then use them directly as part of a query.

Please feel free to take a five-minute guided tour, which shows the major features of the corpus. A single click on each query automatically fills in the form for you, searches through the 45 million words of text, and then displays the results. This corpus of Portuguese is fast, free, and easy to use, and we believe that it offers important features not found in any other interface to a large corpus of Portuguese.
