| 
 
			 The Corpus do Português that was
			released in 2016 (Web /
			Dialects: CdP:New) contains about one billion words of data, which
			is about 50 times as much data as in the 1900s portion of the
			previous Corpus do Português (Historical / Genres: CdP:Old). As a result, it provides much richer data on a
			wide range of phenomena. The following are just a few examples. 
Lexical 
There are 282 verbs with a lemma frequency of between 300 and 600 in CdP:New
that are also found in at least two of the three online dictionaries that we
have used to correct the lemma lists. The following table shows how many times
these same verbs appear in CdP:Old. Of the 282 verbs in CdP:New, about 42% have
ten or fewer tokens in CdP:Old, which is too few to say anything meaningful
about them. Only 33 of the 282 (about 12%) have 50 tokens or more.
 
				
					| Frequency in CdP:Old (300-600 in CdP:New) | # verbs | % verbs | Examples |  
					| 50 tokens or more | 33 | 12% | assoar, arremeter, crepitar |  
					| 26-49 tokens | 45 | 16% | coalhar, arreganhar, fender |  
					| 11-25 tokens | 87 | 31% | emparelhar, arrear, reincidir |  
					| 1-10 tokens | 106 | 38% | aplainar, encerar, solapar |  
					| 0 tokens | 12 | 4% | eletrizar, afobar, conflitar |  
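
The percentages in the table can be rechecked with a few lines of arithmetic. The following is a minimal sketch, not part of the corpus tooling: the counts are copied from the table, and the names `bands`, `TOTAL_VERBS`, and `share` are hypothetical, chosen only for illustration. Note that the band counts as printed add up to 283 rather than 282; the sketch divides by the stated total of 282, which reproduces the percentages shown.

```python
# Number of verbs in each CdP:Old frequency band, copied from the table above.
bands = {
    "50 tokens or more": 33,
    "26-49 tokens": 45,
    "11-25 tokens": 87,
    "1-10 tokens": 106,
    "0 tokens": 12,
}

TOTAL_VERBS = 282  # total stated in the text (assumed to be the denominator)


def share(count, total=TOTAL_VERBS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


for band, count in bands.items():
    print(f"{band}: {count} verbs ({share(count)}%)")

# Verbs with ten or fewer tokens in CdP:Old: the two lowest bands combined.
ten_or_fewer = bands["1-10 tokens"] + bands["0 tokens"]
print(f"ten or fewer tokens: {ten_or_fewer} verbs ({share(ten_or_fewer)}%)")
```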
Semantic 
Without enough tokens of a given word, it is impossible to look at collocates
("nearby words") and say much about the meaning and usage of that word. To
illustrate, we have chosen (almost at random) a verb, noun, adjective, and
adverb from CdP:New, and counted how many different collocates occur with each
word in CdP:New and CdP:Old. A collocate is counted if it occurs at least three
times as a lemma within four words to the left or four words to the right of
the node word. (You might need to manually reset the SEC 1 value to just the
1900s for CdP:Old to get the correct type count.) As the table shows, CdP:New
provides much better data for examining the meaning and usage of words.
 
				
					| lemma (PoS node:collocate) | CdP:New | CdP:Old |  
					| frigir (VERB : NOUN) | 540 | 1 |  
					| faceta (NOUN : NOUN) | 434 | 2 |  
					| interpessoal (ADJ : NOUN) | 453 | 3 |  
					| inconscientemente (ADV : VERB) | 404 | 7 |  
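
The counting criterion described above (distinct collocate lemmas occurring at least three times within four words on either side of the node word) can be sketched as follows. This is an illustrative approximation only, not how the corpus interface computes its figures: the function `collocate_types` and the toy lemma list are hypothetical, the sketch ignores the part-of-speech restriction on collocates (for example VERB : NOUN), and it assumes that repeated occurrences of the node word itself are not counted as collocates.

```python
from collections import Counter


def collocate_types(lemmas, node, span=4, min_freq=3):
    """Return collocate lemmas of `node` that occur at least `min_freq` times
    within `span` words to the left or right of any occurrence of `node`."""
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != node:
            continue
        left = lemmas[max(0, i - span):i]
        right = lemmas[i + 1:i + 1 + span]
        # Assumption: the node word itself is not treated as its own collocate.
        counts.update(w for w in left + right if w != node)
    return {w: n for w, n in counts.items() if n >= min_freq}


# Toy, already-lemmatized text (hypothetical data, not drawn from the corpus).
toy = "frigir o ovo ela frigir o peixe ele frigir o ovo em azeite".split()

result = collocate_types(toy, "frigir")
print(len(result), result)
```

With only a handful of tokens for the node word, very few collocates reach the threshold of three occurrences, which is the effect the CdP:Old column illustrates.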
Syntactic 
Because CdP:New is about 50 times as large as the 1900s portion of CdP:Old,
it provides many more tokens for lower-frequency syntactic constructions. The
following table shows the number of tokens in the two corpora for several
constructions. (You might need to manually reset the SEC 1 value to
just the 1900s for CdP:Old to get the correct token count.) 
 
				
					| CdP:New | CdP:Old | search string | explanation | example(s) |  
					| 805 | 3 | parecem\|pareciam que [v*3p*] | "Split subject raising" (see #59 and #60) | parecem que querem causar um conflito |  
					| 354 | 9 | os\|as [fazer] [v*] o\|os\|as\|um\|uma | Accusative case for 3PL agent in causative construction (see #67, #68, and #71) | não as faz perder o entusiasmo |  
					| 481 | 21 | sem lhes [v*] | Pre-verbal clitic (see #62); here just with sem and lhes | sem lhes dar tempo de refletirem |  
					| 7151 | 175 | estava* sendo [vps*] | Progressive + passive (just with estava / estavam) | o Bitcoin estava sendo usado por criminosos |  |
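
To relate the token counts above to the approximate 50-fold difference in corpus size, one can compute the ratio of CdP:New tokens to CdP:Old tokens for each search string. The following is a minimal sketch: the counts are copied from the table, while `SIZE_RATIO`, `token_counts`, and `token_ratio` are hypothetical names used only for illustration, and the value 50 is the rounded figure given in the text.

```python
SIZE_RATIO = 50  # CdP:New is "about 50 times" the 1900s portion of CdP:Old

# search string -> (tokens in CdP:New, tokens in CdP:Old), from the table above
token_counts = {
    "parecem|pareciam que [v*3p*]": (805, 3),
    "os|as [fazer] [v*] o|os|as|um|uma": (354, 9),
    "sem lhes [v*]": (481, 21),
    "estava* sendo [vps*]": (7151, 175),
}


def token_ratio(new, old):
    """Return how many times more tokens CdP:New has than CdP:Old."""
    return new / old


for query, (new, old) in token_counts.items():
    ratio = token_ratio(new, old)
    print(f"{query}: {ratio:.1f}x the tokens (corpus size ratio: about {SIZE_RATIO}x)")
```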