The corpus contains about one billion words in more than one million web pages from about 85,000 websites in four Portuguese-speaking countries (more information). You can download metadata (country, genre, URL, title, number of words, etc.) for all of the roughly 1.1 million web pages (ZIP file: 115 MB). See also some good examples of using the corpus to look at differences between the dialects.

                  General (may also include blogs)     Blogs only                           Total
Country     Code          Words  Web pages  Web sites          Words  Web pages  Web sites          Words  Web pages  Web sites
Brazil      BR      319,435,592    286,712     25,351    336,244,918    321,305     35,248    655,680,510    608,017     60,599
Portugal    PT      136,144,529    184,512     12,082    190,503,822    221,338      9,005    326,648,351    405,850     21,087
Angola      AO       17,877,399     19,178      1,240     17,255,595     21,233        418     35,132,994     40,411      1,658
Mozambique  MZ       16,936,743     19,236      1,065     15,070,829     17,910        404     32,007,572     37,146      1,469
TOTAL               490,394,263    509,638     39,738    559,075,164    581,786     45,075  1,049,469,427  1,091,424     84,813
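
The word counts in the table can be cross-checked and turned into per-country shares with a few lines of Python. This is only a minimal sketch: the figures are copied by hand from the table, and the names `WORDS`, `country_totals`, and `shares` are illustrative and not part of any corpus tooling.

```python
# Word counts copied from the table: (general, blogs) per country code.
WORDS = {
    "BR": (319_435_592, 336_244_918),
    "PT": (136_144_529, 190_503_822),
    "AO": (17_877_399, 17_255_595),
    "MZ": (16_936_743, 15_070_829),
}


def country_totals(words):
    """Add the general and blog word counts for each country."""
    return {code: general + blogs for code, (general, blogs) in words.items()}


def shares(totals):
    """Return each country's percentage of all words, rounded to one decimal."""
    grand = sum(totals.values())
    return {code: round(100 * n / grand, 1) for code, n in totals.items()}


totals = country_totals(WORDS)
grand_total = sum(totals.values())

for code, pct in shares(totals).items():
    print(f"{code}: {totals[code]:>13,} words ({pct}%)")
print(f"TOTAL: {grand_total:,} words")
```

The per-country sums and the grand total computed this way agree with the Total columns of the table, which is a quick way to confirm that no figure was mistyped when copying.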

Notes on duplicate texts.