The corpus contains about one billion words in more than one million web pages from 85,000 websites in four Portuguese-speaking countries (more information). You can download metadata (country, genre, URL, title, number of words, etc.) for all 1.1 million web pages (ZIP file: 115 MB). See also some good examples of using the corpus to look at differences between the dialects.

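"General" texts may also include blogs; "Blogs" texts are blogs only.

| Country | Code | General: words | General: web pages | General: websites | Blogs: words | Blogs: web pages | Blogs: websites | Total: words | Total: web pages | Total: websites |
|---|---|---|---|---|---|---|---|---|---|---|
| Brasil | BR | 347,834,509 | 291,592 | 25,535 | 365,213,219 | 327,093 | 35,345 | 713,047,728 | 618,685 | 60,880 |
| Portugal | PT | 171,208,029 | 186,772 | 12,127 | 214,924,924 | 225,129 | 9,022 | 386,132,953 | 411,901 | 21,149 |
| Angola | AO | 19,629,046 | 20,021 | 1,257 | 19,010,093 | 22,322 | 419 | 38,639,139 | 42,343 | 1,676 |
| Moçambique | MZ | 18,562,424 | 19,904 | 1,074 | 16,587,071 | 18,455 | 404 | 35,149,495 | 38,359 | 1,478 |
| TOTAL | | | | | | | | 1,172,969,315 | 1,111,288 | 85,183 |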
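
The totals in the table can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python: the figures are copied from the table, while the variable names (`general`, `blogs`, `totals`, `grand_total`, `word_share`) and the helper `add` are illustrative only and are not part of the corpus or its download files.

```python
# Per-country counts copied from the table: (words, web pages, websites).
general = {
    "BR": (347_834_509, 291_592, 25_535),
    "PT": (171_208_029, 186_772, 12_127),
    "AO": (19_629_046, 20_021, 1_257),
    "MZ": (18_562_424, 19_904, 1_074),
}
blogs = {
    "BR": (365_213_219, 327_093, 35_345),
    "PT": (214_924_924, 225_129, 9_022),
    "AO": (19_010_093, 22_322, 419),
    "MZ": (16_587_071, 18_455, 404),
}


def add(a, b):
    """Add two (words, pages, sites) tuples element by element."""
    return tuple(x + y for x, y in zip(a, b))


# Per-country totals: General + Blogs.
totals = {code: add(general[code], blogs[code]) for code in general}

# Grand total across the four countries.
grand_total = (0, 0, 0)
for row in totals.values():
    grand_total = add(grand_total, row)

# Each country's share of all words, as a percentage.
word_share = {
    code: round(100 * row[0] / grand_total[0], 1)
    for code, row in totals.items()
}

print(grand_total)  # (1172969315, 1111288, 85183)
print(word_share)
```

The grand total reproduces the TOTAL row, and `word_share` gives a quick view of how unevenly the four countries are represented, which is worth keeping in mind when comparing raw frequencies between dialects.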

Notes on duplicate texts.