The new billion-word corpus has been tagged for part of speech (e.g.
casa = noun, fazem = verb form) and it has been lemmatized (e.g.
faço, fizeram, and fizemos are all forms of the lemma
fazer). However, the tagging and lemmatization still contain errors.
If you are a native speaker of
Portuguese and can spend even 10-15 minutes a week to
help correct errors, we would really appreciate it. You can
also "earn credit" that will count
towards increased corpus access or corpus data.
More information and a tutorial are available.
Thanks for your help!