The new billion-word corpus has been tagged for part of speech (e.g., casa = noun, fazem = verb form) and lemmatized (e.g., faço, fizeram, and fizemos are all forms of the lemma fazer). However, the tagging and lemmatization still contain errors.

If you are a native speaker of Portuguese and can spare even 10-15 minutes a week to help correct these errors, we would really appreciate it. You can also "earn credit" that counts toward increased access to the corpus or to corpus data.

