LegalBench-BR is the first public benchmark for evaluating language models on Brazilian legal text classification. It comprises 3,105 appellate proceedings from the Santa Catarina State Court spanning five legal areas, on which BERTimbau-LoRA achieves 87.6% accuracy.
| Released by | Not yet disclosed |
|---|---|
| Release date | Not yet disclosed |
| What it is | First public benchmark for Brazilian legal text classification |
| Who it is for | Legal AI researchers and Portuguese NLP developers |
| Where to get it | Full dataset and model released publicly |
| Price | Free |
- LegalBench-BR contains 3,105 appellate proceedings from Santa Catarina State Court collected via DataJud API
- BERTimbau-LoRA achieves 87.6% accuracy with only 0.3% parameter updates, outperforming commercial LLMs by 22-28 percentage points
- GPT-4o mini and Claude 3.5 Haiku show systematic bias toward civil law classification
- Fine-tuned models eliminate commercial LLM failure modes in administrative law classification
- Complete dataset, model, and pipeline released for reproducible Portuguese legal NLP research
- Domain-adapted fine-tuning significantly outperforms general-purpose LLMs on Brazilian legal classification tasks
- Commercial LLMs exhibit systematic classification bias that fine-tuning eliminates
- LoRA fine-tuning provides efficient parameter updates with zero marginal inference cost
- Administrative law classification proves particularly challenging for general-purpose models
What is LegalBench-BR
LegalBench-BR is the first public benchmark specifically designed for evaluating language models on Brazilian legal text classification tasks. The benchmark comprises 3,105 appellate proceedings from the Santa Catarina State Court (TJSC) collected through the DataJud API from Brazil’s National Council of Justice (CNJ). The dataset covers five distinct legal areas and uses LLM-assisted labeling with heuristic validation for annotation quality.
What is new vs previous benchmarks
LegalBench-BR introduces the first Portuguese-language legal classification benchmark, addressing a gap in existing legal AI evaluation tools.
| Feature | LegalBench-BR | Previous Legal Benchmarks |
|---|---|---|
| Language | Portuguese (Brazilian) | Primarily English |
| Legal System | Brazilian civil law | Common law systems |
| Data Source | Santa Catarina State Court | Various international courts |
| Classification Areas | 5 Brazilian legal domains | General legal categories |
| Annotation Method | LLM-assisted with heuristic validation | Manual expert annotation |
How does LegalBench-BR work
LegalBench-BR operates through a systematic evaluation framework for Brazilian legal text classification.
- Data collection from Santa Catarina State Court via DataJud API provides 3,105 appellate proceedings
- LLM-assisted labeling with heuristic validation annotates texts across five legal areas
- Class-balanced test set ensures fair evaluation across all legal domains
- BERTimbau-LoRA fine-tuning updates only 0.3% of model parameters for efficient adaptation
- Evaluation metrics include accuracy and macro-F1 scores for comprehensive performance assessment
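The "0.3% of parameters" figure can be reproduced with simple arithmetic under assumed settings: BERT-base-scale dimensions for BERTimbau (hidden size 768, 12 layers, roughly 110M parameters) and a hypothetical rank-8 LoRA adapter on the query and value projections. The paper's exact configuration may differ; this is a sketch of the parameter budget, not the authors' setup.

```python
# Hypothetical LoRA parameter budget for a BERT-base-scale model.
# All dimensions below are assumptions, not the paper's reported config.
hidden, rank = 768, 8              # hidden size and assumed LoRA rank
layers, adapted_matrices = 12, 2   # e.g. query and value projections per layer
total_params = 110_000_000         # approximate BERT-base parameter count

# Each adapted weight W (hidden x hidden) gains two trainable factors:
# B (hidden x rank) and A (rank x hidden), while W itself stays frozen.
per_matrix = 2 * hidden * rank
trainable = layers * adapted_matrices * per_matrix

print(f"trainable: {trainable:,} ({trainable / total_params:.2%} of the model)")
```

Under these assumptions the trainable share lands at roughly 0.27%, which is consistent with the reported "only 0.3% parameter updates".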
Benchmarks and evidence
Performance evaluations demonstrate significant advantages of domain-adapted models over general-purpose LLMs on Brazilian legal classification.
| Model | Accuracy | Macro-F1 | Administrative Law F1 | Source |
|---|---|---|---|---|
| BERTimbau-LoRA | 87.6% | 0.87 | 0.91 | LegalBench-BR paper |
| Claude 3.5 Haiku | Not disclosed | 0.65 | 0.08 | LegalBench-BR paper |
| GPT-4o mini | Not disclosed | 0.59 | 0.00 | LegalBench-BR paper |
Who should care
Builders
Legal AI developers building Portuguese-language applications need domain-specific benchmarks for accurate model evaluation. LegalBench-BR provides the first standardized evaluation framework for Brazilian legal text classification, enabling developers to measure model performance against established baselines.
Enterprise
Law firms and legal technology companies operating in Brazil require specialized AI models for document classification and case management. The benchmark demonstrates that general-purpose LLMs cannot substitute for domain-adapted models in Brazilian legal contexts, informing technology investment decisions.
End users
Legal professionals and researchers working with Brazilian court documents benefit from improved AI classification accuracy. The benchmark’s findings show that fine-tuned models eliminate systematic biases present in commercial LLMs, leading to more reliable legal document processing.
Investors
Venture capital and legal tech investors can assess the competitive landscape for Portuguese legal AI solutions. The benchmark reveals significant performance gaps between general-purpose and specialized models, indicating market opportunities for domain-specific legal AI development.
How to use LegalBench-BR today
Researchers and developers can access LegalBench-BR through the publicly released dataset and evaluation pipeline.
- Download the complete dataset from the public repository containing 3,105 annotated legal proceedings
- Install the provided evaluation pipeline with preprocessing and classification scripts
- Load the BERTimbau-LoRA model weights for baseline comparison testing
- Run evaluation scripts on the class-balanced test set using accuracy and macro-F1 metrics
- Compare custom model performance against established BERTimbau-LoRA, Claude 3.5 Haiku, and GPT-4o mini baselines
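The comparison step above reduces to computing accuracy and macro-F1 over the test set. A minimal pure-Python sketch follows; the label names and toy predictions are illustrative, not the dataset's actual label set or the released pipeline's API:

```python
def macro_f1(y_true, y_pred, labels):
    """Per-class F1 averaged with equal weight for every class."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example with hypothetical area labels (not the benchmark's real classes):
areas = ["civil", "administrative", "criminal", "labor", "tax"]
y_true = ["civil", "administrative", "criminal", "labor", "tax", "civil"]
y_pred = ["civil", "civil", "criminal", "labor", "tax", "civil"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}  macro-F1={macro_f1(y_true, y_pred, areas):.2f}")
```

Note how the single "absorbed" administrative case drags macro-F1 (0.76) well below accuracy (0.83): macro-F1 penalizes exactly the civil-law-absorption bias the benchmark reports for commercial LLMs.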
LegalBench-BR vs competitors
LegalBench-BR addresses Portuguese legal classification while existing benchmarks focus on English-language legal tasks.
| Benchmark | Language | Task Type | Legal System | Dataset Size |
|---|---|---|---|---|
| LegalBench-BR | Portuguese | Classification | Brazilian civil law | 3,105 proceedings |
| LegalBench | English | Multi-task | US common law | Not disclosed |
| LexGLUE | English | Multi-task | EU/US law | Not disclosed |
| CaseHOLD | English | Classification | US common law | Not disclosed |
Risks, limits, and myths
- Dataset limited to Santa Catarina State Court proceedings may not generalize to other Brazilian jurisdictions
- Five-class classification scope excludes more granular legal subcategories and specialized domains
- LLM-assisted annotation with heuristic validation may introduce systematic labeling biases
- Performance metrics focus on classification accuracy without evaluating legal reasoning quality
- Benchmark does not address multilingual legal documents or cross-jurisdictional legal analysis
- Fine-tuning requirements may limit accessibility for researchers without computational resources
FAQ
What makes LegalBench-BR different from other legal AI benchmarks?
LegalBench-BR is the first public benchmark specifically designed for Brazilian legal text classification, using Portuguese-language appellate proceedings from Santa Catarina State Court across five legal areas.
How accurate is BERTimbau-LoRA compared to GPT-4o mini on Brazilian legal texts?
BERTimbau-LoRA achieves 87.6% accuracy and 0.87 macro-F1, outperforming GPT-4o mini by 28 percentage points while updating only 0.3% of model parameters.
Why do commercial LLMs perform poorly on administrative law classification?
GPT-4o mini scores F1 = 0.00 and Claude 3.5 Haiku scores F1 = 0.08 on administrative law because of a systematic bias toward civil law classification: the models absorb ambiguous cases into the civil law class rather than discriminating between classes.
What data source does LegalBench-BR use for legal proceedings?
LegalBench-BR uses 3,105 appellate proceedings from Santa Catarina State Court (TJSC) collected via the DataJud API from Brazil’s National Council of Justice (CNJ).
How does LoRA fine-tuning compare to full model training for legal classification?
LoRA fine-tuning updates only 0.3% of BERTimbau parameters while achieving 87.6% accuracy, providing efficient domain adaptation with zero marginal inference cost compared to full model retraining.
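The "zero marginal inference cost" claim follows from LoRA's structure: after training, the low-rank factors can be merged into the frozen weight once, so serving uses a single dense matmul just like the base model. A toy numeric check under made-up dimensions (not the paper's code):

```python
import numpy as np

# Toy illustration of merging a LoRA adapter into the base weight.
rng = np.random.default_rng(0)
d, r = 16, 2                             # toy hidden size and LoRA rank
W = rng.standard_normal((d, d))          # frozen pretrained weight
B = rng.standard_normal((d, r)) * 0.1    # trained up-projection factor
A = rng.standard_normal((r, d)) * 0.1    # trained down-projection factor
x = rng.standard_normal(d)               # an input activation

y_branch = W @ x + B @ (A @ x)           # training-time forward: extra branch
W_merged = W + B @ A                     # one-time merge for deployment
y_merged = W_merged @ x                  # deployment forward: no extra cost

assert np.allclose(y_branch, y_merged)
```

Because `W_merged` has the same shape as `W`, the merged model's latency and memory at inference are identical to the base model's.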
Can I use LegalBench-BR for legal systems outside Brazil?
LegalBench-BR is designed specifically for the Brazilian civil law system and the Portuguese language, limiting direct applicability to other legal systems without adaptation.
What annotation method does LegalBench-BR use for labeling legal texts?
LegalBench-BR employs LLM-assisted labeling with heuristic validation to annotate 3,105 legal proceedings across five legal areas for classification tasks.
Is the LegalBench-BR dataset available for commercial use?
The researchers release the full dataset, model, and pipeline publicly to enable reproducible research in Portuguese legal NLP, though specific licensing terms are not yet disclosed.
How many legal areas does LegalBench-BR cover for classification?
LegalBench-BR covers five distinct legal areas for classification tasks, including administrative law, civil law, and three additional legal domains from Brazilian jurisprudence.
What evaluation metrics does LegalBench-BR use for model comparison?
LegalBench-BR uses accuracy and macro-F1 scores on a class-balanced test set to evaluate model performance across five legal classification categories.
Glossary
- BERTimbau
- Portuguese-language BERT model specifically trained on Brazilian Portuguese texts for natural language processing tasks
- DataJud API
- Application programming interface provided by Brazil’s National Council of Justice (CNJ) for accessing court proceedings and legal documents
- LoRA (Low-Rank Adaptation)
- Parameter-efficient fine-tuning method that updates only a small percentage of model parameters while maintaining performance
- Macro-F1
- Evaluation metric that calculates F1 score for each class separately then averages them, giving equal weight to all classes regardless of frequency
- Santa Catarina State Court (TJSC)
- State-level judicial court in Santa Catarina, Brazil, providing appellate proceedings for the LegalBench-BR dataset
- Systematic bias
- Consistent tendency of a model to favor certain classifications over others, leading to predictable errors in specific categories
Sources
- LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification. arXiv:2604.18878v1.
- LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence. arXiv:2512.04578.
- Benchmarking Vietnamese Legal Knowledge of Large Language Models. arXiv:2512.14554v5.
- PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs. arXiv:2604.17543.
- Professional Reasoning Benchmark – Legal. Scale Labs.