LegalBench-BR is the first public benchmark for evaluating large language models on Brazilian legal text classification, comprising 3,105 appellate proceedings from Santa Catarina State Court across five legal areas.
| Released by | Not yet disclosed |
|---|---|
| Release date | |
| What it is | First public benchmark for evaluating LLMs on Brazilian legal text classification |
| Who it’s for | AI researchers and legal technology developers |
| Where to get it | Full dataset and model released publicly |
| Price | Free |
- LegalBench-BR contains 3,105 Brazilian appellate proceedings classified across five legal areas
- BERTimbau-LoRA achieves 87.6% accuracy and 0.87 macro-F1, outperforming GPT-4o mini's 0.59 macro-F1 by 28 points while updating only 0.3% of model parameters
- Commercial LLMs show systematic bias toward civil law classification, with administrative law proving particularly challenging for general-purpose models
- Domain-adapted fine-tuning eliminates the classification failures that plague general-purpose LLMs
- The publicly released dataset and model enable reproducible research in Portuguese legal NLP
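The "0.3% of parameters" figure is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes a BERT-base-sized model (BERTimbau base: 12 layers, hidden size 768, roughly 110M parameters) and a rank-8 LoRA adapter on the query and value projections only; the paper's exact LoRA configuration is not stated here, so these settings are illustrative assumptions.

```python
# Back-of-envelope check of the "0.3% of parameters" claim.
# Assumptions (not from the source): BERT-base dimensions, rank-8
# adapters on the query and value projection matrices only.

layers = 12
hidden = 768
rank = 8
adapted_matrices = 2          # query and value projections per layer
total_params = 110_000_000    # approximate BERT-base parameter count

# Each adapted d x d weight gains two low-rank factors:
# A with shape (rank, d) and B with shape (d, rank).
lora_params = layers * adapted_matrices * (2 * hidden * rank)
fraction = lora_params / total_params

print(f"LoRA parameters: {lora_params:,}")   # 294,912
print(f"Fraction of model: {fraction:.2%}")  # 0.27%
```

Under these assumptions the trainable fraction lands at roughly 0.27%, consistent with the paper's reported ~0.3%.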
What is LegalBench-BR
LegalBench-BR is a benchmark dataset for evaluating large language models on Brazilian legal text classification tasks. The dataset comprises 3,105 appellate proceedings from the Santa Catarina State Court collected via the DataJud API. Legal documents are annotated across five legal areas through LLM-assisted labeling with heuristic validation.
The benchmark addresses a gap in evaluation tools for Portuguese legal natural language processing. Its five classification categories cover major areas of Brazilian law, enabling systematic evaluation of model performance on domain-specific legal text understanding.
What is new vs previous benchmarks
LegalBench-BR introduces the first public benchmark specifically designed for Brazilian legal text classification.
| Feature | LegalBench-BR | Previous Legal Benchmarks |
|---|---|---|
| Language focus | Brazilian Portuguese | Primarily English |
| Legal system | Brazilian civil law | Common law systems |
| Data source | Santa Catarina State Court | Various international courts |
| Classification areas | 5 Brazilian legal domains | General legal categories |
| Annotation method | LLM-assisted with heuristic validation | Manual annotation |
How does LegalBench-BR work
LegalBench-BR evaluates models through a structured classification pipeline across five legal domains.
- Legal proceedings are collected from Santa Catarina State Court via DataJud API
- Documents undergo LLM-assisted annotation with heuristic validation for quality control
- Text is classified across five legal areas: administrative, civil, criminal, tax, and labor law
- Models are evaluated on a class-balanced test set using accuracy and macro-F1 metrics
- Performance comparison reveals domain adaptation effectiveness versus general-purpose models
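The two reported metrics can be sketched in a few lines of pure Python; the real evaluation presumably uses a standard library such as scikit-learn, and the label names and toy data below are illustrative.

```python
# Minimal accuracy and macro-F1, the two metrics LegalBench-BR reports.
# Macro-F1 averages per-class F1 with equal weight, so a class the
# model never predicts correctly drags the score down hard.

LABELS = ["administrative", "civil", "criminal", "tax", "labor"]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=LABELS):
    scores = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)  # equal weight per class

# Toy example mimicking the civil-law bias described in the paper:
# the model over-predicts "civil" on ambiguous documents.
y_true = ["civil", "civil", "tax", "labor", "criminal", "administrative"]
y_pred = ["civil", "civil", "civil", "labor", "criminal", "civil"]
print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.533
```

Note how the toy classifier keeps a reasonable accuracy while macro-F1 collapses, because the administrative and tax classes each score an F1 of zero.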
Benchmarks and evidence
BERTimbau-LoRA demonstrates superior performance compared to commercial large language models on Brazilian legal classification.
| Model | Accuracy | Macro-F1 | Parameters Updated | Source |
|---|---|---|---|---|
| BERTimbau-LoRA | 87.6% | 0.87 | 0.3% | LegalBench-BR paper |
| Claude 3.5 Haiku | Not disclosed | 0.65 | N/A | LegalBench-BR paper |
| GPT-4o mini | Not disclosed | 0.59 | N/A | LegalBench-BR paper |
Administrative law classification reveals the largest performance gap between models. GPT-4o mini scores an F1 of 0.00 on administrative law, while BERTimbau-LoRA reaches 0.91. Commercial models exhibit a systematic bias toward civil law, absorbing ambiguous cases into that class rather than discriminating among categories.
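Because macro-F1 weights every class equally, a single failed class caps the overall score. The arithmetic below illustrates this with GPT-4o mini's reported administrative-law F1 of 0.00; the other per-class scores are invented for illustration, since only the administrative figure and the 0.59 aggregate are reported here.

```python
# How one failed class drags down macro-F1. Only the administrative
# score (0.00) is from the paper; the rest are illustrative numbers
# chosen to reproduce the reported 0.59 aggregate.

per_class_f1 = {
    "administrative": 0.00,  # reported for GPT-4o mini
    "civil": 0.75,           # illustrative
    "criminal": 0.70,        # illustrative
    "tax": 0.75,             # illustrative
    "labor": 0.75,           # illustrative
}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro-F1: {macro_f1:.2f}")  # 0.59
```

Even if the four remaining classes were classified perfectly, an administrative F1 of 0.00 would bound macro-F1 at 0.80.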
Who should care
Builders
AI developers building legal technology for Brazilian markets need domain-adapted models for accurate classification. LegalBench-BR provides an evaluation framework for Portuguese legal NLP applications and enables systematic comparison of model architectures on Brazilian legal text.
Enterprise
Law firms and legal technology companies require reliable classification systems for Brazilian legal documents. The benchmark demonstrates that general-purpose LLMs cannot substitute for domain-adapted models on legal classification tasks. Fine-tuning approaches offer superior performance at zero marginal inference cost.
End users
Legal professionals working with Brazilian court documents benefit from improved automated classification systems. The benchmark enables development of more accurate legal document processing tools. Users gain access to better legal technology through domain-specific model evaluation.
Investors
Legal technology investment decisions require understanding of model performance on domain-specific tasks. LegalBench-BR provides evidence that specialized fine-tuning outperforms general-purpose models significantly. The benchmark supports investment thesis for domain-adapted legal AI solutions.
How to use LegalBench-BR today
Researchers and developers can access LegalBench-BR through the publicly released dataset and model pipeline.
- Download the full dataset from the public repository containing 3,105 annotated legal proceedings
- Access the BERTimbau-LoRA model weights and training configuration files
- Install the evaluation pipeline using the provided Python scripts and dependencies
- Run benchmark evaluation on your models using the class-balanced test set
- Compare results against baseline performance metrics for accuracy and macro-F1 scores
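Step 4 above, scoring a model against the test set, can be sketched as follows. The file format, column names, and sample rows are assumptions for illustration; consult the released repository for the actual data layout, and replace the stub classifier with real model inference.

```python
import csv
import io

# Hypothetical sketch of benchmark evaluation. The CSV layout below
# (text,label columns) is an assumption, not the documented release
# format; the sample rows and the stub classifier are placeholders.

sample_csv = """text,label
Recurso sobre licitação pública,administrative
Ação de cobrança de dívida,civil
"""

def my_model(text):
    # Stand-in for your classifier; swap in real inference here.
    return "civil"

rows = list(csv.DictReader(io.StringIO(sample_csv)))
correct = sum(my_model(r["text"]) == r["label"] for r in rows)
print(f"accuracy: {correct / len(rows):.2f}")  # 0.50
```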
LegalBench-BR vs competitors
LegalBench-BR addresses Portuguese legal text classification while existing benchmarks focus on English legal tasks.
| Benchmark | Language | Legal System | Task Type | Dataset Size |
|---|---|---|---|---|
| LegalBench-BR | Portuguese | Brazilian civil law | Classification | 3,105 proceedings |
| LegalBench | English | US common law | Multi-task | Various sizes |
| LexGLUE | English | EU/US law | Multi-task | Various sizes |
Risks, limits, and myths
- The dataset is limited to Santa Catarina State Court proceedings and may not represent courts elsewhere in Brazil
- Five-class classification simplifies complex legal categorization that occurs in practice
- LLM-assisted annotation with heuristic validation may introduce systematic labeling biases
- Performance metrics focus on classification accuracy rather than legal reasoning quality
- Benchmark does not evaluate model performance on legal document generation or analysis
- Results may not generalize to other Portuguese-speaking legal systems outside Brazil
FAQ
What is LegalBench-BR and how does it work?
LegalBench-BR is the first public benchmark for evaluating large language models on Brazilian legal text classification, containing 3,105 court proceedings classified across five legal areas.
How does BERTimbau-LoRA compare to GPT-4o mini on Brazilian legal tasks?
BERTimbau-LoRA achieves 87.6% accuracy and 0.87 macro-F1, a 28-point macro-F1 lead over GPT-4o mini's 0.59 on Brazilian legal classification tasks.
Why do commercial LLMs perform poorly on Brazilian legal classification?
Commercial LLMs exhibit systematic bias toward civil law classification and fail to discriminate between legal categories, particularly struggling with administrative law classification.
What legal areas does LegalBench-BR cover?
LegalBench-BR covers five legal areas: administrative law, civil law, criminal law, tax law, and labor law from Brazilian court proceedings.
How much data does LegalBench-BR contain?
LegalBench-BR contains 3,105 appellate proceedings from the Santa Catarina State Court collected via the DataJud API with LLM-assisted annotation.
Can I access LegalBench-BR dataset and models?
Yes, the full dataset, BERTimbau-LoRA model, and evaluation pipeline are released publicly to enable reproducible research in Portuguese legal NLP.
What makes LoRA fine-tuning effective for legal classification?
LoRA fine-tuning updates only 0.3% of model parameters while achieving superior performance and eliminating classification failures at zero marginal inference cost.
How does LegalBench-BR compare to other legal AI benchmarks?
LegalBench-BR is the first benchmark specifically designed for Brazilian Portuguese legal text, while existing benchmarks like LegalBench and LexGLUE focus on English legal tasks.
What evaluation metrics does LegalBench-BR use?
LegalBench-BR evaluates models using accuracy and macro-F1 scores on a class-balanced test set across five Brazilian legal classification categories.
Who should use LegalBench-BR for AI development?
AI researchers, legal technology developers, and companies building Portuguese legal NLP applications should use LegalBench-BR for systematic model evaluation and comparison.
Glossary
- BERTimbau-LoRA
- Portuguese BERT model fine-tuned using Low-Rank Adaptation technique for Brazilian legal text classification
- DataJud API
- Brazilian National Council of Justice API for accessing court proceeding data from state courts
- Macro-F1
- Evaluation metric calculating F1 score for each class separately then averaging, giving equal weight to all classes
- LoRA (Low-Rank Adaptation)
- Parameter-efficient fine-tuning technique that updates only a small percentage of model parameters while maintaining performance
- Santa Catarina State Court (TJSC)
- Brazilian state court system serving Santa Catarina state, source of legal proceedings in LegalBench-BR dataset
- Heuristic validation
- Rule-based quality control method used to verify accuracy of LLM-assisted annotation in dataset creation
Sources
- LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification. arXiv:2604.18878v1.