A new agentic framework leverages Large Language Models (LLMs) to convert floor plan images into structured, retrievable knowledge bases. This system generates safe, accessible indoor navigation instructions for blind and low-vision individuals, reducing reliance on costly infrastructure. It employs a multi-agent module for parsing and a Path Planner with a Safety Evaluator for instruction generation.
| Attribute | Detail |
|---|---|
| Released by | arXiv cs.MA |
| Release date | Not yet disclosed. |
| What it is | An agentic framework for generating accessible indoor navigation instructions from floor plans using LLMs. |
| Who it is for | Blind and low-vision individuals seeking accessible indoor navigation solutions. |
| Where to get it | arXiv (arxiv.org/abs/2604.23970) |
| Price | Not yet disclosed. |
- The framework converts floor plan images into structured knowledge bases.
- It generates accessible indoor navigation instructions for blind and low-vision people.
- A multi-agent module parses floor plans into a spatial knowledge graph.
- A Path Planner generates instructions, with a Safety Evaluator assessing hazards.
- The system outperforms single-call LLM baselines on real-world building data.
- What is LLM-Guided Agentic Floor Plan Parsing?
- What is new vs the previous version?
- How does LLM-Guided Agentic Floor Plan Parsing work?
- Benchmarks and evidence
- Who should care
- How to use LLM-Guided Agentic Floor Plan Parsing today
- LLM-Guided Agentic Floor Plan Parsing vs competitors
- Risks, limits, and myths
- FAQ
- Glossary
- Next step
- Sources
- The system offers a scalable solution for accessible indoor navigation.
- It reduces the need for expensive, per-building infrastructure.
- Multi-agent LLM systems can achieve self-correction and iterative refinement.
- The framework significantly improves navigation success rates over single-call LLMs.
- Safety evaluation is integrated into the path planning process.
What is LLM-Guided Agentic Floor Plan Parsing?
LLM-Guided Agentic Floor Plan Parsing is an agentic framework that transforms a single floor plan image into a structured, retrievable knowledge base. This system generates safe, accessible navigation instructions for blind and low-vision (BLV) individuals [Source: arXiv:2604.23970]. It aims to make accessible indoor navigation possible with only lightweight infrastructure [Source: arXiv:2604.23970].
What is new vs the previous version?
This is the framework's initial release, so there is no previous version to compare against. Relative to single-call LLM baselines, it introduces:
- Multi-Agent Module: It uses a multi-agent module for parsing floor plans into a spatial knowledge graph [Source: arXiv:2604.23970].
- Self-Correcting Pipeline: The parsing includes iterative retry loops and corrective feedback for accuracy [Source: arXiv:2604.23970].
- Integrated Safety Evaluator: A Safety Evaluator agent assesses potential hazards along each generated route [Source: arXiv:2604.23970].
- Performance Gains: It consistently outperforms single-call LLM baselines in navigation success rates [Source: arXiv:2604.23970].
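The self-correcting pipeline with iterative retry loops and corrective feedback can be sketched generically. The paper does not publish a reference implementation, so everything here is an illustrative assumption: `stub_parser` stands in for an LLM vision call, and `validate` stands in for whatever consistency checks the real system applies.

```python
# Generic sketch of a self-correcting retry loop: parse, validate,
# feed errors back, retry. All names here are hypothetical stand-ins,
# not the paper's actual API.

def stub_parser(image, feedback):
    # Pretend the parser only recovers the doors once it has seen
    # corrective feedback from a failed attempt.
    if "missing door labels" in feedback:
        return {"rooms": ["101", "102"], "doors": ["101-102"]}
    return {"rooms": ["101", "102"], "doors": []}

def validate(graph):
    # A stand-in consistency check on the parsed spatial graph.
    return [] if graph["doors"] else ["missing door labels"]

def parse_with_retries(image, parser, max_retries=3):
    feedback = []
    for _ in range(max_retries):
        graph = parser(image, feedback)
        errors = validate(graph)
        if not errors:
            return graph
        feedback.extend(errors)  # corrective feedback for the next attempt
    raise RuntimeError(f"parsing failed after {max_retries} attempts: {feedback}")

graph = parse_with_retries("floorplan.png", stub_parser)
```

The key design point the paper emphasizes is the feedback channel: a failed validation does not simply trigger a blind retry, it changes the next attempt's input.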
How does LLM-Guided Agentic Floor Plan Parsing work?
The system operates in two main phases to generate accessible navigation instructions.
- Floor Plan Parsing: A multi-agent module processes a single floor plan image [Source: arXiv:2604.23970]. This module parses the floor plan into a spatial knowledge graph [Source: arXiv:2604.23970]. It uses a self-correcting pipeline with iterative retry loops and corrective feedback [Source: arXiv:2604.23970].
- Path Planning and Safety Evaluation: A Path Planner generates accessible navigation instructions [Source: arXiv:2604.23970]. A Safety Evaluator agent then assesses potential hazards for each route [Source: arXiv:2604.23970]. The LLM acts as an agent by incorporating a role, environment, and memory as inputs [Source: 5].
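The second phase can be illustrated on a toy spatial knowledge graph. The graph below, its hazard tags, and the breadth-first planner are invented for illustration; the paper's actual Path Planner and Safety Evaluator are LLM agents, not graph algorithms.

```python
from collections import deque

# Toy spatial knowledge graph: nodes are rooms/landmarks, edges are
# traversable connections, each tagged with a (possibly empty) list of
# hazard notes. The layout and hazards are invented for illustration.
GRAPH = {
    "entrance": {"hallway": []},
    "hallway":  {"entrance": [], "stairs": ["open stairwell"], "elevator": []},
    "stairs":   {"hallway": ["open stairwell"], "room_101": []},
    "elevator": {"hallway": [], "room_101": []},
    "room_101": {"stairs": [], "elevator": []},
}

def plan_path(graph, start, goal):
    """Stand-in Path Planner: shortest route by hop count (BFS)."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def evaluate_safety(graph, path):
    """Stand-in Safety Evaluator: collect hazards along each edge."""
    hazards = []
    for a, b in zip(path, path[1:]):
        hazards.extend(graph[a][b])
    return hazards

route = plan_path(GRAPH, "entrance", "room_101")
print(route)                       # ['entrance', 'hallway', 'stairs', 'room_101']
print(evaluate_safety(GRAPH, route))  # ['open stairwell']
```

In the real system the Safety Evaluator's hazard assessment could feed back into planning (e.g., preferring the elevator route here); the sketch only shows the two roles as separate steps, matching the division of labor the paper describes.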
Benchmarks and evidence
| Evaluation Metric | UMBC MP-1 (Short Routes) | UMBC MP-1 (Medium Routes) | UMBC MP-1 (Long Routes) | UMBC MP-3 (Short Routes) | UMBC MP-3 (Medium Routes) | UMBC MP-3 (Long Routes) | Source |
|---|---|---|---|---|---|---|---|
| Agentic Framework Success Rate | 92.31% | 76.92% | 61.54% | 76.92% | 61.54% | 38.46% | arXiv:2604.23970 |
| Claude 3.7 Sonnet Baseline Success Rate | 84.62% | 69.23% | 53.85% | 61.54% | 46.15% | 23.08% | arXiv:2604.23970 |
The system was evaluated on the UMBC Math and Psychology building (floors MP-1 and MP-3) and the CVC-FP benchmark [Source: arXiv:2604.23970]. It showed consistent gains over single-call LLM baselines [Source: arXiv:2604.23970]. For example, on MP-1, it achieved 92.31% success for short routes, outperforming Claude 3.7 Sonnet at 84.62% [Source: arXiv:2604.23970].
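The per-category gains can be tallied directly from the table above; the numbers below are copied from it, in percentage points.

```python
# Success rates (%) from the benchmark table: agentic framework vs. the
# single-call Claude 3.7 Sonnet baseline, per floor, for
# (short, medium, long) routes.
agentic  = {"MP-1": (92.31, 76.92, 61.54), "MP-3": (76.92, 61.54, 38.46)}
baseline = {"MP-1": (84.62, 69.23, 53.85), "MP-3": (61.54, 46.15, 23.08)}

gains = {
    floor: tuple(round(a - b, 2) for a, b in zip(agentic[floor], baseline[floor]))
    for floor in agentic
}
print(gains)  # {'MP-1': (7.69, 7.69, 7.69), 'MP-3': (15.38, 15.39, 15.38)}
```

Note that every reported percentage is a multiple of 1/13 (7.69%), consistent with 13 test routes per category: each MP-1 gain corresponds to one additional successful route, and each MP-3 gain to two.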
Who should care
Builders
Builders of AI systems for accessibility should care about this framework. It demonstrates a scalable solution for indoor navigation for BLV individuals [Source: arXiv:2604.23970]. The multi-agent approach with self-correction is a valuable design pattern [Source: arXiv:2604.23970].
Enterprise
Enterprises in hospitality, retail, and public services can leverage this technology. It can enhance accessibility for BLV customers and employees [Source: arXiv:2604.23970]. This reduces the need for expensive per-building infrastructure [Source: arXiv:2604.23970].
End users
Blind and low-vision individuals are the primary beneficiaries of this innovation. It offers improved and reliable indoor navigation instructions [Source: arXiv:2604.23970]. This enhances independence and safety in unfamiliar indoor environments [Source: arXiv:2604.23970].
Investors
Investors interested in AI for social good and accessibility technology should take note. This framework presents a scalable and impactful application of LLMs [Source: arXiv:2604.23970]. The market for accessibility solutions is growing, driven by regulatory and social demands [Source: 7].
How to use LLM-Guided Agentic Floor Plan Parsing today
The framework is currently presented as an academic paper on arXiv [Source: arXiv:2604.23970]. Direct public access or an API for immediate use is not yet disclosed. Researchers can access the paper for implementation details [Source: arXiv:2604.23970].
LLM-Guided Agentic Floor Plan Parsing vs competitors
| Feature/System | LLM-Guided Agentic Floor Plan Parsing | Single-Call LLM Baselines (e.g., Claude 3.7 Sonnet) | Traditional Indoor Navigation (e.g., costly infrastructure) |
|---|---|---|---|
| Input | Single floor plan image | Text prompts/limited image input | Dedicated sensors, beacons, or pre-mapped environments |
| Parsing Method | Multi-agent module, self-correcting, iterative retry loops | Direct LLM interpretation | Manual mapping or specialized computer vision |
| Output | Structured spatial knowledge base, accessible navigation instructions with safety evaluation | Navigation instructions (potentially less reliable/safe) | Navigation instructions based on infrastructure data |
| Infrastructure Cost | Lightweight | Low (software-based) | High (per-building installation) |
| Accessibility for BLV | High, with safety considerations | Moderate, less reliable | High, but limited by infrastructure availability |
| Performance (UMBC MP-1 Short) | 92.31% success rate | 84.62% success rate | Not evaluated in this study. |
| Scalability | Scalable solution | Limited by single-call LLM robustness | Limited by infrastructure deployment |
The agentic framework consistently outperforms single-call LLM baselines such as Claude 3.7 Sonnet in navigation success rate [Source: arXiv:2604.23970]. It also offers a lightweight alternative to infrastructure-heavy traditional methods [Source: arXiv:2604.23970]. Multimodal LLMs, which integrate visual and textual reasoning, are a promising frontier for interpretable assessments of built environments like floor plans [Source: 6].
Risks, limits, and myths
- Floor Plan Accuracy: The system’s performance depends on the clarity and accuracy of the input floor plan image. Imperfect or outdated floor plans could lead to incorrect navigation instructions.
- Dynamic Environments: The current framework may struggle with dynamic changes in indoor environments, such as temporary obstacles or furniture rearrangements.
- LLM Hallucinations: Like all LLMs, there’s a risk of the model generating plausible but incorrect information, especially in complex parsing tasks.
- Myth: LLMs alone are sufficient for complex tasks. This research demonstrates that an agentic framework with iterative self-correction significantly outperforms single-call LLMs [Source: arXiv:2604.23970].
- Myth: Costly infrastructure is always necessary for indoor navigation. This system aims to provide accessible navigation with lightweight infrastructure [Source: arXiv:2604.23970].
FAQ
- What is the primary goal of LLM-Guided Agentic Floor Plan Parsing?
- The primary goal is to generate safe, accessible indoor navigation instructions for blind and low-vision individuals using a single floor plan image [Source: arXiv:2604.23970].
- How does the system process a floor plan image?
- A multi-agent module parses the floor plan image into a spatial knowledge graph through a self-correcting pipeline [Source: arXiv:2604.23970].
- What role does the Safety Evaluator play?
- The Safety Evaluator agent assesses potential hazards along each generated navigation route [Source: arXiv:2604.23970].
- Is this system better than using a single LLM for navigation?
- Yes, the agentic framework consistently outperforms single-call LLM baselines in navigation success rates [Source: arXiv:2604.23970].
- What kind of infrastructure does this system require?
- It is designed to work with lightweight infrastructure, reducing the need for costly per-building installations [Source: arXiv:2604.23970].
- On which datasets was the system evaluated?
- The system was evaluated on the UMBC Math and Psychology building (floors MP-1 and MP-3) and the CVC-FP benchmark [Source: arXiv:2604.23970].
- Can this technology be used in commercial applications?
- Not yet disclosed. The research paper suggests it is a scalable solution for accessible indoor navigation [Source: arXiv:2604.23970].
- What is an agentic framework in the context of LLMs?
- An agentic framework extends an LLM by adding supporting elements like a role, environment, and memory, allowing it to perform complex tasks iteratively [Source: 5].
Glossary
- Agentic Framework
- A system where an LLM is augmented with a role, environment, and memory to perform tasks iteratively and autonomously [Source: 5].
- Blind and Low-Vision (BLV)
- Individuals with significant visual impairment, requiring specialized accessibility solutions [Source: arXiv:2604.23970].
- Floor Plan Parsing
- The process of extracting structural and spatial information from a floor plan image [Source: arXiv:2604.23970].
- Large Language Model (LLM)
- A type of artificial intelligence model trained on vast amounts of text data, capable of understanding and generating human-like text [Source: 5].
- Multimodal LLM
- An LLM that can process and integrate information from multiple modalities, such as text and images [Source: 6].
- Spatial Knowledge Graph
- A structured representation of spatial relationships and entities within an environment, derived from a floor plan [Source: arXiv:2604.23970].
Sources
- [1] LLM Leaderboard 2026 — Compare Top AI Models – Vellum
- [2] LLM Leaderboard – Comparison of over 100 AI models from OpenAI, Google, DeepSeek & others
- [3] The Best Open-Source LLMs in 2026
- [4] Top 5 Local LLM Tools and Models in 2026 – DEV Community
- [5] Large language model – Wikipedia
- [6] Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery
- [7] ADA Guide for Places of Lodging: Serving Guests Who Are Blind or Who Have Low Vision | ADA.gov