A new agentic framework leverages Large Language Models (LLMs) to convert floor plan images into structured, retrievable knowledge bases. This system generates safe, accessible indoor navigation instructions for blind and low-vision individuals, reducing reliance on costly infrastructure. It employs a multi-agent module for parsing and a Path Planner with a Safety Evaluator for instruction generation.
| Attribute | Detail |
|---|---|
| Released by | arXiv cs.MA |
| Release date | Not yet disclosed. |
| What it is | An agentic framework for generating accessible indoor navigation instructions from floor plans using LLMs. |
| Who it is for | Blind and low-vision individuals seeking accessible indoor navigation solutions. |
| Where to get it | arXiv (arxiv.org/abs/2604.23970) |
| Price | Not yet disclosed. |
- The framework converts floor plan images into structured knowledge bases.
- It generates accessible indoor navigation instructions for blind and low-vision people.
- A multi-agent module parses floor plans into a spatial knowledge graph.
- A Path Planner generates instructions, with a Safety Evaluator assessing hazards.
- The system outperforms single-call LLM baselines on real-world building data.
- What is LLM-Guided Agentic Floor Plan Parsing?
- What is new vs the previous version?
- How does LLM-Guided Agentic Floor Plan Parsing work?
- Benchmarks and evidence
- Who should care
- How to use LLM-Guided Agentic Floor Plan Parsing today
- LLM-Guided Agentic Floor Plan Parsing vs competitors
- Risks, limits, and myths
- FAQ
- Glossary
- Next step
- Sources
- The system offers a scalable solution for accessible indoor navigation.
- It reduces the need for expensive, per-building infrastructure.
- Multi-agent LLM systems can achieve self-correction and iterative refinement.
- The framework significantly improves navigation success rates over single-call LLMs.
- Safety evaluation is integrated into the path planning process.
What is LLM-Guided Agentic Floor Plan Parsing?
LLM-Guided Agentic Floor Plan Parsing is an agentic framework that transforms a single floor plan image into a structured, retrievable knowledge base. This system generates safe, accessible navigation instructions for blind and low-vision (BLV) individuals [Source: arXiv:2604.23970]. It aims to make accessible indoor navigation possible with only lightweight infrastructure [Source: arXiv:2604.23970].
What is new vs the previous version?
This is the framework's initial release, so there is no previous version to compare against. Relative to single-call LLM baselines, it introduces:
- Multi-Agent Module: It uses a multi-agent module for parsing floor plans into a spatial knowledge graph [Source: arXiv:2604.23970].
- Self-Correcting Pipeline: The parsing includes iterative retry loops and corrective feedback for accuracy [Source: arXiv:2604.23970].
- Integrated Safety Evaluator: A Safety Evaluator agent assesses potential hazards along each generated route [Source: arXiv:2604.23970].
- Performance Gains: It consistently outperforms single-call LLM baselines in navigation success rates [Source: arXiv:2604.23970].
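The self-correcting pipeline with iterative retry loops and corrective feedback can be sketched generically. The paper does not publish a reference implementation, so everything here is an illustrative assumption: `stub_parser` stands in for an LLM vision call, and `validate` stands in for whatever consistency checks the real system applies.

```python
# Generic sketch of a self-correcting retry loop: parse, validate,
# feed errors back, retry. All names here are hypothetical stand-ins,
# not the paper's actual API.

def stub_parser(image, feedback):
    # Pretend the parser only recovers the doors once it has seen
    # corrective feedback from a failed attempt.
    if "missing door labels" in feedback:
        return {"rooms": ["101", "102"], "doors": ["101-102"]}
    return {"rooms": ["101", "102"], "doors": []}

def validate(graph):
    # A stand-in consistency check on the parsed spatial graph.
    return [] if graph["doors"] else ["missing door labels"]

def parse_with_retries(image, parser, max_retries=3):
    feedback = []
    for _ in range(max_retries):
        graph = parser(image, feedback)
        errors = validate(graph)
        if not errors:
            return graph
        feedback.extend(errors)  # corrective feedback for the next attempt
    raise RuntimeError(f"parsing failed after {max_retries} attempts: {feedback}")

graph = parse_with_retries("floorplan.png", stub_parser)
```

The key design point the paper emphasizes is the feedback channel: a failed validation does not simply trigger a blind retry, it changes the next attempt's input.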
How does LLM-Guided Agentic Floor Plan Parsing work?
The system operates in two main phases to generate accessible navigation instructions.
- Floor Plan Parsing: A multi-agent module processes a single floor plan image [Source: arXiv:2604.23970]. This module parses the floor plan into a spatial knowledge graph [Source: arXiv:2604.23970]. It uses a self-correcting pipeline with iterative retry loops and corrective feedback [Source: arXiv:2604.23970].
- Path Planning and Safety Evaluation: A Path Planner generates accessible navigation instructions [Source: arXiv:2604.23970]. A Safety Evaluator agent then assesses potential hazards for each route [Source: arXiv:2604.23970]. The LLM acts as an agent by incorporating a role, environment, and memory as inputs [Source: 5].
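The second phase can be illustrated on a toy spatial knowledge graph. The graph below, its hazard tags, and the breadth-first planner are invented for illustration; the paper's actual Path Planner and Safety Evaluator are LLM agents, not graph algorithms.

```python
from collections import deque

# Toy spatial knowledge graph: nodes are rooms/landmarks, edges are
# traversable connections, each tagged with a (possibly empty) list of
# hazard notes. The layout and hazards are invented for illustration.
GRAPH = {
    "entrance": {"hallway": []},
    "hallway":  {"entrance": [], "stairs": ["open stairwell"], "elevator": []},
    "stairs":   {"hallway": ["open stairwell"], "room_101": []},
    "elevator": {"hallway": [], "room_101": []},
    "room_101": {"stairs": [], "elevator": []},
}

def plan_path(graph, start, goal):
    """Stand-in Path Planner: shortest route by hop count (BFS)."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def evaluate_safety(graph, path):
    """Stand-in Safety Evaluator: collect hazards along each edge."""
    hazards = []
    for a, b in zip(path, path[1:]):
        hazards.extend(graph[a][b])
    return hazards

route = plan_path(GRAPH, "entrance", "room_101")
print(route)                       # ['entrance', 'hallway', 'stairs', 'room_101']
print(evaluate_safety(GRAPH, route))  # ['open stairwell']
```

In the real system the Safety Evaluator's hazard assessment could feed back into planning (e.g., preferring the elevator route here); the sketch only shows the two roles as separate steps, matching the division of labor the paper describes.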
Benchmarks and evidence
| Evaluation Metric | UMBC MP-1 (Short Routes) | UMBC MP-1 (Medium Routes) | UMBC MP-1 (Long Routes) | UMBC MP-3 (Short Routes) | UMBC MP-3 (Medium Routes) | UMBC MP-3 (Long Routes) | Source |
|---|---|---|---|---|---|---|---|
| Agentic Framework Success Rate | 92.31% | 76.92% | 61.54% | 76.92% | 61.54% | 38.46% | arXiv:2604.23970 |
| Claude 3.7 Sonnet Baseline Success Rate | 84.62% | 69.23% | 53.85% | 61.54% | 46.15% | 23.08% | arXiv:2604.23970 |
The system was evaluated on the UMBC Math and Psychology building (floors MP-1 and MP-3) and the CVC-FP benchmark [Source: arXiv:2604.23970]. It showed consistent gains over single-call LLM baselines [Source: arXiv:2604.23970]. For example, on MP-1, it achieved 92.31% success for short routes, outperforming Claude 3.7 Sonnet at 84.62% [Source: arXiv:2604.23970].
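The per-category gains can be tallied directly from the table above; the numbers below are copied from it, in percentage points.

```python
# Success rates (%) from the benchmark table: agentic framework vs. the
# single-call Claude 3.7 Sonnet baseline, per floor, for
# (short, medium, long) routes.
agentic  = {"MP-1": (92.31, 76.92, 61.54), "MP-3": (76.92, 61.54, 38.46)}
baseline = {"MP-1": (84.62, 69.23, 53.85), "MP-3": (61.54, 46.15, 23.08)}

gains = {
    floor: tuple(round(a - b, 2) for a, b in zip(agentic[floor], baseline[floor]))
    for floor in agentic
}
print(gains)  # {'MP-1': (7.69, 7.69, 7.69), 'MP-3': (15.38, 15.39, 15.38)}
```

Note that every reported percentage is a multiple of 1/13 (7.69%), consistent with 13 test routes per category: each MP-1 gain corresponds to one additional successful route, and each MP-3 gain to two.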
Who should care
Builders
Builders of AI systems for accessibility should care about this framework. It demonstrates a scalable solution for indoor navigation for BLV individuals [Source: arXiv:2604.23970]. The multi-agent approach with self-correction is a valuable design pattern [Source: arXiv:2604.23970].
Enterprise
Enterprises in hospitality, retail, and public services can leverage this technology. It can enhance accessibility for BLV customers and employees [Source: arXiv:2604.23970]. This reduces the need for expensive per-building infrastructure [Source: arXiv:2604.23970].
End users
Blind and low-vision individuals are the primary beneficiaries of this innovation. It offers improved and reliable indoor navigation instructions [Source: arXiv:2604.23970]. This enhances independence and safety in unfamiliar indoor environments [Source: arXiv:2604.23970].
Investors
Investors interested in AI for social good and accessibility technology should take note. This framework presents a scalable and impactful application of LLMs [Source: arXiv:2604.23970]. The market for accessibility solutions is growing, driven by regulatory and social demands [Source: 7].
How to use LLM-Guided Agentic Floor Plan Parsing today
The framework is currently presented as an academic paper on arXiv [Source: arXiv:2604.23970]. Direct public access or an API for immediate use is not yet disclosed. Researchers can access the paper for implementation details [Source: arXiv:2604.23970].
LLM-Guided Agentic Floor Plan Parsing vs competitors
| Feature/System | LLM-Guided Agentic Floor Plan Parsing | Single-Call LLM Baselines (e.g., Claude 3.7 Sonnet) | Traditional Indoor Navigation (e.g., costly infrastructure) |
|---|---|---|---|
| Input | Single floor plan image | Text prompts/limited image input | Dedicated sensors, beacons, or pre-mapped environments |
| Parsing Method | Multi-agent module, self-correcting, iterative retry loops | Direct LLM interpretation | Manual mapping or specialized computer vision |
| Output | Structured spatial knowledge base, accessible navigation instructions with safety evaluation | Navigation instructions (potentially less reliable/safe) | Navigation instructions based on infrastructure data |
| Infrastructure Cost | Lightweight | Low (software-based) | High (per-building installation) |
| Accessibility for BLV | High, with safety considerations | Moderate, less reliable | High, but limited by infrastructure availability |
| Performance (UMBC MP-1 Short) | 92.31% success rate | 84.62% success rate | Not evaluated in this study. |
| Scalability | Scalable solution | Limited by single-call LLM robustness | Limited by infrastructure deployment |
The agentic framework consistently outperforms single-call LLM baselines such as Claude 3.7 Sonnet in navigation success rate [Source: arXiv:2604.23970]. It also offers a lightweight alternative to infrastructure-heavy traditional methods [Source: arXiv:2604.23970]. Multimodal LLMs, which integrate visual and textual reasoning, are a promising frontier for interpretable assessments of built environments like floor plans [Source: 6].
Risks, limits, and myths
- Floor Plan Accuracy: The system’s performance depends on the clarity and accuracy of the input floor plan image. Imperfect or outdated floor plans could lead to incorrect navigation instructions.
- Dynamic Environments: The current framework may struggle with dynamic changes in indoor environments, such as temporary obstacles or furniture rearrangements.
- LLM Hallucinations: Like all LLMs, there’s a risk of the model generating plausible but incorrect information, especially in complex parsing tasks.
- Myth: LLMs alone are sufficient for complex tasks. This research demonstrates that an agentic framework with iterative self-correction significantly outperforms single-call LLMs [Source: arXiv:2604.23970].
- Myth: Costly infrastructure is always necessary for indoor navigation. This system aims to provide accessible navigation with lightweight infrastructure [Source: arXiv:2604.23970].
FAQ
- What is the primary goal of LLM-Guided Agentic Floor Plan Parsing?
- The primary goal is to generate safe, accessible indoor navigation instructions for blind and low-vision individuals using a single floor plan image [Source: arXiv:2604.23970].
- How does the system process a floor plan image?
- A multi-agent module parses the floor plan image into a spatial knowledge graph through a self-correcting pipeline [Source: arXiv:2604.23970].
- What role does the Safety Evaluator play?
- The Safety Evaluator agent assesses potential hazards along each generated navigation route [Source: arXiv:2604.23970].
- Is this system better than using a single LLM for navigation?
- Yes, the agentic framework consistently outperforms single-call LLM baselines in navigation success rates [Source: arXiv:2604.23970].
- What kind of infrastructure does this system require?
- It is designed to work with lightweight infrastructure, reducing the need for costly per-building installations [Source: arXiv:2604.23970].
- On which datasets was the system evaluated?
- The system was evaluated on the UMBC Math and Psychology building (floors MP-1 and MP-3) and the CVC-FP benchmark [Source: arXiv:2604.23970].
- Can this technology be used in commercial applications?
- Not yet disclosed. The research paper suggests it is a scalable solution for accessible indoor navigation [Source: arXiv:2604.23970].
- What is an agentic framework in the context of LLMs?
- An agentic framework extends an LLM by adding supporting elements like a role, environment, and memory, allowing it to perform complex tasks iteratively [Source: 5].
Glossary
- Agentic Framework
- A system where an LLM is augmented with a role, environment, and memory to perform tasks iteratively and autonomously [Source: 5].
- Blind and Low-Vision (BLV)
- Individuals with significant visual impairment, requiring specialized accessibility solutions [Source: arXiv:2604.23970].
- Floor Plan Parsing
- The process of extracting structural and spatial information from a floor plan image [Source: arXiv:2604.23970].
- Large Language Model (LLM)
- A type of artificial intelligence model trained on vast amounts of text data, capable of understanding and generating human-like text [Source: 5].
- Multimodal LLM
- An LLM that can process and integrate information from multiple modalities, such as text and images [Source: 6].
- Spatial Knowledge Graph
- A structured representation of spatial relationships and entities within an environment, derived from a floor plan [Source: arXiv:2604.23970].
Sources
- [1] LLM Leaderboard 2026 — Compare Top AI Models – Vellum
- [2] LLM Leaderboard – Comparison of over 100 AI models from OpenAI, Google, DeepSeek & others
- [3] The Best Open-Source LLMs in 2026
- [4] Top 5 Local LLM Tools and Models in 2026 – DEV Community
- [5] Large language model – Wikipedia
- [6] Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery
- [7] ADA Guide for Places of Lodging: Serving Guests Who Are Blind or Who Have Low Vision | ADA.gov