Map-Free Outdoor Social Navigation

Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

Translating high-level human intent into safe, long-horizon, socially compliant outdoor robot navigation.

Lingfeng Zhang1,2,3, Xiaoshuai Hao3,†,‡, Xizhou Bu3,4, Yingbo Tang5, Hongsheng Li1, Jinghui Lu3, Xiu-shen Wei6, Jiayi Ma7, Yu Liu8, Jing Zhang7, Hangjun Ye3, Xiaojun Liang2, Long Chen3, Wenbo Ding1,†
1Tsinghua University 2Pengcheng Laboratory 3Xiaomi EV 4Fudan University 5Institute of Automation, Chinese Academy of Sciences 6Southeast University 7Wuhan University 8Hefei University of Technology
† Corresponding authors ‡ Project leader
Walk With Me motivation and applications
Human instruction

"I want to go for a walk."

Robot response

"I will take you to a nearby park and cross safely when the scene is clear."

From intent to outdoor assistance

Walk With Me grounds abstract language with lightweight public-map context, plans coarse waypoints, and executes the route through a safety-aware VLM/VLA hierarchy.

Abstract

Assisting humans in open-world outdoor environments requires a robot to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Walk With Me is a map-free framework for this setting. Instead of relying on costly pre-built HD maps, it uses GPS context and lightweight candidate points of interest (POIs) from public map services for semantic destination grounding and waypoint proposal.

A High-Level Vision-Language Model (VLM) grounds the user's abstract instruction into a concrete destination and plans a coarse waypoint sequence. During execution, an observation-aware router decides whether a situation can be handled by the Low-Level Vision-Language-Action (VLA) policy or should be escalated back to the High-Level VLM for explicit safety reasoning. Routine segments are executed by the Low-Level VLA, while complex scenes such as crowded crossings are escalated, and the robot stops and waits whenever the scene is judged unsafe.

Demo Videos

Representative outdoor runs for last-mile delivery and blind guidance.

Last-Mile Delivery

The robot grounds "Take the milk tea to Building B" into a concrete destination and follows a long-horizon route while avoiding nearby pedestrians.

Blind Guidance

The system maps an open-ended walking request to a nearby destination, reasons about road-crossing safety, and continues with socially aware navigation.

Method Overview

The framework decouples where to go from how to walk there.

Overall framework of Walk With Me
A High-Level VLM grounds intent and proposes waypoints, while an observation-aware router switches between Low-Level VLA execution and explicit safety reasoning.
01

Intent Grounding

A High-Level VLM interprets the user instruction with GPS context and nearby POI candidates, selecting a concrete real-world destination.
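
As a rough illustration of this step, the sketch below ranks nearby POI candidates by great-circle distance and formats them into a text prompt that a VLM could answer. It is only a sketch under stated assumptions: the `POI` type, `build_grounding_prompt`, the prompt wording, and the `max_candidates` parameter are all hypothetical and are not taken from the paper, and no actual VLM or map service is called.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass(frozen=True)
class POI:
    """Hypothetical point-of-interest record, as a map service might return."""
    name: str
    category: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two GPS coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_grounding_prompt(instruction: str, robot_lat: float, robot_lon: float,
                           pois: List[POI], max_candidates: int = 5) -> str:
    """Format the instruction, GPS context, and nearest POIs as a text prompt.

    The prompt layout is an assumption made for illustration only.
    """
    ranked = sorted(pois, key=lambda p: haversine_m(robot_lat, robot_lon, p.lat, p.lon))
    lines = [
        f'User instruction: "{instruction}"',
        f"Robot GPS: ({robot_lat:.5f}, {robot_lon:.5f})",
        "Candidate destinations:",
    ]
    for i, poi in enumerate(ranked[:max_candidates], start=1):
        dist = haversine_m(robot_lat, robot_lon, poi.lat, poi.lon)
        lines.append(f"{i}. {poi.name} [{poi.category}], {dist:.0f} m away")
    lines.append("Reply with the number of the most suitable destination.")
    return "\n".join(lines)
```

Sorting by distance before truncating keeps the prompt short while still giving the model the closest options; the final choice among them is left to the model, since the nearest POI is not necessarily the one the user means.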

02

Waypoint Planning

Public walking-route services provide lightweight route priors that are resampled into geo-referenced waypoints without requiring an HD map.
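
One plausible way to resample a route polyline into evenly spaced geo-referenced waypoints is sketched below. The function name `resample_route`, the `spacing_m` parameter, and the use of linear interpolation in latitude and longitude (reasonable only for short segments) are assumptions for illustration; the paper's actual resampling procedure may differ.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def resample_route(route: List[LatLon], spacing_m: float) -> List[LatLon]:
    """Return waypoints roughly `spacing_m` apart along `route`.

    The first and last route points are always kept. Points inside a
    segment are linearly interpolated in (lat, lon), which is an
    approximation that holds for short walking-scale segments.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    if len(route) < 2:
        return list(route)

    waypoints = [route[0]]
    dist_to_next = spacing_m  # distance left before the next waypoint is due
    for start, end in zip(route, route[1:]):
        seg_len = haversine_m(start, end)
        if seg_len == 0.0:
            continue  # skip duplicated route points
        pos = 0.0  # distance already consumed along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len
            waypoints.append((start[0] + t * (end[0] - start[0]),
                              start[1] + t * (end[1] - start[1])))
            dist_to_next = spacing_m
        dist_to_next -= seg_len - pos

    # Make sure the route ends exactly at the destination.
    if haversine_m(waypoints[-1], route[-1]) > 1e-3:
        waypoints.append(route[-1])
    else:
        waypoints[-1] = route[-1]
    return waypoints
```

Carrying the leftover distance across segment boundaries keeps the spacing uniform even when the route service returns many short segments around corners.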

03

Adaptive Routing

An observation-aware router sends routine segments to a Low-Level VLA and escalates crossings, crowds, or ambiguous traffic to high-level safety reasoning.
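
The routing decision can be pictured with a deliberately simple rule-based stand-in, shown below. This is not the paper's router, which operates on observations and whose internals are not described here; the `SceneSummary` fields, the `crowd_threshold` value, and the function name are hypothetical and only illustrate the escalation logic for crossings, crowds, and ambiguous traffic.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Which module handles the current segment."""
    LOW_LEVEL_VLA = "low_level_vla"    # routine local navigation
    HIGH_LEVEL_VLM = "high_level_vlm"  # explicit safety reasoning


@dataclass(frozen=True)
class SceneSummary:
    """Hypothetical summary of the current observation."""
    at_crossing: bool
    pedestrian_count: int
    traffic_ambiguous: bool


def route_observation(scene: SceneSummary, crowd_threshold: int = 5) -> Route:
    """Escalate crossings, crowds, or ambiguous traffic; otherwise stay low-level.

    `crowd_threshold` is an illustrative value, not one reported by the authors.
    """
    is_crowded = scene.pedestrian_count >= crowd_threshold
    if scene.at_crossing or scene.traffic_ambiguous or is_crowded:
        return Route.HIGH_LEVEL_VLM
    return Route.LOW_LEVEL_VLA
```

For example, `route_observation(SceneSummary(at_crossing=True, pedestrian_count=0, traffic_ambiguous=False))` escalates to the High-Level VLM even on an empty street, which reflects the conservative treatment of crossings described above.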

04

Closed-Loop Execution

The robot predicts local navigation actions, updates its state, and repeats the loop until the destination is reached.
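
The loop structure can be sketched as a toy simulation in local metric coordinates. Everything here is an assumption made for illustration: the `is_complex` and `is_safe` callables stand in for the router and the High-Level VLM safety check, the straight-line step stands in for the Low-Level VLA's predicted action, and the parameter names and defaults are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # local (x, y) position in meters


def run_episode(start: Point,
                waypoints: List[Point],
                is_complex: Callable[[Point, int], bool],
                is_safe: Callable[[Point, int], bool],
                step_m: float = 1.0,
                reach_tol_m: float = 0.5,
                max_steps: int = 1000) -> Tuple[Point, List[str]]:
    """Follow waypoints in a closed loop; return final position and action log.

    Each iteration: check progress, route the scene, then either wait
    (complex and unsafe) or take one step toward the current waypoint.
    """
    pos = start
    log: List[str] = []
    idx = 0  # index of the waypoint currently being tracked
    for step in range(max_steps):
        if idx == len(waypoints):
            break  # destination reached
        target = waypoints[idx]
        dx, dy = target[0] - pos[0], target[1] - pos[1]
        dist = math.hypot(dx, dy)
        if dist <= reach_tol_m:
            idx += 1  # waypoint reached, advance to the next one
            continue
        if is_complex(pos, step) and not is_safe(pos, step):
            log.append("wait")  # stop-and-wait until the scene clears
            continue
        scale = min(step_m, dist) / dist
        pos = (pos[0] + dx * scale, pos[1] + dy * scale)
        log.append("move")
    return pos, log
```

With a single waypoint 3 m ahead and a scene that only becomes safe from the fourth iteration on, the log shows three waits followed by three moves, mirroring the stop-and-wait behavior at a crossing.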

Qualitative Visualization

Real-world examples and model behaviors from the updated paper.

Qualitative visualization of Walk With Me
Walk With Me grounds last-mile delivery and blind-guidance instructions into concrete destinations, handles pedestrian interactions, and reasons conservatively in safety-critical road-crossing scenes.
High-Level VLM reasoning in safety-critical scenes
High-Level VLM reasoning compares stop-and-wait decisions across crossings, pedestrian proximity, and crowded outdoor scenes.
Low-Level VLA trajectory prediction
Low-Level VLA trajectory prediction provides short-horizon socially compliant motion guidance conditioned on route instructions and current observations.

Human-Centric Applications

Walk With Me targets practical outdoor assistance where the user gives natural, high-level requests.

Last-Mile Delivery

Completes long-horizon outdoor delivery routes with social compliance and local obstacle awareness.

Blind Guidance

Uses explicit safety reasoning at crossings and crowded scenes before proceeding.

BibTeX

@article{zhang2026walkwithme,
  title={Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance},
  author={Zhang, Lingfeng and Hao, Xiaoshuai and Bu, Xizhou and Tang, Yingbo and Li, Hongsheng and Lu, Jinghui and Wei, Xiu-shen and Ma, Jiayi and Liu, Yu and Zhang, Jing and Ye, Hangjun and Liang, Xiaojun and Chen, Long and Ding, Wenbo},
  journal={Preprint},
  year={2026}
}