Last week, I attended NVIDIA GTC 2026, one of the world’s largest AI conferences. Across keynote sessions, technical workshops, and candid conversations with robotics engineers, I noticed that discussions kept circling back to a single issue: data strategy.
More specifically, I saw a growing gap between where physical AI is headed and where the data supporting it actually stands today.
Physical AI, the branch of artificial intelligence that enables machines to perceive, reason, and act in the real world, isn’t just a research concept anymore. This technology is advancing rapidly, but the data strategies needed to support it aren’t keeping pace.
Nearly every robotics company I spoke with at GTC is still working to build a data stack that can keep up with its models. So if the models are ready, what is holding real-world deployment back?
The most candid conversations at GTC took place on the robotics floor, where I got to interact with engineers and product leads building real-world systems.

A humanoid robot interacting with attendees at GTC 2026. (Source)
What stood out to me most was that, despite rapid progress in physical AI, most teams are still working with a remarkably similar data stack. Here’s a look at what that typically includes:

- Synthetic data, generated at scale in simulated environments
- Teleoperation data, captured through human-operated robots interacting with the real world
- Egocentric data, recorded from the robot’s own point of view

As I talked to more teams working on physical AI, the same three-part data setup kept coming up, and that’s where the gap became clear. For robots to work reliably in the real world, these three data types need to work together.
But in practice, most teams still rely heavily on synthetic data and have limited access to scalable real-world data. That’s why models perform well in controlled environments but start to struggle in more complex, unpredictable settings.
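To make that blend concrete, here is a minimal, hypothetical Python sketch of how a training pipeline might sample across the three data types. The source names and weights are my own illustrative assumptions, not figures from any team at GTC:

```python
import random

# Hypothetical mix reflecting the stack described above: heavy reliance
# on synthetic data, with real-world sources still hard to scale.
DATA_SOURCES = {
    "synthetic": 0.70,      # simulated scenes, cheap to generate in volume
    "teleoperation": 0.20,  # human-operated robot episodes, costly to run
    "egocentric": 0.10,     # robot point-of-view capture, hardest to collect
}

def sample_source() -> str:
    """Pick the data source for the next training example,
    weighted by how much of each a team typically has."""
    sources = list(DATA_SOURCES)
    weights = list(DATA_SOURCES.values())
    return random.choices(sources, weights=weights, k=1)[0]

if __name__ == "__main__":
    print([sample_source() for _ in range(10)])
    # e.g. ['synthetic', 'synthetic', 'teleoperation', 'synthetic', ...]
```

The point of the sketch is the imbalance itself: when real-world sources barely appear in the mix, the model rarely sees the conditions it will face at deployment.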
Synthetic data has been the foundation of robotics model training for a while. It enables teams to generate large volumes of labeled data in simulated environments without the cost and complexity of real-world collection. But it comes with limitations, a point I heard repeatedly from robotics teams at GTC.
Synthetic data is falling short in two key areas: throughput and generalizability. As models expand into new environments, object types, and task variations, teams are constrained by how quickly they can generate and validate synthetic datasets.
Another limiting factor is the time required to produce high-quality synthetic robotics data. As one engineer at the GTC conference put it, “Synthetic data is useful, but it takes away too much time.”
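For context on why generation and validation take time, most simulated pipelines follow a domain-randomization pattern: vary scene parameters, render, and export labels automatically. Here is a minimal Python sketch with purely illustrative parameter ranges; a real pipeline would drive a simulator such as NVIDIA Isaac Sim rather than return a dict:

```python
import random

def sample_scene_config() -> dict:
    """Hypothetical domain randomization: each synthetic scene varies
    lighting, object pose, and texture so labels come for free from
    the simulator. The ranges below are illustrative only."""
    return {
        "light_intensity": random.uniform(0.2, 1.0),
        "object_position_m": [random.uniform(-0.5, 0.5) for _ in range(3)],
        "object_rotation_deg": random.uniform(0.0, 360.0),
        "texture_id": random.randrange(100),
    }

# Generating configs is fast; rendering each scene and validating that
# the resulting distribution matches reality is the slow part teams
# at GTC kept pointing to.
dataset = [sample_scene_config() for _ in range(1_000)]
print(dataset[0])
```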
To compensate, many teams turn to teleoperation data, which is grounded in real-world interaction and better reflects how robots behave in dynamic environments. However, it is expensive and operationally complex to scale.
As a result, I saw many teams starting to look toward egocentric data. Nearly every robotics company I spoke with is either exploring it or sees it as a necessary next step to scale their data more efficiently.
If synthetic data helps models scale and teleoperation grounds them in reality, egocentric data is what connects the two. Egocentric data captures the world from the robot’s point of view. It reflects how environments actually look and behave, from cluttered scenes and shifting lighting to the unpredictability of human interaction.

Several physical AI demos at GTC 2026 highlighted the need for more egocentric data.
For robotics teams, this type of data is essential. It is a key ingredient in building physical intelligence, the ability for machines to perceive, adapt, and act reliably in unstructured environments.
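For a concrete sense of what “the robot’s point of view” means in data terms, here is a hypothetical Python sketch of a single egocentric sample. The field names are assumptions for illustration, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class EgocentricSample:
    """One hypothetical frame of robot point-of-view data."""
    rgb_frame_path: str           # image from the robot's head or wrist camera
    depth_frame_path: str         # aligned depth map for spatial grounding
    camera_pose: list[float]      # 6-DoF pose: x, y, z, roll, pitch, yaw
    timestamp_s: float            # capture time in seconds
    scene_labels: list[str] = field(default_factory=list)  # annotated objects

sample = EgocentricSample(
    rgb_frame_path="frames/000142.png",
    depth_frame_path="depth/000142.png",
    camera_pose=[0.4, -0.1, 1.2, 0.0, 0.15, 1.57],
    timestamp_s=14.2,
    scene_labels=["mug", "table", "human_hand"],
)
print(sample.scene_labels)
```

What makes this data valuable is exactly what the fields hint at: pose, depth, and labels tied to a first-person view of a cluttered, changing scene.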
In conversation after conversation at GTC, the message was consistent: physical AI teams knew they needed more egocentric data, but most didn’t have a clear path to collecting it at scale.
What I heard at GTC reflects a broader pattern across the robotics industry and the wider physical AI landscape.
Synthetic data has helped drive early progress, but struggles to capture the variability of real-world environments. Teleoperation brings that real-world grounding, but remains expensive and difficult to scale.
Across the industry, teams are coming to the same conclusion: the current data stack is incomplete. It needs a third layer, and that layer is egocentric data.
Egocentric data is quickly becoming crucial to model performance in real-world deployment. But while the need is widely understood, the ability to collect this data at scale is still catching up. As robotics continues to advance, the teams moving fastest are the ones addressing this gap now.
As robotics teams move from experimentation to real-world deployment, scaling data becomes the next big challenge.
At Objectways, we help teams make that shift. We work with robotics companies to collect, annotate, and manage the data needed to train and improve their models. This includes egocentric data from a robot’s point of view, teleoperation data from real-world interactions, and multimodal 3D data for spatial understanding.
Closing the robotics data gap is where we specialize. Our in-house team of over 2,200 trained data annotation specialists supports faster data production, reliable quality, and scalable execution.

An example of the Objectways team creating and collecting egocentric data. (Source)
We support end-to-end data workflows, from data collection and annotation to validation and quality assurance, so teams can scale faster, improve model performance, and bring robotics systems into real-world environments with confidence.
Here’s a quick roundup of key takeaways from GTC 2026 for teams building physical AI systems:

- Physical AI is moving out of controlled settings and into dynamic, real-world environments.
- Synthetic data scales cheaply but struggles to capture real-world variability.
- Teleoperation data grounds models in reality but is expensive and hard to scale.
- Egocentric data is the missing third layer of the robotics data stack.
- The teams moving fastest are the ones closing this data gap now.
NVIDIA GTC 2026 made one thing clear: physical AI is moving into real-world environments, and robotics systems are shifting beyond controlled settings into dynamic, unpredictable conditions.
But as I saw on the exhibit floor, progress is about more than models. It depends on how effectively teams build and scale their data. As AI adoption grows, building and scaling data pipelines will distinguish teams that deploy from those still stuck in experimentation.
If you’re building a robotics model and starting to feel the friction around data, whether it’s synthetic throughput limits, teleoperation costs, or the egocentric data gap, we’d love to talk. Get in touch to see how Objectways can support your data pipeline.
One last distinction worth drawing: generative AI creates content such as text and images from learned patterns, while physical AI enables machines to perceive, reason, and act in the real world, powering robots, autonomous vehicles, and industrial automation.