👯Tutorial 5: Multiplayer AR

See repo: https://github.com/holokit/holokit-colocated-multiplayer-boilerplate

Overview

Multiplayer games have existed for decades on platforms like PC, consoles, mobile, and Virtual Reality (VR), and on all of them they immerse players in entirely virtual worlds.

Augmented Reality (AR) diverges from this norm by overlaying virtual elements onto the real world. The player’s immediate environment becomes an integral part of the overall gaming experience.

In AR development using Unity’s ARFoundation, the XROrigin component overlays a virtual coordinate system onto the real world, with the origin set at the AR session's initial position. This poses a challenge in multiplayer AR: the coordinate systems on different devices, each initialized at its own AR session starting point, do not align. Consequently, synchronizing virtual objects across devices becomes complex. An object placed at specific coordinates on Device A may not appear in the same physical location on Device B, as identical coordinates do not correspond to the same real-world position.

There are two main problems which need to be solved to create a solid multiplayer AR experience: network communication and coordinate system synchronization.

Network Communication

Multiplayer AR faces network communication challenges similar to those in traditional multiplayer games on platforms like PC, consoles, mobile, and VR, where different devices must synchronize content. To address these challenges, two key decisions are required:

  1. Selecting a multiplayer solution SDK: Options include Netcode for GameObjects or Photon Fusion in Unity, Unreal Engine’s built-in SDK, or Apple’s RealityKit framework.

  2. Choosing a transport method: Traditional MMORPGs often use server-client architecture, while early competitive games like Warcraft 3 opted for host-client systems. Apple’s native solution employs peer-to-peer networking for local device communication, ideal for face-to-face multiplayer AR due to reduced latency in shared physical spaces. Apple’s MultipeerConnectivity framework, underpinning technologies like AirDrop, operates independently of cellular or Wi-Fi connections. However, it’s important to note that while some SDKs like Netcode for GameObjects support multiple transport layers, others do not.

Network latency is a critical factor in the efficiency of multiplayer AR experiences. Measured as Round Trip Time (RTT), it represents the time it takes for a data packet to travel from one device to the server and back again. High latency leads to noticeable delays that disrupt the real-time interaction and synchronization an immersive AR experience depends on.
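
To make the RTT definition concrete, here is a minimal Python sketch that times a request and its echo. The names `measure_rtt_ms` and `fake_echo` are hypothetical, and the echo function only simulates a network with a sleep; in a real project the multiplayer SDK or transport usually reports RTT for you.

```python
import time

def measure_rtt_ms(send_and_wait_for_echo, samples=5):
    """Average round-trip time in milliseconds over several pings."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        send_and_wait_for_echo(b"ping")  # returns once the echo has arrived
        rtts.append((time.perf_counter() - start) * 1000.0)
    return sum(rtts) / len(rtts)

def fake_echo(payload, one_way_delay_s=0.005):
    """Stand-in for a real connection (assumption): the packet spends
    one_way_delay_s travelling to the server and the same time coming back."""
    time.sleep(2 * one_way_delay_s)
    return payload

if __name__ == "__main__":
    print(f"average RTT: {measure_rtt_ms(fake_echo):.1f} ms")
```

Averaging several samples smooths out jitter; a single ping can be misleading on a busy wireless link.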

Coordinate System Synchronization

In a multiplayer AR session, it’s crucial for devices to share real-time pose data (position and rotation) to accurately render each other’s location in every frame. Consider a multiplayer AR game displaying each player’s health bar above them. To accomplish this, the devices’ coordinate systems must be synchronized, requiring the resetting of their origins to an identical point corresponding to the real world.
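
As a rough illustration of what "sharing pose data" means on the wire, the sketch below packs a position and a rotation quaternion into a fixed-size message. The `Pose` type and the 7-float layout are assumptions made for this example, not the format used by any particular SDK; multiplayer SDKs normally handle this serialization for you. Note that the receiving device can only render the pose correctly once both coordinate systems share an origin.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: 3 floats for position (meters) followed by
# 4 floats for the rotation quaternion (x, y, z, w), little-endian.
POSE_FORMAT = "<7f"

@dataclass
class Pose:
    position: tuple  # (x, y, z)
    rotation: tuple  # (x, y, z, w)

def encode_pose(pose):
    """Serialize a pose into 28 bytes."""
    return struct.pack(POSE_FORMAT, *pose.position, *pose.rotation)

def decode_pose(data):
    """Rebuild a pose from the bytes produced by encode_pose."""
    values = struct.unpack(POSE_FORMAT, data)
    return Pose(position=values[0:3], rotation=values[3:7])

if __name__ == "__main__":
    head = Pose(position=(1.0, 1.5, -2.0), rotation=(0.0, 0.0, 0.0, 1.0))
    packet = encode_pose(head)
    print(len(packet), decode_pose(packet))
```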

The "cold start" approach and the "absolute coordinate" approach

Prior to synchronizing coordinate systems in AR, it’s essential to understand the two primary localization methods. The first is the “cold start” approach, where the device enters an ARSession without any prior knowledge of its surroundings. It begins by building a SLAM (Simultaneous Localization and Mapping) map from scratch, localizing itself within this map as it’s constructed. The name SLAM accurately reflects this dual process of mapping and self-localization. Consequently, the origin of the coordinate system is established at the initialization point of the SLAM process, resulting in distinct and independent coordinate system origins for each device.

The second method is the “absolute coordinate” approach, contrasting sharply with the “cold start” method. This approach necessitates pre-scanning the environment to create a point cloud map. When the AR session begins, the device, already familiar with its surroundings, only needs to relocalize itself within the pre-established map.

The “cold start” approach’s primary advantage lies in its ability to quickly initiate an AR session anywhere. However, this speed comes at the cost of accuracy. The tracking system may experience drift over longer sessions. Moreover, due to the limited storage capacity of mobile devices, a growing SLAM map can become too large to store the entire surrounding environment. For instance, ARKit’s maximum mapping capability is approximately the scale of a room. If a user continues scanning new areas, ARKit will discard previously scanned map sections.

The “absolute coordinate” method offers enhanced accuracy and a broader map scale due to its reliance on pre-scanning. This enables scanning the same area from multiple angles and investing more time to build the point cloud map from pre-scanned images. With prior knowledge of the entire map, devices can relocalize more easily when tracking drift occurs, increasing the tracking system’s fault tolerance. Additionally, storing the point cloud map either on the device or in the cloud allows for localization in very large-scale maps. The main drawback of this method is the prerequisite of pre-scanning, limiting the application’s use to only specific, pre-scanned areas.

How to synchronize coordinate systems with these two approaches

Having explored the concepts of both the “cold start” and the “absolute coordinate” approaches, let’s return to the coordinate system synchronization problem.

In the “cold start” approach, aligning SLAM maps between devices is key. This means resetting the coordinate system origins of all devices in the network to a singular physical point.

ARKit’s ARCollaboration feature conveniently supports SLAM map alignment, enabling devices to share their SLAM map data in real-time. When a common area is detected, ARKit automatically transforms the devices’ 6DoF poses and AR anchors to the appropriate locations on each device’s local coordinate system. For successful alignment, devices must scan the same physical environment from similar perspectives. This method is relatively accurate, using numerous feature points for map alignment. However, the user experience can be challenging, as it requires users to scan a specific area for 3-15 seconds, which demands both skill and patience.

A more user-friendly alternative involves using an external marker, like a QR code, instead of aligning a section of the SLAM map. This method employs an image tracking algorithm to track the marker’s pose. Once devices detect the marker, they set their coordinate system origin to it. This approach is quicker, less computationally expensive and less technically demanding than SLAM map alignment. However, its accuracy is lower due to reliance on a single point for alignment, potentially resulting in significant drift due to the SLAM maps’ varying details on each device. Another limitation is the impracticality of users carrying an external marker image with them at all times.
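
The idea of "setting the origin to the marker" can be sketched with a little geometry. The Python example below is a simplified top-down (2D) illustration with hypothetical function names and made-up marker poses: each device knows where it sees the marker in its own local frame, so any point can be converted into the marker's frame on one device and back out of it on another. A real implementation works with full 3D poses from the image tracking system.

```python
import math

def local_to_marker(point, marker_pos, marker_yaw):
    """Express a point from a device's local frame in the marker's frame."""
    dx, dy = point[0] - marker_pos[0], point[1] - marker_pos[1]
    c, s = math.cos(-marker_yaw), math.sin(-marker_yaw)
    return (c * dx - s * dy, s * dx + c * dy)

def marker_to_local(point, marker_pos, marker_yaw):
    """Express a point from the marker's frame in a device's local frame."""
    c, s = math.cos(marker_yaw), math.sin(marker_yaw)
    return (c * point[0] - s * point[1] + marker_pos[0],
            s * point[0] + c * point[1] + marker_pos[1])

if __name__ == "__main__":
    # Assumed observations: where each device sees the same physical marker.
    marker_in_a = ((2.0, 0.0), 0.0)          # position (m), yaw (rad)
    marker_in_b = ((0.0, 3.0), math.pi / 2)  # B started elsewhere, rotated

    object_in_a = (3.0, 0.0)                 # placed by Device A
    shared = local_to_marker(object_in_a, *marker_in_a)
    object_in_b = marker_to_local(shared, *marker_in_b)
    print(shared, object_in_b)
```

Here the object sits one meter in front of the marker in the shared frame, even though its raw coordinates differ on the two devices. Any error in the detected marker pose carries over directly into this conversion.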

Addressing the issue of not having a physical marker, one solution is to display the marker on one device’s screen for other devices to scan. Yet, this method still falls short in solving the accuracy problem. Compared to the SLAM map alignment method, which involves sharing real-time SLAM map data between devices, this simpler approach of stitching maps together fails to yield better results.

Besides the methods mentioned earlier, a simpler yet effective approach for achieving coordinate system synchronization is to initialize all devices at the same physical location and orientation. While seemingly rudimentary, this method can yield surprisingly good results with minimal effort, as it eliminates the need for complex coding (coding multiplayer AR is hard). If the goal is to create a multiplayer AR demo video featuring static virtual content, this straightforward approach is worth considering.

In the “absolute coordinate” approach, synchronizing the coordinate system is much simpler. As all devices access a shared pre-scanned map, their coordinate systems are inherently identical. This method enhances the robustness and user-friendliness of multiplayer sessions. Users can easily relocalize their device by scanning any pre-scanned area, and relocalization is more likely to succeed even if the device’s tracking is lost due to excessive motion or camera obstruction. Immersal stands out as an excellent SDK for this approach, allowing devices to synchronize using the same point cloud map downloaded from the Immersal cloud server.

VPS and AR cloud

The “cold start” approach is limited to room-scale AR experiences, while Immersal can already support block-scale or even city-scale AR. Is this the limit?

Imagine a future where AR glasses are as ubiquitous as mobile phones, with everyone owning a pair. In such a scenario, connecting every AR device on a planetary scale becomes essential. This would necessitate a vast, pre-scanned map encompassing every area of Earth, allowing all AR devices to share a unified coordinate system. With this setup, everyone could see the same virtual content, precisely rendered at identical locations worldwide.

Could the Global Positioning System (GPS) be used to position devices for AR? Unfortunately, GPS accuracy is limited to meters, whereas a seamless AR experience requires centimeter-level precision. To address this, the emerging Visual Positioning System (VPS) technology offers a solution. VPS initially uses GPS to approximate its global position, then scans the surroundings to match them with a comprehensive street view database in the cloud. This process allows for relocalization within a vast, planet-scale coordinate system, with precise latitude, longitude, and altitude. Major companies like Apple and Google are at the forefront of this technology, having developed ARKit’s ARGeoAnchor system and ARCore’s Geospatial API, respectively.

However, VPS technology is still in its infancy, grappling with lengthy initialization times and coverage limited to certain cities. The key to achieving planet-scale AR lies in the development of the “AR cloud”, a foundational infrastructure for future AR devices analogous to how 5G underpins today’s mobile technology. As the AR cloud matures, the era of truly immersive multiplayer AR will dawn.

The Three Aspects of Multiplayer AR

Drawing from the concepts discussed earlier, implementing a multiplayer AR system hinges on addressing two primary challenges: network communication and coordinate system synchronization. Network communication itself splits into two sub-problems: choosing the right multiplayer SDK and selecting an appropriate transport layer. Therefore, a comprehensive multiplayer AR solution emerges from effectively resolving these three intertwined issues.

Multiplayer AR Solution = Multiplayer SDK + Network Transport + Coordinate System Synchronization Method

For instance, the HoloKit multiplayer AR game “MOFA” utilizes Unity’s Netcode for GameObjects as its multiplayer SDK, Apple’s MultipeerConnectivity framework for the transport layer, and an external marker for synchronizing the coordinate system.

Consider a multiplayer museum AR app: it could employ Unity’s Netcode for GameObjects as the multiplayer SDK, a local router for the transport layer, and Immersal SDK for coordinate system synchronization.

Apple Vision Pro’s native multiplayer solution integrates RealityKit as the multiplayer SDK, Apple’s MultipeerConnectivity framework for the transport layer, and ARKit’s SLAM map alignment for coordinate system synchronization.
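
The three examples above can be written down as data to make the "three choices" structure explicit. This is only an illustrative Python sketch; the `MultiplayerARSolution` type and its field names are hypothetical, and the values simply restate the examples given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MultiplayerARSolution:
    """One choice for each of the three aspects of a multiplayer AR system."""
    multiplayer_sdk: str
    transport: str
    coordinate_sync: str

    def describe(self):
        return f"{self.multiplayer_sdk} + {self.transport} + {self.coordinate_sync}"

MOFA = MultiplayerARSolution(
    "Netcode for GameObjects", "MultipeerConnectivity", "External marker")
MUSEUM_APP = MultiplayerARSolution(
    "Netcode for GameObjects", "Local router", "Immersal SDK")
VISION_PRO = MultiplayerARSolution(
    "RealityKit", "MultipeerConnectivity", "ARKit SLAM map alignment")

if __name__ == "__main__":
    for name, solution in [("MOFA", MOFA), ("Museum app", MUSEUM_APP),
                           ("Apple Vision Pro", VISION_PRO)]:
        print(f"{name}: {solution.describe()}")
```

Writing the combinations this way highlights that the three choices are largely independent: MOFA and the museum app share an SDK but differ in transport and synchronization method.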

The following section will delve into an analysis of the advantages and disadvantages of each solution for these three key problems.

Multiplayer SDK

Netcode for GameObjects, Unity’s official solution for multiplayer gaming, provides the fundamental features necessary for multiplayer game development. Additionally, it has a gentler learning curve than other multiplayer SDKs that boast advanced features.

Photon Fusion is another Unity multiplayer SDK, equipped with advanced features like client-side prediction and lag compensation. However, it presents a steeper learning curve than Netcode for GameObjects, coupled with fewer learning resources and a less active community.

RealityKit, Apple’s native framework for AR development, includes a built-in multiplayer solution. Being an Apple framework, it requires developers to use Swift for coding. However, its capabilities for game development are not as versatile and powerful as those of Unity, a dedicated game engine.

Transport layer

Using a local router offers convenience, enabling low latency as data doesn’t have to travel through a remote server, and it’s easy to set up in the Unity project. However, this method requires a dedicated router and requires users to manually enter IP addresses to connect devices.

Dedicated remote servers, commonly used in modern online games, offer the benefits of anytime connectivity and stable connections. However, renting or constructing such servers can be costly, and they typically have higher latency compared to local area networks.

Apple’s MultipeerConnectivity framework, the technology powering AirDrop, offers substantial bandwidth and easy connection of nearby Apple devices. As a local peer-to-peer network, it boasts low and stable latency, typically around 30ms. However, its major limitation is the restricted range; devices may disconnect if they are more than 10 meters apart.

Coordinate system synchronization

ARKit’s SLAM map alignment feature achieves relatively high accuracy by using numerous feature points for map alignment and sharing SLAM data in real-time to ensure map uniformity across devices. However, it compromises on user experience, presenting a confusing process and requiring prolonged holding times.

Compared to SLAM map alignment, external markers offer quicker setup times, resulting in a slightly improved user experience. However, their reliance on a single alignment point leads to poorer accuracy. Users may experience drifting during extended AR sessions or when the movement area expands.
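
One way to see why a single alignment point becomes less reliable as the play area grows is a simple lever-arm calculation. The Python sketch below assumes, purely for illustration, that the marker's orientation was detected with a small yaw error; the function name and the 1 degree figure are hypothetical, not measured values. A point at distance d from the marker is then displaced by the chord length 2·d·sin(θ/2).

```python
import math

def position_error_m(distance_m, yaw_error_deg):
    """Displacement of a point at distance_m from the marker when the
    alignment is rotated by yaw_error_deg around the marker."""
    return 2.0 * distance_m * math.sin(math.radians(yaw_error_deg) / 2.0)

if __name__ == "__main__":
    for distance in (1.0, 5.0, 10.0):
        error_cm = position_error_m(distance, yaw_error_deg=1.0) * 100.0
        print(f"{distance:4.1f} m from marker: about {error_cm:.1f} cm off")
```

With an assumed 1 degree error, content one meter from the marker is off by under 2 cm, while content ten meters away is off by roughly 17 cm, because the error grows linearly with distance.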

The Immersal SDK is a potent “absolute coordinate” solution. Developers must scan the environment, upload the imagery to the cloud for point cloud map construction, and then download the map or relocalize the device via the cloud. This process is robust and user-friendly within predefined areas. However, its main drawback is the lack of flexibility, restricting users to AR experiences only in specific, pre-scanned areas.

Current Situation of Multiplayer AR

Despite numerous teams announcing the implementation of multiplayer AR games over the past years, there are still no widely accepted titles in this genre. Pokémon Go, for instance, isn’t considered a strict AR game as it lacks precise tracking, not allowing multiple players to see the same Pokémon rendered at the exact same real-world position.

In AR experiences utilizing the “cold start” approach, user satisfaction cannot be consistently assured for the general public. Meanwhile, apps employing the “absolute coordinate” approach are confined to certain areas, limiting the potential user base.

Perhaps the widespread popularity of true multiplayer AR apps, offering unprecedented solutions to user problems, awaits the maturation of VPS and AR cloud technologies.

What We Offer For Multiplayer AR Now

We offer a sample project with three boilerplate scenes for developing multiplayer AR projects, accessible at holokit-multiplayer-ar-boilerplates. Designed for both "cold start" and "absolute coordinate" AR projects, these boilerplates include:

  • External Marker Relocalization Boilerplate: Utilizes an external marker to relocalize device coordinate systems. It has a relatively easy setup process. If an external marker image is accessible, this boilerplate is ideal for AR experiences involving more than three devices.

  • Dynamically Rendered Marker Relocalization Boilerplate: Employs a dynamically rendered marker, which is displayed on the host's screen and scanned by all client devices. This method is more complex and less error-tolerant compared to the external marker method, as each client device's relocalization requires the help of the host. This boilerplate is suitable for projects with two or three devices where an external marker isn't available.

  • Immersal Boilerplate: Specifically for "absolute coordinate" AR projects, it integrates the Immersal SDK for AR map pre-scanning and relocalization. It requires scanning your map using the Immersal Mapper App and importing it into the project. It's best for AR experiences in specific locations demanding high tracking accuracy.

These boilerplates are good starting points for your multiplayer AR development. You can build the sample scenes onto your iPhones, familiarize yourself with the code, and then tailor the project to your requirements. For more information on how to use these boilerplates, refer to the README file in the boilerplate repository.

Multiplayer programming is a complex endeavor, requiring the synchronization of user inputs and on-screen content across all networked devices. Programming for network objects demands a distinct approach. At the conclusion of this article, we’ll provide a list of learning resources to assist in mastering multiplayer programming.

A common misconception is that one can evolve a single-player game into a multiplayer one with ease. In reality, this often results in a disastrous need for refactoring. When developing multiplayer games, it’s essential to construct the architecture with multiplayer dynamics in mind from the outset, as there are numerous additional factors to consider in multiplayer programming.

Be aware that the spectator view feature is a key application of multiplayer AR. In this scenario, the spectator device is treated as a connected but input-free device. Lots of HoloKit demo videos have utilized this feature to capture AR experiences from a third-person perspective.

Learning Resources
