1. Introduction
In advanced societies, interest in health and wellness within buildings is steadily growing. People are reported to spend an average of 87% of their time indoors [1], so it is important to provide occupants with a comfortable and healthy indoor environment. Indoor Environmental Quality (IEQ) and its effect on occupants' health and comfort are routinely considered in new buildings. However, the existing building stock is far larger than the volume of new construction [2], and many existing buildings retain considerable social, economic, and environmental value. There is therefore a strong demand for renovating the building stock, giving existing buildings new life rather than constructing new ones.
To reduce costs and environmental impacts during the renovation process, the use of Mixed Reality (MR) technology to display interior renovation plans has been proposed [3]. MR refers to a state in which digital information is, to some extent, superimposed on the real world [4]. Introducing MR technology into the renovation design steps enables the building plan and the environment design to be developed simultaneously, which can shorten the construction period through less coordination, more timely feedback, and fewer variation orders. The MR experience also enables non-professionals to understand and participate in the design process. MR encompasses a spectrum from the real to the virtual environment: Augmented Reality (AR) enhances the real world with virtual elements, while Virtual Reality (VR) immerses users entirely in a virtual space. Diminished Reality (DR) is a lesser-known term than AR and VR, but it is an evolving concept within MR and can be understood as the removal or modification of real-world content [5]. DR technology gives users an indirect view of the world in which specific objects are rendered invisible. Since removing an existing wall is one of the basic steps in building stock renovation, DR facilitates the presentation of interior renovation plans.
There are many methods for achieving DR. The key is how to generate the erased part, which we call the DR background, so that the DR result offers a satisfying experience. In-painting is one such method: it applies textures and patch details taken directly from the source image to paint over target objects [6]. New in-painting algorithms are constantly being proposed, such as methods that incorporate Visual-SLAM (Simultaneous Localization and Mapping) to segment background surfaces using feature points [7]. However, Visual-SLAM-based methods fail on solid-colored or large, textureless walls. Moreover, the more satisfying the DR result, the more computing resources are needed [8]; calculation time increases accordingly, making real-time DR hard to realize. In addition, in-painting methods are usually used to erase small, independent objects rather than to support room-scale interior renovation designs, and they can struggle with intricate indoor environments. Building structures such as walls and columns, by contrast, are relatively large objects that are connected to their surroundings, so in-painting is ill-suited to erasing objects as large as an entire wall. In this research, we propose a novel method for generating the DR background that achieves realistic large-scale DR in real time.
Another set of methods utilizes pre-captured images or video of the background scene [9]. As new physical elements are introduced into the space, the pre-captured background images serve as a reference, providing information on the areas they conceal. However, such methods cannot display the background in real time and restrict the user's observation angle to that of the pre-captured imagery. Therefore, to achieve realistic large-scale DR, the three-dimensional (3D) DR background area must be reconstructed virtually.
Previously, a DR system utilizing multiple handheld cameras was introduced [10]. It harnessed several cameras capturing the same scene from varied angles; AR markers were employed to calibrate the cameras, enabling computation of the occluded content and the diminishing of targeted objects. Nevertheless, the DR results of this method were not consistently stable, primarily because its pixel-by-pixel comparison algorithm falters when the occluding object lacks color variation on its surface. Another advanced DR technique leverages an RGB-D camera to conceal trackable objects [11]. It employed the RGB-D camera to piece together the absent segments, followed by a color correction step to unify imagery from the different cameras. Owing to the inherent performance limitations of the RGB-D camera, the reconstructed images often exhibited gaps resembling black noise; even with in-painting applied to these inconsistencies, the overall quality of the DR results remained suboptimal, with some residual black noise persisting. These methods pre-capture an existing DR background and focus on removing physical items, whereas this study concentrates on displaying a renovation plan that may still be under design, so the captured DR background must itself be editable.
With the development of artificial intelligence, Generative Adversarial Networks (GANs) have been used to produce data via deep learning, yielding new image generation methods [12]. A stable GANs model was trained that demonstrated the generation of new examples of bedrooms [13]. Another study showed that training on synthetic images and optimizing the training material helps recover target images better, which simplifies many image editing tasks [14]. Furthermore, identifying the underlying latent variables can aid scene editing in GANs models, including for indoor scenes [14,15]. However, few GANs models are trained specifically on indoor panoramic images. Even when indoor images synthesized from multiple angles are used, it is challenging to keep objects at the same location stylistically consistent, making accurate reconstruction of the indoor environment virtually impossible with such synthesized images.
This paper presents a real-time MR system for interior renovation stakeholders that combines GANs-based DR functions with SLAM-based MR, offering an intuitive evaluation of the renovation plan and improving communication efficiency between non-experts and experts. The system proposes a method that virtually erases a real wall to connect two rooms. SLAM-based MR technology performs the registration and tracking and displays a stable background at the specified location. The DR background is generated in advance by a GANs method, and the DR results are sent to a head-mounted display (HMD) over a Wi-Fi connection in real time. Users experience the MR scenes on the HMD and study the renovation by making a wall disappear. An earlier version of this paper was presented at the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2019 conference [16]. This article presents the significant subsequent progress and new validations of our research, including the option to use the GANs method for generating DR backgrounds.
2. Literature review
Interior renovation entails upgrading and modifying existing architectural structures and elements to enhance their eco-friendliness or safety, or to meet other new demands. In the initial stages of traditional design processes, designers use blueprints to communicate with stakeholders. Given sufficient budget and time, miniature physical models of the interior design, crafted from materials such as paper or foam, are occasionally created to enable more tangible communication. Whether for hand-drawn blueprints or manually crafted physical models, the time required is typically greater than for digital methods, and human error can introduce inaccuracies in measurement or design. Moreover, modifying a manually crafted design can demand substantial time and budget, and may even necessitate redrawing the entire design, one prevalent cause of time overruns in architectural projects [17]. Employing digital methods in the initial design stage facilitates more convenient sharing with team members and clients; feedback is expedited, and the cost of modifications is reduced [18].
2.1. Interior renovation with Mixed Reality
Compared to demolition and reconstruction, interior renovation is a more eco-friendly, budget-conscious, and efficient way to give existing buildings new life. To ensure that clients without specialized knowledge obtain renovation outcomes that meet their expectations, a digitized interior model is indispensable. In contrast to technically demanding design blueprints, 3D models let clients immediately grasp the anticipated renovation results, offer feedback, and involve themselves in the design process promptly.
Traditional methods present renovation results on a 2D desktop display. By contrast, VR technology visualizes data and interior models in 3D space, letting clients immerse themselves in the renovation results from a free viewpoint, and has been proposed to bridge the communication gap between designers and their clients [19]. The reliability and applicability of VR technology within the architectural design process have been validated [20], and immersive VR offers significant advantages over traditional tools in interior architectural design [21]. Nevertheless, although VR technology for interior design has matured significantly, problems remain. VR establishes an entirely virtual environment, so the computer must generate a comprehensive interior 3D model of the building. To offer a more realistic immersive experience, locations that do not require renovation must also be modeled: even when only part of a room is being renovated, the entire room, including objects that merely serve as backdrop, must be modeled for the experience to feel authentic. These high upfront costs can cause designers to lose prospective clients who have not yet signed a contract. Moreover, the closer the VR outcome approximates reality, the larger and more expensive the required equipment becomes, further inflating costs. Examples include gigantic high-definition displays composed of multiple screens [22] and the Cave Automatic Virtual Environment (CAVE) system, in which an entire room becomes a display [23].
Accordingly, more compact and flexible AR technology has been introduced. The AR background is the real environment itself, so only added items need to be modeled, which significantly reduces upfront design costs. For simple interior renovations or indoor decoration that do not change the structural elements of the house, designers may not even be needed in the early design stages; owners without specialized knowledge can manage it themselves. With an automated interior design algorithm, rational and personalized furniture arrangements can be generated [24], and AR technology then allows a straightforward evaluation of the expected renovation results. For interior renovations that necessitate structural alterations, the house's interior structures must be modeled; directly importing models from Building Information Modeling (BIM) into the AR system is a method that is both simple and accurate [25].
However, many interior renovation designs involve not only adding structures and furniture but also removing existing structures or large, unmovable furniture. Here, DR technology becomes imperative: in contrast to AR, which adds information, DR removes it [26]. In architectural design, DR can help designers and stakeholders better understand the impact that design changes would impose on a space. DR technology is widely utilized in both indoor and outdoor architectural renovation design processes, as well as environmental design processes [27]. It has been employed to eliminate buildings and landscapes slated for demolition, and can even handle pedestrians and vehicles [28], or erase indoor furniture and decorations [29]. Its usability has been validated. This study employed multiple cameras, including a depth camera, an RGB camera, and a 360-degree camera, to achieve real-time DR for large indoor areas, providing a new approach to consensus building for complex interior renovations. However, due to device specification limitations, the real-time streaming suffered from significant latency, leading to noticeable lag; the efficiency and precision of occlusion generation also need improvement.
2.2. Generating virtual environments using GANs
With the progression of artificial intelligence, deep learning has introduced a variety of new and efficient approaches for image and model generation, among which GANs stand out. Creating 3D models is a complex and costly procedure involving 3D capture, 3D reconstruction, and rendering with computer graphics technology. GANs allow large numbers of house models to be generated automatically from simple user requirements and prompts such as keywords, sketches, or color blocks [30]. Additionally, different training datasets yield image effects with distinct styles. For renovating older houses that lack BIM data, point cloud data obtained with a laser scanner or depth camera, or photographs captured by cameras, can be fed into a GANs method for rapid automatic 3D model reconstruction [31]. These procedures are immensely beneficial in the early stages of interior renovation design using MR technology, speeding feedback throughout the process while reducing model creation costs. Furthermore, the method of capturing the target room with a 360-degree panoramic camera and employing the panoramic photos or videos as the MR environment has also been proposed [32]. In an indoor environment whose structure is not exceedingly complex, the shooting range of a 360-degree camera covers nearly the entire range observable to the human eye. Compared to generating an indoor environment from camera scanning or multi-view photos, panoramic photos complete environment data collection in an instant, giving them a dominant advantage in generation speed. Although panoramic photos have some shortcomings in image quality, several GANs-based solutions have been proposed to address this issue [33,34].
3. Proposed MR system with GANs method
3.1. Overview of the proposed methodology
As shown in Figure 1, this system consists of four well-defined steps: real-time data collection, background reconstruction (pre-process or runtime process), renovation plan digitization, and system integration. An HMD is required that is equipped with a depth camera and an RGB camera, can receive Wi-Fi data, and either runs its own operating system or can exchange data with a computer in real time. At the time of this study, only the HoloLens satisfied these requirements, leading to its selection; any future HMD meeting these conditions could equally be considered. The system includes one HMD (HoloLens Gen 1), one 360-degree camera, one Wi-Fi router, and a computer with a GPU (Graphics Processing Unit) for GANs processing. The HoloLens has four "environment understanding" cameras, one RGB camera, and one gyro sensor; this equipment provides the calibration information for registration and occlusion generation, as seen in Figure 2. The streaming data used to reconstruct the background is collected via the 360-degree camera, which is placed at the center of the room targeted for renovation. The two devices communicate via the Wi-Fi router; since Wi-Fi signals pass through walls, information can be gathered from the neighboring room. The renovation plan is built as a BIM model that provides professional information to designers, making the plan faster and easier to modify. Synchronizing the BIM model with the 3D coordinates and occlusion information not only ensures the accuracy of the MR result but also gives the user a more realistic MR experience. Apart from some preprocessing, the entire MR system operates on the HMD, so users watch the MR results on its see-through display. From the user's perspective, interior structures that need to be removed are diminished, the room behind the physical wall is displayed in front of the user, and as the user walks back and forth, the occlusion between the background room and the physical objects around the diminished wall is displayed correctly.
3.2. Real-time data collection
Real-time reconstruction of the background scene has become a key point in recent research. Numerous DR methods based on in-painting have attained real-time object removal; one such approach realized a comprehensive image completion pipeline independent of pose tracking or 3D modeling [35]. Nevertheless, the outcomes of these DR methods tend to be static and mainly target simple objects without intricate patterns. This limitation arises because the background information depends largely on the characteristics of the immediate environment: the more complex the target object, the more unpredictable the DR result. Consequently, recent years have seen a surge in methods that leverage 3D reconstruction for background scene generation. For instance, the PhotoAR+DR project advanced a reconstruction technique utilizing photogrammetry software, crafting a background 3D model from photographs of the surrounding environment [36]. The generated background is quite realistic, yet processing can take up to ten hours or even span multiple days. In a related study, another research team introduced a method to visualize concealed regions using an RGB-D camera [37]. This method can reconstruct dynamic background scenes, but the RGB-D camera outputs point cloud data, whose large file sizes impede real-time reconstruction and lead to occasional delays or missing scene portions; researchers often have to thin the point cloud density to keep the DR results consistent. In this paper, a method is introduced that gathers background scene details with a 360-degree camera. Compared to standard RGB and RGB-D cameras, the 360-degree camera offers a broader field of view and captures the complete background scene in one sweep. Importantly, the resulting panorama file is much smaller than point cloud data, so no wired connection is required for data transfer; in this study, the data from the 360-degree camera is compact enough for wireless transmission. First, a plugin is developed using the API provided by the camera manufacturer. A Wi-Fi link between the HMD and the 360-degree camera is then established. The connection relies on the HttpWebRequest class, with data fetched and dispatched via GET and POST methods. Through this wireless transmission, panoramic videos are relayed to the HMD in real time.
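For illustration, the link can be sketched in Python (the deployed plugin is written in C# around HttpWebRequest). The sketch assumes the camera's OSC-style endpoint at its default access-point address and the `camera.getLivePreview` command of the THETA API; both should be verified against the manufacturer's documentation.

```python
# Minimal sketch (Python stand-in for the C# plugin): request the THETA's
# motion-JPEG live preview over Wi-Fi and split the byte stream into frames
# using the JPEG start/end-of-image markers (0xFFD8 / 0xFFD9).
import requests

THETA_URL = "http://192.168.1.1/osc/commands/execute"  # default AP-mode address

def preview_frames():
    """Yield complete JPEG frames from the camera's live-preview stream."""
    resp = requests.post(THETA_URL,
                         json={"name": "camera.getLivePreview"},
                         stream=True)
    buf = bytearray()
    for chunk in resp.iter_content(chunk_size=4096):
        buf.extend(chunk)
        start = buf.find(b"\xff\xd8")           # JPEG start-of-image
        end = buf.find(b"\xff\xd9", start + 2)  # JPEG end-of-image
        if start != -1 and end != -1:
            yield bytes(buf[start:end + 2])     # one complete frame
            del buf[:end + 2]

if __name__ == "__main__":
    for i, frame in enumerate(preview_frames()):
        print(f"frame {i}: {len(frame)} bytes")
        if i == 9:
            break
```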
3.3. Background reconstruction
3.3.1. Panorama conversion
The proposed system receives panoramic video data byte by byte from the 360-degree camera and isolates the image of each frame from the data stream. Directly using the original panoramic image to reconstruct the background, however, would lead to significant image distortion. To address this, the panorama image must be transformed into a format compatible with the mask model to minimize distortion. The mask model acts as a 3D virtual representation of the target set for diminishment and determines the segment of the virtual room for DR that is visible in front of the occlusion model. A 1:1 hexahedron room mask model is employed to map the background image: the panoramic images are segmented into six patches and affixed to the hexahedron. The algorithm is influenced by part of a VR study on converting a cubic environment map (featuring 90-degree perspective projections onto a cube's faces) into a cylindrical panoramic image [38]. Two primary solutions exist for splitting the panoramic image: one utilizes OpenCV image processing, and the other is grounded in OpenGL. The OpenGL approach adjusts the direction of the LookAt camera to obtain six textures, after which the resulting data is read and saved. For the real-time processing of panoramic images from the data stream required in this study, OpenGL methods could strain GPU computing resources, a known limitation of HMDs; consequently, the OpenCV methods are preferred.
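As a concrete illustration of the OpenCV route, the Python sketch below extracts one 90-degree cube face (the front face) from an equirectangular frame; the full conversion repeats this for all six faces of the hexahedron. The face size and file names are illustrative assumptions.

```python
# Minimal sketch: extract the "front" cube face from an equirectangular
# panorama with OpenCV remap. Each face pixel is turned into a viewing
# direction, converted to longitude/latitude, and sampled from the panorama.
import cv2
import numpy as np

def front_face(pano: np.ndarray, size: int = 512) -> np.ndarray:
    h, w = pano.shape[:2]
    a = np.linspace(-1, 1, size)
    xv, yv = np.meshgrid(a, -a)                 # y grows upward
    x, y, z = xv, yv, np.ones_like(xv)          # front face: +Z forward
    lon = np.arctan2(x, z)                      # longitude in [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))   # latitude in [-pi/2, pi/2]
    # Map angles to equirectangular pixel coordinates.
    map_x = ((lon / np.pi + 1.0) * 0.5 * (w - 1)).astype(np.float32)
    map_y = ((0.5 - lat / np.pi) * (h - 1)).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, interpolation=cv2.INTER_LINEAR)

pano = cv2.imread("panorama.jpg")               # hypothetical input frame
cv2.imwrite("front.jpg", front_face(pano))
```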
3.3.2. Texture on the dynamic mask model
In this study, the background scene is not represented by a single image or model. Instead, a virtual room is fashioned to substitute for the diminished wall. The mask model takes the form of a hollow cube with one side absent. The system continuously captures converted images and dynamically maps them onto the mask model, excluding the wall intended for removal in the renovation design; the real-time DR results are then displayed on the mask model. Alternatively, one can switch to the rendered images generated by the pre-trained GANs and map them onto the mask model.
3.4. GANs generation
Enabling homeowners without specialized knowledge to articulate their design ideas accurately has repeatedly proven challenging. Such individuals cannot visually represent design concepts or build models as professional designers do, seldom have the time for such undertakings, and relying only on verbal communication often fails to convey their vision effectively. In response, a strategy has been developed to transform hand-drawn sketches into renderings. Drawing on the image-to-image translation technique first introduced by Isola et al. [39], a sketch-to-rendering model was trained: given a training set of paired images labeled 'A' and 'B', the model learns to translate an image of type 'A' (sketches) into an image of type 'B' (renderings). A minimal sketch of one training step is shown below.
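As an illustration of this objective, the following PyTorch sketch shows one pix2pix-style training step. The U-Net generator `G` and PatchGAN discriminator `D`, their optimizers, and the data tensors are assumed to be defined elsewhere; the L1 weight λ = 100 follows Isola et al. [39].

```python
# One pix2pix-style training step: the generator maps a sketch 'A' to a
# rendering 'B'; the discriminator judges (sketch, rendering) pairs.
import torch
import torch.nn as nn

def train_step(G, D, opt_G, opt_D, sketch, real_render, lam=100.0):
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    fake_render = G(sketch)

    # --- discriminator: real pairs -> 1, fake pairs -> 0 ---
    opt_D.zero_grad()
    d_real = D(torch.cat([sketch, real_render], dim=1))
    d_fake = D(torch.cat([sketch, fake_render.detach()], dim=1))
    loss_D = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    loss_D.backward()
    opt_D.step()

    # --- generator: fool D while staying close to ground truth (L1) ---
    opt_G.zero_grad()
    d_fake = D(torch.cat([sketch, fake_render], dim=1))
    loss_G = bce(d_fake, torch.ones_like(d_fake)) \
             + lam * l1(fake_render, real_render)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```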
3.5. Occlusion
Occlusion occurs when one object in 3D space hides another from view. For a realistic DR experience, physical objects should interact correctly with virtual ones; incorrect occlusion makes the DR result appear less realistic, as evident in Figure 3. The system proposed in this study scans and creates occlusions for real-world objects in real time. It employs the four "environment understanding" cameras of the HoloLens, which sense scene depth much like an RGB-D camera. Additionally, the system suppresses the real-time occlusion of the wall that is meant to be diminished: to ensure that the DR target wall is not obscured by occlusions detected by the "environment understanding" cameras, the scope of real-time occlusion creation is constrained, so limiting the cameras' scanning range prevents occlusion of the target wall from being produced. A conceptual sketch of depth-based occlusion follows.
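The underlying test can be summarized as a per-pixel depth comparison: a virtual-room pixel is drawn only where no real surface lies in front of it, while real geometry at or beyond the diminished wall is ignored. The following Python sketch is conceptual only, with random arrays standing in for sensor data; it is not the HoloLens implementation.

```python
# Conceptual occlusion sketch: virtual pixels behind real geometry are
# hidden, but depth samples on the diminished wall (or beyond) are ignored
# so the wall itself never occludes the virtual room behind it.
import numpy as np

real_depth = np.random.uniform(0.5, 5.0, (240, 320))  # stand-in sensor data (m)
virtual_depth = np.full((240, 320), 4.0)              # mask-model surface (m)
WALL_DEPTH = 3.0                                      # diminished wall (m)

effective = np.where(real_depth >= WALL_DEPTH - 0.05, np.inf, real_depth)
show_virtual = virtual_depth < effective              # True: draw virtual room
print(f"virtual room visible in {show_virtual.mean():.0%} of pixels")
```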
3.6. System integration
Apart from some GANs pre-processing steps, namely model training and rendering image generation, which are conducted on a computer running Ubuntu, all other modules are integrated within the game engine. The engine supports Universal Windows Platform (UWP) development; programs built via UWP run on all Windows 10 devices, including the HMD (HoloLens). Unity (2018.2) is used because it supports UWP development and includes the development kit required for this experiment. The panorama conversion function is built on OpenCV (2.2.1), whose library must be imported into the Unity Assets as a supplementary package.
4. Implementation
To validate the advantages of the proposed method, an assessment was conducted in rooms 410 and 411 of the M3 building at Suita Campus, Osaka University. Room 410 spans an area of 42.4 m², while room 411 covers 150 m². The wall designated as the DR target was situated between the two rooms, measuring 6.5 m in length and 2.6 m in height. Observation points were concentrated primarily in the right half of the room (as illustrated in Figure 4). The 360-degree camera, a THETA V, was positioned centrally in room 410; the comprehensive arrangement is depicted in Figure 4.
4.1. System data flow
The system data flow for the DR function is shown in Figure 5. Two distinct sources provide the reconstruction data: GANs-generated renderings and imagery captured by the THETA V camera positioned behind the target wall.
During the pre-processing phase, the panoramic images are produced on a GPU-equipped computer and then uploaded to the HoloLens. Numerous interior renovation renderings and their paired sketches were collected. Using edge extraction in Photoshop together with a set of custom scripts, sketches can be efficiently derived from these rendering images (an approximate OpenCV equivalent is sketched below). By training the GANs network to translate sketches into renderings, sketches drawn by homeowners can be converted into renderings with this approach.
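The paper's sketch extraction runs in Photoshop; a rough batch-processing approximation can be written in Python, assuming Canny edges on a blurred grayscale image stand in for Photoshop's edge extraction (directory names are hypothetical).

```python
# Hedged approximation of the sketch-extraction step: Canny edges on a
# blurred grayscale rendering, inverted to black lines on white paper.
import os
import cv2

SRC, DST = "renderings", "sketches"          # hypothetical folders
os.makedirs(DST, exist_ok=True)
for name in os.listdir(SRC):
    img = cv2.imread(os.path.join(SRC, name), cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (5, 5), 0)   # suppress texture noise
    edges = cv2.Canny(img, 50, 150)          # keep strong contours only
    cv2.imwrite(os.path.join(DST, name), 255 - edges)
```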
The captured panoramic video is streamed to the HoloLens through a Wi-Fi connection. The transfer protocol is built on the THETA developers' API 2.1 (THETA DEVELOPERS BETA) [40]. Within this method, the HttpWebRequest class is used to issue the POST request, after which the motion-JPEG HTTP stream from the camera is parsed to extract each frame from the streaming data, as sketched in subsection 3.2. OpenCV-based techniques then transform these panoramic images into a hexahedral map, as illustrated in Figure 6.
Each frame is then mapped onto the mask model as a texture once processing completes. The digitization of the renovation plan falls under the MR segment, supplying both the renovation plan model and the occlusion. Once both parts are finalized, the combined results are projected onto the translucent display of the HoloLens, allowing individuals to "look through" the DR target wall. The real-world view of a person experiencing the DR results can be observed in Figure 7, where the section enclosed by the red frame designates the target wall. The portion of the target wall nearer to the entrance is not selected for removal, primarily owing to its structural significance.
4.2. Renovation target room
The BIM model, depicted in Figure 8, provides stakeholders with professional information and simplifies modification of the renovation plan. At the same time, to enable the MR system to operate on older HMDs that lack real-time scanning and environment understanding cameras, the BIM model, created in Revit, was used to generate the mesh and collision data. When the user wears the HMD for observation, the MR system must discern its position relative to the room; the BIM model assists with this registration, and, coupled with the collision data, the occlusion can be calculated.
4.3. Operating environment
A prototype application was developed to check the feasibility of the proposal, built according to the methodology proposed in subsection 3.1. The BIM modeling, MR system development, and creation of GANs training materials were carried out on desktop PC (A) (specifications in Table 1). The GANs operations, which require CUDA and PyTorch and the installation of the dominate library, ran on the Ubuntu system of PC (B) (specifications in Table 2). The GPU requirements for PC (B) are relatively high; a lower-specification GPU may adversely affect the computational speed of the GANs, resulting in suboptimal calculation and processing performance. The system environment of PC (B) is shown in Table 3.
4.4. Simulation of renovation proposal
To validate the usability and effectiveness of the MR system proposed in this study, a simulation of the initial stage of a renovation project was performed for rooms 410 and 411. As shown in Figure 4, the wall between rooms 410 and 411 was eliminated using DR technology, connecting the two spaces. At the same time, participants created a hand-drawn sketch as the renovation target for room 410. To facilitate the subsequent quality verification tests, the current setup of room 410 (seen in Figure 9) serves as the renovation goal, and participants create hand-drawn interpretations of it (illustrated in Figure 10). These sketches are then input into PC (B). After processing through the GANs, the generated renderings are loaded into the HMD, allowing participants to observe the simulated renovation results. Additionally, the current state of room 410 is captured by the 360-degree camera and transmitted wirelessly to the HMD in real time, enabling participants to switch manually between the two outcomes. For a more visual explanation of the experimental setup, the system scenarios are depicted in Figure 11.
4.5. Verification of GANs generation quality
In the proposed simulation, 3044 pairs of panorama images were input as the training dataset. These comprise 2236 indoor panorama images gathered from the HDRIHaven indoor dataset [41] and the Shutterstock dataset [42], as well as 52 indoor panorama images captured with the 360-degree camera. The 52 captured images were subjected to data augmentation, including flipping, scaling, and cropping, which increased the available number to 416; a sketch of this step is given below. The paired sketches were generated using a custom Photoshop script; part of them are shown in Figure 12. To improve the quality of the renderings generated from hand drawings, we selected 56 photos with reasonable layouts and non-repetitive styles from the 2236 indoor panorama images and sketched them by hand. After data augmentation, 448 paired training samples were obtained; part of them are shown in Figure 13.
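The following Python sketch illustrates a flip/scale/crop pipeline of the kind that expands each captured panorama into eight variants (52 → 416, 56 → 448). The scale factors are illustrative assumptions rather than the exact parameters used, and note that flipping or cropping an equirectangular image slightly breaks its spherical wrap-around.

```python
# Hedged sketch: expand one panorama into eight variants via a
# horizontal flip plus three center-cropped up-scalings of each base.
import cv2

def augment(img):
    h0, w0 = img.shape[:2]
    variants = [img, cv2.flip(img, 1)]           # original + horizontal flip
    for base in list(variants):                  # snapshot of the two bases
        for s in (1.1, 1.2, 1.3):                # illustrative scale factors
            scaled = cv2.resize(base, None, fx=s, fy=s)
            h, w = scaled.shape[:2]
            y0, x0 = (h - h0) // 2, (w - w0) // 2
            variants.append(scaled[y0:y0 + h0, x0:x0 + w0])  # center crop
    return variants                              # 2 + 2*3 = 8 images

pano = cv2.imread("captured_room.jpg")           # hypothetical input
print(len(augment(pano)), "variants generated")  # -> 8
```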
To quantitatively assess the quality of the panorama images generated by the GANs, this study compares the GANs-generated results with the correct results, which correspond to the existing arrangement of room 410 discussed in section 4.4. Additionally, nine indoor panoramic images not included in the training dataset were chosen for verification. The evaluation uses the Structural Similarity Index Measure (SSIM) [43] and the Peak Signal-to-Noise Ratio (PSNR) [44]; both are related, complementary Full-Reference Image Quality Assessment (FR-IQA) methods and are widely recognized as fundamental, universal image evaluation indices [45].
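The comparison itself reduces to a few library calls; the following Python sketch computes both indices with scikit-image (file names are hypothetical).

```python
# Minimal evaluation sketch: PSNR and SSIM between a GANs-generated
# panorama and its reference image, computed with scikit-image.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = cv2.imread("reference.png")    # correct result (room 410)
gen = cv2.imread("generated.png")    # GANs output, same resolution

psnr = peak_signal_noise_ratio(ref, gen, data_range=255)
ssim = structural_similarity(ref, gen, channel_axis=2, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```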
5. Results
5.1. Numerical results
The PSNR measures the relationship between the maximum possible value of a signal and the noise distortion that affects the quality of its representation. A higher PSNR value indicates superior reconstruction (or compression) quality due to reduced distortion or noise, whereas a lower value signals a decline in quality. PSNR is quantified in decibels (dB). Based on general guidelines [46,47], a rough interpretation is as follows (the defining formula is given after the list):
- Under 15 dB: unacceptable.
- 15-25 dB: poor quality, with possibly noticeable distortions or artifacts.
- 25-30 dB: medium quality; acceptable for some applications, but perhaps not for high-quality needs.
- 30-35 dB: good quality, acceptable for most applications.
- 35-40 dB: very good quality.
- 40 dB and above: excellent quality, with differences from the original image almost indistinguishable.
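For reference, PSNR is defined via the mean squared error (MSE) between the m×n reference image I and test image K, where MAX_I is the maximum possible pixel value (255 for 8-bit images):

$$\mathrm{MSE}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(i,j)-K(i,j)\big]^2,\qquad \mathrm{PSNR}=10\log_{10}\!\left(\frac{MAX_I^{2}}{\mathrm{MSE}}\right)$$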
The SSIM functions as an index that predicts the perceived quality of an image relative to an original (reference) image. Its value ranges from -1 to 1, where an SSIM of 1 signifies that the test image matches the reference perfectly; the closer the SSIM value is to 1, the greater the structural congruence between the two compared images. A general interpretation of SSIM values is as follows [45,46], with the index's definition given after the list:
- SSIM = 1: the test image is identical to the reference; perfect structural similarity.
- 0.8 < SSIM < 1: high similarity between the two images.
- 0.5 < SSIM ≤ 0.8: moderate similarity; there may be some noticeable distortions, but the overall structure remains broadly consistent with the reference image.
- 0 < SSIM ≤ 0.5: low similarity; significant structural differences or distortions are present.
- SSIM = 0: no structural information is shared between the two images.
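Likewise, SSIM for image patches x and y is computed from the local means μx and μy, variances σx² and σy², and covariance σxy, with stabilizing constants c1 = (k1 L)² and c2 = (k2 L)², where k1 = 0.01, k2 = 0.03, and L is the dynamic range of the pixel values [43]:

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^{2}+\mu_y^{2}+c_1)(\sigma_x^{2}+\sigma_y^{2}+c_2)}$$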
Numerical results verifying the quality of the simulated renovation plans are shown in Table 4. The PSNR and SSIM scores show that most images generated by the GANs are of moderate quality and bear a reasonable similarity to the correct result images. The image with serial number 1 came from a hand-drawn sketch of room 410, while the rest came from script-processed and manually corrected hand-drawn sketches. All of the results hover around 20 dB, which falls within an acceptable range. This indicates that the GANs were trained effectively and can produce images that are both qualitatively good and structurally consistent with the expected outcomes. While the quality of renovation renderings produced by GANs may not match commercial renderings that require modeling and rendering processes, the low cost of a single hand-drawn sketch and its significantly faster generation speed offer huge advantages in the initial stages of the renovation process.
The GANs outcomes generated from training datasets of different sizes are depicted in Figure 14. The more images used for training, the better the resulting image quality; moreover, as the level of detail in the hand-drawn sketches increases, so does the quality of the resulting image.
5.2. Visualization results
The original state of the target wall is depicted in Figure 7, and Figure 15 presents the experimental outcomes of the proposed MR system together with 2D segmentation maps. Within these maps, the grey sections represent real-world objects, whereas the blue sections denote the background scene. The five planes of the mask model align seamlessly with the real-world positions of the walls, ceiling, and floor of room 410. As a result, as users move about, they accurately perceive the spatial sense of the virtual room 410. When viewed through the HMD, the DR results give users the impression that the target wall has vanished. Additionally, activities occurring in room 410 behind the target wall are instantly mirrored in the virtual room. The alignment function remains steady, and the virtual object displays without drift or disappearance.
However, the occlusion is not always accurately represented. Analysis of the segmentation map shows that parts of the occlusion edge are displayed incorrectly, and certain virtual elements that should not appear in the DR results show up sporadically. This inaccuracy stems from the imprecision of the occlusion generated by real-time scanning, whose primary cause is limited scanning precision; given the performance constraints of the HoloLens, inherent systematic errors are also present in the results. Beyond the uneven edge, the whiteboard, which should appear in front of the virtual room, is obscured by virtual components. Even though the span for real-time occlusion generation is set between the whiteboard and the target wall, the two are so close together that errors occasionally result.
Additionally, the DR result is dynamic, but the refresh rate remains low. When this DR system was tested on a PC platform, it ran smoothly in real time; on the HoloLens, however, the image transmission speed is constrained by the device's bandwidth, leading to reduced frame rates.
6. Discussion
Using only algorithms to infer the background makes generating content for intricate backgrounds challenging, especially in the absence of repeating patterns. Consequently, this research introduced the approach of employing 360-degree camera data to reconstruct the background. A 360-degree camera allows a rapid scan of the entire DR background, and transmitting its data via Wi-Fi expands the range of use scenarios, ensuring that even in rooms with unusual layouts, distance constraints are minimized and real-time DR backgrounds can still be provided. This method is effective not only for complex backgrounds but also for extensive diminished targets: for large-scale DR targets there is no need for multiple cameras or multiple shots of the DR background from different angles, since panoramic images reconstruct the DR background more conveniently. However, overly large DR targets might compromise the outcome's quality due to insufficient image resolution.
Though GANs-generated renovation renderings used as backgrounds may not rival commercial ones, the speed and low cost of hand-drawn sketches offer significant advantages in the initial stages of the renovation process. Moreover, generating renderings from just a sketch not only enhances communication efficiency between designers and clients but also boosts clients' enthusiasm to participate; even younger individuals might offer valuable design insights. On the other hand, using panoramic cameras to capture images, or collecting copyright-free panoramas for training datasets, is time-consuming and often yields inconsistent quality. Owing to the lack of extraterrestrial data, methods for gathering training data on artificially constructed lunar terrains within a VR environment have previously been introduced [48]. When a house's BIM model is available, importing it into a VR environment to collect panoramic images as training data is a viable approach. Compared to relying solely on basic data augmentation techniques, this method yields significantly higher-quality training data, and the models trained with it align more closely with the style of the house before renovation.
Latency challenges are frequent in streaming media transmission. In this research, compromising video quality was not an option, as a blurred virtual room would adversely impact the entire DR experience. As a solution, the refresh rate was reduced and the program was adapted to refresh the background manually: a button was added to the user interface which, when pressed, refreshes the background.
7. Conclusions
In this paper, a novel MR system is introduced that diminishes a wall, revealing real-time details of the current background scene. Proper occlusion seamlessly blends the virtual and real worlds, enhancing the realism of the MR experience. This system allows designers to quickly modify renovation plans and visualize the effects. For occupants, the DR experience clarifies the designer's intentions and provides a vivid preview of the renovation outcomes.
At present, the system operates within a LAN; for future development, creating a relay server would enable WAN operation, allowing a remote room to be reconstructed for the user. In addition, during image conversion, the segmentation parameters are currently set manually to ensure that wall edges align correctly with the mask model's surfaces.
Regarding the image quality produced by the GANs, it is evident from Figure 14 that including more training images yields a noticeable enhancement in generated image quality. Although thousands of images have been used, the outcomes still differ significantly from photos taken by cameras or from rendered images. Future work should therefore consider more efficient methods for creating training sets. Manually crafted sketches are time-consuming and costly but yield the best results; script-generated sketches are produced quickly, but models trained on them are not entirely adept at processing sketches drawn by non-professionals. Furthermore, for GANs algorithms there is no distinction between panorama images and regular photos in terms of learning and computation; for non-professionals, however, making renovation plan adjustments on panorama images presents certain challenges. The approach in this study converts the images into cube maps for modification, but this method suits only minor image changes, and modifying parts located at the edge of the image is not straightforward due to distortion. This remains one of the critical challenges to address in future research.
Author Contributions
Conceptualization, Y.Z. and T.F.; methodology, Y.Z. and T.F.; software, Y.Z.; resources, T.F. and N.Y.; writing—original draft preparation, Y.Z.; writing—review and editing, T.F. and N.Y.; visualization, Y.Z. and T.F.; funding acquisition, T.F. All authors have read and agreed to the published version of the manuscript.
Acknowledgments
This research has been partly supported by JSPS KAKENHI Grant Number JP16K00707.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Klepeis, N. E., Nelson, W. C., Ott, W. R., Robinson, J. P., Tsang, A. M., Switzer, P., ... & Engelmann, W. H. (2001). The National Human Activity Pattern Survey (NHAPS): A resource for assessing exposure to environmental pollutants. Journal of Exposure Science & Environmental Epidemiology, 11(3), 231-252. [CrossRef]
- Kovacic, I., Summer, M., & Achammer, C. (2015). Strategies of building stock renovation for ageing society. Journal of cleaner production, 88, 349-357. [CrossRef]
- Zhu, Y., Fukuda, T., & Yabuki, N. (2018, May). SLAM-based MR with animated CFD for building design simulation. In Proceedings of the 23rd International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), Tsinghua University, School of Architecture, Beijing, China (pp. 17-19). [CrossRef]
- Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, 77, 1321-1329.
- Nakajima, Y., Mori, S., & Saito, H. (2017, October). Semantic object selection and detection for diminished reality based on SLAM with viewpoint class. In 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct) (pp. 338-343). IEEE. [CrossRef]
- Herling, J., & Broll, W. (2012, November). PixMix: A real-time approach to high-quality diminished reality. In 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 141-150). IEEE. [CrossRef]
- Kawai, N., Sato, T., & Yokoya, N. (2013, October). Diminished reality considering background structures. In 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 259-260). IEEE. [CrossRef]
- Sasanuma, H., Manabe, Y., & Yata, N. (2016, September). Diminishing real objects and adding virtual objects using a RGB-D camera. In 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct) (pp. 117-120). IEEE. [CrossRef]
- Cosco, F. I., Garre, C., Bruno, F., Muzzupappa, M., & Otaduy, M. A. (2009, October). Augmented touch without visual obtrusion. In 2009 8th IEEE International Symposium on Mixed and Augmented Reality (pp. 99-102). IEEE. [CrossRef]
- Enomoto, A., & Saito, H. (2007, November). Diminished reality using multiple handheld cameras. In Proc. ACCV (Vol. 7, pp. 130-135). [CrossRef]
- Meerits, S., & Saito, H. (2015, September). Real-time diminished reality for dynamic scenes. In 2015 IEEE International Symposium on Mixed and Augmented Reality Workshops (pp. 53-59). IEEE. [CrossRef]
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27. [CrossRef]
- Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. [CrossRef]
- Zhu, J., Shen, Y., Zhao, D., & Zhou, B. (2020, August). In-domain gan inversion for real image editing. In European conference on computer vision (pp. 592-608). Cham: Springer International Publishing. [CrossRef]
- Yang, C., Shen, Y., & Zhou, B. (2021). Semantic hierarchy emerges in deep generative representations for scene synthesis. International Journal of Computer Vision, 129, 1451-1466. [CrossRef]
- Zhu, Y., Fukuda, T., & Yabuki, N. (2019, April). Synthesizing 360-degree live streaming for an erased background to study renovation using mixed reality. In Proceedings of the 24th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), Victoria University of Wellington, Faculty of Architecture & Design (pp. 71-80). [CrossRef]
- Chan, D. W., & Kumaraswamy, M. M. (1997). A comparative study of causes of time overruns in Hong Kong construction projects. International Journal of project management, 15(1), 55-63. [CrossRef]
- Watson, A. (2011). Digital buildings–Challenges and opportunities. Advanced engineering informatics, 25(4), 573-581. [CrossRef]
- Ozdemir, M. A. (2021). Virtual reality (VR) and augmented reality (AR) technologies for accessibility and marketing in the tourism industry. In ICT tools and applications for accessible tourism (pp. 277-301). IGI Global. [CrossRef]
- Woksepp, S., & Olofsson, T. (2008). Credibility and applicability of virtual reality models in design and construction. Advanced Engineering Informatics, 22(4), 520-528. [CrossRef]
- Lee, J., Miri, M., & Newberry, M. (2023). Immersive virtual reality, tool for accessible design: Perceived usability in an interior design studio setting. Journal of Interior Design, 10717641231182981. [CrossRef]
- Chandler, T., Cordeil, M., Czauderna, T., Dwyer, T., Glowacki, J., Goncu, C., ... & Wilson, E. (2015, September). Immersive analytics. In 2015 Big Data Visual Analytics (BDVA) (pp. 1-8). IEEE. [CrossRef]
- Ohno, N., & Kageyama, A. (2007). Scientific visualization of geophysical simulation data by the CAVE VR system with volume rendering. Physics of the Earth and Planetary Interiors, 163(1-4), 305-311. [CrossRef]
- Kan, P., Kurtic, A., Radwan, M., & Rodriguez, J. M. L. (2021). Automatic interior Design in Augmented Reality Based on hierarchical tree of procedural rules. Electronics, 10(3), 245. [CrossRef]
- Wang, X., Love, P. E., Kim, M. J., Park, C. S., Sing, C. P., & Hou, L. (2013). A conceptual framework for integrating building information modeling with augmented reality. Automation in Construction, 34, 37-44. [CrossRef]
- Mori, S., Ikeda, S., & Saito, H. (2017). A survey of diminished reality: Techniques for visually concealing, eliminating, and seeing through real objects. IPSJ Transactions on Computer Vision and Applications, 9(1), 1-14. [CrossRef]
- Eskandari, R., & Motamedi, A. (2021). Diminished reality in architectural and environmental design: Literature review of techniques, applications, and challenges. In Proceedings of the International Symposium on Automation and Robotics in Construction (ISARC) (Vol. 38, pp. 995-1001). [CrossRef]
- Kido, D., Fukuda, T., & Yabuki, N. (2020). Diminished reality system with real-time object detection using deep learning for onsite landscape simulation during redevelopment. Environmental Modelling & Software, 131, 104759. [CrossRef]
- Siltanen, S. (2017). Diminished reality for augmented reality interior design. The Visual Computer, 33, 193-208. [CrossRef]
- Liu, M. Y., Huang, X., Yu, J., Wang, T. C., & Mallya, A. (2021). Generative adversarial networks for image and video synthesis: Algorithms and applications. Proceedings of the IEEE, 109(5), 839-862. [CrossRef]
- Yun, K., Lu, T., & Chow, E. (2018, April). Occluded object reconstruction for first responders with augmented reality glasses using conditional generative adversarial networks. In Pattern Recognition and Tracking XXIX (Vol. 10649, pp. 225-231). SPIE. [CrossRef]
- Teo, T., Lawrence, L., Lee, G. A., Billinghurst, M., & Adcock, M. (2019, May). Mixed reality remote collaboration combining 360 video and 3d reconstruction. In Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1-14). [CrossRef]
- Li, Q., Takeuchi, O., Shishido, H., Kameda, Y., Kim, H., & Kitahara, I. (2022). Generative image quality improvement in omnidirectional free-viewpoint images and assessments. IIEEJ Transactions on Image Electronics and Visual Computing, 10(1), 107-119. [CrossRef]
- Lin, C. H., Chang, C. C., Chen, Y. S., Juan, D. C., Wei, W., & Chen, H. T. (2019). COCO-GAN: Generation by parts via conditional coordinating. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4512-4521). [CrossRef]
- Herling, J., & Broll, W. (2010, October). Advanced self-contained object removal for realizing real-time diminished reality in unconstrained environments. In 2010 IEEE International Symposium on Mixed and Augmented Reality (pp. 207-212). IEEE. [CrossRef]
- Inoue, K., Fukuda, T., Cao, R., & Yabuki, N. (2018). Tracking Robustness and Green View Index Estimation of Augmented and Diminished Reality for Environmental Design. Proceedings of CAADRIA 2018, 339-348. [CrossRef]
- Meerits, S., & Saito, H. (2015, August). Visualization of dynamic hidden areas by real-time 3D structure acquisition using RGB-D camera. In 3D Systems and Applications Conference. [CrossRef]
- “Converting to/from cubemaps”: 2018. Available from <https://paulbourke.net/panorama/cubemaps//> (Accessed: 05-10-2023).
- Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134). [CrossRef]
- “THETA developers’ API”: 2018. Available from <https://github.com/ricohapi/theta-api-specs> (Accessed: 05-10-2023).
- HDRIHaven. https://polyhaven.com/hdris/indoor. Accessed: 05-10-2023.
- Shutterstock. https://www.shutterstock.com/ja/search/indoor-panorama. Accessed: 05-10-2023.
- Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612. [CrossRef]
- Huynh-Thu, Q., & Ghanbari, M. (2008). Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13), 800-801. [CrossRef]
- Horé, A., & Ziou, D. (2013). Is there a relationship between peak-signal-to-noise ratio and structural similarity index measure? IET Image Processing, 7(1), 12-24. [CrossRef]
- Jing, J., Deng, X., Xu, M., Wang, J., & Guan, Z. (2021). HiNet: deep image hiding by invertible network. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 4733-4742). [CrossRef]
- Cavigelli, L., Hager, P., & Benini, L. (2017, May). CAS-CNN: A deep convolutional neural network for image compression artifact suppression. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 752-759). IEEE. [CrossRef]
- Franchi, V., & Ntagiou, E. (2021, November). Augmentation of a virtual reality environment using generative adversarial networks. In 2021 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR) (pp. 219-223). IEEE. [CrossRef]
Figure 1. System overview.
Figure 2. Wearing HoloLens Gen 1 (left); HoloLens Gen 1 appearance (right).
Figure 3. DR result without occlusion.
Figure 4. Room floor plan.
Figure 5. System data flow.
Figure 6. Panorama image conversion.
Figure 7. Reality view (red frame: DR target wall).
Figure 8. Room 410 BIM model.
Figure 9. The current furnishings of room 410.
Figure 10. Hand-drawn current furnishings of room 410.
Figure 11. System scenarios.
Figure 12. Part of the paired training data.
Figure 13. Part of the augmented images with paired hand-drawing images.
Figure 14. Results generated by GANs with different training datasets: 50 images (top), 200 images (middle), 3044 images (bottom).
Figure 15. DR result and its segmentation map at positions A (top), B (middle), C (bottom).
Table 1. Desktop PC (A) specifications.

ITEM | PERFORMANCE
OS | Windows 10 Enterprise 64-bit
CPU | Intel Core i5 7500 @ 3.40GHz
RAM | 16.0GB Dual-Channel DDR4 @ 2400MHz
MOTHERBOARD | H270-PLUS
GPU | NVIDIA GeForce GTX 1060 6GB
Table 2. Desktop PC (B) specifications.

ITEM | PERFORMANCE
OS | Ubuntu 16.04
CPU | Intel Core i7 7700K @ 4.2GHz
RAM | 16.0GB Dual-Channel DDR4 @ 2400MHz
MOTHERBOARD | Z270-K
GPU | NVIDIA GeForce RTX 2070 SUPER
Table 3. System environment of PC (B).

PACKAGE | VERSION
CUDA Toolkit | CUDA 10.0.130_410.48
Linux x86_64 Driver | NVIDIA driver 410.78
cuDNN | 7.4.2.24
Anaconda3 | 2021.05
PyTorch | 1.2.0
Torchvision | 0.4.0
Table 4. PSNR & SSIM scores for 10 image pairs.

IMAGE NUMBER | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
PSNR (dB) | 15.12 | 22.42 | 21.52 | 20.95 | 21.62 | 23.41 | 24.29 | 20.12 | 18.57 | 21.49
SSIM | 0.7332 | 0.9265 | 0.8546 | 0.8956 | 0.8103 | 0.9187 | 0.9036 | 0.8452 | 0.8419 | 0.8516
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).