Abstract:
This study proposes a room impulse response (RIR) computation model tailored for virtual reality applications, combining deep neural networks with psychoacoustic perception parameters. The model efficiently predicts perceptually meaningful, high-quality RIRs from virtual scene data, satisfying the requirements of real-time generation, high sampling rate, unrestricted length, and lightweight implementation for virtual reality audio. It first encodes the acoustic information of the scene with a graph convolutional neural network, then decodes this representation through a neural sound field and a transposed convolution model to obtain perceptual RIR parameters, from which the RIR signal is finally reconstructed. Experimental results demonstrate that the proposed model offers significant advantages in RIR generation quality, computational efficiency, and functionality, making it well suited to the real-time RIR generation needs of virtual reality audio.
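
To make the encode-decode pipeline concrete, the following is a minimal PyTorch sketch of the kind of architecture the abstract describes. All names (`SimpleGCNLayer`, `RIRPipeline`), layer sizes, the mean-pool over the scene graph, and the position-conditioning scheme are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch: GCN encoder -> neural-field-style conditioning ->
# transposed-convolution decoder producing perceptual RIR parameters.
# All dimensions and module choices are assumptions for illustration.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbor features via a
    normalized adjacency matrix, then apply a learned linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # x: (num_nodes, in_dim); adj_norm: (num_nodes, num_nodes)
        return torch.relu(self.linear(adj_norm @ x))

class RIRPipeline(nn.Module):
    """Encoder: GCN over the scene graph -> latent acoustic code.
    Decoder: an MLP (standing in for the neural sound field) conditioned
    on source/listener positions, then transposed 1-D convolutions that
    upsample to a track of perceptual RIR parameters."""
    def __init__(self, node_dim=16, latent_dim=64, param_dim=8):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(node_dim, 32)
        self.gcn2 = SimpleGCNLayer(32, latent_dim)
        self.field = nn.Sequential(          # neural-field-style conditioning
            nn.Linear(latent_dim + 6, 128),  # +6 for source/listener xyz
            nn.ReLU(),
            nn.Linear(128, 128),
        )
        self.decoder = nn.Sequential(        # upsample code -> parameter track
            nn.ConvTranspose1d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(64, param_dim, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, nodes, adj_norm, src_lis):
        h = self.gcn2(self.gcn1(nodes, adj_norm), adj_norm)
        latent = h.mean(dim=0)                        # pool graph to one code
        z = self.field(torch.cat([latent, src_lis]))  # condition on positions
        # treat the 128-dim code as a length-1 sequence and upsample it
        return self.decoder(z.view(1, 128, 1))        # (1, param_dim, 4)

# Toy usage: 5 scene-surface nodes with random geometry/material features.
nodes = torch.randn(5, 16)
adj = torch.eye(5)                                    # placeholder adjacency
params = RIRPipeline()(nodes, adj, torch.randn(6))
print(params.shape)                                   # torch.Size([1, 8, 4])
```

In a full system, the decoded parameter track would feed a separate reconstruction stage that synthesizes the time-domain RIR, which is what keeps the learned model lightweight and its output length unrestricted.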