V-Cloak

Application Scenario of V-Cloak

V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization

Jiangyi Deng¹, Fei Teng¹, Yanjiao Chen¹, Xiaofu Chen², Zhaohui Wang², Wenyuan Xu¹

¹Zhejiang University, ²Wuhan University

The paper has been accepted to the USENIX Security Symposium 2023.

Abstract

Voice data generated on instant messaging or social media applications contains unique user voiceprints that may be abused by malicious adversaries for identity inference or identity theft. In this paper, we develop a voice anonymization system, named V-Cloak, which attains real-time voice anonymization while preserving the intelligibility, naturalness, and timbre of the audio.

We have conducted extensive experiments on four datasets, i.e., LibriSpeech (English), AISHELL (Chinese), CommonVoice (French), and CommonVoice (Italian); five Automatic Speaker Verification (ASV) systems (two DNN-based, two statistical, and one commercial); and eleven Automatic Speech Recognition (ASR) systems (for different languages). The results demonstrate the effectiveness, robustness, and efficiency of V-Cloak.

We hope that V-Cloak can provide a cloak for us in a prism world.

Audio Demos

This section provides audio samples generated by V-Cloak and four previous works. Each group contains the raw recording (B0), the four baselines (B1 to B4), and the V-Cloak output.

Group 1: Speaker #1188 (Male)

B0: Raw
B1: NSF
B2: HFGAN
B3: McAdams
B4: VoiceMask
V-Cloak (ε=0.1)

Group 2: Speaker #61 (Male)

B0: Raw
B1: NSF
B2: HFGAN
B3: McAdams
B4: VoiceMask
V-Cloak (ε=0.1)

Group 3: Speaker #2961 (Female)

B0: Raw
B1: NSF
B2: HFGAN
B3: McAdams
B4: VoiceMask
V-Cloak (ε=0.1)

Group 4: Speaker #3575 (Female)

B0: Raw
B1: NSF
B2: HFGAN
B3: McAdams
B4: VoiceMask
V-Cloak (ε=0.1)

Different Anonymization Levels

This section presents audio anonymized by V-Cloak with different values of ε.

Speaker #7021 (Male)

B0: Raw
ε=0.02
ε=0.04
ε=0.06
ε=0.08
ε=0.10
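
This page does not define ε. As a rough illustration only, the sketch below assumes ε acts as an L∞-style budget: a cap on how far each sample of the anonymized waveform may move from the original, with samples normalized to [-1, 1]. The functions `clip_perturbation` and `max_deviation`, the sine-tone input, and the random "candidate" output are all hypothetical stand-ins and are not V-Cloak's actual model or API.

```python
import math
import random


def clip_perturbation(original, candidate, eps):
    """Keep each sample of `candidate` within `eps` of `original`.

    Assumption: samples are floats in [-1.0, 1.0] and `eps` is an
    L-infinity budget. Illustrative only, not V-Cloak's actual method.
    """
    bounded = []
    for x, y in zip(original, candidate):
        delta = max(-eps, min(eps, y - x))              # limit the per-sample change
        bounded.append(max(-1.0, min(1.0, x + delta)))  # keep a valid waveform
    return bounded


def max_deviation(a, b):
    """Largest absolute per-sample difference between two waveforms."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical input: 10 ms of a 440 Hz tone at 16 kHz standing in for speech.
rng = random.Random(0)
original = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
# Stand-in for an unconstrained anonymizer output (random, for illustration).
candidate = [x + rng.uniform(-0.3, 0.3) for x in original]

# The same ε values as the demo levels listed above.
for eps in (0.02, 0.04, 0.06, 0.08, 0.10):
    out = clip_perturbation(original, candidate, eps)
    print(f"eps={eps:.2f}  max deviation={max_deviation(original, out):.4f}")
```

Under this assumption, a larger ε permits a larger change to the waveform, which is one way to read the progression from ε=0.02 to ε=0.10 in the samples above.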