nnsvs.gen.predict_waveform

nnsvs.gen.predict_waveform(device, multistream_features, vocoder=None, vocoder_config=None, vocoder_in_scaler=None, sample_rate=48000, frame_period=5, use_world_codec=True, feature_type='world', vocoder_type='world', vuv_threshold=0.5)

Predict waveform from multi-stream acoustic features

The vocoder can be one of: 1) WORLD, 2) PWG or 3) uSFGAN.

Parameters:

device (torch.device) – Device to run inference on.
multistream_features (tuple) – Multi-stream acoustic features.
vocoder (nn.Module) – Vocoder model
vocoder_config (dict) – Vocoder config
vocoder_in_scaler (StandardScaler) – Vocoder input scaler
sample_rate (int) – Sampling rate in Hz.
frame_period (float) – Frame period in msec.
use_world_codec (bool) – Whether to use WORLD codec for decoding.
feature_type (str) – Feature type. One of world, world_org, melf0 or neutrino.
vocoder_type (str) – Vocoder type. One of world, pwg or usfgan.
vuv_threshold (float) – VUV threshold.
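
To make the units of sample_rate, frame_period and vuv_threshold concrete, here is a minimal sketch in plain Python. Both helpers (samples_per_frame and binarize_vuv) are hypothetical names introduced for illustration and are not part of nnsvs. The strict greater-than comparison against the threshold is an assumption about how the voiced/unvoiced decision is made, not something this page confirms.

```python
def samples_per_frame(sample_rate: int, frame_period_ms: float) -> int:
    """Number of waveform samples covered by one acoustic feature frame."""
    return int(round(sample_rate * frame_period_ms / 1000.0))


def binarize_vuv(vuv_scores, vuv_threshold=0.5):
    """Hypothetical helper: flag a frame as voiced when its score exceeds the threshold.

    Assumption: the comparison is strict (score > threshold).
    """
    return [score > vuv_threshold for score in vuv_scores]


# Defaults taken from the signature: sample_rate=48000, frame_period=5 (msec).
hop = samples_per_frame(48000, 5)

# Made-up per-frame voicing scores, for illustration only.
flags = binarize_vuv([0.1, 0.5, 0.9], vuv_threshold=0.5)

print(hop)
print(flags)
```

With the default settings, each feature frame therefore corresponds to a fixed number of output samples, so changing sample_rate without adjusting the features changes the duration of the synthesized audio.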

Returns:

Predicted waveform

Return type:

np.ndarray