nnsvs.gen.predict_waveform
- nnsvs.gen.predict_waveform(device, multistream_features, vocoder=None, vocoder_config=None, vocoder_in_scaler=None, sample_rate=48000, frame_period=5, use_world_codec=True, feature_type='world', vocoder_type='world', vuv_threshold=0.5)
Predict waveform from multi-stream acoustic features
Vocoders can be 1) WORLD, 2) PWG or 3) uSFGAN.
- Parameters:
device (torch.device) – Device to run inference
multistream_features (tuple) – Multi-stream acoustic features
vocoder (nn.Module) – Vocoder model
vocoder_config (dict) – Vocoder config
vocoder_in_scaler (StandardScaler) – Vocoder input scaler
sample_rate (int) – Sampling rate in Hz.
frame_period (float) – Frame period in msec.
use_world_codec (bool) – Whether to use WORLD codec for decoding.
feature_type (str) – Feature type: world, world_org, melf0, or neutrino.
vocoder_type (str) – Vocoder type: world, pwg, or usfgan.
vuv_threshold (float) – VUV threshold.
- Returns:
Predicted waveform
- Return type:
np.ndarray
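The following is a minimal sketch of how sample_rate, frame_period, and vuv_threshold relate to each other, using the defaults from the signature above. The helper names frames_to_samples and binarize_vuv are hypothetical and are not part of nnsvs. The strict "greater than" comparison and the exact output length are assumptions for illustration, so the library's internal behavior may differ.

```python
def frames_to_samples(num_frames, sample_rate=48000, frame_period=5):
    """Approximate waveform length (in samples) for a number of feature frames.

    Hypothetical helper: frame_period is in msec, so one frame spans
    sample_rate * frame_period / 1000 samples.
    """
    hop_size = int(sample_rate * frame_period / 1000)  # samples per frame
    return num_frames * hop_size


def binarize_vuv(vuv, vuv_threshold=0.5):
    """Mark a frame as voiced (1) when its V/UV score exceeds the threshold.

    Assumption: a strict '>' comparison; the library may treat the
    boundary value differently.
    """
    return [1 if v > vuv_threshold else 0 for v in vuv]


hop = frames_to_samples(1)             # samples covered by a single frame
one_second = frames_to_samples(200)    # 200 frames of 5 msec each
flags = binarize_vuv([0.1, 0.5, 0.9])  # per-frame voiced/unvoiced flags
```

With the default settings, one frame covers 240 samples, so 200 frames correspond to roughly one second of audio at 48 kHz.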