nnsvs.gen.predict_waveform

nnsvs.gen.predict_waveform(device, multistream_features, vocoder=None, vocoder_config=None, vocoder_in_scaler=None, sample_rate=48000, frame_period=5, use_world_codec=True, feature_type='world', vocoder_type='world', vuv_threshold=0.5)

Predict a waveform from multi-stream acoustic features.

The vocoder can be 1) WORLD, 2) Parallel WaveGAN (PWG), or 3) uSFGAN.

Parameters:
  • device (torch.device) – Device to run inference on

  • multistream_features (tuple) – Multi-stream acoustic features

  • vocoder (nn.Module) – Vocoder model

  • vocoder_config (dict) – Vocoder config

  • vocoder_in_scaler (StandardScaler) – Vocoder input scaler

  • sample_rate (int) – Sampling rate.

  • frame_period (float) – Frame period in msec.

  • use_world_codec (bool) – Whether to use WORLD codec for decoding.

  • feature_type (str) – Feature type: world, world_org, melf0, or neutrino.

  • vocoder_type (str) – Vocoder type: world, pwg, or usfgan.

  • vuv_threshold (float) – Threshold for the voiced/unvoiced (VUV) decision.

Returns:

Predicted waveform

Return type:

np.ndarray
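
The following minimal sketch illustrates how sample_rate, frame_period, and vuv_threshold relate to one another. It is not taken from the nnsvs source: the helper names samples_per_frame and apply_vuv_threshold are hypothetical, and the strict "greater than" comparison for voicing is an assumption made for illustration.

```python
def samples_per_frame(sample_rate: int = 48000, frame_period: float = 5.0) -> int:
    """Number of waveform samples spanned by one acoustic frame (hypothetical helper)."""
    # frame_period is given in milliseconds, so divide by 1000 to get seconds.
    return int(round(sample_rate * frame_period / 1000.0))


def apply_vuv_threshold(f0, vuv, vuv_threshold: float = 0.5):
    """Set F0 to 0.0 on frames treated as unvoiced (hypothetical helper).

    Assumption: a frame counts as voiced when its VUV score is strictly
    greater than the threshold.
    """
    return [f if v > vuv_threshold else 0.0 for f, v in zip(f0, vuv)]


# With the default arguments, one frame covers 48000 * 5 / 1000 samples.
hop = samples_per_frame()

# The second frame has a VUV score below 0.5, so its F0 is zeroed.
f0 = apply_vuv_threshold([220.0, 221.5, 219.0], [0.9, 0.4, 0.6])
```

Under these assumptions, the length of the returned waveform is roughly the number of acoustic frames multiplied by the per-frame sample count, and raising vuv_threshold marks more frames as unvoiced.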