nnsvs.gen.postprocess_acoustic

nnsvs.gen.postprocess_acoustic(device, acoustic_features, duration_modified_labels, binary_dict, numeric_dict, acoustic_config, acoustic_out_static_scaler, postfilter_model=None, postfilter_config=None, postfilter_out_scaler=None, sample_rate=48000, frame_period=5, relative_f0=False, feature_type='world', post_filter_type='gv', trajectory_smoothing=True, trajectory_smoothing_cutoff=50, trajectory_smoothing_cutoff_f0=20, vuv_threshold=0.5, f0_shift_in_cent=0, vibrato_scale=1.0, force_fix_vuv=False, fill_silence_to_rest=False)

Post-process acoustic features

The function converts acoustic features stored in a single ndarray into a tuple of multi-stream acoustic features.

e.g., array -> (mgc, lf0, vuv, bap)
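The array-to-tuple conversion can be pictured with a minimal sketch. It is not the library's implementation: the helper name split_into_streams, the stream sizes, and the sample values are all hypothetical, and plain Python lists stand in for the ndarray so the sketch has no dependencies. The real function also takes scalers, post-filter settings, and smoothing options, none of which are modelled here.

```python
from itertools import accumulate


def split_into_streams(frames, stream_sizes):
    """Split each frame (a flat list of floats) into per-stream chunks.

    Hypothetical helper for illustration only. Returns a tuple with one
    list of frames per stream, in the order given by stream_sizes.
    """
    total = sum(stream_sizes)
    if any(len(frame) != total for frame in frames):
        raise ValueError("every frame must have sum(stream_sizes) dimensions")
    # Start offset of each stream inside a frame: [0, 3, 4, 5] for sizes [3, 1, 1, 2]
    starts = [0] + list(accumulate(stream_sizes))[:-1]
    return tuple(
        [frame[start:start + size] for frame in frames]
        for start, size in zip(starts, stream_sizes)
    )


# Assumed (toy) sizes: 3-dim mgc, 1-dim lf0, 1-dim vuv, 2-dim bap
stream_sizes = [3, 1, 1, 2]
frames = [
    [0.1, 0.2, 0.3, 5.0, 1.0, -1.0, -2.0],
    [0.4, 0.5, 0.6, 5.1, 0.0, -1.5, -2.5],
]
mgc, lf0, vuv, bap = split_into_streams(frames, stream_sizes)
```

Each returned stream keeps the same number of frames as the input; only the feature dimension is partitioned.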

Parameters:
  • device (torch.device) – Torch device to use (e.g., CPU or CUDA).

  • acoustic_features (np.ndarray) – Acoustic features.

  • duration_modified_labels (nnmnkwii.io.hts.HTSLabelFile) – HTS labels with modified durations.

  • binary_dict (dict) – Dictionary of binary features.

  • numeric_dict (dict) – Dictionary of numeric features.

  • acoustic_config (dict) – Acoustic model configuration.

  • acoustic_out_static_scaler (sklearn.preprocessing.StandardScaler) – Scaler for acoustic features.

  • postfilter_model (nn.Module) – Post-filter model.

  • postfilter_config (dict) – Post-filter model configuration.

  • postfilter_out_scaler (sklearn.preprocessing.StandardScaler) – Scaler for post-filter.

  • sample_rate (int) – Sampling rate in Hz.

  • frame_period (float) – Frame period in milliseconds.

  • relative_f0 (bool) – If True, use relative f0.

  • feature_type (str) – Feature type.

  • post_filter_type (str) – Post-filter type. One of gv, merlin, or nnsvs. gv is recommended for general-purpose use.

  • trajectory_smoothing (bool) – Whether to apply trajectory smoothing.

  • trajectory_smoothing_cutoff (float) – Cutoff frequency for trajectory smoothing of spectral features.

  • trajectory_smoothing_cutoff_f0 (float) – Cutoff frequency for trajectory smoothing of f0.

  • vuv_threshold (float) – Threshold for the voiced/unvoiced (V/UV) decision.

  • f0_shift_in_cent (float) – F0 shift in cents.

  • vibrato_scale (float) – Scaling factor applied to vibrato.

  • force_fix_vuv (bool) – If True, force V/UV correction.

  • fill_silence_to_rest (bool) – Fill silence to rest frames.
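Several of the numeric parameters above follow standard conventions that a short sketch can make concrete. The helper names below are hypothetical and are not part of nnsvs. The only facts used are that 1200 cents make one octave and that a frame period in milliseconds times the sampling rate gives the frame shift in samples. How the library treats a V/UV score exactly equal to the threshold is not stated here, so the strict comparison is an assumption.

```python
def frame_shift_in_samples(sample_rate, frame_period_ms):
    """Frame shift in samples for a sampling rate in Hz and a period in ms."""
    return int(round(sample_rate * frame_period_ms / 1000))


def shift_f0_in_cents(f0_hz, cents):
    """Shift an F0 value in Hz by the given number of cents.

    1200 cents = one octave, so the frequency ratio is 2 ** (cents / 1200).
    """
    return f0_hz * 2.0 ** (cents / 1200.0)


def apply_vuv_threshold(vuv_scores, threshold=0.5):
    """Mark frames as voiced when their score exceeds the threshold.

    Assumption: the comparison is strict; the library may differ at the boundary.
    """
    return [score > threshold for score in vuv_scores]


# With the defaults sample_rate=48000 and frame_period=5:
hop = frame_shift_in_samples(48000, 5)       # 240 samples per frame
octave_up = shift_f0_in_cents(440.0, 1200)   # 880.0 Hz
voiced = apply_vuv_threshold([0.9, 0.5, 0.1], threshold=0.5)
```

A shift of 0 cents, the default for f0_shift_in_cent, leaves F0 unchanged because the ratio is 1.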

Returns:

Post-processed multi-stream acoustic features, e.g., (mgc, lf0, vuv, bap).

Return type:

tuple