nnsvs.gen.postprocess_acoustic

nnsvs.gen.postprocess_acoustic(device, acoustic_features, duration_modified_labels, binary_dict, numeric_dict, acoustic_config, acoustic_out_static_scaler, postfilter_model=None, postfilter_config=None, postfilter_out_scaler=None, sample_rate=48000, frame_period=5, relative_f0=False, feature_type='world', post_filter_type='gv', trajectory_smoothing=True, trajectory_smoothing_cutoff=50, trajectory_smoothing_cutoff_f0=20, vuv_threshold=0.5, f0_shift_in_cent=0, vibrato_scale=1.0, force_fix_vuv=False, fill_silence_to_rest=False)

Post-process acoustic features

The function converts acoustic features stored in a single ndarray into a tuple of multi-stream acoustic features.

e.g., array -> (mgc, lf0, vuv, bap)
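
As a rough illustration of the array-to-tuple conversion, the sketch below splits a single feature matrix into four streams with NumPy. The helper name split_streams and the stream sizes (60, 1, 1, 5) are assumptions chosen for illustration only; the real dimensions depend on the acoustic model configuration, and this is not the library's own implementation.

```python
import numpy as np


def split_streams(features, stream_sizes):
    """Split a (T, D) array into per-stream arrays along the feature axis.

    Hypothetical helper for illustration; not part of nnsvs.
    """
    if features.shape[1] != sum(stream_sizes):
        raise ValueError("stream sizes must sum to the feature dimension")
    # Column indices at which each subsequent stream starts
    boundaries = np.cumsum(stream_sizes)[:-1]
    return tuple(np.split(features, boundaries, axis=1))


# Assumed sizes for (mgc, lf0, vuv, bap); real values come from the model config
stream_sizes = [60, 1, 1, 5]
features = np.random.rand(100, sum(stream_sizes))  # 100 frames of dummy data

mgc, lf0, vuv, bap = split_streams(features, stream_sizes)
print(mgc.shape, lf0.shape, vuv.shape, bap.shape)
```

Each returned array keeps the frame axis and holds only the columns belonging to its stream, so concatenating the streams in order recovers the original array.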

Parameters:

device (torch.device) – Device.
acoustic_features (np.ndarray) – Acoustic features.
duration_modified_labels (nnmnkwii.io.hts.HTSLabelFile) – HTS label file.
binary_dict (dict) – Dictionary of binary features.
numeric_dict (dict) – Dictionary of numeric features.
acoustic_config (dict) – Acoustic model configuration.
acoustic_out_static_scaler (sklearn.preprocessing.StandardScaler) – Scaler for acoustic features.
postfilter_model (nn.Module) – Post-filter model.
postfilter_config (dict) – Post-filter model configuration.
postfilter_out_scaler (sklearn.preprocessing.StandardScaler) – Scaler for post-filter.
sample_rate (int) – Sampling rate.
frame_period (float) – Frame period in milliseconds.
relative_f0 (bool) – If True, use relative f0.
feature_type (str) – Feature type.
post_filter_type (str) – Post-filter type. One of gv, merlin or nnsvs. gv is recommended for general-purpose use.
trajectory_smoothing (bool) – Whether to apply trajectory smoothing.
trajectory_smoothing_cutoff (float) – Cutoff frequency for trajectory smoothing of spectral features.
trajectory_smoothing_cutoff_f0 (float) – Cutoff frequency for trajectory smoothing of f0.
vuv_threshold (float) – Threshold for the voiced/unvoiced (V/UV) decision.
f0_shift_in_cent (float) – F0 shift in cents.
vibrato_scale (float) – Vibrato scale.
force_fix_vuv (bool) – If True, force V/UV correction.
fill_silence_to_rest (bool) – Fill silence to rest frames.
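
To make the units of f0_shift_in_cent and vuv_threshold concrete, here is a minimal sketch of how a shift in cents and a V/UV threshold could be applied to log-F0 and V/UV arrays. The function names shift_lf0_in_cent and apply_vuv_threshold are hypothetical, and the strict "greater than" comparison is an assumption; the library's internal handling may differ. The only fixed fact used is that 1200 cents equal one octave, i.e. a doubling of F0.

```python
import numpy as np


def shift_lf0_in_cent(lf0, cents):
    """Shift log-F0 by a number of cents (hypothetical helper).

    1200 cents = one octave = a factor of 2 in F0, so the shift
    is additive in the natural-log domain.
    """
    return lf0 + cents * np.log(2.0) / 1200.0


def apply_vuv_threshold(vuv, threshold=0.5):
    """Binarize continuous V/UV values (assumed: voiced if value > threshold)."""
    return (vuv > threshold).astype(np.float32)


lf0 = np.log(np.array([220.0, 440.0]))
shifted_f0 = np.exp(shift_lf0_in_cent(lf0, 1200))  # one octave up
vuv_flags = apply_vuv_threshold(np.array([0.2, 0.7]))
print(shifted_f0, vuv_flags)
```

With a shift of 1200 cents the F0 values double, and with the default threshold of 0.5 the two V/UV values map to unvoiced and voiced respectively.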

Returns:

Post-processed multi-stream acoustic features, e.g., (mgc, lf0, vuv, bap).

Return type:

tuple