nnsvs.gen.predict_acoustic(device, labels, acoustic_model, acoustic_config, acoustic_in_scaler, acoustic_out_scaler, binary_dict, numeric_dict, subphone_features='coarse_coding', pitch_indices=None, log_f0_conditioning=True, force_clip_input_features=False, frame_period=5, f0_shift_in_cent=0)

Predict acoustic features from HTS labels.

If the output features include dynamic features, MLPG (maximum likelihood parameter generation) is applied to the predicted features.
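"Dynamic features" here most likely refers to delta and delta-delta streams stacked next to the static features, as is common in statistical parametric synthesis. The following is a minimal sketch of that stacking, not nnsvs code: the helper name `append_dynamic_features`, the window coefficients `[-0.5, 0, 0.5]` and `[1, -2, 1]`, and the edge padding are all assumptions chosen for illustration.

```python
import numpy as np


def append_dynamic_features(static):
    """Stack static, delta, and delta-delta features along the feature axis.

    `static` has shape (T, D). Hypothetical helper for illustration only;
    the windows below are a common convention, not necessarily what nnsvs uses.
    """
    # Repeat the first and last frames so every frame has both neighbours.
    padded = np.pad(static, ((1, 1), (0, 0)), mode="edge")
    # Delta window [-0.5, 0, 0.5]: centred first difference.
    delta = 0.5 * (padded[2:] - padded[:-2])
    # Delta-delta window [1, -2, 1]: second difference.
    delta2 = padded[2:] - 2.0 * padded[1:-1] + padded[:-2]
    return np.concatenate([static, delta, delta2], axis=1)


static = np.arange(5, dtype=np.float64).reshape(-1, 1)  # a linear ramp, D = 1
features = append_dynamic_features(static)
print(features.shape)  # (5, 3)
```

When a model predicts features in this stacked layout, MLPG uses the static and dynamic predictions together to recover a smooth static trajectory of dimension D.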

Parameters:

  • device (torch.device) – device to use

  • labels (HTSLabelFile) – HTS labels

  • acoustic_model (nn.Module) – acoustic model

  • acoustic_config (AcousticConfig) – acoustic configuration

  • acoustic_in_scaler (sklearn.preprocessing.StandardScaler) – input scaler

  • acoustic_out_scaler (sklearn.preprocessing.StandardScaler) – output scaler

  • binary_dict (dict) – binary feature dictionary

  • numeric_dict (dict) – numeric feature dictionary

  • subphone_features (str) – subphone feature type

  • pitch_indices (list) – indices of pitch features

  • log_f0_conditioning (bool) – whether to use log-F0 conditioning

  • force_clip_input_features (bool) – whether to force clipping of the input features

  • frame_period (float) – frame period in milliseconds

  • f0_shift_in_cent (float) – F0 shift in cents, applied before inference
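To give a sense of the scale of f0_shift_in_cent: 100 cents is one semitone and 1200 cents is one octave, so a shift of c cents multiplies a frequency by 2^(c/1200). The sketch below shows that arithmetic in both the linear and the log-F0 domain. The helper names are hypothetical, and how nnsvs applies the shift internally (which domain, and how unvoiced frames are handled) is not stated here, so treat this only as an illustration of the unit.

```python
import math


def shift_f0_in_cents(f0_hz, cents):
    """Shift a frequency in Hz by `cents` (1200 cents = 1 octave)."""
    return f0_hz * 2.0 ** (cents / 1200.0)


def shift_log_f0_in_cents(log_f0, cents):
    """Same shift expressed on natural-log F0, where it becomes an addition."""
    return log_f0 + cents * math.log(2.0) / 1200.0


a4 = 440.0
print(shift_f0_in_cents(a4, 1200))  # 880.0, one octave up
print(shift_f0_in_cents(a4, 100))   # one semitone up
```

In the log domain the shift is a constant offset, which is why a cent-valued parameter pairs naturally with log-F0 features.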


Returns:

predicted acoustic features

Return type: