nnsvs.gen.predict_duration

nnsvs.gen.predict_duration(device, labels, duration_model, duration_config, duration_in_scaler, duration_out_scaler, binary_dict, numeric_dict, pitch_indices=None, log_f0_conditioning=True, force_clip_input_features=False, frame_period=5)

Predict phoneme durations from HTS labels

Parameters:
  • device (torch.device) – device to run the model on

  • labels (nnmnkwii.io.hts.HTSLabelFile) – labels

  • duration_model (nn.Module) – duration model

  • duration_config (dict) – duration config

  • duration_in_scaler (sklearn.preprocessing.MinMaxScaler) – duration input scaler

  • duration_out_scaler (sklearn.preprocessing.MinMaxScaler) – duration output scaler

  • binary_dict (dict) – binary feature dictionary

  • numeric_dict (dict) – numeric feature dictionary

  • pitch_indices (list, optional) – indices of pitch features

  • log_f0_conditioning (bool) – whether to use log-f0 conditioning

  • force_clip_input_features (bool) – whether to clip input features

  • frame_period (int) – frame period in milliseconds

Returns:

predicted durations

Return type:

np.ndarray
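
Example:

The following is a minimal sketch of one way the returned durations might be post-processed. It is not part of NNSVS. It assumes that each predicted duration is a per-phoneme frame count and that one frame lasts frame_period milliseconds; verify both against the NNSVS version in use. The helper name durations_to_segments and the sample values are hypothetical.

```python
from itertools import accumulate


def durations_to_segments(durations_in_frames, frame_period_ms=5):
    """Turn per-phoneme frame counts into (start_ms, end_ms) pairs.

    Assumption: each duration is a frame count and one frame lasts
    frame_period_ms milliseconds. This helper is illustrative only.
    """
    # Round to whole frames in case the values arrive as floats.
    frames = [int(round(float(d))) for d in durations_in_frames]
    # Cumulative sum gives the end frame of each phoneme.
    ends = list(accumulate(frames))
    # Each phoneme starts where the previous one ended.
    starts = [0] + ends[:-1]
    return [(s * frame_period_ms, e * frame_period_ms) for s, e in zip(starts, ends)]


# Made-up durations for three phonemes (not real model output).
segments = durations_to_segments([20, 35, 10])
print(segments)  # [(0, 100), (100, 275), (275, 325)]
```

If the array returned by predict_duration is two-dimensional, flatten it first (for example with durations.ravel()) before passing it to a helper like this one.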