Change log

v0.1.0 <2022-xx-xx>


New features

  • Dynamic batch size support (via batch_max_frames).

  • New acoustic models based on duration-informed Tacotron.

  • New F0 prediction models based on duration-informed Tacotron.

  • Multi-stream model implementations

  • Support for mel-spectrograms as acoustic features.

  • Integrated the uSFGAN vocoder.
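The batch_max_frames entry can be illustrated with a small sketch. It assumes that batch_max_frames bounds the padded size of a mini-batch, meaning the number of utterances times the longest utterance length in frames. The helper make_dynamic_batches is hypothetical, and the packing rule nnsvs actually uses may differ.

```python
from typing import List


def make_dynamic_batches(lengths: List[int], batch_max_frames: int) -> List[List[int]]:
    """Group utterance indices so each padded batch stays within batch_max_frames.

    Hypothetical helper for illustration (not the nnsvs implementation).
    Assumed padded cost of a batch: batch_size * longest_length.
    """
    # Sort indices from longest to shortest, so the first item of a batch
    # always determines that batch's padded length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        if current:
            longest = lengths[current[0]]
            # Close the batch if adding one more utterance would exceed the budget.
            if (len(current) + 1) * longest > batch_max_frames:
                batches.append(current)
                current = []
        # An utterance longer than the budget still gets a batch of its own.
        current.append(idx)
    if current:
        batches.append(current)
    return batches
```

With lengths [100, 300, 200, 50] and batch_max_frames=600, the result is [[1, 2], [0, 3]]. The two long utterances share one batch and the two short ones share another, so shorter utterances end up in larger batches.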

v0.0.3 <2022-10-15>

Moved the repository to a GitHub organization.

New features

  • Mixed precision training #106

  • Recipe-level integration of hyperparameter optimization with Optuna #43 (see Hyperparameter optimization with Optuna)

  • Added VariancePredictor (Ren et al. [RHQ+21]).

  • Spectrogram, aperiodicity, F0, and generated audio are now logged to TensorBoard if is used.

  • Objective metrics (such as mel-cepstrum distortion and RMSE) are now logged to TensorBoard. #41

  • Added MDNv2 (MDN + dropout) #118

  • A correct V/UV (correct_vuv) option has been added to feature processing.

  • Support training non-resf0 models with #125

  • Global-variance (GV)-based post-filter
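A GV-based post-filter is commonly implemented as variance scaling. Each feature dimension is centered on its mean, multiplied by sqrt(target GV / generated GV), and shifted back, so the trajectory's global variance matches a target computed from natural speech. The sketch below assumes a T x D trajectory and per-dimension target variances. gv_postfilter is a hypothetical helper, not the nnsvs implementation, which may, for example, blend the scaled and original trajectories.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def gv_postfilter(traj: Sequence[Sequence[float]], target_gv: Sequence[float]) -> List[List[float]]:
    """Rescale each dimension of a T x D trajectory so its variance equals target_gv.

    Hypothetical helper for illustration; target_gv is assumed to hold one
    variance per dimension (e.g., averaged over training data).
    """
    num_frames, dim = len(traj), len(traj[0])
    out = [list(frame) for frame in traj]
    for d in range(dim):
        column = [traj[t][d] for t in range(num_frames)]
        mean = fmean(column)
        gen_gv = pvariance(column, mu=mean)  # global variance of the generated trajectory
        if gen_gv <= 0.0:
            continue  # a flat trajectory has no variance to rescale
        scale = (target_gv[d] / gen_gv) ** 0.5
        for t in range(num_frames):
            out[t][d] = mean + scale * (column[t] - mean)
    return out
```

For example, gv_postfilter([[0.0], [2.0]], [4.0]) returns [[-1.0], [3.0]]. The generated variance of 1.0 is scaled up to the target of 4.0 while the mean stays at 1.0.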

Bug fixes

  • Add a heuristic trick to prevent negative durations at synthesis time

  • Fix error when no dynamic features are used #128

  • Add a workaround for WORLD's segfault issue when min_f0 is too high.

  • Fix a bug in computing pitch regularization weights

  • Fix continuous F0 for rests

Improvements

  • nnsvs.model.MDN now supports dropout via the dropout argument. The argument existed before but had been a no-op for a long time.

  • The number of training iterations can now be specified in either epochs or steps.

  • A heuristic trick has been added to prevent serious V/UV prediction errors. #95 #119

  • Speech parameter trajectory smoothing (Takamichi et al. [TKT+15]). Disabled by default.

  • Added recipe tests on CI #116

  • Add an option to filter out long segments #135

  • Stream-wise flags to enable/disable dynamic features

  • Pre-processing: Tweaked min_f0/max_f0 thresholds

  • Pre-processing: Add resampling if necessary

  • Pre-processing: Allow users to specify an explicit F0 range

  • Expose decay_size for pitch regularization

  • Support Codecov
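Trajectory smoothing can be viewed as low-pass filtering each parameter trajectory along time, which removes modulation-frequency components above a cutoff. The sketch below assumes a single 1-D trajectory, a frame period in milliseconds, and a cutoff in Hz. For clarity it uses an ideal (brick-wall) DFT filter. That filter is a stand-in technique and is not necessarily the one nnsvs applies. smooth_trajectory is a hypothetical helper.

```python
import cmath
from typing import List, Sequence


def smooth_trajectory(x: Sequence[float], frame_period_ms: float, cutoff_hz: float) -> List[float]:
    """Remove modulation-spectrum components of x above cutoff_hz.

    Hypothetical helper: an ideal low-pass filter implemented with a naive DFT.
    """
    n = len(x)
    frame_rate = 1000.0 / frame_period_ms  # frames per second
    # Forward DFT of the trajectory (its modulation spectrum).
    spec = [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]
    for k in range(n):
        # Bin k and its mirror n - k share one modulation frequency; zeroing
        # both keeps the spectrum Hermitian, so the output stays real.
        freq = min(k, n - k) * frame_rate / n
        if freq > cutoff_hz:
            spec[k] = 0.0
    # Inverse DFT; the imaginary part is only numerical noise.
    return [
        (sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

At a 5 ms frame period, a frame-by-frame alternating component sits at 100 Hz, so a 50 Hz cutoff removes it while leaving the mean and slow movements intact. The naive DFT costs O(T^2). A practical implementation would use an FFT or a recursive filter instead.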

Deprecations

  • dropout for nnsvs.model.MDN is deprecated. Please consider removing the parameter as it has no effect.

  • dropout for nnsvs.model.Conv1dResnet is deprecated. Please consider removing the parameter as it has no effect.

  • FeedForwardNet is renamed to FFN to be consistent with other names (such as MDN)

  • ResF0Conv1dResnetMDN is deprecated. You can use ResF0Conv1dResnet with use_mdn=True.

  • Conv1dResnetMDN is deprecated. You can use Conv1dResnet with use_mdn=True.

Breaking changes

  • Changed the default D4C threshold from 0.85 to 0.15 to prevent serious voiced -> unvoiced errors. If you prefer the old default, please set d4c_threshold to 0.85.

  • Default values of functions in and were changed during refactoring. Please set the function arguments explicitly to avoid unexpected behavior.

Documentation

Added as much documentation as possible.

Experimental features

Some features are available but not yet tested or documented:

  • GAN-based post-filters (Kaneko et al. [KTKY17b], Kaneko et al. [KTKY17a]) #85 and GV post-filter (Silén et al. [SilenHNG12])

  • CycleGAN-based post-filter

  • Support for neural vocoders #72

  • Add ResF0NonAttentiveTacotron acoustic model. #129 #15

  • WaveNet #100

  • GAN-based acoustic models #85

  • Made nnsvs.svs support trainable post-filters and neural vocoders.

v0.0.2 <2022-04-29>

A version that should work with ENUNU v0.4.0

New features

  • Improved timings with MDN duration models #80

  • Improved acoustic models with residual F0 prediction #76

Bug fixes

  • numpy.linalg.LinAlgError in MDN models #94

v0.0.1 <2022-03-11>

The first release

The initial version of nnsvs, with some experimental features such as vibrato modeling and data augmentation. This version should be compatible with currently available tools around nnsvs (e.g., ENUNU). Hydra >=v1.0.0, <v1.2.0 is supported. A PyPI release is also available, so you can install the core library with pip install nnsvs.