Change log

v0.1.0 <2022-xx-xx>

WILL BE MOVED TO GITHUB RELEASES.

New features

Dynamic batch size support (by batch_max_frames).
New acoustic models based on duration-informed Tacotron.
New F0 prediction models based on duration-informed Tacotron.
Multi-stream model implementations.
Support mel-spectrogram as acoustic features.
Integrate uSFGAN vocoder.
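To illustrate what dynamic batch size support by batch_max_frames can mean in practice, here is a minimal sketch that groups utterances so that each padded batch stays within a frame budget. The function name make_dynamic_batches and the exact grouping rule are assumptions made for illustration. The actual nnsvs implementation may differ.

```python
def make_dynamic_batches(num_frames, batch_max_frames):
    """Group utterance indices so each padded batch fits a frame budget.

    Hypothetical helper for illustration; not the nnsvs API.
    """
    # Sort by length so utterances of similar length share a batch (less padding).
    order = sorted(range(len(num_frames)), key=lambda i: num_frames[i])
    batches, current = [], []
    for idx in order:
        # After sorting, the newest item is the longest in the batch, so the
        # padded size is (number of items) * (frame count of the newest item).
        padded = (len(current) + 1) * num_frames[idx]
        if current and padded > batch_max_frames:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


lengths = [120, 800, 150, 400, 90, 760]
batches = make_dynamic_batches(lengths, batch_max_frames=1000)
print(batches)
```

With a fixed frame budget, short utterances are packed into larger batches and long utterances into smaller ones, which tends to keep memory use per batch roughly constant.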

v0.0.3 <2022-10-15>

Moved the repository to https://github.com/nnsvs organization.

New features

Mixed precision training #106
Recipe-level integration of hyperparameter optimization with Optuna #43 (see Hyperparameter optimization with Optuna).
Added VariancePredictor (Ren et al. [RHQ+21]).
Spectrogram, aperiodicity, F0, and generated audio are now logged to TensorBoard if train_resf0.py is used.
Objective metrics (such as mel-cepstrum distortion and RMSE) are now logged to TensorBoard. #41
Added MDNv2 (MDN + dropout) #118
Correct V/UV (correct_vuv) option is added to feature processing.
Support training non-resf0 models with train_resf0.py #125
Global-variance (GV)-based post-filter

Bug fixes

Add a heuristic trick to prevent non-positive durations at synthesis time
Fix error when no dynamic features are used #128
Add a workaround for WORLD's segfault issue when min_f0 is too high.
Fix a bug in computing pitch regularization weights
Fix continuous F0 for rests
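The duration heuristic listed above can be pictured as a clamp applied to predicted durations before synthesis. The following is a minimal sketch under that assumption. The function name fix_durations and the minimum of one frame are hypothetical, and the actual fix in nnsvs may be implemented differently.

```python
def fix_durations(pred_durations, min_frames=1):
    """Round predicted durations and clamp them to at least min_frames.

    Hypothetical helper for illustration; not the nnsvs API.
    """
    # A duration model can output zero or negative values for very short
    # phonemes; such values cannot be expanded into frames at synthesis time.
    return [max(int(round(d)), min_frames) for d in pred_durations]


fixed = fix_durations([3.7, -0.4, 0.2, 12.0])
print(fixed)
```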

Improvements

nnsvs.model.MDN now supports dropout via the dropout argument. The dropout argument existed before, but it was a no-op for a long time.
The number of training iterations can now be specified by either epochs or steps.
A heuristic trick is added to prevent serious V/UV prediction errors. #95 #119
Speech parameter trajectory smoothing (Takamichi et al. [TKT+15]). Disabled by default.
Added recipe tests on CI #116
Add option to allow filtering of long segments #135
Stream-wise flags to enable/disable dynamic features
Pre-processing: Tweaked min_f0/max_f0 threshold
Pre-processing: Add resampling if necessary
Pre-processing: Allow users to specify an explicit F0 range
Expose decay_size for pitch regularization
Support Codecov
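As a rough illustration of specifying the training length by either epochs or steps, the sketch below converts between the two budgets. The names resolve_total_steps, steps_to_epochs, nepochs, and max_steps are assumptions made for this example and may not match the actual nnsvs configuration keys.

```python
import math


def resolve_total_steps(steps_per_epoch, nepochs=None, max_steps=None):
    """Return the total number of optimizer steps from an epoch or step budget.

    Hypothetical helper for illustration; not the nnsvs API.
    """
    # Require exactly one of the two budgets to avoid ambiguous settings.
    if (nepochs is None) == (max_steps is None):
        raise ValueError("Specify exactly one of nepochs or max_steps")
    if max_steps is not None:
        return max_steps
    return nepochs * steps_per_epoch


def steps_to_epochs(total_steps, steps_per_epoch):
    """Number of (possibly partial) epochs needed to run total_steps."""
    return math.ceil(total_steps / steps_per_epoch)


print(resolve_total_steps(250, nepochs=4))
print(steps_to_epochs(600, 250))
```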

Deprecations

dropout for nnsvs.model.MDN is deprecated. Please consider removing the parameter as it has no effect.
dropout for nnsvs.model.Conv1dResnet is deprecated. Please consider removing the parameter as it has no effect.
FeedForwardNet is renamed to FFN to be consistent with other names (such as MDN)
ResF0Conv1dResnetMDN is deprecated. You can use ResF0Conv1dResnet with use_mdn=True.
Conv1dResnetMDN is deprecated. You can use Conv1dResnet with use_mdn=True.
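The renames and deprecations above could be applied to an existing model configuration along the following lines. This is a minimal sketch that assumes the configuration is a plain dict with a Hydra-style _target_ key. The helper name migrate_model_config and the example parameter in_dim are hypothetical, and only the class renames listed above are handled.

```python
# Old class name -> (new class name, extra keyword arguments), per the list above.
RENAMES = {
    "FeedForwardNet": ("FFN", {}),
    "Conv1dResnetMDN": ("Conv1dResnet", {"use_mdn": True}),
    "ResF0Conv1dResnetMDN": ("ResF0Conv1dResnet", {"use_mdn": True}),
}


def migrate_model_config(cfg):
    """Return a copy of a model config dict updated for the renamed classes.

    Hypothetical helper for illustration; not part of nnsvs.
    """
    new_cfg = dict(cfg)
    # Split "package.module.ClassName" into the module path and the class name.
    module, _, cls = new_cfg["_target_"].rpartition(".")
    if cls in RENAMES:
        new_cls, extra = RENAMES[cls]
        new_cfg["_target_"] = f"{module}.{new_cls}" if module else new_cls
        new_cfg.update(extra)
    return new_cfg


old = {"_target_": "nnsvs.model.Conv1dResnetMDN", "in_dim": 10}
new = migrate_model_config(old)
print(new)
```

The input dict is left unmodified, so the old and new configurations can be compared side by side before switching.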

Breaking changes

The default D4C threshold (d4c_threshold) is changed from 0.85 to 0.15 to prevent serious voiced -> unvoiced errors. If you prefer the old default, please set d4c_threshold to 0.85.
Default argument values of functions in gen.py and svs.py were changed during refactoring. Please set the function arguments explicitly to avoid unexpected behavior.

Documentation

Added documentation as much as possible.

Experimental features

Some features that are available but not yet tested or documented

GAN-based post-filters (Kaneko et al. [KTKY17b], Kaneko et al. [KTKY17a]) #85 and GV post-filter (Silén et al. [SilenHNG12])
CycleGAN-based post-filter
Support for neural vocoders #72
Add ResF0NonAttentiveTacotron acoustic model. #129 #15
WaveNet #100
GAN-based acoustic models #85
Make nnsvs.svs support trainable post-filters and neural vocoders.

v0.0.2 <2022-04-29>

A version that should work with ENUNU v0.4.0

New features

Improved timings with MDN duration models #80
Improved acoustic models with residual F0 prediction #76

Bug fixes

numpy.linalg.LinAlgError in MDN models #94

v0.0.1 <2022-03-11>

The first release

The initial version of nnsvs (with some experimental features like vibrato modeling and data augmentation). This version should be compatible with currently available tools around nnsvs (e.g., ENUNU). Hydra >=v1.0.0, <v1.2.0 is supported. A PyPI release is also available, so you can install the core library with pip install nnsvs.