Change log
v0.1.0 <2022-xx-xx>
WILL BE MOVED TO GITHUB RELEASES.
New features
Dynamic batch size support (by batch_max_frames).
New acoustic models based on duration informed Tacotron.
New F0 prediction models based on duration informed Tacotron.
Multi-stream model implementations
Support mel-spectrogram as acoustic features.
Integrate uSFGAN vocoder
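As a rough illustration of what a frame budget such as batch_max_frames implies, the sketch below groups utterances so that each padded batch stays within a fixed number of frames. It is not the nnsvs implementation: the helper name make_batches and the greedy, length-sorted strategy are assumptions made for this example only.

```python
from typing import List


def make_batches(lengths: List[int], batch_max_frames: int) -> List[List[int]]:
    """Group utterance indices so that each padded batch fits a frame budget.

    Hypothetical helper for illustration; not part of nnsvs.
    """
    # Sort indices by length so utterances of similar length share a batch.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        # After padding, every item costs as many frames as the longest one.
        # Lengths are ascending, so the newest candidate is the longest.
        padded = lengths[idx] * (len(current) + 1)
        if current and padded > batch_max_frames:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Five utterances with a budget of 500 frames per padded batch.
print(make_batches([100, 250, 120, 400, 90], batch_max_frames=500))
# → [[4, 0, 2], [1], [3]]
```

With this scheme the number of utterances per batch varies: short utterances are packed together, while an utterance longer than the budget ends up in a batch of its own.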
v0.0.3 <2022-10-15>
Moved the repository to the https://github.com/nnsvs organization.
New features
Mixed precision training #106
Recipe-level integration of hyperparameter optimization with Optuna #43 (see "Hyperparameter optimization with Optuna").
Added VariancePredictor (Ren et al. [RHQ+21]).
Spectrogram, aperiodicity, F0, and generated audio are now logged to TensorBoard if train_resf0.py is used.
Objective metrics (such as mel-cepstrum distortion and RMSE) are now logged to TensorBoard. #41
Added MDNv2 (MDN + dropout) #118
Correct V/UV (correct_vuv) option is added to feature processing.
Support training non-resf0 models with train_resf0.py #125
Global-variance (GV)-based post-filter
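For readers unfamiliar with GV-based post-filtering, one common formulation is variance scaling: each feature dimension of the generated trajectory is rescaled around its mean so that its variance matches a target global variance. The sketch below shows that idea for a single dimension. It is an assumption about the general technique, not a description of the nnsvs code, and the function name gv_postfilter is hypothetical.

```python
import math
from typing import List


def gv_postfilter(traj: List[float], target_var: float) -> List[float]:
    """Rescale one feature dimension so its variance matches target_var.

    Hypothetical helper for illustration; not part of nnsvs.
    """
    mean = sum(traj) / len(traj)
    var = sum((x - mean) ** 2 for x in traj) / len(traj)
    if var <= 0.0:
        # A constant trajectory has nothing to scale; return a copy.
        return list(traj)
    scale = math.sqrt(target_var / var)
    # Expand or shrink deviations from the mean; the mean itself is unchanged.
    return [mean + scale * (x - mean) for x in traj]


smoothed = [1.0, 2.0, 3.0]  # over-smoothed output, variance 2/3
print(gv_postfilter(smoothed, target_var=8.0 / 3.0))
```

In practice the target variance would be estimated per dimension from natural training data, and the filter is applied to each dimension independently.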
Bug fixes
Add a heuristic trick to prevent negative durations at synthesis time
Fix error when no dynamic features are used #128
Add a workaround for WORLD’s segfault issue when min_f0 is too high.
Fix a bug in computing pitch regularization weights
Fix continuous F0 for rest
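The duration fix above concerns keeping predicted durations valid at synthesis time. The change log does not describe the actual heuristic, so the following is only a minimal sketch of one plausible approach: round each predicted duration and clamp it to at least one frame. The function name clip_durations and the min_frames parameter are hypothetical.

```python
from typing import List


def clip_durations(pred: List[float], min_frames: int = 1) -> List[int]:
    """Round predicted durations and force each to at least min_frames.

    Hypothetical helper for illustration; not part of nnsvs.
    """
    # A regression model can emit zero or negative values for short notes;
    # clamping keeps every segment at least min_frames long.
    return [max(min_frames, int(round(d))) for d in pred]


print(clip_durations([3.4, -0.7, 0.2, 5.6]))
# → [3, 1, 1, 6]
```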
Improvements
nnsvs.model.MDN now supports dropout via the dropout argument. The dropout argument existed before, but it was a no-op for a long time.
The number of training iterations can now be specified by either epochs or steps.
A heuristic trick is added to prevent serious V/UV prediction errors. #95 #119
Speech parameter trajectory smoothing (Takamichi et al. [TKT+15]). Disabled by default.
Added recipe tests on CI #116
Add option to allow filtering of long segments #135
Stream-wise flags to enable/disable dynamic features
Pre-processing: Tweaked min_f0/max_f0 threshold
Pre-processing: Add resampling if necessary
Pre-processing: Allow users to specify an explicit F0 range
Expose decay_size for pitch regularization
Support Codecov
Deprecations
dropout for nnsvs.model.MDN is deprecated. Please consider removing the parameter as it has no effect.
dropout for nnsvs.model.Conv1dResnet is deprecated. Please consider removing the parameter as it has no effect.
FeedForwardNet is renamed to FFN to be consistent with other names (such as MDN).
ResF0Conv1dResnetMDN is deprecated. You can use ResF0Conv1dResnet with use_mdn=True.
Conv1dResnetMDN is deprecated. You can use Conv1dResnet with use_mdn=True.
Breaking changes
The default d4c threshold is changed from 0.85 to 0.15 to prevent serious voiced -> unvoiced errors. If you prefer the old default, please set d4c_threshold to 0.85.
Default values of functions in gen.py and svs.py were changed during refactoring. Please explicitly set the function arguments to avoid unexpected behavior.
Documentation
Added documentation as much as possible.
Experimental features
Some features are available but not yet tested or documented.
v0.0.2 <2022-04-29>
A version that should work with ENUNU v0.4.0
New features
Bug fixes
numpy.linalg.LinAlgError in MDN models #94
v0.0.1 <2022-03-11>
The first release
The initial version of nnsvs (with some experimental features like vibrato modeling and data augmentation). This version should be compatible with currently available tools around nnsvs (e.g., ENUNU). Hydra >=v1.0.0, <v1.2.0 is supported. A PyPI release is also available, so you can install the core library with pip install nnsvs.