WILL BE MOVED TO GITHUB RELEASES.
Dynamic batch size support (by
New acoustic models based on duration informed Tacotron.
New F0 prediction models based on duration informed Tacotron.
Multi-stream model implementations
Support mel-spectrogram as acoustic features.
Integrate uSFGAN vocoder
Moved the repository to the https://github.com/nnsvs organization.
Mixed precision training #106
Added VariancePredictor (Ren et al. [RHQ+21]).
Spectrogram, aperiodicity, F0, and generated audio are now logged to tensorboard if
Objective metrics (such as mel-cepstrum distortion and RMSE) are now logged to tensorboard. #41
Added MDNv2 (MDN + dropout) #118
Correct V/UV (correct_vuv) option is added to feature processing.
Support training non-resf0 models with
Global-variance (GV)-based post-filter
Add a heuristic trick to prevent negative durations at synthesis time
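The changelog does not spell out the duration heuristic. One plausible form, shown here only as a minimal sketch and not as the actual nnsvs code, is to round each predicted duration to whole frames and clamp it to a lower bound. The helper name clamp_durations and the min_frames parameter are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def clamp_durations(pred_durations: Sequence[float], min_frames: int = 1) -> List[int]:
    """Round predicted durations to whole frames and enforce a lower bound.

    Hypothetical helper for illustration; not part of the nnsvs API.
    """
    if min_frames < 1:
        raise ValueError("min_frames must be at least 1")
    # A regression model can output zero or negative values for short notes;
    # any such value is replaced by min_frames so every segment stays usable.
    return [max(min_frames, int(round(d))) for d in pred_durations]


if __name__ == "__main__":
    print(clamp_durations([3.6, 0.2, -1.4, 12.0]))
```

Clamping to one frame rather than zero keeps every note present in the synthesized output, at the cost of slightly lengthening the total duration when many predictions are invalid.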
Fix error when no dynamic features are used #128
Add a workaround for WORLD’s segfault issue when min_f0 is too high.
Fix a bug in computing pitch regularization weights
Fix continuous F0 for rest
nnsvs.model.MDN now supports dropout via the dropout argument. The argument existed before, but it was a no-op for a long time.
The number of training iterations can now be specified by either epochs or steps.
Speech parameter trajectory smoothing (Takamichi et al. [TKT+15]). Disabled by default.
Added recipe tests on CI #116
Add option to allow filtering of long segments #135
Stream-wise flags to enable/disable dynamic features
Pre-processing: Tweaked min_f0/max_f0 threshold
Pre-processing: Add resampling if necessary
Pre-processing: Allow users to specify an explicit F0 range
Expose decay_size for pitch regularization
nnsvs.model.MDN is deprecated. Please consider removing the parameter as it has no effect.
nnsvs.model.Conv1dResnet is deprecated. Please consider removing the parameter as it has no effect.
FeedForwardNet is renamed to FFN to be consistent with other names (such as MDN)
ResF0Conv1dResnetMDN is deprecated. You can use
Conv1dResnetMDN is deprecated. You can use
Update the default d4c threshold from 0.85 to 0.15 to prevent serious voiced -> unvoiced errors. If you prefer the old default, please set d4c_threshold to 0.85.
Default values of functions in svs.py are changed while refactoring. Please explicitly set the function arguments to avoid unexpected behavior.
Added documentation as much as possible.
Some features that are available but not yet tested or documented
Support for neural vocoders #72
GAN-based acoustic models #85
Make nnsvs.svs support trainable post-filters and neural vocoders.
A version that should work with ENUNU v0.4.0
numpy.linalg.LinAlgError in MDN models #94
The first release
The initial version of nnsvs (with some experimental features like vibrato modeling and data augmentation). This version should be compatible with currently available tools around nnsvs (e.g., ENUNU). Hydra >=v1.0.0, <v1.2.0 is supported. A PyPI release is also available, so you can install the core library with pip install nnsvs.
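Because this release only supports Hydra >=v1.0.0, <v1.2.0, it can be useful to check the installed version before running recipes. The following is a minimal sketch using only the standard library. The helper names parse_version and hydra_version_supported are hypothetical, and the sketch assumes Hydra is installed from the hydra-core distribution on PyPI. Pre-release suffixes such as "rc1" are ignored for simplicity.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, padded to three parts.

    Example: "1.1" -> (1, 1, 0). Suffixes such as "rc1" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def hydra_version_supported(version: str) -> bool:
    """Return True if the version satisfies >=1.0.0, <1.2.0."""
    return (1, 0, 0) <= parse_version(version) < (1, 2, 0)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("hydra-core")
    except PackageNotFoundError:
        print("hydra-core is not installed")
    else:
        status = "supported" if hydra_version_supported(installed) else "unsupported"
        print(f"hydra-core {installed}: {status}")
```

Comparing integer tuples avoids the pitfalls of comparing version strings lexically, where "1.10.0" would sort before "1.2.0".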