nnsvs.postfilters
Variance scaling method to enhance synthetic speech quality
Conv2dPostFilter
- class nnsvs.postfilters.Conv2dPostFilter(in_dim=None, channels=128, kernel_size=(5, 5), init_type='kaiming_normal', noise_scale=1.0, noise_type='bin_wise', padding_mode='zeros', smoothing_width=-1)
A post-filter based on Conv2d.
The model was proposed by Kaneko et al. [KTKY17a].
- Parameters:
channels (int) – number of channels
kernel_size (tuple) – kernel sizes for Conv2d
init_type (str) – type of initialization
noise_scale (float) – scale of noise
noise_type (str) – type of noise. “frame_wise” or “bin_wise”
padding_mode (str) – padding mode
smoothing_width (int) – width of the smoothing window. Larger values give smoother output. Only used at inference time.
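To illustrate what a smoothing window does to a feature trajectory, here is a minimal sketch in plain Python. It assumes a centered moving average over time and treats a width of 1 or less (such as the default of -1) as "no smoothing". Both are assumptions made for illustration: the reference above does not say which window nnsvs applies, and `moving_average` is a hypothetical helper, not part of nnsvs.

```python
def moving_average(frames, width):
    """Smooth each feature dimension over time with a centered window.

    frames: list of frames, each a list of floats (time-major).
    width:  window width in frames. Values <= 1 leave the input unchanged
            (assumed meaning of the default -1). The window spans
            width // 2 frames on each side, so an even width behaves
            like the next odd width.
    """
    if width <= 1:
        return [list(f) for f in frames]
    half = width // 2
    n = len(frames)
    smoothed = []
    for t in range(n):
        # Clip the window at the sequence boundaries.
        lo, hi = max(0, t - half), min(n, t + half + 1)
        window = frames[lo:hi]
        # Average each feature dimension across the frames in the window.
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


# A one-dimensional trajectory that alternates between 0 and 1.
frames = [[0.0], [1.0], [0.0], [1.0], [0.0]]
narrow = moving_average(frames, 3)
wide = moving_average(frames, 5)


def value_range(seq):
    """Spread between the largest and smallest value of dimension 0."""
    values = [f[0] for f in seq]
    return max(values) - min(values)
```

In this toy case the spread of the output shrinks as the window grows, which matches the description of larger widths giving smoother output.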
MultistreamPostFilter
- class nnsvs.postfilters.MultistreamPostFilter(mgc_postfilter: Module, bap_postfilter: Module, lf0_postfilter: Module, stream_sizes: list, mgc_offset: int = 2, bap_offset: int = 0)
A multi-stream post-filter that applies post-filtering to each feature stream.
Currently, post-filtering is supported for MGC, BAP and log-F0. Note that it doesn’t make much sense to apply post-filtering to other features.
- Parameters:
mgc_postfilter (nn.Module) – post-filter for MGC
bap_postfilter (nn.Module) – post-filter for BAP
lf0_postfilter (nn.Module) – post-filter for log-F0
stream_sizes (list) – size of each feature stream
mgc_offset (int) – offset for MGC. Defaults to 2.
bap_offset (int) – offset for BAP. Defaults to 0.
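The interplay of `stream_sizes` and the two offsets can be sketched in plain Python. This is an illustration under stated assumptions, not the nnsvs implementation: it assumes that an offset of `k` means the first `k` dimensions of a stream bypass the post-filter, and that the streams are laid out as MGC, log-F0, VUV, BAP. The stream order, the sizes, the VUV stream, and the helpers `split_streams`, `apply_with_offset` and `double` are all hypothetical.

```python
def split_streams(frame, stream_sizes):
    """Cut one flat feature frame into consecutive streams."""
    streams, start = [], 0
    for size in stream_sizes:
        streams.append(frame[start:start + size])
        start += size
    return streams


def apply_with_offset(stream, postfilter, offset):
    """Post-filter stream[offset:] and keep stream[:offset] unchanged.

    Assumed meaning of mgc_offset / bap_offset, for illustration only.
    """
    return stream[:offset] + postfilter(stream[offset:])


def double(values):
    """Stand-in for a real post-filter module: doubles every value."""
    return [2 * v for v in values]


# Hypothetical layout: MGC (4 dims), log-F0 (1), VUV (1), BAP (2).
stream_sizes = [4, 1, 1, 2]
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 6.0, 7.0]

mgc, lf0, vuv, bap = split_streams(frame, stream_sizes)
out = (
    apply_with_offset(mgc, double, 2)    # mgc_offset=2: first two dims kept
    + double(lf0)                        # log-F0 is filtered as a whole
    + vuv                                # other streams pass through unfiltered
    + apply_with_offset(bap, double, 0)  # bap_offset=0: every dim filtered
)
```

With these toy values, `out` keeps the first two MGC dimensions and the VUV flag as they were, and every other dimension is doubled.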