nnsvs.postfilters

variance_scaling

Variance scaling method to enhance synthetic speech quality
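
As a rough illustration of the idea, the sketch below rescales each feature dimension so that its variance over frames matches a target variance while keeping the per-dimension mean. It is a minimal pure-Python sketch, not the library's implementation: the function name scale_variance, its arguments, and the meaning given to offset (number of leading dimensions left untouched) are all assumptions made for this example.

```python
import math


def scale_variance(feats, target_var, offset=0):
    """Hypothetical helper: match per-dimension variance to target_var.

    feats is a list of frames (each a list of floats). Dimensions below
    `offset` are copied unchanged; this reading of `offset` is an assumption.
    """
    n_frames = len(feats)
    n_dims = len(feats[0])
    out = [list(frame) for frame in feats]  # copy so the input is not modified
    for d in range(offset, n_dims):
        column = [frame[d] for frame in feats]
        mean = sum(column) / n_frames
        var = sum((x - mean) ** 2 for x in column) / n_frames
        if var == 0.0:
            continue  # constant dimension: nothing to rescale
        gain = math.sqrt(target_var[d] / var)
        for t in range(n_frames):
            # Scale the deviation from the mean, then restore the mean
            out[t][d] = gain * (column[t] - mean) + mean
    return out


feats = [[0.0, 1.0], [0.0, 3.0]]
scaled = scale_variance(feats, target_var=[1.0, 4.0], offset=1)
```

Over-smoothed synthetic features tend to have lower variance than natural ones, so a gain above 1 widens the trajectory around its mean.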

Conv2dPostFilter

class nnsvs.postfilters.Conv2dPostFilter(in_dim=None, channels=128, kernel_size=(5, 5), init_type='kaiming_normal', noise_scale=1.0, noise_type='bin_wise', padding_mode='zeros', smoothing_width=-1)

A post-filter based on Conv2d

The model was proposed by Kaneko et al. [KTKY17a].

Parameters:

channels (int) – number of channels
kernel_size (tuple) – kernel sizes for Conv2d
init_type (str) – type of initialization
noise_scale (float) – scale of noise
noise_type (str) – type of noise. “frame_wise” or “bin_wise”
padding_mode (str) – padding mode
smoothing_width (int) – width of the smoothing window. Larger values give smoother output. Only used at inference time.

MultistreamPostFilter

class nnsvs.postfilters.MultistreamPostFilter(mgc_postfilter: Module, bap_postfilter: Module, lf0_postfilter: Module, stream_sizes: list, mgc_offset: int = 2, bap_offset: int = 0)

A multi-stream post-filter that applies post-filtering to each feature stream

Currently, post-filtering is supported for MGC, BAP and log-F0. Applying post-filtering to other feature streams is unlikely to be useful.

Parameters:

mgc_postfilter (nn.Module) – post-filter for MGC
bap_postfilter (nn.Module) – post-filter for BAP
lf0_postfilter (nn.Module) – post-filter for log-F0
stream_sizes (list) – sizes of each feature stream
mgc_offset (int) – offset for MGC. Defaults to 2.
bap_offset (int) – offset for BAP. Defaults to 0.
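
The per-stream idea can be sketched in plain Python: split a flat feature frame according to stream_sizes, filter selected streams, and concatenate the results. This is an illustrative sketch only, not the class's actual forward pass. The helper names, the stream layout used in the example, and the reading of an offset as "number of leading coefficients passed through unfiltered" are assumptions.

```python
def split_streams(frame, stream_sizes):
    """Split one flat feature frame into consecutive streams."""
    streams, start = [], 0
    for size in stream_sizes:
        streams.append(frame[start:start + size])
        start += size
    return streams


def apply_with_offset(stream, postfilter, offset=0):
    """Filter stream[offset:] and pass the first `offset` values through."""
    return stream[:offset] + postfilter(stream[offset:])


def multistream_postfilter(frame, stream_sizes, postfilters, offsets):
    """Apply postfilters[i] to stream i; None leaves that stream untouched."""
    out = []
    streams = split_streams(frame, stream_sizes)
    for stream, pf, offset in zip(streams, postfilters, offsets):
        out += stream if pf is None else apply_with_offset(stream, pf, offset)
    return out


def double(values):
    """Stand-in post-filter for the example: doubles every value."""
    return [2.0 * v for v in values]


# Hypothetical layout: 3-dim "MGC", 1-dim unfiltered stream, 2-dim "BAP"
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = multistream_postfilter(
    frame,
    stream_sizes=[3, 1, 2],
    postfilters=[double, None, double],
    offsets=[2, 0, 0],
)
```

With an offset of 2 on the first stream, its first two values are kept as they are and only the remaining value is filtered.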