Variance scaling method to enhance synthetic speech quality


class nnsvs.postfilters.Conv2dPostFilter(in_dim=None, channels=128, kernel_size=(5, 5), init_type='kaiming_normal', noise_scale=1.0, noise_type='bin_wise', padding_mode='zeros', smoothing_width=-1)

A post-filter based on Conv2d

A model proposed in Kaneko et al. [KTKY17a].

  • channels (int) – number of channels

  • kernel_size (tuple) – kernel sizes for Conv2d

  • init_type (str) – type of initialization

  • noise_scale (float) – scale of noise

  • noise_type (str) – type of noise, either “frame_wise” or “bin_wise”

  • padding_mode (str) – padding mode

  • smoothing_width (int) – width of the smoothing window; larger values give smoother output. Only used at inference time.
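
The two noise_type options can be pictured as differently shaped noise grids. The following is a minimal sketch in plain Python, not nnsvs code. It assumes that “frame_wise” draws one noise value per frame and shares it across all feature bins, while “bin_wise” draws an independent value for every frame and bin. The helper name sample_noise and this interpretation are illustrative assumptions; consult the nnsvs source for the actual behavior.

```python
import random


def sample_noise(num_frames, num_bins, noise_type="bin_wise", noise_scale=1.0, seed=0):
    """Hypothetical helper: return a num_frames x num_bins grid of Gaussian noise."""
    rng = random.Random(seed)
    if noise_type == "frame_wise":
        # Assumption: one value per frame, repeated over every bin of that frame.
        per_frame = [rng.gauss(0.0, 1.0) * noise_scale for _ in range(num_frames)]
        return [[value] * num_bins for value in per_frame]
    if noise_type == "bin_wise":
        # Assumption: an independent value for every (frame, bin) pair.
        return [
            [rng.gauss(0.0, 1.0) * noise_scale for _ in range(num_bins)]
            for _ in range(num_frames)
        ]
    raise ValueError(f"unknown noise_type: {noise_type!r}")


frame_noise = sample_noise(4, 3, noise_type="frame_wise")
bin_noise = sample_noise(4, 3, noise_type="bin_wise")
```

Under these assumptions, every row of frame_noise holds a single repeated value, whereas rows of bin_noise vary across bins. In both cases noise_scale only multiplies the sampled values.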


class nnsvs.postfilters.MultistreamPostFilter(mgc_postfilter: Module, bap_postfilter: Module, lf0_postfilter: Module, stream_sizes: list, mgc_offset: int = 2, bap_offset: int = 0)

A multi-stream post-filter that applies post-filtering to each feature stream

Currently, post-filtering is supported for MGC, BAP and log-F0. Note that it doesn’t make much sense to apply post-filtering to other features.

  • mgc_postfilter (nn.Module) – post-filter for MGC

  • bap_postfilter (nn.Module) – post-filter for BAP

  • lf0_postfilter (nn.Module) – post-filter for log-F0

  • stream_sizes (list) – sizes of each feature stream

  • mgc_offset (int) – offset for MGC. Defaults to 2.

  • bap_offset (int) – offset for BAP. Defaults to 0.
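
The interplay of stream_sizes and the offsets can be sketched in plain Python. This is an illustration under stated assumptions, not the nnsvs implementation: it assumes that an offset of n means the first n dimensions of a stream bypass the post-filter unchanged, and it uses a hypothetical stream layout (MGC, log-F0, V/UV, BAP). The helpers split_streams, apply_with_offset and double are made up for this example.

```python
def split_streams(frame, stream_sizes):
    """Split one feature frame (a flat list) into consecutive streams."""
    streams, start = [], 0
    for size in stream_sizes:
        streams.append(frame[start:start + size])
        start += size
    return streams


def apply_with_offset(stream, postfilter, offset):
    """Assumption: the first `offset` dims bypass the post-filter unchanged."""
    return stream[:offset] + postfilter(stream[offset:])


def double(values):
    """Toy stand-in for a real post-filter module."""
    return [2.0 * v for v in values]


# Hypothetical layout: 5-dim MGC, 1-dim log-F0, 1-dim V/UV, 2-dim BAP.
stream_sizes = [5, 1, 1, 2]
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

mgc, lf0, vuv, bap = split_streams(frame, stream_sizes)
mgc_out = apply_with_offset(mgc, double, offset=2)  # mirrors mgc_offset=2
bap_out = apply_with_offset(bap, double, offset=0)  # mirrors bap_offset=0
lf0_out = double(lf0)

# V/UV has no post-filter in this sketch, so it is passed through as-is.
output = mgc_out + lf0_out + vuv + bap_out
```

With mgc_offset=2, the two lowest-order MGC dimensions come out unchanged in this sketch, while the remaining MGC dimensions and all BAP dimensions go through their post-filters.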