nnsvs.svs

class nnsvs.svs.SPSVS(model_dir, device='cpu')[source]

Statistical parametric singing voice synthesis

Note

This class is designed to be language-independent, so frontend functionality such as converting MusicXML/UST to HTS labels is not included.

Parameters
  • model_dir (str) – directory of the model

  • device (str) – cpu or cuda

Examples:

Synthesize a waveform from a MusicXML file

import numpy as np
import pysinsy
import librosa
import librosa.display
import matplotlib.pyplot as plt
from nnmnkwii.io import hts
from nnsvs.pretrained import retrieve_pretrained_model
from nnsvs.svs import SPSVS
from nnsvs.util import example_xml_file

# Fetch a pretrained model and build the synthesis engine
model_dir = retrieve_pretrained_model("r9y9/kiritan_latest")
engine = SPSVS(model_dir)

# Convert the example MusicXML file to full-context HTS labels
contexts = pysinsy.extract_fullcontext(example_xml_file(key="get_over"))
labels = hts.HTSLabelFile.create_from_contexts(contexts)

wav, sr = engine.svs(labels)

# Plot the synthesized waveform
fig, ax = plt.subplots(figsize=(8,2))
librosa.display.waveshow(wav.astype(np.float32), sr=sr, ax=ax)

With a trained post-filter:

>>> wav, sr = engine.svs(labels, post_filter_type="nnsvs")

With a trained neural vocoder:

>>> wav, sr = engine.svs(labels, vocoder_type="pwg")

With a global variance enhancement filter and a neural vocoder:

>>> wav, sr = engine.svs(labels, post_filter_type="gv", vocoder_type="pwg")

Default settings in NNSVS v0.0.2 or earlier:

>>> wav, sr = engine.svs(labels, post_filter_type="merlin", vocoder_type="world")

set_device(device)[source]

Set device for the SVS model

Parameters

device (str) – cpu or cuda.
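
Choosing the device string at runtime avoids hard-coding cuda on a machine without a GPU. The following is a minimal sketch, not part of nnsvs: `pick_device` is a hypothetical helper, and it assumes that PyTorch's `torch.cuda.is_available()` is a suitable availability check for the model.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration (not part of nnsvs).
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; fall back to the CPU device string.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# With an SPSVS instance named `engine`, as in the class example:
# engine.set_device(device)
```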

synthesis_from_timings(duration_modified_labels, vocoder_type='world', post_filter_type='merlin', trajectory_smoothing=True, trajectory_smoothing_cutoff=50, trajectory_smoothing_cutoff_f0=20, vuv_threshold=0.5, pre_f0_shift_in_cent=0, post_f0_shift_in_cent=0, vibrato_scale=1.0, return_states=False, force_fix_vuv=False, segmented_synthesis=False)[source]

Synthesize waveform from HTS labels with timings.
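
For reference, an HTS label file with timings stores one segment per line as `start end context`, with times given as integers in units of 100 ns. The sketch below parses such lines using only the standard library. The context strings are shortened, made-up placeholders, and `parse_timed_label` is a hypothetical helper rather than an nnsvs or nnmnkwii function.

```python
HTS_UNITS_PER_SECOND = 10_000_000  # HTS label times are in units of 100 ns


def parse_timed_label(line):
    """Split one 'start end context' line into (start_sec, end_sec, context)."""
    start, end, context = line.split(maxsplit=2)
    return (
        int(start) / HTS_UNITS_PER_SECOND,
        int(end) / HTS_UNITS_PER_SECOND,
        context.strip(),
    )


# Made-up, shortened contexts for illustration only.
lines = [
    "0 5000000 xx^xx-sil+k=i",
    "5000000 7500000 xx^sil-k+i=r",
]
segments = [parse_timed_label(line) for line in lines]
for start_sec, end_sec, context in segments:
    print(f"{start_sec:.2f}-{end_sec:.2f} s  {context}")
```

In practice, `nnmnkwii.io.hts` reads and writes these files, so hand parsing like this is mainly useful for inspecting labels.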

Parameters
  • duration_modified_labels (nnmnkwii.io.hts.HTSLabelFile) – HTS labels with predicted timings.

  • vocoder_type (str) – Vocoder type. world, pwg and usfgan are supported.

  • post_filter_type (str) – Post-filter type. merlin, gv and nnsvs are supported.

  • trajectory_smoothing (bool) – Whether to smooth acoustic feature trajectory.

  • trajectory_smoothing_cutoff (int) – Cutoff frequency for trajectory smoothing.

  • trajectory_smoothing_cutoff_f0 (int) – Cutoff frequency for trajectory smoothing of f0.

  • vuv_threshold (float) – Threshold for VUV.

  • pre_f0_shift_in_cent, post_f0_shift_in_cent (float) – F0 shift in cents (100 cents = one semitone).

  • vibrato_scale (float) – Scale for vibrato. Only valid if the acoustic features contain vibrato parameters.

  • return_states (bool) – Whether to return the internal states (for debugging)

  • force_fix_vuv (bool) – Whether to correct VUV.

  • segmented_synthesis (bool) – Whether to use segmented synthesis.
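
To illustrate what a VUV (voiced/unvoiced) threshold does, the sketch below turns per-frame voicing scores into voiced/unvoiced decisions and zeroes F0 on unvoiced frames. It shows the general technique under stated assumptions and is not a copy of the nnsvs implementation: the score range (0 to 1), the strict `>` comparison, and the helper name `apply_vuv` are all illustrative.

```python
def apply_vuv(f0, vuv_scores, vuv_threshold=0.5):
    """Zero out F0 on frames whose voicing score is not above the threshold."""
    if len(f0) != len(vuv_scores):
        raise ValueError("f0 and vuv_scores must have the same length")
    return [
        hz if score > vuv_threshold else 0.0
        for hz, score in zip(f0, vuv_scores)
    ]


f0 = [220.0, 221.5, 223.0, 219.0]    # F0 in Hz, one value per frame
vuv_scores = [0.9, 0.55, 0.3, 0.5]   # assumed voicing scores in [0, 1]

voiced_f0 = apply_vuv(f0, vuv_scores, vuv_threshold=0.5)
print(voiced_f0)
```

Under these assumptions, raising the threshold marks more frames as unvoiced, and lowering it marks more frames as voiced.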

svs(labels, vocoder_type='world', post_filter_type='merlin', trajectory_smoothing=True, trajectory_smoothing_cutoff=50, trajectory_smoothing_cutoff_f0=20, vuv_threshold=0.5, pre_f0_shift_in_cent=0, post_f0_shift_in_cent=0, vibrato_scale=1.0, return_states=False, force_fix_vuv=False, post_filter=None, segmented_synthesis=False)[source]

Synthesize waveform from HTS labels.

Parameters
  • labels (nnmnkwii.io.hts.HTSLabelFile) – HTS labels

  • vocoder_type (str) – Vocoder type. world, pwg and usfgan are supported.

  • post_filter_type (str) – Post-filter type. merlin, gv and nnsvs are supported.

  • trajectory_smoothing (bool) – Whether to smooth acoustic feature trajectory.

  • trajectory_smoothing_cutoff (int) – Cutoff frequency for trajectory smoothing.

  • trajectory_smoothing_cutoff_f0 (int) – Cutoff frequency for trajectory smoothing of f0.

  • vuv_threshold (float) – Threshold for VUV.

  • pre_f0_shift_in_cent, post_f0_shift_in_cent (float) – F0 shift in cents (100 cents = one semitone).

  • vibrato_scale (float) – Scale for vibrato. Only valid if the acoustic features contain vibrato parameters.

  • return_states (bool) – Whether to return the internal states (for debugging)

  • force_fix_vuv (bool) – Whether to correct VUV.

  • segmented_synthesis (bool) – Whether to use segmented synthesis.
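
The F0 shift parameters in the signature are expressed in cents. A cent is 1/100 of a semitone, so a shift of c cents multiplies a frequency by 2^(c/1200). The sketch below shows only that conversion; `shift_f0_in_cent` is a hypothetical helper, and how nnsvs applies the pre and post shifts internally is not covered here.

```python
def shift_f0_in_cent(f0_hz, shift_in_cent):
    """Shift a frequency in Hz by the given number of cents."""
    return f0_hz * 2.0 ** (shift_in_cent / 1200.0)


a4 = 440.0
print(shift_f0_in_cent(a4, 1200))   # one octave up
print(shift_f0_in_cent(a4, -1200))  # one octave down
print(shift_f0_in_cent(a4, 100))    # one semitone up
```

Because the mapping is exponential, equal shifts in cents correspond to equal musical intervals regardless of the starting pitch.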