nnsvs.svs
- class nnsvs.svs.SPSVS(model_dir, device='cpu')
Statistical parametric singing voice synthesis
Note
This class is designed to be language-independent. Therefore, frontend functionality such as converting MusicXML/UST to HTS labels is not included.
Examples:
Synthesize a waveform from a MusicXML file:
>>> import numpy as np
>>> import pysinsy
>>> import librosa.display
>>> import matplotlib.pyplot as plt
>>> from nnmnkwii.io import hts
>>> from nnsvs.pretrained import retrieve_pretrained_model
>>> from nnsvs.svs import SPSVS
>>> from nnsvs.util import example_xml_file
>>> # Load a pretrained model and build the synthesis engine
>>> model_dir = retrieve_pretrained_model("r9y9/kiritan_latest")
>>> engine = SPSVS(model_dir)
>>> # Convert the example MusicXML file to HTS full-context labels
>>> contexts = pysinsy.extract_fullcontext(example_xml_file(key="get_over"))
>>> labels = hts.HTSLabelFile.create_from_contexts(contexts)
>>> # Synthesize and plot the waveform
>>> wav, sr = engine.svs(labels)
>>> fig, ax = plt.subplots(figsize=(8,2))
>>> librosa.display.waveshow(wav.astype(np.float32), sr=sr, ax=ax)
With a trained post-filter:
>>> wav, sr = engine.svs(labels, post_filter_type="nnsvs")
With a trained neural vocoder:
>>> wav, sr = engine.svs(labels, vocoder_type="pwg")
With a global variance enhancement filter and a neural vocoder:
>>> wav, sr = engine.svs(labels, post_filter_type="gv", vocoder_type="pwg")
Defaults of NNSVS v0.0.2 or earlier:
>>> wav, sr = engine.svs(labels, post_filter_type="merlin", vocoder_type="world")
- synthesis_from_timings(duration_modified_labels, vocoder_type='world', post_filter_type='merlin', trajectory_smoothing=True, trajectory_smoothing_cutoff=50, trajectory_smoothing_cutoff_f0=20, vuv_threshold=0.5, pre_f0_shift_in_cent=0, post_f0_shift_in_cent=0, vibrato_scale=1.0, return_states=False, force_fix_vuv=False, segmented_synthesis=False)
Synthesize waveform from HTS labels with timings.
- Parameters
duration_modified_labels (nnmnkwii.io.hts.HTSLabelFile) – HTS labels with predicted timings.
vocoder_type (str) – Vocoder type. world, pwg and usfgan are supported.
post_filter_type (str) – Post-filter type. merlin, gv and nnsvs are supported.
trajectory_smoothing (bool) – Whether to smooth acoustic feature trajectory.
trajectory_smoothing_cutoff (int) – Cutoff frequency for trajectory smoothing.
trajectory_smoothing_cutoff_f0 (int) – Cutoff frequency for trajectory smoothing of f0.
vuv_threshold (float) – Threshold for VUV.
pre_f0_shift_in_cent (float) – F0 shift in cents (pre-shift).
post_f0_shift_in_cent (float) – F0 shift in cents (post-shift).
vibrato_scale (float) – Scale for vibrato. Only valid if the acoustic features contain vibrato parameters.
return_states (bool) – Whether to return the internal states (for debugging)
force_fix_vuv (bool) – Whether to correct VUV.
segmented_synthesis (bool) – Whether to use segmented synthesis.
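The pre_f0_shift_in_cent and post_f0_shift_in_cent arguments in the signature above are expressed in cents. As a rough illustration of the unit only, the sketch below converts a shift in cents to a frequency ratio using the standard definition (1200 cents = one octave). The helper names are hypothetical and are not part of nnsvs; how SPSVS applies the shift internally is not described on this page.

```python
import math


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def shift_f0(f0_hz: float, cents: float) -> float:
    """Shift an F0 value in Hz by the given number of cents.

    Hypothetical helper for illustration. An F0 of 0 stays 0 because
    the shift is multiplicative.
    """
    return f0_hz * cents_to_ratio(cents)


def shift_log_f0(lf0: float, cents: float) -> float:
    """Same shift in the log-F0 domain, where it becomes an additive offset."""
    return lf0 + cents * math.log(2.0) / 1200.0
```

For example, shift_f0(440.0, 100) raises A4 by one semitone, and a shift of -1200 cents halves the frequency.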
- svs(labels, vocoder_type='world', post_filter_type='merlin', trajectory_smoothing=True, trajectory_smoothing_cutoff=50, trajectory_smoothing_cutoff_f0=20, vuv_threshold=0.5, pre_f0_shift_in_cent=0, post_f0_shift_in_cent=0, vibrato_scale=1.0, return_states=False, force_fix_vuv=False, post_filter=None, segmented_synthesis=False)
Synthesize waveform from HTS labels.
- Parameters
labels (nnmnkwii.io.hts.HTSLabelFile) – HTS labels
vocoder_type (str) – Vocoder type. world, pwg and usfgan are supported.
post_filter_type (str) – Post-filter type. merlin, gv and nnsvs are supported.
trajectory_smoothing (bool) – Whether to smooth acoustic feature trajectory.
trajectory_smoothing_cutoff (int) – Cutoff frequency for trajectory smoothing.
trajectory_smoothing_cutoff_f0 (int) – Cutoff frequency for trajectory smoothing of f0.
vuv_threshold (float) – Threshold for VUV.
pre_f0_shift_in_cent (float) – F0 shift in cents (pre-shift).
post_f0_shift_in_cent (float) – F0 shift in cents (post-shift).
vibrato_scale (float) – Scale for vibrato. Only valid if the acoustic features contain vibrato parameters.
return_states (bool) – Whether to return the internal states (for debugging)
force_fix_vuv (bool) – Whether to correct VUV.
segmented_synthesis (bool) – Whether to use segmented synthesis.
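To give a feel for what a VUV threshold such as vuv_threshold typically controls, here is a minimal sketch of a common convention in statistical parametric synthesis: frames whose predicted voicing score falls below the threshold are treated as unvoiced and their F0 is set to zero. This is an assumption for illustration. The function name apply_vuv and the boundary handling (a score equal to the threshold counts as voiced) are hypothetical, and the actual behavior of SPSVS may differ.

```python
def apply_vuv(f0, vuv, threshold=0.5):
    """Zero out F0 in frames judged unvoiced.

    f0:  per-frame F0 values in Hz
    vuv: per-frame voicing scores in [0, 1]
    A frame is treated as unvoiced when its score is below `threshold`
    (assumed convention, not taken from the nnsvs source).
    """
    if len(f0) != len(vuv):
        raise ValueError("f0 and vuv must have the same number of frames")
    return [0.0 if v < threshold else f for f, v in zip(f0, vuv)]


# Three frames: clearly voiced, clearly unvoiced, exactly on the threshold
masked = apply_vuv([220.0, 230.0, 240.0], [0.9, 0.2, 0.5])
```

Under this convention, raising the threshold marks more frames as unvoiced, and lowering it keeps F0 in frames with weaker voicing evidence.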