As far as I understand, the MMS TTS models are trained from scratch (section 7.1 of [1]) and do not use any SSL models, so the OmniASR SSL models are not useful here.
What might be interesting is the newly released OmniASR data, because the MMS data used to train MMS TTS was never released.
OmniASR could also be used to transcribe untranscribed speech, and a TTS model could then be trained on the resulting transcripts.
Only a single paper has published a similar derivation, and it contains a critical mistake. To be fair, there are many documented examples of how to derive parametric relationships in linkages, and the process can be quite methodical. I think I could get Gemini or 3.5 to do it, but not in a single shot or as fast as here.