From: Derek Davies
Subject: Re: cxxpiper
Date: Wed, 18 Sep 2024 13:55:52 -0400
User-agent: mu4e 1.2.0; emacs 26.3

Samuel Thibault <samuel.thibault@ens-lyon.org> writes:

[snip: stuff about cxxpiper -- the C++ module in development]
> Which module? I don't know about it. Will it really be useful to keep
> the python version once the c++ version works?

Apparently this project has a python/piper generic output module within it?
https://github.com/Elleo/pied

I can't remember where I got wind of python piper as a generic SD
module, but it may have been on some Coqui forum, or on some raspy or
Brailcom page.  I think someone here told me about Pied as an
installation vector, but I'd already hacked together my own install
from other instructions.

I think the C++ version would replace it, but one option would be for
the python version to leverage a model server, so that it would not
incur the model-load cost that a stateless, command-line-driven
generic module does.

But since I was interested in a non-sd_generic, i.e. non-command-line,
approach, and piper had a C++ library with demo code, I went with C++.

I think we'll see evolution toward an sd_generic command-line
interface that talks to a long-running server, separate from SD and
the OMs themselves.  Lots of TTSes come with command-line
infrastructure that will already work (or they should).  I don't think
fork/exec or TCP overhead will be a noticeable problem, especially
with caching[1], though I could be wrong.
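
What I have in mind is roughly the following minimal sketch; the
socket path and the newline-terminated protocol are made up for
illustration, not anything piper or SD actually defines:

// Hypothetical stateless command for sd_generic: the model stays
// resident in a long-running server; this client only shuttles bytes.
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) { std::cerr << "usage: piper-say TEXT\n"; return 1; }

    // Connect to the long-running synthesis server.
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, "/run/user/1000/piper.sock",  // invented path
                 sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, reinterpret_cast<sockaddr *>(&addr),
                          sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    // Send the text, then close the write side to mark end of request.
    std::string text = std::string(argv[1]) + "\n";
    write(fd, text.data(), text.size());
    shutdown(fd, SHUT_WR);

    // Stream raw audio to stdout, where sd_generic can pipe it to a
    // player such as aplay.
    char buf[4096];
    for (ssize_t n; (n = read(fd, buf, sizeof buf)) > 0; )
        write(STDOUT_FILENO, buf, n);
    close(fd);
    return 0;
}

The command itself stays stateless, so sd_generic's fork/exec model is
untouched; only the model load moves out of the per-utterance path.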

> https://htmlpreview.github.io/?https://github.com/brailcom/speechd/blob/master/doc/speech-dispatcher.html#Output-Modules

Thanks.  That's a start, I think, but I could not access that
directly using emacs eww, so I think I got it from an Arch wiki page.
It's pretty much what I was looking for, except there is so much MORE.
I will try to add more to what you referenced.

The piper C++ library doesn't seem to parameterize rate, pitch, or
volume, so to get those I will need to process the samples in the
audio callback, incurring resampling-like overhead.  No biggie, but if
I'm wrong please let me know :)
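
For illustration, a naive version of what I mean -- the callback shape
is my assumption, not piper's actual API, and note that plain linear
resampling shifts pitch along with rate:

// Gain plus linear-interpolation resampling over 16-bit PCM samples.
// rate > 1.0 speeds speech up (and, with this naive approach, raises
// the pitch too); real pitch control needs a proper time-stretcher.
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int16_t> adjust(const std::vector<int16_t> &in,
                            float volume,  // 0.0 .. 1.0
                            float rate)    // 1.0 = unchanged
{
    std::vector<int16_t> out;
    out.reserve(static_cast<std::size_t>(in.size() / rate) + 1);
    for (float pos = 0.0f;
         static_cast<std::size_t>(pos) + 1 < in.size(); pos += rate) {
        std::size_t i = static_cast<std::size_t>(pos);
        float frac = pos - static_cast<float>(i);
        // Interpolate between neighbouring samples, then apply gain,
        // clamping to the int16 range.
        float s = (in[i] * (1.0f - frac) + in[i + 1] * frac) * volume;
        if (s > 32767.0f) s = 32767.0f;
        if (s < -32768.0f) s = -32768.0f;
        out.push_back(static_cast<int16_t>(std::lround(s)));
    }
    return out;
}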

I'm having trouble finding multi-speaker models that fill in the
speaker map with sensible values.  I would guess the values would be
some human-readable, useful description, such as "female" or "nora",
but they all appear to be gibberish.  I tried about three model
sources.  The existing OMs, including the python piper one, don't seem
to handle multi-speaker voices, at least not directly as variants of
voices.  I would think OMs would flatten speakers into voices with
some appended descriptive text, probably from the speaker map, so that
the user could see and use them through the existing spd-say -L
interface.

Any thoughts or precedent on how this should work?  By flattening we
would avoid introducing a "current speaker index" to the API, so
that's how I'm guessing it should be done -- something like the sketch
below.
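
To make the flattening concrete, here is a sketch using nlohmann/json
(which piper already depends on).  The speaker_id_map key matches what
I have seen in piper model configs, but the "model+speaker" naming
scheme is just my guess at what spd-say -L could show:

#include <nlohmann/json.hpp>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Each flattened voice pairs a user-visible name with the speaker id
// that gets passed to the synthesizer.
std::vector<std::pair<std::string, int>>
flatten_speakers(const std::string &config_path,
                 const std::string &model_name) {
    std::ifstream f(config_path);
    nlohmann::json cfg = nlohmann::json::parse(f);

    std::vector<std::pair<std::string, int>> voices;
    if (!cfg.contains("speaker_id_map")) {
        voices.emplace_back(model_name, 0);  // single-speaker model
        return voices;
    }
    for (auto &[speaker, id] : cfg["speaker_id_map"].items())
        voices.emplace_back(model_name + "+" + speaker, id.get<int>());
    return voices;
}

int main() {
    // Hypothetical model files, for illustration only.
    for (auto &[name, id] :
         flatten_speakers("en_US-libritts-high.onnx.json",
                          "en_US-libritts-high"))
        std::cout << name << " -> speaker " << id << "\n";
}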

Thanks!
Derek

[1]  We should generalize and factor out the various OM-implemented
caches.  It might be good to cache only certain things, such as
single-letter or single-word queries, IMHO.
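
As a toy illustration of that policy (all names invented, not any
existing SD interface): audio gets memoized only when the query is a
single word, keyed on voice plus text.

#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Audio = std::vector<int16_t>;

bool is_single_word(const std::string &s) {
    return !s.empty() && s.find(' ') == std::string::npos;
}

// synth is whatever actually runs the model; only short queries are
// ever inserted into the cache.
template <typename Synth>
Audio speak_cached(std::map<std::string, Audio> &cache,
                   const std::string &voice,
                   const std::string &text, Synth synth) {
    if (!is_single_word(text))
        return synth(text);          // long utterances: never cached
    std::string key = voice + "\x1f" + text;
    auto it = cache.find(key);
    if (it == cache.end())
        it = cache.emplace(key, synth(text)).first;
    return it->second;
}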


