Day 6

Personaplex Speech-to-Speech Processor

Today I took advantage of the HuggingFace Daily Briefer to play with a new model. The NVIDIA Personaplex 7B Audio‑to‑Audio Model is a speech-to-speech AI model that can talk with you in real time while staying in a chosen role and voice, guided by text prompts and short voice samples. Its neural architecture is designed to let the model listen and speak at the same time, producing natural real-time speech conversations instead of the usual "speech in, text out, speech back" pipeline.

I was able to get the model running locally on my Mac M3, but only in offline mode, and it was quite slow; to actually run it in real time you need a more powerful CUDA GPU. The script reads an audio file and produces AI-generated audio in response. In the example I ran, the output audio quality didn't seem that great, but the model did seem pretty responsive to what I was saying. Since the script outputs an audio file, I lined the input and output files up together in Audacity to check whether the model was actually responding to me. I ran out of time on this project to really play and evaluate; one of the features I was looking forward to trying is the persona aspect.

One additional note: I was using Claude Code to get some of this up and running when Claude gave me this response while I was debugging some of the code. I was very surprised:

Notice how Claude fixed the installed package's code directly. I'm not sure if I should be amazed or shocked. To the best of my knowledge, you shouldn't edit the underlying installation packages unless you really know what you're doing...

Here is the simple script on GitHub.
