PRE-TRAINED MACHINE LEARNING MODELS FOR REAL-TIME SPEECH FORM CONVERSION

Techniques are described for generating parallel data for real-time speech form conversion. In an embodiment, based at least in part on input speech data of an original form, a speech machine learning (ML) model generates parallel speech data. The parallel speech data includes the input speech data of the original form and temporally aligned output speech data of a target form different from the original form. Each frame of the input speech data temporally corresponds to an output speech frame of the target form and contains the same portion of the speech content. The techniques further include training a teacher machine learning model for converting speech form; the teacher model runs offline and is substantially larger than a student machine learning model. Knowledge from the trained teacher model is then transferred to train the production student model, which performs the speech form conversion on an end-user computing device.
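
The data flow described above can be illustrated with a toy sketch. It is not the patented method: the "teacher" here is a fixed per-frame linear map, the "student" is a linear map of the same size fitted by plain gradient descent, and speech frames are short lists of random numbers. The names `FRAME_DIM`, `make_teacher`, `generate_parallel_data`, and `train_student` are hypothetical and chosen only for illustration. The sketch shows two things: each input frame is paired with the output frame at the same index (temporal alignment), and the student is trained on the teacher's outputs rather than on separately recorded target speech.

```python
import random

FRAME_DIM = 4  # hypothetical number of features per speech frame


def apply_linear(weights, frame):
    """Apply a FRAME_DIM x FRAME_DIM linear map to one frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def make_teacher(seed=0):
    """Toy stand-in for the large offline teacher: a fixed per-frame linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(FRAME_DIM)]
               for _ in range(FRAME_DIM)]
    return lambda frame: apply_linear(weights, frame)


def generate_parallel_data(teacher, input_frames):
    """Pair every input frame with the teacher's output frame at the same index."""
    return [(frame, teacher(frame)) for frame in input_frames]


def mean_loss(weights, parallel):
    """Mean squared error between student predictions and teacher frames."""
    total = 0.0
    for x, target in parallel:
        pred = apply_linear(weights, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / FRAME_DIM
    return total / len(parallel)


def train_student(parallel, epochs=200, lr=0.05):
    """Fit a linear student to the teacher's frames with per-frame gradient steps."""
    weights = [[0.0] * FRAME_DIM for _ in range(FRAME_DIM)]
    for _ in range(epochs):
        for x, target in parallel:
            pred = apply_linear(weights, x)
            for i in range(FRAME_DIM):
                # Gradient of the per-frame MSE with respect to row i.
                grad_scale = 2 * (pred[i] - target[i]) / FRAME_DIM
                for j in range(FRAME_DIM):
                    weights[i][j] -= lr * grad_scale * x[j]
    return weights


rng = random.Random(1)
input_frames = [[rng.uniform(-1, 1) for _ in range(FRAME_DIM)] for _ in range(50)]
parallel = generate_parallel_data(make_teacher(), input_frames)
student = train_student(parallel)
```

In a real system the teacher and student would be neural networks of very different sizes; the toy uses equal-sized linear maps only so that the example stays short and runs without external libraries.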

The challenge in speech conversion is not only to convert speech audio from one form to another (e.g., accented speech to native speech) but also to do so in real time. In particular, the quest for efficient, high-quality accent conversion has long been an area of interest within speech and audio processing. Accent conversion has significant potential for various applications, notably in real-time audio communication. The challenge is to design an audio processing algorithm that can transform a given input audio signal with L2 (original speech format, e.g., accented English speech) into L1 (desired speech format, e.g., native English speech), maintaining the textual integrity and synchronicity while changing the accentual characteristics. Although accent conversion for the English language is used as an example speech conversion task, the techniques described herein are equally applicable to other speech conversion tasks such as voice conversion, emotion conversion, bandwidth expansion, etc.

One approach to audio speech conversion is to first convert the original audio stream to text and then convert the text to an audio speech stream of the desired form. This approach may work for offline audio processing, but real-time conversion would require extensive computing resources and would present an additional synchronization challenge.
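
The synchronization issue can be made concrete with a small simulation. This is a sketch under stated assumptions, not a measurement of any real system: the text-based cascade is modeled as a converter that must buffer a whole segment of `segment_len` frames before it can emit anything, and the function names (`stream_frame_aligned`, `stream_cascade`, `max_delay`) are hypothetical. Frames are plain integers, and both converters pass them through unchanged so that only timing is compared.

```python
def stream_frame_aligned(frames, convert_frame):
    """Emit one converted frame per input frame, at the step it arrives."""
    for t, frame in enumerate(frames):
        yield t, convert_frame(frame)


def stream_cascade(frames, segment_len, resynthesize):
    """Buffer a whole segment (standing in for speech-to-text), then emit it."""
    buffer = []
    last_t = -1
    for t, frame in enumerate(frames):
        last_t = t
        buffer.append(frame)
        if len(buffer) == segment_len:
            # Assumption: nothing can be emitted before the segment is complete.
            for out in resynthesize(buffer):
                yield t, out
            buffer = []
    if buffer:  # flush the final partial segment
        for out in resynthesize(buffer):
            yield last_t, out


def max_delay(stream):
    """Largest gap, in frames, between an output's index and its emit step."""
    return max(emit_t - k for k, (emit_t, _) in enumerate(stream))


frames = list(range(12))
direct_delay = max_delay(stream_frame_aligned(frames, lambda f: f))
cascade_delay = max_delay(stream_cascade(frames, 5, list))
```

Under these assumptions the frame-aligned stream has no buffering delay, while the segment-buffered stream lags by up to `segment_len - 1` frames. A real text-based cascade would also produce output whose duration differs from the input, which this toy does not model.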

Application Number: US2024/038305
Patent Number: 2025/024193
Applicant: TECHNOLOGIES, INC
Country:
Industry: Automotive and Transportation
Patent Type: Single Patent
Available For: Sale

Current Status: Granted