PRE-TRAINED MACHINE LEARNING MODELS FOR REAL-TIME SPEECH FORM CONVERSION

Techniques are described for generating parallel data for real-time speech form conversion. In an embodiment, based at least in part on input speech data of an original form, a speech machine learning (ML) model generates parallel speech data. The parallel speech data includes the input speech data of the original form and temporally aligned output speech data of a target form different than the original form. Each frame of the input speech data temporally corresponds to the corresponding output speech frame of the target speech form and contains a same portion of the particular content. The techniques further include training a teacher machine learning model that is offline and is substantially larger than a student machine learning model for converting speech form. Transferring "knowledge" from the trained Teacher model for training the Production Student Model that performs the speech form conversion on an end-user computing device.

The challenge in speech conversion is not only to be able to convert speech audio from one form to another (e g , accented speech to native speech) but also to do so in realtime. In particular, the quest for efficient, high-quality accent conversion has long been an area of interest within speech and audio processing. Accent conversion has significant potential for various applications, notably in real-time audio communication. The challenge is to design an audio processing algorithm that may transform a given input audio signal with L2 (original speech format, e.g., English with accent speech) into LI (desired speech format, e.g., native English speech), maintaining the textual integrity and synchronicity, yet changing the accentual characteristics. Although the accent conversion for the English language is used as an example speech conversion task, the techniques described herein are equally applicable to other speech conversion tasks such as voice conversion, emotion conversion, bandwidth expansion, etc.

One approach to perform audio speech conversion may be to first convert the original audio stream to text and then convert the text data to the desired form of the audio speech stream. This approach may work for offline audio processing, but the real-time conversion would require extensive computing resources and will present an additional challenge of synchronization.

Estimated Valuation $99,999,999.99

Rohit

0 Deals In Queue

Estimated Valuation $99,999,999.99

N/A

( 0 Reviews )

Application Number

US2024/038305

Patent Number

2025/024193

Applicant

TECHNOLOGIES, INC

Country

Industry

Automotive and Transportation

Patent Type

Single Patent

Available For

Sale

Documents

1758535459662_450.pdf

Download
1758535532589_135.pdf

Download

Actions

Report

Complete Specification

Added all portfolio

Country	Current Status	Patent Application Number	Patent Number	Applicant / Current Assignee Name	Title	Google Patent Link
	Granted	US2024/038305	2025/024193	TECHNOLOGIES, INC	Single Patent	Google patent link

You May Also Llike