Issue in converting voice to text language detection #6429

@anjaly

Description

Please make sure you have searched for information in the following guides.

A screenshot that you have tested with "Try this API".

.

Link to the code that reproduces this issue. A link to a public Github Repository or gist with a minimal reproduction.

https://www.npmjs.com/package/@google-cloud/speech

A step-by-step description of how to reproduce the issue, based on the linked reproduction.

.

A clear and concise description of what the bug is, and what you expected to happen.

@google-cloud/speech

We are experiencing an issue with the Google Speech-to-Text API where English voice inputs are sometimes incorrectly identified and transcribed as Arabic text. In addition, the same spoken input returns different transcriptions when it comes from different users.

We would appreciate your guidance on the following points:

How can we ensure that speech is consistently recognized in the correct spoken language, especially when English is spoken?

Is there a way to reduce or prevent misclassification into alternative languages like Arabic when English is the actual spoken language?

Can identical voice inputs result in different transcriptions depending on speaker tone, accent, or other voice characteristics?

Our desired outcome is to ensure the transcription reflects the user’s spoken language accurately and consistently.

Looking forward to your assistance on this matter.

Below is the code snippet we use to transcribe voice to text.

    // `config`, `fs` and `findRemoveSync` are defined at module scope (not shown).
    converter: function (req, res) {
      return new Promise(function (resolve, reject) {
        const grpc = require("@grpc/grpc-js");
        const speech = require('@google-cloud/speech').v1p1beta1;
        const client = new speech.SpeechClient({
          projectId: config.api.PROECT_ID,
          credentials: req.body.oauthJSON,
          grpc: grpc
        });
        let file = req.fileNoExtension;
        const filename = "./audio/" + file + ".flac";
        let languageCode = req.body.lang_code ? req.body.lang_code : config.api.LANGUAGE_CODE;
        let alternativeLanguageCodes =
          req.body.alternative_language_codes && Array.isArray(req.body.alternative_language_codes)
            ? req.body.alternative_language_codes
            : config.api.ALTERNATIVE_LANGUAGE_CODES;
        const config1 = {
          encoding: config.api.ENCODING,
          sampleRateHertz: 16000,
          languageCode: languageCode,
          alternativeLanguageCodes: alternativeLanguageCodes,
        };
        const audio = {
          content: fs.readFileSync(filename).toString('base64'),
        };
        const request = {
          config: config1,
          audio: audio,
        };
        client
          .recognize(request)
          .then(data => {
            // Remove the temporary audio files once the request has completed.
            var temp = './audio/';
            findRemoveSync(temp, { filename: file, extensions: config.api.AUDIO_FORMATS }, function (err, body) {
              console.log("body", JSON.stringify(body))
            });
            if (req.options && req.options.filename) {
              let n = './audio/' + req.options.filename;
              fs.unlink(n, function (err) {
                if (err) {
                  console.log(err);
                }
              })
            }
            const response = data[0];
            let responseDataObj = {};
            responseDataObj.audio_link = req.body.attachment;
            responseDataObj.convertedText = [];
            if (response.results && response.results.length > 0) {
              responseDataObj.totalBilledTime = response.totalBilledTime;
              response.results.forEach(result => {
                // Keep only the top alternative of each result.
                const alternative = result.alternatives[0];
                let dbOut = {};
                dbOut.text = alternative.transcript;
                dbOut.confidence = alternative.confidence;
                responseDataObj.convertedText.push(dbOut);
              });
              // Resolve once, after all results have been collected.
              resolve(responseDataObj);
            } else {
              req.speechStatus = 'false';
              console.info('Unable to process the file ', req.speechStatus)
              return res.json({ status: 400, info: 'Unable to process the file' });
            }
          })
          .catch(err => {
            req.speechStatus = 'false';
            var temp = './audio/';
            findRemoveSync(temp, { filename: file, extensions: config.api.AUDIO_FORMATS }, function (err, body) {
              console.log("body", JSON.stringify(body))
            });
            if (req.options && req.options.filename) {
              let n = './audio/' + req.options.filename;
              fs.unlink(n, function (err) {
                if (err) {
                  //console.log(err);
                }
              })
            }
            return res.json({ status: 400, info: 'Unable to process the audio file' });
          });
      });
    }

Below is the configuration we use:

{"encoding":"FLAC","sampleRateHertz":16000,"languageCode":"en-GB","alternativeLanguageCodes":["ar-AE","de-DE","th-TH"]}
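
For discussion, here is a minimal sketch of how the configuration above could be built so that alternative languages are only sent when the caller does not already know the spoken language. The helper name `buildRecognitionConfig` and the `languageIsKnown` flag are hypothetical and not part of `@google-cloud/speech`; the cap of three alternatives reflects our reading of the documented limit for `alternativeLanguageCodes`. Whether this actually reduces the misclassification is an assumption we have not verified.

```javascript
// Hypothetical helper (not part of @google-cloud/speech): builds the `config`
// object of a recognize() request from the values shown in this issue.
function buildRecognitionConfig({ languageCode, alternativeLanguageCodes, languageIsKnown } = {}) {
  const recognitionConfig = {
    encoding: 'FLAC',
    sampleRateHertz: 16000,
    languageCode: languageCode || 'en-GB',
  };

  // Only ask the API to choose between languages when the caller cannot pin one.
  if (!languageIsKnown && Array.isArray(alternativeLanguageCodes)) {
    const alternatives = alternativeLanguageCodes
      // Drop the primary language if it was repeated in the alternatives.
      .filter(code => code !== recognitionConfig.languageCode)
      // Assumption: the API accepts at most three alternative language codes.
      .slice(0, 3);
    if (alternatives.length > 0) {
      recognitionConfig.alternativeLanguageCodes = alternatives;
    }
  }
  return recognitionConfig;
}

// Example: the speaker is known to use English, so no alternatives are sent.
console.log(buildRecognitionConfig({
  languageCode: 'en-GB',
  alternativeLanguageCodes: ['ar-AE', 'de-DE', 'th-TH'],
  languageIsKnown: true,
}));
```

With `languageIsKnown: false`, the same call would keep `["ar-AE","de-DE","th-TH"]` as alternatives, which matches the configuration we currently send.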

Thanks

A clear and concise description WHY you expect this behavior, i.e., was it a recent change, there is documentation that points to this behavior, etc.

The transcribed text should be in the same language as the spoken audio.
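
To make the mismatch easier to reproduce and report, the following sketch records which language each result was attributed to. It assumes that every entry in `response.results` carries a `languageCode` field, as we understand the `v1p1beta1` `SpeechRecognitionResult` message to define; the helper name `summarizeDetectedLanguages` is hypothetical. The comparison is case-insensitive because we are not sure the returned tag uses the same casing as the requested one.

```javascript
// Hypothetical helper: summarizes a recognize() response so that results
// attributed to an unexpected language can be spotted and logged.
function summarizeDetectedLanguages(response, expectedLanguageCode) {
  const results = (response && response.results) || [];
  return results.map(result => {
    // Use the top alternative, as in the snippet from this issue.
    const best = (result.alternatives && result.alternatives[0]) || {};
    // Assumption: the per-result language tag is exposed as `languageCode`.
    const detected = result.languageCode || null;
    return {
      text: best.transcript || '',
      confidence: typeof best.confidence === 'number' ? best.confidence : null,
      detectedLanguage: detected,
      matchesExpected:
        detected !== null &&
        detected.toLowerCase() === expectedLanguageCode.toLowerCase(),
    };
  });
}

// Example with a hand-written response object (not real API output).
const sampleResponse = {
  results: [
    { alternatives: [{ transcript: 'hello there', confidence: 0.92 }], languageCode: 'en-gb' },
    { alternatives: [{ transcript: 'مرحبا', confidence: 0.41 }], languageCode: 'ar-ae' },
  ],
};
console.log(summarizeDetectedLanguages(sampleResponse, 'en-GB'));
```

In our converter this would be called with `data[0]` inside the `.then()` handler, so that each stored transcript also records the language it was attributed to.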
