Issue in converting voice to text language detection #6429

### Please make sure you have searched for information in the following guides.

- [ ] Search the issues already opened: https://github.com/GoogleCloudPlatform/google-cloud-node/issues
- [x] Search StackOverflow: http://stackoverflow.com/questions/tagged/google-cloud-platform+node.js
- [x] Check our Troubleshooting guide: https://github.com/googleapis/google-cloud-node/blob/main/docs/troubleshooting.md
- [x] Check our FAQ: https://github.com/googleapis/google-cloud-node/blob/main/docs/faq.md
- [x] Check our libraries HOW-TO: https://github.com/googleapis/gax-nodejs/blob/main/client-libraries.md
- [x] Check out our authentication guide: https://github.com/googleapis/google-auth-library-nodejs
- [x] Check out handwritten samples for many of our APIs: https://github.com/GoogleCloudPlatform/nodejs-docs-samples
- [x] Check the API's issue tracker: https://cloud.google.com/support/docs/issue-trackers

### A screenshot that you have tested with "Try this API".


.

### Link to the code that reproduces this issue. A link to a **public** Github Repository or gist with a minimal reproduction.


https://www.npmjs.com/package/@google-cloud/speech

### A step-by-step description of how to reproduce the issue, based on the linked reproduction.


.

### A clear and concise description of what the bug is, and what you expected to happen.

@google-cloud/speech

We are experiencing an issue with the Google Speech-to-Text API where English voice inputs are sometimes incorrectly identified and transcribed as Arabic text. In addition, the same inputs return different responses when submitted by different users.

We would appreciate your guidance on the following points:

- How can we ensure that speech is consistently recognized in the language actually spoken, especially when that language is English?
- Is there a way to reduce or prevent misclassification into an alternative language such as Arabic when English is the language actually spoken?
- Can identical voice inputs result in different transcriptions depending on speaker tone, accent, or other voice characteristics?
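
Before comparing transcriptions across users, it may help to confirm that the inputs really are byte-identical, since a re-recorded or re-encoded clip is different audio even if the words are the same. A minimal sketch using Node's built-in `crypto` module; `audioFingerprint` is a hypothetical helper name and not part of `@google-cloud/speech`, and the byte arrays are placeholder data rather than real FLAC files:

```javascript
const crypto = require("crypto");

// Hypothetical helper: returns a SHA-256 hex digest of the raw audio bytes.
function audioFingerprint(buffer) {
    return crypto.createHash("sha256").update(buffer).digest("hex");
}

// Placeholder byte sequences standing in for uploaded audio files.
// Two buffers with the same bytes produce the same fingerprint...
const a = Buffer.from([0x66, 0x4c, 0x61, 0x43, 0x00, 0x01]);
const b = Buffer.from([0x66, 0x4c, 0x61, 0x43, 0x00, 0x01]);
// ...while a single differing byte changes it.
const c = Buffer.from([0x66, 0x4c, 0x61, 0x43, 0x00, 0x02]);

console.log(audioFingerprint(a) === audioFingerprint(b)); // true
console.log(audioFingerprint(a) === audioFingerprint(c)); // false
```

In our converter, something like `audioFingerprint(fs.readFileSync(filename))` could be logged next to each transcript, so that differing results can be traced to either differing audio or differing recognition.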

Our desired outcome is to ensure the transcription reflects the user’s spoken language accurately and consistently.

Looking forward to your assistance on this matter.

Below is the code snippet we use to transcribe voice to text:

```javascript
// `fs`, `findRemoveSync`, and `config` are defined elsewhere in our module.
converter: function (req, res) {
    return new Promise(function (resolve, reject) {
        const grpc = require("@grpc/grpc-js");
        const speech = require('@google-cloud/speech').v1p1beta1;
        const client = new speech.SpeechClient({ projectId: config.api.PROECT_ID, credentials: req.body.oauthJSON, grpc: grpc });
        let file = req.fileNoExtension;
        const filename = "./audio/" + file + ".flac";
        // Per-request values override the defaults from our configuration.
        let languageCode = req.body.lang_code ?
            req.body.lang_code : config.api.LANGUAGE_CODE;
        let alternativeLanguageCodes =
            req.body.alternative_language_codes && Array.isArray(req.body.alternative_language_codes) ?
                req.body.alternative_language_codes : config.api.ALTERNATIVE_LANGUAGE_CODES;
        const config1 = {
            encoding: config.api.ENCODING,
            sampleRateHertz: 16000,
            languageCode: languageCode,
            alternativeLanguageCodes: alternativeLanguageCodes,
        };
        const audio = {
            content: fs.readFileSync(filename).toString('base64'),
        };
        const request = {
            config: config1,
            audio: audio,
        };
        client
            .recognize(request)
            .then(data => {
                // Remove the temporary audio files for this request.
                var temp = './audio/';
                findRemoveSync(temp, {
                    filename: file,
                    extensions: config.api.AUDIO_FORMATS
                }, function (err, body) {
                    console.log("body", JSON.stringify(body))
                });
                if (req.options && req.options.filename) {
                    let n = './audio/' + req.options.filename;
                    fs.unlink(n, function (err) {
                        if (err) {
                            console.log(err);
                        }
                    })
                }
                const response = data[0];
                let responseDataObj = {};
                responseDataObj.audio_link = req.body.attachment;
                responseDataObj.convertedText = [];
                if (response.results && response.results.length > 0) {
                    responseDataObj.totalBilledTime = response.totalBilledTime;
                    // Keep the top alternative of every result.
                    response.results.forEach(result => {
                        const alternative = result.alternatives[0];
                        let dbOut = {};
                        dbOut.text = alternative.transcript;
                        dbOut.confidence = alternative.confidence;
                        responseDataObj.convertedText.push(dbOut);
                    });
                    // Resolve once, after all results have been collected.
                    resolve(responseDataObj);
                } else {
                    // Note: the promise is not settled on this path; the HTTP response is sent directly.
                    req.speechStatus = 'false';
                    console.info('Unable to process the file ', req.speechStatus)
                    return res.json({
                        status: 400,
                        info: 'Unable to process the file'
                    });
                }
            })
            .catch(err => {
                // Note: the promise is not settled on this path either.
                req.speechStatus = 'false';
                var temp = './audio/';
                findRemoveSync(temp, {
                    filename: file,
                    extensions: config.api.AUDIO_FORMATS
                }, function (err, body) {
                    console.log("body", JSON.stringify(body))
                });
                if (req.options && req.options.filename) {
                    let n = './audio/' + req.options.filename;
                    fs.unlink(n, function (err) {
                        if (err) {
                            //console.log(err);
                        }
                    })
                }
                return res.json({
                    status: 400,
                    info: 'Unable to process the audio file'
                });
            });
    });
}
```
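
To see which language the service picked for each segment, the per-result `languageCode` field that `v1p1beta1` returns when `alternativeLanguageCodes` is set could be recorded alongside the transcript. A sketch, assuming the same response shape our converter reads; `summarizeResults` is a hypothetical helper, and the sample response is made-up illustration data, not real API output:

```javascript
// Hypothetical helper: maps a recognize() response to transcript entries that
// also carry the detected language, and flags entries whose detected language
// differs from the expected one (compared case-insensitively).
function summarizeResults(response, expectedLanguageCode) {
    const expected = expectedLanguageCode.toLowerCase();
    return (response.results || [])
        .filter(result => result.alternatives && result.alternatives.length > 0)
        .map(result => {
            const best = result.alternatives[0];
            const detected = (result.languageCode || "").toLowerCase();
            return {
                text: best.transcript,
                confidence: best.confidence,
                detectedLanguage: detected,
                unexpectedLanguage: detected !== "" && detected !== expected,
            };
        });
}

// Made-up response for illustration only.
const sampleResponse = {
    results: [
        { alternatives: [{ transcript: "hello there", confidence: 0.91 }], languageCode: "en-gb" },
        { alternatives: [{ transcript: "مرحبا", confidence: 0.42 }], languageCode: "ar-ae" },
    ],
};

const summary = summarizeResults(sampleResponse, "en-GB");
console.log(summary.map(entry => entry.unexpectedLanguage)); // [ false, true ]
```

Logging `detectedLanguage` per request would at least show how often, and for which audio, the Arabic alternative is selected.
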
Below is the configuration we use:

`{"encoding":"FLAC","sampleRateHertz":16000,"languageCode":"en-GB","alternativeLanguageCodes":["ar-AE","de-DE","th-TH"]}`
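
Since every entry in `alternativeLanguageCodes` is a candidate the service may select, one option we are considering is to send alternatives only when the caller explicitly asks for them, instead of falling back to a default list. A sketch under that assumption; `buildRecognitionConfig` and its parameter names are hypothetical, the `"en-GB"` fallback mirrors our configuration, and the cap of three alternatives reflects the limit as we understand it from the API documentation:

```javascript
// Hypothetical helper: builds a RecognitionConfig-like object in which
// alternative languages are opt-in per request.
function buildRecognitionConfig(options) {
    const primary = options.languageCode || "en-GB";
    const requested = Array.isArray(options.alternativeLanguageCodes)
        ? options.alternativeLanguageCodes
        : [];
    // Drop duplicates and the primary language itself (case-insensitive).
    const seen = new Set([primary.toLowerCase()]);
    const alternatives = [];
    for (const code of requested) {
        const key = String(code).toLowerCase();
        if (!seen.has(key)) {
            seen.add(key);
            alternatives.push(code);
        }
    }
    const recognitionConfig = {
        encoding: "FLAC",
        sampleRateHertz: 16000,
        languageCode: primary,
    };
    // Only include the field when there is something to send (at most three).
    if (alternatives.length > 0) {
        recognitionConfig.alternativeLanguageCodes = alternatives.slice(0, 3);
    }
    return recognitionConfig;
}

const englishOnly = buildRecognitionConfig({ languageCode: "en-GB" });
const mixed = buildRecognitionConfig({
    languageCode: "en-GB",
    alternativeLanguageCodes: ["ar-AE", "en-gb", "ar-AE"],
});
console.log(englishOnly.alternativeLanguageCodes); // undefined
console.log(mixed.alternativeLanguageCodes); // [ 'ar-AE' ]
```

With this shape, a request known to be English would carry no Arabic candidate at all, at the cost of losing automatic detection for that request.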

Thanks

### A clear and concise description WHY you expect this behavior, i.e., was it a recent change, there is documentation that points to this behavior, etc.

The transcribed text should be in the same language as the spoken audio.
