parakeet-coreml

Production-ready speech recognition for Node.js on Apple Silicon


Powered by NVIDIA's Parakeet model running on Apple's Neural Engine via CoreML.

Why parakeet-coreml?

Modern Macs contain a powerful Neural Engine (ANE) – dedicated silicon for machine learning that often sits idle. This library puts it to work for speech recognition, delivering real-time transcription without cloud dependencies.

The Problem with Alternatives

| Approach | Drawbacks |
| --- | --- |
| Cloud APIs (OpenAI, Google, AWS) | Privacy concerns, ongoing costs, latency, requires internet |
| Whisper.cpp | CPU-bound, significantly slower on Apple Silicon |
| Python solutions | Requires Python runtime, complex deployment, subprocess overhead |
| Electron + subprocess | Memory overhead, IPC latency, complex architecture |

Our Solution

parakeet-coreml is a native Node.js addon that directly interfaces with CoreML. No Python. No subprocess. No cloud. Just fast, private speech recognition leveraging the full power of Apple Silicon.

Features

  • 🚀 40x real-time – Transcribe 1 hour of audio in 90 seconds (M1 Ultra, measured)
  • 🍎 Neural Engine Acceleration – Runs on Apple's dedicated ML silicon, not CPU
  • 🔒 Fully Offline – All processing happens locally. Your audio never leaves your device.
  • 📦 Zero Runtime Dependencies – No Python, no subprocess, no external services
  • 🎯 Smart Voice Detection – Built-in VAD automatically segments long recordings
  • 🌍 Multilingual – English and major European languages (German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, and more)
  • ⬇️ Automatic Setup – Models download on first use. Just npm install and go.

Performance

The Apple Neural Engine delivers exceptional speech recognition performance:

Measured on an M1 Ultra:

  • 5 minutes of audio → 7.7 seconds
  • 1 hour of audio → 90 seconds
  • Speed: 40x real-time
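These numbers correspond to a real-time factor of about 40 (audio duration divided by processing time). A quick sanity check of the arithmetic:

```typescript
// Real-time factor: seconds of audio processed per second of wall-clock time.
function realTimeFactor(audioSeconds: number, processingSeconds: number): number {
  return audioSeconds / processingSeconds
}

console.log(realTimeFactor(3600, 90)) // 1 hour in 90 s → 40
console.log(realTimeFactor(300, 7.7)) // 5 minutes in 7.7 s → ~39
```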

Run your own benchmark:

git clone https://github.com/sebastian-software/parakeet-coreml
cd parakeet-coreml && pnpm install && pnpm benchmark

Estimated Performance by Chip

Based on Neural Engine TOPS (tera operations per second):

| Chip | ANE TOPS | Estimated Speed |
| --- | --- | --- |
| M4 Pro | 38 | 70x real-time |
| M3 Pro | 18 | 35x real-time |
| M2 Pro | 16 | 30x real-time |
| M1 Ultra | 22 | 40x (measured) |
| M1 Pro | 11 | 20x real-time |

Performance scales roughly with Neural Engine compute. Ultra variants have 2x ANE cores. Results may vary based on thermal conditions and system load.

Use Cases

  • Meeting transcription – Process recordings without uploading to third-party services
  • Podcast production – Generate transcripts for show notes and accessibility
  • Voice interfaces – Build voice-controlled applications with predictable latency
  • Content indexing – Make audio/video content searchable
  • Accessibility tools – Real-time captioning for the hearing impaired
  • Privacy-sensitive applications – Healthcare, legal, finance – where data cannot leave the device

Requirements

  • macOS 14.0+ (Sonoma or later)
  • Apple Silicon (M1, M2, M3, M4 – any variant)
  • Node.js 20+

Installation

npm install parakeet-coreml

The native addon compiles during installation. Xcode Command Line Tools are required.

Quick Start

import { ParakeetAsrEngine } from "parakeet-coreml"

const engine = new ParakeetAsrEngine()

// First run downloads models (cached for future use)
await engine.initialize()

// Transcribe audio of ANY length (16kHz, mono, Float32Array)
const result = await engine.transcribe(audioSamples)

console.log(result.text)
// "Hello, this is a test transcription."

console.log(`Processed in ${result.durationMs}ms`)

// Every result includes timestamps
for (const seg of result.segments) {
  console.log(`[${seg.startTime}s] ${seg.text}`)
}

engine.cleanup()

That's it. No API keys. No configuration. No internet required after the initial model download. No length limits – audio of any duration is automatically handled.

Audio Format

| Property | Requirement |
| --- | --- |
| Sample Rate | 16,000 Hz (16 kHz) |
| Channels | Mono (single channel) |
| Format | Float32Array with values between -1.0 and 1.0 |
| Duration | Any length |

Voice Activity Detection (VAD) automatically finds speech segments and provides timestamps. The result always includes segments with timing information – useful for subtitles, search indexing, or speaker diarization.
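As an example of the subtitle use case, the segments can be rendered as SubRip (SRT) subtitles. The helper names below (`toSrtTimestamp`, `segmentsToSrt`) are our own, not part of this library; only the `TranscribedSegment` shape mirrors the API reference:

```typescript
// Mirrors the TranscribedSegment shape from the API reference (times in seconds).
interface TranscribedSegment {
  startTime: number
  endTime: number
  text: string
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000)
  const h = Math.floor(ms / 3_600_000)
  const m = Math.floor((ms % 3_600_000) / 60_000)
  const s = Math.floor((ms % 60_000) / 1000)
  const pad = (n: number, w = 2) => String(n).padStart(w, "0")
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`
}

// Render numbered SRT cues from transcription segments.
function segmentsToSrt(segments: TranscribedSegment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTimestamp(seg.startTime)} --> ${toSrtTimestamp(seg.endTime)}\n${seg.text}\n`)
    .join("\n")
}
```

With a transcription result in hand, `segmentsToSrt(result.segments)` yields a string you can write straight to a `.srt` file.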

Converting Audio Files

This library processes raw PCM samples, not audio files directly. You'll need to decode your audio files before transcription. Common approaches:

  • ffmpeg – Convert any audio/video format to raw PCM
  • node-wav – Parse WAV files in Node.js
  • Web Audio API – Decode audio in browser/Electron environments

Example with ffmpeg (CLI):

ffmpeg -i input.mp3 -ar 16000 -ac 1 -f f32le output.pcm

Then load the raw PCM file:

import { readFileSync } from "fs"

const buffer = readFileSync("output.pcm")
// Each float32 sample is 4 bytes; reinterpret the Buffer's memory as a Float32Array
const samples = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.length / 4)
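If your source audio is 16-bit integer PCM (the most common WAV encoding) rather than 32-bit float, you also need to normalize the samples into the -1.0 to 1.0 range the engine expects. A minimal sketch (the helper name `int16ToFloat32` is ours):

```typescript
// Convert 16-bit signed PCM samples to normalized Float32Array samples.
// Values are scaled from [-32768, 32767] into [-1.0, 1.0).
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768
  }
  return out
}
```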

Model Management

Models are automatically downloaded on first use:

  • ASR models (~1.5GB) → ~/.cache/parakeet-coreml/models
  • VAD model (~1MB) → ~/.cache/parakeet-coreml/vad

CLI Commands

# Download all models (~1.5GB)
npx parakeet-coreml download

# Run benchmark
npx parakeet-coreml benchmark

# Check status
npx parakeet-coreml status

# Force re-download
npx parakeet-coreml download --force

Custom Configuration

// Use custom model directories
// Use custom model directories
const engine = new ParakeetAsrEngine({
  modelDir: "./my-models",
  vadDir: "./my-vad-model"
})

// Disable auto-download (for controlled environments)
const offlineEngine = new ParakeetAsrEngine({
  autoDownload: false // Will throw if models not present
})

API Reference

ParakeetAsrEngine

The main class for speech recognition.

new ParakeetAsrEngine(options?: AsrEngineOptions)

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| modelDir | string | ~/.cache/parakeet-coreml/models | Path to ASR model directory |
| vadDir | string | ~/.cache/parakeet-coreml/vad | Path to VAD model directory |
| autoDownload | boolean | true | Auto-download models if missing |

Methods

| Method | Description |
| --- | --- |
| initialize() | Load models (downloads if needed) |
| transcribe(samples, opts?) | Transcribe audio of any length |
| isReady() | Check if engine is initialized |
| cleanup() | Release native resources |
| getVersion() | Get version information |

TranscriptionResult

interface TranscriptionResult {
  text: string // Combined transcription
  durationMs: number // Processing time in milliseconds
  segments: TranscribedSegment[] // Speech segments with timestamps
}

interface TranscribedSegment {
  startTime: number // Segment start in seconds
  endTime: number // Segment end in seconds
  text: string // Transcription for this segment
}

TranscribeOptions

interface TranscribeOptions {
  sampleRate?: number // Default: 16000
  vadThreshold?: number // Speech detection sensitivity (0-1), default: 0.5
  minSilenceDurationMs?: number // Pause length to split, default: 300
  minSpeechDurationMs?: number // Minimum segment length, default: 250
}
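Since every field is optional, a small helper can make the documented defaults explicit in application code. The helper `withDefaults` below is our own illustration, not part of the library; the default values mirror the interface comments above:

```typescript
// Mirrors the TranscribeOptions interface from the API reference.
interface TranscribeOptions {
  sampleRate?: number
  vadThreshold?: number
  minSilenceDurationMs?: number
  minSpeechDurationMs?: number
}

// Fill in the documented defaults for any options the caller omits.
function withDefaults(opts: TranscribeOptions = {}): Required<TranscribeOptions> {
  return {
    sampleRate: opts.sampleRate ?? 16000,
    vadThreshold: opts.vadThreshold ?? 0.5,
    minSilenceDurationMs: opts.minSilenceDurationMs ?? 300,
    minSpeechDurationMs: opts.minSpeechDurationMs ?? 250,
  }
}
```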

Helper Functions

| Function | Description |
| --- | --- |
| isAvailable() | Check if running on supported platform |
| getDefaultModelDir() | Get default ASR model cache path |
| areModelsDownloaded() | Check if ASR models are present |

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your Node.js App                     │
├─────────────────────────────────────────────────────────┤
│                  parakeet-coreml API                    │  TypeScript
├─────────────────────────────────────────────────────────┤
│          ASR Engine          │       VAD Engine         │  N-API + Objective-C++
│      (Parakeet TDT v3)       │      (Silero VAD)        │
├─────────────────────────────────────────────────────────┤
│                      CoreML                             │  Apple Framework
├─────────────────────────────────────────────────────────┤
│                 Apple Neural Engine                     │  Dedicated ML Silicon
└─────────────────────────────────────────────────────────┘

The library bridges Node.js directly to Apple's CoreML framework via a native N-API addon written in Objective-C++. Both ASR and VAD models run on the Neural Engine:

  1. VAD detects speech segments with timestamps
  2. ASR transcribes each segment (splitting at 15s if needed)
  3. Results are combined with full timing information

This eliminates subprocess overhead and Python interop, resulting in minimal latency and efficient memory usage.
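To illustrate step 2, here is a hedged sketch of cutting a long speech segment into windows of at most 15 seconds. This is our own simplified illustration; the library's actual splitting logic may differ (for example, by preferring silence boundaries over fixed cut points):

```typescript
// Split a [start, end] speech segment (in seconds) into chunks of at most maxLen seconds.
function splitSegment(start: number, end: number, maxLen = 15): Array<[number, number]> {
  const chunks: Array<[number, number]> = []
  for (let t = start; t < end; t += maxLen) {
    chunks.push([t, Math.min(t + maxLen, end)])
  }
  return chunks
}

console.log(splitSegment(0, 37))
// [[0, 15], [15, 30], [30, 37]]
```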

Contributing

Contributions are welcome! Please read our Contributing Guide for details on:

  • Development setup
  • Code style guidelines
  • Pull request process

License

MIT – see LICENSE for details.

Credits


Copyright © 2026 Sebastian Software GmbH, Mainz, Germany
