Production-ready speech recognition for Node.js on Apple Silicon
Powered by NVIDIA's Parakeet model running on Apple's Neural Engine via CoreML.
Modern Macs contain a powerful Neural Engine (ANE) – dedicated silicon for machine learning that often sits idle. This library puts it to work for speech recognition, delivering real-time transcription without cloud dependencies.
Common alternatives come with trade-offs:

| Approach | Drawbacks |
|---|---|
| Cloud APIs (OpenAI, Google, AWS) | Privacy concerns, ongoing costs, latency, requires internet |
| Whisper.cpp | CPU-bound, significantly slower on Apple Silicon |
| Python solutions | Requires Python runtime, complex deployment, subprocess overhead |
| Electron + subprocess | Memory overhead, IPC latency, complex architecture |
parakeet-coreml is a native Node.js addon that directly interfaces with CoreML. No Python. No subprocess. No cloud. Just fast, private speech recognition leveraging the full power of Apple Silicon.
- 🚀 40x real-time – Transcribe 1 hour of audio in 90 seconds (M1 Ultra, measured)
- 🍎 Neural Engine Acceleration – Runs on Apple's dedicated ML silicon, not CPU
- 🔒 Fully Offline – All processing happens locally. Your audio never leaves your device.
- 📦 Zero Runtime Dependencies – No Python, no subprocess, no external services
- 🎯 Smart Voice Detection – Built-in VAD automatically segments long recordings
- 🌍 Multilingual – English and major European languages (German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, and more)
- ⬇️ Automatic Setup – Models download on first use. Just `npm install` and go.
The Apple Neural Engine delivers exceptional speech recognition performance:
Measured on an M1 Ultra:

- 5 minutes of audio → 7.7 seconds
- Speed: 40x real-time
- 1 hour of audio in 90 seconds
Run your own benchmark:

```shell
git clone https://github.com/sebastian-software/parakeet-coreml
cd parakeet-coreml && pnpm install && pnpm benchmark
```

Estimates for other chips, based on Neural Engine TOPS (tera operations per second):
| Chip | ANE TOPS | Estimated Speed |
|---|---|---|
| M4 Pro | 38 | 70x real-time |
| M3 Pro | 18 | 35x real-time |
| M2 Pro | 16 | 30x real-time |
| M1 Ultra | 22 | 40x real-time (measured) |
| M1 Pro | 11 | 20x real-time |
Performance scales roughly with Neural Engine compute. Ultra variants have 2x ANE cores. Results may vary based on thermal conditions and system load.
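The estimates in the table amount to a linear extrapolation from the measured M1 Ultra baseline. A small illustrative sketch of that arithmetic (not code from the library):

```typescript
// Estimate transcription speed from ANE TOPS, extrapolating linearly
// from the measured M1 Ultra baseline (22 TOPS ≈ 40x real-time).
// Illustrative only: real throughput varies with thermals and load.
const MEASURED_TOPS = 22
const MEASURED_SPEED = 40 // x real-time, measured on M1 Ultra

function estimateSpeed(aneTops: number): number {
  return Math.round((aneTops / MEASURED_TOPS) * MEASURED_SPEED)
}

console.log(estimateSpeed(38)) // M4 Pro: roughly 70x real-time
```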
Typical use cases:

- Meeting transcription – Process recordings without uploading to third-party services
- Podcast production – Generate transcripts for show notes and accessibility
- Voice interfaces – Build voice-controlled applications with predictable latency
- Content indexing – Make audio/video content searchable
- Accessibility tools – Real-time captioning for the hearing impaired
- Privacy-sensitive applications – Healthcare, legal, finance – where data cannot leave the device
Requirements:

- macOS 14.0+ (Sonoma or later)
- Apple Silicon (M1, M2, M3, M4 – any variant)
- Node.js 20+
```shell
npm install parakeet-coreml
```

The native addon compiles during installation. Xcode Command Line Tools are required.
```typescript
import { ParakeetAsrEngine } from "parakeet-coreml"

const engine = new ParakeetAsrEngine()

// First run downloads models (cached for future use)
await engine.initialize()

// Transcribe audio of ANY length (16kHz, mono, Float32Array)
const result = await engine.transcribe(audioSamples)

console.log(result.text)
// "Hello, this is a test transcription."

console.log(`Processed in ${result.durationMs}ms`)

// Every result includes timestamps
for (const seg of result.segments) {
  console.log(`[${seg.startTime}s] ${seg.text}`)
}

engine.cleanup()
```

That's it. No API keys. No configuration. No internet required after the initial model download. No length limits – audio of any duration is handled automatically.
| Property | Requirement |
|---|---|
| Sample Rate | 16,000 Hz (16 kHz) |
| Channels | Mono (single channel) |
| Format | `Float32Array` with values between -1.0 and 1.0 |
| Duration | Any length |
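For a quick smoke test you can synthesize a valid input buffer yourself – for example, one second of a 440 Hz tone at 16 kHz mono. This is just a sketch of the required format; you need real speech for meaningful transcription output:

```typescript
// Generate 1 second of a 440 Hz sine tone as a 16 kHz mono
// Float32Array, matching the input format above (values in -1.0..1.0).
const SAMPLE_RATE = 16000

function sineWave(frequencyHz: number, durationSec: number): Float32Array {
  const samples = new Float32Array(Math.round(SAMPLE_RATE * durationSec))
  for (let i = 0; i < samples.length; i++) {
    samples[i] = 0.5 * Math.sin((2 * Math.PI * frequencyHz * i) / SAMPLE_RATE)
  }
  return samples
}

const testSignal = sineWave(440, 1.0) // 16000 samples, peak amplitude 0.5
```

Passing `testSignal` to `engine.transcribe(...)` exercises the full pipeline; expect little or no text for a pure tone, since VAD filters out non-speech.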
Voice Activity Detection (VAD) automatically finds speech segments and provides timestamps. The result always includes segments with timing information – useful for subtitles, search indexing, or speaker diarization.
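Because every result carries segment timestamps, generating subtitles takes only a few lines. A minimal SRT formatter sketch – the `TranscribedSegment` shape matches the API reference below, while the helper functions themselves are hypothetical:

```typescript
interface TranscribedSegment {
  startTime: number // seconds
  endTime: number   // seconds
  text: string
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000)
  const pad = (n: number, w = 2) => String(n).padStart(w, "0")
  const h = Math.floor(ms / 3600000)
  const m = Math.floor((ms % 3600000) / 60000)
  const s = Math.floor((ms % 60000) / 1000)
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`
}

// Turn transcription segments into an SRT subtitle document
function toSrt(segments: TranscribedSegment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTime(seg.startTime)} --> ${srtTime(seg.endTime)}\n${seg.text}\n`)
    .join("\n")
}
```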
This library processes raw PCM samples, not audio files directly. You'll need to decode your audio files before transcription. Common approaches:
- ffmpeg – Convert any audio/video format to raw PCM
- node-wav – Parse WAV files in Node.js
- Web Audio API – Decode audio in browser/Electron environments
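Decoders in this style typically hand back one `Float32Array` per channel. If the source is stereo, you still need mono input; a small downmix sketch, assuming equal-length channel buffers from your decoder of choice:

```typescript
// Average an arbitrary number of channels into a single mono track.
// Assumes the decoder (e.g. node-wav or the Web Audio API) yields
// one Float32Array per channel, all of equal length.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const mono = new Float32Array(channels[0].length)
  for (const channel of channels) {
    for (let i = 0; i < mono.length; i++) {
      mono[i] += channel[i] / channels.length
    }
  }
  return mono
}
```

Note that downmixing does not change the sample rate – audio not already at 16 kHz still needs resampling (e.g. via ffmpeg's `-ar 16000`).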
Example with ffmpeg (CLI):

```shell
ffmpeg -i input.mp3 -ar 16000 -ac 1 -f f32le output.pcm
```

Then load the raw PCM file:
```typescript
import { readFileSync } from "fs"

const buffer = readFileSync("output.pcm")
const samples = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.length / 4)
```

Models are automatically downloaded on first use:
- ASR models (~1.5GB) → `~/.cache/parakeet-coreml/models`
- VAD model (~1MB) → `~/.cache/parakeet-coreml/vad`
```shell
# Download all models (~1.5GB)
npx parakeet-coreml download

# Run benchmark
npx parakeet-coreml benchmark

# Check status
npx parakeet-coreml status

# Force re-download
npx parakeet-coreml download --force
```

```typescript
// Use custom model directories
const engine = new ParakeetAsrEngine({
  modelDir: "./my-models",
  vadDir: "./my-vad-model"
})

// Disable auto-download (for controlled environments)
const engine = new ParakeetAsrEngine({
  autoDownload: false // Will throw if models are not present
})
```

`ParakeetAsrEngine` is the main class for speech recognition.
```typescript
new ParakeetAsrEngine(options?: AsrEngineOptions)
```

| Option | Type | Default | Description |
|---|---|---|---|
| `modelDir` | `string` | `~/.cache/parakeet-coreml/models` | Path to ASR model directory |
| `vadDir` | `string` | `~/.cache/parakeet-coreml/vad` | Path to VAD model directory |
| `autoDownload` | `boolean` | `true` | Auto-download models if missing |
| Method | Description |
|---|---|
| `initialize()` | Load models (downloads if needed) |
| `transcribe(samples, opts?)` | Transcribe audio of any length |
| `isReady()` | Check if engine is initialized |
| `cleanup()` | Release native resources |
| `getVersion()` | Get version information |
```typescript
interface TranscriptionResult {
  text: string                   // Combined transcription
  durationMs: number             // Processing time in milliseconds
  segments: TranscribedSegment[] // Speech segments with timestamps
}

interface TranscribedSegment {
  startTime: number // Segment start in seconds
  endTime: number   // Segment end in seconds
  text: string      // Transcription for this segment
}

interface TranscribeOptions {
  sampleRate?: number           // Default: 16000
  vadThreshold?: number         // Speech detection sensitivity (0-1), default: 0.5
  minSilenceDurationMs?: number // Pause length to split on, default: 300
  minSpeechDurationMs?: number  // Minimum segment length, default: 250
}
```

| Function | Description |
|---|---|
| `isAvailable()` | Check if running on a supported platform |
| `getDefaultModelDir()` | Get default ASR model cache path |
| `areModelsDownloaded()` | Check if ASR models are present |
```
┌─────────────────────────────────────────────────────────┐
│                    Your Node.js App                     │
├─────────────────────────────────────────────────────────┤
│                  parakeet-coreml API                    │  TypeScript
├─────────────────────────────────────────────────────────┤
│     ASR Engine          │         VAD Engine            │  N-API + Objective-C++
│  (Parakeet TDT v3)      │        (Silero VAD)           │
├─────────────────────────────────────────────────────────┤
│                        CoreML                           │  Apple Framework
├─────────────────────────────────────────────────────────┤
│                  Apple Neural Engine                    │  Dedicated ML Silicon
└─────────────────────────────────────────────────────────┘
```
The library bridges Node.js directly to Apple's CoreML framework via a native N-API addon written in Objective-C++. Both ASR and VAD models run on the Neural Engine:
- VAD detects speech segments with timestamps
- ASR transcribes each segment (splitting at 15s if needed)
- Results are combined with full timing information
This eliminates subprocess overhead and Python interop, resulting in minimal latency and efficient memory usage.
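The 15-second split mentioned in step 2 amounts to plain sample-window chunking. A sketch of the idea (an illustration, not the library's internal implementation):

```typescript
// Split a speech segment into windows of at most 15 seconds,
// so each chunk fits the ASR model's input limit.
const SAMPLE_RATE = 16000
const MAX_CHUNK_SEC = 15

function chunkSamples(samples: Float32Array): Float32Array[] {
  const maxLen = SAMPLE_RATE * MAX_CHUNK_SEC
  const chunks: Float32Array[] = []
  for (let offset = 0; offset < samples.length; offset += maxLen) {
    chunks.push(samples.subarray(offset, Math.min(offset + maxLen, samples.length)))
  }
  return chunks
}
```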
Contributions are welcome! Please read our Contributing Guide for details on:
- Development setup
- Code style guidelines
- Pull request process
MIT – see LICENSE for details.
- NVIDIA Parakeet TDT v3 – The underlying ASR model
- Silero VAD – Voice Activity Detection model
- FluidInference – CoreML model conversions for both Parakeet and Silero VAD
Copyright © 2026 Sebastian Software GmbH, Mainz, Germany