ggml-org / llama.cpp Public

Pull requests: ggml-org/llama.cpp

626 Open 8,220 Closed

Pull requests list

eval-callback : add support for saving logits examples

#18281 opened Dec 22, 2025 by danbev

Vulkan: Tune Flash Attention for MoE on AMD GPUs ggml

changes relating to the ggml tensor library for machine learning

Vulkan

Issues specific to the Vulkan backend

#18280 opened Dec 22, 2025 by 0cc4m

Prevent crash if TTFT >300sec, boosted to 90 days examples server

#18279 opened Dec 22, 2025 by wbtek

opencl: fix q4_0 unpacking for adreno in get_tensor ggml

changes relating to the ggml tensor library for machine learning

OpenCL

Issues specific to the OpenCL backend

#18278 opened Dec 22, 2025 by lhez • Draft

tools : use common_log_pause to fix fit-params output race examples

#18276 opened Dec 22, 2025 by Aadeshveer

KYLIN: fix compile error for cuda backend ggml

changes relating to the ggml tensor library for machine learning

Nvidia GPU

Issues specific to Nvidia GPUs

#18275 opened Dec 22, 2025 by lizhenneng

docs: Fix typos in SYCL documentation documentation

Improvements or additions to documentation

SYCL

https://en.wikipedia.org/wiki/SYCL - GPU programming language

#18269 opened Dec 21, 2025 by yoka

add LLAMA_ARG_OVERRIDE_TENSOR env var for -ot arg

#18267 opened Dec 21, 2025 by ddh0

llama: fix magic number of 999 for GPU layers

#18266 opened Dec 21, 2025 by JohannesGaessler

server: add real-time prompt preprocessing progress via synthetic SSE chunks examples server

#18265 opened Dec 21, 2025 by ServeurpersoCom • Draft

vulkan: Extend rope fusions to allow mrope ggml

changes relating to the ggml tensor library for machine learning

testing

Everything test related

Vulkan

Issues specific to the Vulkan backend

#18264 opened Dec 21, 2025 by jeffbolznv

server: prevent data race from HTTP threads examples server

#18263 opened Dec 21, 2025 by ngxson

server: (docs) remove mention about extra_args examples server

#18262 opened Dec 21, 2025 by ngxson

Add Gemma3n multimodal support with MobileNetV5 vision encoder examples model

Model specific

python

python script changes

#18256 opened Dec 21, 2025 by simrnsingh

ggml rpc : Add missing check for rpc buffer type ggml

changes relating to the ggml tensor library for machine learning

#18242 opened Dec 21, 2025 by struct

ggml-cpu: parallelize tensor repacking with OpenMP ggml

changes relating to the ggml tensor library for machine learning

#18239 opened Dec 21, 2025 by pestopoppa

cli: buffering info log, only show if model load failed examples

#18236 opened Dec 20, 2025 by ngxson

webui: Fix the header backdrop blur examples server

#18230 opened Dec 20, 2025 by ImadSaddik

server: /v1/responses (text generation only) examples python

python script changes

server

#18227 opened Dec 20, 2025 by openingnow

webui: use server presets as parameter placeholders examples server

#18226 opened Dec 20, 2025 by ServeurpersoCom

ggml-metal: guard buffer map slicing Apple Metal

https://en.wikipedia.org/wiki/Metal_(API)

ggml

changes relating to the ggml tensor library for machine learning

#18225 opened Dec 20, 2025 by SzymonPrajs

webui: apply webui_settings on first load examples server

#18223 opened Dec 20, 2025 by ServeurpersoCom

ggml-metal: fix memset range and temp buffer leaks Apple Metal

https://en.wikipedia.org/wiki/Metal_(API)

ggml

changes relating to the ggml tensor library for machine learning

#18221 opened Dec 20, 2025 by SzymonPrajs

model: support nvidia/llama-embed-nemotron model

Model specific

python

python script changes

#18220 opened Dec 20, 2025 by sfallah • Draft

convert: rework ftype heuristics python

python script changes

#18214 opened Dec 20, 2025 by taronaeo
