Releases · ggml-org/llama.cpp

sync : ggml

macOS/iOS:

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

common : only load backends when required (#22290)

common : only load backends when required

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

llama : call ggml_backend_load_all() directly from llama_backend_init()

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

Add ggml_backend_load_all() where llama_backend_init() is not used

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

vendor : update cpp-httplib to 0.43.3 (#22686)

llama : add option to save memory in device buffers (#22679)

llama : add option to save memory in device buffers
tests : extend llama-save-load-state

ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (#22631)
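For readers unfamiliar with the transform named in this entry, here is a minimal, generic sketch of a fast Walsh-Hadamard transform using the textbook butterfly scheme. It is not the ggml kernel, and the release notes do not describe how the PR applies the transform to KV rotation; the function name `fwht` and the unnormalized output convention are assumptions made for illustration only.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of `values`.

    Illustrative textbook version (hypothetical helper, not ggml code).
    The input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = list(values)  # work on a copy so the caller's data is untouched
    h = 1
    while h < n:
        # Combine pairs of elements that are h apart within each block of 2*h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    x = [1, 0, 1, 0, 0, 1, 1, 0]
    print(fwht(x))
```

The loop performs log2(n) passes of n/2 add/subtract pairs, so the cost is O(n log n) rather than the O(n^2) of a dense Hadamard matrix multiply. Applying the unnormalized transform twice returns the input scaled by n, which is a convenient sanity check.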

kleidiai : update to v1.24.0 and use release archive (#22549)

server: implement /models?reload=1 (#21848)

examples: refactor diffusion generation (#22590)

examples: refactor diffusion generation
renamed enum values

common/autoparser: fixes for newline handling / forced tool calls (#22654)

chat/autoparser: the fixes
Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls.
Trim whitespace on apply instead

model: move load_hparams and load_tensors to per-model definition (#22004)

git-friendly migration
add build_graph
nits
exclude old code from build
wip
add llm_arch_model_i
prepare downstream functions
nits
nits
wip
wip
add back create_tensor_qkv
fix files missing include
enforce one llm_build per arch
cmake: use glob
missing model params
nits
wip
wip (2)
wip (3)
test-llama-archs is happy
improve switch case
move more stuff into llm_arch_model_i
fix downstream code
nits
nits (2)
fix order
llama_model_base
LLAMA_LOAD_LOCALS
small fix
fix build errors
auto
rm migration script and ifdef

Releases: ggml-org/llama.cpp

b9033

b9031

b9030

b9028

b9026

b9025

b9023

b9022

b9020

b9019
