Lumina offers a toolkit for Delphi developers to integrate generative AI capabilities into their applications. Built on llama.cpp, Lumina prioritizes data privacy, performance, and a user-friendly API, making it a practical tool for local AI inference.
- Localized Processing: Operates entirely offline, ensuring sensitive data remains confidential while offering complete computational control.
- Broad Model Compatibility: Supports GGUF models compliant with llama.cpp standards, granting access to diverse AI architectures.
- Intuitive Development Interface: A concise, flexible API simplifies model management, inference execution, and callback customization, minimizing implementation complexity.
- Future-Ready Scalability: This release emphasizes stability and foundational features, with plans for multi-turn conversation and retrieval-augmented generation (RAG) in future updates.
Lumina expands your development toolkit with capabilities such as:
- Dynamic chatbot creation.
- Automated text generation and summarization.
- Context-sensitive content generation.
- Real-time inference for adaptive processes.
- Operates independently of external networks, so data stays on the local machine.
- Uses Vulkan for optional GPU acceleration to enhance performance.
- Configurable GPU utilization through the AGPULayers parameter.
- Dynamic thread allocation based on hardware capabilities via AMaxThreads.
- Comprehensive performance metrics, offering insights into throughput and efficiency.
- Embedded dependencies eliminate the need for external libraries.
- Lightweight architecture (~2.5MB overhead) ensures broad deployment compatibility.
- Download the Repository
  - Download here and extract the files to your preferred directory.
- Acquire a GGUF Model
  - Obtain a model from Hugging Face, such as Gemma 2 2B GGUF (Q8_0). Save it to a directory accessible to your application (e.g., C:/LLM/GGUF).
- Ensure GPU Compatibility
  - Verify Vulkan compatibility for enhanced performance. Adjust AGPULayers as needed to accommodate VRAM limitations.
- TLumina Class
  - Add Lumina to your uses section.
  - Create an instance of TLumina.
  - All functionality will then be at your disposal. That simple!
- Explore Examples
  - Check the examples directory for detailed usage demonstrations.
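Integrate Lumina into your Delphi project:
var
  Lumina: TLumina;
begin
  Lumina := TLumina.Create;
  try
    // Empty template: use the one stored in the model's metadata.
    // 8192 = max context, -1 = offload all layers to the GPU, 8 = CPU threads.
    if Lumina.LoadModel('C:\LLM\GGUF\gemma-2-2b-it-abliterated-Q8_0.gguf',
      '', 8192, -1, 8) then
    begin
      if Lumina.SimpleInference('What is the capital of Italy?') then
        WriteLn('Inference completed successfully.')
      else
        WriteLn('Error: ', Lumina.GetError);
    end;
  finally
    // Always release the instance, even if loading or inference fails.
    Lumina.Free;
  end;
end;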
Define custom behavior using Lumina's callback functions:
procedure NextTokenCallback(const AToken: string; const AUserData: Pointer);
begin
  // Print each token as soon as it is generated.
  Write(AToken);
end;

Lumina.SetNextTokenCallback(NextTokenCallback, nil);
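The second argument of SetNextTokenCallback appears to be a user-data pointer that is handed back to the handler as AUserData. The sketch below relies on that assumption to count streamed tokens; CountingTokenCallback and LTokenCount are hypothetical names used for illustration, and the model path is the one from the example above.

```pascal
program TokenCountDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Lumina;

// Hypothetical handler: streams each token and increments a counter.
// Assumes Lumina passes the pointer given to SetNextTokenCallback
// back unchanged in AUserData.
procedure CountingTokenCallback(const AToken: string; const AUserData: Pointer);
begin
  Write(AToken);
  if AUserData <> nil then
    Inc(PInteger(AUserData)^);
end;

var
  LLumina: TLumina;
  LTokenCount: Integer;
begin
  LTokenCount := 0;
  LLumina := TLumina.Create;
  try
    // Register the handler together with the address of the counter.
    LLumina.SetNextTokenCallback(CountingTokenCallback, @LTokenCount);

    if LLumina.LoadModel('C:\LLM\GGUF\gemma-2-2b-it-abliterated-Q8_0.gguf',
      '', 8192, -1, 8) then
    begin
      if LLumina.SimpleInference('What is the capital of Italy?') then
      begin
        WriteLn;
        WriteLn('Tokens streamed: ', LTokenCount);
      end
      else
        WriteLn('Error: ', LLumina.GetError);

      // Release the model resources before the instance is freed.
      LLumina.UnloadModel;
    end
    else
      WriteLn('Error: ', LLumina.GetError);
  finally
    LLumina.Free;
  end;
end.
```

Passing state through the pointer keeps the handler free of global lookups; if the library invokes the callback from another thread, the counter would need synchronization.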
- LoadModel
  - Parameters:
    - AModelFilename: Path to the GGUF model file.
    - ATemplate: Optional inference template.
    - AMaxContext: Maximum context size (default: 512).
    - AGPULayers: GPU layer configuration (-1 for maximum).
    - AMaxThreads: Number of CPU threads allocated.
  - Returns a boolean indicating success.
- SimpleInference
  - Accepts a single query for immediate processing.
  - Returns a boolean indicating success.
- SetNextTokenCallback
  - Assigns a handler to process tokens during inference.
- UnloadModel
  - Frees resources allocated during model loading.
- GetPerformanceResult
  - Provides metrics, including token generation rates.
By default, Lumina uses the template defined in the model's metadata, but you can also define custom templates to match your model's requirements or change its behavior. These are some common model templates:
const
  CHATML_TEMPLATE = '<|im_start|>{role} {content}<|im_end|><|im_start|>assistant';
  GEMMA_TEMPLATE = '<start_of_turn>{role} {content}<end_of_turn>';
  PHI_TEMPLATE = '<|{role}|> {content}<|end|><|assistant|>';
- {role} - will be replaced with the role (user, assistant, etc.)
- {content} - will be replaced with the content sent to the model
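To make the placeholder substitution concrete, here is a minimal sketch of how a template could be expanded into a prompt. ApplyTemplate is a hypothetical helper written for this illustration, not part of Lumina's API, and Lumina's internal expansion may differ in detail.

```pascal
program TemplateDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Copied from the template list above.
  PHI_TEMPLATE = '<|{role}|> {content}<|end|><|assistant|>';

// Hypothetical helper (not a Lumina function): replaces the {role}
// and {content} placeholders with the supplied values.
function ApplyTemplate(const ATemplate, ARole, AContent: string): string;
begin
  Result := StringReplace(ATemplate, '{role}', ARole, [rfReplaceAll]);
  Result := StringReplace(Result, '{content}', AContent, [rfReplaceAll]);
end;

begin
  // Prints: <|user|> What is the capital of Italy?<|end|><|assistant|>
  WriteLn(ApplyTemplate(PHI_TEMPLATE, 'user', 'What is the capital of Italy?'));
end.
```

The expanded string is what the model would see as its prompt, which is why a template that does not match the model's training format tends to degrade output quality.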
AGPULayers values:
- -1: Utilize all available layers (default).
- 0: CPU-only processing.
- Custom values for partial GPU utilization.
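As an illustration of these settings, the sketch below loads a model for CPU-only inference and derives AMaxThreads from the machine's core count. PickThreadCount is a hypothetical helper, and leaving one core free is an assumption made for this example rather than a Lumina recommendation; CPUCount is the Delphi RTL variable from the System unit.

```pascal
program CpuOnlyDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Lumina;

// Hypothetical helper: use all cores but one, and never fewer than one.
function PickThreadCount(const ACoreCount: Integer): Integer;
begin
  Result := ACoreCount - 1;
  if Result < 1 then
    Result := 1;
end;

var
  LLumina: TLumina;
  LThreads: Integer;
begin
  // CPUCount reports the number of logical processors.
  LThreads := PickThreadCount(CPUCount);

  LLumina := TLumina.Create;
  try
    // AGPULayers = 0 keeps every layer on the CPU.
    if LLumina.LoadModel('C:\LLM\GGUF\gemma-2-2b-it-abliterated-Q8_0.gguf',
      '', 8192, 0, LThreads) then
      WriteLn('Model loaded for CPU-only inference with ', LThreads, ' threads.')
    else
      WriteLn('Error: ', LLumina.GetError);
  finally
    LLumina.Free;
  end;
end.
```

For partial offloading, replace the 0 with a positive layer count and lower it if the model does not fit in VRAM.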
Retrieve detailed operational metrics:
var
  Perf: TLumina.PerformanceResult;
begin
  Perf := Lumina.GetPerformanceResult;
  WriteLn('Tokens/Sec: ', Perf.TokensPerSecond);
  WriteLn('Input Tokens: ', Perf.TotalInputTokens);
  WriteLn('Output Tokens: ', Perf.TotalOutputTokens);
end;
Discover in-depth discussions and insights about Lumina and its features.
Lumina.Deep.Dive.mp4
- Report issues via the Issue Tracker.
- Engage in discussions on the Forum and Discord.
- Learn more at Learn Delphi.
Contributions to Lumina are highly encouraged!
- Report Issues: Submit issues if you encounter bugs or need help.
- Suggest Features: Share your ideas to make Lumina even better.
- Create Pull Requests: Help expand the capabilities and robustness of the library.
Your contributions make a difference!
Lumina is distributed under the BSD-3-Clause License, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the LICENSE file for more details.
Advance your Delphi applications with Lumina, a solution for integrating local generative AI.