This tutorial will guide you through the process of building a simple C++ program that performs inference on GGUF LLM models using the llama.cpp framework. We will cover the essential steps involved in loading the model, performing inference, and displaying the results. The code for this tutorial can be found here.
Prerequisites
To follow along with this tutorial, you will need the following:
A Linux-based operating system (native or WSL)
CMake installed
A GCC or Clang toolchain installed
Step 1: Setting Up the Project
Let's start by setting up our project. We will be building a C/C++ program that uses llama.cpp to perform inference on GGUF LLM models.
Create a new project directory; let's call it smol_chat.
Within the project directory, let's clone the llama.cpp repository into a subdirectory called externals. This will give us access to the llama.cpp source code and headers.
mkdir -p externals
cd externals
git clone https://github.com/ggerganov/llama.cpp.git
cd ..
Step 2: Configuring CMake
Now, let's configure our project to use CMake. This will allow us to easily compile and link our C/C++ code with the llama.cpp library.
Create a CMakeLists.txt file in the project directory.
In the CMakeLists.txt file, add the following code:
cmake_minimum_required(VERSION 3.10)
project(smol_chat)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Build llama.cpp (and its `common` helper library) as part of our project
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/externals/llama.cpp")

add_executable(smol_chat main.cpp LLMInference.cpp)
target_include_directories(smol_chat PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(smol_chat PRIVATE common llama)
This configuration specifies the minimum CMake version, sets the C++ standard and makes it required, pulls the llama.cpp build in with add_subdirectory, declares an executable named smol_chat built from our sources, includes headers from the current source directory, and links the llama and common libraries against our executable.
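With the configuration in place, the project can be compiled from the project root with a standard out-of-source CMake build:
mkdir -p build
cd build
cmake ..
make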
Step 3: Defining the LLM Interface
Next, let's define a C++ class that will handle the high-level interactions with the LLM. This class will abstract away the low-level llama.cpp function calls and provide a convenient interface for performing inference.
In the project directory, create a header file called LLMInference.h.
In LLMInference.h, declare the following class:
#pragma once

#include "llama.h"
#include <string>
#include <vector>

class LLMInference {
public:
    LLMInference(const std::string& model_path);
    ~LLMInference();

    void startCompletion(const std::string& query);
    std::string completeNext();

private:
    // llama.cpp handles for the model, inference context, and sampler chain
    llama_model*   llama_model_;
    llama_context* llama_context_;
    llama_sampler* llama_sampler_;

    // chat history; each message's content is heap-allocated and freed in the destructor
    std::vector<llama_chat_message> _messages;
    // buffer holding the chat-template-formatted prompt
    std::vector<char> _formattedMessages;
    // tokens of the prompt currently being decoded
    std::vector<llama_token> _promptTokens;
    llama_batch batch_;
};
This class has a public constructor that takes the path to the GGUF LLM model as an argument and a destructor that deallocates any dynamically allocated objects. It also has two public member functions: startCompletion, which initiates the completion process for a given query, and completeNext, which fetches the next token in the LLM's response sequence. The private members hold the llama.cpp handles and the buffers that persist across those calls.
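Since our CMakeLists.txt builds the executable from main.cpp, we also need a small entry point that drives this class. A minimal sketch follows; the interactive loop and the convention that completeNext returns an empty string at the end of a response are assumptions made here for illustration, not part of the interface shown above.
#include "LLMInference.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: smol_chat <path-to-gguf-model>\n";
        return 1;
    }
    LLMInference llm(argv[1]);
    std::string query;
    std::cout << "> ";
    while (std::getline(std::cin, query) && query != "exit") {
        llm.startCompletion(query);
        // fetch tokens until completeNext signals the end of the response
        for (std::string piece = llm.completeNext(); !piece.empty(); piece = llm.completeNext()) {
            std::cout << piece << std::flush;
        }
        std::cout << "\n> ";
    }
    return 0;
}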
Step 4: Implementing LLM Inference Functions
Now, let's define the implementation for the LLMInference class in a file called LLMInference.cpp.
In LLMInference.cpp, include the necessary headers and implement the class methods as follows:
#include "LLMInference.h"
#include "common.h"
#include
#include
#include
LLMInference::LLMInference(const std::string& model_path) {
    // load the GGUF model from disk with default parameters
    llama_model_ = llama_load_model_from_file(model_path.c_str(), llama_model_default_params());
    // create an inference context for the loaded model
    llama_context_ = llama_new_context_with_model(llama_model_, llama_context_default_params());
    // build a sampler chain: temperature scaling, min-p filtering (0.0 disables the
    // filter), and a final distribution sampler that actually selects the token
    llama_sampler_ = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_min_p(0.0f, 1));
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
}

LLMInference::~LLMInference() {
    // free the heap-allocated copies of the chat message contents
    for (auto& msg : _messages) {
        std::free(const_cast<char*>(msg.content));
    }
    llama_sampler_free(llama_sampler_);
    llama_free(llama_context_);
    llama_free_model(llama_model_);
}
void LLMInference::startCompletion(const std::string& query)
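{
    // A minimal sketch of the completion methods, assuming the chat-template,
    // tokenization, and batch APIs of a 2024-era llama.cpp checkout; exact
    // signatures vary across versions, so check llama.h in your clone.
    // Append the user message; content is strdup'ed and freed in the destructor.
    _messages.push_back({ "user", strdup(query.c_str()) });

    // Render all messages into a single prompt using the model's chat template.
    _formattedMessages.resize(llama_n_ctx(llama_context_));
    int len = llama_chat_apply_template(llama_model_, nullptr,
                                        _messages.data(), _messages.size(),
                                        /*add_ass=*/true,
                                        _formattedMessages.data(),
                                        _formattedMessages.size());
    std::string prompt(_formattedMessages.begin(), _formattedMessages.begin() + len);

    // Tokenize the prompt and queue it as the first batch to decode.
    _promptTokens.resize(prompt.size());
    int n_tokens = llama_tokenize(llama_model_, prompt.c_str(), prompt.size(),
                                  _promptTokens.data(), _promptTokens.size(),
                                  /*add_special=*/true, /*parse_special=*/true);
    _promptTokens.resize(n_tokens);
    batch_ = llama_batch_get_one(_promptTokens.data(), _promptTokens.size());
}

std::string LLMInference::completeNext() {
    // Decode the pending batch (the whole prompt on the first call, a single
    // token afterwards), sample one token, and return its text. An empty
    // string signals that the model emitted an end-of-generation token.
    llama_decode(llama_context_, batch_);
    llama_token token = llama_sampler_sample(llama_sampler_, llama_context_, -1);
    if (llama_token_is_eog(llama_model_, token)) {
        return "";
    }
    char piece[128];
    int n = llama_token_to_piece(llama_model_, token, piece, sizeof(piece),
                                 /*lstrip=*/0, /*special=*/false);
    // Reuse _promptTokens to keep the sampled token alive for the next batch.
    _promptTokens = { token };
    batch_ = llama_batch_get_one(_promptTokens.data(), 1);
    return std::string(piece, n);
}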