This tutorial will guide you through the process of building a simple C++ program that performs inference on GGUF LLM models using the llama.cpp framework. We will cover the essential steps involved in loading the model, performing inference, and displaying the results. The code for this tutorial can be found here.
Prerequisites
To follow along with this tutorial, you will need the following:
A Linux-based operating system (native or WSL)
CMake installed
A GCC or Clang C++ toolchain installed
Step 1: Setting Up the Project
Let's start by setting up our project. We will be building a C/C++ program that uses llama.cpp to perform inference on GGUF LLM models.
Create a new project directory; we'll call it smol_chat.
Within the project directory, let's clone the llama.cpp repository into a subdirectory called externals. This will give us access to the llama.cpp source code and headers.
mkdir -p externals
cd externals
git clone https://github.com/ggerganov/llama.cpp.git
cd ..
Step 2: Configuring CMake
Now, let's configure our project to use CMake. This will allow us to easily compile and link our C/C++ code with the llama.cpp library.
Create a CMakeLists.txt file in the project directory.
In the CMakeLists.txt file, add the following code:
cmake_minimum_required(VERSION 3.10)
project(smol_chat)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/externals/llama.cpp")
add_executable(smol_chat main.cpp LLMInference.cpp)
target_include_directories(smol_chat PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(smol_chat PRIVATE common llama)
This code specifies the minimum CMake version, sets the C++ standard and standard flag, pulls in the llama.cpp source tree with add_subdirectory so that its build targets become available, adds an executable named smol_chat built from main.cpp and LLMInference.cpp, includes headers from the current source directory, and links the llama library (along with its common helper library) to our executable.
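With this in place, the project can be configured and compiled with a standard out-of-source CMake workflow (the build directory name is arbitrary):
cmake -B build .
cmake --build build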
Step 3: Defining the LLM Interface
Next, let's define a C++ class that will handle the high-level interactions with the LLM. This class will abstract away the low-level llama.cpp function calls and provide a convenient interface for performing inference.
In the project directory, create a header file called LLMInference.h.
In LLMInference.h, declare the following class:
#pragma once

#include "llama.h"
#include <string>
#include <vector>

class LLMInference {
public:
    LLMInference(const std::string& model_path);
    ~LLMInference();
    void startCompletion(const std::string& query);
    std::string completeNext();
private:
    // handles returned by the llama.cpp C API
    llama_model* _model;
    llama_context* _ctx;
    llama_sampler* _sampler;
    // chat history (role/content pairs) fed to the chat template
    std::vector<llama_chat_message> _messages;
    // buffer holding the chat-template-formatted conversation
    std::vector<char> _formattedMessages;
    // token ids of the current prompt
    std::vector<llama_token> _promptTokens;
    // the most recently sampled token, kept alive for the next batch
    llama_token _currToken;
    llama_batch _batch;
};
This class has a public constructor that takes the path to the GGUF LLM model as an argument and a destructor that deallocates any dynamically allocated objects. It also has two public member functions: startCompletion, which initiates the completion process for a given query, and completeNext, which fetches the next token in the LLM's response sequence.
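Before implementing these methods, it helps to see how the class is meant to be used. Here is a minimal, hypothetical main.cpp; the model path is a placeholder, and we assume completeNext returns the sentinel string "[EOG]" once the response is finished (a convention we adopt below):
#include "LLMInference.h"
#include <iostream>

int main() {
    // hypothetical path to a GGUF model on disk
    LLMInference llm("models/model-q4_k_m.gguf");
    llm.startCompletion("Who created the C++ programming language?");
    // stream the response token by token until end-of-generation
    std::string piece;
    while ((piece = llm.completeNext()) != "[EOG]") {
        std::cout << piece << std::flush;
    }
    std::cout << std::endl;
    return 0;
}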
Step 4: Implementing LLM Inference Functions
Now, let's define the implementation for the LLMInference class in a file called LLMInference.cpp.
In LLMInference.cpp, include the necessary headers and implement the class methods as follows:
#include "LLMInference.h"
#include "common.h"
#include
#include
#include
LLMInference::LLMInference(const std::string& model_path) {
llama_load_model_from_file(&llama_model_, model_path.c_str(), llama_model_default_params());
llama_new_context_with_model(&llama_context_, &llama_model_);
llama_sampler_init_temp(&llama_sampler_, 0.8f);
llama_sampler_init_min_p(&llama_sampler_, 0.0f);
}
LLMInference::~LLMInference() {
for (auto& msg : _messages) {
std::free(msg.content);
}
llama_free_model(&llama_model_);
llama_free_context(&llama_context_);
}
Finally, let's implement the two remaining methods: startCompletion, which prepares the prompt for a new query, and completeNext, which generates the response one token at a time.
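The exact signatures of llama_chat_apply_template, common_tokenize, common_token_to_piece, and llama_batch_get_one vary across llama.cpp revisions, so treat the following as a minimal sketch to adapt to the revision you cloned rather than a drop-in implementation; the "[EOG]" sentinel returned by completeNext is a convention of this tutorial, not part of llama.cpp:
void LLMInference::startCompletion(const std::string& query) {
    // append the user's query to the chat history;
    // strdup() because llama_chat_message stores raw C strings
    _messages.push_back({ "user", strdup(query.c_str()) });
    // render the whole conversation with the model's built-in chat template
    _formattedMessages.resize(llama_n_ctx(_ctx));
    int len = llama_chat_apply_template(_model, nullptr, _messages.data(), _messages.size(),
                                        true, _formattedMessages.data(), _formattedMessages.size());
    std::string prompt(_formattedMessages.begin(), _formattedMessages.begin() + len);
    // tokenize the formatted prompt and queue it for decoding
    _promptTokens = common_tokenize(_ctx, prompt, true, true);
    _batch = llama_batch_get_one(_promptTokens.data(), _promptTokens.size());
}

std::string LLMInference::completeNext() {
    // run the pending batch through the model
    if (llama_decode(_ctx, _batch) != 0) {
        return "[EOG]";
    }
    // sample the next token and stop at an end-of-generation token
    _currToken = llama_sampler_sample(_sampler, _ctx, -1);
    if (llama_token_is_eog(_model, _currToken)) {
        return "[EOG]";
    }
    // convert the token id back to text, then feed the sampled
    // token back to the model for the next decoding step
    std::string piece = common_token_to_piece(_ctx, _currToken, true);
    _batch = llama_batch_get_one(&_currToken, 1);
    return piece;
}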