![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
![]() |
|
Cryptocurrency News Articles
By Changing a Single Character, Researchers Can Bypass LLMs Safety and Content Moderation Guardrails
Jun 12, 2025 at 10:13 pm
Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails
Cybersecurity researchers at HiddenLayer have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails with just a single character change.
The finding, which was shared with The Hacker News, builds on prior work by the researchers, who in June found that it’s possible to exploit Model Context Protocol (MCP) tools to extract sensitive data.
"By inserting specific parameter names within a tool's function, sensitive data, including the full system prompt, can be extracted and exfiltrated," HiddenLayer said.
The finding also comes as the Straiker AI Research (STAR) team found that backronyms can be used to jailbreak AI chatbots and trick them into generating an undesirable response, including swearing, promoting violence, and producing sexually explicit content.
The technique, called the Yearbook Attack, has proven to be effective against various models from Anthropic, DeepSeek, Google, Meta, Microsoft, Mistral AI, and OpenAI.
"They blend in with the noise of everyday prompts — a quirky riddle here, a motivational acronym there — and because of that, they often bypass the blunt heuristics that models use to spot dangerous intent."
A phrase like 'Friendship, unity, care, kindness' doesn't raise any flags. But by the time the model has completed the pattern, it has already served the payload, which is the key to successfully executing this trick."
"These methods succeed not by overpowering the model's filters, but by slipping beneath them. They exploit completion bias and pattern continuation, as well as the way models weigh contextual coherence over intent analysis."
The TokenBreak attack targets a text classification model's tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent.
Tokenization is a fundamental step that LLMs use to break down raw text into their atomic units – i.e., tokens – which are common sequences of characters found in a set of text. To that end, the text input is converted into their numerical representation and fed to the model.
LLMs work by understanding the statistical relationships between these tokens, and produce the next token in a sequence of tokens. The output tokens are detokenized to human-readable text by mapping them to their corresponding words using the tokenizer's vocabulary.
The attack technique devised by HiddenLayer targets the tokenization strategy to bypass a text classification model's ability to detect malicious input and flag safety, spam, or content moderation-related issues in the textual input.
Specifically, the artificial intelligence (AI) security firm found that altering input words by adding letters in certain ways caused a text classification model to break.
Examples include changing "instructions" to "finstructions," "announcement" to "aannouncement," or "idiot" to "hidiot." These subtle changes cause different tokenizers to split the text in different ways, while still preserving their meaning for the intended target.
What makes the attack notable is that the manipulated text remains fully understandable to both the LLM and the human reader, causing the model to elicit the same response as what would have been the case if the unmodified text had been passed as input.
By introducing the manipulations in a way without affecting the model's ability to comprehend it, TokenBreak increases its potential for prompt injection attacks.
"This attack technique manipulates input text in such a way that certain models give an incorrect classification," the researchers said in an accompanying paper. "Importantly, the end target (LLМ or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the implemented protection model was put in place to prevent."
The attack has been found to be successful against text classification models using BPE (Byte Pair Encoding) or WordPiece tokenization strategies, but not against those using Unigram.
"The TokenBreak attack technique demonstrates that these protection models can be bypassed by manipulating the input text, leaving production systems vulnerable," the researchers said. "Knowing the family of the underlying protection model and its tokenization strategy is critical for understanding your susceptibility to this attack."
"Because tokenization strategy typically correlates with model family, a straightforward mitigation exists: Select models that use Unigram tokenizers."
To defend against TokenBreak, the researchers suggest using Unigram tokenizers when possible, training models with examples of bypass tricks, and checking that tokenization and model logic stays aligned. It also helps to log misclassifications and look for patterns that hint at manipulation.
Disclaimer:info@kdj.com
The information provided is not trading advice. kdj.com does not assume any responsibility for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile and it is highly recommended that you invest with caution after thorough research!
If you believe that the content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will delete it promptly.
-
-
-
-
-
- Bitcoin (BTC) Price Moved Little on Thursday, Extending Its Rangebound Performance After a Recent Rebound Ran Dry
- Jun 14, 2025 at 10:05 pm
- Bitcoin moved little on Thursday, extending its rangebound performance after a recent rebound ran dry amid persistent concerns over the outlook for U.S. trade tariffs and economic growth.
-
- The U.S. Securities and Exchange Commission (SEC) has postponed its decision to approve spot exchange-traded funds (ETFs) based on major altcoins such as Ripple (XRP) and Dogecoin (DOGE) to June.
- Jun 14, 2025 at 10:05 pm
- According to the virtual asset industry on the 1st (local time), the SEC announced in an official document released on the 29th (local time) that the final decision date for the XRP spot ETF applied by Franklin Templeton will be postponed to June 17 and the Dogecoin spot ETF applied by Bitwise will be postponed to June 15.
-
-
- Deputy Director of the CIA Micheal Jayellis said the CIA views Bitcoin and other cryptocurrencies as a tool they can use for payments.
- Jun 14, 2025 at 10:00 pm
- He said Bitcoin and other digital assets were “another tool in the tool box.” He also noted the agency aims to give President Donald Trump a decisive intelligence advantage
-