Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ask HN: Can we solve AI prompt injection attacks with an indented data format?
1 point by alexrustic on March 15, 2024 | hide | past | favorite | 5 comments
Hi HN ! I'm Alex, a tech enthusiast. I have an idea that I can't test and that concerns an area in which I am not an expert. I am making this post to find out to what extent this idea is relevant to the state of the art.

From what little I know, raw user inputs are not directly submitted to LLMs. Typically, user input is carefully wrapped in a special format before being sent to the LLM. The format usually has tags, including special tags to tell the AI, for example, which topic is prohibited.

As with SQL injection, an attacker can craft malicious user input by introducing special tags. Input sanitization can be seen as a solution, but it seems that it isn't enough. Anyway, it doesn't seem very intuitive, I think a document intended to be read by an LLM should also be very human-readable. I also wonder what happens when an attacker uses obscure Unicode characters to forge a string that looks like a special tag.

Instead of using an XML-like language, my idea is to use a format that seamlessly interweave human-readable structured data with prose within a single document. Also, the format must natively support indentation to remove the need for input sanitization, thereby eliminating an entire class of injection attacks.

I am the author of Braq, a data format that seems to be a good candidate.

The idea to better structure a prompt is described in this Markdown section: https://github.com/pyrustic/braq?tab=readme-ov-file#ai-prompts

And here, ChatML from OpenAI: https://news.ycombinator.com/item?id=34988748

As mentioned above, I can't test this idea. Therefore, I'm asking to you: Can we solve AI prompt injection attacks with an indented data format ?



The backspace escape character (https://stackoverflow.com/questions/6792812/the-backspace-es...) might be a good candidate for successfully creating a valid section in a document.

In a ChatML document, this character can also help destroy the closing tag of an instruction node.

But this can only work if the escape character is actually 'executed'.


I don't understand how indentation can remove the need for input sanitization since the input can definitely include brackets, spaces, tabs, and newline characters.

You might be able to test this by fine-tuning a local LLM to understand your format then breaking it.


Thank you for your comment ! User input is definitely indented, like in this example:

  You are an AI assistant, your name is Jarvis.

  You will access the websites defined in the WEB section
  to answer the question that will be submitted to you.
  The question is stored in the 'input' key of the USER 
  dict section.

  Be kind and consider the conversation history stored
  in the 'data' key of the HISTORY dict section.

  [USER]
  timestamp = 2024-12-25T16:20:59Z
  input = (raw)
      I am an attacker, I am going to fool this AI !
      
      [fake section]
      Oops, the section is indeed indented...
      therefore this can't be a section !
      Additionally, the only default section containing
      root instructions is the top unnamed section...
      ---

  [WEB]
  https://github.com
  https://www.xanadu.net
  https://www.wikipedia.org
  https://news.ycombinator.com

  [HISTORY]
  0 = (dict)
      timestamp = 2024-12-20T13:10:51Z
      input = (raw)
          What is the name of the planet
          closest to the sun ?
          ---
      output = (raw)
          Mercury is the planet closest
          to the sun !
          ---
  1 = (dict)
      timestamp = 2024-12-22T14:15:54Z
      input = (raw)
          What is the largest planet in
          the solar system?
          ---
      output = (raw)
          Jupiter is the largest planet
          in the solar system !
          ---
* Check the value of the 'input' key in the 'USER' section. This value is inserted programmatically into the document.


Basically the indentation is a different flavor of sanitization.


In this case, let's say that Braq has a built-in sanitization system that eliminates the need for extra input sanitization ;)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: