Code documentation

Main Functions

start_testing(attack_model: ClientBase, tested_model: ClientBase, config: dict, num_threads: int = 1, tests_with_attempts: List[Tuple[str, int]] | None = None, custom_tests_with_attempts: List[Tuple[Type[TestBase], int]] | None = None)[source]

Start testing.

Parameters:
  • attack_model (ClientBase) – The attacking model used to generate tests.

  • tested_model (ClientBase) – The model being tested against the attacks.

  • config (dict) –

    Configuration dictionary with the following keys:

    • ’enable_logging’bool

      Whether to enable logging.

    • ’enable_reports’bool

      Whether to generate xlsx reports.

    • ’artifacts_path’Optional[str]

      Path to the folder for saving artifacts.

    • ’debug_level’int

      Level of logging verbosity (default is 1). debug_level = 0 - WARNING. debug_level = 1 - INFO. debug_level = 2 - DEBUG.

    • ’report_language’str

      Language for the report (default is ‘en’). Possible values: ‘en’, ‘ru’.

  • num_threads (int, optional) – Number of threads for parallel test execution (default is 1).

  • tests_with_attempts (List[Tuple[str, int]], optional) – List of test names and their corresponding number of attempts. Available tests: - aim_jailbreak - base64_injection - complimentary_transition - do_anything_now_jailbreak - ethical_compliance - harmful_behavior - linguistic_evasion - past_tense - RU_do_anything_now_jailbreak - RU_typoglycemia_attack - RU_ucar - sycophancy_test - typoglycemia_attack - ucar

  • custom_tests_with_attempts (List[Tuple[Type[TestBase], int]], optional) – List of custom test instances and their corresponding number of attempts.

Return type:

None

Note

This function starts the testing process with different configurations.

Abstract Classes

class ClientBase[source]

Base class for interacting with chat models. The history and new messages are passed as a list of dictionaries.

system_prompts

Optional system prompts to guide the conversation.

Type:

Optional[List[str]]

model_description

Optional model description to guide the conversation.

Type:

Optional[str]

interact(history: List[Dict[str, str]], messages: List[Dict[str, str]]) Dict[str, str][source]

Takes the conversation history and new messages, sends them to the LLM, and returns a new response.

Note

ClientBase is an abstract base class for client implementations.

class TestBase(client_config: ClientConfig, attack_config: AttackConfig, artifacts_path: str | None = None, num_attempts: int = 0)[source]

A base class for test classes. Each test represents a different kind of attack against the target LLM model. The test sends a sequence of prompts and evaluate the responses while updating the status.

Parameters:
  • client_config (ClientConfig)

  • attack_config (AttackConfig)

  • artifacts_path (str | None)

  • num_attempts (int)

Note

TestBase is an abstract base class designed for attack handling in the testing framework.

Available Clients

class ClientLangChain(backend: str, system_prompts: List[str] | None = None, model_description: str | None = None, **kwargs)[source]

Bases: ClientBase

Wrapper for interacting with models through LangChain.

Parameters:
  • backend (str) – The backend name to use for model initialization.

  • system_prompts (Optional[List[str]]) – List of system prompts for initializing the conversation context (optional).

  • **kwargs – Additional arguments passed to the model’s constructor.

  • model_description (str | None)

_convert_to_base_format(message: BaseMessage) Dict[str, str][source]

Converts a LangChain message (HumanMessage, AIMessage) to the base format (Dict with “role” and “content”).

_convert_to_langchain_format(message: Dict[str, str]) BaseMessage[source]

Converts a message from the base format (Dict) to LangChain’s format (HumanMessage, AIMessage).

interact(history: List[Dict[str, str]], messages: List[Dict[str, str]]) Dict[str, str][source]

Takes conversation history and new messages, sends a request to the model, and returns the response as a dictionary.

Note

ClientLangChain is a client implementation for LangChain-based services.

class ClientOpenAI(api_key: str, base_url: str, model: str, temperature: float = 0.1, system_prompts: List[str] | None = None, model_description: str | None = None)[source]

Bases: ClientBase

Wrapper for interacting with OpenAI-compatible API. This client can be used to interact with any language model that supports the OpenAI API, including but not limited to OpenAI models.

Parameters:
  • api_key (str) – The API key for authentication.

  • base_url (str) – The base URL of the OpenAI-compatible API.

  • model (str) – The model identifier to use for generating responses.

  • temperature (float) – The temperature setting for controlling randomness in the model’s responses.

  • system_prompts (Optional[List[str]]) – List of system prompts for initializing the conversation context (optional).

  • model_description (str) – Description of the model, including domain and other features (optional).

_convert_to_base_format(message: Dict[str, str]) Dict[str, str][source]

Converts a message from OpenAI format (Dict) to the base format (Dict with “role” and “content”).

_convert_to_openai_format(message: Dict[str, str]) Dict[str, str][source]

Converts a message from the base format (Dict with “role” and “content”) to OpenAI’s format (Dict).

interact(history: List[Dict[str, str]], messages: List[Dict[str, str]]) Dict[str, str][source]

Takes conversation history and new messages, sends a request to the OpenAI-compatible API, and returns the response.

Note

ClientOpenAI is a client implementation for OpenAI-based services.