Code documentation¶

Main Functions¶

start_testing(attack_model: ClientBase, tested_model: ClientBase, config: dict, num_threads: int = 1, tests_with_attempts: List[Tuple[str, int]] | None = None, custom_tests_with_attempts: List[Tuple[Type[TestBase], int]] | None = None)[source]

Start testing.

Parameters:

attack_model (ClientBase) – The attacking model used to generate tests.
tested_model (ClientBase) – The model being tested against the attacks.
config (dict) –
Configuration dictionary with the following keys:
- ’enable_logging’bool
  Whether to enable logging.
- ’enable_reports’bool
  Whether to generate xlsx reports.
- ’artifacts_path’Optional[str]
  Path to the folder for saving artifacts.
- ’debug_level’int
  Level of logging verbosity (default is 1). debug_level = 0 - WARNING. debug_level = 1 - INFO. debug_level = 2 - DEBUG.
- ’report_language’str
  Language for the report (default is ‘en’). Possible values: ‘en’, ‘ru’.
num_threads (int, optional) – Number of threads for parallel test execution (default is 1).
tests_with_attempts (List[Tuple[str, int]], optional) – List of test names and their corresponding number of attempts. Available tests: - aim_jailbreak - base64_injection - complimentary_transition - do_anything_now_jailbreak - ethical_compliance - harmful_behavior - linguistic_evasion - past_tense - RU_do_anything_now_jailbreak - RU_typoglycemia_attack - RU_ucar - sycophancy_test - typoglycemia_attack - ucar
custom_tests_with_attempts (List[Tuple[Type[TestBase], int]], optional) – List of custom test instances and their corresponding number of attempts.

Return type:

None

Note

This function starts the testing process with different configurations.

Abstract Classes¶

class ClientBase[source]¶

Base class for interacting with chat models. The history and new messages are passed as a list of dictionaries.

system_prompts¶

Optional system prompts to guide the conversation.

Type:: Optional[List[str]]

model_description¶

Optional model description to guide the conversation.

Type:: Optional[str]

interact(history: List[Dict[str, str]], messages: List[Dict[str, str]]) → Dict[str, str][source]¶: Takes the conversation history and new messages, sends them to the LLM, and returns a new response.

Note

ClientBase is an abstract base class for client implementations.

class TestBase(client_config: ClientConfig, attack_config: AttackConfig, artifacts_path: str | None = None, num_attempts: int = 0)[source]¶

A base class for test classes. Each test represents a different kind of attack against the target LLM model. The test sends a sequence of prompts and evaluate the responses while updating the status.

Parameters:

client_config (ClientConfig)
attack_config (AttackConfig)
artifacts_path (str | None)
num_attempts (int)

Note

TestBase is an abstract base class designed for attack handling in the testing framework.

Available Clients¶

class ClientLangChain(backend: str, system_prompts: List[str] | None = None, model_description: str | None = None, **kwargs)[source]

Bases: ClientBase

Wrapper for interacting with models through LangChain.

Parameters:

backend (str) – The backend name to use for model initialization.
system_prompts (Optional[List[str]]) – List of system prompts for initializing the conversation context (optional).
**kwargs – Additional arguments passed to the model’s constructor.
model_description (str | None)

_convert_to_base_format(message: BaseMessage) → Dict[str, str][source]: Converts a LangChain message (HumanMessage, AIMessage) to the base format (Dict with “role” and “content”).

_convert_to_langchain_format(message: Dict[str, str]) → BaseMessage[source]: Converts a message from the base format (Dict) to LangChain’s format (HumanMessage, AIMessage).

interact(history: List[Dict[str, str]], messages: List[Dict[str, str]]) → Dict[str, str][source]: Takes conversation history and new messages, sends a request to the model, and returns the response as a dictionary.

Note

ClientLangChain is a client implementation for LangChain-based services.

class ClientOpenAI(api_key: str, base_url: str, model: str, temperature: float = 0.1, system_prompts: List[str] | None = None, model_description: str | None = None)[source]

Bases: ClientBase

Wrapper for interacting with OpenAI-compatible API. This client can be used to interact with any language model that supports the OpenAI API, including but not limited to OpenAI models.

Parameters:

api_key (str) – The API key for authentication.
base_url (str) – The base URL of the OpenAI-compatible API.
model (str) – The model identifier to use for generating responses.
temperature (float) – The temperature setting for controlling randomness in the model’s responses.
system_prompts (Optional[List[str]]) – List of system prompts for initializing the conversation context (optional).
model_description (str) – Description of the model, including domain and other features (optional).

_convert_to_base_format(message: Dict[str, str]) → Dict[str, str][source]: Converts a message from OpenAI format (Dict) to the base format (Dict with “role” and “content”).

_convert_to_openai_format(message: Dict[str, str]) → Dict[str, str][source]: Converts a message from the base format (Dict with “role” and “content”) to OpenAI’s format (Dict).

interact(history: List[Dict[str, str]], messages: List[Dict[str, str]]) → Dict[str, str][source]: Takes conversation history and new messages, sends a request to the OpenAI-compatible API, and returns the response.

Note

ClientOpenAI is a client implementation for OpenAI-based services.