Code documentation¶
Main Functions¶
- start_testing(attack_model: ClientBase, tested_model: ClientBase, config: dict, num_threads: int = 1, tests_with_attempts: List[Tuple[str, int]] | None = None, custom_tests_with_attempts: List[Tuple[Type[TestBase], int]] | None = None)[source]
Start testing.
- Parameters:
attack_model (ClientBase) – The attacking model used to generate tests.
tested_model (ClientBase) – The model being tested against the attacks.
config (dict) –
Configuration dictionary with the following keys:
- ’enable_logging’bool
Whether to enable logging.
- ’enable_reports’bool
Whether to generate xlsx reports.
- ’artifacts_path’Optional[str]
Path to the folder for saving artifacts.
- ’debug_level’int
Level of logging verbosity (default is 1). debug_level = 0 - WARNING. debug_level = 1 - INFO. debug_level = 2 - DEBUG.
- ’report_language’str
Language for the report (default is ‘en’). Possible values: ‘en’, ‘ru’.
num_threads (int, optional) – Number of threads for parallel test execution (default is 1).
tests_with_attempts (List[Tuple[str, int]], optional) – List of test names and their corresponding number of attempts. Available tests: - aim_jailbreak - base64_injection - complimentary_transition - do_anything_now_jailbreak - ethical_compliance - harmful_behavior - linguistic_evasion - past_tense - RU_do_anything_now_jailbreak - RU_typoglycemia_attack - RU_ucar - sycophancy_test - typoglycemia_attack - ucar
custom_tests_with_attempts (List[Tuple[Type[TestBase], int]], optional) – List of custom test instances and their corresponding number of attempts.
- Return type:
None
Note
This function starts the testing process with different configurations.
Abstract Classes¶
- class ClientBase[source]¶
Base class for interacting with chat models. The history and new messages are passed as a list of dictionaries.
Note
ClientBase is an abstract base class for client implementations.
- class TestBase(client_config: ClientConfig, attack_config: AttackConfig, artifacts_path: str | None = None, num_attempts: int = 0)[source]¶
A base class for test classes. Each test represents a different kind of attack against the target LLM model. The test sends a sequence of prompts and evaluate the responses while updating the status.
Note
TestBase is an abstract base class designed for attack handling in the testing framework.
Available Clients¶
- class ClientLangChain(backend: str, system_prompts: List[str] | None = None, model_description: str | None = None, **kwargs)[source]
Bases:
ClientBase
Wrapper for interacting with models through LangChain.
- Parameters:
- _convert_to_base_format(message: BaseMessage) Dict[str, str] [source]
Converts a LangChain message (HumanMessage, AIMessage) to the base format (Dict with “role” and “content”).
Note
ClientLangChain is a client implementation for LangChain-based services.
- class ClientOpenAI(api_key: str, base_url: str, model: str, temperature: float = 0.1, system_prompts: List[str] | None = None, model_description: str | None = None)[source]
Bases:
ClientBase
Wrapper for interacting with OpenAI-compatible API. This client can be used to interact with any language model that supports the OpenAI API, including but not limited to OpenAI models.
- Parameters:
api_key (str) – The API key for authentication.
base_url (str) – The base URL of the OpenAI-compatible API.
model (str) – The model identifier to use for generating responses.
temperature (float) – The temperature setting for controlling randomness in the model’s responses.
system_prompts (Optional[List[str]]) – List of system prompts for initializing the conversation context (optional).
model_description (str) – Description of the model, including domain and other features (optional).
- _convert_to_base_format(message: Dict[str, str]) Dict[str, str] [source]
Converts a message from OpenAI format (Dict) to the base format (Dict with “role” and “content”).
Note
ClientOpenAI is a client implementation for OpenAI-based services.