cflg code documentation

cflg

class Edge(*, start_node: Node, end_node: Node, timestamp: int)

Bases: BaseModel

Edge class representing a connection between two nodes with a timestamp.

Parameters:
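  • start_node (Node) – The starting node of the edge.

  • end_node (Node) – The ending node of the edge.

  • timestamp (int) – The timestamp associated with the edge.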

start_node

The starting node of the edge.

Type:

Node

end_node

The ending node of the edge.

Type:

Node

timestamp

The timestamp associated with the edge.

Type:

int

end_node: Node
get_max_node()
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'end_node': FieldInfo(annotation=Node, required=True), 'start_node': FieldInfo(annotation=Node, required=True), 'timestamp': FieldInfo(annotation=int, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

start_node: Node
timestamp: int
class Node(*, number: int)

Bases: BaseModel

Node class with a numerical value for comparison operations.

Parameters:

number (int) – The numerical value of the node, used for comparison operations.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'number': FieldInfo(annotation=int, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

number: int
class SelectApproach(s_node1_number: int = None, s_node2_number: int = None)

Bases: object

Class to select a subgraph from a given graph using different sampling approaches.

Parameters:
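  • s_node1_number (int) – The starting node number for the snowball sampling method.

  • s_node2_number (int) – An additional starting node number for the snowball sampling method.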

start_node1_number

The starting node number for the snowball sampling method.

Type:

Optional[int]

start_node2_number

An additional starting node number for the snowball sampling method.

Type:

Optional[int]

__call__(graph: StaticGraph)

Execute the selected sampling method on the graph.

Chooses between snowball sampling and random vertex sampling based on the provided starting nodes.

Parameters:

graph (StaticGraph) – The original graph from which a subgraph is to be sampled.

Returns:

The sampled subgraph.

Return type:

StaticGraph

static random_selected_vertices(graph: StaticGraph) → StaticGraph

Perform random vertex sampling on the graph.

Randomly selects a specified number of vertices and their associated edges to create a subgraph.

Parameters:

graph (StaticGraph) – The original graph from which a subgraph is to be sampled.

Returns:

The sampled subgraph.

Return type:

StaticGraph
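
Random vertex sampling can be sketched in a few lines. This is a standalone illustration, not cflg's implementation: the function name `random_vertex_sample`, the parameters `k` and `seed`, and the choice to keep only edges whose two endpoints were both selected (the induced subgraph) are all assumptions made for the example.

```python
import random

def random_vertex_sample(adj, k, seed=None):
    """Return the subgraph induced by k vertices chosen uniformly at random.

    adj maps each node to an iterable of its neighbours (hypothetical layout).
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    chosen = set(rng.sample(sorted(adj), k))
    # Assumption: keep only edges whose two endpoints were both chosen.
    return {u: [w for w in adj[u] if w in chosen] for u in sorted(chosen)}

adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
sub = random_vertex_sample(adj, k=3, seed=0)
print(sub)
```

The result always has exactly `k` nodes, and every neighbour listed in it is itself a sampled node.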

snowball_sample(graph: StaticGraph) → StaticGraph

Perform snowball sampling on the graph.

Starting from one or two nodes, it expands to include neighbors of these nodes, up to a specified limit.

Parameters:

graph (StaticGraph) – The original graph from which a subgraph is to be sampled.

Returns:

The sampled subgraph.

Return type:

StaticGraph
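
The expansion described above resembles a breadth-first traversal that stops at a size limit. The following is a minimal sketch under that assumption; the function signature, the `max_nodes` limit, and the plain-dict graph layout are hypothetical and are not taken from cflg.

```python
from collections import deque

def snowball_sample(adj, seeds, max_nodes):
    """Breadth-first expansion from the seed nodes, capped at max_nodes nodes.

    adj maps each node to an iterable of its neighbours (hypothetical layout).
    Returns the subgraph induced by the sampled nodes.
    """
    seen = set()
    queue = deque()
    for s in seeds:
        if s in adj and s not in seen:
            seen.add(s)
            queue.append(s)
    sampled = []
    while queue and len(sampled) < max_nodes:
        u = queue.popleft()
        sampled.append(u)
        for w in adj[u]:
            if w not in seen:  # each node is queued at most once
                seen.add(w)
                queue.append(w)
    keep = set(sampled)
    return {u: [w for w in adj[u] if w in keep] for u in sampled}

adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(snowball_sample(adj, seeds=[1], max_nodes=3))
# {1: [2, 3], 2: [1], 3: [1]}
```

Passing two seeds starts the expansion from both nodes at once, which corresponds to the "one or two nodes" case.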

start_node1_number: Optional[int]
start_node2_number: Optional[int]
class StaticGraph

Bases: object

Class representing a static graph with nodes and edges.

The graph is represented as an adjacency dictionary of dictionaries. Each node is a key in the outer dictionary, and its value is another dictionary containing adjacent nodes as keys and a list of timestamps as values.
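
To make the layout concrete, here is a small standalone sketch that builds such a structure with plain dictionaries. It does not use cflg itself; the helper name `add_timestamped_edge` is hypothetical, and storing each edge under both endpoints is an assumption (a directed variant would write only `adj[u][v]`).

```python
from collections import defaultdict

def add_timestamped_edge(adj, u, v, timestamp):
    """Record an edge u-v with a timestamp in a dict-of-dicts adjacency structure."""
    # Assumption: the edge is stored under both endpoints (undirected view).
    adj[u].setdefault(v, []).append(timestamp)
    adj[v].setdefault(u, []).append(timestamp)

adj = defaultdict(dict)
add_timestamped_edge(adj, 1, 2, 100)
add_timestamped_edge(adj, 1, 2, 250)  # a repeated edge adds a second timestamp
add_timestamped_edge(adj, 2, 3, 300)

print(dict(adj))
# {1: {2: [100, 250]}, 2: {1: [100, 250], 3: [300]}, 3: {2: [300]}}
```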

num_of_edge

Number of edges in the graph.

Type:

int

num_of_node

Number of nodes in the graph.

Type:

int

adjacency_dict_of_dicts

Adjacency dictionary of dictionaries.

Type:

dict

largest_connected_component

Largest connected component of the graph.

Type:

Optional[StaticGraph]

number_of_connected_components

Number of connected components in the graph.

Type:

Optional[int]

add_edge(edge: Edge) → int

Add a new edge to the graph. If the nodes of the edge do not exist, they are added to the graph.

Parameters:

edge (Edge) – The edge to be added.

Returns:

The updated number of edges in the graph.

Return type:

int

add_node(node: Node) → int

Add a new node to the graph.

Parameters:

node (Node) – The node to be added.

Returns:

The updated number of nodes in the graph.

Return type:

int

adjacency_dict_of_dicts: dict[int, dict[int, list[int]]] = None
assortative_factor() → float

Calculate the assortativity coefficient of the graph.

Assortativity measures the similarity of connections in the graph with respect to the node degree. It indicates whether high-degree nodes tend to connect with other high-degree nodes (assortative mixing) or low-degree nodes (disassortative mixing). A positive assortativity coefficient indicates a preference for high-degree nodes to attach to other high-degree nodes, while a negative coefficient indicates the opposite.

Returns:

The assortativity coefficient of the graph.

Return type:

float
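
One common way to compute degree assortativity is as the Pearson correlation between the degrees found at the two ends of every edge. The sketch below follows that definition for an undirected edge list; it is an illustration only, the function name `degree_assortativity` is hypothetical, and cflg may compute the coefficient differently.

```python
import math

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    edges is a list of (u, v) pairs of an undirected graph without self-loops.
    """
    if not edges:
        return math.nan
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both orientations so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else math.nan  # undefined when all degrees are equal

star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))  # -1.0: the hub connects only to leaves
```

A star graph is the extreme disassortative case, since its single high-degree node is attached exclusively to degree-1 nodes.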

average_cluster_factor() → float

Calculate the average clustering coefficient for the largest connected component in the graph.

The clustering coefficient for a vertex quantifies how close its neighbors are to being a complete graph (clique).

Returns:

The average clustering coefficient for the largest connected component.

Return type:

float
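
As an illustration of the definition, the local coefficient of a vertex with k neighbours is the number of edges among those neighbours divided by k(k-1)/2. The sketch below averages that value over all vertices of a small undirected graph. It is not cflg's code: the name `average_clustering`, the node-to-neighbour-set layout, and the convention that vertices with fewer than two neighbours contribute 0 are assumptions.

```python
from itertools import combinations

def average_clustering(adj):
    """Average local clustering coefficient.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    """
    if not adj:
        return 0.0
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such vertices contribute a coefficient of 0
        # Count the edges that actually exist among the neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# A triangle 1-2-3 with a pendant vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(average_clustering(adj))  # (1 + 1 + 1/3 + 0) / 4 = 7/12
```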

count_edges() → int

Return the number of edges in the graph.

Return type:

int

count_vertices() → int

Return the number of vertices in the graph.

Return type:

int

density() → float

Calculate and return the density of the graph.

Density is defined as the ratio of the number of edges to the maximum possible number of edges in a graph with the same number of vertices.

Return type:

float
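
For an undirected simple graph with n vertices, the maximum possible number of edges is n(n-1)/2. The sketch below applies the definition under that assumption (a directed graph would use n(n-1) instead); the standalone function `density` here is illustrative and is not the cflg method.

```python
def density(num_nodes: int, num_edges: int) -> float:
    """Density of an undirected simple graph: edges / maximum possible edges."""
    if num_nodes < 2:
        return 0.0  # no edges are possible with fewer than two vertices
    max_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / max_edges

print(density(4, 3))  # 3 of 6 possible edges -> 0.5
```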

get_adjacency_dict_of_dicts() → dict

Return the adjacency dictionary of dictionaries representing the graph.

Return type:

dict

get_diameter(graph: StaticGraph) → int

Calculate the diameter of the graph.

The diameter is the greatest distance between any pair of vertices in the graph.

Parameters:

graph (StaticGraph) – The graph for which the diameter is to be calculated.

Returns:

The diameter of the graph.

Return type:

int

get_largest_connected_component() → StaticGraph

Return the largest weakly connected component, computing it only if it has not already been found.

Returns:

The largest weakly connected component.

Return type:

StaticGraph

get_number_of_connected_components() → int

Return the number of weakly connected components, computing it only if it has not already been found.

Returns:

Number of weakly connected components.

Return type:

int
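
Weakly connected components are found by ignoring edge direction and grouping nodes that can reach each other. The following standalone sketch shows one way to do that with an iterative depth-first search; the function name `weakly_connected_components` and the node/edge-list inputs are hypothetical and do not reflect how cflg stores or caches its results.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, treating every edge as undirected."""
    undirected = {v: set() for v in nodes}
    for u, v in edges:
        undirected.setdefault(u, set()).add(v)
        undirected.setdefault(v, set()).add(u)
    seen = set()
    components = []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first search from `start`
            u = stack.pop()
            component.add(u)
            for w in undirected[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components

nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (3, 2), (4, 5)]  # directions are ignored
components = weakly_connected_components(nodes, edges)
largest = max(components, key=len)
print(len(components), sorted(largest))  # 2 [1, 2, 3]
```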

get_radius(graph: StaticGraph) → int

Calculate the radius of the graph.

The radius is the minimum eccentricity of any vertex in the graph. Eccentricity of a vertex is the greatest distance between that vertex and any other vertex in the graph.

Parameters:

graph (StaticGraph) – The graph for which the radius is to be calculated.

Returns:

The radius of the graph.

Return type:

int
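
Eccentricities can be obtained by running a breadth-first search from every vertex; the radius is then the minimum of those values and the diameter is the maximum. The sketch below assumes a connected, unweighted graph given as a plain dictionary. The helper names are hypothetical and this is not necessarily how cflg computes either quantity.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricities(adj):
    """Eccentricity of every node; assumes the graph is connected."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}

# A path graph 1 - 2 - 3 - 4.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
ecc = eccentricities(adj)
radius = min(ecc.values())
diameter = max(ecc.values())
print(ecc, radius, diameter)  # {1: 3, 2: 2, 3: 2, 4: 3} 2 3
```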

largest_connected_component: Optional[StaticGraph] = None
num_of_edge: int = 0
num_of_node: int = 0
number_of_connected_components: Optional[int] = None
percentile_distance(graph: StaticGraph, percentile: int = 90) → int

Calculate a specific percentile of the distance distribution in the graph.

Parameters:
  • graph (StaticGraph) – The graph for which the distances are calculated.

  • percentile (int) – The percentile to calculate (between 0 and 100).

Returns:

The calculated percentile distance.

Return type:

int
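
Given the list of pairwise shortest-path distances, a percentile can be read off the sorted values. The sketch below uses the nearest-rank definition, which always returns one of the observed integer distances. That choice is an assumption: cflg may use a different rule (for example, interpolation), and the function name `nearest_rank_percentile` is hypothetical.

```python
import math

def nearest_rank_percentile(values, percentile):
    """Smallest value with at least `percentile` percent of values at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

# Pairwise distances in the path graph 1 - 2 - 3 - 4:
# three pairs at distance 1, two at distance 2, one at distance 3.
distances = [1, 1, 1, 2, 2, 3]
print(nearest_rank_percentile(distances, 90))  # 3
print(nearest_rank_percentile(distances, 50))  # 1
```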

share_of_vertices() → float

Calculate the proportion of vertices in the largest connected component.

Returns:

Proportion of vertices in the largest connected component.

Return type:

float
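
Once the component sizes are known, the proportion is a single division. A minimal sketch, assuming the sizes are supplied as a list of integers (the function name `largest_component_share` is hypothetical):

```python
def largest_component_share(component_sizes):
    """Fraction of all vertices that lie in the largest component."""
    total = sum(component_sizes)
    return max(component_sizes) / total if total else 0.0

print(largest_component_share([3, 2]))  # 3 of 5 vertices -> 0.6
```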

features_for_edges_of_static_graph(path_to_data, verbose=False)

Generate features for the edges of a static graph from a data file.

Parameters:

path_to_data – Path to the data file. Each data line is a string of the form “num_node_1 num_node_2 timestamp”. The first two lines of the file are skipped, so the data starts on the third line.

Returns:

Features for the edges of the static graph.

Return type:

pandas.DataFrame
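
The input format described above can be read with a few lines of plain Python. This sketch only illustrates the file layout and is not the parser used by cflg; the function name `read_edge_list` and the header text in the sample are made up for the example, and skipping lines with fewer than three fields is an assumption.

```python
import io

def read_edge_list(handle):
    """Parse lines of the form 'num_node_1 num_node_2 timestamp'.

    The first two lines are treated as a header and skipped.
    """
    edges = []
    for line_no, line in enumerate(handle):
        if line_no < 2:
            continue  # header lines
        parts = line.split()
        if len(parts) < 3:
            continue  # assumption: ignore blank or incomplete lines
        u, v, timestamp = int(parts[0]), int(parts[1]), int(parts[2])
        edges.append((u, v, timestamp))
    return edges

# An in-memory stand-in for a data file; the header content is illustrative.
sample = io.StringIO("% header line 1\n% header line 2\n1 2 100\n2 3 250\n")
edges = read_edge_list(sample)
print(edges)  # [(1, 2, 100), (2, 3, 250)]
```

For a real file, the same function can be called on the object returned by `open(path_to_data)`.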

graph_features_auc_score_tables(datasets_info: DataFrame, cls_model=None, verbose=False)

Generate LaTeX tables of network features from a DataFrame of dataset information.

Parameters:
  • datasets_info (pd.DataFrame) – DataFrame with the columns ‘Network’, ‘Label’, ‘Category’, ‘Edge type’ and ‘Path’. ‘Path’ is the path to the data file, in which each data line is a string of the form “num_node_1 num_node_2 timestamp”. The first two lines of the file are skipped, so the data starts on the third line.

  • cls_model – Classification model for predicting the appearance of an edge.

Returns:

A tuple of LaTeX strings for different feature tables of the networks.

Return type:

tuple