cflg code documentation

cflg

class Edge(*, start_node: Node, end_node: Node, timestamp: int)

Bases: BaseModel

Edge class representing a connection between two nodes with a timestamp.

Parameters:

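start_node (Node) – The starting node of the edge.
end_node (Node) – The ending node of the edge.
timestamp (int) – The timestamp associated with the edge.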

start_node

The starting node of the edge.

Type:: Node

end_node

The ending node of the edge.

Type:: Node

timestamp

The timestamp associated with the edge.

Type:: int

end_node: Node

get_max_node()

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'end_node': FieldInfo(annotation=Node, required=True), 'start_node': FieldInfo(annotation=Node, required=True), 'timestamp': FieldInfo(annotation=int, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

start_node: Node

timestamp: int

class Node(*, number: int)

Bases: BaseModel

Node class with a numerical value for comparison operations.

Parameters:: number (int) – The numerical value of the node, used for comparison operations.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'number': FieldInfo(annotation=int, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

number: int

class SelectApproach(s_node1_number: int = None, s_node2_number: int = None)

Bases: object

Class to select a subgraph from a given graph using different sampling approaches.

Parameters:

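s_node1_number (int) – Optional starting node number for the snowball sampling method (default None).
s_node2_number (int) – Optional additional starting node number for the snowball sampling method (default None).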

start_node1_number

The starting node number for the snowball sampling method.

Type:: Optional[int]

start_node2_number

An additional starting node number for the snowball sampling method.

Type:: Optional[int]

__call__(graph: StaticGraph)

Execute the selected sampling method on the graph.

Chooses between snowball sampling and random vertex sampling based on the provided starting nodes.

Parameters:: graph (StaticGraph) – The original graph from which a subgraph is to be sampled.
Returns:: The sampled subgraph.
Return type:: StaticGraph

static random_selected_vertices(graph: StaticGraph) → StaticGraph

Perform random vertex sampling on the graph.

Randomly selects a specified number of vertices and their associated edges to create a subgraph.

Parameters:: graph (StaticGraph) – The original graph from which a subgraph is to be sampled.
Returns:: The sampled subgraph.
Return type:: StaticGraph
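
Random vertex sampling can be illustrated on a plain adjacency dictionary. The following is a minimal sketch, not the package's implementation: the function name random_vertex_sample and its parameters k (number of vertices to keep) and seed are hypothetical, since the documented method takes only the graph and does not say where the number of vertices comes from.

```python
import random


def random_vertex_sample(adj, k, seed=None):
    """Keep k randomly chosen vertices and the edges between them.

    adj maps node -> {neighbor: [timestamps]}. k and seed are
    hypothetical parameters introduced for this sketch.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(adj), min(k, len(adj))))
    # Induced subgraph: drop every edge that leaves the chosen set.
    return {
        u: {v: list(ts) for v, ts in adj[u].items() if v in chosen}
        for u in chosen
    }


# Small example graph: a path 1-2-3-4-5 plus the edge 1-6.
adj = {
    1: {2: [10], 6: [60]},
    2: {1: [10], 3: [20]},
    3: {2: [20], 4: [30]},
    4: {3: [30], 5: [40]},
    5: {4: [40]},
    6: {1: [60]},
}
sub = random_vertex_sample(adj, k=3, seed=0)
print(sorted(sub))
```

Because only edges with both endpoints in the chosen set survive, the sample may be disconnected or have no edges at all, which is a known property of this sampling scheme.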

snowball_sample(graph: StaticGraph) → StaticGraph

Perform snowball sampling on the graph.

Starting from one or two nodes, it expands to include neighbors of these nodes, up to a specified limit.

Parameters:: graph (StaticGraph) – The original graph from which a subgraph is to be sampled.
Returns:: The sampled subgraph.
Return type:: StaticGraph
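
The expansion described above can be sketched as a breadth-first traversal that stops once a node limit is reached. This is an illustration under stated assumptions, not the package's code: the standalone function, the seeds argument and the max_nodes limit are hypothetical names, and the documentation does not specify how the limit is chosen.

```python
from collections import deque


def snowball_sample(adj, seeds, max_nodes):
    """Breadth-first expansion from the seed nodes, up to max_nodes vertices.

    adj maps node -> {neighbor: [timestamps]}. seeds and max_nodes are
    hypothetical parameters introduced for this sketch.
    """
    seen = set()
    queue = deque()
    for s in seeds:
        if s in adj and s not in seen:
            seen.add(s)
            queue.append(s)

    visited = []
    while queue and len(visited) < max_nodes:
        u = queue.popleft()
        visited.append(u)
        for v in sorted(adj[u]):  # sorted only to make the order reproducible
            if v not in seen:
                seen.add(v)
                queue.append(v)

    keep = set(visited)
    # Keep only the edges whose endpoints were both visited.
    return {
        u: {v: list(ts) for v, ts in adj[u].items() if v in keep}
        for u in visited
    }


# Small example graph: a path 1-2-3-4-5 plus the edge 1-6.
adj = {
    1: {2: [10], 6: [60]},
    2: {1: [10], 3: [20]},
    3: {2: [20], 4: [30]},
    4: {3: [30], 5: [40]},
    5: {4: [40]},
    6: {1: [60]},
}
sub = snowball_sample(adj, seeds=[1], max_nodes=4)
print(sorted(sub))
```

Unlike random vertex sampling, a snowball sample grown from a single seed is always connected, because every visited node was reached through an edge from an earlier one.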

start_node1_number: Optional[int]

start_node2_number: Optional[int]

class StaticGraph

Bases: object

Class representing a static graph with nodes and edges.

The graph is represented as an adjacency dictionary of dictionaries. Each node is a key in the outer dictionary, and its value is another dictionary containing adjacent nodes as keys and a list of timestamps as values.
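
The nested structure can be pictured with plain dictionaries. The sketch below is illustrative only and does not use the StaticGraph class; the helper add_edge is a hypothetical stand-in, and storing each edge in both directions is an assumption that the description above neither confirms nor rules out.

```python
from collections import defaultdict


def add_edge(adj, u, v, timestamp):
    """Record an edge between u and v with its timestamp.

    Assumption for this sketch: the edge is stored in both directions.
    """
    adj[u].setdefault(v, []).append(timestamp)
    adj[v].setdefault(u, []).append(timestamp)


adj = defaultdict(dict)
add_edge(adj, 1, 2, 100)
add_edge(adj, 1, 2, 250)  # same pair again, with a later timestamp
add_edge(adj, 2, 3, 300)

# Outer key: node; inner key: adjacent node; value: list of timestamps.
print(dict(adj))
```

Repeated interactions between the same pair of nodes extend the timestamp list rather than creating a second inner key, which is what lets one static adjacency structure carry temporal information.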

num_of_edge

Number of edges in the graph.

Type:: int

num_of_node

Number of nodes in the graph.

Type:: int

adjacency_dict_of_dicts

Adjacency dictionary of dictionaries.

Type:: dict

largest_connected_component

Largest connected component of the graph.

Type:: Optional[StaticGraph]

number_of_connected_components

Number of connected components in the graph.

Type:: Optional[int]

add_edge(edge: Edge) → int

Add a new edge to the graph. If the nodes of the edge do not exist, they are added to the graph.

Parameters:: edge (Edge) – The edge to be added.
Returns:: The updated number of edges in the graph.
Return type:: int

add_node(node: Node) → int

Add a new node to the graph.

Parameters:: node (Node) – The node to be added.
Returns:: The updated number of nodes in the graph.
Return type:: int

adjacency_dict_of_dicts: dict[int, dict[int, list[int]]] = None

assortative_factor() → float

Calculate the assortativity coefficient of the graph.

Assortativity measures the similarity of connections in the graph with respect to the node degree. It indicates whether high-degree nodes tend to connect with other high-degree nodes (assortative mixing) or low-degree nodes (disassortative mixing). A positive assortativity coefficient indicates a preference for high-degree nodes to attach to other high-degree nodes, while a negative coefficient indicates the opposite.

Returns:: The assortativity coefficient of the graph.
Return type:: float
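
One common way to compute degree assortativity is as the Pearson correlation between the degrees found at the two ends of every edge. The sketch below follows that definition on a plain adjacency dictionary; it is an assumption that assortative_factor uses the same formula, and the function name degree_assortativity is hypothetical.

```python
def degree_assortativity(adj):
    """Pearson correlation of the degrees at both ends of each edge.

    adj maps node -> {neighbor: [timestamps]} and is assumed symmetric,
    so every undirected edge is visited once in each direction.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(degree[u])
            ys.append(degree[v])
    n = len(xs)
    if n == 0:
        return float("nan")
    # With a symmetric adjacency, xs and ys share the same mean and variance.
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")


# Star graph: hub 0 connected to three leaves.
star = {
    0: {1: [1], 2: [2], 3: [3]},
    1: {0: [1]},
    2: {0: [2]},
    3: {0: [3]},
}
print(degree_assortativity(star))
```

A star is the extreme disassortative case: every edge joins the single high-degree hub to a degree-one leaf, so the coefficient reaches its minimum of -1. When all nodes have the same degree the variance is zero and the coefficient is undefined, which this sketch reports as NaN.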

average_cluster_factor() → float

Calculate the average clustering coefficient for the largest connected component in the graph.

The clustering coefficient for a vertex quantifies how close its neighbors are to being a complete graph (clique).

Returns:: The average clustering coefficient for the largest connected component.
Return type:: float
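
For a vertex with k neighbors, the local clustering coefficient is the number of edges among those neighbors divided by k(k-1)/2. The sketch below averages that quantity over whatever adjacency dictionary it is given. Treating vertices with fewer than two neighbors as having coefficient 0 is an assumption of this sketch, and the function name average_clustering is hypothetical. The documented method restricts the calculation to the largest connected component, which this sketch leaves to the caller.

```python
def average_clustering(adj):
    """Mean local clustering coefficient over all vertices in adj.

    adj maps node -> {neighbor: [timestamps]} and is assumed symmetric.
    Vertices with fewer than two neighbors contribute 0 (an assumption).
    """
    if not adj:
        return 0.0
    total = 0.0
    for u, nbrs in adj.items():
        neighbors = [v for v in nbrs if v != u]  # ignore self-loops
        k = len(neighbors)
        if k < 2:
            continue
        # Count the edges that exist between pairs of neighbors of u.
        links = sum(
            1
            for i, a in enumerate(neighbors)
            for b in neighbors[i + 1:]
            if b in adj[a]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Triangle 0-1-2 with a pendant vertex 3 attached to 0.
graph = {
    0: {1: [1], 2: [2], 3: [3]},
    1: {0: [1], 2: [4]},
    2: {0: [2], 1: [4]},
    3: {0: [3]},
}
print(average_clustering(graph))
```

In this example vertices 1 and 2 have coefficient 1, vertex 0 has 1/3 (one edge among three neighbor pairs) and vertex 3 has 0, giving an average of 7/12.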

count_edges() → int

Return the number of edges in the graph.

Return type:: int

count_vertices() → int

Return the number of vertices in the graph.

Return type:: int

density() → float

Calculate and return the density of the graph.

Density is defined as the ratio of the number of edges to the maximum possible number of edges in a graph with the same number of vertices.

Return type:: float
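
The ratio can be written out directly from the vertex and edge counts. The sketch below is an illustration rather than the package's code: the function takes the counts as arguments, and the directed flag is a hypothetical parameter, because the description above does not say whether the maximum is V(V-1)/2 (undirected) or V(V-1) (directed).

```python
def density(num_vertices, num_edges, directed=False):
    """Ratio of existing edges to the maximum possible number of edges.

    The directed flag is hypothetical: an undirected simple graph has at
    most V*(V-1)/2 edges, a directed one at most V*(V-1).
    """
    if num_vertices < 2:
        return 0.0  # no edge is possible with fewer than two vertices
    max_edges = num_vertices * (num_vertices - 1)
    if not directed:
        max_edges /= 2
    return num_edges / max_edges


print(density(4, 3))                 # path on 4 vertices, undirected
print(density(4, 3, directed=True))  # same counts, directed maximum
```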

get_adjacency_dict_of_dicts() → dict

Return the adjacency dictionary of dictionaries representing the graph.

Return type:: dict

get_diameter(graph: StaticGraph) → int

Calculate the diameter of the graph.

The diameter is the greatest distance between any pair of vertices in the graph.

Parameters:: graph (StaticGraph) – The graph for which the diameter is to be calculated.
Returns:: The diameter of the graph.
Return type:: int

get_largest_connected_component() → StaticGraph

Find the largest weakly connected component, unless it has already been found.

Returns:: The largest weakly connected component.
Return type:: StaticGraph

get_number_of_connected_components() → int

Find the number of weakly connected components, unless it has already been found.

Returns:: Number of weakly connected components.
Return type:: int
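
Weakly connected components ignore edge direction, so on a symmetric adjacency dictionary they can be found with a breadth-first search from every unvisited node. The following is a minimal sketch under that symmetry assumption; the function name connected_components is hypothetical, and the caching implied by "unless it has already been found" is not reproduced here.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of adj as a list of node sets.

    adj maps node -> {neighbor: [timestamps]} and is assumed symmetric,
    which makes these components the weakly connected ones.
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


# Three components: {1, 2, 3}, {4, 5} and the isolated node 6.
graph = {
    1: {2: [10]},
    2: {1: [10], 3: [20]},
    3: {2: [20]},
    4: {5: [30]},
    5: {4: [30]},
    6: {},
}
components = connected_components(graph)
largest = max(components, key=len)
print(len(components), sorted(largest))
```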

get_radius(graph: StaticGraph) → int

Calculate the radius of the graph.

The radius is the minimum eccentricity of any vertex in the graph. Eccentricity of a vertex is the greatest distance between that vertex and any other vertex in the graph.

Parameters:: graph (StaticGraph) – The graph for which the radius is to be calculated.
Returns:: The radius of the graph.
Return type:: int
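
Eccentricities can be obtained by running a breadth-first search from every vertex; the radius is then their minimum, and the diameter described under get_diameter is their maximum. The sketch below assumes an unweighted, connected graph stored as a symmetric adjacency dictionary. The helper names bfs_distances and eccentricities are hypothetical, and the package may compute these values differently.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricities(adj):
    """Greatest distance from each vertex to any other vertex.

    Assumes adj is connected; otherwise unreachable vertices are ignored.
    """
    return {u: max(bfs_distances(adj, u).values()) for u in adj}


# Path graph 0-1-2-3.
path = {
    0: {1: [1]},
    1: {0: [1], 2: [2]},
    2: {1: [2], 3: [3]},
    3: {2: [3]},
}
ecc = eccentricities(path)
radius = min(ecc.values())
diameter = max(ecc.values())
print(ecc, radius, diameter)
```

On the path the end vertices have eccentricity 3 and the inner vertices 2, so the radius is 2 and the diameter is 3. Running a search from every vertex costs O(V·(V+E)), which is why such measures are often computed on a sampled subgraph or on the largest connected component only.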

largest_connected_component: Optional[StaticGraph] = None

num_of_edge: int = 0

num_of_node: int = 0

number_of_connected_components: Optional[int] = None

percentile_distance(graph: StaticGraph, percentile: int = 90) → int

Calculate a specific percentile of the distance distribution in the graph.

Parameters:

graph (StaticGraph) – The graph for which the distances are calculated.
percentile (int) – The percentile to calculate (between 0 and 100).

Returns:

The calculated percentile distance.

Return type:

int
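
A percentile of the distance distribution can be sketched by collecting all shortest-path lengths and picking the value at the requested rank. This illustration uses the nearest-rank definition of a percentile and counts every ordered pair of distinct, mutually reachable vertices; both choices are assumptions, as are the helper names, and the package may interpolate or sample pairs instead.

```python
import math
from collections import deque


def shortest_path_lengths(adj):
    """Hop distances for every ordered pair of distinct reachable vertices."""
    lengths = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(d for node, d in dist.items() if node != source)
    return lengths


def percentile_distance(adj, percentile=90):
    """Nearest-rank percentile of the shortest-path length distribution."""
    lengths = sorted(shortest_path_lengths(adj))
    if not lengths:
        return 0
    rank = max(1, math.ceil(percentile * len(lengths) / 100))
    return lengths[rank - 1]


# Path graph 0-1-2-3: distances 1 (6 pairs), 2 (4 pairs), 3 (2 pairs).
path = {
    0: {1: [1]},
    1: {0: [1], 2: [2]},
    2: {1: [2], 3: [3]},
    3: {2: [3]},
}
print(percentile_distance(path, 90), percentile_distance(path, 50))
```

The 90th percentile, sometimes called the effective diameter, is less sensitive to a few unusually long paths than the true diameter.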

share_of_vertices() → float

Calculate the proportion of vertices in the largest connected component.

Returns:: Proportion of vertices in the largest connected component.
Return type:: float

features_for_edges_of_static_graph(path_to_data, verbose=False)

Generate features for the edges of the static graph from a data file.

Parameters:: path_to_data – The path to the data file. Each data line is a string of the form “num_node_1 num_node_2 timestamp”. The first two lines of the file are skipped, so the data starts on the third line.
Returns:: Features for the edges of the static graph.
Return type:: pandas.DataFrame
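
The expected file layout can be illustrated with a small reader that skips the first two lines and parses the rest as “num_node_1 num_node_2 timestamp”. This sketch only shows the input format and does not compute any features. The function name read_edges and the header contents in the sample are hypothetical, and treating all three fields as integers is an assumption based on the Node.number and Edge.timestamp types.

```python
import io


def read_edges(lines):
    """Parse 'num_node_1 num_node_2 timestamp' lines, skipping the first two.

    lines is any iterable of strings, for example an open text file.
    """
    edges = []
    for index, line in enumerate(lines):
        if index < 2:
            continue  # the first two lines of the file are not data
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore blank or incomplete lines
        u, v, timestamp = parts[:3]
        edges.append((int(u), int(v), int(timestamp)))
    return edges


# In-memory stand-in for a data file; the two header lines are made up.
sample = io.StringIO(
    "% header line 1\n"
    "% header line 2\n"
    "1 2 100\n"
    "2 3 250\n"
)
edges = read_edges(sample)
print(edges)
```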

graph_features_auc_score_tables(datasets_info: DataFrame, cls_model=None, verbose=False)

Generate LaTeX tables of network features from a DataFrame of dataset information.

Parameters:

datasets_info (pd.DataFrame) – DataFrame with the columns ‘Network’, ‘Label’, ‘Category’, ‘Edge type’ and ‘Path’. ‘Path’ is the path to a data file in which each data line is a string of the form “num_node_1 num_node_2 timestamp”. The first two lines of the file are skipped, so the data starts on the third line.
cls_model – Classification model used to predict the appearance of an edge.

Returns:

A tuple of LaTeX strings for different feature tables of the networks.

Return type:

tuple