HPC Lab - Software - BiBench


validation Package

This package contains bicluster evaluation metrics.

External metrics require the ground truth to be known, i.e., a set of expected biclusters.

Internal metrics depend only on properties of the data and the biclusters themselves.

enrichment Module

external Module

Metrics for evaluating biclusters and sets of biclusters.

exception bibench.validation.external.EmptyList

Bases: bibench.validation.external.ExternalError

exception bibench.validation.external.ExternalError

Bases: exceptions.Exception

class bibench.validation.external.ListScore

Bases: tuple

ListScore(relevance, recovery)

recovery

Alias for field number 1

relevance

Alias for field number 0
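ListScore is a tuple subclass whose two fields can be read either by name or by position, which is what the "Alias for field number" entries describe. A minimal stand-in built with collections.namedtuple, assuming (from its tuple base and field list) that the real class is defined the same way:

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented fields:
# relevance is field 0, recovery is field 1.
ListScore = namedtuple("ListScore", ["relevance", "recovery"])

score = ListScore(relevance=0.75, recovery=0.5)
assert score.relevance == score[0]  # name and index 0 refer to the same value
assert score.recovery == score[1]
relevance, recovery = score         # unpacks like any ordinary tuple
print(relevance, recovery)          # 0.75 0.5
```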

bibench.validation.external.calculate_bicluster_group_matrix(datamatrix, bicluster_set)

Returns two matrices, Crow and Ccol, that record how often each pair of rows (and each pair of columns) is clustered together in the biclustering result: C_ij = 1 / (1 + k), where k is the number of biclusters in which row_i and row_j (col_i and col_j) appear together. Args:

  • datamatrix: 2-dimensional np array representing the dataset.
  • bicluster_set: List of bicluster instances; the biclusters found in datamatrix.
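The co-occurrence formula can be sketched with NumPy as follows. This assumes each bicluster is given as a (rows, cols) pair of index sets rather than the library's bicluster instances, and the name group_matrices is hypothetical:

```python
import numpy as np

def group_matrices(datamatrix, bicluster_set):
    """Return (Crow, Ccol) with C_ij = 1 / (1 + k), where k counts the
    biclusters containing both index i and index j.

    Assumption: each bicluster is a (rows, cols) pair of index sets.
    """
    n_rows, n_cols = np.asarray(datamatrix).shape
    row_k = np.zeros((n_rows, n_rows))
    col_k = np.zeros((n_cols, n_cols))
    for rows, cols in bicluster_set:
        r = np.array(sorted(rows), dtype=int)
        c = np.array(sorted(cols), dtype=int)
        # np.ix_ addresses every (i, j) pair of members at once, so each
        # pair that co-occurs in this bicluster gets k += 1
        row_k[np.ix_(r, r)] += 1
        col_k[np.ix_(c, c)] += 1
    return 1.0 / (1.0 + row_k), 1.0 / (1.0 + col_k)

data = np.zeros((3, 2))
Crow, Ccol = group_matrices(data, [({0, 1}, {0}), ({0, 1, 2}, {0, 1})])
print(Crow[0, 1], Crow[0, 2])  # 1/3 (two shared biclusters), 0.5 (one)
```

Pairs that never share a bicluster keep k = 0 and therefore C_ij = 1, so smaller values mean "more often clustered together".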
bibench.validation.external.calculate_distance_matrix(datamatrix, distance_function)

Returns a matrix, Prows, that contains the proximity between each pair of rows. Args:

  • datamatrix: 2-dimensional np array representing the dataset.
  • distance_function: Function that calculates the distance between two vectors; for example, the ecludian_distance function.
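A pairwise loop over rows is enough to sketch this. The name distance_matrix is hypothetical, the distance is assumed to be symmetric, and running the same routine on the transpose is one plausible way to obtain the column matrix that get_distance_matrices also returns:

```python
import numpy as np

def distance_matrix(datamatrix, distance_function):
    """Return P with P[i, j] = distance_function(row i, row j).

    Assumption: the distance is symmetric and zero on the diagonal.
    """
    data = np.asarray(datamatrix, dtype=float)
    n = data.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # compute each unordered pair once and mirror it across the diagonal
            P[i, j] = P[j, i] = distance_function(data[i], data[j])
    return P

data = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
euclid = lambda a, b: float(np.linalg.norm(a - b))
Prows = distance_matrix(data, euclid)
Pcols = distance_matrix(data.T, euclid)  # same routine on the transpose: column proximities
```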
bibench.validation.external.col_pair_count(i, j, bicluster_set)

Returns the number of biclusters in which col_i and col_j appear together. Args:

  • i: First column index, starting from 0.
  • j: Second column index, starting from 0.
  • bicluster_set: List of bicluster instances; the biclusters found in the data matrix.
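Counting co-occurrences reduces to a membership test per bicluster. In this sketch each bicluster is represented only by its set of column (or row) indices, an assumption that lets one hypothetical helper, pair_count, stand in for both col_pair_count and row_pair_count:

```python
def pair_count(i, j, index_sets):
    """Number of index sets (one per bicluster) that contain both i and j.

    Assumption: index_sets holds either the column indices or the row
    indices of each bicluster, not the library's bicluster instances.
    """
    return sum(1 for members in index_sets if i in members and j in members)

bicluster_cols = [{0, 1}, {1, 2}, {0, 1, 2}]
print(pair_count(0, 1, bicluster_cols))  # 2
print(pair_count(0, 2, bicluster_cols))  # 1
```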
bibench.validation.external.ecludian_distance(vector1, vector2)

Returns the Euclidean distance between the two vectors. Raises an exception if the vectors have different lengths. Args:

  • vector1: 1-dimensional list (vector)
  • vector2: 1-dimensional list (vector)
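A plain-Python sketch of the same computation, including the length check. The documentation only says the function raises; using ValueError here is an assumption:

```python
import math

def euclidean_distance(vector1, vector2):
    """Euclidean distance between two equal-length vectors."""
    if len(vector1) != len(vector2):
        # exception type is an assumption; the docs only say it raises
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector1, vector2)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```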
bibench.validation.external.f_measure(expected, found, beta=1, modified=True)

Calculates the F-measure between an expected bicluster embedded in the dataset and a bicluster found by an algorithm. It takes the weighted harmonic mean of the relevance (spec) and recovery (sens) scores, with the relevance score scaled by the square of beta:

\[\frac{(1 + \beta^2) \cdot spec \cdot sens}{\beta^2 \cdot spec + sens}\]
Args:
  • expected: Bicluster.
  • found: Bicluster.
  • beta: scale factor in the formula of f_measure.
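A sketch of the formula, assuming biclusters are sets of (row, col) cells, that spec is the one-way relevance |e ∩ f| / |f| and that sens is the recovery |e ∩ f| / |e|. The modified flag is not modelled because its effect is not documented:

```python
def f_measure(expected, found, beta=1.0):
    """F-measure of two biclusters given as sets of (row, col) cells.

    Assumptions: spec (relevance) is |e & f| / |f| and
    sens (recovery) is |e & f| / |e|.
    """
    overlap = len(expected & found)
    if overlap == 0:
        return 0.0  # disjoint biclusters; also avoids dividing by zero
    spec = overlap / len(found)
    sens = overlap / len(expected)
    b2 = beta ** 2
    return (1 + b2) * spec * sens / (b2 * spec + sens)

e = {(r, c) for r in range(2) for c in range(2)}  # 2 x 2 expected bicluster
f = {(r, c) for r in range(3) for c in range(2)}  # 3 x 2 found bicluster
print(f_measure(e, f))  # spec = 2/3, sens = 1, so roughly 0.8
```

With beta greater than 1 the denominator weights spec more heavily, so the score leans toward recovery; with beta below 1 it leans toward relevance.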
bibench.validation.external.f_measure_list(expected, found, beta=1, modified=True)

Calculates both the relevance and the recovery scores of an algorithm's result, using the F-measure with the given beta.

\[Recovery = \sum_{e \in expected} \max_{f \in found} fmeasure(e, f)\]
\[Relevance = \sum_{f \in found} \max_{e \in expected} fmeasure(e, f)\]
Args:
  • expected: list of target biclusters
  • found: list of biclusters for comparison
  • beta: scale factor in the formula of f_measure.
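The two sums can be sketched directly, pairing every expected bicluster with its best-scoring found bicluster and vice versa. Biclusters are assumed to be cell sets, the pairwise F-measure uses the same one-way relevance assumption as for f_measure, and raising ValueError on empty input is a stand-in for whatever the library does (its EmptyList exception may cover that case):

```python
def _f(e, f, beta):
    """Pairwise F-measure on cell sets (one-way relevance assumed)."""
    overlap = len(e & f)
    if overlap == 0:
        return 0.0
    spec, sens = overlap / len(f), overlap / len(e)
    b2 = beta ** 2
    return (1 + b2) * spec * sens / (b2 * spec + sens)

def f_measure_list(expected, found, beta=1.0):
    """Return (relevance, recovery) as best-match sums of pairwise F-measures."""
    if not expected or not found:
        # stand-in for the library's own empty-list handling
        raise ValueError("expected and found must both be non-empty")
    recovery = sum(max(_f(e, f, beta) for f in found) for e in expected)
    relevance = sum(max(_f(e, f, beta) for e in expected) for f in found)
    return relevance, recovery

a = {(r, c) for r in range(2) for c in range(2)}
b = {(r, c) for r in range(5, 7) for c in range(5, 7)}  # disjoint from a
print(f_measure_list([a], [a, b]))  # (1.0, 1.0)
```

Because the formulas sum rather than average, the unmatched found bicluster b contributes 0 to the relevance sum instead of lowering it.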
bibench.validation.external.get_distance_matrices(datamatrix, distance_function)

Returns two matrices, Prows and Pcols, that contain the proximity between each pair of rows and each pair of columns, respectively. Args:

  • datamatrix: 2-dimensional np array representing the dataset.
  • distance_function: Function that calculates the distance between two vectors; for example, the ecludian_distance function.
bibench.validation.external.get_upperHalf(C)

Returns a one-dimensional array containing the values in the upper half of the matrix C. Args:

  • C: 2-dimensional square np array, typically the result of the calculate_bicluster_group_matrix function.
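NumPy's triu_indices gives the positions of the upper triangle directly. Whether get_upperHalf includes the diagonal is not stated; this sketch assumes it is excluded (k=1), since a row paired with itself says nothing about pairs:

```python
import numpy as np

def upper_half(C):
    """Entries strictly above the diagonal of square matrix C, row by row.

    Assumption: the diagonal is excluded.
    """
    C = np.asarray(C)
    i, j = np.triu_indices(C.shape[0], k=1)
    return C[i, j]

print(upper_half(np.arange(9).reshape(3, 3)))  # [1 2 5]
```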

bibench.validation.external.hubert_statistics(P, C)

Returns a score calculated with internal indices, using the Hubert statistic as described in R. Santamaria et al. 2007. Args:

  • P: 2-dimensional square np array, a result of the get_distance_matrices function.
  • C: 2-dimensional square np array, a result of the calculate_bicluster_group_matrix function.
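The documentation does not say whether hubert_statistics normalizes the statistic. The sketch below assumes the common normalized form of Hubert's Gamma, the Pearson correlation between the upper-triangle entries of P and C; the library may instead use an unnormalized variant:

```python
import numpy as np

def normalized_hubert(P, C):
    """Pearson correlation between the strictly upper-triangular entries of
    two square matrices (normalized Hubert Gamma; an assumed variant)."""
    P = np.asarray(P, dtype=float)
    C = np.asarray(C, dtype=float)
    i, j = np.triu_indices(P.shape[0], k=1)
    return float(np.corrcoef(P[i, j], C[i, j])[0, 1])

P = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
print(normalized_hubert(P, 2 * P + 1))  # approximately 1.0
```

Because C_ij = 1 / (1 + k) is largest for pairs that rarely share a bicluster, a high positive value means distant rows (or columns) tend not to be clustered together, which is what a good biclustering should show.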

bibench.validation.external.internal_index_score(datamatrix, bicluster_set)

Returns a weighted combination of the two hubert_statistics scores calculated for rows and for columns (R. Santamaria et al. 2007, section 3.3). Args:

  • datamatrix: 2-dimensional np array representing the dataset.
  • bicluster_set: List of bicluster instances; the biclusters found in datamatrix.

bibench.validation.external.jaccard(expected, found)

Jaccard coefficient of bicluster area.

\[\frac{|e \cap f|}{|e \cup f|}\]
Args:
  • expected: Bicluster
  • found: Bicluster
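Treating a bicluster's area as the set of its (row, column) cells turns the coefficient into a set computation. A sketch under that assumption (the cell-set representation is not the library's Bicluster class, and returning 0.0 for two empty sets is an arbitrary choice):

```python
def jaccard(expected, found):
    """|e & f| / |e | f| for biclusters given as sets of (row, col) cells."""
    union = len(expected | found)
    return len(expected & found) / union if union else 0.0  # 0.0 for two empty sets

e = {(r, c) for r in range(2) for c in range(2)}  # 4 cells
f = {(r, c) for r in range(3) for c in range(2)}  # 6 cells, contains e
print(jaccard(e, f))  # 4 / 6
```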
bibench.validation.external.jaccard_list(expected, found)

Recovery and relevance scores of a set of biclusters using Jaccard coefficient.

\[Recovery = \sum_{e \in expected} \max_{f \in found} \frac{|e \cap f|}{|e \cup f|}\]
\[Relevance = \sum_{f \in found} \max_{e \in expected} \frac{|e \cap f|}{|e \cup f|}\]
Args:
  • expected: list of target biclusters
  • found: list of biclusters for comparison
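The Recovery and Relevance formulas share one best-match pattern that works with any pairwise score. A sketch with a hypothetical best_match_scores helper, demonstrated with the Jaccard coefficient on cell-set biclusters (an assumed representation):

```python
def best_match_scores(expected, found, score):
    """Return (relevance, recovery) as best-match sums of a pairwise score."""
    recovery = sum(max(score(e, f) for f in found) for e in expected)
    relevance = sum(max(score(e, f) for e in expected) for f in found)
    return relevance, recovery

def jaccard(e, f):
    union = len(e | f)
    return len(e & f) / union if union else 0.0

a = {(r, c) for r in range(2) for c in range(2)}        # 2 x 2
a2 = {(r, c) for r in range(3) for c in range(2)}       # 3 x 2, contains a
b = {(r, c) for r in range(4, 6) for c in range(4, 6)}  # disjoint from both
print(best_match_scores([a, b], [a, a2], jaccard))      # (5/3, 1.0)
```

Here b is never recovered, so recovery stays at 1, while both found biclusters match a, so relevance collects 1 + 4/6.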
bibench.validation.external.modified_relevance(expected, found)

The proportion of ‘found’ that is also in ‘expected’.

Note that the one-way relevance is not true relevance: it is the intersection divided by the size of ‘found’.

One-way relevance:

\[\frac{|e \cap f|}{|f|}\]
Args:
  • expected: Bicluster
  • found: Bicluster
bibench.validation.external.prelic_list(expected, found)

Calculates both the relevance and the recovery scores of an algorithm's result according to Prelic's row Jaccard score.

Args:
  • expected: list of target biclusters
  • found: list of biclusters for comparison
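Prelić et al. define their match score as the average, over one set of biclusters, of the best row-Jaccard match in the other set; recovery compares expected against found and relevance compares found against expected. Whether prelic_list averages like this or sums like the other list functions is not stated, so this sketch follows the published average and represents each bicluster by its row set only (an assumption):

```python
def row_jaccard(expected_rows, found_rows):
    """|rows(e) & rows(f)| / |rows(e) | rows(f)|."""
    union = expected_rows | found_rows
    return len(expected_rows & found_rows) / len(union) if union else 0.0

def match_score(m1, m2):
    """Average over m1 of the best row-Jaccard match in m2."""
    return sum(max(row_jaccard(a, b) for b in m2) for a in m1) / len(m1)

def prelic_scores(expected_rows, found_rows):
    """Return (relevance, recovery); each bicluster is given by its row set."""
    relevance = match_score(found_rows, expected_rows)
    recovery = match_score(expected_rows, found_rows)
    return relevance, recovery

expected = [{0, 1, 2}, {5, 6}]
found = [{0, 1, 2}, {7, 8}, {9}]
print(prelic_scores(expected, found))  # relevance 1/3, recovery 0.5
```

Unlike the summed formulas, averaging lets the two spurious found biclusters pull relevance down.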
bibench.validation.external.recovery(expected, found)

The proportion of true positives found, i.e., the fraction of ‘expected’ that also appears in ‘found’.

\[\frac{|e \cap f|}{|e|}\]
Args:
  • expected: Bicluster.
  • found: Bicluster.
bibench.validation.external.recovery_relevance_list(expected, found, modified=True)

N.B.: Taking the harmonic mean of the recovery and relevance scores returned by this function does not give the same F-measure as the f_measure_list function.

\[Recovery = \sum_{e \in expected} \max_{f \in found} \frac{|e \cap f|}{|e|}\]
\[Relevance = \sum_{f \in found} \max_{e \in expected} \frac{|e \cap f|}{|f|}\]
Args:
  • expected: list of target biclusters
  • found: list of biclusters for comparison
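With the formulas given for this function, recovery pairs each expected bicluster with its best |e ∩ f| / |e| and relevance pairs each found bicluster with its best one-way |e ∩ f| / |f|. A sketch under that reading, assuming non-empty cell-set biclusters and leaving the undocumented modified=False branch out:

```python
def recovery(e, f):
    """|e & f| / |e|: share of the expected cells that were found."""
    return len(e & f) / len(e)

def modified_relevance(e, f):
    """|e & f| / |f|: share of the found cells that were expected."""
    return len(e & f) / len(f)

def recovery_relevance_list(expected, found):
    """Return (relevance, recovery) as best-match sums of the one-way scores."""
    rec = sum(max(recovery(e, f) for f in found) for e in expected)
    rel = sum(max(modified_relevance(e, f) for e in expected) for f in found)
    return rel, rec

e = {(r, c) for r in range(2) for c in range(2)}  # 4 cells
f = {(r, c) for r in range(3) for c in range(2)}  # 6 cells, contains e
print(recovery_relevance_list([e], [f]))  # relevance 4/6, recovery 1.0
```

Each sum maximizes its own score independently, whereas f_measure_list maximizes the combined F-measure per pair, which is why the two approaches can disagree once several biclusters are involved.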
bibench.validation.external.relevance(expected, found, nelts=None)

The proportion of true negatives to all negatives.

\[\frac{n - |e \cup f|}{n - |e|}\]
Args:
  • expected: Bicluster
  • found: Bicluster
  • nelts: the number of elements n in the data matrix. This parameter can be omitted if expected and found have their .data attribute set.
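The formula counts cells outside both biclusters (true negatives) against all cells outside the expected one. A sketch assuming cell-set biclusters and an explicit nelts; the documented fallback to the .data attribute is not modelled:

```python
def relevance(expected, found, nelts):
    """(n - |e | f|) / (n - |e|): true negatives over all negatives."""
    negatives = nelts - len(expected)
    if negatives <= 0:
        raise ValueError("nelts must be larger than the expected bicluster")
    return (nelts - len(expected | found)) / negatives

e = {(r, c) for r in range(2) for c in range(2)}  # 4 of the 16 cells
f = {(r, c) for r in range(3) for c in range(2)}  # 6 cells, contains e
print(relevance(e, f, nelts=4 * 4))  # (16 - 6) / (16 - 4)
```

The two extra cells in f are false positives, so the score drops from 1 to 10/12.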

bibench.validation.external.row_jaccard(expected, found)

The Jaccard coefficient of rows only, as used by Prelic: intersection_of_rows / union_of_rows.

Args:
  • expected: Bicluster
  • found: Bicluster
bibench.validation.external.row_pair_count(i, j, bicluster_set)

Returns the number of biclusters in which row_i and row_j appear together. Args:

  • i: First row index, starting from 0.
  • j: Second row index, starting from 0.
  • bicluster_set: List of bicluster instances; the biclusters found in the data matrix.