HPC Lab - Software - BiBench

algorithms Package

This package contains biclustering algorithms. Each operates on a numpy.ndarray dataset and returns its results as a list of Bicluster instances.

bbc Module

Bayesian BiClustering (BBC) algorithm. Uses Gibbs sampling to find biclusters fitting the Bayesian biclustering model.

bibench.algorithms.bbc.bbc(data, nclus, norm_method='none', alpha=None)[source]

Wrapper to the BBC binary.

If ‘nclus’ is a list of integers, bbc() tries to determine the number of clusters in the dataset by performing multiple clusterings and choosing the one with the best BIC (Bayesian information criterion).

Note: ‘sqrn’ normalization sometimes causes a division by zero if the data is too uniform.

Args:
  • data: numpy.array; input data.

  • nclus: number of biclusters to find. Either an integer or a list of integers.

  • norm_method: one of ‘none’, ‘csn’, ‘rsn’, ‘irqn’, ‘sqrn’.

  • alpha: alpha% quartile used for IRQN or SQRN normalization.

Returns: BiclusterList
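
The selection step described above, trying several values of ‘nclus’ and keeping the clustering with the best BIC, can be sketched as follows. This is an illustration only, not BiBench's code: select_by_bic and the fit callback are hypothetical names, the numbers are made up, and the sketch assumes the common convention BIC = k·ln(n) − 2·ln(L), where lower is better. The BBC binary may compute or rank BIC differently.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better under this convention."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates, fit, n_obs):
    """Return (best_nclus, best_bic) over the candidate cluster counts.

    `fit` is a hypothetical callback: fit(nclus) -> (log_likelihood, n_params).
    """
    best = None
    for nclus in candidates:
        log_likelihood, n_params = fit(nclus)
        score = bic(log_likelihood, n_params, n_obs)
        if best is None or score < best[1]:
            best = (nclus, score)
    return best


# Made-up fit results standing in for real clusterings.
toy_fits = {2: (-520.0, 10), 3: (-480.0, 15), 4: (-478.0, 20)}
best_nclus, best_score = select_by_bic([2, 3, 4], toy_fits.__getitem__, n_obs=200)
print(best_nclus)
```

With these toy numbers, going from 3 to 4 clusters improves the likelihood only slightly, so the parameter penalty makes 3 the preferred choice.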

biclust Module

coalesce Module

COALESCE algorithm wrapper for finding biclusters with up- and down-regulated TF.

bibench.algorithms.coalesce.coalesce(data, geneModuleProbability=0.95, conditionPvalueThreshold=0.05, conditionZThreshold=0.5, normalize=False)[source]

Wrapper for the COALESCE binary.

Args:
  • data: numpy.ndarray

  • geneModuleProbability: the probability threshold for including genes in a regulatory module.

  • conditionPvalueThreshold: the P-value threshold for including conditions in a regulatory module.

  • conditionZThreshold: the Z-score threshold for including conditions in a regulatory module.

  • normalize: whether to normalize the data.

Returns:
A list of biclusters.

cpb Module

Correlated Pattern Bicluster (CPB) algorithm. Finds biclusters with genes that have large pairwise Pearson correlation.
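
To make "large pairwise Pearson correlation" concrete, the following sketch computes the smallest pairwise Pearson correlation coefficient (PCC) among a set of rows and compares it with a threshold of 0.9. The helper names pearson and min_pairwise_pcc are hypothetical and are not part of BiBench; the CPB binary's actual scoring may differ from a plain minimum over all pairs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def min_pairwise_pcc(rows):
    """Smallest PCC over all pairs of rows."""
    return min(pearson(a, b) for a, b in combinations(rows, 2))


rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # scaled copy of the first row
    [5.0, 6.0, 7.0, 8.5],   # shifted and slightly perturbed
]
print(min_pairwise_pcc(rows) >= 0.9)
```

Scaling or shifting a row leaves its PCC with the other rows unchanged, which is why correlation-based biclusters can capture patterns that differ in magnitude.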

bibench.algorithms.cpb.cpb(data, nclus, targetpcc=0.9, fixed_row=-1, fixed_col=-1, fixw=0, min_seed_rows=3, max_seed_rows=None)[source]

Wrapper for the CPB binary. Finds biclusters with high row-wise correlation.

Args:
  • data: numpy.ndarray
  • nclus: Number of biclusters to find.
  • targetpcc: Minimum PCC for rows.
  • fixed_row: A row that must be in each bicluster; -1 means none.
  • fixed_col: A column that must be in each bicluster; -1 means none.
  • fixw: Weight for computing error of fixed rows.
  • min_seed_rows: Minimum number of rows in each seed bicluster.
  • max_seed_rows: Maximum number of rows in each seed bicluster.
Returns:
A list of biclusters.

bibench.algorithms.cpb.cpb_filter(biclusters, data, nclus, *args, **kwargs)[source]

Filter out small biclusters found by chance. ‘nclus’ should be large enough to generate a representative sample set.

Args:
  • biclusters: a list of biclusters found by CPB.

  • data: the dataset they were all run on.

  • nclus: the number of clusters to generate for filtering.

  • args: any positional parameters, in the order that cpb() takes them.

  • kwargs: any of the named parameters that cpb() takes.

    For accurate results, use the same parameters that were used to generate the biclusters to be filtered.

Returns:
A sublist of ‘biclusters’, containing only those biclusters that are not likely due to random chance.

fabia Module

isa Module

opsm Module

Order-Preserving SubMatrix (OPSM) wrapper.

bibench.algorithms.opsm.opsm(data, lValue=10)[source]

OPSM biclustering algorithm. Finds biclusters that have non-decreasing rows.

Args:
  • data: a two-dimensional numpy.array representing the data matrix.
  • lValue: the number of passed models for each iteration. Default is 10.
Returns:
A list of biclusters.
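
The property named above, rows that are non-decreasing, can be checked with a few lines of code. The sketch below assumes that "non-decreasing" is meant with respect to some ordering of the bicluster's columns; is_order_preserving is a hypothetical helper, not a BiBench function, and it only verifies a given candidate rather than searching for biclusters.

```python
def is_order_preserving(data, rows, column_order):
    """True if every selected row is non-decreasing along column_order."""
    for r in rows:
        values = [data[r][c] for c in column_order]
        if any(a > b for a, b in zip(values, values[1:])):
            return False
    return True


data = [
    [0.9, 0.1, 0.5, 0.3],
    [7.0, 2.0, 6.0, 4.0],
    [0.2, 0.8, 0.1, 0.9],
]
# Rows 0 and 1 both increase along the column order 1, 3, 2, 0; row 2 does not.
print(is_order_preserving(data, rows=[0, 1], column_order=[1, 3, 2, 0]))
print(is_order_preserving(data, rows=[0, 1, 2], column_order=[1, 3, 2, 0]))
```

Rows 0 and 1 have very different magnitudes but share the same column ranking, so they form an order-preserving pair.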

qubic Module

QUalitative BIClustering algorithm. Efficient algorithm for finding biclusters with scaling patterns.

bibench.algorithms.qubic.qubic(data, nblocks=100, quantile=0.06, ranks=1, discrete=False, filtering=True, min_col_width=2, consistency_level=0.95)[source]

QUBIC biclustering algorithm.

Args:
  • data: numpy.ndarray.

  • nblocks: Number of biclusters to report.

  • quantile: Quantile to use for discretization.

  • ranks: Number of ranks in discrete data.

  • discrete: True if the data is already discrete.

  • filtering: Whether to filter overlapping biclusters.

  • min_col_width: Minimum number of columns in a bicluster.

  • consistency_level: the minimum ratio between the number of identical valid symbols in a column and the total number of rows in the output.

Returns:
A list of biclusters.
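
The ratio described for consistency_level can be illustrated with a small calculation. The sketch below is an interpretation of that description, not QUBIC's code: column_consistency is a hypothetical helper, and it assumes that the column has already been restricted to the bicluster's rows and that the symbol 0 marks an entry with no valid symbol.

```python
from collections import Counter


def column_consistency(column, invalid=0):
    """Share of rows holding the column's most common valid symbol."""
    valid = [s for s in column if s != invalid]
    if not valid:
        return 0.0
    most_common_count = Counter(valid).most_common(1)[0][1]
    return most_common_count / len(column)


# 19 of 20 rows carry the symbol 1, so the ratio is 19/20.
col = [1] * 19 + [-1]
print(column_consistency(col) >= 0.95)
```

Under this reading, a column is acceptable at the default consistency_level of 0.95 only if at least 95% of the bicluster's rows share one valid symbol in it.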

wrapper Module

Wrapper architecture for tasks common to many biclustering wrappers. Contains utility functions for writing datasets, reading back results, etc.

This module is only useful for wrapping binaries that expect data input as a file, and/or write their results out to a file.

To write an interface for such a binary, implement write_dataset(), read_results(), and do_call(), then call wrapper_helper(), passing it those functions.

To see how this works, try looking at qubic.py, cpb.py, or any other module that uses this function.

exception bibench.algorithms.wrapper.WrapperException[source]

Bases: exceptions.Exception

bibench.algorithms.wrapper.wrapper_helper(binary, write_dataset, read_results, do_call, data, *args, **kwargs)[source]

Perform biclustering. Convenience function for wrappers.

Takes care of creating a temporary directory, exporting the dataset, running, and reading back the results.

Many biclustering implementations require specific data formats and produce results in specialized formats. To wrap such an algorithm, implement write_dataset(), read_results(), and do_call(), and pass them to this function.

Args:
  • binary: the name of the binary to call.

  • do_call: This function formats the command and runs it. It must have the following arguments:

    (data, datafile, results_dir, **kwargs)

    It must call binary, passing it the data in datafile and storing results in the results_dir directory. Any other arguments specific to each algorithm are passed in kwargs.

  • write_dataset: The function that writes the dataset out to filename in the necessary format. It must have the following arguments:

    (data, filename)
    
  • read_results: The function that reads the results back and returns them as a list of Biclusters. It must have the following arguments:

    (results_dir, data)
    
Returns:
  • A BiclusterList of Biclusters.
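
The flow described above (create a temporary directory, export the dataset, run, read the results back) can be sketched as follows. This is not BiBench's implementation: wrapper_helper_sketch is a hypothetical stand-in that ignores the binary argument, the toy do_call writes a fake result file instead of running an external program, and plain (rows, cols) tuples stand in for Bicluster objects. Only the three callback signatures are taken from the documentation above.

```python
import os
import shutil
import tempfile


def wrapper_helper_sketch(binary, write_dataset, read_results, do_call,
                          data, **kwargs):
    """Stand-in for the documented flow; `binary` is unused in this sketch."""
    workdir = tempfile.mkdtemp()
    try:
        datafile = os.path.join(workdir, "data.txt")
        results_dir = os.path.join(workdir, "results")
        os.mkdir(results_dir)
        write_dataset(data, datafile)                    # export the dataset
        do_call(data, datafile, results_dir, **kwargs)   # run the algorithm
        return read_results(results_dir, data)           # read results back
    finally:
        shutil.rmtree(workdir)                           # always clean up


def write_dataset(data, filename):
    with open(filename, "w") as f:
        for row in data:
            f.write("\t".join(str(v) for v in row) + "\n")


def do_call(data, datafile, results_dir, **kwargs):
    # A real do_call would run the external binary here. This toy version
    # reports one "bicluster": the rows whose first value exceeds `threshold`.
    threshold = kwargs.get("threshold", 0.0)
    with open(datafile) as f:
        rows = [i for i, line in enumerate(f)
                if float(line.split("\t")[0]) > threshold]
    with open(os.path.join(results_dir, "out.txt"), "w") as f:
        f.write(" ".join(str(i) for i in rows) + "\n")


def read_results(results_dir, data):
    with open(os.path.join(results_dir, "out.txt")) as f:
        rows = [int(tok) for tok in f.readline().split()]
    cols = list(range(len(data[0])))
    return [(rows, cols)]  # tuples stand in for Bicluster objects


data = [[1.0, 2.0], [-1.0, 0.5], [3.0, 4.0]]
result = wrapper_helper_sketch("toy-binary", write_dataset, read_results,
                               do_call, data, threshold=0.0)
print(result)
```

Keeping the file format logic in write_dataset() and read_results() means that only do_call() needs to know the command line of the wrapped program.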