This package contains biclustering algorithms. Each operates on a numpy.ndarray dataset and returns its results as a list of Bicluster instances.
Bayesian BiClustering (BBC) algorithm. Uses Gibbs sampling to find biclusters fitting the Bayesian biclustering model.
Wrapper to the BBC binary.
If ‘nclus’ is a list of integers, tries to determine the number of clusters in the dataset by performing multiple clusterings and choosing the one with the best BIC.
sqrn sometimes causes a divide by zero if the data is too uniform.
data: numpy.ndarray; input data.
a list.
norm_method: one of ‘none’, ‘csn’, ‘rsn’, ‘irqn’, ‘sqrn’.
alpha: alpha% quantile used for IRQN or SQRN normalization.
Returns: BiclusterList
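The model-selection behavior described for a list-valued 'nclus' can be sketched as a simple loop. This is a minimal illustration, not the wrapper's actual code: select_by_bic, toy_fit, and the lower_is_better flag are hypothetical names, and whether the best BIC is the smallest or the largest value depends on the sign convention the underlying binary uses.

```python
from typing import Any, Callable, Iterable, Tuple


def select_by_bic(
    candidates: Iterable[int],
    fit: Callable[[int], Tuple[Any, float]],
    lower_is_better: bool = True,
) -> Tuple[int, Any, float]:
    """Run `fit` once per candidate cluster count and keep the best-scoring run.

    `fit(k)` is a hypothetical callable returning (result, bic) for k clusters.
    """
    best = None
    for k in candidates:
        result, bic = fit(k)
        if best is None:
            best = (k, result, bic)
            continue
        improved = bic < best[2] if lower_is_better else bic > best[2]
        if improved:
            best = (k, result, bic)
    if best is None:
        raise ValueError("candidates must not be empty")
    return best


def toy_fit(k):
    """Toy stand-in for one clustering run; its score is smallest at k = 3."""
    return f"clustering with {k} clusters", float((k - 3) ** 2)


best_k, best_result, best_bic = select_by_bic([2, 3, 4, 5], toy_fit)
```

Each candidate triggers a full clustering run, so the cost grows linearly with the length of the list.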
COALESCE algorithm wrapper for finding biclusters with up- and down-regulated TF.
Wrapper for the COALESCE binary.
data: numpy.ndarray
genes in a regulatory module.
conditions in a regulatory module.
conditions in a regulatory module.
normalize: whether to normalize the data.
Correlated Pattern Bicluster (CPB) algorithm. Finds biclusters with genes that have large pairwise Pearson correlation.
Wrapper for the CPB binary. Finds biclusters with high row-wise correlation.
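To make "high row-wise correlation" concrete, the snippet below computes the mean pairwise Pearson correlation between the rows of a candidate bicluster. It is only an illustrative score and is not taken from the CPB binary; the function name and its (data, rows, cols) arguments are assumptions.

```python
import numpy as np


def mean_pairwise_row_correlation(data, rows, cols):
    """Mean Pearson correlation over all pairs of selected rows (hypothetical helper)."""
    sub = np.asarray(data, dtype=float)[np.ix_(rows, cols)]
    if sub.shape[0] < 2:
        raise ValueError("need at least two rows")
    corr = np.corrcoef(sub)  # each row of `sub` is treated as one variable
    upper = np.triu_indices(sub.shape[0], k=1)  # every unordered row pair once
    return float(corr[upper].mean())


data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
])

# Rows 0 and 1 are exact multiples of each other, so they correlate perfectly.
score = mean_pairwise_row_correlation(data, rows=[0, 1], cols=[0, 1, 2, 3])
```

A row that is constant over the selected columns has zero variance, so its correlations come out as NaN; a real scoring function would need to handle that case.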
Filter out small biclusters found by chance. ‘nclus’ should be large enough to generate a representative sample set.
biclusters: a list of biclusters found by CPB.
data: the dataset they were all run on.
nclus: the number of clusters to generate for filtering.
args: any parameters, in order, that cpb() takes.
For accurate results, use the same parameters used to generate the biclusters to be filtered.
Order-Preserving SubMatrix (OPSM) wrapper.
OPSM biclustering algorithm. Finds biclusters that have non-decreasing rows.
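The defining property can be checked directly: a submatrix is order-preserving if some single ordering of its columns makes every row non-decreasing. The sketch below is not part of the wrapper; is_order_preserving is a hypothetical helper, and it assumes the first row has no tied values, so that its sorted order is the only candidate column ordering.

```python
import numpy as np


def is_order_preserving(sub):
    """True if ordering columns by the first row makes every row non-decreasing.

    Assumes the first row has no ties; with ties, other valid orderings
    could exist that this check does not try.
    """
    sub = np.asarray(sub, dtype=float)
    order = np.argsort(sub[0], kind="stable")
    reordered = sub[:, order]
    return bool(np.all(np.diff(reordered, axis=1) >= 0))


# All three rows rank their columns the same way (column 1 < column 2 < column 0).
good = [[3.0, 1.0, 2.0],
        [30.0, 10.0, 20.0],
        [0.3, 0.1, 0.2]]

# The second row ranks its columns differently from the first.
bad = [[3.0, 1.0, 2.0],
       [1.0, 2.0, 3.0]]

good_ok = is_order_preserving(good)
bad_ok = is_order_preserving(bad)
```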
QUalitative BIClustering algorithm. Efficient algorithm for finding biclusters with scaling patterns.
QUBIC biclustering algorithm
data: numpy.ndarray.
nblocks: Number of biclusters to report.
quantile: Quantile to use for discretization.
ranks: Number of ranks in discrete data.
discrete: True if the data is already discrete.
filtering: Whether to filter overlapping biclusters.
min_col_width: Minimum number of columns in a bicluster.
identical valid symbols in a column and the total number of rows in the output
Wrapper architecture for tasks common to many biclustering wrappers. Contains utility functions for writing datasets, reading back results, etc.
This module is only useful for wrapping binaries that expect data input as a file, and/or write their results out to a file.
To write an interface for such a binary, implement write_dataset(), read_results(), and do_call(), then call wrapper_helper(), passing it those functions.
To see how this works, try looking at qubic.py, cpb.py, or any other module that uses this function.
Perform biclustering. Convenience function for wrappers.
Takes care of creating a temporary directory, exporting the dataset, running, and reading back the results.
Many biclustering implementations require specific input formats and produce results in specialized formats. Use this function for such algorithms by implementing write_dataset(), read_results(), and do_call().
binary: the name of the binary to call.
do_call: the function that formats the command and runs it. It must accept the following arguments:
(data, datafile, results_dir, **kwargs)
It must call binary, passing it the data in datafile and storing results in the results_dir directory. Any other arguments specific to each algorithm are passed in kwargs.
write_dataset: the function that writes the dataset out to filename in the necessary format. It must accept the following arguments:
(data, filename)
read_results: the function that reads the results back and returns them as a list of Biclusters. It must accept the following arguments:
(results_dir, data)
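The three callbacks fit together roughly as in the sketch below. It is a minimal illustration under stated assumptions, not the module's implementation: run_wrapper_sketch is a hypothetical stand-in for wrapper_helper() whose real signature may differ, the file names are invented, the "binary" is a small Python script run through the current interpreter, and plain (rows, cols) tuples stand in for Bicluster instances. Only the callback argument lists follow the description above.

```python
import os
import shutil
import subprocess
import sys
import tempfile

import numpy as np


def run_wrapper_sketch(data, write_dataset, read_results, do_call, **kwargs):
    """Hypothetical stand-in for wrapper_helper(): temp dir, export, run, read back."""
    workdir = tempfile.mkdtemp(prefix="bicluster_")
    try:
        datafile = os.path.join(workdir, "data.txt")
        results_dir = os.path.join(workdir, "results")
        os.mkdir(results_dir)
        write_dataset(data, datafile)
        do_call(data, datafile, results_dir, **kwargs)
        return read_results(results_dir, data)
    finally:
        # Remove the temporary files whether or not the run succeeded.
        shutil.rmtree(workdir, ignore_errors=True)


# Toy "binary": reports one bicluster covering every row and column of the input.
TOY_SCRIPT = """
import sys
with open(sys.argv[1]) as f:
    rows = [line.split() for line in f if line.strip()]
with open(sys.argv[2], "w") as out:
    out.write(" ".join(str(i) for i in range(len(rows))) + "\\n")
    out.write(" ".join(str(j) for j in range(len(rows[0]))) + "\\n")
"""


def write_dataset(data, filename):
    np.savetxt(filename, data)  # whitespace-delimited text, one matrix row per line


def do_call(data, datafile, results_dir, **kwargs):
    outfile = os.path.join(results_dir, "biclusters.txt")
    # A real wrapper would invoke the algorithm's binary here instead.
    subprocess.run([sys.executable, "-c", TOY_SCRIPT, datafile, outfile], check=True)


def read_results(results_dir, data):
    with open(os.path.join(results_dir, "biclusters.txt")) as f:
        rows = [int(tok) for tok in f.readline().split()]
        cols = [int(tok) for tok in f.readline().split()]
    return [(rows, cols)]  # tuples stand in for Bicluster instances


data = np.arange(6.0).reshape(2, 3)
found = run_wrapper_sketch(data, write_dataset, read_results, do_call)
```

Keeping the format-specific logic in the three callbacks means the temporary-directory handling and cleanup are written once and shared by every wrapper.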