HPC Lab - Software - BiBench

datasets Package

This package contains modules for working with datasets: generating data, I/O, data transformations, etc.

io Module

Utilities for reading and writing datasets for various algorithms.

class bibench.datasets.io.ExpressionArray

Bases: numpy.ndarray

A numpy array with extra attributes ‘genes’, ‘samples’, and ‘annotation’.

Adapted from the NumPy subclassing guide: http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
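The subclassing pattern described in the linked NumPy guide can be sketched as follows. `LabeledArray` is a hypothetical stand-in, not BiBench's actual class. It attaches the three attributes in `__new__` and copies them in `__array_finalize__`, so views and slices keep them.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical stand-in for ExpressionArray: an ndarray that carries
    'genes', 'samples' and 'annotation' attributes."""

    def __new__(cls, input_array, genes=None, samples=None, annotation=None):
        # Reinterpret the input data as an instance of this subclass.
        obj = np.asarray(input_array).view(cls)
        obj.genes = genes
        obj.samples = samples
        obj.annotation = annotation
        return obj

    def __array_finalize__(self, obj):
        # Runs for views and slices; copy attributes from the source array
        # when it has them, otherwise default to None.
        if obj is None:
            return
        self.genes = getattr(obj, "genes", None)
        self.samples = getattr(obj, "samples", None)
        self.annotation = getattr(obj, "annotation", None)


data = LabeledArray(np.zeros((2, 3)), genes=["g1", "g2"], samples=["s1", "s2", "s3"])
sub = data[:1]  # a view: still a LabeledArray with the same attributes
```

In this sketch, slicing copies the labels unchanged rather than subsetting them, so `sub` has one row but still lists two gene names. The page does not document whether ExpressionArray behaves the same way.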

bibench.datasets.io.read_expression_data(filename, skip_header=1, strip_chars=None)

Read a tsv file with the same format written by write_expression_data().

Args:
  • filename: path of the tab-separated file to read.
  • skip_header:
  • strip_chars:
Returns:
An instance of ExpressionArray.
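Here is a minimal parser for that layout. It assumes a single tab-separated header row whose first cell is the corner label and whose remaining cells are column IDs. `read_labeled_tsv` is hypothetical and returns a plain `(values, genes, samples)` tuple instead of an ExpressionArray. It also ignores `skip_header` and `strip_chars`, whose exact behavior is not documented above.

```python
import io

import numpy as np


def read_labeled_tsv(handle):
    """Hypothetical reader: one header row of column IDs, then one row ID
    followed by numeric values on each line."""
    rows = [line.rstrip("\r\n").split("\t") for line in handle if line.strip()]
    header, body = rows[0], rows[1:]
    samples = header[1:]  # drop the 'Genes/Conditions' corner cell
    genes = [r[0] for r in body]
    values = np.array([[float(v) for v in r[1:]] for r in body])
    return values, genes, samples


text = "Genes/Conditions\tc1\tc2\ng1\t1.0\t2.0\ng2\t3.0\t4.0\n"
values, genes, samples = read_labeled_tsv(io.StringIO(text))
```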

bibench.datasets.io.write_bicoverlapper(bicluster_sets, filename, rownames=None, colnames=None)

Writes every bicluster in a list of lists of biclusters to a file in the format read by BicOverlapper.

Args:
  • bicluster_sets: a list of lists of biclusters to write to the file.

  • filename: output file name.

File format:

[number_of_biclusters]
bicluster set 1
#rows bic1.1 #columns bic1.1
row1 row2 ... rowN
col1 col2 ... colN
#rows bic1.2 #columns bic1.2
row1 row2 ... rowN
col1 col2 ... colN
...
bicluster set 2
#rows bic2.1 #columns bic2.1
row1 row2 ... rowN
col1 col2 ... colN
#rows bic2.2 #columns bic2.2
row1 row2 ... rowN
col1 col2 ... colN
...
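The following sketch shows a writer that reproduces the layout above. The `Bicluster` record and `format_bicoverlapper` are hypothetical. The sketch makes three assumptions: the first line holds the total number of biclusters across all sets, each set is introduced by a literal `bicluster set k` line, and fields are space-separated. It returns a string, which can then be written to `filename` with `open(filename, "w")`.

```python
from collections import namedtuple

# Hypothetical bicluster record: lists of row and column indices.
Bicluster = namedtuple("Bicluster", ["rows", "cols"])


def format_bicoverlapper(bicluster_sets, rownames=None, colnames=None):
    """Render bicluster sets following the layout shown above."""
    total = sum(len(biclusters) for biclusters in bicluster_sets)
    lines = [str(total)]
    for k, biclusters in enumerate(bicluster_sets, start=1):
        lines.append("bicluster set {}".format(k))
        for bic in biclusters:
            # Use labels when given, otherwise fall back to the raw indices.
            rows = [rownames[i] if rownames is not None else str(i) for i in bic.rows]
            cols = [colnames[j] if colnames is not None else str(j) for j in bic.cols]
            lines.append("{} {}".format(len(rows), len(cols)))
            lines.append(" ".join(rows))
            lines.append(" ".join(cols))
    return "\n".join(lines) + "\n"


sets = [[Bicluster([0, 1], [2])], [Bicluster([3], [0, 1])]]
text = format_bicoverlapper(sets)
```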

bibench.datasets.io.write_david_list(filename, gene_list)

Write a DAVID (http://david.abcc.ncifcrf.gov/) list of genes.
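A plain gene list for DAVID is commonly one identifier per line. Assuming write_david_list() produces that layout, it amounts to something like the hypothetical `write_gene_list` helper below, shown here with a round trip through a temporary file.

```python
import os
import tempfile


def write_gene_list(path, gene_list):
    """Hypothetical helper: write one gene identifier per line."""
    with open(path, "w") as handle:
        for gene in gene_list:
            handle.write("{}\n".format(gene))


# Example: write three identifiers and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.txt")
    write_gene_list(path, ["TP53", "BRCA1", "EGFR"])
    with open(path) as handle:
        written = handle.read()
```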

bibench.datasets.io.write_david_multilist(filename, gene_lists, name=None)

Writes a DAVID multilist, with each list in one column.

The first row gives the name of each list, which is just ‘name#’.

The gene names must be the same for each bicluster.
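One way to lay out such a multilist is sketched below. It assumes tab-separated columns, a header cell formed by appending the list's 0-based index to `name`, and empty cells padding the shorter lists. `format_multilist` and its `"list"` default are hypothetical, and the actual numbering and padding used by write_david_multilist() may differ.

```python
from itertools import zip_longest


def format_multilist(gene_lists, name="list"):
    """Hypothetical layout: one tab-separated column per gene list, headed
    by name0, name1, ...; shorter columns are padded with empty cells."""
    header = ["{}{}".format(name, i) for i in range(len(gene_lists))]
    lines = ["\t".join(header)]
    # zip_longest walks the lists row by row, filling gaps with "".
    for row in zip_longest(*gene_lists, fillvalue=""):
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


text = format_multilist([["A", "B"], ["C"]], name="bic")
```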

bibench.datasets.io.write_expression_data(data, filename, sep='\t', genes=None, conditions=None)

Writes a dataset in the following relatively standard format:

Genes/Conditions [col ID] [col ID] ... [col ID]
[row ID] [value] [value] ... [value]
[row ID] [value] [value] ... [value]
...
[row ID] [value] [value] ... [value]

Args:
  • data: numpy.ndarray
  • filename: Output file name.
  • sep: Separating character (default '\t'), e.g. ' ' or ','.
  • genes: Optional list of row labels.
  • conditions: Optional list of column labels.
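The layout can be produced with a few lines of NumPy. The hypothetical `format_expression_data` below returns the text instead of writing it to a file. The `gene#`/`cond#` fallback labels and the `repr` number formatting are assumptions, not the library's documented defaults.

```python
import numpy as np


def format_expression_data(data, sep="\t", genes=None, conditions=None):
    """Render a matrix in the 'Genes/Conditions' layout shown above."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    # Hypothetical default labels when none are supplied.
    if genes is None:
        genes = ["gene{}".format(i) for i in range(n_rows)]
    if conditions is None:
        conditions = ["cond{}".format(j) for j in range(n_cols)]
    lines = [sep.join(["Genes/Conditions"] + list(conditions))]
    for label, row in zip(genes, data):
        lines.append(sep.join([label] + [repr(float(v)) for v in row]))
    return "\n".join(lines) + "\n"


text = format_expression_data([[1.5, 2.0], [3.0, 4.25]], genes=["g1", "g2"], conditions=["c1", "c2"])
```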

bibench.datasets.io.write_pcl_dataset(data, filename)

Converts a numpy data matrix that contains only expression values (no row or column labels) into PCL format, using default row and column names, and writes it to filename.

Args:
  • data: numpy.ndarray
  • filename: output file name.
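PCL is the tab-delimited format used by the Stanford Microarray Database. It starts with a header row holding an identifier column, `NAME`, `GWEIGHT`, and one column per condition. This is followed by an `EWEIGHT` row and then one row per gene. The sketch below assumes that layout with unit weights. The `UID` header and the `gene#`/`cond#` names are placeholders, since the default names write_pcl_dataset() actually uses are not documented here.

```python
import numpy as np


def format_pcl(data):
    """Hypothetical PCL rendering of an unlabeled matrix with unit weights."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    conditions = ["cond{}".format(j) for j in range(n_cols)]
    lines = ["\t".join(["UID", "NAME", "GWEIGHT"] + conditions)]
    # EWEIGHT row: blank NAME/GWEIGHT cells, then one weight per condition.
    lines.append("\t".join(["EWEIGHT", "", ""] + ["1"] * n_cols))
    for i, row in enumerate(data):
        gene = "gene{}".format(i)
        lines.append("\t".join([gene, gene, "1"] + [repr(float(v)) for v in row]))
    return "\n".join(lines) + "\n"


text = format_pcl([[1.0, 2.0]])
```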

rdata Module

synthetic Module

transform Module