datasets Package¶

This package contains modules for working with datasets: generating data, I/O, data transformations, etc.

`io` Module¶

Utilities for reading and writing datasets for various algorithms.

class bibench.datasets.io.ExpressionArray[source]¶

Bases: numpy.ndarray

A numpy array with extra attributes ‘genes’, ‘samples’, and ‘annotation’.

Adapted from http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
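The subclassing pattern referred to above can be sketched as follows. This is a minimal illustration of the approach in the NumPy subclassing guide, not bibench's actual code: the class name `LabeledArray` and its constructor arguments are hypothetical, and only the attribute names `genes`, `samples`, and `annotation` come from the description.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical stand-in for ExpressionArray (not bibench's code)."""

    def __new__(cls, input_array, genes=None, samples=None, annotation=None):
        # View the input data as this subclass, then attach the extra attributes.
        obj = np.asarray(input_array).view(cls)
        obj.genes = genes
        obj.samples = samples
        obj.annotation = annotation
        return obj

    def __array_finalize__(self, obj):
        # Also runs for views and slices, so copy the attributes when present.
        if obj is None:
            return
        self.genes = getattr(obj, "genes", None)
        self.samples = getattr(obj, "samples", None)
        self.annotation = getattr(obj, "annotation", None)


arr = LabeledArray([[1.0, 2.0], [3.0, 4.0]],
                   genes=["g1", "g2"], samples=["s1", "s2"])
view = arr[:1]  # the slice keeps the attributes; the labels are not sliced
```

Note that in this sketch the labels are carried along unchanged when the array is sliced, so they can fall out of step with the data; whether ExpressionArray behaves the same way is not stated here.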

bibench.datasets.io.read_expression_data(filename, skip_header=1, strip_chars=None)[source]¶

Reads a TSV file in the format written by write_expression_data().

Args:

filename:
skip_header:
strip_chars:

Returns:

An instance of ExpressionArray.
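To show what reading this format involves, here is a standard-library sketch. It is not the bibench implementation: the function `parse_expression_text` is hypothetical, it returns plain lists rather than an ExpressionArray, and it assumes that the last skipped header line holds the column labels.

```python
def parse_expression_text(text, skip_header=1, sep="\t"):
    """Hypothetical parser; returns (genes, conditions, rows of floats)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    conditions = []
    if skip_header:
        # Assumption: column labels sit on the last skipped header line,
        # after the corner cell.
        conditions = lines[skip_header - 1].split(sep)[1:]
    genes, values = [], []
    for line in lines[skip_header:]:
        fields = line.split(sep)
        genes.append(fields[0])  # first field is the row ID
        values.append([float(v) for v in fields[1:]])
    return genes, conditions, values


sample = "Genes/Conditions\tc1\tc2\ng1\t1.0\t2.0\ng2\t3.0\t4.0\n"
genes, conditions, values = parse_expression_text(sample)
```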

bibench.datasets.io.write_bicoverlapper(bicluster_sets, filename, rownames=None, colnames=None)[source]¶

Writes every bicluster in a list of lists of biclusters to a file in the format read by BicOverlapper.

Args:

bicluster_sets: a list of lists of biclusters that will be written to the file.
filename: output file name.

File format:

[number_of_biclusters]
bicluster set 1
#rows bic1.1 #columns bic1.1
row1 row2 ... rowN
col1 col2 ... colN
#rows bic1.2 #columns bic1.2
row1 row2 ... rowN
col1 col2 ... colN
...
bicluster set 2
#rows bic2.1 #columns bic2.1
row1 row2 ... rowN
col1 col2 ... colN
#rows bic2.2 #columns bic2.2
row1 row2 ... rowN
col1 col2 ... colN
...
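The layout above can be produced by a short formatter like the one below. This is a sketch under several assumptions, not bibench's code: the function `format_bicoverlapper` is hypothetical, each bicluster is represented as a `(rows, cols)` pair of label lists, the first line is taken to be the total number of biclusters across all sets, and fields are separated by single spaces.

```python
def format_bicoverlapper(bicluster_sets):
    """Hypothetical formatter; each bicluster is a (rows, cols) pair of labels."""
    # Assumption: the header line is the total count over all sets.
    total = sum(len(bics) for bics in bicluster_sets)
    lines = [str(total)]
    for set_index, bics in enumerate(bicluster_sets, start=1):
        lines.append("bicluster set %d" % set_index)
        for rows, cols in bics:
            lines.append("%d %d" % (len(rows), len(cols)))  # size line
            lines.append(" ".join(str(r) for r in rows))    # row labels
            lines.append(" ".join(str(c) for c in cols))    # column labels
    return "\n".join(lines) + "\n"


sets = [[(["g1", "g2"], ["c1"])], [(["g3"], ["c1", "c2"])]]
text = format_bicoverlapper(sets)
```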

bibench.datasets.io.write_david_list(filename, gene_list)[source]¶

Writes a DAVID (http://david.abcc.ncifcrf.gov/) list of genes.

bibench.datasets.io.write_david_multilist(filename, gene_lists, name=None)[source]¶

Writes a DAVID multilist, with each list in one column.

The first row gives the name of the list, which is just ‘name#’.

The gene names must be the same for each bicluster.
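A one-list-per-column layout can be sketched as follows. The function `format_multilist` is hypothetical and not part of bibench; the tab separator, the numbering from 1, the default name, and the padding of shorter lists with empty cells are all assumptions made for illustration.

```python
from itertools import zip_longest


def format_multilist(gene_lists, name="list"):
    """Hypothetical formatter: one gene list per tab-separated column."""
    # Header row: 'name#', here numbered from 1 (an assumption).
    header = ["%s%d" % (name, i) for i in range(1, len(gene_lists) + 1)]
    lines = ["\t".join(header)]
    # Transpose the lists into rows, padding shorter lists with empty cells.
    for row in zip_longest(*gene_lists, fillvalue=""):
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


text = format_multilist([["a", "b"], ["c"]], name="bic")
```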

bibench.datasets.io.write_expression_data(data, filename, sep='\t', genes=None, conditions=None)[source]¶

Writes a dataset in the following relatively standard format:

Genes/Conditions [col ID] [col ID] ... [col ID]
[row ID] [value] [value] ... [value]
[row ID] [value] [value] ... [value]
...
[row ID] [value] [value] ... [value]

Args:

data: numpy.ndarray
filename: Output file name.
sep: Separating character, e.g. ‘ ’ or ‘,’.
genes: Optional list of row labels.
conditions: Optional list of column labels.
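The format and arguments described above can be illustrated with a small standard-library formatter. This is a sketch, not the bibench implementation: the function `format_expression_data` is hypothetical, it returns a string instead of writing a file, and the default labels it generates when `genes` or `conditions` are omitted are assumptions.

```python
def format_expression_data(data, sep="\t", genes=None, conditions=None):
    """Hypothetical formatter for the 'Genes/Conditions' table layout."""
    n_rows = len(data)
    n_cols = len(data[0]) if n_rows else 0
    # Assumption: fall back to generated labels when none are given.
    if genes is None:
        genes = ["gene_%d" % i for i in range(n_rows)]
    if conditions is None:
        conditions = ["cond_%d" % j for j in range(n_cols)]
    # Header row: corner cell followed by the column IDs.
    lines = [sep.join(["Genes/Conditions"] + list(conditions))]
    # One row per gene: row ID followed by its values.
    for gene, row in zip(genes, data):
        lines.append(sep.join([str(gene)] + [repr(float(v)) for v in row]))
    return "\n".join(lines) + "\n"


text = format_expression_data([[1.0, 2.5], [3.0, 4.0]], sep=",",
                              genes=["g1", "g2"], conditions=["c1", "c2"])
```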

bibench.datasets.io.write_pcl_dataset(data, filename)[source]¶

Converts a plain numpy data matrix containing only expression values into PCL format, using default row and column names, and writes it to a file.

Args:

data: numpy.ndarray
filename: output file name.
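For orientation, the sketch below builds a PCL-style table from a bare matrix. It follows the PCL layout as it is commonly described (ID, NAME, and GWEIGHT columns plus an EWEIGHT row); the function `format_pcl`, the generated `GENE#` and `COND#` labels, and the weights of 1 are assumptions, and bibench's actual defaults may differ.

```python
def format_pcl(data):
    """Hypothetical formatter producing a PCL-style table with default names."""
    n_cols = len(data[0]) if len(data) else 0
    conditions = ["COND%d" % j for j in range(n_cols)]  # assumed default names
    # Header row: three fixed columns, then one column per condition.
    lines = ["\t".join(["ID", "NAME", "GWEIGHT"] + conditions)]
    # EWEIGHT row: one weight per condition, all set to 1 here.
    lines.append("\t".join(["EWEIGHT", "", ""] + ["1"] * n_cols))
    for i, row in enumerate(data):
        gene = "GENE%d" % i  # assumed default row name, reused as the NAME field
        lines.append("\t".join([gene, gene, "1"] + [str(float(v)) for v in row]))
    return "\n".join(lines) + "\n"


text = format_pcl([[0.5, -1.0]])
```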
