Data Structures

To achieve best compatibility with external libraries most of the data structures are standard Python dictionaries, with at least one key – data. The data key contains the actual data in an array-like object (such as NumPy array). Other attributes provide metadata that are required by some methods.

Raw recording

Raw electrophysiological data sampled at equally spaced time points. It can contain multiple channels, but all of them need to be of the same sampling frequency and duration (for example, multiple contacts of a tetrode).

The following keys are defined:

data:

array, required

array-like object (for example numpy.ndarray) of dimensions (N_channels, N_samples)

FS:

int, required

sampling frequency in Hz

n_contacts:

int, required

number of channels (tetrode contacts). It is equal to the size of the first dimension of data.

Note

You may read/write the data with your own functions, but to make the interface with the SpikeSort a bit cleaner, you might also want to define your custom IO filters (see Read/Write Filters (spike_sort.io.filters))

Example

We will read the raw tetrode data from Tutorial data using standard PyTablesFilter:

>>> from spike_sort.io.filters import PyTablesFilter
>>> io_filter = PyTablesFilter('../data/tutorial.h5')
>>> raw_data = io_filter.read_sp('/SubjectA/session01/el1')
>>> print(raw_data.keys())          # print all keys
['n_contacts', 'FS', 'data']
>>> shape = raw_data['data'].shape # check size
>>> print "{0} channels, {1} samples".format(*shape)
4 channels, 23512500 samples
>>> print(raw_data['FS'])    # check sampling frequency
25000

Spike times

A sequence of (sorted) time readings at which spikes were generated (or other discrete events happened).

The data is store in a dictionary with following keys:

data:

array, required

one-dimensional array-like object with event times (in milliseconds)

is_valid:

array, optional

boolean area of the same size as data – if an element is False the event of the same index is masked (or invalid)

Note

You may read/write the data with your own functions, but to make the interface with the SpikeSort a bit cleaner, you might also want to define your custom IO filters (see Read/Write Filters (spike_sort.io.filters))

Example

spike_sort.core.extract.detect_spikes() is one of functions which takes the raw recordings and returns spike times dictionary:

>>> import numpy as np
>>> raw_dict = {
...       'data': np.array([[0,1,0,0,0,1]]),
...       'FS'  : 10000,
...       'n_contacts': 1
...            }
>>> from spike_sort.core.extract import detect_spikes
>>> spt_dict = detect_spikes(raw_dict, thresh=0.8)
>>> print(spt_dict.keys())
['thresh', 'contact', 'data']
>>> print('Spike times (ms): {0}'.format(spt_dict['data']))
Spike times (ms): [ 0.   0.4]

Note that in addition to the required data key, detect_spikes() appends some extrcontact a attributes: thresh (detection threshold) and contact (contact on which spikes were detected). These attributes are ignored by other methods.

Spike waveforms

Spike waveform structure contains waveforms of extracted spikes. It may be any mapping data structure (usually a dictionary) with following keys:

data:

array, required

three-dimensional array-like object of size (N_points, N_spikes, N_contacts), where:

  • N_points – the number of data points in a single waveform,
  • N_spikes – the total number of spikes and
  • N_contacts – the number of independent channels (for example 4 in a tetrode)
time:

array, required

Timeline of the spike waveshapes (in miliseconds). It must be of the same size as the first dimension of data (n_pts).

FS:

int, optional

Sampling frequency.

n_contacts:

int, optional

Number of independent channels with spike waveshapes (see also raw_recording).

is_valid:

array, optional

boolean area of the size of second dimension of data (N_spikes) – if an element is False the spike with the same index is masked (or invalid)

Example

Spike waveforms can be extracted from raw recordings (see raw_recording) given a sequence of spike times (see Spike times) by means of spike_sort.core.extract.extract_spikes() function:

>>> from spike_sort.core.extract import extract_spikes
>>> raw_dict = {
...             'data': np.array([[0,1,1,0,0,0,1,-1,0,0, 0]]),
...             'FS'  : 10000,
...             'n_contacts': 1
...            }      # raw signal
>>> spt_dict = {
...             'data': np.array([0.15, 0.65, 1])}
...            }      # timestamps of three spikes
>>> sp_win = [0, 0.4] # window in which spikes should be extracted
>>> waves_dict = extract_spikes(raw_dict, spt_dict, sp_win)

Now let us investigate the returned spike waveforms structure:

  • keys:

    >>> print waves_dict.keys()
    ['is_valid', 'FS', 'data', 'time']
    
  • data array shape:

    >>> print(waves_dict['data'].shape)
    (4, 3, 1)
    
  • extracted spikes:

    >>> print(waves_dict['data'][:,:,0].T) # data contains three spikes
    [[ 1.  1.  0.  0.]
     [ 1. -1.  0.  0.]
     [ 0.  0.  0.  0.]]
    >>> print(waves_dict['time'])          # defined over 4 time points
    [ 0.   0.1  0.2  0.3]
    
  • and potential invalid (truncated spikes):

    >>> print(waves_dict['is_valid'])     # last spike is invalid (truncated)
    [ True True False]
    

Note that the is_valid element of truncated spike is False.

Spike features

This data structure contains features calculated from spike waveforms using one of the methods defined in spike_sort.core.features module (one of the fet*() functions, see Features).

The spike features dictionary consits of following keys:

data:

array, required

two-dimensional array of size (N_spikes, N_features) that contains the actual feature values

names:

list of str, required

list of length N_features containing feature labels

is_valid:

array, optional

boolean area of of length N_spikes; if an element is False the spike with the same index is masked (or invalid, see also Spike waveforms)

Example

Let us try to calculate peak-to-peak amplitude from some spikes extracted in Spike waveforms:

>>> from spike_sort.core.features import fetP2P
>>> print(waves_dict['data'].shape) # 3 spikes, 4 data points each
(4, 3, 1)
>>> feature_dict = fetP2P(waves_dict)
>>> print(feature_dict.keys())
['is_valid', 'data', 'names']
>>> print(feature_dict['data'].shape)
(3, 1)

Then we have one feature for 3 spikes. Let check whether the peak-to-peak amplitudes are correctly calculated:

>>> print(feature_dict['data'])
[[ 1.]
 [ 2.]
 [ 0.]]

as expected (compare with example above). There is only one peak-to-peak (P2P) feature on a single channel (Ch0) and its name is:

>>> print(feature_dict['names'])
['Ch0:P2P']

The mask array is inherited from waves_dict:

>>> print(feature_dict['is_valid'])
[ True True False]

Spike labels

Spike labels are the identifiers of a cell (unit) each spike was classified to. Spike labels are not dictionaries, but arrays of integers – one cluster index per spike.

Example

Let us try to cluster the spikes described by Sample feature using K-means with K=2:

>>> from spike_sort.core.cluster import cluster
>>> feature_dict = {
...                  'data' : np.array([[1],[-1], [1]]),
...                  'names' : ['Sample']
...                }
>>> labels = cluster('k_means', feature_dict, 2)
>>> print(labels)
[1 0 1]

As expected labels is an array describing two clusters: 0 and 1.