Data Structures¶
To achieve the best compatibility with external libraries, most of the data structures are standard Python dictionaries with at least one key: data. The data key holds the actual data in an array-like object (such as a NumPy array). Other keys provide metadata required by some methods.
Raw recording¶
Raw electrophysiological data sampled at equally spaced time points. It can contain multiple channels, but all of them must share the same sampling frequency and duration (for example, multiple contacts of a tetrode).
The following keys are defined:
data: array, required
array-like object (for example numpy.ndarray) of dimensions (N_channels, N_samples)
FS: int, required
sampling frequency in Hz
n_contacts: int, required
number of channels (tetrode contacts). It is equal to the size of the first dimension of data.
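The requirements listed above can be checked with a few lines of NumPy. The following is a minimal sketch, not part of SpikeSort: the helper name check_raw_recording is hypothetical, and the random noise only stands in for a real recording.

```python
import numpy as np

def check_raw_recording(raw):
    """Hypothetical helper (not part of SpikeSort): verify the keys
    described above and return (n_channels, n_samples)."""
    for key in ('data', 'FS', 'n_contacts'):
        if key not in raw:
            raise KeyError('missing required key: %s' % key)
    data = np.asarray(raw['data'])
    if data.ndim != 2:
        raise ValueError('data must have shape (N_channels, N_samples)')
    if data.shape[0] != raw['n_contacts']:
        raise ValueError('n_contacts must equal the first dimension of data')
    return data.shape

# Synthetic stand-in for a tetrode recording: 4 contacts, 1 s at 25 kHz
FS = 25000
rng = np.random.RandomState(0)
raw = {'data': rng.randn(4, FS), 'FS': FS, 'n_contacts': 4}

n_channels, n_samples = check_raw_recording(raw)
duration_ms = 1000.0 * n_samples / raw['FS']  # recording length in milliseconds
print(n_channels, n_samples, duration_ms)
```

Because all channels live in one two-dimensional array, the shared duration follows directly from the second dimension and FS.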
Note
You may read/write the data with your own functions, but to make the interface with SpikeSort a bit cleaner, you might also want to define custom IO filters (see Read/Write Filters (spike_sort.io.filters)).
Example
We will read the raw tetrode data from Tutorial data using the standard PyTablesFilter:
>>> from spike_sort.io.filters import PyTablesFilter
>>> io_filter = PyTablesFilter('../data/tutorial.h5')
>>> raw_data = io_filter.read_sp('/SubjectA/session01/el1')
>>> print(raw_data.keys()) # print all keys
['n_contacts', 'FS', 'data']
>>> shape = raw_data['data'].shape # check size
>>> print("{0} channels, {1} samples".format(*shape))
4 channels, 23512500 samples
>>> print(raw_data['FS']) # check sampling frequency
25000
Spike times¶
A sequence of (sorted) time readings at which spikes were generated (or other discrete events happened).
The data are stored in a dictionary with the following keys:
data: array, required
one-dimensional array-like object with event times (in milliseconds)
is_valid: array, optional
boolean array of the same size as data; if an element is False, the event with the same index is masked (or invalid)
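As an illustration of how the optional is_valid mask can be applied, here is a minimal sketch using plain NumPy boolean indexing. The spike times are made up for the example, and the fallback to "all valid" when the key is absent is an assumption of this sketch.

```python
import numpy as np

# Spike times in milliseconds; the third event is flagged as invalid
spt = {
    'data': np.array([1.2, 5.0, 7.5, 12.1]),
    'is_valid': np.array([True, True, False, True]),
}

# is_valid is optional, so fall back to "all valid" when it is absent
mask = spt.get('is_valid', np.ones(len(spt['data']), dtype=bool))
valid_times = spt['data'][mask]
print(valid_times)
```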
Note
You may read/write the data with your own functions, but to make the interface with SpikeSort a bit cleaner, you might also want to define custom IO filters (see Read/Write Filters (spike_sort.io.filters)).
Example
spike_sort.core.extract.detect_spikes() is one of the functions that take a raw recording and return a spike times dictionary:
>>> import numpy as np
>>> raw_dict = {
... 'data': np.array([[0,1,0,0,0,1]]),
... 'FS' : 10000,
... 'n_contacts': 1
... }
>>> from spike_sort.core.extract import detect_spikes
>>> spt_dict = detect_spikes(raw_dict, thresh=0.8)
>>> print(spt_dict.keys())
['thresh', 'contact', 'data']
>>> print('Spike times (ms): {0}'.format(spt_dict['data']))
Spike times (ms): [ 0. 0.4]
Note that in addition to the required data key, detect_spikes() appends some extra attributes: thresh (detection threshold) and contact (contact on which spikes were detected). These attributes are ignored by other methods.
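To make the relation between sample indices and spike times concrete, the following sketch detects rising threshold crossings with plain NumPy and converts them to milliseconds. It reproduces the timestamps of the toy signal above, but it is only an illustration: the actual detect_spikes() may use a different edge convention or additional processing.

```python
import numpy as np

data = np.array([0, 1, 0, 0, 0, 1])
FS = 10000       # sampling frequency in Hz
thresh = 0.8

# Indices i where the signal rises through the threshold between i and i+1
above = data > thresh
crossings = np.nonzero(~above[:-1] & above[1:])[0]

spt = {'data': crossings * 1000.0 / FS}   # convert samples to milliseconds
print(spt['data'])
```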
Spike waveforms¶
The spike waveform structure contains the waveforms of extracted spikes. It may be any mapping data structure (usually a dictionary) with the following keys:
data: array, required
three-dimensional array-like object of size (N_points, N_spikes, N_contacts), where N_points is the number of samples per waveform, N_spikes the number of spikes, and N_contacts the number of contacts
time: array, required
timeline of the spike waveshapes (in milliseconds). It must be of the same size as the first dimension of data (N_points).
FS: int, optional
sampling frequency
n_contacts: int, optional
number of independent channels with spike waveshapes (see also Raw recording)
is_valid: array, optional
boolean array of the size of the second dimension of data (N_spikes); if an element is False, the spike with the same index is masked (or invalid)
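The dimension conventions above can be sketched as follows. This is an illustrative construction with placeholder (all-zero) waveforms, not output of SpikeSort; it only shows which axis each key refers to and how masked spikes can be dropped.

```python
import numpy as np

FS = 10000                      # sampling frequency in Hz
n_points, n_spikes, n_contacts = 4, 3, 1

# Time axis in milliseconds: one entry per sample of a waveform
time = np.arange(n_points) * 1000.0 / FS

waves = {
    'data': np.zeros((n_points, n_spikes, n_contacts)),  # placeholder values
    'time': time,
    'FS': FS,
    'n_contacts': n_contacts,
    'is_valid': np.array([True, True, False]),
}

# time matches the first dimension, is_valid the second
assert waves['time'].shape[0] == waves['data'].shape[0]
assert waves['is_valid'].shape[0] == waves['data'].shape[1]

# Drop masked spikes by indexing the second (spike) axis
valid_waves = waves['data'][:, waves['is_valid'], :]
print(valid_waves.shape)
```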
Example
Spike waveforms can be extracted from raw recordings (see Raw recording) given a sequence of spike times (see Spike times) by means of the spike_sort.core.extract.extract_spikes() function:
>>> from spike_sort.core.extract import extract_spikes
>>> raw_dict = {
... 'data': np.array([[0,1,1,0,0,0,1,-1,0,0, 0]]),
... 'FS' : 10000,
... 'n_contacts': 1
... } # raw signal
>>> spt_dict = {
... 'data': np.array([0.15, 0.65, 1])
... } # timestamps of three spikes
>>> sp_win = [0, 0.4] # window in which spikes should be extracted
>>> waves_dict = extract_spikes(raw_dict, spt_dict, sp_win)
Now let us investigate the returned spike waveforms structure:
keys:
>>> print(waves_dict.keys())
['is_valid', 'FS', 'data', 'time']
data array shape:
>>> print(waves_dict['data'].shape)
(4, 3, 1)
extracted spikes:
>>> print(waves_dict['data'][:,:,0].T) # data contains three spikes
[[ 1. 1. 0. 0.]
[ 1. -1. 0. 0.]
[ 0. 0. 0. 0.]]
>>> print(waves_dict['time']) # defined over 4 time points
[ 0. 0.1 0.2 0.3]
and potentially invalid (truncated) spikes:
>>> print(waves_dict['is_valid']) # last spike is invalid (truncated)
[ True True False]
Note that the is_valid element of the truncated spike is False.
Spike features¶
This data structure contains features calculated from spike waveforms using one of the methods defined in the spike_sort.core.features module (one of the fet*() functions, see Features).
The spike features dictionary consists of the following keys:
data: array, required
two-dimensional array of size (N_spikes, N_features) that contains the actual feature values
names: list of str, required
list of length N_features containing feature labels
is_valid: array, optional
boolean array of length N_spikes; if an element is False, the spike with the same index is masked (or invalid, see also Spike waveforms)
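Because data and names share the N_features dimension, features computed separately for the same spikes can be merged column-wise. The sketch below shows one way to do this with plain NumPy. The helper combine_features, the 'Ch0:Width' label, and the width values are hypothetical and serve only as an illustration.

```python
import numpy as np

def combine_features(*feature_dicts):
    """Hypothetical helper (not SpikeSort API): stack feature columns
    from several dictionaries that describe the same spikes."""
    data = np.hstack([np.asarray(f['data']) for f in feature_dicts])
    names = [name for f in feature_dicts for name in f['names']]
    return {'data': data, 'names': names}

p2p = {'data': np.array([[1.], [2.], [0.]]), 'names': ['Ch0:P2P']}
# Second feature with made-up values and label, purely for illustration
width = {'data': np.array([[0.2], [0.3], [0.1]]), 'names': ['Ch0:Width']}

combined = combine_features(p2p, width)
print(combined['data'].shape, combined['names'])
```

After merging, the number of columns in data still equals the length of names, as the key descriptions require.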
Example
Let us try to calculate peak-to-peak amplitude from some spikes extracted in Spike waveforms:
>>> from spike_sort.core.features import fetP2P
>>> print(waves_dict['data'].shape) # 3 spikes, 4 data points each
(4, 3, 1)
>>> feature_dict = fetP2P(waves_dict)
>>> print(feature_dict.keys())
['is_valid', 'data', 'names']
>>> print(feature_dict['data'].shape)
(3, 1)
We thus have one feature for 3 spikes. Let us check whether the peak-to-peak amplitudes are correctly calculated:
>>> print(feature_dict['data'])
[[ 1.]
[ 2.]
[ 0.]]
as expected (compare with example above). There is only one peak-to-peak (P2P) feature on a single channel (Ch0) and its name is:
>>> print(feature_dict['names'])
['Ch0:P2P']
The mask array is inherited from waves_dict:
>>> print(feature_dict['is_valid'])
[ True True False]
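For comparison, the peak-to-peak values above can be reproduced with plain NumPy. This is a sketch of the computation under the (N_points, N_spikes, N_contacts) layout, not necessarily how fetP2P is implemented; the waveform values are copied from the Spike waveforms example.

```python
import numpy as np

# Waveforms from the example above, shape (N_points, N_spikes, N_contacts)
waves = np.array([[1., 1., 0., 0.],
                  [1., -1., 0., 0.],
                  [0., 0., 0., 0.]]).T[:, :, np.newaxis]

# Peak-to-peak: maximum minus minimum along the time (first) axis
p2p = waves.max(axis=0) - waves.min(axis=0)   # shape (N_spikes, N_contacts)
names = ['Ch%d:P2P' % ch for ch in range(waves.shape[2])]
print(p2p.ravel(), names)
```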
Spike labels¶
Spike labels are the identifiers of the cell (unit) to which each spike was assigned. Spike labels are not dictionaries, but arrays of integers: one cluster index per spike.
Example
Let us try to cluster the spikes described by the Sample feature using K-means with K=2:
>>> from spike_sort.core.cluster import cluster
>>> feature_dict = {
... 'data' : np.array([[1],[-1], [1]]),
... 'names' : ['Sample']
... }
>>> labels = cluster('k_means', feature_dict, 2)
>>> print(labels)
[1 0 1]
As expected, labels is an array describing two clusters: 0 and 1.
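Since the labels array has one entry per spike, it can be used directly as an index into any per-spike structure. The following sketch groups spike times by cluster with boolean masks. The spike times are made up for the example, and the by_cell dictionary is just one possible way to hold the result.

```python
import numpy as np

labels = np.array([1, 0, 1])                 # one cluster index per spike
spt = {'data': np.array([0.5, 1.7, 3.2])}    # made-up spike times (ms)

# Group spike times by cluster index with boolean masks
by_cell = {}
for cell in np.unique(labels):
    by_cell[int(cell)] = spt['data'][labels == cell]
print(by_cell)
```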