.. testsetup::

   import numpy
   numpy.random.seed(1221)

Data Structures
===============

To achieve the best compatibility with external libraries, most of the data
structures are standard Python *dictionaries* with at least one key -- `data`.
The `data` key contains the actual data in an array-like object (such as a
NumPy array). Other keys provide metadata that are required by some methods.

.. _raw_recording:

Raw recording
-------------

Raw electrophysiological data sampled at equally spaced time points. It can
contain multiple channels, but all of them must have the same sampling
frequency and duration (for example, multiple contacts of a tetrode).

The following keys are defined:

:data: *array*, required

    array-like object (for example :py:class:`numpy.ndarray`) of dimensions
    (N_channels, N_samples)

:FS: *int*, required

    sampling frequency in Hz

:n_contacts: *int*, required

    number of channels (tetrode contacts). It is equal to the size of the
    first dimension of `data`.

.. note::

   You may read/write the data with your own functions, but to make the
   interface with SpikeSort a bit cleaner, you might also want to define
   your custom IO filters (see :ref:`io_filters`)

.. rubric:: Example

We will read the raw tetrode data from :ref:`tutorial_data` using the standard
:py:class:`~spike_sort.io.filters.PyTablesFilter`:

>>> from spike_sort.io.filters import PyTablesFilter
>>> io_filter = PyTablesFilter('../data/tutorial.h5')
>>> raw_data = io_filter.read_sp('/SubjectA/session01/el1')
>>> print(raw_data.keys()) # print all keys
['n_contacts', 'FS', 'data']
>>> shape = raw_data['data'].shape # check size
>>> print("{0} channels, {1} samples".format(*shape))
4 channels, 23512500 samples
>>> print(raw_data['FS']) # check sampling frequency
25000

.. _spike_times:

Spike times
-----------

A sequence of (sorted) time readings at which spikes were generated (or other
discrete events happened).
The data is stored in a dictionary with the following keys:

:data: *array*, required

    one-dimensional array-like object with event times (in milliseconds)

:is_valid: *array*, optional

    boolean array of the same size as `data` -- if an element is False, the
    event with the same index is masked (or invalid)

.. note::

   You may read/write the data with your own functions, but to make the
   interface with SpikeSort a bit cleaner, you might also want to define
   your custom IO filters (see :ref:`io_filters`)

.. rubric:: Example

:py:func:`spike_sort.core.extract.detect_spikes` is one of the functions that
take a raw recording and return a spike times dictionary:

>>> import numpy as np
>>> raw_dict = {
...     'data': np.array([[0,1,0,0,0,1]]),
...     'FS' : 10000,
...     'n_contacts': 1
... }
>>> from spike_sort.core.extract import detect_spikes
>>> spt_dict = detect_spikes(raw_dict, thresh=0.8)
>>> print(spt_dict.keys())
['thresh', 'contact', 'data']
>>> print('Spike times (ms): {0}'.format(spt_dict['data']))
Spike times (ms): [ 0. 0.4]

Note that in addition to the required `data` key,
:py:func:`~spike_sort.core.extract.detect_spikes` appends some extra
attributes: :py:attr:`thresh` (detection threshold) and :py:attr:`contact`
(contact on which spikes were detected). These attributes are ignored by other
methods.

.. _spike_wave:

Spike waveforms
---------------

The spike waveform structure contains waveforms of extracted spikes. It may be
any mapping data structure (usually a dictionary) with the following keys:

:data: *array*, required

    three-dimensional array-like object of size
    (N_points, N_spikes, N_contacts), where:

    * `N_points` -- the number of data points in a single waveform,
    * `N_spikes` -- the total number of spikes and
    * `N_contacts` -- the number of independent channels (for example 4 in
      a tetrode)

:time: *array*, required

    Timeline of the spike waveshapes (in milliseconds). It must be of the
    same size as the first dimension of `data` (`N_points`).

:FS: *int*, optional

    Sampling frequency.

:n_contacts: *int*, optional

    Number of independent channels with spike waveshapes (see also
    :ref:`raw_recording`).

:is_valid: *array*, optional

    boolean array of the size of the second dimension of `data` (N_spikes) --
    if an element is False, the spike with the same index is masked (or
    invalid)

.. rubric:: Example

Spike waveforms can be extracted from raw recordings (see
:ref:`raw_recording`) given a sequence of spike times (see
:ref:`spike_times`) by means of the
:py:func:`spike_sort.core.extract.extract_spikes` function:

>>> from spike_sort.core.extract import extract_spikes
>>> raw_dict = {
...     'data': np.array([[0,1,1,0,0,0,1,-1,0,0, 0]]),
...     'FS' : 10000,
...     'n_contacts': 1
... } # raw signal
>>> spt_dict = {
...     'data': np.array([0.15, 0.65, 1])
... } # timestamps of three spikes
>>> sp_win = [0, 0.4] # window in which spikes should be extracted
>>> waves_dict = extract_spikes(raw_dict, spt_dict, sp_win)

Now let us investigate the returned spike waveforms structure:

* keys:

  >>> print(waves_dict.keys())
  ['is_valid', 'FS', 'data', 'time']

* data array shape:

  >>> print(waves_dict['data'].shape)
  (4, 3, 1)

* extracted spikes:

  >>> print(waves_dict['data'][:,:,0].T) # data contains three spikes
  [[ 1. 1. 0. 0.]
   [ 1. -1. 0. 0.]
   [ 0. 0. 0. 0.]]
  >>> print(waves_dict['time']) # defined over 4 time points
  [ 0. 0.1 0.2 0.3]

* and potentially invalid (truncated) spikes:

  >>> print(waves_dict['is_valid']) # last spike is invalid (truncated)
  [ True True False]

Note that the :py:attr:`is_valid` element of the truncated spike is
:py:data:`False`.

.. _spike_features:

Spike features
--------------

This data structure contains features calculated from spike waveforms using
one of the methods defined in the :py:mod:`spike_sort.core.features` module
(one of the :py:func:`fet*` functions, see :ref:`features_doc`).
The spike features dictionary consists of the following keys:

:data: *array*, required

    two-dimensional array of size (N_spikes, N_features) that contains the
    actual feature values

:names: *list of str*, required

    list of length N_features containing feature labels

:is_valid: *array*, optional

    boolean array of length N_spikes; if an element is False, the spike with
    the same index is masked (or invalid, see also :ref:`spike_wave`)

.. rubric:: Example

Let us calculate the peak-to-peak amplitude of the spikes extracted in
:ref:`spike_wave`:

>>> from spike_sort.core.features import fetP2P
>>> print(waves_dict['data'].shape) # 3 spikes, 4 data points each
(4, 3, 1)
>>> feature_dict = fetP2P(waves_dict)
>>> print(feature_dict.keys())
['is_valid', 'data', 'names']
>>> print(feature_dict['data'].shape)
(3, 1)

So we have one feature for each of the 3 spikes. Let us check whether the
peak-to-peak amplitudes are calculated correctly:

>>> print(feature_dict['data'])
[[ 1.]
 [ 2.]
 [ 0.]]

as expected (compare with the example above). There is only one peak-to-peak
(`P2P`) feature on a single channel (`Ch0`) and its name is:

>>> print(feature_dict['names'])
['Ch0:P2P']

The mask array is inherited from :py:data:`waves_dict`:

>>> print(feature_dict['is_valid'])
[ True True False]

.. _spike_labels:

Spike labels
------------

Spike labels are the identifiers of the cell (unit) to which each spike was
assigned. Spike labels are **not** dictionaries, but arrays of integers -- one
cluster index per spike.

.. rubric:: Example

Let us cluster the spikes described by the `Sample` feature using K-means
with K=2:

>>> from spike_sort.core.cluster import cluster
>>> feature_dict = {
...     'data' : np.array([[1],[-1], [1]]),
...     'names' : ['Sample']
... }
>>> labels = cluster('k_means', feature_dict, 2)
>>> print(labels)
[1 0 1]

As expected, :py:data:`labels` is an array describing two clusters: 0 and 1.
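
Because labels are a plain integer array aligned with the spikes, they can be
combined with the dictionaries described above using ordinary NumPy boolean
indexing. The following is a minimal sketch rather than part of the SpikeSort
API: the helper ``split_by_label`` is hypothetical, the input dictionary is
built by hand to follow the spike times layout, and treating a missing
``is_valid`` key as "all events valid" is an assumption.

```python
import numpy as np


def split_by_label(spt_dict, labels):
    """Hypothetical helper (not part of SpikeSort): group spike times by label.

    Returns a dictionary mapping each cluster index to the array of valid
    spike times (in milliseconds) assigned to that cluster.
    """
    times = np.asarray(spt_dict['data'])
    labels = np.asarray(labels)
    if times.shape != labels.shape:
        raise ValueError("labels must contain one entry per spike time")

    # Assumption: when the optional `is_valid` key is absent, every event
    # is treated as valid.
    default_mask = np.ones(times.shape, dtype=bool)
    valid = np.asarray(spt_dict.get('is_valid', default_mask), dtype=bool)

    # Keep only events that carry the given label and are not masked.
    return dict(
        (int(lab), times[(labels == lab) & valid])
        for lab in np.unique(labels)
    )


# Hand-made spike times dictionary; the last event is masked as invalid.
spt_dict = {
    'data': np.array([0.15, 0.65, 1.0]),
    'is_valid': np.array([True, True, False]),
}
labels = np.array([1, 0, 1])  # one cluster index per spike

units = split_by_label(spt_dict, labels)
```

Here ``units[0]`` holds the single spike at 0.65 ms, while ``units[1]`` holds
only the spike at 0.15 ms: the third spike also carries label 1, but it is
masked by ``is_valid``.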