.. _lowlevel_tutorial:

Using low-level interface
==========================

.. testsetup::

    import numpy
    numpy.random.seed(1221)
    import os, shutil, tempfile
    temp_path = tempfile.mkdtemp()
    data_path = os.path.join(temp_path, 'data')
    os.mkdir(data_path)
    shutil.copyfile('../data/tutorial.h5', os.path.join(data_path, 'tutorial.h5'))
    os.chdir(temp_path)

.. testcleanup::

    shutil.rmtree(temp_path)

In this tutorial we will go deeper into the low-level interface of SpikeSort:
:py:mod:`spike_sort.core`. This interface is a bit more complex than SpikeBeans
(see :ref:`beans_tutorial`), but it offers more flexibility and allows you to
embed SpikeSort in your own programs.

To start this tutorial you will need:

* a working installation of SpikeSort,
* the sample :ref:`tutorial_data`.

1. Read data
------------

We will assume that you downloaded the sample data file :file:`tutorial.h5` and
saved it in the :file:`data` directory. You can load this file using one of the
I/O filters from the :py:mod:`spike_sort.io.filters` module:

.. doctest::

    >>> from spike_sort.io.filters import PyTablesFilter
    >>> dataset = '/SubjectA/session01/el1'
    >>> io_filter = PyTablesFilter('data/tutorial.h5')
    >>> raw = io_filter.read_sp(dataset)

:py:data:`raw` is a dictionary which contains the raw data (in this case a
PyTables compressed array) under the :py:attr:`data` key:

.. doctest::

    >>> print raw['data']
    /SubjectA/session01/el1/raw (CArray(4, 23512500)) ''

The data consist of 23512500 samples in each of the 4 independent channels
(`contacts` of the tetrode).

.. note::

    HDF5 files are organised hierarchically and may contain multiple datasets.
    You can access the datasets via simple paths - in this case
    `/SubjectA/session01/el1`, which denotes the dataset of SubjectA recorded
    in session01 from electrode el1.

2. Detect spikes
----------------

The first step of spike sorting is spike detection, which is usually done by
thresholding the raw recordings. Let us use an automatic threshold on the 4th
contact, i.e. index 3 (channel indexing always starts with 0!):

.. doctest::

    >>> from spike_sort import extract
    >>> spt = extract.detect_spikes(raw, contact=3, thresh='auto')

Let us see how many events were detected:

.. doctest::

    >>> print len(spt['data'])
    16293

We should make sure that all events are aligned to the same point of reference,
for example the maximum amplitude. To this end we first define a window around
which spikes should be centered and then recalculate the aligned event times:

.. doctest::

    >>> sp_win = [-0.2, 0.8]
    >>> spt = extract.align_spikes(raw, spt, sp_win, type="max",
    ...                            resample=10)

``resample`` is optional - it enables upsampling (in this case 10-fold) of the
original waveforms to obtain a better resolution of event times.

After spike detection and alignment we can finally extract the spike waveforms:

.. doctest::

    >>> sp_waves = extract.extract_spikes(raw, spt, sp_win)

The resulting structure is a dictionary whose :py:attr:`data` key is an array
containing the spike waveshapes. Note that the array is three-dimensional and
the sizes of its dimensions reflect:

* 1st dimension: number of samples in each waveform,
* 2nd: number of spikes,
* 3rd: number of contacts.

.. doctest::

    >>> print sp_waves['data'].shape
    (25, 15537, 4)
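Since the :py:attr:`data` array is indexed as ``(sample, spike, contact)``, you
can inspect individual channels with plain NumPy indexing. The snippet below is
only a minimal sketch - it assumes nothing beyond :py:attr:`data` behaving like
an ordinary NumPy array, as suggested by its ``shape`` attribute above - and
extracts all waveforms recorded on contact 0 together with their average shape::

    >>> import numpy as np
    >>> waves = np.asarray(sp_waves['data'])   # (n_samples, n_spikes, n_contacts)
    >>> contact0 = waves[:, :, 0]              # all spikes recorded on contact 0
    >>> print contact0.shape                   # (n_samples, n_spikes)
    (25, 15537)
    >>> mean_waveform = contact0.mean(axis=1)  # average waveshape on contact 0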
In practice, you do not need to take care of such details. However, it is
always a good idea to take a look at the obtained waveforms. The
:py:mod:`spike_sort.ui.plotting` module contains various functions which will
help you to visualize the data. To plot the waveshapes you can use the
:py:func:`plot_spikes` function from this module:

.. doctest::

    >>> from spike_sort.ui import plotting
    >>> plotting.plot_spikes(sp_waves, n_spikes=200)

.. figure:: images_manual/tutorial_spikes.png

It is apparent from the plot that spike waveforms of a few different cells, and
also some artifacts, were detected. In order to separate these activities, in
the next step we will perform *spike clustering*.

3. Calculate features
---------------------

Before we can cluster spikes, we need to calculate some characteristic features
that can be used to differentiate between the waveshapes. The
:py:mod:`~spike_sort.core.features` module defines several such features, for
example peak-to-peak amplitude (:py:func:`fetP2P`) and projections on principal
components (:py:func:`fetPCA`). Now, we will calculate peak-to-peak amplitudes
and PC projections on each of the contacts, and then combine them into a single
object:

.. doctest::

    >>> from spike_sort import features
    >>> sp_feats = features.combine(
    ...     (
    ...         features.fetP2P(sp_waves),
    ...         features.fetPCA(sp_waves)
    ...     )
    ... )

To help you identify the features, abbreviated labels are assigned to all of
them:

.. doctest::

    >>> print sp_feats['names']
    ['Ch0:P2P' 'Ch1:P2P' 'Ch2:P2P' 'Ch3:P2P' 'Ch0:PC0' 'Ch1:PC0' 'Ch2:PC0' 'Ch3:PC0' 'Ch0:PC1' 'Ch1:PC1' 'Ch2:PC1' 'Ch3:PC1']

For example, the feature ``Ch0:P2P`` denotes the peak-to-peak amplitude on
contact (channel) 0.

Let us plot the two-dimensional projections of the feature space and the
histograms of the features:

.. doctest::

    >>> plotting.plot_features(sp_feats)

.. figure:: images_manual/tutorial_features.png

4. Cluster spikes
-----------------

Finally, based on the calculated features we can perform spike clustering. This
step is a little more complex and the best settings usually have to be found by
trial and error. There are several automatic, semi-automatic and manual
clustering methods; their performance and accuracy depend to a large degree on
the particular dataset and recording setup.

In SpikeSort you can choose from several available methods, whose names are
given as the first argument of the :py:func:`~spike_sort.core.cluster.cluster`
function. We will start with the automatic method
:py:func:`~spike_sort.core.cluster.gmm`, which requires only the feature object
:py:data:`sp_feats` and the number of clusters to identify. It attempts to find
the mixture of Gaussian distributions that best approximates the distribution
of spike features (a Gaussian mixture model). Since we do not know how many
cells were picked up by the electrode, we guess an initial number of clusters,
which we can modify later on:

.. doctest::

    >>> from spike_sort import cluster
    >>> clust_idx = cluster.cluster("gmm", sp_feats, 4)

The result simply assigns a number (the cluster index) to each spike from the
feature array :py:data:`sp_feats`.
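As a quick sanity check you can count how many spikes were assigned to each
cluster. This is only a sketch - it assumes that :py:data:`clust_idx` behaves
like a one-dimensional array of integer labels, one entry per spike::

    >>> import numpy as np
    >>> labels = np.asarray(clust_idx)         # assumed: 1-D integer label array
    >>> for cl in np.unique(labels):
    ...     print "cluster %d: %d spikes" % (cl, np.sum(labels == cl))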
You can use the plotting module to draw the feature vectors with colors
reflecting the groups to which each spike was assigned:

.. doctest::

    >>> plotting.plot_features(sp_feats, clust_idx)

.. figure:: images_manual/tutorial_clusters.png

or you can look at the spike waveshapes:

.. doctest::

    >>> plotting.plot_spikes(sp_waves, clust_idx, n_spikes=200)
    >>> plotting.show()

.. figure:: images_manual/tutorial_cells.png

If you are not satisfied with the results, or you think you might do better,
you can also try manual sorting using the cluster-cutting method::

    >>> from spike_sort.ui import manual_sort
    >>> cluster_idx = manual_sort.show(sp_feats, sp_waves,
    ...                                ['Ch0:P2P', 'Ch3:P2P'],
    ...                                show_spikes=True)

This function will open a window in which you can draw clusters of arbitrary
shapes. Beware, however: you can draw only on a two-dimensional plane, so you
are limited to two features (``Ch0:P2P`` and ``Ch3:P2P`` in this case)!

5. Export data
--------------

Once you are done with spike sorting, you can export the results to a file. To
this end you can use the same :py:mod:`~spike_sort.io.filters` module we used
for reading. Here, we will save the spike times of the sorted cells back to the
file we read the data from.

First, we need to extract the spike times of the discriminated cells:

.. doctest::

    >>> spt_clust = cluster.split_cells(spt, clust_idx)

This creates a dictionary whose keys are the cell labels and whose values are
the spike times of each cell. For example, to extract the spike times of cell
0:

.. doctest::

    >>> print spt_clust[0]
    {'data': array([  5.68152000e+02,   1.56978000e+03,   2.23985200e+03, ...
             9.24276876e+05,   9.33539168e+05])}

Then we may export them to the data file:

.. doctest::

    >>> from spike_sort.io import export
    >>> cell_template = dataset + '/cell{cell_id}'
    >>> export.export_cells(io_filter, cell_template, spt_clust, overwrite=True)

This will create new nodes in :file:`tutorial.h5` containing the spike times of
the discriminated cells (``/SubjectA/session01/el1/cell{1-4}``), which you can
use for further analysis.

Do not forget to close the I/O filter at the end of your analysis:

.. doctest::

    >>> io_filter.close()

Good luck!!!