Classification

In this notebook, we classify the data in a Morpheus logger file according to a threshold value. To do so, we use a function called classify_based_on_value. It takes the following inputs:

- the Morpheus log output, read into a pandas DataFrame;
- the field of interest that we want to classify;
- the classification value (value_of_interest);
- optionally, the time step and the time symbol as specified in the Morpheus model.

It returns a dictionary keyed by time point. Each value is a two-element list: first the number of entries whose field value is less than value_of_interest, then the number whose value is greater than or equal to it.
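
To make the classification rule concrete, here is a minimal pure-Python sketch of the counting described above. It is not fitmulticell's implementation. The function name classify_by_threshold is hypothetical. The sketch works on a list of row dictionaries instead of a pandas DataFrame, and it groups by the raw time value without using a time step. The rows are the same sample values that we write to the logger file below.

```python
from collections import defaultdict


def classify_by_threshold(rows, field, threshold, time_symbol="t"):
    """Hypothetical sketch, not the fitmulticell API.

    For each time point, count rows whose `field` is below `threshold`
    (index 0) and rows whose `field` is greater than or equal to it (index 1).
    """
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        time_point = int(row[time_symbol])
        value = float(row[field])
        counts[time_point][0 if value < threshold else 1] += 1
    return dict(counts)


# Same sample rows as the logger file created in the notebook
rows = [
    {"t": t, "cell.id": cid, "RNA_concentration": rna, "cell_type": ct}
    for t, cid, rna, ct in [
        ("0", "1", "0.1", "0"), ("0", "2", "0.2", "0"), ("0", "3", "0.3", "0"),
        ("1", "4", "0.4", "1"), ("1", "5", "0.5", "2"), ("1", "6", "0.6", "1"),
        ("2", "7", "0.7", "2"), ("2", "8", "0.8", "3"), ("2", "9", "0.9", "2"),
    ]
]

result = classify_by_threshold(rows, "RNA_concentration", 0.5)
print(result)  # {0: [3, 0], 1: [1, 2], 2: [0, 3]}
```

Note that a value exactly equal to the threshold (0.5 at t = 1) falls into the second count.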

Let's try the function now. Before we start, we need to define some required parameters. First, we create a sample logger file in CSV format:

[1]:
import csv
import tempfile
import os

# The logger file is tab-separated, as expected by util.tsv_to_df
full_path = os.path.join(tempfile.gettempdir(), "logger.csv")

# The with-block closes the file automatically, even if an error occurs
with open(full_path, 'w', newline='') as csvfile:
    obj = csv.writer(csvfile, delimiter='\t')
    obj.writerow(['t', 'cell.id', 'RNA_concentration', 'cell_type'])
    obj.writerow(['0', '1', '0.1', '0'])
    obj.writerow(['0', '2', '0.2', '0'])
    obj.writerow(['0', '3', '0.3', '0'])
    obj.writerow(['1', '4', '0.4', '1'])
    obj.writerow(['1', '5', '0.5', '2'])
    obj.writerow(['1', '6', '0.6', '1'])
    obj.writerow(['2', '7', '0.7', '2'])
    obj.writerow(['2', '8', '0.8', '3'])
    obj.writerow(['2', '9', '0.9', '2'])

The example above uses a time step of 1. If the user does not specify the time step, the value is extracted from the Morpheus model file. The time symbol used in the model is t. Because 't' is also the default time symbol, the code would run fine even if we did not specify time_step and time_symbol here.
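
To illustrate what the time step means for this data, the hypothetical helper below infers a step from the logged time column. It takes the smallest gap between distinct time points. This is an assumption made for illustration only. It is not how fitmulticell obtains the value, which, as stated above, comes from the Morpheus model file when it is not given.

```python
def infer_time_step(time_values):
    """Hypothetical helper: smallest gap between distinct logged time points."""
    points = sorted({float(t) for t in time_values})
    if len(points) < 2:
        raise ValueError("need at least two distinct time points")
    # Compare each time point with the next one and keep the smallest gap
    return min(b - a for a, b in zip(points, points[1:]))


# Time column of the sample logger file
print(infer_time_step(["0", "0", "0", "1", "1", "1", "2", "2", "2"]))  # 1.0
```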

[2]:
t_step=1
t_symbol='t'

We want to classify the result based on RNA_concentration, so field_of_interest is 'RNA_concentration' and value_of_interest is 0.5.

[3]:
field_of_interest = 'RNA_concentration'
value_of_interest = 0.5

Now we are ready to call classify_based_on_value(), but first we need to import the module that provides it. Don't forget to install the fitmulticell package first.

[ ]:
from fitmulticell.sumstat import cell_types_cout as ss

We also need to import the util module from fitmulticell to read the CSV file into a pandas DataFrame:

[4]:
import fitmulticell.util as util

Now we read the CSV file using the tsv_to_df function from the fitmulticell library:

[5]:
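# Read logger.csv from the same temporary folder it was written to,
# instead of hard-coding "/tmp" (which differs on some platforms)
logger_file = util.tsv_to_df(tempfile.gettempdir())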
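
You may want to inspect the logger file without fitmulticell or pandas. The standard-library sketch below reads a tab-separated logger.csv from a folder into a list of dictionaries, one per row. The helper name read_logger is hypothetical. It assumes only what the call above suggests: the file is tab-separated, named logger.csv, and located in the given folder.

```python
import csv
import os
import tempfile


def read_logger(folder, filename="logger.csv"):
    """Hypothetical stdlib reader: parse a tab-separated logger file into dicts."""
    path = os.path.join(folder, filename)
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Write a small logger file into a fresh temporary folder, then read it back
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "logger.csv"), "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["t", "cell.id", "RNA_concentration", "cell_type"])
    writer.writerow(["0", "1", "0.1", "0"])
    writer.writerow(["1", "4", "0.4", "1"])

rows = read_logger(folder)
print(len(rows), rows[0]["RNA_concentration"])  # 2 0.1
```

Unlike a DataFrame, every value here is still a string, so numeric columns must be converted (for example with float()) before comparing them with value_of_interest.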

Let's see what logger_file looks like:

[6]:
logger_file
[6]:
   t  cell.id  RNA_concentration  cell_type
0  0        1                0.1          0
1  0        2                0.2          0
2  0        3                0.3          0
3  1        4                0.4          1
4  1        5                0.5          2
5  1        6                0.6          1
6  2        7                0.7          2
7  2        8                0.8          3
8  2        9                0.9          2
[8]:
classification_result=ss.classify_based_on_value(logger_file, field_of_interest, value_of_interest)
[9]:
print(f'The classification result is = {classification_result}')
The classification result is = {0: [3, 0], 1: [1, 2], 2: [0, 3]}

Comparing the result with the logger table shows that the keys correspond to the time points t. At t = 0, all three RNA_concentration values are less than value_of_interest and none is greater than or equal to it. At t = 1, one value (0.4) is less than value_of_interest and two (0.5 and 0.6) are greater than or equal to it. At t = 2, all three values are greater than or equal to value_of_interest.
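
Each list holds only two totals per time point, so it can help to see which cell types contribute to them. The hypothetical helper classify_by_time_and_type below is not part of fitmulticell. Working on the same sample data, it splits the two counts further by cell_type at every time point.

```python
from collections import defaultdict


def classify_by_time_and_type(rows, field, threshold,
                              time_symbol="t", type_field="cell_type"):
    """Hypothetical helper (not part of fitmulticell): per time point and per
    cell type, count values below threshold and values >= threshold."""
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        time_point = int(row[time_symbol])
        cell_type = int(row[type_field])
        below = float(row[field]) < threshold
        table[time_point][cell_type][0 if below else 1] += 1
    # Convert the nested defaultdicts to plain dicts with sorted keys
    return {t: dict(sorted(types.items())) for t, types in sorted(table.items())}


rows = [
    {"t": t, "RNA_concentration": rna, "cell_type": ct}
    for t, rna, ct in [
        ("0", "0.1", "0"), ("0", "0.2", "0"), ("0", "0.3", "0"),
        ("1", "0.4", "1"), ("1", "0.5", "2"), ("1", "0.6", "1"),
        ("2", "0.7", "2"), ("2", "0.8", "3"), ("2", "0.9", "2"),
    ]
]

breakdown = classify_by_time_and_type(rows, "RNA_concentration", 0.5)
for time_point, per_type in breakdown.items():
    print(time_point, per_type)
# 0 {0: [3, 0]}
# 1 {1: [1, 1], 2: [0, 1]}
# 2 {2: [0, 2], 3: [0, 1]}
```

Summing the inner lists at each time point gives back the totals printed above, for example [1, 1] + [0, 1] = [1, 2] at t = 1.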