Reports | Open Access (CC BY-NC-ND)

MSCProfiler: a single cell image processing workflow to investigate mesenchymal stem cell heterogeneity

    Ayona Gupta

    Manipal Institute of Regenerative Medicine, Bengaluru, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

    ,
    Safia Kousar Shaik

    Delft University of Technology, Delft, The Netherlands

    ,
    Lakshmi Balasubramanian

    SaviTix Technology Consultancy, Bengaluru, India

    &
    Uttara Chakraborty

    *Author for correspondence:

    E-mail Address: uttara.chakraborty@manipal.edu

    Manipal Institute of Regenerative Medicine, Bengaluru, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

    Published Online: https://doi.org/10.2144/btn-2023-0048

    Abstract

    Single cell cytometry has demonstrated plausible immuno-heterogeneity of mesenchymal stem cells (MSCs) owing to their multivariate stromal origin. To contribute successfully to next-generation stem cell therapeutics, a deeper understanding of their cellular morphology and immunophenotype is important. In this study, the authors describe MSCProfiler, an image analysis pipeline developed using CellProfiler software. This workflow can extract geometrical and texture features such as shape, size, eccentricity and entropy, along with intensity values of the surface markers from multiple single cell images obtained using imaging flow cytometry. This screening pipeline can be used to analyze geometrical and texture features of all types of MSCs across different passages hallmarked by enhanced feature extraction potential from brightfield and fluorescent images of the cells.

    METHOD SUMMARY

    This study describes the development of an enhanced image feature extraction approach to analyze single cell image data of mesenchymal stem cells (MSCs). MSCProfiler, an automated image analysis pipeline, was developed using CellProfiler software to analyze geometrical and morphological/texture features and fluorescence intensities of biomarkers in MSCs across passages and tissue sources. Compared to existing image analysis tools, this workflow is devoid of human bias and is marked by the efficiency with which it filters out nontarget images and extracts features from brightfield and fluorescent images of cells.

    Graphical abstract

    Tweetable abstract

    Mesenchymal stem cells are known for their cellular heterogeneity, which requires deeper understanding if they are to contribute to stem cell therapeutics. MSCProfiler is an automated, single-cell image-processing workflow that sheds light on MSC heterogeneity.

    At the forefront of cell-based therapies are mesenchymal stem cells (MSCs), with over 1000 registered clinical studies under way [1]. These are adult tissue-derived multipotent stem cells, which can be isolated from almost all postnatal organs and tissues [2]. They have a variety of pleiotropic functions such as antiapoptosis, angiogenesis, antifibrosis and chemo-attractive properties, making them a useful tool for a wide variety of therapeutic applications [1,3,4]. An important metric to qualify MSCs as beneficial therapeutic agents is the determination of their safety and risks, if any [4]. Consequently, culturing and studying these cells becomes important. However, one of the hurdles faced in the translation of in vitro MSC studies is that many of the properties that are studied, especially expression of surface proteins, arise because of the artificial culture environment [5]. Expression of surface markers such as CD90, CD73, CD105 and CD44, which are routinely used for MSC characterization, is limited to in vitro culture and is not observed in situ. A major challenge in the translation of MSCs to clinical application is the inherent heterogeneity of MSCs, which influences their properties of immunomodulation and regeneration [3]. The heterogeneity, arising due to isolation, culture and expansion, warrants an extensive study on various MSC subtypes and their categorization [3,6].

    To understand this heterogeneity, a cell-by-cell approach as opposed to conducting studies at a population level would be more advantageous in revealing important and unique properties of various cell types [7]. In this context, single cell analysis platforms such as imaging flow cytometry (IFC) can prove to be a powerful tool. An amalgamation of high-throughput cytometry and microscopy, IFC allows extraction of traditional flow cytometric data and images of each event from fluorescent, brightfield (BF) and laser side scatter/darkfield (DF) channels [8]. The potential of this technology has mostly allowed the exploration of nuclear translocation [9], autophagy [10,11] and detection of DNA damage [12–14]. However, unraveling the immuno-heterogeneity of cells using this technology is still under way. The high-throughput nature of this technology allows the collection of enormous amounts of data from a single cell, which can be used to answer questions of cellular heterogeneity [8] and holds immense potential for data mining and creating well-trained neural networks. However, a large amount of information remains underutilized to a great extent due to the lack of data analysis tools that can extract meaningful information from the images [9].

    In this study, we used the Amnis ImageStream image cytometer, one of the most widely used IFC instruments. Data analysis on Amnis platforms can be done using the proprietary software IDEAS or with analysis tools such as FCS Express (DeNovo Software). Apart from supporting traditional flow cytometry data analysis, the IDEAS platform provides users with ‘wizards’, which offer assay-specific analysis templates for feature finding and enable sequential analysis without many complications (https://cytekbio.com/pages/imagestream). In addition, specific areas or regions of the cell image can be defined using a ‘mask’ (a defined set of pixels in the region of interest), for example a nuclear mask or a cell surface marker-specific mask. It is also possible to create masks on IDEAS based on user-defined criteria. This can be done using the mask manager, which has 13 available functions for creating a new mask. However, use of this analytical software depends primarily on the experience of the user and therefore introduces bias. This can be especially challenging when studying immuno-heterogeneity, because potentially important features can be missed if one does not actively look for them.

    The recent trend in the incorporation of machine learning (ML) or artificial intelligence in biology has resulted in the advancement of data analysis approaches [15]. Use of virtual or label-free staining of cells [16,17], use of BF information alone to extract quantitative features of cells [16] and use of ML to analyze IFC data for diagnostics [18] have all introduced a new paradigm. However, some of these methods might require an in-depth knowledge of deep-learning techniques, neural networks or programming languages [19]. In this study, the authors used a user-friendly, freely available software, CellProfiler (available at https://cellprofiler.org/), to create a pipeline that can be used to analyze most of the commonly available image file formats [20].

    Here the authors introduce MSCProfiler, a completely automated workflow, to analyze IFC data from different human MSCs such as the stem cells of human exfoliated deciduous teeth (SHEDs) and Wharton's jelly-derived MSCs (WJMSCs). The authors developed the entire workflow using the SHEDs and tested its robustness using WJMSCs. SHEDs were stained with antibodies that recognize surface antigens of these cells, prescribed as minimal criteria for identification of MSCs by the International Society for Cell & Gene Therapy (ISCT) [21]. The authors included two surface markers in their studies: CD44/homing cell adhesion molecule and CD73/ecto-5′-nucleotidase. Conventionally, with routine flow cytometry-based characterization of MSCs, bivariate plots of CD44 and CD73 should show a ≥95% double-positive population (refer to Supplementary Figure 1) to satisfy the ISCT criteria for MSCs, and ≤2% of the cells should express the hematopoietic marker CD45 [21]. In the authors' panel, they used the above markers and included SYTOX Green as a live/dead discriminator. Live cells, with intact membranes, exclude SYTOX Green, whereas the nuclei of dying/dead cells whose membrane integrity has been compromised take up the dye [22]. The authors focused on the extraction of feature information from individual BF and/or fluorescent images acquired on the Amnis ImageStream Mk II instrument. This workflow can distinguish images of live singlets from doublets or aggregates based on the exact boundary of single cells and compute geometrical parameters such as area, aspect ratio, eccentricity and compactness, as well as texture features such as inverse difference moment and entropy. The intensities of the CD44 and CD73 antigens were also extracted from the fluorescent channel images and their surface distribution pattern was observed.

    Materials & methods

    Cell culture

    Individual cell cultures of SHEDs at passage 9 (P9) and 12 (P12) and WJMSCs at P6 were maintained for this study. Each cell type was seeded in 100 mm culture dishes at a density of 5000 cells per cm² and grown in KnockOut Dulbecco’s modified Eagle medium (Gibco) supplemented with 10% fetal bovine serum (Gibco), 1× Pen-Strep (Gibco) and 1× glutamine (Gibco) at 37°C (humidified conditions) until the cells reached 90–95% confluency, with media changes every 24 or 48 h as required. Cells were then washed once with plain basal media without any supplements and trypsinized by adding 0.25% trypsin-EDTA (Gibco), followed by incubation at 37°C for 10 min until all the cells had detached from the culture dish. Trypsin-EDTA was neutralized with basal media and the cells were collected and centrifuged at 1200 r.p.m. for 6 min to obtain the cell pellet.

    Preparation of cell suspension for immunostaining

    The cell pellet was resuspended in 1 ml media, and the cell count was determined. Cells were then washed twice with staining buffer (2% fetal bovine serum in phosphate-buffered saline). Two percent fetal bovine serum helps to sustain viable cells while in suspension post-trypsinization. A suspension of 1 × 10⁶ cells in 50 μl made up a 100 μl reaction volume when mixed with antibodies and staining buffer. The amount of antibody added to every tube is described in Table 1.

    Table 1. Sample preparation guide.
    Tube | Reagent mix | Total volume of antibody added (3 μl of each antibody per 1 × 10⁶ cells) | Total volume of cells added | Total volume of staining buffer added | Final volume
    1 | CD44 + CD73 + CD45 + SYTOX Green | 9 μl | 50 μl | 41 μl | 100 μl
    2 | CD44 (single color control) | 3 μl | 50 μl | 47 μl | 100 μl
    3 | CD73 (single color control) | 3 μl | 50 μl | 47 μl | 100 μl
    4 | CD45 (single color control) | 3 μl | 50 μl | 47 μl | 100 μl
    5 | SYTOX Green (single color control) | Nil | 50 μl | 50 μl | 100 μl
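
    The volumes in Table 1 follow from simple bookkeeping: antibody, cell suspension and staining buffer add up to the 100 μl reaction volume. The sketch below illustrates that arithmetic only; the function and constant names are hypothetical and are not part of the published protocol.

```python
# Illustrative bookkeeping for Table 1 (names are hypothetical).
# All volumes are in microlitres.
ANTIBODY_UL_EACH = 3      # 3 ul of each antibody per 1 x 10^6 cells
CELL_SUSPENSION_UL = 50   # 1 x 10^6 cells in 50 ul
FINAL_VOLUME_UL = 100     # reaction volume per tube


def staining_buffer_ul(n_antibodies: int) -> int:
    """Return the staining buffer volume that tops a tube up to 100 ul."""
    antibody_total = n_antibodies * ANTIBODY_UL_EACH
    buffer_volume = FINAL_VOLUME_UL - CELL_SUSPENSION_UL - antibody_total
    if buffer_volume < 0:
        raise ValueError("antibody volume exceeds the available reaction volume")
    return buffer_volume


if __name__ == "__main__":
    # Tube 1 (three antibodies), Tubes 2-4 (one antibody), Tube 5 (dye only)
    for n in (3, 1, 0):
        print(n, "antibodies ->", staining_buffer_ul(n), "ul staining buffer")
```

    The three cases correspond to the 41 μl, 47 μl and 50 μl buffer volumes listed in the table.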

    Antibodies & immunostaining

    Live SHEDs were simultaneously stained for three surface antigens with an antibody cocktail consisting of CD44, CD73 and CD45. The list of antibody–fluorochrome conjugates is provided in Supplementary Table 1. Cells were incubated for 30 min, washed twice with staining buffer and finally resuspended in 100 μl of staining buffer. Prior to data acquisition, SYTOX Green (1×) was added to Tubes 1 and 5 (Table 1) and incubated at 37°C for 15 min. SYTOX Green is a nucleic acid stain that helps discriminate between live and dead cells.

    To address spectral spillage, the biggest concern in a multicolor immunophenotyping experiment, we prepared appropriate compensation controls. Single color controls were prepared for every antibody used, as well as for the DNA binding dye. The noBF files used by the compensation algorithm were generated while acquiring the single color tubes. Since MSCs would not express CD45, cells from peripheral blood were used as the single color control for this marker. For the viability dye, SYTOX Green, the compensation control was prepared by giving MSCs a heat shock at 70°C for 5 min followed by incubation on ice. Table 1 summarizes the reaction mix per tube, using a cell density of ~1 × 10⁶ cells/ml.

    Acquisition on the Amnis ImageStream Mk II

    Cells were acquired on the IFC platform Amnis ImageStream Mk II equipped with one charge-coupled device (CCD) camera (six channels/detectors) using INSPIRE acquisition software, briefly described in Figure 1A. The detection channels available for use in the instrument based on the fluorochrome conjugates were V450 (Channel 1, 435–505 nm), SYTOX Green (Channel 2, 505–560 nm), phycoerythrin (PE; Channel 3, 560–595 nm), BF images (Channel 4, 595–642 nm), Peridinin Chlorophyll Protein-Cyanine 5.5 (PerCP-Cy5.5) (Channel 5, 642–745 nm), and DF images/side scatter (Channel 6, 745–780 nm). The instrument configuration and panel design are provided in Supplementary Table 2. Unstained cells were first run to set the baseline correction by adjusting the laser powers such that no autofluorescence could be detected. Lasers were set at 10 mW for the 405 nm laser (Channel 1 excitation) and 10 mW for the 488 nm laser (Channel 2, 3 and 5 excitation). Sample tubes containing all four colors were run and ≥100,000 events were acquired. All the single color tubes were run with the BF and DF parameters turned off and were automatically saved with ‘noBF’ in their file names. These files were later used to set up the compensation matrix postacquisition. All raw data files were saved in raw image file (.rif) format.
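
    For downstream scripting it can help to keep the channel layout in one lookup table. The sketch below records the six channels and bandwidths quoted above; the dictionary and function names are hypothetical, and treating each band as half-open (lower edge included, upper edge excluded) is an assumption made only to resolve the shared band edges.

```python
# Channel layout as quoted in the text: (label, lower nm, upper nm).
# Half-open bands [lower, upper) are an assumption of this sketch.
CHANNELS = {
    1: ("V450", 435, 505),
    2: ("SYTOX Green", 505, 560),
    3: ("PE", 560, 595),
    4: ("Brightfield", 595, 642),
    5: ("PerCP-Cy5.5", 642, 745),
    6: ("Darkfield / side scatter", 745, 780),
}


def channel_for_wavelength(nm: float):
    """Return (channel number, label) for an emission wavelength, or None."""
    for number, (label, lower, upper) in CHANNELS.items():
        if lower <= nm < upper:
            return number, label
    return None


if __name__ == "__main__":
    print(channel_for_wavelength(520))
    print(channel_for_wavelength(800))
```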

    Figure 1. Supervised versus automated data analytical tools.

    (A) Sample preparation and acquisition on the imaging flow cytometer using INSPIRE software, which generated the single cell images of each cell in flow. (B) Supervised analytical software IDEAS v6.2 was used to create a compensation matrix and .daf and .cif files to analyze the acquired data and view the cell images. Data analysis software such as FCS Express v6 was used to analyze the cytometry data. (C) MSCProfiler analytical automated pipeline. Images exported from IDEAS were converted to .tiff format and later uploaded on MSCProfiler and run through the workflow to extract information from the single cell images and identify cells of interest.

    Setting up compensation on IDEAS

    As opposed to conventional flow cytometry, which relies on the voltage gain of individual photomultiplier tubes to collect the signal generated from an antibody–fluorochrome conjugate, the detectors available on the Amnis ImageStream Mk II are charge-coupled devices. They have a higher quantum efficiency than photomultiplier tubes, which makes them ideal detectors for collecting dim fluorescence signals, a prerequisite for good imaging. On this instrument, a pixel-based compensation is performed postacquisition using IDEAS software. The ‘noBF’ files of each color acquired in .rif format were loaded onto IDEAS software to create a compensation matrix to correct the spectral spillage of fluorochromes in respective channels. Once the compensation matrix was set up, it was saved in compensation matrix (.ctm) file format.

    IFC data analysis

    The IFC data have two components – first, the flow cytometric data conventionally represented by dot plots and, second, the individual images of each dot on the plot. As a comparative study, the IFC data were analyzed using both IDEAS and FCS Express software. The data analysis was performed using the proprietary software IDEAS v6.2 (Amnis Corp., WA, USA). Following hierarchical gating, the population of interest was first identified, after which the image data analysis began. Every cell was demarcated with a number and was visible on the screen/image gallery. ‘noBF’ files were used to set up the compensation matrix. The .rif and .ctm files were then used to create the data analysis file (.daf) and compensated image file (.cif). The downstream analysis was performed using the .daf files. Following a hierarchical gating strategy, first the focused cells were identified by plotting a histogram of normalized frequency versus gradient root mean square value of Channel 4 (BF). This was followed by single cell identification (selected apart from the aggregates/debris) gated from the aspect ratio of Channel 4 versus area of Channel 4. The next plot was to identify live cells (negative for SYTOX) from the Channel 6 (DF) versus Channel 2 plot, followed by CD45 negative cells from the Channel 6 (DF) versus Channel 3 (CD45) plot (exclusion gating strategy), and finally populations positive for both CD44 and CD73 were identified from a bivariate plot between Channel 1 (CD44) and Channel 5 (CD73).
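
    The hierarchical gating described above can be thought of as a chain of filters in which each gate only sees the events that passed its parent. The following is a minimal sketch of that idea, not the IDEAS implementation: the feature names and every threshold value are hypothetical placeholders, and the two-dimensional gates drawn in IDEAS are reduced here to simple cut-offs.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]
Gate = Tuple[str, Callable[[Event], bool]]

# Gate order follows the text; all cut-off values are placeholders,
# not the gates actually drawn in IDEAS.
GATES: List[Gate] = [
    ("focused", lambda e: e["gradient_rms_ch4"] >= 50.0),
    ("singlets", lambda e: e["aspect_ratio_ch4"] >= 0.6 and e["area_ch4"] <= 600.0),
    ("live", lambda e: e["intensity_ch2"] < 1000.0),    # SYTOX Green negative
    ("CD45-", lambda e: e["intensity_ch3"] < 1000.0),   # exclusion gate
    ("CD44+CD73+", lambda e: e["intensity_ch1"] >= 1000.0
                             and e["intensity_ch5"] >= 1000.0),
]


def apply_gates(events: List[Event], gates: List[Gate] = GATES):
    """Apply gates in order; return surviving events and per-gate counts."""
    remaining = list(events)
    counts = []
    for name, keep in gates:
        remaining = [event for event in remaining if keep(event)]
        counts.append((name, len(remaining)))
    return remaining, counts
```

    Because each gate filters the output of the previous one, the counts returned are nested in the same way as the gate statistics reported by the software.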

    Analysis using FCS Express

    The .daf file was read by FCS Express v6 (Research Use Only), a flow cytometry data analysis software. It was ensured that both the .daf file and the .cif file for each sample tube were available in the same analysis folder. Using a gating strategy similar to that described in the preceding section, singlets were identified first, followed by live cells. The CD45 negative cells were then identified and gated onto the next plot to show the CD44 and CD73 double positive cells. This software validated the gating strategy applied to the populations before proceeding with single cell image data analysis. A flowchart of the steps involved in the supervised analysis platforms is shown in Figure 1B.

    Preparing images for CellProfiler analysis

    The individual images from Channels 1, 2, 3, 4 and 5 were exported using the Export .tif images option under the Tools menu on IDEAS. The exported images carry an Open Microscopy Environment (.ome) extension and were converted to .tiff using Fiji/ImageJ software in an automated fashion using an in-house-built macro (Supplementary Information). These individual images were used in the image processing workflow, as described in the later sections, to extract the single cell features. The workflow was developed using CellProfiler software to analyze the single cell images of MSCs.

    MSCProfiler workflow

    The individual .tiff images of all the channels were then loaded into the MSCProfiler pipeline through the input modules (features that perform specific tasks). Using the NamesAndTypes module, each channel image was categorized so that a single image set held the information on one cell/event, consisting of its BF and fluorescent channel images. The automated pipeline is described here in 11 steps, grouped by the purpose they serve. Individual cell information (i.e., information from all five channels) was processed through all the modules of each component to identify and extract various features such as geometric, texture and intensity values, described in Figure 1C. To begin with, the total number of cells analyzed was 54,356 (SHEDs P9), which were selected from the ‘focused’ cell gate on the IDEAS software. Therefore, a total of 271,780 images, collected from all five channels, were run on MSCProfiler. The 271,780 images were not run through the workflow all at once; the total set was divided into 24 groups, each containing approximately 10,000 images. In order to evaluate the strength of the workflow, roughly 13,000 WJMSCs (P6) and 10,000 SHEDs (P12) were analyzed as well. The steps in the MSCProfiler workflow are as follows:

    1.

    Quality control of images – the MeasureImageIntensity and FlagImage modules were used to discard images that did not contain any cellular information. The MeasureImageIntensity module calculated the intensity measurements of the images, on the basis of which the boundary conditions were set in the FlagImage module to discard the poor-quality images. Only BF (Channel 4) images were used for this quality check to estimate whether the image had any cellular information. BF images whose total image intensity was in the range of 5500–7900 and whose total image area was in the range of 9000–13,000 pixels proceeded to the next step. Images that did not satisfy these criteria were flagged and discarded from the analysis.

    2.

    Preprocessing steps – the unflagged image set was then used for further analysis. To identify the cellular region, BF images were used. Each of the BF images was preprocessed using four modules: ImageMath, Smooth, RescaleIntensity and EnhanceEdges. The ImageMath module was used for subtraction of the foreground from the background of the BF images. The Smooth module applied a Gaussian filter to the BF images to blur the pixels outside the cell and highlight the pixels inside the cell. The RescaleIntensity module found the minimum and maximum intensity values across the entire BF image and rescaled every pixel, so that the minimum intensity value was zero and the maximum intensity value was one. The EnhanceEdges module used the Sobel filter method to highlight the edges of the cell identified in the BF image.

    3.

    Segmentation of cell boundary – the preprocessed BF images resulting from the above-mentioned steps were used to identify the cell boundary. This section of the workflow involved four modules. First, the Threshold module was used to identify the pixels of interest based on the gray scale signal using Otsu thresholding, a method based on the signal variances between the foreground and the background. Second, the Morph module removed the identified pixels from the background of the segmented image. Third, the IdentifyPrimaryObjects module identified the cell boundary. Finally, the ExpandOrShrinkObjects module was added to obtain a more accurate cellular boundary. The BF images that contained one cell per image proceeded to further processing, while those that contained more than one cell per image were discarded by the FlagImage module.

    4.

    Identification of singlets – this section of the workflow contained three modules, which were used to identify only the singlet MSCs and remove the images that contained more than one cell. The MeasureObjectSizeShape module calculated geometrical features, such as Area, FormFactor, Compactness and Eccentricity, of the cell identified from the BF image. A boundary condition was set up on the Area, FormFactor and Eccentricity measurements in the FilterObjects module to obtain the singlets. Eccentricity describes how far a shape/object deviates from circularity; for a circle it is equal to 0. Compactness indicates whether an object has holes (or missing pixels) in it or is a closed, solid figure; a circle has a compactness of 1. Aspect ratio (minimum Feret diameter/maximum Feret diameter) defines how elongated an object is; for a perfectly circular object, it is equal to 1. For the identified cell to be a singlet, its Area had to be in the range of 150–4500 pixels, its FormFactor in the range of 0.75–1.0 and its Eccentricity in the range of 0.0–0.7 (the latter two are dimensionless). If the identified cell did not meet these criteria, the BF image was considered a nonsinglet and was discarded by the FlagImage module. The ranges specified in individual modules were determined during the pipeline building steps by measuring a set of 50–100 cell images that contained a mix of target and nontarget cells.

    5.

    Identification of live singlets – this part of the workflow classified an identified singlet as a live or a dead cell based on the appropriate marker using the following five modules: RescaleIntensity, MeasureObjectIntensity, FilterObjects, MeasureImageIntensity and FlagImage. These modules were applied sequentially to the SYTOX channel image (Channel 2) of the identified singlet. For a singlet to be classed as live, the mean intensity value of its SYTOX channel image had to be 0. Cells with no signal in the SYTOX channel were therefore the live cells. Detection of signal/intensity in the SYTOX channel marked the cell as dead, and it was discarded.

    6.

    Identification of CD45 negative population – this section of the workflow checked that the MSCs were negative for CD45 expression (tagged to PE) as one of the first criteria to identify them. The RescaleIntensity, MeasureObjectIntensity, FilterObjects, MeasureImageIntensity and FlagImage modules were again applied sequentially to the PE channel (Channel 3) images of the identified live singlets to confirm that the cells were PE negative. The mean intensity value of each PE channel image had to lie in the range 0–0.002.

    7.

    Classification of singlets into large, medium and small MSCs – the identified singlets of MSCs were categorized as small, medium and large cells based on their area and this was performed using the FilterObjects module. The small, medium and large MSC singlet classification was based on the area criteria set within the ranges of 2200–3000, 3001–4000 and 4001–4500 pixels, respectively.

    8.

    Identifying CD44 and CD73 surface marker regions – the regions of expression of CD44 (Channel 1) and CD73 (Channel 5) were identified using the IdentifyPrimaryObjects module based on the minimum cross entropy thresholding method.

    9.

    Estimation of parameters – various parametric features were calculated for the identified singlets based on the geometry, texture and intensity measurements. The modules included MeasureTexture and MeasureGranularity. Using the BF images, a set of features such as area, eccentricity, compactness, entropy and granularity was calculated. Using the identified CD44 and CD73 regions, the total intensity of each marker was also measured. Texture features such as inverse difference moment and entropy were also calculated. Inverse difference moment gave a measure of the homogeneity within a defined region. Entropy indicated the randomness within the structure.

    10.

    Visualization of singlets – the OverlayOutlines module drew the outline of the cell and the outlines of the CD44 and CD73 marker regions on the BF image of the cell. These images were saved as .png files with the help of the SaveImages module.

    11.

    Saving data – in the last section of the workflow, the data of all the calculated values were saved as separate .csv files with the help of the ExportToSpreadsheet module. The data of the geometrical features of the singlets were also saved as a .properties file by the ExportToDatabase module. This .properties file can be used in the future for analysis and classification of the MSCs using the ML approach.
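
    The filtering logic of steps 1 and 4–7 can be summarized as a sequence of range checks applied to per-cell measurements. The sketch below is plain Python written for illustration and is not the CellProfiler pipeline itself: the thresholds are those quoted in the steps above, whereas the dictionary keys, label strings and function names are hypothetical.

```python
# Thresholds quoted in steps 1, 4, 6 and 7; all names are hypothetical.
QC_TOTAL_INTENSITY = (5500, 7900)    # step 1, brightfield image
QC_TOTAL_AREA = (9000, 13000)        # step 1, pixels
SINGLET_AREA = (150, 4500)           # step 4, pixels
SINGLET_FORM_FACTOR = (0.75, 1.0)    # step 4
SINGLET_ECCENTRICITY = (0.0, 0.7)    # step 4
PE_MEAN_INTENSITY = (0.0, 0.002)     # step 6, CD45 negative
SIZE_CLASSES = {                     # step 7, area in pixels
    "small": (2200, 3000),
    "medium": (3001, 4000),
    "large": (4001, 4500),
}


def within(value, bounds):
    """Inclusive range check."""
    low, high = bounds
    return low <= value <= high


def classify_event(m):
    """Return a flag label or a size class for one cell's measurements."""
    if not (within(m["bf_total_intensity"], QC_TOTAL_INTENSITY)
            and within(m["bf_total_area"], QC_TOTAL_AREA)):
        return "flagged: quality"
    if not (within(m["area"], SINGLET_AREA)
            and within(m["form_factor"], SINGLET_FORM_FACTOR)
            and within(m["eccentricity"], SINGLET_ECCENTRICITY)):
        return "flagged: non-singlet"
    if m["sytox_mean_intensity"] != 0:      # step 5: any SYTOX signal = dead
        return "flagged: dead"
    if not within(m["pe_mean_intensity"], PE_MEAN_INTENSITY):
        return "flagged: CD45 positive"
    for label, bounds in SIZE_CLASSES.items():
        if within(m["area"], bounds):
            return label
    return "unclassified"                   # singlet outside the size classes
```

    Note that a singlet with an area between 150 and 2199 pixels passes the singlet filter but falls below the smallest size class; this sketch labels such cells "unclassified", which is a choice made here rather than something stated in the workflow.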

    Results & discussion

    Conventional flow cytometry-based analysis versus imaging cytometry analysis

    The workflows illustrated in Figure 1 compare supervised analytical tools with the authors' automated data analysis workflow, MSCProfiler. To understand the robustness of this workflow, the authors used MSCs from two sources and three different passages: WJMSCs (early passage P6) and SHEDs (later passages P9 and P12). The MSCs in all three passages showed dual expression of the cell surface markers CD44 and CD73. However, the authors observed distinct populations losing expression of these surface markers toward the later passages (Supplementary Figure 1). The IFC results were analyzed on FCS Express software following the gating strategy shown in Figure 2A. The next set of analyses, following similar flow cytometric logic, was performed using IDEAS, as shown in Figure 2B. The FCS Express analysis revealed that out of 142,171 (the total acquired events) the number of singlets was 10,314, out of which 9733 were live cells (i.e., negative for SYTOX Green), 9714 were CD45 negative and 9404 cells were double positive for both CD44 and CD73, shown in Figure 2A. Similarly, using IDEAS software, out of 142,171 total events, 54,356 were focused cells. From the focused cells, 7900 cells were identified as singlets and 7700 were live cells. Finally, the number of cells positive for both CD44 and CD73 was found to be 7579, as seen in Figure 2B. The gate statistics and the percentages for both software packages (FCS Express and IDEAS) are summarized in tabular format in Figure 2C. The sequential output of the IDEAS gating strategy is described in Figure 3A. The visual confirmation of every acquired event during the analysis stage, through the image gallery of IDEAS, served to bolster the gating strategy on the platform. For example, the live cells were confirmed from the image gallery by distinguishing them from the images of cells that had taken up SYTOX Green in Channel 2, as shown in Figure 3A.
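
    The gate counts quoted above can be turned into percent-of-parent values with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable and function names are hypothetical, and the percentages it computes are not copied from Figure 2C.

```python
# Gate counts as quoted in the text, in hierarchical order.
FCS_EXPRESS = [
    ("all events", 142171),
    ("singlets", 10314),
    ("live", 9733),
    ("CD45-", 9714),
    ("CD44+CD73+", 9404),
]
IDEAS = [
    ("all events", 142171),
    ("focused", 54356),
    ("singlets", 7900),
    ("live", 7700),
    ("CD44+CD73+", 7579),
]


def percent_of_parent(counts):
    """Return (gate, count, % of the preceding gate) for each child gate."""
    rows = []
    for (_, parent), (name, child) in zip(counts, counts[1:]):
        rows.append((name, child, round(100.0 * child / parent, 2)))
    return rows


if __name__ == "__main__":
    for label, counts in (("FCS Express", FCS_EXPRESS), ("IDEAS", IDEAS)):
        print(label)
        for name, count, pct in percent_of_parent(counts):
            print(f"  {name:12s} {count:7d}  {pct:6.2f}% of parent")
```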

    Figure 2. A comparison between the software FCS Express and IDEAS with regard to the gating strategies used.

    (A) FCS Express gating strategy shows a hierarchical approach to identifying target cells, which first starts with gating singlets, followed by identification of live cells (negative for SYTOX Green), followed by exclusion gating of CD45 negative cells and then identification of double-positive populations of CD73 and CD44 expressing cells. Gate hierarchy and statistics are indicated in the figure. (B) IDEAS gating strategy followed a similar sequential gating strategy that first identified focused cells, followed by the identification of singlets. Here the same gating hierarchy was followed to identify the double positive expressors (CD44+ CD73+ mesenchymal stem cells). A representative image of a single live stem cell of human exfoliated deciduous teeth (SHED; cell no. 3025) from the image gallery on the Amnis ImageStream shows the double expressor of CD44 and CD73. (C) The statistics of the different populations analyzed by these supervised platforms have been tabulated.

    Figure 3. Stepwise identification of populations of stem cells of human exfoliated deciduous teeth (SHED) using the supervised and authors' automated workflow, MSCProfiler.

    (A) Sequential output of IDEAS workflow following the gating strategy as described. Representative cell images in each step are shown. (B) The sequential output of MSCProfiler workflow shows a comparable end result. However, in the first case, data input includes the .rif, while in MSCProfiler, data input is in the form of single cell images.

    In Figure 4, single cell images as seen in the IDEAS image gallery are shown. Figure 4A displays the image gallery of different-sized cells based on their BF information; the width of each cell, calculated from the BF channel image, is shown in the top right corner of each image. The authors observed that the cells could be categorized into three groups based on their cell width: 16–20 μm (small), 20–26 μm (medium) and ≥27 μm (large). In Figure 4B, the CD44 and CD73 double positive expressors are shown. A lack of signal can be seen in the SYTOX and CD45 channels. In addition, to understand the dual expression pattern of the two surface markers CD44 and CD73, the corresponding channel images, represented by blue and red, respectively, were overlaid using the masking feature of IDEAS software and showed co-expression. A similar classification was observed in the image gallery of another passage of the SHEDs (P12) and in MSCs from a different source, the WJMSCs (Figure 4C).

    Figure 4. Single cell images are shown from the IDEAS image gallery.

    Each number indicated in the upper left corner is the cell/event number, and the number in the upper right corner (in blue) indicates the width of the cells. All acquisition was done using the 40× objective. (A) Brightfield images obtained from the IDEAS image gallery show the heterogeneity in size, categorized into small, medium and large observed in stem cells of human exfoliated deciduous teeth (SHEDs) at passage 9. (B) The three different categories of cells in all five channels along with the colocalization mask of CD44 and CD73 are shown. (C) Imaging flow cytometry data of SHEDs at passage 12 and WJMSC at passage 6 were acquired, and representative images of each cell type from all five channels have been shown.

    WJMSC: Wharton's jelly-derived mesenchymal stem cell.

    Analysis using a novel workflow: MSCProfiler

    The authors have outlined all the steps that went into building their novel pipeline, the MSCProfiler. Its prime benefit lies in automating the analysis workflow, from image quality control through to parameter estimation and classification of cells, as described in Figure 3B. MSCProfiler can be fed a large number of image sets, depending on the computational capacity available to the user. In this study, the datasets were split into batches to keep the data size of each analysis manageable. Post segmentation, three types of features were extracted from the imaging modalities: geometrical features, texture features and intensity values. The ‘Focused Cells’ (54,356) annotated in this pipeline generated images of SHEDs that revealed statistically significant heterogeneous cell populations within a single passage of MSCs (SHEDs P9). Similarly, roughly 10,000 SHEDs (P12) and 13,000 WJMSC (P6) cells were also analyzed using the MSCProfiler (Supplementary Figures 2 & 3). The distribution of area (the number of pixels in a shape) in the cells (SHEDs P9), as seen in Figure 5A & B, demonstrated three categories of populations: small (2200–2999 pixels), medium (3001–3999 pixels) and large (4008–4493 pixels), with most of the cells belonging to the small category. Categorization of WJMSCs (P6) and SHEDs (P12) based on cell size is shown in Figure 5C & D. This also indicates the robustness of the automated workflow across different cell types and passages, without human bias. The implication of this for the morphometric parameters was revealed further in the extraction of geometrical features. The shape quantification of the SHEDs enhanced the authors' understanding of the characterization parameters.
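    The size binning described above can be illustrated with a minimal sketch. It assumes that per-cell areas have already been exported as pixel counts; the cut-offs of 3000 and 4000 pixels are assumptions chosen to approximate the reported ranges, and the function names and example values are hypothetical rather than part of the MSCProfiler pipeline.

```python
from collections import Counter

# Assumed cut-offs (pixels), chosen to approximate the ranges reported in the text:
# small 2200-2999, medium 3001-3999, large 4008-4493.
SMALL_MAX = 3000
MEDIUM_MAX = 4000


def classify_by_area(area_px):
    """Return 'small', 'medium' or 'large' for one cell area in pixels."""
    if area_px < SMALL_MAX:
        return "small"
    if area_px < MEDIUM_MAX:
        return "medium"
    return "large"


def count_classes(areas):
    """Count how many cells fall into each size class."""
    return Counter(classify_by_area(a) for a in areas)


# Made-up areas for illustration only; these are not data from the study.
example_areas = [2250, 2800, 2999, 3400, 3999, 4100, 4493]
print(count_classes(example_areas))
```

    In practice the areas would come from the per-object measurement table exported by the pipeline, and the cut-offs would be chosen from the observed distribution rather than fixed in advance.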
    With the goal of using the minimum necessary measurement features to characterize a single MSC adequately so that it can be unambiguously classified, the authors chose the aspect ratio, eccentricity and compactness of a single cell as discriminators. The performance of the pipeline in determining the shape measurements depended heavily on how the image objects were preprocessed. The distribution of aspect ratio (ratio of image object height to width) did not differ significantly among the three categories of cells (small, medium and large), as seen in Figure 6A. Analysis of the annotated pixel values using the Kruskal–Wallis multiple comparisons test suggested that the small cells tended to be more circular (aspect ratio close to 1.0) than the medium and large cells from the same passage, although the difference was not significant (p > 0.99). However, the distribution was negatively skewed in the small cells because the whisker and half-box were longer on the lower side of the median than on the upper side.
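    For readers who want to see what the three-group comparison involves, the following is a hedged, standard-library sketch of the Kruskal–Wallis omnibus test. It is not the GraphPad Prism procedure used in the study and omits the post hoc multiple comparisons step. The function names are hypothetical, and the p-value relies on the chi-square approximation, which for exactly three groups (2 degrees of freedom) reduces to exp(-H/2).

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected H statistic and approximate p-value for exactly three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected three non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of any difference
        return 0.0, 1.0
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function with 2 degrees of freedom
    return h, p


# Made-up aspect ratios for illustration only; these are not data from the study.
small = [0.95, 0.97, 0.99, 0.96]
medium = [0.90, 0.98, 0.94, 0.93]
large = [0.91, 0.92, 0.89, 1.00]
h_stat, p_value = kruskal_wallis_3([small, medium, large])
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

    A large p-value from such a test, as reported for aspect ratio, means the ranks of the three size classes overlap too much to conclude that their distributions differ.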

    Figure 5. Cell area graphs to demonstrate cell size heterogeneity in mesenchymal stem cells using the MSCProfiler.

    (A) Graphical representation of the spread/distribution of the area parameter of stem cells of human exfoliated deciduous teeth (SHEDs) at passage 9, run through the MSCProfiler pipeline. (B) Classification into distinct groups of small, medium and large cells based on the cell area of SHEDs at passage 9; 3486 cells obtained as an output of the MSCProfiler pipeline were graphically plotted. (C & D) Images of SHEDs at passage 12 and WJMSC at passage 6 were run on MSCProfiler, then classified into small, medium and large cells. Statistical significance was determined using the Kruskal–Wallis multiple comparisons test using GraphPad Prism (v8.0.1) software.

    Figure 6. Geometrical and texture parameters demonstrated heterogeneity in mesenchymal stem cells (SHEDs P9) using the MSCProfiler.

    (A–C) Geometrical features such as aspect ratio, eccentricity and compactness were compared among the three different classes of cells. (D–F) Texture features such as inverse difference moment, sum entropy and entropy values were compared among the three different classes of cells in stem cells of human exfoliated deciduous teeth at passage 9. Statistical significance was determined using the Kruskal–Wallis multiple comparisons test using GraphPad Prism (v8.0.1) software.

    The eccentricity feature (for the ellipse fitted to an image object, the ratio of the distance between its foci to its major axis length) takes values between 0 and 1, with 0 corresponding to a circle. The small cells had lower eccentricity values than the medium and large cells in the same passage, as shown in Figure 6B. The center of distribution of the box and whisker plots was the lowest of the three distributions in the small cells (median: 0.2–0.4), while in the others the median was between 0.4 and 0.6. The distribution in all three classes was approximately symmetric, as the half-boxes were almost the same length on the upper and lower sides. According to the statistics, small cells were more circular than the other two categories.
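    As a hedged illustration of how eccentricity relates to the minor and major axis lengths, the sketch below uses the standard ellipse formula e = sqrt(1 - (minor/major)^2), which equals the distance between the foci divided by the major axis length. The function name is hypothetical, and the axis lengths are assumed to come from an ellipse already fitted to the segmented cell.

```python
import math


def ellipse_eccentricity(major_axis, minor_axis):
    """Eccentricity of an ellipse from its axis lengths (0 for a circle, approaching 1 for a line)."""
    if major_axis <= 0 or minor_axis < 0 or minor_axis > major_axis:
        raise ValueError("expected 0 <= minor_axis <= major_axis and major_axis > 0")
    return math.sqrt(1.0 - (minor_axis / major_axis) ** 2)


# Made-up axis lengths (pixels) for illustration only.
print(ellipse_eccentricity(60.0, 58.0))  # nearly circular cell, low eccentricity
print(ellipse_eccentricity(80.0, 50.0))  # elongated cell, higher eccentricity
```

    Under this definition a lower value indicates a rounder object, which matches the reading that the small cells were the most circular.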

    Cell shape can also be described by measures such as mean compactness (ratio of the area of a circle with the same perimeter as the image object to the area of the object). A circle takes the minimum value of 1.0; the larger the compactness, the more irregular and complex the cell boundary. The small cells showed lower compactness values than the medium and large cells, as shown in Figure 6C. The center of distribution of the box and whisker plots was the lowest of the three distributions in the small cells (median: 1.0–1.1), while the medium cells had values distributed between 1.07 and 1.31 and the large cells between 1.08 and 1.27. However, the distribution was positively skewed in the small cells because the whisker and half-box were longer on the upper side of the median than on the lower side.
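    A minimal sketch of a perimeter-based compactness follows, under the assumption that compactness is computed as perimeter² / (4π × area), so that a circle scores 1.0 and irregular outlines score higher, as the text describes. Software versions may define compactness differently (for example, from pixel distances to the centroid), so this is an illustration rather than the pipeline's exact formula; the function name is hypothetical.

```python
import math


def compactness(area, perimeter):
    """Perimeter-based compactness: 1.0 for a circle, larger for irregular outlines."""
    if area <= 0 or perimeter <= 0:
        raise ValueError("area and perimeter must be positive")
    return perimeter ** 2 / (4.0 * math.pi * area)


# A circle of radius 5 and a 2 x 2 square, for illustration only.
circle = compactness(math.pi * 5 ** 2, 2 * math.pi * 5)
square = compactness(4.0, 8.0)
print(circle, square)
```

    The square scores 4/π (about 1.27), which falls within the range reported for the medium and large cells and gives a feel for how far from circular those outlines are.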

    Texture features are also important descriptors in computational feature extraction. They capture the spatial relationships between individual pixel values within a shape. An important measure of variation among the Haralick texture features is the inverse difference moment. It is a measure of homogeneity in cells, which is maximized when neighboring image pixels share the same values. In texture analysis, two pixels are considered at a time, the reference pixel and the neighbor pixel, and the spatial relationship between them is used to quantify the gray level differences. There were stark differences among the three classes of cells in terms of inverse difference moment, as seen in Figure 6D (also refer to Supplementary Figure 3). The small cells were more homogeneous in texture than the medium and large cells. The inverse difference moment provided high discrimination accuracy for images acquired in motion. This discriminator for local homogeneity was lower in the medium cells and much lower in the large cells. The center of distribution of the box and whisker plots was the highest of the three distributions in the small cells (median: 0.2–0.25), while the medium cells had values distributed close to 0.2 and the large cells close to 0.15.
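    The reference and neighbor pixel idea can be made concrete with a small sketch. It builds a normalized gray-level co-occurrence matrix (GLCM) for one pixel offset and computes the inverse difference moment as the sum of p(i, j) / (1 + (i - j)²). The function names, the single horizontal offset and the tiny example images are assumptions for illustration; a real pipeline would typically quantize gray levels and combine several offsets.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of (reference, neighbor) gray levels for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx  # neighbor pixel position
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]


def inverse_difference_moment(p):
    """Haralick inverse difference moment: high when neighboring pixels have similar values."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Made-up 4-level images for illustration only.
uniform = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
striped = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]
print(inverse_difference_moment(glcm(uniform, levels=4)))
print(inverse_difference_moment(glcm(striped, levels=4)))
```

    The uniform image gives the maximum value of 1.0, while the alternating pattern gives a much lower value, mirroring the drop in homogeneity described for the medium and large cells.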

    Entropy of population analysis can reveal highly structured cellular patterns. IFC dataset distributions are being utilized for identification of malignancy in other cell types by analyzing the differences in multidimensional distributions of related entropies. The entropy and combined entropy of the small cells (9.0) were lowest compared with the medium (9.5) and large cell types (10.71). The center of distribution of the box and whisker plots in all three sets showed homogeneous entropy patterns. More entropy-based patterns could be identified from the medium and large cells, as described in Figure 6E & F. Application of these textural entropy investigations can evaluate the homogeneity and randomness of gray values within the BF cell images, essentially making it a label-free digital image analysis of single cells.
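    The two entropy descriptors can be sketched directly from a normalized co-occurrence matrix p. The sketch assumes the usual Haralick definitions with base-2 logarithms: entropy is -Σ p(i, j) log2 p(i, j), and sum entropy is the same quantity computed on the distribution of i + j. The function names and the example matrices are hypothetical and are not taken from the study's data.

```python
import math


def glcm_entropy(p):
    """Entropy of a normalized co-occurrence matrix, in bits."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


def sum_entropy(p):
    """Entropy of the distribution of (reference level + neighbor level), in bits."""
    n = len(p)
    p_sum = [0.0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            p_sum[i + j] += p[i][j]
    return -sum(v * math.log2(v) for v in p_sum if v > 0)


# Made-up 2-level co-occurrence matrices for illustration only.
ordered = [[1.0, 0.0], [0.0, 0.0]]        # every pixel pair is identical
random_like = [[0.25, 0.25], [0.25, 0.25]]  # all pairs equally likely
print(glcm_entropy(ordered), sum_entropy(ordered))
print(glcm_entropy(random_like), sum_entropy(random_like))
```

    Higher values indicate more random gray-level patterns, which is consistent with the larger cells showing higher entropy than the small cells.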

    In terms of biomarker expression, CD73-PerCP Cy5.5 and CD44-V450 expression levels were compared in SHEDs (P9). The intensity of CD73 expression was higher in the large cells than in the medium and small cells, as seen in Figure 7A. The authors observed that CD44 expression levels were higher than those of CD73 in all three cell types, as represented in Figure 7B. To compare the distribution pattern of the two markers, the authors normalized the cell area over which CD44 and CD73 were expressed by dividing that area by the total area of the cell to obtain the percentage expression. CD44 was expressed over a significantly larger area than CD73 (p < 0.0001, Mann–Whitney test), as shown in Figure 7C. FCS Express software identified populations of 0.27% and 0.69% of cells that were non-expressers of CD44 and CD73 in SHEDs P9 and P12, respectively (Supplementary Figure 1). Although MSCProfiler can identify these rare cell populations, a greater number of datasets and additional criteria are required to modify the existing pipeline (part of an ongoing study).
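    The area normalization described above can be sketched as follows, assuming that each cell has a binary cell mask and a binary marker-positive mask of the same shape (for example, obtained by thresholding the fluorescence channel). The function name, the masks and the thresholding step are assumptions for illustration and not the pipeline's actual modules.

```python
def percent_expression_area(marker_mask, cell_mask):
    """Percentage of cell pixels that are also marker-positive."""
    cell_px = 0
    both_px = 0
    for marker_row, cell_row in zip(marker_mask, cell_mask):
        for marker, cell in zip(marker_row, cell_row):
            if cell:
                cell_px += 1
                if marker:
                    both_px += 1
    if cell_px == 0:
        raise ValueError("cell mask is empty")
    return 100.0 * both_px / cell_px


# Made-up 2 x 2 masks for illustration only (1 = pixel inside the mask).
cell = [[1, 1], [1, 1]]
cd44 = [[1, 1], [1, 0]]
cd73 = [[1, 0], [0, 0]]
print(percent_expression_area(cd44, cell))
print(percent_expression_area(cd73, cell))
```

    Computing this percentage for every cell yields the two distributions (%CD44 Exp and %CD73 Exp) that can then be compared with a two-sample rank test.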

    Figure 7. Fluorescence parameters demonstrate heterogeneity in SHEDs (P9) using the MSCProfiler.

    (A & B) Intensity of CD73 and CD44 expression compared among the three classes of cells. (C) Normalized values of cell area expressing markers CD44 and CD73 were compared. CD44 was expressed over a larger area when compared with CD73, significantly (p < 0.0001). Statistical significance was determined using the Mann–Whitney test using GraphPad Prism (v8.0.1) software (%CD44 Exp and %CD73 Exp denote the normalized percentage of CD44 and CD73 expression area, respectively).

    MSC heterogeneity has been documented by many researchers. It can be derived from different sources of tissues or even arise randomly from a clonally dividing cell population. What remains debatable is whether the appearance of such cellular heterogeneity within the MSC population follows a stochastic or a deterministic process. Identifying heterogeneity among MSCs has always been a challenge: the classical approach using microscopy and standard cytometry assays gives information about their surfaceome, but not exhaustively, as these stem cells do not express any exclusive signature surface markers. There have been attempts using deep learning and ML methods to explore functional heterogeneity and phenotypic and morphometric classification in MSCs. However, most of these studies have been conducted at the population level using microscopic imaging of MSC in vitro cultures, which restricts the sample size for analyses [23]. Few reports of in silico studies using single cell images of MSCs are under way, and all of these use comprehensive image processing algorithms that require extensive knowledge of image processing computational tools [19]. MSCProfiler captures the information of MSC heterogeneity based on the texture features of the single cells from IFC and helps explain the detailed cellular features of the population. The existence of heterogeneity shows that studies must not be limited to the population level but must also look at single cell expression. This will help researchers come up with more stringent protocols for identifying MSCs best suited for clinical purposes. This is the rationale of the present study in developing an unsupervised workflow that can assist a stem cell biologist in the appropriate identification and classification of MSCs best fit for specific translational studies.

    Feature extraction from images of single cells was the hallmark of this study, which led the authors to develop the MSCProfiler workflow. The aim was postacquisition pattern recognition of MSCs (SHEDs) based on multiple image features of single cells such as aspect ratio, cell texture, shape, surface antigen distribution and intensities of expression of such antigens. The authors characterized the gross surfaceome of the SHEDs in this study by developing a nonsupervised image processing pipeline that can robustly segment and analyze single cell morphologies from BF standalone images. There are multiple robust, high-end screening image analysis programs available that can quantify visual cellular morphotypes by microscopy [15]. However, the critical criteria in any such image analysis tool should be lack of bias, ability to identify image-based aberrations (image blur, debris crowding, autofluorescence, saturation of pixels) and high speed of resolution of individual image objects. This pipeline used images of single cells generated from an imaging flow cytometer (Amnis ImageStream Mk II platform), which takes care of most of the image-based aberrations by producing a single image object for acquired data files. This feature rules out the first concern of shadowing of cellular features in case of a smear or a tissue slice [24]. In this paper, the authors have described an automated protocol implemented in validated open-source software called CellProfiler [24–26], which offers a suite of image-based measurement features that can extract quantitative information from images. In addition, the parameters extracted from this workflow can be further used to build an unbiased categorization of single cells based on their phenotypes using ML approaches. Though such tasks can be programmed using R/Python, there are more user-friendly tools such as CellProfiler Analyst. In fact, with the workflow described in this paper it is possible to extract cellular parameters in file formats that are supported by CellProfiler Analyst.

    Although conventional flow cytometry on its own can be a source of high-throughput screening for large datasets, IFC has been able to plug in the image data output to further enhance its throughput. The authors have strategically validated their pipeline with both conventional and image flow cytometry data and demonstrated the shortcomings in either case. Conventional flow cytometry lacks spatial information of every dot on the analysis plot, while IDEAS on the Amnis platform is proprietary and needs to be customized to meet individual needs. This has been one of the first attempts to screen for cellular heterogeneity of MSCs using morphometric features of single cells.

    Conclusion

    MSCProfiler is an approach to analyzing IFC data that adds valuable information from the images concurrently with conventional flow cytometric analysis. This adds value in answering the biological questions that can be resolved using MSCs. The method is completely automated, so there is no human bias, which was one of the crucial challenges of the other currently available techniques. It does not require extensive programming knowledge to perform such an intense analysis. Analyzing other sources of MSCs and different passages within the same type highlights the robustness of the workflow developed. However, the need for high computational processing capacity to analyze thousands of image datasets should be considered a caveat of using the MSCProfiler. As this pipeline depends on the computational power of the data processing system, it brings with it constraints on time and on the storage of large files (the .daf file generated from the IDEAS software is ~0.1 GB). Our study used only five parameters to be analyzed by the MSCProfiler. In cases where more parameters need to be explored (6–12), other data analytical tools such as Cytominer or customized algorithms have to be employed to classify these image sets. In this study, we had to segregate the images into groups of 10,000–20,000 images for a run time of 2–4 h on the MSCProfiler.
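    The batching mentioned above can be sketched in a few lines. This is an illustrative helper, not part of the published pipeline: the function name, the file naming scheme and the default batch size of 10,000 (the lower end of the range quoted in the text) are all assumptions.

```python
def make_batches(items, batch_size=10000):
    """Split a list of image names into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical file names standing in for the 54,356 focused-cell images.
image_names = [f"cell_{i:06d}.tif" for i in range(54356)]
batches = make_batches(image_names)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {len(batch)} images")
```

    Each batch can then be submitted as a separate run, which keeps memory use and run time per run within what the available workstation can handle.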

    Future perspective

    This study has been one of the first attempts to screen cellular heterogeneity of MSCs derived from two different tissue sources using morphometric features of single cells. An automated workflow provided by the MSCProfiler has proved to be an unbiased approach to extracting image texture information from a huge range of single cell images that is not dependent on the instrument generating the data. Our workflow has been able to extract similar morphometric features of different types of MSCs in an unbiased manner. On the other hand, MSCProfiler can also be used to identify different cell populations with the inclusion of additional criteria such as fluorescence signals of the surface markers and correlate the extracted texture features for their identification. This is being evaluated with ongoing experiments on identification of hematopoietic stem cells or ‘blasts’ in human bone marrow samples. Most importantly, MSCProfiler does not require knowledge of any advanced computation language to apply this pipeline to the study of single cells. MSCs have a huge potential in drug discovery and cell-based therapeutic applications. While manufacturing clinical grade MSCs in an expanded scale, the availability of a quality assurance/quality control software such as the MSCProfiler can set comparable and rigorous standards for the production systems for cell culture. The stem cell heterogeneity, arising due to isolation, culture and expansion conditions, warrants a robust tool to set quality standards for these stem cells during production and make the supply more consistent and uniform in an efficient and cost-effective manner.

    Executive summary
    • To address the challenges in defining single cell phenotypes, an automated pipeline was developed using an enhanced image feature extraction approach.

    Experimental

    • Mesenchymal stem cells (MSCs) from stem cells derived from human exfoliated deciduous teeth (SHEDs) and Wharton jelly-derived MSCs (WJMSCs) obtained from the umbilical cord tissue, across different passages, were immunostained in single cell suspension and quantified using imaging flow cytometry on the Amnis ImageStream platform.

    • Using these single cell images, an automated pipeline named MSCProfiler was developed. The workflow was developed using CellProfiler software to analyze the single cell images of mesenchymal stem cells.

    Results & discussion

    • The prime focus in automation of the analysis workflow (MSCProfiler) started from image quality control right up to parameter estimation and classification of cells.

    • Postsegmentation, these images were classified to calculate geometrical and texture features such as shape, size, eccentricity and entropy along with intensity values of the surface markers from over 50,000 single cell images obtained from imaging flow cytometry.

    • The texture features of the mesenchymal stem cells such as inverse difference moment, sum entropy and entropy values proved to be very important feature descriptors that showed differences between different passages of SHEDs and between SHEDs and WJMSCs.

    • In terms of biomarker expression, CD73-PerCP Cy5.5 and CD44-V450 expression levels also showed additional differences in patterns. Results showed that CD44 was expressed over a larger area when compared with CD73 among the SHEDs.

    Conclusion

    • Development of the MSCProfiler was an approach to analyze the imaging flow cytometry data that added valuable information from the images concurrently with the conventional flow cytometry analysis. The hallmark of this screening pipeline was the identification and removal of nontarget images and extraction of features from brightfield and fluorescent images of single cells. This was an important value addition to answering the respective biological questions of the mesenchymal stem cells.

    Supplementary data

    To view the supplementary data that accompany this paper please visit the journal website at: www.future-science.com/doi/suppl/10.2144/btn-2023-0048

    Author contributions

    Each coauthor listed participated sufficiently in the work to take responsibility for the content, and all those who qualify are listed. U Chakraborty and L Balasubramanian conceptualized and designed the study. A Gupta and SK Shaik performed the experimental work, analyzed the associated data, contributed to writing portions of the draft of the manuscript and revised it critically for important intellectual content. U Chakraborty, A Gupta and L Balasubramanian contributed to the analysis and interpretation of data. L Balasubramanian contributed to the data curation and establishment of the pipeline. U Chakraborty and L Balasubramanian contributed to the final approval of the version to be published. U Chakraborty supervised the entire study, including project administration, investigation and original draft writing and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors approved the final manuscript.

    Acknowledgments

    The authors thank Cytek Biosciences for its support in providing the Amnis Imagestream Mark II imaging flow cytometer in their facility. They are grateful to Debjani Kundu, field application scientist, Cytek, for helping with data analysis on the IDEAS software. They thank Risani Mukhopadhyay for providing raw image files of Wharton's jelly-derived mesenchymal stem cells for analysis. The authors are grateful to the Manipal Institute of Regenerative Medicine for its support in conducting the experiments. A. Gupta is grateful for the support of TMA Pai Scholarship from Manipal Academy of Higher Education.

    Financial disclosure

    Financial and material support was received for this research by Intramural Grant Manipal Academy of Higher Education, Manipal, India. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

    Competing interests disclosure

    The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending or royalties.

    Writing disclosure

    No writing assistance was utilized in the production of this manuscript.

    Ethical conduct of research

    Institutional Committee on Stem Cell Research approval for conducting this research was obtained. For sample procurement (human third molar teeth and umbilical cord tissue), ethical clearance certificates were obtained from the hospitals concerned.

    Open access

    This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

    Papers of special note have been highlighted as: • of interest

    References

    • 1. Andrzejewska A, Lukomska B, Janowski M. Concise review: mesenchymal stem cells: from roots to boost. Stem Cells 37(7), 855–864 (2019).
    • 2. da Silva Meirelles L, Chagastelles PC, Nardi NB. Mesenchymal stem cells reside in virtually all post-natal organs and tissues. J. Cell Sci. 119(11), 2204–2213 (2006).
    • 3. Glenn JD. Mesenchymal stem cells: emerging mechanisms of immunomodulation and therapy. World J. Stem Cells 6(5), 526–539 (2014).
    • 4. Saeedi P, Halabian R, Fooladi AAI. A revealing review of mesenchymal stem cells therapy, clinical perspectives and modification strategies. Stem Cell Investig. 6, 34 (2019).
    • 5. Wilson A, Webster A, Genever P. Nomenclature and heterogeneity: consequences for the use of mesenchymal stem cells in regenerative medicine. Regen. Med. 14(6), 595–611 (2019).
    • 6. Pittenger MF, Discher DE, Péault BM et al. Mesenchymal stem cell perspective: cell biology to clinical progress. NPJ Regen. Med. 4(22), (2019).
    • 7. Lansdowne LE. Single cell analysis – advantages, challenges, and applications (2019). www.technologynetworks.com/drug-discovery/blog/single-cell-analysis-advantages-challenges-and-applications-322768 • This paper will enable the reader to understand single cell technology.
    • 8. Hennig H, Rees P, Blasi T et al. An open-source solution for advanced imaging flow cytometry data analysis using machine learning. Methods 112, 201–210 (2017).
    • 9. Maguire O, O'Loughlin K, Minderman H. Simultaneous assessment of NF-κB/p65 phosphorylation and nuclear localization using imaging flow cytometry. J. Immunol. Methods 423, 3–11 (2015).
    • 10. Phadwal K, Alegre-Abarrategui J, Watson AS et al. A novel method for autophagy detection in primary cells: impaired levels of macroautophagy in immunosenescent T cells. Autophagy 8(4), 677–689 (2012).
    • 11. Pugsley HR. Quantifying autophagy: measuring LC3 puncta and autolysosome formation in cells using multispectral imaging flow cytometry. Methods 112, 147–156 (2017).
    • 12. Durdik M, Kosik P, Gursky J et al. Imaging flow cytometry as a sensitive tool to detect low-dose-induced DNA damage by analyzing 53BP1 and γH2AX foci in human lymphocytes. Cytometry A 87(12), 1070–1078 (2015).
    • 13. Lee Y, Wang Q, Shuryak I, Brenner DJ, Turner HC. Development of a high-throughput γ-H2AX assay based on imaging flow cytometry. Radiat. Oncol. 14(1), 1–10 (2019).
    • 14. Filby A, Perucha E, Summers H et al. An imaging flow cytometric method for measuring cell division history and molecular symmetry during mitosis. Cytometry A 79(7), 496–506 (2011). • The potential of using imaging flow cytometry is appropriately described in this paper.
    • 15. Negm AS, Hassan OA, Kandil AH. A decision support system for acute leukaemia classification based on digital microscopic images. Alex. Eng. J. 57(4), 2319–2332 (2018).
    • 16. Helgadottir S, Midtvedt B, Pineda J et al. Extracting quantitative biological information from bright-field cell images using deep learning. Biophys. Rev. 2(3), 31401 (2021).
    • 17. Imboden S, Liu X, Lee B, Payn MC, Hsieh C-J, Lin NYC. Investigating heterogeneities of live mesenchymal stromal cells using AI-based label-free imaging. Sci. Rep. 11(1), 6728 (2021).
    • 18. Otesteanu CF, Ugrinic M, Holzner G et al. A weakly supervised deep learning approach for label-free imaging flow-cytometry-based blood diagnostics. Cell Rep. Methods 1(6), 100094 (2021).
    • 19. Kim G, Jeon JH, Park K, Kin SW, Kim DH, Lee S. High throughput screening of mesenchymal stem cell lines using deep learning. Sci. Rep. 12(1), 17507 (2022).
    • 20. Sanz G, Martinez-Aranda LM, Tesch PA, Fernandez-Gonzalo R, Lundberg TR. Muscle2View, a CellProfiler pipeline for detection of the capillary-to-muscle fiber interface and high-content quantification of fiber type-specific histology. J. Appl. Physiol. 127(6), 1698–1709 (2019).
    • 21. Dominici M, Le Blanc K, Mueller I et al. Minimal criteria for defining multipotent mesenchymal stromal cells. The International Society for Cellular Therapy position statement. Cytotherapy 8(4), 315–317 (2006).
    • 22. Chan DK, Miskimins WK. Metformin and phenethyl isothiocyanate combined treatment in vitro is cytotoxic to ovarian cancer cultures. J. Ovarian Res. 5(1), 19 (2012).
    • 23. Gundry RL, Riordon DR, Tarasova Y et al. A cell surfaceome map for immunophenotyping and sorting pluripotent stem cells. Mol. Cell. Proteomics 11(8), 303–316 (2012).
    • 24. Bray MA, Carpenter AE. Quality control for high-throughput imaging experiments using machine learning in Cellprofiler. Methods Mol. Biol. 1683, 89–112 (2018). • This paper will enable the reader to understand the baseline software that was used to develop the MSCProfiler pipeline/workflow.
    • 25. Lamprecht MR, Sabatini DM, Carpenter AE. CellProfiler™: free, versatile software for automated biological image analysis. BioTechniques 42(1), 71–75 (2007). • This paper will enable the reader to understand the baseline software that was used to develop the MSCProfiler pipeline/workflow.
    • 26. Bray MA, Vokes MS, Carpenter AE. Using CellProfiler for automatic identification and measurement of biological objects in images. Curr. Protoc. Mol. Biol. 109, 14.17.1–14.17.13 (2015). • This paper will enable the reader to understand the baseline software that was used to develop the MSCProfiler pipeline/workflow.