Study design
Three different types of thyroid samples were utilized for this study. The first group (Cyto-Hürthle) included FNAB for developing Hürthle cell and Hürthle cell Neoplasm indices, where detailed cytologic features were curated by expert thyroid cytopathologists using microscopic examination from FNAB. Hürthle cell-specific cytology class labels were assigned based on the presence or absence of Hürthle cells and potentially neoplastic features. The second group was comprised of FNAB for developing the Afirma Benign/Suspicious (B/S) classifier, where histopathology diagnoses were available. Two subsets were separately analyzed, as described in [34]: (1) a subset of samples used for training the Afirma GSC and (2) 191 samples used for the validation of Afirma GSC. The third group used fresh, frozen thyroid surgical tissues with histopathology diagnoses, including 12 Hürthle cell and 43 non-Hürthle cell cases, for examining copy number variation (CNV) and LOH using the Affymetrix CytoScan platform.
Identifying cytopathology samples for review
The Veracyte database was queried to identify 285 FNAB where Hürthle cell features were noted in the initial cytological reading of the case. An additional 272 FNAB where no Hürthle cell features were noted were also selected. These cases were subject to blinded re-review by a panel of 3 cytopathologists. Features examined were: cellularity, proportion of Hürthle cells, Hürthle cell morphology, Hürthle cell maturation spectrum, and presence or absence of colloid. Based on these features, four classes of samples were generated (See Additional file 1: Figure S1 for representative images): 1. Hürthle cell positive, Neoplasm positive (H + N+); 2. Hürthle cell positive, Neoplasm negative (H + N-); 3. Hürthle cell negative, Neoplasm positive (H-N+); 4. Hürthle cell negative, Neoplasm negative (H-N-). This analysis yielded 318 samples, including 119 Hürthle cell-negative and 199 Hürthle cell-positive samples. Of the 199 Hürthle cell-positive samples, 27 were identified as Bethesda II and were therefore labelled Neoplasm-negative, while 71 were Bethesda IV and were therefore labelled Neoplasm-positive. Samples were de-identified prior to cytopathology re-review prior to RNA Access library preparation.
Affymetrix CytoScan
Thyroid tissue DNA was extracted with the AllPrep Micro kit (Qiagen, Hilden, Germany) and quantitated with the Pico Green dsDNA kit (Thermo Fisher) on a Tecan Infinite Pro 200 plate reader (Tecan, Männedorf, Switzerland). DNA (125 ng) was used as input into the CytoScan HD array kit (Affymetrix, Santa Clara, CA) and the samples were processed according to the manufacturer’s protocol. Cel files were input into the Affymetrix Chromosome Analysis Software (ChAS) and Copy Number and SNP outputs were analyzed for LOH and other CNVs.
RNA library preparation and next-generation sequencing
Samples were processed as described [34]. Briefly, 15 ng of total RNA was input into a Microlab STAR (Hamilton, Reno, NV) automated version of the TruSeq RNA Access Library Preparation Kit (Illumina, San Diego, CA). Libraries were sequenced on the NextSeq 500 (Illumina, San Diego, CA) using paired-end 2 × 76 cycle reads.
RNA sequencing pipeline, feature extraction, and quality control
RNA-seq data were processed as described [34]. Raw sequencing data was aligned to human reference genome assembly 37 using STAR aligner. Normalized expression levels were obtained using variance stabilizing transformation (VST) from the DESeq2 package [32]. The gene-wise dispersion parameter was estimated by the ‘local fit’ method. Genome-wide variants were identified using the GATK variant calling pipeline. Samples that did not satisfy the minimum in-house sequencing QC metrics were excluded from downstream analyses.
Feature engineering
Loss-of-heterozygosity (LOH) statistic
We developed a LOH statistic at the chromosome and genome level using genome-wide variants. The statistic quantifies the magnitude of LOH by calculating the proportion of variants that have a variant allele frequency (VAF; fraction of reads carrying the alternative allele) away from 0.5 (< 0.2 or > 0.8) after pre-filtering of variants with a VAF exactly at one. For the genome-level LOH statistic calculation, the mitochondrial genome and X and Y chromosomes were excluded. The details of the LOH statistic calculation are shown in the formulas below, where “n_loss_het” is the number of variants with a VAF far away from 0.5 (< 0.2 or > 0.8), and “n_all_het” is the total number of potentially heterozygous variants. LOH statistic was calculated both for each chromosome (referred as chromosome-level LOH) and for the entire genome (referred as the genome-level LOH).
$$ LOH=\frac{n\_ loss\_ het}{n\_ all\_ het} $$
$$ n\_ loss\_ het={\sum}_{i=1}^N{\displaystyle \begin{array}{c}1\ if\ 0< vaf<0.2\ or\ 0.8< vaf<1\\ {}0\end{array}} $$
$$ n\_ all\_ het={\sum}_{i=1}^N{\displaystyle \begin{array}{c}1\ if\ 0< vaf<1\\ {}0\end{array}} $$
Fifty-four tissue samples have LOH measured by both Affymetrix CytoScan and RNA-seq. Concordance between these two methods is shown in Additional file 1: Figure S2. These data correlated well on the samples with high level LOH (> 0.2 by Affymetrix CytoScan), all of which were Hürthle samples.
Mitochondrial features
Mitochondrial genes were captured during RNA Access library preparation and the same experimental procedures and bioinformatic sequencing pipelines were applied as described in previous sections. In total, there are 13 protein coding genes, and transcripts from all 13 were captured by the sequencing assay. Exploratory data analysis revealed all 13 genes showed differential expression levels between Hürthle negative and positive groups. Therefore, all mitochondrial genes were included in the gene feature set to undergo feature selection in downstream classifier development.
Hürthle cell index (HI) development
A total of 318 FNA samples (199 Hürthle cell + and 119 Hürthle cell -) were used to develop a Hürthle cell Index (HI), which is a binary classifier, determining if a sample is Hürthle cell + or Hürthle cell -. Ten-fold cross validation was performed to estimate the training performance, and the final model was built on all samples. Classifier development comprised three sequential steps: (1) differential expression analysis on 21,162 genes, using a statistical software package, edgeR [35], (2) selection of top-ranked genes with a FDR-adjusted p-value < 0.05 and expression fold-change (log2 scale) > 1.5, (3) optimizing parameter setting of multiple state-of-the-art machine learning algorithms with nested cross-validation. The algorithms we tested include support vector machine (SVM), elastic net, random forest, as well as SVM with asymmetrical cost to account for class imbalance. Hyperparameter tuning was performed in the inner layer; while the performance evaluation was performed in the outer layer holding out 10% of samples for each fold. SVM was selected due to its optimal cross-validated performance. The cost-parameter tuning for SVM was performed on a grid of (1e-04, 0.001, 0.01, 0.05, 0.1, 1, 5, 10). The best parameter selected for the final model was 0.001, and the associated number of support vectors was 106. Based on these parameters, the final SVM model was established using the ‘svmLinear’ method from the ‘caret’ R package [36] with all training samples and 1408 genes selected from the differential expression analysis.
Hürthle cell neoplasm index (NI) development
Among the 199 Hürthle cell samples used for HI development, 98 were further grouped into Neoplasm+ (n = 71) and Neoplasm- (n = 27) and used for Neoplasm Index (NI) training. NI is a binary classifier, determining if a Hürthle cell + sample is Neoplasm+ or Neoplasm-. Algorithm training for the NI was carried out similarly to the training for the HI but included novel LOH statistics as features. For the final model, 2041 genes were selected from the differential expression analysis. In addition, 15 chromosome-level LOH statistics (chromosomes 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15, 16, 18, and 19) and genome-level LOH were included as features for model training. The SVM was then built similarly to HI training on the same cost-parameter grid. The best parameter selected for the final model was 0.001, and the associated number of support vectors was 51.
Integrating Hürthle and Hürthle neoplasm indices into the Afirma GSC B/S classification workflow
The Afirma GSC classification workflow (Fig. 1a) begins with four upstream classifiers handling special thyroid FNA entities (Parathyroid Adenoma (PTA), Medullary Thyroid Carcinoma (MTC), BRAF V600E, and RET/PTC1 and RET/PTC3 fusions). It then uses the ensemble B/S model to classify a majority of samples as GSC benign or suspicious, as described previously [34]. The HI and NI are integrated with the ensemble model to increase overall classification performance (Fig. 1b).
There are three mechanisms for arriving at the GSC Benign versus Suspicious binary outcome for a given sample:
-
(1)
The result is GSC Benign if the ensemble B/S score is lower than the nominal threshold; otherwise
-
(2)
The result is initially GSC Suspicious but can be reassigned to a benign call by “Hürthle-adjustment” if the sample is predicted as Hürthle cell Index-positive (HI+), and Neoplasm Index-negative (NI-), and the ensemble B/S score is lower than the Hürthle-adjusted threshold; otherwise
-
(3)
The result is GSC Suspicious.
The three types of outcomes are described with mathematical formulae as follows. For a given sample i, we denote the scores from HI and NI, and the ensemble B/S classifier as Hi, Ni, and BSi, respectively. We denote the thresholds for HI and NI as tH and tN, respectively. For the ensemble B/S score, two thresholds exist; one is the nominal threshold, tBS_nominal, and the other is an increased threshold to handle Hürthle cell-positive, Neoplasm-negative cases. The latter is referred to as a “Hürthle-adjusted” threshold and denoted as tBS_Hürthle. The binary call outcome of the sample will be:
-
(1)
GSC Benign, if BSi < tBS_nominal; otherwise
-
(2)
Initially GSC Suspicious, but reassign to GSC Benign if Hi > tH, and Ni < tN,and tBS_nominal ≤ BSi < tBS_Hürthle; otherwise
-
(3)
GSC Suspicious
The integration of HI and NI into Afirma GSC workflow aims to “rescue” truly benign Hürthle cell-containing samples by actively reassigning these samples to GSC benign that would otherwise have been called GSC Suspicious.
Determining thresholds during algorithm development to maximize overall performance
Multiple factors were considered in determining thresholds for individual HI and NI, as well as the Hürthle-adjusted B/S threshold. First, cross-validation training performance of HI and NI were evaluated against the cytology label. Second, we examined the concordance of predicted Hürthle cell and Neoplasm status with the histopathology diagnosis using samples from the Afirma GSC B/S training set. Finally, the potential gain in the overall Afirma GSC performance due to Hürthle-adjustment was assessed. Due to the complexity of including two types of datasets (one focused on cytologic features, and the other on histopathology labels) and integration of three separate classifiers, complex dynamic parameter optimization was engaged with extensive multi-dimensional grid search to examine the tradeoff in the sensitivity and specificity for each classifier alone, and in combination. The final thresholds were chosen by optimizing overall Afirma GSC B/S specificity, while maintaining a high sensitivity (> 90%), by enabling high performance in both HI and NI.