Benchmarking the paediatric T-cell ALL subtype classifier, TALLSorts

2025 · 0 citations · Journal Article · hybrid Open Access

Authors

Ozcan Gulbey · Newcastle University

Terena James · Illumina (United Kingdom)

Ruth E. Cranston · Newcastle University

Dagmara Furmanczyk · Illumina (United Kingdom)

Claire Schwab · Newcastle University

Anna Lawson · Cancer Research UK Clinical Trials Unit

Pamela Kearns ·

Abstract

To the Editor,

T-cell acute lymphoblastic leukaemia (T-ALL) is a heterogeneous disease comprising 15 genomic subtypes [1]. Although the outcome of children with T-ALL has improved over the past few decades, survival rates still lag behind those of patients with B-cell precursor ALL, and the outlook after relapse is dismal [2]. Despite numerous studies, no robust, validated genetic biomarkers have emerged that are used prospectively to tailor therapy. The rarity of the disease (15% of ALL), coupled with the large number of subtypes, makes the identification and validation of T-ALL biomarkers challenging. Furthermore, the different types of abnormality, and the resulting variation in detection methods, add further complexity. Even though there is an urgent need for robust biomarkers in T-ALL, few studies have externally validated and evaluated potential biomarkers [3]. A recent publication described a gene expression classifier (TALLSorts) capable of identifying seven distinct subtypes of T-ALL [4]. In this study, we classified 126 patients using TALLSorts, examined the subtypes using orthogonal genetic data and evaluated the clinical relevance of the subtypes.

Patients diagnosed with T-ALL by standard morphological and immunophenotypic methods were treated on UKALL2003 (n = 87) or UKALL2011 (n = 39), trials approved by the Scottish Multi-Centre or North Thames Research Ethics Committees [5, 6]. Cytogenetics, fluorescence in situ hybridisation (FISH) and Multiplex Ligation-dependent Probe Amplification (MLPA) data were generated, curated and coded as previously described [3, 7]. Libraries were prepared using Illumina RNA preparation kits, and sequencing was performed on a NextSeq500 or NovaSeq 6000 system (see Supporting Information S1). Quality control of FASTQ files was performed with FastQC [8] (v.0.12.1) and MultiQC [9] (v.1.0.dev0). Adapters and poor quality reads were trimmed with BBmap [10] (v.39.06) and HOMER [11] (v.4.11.1).
Salmon [12] (v.1.9.0) was used to align the trimmed FASTQ reads to an hg38 reference genome index and estimate read counts. Quant files from Salmon were imported into R (v.4.3.2) with tximport [13] (v.1.30.0) to create a count matrix file. The TALLSorts algorithm was downloaded from its public GitHub repository and installed as instructed [4]. The count matrix file was run through the TALLSorts algorithm within a Miniconda environment in Ubuntu (v.20.04.6), producing predicted scores and subtypes for all 126 samples. In addition, gene fusions were identified using Arriba (v1.2.0) [14]. Briefly, trimmed FASTQ reads were aligned to hg38 and Gencode 28 using STAR (2.7.0e) [15] before running Arriba fusion detection. We used standard statistical tests to compare subgroups and perform survival analysis (see Supporting Information S1).

The patient characteristics and outcomes of the cases included in this study were broadly representative of the total T-ALL cohort, but the tested cohort was younger (Tables S1 and S2).

The TALLSorts algorithm produces probability scores (0–1.0) indicating the likelihood that the sample belongs to each subtype. We used the same threshold (0.5) as the original study to assign samples to a subtype. The probability scores for each subtype revealed that some distributions were discrete (e.g. TLX1, TLX3), while others were diffuse (e.g. HOXA_MLLT10) (Figure 1A). This finding contrasts with the original study, which reported discrete distributions for all subtypes [4]. Across the 126 samples, there were 208 positive calls (i.e. probability score >0.5), ranging from 0 to 6 per case (Figure 1B). Among the 67 cases with >1 positive call, 64 (96%) included a HOXA_MLLT10 call, and in the vast majority (53/64, 83%) it was a secondary call (i.e. it had a lower probability score than another positive call for that sample) (Figure S1).
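The calling rule described above (a positive call is any probability score >0.5, and the primary call is the highest-scoring one) can be sketched in a few lines of Python. This is an illustrative sketch only: the sample names, the scores and the `summarise_calls` helper are hypothetical and are not part of TALLSorts or of the study data; only the 0.5 threshold and the "highest score is primary" rule are taken from the text.

```python
# Hypothetical probability scores; only the 0.5 threshold and the rule that the
# highest-scoring positive call is the primary call come from the text.
THRESHOLD = 0.5


def summarise_calls(scores, threshold=THRESHOLD):
    """Return the positive calls for one sample, split into primary and secondary."""
    # Keep subtypes scoring strictly above the threshold, highest score first.
    positive = sorted(
        ((subtype, p) for subtype, p in scores.items() if p > threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    primary = positive[0][0] if positive else None
    secondary = [subtype for subtype, _ in positive[1:]]
    return {"n_calls": len(positive), "primary": primary, "secondary": secondary}


samples = {
    "sample_A": {"TAL/LMO": 0.97, "HOXA_MLLT10": 0.62, "TLX3": 0.03},
    "sample_B": {"TLX1": 0.99, "HOXA_MLLT10": 0.10},
    "sample_C": {"TLX3": 0.40, "Diverse": 0.35},
}

summaries = {name: summarise_calls(scores) for name, scores in samples.items()}
for name, summary in summaries.items():
    print(name, summary)
```

In this toy input, `sample_A` has two positive calls with HOXA_MLLT10 as the secondary call, which is the pattern reported for most multi-call cases, and `sample_C` has no positive call.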
These observations suggest that the signature for the HOXA_MLLT10 subtype is less robust than for other subtypes, consistent with the diffuse distribution observed. Using the highest probability score for each sample, 125/126 (99%) cases were classified into one subtype (Figure 1C). The number of patients per subtype was TAL/LMO (n = 54, 43%), TLX3 (n = 19, 15%), diverse (n = 19, 15%), HOXA_MLLT10 (n = 12, 10%), NKX2 (n = 8, 6%), TLX1 (n = 7, 6%), HOXA_KMT2A (n = 5, 4%) and BCL11B (n = 1, 1%). The TALLSorts algorithm failed to classify one case (126) into any subtype despite good quality RNA. Traditional testing did not reveal any relevant fusions in this case, but Arriba detected a KMT2A::MAML2 fusion (Table S1).

In the original TALLSorts study, the most frequent predicted subtype was TAL/LMO (46%), followed by TLX3 (15%), diverse (15%), TLX1 (7%), NKX2 (7%), HOXA_MLLT10 (4%), HOXA_KMT2A (3%) and BCL11B (1%), in agreement with the frequencies in our study, except for the HOXA_MLLT10 subtype. HOXA_MLLT10 was called twice as frequently in our study, tallying with our observations that this subtype had a diffuse distribution (Figure 1A) and accounted for most cases with >1 positive call (Figure S1). The recent Children's Oncology Group (COG) study, based on >1300 cases of T-ALL, reported RNA-sequencing defined subtypes with frequencies similar to those found in this study: TAL1 double positive (DP)-like (22%), early T-cell precursor (ETP)-like (18%), TAL1 αβ-like (17%), TLX3 (16%), NKX2-1 (6%) and TLX1 (6%) [1].

Next, we compared the results of DNA-based methods (cytogenetics, FISH and MLPA) with the TALLSorts subtypes in 63 cases where appropriate testing had been performed and found a concordance rate of 87% (55/63) (Figure 1D). Two of the discrepancies (cases 13 and 14) were explained by fusions detected with Arriba.
Case 13 had a SIL::TAL1 rearrangement by MLPA, and Arriba called both a SIL::TAL1 and a PICALM::MLLT10 rearrangement, but TALLSorts called the case only as HOXA_MLLT10 (Figure 1E). Arriba revealed a KMT2A::MLLT1 fusion in case 14, for which the TALLSorts classifier generated probability scores of 0.92 and 0.96 for the HOXA_KMT2A and HOXA_MLLT10 subtypes, respectively. No additional relevant fusions were detected in cases 7, 11, 12 and 15, but in all four cases TALLSorts called >1 subtype, with the second-placed subtype matching the results of DNA testing (Figure 1E). We were not able to resolve the remaining two discrepancies (cases 62 and 122), but it is noteworthy that neither had a high probability score for the assigned subtype (both <0.75). In case 62, FISH detected an NKX2 rearrangement, but TALLSorts did not classify the case in the NKX2 subtype and Arriba called no NKX2 fusion despite good quality RNA. Similarly, FISH detected a KMT2A fusion in both the diagnostic and relapse samples of case 122; Arriba did not detect it and TALLSorts did not place the case in the HOXA_KMT2A subtype. Factoring in the Arriba calls as well as the second subtype scores, the concordance rate would rise to 97% (61/63).

Overall, 5/8 (63%) discrepant cases had poor RNA quality due to a high number of unmapped reads, compared to 11/54 (20%) concordant cases (p = 0.011). In addition, four of five samples with poor RNA quality had >1 call (Figure 1E). However, most poor quality samples (11/16, 69%) had concordant results, so, to provide a real-life evaluation, they were not excluded from the study. Among 63 cases not assigned a TALLSorts subtype by DNA, 11 had relevant fusions by RNA analysis (Arriba), which explained their classification and included two fusions missed by the DNA methods (Table S1). Generally, our T-ALL cohort had not been comprehensively tested for genetic abnormalities (Table S1).
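The RNA-quality comparison reported above (5/8 discrepant versus 11/54 concordant cases with poor RNA quality, p = 0.011) can be checked with a short Python sketch. The text says only that "standard statistical tests" were used, so the choice of test here is an assumption: an uncorrected Pearson chi-squared test on the 2×2 table is one option that gives a value close to the reported one. The function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Assumption: the source does not name the test it used; this is one candidate.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: discrepant, concordant. Columns: poor RNA quality, adequate RNA quality.
chi2, p_value = pearson_chi2_2x2(5, 3, 11, 43)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With these counts the sketch gives a p-value of roughly 0.011. Because the expected count in the discrepant/poor-quality cell is small (about 2), an exact test would often be preferred for a table like this and would give a different value.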
However, among 63 cases with T-ALL specific abnormalities identified by DNA methods, 55 (95%) cases had probability scores of >0.65 corresponding to the correct TALLSorts subtype (Figure 1F). Among the five cases with TALLSorts probability scores of 0.65–0.80, one case had poor RNA quality, compared to 10/50 cases with probability scores of >0.80, further supporting our decision not to exclude these samples.

There were few differences between the seven subtypes in terms of clinical features and outcome (Table 1). However, patients in the TAL/LMO subtype had a higher white blood cell count (p = 0.004) and were more frequently MRD positive at the end of induction (p = 0.03). There was no difference in outcome between the subtypes. The same results were obtained even when we reassigned the five cases (7, 11, 12, 14 and 15) with aberrantly high HOXA_MLLT10 scores to relevant subtypes (Table S3).

Finally, we examined the spectrum of secondary abnormalities by TALLSorts subtypes. The frequency of cases with NOTCH1/FBXW7 mutations and CDKN2A/B deletion was high across most of the subtypes, as reported by other studies. The exception was the diverse subtype, where only 2/9 (11%) had a CDKN2A/B deletion (Table S4).

We have confirmed the utility of TALLSorts in an independent cohort. T-ALL classification by traditional DNA tests is challenging due to the variety of mechanisms by which T-cell oncogenes can be activated. Hence, TALLSorts offers a potential alternative to performing multiple tests. We observed a high concordance rate with standard of care (SOC) techniques, as well as the successful classification of cases with non-standard rearrangements that were ambiguous or missed by SOC tests. One limitation is that TALLSorts does not detect all subtypes, although its creators plan to add new subtypes [4]. The major limitation was the over-calling of the HOXA_MLLT10 subtype, which accounted for >95% of cases with >1 positive call.
While we cannot exclude the possibility that these cases harbour a cryptic abnormality consistent with the HOXA_MLLT10 subtype, this is unlikely for three reasons: (1) it was the only subtype whose frequency differed from that in the original TALLSorts study; (2) among the 12 cases called as HOXA_MLLT10, five produced valid second calls supported by DNA testing; and (3) the distribution of the calls was not discrete, unlike in the original TALLSorts study. TALLSorts allows users to train custom models; such flexibility could enable its integration into a T-ALL screening strategy.

Conception and design: AVM, OG and CJH. Provision of data: TJ, DF, CS, AV, JR, PVV and MTR. Data analysis and interpretation: OG, REC, AE and AVM. Statistics: OG. Manuscript writing and final approval: all authors.

This study was supported by the Ministry of National Education-Republic of Türkiye (International Graduate Education Scholarship [YLSY]).

The authors thank the VIVO Biobank for Children and Young People with Cancer for the provision of samples.

TJ, DF and MTR were employees of Illumina, a public company that develops and markets systems for genetic analysis. All other authors declare no competing interests.

Figure S1. The probability scores of each subtype predicted by TALLSorts across all cases.

Table S1. Demographic features, clinical features, sample detail, genetic subtype and TALLSorts data for all 126 cases in the study.

Table S2. Patient characteristics and clinical outcomes of tested and total (tested + non-tested) cases with T-ALL treated on UKALL2003 and UKALL2011.

Table S3. Characteristics and clinical outcomes of patients classified by the TALLSorts algorithm adjusting for cases with aberrantly high HOXA_MLLT10 scores.

Table S4. Distribution of genetic alterations by TALLSorts predicted subtypes (number positive/number tested).

Topics & Keywords

Acute Lymphoblastic Leukemia research · Lymphoma Diagnosis and Treatment · Carcinogens and Genotoxicity Assessment

Publication Details

Published in: British Journal of Haematology

Volume 208, Issue 2, pp. 732-736

DOI: 10.1111/bjh.70263

Field-Weighted Citation Impact: 0.00
