Status : Verified
Personal Name Hugo, Joshua Emmanuel L.
Resource Title Compositional Data Analysis Based on Unsupervised Clustering of Functional Redundancy of Epitope Data to Enhance Discoverability of Target-Binding Peptide Motifs from High-Throughput Sequencing of Panned Phage Display Libraries
Date Issued 29 July 2025
Abstract Phage display is a well-established platform for exploring and engineering protein-ligand interactions with applications in medicine, biotechnology, and nanotechnology. The integration of high-throughput sequencing (HTS) has transformed phage display by enabling deep profiling of entire phage pools, offering comprehensive insights into a phage library’s diversity and selection dynamics. However, the large quantity and complexity of HTS data pose analytical challenges, particularly in distinguishing true target-binding sequences from non-specific ones. To address this, an alternative computational pipeline was developed that incorporates two novel approaches: (a) clustering based on Functional Redundancy of Epitope Data (FRED) to group functionally similar sequences, and (b) differential abundance analysis using ALDEx2, which implements compositional data analysis to identify significantly differentially abundant clusters. The pipeline was applied to three datasets, (1) Fyn SH2, (2) PHASTpep, and (3) Canine PD-1, to demonstrate its capability in discovering target-binding peptide motifs by comparing its results with binders or motifs reported in the literature. It captured 20 of the 22 experimentally verified binders reported for Fyn SH2, identified four cluster chain IDs containing a total of 45 HPQ-containing sequences known to bind Streptavidin in PHASTpep, and detected sequences that align with reported putative epitopes in Canine PD-1. Moreover, comparing ALDEx2 analyses with and without FRED clustering revealed the benefit of FRED clustering in uncovering patterns obscured by the volume of data generated by HTS. These results highlight the effectiveness of the pipeline in discovering target-binding peptide motifs and demonstrate its potential as a valuable tool for phage display HTS data analysis, with the ability to enhance current computational workflows and contribute to advances in the bioinformatics community.
Degree Course Master of Science in Chemical Engineering
Language English
Keyword phage display, high-throughput sequencing, bioinformatics pipeline, peptide motif discovery
Material Type Thesis/Dissertation
Preliminary Pages
892.26 Kb
Category : I - Has patentable or registrable invention or creation.
 
Access Permission : Limited Access