ASAP-SML: An Antibody Sequence Analysis Pipeline Using Statistical Testing and Machine Learning
arXiv - CS - Computational Engineering, Finance, and Science Pub Date : 2020-03-08 , DOI: arxiv-2003.03811
Xinmeng Li, James A. Van Deventer, and Soha Hassoun

Antibodies are capable of potently and specifically binding individual antigens and, in some cases, disrupting their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating sequences of antibodies to their unique properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Learning (ASAP-SML), to identify features that distinguish one set of antibody sequences from antibody sequences in a reference set. The pipeline extracts feature fingerprints from sequences. The fingerprints represent germline, CDR canonical structure, isoelectric point, and frequent positional motifs. Machine learning and statistical significance testing techniques are applied to antibody sequences and extracted feature fingerprints to identify distinguishing feature values and combinations thereof. To demonstrate how it works, we applied the pipeline to sets of antibody sequences known to bind or inhibit the activities of matrix metalloproteinases (MMPs), a family of zinc-dependent enzymes that promote cancer progression and undesired inflammation under pathological conditions, and compared them against reference datasets of sequences that do not bind or inhibit MMPs. ASAP-SML identifies features and combinations of feature values found in the MMP-targeting sets that are distinct from those in the reference sets.
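The idea of testing whether a feature value distinguishes a target set from a reference set can be illustrated with a small sketch. It is not the ASAP-SML implementation: the abstract does not name the statistical test, so a one-sided Fisher's exact test is assumed here, and the helper names (`has_motif`, `fisher_one_sided`, `enrichment_p_value`) and the toy sequences are hypothetical, chosen only for illustration.

```python
from math import comb


def has_motif(sequence: str, motif: str, position: int) -> bool:
    """Return True if `motif` occurs in `sequence` starting at 0-based `position`."""
    return sequence[position:position + len(motif)] == motif


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: target sequences with the feature,    b: target sequences without it,
    c: reference sequences with the feature, d: reference sequences without it.
    Returns P(X >= a) under the hypergeometric null (feature independent of set).
    """
    n_target = a + b
    n_feature = a + c
    n_total = a + b + c + d
    upper = min(n_target, n_feature)
    tail = sum(
        comb(n_feature, k) * comb(n_total - n_feature, n_target - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_target)


def enrichment_p_value(target, reference, motif, position):
    """P-value for `motif` at `position` being enriched in `target` vs `reference`."""
    a = sum(has_motif(s, motif, position) for s in target)
    c = sum(has_motif(s, motif, position) for s in reference)
    return fisher_one_sided(a, len(target) - a, c, len(reference) - c)


if __name__ == "__main__":
    # Toy strings, not real antibody sequences (assumption for illustration).
    target = ["CARDY", "CARGG", "CARTT"]
    reference = ["CVKDY", "CTTGG", "CASTT"]
    print(enrichment_p_value(target, reference, motif="AR", position=1))
```

In a fuller pipeline, a test like this would presumably be repeated for every feature value in the fingerprint (germline, canonical structure, isoelectric point bin, motif), which would call for a multiple-testing correction; that step is left out of the sketch.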

Updated: 2020-07-01