Immunoglobulin BLAST (Igblast), a Blast Tool Specific for Antibodies

Key Features of IgBLAST

  1. Identification of V(D)J segments: IgBLAST can identify variable (V), diversity (D), and joining (J) gene segments in IG or TCR sequences.

  2. Clonotype Analysis: It helps in determining clonotypes based on V(D)J segment usage, providing insights into the diversity and clonality of IG or TCR repertoires.

  3. Somatic Hypermutation Analysis: It identifies somatic hypermutations in IG sequences and can compare these to germline sequences, which is critical in understanding adaptive immune responses.

  4. Flexible Input Options: IgBLAST can process both nucleotide and protein sequences, and it supports various input formats.

  5. Detailed Alignment Information: It provides detailed alignment results that include information about gene segments, framework regions, complementarity-determining regions (CDRs), and mutations.

  6. Integration with Other Databases: The results can be linked to other NCBI databases for additional information and analysis.

IgBLAST is widely used in immunology and related fields to study B cell and T cell receptor repertoires, which is important for understanding immune responses, developing vaccines, and studying autoimmune diseases and cancer.
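As a toy illustration of the somatic hypermutation item in the feature list, the sketch below counts mismatches between a query and a germline sequence that are already aligned to the same length. This is not how IgBLAST computes its statistics; the function name `mutation_frequency` and the example sequences are made up for illustration.

```python
def mutation_frequency(query: str, germline: str) -> float:
    """Fraction of compared positions where the query differs from the germline.

    Both strings must already be aligned (equal length). Positions where
    either sequence has a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(query) != len(germline):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q in "-N" or g in "-N":
            continue  # not informative, skip this column
        compared += 1
        if q != g:
            mismatches += 1
    return mismatches / compared if compared else 0.0


# Hypothetical 12-nt fragments that differ at a single position
print(mutation_frequency("CAGGTGCAGCTG", "CAGGTACAGCTG"))
```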

Local Set Up

Here is an example setup using conda, taken from nicwulab/SARS-CoV-2_Abs:

conda create -n Abs -c bioconda \
python=3.9 \
igblast

conda activate Abs
# install pyir and use it to set up the blast database
pip3 install crowelab_pyir
pyir setup
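After the installation, it may be worth checking that the command-line tools are actually on the PATH of the active environment. The sketch below is a minimal check; the helper name `missing_tools` is hypothetical, and the default tool list is only my assumption of what the steps in this post need.

```python
import shutil


def missing_tools(tools=("igblastn", "makeblastdb", "pyir")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", missing if missing else "none")
```

Run it inside the activated conda environment; an empty result means all three commands were found.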

Error in setup

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/Data/PopOS/miniconda/envs/bio311/lib/python3.11/site-packages/crowelab_pyir/data/bin/setup_germline_library.py", line 112, in 
for line in urllib.request.urlopen(locus_url):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.
.
.
urllib.error.URLError: 

pyir downloads the germline data over 'http', and in this case the download failed. The most important part of the traceback is the path to the script that raised the error (setup_germline_library.py): changing 'http' to 'https' in that script should solve the problem.
For a quick fix, run the command below, adjusting the path to your own environment:
sed -i 's/http:/https:/g' /mnt/Data/PopOS/miniconda/envs/bio311/lib/python3.11/site-packages/crowelab_pyir/data/bin/setup_germline_library.py
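If sed is not available, or you would like a backup of the script before changing it, the same replacement can be sketched in Python. The function name `force_https` is hypothetical, and the path you pass in should be the one shown in your own traceback.

```python
from pathlib import Path
import shutil


def force_https(script_path: str) -> int:
    """Replace 'http:' with 'https:' in a file and return the number of hits.

    A copy with the extra suffix '.bak' is written first so the change
    can be undone. Nothing is written when there is nothing to replace.
    """
    path = Path(script_path)
    text = path.read_text()
    count = text.count("http:")
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace("http:", "https:"))
    return count
```

Calling it a second time on the same file returns 0, because 'https:' no longer contains 'http:'.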

  1. Prepare the database
  2. Run IgBLAST
igblastn -query result/test.fasta \
-germline_db_V imgt_database/human_nuc/IGV.fasta \
-germline_db_J imgt_database/human_nuc/IGJ.fasta \
-germline_db_D imgt_database/human_nuc/IGD.fasta \
-organism human -domain_system kabat \
-auxiliary_data imgt_database/optional_file/human_gl.aux \
-out result/igblast_output

Key parameters:

-germline_db_V  Germline database name
-organism       The organism for your query sequence. Supported organisms include human, mouse, rat, rabbit and rhesus_monkey for Ig, and human and mouse for TCR. Custom organisms are also supported, but you need to supply your own germline annotations (see the IgBLAST web site for details). Default = `human'
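When the same command has to be run on many files, it can help to assemble it in Python. The sketch below only reuses the flags and paths from the command above; the function names `build_igblastn_cmd` and `run_igblastn` are hypothetical, and the default paths are assumptions that match this post's directory layout.

```python
import subprocess


def build_igblastn_cmd(query, out,
                       db_dir="imgt_database/human_nuc",
                       aux="imgt_database/optional_file/human_gl.aux",
                       organism="human",
                       domain_system="kabat"):
    """Return the igblastn command from this post as an argument list."""
    return [
        "igblastn",
        "-query", query,
        "-germline_db_V", f"{db_dir}/IGV.fasta",
        "-germline_db_J", f"{db_dir}/IGJ.fasta",
        "-germline_db_D", f"{db_dir}/IGD.fasta",
        "-organism", organism,
        "-domain_system", domain_system,
        "-auxiliary_data", aux,
        "-out", out,
    ]


def run_igblastn(query, out, **kwargs):
    """Run igblastn and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_igblastn_cmd(query, out, **kwargs), check=True)
```

For example, `run_igblastn("result/test.fasta", "result/igblast_output")` would run the same command as above, provided igblastn is installed and the database files exist.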

Build Your Reference

  1. Download the reference
    Go to a database such as IMGT and select the organism. Download all IGHV, IGHD, IGHJ, etc.
  2. Rename the sequences. In the downloaded file, each sequence header looks like: >KT723008|IGHD1-1*01|Bos taurus_Holstein|F|D-REGION|284884..284914|31 nt|1| | | | |31+0=31| | |. Keep only the most important information, the allele name: >IGHD1-1*01. (A quick way is to use the script below.)
  3. After that, build the BLAST index: makeblastdb -parse_seqids -dbtype nucl -in IGV
  4. Build the aux file and run igblast
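from Bio import SeqIO
import os

# Build one cleaned FASTA file per segment type: IGV, IGD and IGJ
for key in "VDJ":
    print(key)
    # Input files in the current directory whose names end in V.fasta, D.fasta or J.fasta
    Files = [i for i in os.listdir() if i[-7:] == key + '.fasta']
    Final_seq = []
    for file in Files:
        for seq_record in SeqIO.parse(file, "fasta"):
            # The allele name is the second '|'-separated field of the IMGT header
            id_tmp = seq_record.id.split('|')[1]
            # Remove the '.' characters that IMGT uses as alignment gaps
            seq_tmp = str(seq_record.seq).replace('.', '')
            Final_seq += [f">{id_tmp}\n{seq_tmp}"]

    with open(f'IG{key}', 'w') as F:
        F.write("\n".join(Final_seq))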
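Because several downloaded files are merged into one, the same allele name could end up in the output more than once, and I would expect makeblastdb with -parse_seqids to complain about repeated identifiers. The sketch below is one way to check a cleaned file before building the index; the function name `duplicated_ids` and the example records are hypothetical.

```python
from collections import Counter


def duplicated_ids(fasta_text: str) -> list[str]:
    """Return the sorted sequence IDs that occur more than once in FASTA text."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # ignore a bare '>' with no name
                ids.append(fields[0])
    counts = Counter(ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up records: the first and third share an ID
example = ">IGHD1-1*01\nACGT\n>IGHD1-1*02\nACGA\n>IGHD1-1*01\nACGT\n"
print(duplicated_ids(example))
```

On a real file, pass in the content of IGV, IGD or IGJ, for example `duplicated_ids(open("IGV").read())`.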

pyir

If you have installed pyir, you can use it to run IgBLAST with fewer parameters.

pyir -m 60  result/clean_split.fa --outfmt tsv -o result/clean

Key parameters:

--sequence_type {nucl,prot}     default: nucl
-m MULTI, --multi MULTI         Number of threads
-o,  --out                      default: inputfile.json.gz
--outfmt {tsv,lsjson,dict,json} suggest: tsv
--igdata IGDATA                 Path to your IGDATA directory.
-r, --receptor {Ig,TCR}         The receptor you are analyzing, immunoglobulin or t cell receptor
-s, --species {human,mouse...}  The Species you are analyzing {human,mouse,rabbit,rat,rhesus_monkey}
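Once a TSV result exists, a quick summary of gene usage can be made with the standard library. This is only a sketch: the column name `v_call` follows the AIRR naming convention, and whether your pyir output uses exactly that name (and what the output file is called) is an assumption you should check against the header of your own file.

```python
import csv
from collections import Counter


def count_column(lines, column="v_call"):
    """Count the values of one column in tab-separated text.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    return Counter(row[column] for row in reader)


# Made-up rows for illustration
demo = [
    "sequence_id\tv_call\n",
    "s1\tIGHV1-69*01\n",
    "s2\tIGHV3-23*01\n",
    "s3\tIGHV1-69*01\n",
]
print(count_column(demo).most_common())
```

For a gzip-compressed result, a handle from `gzip.open(path, "rt")` can be passed in the same way.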

Q&A

Can I annotate the light chain and heavy chain simultaneously?

IgBLAST is designed to analyze immunoglobulin (IG) sequences, including both heavy and light chains, but it processes each chain separately. When a single input sequence contains both a heavy and a light chain, IgBLAST may only annotate the first recognizable chain (the heavy chain, in the case that prompted this question).

To analyze both chains, input them as separate sequences: split the combined sequence into a heavy-chain part and a light-chain part, then run IgBLAST on each part individually.

There isn't a parameter in IgBLAST that allows for the simultaneous analysis of both heavy and light chains when they are combined into a single sequence. The tool's algorithm is designed to identify and annotate the V(D)J segments of a single chain at a time, as the structure and sequence features of heavy and light chains are distinct.

If you consistently work with sequences that contain both chains, add a preprocessing step to your workflow, using another bioinformatics tool or a custom script, that identifies and separates the heavy and light chain sequences before they are passed to IgBLAST.
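One possible preprocessing step, sketched below, applies only when the two chains are joined by a linker whose sequence you already know (for example in a single-chain construct). That is an assumption about your data, not something IgBLAST provides; the function name `split_at_linker` and the sequences in the example are made up.

```python
def split_at_linker(sequence: str, linker: str):
    """Split a combined sequence at the first occurrence of a known linker.

    Returns (upstream, downstream) without the linker itself, or None when
    the linker is not found. Which part is the heavy chain and which is the
    light chain depends on how the construct was designed.
    """
    if not linker:
        raise ValueError("linker must not be empty")
    seq = sequence.upper()
    link = linker.upper()
    idx = seq.find(link)
    if idx == -1:
        return None
    return seq[:idx], seq[idx + len(link):]


# Toy example: 'CCCGGG' stands in for the real linker sequence
print(split_at_linker("AAACCCGGGTTT", "CCCGGG"))
```

Each part can then be written to its own FASTA record and run through IgBLAST separately.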

https://karobben.github.io/2023/12/21/Bioinfor/igblast/

Author: Karobben
Posted on: 2023-12-21
Updated on: 2024-12-23
