Immunoglobulin BLAST (IgBLAST), a BLAST Tool Specific for Antibodies
Key Features of IgBLAST
- Identification of V(D)J segments: IgBLAST can identify variable (V), diversity (D), and joining (J) gene segments in IG or TCR sequences.
- Clonotype Analysis: It helps in determining clonotypes based on V(D)J segment usage, providing insights into the diversity and clonality of IG or TCR repertoires.
- Somatic Hypermutation Analysis: It identifies somatic hypermutations in IG sequences and can compare these to germline sequences, which is critical in understanding adaptive immune responses.
- Flexible Input Options: IgBLAST can process both nucleotide and protein sequences, and it supports various input formats.
- Detailed Alignment Information: It provides detailed alignment results that include information about gene segments, framework regions, complementarity-determining regions (CDRs), and mutations.
- Integration with Other Databases: The results can be linked to other NCBI databases for additional information and analysis.
IgBLAST is widely used in immunology and related fields to study B cell and T cell receptor repertoires, which is crucial for understanding immune responses, developing vaccines, and studying autoimmune diseases and cancer.
Local Set Up
- The simplest option is the online service: NCBI igblast
- For a local installation, follow the official documentation: NCBI igblast set up
An example conda-based setup is available in the nicwulab/SARS-CoV-2_Abs repository.
Error in setup
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/mnt/Data/PopOS/miniconda/envs/bio311/lib/python3.11/site-packages/crowelab_pyir/data/bin/setup_germline_library.py", line 112, in <module>
        for line in urllib.request.urlopen(locus_url):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    . . .
    urllib.error.URLError:
The germline setup script in pyir downloads its data over 'http', and the download fails. The traceback gives the path of the script, so we can open it and change 'http' to 'https', which should solve the problem.
The most important part of the traceback is the path to setup_germline_library.py. For a quick fix, run:
    sed -i 's/http:/https:/g' /mnt/Data/PopOS/miniconda/envs/bio311/lib/python3.11/site-packages/crowelab_pyir/data/bin/setup_germline_library.py
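The same substitution can be sketched in Python, which may be handy where `sed -i` behaves differently (BSD/macOS sed expects an extra argument after `-i`). The function name `patch_http_to_https` is hypothetical, and the sketch assumes, like the sed command, that every `http:` in the file should become `https:`. It keeps a `.bak` copy so the change can be undone.

```python
import shutil
from pathlib import Path


def patch_http_to_https(script_path: str) -> int:
    """Replace every 'http:' with 'https:' in a file, keeping a .bak copy.

    Returns the number of replacements made (0 means the file was left alone).
    """
    path = Path(script_path)
    text = path.read_text()
    count = text.count("http:")
    if count:
        # Keep the original next to the patched file so the change is reversible.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace("http:", "https:"))
    return count
```

Call it with the script path reported in your own traceback, for example `patch_http_to_https(".../crowelab_pyir/data/bin/setup_germline_library.py")`, then rerun the pyir setup.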
- Prepare the database
- Run IgBLAST
Key IgBLAST options (from the tool's help text):
    -germline_db_V  Germline database name
    -organism  The organism for your query sequence. Supported organisms include human, mouse, rat, rabbit and rhesus_monkey for Ig and human and mouse for TCR. Custom organism is also supported but you need to supply your own germline annotations (see IgBLAST web site for details). Default = `human'
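A minimal sketch of how these options might fit into an `igblastn` call, built as an argument list in Python. The file names and the database prefix are hypothetical placeholders. `-germline_db_V` and `-organism` come from the help text above; `-germline_db_D`, `-germline_db_J`, `-query`, `-auxiliary_data`, `-outfmt` and `-out` are other igblastn options, and `-outfmt 19` is assumed to select AIRR tab-delimited output (true of recent releases; check `igblastn -help` for your version).

```python
import shlex


def build_igblastn_command(query, db_prefix, organism="human",
                           out="igblast_out.tsv", outfmt="19", aux=None):
    """Assemble an igblastn argument list (nothing is executed here).

    db_prefix is a hypothetical naming scheme: the V, D and J databases
    are assumed to be called <db_prefix>_V, <db_prefix>_D and <db_prefix>_J.
    """
    cmd = [
        "igblastn",
        "-query", query,
        "-germline_db_V", f"{db_prefix}_V",
        "-germline_db_D", f"{db_prefix}_D",
        "-germline_db_J", f"{db_prefix}_J",
        "-organism", organism,
        "-outfmt", outfmt,
        "-out", out,
    ]
    if aux:
        # Optional auxiliary file describing the J genes.
        cmd += ["-auxiliary_data", aux]
    return cmd


# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(build_igblastn_command("query.fasta", "database/my_germline")))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` once `igblastn` is on the PATH and the databases exist.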
Build Your Reference
- Download the reference. Go to a database such as IMGT and select the organism. Download all IGHV, IGHD, IGHJ, etc. sequences.
- Rename the sequences. In the downloaded file, each sequence header looks like this:
    >KT723008|IGHD1-1*01|Bos taurus_Holstein|F|D-REGION|284884..284914|31 nt|1| | | | |31+0=31| | |
  We keep only the most important information, the allele name:
    >IGHD1-1*01
  This step is easy to script.
- Build the index with BLAST:
    makeblastdb -parse_seqids -dbtype nucl -in IGV
- Build the aux file and run igblast.
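The header renaming step can be sketched as below. This is an illustration and not the author's script: the function names are hypothetical, and it assumes the allele name is always the second pipe-delimited field, as in the example header above. It also strips the `.` gap characters that IMGT uses in gapped sequences, on the assumption that the BLAST database should contain ungapped nucleotides.

```python
def simplify_imgt_header(header: str) -> str:
    """Reduce an IMGT FASTA header to '>' plus the allele name."""
    fields = header.lstrip(">").split("|")
    # The allele name (e.g. IGHD1-1*01) is assumed to be the second field;
    # headers without pipes are returned essentially unchanged.
    name = fields[1] if len(fields) > 1 else fields[0]
    return ">" + name.strip()


def clean_imgt_fasta(lines):
    """Yield FASTA lines with short headers and IMGT gap dots removed."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield simplify_imgt_header(line)
        elif line:
            yield line.replace(".", "")


# The header is the one quoted above; the sequence line is a made-up placeholder.
example = [
    ">KT723008|IGHD1-1*01|Bos taurus_Holstein|F|D-REGION|284884..284914|31 nt|1| | | | |31+0=31| | |",
    "acgt..acgt",
]
print("\n".join(clean_imgt_fasta(example)))
```

For a real file, pass an open file handle to `clean_imgt_fasta` and write the yielded lines to the FASTA that `makeblastdb` will read. The IgBLAST distribution also ships an `edit_imgt_file.pl` helper for this kind of cleanup.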
pyir
If pyir is installed, you can use it to run IgBLAST with fewer parameters.
Key parameters:
    --sequence_type {nucl,prot}  default: nucl
    -m MULTI, --multi MULTI  Number of threads
    -o, --out  default: inputfile.json.gz
    --outfmt {tsv,lsjson,dict,json}  suggest: tsv
    --igdata IGDATA  Path to your IGDATA directory.
    -r, --receptor {Ig,TCR}  The receptor you are analyzing, immunoglobulin or T cell receptor
    -s, --species {human,mouse...}  The species you are analyzing {human,mouse,rabbit,rat,rhesus_monkey}
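As a rough sketch, these parameters could be combined into a pyir call like the one below. Only flags from the list above are used. Passing the input FASTA as a positional argument is an assumption, and the file name is a hypothetical placeholder.

```python
import shlex


def build_pyir_command(fasta, threads=4, receptor="Ig", species="human",
                       outfmt="tsv", out=None):
    """Assemble a pyir argument list from the key parameters (not executed)."""
    cmd = [
        "pyir", fasta,          # input file as positional argument (assumed)
        "-m", str(threads),     # number of threads
        "-r", receptor,         # Ig or TCR
        "-s", species,          # e.g. human, mouse
        "--outfmt", outfmt,     # tsv is the suggested format
    ]
    if out:
        cmd += ["-o", out]
    return cmd


print(shlex.join(build_pyir_command("antibodies.fasta", threads=8)))
```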
Q&A
Can I annotate the light chain and heavy chain simultaneously?
IgBLAST is designed to analyze immunoglobulin (IG) sequences, including both heavy and light chains, but it processes each chain separately. When a single input sequence contains both a heavy and a light chain, IgBLAST may annotate only the first recognizable chain (the heavy chain, in the case that prompted this question).
To analyze both heavy and light chains using IgBLAST, you generally need to input them as separate sequences. This means splitting your combined sequence into two parts - one for the heavy chain and the other for the light chain - and then running IgBLAST for each part individually.
There isn't a parameter in IgBLAST that allows for the simultaneous analysis of both heavy and light chains when they are combined into a single sequence. The tool's algorithm is designed to identify and annotate the V(D)J segments of a single chain at a time, as the structure and sequence features of heavy and light chains are distinct.
If you consistently work with sequences that contain both chains, add a preprocessing step to your workflow, using another bioinformatics tool or a custom script, that identifies and separates the heavy and light chain sequences before feeding them into IgBLAST.
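One possible preprocessing step, sketched under a strong assumption: the two chains are joined by a known linker sequence (as in a single-chain construct), so the combined sequence can be cut at the first occurrence of that linker. The function names, record names and toy sequences are hypothetical, and sequences without such a linker would need a different approach. The two parts are deliberately not labeled heavy or light, since their order depends on the construct; IgBLAST will report which is which.

```python
def split_paired_record(name, sequence, linker):
    """Split one combined sequence into two records at the first linker match."""
    seq = sequence.upper()
    link = linker.upper()
    pos = seq.find(link)
    if pos == -1:
        raise ValueError(f"linker not found in {name}")
    # The linker itself is dropped; each flank becomes its own record.
    return [
        (f"{name}_part1", seq[:pos]),
        (f"{name}_part2", seq[pos + len(link):]),
    ]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy example: placeholder sequences, not real antibody chains.
print(to_fasta(split_paired_record("ab1", "AAACCCGGGTTTAAA", "GGG")), end="")
```

The resulting FASTA can then be fed to IgBLAST as two separate query sequences.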