SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

Install

GitHub: shenwei356/seqkit

wget https://github.com/shenwei356/seqkit/releases/download/v2.7.0/seqkit_linux_amd64.tar.gz
tar -zxvf seqkit_linux_amd64.tar.gz

Convert FASTQ to FASTA

seqkit fq2fa output_directory/output_prefix.extendedFrags.fastq -o output_directory/output_prefix.merged.fasta
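To make the conversion concrete, here is a minimal sketch of what a FASTQ-to-FASTA conversion does conceptually. It is not how SeqKit is implemented: it assumes strict four-line FASTQ records with no wrapped sequences, and the helper name fq2fa_sketch and the demo file names are hypothetical.

```shell
# Create a tiny two-record FASTQ file for demonstration (made-up data).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > demo.fastq

# Hypothetical helper: for every four-line record, keep the header
# (line 1, with the leading '@' replaced by '>') and the sequence (line 2);
# the '+' separator and the quality line are dropped.
fq2fa_sketch() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }
       NR % 4 == 2 { print }' "$1"
}

fq2fa_sketch demo.fastq > demo.fasta
cat demo.fasta
```

For real data, seqkit fq2fa remains the better choice, since this sketch does no validation of the input.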

Remove Duplicated Sequences

seqkit rmdup -s sequences.fasta -o unique_sequences.fasta -D counts.tsv
  • -s: Identify duplicates by sequence content instead of by sequence ID (the default).
  • sequences.fasta: The input FASTA or FASTQ file. Replace it with the path to your own file.
  • -o unique_sequences.fasta: The output file containing the sequences that remain after duplicate removal.
  • -D counts.tsv: Write the number of copies and the list of IDs for each duplicated sequence to this file.
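The options above can be illustrated with a small sketch of sequence-based deduplication. This is an approximation rather than SeqKit's own logic: it assumes single-line FASTA sequences, compares sequences exactly as written (case-sensitive), and the helper name rmdup_sketch, the demo file names, and the layout of the counts file are assumptions made for the illustration.

```shell
# Tiny FASTA in which records 'a' and 'c' share the same sequence (made-up data).
printf '>a\nACGT\n>b\nGGCC\n>c\nACGT\n' > demo_dup.fasta

# Hypothetical helper: print the first record seen for each distinct sequence,
# and write "count<TAB>id list" for every duplicated sequence to the file in $2.
rmdup_sketch() {
  awk -v dup="$2" '
    BEGIN { printf "" > dup }                 # create the counts file even if empty
    /^>/  { header = $0; id = substr($0, 2); next }
    {
      if (!($0 in count)) {                   # first time this sequence is seen
        print header; print $0
        order[++n] = $0; ids[$0] = id
      } else {                                # duplicate: only record its ID
        ids[$0] = ids[$0] ", " id
      }
      count[$0]++
    }
    END {
      for (i = 1; i <= n; i++)
        if (count[order[i]] > 1)
          printf "%d\t%s\n", count[order[i]], ids[order[i]] > dup
    }' "$1"
}

rmdup_sketch demo_dup.fasta demo_counts.tsv > demo_unique.fasta
cat demo_unique.fasta
cat demo_counts.tsv
```

With the demo input, records a and b are kept, and the counts file holds one line reporting two copies of the sequence shared by a and c.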

https://karobben.github.io/2024/02/05/Bioinfor/seqkit/

Author: Karobben
Posted on: 2024-02-05
Updated on: 2024-02-14
