Posted 2024-02-05, updated 2024-02-14

SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

Install

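# Download the v2.7.0 Linux amd64 release
wget https://github.com/shenwei356/seqkit/releases/download/v2.7.0/seqkit_linux_amd64.tar.gz
# Unpack the archive; it contains a single seqkit binary, which must be on your PATH for the commands below
tar -zxvf seqkit_linux_amd64.tar.gz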

Convert FASTQ to FASTA

seqkit fq2fa output_directory/output_prefix.extendedFrags.fastq -o output_directory/output_prefix.merged.fasta
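To make clear what this conversion does, here is a minimal sketch of the same idea in plain awk. It is an illustration only, not how SeqKit is implemented: it assumes strict four-line FASTQ records, and the file names toy_reads.fastq and toy_reads.fasta are hypothetical.

```shell
# Hypothetical two-read FASTQ file (four lines per record).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > toy_reads.fastq

# Line 1 of each record becomes the FASTA header ("@" replaced by ">");
# line 2 is the sequence; the "+" line and the quality line are dropped.
awk 'NR % 4 == 1 { print ">" substr($0, 2) }
     NR % 4 == 2 { print }' toy_reads.fastq > toy_reads.fasta

cat toy_reads.fasta
```

For real data, seqkit fq2fa is the better choice, since the sketch makes no attempt to validate the input.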

Remove duplicate sequences

seqkit rmdup -s sequences.fasta -o unique_sequences.fasta -D counts.tsv

-s: Identifies duplicates by sequence content rather than by sequence ID (the default).
sequences.fasta: The input file. Replace this with the path to your own FASTA or FASTQ file.
-o unique_sequences.fasta: The output file, which receives the sequences that remain after duplicate removal.
-D counts.tsv: The file that receives, for each duplicated sequence, the number of copies and the list of their IDs. To save the duplicated sequences themselves, use -d instead.
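The effect of deduplicating by sequence content can be seen on a toy file. The sketch below uses plain awk to mimic the idea and is not SeqKit's implementation: it assumes each sequence sits on a single line and that matching is case-sensitive, and the file names toy_dups.fasta and toy_unique.fasta are hypothetical.

```shell
# Hypothetical toy input: seq1 and seq3 carry the same sequence.
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nACGT\n' > toy_dups.fasta

# Keep only the first record for each distinct sequence.
awk '
  /^>/ { header = $0; next }           # remember the most recent header line
  !seen[$0]++ { print header; print }  # emit the record only on first sight of this sequence
' toy_dups.fasta > toy_unique.fasta

cat toy_unique.fasta
```

Here seq3 is dropped because its sequence matches seq1, even though the IDs differ. Running seqkit rmdup -s toy_dups.fasta -o toy_unique.fasta on the same input should give the same two records, and it also handles multi-line sequences.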
