-s: Specifies that duplicates should be identified based on sequence content.
[input_file]: Replace this with the path to your input FASTA or FASTQ file.
-o [output_file]: Specifies the output file. Replace [output_file] with the desired path for the file containing the sequences after duplicate removal.
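-D [dup_num_file]: Specifies a file to which the number of copies and the IDs of each duplicated sequence are written. To save the duplicated sequences themselves, use -d [dup_seqs_file] instead.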
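The options above appear to be those of `seqkit rmdup`. As a rough illustration of what `-s` and `-D` do (not seqkit's actual implementation), this Python sketch keeps the first record of each exact sequence and collects the IDs of every sequence seen more than once. `parse_fastq` and `rmdup_by_seq` are hypothetical helper names. The sketch assumes uncompressed FASTQ with 4-line records and compares sequences on the given strand only, case-sensitively; seqkit's own strand and case handling may differ, so check `seqkit rmdup -h`.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# A FASTQ record as (id, sequence, quality). Parsing is simplified and
# assumes 4-line records with no wrapped sequence lines.
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (id, seq, qual) tuples from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        # The ID is the first whitespace-delimited token after '@'.
        yield header[1:].split()[0], seq, qual


def rmdup_by_seq(records: Iterable[Record]) -> Tuple[List[Record], Dict[str, List[str]]]:
    """Keep the first record for each distinct sequence (similar in spirit to -s).

    Returns the kept records and, for every sequence seen more than once,
    the IDs of all its copies (roughly the information -D reports).
    """
    kept: List[Record] = []
    ids_by_seq: Dict[str, List[str]] = {}
    for rec_id, seq, qual in records:
        if seq not in ids_by_seq:
            # First occurrence of this sequence: keep the record.
            ids_by_seq[seq] = []
            kept.append((rec_id, seq, qual))
        ids_by_seq[seq].append(rec_id)
    duplicated = {s: ids for s, ids in ids_by_seq.items() if len(ids) > 1}
    return kept, duplicated


# Toy input: r1 and r2 share the same sequence, r3 is unique.
fastq = "@r1\nACGT\n+\nIIII\n@r2 extra\nACGT\n+\nIIII\n@r3\nGGCC\n+\nIIII\n".splitlines()
kept, dups = rmdup_by_seq(parse_fastq(fastq))
print([r[0] for r in kept])  # ['r1', 'r3']
print(dups)                  # {'ACGT': ['r1', 'r2']}
```

Keying the dictionary on the full sequence string is the simplest choice; for tens of millions of reads, storing a fixed-size hash of each sequence instead would reduce memory use.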
Sequence Statistics
seqkit stats your_R1.fastq.gz
file              format  type    num_seqs        sum_len  min_len  avg_len  max_len
your_R1.fastq.gz  FASTQ   DNA   15,800,000  2,370,000,000      150      150      150
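To show where the length columns of `seqkit stats` come from, this Python sketch recomputes them from a FASTQ file. `fastq_length_stats` is a hypothetical helper, not part of seqkit; it assumes 4-line records and decompresses `.gz` input with the standard `gzip` module. It also checks the example output for consistency: every read is 150 bp, so `sum_len` should equal `num_seqs` × 150, and 15,800,000 × 150 = 2,370,000,000.

```python
import gzip
from typing import Dict


def fastq_length_stats(path: str) -> Dict[str, float]:
    """Compute seqkit-stats-style length columns for a FASTQ file.

    Assumes 4-line records; files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = []
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # the sequence line of each 4-line record
                lengths.append(len(line.rstrip("\n")))
    if not lengths:
        return {"num_seqs": 0, "sum_len": 0, "min_len": 0, "avg_len": 0.0, "max_len": 0}
    return {
        "num_seqs": len(lengths),
        "sum_len": sum(lengths),
        "min_len": min(lengths),
        "avg_len": sum(lengths) / len(lengths),
        "max_len": max(lengths),
    }


# Consistency check on the example: with all reads at 150 bp,
# sum_len = num_seqs * read length.
assert 15_800_000 * 150 == 2_370_000_000
```

Calling `fastq_length_stats("your_R1.fastq.gz")` on the example file should, if the assumptions hold, give the same `num_seqs`, `sum_len`, `min_len` and `max_len` as the table, though it is far slower than seqkit on large files.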