htseq-count的坑 | zhangchaofan's blog

发布日期: 2024-09-27

更新日期: 2024-09-27

文章字数: 369

阅读时长: 1 分

最近在跑RNA-seq碰到一个自己挖的坑，samtools将sam文件转为二进制的bam文件并排序，默认是按照pos进行排序的。htseq-count默认的输入sam|bam是按照name排序的，这样就导致了一个问题，htseq-count会把不在相邻行的paried-end reads当成一条序列：

Warning: 16607642 reads with missing mate encountered.

后果是导致虚假的输出： Notice that your BAM file is probably sorted by position, but you're not using the -r pos option. That will most likely result in bogus output. 我们需要设置htseq-count的-r pos参数，或者在对bam文件进行排序时，保留按name进行排序。

Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -u         Output uncompressed data (equivalent to -l 0)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -M         Use minimiser for clustering unaligned/unplaced reads
  -K INT     Kmer size to use for minimiser [20]
  -n         Sort by read name (not compatible with samtools index command)
  -t TAG     Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  --no-PG    do not add a PG line
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
      --write-index
               Automatically index the output files [off]
      --verbosity INT
               Set level of verbosity

zhangchaofan

https://zhangchaofan01.gitee.io/2024/09/27/htseq-count-de-keng/

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 zhangchaofan !

linux

slurm单机部署

2024-10-10 zhangchaofan

linux

linux查看GPU状态

2024-08-25 zhangchaofan

linux