I Wanted a Life of Using kallisto/sleuth v2
A while ago I wrote a post on how to use kallisto, but it covered a pretty old version, so here is an updated one.
## Installation
```
% wget https://github.com/pachterlab/kallisto/releases/download/v0.43.0/kallisto_linux-v0.43.0.tar.gz
% tar zxfv kallisto_linux-v0.43.0.tar.gz
% cd kallisto_linux-v0.43.0
% ./kallisto
kallisto 0.43.0
Usage: kallisto <CMD> [arguments] ..
Where <CMD> can be one of:
    index         Builds a kallisto index
    quant         Runs the quantification algorithm
    pseudo        Runs the pseudoalignment step
    h5dump        Converts HDF5-formatted results to plaintext
    version       Prints version information
    cite          Prints citation information
Running kallisto <CMD> without arguments prints usage information for <CMD>
```
I keep all my software in ~/opt/bin and have that directory on my PATH, so:
```
% cp ./kallisto ~/opt/bin
% cd ~
% kallisto
kallisto 0.43.0
Usage: kallisto <CMD> [arguments] ..
Where <CMD> can be one of:
    index         Builds a kallisto index
    quant         Runs the quantification algorithm
    pseudo        Runs the pseudoalignment step
    h5dump        Converts HDF5-formatted results to plaintext
    version       Prints version information
    cite          Prints citation information
Running kallisto <CMD> without arguments prints usage information for <CMD>
```
and that's it: kallisto now runs from any directory.
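If ~/opt/bin isn't on your PATH yet, here is one minimal way to add it. This is a sketch assuming a bash- or zsh-compatible shell (the `%` prompt above suggests zsh; tcsh would need `setenv` instead), and `prepend_path` is just a helper name I made up for illustration.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed,
# so re-sourcing the rc file does not keep growing PATH.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already on PATH: nothing to do
        *) PATH="$1:$PATH" ;;      # otherwise put it in front
    esac
    export PATH
}

prepend_path "$HOME/opt/bin"
```

Putting these lines in ~/.zshrc (or ~/.bashrc for bash) makes the setting stick for new shells; `command -v kallisto` should then print the path of the copied binary.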
## Running it
Say you have the full set of CDS sequences for some organism (transcripts.fa) and paired-end RNA-Seq data (sample_R1.fq, sample_R2.fq).
First, build the kallisto index.
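Since the two files are mates of the same paired-end run, they should list the same reads in the same order, so their read counts ought to match. A quick check, as a minimal sketch assuming uncompressed FASTQ with exactly 4 lines per record (no wrapped sequences); `count_fastq_reads` and `check_pair` are hypothetical helper names.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file,
# assuming the standard 4-line record layout.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical helper: report the number of pairs, or fail if R1 and R2 differ.
check_pair() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read counts differ: $1=$n1, $2=$n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}
```

Usage would be `check_pair sample_R1.fq sample_R2.fq`; for gzipped input you would have to decompress (e.g. `gzip -cd`) before counting lines.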
```
% kallisto index
kallisto 0.43.0
Builds a kallisto index
Usage: kallisto index [arguments] FASTA-files
Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed
Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names
% kallisto index -i transcripts.fa.kallisto transcripts.fa
```
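The `--make-unique` option in the usage above suggests that repeated target names are a problem for the index, so it can be worth checking transcripts.fa for duplicated IDs first. This sketch assumes the target name is the header text up to the first whitespace (my understanding of how kallisto reads FASTA names, not something the usage text states); `find_duplicate_ids` is a hypothetical helper.

```shell
# Hypothetical helper: print FASTA IDs (header text up to the first
# whitespace) that occur more than once; no output means no duplicates.
find_duplicate_ids() {
    grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | sort | uniq -d
}
```

For example, `find_duplicate_ids transcripts.fa`; if it prints anything, either clean up the FASTA or rebuild the index with `--make-unique`.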
Next, pseudoalign the paired-end reads against this index and compute TPM.
```
% kallisto quant
kallisto 0.43.0
Computes equivalence classes for reads and quantifies abundances
Usage: kallisto quant [arguments] FASTQ-files
Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to
Optional arguments:
    --bias                    Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5
    --single                  Quantify single-end reads
    --fr-stranded             Strand specific reads, first read forward
    --rf-stranded             Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: value is estimated from the input data)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Output pseudoalignments in SAM format to stdout
% kallisto quant -i transcripts.fa.kallisto -o sample.kallisto --bias -b 100 -t 32 sample_R1.fq sample_R2.fq
```
`-t` sets the number of threads; in my experience the run generally finishes in about a minute.
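kallisto quant writes its results into the output directory (sample.kallisto here); as far as I know this includes a tab-separated abundance.tsv with the columns target_id, length, eff_length, est_counts and tpm. TPM divides each transcript's estimated count by its effective length and rescales so the values sum to one million: TPM_i = 10^6 × (est_counts_i / eff_length_i) / Σ_j (est_counts_j / eff_length_j). The awk sketch below just recomputes that from the eff_length and est_counts columns to make the formula concrete; the function name `recompute_tpm` is made up, and the column positions are assumed from that header.

```shell
# Hypothetical helper: recompute TPM from a kallisto abundance.tsv
# (header on line 1; column 3 = eff_length, column 4 = est_counts).
recompute_tpm() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { next }                           # skip the header line
        {
            id[NR] = $1
            # reads per effective base; guard against a zero effective length
            rate[NR] = ($3 > 0) ? $4 / $3 : 0
            total += rate[NR]
        }
        END {
            print "target_id", "tpm_recomputed"
            # rescale so that all rates sum to one million
            for (i = 2; i <= NR; i++)
                printf "%s\t%.4f\n", id[i], (total > 0 ? rate[i] / total * 1e6 : 0)
        }' "$1"
}
```

For example, `recompute_tpm sample.kallisto/abundance.tsv | head`. The numbers should agree closely with kallisto's own tpm column, so in practice I would just use that column directly.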
=========
That's all for now...
I'll write the rest later.