I wanted a life where I could use kallisto/sleuth, v2

I wrote an article on how to use kallisto a while ago, but it covered a fairly old version, so here is an updated walkthrough.

## Installation

```
% wget https://github.com/pachterlab/kallisto/releases/download/v0.43.0/kallisto_linux-v0.43.0.tar.gz
% tar zxfv kallisto_linux-v0.43.0.tar.gz
% cd kallisto_linux-v0.43.0
% ./kallisto
kallisto 0.43.0

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

index Builds a kallisto index
quant Runs the quantification algorithm
pseudo Runs the pseudoalignment step
h5dump Converts HDF5-formatted results to plaintext
version Prints version information
cite Prints citation information

Running kallisto <CMD> without arguments prints usage information for <CMD>
```

I keep all my software in ~/opt/bin, which is on my PATH, so I copy the binary there:

```
% cp ./kallisto ~/opt/bin
% cd ~
% kallisto
kallisto 0.43.0

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

Running kallisto <CMD> without arguments prints usage information for <CMD>
```

After that, kallisto can be run from any directory.
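If `~/opt/bin` is not yet on your PATH, something like the following could go in your shell startup file. This is a minimal sketch: the helper name `add_to_path` is made up for illustration, and `~/opt/bin` is simply the directory used in this post.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;     # otherwise put it at the front
  esac
  export PATH
}

# Assumption: binaries are copied to ~/opt/bin, as in this post.
add_to_path "$HOME/opt/bin"
```

Calling the function twice is harmless, because the `case` pattern skips directories that are already listed.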

## Running it
Suppose, for example, that you have all the CDS sequences of some organism (transcripts.fa) and a set of RNA-Seq reads (sample_R1.fq, sample_R2.fq).

First, build the kallisto index.

```
% kallisto index
kallisto 0.43.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING Filename for the kallisto index to be constructed

Optional argument:
-k, --kmer-size=INT k-mer (odd) length (default: 31, max value: 31)
--make-unique Replace repeated target names with unique names

% kallisto index -i transcripts.fa.kallisto transcripts.fa
```
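The `--make-unique` option in the help text above only matters when the FASTA contains repeated target names. A rough way to check for that beforehand is sketched below. The function name `duplicate_targets` is hypothetical, and it assumes that the target name is the header text up to the first whitespace.

```shell
# Hypothetical helper: list target names that appear more than once in a FASTA file.
# Assumption: the target name is the header text up to the first whitespace.
duplicate_targets() {
  grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Only run against the example file from this post if it exists.
if [ -f transcripts.fa ]; then
  duplicate_targets transcripts.fa
fi
```

If this prints nothing, the names are already unique and `--make-unique` should not be needed.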

Next, pseudoalign the paired-end data against this index and compute TPM.

```
% kallisto quant
kallisto 0.43.0
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to

Optional arguments:
--bias Perform sequence based bias correction
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: value is estimated from the input data)
-t, --threads=INT Number of threads to use (default: 1)
--pseudobam Output pseudoalignments in SAM format to stdout

% kallisto quant -i transcripts.fa.kallisto -o sample.kallisto --bias -b 100 -t 32 sample_R1.fq sample_R2.fq
```

`-t` sets the number of threads (cores) to use; either way, a run generally finishes in about a minute.
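To take a quick look at the result, the sketch below lists the targets with the highest TPM. It assumes that the output directory contains a tab-separated `abundance.tsv` with the columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. The function name `top_tpm` is made up for illustration, and `sort -g` is a GNU/BSD extension.

```shell
# Hypothetical helper: print the N targets with the highest TPM (default N = 10).
# Assumption: column 1 is target_id, column 5 is tpm, and line 1 is a header.
top_tpm() {
  tail -n +2 "$1" \
    | sort -t "$(printf '\t')" -k5,5gr \
    | head -n "${2:-10}" \
    | cut -f1,5
}

# Only run against the output directory from this post if it exists.
if [ -f sample.kallisto/abundance.tsv ]; then
  top_tpm sample.kallisto/abundance.tsv 10
fi
```

The `g` flag sorts by general numeric value, so TPM values written in scientific notation are ordered correctly as well.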

=========

That's it for now...

I'll write the rest later.

