Reference guided assembly without reference selection

Victoria Cepeda edited this page Sep 6, 2018 · 8 revisions

Test input data is available in the tutorial folder:

$ ls -1 tutorial
Candidatus_Carsonella_ruddii_HT_Thao2000.fasta
c_rudii.fastq
c_rudii_reference.fna
thao2000.1.fq
thao2000.2.fq
thao.fa
thao.fq
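
Before running an assembly on paired-end data, it can help to confirm that both mate files hold the same number of reads. The following is a minimal sketch, not part of MetaCompass: the function names `count_fastq_reads` and `check_pairs` are hypothetical, and the count assumes the common four-lines-per-record FASTQ layout (no wrapped sequence or quality lines).

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Report both counts and succeed only when the two mate files match.
check_pairs() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" = "$n2" ]
}

# Example with the tutorial files:
#   check_pairs tutorial/thao2000.1.fq tutorial/thao2000.2.fq
```

A non-zero exit status from `check_pairs` suggests that one of the files is truncated or that the two files do not belong together.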

In this example, we assemble a set of paired-end reads, "thao2000.1.fq" and "thao2000.2.fq", using the reference genome "Candidatus_Carsonella_ruddii_HT_Thao2000.fasta" to guide the assembly:

$ python3 go_metacompass.py -r tutorial/Candidatus_Carsonella_ruddii_HT_Thao2000.fasta -P tutorial/thao2000.1.fq,tutorial/thao2000.2.fq -o tutorial_example1

You will see the following standard output while running MetaCompass:

/cbcb/software/Linux-x86_64/packages/ncbi-blast-2.4.0+/bin/blastn
/cbcb/sw/RedHat-7-x86_64/users/vcepeda/local/kmer/kmer-mask
/cbcb/sw/RedHat-7-x86_64/users/vcepeda/local/mash/1.1.1/bin/mash
/cbcb/sw/RedHat-7-x86_64/common/local/Python3/common/3.6.0/bin/snakemake
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	1	assemble_unmapped
	1	bam_sort
	1	bowtie2_map
	1	build_contigs
	1	create_tsv
	1	join_contigs
	1	merge_reads
	1	pilon_contigs
	1	pilon_map
	1	sam_to_bam
	11
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	merge_reads
Selected jobs (1):
	merge_reads
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---merge fastq reads
Reason: Missing output files: tutorial_example1/thao2000.merged.fq

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 10.
1 of 11 steps (9%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	bowtie2_map
Selected jobs (1):
	bowtie2_map
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Build index .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.sam; Input files updated by another job: tutorial_example1/thao2000.merged.fq

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 8.
2 of 11 steps (18%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	build_contigs
Selected jobs (1):
	build_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Build contigs .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/contigs.fasta; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.sam

Skipped removing non-empty directory tutorial_example1/thao2000.0.assembly.out
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 5.
3 of 11 steps (27%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	pilon_map
Selected jobs (1):
	pilon_map
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Map reads for pilon polishing.
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.2.fq, tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.1.fq, tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/contigs.fasta

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 6.
4 of 11 steps (36%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (2):
	assemble_unmapped
	sam_to_bam
Selected jobs (1):
	sam_to_bam
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Convert sam to bam .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.bam; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 9.
5 of 11 steps (45%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (2):
	bam_sort
	assemble_unmapped
Selected jobs (1):
	bam_sort
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Sort bam .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/sorted.bam; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.bam

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 7.
6 of 11 steps (55%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (2):
	pilon_contigs
	assemble_unmapped
Selected jobs (1):
	pilon_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Pilon polish contigs .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/contigs.pilon.fasta; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/contigs.fasta, tutorial_example1/thao2000.0.assembly.out/sorted.bam

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 4.
7 of 11 steps (64%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	assemble_unmapped
Selected jobs (1):
	assemble_unmapped
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Assemble unmapped reads .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.megahit/final.contigs.fa; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.1.fq, tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.2.fq

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 3.
8 of 11 steps (73%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	join_contigs
Selected jobs (1):
	join_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---concanenate reference-guided and de novo contigs
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/contigs.final.fasta; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.megahit/final.contigs.fa, tutorial_example1/thao2000.0.assembly.out/contigs.pilon.fasta

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 1.
9 of 11 steps (82%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	create_tsv
Selected jobs (1):
	create_tsv
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---information reference-guided and de novo contigs
Reason: Missing output files: tutorial_example1/metacompass_summary.tsv; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/contigs.final.fasta, tutorial_example1/thao2000.0.assembly.out/thao2000.megahit/final.contigs.fa, tutorial_example1/thao2000.0.assembly.out/contigs.fasta, tutorial_example1/thao2000.0.assembly.out/contigs.pilon.fasta

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 2.
10 of 11 steps (91%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
	all
Selected jobs (1):
	all
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

localrule all:
    input: tutorial_example1/metacompass_summary.tsv
    jobid: 0
    reason: Input files updated by another job: tutorial_example1/metacompass_summary.tsv

Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 0.
11 of 11 steps (100%) done
unlocking
removing lock
removing lock
removed all locks
confirming file containing reference genomes exists..
[OK]
checking for dependencies (Bowtie2, Blast, kmermask, Snakemake, etc)
Bowtie2--->[OK]
Blast+--->[OK]
kmer-mask--->[OK]
mash--->[OK]
Snakemake--->[OK]
Cleaning up files..
MetaCompass finished succesfully!
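
If this output has been saved to a file, the progress lines and the final message can be checked without reading the whole log. The sketch below assumes a saved log file (the name `run.log` in the usage comment is hypothetical). The helper names are illustrative, and the patterns are taken from the output shown above rather than from any documented MetaCompass interface.

```shell
# Print the last "N of M steps (P%) done" line found in a saved log.
last_progress() {
    grep -E '^[0-9]+ of [0-9]+ steps \([0-9]+%\) done' "$1" | tail -n 1
}

# Succeed only if the log contains the final completion message.
# Only the leading words are matched, so the check does not depend on
# the exact spelling of the rest of the message.
run_finished() {
    grep -q 'MetaCompass finished' "$1"
}

# Example:
#   last_progress run.log
#   run_finished run.log && echo "run completed"
```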

The default output is written to the user-specified directory tutorial_example1:

$ ls -1 tutorial_example1
metacompass_logs
metacompass_output

The "metacompass_logs" folder contains an individual log file for each software run:

$ ls -1 tutorial_example1/metacompass_logs
thao2000.0.bowtie2map.log
thao2000.0.megahit.log
thao2000.0.pilon.map.log
thao2000.buildcontigs.log
thao2000.pilon.log
thao2000.samtools.log

The log files include alignment, reference-guided assembly, reference-guided error correction, and de novo assembly information. The bowtie2 log file in this example contains the read mapping information:

$ cat tutorial_example1/metacompass_logs/thao2000.0.bowtie2map.log
200000 reads; of these:
  200000 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    199189 (99.59%) aligned exactly 1 time
    811 (0.41%) aligned >1 times
100.00% overall alignment rate
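
The overall alignment rate can also be pulled out of this log programmatically, for example to flag runs where few reads map to the reference. This is a minimal sketch that relies on the summary format shown above. The function names and the 90 percent threshold in the usage comment are illustrative assumptions, not MetaCompass defaults.

```shell
# Print the overall alignment rate (e.g. "100.00%") from a bowtie2 summary log.
alignment_rate() {
    awk '/overall alignment rate/ { print $1 }' "$1"
}

# Succeed if the overall alignment rate is at least the given percentage.
# A log without a rate line is treated as 0%.
rate_at_least() {
    rate=$(alignment_rate "$1" | tr -d '%')
    awk -v r="$rate" -v m="$2" 'BEGIN { exit !((r + 0) >= (m + 0)) }'
}

# Example:
#   log=tutorial_example1/metacompass_logs/thao2000.0.bowtie2map.log
#   rate_at_least "$log" 90 || echo "low alignment rate: $(alignment_rate "$log")"
```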

"metacompass_output" contains the results, explained in "Analyzing MetaCompass results":

$ ls -1 tutorial_example1/metacompass_output
metacompass_assembly_stats.tsv
metacompass.final.ctg.fa
metacompass.genomes_coverage.txt
metacompass_summary.tsv
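
For a quick sanity check of the final assembly, the number of contigs and the total number of bases can be counted directly from the FASTA file. The sketch below is a generic FASTA summary, not a MetaCompass feature, and the function name `fasta_stats` is hypothetical. It treats every line starting with `>` as a contig header and every other line as sequence.

```shell
# Print the number of contigs and the total sequence length of a FASTA file.
fasta_stats() {
    awk '/^>/ { n++; next }
         { bases += length($0) }
         END { printf "contigs=%d bases=%d\n", n, bases }' "$1"
}

# Example:
#   fasta_stats tutorial_example1/metacompass_output/metacompass.final.ctg.fa
```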

The -k option keeps the intermediate outputs from MetaCompass. The intermediate_files folder contains files required for some downstream analyses. To keep intermediate outputs and redirect standard output and standard error to a file, modify the previous command as follows:

$ python3 go_metacompass.py -r tutorial/Candidatus_Carsonella_ruddii_HT_Thao2000.fasta -P tutorial/thao2000.1.fq,tutorial/thao2000.2.fq -o tutorial_example1_k -k > tutorial_example1.metacompass.log 2>&1

The results are the same as in the previous example, with the addition of the intermediate_files folder:

$ ls -1 tutorial_example1_k
intermediate_files
metacompass_logs
metacompass_output

The intermediate_files folder contains the following subfolders:

$ ls -1 tutorial_example1_k/intermediate_files
assembly_output
mapped_reads
pilon_output

The contents of each subfolder are:

$ ls tutorial_example1_k/intermediate_files/* 
tutorial_example1_k/intermediate_files/assembly_output:
contigs.fasta
selected_maps.sam
thao2000.sam

tutorial_example1_k/intermediate_files/mapped_reads:
thao2000.mc.sam
thao2000.mc.sam.bam

tutorial_example1_k/intermediate_files/pilon_output:
contigs.pilonBadCoverage.wig
contigs.pilon.changes
contigs.pilonChanges.wig
contigs.pilonClippedAlignments.wig
contigs.pilonCopyNumber.wig
contigs.pilonCoverage.wig
contigs.pilonDeltaCoverage.wig
contigs.pilonDipCoverage.wig
contigs.pilon.fasta
contigs.pilon.fasta.fai
contigs.pilonGC.wig
contigs.pilonPctBad.wig
contigs.pilonPhysicalCoverage.wig
contigs.pilonPilon.bed
contigs.pilonUnconfirmed.wig
contigs.pilon.vcf
contigs.pilonWeightedMq.wig
contigs.pilonWeightedQual.wig
sorted.bam
sorted.bam.bai
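
A script that consumes these intermediate files may want to verify that the expected subfolders are present before continuing. The following is a small sketch using the three subfolder names listed above. The function name `check_intermediate` is hypothetical, and the check only tests for the directories, not for their contents.

```shell
# Check that an output directory produced with -k contains the three
# intermediate subfolders. Returns non-zero if any of them is missing.
check_intermediate() {
    status=0
    for sub in assembly_output mapped_reads pilon_output; do
        if [ -d "$1/intermediate_files/$sub" ]; then
            echo "found: $sub"
        else
            echo "missing: $sub"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_intermediate tutorial_example1_k
```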