Reference guided assembly without reference selection
Test input data is available in the tutorial folder:
$ ls -1 tutorial
Candidatus_Carsonella_ruddii_HT_Thao2000.fasta
c_rudii.fastq
c_rudii_reference.fna
thao2000.1.fq
thao2000.2.fq
thao.fa
thao.fq
In this example, we assemble a set of paired-end reads, "thao2000.1.fq" and "thao2000.2.fq", using the reference genome "Candidatus_Carsonella_ruddii_HT_Thao2000.fasta" to guide the assembly:
$ python3 go_metacompass.py -r tutorial/Candidatus_Carsonella_ruddii_HT_Thao2000.fasta -P tutorial/thao2000.1.fq,tutorial/thao2000.2.fq -o tutorial_example1
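If the same call needs to be launched from a script, for example once per sample, the command line above can be assembled programmatically. The following is a minimal sketch, not part of MetaCompass: the helper name metacompass_command is hypothetical, and it uses only the -r, -P, -o and -k options that appear in this tutorial.

```python
import shlex

def metacompass_command(reference, reads_1, reads_2, outdir, keep_intermediate=False):
    """Build the argument list for the go_metacompass.py call shown above.

    Hypothetical helper for illustration; it only arranges the options
    used in this tutorial and does not run anything.
    """
    cmd = [
        "python3", "go_metacompass.py",
        "-r", reference,               # reference genome used to guide the assembly
        "-P", f"{reads_1},{reads_2}",  # paired-end reads, comma-separated
        "-o", outdir,                  # output directory
    ]
    if keep_intermediate:
        cmd.append("-k")               # keep intermediate files
    return cmd

cmd = metacompass_command(
    "tutorial/Candidatus_Carsonella_ruddii_HT_Thao2000.fasta",
    "tutorial/thao2000.1.fq",
    "tutorial/thao2000.2.fq",
    "tutorial_example1",
)
print(shlex.join(cmd))  # shell-quoted form of the command
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.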
You will see the following standard output while running MetaCompass:
/cbcb/software/Linux-x86_64/packages/ncbi-blast-2.4.0+/bin/blastn
/cbcb/sw/RedHat-7-x86_64/users/vcepeda/local/kmer/kmer-mask
/cbcb/sw/RedHat-7-x86_64/users/vcepeda/local/mash/1.1.1/bin/mash
/cbcb/sw/RedHat-7-x86_64/common/local/Python3/common/3.6.0/bin/snakemake
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 assemble_unmapped
1 bam_sort
1 bowtie2_map
1 build_contigs
1 create_tsv
1 join_contigs
1 merge_reads
1 pilon_contigs
1 pilon_map
1 sam_to_bam
11
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
merge_reads
Selected jobs (1):
merge_reads
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---merge fastq reads
Reason: Missing output files: tutorial_example1/thao2000.merged.fq
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 10.
1 of 11 steps (9%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
bowtie2_map
Selected jobs (1):
bowtie2_map
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Build index .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.sam; Input files updated by another job: tutorial_example1/thao2000.merged.fq
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 8.
2 of 11 steps (18%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
build_contigs
Selected jobs (1):
build_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Build contigs .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/contigs.fasta; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.sam
Skipped removing non-empty directory tutorial_example1/thao2000.0.assembly.out
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 5.
3 of 11 steps (27%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
pilon_map
Selected jobs (1):
pilon_map
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Map reads for pilon polishing.
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.2.fq, tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.1.fq, tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/contigs.fasta
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 6.
4 of 11 steps (36%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (2):
assemble_unmapped
sam_to_bam
Selected jobs (1):
sam_to_bam
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Convert sam to bam .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.bam; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 9.
5 of 11 steps (45%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (2):
bam_sort
assemble_unmapped
Selected jobs (1):
bam_sort
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Sort bam .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/sorted.bam; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.bam
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 7.
6 of 11 steps (55%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (2):
pilon_contigs
assemble_unmapped
Selected jobs (1):
pilon_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Pilon polish contigs .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/contigs.pilon.fasta; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/contigs.fasta, tutorial_example1/thao2000.0.assembly.out/sorted.bam
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 4.
7 of 11 steps (64%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
assemble_unmapped
Selected jobs (1):
assemble_unmapped
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---Assemble unmapped reads .
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/thao2000.megahit/final.contigs.fa; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.1.fq, tutorial_example1/thao2000.0.assembly.out/thao2000.mc.sam.unmapped.2.fq
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 3.
8 of 11 steps (73%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
join_contigs
Selected jobs (1):
join_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---concanenate reference-guided and de novo contigs
Reason: Missing output files: tutorial_example1/thao2000.0.assembly.out/contigs.final.fasta; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/thao2000.megahit/final.contigs.fa, tutorial_example1/thao2000.0.assembly.out/contigs.pilon.fasta
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 1.
9 of 11 steps (82%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
create_tsv
Selected jobs (1):
create_tsv
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
---information reference-guided and de novo contigs
Reason: Missing output files: tutorial_example1/metacompass_summary.tsv; Input files updated by another job: tutorial_example1/thao2000.0.assembly.out/contigs.final.fasta, tutorial_example1/thao2000.0.assembly.out/thao2000.megahit/final.contigs.fa, tutorial_example1/thao2000.0.assembly.out/contigs.fasta, tutorial_example1/thao2000.0.assembly.out/contigs.pilon.fasta
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 2.
10 of 11 steps (91%) done
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
all
Selected jobs (1):
all
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}
localrule all:
input: tutorial_example1/metacompass_summary.tsv
jobid: 0
reason: Input files updated by another job: tutorial_example1/metacompass_summary.tsv
Releasing 1 _cores (now 1).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 0.
11 of 11 steps (100%) done
unlocking
removing lock
removing lock
removed all locks
confirming file containing reference genomes exists..
[OK]
checking for dependencies (Bowtie2, Blast, kmermask, Snakemake, etc)
Bowtie2--->[OK]
Blast+--->[OK]
kmer-mask--->[OK]
mash--->[OK]
Snakemake--->[OK]
Cleaning up files..
MetaCompass finished succesfully!
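The output above is long, but two kinds of lines are enough to tell how far a run got: the "N of M steps (P%) done" progress lines and the final success message. As a rough sketch, assuming the output was saved to a text file, the function below (the name summarize_run is hypothetical) extracts both. The success message is matched exactly as the program prints it, including its spelling.

```python
import re

# Progress lines as they appear in the output above.
PROGRESS = re.compile(r"^(\d+) of (\d+) steps \(\d+%\) done$")
# Final message, spelled exactly as in the program's output.
FINISHED = "MetaCompass finished succesfully!"

def summarize_run(log_text):
    """Return (steps_done, steps_total, finished) for a saved run log."""
    steps_done, steps_total, finished = 0, 0, False
    for line in log_text.splitlines():
        line = line.strip()
        match = PROGRESS.match(line)
        if match:
            # Keep the most recent progress report.
            steps_done, steps_total = int(match.group(1)), int(match.group(2))
        elif line == FINISHED:
            finished = True
    return steps_done, steps_total, finished

# A few lines copied from the output above.
sample_log = """\
10 of 11 steps (91%) done
Finished job 0.
11 of 11 steps (100%) done
Cleaning up files..
MetaCompass finished succesfully!
"""
print(summarize_run(sample_log))  # (11, 11, True)
```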
The default output is written to the user-specified directory tutorial_example1:
$ ls -1 tutorial_example1
metacompass_logs
metacompass_output
The "metacompass_logs" folder contains an individual log file for each software run:
$ ls -1 tutorial_example1/metacompass_logs
thao2000.0.bowtie2map.log
thao2000.0.megahit.log
thao2000.0.pilon.map.log
thao2000.buildcontigs.log
thao2000.pilon.log
thao2000.samtools.log
The log files include alignment, reference-guided assembly, reference-guided error correction, and de novo assembly information. In this example, the bowtie2 log file contains the read mapping information:
$ cat tutorial_example1/metacompass_logs/thao2000.0.bowtie2map.log
200000 reads; of these:
200000 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
199189 (99.59%) aligned exactly 1 time
811 (0.41%) aligned >1 times
100.00% overall alignment rate
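To compare mapping rates across several runs, the summary above can be read into numbers. The sketch below is an illustration only: the function name parse_bowtie2_summary and the dictionary keys are assumptions, and the patterns cover only the unpaired layout shown in this log.

```python
import re

# One pattern per count line of the unpaired summary shown above.
PATTERNS = {
    "reads": r"^\s*(\d+) reads; of these:",
    "unaligned": r"^\s*(\d+) \([\d.]+%\) aligned 0 times",
    "unique": r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
    "multiple": r"^\s*(\d+) \([\d.]+%\) aligned >1 times",
}

def parse_bowtie2_summary(text):
    """Extract read counts and the overall alignment rate from the log text."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            stats[key] = int(match.group(1))
    match = re.search(r"([\d.]+)% overall alignment rate", text)
    if match:
        stats["overall_rate"] = float(match.group(1))
    return stats

# The log content from this example.
bowtie2_log = """\
200000 reads; of these:
200000 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
199189 (99.59%) aligned exactly 1 time
811 (0.41%) aligned >1 times
100.00% overall alignment rate
"""
stats = parse_bowtie2_summary(bowtie2_log)
print(stats)
```

For this log, the uniquely and multiply aligned reads (199189 + 811) account for all 200000 input reads.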
The "metacompass_output" folder contains the results, explained in "Analyzing MetaCompass results":
tutorial_example1/metacompass_output:
metacompass_assembly_stats.tsv
metacompass.final.ctg.fa
metacompass.genomes_coverage.txt
metacompass_summary.tsv
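Before starting downstream analysis, it can be useful to confirm that a run produced all four result files listed above. The following is a small sketch under that assumption; the names EXPECTED_OUTPUTS and missing_outputs are hypothetical and not part of MetaCompass.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED_OUTPUTS = (
    "metacompass_assembly_stats.tsv",
    "metacompass.final.ctg.fa",
    "metacompass.genomes_coverage.txt",
    "metacompass_summary.tsv",
)

def missing_outputs(run_dir):
    """Return the expected result files absent from <run_dir>/metacompass_output."""
    out_dir = Path(run_dir) / "metacompass_output"
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]

missing = missing_outputs("tutorial_example1")
if missing:
    print("missing result files:", ", ".join(missing))
else:
    print("all result files are present")
```

An empty list means the output folder is complete; a missing or renamed output directory reports all four files as missing.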
The -k option keeps intermediate outputs from MetaCompass. The intermediate_files folder contains files required for some downstream analyses. To keep intermediate outputs and redirect standard output and standard error to a file, modify the previous command as follows:
$ python3 go_metacompass.py -r tutorial/Candidatus_Carsonella_ruddii_HT_Thao2000.fasta -P tutorial/thao2000.1.fq,tutorial/thao2000.2.fq -o tutorial_example1_k -k > tutorial_example1.metacompass.log 2>&1
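The "> file 2>&1" redirection sends both standard output and standard error to a single file. When the pipeline is started from a Python script rather than a shell, the same effect can be sketched with the subprocess module. The helper name run_logged is hypothetical, and the command below is a stand-in that only prints to both streams; in practice the MetaCompass argument list would be passed instead.

```python
import os
import subprocess
import sys
import tempfile

def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr written to log_path (like `> log 2>&1`)."""
    with open(log_path, "w") as log:
        completed = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode

# Stand-in command for illustration; it writes one line to each stream.
demo_cmd = [
    sys.executable, "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
log_path = os.path.join(tempfile.mkdtemp(), "demo.metacompass.log")
rc = run_logged(demo_cmd, log_path)
print("exit code:", rc)
```

Because the two streams are buffered independently, their relative order in the log file is not guaranteed.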
The results are the same as in the previous example, with the addition of the intermediate_files folder:
$ ls -1 tutorial_example1_k
intermediate_files
metacompass_logs
metacompass_output
The intermediate_files folder contains the following subfolders:
$ ls -1 tutorial_example1_k/intermediate_files
assembly_output
mapped_reads
pilon_output
The contents of each subfolder are:
$ ls tutorial_example1_k/intermediate_files/*
tutorial_example1_k/intermediate_files/assembly_output:
contigs.fasta
selected_maps.sam
thao2000.sam
tutorial_example1_k/intermediate_files/mapped_reads:
thao2000.mc.sam
thao2000.mc.sam.bam
tutorial_example1_k/intermediate_files/pilon_output:
contigs.pilonBadCoverage.wig
contigs.pilon.changes
contigs.pilonChanges.wig
contigs.pilonClippedAlignments.wig
contigs.pilonCopyNumber.wig
contigs.pilonCoverage.wig
contigs.pilonDeltaCoverage.wig
contigs.pilonDipCoverage.wig
contigs.pilon.fasta
contigs.pilon.fasta.fai
contigs.pilonGC.wig
contigs.pilonPctBad.wig
contigs.pilonPhysicalCoverage.wig
contigs.pilonPilon.bed
contigs.pilonUnconfirmed.wig
contigs.pilon.vcf
contigs.pilonWeightedMq.wig
contigs.pilonWeightedQual.wig
sorted.bam
sorted.bam.bai
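Among these files, contigs.pilon.fasta.fai is a FASTA index in the samtools faidx layout: one tab-separated line per sequence, with the sequence name in the first column and its length in the second. That makes it a quick way to get polished contig lengths without reading the FASTA file itself. The sketch below assumes this layout; the function name read_fai and the example records are made up for illustration and are not taken from the tutorial data.

```python
def read_fai(text):
    """Map sequence name to length from the text of a .fai index file."""
    lengths = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Columns: name, length, offset, bases per line, bytes per line.
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths

# Hypothetical index content, not real output from this run.
example_fai = (
    "contig_1\t1200\t10\t60\t61\n"
    "contig_2\t800\t1240\t60\t61\n"
)
lengths = read_fai(example_fai)
print(lengths)
print("total bases:", sum(lengths.values()))
```

To use it on a real run, read the index file into a string first, for example with open(path).read().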