Welcome to the JUGGERNAUT ASSEMBLER Project

This ongoing project is based on our preliminary work in the paper [Efficient Parallel and Out-of-Core Algorithms for building Large Bi-directed de Bruijn Graphs]. Currently the assembler uses out-of-core and parallel techniques to build and compact de Bruijn graphs. This brief document describes how to build and run the programs we used for the experiments in the paper, and gives the details and logs of the benchmarks we reported. We first describe how to build and run our parallel de Bruijn graph construction algorithm, and then how to build and run our out-of-core de Bruijn graph assembler.

Building the Parallel De Bruijn Graph Program


---------------------------
1. STEP-1: [GET THE SOURCE] 
---------------------------

Download the tarball
[juggernaut-asm.tgz]

[OR] get a fresh copy from SourceForge using the following commands.

$ mkdir juggernaut-asm; cd juggernaut-asm
$ cvs -d:pserver:anonymous@juggernaut-asm.cvs.sourceforge.net:/cvsroot/juggernaut-asm co .


-----------------------------
2. STEP-2: [BUILD THE SOURCE]
-----------------------------

Build the programs 'par-debruijn-graph' and 'aluru-graph'. Use the following
commands to build the source.

2.1 Build the program par-debruijn-graph
----------------------------------------
$ cd parallel/ 
$ make clean
$ make par-debruijn-graph-nout

You should see the 'par-debruijn-graph' program.

2.2 Build Aluru's program
-------------------------
$ cd parallel/
$ make aluru-graph-no-ut

You should see the 'aluru-graph' program.


-----------------------------
3. STEP-3: [RUN THE PROGRAMS]
-----------------------------

Both programs are easy to use. Each takes the value of 'K' and an input
reads file. See the example reads file 'test_reads.5000.fa' in the code
directory. Use -dump < output_file.txt > if you want to dump the output.
You need to use mpirun to launch the programs. Running either program
without arguments prints its usage, as shown below.


linuxAltix juggernaut-asm/parallel> mpirun -np 1 ./aluru-graph 
USAGE: ./aluru-graph {K-SIZE} {READS-FILE} [-dump file]


linuxAltix juggernaut-asm/parallel> mpirun -np 1 ./par-debruijn-graph 
USAGE: ./par-debruijn-graph {K-SIZE} {READS-FILE} [-dump file]
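
As a concrete illustration of the usage lines above, the sketch below assembles an mpirun command line for either program. The helper name graph_cmd, the process count 4, and the dump file name graph_dump.txt are hypothetical choices made for this example; only the argument order and the -dump option are taken from the usage messages.

```shell
# Hypothetical helper: print (do not run) the mpirun command for one program.
# $1 = number of MPI processes   $2 = program name
# $3 = K                         $4 = reads file
# $5 = optional file for -dump
graph_cmd() {
    cmd="mpirun -np $1 ./$2 $3 $4"
    if [ -n "${5:-}" ]; then
        cmd="$cmd -dump $5"
    fi
    echo "$cmd"
}

# Example: 4 processes (an arbitrary choice), K = 21, bundled example reads.
graph_cmd 4 par-debruijn-graph 21 test_reads.5000.fa graph_dump.txt
```

The printed line can be pasted at the prompt in the juggernaut-asm/parallel directory. Passing aluru-graph as the second argument produces the matching command for Aluru's program on the same input.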

Benchmarks and Test Cases

We used short reads of length 36 from a plant genome. The benchmark input of 8 million short reads can be downloaded here (plant-genome_8M_reads.tgz). The memory and run-time logs for both programs are available at mem-logs and run-time-logs respectively. The file names are indexed with the parameters of the test. For example, the file http://trinity.engr.uconn.edu/~vamsik/ParBiDirected/mem-test/test.8388608.fa.AL.8.21.mem records the memory usage of Aluru's program, and the corresponding memory usage of our program is recorded at http://trinity.engr.uconn.edu/~vamsik/ParBiDirected/mem-test/test.8388608.fa.VK.8.21.mem.
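
The naming scheme of the log files can be taken apart with a few lines of shell. This is only a sketch: parse_bench_name is a hypothetical helper, reading the first number as the read count is an assumption based on the 8 million read input, and the two numbers after AL/VK are reported as opaque test parameters because their meaning is not spelled out here.

```shell
# Hypothetical helper: split a benchmark log name into its dot-separated fields.
# Example input: test.8388608.fa.AL.8.21.mem (a full URL or path also works).
parse_bench_name() {
    name=${1##*/}            # drop any directory or URL prefix
    old_ifs=$IFS
    IFS=.
    set -- $name             # split the name on '.'
    IFS=$old_ifs
    # Fields: $1=test $2=read count (assumed) $3=fa $4=AL|VK $5,$6=parameters $7=log kind
    case $4 in
        AL) prog="aluru-graph" ;;          # Aluru's program
        VK) prog="par-debruijn-graph" ;;   # our program
        *)  prog="unknown" ;;
    esac
    echo "reads=$2 program=$prog p1=$5 p2=$6 kind=$7"
}
```

Calling parse_bench_name on the two example file names above makes it easy to pair each AL log with the VK log taken under the same parameters.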

Using our Out-of-Core algorithm in the de novo assembly flow

Our program acts as a de Bruijn graph preprocessor: it can take a massive number of reads and build and compact the de Bruijn graph. Since the number of nodes of the de Bruijn graph is proportional to the number of repeats in the genome, which is much smaller than the number of reads, we believe that the major bottleneck in assemblers is building the graph, compacting it, and removing errors from the reads. Once this is done, the graph fits in memory and you can run any assembler. The following steps let you build and compact the de Bruijn graph and export it into Velvet's representation.


---------------------------
1. STEP-1: [GET THE SOURCE] 
---------------------------

Get a fresh copy from SourceForge using the following commands.

$ mkdir juggernaut-asm; cd juggernaut-asm
$ cvs -d:pserver:anonymous@juggernaut-asm.cvs.sourceforge.net:/cvsroot/juggernaut-asm co .


-----------------------------
2. STEP-2: [BUILD THE SOURCE]
-----------------------------

2.1 Build the program build_simple_velvet_graph
-----------------------------------------------
$ make clean
$ make build_simple_velvet_graph 

You should see the 'build_simple_velvet_graph' program.

2.2 Build the program to convert our graph to Velvet graph 
----------------------------------------------------------
$ make convert_dbgraph_to_velvet

You should see the 'convert_dbgraph_to_velvet' program.


-----------------------------
3. STEP-3: [RUN THE PROGRAM]
-----------------------------

$ ./build_simple_velvet_graph 
[main]: USAGE: ./{exe} {K} {READ-FILE} [-trace_reads]



--------------------------------------------------
4. Example on 5000 reads (file test_reads.5000.fa) 
--------------------------------------------------

Run the out-of-core graph construction program with K = 21.

$[vamsik-desktop:/media/Iomega_HDD/juggernaut-asm-main_branch] ./build_simple_velvet_graph 21 test_reads.5000.fa 
All messages will be logged in file test_reads.5000.fa.log

Find the names of the nodes file and the edges file at the end of the log.

$[vamsik-desktop:/media/Iomega_HDD/juggernaut-asm-main_branch] tail  test_reads.5000.fa.log
==================
PRUNED NODES = 2
UN PRUNED NODES = 4988
TOTAL NODES = 4990
SELF NODE TEST PASSED
[RemoveNodes] total_nodes_size = 218304, pruned_size=88, unpruned_size=218216
[RemoveNodes] : UNIT_TEST_NODE_PRUNE : PASSED
NODE COUNT = 4988
NODE FILE = ./bg.no7WgHIp
EDGES FILE = ./bg.ed7DD8TB

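Convert the binary representation of the de Bruijn graph into Velvet's ASCII representation with the program convert_dbgraph_to_velvet. You should then see the file Graph, which is the de Bruijn graph in Velvet's format. You can feed this graph into Velvet (velvetg) to complete the assembly. The node and edge file names in the example command below differ from those in the log excerpt above; use the names reported in your own log.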

$[vamsik-desktop:/media/Iomega_HDD/juggernaut-asm-main_branch] ./convert_dbgraph_to_velvet ./bg.noPE5uEW ./bg.edB5K8qr 5000 21
SIMPLIFY_NODES = 12 
TOTAL_NODES = 4988 
The nodes file has 4987 comparison success's
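
The two steps above can be chained by reading the file names out of the log instead of copying them by hand. This is a minimal sketch, assuming the log keeps the 'NODE FILE = ...' and 'EDGES FILE = ...' line format shown above; log_value is a hypothetical helper and not part of the project.

```shell
# Hypothetical helper: print the value of the last "KEY = value" line in a log.
# $1 = key, e.g. "NODE FILE"    $2 = log file
log_value() {
    sed -n "s/^$1 = //p" "$2" | tail -n 1
}

# Intended use with the log from the 5000-read example (left commented out so
# that this snippet only defines the helper):
#   nodes=$(log_value "NODE FILE"  test_reads.5000.fa.log)
#   edges=$(log_value "EDGES FILE" test_reads.5000.fa.log)
#   ./convert_dbgraph_to_velvet "$nodes" "$edges" 5000 21
```

Taking the last matching line guards against a log that mentions the same key more than once. The trailing arguments 5000 and 21 are copied unchanged from the example command above.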