Table 1 Running times in minutes for Fastbreak, Fastbreak on Hadoop on a 9-server cluster, and BreakDancer

From: Fastbreak: a tool for analysis and visualization of structural variations in genomic data

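| BAM file    | Fastbreak (both passes) | Fastbreak on Hadoop (pass 1 + pass 2) | BreakDancer |
|-------------|-------------------------|---------------------------------------|-------------|
| 9 GB tumor  | 80                      | 4 + 25                                | 785         |
| 20 GB tumor | 91                      | 8 + 40                                | 812         |
| 40 GB blood | 163                     | 9 + 110                               | 449         |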
  1. Hadoop running times are dominated by the time the longest-running reducer takes to finish, so most of the cluster sits idle for most of the run; that idle capacity allows greater throughput when many files are processed. BreakDancer running times appear to scale with the number of abnormal reads rather than with file size: it runs faster on the larger "blood" files than on the smaller "tumor" files.
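
As a rough check on the footnote, the following sketch totals the two Hadoop passes for each row of the table and compares the sum with single-machine Fastbreak. The dictionary keys and function names are illustrative only and are not part of Fastbreak; the running times are copied from the table.

```python
# Running times in minutes, copied from Table 1.
# Keys and helper names are hypothetical, chosen for this illustration.
RUNS = {
    "9 GB tumor": {"fastbreak": 80, "hadoop_pass1": 4, "hadoop_pass2": 25, "breakdancer": 785},
    "20 GB tumor": {"fastbreak": 91, "hadoop_pass1": 8, "hadoop_pass2": 40, "breakdancer": 812},
    "40 GB blood": {"fastbreak": 163, "hadoop_pass1": 9, "hadoop_pass2": 110, "breakdancer": 449},
}


def hadoop_total(run):
    """Total Hadoop wall-clock time: pass 1 plus pass 2."""
    return run["hadoop_pass1"] + run["hadoop_pass2"]


def hadoop_speedup(run):
    """Ratio of single-machine Fastbreak time to total Hadoop time."""
    return run["fastbreak"] / hadoop_total(run)


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(
            f"{name}: Hadoop total {hadoop_total(run)} min, "
            f"speedup over Fastbreak {hadoop_speedup(run):.2f}x"
        )
```

On these numbers the Hadoop runs come out roughly 2.8, 1.9, and 1.4 times faster than single-machine Fastbreak. That is well short of the nine-fold gain that perfectly even use of a 9-server cluster would give, which is consistent with the longest reducer dominating the run.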