
A new major version of Dorado was released 3 weeks ago, and it came with a new DNA basecalling model: dna_r10.4.1_e8.2_400bps_hac@v6.0.0. However, ONT released a new hac model only, with no new sup model.1 This leaves users with an interesting choice: use this year’s hac model (hac@v6.0.0) or last year’s sup model (sup@v5.2.0)?

ONT introduced hac@v6.0.0 in their London Calling 2026 tech talk.2 Here is a quote from that presentation:

Importantly, we’re not releasing a sup model this time. And the reason that we’re doing that, is that we believe that this particular model is good enough that a sup tier is no longer needed.

That’s a big claim! In previous releases, sup has been much better than hac, both for read-level and assembly-level accuracy. So I was sceptical that this new hac model could really be as good as ONT claimed. This blog post aims to test that.

Model architectures

The hac@v6.0.0 model has some architectural differences to its hac@v5.2.0 predecessor. The biggest change is in the recurrent stack: hac@v5.2.0 uses seven standard 384-wide LSTM layers, while hac@v6.0.0 uses seven non-standard (custom-designed by ONT) recurrent layers. These hac@v6.0.0 layers have a wider 1024-dimensional input/output size, but a smaller 128-dimensional internal size, which helps keep the parameter count in check. Overall, the new model is only slightly larger: 8,790,768 parameters for hac@v5.2.0 versus 9,808,240 for hac@v6.0.0.

The sup@v5.2.0 model uses a very different architecture based on 18 transformer layers. It is much larger at 78,718,162 parameters.
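To put the parameter counts in context, here is a rough back-of-envelope sketch. It assumes the hac@v5.2.0 recurrent layers follow the standard PyTorch LSTM convention (four gates, each with an input weight matrix, a recurrent weight matrix and two bias vectors). That is an assumption, not something confirmed from Dorado's model files. Under it, the seven 384-wide LSTM layers would account for roughly 94% of hac@v5.2.0's parameters. The custom hac@v6.0.0 layers aren't modelled because their internal structure isn't described here.

```python
# Back-of-envelope parameter arithmetic for the basecalling models.
# ASSUMPTION: standard PyTorch-style LSTM layers (4 gates; input weights,
# recurrent weights and two bias vectors per gate). Not confirmed for Dorado.

def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Parameter count of one standard LSTM layer (PyTorch convention)."""
    weights = 4 * hidden_size * (input_size + hidden_size)
    biases = 2 * 4 * hidden_size
    return weights + biases

# hac@v5.2.0: seven 384-wide LSTM layers (input and hidden size both 384).
lstm_stack = 7 * lstm_layer_params(384, 384)

# Total parameter counts as reported in the text.
totals = {
    "hac@v5.2.0": 8_790_768,
    "hac@v6.0.0": 9_808_240,
    "sup@v5.2.0": 78_718_162,
}

if __name__ == "__main__":
    share = lstm_stack / totals["hac@v5.2.0"]
    print(f"7 x LSTM(384): {lstm_stack:,} parameters ({share:.0%} of hac@v5.2.0)")
    growth = totals["hac@v6.0.0"] / totals["hac@v5.2.0"] - 1
    print(f"hac@v6.0.0 vs hac@v5.2.0: {growth:+.1%} parameters")
    for hac in ("hac@v5.2.0", "hac@v6.0.0"):
        ratio = totals["sup@v5.2.0"] / totals[hac]
        print(f"sup@v5.2.0 has {ratio:.1f}x the parameters of {hac}")
```

So hac@v6.0.0 is about 12% larger than hac@v5.2.0, while sup@v5.2.0 is roughly 8 to 9 times the size of either hac model.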

I also took a quick look at the modified-base models: hac@v5.2.0_4mC_5mC@v1, hac@v6.0.0_4mC_5mC@v1, sup@v5.2.0_4mC_5mC@v1, hac@v5.2.0_6mA@v1, hac@v6.0.0_6mA@v1 and sup@v5.2.0_6mA@v1. These have ~3.2 million parameters each and they all have essentially the same architecture.

Methods

In this post, I tested the following DNA basecalling models: hac@v5.2.0, hac@v6.0.0 and sup@v5.2.0. Comparing hac@v5.2.0 and hac@v6.0.0 shows the hac-level improvement in this generation. But the more interesting comparison (to me) is hac@v6.0.0 versus sup@v5.2.0, since those are the two contenders for best available model today.

I mostly followed the same methods as I used in last year’s analysis of Dorado v1. Briefly, I used the five genomes from the Autocycler paper (each from a different bacterial species), aligned reads to the curated reference genomes to measure read accuracy, produced non-overlapping read subsets to increase my sample count, assembled each subset with Autocycler and counted assembly errors.
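For readers unfamiliar with alignment-based read accuracy, here is a minimal sketch of one common approach. It is not necessarily the exact method used in this analysis. It assumes minimap2-style PAF output, where column 10 is the number of matching bases and column 11 is the alignment block length (including gaps), and it keeps the alignment with the most matching bases for each read. The function name is hypothetical. Reads that fail to align don't appear in a PAF file at all, so they would need separate handling.

```python
def paf_read_identities(paf_lines):
    """Map each read name to the identity of its best alignment.

    ASSUMPTION: minimap2-style PAF lines, where column 1 is the read name,
    column 10 the number of matching bases and column 11 the alignment
    block length. 'Best' here means the alignment with the most matches.
    """
    best = {}  # read name -> (matches, block_length)
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name, matches, block_len = fields[0], int(fields[9]), int(fields[10])
        if name not in best or matches > best[name][0]:
            best[name] = (matches, block_len)
    # Identity = matching bases / alignment block length.
    return {name: m / b for name, (m, b) in best.items()}


if __name__ == "__main__":
    example = [
        "read1\t1000\t0\t1000\t+\tchr\t5000\t100\t1100\t980\t1000\t60",
        "read1\t1000\t0\t300\t+\tchr\t5000\t200\t500\t250\t300\t0",
        "read2\t500\t0\t500\t-\tchr\t5000\t0\t500\t490\t500\t60",
    ]
    print(paf_read_identities(example))
```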

A couple of notable differences from last year's methods:

  • I included --emit-moves and modified bases (6mA,4mC_5mC) in my basecalling commands. This was to allow for polishing and methylation analyses (stay tuned for those in future posts).
  • Instead of subsampling to a depth of 50×, I used a depth of 30×. This gave me more subsampled read sets per genome (10 instead of 6), and the lower depth was intended to cause more assembly errors, hopefully making differences between basecalling models easier to see.
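Non-overlapping subsets can be produced by shuffling reads and filling one subset at a time until it reaches the target depth. The sketch below assumes read lengths are already available as a name-to-length mapping, and the function name is hypothetical; it illustrates the idea rather than reproducing the exact procedure used here. Note that 10 subsets at 30× and 6 subsets at 50× both consume 300× of reads, so the change in depth trades per-assembly depth for sample count without needing more data.

```python
import random


def non_overlapping_subsets(read_lengths, genome_size, depth, seed=0):
    """Partition reads into disjoint subsets, each with >= depth * genome_size bases.

    read_lengths: dict mapping read name -> read length (hypothetical input).
    Reads left over after the last full subset are discarded.
    """
    target = depth * genome_size
    names = sorted(read_lengths)          # deterministic order before shuffling
    random.Random(seed).shuffle(names)    # seeded shuffle for reproducibility
    subsets, current, current_bases = [], [], 0
    for name in names:
        current.append(name)
        current_bases += read_lengths[name]
        if current_bases >= target:       # subset reached target depth
            subsets.append(current)
            current, current_bases = [], 0
    return subsets


if __name__ == "__main__":
    lengths = {f"r{i}": 1000 for i in range(100)}
    subsets = non_overlapping_subsets(lengths, genome_size=1000, depth=30)
    print(len(subsets), [len(s) for s in subsets])
```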

Speed performance

Basecalling was done on an H100 GPU on the University of Melbourne’s Spartan cluster using Dorado v2.0.0. This run (which contained more samples than just the analysed genomes) was on a PromethION flowcell with ~132 Gbp total yield.

Model         Parameters    Time (h:m)    Speed (samples/sec)
hac@v5.2.0     8,790,768    8:04          6.25×10⁷
hac@v6.0.0     9,808,240    7:48          6.47×10⁷
sup@v5.2.0    78,718,162    21:33         2.34×10⁷

These speeds are noticeably slower than last year's results, probably because I included modified-base calling this time. Interestingly, hac@v6.0.0 was slightly faster than hac@v5.2.0, possibly due to Dorado optimisations for the new architecture. However, my basecalling was done on a shared HPC node, so these timings may not be very precise.
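A quick sanity check on the speed table: multiplying each speed by its wall-clock time gives the total number of signal samples processed. Since every model basecalled the same run, the three products should agree, and they do (about 1.8×10¹² samples each). The same numbers show hac@v6.0.0 running about 2.8× faster than sup@v5.2.0. The figures are copied from the table; the helper names are just for illustration.

```python
# Figures copied from the speed table: (wall-clock time "h:mm", samples/sec).
RUNS = {
    "hac@v5.2.0": ("8:04", 6.25e7),
    "hac@v6.0.0": ("7:48", 6.47e7),
    "sup@v5.2.0": ("21:33", 2.34e7),
}


def hm_to_seconds(hm: str) -> int:
    """Convert an 'h:mm' wall-clock string to seconds."""
    hours, minutes = hm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def implied_samples(hm: str, speed: float) -> float:
    """Total samples processed = speed (samples/s) x wall-clock time (s)."""
    return speed * hm_to_seconds(hm)


if __name__ == "__main__":
    for model, (hm, speed) in RUNS.items():
        print(f"{model}: {implied_samples(hm, speed):.3g} samples")
    ratio = RUNS["hac@v6.0.0"][1] / RUNS["sup@v5.2.0"][1]
    print(f"hac@v6.0.0 is {ratio:.1f}x faster than sup@v5.2.0")
```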

Read accuracy

The violin plots below show the read accuracy distributions (higher is better).3 The line inside each violin indicates the median.

Dorado v2 read accuracy

Comparing hac@v5.2.0 and hac@v6.0.0, median accuracy increased from Q17.2 (98.09%) to Q18.1 (98.46%), which corresponds to ~19% fewer read errors. In the tech talk, ONT said that hac@v6.0.0 ‘provides high single-molecule accuracy of Q23.5’, and they showed a distribution with a dotted line at Q23.5 (it wasn't clear whether this was a mean or a median) and a peak at ~Q24. My distribution is much lower, with a mode (peak) at around Q19.7.

Comparing hac@v6.0.0 and sup@v5.2.0, the difference is clear. The median accuracy for sup@v5.2.0 is Q20.6 (99.13%), i.e. sup@v5.2.0 has ~43% fewer read errors than hac@v6.0.0. So I can say with confidence that at the read level, the new hac model does not render sup obsolete.
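The Q-score and percentage figures above are linked by the Phred scale, Q = −10·log₁₀(error rate), and "fewer errors" compares error rates rather than accuracies. The small sketch below reproduces the conversions from the rounded median accuracies in the text. With these rounded inputs the comparisons come out at about 19.4% and 43.5% fewer errors, so small differences from the quoted figures are just rounding. It also shows that ONT's Q23.5 corresponds to roughly 99.55% accuracy.

```python
import math


def q_to_accuracy(q: float) -> float:
    """Phred-scaled Q-score to accuracy, using Q = -10 * log10(error rate)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_q(accuracy: float) -> float:
    """Accuracy (0-1) to Phred-scaled Q-score."""
    return -10 * math.log10(1 - accuracy)


def fewer_errors(old_accuracy: float, new_accuracy: float) -> float:
    """Fractional reduction in error rate when going from old to new accuracy."""
    return 1 - (1 - new_accuracy) / (1 - old_accuracy)


if __name__ == "__main__":
    # Rounded median accuracies quoted in the text.
    medians = {"hac@v5.2.0": 0.9809, "hac@v6.0.0": 0.9846, "sup@v5.2.0": 0.9913}
    for model, acc in medians.items():
        print(f"{model}: {acc:.2%} = Q{accuracy_to_q(acc):.1f}")
    for old, new in (("hac@v5.2.0", "hac@v6.0.0"), ("hac@v6.0.0", "sup@v5.2.0")):
        print(f"{old} -> {new}: {fewer_errors(medians[old], medians[new]):.1%} fewer errors")
    print(f"Q23.5 = {q_to_accuracy(23.5):.2%}")
```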

One other finding: hac@v6.0.0 seems to do slightly better than sup@v5.2.0 on low-accuracy reads. This is apparent from the lower peak of the bimodal distribution: for reads below Q10, hac@v6.0.0 has a modal accuracy of Q8.8, while sup@v5.2.0's is Q8.3. This probably doesn't matter for most users, since most pipelines include a QC step that discards low-Q reads, but it is interesting nonetheless.
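Locating the peak of one mode in a bimodal distribution can be done by restricting to a Q range and finding the fullest histogram bin. This is a crude histogram-based sketch with a hypothetical function name. Violin plots are usually drawn from a kernel density estimate, so the modes read off a plot may differ slightly from what this returns, and the result depends on the bin width.

```python
import math
from collections import Counter


def modal_q(qscores, max_q=None, bin_width=0.1):
    """Estimate the mode of a Q-score distribution from a fixed-width histogram.

    If max_q is given, only scores below it are used (e.g. max_q=10 to
    isolate the low-accuracy peak). Returns the centre of the fullest bin.
    """
    values = [q for q in qscores if max_q is None or q < max_q]
    if not values:
        raise ValueError("no Q-scores in the requested range")
    counts = Counter(math.floor(q / bin_width) for q in values)
    fullest_bin = max(counts, key=counts.get)
    return (fullest_bin + 0.5) * bin_width


if __name__ == "__main__":
    # Made-up scores: a small low-Q peak plus a larger high-Q peak.
    scores = [8.83] * 5 + [8.27] * 3 + [19.7] * 10
    print(f"low-peak mode ~ Q{modal_q(scores, max_q=10):.2f}")
    print(f"overall mode ~ Q{modal_q(scores):.2f}")
```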

Assembly accuracy

While read-level accuracy is nice, I’m personally more interested in assembly-level accuracy. For my work, I would prefer a model that produces the best consensus sequences, even if its reads are less accurate.

The boxplots below show the number of assembly errors (lower is better). The line in each box shows the median, and whiskers span the full range.

Dorado v2 assembly accuracy

Overall, hac@v6.0.0 seems to do better than hac@v5.2.0, with the median errors per assembly dropping from 25.5 to 11. But sup@v5.2.0 is still clearly the best with a median of only 4 errors per assembly.

However, the top whisker of the hac@v6.0.0 boxplot extends much higher, i.e. for some samples, hac@v6.0.0 produced the worst assembly. To investigate, the boxplots below show the same data but partitioned by species.
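Both sets of boxplots summarise the same per-assembly error counts, grouped either by model alone or by model and species. With five genomes and ten subsets per genome, each model has 50 assemblies, so the pooled median is the mean of the 25th and 26th values, which is how a half-integer median such as 25.5 can arise. The sketch below computes the (min, median, max) values for full-range whiskers. The record format is a hypothetical assumption, and the example data are made up.

```python
from collections import defaultdict
from statistics import median


def box_stats(records, by_species=False):
    """Summarise assembly error counts as (min, median, max) per group.

    records: iterable of (model, species, error_count) tuples (hypothetical format).
    Full-range whiskers means min and max rather than 1.5 x IQR limits.
    """
    groups = defaultdict(list)
    for model, species, errors in records:
        key = (model, species) if by_species else model
        groups[key].append(errors)
    return {key: (min(v), median(v), max(v)) for key, v in groups.items()}


if __name__ == "__main__":
    # Made-up example data, not results from this analysis.
    records = [
        ("hac@v6.0.0", "species_A", 10), ("hac@v6.0.0", "species_A", 12),
        ("hac@v6.0.0", "species_K", 100), ("hac@v6.0.0", "species_K", 90),
        ("sup@v5.2.0", "species_A", 3), ("sup@v5.2.0", "species_K", 5),
    ]
    print(box_stats(records))
    print(box_stats(records, by_species=True))
```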

Dorado v2 assembly accuracy by species

This shows that while hac@v6.0.0 did better than hac@v5.2.0 on most genomes, it did much worse on the Klebsiella genome, with ~100 errors per assembly. I don't understand the nature of these errors – they seem to occur mostly in GCT motifs, and modified bases (at least the ones Dorado can detect) do not occur at the error sites. So it remains a mystery to me why hac@v6.0.0 struggles with this Klebsiella genome.
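One way to quantify a motif association like this is to check what fraction of error positions fall inside occurrences of the motif on either strand (GCT on the forward strand appears as AGC on the reverse). This is a minimal sketch with hypothetical helper names. It assumes error positions are already available as 0-based reference coordinates, and it treats the sequence as linear, so motifs spanning the end and start of a circular bacterial chromosome would be missed.

```python
def motif_positions(seq, motif):
    """Return the set of 0-based positions covered by motif on either strand."""
    complement = str.maketrans("ACGT", "TGCA")
    reverse_complement = motif.translate(complement)[::-1]
    covered = set()
    for m in {motif, reverse_complement}:
        start = seq.find(m)
        while start != -1:
            covered.update(range(start, start + len(m)))
            start = seq.find(m, start + 1)  # allow overlapping occurrences
    return covered


def fraction_in_motif(seq, error_positions, motif="GCT"):
    """Fraction of error positions lying within motif occurrences (0.0 if none)."""
    if not error_positions:
        return 0.0
    covered = motif_positions(seq, motif)
    hits = sum(1 for p in error_positions if p in covered)
    return hits / len(error_positions)


if __name__ == "__main__":
    seq = "AAGCTAAAAAGCAAAA"
    print(sorted(motif_positions(seq, "GCT")))
    print(fraction_in_motif(seq, [3, 10, 14, 0]))
```

Comparing this fraction against the fraction of the whole genome covered by the motif would indicate whether errors are enriched in GCT motifs or just reflect how common the motif is.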

Discussion and conclusions

Sadly, my findings do not back up ONT’s ‘a sup tier is no longer needed’ claim. At both the read level and assembly level, sup@v5.2.0 was clearly better than hac@v6.0.0. So my recommendation for anyone using ONT for microbial WGS is to stick with sup@v5.2.0 if they can.

I don’t know whether ONT skipped sup for this release only, or whether they will not be producing any more sup models going forward. My colleagues and I almost exclusively use sup basecalling, since we have the computational power to do so in real time.4 I have also spoken to many collaborators in microbial genomics who do the same. Not everyone who works with ONT is limited by basecalling compute, so I don’t want to see sup go away simply because some users (e.g. high-volume human WGS) can’t afford to use it.

So here is my plea to ONT: do not abandon sup basecalling! I would love to see a sup@v6.0.0 model become available in the near future, even if it only brought marginal improvements over sup@v5.2.0. Even better would be R&D to further improve sup accuracy (e.g. new model architectures) as has occurred with hac.

Final thought: although my hac@v6.0.0 assemblies were worse than my sup@v5.2.0 assemblies, polishing may be able to correct the errors. ONT claims hac@v6.0.0 performs well for variant calling, and assembly polishing is conceptually quite similar (call variants and then apply them). So in my next post, I’ll run dorado polish on these assemblies to see if it can close the gap between hac@v6.0.0 and sup@v5.2.0.
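The "call variants and then apply them" idea can be illustrated with a toy function that applies substitutions and indels to a sequence. This is only a conceptual sketch of that second step, with a hypothetical function name and variant format; it is not how dorado polish works internally. Variants are applied from right to left so that earlier coordinates remain valid after indels change the sequence length, and overlapping variants are not handled.

```python
def apply_variants(seq, variants):
    """Apply (pos, ref, alt) variants to seq, with 0-based positions on seq.

    Variants are applied right to left so indels don't shift the coordinates
    of variants still to be applied. Overlapping variants are not supported.
    """
    for pos, ref, alt in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele {ref!r} does not match sequence at {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


if __name__ == "__main__":
    draft = "ACGTACGT"
    calls = [(1, "C", "T"), (4, "A", "AG"), (6, "GT", "G")]  # SNP, insertion, deletion
    print(apply_variants(draft, calls))
```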

Footnotes

  1. There is also no fast@v6.0.0 model, but that doesn’t bother me. 

  2. Mike Vella’s segment begins at the 32:00 mark; he starts discussing the hac@v6.0.0 model at 36:30, and the quoted part is at 39:00. 

  3. As I have mentioned before, ONT reads have a curious bimodal distribution with a low peak around Q5–Q10. I don’t know what causes this, and it’s often hidden by read QC which can discard low-Q reads. But if you include low-Q reads in your analysis (as I did here), it’s quite apparent. 

  4. OnION is now 2.5 years old but still going strong! It can comfortably do real-time sup basecalling for multiple MinIONs at once.