DomClust is an effective tool for orthologous grouping across multiple genomes, which is a crucial first step in large-scale comparative genomics. The method takes all-against-all similarity data as input and classifies genes using UPGMA, a traditional hierarchical clustering algorithm. During clustering, the method detects domain fusion and fission events and, where required, splits clusters into domains. A subsequent procedure splits the resulting trees so that intra-species paralogous genes are placed in different groups, yielding plausible orthologous groups. As a result, genes are split only into the domains minimally required for ortholog grouping. DomClust outputs a set of hierarchical clustering trees, which may overlap with each other. These overlapping trees, represented in the logo above, result from domain fusion/fission events and are the salient feature of the DomClust program. When compared with several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, DomClust generally showed better agreement with the COG classification. By comparing clustering results generated from datasets of different releases, we also found that DomClust showed relatively good stability compared with the BBH-based methods.
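The two core steps described above (UPGMA clustering of all-against-all similarities, then splitting trees so that intra-species paralogs fall into different groups) can be illustrated with the small Python sketch below. It is not DomClust's implementation. It omits domain fusion/fission detection entirely, and it makes several assumptions of its own: gene IDs follow a hypothetical `species:gene` convention, similarities are given as a dict of symmetric pair scores where higher means more similar, pairs with no positive similarity are never merged (so the result is a forest), and the helper names `upgma`, `split_paralogs`, `leaves` and `species_of` are hypothetical. The paralog-splitting rule used here is a deliberately simplified one: keep the largest subtrees in which no species occurs twice.

```python
from itertools import combinations


def species_of(gene):
    # Hypothetical ID convention "species:gene", e.g. "eco:a".
    return gene.split(":", 1)[0]


def leaves(node):
    # Leaves are gene IDs (str); internal nodes are (left, right, similarity).
    if isinstance(node, str):
        return [node]
    left, right, _ = node
    return leaves(left) + leaves(right)


def upgma(genes, sim):
    """UPGMA (average linkage) on similarities; returns a list of trees."""
    clusters = {i: (g, 1) for i, g in enumerate(genes)}  # id -> (node, size)
    s = {}  # (smaller id, larger id) -> cluster-cluster similarity
    for i, j in combinations(range(len(genes)), 2):
        s[(i, j)] = sim.get(frozenset((genes[i], genes[j])), 0.0)
    next_id = len(genes)
    while s:
        (i, j), best = max(s.items(), key=lambda kv: kv[1])
        if best <= 0:
            break  # nothing similar left: remaining clusters stay separate trees
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        # UPGMA update: size-weighted average of the merged clusters' similarities
        for k in clusters:
            sik = s[(min(i, k), max(i, k))]
            sjk = s[(min(j, k), max(j, k))]
            s[(k, next_id)] = (si * sik + sj * sjk) / (si + sj)
        s = {p: v for p, v in s.items() if i not in p and j not in p}
        clusters[next_id] = ((ni, nj, best), si + sj)
        next_id += 1
    return [node for node, _ in clusters.values()]


def split_paralogs(node):
    """Return maximal subtrees in which no species appears twice (simplified rule)."""
    genes = leaves(node)
    species = [species_of(g) for g in genes]
    if len(set(species)) == len(species):
        return [genes]  # no intra-species paralogs: keep as one group
    left, right, _ = node  # a single leaf always passes the check above
    return split_paralogs(left) + split_paralogs(right)


# Toy data: two paralog families shared by "eco" and "bsu", plus one "hin" gene.
example_genes = ["eco:a", "eco:b", "bsu:a", "bsu:b", "hin:a"]
example_sim = {
    frozenset(("eco:a", "bsu:a")): 90, frozenset(("eco:b", "bsu:b")): 85,
    frozenset(("eco:a", "hin:a")): 80, frozenset(("bsu:a", "hin:a")): 78,
    frozenset(("eco:a", "eco:b")): 30, frozenset(("bsu:a", "bsu:b")): 30,
    frozenset(("eco:a", "bsu:b")): 25, frozenset(("eco:b", "bsu:a")): 25,
}

if __name__ == "__main__":
    for tree in upgma(example_genes, example_sim):
        print(split_paralogs(tree))
    # → [['eco:b', 'bsu:b'], ['hin:a', 'eco:a', 'bsu:a']]
```

In this toy run all five genes end up in one UPGMA tree because the two families are weakly similar. The splitting step then cuts at the root, where both subtrees contain an `eco` and a `bsu` gene, and leaves two groups that each contain at most one gene per species.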
DomClust has been used to classify hundreds of microbial genomes in MBGD (Microbial genome database for comparative analysis), which currently provides the most user-friendly interface to DomClust.
| Program file | Description |
|---|---|
| README | The readme file for the program |
| domclust.tgz | The program source code |

| Dataset file | Description |
|---|---|
| README | The readme file for the dataset |
| cog02.tgz | The COG02 dataset used in the DomClust paper (including all-against-all similarities, 65 MB) |
| cog03.tgz | The COG03 dataset used in the DomClust paper (including all-against-all similarities, 190 MB) |