From: Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

Overview of the algorithm for attribute clustering and pattern discovery. The k-modes algorithm is summarized in the green portion of the flowchart. The algorithm can be run for a single value of k or for multiple values. To build a hierarchical cluster tree for a protein family, represented by a multiple sequence alignment with N aligned sites, the algorithm is rerun with k decreased each time, from a starting value of N – 1 down to 2. The entire set of attribute clusters can then be arranged into a cluster tree for that protein family, or individual clusters can be computationally analyzed for patterns.
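The sweep over k described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. The caption does not specify the similarity measure or the seeding, so the sketch assumes mutual information normalized by joint entropy and uses the first k sites as initial modes. The function names (`entropy`, `interdependence`, `k_modes_attributes`, `cluster_sweep`) and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def interdependence(x, y):
    """Assumed measure: mutual information normalized by joint entropy."""
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0.0:  # both columns fully conserved, nothing to share
        return 0.0
    return (entropy(x) + entropy(y) - h_xy) / h_xy


def k_modes_attributes(columns, k, max_iter=50):
    """Group aligned sites (attributes) into k clusters around mode sites."""
    n = len(columns)
    r = [[interdependence(a, b) for b in columns] for a in columns]
    modes = list(range(k))  # assumption: the first k sites seed the modes
    clusters = {}
    for _ in range(max_iter):
        # Each mode anchors its own cluster, so no cluster is ever empty.
        clusters = {m: [m] for m in modes}
        for site in range(n):
            if site in clusters:
                continue
            best = max(modes, key=lambda m: r[site][m])
            clusters[best].append(site)
        # New mode: the member most interdependent with its own cluster.
        new_modes = [
            max(members, key=lambda s: sum(r[s][t] for t in members))
            for members in clusters.values()
        ]
        if set(new_modes) == set(modes):
            break
        modes = new_modes
    return sorted(sorted(members) for members in clusters.values())


def cluster_sweep(columns):
    """Run the clustering for k = N - 1 down to 2, as the caption describes."""
    n = len(columns)
    return {k: k_modes_attributes(columns, k) for k in range(n - 1, 1, -1)}


# Toy alignment (hypothetical): sites 0/1 covary, and sites 2/3 covary.
alignment = ["ACDK", "ACER", "GTDK", "GTER"]
columns = [list(col) for col in zip(*alignment)]
tree_levels = cluster_sweep(columns)
for k, clusters in tree_levels.items():
    print(k, clusters)
```

Each entry of `tree_levels` holds the site clusters for one value of k. Arranging these levels into a tree, or mining an individual cluster for patterns, would be a separate step that this sketch does not cover.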

