pHMM4weka
This Java software implements Profile Hidden Markov Models (PHMMs) for binary protein classification in the WEKA workbench. A PHMM is a Hidden Markov Model specifically designed to represent multiple sequence alignments of amino acid sequences. This software learns the alignment of unaligned sequences that are represented as a string attribute in an ARFF file. The learning algorithm is the Baum-Welch algorithm, which trains the PHMM until a user-defined threshold is reached or a specified number of training iterations has been performed. Unlike the usual WEKA workflow, the training process can be evaluated after each training iteration of the Baum-Welch algorithm.

This software introduces binary PHMMs. A binary PHMM consists of two PHMMs, one trained on the positive instances and the other on the negative instances. The software supports sampling of the negative class.

Additionally, the software creates attribute-value representations from PHMMs. These representations can be used in combination with any other WEKA classifier, depending on that classifier's capabilities.
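The following Java sketch illustrates two ideas from the description above: stopping Baum-Welch training at a threshold or an iteration limit, and classifying with a binary model built from a positive and a negative model. It is not the pHMM4weka API. The class `BinaryHmmSketch`, its nested `Hmm` class, and the methods `trainUntilConverged` and `classify` are hypothetical. Several simplifications are assumptions made only for illustration. A plain discrete HMM stands in for a profile HMM: a real PHMM adds match, insert, and delete states, but it scores a sequence on the same forward-algorithm principle. One Baum-Welch re-estimation pass is abstracted as a function that returns the new log-likelihood, and the threshold is taken to apply to the per-iteration gain in log-likelihood. A sequence is labeled positive when the positive model assigns it a higher log-likelihood than the negative model.

```java
import java.util.function.DoubleSupplier;

/**
 * Hypothetical sketch, not the pHMM4weka API. A plain discrete HMM stands in
 * for a profile HMM (no match/insert/delete states).
 */
public final class BinaryHmmSketch {

    /** Discrete HMM: initial probabilities pi, transitions a[i][j], emissions b[state][symbol]. */
    static final class Hmm {
        final double[] pi;
        final double[][] a;
        final double[][] b;

        Hmm(double[] pi, double[][] a, double[][] b) {
            this.pi = pi;
            this.a = a;
            this.b = b;
        }

        /** log P(seq | model), computed with the scaled forward algorithm. */
        double logLikelihood(int[] seq) {
            int n = pi.length;
            double[] alpha = new double[n];
            double logLik = 0.0;
            for (int t = 0; t < seq.length; t++) {
                double[] next = new double[n];
                double scale = 0.0;
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    if (t == 0) {
                        sum = pi[j];
                    } else {
                        for (int i = 0; i < n; i++) {
                            sum += alpha[i] * a[i][j];
                        }
                    }
                    next[j] = sum * b[j][seq[t]];
                    scale += next[j];
                }
                // Normalize to avoid underflow; the logs of the scale factors sum to log P(seq).
                for (int j = 0; j < n; j++) {
                    next[j] /= scale;
                }
                logLik += Math.log(scale);
                alpha = next;
            }
            return logLik;
        }
    }

    /**
     * Assumed stopping rule: stop when one training pass improves the
     * log-likelihood by less than threshold, or after maxIterations passes.
     * baumWelchStep stands in for one re-estimation pass and returns the new log-likelihood.
     * Returns the number of passes performed.
     */
    static int trainUntilConverged(DoubleSupplier baumWelchStep, double threshold, int maxIterations) {
        double previous = Double.NEGATIVE_INFINITY;
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            double current = baumWelchStep.getAsDouble();
            if (current - previous < threshold) {
                return iteration;
            }
            previous = current;
        }
        return maxIterations;
    }

    /** Binary decision: positive if the log-odds of positive versus negative model is above 0. */
    static boolean classify(Hmm positive, Hmm negative, int[] seq) {
        return positive.logLikelihood(seq) - negative.logLikelihood(seq) > 0.0;
    }

    public static void main(String[] args) {
        // Two single-state toy models over the alphabet {0, 1}.
        Hmm pos = new Hmm(new double[] {1.0}, new double[][] {{1.0}}, new double[][] {{0.9, 0.1}});
        Hmm neg = new Hmm(new double[] {1.0}, new double[][] {{1.0}}, new double[][] {{0.1, 0.9}});
        System.out.println(classify(pos, neg, new int[] {0, 0, 1})); // true
        System.out.println(classify(pos, neg, new int[] {1, 1, 0})); // false

        // Simulated log-likelihoods after each pass; the gain drops below 0.1 at pass 4.
        double[] trace = {-100.0, -50.0, -49.5, -49.49};
        int[] k = {0};
        System.out.println(trainUntilConverged(() -> trace[k[0]++], 0.1, 10)); // 4
    }
}
```

With the toy models, the first sequence is mostly symbol 0, which the positive model emits with probability 0.9. It therefore scores 0.9 × 0.9 × 0.1 = 0.081 under the positive model and 0.009 under the negative model, so it is labeled positive. Working in log space with per-step scaling keeps the products from underflowing on realistic protein lengths.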