DiMotif: alignment-free discriminative motif mining

We present DiMotif, an alignment-free discriminative motif miner, and evaluate it for finding protein motifs in several settings. On a held-out set of sequences, the significant motifs it extracted reliably detected integrins, integrin-binding proteins, and biofilm formation-related proteins, with high F1 scores. DiMotif also detected experimentally verified motifs related to nuclear localization signals.
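
The sketch below shows the general idea behind alignment-free discriminative motif mining. It is not DiMotif's implementation. It makes two simplifying assumptions. First, it uses fixed-length k-mers in place of DiMotif's probabilistic variable-length segmentation. Second, it scores each motif with a Pearson chi-square statistic on a 2x2 presence/absence table (positives vs. negatives). The function names `kmers`, `chi_square_2x2`, and `rank_motifs` and the toy sequences are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import chain


def kmers(seq, k):
    """Distinct length-k substrings of seq; no alignment is required."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_motifs(positives, negatives, k=3, top=5):
    """Hypothetical helper: rank k-mers by how strongly their presence separates the classes."""
    # Count, per k-mer, how many sequences in each class contain it.
    pos_counts = Counter(chain.from_iterable(kmers(s, k) for s in positives))
    neg_counts = Counter(chain.from_iterable(kmers(s, k) for s in negatives))
    scored = []
    for motif in set(pos_counts) | set(neg_counts):
        a = pos_counts[motif]           # positives containing the motif
        b = len(positives) - a          # positives lacking it
        c = neg_counts[motif]           # negatives containing it
        d = len(negatives) - c          # negatives lacking it
        # Keep only motifs enriched in the positive class.
        if a / len(positives) > c / len(negatives):
            scored.append((chi_square_2x2(a, b, c, d), motif))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(motif, round(score, 3)) for score, motif in scored[:top]]


if __name__ == "__main__":
    # Made-up toy sequences; the positives share the integrin-binding tripeptide RGD.
    positives = ["MKRGDLA", "ARGDKLM", "GGRGDSP"]
    negatives = ["MKLLAAS", "PPSLKAA", "AAKLMSS"]
    print(rank_motifs(positives, negatives, k=3, top=1))  # [('RGD', 6.0)]
```

On these toy inputs, RGD occurs in every positive sequence and in no negative one, so it ranks first. A real miner would also correct for multiple testing and would use held-out sequences to check how well the selected motifs classify, which is the role the F1 scores play above.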

The code is available on GitHub:

https://github.com/ehsanasgari/dimotif


The paper is under review and is available as a preprint on bioRxiv. The software is available on GitHub.

@article {Asgari345843,
author = {Asgari, Ehsaneddin and McHardy, Alice and Mofrad, Mohammad R. K.},
title = {Probabilistic variable-length segmentation of protein sequences for discriminative motif mining (DiMotif) and sequence embedding (ProtVecX)},
year = {2018},
doi = {10.1101/345843},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2018/07/12/345843},
eprint = {https://www.biorxiv.org/content/early/2018/07/12/345843.full.pdf},
journal = {bioRxiv}
}