Pattern Syntax
Amino acids (or nucleotides for DNA patterns) are enclosed within square brackets [].
The pattern [P][ST] would therefore search for a P followed by either an S or a T.
The pattern [A][T][G][GC] would therefore search for ATG followed by either a G or a C.
The character ^ is used to exclude residues.
The pattern [^DE][P][ST] would therefore search for a P followed by either an S or a T, as long as the P was not preceded by a D or an E.
The character * is used as a wildcard to represent any residue.
The pattern [P][*][P] would therefore search for a proline followed by any amino acid followed by another proline.
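For readers more familiar with regular expressions, the example patterns above can be sketched as hand-translated Python equivalents. This is only an illustration: the EXAMPLES table and the find_all helper are hypothetical names, and the search tool itself may handle details such as sequence ends or overlapping matches differently from Python's re module.

```python
import re

# Hand-translated regular expressions for the example patterns.
# Assumption: each bracketed position maps to one regex character class,
# and the * wildcard maps to "." (any single character).
EXAMPLES = {
    "[P][ST]": "P[ST]",
    "[A][T][G][GC]": "ATG[GC]",
    "[^DE][P][ST]": "[^DE]P[ST]",
    "[P][*][P]": "P.P",
}

def find_all(regex, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(regex, sequence)]

# The PS at the start is rejected because its P is preceded by D;
# the PT further along is preceded by A, so it is reported.
print(find_all(EXAMPLES["[^DE][P][ST]"], "DPSAPT"))  # [(3, 'APT')]
```

In this regex translation, [^DE] must consume a residue, so a P at the very start of a sequence would not match; whether the tool behaves the same way is not stated here.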
A character definition such as [ST] can be followed by {} to define repetition, where:
{x} is exactly x times,
{x,} is at least x times, and
{x,y} is at least x but not more than y times.
The pattern [KR]{4,} would find runs of 4 or more adjacent residues that are each a K or an R (e.g. KRKR, RRKKR, but not KRK).
The pattern [DE][ST][*]{4,7}[W] would find a D or E followed by an S or T, then any 4 to 7 residues, then a W.
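The full syntax, including repetition, can be converted mechanically into a regular expression. The following is a minimal sketch under stated assumptions, not the tool's own implementation: to_regex and TOKEN are hypothetical names, residues are assumed to be single letters, and {} counts are assumed to carry the same meaning as regex quantifiers.

```python
import re

# One pattern element: [residues], [^residues] or [*],
# optionally followed by {x}, {x,} or {x,y}.
TOKEN = re.compile(r"\[(\^?)([A-Za-z]+|\*)\](\{\d+(?:,\d*)?\})?")

def to_regex(pattern):
    """Translate a bracketed pattern into a Python regex string."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        if m.start() != pos:
            raise ValueError(f"unexpected text at position {pos}")
        negate, residues, repeat = m.group(1), m.group(2), m.group(3) or ""
        if residues == "*":
            if negate:
                raise ValueError("[^*] would exclude every residue")
            body = "."  # wildcard: any single residue
        else:
            body = "[" + negate + residues.upper() + "]"
        parts.append(body + repeat)
        pos = m.end()
    if pos != len(pattern):
        raise ValueError(f"unexpected text at position {pos}")
    return "".join(parts)

print(to_regex("[KR]{4,}"))             # [KR]{4,}
print(to_regex("[DE][ST][*]{4,7}[W]"))  # [DE][ST].{4,7}[W]

# KRKR is long enough to match; KRK is not.
print(bool(re.search(to_regex("[KR]{4,}"), "AKRKRA")))  # True
print(bool(re.search(to_regex("[KR]{4,}"), "AKRKA")))   # False
```

Each element is validated as it is read, so a malformed pattern raises an error instead of silently producing a regex that matches something else.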