EpiMatrix
Variation in human MHC ensures that the surveillance capabilities of the human immune system are both broad and deeply redundant, making immune escape through mutation more difficult for pathogenic organisms. Unfortunately, this variation also greatly complicates the selection of T cell epitopes by vaccine designers. Selecting T cell epitopes from too many alleles creates a larger pool of epitopes than can practically be incorporated into a vaccine, while selecting epitopes from too few alleles may result in a vaccine that is effective in only a small portion of the population. Fortunately, some alleles are much more common than others, and the binding repertoires of many alleles overlap significantly. By focusing on “archetypal” or “super-type” alleles that are both common and different from each other, one can reduce the search space to a manageable size. For Class I we focus on six of these super-type alleles (A*0101, A*0201, A*0301, A*2402, B*0702, and B*4403) and for Class II on eight of these super-type alleles (DRB1*0101, *0301, *0401, *0701, *0801, *1101, *1301, and *1501) that collectively “cover” the genetic background of most humans worldwide. In a typical analysis, protein antigens are parsed into overlapping 9-mer frames, where each 9-mer overlaps the last by eight amino acids. Each 9-mer is then scored for predicted binding affinity against a panel of Class I or Class II alleles. The EpiMatrix algorithm compares the amino acid sequence of each 9-mer to the coefficients contained in the matrix and produces a raw score. In order to compare potential epitopes across multiple HLA alleles, EpiMatrix raw scores are converted to a normalized “Z” scale. Peptides scoring above 1.64 on the EpiMatrix “Z” scale (typically the top 5% of any given sample) are likely to be MHC ligands and are worthy of further consideration.
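The framing and raw-scoring steps described above can be illustrated with a minimal Python sketch. It is not the EpiMatrix implementation: the function names, the representation of the matrix as one residue-to-coefficient dictionary per position, and the choice to score unlisted residues as 0.0 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def nine_mer_frames(protein: str, k: int = 9) -> List[Tuple[int, str]]:
    """Return (start, frame) for every overlapping k-mer.

    Adjacent frames share k - 1 residues (eight for 9-mers).
    A protein shorter than k yields an empty list.
    """
    return [(i, protein[i:i + k]) for i in range(len(protein) - k + 1)]


def raw_score(frame: str, matrix: List[Dict[str, float]]) -> float:
    """Sum the coefficient found at each position for the residue present.

    `matrix` is a hypothetical per-allele matrix: one dict per position,
    mapping a residue letter to its coefficient. Residues missing from a
    position contribute 0.0 (an assumption, not a documented rule).
    """
    return sum(position.get(aa, 0.0) for aa, position in zip(frame, matrix))


# Toy matrix with made-up coefficients at positions 1 and 9 only.
toy_matrix: List[Dict[str, float]] = (
    [{"L": 1.0}] + [{} for _ in range(7)] + [{"V": 2.0}]
)

frames = nine_mer_frames("ACDEFGHIKLM")
scores = [(start, frame, raw_score(frame, toy_matrix)) for start, frame in frames]
```

An 11-residue input yields three frames, starting at positions 0, 1, and 2. In practice one such matrix would be needed for each allele in the panel.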
Conservatrix
The genetic variability of some pathogens constitutes a significant challenge to efforts to design a vaccine driven by cellular immune response. Vaccines designed to protect against a given strain or clade of a quickly mutating pathogen may be ineffective when faced with a heterologous challenge. One approach to solving this problem is to include conserved and functionally or structurally important epitopes in the vaccine. EpiVax has done so in the context of an HIV-1 vaccine program using the Conservatrix tool. The algorithm parses input sequences into component strings (typically either 9-mer or 10-mer segments) and then searches the input dataset for matching segments. No alignment of input sequences is necessary. The program then produces a frequency table showing each unique segment in the dataset and the number of times that segment has occurred. Results of each analysis are stored in a database and may be browsed or exported to another program for analysis. The conserved segments are then submitted to EpiMatrix for scoring. Conservatrix may also be used to compare strings derived from different strains of the same organism or to search for elements common to disparate organisms. Peptides that occur in common “housekeeping” genes may show up in many different types of organisms. In all likelihood, these common peptides have been seen repeatedly by the human immune system and are probably tolerated, whereas peptides that are unique to a given pathogenic target are more likely to be immunogenic.
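The alignment-free frequency table can be sketched in a few lines of Python. This is only an illustration of the counting idea, not the Conservatrix code: the function name is hypothetical, and the sketch assumes that every occurrence of a segment is counted, including repeats within a single input sequence.

```python
from collections import Counter
from typing import Iterable


def segment_frequencies(sequences: Iterable[str], k: int = 9) -> Counter:
    """Count every k-mer segment across all input sequences.

    No alignment is performed: each sequence is simply cut into
    overlapping k-mers and identical strings are tallied together.
    """
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Two made-up variant sequences that differ only at the last residue.
variants = ["MKTAYIAKQR", "MKTAYIAKQL"]
table = segment_frequencies(variants)

# Most conserved segments first.
ranked = table.most_common()
```

Here the segment shared by both variants is counted twice and ranks first, while the two variant-specific segments are each counted once.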
EpiMatrix
By focusing on alleles that are both common and significantly different from each other, referred to as “archetypes” or “super-types”, one can reduce the search space to a manageable number. For Class I we focus on six of these super-type alleles: A*0101, A*0201, A*0301, A*2402, B*0702, and B*4403. Taken collectively, these alleles “cover” the genetic backgrounds of most humans worldwide (Sette et al. 1999) [8]. For Class II we focus on a panel of eight common alleles: DRB1*0101, *0301, *0401, *0701, *0801, *1101, *1301, and *1501. In a typical analysis, protein antigens are parsed into overlapping 9-mer frames, where each 9-mer overlaps the last by eight amino acids. Each 9-mer is then scored for predicted binding affinity to a panel of Class I or Class II HLA alleles. The EpiMatrix algorithm compares the amino acid sequence of each 9-mer peptide to the coefficients contained in the matrix and produces a raw score. In order to compare potential epitopes across multiple HLA alleles, EpiMatrix raw scores are converted to a normalized “Z” scale. Peptides scoring above 1.64 on the EpiMatrix “Z” scale (typically the top 5% of any given sample) are likely to be MHC ligands.
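The conversion to a “Z” scale and the 1.64 cutoff can be illustrated as follows. The text does not say which reference distribution EpiMatrix normalizes against, so this sketch assumes the caller supplies a reference sample of raw scores for the allele in question. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def z_score(raw: float, reference: Sequence[float]) -> float:
    """Standardize one raw score against a reference sample of raw scores.

    Assumption: the reference sample is chosen by the caller, for example
    raw scores of many peptides for the same allele.
    """
    sigma = pstdev(reference)
    if sigma == 0:
        return 0.0  # degenerate reference: all scores identical
    return (raw - mean(reference)) / sigma


def likely_ligands(
    scored: Sequence[Tuple[str, float]],
    reference: Sequence[float],
    cutoff: float = 1.64,
) -> List[Tuple[str, float]]:
    """Keep (peptide, z) pairs whose Z score is above the cutoff."""
    out = []
    for peptide, raw in scored:
        z = z_score(raw, reference)
        if z > cutoff:
            out.append((peptide, z))
    return out


# Made-up reference scores and two made-up peptides.
reference_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
hits = likely_ligands(
    [("ACDEFGHIK", 2.0), ("LMNPQRSTV", 5.0)], reference_scores
)
```

Because each allele's scores are standardized separately, a Z score of 1.64 means the same thing for every allele in the panel, which is what makes cross-allele comparison possible. For normally distributed scores, 1.64 corresponds to roughly the top 5% of the distribution.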
ClustiMer
We have observed that MHC Class II restricted T cell epitopes tend to co-locate in short, well-defined regions within protein sequences. The ClustiMer algorithm reads EpiMatrix result sets and identifies regions (typically 15 to 25 amino acids in length) that contain significantly more predicted T cell epitopes than we would expect to find by chance alone. We refer to these regions as T cell epitope “clusters.” In our experience, these clustered regions are highly likely to contain promiscuous T cell epitopes (i.e., epitopes that can bind to more than one HLA allele). Because they can interact with multiple HLA alleles, T cell epitope clusters are important drivers of adaptive immune response. In a vaccine context, these short amino acid sequences can be used either as priming antigens or as boosting adjuvants. In a deimmunization context, T cell epitope clusters are high-value targets: areas where a small number of amino acid substitutions can have a large impact on immunogenicity.
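The idea of scanning for regions with an excess of predicted epitopes can be sketched with a fixed-width sliding window. This is not the ClustiMer algorithm, whose statistical test is not described here. The sketch assumes a fixed window of 20 residues, a 5% chance hit rate per frame and allele (matching the 1.64 cutoff), an eight-allele panel, and an arbitrary threshold of three times the expected hit count. All names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_clusters(
    hits_per_frame: Sequence[int],
    window: int = 20,
    k: int = 9,
    n_alleles: int = 8,
    hit_rate: float = 0.05,
    fold: float = 3.0,
) -> List[Tuple[int, int, int]]:
    """Flag windows whose hit count is at least `fold` times the chance level.

    hits_per_frame[i] is the number of alleles for which the 9-mer frame
    starting at residue i scored above the Z cutoff.
    Returns (start, end, observed_hits) with `end` exclusive, in residues.
    Overlapping windows are reported separately, not merged.
    """
    frames_in_window = window - k + 1
    expected = hit_rate * n_alleles * frames_in_window
    clusters = []
    for start in range(len(hits_per_frame) - frames_in_window + 1):
        observed = sum(hits_per_frame[start:start + frames_in_window])
        if observed >= fold * expected:
            clusters.append((start, start + window, observed))
    return clusters


# Made-up profile: six consecutive frames that each hit four alleles.
profile = [0] * 40
for i in range(10, 16):
    profile[i] = 4

clusters = find_clusters(profile)
```

A fuller treatment would vary the window length across the 15 to 25 residue range and merge overlapping windows into a single reported region.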
EpiAssembler
The problem of virus variability also significantly complicates the selection of epitopes that have a population coverage advantage; such epitopes are termed “clustered”, “superfamily”, or “promiscuous”. To address this problem, EpiVax has developed EpiAssembler to identify sets of overlapping, conserved, and immunogenic epitopes and to assemble them into extended immunogenic consensus sequences (ICS). Processing and presentation of these sequences would allow for presentation of the highly conserved peptides in the context of more than one MHC. The resulting peptide is not a “pseudo-sequence” as such, since each constituent epitope occurs in its corresponding position in the native protein, but in different variants of the pathogen. Briefly, a highly conserved, putatively promiscuous 9-mer is chosen as the core peptide. Additional epitopes are then identified that overlap with the natural N- and C-terminal flanking regions of the core 9-mer peptide. If more than one suitable overlap is identified, the overlapping peptide with the higher overall EpiMatrix score is selected. The cycle is repeated until the peptide reaches a length that can be easily produced synthetically. The significant EpiMatrix scores contained within these ICS clusters are then aggregated to create an EpiMatrix Cluster Immunogenicity Score.
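The extension cycle described above can be illustrated with a greedy Python sketch. It is an assumption-laden simplification, not the EpiAssembler implementation: a candidate is taken to “overlap” when it shares eight residues with the current N- or C-terminal end, one residue is added per cycle, ties are broken arbitrarily, and `max_len` stands in for “a length that can be easily produced synthetically”. All names are hypothetical.

```python
from typing import Dict


def assemble_ics(core: str, scored: Dict[str, float], max_len: int = 20) -> str:
    """Greedily extend a core peptide with overlapping scored peptides.

    `scored` maps candidate peptides (same length as the core) to an
    overall score. In each cycle, every candidate that overlaps the
    current sequence by k - 1 residues at either end is collected, and
    the highest-scoring one contributes its one new residue.
    """
    k = len(core)
    seq = core
    while len(seq) < max_len:
        options = []  # (score, side, new_residue)
        for pep, score in scored.items():
            if len(pep) != k:
                continue
            if pep[:-1] == seq[-(k - 1):]:   # extends the C-terminal end
                options.append((score, "C", pep[-1]))
            if pep[1:] == seq[:k - 1]:       # extends the N-terminal end
                options.append((score, "N", pep[0]))
        if not options:
            break  # no overlapping candidate left
        _, side, residue = max(options)
        seq = seq + residue if side == "C" else residue + seq
    return seq


# Made-up candidates drawn from hypothetical pathogen variants.
candidates = {"DEFGHIKLM": 2.0, "DEFGHIKLA": 1.0, "ACDEFGHIK": 1.5}
ics = assemble_ics("CDEFGHIKL", candidates)
```

In this toy case two candidates compete for the C-terminal position and the higher-scoring one is chosen, after which the remaining N-terminal candidate is added and the extension stops for lack of further overlaps.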
BlastiMer
One of the advantages of epitope-based vaccines is that it is possible to omit deleteriously cross-reactive epitopes from their formulations. One way to limit the possibility of cross-reactivity and autoimmunity in epitope-based vaccines is to BLAST all the putative epitopes selected for evaluation against the human sequence database at GenBank. ICS clusters that are homologous to components of the human genome can be set aside, while the remaining “foreign” epitopes can be safely included in vaccine formulations. As a standard practice, any peptide that shares greater than 70% identity (or more than 7 identities per 9-mer frame) with peptides contained in the human proteome is eliminated from consideration.
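The identity cutoff can be illustrated without running BLAST. The sketch below is a stand-in only: it compares each candidate position by position against a caller-supplied list of human 9-mer frames and applies the “more than 7 identities per 9-mer frame” rule as stated. The function names and the ungapped comparison are assumptions. A real screen would use BLAST against the human sequence database.

```python
from typing import Iterable, List


def identities(a: str, b: str) -> int:
    """Count positions at which two equal-length peptides are identical."""
    return sum(x == y for x, y in zip(a, b))


def filter_foreign(
    peptides: Iterable[str],
    human_frames: Iterable[str],
    max_identities: int = 7,
) -> List[str]:
    """Keep peptides that never exceed `max_identities` with any human frame.

    Only human frames of the same length as the peptide are compared.
    """
    human = list(human_frames)
    kept = []
    for pep in peptides:
        too_similar = any(
            len(h) == len(pep) and identities(pep, h) > max_identities
            for h in human
        )
        if not too_similar:
            kept.append(pep)
    return kept


# Made-up human frame and made-up candidate peptides.
human_9mers = ["ACDEFGHIK"]
candidates = ["ACDEFGHIK", "ACDEFGHIL", "ACDEFGHLL", "WWWWWWWWW"]
foreign = filter_foreign(candidates, human_9mers)
```

With the default cutoff, candidates sharing eight or nine positions with the human frame are discarded, while those sharing seven or fewer are retained.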
