The *.faa.gz files included here are the protein sequence files extracted from the GenBank record for each sequence. There are 149 sequences from Ebola virus samples and two sequences from Marburg virus samples.

The proteins included are named: GP, L, NP, VP24, VP30, VP35, VP40, sGP, and ssGP.

The file names are the GenBank accession identifiers. The FASTA headers use the strain naming scheme, in the format:

    >strainName_proteinName

The accessionToStrain.sed file can be used to translate accession identifiers to the strain naming scheme:

    cat fileWithAccessionIds.txt | sed -f accessionToStrain.sed \
        > fileWithStrainNames.txt
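The header format makes it straightforward to pull out one protein across all samples. The following is a minimal sketch that collects every sequence of a single protein from the *.faa.gz files. It assumes a POSIX shell with gzip and awk available, and that the protein name is the text after the last underscore in the header. The function name extract_protein and its default output file name are hypothetical, not part of this data set. Matching the whole suffix, rather than a substring, keeps GP from also picking up sGP and ssGP.

```shell
# Hypothetical helper: extract_protein PROTEIN [OUTFILE]
# Concatenates all *.faa.gz files in the current directory and keeps only the
# records whose header ends in _PROTEIN (headers: >strainName_proteinName).
extract_protein() {
  protein="$1"
  out="${2:-$protein.faa}"
  for f in *.faa.gz; do
    [ -e "$f" ] || continue    # no matching files: the glob stays literal
    gzip -dc "$f"
  done | awk -v want="$protein" '
    /^>/ {
      name = $1                # header up to the first whitespace, if any
      sub(/^.*_/, "", name)    # keep only the text after the last underscore
      keep = (name == want)    # exact match, so GP does not match sGP/ssGP
    }
    keep                       # print header and sequence lines of kept records
  ' > "$out"
}
```

For example, running extract_protein VP40 in this directory would write the VP40 sequences from all samples to VP40.faa, with headers still in the strain naming scheme.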