Monday, September 4, 2017

Multistage database searches for proteogenomics!

EDIT: *searches...

Whoa! I was just talking about this with some really smart people at the NIH last week! Multistage database search techniques: rarely used, extremely powerful, and somewhat questionable from an FDR standpoint!

And this paper (also in press -- come on, library! Though...I could write more blog posts if I only read the abstracts...) seems to suggest that the FDR problems aren't as bad as you would think!

The idea is that you start with your MS/MS spectra and run one simple database search first -- maybe no variable PTMs, definitely no mutations -- and then you set aside the spectra that match, since these are the most biologically likely (and boring) peptide spectral matches.

The spectra that don't match go on to more comprehensive searches -- multiple missed cleavage events, less common PTMs, and maybe even alternative sequences. With fewer MS/MS spectra to search against this massively expanded database, both your search time and your matches become more realistic. In this study they apply this technique via MS-GF+ to proteogenomics -- one place where the search space is often terrifyingly large. And it appears to work.
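To make the cascade concrete, here is a minimal Python sketch of the general idea. It is not the paper's implementation and it doesn't call MS-GF+ at all: the `multistage_search` and `lookup_stage` names are made up for illustration, and each "search engine" is faked with a lookup table so the flow of spectra is easy to follow.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a stage takes spectrum IDs and returns
# {spectrum_id: peptide} for the spectra it confidently identified.
SearchStage = Callable[[List[str]], Dict[str, str]]


def multistage_search(
    spectra: List[str],
    stages: List[Tuple[str, SearchStage]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Run stages in order; only unmatched spectra move to the next stage."""
    remaining = list(spectra)
    identified: Dict[str, Tuple[str, str]] = {}
    for stage_name, search in stages:
        hits = search(remaining)
        for spectrum_id, peptide in hits.items():
            identified[spectrum_id] = (stage_name, peptide)
        # Matched spectra are removed, so later (bigger) searches see fewer spectra.
        remaining = [s for s in remaining if s not in hits]
    return identified, remaining


def lookup_stage(table: Dict[str, str]) -> SearchStage:
    """Toy stand-in for a real search engine: 'identifies' spectra found in a table."""
    def search(spectrum_ids: List[str]) -> Dict[str, str]:
        return {s: table[s] for s in spectrum_ids if s in table}
    return search


# Toy data, invented for illustration only.
stages = [
    ("simple", lookup_stage({"s1": "PEPTIDEK"})),
    ("expanded", lookup_stage({"s1": "PEPTLDEK", "s2": "PEPTIDEK+Phospho"})),
]
identified, leftover = multistage_search(["s1", "s2", "s3"], stages)
```

Note that `s1` keeps its first-stage match even though the expanded stage would have offered a different answer; the expanded search never sees it. That is the whole trick, and also why the FDR question comes up.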

I'll presume they did this all at the command line until I get the full paper PDF. For those of us who aren't as good at typing -- just a quick reminder that you have this power in Proteome Discoverer (disclaimer -- I've never used the exact configuration below, but I've run similar stuff).

In this example I search NIST normal human high res first, assuming it is the highest confidence (and fastest) for searching everything. The MS/MS spectra that don't match go to MSAmanda 2.0 with only limited mods, 2 missed cleavages, and Percolator for the ultra high confidence stuff. What doesn't match there goes to Sequest, where I open it all up -- semi-tryptic with missed cleavages and a bunch of PTMs I consider likely. The final bit goes to Byonic for full-out delta mass searches to find my single amino acid variants and PTMs and everything.
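Written out as plain data, the four stages described here look roughly like the sketch below. This is only a way of reading the workflow, not something Proteome Discoverer consumes: the dictionary keys and the `describe` helper are made up for illustration, and the settings strings just paraphrase the description above (I'm assuming the NIST step is a spectral library search).

```python
# Illustrative description of the four-stage cascade; not a real PD config format.
CASCADE = [
    {"engine": "NIST normal human high res",  # assumed: spectral library search
     "settings": "library matching, everything searched"},
    {"engine": "MSAmanda 2.0",
     "settings": "limited mods, 2 missed cleavages, Percolator"},
    {"engine": "Sequest",
     "settings": "semi-tryptic, missed cleavages, likely PTMs"},
    {"engine": "Byonic",
     "settings": "delta mass search for variants and PTMs"},
]


def describe(cascade):
    """Return one line per stage, including where its unmatched spectra go."""
    lines = []
    for number, stage in enumerate(cascade, start=1):
        if number < len(cascade):
            fate = f"unmatched spectra go to stage {number + 1}"
        else:
            fate = "unmatched spectra stay unidentified"
        lines.append(f"{number}. {stage['engine']}: {stage['settings']} -> {fate}")
    return lines


for line in describe(CASCADE):
    print(line)
```

The ordering matters more than the exact settings: the cheapest, most trusted search runs on everything, and each later stage opens up the search space on a smaller pile of spectra.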

If that group in the paper above is telling me how to deal with the FDR -- I can make a report like this easily!
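For a feel of why the FDR bookkeeping is the tricky part, here is a small sketch using the generic target-decoy estimate (decoy hits divided by target hits). This is not the method from the paper, which I haven't read in full; the function names are made up and the counts are invented toy numbers.

```python
from typing import Dict, Tuple


def fdr(targets: int, decoys: int) -> float:
    """Generic target-decoy estimate: decoy hits / target hits."""
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def fdr_report(stage_counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Per-stage FDR plus a pooled FDR over all stages combined."""
    report = {name: fdr(t, d) for name, (t, d) in stage_counts.items()}
    total_targets = sum(t for t, _ in stage_counts.values())
    total_decoys = sum(d for _, d in stage_counts.values())
    report["pooled"] = fdr(total_targets, total_decoys)
    return report


# Invented (targets, decoys) counts per stage, for illustration only.
counts = {"simple": (1000, 10), "expanded": (50, 5)}
report = fdr_report(counts)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

With these toy numbers the pooled estimate looks fine while the expanded stage on its own is far worse, which is exactly the kind of thing a multistage report has to make visible.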

