Friday, November 10, 2017

How to do dependent peptide search using Proteome Discoverer!


Sometimes we have to stand on the shoulders of giants. This is more polite than just coming out and saying that you are going to steal someone's super cool idea and show people how to do something similar without their uber-powerful software.

I recently discovered a MaxQuant feature called "dependent peptide search." It has been around for years, but I've spent most of those years working in another software package.

Dependent search goes kinda like this (not exact, but you didn't come to a blog for exactness):

There are modified peptides present in your RAW file.
If you looked for them all (with traditional search engines) it would take FOREVER.
However, if modified peptides from a protein are in there, unmodified peptides from that protein are almost certainly in there too, and they're almost always easier to detect.
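To put a rough number on "FOREVER", here is a minimal sketch that counts how many modification patterns a search engine has to consider for a single peptide. It assumes at most one modification per residue and a cap on variable modifications per peptide (most engines let you set one). The peptide and mod sets are made-up illustrations, not anything from a real search.

```python
def count_variants(peptide, mod_sites, max_mods):
    """Count distinct modification patterns for one peptide.

    mod_sites maps a modification name to the residues it can sit on,
    e.g. {"Phospho": "STY"}. Assumes each residue carries at most one mod
    and that no more than max_mods residues are modified at once.
    """
    # options[i] = number of different mods that could sit on residue i
    options = [sum(aa in residues for residues in mod_sites.values()) for aa in peptide]
    # ways[k] = number of patterns with exactly k modified residues so far
    ways = [1] + [0] * max_mods
    for o in options:
        if o == 0:
            continue
        # Walk k downward so each residue is used at most once per pattern
        for k in range(max_mods, 0, -1):
            ways[k] += ways[k - 1] * o
    return sum(ways)


light = {"Oxidation": "M"}
heavy = {"Phospho": "STY", "Acetyl": "K", "GlyGly": "K"}
print(count_variants("SSTYPEK", light, 3))  # only the unmodified form
print(count_variants("SSTYPEK", heavy, 3))  # many more forms to score
```

That is one short peptide; multiply by every peptide in a big database and every spectrum, and the appeal of shrinking the database first becomes obvious.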

SO -- it's time to reduce some variables.

MaxQuant is fancy enough to do all this with a few button presses. You, my friend, paid for your software (unless you are using IMP-PD), so you have to do a bit more work.

Step 1: Process your RAW files with your FASTA database. Go easy on the modifications (your cysteine alkylation, Met oxidation, maybe protein N-terminal acetylation).
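Proteome Discoverer handles all of this in the GUI, but if it helps to see what the light Step 1 mod set means in mass terms, here is a minimal sketch of a peptide's neutral monoisotopic mass with carbamidomethyl C as a fixed mod and Met oxidation / protein N-terminal acetylation as optional extras. This is an illustration, not how PD computes anything internally.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL = 57.02146  # fixed mod on every C
OXIDATION = 15.99491        # variable mod on M
ACETYL = 42.01057           # variable mod on the protein N-terminus


def peptide_mass(seq, oxidized_met=0, protein_nterm_acetyl=False):
    """Neutral monoisotopic mass of a peptide under the light Step 1 mod set."""
    if oxidized_met > seq.count("M"):
        raise ValueError("more oxidations than methionines")
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    mass += CARBAMIDOMETHYL * seq.count("C")  # every Cys is alkylated
    mass += OXIDATION * oxidized_met
    if protein_nterm_acetyl:
        mass += ACETYL
    return mass


print(round(peptide_mass("PEPTIDE"), 4))
print(round(peptide_mass("MCK", oxidized_met=1), 4))
```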

Open your processed report. Right click anywhere on it and check ALL THE THINGS!


You'll notice I have a Contaminant flagged. I'm gonna leave it in there. I'm not getting paid for this. Actually, it will just be a redundant entry in the later steps and won't matter.

Now that everything is checkmarked -- File > Export > To FASTA. Then you'll discover you don't actually have to do the right-click checkmark thing.
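PD's export does this for you. If you ever want to do the same thing outside PD (say, from a list of accessions copied out of the protein table), a minimal sketch is below. It assumes UniProt-style headers (`>sp|ACCESSION|NAME ...`) and falls back to the first word of the header for anything else; the example entries are made up.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def accession(header):
    """Pull the accession out of a header; handles UniProt 'sp|ACC|NAME' style."""
    first = header.split()[0]
    parts = first.split("|")
    if len(parts) >= 3 and parts[0] in ("sp", "tr"):
        return parts[1]
    return first


def subset_fasta(text, keep, width=60):
    """Return FASTA text containing only entries whose accession is in `keep`."""
    out = []
    for header, seq in parse_fasta(text):
        if accession(header) in keep:
            out.append(">" + header)
            # Re-wrap the sequence at a fixed line width
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


full_db = ">sp|P1|A_HUMAN x\nMKT\nAA\n>sp|P2|B_HUMAN y\nGGG\n>CONTAM_1\nKKK\n"
print(subset_fasta(full_db, {"P1", "CONTAM_1"}))
```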


Now you have a FASTA made up only of the proteins that you actually identified. If you are using a big database, this could be a massive reduction in search space.

You'll notice my Filters are open. I'm running some stuff as I write this to see which filters are the most effective. For the first run, I filtered down to just the "Master" proteins and things with >1 unique peptide ID. If you're going to find a phosphopeptide, you're sure as heck gonna find at least 2 peptides from that protein first -- right?
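The filter logic (Master proteins only, more than one unique peptide) plus a quick way to see how much smaller the search space gets can be sketched like this. The row keys ("Accession", "Master", "Unique Peptides") are assumed names standing in for whatever your protein export actually calls those columns, and the numbers are toy values.

```python
def filter_proteins(rows, min_unique=2):
    """Accessions that are Master proteins with >= min_unique unique peptides.

    rows: dicts from a protein-level export. The keys used here are
    hypothetical; rename them to match your own export.
    """
    return {
        r["Accession"]
        for r in rows
        if r["Master"] and r["Unique Peptides"] >= min_unique
    }


def residue_fraction(lengths, kept):
    """Fraction of database residues left after subsetting (lengths: accession -> length)."""
    total = sum(lengths.values())
    return sum(lengths[a] for a in kept) / total


rows = [
    {"Accession": "P1", "Master": True, "Unique Peptides": 3},
    {"Accession": "P2", "Master": True, "Unique Peptides": 1},   # only 1 unique peptide
    {"Accession": "P3", "Master": False, "Unique Peptides": 5},  # not a Master protein
]
kept = filter_proteins(rows)
print(kept)
print(residue_fraction({"P1": 500, "P2": 300, "P3": 200, "P4": 1000}, kept))
```

Counting residues rather than entries is a design choice: a few giant proteins can dominate the search space even when the entry count barely changes.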

Now you can input this new FASTA database and go crazy.

Check phospho on STY, add all the acetylations, throw in some GlyGly. If you've got Byonic or Mascot, you can get closer to a true dependent peptide search by doing a deltaM or wildcard search.
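The delta-mass idea behind dependent peptides (and deltaM/wildcard searches) boils down to: take spectra nobody matched, subtract the mass of a peptide you already identified, and see whether the leftover mass looks like something you know. Below is a minimal sketch of just that precursor-mass bookkeeping. It is not MaxQuant's algorithm, which also has to match the fragment spectra; the tolerance, the 500 Da cap, and the shift list are illustrative assumptions.

```python
# Known modification mass shifts (monoisotopic, Da)
KNOWN_SHIFTS = {
    "Phospho": 79.96633,
    "Acetyl": 42.01057,
    "GlyGly": 114.04293,
    "Oxidation": 15.99491,
}


def annotate_deltas(identified, unassigned, max_delta=500.0, tol=0.01):
    """Pair unassigned precursor masses with already identified peptide masses.

    identified: dict peptide -> neutral mass of an identified (unmodified) peptide
    unassigned: neutral precursor masses of spectra that nothing matched
    Returns (precursor, peptide, delta, label) tuples; label is the first known
    shift within `tol` Da, or "unknown". Only mass gains up to max_delta are kept.
    """
    hits = []
    for prec in unassigned:
        for pep, mass in identified.items():
            delta = prec - mass
            if 0 < delta <= max_delta:
                label = next(
                    (name for name, shift in KNOWN_SHIFTS.items()
                     if abs(delta - shift) <= tol),
                    "unknown",
                )
                hits.append((prec, pep, round(delta, 5), label))
    return hits


for hit in annotate_deltas({"PEPTIDE": 799.35994}, [879.32627, 850.0]):
    print(hit)
```

The "unknown" rows are exactly what a fixed-modification search can never give you, which is the point the first commenter makes below.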

If you're concerned about FDR -- you definitely should be. That's why I don't have data from this to show you yet. You're shrinking your database, and that can artificially push the search engine into making matches that might not be the best ones.
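The usual bookkeeping for this is the target-decoy estimate: above a score cutoff, FDR is approximated as (number of decoy hits) / (number of target hits), and a PSM's q-value is the lowest FDR at which it would still be accepted. A minimal sketch with toy scores follows; it is not PD's Percolator or validator implementation, and it ignores tied scores.

```python
def q_values(psms):
    """Target-decoy q-values for (score, is_decoy) PSMs; higher score = better.

    FDR at each rank is (#decoys so far) / (#targets so far). The q-value of a
    PSM is the minimum FDR over its own rank and every lower-scoring rank.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Walk back from the worst score, keeping a running minimum
    qs, running = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return list(zip(ranked, qs))


toy = [(10, False), (9, False), (8, True), (7, False), (6, True)]
for (score, is_decoy), q in q_values(toy):
    print(score, "decoy" if is_decoy else "target", round(q, 3))
```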

I'm dealing with it (for now) by letting the peptide-level FDR in the consensus step work things out. I take my first-run data and the new data I just processed with the 10 PTMs I care about and combine them into a new (Multi) Consensus report.
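Conceptually, pooling the two searches and applying a peptide-level FDR looks something like the sketch below: keep the best-scoring PSM for each peptide across both runs, rank those, and keep the largest top slice whose decoy/target ratio stays under the cutoff. This is a simplified stand-in for what the consensus step does, not PD's actual code; the peptide strings and scores are invented.

```python
def best_psm_per_peptide(psms):
    """Collapse (peptide, score, is_decoy) PSMs from several runs to one best PSM per peptide."""
    best = {}
    for peptide, score, is_decoy in psms:
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, is_decoy)
    return best


def accepted_peptides(psms, max_fdr=0.01):
    """Target peptides passing a decoy/target peptide-level FDR cutoff."""
    ranked = sorted(best_psm_per_peptide(psms).items(),
                    key=lambda kv: kv[1][0], reverse=True)
    targets = decoys = 0
    keep_top = 0  # size of the largest top-ranked slice that passes
    for rank, (_, (_, is_decoy)) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if decoys / max(targets, 1) <= max_fdr:
            keep_top = rank
    return [peptide for peptide, (_, is_decoy) in ranked[:keep_top] if not is_decoy]


pooled = [
    ("LSSPEK", 20, False),       # first run, unmodified
    ("LSSPEK", 35, False),       # same peptide, better PSM in the second run
    ("LS(phos)SPEK", 30, False), # second run, modified form
    ("KEPSSL", 12, True),        # decoy
    ("GLESSK", 10, False),
]
print(accepted_peptides(pooled))
```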


I have a lot of settings (hit "Advanced") under the peptide validator that I can toy around with:


And I think optimizing these is the trick to getting the best data out of this approach. You can always fall back on deltaM or the search engine's PSM score (and manual validation of the ugly ones) if you need to.
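For the fallback route, the precursor mass error that deltaM refers to is just a ppm calculation, and pulling out the PSMs worth eyeballing can be as simple as the sketch below. The 5 ppm window and the score cutoff of 2.0 are hypothetical placeholders; pick values that fit your instrument and search engine.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def flag_for_review(psms, max_abs_ppm=5.0, min_score=2.0):
    """IDs of (id, observed_mz, theoretical_mz, score) PSMs that look suspicious.

    A PSM is flagged if its mass error is outside +/- max_abs_ppm or its
    search engine score is below min_score (both thresholds are placeholders).
    """
    flagged = []
    for psm_id, obs, theo, score in psms:
        if abs(ppm_error(obs, theo)) > max_abs_ppm or score < min_score:
            flagged.append(psm_id)
    return flagged


psms = [
    ("a", 500.0010, 500.0, 3.0),  # ~2 ppm, decent score
    ("b", 500.0050, 500.0, 3.0),  # ~10 ppm, outside the window
    ("c", 500.0005, 500.0, 1.0),  # ~1 ppm, but a weak score
]
print(flag_for_review(psms))
```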

2 comments:

  1. Thanks for a very nice blog. Just wanted to add a thing or two:
    - With this approach you're able to find peptides that dependent peptide search cannot. Dependent peptide search relies on delta masses relative to already identified PSMs, i.e., there needs to be a PSM first.
    - This approach can only find known modifications. Dependent peptide search can find any delta mass.

    1. Great point that I should have emphasized more! I can go back in with Byonic or Mascot (if I have those) and look for the delta masses, but with Sequest I currently cannot do this.