Saturday, September 24, 2016

How does the Precursor Ion Area Detector node work?

I'm procrastinating this morning and just when I was running out of excuses for not finishing an ongoing bathroom remodel, I realized there were a bunch of unapproved questions/comments on the blog!  This is the last one. After writing far too many lines in the little comment box about why the NIST antibody is so much better than the commercial sources that have been around, I didn't want to tackle this one the same way.

How does the Precursor Ion Area Detector node work? And a reference?

The reference might surprise you!

You can direct link to it here, and I think it's open access.  Look, I'm gonna give Q-TOFs a hard time. I've only had one in all my career and it was, on its best day, a turd sandwich...er...not very good. [Completely redacted rant about Ben's hatred for Q-TOFs and sarcastic statements about their many uses as well as recently acquired facts regarding their value in scrap metal components].  Wow, I feel much better about publishing this post now -- and it is much much shorter!

Remember, though, that there was a day when this was the cutting edge and people were just as smart back then as they are now, and they did good research despite the limits of their instrumentation!  This paper is such a study. It is definitely intended to be a paper showing off a new (at the time) fragmentation technology, but in it they set the framework that most label free quantification is based on --or at least, influenced by.

The idea -- the high resolution (here 10,000) extracted intensity of the 3 most intense peptides from a protein, each taken at its maximum intensity, is very strongly correlated with the abundance of that protein.

This is the Proteome Discoverer interpretation  -- you're ticking along and identifying peptides and you assign each PSM (peptide spectral match) the intensity it had in the MS1 event that it was selected from.  When you compile the PSMs into the peptide, if there is more than one PSM the peptide is assigned the intensity of the highest PSM. When the peptides are pulled into the protein or protein group, the intensities of the (up to) 3 most intense peptides (the number is adjustable in PD 2.1) are averaged into the protein area.

If you have a protein that has only one PSM, this is easy. The "area" of that protein is the intensity of the PSM.
If you have 3 PSMs that all go to one peptide and into one protein, still easy. The "area" of the protein is the intensity of the most intense PSM.
If you have 3 PSMs for each of 3 peptides, the protein "area" will be the average of the most intense PSM from each peptide.
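If it helps to see those three cases as code, here is a minimal sketch of the roll-up as I've described it above. This is my own illustration, not the actual Proteome Discoverer node. The function name `protein_areas`, the tuple layout and the example intensities are all made up for the demo.

```python
from collections import defaultdict

def protein_areas(psms, top_n=3):
    """psms: iterable of (protein, peptide, ms1_intensity) tuples (hypothetical layout)."""
    # Step 1: each peptide takes the intensity of its most intense PSM.
    peptide_max = defaultdict(float)
    for protein, peptide, intensity in psms:
        key = (protein, peptide)
        peptide_max[key] = max(peptide_max[key], intensity)

    # Step 2: collect the peptide intensities that belong to each protein.
    by_protein = defaultdict(list)
    for (protein, _peptide), intensity in peptide_max.items():
        by_protein[protein].append(intensity)

    # Step 3: average the (up to) top_n most intense peptides per protein.
    areas = {}
    for protein, intensities in by_protein.items():
        top = sorted(intensities, reverse=True)[:top_n]
        areas[protein] = sum(top) / len(top)
    return areas

# Made-up intensities covering the three cases in the text.
example_psms = [
    ("ProtA", "PEPTIDEK", 1.0e6),                      # one PSM, one peptide
    ("ProtB", "SAMPLER", 2.0e6),                       # three PSMs, one peptide
    ("ProtB", "SAMPLER", 5.0e6),
    ("ProtB", "SAMPLER", 3.0e6),
    ("ProtC", "AAAK", 9.0e6), ("ProtC", "AAAK", 1.0e6),  # several peptides
    ("ProtC", "CCCR", 6.0e6),
    ("ProtC", "DDDK", 3.0e6),
]
areas = protein_areas(example_psms)
for protein, area in sorted(areas.items()):
    print(protein, area)
```

Changing `top_n` mimics the adjustable peptide count mentioned above.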

Important note here!  The protein "areas" will not always be calculated from the same peptides. If you've got something where you had 50-60% sequence coverage and have 200 PSMs, chances are it won't be the same peptides from run to run at all. But, seriously, this totally works at the protein level. You are going to need to go to the PSM or peptide level intensities if you want to say, for example, how this modified peptide changes from run to run, and that requires a good bit of extra work.

Michael Bereman, who knows a little something about protein quantification (SProCop!), told me it worked, if I remember correctly, "surprisingly well". I use it in virtually every sample I process in PD. It has never once hurt me to have that extra information!

Are there better ways of getting relative quantification of proteins and peptides? Sure!  And these algorithms are coming -- and are going to absolutely change EVERYTHING about how we do proteomics -- Minora, PeakJuggler, and IonStar are all getting ready for prime time and are going to usher in something I think will finally be worthy of the title "next gen" proteomics by allowing us to finally see all the stuff in Orbitrap data that we've never seen before. Your Orbitrap, right now, is far better than you think it is.

Friday, September 23, 2016

iMixPro -- less false discoveries in pulldowns with heavy peptides!

Affinity purifications (or much cooler...affinity enrichments!) are in ever-increasing demand. What protein interacts with my other proteins and how may be one of the most important things we'll be contributing to biology in the future -- once it's not completely fracking impossible to do it. The new crosslinking methodologies that are coming are going to help, but iMixPro is another elegant approach.

It is described in this awesome new JPR paper from Sven Eyckerman et al.!  I'll start off by saying it isn't the simplest method you've ever seen, but if you've spent much time doing protein-protein interaction assays you've either developed your own complex methodology that works and you're keeping it secret from the world -- or you'd try just about anything to figure out what is real and what is not! -- especially when today's super sensitive instrumentation is telling you that you pulled down 1,000 proteins with that expensive "specific" antibody you just got in!

It differs from affinity enrichments in that it intelligently employs heavy labels (this is where the "i" comes from in the name). Having essentially SILAC pairs to look at in their data improves even the label free quan approach you have in affinity enrichments. Combining labeled peptides = less batch effects, which is never a bad thing.  They show some great examples where they can remove the noise and find their true interactors, even when the intensity of the true signal is only a fraction of the value of the other signals identified!

Thursday, September 22, 2016

New cool stuff in Q Exactive Tune 2.7

I popped by to visit Dr. Kowalak the other day to see what cutting edge science the NIMH Proteomics Center is doing these days and he showed me that his QE HF Tune doesn't look like my QE HF tune....

So I upgraded my QE HF Tune to 2.7SP1 to check it out!  There is a bunch of cool stuff in here!  One highlight: The confusing %underfill ratio is now gone and replaced by a much more sensible measurement. You now have a "minimum AGC Target" as well as your normal AGC target.

According to the manual, "IF the mass peak of interest reaches this minimum AGC target within the maximum injection time, a data dependent scan will be initiated"

I like this much better!

If I've got it right, this is the current instrument logic --

--and a pretty good drawing of me and my dog, Gustopheles (you're welcome!)
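For anyone who prefers code to my drawings, here is a rough sketch of how I read that sentence from the manual. Big caveats: this is my interpretation, not the vendor's firmware. The `ion_flux` input (estimated charges per second for the precursor), both function names and the example numbers are hypothetical.

```python
def dd_scan_triggered(ion_flux, min_agc_target, max_injection_time_s):
    """True if the precursor could reach the minimum AGC target in time.

    ion_flux is a hypothetical estimate of charges per second for the
    mass peak of interest; the real instrument logic may differ.
    """
    # Charges we could collect if we filled for the whole maximum injection time.
    reachable_charges = ion_flux * max_injection_time_s
    return reachable_charges >= min_agc_target


def fill_time_s(ion_flux, agc_target, max_injection_time_s):
    """Fill until the normal AGC target is hit or time runs out, whichever is first."""
    if ion_flux <= 0:
        return max_injection_time_s
    return min(agc_target / ion_flux, max_injection_time_s)


# Made-up numbers: 100 ms max injection time, minimum AGC target of 8e3.
for flux in (1e4, 1e5, 1e6):
    triggered = dd_scan_triggered(flux, min_agc_target=8e3, max_injection_time_s=0.1)
    print(f"flux {flux:.0e} charges/s -> data dependent scan triggered: {triggered}")
```

The point of the sketch is just that the trigger decision and the fill time are two separate questions: the minimum AGC target decides whether the scan happens, the normal AGC target decides how long you fill.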

What else is included in the Tune 2.7SP1?

Loads of upgrades for the QE Focus (extended mass range, more MSX counts and some combination scans, like MS1 and PRM in the same experiment!)! And a software modification to make instrument bakeouts on all Exactives and Q Exactives more efficient!

Please remember that this is my interpretation and may not be 100% factually accurate or well drawn. Sometimes I put things like this up and the vendor involved will contact me to tell me I'm wrong and I'll walk away learning something. If that happens, I'll be sure to edit this later!

The best part of this is that I don't ever have to explain %underfill ratio again!

Wednesday, September 21, 2016

Threonine and Isothreonine have different HCD fragmentation patterns!

Yeah...I totally stole this from another blog in-between meetings, but it's seriously cool. The original blogger put it up on Accelerating Proteomics here.

The original article is from KG Kuznetsova et al., and can be found here.  (Side note: Man, there has been some cool original research coming out of Moscow lately! Keep it coming!)

Wait. What is Isothreonine again? Well, it's also called homoserine and we sometimes see it in proteomics data, but it generally isn't a good thing.

Check out this quick image I borrowed from Alexey Chernobrovkin et al., from this paper a couple years ago:

In this illustration, the protein is yanked out and digested and...crud...overheating the protein with iodoacetamide converts some of the methionines to isothreonines. Gross. Then, cause you don't have IsoThreonine in your FASTA, you end up finding a peptide with regular old Threonine in it. 

Boom. False discovery. Where is that a big deal?

1) De novo sequencing (nuts) -- you totally got a peptide wrong
2) Proteogenomics -- cause your huge database has lots and lots of possibilities in it. And...well...the chances that you'll have a peptide sequence in your database with an xxxTxxxK (matching a peptide that really started out as xxxMxxxK, but isn't there anymore) are higher than when you are using a smaller, manually curated FASTA. Your odds of making that mismatch go up just algebraically.
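To make the "bigger database, more chances" point concrete, here is a toy sketch. It swaps each M for a T (isothreonine has the same mass as threonine) and checks whether the result collides with something already in the database. The function names and the tiny "databases" are invented for illustration. A real pipeline would work from digested FASTA entries and mass tolerances.

```python
def m_to_t_variants(peptide):
    """Yield every sequence made by swapping exactly one M for T."""
    for i, residue in enumerate(peptide):
        if residue == "M":
            yield peptide[:i] + "T" + peptide[i + 1:]


def isothreonine_collisions(true_peptides, database_peptides):
    """Map each real peptide to the database peptides its M->T artifact could match."""
    database = set(database_peptides)
    collisions = {}
    for peptide in true_peptides:
        hits = [variant for variant in m_to_t_variants(peptide) if variant in database]
        if hits:
            collisions[peptide] = hits
    return collisions


# Toy databases: the "big" one simply contains more sequences.
small_db = {"AAAMAAAK", "GGGSGGGK"}
big_db = small_db | {"AAATAAAK", "LLLTLLLK"}
sample = ["AAAMAAAK"]

print("small database:", isothreonine_collisions(sample, small_db))
print("big database:  ", isothreonine_collisions(sample, big_db))
```

With the small database the converted peptide has nothing to match. With the big one it lands on a perfectly plausible threonine-containing sequence.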

All is not lost, researchers who are banking hard on proteogenomics/metagenomics being the future!

Cause the original paper I found at the top did a focused study with synthetic peptides and found that 1) the Isothreonine peptides elute differently AND 2) there is a change in the HCD fragmentation patterns (actually the second paper I mention reports that as well). They suggest that it would be reasonably easy to integrate this shift in fragmentation patterns into most proteogenomic pipelines!

Monday, September 19, 2016

GlycoPep MassList -- automatically build targeted lists for glycopeptides!

Shoutout to @ScientistSaba for helping me keep up with all the awesome stuff happening at #HUPO2016 and still having time to tip me off to some cool papers like this one!!

The paper introduces this little program GlycoPep MassList.

The concept is simple AND powerful!  Feed it your protein and it will generate you an inclusion list for the glycopeptides that could occur given your parameters.

They demonstrate that it works on their Orbitrap Velos Pro. You can use it for a purely targeted experiment or within a "gas phase enrichment" strategy (like the "include others" button on a Q Exactive)!
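To give a feel for what "feed it your protein, get an inclusion list" means, here is a minimal sketch of the general idea. To be clear, this is NOT GlycoPep MassList or its algorithm. It is my own toy version, and it assumes a tryptic digest with no missed cleavages, N-glycosylation sequons (N-X-S/T, X not P) that sit entirely inside one peptide, unmodified cysteines and glycans described only as HexNAc/Hex counts. The protein sequence and function names are made up.

```python
import re

# Monoisotopic residue masses in Da.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.007276
HEXNAC, HEX = 203.07937, 162.05282  # glycan residue masses


def tryptic_peptides(protein):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def inclusion_list(protein, glycans, charges=(2, 3, 4)):
    """glycans: {name: (n_hexnac, n_hex)}. Returns (peptide, glycan, z, m/z) rows."""
    rows = []
    for pep in tryptic_peptides(protein):
        if not re.search(r"N[^P][ST]", pep):  # keep only peptides with a sequon
            continue
        for name, (n_hexnac, n_hex) in glycans.items():
            mass = peptide_mass(pep) + n_hexnac * HEXNAC + n_hex * HEX
            for z in charges:
                rows.append((pep, name, z, round((mass + z * PROTON) / z, 4)))
    return rows


rows = inclusion_list("MKANGTLRNPSKAAR", {"HexNAc2Hex5": (2, 5)})
for row in rows:
    print(row)
```

The real tool takes far more parameters than this, but the output has the same shape: a list of m/z values you could paste into an inclusion list.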

Saturday, September 17, 2016

MCP wants your opinions on targeted proteomic publishing guidelines!

MCP has always had (famously strict!) guidelines for what and how they will accept global proteomics data. They are now working on a draft for targeted proteomics data and have opened that draft up to the community for contributions.

Want to shape how we publish? Check it out here!!

Thursday, September 15, 2016

Rosetta's mass specs have confirmed complex organic molecules on Comet67P

You've probably already heard this, but if not -- it's totally cool. According to this paper in Nature this week, Comet67P is just flying along leaving its long comet trail -- and that trail has a bunch of complex organic molecules in it!

My first question -- how do you confirm that? I'm first thinking the orbiter is probably doing this by spectroscopy and I'm gonna find those readings -- dubious -- but it turns out there are 2 mass specs on the Orbiter!

COSIMA is a Time Of Flight instrument designed by researchers at Max Planck that is capable of 1,500 resolution at 100 m/z but has an effective mass range up to 1,000 m/z. COSIMA's job is to collect dust particles and to analyze those particles by Secondary Ion Mass Spectrometry (SIMS). The surface of the particles is hit with an ion beam that ionizes stuff off the particles and into the instrument.

ROSINA is another mass spectrometer that is ionizing and detecting gases. I'm having trouble finding much in the way of details on it, but I've run across several descriptions of it as a double focusing mass spectrometer, something I'm not familiar with (and I've got to be at work super early today). The design should be investigated, though. It is capable of 3,000 resolution at 1% peak height. Which...honestly is a lot of resolution.

If I think of resolution at 1/2 height like this image I stole from the Fiehn lab...

We're calculating resolution at FWHM or 50% peak height. Unless I've got my numerator and denominator mixed up, if you're pulling 3,000 resolution at 1% peak height, that...ain't bad at all!

And I think I do have it right, because this double focusing mass spec is supposed to be able to tell CO from N2 -- (27.9949 from 28.0062!  that's 11 millimass units!!)
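Here is the back-of-the-envelope version of that check, using the usual resolution = m / delta-m definition. The function names are mine, and using the mean of the two masses for m is my assumption. Note that the peak-height convention (FWHM vs. 1% height) changes how wide a "peak" is, so treat this as a sanity check rather than an instrument spec.

```python
def required_resolving_power(mz_a, mz_b):
    """m / delta-m needed to tell two neighbouring peaks apart."""
    mean_mz = (mz_a + mz_b) / 2
    return mean_mz / abs(mz_a - mz_b)


def peak_width(mz, resolution):
    """Peak width (in m/z units) implied by a given resolution at a given m/z."""
    return mz / resolution


co, n2 = 27.9949, 28.0062  # values quoted in the post
needed = required_resolving_power(co, n2)
print(f"CO vs N2: delta = {abs(n2 - co) * 1000:.1f} mDa, needs resolution of roughly {needed:.0f}")
print(f"Peak width at m/z 28 with 3,000 resolution: {peak_width(28, 3000) * 1000:.2f} mDa")
print(f"Peak width at m/z 100 with 1,500 resolution: {peak_width(100, 1500) * 1000:.1f} mDa")
```

So the separation needs somewhat less than 3,000 resolution, which lines up with the claim that ROSINA can tell CO from N2.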

Okay. So...if there is evidence from these two impressive mass specs that survived a 4 BILLION mile trip there, I'm going to believe it!

But that isn't all! This isn't the first time we've shot instruments through the trails of comets. This is just the first time we've gotten readings this good. If you take the mass spectra from the other instruments in the past (as they show in the paper) you'll see that we have seen this in the trails of other comets.

So....there are huge balls of rock that fly through the universe shooting organic molecules the whole way....does anyone else feel like this alters an adjustment factor in the Drake equation at all?!?

Wednesday, September 14, 2016

GEMPro -- Genome Scale Models with Protein Structures!

This paper from Elizabeth Brunk and Nathan Mih et al., is not the first paper to jump on if you're already feeling dumb.

It is elegant and brilliant and imposing. The concept is an extension of a genome regulation tool called genome scale models. This is a nice open access review written on the topic that is directed to people who aren't planning to code their own tools.

There appear to be multiple iterations of GEM, but the one that seems the most straight-forward to me is the integration of the genomic changes with the metabolic ones. Obviously there are several levels of regulation from the genetic level (from the transcriptional regulation through the post translational) that all have effects on metabolite production, but GEM steps around that. The concept takes our existing knowledge and relationships and feeds it into a framework -- we know that it isn't a direct link from RNA X to metabolite Y, but all the same when we see an upregulation in X we see a down-regulation in Y.

I probably slaughtered the concept, but that's what I'm getting out of it.

GEMPro builds on this. Cause what would make this more complicated? What if you also threw protein 3D structures into the mix!?!?  The whole idea definitely makes my head hurt, but...

The GEM framework is in place and yielding dividends
We have structural information on 110k proteins (seriously...!?!...that's what the paper says!) and more all the time.
For those protein 3D structures we have useful information -- like what is the structure of this protein at this temperature...or in this disease state.
More metabolomics data is showing up all the time that could correspond to changes in either.

This is obviously a big data problem with this number of variables....and a big focus of the paper is that if they build a framework that can do this --it MUST be able to grow with the existing knowledge bases, cause our knowledge of everything biological is increasing significantly faster than linear rates.

How do you test something like this? They go for 2 bacteria. E.coli and T.maritima (which I don't think I've ever heard of...Wikipedia says it's a cool extremophile from Italian volcanoes (estremofilo!))
Cool point in the paper -- if they try to do this analysis with all the data that was available in previous years you get a really cool picture of how our knowledge is expanding.

The myoglobin crystal structure was published in 1958. From that time until 2013 all the groups doing protein 3D structure work got to where about 34% of the E.coli proteins are characterized in high quality maps that can be used for this type of analysis. If they step forward in time to when they finalized this paper? They're at 44%. Wow! (Google doesn't know the word "Wow" in Italian. knows one...but apparently it's not appropriate in all dialects and I'm watching it today.)

And they dump all this data in from these 2 organisms and look around. This is my favorite analysis:
E.coli isn't very tolerant to heat compared to our estremofilo friend. They go into the literature and find the proteins in E.coli that are known to be adversely affected by growing in culture that is too hot. Here they can draw on their GEM models -- what genes are known to be similar as well as what gene products are linked to metabolic functions that are the same (if the genes don't look the same, you can pull the listings that are tightly linked to the creation of this metabolite as the same thing).

This gives them a little over 200 entries that have the same (or very similar) metabolic functions in the 2 organisms....and only 10% of them have similar 3D structures.

So...the genetic pressure is there to conserve this basic DNA sequence for making, for example, this amino acid. Or if the two organisms have evolved very different ways of making that amino acid -- we can link some of these proteins together by the fact that they make that amino acid. But at the 3D protein level they are very very different.

So...E.coli has 200 proteins that presumably just up and fall apart. No amino acid in my example = dead, but our Italian friend just keeps chugging along and enjoying its relaxing volcanic sauna.

I totally dig this paper. I'm not sure what I'm going to do with this information, but I really like it!

Erratum to yesterday's talk.

Thank you to everyone who popped in to hear my 4+ hours of sleepy incoherent rambling about how Orbitraps work.

Important erratum: During a discussion on the potential promised by the experimental NeuCode reagents I mistakenly copied an image that was NOT NeuCode. I then, very sleepily, tried to figure out why I hadn't cited the paper -- cause the slide was from something else entirely.

This slide has been corrected and clarified and this section will need to be deleted entirely from the video recording. No slides have been distributed from the talk, so I don't have to worry about a big and embarrassing mistake being shown to anyone. So...I've got that going for me!

This talk was not officially sponsored or blessed by any representative of any corporation of any kind and was not a responsibility of my day job. I have evidence, cause I had to spend Sunday and late Monday and Tuesday nights doing my day job so I could put on the talk. Hence why I was sleepy enough to put an image that made no sense to the talk at all into the slide. The proof is on my FitBit sleep tracker....

To anyone this slide annoyed -- I'd like to apologize. I need a proofreader!

Tuesday, September 13, 2016

UVPD without lasers!!

UV photodissociation and you don't need a laser!!  Paper here!

Check this out!

That is seriously it. Somebody here in Maryland has got an ion trap sitting around somewhere. Time to set up a weekend. I can get some LEDs on Amazon. Let's do this!

Is it the coolest possible use of LEDs?!?!

...I guess it matters who you ask....

My vote is PUT THEM IN AN ION TRAP!!!!

Monday, September 12, 2016

SCX separation inside your nanospray emitter!?!?!!?

I read this abstract yesterday and thought to myself "....great, someone discovered MudPIT..." and promptly forgot about it and got back to work.

This morning I reread it. Unfortunately, I don't have time to read the paper before I go out the door on this ridiculously early morning, but....they appear to be doing SCX in their nanospray emitter....

Considering the way that I learned to do SCX involved a buffer with 8M salts of some kind, they are either doing something very different -- or they are replacing their mass spec every couple of days.

If you'd like to delve into this mystery you can find it here!

Sunday, September 11, 2016

Known unknowns of cardiolipin signaling: The best is yet to come

A few years ago I had the pleasure of spending a few days with a bunch of lipidomics experts in Pittsburgh and got to learn: 1) How ridiculously insanely hard lipidomics can be if you aren't going after the "easy" compounds, 2) How important they are and 3) How very very little we know about them.

This group just wrapped up a really nice review on one of their tougher problems -- the analysis of cardiolipins. How much fun are cardiolipins to work with? Start with the fact that structurally similar ones tend to cluster in similar mass ranges, but have different functions and you have a good idea.

One example they mention in the paper: 12 of their compounds of interest are within 0.1 Da in MS1 mass and even when they fragment them to figure out which one is which -- MS2 isn't capable of elucidating the location of a functional site -- which is critical to know cause there are a slew of isomers that are within these "12" compounds. They have to employ a 2D LC method and utilize MS3 methods on an Orbitrap Fusion to figure out what they are looking at.

They also show genetics techniques they can use, as well as imaging techniques to localize these things. The problem sounds...daunting...but groups all over the world are chipping away at it. And you can't beat the optimism in the title!

Oh yeah! Paper link here!

Friday, September 9, 2016

Basic Orbitrap physics seminar

Hey! Wanna log on for free and listen to me go on about where and how ions move around in Orbitrap devices?  The goal is to have a better understanding of where the ions are going and when to help understand your instruments better!

Note: -- Part 6 should be more like: From the LTQ-Orbitrap XL through the Orbitrap Fusion Lumos!

You can register here (limited to 1,000 total, so be quick about it)

The videos should be available afterward and I'll post them here!

Thursday, September 8, 2016

OpenMS 2.0!!!

Does OpenMS officially have everything now?

Ben, what are you rambling about now?  Oh...just the evolution of OpenMS into something that can do everything, as described in this brand new paper!

OpenMS can already do:
-Peptide ID
-Peptide Quan
-Integration into Proteome Discoverer via the OpenMS PD Community nodes
-Add DNA/RNA binding (to protein) detection capabilities to both OpenMS and to PD
-Allow people to add their own source code and then use the OpenMS downstream workflows (like FDR) to link to whatever upstream source search engines you are using; I think this is how these guys controlled this awesome inference study.
and now?

-INTEGRATION WITH COMPOUND DISCOVERER?  I love HRAM metabolomics and it consumes most of my increasingly rare instrument time these days. As much as I may rant about how easy metabolomics is with an Orbitrap after a beer or two, it is a field that still has its own innate challenges -- challenges that we honestly may not fully understand yet. Flexible software platforms that can address these are going to be critical if metabolomics is ever really going to blow up the way we keep thinking it's going to. I'm not surprised that the OpenMS team has the capability to add software to Compound Discoverer....considering the cool stuff they've developed for Proteome Discoverer, but...

...I had no idea they'd started making nodes....and I don't know what the MetaboProfiler is or what it does, but it painlessly installed into my copy of CD 2.0 and I can't wait to give it a try!!!!

-PROTEOGENOMICS!?!? This study points out a case study where it does, as well as...

-Degradomics!  Have you tried realistically quantifying the degradation of proteins at a global level? No? Well...It. is. not. fun. The tools have to get better before we can track more than a small group, and someone is using OpenMS for that.

-Integration into KNIME and R for collaboration and downstream processing, respectively

-And a bunch of other stuff like Galaxy integration(?!?!), but this list is long enough now.

Does this sound like a sales pitch for OpenMS? It probably does, but this team of talented people is quietly making amazing tools for our community and going to great pains to make these tools as accessible as possible. And I don't mind being loud about it. (There are a bunch of new tutorial videos for getting started now!)

You can easily find OpenMS and their spiffy new website with a Google search or directly link here.

Wednesday, September 7, 2016

Origin of Disagreements in Tandem Mass Spectra!

When you search the same RAW file containing tandem mass spectra versus the same database using different search engines, you are going to see some disagreements in the results.

For example, if I take a proteomic sample from myself and I run it through Mascot and I run it through Sequest separately, the results are probably not going to be exactly the same. Mascot will identify some peptides that Sequest won't, and vice versa. It is also likely that I'll see a few MS/MS spectra that Sequest said was one sequence and Mascot said was something different...
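If you want to count those disagreements yourself, the bookkeeping is simple. Here is a minimal sketch that assumes you've already exported each engine's filtered results into a dictionary of spectrum ID to peptide sequence. The function name, the dictionary layout and the example IDs and sequences are all invented, and this isn't the analysis from any particular paper.

```python
def compare_engines(results_a, results_b):
    """results_*: dict mapping spectrum ID -> peptide sequence passing that engine's filter."""
    shared = results_a.keys() & results_b.keys()
    agree = {s for s in shared if results_a[s] == results_b[s]}
    return {
        "agree": agree,                                   # same spectrum, same sequence
        "conflict": shared - agree,                       # same spectrum, different sequence
        "only_a": results_a.keys() - results_b.keys(),    # identified by engine A only
        "only_b": results_b.keys() - results_a.keys(),    # identified by engine B only
    }


# Made-up results for four spectra.
mascot = {1: "PEPTIDEK", 2: "SAMPLER", 3: "LLLK"}
sequest = {1: "PEPTIDEK", 2: "MAPSLER", 4: "GGGR"}

comparison = compare_engines(mascot, sequest)
for category, spectra in comparison.items():
    print(category, sorted(spectra))
```

The "conflict" bucket is the interesting one: spectra that both engines were confident about, but for different sequences.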

Considering that the database we're searching this against is constructed making some textbook assumptions and is starting from a DNA sequence....that is not mine....we do pretty darned good though!

Where do these disagreements come from? That is the topic of this new paper from Dominique Tessier et al., in this month's JPR.   To evaluate this question, these researchers grab a cancer dataset from PRIDE from the Gygi lab and then run some plant samples in house on an Orbitrap Velos using high/low (or...medium/low? 30k MS1 + Top5 ion trap MS/MS).

The RAW files are then searched with: Mascot, MSGF+, X!Tandem, TPP (presumably, also using X!Tandem) and an analysis of the conflicts is performed between the results.

The results are interesting, and the processed results are more conflicting than I've ever seen. The authors develop a concept of "peptide space" and conclude that optimization of the search parameters for each engine is essential to getting the best and most overlapping data. They also note that in some versions of the software they utilize, the parameters that they need to change to get the best data are sometimes not easily user accessible.

I think this is a nice study and a good look at some of the problems we have in the statistics behind the scenes. It is sometimes easy to forget these days what an enormous undertaking from a mathematical perspective developing all these tools has been over the last couple of decades. Today's proteomics researchers coming in can simply push a play button to get good results and it's easy to take it for granted!

Minor criticisms:
1) The RAW files were converted by different tools that I believe are quite different in their underlying mechanisms.  I think this is a variable that should have been eliminated by using the same tools. Would it have an effect? I dunno...but it's a variable that could be knocked out with 5 minutes more work.
2) PD 1.7? Wow, I don't have that one! ;)
3) I think the function of the search engines is something that is being focused on cause it's the easiest to implicate. The FDR estimations employed were different for each engine. I think this could have a big impact on these results. I'd suspect that if FDR was controlled the same way for each of these results that the level of agreement would be a little better.
4) The in-house generated data is just a little weird. 30k MS1 followed by 5 MS/MS for plant fractions is going to yield only high copy number proteins and using a search parameter of 0.4 Da for the fragments is probably too tight and will affect the downstream results a little.

Again, minor criticisms from somebody who just does proteomics as a hobby. Please feel free to ignore!  I do like this paper and I'm glad Twitter (PastelBio!) recommended it for my breakfast paper today.