Saturday, February 25, 2017
Okay...so this paper has made the blog a couple of times. This time, however, I'm up early on a Saturday cause I've got the RAW data files from it!
Now you can have them too! They're at ProteomeXchange under PXD003904.
I've been wanting to run this for a few reasons:
1) To see how hard it is to process
2) To figure out the best engine and settings for it
3) ....cause I thought I was going kayaking this morning before I went to lab...and...it's...raining....
(So it's another Science Saturday!)
Why #1? Check out that image! Normally we're just worrying about b/y ions. For those of you who have bribed friends with advanced engineering degrees to help you set this up in your instruments, what do you get for your troubles? Now you get to worry about b ions, y ions, a ions, x ions, c ions AND z ions in each MS/MS spectrum!
(Just a reminder...thanks Wikipedia!) This looks like MASSIVELY more search space. Just what y'all needed, right?
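To put a number on "MASSIVELY more search space", here is a minimal Python sketch that lists the theoretical a/b/c/x/y/z fragments for one peptide. The function name `fragment_ions` is made up for this post, the masses are standard monoisotopic values, and the z series here is the plain "y minus ammonia" form (the radical z ions you see in ETD/UVPD data sit about one hydrogen higher). It ignores modifications and neutral losses, so treat it as an illustration, not as what SequestHT does internally.

```python
# Monoisotopic residue masses in Da (standard values)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915
NH3 = 17.026549
H2 = 2.015650


def fragment_ions(peptide, charge=1):
    """Return {series: [m/z of ion 1, ion 2, ...]} for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = {series: [] for series in "abcxyz"}
    for i in range(1, len(peptide)):
        n_term = sum(masses[:i])            # first i residues (N-terminal piece)
        c_term = sum(masses[-i:]) + WATER   # last i residues plus water (C-terminal piece)
        neutral = {
            "a": n_term - CO,
            "b": n_term,
            "c": n_term + NH3,
            "x": c_term + CO - H2,
            "y": c_term,
            "z": c_term - NH3,
        }
        for series, mass in neutral.items():
            ions[series].append((mass + charge * PROTON) / charge)
    return ions


ions = fragment_ions("PEPTIDE")
n_by = len(ions["b"]) + len(ions["y"])
n_all = sum(len(v) for v in ions.values())
print(f"b/y only: {n_by} fragments, all six series: {n_all} fragments")
```

Three times as many theoretical fragments per charge state for every candidate peptide, before you even add the +3 and +4 fragments. No wonder the fans are running.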
Okay -- so I'm going to start with just one file from this dataset. And only care about the positive UVPD -- and I chose this study cause the peptide is not chemically modified AND the fragments are read out in high resolution (I have some ion trap files now....and I'm finding them challenging...more later, maybe!)
First impression? HOLY COW, this data is beautiful!!! Second impression....is my computer going to wake the dogs up?!?!? ALL the fans are running.
This is one file.... Yes....the search space has blown up!
DISCLAIMER: This may not be the smartest way to run this. I'm half-awake, somewhat annoyed, and not a professional scientist. This is how I set it up, and I'm super impressed with the data.
I used a normal UniProt database (from 2011, LOL!) with a tight mass tolerance cutoff, and then allowed SequestHT to have equal weighting on all the fragment ions. I'm doing several iterations with the different engines as well, but this is a nice start.
If I use exactly these settings and just swap between Percolator and Target Decoy:
Sequest + Target Decoy = 1,700 phosphopeptides
Sequest + Percolator = 4,459 phosphopeptides!
Are the extra Percolated peptides real? As far as I can tell? Yeah...they're real. Umm...check this out! (Click to expand)
It is a great big phosphopeptide (2 missed cleavages?) and looks better than any ETD phosphopeptide I think I've ever done. I dropped +3 and +4 fragments from this so I could visualize the chart here....IMHO, this data is just stunning. I chose this at random. Most of them look this good!
My worst scoring peptide from Percolator -- I might not put it directly in a paper itself, but I wouldn't be embarrassed to load it into a Supplemental figures file. (Leaving it out for the sake of -- I do have to do a lot of work today...)
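For anyone wondering what the plain Target Decoy side of the comparison above is doing, here is a rough Python sketch of the counting. It is NOT the PD node's code: the function name `count_at_fdr` and the toy scores are made up, and it assumes a higher score is better. Percolator goes further by re-scoring PSMs with semi-supervised learning before the cutoff is drawn, which is one reason it can pull out more IDs at the same FDR.

```python
def count_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at the deepest score cutoff where
    decoys / targets is still <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR for everything scoring at or above this PSM
        if targets and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


# Toy example with made-up scores: True marks a decoy hit
toy = [(10, False), (9, False), (8, False), (7, True), (6, False)]
print(count_at_fdr(toy, max_fdr=0.25))
```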
How do the other engines I have do with this data? Bad news.
Neither my version of MsAmanda nor my version of Byonic currently has the capability of accepting the full a/b/c/z/x/y fragment spread....
...but I'm quite certain both of these awesome teams have it on their radar if not ready to rock already!
I haven't configured a Mascot server in a few years, but unless they changed something significantly, I'm sure you can go in and configure a new instrument and set the weights of the ion spreads in the same way as this, then you just use PD to access that new instrument type.
TL;DR? We can natively process UVPD data from the literature with SequestHT in PD. It will push our processing CPU pretty hard, but it is do-able. And there is a reason the field is excited about the possibilities of UVPD Orbitrap fragments!
Thursday, February 23, 2017
I needed some new fun data to process while I was at work today. I went through PRIDE but didn't see anything that caught my attention this morning (I want something really difficult -- I'm gonna be gone for at least 12 hours and my desktop isn't getting off the hook!)
I popped over to ProteomeXchange...found EXACTLY what I was looking for (BTW, that search bar is better than the one at PRIDE...so much easier to find weird stuff!) 14GB downloading now!
There is also this neat little graphic at the top that breaks down the files that are uploaded, by species or by instrument that the data was generated on!
To this day -- the Orbitrap Velos is responsible for the most datasets!! I'm gonna give the credit to 2 of the guys that helped us buy mine back in the day....
...and wonder how many other people upgraded their lab capabilities thanks to a program that provided a big boost in research funding around the time that awesome instrument came out....?
Guess who is catching up, though?!?! No surprise.
And third, in purple? That's the LTQ-Orbitrap (Classic, XL and Discovery all lumped together, I think)
Wednesday, February 22, 2017
We don't often get super desperate for signal in peptides these days unless they're reeeal weird. Our intact protein counterparts are often scrounging for a few more ions -- and man, drop in on the metabolism/omic/onics labs? You'll normally see some people trying everything to just get a few more ions out of the noise.
These guys? This is a whole new level!
They're doing the real basic stuff -- just trying to add some stuff to the periodic table -- and they've been real good at it. This new technique allows them to get even further.
The techniques they used in the past relied on detection of these new heavy element particles created in accelerators in their "native" (feel like I'm mis-using this) state. We know that ions are a whole lot easier to detect, right? So -- this new process makes ions out of their heavy elements with a laser. That helps a whole lot! But to boost their signal? They concentrate gas with their heavy elements -- with a JET ENGINE. The gas coming out is MACH 6. But it's concentrated.
Don't know about you, but I'm impressed!
Tuesday, February 21, 2017
I could probably cheat and use this great review on another blog as well -- because it covers both proteomic and metabolomic biomarkers of sepsis.
It isn't the most in-depth paper regarding instrumentation aspects. It focuses more on what has been done to attempt prediction of this terrifying complication of bacterial infection. It also introduces our technologies to microbiologists in a really digestible format. Best of all, it provides some serious insight into what needs to be done next. This is another condition where early prediction means better outcome for the patient -- and we've got the best tools for tackling it!
Monday, February 20, 2017
I feel like this paper, though still in PrePrint only, might have made this blog once before.
Much better! (Most colorful thing I could think of in 2 seconds).
Back to the paper! Here is the idea -- fine -- do your normal peptide search with your full database -- but do your FDR decoy search against just the peptides you identified.
Which -- you know -- kinda makes sense, right? If your database is way bigger than your matches (which is true in everything I can think of right now) then you have a much higher chance of false matches if you reverse it. But...I have no way of testing whether this idea is smart or not.
Oh wait -- they set up a webpage where you can test this premise on your own data just to show that they're onto something!
Here it is!
You'll need to get your PSMs into the format that they want (CSV, with specific comma placements) but the output charts will show you your results if you do normal target/decoy vs. their hybrid approach.
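To make the "decoys from just what you identified" idea a bit more concrete, here is a tiny Python sketch of building a decoy list from only the identified peptides. This is my own guess at the mechanics, not the preprint's implementation: the function names are hypothetical, and the pseudo-reverse trick (flip the sequence but keep the C-terminal residue so the decoy still looks tryptic) is just one common way to make decoys.

```python
def pseudo_reverse(peptide):
    """Reverse a peptide but keep its C-terminal residue in place."""
    return peptide[:-1][::-1] + peptide[-1]


def subset_decoys(identified_peptides):
    """Map each unique identified peptide to a decoy sequence.

    Decoys that collide with a real identified peptide (palindromes,
    for example) are skipped, since they cannot be told apart from targets.
    """
    targets = set(identified_peptides)
    decoys = {}
    for peptide in sorted(targets):
        decoy = pseudo_reverse(peptide)
        if decoy not in targets:
            decoys[peptide] = decoy
    return decoys


print(subset_decoys(["PEPTIDEK", "AAK", "PEPTIDEK"]))
```

The decoy search space here scales with what you actually matched rather than with the whole database, which is the part of the argument that made sense to me.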
Sunday, February 19, 2017
This is a really nice new study showing how you can go from discovery to clinical validation with one instrument!
In this case it is a Q Exactive classic and it is going after the somewhat terrifying multi-drug resistant tuberculosis strains. I didn't know about these, but it appears to be not one strain origin, but something that many environmental strains appear to be capable of picking up under the right conditions (ugh...)
The goal of this study -- find what proteins differ when this genotype is achieved with deep quantitative discovery proteomics (on the QE). Then create a rapid, targeted, clinically applicable assay to see how far these other strains are from achieving this indestructible status.
As an impressive addition to the study, once they had targets they made stable isotope labeled standards that they also spiked in for their PRMs. Seriously -- this is a ready-made tuberculosis clinical method! They aren't 100% clear (to me) regarding the LC conditions for the PRMs, but it looks like they might use a rather long 120 minute gradient --- suggesting (to me) the next application of this method is the direct application of this method to patient blood or plasma!!
Thursday, February 16, 2017
It is way too early for me to get my head wrapped around this one entirely. I've got part of it down and it is enough for me to say that I really like this new paper!
What I get (and like!): the concept of the "hidden proteome" -- it's this stuff!
This is a pie chart from a human file I've been messing around with -- about 100,000 MS/MS spectra and 28,000 or so match to PSMs. All that green stuff...that's the DARK PROTEOME (this sample is kind of weird, btw, that's why I'm messing with it). These authors state that in normal human plasma they can identify all but 25-30% of their MS/MS spectra using high resolution MS1 and MS/MS methods.
They attribute most of this variation to antibodies -- and have some remarkably interesting things to say about circulating antibodies that is heavily backed by citations (and all stuff I didn't know).
What!?!?! I know! Okay...now I'm actually getting the rest of this paper. I needed more coffee and to reread, I guess.
What they do to get to unknown stuff is to get the antibodies out of the plasma. They use an MG (melon?) column that crudely pulls out all the antibodies and associated proteins and I guess it pulls them out regardless of what their specificity is (goes after the non-variable region?)
They do normal LC-MS/MS (except high res with HCD and ETD) on the MG fraction as well as the unfractionated and de novo search all of it and use DeMix-Q (label free quan you can apply an FDR to!), and BLAST to filter the results. I might have this out of order. Still sleepy I guess.
The end results? Great! If they take their normal proteome and compare patient samples with a distinct and observable phenotype, they get nice clustering and differentiation. If they add in the results from this workflow -- they get AMAZING clustering and differentiation!
Wednesday, February 15, 2017
I set this paper to the side because I didn't know what the heck to think about it. It sounded like a funny concept -- then I skipped the nice and short (Open Access) paper and just went to the tool!
What? Who thought this would be a good idea? Why didn't anyone think of this before? Then...NO WAY...it CAN'T BE THAT FAST AND WORK!! Who can I send this paper to immediately? Wait... I might be the only person who will be this excited...just put it on the blog...
This was my brain on PGx.
If you want to go through this roller coaster of an experience...you can use the tool here.
What's it do? It takes your peptide list and then makes a new file out of it. This file is called a BED file and it is a mapping of all the peptides you found -- to their specific place in the genome.
No -- seriously this is pretty cool for the proteogenomics stuff -- because it saves you a lot of steps. Have you tried to do this with the (AWESOME!) free tools out there? It is really super hard (if you're as dumb as me). In this paper they state that it takes 140 steps to get this far using the Galaxy tools...which might be an exaggeration.....
There is a catch, however. The web interface is just a taste of the power that you have. You have to go to Python to use all the tools they show in the paper, but if you just have a couple human peptides you want to know more about -- the web interface is good enough -- and it'll make you a BED file for your entire peptide list in a few seconds! I loaded 34k PSMs which it reduced down to 21k non-redundant sites!
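If you have never opened a BED file, it is just tab-separated text: chromosome, start, end, name, score, strand. Here is a minimal Python sketch of writing one from peptide-to-genome mappings. This is NOT how PGx does the mapping (that is the hard part!); the function names and the coordinates are made up, and the only thing I'm sure about is the BED layout itself, including the 0-based, end-exclusive coordinates.

```python
def bed_line(chrom, start, end, name, score=0, strand="+"):
    """Format one 6-column BED record (start is 0-based, end is exclusive)."""
    return "\t".join(str(field) for field in (chrom, start, end, name, score, strand))


def peptides_to_bed(mappings):
    """Collapse redundant mappings and return sorted BED lines.

    mappings: iterable of (chrom, start, end, peptide, strand) tuples.
    """
    unique = sorted(set(mappings))
    return [bed_line(c, s, e, pep, 0, strand) for c, s, e, pep, strand in unique]


# Hypothetical coordinates: an 8-residue peptide covers 24 nucleotides
hits = [
    ("chr1", 1000, 1024, "PEPTIDEK", "+"),
    ("chr1", 1000, 1024, "PEPTIDEK", "+"),   # second PSM for the same site
    ("chr2", 500, 521, "SAMPLER", "-"),
]
for line in peptides_to_bed(hits):
    print(line)
```

Collapsing duplicate PSMs like this is the same kind of reduction as going from 34k PSMs to 21k non-redundant sites. Peptides that span an exon junction need the 12-column BED layout with blocks, which this sketch skips.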
If you are using PD 2.1 you'll have to export your peptide or PSM list and then remove the flanking residues.
I did it with "Text to Columns" in Excel.
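If Excel isn't your thing, the same cleanup is a few lines of Python. This sketch assumes the exported sequences look like `[K].PEPTIDER.[A]` or `K.PEPTIDER.A` (check your own export, I'm going from memory), and the function name `strip_flanks` is made up. It also uppercases the result on the assumption that lowercase letters only mark modified residues.

```python
import re

# Optional bracketed flanking residue, a dot, the peptide, a dot, optional flank
FLANKED = re.compile(r"^\[?[A-Za-z-]*\]?\.(.+)\.\[?[A-Za-z-]*\]?$")


def strip_flanks(annotated):
    """Return the bare peptide sequence without flanking residues."""
    text = annotated.strip()
    match = FLANKED.match(text)
    sequence = match.group(1) if match else text   # no flanks found: keep as-is
    return sequence.upper()


for raw in ["[K].PEPTIDER.[A]", "K.PEPTIDER.A", "-.PEPTIDER.A", "PEPTIDER"]:
    print(raw, "->", strip_flanks(raw))
```

Inline modification text inside the sequence is not handled here, so strip that separately if your export includes it.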
Tuesday, February 14, 2017
I'm going out the door stupid early this morning but don't want to forget to revisit this new paper in Cell when I get back!
It looks like a big group from MD Anderson (a place you could argue knows at least a little bit about cancer!) built some intelligent cancer protein arrays, screened an absolute ton of cancer cells -- and made all that data publicly available to the world through a friendly and powerful web interface.
It looks like around 250 - really good - targets were chosen that represent both important mutations -- and PTMs (!?!?!? can you do that via array? guess so!?!?!). I can't wait to check it out later!
Monday, February 13, 2017
Wow! Has this topic ever come up over the years...and I haven't known what to say in response. The question of course is -- can I install Proteome Discoverer on Linux?
Apparently the answer is "Yes!" and here are the instructions (can't verify these...yet....) courtesy of Computational proteomics: (This tutorial is for PD 1.4 and Ubuntu)
As an aside -- I finally got fed up with Windows updates enough to install Ubuntu on an old laptop sitting around -- that might be the fastest laptop in the house now. These free operating systems have come a looong way and use very few resources compared to the big operating systems!
Sunday, February 12, 2017
If you're local here in Maryland or D.C. and want to send some of your customers/collaborators somewhere to learn something about proteomics, you should check out the Center for Proteomics Discovery's Seminar series. The first one -- that some people are calling "Fall in Love with Proteomics" is this Tuesday. I hope to have a full schedule to announce and send around soon. Oh yeah -- you're welcome to come as well!
JHU has made a considerable investment in cutting edge proteomics recently -- both by modernizing instrumentation (there are a bunch of tribrids now!) and, more importantly, by recruiting some serious young scientific talent.
It is exciting in Baltimore!
Saturday, February 11, 2017
I stole this figure above from this Nature Review from 2004, but I think the figure really shows how interesting this brand new paper from Peter Kubiniok et al., really is!
Chances are you've heard of RAS -- if it is messed up in a tumor -- it's real bad. And tons of really smart people have been studying RAS for years for that reason.
RAF is short for Rapidly Accelerated Fibrosarcoma -- DO NOT GOOGLE IMAGE SEARCH THE FINAL WORD. You might see a cat with it....and might lose interest in your breakfast...
Okay -- so now I know another reason for wanting to know more about RAF besides that figure at the top of this post that probably still hangs in my old boss's office. Look at all those question marks around the A/B/C RAFs!
Ready to get rid of some of them?!? Go Go Time Resolved Phosphoproteomics! (It almost works)
In this impeccable study they start by getting 2 colon cancer cell lines that respond differently to a RAF inhibitor drug and SILAC label them. They hit the 2 cell lines with the drug and pull samples every 5 minutes. Counting the zero time point it looks like 15 time points! Wait. This study gets harder.
They combine heavy and light peptides at each time point -- phospho enrich with titanosphere beads according to this protocol then offline SCX fractionate each time point with their own handmade spin columns (cool! I don't know how to do that. They briefly describe it, but I don't think I could replicate from this paper. It might be in the paper linked in this paragraph).
It looks like they pull 6 SCX fractions. We're at 90 or so phospho-enriched runs -- that go into a Q Exactive Plus. All RAW files are available at PRIDE -- file PASS00897.
MaxQuant was used for processing with normalization at each time point. Okay -- this group deserves some serious credit for their downstream analysis and how clearly they describe the methodology! It goes --> we used this tool --> it has a funny name --> here is where you read about it and get it for yourself --> here are the settings we used and why. Not to be critical, but a lot of the time you get to this point and "here is a formula that you might be able to understand if you hadn't taken your last stats class in the 90s...and forgotten what all these big Greek letters are..." This one is a joy to read!
Enough flattery! How'd they do? I lied. I'm not done with the flattery. This study is killer. Almost 38,000 phosphorylation sites....
....LET ME FINISH....but around 650 that clearly modulate in response to the drug treatment! When they extract them out after all these super cool tools --- they are left with the most comprehensive phosphorylation interaction map of RAS/RAF/ERK that I've ever personally seen! They cut right through the noise and it takes them right to where they want to be. Seriously...WOW....you have to check this out!
Disclaimer: I know a lot of work has been done on these pathways since the review photo at the top of this post and maybe moving from that one to the ridiculously improved one in this paper (you'll have to look at it yourself, it's Open Access) isn't solely this work -- but -- man, this is a really good phosphoproteomics study!
Final note: I get a little uncomfortable when we're doing phosphoproteomics without whole proteomics to show that up-regulation isn't due to whole protein level changes and down-regulation isn't caspase activation or whatever -- but if you're pulling a sample at 5/10/15/60 minutes after treating with a drug -- I'm just fine with your assumption that it is a phospho change, especially if it is up-regulation!
Friday, February 10, 2017
I'm embarrassed...cause I know some of these authors on this poster...and it is from HUPO 2012... but I have been looking for info like this here and there for months.
What is the limit of detection (LOD) and limit of quantification (LOQ) for the QE running PRM mode (used to be called targeted-HCD) -- in a complex matrix?
The answer is here -- in this case, heavy peptides spiked into an E. coli background. Imagine -- much better with the newer stuff -- and...a much better test of LOD/LOQ than the ones we typically see.
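If you want to estimate LOD/LOQ from your own dilution series, here is a minimal Python sketch using one common convention: LOD = 3.3 x sigma / slope and LOQ = 10 x sigma / slope, where sigma is the standard deviation of the blank response and the slope comes from the calibration curve. I don't know that the poster used this definition, the function names are hypothetical, and the numbers below are invented just to show the arithmetic.

```python
from statistics import mean, stdev


def slope_intercept(x, y):
    """Ordinary least-squares line through (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def lod_loq(conc, response, blank_responses):
    """Return (LOD, LOQ) in the same units as conc."""
    slope, _intercept = slope_intercept(conc, response)
    sigma = stdev(blank_responses)   # noise estimated from blank injections
    return 3.3 * sigma / slope, 10 * sigma / slope


# Invented calibration points (amount spiked vs. peak area) and blank areas
lod, loq = lod_loq([1, 2, 3, 4], [10, 20, 30, 40], [1, 2, 3])
print(f"LOD = {lod:.2f}, LOQ = {loq:.2f}")
```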
Thursday, February 9, 2017
This new study at JPR is interesting because the results are substantially different than other reports out there (and my own personal observations!)
Ignore the top figure first.
This is a comparison of iTRAQ 4-plex, iTRAQ 8-plex and TMT 6-plex on a 5600 Triple TOF. In their analysis they get WAY more peptides with proteins labeled by iTRAQ 4-plex than iTRAQ 8-plex. Okay, I'm fine with that. This has been found a bunch of times before, but I've never seen anything this dramatic -- it is seriously like...50%...or less!
Where this gets weird is that they find more peptides if they're iTRAQ 8-plex labeled than when they're TMT 6-plex labeled...and...this is very contrary to my observations and most of the papers I've seen on the topic. Less than half the number of peptides? Umm...if true this is a seriously big deal!
Let's see what is different!
The MS1 TOF scan looks in the 400-1250 Da range...maybe that is a hint..?
The MS/MS was selected from within the same range -- well...401 to 1249...
Top20 ions were selected for fragmentation
A feature called "Adjust CE when using iTRAQ reagent" was used for MS/MS scans.
Okay...I don't know anything about the organism they are using and I'm seriously late so I don't have time to investigate, but I'm gonna assume it is relatively normal as organisms go -- let's focus on the tags!
Google Images pulled this up -- but it doesn't link me to the original source. If this is yours, I apologize, and I'll totally give you credit for it if I knew who you were!
Here is my first thought -- the iTRAQ 4-plex tag is relatively small -- it only adds 144.102 Da to your peptide. That's one big amino acid mass shift and that is it.
TMT 6-plex and iTRAQ 8-plex, however, are much larger -- 229.163 and 304.something, respectively.
I'm gonna go back to one of the few posts from 2012 (holy cow...I was so much dumber back then!) that I haven't deleted out of just embarrassment.
This was a map of my general distribution of the (m/z) of my identified tryptic peptides from this huge pool we'd created of my boss's depleted serum (hey...I'll argue all day that it was a better run to run QC than a BSA digest!)
First of all -- for my stuff (and, again, I don't know their organism) only fragmenting to 1249 would be a pretty massive hit in my peptide numbers!
But consider this one....if we added 144 Da to all of these peptides. And..yeah...I know...iTRAQ will add 0.5 charges or whatever across the pH/pKa thing (holy cow I'm late! words words words!) ignoring the charge shift, I'm going to lose quite a few identifiable peptides.
BUT....if I add 229 or 304...(ignoring charge shift again) I'm going to get VERY few of the peptides from my mix. Maybe not even 1/3!!!
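Here is that back-of-the-envelope argument as a small Python sketch. It assumes one tag on the N-terminus plus one on every lysine, keeps the charge state fixed (ignoring the charge shift, like I did above), and uses roughly 304.2 Da for iTRAQ 8-plex since I only remembered "304.something". The function names and the example peptide are made up.

```python
PROTON = 1.007276

# Approximate mass added per tag, in Da
TAGS = {
    "iTRAQ 4-plex": 144.102,
    "TMT 6-plex": 229.163,
    "iTRAQ 8-plex": 304.2,   # approximate value, an assumption for this sketch
}


def labeled_mz(neutral_mass, n_lys, tag_mass, charge):
    """m/z of a peptide carrying one tag on the N-terminus and one per lysine."""
    n_sites = 1 + n_lys
    return (neutral_mass + n_sites * tag_mass + charge * PROTON) / charge


def in_window(mz, low=400.0, high=1250.0):
    """True if the precursor falls inside the selection range."""
    return low <= mz <= high


# Hypothetical tryptic peptide: 2200 Da, one lysine, observed at 2+
for name, tag_mass in TAGS.items():
    mz = labeled_mz(2200.0, 1, tag_mass, 2)
    print(f"{name}: m/z {mz:.2f}, selectable: {in_window(mz)}")
```

For this made-up peptide the iTRAQ 4-plex version just squeaks in under 1250 while the TMT 6-plex and iTRAQ 8-plex versions fall outside the window, which is exactly the kind of loss I'm worried about.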
I'm not being critical. Maybe this instrument can only scan to 1250. I don't know. If that is the case, WOW! you should definitely only use iTRAQ 4-plex! Maybe that is why they report so many missing channels on these devices when doing isobaric labeling -- again, not an expert on this one. Just seriously curious!
Let's look next at the "Adjust CE if iTRAQ reagent used" -- there is very little info on this one. Close as I can get by sacrificing a shower to finish this is maybe this PLOS ONE paper from 2015.
On page 5 it suggests that this feature may just be extra collision energy for the iTRAQ reagents. And this is going to be my only guess at why iTRAQ 8-plex outperformed the same peptides tagged with TMT 6-plex in this paper. One of my favorite things about TMT 6 or 11-plex is that I really don't need to fool around with the collision energy. The TMT tag appears to come off just fine at the same CE that I use to fragment peptides. I know some people crank it up a couple volts, but I don't think it is critical. For iTRAQ 8-plex? HECK YEAH you crank up that energy! Is it possible that using this feature on TMT tagged peptides might over-fragment them? Maybe...?
I don't know -- it is honestly just me brainstorming on why this doesn't match my own personal observations. And...well...procrastinating...whoever wanted to meet at the gym before work every morning this week is dumb...and, yeah, I hope you read this!
One more thing of interest in this paper -- the figure at the top! I'm fascinated by the fact that Mascot Distiller running Mascot and Proteome Discoverer running exclusively Mascot with the same Mascot node settings came back with slightly different results, but I'd suspect it would make sense if we looked at the RAW file to MGF conversion or node settings in depth.
I hope I don't come off as critical of this paper. I think it is a really good work, well written, and just plain fascinating!
Tuesday, February 7, 2017
I don't have time to read this one yet -- but I need to for a really dumb project I'm slowly working on here and there on the weekends. If it works, you are going to be seriously impressed by how dumb it is!
Quick clarification -- none of these Python tools are dumb -- they are super awesome. If you have the capability to take a search engine and write it so it's accessible in a freely available language for everyone -- YOU are also awesome. The only thing dumb here is what I do when it's too cold to do stuff outside!
What I need for this project is tiny but effective Python packages that are published and well annotated.
Ursgal is the missing link and it is described here!
Minor criticism (or is it..) the name of this software sounds like something Johan Hegg would scream in the middle of a song.
(Johan is the lead singer of the world's most popular Viking Death Metal band -- didn't know this was a music genre? You're welcome!)
What is URSSSGGGGAAAALLLLL!!!!!!? It is a Python interface that allows you to combine data from the other code people have put together. They show they can use Comet, X!Tandem, MSAmanda, and other awesome engines and link it together into a single output. It even Percolates and the whole file is 9.2 MB zipped -- you can email that!
You can get this suite directly at GitHub here.