Wednesday, October 18, 2017

Simplify the top down proteomics problem with NeuCode!

Top down proteomics is still tough  -- some of the problems that were really hard 10 years ago are still really hard now. What if we could simplify the whole thing by doing something completely different?

Like this...?!?!?!?

What if you looked at the challenge of trying to work out an intact protein sequence (including PTMs!) from the MS/MS spectra and set that to the side for now? Instead, you use the intact protein masses alone (those are much easier to get!) and combine them with ultra-deep shotgun analysis to work out the PTMs. Could you then link the proteoforms back together?

Some of them for sure, but there is going to be a whole lot of uncertainty there between proteoforms of similar mass. What if you knew something really cool about the intact proteins that would help you link them back to the shotgun measurements -- like EXACTLY how many lysines are in each protein!
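A minimal sketch of that matching idea (the candidate list, masses, and tolerance here are all mine for illustration, not from the paper): given an observed intact mass and a NeuCode-derived lysine count, filter candidate proteoforms by mass tolerance and keep only those carrying exactly that many lysines.

```python
# Hypothetical sketch of linking an intact mass back to proteoforms
# using a NeuCode-style lysine count as an extra constraint.
# Candidates, masses, and the tolerance are made up for illustration.

def match_proteoform(observed_mass, lysine_count, candidates, tol_da=1.0):
    """Return candidates within tol_da of observed_mass that also
    carry exactly lysine_count lysines."""
    return [
        c for c in candidates
        if abs(c["mass"] - observed_mass) <= tol_da
        and c["sequence"].count("K") == lysine_count
    ]

candidates = [
    {"name": "proteoform A", "mass": 11230.5, "sequence": "MKTAYIAKQRK"},  # 3 lysines
    {"name": "proteoform B", "mass": 11230.9, "sequence": "MSTAYIAQQRR"},  # 0 lysines
]

# Two candidates of nearly identical mass -- the lysine count breaks the tie.
hits = match_proteoform(11230.7, 3, candidates)
print([h["name"] for h in hits])  # -> ['proteoform A']
```

Without the lysine count, both candidates survive the mass filter; with it, only one does.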

This is yet another thing that you can do with the NeuCode reagents. The application of NeuCode in this manner was previously shown in this paper by many of these authors.

Honestly, it sounds like a neat trick -- TADAA! this is how many lysines are in this protein, but there are other ways we could do this, right?

What this new study does is show how we can actually apply this to a biological system -- by delivering the largest number of E.coli PTM-annotated proteoforms we've ever seen in a single analysis (>500). It is worth noting that there are some of the familiar top-down limitations, like proteins >45 kDa were excluded from analysis, but what a cool new method to have in our utility belts!

Tuesday, October 17, 2017

Sonic speed digestion for complex proteomics samples!

I just got back from a couple weeks in amazingly beautiful southern Portugal. Great climbing, beautiful beaches, cool people, and the best $2 wine in the world.

I didn't actually intend to be disconnected from this hobby, but I dropped my laptop -- due, in no way whatsoever, to the awesome $2 wine.

I'll be backlogging posts for a bit, though -- TONS of cool stuff came out recently!


My library doesn't provide digital access to Talanta any longer, but I think this tool is super cool. Why wouldn't ultrasonic treatment speed up sample digestion?!?

According to the abstract they get to full tryptic digestion in 5 MIN!!  It's exciting to see some good science out of Portugal, as well as sonar contributing something positive to our field!

Monday, October 16, 2017

Multiplexed plasma peptidomics!!

I stole this slide on peptidomics from this talk by Harald Tammon (see, LinkedIn is good for something!)

Peptidomics is coming fo' real, yo! And this new paper in MCP jumps one of the biggest hurdles in doing these experiments! 

One of the reasons peptidomics is so hard is that processed circulating peptides are a little bit too big for metabolomics tools -- and often too small for proteomics tools. When a metabolomics person is optimizing their chromatography to separate lactic acid from alanine so they don't just shoot off the column in one single peak, that same chromatography system might not be the best for catching a singly charged peptide at 550 m/z. And our tools? We rely to a huge extent on peptides accepting at least 2 charges so they provide an appropriate b/y spread for identification. +1 peptides?!? Most of the time we tell the mass spectrometer to just ignore them -- cause we aren't gonna identify them anyway.  (I wrote a post on a classic paper about this here.)

How'd these authors tackle the problem? By TMT tagging everything! In general the TMT tag will add about 1 extra charge (all pH/pKa {or something} dependent) and all of a sudden their normal proteomics workflow could do quantitative peptidomics!
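Back-of-the-envelope arithmetic on why that matters (the peptide mass here is my own made-up number; the tag mass is the standard TMT monoisotopic addition of ~229.1629 Da): a small peptide that would only fly at +1 lands comfortably in +2 territory once it carries the tag and picks up that extra charge.

```python
PROTON = 1.007276  # mass of a proton, Da
TMT = 229.162932   # monoisotopic mass added by one TMT 10/11-plex tag, Da

def mz(neutral_mass, charge):
    """m/z of a peptide at a given charge state."""
    return (neutral_mass + charge * PROTON) / charge

peptide = 1100.55  # hypothetical neutral mass of a small circulating peptide, Da

print(round(mz(peptide, 1), 4))        # untagged, stuck at +1
print(round(mz(peptide + TMT, 2), 4))  # TMT-tagged, now flying at +2
```

Same peptide, but the tagged +2 version gives the fragmentation engine a b/y ladder to work with.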

To improve identification they used both HCD and EThcD and searched the data with PEAKS and Byonic, which are both more likely to successfully identify +1 peptides than Sequest.

How'd they do? Thousands of quantified peptides and a method that I'd follow to the letter if someone asked me to quantify changes in the global peptidome!

Saturday, September 30, 2017

Unrestricted data analysis of protein oxidation!

Okay -- you're gonna have to trust me on this one -- this figure above is actually really cool, but I can't get even the single image to copy over here right. I even tried (on purpose!) to open this paper in the "Active View" thing...

It's from this paper that is way too smart for me this morning.

In general we still have to limit the PTMs we go after in a study. Maybe that's going to change soon with some of the next generation algorithms that are coming, but right now we need to be restrictive. People studying protein oxidation in a biological context -- for example in aging research -- tend to focus primarily on carbonylations. We know from induced oxidation studies, like FPOP (which is probably an extreme example) that oxidation can have all sorts of different effects on a protein.

What this team shows here is a somewhat counter-intuitive way of looking at all sorts of oxidative events, even in complex matrices -- as far as I can tell, by just using MaxQuant in a clever way and some relatively simple post search filtering.

All the data they show is from a Q Exactive with 70,000 resolution MS1 and 35,000 resolution MS/MS. I think the resolution in the MS/MS is pretty critical for what they are doing. Even though mass accuracy doesn't really change with increased or decreased Orbitrap resolution, their downstream filtering is super harsh, and nearly co-occurring fragment ions at lower resolution will probably lead to a real PTM getting tossed.

If you're trying to resolve a modification of tryptophan chlorination (+33.96) from homocysteic acid (+33.97) you might want to double that resolution (it does help a little that this example occurs on different amino acids... ;)
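Quick arithmetic on that example (the fragment m/z is a made-up number of mine): separating two peaks 0.01 Da apart at a given m/z needs a resolving power on the order of m/Δm.

```python
def required_resolution(mz_value, delta_mz):
    """Rough resolving power (m / delta-m) needed to separate two peaks."""
    return mz_value / delta_mz

# Trp chlorination (+33.96) vs homocysteic acid (+33.97): ~0.01 Da apart.
delta_mod = 0.01
fragment_mz = 700.0  # hypothetical fragment ion m/z

print(round(required_resolution(fragment_mz, delta_mod)))  # -> 70000
```

So at m/z 700 you'd want roughly 70,000 resolution just to pull those two apart, which is why the 35,000 MS/MS setting is already living on the edge for cases like this.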

Something that ends up being ultra-critical for them is the "dependent peptide search" function in MaxQuant. Fabian Coscia describes this function in this YouTube video here (description of the function starts at 9:19, but the whole thing is worth watching.)  This slide screenshot does a good job of summarizing how it works.

These authors utilize this function, then export the resulting delta-mass peptide modifications and filter them down to known oxidative modifications (oh -- their samples are treated with something that oxidates the Albert Heck out of them.)  What they find in a very simple mixture is reflected in a much more complicated sample -- specific oxidation "hot spots" and a whole lot more interesting protein oxidative modifications than carbonylation! Once they find them -- they've got MS1 signal to quantify them with.
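A toy version of that post-search filter (the modification list, masses, and tolerance are mine for illustration, not pulled from the paper): take the dependent-peptide delta masses and keep only the ones that land within a tight tolerance of a known oxidative shift.

```python
# Hypothetical post-search filter: match dependent-peptide delta masses
# against a short list of oxidative modifications. Masses and tolerance
# are illustrative, not taken from the paper.

KNOWN_OXIDATIVE_MODS = {
    "oxidation (+O)": 15.994915,
    "dioxidation (+2O)": 31.989829,
    "carbonylation-type (+O, -2H)": 13.979265,
}

def filter_oxidative(delta_masses, tol_da=0.005):
    """Keep (delta, name) pairs where delta matches a known oxidative shift."""
    hits = []
    for delta in delta_masses:
        for name, mod_mass in KNOWN_OXIDATIVE_MODS.items():
            if abs(delta - mod_mass) <= tol_da:
                hits.append((delta, name))
    return hits

observed = [15.9946, 42.0106, 31.9901]  # made-up dependent-peptide deltas
for delta, name in filter_oxidative(observed):
    print(delta, "->", name)
```

In this toy run, the ~42.01 delta (which looks like an acetylation) falls through the filter and only the two oxygen-addition events survive, which is exactly the kind of narrowing-down the post-search step is for.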

Friday, September 29, 2017

Advanced precursor ion selection strategies on an LTQ Orbitrap!

I'll be honest, when I found this paper I was looking for the answer to a completely different mystery. However, this awesome paper goes a long way toward answering a question that's been rumbling around in my head for a while -- that is: how hard could you push an LTQ-Orbitrap system?

3 paragraphs redacted due to excessive rambling....

What if you could get around that awesome mustard yellow interface and into the guts of the operating software? Could you write better, smarter instrument control software and crank that monster to 11?

Yeah -- this awesome study suggests there is definitely some room for improvement!

This team totally hacks an Orbitrap XL -- and drastically improves its performance! With 200 ng of HeLa run 8 times they get around 1,600 unique proteins (single shot, top 10, 2 hour gradient). Honestly, that's pretty good and smokes any Q-TOF or ion trap I've ever personally used.

When they modify their instrument parameters to do cool things like better control dynamic exclusion -- and automatically exclude peptides identified in the previous runs using their cool method (Smart MS2) -- they can get that number up to ~2,500 unique protein groups in 4 runs. Wow, right?
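The cross-run exclusion idea can be sketched like this (the tolerance windows and the precursor list are my own made-up numbers -- this is the concept, not the actual Smart MS2 implementation): before triggering an MS2, check the precursor against everything already identified in earlier runs, within m/z and retention-time windows.

```python
# Toy sketch of cross-run precursor exclusion. Tolerances and the
# identified-precursor list are made up; this shows the idea only.

def should_fragment(mz, rt, already_identified, ppm_tol=10.0, rt_tol_min=1.0):
    """Skip a precursor if it matches a previous run's ID in m/z (ppm) and RT."""
    for prev_mz, prev_rt in already_identified:
        ppm = abs(mz - prev_mz) / prev_mz * 1e6
        if ppm <= ppm_tol and abs(rt - prev_rt) <= rt_tol_min:
            return False  # seen before -- spend the MS2 time elsewhere
    return True

# Hypothetical (m/z, retention time in min) pairs identified in runs 1-3:
identified_runs_1_to_3 = [(654.321, 42.5), (788.901, 61.2)]

print(should_fragment(654.322, 42.7, identified_runs_1_to_3))  # -> False (matched)
print(should_fragment(500.250, 30.0, identified_runs_1_to_3))  # -> True (new)
```

Every MS2 you don't waste on a peptide you've already identified is one you can spend digging deeper into the sample, which is where those extra ~900 protein groups come from.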

And if you're thinking "who cares, I'm no Russian hacker" check this out!

You can buy SmartMS2 for your LTQ Orbitrap!  (please see the disclaimers section of this blog; it is way over to the right somewhere at the top. I am not endorsing this product. I have not used this product. This is a semi-scientific review of the literature only, and mentioning the fact that this thing is out there falls in line with the general story of this paper and blog review thing. And someone reading the paper would find out anyway!)

Thursday, September 28, 2017

Is it finally time to revisit biomarker discovery in plasma proteomics?!?

Admit it -- we jumped the gun. Proteomics was the most exciting thing ever and we had a couple of awesome early successes -- and every lab and biopharma company in the world dumped $$$$ into searching for plasma proteomics biomarkers. And...not to be mean...but very little came out of it...

There are facilities I've been to where people worry about saying "biomarker discovery" out loud. Where the impressions from the fleets of FT-ICRs still remain indented in the floors...and 6 cars fill lots that could hold 600....

We underestimated the problem. We underestimated the matrix. We didn't have enough speed, our separations were too primitive, and -- especially this -- we didn't have the dynamic range. I'm saying this all in past tense...cause this is the real question....

...and this great new open review tries to tackle this question head on!

The review starts off with some very good perspective. We all know the dynamic range of proteins in plasma is 10 or 11 orders of magnitude, right? But, honestly, what does that mean in relation to disease states and current biomarkers? This is addressed very well here.

Okay -- another cool thing in the paper is the literature analysis: the rise of published proteomics studies that just keeps going versus the plasma proteomics biomarker studies that went way up and then way down. I think you could directly correlate that graph with the parking lot(s) I mentioned earlier!

I mentioned above that we didn't have the dynamic range. This is only partly true. It is more accurate to say -- we didn't have the dynamic range per unit time. I have a good friend who does incredibly deep analysis of samples. She gets coverage of her samples on par with anything we see in the literature today on today's newest instruments -- and she's been doing it for years with little change in her instruments and methods. However...she may spend 1-2 months of analysis on a single sample. Multiple protein extraction and digestion techniques, 2D-offline fractionation, that sort of thing. We've always been able to do that stuff...eventually....but now we have dynamic range / unit time!

This is where this review goes from reviewing the history of plasma biomarker proteomics -- to providing the blueprints we might use and changes we'll need to initiate if it turns out we're finally there. I especially like the grouping of the strategies into 2 clear groups, triangular and rectangular and I plan to add them to my terminology list.

Are we there yet? Maybe? At the very least, we're a whole lot closer than we've ever been, but there's definitely some work ahead still.

Loosely linked side note/footnote: I just learned recently about Eroom's Law. It's a play on Moore's Law (that we'll double computing power every 2 years or whatever). Eroom's Law is the opposite. It states that each new drug discovery will cost more and take longer than the last one. This was based on observations from the pharmaceutical industry and there are lots of thoughts on the causes. One leading theory is the tendency for companies to expand and continuously add non-scientific staff like management, administrators and HR to "support" the scientific development. I've also seen it thrown out there that as a company expands it becomes commonplace to bring in outside thinkers from other industries, and this may contribute.

It turns out that the fictional(?) parking lot I mentioned above had 30 spaces for the scientists and 569 spaces for the managers, administrators, marketing and human resources people and, of course, 1 for the hot shot executive bringing in all the freshest ideas from AOL...maybe our technological limitations weren't 100% of the problem...

Wednesday, September 27, 2017

QuiXoT -- Quantify any proteomics dataset labeled in any way?

I feel like QuiXoT has been on the blog before -- because maybe I used a screenshot from this travesty of an 80s cartoon show (the only thing good about it was the pun -- that I didn't get till years later...), but a search doesn't reveal it.

Nope -- it looks like QuiXoT is new!

...and available in pre-print at BiorXiV here! (I can not commit the proper capitalization to memory)

What is it? It is a pile of tools for quantifying mass spec data -- any mass spec data. It was first designed in house to quantify 18O (O18?) labeled proteins, but then they realized if they could do that, quantifying everything else was easy.

Immediately, my first question is: what about 15N? It is never mentioned in the paper. Fortunately, in my country right now it is completely acceptable to perform any kind of formal business thru Twitter. You can break 100+ year established traditions at 3 in the morning, you can change military policy -- anything you want...just Tweet it out into the universe.

...I also wrote an email to the corresponding author, just in case they aren't up with the modern way we do things here in D.C....(sigh)...

BTW, you can get QuiXoT here. It has an awesome logo.

QuiXoT does need some manipulation to run. It isn't a super user-friendly GUI. However, it has a really nice feature called a "Unit Test". These are short tests that make sure that 1) you have all the prerequisites to run the program, 2) you get familiar with the steps of what you're doing, and 3) you can get the data out that they expect from the data they give you.

Considering how hard it can be to get quan from 18O/15N labeled data (I know there are things out there, but it's nice to have alternative algorithms to run samples through), this doesn't seem too bad at all.

UPDATE 1 hour later (people in Europe get up EARLY!): QuiXoT can't do 15N, but it's still awesome!

Tuesday, September 26, 2017

The Dark Proteome Database!!

(Borrowed this image from ChemistryWorld -- here)

I LOVE talking to people about the dark proteome. I seriously think there is some fundamental biological process occurring in all cells that we don't know about yet. Maybe I'm crazy, but when you realize that we can't identify MOST of the MS/MS spectra we get, it does suggest something like that - right?

This new paper doesn't diminish this idea at all! 

I love the name of this journal! Just added it to my "watch" list.

In this study, these authors construct a database of the stuff proteomics doesn't identify -- and -- holy cow, it's super weird!

They test a lot of the common assumptions -- like, these are alternative cleavage events, or intrinsically disordered regions -- and find that these assumptions fall well short of the whole.

Okay -- so how about this for cool -- they rate entries according to their "level of darkness" in the database.

I'll be honest. I'm in the web interface now (which you can directly access here) and -- while cool -- I can't come up with a good idea of how I would/could utilize it right now. Considering the fact I just resigned myself to the fact I will never find my car keys again and I swear I just had them -- maybe I just need (a lot) more espresso this morning!

Monday, September 25, 2017

Deep Dive -- double 96 fractionation for when you absolutely need sensitivity

Ouch. This goes on the list of techniques I really hope I never have to do -- but if you really really need to quantify that peptide and you don't have any way to enrich for it, deep dive will probably get you there.

This is the overall strategy -- (please note they fall in the non-depletion camp) -- and you are reading that right. Fractionate into a 96 well plate, monitor to figure out the well(s?) where your peptide ends up. Fractionate that single well into another 96 well plate -- and then use that final well for quantification.

I guess if it is completely automated and you know that well C6, fraction F4 has what you're looking for -- and it should be the same for every sample -- it wouldn't be that bad? Again...I hope I don't have to do it, but the method is here just in case.

Sunday, September 24, 2017

SugarQB -- Glycoproteomics just got a whole lot easier!

I've had to sit on this one for a couple of days. This is a dilemma for me (despite the fact it took me 10 tries before Google was satisfied with my spelling of the word "dilemma." Not one "n". At all. )

This is it: I LOVE Byonic. Love it. It doesn't show up on the blog a lot, but I think it is some of the best software we've ever seen for proteomics. It has been pigeon-holed (which apparently is a term) as a glycoproteomics tool -- and maybe that alone. It is, however, a REALLY good proteomics search engine. It is, {also} however, a commercial product and not every lab can afford the $7k USD or so for it. {Am I allowed to say that? Guess we'll find out...}

Now -- I'd seen some posters online that suggested that my friends at IMP had glycoproteomics tools in the works. This is good for all of us, because to date, IMP has never charged anyone for a piece of software. They are even responsible for the fact there is a free functional version of Proteome Discoverer (which is, btw, off-the-charts awesome! Have I shown data yet? Man, I need some free time -- glad my vacation starts in 42 hours!)

I forgot what I was talking about. And I don't care. But check this out! Is that a free glycoproteomics workflow running in Proteome Discoverer 1.4? It totally is...and it's no joke. It is seriously, amazingly -- like -- I don't want to admit how good it is -- good.

(Did you know Lavar can't say that phrase out loud due to a court order? He can't. If I ever met him, I'd still try really hard to trick him into saying it. I have scenarios planned and everything...)

More importantly to this conversation -- is this new Nature Letter article --

Where they utilize SugarQb to uncover new glycosylation events that occur on exposure to ricin. This is totally cool, for sure; we should figure out how ricin works in case we make Walter White mad or something. But SugarQb can be applied to any acquired proteomics data and any biological problem -- whether we know the glycosylation pathway of interest -- or not.

You can get SugarQb at the revamped As far as I can tell, it currently only installs in PD 1.4, but I haven't tried moving the .DLLs over yet.  I'll let you know.

Saturday, September 23, 2017

Quick guide -- is it a peptidase or a protease?

My formal training is in teaching and in microbiology. This whole mass spec thing kinda happened because no one else wanted to do it -- and I was definitely the worst person at the stuff my lab was good at -- there ya go!

I appreciate the heck out of anything that can clear up technical things for me fast. Especially if I can just link them on this blog so I can find it later.

Thanks to @PastelBio I now have a link to look at the next time I'm considering using peptidase -- when I really mean protease -- as well as the different kinds of each.

This is courtesy of DifferenceBetween.Com and you can find this here! 

Wednesday, September 20, 2017

APOSTL -- A staggering number of Galaxy AP-MS tools in a user friendly interface!

Thanks to whoever sent me the link to this paper! That doesn't mean that I'll write about it, btw. However, if the paper leads me to an easy user interface that allows me to use a bunch of tools I've heard about, but all require Perlython or Gava or Lunyx or whatever to use otherwise, there's a pretty good shot!

This is the paper that describes this awesome new tool! 

As far as I can tell, bioinformaticians fall into a couple of different branches. You've got the hardcore computational camps that are writing all their stuff in Python, Perl or whatever. And you've got your more data science people who seem to be using either R or Galaxy. From my perspective all the awesome tools have one common denominator when I give them a shot...

Okay...maybe 2 common denominators...

....joking of course! But it isn't uncommon for these shells or studios to require some extremely minor alteration or library installation that is a challenge for me to do. And honestly, as cool as all the stuff in Galaxy looks -- I don't even know where to start with that one.

And here is where we click on the link to the APOSTL SERVER, where all the Galaxy tools for affinity purification/enrichment MS experiments are located in a form we all can use!

APOSTL has a flexible input format. Workflows are already established for MaxQuant and PeptideShaker, but it looks like you'd just need to match the formatting to bring in data from anything else.

I don't have any AP-MS data on my desktop right now (and it sounds like it's still working on a big queue anyway) but I have some stuff that needs to go into this later. I'll let you know how it goes.

...and sometimes Matthias Mann uses a picture from your silly blog!!!

One of the downsides of my current job is that I have to miss my favorite conference, iHUPO. Fortunately, though, loads of really cool people are there and I've been able to keep on top of what is happening via Twitter.

...and...yesterday I got this picture that made me feel included and also made me laugh a lot!!

Tuesday, September 19, 2017

Two cool new (to me?) tools I somehow missed (?)

I'm leaving these links here so I don't forget them (again?). They've both been around for a while, and I'm wondering if I forgot, or if our field just has a lot of software!

You can check out MZmine 2 here. (now I can close that tab -- WAY too many are open!)

And Mass++ is here. P.S. PubMed thinks + signs mean something else and doesn't like searching it as text.

Both are free -- look super powerful -- and are waiting on my desktop for me to bring my PC into a hyperbolic time chamber.

Let's plunder some .PDresult files, matey!

(Wait. There's a guy on our team who dresses as a pirate?!?)

As I'm sure you're aware, it's talk like a pirate day. You can go two ways with a holiday like this as a blogger. You can ignore it completely OR you can make a sad attempt to tie it in with what your blog is about. I, unfortunately, chose the latter.

Recently, I've been helping some people with some impressively complex experiments. The days of "how many proteins can you identify" are just about gone. The days of "how does this glycosylation event change globally in relation to this phosphorylation event and how the fourier are you going to normalize this" Arrr upon us. 

The Proteome Discoverer user interface has gotten remarkably powerful over the years. However, I imagine the developers sit back and have meetings about -- "we have this measurement that is made as a consequence of this node's calculation, but I can't imagine a situation under any circumstances where someone would want it." To keep from overwhelming us with useless measurements, they don't output some of them.

.MSF files and .pdresult files are really just SQLite files in (pirate? groan....) disguise. DB Browser uses virtually no space and can pillage these files and reveal all the behind-the-scenes data.

For this reporter quan experiment, I can get to:  

78 different tables! Add more nodes into your workflow and there are more! You can get in and pillage the files for tables you can't access otherwise.
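If you'd rather do the pillaging in code than in DB Browser, Python's built-in sqlite3 module opens these files directly (the filename below is a placeholder -- point it at your own result file):

```python
import sqlite3

# .msf / .pdresult files are SQLite databases, so open one directly.
# "MyExperiment.pdresult" is a placeholder for your own file.
con = sqlite3.connect("MyExperiment.pdresult")

# List every table PD wrote, including ones the UI never exposes.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(len(tables), "tables")
for name in tables:
    print(name)

con.close()
```

From there it's ordinary SELECT statements against whichever table looks like treasure.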

Is this useful to you? Maybe if you're doing something really weird. If the weird thing you are doing is really smart, you could also make a suggestion to the PD development team to include it in the next release.  In the meantime, maybe this will do, ya scurvy dog (ugh...sorry...)