Wednesday, August 23, 2017

GenPro -- 1 step closer to personalized proteomics!


One very convenient shortcut in most proteomics workflows is the fact that we basically ignore individual genetic variation.

If you are doing proteomics on perfect clonal populations like the E. coli K-12 strain or Mr. Meeseeks

...one of which has a Uniprot entry...

You don't have to worry about missing a peptide spectral match (PSM) because of an amino acid variant (AAV) [let's assume mutations don't spontaneously arise :) ]

For everything else? Missense mutations/alterations leading to single amino acid substitutions from one organism to another are all over the place!


GenPro is a new software package described in JPR here that can create a personal protein database from whole exome sequencing data!  Emphasis here on the second part -- exome sequencing is the much cheaper genome sequencing type. (We just had ours done by a new startup and I think 125x coverage was $400 USD direct-to-consumer.)
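To make the idea concrete, here is a minimal Python sketch of what a personalized entry buys you. This is NOT GenPro's code or interface (I haven't reproduced either here). The protein sequence, the Y5H substitution, and every function name below are made up for illustration, and the digest uses the usual simplified trypsin rule (cleave after K or R unless the next residue is P).

```python
import re


def tryptic_peptides(protein):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def apply_substitution(protein, position, ref, alt):
    """Apply a single amino acid substitution at a 1-based position."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[:position - 1] + alt + protein[position:]


def variant_only_peptides(reference, variant):
    """Peptides you can only match if the variant is in the database."""
    reference_peptides = set(tryptic_peptides(reference))
    return [p for p in tryptic_peptides(variant) if p not in reference_peptides]


# Hypothetical reference protein and a hypothetical Y5H variant.
reference = "MKTAYIAKQRQISFVK"
personal = apply_substitution(reference, 5, "Y", "H")
print(variant_only_peptides(reference, personal))  # ['TAHIAK']
```

Search against the reference alone and the spectrum for TAHIAK has nothing to match. Put the personal sequence in the FASTA and it does.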

First off -- this isn't the only way you can make a database like this, by any means. However --

1) GenPro appears to be the simplest way to do it I've ever seen

The software is available on GitHub here with step-by-step instructions (it isn't a GUI -- we do have to type/copy/paste a lot of stuff).

2) GenPro is the most thoroughly validated one I've seen so far!

They pull BRAIN tissue, do LFQ global proteomics, search the data against the GenPro-generated database, and find nearly 1,000 new peptides!

Okay -- if the authors or editors see this and want this image taken down, please email me (orsburn@vt.edu). You'll get a swift removal and heartfelt apology, but I LOVE this figure and I think everyone should subscribe to JPR and read this study enough that I'm going to risk it --

Check this out!!


This is HRAM fragmentation on their instrument (OT Fusion, I think). The top is the normal variant -- the one that you'll find a PSM for using UniProt and from a normal brain of this species of organism. However -- in the personalized database there is an AAV -- and it is very clearly picked up from the MS/MS spectra. Just to cover their bases, these authors produce a synthetic standard and it shows 100% that the variant is there and fragments in that way.

These authors make some interesting biological conclusions from their observations on the AAV distributions, but you'll have to read it to see what they are. They make some heavy peptide standards and go back with PRM to obtain absolute quantification on some of the 1,000 variant peptides they find.

TL;DR?
1) Great new piece of free software
2) Thorough validation of said software
3) Thorough mass spectrometry from some people who obviously know what they're doing
4) Extra steps taken to strengthen their observations and fill in some new insight into brain biology.
5) Shows how close we are getting to personalized proteomics!

Tuesday, August 22, 2017

ComPIL -- Infinitely scalable databases for metaproteomics!!


WHOA!!!!  Have you tried processing any metaproteomics data you found in a repository online (or generated yourself)?  If you naively barge into it you will rapidly discover a problem.

There are hundreds of GB of sequenced bacteria/virus/archaea databases out there! The sequencing labs aren't slowing down, either. In 2006 there were 300 bacterial genomes completed. In 2016 there were over 2,000 genomes completed of just E. coli! (Ref)

If you really don't know what species might be in your sample of mud or sewage or whatever (that's metaproteomics, btw, just digest what is there and do LC-MS/MS) you either need to find some database reduction steps -- or -- ComPIL!


ComPIL is a metaproteomics search system designed for the future. In theory -- it is infinitely scalable and may be able to keep up with these busy genomics centers and all the information they are kicking out into the world.

Nope, I don't get how they made it work, but I do understand the evidence they used to validate it.

They generate some LC-MS/MS data on HEK293 (immortalized human cell line) and search it against a human database in a normal way and then vs a ComPIL'ed database where human protein entries are a very small percentage of the database. Normally when you do something like this you get loads of false discoveries thanks to homology and just database scaling issues and your number of IDs drops through the floor. At the same FDR, they only lose 15% of their PSMs.
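If the "same FDR" comparison is fuzzy, here is a minimal Python sketch of how a PSM count at a fixed FDR is commonly estimated with a target-decoy search. I'm assuming target-decoy here; the post above doesn't say which FDR method the authors used, and the function name and toy scores are made up. Ties in score are ignored to keep it short.

```python
def target_psms_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at an estimated FDR of at most max_fdr.

    psms is a list of (score, is_decoy) tuples, higher score = better match.
    FDR is estimated as decoys / targets above each score cutoff, and the
    most permissive cutoff that still satisfies max_fdr is used.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


# Toy data: five targets and one decoy.
toy = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, False)]
print(target_psms_at_fdr(toy, max_fdr=0.25))  # 5
print(target_psms_at_fdr(toy, max_fdr=0.10))  # 3
```

When the database gets huge, more decoys (and wrong targets) score well by chance, so the cutoff has to move up and the accepted count drops. That drop is the thing that was only 15% here.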

It gets better -- HEK293 was immortalized with an adenovirus infection. Using this massive database -- they identify the virus incorporation sites!!  This thing is POWERFUL. They do some more validation with some bacterial proteomes (which is what it is intended for) and it appears to work better on those!

The data from the paper was deposited in PRIDE/ ProteomeXchange under PXD003896 and PXD003907.

Monday, August 21, 2017

Toward solving the stromal microenvironment puzzle!


(Image above showing the components of tumor stroma from this ResearchGate-hosted article)

A cancer cell alone in the body generally isn't a big deal. Maybe it starts dividing uncontrollably, but cancer cells can also starve. In fact, they generate more waste and use more energy than normal cells (textbook generalization) so, in a way, they're more vulnerable to needing support than normal cells. In order to form a tumor, that cell needs support from noncancerous cells to provide blood flow and structural support. We typically call all that other stuff stroma.

This new study in Science Signaling reveals a ton of new information on what is going on in the stroma by using patient-derived xenografts and reporter ion based quantitative proteomics!


Figuring out what is cancer cell and what is stroma can be tough -- however, in this system the cancer cells are human and the stroma cells are mouse! Much easier to sort out what is the stromal support response.

They quantify around 5,000 human proteins and close to 2,000 mouse proteins in these xenografts and -- all of a sudden, patterns start emerging in the mouse protein response! This provides a better picture than we had before of what changes are occurring in the "normal" cells to support tumor progression.

During one of my postdocs I did a lot of data processing and QC on transcriptomes of cancer cells through xenograft tumor progression. While this database is undoubtedly useful, there are considerable challenges with microarrays when trying to determine which readings are from human and which are from mouse. When you've only got 4 nucleotides, a lot that appears very conserved -- is more like...kinda conserved... 20 amino acids is a much more sensitive readout for determining protein-species assignment!
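Here is a minimal Python sketch of the species-assignment idea at the peptide level. This is not the pipeline from the paper. The two toy "proteomes" and the function names are made up, and I treat I and L as the same residue because a typical MS/MS experiment can't tell them apart.

```python
def _normalize(sequence):
    """Treat isoleucine and leucine as indistinguishable (same mass)."""
    return sequence.replace("I", "L")


def assign_species(peptide, human_proteins, mouse_proteins):
    """Label a peptide as human, mouse, shared, or unassigned."""
    pep = _normalize(peptide)
    in_human = any(pep in _normalize(seq) for seq in human_proteins)
    in_mouse = any(pep in _normalize(seq) for seq in mouse_proteins)
    if in_human and in_mouse:
        return "shared"
    if in_human:
        return "human"
    if in_mouse:
        return "mouse"
    return "unassigned"


# Hypothetical orthologs that differ by one residue (I vs V).
human = ["MKTAYIAKQR"]
mouse = ["MKTAYVAKQR"]
for peptide in ["TAYIAK", "TAYVAK", "MK", "TAYLAK"]:
    print(peptide, assign_species(peptide, human, mouse))
```

Only the human-only and mouse-only peptides are useful for deciding whether a protein signal is tumor or stroma. The shared ones have to be set aside or handled some other way.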

BTW, this new study used some great study design in advance of this project and the LC-MS workload was divided between an Orbitrap Elite and Q Exactive system.

Sunday, August 20, 2017

MSAcquisition Simulator!


This tool is seriously brilliant, and I don't know how I've never heard of it (glad @PastelBio is back from vacation and Tweeting amazing stuff for me to read!) The study is short and open access and available here.


Do you have a great idea for a new type of way for your mass spec to acquire proteomics data? Have you gotten far enough into it to investigate changing the underlying software on your instrument to implement it?


What if you could first simulate the method to see if it makes sense to buy a backup computer and get to writing first?

That's what this thing does! It creates a realistic (key word here!) simulated proteomics dataset that you can try your new method on. It takes input from your .FASTA file and other features of your organism and integrates those with parameters of your LC-MS system and tests what your expected output would be! Maybe there are unintentional consequences of your cool theoretical method that you hadn't considered. If so, this might save you an awful lot of precious time!
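To show the kind of acquisition logic you might want to try out on simulated data, here is a toy Python sketch of top-N precursor selection with dynamic exclusion. To be clear: this is NOT how MSAcquisition Simulator works internally and it uses none of its code. The function, the scan format (a dict of m/z to intensity per MS1 scan), and the numbers are all made up, and matching precursors by exact m/z instead of a tolerance window is a big simplification.

```python
def top_n_with_exclusion(ms1_scans, n=2, exclusion_scans=2):
    """Pick the n most intense precursors per MS1 scan.

    A picked m/z is skipped for the next (exclusion_scans - 1) scans;
    exclusion_scans=0 or 1 means no exclusion at all.
    """
    excluded_until = {}
    selections = []
    for index, scan in enumerate(ms1_scans):
        candidates = [mz for mz in scan if excluded_until.get(mz, -1) < index]
        picked = sorted(candidates, key=lambda mz: scan[mz], reverse=True)[:n]
        for mz in picked:
            excluded_until[mz] = index + exclusion_scans
        selections.append(picked)
    return selections


# Three identical toy MS1 scans with three precursors each.
scans = [{500.0: 100, 600.0: 50, 700.0: 10}] * 3
print(top_n_with_exclusion(scans, n=1, exclusion_scans=3))
print(top_n_with_exclusion(scans, n=1, exclusion_scans=0))
```

With exclusion on, the method works its way down to the low abundance precursor. With it off, it fragments the same big peak over and over. That is exactly the sort of consequence you'd rather find in a simulation than on the instrument.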

Saturday, August 19, 2017

2 interesting studies on journal quality!


Two really smart people sent me separate articles recently about something called "Predatory Journals." I'm going to reserve my opinions and just put the papers out there because I think they're both really interesting.


This is the second (most recent one above) that I was sent. Some people who read this silly blog may already find something funny about this post, but this gets better.


I strongly recommend this new paper by Lucas McGeorge and Annette Kin (Edit -- this has now been taken down). It deals with mitochondria -- and more specifically -- midichlorians, which are what allows organisms to interact with The Force -- yes...

...this Force...

This entirely fictitious study -- that includes Reddit's favorite scene from the prequels -- in its entirety(!!) was part of this --


--sting operation! The "study" was sent to 9 journals. Apparently 4 accepted it!  One of them offered the fictional Dr. Lucas McGeorge a position on their editorial board...having obviously never read a single page of this work. As of this second as I'm writing this...only one of these journal websites that accepted it still has this paper up.

Okay -- again, I'm just pointing y'all toward these articles. No opinions on my part. However, I've flipped through some articles on a couple sites that accepted this spoof article. There appears to be some legitimately good and interesting science here. I'd be so bummed to think -- "wow, the reviewers loved my article, it went right through!"  -- and find out this reputable sounding journal with this reputable looking website was on this list. And I think this is why the journals themselves are considered the predatory part....

Thursday, August 17, 2017

What's in my burger? -- proteomics edition!


In case you also weren't aware, there is a video game App called "What's in my burger?" I discovered it while looking for images for this post. The goal appears to be to lure unsuspecting animals to bounce into the "BurgerMaker 3000".  I saw a description that called it "funny and horrifying at the same time". There is no space on my old phone, so I have to take their word for it.

With that out of the way -- time for seriousness!

Food counterfeiting is a serious big deal. The FDA is right down the road from me and they deal with this stuff all the time. I just heard a fantastic story about a company they busted that was attempting to utilize left-over leather as a food additive....yeah....I hear this stuff and I'm saddened that global industrialization has gotten to a level that we need a bunch of highly trained scientists constantly testing our food supply to make sure no one fills it full of poison to save $1 on each 100 kg of food they import. I'm consistently glad that we have these scientists!!

It turns out proteomics can make a significant impact in this field! Check out this new paper. 

This team applied a standard quadrupole-Orbitrap proteomics approach (however -- using 75uL/min! not nanoflow! yay!) to a bunch of different meats mixed together in different combinations. You'll note the proteogenomics mention in the title. They whittled down their FASTA database to focus on animal muscle and to remove sequences with 100% homology.
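Here is a minimal Python sketch of one reading of that last step: collapsing FASTA entries whose sequences are 100% identical so each sequence shows up once. I'm assuming "100% homology" means identical sequences. The authors' actual filtering may differ (for example, they may drop every copy of a sequence shared between species), and the headers and sequences below are made up.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_identical(records):
    """Keep only the first entry seen for each distinct sequence."""
    first_header = {}
    for header, sequence in records:
        first_header.setdefault(sequence, header)
    return [(header, sequence) for sequence, header in first_header.items()]


# Hypothetical muscle protein entries; the beef and horse entries are identical.
fasta = """>beef_myoglobin
MGLSDGEWQL
VLNAWGK
>pork_myoglobin
MGLSDGEWQLVLNVWGK
>horse_myoglobin
MGLSDGEWQLVLNAWGK
"""
for header, sequence in drop_identical(parse_fasta(fasta)):
    print(header, sequence)
```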

With this technique, they show they can determine when 1% of the wrong animal tissue is present in the sample -- honestly, it looks to me like they could do better than this as well. After this success, the authors conclude that more targeted quantification methods should be developed here for routine monitoring, but I'm not sure I agree. The ongoing battle between the FDA and U.S. customs with food importers and manufacturers has been described to me as an "arms race." Businesses motivated by that $1/ 100kg savings employ their own scientists who have the job of coming up with any way they can to circumvent these testing rules.

In 2007 a manufacturer/importer found a way around some simple quality tests for protein content in rice protein for animal food (I'm pretty sure it was a colorimetric assay that reacted to N-terminal amines, but don't quote me) by spiking in melamine and this additive (or chemical reactions with this additive) killed thousands of dogs in the U.S.

As an incredibly biased person and admitted dog fanatic, I don't see how developing a targeted MRM assay for some peptides is much different than the colorimetric assays. I get offers in my inbox to synthesize any peptide I want at lower and lower rates every week. I think that if you give me a couple days and some motivation I could beat a peptide MRM based food quality assay. A global HRAM MS/MS based one? That's gonna be a whole lot harder!

I do like this study and if you put me in charge of determining food contamination/adulteration this is exactly the method I'd run with.


(This post reminded me there hasn't been a dog pic on here in a while. Bernie's great at cleaning jars!)

Wednesday, August 16, 2017

Dissolvable protein gels to extract intact proteins!


If this is true:
1) This could be amazingly useful!
2) Somebody please commercialize it fast so I don't have to ruin my day making my own (terrible) gels!


This new study in ACS can be found here.
You can't tell me you haven't run a gel at least once and wished you could just get that protein out of it intact, right?

The info for what is in the dissolvable gels is in Supplemental Table 1. This doesn't look all that weird. I guess the power comes from the slow dissolving of the gel in the special mixture they used. Hey -- works for me -- someone is going to have a cool problem soon that they ask me about and I'm going to have this paper in my utility belt (well...Kindle...) that will get them one step closer to a solution.

Shoutout to Dr. Murray for the link to this exciting method!

Tuesday, August 15, 2017

Unnecessary reminder that Charlottesville is awesome!


Like just about everyone in the world, I'm shocked and saddened by what put a rural town in Virginia in the spotlight this weekend. I'll be honest -- I'm also really really angry and I hope we can all focus these emotions here into motivation to get things back on track. 

When I think of Charlottesville, I don't want the first thing to come to my mind to be what happened this weekend. And I don't want it to be for you either -- so here are some unnecessary reminders that Charlottesville is and has been an awesome hub for science! 

I'm going to start this post like this -- some guy named Don Hunt is there!!! -- AND he's been there for kind of a while.

Some decent science has come out of the Hunt lab in Charlottesville, I'm having trouble coming up with anything off the top of my head, but I think there are a couple (come on, brain...)

-- Oh wait --- here's one



--- ETD!  Electron transfer dissociation -- a strategy essential to many proteomics experiments today for PTMs and intact proteins -- and heck, regular big ol' peptides -- came from that little town.

--hmmm...there was this other kind-of-important paper -- I can't remember what it was called -- wait! was it called PROTEIN SEQUENCING BY TANDEM MASS SPECTROMETRY!?!?


Might have been the one I'm thinking of....though, I swear I was thinking of something that everybody is trying to do right now that even with today's tools, software, and super computers is still really hard to do -- wait -- was it MHC peptidomics?!?


I can go on and on with this. Work from the Hunt lab in Charlottesville has been cited in over 47,000 other studies. (Google Scholar numbers this morning).

Of course this hasn't been Dr. Hunt toiling alone in a little closet lab in that scenic mountain town. In 2007 ACS reported over 130 grad students and postdocs had trained in the Hunt lab -- I've had the privilege to work with several and among them have been some of the very best mass spectrometrists I've ever met -- some of these students have gone on to do amazingly impactful research in labs of their own, names like (in no particular order, and just a few chosen off of the top of my head -- no disses intended! The Hunt lab either doesn't recruit or release dummies.) 

John Yates III
Josh Coon
Ben Garcia
Trixie Ueberheide
John Syka

...all passed through Charlottesville. No joke -- according to Scholar you're looking at something in the range of 150,000 to 200,000 citations from just the work of these authors alone!

I do have to mention that the chemistry department isn't the only place in Charlottesville with a mass spectrometer. The UVA School of Medicine has a top-notch core facility open to internal and external customers. Proof? De novo protein sequencing is just listed as a service. I've visited. They know what they're doing! 

It is also worth noting that some of John Fenn's earliest work on ion beams was done in collaboration with John Scott in Charlottesville (as noted in Dr. Fenn's Nobel Biography here).

Look -- this post is probably dumb -- but if this helps to remind anyone that Charlottesville, VA is a place that can be directly linked to some of the most impactful science (IMHO) of the last half century rather than just a place associated with the deplorable acts of last weekend, then I haven't wasted all my pre-work time this morning! 

Monday, August 14, 2017

What can we learn from intentional protein carbamylation?


Carbamylation is one of the primary reasons that I always use the free Protein Metrics Preview node on my RAW files before queuing up a big run. It is generally an unintentional artifact caused by your proteins spending too much time in urea -- or in urea that is too hot.  However --

--these authors carbamylated some stuff on purpose!

This team intentionally carbamylates proteins and looks at changes in the MS1 profile of several different proteins.

This causes massive shifts in the number of charges the proteins can accept! As you might expect from some of the names on the study, the next thing they do is look at how these proteins fragment. The carbamylated proteins provide much worse HCD fragmentation spectra and (surprisingly to me) big shifts in the elution times.

However, UVPD fragmentation of these proteins remains mostly unchanged -- carbamylated or not. While the authors may have used this study as a way to learn more about how UVPD fragments intact proteins and conclude it does not follow the mobile proton model, I'm wondering if this could have other applications.

When looking at intact proteins, there are some big advantages to having proteins accepting fewer charges. This makes the protein isotopic envelope easier to deconvolute and helps reveal co-eluting species (often proteoforms). Obviously, there would be some disadvantages to carbamylating all your proteins -- like now you have another PTM to worry about -- and fewer charges means your proteins are in a higher m/z range, but I can't help but think this might come in handy later!
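The trade-off is just arithmetic, so here is a minimal Python sketch. The 17,000 Da protein, the 10 carbamylation sites, and the charge states are numbers I made up, not values from the paper. The constants are the proton mass, the monoisotopic mass added per carbamyl group (about +43.0058 Da), and the approximate spacing between isotopologues.

```python
PROTON = 1.007276       # Da
CARBAMYL = 43.005814    # Da added per carbamylated amine (monoisotopic)
ISOTOPE_GAP = 1.00335   # Da, approximate spacing between isotopologues


def mz(neutral_mass, charge):
    """m/z of a protein carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def isotope_spacing(charge):
    """Spacing in m/z between neighboring isotope peaks at this charge."""
    return ISOTOPE_GAP / charge


# Hypothetical protein: 17,000 Da, 20+ unmodified vs 12+ with 10 carbamyl groups.
unmodified = 17000.0
carbamylated = unmodified + 10 * CARBAMYL
print(mz(unmodified, 20), isotope_spacing(20))
print(mz(carbamylated, 12), isotope_spacing(12))
```

Fewer charges spreads the isotope peaks further apart (good for deconvolution) and pushes the whole envelope up in m/z (possibly less good, depending on your instrument's range).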

Sunday, August 13, 2017

Measuring protein interaction energies in a single native mass spectrum!


I'm not even gonna pretend I could've come up with this idea.  Albert Heck, I'm not even gonna pretend that I have the qualifications to be critical of the work they did. Nope! I'm going to take what I understand of what I read here at face value and say that I think it is a really cool idea, though!

This is the paper. 


What did they do? They purified proteins that they are interested in that interact. They purified these from cell lines where they made mutations that changed single amino acids in the interacting sites of the proteins. Then they did native mass spec in a modified(?) Q Exactive Plus system. (It's a little unclear. It may have the vendor's native mass option, "Biopharma mode", but I didn't check the reference.)

If you have two proteins that interact and you can get a mass spectrum from the native configuration of those two proteins, but you control the mutations in the proteins -- you get a super cool readout! What is the strength of the interaction of those two proteins -- and, more importantly, how does it differ when you change this amino acid over here, vs this one over there? 

Right? See how cool that is? How easy would it be to extend this to your protein(s) of interest that you already had somebody construct all those mutants of?!? Sure, the math looks a little daunting at first, but they explain it in such clear detail (good writing!) that I feel like you just plug this all into Excel and only use that scary math thing at the top of this post in presentations to impress people in your department.
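For the "plug it into Excel" crowd, here is a minimal Python sketch of the textbook relationship between a change in binding constant and a change in binding free energy. This is NOT the derivation from the paper (I haven't reproduced their math here). It is just the standard thermodynamic identity, and the association constants in the example are made up.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g(k_association, temperature=298.15):
    """Standard binding free energy (kJ/mol) from an association constant."""
    return -R_KJ * temperature * math.log(k_association)


def delta_delta_g(k_wild_type, k_mutant, temperature=298.15):
    """Change in binding free energy for a mutant relative to wild type.

    Positive values mean the mutation weakened the interaction.
    """
    return delta_g(k_mutant, temperature) - delta_g(k_wild_type, temperature)


# Hypothetical mutant that binds 10x weaker than wild type.
print(round(delta_delta_g(1.0e7, 1.0e6), 2))  # 5.71 kJ/mol
```

A tenfold drop in affinity at room temperature costs about 5.7 kJ/mol, which is a handy number to keep in your head when you look at the differences between mutants.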

Saturday, August 12, 2017

Deep coverage of the beer proteome!


On the new list of surprisingly complicated matrices that also make me feel strangely thirsty on Saturday afternoons...


...the BEER PROTEOME sequenced to a depth of 1,900 proteins!

Why are there 1,900 proteins in my beer? I'm going to guess there aren't in mine, because I'm trying to get in shape for my local "basketball league for people way too old to be playing it" and I'm sticking to an American atrocity called an "ultra light" (don't try it!!).

This group pulled a real beer -- a traditional Czech one -- and did some interesting isoelectric focusing based fractionation (the opposite of the 1D-gel that I do) and used an LTQ Orbitrap Elite on the fractions (they employed "high/low"). (FASP was also utilized for digestion as shown in the image above).

I know what your first question is -- where did they get the beer from? From a grocery store! Apparently they live somewhere civilized where you aren't limited to one specialized government-operated location per township that is authorized to distribute these sinful liquids. These intrepid researchers can buy beer for some topnotch science experiments and also pick up some crisps to eat without wasting gas driving somewhere totally different!  I'll trade that for my local grocery store -- that also carries firearms (an increasingly common thing in the U.S.) -- in an instant!

If your second question is also -- how much was necessary for this proteome depth(?) I can't quite tell. They concentrated it to 3mg/mL prior to the desalting step mentioned above.

Ben's rambling aside -- This is a solid bit of science on a really interesting topic.

Friday, August 11, 2017

Proteomics and lipidomics combine to reveal shocking membrane complexity in Plasmodium!


These authors sure know how to make my day! An amazingly in-depth analysis of the lipid membrane microdomains of anything would probably get my attention anyway, but this great new paper is extra awesome:



1) They combine proteomics and lipidomics
2) They do the work on the sexual stage parasites of the universally despised Plasmodium genus (in this case, the mouse infecting model organism Plasmodium berghei)
3) They do top notch bioinformatics that not only leads to output showing enrichment of the proteins interacting in this awful-to-work-with mostly detergent-resistant mess of lipids, but also protein-lipid associations (what?!?)
4) They show how incredibly important designing an upstream experiment is in biology by pulling off all of the proteomics with a linear ion trap and some really clever analysis and the lipidomics via TLC and GC.

How'd they pull this off? With an amazingly painful amount of upstream sample preparation. What? Another list? Sorry, I have to, to capture how glad I am that they did this and I didn't have to.

1) Synchronized the parasite (difficult)
2) Infected mouse.
3) Removed parasites when they reached their sexual stage (difficult to determine AND remove)
4) Separated infected from uninfected cells on Nycodenz gradient (blech. blech. blech.)
5) And then it gets fun. Lysing the infected red blood cells, determining the protein content at every stage with anti-parasite antibodies and more detergents and gradient ultracentrifugation and some homogenization.
6) I have to end the list. The amount of work here is truly just awe-inspiring. It just keeps going....

The amount of care that these authors spend on ensuring that they aren't going to the next awful painful step unnecessarily by carefully determining what they currently have is just amazing. I'm so glad there are people in this world who go to these lengths to help understand malaria so we can kill it.

Once they obtain their lipid rafts, they break them down with chloroform and methanol to get the peptides and take the lipids through a staggering amount of work to HPTLC and GC readouts.

And then?? then they make sense of all these measurements and more that I didn't mention!! The network analysis of the proteins is linked with the major lipid class identifications and paints a surprisingly complex picture of the parasite's membrane proteome -- and how it interacts with an also surprisingly heterogeneous lipid profile in the strains they work with.

This was a monumental amount of work, and I'm beyond impressed.

All data is uploaded at MassIVE and will be available at full paper release (the authors include access info now if you really want it).

Thursday, August 10, 2017

Great review on fecal metaproteomics!


There was a great paper in MCP recently that looked at the metaproteomics of baby poo. Turns out that wasn't the only group working on this easy to obtain, but tough to process, sample type!

This is a surprisingly comprehensive review on the topic -- in fact, this is one of two published this month, in what appears to be a rapidly growing area of our field. I'm challenged sometimes by finding clear peptide level differences in samples from a single -- fully sequenced -- species. In high complexity ecosystems? Ouch.

If this area really is growing as fast as the recent literature suggests we all might be called upon to do this sooner or later and this review is a great place to start on the topic!

Not-at-all related to this paper:


Encode a computer virus -- in DNA!!


Long day running between meetings -- and this isn't proteomics -- but I have to post this.

These guys encoded a strand of DNA that was actually a computer virus -- and could potentially hack the DNA sequencer....

Sounds dumb till you consider the fact that some sport agencies have been considering genetic testing for alterations like the "mighty mouse gene" for a while. Tough to do that when the sequencers all crash!

Sure -- this is Sci-Fi stuff now -- but I really liked the article.

Wednesday, August 9, 2017

New NIST paper answers so many questions I've had about intact protein MS!


I have been called upon from time to time to do some intact proteins. Honestly, I believe I achieved the point where I finally failed enough at it (ask my postdoc advisor) that I'm reasonably good at it. However, I've ALWAYS had some questions about what is going on with the proteins as I kind of haphazardly change settings till I get a good signal. Especially when I'm tinkering with source parameters trying to, for example, get a low abundance mAB PTM to resolve a little better!

This brand new paper at NIST answers so many of the things I've wondered about in my head that I honestly can't come up with anything except -- how don't I know these authors?!? We're a pretty small community here in Maryland...but I'll sort this out later.



In this work they take several intact proteins of varying sizes -- all interesting (BONUS!), starting with CRP and up to the NIST mAb. They utilize three instruments, an Orbitrap Elite and 2 Q-TOFs. Systematically they go through a bunch of different parameters, including:

Source temperatures
Solvation energies (in source CIDs)
AGC targets
Resolution
They even go through the (always fun for those of us with multiple knee surgeries) manual optimization of the HCD gas pressures with the little knob hidden deep inside the later LTQ Orbitrap systems.
Etc.

What they come up with is the best study I've ever seen on optimizing an LTQ Orbitrap system for obtaining reproducible intact protein measurements (particularly focused on the low abundance PTMs). Of course, these observations can be directly converted to the other systems -- since a lot of what they look at will be the same for the quadrupole-Orbitraps as well.

I've already sent links to this paper to a number of different people I know who will find this very useful.

Interesting note -- they calculate the intact protein theoretical masses using this cool little program from NIST that I'm embarrassed I didn't know (or forgot) about!
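I haven't reproduced the NIST program here, but the basic calculation is simple enough to sketch in Python. This is my own minimal version, not their tool: it sums standard average residue masses for a single unmodified chain, adds one water for the termini, and optionally subtracts two hydrogens per disulfide bond. No glycans, no other PTMs, no multi-chain handling, so it won't get you a real mAb mass on its own.

```python
# Standard average residue masses in Da (amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528    # Da, average
HYDROGEN = 1.00794  # Da, average


def average_mass(sequence, disulfide_bonds=0):
    """Average mass of one unmodified protein chain.

    Each disulfide bond removes two hydrogens from the total.
    """
    residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER - 2 * HYDROGEN * disulfide_bonds


# Sanity check on something tiny: the dipeptide Gly-Ala.
print(round(average_mass("GA"), 3))  # 146.146
```

For intact protein work you usually want average masses like these rather than monoisotopic ones, since the monoisotopic peak of a big protein is too small to see.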