In the abstract sense, experimental systems contain two polar nodes: variable inputs and result outputs.

In the simplest experiment, all conditions are held constant and only a single variable is changed (e.g. +/- Stimulant X). This ensures that any difference in the result output can be attributed solely to the variable input.

But scientists are curious and impatient folk. They’re often interested in lots of things and have limited time to produce an answer. Consequently, they test multiple variable inputs concurrently (e.g. +/- Stimulant X, +/- Stimulant Y, +/- Stimulant Z). This is known as a multi-variate experiment. Provided you can analyse the data, multi-variate experiments make sense: testing multiple hypotheses concurrently is quicker than testing them one by one.
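To see how quickly multi-variate designs grow, here is a minimal Python sketch. It assumes each stimulant is simply present or absent and that every combination is tested (a full factorial design, which is one common layout among several). Three binary stimulants then give 2³ = 8 conditions.

```python
from itertools import product

# The three stimulants from the example above; each is either absent (False) or present (True).
STIMULANTS = ["Stimulant X", "Stimulant Y", "Stimulant Z"]


def full_factorial(factors):
    """Return every +/- combination of the given factors as a list of dicts."""
    conditions = []
    for states in product([False, True], repeat=len(factors)):
        conditions.append(dict(zip(factors, states)))
    return conditions


def label(condition):
    """Render a condition as e.g. '+X -Y +Z', using the last word of each factor name."""
    return " ".join(("+" if present else "-") + name.split()[-1]
                    for name, present in condition.items())


if __name__ == "__main__":
    conditions = full_factorial(STIMULANTS)
    for c in conditions:
        print(label(c))
    print(len(conditions), "conditions")  # 2 ** 3
```

Each extra binary stimulant doubles the number of conditions, which is why the number of available inputs quickly becomes the bottleneck.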

What about the result outputs? The simplest experiment produces a single result output (e.g. +/- Signal). But again, inquisitive scientists want to know more, so they try to measure multiple result outputs concurrently (e.g. +/- Signal 1, +/- Signal 2, +/- Signal 3). I can’t find a common term for this, so I’ll call it a multi-result experiment. Like their multi-variate cousins, multi-result experiments make sense: by collecting more results at once, you gain insight into your hypotheses faster than if you measured each result one by one.

Thus, to get the theoretical maximum out of any experiment, we want to test lots of hypotheses (multi-variate) and collect as much data as possible (multi-result). Unfortunately, in many biological fields this big-input, big-output combination is rarely possible. This is especially true in my own field: proteomics.

For example, consider a classic protein quantification method: the ELISA. A 96-well ELISA microtiter plate can host multiple variable inputs with multiple replicates per variable, so you can test a lot of samples. However, an ELISA typically measures only one result output (e.g. a single protein antigen). ELISAs have a large number of variable inputs and a small number of result outputs. They are “top-heavy” experiments.

Conversely, consider a SILAC mass-spectrometry experiment. A "light" and "heavy" SILAC pair can host only one variable comparison (with no replicates). However, LC-MS/MS analysis produces thousands of relative protein-level result outputs. SILAC proteomic experiments have a small number of variable inputs and a huge number of result outputs. They are “bottom-heavy” experiments.

Ideally, to perform multi-variate and multi-result proteomic experiments, we need a way to merge these respective “top-heavy” and “bottom-heavy” approaches. We need “hourglass” experiments.

So how do we develop "hourglass" proteomic experiments?

Collecting thousands of protein measurements via LC-MS/MS is extremely powerful. It makes sense to continue using LC-MS/MS for large result outputs. What we really need to do is increase the number of distinct variable inputs for LC-MS/MS analysis.

So what's stopping us?

Most quantitative proteomic experiments use labels to distinguish between variable inputs. SILAC (or CTAP) and isobaric peptide labelling (e.g. iTRAQ or TMT) support 2-3 and 6-10 variable inputs respectively. Because each variable is individually labelled before the samples are mixed, all samples in the mix experience the same downstream processing, so variable-specific technical variation does not bias the result output. Labelling controls for input-to-input bias, and this is extremely powerful. Unfortunately, because only labelled variables can be compared, the finite number of labels limits the number of variable inputs. Labels strangle the number of inputs possible in bottom-heavy LC-MS/MS experiments.
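To make the label ceiling concrete: only samples inside the same labelled mix share downstream processing, so an experiment with more inputs than channels has to be split across several mixes, and comparisons between mixes lose the protection the labels provide. A small Python sketch of that arithmetic, assuming a 3-plex SILAC and a 10-plex TMT design as representative upper ends of the ranges above:

```python
import math

# Channels per labelled mix: illustrative upper ends of the ranges quoted above
# (an assumption; real kits come in several plex sizes).
PLEX = {"SILAC": 3, "TMT": 10}


def mixes_needed(n_inputs: int, plex: int) -> int:
    """Return how many separate labelled mixes are needed to cover n_inputs samples.

    Only samples inside the same mix share downstream processing, so comparisons
    across mixes lose the protection that labelling provides.
    """
    if n_inputs < 1 or plex < 1:
        raise ValueError("n_inputs and plex must be positive integers")
    return math.ceil(n_inputs / plex)


if __name__ == "__main__":
    n = 54  # the size of the KRAS experiment described later in this post
    for method, plex in PLEX.items():
        print(f"{method}: {mixes_needed(n, plex)} mixes for {n} inputs")
```

With 54 inputs this works out to 18 SILAC mixes or 6 TMT mixes, whereas a label-free design keeps all 54 inputs as one directly comparable set, provided technical variation is kept under control.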

One approach to increasing the number of input variables is to not use labels at all: to be label-free. Being liberated from labels means you can have as many input variables as you like! The huge problem with label-free quantification is that it is incredibly vulnerable to technical variation. Because each sample is processed separately, unless every sample-prep step is uniformly consistent and reproducible, small technical errors can lead to large result output biases. This is a big problem for multi-step sample preparation, such as that required for phosphoproteomics.
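A toy model shows why separate processing hurts. Suppose (purely as an illustrative assumption) that measured phosphopeptide signal equals true abundance multiplied by the enrichment efficiency of the well it was processed in. When labelled samples are mixed before enrichment they share one efficiency, which cancels in the ratio. Label-free samples are enriched separately, so any efficiency difference passes straight into the fold change.

```python
def observed_ratio(true_ratio: float, eff_a: float, eff_b: float) -> float:
    """Fold change (A / B) seen after enrichment under a toy model.

    Toy model (an illustrative assumption): measured signal = true abundance
    multiplied by the enrichment efficiency of the well it was processed in.
    """
    if not (0 < eff_a <= 1 and 0 < eff_b <= 1):
        raise ValueError("efficiencies must be in (0, 1]")
    return true_ratio * eff_a / eff_b


if __name__ == "__main__":
    true_ratio = 2.0  # the real biological change we hope to measure

    # Labelled: samples are mixed before enrichment, so they share one efficiency.
    labelled = observed_ratio(true_ratio, 0.6, 0.6)

    # Label-free: samples are enriched separately and efficiencies can differ.
    label_free = observed_ratio(true_ratio, 0.5, 0.8)

    print(f"labelled:   {labelled:.2f}")
    print(f"label-free: {label_free:.2f}")
```

In this toy case a true 2-fold change is reported as 1.25-fold purely because one well enriched less efficiently. Making enrichment efficiency uniform across wells is the kind of problem an automated, well-to-well consistent enrichment aims to address.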

To address this issue, together with my colleagues at the ICR (and CRUK Manchester), I've been working on a method to uniformly enrich phosphorylated peptides for label-free phosphoproteomics. In short, we adapted typical phosphopeptide enrichment protocols to work with a 96-well particle-handling robot. We call the method Automated Phosphopeptide Enrichment (APE).

APE brings two salient advantages to phosphoproteomics:

1) A robot provides uniformity: consistent, automated enrichment time after time. Every well is the same, and every plate is the same. Result outputs are reproducible. This means we can worry far less about technical variation when using label-free quantification.

2) A 96-well plate provides lots of samples. Variable inputs are increased. It means replicates. And replicates mean statistics.
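Why replicates mean statistics: with a single measurement per condition there is no estimate of spread, so no significance test is possible. With three replicates per condition, each phosphosite can be tested. A minimal sketch using Welch's t-test (one reasonable choice; the post does not say which test was used), on hypothetical log2 intensities for a single phosphosite:

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Needs at least two replicates per group: a single measurement has no
    variance, so no test is possible.
    """
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("need at least two replicates per group")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    se_a = var_a / len(group_a)
    se_b = var_b / len(group_b)
    if se_a + se_b == 0:
        raise ValueError("replicates show no variation; t is undefined")
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical log2 intensities for one phosphosite, three replicates per condition.
    with_stimulus = [10.0, 10.2, 9.8]
    without_stimulus = [9.0, 9.1, 8.9]
    t, df = welch_t(with_stimulus, without_stimulus)
    print(f"t = {t:.2f}, df = {df:.2f}")
```

Turning t into a p-value needs the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`), and with thousands of phosphosites a multiple-testing correction would also be needed.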

When you combine reproducible result outputs with increased variable inputs, you can start doing hourglass phosphoproteomic experiments. To demonstrate this, we tested the phosphoproteomic consequences of oncogenic KRAS (KRAS-G12D) in pancreatic ductal adenocarcinoma (PDA) cells. Using the multi-variate input provided by APE, we tested three cell lines with and without KRAS-G12D, using three biological replicates and three technical replicates, in one experiment (3 × 2 × 3 × 3 = 54 variable inputs). Using the multi-result capacity of LC-MS/MS, we quantified 5,481 phosphopeptides. That's a big-input, big-output, hourglass phosphoproteomic experiment. Crucially, this allowed us to identify a core panel of phosphosites that are significantly regulated by KRAS-G12D across all the PDA cell lines. No result rests on a single anecdotal measurement. Some phosphosites we'd seen before, but many were totally new.
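The bookkeeping behind a design like this, and the idea of a core panel, can be sketched in Python. This sketch assumes each cell line was profiled with and without KRAS-G12D (the factor of two needed to reach 54 from three cell lines × three biological × three technical replicates), uses hypothetical placeholder names for cell lines and phosphosites, and treats the core panel as the intersection of per-cell-line hit lists. The analysis in the published paper may differ.

```python
from itertools import product

# Assumed design (see lead-in): 3 cell lines x KRAS-G12D off/on x 3 biological x 3 technical replicates.
CELL_LINES = ["cell_line_1", "cell_line_2", "cell_line_3"]  # hypothetical placeholder names
KRAS_STATES = ["KRAS-G12D off", "KRAS-G12D on"]
BIO_REPS = [1, 2, 3]
TECH_REPS = [1, 2, 3]


def plate_layout():
    """Map each sample to a 96-well position (A1..H12), filling row by row."""
    wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
    samples = list(product(CELL_LINES, KRAS_STATES, BIO_REPS, TECH_REPS))
    if len(samples) > len(wells):
        raise ValueError("design does not fit on one 96-well plate")
    return dict(zip(wells, samples))


def core_panel(regulated_by_line):
    """Return phosphosites called regulated in every cell line (set intersection)."""
    hit_sets = [set(hits) for hits in regulated_by_line.values()]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)


if __name__ == "__main__":
    layout = plate_layout()
    print(len(layout), "samples; well A1 holds", layout["A1"])
    # Hypothetical hit lists, for illustration only.
    hits = {
        "cell_line_1": {"site_a", "site_b", "site_c"},
        "cell_line_2": {"site_a", "site_c"},
        "cell_line_3": {"site_a", "site_c", "site_d"},
    }
    print(sorted(core_panel(hits)))
```

Filling the plate row by row is only the simplest choice; a real layout might randomise well positions so that plate position is not confounded with condition.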

This little project has just been published over at Analytical Chemistry. If you're interested in multi-variate phosphoproteomics, take a look.