How to Navigate a World of Exploding Metrics and Estimates with Dr. Stephen Seiler and Marco Altini

The number of, well, numbers we track during training is exploding, but they’re not all made equal. Some represent actual measurements while others are just estimates. We discuss the implications.

FTL EP 3082

We can debate the value of the various technological changes over the last half century, but hands down the biggest evolution in training is the shift from training purely by feel, as athletes did in the 1960s and ’70s, to tracking numerous data points not only during every moment of your workout, but during every moment of the rest of your day. Believe it or not, there was a time not all that long ago when a device that showed only your speed was considered revolutionary.

RELATED: Fast Talk Episode 19—Training as a Numbers Game 

What’s important to understand is that these various numbers, or training metrics, aren’t all made equal. “Metrics” and “measures” don’t mean the same thing. Some of these data points are actually measured, such as heart rate and cadence. But many metrics—including some of the most exciting new numbers—are simply estimates (such as sleep scores or training stress). These estimates are calculations often based on assumptions or not fully validated correlations, which raises serious questions about their validity. For example, how accurately can we truly analyze something like sleep based on changes in HRV? Especially when you question how accurately HRV is being measured using light sensors on your wrist.  

If you’re wondering how complex this issue of measures versus estimates can become, remember that even power is not a measure. Power is actually a calculation based on cadence and torque that requires some estimations. Here to help us navigate this tricky yet fascinating subject are none other than physiologist Dr. Stephen Seiler and founder of HRV4Training, Dr. Marco Altini, who is a key science advisor for Oura rings.
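To make the point that power is calculated rather than measured concrete, here is a minimal sketch of the underlying physics: mechanical power is torque multiplied by angular velocity, and angular velocity follows from cadence. The function name and the example values are illustrative assumptions, not the method of any particular power meter, which also has to estimate torque from strain gauges and handle calibration.

```python
import math


def power_watts(torque_nm: float, cadence_rpm: float) -> float:
    """Mechanical power (W) from crank torque (N*m) and cadence (rev/min).

    Illustrative only: real power meters estimate torque from strain
    gauges and average it over the pedal stroke before this step.
    """
    # Convert revolutions per minute to radians per second.
    angular_velocity_rad_s = 2 * math.pi * cadence_rpm / 60
    return torque_nm * angular_velocity_rad_s


# Hypothetical example values: 30 N*m of average torque at 90 rpm.
print(round(power_watts(30.0, 90.0), 1))
```

Any error in the torque estimate or in the cadence reading passes straight through to the power number, which is why even this familiar metric sits closer to the estimate end of the spectrum than it first appears.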

They’ll explain the differences between measures and estimates and how both can have their issues. They’ll discuss how most metrics try to provide a better understanding of load, stress, and strain. Then they’ll dig a little deeper into what Dr. Altini calls known and unknown metrics—those that we can and can’t validate. They’ll then shift gears to the psychological impacts of these various metrics—particularly sleep rating—on athletes and their performance. Finally, they’ll round out the conversation with ways to differentiate not only good measures from bad measures, but more importantly, good estimates from bad estimates. Some estimates can be very valuable, but be careful of who and what you let guide your decision for what metrics to include—flashy marketing and false promises or metrics that have been validated? 

RELATED: Fast Talk Episode 199—How to Use Data to Make Better Training Decisions, with Tim Cusick 

Joining the conversation, we’ll hear from elite cyclist and author of How to Become a Pro Cyclist Jack Burke, physiologist and writer Brady Holmer, TriDoc podcast host Dr. Jeff Sankoff, and elite cyclist and coach Taylor Warren.

So, think critically about how you want to measure your speed, and let’s make you fast! 

Episode Transcript

Trevor Connor  00:04

Hello and welcome to Fast Talk, your source for the science of endurance performance. I’m your host Trevor Connor, here with Dr. Stephen Seiler doing co-hosting duties. We can debate the value of various technological changes over the last half century, but hands down the biggest evolution in training is the shift from the train-purely-by-feel approach of the ’60s and ’70s to having numbers to track not only every moment of your workout, but basically every moment of your day. Believe it or not, there was a time not all that long ago when a device that just showed your speed was revolutionary. What’s important to understand is that these numbers, or training metrics, aren’t all made equal. Metrics and measures don’t mean the same thing. Some things are actually measured, such as heart rate and cadence. But many metrics, including some of the most exciting new numbers, are actually estimates, such as sleep scores or training stress.

Trevor Connor  00:54

These estimates are calculations often based on assumptions or not fully validated correlations, which raises serious questions about their validity. For example, how accurately can we truly analyze something like sleep based on changes in HRV, especially when you question how accurately HRV is being measured using light sensors on your wrist? If you’re wondering how complex this issue of measures versus estimates can get, remember, even power is not a measure. It is actually a calculation based on cadence and torque that does require some estimation. Here to help us navigate this tricky but fascinating subject are none other than physiologist Dr. Stephen Seiler and founder of HRV4Training, Dr. Marco Altini, who’s a key science advisor for Oura rings. Together, we’ll explain the differences between measures and estimates and how both can have their issues.

Trevor Connor  01:41

We’ll discuss how most metrics try to get at the concepts of load, stress, and strain. Then we’ll dig a little deeper into what Dr. Altini calls known and unknown metrics, those that we can and can’t validate. We’ll then shift gears to talk about the psychological impacts of these various metrics, particularly sleep rating, on athletes and their performance. Finally, we’ll round out the conversation with ways to differentiate not only good measures from bad measures, but more importantly, good estimates from bad estimates. Some estimates are actually pretty valuable. But be careful when marketing teams start getting involved in deciding what metrics to include. During the conversation, we’ll hear from elite cyclist and author of How to Become a Pro Cyclist Jack Burke, physiologist and writer Brady Holmer, TriDoc podcast host Dr. Jeff Sankoff, and elite cyclist and coach Taylor Warren. So think critically about how you want to measure your speed, and let’s make you fast.

Chris Case  02:35

Today’s episode of Fast Talk is brought to you by Alter Exploration. Created by me, Fast Talk Labs co-founder Chris Case, Alter Exploration crafts challenging, transformative cycling journeys in some of the world’s most stunning destinations. A mantra is a powerful tool used to focus your mind on a particular goal and create calm during challenging situations. Our mantra: transformation begins where comfort ends. This mantra isn’t meant to be intimidating. On the contrary, it should be invigorating. For many people, everyday life is filled with convenience, monotony, and a lack of time spent in nature. Alter Exploration facilitates the exact opposite: challenging, invigorating, life-altering experiences in the natural world. Alter’s journeys aren’t so much a vacation as an exploration of you and the destination. At the end of every day, be preoccupied as much by the transformative experience as by the satisfaction of exhaustion. Life, altered. Learn more about my favorite adventure destinations and start dreaming at Alter

Trevor Connor  03:41

Welcome, Marco. Welcome, Dr. Seiler, to the episode. I’m kind of excited about this because, Dr. Seiler, this was an episode that you brought up to us that you really wanted to do. I know this is something you’re a little bit passionate about. So I kind of see this as you’re the host of this episode, and I’m the co-host. How do you feel about that?

Stephen Seiler  04:00

Hey, well, we’ll see how it goes. But you’re right. The background for this is two areas. One, I came back into this research, you know, full speed and had to catch up a bit a couple of years ago, a few years ago. And I was just seeing all these metrics, all these amazing numbers for different things like training load, and I was like, wow, where’d that come from? Has that been validated? So there are a lot of metrics, you know, because of digital tools. We live in this amazing time when we’re able to measure things we’d never have been able to measure before. But we also are trying to measure things that we’re not actually able to measure. And so Marco Altini, who is with us today and has the wonderful HRV4Training application, tweeted not so long ago a very basic thing. He said, look, a lot of these watches, they measure one thing but they estimate 10 more, or estimate many things, you know. So he was basically saying we measure things and sometimes we estimate. And I think a lot of our listeners, a lot of cyclists, a lot of runners, may be easily fooled into believing that the estimates they’re getting on their watches are more precise than they really are. So that’s kind of the starting point for this: what are we measuring? What’s useful? A little bit about some of the catchwords like validity and reliability, and then we’ll take it from there.

Trevor Connor  05:24

Before we dive into some of this complexity, let’s go to some real basics here and just ask a simple question: why do we monitor our training?

Stephen Seiler  05:33

I conceptually think of it this way. Well, the first thing I think about when I think of measurement is, if I, as a coach, tell my daughter, I want you to do four times eight minutes today, or I want you to keep it easy today, I want to see whether or not her execution, or the athlete’s execution, matches the prescription. And that is simple, but it’s surprisingly important. That’s point one. And then that creates a starting point for individualization, so that I, as a coach, or, you know, coaching myself, can make adjustments. I can individualize things, because I start to see how my body responds to certain prescriptions and execution of training. So that’s point two. And then third, building on that again, I can detect deviations. You know, if my heart rate is running low or running high after workouts or a period of high load, that tells me something, and hopefully I can make some adjustments. Very often the adjustments involve rest or reduced loads, because often the telltale signs are associated with having pushed too hard or too long, and so forth, or not having had any rest. So that’s the third point: just being able to make adjustments early enough so that it doesn’t become a big problem. And then finally, I think it’s more about institutional knowledge. Whether your institution is a huge team of pro cyclists or it’s just me as a coach of my daughter, I’m trying to build a library of understanding that helps me to coach both those specific athletes and also future athletes.

Trevor Connor  07:11

I found that really interesting, because we had Dr. Coggan and Hunter Allen on the show, and we were talking about training zones with them. And that was something they brought up, which was a big misinterpretation of training zones. They said training zones were never really meant for the analysis, to say you had this exact physiological effect because you were in this zone. They said they were designed to be prescriptive. They were a communication tool, a way to make it easier for a coach to say to an athlete, here’s how to do this particular workout. It was never meant to be, if you’re at 98% of threshold, you’re getting this training adaptation, but if you’re at 102% of threshold, you’re getting a different training adaptation. I heard you saying a lot of that: it helps you coach and guide your daughter, and it gives you that tool for communication.

Stephen Seiler  07:59

Well, yeah, not just me. I think that’s one of the biggest success stories, for example, in Norwegian endurance sport, which has kind of punched above its weight class in endurance for some decades: just having the same intensity scale that everybody kind of understands the same way. You know, when they say zone four, we all know what that means. And there’s some sport specificity to it, but it’s universal enough that we have a good starting point for interaction, for communication. The other thing is, of course, I agree. And I think, number one, from a stimulus-for-adaptation standpoint, there is so much overlap among these various training zones in terms of the generation of a stimulus for altered protein synthesis, you know, getting down into the rabbit hole of what’s happening to the cells. The muscle cells are not clearly distinguishing zone two from three from four. There’s overlap. But maybe what we’re really using the intensity zones for is to manage stress responses, to manage how the body is recovering, making sure that that flow is sustainable. So that’s been my kind of 25-year take-home on all this: no, I don’t think I’m controlling the stimulus precisely, but I’m doing a better job of controlling the ebb and flow of stress and trying to keep on the plus side for the athlete over time.

Trevor Connor  09:30

Marco, what’s your thoughts on this?

Marco Altini  09:31

Yeah, maybe I would add that, outside of what Stephen was mentioning and monitoring training itself, I think the other aspect that we consider linked to monitoring training is monitoring not the session, but what happens to the body after the session, for example. And I think this links back to a lot of the aspects we want to discuss in terms of what we are measuring and what we are estimating, for example, on the side of the body’s response to the session, right? So we could look at data collected during the session and maybe see how far that is from what we had prescribed and how the athlete responded. But we can also look at the data collected after the session through different technologies, apps, and wearables that athletes are now using a lot, devices that you can use first thing in the morning or in the night to measure our physiology, and try to see if these measurements accurately reflect the body’s response to the stimulus. If we have a response that is not what we expect, that is also an indication that maybe the stimulus was not appropriate for an athlete at a given point in time, or that maybe there were other stressors also playing a role, right? Because when we isolate training and look only at training data, sometimes we forget that a lot is happening. Maybe we’re traveling, maybe there’s some work-related stress or some other things that will also have, in a way, an impact on our ability to assimilate the training stressor and respond positively to it. So as we start looking at monitoring the response through different technologies and devices, then I think it can get a bit confusing, as many of these devices come up with numbers and scores and estimates that are not necessarily even things that actually exist, that you can measure with other devices, right? So some are sort of made up, you know, readiness, recovery scores, things like that.
I think it’s important, maybe later, that we frame these as things that are in a bit of a different category with respect to actually looking at the physiology, which could be just your resting heart rate, or your heart rate variability, or things that you’re actually measuring, as opposed to things that you are estimating, and that you cannot even, in some cases, possibly validate, because they are not quantities that have a reference device.

Stephen Seiler  11:58

It’s useful to have a couple of frameworks here that kind of give us some pegs to hang things on. One framework would be, for me, this idea that in this monitoring process we kind of have this triangle, or triangulation, where we’ve got some things we can measure that tell us just what was done, the external load. For the runners, that is just distance and time, you know, how far did you run and how long did it take, so you can get an average velocity, you can get a number of kilometers, you can get pace and those kinds of characteristics. The cyclists can get power times duration. And those are basic things that are measurable. There is essentially a gold standard for power. There are even ISO industrial standards for instrumentation for some of these things that ensure at least some degree of precision in those measurements. You cannot sell certain products unless they are able to measure correctly. The treadmill has to provide you with a reasonably accurate measure of treadmill speed, or you can’t sell it. So we have some protection on those kinds of measures. Then we go over to some physiology, and we’ve got our old standby of heart rate, which I think, you know, we can say fairly confidently that yes, we can measure. But there are caveats in terms of the ECG standard, meaning the electrical, the belt, versus the photoplethysmography on the wrist. And there are methodological issues and user issues, where the user just uses it wrong. So there are problems there, but we know what the problems are, and we can fix them, for the most part. So heart rate can be a very useful tool, and it’s valid and it’s reliable if we do it right. We’ve got lactate. Lactate’s tricky to measure. You’ve got to do it right, you have to have skills, especially in the field, you know, so people make a lot of mistakes. But it’s a technology that works. It just depends on user skills.
We’re getting online with some other things, like breathing with these shirts. We can measure perceived exertion. We can ask people how they feel. And that puts us over in that other category, which is the perceptual. So external load, internal physiology, and internal perception: those are kind of the three main categories of data, at least in the training process itself. You know, we can measure what we actually did, and we can measure how we responded to it, there and then. And then we get to where Marco’s taken us, which brings up another framework, one that engineers use, and that’s load, stress, strain. Load is just what you do. Stress is how you respond to it, there and then. And then strain, at least in a biological context, I would argue, is the lingering effects that don’t go away after 24 hours, where there is somewhat of a lingering negative consequence or fatigue, or a change in heart rate responses, a change in heart rate variability, a change in readiness to train, or some other perceptual measure of how I feel. So that takes us to that post-training period, where something is changing and I’ve got to think about it: do I need rest, do I need to ease up a bit? You know, so that’s kind of a framework that I work with, load, stress, strain. I stole it from engineers, but I find it kind of useful in this forest of data.

Trevor Connor  15:39

And I love the analogy you use for it, because it makes it really clear. You have this plank of wood that’s sitting across two beams, and you put a large weight or a brick or something on that piece of wood. So that brick is the load. The wood’s response, trying to hold up that brick, is the stress. And over time, you might see a bending in the board or the piece of wood, and that’s your strain.

Stephen Seiler  16:06

And then you take away the load, and of course the strain disappears. But it may not disappear immediately. You know, that wood may be bent, and it slowly returns to its starting point. So there are some analogies that are kind of useful. They’re not perfect. There’s no model that’s perfect. But I find the load, stress, strain framework from engineers to be somewhat useful to me in trying to categorize some of these variables that we’re measuring and say, where are we in this process? And I think some of these measures that we use are maybe misplaced. I’ve often harped on Training Stress Score from TrainingPeaks. And I love the guys at TrainingPeaks, but I don’t think that’s appropriately named, because it can’t really measure stress. It just measures what you’ve done in a kind of calibrated way. So I would say that should just be a load score, you know, and then we’d be interested in saying, all right, what’s the stress response to those loads, right?
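To illustrate why such a score describes load rather than stress, here is a minimal sketch of the commonly published Training Stress Score formula. It assumes a normalized power value and a functional threshold power (FTP) are already known. The function and parameter names are illustrative, and this is not presented as TrainingPeaks’ actual implementation.

```python
def training_stress_score(duration_s: float,
                          normalized_power_w: float,
                          ftp_w: float) -> float:
    """Commonly published TSS formula; names here are illustrative.

    One hour ridden exactly at FTP scores 100 by construction.
    """
    if ftp_w <= 0:
        raise ValueError("ftp_w must be positive")
    # Intensity factor: how hard the ride was relative to threshold.
    intensity_factor = normalized_power_w / ftp_w
    return (duration_s * normalized_power_w * intensity_factor) / (ftp_w * 3600) * 100


# Hypothetical ride: 90 minutes at a normalized power of 200 W, FTP of 250 W.
print(round(training_stress_score(90 * 60, 200.0, 250.0), 1))
```

Every input is a description of the work done on the bike relative to a threshold setting. Nothing in the calculation observes how the body responded, which is the distinction being drawn between a load score and a stress measure.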

Trevor Connor  17:07

Power is a very direct and clear measure of load: 300 watts is 300 watts. But what it doesn’t necessarily capture in the numbers are the subtle differences in the stress experienced for what appears, by the numbers, to be the same load. Here’s Jack Burke to explain.

Jack Burke  17:24

So this is something that took me so long to figure out in my career, and it was such a game changer when I did. So going back to power, it’s a measure of torque, right? And so there’s a very big difference in how you make power at 50 kilometers an hour versus eight kilometers an hour. So for me, I would always do all my training on climbs, or do a lot of my intervals there. When I was a junior, I lived in Toronto, and we didn’t have any climbs. So I would do all my intervals on the TT bike, and I got very good at time trialing. When I moved to the West Coast, suddenly we had all these mountains, and I just wanted to go do my efforts there. I started doing all my efforts on the climbs, and now suddenly I couldn’t put out the same power on the flats. So if you want to be able to put out the same watts on the climbs versus the flats, you have to train an equal amount of time at intensity on the flats versus the climbs. And it all comes down to inertia. The way your muscles fire at 50 kilometers an hour compared to eight kilometers an hour is completely different. So knowing how to train this, depending on what you’re training for, whether you’re training to be a climber versus a time trialist versus a sprinter, something like that, was just something I never factored in. Because I always just thought watts are watts, power is power. But it actually matters how fast you’re going, because the way the muscles fire is different. And it’s like you have the engine already, you just need to teach the transmission to fire differently. You need to give the transmission a different set of gears. And that’s why motor pacing can be really effective, because you need to teach the muscles to fire at race speeds.

Trevor Connor  18:43

You and I have talked about this a lot. And if I was looking at the stress response to that, we’ve often looked more at things like, what’s the cardiac drift over the course of the workout? Right, and there’s where you’re seeing the body’s response.

Stephen Seiler  18:55

We’re seeing that ventilatory drift is an even stronger measure, you know. So we’re coming online with taking ventilation out of the laboratory and into the field. And that relationship between heart rate drift and ventilatory drift is really interesting. But we’re not ready to make a variable. We’re not ready to say, well, here’s the Seiler breathing index. We’ve got more work to do. But it’s tempting, because it’s so easy with digital tools to just make up a new thing. You know, all we’ve got to do is divide a numerator by a denominator, or multiply this times that, or take the fourth power and then the fourth root of something, and we’ve got a new number. So this is dangerous, I think, and it’s part of our issue here today.
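The fourth-power, fourth-root recipe mentioned above resembles the widely described normalized power calculation: smooth the power data with a 30-second rolling average, raise each smoothed value to the fourth power, average, then take the fourth root. The sketch below assumes one power sample per second. The function name and the sample ride are hypothetical, and it is not claimed to be what the speaker had in mind or any vendor’s exact algorithm.

```python
def normalized_power(watts: list[float], window: int = 30) -> float:
    """Normalized-power-style number from 1 Hz power samples (assumed)."""
    if len(watts) < window:
        raise ValueError("need at least one full window of samples")
    # Rolling average over `window` samples, updated incrementally.
    window_sum = sum(watts[:window])
    rolling = [window_sum / window]
    for i in range(window, len(watts)):
        window_sum += watts[i] - watts[i - window]
        rolling.append(window_sum / window)
    # Fourth power, mean, then fourth root.
    mean_fourth = sum(p ** 4 for p in rolling) / len(rolling)
    return mean_fourth ** 0.25


# Hypothetical ride: one minute at 100 W, then one minute at 300 W.
ride = [100.0] * 60 + [300.0] * 60
print(sum(ride) / len(ride), normalized_power(ride))
```

A few lines of arithmetic are enough to turn raw samples into a new number that sits above the plain average for a variable ride. That ease is exactly why a derived metric needs validation before it earns a name.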

Trevor Connor  19:43

So before we go there, one thing I just want to say in response to everything you’ve just told us. I think you’ve made a really important point that I really want the listeners to remember as we continue this conversation, which is that we tend to think, well, we’ve got these metrics, and you go into whatever software platform you use, TrainingPeaks, GoldenCheetah, WKO, and you see all these different metrics and you think they’re all made equally. But they’re not. And you just brought up that there are different categories, and there’s a different quality to those metrics. So you raised the fact that there are some that are external load and some that are internal response. We did do an episode on this, which I’ll put in the show notes. But as you said, you know, power and distance, those are external. That’s what you’ve done. It doesn’t say anything about what’s going on in your body. Heart rate, that’s an internal metric. But you’ve just raised another thing, which is direct measures versus estimates. And Marco, I think you had a few other categories to bring up, which were the known and unknown parameters, and also the health readiness versus training vectors that just describe what happened. Do you want to quickly explain what you mean by those two categories?

Marco Altini  20:49

Yeah, for sure. I think that the first distinction we can make is between measurements and estimates. And sometimes it’s maybe easier to look at wearables’ metrics, because here’s where we have more of this. When we look at training data and everything we talked about so far, a lot of the things we look at are actually measured, right? So we measure power in cycling, we measure heart rate, we measure distance. So I think, during training, there’s a fair amount of measuring and maybe less estimating. When it comes to the response to the stressor, then I think we get into a bit more complex way of assessing this kind of response, in that we have some measurements, but in the past few years we have seen a lot of estimates coming up from wearables. The typical ones would be sleep scores, recovery scores, readiness scores. And here is the way I further classify this, apart from measurements and estimates. Again, the measurements here would be the things that your wearable can measure because there is a dedicated sensor in the device that can measure that parameter. For example, if you have an optical sensor, you know, there’s a green light that you see there, and a detector that is going to measure the light that is transmitted or reflected by the tissue. Then you have a sensor there that is measuring changes in blood volume, and thus measuring your pulse rate, and therefore your heart rate. So that is something that has been designed for that job, and it can do that job. It does not mean that it does it perfectly, right? Context matters, and measurements are not perfect. So at rest, it might read very well. As you move around and maybe move your wrist and things like that, there will be added noise and the measurement might become inaccurate. But there is a sensor there that is designed to measure that parameter. Outside of pulse rate, heart rate, and heart rate variability, almost nothing is measured, right?
The devices provide a lot of additional parameters to track your behavior or your response. And those I like to divide into two groups, as you were mentioning: the known and the unknown parameters. That is simply something that allows us to distinguish the things that we can actually validate with reference systems. Those are the known parameters, things like calories or sleep stages, right? We can get indirect calorimetry, or direct calorimetry, and measure calories, and then we can see what our wearable is providing. Or we can get polysomnography and measure brainwaves, and compare that with what our wearable reports for sleep stages. But there are other parameters that are not something we can measure, even with another device or reference system. And those are the unknown parameters. And there are a lot of these, right? There are stress estimates, recovery, readiness, sleep scores. And all of this is sort of made up. So that is, I think, one of the most challenging things, because it’s also difficult to evaluate. It’s easy for people that maybe have invested in the device to think that this is working, because we want it to work, right? Maybe we pay a subscription every month. You know, there’s a lot of marketing that tries to convince us that it is working. But there is really no reference for that. And I think we need to be a lot more careful with this kind of estimate, and with the implications in terms of how we might try to adjust training or assess the impact of training based on these parameters. And to do that a bit better, I think we really need to understand how these are built, so that we understand their limitations. Then we can either try to use them a bit more effectively, or we simply decide to ignore them and use the wearable to look at the actual physiology, since it is measuring it and providing it to us.
And that is probably what it can do best: a measurement of your resting physiology, as opposed to building things on top of that which might be tricky to evaluate, and not particularly relevant for athletes. I will maybe just add one thing about this. I think in the context of people interested in exercise, or athletes, or coaches, or people that use wearables to understand how training is going, it is particularly problematic at times to use these scores, because they combine our physiology and our behavior. And I think that sometimes that’s not too clear. So when an athlete looks at a readiness score or recovery score, they might think, hey, it is very low, something is wrong in my body. But that is often just part of what is making up the score. Part of the score is just your behavior, which means that if you sleep a bit less, or if you were a bit more active, you will also get a lower score. Because the device makes an assumption, based on a generic model that is not dependent on you, that maybe less sleep requires more recovery. But that does not mean that your physiology was impacted negatively. It could actually be perfectly normal. So there was no change, no sign in your body that sleeping a bit less was detrimental for you and that you needed more recovery. But the device will still tell you so, because it relies on this generic model that is also using your behavior to determine the output. So I think especially in the context of training and working with athletes, when we do things like manipulate training, or go to an altitude camp, or try different things, we really want to see the body’s response. We don’t want to see this cumulative score that maybe makes us think it’s actually more informative just because it’s putting together multiple pieces of information.
But in fact, it’s less informative, because we do not really know whether the body responded negatively or whether it’s just an assumption that this model is making based on a change in behavior, for example.

Stephen Seiler  26:56

I think this is really important. But it also takes us into another aspect, the psychology of measurement. Because if my measurement device tells me, oh, you didn’t get enough sleep last night, you’re sleep deprived, well, I felt fine. But now I don’t feel so fine, because now I’ve been told that I have a sleep deficit from last night. And now we’re talking nocebo effects. So this is one of the big concerns I have with monitoring: in almost any setting, whether it’s academia or business, the things we measure can easily be gamified. They can easily have psychological impacts on the organization. They can have unintended consequences that are not positive. We perform to the metrics instead of to what we really want. We really want to sell better cars, but we’re measuring some other aspect of it in the sales division, and we perform to that metric. Or in research, we have metrics related to publishing. I really want to impact people, to help them train better, but what I’m measuring is how many times I publish in a year, for example, and those two things may not be related. So we see this everywhere. But in the context of training, I think we do need to be concerned. We need to be somewhat critical of what we’re measuring. And just before we started this podcast, I was speaking with my daughter, and I asked her, you know, some things about her max heart rate and so forth. And then she said, you know, I don’t use heart rate variability. I think it might be really good, but I just don’t want to spin out of control on the data. Because she knows she’s a bit OCD, obsessive-compulsive, and she can easily go down rabbit holes on data. She’s self-aware on that. So she purposefully tries to limit what kinds of information are flowing into her head, and mostly tries to go on feeling. You know, she just says, I trust it if I feel like not training.
I trust that, you know, like I said, Yeah, I think in the research actually supports that. So far, the you know, the psychological kind of perceptual stuff triggers quicker than a lot of the physiology when it comes to those long term strain type issues. So I do think that we need to also remember, our brains are still pretty darn good. They’re pretty useful if we let ourselves use them, unencumbered by too much data. A great

Trevor Connor  29:35

example of what you’re talking about that we’ve seen is with the aura rings in the whoop, where you get athletes buying these and they get a recovery score, they get an estimation of their sleep, and they as you said, they start gamifying it and they start going, Oh, I got I need to I need to get more sleep. I need to get better sleep I need to get a better whoop score for my sleep. And then they get stressed about it and start developing insomnia because they’re not going to bed I’d relaxed, they’re going to bed feeling they have to perform.

Stephen Seiler  30:03

It’s just a spiral.

Trevor Connor  30:06

So, you know, I laugh at this because I have that gene where I’m a short sleeper. So every morning I wake up and Whoop’s like, “You didn’t get nearly enough sleep, you now need to catch up.” And I just kind of laugh. But, you know, not everybody can do that. That’s fine.

Marco Altini  30:20

This was in the original paper, actually, that introduced the word orthosomnia, right, which we now use as a term to define people who have this sort of obsession, basically, with optimizing the sleep metrics and the wearable metrics in a way that becomes unhealthy for them. So they were people with insomnia who had these wearables, and they understood quickly that the more time you spend in bed, the higher the score, typically. And so they would end up forcing themselves to stay in bed even more, and that would result in even worse insomnia, right? So we end up getting better scores and making our health or performance worse by optimizing the scores. So clearly, we have an issue there. And I think maybe there’s a meaningful difference with Stephen’s example, where your daughter does not want to include HRV, which is, I would say, about whether we want to track something or not. And it can be totally fine not to, and to rely on feel and on what we think works best for us and our psychology. But in that case, at least we are measuring something. I think it’s really crazy, in a way, that when we look at something that is completely made up and has absolutely no relation with health or performance, like a sleep score, that can lead to negative consequences for health and performance. Because it’s not some health-related information that we have received, or something that is really wrong in our body; it’s something that is almost entirely made up. And again, there is no evidence anywhere that the sleep scores are associated with poor health if you have a lower one.

Stephen Seiler  32:05

Let’s go back to how sleep is being estimated. They’re not putting any kind of a halo on your head to actually measure brainwaves. They are measuring how your hand flops around in the bed, as far as I can tell, you know, so movement of the arm; that’s one indicator that you’re not sleeping. And then I guess there may be some heart-rate-associated measures that are brought into the algorithm to try to quantify sleep. But the point is that neither one of those is anywhere near actually measuring sleep. And this is exemplary.

Trevor Connor  32:43

Here’s something that’s interesting for you, because I have read a study on this, where they took all these different devices that measured sleep. I believe this was before the Oura Ring, so I don’t think the Oura Ring was included in this. But if you ever look at, you know, the Garmin watch or the Whoop, they’ll try to tell you how much time you were in deep sleep, how much time you were in REM sleep, and it’ll actually break it down over the course of the night. And in this study that I read, they basically said most of them are not even close to what you see in the lab. But they did say that, of all of them, the Whoop was actually close enough that you could potentially use it for research. So, same thing: I was highly skeptical, but was surprised to see that.

Marco Altini  33:27

I developed it myself, together with the team there; I’m an advisor for them. And when I was doing some more technical work, I actually developed the new-generation algorithm that is in the ring. And these, let’s say, latest-generation algorithms perform decently, let’s say better than anything before. The Oura or even, I would say, the new versions in the Apple Watch, they are very similar. They maybe get 80% right over four stages, which is, again, as good as it gets without measuring brainwaves, basically. You know, the previous generation was maybe 60, 65. So it’s been a decent step up. But it’s still far from what you get with polysomnography and actually measuring EEG. And I think there are various aspects there that all require, say, maybe a long conversation. One would be that even if we were to detect how much REM sleep and deep sleep we get and all of this, we really have no idea what to do with that. Even when we do it with polysomnography and we get sleep stages, it’s not that this information is particularly actionable. The second aspect there is that when we look at what these devices are estimating, they’re doing it mostly through measurements of autonomic nervous system activity, like, again, changes in temperature and heart rate variability and heart rate during sleep, and also, of course, movement. But I would say the step up recently has been due to measuring autonomic nervous system activity, which is not too bad, but it’s not the same as measuring brain activity, brain states, so it is something different. And I was recently part of this sort of committee with a series of scientists and sleep scientists to write recommendations and guidelines for using wearables in sleep research. And one of the common themes there was that maybe we’re even looking at this the wrong way. So we shouldn’t really be looking much at trying to emulate polysomnography with a wearable.
So, trying to guess sleep stages with errors that remain quite large. But maybe we should embrace what the wearable is actually providing us, which is different and possibly more insightful, which is autonomic nervous system activity. So maybe that is what we want to study in the context of sleep, but also in the context of performance and training and all of that. So how is your body responding through, again, changes in temperature, heart rate, heart rate variability? Those kinds of measurements, which at this stage are very accurate when collected with these devices as you sleep, might be more insightful than trying to mimic what another device is capturing. Let alone that PSG has its own problems. I think the reference device here is something that requires experts to look at the data and then agree that this 30-second segment is deep sleep and this 30-second segment is REM sleep. And when you put three experts together to do this, they agree 85% of the time. So that 85% agreement is actually your reference, what you consider 100% when you develop your algorithm, and then you get 80% of that accurate with the wearable. So there are many layers here of different errors that are introduced to estimate something that, even if we were able to measure it correctly, we wouldn’t really know exactly what to do with. Nor should we probably expect to need the same amount of light, deep, or REM sleep every day, every week. Depending on stress in our lives and training sessions that we did, most likely the distribution of these stages should change over time in ways that we don’t really know.

Stephen Seiler  37:19

Marco, you know, you’re going down the rabbit hole on sleep, and it’s really an important issue, because a lot of people are being misled, but it’s a very popular thing to try to measure. But I think it’s also useful to back up a little bit and think about, all right, if the measurements are incorrect, what are the sources of incorrectness or error? Sleep is a fairly complex one, but we can take something that the cycling community will understand much better: just power, or some derivative thereof like normalized power, where we’ve introduced various algorithms. Power itself has become our gold-standard tool in cycling. And even there, we’re being challenged by certain problems. Because, you know, if we assume that most cyclists want to know about cycling out in the real world on real roads, and we test them in the laboratory on various devices, trainers, ergometers, we’re finding out that even if they take their bicycle from the road and put it on a trainer, the same trainer they may have used otherwise, and they put that bike on that trainer in the laboratory, they’re getting different power measurements with the power meter on their bicycle when it’s on a trainer than they’re getting outdoors, actually cycling on that bike.

Trevor Connor  38:43

Very quickly, something I want to bring up. This is a pet peeve of mine, even though it’s a very minor thing: power is not a measure. Power is a calculation. We measure torque and we measure cadence; you don’t actually measure power.
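
As a rough illustration of the point that power is calculated rather than measured: mechanical power at the crank is torque multiplied by angular velocity, and angular velocity follows from cadence. The sketch below assumes torque in newton-metres and cadence in revolutions per minute. The function name and the example numbers are hypothetical and are not taken from any particular power meter.

```python
import math


def crank_power_watts(torque_nm: float, cadence_rpm: float) -> float:
    """Return mechanical power in watts from torque (N·m) and cadence (rpm)."""
    # Convert cadence in revolutions per minute to angular velocity in rad/s.
    angular_velocity = 2.0 * math.pi * cadence_rpm / 60.0
    # Power is torque multiplied by angular velocity.
    return torque_nm * angular_velocity


if __name__ == "__main__":
    # Hypothetical example values: 30 N·m average torque at 90 rpm.
    print(round(crank_power_watts(30.0, 90.0), 1))
```

Any error in the torque or cadence reading carries straight through to the result, which is one reason to treat the displayed power as a derived value.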

Stephen Seiler  38:56

Right, that’s true. So, yeah, we’re already deriving. Good correction. So we’re using a strain gauge that’s somewhere in the crank-pedal system, cadence is being measured, and that combination is giving us a measure of power. And, you know, my cycling community friends are even having to use correction factors, because our lab, which was supposed to be the gold standard, laboratory testing of power and doing the lactate profiling, is not transferring to the field in a one-to-one way. I even have a colleague, Espen Aareskjold, who’s been on this podcast, and he basically just says, “Well, yeah, we use an 8% correction; the powers are higher out in the field.” We don’t know for sure if it’s just a function of some biomechanical issues, you know, what exactly. So that just exemplifies that even the fundamental gold-standard kinds of measurements have problems. I can give you another example from the laboratory that we’re working a lot with. For decades, we have measured breathing by either using what’s called a Hans Rudolph valve, where there’s a mouthpiece in their mouth and we clamp their nose, or we put a mask on our athletes. And then it’s got a tube, and we’re collecting that ventilatory exhaust and sampling it for O2 and CO2, and we’re also measuring the amount. And that’s our gold-standard way of measuring the metabolics. But what we found out is, when we take off the mask, our athletes breathe differently. They breathe on average at higher frequencies, because there’s less resistance. And so the breathing frequency and tidal volume are adjusted in a different way. Now, we think the overall volume is about the same, but the brain seems to regulate breathing differently, to counteract that resistance from the mask that is part of our gold-standard technology. So we are influencing the physiology with the technology.
When we take away the mask and just use the shirt, and we calibrate the shirt and the mask, then we see, oh my goodness, some of these athletes hit peak breathing frequencies during the max test that are 20 breaths per minute faster, and on average 10 or 12 faster. And at the very extreme, some of them are 30 breaths; you know, they’re hitting 90 breaths per minute without a mask, or hitting 65 with a mask. So we are really in a crisis right now, my physiology friends, because our wonderful laboratories, which are gold-standard kind of halos where we collect data, are being challenged fundamentally, because the data doesn’t always match up with what happens out in the real world. So we’re dealing with this right now.

Trevor Connor  41:56

That’s a conversation I’ve had with physiologists. I’ve had that experience myself, and I’ve done a lot of lab testing. And I can tell you, I go in, and they don’t want me on my bike; they put me on the scientific Velotron, which they try to match up with my bike but never quite get the same position. It’s a different saddle. Then they, you know, block off your nose, put a mask on your face, all these other things. And they have me start riding at 150 watts, and I’m sitting there going, “This already feels hard.” Not because 150 watts is hard, but just because I’m in such an uncomfortable state, and, you know, the whole experience is going to be different. And so I’m a very big believer that when you get athletes in the lab to test them, you don’t put them on the Velotron; you have them on their bike. Because I think it’s a mistake to test them on this Velotron and give them these numbers, and you go, “How do you know if that matches up with their power meter?” And I’ll have physiologists say, “Well, the Velotron is accurate,” and I go, “But that doesn’t matter, because they’re not doing their training on the Velotron.”

Stephen Seiler  42:50

We have two beautiful Lode Excalibur devices in our lab. We never use them, because any cyclist that has any capacity is saying, “I want to be tested on my bike.” So they’re bringing in their own equipment, and we’re accommodating that, or we may be using a Wattbike. But we’re not using the fancy Lode, you know, so they’re sitting there collecting dust. And that’s just the reality of it. But the most common situation will be that if we do testing in the lab now, and this is with professional cyclists, they’re bringing in their own equipment: they’re bringing in their bike, and they’re bringing in their trainer. And we’re just serving as some reference to try to standardize the process. But some of the measurement devices are the same ones they’re using in the field, and that gives them a bit more security in what they’re measuring.

Rob Pickels  43:41

Hey, listeners, it’s Rob Pickels, co-host of the Fast Talk podcast. Believe it or not, I don’t just talk about training on the airwaves. I also talk about the science of training with the athletes that I coach. If you like hearing from me on the show, then you can have me as a resource to help you achieve your goals, whether that’s racing, a big adventure, or improving your fitness. For more information about coaching with me or great coaches like Grant Holicky, U.S. cyclocross champ Stephen Hyde, or many other amazing coaches, check out forever

Trevor Connor  44:14

Dr. Seiler, I’m going to flip this around, because I think you’re touching on a really important point. And I’m actually going to ask you a question that you put in the outline, which is: what, then, are the properties of good metrics? Because we’ve just raised a whole bunch of concerns that we’ve seen with metrics. So what do you look for to say this is a valuable metric? What do you look for to say this is a good metric, this is something we can use?

Stephen Seiler  44:36

Well, I want to know where all the parts are coming from. So, first principles: I want to understand where each part comes from. Like, as you pointed out, power is not actually being measured. It’s cadence times torque. Okay, how’s cadence being measured? I get that; that’s fairly straightforward. What about torque? Is that straightforward? You know, so I want to know the pieces of the puzzle. I am allergic, as a scientist, to black boxes, if you understand what I mean by that. So if a proprietary company says, “Hey, we’ve got our black-box measurement for whatever this metric is,” I’m out of there, because that is for me a red flag. But I understand it’s business, you have a business that you’re trying to protect, so I’m cognizant of that. But, you know, I feel comfortable with measurements where I know where the various elements are coming from, whether it’s VO2 or heart rate variability or whatever it might be. So that’s the actual parts of the measurement. And then the other is the execution of the measurement, which is user error. For some of these measurements, it depends on how you do it. With blood lactate, for example, a lot of recreational athletes buy blood lactate monitors and boxes of strips, and they waste a lot of money, because they just make mistakes in the actual process of sampling their own blood, you know, because it’s not easy. I mean, it’s not easy if you’ve never done it, right? So there’s a technical error aspect. So there’s measurement error that is device-associated: the actual elements that are being measured and how they’re being technically achieved. There’s user error. And then I guess the third source of error is not really error at all; it’s just variation. And that’s just day-to-day variability in these measurements that we assume are going to be static. We assume maximum heart rate is going to be maximum heart rate every day. But it’s not.
We assume peak lactate is always peak lactate. It’s not, because there is biological variation. So you’ve got three kinds of sources of variability in these measurements that you try to minimize. You know, you can’t eliminate the day-to-day, but you reduce it in the lab with testing, and you say, “Well, I want you to show up at the same time of day, wearing the same shoes, having not had caffeine.” You know, so we try to standardize some things like that. And often for testing, they standardize in ways that they wouldn’t actually do in the real world. The cyclist says, “Well, I wouldn’t have gone without a meal for the previous three hours before a workout, but that’s what I do in the lab. I would not abstain from a cup of coffee for three hours before, but I do that in the lab.” You know, so we’ve created some kind of artificial situations that actually don’t apply to what our athletes are really doing in the field.

Trevor Connor  47:38

As Dr. Seiler just pointed out, even a good measure we trust, such as heart rate, can vary and still ultimately lead to estimates. Here’s physiologist Brady Holmer to explain a little more.

Brady Holmer  47:49

So, I mean, I think the most common and most popular one would be thinking about zone 2 training. So a measure of whether you’re in zone 2 or not would actually be to measure your lactate levels. If you did a lactate test, you could actually determine where your different thresholds are, and whether you were in zone 2 or not. Versus an estimate, which is what most people, when they say they’re training in zone 2, are probably doing, and which is almost an estimate of an estimate. Because I think what most people do when they’re estimating their zone 2 is estimate their maximal heart rate, probably using 220 minus age, which is a very, very rough estimate. And then from that, they’re estimating what their zone 2 is based on, say, maybe 60 to 70% of their estimated maximal heart rate. So, you know, hopefully this is kind of what you’re looking for. But yeah, I think while that can be valuable, and maybe get you somewhere towards your zone 2 training, it’s certainly not as accurate as a measure of lactate would be. And I think a problem, though, even with a measure, is that that measure is constantly going to be changing. Someone had even posted something on X, or Twitter, about this a while ago. Say I measure my zone 2 using a lactate meter, and so I know what my zone 2, say, power output or running pace is. That could also change from day to day. So unless I’m actually measuring my lactate every single day to determine where my zone 2 is that day, perhaps an estimate might even be better in that case, because you get a wider range.
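
The “estimate of an estimate” described above can be written out in a few lines. This is only a sketch of the arithmetic mentioned in the conversation (220 minus age, then 60 to 70% of that value); the function names are hypothetical, and the result is not a substitute for a lactate test.

```python
def estimated_max_hr(age_years: int) -> int:
    """Rough age-based estimate of maximal heart rate (220 minus age)."""
    return 220 - age_years


def estimated_zone2_range(
    age_years: int, low: float = 0.60, high: float = 0.70
) -> tuple[int, int]:
    """Estimate a zone 2 heart rate range from the estimated max heart rate.

    The 60-70% band follows the figures mentioned in the conversation.
    It is an estimate stacked on top of another estimate.
    """
    max_hr = estimated_max_hr(age_years)
    return round(max_hr * low), round(max_hr * high)


if __name__ == "__main__":
    # Hypothetical 40-year-old athlete.
    print(estimated_max_hr(40))
    print(estimated_zone2_range(40))
```

Because the second step inherits whatever error the first step has, an athlete whose true maximum differs from the age-based figure gets a shifted zone without any warning from the numbers themselves.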

Trevor Connor  49:29

So you just brought up basically validity, reliability, and then you called it functionality. I’m going to throw something out here just to get a conversation going. To me, of these three, reliability is the most important. Meaning, if I have a power meter that’s not valid, that’s 50 watts off, I’m okay with that as long as it’s always 50 watts off, because then I can still use it for training.

Stephen Seiler  49:57

Not me. It would be okay if it’s 50 watts too high. Right, exactly. My ego is so intimately connected to those power numbers that there’s no way in heck I’m going to be able to survive a 50-watt deficit. So the psychology is still important.

Trevor Connor  50:18

Well, give me 50 watts too high. But how do you two feel about that? If you have something that is reliable, that is fairly consistent day to day, can it be a useful metric?


Marco Altini

I would say that, in general, it can be useful in certain situations. I mean, my expertise has been mostly related to measuring, you know, the stress response through heart rate variability and wearables, and I’ve seen a bit the evolution of these devices that started measuring HRV in the night, different wearables. And what they measure is not even really heart rate variability, right? By definition, that only comes from the activity of the heart. What you measure is pulse rate variability, which is different, because by the time blood has traveled to the wrist or the finger, there is additional variability due to changes in blood pressure and things like that. But even if the absolute value there is not really the same as your ECG-derived heart rate variability, the ability of these devices to capture day-to-day changes in response to stressors, so the changes with respect to your own data, the previous days, or your normal range, or whichever way you quantify your previous data and how it changes over time, is very good. These devices can do that very well, in a way that is very similar to what you would capture with an ECG, even though the absolute values will be a bit off. So certainly, I think reliability in that sense is more important than the device being actually valid at measuring heart rate variability. So it gives us a way to use it in certain contexts. I think this technology has also tried to solve some of the issues that you both brought up earlier, in the context of what we could call ecological validity, right? So how there is a difference between measuring things in the lab and measuring things in real life. With sleep, that is also the case, right? Yes, sure.
We can use polysomnography, again, and measure brainwaves and all of that. But when you are in a sleep lab, and you have all these devices linked to you, and you’re sleeping there maybe for one night or two, how does that relate to your actual sleep when you’re at home, right? So if you take that data and think that that’s how you sleep, and try to maybe implement changes in your behavior or sleep routine based on that, I don’t think it’s particularly meaningful. And that’s probably the case also for some of the tests you mentioned earlier, measuring in the lab with the oxygen mask, or lactate, or a different position on the bike and all of that. So as the measurements move outside of the lab, I think we do have something to gain there with our ability to capture these changes in a way that is more representative of real life. It’s the same with heart rate variability, right? In the past, we were doing things like Stephen was saying before: do not eat, do not drink coffee, do not do a lot of exercise, then come to the lab. And then three hours later, you go to the lab, and they tell you, “Hey, now lie down and relax.” And then, you know, you’re ordered to relax, and then they measure your heart rate variability. And that probably has nothing to do with your resting physiology. The way you measure it now is during the night or first thing in the morning at home, which is a lot more useful in terms of how we interpret the data with respect to the alternative that we had before. So there is something to gain, I think, as long as we are measuring things. It gets, I think, a bit more challenging when we start not measuring things, but estimating them and making them up a bit when you build a model, right? Even if we say that, okay, we use this very complex machine learning model that uses all these parameters.
But still, you are also learning from certain data that you have collected, which might not be really representative of the person who is using the device now. So if I use this device and it is measuring something, that is typically accurate. If it’s estimating something, it might be accurate if I’m very similar to the people who were used to create the dataset and the model that is used now to develop this sleep-staging estimation algorithm or any other algorithm. But if those people are a bit different from me, then maybe what I get is not particularly useful. So it’s always difficult to generalize, right? If we take it in the simplest form possible, an estimate would be, you know, what’s your maximal heart rate based on your age, right? None of us would use that. But there are people, of course, for whom this works perfectly, giving exactly the maximum heart rate that they would get in a maximal test, because there will be those people. But that does not mean that it is a good method. So I think that’s also sometimes difficult for people to understand, that there is a lot of inter-individual variability in how these things work. And even when they are validated, maybe they’re not validated on, you know, an individual who is older or has a different behavior, which might be someone who is using this device but was not among the people used to develop the model.
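
The idea that day-to-day changes relative to your own recent data can survive an offset in absolute values can be shown with a small sketch. It assumes, purely for illustration, that a wearable reads a constant amount lower than a reference device; real devices will not be offset this cleanly. The function name and all values are hypothetical.

```python
from statistics import mean


def deviation_from_baseline(history: list[float], today: float) -> float:
    """Return today's value minus the mean of the previous days."""
    return today - mean(history)


if __name__ == "__main__":
    # Hypothetical HRV-like values for the previous seven days and today.
    reference_history = [62.0, 65.0, 61.0, 64.0, 63.0, 66.0, 60.0]
    reference_today = 55.0

    # Assumption: the wearable reads a constant 8 units lower than the reference.
    offset = -8.0
    wearable_history = [value + offset for value in reference_history]
    wearable_today = reference_today + offset

    # Both devices report the same drop relative to their own baseline.
    print(deviation_from_baseline(reference_history, reference_today))
    print(deviation_from_baseline(wearable_history, wearable_today))
```

The absolute numbers differ between the two devices, but the change against each device’s own baseline is the same, which is the sense in which a consistent device can still be useful for tracking one person over time.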

Stephen Seiler  55:14

So I want to go back to your 50-watt example, where you say reliability is more important than validity. And I would argue this. I would say, look, if we’re within a person, within a specific individual, I can buy your argumentation and say yes, if I have to choose, then repeatability trumps absolute validity. Although I don’t really think you can separate them. Because now what happens if I don’t have validity? I’m 50 watts off. I think I’m cycling at 400, but I’m really cycling at 350 watts. It’s always 50 watts, and I’m feeling great about myself. But then I go to the race, and I compete against my opponents, and suddenly I get this aha experience: holy cow, I’m not what I thought I was. My 400 is actually 350. I’m getting my butt kicked here. So as soon as I want to make comparisons within a team or across individuals, validity really matters. I actually agree with that. I’ve got to have standards. You know, if I’m the coach of Visma-Lease a Bike, and they’re trying to prepare to win monuments against, you know, some of the other great athletes like Van der Poel and Alpecin-Fenix, well, that power needs to be valid. Okay, they need to have an upfront understanding of, look, when we come, we’re going to have to be able to generate these kinds of powers in these particular situations to get where we want to be. So validity matters. Within individuals, it’s about patterns; across individuals, we need some absolute calibrations, I think, and that’s a worthwhile thing to remember. Because sometimes we do want to compare across and see where my athletes are relative to some standard. You know, what does it take to qualify for the World Championships in the 1500? You’ve got to have a certain speed. So that better be valid during your training.

Trevor Connor  57:13

I’ve raised that intentionally to ruffle some feathers. I agree with you. That’s okay.

Stephen Seiler  57:17

But I think it’s a useful distinction to remember: what are we using variables for, internal monitoring versus standards, you know, making sure we know where we are relative to certain targets for performance. And even in our training process, like Canova, one of the coaches who’s worked with some of the best Kenyan runners in the world, he would say that during early-phase training, the metrics that we’ll be using will be more perceptual, or maybe heart rate when they’re doing hard sessions. But then as they move towards the season, they will move to very pace-oriented, very, you know, validity-dependent measures. They have to be able to run these paces, these lap times, if they’re going to be competitive for 10,000. So that construct, this idea: you know, sometimes you need the absolute numbers to match up pretty well.

Marco Altini

Yeah, that’s actually a great point. Even, let’s say, a side application of having valid measurements or estimates is something I see a lot with, you know, new devices, wearables and things. Every day there’s either a new device or a new parameter that is estimated, right? And if there is a level of validity, then we can try to figure out if this parameter is worth something even before it has been validated in scientific research by comparison with a reference system, if that is a parameter that does have a reference system, which is something that might take two years by the time the device has been validated and the peer review process and everything has been done. So a way that we can look at metrics is typically to compare a couple of devices worn on the same person, right? If these are parameters that we can measure reliably, or estimate reliably, and not all estimates maybe are so bad, then we expect all of these wearables to provide us with very similar data related to this parameter. So if I measure the resting heart rate, or heart rate during the night, with three or four wearables, I want to see these data be very close in absolute terms, but also in relative terms as they change over time, over the days, right, for the same person. And then the same for HRV. But then if I look at sleep stages, it will be all over the place when I look at multiple wearables for the same person over time, and the same if I look at calories. So this tells us something, I think, which is that maybe we are unable to estimate these parameters with the required validity or reliability at this stage.
And so that’s, I would say, an exercise we can do if we have access to more wearables and we have some doubts about which metrics or estimates we can rely on: use a number of wearables, and for the ones that track very well between each other, it means that we are actually measuring something, or estimating something, in a way that we can trust. But if it is all over the place, and we talk about, you know, the major players out there, it’s very unlikely that one is better. I mean, we’re talking about companies with millions of dollars of budget, hundreds of employees, a lot of smart people. There’s no reason to think that any one is better than the others at this stage, right? They’re all doing the same things with the same sensors. So either we do it well, and we can do it, or if it is all over the place from multiple devices, then maybe they’re really not worth our time.
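
The exercise described above, wearing several devices and checking whether they track each other, could be sketched roughly as follows. The helper names, the device names, and the readings are all hypothetical; mean absolute difference is just one simple way to express agreement, chosen here as an assumption rather than something prescribed in the conversation.

```python
from itertools import combinations
from statistics import mean


def mean_abs_difference(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return mean(abs(x - y) for x, y in zip(a, b))


def pairwise_disagreement(
    readings: dict[str, list[float]],
) -> dict[tuple[str, str], float]:
    """Mean absolute difference for every pair of devices."""
    return {
        (first, second): mean_abs_difference(readings[first], readings[second])
        for first, second in combinations(sorted(readings), 2)
    }


if __name__ == "__main__":
    # Hypothetical nightly resting heart rate (bpm) from three devices
    # worn by the same person over the same four nights.
    resting_hr = {
        "device_a": [52.0, 54.0, 51.0, 53.0],
        "device_b": [53.0, 54.0, 52.0, 53.0],
        "device_c": [52.0, 55.0, 51.0, 54.0],
    }
    for pair, difference in pairwise_disagreement(resting_hr).items():
        print(pair, difference)
```

Small differences across every pair would suggest the devices are capturing the same underlying signal; large, inconsistent differences for a given metric would be a hint that the metric is not yet being estimated well by any of them.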

Stephen Seiler  1:00:46

It’s not doable. Yeah, I’m going to give an example. And I do not want to disparage this technology or the users of it, who find it very useful. But I find this to be a challenge with near-infrared spectroscopy, the NIRS method, muscle oxygenation. Muscle oxygenation is an interesting thing for me, because it’s a technology using light refraction, just like you were talking about for photoplethysmography. So there are different wavelengths and so forth. It’s a combination of blood flow and oxygenation of the hemoglobin that’s passing under the tissue. But the challenge, the big difference for NIRS versus heart rate or versus ventilation, is that when I measure heart rate or ventilation, I’ve got a system-wide response. It’s very robust. I’m looking at the entire body’s ventilatory output, or the heart rate response is a function of the entire system’s responses. When I look at NIRS, I am sampling a really small amount of tissue, because that light only travels maybe one or two centimeters deep; I’m giving it extra credit with two centimeters, to put it that way. And so it is sampling a small amount of tissue in one muscle, and then it’s trying to say something about all the musculature that’s active. And that’s challenging. And we’re seeing that if you put multiple NIRS devices on multiple muscles, you get different values at the same time. So it’s not necessarily that NIRS itself is wrong. It’s just that with EMG, NIRS, some of these, we’re just not able to sample enough tissue; we’re taking a very small sample and trying to make a really big projection about how the whole body is responding. So those technologies are more vulnerable than the technologies that are built more on robust, system-wide responses. And again, not to disparage NIRS; probably NIRS is one of those things where, if you use it every day, you start to detect patterns that are useful, that help your athlete.
But if you don’t use it every day, and you don’t get that detailed understanding of the variability, of what that athlete’s pattern is, then probably it’s not so easy to use.

Trevor Connor  1:02:59

So I’ve got one last big question for both of you. And before I go there, I just want to say that the reason I used that power example with you, Dr. Seiler, and I fully agree with you here, is that I bought a new power meter in 2016 that read about 40 watts too high, and I spent the whole spring thinking I was on great form. And it’s exactly what you say: I was feeling great until I went to my first race and went, “Oh no, I’m not on great form at all.” So I agree with you completely. But I just want to ask one last big question here, because we have been talking a lot about the metrics, but the theme of this episode is also these estimates. So right now I’m thinking of things like training load, TSS, PMCs, these sorts of things. What do you look for with these estimates to be able to say, this is something I can use? Because I always get concerned. We just spent a fair amount of time talking about things that claim to be direct measures and all the particular issues with those. So those have their issues; they have their points where they don’t work. Now you have these estimates, which are taking these imperfect measures and doing calculations on them, often then doing calculations on those calculations. Do you hit a point where you have something that could do more harm than good? How do you make that assessment?

Dr. Stephen Seiler  1:04:23

I mean, we can start with something as simple as FTP, functional threshold power, and no disparagement to those who created it. But if we go back to the origins of it, it was fairly straightforward: what’s your average power for an hour? Right? The hour of power has been done for decades. It’s not a fun test, but it is a truth teller. Right? Well, then you have to add, “That’s kind of hard. Can’t we estimate with a shorter distance?” And so you’ve had this slow industry of, well, let’s do a hard five-minute effort, and then we’re going to do 20 minutes, and then we’re going to take 0.95. But as soon as you try to estimate the average power a person can maintain for 60 minutes from an average power for some other duration, you’re already introducing error, because individuals are different. Their fatigue curves, their fiber type compositions, and so forth dictate that there’s not going to be a one-to-one correspondence, or a perfect correlation, between their 20-minute power and their 60-minute power, even if you measure both of those beautifully and perfectly. So we shouldn’t even expect that to be a perfect correlation. Okay, so those kinds of issues are rampant. And then you have this matrix type thing where we get athletes that, unfortunately, will take every way of trying to get a higher FTP, and then they’ll use the highest FTP they ever had, when the power meter was probably not quite calibrated. Right? And that will become their benchmark. And that’s tough, because it’s probably 30 watts off. So I had this happen to me on Zwift one time. My regular bike, my Tacx Neo bike, had to be repaired, and I was using another bike. I’m riding in this race, and all of a sudden, for whatever reason, I was like, holy crap, I am feeling great. And I was just flying, I was just kicking butt and setting power records along the way. And I was like, where did this come from?
Where did my magic come from? At the end of the ride, I realized that a towel had gone down. I was using the Concept2, and the towel was slightly covering the intake. And anyway, it was disrupting the calibration of the ergometer, and I was getting a bunch of free watts. And I have just lived with that, because those numbers are still on my Zwift, you know, all-time performance numbers. They’re wrong, but I can’t get rid of them. And they haunt me, you know, they still haunt me to this day. But it just shows that we have such a struggle with understanding that, look, the best calibration for us is: what can you do on a so-so day? You know, what is your daily-grind FTP? That’s the one you need to really use as a reference, not that one where the calibration was probably off and you’re on an all-time high. Don’t use that, because that ends up creating problems for us. That’s just one example.
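The 20-minute shortcut described in this answer can be sketched in a few lines of Python. This is a minimal illustration, not anyone’s official protocol: the function name, the two riders, and their numbers are hypothetical, and the 0.95 factor is simply the figure mentioned in the conversation. It shows how two riders with identical 20-minute power receive the same FTP estimate even though their measured 60-minute power differs.

```python
def estimate_ftp_from_20min(avg_power_20min_w: float, factor: float = 0.95) -> float:
    """Scale a 20-minute average power by a fixed factor to estimate 60-minute power.

    The 0.95 default is the factor mentioned in the discussion; a single
    fixed factor is an assumption that will not fit every athlete.
    """
    if avg_power_20min_w <= 0:
        raise ValueError("average power must be positive")
    return avg_power_20min_w * factor


# Hypothetical riders: (20-minute average power, measured 60-minute average power), in watts.
riders = {
    "rider_a": (300.0, 288.0),
    "rider_b": (300.0, 270.0),
}

for name, (p20, p60) in riders.items():
    estimate = estimate_ftp_from_20min(p20)
    error = estimate - p60  # positive means the estimate flatters the rider
    print(f"{name}: estimated {estimate:.0f} W, measured {p60:.0f} W, error {error:+.0f} W")
```

The estimate is identical for both riders because it only sees the 20-minute number; the individual fatigue curve that separates them is exactly the information the shortcut discards.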

Trevor Connor  1:07:35

All right, maybe our group here isn’t going to offer a lot of positives about estimates. So let’s hear from the TriDoc himself, Jeff Sankoff, as he shares a few calculations and estimates that he finds valuable, including FTP.

Jeff Sankoff  1:07:48

The major things that I use are bike derived or bike measured. I use WKO to interface with TrainingPeaks, and because of that, I get a lot of data that’s pulled out of TrainingPeaks. So most of my athletes have power meters. If they don’t have power meters, I encourage them to get power meters. I have, I think, one athlete who still doesn’t have a power meter, and so for them, I’ll use a lot of subjective kind of information. But for all my other athletes that have power meters, I’m getting a lot of calculated data that is coming through WKO. And let’s face it, I’m still one of those people that really leverages FTP. I don’t do specific FTP tests. WKO has its own protocol for testing, and then calculates an ongoing FTP, and I’ll use that. But I do like FTP. I do think it’s a good metric. I’m not slavish to it in any way, and I don’t think it’s the end-all, be-all, but I do think it’s a good number that I like to follow. I think it gives me a good sense of where my athletes are in terms of their ability. And then I kind of pair that with other things. So because I’m coaching triathlon, mostly, for running I will use pace. I use a threshold pace, but I kind of pair it with heart rate. I don’t love heart rate as an individual metric on its own, because I think there are so many things that can impact it, especially for age groupers. We’re not dedicated professional athletes, so we’re not getting up in the morning and immediately training. Most of us have jobs, most of us are squeezing our training in whenever we can, and so there are so many things that can impact our heart rate. And so for that reason, I look specifically at how heart rate is interacting with things like pace or with power output. I’m looking for any signs of decoupling. So any decoupling where I don’t expect it to come, I take that as a sign of fatigue, or I take that as a sign of overreaching.
So those are both measured; pace is also measured, obviously. And then the other data that I’ll really use, I guess, is going to be related to things like cadence. I really feel like cadence is very important, especially running cadence. There’s so much science showing how running cadence is important for running economy. And so I harp on my athletes continuously on the importance of cadence. I have a lot of them using the metronome function on their Garmin watches to try and get them to run at a good cadence. And spinning cadence as well, I think, is a very important number. So most of my data is measured, and a lot of it is things that I try to follow on a workout-by-workout basis.
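One way to put a number on the heart rate decoupling mentioned here is to compare the power-to-heart-rate ratio in the first half of a steady effort with the same ratio in the second half. The Python sketch below assumes that first-half versus second-half convention, which is not stated in the conversation and is not claimed to be what WKO or TrainingPeaks compute; the function name and sample data are hypothetical.

```python
from statistics import mean


def decoupling_percent(power_w, heart_rate_bpm):
    """Percent drop in the power:heart-rate ratio from the first half to the second half.

    Assumes equally spaced samples from one steady effort. A positive value
    means heart rate rose relative to power (or power fell relative to heart rate).
    """
    if len(power_w) != len(heart_rate_bpm) or len(power_w) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    half = len(power_w) // 2
    first_ratio = mean(power_w[:half]) / mean(heart_rate_bpm[:half])
    second_ratio = mean(power_w[half:]) / mean(heart_rate_bpm[half:])
    return (first_ratio - second_ratio) / first_ratio * 100.0


# Hypothetical steady ride: power holds at 200 W while heart rate drifts upward.
power = [200.0, 200.0, 200.0, 200.0]
heart_rate = [140.0, 140.0, 150.0, 150.0]
print(f"decoupling: {decoupling_percent(power, heart_rate):.1f}%")
```

What counts as "unexpected" decoupling depends on the athlete and the session, so any threshold applied to this number would itself be a judgment call rather than a measurement.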

Dr. Stephen Seiler  1:10:24

Normalized power is an arbitrary decision made by someone to say, well, it looks like, you know, taking the fourth power of these numbers and then the fourth root to get this normalization, it kind of looks right. But there is absolutely no gold standard to say this makes sense, that it has to be the fourth power. Third power is not right. Fifth power is not right. It’s fourth power, right? No, that was never the case. And now we know that in TrainingPeaks, they don’t even actually use the normalized power in the raw format. They take a 30-second smoothed version of the raw normalized power, because that turns out to match up better. So even the normalized power has been, in a sense, bastardized, tweaked so that it looks a little more correct. And we see this: if we take true normalized power from some of these highly stochastic races, we get absolutely crazy values. But it’s perpetuated, and we’ve just gotten used to it.

Trevor Connor  1:11:23

Here’s my issue with normalized power. Whenever athletes send it to me, I immediately tell them, “No, tell me your average power.” And they go, “Oh, Trevor, you’re so outdated, you’ve got to stick with normalized.” But let’s even give it the benefit of the doubt and say normalized power does what it’s supposed to do. What they are trying to do is take an external measure and give an estimate of internal stress. So normalized power is supposed to be: here’s what the race or the ride or whatever felt like to your body. So you’re doing a crit, and even though you averaged 275 watts, to you it felt like 400 watts. But I have athletes send me the normalized power all the time and say, “Look how hard I was going and how fast I was going,” and I go, “That’s not what the normalized power tells you. It’s actually the exact opposite. Stop sending me that number.” And athletes won’t stop, because normalized power is always higher than average power. It’s a nice number.
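To make the normalized power discussion concrete, here is a minimal Python sketch of the calculation as it is described in this conversation: average the fourth powers of the samples and take the fourth root, either on the raw data or after a 30-second rolling average. It assumes one power sample per second, and it is not claimed to reproduce the exact TrainingPeaks implementation. The function names and the crit-style test signal are hypothetical.

```python
def rolling_mean(values, window):
    """Trailing rolling mean over full windows only (first output covers samples 0..window-1)."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


def fourth_power_mean(values):
    """Fourth root of the mean of the fourth powers."""
    if not values:
        raise ValueError("need at least one sample")
    return (sum(v ** 4 for v in values) / len(values)) ** 0.25


def normalized_power(power_1hz, smoothing_s=30):
    """Fourth-power mean applied to a 30-second rolling average (assumes 1 Hz samples)."""
    return fourth_power_mean(rolling_mean(power_1hz, smoothing_s))


def raw_fourth_power(power_1hz):
    """Fourth-power mean applied directly to the unsmoothed samples."""
    return fourth_power_mean(power_1hz)


# Hypothetical crit-like effort: 10 s surges at 600 W, 20 s recoveries at 100 W, repeated.
ride = ([600.0] * 10 + [100.0] * 20) * 20
print(f"average power:      {sum(ride) / len(ride):.0f} W")
print(f"30 s smoothed NP:   {normalized_power(ride):.0f} W")
print(f"raw (unsmoothed):   {raw_fourth_power(ride):.0f} W")
```

On this deliberately artificial signal the 30-second window spans exactly one surge-and-recovery cycle, so the smoothed version collapses to the plain average while the unsmoothed version is dominated by the surges. Real rides will not cancel this neatly, but the sketch shows why the two variants can disagree so sharply on stochastic efforts.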

Dr. Stephen Seiler  1:12:21

We’re actually using those types of metrics in race analytics to say, where are the power losses happening? Because what we would like to do is reduce the energy bleed. You know, every time you go around a turn, every time you get your head in the wind, every time you have to jump back on a wheel because you got a little lazy and fell back a few meters, you have to do these power surges, and they are costly energetically. So we’re actually using normalized power in a different way to help, because we would like to bring it down. We would like to be, you know, using the famous “eat from the plates of the others before we start licking our own.” We want to reduce, because you can only do so much with power in. So we’ve got to look at power losses and say, where’s the wasted energy coming from? In a classic like Paris-Roubaix, where can we save some energy, if we can save some of those surges? So in that respect, we want raw NP. We want to use the raw data, because it’s telling us the actual power surges, not some smoothed variant thereof. So I think sometimes there are ways to use some of these constructs that have been developed. But just be very cognizant: if you use 30-second smoothing, as in TrainingPeaks, with that normalized power, you’re losing data. You’re losing information about how the cyclist actually cycled, how they actually solved the problems of the course. And so we’re finding that it’s more interesting, it’s more truth telling, to use the raw normalized power, not the TrainingPeaks version.

Trevor Connor  1:14:08

That’s interesting.

Marco Altini

I found it particularly interesting that in this context, in the examples you guys have made, we talk about estimates where we are estimating something from actually the same thing collected maybe over a shorter time period, right? Like your power over an hour, but we start from your power over 20 minutes. And that already has a million issues, right, as you were describing. But now, with the devices we have, we actually estimate things from things that are completely different, not even the same parameter measured over a different timeframe or in a different context, right? Again, brain activity from heart activity. So I think that says a lot about how things can easily go wrong, and how important it is to understand the limitations. It’s not that it’s all bad or not useful, right? It’s just that we need to understand the limitations and know that there can be issues if we do an FTP estimation from a 20-minute test, depending on the type of athletes we have and also different variables. And the same is true for all other estimates. There might be a use for it, but there is likely a margin of error. We might understand where that error comes from, and that maybe allows us to use the data more effectively. Or sometimes we might not even know where the errors are coming from, and that makes it a lot harder to use the data effectively.

Dr. Stephen Seiler  1:15:32

I want to say just one more thing. I work with companies a bit, and I teach a course in sports technology. And so in that respect, I’ve gotten to know some of these fantastic innovators that develop companies and try to bring technologies to the market. And what I find, and some of my colleagues find, is that in young companies, hungry, almost bankrupt, you know, the engineers or scientists want to do things right. They want to get it right, they want to measure everything right. They’re just dedicated to their hardware development and their user interface, and they’re trying to make things happen. Those are the ones I love working with. But if the company gets really successful and they start really making money, selling a lot of product, what happens? Well, the engineers get pushed out of the head office and the marketing people come in. And then they start to say, “Hello, we cannot afford to do another hardware iteration here. 2.0 is fine. But we’re going to do some more algorithmic work. We’re going to build in some software changes so that we can make some more estimates.” Then we start talking about these big companies like Polar, or Catapult, which does all the inertial measurement units for team sports. They can measure hundreds of variables. They will tell you, they know, they estimate hundreds and they measure a few. They become the world champions of estimates, because they don’t want to iterate hardware; that’s too expensive. They want to iterate software, because that’s relatively cheap to do. And your marketing whizzes can fool the public into believing that they’re buying a product that has a lot of new features and gives them more information, when actually it’s fuzzier than ever. So this is just the nature of the beast that consumers need to be aware of: how the process works in the technology business.


Marco Altini

Yes, right. I think we see it also in the wearables that we use these days, right? The technological innovations stopped almost 10 years ago, right, once they introduced PPG. That was the huge change and improvement from the previous devices with just accelerometers, right? So before you had the Fitbit, and now we have PPG, so we have HRV, heart rate, and everything. But then the hardware has been exactly the same for a long time now. There is no innovation from a hardware point of view, actually sensing something differently. It is just all software, with the process that we discussed.

Dr. Stephen Seiler  1:18:09

An example of that is, if the heart rate company, if Polar wanted to, they could build a belt that had a stretch transducer in it that could measure at least breathing frequency quite well, quite accurately. And it would be based on first principles. It would actually be measuring thoracic excursion, you know, which is the gold standard way in the field to capture breathing frequency. But they don’t want to do that, because that would imply a big technological shift, new hardware, and going back to a chest strap. They don’t want to do that; they want to put everything into one device. And so instead they say, “Well, we’ll measure it through heart rate variability.” And that’s not going to work, because heart rate variability just keeps going down, down, down as intensity increases and breathing frequency increases. So you can’t capture that accurately. Maybe at rest you can be reasonably accurate. But at 90% heart rate, going up a mountain, with your breathing frequency going up to 75, you’re not able to capture that. So it’s a dead end. I would argue it’s a dead end. It can’t work, at least across the whole spectrum. But it’s cheap. It’s relatively cheap to create an algorithm and give people a number. But it’s wrong. You can’t trust it. And it’s unfortunate.

Trevor Connor  1:19:22

I’ve got one final question here. And this might be a bit of a challenge, because we’ve been sitting here, we beat up on metrics a bit, and we’re really beating up on estimates here. So I just want to flip this around and ask the question: where can these estimates be valuable? Or can you think of particular estimates where you go, even though that’s a bunch of calculations, I find that useful?

Dr. Stephen Seiler  1:19:45

Well, I’m gonna go to fundamentals. What’s going on up in the brain is important. It’s useful. It’s interesting. The fundamental is just, hey, how are you feeling? Right? That’s our good basic communication. “I feel tired.” So “tired” is a construct, right? Even here we’re having to estimate, right? Our brains are estimating some gestalt from lots of inputs, lots of different neurons that are firing less or more, and the net result is my daughter says, “I feel tired,” right? But that only exists up in her brain. Well, it’s an estimate, and I find it very important. So I do think that, you know, perceived exertion, for example, RPE, that’s an estimate. The brain is trying to capture very complex information; it’s a construct up in the brain, even. And now we’re trying to metricize it, you know, turn it into a number on the Borg scale. If you really think about it, can it truly be possible, is it conceivable, that sitting on a sofa with an RPE of six, watching your favorite silly show, versus coming to the end of Paris-Roubaix, three abreast with two of the other greatest cyclists in the world, who’s going to get their wheel ahead, driving heart rates at max, that that’s only 3.33 times higher perception of exertion than sitting on the sofa? Because that’s what the scale says. It goes from six to 20. You’re with me? It’s inconceivable. It can’t be true. And yet the scale has existed for decades, and it has some utility. But probably the scale should go from zero to 500 if it was going to truly capture the difference in brain activity and the perceptual lightning going on up in my head in that moment, trying to get across the finish line first, versus sitting doing absolutely nothing. But those are estimates, and we’re using them. But I can’t even conceive that they’re truly representative of brain activity differences, if that makes sense. So I really wiped out the Borg scale there. It’s just always captured my fascination. It’s fascinating to think about. Yeah, for sure.

Trevor Connor  1:22:24

Burn winter, there is cold. But again, back to conditioning and looking to rev up your training. If you haven’t already, now is a great time of year to reflect on the past season, specifically when it comes to data and recovery, two very important metrics in endurance sports. Visit Fast Talk Labs and take a look at our pathways on recovery and data analysis. These two in-depth guides can help you get the most from your offseason. See more at Fast Talk Labs.

According to coach Taylor Warren, while even perception has issues, it can still be a valuable coaching metric. Let’s hear what he has to say.

Taylor Warren  1:23:05

About these measurements, right? It’s like, yeah, power is a calculation. And then you have all these internal measurements: heart rate, HRV, core body temperature. These are all internal objective measurements. The measurement that I’ve found, and this is really old school, but I think it’s one of the most impactful measurements, is the subjective measurement of RPE, or just rate of perceived exertion. And it’s really just learning how an effort feels: learning how an effort feels when you’re fatigued, learning how an effort feels when you’re fresh. And using that internal subjective measurement to guide your training in a purposeful way, I think, is very, very valuable. And a big part of the training process is learning your body. It’s understanding the relationship between workload and fatigue. And if you can master this ratio, I think that goes a really long way in how you’re planning your training day to day and how you’re planning training blocks.


Marco Altini

I think as long as we understand that these are approximations, in some cases we can use them, as long as they check some of the boxes we talked about, right? For example, being able to measure or estimate reliably, and how things change over time, at least within individuals, so that there is some level of reliability. And again, if we understand we are approximating something, and we are not taking it as the reference on which to base everything else that we do, or to interpret it, then in some cases there can probably be a use.

Dr. Stephen Seiler  1:24:35

In some of these measures, like readiness to train and Profile of Mood States and things like that, they use what’s called a Likert scale. You know, they use a scale that goes from one to seven, where one is “completely disagree,” and in the middle is both yes and no, and then seven is “completely agree, 100% agree.” So again, you’re taking some kind of very fuzzy idea of “how much do I agree with this?” and you’re metricizing it; you turn it into a number. And what the research shows is that the assumption of linearity, meaning that the difference between a one and a two is the same as the difference between a two and a three, and a three and a four, that the interval is equivalent across the scale, doesn’t hold. The reality is that the human brain usually doesn’t actually treat it like that. The extremes are bigger; they’re less likely to be used. You know, if you go all the way down to a one, that feels like, wow, he really disagrees, or really agrees. So we tend to use the middle of the scale a bit, and the variation is smaller. And even across cultures, the Italians may interpret a scale differently than the Scandinavians, just as a cultural bias. Or even within Norway, I would say the Norwegians up in the north are more willing to use the extremes. They cuss more in their language, you know. If they don’t feel good, they’re gonna tell you. Whereas where I come from, in the Bible Belt of Norway, they’re going to stick to the middle of the scale. Well, this translates and transfers into how they use these various measures related to training and so forth. It’s difficult to get them to say, “I’m really tired,” or “I really feel great.” They always say, “I feel fine.” Well, then your measure’s not very sensitive, right? Because you need people to say, when I feel really great, I want to use the sixes and sevens; when I feel really sucky, I want to use the ones and twos. I don’t want to always just give the coach a three, four, or five. That’s not going to be very useful. So that’s why POMS and some of these are challenging. And particularly, they’re challenging when you start comparing across people or across cultures or across teams, and so forth. So don’t be stoic. Just to add to the complexity here, we haven’t really talked about it, but, as you know, a core property of these variables should be that they’re sensitive to change. If we want to use a variable as an effective monitoring tool, it has to be sensitive: if my state of readiness is changing, if my physiology is changing, that’s captured in a robust way by that variable. That’s what we’re looking for in our metrics: that they’re valid, they’re reliable, but they’re also sensitive to real changes.

Trevor Connor  1:27:27

Marco, final thoughts?

Marco Altini

I would say, stick to measurements. And if you really want to go into estimates, try to look at estimates that are consistent across devices. That can help you understand which ones we are actually able to estimate with reasonable accuracy and reliability, as opposed to the ones that we might not be able to yet.

Trevor Connor  1:27:50

Well, I hate to say it, guys, we’ve been going a while now. So I think we need to wrap this up. Marco, you’re new to the show. We always finish with what we call our one minutes, where we give everybody on the show one minute to summarize what they think is the most important lesson to learn from the entire episode. So Marco, we’ll give you a little time to think about it. Dr. Seiler, why don’t you go first?

Dr. Stephen Seiler  1:28:15

Okay. Think of it as a heads-up display. You want to have the minimum number of metrics or measurements that give you the maximum amount of information. So be careful what you trust, and keep it simple as much as possible. So, you know, yes, measure. But if you’re measuring 12 different things every day, probably that’s not helping. Think like the fighter pilot: they’ve got to keep their eyes up, looking forward at what matters. They need a few numbers on their display, but there can’t be too many. So pick your metrics carefully.

Trevor Connor  1:28:49

Marco, do you have your thoughts?

Marco Altini

Measuring can be useful. It’s not necessarily for everyone, right? We might have people that have a healthy relationship with the measures they are looking at, and people that don’t. We might end up focusing too much on what we are able to measure, sometimes losing focus on the actual outcomes that matter, which could be health or performance related, simply because they might not be as easy to quantify. So those are things I think we need to think about, apart from everything that we just mentioned related to which metrics are derived in which ways, and what that means in terms of trusting them more or less.

Trevor Connor  1:29:35

Good answer. So I guess I’ll wrap this up here. Mine is going to take a little more than a minute Oh,

Dr. Stephen Seiler  1:29:40

that you’re cheating. I’m cheating a little hosted cheat, you’re cheating. Remember, I

Trevor Connor  1:29:45

only have something that measures five minutes and on Sunday measure was one everybody can. So mine is: I think the most valuable lesson I ever learned about measurements and calculations and estimates was in a class that I took on body composition. So for anybody listening who doesn’t know what that means, it’s basically just measuring body fat percentage. And we spent this entire course learning all the different methods of measuring body composition. Our professor had spent most of his career studying this, and I couldn’t resist; at the end of the class, I asked him, “So which is most accurate?” And he goes, “The eye test.” Being an idiot, I’m like, “What do you mean by eye test?” And he goes, “You look at them.” And we continued this discussion, and, you know, he brought up this really good point that no matter what method, hydrostatic weighing, bioelectric impedance, calipers, all these different methods, ultimately you’re using calculations. And this gets a little bit gross, but how did they come up with those calculations? They took corpses, they did these measurements on them, and then they literally cut up the corpses and weighed the fat tissue, the muscle tissue, the bone tissue to figure it out. So if you are a similar body type to those corpses, you’re gonna get a somewhat accurate measurement. If you’re not, it’s actually not going to be that great for you. So it’s still an estimate. Now, does that mean we throw all these out? No, I still have one of those scales that does bioelectric impedance, and it gives me some useful information. But I know that it’s not perfect. It’s not accurate. What I learned from this class is, it’s the eye test. And the equivalent to me in endurance sports is, it’s feel, it’s RPE. I think all these metrics are great. But at the end of the day, you’ve got to trust feel. It is the best. I

Dr. Stephen Seiler  1:31:42

don’t know what it is. Maybe they don’t trust their own eyes, or they’re looking for a confirmation. They’re looking for something that tells them something different than the reality they see or they feel. And sometimes, like you say, Trevor, just how you feel and what you see, those are still pretty useful measures.

Trevor Connor  1:31:57

Maybe you’ve just hit on the most valuable aspect of metrics: they are a reality check.

Marco Altini

Yeah, I think they lead to awareness in many situations, right? So the ultimate goal is that we rely on feel and perception and all of that more as athletes, but the data can help in that process. I think for some people, then, it goes sideways: you think about the metric and you use the metric and you completely ignore how you actually feel, and obviously in that case that’s not the proper way of using the metric. But in many cases, I think it helps us to pause for a second and self-assess how we feel, and over time become maybe better at using perceived effort.

Dr. Stephen Seiler  1:32:38

Recalibrate our own feeling or perception sometimes, you know, and that can be useful.

Trevor Connor  1:32:43

Well, guys, I hate to say we need to call it there. But that was a great episode. It was a lot of fun talking.

Dr. Stephen Seiler  1:32:48

Thanks, Marco, for being part of this. And Trevor, thanks for pulling this off.

Trevor Connor  1:32:53

My pleasure. Thank you, everyone. That was another episode of Fast Talk. The thoughts and opinions expressed on Fast Talk are those of the individual. Subscribe to Fast Talk wherever you prefer to find your favorite podcasts. Be sure to leave us a rating or review. As always, we love your feedback. Tweet us at @fasttalklabs. Join the conversation at Or learn from our experts at fasttalk for Dr. Stephen Seiler, Dr. Marco Altini, Jack Burke, Brady Homer, Dr. Jeff Sankoff, and Taylor Warren. I’m Trevor Connor. Thanks for listening!