Can hot baths boost VO₂max?
Learn to train smart, run fast, and be strong with this endurance performance nerd alert from Thomas Solomon, PhD.
Long-term passive heat acclimation enhances maximal oxygen consumption via haematological and cardiac adaptation in endurance runners
Jenkins et al. (2025) J Physiol
What type of study is this?
◦ This study is a nonrandomised controlled trial with crossover (all subjects completed all interventions, control and treatment, usually with a wash-out period in between).
What was the authors’ hypothesis or research question?
◦ The authors hypothesised that 5 weeks of passive hot water immersion would increase haemoglobin mass (the total amount of the oxygen-carrying protein in red blood cells), blood volume (which influences how much blood can return to the heart), and left-ventricular volume (how much blood the heart can fill with) and that, together, these changes would improve V̇O2max (the maximal rate of oxygen consumption your body can achieve during exercise; it is a measure of cardiorespiratory fitness and indicates the size of your engine, i.e., your maximal aerobic power, which contributes to endurance performance).
What did the authors do to test the hypothesis or answer the research question?
◦ The study enrolled 10 endurance-trained runners, 9 male and 1 female, aged about 25 years. Each runner completed two 5-week training blocks in a counterbalanced crossover: hot water immersion + training and a usual-training control. Hot water immersion involved 5 sessions per week, 45 minutes per session, starting near 40 degrees Celsius and progressing toward roughly 41 to 42 degrees (about 104 to 108 degrees Fahrenheit) based on comfort; most sessions happened right after training.
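To put that protocol into numbers, here is a minimal Python sketch of the arithmetic. It only uses the figures quoted above (5 sessions per week, 45 minutes per session, 5 weeks, 40 to 42 degrees Celsius). The constant and function names are my own, and the totals assume every planned session was completed, which this summary does not confirm.

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


# Protocol figures as quoted in the summary (names are illustrative).
SESSIONS_PER_WEEK = 5
MINUTES_PER_SESSION = 45
WEEKS = 5

# Assumes full adherence to the planned schedule.
weekly_minutes = SESSIONS_PER_WEEK * MINUTES_PER_SESSION
total_sessions = SESSIONS_PER_WEEK * WEEKS
total_hours = weekly_minutes * WEEKS / 60

for temp_c in (40, 41, 42):
    print(f"{temp_c} C = {celsius_to_fahrenheit(temp_c):.1f} F")
print(f"{weekly_minutes} min per week, {total_sessions} sessions, "
      f"{total_hours:.2f} h in total")
```

That works out to 225 minutes of bath time per week, which is a real time commitment on top of training.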
What did the authors find?
◦ Hot water immersion increased haemoglobin mass versus control. Plasma volume rose after 1 week, then drifted back toward baseline, while red blood cell volume followed later and peaked near week 5. Left-ventricular end-diastolic volume (how much blood is in the ventricle just before it contracts; measured by echocardiography) increased. These changes lined up with a 2.7 millilitres per kilogram per minute rise in V̇O2max and a 0.8 kilometres per hour faster treadmill speed at V̇O2max.
◦ Hot water immersion also caused thermoregulatory acclimation across the 5 weeks: By the end of the study, during immersion, there were smaller rises in core temperature and a higher sweat rate with similar thermal comfort scores. Training load and perceived well-being were similar between immersion and control blocks, supporting a true effect of hot water immersion rather than a training volume artifact.
◦ The authors concluded that 5 weeks of passive hot water immersion can meaningfully raise V̇O2max in trained runners through coordinated haematological and cardiac adaptations while preserving normal training quality.
What were the strengths?
◦ The crossover design with a time-matched control kept training load and well-being closely matched between blocks, which helps isolate the effect of regular heat exposure. The physiological measurements were strong and objective: duplicate carbon-monoxide rebreathing for haemoglobin mass and blood volumes, and modern 4D echocardiography for cardiac measures, followed by lab-based gas-exchange performance testing.
What were the limitations?
◦ The sample was very small at 10 people, which lowers statistical power (the probability that a statistical test correctly detects a real effect if there is one; it depends on sample size, effect size, significance level, and variability in the data, with a common target of at least 80%) and raises the risk of false negative findings (i.e., missing real effects). The study did not randomise the order in which participants entered the heat and control blocks (randomising the order makes order effects such as learning, fatigue, or simple time passing less likely to skew the results), there was no protocol pre-registration (depositing a detailed study plan in an open-access repository before collecting data, which makes it harder to change outcomes after seeing the data or to report only the analyses that work), and no sample size calculation was performed. Generalisability (how far the findings can be confidently stretched beyond the specific people, place, and conditions tested) is also limited to well-trained runners used to lab testing.
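To show why 10 participants is thin, here is a rough Python sketch of how power depends on sample size in a paired (crossover) comparison. It is my own illustration, not an analysis from the paper: the function name is made up, the effect sizes are hypothetical, and it uses a normal approximation to the paired t-test, which is slightly optimistic at small sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def approx_paired_power(effect_size_dz: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for a paired comparison.

    effect_size_dz is the mean within-person difference divided by the
    standard deviation of those differences. A normal approximation is
    used in place of the t distribution, so small-n values run a little high.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size_dz * sqrt(n)
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# HYPOTHETICAL effect sizes; none of these values come from the paper.
for dz in (0.5, 0.8, 1.2):
    for n in (10, 30):
        print(f"dz={dz}, n={n}: power is roughly {approx_paired_power(dz, n):.2f}")
```

The pattern is what matters here, not the exact values: with the same effect size, power climbs as the sample grows, and a moderate effect is easy to miss with only 10 runners.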
How was the study funded, and are there any conflicts of interest that may influence the findings?
◦ No funding was received for this work, and the authors declared no conflicts of interest.
How can you apply these findings to your training or coaching practice?
◦ For coaches and self-coached runners, this is practical. You can train as normal and, by adding regular hot baths at roughly 40 to 42 degrees Celsius (about 104 to 108 degrees Fahrenheit) for 45 minutes, 5 times per week, potentially increase V̇O2max and running speed at V̇O2max.
◦ One last thought: Is the boost big enough to matter in races outside the lab, on a windy day, after a rough week, with travel in the mix? Maybe; the haemoglobin and volume changes look real, but life is messy. Worth a try, but the evidence is not yet strong enough to hang your trail shoes on. Meanwhile, I’m looking forward to drinking an ice cold beer in a hot bath.
What is my Rating of Perceived scientific Enjoyment (RPsE)?
6 out of 10 → I experienced only moderate scientific enjoyment because, although the measurements, reporting, and crossover control were solid, the design was not randomised, the sample was small, and there was no protocol pre-registration or power calculation (a way to work out how many participants a study needs to reliably detect a real effect if it exists, so the study is neither too small to be useful nor so big that it wastes time and money).
Important: Don’t make any major changes to your daily habits based on the findings of one study, especially if the study is small (e.g., fewer than 30 participants in a randomised controlled trial or fewer than 5 studies in a meta-analysis) or poor quality (e.g., high risk of bias or low quality of evidence). Risk of bias is the potential for systematic errors in the included studies, which can lead to misleading or invalid results. Low quality of evidence means that studies in the field have several limitations, so there is more doubt about the overall effect of an intervention and future studies could easily change the conclusions. What do other trials in this field show? Do they confirm the findings of this study or have mixed outcomes? Is there a high-quality systematic review and meta-analysis evaluating the entirety of the evidence in this field? If so, what does the analysis show? What is the risk of bias or certainty of evidence (how confident we are that the results reflect the true effect, based on study design, risk of bias, consistency, directness, and precision) across the included studies? I’ve actually published a meta-analysis on this topic (see Solomon & Laye 2025) and written a deep-dive article; check it out at veohtu.com/heatacclimation.
Access to education is a right, not a privilege
Equality in education, health, and sustainability matters deeply to me. I was fortunate to be born into a social welfare system where higher education was free. Sadly, that's no longer true. That's why I created Veohtu: to make high-quality exercise science and sports nutrition education freely available to folks from all walks of life. All the content is free, and always will be.