Short Swings for Small Players?

Created: Apr 22, 2025 01:12 PM
Tags: Baseball, MLB, Baseball Savant

Do Smaller Players Need Shorter Swings? A Data-Driven Look into Batting Performance

 

1. Introduction

This project investigates a common coaching philosophy in Korean baseball that encourages smaller-bodied players to adopt short, compact swings. This belief is rooted in two related assumptions: (1) shorter swings allow for faster bat speed, and (2) faster swings improve contact rate and offensive performance. While these assumptions may seem intuitively plausible, they are rarely grounded in empirical evidence.
This project does not aim to test whether shorter swings lead to faster bat speed. Instead, it focuses on a more fundamental and coachable question: Does swing length contribute to hitting performance, and does its effect vary by player physique? In doing so, the study directly examines whether prescribing short swings to smaller players is scientifically justified, or whether such coaching ignores individual variation in what makes a swing effective.

2. Data

To address this, I compiled a cross-sectional dataset from MLB’s Statcast and Baseball Reference, consisting of 129 qualified hitters from the 2024 season. Each observation represents one player, and the dataset includes 13 variables across the following categories:
  • Outcome variables (1)
    • wRC+: Weighted Runs Created Plus, a park-adjusted measure of overall offensive productivity.
  • Key predictor (1)
    • avg_swing_length: The average distance, in feet, that the bat head travels during a swing, from the start of bat tracking to the point of contact, as measured by Statcast bat tracking data.
  • Physique variables (2)
    • ‘Height_cm’ and ‘Weight_kg’, representing a player’s physical frame.
  • Interaction terms
    • ‘avg_swing_length x Height_cm’ and ‘avg_swing_length x Weight_kg’, which test whether swing effectiveness varies with player body type.
  • Controlled variables (5)
    • PA: Plate appearances, capturing sample stability for each player.
    • Position_Simplified: Categorical variable that classifies each player’s defensive position into one of four broader groups to reduce noise and improve model stability in regression analysis.
      • OF (Outfielders): Includes LF, CF, RF, and OF
      • IF (Infielders): Includes 2B, SS, 3B, and IF
      • 1B / DH: Includes 1B, DH, and TWP (two-way players)
      • C (Catchers): Includes only C.
    • Bats: Batting handedness (Left, Right, Switch).
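    • fast_swing_rate: Percentage of swings with a bat speed of 75 mph or more (Statcast's fast-swing rate), used here as a measure of high-intent swings.
    • squared_up_swing: Percentage of swings that are squared up, meaning the batted ball reaches at least 80% of the exit velocity possible given the bat speed and pitch speed; used here as a measure of contact quality.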
  • Identifier variables & Contextual variables (4) - Name, player_id, year, Team
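The position grouping described above can be sketched as a simple lookup. This is a minimal illustration, not the code used to build the dataset: the names `POSITION_GROUPS` and `simplify_position` are hypothetical, and the raw position codes are assumed to match the labels listed in the bullets.

```python
# Hypothetical mapping from raw position codes to the four simplified groups.
POSITION_GROUPS = {
    "LF": "OF", "CF": "OF", "RF": "OF", "OF": "OF",
    "2B": "IF", "SS": "IF", "3B": "IF", "IF": "IF",
    "1B": "1B/DH", "DH": "1B/DH", "TWP": "1B/DH",
    "C": "C",
}


def simplify_position(raw_position: str) -> str:
    """Return the simplified group for a raw position code.

    Unknown codes raise ValueError so that unexpected labels are surfaced
    instead of being silently assigned to a group.
    """
    code = raw_position.strip().upper()
    if code not in POSITION_GROUPS:
        raise ValueError(f"Unmapped position code: {raw_position!r}")
    return POSITION_GROUPS[code]


print(simplify_position("SS"))   # infielder group
print(simplify_position("TWP"))  # two-way players are grouped with 1B/DH
```

Raising on unmapped codes is a design choice for this sketch: with only four groups, a mislabeled position would otherwise distort the smallest group without any warning.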
 
The central objective of this project is to determine whether swing strategy should be prescribed uniformly based on body size or tailored according to how swing mechanics interact with individual physique. By combining biomechanical indicators with performance outcomes, this analysis seeks to contribute toward a more individualized and evidence-based approach to hitting instruction.

Descriptive Statistics

| Variable | Mean | SD | Min | Q1 | Median | Q3 | Max | Range |
|---|---|---|---|---|---|---|---|---|
| wRC+ | 114.163 | 23.174 | 69.000 | 100.000 | 111.000 | 121.000 | 218.000 | 149.000 |
| avg_swing_length | 7.335 | 0.407 | 6.000 | 7.100 | 7.400 | 7.600 | 8.200 | 2.200 |
| fast_swing_rate | 25.923 | 19.225 | 0.300 | 11.300 | 21.500 | 36.900 | 78.000 | 77.700 |
| squared_up_swing | 25.950 | 4.171 | 17.700 | 22.900 | 25.400 | 27.800 | 43.900 | 26.200 |
| Height_cm | 185.678 | 5.897 | 167.600 | 182.900 | 185.400 | 190.500 | 200.700 | 33.100 |
| Weight_kg | 93.649 | 9.942 | 65.800 | 86.200 | 93.400 | 99.800 | 127.900 | 62.100 |
| PA | 611.403 | 60.109 | 507.000 | 561.000 | 619.000 | 654.000 | 735.000 | 228.000 |

The dataset includes 129 MLB hitters from the 2024 season. Several variables exhibit notable variation that helps contextualize the analysis. First, the primary independent variable - average swing length - ranges from 6.0 to 8.2, with a mean of 7.33 and a standard deviation of 0.41. This indicates that while most hitters fall within a relatively tight range, there is enough variation to examine potential performance differences tied to swing mechanics.
The outcome variable, ‘wRC+’, has a wide spread, from 69 to 218, with a mean of 114.2. Because the first quartile sits at the league average (100), most hitters in this sample of qualified players are at or above league average, while a few are either elite or well below average.
The ‘squared_up_swing’ variable, which serves as a measure of contact quality, ranges from 17.7% to 43.9%, with a mean of about 26%. This range is central to the research question, as it enables the evaluation of whether shorter swings lead to cleaner contact. The distribution of physique variables is also informative: player height ranges from 168 cm to 201 cm (mean = 186 cm), and weight from 65.8 kg to 128 kg (mean = 93.6 kg), providing sufficient variation to assess interaction effects with swing mechanics.
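The summary columns reported for each variable (mean, SD, quartiles, range) can be computed with a short helper. The sketch below is an assumption about the procedure rather than the original code: it uses the sample standard deviation and the "inclusive" quartile method, and the example values are made up, not taken from the dataset.

```python
import statistics


def describe(values):
    """Return the summary statistics used in the descriptive table."""
    data = sorted(values)
    # "inclusive" interpolates between observed values and treats the
    # minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample SD (n - 1 denominator)
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
    }


# Hypothetical swing lengths for five players (illustration only).
example = describe([6.0, 7.1, 7.4, 7.6, 8.2])
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Different software packages use different quartile conventions, so small differences in Q1 and Q3 are possible if another method was used for the reported table.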
 
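[Position_Simplified & Bats Distribution]

| Position | Count | Percent |
|---|---|---|
| IF | 48 | 37.21% |
| OF | 46 | 35.66% |
| 1B / DH | 26 | 20.16% |
| C | 9 | 6.98% |

| Bats | Count | Percent |
|---|---|---|
| R | 71 | 55.0% |
| L | 44 | 34.1% |
| S | 14 | 10.9% |
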
The simplified position variable balances the need to account for role-specific offensive expectations with the practical demands of statistical modeling. By collapsing 10+ original categories into four primary groups, the dataset retains interpretability while improving the stability of the regression estimates. As shown in the table, each group contains a reasonable share of the total sample, although catchers form the smallest group (9 players). The ‘Bats’ variable shows that right-handed batters make up the majority (55%), while left-handed hitters and switch-hitters comprise approximately 34% and 11% of the sample, respectively.
 

3. Visualization

Figure 1
Figure 1 illustrates the relationship between average swing length and squared-up swing percentage, which serves as a proxy for contact quality. The scatterplot includes two fitted regression lines: a simple model (blue) that includes only swing length as a predictor, and a controlled model (red dashed) that adjusts for height and weight.
Both models indicate a strong negative relationship between swing length and squared-up swing rate. Specifically, longer swings are associated with a significant decrease in squared-up contact, suggesting that compact swings may facilitate better bat control. Notably, the similarity between the two regression lines implies that this effect is not simply a function of player physique, but rather reflects a general mechanical disadvantage of longer swing paths.
This figure supports one key component of the traditional coaching philosophy: shorter swings may indeed result in better contact quality. However, this analysis alone does not tell us whether short swings are more effective for all types of players. To investigate this further, the next figures explore how swing length varies by player physique, and how it interacts with overall offensive performance.
 
Figure 2
Figure 3
Figures 2 and 3 investigate the relationship between player physique, specifically height and weight, and swing length. The two scatterplots show that both height and weight are positively associated with swing length, indicating that taller and heavier players tend to have longer swing paths.
This suggests that swing mechanics may not only be a matter of coaching or strategy but also of physical constraints and body mechanics. These findings reinforce the importance of controlling for physique in later analyses. Without accounting for height and weight, one might incorrectly attribute performance differences to swing length alone, rather than to underlying body size.
 
Figure 4
Figure 4 illustrates the distribution of wRC+ across four simplified position groups: 1B/DH, C, IF and OF. The chart reveals notable differences in offensive performance by defensive role.
The 1B/DH group shows the highest median wRC+, indicating that players in these roles, typically power hitters, generate more offensive value. Catchers (C) display the most compact distribution, which may reflect consistent performance across players, although the group contains only nine hitters. Infielders (IF) and outfielders (OF) exhibit broader wRC+ ranges, suggesting greater variability in offensive output. The OF group in particular contains several high outliers, pointing to the presence of elite hitters among outfielders.
These differences highlight the importance of controlling for position when modeling offensive performance. Structural role expectations shape offensive opportunities, and failure to account for these differences could obscure the true effect of swing mechanics.
Figure 5
Figure 6
Figure 5 shows the relationship between average swing length and wRC+, segmented by player height groups. Interestingly, all three groups - short, medium, and tall - exhibit positive slopes in their regression lines. This suggests that, contrary to traditional coaching beliefs, longer swings may be associated with higher offensive productivity across all body types.
Rather than penalizing taller players for longer swings, the data indicates that extended swing paths may in fact contribute to greater run creation, as captured by wRC+. These findings challenge the assumption that compact swings are always preferable, and instead invite a reevaluation of swing strategies in the context of offensive output.
Figure 6 explores how the relationship between swing length and wRC+ varies by player weight group. While the medium and heavy groups both show a positive relationship - indicating that longer swings are associated with higher offensive production - the light group exhibits a negative slope. This suggests that longer swings may in fact hinder offensive output for lighter players.
These contrasting patterns reinforce the idea that swing strategy should not be universally applied. For heavier and presumably stronger players, longer swing paths may help generate more power and run creation. In contrast, shorter swings may be more effective for lighter players who lack the physical leverage to control longer movements.
In sum, this figure provides empirical support for tailoring swing mechanics to individual physique - challenging the assumption that “shorter is always better” and emphasizing the potential for more personalized hitting strategies.
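The group-wise comparison in Figures 5 and 6 amounts to fitting one regression line per body-size group and comparing the slopes. The sketch below shows one way to do that. How the groups were actually formed is not stated, so equal-count terciles are an assumption here, the function names are hypothetical, and the nine data points are invented purely to show the mechanics.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slopes_by_tercile(size, swing_length, wrc_plus,
                      labels=("light", "medium", "heavy")):
    """Split players into three equal-count groups by body size and
    fit a separate swing length -> wRC+ line within each group."""
    order = sorted(range(len(size)), key=lambda i: size[i])
    n = len(order)
    cuts = [0, n // 3, 2 * n // 3, n]
    slopes = {}
    for label, start, stop in zip(labels, cuts[:-1], cuts[1:]):
        idx = order[start:stop]
        slopes[label] = ols_slope([swing_length[i] for i in idx],
                                  [wrc_plus[i] for i in idx])
    return slopes


# Invented example data (not from the dataset).
weight_kg = [70, 75, 80, 90, 92, 95, 105, 110, 120]
swing = [6.8, 7.0, 7.2, 7.1, 7.3, 7.5, 7.4, 7.6, 7.8]
wrc = [110, 105, 100, 100, 108, 116, 110, 122, 134]

for group, slope in slopes_by_tercile(weight_kg, swing, wrc).items():
    print(f"{group}: {slope:+.1f} wRC+ per foot of swing length")
```

A sign flip between groups in a plot like this is only suggestive. Each group holds roughly a third of the sample, so the within-group slopes are estimated with much less precision than a pooled slope.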

4. Statistical Inference

To examine the relationship between swing mechanics and offensive productivity, this section presents a series of regression analyses using wRC+ as the dependent variable. The goal is to evaluate whether average swing length has a measurable impact on wRC+, and whether this relationship is moderated by player physique. Three models were estimated: a simple linear regression, a multiple regression with control variables, and an interaction model that adds a swing length × height term to the controlled model.

• Model 1: Simple Linear Regression

This model assesses the basic relationship between average swing length and offensive performance. It serves as a baseline specification to determine whether longer swings are generally associated with higher wRC+ values, without adjusting for any confounding variables.

• Model 2: Multiple Regression with Controls

This model incorporates key control variables, including player physique (height and weight), swing quality (fast swing rate and squared-up swing rate), handedness (Bats), and a simplified defensive position category. It aims to isolate the independent contribution of swing length to wRC+ by holding constant other known influences on offensive performance.

• Model 3: Interaction Model

This model tests whether the impact of swing length on offensive productivity varies by player height by introducing an interaction term between swing length and height. At the same time, it controls for other potential confounders, including weight, swing quality, handedness, and defensive position. This allows us to isolate the moderating role of height in the swing-performance relationship.
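Written out, the three specifications described above take roughly the following form. The notation is introduced here for illustration and is not the author's: L is avg_swing_length, H is Height_cm, W is Weight_kg, F is fast_swing_rate, S is squared_up_swing, and B and P stand for dummy-coded handedness and position, assuming one reference category is dropped from each.

```latex
\begin{aligned}
\text{Model 1:}\quad \mathrm{wRC}^{+}_i &= \beta_0 + \beta_L L_i + \varepsilon_i \\
\text{Model 2:}\quad \mathrm{wRC}^{+}_i &= \beta_0 + \beta_L L_i + \beta_H H_i + \beta_W W_i + \beta_F F_i + \beta_S S_i
  + \boldsymbol{\gamma}^{\top}\mathbf{B}_i + \boldsymbol{\delta}^{\top}\mathbf{P}_i + \varepsilon_i \\
\text{Model 3:}\quad \mathrm{wRC}^{+}_i &= \beta_0 + \beta_L L_i + \beta_H H_i + \beta_W W_i + \beta_F F_i + \beta_S S_i
  + \boldsymbol{\gamma}^{\top}\mathbf{B}_i + \boldsymbol{\delta}^{\top}\mathbf{P}_i + \beta_{LH}\,L_i H_i + \varepsilon_i
\end{aligned}
```

Under this coding, B contributes two dummy variables and P contributes three, so Model 2 has 10 slope parameters and Model 3 has 11. That count agrees with the degrees of freedom of the F-statistics reported for these models, F(10,118) and F(11,117).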
Results of the Regression Models

| Model | Key Variable(s) | Adj. R-squared | F-statistic (p-value) | Significant Predictors |
|---|---|---|---|---|
| Model 1 (Simple) | avg_swing_length | 0.008 | F(1,127) = 2.05 (p=0.155) | None |
| Model 2 (Control) | avg_swing_length, Height, Weight, others | 0.372 | F(10,118) = 8.58 (p<0.001) | fast_swing_rate, squared_up_swing |
| Model 3 (Interaction) | avg_swing_length * Height | 0.367 | F(11,117) = 7.73 (p<0.001) | fast_swing_rate, squared_up_swing |

Three regression models were estimated to assess the impact of swing mechanics and physique on offensive performance (wRC+). Model 1, a simple regression using only swing length as a predictor, showed a positive but statistically insignificant effect (β = 7.17, p = 0.155), with low explanatory power (Adjusted R² = 0.008). Model 2 incorporated control variables including height, weight, swing quality metrics, handedness, and position. While swing length remained insignificant, fast swing rate (p < 0.001) and squared-up swing (p < 0.001) emerged as strong predictors. Model 3 introduced an interaction term between swing length and height, which also failed to reach significance (p = 0.968), and the model’s overall fit was comparable to Model 2 (Adjusted R² ≈ 0.37).
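As a consistency check, the adjusted R² of an OLS model with an intercept can be recovered from its overall F-statistic and degrees of freedom, using R² = kF / (kF + df_resid). The sketch below applies this standard identity to the reported figures; the function names are hypothetical, and small discrepancies are expected because the F-statistics are rounded to two decimals.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from the overall F-statistic of an OLS fit with intercept."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2, df_model, df_resid):
    """Adjusted R^2, using n - 1 = df_model + df_resid."""
    n_minus_1 = df_model + df_resid
    return 1 - (1 - r2) * n_minus_1 / df_resid


# (F, model df, residual df) as reported for the three models.
reported = {
    "Model 1": (2.05, 1, 127),
    "Model 2": (8.58, 10, 118),
    "Model 3": (7.73, 11, 117),
}

for name, (f_stat, k, df_resid) in reported.items():
    r2 = r_squared_from_f(f_stat, k, df_resid)
    adj = adjusted_r_squared(r2, k, df_resid)
    print(f"{name}: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

The recovered adjusted R² values land within about 0.001 of the reported 0.008, 0.372, and 0.367, which suggests the table is internally consistent. The check also makes the penalty visible: Model 3 adds one parameter with essentially no gain in R², so its adjusted R² falls slightly below that of Model 2.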
The results showed that none of the coefficients of interest (swing length, height, and their interaction) were statistically significant. Upon further examination, multicollinearity was suspected to be interfering with coefficient stability.
To confirm this, Variance Inflation Factors (VIF) were calculated. The VIF exceeded 900 for swing length, 300 for height, and 1,400 for the interaction term, indicating severe multicollinearity. VIF values this high mean that each of these predictors is almost entirely explained by the others, which inflates standard errors and makes the estimates unstable.
Removing the interaction term or dropping one of the interacting variables would undermine the core research question. Therefore, to retain the theoretical structure of the model while addressing multicollinearity, swing length and height were mean-centered prior to re-estimating the model. This approach preserves the interaction effect while reducing correlation among predictors.
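Why centering helps can be seen on simulated data. The sketch below is an illustration under stated assumptions, not a re-analysis: it generates 129 synthetic players whose height and swing length loosely follow the means and SDs in the descriptive statistics, then compares how strongly swing length correlates with the product term before and after mean-centering. All names and the simulated relationship are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Synthetic data only: simulated heights (cm) and swing lengths (feet).
rng = random.Random(0)
height = [rng.gauss(185.7, 5.9) for _ in range(129)]
swing = [7.34 + 0.02 * (h - 185.7) + rng.gauss(0, 0.39) for h in height]

# Product term built from raw variables versus centered variables.
raw_product = [s * h for s, h in zip(swing, height)]
swing_c, height_c = center(swing), center(height)
centered_product = [s * h for s, h in zip(swing_c, height_c)]

corr_raw = pearson(swing, raw_product)
corr_centered = pearson(swing_c, centered_product)
print(f"corr(swing, swing*height), raw:      {corr_raw:.2f}")
print(f"corr(swing, swing*height), centered: {corr_centered:.2f}")
```

The pairwise correlation is a simplified stand-in for the VIF, which is defined as 1 / (1 − R²), where R² comes from regressing one predictor on all the others. When both variables sit far from zero relative to their spread, as height in centimeters does, the raw product is nearly a linear combination of its components, which is what drives VIF values into the hundreds.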

• Centered Model

Results of the Centered Interaction Model
After centering, swing length, height, and their interaction term remained statistically insignificant. The p-value of the interaction term (0.968) matches that of Model 3, as expected, since mean-centering changes the meaning of the lower-order coefficients but not the interaction term itself. This suggests that, even after addressing multicollinearity, the relationship between swing length and offensive productivity does not appear to differ meaningfully across players with different heights in this sample. However, swing quality metrics such as fast swing rate and squared-up swing were significant predictors, highlighting their stronger explanatory power in offensive performance.
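The reason centering cannot change the interaction test follows from expanding the centered product. In the notation below (introduced here for illustration), L is swing length, H is height, bars denote sample means, and the superscript c marks coefficients of the centered model.

```latex
\begin{aligned}
(L_i - \bar{L})(H_i - \bar{H}) &= L_i H_i - \bar{H} L_i - \bar{L} H_i + \bar{L}\bar{H} \\
\beta_L^{c}(L_i - \bar{L}) + \beta_H^{c}(H_i - \bar{H}) + \beta_{LH}^{c}(L_i - \bar{L})(H_i - \bar{H})
  &= (\beta_L^{c} - \beta_{LH}^{c}\bar{H})\,L_i + (\beta_H^{c} - \beta_{LH}^{c}\bar{L})\,H_i + \beta_{LH}^{c}\,L_i H_i + \text{const} \\
\Rightarrow\quad \beta_{LH} = \beta_{LH}^{c}, \qquad \beta_L^{c} &= \beta_L + \beta_{LH}\bar{H}, \qquad \beta_H^{c} = \beta_H + \beta_{LH}\bar{L}
\end{aligned}
```

The centered and uncentered models are therefore the same fit expressed in different coordinates. The interaction coefficient and its standard error are identical, while the centered swing length coefficient becomes the slope for a player of average height, which is usually the more interpretable quantity.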
| Variable | Estimate | Std.Error | p-value |
|---|---|---|---|
| (Intercept) | 5.55 | 27.47 | 0.84 |
| swing_c | -0.15 | 5.16 | 0.98 |
| Height_c | -0.15 | 0.37 | 0.69 |
| swing_c:height_c | -0.027 | 0.68 | 0.968 |
| fast_swing_rate | 0.75 | 0.11 | < 0.001 |
| squared_up_swing | 1.76 | 0.51 | < 0.001 |

5. Conclusion

This project set out to investigate whether shorter swings offer performance advantages for players with smaller physiques, a concept rooted in traditional Korean baseball coaching. To explore this, a dataset of MLB hitters was analyzed to assess how swing length and body type, particularly height, interact to influence offensive productivity as measured by wRC+.
Initial regression models found no statistically significant relationship between swing length and wRC+, either directly or through its interaction with height. However, swing quality metrics such as fast swing rate and squared-up swing emerged as consistent and powerful predictors of performance. These findings suggest that swing precision and power, rather than simply swing length, are more critical for offensive success in professional baseball.
Importantly, the analysis identified severe multicollinearity among swing length, height, and their interaction term. To address this, a centered interaction model was estimated, reducing the correlation between predictors. Even after centering, the swing length-height interaction remained statistically insignificant, reinforcing the earlier conclusion.
Although the results did not support the hypothesis that shorter swings benefit smaller players, the study yielded valuable insights. It highlighted the importance of properly diagnosing multicollinearity, demonstrated the application of interaction modeling, and reinforced the analytical relevance of bat tracking metrics. Moreover, it challenged coaching assumptions and underscored the need for data-driven evaluation of player development strategies. Future research with larger samples or longitudinal swing-level data may help further refine these findings.