Were the Cleveland Guardians Right to Cut Nolan Jones? A Statistical Analysis

Gone Fishin’ with Professor Saber

What a mess! I sat slumped at my desk with my head buried in my arms, lost under an avalanche of scribbled notes and wadded-up pieces of paper. Player names, spring training stats, roster projections — all of it scratched out, circled, crossed out, and scribbled over again. I lifted my head and sent a small cascade of crumpled wads tumbling to the floor. I pushed the remaining pile aside and tried, for what felt like the gazillionth time, to make sense of it all.

The Guardians had been cutting players left and right and it was getting hard to keep it straight. Every morning brought a new move:

Watson — gone.
Rodríguez — gone.
Halpin — gone.
Jones — outrighted to Columbus.

I stared at the stats. All the numbers were starting to swim. I just couldn’t see the signal through the noise.

I tapped my pencil on the desk. Think, Mario, think! Halpin had looked good out there. Watson too. And sure, Jones hadn’t exactly been setting the world on fire — but it was still just Spring Training. Big decisions were being made with just thirty-some at-bats. Was such a small sample size really enough to know anything? How could I know if the difference between these players was actually significant?

The word hung in the air. I sat up straight. I knew exactly who I needed to see.

I grabbed my jacket and headed for Case Western. At the math department I found the familiar gray carpet, white walls, and fluorescent lights humming their monotonous hum. I knocked on Professor Saber’s door.

From inside came the familiar sing-song: “Come iiiii-in!”

I turned the handle — and…

To my great surprise the deep navy walls decorated with planets and shooting stars that I remembered from my last visit were gone. In their place was painted a sweeping ocean panorama — pale blue sky meeting open ocean, whitecaps catching the light, and out on the water, half a dozen boats casting large nets into the sea. I could swear I saw one of the boats actually move!

And there at the helm of her desk sat Professor Saber, decked out head to toe in a full Lake County Captains uniform — jersey, hat, and all — looking like she’d just sailed in from Lake Erie.

“Permission to come aboard, Captain?” I asked.

“Permission graaaanted!” she sang out, both arms raised in her signature greeting.

I shook my head, laughing, and dropped into the familiar beanbag chair, turquoise as the ocean. It made that satisfying whoosh sound as I settled in.

“So tell me, my seaward friend,” she said, getting up and absently coiling a large length of rope over her shoulder, “what brings you to my skiff today?”

“Well, Captain,” I said, settling deeper into the beanbag, “the Guardians have been making a lot of moves and I’ve been trying to figure out if they actually make sense based on the data. I’ve got all these spring training stats, but it’s hard to know if the numbers mean anything with such small sample sizes.”

I pulled out my notes — slightly crumpled — and spread them on my knee. “I’ve got batting averages, OBP, at-bats, the whole shebang. But I just can’t tell if the difference between these players is real, or if I’m just staring at noise.”

Professor Saber’s eyes lit up the way they always did. She heaved the coiled rope into the corner of her office, sat down at her desk, and kicked her feet up. Ordinary shoes. I had been half expecting cleats.

“You want to know how confident you can be.”

She said the word like it meant something.

“Yeah…” I said, thinking it over. “I guess I want to know how much I can actually trust what I’m looking at with just thirty at-bats — that’s a tiny sample size. Is that really enough to know anything?”

“Oh, you’d be surprised,” she said, with a mischievous glint in her eye. She swung her feet off the desk, reached up, and grabbed a rope hanging from the ceiling — which I was only just noticing — and gave it one firm pull.

From somewhere above the whiteboard that hung on her wall, a giant eraser swung down connected to a wooden arm on a pulley system and skreeeee — squeaked across the entire board in one clean pass. It then swung back the other way — skreeeee — and vanished back into the ceiling.

Professor Saber was already uncapping a marker.

“Let’s start with batting average,” she said, writing across the fresh board in large, bold letters:

Batting Average = Hits / At-Bats

“Batting average is really just a proportion,” she continued, tapping the equation. “It’s our best estimate of the probability that a batter gets a hit in any given at-bat. Multiply it by 100 and you have a percentage; leave it as a decimal and we call it a proportion. We take the number of hits, divide by the number of at-bats, and there’s our number.”

“Right,” I said. “But that number jumps all over the place when we have just a few at-bats early in the season.”

“Exactly right!” she exclaimed, excitedly grabbing a green marker. “Let’s say a player steps up for his very first at-bat and gets a hit. He’s batting 1.000. He’s perfect!”

She wrote in green:
1 Hit / 1 AB = 1.000
And then added a big green arrow pointing up.

She continued, “Then he comes up again — and misses. Suddenly he’s batting .500.”

She wrote on the line below:
1 Hit / 2 AB = .500
And added a red arrow pointing down.

“Just one at-bat later and he’s dropped 500 points! He comes up a third time — misses again. Now he’s one for three, and he’s dropped down to .333. But if he gets another hit, he’s back to 2 hits out of 4 at-bats, and he’s batting .500 again. The number is moving wildly!”
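The swings on the board can be replayed in a few lines of Python. This is only a sketch: the sequence of outcomes (hit, miss, miss, hit) comes from the Professor's example, while the function name `running_average` is made up for illustration.

```python
def running_average(outcomes):
    """Return the batting average after each at-bat.

    `outcomes` is a sequence of 1 (hit) and 0 (no hit).
    """
    averages = []
    hits = 0
    for at_bats, outcome in enumerate(outcomes, start=1):
        hits += outcome
        averages.append(hits / at_bats)
    return averages


# The Professor's example: hit, miss, miss, hit.
example = [1, 0, 0, 1]
for at_bats, average in enumerate(running_average(example), start=1):
    print(f"after {at_bats} AB: {average:.3f}")
# after 1 AB: 1.000
# after 2 AB: 0.500
# after 3 AB: 0.333
# after 4 AB: 0.500
```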

“So the batting average is basically meaningless with small sample sizes,” I said.

“Not meaningless — just highly uncertain!” she corrected, wagging the marker. “But what happens as the season goes on and our hitter keeps getting at-bats?”

“The batting average… levels off?” I offered.

“It converges!” she said, spinning back to the board and writing the word in a dramatic swooping hand. “As a player gets more and more at-bats, the batting average starts to stabilize. It converges toward something that looks more and more like their true ability. So the more data we have, the more we can trust the statistic we are looking at. And the less data we have…”

She let the conclusion dangle in the air.

“The less we can trust it,” I finished.

“Very good! The less data we have, the less confident we are,” she echoed, nodding slowly. “So. You’ve got thirty-some at-bats per player. It’s still just Spring Training. How much can you really trust those batting averages?”

“That’s...” I tried to lean forward in the beanbag, but I struggled and sank three inches deeper. “That’s exactly what I’m trying to figure out, Professor.”
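The convergence the Professor describes can be seen in a small simulation. The sketch below assumes a hypothetical hitter whose "true" average is .270 (not any real player's number) and treats every at-bat as an independent coin flip with that probability, which is a simplification. The name `simulate_running_average` is invented for this example.

```python
import random


def simulate_running_average(true_average, n_at_bats, seed=0):
    """Simulate at-bats for a hitter with a fixed hit probability.

    Returns the running batting average after each at-bat.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    averages = []
    for at_bats in range(1, n_at_bats + 1):
        if rng.random() < true_average:
            hits += 1
        averages.append(hits / at_bats)
    return averages


TRUE_AVERAGE = 0.270  # hypothetical "true" ability, chosen for illustration
averages = simulate_running_average(TRUE_AVERAGE, 100_000)
for checkpoint in (10, 30, 100, 1_000, 10_000, 100_000):
    print(f"after {checkpoint:>6} AB: {averages[checkpoint - 1]:.3f}")
```

Early checkpoints can land far from .270, while the later ones should settle close to it. That is the same pattern as the green and red arrows on the board, just stretched over many more at-bats.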

Professor Saber had a full-body laugh at my beanbag battle before turning back to the board and writing in enormous letters:

CONFIDENCE INTERVALS

She underlined it twice for good measure.

“What’s a confidence interval?” I asked, shifting in the beanbag to find a better position.

“I’m so glad you asked,” she said, already halfway around her desk. She crouched down, rummaged around underneath it for a moment, and then hauled out — with considerable effort — an enormous tangled fishing net, dumping it on the desktop with a dramatic thwump.

I stared at the massive net.

“Do you like fishing, Mario?” she asked pleasantly.

“I — what does —”

“Do you like fishing?” she insisted.

I looked at the net. I looked at the ocean painted across the walls. I looked at her Captains jersey. “Sure,” I said with a shrug. “Let’s go fishing, Professor.”

“Wonderful!” She spread the net wide across the desk with both hands. “Here’s the thing about a batting average, Mario. It’s just one number. A single estimate of a player’s ability. And that single number is our best guess at something we can never actually know — a player’s true batting average. The number they would hit if they played forever, in every condition, against every pitcher, across an infinite career. That true value is out there somewhere —” she gestured toward the ocean painted on the wall, “— swimming around in the ocean. And our job is to cast a wide enough net to catch it.”

“Okay,” I said slowly. “So a confidence interval is like the net?”

“The confidence interval is the net,” she said. “It’s a range of likely values for a player’s true batting average, based on the data we have. Instead of pointing at one number and saying that’s who this player is, we cast a net and get a range of values instead. And how wide that net is depends on two things: how much data we have, and how confident we want to be that the true value is somewhere inside it.”

“How confident we want to be,” I repeated. “Like a percentage?”

“Exactly like a percentage. Do we want to be 80% confident our net contains the true value? 95% confident? 99%?” She drew three horizontal lines on the board, each one wider than the last, labeling them 80%, 95%, 99%. “The more confident we want to be, the wider the net has to be to capture the true value.”

I squinted at the board, and then proclaimed loudly, “I want to be 100% certain!”

She laughed, delighted. “Wouldn’t that be something! But to be 100% certain we’ve captured the true value, we’d need a net so enormous it tells us absolutely nothing. I can be 100% confident that Petey Halpin will bat somewhere between .000 and 1.000 — but that doesn’t exactly help us, does it?”

“So there’s a tradeoff,” I said.

“There is always a tradeoff in statistics,” she emphasized.
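The tradeoff between confidence and width can be put into numbers. The story does not say which formula the Professor's software uses, so this sketch assumes the Wilson score interval, one common choice for proportions from small samples. The stat line of 9 hits in 32 at-bats is hypothetical, and `wilson_interval` is a name invented here.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(hits, at_bats, confidence):
    """Wilson score interval for a proportion (an assumed choice of formula)."""
    # z is the two-sided critical value, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / at_bats
    denom = 1 + z**2 / at_bats
    center = (p_hat + z**2 / (2 * at_bats)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / at_bats + z**2 / (4 * at_bats**2)) / denom
    )
    return center - half_width, center + half_width


HITS, AT_BATS = 9, 32  # hypothetical spring training line, not real data
for level in (0.80, 0.95, 0.99):
    low, high = wilson_interval(HITS, AT_BATS, level)
    print(f"{level:.0%} interval: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Each step up in confidence should print a wider interval, matching the three lines on the whiteboard.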

“So how do we pick what confidence level to use?”

“Well,” she said, leaning forward with that mischievous glint in her eye, “let’s let the data speak for itself.”

She reached under her desk, flipped up a large ship wheel, and gave it a hefty spin. Behind her, a large computer screen dropped down from the ceiling.

“Fortunately,” she said, “with today’s statistical software we don’t have to do the calculations by hand.” She turned to me. “Do you mind giving me your data there, Mario?”

I leaned forward in the beanbag, with some difficulty, and handed over my notes. She smoothed them out and then fed them into a narrow slot on her desk.

The screen flickered to life, and a radar beam began sweeping in slow circles, faster and faster, until the screen flashed a chart of 95% confidence intervals for each player’s spring batting average.

I looked at the screen. There was clearly an order — Watson sitting on top, albeit with a wide interval, and Jones anchoring the bottom — but every single bar was enormous, stretching way out in both directions, overlapping with the players above and below like a pile of pickup sticks.

“So there’s clearly an order,” I said slowly, “but everything overlaps. I can’t really say anyone is definitively better or worse than anyone else.”

“Exactly right,” she said. “We can see who has the higher batting averages and who has the lower ones. And we can think of these intervals as comparing the floor and ceiling of each player — that’s useful on its own. But look at some of these ceilings, Mario.” She pointed at the top end of Martinez’s bar. “Are we seriously expecting Angel Martinez to hit .505 this season?”

“No,” I admitted. “That’s just him getting hot early with a small sample size. It’s not going to hold in the long run.”

“Right. And DeLauter at .561? Watson at .688?” She shook her head. “At 95% confidence the nets are too wide to tell us much of anything. We need narrower intervals if we’re going to see any real separation between players.”

I looked back at the three lines on the whiteboard — 80%, 95%, 99%, each one wider than the last. Something clicked. “So if we want to be more confident, we need a wider interval… which means if we want narrower intervals, we use a lower confidence level,” I said slowly, working it out as I went.

“I think someone is starting to get their sea legs!” she exclaimed.

I tried not to look too pleased with myself from inside the beanbag. But it was difficult.

“But before I show you what 80% looks like,” she said, holding up one finger, “I want to justify the choice. Because 80% might sound arbitrary.” She perched on the edge of her desk. “95% confidence intervals are the standard that you see most often. But sometimes we need to adjust our approach. Think about presidential election forecasters. If pollsters were to predict a race using 95% confidence intervals, every single race would look too close to call. So the best forecasters in the world — your Nate Silvers, your FiveThirtyEights — they use 80% confidence intervals. Not because it’s a special number, but because it’s the sweet spot between confidence and clarity. Tight enough to see a real pattern. Honest enough to acknowledge what we don’t know.”

“So we’re not being sloppy by dropping the confidence down to 80%,” I said. “We’re being practical.”

“Exactly,” she said, reaching over and giving the ship’s wheel another spin.

Behind her the screen flickered again as the radar swept the ocean, searching for a signal.

The chart refreshed and the bars tightened. Professor Saber stepped aside so I could see.

And suddenly — there it was.

“Do you see it, Mario?” she asked breathlessly, leaning forward.

“I sure do!” I said. “Wait… what do I see?”

She laughed. “Look at the top three. Watson, DeLauter, Martinez — their intervals all sit above .250. Every single one of them.” She dragged her finger along the bottom of those three bars. “Now look at Jones at the bottom.”

I looked. His bar stopped at .232.

“His entire interval,” I said slowly, “falls below the intervals of those three players. There’s no overlap at all.”

“Which means?” the Professor egged me on.

“Which means Jones is significantly worse than Watson, DeLauter, and Martinez?”

“Congratulations,” she said, snapping to attention and giving me a crisp salute. “You’ve just been promoted to Lieutenant!”

I tried to sit up straight in the beanbag to accept the honor, but the beanbag had other ideas and I simply sank further in.

“We dropped our confidence to 80% to narrow our intervals,” she continued, “and now we have real separation. Jones doesn’t overlap with the top three players at all. That’s not a small difference. That is a statistically significant gap, and that,” she said, tapping the screen, “is why he’s on a bus to Columbus.”
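The overlap check from the chart can be sketched in code. Everything specific here is an assumption: the two stat lines belong to hypothetical hitters, the intervals use the Wilson score formula (the story does not name one), and `intervals_overlap` is an invented helper. The check mirrors the rule of thumb used in the conversation and is not a formal hypothesis test.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(hits, at_bats, confidence=0.80):
    """Wilson score interval for a proportion (an assumed choice of formula)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / at_bats
    denom = 1 + z**2 / at_bats
    center = (p_hat + z**2 / (2 * at_bats)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / at_bats + z**2 / (4 * at_bats**2)) / denom
    )
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical stat lines (hits, at-bats); not taken from any real player.
hitter_a = wilson_interval(14, 30)
hitter_b = wilson_interval(4, 36)

print(f"Hitter A: {hitter_a[0]:.3f} to {hitter_a[1]:.3f}")
print(f"Hitter B: {hitter_b[0]:.3f} to {hitter_b[1]:.3f}")
print("overlap" if intervals_overlap(hitter_a, hitter_b) else "no overlap")
```

With these made-up numbers the two 80% intervals do not touch, which is the same kind of gap Mario spots between Jones and the top three.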

“But batting average is just one net, Mario,” she continued. “It only tells us about hits. It says nothing about a player’s patience at the plate.” She grabbed the ship’s wheel firmly in both hands. “So let’s cast a second net and see what we catch with on-base percentage!”

She gave the wheel another spin.

The results for on-base percentage appeared, with Watson, DeLauter, and Martinez on top once again and Jones all the way at the bottom.

“So the top three are the same,” I said, studying the new chart, “but now Halpin has jumped up to fourth.”

“And do we see any significant differences with OBP?” Professor Saber asked, gesturing at the screen.

I looked carefully. “Jones still falls completely below Watson and DeLauter — no overlap at all. But there’s a little overlap now between Jones and Martinez.”

“Which means what, Mario? Tie it all together.”

“Jones is significantly worse than Watson and DeLauter in OBP — but we can’t say he’s significantly different from Martinez.”

“Tied like a true sailor’s knot!” she exclaimed proudly. “Now —” she pointed at the screen, “— what do you notice about the width of Watson’s interval compared to Jones’s?”

I studied the bars. “Jones’s interval is narrower. So… we’re more certain about where his true value is?”

“Exactly right,” she said. “Jones’s interval is narrower because he had more at-bats — more data gives us a tighter interval. Watson had fewer at-bats, so his interval is wider. We know less about him.” She tapped the screen. “But even with all that uncertainty, you can see exactly why they put Jones on that bus.”
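The link between sample size and interval width can be checked directly. This sketch holds the observed average fixed at .300 and changes only the number of at-bats. The stat lines are hypothetical, the Wilson score formula is an assumption, and `wilson_width` is a name invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def wilson_width(hits, at_bats, confidence=0.80):
    """Width of the Wilson score interval for hits / at_bats."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / at_bats
    denom = 1 + z**2 / at_bats
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / at_bats + z**2 / (4 * at_bats**2)) / denom
    )
    return 2 * half_width


# Hypothetical hitters who all bat exactly .300, with different sample sizes.
samples = [(3, 10), (9, 30), (30, 100), (150, 500)]
for hits, at_bats in samples:
    width = wilson_width(hits, at_bats)
    print(f"{hits:>3} for {at_bats:>3}: 80% interval width = {width:.3f}")
```

The width should shrink on every line as the at-bats pile up, even though the batting average itself never changes.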

I leaned back in the beanbag and looked at the chart for a long moment. “So when the front office sent him down, they weren’t just going on a gut feeling. The data actually backed it up — even with just thirty at-bats.”

“Even with just thirty at-bats,” she confirmed, pointing the marker at me like I’d just won a prize.

I chewed on that for a second. “Still,” I said, “this feels pretty simple. Is this really what major league front offices are doing to analyze players?”

“Something like it, but they’ve got a bigger boat,” she said, leaning forward. “Teams use a technique called multiple regression, which lets them factor in career statistics, injury history, age, ballpark effects, platoon splits, the works. More data means tighter intervals, so their estimates are even more precise than ours. But the bones of the analysis are the same.”

“And even with our simple version,” I said, “we could still see a significant difference between Jones and the top three players.”

“A statistically significant difference,” Professor Saber said triumphantly.

Saluting, I said, “Aye aye, Captain,” and rolled myself out of the beanbag.
