by sarchertech 8 hours ago

>$1.8 million

That sounds like a completely made up bullshit number that a junior engineer would put on a resume. There’s absolutely no way you have enough data to state that with anything approaching the confidence you just did.

jjmarr 7 hours ago

It's definitely a resume number I calculated as a junior engineer. Feel free to give feedback on my math.

It is based on $125/hr, and it assumes time spent in review is inversely proportional to the number of reviewer hours available.

Then time to merge can be modelled as

T_total = T_fixed + T_review

where fixed time is stuff like CI. For the sake of this estimate, assume T_fixed = T_review, i.e. 50% of time is spent in review. (If 100% of time were spent in review it's more like $800k, so I'm being optimistic.)

T_review is proportional to 1/(reviewer hours).

We know T_total was reduced by roughly 23.4% in an A/B test due to this AI tool, so I calculated how much equivalent human reviewer time would’ve been needed to get the same result under the above assumptions. This gives the following system of equations:

T_total_new = T_fixed + T_review_new

T_total_new = T_total * (1 - r)

where r = 23.4%. This simplifies to:

T_review_new = T_review - r * T_total

since T_review / T_review_new = capacity_new / capacity_old (by the inverse proportionality assumption). Call this capacity ratio `d`. Then d simplifies to:

d = 1/(1 - r/(T_review/T_total))

T_review/T_total is the fraction of total time spent on review, so we call that `a` and get the expression:

d = 1 / (1 - r/a)

At 50% of total time spent on review, a = 0.5 and r = 0.234 as stated, so the capacity ratio works out to:

d ≈ 1.8797

On the capacity side, we have about 40 reviewers devoting 20% of a 40 hr workweek, giving us 320 reviewer-hours/week. Multiply by (d − 1) and you get roughly 281.5 hours of additional equivalent review time, or about $35,188/week, which over 52 weeks is a little over $1.8 million/year.
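For the sake of checking the arithmetic, the whole estimate fits in a few lines of Python. All the inputs are the assumptions stated above (the $125/hr rate, the 50%-in-review split, the 40 reviewers at 20% of a 40 hr week), not independently measured constants:

```python
# Back-of-the-envelope sketch of the capacity-ratio estimate above.
# Inputs are the comment's stated assumptions, not measured data.

r = 0.234                         # measured time-to-merge reduction from the A/B test
a = 0.5                           # assumed fraction of total time spent in review
rate = 125                        # assumed fully-loaded $/hr per reviewer
reviewer_hours = 40 * 0.20 * 40   # 40 reviewers * 20% of a 40 hr week = 320 hrs/week

d = 1 / (1 - r / a)                      # equivalent reviewer-capacity ratio
extra_hours = reviewer_hours * (d - 1)   # additional review hours/week for same speedup
annual = extra_hours * rate * 52         # dollar value per year

print(f"d = {d:.4f}, extra hrs/week = {extra_hours:.1f}, annual = ${annual:,.0f}")
```

Tweaking `a` makes it obvious how sensitive the headline figure is to the 50%-in-review assumption, since d blows up as `a` approaches `r`.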

Of course I think we cost more than $125/hr once you consider health insurance and all that, and our reviewers are probably not spending 20% of their time on review consistently, but both of those would make my dollar value higher.

The most optimistic assumption I made is 50% of time spent on review.

sarchertech 6 hours ago

The feedback is don’t put it on a resume, because it looks ridiculous. I can almost guarantee you that the A/B test design wasn’t rigorous enough for you to be that confident in your numbers.

But even if that is correct you need a much longer time frame to tell if reviews using this new tool are equivalent as a quality control measure.

And you have so many assumptions built into this that your number is worthless. You aren’t controlling for all the variables you need to control for. How do you know that workers spend 8 hours a week on reviews versus spending 2 hours and slacking off the other 6? How do you know that the change of process created by using this tool doesn’t just cause the reviewers to work harder, and that they won’t stop doing that once the novelty wears off? What if reviewers start relying on this tool to catch a certain class of errors for which it has low sensitivity?

It’s also a moot point if they don’t actually end up saving the money you say they will. It could be that all the savings are eaten up because the reviewers just use the extra time to dick around on Hacker News. It could be that people aren’t able to make productive use of the time saved. Maybe they were already maxing out their time doing other useful activities.

All of this screams junior engineer took very limited results and extrapolated to say “saved the company millions” without nearly enough supporting evidence. Run your tool for 6 months, take an actual business outcome like time to merge PRs, measure that, and put that on your resume.

It’s incredibly common for a junior engineer to create some new tooling, and come up with some numbers to justify how this new tooling saves the company millions in labor. I have never once seen these “savings” actually pan out.

jjmarr 6 hours ago

I took it off LinkedIn and replaced it with a time-to-merge reduction of 20% over two weeks of PRs (rounding down). I expect to have to justify the expenditure to non-technical managers in my current role, which is why I picked dollar figures.

> All of this screams junior engineer took very limited results and extrapolated to say “saved the company millions” without nearly enough supporting evidence.

That's what the only person in my major who got a job at FAANG in California did, which is why I borrowed the strategy since it seems to work.

> I can almost guarantee you that an A/B test design wasn’t rigorous enough for you to be that confident in your numbers.

Shoot me an email about methodology! It's my username at gmail. I'd be happy to get more mentorship about more rigorous strategies and I can respond to concerns in less of a PR voice.