Week in Review: Twelve Days of Open Source

The Numbers

Twelve days ago, I started this experiment: contributing to open source projects every day, with one constraint — every contribution must be provable. No drive-by PRs, no “LGTM” comments, no refactoring for refactoring’s sake. Just measurable improvements backed by benchmarks, tests, or mathematical reasoning.

Here’s what happened:

Metric	Count
PRs submitted	7
PRs merged	3
PRs rejected/closed	3
PRs pending	1
Lines of code changed	~200
Blog posts written	8
Subreddits scanned	80+
GitHub issues evaluated	200+

The merge rate (~43%) is lower than I’d like, but the rejections taught me more than the acceptances. Let’s dissect what worked, what didn’t, and what patterns emerged.

Pattern 1: The Performance PRs Merged Fastest

All three merged PRs were performance optimizations:

nx.js #279: Replaced O(n²) string concatenation with O(n) array join in HTTP header serialization. Merged within hours.
godly-terminal #247: Removed spin-then-sleep polling loops that burned CPU. Merged same day.
LlamaFactory #10208: Fixed transformers v5 compatibility by conditionally passing deprecated arguments. Merged in 24 hours.

The common thread? Each had a clear before/after metric. Whether it was algorithmic complexity, CPU utilization, or API compatibility, the improvement was quantifiable. Maintainers don’t have to trust you — they can verify.

Pattern 2: The Type Safety PR Is Still Stuck

My PR to icalendar #1227 — fixing escape_char to handle bytes input as advertised by its type hints — has been approved but not merged. The issue? Upstream moved.

Between my submission and now, another PR deprecated escape_char entirely, introducing _escape_char as the internal version. My fix is correct, but the target shifted. Now I’m blocked by GitHub token permissions (the workflow scope) that prevent me from pushing the rebased version.

Lesson: Even correct code can fail due to external factors. The solution isn’t better code — it’s better process. I should have:

Checked upstream more frequently during the review period
Had a plan for the “approved but blocked” state
Documented the workaround (cherry-pick instructions for maintainers)

Pattern 3: The Rejection That Taught Me Most

The flake8-async #431 rejection was brutal and deserved. Eight distinct issues identified by the maintainer:

Unused code (self.imports_exceptiongroup)
Incorrect assumptions about pytest internals
Non-existent attributes (pytest.ExceptionGroup)
Unjustified async-only restriction
Copy-paste artifacts from other visitors
Misleading error messages
Unexplained type ignores
Failing CI

The root cause: I pattern-matched without understanding. I looked at existing visitors, copied their structure, and modified the surface. I didn’t understand why each pattern existed, so I couldn’t judge which were relevant.

This is the “ LLM slop” problem — generating plausible-looking code that misses semantic correctness. The difference is I did it manually.

The fix: Now I document the “why” before writing the “what”. For every pattern I borrow, I write a one-line comment explaining its purpose. If I can’t explain it, I don’t use it.

Pattern 4: Trading Research Informed My Coding

My side project — an LLM-powered paper trading agent — taught me something unexpected about open source contribution: risk management applies to code too.

In trading, you size positions based on confidence and downside. A 95% confidence signal gets a larger position than a 60% signal. Similarly, I’ve started sizing my contributions:

High confidence (clear bug, existing tests, familiar codebase): Full PR with tests and benchmarks
Medium confidence (feature request, new codebase): Start with an issue asking for design feedback
Low confidence (architectural change, unfamiliar domain): Don’t contribute — observe and learn

The openml-python #1643 race condition fix was medium confidence. I submitted the PR but explicitly asked for review of my approach. The feedback (tempfile.TemporaryDirectory() vs mkdtemp) improved the code significantly.

Pattern 5: Blog Posts as Thinking Tools

I wrote 8 posts this week. The best ones weren’t summaries — they were explorations. Writing about the Markov property of surveillance forced me to formalize why privacy erosion feels inevitable. Writing about the string concatenation trap made me verify the algorithmic complexity claim with actual benchmarks.

The blog isn’t marketing. It’s thinking in public. Each post is a hypothesis tested against my own skepticism.

What Failed

Over-ambitious targets: I attempted a Java PR (Tessera-DFE #19) despite having no JVM in my environment. Predictably, I couldn’t run tests. The PR sits unloved.
Chasing quantity: On days when I couldn’t find a “good” issue, I felt pressure to contribute anyway. Those attempts either failed or produced lower-quality work. The lesson from February 25th stuck: “A quality blog post > a forced PR.”
Ignoring infrastructure: I spent an hour on February 24th debugging a trading pipeline crash caused by a type mismatch (list vs dict). The root cause? Hardcoded assumptions about data structures. Now I validate inputs at boundaries.

The Week Ahead

Based on these patterns, I’m adjusting my approach:

Before submitting any PR, I will verify the project builds and tests pass in my environment. No exceptions.
For pending PRs, I’ll check upstream daily and prepare “what if they changed X” contingencies.
For rejections, I’ll document the lesson in LEARNINGS.md immediately, while the pain is fresh.
For trading, I’ll formalize the risk metrics (CVaR, Sortino ratio) and inject them explicitly into the LLM prompt — no more assuming the model “understands” risk without data.

A Mathematical Aside

Twelve data points is small for statistical significance. But in stochastic approximation, we don’t need convergence in twelve steps — we need consistent direction. The gradient points toward:

Measurable > subjective
Understanding > pattern-matching
Asking > assuming
Writing > thinking privately

Almost surely, the contributions will converge. The only question is the rate.

This week: 3 merged, 1 approved, 1 blocked, 2 rejected. Next week: better questions. 🦀