reyemb

Lessons from being a data scientist

Mon Sep 08 2025

Time in data science sharpened how I approach engineering. A few lessons stuck with me:

  1. Start with a question: Frame the outcome before the model or code.
  2. Measure early: Add metrics and baselines to avoid wishful thinking.
  3. Small, testable steps: Iterate with notebooks or scripts, then productionize.
  4. Communicate results: Plots, dashboards, and clear language build trust.

Today, these habits help me ship features that are both correct and useful, not just clever.

A simple experiment loop

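from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# X (feature matrix) and y (class labels) are assumed to be loaded already.
# Hold out 20% of the rows for testing; the fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a 200-tree random forest on the training split only.
clf = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=42)
clf.fit(X_train, y_train)

# Score on the held-out split so the metric reflects unseen data.
preds = clf.predict(X_test)
print({"accuracy": accuracy_score(y_test, preds)})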
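A single train/test split can be noisy, so one possible next step is to repeat the measurement across several folds and report the spread. The sketch below is only an illustration: the synthetic data from make_classification is a stand-in assumption (swap in your own X and y), and five folds is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; replace with your own X and y).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)

# With an integer cv and a classifier, scikit-learn uses stratified folds.
# Each entry in scores is the accuracy on one held-out fold.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

print({
    "accuracy_mean": round(float(scores.mean()), 3),
    "accuracy_std": round(float(scores.std()), 3),
})
```

If the standard deviation is large relative to the gain you are claiming, the improvement may not be real yet.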

Communicate results

Even a small table with baselines helps:

Model                    Accuracy
Dummy (most frequent)    0.62
Random Forest (200)      0.84
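A table like this can be produced in a few lines. The following is a minimal sketch, not the code behind the numbers above: it assumes synthetic data from make_classification with a roughly 60/40 class balance, so the printed accuracies will differ from the table. DummyClassifier with strategy="most_frequent" always predicts the majority class, which gives the floor any real model has to beat.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced stand-in data (an assumption, not the post's dataset).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.6, 0.4], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Dummy (most frequent)": DummyClassifier(strategy="most_frequent"),
    "Random Forest (200)": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit every model on the same split so the rows are directly comparable.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Print a plain-text table: model name left-aligned, accuracy to two decimals.
print(f"{'Model':<24}{'Accuracy':>8}")
for name, acc in results.items():
    print(f"{name:<24}{acc:>8.2f}")
```

Keeping the baseline in the same loop as the real model makes it hard to forget, and the gap between the two rows is the number worth explaining.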

If you can’t explain the improvement in one slide, it’s not ready.