MBA Final Paper
Data Science and Analytics. University of São Paulo, 2024
Proposal and introduction
My thesis analyzes Brazil’s Magic: The Gathering singles market to turn dispersed store data into decisions on assortment and pricing. Scope covers certified retailers, segmented by game format. The objective is to measure, on a comparable basis, how well each store covers format-relevant cards and how its prices sit against the market, linking these choices to margin, rotation, and customer reach.
I collected public data from store sites, normalized the catalog by unique card ID, edition, language, condition, variant, stock, and price, and filtered the universe to cards with competitive demand. With a clean dataset, I contrasted inventory depth by format with price deviation from market averages, which revealed each store's strategy in terms of breadth, specialization, and pricing posture by category.
Technical Approach
I designed and executed the end-to-end pipeline. Using structured scraping, I built a per-store URL matrix, parsed product pages, and conformed everything to a single schema. I removed duplicates and outliers, standardized currency and condition, and added data quality checks with regex rules, field completeness tests, and winsorization of extreme prices. The effort spanned dozens of stores and more than one hundred thousand URLs, producing a dataset with hundreds of thousands of rows.
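The cleaning steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline actually used in the thesis: the field names in REQUIRED, the "R$ 1.234,56" price format, the deduplication key, and the percentile cutoffs are all assumptions made for the example.

```python
import re

# Hypothetical field names; the thesis lists these attributes but not its column names.
REQUIRED = ("store", "card_id", "edition", "language", "condition", "price")

# Assumed price format: Brazilian real with comma decimals, e.g. "R$ 1.234,56".
PRICE_RE = re.compile(r"^R\$\s*(\d{1,3}(?:\.\d{3})*|\d+),(\d{2})$")


def parse_brl(text):
    """Convert a string such as 'R$ 1.234,56' to a float, or None if malformed."""
    match = PRICE_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1).replace(".", "") + "." + match.group(2))


def clean(rows):
    """Drop incomplete rows, rows with malformed prices, and duplicate listings."""
    seen, out = set(), []
    for row in rows:
        # Field completeness test: every required field must be present and non-empty.
        if any(not row.get(field) for field in REQUIRED):
            continue
        price = parse_brl(row["price"])
        if price is None:
            continue
        # Assumed duplicate key: the same listing in the same store.
        key = tuple(row[field] for field in REQUIRED[:-1])
        if key in seen:
            continue
        seen.add(key)
        out.append({**row, "price": price})
    return out


def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values to the range between the lower and upper percentiles."""
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[round(lower * last)]  # nearest-rank percentile
    hi = ordered[round(upper * last)]
    return [min(max(v, lo), hi) for v in values]
```

In practice, winsorization of this kind would be applied to the prices of one card at a time, so that an expensive card is not clipped against the price distribution of cheap ones.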
For analytics, I created two core axes. Availability by format measures how well a store keeps each format's key cards in stock at practical quantities, with the count for each card capped at typical deck needs. Price posture captures each card's deviation from the format mean, using the absolute spread and a z-score after normalization. With these indicators, I produced rankings and a dashboard by store and format that expose assortment gaps, overpricing, and opportunities for adjustment. The outputs translate into actions such as rebalancing inventory by format, tuning price ladders, and setting targets for launches and restocks.
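One way to read the two indicators is sketched below. The exact formulas are not given in the text, so both functions are assumptions: availability is taken as the share of capped demand a store can fill, and price posture as the spread and z-score of each store's price against the mean across stores for a single card. The function names and input shapes are hypothetical.

```python
from statistics import mean, pstdev


def availability(stock, needs):
    """Share of a format's capped demand that a store can fill (0.0 to 1.0).

    stock: {card: copies in stock}; needs: {card: cap, e.g. copies per deck}.
    """
    total = sum(needs.values())
    if total == 0:
        return 0.0
    # Copies beyond the cap do not count, so deep stock of one card
    # cannot compensate for missing another key card.
    covered = sum(min(stock.get(card, 0), cap) for card, cap in needs.items())
    return covered / total


def price_posture(prices_by_store):
    """For one card, return {store: (absolute spread, z-score)} against the mean."""
    values = list(prices_by_store.values())
    if not values:
        return {}
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation across stores
    return {
        store: (price - mu, (price - mu) / sigma if sigma > 0 else 0.0)
        for store, price in prices_by_store.items()
    }
```

For example, a store holding 10 copies of card A and 1 copy of card B, in a format that needs 4 of A, 4 of B, and 2 of C, would fill 5 of the 10 capped slots. A positive z-score marks a store pricing a card above the market mean, and a negative one marks a store pricing below it.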

