When Git Became the Bottleneck
How I migrated 3,000+ academic materials from GitHub to Notion, and turned a community archive into something you can actually operate

1. Context
CAECOMP is the Computer Engineering Student Association at IFCE Fortaleza. Since 2016, our organization has maintained a GitHub repository with more than 3,000 materials: past exams, problem sets, slides, assignments, and loose files. Hundreds of students in the program use it, and alongside our work welcoming first-year students, it is probably one of the Student Association's most valuable assets.
The problem is that, to upload a PDF to this repository, a student had to understand Git, GitHub, cloning, branching, and pull requests, and then wait for someone in the Student Association with a working local environment to review it. At some point this stopped being an open workflow and became an unintentional technical filter.
We migrated to Notion. Underneath sits an automation layer that handles scale, idempotency, and traceability. On top, the flow became obvious: a form to contribute, a database to review, a button to publish.
This piece is an autopsy of the project. Who the user is, where GitHub was wrong for them, what migrated, what failed, and what I learned trying to fix it.
2. Snapshot
Field | Detail |
|---|---|
Project | Migrating CAECOMP's academic archive to Notion |
Organization | CAECOMP / IFCE Fortaleza |
Users | ~500 to 600 Computer Engineering students, plus students from programs that share courses |
Scope | 3,000+ materials |
My role | President of CAECOMP, responsible for the migration, the automation, and the governance design |
Stack | TypeScript, Node.js, Notion API, Cloudflare Workers, Hono |
Spin-off projects | |
3. The problem
The repository had value. A first-year student looking for a past exam, an upper-year student hunting down a hard problem set, someone searching for slides from a demanding course: everyone passed through it. That was the point.
But the contribution flow no longer fit the audience it had. To send in a material, a student needed to have the file (or take a photo of the exam), clone a repo several gigabytes in size, survive the clone (on Windows, that alone could become a gamble), understand the folder structure, create a branch, open a PR, and then hope someone from the Student Association would show up to review it. That person also had to have the repo cloned and a working environment. All of that for a PDF.
Of the 500 to 600 students who benefited, only a small fraction had real Git fluency. Among Student Association members it was the same story: only a few knew how to merge a PR, so the responsibility kept falling on the same two or three people, every time. The rest is what you'd imagine: requests piling up, contributions stalling, Student Association members burning out, students giving up on sending files, and the archive slowly losing credibility. At worst, parallel WhatsApp groups would start popping up to trade PDFs with no curation at all. To keep that from spiraling, CAECOMP's leadership created an official channel instead of letting those groups multiply, so the demand could be acknowledged without abandoning curation entirely.
The underlying issue was something else. The archive was disorganized, sure, but what really hurt was that the community had no simple way to organize it.
4. Where GitHub was wrong for this case
Let me be clear: I have nothing against GitHub. It's great for code, for version control, for technical collaboration, for open source culture. I use it every day.
The point is that our problem wasn't versioning code. It was storing, consuming, reviewing, and publishing PDFs for a student community with uneven technical fluency. The tool looked open, but the actual experience excluded most people.
The average student could navigate the folders, with plenty of friction: structures nested too deep, inconsistent naming, a viewer that kept failing on large PDFs, bad search, zero metadata. Contributing was an even bigger barrier. One very specific example: the Computer Graphics materials were stored in a structure with a professor folder inside a professor folder, with folder names so erratic that the migration code itself had to treat these cases as exceptions.
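The nested professor folder is the kind of case a migration script ends up special-casing. Here is a minimal sketch, assuming the duplication shows up as two adjacent path segments whose names differ only in case, accents, or punctuation. The folder names and the `normalizeSegment` / `collapseRepeatedFolders` helpers are hypothetical, not the project's actual code.

```typescript
// Hypothetical helpers: folder names and rules are illustrative, not the project's real code.

/** Lowercase, strip accents, and collapse punctuation so "Prof. Silva" and "prof_silva" compare equal. */
export function normalizeSegment(segment: string): string {
  return segment
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accent marks produced by NFD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // any run of punctuation or whitespace becomes one space
    .trim();
}

/** Split a repo path and drop any segment that repeats the previous one after normalization. */
export function collapseRepeatedFolders(path: string): string[] {
  const kept: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "") continue; // ignore leading, trailing, or doubled slashes
    const previous = kept[kept.length - 1];
    if (previous !== undefined && normalizeSegment(previous) === normalizeSegment(segment)) continue;
    kept.push(segment);
  }
  return kept;
}

// Usage: a professor folder nested inside a near-identical professor folder.
// Returns ["Computação Gráfica", "Prof. Silva", "prova_1.pdf"].
console.log(collapseRepeatedFolders("Computação Gráfica/Prof. Silva/prof_silva/prova_1.pdf"));
```

This only handles adjacent duplicates; folders that are erratic in other ways still need their own explicit exceptions.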
5. The decision to move the interface
The turning point came when CAECOMP joined Notion for Student Organizations, a program that gives a verified student organization the Plus Plan for free, with unlimited members in the workspace. Without that benefit, frankly, I wouldn't recommend the switch for a Student Association with no budget. With it, the math changed.
Notion offered a familiar interface for a semi-technical audience, databases with filters and views, Forms for contributions, a public page via Notion Site, access control, internal review, a usable mobile experience, and collaboration between devs and non-devs, all without requiring Git from anyone.
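The new model is straightforward: a Notion Form for contributions, an internal database for review, and a button that publishes approved material to the public archive on Notion Site.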
CAECOMP's role changes in this arrangement too. We went from technical guardian of a repository to curator of student knowledge, and those two things require very different skills.
6. The technical part
The decision was simple. The execution, not so much.
The archive had more than 3,000 files, several gigabytes, bad names, duplicates, unpredictably nested folders, and plenty of stuff that shouldn't even go to Notion (zips, entire code projects, etc.). Doing it by hand was out of the question.
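Deciding what should not become a Notion attachment is mostly a path check. A minimal sketch, assuming an extension allow-list plus a few markers of code projects; the `triage` function, the lists, and the reason strings are hypothetical.

```typescript
// Hypothetical triage rules; the extension lists and markers are assumptions, not the project's real config.
export type Triage =
  | { path: string; action: "attach" }
  | { path: string; action: "record_only"; reason: string };

const ATTACHABLE = new Set(["pdf", "png", "jpg", "jpeg", "docx", "pptx"]);
const ARCHIVES = new Set(["zip", "rar", "7z", "tar", "gz"]);
// A path that looks like part of a source tree is treated as a code project, not a study material.
const CODE_MARKERS: RegExp[] = [
  /(^|\/)src\//,
  /(^|\/)(package\.json|pom\.xml|Makefile|CMakeLists\.txt)$/,
  /\.(c|cpp|h|java|py|js|ts)$/i,
];

export function triage(path: string): Triage {
  const ext = /\.([^./]+)$/.exec(path)?.[1]?.toLowerCase() ?? "";
  if (ARCHIVES.has(ext)) return { path, action: "record_only", reason: "archive" };
  if (CODE_MARKERS.some((re) => re.test(path))) return { path, action: "record_only", reason: "code project" };
  if (ATTACHABLE.has(ext)) return { path, action: "attach" };
  return { path, action: "record_only", reason: `unsupported extension: ${ext || "none"}` };
}

// Usage
for (const p of ["calculo-1/provas/p1_2019.pdf", "poo/trabalho-final.zip", "poo/projeto/src/Main.java"]) {
  console.log(triage(p));
}
```

Nothing is silently dropped: every path gets a result, which is what keeps the non-attached files traceable.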
The solution became two layers. repo2notion handles the bulk migration: scanning the repo, extraction, cleanup, metadata inference, a confidence score, sorting candidates into good, bad, and doubtful, and exporting to Notion. notion-file-actions handles day-to-day operation. Notion doesn't natively offer the action I needed (copying a file from one page or database to another as part of the acceptance flow), so I built an external automation: a webhook handled by a Cloudflare Worker, written in TypeScript with Hono, that calls the Notion API. Because the Notion API supports programmatic upload and attachment through its file_upload cycle, I could close the loop without an ugly hack.
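A minimal sketch of that copy step, assuming Notion's File Upload API in single-part mode: create an upload, send the bytes, then attach the upload to a Files property. Endpoint paths and payload shapes follow my reading of Notion's public docs. The `Notion-Version` value, the property name, and the `copyFileToPage` helper are assumptions. The webhook and Hono routing are left out so the function can be called from any handler.

```typescript
// Hypothetical sketch, not notion-file-actions' real code. Endpoints and payloads follow Notion's
// File Upload API as documented publicly; NOTION_VERSION and all IDs/names are placeholders.
const NOTION_API = "https://api.notion.com/v1";
const NOTION_VERSION = "2022-06-28"; // assumed; pin whatever version your integration targets

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<any>;
  blob(): Promise<Blob>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string | FormData },
) => Promise<HttpResponse>;

function expectOk(res: HttpResponse, step: string): HttpResponse {
  if (!res.ok) throw new Error(`${step} failed with HTTP ${res.status}`);
  return res;
}

/** Downloads a file from its (temporary) URL and attaches it to a Files property on another page. */
export async function copyFileToPage(
  fetchLike: FetchLike,
  token: string,
  source: { url: string; name: string },
  targetPageId: string,
  propertyName: string,
): Promise<string> {
  const auth = { Authorization: `Bearer ${token}`, "Notion-Version": NOTION_VERSION };

  // 1. Fetch the bytes right away: Notion-hosted file URLs are signed and expire.
  const download = expectOk(await fetchLike(source.url, { method: "GET", headers: {} }), "download");
  const blob = await download.blob();

  // 2. Create a single-part file upload object.
  const created = expectOk(
    await fetchLike(`${NOTION_API}/file_uploads`, {
      method: "POST",
      headers: { ...auth, "Content-Type": "application/json" },
      body: JSON.stringify({ mode: "single_part", filename: source.name }),
    }),
    "create upload",
  );
  const { id } = (await created.json()) as { id: string };

  // 3. Send the bytes as multipart/form-data; fetch sets the boundary header itself.
  const form = new FormData();
  form.append("file", blob, source.name);
  expectOk(
    await fetchLike(`${NOTION_API}/file_uploads/${id}/send`, { method: "POST", headers: auth, body: form }),
    "send",
  );

  // 4. Attach the upload to the target page. This replaces the property's current file list.
  expectOk(
    await fetchLike(`${NOTION_API}/pages/${targetPageId}`, {
      method: "PATCH",
      headers: { ...auth, "Content-Type": "application/json" },
      body: JSON.stringify({
        properties: { [propertyName]: { files: [{ type: "file_upload", file_upload: { id }, name: source.name }] } },
      }),
    }),
    "attach",
  );
  return id;
}
```

Taking `fetchLike` as a parameter keeps the function testable outside a Worker; inside one you would pass the global `fetch`. Single-part mode is meant for smaller files; larger ones need the API's multi-part mode.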
Deterministic first, AI as fallback
A choice that mattered: not making AI the foundation of the migration. An academic archive has real value, and filing an exam under the wrong course isn't just a bug; it breaks the community's trust in the system.
So the strategy ended up layered. Deterministic rules and regex to clean and classify. A confidence score to evaluate metadata. Generative AI only when the score was low. An intermediate record for everything, with an audit trail. Human review when needed.
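The deterministic layer can be sketched as rules that each contribute to a score, with the score deciding the bucket. A minimal sketch, assuming the course comes from the top-level folder and the material type and year from keywords in the path. The course map, keywords, weights (50/30/20), and thresholds (80/50) are all illustrative assumptions.

```typescript
// Hypothetical rule set: course map, keywords, weights, and thresholds are illustrative assumptions.
export type Kind = "exam" | "problem_set" | "slides" | "assignment";
export interface Inferred {
  course?: string;
  kind?: Kind;
  year?: number;
  score: number; // 0..100: sum of the weights of the fields the rules found
  bucket: "good" | "doubtful" | "bad";
}

const KNOWN_COURSES = new Map<string, string>([
  ["calculo 1", "Cálculo I"],
  ["computacao grafica", "Computação Gráfica"],
]);
const KIND_RULES: Array<[RegExp, Kind]> = [
  [/\b(provas?|p[123])\b/, "exam"],
  [/\b(listas?|exercicios?)\b/, "problem_set"],
  [/\b(slides?|aulas?)\b/, "slides"],
  [/\btrabalhos?\b/, "assignment"],
];

/** Strip accents and punctuation so the regexes see plain, space-separated words. */
function toWords(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

export function inferMetadata(path: string): Inferred {
  const topFolder = path.split("/").filter((s) => s !== "")[0] ?? "";
  const words = toWords(path);

  const course = KNOWN_COURSES.get(toWords(topFolder));
  const kind = KIND_RULES.find(([re]) => re.test(words))?.[1];
  const yearMatch = /\b(20[0-2]\d)\b/.exec(words);
  const year = yearMatch ? Number(yearMatch[1]) : undefined;

  const score = (course ? 50 : 0) + (kind ? 30 : 0) + (year ? 20 : 0);
  const bucket = score >= 80 ? "good" : score >= 50 ? "doubtful" : "bad";
  return { course, kind, year, score, bucket };
}

// Usage
console.log(inferMetadata("calculo-1/provas/prova_2019.pdf")); // all three fields found, bucket "good"
console.log(inferMetadata("misc/arquivo.pdf")); // nothing found, bucket "bad"
```

Integer weights keep the thresholds exact, and because every field's contribution is explicit, a low score also says *which* field the rules could not find.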
Of the roughly 3,000 materials, most came out with good metadata directly. A few hundred low-confidence records needed an AI refinement pass. A few hundred more failed as attachments but kept a traceable record, almost all of them zips or code projects that Notion doesn't handle well.
AI didn't replace modeling. It came in as a controlled fallback to reduce manual review where the rules already indicated low confidence.
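The "controlled fallback" boils down to a gate on the deterministic score plus an audit entry either way. A minimal sketch, assuming a threshold of 50 and a `refine` callback standing in for whatever model call is used; the types, names, and the rule that the model only fills gaps are assumptions for illustration.

```typescript
// Hypothetical router: threshold, types, and the `refine` callback are assumptions for illustration.
export interface Draft {
  course?: string;
  kind?: string;
  year?: number;
  score: number; // confidence, 0..100
}
export interface AuditEntry {
  path: string;
  source: "rules" | "ai";
  scoreBefore: number;
  needsReview: boolean; // still low after the fallback, so a human should look at it
}
/** Stand-in for a generative model call; it returns its own guess plus a confidence score. */
export type Refine = (path: string, draft: Draft) => Promise<Draft>;

export async function resolveMetadata(
  path: string,
  draft: Draft,
  refine: Refine,
  audit: AuditEntry[],
  threshold = 50,
): Promise<Draft> {
  if (draft.score >= threshold) {
    audit.push({ path, source: "rules", scoreBefore: draft.score, needsReview: false });
    return draft; // confident rules: the model is never called
  }
  const guess = await refine(path, draft);
  // The model only fills gaps; fields the rules already found are never overwritten.
  const merged: Draft = {
    course: draft.course ?? guess.course,
    kind: draft.kind ?? guess.kind,
    year: draft.year ?? guess.year,
    score: guess.score,
  };
  audit.push({ path, source: "ai", scoreBefore: draft.score, needsReview: merged.score < threshold });
  return merged;
}

// Usage with a fake model
const auditTrail: AuditEntry[] = [];
const fakeModel: Refine = async () => ({ course: "Cálculo I", kind: "exam", score: 85 });
resolveMetadata("misc/scan_0042.pdf", { year: 2019, score: 20 }, fakeModel, auditTrail).then((meta) => {
  console.log(meta, auditTrail);
});
```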
Idempotency
A big migration always fails at some point. APIs have rate limits, networks drop, files get corrupted, data comes in inconsistent. The script had to be able to run again without turning into a mess. Each file gets a key derived from its content, which prevents duplicate uploads and reprocessing. That let me run the pipeline several times without turning Notion into a triplicate archive, and later it made mid-course fixes possible, such as updating records with their original path after part of the migration had already run.
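A content-keyed ledger is the core of that idempotency. A minimal sketch, assuming SHA-256 over the file bytes as the key and an in-memory map standing in for whatever persistent store a real run would use; `MigrationLedger` and the `upload` callback are hypothetical.

```typescript
import { createHash } from "node:crypto";

/** Content-derived key: identical bytes give the same key regardless of file name or folder. */
export function contentKey(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

interface LedgerEntry {
  pageId: string;
  originalPaths: string[]; // every repo path that had this exact content
}

/**
 * Hypothetical ledger. A real run would persist it (file, KV store, or the Notion records themselves)
 * so that a re-run after a crash starts from what was already uploaded.
 */
export class MigrationLedger {
  private readonly entries = new Map<string, LedgerEntry>();

  async upsert(
    bytes: Uint8Array,
    path: string,
    upload: () => Promise<string>, // performs the upload and returns the created page id
  ): Promise<{ status: "uploaded" | "skipped"; pageId: string }> {
    const key = contentKey(bytes);
    const existing = this.entries.get(key);
    if (existing) {
      // Already migrated: don't upload again, just remember the extra original path.
      if (!existing.originalPaths.includes(path)) existing.originalPaths.push(path);
      return { status: "skipped", pageId: existing.pageId };
    }
    const pageId = await upload();
    this.entries.set(key, { pageId, originalPaths: [path] });
    return { status: "uploaded", pageId };
  }

  pathsFor(bytes: Uint8Array): string[] {
    return this.entries.get(contentKey(bytes))?.originalPaths ?? [];
  }
}
```

Files are assumed to be processed one at a time; two concurrent `upsert` calls with the same bytes could both upload, so a parallel version would need to reserve the key before awaiting the upload.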
A migration script that can't be run again is a bet, not a pipeline. I learned that the hard way on earlier projects.
7. Governance
The most important part of the project wasn't moving files. It was redesigning who does what.
The new system separates four things that used to be tangled together in a single interface. Public contribution: anyone can submit material. Internal review: Student Association members evaluate and correct. Publication: only what's approved shows up in the public archive. Consumption: students access it through views, filters, pages, and Notion Site.
The minimum fields in the form are title, course, file, material type when applicable, and an internal note when needed. Day to day, any trained Student Association member can review using views and a button. The technical layer still exists underneath, but it's encapsulated and auditable.
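The separation between submitting, reviewing, and publishing can be expressed as a small status model. A minimal sketch, assuming the statuses and field names below (they are illustrative, not the actual Notion database schema); it encodes two rules from the text: only approved material gets published, and every rejection carries a reason.

```typescript
// Hypothetical status model for the submit -> review -> publish flow; names are illustrative.
export type Status = "submitted" | "in_review" | "approved" | "rejected" | "published";

// Allowed moves; "published" is only reachable from "approved".
const TRANSITIONS: Record<Status, readonly Status[]> = {
  submitted: ["in_review"],
  in_review: ["approved", "rejected"],
  approved: ["published"],
  rejected: [],
  published: [],
};

export interface Submission {
  title: string;
  course: string;
  fileUrl: string;
  kind?: string; // material type, when applicable
  internalNote?: string;
  status: Status;
  rejectionReason?: string;
}

const REQUIRED = ["title", "course", "fileUrl"] as const;

/** Returns the names of required form fields that are missing or blank. */
export function missingFields(input: Partial<Submission>): string[] {
  return REQUIRED.filter((field) => !input[field]?.trim());
}

/** Moves a submission to the next status, refusing shortcuts such as submitted -> published. */
export function transition(s: Submission, next: Status, reason?: string): Submission {
  if (!TRANSITIONS[s.status].includes(next)) {
    throw new Error(`Invalid transition: ${s.status} -> ${next}`);
  }
  if (next === "rejected" && !reason?.trim()) {
    throw new Error("A rejection needs a reason");
  }
  return { ...s, status: next, rejectionReason: next === "rejected" ? reason : s.rejectionReason };
}

/** Only published items appear in the public archive view. */
export const isPublic = (s: Submission): boolean => s.status === "published";
```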
The complexity didn't disappear. It moved somewhere else.
8. Contributing, before and after
Dimension | Before | After |
|---|---|---|
Contribution | Git, clone, PR | A form |
Review | PR + local repo | Internal database |
Publication | Manual merge | Button / automation |
Consumption | GitHub folders and viewer | Notion database / site |
Who can contribute | A Git-fluent minority | Any student with a camera and internet |
Time to publication | Could stretch to months | Expected in days, worst case |
Governance | Implicit and fragile | Separate flow to submit, review, and publish |
9. What to expect in terms of impact
The number of migrated files is the least interesting part. What really changed is the organization's capacity.
Before, the archive was useful but hard to maintain, contributing required disproportionate technical maturity, and responsibility kept falling on the same few people. As requests piled up, trust in both the archive and the contribution process started to erode. Now any student can contribute, any Student Association member can review without Git, and the archive has metadata. Publication becomes an operational process, and the governance is clear enough for the next leadership to carry on without reinventing everything from scratch.
There's an effect that matters beyond this project. CAECOMP Skills, the bet on turning CAECOMP into a platform for hands-on training with learning tracks, workshops, and digital assets, can reuse this foundation. The archive stops being an isolated repository and becomes input for other initiatives.
Metrics I plan to track
Total migrated, % with good metadata, % that needed AI, % not attached but tracked, new contributions per month, average time between submission and publication, materials per course (and where they're missing), approval rate, rejection rate with reason, unique contributors.
Two that matter more than the rest. Time-to-first-contribution: how long it takes an average student to contribute for the first time. Before, it involved Git, clone, PR, environment failures, waiting. Now you can measure it in minutes. And the findability test: how long it takes a student to find a recent, relevant exam for a specific course. It measures what matters, which is the experience of the person using it, not the architecture.
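Several of those metrics fall straight out of the review records. A minimal sketch, assuming each record carries a contributor, submission and publication timestamps, an outcome, and an optional rejection reason; the record shape and `summarize` are hypothetical. Approval rate is computed over decided items only, so pending submissions don't drag it down.

```typescript
// Hypothetical record shape, e.g. exported from the internal review database; field names are assumptions.
export interface ReviewRecord {
  contributor: string;
  submittedAt: Date;
  outcome: "pending" | "approved" | "rejected";
  publishedAt?: Date;
  rejectionReason?: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

export function summarize(records: ReviewRecord[]) {
  const decided = records.filter((r) => r.outcome !== "pending");
  const approved = decided.filter((r) => r.outcome === "approved");
  const published = records.filter(
    (r): r is ReviewRecord & { publishedAt: Date } => r.publishedAt !== undefined,
  );
  const totalMs = published.reduce((sum, r) => sum + (r.publishedAt.getTime() - r.submittedAt.getTime()), 0);

  // Rejection count per reason, so "rejection rate with reason" can be broken down.
  const rejectionsByReason = new Map<string, number>();
  for (const r of decided) {
    if (r.outcome !== "rejected") continue;
    const reason = r.rejectionReason ?? "unspecified";
    rejectionsByReason.set(reason, (rejectionsByReason.get(reason) ?? 0) + 1);
  }

  return {
    approvalRate: decided.length > 0 ? approved.length / decided.length : null,
    avgDaysToPublication: published.length > 0 ? totalMs / published.length / DAY_MS : null,
    rejectionsByReason,
    uniqueContributors: new Set(records.map((r) => r.contributor)).size,
  };
}
```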
10. Why I count this as a DevRel case
Maybe "DevRel" is a big word for a Student Association project, but the underlying exercise is the same: noticing that a community is stuck because of a poorly positioned tool, and building the bridge for more people to take part.
The audience here is a community of people who want to become devs but don't yet operate like a software team. That changes the nature of the thing and the design of the solution. There are first-year students, upper-year students, builders, Student Association members, people on Windows, people who only use a phone, people who want to contribute but get stuck on Git. The solution had to respect that heterogeneity. A decision made with only the most technical profile in mind would, again, leave the majority out.
Personally, I like this kind of engineering. Building the system is half the work. The other half is building the interface between people and the system, and that half almost never shows up in a README.
11. What I'd tell another Student Association
Don't start by asking which tool is more technical. Start with: who consumes, who contributes, who reviews, and who maintains it when the current leadership leaves. Then ask where the current tool creates unintentional gatekeeping, and what can be automated without becoming a black box. For us, the answer was dropping GitHub as the main interface and using Notion as the operational layer. But the moral of the story isn't "use Notion." It's: choose the interface based on the community, and design the automation to sustain that choice.
12. Final reflection
The decisive thing here was realizing that knowledge management had become dependent on a handful of people with enough technical fluency to handle a giant, inconsistent repo. From that diagnosis on, the project stopped being a file migration and became an organizational redesign.
The biggest risk wasn't technical, it was cultural. Changing contribution habits takes time. And the most delicate trade-off was giving up GitHub's open source aesthetic as the main interface. The repository still exists as a backup and historical reference, but the live flow happens where the community can actually operate.
CAECOMP didn't need another place to store PDFs. It needed a system where hundreds of students could find, contribute, and maintain academic knowledge without depending on the Git fluency of a few. The project was about trading a workflow that looked open for a participation infrastructure that works.
Have a technical problem that deserves a serious answer?
Tell me what you’re building, what’s stuck, or where you need a stronger technical owner. I’ll look at the context and tell you straight whether Luminiware is the right place for it.
No sales pitch. The first conversation is about the problem, the constraints, and how much engineering ownership it really needs.