FlatTurtle Blog

A Leaner GTFS Archive

A quick housekeeping update for everyone using our public GTFS archive at gtfs.flatturtle.cloud. The short version: the archive just got noticeably smaller, but nothing is missing.

For some background, our earlier post covers what we keep there. In short: a daily snapshot of GTFS feeds from the four Belgian operators (De Lijn, STIB-MIVB, SNCB-NMBS, and TEC), going back several years. Each file is timestamped, like delijn-gtfs_2026-04-27.zip.

Operators don’t publish a new feed every day. STIB-MIVB and TEC, in particular, often go many days without a real update. One STIB-MIVB version was identical for 22 days in a row, and we were dutifully keeping all 22 copies. At some point, it started making sense to save on storage.

We ran MD5 checksums across the whole archive. Where multiple zips were byte-for-byte the same, we kept the earliest one and removed the duplicates. We also cleaned up a handful of zero-byte files left over from old downloads that had silently failed. From now on, we use an HTTP conditional GET against the upstream ETag, so we no longer re-archive a feed the operator hasn’t actually republished.
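For the curious, here is a rough sketch of what such a cleanup pass could look like in Python. It is not our actual script. The function names are made up for illustration, and it assumes a flat directory of files named like delijn-gtfs_2026-04-27.zip, where sorting by filename puts the earliest date first. It only returns a list of candidates and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large zips never have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_cleanup(archive_dir: Path) -> list[Path]:
    """Return files that could be removed (hypothetical helper).

    Candidates are zero-byte zips and every later copy of a zip that is
    byte-for-byte identical to an earlier one from the same feed.
    """
    to_remove = []
    copies = defaultdict(list)
    # Assumption: names end in _YYYY-MM-DD.zip, so name order is date order.
    for path in sorted(archive_dir.glob("*.zip")):
        if path.stat().st_size == 0:
            to_remove.append(path)  # leftover from a failed download
            continue
        feed = path.name.rsplit("_", 1)[0]  # e.g. "delijn-gtfs"
        copies[(feed, file_md5(path))].append(path)
    for paths in copies.values():
        to_remove.extend(paths[1:])  # keep the earliest copy only
    return sorted(to_remove)
```

Grouping by feed name as well as checksum is a choice made for this sketch: it keeps two operators' files apart even in the unlikely case that they hash the same.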

The result: about 94 GiB freed, roughly a third of the archive. And a small bonus: the date in each filename now tells you when that version was first published, rather than just the last time we re-downloaded an unchanged file.

Every unique version of every feed is still there, and the archive should grow more slowly from here on.
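For readers who want to do the same with their own mirrors, the conditional GET mentioned above can be sketched roughly like this, using only Python's standard library. The function name and the idea of passing in the ETag saved from the previous download are assumptions for illustration, not a description of our downloader. It also assumes the upstream server actually sends an ETag header and honours If-None-Match.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def fetch_if_changed(
    url: str, known_etag: Optional[str]
) -> Optional[Tuple[bytes, Optional[str]]]:
    """Download url unless the server says it still matches known_etag.

    Returns (body, new_etag) when there is a new version, or None when
    the server answers 304 Not Modified. Hypothetical helper.
    """
    request = urllib.request.Request(url)
    if known_etag:
        # Ask the server to send the body only if the ETag has changed.
        request.add_header("If-None-Match", known_etag)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # unchanged upstream, nothing to archive
        raise
```

A caller would store the returned ETag next to the saved zip and pass it back in on the next run. When the function returns None, no new file is written for that day.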