When Iceberg Says 321 GB and S3 Says 46 TB

Iceberg said 321 GB. S3 said 46 TB.

Same table.

That mismatch is what happens when the table and the storage path stop telling the same story. In this case, the culprit was huge orphan-file accumulation.

The uncomfortable part: both numbers can be true.

Iceberg and S3 are looking at different things.

Iceberg asks: which files are still part of the table?

S3 asks: which objects still exist under this path?

That gap is where the bill hides. Some files are retained table history. Some are leftovers from failed jobs. Some are true orphan files that the table will never read again.

If you operate Iceberg on object storage, you need to care about both.

How this happens

An Iceberg table is not just a folder full of Parquet files. Think of it as a table plus an index of the files that count.

Queries follow the index. They do not scan every object in the S3 folder and decide what belongs.

That index is Iceberg metadata:

the table metadata file points at the current snapshot and snapshot log
each snapshot points at a manifest list
each manifest list points at manifest files
manifest files point at data files and delete files

If a Parquet file is not reachable through that metadata, the table should not read it.
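To make the reachability idea concrete, here is a minimal sketch that walks a toy, in-memory version of that chain. Real Iceberg metadata lives in JSON and Avro files; the dictionaries, paths, and the reachable_files helper below are all hypothetical stand-ins for illustration.

```python
# Toy, in-memory model of the Iceberg metadata chain.
# Every name and path here is hypothetical.
TABLE_METADATA = {
    "current-snapshot-id": 2,
    "snapshots": {
        1: "metadata/snap-1.avro",  # snapshot id -> manifest list
        2: "metadata/snap-2.avro",
    },
}
MANIFEST_LISTS = {
    "metadata/snap-1.avro": ["metadata/m1.avro"],
    "metadata/snap-2.avro": ["metadata/m1.avro", "metadata/m2.avro"],
}
MANIFESTS = {
    "metadata/m1.avro": ["data/a.parquet", "data/b.parquet"],
    "metadata/m2.avro": ["data/c.parquet", "data/c-deletes.parquet"],
}


def reachable_files(snapshot_ids):
    """Walk snapshot -> manifest list -> manifests -> data and delete files."""
    reachable = set()
    for snapshot_id in snapshot_ids:
        manifest_list = TABLE_METADATA["snapshots"][snapshot_id]
        reachable.add(manifest_list)
        for manifest in MANIFEST_LISTS[manifest_list]:
            reachable.add(manifest)
            reachable.update(MANIFESTS[manifest])
    return reachable


# Files the current table version reads.
current = reachable_files([TABLE_METADATA["current-snapshot-id"]])
# Files any retained snapshot still needs (iterating the dict yields snapshot ids).
retained = reachable_files(TABLE_METADATA["snapshots"])

print(sorted(current))
print("data/orphan.parquet" in retained)  # nothing in the metadata points at it
```

The useful distinction is between current and retained: a file can drop out of the current snapshot and still be needed by table history.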

S3 does not care about that metadata graph. S3 has objects, keys, sizes, versions, lifecycle rules, retention policies, replication state, and a bill.

That separation is the point of Iceberg. It is also the trap.

A migration can copy files into the table path and fail before the commit. A rewrite can produce replacement files and leave old files behind until snapshot expiration and cleanup are allowed to remove them. Snapshot expiration can free files tied only to old table versions, but it does not prove the whole prefix is clean.

A maintenance workflow can report green because a scheduler task ran, even if it skipped the tables that needed physical cleanup.

That is how a table with hundreds of gigabytes of live data can drag around tens of terabytes of physical baggage it no longer reads or no longer needs.

The failure is usually a chain

Huge orphan accumulation rarely comes from one dramatic bug. It is usually a chain of small operational lies.

First, something writes files the current table will not keep forever, or will never reference at all. A failed commit, migration, backfill, compaction job, format conversion, manual copy, or “temporary” export lands objects under the table path.

Second, the table keeps moving. New snapshots arrive. Compaction rewrites more data. Deletes create delete files. Maintenance produces more metadata. Queries still work, so nobody feels urgent pain.

Third, cleanup becomes a checkbox. Someone sees that “Iceberg maintenance” ran and assumes storage was reclaimed. But table maintenance is not one thing. Compaction is not snapshot expiration. Snapshot expiration is not orphan-file removal. Manifest rewrite is not a storage audit.

Fourth, cost visibility comes from the wrong layer. The platform watches Iceberg metadata and query performance, while the storage bill is growing in S3. By the time finance notices, the table prefix is full of unexplained files.

Nobody needs to be reckless for this to happen. They only need to treat metadata health and storage health as the same signal.

Why metadata alone misleads platform owners

Table metadata answers a table question: what files belong to this table?

Object storage answers a storage question: what bytes exist under this path?

Both are true. Neither is enough alone.

If the platform only monitors Iceberg metadata, the table can look healthy. Queries work. Row counts make sense. Live data size looks manageable. Time travel works inside the retained snapshot window.

Meanwhile S3 may show a completely different operational reality: old Parquet files still retained for history, old delete files, abandoned metadata files, copied migration outputs, and failed rewrite outputs still sitting under the same table path.

The fix is not another chart with a nicer color. The fix is a repeated check:

list the files Iceberg still references
list the objects S3 still stores
compare the two sets
explain the delta before deleting it
verify physical bytes after cleanup

If you cannot do that, you do not know whether the table path is clean. You only know the query engine can still find its current files.
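The compare-and-explain step can be sketched as a set difference plus a grouping that makes the delta explainable. This is an illustration under simple assumptions: both lists already use the same key format, and the reconcile function and sample paths are hypothetical.

```python
from collections import defaultdict


def reconcile(referenced, stored):
    """Compare what Iceberg references with what the object store holds.

    referenced: set of object keys reachable from retained table metadata.
    stored: dict of object key -> size in bytes, from an object listing.
    """
    unreferenced = {key: size for key, size in stored.items() if key not in referenced}
    # Referenced but not stored is a different, more urgent problem: broken reads.
    missing = referenced - set(stored)

    # Group the delta by parent directory so a human can explain it.
    delta_by_dir = defaultdict(int)
    for key, size in unreferenced.items():
        delta_by_dir[key.rsplit("/", 1)[0]] += size
    return unreferenced, missing, dict(delta_by_dir)


referenced = {"t/data/a.parquet", "t/data/b.parquet"}
stored = {
    "t/data/a.parquet": 100,
    "t/data/b.parquet": 100,
    "t/data/backfill-tmp/x.parquet": 4000,
    "t/data/backfill-tmp/y.parquet": 6000,
}
unreferenced, missing, delta_by_dir = reconcile(referenced, stored)
print(delta_by_dir)
```

Grouping by directory is a design choice, not a rule. The point is that "10,000 bytes under backfill-tmp" is something an owner can explain, while a flat list of keys usually is not.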

S3 Inventory is not S3 Storage Lens

The common mistake is reaching for a summary tool when the problem needs an object list.

S3 Storage Lens is useful. It gives high-level visibility into storage usage and activity. It is good for budget-review questions like “which buckets are growing fastest?” and “where is the storage concentrated?”

It is not the tool I would use to prove which objects are unreachable from an Iceberg table.

For that, use S3 Inventory.

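S3 Inventory produces scheduled object-level reports, daily or weekly, for a bucket or prefix. In plain English: it gives you a file list. The report can include the object key, size, last modified date, storage class, encryption status, replication status, version information, and other object attributes. It is a point-in-time listing, so objects written shortly before a report is generated may not appear in it yet.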

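That file-list shape matters. You can query Inventory with Athena, Spark, or Trino. You can list the files Iceberg still references from table metadata, for example from metadata tables such as all_files and all_manifests. Then you can join the two lists. Here s3_inventory and iceberg_reachable_files stand for tables you build from the Inventory report and the metadata export: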

select
  inv.key,
  inv.size,
  inv.last_modified_date
from s3_inventory inv
left join iceberg_reachable_files live
  on inv.key = live.object_key
where inv.key like 'warehouse/db/table/%'
  and live.object_key is null;

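That query is the beginning of an orphan-file investigation. Not the end. It also assumes both sides spell the path the same way: Iceberg metadata usually records full file locations such as s3://bucket/warehouse/db/table/..., while Inventory lists bucket-relative keys (URL-encoded in CSV reports), so normalize before you join. You still need safety checks for active writers, time travel, tags, branches, object versions, retention rules, and non-Iceberg files that may legitimately share a prefix.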
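One practical wrinkle with the join is that the two sides rarely spell paths identically. The sketch below runs the same anti-join in an in-memory SQLite database after normalizing both sides. It assumes Iceberg records full locations like s3://bucket/key and that the Inventory report is in CSV format with URL-encoded keys; the bucket name, sample rows, and the to_inventory_key helper are hypothetical.

```python
import sqlite3
from urllib.parse import unquote


def to_inventory_key(file_path):
    """Turn 's3://bucket/a/b.parquet' (or s3a://...) into the key 'a/b.parquet'."""
    without_scheme = file_path.split("://", 1)[1]
    return without_scheme.split("/", 1)[1]


# Hypothetical Inventory rows: (key, size, last_modified_date), keys URL-encoded.
inventory_rows = [
    ("warehouse/db/table/data/a.parquet", 100, "2026-01-01"),
    ("warehouse/db/table/data/dt%3D2026-01-01/b.parquet", 200, "2026-01-02"),
    ("warehouse/db/table/data/failed-job/c.parquet", 300, "2026-01-03"),
]
# Hypothetical file locations exported from Iceberg metadata.
iceberg_paths = [
    "s3://my-bucket/warehouse/db/table/data/a.parquet",
    "s3://my-bucket/warehouse/db/table/data/dt=2026-01-01/b.parquet",
]

conn = sqlite3.connect(":memory:")
conn.execute("create table s3_inventory (key text, size integer, last_modified_date text)")
conn.execute("create table iceberg_reachable_files (object_key text)")
conn.executemany(
    "insert into s3_inventory values (?, ?, ?)",
    [(unquote(key), size, modified) for key, size, modified in inventory_rows],
)
conn.executemany(
    "insert into iceberg_reachable_files values (?)",
    [(to_inventory_key(path),) for path in iceberg_paths],
)

candidates = conn.execute(
    """
    select inv.key, inv.size, inv.last_modified_date
    from s3_inventory inv
    left join iceberg_reachable_files live
      on inv.key = live.object_key
    where inv.key like 'warehouse/db/table/%'
      and live.object_key is null
    order by inv.key
    """
).fetchall()
print(candidates)
```

Without the unquote step, the partitioned file b.parquet would show up as a false orphan candidate, which is exactly the kind of mistake that makes bulk deletion dangerous.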

But Inventory gives you the raw material Storage Lens does not: the object list.

Use Storage Lens to find the growing area. Use S3 Inventory to identify the exact objects.

The compliance angle

Orphan files are not just wasted storage.

If old files contain sensitive data, they may survive outside the retention rules and cleanup controls you think govern that table.

The catalog says the table was cleaned. The retention policy says old data should be gone. The query engine no longer sees the old files. The object store still has them.

That is not a theoretical governance annoyance. It changes the answer during an audit, a breach review, a deletion request, or a regulated retention check.

The platform team may honestly say, “the table no longer references that data.” The security reviewer will ask a different question: “does the data still exist?”

S3 answers that question.

Iceberg maintenance is several different jobs

The maintenance vocabulary matters because the failures are different. Apache Iceberg’s own maintenance docs split these operations apart for a reason.

Data-file compaction

Does: Rewrites many small data files into fewer larger files.
Protects: Query planning, file-open overhead, and scan efficiency.
Miss it: Small files pile up; queries get slower even if storage size looks normal.

Delete-file compaction

Does: Reduces row-level delete-file overhead in tables that use row-level deletes (format v2 and later).
Protects: Read performance on update, delete, and merge-heavy tables.
Miss it: Reads keep paying to merge base files with scattered delete files.

Manifest rewrite

Does: Reorganizes manifest metadata so planning has less metadata to scan.
Protects: Coordinator memory, planning latency, and metadata scan cost.
Miss it: The table can have reasonable data files but awful planning behavior.

Snapshot expiration

Does: Drops old table versions and files referenced only by expired snapshots.
Protects: Metadata growth, storage tied to old versions, and bounded time travel.
Miss it: Every rewrite can keep old files alive because old snapshots still need them.

Orphan-file removal

Does: Deletes table-location files not referenced by valid Iceberg metadata after the configured safety window.
Protects: Physical storage cost and undeclared retained data.
Miss it: Failed writes and migrations can leak unreferenced objects forever.

The operations interact. Compacting data creates a new snapshot. Snapshot expiration decides when old compacted-away files are no longer needed by retained table history. Orphan cleanup catches files that are not reachable through the valid metadata graph. Manifest rewrite helps the engine plan the table without drowning in metadata.
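A toy model helps show why snapshot expiration and orphan cleanup are separate jobs. The sketch below only classifies files and deletes nothing. The snapshot contents, paths, and the classify function are hypothetical, and manifests and delete files are left out to keep it short.

```python
# snapshot id -> data files that snapshot references (hypothetical paths)
snapshots = {
    1: {"data/small-1.parquet", "data/small-2.parquet"},
    2: {"data/compacted-1.parquet"},  # written by compaction, replaces snapshot 1's files
}
# What the object store holds, including a file from a write that never committed.
stored = snapshots[1] | snapshots[2] | {"data/failed-write.parquet"}


def classify(stored, snapshots, retained_ids):
    """Split stored files by why they exist, given which snapshots are retained."""
    retained = set().union(*(snapshots[i] for i in retained_ids))
    ever_referenced = set().union(*snapshots.values())
    return {
        "retained": stored & retained,
        "freed_by_expiration": (stored & ever_referenced) - retained,
        "orphan": stored - ever_referenced,
    }


before = classify(stored, snapshots, retained_ids=[1, 2])
after = classify(stored, snapshots, retained_ids=[2])
print(before["freed_by_expiration"])  # empty: snapshot 1 still needs the small files
print(after["freed_by_expiration"])   # the small files, once snapshot 1 expires
print(after["orphan"])                # unchanged: expiration never sees this file
```

Expiring snapshot 1 frees the compacted-away files, but the failed write stays in the orphan bucket either way because no snapshot ever referenced it.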

Skip one, and the symptom moves somewhere else.

Run compaction without snapshot expiration and you can improve read performance while retaining old files for too long.

Expire snapshots without orphan cleanup and you may still miss files that were never safely attached to table history.

Delete files too aggressively and you can break active writers, time travel, branches, tags, or recovery assumptions.

Run all of it without object-store reconciliation and you still do not know whether the bill moved.
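For teams running the open source primitives through Spark, the separate jobs map to separate procedure calls. The sketch below only builds the SQL strings. The procedure names follow Apache Iceberg's Spark procedures, but the catalog name, table name, 7-day snapshot retention, and 3-day orphan safety window are placeholder assumptions, and the maintenance_statements helper is hypothetical. Check the procedure arguments against the Iceberg version you run.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(
    catalog,
    table,
    now,
    snapshot_retention=timedelta(days=7),    # assumption, not a recommendation
    orphan_safety_window=timedelta(days=3),  # assumption, must exceed your longest write
):
    """Build one CALL statement per maintenance job, in a deliberate order."""

    def ts(moment):
        return "TIMESTAMP '" + moment.strftime("%Y-%m-%d %H:%M:%S") + "'"

    expire_before = ts(now - snapshot_retention)
    orphan_before = ts(now - orphan_safety_window)
    return [
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        f"CALL {catalog}.system.rewrite_position_delete_files(table => '{table}')",
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', older_than => {expire_before})",
        # dry_run lists candidates without deleting, so the delta can be reviewed first.
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => {orphan_before}, dry_run => true)",
    ]


statements = maintenance_statements(
    "my_catalog", "db.events", datetime(2026, 6, 15, 12, 0, 0, tzinfo=timezone.utc)
)
for statement in statements:
    print(statement)
```

Each string would be passed to something like spark.sql(statement) in a real job. Keeping them as five separate statements makes it obvious which job ran, which was skipped, and which one failed. None of them tells you whether physical bytes moved.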

Vendor support does not mean vendor maintenance

The phrase “supports Iceberg” is too vague to be useful. It can mean read support, write support, catalog integration, managed optimization, SQL procedures, or a service-level maintenance policy.

Here is the practical version to check before trusting the storage bill. This is a June 2026 documentation snapshot, not a contract. Re-check the current vendor docs before turning it into policy.

Apache Iceberg OSS

Maintenance model: Primitives are available; you schedule and observe them.
Typical controls: Expire snapshots, remove orphan files, rewrite data files, rewrite manifests.
Watch: Nothing runs just because the table exists.

Amazon Athena

Maintenance model: Manual SQL maintenance.
Typical controls: OPTIMIZE rewrites data files; VACUUM expires snapshots and removes orphan files.
Watch: You own scheduling, table selection, and proof that S3 bytes changed.

AWS Glue / Lake Formation

Maintenance model: Managed optimizers for Iceberg tables in the Data Catalog.
Typical controls: Compaction, snapshot retention, and orphan-file cleanup can be enabled per table.
Watch: Verify optimizer eligibility, permissions, skipped tables, and job status.

Amazon S3 Tables

Maintenance model: Managed table-bucket maintenance.
Typical controls: Compaction, snapshot management, and unreferenced file removal.
Watch: Maintenance has documented limitations; tags, branches, or retention properties can stop snapshot management.

Snowflake

Maintenance model: Depends on Iceberg table type and catalog/storage ownership.
Typical controls: Snowflake documents automatic maintenance behavior for some table/storage models.
Watch: Snowflake's orphan-file story depends on the table and storage model; do not assume external-volume cleanup works like Snowflake-managed storage.

Databricks

Maintenance model: Unity Catalog managed tables get managed storage and optimization features.
Typical controls: Predictive optimization can run OPTIMIZE and VACUUM for supported Unity Catalog managed tables.
Watch: External tables, foreign tables, and interoperability modes do not all inherit the same maintenance behavior.

BigQuery

Maintenance model: Managed Iceberg tables perform automatic table management.
Typical controls: Compaction, clustering, garbage collection, and metadata generation or refresh.
Watch: Managed Iceberg tables and externally managed Iceberg tables are different operational products.

Dremio

Maintenance model: SQL maintenance commands and platform maintenance features.
Typical controls: OPTIMIZE TABLE rewrites data and manifest files; VACUUM TABLE expires old snapshots and can clean orphan files.
Watch: Confirm which catalog/table type supports which cleanup action before assuming orphan deletion exists.

Trino / Starburst

Maintenance model: Connector procedures expose maintenance operations.
Typical controls: Snapshot expiration, orphan-file removal, manifest optimization, and file optimization depending on engine/version.
Watch: The engine may provide the procedure; your platform still owns policy, cadence, isolation, and observability.

The practical question is not “does the vendor support Iceberg?”

The question is: who deletes unreferenced physical files, which retained-history files must stay, under what retention rule, how often, with what failure visibility, and how do you verify the object store afterward?

What to monitor

Start with a boring reconciliation job:

live data size from Iceberg metadata
physical bytes under the table path from storage inventory
reachable file count versus total file count
orphan candidate bytes by age
orphan candidate bytes by partition or prefix
tables the cleanup job skipped or failed
tables the cleanup job selected but could not clean
post-cleanup storage size, verified from object inventory

The important metric is not “did the maintenance job run?” It is “did physical storage converge toward logical table state?”
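As an example of one item from that list, orphan candidate bytes by age can be computed from candidate rows with a few lines. This is a sketch: the bucket boundaries, the sample rows, and the bytes_by_age function are hypothetical choices, not a standard.

```python
from datetime import datetime, timezone

# (upper bound in days, label); anything older lands in OLDEST_LABEL.
AGE_BUCKETS = [(7, "under 7 days"), (30, "7 to 30 days"), (90, "30 to 90 days")]
OLDEST_LABEL = "over 90 days"


def bytes_by_age(candidates, as_of):
    """Sum orphan candidate bytes per age bucket.

    candidates: iterable of (last_modified datetime, size in bytes).
    """
    totals = {label: 0 for _, label in AGE_BUCKETS}
    totals[OLDEST_LABEL] = 0
    for last_modified, size in candidates:
        age_days = (as_of - last_modified).days
        label = next(
            (name for limit, name in AGE_BUCKETS if age_days < limit),
            OLDEST_LABEL,
        )
        totals[label] += size
    return totals


as_of = datetime(2026, 6, 15, tzinfo=timezone.utc)
candidates = [
    (datetime(2026, 6, 14, tzinfo=timezone.utc), 10),
    (datetime(2026, 5, 30, tzinfo=timezone.utc), 20),
    (datetime(2026, 4, 1, tzinfo=timezone.utc), 40),
    (datetime(2026, 1, 1, tzinfo=timezone.utc), 300),
]
print(bytes_by_age(candidates, as_of))
```

Young candidates are often in-flight writes and should be left alone. Old candidates that keep growing between runs are the signal that cleanup is not converging.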

I would also track ratios:

physical bytes / live Iceberg bytes
total object count / reachable file count
orphan candidate bytes older than the writer safety window
snapshots retained versus expected retention
manifests per snapshot
average data file size
delete-file count and delete-file bytes

Ratios are useful because every lake has big tables. The weird table is the one where logical size is flat and physical size keeps growing.
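Here is a small sketch of the first ratio, using the numbers from the opening of this post, plus one way to flag a table whose logical size is flat while physical size grows. The thresholds (3x ratio, 5 percent tolerance), the sample history, and both function names are hypothetical and would need tuning per platform.

```python
def bytes_ratio(live_bytes, physical_bytes):
    """Physical bytes under the table path divided by live Iceberg bytes."""
    return physical_bytes / live_bytes


def needs_review(history, max_ratio=3.0, flat_tolerance=0.05):
    """Flag a table from (live_bytes, physical_bytes) samples, oldest first.

    A table is flagged when the latest ratio is above max_ratio, or when
    live size stayed roughly flat while physical size grew.
    """
    first_live, first_physical = history[0]
    last_live, last_physical = history[-1]
    live_growth = (last_live - first_live) / first_live
    physical_growth = (last_physical - first_physical) / first_physical
    diverging = abs(live_growth) <= flat_tolerance and physical_growth > flat_tolerance
    return bytes_ratio(last_live, last_physical) > max_ratio or diverging


# 321 GB of live data against 46 TB of stored objects (decimal units).
ratio = bytes_ratio(321e9, 46e12)
print(round(ratio, 1))  # roughly 143 times more physical than live

print(needs_review([(100, 120), (101, 180)]))  # flat logical, growing physical
print(needs_review([(100, 120), (200, 240)]))  # both grow together
```

The second check matters because a healthy table with long retention can sit at a stable ratio above 1. The trend is usually more telling than the absolute number.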

The cleanup rule

Do not start by deleting files.

Start by proving why they are safe to delete.

For a large table, the cleanup sequence should look boring:

Freeze the investigation window so you know which Inventory report and Iceberg metadata version you are comparing.
Export the reachable file list from Iceberg metadata.
Query S3 Inventory for the table prefix.
Identify objects not reachable from Iceberg metadata.
Exclude files younger than the active writer safety window.
Exclude files protected by retention, legal hold, branch, tag, or recovery policy.
Sample and explain candidates before bulk deletion.
Delete in batches with audit logs.
Re-run the Inventory-based reconciliation once S3 delivers the next Inventory report.

That sounds slow because it should be slow the first time. Deleting object-store data based on a clever query deserves paranoia.

Once the control is proven, automate the boring parts: candidate generation, safety windows, owner approval, deletion batches, and post-cleanup verification.
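The exclusion and batching steps are the easiest parts to automate first. The sketch below filters candidates by a writer safety window and a protected list, then plans delete batches without deleting anything. The 1,000-key batch size matches the per-request limit of the S3 DeleteObjects API; the 3-day window, the sample keys, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_KEYS_PER_DELETE = 1000  # S3 DeleteObjects accepts at most 1,000 keys per request


def deletable_keys(candidates, as_of, safety_window, protected):
    """Keep only candidates older than the safety window and not protected.

    candidates: dict of object key -> last_modified datetime.
    protected: keys under retention, legal hold, branch, tag, or recovery policy.
    """
    cutoff = as_of - safety_window
    return sorted(
        key
        for key, last_modified in candidates.items()
        if last_modified < cutoff and key not in protected
    )


def plan_batches(keys, batch_size=MAX_KEYS_PER_DELETE):
    """Split keys into numbered batches that can be logged, approved, then deleted."""
    return [
        {"batch": number, "keys": keys[start:start + batch_size]}
        for number, start in enumerate(range(0, len(keys), batch_size), start=1)
    ]


as_of = datetime(2026, 6, 15, tzinfo=timezone.utc)
candidates = {
    "t/data/old-1.parquet": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "t/data/old-2.parquet": datetime(2026, 2, 1, tzinfo=timezone.utc),
    "t/data/in-flight.parquet": as_of - timedelta(hours=2),
    "t/data/legal-hold.parquet": datetime(2025, 12, 1, tzinfo=timezone.utc),
}
keys = deletable_keys(
    candidates,
    as_of,
    safety_window=timedelta(days=3),
    protected={"t/data/legal-hold.parquet"},
)
plan = plan_batches(keys, batch_size=1)  # tiny batch size only to show the split
print(plan)
```

Producing a plan instead of calling the delete API directly is deliberate: the plan is what gets sampled, approved, and written to the audit log before any bytes disappear.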

The practical checklist

For the largest tables, review these things:

Does Iceberg’s live table size roughly match the bytes stored under that table path?
Is orphan-file cleanup scanning every table class it is supposed to clean?
Can you see which tables failed or were skipped, instead of only seeing that the job finished green?
After cleanup runs, can you prove the files actually disappeared from object storage?
Are snapshot retention, branch/tag retention, and recovery windows explicit?
Does compaction reduce file count without creating long-lived old files?
Does snapshot expiration run often enough for the table’s write rate?
Does S3 Inventory show old objects that Iceberg metadata cannot reach?
Who owns the exception list when a table is too risky to clean automatically?

If your S3 bill keeps climbing while your Iceberg tables look small, stop trusting table metadata alone.

The table format tells you what is alive.

The object store tells you what still exists.

The bill only cares about the second one.