Talk Commerce
Floxy Ranked 15 AI Coding Tools by Data Retention. Your Commerce Stack Probably Touches Most of Them.
5 min read


By Brent W. Peterson


Google Gemini holds developer code for 540 days. Replit holds it for 7. The spread matters when the code being held is your custom commerce stack.

Floxy, an IP infrastructure provider, just released a study scoring 15 popular AI coding assistants on the security risk they pose to developer code. The top of the leakage table is Google Gemini at 99 out of 99, mostly because Gemini retains developer prompts and code for 540 days, the longest of any tool in the study. The list goes on through Bolt.new, Lovable.dev, Claude Code, Replit, Microsoft Copilot, ChatGPT, Amazon Q Developer, GitHub Copilot, and v0.dev. The whole table is worth a read at floxy.io. The piece I want to pull out for commerce operators is what those retention windows mean when the code being held is the proprietary logic running your storefront.

What the data shows

Floxy scored each tool across five inputs: AI agent tool usage rate (how many developers report using it), data retention period (days the platform keeps user code), user data training risk (whether the platform uses code to train future models), hallucination rate, and downtime rate. Each input is converted into a risk score from 1 to 99, and the five combine into a final composite.
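Floxy does not publish the exact normalization or weighting behind its composite, so the mechanics can only be sketched. The following is an illustrative model, assuming simple min-max scaling onto the 1 to 99 range and an equal-weight average; the input values for the example tool are partly taken from the table below and partly invented.

```python
def to_risk_score(value, lo, hi):
    """Linearly scale a raw input onto the 1-99 risk range (assumed mapping)."""
    if hi == lo:
        return 1.0
    return 1 + 98 * (value - lo) / (hi - lo)

def composite(inputs, bounds, weights=None):
    """Combine per-input risk scores into one composite (assumed equal weights)."""
    names = list(inputs)
    weights = weights or {n: 1 / len(names) for n in names}
    scores = {n: to_risk_score(inputs[n], *bounds[n]) for n in names}
    return sum(weights[n] * scores[n] for n in names)

# Example tool: usage, retention, training risk, and hallucination values
# come from the Floxy table; the downtime figure is invented.
gemini_like = {
    "usage_rate": 47.4,     # percent of developers reporting use
    "retention_days": 540,  # days code is retained
    "training_risk": 7,     # already expressed as a 1-99 score in the study
    "hallucination": 7.0,   # percent
    "downtime": 1.0,        # percent (invented for the example)
}
bounds = {
    "usage_rate": (0, 100),
    "retention_days": (7, 540),   # 7-day to 540-day spread in the study
    "training_risk": (1, 99),
    "hallucination": (0, 100),
    "downtime": (0, 100),
}
print(round(composite(gemini_like, bounds), 1))
```

Note how the 540-day retention maxes out its input at 99 while the other inputs stay low; Floxy's real weighting evidently leans harder on retention and training risk than this equal-weight sketch does, which is why Gemini's published composite is 99 rather than mid-range.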

The top ten:

Tool                Retention   Training Risk           Hallucination   Final Score
Google Gemini       540 days    7 (opt-out available)   7.0%            99
Bolt.new            30 days     10 (unclear policy)     10.3%           85
Lovable.dev         90 days     7 (opt-out available)   10.3%           70
Claude Code         30 days     7 (opt-out available)   10.3%           57
Replit              7 days      7 (opt-out available)   10.3%           56
Microsoft Copilot   180 days    7                       5.6%            54
ChatGPT             30 days     7                       9.6%            47
Amazon Q Developer  90 days     7                       5.1%            42
GitHub Copilot      28 days     7                       5.6%            39
v0.dev              20 days     7                       10.3%           37

A few things stand out. Retention windows differ by a factor of 77 across the tools studied. The platforms with the highest usage rates (ChatGPT at 81.7%, GitHub Copilot at 67.9%, Gemini at 47.4%, Claude Code at 40.8%) span the full spread of policies, so usage alone is no signal for safety. The platform Floxy flags as most opaque on training data, Bolt.new, is also one of the platforms most aggressively pitched into the headless storefront and vibe-coding wave of 2025 and 2026.

Aimen Hallou, Chief Technology Officer at Floxy, frames the leakage scenario this way in the release. “We’re seeing a new category of security incident where companies discover their code showed up in a competitor’s product, and the only explanation is a shared AI training dataset. When your code trains a model, you lose control over where it goes. The model learns patterns from your implementation, and those patterns can surface in suggestions given to other developers, including your competitors.”

The retention numbers come from each vendor’s published policies, which Floxy aggregated. The composite risk score is Floxy’s own weighting. The leakage scenario above is Floxy’s framing, not an independently benchmarked finding.

Why commerce code is exposed

Commerce stacks are not generic developer code. They carry pricing logic, promotion rules, payment integration patterns, customer data handling routines, fraud rules, and the parts of the checkout that took twelve months to tune. A standard headless build today involves AI assistance at almost every layer. Storefront scaffolds drafted in Bolt.new or Lovable.dev. Custom API integrations written in Claude Code or Copilot. Third-party connectors generated in Cursor. PDP code refactored in v0.dev.

The commerce operator running that build often has no inventory of which tools touched which parts of the codebase. Devs pick the tool that ships fastest on a given day. The proprietary checkout flow that gives a brand its conversion edge may have moved through three different AI assistants by the time it goes live, each with a different retention policy and a different training default.

The 540-day retention number is the eye-catcher in the Floxy data. The more practical risk for commerce teams sits in the unclear category. Bolt.new lands in second place on the leakage list not because it retains data the longest (30 days, the same window as ChatGPT and Claude Code) but because its training policy is opaque. A platform that does not clearly state whether your code trains its model is a platform you cannot reason about. That ambiguity is harder to govern than a long-but-stated retention window.

What this means for commerce

Three takeaways worth borrowing from how the report is structured.

  1. Inventory the tools. Before you read another paragraph about AI governance, find out which AI coding tools your dev team is using week to week. The Floxy study lists 15. Most commerce teams I talk to can name three or four on their stack and are surprised when a real audit surfaces six or seven.

  2. Read each tool’s retention and training policy yourself. Vendor wording moves. The numbers Floxy aggregated are accurate as of the publication date and may not match what is in the policy six months from now. Add the check to your vendor review cycle alongside third-party SaaS contracts.

  3. Default to opt-out on training. Most of the tools in the Floxy table offer training opt-out. Most also default to opting in. The dev who turned on Cursor or Copilot last quarter probably did not flip that toggle. For proprietary commerce logic, opting out of training is the cheap and correct setting.
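The first two takeaways boil down to a living inventory: one record per tool, its published retention window, and whether the training opt-out has actually been flipped. A minimal sketch of that record, using retention windows and opt-out availability from the Floxy table; the tool list and the "opted_out" flags are hypothetical team settings, not anything from the study.

```python
from dataclasses import dataclass

@dataclass
class AiToolRecord:
    name: str
    retention_days: int      # from the vendor's published policy
    opt_out_available: bool  # per the Floxy table
    opted_out: bool          # has someone on the team flipped the toggle?

def needs_action(tool: AiToolRecord) -> bool:
    """Flag tools still defaulting to training despite an available opt-out."""
    return tool.opt_out_available and not tool.opted_out

# Hypothetical stack surfaced by an audit; opted_out values are invented.
stack = [
    AiToolRecord("Google Gemini", 540, True, False),
    AiToolRecord("Claude Code", 30, True, False),
    AiToolRecord("Replit", 7, True, True),
]

# Longest retention first: the tools holding code longest get reviewed first.
for tool in sorted(stack, key=lambda t: t.retention_days, reverse=True):
    flag = "FLIP OPT-OUT" if needs_action(tool) else "ok"
    print(f"{tool.name:<16} {tool.retention_days:>4}d  {flag}")
```

Sorting by retention window is a deliberate choice: it puts the vendor holding your checkout logic for 540 days at the top of the review queue, which is where takeaway 2's policy re-check belongs.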

The deeper point the report does not quite reach is that AI coding tools have become part of the commerce stack, and the data handling rules for that category do not really exist yet. Treating these tools the way commerce teams treat any vendor touching proprietary data is the move. Inventory, retention review, training opt-out, contracts that name the data class.

The full Floxy study and methodology are at floxy.io. The composite scores and the broader leakage framing come from a vendor that sells infrastructure adjacent to the same problem, so weigh them accordingly. The vendor-policy retention windows in the table are the durable part of the data and the part worth saving.