Good points. The easy experimentation factor is helpful for development, though ...

bix6 · 2026-05-30T18:24:32 1780165472

It’s interesting how much focus there is on opting out of training. Sometimes I worry there is an intentional focus on that so people don’t think about the other ways the company might be profiting off our data. Like, I pay for Anthropic and they don’t train on that, but are they selling my “anonymized” usage data in some other way?

derefr · 2026-05-30T18:38:01 1780166281

From what I recall, these companies don't offer any option to opt out of your session transcript data being used (and sold!) for "regular" adtech targeting purposes.

nl · 2026-05-31T02:23:07 1780194187

Anthropic explicitly state that they don't do this, even if you use the free plan and even if you don't opt out of letting them use your data for training:

"We do not sell users’ data to third parties."

https://www.anthropic.com/news/updates-to-our-consumer-terms

derefr · 2026-05-31T14:45:48 1780238748

That answers for the "sold" part but not for the "used" part.

I.e. nothing about this statement prevents Anthropic from running ads within Claude, as long as they run the ad-placement auctions themselves, and so aren't leaking any of the data they're using to decide which placements are relevant to which users+sessions. (This is the same thing Google does for SERP ad auctions.)

But actually, and perhaps more interestingly, nothing about this statement prevents Anthropic from building a Google AdSense competitor either. Other sites (or mobile apps, etc) could plop in an Anthropic ad iframe; and it'd be Anthropic's knowledge of your interactions with Claude that would drive what ads would show up in that iframe. The embedding site doesn't know what ads the users are seeing, so that's still not "selling users' data to third parties", per se.

nl · 2026-05-31T02:14:47 1780193687

> You should expect that any inputs and outputs are going into someone's training database.

OpenRouter explicitly lets you filter by zero-data-retention providers: https://openrouter.ai/models?zdr=true

derefr · 2026-05-30T18:33:33 1780166013

> You should expect that any inputs and outputs are going into someone's training database.

True enough, in theory; but what exactly are you imagining would be a useful-enough signal in the OpenRouter request+response stream, that any company would want their data as training material?

Even a single OpenRouter-API-key-identified subscriber's traffic may consist of a mixture of traffic from multiple different sessions, under potentially multiple different end-users. (Where, if the subscriber is doing security correctly, their OpenRouter key lives on a gateway rather than in a frontend app; and so the only IP address / UA / etc. OpenRouter sees is that of the gateway itself.)

And the traffic stream may also invoke multiple models, and provide multiple different system prompts for those models; which, while marked in the traffic (i.e. conveyed as part of each request), makes the resulting data much less useful in aggregate than if it were all training data for one model with one system prompt.

Plus, there are no RLHF signals in OpenRouter data. Even if OpenRouter wanted to build a general model-neutral framework for collecting RLHF-type data, it can't force subscriber apps to do the UI-level stuff necessary to collect it (i.e. the things ChatGPT/Claude do, with "thumbs-down" buttons, A/B tested responses, etc.) Analysis would have to rely on pure transcript-level user sentiment extraction.

reed1234 · 2026-05-30T20:16:00 1780172160

You get a 1% discount if you give OpenRouter your traces, so at least they think there's some (a lot of) value in them.

aargh_aargh · 2026-06-01T16:58:27 1780333107

I had no idea what traces are in this context. While looking, I found this post from @OpenRouter:

https://x.com/OpenRouter/status/2041193329270878707

  > Privacy:
  > 
  > Private I/O logging and the 1% data sharing discount are separate settings. You control each independently.
  > 
  > Input & Output Logging stores prompts and completions for your use only and makes them visible in your logs. OpenRouter does not access this data. You can configure it in your observability settings.
  > 
  > As always, more in the docs!

https://openrouter.ai/docs/guides/features/input-output-logg...

https://openrouter.ai/docs/guides/features/broadcast/overvie...

reed1234 · 2026-06-02T15:43:42 1780415022

The logging thing is like Langfuse.

nl · 2026-05-31T02:19:46 1780193986

> Plus, there are no RLHF signals in OpenRouter data. Even if OpenRouter wanted to build a general model-neutral framework for collecting RLHF-type data, it can't force subscriber apps to do the UI-level stuff necessary to collect it (i.e. the things ChatGPT/Claude do, with "thumbs-down" buttons, A/B tested responses, etc.)

The majority of RLHF data doesn't need this. The majority is software development and/or tool calling, where the agent gets a signal back as to whether it succeeded (e.g. compilation errors, test errors). It's true that end-of-trajectory signals (e.g. did this task do what you wanted) are even more useful, but even partial signals are great for RL training.

lxgr · 2026-05-30T22:42:26 1780180946

> what exactly are you imagining would be a useful-enough signal in the OpenRouter request+response stream, that any company would want their data as training material?

Isn't this a treasure trove for any model distillation effort?

gbro3n · 2026-05-30T18:58:25 1780167505

I've wondered this too: exactly how are our inputs and outputs useful as training data? So I asked Gemini. Apparently negative sentiment in user or LLM responses can serve as an RLHF signal, and the human prompts can also serve as useful data on what problems the LLMs need to be able to solve. Smaller models can also train on and improve from the outputs of larger models, but that's less relevant when not switching models in context.

mannanj · 2026-05-30T22:05:14 1780178714

How about protection of intellectual property? Doesn’t have to be patented to be valuable.

tasuki · 2026-05-30T19:14:11 1780168451

> Clearly anyone who can pay should be using paid models with privacy protections

Clearly, anyone who needs privacy should be using models with privacy protections. Some people build open source and the models will get the code anyway.

derac · 2026-05-30T18:36:15 1780166175

I recommend NVIDIA NIM for completely free dev access for young people.

acka · 2026-05-30T19:48:49 1780170529

It's free, but not unlimited. Besides rate limits, new sign-ups get 1000 credits (requests), and once those are gone, they're gone for good. Only business accounts might get a couple of free refills.

derac · 2026-05-31T21:39:36 1780263576

It is unlimited under the free NVIDIA Developer Program. You're talking about a different sort of account, I think. The dev program account is rate-limited to 40 RPM but otherwise unlimited for personal use.

ssivark · 2026-05-31T02:55:02 1780196102

Is there a way to check/track your available credits?