Hacker News

The article says that they're partnering to incorporate OpenAI's algorithms into a generative AI solution that SO was already working on in parallel to their Q&A sites, and to allow data from SO sites to be accessible to OpenAI's own solutions.

It doesn't indicate that generative AI is going to be shoehorned into StackOverflow's websites. It would seem counterproductive, in fact, to do that, since the gist of this seems to be that StackOverflow provides a large wealth of organized, validated human-generated knowledge, which is exactly the sort of thing you want to train LLMs on. Feeding AI-generated data back into that would diminish the value of the data SO hosts for that purpose.



Too bad OpenAI already scraped all of this data years ago and is in a position of power here.


Not sure what you mean. Sure, they've scraped a lot of data, but websites are in a position to inhibit further scraping, so it's in their interests to cooperate with data sources they want to rely on.

I'm not sure what "position of power" you could be referring to. Power to do what, with respect to what? OpenAI has useful tools that Stack Overflow wants to apply to its own use cases, and Stack Overflow has good data for training LLMs on. Seems like a straightforward alignment of incentives.


OpenAI has enough motivation to circumvent whatever anti-scraping measures stackoverflow could muster.

I assume stackoverflow's metrics (traffic, number of new questions and answers) are down by an amount they are not happy with, so they are eager to strike any deal before their ship sinks.

At least that's how I read the news piece. Personally, I'm on stackoverflow as often as I've ever been, whereas my ChatGPT usage is down to almost zero.


> OpenAI has enough motivation to circumvent whatever anti-scraping measures stackoverflow could muster.

And even greater motivation to just cooperate with StackOverflow for mutual benefit, rather than engage in a ridiculous arms race with them.

> I assume stackoverflow's metrics (traffic, number of new questions and answers) are down by an amount they are not happy with, so they are eager to strike any deal before their ship sinks.

I'm not sure I'd understand the connection to this even if that were true. The value StackOverflow seems to be bringing to the table is specifically a large dataset of human-curated technical knowledge. Both parties in this arrangement would have strong interest in ensuring that StackOverflow continues to generate this data through its user-centric Q&A website. I'm not sure how a deal with OpenAI would prevent their "ship" from "sinking" if that were the situation they were in.

> Personally, I'm on stackoverflow as often as I've ever been, whereas my ChatGPT usage is down to almost zero.

Same here. ChatGPT is a nice novelty, but I haven't found all that much productive use for it. Most people I know who do use it regularly are using it for either correcting their spelling/grammar, or as a conversational-interface search engine, neither of which I find to be superior to proofreading my own writing or evaluating information from its original sources after doing a conventional search.

But there might be a value-add for StackOverflow in the latter case: finding specific answers to complex questions can be a hit-or-miss proposition, and ChatGPT might at least provide a more efficient way of finding the articles that answer your questions, if implemented properly.

Of course, implementing it properly would likely involve designing the LLM to track the sources of the data it's tokenizing, and present a 'bibliography' for each of its answers, rather than just blindly compositing data from all sources into single probability values.
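The source-tracking idea above is essentially retrieval with citations: rank candidate documents for a query, then return the answer material together with a bibliography of where it came from. Here is a minimal toy sketch of that pattern; the function names, corpus entries, and URLs are all hypothetical, and the keyword-overlap scoring stands in for whatever retrieval a real system would use.

```python
def score(query, text):
    """Crude relevance score: number of words shared between query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def answer_with_sources(query, corpus, top_k=2):
    """Return the best-matching snippets plus a citation list.

    Each corpus entry is a dict with 'url' and 'text' keys, so every
    snippet in the answer stays attached to its source document.
    """
    ranked = sorted(corpus, key=lambda d: score(query, d["text"]), reverse=True)
    hits = [d for d in ranked[:top_k] if score(query, d["text"]) > 0]
    return {
        "snippets": [d["text"] for d in hits],
        "sources": [d["url"] for d in hits],  # the 'bibliography'
    }

# Hypothetical two-document corpus for illustration.
corpus = [
    {"url": "https://example.com/so/q1",
     "text": "Use list comprehensions to filter a Python list"},
    {"url": "https://example.com/so/q2",
     "text": "Redshift distribution keys affect join performance"},
]

result = answer_with_sources("how to filter a list in Python", corpus)
print(result["sources"])  # only the relevant document is cited
```

A real implementation would replace the word-overlap score with embedding similarity and have the LLM compose prose over the retrieved snippets, but the key design point is the same: provenance is carried alongside the text rather than blended away into model weights.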


StackOverflow released a data dump that anyone could use, precisely so that nobody would need to scrape.


I hope that the StackOverflow people understand this, and that they don't panic because their usage/engagement metrics are down quite a bit over the last few years.


Might very well be in panic mode. They're also partnering with Indeed to bring back a new version of StackOverflow Jobs.

https://meta.stackexchange.com/questions/399440/testing-a-ne...


Regarding usage, I was on SO.

I specialize in Amazon Redshift.

I've written a lot of PDFs about Amazon Redshift - serious stuff, deep technical investigations and explanations, published along with the source code which produces the evidence which the PDF is based on - and when people asked questions where I'd written up the answer, I pointed them at the appropriate PDF.

After some months, I received a direct message from the staff (which looked to me to be a pro-forma, a standard message sent in this situation) saying that I was promoting my site and that I should not do so. It was well written and polite.

That's fine - I have no problems with that, it's their web-site.

What I did not like, however, and what came over as slimy, was that the staff had also deleted every post I had made.

This was not mentioned, at all, in the well written and polite message, which then of course became disingenuous. If you're going to do something serious like that, you need to tell people, not let them discover it for themselves.

This was for all posts, where I'd explained something directly or pointed to a PDF - presumably it's a standard action SO take in this situation.

I deleted my account and left.


SO corporate has been trying to shoehorn AI into the sites ever since it became the latest buzzword. It's been largely laughably bad and is alienating the community, who don't want it and aren't asking for it.



