Ask HN: How do I sell unique training data?

5 points by mackwell 14 hours ago

We have all seen the recent large deals made by tech companies to purchase access to various types of data for training their models (e.g. Reddit, Photobucket). I have also seen some articles about the industry’s ever-growing need for unique media and data, which seem to suggest the existence of a market and brokers in need of new sources that are not online. They seem willing to pay, but I don’t see an obvious way to sell.

I believe I have access to troves that have never been and will never be online. Some quick research has not turned up any obvious online marketplace or anyone to talk to.

Is anyone here in this business, or does anyone have advice or resources for people like me who want to explore offering training data for sale or license?

mmarian 14 hours ago

The sales process is the same as with any other b2b product. You need to figure out its value and customers.

And make sure you're confident about the value. For example, in many workflows having only 10% coverage of the population makes the data useless.

I wouldn't worry about the licensing details as a startup. They won't matter until you can afford the lawyers, and the reputational damage, of pursuing someone who's broken the license.

  • mackwell 13 hours ago

    The articles covering this topic have shown deals where various forms of media were purchased at surprising rates.

    I’m not looking to start a company around this; my source of media and data is essentially a byproduct of my actual business. Ideally I’d like to find a broker or marketplace that I can feed it to.