OSINT techniques, or just plain old web scraping, can let you harvest a lot of data. Let's say you never harvest data illegally, so technically everything you compile is public information. However, you have pulled data from so many scattered parts of the internet that the end result is a data set that is unusually useful and informative about private or sensitive topics, such as individuals' personal information.
Is there a threshold where, even though each piece of data is legal to acquire, it becomes illegal to gather, store, and share it all together, because the resulting data set amounts to a privacy invasion or privacy threat?