
There are techniques like OSINT (open-source intelligence) or just plain old web scraping that could allow you to harvest a lot of data. Let's say you never harvest data illegally, so technically all the data you compile is public information. However, you managed to extract data from so many fragmented parts of the internet that the resulting data set is unusually useful and informative about private or sensitive topics, such as individuals' personal information.

Is there any threshold beyond which, even if the data is legal to acquire, it becomes illegal to gather, store, and share it, because the resulting data set amounts to a privacy invasion or privacy threat?

Julius Hamilton

5 Answers


Even if you gather it from public sources, you need to comply with the GDPR to process personal data (the GDPR's term for personally identifiable information). That means you need a lawful basis before you are even allowed to gather it. Gathering it from all over only means that you have made data about other people identifiable. The threshold at which you have to comply with the GDPR is the moment you start to gather data about people in Europe.

As nvoigt correctly noted, a phone book is the easiest example in Europe: a person is only listed in the phone book because the phone book maker has an interest and, usually, the person's consent. This consent is given to no one other than the phone book company, and it is not transferable. To process the data in a phone book for anything other than purely personal use (e.g. as a company), you need another legal basis, such as legitimate interest or the data subject's consent.

Data scraping under the GDPR is almost impossible and very risky:

Do you remember Equifax? They were struck with the worst of all punishments: not a fine, but an order to destroy every part of their database that contained data obtained without consent, because some of that data had been obtained by illegal scraping.

Do you remember Clearview AI? In February 2022, Italy fined it €20 million for violating the GDPR, banned it from operating in Italy, and ordered it to delete all data on people in Italy. It had already been ordered to delete its French data in 2021, and in May 2022 the UK fined it another £7.5 million.

Database rights

Some countries also grant a copyright-like right in databases (in the EU, the sui generis database right), which can prohibit scraping substantial parts of those databases.

Trish

There is no such law in the United States. It is legal to collect large amounts of raw data from public sources and share it.

Usually, even information protected by a non-disclosure agreement or a statutory privacy requirement can be legally collected and shared once it becomes a matter of public record or is otherwise made public (although this isn't true for certain national defense information, or for information obtained in confidential attorney-client communications if the attorney is the one seeking to share it).

There is no generally applicable right to privacy of information in the United States, although there are sometimes privacy rights associated with information disclosed in the context of certain specific kinds of relationships (e.g. banker-customer, attorney-client, health care provider-patient). Some U.S. states protect privacy in more kinds of relationships than others.

Some public data sources, such as PACER, the public database of the federal courts, do charge users for large downloads from their database, though not for small ones.

Indeed, even privately collected assemblies of raw data are not protected by copyright; only an original selection or arrangement of the data can be. See Feist Publications, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340 (1991).

ohwilleke

Sharing information may itself be illegal, depending on what you are sharing and why, regardless of where the information came from.

As an example, a student named Jack Sweeney is currently being threatened with legal action by Taylor Swift because he runs social media accounts that compile public FAA data to track celebrities' jets. All the data Sweeney uses is publicly available to anyone, but Swift's lawyers contend that aggregating and publishing the data with only a 24-hour delay amounts to unlawful harassment. It is unknown at this time whether such a claim would succeed, but it is within the realm of possibility.

Here is a more detailed legal analysis of the claims in the case. The upshot is that it is not clear what laws, if any, Sweeney might be breaking, but the analysis mentions a few not-too-distant hypotheticals that would more likely be illegal, such as using public information to stalk and harass celebrities in violation of anti-paparazzi laws.

Nuclear Hoagie

It's perfectly possible to collate public data into a set to which a government agency might feel it necessary to apply a restrictive security classification. At that point you're in Official Secrets territory, and doing anything with the data comes with fairly horrible downsides.

Where this might apply to privacy is if the individual concerned has some national security significance. You might not be aware of this in advance.

regularfry

Note that the other answers assume you are collecting the data from an unrestricted source. In the US, a fact cannot be copyrighted, but a collection of facts can be if its selection or arrangement is original, so a specific database may be copyrighted even if the data originally came from open sources, and extensive copying would make your version a derivative work at best.

It's not uncommon for databases that are copyrighted but publicly exposed to include some "smoking gun" entries, which don't affect use of the data but whose presence in another resource would demonstrate that it was bulk-copied from this one and hence (unless specifically authorized) a copyright violation.

The format of the data may also be copyrightable or trademarkable, or may qualify for a design patent.

Basically, if you aren't sure you have permission to use a dataset, ask. And then remember that they may be wrong. In either direction.

keshlam