50

I have come across a published dataset which comes with a note saying that (1) the authors of the dataset would like to be informed of any papers that use it, and (2) they might request co-authorship depending on how much the paper depends on the data.

I am feeling ambivalent about using this dataset. On the one hand, I appreciate that the authors chose to make it public, despite the many man-hours of work that went into creating it. On the other hand, it seems unusual that the creator of an already published dataset would request co-authorship on a paper they did not otherwise contribute to.

I am interested in using such a dataset for an idea, but I am reluctant to put significant work into a project when there is a risk (even if a small risk) that the dataset's creator may interfere with publication. Accepting co-authorship essentially means agreeing that the co-author may delay publication or may attempt to shape the paper, possibly in ways I am unhappy with.

Are such requests common? Are they reasonable? Is it reasonable to ignore such a request, given that the dataset is public?

I understand that this question may read like a nitpick. Realistically, I don't expect trouble. Yet it bothers me that starting to work with this dataset appears to essentially require agreeing that my work may be interfered with. It seems like a rather unreasonable "have your cake and eat it" mentality on the part of the dataset author when releasing the data.

Ecobos
  • 2
    What is the data? They might have a bit more of a leg to stand on if it's perhaps super sensitive and they want to discuss your usage of the data pre-publication to make sure it's correct. Just spitballin – Azor Ahai -him- Jul 26 '21 at 18:05
  • 4
    Plenty of researchers have private datasets of one sort or another that they don't publish, except insofar as is necessary to support particular papers. If you wish to use their dataset, you have to come to an agreement with them -- which almost always involves co-authorship. The present case is really just an evolution of that approach. – avid Jul 26 '21 at 19:51
  • 1
    A request to "consider us for future collaboration if you use this dataset" is probably much more acceptable, but, still, not common. – Buffy Jul 26 '21 at 20:31
  • 2
    People try to claim rights over all sorts of intellectual property. In the early days of the development of finite element methods for solving PDEs, somebody was actually granted a patent for the concept of linear interpolation between two points. I don't think they ever made any money from licensing it, though. – alephzero Jul 26 '21 at 22:30
  • 2
    I'm actually glad that the data set is released in this format - it used to be harder to access (on a private web page rather than a public archive, required registration) – Ben Bolker Jul 27 '21 at 14:02

12 Answers

56

On the other hand, it seems unusual that the creator of an already published dataset would request co-authorship on a paper they did not otherwise contribute to.

I do not think they are asking for co-authorship on a paper they did not otherwise contribute to; I think they are soliciting a collaboration with someone who is interested in using their data. "co-authorship might be requested" should be interpreted as "we might ask to be involved at a level suitable for authorship".

They have not released their data with a license requiring that they be included as coauthors, so this is a request rather than a condition of using the data.

I have personally authored papers based on data collected by others, with coauthors whose contribution to the paper was primarily their involvement with data collection rather than any of the analyses. Those collaborators made the effort to be authors by providing the data, helping with data extraction, reviewing the manuscript and discussing the analysis approaches even if they didn't do that directly, and most importantly by knowing and being experts about the data itself. Even well-annotated data can have little quirks that are not transparent to someone unfamiliar with the data. Sometimes there could be additional data or caveats that haven't made it into the shared version, and you'd very much want to know about these as someone using the data.

Though authorship conventions vary a bit by field, in the authors' field it is common to have multiple coauthors, and having additional coauthors does not imply any dilution of the effort of the other authors (especially the first author). The cost of their ask is pretty low. If I were planning to use their data, I'd want to contact them about their interest, and if they were interested and planning to contribute, I'd certainly give them the opportunity to meet authorship standards.

Bryan Krause
  • 3
    I accepted this one out of many good answers for the idea that this is an opportunity, not an obligation, to collaborate. It's okay to say no, but saying yes might be advantageous. I would like to clarify that my concern about forced collaboration (rather than merely putting names on the paper) was that while it may turn out to be positive, it might also be negative. We have all heard stories of some irresponsible senior co-author delaying publication forever without considering that getting the paper out might be critical for a junior first author to advance their career. – Ecobos Jul 27 '21 at 21:01
26

They can request whatever they like, but you don't have to accede to their wishes unless you think the request is valid. If they don't actually contribute to the intellectual content of the paper, then they aren't due authorship, but they certainly need to be acknowledged with a citation. Saying "thanks" explicitly would also be polite.

I've never heard of such a thing (but I don't hear everything). I doubt that it is "usual" or "common" in any sense.

You make the decisions. Use normal ethical principles about what does and does not constitute authorship.


An exception would be if there are specific licensing terms, though, normally, data per se can't be copyrighted and so no license is needed.

Buffy
  • 7
    If you are just using data that is published and in the public domain, I'm not really sure an acknowledgement is necessary beyond the citation. Obviously acknowledgements are cost free, and it always helps to be more polite than is strictly necessary, but I wouldn't see it as a requirement. – Ian Sudbery Jul 26 '21 at 17:31
  • 2
    @IanSudbery, probably correct. But a citation is an ack of sorts. I was only worried about avoiding plagiarism charges. And a thanks is polite. But I updated. Thanks. – Buffy Jul 26 '21 at 19:38
  • 1
    I don't think copyrights are relevant here. They are based on laws, and laws vary arbitrarily between countries, while the standards of academic integrity should be more universal. Datasets often come with various terms, and while the terms may not be legally binding, it's best to accept them unless you are deliberately seeking conflict. – Jouni Sirén Jul 26 '21 at 23:46
  • 1
    "data per se can't be copyrighted" IANAL, but while it looks like data itself can't be copyrighted, a compilation of data can be. So you can use the data without restriction, but not necessarily the data file itself. I'm not clear what that means legally for using data out of the file, though; would it be a derivative work? I might ask on legal SE – anjama Jul 26 '21 at 23:48
  • @JouniSirén, in order to require "terms" you need to own rights, such as copyright or patent. You can't put restrictions on something to which you don't hold legal rights. – Buffy Jul 26 '21 at 23:53
  • 1
    Here's a link with more details about the copyright aspect for people that are interested: https://law.stackexchange.com/questions/11359/can-you-copyright-data Basically, straight tabular data files might not be copyrightable, but might be if the data has been sufficiently worked up and is not considered "raw". Even in raw form, it might fall under "database rights" in some countries: https://en.wikipedia.org/wiki/Database_right – anjama Jul 27 '21 at 00:01
  • 1
    @Buffy As academics, we expect many things not required by law, such as citing earlier work and including people who contributed to the work as coauthors. Why should this be any different? – Jouni Sirén Jul 27 '21 at 00:02
  • @JouniSirén, because "terms" and licenses are a limitation on others. You can't limit them when you have no right to do so. It is quite different. – Buffy Jul 27 '21 at 00:05
  • 6
    @Buffy Academic publishing is based on expectations, not rights and limitations. If you ignore what others expect from you, your papers may get rejected or retracted, because the editors may not want to get involved in conflicts they do not understand. – Jouni Sirén Jul 27 '21 at 00:13
  • 3
    At least in Germany you can get copyright if you build a data set or data base. Each single data point cannot be copyrighted, but the work as a whole. – usr1234567 Jul 27 '21 at 08:09
17

If you read the license (on the right side of the webpage you provided the link for), it states that "The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law."

In other words, they waived all their rights. I think their request for notification or co-authorship is voluntary, not obligatory. I guess they mean 'we might request to be co-authors', but whether to accept is up to you.

Tigerx
  • Is that public-domain waiver actually binding, though? I hear that intentionally putting stuff in the public domain by fiat is relatively hard. E.g., Wikipedia claims that's against the law in Europe. – Daniel R. Collins Jul 26 '21 at 23:13
  • 1
    @DanielR.Collins even then, it would be hard for the people who declared it public domain to sue a user of the content in any reasonable court of law, so as long as they are alive and no heirs are involved it would be rather safe. And note that this claim on Wikipedia has a status of [citation needed] so you should treat it as any other hearsay. – Mołot Jul 27 '21 at 07:56
  • @Molot: Okay, but the null hypothesis (and safety considerations) would be to treat the waiver as not-working until some legal citation/expert says otherwise. – Daniel R. Collins Jul 27 '21 at 12:15
  • Is a dataset even something that can be copyrighted in the first place? Maybe a specific way of organizing it would be, but not the raw information. This is similar to phone book copyrights -- you can copyright the yellow pages because of the way you categorize it, but not the set of names and phone numbers themselves. – Barmar Jul 29 '21 at 14:10
17

All of the thoughts you raise about co-authorship are negative!

  • The data folks might delay publication
  • The data folks might shape the paper

As others have said, this is in the public domain, so you are not required to make the data folks co-authors. But I think you should consider the many benefits:

  • You build new collaborations. Later, this leads to Letters of Rec, conference invites, early notification about new datasets, someone on a faculty selection committee who knows your name.
  • The data folks understand their data. You may think that you do, but data can have subtle issues. A co-authorship incentivizes the data folks to help you interpret their data as accurately as possible.
  • The data folks may have more, unpublished data. Bringing them aboard as co-authors could give you the opportunity to explore the topic in a broader or more nuanced way if it provides the data folks an avenue for getting more data out there.
  • The data folks turn out to be good writers and your final paper is better for bringing them aboard. Personally, I know I often grow tired of a paper in the final stages of working with it. Having coauthors continue to raise nit-picks is annoying, but when we submit I'm much more confident that the final product is of high quality.
  • You learn more about collaboration/coordination/leadership. Increasingly, science is a team sport, so playing it that way works in your favour.

When I go to seminars, I'm always awed by the final slides of the presentation where the speaker shows the veritable army of collaborators they've led in producing their Science Thing. Correlation isn't causation, but if you want to go around and give seminars, collaborating widely seems like it helps get you there.

Build a big tent :-)

Richard
13

I request that if you read the rest of this answer, you upvote it, send me a check for $100, and include me as a coauthor on your next paper.

See what I did there? It wasn’t unreasonable, I merely made a request. If it had been a demand, it would be quite unreasonable on the other hand. The same goes for these authors, who merely wrote (coyly using the passive voice to distance themselves from the obviously rather silly thing they are writing) that

[…] Depending on our level of interest and how much a paper depends on the BCI plot, co-authorship might be requested.

(Emphasis added.)

So, as @Buffy says, a person can ask for a pony, coauthorship or anything else that strikes their fancy. All you would actually owe them if you use the database is the same thing you owe anyone else whose published work you make use of, which is a citation.

If you read this far, I request that you ignore my earlier request, and that you ignore these database authors’ equally illogical one.

Dan Romik
  • 1
    Doesn't it make a difference that academic authorship has clear rules, such as no (co)authorship without significant intellectual contribution, and no significant intellectual contribution without (co)authorship (because otherwise the too-few authors would plagiarize)? That is very much in contrast to any request or demand you make here. – cbeleites unhappy with SX Jul 26 '21 at 19:27
  • 6
    Hmmm. Sorry, but I don't have an address. But you can pick up a pony at mine if you like. – Buffy Jul 26 '21 at 19:29
  • @cbeleites if I write a paper about research I did in which you did not participate from start to finish, then you have not made an intellectual contribution, significant or otherwise. So saying you reserve the right to ask me for coauthorship because I used your already-published dataset is no different than me asking you to give me coauthorship on your next paper because you had previously read my SE answer and were (obviously) very inspired by it. – Dan Romik Jul 26 '21 at 20:19
  • I gave you the Upvote. 0.333 is a pretty good batting average. – manassehkatz-Moving 2 Codidact Jul 27 '21 at 15:55
6

While I do not generally disagree with Buffy's thought-through answer, I think there is another thing to consider which makes it more difficult to judge whether their demand is unreasonable or legitimate.

There is a change in perspective these days. With mounting demands for research transparency, there is increasing moral pressure for groups to publish their data.

Normally, a group might get multiple publications out of such a good-quality set of data. With the pressure to publish it, it may be that this group is trying a different model: if they cannot keep the data exclusive to themselves for the longer period over which they develop their papers, they seem to be looking for a different way of both permitting transparency and reaping the benefits of having developed the data set. A citation may not be sufficient when they could have had multiple publications instead, so this is the way they try to make the effort worth their time.

In addition, I am not sure what the copyright situation is. Buffy says that data are not copyrightable; however, it may be that the preparation/curation of the data still gives them a copyright. I would be very surprised if that famous (notorious) book listing random numbers (https://www.amazon.co.uk/Million-Random-Digits-Normal-Deviates/dp/0833030477 ; reading the reviews is highly recommended before buying) were not copyrighted - but then, it's just random numbers, no?

In summary, it is unusual for an author to request co-publication based on someone using their published information, but this might be an experiment in transparent science. Maybe, if they find it is not worth their time, in the future they won't publish data before they have extracted their fill of publications from it.

As a hard criterion for OP, the only thing you really have to obey is copyright law and the academic standard of citation, possibly acknowledgements. But it might be worth considering whether you might find value in contemplating their demands and - who knows - possibly finding interesting new collaborators. Of course, academic standards require that they contribute intellectually to your paper.

Captain Emacs
  • 2
    If you generate the data with a computer program, for example, you have copyright to the program. And curation can be sufficiently intellectual in nature that it permits copyright. It can be subtle. See https://libguides.library.kent.edu/data-management/copyright, for example. But in the instant case, the data is specifically put into the public domain, which gives up all rights. – Buffy Jul 26 '21 at 15:12
  • @Buffy Program yes, algorithm no, data, perhaps. And as for public domain, yeah, I noticed Tiger's comment. I am wondering whether to remove my reply as it does not really treat copyright as per OP's case. I guess this makes your response more pertinent for OP. My emphasis was on pointing out that the people responsible for the data might have a legit rationale to make the request, as OP thinks it is not legit in the first place. – Captain Emacs Jul 26 '21 at 16:42
  • 3
    Yes, a program can be copyrighted as it is a specific expression of an idea. Ideas themselves, including algorithms, aren't subject to copyright. But let's not get into software patents here - ugh. But I'd suggest leaving the answer as there is a more general question that others might have and stumble in here. Unless you get unfair down votes, I guess. – Buffy Jul 26 '21 at 20:30
4

Trying actively to find a situation where I'd think this request may be sensible:

They may think it likely that further work with their data will lead to lots of genuine discussion about the data between you and them. I.e., to them actually contributing intellectually to your study in a manner that warrants co-authorship.

I could think of such a scenario if e.g. a machine learning group would want to try out things with data I acquire. Such a group would typically not bring much expertise in the particular data-generation/measurement processes nor in the underlying application field. In consequence, a study may genuinely benefit from added expertise in all three domains.

cbeleites unhappy with SX
3

Is the dataset related to forest ecology? Because this is the norm there. The logic is that there is a lot of field work, which is really heavy work, which goes unappreciated with a simple citation. I've seen papers with 20 authors, because they used datasets which were authored by teams of 10+ people.

I personally do not agree with this view, but I've heard it and seen it in practice quite a bit.

Andrei
1

Data sets are a bit like software. Often they are regarded as not publishable by themselves; instead, some scientific contribution, such as new insight, has to be generated with the software / from the data set.

If this is not the case, i.e. the data set / software is publishable by itself, the author gets a publication for it and everybody using it will cite this paper. Any further authorship would come with additional contributions to new papers.
If there is no such paper, it becomes more tricky, as the original author cannot gain scientific credit (publications, citations) for their work.

usr1234567
1

I had one case of data being produced in a research center and used in my publication. The one who built the experiment and measured the data did not contribute to the paper.

I added him as a co-author because I could not have written my paper without his work. I felt that he deserved actual academic recognition for his work (his name on a paper), not only a thank you.

This despite him not having worked at all on the paper.

PS. This is usual in particle physics where the 1256 authors do not participate in the write-up of a paper, their name just lands as co-author because of "collaboration". I personally was not listing such papers in my CV.

EDIT following the comments: the data was not published elsewhere, this post is to show a case of someone producing data, not participating in the paper at all and still being listed as a co-author

WoJ
  • 2
    There are two reasons why I don't completely agree with this view: 1. All research builds on older research. If one publishes a proof of theorem A, which relies on theorem B, the author of theorem B's proof will not be a co-author. The proof of theorem B will be cited, but that's very different from making its author a co-author. – Ecobos Jul 27 '21 at 20:42
  • @Ecobos: the data he provided were not published / used in another publication. – WoJ Jul 27 '21 at 20:44
  • 1
    2. Accepting someone's request to be a co-author means accepting that they will have some say over the contents of the paper. This may result in a positive contribution (good), in no contribution at all (neutral), but there is a risk that it will hinder publication (bad). We have all heard of stories of an irresponsible senior co-author delaying publication forever without a good reason when it would be important for a postdoc to finally get the paper out, as it's important for their job search ... – Ecobos Jul 27 '21 at 20:46
  • 2
    That makes your case quite different from what I asked about (a public dataset, which has already been used in many publications). – Ecobos Jul 27 '21 at 20:48
  • @Ecobos: yes, I just wanted to add a case where someone who just provided data without other work on the paper could end up being a co-author. In my case it was my decision (and he agreed when I suggested that). – WoJ Jul 27 '21 at 20:50