
On Google Scholar, you can do an author search based on keywords. For example, if I find an author who has "Robotics" as a keyword, and I click on that keyword, it lists all the authors with that keyword, in order of the number of citations.

What I would like to do, is see that same list, but:

1) Only for authors in my country

And:

2) By combining it with another keyword, e.g. all the authors who have "Robotics" and "Machine Learning".

Is this kind of advanced author search possible?

– Karnivaurus
    As an aside, it's worth mentioning that Google Scholar author search only lists authors who have signed up for Google Scholar. – Jeromy Anglim Jul 28 '16 at 01:19

2 Answers


You can search for multiple keywords in the author search with a query like:

label:robotics + label:machine_learning

Narrowing by country is trickier. One option is to narrow by email address: if a profile is verified with a UK email address, for example, it will end in .ac.uk. So the query:

label:robotics + label:machine_learning + .ac.uk

should return only UK researchers. You'll get some false positives and false negatives with this technique, so it's not perfect, but it will help narrow the scope.
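If you want to run the same combined query from a script rather than the search box, a minimal sketch is below. It builds the author-search URL by hand; the endpoint and parameter names (`view_op=search_authors`, `mauthors`) match the ones that appear elsewhere in this thread, but treat the snippet as an illustration, not an official API.

```python
# Sketch: build the Google Scholar author-search URL for the combined
# keyword + country query described above.
from urllib.parse import urlencode

query = "label:robotics + label:machine_learning + .ac.uk"
url = "https://scholar.google.com/citations?" + urlencode({
    "view_op": "search_authors",  # author search mode
    "mauthors": query,            # the combined query string
    "hl": "en",                   # interface language
})
print(url)
```

Opening the printed URL in a browser shows the same result list as typing the query into the author-search box.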

Here's a related question pertaining to narrowing country on Google Scholar:

Google Scholar: how to exclude some countries from the search?

– CephBirk

A complementary answer to CephBirk's, if you want to do it programmatically with Python. You can also add functionality to save the results to a CSV/Excel file or a database.

Code to test out the extraction part (a full example is also available in an online IDE):

# Iterates over all pages and extracts profile results.
from parsel import Selector
import requests, json, re

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "view_op": "search_authors",  # author results
    "mauthors": 'label:robotics + .de + "University of Freiburg"',  # search query
    "hl": "en",   # language
    "astart": 0   # page number
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36",
}

profile_results = []

profiles_is_present = True
while profiles_is_present:
    # make a request
    html = requests.get("https://scholar.google.com/citations", params=params, headers=headers, timeout=30)

    # pass the response to the HTML/XML processing library
    selector = Selector(text=html.text)

    print(f"extracting authors at page #{params['astart']}.")

    # iterate over profile results from one page
    for profile in selector.css(".gsc_1usr"):
        name = profile.css(".gs_ai_name a::text").get()
        link = f'https://scholar.google.com{profile.css(".gs_ai_name a::attr(href)").get()}'
        affiliations = profile.css(".gs_ai_aff").xpath("normalize-space()").get()
        email = profile.css(".gs_ai_eml").xpath("normalize-space()").get()
        cited_by = profile.css(".gs_ai_cby *::text").get()
        interests = profile.css(".gs_ai_one_int::text").getall()

        # append the extracted result to the list
        profile_results.append({
            "profile_name": name,
            "profile_link": link,
            "profile_affiliations": affiliations,
            "profile_email": email,
            "profile_cited_by": cited_by,
            "profile_interests": interests
        })

    # if a next-page token is present, update it and increment astart by 10 to get the next page
    if selector.css("button.gs_btnPR::attr(onclick)").get():
        # https://regex101.com/r/e0mq0C/1
        params["after_author"] = re.search(r"after_author\\x3d(.*)\\x26",
                                           selector.css("button.gs_btnPR::attr(onclick)").get()).group(1)  # -> XB0HAMS9__8J
        params["astart"] += 10
    else:
        profiles_is_present = False

print(json.dumps(profile_results, indent=2))

Part of the JSON output:

[
  {
    "profile_name": "Wolfram Burgard",
    "profile_link": "https://scholar.google.com/citations?hl=en&user=zj6FavAAAAAJ",
    "profile_affiliations": "Professor of Computer Science, University of Freiburg",
    "profile_email": "Verified email at informatik.uni-freiburg.de",
    "profile_cited_by": "Cited by 94818",
    "profile_interests": [
      "Robotics",
      "Artificial Intelligence",
      "AI",
      "Machine Learning",
      "Computer Vision"
    ]
  }, ... other results
]
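As an aside on the pagination step in the scraper above: the next-page token sits inside the "Next" button's onclick attribute as literal backslash-x-escaped text, which is why the regex matches the literal sequences \x3d and \x26 rather than "=" and "&". A small sketch with a hypothetical, shortened onclick value shows how the token is pulled out:

```python
# Sketch: extract the next-page token from a "Next" button's onclick value.
# The onclick string below is a hypothetical, shortened example; the real one
# comes from button.gs_btnPR and contains literal "\x3d"/"\x26" text.
import re

onclick = "window.location='/citations?view_op=search_authors\\x26after_author\\x3dXB0HAMS9__8J\\x26astart=10'"

# same pattern as in the scraper: capture everything between the literal
# "\x3d" after "after_author" and the following literal "\x26"
token = re.search(r"after_author\\x3d(.*)\\x26", onclick).group(1)
print(token)  # XB0HAMS9__8J
```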

Alternatively, you can achieve the same with the Google Scholar Profile Results API from SerpApi. It's a paid API with a free plan; the difference is that it provides a complete solution, so you don't need to figure out the extraction yourself or maintain it over time.

# Iterates over all pages and extracts profile results.
import os, json
from urllib.parse import urlsplit, parse_qsl
from serpapi import GoogleSearch

params = {
    "api_key": os.getenv("API_KEY"),      # SerpApi API key
    "engine": "google_scholar_profiles",  # profile results search engine
    "mauthors": 'label:robotics + .de + "University of Freiburg"'  # search query
}
search = GoogleSearch(params)  # where extraction happens on the SerpApi backend

profile_results_data = []

profiles_is_present = True
while profiles_is_present:
    profile_results = search.get_dict()  # JSON -> Python dictionary

    for profile in profile_results["profiles"]:
        thumbnail = profile["thumbnail"]
        name = profile["name"]
        link = profile["link"]
        author_id = profile["author_id"]
        affiliations = profile["affiliations"]
        email = profile.get("email")
        cited_by = profile.get("cited_by")
        interests = profile.get("interests")

        profile_results_data.append({
            "thumbnail": thumbnail,
            "name": name,
            "link": link,
            "author_id": author_id,
            "email": email,
            "affiliations": affiliations,
            "cited_by": cited_by,
            "interests": interests
        })

    # check for a next page: split its URL into a dict and update the search
    # "params" so the next get_dict() call fetches the following page
    if "next" in profile_results.get("pagination", {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(profile_results.get("pagination").get("next")).query)))
    else:
        profiles_is_present = False

print(json.dumps(profile_results_data, indent=2))

Part of the JSON output:

[
  {
    "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=zj6FavAAAAAJ&citpid=6",
    "name": "Wolfram Burgard",
    "link": "https://scholar.google.com/citations?hl=en&user=zj6FavAAAAAJ",
    "author_id": "zj6FavAAAAAJ",
    "email": "Verified email at informatik.uni-freiburg.de",
    "affiliations": "Professor of Computer Science, University of Freiburg",
    "cited_by": 94818,
    "interests": [
      {
        "title": "Robotics",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Arobotics",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:robotics"
      },
      {
        "title": "Artificial Intelligence",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aartificial_intelligence",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:artificial_intelligence"
      },
      {
        "title": "AI",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aai",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:ai"
      },
      {
        "title": "Machine Learning",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amachine_learning",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:machine_learning"
      },
      {
        "title": "Computer Vision",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Acomputer_vision",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:computer_vision"
      }
    ]
  }, ... other results
]
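As for saving the results to CSV (mentioned at the start of this answer), a minimal standard-library sketch follows. The sample row is hypothetical data shaped like the profile dictionaries collected above; for Excel output or a database you'd swap in pandas or a DB driver instead.

```python
# Sketch: write collected profile rows to a CSV file using only the
# standard library. The row below is hypothetical sample data shaped
# like the dictionaries built in the scripts above.
import csv

profile_results = [
    {
        "name": "Wolfram Burgard",
        "affiliations": "Professor of Computer Science, University of Freiburg",
        "cited_by": 94818,
    },
]

with open("scholar_profiles.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(profile_results[0]))
    writer.writeheader()          # column names from the first row's keys
    writer.writerows(profile_results)
```

DictWriter quotes fields containing commas (such as the affiliation string) automatically, so the file round-trips cleanly through csv.DictReader.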

Disclaimer: I work for SerpApi.

– Dmitriy Zub