
On Google Scholar, you can do an author search based on keywords. For example, if I find an author who has "Robotics" as a keyword, and I click on that keyword, it lists all the authors with that keyword, in order of the number of citations.

What I would like to do, is see that same list, but:

1) Only for authors in my country

And:

2) By combining it with another keyword, e.g. all the authors who have "Robotics" and "Machine Learning".

Is this kind of advanced author search possible?

– Karnivaurus
    As an aside, it's worth mentioning that Google Scholar author search only lists authors who have signed up for Google Scholar. – Jeromy Anglim Jul 28 '16 at 01:19

2 Answers


You can search for multiple keywords in the author search with a query like:

label:robotics + label:machine_learning

Narrowing by country is trickier. One option is to narrow by email address: if a profile is verified with a UK email address, for example, it will end in .ac.uk. So the query:

label:robotics + label:machine_learning + .ac.uk

should return only UK researchers. You'll get some false positives and false negatives with this technique, so it's not perfect, but it will help narrow the scope.
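If you want to run the same combined query from a script rather than the search box, a minimal sketch is below. It builds the author-search URL by hand; the endpoint and parameter names (`view_op=search_authors`, `mauthors`) match the ones that appear elsewhere in this thread, but treat the snippet as an illustration, not an official API.

```python
# Sketch: build the Google Scholar author-search URL for the combined
# keyword + country query described above.
from urllib.parse import urlencode

query = "label:robotics + label:machine_learning + .ac.uk"
url = "https://scholar.google.com/citations?" + urlencode({
    "view_op": "search_authors",  # author search mode
    "mauthors": query,            # the combined query string
    "hl": "en",                   # interface language
})
print(url)
```

Opening the printed URL in a browser shows the same result list as typing the query into the author-search box.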

Here's a related question pertaining to narrowing country on Google Scholar:

Google Scholar: how to exclude some countries from the search?

– CephBirk

A complementary answer to CephBirk's, if you want to do it programmatically with Python. You can also add functionality to save the results to a CSV/Excel file or a database.

Code to test out the extraction part (a full example is also available in an online IDE):

# Iterates over all pages and extracts profile results.
from parsel import Selector
import requests, json, re

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "view_op": "search_authors",  # author results
    "mauthors": 'label:robotics + .de + "University of Freiburg"',  # search query
    "hl": "en",   # language
    "astart": 0   # page number
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36",
}

profile_results = []

profiles_is_present = True
while profiles_is_present:
    # make a request
    html = requests.get("https://scholar.google.com/citations", params=params, headers=headers, timeout=30)

    # pass the response to the HTML/XML processing library
    selector = Selector(text=html.text)

    print(f"extracting authors at page #{params['astart']}.")

    # iterate over profile results from one page
    for profile in selector.css(".gsc_1usr"):
        name = profile.css(".gs_ai_name a::text").get()
        link = f'https://scholar.google.com{profile.css(".gs_ai_name a::attr(href)").get()}'
        affiliations = profile.css(".gs_ai_aff").xpath("normalize-space()").get()
        email = profile.css(".gs_ai_eml").xpath("normalize-space()").get()
        cited_by = profile.css(".gs_ai_cby *::text").get()
        interests = profile.css(".gs_ai_one_int::text").getall()

        # append the extracted result to the list
        profile_results.append({
            "profile_name": name,
            "profile_link": link,
            "profile_affiliations": affiliations,
            "profile_email": email,
            "profile_cited_by": cited_by,
            "profile_interests": interests
        })

    # if a next-page token is present, update it and increment astart by 10 to get the next page
    if selector.css("button.gs_btnPR::attr(onclick)").get():
        # https://regex101.com/r/e0mq0C/1
        params["after_author"] = re.search(r"after_author\\x3d(.*)\\x26",
                                           selector.css("button.gs_btnPR::attr(onclick)").get()).group(1)  # -> XB0HAMS9__8J
        params["astart"] += 10
    else:
        profiles_is_present = False

print(json.dumps(profile_results, indent=2))

Part of the JSON output:

[
  {
    "profile_name": "Wolfram Burgard",
    "profile_link": "https://scholar.google.com/citations?hl=en&user=zj6FavAAAAAJ",
    "profile_affiliations": "Professor of Computer Science, University of Freiburg",
    "profile_email": "Verified email at informatik.uni-freiburg.de",
    "profile_cited_by": "Cited by 94818",
    "profile_interests": [
      "Robotics",
      "Artificial Intelligence",
      "AI",
      "Machine Learning",
      "Computer Vision"
    ]
  }, ... other results
]
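As an aside on the pagination step in the scraper above: the next-page token sits inside the "Next" button's onclick attribute as literal backslash-x-escaped text, which is why the regex matches the literal sequences \x3d and \x26 rather than "=" and "&". A small sketch with a hypothetical, shortened onclick value shows how the token is pulled out:

```python
# Sketch: extract the next-page token from a "Next" button's onclick value.
# The onclick string below is a hypothetical, shortened example; the real one
# comes from button.gs_btnPR and contains literal "\x3d"/"\x26" text.
import re

onclick = "window.location='/citations?view_op=search_authors\\x26after_author\\x3dXB0HAMS9__8J\\x26astart=10'"

# same pattern as in the scraper: capture everything between the literal
# "\x3d" after "after_author" and the following literal "\x26"
token = re.search(r"after_author\\x3d(.*)\\x26", onclick).group(1)
print(token)  # XB0HAMS9__8J
```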

Alternatively, you can achieve the same with the Google Scholar Profile Results API from SerpApi. It's a paid API with a free plan; the difference is that it provides a complete solution, so you don't need to figure out the extraction yourself or maintain it over time.

# Iterates over all pages and extracts profile results.
import os, json
from urllib.parse import urlsplit, parse_qsl
from serpapi import GoogleSearch

params = {
    "api_key": os.getenv("API_KEY"),      # SerpApi API key
    "engine": "google_scholar_profiles",  # profile results search engine
    "mauthors": 'label:robotics + .de + "University of Freiburg"'  # search query
}
search = GoogleSearch(params)  # where extraction happens on the SerpApi backend

profile_results_data = []

profiles_is_present = True
while profiles_is_present:
    profile_results = search.get_dict()  # JSON -> Python dictionary

    for profile in profile_results["profiles"]:
        thumbnail = profile["thumbnail"]
        name = profile["name"]
        link = profile["link"]
        author_id = profile["author_id"]
        affiliations = profile["affiliations"]
        email = profile.get("email")
        cited_by = profile.get("cited_by")
        interests = profile.get("interests")

        profile_results_data.append({
            "thumbnail": thumbnail,
            "name": name,
            "link": link,
            "author_id": author_id,
            "email": email,
            "affiliations": affiliations,
            "cited_by": cited_by,
            "interests": interests
        })

    # check for a next page: split its URL into a dict and update the search
    # "params" so the next get_dict() call fetches the following page
    if "next" in profile_results.get("pagination", {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(profile_results.get("pagination").get("next")).query)))
    else:
        profiles_is_present = False

print(json.dumps(profile_results_data, indent=2))

Part of the JSON output:

[
  {
    "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=zj6FavAAAAAJ&citpid=6",
    "name": "Wolfram Burgard",
    "link": "https://scholar.google.com/citations?hl=en&user=zj6FavAAAAAJ",
    "author_id": "zj6FavAAAAAJ",
    "email": "Verified email at informatik.uni-freiburg.de",
    "affiliations": "Professor of Computer Science, University of Freiburg",
    "cited_by": 94818,
    "interests": [
      {
        "title": "Robotics",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Arobotics",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:robotics"
      },
      {
        "title": "Artificial Intelligence",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aartificial_intelligence",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:artificial_intelligence"
      },
      {
        "title": "AI",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aai",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:ai"
      },
      {
        "title": "Machine Learning",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amachine_learning",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:machine_learning"
      },
      {
        "title": "Computer Vision",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Acomputer_vision",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:computer_vision"
      }
    ]
  }, ... other results
]
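As for saving the results to CSV (mentioned at the start of this answer), a minimal standard-library sketch follows. The sample row is hypothetical data shaped like the profile dictionaries collected above; for Excel output or a database you'd swap in pandas or a DB driver instead.

```python
# Sketch: write collected profile rows to a CSV file using only the
# standard library. The row below is hypothetical sample data shaped
# like the dictionaries built in the scripts above.
import csv

profile_results = [
    {
        "name": "Wolfram Burgard",
        "affiliations": "Professor of Computer Science, University of Freiburg",
        "cited_by": 94818,
    },
]

with open("scholar_profiles.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(profile_results[0]))
    writer.writeheader()          # column names from the first row's keys
    writer.writerows(profile_results)
```

DictWriter quotes fields containing commas (such as the affiliation string) automatically, so the file round-trips cleanly through csv.DictReader.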

Disclaimer: I work for SerpApi.

– Dmitriy Zub