Building a Scientific Concept Hierarchy Database (SCHBase)

Eytan Adar and Srayan Datta


Abstract

Extracted keyphrases can enhance numerous applications, ranging from search to tracking the evolution of scientific discourse. We present SCHBase, a hierarchical database of keyphrases extracted from large collections of scientific literature. SCHBase relies on the tendency of scientists to signal novelty by generating new abbreviations that "extend" existing forms. We demonstrate how these keyphrases/concepts can be extracted and show their viability as a database relative to existing collections. We further show how keyphrases can be placed into a semantically meaningful "phylogenetic" structure and describe key features of this structure. The complete SCHBase dataset is available at: http://cond.org/schbase.html.