Saturday, December 16, 2006

Citation Graphing

Anyone who has tried to navigate a complex web of articles in a new academic field knows the pain and confusion of not knowing the milestone works, the important authors and groups, and sometimes even the most basic keywords. In most cases an old field will have evolved, with keywords and phrases going in and out of "style". A literature search whose criteria miss the important keywords may explore only a small subset of the "space" despite sitting right "alongside" the latest important works. Often the only solution is the so-called "review paper". However, these are rarely unbiased assessments of all past work, rarely give a complete view of a field, and often do not exist at all.
This problem is an area of active research in various Computer Science, Sociology, Math, and Physics fields. In most cases a single database, conference, or journal is studied for patterns in citations. A well-known result is that in large samples, citations form a scale-invariant network: small clusters of specific research areas are connected together into larger clusters of less specific subject areas, and these are in turn clustered around a generic principle or idea, and so on, like a fractal.
Others recognize the problems I've outlined above and research ways for people to interact with "digital libraries" that allow quick navigation to the heart of an issue. For example, the following is from Chen 1999:
Author co-citation networks provide a snapshot of a scientific field as reflected through publications in the literature. More importantly, co-citation patterns offer a valuable alternative to the existing visualisation paradigms, which largely rely on the analysis of term distributions in a document collection. Author co-citation maps generated in our study are informative and revealing.
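To make "author co-citation" concrete, here is a minimal sketch in Python. It is not Chen's method, just the basic counting idea: two authors are co-cited each time a single paper cites both of them. The toy data and the name `cocitation_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of authors it cites.
cited_authors = {
    "paper_1": {"A", "B", "C"},
    "paper_2": {"A", "B"},
    "paper_3": {"B", "C"},
    "paper_4": {"A", "B", "D"},
}

def cocitation_counts(cited_authors):
    """Count how often each pair of authors is cited together by one paper."""
    counts = Counter()
    for authors in cited_authors.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(cited_authors)

# List pairs from most to least frequently co-cited.
for (first, second), n in counts.most_common():
    print(f"{first} and {second} are co-cited by {n} paper(s)")
```

The resulting pair counts can be read as edge weights in an author network, which is the kind of structure a co-citation map would then lay out visually.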
This paper in particular outlines a very neat looking way for a user to navigate the publication space based on common ideas and citation patterns. My question is: if this has been the subject of research for many years, why are there no existing tools using this technology?
The first step in answering this question involves (of course) navigating the publication space on this subject. In the next post I will record as many key phrases as I can.

Update:
I found the end of the discussion in An 2002 (link coming) to also be a good description of the problem:

Our intuition and experience tells us that papers on a specific research topic must be more densely interconnected than random groups of papers. Hence research topics or "communities" should correspond to locally dense structures in the citation graph. However, our work shows that the connectivity of citation graphs as a whole is such that it is not possible to extract such communities with straightforward methods such as minimum cut. More sophisticated methods are needed if we wish to succeed in mining the community information encoded in the link structure of a citation graph or other networked information spaces.

Another important subject of further study is the evolution of the citation graph over time. Knowledge about the temporal evolution of the local link structure of citation graphs can be used to predict research trends or to study the life span of specialties and communities. Such knowledge can also be used for the development of dynamic models for the citation graph. Such models can, in turn, give insight into the self-organizing processes that created the citation graph, and serve as a tool for prediction and experimentation.
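The quoted point about minimum cut can be seen on a toy graph. The sketch below is my own illustration, not the analysis from the paper, and the graph is invented: two dense clusters joined by two links, plus one stray paper `p` hanging off by a single link. A brute-force global minimum cut (fine for a handful of nodes, hopeless for a real citation graph) slices off the stray paper instead of separating the two clusters.

```python
from itertools import combinations

def global_min_cut(nodes, edges):
    """Brute-force global minimum cut of a small undirected graph.

    Returns (cut_size, side), where `side` is one half of the best bipartition.
    """
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best_size, best_side = None, None
    # Keep `first` on one side so each bipartition is visited exactly once;
    # k stops short of len(rest) so the other side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side = {first, *extra}
            # An edge crosses the cut when exactly one endpoint is in `side`.
            size = sum((u in side) != (v in side) for u, v in edges)
            if best_size is None or size < best_size:
                best_size, best_side = size, side
    return best_size, best_side

# Hypothetical toy graph: two fully connected clusters of four papers each.
cluster_1 = ["a", "b", "c", "d"]
cluster_2 = ["e", "f", "g", "h"]
edges = list(combinations(cluster_1, 2)) + list(combinations(cluster_2, 2))
edges += [("a", "e"), ("b", "f")]  # two links between the clusters
edges += [("d", "p")]              # one stray paper attached by a single link
nodes = cluster_1 + cluster_2 + ["p"]

cut_size, side = global_min_cut(nodes, edges)
other_side = set(nodes) - side
print(cut_size, sorted(other_side))
```

Separating the two clusters costs two links, while cutting off `p` costs only one, so the minimum cut returns the stray paper rather than a community. This is only one simple way the method can miss the structure we care about.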
We need this tool. Let's turn this area of research into an area of usefulness. Imagine if Watt had simply published his findings on steam engines in "Annals of Industrial Revolution" and gone on to more advanced research questions.
