Saturday, March 31, 2007

Citation graphing challenge

Academic literature is exploding.

Spencer Weart summarizes the problem quite well in his article "Trend-spotting: Physics in 1931 and today":

"A quick way to compare the situations then and now is to look at the Physical Review. The most obvious difference is size. Last year's volumes take up about 30 times as much shelf space as did the two 1931 volumes—not to mention that the pages have gotten bigger and the print smaller. To be sure, the 1931 student also had to read Zeitschrift für Physik and Nature. Even so, the student could have read every important article in the field. Today such breadth is out of the question; dozens of subfields each publish more than the entire physics community did back then. To get an overview of physics nowadays, you must read review journals; scan news stories in Science, Nature, and PHYSICS TODAY; and—that old standby—talk with professors."

This is a project that I hope people smarter than I am will solve. My complaint about current academic search engines like NASA ADS, the LANL arXiv, and Google Scholar is that the search results are too flat. A keyword search produces a list sorted by a complicated algorithm that might take into account word frequency, separation distance, connectedness between cited links, and a whole host of other things. What the search can't tell you, however, is what you might be missing by using the wrong phrases. Academic literature uses citations for a reason. What I would like to see is a search engine that displays its results graphically in a user-friendly graph that allows the user to select different display modes (context grouping, co-citation grouping, author/topic highlighting, 2D/3D representation, interactive navigation).

At first I thought I could get some of this done myself. I wrote a Perl script that retrieves two citation or reference lists from NASA ADS entries and finds their intersection. This answers the questions "what articles did both of my papers of interest cite?" and "what articles reference both of my papers of interest?". These metrics measure co-reference and co-citation, respectively.
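
The intersection step itself is trivial once the lists are in hand. A minimal sketch of that part in Perl (it assumes the two reference lists have already been fetched from ADS into text files, one bibcode per line; the file names are made up):

#!/usr/bin/perl
# Print every bibcode that appears in both reference lists.
use strict;
use warnings;

my %seen;
open my $fh_a, '<', 'refs_paper_A.txt' or die $!;
while (<$fh_a>) { chomp; $seen{$_} = 1 }
close $fh_a;

open my $fh_b, '<', 'refs_paper_B.txt' or die $!;
while (<$fh_b>) { chomp; print "$_\n" if $seen{$_} }
close $fh_b;

The actual retrieval from ADS is the fiddly part, and I omit it here.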

It worked well, but it was an itty bitty baby step in a large problem. Maybe by the next update of this page Google or NASA will step up to the plate! In my estimation, Tim Brody at Citebase is probably the closest; he recently added co-citation and co-referencing, just like my crummy little Perl script but much nicer.

Labels: , ,

March 2007 Ment' ath

Today I updated my email signature file. I noticed everyone who was cool had more stuff in theirs, I guess under the assumption that anyone you email (directly or via N forwards) will want to know your phone number, website, and favorite aphorism. I was updating it because I wanted to send a professor an email, and then I realized I should update my website, since now that's in there. So now I am updating my old Google Pages site, and I am just deleting the entire "physics" section and starting over with more recent research topics.
This is a little sad since, though it represents three long, horrible years and causes a little embarrassment at my naivete, it acts as a reminder. When I read this it reminds me how easy it is to fall victim to hubris. All scientists have feelings, and feelings like intuition, left unguided or uninformed, can be blinding and deadly. Intuition is just an unconscious synthesis of acquired data. How can one have intuition about things that take many years to fully explore and understand?

Anyway. I preserve here my "physics page" as it was written in January of 2006. Note: I never did get anywhere with QFT. It is a difficult subject, true, but it turned out that I never even got to the hard bits; other classwork and regular money work got in the way.

Also note: Now that I've had more experience with physical computer modeling, I suspect that my implementation left lots to be desired. My model implemented a simple Euler integration scheme and didn't really make any sense. Why not compute the trajectory of each particle using Newtonian dynamics and reflections? This is simple trig and could probably be reduced to a linear algebra problem. That's my slightly more educated intuition talking this time.
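
To make that concrete: for a non-interacting particle, the position at any time t comes in closed form by "folding" the free-flight position back into the box (a triangle wave), with no time stepping and no accumulated error. A sketch of one coordinate in Perl (walls at 0 and L; the function name and the example numbers are just illustration):

#!/usr/bin/perl
# Exact position of a particle bouncing between walls at 0 and L,
# found by folding the free-flight position x0 + v*t into the box.
use strict;
use warnings;
use POSIX qw(fmod);

sub fold {
    my ($x0, $v, $t, $L) = @_;
    my $x = fmod($x0 + $v * $t, 2 * $L);  # the unfolded period is 2L
    $x += 2 * $L if $x < 0;               # fmod keeps the sign of its argument
    return $x <= $L ? $x : 2 * $L - $x;   # reflect the far half back
}

print fold(0.3, 1.7, 42.0, 1.0), "\n";

Each coordinate folds independently, so the 3D case is just three calls.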

Anyway, back to that web page->email signature->email.
Happy Ment' ath!

Independent Work

Large Scale Gravitation

Many of the problems in physics at large and small scales have to do with gravitation. On one hand, there is a need for a complete theory of quantum gravity, whose effects might be observed as deviations from Newtonian gravity at short distances; on the other hand, there are unexplained phenomena at distances almost as large as the Hubble radius.
Through field theory and the standard model with supersymmetry, three of the four 'forces' have been unified (though not necessarily better understood). Gravity is the odd one out and appears to be the source of many of the big questions. Thus it is likely that explanations of these phenomena will bring solutions to the unification problem.

Thermodynamics

Gas in a box: is it really that hard to solve? This was a short inquiry into the transition of a multi-particle dynamical system from classically solvable to requiring statistical methods. The transition was defined as the number of non-interacting particles that a simple numerical simulation could bounce around a box in real time.

It turned out that, assuming particles follow classical trajectories and do not interact (like photons), one could calculate trajectories for about 1000 particles before the simulation slowed to less than 1:1 (computer time : real time). Of course, this was a coarse Euler-type step integration of very simple dynamics, so the addition of more elements to the model would only decrease the number of particles my computer is capable of following.

This little experiment reminded me that entropy is a quantity that requires some element of multiplicity for macrostates. In this case the system can be represented by a single point describing a cyclic motion in 3N-parameter space (each particle executes completely reversible motion inside the box). Because the phase and period of each particle's path in the box depend on its own initial state and the subsequent geometry, the recurrence time (the time after which the initial state is regained) might be a good measure of the entropy of the "steady state".

Studies

Quantum Field Theory
Most of the work one finds on cosmology and gravitation problems seems to require a small amount of relativity and a rather large portion of field theory. Field theory is very popular as a means to build models about esoterically large or small phenomena, especially since all of particle physics can be done this way.
In cosmology, acceleration of the expansion might be caused by a slowly changing field and at bottom gravity might be mediated by the Higgs field.
QFT is one of those physics topics that is almost always an... afterthought in a physics education, certainly not the foundation laid in freshman physics. Yet in most of the literature one sees little to no remnant of anything from the first six years of one's education.
In any case, I am studying from Zee's book, "QFT in a Nutshell". After several days of hard work, I am finally halfway through chapter I!


Labels: , ,

Tuesday, January 30, 2007

allowing subdirectory access with .htaccess on Apache

Another bit of documentation that should be more obvious online:
If you have a directory on an Apache web server that is protected by a .htaccess file and you want to allow access to one of its subdirectories, simply create a .htaccess file in the "allowed" subdirectory with the text:

Allow from all
Satisfy Any
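
For context, the parent directory's .htaccess might look something like this typical basic-auth setup (the realm name and password-file path here are made up):

AuthType Basic
AuthName "Members Only"
AuthUserFile /home/user/.htpasswd
Require valid-user

The trick is that "Satisfy Any" tells Apache that passing either the host-based Allow rule or the authentication is enough, so the subdirectory opens up while everything else stays password-protected.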

Thanks to Pascal over at nano-rails, whose correct solution showed up as Google hit #79.

Labels: ,

Thursday, January 25, 2007

installing Numerical Recipes on Cygwin

1. Unpack the source tarball into /usr/local/src/.
2. Edit the Makefile so that "CFLAGSANSI = -ansi -I../include" (we added the -ansi); this gets rid of a conflict between math.h and NR's fmin function.
3. bash$ ln -s /lib/gcc/i686-pc-cygwin/3.4.4/include/stddef.h /usr/include (nrutil.c wants this file; I found the target directory with "find /lib -name stddef.h").
4. Go to the directory with the Makefile and execute make.
5. Compile programs with: bash$ cc -ansi -o myprogram myprogram.c /usr/local/lib/librecipes_c.a

Labels: , , , ,

Sunday, December 31, 2006

GW noise sum.

The total noise, or error, in a measurement can be expressed as the "quadratic sum" of all the individual error sources. Any instrument suffers from background "noise". Any observed signal (s) can be divided into two components: that which we wish to observe (h) and "anything else" (n). If these two parts are unrelated, they add linearly: s = h + n.
In many cases the distinction is obvious. For example, a CCD image of a star will be the sum of the effect of the starlight on the camera plus the effects of heat and dust (and lots more!).
In observations with coarser instruments, where many different signals can be detected, the distinction between noise and signal becomes blurry. Consider a radio station which is overlapped by several other stations at a time, as is common between large metropolitan areas. One station is probably louder than the rest and would be easily distinguishable by itself, but the overlapping signals from other stations make listening difficult. This is a case of certain signals becoming undesirable noise. If you had a directional antenna you could turn it until your desired station was much louder, spatially resolving the signal.
Like a dipole radio antenna, GW telescopes are usually all-sky detectors. If two sources are spatially unresolved and close in frequency, then the sources become confused together and not much can be learned about either individual source. Resolution of each parameter, such as frequency and sky position, is a function of the number of data points. If the source is faint, the resolution decreases. One finds a limit below which two sources within a certain parameter distance cannot be distinguished from each other. These sources then become a kind of noise. Thus if there is a large number of these indistinguishable sources, the noise level rises.
Noise can also be thought of as contributing to measurement error. In the case of multiple sources of noise, each adds a (Gaussian?) distribution of a certain width. The total noise is the quadratic sum of the widths. If the noise has zero mean, then it is also the quadratic sum of the rms values.
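
Concretely: for independent, zero-mean noise sources n_1 ... n_N with rms values sigma_1 ... sigma_N, the total rms is

sigma_total = sqrt( sigma_1^2 + sigma_2^2 + ... + sigma_N^2 )

so, for example, two noise sources with rms 3 and 4 (in whatever units the detector reads out) combine to a total rms of 5, not 7.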

Labels: ,

Wednesday, December 27, 2006

MBH Readings

Reading
Sesana et al., Dec 2004

This article is a more refined version of a similar paper published a few months prior. It seeks to calculate the number of Massive Black Hole (MBH) inspiral and coalescence signals observed by LISA in a three-year observing period.
Key Concepts to refine:
  • 'Generic' gravitational wave signals. Bursts, periodic signals, characteristic strain...
  • 1/f noise
  • Sensitivity. Is it S/N=Sensitivity*Signal?
  • What does it mean to "add in quadrature"?
Keywords (words with ad-hoc or implicit definitions, related concepts)
  • Hierarchy
  • MBH formation
The statistics on page 8 seem ad hoc. They give a single event count and explore the sensitivity to various priors by computing one or two other numbers. It would seem to me that the important prediction wouldn't be the number of events observed in three years but rather the dependence of that number on the assumptions. The treatment of this question does not satisfy me.

I need a better understanding of gravitational waves. Why are the spectrum and time evolution the way they are? What is a "burst", and why can we treat it as a single-wavelength pulse?

Is the loose estimate that the frequency 'bin' Δf is the same size as f typical? This seems like an awfully large bin.

All this aside, this is much better than the first version of this article (Sesana et al., ApJ, August 2004).

Labels: ,

Saturday, December 16, 2006

Citation Mapping Keywords

  • Salient semantic structures
  • citation patterns
  • Latent Semantic Indexing
  • Pathfinder Network Scaling
  • co-citation map
  • semantic space
  • semantic structures
  • digital libraries
  • self-organizing
  • information space
  • citation graph
  • degree distributions
  • graph theoretical algorithms
  • small-world
  • connectivity
  • bibliometric
  • informetrics
  • domain visualizations
  • SCI-Map
  • geometric triangulation method
  • information retrieval

Labels: ,

Citation Graphing

Anyone who has tried to navigate a complex web of articles in a new academic realm knows the pain and confusion of being ignorant of the milestone works, the important authors and groups, and sometimes even the most basic keywords. In most cases an old field will have evolved with keywords and phrases going into and out of "style". Given a literature search with criteria ignorant of the important keywords, it is possible to explore only a small subset of the "space" despite being right "alongside" the latest important works. Often the only solution is the so-called "review paper"; however, these are rarely unbiased assessments of all past work, rarely give a complete view of a field, and are often nonexistent.
This problem is an area of active research by people in various computer science, sociology, math, and physics fields. In most cases a single database, conference, or journal is studied in a search for patterns in citations. A well-known result is that in large samples, citations form a scale-invariant network: small clusters of specific research areas are connected together into larger clusters of less specific subject area, and these are then clustered around a generic principle or idea, and so on, like a fractal.
Others recognize the problems I've outlined above and research ways for people to interact with "digital libraries" that allow quick navigation to the heart of an issue. For example, the following is from Chen 1999:
Author co-citation networks provide a snapshot of a scientific field as reflected through publications in the literature. More importantly, co-citation patterns offer a valuable alternative to the existing visualisation paradigms, which largely rely on the analysis of term distributions in a document collection. Author co-citation maps generated in our study are informative and revealing.
This paper in particular outlines a very neat-looking way for a user to navigate the publication space based on common ideas and citation patterns. My question is: if this has been the subject of research for many years, why are there no existing tools using this technology?
The first step in answering this question involves (of course) navigating the publication space on this subject. In the next post I will record as many key phrases as I can.

Update:
I found the end of the discussion in An 2002 (link coming) to also be a good description of the problem:

Our intuition and experience tells us that papers on a specific research topic must be more densely interconnected than random groups of papers. Hence research topics or "communities" should correspond to locally dense structures in the citation graph. However, our work shows that the connectivity of citation graphs as a whole is such that it is not possible to extract such communities with straightforward methods such as minimum cut. More sophisticated methods are needed if we wish to succeed in mining the community information encoded in the link structure of a citation graph or other networked information spaces.

Another important subject of further study is the evolution of the citation graph over time. Knowledge about the temporal evolution of the local link structure of citation graphs can be used to predict research trends or to study the life span of specialties and communities. Such knowledge can also be used for the development of dynamic models for the citation graph. Such models can, in turn, give insight into the self-organizing processes that created the citation graph, and serve as a tool for prediction and experimentation.
We need this tool. Let's turn this area of research into an area of usefulness. Imagine if Watt had simply published his findings on steam engines in the "Annals of the Industrial Revolution" and gone on to more advanced research questions.

Labels: ,

Saturday, December 02, 2006

Improved Optimal Dissertation plot


After a recent observation over at PhD Comics on the most horrible implications of being scooped on your PhD topic, I just had to point out the most likely region of UR (uniqueness and relevance) space where this might occur. Of course, this assumes that no one else has overlapping confidence intervals in UR space, in which case the probabilities would become horribly non-linear.

Labels: ,

wikipaper

Most wiki systems incorporate equation editing and other basic formatting stuff like figures with captions and sections/subsections. These are the same basics as any LaTeX-formatted paper. Are there any scripts out there for converting wiki pages to LaTeX?
One can imagine an environment where collaborators can all work on a paper in a wiki, but with the addition of other unpublished appendices. These could hold related articles, data, conversations (like a message board), etc. It seems like it's all there; we just need a converter.
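
The easy substitutions really are one-liners. A toy sketch in Perl, assuming MediaWiki-style markup and handling only headings, bold, and italics (everything else, figures and tables especially, would take real parsing):

#!/usr/bin/perl
# Toy wiki-to-LaTeX filter: handles only headings, bold, and italics.
use strict;
use warnings;

while (my $line = <>) {
    $line =~ s/^==\s*(.+?)\s*==\s*$/\\section{$1}\n/;  # == Heading ==
    $line =~ s/'''(.+?)'''/\\textbf{$1}/g;             # '''bold''' (before italics!)
    $line =~ s/''(.+?)''/\\emph{$1}/g;                 # ''italic''
    print $line;
}

Run it as "perl wiki2tex.pl page.wiki > page.tex". Equations are arguably the easiest part of all, since most wiki math plugins store them as LaTeX already.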

Labels: , ,