Sunday, May 24, 2009

What's Up With Wojo? (Spring 2009)

Exams are over, and I passed. My major trips, presentations, and essays are all behind me. For the next four months, I can work on my research, travel for fun, and explore new projects and ideas. It's a much calmer time, though throngs of tourists visit Oxford and space in the library is harder to find. Life is different than it was several months ago, so it's time for an update and an overview.

Dissertation and Research

My academic work is the main focus of my life until September. Data mining cabinet networks, political bloggers, and organizational affiliations... If you had to tag my life in this area, the three big tags would be "Social Networks Analysis", "Machine Learning", and "Mathematical Modelling". I'll post on these topics in a number of blog posts over time.

Politics and International Development

For the first time in a number of years, I find myself without a concrete plan or opportunity to do field work for Five Minutes to Midnight or related organizations. In my defense, I need to spend more time at Oxford, but it's an odd feeling to not have any major plans in this area. As such, I've decided to do more writing on politics, human rights, international development, and technology. If I can't travel and do work abroad, I'll at least keep up to date with developments, and share a few opinions.

So far, I'm hoping to write articles on cyber war, anonymity, network neutrality, and technology-related human rights. Not as much international development as I'd like, but I'm open to suggestions and ideas.

Art and Culture

Now that I have more time to explore England and places further away, I will definitely do so. More plays, museums, events, and so on. No concrete plans yet, but they're coming!

That's all for now... A much more relaxed Wojo, that's for sure.

Tuesday, May 19, 2009

Political Blog Networks and the U.S. Presidential Election

I had the great fortune of giving a talk at the Nuffield Networks seminar series today, titled "Political Blog Networks and the US Presidential Election". It was meant as an overview of some of the work I did at the IBM TJ Watson Research Center last year, though it also went into new work I'm doing in sentiment analysis and complex networks. Overall, I'm quite pleased with the talk. It allowed me to find a focus for some of the work I've been doing over the last few weeks. Below is a brief list of some of the important points I wanted to raise during this talk.

Machine Learning is Important for Social Science

I think the most exciting part of my presentation, though also a part that was quite low-key, is the potential that machine learning holds for social science research. Labeling blog posts by hand is useful, but fairly labor-intensive and sometimes expensive. Tools like Amazon's Mechanical Turk provide a cheap alternative for labeling, but even this method is not scalable to two million or more blog posts.

I will not argue that machine learning can be the saviour of such social research, but rather that if it is used intelligently and correctly, it can help elucidate some of the trends within massive social systems (such as the blogosphere). By no means is this the death knell for human labels or qualitative research. Instead, I see the two working hand-in-hand.

Forget Word Vectors... Use Graph Theory

The subtitle is a bit strong, and maybe a little sensationalist. No, we shouldn't be avoiding word frequencies, multinomial distributions, or natural language processing... Keep these wonderful things, but also include the graph structure behind the blog posts and other data sets you are using! I remember writing a bit about some potential tools before, and still think it is quite important.
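To make the "keep the words, add the graph" idea a little more concrete, here is a minimal sketch of what a combined feature vector might look like. Everything in it is made up for illustration: the three blogs, their text, the hyperlinks, and the choice of in-degree and out-degree as the graph features are all assumptions, not the features I actually used in my work.

```python
from collections import Counter

# Hypothetical toy data: three blogs, their text, and who links to whom.
posts = {
    "blog_a": "obama rally hope change",
    "blog_b": "mccain rally taxes",
    "blog_c": "obama taxes debate",
}
# Each pair is (source blog, target blog), i.e. source hyperlinks to target.
links = [("blog_a", "blog_c"), ("blog_b", "blog_c"), ("blog_c", "blog_a")]

# Shared vocabulary, sorted so every blog gets the same column order.
vocabulary = sorted({word for text in posts.values() for word in text.split()})


def word_counts(text):
    """Classic bag-of-words: one count per vocabulary word."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]


def degree_features(blog):
    """Simple graph features: how many links point in, how many point out."""
    in_degree = sum(1 for _, target in links if target == blog)
    out_degree = sum(1 for source, _ in links if source == blog)
    return [in_degree, out_degree]


def feature_vector(blog):
    """Word features followed by graph features for one blog."""
    return word_counts(posts[blog]) + degree_features(blog)


for blog in sorted(posts):
    print(blog, feature_vector(blog))
```

The point is only that the graph features sit alongside the word counts rather than replacing them. Richer graph measures could be appended to the vector in the same way.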

On a related point, predicting edges between nodes, while much harder (in my opinion) than predicting sentiment of a specific blog post, is still worth trying. There's a great paper that will be presented at the upcoming International Conference on Machine Learning, and it is worth reading.

Accuracy is Dead! Long Live Accuracy!

One of the biggest challenges with this type of approach is how difficult it is to actually make predictions and, more importantly, to validate models that predict rare events. When you're predicting hyperlinks between bloggers, you can have a model with 99% accuracy by simply saying that no blogger will hyperlink to anyone. Accurate? Yes. Useless? Definitely.
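A quick sketch of the problem, using invented numbers: assume 1,000 candidate blogger pairs, of which only 10 actually share a hyperlink. The counts and the helper function names are assumptions for illustration, not figures from my data. Precision and recall are shown next to accuracy as one common way of exposing the issue, not as the definitive fix.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = link)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all pairs that were labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the 'link exists' class.

    Returns 0.0 for a ratio whose denominator is zero (a convention
    chosen here for simplicity).
    """
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 10 real links among 1,000 candidate blogger pairs.
y_true = [1] * 10 + [0] * 990

# The "nobody links to anybody" model.
y_pred_none = [0] * 1000

acc = accuracy(y_true, y_pred_none)
prec, rec = precision_recall(y_true, y_pred_none)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

The always-negative model is right about 990 of the 1,000 pairs, yet it finds none of the 10 real links, so its recall is zero.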

Unfortunately, it's a bit difficult to justify the use of inaccurate machine learning models for social science research. That being said, I'm confident some creative and interesting solutions exist to this problem.

Monday, May 11, 2009

Great Books for Mathematical Modelling

I realize it has been over a month since I last posted, and for this I apologize. The reason is simple: I had to write my final exams. Aside from a few evenings spent with friends, I pretty much studied every day in April, after which I spent a week with my parents and then a week in Switzerland. Now it's time to return to my pre-exam life, which is hectic in a very different way.

Fortunately, I have my exam results and they went well. I feel like the last two months have been the months of learning and understanding various tools in mathematical modelling, and there are certain books I simply wouldn't be able to live without. If you are interested in some of the technical aspects of mathematical modelling or are thinking of studying for the formal M.Sc. at Oxford, make sure you keep these books in mind.

Numerical Mathematics: focusing on all topics related to actually implementing theoretical mathematical ideas in a computer. The numerical linear algebra section (specifically, solving linear systems) is the best and clearest I've read in a while.

Finite Element Methods and Fast Iterative Solvers: with Applications in Incompressible Fluid Dynamics: impressive and complicated name for an impressive and complicated area of research. Yes, there's a whole course on this in the M.Sc., though we only really get through the first chapter of the book!

Applied Partial Differential Equations: focuses on getting you to solve PDEs. Really, that's all I can say... Though I'm convinced that there's an inverse relationship between the number of words used to describe a mathematical problem and the number of things you have to do to actually do it!

Boundary Value Problems of Mathematical Physics: a great introduction to how one can use distributions to solve various problems. Think of it as generalizing and abstracting how you actually integrate or solve differential equations.

So to those books and authors thereof, thank you!