Sunday, May 24, 2009

What's Up With Wojo? (Spring 2009)

Exams are over, and I passed. My major trips, presentations, and essays are all behind me. For the next four months, I can work on my research, travel for fun, and explore new projects and ideas. It's a much calmer time, though throngs of tourists visit Oxford and space in the library is harder to find. Life is different than it was several months ago, so it's time for an update and an overview.

Dissertation and Research

My academic work is the main focus of my life until September. Data mining cabinet networks, political bloggers, and organizational affiliations... If you had to tag my life in this area, it would include three big ones: "Social Network Analysis", "Machine Learning", and "Mathematical Modelling". I'll post on these topics in a number of blog posts over time.

Politics and International Development

For the first time in a number of years, I find myself without a concrete plan or opportunity to do field work for Five Minutes to Midnight or related organizations. In my defense, I need to spend more time at Oxford, but it's an odd feeling to not have any major plans in this area. As such, I've decided to do more writing on politics, human rights, international development, and technology. If I can't travel and do work abroad, I'll at least keep up to date with developments, and share a few opinions.

So far, I'm hoping to write articles on cyber war, anonymity, network neutrality, and technology-related human rights. Not as much international development as I'd like, but I'm open to suggestions and ideas.

Art and Culture

Now that I have more time to explore England and places further away, I will definitely do so. More plays, museums, events, and so on. No concrete plans yet, but they're coming!

That's all for now... A much more relaxed Wojo, that's for sure.

Tuesday, May 19, 2009

Political Blog Networks and the U.S. Presidential Election

I had the great fortune of giving a talk at the Nuffield Networks seminar series today, titled "Political Blog Networks and the US Presidential Election". It was meant as an overview of some of the work I did at the IBM TJ Watson Research Center last year, though it also went into new work I'm doing in sentiment analysis and complex networks. Overall, I'm quite pleased with the talk. It allowed me to find a focus for some of the work I've been doing over the last few weeks. Below is a brief list of some of the important points I wanted to raise during this talk.

Machine Learning is Important for Social Science

I think the most exciting part of my presentation, though also a part that was quite low-key, is the potential that machine learning holds for social science research. Labeling blog posts by hand is useful, but fairly labour-intensive and sometimes expensive. Tools like Amazon's Mechanical Turk provide a cheap alternative for labeling, but even this method is not scalable to two million or more blog posts.

I will not argue that machine learning can be the saviour of such social research, but rather that if it is used intelligently and correctly, it can help elucidate some of the trends within massive social systems (such as the blogosphere). By no means is this the death knell for human labels or qualitative research. Instead, I see the two working hand-in-hand.

Forget Word Vectors... Use Graph Theory

The subtitle is a bit strong, and maybe a little sensationalist. No, we shouldn't be avoiding word frequencies, multinomial distributions, or natural language processing... Keep these wonderful things, but also include the graph structure behind the blog posts and other data sets you are using! I remember writing a bit about some potential tools before, and still think it is quite important.

On a related point, predicting edges between nodes, while much harder (in my opinion) than predicting sentiment of a specific blog post, is still worth trying. There's a great paper that will be presented at the upcoming International Conference on Machine Learning, and it is worth reading.
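To make the idea of predicting edges a little more concrete, here is a minimal sketch in Python. It is not the method from the paper mentioned above, nor anything I presented in the talk; it is just one of the simplest textbook baselines (Jaccard similarity of neighbourhoods), and the blog names and links are made up for illustration. The intuition is that two blogs that link to, or are linked by, many of the same blogs are more likely to end up linked themselves.

```python
from itertools import combinations

# Hypothetical hyperlink data: each blog maps to the set of blogs it links to.
links = {
    "blog_a": {"blog_b", "blog_c"},
    "blog_d": {"blog_b", "blog_c"},
    "blog_e": {"blog_a"},
}

def to_undirected(links):
    """Treat every hyperlink as an undirected edge and build neighbour sets."""
    neighbours = {}
    for src, targets in links.items():
        neighbours.setdefault(src, set())
        for dst in targets:
            neighbours[src].add(dst)
            neighbours.setdefault(dst, set()).add(src)
    return neighbours

def jaccard(neighbours, u, v):
    """Shared neighbours divided by all neighbours of u and v combined."""
    union = neighbours[u] | neighbours[v]
    if not union:
        return 0.0
    return len(neighbours[u] & neighbours[v]) / len(union)

def rank_candidate_links(neighbours):
    """Score every pair that is not yet linked, highest score first."""
    scores = []
    for u, v in combinations(sorted(neighbours), 2):
        if v not in neighbours[u]:
            scores.append((jaccard(neighbours, u, v), u, v))
    return sorted(scores, reverse=True)

neighbours = to_undirected(links)
ranked = rank_candidate_links(neighbours)
for score, u, v in ranked:
    print(f"{u} -- {v}: {score:.2f}")
```

In this toy network the top candidate is the pair blog_b and blog_c, since both share exactly the same neighbours. A score like this could also sit alongside word frequencies as one more feature for a classifier, rather than replacing them.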

Accuracy is Dead! Long Live Accuracy!

One of the biggest challenges with this type of approach is how difficult it is to actually make predictions, and more importantly, how to validate models that predict rare events. When you're predicting hyperlinks between bloggers, links are so rare that you can have a model with 99% accuracy by simply saying that no blogger will ever hyperlink to anyone. Accurate? Yes. Useless? Definitely.

Unfortunately, it's a bit difficult to justify the use of inaccurate machine learning models for social science research. That being said, I'm confident some creative and interesting solutions exist to this problem.
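The accuracy problem is easy to reproduce with a few lines of Python. The numbers below are invented (1,000 candidate blogger pairs, of which 10 actually have a hyperlink), and precision and recall are shown only as one common alternative to accuracy, not as the solution I have in mind. This is a sketch under those assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = link)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    """Accuracy, precision, and recall; the latter two are 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical data: 10 real hyperlinks among 1,000 candidate blogger pairs.
y_true = [1] * 10 + [0] * 990

# The "model" that says nobody ever links to anybody.
always_no = [0] * len(y_true)

result = metrics(y_true, always_no)
print(result)
```

The always-no model scores 99% on accuracy, yet its recall is zero: it does not find a single one of the links we actually care about.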

Monday, May 11, 2009

Great Books for Mathematical Modelling

I realize it has been over a month since I last posted, and for this I apologize. The reason is simple: I had to write my final exams. Aside from a few evenings spent with friends, I pretty much studied every day in April, after which I spent a week with my parents and then a week in Switzerland. Now it's time to return to my pre-exam life, which is hectic in a very different way.

Fortunately, I have my exam results and they went well. The last two months have been months of learning and understanding various tools in mathematical modelling, and there are certain books I simply wouldn't be able to live without. If you are interested in some of the technical aspects of mathematical modelling or are thinking of studying for the formal M.Sc. at Oxford, make sure you keep these books in mind.

Numerical Mathematics (1): focusing on all topics related to actually implementing theoretical mathematical ideas in a computer. The numerical linear algebra section (specifically, solving linear systems) is the best and clearest I've read in a while.

Finite Element Methods and Fast Iterative Solvers: with Applications in Incompressible Fluid Dynamics (1): impressive and complicated name for an impressive and complicated area of research. Yes, there's a whole course on this in the M.Sc., though we only really get through the first chapter of the book!

Applied Partial Differential Equations (1): focuses on getting you to solve PDEs. Really, that's all I can say... Though I'm convinced that there's an inverse relationship between the number of words used to describe a mathematical problem and the number of things you have to do to actually do it!

Boundary Value Problems of Mathematical Physics (1): a great introduction to how one can use distributions to solve various problems. Think of it as generalizing and abstracting how you actually integrate or solve differential equations.

So to those books and authors thereof, thank you!

Wednesday, April 1, 2009

Virtualization on Mac OS X -- Wow!

I've written about this before, but wanted to share some more thoughts about my current laptop setup, and why I absolutely love it.

First and foremost, I am a Linux geek. It took me a while to be converted, and I still remember the early days... Walking around with my laptop and an external hard drive with Fedora Core.

I eventually started using Ubuntu as my main OS, and did so until this past June, when I bought a Mac. I tried to get Ubuntu running, but the amount of time I had to invest to get all the drivers working was a bit much, and I decided to use OS X. I miss Ubuntu, and use it as much as I can. Now that I'm working on my research again, Ubuntu is becoming more and more prevalent in my life.

And this is where VMware Fusion enters. I run OS X, enjoy Ubuntu immensely, and also got on the beta testing wagon for Windows 7...

So here's what I do when I need to run multiple OSes. I have six desktops running in OS X, and I enter full screen mode in VMware Fusion. In full screen mode, I have either Ubuntu or Windows 7 (sometimes, but rarely, both). Even when I'm using Ubuntu or Windows, I still have the ability to switch between desktops in OS X. This means I can be in full screen mode on any one of my OSes, and simply switch out of them using the standard Mac commands. I end up with four Mac spaces, an Ubuntu space, and a Windows 7 space... All easily accessible with a simple key press.

Talk about multiple Desktops in Mac... Now if only I had the multiple computer monitors I used back in Canada!

To make things even more wonderful, VMware Fusion allows you to share folders between the various operating systems. So now when I do my research, I can actually modify the same files, documents, and presentations in any of the operating systems. How sweet is that?

This is such an awesome feature, and I just can't write enough about it. For anyone who enjoys working in multiple operating systems, I really recommend trying this.

Tuesday, March 24, 2009

Why I'm Giving Up on IMs

Instant Messengers (IMs) are wonderful software products... I've been on ICQ, MSN, AIM, and a number of other incarnations since elementary school, forced to use them at various companies and organizations, and still have a few accounts. However, I'm deciding to bring it all to an end... At least temporarily, though we'll see how it goes.

The idea behind technology is convenience, and passive information is wonderful: I can log into Twitter, Google Reader, Blogspot, e-mail, and Facebook whenever I want, and none of them interrupts me while I'm working (at least with my current settings). IMs are different. They run in the background and people can contact me whenever I'm on, whenever they want. This has become far from convenient, as I try to memorize theorems or write essays.

So it's time to say goodbye to the instant messenger. If you need me, you can still find me on Gmail, Twitter, Facebook, and of course, this blog.