Quick g2paranha update

The crawler has been running pretty well, with only minor tweaks from day to day that sometimes show up as blips in the graph. It was also down for a few days due to a failing hard drive.

Yesterday the crawler got into the Foxy network again. Foxy uses the same protocol as G2 but is a private network separate from G2, probably using GnucDNA’s authentication scheme. Their network is significantly larger than the G2 network, as can be seen from the spike in the history graph. Since it is so much larger it presents problems for crawling, and since it doesn’t represent the true open G2 network, I have filtered it out of the crawl.
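
For the curious, the filter amounts to nothing more than dropping recognizable Foxy nodes before they enter the crawl queue. Here is a minimal sketch in Python, assuming the crawl response exposes a vendor code per node and that Foxy nodes identify themselves with a distinct tag; the "FOXY" string and the data layout are my illustrative assumptions, not confirmed protocol details.

```python
# Hypothetical sketch: drop Foxy nodes before they enter the crawl queue.
# Assumes each discovered hub carries a vendor-code string; the "FOXY"
# tag used to recognize them is an assumption, not a confirmed detail.

FILTERED_VENDORS = {"FOXY"}

def should_crawl(hub):
    """Return True if this hub looks like part of the open G2 network."""
    return hub.get("vendor") not in FILTERED_VENDORS

hubs = [
    {"addr": "203.0.113.5:6346", "vendor": "RAZA"},
    {"addr": "198.51.100.7:6346", "vendor": "FOXY"},
]
crawl_queue = [h for h in hubs if should_crawl(h)]
print(crawl_queue)  # only the RAZA hub remains
```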


g2paranha – The New G2 Crawler

Anyone who has read through this blog knows that the crawler has tended to crash fairly often. Recently it was crashing too much to even continue running it. But rather than give up entirely, I decided to write my own crawler. Five weeks later, g2paranha has emerged. To go along with the new crawler is a redesigned website written by kevogod. g2paranha has been designed to be distributed: currently I run one crawler, and Datz kindly volunteered to run another.

Over the next few weeks, expect a few bumps as the bugs get ironed out. I’ll also be rounding out the feature set provided by the original crawler; right now that means adding country flags and keeping track of unique nodes. This last feature gives the best estimate of the network size, so until it is implemented the network counter won’t be put back on the front page. The leaf count on the history page overcounts the network size because each leaf can connect to multiple hubs.
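
To illustrate why unique-node tracking matters, here is a minimal sketch of the deduplication, assuming each crawl result is a hub address plus the leaf addresses it reported; the names and data layout are illustrative, not g2paranha’s actual internals.

```python
# Sketch of unique-node counting. Each crawl result is assumed to be a
# hub address plus the set of leaf addresses it reported; names are
# illustrative, not g2paranha's actual data model.

def network_size(crawl_results):
    """Estimate network size by deduplicating leaves seen at multiple hubs."""
    hubs = set()
    leaves = set()
    for hub_addr, leaf_addrs in crawl_results:
        hubs.add(hub_addr)
        leaves.update(leaf_addrs)
    # A leaf reported by several hubs is counted once here, whereas
    # summing per-hub leaf counts would count it once per hub.
    return len(hubs) + len(leaves)

results = [
    ("hub-a:6346", {"leaf-1:6346", "leaf-2:6346"}),
    ("hub-b:6346", {"leaf-2:6346", "leaf-3:6346"}),  # leaf-2 seen twice
]
print(network_size(results))  # 5 unique nodes, not 6
```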

I’ll try to keep the news updated as I progress.


The State of G2

I was reading the Gnutella2 article on Wikipedia today and noticed that both entries in the External Links section point to my sites (crawler.trillinux.org and g2.trillinux.org), the latter being the new home for the G2 specs after gnutella2.com was allowed to expire. This got me thinking that I seem to be the only one trying to keep G2 from completely disappearing.

This is partly to benefit others and partly out of self-interest. I don’t think the G2 protocol as a whole is all that spectacular anymore, if it ever was. But parts of the protocol can be reapplied to accomplish other things. For example, at its core is a specification for a compact, extensible tree structure for communication. This could be made generic and used for all sorts of applications outside of G2. The random-walk search mechanism is not original or unique, but G2 is the largest P2P network I’m aware of that still makes use of it, so from that perspective it could be interesting to study.
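
To make the tree-structure point concrete, here is a toy Python illustration of the shape of the format: named packets that carry opaque payload bytes and arbitrarily nested children. It mirrors the structure of G2’s framing, not its actual wire encoding.

```python
# Toy model of G2-style packets: a short name, opaque payload bytes,
# and nested child packets. This shows the shape of the format only,
# not the real wire encoding.

class Packet:
    def __init__(self, name, payload=b"", children=None):
        self.name = name          # short ASCII name, e.g. "QH2"
        self.payload = payload    # opaque bytes, meaning defined per name
        self.children = children or []

    def dump(self, depth=0):
        print(("  " * depth) + ("/%s (%d payload bytes)" % (self.name, len(self.payload))))
        for child in self.children:
            child.dump(depth + 1)

# Extensibility comes from parsers simply skipping names they don't
# recognize, so new child packets don't break old readers.
msg = Packet("QH2", children=[
    Packet("GU", payload=b"\x00" * 16),
    Packet("H", children=[Packet("URN", payload=b"sha1:...")]),
])
msg.dump()
```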

I run the crawler out of self-interest. I like data, statistics, and graphs, and I never turn down an opportunity to collect raw data, turn it into graphs, and make inferences from it.

I started maintaining the G2 website sometime in (late?) 2005 and moved it to its current home in October 2007. The crawler has similarly been running since late 2005. Here’s to many more years to come.


More Crawler Downtime

I spent last weekend replacing my router with another computer. The transition was a bit bumpy, but things are starting to get sorted out. Further extended downtime is possible over the next few weeks as I get everything fully transitioned and working reliably.


Crawler Downtime

The crawler has been down since Friday because I’m doing hardware work on my router. It is also the computer that does backups, and its hard drive has been slowly failing for the last few months. I finally bought a new hard drive and have been deciding how to set it up. In the meantime my network is a bit fragmented, and the consumer router I’m using as a backup folds under the load of the crawler. Things should be back to normal sometime on Tuesday.


New Graphs

Back in April I added some new graphs to the hub density page. They show the percentage of hubs with a given number of leaves, so hub capacity can be tracked at a finer granularity than the single average-leaves-per-hub statistic.
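
The statistic behind the graphs is just a histogram over per-hub leaf counts. A minimal sketch, with illustrative bucket boundaries:

```python
# Sketch of the hub-density statistic: the share of hubs carrying a
# given number of leaves, rather than a single network-wide average.
from collections import Counter

def hub_density(leaf_counts, bucket_size=50):
    """Return {bucket_start: percentage of hubs} from per-hub leaf counts."""
    buckets = Counter((n // bucket_size) * bucket_size for n in leaf_counts)
    total = len(leaf_counts)
    return {start: 100.0 * count / total
            for start, count in sorted(buckets.items())}

# e.g. leaf counts observed for four hubs in one crawl pass
print(hub_density([12, 47, 180, 240]))
# {0: 50.0, 150: 25.0, 200: 25.0}
```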

Let me know about other improvements you’d like to see and I will try to make them happen.


Country Database Update

You may have noticed that the number of “Unknown”/“??” countries has been increasing since the graphs came back in June. The country is determined using MaxMind’s GeoLite Country database, which maps IP addresses to countries, and the version the crawler was using hadn’t been updated since September 2005; many new IP blocks have been allocated since then. The database is now current, so the country stats should more accurately reflect where people really are. Some of the countries that should see increases are Australia, Costa Rica, India, Japan, Korea, Mexico, New Zealand, and Thailand.
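
For reference, a lookup against the GeoLite Country database is essentially a one-liner. This sketch uses the pygeoip module and assumes the database file sits next to the script; the crawler itself may use a different binding.

```python
# Rough sketch of a GeoLite Country lookup via the pygeoip module;
# g2paranha's actual binding may differ.
import pygeoip

gi = pygeoip.GeoIP("GeoIP.dat")  # path to the downloaded database file

def country_of(addr):
    """Map an IP address to a two-letter country code, or '??' if unknown."""
    return gi.country_code_by_addr(addr) or "??"

print(country_of("203.0.113.5"))
```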


New best hub uptime

Today a new best hub uptime was established, beating the old record of 209h 10m 45s. For some reason unknown to me, the crawler has decided to run continuously for over a week without dying. I did, however, check the system logs recently, and it looks like it may have been crashing because of a memory leak. With any luck, magic fixed it and it will now run smoothly forever.


A new crawler blog

First the original crawler program was stopped on its original host, and I took over running it. Then the website that hosted the stats and pretty graphs of the crawls went down and never came back, so I set up my own website, though it didn’t have the graphs. A while later I managed to recreate the graphs as well, using Munin. So, to keep alive my tradition of adopting the evicted and forgotten, I am starting this blog to replace the old crawler blog, which has now dropped off the internet.
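
Munin made the graph side easy: a plugin is just a script that prints its graph layout when called with “config” and prints values otherwise. A minimal sketch of the kind of plugin I mean, where the stats file path and its format are assumptions:

```python
#!/usr/bin/env python
# Sketch of a Munin plugin reporting crawler stats. Munin calls the
# plugin with "config" to learn the graph layout, and with no argument
# to fetch values. The stats file path and format are assumptions.
import sys

def read_counts(path="/var/lib/crawler/latest"):
    # Assumed format: a single line, "<hubs> <leaves>"
    with open(path) as f:
        hubs, leaves = f.read().split()
    return hubs, leaves

if len(sys.argv) > 1 and sys.argv[1] == "config":
    print("graph_title G2 network nodes")
    print("graph_vlabel nodes")
    print("hubs.label hubs")
    print("leaves.label leaves")
else:
    hubs, leaves = read_counts()
    print("hubs.value %s" % hubs)
    print("leaves.value %s" % leaves)
```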

I don’t know that I’ll have much in the way of actual news to share, so don’t expect frequent updates.

The latest news on the state of the crawler is that it tends to die every day or two. I don’t know what causes it to stop, and my attempts at getting it to restart itself haven’t been successful. So there are lots of gaps where it has stopped and I haven’t started it back up, either because I’ve forgotten about it or can’t be bothered. If someone nagged me periodically, I’d probably keep it up more.
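
What I’ve been attempting is something along these lines: a tiny watchdog that relaunches the crawler whenever it exits. The ./crawler path is a placeholder.

```python
# Minimal watchdog sketch: relaunch the crawler whenever it exits.
# "./crawler" is a placeholder for the real binary.
import subprocess
import time

while True:
    subprocess.call(["./crawler"])  # blocks until the crawler exits
    time.sleep(60)  # pause so a crash loop doesn't spin the CPU
```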
