The Blog




I did a quick search this morning trying to find details of GoogleTalk's approach to NAT busting when both clients are behind a firewall, but without success. It's really irritating when a Google search brings up your own blog asking the question!

The only experience I have here is with my daughter at Edinburgh University. They have a very restrictive firewall that limits you to HTTP via a proxy, and that's basically it. She (with my help) hasn't been able to get Skype or MSN to work. But GoogleTalk got straight through with good voice quality. So how did they do that?

I'm also curious about how Google's servers compare with Skype's Supernodes. Anyone?

But most of all I'm desperately waiting for GoogleTalk v0.2 and the first decent alternative (Gaim?) that supports voice.




One small snippet for anyone working with Google Base. They've opened up the restrictions on the g:location tag. You can now put in any location string that returns a single location on maps.google.com, so "postcode, country" is perfectly acceptable and works. They also have a format for lat,long[1].
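
For anyone who hasn't tried a bulk feed yet, here's a minimal sketch of an item using that location format. The g: namespace URI is from memory and the listing details are made up, so check Google's docs before relying on it:

    <?xml version="1.0"?>
    <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
      <channel>
        <title>Marketplace listings</title>
        <link>http://www.ecademy.com/</link>
        <description>Bulk upload feed</description>
        <item>
          <title>Example listing</title>
          <link>http://www.ecademy.com/listing/123</link>
          <description>A test item</description>
          <g:location>SG12 7DB, UK</g:location>
        </item>
      </channel>
    </rss>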

I'm still getting errors from Google about "too much activity" but it seems that this is a limitation of the management pages and not the underlying database. I'm successfully uploading Ecademy's Marketplace listings daily.

If you're doing a regular upload, you need to post a bulk file by hand first, then use FTP to replace the same filename periodically. Something that's desperately needed for bulk uploaders is an incremental load: at the moment you have to post your entire database, because items from old copies of the file are deleted when you put up a new bulk file. The fact that they take RSS/Atom as a data transfer format means we should be able to get to the point where we register a feed and then use a ping technique to tell Google it's changed.
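
If Google ever supported the standard weblogs.com ping convention, the "it's changed" notification could be as small as an ordinary weblogUpdates.ping XML-RPC call (the feed name and URL here are made up):

    <?xml version="1.0"?>
    <methodCall>
      <methodName>weblogUpdates.ping</methodName>
      <params>
        <param><value>Ecademy Marketplace</value></param>
        <param><value>http://www.ecademy.com/marketplace.xml</value></param>
      </params>
    </methodCall>

Google would then fetch the feed itself, which would also make the incremental load problem largely disappear.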

Sadly, Google are ignoring the pubDate tag. All items get their date from the bulk file's upload date, which screws up the listings when you view "by date".

I still don't understand the need for a g:label tag when RSS already has category, and we're using that successfully to tell other services about the tags on an item.

I think the user interface to Google Base still needs lots of work. Several of the fields look like they ought to be clickable but aren't. And it's not always clear what's going to happen when you click on a tag label.

[1] A note here: Google has a very powerful geocoder in their string to lat,long conversion. I wish they would provide an API to this.

Light Reading - VOIP - Jabber IM Adds VOIP - Telecom News Wire

Jingle All The Way

I don't need to tell you that this is hugely significant. It adds P2P NAT busting and support for audio, and in the near future video, to the Jabber community. There's also the hint of SIP interoperability in the near future. It's all open source, free to use in commercial products, and has announcements of support from Gaim, Trillian and Asterisk among others.

Inevitably there are questions arising:
- There needs to be a Mozilla Firefox-style project to produce a definitive IM-VoIP product.
- Long term, how can Skype, Yahoo! and MSN compete?
- At what point do AIM and Apple join in?
- What's in this for Google?
- Where's GoogleTalk v0.2? Or does Google now get out of the client business and just focus on running the servers?
- Google has threatened to interoperate with other Jabber servers at some stage in the future. How does this affect that?

But right now I want just one thing. A way of putting GoogleTalk/Jabber presence on a web page. And a way of clicking on that presence to launch a GoogleTalk/XMPP chat session.
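
The nearest I can sketch today is a server-generated presence image plus a link using the draft xmpp: URI scheme. As far as I know nothing registers itself as a handler for xmpp: yet (certainly not GoogleTalk), and the addresses here are placeholders:

    <!-- presence: an image the server re-renders as my status changes -->
    <img src="http://example.com/presence/julian.png" alt="julian is online" />
    <!-- click-to-chat: needs a Jabber client registered for the xmpp: scheme -->
    <a href="xmpp:julian@example.com?message">Chat with me</a>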




Just got this from Microsoft. It's only really of historical interest now. A SOAP-addressable index of SOAP endpoints seemed like such a good idea at the time. Shame the press built it up into something it was never going to be, and that the big players managed to complicate it all so much that, like SOAP, it became unusable. At the time I actually thought an open source implementation in PHP would be a good idea, and there were a few attempts, but AFAIK they never really got finished.

It feels like the idea has now been replaced with the directories of web APIs being built by WSFinder and Programmable Web.

uddi@microsoft.com Wed, 14 Dec 2005 16:19:15
You are receiving this mail because you have registered as a publisher on the Microsoft node of the UDDI Business Registry (UBR). The primary goal of the UBR was to prove the interoperability and robustness of the UDDI specifications through a public implementation. This goal was met and far exceeded, and now the UBR is discontinuing its operations. As part of this process the Microsoft UBR node at uddi.microsoft.com will be permanently unavailable for all operations beginning January 12, 2006. Data stored in the UBR may be retrieved until January 12, 2006 and used in accordance with the UDDI Business Registry terms of use available at http://uddi.microsoft.com/policies/termsofuse.aspx. You may find the UDDI Data Export Wizard useful for retrieving your data, and it is available here: http://www.microsoft.com/downloads/details.aspx?familyid=9D467A69-57FF-4AE7-96EE-B18C4790CFFD. For more information, please see the frequently asked questions related to the UBR discontinuation at http://uddi.microsoft.com/about/FAQshutdown.htm. You may submit feedback to Microsoft at the following location: http://uddi.microsoft.com/contact/default.aspx.

Thank You,
Microsoft UDDI Team




I just had one of those Ah-Ha moments.

Back in the day, we used to centre group discussions on a subject around a mailing list. The participants knew where to find the latest conversation, and it was pushed to them via email. Now we've exploded these discussions all over the web into individual blogs, blog posts and comment threads on blog posts. It's still way too hard to track who said what about a particular subject and who replied to your post. We've tried all sorts of ways to cope with this, right up to using common tags and services like Technorati to bring it all back together.

Over on Tribes, they're having a Terms and Conditions crisis. Somebody asked if you could build a P2P version of Tribes with no central controlling body. I thought of Skype group chats and Groove. Then I remembered the Microsoft SSE RSS extensions and Ray Ozzie.

So now imagine an extension to blog CMS software that allowed a blog to subscribe to a federated group of participating blogs, maintaining shared state by replicating the conversation between themselves using SSE over RSS. This would begin to look like a mailing list again, but with no central server.
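
From my first read of the SSE spec, the per-item sync data might look something like this. The namespace URI and attribute names are from memory of the draft, and the ids are invented, so treat it as illustrative only:

    <item xmlns:sx="http://www.microsoft.com/schemas/sse">
      <title>Re: a P2P version of Tribes?</title>
      <!-- sx:sync carries the replication state: a globally unique id,
           a version counter, and an update history, so each blog can
           merge the others' feeds without a central server -->
      <sx:sync id="blog-a/post-42" version="3">
        <sx:history when="Wed, 14 Dec 2005 10:00:00 GMT" by="blog-a"/>
      </sx:sync>
    </item>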

Or have I just re-invented NNTP and Usenet?


MP3Test
MP3 validity checker [from: del.icio.us]




p2pnet.net - Europe may pass snoop law : "The legislation, written in September, is coming up for a vote in record time," says the New York Times. "Though it generally takes a year to 18 months to bring a law to a vote, the countries that make up the union back the legislation, which comes in the wake of terrorist attacks in London last summer and Madrid last year."

And the European parliament is expected to approve it, says the story, going on that the speed, however, alarms telecommunications companies, which say the public hasn't had enough time to consider the implications.

The version the EU parliament will vote on tomorrow, written by Britain, would require phone companies to keep information like the time of phone calls or fax transmissions, the phone numbers of incoming and outgoing calls and the duration of the calls for at least two years, says the NYT. Details of e-mail activity would have to be stored for a minimum of six months.

...

"The industry's main worry, however, is cost. It estimates that telecommunications companies would have to store 50 times the data they do now. There is no provision under the draft law to compensate phone companies and Internet providers."


Oh good grief. Everything I've seen has asked how telcos and ISPs can afford this. But that assumes they are the only ones that provide email servers and suchlike. What about corporates and SMEs? Even private individuals? Will they too have to store 6 months' worth of email logs? My Ecademy logs are currently running at 0.25 GB a week, or about 6.5 GB for 6 months. Not too bad, I guess. I dread to think what the volume would be for the emails themselves.

It's about time Google sorted out a system to cope with people who have multiple accounts with them. I reckon I've got 3 email addresses used for AdSense, Analytics and Google Groups. Then there's a Gmail account also used for GoogleTalk. And an Orkut account. And Blogger. Then I've got a sign-in to Google Earth as well.

Currently these tend to be stored in cookies, and things like GoogleTalk have trouble working out who you are at any given moment. If you made the mistake (I did) of using different email addresses for different services, you find yourself constantly logging in and logging out. And every so often you end up with a password change in one system (like something in the UK) that doesn't propagate to the others.

Yahoo! hit the exact same problem some years ago and allowed you to link email addresses together and confirm that they all represented a single person. As they acquire companies like Flickr and now del.icio.us, they're doing a fairly good job of adding them into the system. Google needs to do the same, and soon.

Looking at all this, Google and Yahoo! have a golden opportunity to create a competitor to Passport but to do it in an open way with plenty of APIs (preferably based on something like OpenID). I really don't understand why they don't do this.

My good friend Dave posted this.

Scripting News: 12/12/2005 : Here's a serious thought. Google Base is all about RSS, right? That's cool. I understand RSS, and I like it. And no one owns it, dammit. But everyone's all concerned (rightly so, imho) about turning over our data to Google. How do we know they won't put ads in our stuff without our permission? And without paying us? How do we know they won't change what we say? We've been through this before, and they won't even talk about it with us! How about that for snarky.

So here's an idea, let's start a company, hire some great people to run our database. Instead of being the users in "user-generated content," we'll be the owners. The former sounds like a hamster in a cage, to me. An owner is someone who commands respect. We could wear badges and give ourselves business cards that say Owner. Forget about being a hamster. I want to be an owner!

I've already talked about this idea with friends with huge pipes and big databases who know how to run things reliably, and they find the idea interesting. But so far they haven't said they'll do it.


So I've done a load of work on Google Base, and right now it sucks. It's an unstable alpha. Craigslist is irritatingly local and horribly Web 0.9. eBay is all about money, not classifieds. A Web 2.0 version of Craigslist with tags and AJAX and maps and APIs and RSS coming out of its ears sounds like a good plan. And there's some real serendipity available if you link it with a publisher-driven ads system like BlogAds: post your listing and have it turn up all over the place via RSS, including in the ads on relevant blogs.

I want to do this, and I've now got a lot of experience after building a tag based listings service into Ecademy. I was seriously thinking about building it as a Drupal module, but the tag support in Drupal was going off in a direction that didn't help me.

Anyone else want to play at this? Drop me a line at julian_bond at voidstar.com

[edited to add] As all the major blog systems add support for tags, they become a neat way of working out the relevance of ads placed on a blog. Just match the tags added by the advertiser against the tags added by the blogger, then pick a random ad from the intersection.
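
In PHP that matching is almost a one-liner. A rough sketch, with made-up data shapes:

    <?php
    // Pick an ad whose advertiser-supplied tags overlap the post's tags.
    function pick_ad($post_tags, $ads) {
        $candidates = array();
        foreach ($ads as $ad) {
            // $ad['tags'] is the advertiser's tag list for this ad
            if (count(array_intersect($post_tags, $ad['tags'])) > 0) {
                $candidates[] = $ad;
            }
        }
        if (count($candidates) == 0) {
            return null;  // no relevant ad; fall back to run-of-site
        }
        return $candidates[array_rand($candidates)];  // random pick
    }
    ?>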




The Clicker: What did I just buy? - Engadget - www.engadget.com

They turned off comments on this article, but its mention of derivatives got me thinking.

What if we could trade the right to use content independently of the content itself?

In the financial markets, ever since Stani Yassukovich invented the Eurobond, it's become normal to produce ever more complex paper that securitises underlying investments. Fundamentals are grouped, packaged and turned into a derivative paper contract that is then tradable independently of the fundamental. Can the same thing be done with media content? Can we create a situation where I can buy and trade the right to listen to the latest Cinematic Orchestra album without actually shipping the bits from one place to another? Can I go further than this and package my MP3 collection as a contract to listen to it, and then sell that to somebody else? How about futures and options? Could I sell the right to access it in 6 months' time?

To some extent the EULA is all that's left of digital media. The actual bits get moved, converted and transformed more or less without friction; it's only the EULA that has any value. And it's only the physical EULAs that let me prove to the RIAA that I obtained all my MP3s honestly. But the EULA is just text, and so itself ends up as just bits as well. So what if we used digital money technologies to ensure that one and only one copy of the EULA is in the possession of one and only one person at any one time? Now I can trade this certificate separately from the MP3.

Which reminds me of the guy who did the proof by example of trading an iTMS track on eBay shortly after iTMS launched.




::HorsePigCow:: life uncommon: The Madenning Octet

The 8-fold path to Internet Happiness

1. Information wants to be free
2. Zero distance
3. Mass amateurisation
4. More is much more
5. True names
6. Viral behaviour
7. Everything is personal
8. Ubiquitous computing

The 8 barriers to progress

1. Copyright
2. Borders
3. Censorship
4. Network blocking
5. Identity cards and databases
6. More network blocking
7. Everything is trackable
8. No privacy

A restatement of the World Of Ends manifesto as another 8-fold path.

1. The Internet isn't complicated
2. The Internet isn't a thing. It's an agreement.
3. The Internet is stupid.
4. Adding value to the Internet lowers its value.
5. All the Internet's value grows on its edges.
6. The Internet has three virtues:
a. No one owns it
b. Everyone can use it
c. Anyone can improve it
7. So Money moves to the suburbs.
8. It's a world of ends, not the end of the world.




I've been listening to Lawrence Lessig giving an interview to Digital Village and part of it is about the current arguments around Google Print.

I have a slightly different take on this. Most of the arguments from both sides and from commentators have been about copyright law, focusing on fair use. I think this is a red herring. What's really happening here is horse trading in public between Google and the publishers. And the publishers' real worry is that Google is creating a new form of their product, which they really should be doing themselves. In a few years, Google will hold a digital copy of the publishers' product and the publishers won't. And at some time in the future after that, the publishers will want to buy a digital copy from Google rather than create one themselves. At which point it may be rather more expensive than they would like, because Google will have sole control.

One point in the copyright arguments that I find particularly interesting is that fair use of the written word is fairly well understood and has a long history. Provided you give attribution where you can and limit what you quote, it's held to be entirely reasonable to include a snippet of text from a copyrighted work in your own work and then do whatever you like with it, including selling it.

However, there appears to be no equivalent fair use for any other form of media and communication. Or at least that's what the major media companies would like you to believe, particularly where it relates to sound and vision. To make it completely clear: including a sample of music or video in your music or video is not allowed, while including a sample of text in your text is allowed. Now why is that, and is it right?

As we move to an increasingly multimedia world and away from a pure text world, this limits our freedom of expression and our ability to create new works. It also limits Google's (and others') ability to provide search metadata for audio and video in comparison with their ability with text. This is the core of Lessig's argument for a remix culture and against current copyright law.

Has anyone figured out bulk upload via FTP? I can't seem to make it work reliably.

What I want to do is a daily upload from a PHP script. The script runs, the upload appears to succeed and FTP returns no errors, but Google rarely seems to process the file. And there's precious little feedback about what's going on; when there is feedback, it arrives 2, 6 or 12 hours later.

1) Are you supposed to use the same file over and over again? Or does it need a new unique filename for each upload?

2) Do you have to do one manual upload and then ftp over the same filename later? Or if you do the manual upload does that prevent using ftp later?

3) After doing an FTP upload by hand, FileZilla, Firefox and IE usually show no entry on the server side. What? But if you use Linux ftp from the command line and do a dir, the new file is there with the correct timestamps. If you try to upload the file twice in quick succession, the second upload fails with a permissions error; try it an hour later and it goes through, as if Google were still processing the previous one. But the timestamps on the web bulk upload file display don't change.
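
For anyone trying to reproduce this, here's roughly the check I've been running from PHP. The hostname and credentials are placeholders, not the real details:

    <?php
    // List the upload directory and check the bulk file's timestamp.
    $conn = ftp_connect('uploads.google.com');   // hostname assumed
    ftp_login($conn, 'account', 'password');
    ftp_pasv($conn, true);                       // passive mode behind NAT
    print_r(ftp_rawlist($conn, '/'));            // raw directory listing
    $mtime = ftp_mdtm($conn, 'bulkfile.xml');    // returns -1 if not found
    echo ($mtime == -1) ? "no such file\n" : date('r', $mtime) . "\n";
    ftp_close($conn);
    ?>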

Agh! This is getting extremely frustrating, and Google support are not responding to trouble tickets. The only mailing list I can find doesn't seem to be very helpful either.

If anyone can help please email julian_bond at voidstar.com




Quantum mechanics appears to work. The equations predict things we can test, and these check out. But numerous anomalies and paradoxes appear if we try to scale them up to the real world. There are two classic problems.

1) Put a cat in a box with a radioactive pellet. If the pellet decays, release a poison capsule. After one half-life of the pellet, the cat has a 50% probability of being dead. The equations predict that the cat is both alive and dead until we open the box. Not either alive or dead, but both alive and dead.

2) Lots of atomic interactions generate a matched pair of particles with opposite values of some parameter, like the polarization of light photons. Before we measure one of them we don't know the state of either, but after measuring one we know the state of the other. Now let them move light years apart. When we measure one, we know the state of the other instantaneously. So something (information, perhaps) has travelled across the gap faster than the speed of light.

There are a bunch of major views on all this.

- The Copenhagen interpretation. The equations don't reflect reality; they reflect a reality we need to create in order to think about what's happening. They're nothing but maths, although they are useful maths.

- Bell's Theorem. Particles that have been in touch continue to influence each other. They can only do this if the communication employs no known form of energy, since it would otherwise violate Relativity. Which leads to:

- Everett-Wheeler-Graham. Everything that can happen does happen, but in another universe. When we open the box and discover a live cat, we choose which universe to live in, and it's the one with a live cat. Next door there is a universe where we found a dead cat.

- Hidden variable. There is an invisible hand below the quantum level that is manipulating reality to appear the way it does. Some people think this is consciousness.

- Non-objectivity. The universe has no reality aside from observation. If the tree falls in the wood and nobody hears it, there is no sound.

Now Reed's law suggests that the utility of large networks, particularly social networks, can scale exponentially with the size of the network. The reason is that the number of possible sub-groups of network participants is 2^N - N - 1, where N is the number of participants. This grows much more rapidly than either

* the number of participants, N, or
* the number of possible pair connections, N(N-1)/2 (which follows Metcalfe's law)
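
To see how fast the gap opens, here's a quick PHP loop over the two formulas:

    <?php
    // Compare Metcalfe (pairs) with Reed (sub-groups) for small networks.
    foreach (array(5, 10, 20, 30) as $n) {
        $metcalfe = $n * ($n - 1) / 2;    // N(N-1)/2 possible pairs
        $reed     = pow(2, $n) - $n - 1;  // 2^N - N - 1 possible sub-groups
        printf("N=%2d  Metcalfe=%6d  Reed=%.0f\n", $n, $metcalfe, $reed);
    }
    // By N=30, Reed's number is already over a thousand million.
    ?>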

But this requires that every entity in the network is in touch with every other entity, and with all possible combinations of entities, simultaneously. This violates several social relativity laws, such as the number of people any one person can "know" and track. And since this is all happening in time rather than as a snapshot, it assumes simultaneous communication across time and space at faster than light speed. So while the real value of large networks does appear to lie somewhere between Metcalfe and Reed, we can draw some parallels with the quantum physics paradoxes and theorems. :)

- Reed and Metcalfe are just mathematical formalisms. We don't actually know what they mean by value. But they seem to be handy when predicting the success of various networks.

- Two people who meet at a Social Networking meeting continue to influence each other across space and time (limited only by their ability to use email).

- Every possible combination of people at Social Networking meetings does in fact happen. Just not all in the same room.

- The network owners are actually manipulating all their members. They just don't all notice this.

- If you don't keep your eyes open at Social Networking meetings, you'll miss the details. If you're asleep in the corner and don't observe it, it's as if the meeting never happened.




I'm sure I've seen reports that Sony have only been selling their XCP Rootkit infected CDs in the USA.

So what's this then on the UK Amazon website?

Amazon.co.uk: XCP: Search Results Music

And why doesn't this page say anything about refunds in the UK?





I've been playing with Google Base and also tracking Amazon, because I've been trying to find a book equivalent of last.fm. I settled on librarything.com and was looking at the Amazon API, lists and recommendation system. What's interesting in all this is categorization systems for general SKUs. This also got raised in the Microformats group. The underlying problem is that a job, an event, a book, an iPod, a piece of music, and so on, all have different sets of attributes.

So back to Google Base. They've got their own set of top-level categories. Each category has a set of attributes that can be provided as part of an RSS or Atom upload using namespace extensions. (Why isn't Dave Winer all over this? Somebody should tell him.) You can then also provide arbitrary tags to use as pivots in Google Base searches. These are kind of like Flickr/del.icio.us tags, but the user interface is very different and there's no community feedback at all.

Now, inside Ecademy we've got our own copy of Craigslist (as does Tribes). Unlike Tribes and Craigslist, we use user-entered tags to navigate it, which saves me having to constantly maintain a category hierarchy. I'm working on uploading these listings automatically into Google Base. But if I'm going to do this really effectively, I have to duplicate Google's top-level categories and category attributes in my own system.

And so I finally get to the point. Google, Amazon and eBay, with their APIs and uploads, are forcing a kind of category imperialism on the rest of us. As much smaller developers, we have to match their category-attribute schemes in our systems or maintain lookup tables of the differences. And I really can't decide if this is a bad thing or a good thing. We do actually need standards for describing common SKUs, and the Microformats people could do worse than create an hGoogleJob format to match Google's idea of a job. But then you get into the whole issue that Google's descriptions aren't always very good; for instance, they currently only support USA addresses and telephone numbers (surprise!). And Google are pretty much impossible to have a conversation with about improving them. There's no standards process here at all: we get to use whatever Google decides, with no reference to us.

Much to ponder here.

Separately, Danny Ayers (I think) made a comment (which of course I can no longer find) about OPML, XOXO and RDF. He pointed out that a hierarchy is just a special case of a mesh, and that all real world problems are meshes. I'd take that further: we humans have a hard time understanding and grokking meshes. It's as though they're multi-dimensional when our brains want to work in 3D, and preferably in 2D; we work best manipulating symbols on 2D paper. Which is partly why we insist on reducing complex mesh problems to 2D hierarchies, and why I consider outliners to be harmful. Somewhere in there is a big thesis on why we seem to like hierarchical command structures that support an alpha male. But I digress! One of the things that makes tag-based folksonomies interesting is that they represent a way of navigating and understanding multi-dimensional meshes in a 2D user interface.




2005 Notebook Drive Roundup
I'm sure I'm going to be updating the drive sooner rather than later. You can never get enough storage, right? [from: del.icio.us]




Another quick thought about Google Base. It's good that they support RSS and Atom as transfer formats, but the Google extensions are category-specific. That means my system has to know about their categories, and store, retrieve and format the data specific to each one. This is going to be a form of category imperialism, where our systems will have to be designed around their category data elements.

And FTPing a bulk file is a pretty old-fashioned API. Why can't they just subscribe to my RSS?




I've been messing around this afternoon with taking entries out of Ecademy marketplace and posting them into Google Base.

I've taken the last 15 items in the marketplace, written some code to produce a custom RSS feed, and saved it locally. I then uploaded it using an FTP program, and Google accepted the file and is currently processing the entries. So I think I can do a daily update: build the file, save it, and then use ftp_put to upload it. There's probably a way of using ftp_fput to stream it straight from the code.
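
Roughly what the daily job looks like. build_marketplace_rss stands in for my feed-building code, and the hostname and credentials are placeholders:

    <?php
    // Build the feed, save it, and replace the bulk file on Google's server.
    $feed = build_marketplace_rss(15);            // hypothetical feed builder
    file_put_contents('/tmp/bulkfile.xml', $feed);

    $conn = ftp_connect('uploads.google.com');    // hostname assumed
    ftp_login($conn, 'account', 'password');
    ftp_pasv($conn, true);                        // passive mode behind NAT
    if (!ftp_put($conn, 'bulkfile.xml', '/tmp/bulkfile.xml', FTP_BINARY)) {
        echo "upload failed\n";
    }
    ftp_close($conn);
    ?>

ftp_fput takes an open file handle instead of a local filename, which would let me skip the temporary file.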

Some discoveries.

1) The lack of a global location element is annoying. I've emailed Google about this but not heard anything back; but then it is Thanksgiving. I think they ought to accept address strings that work in Google Maps, break them out into street, city, county/state and country, and/or provide a lat/long element. This all arises because the location element requires a full US address ("Should include street, city, state, postal code, and country, in that order.") and partial data is apparently not allowed. So "Anytown, CA, 12345, USA" works, but "SG12 7DB, UK" or "London, UK" doesn't.

2) They've added new categories Blogs and News Articles. This seems to suggest that they want pretty much any arbitrary data. That includes all the blogs, meetings etc from Ecademy. Lots of overlap here with Blogsearch and Search.

3) There's a missing category for "Listing". They've got "Wanted Ads" and a bunch of others, but no obvious way to say "it's a listing and I don't know what type". I can imagine that, as other people do this, we're all going to end up using their categories to describe our stuff in our own systems. I feel vaguely uncomfortable about this.
