electronic museum

Entries categorized as ‘api’

The Brooklyn Museum API – Q&A with Shelley Bernstein and Paul Beaudoin

April 16, 2009 · 4 Comments

The concept and importance of museum-based APIs are notions that I’ve written about consistently (boringly, probably) both on this blog and elsewhere on the web. Programmatic and open access to data is – IMO – absolutely key to ensuring the long-term success of online collections.

Many conversations have been going on about how to make APIs happen over the last couple of years, and I think we’re finally seeing these conversations move away from niche groups of enthusiastic developers (e.g. Mashed Museum) into a more mainstream debate which also involves budget holders and strategists. These conversations have been aided by metrics from social media sites like Twitter which indicate that API access figures sometimes outstrip “normal web” browsing by a factor of 10 or more.

On March 4th 2009, Brooklyn Museum announced the launch of their API, the latest in a series of developments around their online collection. Brooklyn occupies a space which generates a fair amount of awe in museum web circles: Shelley Bernstein and team are always several steps in front of the curve – innovating rapidly, encouraging a “just do it” attitude, and most importantly, engaging wholly with a totally committed tribe of users. Many other museums try to do social media. Brooklyn lives social media.

So, as they say – without further ado – here’s Shelley and Paul talking about what they did, how they did it, and why.

Q: First and foremost, could you please introduce yourselves – what your main roles and responsibilities are and how you fit within the museum.

Shelley Bernstein, Chief of Technology. I manage the department that runs the Museum’s helpdesk, Network Administration, Website, gallery technology, and social media.

Paul Beaudoin, Programmer. I push data around on the back-end and build website features and internal tools.

Q: Can you explain in as non-technical language as possible what exactly the Brooklyn API is, and what it lets people do?

SB: It’s basically a way outside programmers can query our Collections data and create their own applications using it.

Q: Why did you decide to build an API? What are the main things you hope to achieve …and what about those age old “social web” problems like authority, value and so-on?

SB: First, practical… in the past we’d been asked to be a part of larger projects where institutions were trying to aggregate data across many collections (like d*hub). At the time, we couldn’t justify allocating the time to provide data sets which would become stale as fast as we could turn over the data. By developing the API, we can create this one thing that will work for many people so it no longer becomes a project every time we are asked to take part.

Second, community… the developer community is not one we’d worked with before. We’d recently had exposure to the indicommons community at the Flickr Commons and had seen developers like David Wilkinson do some great things with our data there. It’s been a very positive experience and one we wanted to carry forward into our Collection, not just the materials we are posting to The Commons.

Third, community+practical… I think we needed to recognize that ideas about our data can come from anywhere, and encourage outside partnerships. We should recognize that programmers from outside the organization will have skills and ideas that we don’t have internally and encourage everyone to use them with our data if they want to. When they do, we want to make sure we get them the credit they deserve by pointing our visitors to their sites so they get some exposure for their efforts.

Q: How have you built it? (Both from a technical and a project perspective: what platform, backend systems, relationship to collections management / website; also how long has it taken, and how have you run the project?)

PB: The API sits on top of our existing “OpenCollection” code (no relation to namesake at http://www.collectiveaccess.org) which we developed about a year ago. OpenCollection is a set of PHP classes sitting on top of a MySQL database, which contains all of the object data that’s been approved for Web.

All that data originates in our internal collections management systems and digital asset systems. SSIS scripts run nightly to identify approved data and images and push them to our FreeBSD servers for processing. We have several internal workflow tools that also contribute assets like labels, press releases, videos, podcasts, and custom-cropped thumbnails. A series of BASH and PHP scripts merge the data from the various sources and generate new derivatives as required (ImageMagick). Once compiled, new collection database dumps and images are pushed out to the Web servers overnight. Everything is scheduled to run automatically so new data and images approved on Monday will be available in the wee hours Tuesday.
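For anyone wondering what “merge the data from the various sources” might look like, here’s a rough sketch of the idea. To be clear, this is not Brooklyn’s code (they describe BASH and PHP scripts); the field names (`object_id`, `approved`, `title`, `label`) and the rule that later sources only fill in fields the earlier ones didn’t supply are assumptions made purely for illustration.

```python
from typing import Dict, Iterable

Record = Dict[str, object]


def merge_approved(*sources: Iterable[Record]) -> Dict[str, Record]:
    """Merge records from several sources, keyed by a hypothetical object_id.

    Assumption: the first source is the collections management export and
    decides which objects are approved for the web. Later sources (e.g. label
    or media workflow tools) only add fields that are not already present.
    """
    source_list = list(sources)
    if not source_list:
        return {}
    merged: Dict[str, Record] = {}
    # Keep only objects flagged as approved in the primary source.
    for rec in source_list[0]:
        if rec.get("approved"):
            merged[str(rec["object_id"])] = dict(rec)
    # Enrich approved objects; secondary sources never introduce new objects.
    for source in source_list[1:]:
        for rec in source:
            key = str(rec["object_id"])
            if key in merged:
                for field, value in rec.items():
                    merged[key].setdefault(field, value)
    return merged


cms = [
    {"object_id": 1, "approved": True, "title": "Vase"},
    {"object_id": 2, "approved": False, "title": "Bowl"},
]
labels = [
    {"object_id": 1, "label": "Blue glazed vase"},
    {"object_id": 2, "label": "Not for the web"},
]
print(merge_approved(cms, labels))
```

Making the first source the gatekeeper means a label or podcast for an unapproved object can never leak onto the web, which seems to be the spirit of the “approved for Web” rule above.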

The API itself took about four weeks to build and document (documentation may have consumed the better part of that). But that seems like a misleading figure because so much of the API piggy-backs on our existing codebase. OpenCollection itself – and all of the data flow scripts that support it – took many months to build.

Cool diagrams. Every desk should have some.


Q: How did you go about communicating the benefits of an API to internal stakeholders?

SB: Ha, well we used your hoard.it website as an example of what can happen if we don’t! The general discussion centered around how we can work with the community and develop a way people can do this under our own terms, the alternative being that people are likely to do what they want anyway. We’d rather work with, than against. It also helped us immensely that an API had been released by DigitalNZ, so we had an example out there that we could follow.

Q: It’s obviously early days, but how much interest and take-up have you had? How much are you anticipating?

SB: We are not expecting a ton, but we’ve already seen a lot of creativity flowing which you can check out in our Application Gallery. We already know of a few things brewing that are really exciting. And Luke over at the Powerhouse is working on getting our data into d*hub already, so stay tuned.

Q: Can you give us some indication of the budget – at least ballpark, or as a % compared to your annual operating budget for the website?

SB: There was no budget specifically assigned to this project. We had an opening of time where we thought we could slot in the development and took it. Moving forward, we will make changes to the API and add features as time can be allocated, but it will often need to be secondary to other projects we need to accomplish.

Q: How are you dealing with rights issues?

SB: Anything that is under copyright is being delivered at a very small thumbnail size (100px wide on the longest side) for identification purposes only.
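The arithmetic behind a “100px on the longest side” rule is simple but easy to get wrong (scale by the longest side, not always the width). A small sketch, assuming images are never upscaled and the shorter side is rounded to the nearest pixel; neither detail is stated in the interview.

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, longest_side: int = 100) -> Tuple[int, int]:
    """Scale (width, height) so the longest side is at most longest_side pixels.

    Assumptions: aspect ratio is preserved, smaller images are left as they
    are, and the shorter side is rounded but never drops below 1 pixel.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= longest_side:
        return width, height
    scale = longest_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(thumbnail_size(3000, 2000))  # landscape: width is the longest side
print(thumbnail_size(800, 1200))   # portrait: height is the longest side
```

On the command line, ImageMagick’s `-resize 100x100>` geometry (the `>` meaning “only shrink larger images”) does roughly the same job.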

Q: What restrictions do you place on users when accessing, displaying and otherwise using your data?

SB: I’m not even going to attempt to summarize this one. Here’s the Terms of Service – everyone go get a good cup of coffee before settling down with it.

Q: You chose a particular approach (REST) to expose your collections. Could you talk a bit about the technical options you considered before coming to this solution, and why you preferred REST to these others?

PB: Actually it’s been pointed out that our API isn’t perfectly RESTful, so let me say first that, humbly, we consider our API REST-inspired at best. I’ve long been a fan of REST and tend to gravitate to it in principle. But when it comes down to it, development time and ease of use are the top concerns.

At the time the API was spec’ed we decided it was more important to build something that someone could jump right into than something meeting some aesthetic ideal. Of course those aren’t mutually exclusive goals if you have all the dev time in the world, but we don’t. So we thought about our users and looked to the APIs that seemed to be getting the most play (Flickr, DigiNZ, and many Google projects come to mind) and borrowed aspects we thought worked (api keys, mindful use of HTTP verbs, simple query parameters) and left out the things we thought were extraneous or personally inappropriate (complicated session management, multiple script gateways). The result is, I think, a lightweight API with very few rules and pretty accommodating responses. You don’t have to know what an XSD is to jump in.
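To give a flavour of what “api keys, mindful use of HTTP verbs, simple query parameters” means in practice, here’s a sketch of building a read-only GET request. The base URL, the `method` parameter and the method name `collection.search` are all hypothetical placeholders, not Brooklyn’s actual endpoint or method list; check their documentation for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, not Brooklyn's real API URL.
BASE_URL = "https://api.example.org/collection/"


def build_query_url(api_key: str, method: str, **params: str) -> str:
    """Build a GET URL carrying an API key, a method name and simple query parameters."""
    query = {"api_key": api_key, "method": method}
    query.update(params)
    # urlencode escapes values, e.g. spaces become '+'.
    return BASE_URL + "?" + urlencode(query)


print(build_query_url("demo-key", "collection.search", keyword="ancient egypt"))
```

Because reads are plain GETs with flat parameters, the same URL works in a browser, in curl, or in a few lines of any language, which is a big part of the “jump right in” appeal Paul describes.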

Q: What advice would you give to other museums / institutions wanting to follow the API path?

SB: You mean other than “do it” <insert grin here>? No, really, if it’s right for the institution and their goals, they should consider it. Look to the DigitalNZ project and read this interview with their team (we did and it inspired us). Try and not stress over making it perfect first time out, just try and see what it yields…then adjust as you go along. Obviously, the more institutions that can open their data in this way, the richer the applications can become.

_______

Many, many thanks to Shelley and Paul for putting in the time to answer my questions. You can follow the development of the Brooklyn Museum collections and API over on their blog, or by following @brooklynmuseum on Twitter. More importantly, go build something cool :-)

Categories: IT · api · collections · community · innovation · mashup · technology · web2.0

(Selling) content in a networked age

April 1, 2009 · 4 Comments

I’m just back from Torquay where I’d been asked to speak at the 32nd annual UKSG conference. I first came across UKSG more than a year ago when they asked me to speak at a London workshop they were hosting. Back then, I did a general overview of APIs from a non-technical perspective.

This time around, my presentation was about opening up access to content: the title “If you love your content, set it free?” builds on some previous themes I’ve talked and written about. Presenting on “setting content free” to a room of librarians and publishers is always likely to be difficult. Both groups are – either directly or indirectly – dependent on income from published works. I’m also neither publisher nor librarian, and although I spent some time working for Waterstone’s Online and know bits about the book trade, my knowledge is undoubtedly hopelessly out of date.

Actually, I had two very receptive crowds (thank you for coming if you were there!) and some really interesting debate around the whole notion of value, scarcity and network effects.

Like any sector, publishers and librarians have their own language, their own agendas and their own histories of successes and failures. Also like any sector, they are often challenged to spend time thinking about the bigger picture. Day jobs are about rights and DRM, OPAC and tenure. They aren’t (usually) about user experience, big-picture strategy or considering and comparing approaches from other sectors.

What I wanted to do with the presentation was to look at some of the big challenges which face (commercial) material in the networked world by thinking a bit more holistically about people’s relationship with that content, and the modes of use that they apply to the stuff that they acquire via this networked environment.

The – granted, rather challenging – title of the presentation is actually a question cunningly disguised as a statement. Or maybe it’s a statement cunningly disguised as a question. I lost track. The thing I was trying to do with this questatement (and some people missed this, more fool me for being too subtle) was to say: “Look, here’s how many people are talking about content now: they’re making it free and available; they’re encouraging re-use; they’re providing free and open APIs. They’re understanding that users are fickle, content-hungry and often unfussy about the origin of that content. What, exactly, do we do in an environment like this? What are the strategies that might serve us best? Can we still sell stuff, and if so, how?”

The wider proposition (that content fares rather better when it is freed on the network than when it is tethered and locked down) is a source of fairly passionate debate. I’ve written extensively about Paulo Coelho’s experiments in freeing his books, about APIs, about “copywrong”, about value, authority and authenticity. The suggestion that if you free it up you will see more cultural capital is starting to be established in museums and galleries. The suggestion that you might, just might, increase your financial capital by opening up is for the most part considered PREPOSTEROUS to publishers. Giving away PDFs increases book sales? Outrageous. Apart from the only example I’ve actually seen documented, of course, which is Coelho’s, and that seems to indicate a completely different story.

There are fine – and all the finer the closer you examine them – levels of detail. Yes, an academic market is vastly different from a popular one: you don’t have the scale of the crowd, the articles are used in different ways, the works are generally shorter, the audiences worlds apart. But nonetheless, Clay Shirky’s robust (if deeply depressing) angle on the future – sorry, lack of future – of the newspaper industry needs close examination in any content-rich sector. I don’t think anyone can deny the core proposition he holds up: that the problems that (newspaper) publishing solves (printing, marketing and distribution) are no longer problems in the networked age. I don’t think that what he’s saying is that we won’t have newspapers in the future, and he’s definitely not saying that we won’t need journalists. What he is saying – and this was the angle I focused on in my slides – is that this change is akin to living through a revolution. And this revolution demands revolutionary responses, along with an understanding that the change is far bigger and more profound than almost anyone can anticipate. The open API is one such response (The Guardian “Open Platform” being an apposite example). Free PDFs / paid books is another. Music streaming and the killing of DRM is another.

Revolutions are uncomfortable. The wholesale examination of an entire industry is horrifically uncomfortable. Just take a look at the music business and you’ll see a group of deeply unhappy executives sitting around the ashes of a big pile of CDs as they mourn the good ol’ times. But over there with music, new business models are also beginning to evolve and emerge from these ashes. Spotify is based on streaming, Last.fm is based on social, Seeqpod is a lightweight wrapper for Google searches, The Pirate Bay ignores everyone else and provides stuff for free.

Which ones are going to work? Which ones will make money? Which ones will work but displace the money-making somewhere else? The simple answer, of course, is that no-one really knows. Some models will thrive, others will fail. Some will pave a new direction for the industry, others we’ll laugh at in five years’ time.

So where can the answers be found? Predictably for me, I think all sectors (including academic publishing!) need to take a punt and do some lightweight experimentation. I think they need to be trying new models of access based around personalisation, attention data and identity. They need to examine who gets paid, how much and when. They need to be setting stuff free in an environment where they can measure – effectively – the impact of this freedom across a range of returns, from marketing to cultural to financial. If they do this then they’re at least going to have some solid intelligence to use when deciding which models to take ahead. And it may be that this particular industry isn’t as challenged as most people assume, and that the existing models can carry on – lock it down, slap on some DRM, charge for access. It’d be far less uncomfortable if this was the case. But at least that decision would be made with some solid knowledge backing it up.

Open Access is one clear way of forging this debate ahead. But once you get under the apparently simple hood of the OA proposition, it actually turns out that not only are many institutions simply ignoring guidelines to produce OA versions of published works but that the payment models are complicated and based on a historical backdrop which to many seems inherently broken. I’d be interested to hear from someone with way more knowledge than me on the successes and failures or market research done on setting content free in this way.

It was clear to me in talking to a range of people at UKSG – librarians, publishers, content providers – that there are huge swathes of evidence missing – surprising, perhaps, from sectors which pride themselves on accuracy and academic rigour. When I asked “how many people aren’t coming to your site because search engines can’t see your content?” or “what is your e-commerce drop-out rate?” or “how much of your stuff do you estimate is illegally pirated?”, very few had coherent – (or even vague) (or any!) – answers.

More telling, perhaps, is that the informal straw poll question I posed to various people during the conference: “Do you feel that this is a healthy industry?” was almost always answered with a negative response. And when I asked why, the near-consistent reply was: “It’s too complicated; too political; too entangled” or from one person: “the internet has killed us”.

I’m really not as naive as I sometimes appear :-) I know how terribly, terribly hard it is to unpick enormous, political and emotive histories. When I suggest that “we need to start again”, I’m obviously not suggesting that we can wipe the slate clean and redefine the entire value proposition across a multi-billion dollar, multi-faceted industry. But I think – simply – that awareness of the networked environment, a knowledge of how people really use the web today and an open mind that things might need to change in profound ways are very powerful starting points in what will clearly be an ongoing, fraught and fascinating discussion.

Categories: api · museum

Omeka – an online exhibits framework

March 17, 2008 · Leave a Comment

Tom Scheinfeldt contacted me through a comment on the Electronic Museum blog. He’s MD of the Center for History and New Media (CHNM) who among other things produce Zotero – a kind of semantic webby bookmarking toolbar.

CHNM have recently produced an open source application called Omeka (Swahili for “to display or lay out goods or wares”) – a product specifically pitched at museums or other cultural institutions wanting to put their collections and exhibits on the web.

To date the offerings in this space tend to follow one of two distinct and reasonably unsatisfactory flavours: either you choose an ‘out of the box’ templating and publishing system (albeit with the promise that you can “edit your own templates”) which comes with systems like MultiMimsy or TMS, or you choose to start from scratch and build the entire thing from nothing.

The former is generally pretty bad form for the user: most of these products are generic, badly designed and force museums to follow a prescribed path of development with little flexibility to change or choose their collections management system. The latter is complex and expensive, and although it carries with it huge amounts of flexibility, it also has the burden of any bespoke system.

Tom and his team noticed, over the course of several years working with the museum sector, that:

We found ourselves building more or less the same website over and over again, or at least the feature set

They also noted that although there were tools for curators, there weren’t any for educators or webmasters: the ‘front of house’ people who wanted to create online exhibitions. They decided that they would build some of these common approaches into a framework application for delivering narrative exhibitions online.

Omeka is an open source application which you download and install on your LAMP web environment. It draws content in real time (i.e. it isn’t a “tick and publish” system like many of the others in this space). At the moment you export your data from your collections management system and import it into Omeka for delivery to the web, but Tom was quick to point out that this is “just an intermediary step” and that they’re working on a database abstraction layer which will allow for “live sync” with existing collections management systems. This is great news, and absolutely the direction that needs to be taken more in our sector.
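Why does a database abstraction layer matter so much? Because the exhibit code only has to know about one interface, and you can swap “export and import” for “live sync” underneath it. Here’s a very rough sketch of that idea. Omeka itself is PHP and I don’t know how their layer is actually designed, so every class and method name below is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Record = Dict[str, str]


class CollectionSource(ABC):
    """Hypothetical interface that exhibit pages read objects through."""

    @abstractmethod
    def get_object(self, object_id: str) -> Record:
        ...


class ImportedCopy(CollectionSource):
    """Today's model: records exported from the CMS and imported locally."""

    def __init__(self, records: List[Record]) -> None:
        self._records = {r["id"]: r for r in records}

    def get_object(self, object_id: str) -> Record:
        return self._records[object_id]


class LiveSync(CollectionSource):
    """'Live sync' model: each lookup is delegated to the CMS itself."""

    def __init__(self, cms_lookup: Callable[[str], Record]) -> None:
        self._lookup = cms_lookup  # any callable mapping object_id -> record

    def get_object(self, object_id: str) -> Record:
        return self._lookup(object_id)


def render_title(source: CollectionSource, object_id: str) -> str:
    # Exhibit code depends only on the interface, not on where the data lives.
    return source.get_object(object_id)["title"]
```

Swapping `ImportedCopy` for `LiveSync` changes nothing in `render_title`, which is exactly the kind of flexibility the closed systems above don’t give you.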

Tom and his team used the metaphor of a blog to guide their thinking on development. They:

“…thought it should be as easy for museums to publish online exhibitions as it is for individuals to start a blog…and in many ways WordPress has been our model…”

They have a drag and drop exhibit builder, a strong API and also a plugin architecture which allows users to add their own functionality. All of this is very positive news given the approaches taken to date with the systems I’ve mentioned above – very clunky, very web1 and with bad UIs for both users and administrators.
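If “plugin architecture” sounds abstract, the core idea in blog-style systems is usually a set of named hooks that plugins attach callbacks to. Below is a minimal sketch of filter-style hooks. It’s illustrative only: Omeka’s real plugin API is PHP and will differ, and the hook name `item_title` is made up.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class HookRegistry:
    """Minimal filter-style hook registry: plugins attach callbacks to named hooks."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[[str], str]]] = defaultdict(list)

    def add(self, name: str, callback: Callable[[str], str]) -> None:
        self._hooks[name].append(callback)

    def apply(self, name: str, value: str) -> str:
        # Each callback receives the previous callback's output, in registration order.
        for callback in self._hooks[name]:
            value = callback(value)
        return value


hooks = HookRegistry()
hooks.add("item_title", str.upper)                   # "plugin" one
hooks.add("item_title", lambda t: t + " (on view)")  # "plugin" two
print(hooks.apply("item_title", "Vase"))
```

The core application just calls `apply` at the right moments; everything else is somebody else’s plugin, which is how WordPress grew such a big ecosystem.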

I’m in the middle of installing Omeka to do some “real world” testing but it certainly looks and sounds very positive to me. If anyone out there has experience using Omeka (or the other systems I’ve been rude about) then please comment away. Examples of institutions using Omeka can be found on their website.

Categories: api · collections · community · content · exhibition · gallery · museum · objects · software · web2.0

The progress of content

January 8, 2008 · Leave a Comment

I’m just helping Brian Kelly author a paper on Openness in Museums for the Museums and the Web conference later in the year. It just struck me that the movement of content around the web has followed / is following a pattern a little bit like this:

Phase I: content held as HTML within sites. Little or no interoperability. Content mostly viewed “on site”

Phase II: content held as XHTML within sites. Better markup means better SEO. Better SEO means that content starts to find its way out to the wider web

Phase III: content held as XHTML but also key bits of content (news in particular) syndicated out via RSS

Phase IV: content held as XHTML/XML; key segments syndicated via RSS (and some RDF) but additional movement of data via some “islands” of additional functionality such as API’s.

Phase V: content held as XHTML/XML, some/all syndicated via RSS, RDF, API’s but additional standards (oAuth, OpenSearch, Microformats etc) begin to ensure further interoperability between disparate sites
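For museums still at Phase II, the jump to Phase III can be surprisingly small. Here’s a sketch that turns a list of news items into a minimal RSS 2.0 feed (the spec requires `title`, `link` and `description` on the channel). The museum name and URLs are made-up examples.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rss(title: str, link: str, description: str, items: List[Dict[str, str]]) -> str:
    """Return a minimal RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        entry = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            if field in item:
                ET.SubElement(entry, field).text = item[field]
    # ElementTree escapes characters such as '&' and '<' for us.
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example Museum news",              # hypothetical museum
    "https://museum.example.org/",
    "Latest news from the museum",
    [{"title": "New acquisition", "link": "https://museum.example.org/news/1"}],
)
print(feed)
```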

It’s a bit of a brain dump and please feel free to take it apart in the comments, but I thought I’d share it with you :-)

I’d say most big commercial sites are firmly at Phase III but moving towards IV; museums are mostly at Phase II but moving (slowly!) towards Phase III…

Categories: api · content · mashup · web2.0

Facebook poll: flawed, but do you care?

November 25, 2007 · Leave a Comment

The long and frankly fairly boring (to those other than people like me, and probably you if you’re reading this…) debate continues about Facebook data – who owns it, who shares it, how it can be attributed, how open it is.

Techcrunch as always pile into the debate with a simple point and a simple question: do people actually care? According to the poll they’re running, the stats (so far) look like the graph on the left (click to take part in the poll or the comments).

The only (major!) problem with the poll of course is that the Techcrunch readership is going to be almost entirely geektypes, the very ones who do care about the issue. Does my wife, though? Or her mates? Or their mates? Nope, not really.

I just asked around a bit and in general the majority of people have the same response:

“Yes, of course I care….but I haven’t actually read the Terms of Service, or know what they are doing with my data…but I still care…”

I made a point in an earlier post about the transparency of data and also the extreme ugliness (both in appearance and techwise under the hood) of MySpace and the fact that these services still continue to be among the most popular in the world. It’s disconcerting to people like me, but I don’t think it should stop us banging on about the goodness of doing things right. I think, patronising though it is, that we “professional types” carry a sort of duty to make this stuff what we believe it should be – open, platform agnostic, accessible, etc etc etc. But let’s not:

1) Get depressed and give up if our users don’t appear to care: at the end of the day, why should they?

2) Get happy and give up because the community seems to be in agreement…

Categories: api · community · content · facebook · innovation · museum · myspace · social network · web2.0

Amazon announces SLA for S3

October 9, 2007 · Leave a Comment

One of the fears which cloud computing – or any hosted application – brings out in museum and other IT professionals is that your up-time becomes reliant on services over which you have no control. I’ve always argued that although this is a real fear, it’s infinitely more likely that the ropy single machine you’ve got holding your museum website up is going to fall over than an application hosted with Amazon, Google or Yahoo on an enormous server farm.

For those who feel this may be a bit of a fatuous response, a recent post on the Amazon Web Service Blog may provide some more reassurance. They’ve announced that as of October 1st 2007, the Amazon S3 Service Level Agreement is in effect. It guarantees 99.9% monthly uptime, with service credits being paid back against your account for any time below the three nines. It seems likely that EC2 will be next, but this is still a beta service so it’s hardly surprising that they’re not offering it right now…
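To put “three nines” in perspective, it’s worth doing the sum. This treats the target as a plain fraction of wall-clock minutes in a month, which is a simplifying assumption: the SLA itself defines exactly how uptime is measured, so read their terms for the detail.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a month can contain while still meeting an uptime target.

    Simplification: uptime is treated as a fraction of wall-clock minutes.
    """
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (100 - uptime_percent) / 100


print(round(allowed_downtime_minutes(99.9), 1))      # 30-day month
print(round(allowed_downtime_minutes(99.9, 31), 1))  # 31-day month
```

That works out at roughly 43 to 45 minutes a month. Compare that with how long your in-house server was down the last time a disk failed.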

So there you go. Another reason not to compute in the cloud disappears.

Categories: api · innovation · museum · technology · web2.0

Commoditisation of IT. And ducks.

October 8, 2007 · 2 Comments

I said on a previous post that I’d write more about Simon Wardley’s excellent presentation at the Future of Web Apps conference. He’s now put the presentation on Slideshare but warns (and he’s right) that it’s not an easy one to digest without the audio. Apparently FOWA are going to be publishing the sound for free sometime but there’s no sign of it right now.

Simon’s presentation focused on a number of things which also feature large in my personal tag cloud. Not ducks (although I like them, too) but:

1. Commoditisation of IT – how the movement from new thing to utility service creates tensions as products move from competitive advantage to the cost of doing business

2. Innovation – how the shift from Today’s Hot Stuff to Tomorrow’s Boredom (or, as Tom Standage puts it, the move towards invisible technology) drives, and is driven by, commoditisation

3. How the “new world” of the API and computing in the cloud becomes a utility service: how in this day and age we should be looking at the cloud for IT services and not building and re-building each time we put an application on the web. It’s a view which I’m pushing as hard as I can whenever I can, and it’s lovely to see such an erudite set of slides on a subject area which isn’t the easiest to explain.

It also turns out that Simon has written about the Internet of Things, Spimes and a bunch of other stuff which really tickle my interest, but these might have to wait until a later post…

As I said previously, what really set Simon’s presentation apart was that he can present in an amusing and interesting way, in sharp contrast to pretty much everyone else at the FOWA conference. His presentation style reminded me very much of Dick Hardt’s now famous Identity 2.0 talk which you should check out if you haven’t already.

Categories: api · conference · fowa · innovation · technology · web2.0

Google checkout for non-profits

September 30, 2007 · Leave a Comment

According to the Google Checkout Blog, the big G have just launched Google Checkout for Non-Profits – with no charges at all either per-transaction or percentage until at least the end of 2008.

So far, though, this looks like it’s just for US-based NFPs. I’ll have a poke about the infosuperhighweb and see if there’s anything about non-US sites.

Google Checkout is a pretty nice bit of functionality – easy to bolt on to your site, with a number of options (from “paste this HTML for a button” to “integrate your code with the API”) – also the back-end reporting is pretty strong.

NFPs will presumably jump at this one. It would be great to see it outside the US as well.

Categories: api · ecommerce · museum

Freebase is live

August 26, 2007 · Leave a Comment

Freebase has now opened its doors to anyone, at least for those who just want to browse and search. Looks like you’ll have to wait a while longer if you want to contribute. I’m still really interested in what Freebase brings to the party; how it compares with, and differs from, Wikipedia – but most of all what such an open API can do for those of us mashing up data from across the web. When I get time (in about 2028 at this rate…) I’ll have a long hard look at their API and try a few ideas…

Meanwhile, there are some lovely mashups already built – see for example CineSpin which is not only elegant and rather beautiful to look at but also extremely content rich, and (gasp) useful, too. There are more examples here.

Categories: api · community · folksonomy · mashup · web2.0