Friday, February 5, 2010

Gene Wiki stats for January 2010

Whenever I lack for inspiration for a blog post, or whenever we need a pick-me-up in the form of evidence that our work is being used, I fire off an update of our usage metrics. Yesterday, I ran an update of the Gene Wiki stats.

For the month of January 2010:

  • The 9860 Gene Wiki pages were viewed 3,016,227 times.

  • 91% of Gene Wiki pages are in the top 8 Google results when searching by gene symbol (43% are the top hit).

  • A total of 949 human edits were performed by 346 unique editors.

  • An additional 115 edits were performed by bots.

  • Total text content grew by 121 kilobytes, approximately equal to one PLoS Biology article.

  • Total text content now stands at 69.73 megabytes, approximately equal to 575 PLoS Biology articles (slightly more than were actually published in PLoS Biology in 2008 and 2009).
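For anyone who wants to check the back-of-envelope math, here is a minimal sketch using only the figures quoted above. The one assumption is the unit convention (1 MB = 1000 KB); with that convention the article-equivalent count lands close to the 575 quoted in the list.

```python
# Figures quoted in the list above for January 2010.
total_text_mb = 69.73      # total Gene Wiki text content, in megabytes
article_kb = 121           # rough size of one PLoS Biology article, per the list
page_views = 3_016_227
page_count = 9860

# Assumption: 1 MB = 1000 KB (the unit convention is not stated above).
articles_equivalent = total_text_mb * 1000 / article_kb
views_per_page = page_views / page_count

print(f"{articles_equivalent:.0f} article equivalents")
print(f"{views_per_page:.0f} views per page on average")
```

The per-page average is just total views divided by page count; it says nothing about how unevenly those views are spread across genes.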


I've uploaded the full statistics if anyone would like to see.

That completes today's navel gazing session....

Monday, February 1, 2010

BioGPS is going social

The blog is getting a bit of a slow start to 2010 here at Chez BioGPS, but our developers certainly aren't. They've been hard at work on some great new features, and we'll have a couple announcements here on the blog soon.

But I thought a great way to kick off the new year would be to introduce one of our primary initiatives for the coming year. Yes, we're jumping on the bandwagon -- BioGPS is going social.

The value of creating social networks has been demonstrated by numerous non-scientific applications. Facebook, MySpace, LinkedIn, Twitter, Friendfeed, etc. are all great examples of sites that provide value to users through their social networks. Clearly our goal is not to duplicate any of these existing social networks. We also don't want to just rebrand the same ideas targeted at a different audience. So how will BioGPS be different from a generic "Facebook for scientists"?

We believe that the value of social networks doesn't lie in the number of users you have, and it doesn't lie simply in the network of connections between users. Rather, the value of all social networking applications lies primarily in the objects that users contribute to the network. These objects are the currency of communication and interaction.

In a social network meant for socializing, these primary objects of currency are personal things -- pictures, events, and stories. In a social network for professional networking, the objects of currency are work history, business interests, and employment opportunities.

So what's the currency in a social network for biologists? In BioGPS, it's all about the genes. Specifically, BioGPS users will be able to manage all the gene-related objects that are relevant to them -- gene lists, plugins, gene report layouts, and data sets. We already have lots of users coming to BioGPS to learn about genes, and now we're building the social networking infrastructure to enable users to share these objects with others.

Importantly, a social network centered on genes facilitates biological serendipity. When you create a new BioGPS gene list, wouldn't it be great to see that your colleague Jane has another list with a significant overlap to yours? When you search for resources on SNPs, maybe John just registered a plugin for a SNP annotation site he's developed. When you search BioGPS for "diabetes", wouldn't it be useful to see all the relevant genes, gene lists, plugins, layouts and data sets that are shared in your network?
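To make the gene list example concrete, here is a minimal sketch of how overlapping lists might be surfaced. It is an illustration only, not how BioGPS does or will do it: the Jaccard index is one common overlap measure we picked for the example, and the list names, gene lists, and the `min_score` threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_overlaps(query, shared_lists, min_score=0.2):
    """Return (list name, score) pairs at or above min_score, best first."""
    scored = [(name, jaccard(query, genes)) for name, genes in shared_lists.items()]
    kept = [pair for pair in scored if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: your new list and two lists shared in your network.
my_list = ["INS", "GCK", "HNF1A", "PPARG", "TCF7L2"]
shared_lists = {
    "jane_diabetes": ["INS", "GCK", "TCF7L2", "KCNJ11"],
    "john_kinases": ["CDK2", "MAPK1", "AKT1"],
}

matches = rank_overlaps(my_list, shared_lists)
print(matches)
```

A real implementation would probably want a statistical test that accounts for list size rather than a fixed threshold, but the idea is the same: compare every new list against the lists already shared in your network.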

We think that these social networking possibilities are exciting, and we're looking forward to rolling out many of these features in the coming year.

Thursday, December 10, 2009

BioGPS is now an iPhone App!


Just in time for the holidays, we've got some exciting news for you. BioGPS is no longer constrained to full-size desktop or laptop computers. Now you can fit the world's most powerful gene annotation portal in your pocket! Introducing the BioGPS App for iPhone® and iPod touch®, available for Free on the iTunes App Store.

We collaborated with the team at In Situ Mobile Designs, LLC to bring the knowledge of the genome to you, wherever you are. Never be stuck again, needing to trek from your lab bench back to your desk, only to look up something simple.

This is our first version, so for now you can only use the standard set of layouts. We've got a lot of improvements planned for the app. Give us your feedback and let us know what you'd like to see most. Enjoy!

[Ed. note: If you like the iPhone app, be sure to rate it in the App Store. Unless we get substantial user feedback (a minimum of, say, 100 ratings), it's unlikely we'll spend more development effort on it... Of course, five-star ratings are preferred...]

Friday, December 4, 2009

Raw data download

One of the primary reasons scientists come to BioGPS is to view our reference gene expression data sets in a simple bar chart form. And for the bioinformaticians, we've always provided the data sets for download on our BioGPS downloads page. Now, we have one more mechanism for users to access the raw data.



Using the new "Downloads" tab in the gene expression chart plugin, users can now download data on a gene-by-gene basis. This feature is useful for people who just want to plot data from one gene for a publication, and for cases when dominant expression in one tissue obscures variation in other tissues in the bar chart.

We hope you find this feature useful. As always, feedback welcome...

Monday, November 30, 2009

BioGPS profile in BioTechniques

Prompted by the recent publication of the BioGPS paper in Genome Biology, the folks at BioTechniques recently wrote a profile of BioGPS. It's a very nice overview of BioGPS and its goals.

For those who are interested in more of the gory details, I'm pasting the entire email "interview" below. Enjoy...


Why did the research team decide to create BioGPS? Was it in support of any research or specific needs of Novartis?

BioGPS grew out of a need facing many organizations, both academic and commercial, who are doing genome-scale science. Say you do some sort of profiling experiment (microarray, next gen sequencing, etc.) and find that there are 10 candidate genes highlighted in your study. How do you learn what's known about these genes? There are hundreds of public, gene-centric resources available, and many organizations also have internal databases with proprietary gene annotation information. The problem that BioGPS addresses is how to aggregate data from all these separate online databases into one application and interface.

Could you briefly describe what makes this technology different from other genomic organization platforms out there?

Development of BioGPS has focused on two relatively unique features. First, we emphasize the idea of "community extensibility". BioGPS utilizes a very simple plugin interface that allows most gene-centric external databases to be easily included in our plugin library. Currently we have over 250 plugins registered by over 40 unique users, and we allow any registered user to add new plugins. Second, we emphasize the idea of "user customizability". Most gene-centric databases tell the user what they think the user should know about their gene of interest. In contrast, BioGPS allows users to individually combine and arrange plugins into "layouts", enabling each user to define for themselves what content they find most useful.

Why should researchers switch to your technology?

We won't go so far as to say BioGPS is a replacement for any other tool that's out there. Rather, we view BioGPS as a new tool that is complementary to existing resources. But we definitely feel that the design principles behind BioGPS and the emphasis on "community intelligence" are unique among gene annotation databases.

Do you think in the future such databases will follow a model similar to yours? Why? What is inadequate about the other technology currently in use?

The real difference between BioGPS and other mechanisms of aggregating data is in the degree of data structure. For example, approaches like Semantic Web and Distributed Annotation System (DAS) are based on a very structured data exchange format -- every piece of data that's transmitted is "typed". The advantage is that consumers of those data can do very powerful analyses based on combining data from multiple sources. However, relatively speaking, it requires quite a bit of programming sophistication to become a DAS or Semantic Web data provider.

In contrast, BioGPS utilizes a completely unstructured data format based on HTML, the language of the web. The advantage is that it's very easy to become a data provider because it's very easy to create web pages. Again, as evidence, we have over 250 plugins registered by over 40 unique users and spanning over 100 unique domain names, and the developers behind those plugins range from the big annotation authorities (NCBI, Ensembl, UniProt, etc.) all the way down to the single postdoc or grad student developing a simple web site using perl or python. Although we're somewhat more limited in how we can integrate data from multiple sources, the BioGPS model focuses on attracting as large a developer community as possible to contribute to a single gene annotation platform.

Does Novartis have any other genetic-based technology in development related to organization or communication capabilities?

My group has also been involved in developing the Gene Wiki, the goal of which is to collaboratively annotate the function of all human genes. In short, BioGPS applies the principle of community intelligence to the development of a gene annotation database, while the Gene Wiki applies community intelligence to direct annotation of gene function.

How does BioGPS address community intelligence?

The defining characteristic of community intelligence is that each user benefits from the aggregated usage patterns of the entire community of users. Let's take a concrete example. Suppose you want to find all resources that have information on SNPs for your favorite gene. In the pre-BioGPS days, you would probably start with a web search and spend quite a bit of time clicking on links and browsing candidate web sites to see which had that information. Undoubtedly, that process has been repeated before and will be repeated again by other scientists elsewhere asking that same question.

Using BioGPS, you would instead simply search the plugin library for the keyword "SNP" to find all the relevant gene-centric resources that other BioGPS users have registered. Searching on a gene in BioGPS will take you directly to the relevant page for each plugin. Moreover, the list of SNP plugins is ranked by popularity, so you can see which specific SNP plugin is most used by other users in the BioGPS community.
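As a rough illustration of that keyword search and popularity ranking, here is a minimal sketch. It is not the actual BioGPS implementation: the `Plugin` record, its `usage_count` field as a popularity measure, and the three library entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plugin:
    name: str
    description: str
    usage_count: int  # hypothetical popularity measure


def search_plugins(plugins, keyword):
    """Case-insensitive keyword match on name or description, most used first."""
    kw = keyword.lower()
    hits = [p for p in plugins
            if kw in p.name.lower() or kw in p.description.lower()]
    return sorted(hits, key=lambda p: p.usage_count, reverse=True)


# Hypothetical plugin library.
library = [
    Plugin("Example SNP Browser", "Variation data per gene", 120),
    Plugin("Example Pathway Viewer", "Pathway diagrams", 300),
    Plugin("Example Variant Annotator", "SNP annotation from a small lab site", 45),
]

results = search_plugins(library, "snp")
for plugin in results:
    print(plugin.name, plugin.usage_count)
```

The ranking is where the community intelligence comes in: the order reflects what other users actually use, not what any one curator decided was best.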

What inspired you to name the technology BioGPS?

The meaning of the name has changed several times. Currently, we're telling people that BioGPS is a tool to "navigate the landscape of gene annotation resources". Even though it wasn't the original meaning, I think it's a pretty good analogy for why people use BioGPS.

Tuesday, November 24, 2009

The retirement of SymAtlas

Although SymAtlas has served us well over the years, it's finally time for its graceful retirement. We've been keeping it running on life support for quite some time now (with varying degrees of success).

We've tried to make the transition from SymAtlas to BioGPS a gradual one. In BioGPS's early stages, we had them running in parallel, and we purposely modeled the BioGPS gene report after the SymAtlas design. In August 2008, we posted a prominent link from SymAtlas to BioGPS, noting our plans for the future. In March 2009, we then actively started redirecting traffic from the SymAtlas home page to BioGPS, while leaving users a "backdoor" to access the old site. A couple months ago, we started redirecting "deep links" to specific SymAtlas pages to the corresponding BioGPS pages (again, leaving a "backdoor" for resistant users). At that time, we also forced users from the SymAtlas front page to go to BioGPS (a move that in retrospect I wish we'd delayed).

Starting shortly after the Thanksgiving holiday, this transition will be complete. We will force all traffic from SymAtlas to BioGPS with no exceptions. If you had a bookmarked link or were following a link to a specific SymAtlas page, we'll try to get you to the right page in BioGPS. If you've been reluctant to migrate to BioGPS, you should know that acceptance by the majority of our community has been phenomenal. Moreover, over 80% of users who were given the choice between continuing to BioGPS and going back to SymAtlas chose the "right option".

If there are still loyal SymAtlas holdouts, please let us know why you haven't migrated over to BioGPS. We'll do our best to satisfy your needs. But overall, we strongly believe that BioGPS offers a brighter future with more and better features.

Monday, November 23, 2009

Correlation search


Many of you know that the precursor to BioGPS was another web tool called SymAtlas. While SymAtlas was great, BioGPS was meant to expand on that design by focusing on community extensibility and user customizability. We are very proud that BioGPS handled all of the key SymAtlas use cases...

... with one notable exception, that is. BioGPS lacked the ability to search for genes by expression pattern, a commonly-used feature in SymAtlas. And our users let us know over and over and over and over again that we needed to add this feature before officially retiring SymAtlas.

We're happy to report that search by correlation is now an official feature in BioGPS. Follow the three steps shown at the right to see how it works. As always, let us know if you have any feedback on this new feature.
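For the curious, the general idea behind searching by expression pattern can be sketched in a few lines. This is an illustration under stated assumptions, not the BioGPS code: we assume Pearson correlation as the similarity measure, and the gene names, expression values, and the `cutoff` parameter are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as no match
    return cov / sqrt(var_x * var_y)


def correlated_genes(query, profiles, cutoff=0.9):
    """Return (gene, r) pairs with r at or above the cutoff, best match first."""
    reference = profiles[query]
    hits = [(gene, pearson(reference, values))
            for gene, values in profiles.items() if gene != query]
    kept = [hit for hit in hits if hit[1] >= cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical expression values across four tissues.
profiles = {
    "GENE_A": [10, 200, 15, 12],
    "GENE_B": [20, 410, 25, 30],    # same tissue-specific peak as GENE_A
    "GENE_C": [300, 5, 280, 310],   # opposite pattern
}

matches = correlated_genes("GENE_A", profiles)
print(matches)
```

Here only GENE_B comes back, since it peaks in the same tissue as the query gene, while GENE_C has the opposite pattern and falls well below the cutoff.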