How Bloomberg bet on an open source project to re-rank search results.
Earlier this year, Bloomberg reached a milestone in open source development with the incorporation of the Learning-to-Rank plug-in into Apache Solr 6.4.0. The release of the plug-in was the culmination of a year’s worth of close collaboration between two groups of Bloomberg software engineers in New York and London and the open source project’s community to make it easier to re-rank search results using machine learning.
In an exclusive Q&A with Computer Business Review’s James Nunns, Bloomberg software engineers and project collaborators Diego Ceccarelli, Michael Nilsson and Christine Poerschke (who also served as the Apache Lucene/Solr committer in this process) shared insights about their experience, the challenges they faced and the lessons they learned.
JN: Tell me a little about the project and what you hoped to achieve from it? How did the idea come about? What kind of work did you have to put into it and how much planning went into it?
Diego Ceccarelli: “Our project was intended to add Learning-to-Rank (LTR) functionality to open source enterprise search platform Apache Solr in order to improve both Federated Search and News Search on the Bloomberg Terminal. LTR is a technique for improving the relevance and performance of search that was proposed in academia more than 10 years ago. Today, several major commercial search engines use this technique but, although there is some software written to extend it on the web, we realized it didn’t exist inside Solr, which we use to power search across a number of Terminal functions.”
Michael Nilsson: “As we began to dig more, we discovered that other teams within Bloomberg Engineering were also looking for a better re-ranking framework for search results. Teams in New York and London had slightly different needs but started working together to develop a small prototype for Solr that used machine learning models to create an infrastructure that could be customized. From start to finish, this ‘side project’ took us about a year-and-a-half to complete and ship.”
JN: What’s your history of working with open source tools?
Christine Poerschke: “I first started working with open source software in 2012, around the time of the London Olympics, as part of my team’s effort to migrate Bloomberg’s news search backend from a proprietary third-party solution to Apache Solr, an open source search platform.
“We started out simply using Solr, then found some bugs and contributed back fixes. Over time, we got even more involved. For example, I was named to the Apache Lucene Project Management Committee (PMC) earlier this year.”
Diego: “I did a Ph.D. and postdoc on search before joining Bloomberg to work on relevance in the News Search team. I started using Solr back in 2010, so I had some experience with the platform. I like the idea of open source – the idea that people from different countries can work together on a project, access the code, see how it works and learn from each other.
“For years, Bloomberg has contributed to open source tools across many domains, including Solr. Some of the tools created by our teams have been released as open source projects, and recently one of those has been adopted by another project, so we’re feeding the virtuous cycle of open source software development.”
JN: Why have you chosen to work with open source tools? How much of your IT estate relies on this?
Michael: “Bloomberg prefers to use open source tools across its infrastructure, even when doing so requires us to enhance or extend those tools to meet our unique challenges. The ability to study and improve the software is fundamentally important, and rarely an option with proprietary software tools. This leads to our Engineering teams hiring from open source project communities, and training existing team members on open source tools and how to collaborate with their communities.
“While our infrastructure is a mixture of open source and proprietary (both commercial and home-grown) tools, each time we consider replacing an infrastructure component to address our growing needs, we investigate all possible open source options first.”
JN: Have open source projects thrown up any unique challenges?
Diego: “A major challenge we faced was how to model our problem so that it would be general enough to apply to many different use cases. The plug-in we developed sits somewhere between the search engine and machine learning and we wanted to make this accessible to either community. We wanted to make machine learning more easily accessible to search engineers, and search more accessible to machine learning engineers.”
Christine: “The plug-in had to be independent and mature enough to stand on its own. Only then would people other than the original developers be able to make changes and write further extensions to it in future.”
Michael: “We had two types of team involved – first, those who had the ranking mechanism already, but wanted to add it to Solr so that they could get additional information they didn’t previously have access to. The others were already using Solr but wanted to be able to add the LTR ability. We had to find a balance between the needs of these two teams.”
JN: What kind of feedback did you get from the boardroom while working on the project?
Diego: “We started with a clear shared objective of improving the relevance and performance of our search applications. We quickly came to a consensus that a Solr-based LTR and machine learning plug-in was the right solution and it made sense to collaborate across teams to realise it. Management was supportive from the outset.”
JN: What advice would you give to those embarking on an open source project?
Diego: “You don’t need to dive straight into coding to get involved in open source. I would recommend getting your feet wet by getting involved in the community slowly. After you take that initial step of replying to a thread or helping identify a bug, the rest comes naturally.”
Christine: “Open source is a community, not just free code: the ideal is to give and take. As a good citizen, you use the software for free, while enabling and empowering your employees to participate and contribute back within the open source community.”