If a dedicated line can’t slurp all your data to the cloud, here’s what has to happen…
How do you start migrating a massive data store from your own data centres to the cloud? This is something Photobox Group’s CTO Richard Orme had to contemplate, and deal with in granular detail, over the hot summer months.
Despite the unprecedented heatwave, it ended up entailing a lot of snowballs.
Richard was tasked with moving Photobox’s nine petabytes of data from the data centres the company maintained to a new cloud provider.
Photobox is one of the largest image-hosting website platforms in the world. Their customers can upload their photos and then order personalised canvases, photobooks, prints and mugs.
Richard told Computer Business Review that the company typically “ingests anywhere between three and five million photos a day. At a peak that can be a million photos an hour. So it is really an intensive front end that we’ve got.”
“We have as a result of that about 6.5 billion photos that we store. That equates to about 9 petabytes of data.” To put that into context, one petabyte is roughly equivalent to streaming HD video continuously for 13 years.
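The 13-year comparison checks out as a back-of-envelope calculation if you assume a fairly high-bitrate HD stream; the article doesn’t state the bitrate, so the ~20 Mbps figure below is an assumption:

```python
# Sanity-checking the "one petabyte = 13 years of HD video" comparison.
# The ~20 Mbps HD bitrate is an assumption, not a figure from the article.
PETABYTE = 10**15                       # bytes (decimal, as storage vendors count)
bitrate_bps = 20 * 10**6                # assumed 20 Mbps HD stream

seconds = PETABYTE * 8 / bitrate_bps    # total stream duration in seconds
years = seconds / (365.25 * 86_400)     # convert to years
print(f"{years:.0f} years")             # → 13 years
```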
It is a significant amount of data to be maintaining on-premises and eventually Photobox “decided that we were having to spend more time than we were comfortable with maintaining and running those data centres,” Orme notes.
Photobox’s Cloud Migration Voyage
Richard Orme and his team took their time going over their options, meeting with the different market players, from IBM to Oracle, via Google and AWS.
“All of them slightly different, there was always more upside available to us than just the storage, either that was going to be advanced data and analytics capabilities or that was going to be advanced computer vision like the data we could extract out of the photos at the time of upload, or it was advanced machine learning that was offered,” Orme comments.
In the end they went with market leader Amazon, a company they had an existing working relationship with, since AWS was already hosting Photobox’s website.
Actually Moving 9 Petabytes
At that point came a new question: how on Earth do you actually move nine petabytes of data from your own data centres to Amazon’s?
One of the first options they looked at was copying the data over a dedicated line straight into Amazon’s infrastructure. However, they couldn’t achieve the bandwidth they required out of the data centre, so the idea was scrapped.
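It is easy to see why a line transfer is daunting at this scale. A rough calculation, using illustrative link speeds rather than Photobox’s actual figures, shows how long nine petabytes takes even at full, sustained saturation:

```python
# Back-of-envelope: how long would 9 PB take over a dedicated line?
# Link speeds below are illustrative assumptions, not Photobox's actual figures.
PETABYTE = 10**15                        # bytes
data_bytes = 9 * PETABYTE

for gbps in (1, 10, 40):
    bytes_per_sec = gbps * 10**9 / 8     # convert line rate to bytes per second
    days = data_bytes / bytes_per_sec / 86_400
    print(f"{gbps:>2} Gbps: {days:,.0f} days at full saturation")
# →  1 Gbps: 833 days
# → 10 Gbps: 83 days
# → 40 Gbps: 21 days
```

And those numbers assume the storage layer can feed the line at full rate around the clock, which, as the rest of the story shows, was itself a problem.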
Enter the Snowball, an Amazon device that holds up to 100 terabytes of data.
A petabyte is 1,000 terabytes, something Richard is acutely aware of as he noted: “We kinda did our maths and went, we are going to need a lot of those.”
“The alternative is a Snowmobile, which is a big truck that they come and park outside your data centre and plug it in, but they didn’t have it in Europe when we started this work, so it’s going to have to be snowballs.”
“So then we did a bunch of calculations, we were going to need to have about 90 of these things, and we are going to need to be filling four for transporting and draining four at any given time,” Richard told us.
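The core of that calculation is simple division: nine petabytes spread across 100-terabyte devices. A minimal sketch, using the figures quoted in the article:

```python
import math

# Reproducing the rough Snowball maths quoted in the article.
PB_IN_TB = 1_000
total_tb = 9 * PB_IN_TB            # 9 PB of photos
snowball_capacity_tb = 100         # per-device capacity quoted in the article

devices = math.ceil(total_tb / snowball_capacity_tb)
print(devices)                     # → 90 devices

# The team also kept a pipeline going: four devices filling on-premises
# while four were in transit or being drained on the AWS side.
```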
This involved an intense piece of logistical planning between Photobox, UPS and Amazon.
Before anything could happen, Richard’s team had to write custom software to take the images out of their stores and load them onto the Snowballs. Then, once Amazon had copied the data into its data centre, it had to be reconciled with Photobox’s records: is this the same photo? Does all the data match what was sent? Only once all this was confirmed could Photobox remove the image from their own stores.
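The article doesn’t describe Photobox’s software, but the reconciliation step it outlines (confirm the cloud copy matches before deleting the local one) typically boils down to comparing identifiers, sizes and content checksums. A minimal sketch, with entirely hypothetical record fields:

```python
import hashlib

def checksum(data: bytes) -> str:
    """Content hash used to compare the cloud copy against the local store."""
    return hashlib.sha256(data).hexdigest()

def safe_to_delete(local: dict, cloud: dict) -> bool:
    """Return True only when every field matches, mirroring the article's
    rule that the local copy is removed only after full reconciliation.
    The record layout here is hypothetical."""
    return (
        local["photo_id"] == cloud["photo_id"]
        and local["size_bytes"] == cloud["size_bytes"]
        and local["checksum"] == cloud["checksum"]
    )

photo = b"...jpeg bytes..."
local = {"photo_id": "p1", "size_bytes": len(photo), "checksum": checksum(photo)}
cloud = dict(local)                     # inventory record from the cloud side
print(safe_to_delete(local, cloud))     # → True: safe to remove on-prem copy
```

The key design point is that deletion is gated on the comparison, not on the upload succeeding: a transfer can complete and still leave a corrupt or mismatched object.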
“We had some stumbling blocks,” Richard admits.
“First of all, the storage software that we have, we were trying to pull files out of it at pace and it was really complaining. We were really worried, because what we thought was going to take a year now looked like it might take us three years.”
Yet he says the infrastructure and software engineering teams at Photobox Group really came through for the company. They focused on the issue and ended up with a system that pulled the data from the stores faster than they thought possible.
However, the issue now was that the Snowball devices couldn’t ingest the files fast enough: they would fill at a certain rate and then begin to slow down.
According to Richard Orme, AWS took the challenge on board: the company’s head of storage came to the UK and “a day later we were dealing with the engineering teams at four o’clock in the morning.”
“They offered all kinds of solutions at their cost to make this work for us. In the end we got to a place now where we are filling the snowballs faster than we ever thought we would, and Amazon are emptying them faster than they thought they would.”
“So actually as a result of this, their snowballs are better for the whole customer base.”
It was a winning result all round, and in the end Richard and his team completed the data migration a comfortable five months ahead of schedule.