Archive for April, 2016

What is colocation?

Posted by Adrien Tibi

Moving your existing servers out of your own premises and into a data centre improves reliability and security while granting access to sophisticated connectivity and hybrid cloud computing options.

IT and applications are increasingly moving to the cloud, but there are still a great many servers sitting in offices around the world. These servers can be a drain on their owners, who have to manage and maintain them, and they remain vulnerable to a wide range of business risks including fire, theft and connectivity outages.

When the investment in equipment has already been made and you want to continue using it, but you also want a more secure environment, better access to connectivity and higher levels of resilience, colocation is the solution.

What is colocation?

Colocation is the act of housing your physical servers in someone else’s data centre.

By colocating your servers within a managed data centre, you can obtain many of the benefits of the data centre environment and business model without the need to invest in new dedicated servers or migrate your applications to virtual ones.

Key benefits of colocating your servers include improved business continuity capability, enhanced connectivity options and access to hybrid cloud options.

Business continuity

Few businesses can affordably create an IT environment as safe and secure as that of a Tier 3 data centre – the tier of data centre that most reputable hosting providers operate.

A Tier 3 data centre offers, amongst other things:

- Redundant (N+1) power and cooling systems
- Uninterruptible power supplies backed by on-site generators
- Concurrent maintainability, so components can be serviced without taking systems offline
- 24/7 physical security and monitoring
- An expected availability of around 99.98%

All things that you probably cannot replicate on your own site, and which can, in the event of a system outage or service interruption, keep your servers online and available.


Connectivity

With increasing globalisation, more mobile workers and BYOD trends, servers connected to your building’s internet pipe are unlikely to meet the connectivity needs of your organisation.

In a colocation facility, your servers can be given state-of-the-art connectivity. This includes, but is not limited to:

- A choice of multiple carriers and ISPs
- Diverse, redundant fibre routes into the facility
- High-bandwidth, low-latency connections
- Direct, private links to public cloud platforms

Hybrid cloud

Data centres are, naturally, the place where public, private and bare metal clouds live. Colocating your own servers with providers of these other platforms will make it easier to integrate them into a hybrid cloud solution – both technically and financially.

By colocating within a data centre, your servers can easily be connected with a wide range of complementary technologies that address specific requirements for your business. These could include solutions like:

- Public, private and bare metal cloud platforms
- Managed backup and disaster recovery
- Managed security services, such as firewalls and DDoS protection

A solution for now, and the future

From your first steps into cloud computing through to a fully mature cloud strategy, colocation plays a valuable role at every stage of cloud adoption. For some services, running anything but your own hardware may not be an option, but by colocating you still gain access to all the benefits of the data centre environment.

Apache Spark vs Hadoop: What’s best for managing Big Data?

Posted by Adrien Tibi

Apache Spark and Hadoop are both frameworks of platforms, systems and tools used for real-time Big Data and BI analytics – but which one is best for managing your data?

According to Bernard Marr at Forbes, Spark has overtaken Hadoop as the most active open source Big Data project. While Hadoop has dominated the field since the late 2000s, Spark has more recently come to prominence as a big hitter.

However, a quick look at Google Trends shows us that while interest in Spark has been on the rise since around November 2013 it’s still completely dwarfed by Hadoop.

Google suggests that in March 2016 interest in Hadoop equalled its all-time peak, but Spark has only ever achieved around 44% of Hadoop’s peak interest level. Incidentally, Spark’s own March 2016 peak is only up 3% from its previous high point in June 2015, so growth in interest does seem to have slowed.

Ready to deploy on bare metal? Create your free account and start configuring your bare metal servers here.

So what is Spark, and how is it competing with the Hadoop elephant?

What is Apache Spark?

At its simplest, Apache Spark is a data processor. Like Hadoop, it is open source, and provides a range of connected tools for managing Big Data. It’s often considered a more advanced product than Hadoop, and is proving popular with companies that need to analyse and store vast quantities of data.

The Spark team are clear on who they view as their competition, suggesting that their engine can run programs “up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.” If that’s true, then why has interest in Hadoop continued to rise?
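To ground the comparison, the MapReduce model that Hadoop implements can be sketched in a few lines of plain Python. This is a toy single-machine illustration of the map, shuffle and reduce phases – not the Hadoop API itself, and the function names here are purely illustrative:

```python
from collections import defaultdict

# Toy word count in the MapReduce style: map emits (key, value) pairs,
# a shuffle groups them by key, and reduce aggregates each group.
# Hadoop runs these same phases across a cluster, writing intermediate
# results to disk between stages.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["Spark and Hadoop", "Hadoop MapReduce"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'spark': 1, 'and': 1, 'hadoop': 2, 'mapreduce': 1}
```

Spark expresses the same kind of computation as chained transformations on an in-memory distributed dataset, which is part of why it is often seen as the more developer-friendly of the two.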

Comparing Spark and Hadoop

The answer is that both products have their strengths and weaknesses, and in many cases their use is not mutually exclusive.

1) Performance

By processing data in-memory, Spark reduces latency to near zero, but it can be extremely demanding in terms of memory because it caches datasets in RAM between operations. This means that if it is running on top of Hadoop YARN or alongside other systems with high memory demands, it might be deprived of the resources it needs to perform efficiently.

By contrast, Hadoop MapReduce kills each process once a task is completed, which makes it leaner and more effective to run alongside other resource-demanding services. Spark is a classic only child: it works best in dedicated clusters, whilst Hadoop plays well with others.
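The difference can be caricatured in plain Python – a toy sketch, not the real Spark or MapReduce APIs. A Spark-style pipeline pays the cost of a transformation once and keeps the result cached in RAM, while a MapReduce-style pipeline recomputes from the source for every job and frees memory in between:

```python
# Toy illustration of in-memory caching vs recompute-per-job.
# 'expensive_transform' stands in for a costly distributed computation.

def expensive_transform(records):
    return [r * 2 for r in records]

source = list(range(1000))

# MapReduce style: every job recomputes from the source, then frees memory.
def mapreduce_query(predicate):
    return sum(1 for r in expensive_transform(source) if predicate(r))

# Spark style: transform once, keep the result in RAM (cf. rdd.cache()),
# then serve any number of queries from the cached copy.
cached = expensive_transform(source)

def spark_query(predicate):
    return sum(1 for r in cached if predicate(r))

print(mapreduce_query(lambda r: r > 1000))  # 499
print(spark_query(lambda r: r > 1000))      # 499, without recomputing
```

The cached copy is what makes Spark fast on repeated or iterative workloads – and also what makes it hungry for RAM.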

2) Costs

Although both software products are open-source and thus free to use, Spark requires a lot of RAM to run in-memory, and thus the individual systems required to run it cost more. However, this is balanced out by the fact that it requires far fewer machines to process large volumes of data, with one test successfully using it to sort 100 TB of data three times faster than Hadoop MapReduce on 10% of the machines.
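A quick back-of-the-envelope calculation on the figures quoted above shows why fewer, more expensive machines can still win on cost: finishing three times faster on 10% of the machines implies roughly 30x more data sorted per machine per unit of time.

```python
# Figures quoted above: the same 100 TB sorted 3x faster on 10% of the machines.
speedup = 3                 # times faster than the Hadoop MapReduce run
machine_fraction_pct = 10   # percentage of the machines used

per_machine_gain = speedup * 100 // machine_fraction_pct
print(per_machine_gain)  # 30 -> ~30x the per-machine throughput
```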

3) Ease of Use

Spark is generally regarded as easier to use than MapReduce, as it comes packaged with APIs for Java, Python and Spark SQL. This helps users to code in their most familiar languages, and Spark’s interactive mode can help developers and users get immediate feedback for queries.

4) Scalability

Both systems are scalable using the Java-based file system HDFS. Hadoop’s age means that it has been used for high profile large infrastructures: Yahoo has over 100,000 CPUs in over 40,000 servers running Hadoop, with 4500 nodes in its largest cluster. According to the Spark team the largest known cluster has 8000 nodes.

5) Security

Hadoop supports Kerberos authentication, though this can be difficult to manage. Spark lacks secure authentication of its own, but it benefits from Hadoop’s HDFS support for access control lists and file-level permissions.

Overall, Hadoop comes out on top for security, and Spark can inherit many of those strengths when run on Hadoop infrastructure.


The good news is that the two systems are compatible. Spark benefits from many of Hadoop’s strengths via HDFS, while adding the speed and ease of use that the older project lacks.

If you need to process huge quantities of data, and you can dedicate systems to process it, then Spark is likely to be better, easier to use and more cost-effective for your project. However, if you need scalability and for your solution to run alongside other resource-demanding services, Hadoop MapReduce will probably be a safer bet.

Build your bare metal cloud

Speak to an advisor for a completely free consultation, or create a free account and start configuring servers now.