Exploring machine learning in the cloud with AWS

There has been a resurgence of interest in machine learning in recent years. The ability to uncover hidden patterns and derive insights from vast, complex datasets is exceptionally valuable in an age where sensors are omnipresent and storage is cheap. Machine learning’s current status as the technology ‘buzzword of the year’ does not diminish the enormous potential of this technology in applications ranging from healthcare to self-driving cars.

Despite this, there are still considerable complexities in the design and implementation of a machine learning solution. Computational power is one of the core challenges; training a machine learning algorithm involves massive numbers of floating point operations, which regular computers perform relatively slowly and inefficiently. Cloud-based machine learning solutions can address this by offering access to highly optimised hardware that can perform the calculations in a fraction of the time, and with a fraction of the energy, required by a traditional computer. AWS P3 instances are a great example of these specialised machines, providing up to one petaflop of floating point performance – meaning the machine can complete 1,000,000,000,000,000 floating point operations per second. This is around 1,000 times faster than the average consumer PC. Cloud solutions can also reduce the cost and complexity of implementing a machine learning solution by providing developers with a pre-provisioned and configured platform to work with.
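
To put the petaflop figure in perspective, here is a rough back-of-the-envelope calculation in Python. The workload size (10^18 operations) is a made-up number for illustration, and the consumer PC figure is simply the petaflop rate divided by the 1,000x ratio quoted above. Both rates are peak values, so real-world run times would be longer.

```python
# Rough arithmetic only: both throughput figures are peak values taken from the text.
PETAFLOP = 1e15                # floating point operations per second (P3 peak)
CONSUMER_PC = PETAFLOP / 1000  # implied by the "around 1,000 times faster" comparison


def seconds_to_complete(total_operations: float, flops: float) -> float:
    """Idealised run time, ignoring memory, I/O, and utilisation losses."""
    return total_operations / flops


# Hypothetical training job needing 10^18 floating point operations.
workload = 1e18

cloud_seconds = seconds_to_complete(workload, PETAFLOP)
pc_seconds = seconds_to_complete(workload, CONSUMER_PC)

print(f"P3 instance: {cloud_seconds / 60:.1f} minutes")
print(f"Consumer PC: {pc_seconds / 86400:.1f} days")
```

Under these assumptions, a job that takes minutes on the specialised hardware would take days on a typical desktop.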

AWS recently hosted a seminar in Melbourne providing an overview of their machine learning platforms and capabilities, which I attended to explore opportunities for integration with Umps Health’s analytics platform. The session provided an excellent overview of the AWS offerings, which are broadly divided into three categories:

  1. Application Services
    These are API-based services for specific language and visual applications. They are a simple and relatively low-cost way to perform common machine learning tasks like image recognition, natural language processing, and translation. The pricing is based on the number of API requests made.
  2. Platform Services
    The most important service in this category is SageMaker, a managed service which enables a machine learning solution to be built, trained, and deployed using a simple modular interface. SageMaker eliminates the complexities involved in platform implementation and enables users to focus on more important aspects – the data, algorithms, and insights. SageMaker’s pricing structure is based on usage time, billed per second.
  3. Frameworks and AMIs
    AWS also offers specialised machine learning Amazon Machine Images (AMIs). These are virtual machine images that run on optimised hardware and come preconfigured with a range of popular machine learning tools like Apache MXNet, TensorFlow, and PyTorch. This option gives developers the maximum amount of control over the machine, but requires a greater level of technical knowledge to get the best results. The pricing model is based on hours of virtual machine uptime.
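
The three pricing models described above (per request, per second, and per hour) can be compared with a small sketch like the one below. Every rate here is a hypothetical placeholder rather than an actual AWS price, and the rounding of partial hours is an assumption that may not match how AWS bills in practice.

```python
import math

# All rates below are hypothetical placeholders, not actual AWS prices.
PRICE_PER_1000_REQUESTS = 1.00  # application services: billed per API request
PRICE_PER_SECOND = 0.0004       # platform services: billed per second of usage
PRICE_PER_HOUR = 1.20           # AMIs: billed per hour of uptime


def application_service_cost(requests: int) -> float:
    """Cost scales with the number of API requests made."""
    return requests / 1000 * PRICE_PER_1000_REQUESTS


def platform_service_cost(usage_seconds: int) -> float:
    """Cost scales with usage time, metered per second."""
    return usage_seconds * PRICE_PER_SECOND


def ami_cost(uptime_seconds: int) -> float:
    """Cost scales with uptime. Assumption: partial hours round up."""
    return math.ceil(uptime_seconds / 3600) * PRICE_PER_HOUR


# Hypothetical workloads: 50,000 API calls, or a 90-minute training run.
print(f"Application service: ${application_service_cost(50_000):.2f}")
print(f"Platform service:    ${platform_service_cost(90 * 60):.2f}")
print(f"AMI:                 ${ami_cost(90 * 60):.2f}")
```

The point is the shape of each model rather than the totals: request-based pricing suits occasional, well-defined tasks, while time-based pricing rewards keeping training runs short and shutting machines down when idle.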

It’s important to note that, as of March 2018, SageMaker is only available in the US and EU regions. Users outside these regions (in Australia, for example) can still use SageMaker, but their data must be transferred to one of these regions for processing. This is a major problem for personal or sensitive data, which is generally subject to strict privacy laws and limitations on cross-border disclosures. The machine learning application services and AMIs are already available worldwide, so most users will be able to run analytics with these services in the region in which their data was collected.
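
A simple guard like the following could flag when using a region-limited service would mean moving data out of its home geography. This is only a sketch: the function name is hypothetical, the availability list reflects the March 2018 situation described above rather than a live lookup, and treating each region prefix as a single jurisdiction is a simplification (regions sharing a prefix can sit in different countries).

```python
# Assumption: region codes follow the AWS naming pattern, where the prefix
# identifies the geography (for example "ap-southeast-2" for Sydney).
# The list mirrors the availability described in the text, not a live lookup.
SAGEMAKER_REGION_PREFIXES = ("us-", "eu-")


def requires_cross_border_transfer(data_region: str) -> bool:
    """Return True if data held in data_region would have to leave its
    geography to be processed by the region-limited service."""
    return not data_region.startswith(SAGEMAKER_REGION_PREFIXES)


for region in ("us-east-1", "eu-west-1", "ap-southeast-2"):
    if requires_cross_border_transfer(region):
        print(f"{region}: data would leave its home geography, review privacy obligations")
    else:
        print(f"{region}: service available in the same geography")
```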

The seminar concluded with a demonstration of AWS DeepLens, a camera that analyses video using deep learning in real time and securely integrates with other AWS services. The accuracy of its object recognition algorithms was generally very good, and as a proof-of-concept and development tool DeepLens is impressive.

By eliminating the need for prohibitively expensive hardware and streamlining the implementation process, cloud-based machine learning solutions can significantly lower the cost and complexity of creating a powerful machine learning platform. However, as with any solution that handles personal data, privacy and security must remain a top priority.