If you’ve never heard of Elasticsearch, you can certainly be excused. Here at 4GoodHosting we’ve got some pretty smart cookies around, but as a whole we’re a Canadian web hosting provider who’ll never claim to be entirely full of digital wherewithal. Truth be told I hadn’t heard of it either until recently, and no one had ever suggested to me that I should give a second thought to whether or not I’d trust it as a base for searching online.
Right then, get right to the definition, you say. Elasticsearch is an open source search and analytics engine and data store developed by Elastic. The appeal of it has always been in the way it allows for searching through huge amounts of data within reasonable timeframes, and running calculations on the resulting data in the blink of an eye.
However, recent news indicates that there’s a potential downside to using Elasticsearch, and sharing what we know about that is going to be the subject of this week’s entry here.
Legit Associations
Elasticsearch has been all over the headlines – well, industry headlines at least – recently, and not in a good way. It seems like each new week brings along a new story about a breached Elasticsearch server resulting in troves of data being exposed. But why is this happening with Elasticsearch buckets as often as it has been, and is it legit to associate Elasticsearch with an ever-present risk of this happening?
The question then further becomes: can businesses leveraging this otherwise very helpful technology do so to the full extent while still avoiding data leaks?
Organizations have been using this platform en masse to store information in repositories (aka ‘buckets’), the contents of which can include emails, spreadsheets, social media posts, files and any and all manner of raw data in the form of text, numbers, or geospatial data.
The problem for Elastic is that it’s now beyond debate that this storage option can leave massive amounts of data unprotected and potentially exposed online. Convenient as the platform is, that kind of exposure can be disastrous, and it has resulted in a growing number of high-profile breaches involving well-known brands from a variety of industries.
Where There’s Smoke...
During 2020 alone, there have been a few doozies related to Elasticsearch. Cosmetics giant Avon had 19 million records leaked from an Elasticsearch database. Another misconfigured bucket, this one involving online genealogy service Family Tree Maker, left over 25GB of sensitive data exposed. Sports giant Decathlon also got bitten, with 123 million records leaked. Despite more than a few insistences from the people at Elastic, it’s clear that there’s a fundamental risk factor here and people should be made aware of it.
At Issue
Those who choose to use cloud-based databases must be aware of the inherent risks that come with that, and must perform the necessary due diligence to configure and secure every corner of the system. Shared research indicates this necessity is often being overlooked or just plain ignored, so we can say that the problem with Elasticsearch in part has to do with the shortcomings of some of those using it.
One contributing security researcher even measured how long it would take for hackers to locate, attack, and exploit an unprotected Elasticsearch server that was purposely left exposed online. That task was completed in eight hours. Not a short period of time, but also not too long, especially if there’s something significant in it for you as the one arranging the leak.
Cloud storage technology is going to continue to be eagerly adopted, and it’s safe to say by this point that nothing is going to curb that eagerness. While cloud technologies certainly have their benefits, improper use of them has very negative consequences. Failing or refusing to understand the security ramifications of this technology can have very serious fallout, and we’re seeing that now.
As it relates to Elasticsearch, just because a product is freely available and highly scalable doesn’t mean skipping the basic security recommendations and configurations is advisable. In fact, it’s not advisable at all. The problem is that some organizations are putting less of a priority on data privacy and security and more of one on profit as they aim to capitalize on the data gold rush.
Multiple Breach Methods
Is there only one attack vector for a server to be breached? Not really. In truth, there are a variety of different ways for the contents of a server to be leaked – a password being stolen, hackers infiltrating systems, or even the threat of an insider breaching from within the protected environment itself. The most common, however, occurs when a database is left online without any security (even lacking a password), leaving it open for anyone to access the data.
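To make that last scenario concrete, here is a minimal Python sketch of how you might check whether one of your own Elasticsearch servers answers without asking for credentials. It leans on two things we’re fairly sure of: an unsecured node replies to a plain request with HTTP 200, and a node with security enabled replies with 401. The function names and the example URL are ours and purely illustrative, and this is a rough check, not a security audit.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from an unauthenticated request to a rough verdict."""
    if status == 200:
        return "open"        # answered without asking for credentials
    if status in (401, 403):
        return "protected"   # credentials or permissions are required
    return "unknown"         # anything else needs a closer look


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET and classify the answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies arrive as exceptions; the status code is on the error
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Hypothetical usage against a server you own (9200 is the default HTTP port):
# print(probe("http://my-es-host.example:9200/"))
```

Only point something like this at servers you’re responsible for. If the answer comes back "open", anyone else on the internet who finds that address gets the same answer.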
A lot of what we’re seeing here, if we’re going to be plain about it, is attributable to a poor understanding of the Elasticsearch security features and of what is expected from organizations when protecting sensitive customer data. The assumption that data security is automatically the responsibility of the cloud service provider simply isn’t true.
More often than not, leaving security to the provider results in misconfigured or under-protected servers. Cloud security is – and should be – a shared responsibility between the organization’s security team and the cloud service provider.
What we can say is that the organization itself – in this case Elastic – owns the responsibility to perform the necessary due diligence to configure and secure every corner of the system properly to mitigate any potential risks.
To effectively avoid Elasticsearch (or similar) data breaches, a different mindset toward data security is required, one that allows data to be protected a) wherever it may exist, and b) no matter who may be managing it on the organization’s behalf. This is why a data-centric security model is more appropriate, as it allows a company to secure data and still use it, while it is protected, for analytics and data sharing on cloud-based resources.
Standard encryption-based security is one way to do this, but encryption methods can be a headache and the farthest thing from straightforward. Also, encryption that relies on weak algorithms or poorly managed keys can be cracked. Tokenization is the better choice, and that’s really what should be seen here if the product manufacturer is seriously interested in rectifying this situation.
Tokenization is a data-centric security method that replaces sensitive information with innocuous representational tokens. So even if the data falls into the wrong hands, no clear meaning can be derived from the tokens. Sensitive information remains protected, and the malicious intention types have no means of capitalizing on the breach, because the data they’ve helped themselves to can’t be deciphered.
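For the curious, here’s a bare-bones Python sketch of the idea. It assumes a simple in-memory "vault" that maps random tokens back to the original values; the class, function and field names are made up for illustration, and a real tokenization product would keep that vault in a separate, hardened system instead of a Python dictionary.

```python
import secrets


class TokenVault:
    """Toy in-memory vault mapping random tokens to the original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, carries no meaning
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only code with access to the vault can get the original back
        return self._token_to_value[token]


def tokenize_record(record, sensitive_fields, vault):
    """Return a copy of the record with the sensitive fields swapped for tokens."""
    return {
        key: vault.tokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


vault = TokenVault()
customer = {"email": "jane@example.com", "country": "CA"}
safe_copy = tokenize_record(customer, {"email"}, vault)
# safe_copy is what would be stored in the cloud; the vault stays elsewhere
```

If a bucket holding only `safe_copy` were left open, whoever found it would see a random token where the email address used to be, and the country field untouched.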
Don’t sour on cloud storage just yet, but if you’re putting sensitive data into the cloud and doing so on a large scale, then do be sure to do your homework and know exactly what can (and needs to) be done to minimize the risk of data leaks.