11/30/2021
I am currently pursuing a Master’s Degree in Cybersecurity at UC Berkeley, and as part of my final project for the Fall semester’s Cybersecurity in Context course, I put together a case study on the LinkedIn data breach of 2012.
The slides can be found here, but as the presentation could only be 15 minutes long, there was only so much I could include in that time frame.
What follows is a more detailed overview of the data breach, and why I think it’s an important case study to look at when studying cybersecurity.
The hack actually started with some brilliant social engineering. The hacker used LinkedIn to target LinkedIn employees with “DevOps” or “SRE” in their title.
This is quite clever on the hacker’s part: not only could he easily search for a target, but he could also see which technologies these people were adept at, giving him some insight into which technologies he might need to use in order to eventually hack into the desired system.
It really makes you wonder whether putting so much of our professional information on LinkedIn is such a good idea, as you could be arming potential attackers with information about a system they’re trying to break into.
The hacker found a LinkedIn employee who happened to also self-host his website. While there was nothing of interest on that site for the hacker, he also found a different site being hosted on the same server. A WordPress site.
Now, most people in tech know WordPress is terribly vulnerable to exploits. The hacker knew this too, and uploaded a malicious PHP script (essentially using SQL injection), which granted him admin access to the server where the site was being hosted.
Poking around the server, the hacker found it was actually a VM… running on an iMac. The employee’s personal iMac. I am not entirely clear on how the hacker was able to hop from the VM onto the iMac, but from there, the hacker found an RSA key, which granted him access to LinkedIn. Boom. He was in.
LinkedIn did not find out about this hack until about 3 months later, when a hacker posted on a message board asking for help cracking passwords. They were now in a race against time. Could they find the hacker before he cracked the passwords?
Spoiler alert: no. More on what LinkedIn could have done here later.
The tracing part is one of the aspects of this case that I find most interesting.
Once LinkedIn found out they had a breach, they got to work on trying to find where it came from. They eventually find logins from Russia from the RSA key of the employee mentioned earlier, who lived in California.
LinkedIn reaches out to the employee and asks if he had been to Russia lately. The employee says no. Okay, that’s one place to start. Looking at those requests from Russia, LinkedIn is able to get an IP address, and they also find something interesting: a custom user-agent.
If you’re not familiar with user-agents, they are strings sent in the User-Agent header of an HTTP request. Applications use this to self-identify. For example, a request made with Google Chrome carries a user-agent string that includes a “Chrome/” token followed by the browser version.
In this case, the hacker was using a custom user-agent called “Sputnik”. The combination of the IP address and the user-agent allowed LinkedIn to more easily trace the hacker’s activity through their systems.
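To make the user-agent idea concrete, here is a minimal Python sketch of how any client can set whatever user-agent it likes. The URL is a placeholder and the request is only built, never sent; this is an illustration, not a reconstruction of the hacker’s actual tooling.

```python
from urllib.request import Request

# Build (but do not send) a request with a custom User-Agent header.
# The URL is a placeholder, not a real target.
req = Request("https://example.com/", headers={"User-Agent": "Sputnik"})

# urllib normalizes header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
```

Because the client picks this string freely, an unusual value like this one stands out in server logs.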
Lo and behold, they find a login to their website from this IP address, and with this same user-agent. Remember, the hacker used LinkedIn to target LinkedIn employees! So LinkedIn was then able to get the user associated with this IP and user-agent.
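A rough sketch of that kind of correlation, assuming log records that carry an IP, a user-agent, and an account name. The record layout, system names, and account names are all made up for illustration (the IPs come from ranges reserved for documentation); LinkedIn’s real logs surely looked different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    source: str       # which system produced the record (hypothetical names)
    ip: str
    user_agent: str
    account: str


# Made-up records for illustration only.
LOGS = [
    LogEntry("internal-access", "203.0.113.7", "Sputnik", "employee-key"),
    LogEntry("website", "198.51.100.23", "Mozilla/5.0", "alice"),
    LogEntry("website", "203.0.113.7", "Sputnik", "throwaway-profile"),
    LogEntry("website", "203.0.113.7", "Mozilla/5.0", "bob"),
]


def matching_entries(logs, ip, user_agent):
    """Return entries whose IP and user-agent both match the fingerprint."""
    return [e for e in logs if e.ip == ip and e.user_agent == user_agent]


hits = matching_entries(LOGS, "203.0.113.7", "Sputnik")
accounts = sorted({e.account for e in hits})
print(accounts)
```

Matching on both fields at once is what narrows things down: the IP alone also matches an unrelated account in this toy data.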
The hacker was smart enough not to include any PII in that LinkedIn account, but the email address was a Gmail address. Now that LinkedIn knows the attack came from overseas, and armed with the IP address and suspected email address, LinkedIn works with the FBI on further tracing efforts.
Since Google is a US based company, the FBI is able to subpoena user data for the email address in question. They get a data dump which includes email messages and search terms the user searched for.
The search terms literally included “wordpress vulnerabilities”, which helps prove intent here (more on this later).
From the email messages, they find some accounts linked to this email that the FBI then uses to do further tracing. One of the accounts was an account to a US-based video game message forum. Once again, subpoena to the rescue here.
From the data dump they’re able to get from this video game message board, the FBI finds banking data, and another email address. Both the banking data and email address belong to Russian companies, so this is not something they can subpoena; but the email address (a mail.ru address) happened to have a Gmail equivalent!
Once again, the FBI subpoenas data from Google on this second email address, and this Gmail address eventually leads them to attribute the hack to one Yevgeniy Nikulin.
Nikulin is eventually apprehended while he’s vacationing in the Czech Republic. One of the ways in which he was found out was by “driving around in a flashy car and spending liberally”.
They had their guy, and the case went to trial.
Let’s switch gears for a bit and talk about what LinkedIn could have done better here.
First of all, they were using SHA-1 hashing for their passwords. I didn’t know much about hashing before I researched this case study, but now I know quite a bit.
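SHA-1 was never meant to be used for password protection, but was actually designed for things like message authentication and data validation (kind of like MD5, I assume, where you can use the hash as a checksum). Being fast to compute is great for checksums and terrible for passwords, since it lets an attacker try guesses very quickly.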
Not only that, but they were also not salting their passwords. I also didn’t know much about salting before this, but essentially, salting introduces some randomness to the hash, so that even if two strings are the same, they don’t generate the same hash.
This means that if you don’t salt your passwords, hackers only have to crack a password once before they can check their data dump for more instances of the same hash they just cracked. So if you’re not salting your passwords, you might as well just hand over your database records to the dark web now.
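Here is a small Python sketch of the difference. The unsalted SHA-1 function mirrors the weakness described above; the salted version uses PBKDF2 from the standard library, which is my own pick for illustration and not necessarily what LinkedIn moved to. The iteration count is an assumed, illustrative value.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor, an assumption; tune for your hardware


def unsalted_sha1(password: str) -> str:
    """Fast, unsalted hash: the same password always yields the same digest."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted, deliberately slow hash with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users who picked the same password.
print(unsalted_sha1("hunter2") == unsalted_sha1("hunter2"))  # identical digests

salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
print(hash_a == hash_b)  # different salts, so different digests
print(verify_password("hunter2", salt_a, hash_a))
```

With the unsalted version, cracking one digest cracks every account that shares it. With per-user salts, each account has to be attacked separately, and the slow key derivation makes each guess more expensive.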
Another thing that could have helped mitigate this issue would be to follow CIS Controls more closely.
Specifically, I keep coming back to two of them: secure configuration of hardware and software, and controlled use of admin privileges.
First of all, you can’t claim secure configurations of hardware and software if you’re allowing employees to use their personal devices (phones notwithstanding, but that’s a different story).
Secondly, controlled use of admin privileges is typically understood as companies restricting admin access, but I think it’s not just about restricting admin access; it’s also about closely monitoring who has it, and ensuring that employees with admin access do not abuse that privilege by logging in from unsanctioned devices.
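As a rough idea of what that monitoring could look like, here is a sketch that flags admin logins coming from devices that are not on a sanctioned list. The device IDs, usernames, and event fields are all hypothetical; a real setup would pull these from a device inventory and an authentication log.

```python
# Hypothetical inventory of company-managed device IDs.
SANCTIONED_DEVICES = {"corp-laptop-0042", "corp-laptop-0107"}

# Hypothetical login events; field names are assumptions for this sketch.
LOGIN_EVENTS = [
    {"user": "sre-1", "is_admin": True, "device_id": "corp-laptop-0042"},
    {"user": "sre-2", "is_admin": True, "device_id": "personal-imac"},
    {"user": "intern", "is_admin": False, "device_id": "personal-phone"},
]


def flag_unsanctioned_admin_logins(events, sanctioned):
    """Return admin logins that came from devices outside the sanctioned set."""
    return [e for e in events if e["is_admin"] and e["device_id"] not in sanctioned]


flagged = flag_unsanctioned_admin_logins(LOGIN_EVENTS, SANCTIONED_DEVICES)
print([e["user"] for e in flagged])
```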
The impact of this breach was felt in waves, because we just kept uncovering data.
First, it was thought only 6.5 million accounts had been affected in 2012. Four years later, in 2016, Motherboard reported that LinkedIn data from 160+ million accounts was up for sale on the dark web.
LinkedIn took a look at the data and was able to verify that this was not new data, but data from the 2012 breach.
Here’s where I may get controversial.
In 2012, LinkedIn only forced users who had been affected to reset their passwords.
Resetting their password would actually get users on their “new” system, which was using stronger hashing with salt.
If they had forced their entire user base to reset their passwords, the rest of the data that, unbeknownst to them, was still out there, would have been rendered obsolete.
Say LinkedIn did force their entire user base to reset their passwords in 2012, and then, in 2016, someone puts up the 160+ million accounts for sale. You can simply say, well, all those passwords have been invalidated now, so, who cares LOL.
But that’s not what they did. Moreover, only users who reset their passwords benefitted from their new security measures, meaning users who had not reset their passwords were still on their previous crappy hashing (with no salt!).
So, in my opinion, LinkedIn had many incentives to force all their users to reset their passwords, but they just didn’t do it.
Other companies were affected by the LinkedIn breach as well. Dropbox and Formspring were hacked from… the hacker looking up their employees’ passwords in the data he stole from LinkedIn and reusing them on Dropbox and Formspring respectively.
What this tells me is that the hacker did not target LinkedIn just to target LinkedIn, but he did this knowing that it would be much easier to target employees from other companies he found interesting if he also had their passwords.
Lastly, President Trump’s Twitter account was hacked because his password was part of that 2016 data dump, and his team just so happened to use the same password for both LinkedIn and Twitter.
So the ramifications of a hack like this can be huge, especially when you take into consideration who is on LinkedIn (heads of state, employees from other companies that can be easily indexed).
LinkedIn was sued in a class action lawsuit, which they settled for about 1.25 million USD. The most you could get from that was about $50, if you had been affected.
Nikulin was indicted in 2016 and extradited to the US in 2018. He spent time in jail awaiting trial, which was slated to start in 2020, but was put on hold due to the pandemic. Eventually, the case wrapped up in September 2020, and he was sentenced to 88 months in prison (with credit for time served), and 1.7 million USD in restitution.
The employee who had their RSA key stolen is still with LinkedIn, and in fact, seems to be quite high up, working as a Principal Staff Engineer.
Personally, I think it’s great that he wasn’t fired for this, nor was his career ruined. I’m a big believer that you learn by fucking up, and I’m sure he, as well as his team, probably walked away with a ton of learnings.
LinkedIn was, once again, in the news for an alleged data breach. This time, it was 700+ million accounts thought to have been compromised.
Looking at the data that was for sale, it looked like no passwords were part of this breach. LinkedIn determined that it was not a breach, but simply data scraping, which they believe is against their terms of service (but has been hard for them to enforce legally, see hiQ Labs v. LinkedIn).
The hacker actually confirmed to a seller that he had acquired the data from the LinkedIn API.
I would not consider this a hack, though I honestly admire the feat and sheer force of will of whoever sat there and (ab)used the LinkedIn API to collect all this data.
That said, what type of API allows a user to make this many API calls? I would reserve this amount of volume for enterprise level customers, not just any developer with an API key; but we don’t know much about how this data was gathered.
Personally, I would have triggers in place that would bubble up this type of activity as suspicious, so teams could then take a closer look and ensure no one is doing anything malicious with the data that can be gathered using a public API like this.
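Something like the following sliding-window counter is what I have in mind. It is only a sketch: the class name, thresholds, and API key are made up, and I have no idea how LinkedIn actually monitors its API.

```python
from collections import defaultdict, deque


class VolumeMonitor:
    """Flag API keys that exceed max_calls within a sliding time window."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # api_key -> timestamps of recent calls

    def record(self, api_key: str, timestamp: float) -> bool:
        """Record one call; return True if the key is now over the threshold."""
        calls = self._calls[api_key]
        calls.append(timestamp)
        # Drop calls that have fallen out of the window.
        while calls and calls[0] <= timestamp - self.window_seconds:
            calls.popleft()
        return len(calls) > self.max_calls


# Toy thresholds: more than 3 calls in 60 seconds counts as suspicious.
monitor = VolumeMonitor(max_calls=3, window_seconds=60.0)
flags = [monitor.record("key-123", t) for t in (0.0, 10.0, 20.0, 30.0, 100.0)]
print(flags)
```

A flag here would not block anything by itself; it would just bubble the key up for a human to review, which is the kind of trigger described above.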
Is 88 months too little or too much? Personally, I think it’s not enough of a deterrent, but it’s about as high a sentence as you could get for charges such as these.
https://blog.linkedin.com/2012/06/06/linkedin-member-passwords-compromised
https://www.zdnet.com/article/linkedin-will-pay-1-25-million-to-settle-suit-over-password-breach/
https://www.bankinfosecurity.com/linkedin-a-7229
https://www.justice.gov/opa/press-release/file/904516/download
https://www.theregister.com/2020/09/30/linkedin_hacker_prison/
https://news.linkedin.com/2021/april/an-update-from-linkedin
http://securitynirvana.blogspot.com/2012/06/final-word-on-linkedin-leak.html
https://darknetdiaries.com/episode/86/
https://www.justice.gov/usao-ndca/pr/russian-man-found-guilty-hacking-three-bay-area-tech-companies