In February 2021, Scaleway introduced a new product: the Mac mini M1 from Apple. Later, I was put in charge of the massive restocking that we needed.
This project was particularly exciting because Apple’s M1 finally gave us an ARM chip that is more powerful than most of its x86 counterparts. The engineering put into this product is nothing short of incredible. And we weren’t the only ones who found it exciting: our Apple silicon M1 as-a-Service quickly sold out after the launch. We had to buy a lot more machines to add to our data center... and we have doubled our customer base since.
But there was one big problem: the VNC was slow.
Our customer excellence team was getting more and more tickets over the months, yet we couldn’t find the root cause of the issue. We even got reports of VNC connections working better in the evening than during the day! Our clients were starting to get rightfully annoyed, and we started to sweat over this issue.
Our first designated culprit was the network. It seemed logical that an overcrowded network link could result in some slowdowns during the day, which is when most of the activity happens. But after spending quite some time with our network engineers, we came to the conclusion that it couldn’t be the network. In fact, the network traffic for this product is usually below 0.25% of its capacity and, so far, has never exceeded 2.5%.
We couldn’t wrap our heads around why only the VNC was this slow. We tried several remote desktop solutions, both open-source and commercial, and each one of them worked flawlessly. Only the built-in VNC remained slow. That’s when one of our team members had the simple yet smart idea of capturing the network traffic on a machine showing the symptoms. After looking at the capture, we discovered something unusual: our machine saw many different IP addresses targeting the VNC port. Dozens of them.
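To illustrate the kind of check that made the pattern visible, here is a minimal Python sketch that counts distinct source addresses sending traffic to the VNC port. It is not the tooling we actually used: the `(src_ip, dst_port)` tuples are a hypothetical, simplified view of a capture, the addresses come from the documentation ranges, and the port is assumed to be the VNC default, 5900.

```python
from collections import Counter

VNC_PORT = 5900  # assumption: the default VNC port


def sources_targeting_port(packets, port=VNC_PORT):
    """Count packets per source IP for traffic addressed to `port`.

    `packets` is an iterable of (src_ip, dst_port) tuples: a hypothetical,
    simplified view of a capture, not a real pcap format.
    """
    return Counter(src for src, dst_port in packets if dst_port == port)


# Hypothetical sample using documentation-range addresses.
sample = [
    ("203.0.113.7", 5900),
    ("198.51.100.23", 5900),
    ("203.0.113.7", 5900),
    ("192.0.2.55", 22),    # not VNC traffic, ignored
    ("192.0.2.99", 5900),
]

counts = sources_targeting_port(sample)
print(len(counts), "distinct sources targeting port", VNC_PORT)
for ip, hits in counts.most_common():
    print(ip, hits)
```

On a healthy machine with a single user, we would expect to see one source address here, maybe two. Dozens is a strong hint that something else is knocking on the door.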
There was only one possible reason for this: our machines were the targets of brute-force attacks. Attackers were trying to gain control over the machines. We quickly confirmed this hypothesis by looking at the logs, which contained many “Authentication :: FAILED” messages. On its own, though, this is a common thing in the IT industry: every machine with a public IP address is subject to automated scans and brute-force attacks. So we found a way to see for ourselves whether these attempts were the source of our troubles. We added a firewall rule to discard all incoming traffic on the VNC port from every address but our own, and we enabled the firewall. The results were staggering: the slowness of the VNC simply disappeared.
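The logic of that test rule can be sketched in a few lines of Python. This is only a model of the decision the firewall makes, under assumptions: the port is taken to be the VNC default (5900), the trusted address is a placeholder from the documentation ranges, and `pf_rule` renders a pf rule of roughly the shape one might write for this, not the exact rule we deployed.

```python
import ipaddress

VNC_PORT = 5900  # assumption: the default VNC port


def is_allowed(src_ip, dst_port, trusted_ip, protected_port=VNC_PORT):
    """Model of the test rule: traffic to the protected port is dropped
    unless it comes from the single trusted address. Traffic to other
    ports is not affected by this rule."""
    if dst_port != protected_port:
        return True
    return ipaddress.ip_address(src_ip) == ipaddress.ip_address(trusted_ip)


def pf_rule(trusted_ip, port=VNC_PORT):
    """Render a pf rule of this shape (illustrative, not our real config)."""
    ipaddress.ip_address(trusted_ip)  # raises ValueError if malformed
    return f"block drop in quick proto tcp from ! {trusted_ip} to any port {port}"


TRUSTED = "203.0.113.10"  # hypothetical placeholder for "our IP"

print(is_allowed(TRUSTED, VNC_PORT, TRUSTED))          # our own connection
print(is_allowed("198.51.100.23", VNC_PORT, TRUSTED))  # anyone else on VNC
print(pf_rule(TRUSTED))
```

A rule this strict is only useful as an experiment, since it locks out every customer as well. Its value was in proving that the login attempts, and not the network, were causing the slowdown.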
It seems that screensharingd, the process that provides the built-in VNC, is vulnerable to denial of service attacks. The more an attacker tries to log in, the slower it becomes for legitimate users. We needed to protect this port.
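We had two main ideas to solve the problem: changing the port used by the VNC, or dynamically blocking the addresses behind the brute-force attempts.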
The first idea was quickly discarded. The VNC port is located in a system configuration file protected by SIP (System Integrity Protection). That means that we had to disable SIP to change it, which is not currently possible with our infrastructure.
The second idea was the one that we ended up using. With a combination of fail2ban, a tool that’s used to protect against brute-force attacks, and some plumbing, we were able to leverage the power of the Packet Filter system (pf) to dynamically protect the machine. We also made the protection in-memory only, so that if a client happens to lock themselves out, a reboot is enough to reset it and allow them to connect again.
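The principle behind this kind of protection can be shown with a small Python sketch. To be clear, this is not fail2ban’s implementation nor our actual configuration: the class name, the thresholds, and the addresses are all hypothetical. It only illustrates the idea of banning an address after too many failed logins within a time window, while keeping the ban list in memory so that a restart wipes it.

```python
import time
from collections import defaultdict, deque


class InMemoryBanList:
    """Ban an IP after `max_failures` failed logins within `window_seconds`.

    Nothing is written to disk: creating a new instance (as a reboot would)
    starts again with an empty ban list.
    """

    def __init__(self, max_failures=5, window_seconds=600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # ip -> timestamps of failures
        self._banned = set()

    def record_failure(self, ip, now=None):
        """Register a failed login; return True if the IP is now banned."""
        now = time.monotonic() if now is None else now
        attempts = self._failures[ip]
        attempts.append(now)
        # Forget failures that are older than the window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_failures:
            self._banned.add(ip)
        return ip in self._banned

    def is_banned(self, ip):
        return ip in self._banned


bans = InMemoryBanList(max_failures=3, window_seconds=60)

# Three failures in 20 seconds: this address gets banned.
for t in (0, 10, 20):
    bans.record_failure("203.0.113.7", now=t)

# Three failures spread over 200 seconds: never 3 within one window.
for t in (0, 100, 200):
    bans.record_failure("198.51.100.23", now=t)

print(bans.is_banned("203.0.113.7"))
print(bans.is_banned("198.51.100.23"))
```

In a real setup, the banned addresses would be handed over to the firewall instead of being kept in a Python set, but the trade-off is the same: an in-memory list is simple to reset, at the cost of forgetting known attackers after every reboot.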
Now that the problem is solved, we are seeing huge improvements in the VNC speed in our staging environment. Following these good results, we are, at the time of writing, deploying the fix on our machines in stock.
Even in 2021, there are still flaws where we don’t expect them, and screensharingd is a good example of this. It reminds us that we should question everything when approaching a problem, and not blindly trust components because they come from a given provider. If we had questioned the software from the beginning, we would have saved quite a lot of time.
In 2022, we’re looking forward to delivering even more value to our Mac mini M1 customers, maybe even putting Mac mini M1s in orbit… Stay tuned!