Tag Archives: home-server

Incident Postmortem: BSD.am home server @ 3-4 July 2023

Incident Information

Between the hours of Mon Jul 3 03:05:59 2023 and Tue Jul 4 01:10:15 2023 the home server named BSD.am (also known as pingvinashen.am) was completely down.

The event was triggered by a battery issue due to high temperature at the apartment where the home server resides.

A battery swell caused the computer to shut down as it produced higher than normal heat into the system.

The event was detected by the monitoring system at mon.bsd.am which notified the operators using email and chat systems (XMPP).

This incident affected 100% of the users of the following services:

  • jabber.am public XMPP server
  • conference.jabber.am public XMPP MUC server
  • օրագիր.հայ public WriteFreely instance
  • սարեան.ցանցառներ.հայ public Lobste.rs instance
  • BIND.am public DNS server and its zones
  • Multiple hosted blogs, including this one you’re reading.
  • A private ZNC server for Armenian Hackers Community
  • git.bsd.am public Gitea server
  • A matterbridge instance connecting multiple communities
  • A Huginn instance automating tasks (such as RSS to Telegram, RSS to newsletter) for Armenian Hackers Communities
  • A newsletter instance running listmonk.app
  • A private Miniflux.app server for Armenian Hackers Community
  • FreeBSD Jail users’ meetup website

Multiple community members contacted the operator (yours truly) asking for an ETA.

Response

After receiving an email at Mon Jul 3 03:06:49 2023, the Chief Debugging Officer (yours truly) started analyzing the possible issue. According to Monit (mon.bsd.am) all the services were unavailable and the server was not reachable by IP (based on ICMP).

The usual possibility, network failure at the ISP level, was ruled out, as the second home server (arnet.am) was functioning properly.

The person closest to the server physically, was the operator’s sibling (lucy.vartanian.am), however she did not have the background in Unix system administration nor in hardware maintenance. Also, she was asleep.

Hours later the siblings (yours truly) organized a FaceTime call to debug the issues remotely.

The system did boot the kernel properly, however it would shutdown before the services could complete their startup.

Clearly, the machine needed to be shipped to the operator (yours truly) to be debugged at the spot.

So that’s what the team did.

IMG 6689
Precise addresses are removed for privacy

Recovery

At the operator’s (yours truly) location, the BIOS logs have listed that the system suffered from a ASF2 Force Off. This usually means a thermal problem.

The operator (yours truly) disassembled the laptop, hoping the system needs a little dust clean-up and a thermal paste update.

Turns out the problem was actually a swollen battery.

IMG 6683
IMG 6684
IMG 6685

After removing the battery, the system booted fine. Just to be sure that the swollen battery was the root cause, a complete system stress test was ran. No issues detected (Well, except “Missing Battery”).

The systems was returned to its residency, connected to the internet and all services were accessible again.

IMG 6690
Precise addresses are removed for privacy

Next Steps

  • Install a new battery in the future, as the laptop is not connected to a UPS
  • Make sure to test the hardware during environmental changes (too cold, too hot, etc)
  • Run a simple status page with an RSS feed in a separate environment and notify users

If you’re new here, then first of all I’d like to thank you for reading this IR Postmortem article.

Yes, this was an IR Postmortem of a home server of a tiny community in a tiny country. This was not about Amazon, Google, Netflix, etc.

I wrote this for two reasons.

First, I wanted to show you how awesome the actual internet is. You see, when Amazon dies, everything dies with it. Your startup infra, your website, your hobby projects, everything.

When my server dies, only my server dies. And that’s the beauty of the internet. If you can, please, keep that beauty going.

Second, I run a small security company, illuria, Inc., where we help companies harden their environment and recover from incidents. It’s been years since I wrote an IR postmortem personally (my team members who do that are way smarter than me!), and I thought it would be a nice exercise to write it all by myself 🙂

I hope you liked this.

That’s all folks…

Reply via email.

Downtime for the rest of us

If the homebrew server club had an official membership based on technicality, then I would be a very proud member, but it does not have a membership application. That being said, I am still a proud member of HBSC, as I’ve been running a home server for a decade now.

I can’t say that it’s been easy, but it has been evolving. When I tried setting up my first server, I had issues with an ISP that didn’t allow me to have more than a single public static IP address.

Over time, ISPs changed, servers have changed, but the only thing that remained the same is me running my server from my home.

Now, I do have multiple IPs, a VLAN with my ISP that we’ve agreed on the setup, an internal email where they answer my questions without me calling the general support line and finally a publicly available Looking Glass that anyone can use.

Unfortunately, it’s not all sunshine and roses. My biggest request for the last couple of years has been the same: a status page.

You know, that simple web page that tells you if a service is down?

Interestingly, when I was researching ISPs (that’s a post for another day) I noticed that most ISPs don’t provide a status page.

Some ISPs (like Google Fiber) ask for an address, while others ask you to log in.

I understand that an ISP is a complex beast, and it would not be an easy task to say “we have an issue”, but hey, someone has to start trying.

Oh, I forgot, the downtime mentioned in the title!

Well, my personal blogs don’t have a lot of traffic (unless if someone posts a link to the Orange Website, then I get 20K+ viewers per day), but many people use my services, such as my Jabber/XMPP chat server, a publicly available blogging system an Armenian tech forum and so on.

All of the local ISPs had issues this week and their first response was to fix the outbound traffic. So for most people in the country, they didn’t care, as long as they were able to use Telegram and log into their Meta-owned social media services.

But for me and my community, we had to wait almost 18 hours for them to fix the internal network issues.

However, I am still a proud member of HBSC, because unlike Big Tech companies, if I go down, only I go down. But if a cloud goes down, everyone goes down with them.

See you at the next downtime 😉

Reply via email.

Antranig Vartanian

December 16, 2022

Well, it time to do upgrades on my server. expect this blog will be offline for a while as well.

How long were we up anyway?

root@pingvinashen:~ # uptime 
 9:41PM  up 270 days, 18:36, 4 users, load averages: 0.15, 0.36, 0.38

Not bad, huh?

See you soon!

Reply via email.

Migrating home-servers

As I have mentioned before, I want to blog more, so here it goes.

I’ve been struggling financially lately, with COVID and then the war I’ve thrown away almost all of my savings. One of the decisions that I had to make was moving back to my old place. No one lives here anymore, my parents got their own house, which means I can live with freedom alone and rent-free.

That meant I need to move home servers again. Yes, I’ve always been a home server fan. This blog runs on my home server as well.

While many people argue that running a home server is a complex process compared to the cloud, since you need to pay for electricity and manage hardware, I, however, feel that’s a myth.

My current uptime is

ssh pingvinashen uptime
1:59PM up 48 days, 42 mins, 2 users, load averages: 0.15, 0.18, 0.21

And I only needed to reboot because I had to upgrade since I’m a fan of upgrading whenever there’s a patch to some critical software 🙂

One of the advantages of running a home server in Armenia is the fact that electricity is cheap, so are static IP addresses. I pay 2USD/mo for each IP address and I have many of them.

Usually, I have one static IP per service (Jabber, ZNC, etc.) and one static IP for all web-oriented services such as blogs, websites, etc.

However, norayr also runs a home-server for the community, he runs the Armenian instance of Diaspora*, Mastodon, and SocialHome.

Due to technical limitations at his side of the city, he’s been keeping his home server at my place.

Vartanian LLC, Home-Server as a Service 😛

Anyways, I had to bring his home server to my new/old place as well, which meant that he needs a static IP for his services.

I did not want to call the ISP for a new IP address since the last one I’ve been using was for an Armenian instance of Lobste.rs that I deployed for our community. It’s not very active, but you can’t force people to be active in communities and Armenia does not have the concept of “tech communities” like others do in the west.

That meant that I have to remove an IP from a Jail so norayr can use it.

So I had to migrate some things. I had to use my proxy server IP address and reverse_proxy the traffic to the lobsters’ Jail.

Sounds easy, until I remembered that I run Apache on my host.

I’m not very fluent in Apache, I keep doing mistakes, so I wanted to migrate all of my vhosts to Nginx.

You’d think that it would be easy, and yes it was 🙂

So now, norayr runs his home server and I have migrated the webserver to Nginx in an hour.

For some reason, it feels faster, but I’m still not sure why. I probably had to optimize Apache back in the day, but Nginx’s default configs do seem better.

Now, since many IP addresses have been changed, I have to struggle with SMTP issues. No, SMTP works fine, but Google, just like it keeps breaking the web, it keeps breaking email as well, routing all-good emails to people’s spam folder, eh.

That’s all folks.

* not a footnote but part of the project name.

Reply via email.