But I digress.
I was upgrading the packages, and one of those was a kernel update, so I rebooted for the first time in months and...
Remember how I had hard drive problems recently? Yeah.
The CF card I've been using had IO errors. This caused filesystem corruption. This, in turn, meant GRUB couldn't locate the kernel to boot from. I had to open it up and pull the flash card out to see what was going on. And it wasn't pretty.
I can't really expand on the advice I posted in Rescue Me! about backups, because I've been procrastinating on fixing the other terrible hard drive fails I had in April and haven't got that machine restored yet; never mind finally finishing my Ultimate Backup Final Solution. Instead, let's just briefly follow the immediate steps I'm taking to rescue what I can, and hope that nothing else breaks while I'm typing this paragraph. Seriously, what is with hardware I own?! Is my house experiencing unusually large neutrino flux? Was it built on an ancient UNIX burial ground?
Making an image
At first, I mounted the filesystem directly from the card OK, and ran a btrfs scrub. In hindsight, the very first thing I should have done was make an image, because later on things would fail further. I installed ddrescue (via the gddrescue package on Ubuntu). Be aware that its command-line arguments aren't the same as those of the venerable old dd: it takes an input file, an output file, and an optional 'log' file, in that order.
james@yang(): /mnt/touro1/@image
$ sudo ddrescue /dev/sdd ./sumomo-cf32-20140710.img ./sumomo-cf32-20140710.img.ddrescuelog
GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued: 17241 MB, errsize: 16781 kB, current rate: 0 B/s
ipos: 17258 MB, errors: 1, average rate: 5538 kB/s
opos: 17258 MB, time since last successful read: 13.5 m
Copying non-tried blocks...
Interrupted by user
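Because ddrescue keeps that log file up to date as it goes, an interrupted run like this one can be resumed by re-running the same command with the same log, and the log itself records how much has been recovered so far. Below is a minimal sketch that totals the finished areas. It assumes the plain-text layout GNU ddrescue writes ('#' comment lines, one current-position line, then one hex `pos size status` line per block, with `+` marking a finished block); `rescued_bytes` is a hypothetical helper name, not part of ddrescue.

```sh
#!/bin/sh
# Total the bytes marked finished ('+') in a GNU ddrescue log file.
# Assumed layout: '#' comment lines, one "current_pos current_status" line,
# then one "pos size status" line per block, with pos and size in hex.
rescued_bytes() {
    total=0
    while read -r pos size status _rest; do
        case $pos in
            '#'* | '') continue ;;    # skip comments and blank lines
        esac
        # Only block lines carry '+' in the third field; the current-position
        # line has its status in the second field, so it is never counted.
        if [ "$status" = "+" ]; then
            # $size expands to a hex literal such as 0x00002000, which
            # shell arithmetic accepts directly.
            total=$(( total + $size ))
        fi
    done < "$1"
    echo "$total"
}
```

Pointed at `./sumomo-cf32-20140710.img.ddrescuelog`, this should give a byte count in line with the "rescued:" figure ddrescue printed above.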
I interrupted it because, just over halfway through making the image, it hit hard IO errors that the kernel decided were unrecoverable, and the device was removed. What now? Well, if we're lucky, I've copied enough that we can still access the 2nd of the three partitions that were on there, and get at my data. But how to do that? We've made an image of the entire device: MBR, partitions and everything. Is there a way to read it without going the obvious route of writing it to a USB flash drive first?
Of course there is. This is Linux, and an image file is not really much different from a device file like /dev/sdd; we just need to point our tools at the file we created.
james@yang(): /mnt/touro1/@image
$ sudo gparted ./sumomo-cf32-20140710.img.fucked.test
[sudo] password for james:
======================
libparted : 2.3
======================
Cannot have a partition outside the disk!
Well, okay, maybe some of our tools assume that the data we feed them has at least some internal consistency. GParted is a nice tool, but evidently it doesn't like having only half a drive to work with. Similarly, I try cfdisk, and it doesn't like it either. What else can we do?
Epic Flying Mount
A bit of research later, and I can construct a mount option that gets us to the partition we want.
james@yang(): /mnt/touro1/@image
$ sfdisk -l -uS ./sumomo-cf32-20140710.img.fucked.test
Disk ./sumomo-cf32-20140710.img.fucked.test: cannot get geometry
Disk ./sumomo-cf32-20140710.img.fucked.test: 2105 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0
Device Boot Start End #sectors Id System
./sumomo-cf32-20140710.img.fucked.test1 2048 1998847 1996800 83 Linux
./sumomo-cf32-20140710.img.fucked.test2 * 1998848 32718847 30720000 83 Linux
./sumomo-cf32-20140710.img.fucked.test3 32718848 63438847 30720000 83 Linux
./sumomo-cf32-20140710.img.fucked.test4 0 - 0 0 Empty
sfdisk is a lower-level tool that can get us the partition info we need from the MBR. It doesn't work on GPT (at least the version I have here), so you'll have to wait until I (inevitably) have a failure on a GPT disk for me to post info on doing that! The -l option lists the partitions, and -uS sets the units to sectors, 512 bytes each. The 2nd partition, hopefully still present on this image, starts at sector 1998848. Multiplying that by 512 gets us:
$ echo $((512 * 1998848))
1023410176
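The same lookup and multiplication can be done without sfdisk by reading the partition table straight out of the image. Below is a sketch, not what I actually ran. It assumes a classic MBR (not GPT) with 512-byte sectors, and GNU od for `--endian=little`; `mbr_loop_opts` is a hypothetical helper name. Each of the four 16-byte partition entries starts at byte 446, with the starting sector at bytes 8-11 of the entry and the sector count at bytes 12-15, both little-endian. The sketch also flags partitions that extend past the end of the file, which is the condition GParted's "Cannot have a partition outside the disk!" message describes.

```sh
#!/bin/sh
# Print loop-mount offset/sizelimit values for each used primary partition
# in an MBR disk image. Assumes a classic MBR (not GPT), 512-byte sectors,
# and GNU od (for --endian=little).
mbr_loop_opts() {
    img=$1
    img_bytes=$(( $(wc -c < "$img") ))
    for n in 1 2 3 4; do
        entry=$(( 446 + 16 * ($n - 1) ))    # 16-byte entries start at byte 446
        # Entry bytes 8-11: first sector (LBA); bytes 12-15: sector count.
        start=$(od -An -tu4 --endian=little -j $(( $entry + 8 )) -N 4 "$img" | tr -d ' ')
        count=$(od -An -tu4 --endian=little -j $(( $entry + 12 )) -N 4 "$img" | tr -d ' ')
        [ "$count" -eq 0 ] && continue      # unused slot
        offset=$(( $start * 512 ))
        size=$(( $count * 512 ))
        # complete=no: the partition runs past the end of the (truncated) image.
        if [ $(( $offset + $size )) -le "$img_bytes" ]; then
            fits=yes
        else
            fits=no
        fi
        echo "$n offset=$offset,sizelimit=$size complete=$fits"
    done
}
```

For partition 2 of the table above this should reproduce the 1023410176 offset worked out by hand. mount also accepts a `sizelimit` loop option alongside `offset`, which keeps the loop device from reading past the end of the partition.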
Heh, did you know you could do simple maths right there in the shell? Anyway, we can use this byte offset as an option to the mount command! Since I need my btrfs mount to work despite its second half not being available, I include the 'degraded' flag.
james@yang(): /mnt/touro1/@image
$ sudo mount -t btrfs -o degraded,ro,loop,offset=1023410176 ./sumomo-cf32-20140710.img.fucked.test /mnt/tmp1/
Well, did it work...?
james@yang(): /mnt/tmp1
$ ls
@/ fstab.gud @home/ isos/ @sumomo/
james@yang(): /mnt/tmp1
$ ls @/
bin/ btrfs/ dev/ home/ isos/ machine mnt/ proc/ run/ selinux/ sshfs/ sys/ tmp/ var/
boot/ cifs/ etc/ initrd.img lib/ media/ opt/ root/ sbin/ srv/ sumomo/ thanatos/ usr/ vmlinuz
Oh thank fuck. Some of the files are in fact corrupt; I know this because btrfs keeps checksums of all the data. Happily, keeping an eye on /var/log/syslog while copying files out of the loop mount didn't show any errors for the really important stuff.
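Watching syslog works, but a copy loop can also produce a list of casualties directly: btrfs reports a checksum mismatch on read as an I/O error, so cp fails on a corrupt file instead of copying it quietly. This is a hedged sketch of that alternative, not the method I used above; `salvage` is a hypothetical helper name, and it does not cope with file names containing newlines.

```sh
#!/bin/sh
# Copy every regular file under SRC into DEST, carrying on past failures,
# and print the relative path of each file that could not be copied.
# On btrfs a checksum failure surfaces as EIO, so cp fails for that file
# (a partial copy may be left behind in DEST).
salvage() {
    src=$1
    dest=$2
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        if ! cp -p "$src/$f" "$dest/$f" 2>/dev/null; then
            printf '%s\n' "$f"      # report the casualty and keep going
        fi
    done
}
```

Something like `salvage /mnt/tmp1/@home /some/safe/place > unreadable.txt` (destination path hypothetical) would leave a list of the files worth restoring from elsewhere.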
I am, however, one more machine down. Is there a finite amount of brokenness that I have to maintain, else more hardware will randomly break on me? Who knows! Hopefully future posts will be able to focus more on prevention than scrambling to pull data out of a burning building, but we'll see.