Ask Slashdot: Asynchronous RAID-1 Free Software Backup For Laptops?

First time accepted submitter ormembar writes "I have a laptop with a 1 TB hard disk. I use rsync to perform my backups (hopefully quite regularly) onto an external 1 TB hard disk. But with such a large disk it takes quite some time to perform backups, because rsync scans the whole disk for updates (15 minutes on average). Is there some kind of asynchronous RAID-1 free software that would record in a journal all the changes I make on the disk and replay that journal later, when I plug the external hard disk into the laptop? I guess it would be faster than the usual backup solutions (rsync, unison, you name it) that scan the whole partition every time. Do you feel the same annoyance when backing up laptops?"
  • mdadm can do this (Score:5, Informative)

    by Fruit ( 31966 ) on Thursday July 25, 2013 @01:16PM (#44383007)
    Use mdadm -C -b internal to create a bitmap. Detach and readd the mirror at will and it will only sync the difference.
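    A rough sketch of the workflow (untested here; /dev/md0, /dev/sda2 and /dev/sdb1 are placeholder device names):

    # create the mirror with an internal write-intent bitmap
    $ mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal /dev/sda2 /dev/sdb1
    # before unplugging the external disk, mark it failed and remove it
    $ mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
    # when it comes back, re-add it; only blocks flagged in the bitmap get resynced
    $ mdadm /dev/md0 --re-add /dev/sdb1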
    • by kasperd ( 592156 )

      Use mdadm -C -b internal to create a bitmap. Detach and readd the mirror at will and it will only sync the difference.

      I am going to test this on my next laptop, or if I decide to upgrade my current with an SSD some day.

      Meanwhile, I do have a couple of questions. How automated is this going to be? Will it automatically start to sync, once the USB/eSata disk is connected?

      Can I safely attach that disk to another computer for reading? I am worried such operation might corrupt data, even if I don't write a

  • Obligatory (Score:5, Informative)

    by Anonymous Coward on Thursday July 25, 2013 @01:17PM (#44383023)

    RAID is not backup.

    • Re:Obligatory (Score:5, Informative)

      by XanC ( 644172 ) on Thursday July 25, 2013 @01:20PM (#44383049)

      True. I'd recommend he check out rdiff-backup, which keeps snapshots of previous syncs. Fantastic tool.

    • by hawguy ( 1600213 )

      RAID is not backup.

      It is in this situation since he wants to mirror to an external disk, then break the mirror and unplug the disk.

      It's no worse than if he does "rsync --delete" to the backup medium. (well ok, slightly worse since if the mirror fails in the middle, the backup disk is left in an inconsistent state and could be unreadable, but the rsync would also leave an unknown number of files/folders unsynced, so it's not a perfect backup itself)

      As long as you have more than one backup disk, then a mirror is as safe as rs

      • Re: (Score:3, Informative)

        Just because you've hacked RAID into part of a backup strategy does not mean that backup is a standard use-case for RAID. It's far too easy for the wrong disk to get overwritten because of all the things RAID is set up to do by default. With rsync, you're telling the disks exactly which direction the data needs to flow. In a production environment, there's also a greater chance of failure using RAID because of the whole "plugging / unplugging drives" thing. Sure, it's rare, but your operating system and/or
        • by hawguy ( 1600213 )

          Just because you've hacked RAID into part of a backup strategy does not mean that backup is a standard use-case for RAID. It's far too easy for the wrong disk to get overwritten because of all the things RAID is set up to do by default. With rsync, you're telling the disks exactly which direction the data needs to flow.

          In a production environment, there's also a greater chance of failure using RAID because of the whole "plugging / unplugging drives" thing. Sure, it's rare, but your operating system and/or motherboard may or may not enjoy having drives attached and detached from its SATA bus.

          Hearing the above, a systems administrator would assume you're confused between the terms "backup" and "mirror". It's a non-standard use-case, so the admin that arrives after you've moved on to another job will have to deal with that confusion.

          My RAID backup strategy was fully supported and recommended by the manufacturer of the storage array, and was a big selling point. It wasn't a hack. Even tape backups can suffer problems from overwriting the wrong tape if someone does something stupid. "Oh hey, the backup system says this tape isn't expired yet, I'm sure I loaded the right tape, so I'll just do a hard-erase so I can write to it"

          • by hawguy ( 1600213 )

            My RAID backup strategy was fully supported and recommended by the manufacturer of the storage array, and was a big selling point. It wasn't a hack. Even tape backups can suffer problems from overwriting the wrong tape if someone does something stupid. "Oh hey, the backup system says this tape isn't expired yet, I'm sure I loaded the right tape, so I'll just do a hard-erase so I can write to it"

            Here's a Sun/Oracle doc that explains the procedure:

            http://docs.oracle.com/cd/E19683-01/817-2530/6mi6gg886/index.html [oracle.com]

            How to Use a RAID 1 Volume to Make an Online Backup
            You can use this procedure on any file system except root (/). Be aware that this type of backup creates a “snapshot” of an active file system. Depending on how the file system is being used when it is write-locked, some files and file content on the backup might not correspond to the actual files on disk.

            The following limitations apply to this procedure:

            * If you use this procedure on a two-way mirror, be aware that data redundancy is lost while one submirror is offline for backup. A multi-way mirror does not have this problem.

            * There is some overhead on the system when the reattached submirror is resynchronized after the backup is complete.

        • You're getting way too detailed to be implicating the highly generic term RAID in your list of fault conditions.

          I will point out however that your premise that a system won't know which way to sync the data is wrong. Any running RAID implementation that syncs from a recently attached disk to the currently in use disk is just broken and would never get out of QA.

          However using a RAID1 to mirror to an external drive isn't going to be a particular benefit unless the raid implementation manages a changed block m

    • by Sloppy ( 14984 )

      It is, if you then disconnect half of it and move it offsite! I'm not sure that's the best way to do backups, though.

      If I were this guy, I'd look into why it takes rsync so long to read the dir tree. This is one of those situations where no matter how much people say "Linux filesystems don't suffer from fragmentation," I nevertheless suspect you're suffering from highly fragmented directories. Let me guess: do you repeatedly come close to filling the disk? Maybe it's time to do this: after the next rsyn

  • ZFS: Snapshot + send (Score:2, Interesting)

    by Anonymous Coward

    Cleanest implementation of this I've seen is with ZFS.

    You do a snapshot of your filesystem, and then do a zfs send to your remote backup server, which then replicates that snapshot by replaying the differences. If you are experiencing poor speed due to read/write buffering issues, pipe through mbuffer.

    The only issue is that it requires that you have your OS on top of ZFS.
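    A rough sketch of that workflow (untested; pool, dataset and host names are placeholders):

    # take a snapshot and do the first full send to the backup box
    $ zfs snapshot pool/home@2013-07-25
    $ zfs send pool/home@2013-07-25 | ssh backuphost zfs receive backup/home
    # on later runs, send only the delta between two snapshots, buffered through mbuffer
    $ zfs snapshot pool/home@2013-07-26
    $ zfs send -i pool/home@2013-07-25 pool/home@2013-07-26 | mbuffer -m 128M | ssh backuphost zfs receive backup/home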

  • Exclude directories (Score:5, Informative)

    by Anonymous Coward on Thursday July 25, 2013 @01:22PM (#44383085)

    Are you backing up EVERYTHING on the laptop -- OS and data included? Even if you are only backing up your home directory, there is stuff you don't need to back up, like the .thumbnails directory, which can be quite large. Try using rsync's exclude option to restrict the backup to only what you care about.

    DNA
    AKA mrascii

  • by phorm ( 591458 ) on Thursday July 25, 2013 @01:24PM (#44383107) Journal

    In this case, it sounds like you want a fast on-demand sync rather than a RAID.

    However, you could possibly use md-raid for this if you're a Linux user.
    Have the internal disk(s) as a degraded md-raid1 array (rough sketch below). When you connect the backup disk, have it become part of the RAID and the disks should sync up. That said, it likely won't be any faster than rsync, quite possibly slower, as it'll have to go over the entire volume.

    Alternate solutions:
    * Have a local folder that does daily syncs/backups. Move those to the external storage when it's connected.
        CAVEATS: Takes space until the external disk is available
    * Use a differential filesystem, or maybe something like a COW (copy-on-write) filesystem. Have the COW system sync over to the backup disk (when connected) and then merge it into the main filesystem tree after sync
        For example, /home is a combination of /mnt/home-ro (ro) and /mnt/home-rw (rw, COW filesystem). When external media is connected, /mnt/home-rw is synced to external media, then back over /mnt/home-ro
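    A rough sketch of the degraded-array approach (untested; device names are placeholders):

    # build the mirror with only the internal disk present, plus a write-intent bitmap
    $ mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal /dev/sda2 missing
    # when the backup disk shows up, add it (the first add is a full sync), then detach it again
    $ mdadm /dev/md0 --add /dev/sdc1
    $ mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    # on later re-adds the bitmap means only changed blocks are copied
    $ mdadm /dev/md0 --re-add /dev/sdc1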

  • OS? (Score:5, Insightful)

    by ralf1 ( 718128 ) on Thursday July 25, 2013 @01:28PM (#44383153)
    The OP doesn't mention which OS he's on - the tools he mentions both run across multiple OS's. Would be helpful to know. I know as a group we probably assume some form of Linux but..... I use MS Home Server at the house to back up my family's multiple Windows machines. Runs on crappy hardware, does incrementals on a schedule, allows file level or bare metal restore, keeps daily/weekly/fulls as long as I ask it to. I know we aren't a Windows friendly crowd but this product does exactly what it promises and does it pretty well.
    • I use robocopy on Windows for my 1:1 backup copy since it will use timestamps and file sizes to determine if a file needs to be synced or not. But I assume rsync does the same thing.
      • Robocopy doesn't keep the ACM dates across volumes. So it is certainly not a 1:1 copy.

        The only thing that comes close, but still not there completely, is the legacy MS (Veritas) backup utility. And that one is far from automated.

        • Robocopy doesn't keep the ACM dates across volumes. So it is certainly not a 1:1 copy.

          The only thing that comes close, but still not there completely, is the legacy MS (Veritas) backup utility. And that one is far from automated.

          What about SyncToy? [microsoft.com] Seems to work pretty well, at least it does for me.

        • Robocopy doesn't keep the ACM dates across volumes. So it is certainly not a 1:1 copy.

          Maybe I'm misunderstanding you, but robocopy does keep dates across volumes. You can also control whether or not you want to copy them. File times are copied by default, and for directories you add the /DCOPY:T parameter. Are you speaking of some other underlying file system date?

  • CrashPlan (Score:4, Informative)

    by Nerdfest ( 867930 ) on Thursday July 25, 2013 @01:34PM (#44383221)

    CrashPlan [crashplan.com] is free, but not open, and I think will do everything you need. You can back up to an external disk, over the network to one of your own machines, or to a friend who also runs it. Great key-based encryption support. If you want, you can pay them for offsite backups (which is a great deal as well, in my opinion). It's cross-platform, and easy to use. Never underestimate the benefits of off-site backups.

    • by lw54 ( 73409 )

      For the last month, I've been using CrashPlan to back up a 5.5TB filesystem over AFP to a remote AFS file share over the Internet. I did the initial backup across the LAN and then moved the drive array to its final destination. I'm now a few weeks in after the move and for the last 4 days, it has not backed up and is instead synchronizing block information. 4 days in and it's up to 59.9%. It spent 1.5 days about a week ago doing something like recursive block purging. I wish the client could do these housek

      • That is a long time. I think I had something similar when I 'adopted' a backup. Once it's in sync the backups are quite quick, with pretty much no 'start-up scan time'.

  • by benjymouse ( 756774 ) on Thursday July 25, 2013 @01:39PM (#44383255)

    Windows Backup (since Vista) uses Volume Shadow Copy (VSS) to do block-level reverse incremental backups, i.e. it uses the journaling file system to track changed blocks and only copies over the changed blocks.

    Not only that, it also backs up to a virtual hard disk file (VHD), which you can attach (mount) separately. This file system holds the complete history, i.e. you can use the "previous versions" feature to go back to a specific backup of a directory or file.

    • by h4rr4r ( 612664 )

      Lots of backup software uses VSS, pretty much any credible backup software on windows. It totally lacks automation, which is a pretty big downside.

      I doubt he is using windows, since he mentions rsnapshot.

      • It totally lacks automation, which is a pretty big downside.

        wbadmin.exe [microsoft.com] is available since Vista (where the VSS based image backup was introduced).

        How is that a total lack of automation?
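        For example, a VSS-based image backup to an attached disk can be run and scheduled with the built-in tools (a sketch; the drive letters and task name are placeholders):

        rem back up C: (plus everything needed for bare-metal restore) to the external drive E:
        wbadmin start backup -backupTarget:E: -include:C: -allCritical -quiet

        rem schedule it nightly with the built-in task scheduler
        schtasks /create /tn "NightlyImageBackup" /tr "wbadmin start backup -backupTarget:E: -include:C: -allCritical -quiet" /sc daily /st 02:00 /ru SYSTEM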

        • by h4rr4r ( 612664 )

          You can use it for automation, sure, but out of the box it does not do any. Nearly no Windows user will know how to use that. It would need a shiny wizard and other mythical figures to do that for you.

          My personal favorite is to have bacula do it, that is even less end user friendly though. It does mean all the schedules live on the server not the client, which is nice.

        • Home versions of windows don't support scheduled backups. You might be able to hack something yourself using task scheduler and a batch file though.

          • Home versions of windows don't support scheduled backups. You might be able to hack something yourself using task scheduler and a batch file though.

            No, that is not correct.

            At least in Windows 7 *all* editions [microsoft.com] have the full image capability. Only the professional/enterprise editions can backup to a *network* drive. But in this case it is a local or attached disk, so the edition really does not matter.

    • Comment removed based on user account deletion
      • by benjymouse ( 756774 ) on Thursday July 25, 2013 @07:26PM (#44386725)

        Unless you're running Windows 8 or Server 2012, Windows Backup on Windows 7 and below is functionally obsolete due to the new 3TB + drives now in 4k sector Advanced Format technology.

        Nice. So the mere fact that you can buy large-capacity drives would "functionally obsolete" backup solutions, even on a system that does not have such a drive? Tell me, did you buy a new BMW when Apple changed the connector for the iPhone 5? You know, the old BMWs are now "functionally obsolete".

        Not that it matters much here anyway, because you got it wrong. Windows backup *will* back up to drives larger than 3 TB - as long as they use 512e Advanced Format, where the drive logically presents 512-byte sectors but physically uses 4096-byte sectors. The solution is to use the GPT (GUID Partition Table) format. This will work for Vista and up.

        The drives that are exclusively 4096-byte (4Kn) cannot be used with Windows 7 / Server 2008 R2 and earlier - that's a limitation of the OS and not the backup software, however.

  • Whooosh (Score:4, Interesting)

    by jayteedee ( 211241 ) on Thursday July 25, 2013 @01:43PM (#44383313)

    Holy cow people, you're missing the OP's point. It's taking 15 minutes to SCAN the 1 TB drive.

    I've run into the same problem on Windows and Linux, especially for remote rsync updates on Linux over slow wireless connections. It's not the 1 TB that kills it, since I can read 4 TB drives with hundreds of movies in seconds. It's the number of files that kills performance.

    My solution on windows is to take some of the directories with 10,000 files and put them into an archive (think clipart directories). Zip, Truecrypt, tar, whatever. This speeds up reading the sub-directories immensely. Obviously, this only works for directories that are not accessed frequently. Also, FAT32 is much faster on 3000+ files in a directory than NTFS is. Most of my truecrypt volumes with LOTS of files are using FAT32 just because of the directory reading speed.

    On Linux systems, I just run rsync on SUB-directories. I run the frequently accessed ones more often and the less-accessed directories less often. Simple? No. My rsyncs are all across the wire, so I need the speed. Plus some users are on cell-phone wireless plans, so I need to minimize data usage.

    • Yup, almost everyone missed the point of having to deal with shitty file systems.

      Agreed about using the "dumb" FAT32 FS for speedy access!

      It's too bad you couldn't load the FS meta-info into a RAM drive, or onto a SSD, kind of like how ZFS gives you the option with the ZIL on SSD.

      • Agreed. My first thoughts were ZFS, but with the laptop I figured it was more than likely a Windows box. Plus I wouldn't use BSD on a laptop either, and I don't quite trust ZFS on Linux yet... (but it's getting close). Also agree on the ZIL on SSD. I can keep quite a few VMs (websites) in cache on the SSD and hardly have to worry about the speed of the HDs. Plus backups at the filesystem level. One of those tools I can't believe I've lived without all these years.

    • Re:Whooosh (Score:4, Informative)

      by benjymouse ( 756774 ) on Thursday July 25, 2013 @07:41PM (#44386845)

      My solution on windows is to take some of the directories with 10,000 files and put them into an archive (think clipart directories).

      I hope you are not an IT professional. Windows comes with a perfectly good backup solution built in. It will use the Volume Shadow Copy Service (VSS) to track changes as they occur and subsequently only back up the changed blocks. No need to scan *anything*, since the journaling file system has already recorded a full list of changes in the journal.

      The backup is basically stored in a VHD virtual hard disk (plus some catalog metadata around it), so you can even attach the VHD and browse it. By default it lets you browse the latest backup, but the previous versions feature will let you browse back in time to any previous backup still stored in the VHD (the oldest backups will be pruned when the capacity is needed). The VHD is a reverse incremental backup: it stores the latest backup as the readily available version and only the incremental (block-level) differences back to previous backup sets.

      Moreover, VSS also ensures consistency for a lot of applications that are VSS aware (VSS writers), e.g. database systems like Oracle and SQL Server, Active Directory, the registry, etc. VSS coordinates with the applications so that, exactly when the snapshot is taken, the applications have flushed all state to disk. This means applications do not need to be stopped to get a consistent backup, i.e. database systems will not see a restore of a backup that was taken from a running system as a "crash" (as they would without such a service) from which they must recover through some other means (typically a roll-forward log).

  • by tibit ( 1762298 ) on Thursday July 25, 2013 @01:51PM (#44383425)

    I'd think to use LVM and filesystem snapshots. The snapshot does the trick of journaling your changes and only your changes. You can ship the snapshot over to the backup volume simply by netcat-ing it over the network. The backup's logical volume needs to have same size as the original volume. It's really a minimal-overhead process. Once you create the new snapshot volume on the backup, the kernels on both machines are essentially executing a zero-copy sendfile() syscall. It doesn't get any cheaper than that.

    Once the snapshot is transferred, your backup machine can rsync or simply merge the now-mounted snapshot to the parent volume.

    • by tibit ( 1762298 ) on Thursday July 25, 2013 @02:00PM (#44383505)

      Well, of course I goofed, it's not that easy (well it is, read on). A snapshot keeps track of what has changed, yes, but it records not the new state, but the old state. What you want to transfer over is the new state. So you can use the snapshot for the location of changed state (for its metadata only), and the parent volume for the actual state.

      That's precisely what lvmsync [github.com] does. That's the tool you want to do what I said above, only that it'll actually work :)
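      Roughly like this (a sketch from memory of the lvmsync README, so double-check it; volume and host names are placeholders):

      # one-time: copy the volume to the backup host block for block, then snapshot the source
      # (assumes an LV of the same size already exists on the backup host)
      $ dd if=/dev/vg0/home bs=1M | ssh backuphost dd of=/dev/vg0/home_backup bs=1M
      $ lvcreate --snapshot --size 2G --name home_snap /dev/vg0/home
      # later: ship only the blocks the snapshot marks as changed, then start a fresh snapshot
      $ lvmsync /dev/vg0/home_snap backuphost:/dev/vg0/home_backup
      $ lvremove -f /dev/vg0/home_snap
      $ lvcreate --snapshot --size 2G --name home_snap /dev/vg0/home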

  • If you are spending time messing with a system that is not going to provide you with a running computer after a quick trip to the store for a new hard drive, then maybe you should rethink your goals.
    And perhaps you would regret the time spent less if you knew that in the event of an emergency, your backup would not only save your data, but prevent a re-installation and updates and more updates and more updates, and hunting for installation media and typing in software keys.
    AIX had/has a nice system for back

  • Making a mirror every now and again is not a backup strategy to rely on. This is the canned "RAID is NOT a backup and never will be" advice. For a single laptop, something like Backblaze is probably a better bet.

  • Upgrade your rsync! (Score:5, Informative)

    by phoenix_rizzen ( 256998 ) on Thursday July 25, 2013 @02:25PM (#44383775)

    You're holding it wrong. ;)

    rsync 2.x was horribly slow as it would scan the entire source looking for changed files, build a list of files, and then (once the initial scan was complete) would start to transfer data to the destination.

    rsync 3.x starts building the list of changed files, and starts transferring data right away.

    Unless you are changing a tonne of files between each rsync, it shouldn't take more than a few minutes using rsync 3.x to back up a 1 TB drive. Unless it's an uber-slow PoS drive, of course. :)

    We use rsync to backup all our remote school servers. Very rarely does a single server backup take more than 30 minutes, and that's for 4 TB of storage using 500 GB drives (generally only a few GB of changed data). And that's across horrible ADSL links with only 0.768 Mbps upload speeds!

    Going disk-to-disk should be even faster.
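    Something along these lines (paths are placeholders; --info=progress2 needs rsync 3.1+):

    # archive mode, preserve hard links/ACLs/xattrs, stay on one filesystem,
    # and delete files that were removed at the source
    $ rsync -aHAXx --delete --info=progress2 /home/ /mnt/backup/home/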

    • by Skater ( 41976 )
      I was hoping someone would say something like this. I do the exact same thing with my "media" drive - a two terabyte drive with our pictures, home videos, mp3s, etc. on it. I have another external 2 TB drive. It really doesn't take that long for the rsync to work - I start it and it finishes a couple minutes later, even when I haven't done it for 30 or 60 days. I've never sat there and timed it, because I usually start it and go do something else, but I don't think it takes 15 minutes on average - maybe
  • by Roskolnikov ( 68772 ) on Thursday July 25, 2013 @02:26PM (#44383779)

    two pools, internalPool, externalPool

    Use ZFS send and receive to migrate your data from internal to external. You can do whole-filesystem or incremental sends if you keep a couple of snapshots local on your internal disk; this can get excessive if you have a lot of delta or want to keep a long history.

    http://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.html [oracle.com]

    Of course you will need a system that can use ZFS, but there are more options for that than Time Machine. It's block level and it's fast, and it doesn't depend on just one device - you can have multiple devices (I like to keep some of my data at work. Why? Because otherwise my backup solution is in the same house that would burn, if it burned...)
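    A rough sketch of the two-pool approach (untested; pool and dataset names are placeholders):

    # initial full copy from the internal pool to the external pool
    $ zfs snapshot internalPool/home@snap1
    $ zfs send internalPool/home@snap1 | zfs receive externalPool/home
    # next time, send only what changed between the two snapshots
    $ zfs snapshot internalPool/home@snap2
    $ zfs send -i internalPool/home@snap1 internalPool/home@snap2 | zfs receive externalPool/home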

    • Very nice suggestion about using two pools !

      >of course you will need a system that can use ZFS

      Actually I was surprised how well "ZFS on Linux" works if you don't have a FreeNAS/BSD system.
      * http://zfsonlinux.org/ [zfsonlinux.org]

      It is too bad the ZFSonLinux documentation is total garbage but at least it was relatively painless to get it to work on a spare Ubuntu box. IIRC, ZFS on Linux setup was ...

      sudo apt-get update
      sudo apt-get install uuid-dev

      wget http://archive.zfsonlinux.org/downloads/zfsonlinux/spl/spl-0.6

    • by steak ( 145650 )

      I declare you the winner.

  • Comment removed based on user account deletion
  • I usually hate making posts where I am questioning the questioner, rather than providing an answer but with 1 TB of information you should put on the patience cap. It will take as long as it takes.

    To break down what you are wanting:
    I want a backup based on a journaling-file-system sort of thing that works incrementally, slowing down every disk operation by a few milliseconds, so I can shave 15 minutes off of a backup procedure, while still having to send the same data. I don't think that would be very wise. The b

  • Btrfs send/receive (Score:5, Informative)

    by jandar ( 304267 ) on Thursday July 25, 2013 @02:48PM (#44384007)

    Btrfs send/receive should do the trick. After first cloning the disk, create a reference snapshot on the laptop before every subsequent transfer and delete the previous one after the transfer.

    $ btrfs subvolume snapshot /mnt/data/orig /mnt/data/backup43
    $ btrfs send -p /mnt/data/backup42 /mnt/data/backup43 | btrfs receive /mnt/backupdata
    $ btrfs subvolume delete /mnt/data/backup42

    I haven't tried this myself, so the necessary disclaimer: this may eat your disk or kill a kitten ;-)

  • by flux ( 5274 ) on Thursday July 25, 2013 @02:53PM (#44384057) Homepage

    Btrfs has tools for doing this. It also comes with find-new, which lets you find exactly which files have changed between snapshots, and it does so basically instantaneously.

    Though Btrfs might not be the solution for ensuring data integrity at this point... But setting up hourly snapshots of your drives can be quite nice when you accidentally destroy something you've created after the last backup.
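    For example (a sketch; 12345 stands in for the generation number recorded on the previous run):

    # take a read-only snapshot, then list files changed since generation 12345
    $ btrfs subvolume snapshot -r /mnt/data /mnt/data/snap-hourly
    $ btrfs subvolume find-new /mnt/data 12345
    # the "transid marker was N" line at the end of the output is the generation to pass next time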

    • by ssam ( 2723487 )

      >Though Btrfs might not be the solution for ensuring data integrity at this point.

      It's certainly close though. It also has a bunch of data integrity features (like checksumming) that make it far safer than ext (and most other filesystems apart from ZFS). If you have slightly dodgy hardware, btrfs will let you know, whereas your data may silently corrupt on ext4.

  • I use iFolder for this. It has clients for Windows, Linux, and Mac platforms, and works reasonably well. The server was a bit of a pain to get set up though. It used to be a Novell product but has spun off as its own open source project. You can check it out at ifolder.com

  • TSM does real backup, and since 6.3 it does journal-based backups for Ext2, Ext3, Ext4, XFS, ReiserFS, JFS, VxFS, and NSS.

    The other option I have seen (surprisingly for GPFS as TSM does not do journal based backups for GPFS even though both are IBM products) is to register to the DMAPI (this would only work for XFS I think) and then use that to capture all activity on the file system. You could then use that to generate your list of files to backup. Admittedly this is going to require you to get your hands dirty and do
