Plot semi-optimization

  • Short version: Is there a tool that can do only partial optimization? Does that make sense to you?

    Long version:
    Previously, we plotted only a few disks per mining rig, so creating optimized plot files was not an issue. Now we plan to increase our capacity a lot, so I'm trying to create a proper workflow and test many combinations (most of which are useless).

    [increase capacity] => our target is 1-2 PB/week

    Creating optimized plots on the CPU (i7770) is too slow, so let's create unoptimized plots on the GPU (RX 580). We can do 80k nonces/min on a single GPU (when writing to 3 disks), which is enough for our purpose, as we can easily scale horizontally.

    Depending on the scenario, we might find that we can't read plots fast enough. Currently, we can only optimize plots completely. Looking at "Technical information to create plot files", it is easy to see that reading an optimized plot means 'seek + read data sequentially', while an unoptimized plot consists of multiple smaller optimized sequences in one large file, so reading it takes (plot size / stagger size) * 'seek + read a smaller amount of data'. For a small stagger size, doubling the stagger size might give 'almost' two times faster access, as we need only half as many seeks. So we could optimize only as much as required. I will probably write that program, as it might be possible to run it as a filter when copying to USB SMR drives. But I'm open to other ideas.
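
    To put rough numbers on the seek argument above, here is a minimal sketch. The 10 ms seek time and 150 MB/s sequential read speed are assumptions for illustration, not measurements from our rigs, and the function name is made up. It models reading one scoop as one seek per stagger group plus the sequential transfer of 64 bytes per nonce.

```python
# Rough model: time to read one scoop from a plot with a given stagger size.
# ASSUMED (not measured): 10 ms per seek, 150 MB/s sequential read speed.
SCOOP_BYTES = 64                   # one scoop of one nonce
NONCE_BYTES = 4096 * SCOOP_BYTES   # 4096 scoops per nonce = 256 KiB


def scoop_read_seconds(plot_bytes, stagger, seek_s=0.010, read_bps=150e6):
    """Estimate seconds needed to read a single scoop across the whole plot."""
    nonces = plot_bytes // NONCE_BYTES
    groups = max(1, nonces // stagger)   # one seek per stagger group
    data = nonces * SCOOP_BYTES          # bytes actually transferred
    return groups * seek_s + data / read_bps


plot = 8 * 10**12  # one 8 TB disk
for stagger in (1024, 2048, 65536):
    print(stagger, round(scoop_read_seconds(plot, stagger), 1))
```

    With these assumed numbers the seek term dominates for small stagger sizes, which is why doubling the stagger size nearly halves the read time there, while at 65536 the sequential transfer already takes most of the time.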

  • @marxsk I have half a PB active, and "only" plotted about a PB, but my approach was:

    • having a small (600 lines in my environment) shell script doing all the automation;
    • GPU plot 1 TB files to a disk stripe
    • fully optimize from stripe to target disk(s), which itself might be stripe(s) of SMR disks to have more bandwidth
    • having (at least) two temp stripes per batch run, alternating between plotting and optimizing

    As long as you have a streaming bandwidth of 80kn/m (~340 MB/s), you keep your GPU busy. That means your system needs to handle peaks of at least 500 MB/s times three:

    • plot writing from GPU to temp stripe A
    • plot reading (with seeks) from temp stripe B
    • plot writing (linear) to target X

    PMRs are 100-200 MB/s, SMR (my Seagates) 100-180 MB/s.
    So 3 PMR disks per temp stripe (2x), 3-4 disks as target stripe (sum: 9-10 drives active).

    Or, if you want single SMR disks (100-180 MB/s) as targets:
    4 temp stripes of 3 PMR disks, plus 4 single SMR targets (sum: 16 disks active).
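
    The sizing above can be sketched as a small calculation. This is only an illustration: the function names are made up, and it assumes the usual nonce size of 4096 scoops of 64 bytes (256 KiB).

```python
import math

NONCE_BYTES = 4096 * 64  # assumed nonce size: 4096 scoops x 64 bytes = 256 KiB


def nonces_per_min_to_mb_s(nonces_per_min):
    """Convert a plotting rate in nonces/min to MB/s (1 MB = 10^6 bytes)."""
    return nonces_per_min * NONCE_BYTES / 60 / 1e6


def disks_needed(peak_mb_s, per_disk_mb_s):
    """Disks a stripe needs to absorb a peak rate at a given per-disk speed."""
    return math.ceil(peak_mb_s / per_disk_mb_s)


print(round(nonces_per_min_to_mb_s(80_000)))  # streaming rate at 80kn/m
print(disks_needed(500, 180))                 # disks near their best speed
print(disks_needed(500, 100))                 # disks at their slowest
```

    The 80kn/m rate comes out at about 350 MB/s (roughly 333 MiB/s), in line with the ~340 MB/s quoted above. A 3-disk stripe covers the 500 MB/s peak only if each disk sustains about 167 MB/s, so plan for more disks if yours sit at the low end of the range.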

    Nice homework assignment! Which OS?
    I wouldn't care for "partial" optimization. If the seeks in the second step above are limiting, just add another temp stripe.

  • @vaxman Thanks for such a detailed description of your process.

    When optimizing, I use a similar process, except that I don't use stripes. I use 2TB SSDs instead, and they are faster than my GPU (60k nonces/min when plotting to a single SSD). I was not very successful running the optimizer directly to SMR disks: the performance was quite bad (~50MB/s), and it almost looked like there were too many seeks. But I did not investigate it further.

    Which OS?

    We have two kinds of deployment. In the first, we just add a few external disks (Seagate Backup Plus 8TB) to 'common' workstations that run non-stop. In that case there seems to be no need to optimize as long as the stagger size is good enough (65536 works), as we don't have to "re-use" threads. Everything runs in parallel and we are at <1min. The plotting machine has to run Windows, as NTFS performance on Linux (using ntfs-3g) is not good enough.

    For mining rigs, we are on Ubuntu with ext4 (ext3 does not have support for fast fallocate; no-barriers). They include more internal disks (SkyHawk 8TB), and a bit more optimisation is required. How much depends heavily on the number of disks and SATA performance. We are still in the process of finding the best combo :)

  • @marxsk said in Plot semi-optimization:

    when using optimizer directly to SMR disks, the performance was quite bad (~50MB/s) and it almost looked like that there was too much seeks

    When writing to NTFS or ext3/4 you run into the SMR random-write problem; these filesystems make use of central (ntfs) or distributed (extfs) bitmap tables.
    You could try btrfs or zfs, as their copy-on-write scheme fits the SMR requirements nicely - virtually all IO is sequential in this use case.

    side note:
    On zfs you get correct data; these skimpy SATA drives have a bit error rate of 10^-14.
    For the comparatively low Burst work-load that means 21 bad bits in a year per 8 TB drive, or 2,568 per Petabyte-Year.
    Plus the error rates for the SATA-Link itself, PCIexpress transmission and main memory errors. Been bitten by that in a different context. But the effect might not justify the added expense (storage space) for checksums at all.
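
    For reference, one way to arrive at numbers of that size, as a sketch under stated assumptions: the miner reads one of 4096 scoops of every plot per block, and a block arrives every 4 minutes on average. Both assumptions and the function name are mine and are not spelled out in the post above.

```python
SCOOPS = 4096                             # assumed: one of 4096 scoops read per block
BLOCKS_PER_YEAR = 365.25 * 24 * 60 / 4    # assumed: one block every 4 minutes


def expected_bit_errors_per_year(capacity_bytes, ber=1e-14):
    """Expected unrecoverable bit errors per year from mining reads alone."""
    bytes_read = capacity_bytes / SCOOPS * BLOCKS_PER_YEAR
    return bytes_read * 8 * ber


print(expected_bit_errors_per_year(8e12))   # one 8 TB drive
print(expected_bit_errors_per_year(1e15))   # one petabyte
```

    Under these assumptions the results round to the 21 and 2,568 quoted above. Scrubs, re-plotting and copying add further reads on top.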

  • @marxsk said in Plot semi-optimization:

    ext4 (ext3 does not have support for fast fallocate; no-barriers).

    When optimizing, you do not need fast fallocate: the writes are sequential anyway. The no-barriers option makes no sense for a single disk in my opinion, since you wouldn't write 500 MB/s to a single disk anyway. But hey, test it. I used more time for testing and writing the automation than I needed for starting up the then-finished plotter. 8)

  • @vaxman
    IMHO the main problem with running the optimizer to SMR disks was ntfs-3g, which adds a FUSE layer (read: a lot of context switching). It was visible even when I was just copying a plot file. The speed on Windows (with native NTFS) was 3x higher.

    Using btrfs might be an interesting idea; I will try it when building the next Linux rig.

    ext3 vs ext4 really makes a difference. On ext3 there is no native support for the fallocate syscall, so the default POSIX fallback is used. It makes creating an empty file really slow (similar to what you get on Windows when not using an admin account). My first attempt was with ext3, as I figured there was no need for the unimportant features of ext4. Unfortunately, ext3 in the kernel is just emulation inside the ext4 code :)
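
    To illustrate what I mean by preallocating the file, here is a minimal sketch using Python's os.posix_fallocate (Unix only). The helper name and the file name are made up for the example, and this is not how any particular plotter does it. On filesystems without native fallocate support, the C library falls back to writing out the blocks, which is the slow path described above.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Reserve size_bytes of disk space for path before any data is written."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Fast where the filesystem supports fallocate natively (e.g. ext4);
        # otherwise the C library emulates it by writing to every block.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "plot.tmp")   # hypothetical file name
    preallocate(target, 16 * 1024 * 1024)    # small size, just for the demo
    print(os.path.getsize(target))
```

    Timing this call with a multi-gigabyte size on the target filesystem is a quick way to check whether the fast path is in use.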

    no-barriers makes almost no sense under normal scenarios. My tests showed that it improves write performance by 2% (when writing to multiple disks at once). So it is not worth more testing, but it looks good enough as a default.