Help understanding optimal GPU Plotting settings



  • All,

    I'm trying to figure out how best to calculate optimal settings within the devices.txt file for GPU plotting (I've put a rough sketch of the memory math just after the parameter list below).

    [platformId] [deviceId] [globalWorkSize] [localWorkSize] [hashesNumber]
    
    - platformId: The platform id of the device. Can be retrieved with the [listPlatforms] command.
    - deviceId: The device id. Can be retrieved with the [listDevices] command.
    - globalWorkSize: The number of nonces to process at the same time (this determines the required amount of GPU memory). Should be a power of 2.
    - localWorkSize: The number of parallel threads that compute the nonces. Must be less than or equal to [globalWorkSize]. Should be a power of 2.
    - hashesNumber: The number of hashes to compute per GPU call. Must be between 1 and 8192. Use a value under 8192 only if you experience driver crashes or display freezes.
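
    For the memory part I can at least do the math myself; assuming 256 KiB of GPU memory per nonce being processed, a rough sketch of what a given [globalWorkSize] implies:

    @echo off
    REM Rough sketch only: estimate the GPU memory implied by a given globalWorkSize,
    REM assuming 256 KiB of GPU memory per nonce processed at once.
    set /a GlobalWorkSize=8192
    set /a MemoryKiB=GlobalWorkSize*256
    set /a MemoryMiB=MemoryKiB/1024
    echo globalWorkSize=%GlobalWorkSize% needs roughly %MemoryMiB% MiB of GPU memory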
    

    I have a Gigabyte Radeon R9 280 (3072MB) video card.

    The first 2 settings are obvious, and will be 0 0 for me.

    For globalWorkSize and localWorkSize, is there a reason these should be powers of 2? Also, my specific graphics card has 1792 "cores" and 28 "compute units". So, shouldn't I set localWorkSize to 28 (or should it be 1792)? And then set globalWorkSize to a multiple of the localWorkSize?

    Just trying to figure things out, since it seems to be a guessing game as to what values are best. I would assume there should be a way to actually calculate the optimal settings...



  • @twig123 3072 MB (use a power of 2)

    0 0 8192 128 4 (the last number means intensity; don't use 8192 unless it's a separate GPU)

    If that doesn't work, use

    0 0 8192 64 4



  • I understand that a globalWorkSize of 8192 translates to 2GB of GPU RAM (8192 x 256K per nonce = 2,097,152K = 2048MB).

    However, besides just "power of 2" and randomly picking a number... how does the localWorkSize correlate to the actual specs of the GPU, given that 128 and 64 are both larger than the 28 compute units the device actually has? Since localWorkSize is a number of parallel threads, I would think that the maximum would be the number of compute units. Is that wrong?
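
    The way I'm picturing it (and I may well be wrong): if these map to the usual OpenCL work sizes, then globalWorkSize is the total number of work-items and localWorkSize is how many of them are grouped into one work-group, so the GPU would see globalWorkSize / localWorkSize work-groups spread across its compute units, rather than one thread per compute unit. A rough sketch of that mental model:

    @echo off
    REM Mental model only, may be wrong: localWorkSize = work-items per work-group,
    REM and the GPU spreads globalWorkSize/localWorkSize work-groups across its compute units.
    set /a GlobalWorkSize=8192
    set /a LocalWorkSize=128
    set /a ComputeUnits=28
    set /a WorkGroups=GlobalWorkSize/LocalWorkSize
    set /a GroupsPerCU=WorkGroups/ComputeUnits
    echo %WorkGroups% work-groups in flight, roughly %GroupsPerCU% per compute unit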



  • @twig123 Hi, superfriend.
    I'm trying to plot with the GPU, but I can't get a good result.
    I have to plot 10 HDDs of 8 TB each.
    With the CPU it's too slow,
    but with the GPU it's slow too. I have tried a lot of parameters.
    I'm using an XFX RX 480 GPU with 8 GB.
    Can you help me?



  • @twig123 Check the global work size shown when you run [listDevices] and use that number.



  • So, I made a batch file for myself with a for loop to help automate testing "power of 2" values for devices.txt. These are the results from creating a 5GB plot file on a PCIe NVMe drive (to rule out disk contention):

    (The setup for the GPU plotter recommends "0 0 8448 192 8192", which just fails)

    0 0 8192 768 8192
    100% (20480 nonces), 40960.00 nonces/minutes, 30s
    
    0 0 8192 768 4096
    100% (20480 nonces), 40960.00 nonces/minutes, 30s
    
    0 0 8192 768 2048
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 768 1024
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 768 512
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 768 256
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 768 128
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 768 64
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 256 8192
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 256 4096
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 256 2048
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 256 1024
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 256 512
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 256 256
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 256 128
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 256 64
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 128 8192
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 128 4096
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 128 2048
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 128 1024
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 128 512
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 128 256
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 128 128
    100% (20480 nonces), 47261.54 nonces/minutes, 26s
    
    0 0 8192 128 64
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 64 8192
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 64 4096
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 64 2048
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 64 1024
    100% (20480 nonces), 43885.71 nonces/minutes, 28s
    
    0 0 8192 64 512
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 64 256
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 64 128
    100% (20480 nonces), 45511.11 nonces/minutes, 27s
    
    0 0 8192 64 64
    100% (20480 nonces), 42372.41 nonces/minutes, 29s
    
    0 0 8192 32 8192
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 8192 32 4096
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 8192 32 2048
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 8192 32 1024
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 8192 32 512
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 8192 32 256
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 8192 32 128
    100% (20480 nonces), 37236.36 nonces/minutes, 33s
    
    0 0 8192 32 64
    100% (20480 nonces), 37236.36 nonces/minutes, 33s
    
    0 0 8192 28 8192
    100% (20480 nonces), 22755.56 nonces/minutes, 54s
    
    0 0 8192 28 4096
    100% (20480 nonces), 22755.56 nonces/minutes, 54s
    
    0 0 8192 28 2048
    100% (20480 nonces), 22341.82 nonces/minutes, 55s
    
    0 0 8192 28 1024
    100% (20480 nonces), 22755.56 nonces/minutes, 54s
    
    0 0 8192 28 512
    100% (20480 nonces), 22755.56 nonces/minutes, 54s
    
    0 0 8192 28 256
    100% (20480 nonces), 21942.86 nonces/minutes, 56s
    
    0 0 8192 28 128
    100% (20480 nonces), 22755.56 nonces/minutes, 54s
    
    0 0 8192 28 64
    100% (20480 nonces), 22755.56 nonces/minutes, 54s
    
    0 0 8192 16 8192
    100% (20480 nonces), 27927.27 nonces/minutes, 44s
    
    0 0 8192 16 4096
    100% (20480 nonces), 29257.14 nonces/minutes, 42s
    
    0 0 8192 16 2048
    100% (20480 nonces), 28576.74 nonces/minutes, 43s
    
    0 0 8192 16 1024
    100% (20480 nonces), 29257.14 nonces/minutes, 42s
    
    0 0 8192 16 512
    100% (20480 nonces), 27927.27 nonces/minutes, 44s
    
    0 0 8192 16 256
    100% (20480 nonces), 29257.14 nonces/minutes, 42s
    
    0 0 8192 16 128
    100% (20480 nonces), 29257.14 nonces/minutes, 42s
    
    0 0 8192 16 64
    100% (20480 nonces), 27927.27 nonces/minutes, 44s
    
    0 0 4096 256 8192
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 256 4096
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 256 2048
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 256 1024
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 256 512
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 256 256
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 256 128
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 256 64
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 128 8192
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 128 4096
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 128 2048
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 128 1024
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 128 512
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 128 256
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 128 128
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 128 64
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 64 8192
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 64 4096
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 64 2048
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 64 1024
    100% (20480 nonces), 49152.00 nonces/minutes, 25s
    
    0 0 4096 64 512
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 256
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 128
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 64
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 32 8192
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 32 4096
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 32 2048
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 32 1024
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 32 512
    100% (20480 nonces), 34133.33 nonces/minutes, 36s
    
    0 0 4096 32 256
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 32 128
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 32 64
    100% (20480 nonces), 34133.33 nonces/minutes, 36s
    
    0 0 4096 28 8192
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 4096 28 4096
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 4096 28 2048
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 4096 28 1024
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 28 512
    100% (20480 nonces), 36141.18 nonces/minutes, 34s
    
    0 0 4096 28 256
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 28 128
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 28 64
    100% (20480 nonces), 35108.57 nonces/minutes, 35s
    
    0 0 4096 16 8192
    100% (20480 nonces), 26144.68 nonces/minutes, 47s
    
    0 0 4096 16 4096
    100% (20480 nonces), 26144.68 nonces/minutes, 47s
    
    0 0 4096 16 2048
    100% (20480 nonces), 25600.00 nonces/minutes, 48s
    
    0 0 4096 16 1024
    100% (20480 nonces), 26144.68 nonces/minutes, 47s
    
    0 0 4096 16 512
    100% (20480 nonces), 26144.68 nonces/minutes, 47s
    
    0 0 4096 16 256
    100% (20480 nonces), 26144.68 nonces/minutes, 47s
    
    0 0 4096 16 128
    100% (20480 nonces), 25600.00 nonces/minutes, 48s
    
    0 0 4096 16 64
    100% (20480 nonces), 26144.68 nonces/minutes, 47s
    

    ...plot times just get worse from there as the globalWorkSize value goes lower.

    So, according to this data, any one of these settings gives the best rate my card can generate:
    (Which is strange, since my GPU has 3GB of GPU RAM, and the best results come from a globalWorkSize of 4096, which is only 1GB of GPU RAM)

    0 0 4096 256 128
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 256 64
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 128 8192
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 128 4096
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 512
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 256
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 128
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 64
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    

    The file size seems to play a big part in this as well, as a 10GB file has much lower nonces/min compared to the 5GB file in the results above.


    Edit: After posting this, I realized that this may be skewed towards 4096 for the globalWorkSize, due to the generation command I was using in the BAT file:

    gpuPlotGenerator.exe generate direct V:\{MyIDHere}_30131584_20480_4096

    I'll have to test again on a standard drive once my new drives get here on Friday.
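
    For anyone following along: the fields in that file name are accountID_startingNonce_nonceCount_stagger, and the nonce count is what sets the plot size (assuming 256 KiB per nonce, i.e. 4096 nonces per GiB). A quick sketch, using a placeholder account ID and my starting nonce:

    @echo off
    REM Sketch of how the plot file name relates to plot size, assuming 256 KiB per
    REM nonce, i.e. 4096 nonces per GiB. AccountID below is a placeholder.
    set "AccountID=11111111111111111111"
    set "StartingNonce=30131584"
    set /a TargetGiB=5
    set /a Nonces=TargetGiB*4096
    set /a Stagger=4096
    echo %AccountID%_%StartingNonce%_%Nonces%_%Stagger% holds %Nonces% nonces, about %TargetGiB% GiB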



  • Hello @twig123,
    I read your post.
    I have an R9 290 with 3GB of RAM,
    32GB of system RAM,
    and a G3930 processor.
    I tested writing one 6 GB file to my SSD.
    My recommended values are 8448 192 8192, but that fails.
    I tried these:
    1 0 4096 256 128 - 1 min 19 sec
    1 0 4096 256 64 - 1 min 20 sec
    1 0 4096 128 8192 - 1 min 18 sec
    1 0 4096 64 512 - 1 min 23 sec
    1 0 4096 64 256 - 1 min 14 sec
    1 0 4096 64 128 - 1 min 16 sec
    1 0 4096 64 64 - 1 min 15 sec

    So, my superfriends, what is wrong here?
    Why do you get the best result of 23 seconds?


  • admin

    @eugenb77000 Are you sure your platform/device IDs are correct? Run gpuPlotGenerator in setup mode and use option 0 to get your device list.



  • (screenshot: piataforme.PNG)

    Hi admin @haitch. Thank you very much for answering me.

    Yes, my GPU card is 1 0.



  • (screenshot: valuedefault.PNG)

    And my recommended values are 8448 192 8192,

    but it crashed.

    So I tried:
    4096 64 512
    4096 256 128
    4096 256 64
    4096 128 8192
    4096 64 256
    4096 64 128

    The results are 1 min 20 sec for one 6GB file if I write to the SSD,
    and 6 minutes 46 seconds if I write to the HDD.

    That is very slow.

    Please help me.



  • Yeah, this is what I'm saying... it feels like it's a guessing game, but make sure you have v4.1.1 of the GPU plotter if you are running on Windows.

    However, the speeds I was getting were to a Samsung 960 EVO NVMe PCIe drive... which is much, much, MUCH faster than a normal SATA HDD, or even a SATA SSD. So, the speeds that I posted will likely be a good 4x+ faster than your tests to an SSD, since my NVMe drive is capable of around 3500MB/s, while your SATA SSD would be capped at around 600MB/s.

    I was just trying to find the optimal settings for the GPU while ruling out slowness of the HDD, which is why I was testing with the NVMe drive. I have more testing to do, and I'll report my findings as soon as I get some free time.
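
    Just to put rough numbers on the drive side: assuming 256 KiB per nonce (about 4 nonces per MB), the rated sequential write speed gives an upper ceiling on nonces per minute. Real plotting can land well below that ceiling, but it shows the scale of the difference between the drives:

    @echo off
    REM Rough ceiling only: nonces per minute a drive could absorb at its rated
    REM sequential write speed, assuming 256 KiB per nonce, about 4 nonces per MB.
    set /a SataSsdMBps=600
    set /a NvmeMBps=3500
    set /a SataCeiling=SataSsdMBps*4*60
    set /a NvmeCeiling=NvmeMBps*4*60
    echo SATA SSD at ~600 MB/s:  about %SataCeiling% nonces/minute, at most
    echo NVMe at ~3500 MB/s:     about %NvmeCeiling% nonces/minute, at most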



  • More tests with a 10GB file, and 8192 for the stagger instead of 4096:
    gpuPlotGenerator.exe generate direct V:\{MyIDHere}_30131584_40960_8192

    0 0 8192 256 8192
    100% (40960 nonces), 46369.81 nonces/minutes, 53s
    
    0 0 8192 256 4096
    100% (40960 nonces), 45511.11 nonces/minutes, 54s
    
    0 0 8192 256 512
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 8192 256 256
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 8192 256 128
    100% (40960 nonces), 49152.00 nonces/minutes, 50s
    
    0 0 8192 256 64
    100% (40960 nonces), 45511.11 nonces/minutes, 54s
    
    0 0 8192 128 8192
    100% (40960 nonces), 48188.24 nonces/minutes, 51s
    
    0 0 8192 128 4096
    100% (40960 nonces), 50155.10 nonces/minutes, 49s
    
    0 0 8192 128 512
    100% (40960 nonces), 46369.81 nonces/minutes, 53s
    
    0 0 8192 128 256
    100% (40960 nonces), 49152.00 nonces/minutes, 50s
    
    0 0 8192 128 128
    100% (40960 nonces), 46369.81 nonces/minutes, 53s
    
    0 0 8192 128 64
    100% (40960 nonces), 46369.81 nonces/minutes, 53s
    
    0 0 8192 64 8192
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 8192 64 4096
    100% (40960 nonces), 48188.24 nonces/minutes, 51s
    
    0 0 8192 64 512
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 8192 64 256
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 8192 64 128
    100% (40960 nonces), 44683.64 nonces/minutes, 55s
    
    0 0 8192 64 64
    100% (40960 nonces), 44683.64 nonces/minutes, 55s
    
    0 0 4096 256 8192
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 4096 256 4096
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 4096 256 512
    100% (40960 nonces), 44683.64 nonces/minutes, 55s
    
    0 0 4096 256 256
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 4096 256 128
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 4096 256 64
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 4096 128 8192
    100% (40960 nonces), 48188.24 nonces/minutes, 51s
    
    0 0 4096 128 4096
    100% (40960 nonces), 46369.81 nonces/minutes, 53s
    
    0 0 4096 128 512
    100% (40960 nonces), 44683.64 nonces/minutes, 55s
    
    0 0 4096 128 256
    100% (40960 nonces), 48188.24 nonces/minutes, 51s
    
    0 0 4096 128 128
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 4096 128 64
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 4096 64 8192
    100% (40960 nonces), 48188.24 nonces/minutes, 51s
    
    0 0 4096 64 4096
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    
    0 0 4096 64 512
    100% (40960 nonces), 48188.24 nonces/minutes, 51s
    
    0 0 4096 64 256
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 4096 64 128
    100% (40960 nonces), 47261.54 nonces/minutes, 52s
    
    0 0 4096 64 64
    100% (40960 nonces), 43885.71 nonces/minutes, 56s
    

    Seems this setting is the most efficient with the 10GB file:

    0 0 8192 128 4096
    100% (40960 nonces), 50155.10 nonces/minutes, 49s
    

    Again, these are all speeds on an NVMe drive...
    I'll test on a SATA HDD and compare those as well.



  • Very good, @twig123.
    I'll wait for your test on the SATA HDD.



  • 5GB file, 4096 stagger, to my internal SATA (PMR) HDD:
    gpuPlotGenerator.exe generate direct C:\{MyIDHere}_30131584_20480_4096

    The top configs:

    0 0 4096 256 512
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 256 256
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 512
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    
    0 0 4096 64 64
    100% (20480 nonces), 51200.00 nonces/minutes, 24s
    

    10GB file, 8192 stagger, to my internal SATA (PMR) HDD:
    gpuPlotGenerator.exe generate direct C:\{MyIDHere}_30131584_40960_8192

    The top configs:

    0 0 8192 128 8192
    100% (40960 nonces), 42372.41 nonces/minutes, 58s
    
    0 0 8192 128 4096
    100% (40960 nonces), 42372.41 nonces/minutes, 58s
    
    0 0 8192 128 512
    100% (40960 nonces), 41654.24 nonces/minutes, 59s
    
    0 0 8192 64 128
    100% (40960 nonces), 41654.24 nonces/minutes, 59s
    
    0 0 4096 256 4096
    100% (40960 nonces), 41654.24 nonces/minutes, 59s
    

    I'm going to do another test with a 10GB file, using a 4096 stagger... and see if any of the speeds change.



  • 10GB file, 4096 stagger, to my internal SATA (PMR) HDD:
    gpuPlotGenerator.exe generate direct C:\{MyIDHere}_30131584_40960_4096

    The top configs:

    0 0 4096 64 8192
    100% (40960 nonces), 52289.36 nonces/minutes, 47s
    
    0 0 4096 64 4096
    100% (40960 nonces), 52289.36 nonces/minutes, 47s
    
    0 0 4096 64 256
    100% (40960 nonces), 52289.36 nonces/minutes, 47s
    

    So, out of all my testing so far, it seems like a stagger of 4096 in the plotter, plus one of these lines in devices.txt, is the most efficient:

    0 0 8192 128 4096
    0 0 4096 256 512
    0 0 4096 64 512
    0 0 4096 64 128
    0 0 4096 64 64

    Larger plots than what I'm testing with will likely be drastically slower, due to higher I/O load on the drive.

    I have some additional 8TB USB 3.0 drives arriving on Friday, so I'll test plotting to those when they arrive.



  • @twig123 said in Help understanding optimal GPU Plotting settings:

    5GB file, 4096 stagger, to my internal SATA (PMR) HDD:
    gpuPlotGenerator.exe generate direct C:\{MyIDHere}_30131584_20480_4096

    ...

    I'm going to do another test with a 10GB file, using a 4096 stagger... and see if any of the speeds change.

    It really does feel like an absolute guessing game, and it's why so many people are getting so frustrated: things just don't seem to make sense to us, and we're left trying everything, as you are, just to figure it out. Even some folks copying the exact settings of others, which should yield similar performance when the hardware is nearly identical, seem to get different results. It's a hair puller!

    With a 1050 Ti and an IronWolf HDD (PMR) I get the following... although it seems to go against what the program recommends when I list the device capabilities, because I only have 6 compute units and 768 shaders... yet I exceed the recommended settings and get almost a twofold performance increase with this:

    0 0 1024 16 8192

    (screenshot: 1000GB at 28019 nonces.png)



  • If you are interested, here is the Windows batch script that I whipped up for testing.
    There is no error handling, so use at your own risk.

    Save the below into a new file called PlotBenchmark.bat, update the variables to point at your v4.1.1 plot generator, and review the other variables as well.

    PlotterPath is the folder path that contains v4.1.1 of the gpu plotter (no trailing slash at the end of the path)

    GlobalSizes, LocalSizes, and HashSizes are space-separated lists of the values that you want the script to test for globalWorkSize, localWorkSize, and hashesNumber.

    PlotDriveLetter is ONLY the letter of the drive to plot to. This will attempt to make the plot file on the root of the drive.

    @echo off
    setlocal enabledelayedexpansion
    
    REM ===========Update these variables===========
    set "PlotterPath=C:\Users\Owner\Downloads\gpuPlotGenerator-bin-win-x64-4.1.1"
    set "GlobalSizes=8192 4096"
    set "LocalSizes=256 128 64"
    set "HashSizes=8192 4096 512 256 128 64"
    set "AccountID=11111111111111111111"
    set "PlotDriveLetter=C"
    set "StartingNonce=30131584"
    set "NoncesToCreate=40960"
    set "Stagger=4096"
    REM ===========Update these variables===========
    
    set GlobalWorkSize=
    set LocalWorkSize=
    for %%G in (!GlobalSizes!) do (
        set GlobalWorkSize=%%G
        for %%l in (!LocalSizes!) do (
            set LocalWorkSize=%%l
            REM Skip combinations where localWorkSize would exceed globalWorkSize
            if !LocalWorkSize! leq !GlobalWorkSize! (
                for %%h in (!HashSizes!) do (
                    set HashesNumber=%%h
                    REM Delete the previous run's plot file so every test starts clean
                    del /q !PlotDriveLetter!:\!AccountID!_!StartingNonce!_!NoncesToCreate!_!Stagger! >NUL 2>&1
                    REM Show the setting being tested and write it as the only line of devices.txt
                    echo 0 0 !GlobalWorkSize! !LocalWorkSize! !HashesNumber!
                    echo 0 0 !GlobalWorkSize! !LocalWorkSize! !HashesNumber!>!PlotterPath!\devices.txt
                    REM Run the plotter and keep only the final 100 percent summary line
                    !PlotterPath!\gpuPlotGenerator.exe generate direct !PlotDriveLetter!:\!AccountID!_!StartingNonce!_!NoncesToCreate!_!Stagger! | findstr "100%%" 2> NUL
                    echo.
                    REM Ping localhost as a crude pause of a few seconds between runs
                    ping 127.0.0.1 >NUL
                )
            )
        )
    )
    pause
    

    Note: you will likely get Could Not Find {PlotFileHere} and/or FINDSTR: Line 26 is too long. messages... they are "normal", just ignore them.

    Also, the plot file is deleted before each iteration of testing.

    Edit (7/6): Slight update to null the "Could Not Find" and "FINDSTR" messages. Should just output the current devices.txt setting and then the info when the plot gets to 100% complete.


  • admin

    One thing I'd suggest is making your plots bigger. Currently the 5-10GB plots are presumably fitting in RAM and not reflecting what happens when you actually need to write them to disk.

    Instead of 5-10, use 50-100GB and get the disk writing process reflected in the times.



  • @haitch Thanks for the reply.
    Yeah, I planned on doing more testing with larger plots when I get the 8TB drives on Friday.... and indeed these plot files would fit in system RAM, as this machine is running with 48GB RAM.

    When I plotted my other USB3 drives, I think I was seeing ~20,000 nonces/min. I can't recall what my devices.txt was, but I'm hoping that doing this testing will result in a more optimized devices.txt file for when I plot the large files.

    I'll follow up once the drives arrive and I'm able to do further plot testing. So, it'll likely be a couple of days (unless the 50,000 nonces/min keeps up with the large files, but that's doubtful).



  • @twig123 said in Help understanding optimal GPU Plotting settings:

    ...

    I'll follow up once the drives arrive and I'm able to do further plot testing. So, it'll likely be a couple of days (unless the 50,000 nonces/min keeps up with the large files, but that's doubtful).

    In another thread, someone was discussing the settings they had helped someone else with, and proposed that the localWorkSize may be based on the ROPs x2. When I pointed out that his card actually has 56 ROPs but 104 TMUs, it suddenly clicked in my head: what if his 104 number came from the 104 TMUs that particular card has, when it performed "optimally" for his friend? When I feel like deleting an entire drive just to do some more trial and error, I'll give it a try. I'm already exceeding expectations on my 1050 Ti at 28000 nonces, so I can't do much better myself... but when I heard him mention that specific 104 number, looked up the GPU, and saw it had 104 TMUs, it made me wonder: is that what we're looking for, not compute units?

    https://www.techpowerup.com/gpudb/2620/geforce-gtx-970

    that was the 970 he was referring to when he mentioned the 104 number... then if we compare that to my card the 1050 TI and my settings of 0 0 1024 16 8192 we see that my localsize is 16 a fraction of the 48 TMUS which might explain why I can still comfortably use my card even while plotting without much lag or crashes... I dunno, I'm throwing noodles at the wall, some days they stick, some days they don't... your batch file is a great idea for folks, just need to scale up the plot size...