GPU plotting slow.

  • Hi, I have an Nvidia Tesla M2050, 16GB of RAM, and an Intel Core i5. The fastest I can get the GPU to plot is 9,300 nonces/minute. I'm plotting to 2 SATA III HDDs. Is this the fastest an Nvidia Tesla M2050 can plot? I thought with 1,030 GFLOPS it would plot faster than that.

    GPU specs:

    devices.txt 0 0 4096 256 8192
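Each devices.txt line is five whitespace-separated numbers. The column names used below (platformId, deviceId, globalWorkSize, localWorkSize, hashesNumber) are assumed from the gpuPlotGenerator README; check the documentation for the version you run. This minimal sketch parses a line into named fields and flags two things that commonly get settings rejected: a local size that does not evenly divide the global size (an OpenCL 1.x rule) and a hashes value outside 1..8192 (the range the README is assumed to give).

```python
from dataclasses import dataclass


@dataclass
class DeviceConfig:
    # Column names assumed from the gpuPlotGenerator README:
    # platformId deviceId globalWorkSize localWorkSize hashesNumber
    platform_id: int
    device_id: int
    global_work_size: int
    local_work_size: int
    hashes_number: int


def parse_device_line(line: str) -> DeviceConfig:
    """Parse one whitespace-separated devices.txt line into named fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return DeviceConfig(*(int(f) for f in fields))


def warnings_for(cfg: DeviceConfig) -> list:
    """Flag values likely to be rejected (rules assumed, not read from the plotter's source)."""
    problems = []
    # OpenCL 1.x requires the global size to be a multiple of the local size.
    if cfg.local_work_size <= 0 or cfg.global_work_size % cfg.local_work_size != 0:
        problems.append("global work size is not a multiple of local work size")
    # Assumed valid range for hashesNumber.
    if not 1 <= cfg.hashes_number <= 8192:
        problems.append("hashes number outside 1..8192")
    return problems


cfg = parse_device_line("0 0 4096 256 8192")
print(cfg.global_work_size, cfg.local_work_size, warnings_for(cfg))
```

The first config in this thread, `0 0 4096 256 8192`, passes both checks.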

  • Is this the amount of RAM you're giving it? 4096

  • yes
    devices.txt 0 0 4096 256 8192

  • @Tate-A Have you tried increasing it?

  • @AnotherDeadBody said in GPU plotting slow.:

    Have you tried increasing it?

    Yes. It did not make it faster.

  • Can you link me the GitHub page, please? I need to remember the syntax again.

  • For NVidia owners: the OpenCL implementation is inefficient on some NVidia cards. Try tweaking your device parameters. If there is no significant improvement, you'll need a CUDA version of this program.

  • @AnotherDeadBody said in GPU plotting slow.:

    For NVidia owners: the OpenCL implementation is inefficient on some NVidia cards. Try tweaking your device parameters. If there is no significant improvement, you'll need a CUDA version of this program.

    I have CUDA.

  • @Tate-A

    CUDA 8.0

  • Ok Cool cool.

  • But is there a CUDA version? IF THERE ISN'T, WE NEED ONE, METHINKS 🙂

  • CUDA 8 won't really make a big difference. You need to lower the RAM setting and use a smaller global work size.

    Memory specifications
    Memory size: 3072 MB

    Cores / Texture
    CUDA compute capability: 2.0
    CUDA cores: 448
    ROPs: 48
    Texture units: 56

    Take the ROPs and use multiples of 2x for the global work size; or it may be the texture units, so try both. I had this worked out, but then realized my CPU was fast enough. When plotting with a GPU, the only way to get really fast work done is to plot to multiple drives. Most drives will bottleneck at around 30K nonces/min, and if you are plotting in direct mode it will be slow and steady at first, but when it writes the nonces it will ramp up from very, very slow to max nonces/min.
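Each Burst nonce is 256 KiB on disk (4096 scoops of 64 bytes), so a nonces-per-minute figure converts directly into the sequential write speed the drives have to sustain. A minimal sketch of that arithmetic, assuming writes are split evenly across the drives:

```python
NONCE_BYTES = 4096 * 64  # one nonce: 4096 scoops x 64 bytes = 262,144 bytes (256 KiB)


def write_rate_mb_s(nonces_per_min: float, drives: int = 1) -> float:
    """Sequential write rate (MB/s, 10**6 bytes) each drive must sustain
    for a given plotting speed, assuming writes are split evenly across drives."""
    return nonces_per_min * NONCE_BYTES / 60 / 1e6 / drives


for speed in (9_300, 30_000, 60_000):
    print(speed, "n/m ->", round(write_rate_mb_s(speed), 1), "MB/s on one drive")
```

By this arithmetic, 30K nonces/min is about 131 MB/s, which is in the range a single spinning hard drive writes sequentially, so spreading a plot across drives divides the per-drive load.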

    So if I am right, your settings should actually use less memory, since the card itself has far less than the 16GB of RAM in your system. Try this config...

    0 0 96 448 3072

    If you get a CL error, just change the settings to...

    0 0 192 224 3072

    I would have to consult my chat logs to see how I made this work initially, but I took an Nvidia card that was only putting out 8K nonces/min and got it to 22K by reworking the multiples. The global work size may just be memory, like 256 or 512, and it needs to be smaller on Nvidia cards because they run in parallel.

    If it is slow at first, bump the numbers up, and if you consistently get CL errors, drop the 3072 memory value down to half that amount. It is really tricky to tweak NVidia cards.
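The limit that matters here is the card's own memory (3072 MB on the M2050), not system RAM. If the gpuPlotGenerator README's reading of the columns is right (an assumption; check the docs for your version), the GPU buffer grows with the global work size, at roughly one 256 KiB nonce per work item. A rough sketch for checking whether a global work size fits; the 10% headroom is a hypothetical safety margin, not a documented value:

```python
NONCE_BYTES = 4096 * 64  # 256 KiB per nonce


def gpu_buffer_mib(global_work_size: int) -> float:
    """Approximate GPU buffer in MiB, assuming one full nonce is held per work item."""
    return global_work_size * NONCE_BYTES / 2**20


def fits_on_card(global_work_size: int, card_mib: int, headroom: float = 0.9) -> bool:
    """True if the estimated buffer fits in the card's memory (in MiB), keeping a
    hypothetical 10% margin free for the driver and other allocations."""
    return gpu_buffer_mib(global_work_size) <= card_mib * headroom


# The M2050's 3072 MB is treated as 3072 MiB here.
for gws in (1024, 4096, 4480, 16384):
    print(gws, gpu_buffer_mib(gws), "MiB, fits:", fits_on_card(gws, 3072))
```

Under these assumptions, 4096 needs about 1 GiB and fits comfortably, while 16384 would need about 4 GiB and would not.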

  • @CryptoNick Hi, thanks for the reply. When I tried that config, the GPU plotted at 400 to 500 nonces/min. Right now I'm using 0 0 4480 448 8192, and it's plotting at about 10,500 nonces/min. That's a little better, by about 1,500 nonces/min. I usually plot to 2 or 3 HDDs. The SATA ports are 6Gb/s and the HDDs are 3Gb/s to 6Gb/s. What do you think the max could be for this GPU? I was hoping for about 40,000 to 60,000 nonces/min. Maybe I'm crazy. 🙂

  • @Tate-A I am also running an Nvidia Tesla M2050. The best I can get is 9,500 nonces/min with 0 0 2048 32 8192; other parameters give either the same result or slower, but I have only been testing on a very small plot.

    I am confused as to whether I am running OpenCL or CUDA, as I just downloaded the Win 7 64-bit M2050 driver from NVIDIA, and when I run ListPlatforms it reports Version: OpenCL 1.2 CUDA 8.0.0


  • @RichBC NVIDIA's OpenCL support ships as part of the CUDA driver, so you are good to go.
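The string ListPlatforms printed follows the OpenCL platform version format defined by the spec: "OpenCL", the major.minor API version, then platform-specific text. Here "CUDA 8.0.0" is just NVIDIA's driver identifying itself; the plotter still runs as OpenCL 1.2 on NVIDIA's runtime. A small sketch that splits the two parts:

```python
import re


def parse_platform_version(version: str):
    """Split a CL_PLATFORM_VERSION string, which the OpenCL spec formats as
    'OpenCL<space><major.minor><space><platform-specific information>'."""
    m = re.match(r"OpenCL (\d+)\.(\d+) ?(.*)$", version.strip())
    if m is None:
        raise ValueError(f"not an OpenCL platform version string: {version!r}")
    return (int(m.group(1)), int(m.group(2))), m.group(3)


api, vendor_info = parse_platform_version("OpenCL 1.2 CUDA 8.0.0")
print(api, vendor_info)  # (1, 2) CUDA 8.0.0
```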

  • @Tate-A Yeah, you should be doing better, I would think. Checking my chat logs...

    With this setting we got 40K nonces/min on a GTX 970: 0 0 1024 104 1792. It was erratic, though. You can see we doubled the ROPs: the card's specs stated there were 64, but it really only had 52, a memory problem with that card's build, and a class action lawsuit ensued. Our final stable numbers were 0 0 1024 104 3584. You can see that the memory is double the 1792 that gave 40K; that setting was just too erratic for some reason. This card has 1664 CUDA cores. There are 4 GB of RAM on this card, but I think we used 3584 since there are only 52 ROPs instead of 64, which affects memory, etc., and is why there was a lawsuit.

    OK, so the second number is the work group size, and it works with your ROPs, of which you have 48. There are only so many ROPs, so this number is critical. You should double the number of ROPs, giving you 96.

    0 0 1024 96 16384

    Or change the memory to half, but always keep it a multiple of 1024, which you already have in your config as 8192. You can lower and raise this, and also change the 1024 setting, which may be different for your card; that setting should deal with the chunks of memory vs. the global work size, so the parallel processing happens in the proper chunks. Since this first setting is memory, you can try 64, 128, 256, 512, 768, or 1024. Or possibly try increments of the ROPs: 48, 96, 144, 192, 240, 288, 336, 384, 432, etc. I never tried that, since I dealt with the memory as it pertained to the actual total memory and just used a fraction of the memory available.
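The sweep suggested above can be scripted. This sketch pairs the suggested first-column values with multiples of the M2050's 48 ROPs and drops pairs that OpenCL 1.x would reject outright: `clEnqueueNDRangeKernel` returns CL_INVALID_WORK_GROUP_SIZE when the global size is not evenly divisible by the local size. Whether gpuPlotGenerator passes these columns to the kernel unchanged is an assumption here, and 1024 is the per-block thread limit CUDA documents for compute capability 2.0; query CL_DEVICE_MAX_WORK_GROUP_SIZE for your card to be sure.

```python
def candidate_lines(global_sizes, local_sizes, hashes_number=8192,
                    max_work_group=1024, platform=0, device=0):
    """Build devices.txt lines for a tuning sweep, keeping only combinations
    where the local size divides the global size (OpenCL 1.x rule) and does
    not exceed the device's maximum work-group size."""
    lines = []
    for gws in global_sizes:
        for lws in local_sizes:
            if lws <= max_work_group and gws % lws == 0:
                lines.append(f"{platform} {device} {gws} {lws} {hashes_number}")
    return lines


firsts = [64, 128, 256, 512, 768, 1024]
rop_multiples = range(48, 433, 48)  # 48, 96, 144, ..., 432
for line in candidate_lines(firsts, rop_multiples):
    print(line)
```

With these inputs only 768 survives (paired with 48, 96, 192, and 384), because 48 = 16 × 3 and the other first-column values are powers of two. Timing each surviving line on a short test plot would complete the sweep.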

  • @CryptoNick I've been lost in trial-and-error experiments to find optimal, or at least good, values for the GPU, and it's not clear how to adjust those parameters.
    My latest results, which I'm sure are not the best ones:
    On an RX480 (8GB), I used 8192 64 8192 and got an average speed of 10k
    On a GTX970 (4GB), I used 512 64 8192 and am getting an average speed of 6k
    (both plotted in direct mode)
    I got frustrated with testing, because when I test a plot of 100GB, benchmark the speed, and like the result, I apply the same parameters to plot an 8TB drive, and for some reason it gives an error at 98% or so. Reducing the speed avoided those errors, but those settings are slow.
    What would you suggest for stable parameters?
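When a plot sized for a whole 8TB drive fails near the end, one thing worth ruling out is whether the requested nonce count actually fits: drives are sold in decimal terabytes, and the filesystem takes its own share. This is only the sizing arithmetic, a minimal sketch; it does not establish what caused the error at 98%. Rounding down to a multiple of the stagger is an assumption, not a documented rule.

```python
NONCE_BYTES = 4096 * 64  # 256 KiB per nonce


def max_nonces(free_bytes: int, stagger: int = 1) -> int:
    """Largest nonce count that fits in free_bytes, rounded down to a whole
    multiple of stagger (the rounding is an assumed convention)."""
    n = free_bytes // NONCE_BYTES
    return n - n % stagger


drive = 8 * 10**12               # an "8TB" drive as marketed, before filesystem overhead
print(max_nonces(drive))         # nonces if every byte were usable
print(max_nonces(drive, 8192))   # same, rounded to a multiple of 8192
print(8 * 2**40 // NONCE_BYTES)  # what "8 TiB" would ask for: more than the drive holds
```

In practice, pass the real free space, for example `max_nonces(shutil.disk_usage("E:\\").free, 8192)` on Windows.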

  • I make a series of 500GB plots rather than making one plot for an entire drive. This makes it easier to move or optimize the plots later if need be. Also keep in mind that the TYPE of drive you are writing to can have a significant impact on performance. For example, cheaper 'archive' drives can have low write speeds that will cap your n/m (nonces per minute) no matter what you do.

    Windows 10 PRO 64-bit
    FX-8350, 16GB RAM, GTX 1070
    7200 RPM WD Black 5TB (x 2)
    7200 RPM Seagate NAS 8TB (x 2)

    I write all plots in direct mode, which will slow things down significantly, but it also means the plots are optimized upon completion. I tend to write to two identical drives if I can since it's faster overall. In buffered mode, I've gotten up to 32 - 34K n/m. However, that drops to 18K - 20K n/m in direct mode:

    0 0 4096 512 128
    gpuPlotGenerator generate direct E://Burst/plots/X_0_1904640_10240 F://Burst/plots/X_19046400_1904640_10240

    A single drive is not ideal, but since the system has more memory available, I bump up the memory to speed things up a bit. I get about 13K - 15K n/m with these values:

    0 0 8192 512 128
    gpuPlotGenerator generate direct G://Burst/plots/X_99041280_1904640_20480
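The file names above encode `<account>_<startNonce>_<nonces>_<stagger>`, and 1,904,640 nonces × 256 KiB is exactly 465 GiB (about 499 GB), which is the "500GB" plot size. A small sketch for laying out a series of such plots; "X" stays a placeholder for the account ID as in the commands above, and rounding the nonce count down to a multiple of the stagger is an assumption that happens to match those names.

```python
NONCE_BYTES = 4096 * 64  # 256 KiB per nonce


def plot_series(account: str, start_nonce: int, plot_bytes: int,
                count: int, stagger: int) -> list:
    """Return file names for `count` equal plots with contiguous, non-overlapping
    nonce ranges, using the <account>_<startNonce>_<nonces>_<stagger> pattern."""
    nonces = plot_bytes // NONCE_BYTES
    nonces -= nonces % stagger  # whole number of stagger groups (assumed convention)
    return [f"{account}_{start_nonce + i * nonces}_{nonces}_{stagger}"
            for i in range(count)]


for name in plot_series("X", 0, 465 * 2**30, 3, 10240):
    print(name)
```

Contiguous ranges keep nonces from overlapping between files; the post's own second file starts further along, at 19046400, which also avoids overlap.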

  • @rnahlawi The GTX970 only has 4GB and only has 52 ROPs instead of the listed 64.

    I think it goes like this
    Devices=0 0 Chunks=X ROPS=X Memory=X

    So you should be able to set the GTX 970 to what I listed: 0 0 1024 104 3584

    But this is by no means an exact statement of what is right or wrong. It was just my experiments: I tried tons of combinations on another person's card across the internet over chat, so I wasn't watching how it behaved but relied on their feedback. They were very happy with the results.

    With the RX480, you would use different parameters. This card is AMD and uses serial processing. NVidia cards use parallel processing and use OpenCL differently.