I've just pushed native POC2 support to the GPU plot generator. I'm looking for beta testers to validate this release.
Repository: https://github.com/bhamon/gpuPlotGenerator
@Weasel To solve the lag issue, as explained before, the last parameter (hashesNumber) is in charge of cutting the most intensive job into pieces.
If your graphic card is tied to your display (i.e. it's the one your system uses to render your desktop and active windows), a value of 4096 won't leave your OS any free time to refresh the desktop, so the system will simply kill your process (every modern OS has a watchdog for that).
So, in brief, you don't have to lower the other parameters, just the last one, to a very low value (like 4). More commands will be sent from the plotter to the GPU, but it shouldn't affect the overall throughput.
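To picture what hashesNumber does, here is a minimal C++ sketch of the chunking idea (enqueueKernelChunk is a hypothetical stand-in for the plotter's real GPU dispatch, not its actual code):

#include <algorithm>
#include <cstddef>
#include <iostream>

// Hypothetical stand-in for the plotter's real GPU kernel dispatch.
static void enqueueKernelChunk(std::size_t firstHash, std::size_t hashCount) {
    std::cout << "kernel launch: hashes " << firstHash
              << " to " << (firstHash + hashCount - 1) << "\n";
}

// Cutting one big job into hashesNumber-sized pieces: each launch returns
// quickly, leaving the card free to service display refreshes in between.
static void runMostIntensiveStep(std::size_t totalHashes, std::size_t hashesNumber) {
    for (std::size_t offset = 0; offset < totalHashes; offset += hashesNumber) {
        enqueueKernelChunk(offset, std::min(hashesNumber, totalHashes - offset));
    }
}

int main() {
    runMostIntensiveStep(16, 4); // hashesNumber = 4: more launches, same total work
    return 0;
}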
The best thing to do, of course, is to dedicate the graphic card to the plotter. Use your CPU's integrated graphics (Intel graphics most likely) to render your desktop.
Once you don't experience lags anymore, you can begin to tweak the other parameters to improve the overall nonces/min. That part is entirely GPU dependent. From my tests, the globalWorkSize should be near the maxAllocationSize. For the localWorkSize, just try powers of 2 until you find the best value.
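For reference, those limits are standard OpenCL device properties. Here is a minimal query sketch (not the plotter's code; error checking omitted, link against libOpenCL):

#include <CL/cl.h>
#include <iostream>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_ulong globalMem = 0, maxAlloc = 0;
    size_t maxWorkGroup = 0;
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(globalMem), &globalMem, nullptr);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(maxAlloc), &maxAlloc, nullptr);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(maxWorkGroup), &maxWorkGroup, nullptr);
    std::cout << "globalMemorySize:        " << globalMem << " bytes\n";
    std::cout << "maxMemoryAllocationSize: " << maxAlloc << " bytes\n";
    std::cout << "maxWorkGroupSize:        " << maxWorkGroup << "\n";
    return 0;
}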
@luxe Thank you for your feedback. Yes, I totally agree with you: the more RAM you have, the faster it'll be. I reorder the generated plots in the CPU RAM buffer before writing them out in 4096 passes (one per scoop) to the final file, no matter what the PLOTS_NB is.
Another tweak is to ensure the CPU RAM buffer size is evenly divisible by the GPU RAM buffer size. That way the graphic card won't be put on hold while the write occurs.
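For example (a sketch with made-up sizes), you can round the CPU buffer down to a whole number of GPU batches:

#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t gpuBufferBytes = 2ULL << 30; // hypothetical 2 GiB GPU staging buffer
    const std::uint64_t freeHostBytes = 7ULL << 30;  // hypothetical ~7 GiB of spare host RAM
    // Round down to a multiple of the GPU buffer size so every GPU batch
    // fits the host buffer exactly and the card never waits on a partial
    // transfer while the file write occurs.
    const std::uint64_t cpuBufferBytes = (freeHostBytes / gpuBufferBytes) * gpuBufferBytes;
    std::cout << "CPU buffer: " << cpuBufferBytes << " bytes = "
              << (cpuBufferBytes / gpuBufferBytes) << " GPU batches\n";
    return 0;
}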
Direct mode improvements are in the 4.1+ versions. I updated the plotter to v4.1.3 to support CentOS (6 & 7) builds (there are even CentOS binaries compiled against the CUDA 8.0 SDK on my repository now).
About your mixed AMD/NVidia environment, there is no runtime inter-compatibility between AMD and NVidia cards for now.
Nevertheless, you can build two plotters: one against the libOpenCL.so from the CUDA SDK, and one against the AMD SDK. You won't be able to plot to one single file with your two cards together, but that's not the best choice anyway.
Maybe hashesNumber should be renamed to intensity. Performances are almost the same between a high value like 8192 and a low one like 4; it's just that the global work will be divided into more steps to allow the graphic card to answer standard display rendering calls, or else a watchdog kills the plotter to prevent the display from hanging.
In your devices.txt file, you can try:
2 0 8192 1024 8192
8192 = 2GB GPU RAM (globalWorkSize)
1024 = 1024 CUDA cores (localWorkSize)
8192 = hashesNumber (use a low value like 4 instead if your card is tied to your display)
About the devices list, I totally agree. That was the idea behind an issue I opened a while ago; at that time I didn't find many volunteers to share and collect those parameters.
I can gladly add a file to the official repository listing all the devices reported by the community, along with working parameters and average nonces/minute.
@sevencardz Sounds like it's not only affecting the last scoop. I'll run some tests on my side to try to reproduce this.
@BeholdMiNuggets The error CL_OUT_OF_RESOURCES is not as self-explanatory as it sounds. In brief: your card can't process the second step of the GPU kernel with your current parameters (localWorkSize). This step is the most intensive one, as it fills the GPU buffer with scoops by performing a lot of Shabal hashes.
To fix that, lower your globalWorkSize and/or localWorkSize values, making sure the localWorkSize can evenly divide the globalWorkSize. If you run the listDevices command on your GPU platform, it'll output some hints from the card (like the maxWorkItemSizes).
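A sketch of the divisibility part of the fix (the values are made up):

#include <cstddef>
#include <iostream>

int main() {
    std::size_t localWorkSize = 128;   // e.g. a power of 2 below maxWorkGroupSize
    std::size_t globalWorkSize = 8000; // whatever your memory budget allows
    // OpenCL expects the global size to be a multiple of the local size,
    // so round it down before enqueueing the kernel.
    globalWorkSize -= globalWorkSize % localWorkSize;
    std::cout << "adjusted globalWorkSize: " << globalWorkSize << "\n"; // prints 7936
    return 0;
}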
You may ask "why can't the plotter automatically determine those tricky parameters?". The simple answer is: because the returned hint values don't guarantee success. In fact, most of the time, what graphic cards claim to support doesn't match reality.
For the setup command, my current strategy (sketched in code below) is:
- For the globalWorkSize, take the minimum value between globalMemorySize / PLOT_SIZE and maxMemoryAllocationSize / PLOT_SIZE.
- For the localWorkSize, take the maximum value between 1 and (maxWorkItemSizes / 4) * 3. This formula sucks, but it gives the best results for now.
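In code form, that strategy looks roughly like this (a sketch; the device hint values are made up, and PLOT_SIZE is the 256 KiB footprint of one nonce, i.e. 4096 scoops of 64 bytes):

#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t PLOT_SIZE = 4096ULL * 64;               // 256 KiB per nonce
    const std::uint64_t globalMemorySize = 2ULL << 30;          // hypothetical 2 GiB card
    const std::uint64_t maxMemoryAllocationSize = 512ULL << 20; // hypothetical 512 MiB cap
    const std::uint64_t maxWorkItemSizes = 1024;                // hypothetical hint
    // globalWorkSize: as many nonces as fit in one allocation (and in the card).
    const std::uint64_t globalWorkSize =
        std::min(globalMemorySize / PLOT_SIZE, maxMemoryAllocationSize / PLOT_SIZE);
    // localWorkSize: three quarters of the reported work item limit, at least 1.
    const std::uint64_t localWorkSize =
        std::max<std::uint64_t>(1, (maxWorkItemSizes / 4) * 3);
    std::cout << "globalWorkSize: " << globalWorkSize << "\n"; // 2048
    std::cout << "localWorkSize:  " << localWorkSize << "\n";  // 768
    return 0;
}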
@sevencardz Can you test around the nonce reported in the miner output (320348789 from your example output)?

./gpuPlotGenerator generate buffer C:/13668371040637458609_320348689_200_200
./gpuPlotGenerator verify A:/13668371040637458609_318883840_1525760_1525760 C:/13668371040637458609_320348689_200_200