Joining plot files?



  • Hi all,
    So I have a 21TB array that I have been filling with 100GB sized plot files.
    The reason for this is that it was easier to create the files locally and then copy the files over the network to the array.
    So I have a few questions

    1. Is it more efficient/effective to have fewer, larger plots?
    2. If the first answer is yes, is there a way of joining the smaller plots together?
    3. When the Blago miner reads, is it multithreaded?
    4. If the miner is multithreaded, does each thread work on a separate drive or separate file?
    5. Would I be better off breaking the array down into smaller distinct drives to improve the speed of the reads?
      I have tried to search for these answers but none of my searches have explicitly answered these questions exactly (or I am just rubbish at searching!).
      Thanks.

  • admin

    @machasm

    1. Yes.
    2. If they are sequential and there is no gap between them, and have the same stagger, then yes, they can be combined.
    3. Yes - thread per drive/directory.
    4. See 3.
    5. Optimal config is having as many drives/directories as your CPU has threads.


  • @haitch said in Joining plot files?:

    @machasm

    1. Yes.
    2. If they are sequential and there is no gap between them, and have the same stagger, then yes, they can be combined.
    3. Yes - thread per drive/directory.
    4. See 3.
    5. Optimal config is having as many drives/directories as your CPU has threads.

    Thank you Haitch.
    How do you join the plots together? Is there a tool?
    So on a machine with only 4 cores is it worth having more than 4 drives/directories?
    What happens if one directory contains almost all the files while the others have far fewer files? Do the cpu cores go idle on the directories/drives that have finished reading? So, in other words, should I balance up the plots to be evenly distributed amongst separate folders?



  • #2, never knew this could be done, could you elaborate how to do it?

    #5 Are you saying if I have 8 threads and 16 drives, I should assign 8 of the drives a drive letter that maps into the other drives (volume mounting), like drive 1 is E: , drive 2 is E:\second_drive , drive 3 is F: , drive 4 is F:\fourth_drive, etc.?



  • @rds
    #2 http://www.hjsplit.org/windows
    https://superuser.com/questions/80081/how-to-split-and-combine-files
    #5 nope. but you can try something like

    "Paths":["C:\\plots+E:\\plots","F:\\plots+G:\\plots"],
    

    virtually join 2 drives as one by using "+"
    (in my case - 12 drives per 6 threads much faster than 6+6 per 6 threads)


  • admin

    @machasm assuming your existing files are on <driveA> and <driveB>, you can do:

    copy /b <driveA>\<plotfile> + <driveB>\<plotfile> <driveC>\<id>_<SN of A>_<combined nonces of both files>_<stagger>

    But again - the files MUST be sequential and MUST have the same stagger
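
The naming rule in the command above can be sketched in Python as a pre-flight check before running `copy /b`. This is only an illustration: it assumes plot file names follow the `<id>_<start nonce>_<nonces>_<stagger>` pattern shown in the command, and the helper names (`parse_plot_name`, `combined_name`) are hypothetical. The check that both files share the same id is an added assumption, based on the output name carrying a single `<id>`.

```python
from typing import NamedTuple


class PlotName(NamedTuple):
    account_id: str
    start_nonce: int
    nonces: int
    stagger: int


def parse_plot_name(name: str) -> PlotName:
    """Split '<id>_<start nonce>_<nonces>_<stagger>' into its four fields."""
    account_id, start, nonces, stagger = name.split("_")
    return PlotName(account_id, int(start), int(nonces), int(stagger))


def combined_name(first: str, second: str) -> str:
    """Return the output name for joining first + second, or raise if they cannot be joined."""
    a, b = parse_plot_name(first), parse_plot_name(second)
    if a.account_id != b.account_id:
        # Assumption: both plots must belong to the same id.
        raise ValueError("plots belong to different ids")
    if a.stagger != b.stagger:
        raise ValueError("plots have different staggers")
    if a.start_nonce + a.nonces != b.start_nonce:
        raise ValueError("plots are not sequential (gap or overlap)")
    # Start nonce of the first file, nonce counts added, stagger unchanged.
    return f"{a.account_id}_{a.start_nonce}_{a.nonces + b.nonces}_{a.stagger}"
```

For example, `combined_name("123_0_4096_4096", "123_4096_4096_4096")` yields `123_0_8192_4096`, while a second file starting at any nonce other than 4096 is rejected.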



  • @haitch said in Joining plot files?:

    copy /b <driveA>\<plotfile> + <driveB>\<plotfile> <driveC>\<id>_<SN of A>_<combined nonces of both files>_<stagger>

    Thanks Haitch for your help. Much appreciated.



  • @Blago said in Joining plot files?:

    @rds
    #2 http://www.hjsplit.org/windows
    https://superuser.com/questions/80081/how-to-split-and-combine-files
    #5 nope. but you can try something like

    "Paths":["C:\\plots+E:\\plots","F:\\plots+G:\\plots"],
    

    virtually join 2 drives as one by using "+"
    (in my case - 12 drives per 6 threads much faster than 6+6 per 6 threads)

    Can you do this in the jminer as well? I tried it and it terminated.



  • @haitch said in Joining plot files?:

    @machasm assuming your existing files are on <driveA> and <driveB>, you can do:

    copy /b <driveA>\<plotfile> + <driveB>\<plotfile> <driveC>\<id>_<SN of A>_<combined nonces of both files>_<stagger>

    But again - the files MUST be sequential and MUST have the same stagger

    How much faster do you think an 8TB drive would scan if there was one big file, vs. 8 x 1TB files? 1%, 5%, 19%, 20%?


  • admin

    @rds if the combined file was not optimized, not much faster; however if it was optimized, then I'd expect a significant boost, but can't give you a %.



  • @Blago said in Joining plot files?:

    @rds
    #2 http://www.hjsplit.org/windows
    https://superuser.com/questions/80081/how-to-split-and-combine-files
    #5 nope. but you can try something like

    "Paths":["C:\\plots+E:\\plots","F:\\plots+G:\\plots"],
    

    virtually join 2 drives as one by using "+"
    (in my case - 12 drives per 6 threads much faster than 6+6 per 6 threads)

    I did a little test tonight. Turned off all other programs (plotter, wallets, etc.)

    I have 20 drives connected to my 4 core laptop running the Blago miner.

    For two identical blocks I ran the miner using:

    1. using the path command of 20 sequential drives, A:\Burst\plots, B:\Burst\plots, etc.
    2. combined drives with the "+" sign to create 4 threads of approximately the same size.

    For the 2 blocks, method 2) scan time was 70% greater than method 1) scan time.

    For the 2 blocks, method 1) showed 85% CPU usage in Task Manager, method 2) showed 65%.

    So, for me, the old way is way faster.


  • admin

    @rds The combined "+" drives are mined in sequence: it does the first drive, then the "+"'d second drive, then the third, and so on. Without the "+", all drives are processed in parallel with a thread per drive/directory. If you have more drives than cores/threads then there is overhead with thread switching, but that overhead is dwarfed by the HDD seek/read overheads - I'd expect the purely parallel method to always be faster.
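
The difference between the two layouts can be illustrated with a rough timing model in Python. It is a sketch under simplifying assumptions, not a description of the miner's internals: each drive has a fixed read time, drives inside a "+" group are read back to back, groups run fully in parallel, and CPU contention, thread switching and shared bus bandwidth are ignored. The function name `scan_time` and the 10-second figure are hypothetical.

```python
def scan_time(groups: list[list[float]]) -> float:
    """Estimated round time: each group is read sequentially, all groups run in parallel."""
    return max(sum(group) for group in groups)


# Hypothetical setup: 20 drives that each take 10 seconds to read.
per_drive = [10.0] * 20

# One path per drive: every drive is its own group.
parallel = scan_time([[t] for t in per_drive])

# Four "+" chains of five drives each.
chained = scan_time([per_drive[i:i + 5] for i in range(0, 20, 5)])

print(parallel, chained)
```

In this idealised model the chained layout takes as long as its slowest chain, which is five drive reads, while the one-path-per-drive layout takes a single drive read. Real results will show a smaller gap once CPU and bus limits come into play.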



  • @haitch said in Joining plot files?:

    @rds The combined "+" drives are mined in sequence: it does the first drive, then the "+"'d second drive, then the third, and so on. Without the "+", all drives are processed in parallel with a thread per drive/directory. If you have more drives than cores/threads then there is overhead with thread switching, but that overhead is dwarfed by the HDD seek/read overheads - I'd expect the purely parallel method to always be faster.

    I thought @Blago said just the opposite above. He said to chain the drives so you have the same number of chained drives as threads. That's what I did: I chained 5 drives into one thread, 4 times, so my 4 cores would run 4 chains instead of 20 drives. He implied the chained drives would be faster, I think, unless I read his post wrong.


  • admin

    @rds I don't believe that was what he was saying - he said "(in my case - 12 drives per 6 threads much faster than 6+6 per 6 threads)" - so 12 drives simultaneously, despite only having 6 threads, was much faster than doing 6 drives then another 6 drives.



  • @haitch said in Joining plot files?:

    @rds I don't believe that was what he was saying - he said "(in my case - 12 drives per 6 threads much faster than 6+6 per 6 threads)" - so 12 drives simultaneously, despite only having 6 threads, was much faster than doing 6 drives then another 6 drives.

    Ok, got it.



  • @haitch said in Joining plot files?:

    @machasm assuming your existing files are on <driveA> and <driveB>, you can do:

    copy /b <driveA>\<plotfile> + <driveB>\<plotfile> <driveC>\<id>_<SN of A>_<combined nonces of both files>_<stagger>

    But again - the files MUST be sequential and MUST have the same stagger

    @haitch, to expand on this, does this have to be done 2 at a time? Can I combine 3 files with the same # of nonces and stagger into one file with the starting nonce of the first and a # of nonces equal to 3 x each file's nonces? After I do this twice on 2x3 files, can I then combine the two 3-file copies into one file that contains the nonces of all 6 files, etc.?

    Also, the instructions imply the output file has a stagger of 1/2 the total number of nonces. Is the file then not optimized? E.g., file 1 has 4096 nonces and 4096 stagger, file 2 has 4096 nonces and 4096 stagger: does the output file have 8192 nonces and 4096 stagger (not optimized), or 8192 stagger (optimized)?

    Have you tested a combined file? Are there not headers and/or footers in a plot file architecture that would corrupt the integrity of the combined file?


  • admin

    @rds Yes you can combine multiple files: copy /b <fileA> + <fileB> + <fileC> .... + <FileX> <DestFile>

    The resultant file will not be optimized, even if the source files are - stagger will not equal total nonces.

    Yes, I've tested a combined file, specifically for the purpose of testing combined files.

    The files have no headers or footers - all the metadata about the file is in the filename, the file contains only the precomputed nonces.
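
Because all the metadata lives in the filename, a combined file can be sanity-checked from its name and size alone. The Python sketch below assumes the `<id>_<start nonce>_<nonces>_<stagger>` naming used earlier in the thread and assumes each nonce occupies 262,144 bytes (4096 scoops of 64 bytes), a figure that is not stated in this thread. The helper names are hypothetical.

```python
import os

# Assumption: one nonce is 4096 scoops of 64 bytes each.
NONCE_BYTES = 4096 * 64


def expected_size(name: str) -> int:
    """Size in bytes implied by the nonce count in the file name."""
    _id, _start, nonces, _stagger = name.split("_")
    return int(nonces) * NONCE_BYTES


def is_optimized(name: str) -> bool:
    """A plot is optimized when its stagger equals its total nonce count."""
    _id, _start, nonces, stagger = name.split("_")
    return int(nonces) == int(stagger)


def size_matches(path: str) -> bool:
    """True if the file on disk is exactly as large as its name says it should be."""
    return os.path.getsize(path) == expected_size(os.path.basename(path))
```

For the example discussed above, `is_optimized("123_0_8192_4096")` is false because the stagger stays at 4096 after joining, and `expected_size` gives 2,147,483,648 bytes for 8192 nonces.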