Comments on Large Scale Machine Learning and Other Animals: Atomic read-modify-write operation on multicore
Danny Bickson (http://www.blogger.com/profile/01517237836051035400), updated 2024-03-21T04:14:27.443-07:00

Published: 2011-09-12T20:55:50.104-07:00
Hi Steve,
Here is the answer I got from our great Yucheng Low:
The Locked compare and exchange is already provided as a GCC builtin.
See http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html

That function can be written exactly as

    inline bool CAS(long *ptr, long oldv, long newv) {
        return __sync_bool_compare_and_swap(ptr, oldv, newv);
    }

Can you please try it out?
Author: Danny Bickson (https://www.blogger.com/profile/01517237836051035400)

Published: 2011-09-12T14:04:16.941-07:00
Thinking out loud:

Maybe on non-x86_64 archs, providing some "normal" C++ code to do the operation, but guarding it with `#pragma omp critical`, might suffice, no? For example, inside the `inline bool CAS` function:

    #if __x86_64__
      // assembly code
    #else
      #pragma omp critical
      {
        // non-spiffy C++ version
      }
    #endif

Not sure if that's even right, or if the pragma can work "so far away" from (deep inside) the for loop that's being ||-ized, or, even if it is correct, how badly it would degrade performance.

Was just thinking of a temporary, easy out.

-steve
Author: Anonymous

Published: 2011-09-12T13:48:51.516-07:00
Hi Danny,

This post is too timely (I was debating emailing you about this, but I didn't want to harass you ... but since you started it, the field is open :-)

I was having a back and forth on R-devel to see what needs to be done to get around this particular inline __asm__ in order to get shotgun/buckshot to compile on architectures other than x86_64.

I was hit with this problem the first time I tried to compile the library because OS X, by default, will try to cross-compile the library for both i386 and x86_64 (for now, anyway), and the i386 build was failing.

I actually had a lot of assembly questions below, which I removed for brevity's sake and I'll substitute with:

I "see" (google, SO) that gcc also has some atomic lock operations, e.g.:
http://stackoverflow.com/questions/930897/c-atomic-operations-for-lock-free-structures

Do you (or Aapo) have any experience with them?

Currently I'm just checking to see if we're compiling on x86_64, and if not, having buckshot "fail gracefully" at runtime, but still letting the compiler finish w/o error for the problematic architecture, like so:

https://github.com/lianos/buckshot/blob/master/src/cas_array.h#L96

Ideally I'd like to replace the Rf_error(...) with portable (maybe slower) code that would compile on other archs ... I'll hunt around for such a solution, but if you (or Aapo) have any hints, I'd be much obliged.

Thanks,
-steve
Author: Anonymous
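
Pulling the thread together, here is one way the portable CAS asked about above might look. It is only a sketch and not the code in shotgun/buckshot: it uses the GCC builtin `__sync_bool_compare_and_swap` quoted in the reply wherever the compiler defines `__GNUC__`, and otherwise falls back to the `#pragma omp critical` idea. The section name `cas_fallback` and the helper `atomic_add` are hypothetical, added for illustration. The fallback assumes every write to the shared value goes through `CAS`, and that all threads are OpenMP threads.

```cpp
#include <cassert>

// Sketch of a portable compare-and-swap for `long` values.
// Returns true if *ptr was equal to oldv and has been replaced by newv.
inline bool CAS(long *ptr, long oldv, long newv) {
    assert(ptr != nullptr);
#if defined(__GNUC__)
    // GCC and Clang: the builtin replaces the hand-written x86_64 assembly.
    return __sync_bool_compare_and_swap(ptr, oldv, newv);
#else
    // Assumed fallback: serialize through a named OpenMP critical section.
    // This is only safe if all writers use CAS; expect it to be slower.
    bool swapped = false;
    #pragma omp critical(cas_fallback)
    {
        if (*ptr == oldv) {
            *ptr = newv;
            swapped = true;
        }
    }
    return swapped;
#endif
}

// Hypothetical helper: an atomic add written as a CAS retry loop.
// Returns the value stored by this call.
inline long atomic_add(long *ptr, long delta) {
    long seen;
    do {
        // Volatile read so the compiler reloads the value on every retry.
        seen = *static_cast<volatile long *>(ptr);
    } while (!CAS(ptr, seen, seen + delta));
    return seen + delta;
}
```

On the "so far away" worry: OpenMP does allow a `critical` construct inside a function called from a parallel loop, and a named section excludes only other uses of the same name. How much the fallback costs would need to be measured on the target architecture.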