Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely la

bug-gawk
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely la

From:	Jason C. Kwan
Subject:	Re: Slowness in bignum mode ( gmp \| gawk -M ) when doubling extremely large inputs
Date:	Mon, 25 Apr 2022 17:50:00 -0400
I’m only talking to Wolfgang 

who the fuck wants to talk to a loser filled to the brim with a sense of 
entitlement like you


> arnold@skeeve.com於2022年4月25日 03:32寫道：
> 
> Hello.
> 
> If you wish, you may discuss these issues directly with the GMP team.
> 
> I have neither the skill nor the time nor the desire to try to add
> special case handling of this nature to gawk.
> 
> In general, I suspect that gawk is entirely the wrong tool for these
> kinds of calculations.
> 
> Arnold
> 
> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
> 
>> Speaking of unnecessary conversions, I have these findings for an extreme 
>> edge case task : 
>> - trying to square  [ 7 x 10^3^17 ]  , 
>> as in 7 followed by 129,140,163 zeros (0) behind it in base-10 decimal 
>> representation
>> Method 1 : directly spelling out the formula as is,            letting gawk 
>> + gmp select its most optimized internal representation of it
>> Method 2 : using sprintf( ) to create an ASCII string            
>> representaton of the big integer, then squaring it
>> Method 3 : recognizing the fact that this value in question only has 1       
>>      significant digit in decimal, and be able to significantly            
>> reducing the computational magnitude by only having           to square 
>> digits that are significant in decimal            i.e. 7 x 7 = 49            
>>           while simply double-padding the            trailing zeros to 
>> account for the rest of it. 
>> Using identical gawk -M configurations, as shown in full code below,the 
>> speed differences are drastic - 
>> —  98.98 secs method-1— 106.49 secs method-2—   1.54 secs method-3
>> Obviously this is an extreme edge case, but I've also noticed gawk/gmp, in 
>> more than a handful of situations,
>>      "attempts to do math"
>> when none is called for.
>> Unfortunately, there's a sizable chorus in the community who possess 
>> immutable notions that string-ops are horrifically expensive across the 
>> board, and should be avoided at all costs. 
>> This code snippet showcased how the exact same gawk configuration - 
>> - same C locale,- same GMP flag,- same byte-mode flag (even though that's 
>> slightly redundant)
>> can see major speed ups when routing certain tasks via string-ops instead of 
>> strictly, and potentially unnecessarily, adhering to GMP limbs.
>> You've mentioned high big-O() penalties with roundtrip conversion - this 
>> trailing-zero scenario is exactly where nearly all the penaltiescould be 
>> avoided, and simply be limited by the speed of substr() or array splitting 
>> instead. 
>> On top of 2x before, and this squaring case, modulo-3 is another :
>> using that exact same 7 E 10^3^17, 
>> the built-in modulo ( % ) operator took 
>>      8.507 secs 
>> for a task that only requires 
>>      0.356 secs
>> As you can see, the 2-lines of code for mod3() function isn't doing anything 
>> fancy at all other than leveraging what was taught waaaay back in grade 
>> school :
>>      -  that counting digits alone         already suffice for mod-%-3 
>>      -  and that the exact same set decimal mod-%-3 rules are        equally 
>> applicable to %3 against inputs in bases 4, 7, 10, 13, 16(hex), 19, ….)
>> it stays in the realm of string-ops for as long as it could before finally 
>> having to do 1-single modulo operation against an integer likely less than 
>> 2^53, material and meaningful time savings could be realized.
>> My own version of that mod3() function uses approx 75 digits as the cut-off 
>> between whether it's more advantages to do it numerically versus 
>> gsub-stitutionally. Your mileage may vary. 
>> 
>> 
>>      out9:  302 B 0:00:00 [ 904 B/s] [ 904 B/s] [<=>               ]( 
>> LC_ALL=C gawk -p- -Mbe ; )  0.31s user 0.03s system 97% cpu 0.356 total     
>> 1 1     2 # gawk profile, created Mon Apr 25 01:26:53 2022     3     4 # 
>> BEGIN rule(s)     5      6 BEGIN {     7      1   print 
>> mod3(sprintf("7%0*.f", 3 ^ 17, 0))     8 }     9    10    11 # Functions, 
>> listed alphabetically    12    13      1  function mod3(_)    14 {    15     
>>  1   gsub("[0369CFXcfx]+", "", _)
>>     16      1   return (length(_) + (gsub("[258BEbe]", "", _))) % 3    17 }
>>      out9:  243 B 0:00:08 [28.6 B/s] [28.6 B/s] [<=>               ]( 
>> LC_ALL=C gawk -p- -Mbe ; )  8.14s user 0.35s system 99% cpu 8.507 total     
>> 1 1     2 # gawk profile, created Mon Apr 25 01:27:03 2022     3     4 # 
>> BEGIN rule(s)     5     6 BEGIN {     7      1   print 
>> mod3_builtin(sprintf("7%0*.f", 3 ^ 17, 0))     8 }     9    10    11 # 
>> Functions, listed alphabetically    12    13      1  function 
>> mod3_builtin(_)    14 {    15      1   return (_ % 3)
>>     16 }
>> 
>> 
>> You can run all that code without [ gawk -M ] flag, and the output would be 
>> the same.
>> up to u whether you deem there's any merit for me to speak with gmp team on 
>> this.
>> j
>> 
>> 
>> 
>> 
>> echo ; sleep 2; 
>> ( time ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }' ) | pvE9 
>> )| xxh128sum | lgp3
>> 
>>   out9: 246MiB 0:01:38 [2.49MiB/s] [2.49MiB/s] [<=> ]
>> ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }'; ) 95.19s user 
>> 3.73s system 99% cpu 1:38.98 total
>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin 
>> echo ; sleep 2; 
>> ( time ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }' ) 
>> | pvE9 )| xxh128sum | lgp3 ; echo ;
>> 
>>   out9: 246MiB 0:01:46 [2.31MiB/s] [2.31MiB/s] [<=> ]
>> ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }'; ) 
>> 102.37s user 4.06s system 99% cpu 1:46.49 total
>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>> 
>> echo; ( time ( LC_ALL=C gawk -Mbe '
>> BEGIN { print _x2(sprintf("7%0*.f", 3 ^ 17, 0))}
>> function _x2(_, __, ___, ____, _____, ______){ if (substr(_, (__ += __ = __ 
>> ~ __) ^ ++__, __) == "") { return (_ * _) } else if (sub("[0]+000$", "=&", 
>> _)) { return (___ = index(_, "=")) < (++__ + __--) ? int(+(substr(_, -__ < 
>> +__, --___) ^ --__)) (_ = substr(_, ___ + __)) _ : _x2(______[(! +substr(_ = 
>> ______[split(_, ______, "[=]+")], -__ < +__, __))]) (_) _ } return int(_ * 
>> _)}
>>  ' ) | pvE9 )| xxh128sum | lgp3 
>> 
>>   out9: 246MiB 0:00:01 [ 161MiB/s] [ 161MiB/s] [<=> ]
>> ( LC_ALL=C gawk -Mbe ; ) 1.40s user 0.10s system 97% cpu 1.543 total
>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>> 
>> 
>> 
>> On Thursday, January 27, 2022, 03:56:46 PM EST, Wolfgang Laun 
>> <wolfgang.laun@gmail.com> wrote: 
>> 
>> 
>> 
>> 
>> 
>> Function _2x_ implements a special case of the addition of two 
>> multiple-precision numbers. If some application (frequently) needs doubling 
>> of numbers with millions of digits it might be useful. (In almost 60 years 
>> of programming in a wide range of application areas I've never come across 
>> this particular task.)
>> 
>> It is not surprising that some specific operations can be implemented 
>> significantly faster by specific code, faster than the same algorithm using 
>> the GnuMP implementation used in gawk. print $0+$0 requires two, maybe 
>> three, conversions between string and MP binary, each at the cost of O(n) 
>> where n is the number of digits. So this requires 3O(n) or 4O(n), due to 
>> another O(n) for the addition itself. Doing the duplication on a string of 
>> digits just requires O(n). Expect a reduction to 33% or even 25%, my 
>> ballpark figure.
>> 
>> Wolfgang 
>> 
>> 
>> 
>>> On Thu, 27 Jan 2022 at 18:41, Jason C. Kwan <jasonckwan@yahoo.com> wrote:
>>> I’ll go modify the code for your convenience , but I didn’t make the code 
>>> that way to give u a hard time - I write code like that for my personal use 
>>> too. 
>>> 
>>> I don’t know how to make that function more concise and still process 
>>> doubling 1-billion-digit integers with same precision as gmp.
>>> 
>>> did you think I had Fun and joy taking time out to hash out a fully working 
>>> proof of concept to illustrate my point about the performance difference , 
>>> in hopes of starting a candid discussion instead of just saying “it’s slow” 
>>> and sound like generic whining ? 
>>> 
>>> the proof of concept function is designed to be ran *outside* of gmp mode -
>>> 
>>> You want gmp gawk code it’s 
>>> 
>>> gawk -Mbe ‘{ print $0+$0 }’
>>> 
>>> that’s it - the point of proof of concept isn’t to illustrate the 
>>> performance of gmp itself - it serves as simply an example of how it’s 
>>> feasible to use base gawk without gmp bignum mode, perform certain tasks 
>>> with same accuracy , at much faster speeds. 
>>> 
>>> I wasn’t saying  “incorporate my code” then giving you that - it’s to raise 
>>> awareness of how come it’s even feasible , with all the overhead associated 
>>> with interpreted scripting, to perform the same task significantly faster 
>>> than natively built in ones.  Interpreted scripting, by definition, should 
>>> be constrained by the performance of natively built in features. I’m not 
>>> talking about a 5% 10% margin of error thing - I wouldn’t even say anything 
>>> and risk being labeled as a time waster if it’s a small double digit gain.
>>> 
>>>  It’s somewhat paradoxical to have the scripts go 275-400% speed of the 
>>> underlying binary. 
>>> 
>>> I perfectly believe both gawk and gmp teams are giving us state of the art 
>>> software, that’s why I’m guessing it maybe a I/O issue between gawk and gmp 
>>> that might be unnecessarily slowing things down. 
>>> 
>>> but don’t worry about the next time - if I see something, I won’t waste 
>>> your precious time by saying something. 
>>> 
>>> Regards,
>>> Jason K
>>> 
>>>> Andrew J. Schorr <aschorr@telemetry-investments.com>於2022年1月27日 08:24寫道：
>>>> 
>>>> Hi,
>>>> 
>>>> As Arnold subtly pointed out, this is a bug reporting list, not a forum for
>>>> discussing obfuscated code challenges.  If you believe there's a gawk
>>>> performance problem, please provide a short, comprehensible program that
>>>> demonstrates the issue.
>>>> 
>>>> Thanks,
>>>> Andy
>>>> 
>>>>> On Thu, Jan 27, 2022 at 08:13:58AM -0500, Jason C. Kwan wrote:
>>>>> Have you tried even just straight up pasting it and run it as is ? when 
>>>>> interpreted scripting for the same platform is generating some 65% 
>>>>> reduction in execution time versus the built in compiled binary from C 
>>>>> code source, don’t you think it should at least warrant a quick look to 
>>>>> see if there are potential I/O bottlenecks between gawk and the GMP 
>>>>> add-on module it uses ?
>>>>> 
>>>>> i already used LC_ALL=C locale, per the bug reporting page, as you could 
>>>>> see below. I even let gawk -M go second so if there were any gains from 
>>>>> caching, it would be gawk -M that benefited from it. Maybe it’s an ARM 
>>>>> thing, being that I’m on m1max instead of x64
>>>>> 
>>>>> I only sent here first because I’m somewhat certain the GMP side would 
>>>>> instantly send me back here, seeing that I couldn’t isolate out whether 
>>>>> the potential bottlenecks exists on gawk side or gmp side, and I haven’t 
>>>>> written in C since 2002. I even tried reading the gawk 5.1.1 source code 
>>>>> to see if I could identify anything that could be help, but haven’t been 
>>>>> lucky.
>>>>> 
>>>>> Do u want me to read through the gmp source first ?
>>>>> 
>>>>> Regards,
>>>>> Jason K
>>>>> 
>>>>>> arnold@skeeve.com於2022年1月27日 01:42寫道：
>>>>>> 
>>>>>> Hello.
>>>>>> 
>>>>>> Thank you for taking the time to report an issue.
>>>>>> 
>>>>>> Please read the instructions for bug reporting at
>>>>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
>>>>>> It was updated recently, please reread it if you haven't looked at
>>>>>> it in a long time.
>>>>>> 
>>>>>> I'm afraid that your "proof of concept" code is so unreadable that
>>>>>> I won't even try to determine what it's doing.
>>>>>> 
>>>>>> With respect to GMP performance, I suggest that you report an
>>>>>> issue directly to the GMP developers.  The gawk code that uses
>>>>>> GMP isn't going to change.  For this reason, I don't think you
>>>>>> need to bother redoing your example code, unless you want to send
>>>>>> it to the GMP developers.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Arnold
>>>>>> 
>>>>>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi GAWK team,
>>>>>>> Not sure if I should be reporting this to the gawk team or the GnuMP 
>>>>>>> team - it's neither a bug or nor new feature per se, but merely a 
>>>>>>> performance one - I've noticed on that for extremely large inputs, say, 
>>>>>>> integers with more than 5 million digits, the bignum gawk -M mode could 
>>>>>>> be somewhat slow, with possible room for speed improvement 
>>>>>>> (proof-of-concept attached below). my gawk version information : 
>>>>>>> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
>>>>>>> Darwin MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 
>>>>>>> 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64
>>>>>>> On my test-case of 71.5 million digits, it's possible to to save 63.7% 
>>>>>>> of time , even in regular gawk, when compared to gawk -M  :
>>>>>>> ==================
>>>>>>> gawk -be '{ print length($0) }' test.txt71591842
>>>>>>> test command ::
>>>>>>> echo; time ( pv -q < test.txt | LC_ALL=C gawkmx -b -e '{ print _2x_($0) 
>>>>>>> }' ) | xxh128sum ; echo; time (pv -q < test.txt | LC_ALL=C gawk -M -b 
>>>>>>> -e '{ print $0+$0 }' ) | xxh128sum ; echo
>>>>>>> a56c2d2302d9ea8751f810b848e6f354  stdin( pv -q < test.txt | LC_ALL=C 
>>>>>>> LC_ALL=C gawk -e "${mfx}" -b) 8.16s user 0.60s system 99% cpu 8.789 
>>>>>>> totalxxh128sum  0.01s user 0.00s system 0% cpu 8.789 total
>>>>>>> a56c2d2302d9ea8751f810b848e6f354  stdin( pv -q < test.txt | LC_ALL=C 
>>>>>>> gawk -M -b -e ; )  23.26s user 0.96s system 99% cpu 24.240 
>>>>>>> totalxxh128sum  0.01s user 0.00s system 0% cpu 24.239 total
>>>>>>> =====================
>>>>>>> In another test case of slightly over 275 milion digits, the time 
>>>>>>> savings are 71.7% : 
>>>>>>> fc2231bdff375b7870586d8dffc0841c  stdin( pv -q < 
>>>>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -b -e "${mfx}" -e ; )  
>>>>>>> 27.25s user 6.48s system 98% cpu 34.350 totalxxh128sum  0.04s user 
>>>>>>> 0.02s system 0% cpu 34.349 total
>>>>>>> fc2231bdff375b7870586d8dffc0841c  stdin( pv -q < 
>>>>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -M -b -e ; )  116.58s user 
>>>>>>> 4.78s system 99% cpu 2:01.42 totalxxh128sum  0.04s user 0.02s system 0% 
>>>>>>> cpu 2:01.42 total
>>>>>>> ====================
>>>>>>> Attached below is the full proof-of-concept code for function _2x_( ) 
>>>>>>> to demostrate that the time savings are very much concrete and 
>>>>>>> possible, not simply theoretical. The test file, being 70MB+, is a big 
>>>>>>> large for email, but basically any file using ASCII digits 0-9 to 
>>>>>>> represent any integer over 5 million digits will do. The speed 
>>>>>>> difference isn't noticeable for smaller inputs, and for inputs fewer 
>>>>>>> than 7 digits, most likely it would be slower than gawk -M. 
>>>>>>> I tried maximizing portability of the proof-of-concept function by 
>>>>>>> refraining from any gawk-specific extensions of awk - this same code 
>>>>>>> has also been tested in mawk 1.3.4, mawk 1.9.9.6, and macos 12.1 
>>>>>>> awk/nawk. 
>>>>>>> It's entirely self-contained, performs no recursion, has no external 
>>>>>>> dependencies, no bit-wise ops, and doesn't include any advanced/fancy 
>>>>>>> math - just straight up grade-school long-form addition, doubling them 
>>>>>>> 15-digits per chunk, and 2 chunks per while loop. The carrying part is 
>>>>>>> performed by gsub( ) prior to the while( ) loop, and thus, eliminating 
>>>>>>> the need to track them along the way. Performance scales linearly at 
>>>>>>> the log-base-10 level.
>>>>>>> ( I didn't include any copyright notice or credits to a priori since I 
>>>>>>> don't think grade school addition is something copyrightable)
>>>>>>> Obviously this is simply awk scripting code and can't be directly 
>>>>>>> incorporated into the C code base - I'm merely raising awareness of the 
>>>>>>> issue.
>>>>>>> Thanks for your time.
>>>>>>> Jason
>>>>>>> ===================================================================================
>>>>>>> ( I can reformat the code for readability if you prefer )  : 
>>>>>>> function _2x_(__,_,_______,____,_________,      
>>>>>>> ___________,__________,____________,              
>>>>>>> ________,_____,______,___) {    if(__!~/[1-9]/) {        return +_ }    
>>>>>>> ___=(__~/^[-]/)    sub("^[+-]?["(-"")"]*","",__)    if 
>>>>>>> (length(__)<(_____=((_+=++_)^_^_-!!_))+_) {        if 
>>>>>>> (_________^((_____+(_~____))^(_____-_)<+__)) {            return 
>>>>>>> (-!-"")^___*(+__+__)    } }    ___=substr("-",_~"",___)    if 
>>>>>>> (__!~"[5-9]") {        gsub(/4/,"8",__)-gsub(/3/,"6",__)        
>>>>>>> gsub(/2/,"4",__)-gsub(/1/,"2",__)        return (___)__    }; 
>>>>>>> _______=____=________="";    
>>>>>>> __=(_____=substr(________=(_++^_--+_)^_____,_))(__)    
>>>>>>> gsub(/./,".",_____)         sub("("(_____)")+$","_&",__)    
>>>>>>> __=substr(__,index(__,"_")+!+"")    ________+=+________;_="";    if 
>>>>>>> ((gsub(_____,"&_",__)+gsub(/[_][5-9]/,".5&",__)*(+""))%2) {        
>>>>>>> _=(":")(________+__+__)        __=substr(__,index(__,"_")+!+"")    }    
>>>>>>> for(_____ in ______) { }    if 
>>>>>>> (______[_____=split(__,______,/[_]/)]=="") {        delete 
>>>>>>> ______[_____--] };__=____~____;    __________=____________*=\    
>>>>>>> ____________=___________=_________=32;    while(__<_____) { 
>>>>>>> if(!--__________){        
>>>>>>> __________=____________;_______=(_______)_;_="";        
>>>>>>> if(!--___________){        
>>>>>>> ___________=_________;____=(____)_______;_______="";        } }        
>>>>>>> _=(_)(":")(+______[__++]*2+________)\             
>>>>>>> (":")(+______[__++]*2+________) }    ____=(____)(_______)(_)\         
>>>>>>> (__==_____?(":")(________+2*______[_____]):"")    
>>>>>>> gsub(/[:]+[^:]/,"",____)          sub(/^0*/,"",____); return (___)____ }
>>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Wolfgang Laun
>> 
>>
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs, Jason C. Kwan, 2022/04/25
- Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs, arnold, 2022/04/25
  - Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs, Jason C. Kwan <=
    - Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs, Jason C. Kwan, 2022/04/26
Prev by Date: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Next by Date: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Previous by thread: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Next by thread: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Index(es):
- Date
- Thread