[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely la
From: |
Jason C. Kwan |
Subject: |
Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs |
Date: |
Mon, 25 Apr 2022 17:50:00 -0400 |
I’m only talking to Wolfgang
who the fuck wants to talk to a loser filled to the brim with a sense of
entitlement like you
> arnold@skeeve.com於2022年4月25日 03:32寫道:
>
> Hello.
>
> If you wish, you may discuss these issues directly with the GMP team.
>
> I have neither the skill nor the time nor the desire to try to add
> special case handling of this nature to gawk.
>
> In general, I suspect that gawk is entirely the wrong tool for these
> kinds of calculations.
>
> Arnold
>
> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
>
>> Speaking of unnecessary conversions, I have these findings for an extreme
>> edge case task :
>> - trying to square [ 7 x 10^3^17 ] ,
>> as in 7 followed by 129,140,163 zeros (0) behind it in base-10 decimal
>> representation
>> Method 1 : directly spelling out the formula as is, letting gawk
>> + gmp select its most optimized internal representation of it
>> Method 2 : using sprintf( ) to create an ASCII string
>> representaton of the big integer, then squaring it
>> Method 3 : recognizing the fact that this value in question only has 1
>> significant digit in decimal, and be able to significantly
>> reducing the computational magnitude by only having to square
>> digits that are significant in decimal i.e. 7 x 7 = 49
>> while simply double-padding the trailing zeros to
>> account for the rest of it.
>> Using identical gawk -M configurations, as shown in full code below,the
>> speed differences are drastic -
>> — 98.98 secs method-1— 106.49 secs method-2— 1.54 secs method-3
>> Obviously this is an extreme edge case, but I've also noticed gawk/gmp, in
>> more than a handful of situations,
>> "attempts to do math"
>> when none is called for.
>> Unfortunately, there's a sizable chorus in the community who possess
>> immutable notions that string-ops are horrifically expensive across the
>> board, and should be avoided at all costs.
>> This code snippet showcased how the exact same gawk configuration -
>> - same C locale,- same GMP flag,- same byte-mode flag (even though that's
>> slightly redundant)
>> can see major speed ups when routing certain tasks via string-ops instead of
>> strictly, and potentially unnecessarily, adhering to GMP limbs.
>> You've mentioned high big-O() penalties with roundtrip conversion - this
>> trailing-zero scenario is exactly where nearly all the penaltiescould be
>> avoided, and simply be limited by the speed of substr() or array splitting
>> instead.
>> On top of 2x before, and this squaring case, modulo-3 is another :
>> using that exact same 7 E 10^3^17,
>> the built-in modulo ( % ) operator took
>> 8.507 secs
>> for a task that only requires
>> 0.356 secs
>> As you can see, the 2-lines of code for mod3() function isn't doing anything
>> fancy at all other than leveraging what was taught waaaay back in grade
>> school :
>> - that counting digits alone already suffice for mod-%-3
>> - and that the exact same set decimal mod-%-3 rules are equally
>> applicable to %3 against inputs in bases 4, 7, 10, 13, 16(hex), 19, ….)
>> it stays in the realm of string-ops for as long as it could before finally
>> having to do 1-single modulo operation against an integer likely less than
>> 2^53, material and meaningful time savings could be realized.
>> My own version of that mod3() function uses approx 75 digits as the cut-off
>> between whether it's more advantages to do it numerically versus
>> gsub-stitutionally. Your mileage may vary.
>>
>>
>> out9: 302 B 0:00:00 [ 904 B/s] [ 904 B/s] [<=> ](
>> LC_ALL=C gawk -p- -Mbe ; ) 0.31s user 0.03s system 97% cpu 0.356 total
>> 1 1 2 # gawk profile, created Mon Apr 25 01:26:53 2022 3 4 #
>> BEGIN rule(s) 5 6 BEGIN { 7 1 print
>> mod3(sprintf("7%0*.f", 3 ^ 17, 0)) 8 } 9 10 11 # Functions,
>> listed alphabetically 12 13 1 function mod3(_) 14 { 15
>> 1 gsub("[0369CFXcfx]+", "", _)
>> 16 1 return (length(_) + (gsub("[258BEbe]", "", _))) % 3 17 }
>> out9: 243 B 0:00:08 [28.6 B/s] [28.6 B/s] [<=> ](
>> LC_ALL=C gawk -p- -Mbe ; ) 8.14s user 0.35s system 99% cpu 8.507 total
>> 1 1 2 # gawk profile, created Mon Apr 25 01:27:03 2022 3 4 #
>> BEGIN rule(s) 5 6 BEGIN { 7 1 print
>> mod3_builtin(sprintf("7%0*.f", 3 ^ 17, 0)) 8 } 9 10 11 #
>> Functions, listed alphabetically 12 13 1 function
>> mod3_builtin(_) 14 { 15 1 return (_ % 3)
>> 16 }
>>
>>
>> You can run all that code without [ gawk -M ] flag, and the output would be
>> the same.
>> up to u whether you deem there's any merit for me to speak with gmp team on
>> this.
>> j
>>
>>
>>
>>
>> echo ; sleep 2;
>> ( time ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }' ) | pvE9
>> )| xxh128sum | lgp3
>>
>> out9: 246MiB 0:01:38 [2.49MiB/s] [2.49MiB/s] [<=> ]
>> ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }'; ) 95.19s user
>> 3.73s system 99% cpu 1:38.98 total
>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>> echo ; sleep 2;
>> ( time ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }' )
>> | pvE9 )| xxh128sum | lgp3 ; echo ;
>>
>> out9: 246MiB 0:01:46 [2.31MiB/s] [2.31MiB/s] [<=> ]
>> ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }'; )
>> 102.37s user 4.06s system 99% cpu 1:46.49 total
>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>>
>> echo; ( time ( LC_ALL=C gawk -Mbe '
>> BEGIN { print _x2(sprintf("7%0*.f", 3 ^ 17, 0))}
>> function _x2(_, __, ___, ____, _____, ______){ if (substr(_, (__ += __ = __
>> ~ __) ^ ++__, __) == "") { return (_ * _) } else if (sub("[0]+000$", "=&",
>> _)) { return (___ = index(_, "=")) < (++__ + __--) ? int(+(substr(_, -__ <
>> +__, --___) ^ --__)) (_ = substr(_, ___ + __)) _ : _x2(______[(! +substr(_ =
>> ______[split(_, ______, "[=]+")], -__ < +__, __))]) (_) _ } return int(_ *
>> _)}
>> ' ) | pvE9 )| xxh128sum | lgp3
>>
>> out9: 246MiB 0:00:01 [ 161MiB/s] [ 161MiB/s] [<=> ]
>> ( LC_ALL=C gawk -Mbe ; ) 1.40s user 0.10s system 97% cpu 1.543 total
>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>>
>>
>>
>> On Thursday, January 27, 2022, 03:56:46 PM EST, Wolfgang Laun
>> <wolfgang.laun@gmail.com> wrote:
>>
>>
>>
>>
>>
>> Function _2x_ implements a special case of the addition of two
>> multiple-precision numbers. If some application (frequently) needs doubling
>> of numbers with millions of digits it might be useful. (In almost 60 years
>> of programming in a wide range of application areas I've never come across
>> this particular task.)
>>
>> It is not surprising that some specific operations can be implemented
>> significantly faster by specific code, faster than the same algorithm using
>> the GnuMP implementation used in gawk. print $0+$0 requires two, maybe
>> three, conversions between string and MP binary, each at the cost of O(n)
>> where n is the number of digits. So this requires 3O(n) or 4O(n), due to
>> another O(n) for the addition itself. Doing the duplication on a string of
>> digits just requires O(n). Expect a reduction to 33% or even 25%, my
>> ballpark figure.
>>
>> Wolfgang
>>
>>
>>
>>> On Thu, 27 Jan 2022 at 18:41, Jason C. Kwan <jasonckwan@yahoo.com> wrote:
>>> I’ll go modify the code for your convenience , but I didn’t make the code
>>> that way to give u a hard time - I write code like that for my personal use
>>> too.
>>>
>>> I don’t know how to make that function more concise and still process
>>> doubling 1-billion-digit integers with same precision as gmp.
>>>
>>> did you think I had Fun and joy taking time out to hash out a fully working
>>> proof of concept to illustrate my point about the performance difference ,
>>> in hopes of starting a candid discussion instead of just saying “it’s slow”
>>> and sound like generic whining ?
>>>
>>> the proof of concept function is designed to be ran *outside* of gmp mode -
>>>
>>> You want gmp gawk code it’s
>>>
>>> gawk -Mbe ‘{ print $0+$0 }’
>>>
>>> that’s it - the point of proof of concept isn’t to illustrate the
>>> performance of gmp itself - it serves as simply an example of how it’s
>>> feasible to use base gawk without gmp bignum mode, perform certain tasks
>>> with same accuracy , at much faster speeds.
>>>
>>> I wasn’t saying “incorporate my code” then giving you that - it’s to raise
>>> awareness of how come it’s even feasible , with all the overhead associated
>>> with interpreted scripting, to perform the same task significantly faster
>>> than natively built in ones. Interpreted scripting, by definition, should
>>> be constrained by the performance of natively built in features. I’m not
>>> talking about a 5% 10% margin of error thing - I wouldn’t even say anything
>>> and risk being labeled as a time waster if it’s a small double digit gain.
>>>
>>> It’s somewhat paradoxical to have the scripts go 275-400% speed of the
>>> underlying binary.
>>>
>>> I perfectly believe both gawk and gmp teams are giving us state of the art
>>> software, that’s why I’m guessing it maybe a I/O issue between gawk and gmp
>>> that might be unnecessarily slowing things down.
>>>
>>> but don’t worry about the next time - if I see something, I won’t waste
>>> your precious time by saying something.
>>>
>>> Regards,
>>> Jason K
>>>
>>>> Andrew J. Schorr <aschorr@telemetry-investments.com>於2022年1月27日 08:24寫道:
>>>>
>>>> Hi,
>>>>
>>>> As Arnold subtly pointed out, this is a bug reporting list, not a forum for
>>>> discussing obfuscated code challenges. If you believe there's a gawk
>>>> performance problem, please provide a short, comprehensible program that
>>>> demonstrates the issue.
>>>>
>>>> Thanks,
>>>> Andy
>>>>
>>>>> On Thu, Jan 27, 2022 at 08:13:58AM -0500, Jason C. Kwan wrote:
>>>>> Have you tried even just straight up pasting it and run it as is ? when
>>>>> interpreted scripting for the same platform is generating some 65%
>>>>> reduction in execution time versus the built in compiled binary from C
>>>>> code source, don’t you think it should at least warrant a quick look to
>>>>> see if there are potential I/O bottlenecks between gawk and the GMP
>>>>> add-on module it uses ?
>>>>>
>>>>> i already used LC_ALL=C locale, per the bug reporting page, as you could
>>>>> see below. I even let gawk -M go second so if there were any gains from
>>>>> caching, it would be gawk -M that benefited from it. Maybe it’s an ARM
>>>>> thing, being that I’m on m1max instead of x64
>>>>>
>>>>> I only sent here first because I’m somewhat certain the GMP side would
>>>>> instantly send me back here, seeing that I couldn’t isolate out whether
>>>>> the potential bottlenecks exists on gawk side or gmp side, and I haven’t
>>>>> written in C since 2002. I even tried reading the gawk 5.1.1 source code
>>>>> to see if I could identify anything that could be help, but haven’t been
>>>>> lucky.
>>>>>
>>>>> Do u want me to read through the gmp source first ?
>>>>>
>>>>> Regards,
>>>>> Jason K
>>>>>
>>>>>> arnold@skeeve.com於2022年1月27日 01:42寫道:
>>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> Thank you for taking the time to report an issue.
>>>>>>
>>>>>> Please read the instructions for bug reporting at
>>>>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
>>>>>> It was updated recently, please reread it if you haven't looked at
>>>>>> it in a long time.
>>>>>>
>>>>>> I'm afraid that your "proof of concept" code is so unreadable that
>>>>>> I won't even try to determine what it's doing.
>>>>>>
>>>>>> With respect to GMP performance, I suggest that you report an
>>>>>> issue directly to the GMP developers. The gawk code that uses
>>>>>> GMP isn't going to change. For this reason, I don't think you
>>>>>> need to bother redoing your example code, unless you want to send
>>>>>> it to the GMP developers.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Arnold
>>>>>>
>>>>>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi GAWK team,
>>>>>>> Not sure if I should be reporting this to the gawk team or the GnuMP
>>>>>>> team - it's neither a bug or nor new feature per se, but merely a
>>>>>>> performance one - I've noticed on that for extremely large inputs, say,
>>>>>>> integers with more than 5 million digits, the bignum gawk -M mode could
>>>>>>> be somewhat slow, with possible room for speed improvement
>>>>>>> (proof-of-concept attached below). my gawk version information :
>>>>>>> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
>>>>>>> Darwin MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov
>>>>>>> 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64
>>>>>>> On my test-case of 71.5 million digits, it's possible to to save 63.7%
>>>>>>> of time , even in regular gawk, when compared to gawk -M :
>>>>>>> ==================
>>>>>>> gawk -be '{ print length($0) }' test.txt71591842
>>>>>>> test command ::
>>>>>>> echo; time ( pv -q < test.txt | LC_ALL=C gawkmx -b -e '{ print _2x_($0)
>>>>>>> }' ) | xxh128sum ; echo; time (pv -q < test.txt | LC_ALL=C gawk -M -b
>>>>>>> -e '{ print $0+$0 }' ) | xxh128sum ; echo
>>>>>>> a56c2d2302d9ea8751f810b848e6f354 stdin( pv -q < test.txt | LC_ALL=C
>>>>>>> LC_ALL=C gawk -e "${mfx}" -b) 8.16s user 0.60s system 99% cpu 8.789
>>>>>>> totalxxh128sum 0.01s user 0.00s system 0% cpu 8.789 total
>>>>>>> a56c2d2302d9ea8751f810b848e6f354 stdin( pv -q < test.txt | LC_ALL=C
>>>>>>> gawk -M -b -e ; ) 23.26s user 0.96s system 99% cpu 24.240
>>>>>>> totalxxh128sum 0.01s user 0.00s system 0% cpu 24.239 total
>>>>>>> =====================
>>>>>>> In another test case of slightly over 275 milion digits, the time
>>>>>>> savings are 71.7% :
>>>>>>> fc2231bdff375b7870586d8dffc0841c stdin( pv -q <
>>>>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -b -e "${mfx}" -e ; )
>>>>>>> 27.25s user 6.48s system 98% cpu 34.350 totalxxh128sum 0.04s user
>>>>>>> 0.02s system 0% cpu 34.349 total
>>>>>>> fc2231bdff375b7870586d8dffc0841c stdin( pv -q <
>>>>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -M -b -e ; ) 116.58s user
>>>>>>> 4.78s system 99% cpu 2:01.42 totalxxh128sum 0.04s user 0.02s system 0%
>>>>>>> cpu 2:01.42 total
>>>>>>> ====================
>>>>>>> Attached below is the full proof-of-concept code for function _2x_( )
>>>>>>> to demostrate that the time savings are very much concrete and
>>>>>>> possible, not simply theoretical. The test file, being 70MB+, is a big
>>>>>>> large for email, but basically any file using ASCII digits 0-9 to
>>>>>>> represent any integer over 5 million digits will do. The speed
>>>>>>> difference isn't noticeable for smaller inputs, and for inputs fewer
>>>>>>> than 7 digits, most likely it would be slower than gawk -M.
>>>>>>> I tried maximizing portability of the proof-of-concept function by
>>>>>>> refraining from any gawk-specific extensions of awk - this same code
>>>>>>> has also been tested in mawk 1.3.4, mawk 1.9.9.6, and macos 12.1
>>>>>>> awk/nawk.
>>>>>>> It's entirely self-contained, performs no recursion, has no external
>>>>>>> dependencies, no bit-wise ops, and doesn't include any advanced/fancy
>>>>>>> math - just straight up grade-school long-form addition, doubling them
>>>>>>> 15-digits per chunk, and 2 chunks per while loop. The carrying part is
>>>>>>> performed by gsub( ) prior to the while( ) loop, and thus, eliminating
>>>>>>> the need to track them along the way. Performance scales linearly at
>>>>>>> the log-base-10 level.
>>>>>>> ( I didn't include any copyright notice or credits to a priori since I
>>>>>>> don't think grade school addition is something copyrightable)
>>>>>>> Obviously this is simply awk scripting code and can't be directly
>>>>>>> incorporated into the C code base - I'm merely raising awareness of the
>>>>>>> issue.
>>>>>>> Thanks for your time.
>>>>>>> Jason
>>>>>>> ===================================================================================
>>>>>>> ( I can reformat the code for readability if you prefer ) :
>>>>>>> function _2x_(__,_,_______,____,_________,
>>>>>>> ___________,__________,____________,
>>>>>>> ________,_____,______,___) { if(__!~/[1-9]/) { return +_ }
>>>>>>> ___=(__~/^[-]/) sub("^[+-]?["(-"")"]*","",__) if
>>>>>>> (length(__)<(_____=((_+=++_)^_^_-!!_))+_) { if
>>>>>>> (_________^((_____+(_~____))^(_____-_)<+__)) { return
>>>>>>> (-!-"")^___*(+__+__) } } ___=substr("-",_~"",___) if
>>>>>>> (__!~"[5-9]") { gsub(/4/,"8",__)-gsub(/3/,"6",__)
>>>>>>> gsub(/2/,"4",__)-gsub(/1/,"2",__) return (___)__ };
>>>>>>> _______=____=________="";
>>>>>>> __=(_____=substr(________=(_++^_--+_)^_____,_))(__)
>>>>>>> gsub(/./,".",_____) sub("("(_____)")+$","_&",__)
>>>>>>> __=substr(__,index(__,"_")+!+"") ________+=+________;_=""; if
>>>>>>> ((gsub(_____,"&_",__)+gsub(/[_][5-9]/,".5&",__)*(+""))%2) {
>>>>>>> _=(":")(________+__+__) __=substr(__,index(__,"_")+!+"") }
>>>>>>> for(_____ in ______) { } if
>>>>>>> (______[_____=split(__,______,/[_]/)]=="") { delete
>>>>>>> ______[_____--] };__=____~____; __________=____________*=\
>>>>>>> ____________=___________=_________=32; while(__<_____) {
>>>>>>> if(!--__________){
>>>>>>> __________=____________;_______=(_______)_;_="";
>>>>>>> if(!--___________){
>>>>>>> ___________=_________;____=(____)_______;_______=""; } }
>>>>>>> _=(_)(":")(+______[__++]*2+________)\
>>>>>>> (":")(+______[__++]*2+________) } ____=(____)(_______)(_)\
>>>>>>> (__==_____?(":")(________+2*______[_____]):"")
>>>>>>> gsub(/[:]+[^:]/,"",____) sub(/^0*/,"",____); return (___)____ }
>>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Wolfgang Laun
>>
>>