125c28e83SPiotr Jasiukajtis /* 225c28e83SPiotr Jasiukajtis * CDDL HEADER START 325c28e83SPiotr Jasiukajtis * 425c28e83SPiotr Jasiukajtis * The contents of this file are subject to the terms of the 525c28e83SPiotr Jasiukajtis * Common Development and Distribution License (the "License"). 625c28e83SPiotr Jasiukajtis * You may not use this file except in compliance with the License. 725c28e83SPiotr Jasiukajtis * 825c28e83SPiotr Jasiukajtis * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 925c28e83SPiotr Jasiukajtis * or http://www.opensolaris.org/os/licensing. 1025c28e83SPiotr Jasiukajtis * See the License for the specific language governing permissions 1125c28e83SPiotr Jasiukajtis * and limitations under the License. 1225c28e83SPiotr Jasiukajtis * 1325c28e83SPiotr Jasiukajtis * When distributing Covered Code, include this CDDL HEADER in each 1425c28e83SPiotr Jasiukajtis * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 1525c28e83SPiotr Jasiukajtis * If applicable, add the following below this CDDL HEADER, with the 1625c28e83SPiotr Jasiukajtis * fields enclosed by brackets "[]" replaced with your own identifying 1725c28e83SPiotr Jasiukajtis * information: Portions Copyright [yyyy] [name of copyright owner] 1825c28e83SPiotr Jasiukajtis * 1925c28e83SPiotr Jasiukajtis * CDDL HEADER END 2025c28e83SPiotr Jasiukajtis */ 2125c28e83SPiotr Jasiukajtis 2225c28e83SPiotr Jasiukajtis /* 2325c28e83SPiotr Jasiukajtis * Copyright 2011 Nexenta Systems, Inc. All rights reserved. 2425c28e83SPiotr Jasiukajtis */ 2525c28e83SPiotr Jasiukajtis /* 2625c28e83SPiotr Jasiukajtis * Copyright 2006 Sun Microsystems, Inc. All rights reserved. 2725c28e83SPiotr Jasiukajtis * Use is subject to license terms. 2825c28e83SPiotr Jasiukajtis */ 2925c28e83SPiotr Jasiukajtis 30*ddc0e0b5SRichard Lowe #pragma weak __tgamma = tgamma 3125c28e83SPiotr Jasiukajtis 3225c28e83SPiotr Jasiukajtis /* INDENT OFF */ 3325c28e83SPiotr Jasiukajtis /* 3425c28e83SPiotr Jasiukajtis * True gamma function 3525c28e83SPiotr Jasiukajtis * double tgamma(double x) 3625c28e83SPiotr Jasiukajtis * 3725c28e83SPiotr Jasiukajtis * Error: 3825c28e83SPiotr Jasiukajtis * ------ 3925c28e83SPiotr Jasiukajtis * Less that one ulp for both positive and negative arguments. 4025c28e83SPiotr Jasiukajtis * 4125c28e83SPiotr Jasiukajtis * Algorithm: 4225c28e83SPiotr Jasiukajtis * --------- 4325c28e83SPiotr Jasiukajtis * A: For negative argument 4425c28e83SPiotr Jasiukajtis * (1) gamma(-n or -inf) is NaN 4525c28e83SPiotr Jasiukajtis * (2) Underflow Threshold 4625c28e83SPiotr Jasiukajtis * (3) Reduction to gamma(1+x) 4725c28e83SPiotr Jasiukajtis * B: For x between 1 and 2 4825c28e83SPiotr Jasiukajtis * C: For x between 0 and 1 4925c28e83SPiotr Jasiukajtis * D: For x between 2 and 8 5025c28e83SPiotr Jasiukajtis * E: Overflow thresold {see over.c} 5125c28e83SPiotr Jasiukajtis * F: For overflow_threshold >= x >= 8 5225c28e83SPiotr Jasiukajtis * 5325c28e83SPiotr Jasiukajtis * Implementation details 5425c28e83SPiotr Jasiukajtis * ----------------------- 5525c28e83SPiotr Jasiukajtis * -pi 5625c28e83SPiotr Jasiukajtis * (A) For negative argument, use gamma(-x) = ------------------------. 5725c28e83SPiotr Jasiukajtis * (sin(pi*x)*gamma(1+x)) 5825c28e83SPiotr Jasiukajtis * 5925c28e83SPiotr Jasiukajtis * (1) gamma(-n or -inf) is NaN with invalid signal by SUSv3 spec. 6025c28e83SPiotr Jasiukajtis * (Ideally, gamma(-n) = 1/sinpi(n) = (-1)**(n+1) * inf.) 6125c28e83SPiotr Jasiukajtis * 6225c28e83SPiotr Jasiukajtis * (2) Underflow Threshold. For each precision, there is a value T 6325c28e83SPiotr Jasiukajtis * such that when x>T and when x is not an integer, gamma(-x) will 6425c28e83SPiotr Jasiukajtis * always underflow. A table of the underflow threshold value is given 6525c28e83SPiotr Jasiukajtis * below. For proof, see file "under.c". 6625c28e83SPiotr Jasiukajtis * 6725c28e83SPiotr Jasiukajtis * Precision underflow threshold T = 6825c28e83SPiotr Jasiukajtis * ---------------------------------------------------------------------- 6925c28e83SPiotr Jasiukajtis * single 41.000041962 = 41 + 11 ULP 7025c28e83SPiotr Jasiukajtis * (machine format) 4224000B 7125c28e83SPiotr Jasiukajtis * double 183.000000000000312639 = 183 + 11 ULP 7225c28e83SPiotr Jasiukajtis * (machine format) 4066E000 0000000B 7325c28e83SPiotr Jasiukajtis * quad 1774.0000000000000000000000000000017749370 = 1774 + 9 ULP 7425c28e83SPiotr Jasiukajtis * (machine format) 4009BB80000000000000000000000009 7525c28e83SPiotr Jasiukajtis * ---------------------------------------------------------------------- 7625c28e83SPiotr Jasiukajtis * 7725c28e83SPiotr Jasiukajtis * (3) Reduction to gamma(1+x). 7825c28e83SPiotr Jasiukajtis * Because of (1) and (2), we need only consider non-integral x 7925c28e83SPiotr Jasiukajtis * such that 0<x<T. Let k = [x] and z = x-[x]. Define 8025c28e83SPiotr Jasiukajtis * sin(x*pi) cos(x*pi) 8125c28e83SPiotr Jasiukajtis * kpsin(x) = --------- and kpcos(x) = --------- . Then 8225c28e83SPiotr Jasiukajtis * pi pi 8325c28e83SPiotr Jasiukajtis * 1 8425c28e83SPiotr Jasiukajtis * gamma(-x) = --------------------. 8525c28e83SPiotr Jasiukajtis * -kpsin(x)*gamma(1+x) 8625c28e83SPiotr Jasiukajtis * Since x = k+z, 8725c28e83SPiotr Jasiukajtis * k+1 8825c28e83SPiotr Jasiukajtis * -sin(x*pi) = -sin(k*pi+z*pi) = (-1) *sin(z*pi), 8925c28e83SPiotr Jasiukajtis * k+1 9025c28e83SPiotr Jasiukajtis * we have -kpsin(x) = (-1) * kpsin(z). We can further 9125c28e83SPiotr Jasiukajtis * reduce z to t by 9225c28e83SPiotr Jasiukajtis * (I) t = z when 0.00000 <= z < 0.31830... 9325c28e83SPiotr Jasiukajtis * (II) t = 0.5-z when 0.31830... <= z < 0.681690... 9425c28e83SPiotr Jasiukajtis * (III) t = 1-z when 0.681690... <= z < 1.00000 9525c28e83SPiotr Jasiukajtis * and correspondingly 9625c28e83SPiotr Jasiukajtis * (I) kpsin(z) = kpsin(t) ... 0<= z < 0.3184 9725c28e83SPiotr Jasiukajtis * (II) kpsin(z) = kpcos(t) ... |t| < 0.182 9825c28e83SPiotr Jasiukajtis * (III) kpsin(z) = kpsin(t) ... 0<= t < 0.3184 9925c28e83SPiotr Jasiukajtis * 10025c28e83SPiotr Jasiukajtis * Using a special Remez algorithm, we obtain the following polynomial 10125c28e83SPiotr Jasiukajtis * approximation for kpsin(t) for 0<=t<0.3184: 10225c28e83SPiotr Jasiukajtis * 10325c28e83SPiotr Jasiukajtis * Computation note: in simulating higher precision arithmetic, kcpsin 10425c28e83SPiotr Jasiukajtis * return head = t and tail = ks[0]*t^3 + (...) to maintain extra bits. 10525c28e83SPiotr Jasiukajtis * 10625c28e83SPiotr Jasiukajtis * Quad precision, remez error <= 2**(-129.74) 10725c28e83SPiotr Jasiukajtis * 3 5 27 10825c28e83SPiotr Jasiukajtis * kpsin(t) = t + ks[0] * t + ks[1] * t + ... + ks[12] * t 10925c28e83SPiotr Jasiukajtis * 11025c28e83SPiotr Jasiukajtis * ks[ 0] = -1.64493406684822643647241516664602518705158902870e+0000 11125c28e83SPiotr Jasiukajtis * ks[ 1] = 8.11742425283353643637002772405874238094995726160e-0001 11225c28e83SPiotr Jasiukajtis * ks[ 2] = -1.90751824122084213696472111835337366232282723933e-0001 11325c28e83SPiotr Jasiukajtis * ks[ 3] = 2.61478478176548005046532613563241288115395517084e-0002 11425c28e83SPiotr Jasiukajtis * ks[ 4] = -2.34608103545582363750893072647117829448016479971e-0003 11525c28e83SPiotr Jasiukajtis * ks[ 5] = 1.48428793031071003684606647212534027556262040158e-0004 11625c28e83SPiotr Jasiukajtis * ks[ 6] = -6.97587366165638046518462722252768122615952898698e-0006 11725c28e83SPiotr Jasiukajtis * ks[ 7] = 2.53121740413702536928659271747187500934840057929e-0007 11825c28e83SPiotr Jasiukajtis * ks[ 8] = -7.30471182221385990397683641695766121301933621956e-0009 11925c28e83SPiotr Jasiukajtis * ks[ 9] = 1.71653847451163495739958249695549313987973589884e-0010 12025c28e83SPiotr Jasiukajtis * ks[10] = -3.34813314714560776122245796929054813458341420565e-0012 12125c28e83SPiotr Jasiukajtis * ks[11] = 5.50724992262622033449487808306969135431411753047e-0014 12225c28e83SPiotr Jasiukajtis * ks[12] = -7.67678132753577998601234393215802221104236979928e-0016 12325c28e83SPiotr Jasiukajtis * 12425c28e83SPiotr Jasiukajtis * Double precision, Remez error <= 2**(-62.9) 12525c28e83SPiotr Jasiukajtis * 3 5 15 12625c28e83SPiotr Jasiukajtis * kpsin(t) = t + ks[0] * t + ks[1] * t + ... + ks[6] * t 12725c28e83SPiotr Jasiukajtis * 12825c28e83SPiotr Jasiukajtis * ks[0] = -1.644934066848226406065691 (0x3ffa51a6 625307d3) 12925c28e83SPiotr Jasiukajtis * ks[1] = 8.11742425283341655883668741874008920850698590621e-0001 13025c28e83SPiotr Jasiukajtis * ks[2] = -1.90751824120862873825597279118304943994042258291e-0001 13125c28e83SPiotr Jasiukajtis * ks[3] = 2.61478477632554278317289628332654539353521911570e-0002 13225c28e83SPiotr Jasiukajtis * ks[4] = -2.34607978510202710377617190278735525354347705866e-0003 13325c28e83SPiotr Jasiukajtis * ks[5] = 1.48413292290051695897242899977121846763824221705e-0004 13425c28e83SPiotr Jasiukajtis * ks[6] = -6.87730769637543488108688726777687262485357072242e-0006 13525c28e83SPiotr Jasiukajtis * 13625c28e83SPiotr Jasiukajtis * Single precision, Remez error <= 2**(-34.09) 13725c28e83SPiotr Jasiukajtis * 3 5 9 13825c28e83SPiotr Jasiukajtis * kpsin(t) = t + ks[0] * t + ks[1] * t + ... + ks[3] * t 13925c28e83SPiotr Jasiukajtis * 14025c28e83SPiotr Jasiukajtis * ks[0] = -1.64493404985645811354476665052005342839447790544e+0000 14125c28e83SPiotr Jasiukajtis * ks[1] = 8.11740794458351064092797249069438269367389272270e-0001 14225c28e83SPiotr Jasiukajtis * ks[2] = -1.90703144603551216933075809162889536878854055202e-0001 14325c28e83SPiotr Jasiukajtis * ks[3] = 2.55742333994264563281155312271481108635575331201e-0002 14425c28e83SPiotr Jasiukajtis * 14525c28e83SPiotr Jasiukajtis * Computation note: in simulating higher precision arithmetic, kcpsin 14625c28e83SPiotr Jasiukajtis * return head = t and tail = kc[0]*t^3 + (...) to maintain extra bits 14725c28e83SPiotr Jasiukajtis * precision. 14825c28e83SPiotr Jasiukajtis * 14925c28e83SPiotr Jasiukajtis * And for kpcos(t) for |t|< 0.183: 15025c28e83SPiotr Jasiukajtis * 15125c28e83SPiotr Jasiukajtis * Quad precision, remez <= 2**(-122.48) 15225c28e83SPiotr Jasiukajtis * 2 4 22 15325c28e83SPiotr Jasiukajtis * kpcos(t) = 1/pi + pi/2 * t + kc[2] * t + ... + kc[11] * t 15425c28e83SPiotr Jasiukajtis * 15525c28e83SPiotr Jasiukajtis * kc[2] = 1.29192819501249250731151312779548918765320728489e+0000 15625c28e83SPiotr Jasiukajtis * kc[3] = -4.25027339979557573976029596929319207009444090366e-0001 15725c28e83SPiotr Jasiukajtis * kc[4] = 7.49080661650990096109672954618317623888421628613e-0002 15825c28e83SPiotr Jasiukajtis * kc[5] = -8.21458866111282287985539464173976555436050215120e-0003 15925c28e83SPiotr Jasiukajtis * kc[6] = 6.14202578809529228503205255165761204750211603402e-0004 16025c28e83SPiotr Jasiukajtis * kc[7] = -3.33073432691149607007217330302595267179545908740e-0005 16125c28e83SPiotr Jasiukajtis * kc[8] = 1.36970959047832085796809745461530865597993680204e-0006 16225c28e83SPiotr Jasiukajtis * kc[9] = -4.41780774262583514450246512727201806217271097336e-0008 16325c28e83SPiotr Jasiukajtis * kc[10]= 1.14741409212381858820016567664488123478660705759e-0009 16425c28e83SPiotr Jasiukajtis * kc[11]= -2.44261236114707374558437500654381006300502749632e-0011 16525c28e83SPiotr Jasiukajtis * 16625c28e83SPiotr Jasiukajtis * Double precision, remez < 2**(61.91) 16725c28e83SPiotr Jasiukajtis * 2 4 12 16825c28e83SPiotr Jasiukajtis * kpcos(t) = 1/pi + pi/2 *t + kc[2] * t + ... + kc[6] * t 16925c28e83SPiotr Jasiukajtis * 17025c28e83SPiotr Jasiukajtis * kc[2] = 1.29192819501230224953283586722575766189551966008e+0000 17125c28e83SPiotr Jasiukajtis * kc[3] = -4.25027339940149518500158850753393173519732149213e-0001 17225c28e83SPiotr Jasiukajtis * kc[4] = 7.49080625187015312373925142219429422375556727752e-0002 17325c28e83SPiotr Jasiukajtis * kc[5] = -8.21442040906099210866977352284054849051348692715e-0003 17425c28e83SPiotr Jasiukajtis * kc[6] = 6.10411356829515414575566564733632532333904115968e-0004 17525c28e83SPiotr Jasiukajtis * 17625c28e83SPiotr Jasiukajtis * Single precision, remez < 2**(-30.13) 17725c28e83SPiotr Jasiukajtis * 2 6 17825c28e83SPiotr Jasiukajtis * kpcos(t) = kc[0] + kc[1] * t + ... + kc[3] * t 17925c28e83SPiotr Jasiukajtis * 18025c28e83SPiotr Jasiukajtis * kc[0] = 3.18309886183790671537767526745028724068919291480e-0001 18125c28e83SPiotr Jasiukajtis * kc[1] = -1.57079581447762568199467875065854538626594937791e+0000 18225c28e83SPiotr Jasiukajtis * kc[2] = 1.29183528092558692844073004029568674027807393862e+0000 18325c28e83SPiotr Jasiukajtis * kc[3] = -4.20232949771307685981015914425195471602739075537e-0001 18425c28e83SPiotr Jasiukajtis * 18525c28e83SPiotr Jasiukajtis * Computation note: in simulating higher precision arithmetic, kcpcos 18625c28e83SPiotr Jasiukajtis * return head = 1/pi chopped, and tail = pi/2 *t^2 + (tail part of 1/pi 18725c28e83SPiotr Jasiukajtis * + ...) to maintain extra bits precision. In particular, pi/2 * t^2 18825c28e83SPiotr Jasiukajtis * is calculated with great care. 18925c28e83SPiotr Jasiukajtis * 19025c28e83SPiotr Jasiukajtis * Thus, the computation of gamma(-x), x>0, is: 19125c28e83SPiotr Jasiukajtis * Let k = int(x), z = x-k. 19225c28e83SPiotr Jasiukajtis * For z in (I) 19325c28e83SPiotr Jasiukajtis * k+1 19425c28e83SPiotr Jasiukajtis * (-1) 19525c28e83SPiotr Jasiukajtis * gamma(-x) = ------------------- ; 19625c28e83SPiotr Jasiukajtis * kpsin(z)*gamma(1+x) 19725c28e83SPiotr Jasiukajtis * 19825c28e83SPiotr Jasiukajtis * otherwise, for z in (II), 19925c28e83SPiotr Jasiukajtis * k+1 20025c28e83SPiotr Jasiukajtis * (-1) 20125c28e83SPiotr Jasiukajtis * gamma(-x) = ----------------------- ; 20225c28e83SPiotr Jasiukajtis * kpcos(0.5-z)*gamma(1+x) 20325c28e83SPiotr Jasiukajtis * 20425c28e83SPiotr Jasiukajtis * otherwise, for z in (III), 20525c28e83SPiotr Jasiukajtis * k+1 20625c28e83SPiotr Jasiukajtis * (-1) 20725c28e83SPiotr Jasiukajtis * gamma(-x) = --------------------- . 20825c28e83SPiotr Jasiukajtis * kpsin(1-z)*gamma(1+x) 20925c28e83SPiotr Jasiukajtis * 21025c28e83SPiotr Jasiukajtis * Thus, the computation of gamma(-x) reduced to the computation of 21125c28e83SPiotr Jasiukajtis * gamma(1+x) and kpsin(), kpcos(). 21225c28e83SPiotr Jasiukajtis * 21325c28e83SPiotr Jasiukajtis * (B) For x between 1 and 2. We break [1,2] into three parts: 21425c28e83SPiotr Jasiukajtis * GT1 = [1.0000, 1.2845] 21525c28e83SPiotr Jasiukajtis * GT2 = [1.2844, 1.6374] 21625c28e83SPiotr Jasiukajtis * GT3 = [1.6373, 2.0000] 21725c28e83SPiotr Jasiukajtis * 21825c28e83SPiotr Jasiukajtis * For x in GTi, i=1,2,3, let 21925c28e83SPiotr Jasiukajtis * z1 = 1.134861805732790769689793935774652917006 22025c28e83SPiotr Jasiukajtis * gz1 = gamma(z1) = 0.9382046279096824494097535615803269576988 22125c28e83SPiotr Jasiukajtis * tz1 = gamma'(z1) = -0.3517214357852935791015625000000000000000 22225c28e83SPiotr Jasiukajtis * 22325c28e83SPiotr Jasiukajtis * z2 = 1.461632144968362341262659542325721328468e+0000 22425c28e83SPiotr Jasiukajtis * gz2 = gamma(z2) = 0.8856031944108887002788159005825887332080 22525c28e83SPiotr Jasiukajtis * tz2 = gamma'(z2) = 0.00 22625c28e83SPiotr Jasiukajtis * 22725c28e83SPiotr Jasiukajtis * z3 = 1.819773101100500601787868704921606996312e+0000 22825c28e83SPiotr Jasiukajtis * gz3 = gamma(z3) = 0.9367814114636523216188468970808378497426 22925c28e83SPiotr Jasiukajtis * tz3 = gamma'(z3) = 0.2805306315422058105468750000000000000000 23025c28e83SPiotr Jasiukajtis * 23125c28e83SPiotr Jasiukajtis * and 23225c28e83SPiotr Jasiukajtis * y = x-zi ... for extra precision, write y = y.h + y.l 23325c28e83SPiotr Jasiukajtis * Then 23425c28e83SPiotr Jasiukajtis * gamma(x) = gzi + tzi*(y.h+y.l) + y*y*Ri(y), 23525c28e83SPiotr Jasiukajtis * = gzi.h + (tzi*y.h + ((tzi*y.l+gzi.l) + y*y*Ri(y))) 23625c28e83SPiotr Jasiukajtis * = gy.h + gy.l 23725c28e83SPiotr Jasiukajtis * where 23825c28e83SPiotr Jasiukajtis * (I) For double precision 23925c28e83SPiotr Jasiukajtis * 24025c28e83SPiotr Jasiukajtis * Ri(y) = Pi(y)/Qi(y), i=1,2,3; 24125c28e83SPiotr Jasiukajtis * 24225c28e83SPiotr Jasiukajtis * P1(y) = p1[0] + p1[1]*y + ... + p1[4]*y^4 24325c28e83SPiotr Jasiukajtis * Q1(y) = q1[0] + q1[1]*y + ... + q1[5]*y^5 24425c28e83SPiotr Jasiukajtis * 24525c28e83SPiotr Jasiukajtis * P2(y) = p2[0] + p2[1]*y + ... + p2[3]*y^3 24625c28e83SPiotr Jasiukajtis * Q2(y) = q2[0] + q2[1]*y + ... + q2[6]*y^6 24725c28e83SPiotr Jasiukajtis * 24825c28e83SPiotr Jasiukajtis * P3(y) = p3[0] + p3[1]*y + ... + p3[4]*y^4 24925c28e83SPiotr Jasiukajtis * Q3(y) = q3[0] + q3[1]*y + ... + q3[5]*y^5 25025c28e83SPiotr Jasiukajtis * 25125c28e83SPiotr Jasiukajtis * Remez precision of Ri(y): 25225c28e83SPiotr Jasiukajtis * |gamma(x)-(gzi+tzi*y) - y*y*Ri(y)| <= 2**-62.3 ... for i = 1 25325c28e83SPiotr Jasiukajtis * <= 2**-59.4 ... for i = 2 25425c28e83SPiotr Jasiukajtis * <= 2**-62.1 ... for i = 3 25525c28e83SPiotr Jasiukajtis * 25625c28e83SPiotr Jasiukajtis * (II) For quad precision 25725c28e83SPiotr Jasiukajtis * 25825c28e83SPiotr Jasiukajtis * Ri(y) = Pi(y)/Qi(y), i=1,2,3; 25925c28e83SPiotr Jasiukajtis * 26025c28e83SPiotr Jasiukajtis * P1(y) = p1[0] + p1[1]*y + ... + p1[9]*y^9 26125c28e83SPiotr Jasiukajtis * Q1(y) = q1[0] + q1[1]*y + ... + q1[8]*y^8 26225c28e83SPiotr Jasiukajtis * 26325c28e83SPiotr Jasiukajtis * P2(y) = p2[0] + p2[1]*y + ... + p2[9]*y^9 26425c28e83SPiotr Jasiukajtis * Q2(y) = q2[0] + q2[1]*y + ... + q2[9]*y^9 26525c28e83SPiotr Jasiukajtis * 26625c28e83SPiotr Jasiukajtis * P3(y) = p3[0] + p3[1]*y + ... + p3[9]*y^9 26725c28e83SPiotr Jasiukajtis * Q3(y) = q3[0] + q3[1]*y + ... + q3[9]*y^9 26825c28e83SPiotr Jasiukajtis * 26925c28e83SPiotr Jasiukajtis * Remez precision of Ri(y): 27025c28e83SPiotr Jasiukajtis * |gamma(x)-(gzi+tzi*y) - y*y*Ri(y)| <= 2**-118.2 ... for i = 1 27125c28e83SPiotr Jasiukajtis * <= 2**-126.8 ... for i = 2 27225c28e83SPiotr Jasiukajtis * <= 2**-119.5 ... for i = 3 27325c28e83SPiotr Jasiukajtis * 27425c28e83SPiotr Jasiukajtis * (III) For single precision 27525c28e83SPiotr Jasiukajtis * 27625c28e83SPiotr Jasiukajtis * Ri(y) = Pi(y), i=1,2,3; 27725c28e83SPiotr Jasiukajtis * 27825c28e83SPiotr Jasiukajtis * P1(y) = p1[0] + p1[1]*y + ... + p1[5]*y^5 27925c28e83SPiotr Jasiukajtis * 28025c28e83SPiotr Jasiukajtis * P2(y) = p2[0] + p2[1]*y + ... + p2[5]*y^5 28125c28e83SPiotr Jasiukajtis * 28225c28e83SPiotr Jasiukajtis * P3(y) = p3[0] + p3[1]*y + ... + p3[4]*y^4 28325c28e83SPiotr Jasiukajtis * 28425c28e83SPiotr Jasiukajtis * Remez precision of Ri(y): 28525c28e83SPiotr Jasiukajtis * |gamma(x)-(gzi+tzi*y) - y*y*Ri(y)| <= 2**-30.8 ... for i = 1 28625c28e83SPiotr Jasiukajtis * <= 2**-31.6 ... for i = 2 28725c28e83SPiotr Jasiukajtis * <= 2**-29.5 ... for i = 3 28825c28e83SPiotr Jasiukajtis * 28925c28e83SPiotr Jasiukajtis * Notes. (1) GTi and zi are choosen to balance the interval width and 29025c28e83SPiotr Jasiukajtis * minimize the distant between gamma(x) and the tangent line at 29125c28e83SPiotr Jasiukajtis * zi. In particular, we have 29225c28e83SPiotr Jasiukajtis * |gamma(x)-(gzi+tzi*(x-zi))| <= 0.01436... for x in [1,z2] 29325c28e83SPiotr Jasiukajtis * <= 0.01265... for x in [z2,2] 29425c28e83SPiotr Jasiukajtis * 29525c28e83SPiotr Jasiukajtis * (2) zi are slightly adjusted so that tzi=gamma'(zi) is very 29625c28e83SPiotr Jasiukajtis * close to a single precision value. 29725c28e83SPiotr Jasiukajtis * 29825c28e83SPiotr Jasiukajtis * Coefficents: Single precision 29925c28e83SPiotr Jasiukajtis * i= 1: 30025c28e83SPiotr Jasiukajtis * P1[0] = 7.09087253435088360271451613398019280077561279443e-0001 30125c28e83SPiotr Jasiukajtis * P1[1] = -5.17229560788652108545141978238701790105241761089e-0001 30225c28e83SPiotr Jasiukajtis * P1[2] = 5.23403394528150789405825222323770647162337764327e-0001 30325c28e83SPiotr Jasiukajtis * P1[3] = -4.54586308717075010784041566069480411732634814899e-0001 30425c28e83SPiotr Jasiukajtis * P1[4] = 4.20596490915239085459964590559256913498190955233e-0001 30525c28e83SPiotr Jasiukajtis * P1[5] = -3.57307589712377520978332185838241458642142185789e-0001 30625c28e83SPiotr Jasiukajtis * 30725c28e83SPiotr Jasiukajtis * i = 2: 30825c28e83SPiotr Jasiukajtis * p2[0] = 4.28486983980295198166056119223984284434264344578e-0001 30925c28e83SPiotr Jasiukajtis * p2[1] = -1.30704539487709138528680121627899735386650103914e-0001 31025c28e83SPiotr Jasiukajtis * p2[2] = 1.60856285038051955072861219352655851542955430871e-0001 31125c28e83SPiotr Jasiukajtis * p2[3] = -9.22285161346010583774458802067371182158937943507e-0002 31225c28e83SPiotr Jasiukajtis * p2[4] = 7.19240511767225260740890292605070595560626179357e-0002 31325c28e83SPiotr Jasiukajtis * p2[5] = -4.88158265593355093703112238534484636193260459574e-0002 31425c28e83SPiotr Jasiukajtis * 31525c28e83SPiotr Jasiukajtis * i = 3 31625c28e83SPiotr Jasiukajtis * p3[0] = 3.82409531118807759081121479786092134814808872880e-0001 31725c28e83SPiotr Jasiukajtis * p3[1] = 2.65309888180188647956400403013495759365167853426e-0002 31825c28e83SPiotr Jasiukajtis * p3[2] = 8.06815109775079171923561169415370309376296739835e-0002 31925c28e83SPiotr Jasiukajtis * p3[3] = -1.54821591666137613928840890835174351674007764799e-0002 32025c28e83SPiotr Jasiukajtis * p3[4] = 1.76308239242717268530498313416899188157165183405e-0002 32125c28e83SPiotr Jasiukajtis * 32225c28e83SPiotr Jasiukajtis * Coefficents: Double precision 32325c28e83SPiotr Jasiukajtis * i = 1: 32425c28e83SPiotr Jasiukajtis * p1[0] = 0.70908683619977797008004927192814648151397705078125000 32525c28e83SPiotr Jasiukajtis * p1[1] = 1.71987061393048558089579513384356441668351720061e-0001 32625c28e83SPiotr Jasiukajtis * p1[2] = -3.19273345791990970293320316122813960527705450671e-0002 32725c28e83SPiotr Jasiukajtis * p1[3] = 8.36172645419110036267169600390549973563534476989e-0003 32825c28e83SPiotr Jasiukajtis * p1[4] = 1.13745336648572838333152213474277971244629758101e-0003 32925c28e83SPiotr Jasiukajtis * q1[0] = 1.0 33025c28e83SPiotr Jasiukajtis * q1[1] = 9.71980217826032937526460731778472389791321968082e-0001 33125c28e83SPiotr Jasiukajtis * q1[2] = -7.43576743326756176594084137256042653497087666030e-0002 33225c28e83SPiotr Jasiukajtis * q1[3] = -1.19345944932265559769719470515102012246995255372e-0001 33325c28e83SPiotr Jasiukajtis * q1[4] = 1.59913445751425002620935120470781382215050284762e-0002 33425c28e83SPiotr Jasiukajtis * q1[5] = 1.12601136853374984566572691306402321911547550783e-0003 33525c28e83SPiotr Jasiukajtis * i = 2: 33625c28e83SPiotr Jasiukajtis * p2[0] = 0.42848681585558601181418225678498856723308563232421875 33725c28e83SPiotr Jasiukajtis * p2[1] = 6.53596762668970816023718845105667418483122103629e-0002 33825c28e83SPiotr Jasiukajtis * p2[2] = -6.97280829631212931321050770925128264272768936731e-0003 33925c28e83SPiotr Jasiukajtis * p2[3] = 6.46342359021981718947208605674813260166116632899e-0003 34025c28e83SPiotr Jasiukajtis * q2[0] = 1.0 34125c28e83SPiotr Jasiukajtis * q2[1] = 4.57572620560506047062553957454062012327519313936e-0001 34225c28e83SPiotr Jasiukajtis * q2[2] = -2.52182594886075452859655003407796103083422572036e-0001 34325c28e83SPiotr Jasiukajtis * q2[3] = -1.82970945407778594681348166040103197178711552827e-0002 34425c28e83SPiotr Jasiukajtis * q2[4] = 2.43574726993169566475227642128830141304953840502e-0002 34525c28e83SPiotr Jasiukajtis * q2[5] = -5.20390406466942525358645957564897411258667085501e-0003 34625c28e83SPiotr Jasiukajtis * q2[6] = 4.79520251383279837635552431988023256031951133885e-0004 34725c28e83SPiotr Jasiukajtis * i = 3: 34825c28e83SPiotr Jasiukajtis * p3[0] = 0.382409479734567459008331979930517263710498809814453125 34925c28e83SPiotr Jasiukajtis * p3[1] = 1.42876048697668161599069814043449301572928034140e-0001 35025c28e83SPiotr Jasiukajtis * p3[2] = 3.42157571052250536817923866013561760785748899071e-0003 35125c28e83SPiotr Jasiukajtis * p3[3] = -5.01542621710067521405087887856991700987709272937e-0004 35225c28e83SPiotr Jasiukajtis * p3[4] = 8.89285814866740910123834688163838287618332122670e-0004 35325c28e83SPiotr Jasiukajtis * q3[0] = 1.0 35425c28e83SPiotr Jasiukajtis * q3[1] = 3.04253086629444201002215640948957897906299633168e-0001 35525c28e83SPiotr Jasiukajtis * q3[2] = -2.23162407379999477282555672834881213873185520006e-0001 35625c28e83SPiotr Jasiukajtis * q3[3] = -1.05060867741952065921809811933670131427552903636e-0002 35725c28e83SPiotr Jasiukajtis * q3[4] = 1.70511763916186982473301861980856352005926669320e-0002 35825c28e83SPiotr Jasiukajtis * q3[5] = -2.12950201683609187927899416700094630764182477464e-0003 35925c28e83SPiotr Jasiukajtis * 36025c28e83SPiotr Jasiukajtis * Note that all pi0 are exact in double, which is obtained by a 36125c28e83SPiotr Jasiukajtis * special Remez Algorithm. 36225c28e83SPiotr Jasiukajtis * 36325c28e83SPiotr Jasiukajtis * Coefficents: Quad precision 36425c28e83SPiotr Jasiukajtis * i = 1: 36525c28e83SPiotr Jasiukajtis * p1[0] = 0.709086836199777919037185741507610124611513720557 36625c28e83SPiotr Jasiukajtis * p1[1] = 4.45754781206489035827915969367354835667391606951e-0001 36725c28e83SPiotr Jasiukajtis * p1[2] = 3.21049298735832382311662273882632210062918153852e-0002 36825c28e83SPiotr Jasiukajtis * p1[3] = -5.71296796342106617651765245858289197369688864350e-0003 36925c28e83SPiotr Jasiukajtis * p1[4] = 6.04666892891998977081619174969855831606965352773e-0003 37025c28e83SPiotr Jasiukajtis * p1[5] = 8.99106186996888711939627812174765258822658645168e-0004 37125c28e83SPiotr Jasiukajtis * p1[6] = -6.96496846144407741431207008527018441810175568949e-0005 37225c28e83SPiotr Jasiukajtis * p1[7] = 1.52597046118984020814225409300131445070213882429e-0005 37325c28e83SPiotr Jasiukajtis * p1[8] = 5.68521076168495673844711465407432189190681541547e-0007 37425c28e83SPiotr Jasiukajtis * p1[9] = 3.30749673519634895220582062520286565610418952979e-0008 37525c28e83SPiotr Jasiukajtis * q1[0] = 1.0+0000 37625c28e83SPiotr Jasiukajtis * q1[1] = 1.35806511721671070408570853537257079579490650668e+0000 37725c28e83SPiotr Jasiukajtis * q1[2] = 2.97567810153429553405327140096063086994072952961e-0001 37825c28e83SPiotr Jasiukajtis * q1[3] = -1.52956835982588571502954372821681851681118097870e-0001 37925c28e83SPiotr Jasiukajtis * q1[4] = -2.88248519561420109768781615289082053597954521218e-0002 38025c28e83SPiotr Jasiukajtis * q1[5] = 1.03475311719937405219789948456313936302378395955e-0002 38125c28e83SPiotr Jasiukajtis * q1[6] = 4.12310203243891222368965360124391297374822742313e-0004 38225c28e83SPiotr Jasiukajtis * q1[7] = -3.12653708152290867248931925120380729518332507388e-0004 38325c28e83SPiotr Jasiukajtis * q1[8] = 2.36672170850409745237358105667757760527014332458e-0005 38425c28e83SPiotr Jasiukajtis * 38525c28e83SPiotr Jasiukajtis * i = 2: 38625c28e83SPiotr Jasiukajtis * p2[0] = 0.428486815855585429730209907810650616737756697477 38725c28e83SPiotr Jasiukajtis * p2[1] = 2.63622124067885222919192651151581541943362617352e-0001 38825c28e83SPiotr Jasiukajtis * p2[2] = 3.85520683670028865731877276741390421744971446855e-0002 38925c28e83SPiotr Jasiukajtis * p2[3] = 3.05065978278128549958897133190295325258023525862e-0003 39025c28e83SPiotr Jasiukajtis * p2[4] = 2.48232934951723128892080415054084339152450445081e-0003 39125c28e83SPiotr Jasiukajtis * p2[5] = 3.67092777065632360693313762221411547741550105407e-0004 39225c28e83SPiotr Jasiukajtis * p2[6] = 3.81228045616085789674530902563145250532194518946e-0006 39325c28e83SPiotr Jasiukajtis * p2[7] = 4.61677225867087554059531455133839175822537617677e-0006 39425c28e83SPiotr Jasiukajtis * p2[8] = 2.18209052385703200438239200991201916609364872993e-0007 39525c28e83SPiotr Jasiukajtis * p2[9] = 1.00490538985245846460006244065624754421022542454e-0008 39625c28e83SPiotr Jasiukajtis * q2[0] = 1.0 39725c28e83SPiotr Jasiukajtis * q2[1] = 9.20276350207639290567783725273128544224570775056e-0001 39825c28e83SPiotr Jasiukajtis * q2[2] = -4.79533683654165107448020515733883781138947771495e-0003 39925c28e83SPiotr Jasiukajtis * q2[3] = -1.24538337585899300494444600248687901947684291683e-0001 40025c28e83SPiotr Jasiukajtis * q2[4] = 4.49866050763472358547524708431719114204535491412e-0003 40125c28e83SPiotr Jasiukajtis * q2[5] = 7.20715455697920560621638325356292640604078591907e-0003 40225c28e83SPiotr Jasiukajtis * q2[6] = -8.68513169029126780280798337091982780598228096116e-0004 40325c28e83SPiotr Jasiukajtis * q2[7] = -1.25104431629401181525027098222745544809974229874e-0004 40425c28e83SPiotr Jasiukajtis * q2[8] = 3.10558344839000038489191304550998047521253437464e-0005 40525c28e83SPiotr Jasiukajtis * q2[9] = -1.76829227852852176018537139573609433652506765712e-0006 40625c28e83SPiotr Jasiukajtis * 40725c28e83SPiotr Jasiukajtis * i = 3 40825c28e83SPiotr Jasiukajtis * p3[0] = 0.3824094797345675048502747661075355640070439388902 40925c28e83SPiotr Jasiukajtis * p3[1] = 3.42198093076618495415854906335908427159833377774e-0001 41025c28e83SPiotr Jasiukajtis * p3[2] = 9.63828189500585568303961406863153237440702754858e-0002 41125c28e83SPiotr Jasiukajtis * p3[3] = 8.76069421042696384852462044188520252156846768667e-0003 41225c28e83SPiotr Jasiukajtis * p3[4] = 1.86477890389161491224872014149309015261897537488e-0003 41325c28e83SPiotr Jasiukajtis * p3[5] = 8.16871354540309895879974742853701311541286944191e-0004 41425c28e83SPiotr Jasiukajtis * p3[6] = 6.83783483674600322518695090864659381650125625216e-0005 41525c28e83SPiotr Jasiukajtis * p3[7] = -1.10168269719261574708565935172719209272190828456e-0006 41625c28e83SPiotr Jasiukajtis * p3[8] = 9.66243228508380420159234853278906717065629721016e-0007 41725c28e83SPiotr Jasiukajtis * p3[9] = 2.31858885579177250541163820671121664974334728142e-0008 41825c28e83SPiotr Jasiukajtis * q3[0] = 1.0 41925c28e83SPiotr Jasiukajtis * q3[1] = 8.25479821168813634632437430090376252512793067339e-0001 42025c28e83SPiotr Jasiukajtis * q3[2] = -1.62251363073937769739639623669295110346015576320e-0002 42125c28e83SPiotr Jasiukajtis * q3[3] = -1.10621286905916732758745130629426559691187579852e-0001 42225c28e83SPiotr Jasiukajtis * q3[4] = 3.48309693970985612644446415789230015515365291459e-0003 42325c28e83SPiotr Jasiukajtis * q3[5] = 6.73553737487488333032431261131289672347043401328e-0003 42425c28e83SPiotr Jasiukajtis * q3[6] = -7.63222008393372630162743587811004613050245128051e-0004 42525c28e83SPiotr Jasiukajtis * q3[7] = -1.35792670669190631476784768961953711773073251336e-0004 42625c28e83SPiotr Jasiukajtis * q3[8] = 3.19610150954223587006220730065608156460205690618e-0005 42725c28e83SPiotr Jasiukajtis * q3[9] = -1.82096553862822346610109522015129585693354348322e-0006 42825c28e83SPiotr Jasiukajtis * 42925c28e83SPiotr Jasiukajtis * (C) For x between 0 and 1. 43025c28e83SPiotr Jasiukajtis * Let P stand for the number of significant bits in the working precision. 43125c28e83SPiotr Jasiukajtis * -P 1 43225c28e83SPiotr Jasiukajtis * (1)For 0 <= x <= 2 , gamma(x) is computed by --- rounded to nearest. 43325c28e83SPiotr Jasiukajtis * x 43425c28e83SPiotr Jasiukajtis * The error is bound by 0.739 ulp(gamma(x)) in IEEE double precision. 43525c28e83SPiotr Jasiukajtis * Proof. 43625c28e83SPiotr Jasiukajtis * 1 2 43725c28e83SPiotr Jasiukajtis * Since -------- ~ x + 0.577...*x - ..., we have, for small x, 43825c28e83SPiotr Jasiukajtis * gamma(x) 43925c28e83SPiotr Jasiukajtis * 1 1 44025c28e83SPiotr Jasiukajtis * ----------- < gamma(x) < --- and 44125c28e83SPiotr Jasiukajtis * x(1+0.578x) x 44225c28e83SPiotr Jasiukajtis * 1 1 1 44325c28e83SPiotr Jasiukajtis * 0 < --- - gamma(x) <= --- - ----------- < 0.578 44425c28e83SPiotr Jasiukajtis * x x x(1+0.578x) 44525c28e83SPiotr Jasiukajtis * 1 1 -P 44625c28e83SPiotr Jasiukajtis * The error is thus bounded by --- ulp(---) + 0.578. Since x <= 2 , 44725c28e83SPiotr Jasiukajtis * 2 x 44825c28e83SPiotr Jasiukajtis * 1 P 1 P 1 44925c28e83SPiotr Jasiukajtis * --- >= 2 , ulp(---) >= ulp(2 ) >= 2. Thus 0.578=0.289*2<=0.289ulp(-) 45025c28e83SPiotr Jasiukajtis * x x x 45125c28e83SPiotr Jasiukajtis * Thus 45225c28e83SPiotr Jasiukajtis * 1 1 45325c28e83SPiotr Jasiukajtis * | gamma(x) - [---] rounded | <= (0.5+0.289)*ulp(---). 45425c28e83SPiotr Jasiukajtis * x x 45525c28e83SPiotr Jasiukajtis * -P 1 45625c28e83SPiotr Jasiukajtis * Note that for x<= 2 , it is easy to see that ulp(---)=ulp(gamma(x)) 45725c28e83SPiotr Jasiukajtis * x 45825c28e83SPiotr Jasiukajtis * n 1 45925c28e83SPiotr Jasiukajtis * except only when x = 2 , (n<= -53). In such cases, --- is exact 46025c28e83SPiotr Jasiukajtis * x 46125c28e83SPiotr Jasiukajtis * and therefore the error is bounded by 46225c28e83SPiotr Jasiukajtis * 1 46325c28e83SPiotr Jasiukajtis * 0.298*ulp(---) = 0.298*2*ulp(gamma(x)) = 0.578ulp(gamma(x)). 46425c28e83SPiotr Jasiukajtis * x 46525c28e83SPiotr Jasiukajtis * Thus we conclude that the error in gamma is less than 0.739 ulp. 46625c28e83SPiotr Jasiukajtis * 46725c28e83SPiotr Jasiukajtis * (2)Otherwise, for x in GTi-1 (see B), let y = x-(zi-1). From (B) we obtain 46825c28e83SPiotr Jasiukajtis * gamma(1+x) 46925c28e83SPiotr Jasiukajtis * gamma(1+x) = gy.h + gy.l, then compute gamma(x) by -----------. 47025c28e83SPiotr Jasiukajtis * x 47125c28e83SPiotr Jasiukajtis * gy.h 47225c28e83SPiotr Jasiukajtis * Implementaion note. Write x = x.h+x.l, and Let th = ----- chopped to 47325c28e83SPiotr Jasiukajtis * x 47425c28e83SPiotr Jasiukajtis * 20 bits, then 47525c28e83SPiotr Jasiukajtis * gy.h+gy.l 47625c28e83SPiotr Jasiukajtis * gamma(x) = th + (---------- - th ) 47725c28e83SPiotr Jasiukajtis * x 47825c28e83SPiotr Jasiukajtis * 1 47925c28e83SPiotr Jasiukajtis * = th + ---*(gy.h-th*x.h+gy.l-th*x.l) 48025c28e83SPiotr Jasiukajtis * x 48125c28e83SPiotr Jasiukajtis * 48225c28e83SPiotr Jasiukajtis * (D) For x between 2 and 8. Let n = 1+x chopped to an integer. Then 48325c28e83SPiotr Jasiukajtis * 48425c28e83SPiotr Jasiukajtis * gamma(x)=(x-1)*(x-2)*...*(x-n)*gamma(x-n) 48525c28e83SPiotr Jasiukajtis * 48625c28e83SPiotr Jasiukajtis * Since x-n is between 1 and 2, we can apply (B) to compute gamma(x). 48725c28e83SPiotr Jasiukajtis * 48825c28e83SPiotr Jasiukajtis * Implementation detail. The computation of (x-1)(x-2)...(x-n) in simulated 48925c28e83SPiotr Jasiukajtis * higher precision arithmetic can be somewhat optimized. For example, in 49025c28e83SPiotr Jasiukajtis * computing (x-1)*(x-2)*(x-3)*(x-4), if we compute (x-1)*(x-4) = z.h+z.l, 49125c28e83SPiotr Jasiukajtis * then (x-2)(x-3) = z.h+2+z.l readily. In below, we list the expression 49225c28e83SPiotr Jasiukajtis * of the formula to compute gamma(x). 49325c28e83SPiotr Jasiukajtis * 49425c28e83SPiotr Jasiukajtis * Assume x-n is in GTi (i=1,2, or 3, see B for detail). Let y = x - n - zi. 49525c28e83SPiotr Jasiukajtis * By (B) we have gamma(x-n) = gy.h+gy.l. If x = x.h+x.l, then we have 49625c28e83SPiotr Jasiukajtis * n=1 (x in [2,3]): 49725c28e83SPiotr Jasiukajtis * gamma(x) = (x-1)*gamma(x-1) = (x-1)*(gy.h+gy.l) 49825c28e83SPiotr Jasiukajtis * = [(x.h-1)+x.l]*(gy.h+gy.l) 49925c28e83SPiotr Jasiukajtis * n=2 (x in [3,4]): 50025c28e83SPiotr Jasiukajtis * gamma(x) = (x-1)(x-2)*gamma(x-2) = (x-1)*(x-2)*(gy.h+gy.l) 50125c28e83SPiotr Jasiukajtis * = ((x.h-2)+x.l)*((x.h-1)+x.l)*(gy.h+gy.l) 50225c28e83SPiotr Jasiukajtis * = [x.h*(x.h-3)+2+x.l*(x+(x.h-3))]*(gy.h+gy.l) 50325c28e83SPiotr Jasiukajtis * n=3 (x in [4,5]) 50425c28e83SPiotr Jasiukajtis * gamma(x) = (x-1)(x-2)(x-3)*(gy.h+gy.l) 50525c28e83SPiotr Jasiukajtis * = (x.h*(x.h-3)+2+x.l*(x+(x.h-3)))*[((x.h-3)+x.l)(gy.h+gy.l)] 50625c28e83SPiotr Jasiukajtis * n=4 (x in [5,6]) 50725c28e83SPiotr Jasiukajtis * gamma(x) = [(x-1)(x-4)]*[(x-2)(x-3)]*(gy.h+gy.l) 50825c28e83SPiotr Jasiukajtis * = [(x.h*(x.h-5)+4+x.l(x+(x.h-5)))]*[(x-2)*(x-3)]*(gy.h+gy.l) 50925c28e83SPiotr Jasiukajtis * = (y.h+y.l)*(y.h+1+y.l)*(gy.h+gy.l) 51025c28e83SPiotr Jasiukajtis * n=5 (x in [6,7]) 51125c28e83SPiotr Jasiukajtis * gamma(x) = [(x-1)(x-4)]*[(x-2)(x-3)]*[(x-5)*(gy.h+gy.l)] 51225c28e83SPiotr Jasiukajtis * n=6 (x in [7,8]) 51325c28e83SPiotr Jasiukajtis * gamma(x) = [(x-1)(x-6)]*[(x-2)(x-5)]*[(x-3)(x-4)]*(gy.h+gy.l)] 51425c28e83SPiotr Jasiukajtis * = [(y.h+y.l)(y.h+4+y.l)][(y.h+6+y.l)(gy.h+gy.l)] 51525c28e83SPiotr Jasiukajtis * 51625c28e83SPiotr Jasiukajtis * (E)Overflow Thresold. For x > Overflow thresold of gamma, 51725c28e83SPiotr Jasiukajtis * return huge*huge (overflow). 51825c28e83SPiotr Jasiukajtis * 51925c28e83SPiotr Jasiukajtis * By checking whether lgamma(x) >= 2**{128,1024,16384}, one can 52025c28e83SPiotr Jasiukajtis * determine the overflow threshold for x in single, double, and 52125c28e83SPiotr Jasiukajtis * quad precision. See over.c for details. 52225c28e83SPiotr Jasiukajtis * 52325c28e83SPiotr Jasiukajtis * The overflow threshold of gamma(x) are 52425c28e83SPiotr Jasiukajtis * 52525c28e83SPiotr Jasiukajtis * single: x = 3.5040096283e+01 52625c28e83SPiotr Jasiukajtis * = 0x420C290F (IEEE single) 52725c28e83SPiotr Jasiukajtis * double: x = 1.71624376956302711505e+02 52825c28e83SPiotr Jasiukajtis * = 0x406573FAE561F647 (IEEE double) 52925c28e83SPiotr Jasiukajtis * quad: x = 1.7555483429044629170038892160702032034177e+03 53025c28e83SPiotr Jasiukajtis * = 0x4009B6E3180CD66A5C4206F128BA77F4 (quad) 53125c28e83SPiotr Jasiukajtis * 53225c28e83SPiotr Jasiukajtis * (F)For overflow_threshold >= x >= 8, we use asymptotic approximation. 53325c28e83SPiotr Jasiukajtis * (1) Stirling's formula 53425c28e83SPiotr Jasiukajtis * 53525c28e83SPiotr Jasiukajtis * log(G(x)) ~= (x-.5)*(log(x)-1) + .5(log(2*pi)-1) + (1/x)*P(1/(x*x)) 53625c28e83SPiotr Jasiukajtis * = L1 + L2 + L3, 53725c28e83SPiotr Jasiukajtis * where 53825c28e83SPiotr Jasiukajtis * L1(x) = (x-.5)*(log(x)-1), 53925c28e83SPiotr Jasiukajtis * L2 = .5(log(2pi)-1) = 0.41893853...., 54025c28e83SPiotr Jasiukajtis * L3(x) = (1/x)P(1/(x*x)), 54125c28e83SPiotr Jasiukajtis * 54225c28e83SPiotr Jasiukajtis * The range of L1,L2, and L3 are as follows: 54325c28e83SPiotr Jasiukajtis * 54425c28e83SPiotr Jasiukajtis * ------------------------------------------------------------------ 54525c28e83SPiotr Jasiukajtis * Range(L1) = (single) [8.09..,88.30..] =[2** 3.01..,2** 6.46..] 54625c28e83SPiotr Jasiukajtis * (double) [8.09..,709.3..] =[2** 3.01..,2** 9.47..] 54725c28e83SPiotr Jasiukajtis * (quad) [8.09..,11356.10..]=[2** 3.01..,2** 13.47..] 54825c28e83SPiotr Jasiukajtis * Range(L2) = 0.41893853..... 54925c28e83SPiotr Jasiukajtis * Range(L3) = [0.0104...., 0.00048....] =[2**-6.58..,2**-11.02..] 55025c28e83SPiotr Jasiukajtis * ------------------------------------------------------------------ 55125c28e83SPiotr Jasiukajtis * 55225c28e83SPiotr Jasiukajtis * Gamma(x) is then computed by exp(L1+L2+L3). 55325c28e83SPiotr Jasiukajtis * 55425c28e83SPiotr Jasiukajtis * (2) Error analysis of (F): 55525c28e83SPiotr Jasiukajtis * -------------------------- 55625c28e83SPiotr Jasiukajtis * The error in Gamma(x) depends on the error inherited in the computation 55725c28e83SPiotr Jasiukajtis * of L= L1+L2+L3. Let L' be the computed value of L. The absolute error 55825c28e83SPiotr Jasiukajtis * in L' is t = L-L'. Since exp(L') = exp(L-t) = exp(L)*exp(t) ~ 55925c28e83SPiotr Jasiukajtis * (1+t)*exp(L), the relative error in exp(L') is approximately t. 56025c28e83SPiotr Jasiukajtis * 56125c28e83SPiotr Jasiukajtis * To guarantee the relatively accuracy in exp(L'), we would like 56225c28e83SPiotr Jasiukajtis * |t| < 2**(-P-5) where P denotes for the number of significant bits 56325c28e83SPiotr Jasiukajtis * of the working precision. Consequently, each of the L1,L2, and L3 56425c28e83SPiotr Jasiukajtis * must be computed with absolute error bounded by 2**(-P-5) in absolute 56525c28e83SPiotr Jasiukajtis * value. 56625c28e83SPiotr Jasiukajtis * 56725c28e83SPiotr Jasiukajtis * Since L2 is a constant, it can be pre-computed to the desired accuracy. 56825c28e83SPiotr Jasiukajtis * Also |L3| < 2**-6; therefore, it suffices to compute L3 with the 56925c28e83SPiotr Jasiukajtis * working precision. That is, 57025c28e83SPiotr Jasiukajtis * L3(x) approxmiate log(G(x))-(x-.5)(log(x)-1)-.5(log(2pi)-1) 57125c28e83SPiotr Jasiukajtis * to a precision bounded by 2**(-P-5). 57225c28e83SPiotr Jasiukajtis * 57325c28e83SPiotr Jasiukajtis * 2**(-6) 57425c28e83SPiotr Jasiukajtis * _________V___________________ 57525c28e83SPiotr Jasiukajtis * L1(x): |_________|___________________| 57625c28e83SPiotr Jasiukajtis * __ ________________________ 57725c28e83SPiotr Jasiukajtis * L2: |__|________________________| 57825c28e83SPiotr Jasiukajtis * __________________________ 57925c28e83SPiotr Jasiukajtis * + L3(x): |__________________________| 58025c28e83SPiotr Jasiukajtis * ------------------------------------------- 58125c28e83SPiotr Jasiukajtis * [leading] + [Trailing] 58225c28e83SPiotr Jasiukajtis * 58325c28e83SPiotr Jasiukajtis * For L1(x)=(x-0.5)*(log(x)-1), we need ilogb(L1(x))+5 extra bits for 58425c28e83SPiotr Jasiukajtis * both multiplicants to guarantee L1(x)'s absolute error is bounded by 58525c28e83SPiotr Jasiukajtis * 2**(-P-5) in absolute value. Here ilogb(y) is defined to be the unbias 58625c28e83SPiotr Jasiukajtis * binary exponent of y in IEEE format. We can get x-0.5 to the desire 58725c28e83SPiotr Jasiukajtis * accuracy easily. It remains to compute log(x)-1 with ilogb(L1(x))+5 58825c28e83SPiotr Jasiukajtis * extra bits accracy. Note that the range of L1 is 88.30.., 709.3.., and 58925c28e83SPiotr Jasiukajtis * 11356.10... for single, double, and quadruple precision, we have 59025c28e83SPiotr Jasiukajtis * 59125c28e83SPiotr Jasiukajtis * single double quadruple 59225c28e83SPiotr Jasiukajtis * ------------------------------------ 59325c28e83SPiotr Jasiukajtis * ilogb(L1(x))+5 <= 11 14 18 59425c28e83SPiotr Jasiukajtis * ------------------------------------ 59525c28e83SPiotr Jasiukajtis * 59625c28e83SPiotr Jasiukajtis * (3) Table Driven Method for log(x)-1: 59725c28e83SPiotr Jasiukajtis * -------------------------------------- 59825c28e83SPiotr Jasiukajtis * Let x = 2**n * y, where 1 <= y < 2. Let Z={z(i),i=1,...,m} 59925c28e83SPiotr Jasiukajtis * be a set of predetermined evenly distributed floating point numbers 60025c28e83SPiotr Jasiukajtis * in [1, 2]. Let z(j) be the closest one to y, then 60125c28e83SPiotr Jasiukajtis * log(x)-1 = n*log(2)-1 + log(y) 60225c28e83SPiotr Jasiukajtis * = n*log(2)-1 + log(z(j)*y/z(j)) 60325c28e83SPiotr Jasiukajtis * = n*log(2)-1 + log(z(j)) + log(y/z(j)) 60425c28e83SPiotr Jasiukajtis * = T1(n) + T2(j) + T3, 60525c28e83SPiotr Jasiukajtis * 60625c28e83SPiotr Jasiukajtis * where T1(n) = n*log(2)-1 and T2(j) = log(z(j)). Both T1 and T2 can be 60725c28e83SPiotr Jasiukajtis * pre-calculated and be looked-up in a table. Note that 8 <= x < 1756 60825c28e83SPiotr Jasiukajtis * implies 3<=n<=10 implies 1.079.. < T1(n) < 6.931. 60925c28e83SPiotr Jasiukajtis * 61025c28e83SPiotr Jasiukajtis * 61125c28e83SPiotr Jasiukajtis * y-z(i) y 1+s 61225c28e83SPiotr Jasiukajtis * For T3, let s = --------; then ----- = ----- and 61325c28e83SPiotr Jasiukajtis * y+z(i) z(i) 1-s 61425c28e83SPiotr Jasiukajtis * 1+s 2 3 2 5 61525c28e83SPiotr Jasiukajtis * T3 = log(-----) = 2s + --- s + --- s + .... 61625c28e83SPiotr Jasiukajtis * 1-s 3 5 61725c28e83SPiotr Jasiukajtis * 61825c28e83SPiotr Jasiukajtis * Suppose the first term 2s is compute in extra precision. The 61925c28e83SPiotr Jasiukajtis * dominating error in T3 would then be the rounding error of the 62025c28e83SPiotr Jasiukajtis * second term 2/3*s**3. To force the rounding bounded by 62125c28e83SPiotr Jasiukajtis * the required accuracy, we have 62225c28e83SPiotr Jasiukajtis * single: |2/3*s**3| < 2**-11 == > |s|<0.09014... 62325c28e83SPiotr Jasiukajtis * double: |2/3*s**3| < 2**-14 == > |s|<0.04507... 62425c28e83SPiotr Jasiukajtis * quad : |2/3*s**3| < 2**-18 == > |s|<0.01788... = 2**(-5.80..) 62525c28e83SPiotr Jasiukajtis * 62625c28e83SPiotr Jasiukajtis * Base on this analysis, we choose Z = {z(i)|z(i)=1+i/64+1/128, 0<=i<=63}. 62725c28e83SPiotr Jasiukajtis * For any y in [1,2), let j = [64*y] chopped to integer, then z(j) is 62825c28e83SPiotr Jasiukajtis * the closest to y, and it is not difficult to see that |s| < 2**(-8). 62925c28e83SPiotr Jasiukajtis * Please note that the polynomial approximation of T3 must be accurate 63025c28e83SPiotr Jasiukajtis * -24-11 -35 -53-14 -67 -113-18 -131 63125c28e83SPiotr Jasiukajtis * to 2 =2 , 2 = 2 , and 2 =2 63225c28e83SPiotr Jasiukajtis * for single, double, and quadruple precision respectively. 63325c28e83SPiotr Jasiukajtis * 63425c28e83SPiotr Jasiukajtis * Inplementation notes. 63525c28e83SPiotr Jasiukajtis * (1) Table look-up entries for T1(n) and T2(j), as well as the calculation 63625c28e83SPiotr Jasiukajtis * of the leading term 2s in T3, are broken up into leading and trailing 63725c28e83SPiotr Jasiukajtis * part such that (leading part)* 2**24 will always be an integer. That 63825c28e83SPiotr Jasiukajtis * will guarantee the addition of the leading parts will be exact. 63925c28e83SPiotr Jasiukajtis * 64025c28e83SPiotr Jasiukajtis * 2**(-24) 64125c28e83SPiotr Jasiukajtis * _________V___________________ 64225c28e83SPiotr Jasiukajtis * T1(n): |_________|___________________| 64325c28e83SPiotr Jasiukajtis * _______ ______________________ 64425c28e83SPiotr Jasiukajtis * T2(j): |_______|______________________| 64525c28e83SPiotr Jasiukajtis * ____ _______________________ 64625c28e83SPiotr Jasiukajtis * 2s: |____|_______________________| 64725c28e83SPiotr Jasiukajtis * __________________________ 64825c28e83SPiotr Jasiukajtis * + T3(s)-2s: |__________________________| 64925c28e83SPiotr Jasiukajtis * ------------------------------------------- 65025c28e83SPiotr Jasiukajtis * [leading] + [Trailing] 65125c28e83SPiotr Jasiukajtis * 65225c28e83SPiotr Jasiukajtis * (2) How to compute 2s accurately. 65325c28e83SPiotr Jasiukajtis * (A) Compute v = 2s to the working precision. If |v| < 2**(-18), 65425c28e83SPiotr Jasiukajtis * stop. 65525c28e83SPiotr Jasiukajtis * (B) chopped v to 2**(-24): v = ((int)(v*2**24))/2**24 65625c28e83SPiotr Jasiukajtis * (C) 2s = v + (2s - v), where 65725c28e83SPiotr Jasiukajtis * 1 65825c28e83SPiotr Jasiukajtis * 2s - v = --- * (2(y-z) - v*(y+z) ) 65925c28e83SPiotr Jasiukajtis * y+z 66025c28e83SPiotr Jasiukajtis * 1 66125c28e83SPiotr Jasiukajtis * = --- * ( [2(y-z) - v*(y+z)_h ] - v*(y+z)_l ) 66225c28e83SPiotr Jasiukajtis * y+z 66325c28e83SPiotr Jasiukajtis * where (y+z)_h = (y+z) rounded to 24 bits by (double)(float), 66425c28e83SPiotr Jasiukajtis * and (y+z)_l = ((z+z)-(y+z)_h)+(y-z). Note the the quantity 66525c28e83SPiotr Jasiukajtis * in [] is exact. 66625c28e83SPiotr Jasiukajtis * 2 4 66725c28e83SPiotr Jasiukajtis * (3) Remez approximation for (T3(s)-2s)/s = T3[0]*s + T3[1]*s + ...: 66825c28e83SPiotr Jasiukajtis * Single precision: 1 term (compute in double precision arithmetic) 66925c28e83SPiotr Jasiukajtis * T3(s) = 2s + S1*s^3, S1 = 0.6666717231848518054693623697539230 67025c28e83SPiotr Jasiukajtis * Remez error: |T3(s)/s - (2s+S1*s^3)| < 2**(-35.87) 67125c28e83SPiotr Jasiukajtis * Double precision: 3 terms, Remez error is bounded by 2**(-72.40), 67225c28e83SPiotr Jasiukajtis * see "tgamma_log" 67325c28e83SPiotr Jasiukajtis * Quad precision: 7 terms, Remez error is bounded by 2**(-136.54), 67425c28e83SPiotr Jasiukajtis * see "tgammal_log" 67525c28e83SPiotr Jasiukajtis * 67625c28e83SPiotr Jasiukajtis * The computation of 0.5*(ln(2pi)-1): 67725c28e83SPiotr Jasiukajtis * 0.5*(ln(2pi)-1) = 0.4189385332046727417803297364056176398614... 67825c28e83SPiotr Jasiukajtis * split 0.5*(ln(2pi)-1) to hln2pi_h + hln2pi_l, where hln2pi_h is the 67925c28e83SPiotr Jasiukajtis * leading 21 bits of the constant. 68025c28e83SPiotr Jasiukajtis * hln2pi_h= 0.4189383983612060546875 68125c28e83SPiotr Jasiukajtis * hln2pi_l= 1.348434666870928297364056176398612173648e-07 68225c28e83SPiotr Jasiukajtis * 68325c28e83SPiotr Jasiukajtis * The computation of 1/x*P(1/x^2) = log(G(x))-(x-.5)(ln(x)-1)-(.5ln(2pi)-1): 68425c28e83SPiotr Jasiukajtis * Let s = 1/x <= 1/8 < 0.125. We have 68525c28e83SPiotr Jasiukajtis * quad precision 68625c28e83SPiotr Jasiukajtis * |GP(s) - s*P(s^2)| <= 2**(-120.6), where 68725c28e83SPiotr Jasiukajtis * 3 5 39 68825c28e83SPiotr Jasiukajtis * GP(s) = GP0*s+GP1*s +GP2*s +... +GP19*s , 68925c28e83SPiotr Jasiukajtis * GP0 = 0.083333333333333333333333333333333172839171301 69025c28e83SPiotr Jasiukajtis * hex 0x3ffe5555 55555555 55555555 55555548 69125c28e83SPiotr Jasiukajtis * GP1 = -2.77777777777777777777777777492501211999399424104e-0003 69225c28e83SPiotr Jasiukajtis * GP2 = 7.93650793650793650793635650541638236350020883243e-0004 69325c28e83SPiotr Jasiukajtis * GP3 = -5.95238095238095238057299772679324503339241961704e-0004 69425c28e83SPiotr Jasiukajtis * GP4 = 8.41750841750841696138422987977683524926142600321e-0004 69525c28e83SPiotr Jasiukajtis * GP5 = -1.91752691752686682825032547823699662178842123308e-0003 69625c28e83SPiotr Jasiukajtis * GP6 = 6.41025641022403480921891559356473451161279359322e-0003 69725c28e83SPiotr Jasiukajtis * GP7 = -2.95506535798414019189819587455577003732808185071e-0002 69825c28e83SPiotr Jasiukajtis * GP8 = 1.79644367229970031486079180060923073476568732136e-0001 69925c28e83SPiotr Jasiukajtis * GP9 = -1.39243086487274662174562872567057200255649290646e+0000 70025c28e83SPiotr Jasiukajtis * GP10 = 1.34025874044417962188677816477842265259608269775e+0001 70125c28e83SPiotr Jasiukajtis * GP11 = -1.56803713480127469414495545399982508700748274318e+0002 70225c28e83SPiotr Jasiukajtis * GP12 = 2.18739841656201561694927630335099313968924493891e+0003 70325c28e83SPiotr Jasiukajtis * GP13 = -3.55249848644100338419187038090925410976237921269e+0004 70425c28e83SPiotr Jasiukajtis * GP14 = 6.43464880437835286216768959439484376449179576452e+0005 70525c28e83SPiotr Jasiukajtis * GP15 = -1.20459154385577014992600342782821389605893904624e+0007 70625c28e83SPiotr Jasiukajtis * GP16 = 2.09263249637351298563934942349749718491071093210e+0008 70725c28e83SPiotr Jasiukajtis * GP17 = -2.96247483183169219343745316433899599834685703457e+0009 70825c28e83SPiotr Jasiukajtis * GP18 = 2.88984933605896033154727626086506756972327292981e+0010 70925c28e83SPiotr Jasiukajtis * GP19 = -1.40960434146030007732838382416230610302678063984e+0011 71025c28e83SPiotr Jasiukajtis * 71125c28e83SPiotr Jasiukajtis * double precision 71225c28e83SPiotr Jasiukajtis * |GP(s) - s*P(s^2)| <= 2**(-63.5), where 71325c28e83SPiotr Jasiukajtis * 3 5 7 9 11 13 15 71425c28e83SPiotr Jasiukajtis * GP(s) = GP0*s+GP1*s +GP2*s +GP3*s +GP4*s +GP5*s +GP6*s +GP7*s , 71525c28e83SPiotr Jasiukajtis * 71625c28e83SPiotr Jasiukajtis * GP0= 0.0833333333333333287074040640618477 (3FB55555 55555555) 71725c28e83SPiotr Jasiukajtis * GP1= -2.77777777776649355200565611114627670089130772843e-0003 71825c28e83SPiotr Jasiukajtis * GP2= 7.93650787486083724805476194170211775784158551509e-0004 71925c28e83SPiotr Jasiukajtis * GP3= -5.95236628558314928757811419580281294593903582971e-0004 72025c28e83SPiotr Jasiukajtis * GP4= 8.41566473999853451983137162780427812781178932540e-0004 72125c28e83SPiotr Jasiukajtis * GP5= -1.90424776670441373564512942038926168175921303212e-0003 72225c28e83SPiotr Jasiukajtis * GP6= 5.84933161530949666312333949534482303007354299178e-0003 72325c28e83SPiotr Jasiukajtis * GP7= -1.59453228931082030262124832506144392496561694550e-0002 72425c28e83SPiotr Jasiukajtis * single precision 72525c28e83SPiotr Jasiukajtis * |GP(s) - s*P(s^2)| <= 2**(-37.78), where 72625c28e83SPiotr Jasiukajtis * 3 5 72725c28e83SPiotr Jasiukajtis * GP(s) = GP0*s+GP1*s +GP2*s 72825c28e83SPiotr Jasiukajtis * GP0 = 8.33333330959694065245736888749042811909994573178e-0002 72925c28e83SPiotr Jasiukajtis * GP1 = -2.77765545601667179767706600890361535225507762168e-0003 73025c28e83SPiotr Jasiukajtis * GP2 = 7.77830853479775281781085278324621033523037489883e-0004 73125c28e83SPiotr Jasiukajtis * 73225c28e83SPiotr Jasiukajtis * 73325c28e83SPiotr Jasiukajtis * Implementation note: 73425c28e83SPiotr Jasiukajtis * z = (1/x), z2 = z*z, z4 = z2*z2; 73525c28e83SPiotr Jasiukajtis * p = z*(GP0+z2*(GP1+....+z2*GP7)) 73625c28e83SPiotr Jasiukajtis * = z*(GP0+(z4*(GP2+z4*(GP4+z4*GP6))+z2*(GP1+z4*(GP3+z4*(GP5+z4*GP7))))) 73725c28e83SPiotr Jasiukajtis * 73825c28e83SPiotr Jasiukajtis * Adding everything up: 73925c28e83SPiotr Jasiukajtis * t = rr.h*ww.h+hln2pi_h ... exact 74025c28e83SPiotr Jasiukajtis * w = (hln2pi_l + ((x-0.5)*ww.l+rr.l*ww.h)) + p 74125c28e83SPiotr Jasiukajtis * 74225c28e83SPiotr Jasiukajtis * Computing exp(t+w): 74325c28e83SPiotr Jasiukajtis * s = t+w; write s = (n+j/32)*ln2+r, |r|<=(1/64)*ln2, then 74425c28e83SPiotr Jasiukajtis * exp(s) = 2**n * (2**(j/32) + 2**(j/32)*expm1(r)), where 74525c28e83SPiotr Jasiukajtis * expm1(r) = r + Et1*r^2 + Et2*r^3 + ... + Et5*r^6, and 74625c28e83SPiotr Jasiukajtis * 2**(j/32) is obtained by table look-up S[j]+S_trail[j]. 74725c28e83SPiotr Jasiukajtis * Remez error bound: 74825c28e83SPiotr Jasiukajtis * |exp(r) - (1+r+Et1*r^2+...+Et5*r^6)| <= 2^(-63). 74925c28e83SPiotr Jasiukajtis */ 75025c28e83SPiotr Jasiukajtis 75125c28e83SPiotr Jasiukajtis #include "libm.h" 75225c28e83SPiotr Jasiukajtis 75325c28e83SPiotr Jasiukajtis #define __HI(x) ((int *) &x)[HIWORD] 75425c28e83SPiotr Jasiukajtis #define __LO(x) ((unsigned *) &x)[LOWORD] 75525c28e83SPiotr Jasiukajtis 75625c28e83SPiotr Jasiukajtis struct Double { 75725c28e83SPiotr Jasiukajtis double h; 75825c28e83SPiotr Jasiukajtis double l; 75925c28e83SPiotr Jasiukajtis }; 76025c28e83SPiotr Jasiukajtis 76125c28e83SPiotr Jasiukajtis /* Hex value of GP0 shoule be 3FB55555 55555555 */ 76225c28e83SPiotr Jasiukajtis static const double c[] = { 76325c28e83SPiotr Jasiukajtis +1.0, 76425c28e83SPiotr Jasiukajtis +2.0, 76525c28e83SPiotr Jasiukajtis +0.5, 76625c28e83SPiotr Jasiukajtis +1.0e-300, 76725c28e83SPiotr Jasiukajtis +6.66666666666666740682e-01, /* A1=T3[0] */ 76825c28e83SPiotr Jasiukajtis +3.99999999955626478023093908674902212920e-01, /* A2=T3[1] */ 76925c28e83SPiotr Jasiukajtis +2.85720221533145659809237398709372330980e-01, /* A3=T3[2] */ 77025c28e83SPiotr Jasiukajtis +0.0833333333333333287074040640618477, /* GP[0] */ 77125c28e83SPiotr Jasiukajtis -2.77777777776649355200565611114627670089130772843e-03, 77225c28e83SPiotr Jasiukajtis +7.93650787486083724805476194170211775784158551509e-04, 77325c28e83SPiotr Jasiukajtis -5.95236628558314928757811419580281294593903582971e-04, 77425c28e83SPiotr Jasiukajtis +8.41566473999853451983137162780427812781178932540e-04, 77525c28e83SPiotr Jasiukajtis -1.90424776670441373564512942038926168175921303212e-03, 77625c28e83SPiotr Jasiukajtis +5.84933161530949666312333949534482303007354299178e-03, 77725c28e83SPiotr Jasiukajtis -1.59453228931082030262124832506144392496561694550e-02, 77825c28e83SPiotr Jasiukajtis +4.18937683105468750000e-01, /* hln2pi_h */ 77925c28e83SPiotr Jasiukajtis +8.50099203991780279640e-07, /* hln2pi_l */ 78025c28e83SPiotr Jasiukajtis +4.18938533204672741744150788368695779923320328369e-01, /* hln2pi */ 78125c28e83SPiotr Jasiukajtis +2.16608493865351192653e-02, /* ln2_32hi */ 78225c28e83SPiotr Jasiukajtis +5.96317165397058656257e-12, /* ln2_32lo */ 78325c28e83SPiotr Jasiukajtis +4.61662413084468283841e+01, /* invln2_32 */ 78425c28e83SPiotr Jasiukajtis +5.0000000000000000000e-1, /* Et1 */ 78525c28e83SPiotr Jasiukajtis +1.66666666665223585560605991943703896196054020060e-01, /* Et2 */ 78625c28e83SPiotr Jasiukajtis +4.16666666665895103520154073534275286743788421687e-02, /* Et3 */ 78725c28e83SPiotr Jasiukajtis +8.33336844093536520775865096538773197505523826029e-03, /* Et4 */ 78825c28e83SPiotr Jasiukajtis +1.38889201930843436040204096950052984793587640227e-03, /* Et5 */ 78925c28e83SPiotr Jasiukajtis }; 79025c28e83SPiotr Jasiukajtis 79125c28e83SPiotr Jasiukajtis #define one c[0] 79225c28e83SPiotr Jasiukajtis #define two c[1] 79325c28e83SPiotr Jasiukajtis #define half c[2] 79425c28e83SPiotr Jasiukajtis #define tiny c[3] 79525c28e83SPiotr Jasiukajtis #define A1 c[4] 79625c28e83SPiotr Jasiukajtis #define A2 c[5] 79725c28e83SPiotr Jasiukajtis #define A3 c[6] 79825c28e83SPiotr Jasiukajtis #define GP0 c[7] 79925c28e83SPiotr Jasiukajtis #define GP1 c[8] 80025c28e83SPiotr Jasiukajtis #define GP2 c[9] 80125c28e83SPiotr Jasiukajtis #define GP3 c[10] 80225c28e83SPiotr Jasiukajtis #define GP4 c[11] 80325c28e83SPiotr Jasiukajtis #define GP5 c[12] 80425c28e83SPiotr Jasiukajtis #define GP6 c[13] 80525c28e83SPiotr Jasiukajtis #define GP7 c[14] 80625c28e83SPiotr Jasiukajtis #define hln2pi_h c[15] 80725c28e83SPiotr Jasiukajtis #define hln2pi_l c[16] 80825c28e83SPiotr Jasiukajtis #define hln2pi c[17] 80925c28e83SPiotr Jasiukajtis #define ln2_32hi c[18] 81025c28e83SPiotr Jasiukajtis #define ln2_32lo c[19] 81125c28e83SPiotr Jasiukajtis #define invln2_32 c[20] 81225c28e83SPiotr Jasiukajtis #define Et1 c[21] 81325c28e83SPiotr Jasiukajtis #define Et2 c[22] 81425c28e83SPiotr Jasiukajtis #define Et3 c[23] 81525c28e83SPiotr Jasiukajtis #define Et4 c[24] 81625c28e83SPiotr Jasiukajtis #define Et5 c[25] 81725c28e83SPiotr Jasiukajtis 81825c28e83SPiotr Jasiukajtis /* 81925c28e83SPiotr Jasiukajtis * double precision coefficients for computing log(x)-1 in tgamma. 82025c28e83SPiotr Jasiukajtis * See "algorithm" for details 82125c28e83SPiotr Jasiukajtis * 82225c28e83SPiotr Jasiukajtis * log(x) - 1 = T1(n) + T2(j) + T3(s), where x = 2**n * y, 1<=y<2, 82325c28e83SPiotr Jasiukajtis * j=[64*y], z[j]=1+j/64+1/128, s = (y-z[j])/(y+z[j]), and 82425c28e83SPiotr Jasiukajtis * T1(n) = T1[2n,2n+1] = n*log(2)-1, 82525c28e83SPiotr Jasiukajtis * T2(j) = T2[2j,2j+1] = log(z[j]), 82625c28e83SPiotr Jasiukajtis * T3(s) = 2s + T3[0]s^3 + T3[1]s^5 + T3[2]s^7 82725c28e83SPiotr Jasiukajtis * = 2s + A1*s^3 + A2*s^5 + A3*s^7 (see const A1,A2,A3) 82825c28e83SPiotr Jasiukajtis * Note 82925c28e83SPiotr Jasiukajtis * (1) the leading entries are truncated to 24 binary point. 83025c28e83SPiotr Jasiukajtis * See Remezpak/sun/tgamma_log_64.c 83125c28e83SPiotr Jasiukajtis * (2) Remez error for T3(s) is bounded by 2**(-72.4) 83225c28e83SPiotr Jasiukajtis * See mpremez/work/Log/tgamma_log_4_outr2 83325c28e83SPiotr Jasiukajtis */ 83425c28e83SPiotr Jasiukajtis 83525c28e83SPiotr Jasiukajtis static const double T1[] = { 83625c28e83SPiotr Jasiukajtis -1.00000000000000000000e+00, /* 0xBFF00000 0x00000000 */ 83725c28e83SPiotr Jasiukajtis +0.00000000000000000000e+00, /* 0x00000000 0x00000000 */ 83825c28e83SPiotr Jasiukajtis -3.06852817535400390625e-01, /* 0xBFD3A37A 0x00000000 */ 83925c28e83SPiotr Jasiukajtis -1.90465429995776763166e-09, /* 0xBE205C61 0x0CA86C38 */ 84025c28e83SPiotr Jasiukajtis +3.86294305324554443359e-01, /* 0x3FD8B90B 0xC0000000 */ 84125c28e83SPiotr Jasiukajtis +5.57953361754750897367e-08, /* 0x3E6DF473 0xDE6AF279 */ 84225c28e83SPiotr Jasiukajtis +1.07944148778915405273e+00, /* 0x3FF14564 0x70000000 */ 84325c28e83SPiotr Jasiukajtis +5.38906818755173187963e-08, /* 0x3E6CEEAD 0xCDA06BB5 */ 84425c28e83SPiotr Jasiukajtis +1.77258867025375366211e+00, /* 0x3FFC5C85 0xF0000000 */ 84525c28e83SPiotr Jasiukajtis +5.19860275755595544734e-08, /* 0x3E6BE8E7 0xBCD5E4F2 */ 84625c28e83SPiotr Jasiukajtis +2.46573585271