5 F.4th 1341

Fed. Cir.

2021

STUPP CORPORATION, A DIVISION OF STUPP BROS., INC., WELSPUN TUBULAR LLC USA, IPSCO TUBULARS, INC., MAVERICK TUBE CORPORATION, Plaintiffs v. UNITED STATES, Defendant-Appellee HYUNDAI STEEL COMPANY, Defendant SEAH STEEL CORP., Defendant-Appellant

2020-1857

United States Court of Appeals for the Federal Circuit

Decided: July 15, 2021

Appeal from the United States Court of International Trade in Nos. 1:15-cv-00334-CRK, 1:15-cv-00336-CRK, 1:15-cv-00337-CRK, Judge Claire R. Kelly.

ROBERT R. KIEPURA, Commercial Litigation Branch, Civil Division, United States Department of Justice, Washington, DC, argued for defendant-appellee. Also represented by CLAUDIA BURKE, JEFFREY B. CLARK, JEANNE DAVIDSON; REZA KARAMLOO, Office of the Chief Counsel for Trade Enforcement & Compliance, United States Department of Commerce, Washington, DC.

JEFFREY M. WINTON, Winton & Chapman PLLC, Washington, DC, argued for defendant-appellant.

Before TARANTO, BRYSON, and CHEN, Circuit Judges.

BRYSON, Circuit Judge.

Appellant SeAH Steel Corporation appeals from a decision of the Court of International Trade (“the Trade Court“) affirming a final determination of the United States Department of Commerce in an antidumping duty investigation. In that investigation, Commerce assessed SeAH a weighted average dumping margin above the de minimis threshold, which subjected SeAH to antidumping duties. SeAH challenges Commerce‘s rejection of portions of SeAH‘s case brief and various aspects of the analysis Commerce used to derive the dumping margin. We affirm with respect to the case brief issue and with respect to most of SeAH‘s challenges to Commerce‘s analysis. We vacate and remand, however, on the issue of whether it was reasonable for Commerce to apply a portion of its analysis—specifically, the “Cohen‘s d test“—to sales data that may have been of insufficient size, not normally distributed, and lacking roughly equal variances.

I

In late 2014, Commerce initiated a less-than-fair-value investigation into the importation of welded line pipe from the Republic of Korea. See Welded Line Pipe from the Republic of Korea: Preliminary Determination, 80 Fed. Reg. 29,620 (Dep‘t of Commerce May 22, 2015). The investigation covered the period from October 1, 2013, through September 30, 2014, and focused on sales by two Korea-based respondents, SeAH and Hyundai HYSCO.

Commerce issued a preliminary determination on May 14, 2015, that SeAH was, or likely was, selling welded line pipe in the United States at less than fair value during the relevant period. SeAH filed a case brief challenging Commerce‘s statistical analysis and citing academic literature in support of that challenge. Commerce rejected SeAH‘s case brief because Commerce found that it violated procedural regulations governing the filing of new factual information. J.A. 9698–99.

Commerce issued a final determination on October 13, 2015. Welded Line Pipe from the Republic of Korea: Final Determination, 80 Fed. Reg. 61,366, and accompanying Issues and Decision Memorandum (Dep‘t of Commerce Oct. 5, 2015) (“Final Memo“), available at https://enforcement.trade.gov/frn/summary/korea-south/2015-25980-1.pdf. In that final determination, Commerce found that SeAH had dumped welded line pipe in the United States, calculating SeAH‘s weighted average dumping margin to be above the de minimis threshold for less-than-fair-value investigations. Final Determination, 80 Fed. Reg. at 61,367.

When calculating a weighted average dumping margin, Commerce typically uses the average-to-average comparison method. 19 C.F.R. § 351.414(c)(1); see also 19 U.S.C. § 1677f-1(d)(1). That method compares the weighted average of the respondent’s sales prices in its home country during the investigation period to the weighted average of the respondent’s sales prices in the United States during the same period. 19 C.F.R. § 351.414(b)(1). The average-to-average method, however, sometimes fails to detect “targeted” or “masked” dumping, because a respondent’s “sales of low-priced ‘dumped’ merchandise would be averaged with (and offset by) sales of higher-priced ‘masking’ merchandise, giving the impression that no dumping was taking place.” Apex Frozen Foods Priv. Ltd. v. United States, 862 F.3d 1337, 1341 (Fed. Cir. 2017) (“Apex II”).

To address the problem of targeted dumping, Congress created an exception to the use of the average-to-average method. Congress provided that when “(i) there is a pattern of export prices¹ (or constructed export prices) for comparable merchandise that differ significantly among purchasers, regions, or periods of time, and (ii) [Commerce] explains why such differences cannot be taken into account using [the average-to-average method],” Commerce may compare the weighted average of the respondent‘s sales prices in the home country to the respondent‘s individual sales prices in the United States. 19 U.S.C. § 1677f-1(d)(1)(B). The rationale behind that statutory exception is that targeted dumping is more likely to be occurring when export prices fit a pricing model that differs significantly among different periods of time, different purchasers, or different regions of the United States. Apex II, 862 F.3d at 1347. Commerce refers to the alternative method of calculating a weighted average dumping margin as the “average-to-transaction” method. See 19 C.F.R. § 351.414(b)(3).

Congress has not delineated exactly how Commerce is to assess whether there is a “pattern of export prices . . . differ[ing] significantly among purchasers, regions, or periods of time,” or how Commerce is to “explain[] why such differences cannot be taken into account’ using the average-to-average or transaction-to-transaction methods.” Dillinger France S.A. v. United States, 981 F.3d 1318, 1324–25 n.5 (Fed. Cir. 2020) (quoting section 1677f-1(d)(1)(B)); see also Apex II, 862 F.3d at 1346. Commerce has therefore devised a means for implementing Congress‘s directive. Until 2014, Commerce applied the “Nails test” to detect targeted dumping. See JBF RAK LLC v. United States, 790 F.3d 1358, 1367 n.5 (Fed. Cir. 2015). From 2013 to 2014, Commerce refined its methodology and began applying what it now calls “differential pricing analysis.” See Differential Pricing Analysis; Request for Comments, 79 Fed. Reg. 26,720, 26,722 (Dep‘t of Commerce May 9, 2014); Xanthan Gum from the People‘s Republic of China, 78 Fed. Reg. 33,351 (Dep‘t of Commerce June 4, 2013).

We have summarized the methodology behind Commerce‘s differential pricing analysis in prior decisions. See, e.g., Apex II, 862 F.3d at 1343 n.2. Because the issues in this case concern specific aspects of that methodology, we provide a more thorough description below.

Before Commerce can conduct its differential pricing analysis, it must first collect data regarding the respondent‘s export sales and home sales. See Final Memo at 1. If those sales span multiple distinct products, Commerce segments the sales into sets based on comparable product groups. See Differential Pricing Analysis, 79 Fed. Reg. at 26,722.

To begin the differential pricing analysis, Commerce further segments the respondent‘s export sales for each product group into subsets based on the region of the United States in which those sales took place. Id. Commerce similarly constructs subsets based on the purchasers involved in the sales (i.e., the purchaser category) and also based on the time periods in which the sales took place (i.e., the time-period category). Id. A particular export sale will be present in multiple subsets across the regional, purchaser, and time-period categories. See id.
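The segmentation described above can be sketched roughly as follows. This is an illustrative sketch only, not Commerce's implementation: the record layout, the field names (`price`, `region`, `purchaser`, `period`), and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical export-sale records for one product group.
# Field names and values are illustrative assumptions, not record data.
sales = [
    {"price": 101.0, "region": "West", "purchaser": "A", "period": "Q1"},
    {"price": 99.0, "region": "West", "purchaser": "B", "period": "Q2"},
    {"price": 95.0, "region": "East", "purchaser": "A", "period": "Q2"},
]


def subsets_by(records, category):
    """Group sale prices into subsets keyed by the value of one category."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category]].append(record["price"])
    return dict(groups)


# Each sale lands in exactly one subset per category, so it appears
# once in the regional, once in the purchaser, and once in the
# time-period segmentation.
by_category = {c: subsets_by(sales, c) for c in ("region", "purchaser", "period")}
```

Under these assumptions, the first sale appears in the "West" regional subset, the purchaser "A" subset, and the "Q1" time-period subset.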

For each subset within a category, Commerce makes that subset the “test group” and aggregates the remaining subsets in that category into the “comparison group.” Id. If both groups have at least two observations (i.e., sales prices), and if the sum of the comparison group is at least five percent of the total amount of export sales, Commerce applies the “Cohen‘s d test,” named after statistician Jacob Cohen, to evaluate whether the test group differs significantly from the comparison group. Id. The formula for calculating the Cohen‘s d value is as follows:

$$ d = \frac{\lvert M_t - M_c \rvert}{\sigma_p} $$

see Large Residential Washers from the Republic of Korea, 2016 WL 5854390 (Dep‘t of Commerce Sept. 6, 2016) (noting that Commerce applies the “two-tailed” version of the Cohen‘s d test, which uses the absolute-value operator to “focus[] on both lower and higher prices“). In the formula used by Commerce, M_c is the mean of the comparison group, M_t is the mean of the test group, and σ_p is the simple average of the two groups’ standard deviations. See Mid Continent Steel & Wire, Inc. v. United States, 495 F. Supp. 3d 1298, 1304 (Ct. Int‘l Trade 2021) (appeal docketed).

If the Cohen‘s d value is equal to or greater than 0.8 for any test group, the observations within that group are said to have “passed” the Cohen‘s d test, i.e., Commerce deems the sales prices in the test group to be significantly different from the sales prices in the comparison group. Id. at 1302–04. Commerce applies the Cohen‘s d test to each test group within the regional, purchaser, and time-period categories. See Differential Pricing Analysis, 79 Fed. Reg. at 26,722–23.
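The Cohen's d step described above can be sketched as follows. The sketch follows the opinion's description (absolute difference of the group means divided by the simple average of the two standard deviations, with a 0.8 cutoff). Two points are assumptions not settled by the text: the use of the sample standard deviation, and the reading of the five percent condition as a comparison of two totals supplied by the caller. All function names are hypothetical.

```python
from statistics import mean, stdev


def is_testable(test, comparison, comparison_total, export_total):
    """Eligibility check: two observations per group, and a comparison
    group amounting to at least 5% of total export sales.
    Assumption: the caller supplies both totals in the same unit."""
    return (
        len(test) >= 2
        and len(comparison) >= 2
        and comparison_total >= 0.05 * export_total
    )


def cohens_d(test, comparison):
    """Two-tailed Cohen's d: |M_t - M_c| / sigma_p, where sigma_p is the
    simple average of the two groups' standard deviations."""
    # Assumption: sample standard deviation (n - 1 denominator).
    sigma_p = (stdev(test) + stdev(comparison)) / 2
    if sigma_p == 0:
        raise ValueError("both groups have zero variance; d is undefined")
    return abs(mean(test) - mean(comparison)) / sigma_p


def passes(d, threshold=0.8):
    """A test group 'passes' when d is equal to or greater than 0.8."""
    return d >= threshold
```

For instance, a test group priced at 10 and 12 against a comparison group priced at 20 and 22 yields a d of roughly 7.07 and passes, while the same test group against prices of 10.5 and 12.5 yields roughly 0.35 and does not.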

Commerce counts the number of observations within each product group that were tagged as “passing,” and applies what it calls a “ratio test” to the results: If the total percentage of passing transactions is 33% or less, Commerce uses the default average-to-average method to calculate the weighted average dumping margin. If the total percentage is 66% or more, Commerce tentatively selects the alternative average-to-transaction method as the method it will use to calculate the weighted average dumping margin. If the total percentage is between 33% and 66%, Commerce tentatively selects a hybrid approach in which it applies the alternative average-to-transaction method to those transactions passing the Cohen‘s d test and the average-to-average method to the remainder of the transactions. Id.
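The ratio test reduces to a three-way branch on the passing share, sketched below. The function name and return labels are hypothetical, and the share is taken as a given input because the sketch does not attempt to reproduce how Commerce tallies passing transactions.

```python
def ratio_test(passing_share):
    """Map the share of passing transactions (0.0 to 1.0) to the
    comparison method that is selected or tentatively selected."""
    if passing_share <= 0.33:
        # 33% or less: default method, no further testing.
        return "average-to-average"
    if passing_share >= 0.66:
        # 66% or more: alternative method for all sales (tentative).
        return "average-to-transaction"
    # Between 33% and 66%: alternative method for passing sales only.
    return "hybrid"
```

Either of the last two outcomes is only tentative in the described methodology; it remains subject to the meaningful difference test.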

If Commerce tentatively selects an alternative comparison method, it confirms its selection by applying the “meaningful difference” test to determine whether using the default average-to-average method can account for the disparate pricing patterns that were discovered by the Cohen‘s d test and the ratio test. Id. at 26,723 (implementing 19 U.S.C. § 1677f-1(d)(1)(B)(ii)). The first step of the meaningful difference test is to calculate the weighted average dumping margin using the average-to-average method. The second step is to calculate the weighted average dumping margin with the tentatively selected method. The third step is to compare the results: If the margin for the average-to-average method is below the de minimis threshold² and the margin for the tentatively selected method is above that threshold, or if both are above that threshold and the margin for the tentatively selected method is 25% greater than the average-to-average margin, then Commerce considers there to be a meaningful difference, and it selects the alternative approach. Id. If that comparison leads Commerce to conclude that there is not a meaningful difference, Commerce applies the average-to-average method across the board.
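The comparison in the third step can be sketched as follows. Several details are assumptions: the 2% default for the de minimis threshold (the footnote defining it is not reproduced here), the treatment of a margin exactly at the threshold as "above" it, and the reading of "25% greater" as "at least 25% greater." The function name is hypothetical.

```python
def meaningful_difference(a_to_a_margin, alt_margin, de_minimis=2.0):
    """Return True if the alternative method's margin differs
    meaningfully from the average-to-average margin.
    Margins are percentages; de_minimis=2.0 is an assumed default."""
    a_to_a_above = a_to_a_margin >= de_minimis
    alt_above = alt_margin >= de_minimis

    # Case 1: the margin crosses the de minimis threshold.
    if not a_to_a_above and alt_above:
        return True
    # Case 2: both are above the threshold and the alternative margin
    # is (at least) 25% greater than the average-to-average margin.
    if a_to_a_above and alt_above:
        return alt_margin >= 1.25 * a_to_a_margin
    return False
```

On this sketch, margins of 1.5% and 2.53% show a meaningful difference because the threshold is crossed, whereas margins of 3.0% and 3.3% do not because the relative increase is only 10%.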

As alluded to above, the average-to-average comparison method involves subtracting the weighted average of the export prices for a particular product group from the weighted average of the home market prices for that product group and multiplying the result by the total number of export units sold for that product group.³ See 19 C.F.R. § 351.414(b)(1) and (d)(1).

The average-to-transaction method involves subtracting each individual export price for a particular product group from the weighted average of the home market prices for that product group in an iterative fashion, and summing the results. See id. § 351.414(b)(3). Notably, when applying the average-to-transaction method, Commerce “zeroes out” iterations that produce a negative dumping margin (i.e., when the weighted average home market price is less than an individual export price), a practice known as “zeroing.” Mid Continent Steel & Wire, Inc. v. United States, 940 F.3d 662, 671–72 (Fed. Cir. 2019).

Both methods result in dumping margins that Commerce then aggregates across the product groups. See 19 U.S.C. § 1677(35)(A) and (B) (defining “[d]umping margin” and “[w]eighted average dumping margin“). Finally, Commerce divides the aggregate dumping margin by the total value of the export sales, yielding the weighted average dumping margin. See id. If the weighted average dumping margin is greater than the de minimis threshold, Commerce makes a final determination that the respondent is selling goods in the United States at less than fair value, which can lead to the entry of an antidumping duty order. See id. §§ 1673d, 1673e.
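The two comparison methods and the final division can be illustrated with a small sketch for a single product group. Sales are represented as hypothetical `(price, quantity)` pairs, and weighting each individual export comparison by its quantity is an assumption; the opinion describes the average-to-transaction method only as subtracting each export price and summing the results.

```python
def weighted_average(sales):
    """Quantity-weighted average price of (price, quantity) pairs."""
    total_qty = sum(qty for _, qty in sales)
    return sum(price * qty for price, qty in sales) / total_qty


def a_to_a_dumping(home_sales, export_sales):
    """Average-to-average: (home average - export average) x export units."""
    export_units = sum(qty for _, qty in export_sales)
    return (weighted_average(home_sales) - weighted_average(export_sales)) * export_units


def a_to_t_dumping(home_sales, export_sales):
    """Average-to-transaction with zeroing of negative comparisons."""
    home_avg = weighted_average(home_sales)
    total = 0.0
    for price, qty in export_sales:
        # Assumption: each comparison is weighted by the sale's quantity.
        total += max((home_avg - price) * qty, 0.0)  # zeroing
    return total


def weighted_average_dumping_margin(aggregate_dumping, export_sales):
    """Aggregate dumping divided by total export value, as a percentage."""
    export_value = sum(price * qty for price, qty in export_sales)
    return 100 * aggregate_dumping / export_value


# Hypothetical data: one low-priced and one high-priced export sale.
home = [(100.0, 10)]
exports = [(90.0, 5), (110.0, 5)]
```

With this hypothetical data the average-to-average method yields zero dumping, because the high-priced sale offsets the low-priced one, while the average-to-transaction method with zeroing yields an aggregate of 50 on export sales of 1,000, or a 5% margin. That is the masking effect the alternative method is meant to expose.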

In this case, Commerce applied its differential pricing analysis to SeAH‘s sales of welded line pipe and selected the hybrid approach for calculating SeAH‘s weighted average dumping margin. J.A. 10451; see also Final Memo at 4. That approach resulted in a weighted average dumping margin of 2.53%, which is above the de minimis threshold. Final Determination, 80 Fed. Reg. at 61,367.

SeAH appealed to the Trade Court. Among other issues, SeAH challenged specific aspects of Commerce‘s differential pricing analysis and Commerce‘s rejection of SeAH‘s case brief. Stupp Corp. v. United States, 359 F. Supp. 3d 1293, 1297 (Ct. Int‘l Trade 2019) (”Stupp I“). The Trade Court affirmed. Id.⁴

II

A

SeAH contends on appeal that Commerce acted unlawfully when it rejected SeAH’s case brief. SeAH submitted its case brief on September 1, 2015, more than three months after Commerce issued its preliminary determination on May 14, 2015. In that case brief, SeAH cited for the first time certain academic articles in support of its argument that Commerce was misusing the Cohen’s d test. See J.A. 9582–92. SeAH also presented results from a statistical analysis showing that its U.S. sales data were not normally distributed. J.A. 9586–87. Additionally, SeAH presented the results from its own application of Commerce’s differential pricing analysis to ten hypothetical datasets that it generated based on the sales data in this case. J.A. 9582. The results identified disparate pricing patterns in five of those randomly generated datasets. According to SeAH, those results demonstrated that Commerce’s differential pricing analysis produces false positives.

Commerce rejected those portions of SeAH‘s case brief because of several procedural violations. J.A. 9698. Commerce first noted that those portions of SeAH‘s case brief contained “factual information” and that such information likely fell under either subparagraph (iv) or (v) of 19 C.F.R. § 351.102(b)(21).⁵ According to Commerce, SeAH failed to identify the subparagraph of section 351.102(b)(21) under which that factual information was being submitted, as required by 19 C.F.R. § 351.301(b). Commerce added that if that factual information fell within the catch-all provision of subparagraph (v), SeAH failed to satisfy section 351.301(b)(1), which required SeAH to explain why that factual information did not fall within subparagraphs (i) through (iv). Finally, Commerce found that SeAH‘s submission of that factual information was untimely under the deadlines set out in 19 C.F.R. § 351.301(c).⁶ The Trade Court upheld Commerce‘s rejection of SeAH‘s case brief. Stupp I, 359 F. Supp. 3d at 1299–1302. We review Commerce‘s rejection of SeAH‘s case brief for an abuse of discretion. See Micron Tech., Inc. v. United States, 117 F.3d 1386, 1396 (Fed. Cir. 1997).

SeAH argues that Commerce‘s rejection of the case brief was contrary to the position Commerce took in Antidumping Duties; Countervailing Duties, 62 Fed. Reg. 27,296 (Dep‘t of Commerce May 19, 1997) (notice of final rule), where Commerce stated:

Parties are free to comment on verification reports and to make arguments concerning information in the reports up to and including the filing of case and rebuttal briefs . . . . In making their arguments, parties may use factual information already on the record or may draw on information in the public realm to highlight any perceived inaccuracies in a report.

Id. at 27,332. SeAH contends that the academic articles it cited in its case brief are in the “public realm” and that its statistical analyses are derived from data “already on the record.” According to SeAH, Commerce‘s decision directing SeAH to remove those materials from its case brief was therefore inconsistent with Commerce‘s publicly announced policy, and requires reversal.

SeAH misunderstands Commerce‘s statements in the 1997 notice of final rule. In that notice, Commerce explained that the exception to section 351.301(c) allowing parties to reference factual information already on the record or in the public realm pertains only to a party‘s use of factual information to highlight perceived inaccuracies “in a report.” Id. The context of the exception makes clear that “report” means a “verification report[].” Id.

Commerce may issue a verification report before issuing a final determination to “verify relevant factual information” that it previously gathered pursuant to its investigation or review. 19 C.F.R. § 351.307(a). Commerce issued a verification report in this case pertaining to SeAH‘s sales data. See Stupp I, 359 F. Supp. 3d at 1308. However, SeAH‘s references to academic articles and statistical analyses in its case brief were not directed at correcting perceived inaccuracies in Commerce‘s verification report. Instead, SeAH used those materials to support its challenge to Commerce‘s differential pricing analysis, and in particular its challenge to the manner in which Commerce applied the Cohen‘s d test in the preliminary determination. See J.A. 9582–92; see also Appellant‘s Opening Br. 49 (“SeAH‘s case brief to Commerce included discussions . . . concerning statistical practices and the meaning of and requirements for using Cohen‘s d.“). Because SeAH was not rebutting factual conclusions in Commerce‘s verification report, SeAH‘s submission of factual information did not fall within the exception to the requirements of 19 C.F.R. § 351.301(c) described in the 1997 notice of final rule.⁷ SeAH‘s submission was thus untimely and failed to satisfy other procedural requirements set forth in section 351.301 of Commerce‘s regulations.

More broadly, SeAH argues that Commerce‘s rejection of SeAH‘s case brief was contrary to the underlying purpose of section 351.301(c). SeAH reasons that none of the submitted factual information required verification by Commerce, and that allowing that information into the record would not have delayed the investigation. Relatedly, SeAH argues that Commerce has permitted post-deadline submissions of similar factual information in other instances, contrary to Commerce‘s interpretation of its regulations.

Commerce is entitled to broad discretion regarding the manner in which it develops the record in an antidumping investigation. See PSC VSMPO-Avisma Corp. v. United States, 688 F.3d 751, 760 (Fed. Cir. 2012) (“[C]ourts will defer to the judgment of an agency regarding the development of the agency record.“); Micron Tech., 117 F.3d at 1396 (“Congress has implicitly delegated to Commerce the latitude to derive verification procedures ad hoc.“); Am. Alloys, Inc. v. United States, 30 F.3d 1469, 1475 (Fed. Cir. 1994) (“[T]he statute gives Commerce wide latitude in its verification procedures.“). Mindful of that standard, we will not second-guess Commerce‘s application of the procedural requirements governing the submission of factual information in case briefs.

As for SeAH‘s contention that Commerce has permitted other parties to make untimely submissions of factual information in the past, the Supreme Court has explained that an agency is “entitled to a measure of discretion in administering its own procedural rules,” and that as a general principle, it is within the discretion of an administrative agency “to relax or modify its procedural rules adopted for the orderly transaction of business before it when in a given case the ends of justice require it.” Am. Farm Lines v. Black Ball Freight Serv., 397 U.S. 532, 538–39 (1970). Short of a showing that Commerce‘s enforcement of its procedural rules is so haphazard or unreasonable as to be arbitrary or capricious—which SeAH has not shown to be the case—Commerce‘s failure to apply those rules with Procrustean consistency in every case does not deprive it of the authority to enforce those rules in any case. We conclude, therefore, that Commerce‘s rejection of SeAH‘s case brief was not an abuse of discretion.

B

With respect to the standard for reviewing Commerce‘s selection of the statistical tests and numerical cutoffs used in this case, SeAH contends that “substantial evidence” is the appropriate standard. SeAH points out that Commerce did not adopt its differential pricing analysis with the benefit of notice-and-comment rulemaking.⁸ SeAH asserts that Commerce‘s public announcements regarding its differential pricing analysis amount to mere policy statements. Such policy statements, SeAH argues, “are not legally binding,” and the agency may not rely on them to justify applying differential pricing analysis in every case. Appellant‘s Opening Br. 33–35. Pointing to our decision in Washington Red Raspberry Commission v. United States, 859 F.2d 898 (Fed. Cir. 1988), SeAH argues that the proper standard for reviewing Commerce‘s choice of methodology is whether “the record contains substantial evidence supporting [Commerce‘s] basis for its application of [certain statistical principles].” Appellant‘s Opening Br. 36 (quoting Red Raspberry, 859 F.2d at 903).

The Trade Court rejected SeAH‘s arguments on this issue, reasoning that the substantial evidence standard applies to “the outputs” of Commerce‘s statistical analysis, not to Commerce‘s “interpretation of a statute.” Stupp II, 365 F. Supp. 3d at 1378. SeAH‘s labeling of the differential pricing analysis as a “general policy statement” was inaccurate, according to the court. Id. The differential pricing analysis was instead “the result of Commerce interpreting 19 U.S.C. § 1677f-1(d)(1)(B) and devising a methodology to effectuate that interpretation.” Stupp II, 365 F. Supp. 3d at 1378–79. For that reason, the court held that the standard for reviewing Commerce‘s choice of methodology was whether that methodology “reasonably implements a given statutory directive.” Id. at 1378.

We agree with the Trade Court. Contrary to SeAH‘s suggestion, Commerce‘s differential pricing analysis is an interpretive rule, not a general statement of policy. A policy statement “advise[s] the public prospectively of the manner in which the agency proposes to exercise a discretionary power.” Lincoln v. Vigil, 508 U.S. 182, 197 (1993) (quoting Chrysler Corp. v. Brown, 441 U.S. 281, 302 n.31 (1979)). As illustrated in the Lincoln case, an example of an agency‘s exercise of a discretionary power is the decision of the Department of Health and Human Services to cease allocating funds to a particular program when the funds had originally been appropriated to the Department as a lump sum without statutory restrictions. Id.

In this case, while Commerce‘s decision to consider applying the average-to-transaction method is within its discretionary power,⁹ its determination of whether the average-to-transaction method is appropriate in a particular case is not solely within its discretion, because that determination is confined by the statutory language of 19 U.S.C. § 1677f-1(d)(1)(B): (i) there must be a “pattern of export prices . . . that differ significantly among purchasers, regions, or periods of time,” and (ii) Commerce must “explain[] why such differences cannot be taken into account” using the average-to-average method. Commerce‘s differential pricing analysis is an interpretation of that statutory language and thus constitutes an interpretive rule. See Perez v. Mortg. Bankers Ass’n, 575 U.S. 92, 97 (2015) (stating that interpretive rules are “issued by an agency to advise the public of the agency‘s construction of the statutes and rules which it administers” (quoting Shalala v. Guernsey Mem‘l Hosp., 514 U.S. 87, 99 (1995))).

In the alternative, and somewhat contradictorily, SeAH argues that Commerce‘s adoption of its differential pricing analysis constitutes a legislative rule that could be adopted only by notice-and-comment rulemaking. SeAH contends that it is “doubtful” that Commerce‘s differential pricing analysis is merely an interpretive rule, because Commerce‘s decision to apply that analysis resulted in SeAH‘s weighted average dumping margin crossing the de minimis threshold. Appellant‘s Opening Br. 32–35.

SeAH misunderstands the distinction between interpretive and legislative rules. Legislative rules alter the landscape of individual rights and obligations, binding parties with the force and effect of law; interpretive rules, on the other hand, merely clarify existing duties for affected parties. Kisor v. Wilkie, 139 S. Ct. 2400, 2420 (2019); Splane v. West, 216 F.3d 1058, 1063 (Fed. Cir. 2000). Hence, the relevant distinction is not whether a newly adopted rule changes the outcome of a particular case; the relevant distinction is whether the rule is “an attempt to make new law or modify existing law,” as opposed to merely “represent[ing] the agency‘s reading of [existing] statutes.” Id.; see also Am. Postal Workers Union, AFL-CIO v. U.S. Postal Serv., 707 F.2d 548, 560 (D.C. Cir. 1983) (“[T]he impact of a rule has no bearing on whether it is legislative or interpretative; interpretative rules may have a substantial impact on the rights of individuals.” (citing 2 K. Davis, Administrative Law Treatise § 7:8, at 39 (2d ed. 1979))).

Commerce‘s differential pricing analysis does not make new law or modify existing law—it interprets the statutory provision that applies to patterns of significantly differing export prices by providing a mechanism for identifying such patterns. See Guernsey Mem‘l Hosp., 514 U.S. at 97–100 (agency‘s rule requiring amortization of reimbursable defeasance losses was an interpretive rule implementing the statutory mandate that Medicare reimburse only the “necessary costs of efficiently delivering covered services to individuals covered“); POET Biorefining, LLC v. EPA, 970 F.3d 392, 408 (D.C. Cir. 2020) (“If an agency‘s interpretation were a legislative rule simply because it drew ‘crisper and more detailed lines than the authority being interpreted,’ then ‘no rule could pass as an interpretation of a legislative rule unless it were confined to parroting the rule or replacing the original vagueness with another—a regime we have squarely rejected. . . . Rules that are fairly drawn from underlying statutes or regulations may articulate even relatively detailed legal obligations without thereby becoming legislative rules subject to notice and comment.‘” (quoting Am. Mining Cong. v. Mine Safety & Health Admin., 995 F.2d 1106, 1112 (D.C. Cir. 1993))).

Our precedents make clear that the relevant standard for reviewing Commerce‘s selection of statistical tests and numerical cutoffs is reasonableness, not substantial evidence. See, e.g., Mid Continent, 940 F.3d at 667 (“In carrying out its statutorily assigned tasks, Commerce has discretion to make reasonable choices within statutory constraints.” (collecting cases)); Apex II, 862 F.3d at 1346 (holding Commerce‘s “meaningful difference” test to be “reasonable“); JBF, 790 F.3d at 1363, 1367 (holding that Commerce‘s interpretation of 19 U.S.C. § 1677f-1(d)(1)(B)(i) was reasonable and that “[b]ecause Congress did not provide for a direct methodology, Commerce properly filled that gap” (cleaned up)).

Our decision in Red Raspberry is not to the contrary. In that case, we applied the substantial evidence standard to review Commerce‘s determination that a particular respondent‘s dumping margin was de minimis and that the respondent should therefore be excluded from the antidumping duty order. 859 F.2d at 903. At the time of Commerce‘s 1985 final determination in that case, there was no statute defining a de minimis threshold or expressly authorizing a de minimis rule, and Commerce had not adopted or announced any rule defining and supporting a de minimis threshold.¹⁰ Further, Commerce did not adopt a general definition of de minimis dumping in the Red Raspberry case, but simply determined that the particular dumping margin before it in that case was de minimis and insufficient to support an antidumping duty order.¹¹ Hence, unlike in this case, Commerce made factual determinations in Red Raspberry without previously announcing a rule governing those determinations and without interpreting statutory language expressly authorizing those determinations to be made. It was thus appropriate for us to ask whether Commerce‘s decision that a particular dumping margin was de minimis was supported by substantial evidence in the context of the particular investigation under review. See Red Raspberry, 859 F.2d at 903.

In this case, by contrast, Commerce applied its differential pricing analysis, a general approach that Commerce defined in a prior publication, see 79 Fed. Reg. 26,720, as a methodology for implementing the statutory directive in section 1677f-1(d)(1)(B). The appropriate standard for reviewing Commerce‘s differential pricing analysis and the specific components of that methodology is therefore reasonableness. See Mid Continent, 940 F.3d at 667; JBF, 790 F.3d at 1363–64.

C

Turning to the merits of Commerce‘s differential pricing analysis, SeAH contends that Commerce provided no substantive justification for its ratio test, and that the ratio test is otherwise not supported by evidence. Specifically, SeAH argues that Commerce has provided no justification, whether derived from general statistical principles or based on the facts of this case, for using the 33% and 66% cutoffs employed in that test. According to SeAH, Commerce‘s explanation of those cutoffs simply “repeat[s] [the] unsupported assertion that the cut-offs achieve the purposes for which Commerce wants to use them.” Appellant‘s Opening Br. 45. SeAH argues that Commerce was required “to explain why the particular cut-offs it had chosen were appropriate in the specific circumstances of this case. And, it was also required to point to substantial evidence that supported those explanations.” Id. at 45–46. We disagree.

As a preliminary matter, Commerce has explained that the ratio test is not the ultimate determinant of masked dumping. See Issues and Decision Memorandum for Antidumping Duty Administrative Review of Polyethylene Terephthalate Film from India, 80 ITADOC 11,160 (Dep‘t of Commerce Mar. 2, 2015), available at https://enforcement.trade.gov/frn/summary/india/2015-04273-1.pdf (“A determination that there exists a pattern of prices that differ significantly in no way indicates that dumping is being masked in a meaningful way.“). Rather, the ratio test is a preliminary step “aggregat[ing] the results of the comparisons of the means between the test and comparison groups to gauge the extent of the significant differences in prices,” i.e., the “effect size[s].” Id.

More importantly, there is no statutory language telling Commerce how to detect patterns of significantly differing export prices, much less how to aggregate and quantify pricing comparisons across product groups in order to select a statutorily defined comparison method. See 19 U.S.C. § 1677f-1(d)(1)(A)–(B). Commerce therefore has discretion to determine a reasonable methodology to implement the statutory directive. See JTEKT Corp. v. United States, 642 F.3d 1378, 1383 (Fed. Cir. 2011). At the highest level of abstraction, Commerce is using a conventional method for quantifying comparisons across discrete groups: counting the number of divergent sales prices, as identified by an effect-size test, and calculating the population percentage of those divergent sales prices. We hold that general approach to be reasonable.

Commerce has justified its more specific selection of the 33% and 66% cutoffs. Regarding the 33% cutoff, Commerce explained that “when a third or less of a respondent‘s U.S. sales are not at prices that differ significantly, then these significantly different prices are not extensive enough to satisfy the first requirement of the statute.” Issues and Decision Memorandum for Administrative Review of the Antidumping Duty Order on Certain Steel Nails from the Republic of Korea, 84 ITADOC 56,424 (Dep‘t of Commerce Oct. 16, 2019), available at https://enforcement.trade.gov/frn/summary/korea-south/2019-22992-1.pdf. Likewise, “given its growing experience of applying section 777A(d)(1)(B) of the Act and the application of the [average-to-transaction] method as an alternative to the [average-to-average] method,” Commerce has found that “when two thirds or more of a respondent‘s sales are at prices that differ significantly, then the extent of these sales is so pervasive that it would not permit [Commerce] to separate the effect of the sales where prices differ significantly from those where prices do not differ significantly.” Id. Finally, “when [Commerce] finds that between one third and two thirds of U.S. sales are at prices that differ significantly, then there exists a pattern of prices that differ significantly, and . . . the effect of this pattern can reasonably be separated from the sales whose prices do not differ significantly.” Id. In the latter two situations, Commerce will merely “consider[]” applying the average-to-transaction method, a decision that is ultimately dictated by the meaningful difference test. See id.
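The three outcomes of the ratio test described above can be sketched in a few lines of code. This is only an illustration and not Commerce's implementation: the function name `ratio_test`, the `Method` labels, and the handling of shares that fall exactly on a cutoff are assumptions made for the example.

```python
from enum import Enum


class Method(Enum):
    # Hypothetical labels for the three outcomes described in the text.
    AVERAGE_TO_AVERAGE = "average-to-average for all sales"
    MIXED = "average-to-transaction considered for passing sales only"
    AVERAGE_TO_TRANSACTION = "average-to-transaction considered for all sales"


def ratio_test(passing_share, low=0.33, high=0.66):
    """Map the share of sales passing the Cohen's d test to a candidate method.

    passing_share is a fraction between 0 and 1. Treating the cutoffs as
    inclusive on both ends is an assumption of this sketch.
    """
    if not 0.0 <= passing_share <= 1.0:
        raise ValueError("passing_share must be between 0 and 1")
    if passing_share <= low:
        return Method.AVERAGE_TO_AVERAGE
    if passing_share >= high:
        return Method.AVERAGE_TO_TRANSACTION
    return Method.MIXED


print(ratio_test(0.50).value)
```

In the latter two outcomes the result is only a candidate: as the quoted memorandum states, the final choice of method depends on the meaningful difference test.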

Commerce‘s selection of the 33% and 66% cutoffs is a reasonable choice. An alternative approach might be, for example, to use a single cutoff at 50%. That approach would undoubtedly favor some respondents—the more frequent application of the average-to-average method would result in more de minimis dumping margins—but it would disfavor other respondents. For example, respondents having slightly more than 50% of their sales passing the Cohen‘s d test would have the average-to-transaction method applied to all of their sales. Commerce‘s approach is less rigid, providing a middle ground between 33% and 66%, in which the average-to-transaction method is only partially applied. That approach provides a better fit, minimizing both the assessment of antidumping duties that are too high and the assessment of duties that are too low. We conclude that Commerce‘s cutoffs are reasonable in light of the alternatives.

SeAH is mistaken when it asserts that Commerce must demonstrate the propriety of the ratio test with respect to the particular facts of this case. As discussed above, Commerce‘s burden in selecting a methodology for detecting patterns of significantly differing export prices is reasonableness as a matter of law, not substantial evidence on the factual record. SeAH was free to make factual arguments regarding why it was inappropriate to apply the ratio test in this case, but it chose not to do so. Instead, SeAH has challenged the appropriateness of the ratio test in the abstract (e.g., by contending that the test and its cutoffs are “arbitrary“) and wrongly attempts to place the burden on Commerce to justify the use of that test as a matter of substantial evidence in light of the facts of this case.

For those reasons, we hold that Commerce‘s ratio test reasonably implements the statutory requirement that Commerce determine whether there is “a pattern of export prices” “differ[ing] significantly among purchasers, regions, or periods of time” before selecting the average-to-transaction method. 19 U.S.C. § 1677f-1(d)(1)(B)(i).

D

SeAH next challenges Commerce‘s “meaningful difference” test. SeAH argues that Commerce‘s use of that test fails to satisfy the statutory requirement that Commerce “explain[]” why significantly differing export prices among different purchasers, regions, or time periods “cannot be taken into account using [the] average-to-average [method].” Appellant‘s Opening Br. 54-55 (quoting 19 U.S.C. § 1677f-1(d)(1)(B)). According to SeAH, Commerce must show that the average-to-transaction method is more “accurate” than the average-to-average method in order to satisfy that statutory requirement. Id. at 56. SeAH further contends that the meaningful difference test identifies disparities between the results of the two methods only because the average-to-transaction method includes zeroing, while the average-to-average method does not.

Our prior decision in Apex II disposes of SeAH‘s challenges to the “meaningful difference” test. In that case, we addressed and rejected the argument that “Commerce‘s meaningful difference test is unreasonable because it is inconsistent with the statute‘s text.” 862 F.3d at 1347. The appellant in that case argued that the meaningful difference test improperly conflated the ultimate margin calculation with the task of explaining why the average-to-average method could not account for differences in prices. Id. We rejected that argument, and we also rejected the argument that the meaningful difference test was flawed because it simply measured differences in dumping margins caused by zeroing. Id. at 1348-49.

Seeking to distinguish Apex II, SeAH argues that we did not hold in that case that comparisons of the margin calculations from the average-to-average and average-to-transaction methods “are always sufficient in and of themselves.” Appellant‘s Opening Br. 58-59. SeAH is mistaken; our holding in that case had two parts: (1) Commerce‘s meaningful difference test is a reasonable response to the statutory directive to explain why the average-to-average method is inadequate in certain cases, and (2) the meaningful difference test is sufficient to satisfy that directive. See 862 F.3d at 1348-49 (“Commerce‘s methodology compares the [average-to-average] and [average-to-transaction] methodologies, as they are applied in practice, and in a manner this court has expressly condoned. . . . Commerce‘s chosen methodology reasonably achieves the overarching statutory aim of addressing targeted or masked dumping.“). Accordingly, we affirm Commerce‘s use of the meaningful difference test.

E

SeAH next challenges Commerce‘s use of the 0.8 cutoff for determining whether particular results “pass” the Cohen‘s d test. SeAH has two arguments: First, SeAH argues that Commerce‘s selection of the 0.8 cutoff was arbitrary. Second, SeAH argues that Commerce‘s application of the 0.8 cutoff in this case was unsupported by evidence because Professor Cohen‘s suggestion that “0.8 could be considered a ‘large’ effect size” was limited to comparisons involving data that met certain restrictive conditions—“in particular, that the datasets being compared had roughly the same number of data points, were drawn from normal distributions, and had approximately equal variances.” Appellant‘s Opening Br. 27-28. According to SeAH, none of those conditions were satisfied in this case. Id.

We addressed the crux of SeAH‘s first argument in our decision in Mid Continent: “[Appellant] next challenges Commerce‘s reliance on a d ratio of at least 0.8 as a rigid measure of significance of the difference measured by the Cohen‘s d test. . . . This is a challenge to the reasonableness of Commerce‘s choice of one part of the overall analysis of differential pricing . . . .” 940 F.3d at 673. We held that “the 0.8 standard is ‘widely adopted’ as part of a ‘commonly used measure’ of the difference relative to such overall price dispersion . . . . [I]t is reasonable to adopt that measure where there is no better, objective measure of effect size.” Id. (citation omitted).

We did not, however, address SeAH‘s second argument in Mid Continent. We construe that argument as part of SeAH‘s challenge to Commerce‘s use of the Cohen‘s d test, which we address next.

F

SeAH‘s final contention is that Commerce misused the Cohen‘s d test in its differential pricing analysis. SeAH argues that the data in this case did not satisfy the conditions required to achieve meaningful results from the Cohen‘s d test: in particular, the requirements that the test groups and the comparison groups be normally distributed, of sufficient size, and of roughly equal variances.¹² SeAH further argues that even if Commerce merely needed to provide some reasonable basis for adopting the Cohen‘s d test, Commerce‘s only support for using that test was the general view in the academic literature that Cohen‘s d is a reliable measure of effect size. According to SeAH, the literature ceases to provide reasonable support when Commerce applies the test to data that do not satisfy the conditions assumed by that literature.

We agree that there are significant concerns relating to Commerce‘s application of the Cohen‘s d test in this case and, more generally, in adjudications in which the data groups being compared are small, are not normally distributed, and have disparate variances. Our concerns raise questions about the reasonableness of Commerce‘s use of the Cohen‘s d test in less-than-fair-value adjudications, warranting further supporting explanation from the Department. See Mid Continent, 940 F.3d at 667 (“Commerce must provide an explanation that is adequate to enable the court to determine whether the choices are in fact reasonable, including as to calculation methodologies.“).

Our first concern is a general one: Commerce‘s application of the Cohen‘s d test to data that do not satisfy the assumptions on which the test is based may undermine the usefulness of the interpretive cutoffs. In developing those cutoffs, including the 0.8 cutoff, Professor Cohen noted that “we maintain the assumption that the populations being compared are normal and with equal variability, and conceive them further as equally numerous.” Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences 21 (2d ed. 1988); see also id. at 25-26 (discussing “small effect size” 0.2, “medium effect size” 0.5, and “large effect size” 0.8 “[i]n terms of measures of nonoverlap . . . of the combined area covered by two normal equal-sized equally varying populations“). Other literature confirms those assumptions. See, e.g., Robert J. Grissom & John J. Kim, Effect Sizes for Research: Univariate and Multivariate Applications 66 (2d ed. 2012) (“When the distribution of scores of a comparison population is not normal, the usual interpretation of a d_g or d in terms of estimating the percentile standing of the average-scoring members of another group with respect to the supposed normal distribution of the comparison group‘s scores would be invalid. Also, because standard deviations can be very sensitive to a distribution‘s shape, . . . nonnormality can greatly influence the value of a standardized-mean-difference effect size and its estimate.“); id. at 68 (noting that “Cohen‘s d” is appropriate “if the two populations that are being compared are assumed to have equal variances.“).

There is extensive literature describing the problems associated with applying the Cohen‘s d test to data that are not normally distributed or that are lacking equal variances. See, e.g., Robert Coe, It‘s the Effect Size, Stupid: What effect size is and why it is important, presented at the Annual Conference of the British Educational Research Association (Sept. 2002) (“It has been shown that the interpretation of the ‘standardised mean difference’ measure of effect size [(e.g., Cohen‘s d)] is very sensitive to violations of the assumption of normality.“);¹³ David M. Lane et al., Introduction to Statistics, Online Edition, 645 (“When the effect size is measured in standard deviation units as it is for Hedges’ g and Cohen‘s d, it is important to recognize that the variability in the subjects has a large influence on the effect size measure.“).

In 2005, James Algina and his collaborators inspected the robustness of Cohen‘s d as an effect-size parameter, seeking to determine “if a small change in the population distribution can strongly affect the parameter.” James Algina et al., An Alternative to Cohen‘s Standardized Mean Difference Effect Size: A Robust Parameter and Confidence Interval in the Two Independent Groups Case, 10 Psychological Methods 317, 318 (2005). After simulating Cohen‘s d on various data that followed a mixed-normal distribution, e.g., a heavy-tailed distribution, they concluded that Cohen‘s d was not robust to mixed-normal distributions, and that applying Cohen‘s d to such data caused serious flaws in interpreting the resulting parameter. Id. at 318-319.

In a subsequent simulation study, Johnson Ching-Hong Li investigated the robustness of several effect-size tests, including Cohen‘s d. Johnson Ching-Hong Li, Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data, 48 Behavioral Research 1560 (2015). Li concluded that Cohen‘s d “was found to be inaccurate when the normality and homogeneity-of-variances assumptions were violated in this study, thereby severely affecting the accuracy of d in evaluating the true [effect size] in the research literature.” Id. at 1571.

The use of Cohen‘s d with test groups consisting of very few observations may be particularly problematic. Consider, for example, a situation in which there are eight export sales, two occurring in each of the four regions of the United States. Under the differential pricing analysis, as Commerce describes it, Commerce would apply Cohen‘s d to analyze the pricing differences between each region‘s two sales (i.e., the test group) and the other regions’ six sales (i.e., the comparison group) even though each test group contains only two observations and each would potentially lack normality. The literature concludes that using Cohen‘s d in such a situation may produce an upward bias in the calculated effect size. See Grissom et al. at 70 (“Both Cohen‘s d and Glass‘s d_g have some positive bias (i.e., tending to overestimate their respective parameters), the more so the smaller the sample sizes and the larger the effect size in the population.“). An upward bias might produce more “passing” results under the Cohen‘s d test, which would tend to exaggerate dumping margins.
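The rotation of test and comparison groups in the eight-sale example can be illustrated as follows. The sketch only shows how the groups are formed; the region names, the prices, and the helper `rotate_groups` are hypothetical and are not drawn from the record.

```python
def rotate_groups(sales_by_region):
    """Yield (region, test_group, comparison_group) for each region in turn.

    The test group is one region's sales; the comparison group is every
    other region's sales taken together.
    """
    for region, test_group in sales_by_region.items():
        comparison_group = [
            price
            for other, prices in sales_by_region.items()
            if other != region
            for price in prices
        ]
        yield region, test_group, comparison_group


# Hypothetical prices: eight export sales, two in each of four regions.
sales_by_region = {
    "Northeast": [100.0, 101.0],
    "South": [99.5, 100.5],
    "Midwest": [102.0, 98.0],
    "West": [100.2, 100.4],
}

for region, test_group, comparison_group in rotate_groups(sales_by_region):
    # Each pass compares two observations against the remaining six.
    print(region, len(test_group), len(comparison_group))
```

Every pass of the loop pairs a two-observation test group with a six-observation comparison group, which is the small-sample setting in which the cited literature reports an upward bias.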

Another source of concern arises from test groups containing sales prices that hover around the same value. Consider, for example, ten purchasers of a product, each of which purchases five units. Assume that the per-unit sales prices for a particular purchaser are not normally distributed and are all the same, or nearly the same (e.g., $100.01, $100.01, $100.01, $100.01, and $99.99). Assume further that the per-unit sales prices across the entire set of purchasers are also very similar, falling within a relatively small range (such as between $99.92 and $101.01).

Applying Cohen‘s d to that hypothetical data seems problematic: As the variance within each test group approaches zero, the denominator in the Cohen‘s d equation is greatly reduced and, in fact, approaches half of the values of the standard deviations of the larger comparison groups.¹⁴ That is because Commerce uses the simple average pooled standard deviation instead of the weighted average pooled standard deviation; the former averages the standard deviations of the test and comparison groups without accounting for the number of observations in each group.¹⁵ As the denominator is reduced, the resulting effect-size parameter is increased, tending to artificially inflate the dumping margins for a set of export sales prices that has minimal variance. An objective examiner inspecting those export sales prices would be unlikely to conclude that they embody a “pattern” of prices that “differ significantly.” 19 U.S.C. § 1677f-1(d)(1)(B)(i). Although the problem in that situation is a function of Commerce‘s use of the simple average pooled standard deviation, our concern is also related to the number of observations being compared and the distribution of those observations: requiring larger test groups tends to decrease the likelihood that a test group would have sales prices with near-zero variance, and requiring normality also tends to decrease that likelihood as the number of observations increases.
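The effect of the choice of denominator can be illustrated numerically. The sketch below follows the description in this opinion of the simple average (the mean of the two groups' standard deviations) and contrasts it with one common size-weighted pooling of variances. Several points are assumptions made for illustration: the use of population standard deviations, the particular weighting formula, the use of the absolute difference in means, and the 45 invented comparison prices spread evenly across the hypothetical price range.

```python
import math
import statistics


def simple_average_sd(test, comparison):
    """Simple average of the two standard deviations (group sizes ignored)."""
    return (statistics.pstdev(test) + statistics.pstdev(comparison)) / 2


def weighted_pooled_sd(test, comparison):
    """Pooled standard deviation weighting each group's variance by its size."""
    n_t, n_c = len(test), len(comparison)
    pooled_variance = (
        n_t * statistics.pvariance(test) + n_c * statistics.pvariance(comparison)
    ) / (n_t + n_c)
    return math.sqrt(pooled_variance)


def effect_size(test, comparison, denominator):
    """Absolute difference in means divided by the chosen denominator."""
    gap = abs(statistics.mean(test) - statistics.mean(comparison))
    return gap / denominator(test, comparison)


# Test group from the hypothetical: five nearly identical prices.
test_group = [100.01, 100.01, 100.01, 100.01, 99.99]
# Invented comparison group: 45 prices spread evenly from $99.92 to $101.01.
comparison_group = [99.92 + 1.09 * i / 44 for i in range(45)]

d_simple = effect_size(test_group, comparison_group, simple_average_sd)
d_weighted = effect_size(test_group, comparison_group, weighted_pooled_sd)
print(f"simple-average denominator:   d = {d_simple:.2f}")
print(f"weighted-average denominator: d = {d_weighted:.2f}")
```

With these invented figures, the simple-average denominator is close to half of the comparison group's standard deviation, so it produces the larger of the two effect sizes for the same set of prices.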

Commerce makes only two relevant arguments in response. First, Commerce argues that the concern over the assumption of normality is misplaced because “normal distribution is a concept of probability and statistical significance, which are not relevant to Commerce‘s differential pricing analysis.” Appellee‘s Br. 25. Put differently, Commerce argues that it does not need to worry about normality, because it is not sampling data but instead possesses the entire universe of data. See id. at 25-26; see also Final Memo at 21-22 (making similar arguments). While Commerce is correct that it does not “sample” data, that observation does not address the fact that Professor Cohen derived his interpretive cutoffs under the assumption of normality. Nor does it address SeAH‘s representation that Commerce‘s analysis in this case violated Professor Cohen‘s other assumptions, homogeneity-of-variances and the number of observations being compared.

Commerce‘s second argument is that its approach is reasonable because it uses the larger, more conservative 0.8 cutoff for identifying effect sizes that pass the Cohen‘s d test. That argument, too, fails to address the fact that Professor Cohen derived his interpretive cutoffs under certain assumptions. Violating those assumptions can subvert the usefulness of the interpretive cutoffs, transforming what might be a conservative cutoff into a meaningless comparator. See Virnetx, Inc. v. Cisco Sys., Inc., 767 F.3d 1308, 1332 (Fed. Cir. 2014) (“The Nash theorem arrives at a result that follows from a certain set of premises. It itself asserts nothing about what situations in the real world fit those premises. Anyone seeking to invoke the theorem as applicable to a particular situation must establish that fit, because the 50/50 profit-split result is proven by the theorem only on those premises. Weinstein did not do so. This was an essential failing in invoking the Solution.“).

In sum, the evidence and arguments before us call into question whether Commerce‘s application of the Cohen‘s d test to the data in this case violated the assumptions of normality, sufficient observation size, and roughly equal variances associated with that test. It seems likely that Commerce‘s application of the Cohen‘s d test had a material impact on the results of the less-than-fair-value investigation in this case, particularly given that the dumping margin assigned to SeAH (2.53%) was only slightly above the de minimis threshold, below which no antidumping duties would be assessed. We therefore remand to give Commerce an opportunity to explain whether the limits on the use of the Cohen‘s d test prescribed by Professor Cohen and other authorities were satisfied in this case or whether those limits need not be observed when Commerce uses the Cohen‘s d test in less-than-fair-value adjudications. In that regard, we invite Commerce to clarify its argument that having the entire universe of data rather than a sample makes it permissible to disregard the otherwise-applicable limitations on the use of the Cohen‘s d test.

AFFIRMED IN PART, VACATED AND REMANDED IN PART

COSTS

Each party will bear its own costs for this appeal.

Notes

An “export” price means the price of a transaction in the United States; a “normal” price means the price of a transaction in the respondent‘s home country.

The de minimis threshold for less-than-fair-value investigations is 2%. 19 U.S.C. § 1673d(a)(4) (incorporating the 2% value provided in section 1673b(b)(3)).

Calculating the “weighted average” of a group of sales prices simply requires multiplying each sales price by the number of units sold at that price, summing the resulting values, and dividing that sum by the total number of units sold.
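As a brief illustration of the weighted-average calculation, the sketch below reads the note as describing a unit-weighted mean price. The function name `weighted_average_price` and the sample figures are hypothetical.

```python
def weighted_average_price(sales):
    """Return the unit-weighted average price.

    sales is a list of (unit_price, units_sold) pairs.
    """
    total_units = sum(units for _, units in sales)
    if total_units == 0:
        raise ValueError("no units sold")
    total_value = sum(price * units for price, units in sales)
    return total_value / total_units


# Hypothetical sales: 3 units at $100.00 and 1 unit at $110.00.
print(weighted_average_price([(100.0, 3), (110.0, 1)]))  # 102.5
```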

The Trade Court subsequently denied SeAH‘s motion for reconsideration. (”Stupp II“). The court later issued two additional decisions in this case that are not pertinent to this appeal.

As relevant here, subparagraph (iv) covers evidence submitted by a party to rebut, clarify, or correct certain evidence placed on the record by Commerce. Subparagraph (v) covers all evidence not covered by subparagraphs (i) through (iv) as well as evidence submitted by a party to rebut, clarify, or correct such evidence.

Commerce reasoned that if SeAH‘s factual information fell within the catch-all provision of subparagraph (v), then section 351.301(c)(5) required SeAH to submit that information at least 30 days before Commerce‘s preliminary determination. SeAH failed to meet that deadline because it submitted that information more than three months after the preliminary determination. J.A. 9698. Although Commerce did not separately analyze the timing requirement for factual information submitted under subparagraph (iv) of section 351.102(b)(21), SeAH does not contend on appeal that its submission would have been timely under that requirement.

Commerce has interpreted the exception set forth in the 1997 notice of final rule in the same manner. See, e.g., , available at https://enforcement.trade.gov/frn/summary/india/04-14620-1.pdf (permitting a party to submit financial statements in a January 2004 case brief when those statements were in the “public realm” and addressed conclusions in Commerce‘s December 2003 verification report).

Commerce issued a “Request for Comments” announcing its “Differential Pricing Analysis” methodology before it instituted the investigation in this case. See 79 Fed. Reg. 26,720. However, Commerce has not issued a formal rule adopting that methodology.

The statute defines an optional “[e]xception” to the general rule that Commerce use the average-to-average method (or transaction-to-transaction method): “The administering authority may determine whether the subject merchandise is being sold in the United States at less than fair value [using the average-to-transaction method] . . . .” 19 U.S.C. § 1677f-1(d)(1)(B) (emphasis added).

See (“Congress has not expressly authorized the ITA to ignore de minimis or negligible dumping margins.“). The current statute defining the de minimis threshold, 19 U.S.C. § 1673b(b)(3), was not enacted until December 8, 1994. See Uruguay Round Agreements Act, Pub. L. No. 103-465, 108 Stat. 4809. Commerce did not publish its rule establishing the de minimis threshold until 1987. See (cited with approval in ) (“So far as the Court is aware, Commerce has never proposed a rule, or even claimed, that a .5 percent test applies in all cases. . . . Even though there is no ‘rule’ that margins less than .5 percent are de minimis, Commerce may find that margins of approximately .45 percent are de minimis in this investigation. To do this Commerce must explain the basis for its decision.“).

; see also .

SeAH contends that Commerce “compared groups containing as few as 2 data points,” “compared groups with vastly dissimilar numbers of data points,” “compared groups that were not normally distributed,” and “compared groups with greatly dissimilar variances (as measured by the standard deviation).” Appellant‘s Opening Br. 41-42. Commerce does not dispute those contentions.

Professor Coe‘s paper is available at https://www.cem.org/attachments/ebe/ESguide.pdf. Cohen‘s d is a measure of “standardized mean difference.” Paul D. Ellis, The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results 13 (2010).

For each iteration of the Cohen‘s d test, with rotating test groups and comparison groups, the denominator is simply the average of two numbers—the standard deviation of the test group and the standard deviation of the comparison group. When the test group‘s standard deviation is zero, the denominator is equal to half of the comparison group‘s standard deviation (the simple average of zero and any number is half of that number).

In , we remanded so that Commerce could provide “more thorough consideration” and justification for using the simple average pooled standard deviation. . Commerce defended its position on remand, and the Trade Court found Commerce‘s defense reasonable. See . An appeal of the Trade Court‘s decision is pending before this Court. See .
