Ontology alignment is widely used to find correspondences between different ontologies in diverse fields. Once alignment methods have produced their alignments, several performance scores are available to evaluate them. These scores require the alignment produced by a method and the reference alignment, which contains the actual underlying correspondences of the given ontologies. The current trend in alignment evaluation is to put forward a new score and to compare alignments by juxtaposing their performance scores. However, selecting one performance score over the others for comparison is contentious. Moreover, a claim that one method performs better than another cannot be substantiated solely by comparing scores. In this paper, we propose statistical procedures that allow one method to be favored over another on principled grounds. The McNemar test is considered a reliable and suitable means of comparing two ontology alignment methods over a single matching task. The test applies to a 2 x 2 contingency table, which can be constructed from the alignments in two different ways, each with its own merits and pitfalls. The construction of the contingency table and the relevant statistics of the McNemar test are elaborated in detail. When more than two alignment methods are compared, the multiple pairwise tests inflate the family-wise error rate; ways of controlling it are therefore also discussed. A directed graph visualizes the outcome of the McNemar test in the presence of multiple alignment methods. From this graph, it is readily apparent whether one method is better than another or whether their difference is not statistically significant. Our investigation of the methods that participated in the anatomy track of OAEI 2016 demonstrates that AML and CroMatcher are the top two methods and that DKP-AOM and Alin are the bottom two.
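To make the comparison of two methods over one matching task concrete, the following is a minimal Python sketch of an exact McNemar test, not the paper's implementation. It rests on several assumptions: the contingency table is built only from the reference correspondences (each one is either found or missed by each method), which is just one plausible construction; the exact binomial form of the test is used; and a Bonferroni correction stands in for whichever family-wise error control is actually chosen. The function names `discordant_counts`, `mcnemar_exact_p`, and `bonferroni_alpha` are hypothetical.

```python
from math import comb


def discordant_counts(alignment_a, alignment_b, reference):
    """Count reference correspondences found by exactly one of two methods.

    Assumption: only correspondences in the reference alignment enter the
    table, so false positives of either method are ignored here.
    """
    correct_a = reference & alignment_a
    correct_b = reference & alignment_b
    n10 = len(correct_a - correct_b)  # found by A, missed by B
    n01 = len(correct_b - correct_a)  # found by B, missed by A
    return n10, n01


def mcnemar_exact_p(n10, n01):
    """Two-sided exact McNemar p-value.

    Under the null hypothesis, each discordant pair is equally likely to
    favor either method, so n10 follows Binomial(n10 + n01, 0.5).
    """
    n = n10 + n01
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(n10, n01)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni_alpha(alpha, num_methods):
    """Per-comparison significance level for all pairwise tests.

    Assumption: Bonferroni correction is used for illustration only.
    """
    num_comparisons = num_methods * (num_methods - 1) // 2
    return alpha / num_comparisons


# Toy data (hypothetical correspondences, not taken from OAEI).
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
method_a = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a9", "b9")}
method_b = {("a1", "b1")}

n10, n01 = discordant_counts(method_a, method_b, reference)
p_value = mcnemar_exact_p(n10, n01)
print(n10, n01, p_value)
print(bonferroni_alpha(0.05, 4))
```

In this sketch only the discordant pairs drive the test: correspondences that both methods find or both miss carry no information about which method is better. With many pairwise comparisons, each p-value would be checked against the corrected level rather than the nominal one.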