Execution Time

The efficiency and scalability of PyGraft are benchmarked across several schema and graph configurations. Each schema specification reported in Table 1 is paired with each graph specification from Table 2, which yields 9 × 3 = 27 distinct combinations.
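The benchmark grid is the Cartesian product of the nine schema specifications and the three graph specifications. A minimal Python sketch follows. It assumes the shorthand labels `S1` to `S9` and `G1` to `G3` for the table rows. The helper `benchmark_configurations` is hypothetical and not part of PyGraft's API:

```python
from itertools import product

# Shorthand labels for the nine schema and three graph specifications
# (hypothetical naming that mirrors the rows of Tables 1 and 2).
SCHEMAS = [f"S{i}" for i in range(1, 10)]
GRAPHS = [f"G{j}" for j in range(1, 4)]


def benchmark_configurations(schemas, graphs):
    """Pair every schema specification with every graph specification."""
    return list(product(schemas, graphs))


configs = benchmark_configurations(SCHEMAS, GRAPHS)
print(len(configs))   # 27
print(configs[:3])    # [('S1', 'G1'), ('S1', 'G2'), ('S1', 'G3')]
```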

In particular, schemas \(\mathcal{S}1\) to \(\mathcal{S}3\) are small, schemas \(\mathcal{S}4\) to \(\mathcal{S}6\) are medium-sized, and schemas \(\mathcal{S}7\) to \(\mathcal{S}9\) are large. Within each size group, the degree of constraint varies, as the schemas contain increasing proportions of OWL and RDFS constructs. For example, \(\mathcal{S}1\) has fewer constraints than \(\mathcal{S}2\), which in turn has fewer constraints than \(\mathcal{S}3\). Graph specifications \(\mathcal{G}1\), \(\mathcal{G}2\), and \(\mathcal{G}3\) correspond to small, medium-sized, and large graphs, respectively.
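Table 1 follows a regular pattern: three size tiers crossed with three constraint levels. In every row, \(|\mathcal{C}| = |\mathcal{R}|\) and \(\operatorname{AVG}(\mathcal{D})\) equals rs. A single proportion is also shared by cd and all six relation-property columns. The Python sketch below encodes that pattern. The `SchemaSpec` class, its field names, and the `schema_spec` helper are hypothetical and do not correspond to PyGraft's configuration keys:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaSpec:
    # Hypothetical field names mapped to the Table 1 column headers.
    num_classes: int           # |C|
    max_depth: int             # MAX(D)
    avg_depth: float           # AVG(D)
    num_relations: int         # |R|
    avg_relation_depth: float  # rs
    constraint_level: float    # shared by cd, ref, irr, asy, sym, tra, inv


# Size tiers as (|C| = |R|, MAX(D), AVG(D) = rs), with values from Table 1.
SIZE_TIERS = {"small": (25, 3, 1.5), "medium": (100, 4, 2.5), "large": (250, 5, 3.0)}
# Constraint levels used within each size tier in Table 1.
CONSTRAINT_LEVELS = (0.1, 0.2, 0.3)


def schema_spec(index: int) -> SchemaSpec:
    """Return the parameters of schema S<index>, for index in 1..9."""
    if not 1 <= index <= 9:
        raise ValueError("schema index must be in 1..9")
    tier = ("small", "medium", "large")[(index - 1) // 3]  # S1-S3, S4-S6, S7-S9
    level = CONSTRAINT_LEVELS[(index - 1) % 3]             # position within the tier
    n, max_d, avg_d = SIZE_TIERS[tier]
    return SchemaSpec(n, max_d, avg_d, n, avg_d, level)


print(schema_spec(5))
```

For example, `schema_spec(5)` returns `SchemaSpec(num_classes=100, max_depth=4, avg_depth=2.5, num_relations=100, avg_relation_depth=2.5, constraint_level=0.2)`, which matches the \(\mathcal{S}5\) row.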

For these 27 configurations, execution times with respect to several dimensions are measured and shown in Figure 3. Execution times for schema generation are omitted, as they are negligible. Experiments were conducted on a machine with two Intel Xeon E5-2650 v4 CPUs (12 cores per CPU) and 128 GB of RAM.
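The section does not state how the execution times were collected. One common approach, sketched here under that assumption, is to time each generation stage separately with `time.perf_counter`. This also makes it easy to leave a stage such as schema generation out of the report. The stage names and placeholder workloads are hypothetical stand-ins, not PyGraft internals:

```python
import time
from typing import Callable, Dict


def time_stages(stages: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each stage in order and record its wall-clock duration in seconds."""
    timings = {}
    for name, run in stages.items():
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return timings


# Placeholder workloads standing in for real generation stages (hypothetical).
stages = {
    "entity_typing": lambda: sum(range(100_000)),
    "triple_generation": lambda: sorted(range(200_000), reverse=True),
}

timings = time_stages(stages)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Dictionaries preserve insertion order (Python 3.7+), so stages run and are reported in the order they are listed. `time.perf_counter` is monotonic, which makes it better suited to measuring intervals than `time.time`.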

Table 1. Generated schemas. Column headers from left to right: number of classes, maximum class hierarchy depth, average class depth, proportion of class disjointness (cd), number of relations, average depth of relation domains and ranges (rs), and proportions of reflexive (ref), irreflexive (irr), asymmetric (asy), symmetric (sym), transitive (tra), and inverse (inv) relations.

| Schema | \(\lvert\mathcal{C}\rvert\) | \(\operatorname{MAX}(\mathcal{D})\) | \(\operatorname{AVG}(\mathcal{D})\) | cd | \(\lvert\mathcal{R}\rvert\) | rs | ref | irr | asy | sym | tra | inv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(\mathcal{S}1\) | 25 | 3 | 1.5 | 0.1 | 25 | 1.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| \(\mathcal{S}2\) | 25 | 3 | 1.5 | 0.2 | 25 | 1.5 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| \(\mathcal{S}3\) | 25 | 3 | 1.5 | 0.3 | 25 | 1.5 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| \(\mathcal{S}4\) | 100 | 4 | 2.5 | 0.1 | 100 | 2.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| \(\mathcal{S}5\) | 100 | 4 | 2.5 | 0.2 | 100 | 2.5 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| \(\mathcal{S}6\) | 100 | 4 | 2.5 | 0.3 | 100 | 2.5 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| \(\mathcal{S}7\) | 250 | 5 | 3.0 | 0.1 | 250 | 3.0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| \(\mathcal{S}8\) | 250 | 5 | 3.0 | 0.2 | 250 | 3.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| \(\mathcal{S}9\) | 250 | 5 | 3.0 | 0.3 | 250 | 3.0 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |

Table 2. Graph specifications. Column headers from left to right: number of entities, number of triples, proportion of untyped entities (unt), average depth of the most-specific class (asc), and average number of most-specific classes per multi-typed entity (mul).

| Graph | \(\lvert\mathcal{E}\rvert\) | \(\lvert\mathcal{T}\rvert\) | unt | asc | mul |
|---|---|---|---|---|---|
| \(\mathcal{G}1\) | 100 | 1,000 | 0.3 | 2.0 | 2.0 |
| \(\mathcal{G}2\) | 1,000 | 10,000 | 0.3 | 2.0 | 2.0 |
| \(\mathcal{G}3\) | 10,000 | 100,000 | 0.3 | 2.0 | 2.0 |
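Table 2 keeps the ratio \(|\mathcal{T}|/|\mathcal{E}| = 10\) and the untyped proportion of 0.3 fixed, so each step from \(\mathcal{G}1\) to \(\mathcal{G}3\) scales both entities and triples by a factor of 10. The Python sketch below computes these derived quantities. `derived_stats` is a hypothetical helper, and its untyped counts are nominal values (proportion × number of entities), not counts reported by PyGraft:

```python
# Graph specifications from Table 2 as (|E|, |T|, unt).
GRAPH_SPECS = {
    "G1": (100, 1_000, 0.3),
    "G2": (1_000, 10_000, 0.3),
    "G3": (10_000, 100_000, 0.3),
}


def derived_stats(num_entities: int, num_triples: int, untyped: float) -> dict:
    """Return simple quantities implied by a graph specification (hypothetical helper)."""
    untyped_count = round(untyped * num_entities)  # nominal number of untyped entities
    return {
        "triples_per_entity": num_triples / num_entities,
        "untyped_entities": untyped_count,
        "typed_entities": num_entities - untyped_count,
    }


for label, spec in GRAPH_SPECS.items():
    print(label, derived_stats(*spec))
# G1 {'triples_per_entity': 10.0, 'untyped_entities': 30, 'typed_entities': 70}
# G2 {'triples_per_entity': 10.0, 'untyped_entities': 300, 'typed_entities': 700}
# G3 {'triples_per_entity': 10.0, 'untyped_entities': 3000, 'typed_entities': 7000}
```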


Figure 3: Execution time results (stacked bar plot).