- Figure 1, p.4, the Parrennes examples. These examples are in
the Fabrice directory. Go to this directory and run
grep -c input_clause 〈file〉, where 〈file〉 ranges over
alice_full1.h1.p, alice_full2.h1.p, alice_full3.h1.p,
needham_preuve.h1.p and needham_saturation.h1.p. This gives you
the figures in the # of clauses column.
Now run tptp2dfg 〈file〉. Get the argument of the
functions directive, which lists all functions in the signature,
and count them. This gives you the figures in the #functions
column.
To compute the #literals/clause figure, run grep -c -- '--'
〈file〉 to count all negative literals, and
grep -c -- '++' 〈file〉 to count all positive literals;
then add the two numbers and divide by the # of clauses.
To compute the average size of clauses, run
cat 〈file〉 | grep -v input_clause | sed -e 's/--//' | sed -e 's/++//' | sed -e "s/[][().,]/ /g" | wc -w
and divide the result by the # of clauses (see the sketch after
this item).
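As a convenience, the counts above can be collected in one pass. The
following is a minimal sketch, run from the Fabrice directory, that
merely combines the pipelines above; the divisions are done with awk
and the output wording is our own.
for f in alice_full1.h1.p alice_full2.h1.p alice_full3.h1.p \
         needham_preuve.h1.p needham_saturation.h1.p
do
  clauses=`grep -c input_clause $f`           # "# of clauses" column
  neg=`grep -c -- '--' $f`                    # negative literals
  pos=`grep -c -- '++' $f`                    # positive literals
  words=`cat $f | grep -v input_clause | sed -e 's/--//' -e 's/++//' -e "s/[][().,]/ /g" | wc -w`
  echo $f $clauses $neg $pos $words | awk '{
    printf "%s: %d clauses, %.2f literals/clause, %.2f words/clause\n",
           $1, $2, ($3 + $4) / $2, $5 / $2 }'
done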
- Claim “Almost none of the problems from the TPTP library are in
the H1 class”, top of p.5: run the shell script
findh1.
- Figure 2, p.5. This table was built by launching the shell
script precision. Then count the number of lines starting with
*** to fill in the #defin. coarse row, and the number of lines
starting with +++ to fill in the #possib. coarse row (a sketch
follows this item). The Total # row is obtained by running
EXT=h1
for i in tptp/*
do
echo $i: `grep -l Killed $i/*.bench.$EXT | wc -l` killed, `grep -l NoMem $i/*.bench.$EXT | wc -l` nomem, `ls $i/*.h1.p | wc -l` files
done
and taking the last number on each line (just before “files”).
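For the two counting steps, here is a minimal sketch, assuming the
precision script writes its diagnostics to standard output (the
capture file name precision.out is our own):
./precision > precision.out 2>&1   # or however the script is normally launched
grep -c '^\*\*\*' precision.out    # number for the #defin. coarse row
grep -c '^+++' precision.out       # number for the #possib. coarse row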
- All TPTP benchmarks, running h1 (with all variants
mentioned in the paper, and in fact at least one more) or
SPASS, are run by launching the shell script
bench. Beware that this takes something like
eight hours to complete.
This produces the tables of Figures 3, 4, 7, 13, 16, 20, and 22
automatically, in files called times.h1.tex, times.h1-no-skel.tex, etc.
This also prepares some required files for other tables mentioned
below.
- Figure 5 is obtained by launching gnuplot v4.0 or higher,
then typing the commands
set term postscript eps noenhanced 20
set output "times.h1.eps"
load "times.h1.plot"
If you don't have a file called times.h1.plot, you probably
forgot to run bench: the times.h1.plot file is produced by
running statbench h1.
- Figure 6 is obtained, similarly, by launching gnuplot v4.0
or higher, then typing the commands
set term postscript eps noenhanced 20
set output "mem.h1.eps"
load "mem.h1.plot"
- Figure 7 is obtained by launching gnuplot v4.0 or higher,
then typing the commands
set term postscript eps noenhanced 20
set output "times.spass.eps"
load "times.spass.plot"
- Figure 8 is obtained by launching gnuplot v4.0 or higher,
then typing the commands
set term postscript eps noenhanced 20
set output "mem.spass.eps"
load "mem.spass.plot"
Before that, you need to change the set xrange line in mem.spass.plot to [1:],
otherwise some data points will fall outside the plotted range and be
missing from the diagram (see the one-liner below).
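A minimal sketch of that edit, assuming GNU sed for in-place editing
(otherwise edit the file by hand):
sed -i 's/^set xrange.*/set xrange [1:]/' mem.spass.plot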
- Figure 9 was obtained by copying each command line following the
prompt > in file bench.h1 into a
console and running it as shown. Do not run this file as though it
were a shell script; it is not one. The answers we got were then
copied into the bench.h1 file right after the corresponding
command.
The “generated” column of the figure was obtained by copying the
number at the end of the line “Total generated (derived+subsumed
forward+frozen):”. Similarly for the “subsumed” column and the
“Total subsumed (forward+backward+fully defined):” line, and for the
“automaton” column and the “Automata clauses:” line. Times were
obtained by reading the user time (just before “user” in lines such as
“358.26 user, 0.68 system, 744.06 elapsed – Max VSize = 268380KB,
Max RSS = 265588KB”). Memory usage was obtained by reading the
number just after “Max VSize = ” on the same line, and dividing by
1024 to get a result in MB (a sketch follows this item).
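A minimal sketch of that extraction, assuming the output of one
h1.opt run was saved to a file (the name out.log is hypothetical)
and that the relevant lines look exactly as quoted above:
grep 'Total generated' out.log | awk '{print $NF}'    # "generated" column
grep 'Total subsumed' out.log | awk '{print $NF}'     # "subsumed" column
grep 'Automata clauses:' out.log | awk '{print $NF}'  # "automaton" column
awk '/ user,/ {print $1}' out.log                     # user time, in seconds
awk -F'Max VSize = ' '/Max VSize/ {split($2, a, "KB"); printf "%.1f MB\n", a[1]/1024}' out.log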
- Figure 10 was obtained similarly, by taking three sets of measures of
the kind found in Figure 9 and then computing averages and standard
deviations (a sketch follows this item).
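A minimal sketch of that last step, with placeholder numbers (not
actual measurements) standing in for the three runs; it prints the
mean and the sample standard deviation:
echo "358.26 360.12 355.93" | awk '{
  n = NF
  sum = 0; for (i = 1; i <= n; i++) sum += $i
  mean = sum / n
  ss = 0; for (i = 1; i <= n; i++) ss += ($i - mean) ^ 2
  printf "mean = %.2f, stddev = %.2f\n", mean, sqrt(ss / (n - 1))
}'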
- p.7, claim that “SPASS did not terminate on any of the
Parrennes examples, even when allotted 60 minutes of computation
time.”. Claim that “SPASS did not terminate on the original,
unapproximated problems S either, again within a 60 minute time
limit.” On p.11, claim that “SPASS does terminate on the
original problems, when run with the
-RSSi=1
(enable sort
simplification), -FuncWeight=999999
(compare terms mainly by
function count and ignore predicates), and -Select=2
(select
all negative literals) options.”, and that “SPASS -RSSi=1
-FuncWeight=999999
-Select=2
still does not terminate
in less than 60 minutes on the (approximated) Parrennes
examples.”
These conclusions were reached by running SPASS 2.0 using the
commands listed in file bench.spass.
Do not run this shell script file directly. Rather, go to the right
directory, then copy each command and paste it into a console. The
results we obtained are indicated in comments after each command
line.
- p.19, claim that “all of the Parrennes and TPTP examples
whose type proxy approximation was found to be unsatisfiable were
indeed unsatisfiable.” This was observed by launching the shell
script precision-monadic-proxy. Then count the number of lines
starting with *** (definitely coarse; there should not be
any) and with +++ (possibly coarse; there should not be any
either); see the check below.
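A minimal sketch of that check, assuming as before that the script
writes its diagnostics to standard output (the capture file name is
our own); both counts should be 0:
./precision-monadic-proxy > precision-monadic-proxy.out 2>&1
grep -c '^\*\*\*' precision-monadic-proxy.out   # definitely coarse: expect 0
grep -c '^+++' precision-monadic-proxy.out      # possibly coarse: expect 0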
- Figure 14, p.25, was obtained just like Figure 9, except through
the commands found in
bench.h1-no-skel.
- Figure 17, p.29, was obtained just like Figure 9, except through
the commands found in
bench.h1-monadic-proxy.
Note that half of the commands in the latter file use the additional
flag -all. This is used to fill in Figure 18, p.30.
- Figure 19, p.33, was obtained just like Figure 9, except through
the commands found in
bench.h1-noda.
- Figure 21, p.39, was obtained by running the shell script
abbrvbench. Then launch gnuplot
v4.0 or higher, and type the commands
set term postscript eps noenhanced 20
set output "abbrv.h1.eps"
load "abbrv.h1.plot"
- Figure 23, p.42, was obtained by running the shell script
interbench. Then launch gnuplot
v4.0 or higher, and type the commands
set term postscript eps noenhanced 20
set output "inter.h1.eps"
load "inter.h1-no-alt.plot"
- p.42. To compare which problems succeeded/failed with or
without abbreviation of non-trivial blocks, after all benchmarks
have been run, type
FILES=`find tptp -name '*.bench.h1-no-alt' -exec grep -q NoMem {} \; -print`
FILES="`find tptp -name '*.bench.h1-no-alt' -exec grep -q Killed {} \; -print` $FILES"
grep Intersection $FILES
This gives you the list of problems which failed with h1
-no-alternation, together with the number of intersection
predicates generated for each. You may remove from this list those
problems that already failed without the -no-alternation
option, by typing
OFILES=`find tptp -name '*.bench.h1' -exec grep -q NoMem {} \; -print`
OFILES="`find tptp -name '*.bench.h1' -exec grep -q Killed {} \; -print` $OFILES"
and removing the files listed in $OFILES from those in $FILES (see
the sketch below).
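A minimal sketch of that removal step, assuming bash (for process
substitution); the sed step strips the differing .bench.* extensions
so that the two lists become comparable, and comm -23 keeps only the
problems present in $FILES but not in $OFILES:
comm -23 <(echo $FILES  | tr ' ' '\n' | sed -e 's/\.bench\..*$//' -e '/^$/d' | sort) \
         <(echo $OFILES | tr ' ' '\n' | sed -e 's/\.bench\..*$//' -e '/^$/d' | sort)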
- p.42, claim that “The
COM006-1
case is stranger: while a
contradiction was found, the 462 step refutation does not use any
of the 5 defining clauses produced.” This was obtained by
running
memtime -m 400000 -c 300 h1.opt -progress -no-alternation tptp/COM/COM006-1.h1.p
h1logstrip tptp/COM/COM006-1.h1.log.gz >c6.log
grep -c '\[' c6.log
The result we got was 462. Each time the abbreviation of
non-trivial blocks rule is used, some lines containing the words
inter-def: and inter-use: are output in
tptp/COM/COM006-1.h1.log.gz. Check with grep that there are no such
words in the stripped log file c6.log (see below), showing that no
clause using any intersection predicate was eventually useful in
deriving a contradiction.
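A minimal sketch of that grep check; the count should come out as 0:
grep -c -e 'inter-def:' -e 'inter-use:' c6.log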
- Figure 24, p.43, was obtained just like Figure 9, except through
the commands found in
bench.h1-no-alt. We added a
#intersection predicates column, obtained by reading the lines
starting with “Intersection predicates:” in the latter file.
- Figure 25, p.46, was obtained by looking again at file
bench.h1. This time, we fill the
“forward subsumption” column with the figure at line “Subsumed
clauses [forward] :”, the “backward subsumption” column with the
number at line “Subsumed clauses [backward]:”, the “full def.
subsumption, straight” column with the number at line “Subsumed
parent clauses because of fully defined symbols:”, and the “full
def. subsumption, backward” column with the number at line
“Backward subsumed clauses because of fully defined symbols:”
(a sketch for extracting these numbers follows).
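A minimal sketch of pulling those four counters out of bench.h1,
assuming the quoted lines appear verbatim there with the number as
the last field:
grep 'Subsumed clauses \[forward\]' bench.h1 | awk '{print $NF}'    # forward subsumption
grep 'Subsumed clauses \[backward\]' bench.h1 | awk '{print $NF}'   # backward subsumption
grep 'Subsumed parent clauses because of fully defined symbols:' bench.h1 | awk '{print $NF}'
grep 'Backward subsumed clauses because of fully defined symbols:' bench.h1 | awk '{print $NF}'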