Theory Computations

theory Computations
  imports Setup
    "HOL-Library.Code_Target_Int"
begin

section ‹Computations \label{sec:computations}›

subsection ‹Prelude -- The code› antiquotation \label{sec:code_antiq}›

text ‹
  The @{ML_antiquotation_def code} antiquotation allows constants from
  generated code to be included directly in ML system code, as in the
  following toy example:
›

datatype %quote form = T | F | And form form | Or form form (*<*)

(*>*) ML %quote fun eval_form @{code T} = true
    | eval_form @{code F} = false
    | eval_form (@{code And} (p, q)) =
        eval_form p andalso eval_form q
    | eval_form (@{code Or} (p, q)) =
        eval_form p orelse eval_form q;

text ‹
  \noindent The antiquotation @{ML_antiquotation code} takes
  the name of a constant as argument;
  the required code is generated
  transparently and the corresponding constant names are inserted
  for the given antiquotations.  This technique allows pattern
  matching on constructors stemming from compiled datatypes.
  Note that the @{ML_antiquotation code}
  antiquotation may not refer to constants which carry adaptations;
  here you have to refer to the corresponding adapted code directly.
›

  
subsection ‹The concept of computations›

text ‹
  Computations embody the simple idea that for each
  monomorphic Isabelle/HOL term of type τ›, by virtue of
  code generation, there exists a corresponding ML type T› and
  a morphism Φ :: τ → T› satisfying
  Φ (t1 ⋅ t2) = Φ t1 ⋅ Φ t2›, with ⋅› denoting
  term application.

  For a given Isabelle/HOL type τ›, parts of Φ› can be
  implemented by a corresponding ML function φτ :: term → T›.
  How?

     Let input be a constant C :: τ.› \\
      Then φτ C = fC with fC being
      the image of C› under code generation.

     Let input be an application (t1 ⋅ t2) :: τ.› \\
      Then φτ (t1 ⋅ t2) = φτ t1τ t2)›.

  \noindent Using these trivial properties, each monomorphic constant
  C :: τ1 → … → τn → τ› yields the following
  equations:
›

text %quote φ(τ1 → τ2 → … → τn → τ) C = fC› \\
  φ(τ2 → … → τn → τ) (C ⋅ t1) = fC (φτ1 t1)› \\
  …› \\
  φτ (C ⋅ t1 ⋅ … ⋅ tn) = fC (φτ1 t1) … (φτn tn)›
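
text ‹
  \noindent For instance, instantiating these equations for the constant
  plus :: nat ⇒ nat ⇒ nat› (which reappears as an input constant in the
  examples below) yields:
›

text %quote φ(nat ⇒ nat ⇒ nat) plus = fplus› \\
  φ(nat ⇒ nat) (plus ⋅ t1) = fplus (φnat t1)› \\
  φnat (plus ⋅ t1 ⋅ t2) = fplus (φnat t1) (φnat t2)›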
  
text ‹
  \noindent Hence a computation is characterized as follows:

     Let input constants› denote a set of monomorphic constants.

     Let τ› denote a monomorphic type and 'ml› be a schematic
      placeholder for its corresponding type in ML under code generation.

     Then the corresponding computation is an ML function of type
      ML_typeProof.context -> term -> 'ml›
      partially implementing the morphism Φ :: τ → T› for all
      ‹input terms› consisting only of input constants and applications.

  \noindent The charming idea is that all required code is automatically generated
  by the code generator for the given input constants and types;
  that code is directly inserted into the Isabelle/ML runtime system
  by means of antiquotations.
›


subsection ‹The computation› antiquotation›

text ‹
  The following example illustrates its basic usage:
›

ML %quote local

  fun int_of_nat @{code "0 :: nat"} = 0
    | int_of_nat (@{code Suc} n) = int_of_nat n + 1;

  in

  val comp_nat = @{computation nat terms:
    "plus :: nat  nat  nat" "times :: nat  nat  nat"
    "sum_list :: nat list  nat" "prod_list :: nat list  nat"
    datatypes: nat "nat list"}
    (fn post => post o HOLogic.mk_nat o int_of_nat o the);

  end

text  Antiquotations occurring in the
      same ML block always refer to the same transparently generated code;
      in particular, they share the same transparently generated datatype
      declarations.

     The type of a computation is specified as the first argument.

     Input constants are specified in the following ways:

         Each term following terms:› specifies all constants
          it contains as input constants.

         Each type following datatypes:› specifies all constructors
          of the corresponding code datatype as input constants.  Note that
          this does not increase expressiveness but is more succinct for
          datatypes with many constructors.  Abstract type constructors are
          skipped silently.

     The code generated by a computation› antiquotation takes a functional argument
      which describes how to conclude the computation.  What's the rationale
      behind this?

         There is no automated way to generate a reconstruction function
          from the resulting ML type to an Isabelle term -- this is the
          responsibility of the implementor.  One possible approach
          for robust term reconstruction is the code› antiquotation.

         Both statically specified input constants and dynamically provided input
          terms are subject to preprocessing.  Likewise the result
          is supposed to be subject to postprocessing; the implementor
          is expected to take care of this explicitly.

         Computations follow the partiality principle (cf.~\secref{sec:partiality_principle}):
          failures due to pattern matching or unspecified functions
          are interpreted as partiality; therefore resulting ML values
          are optional. 

      Hence the functional argument accepts the following parameters:

         A postprocessor function ML_typeterm -> term.

         The resulting value as an optional argument.

      The functional argument is supposed to compose the final result
      from these in a suitable manner.

  \noindent A simple application:
›

ML_val %quote comp_nat context termsum_list [Suc 0, Suc (Suc 0)] * Suc (Suc 0)
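
text ‹
  \noindent The functional argument need not discharge the optional result using
  MLthe as above.  The following sketch -- not part of any default setup --
  handles partiality explicitly instead:
›

ML %quote local

  fun int_of_nat @{code "0 :: nat"} = 0
    | int_of_nat (@{code Suc} n) = int_of_nat n + 1;

  in

  (*conclude the computation, reporting an error on a partial result*)
  val comp_nat_strict = @{computation nat terms:
    "plus :: nat ⇒ nat ⇒ nat" "times :: nat ⇒ nat ⇒ nat"
    datatypes: nat}
    (fn post => fn NONE => error "partial computation"
      | SOME n => post (HOLogic.mk_nat (int_of_nat n)));

  end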

text ‹
  \noindent A single ML block may contain an arbitrary number of computation
  antiquotations.  These share the ‹same› set of possible
  input constants.  In other words, it does not matter in which
  antiquotation constants are specified;  in the following example,
  ‹all› constants are specified by the first antiquotation once
  and for all:
›

ML %quote local

  fun int_of_nat @{code "0 :: nat"} = 0
    | int_of_nat (@{code Suc} n) = int_of_nat n + 1;

  in

  val comp_nat = @{computation nat terms:
    "plus :: nat  nat  nat" "times :: nat  nat  nat"
    "sum_list :: nat list  nat" "prod_list :: nat list  nat"
    "replicate :: nat  nat  nat list"
    datatypes: nat "nat list"}
    (fn post => post o HOLogic.mk_nat o int_of_nat o the);

  val comp_nat_list = @{computation "nat list"}
    (fn post => post o HOLogic.mk_list typnat o
      map (HOLogic.mk_nat o int_of_nat) o the);

  end
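
text ‹
  \noindent For illustration, the computation on lists is invoked in the same
  manner as before:
›

ML_val %quote comp_nat_list context termreplicate (Suc (Suc 0)) (Suc 0)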

  
subsection ‹Pitfalls when specifying input constants \label{sec:input_constants_pitfalls}›

text  Complete type coverage.› Specified input constants must
      be ‹complete› in the sense that for each
      required type τ› there is at least one corresponding
      input constant which can actually ‹construct› a concrete
      value of type τ›, potentially requiring more types recursively;
      otherwise the system of equations cannot be generated properly.
      Hence such incomplete sets of input constants are rejected immediately
      (see the sketch following this list).

     Unsuitable right hand sides.› The generated code for a computation
      must compile in the strict ML runtime environment.  This imposes
      the technical restriction that each compiled input constant
      fC on the right hand side of a generated equation
      must compile without throwing an exception.  That rules
      out pathological examples like @{term [source] "undefined :: nat"}
      as input constants, as well as abstract constructors (cf. \secref{sec:invariant}).

     Preprocessing.› For consistency, input constants are subject
      to preprocessing;  however, the overall approach requires
      operating on constants C› and their respective compiled images
      fC.\footnote{Technical restrictions of the implementation
      enforce this, although those could be lifted in the future.}
      This is a problem whenever preprocessing maps an input constant
      to a non-constant.

      To circumvent these situations, the computation machinery
      has a dedicated preprocessor which is applied ‹before›
      the regular preprocessing, to both input constants and
      input terms.  This can be used to map problematic constants
      to other ones that are not subject to regular preprocessing. 
      Rewrite rules are added using attribute @{attribute code_computation_unfold}.
      There should rarely be a need to use this beyond the few default
      setups in HOL, which deal with literals (see also \secref{sec:literal_input}).
›
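
text ‹
  \noindent To illustrate the completeness requirement: a hypothetical specification like
  @{computation nat terms: "plus :: nat ⇒ nat ⇒ nat"}› would be rejected, since
  constplus alone can never ‹construct› a concrete value of type typnat --
  its arguments are again of type typnat.  Adding datatypes: nat› (or the
  constants 0 :: nat› and constSuc explicitly) makes the set of input
  constants complete.
›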

  
subsection ‹Pitfalls concerning input terms›

text  No polymorphism.› Input terms must be monomorphic: compilation
      to ML requires a dedicated choice of monomorphic types.

     No abstractions.› Only constants and applications are admissible
      as input; abstractions are not possible.  In theory, the
      compilation schema could be extended to cover abstractions also,
      but this would increase the trusted code base.  If abstractions
      are required, they can always be eliminated by a dedicated preprocessing
      step, e.g.~using combinatory logic.

     Potential interference of the preprocessor.› As described in \secref{sec:input_constants_pitfalls},
      regular preprocessing can have a disruptive effect on input constants.
      The same also applies to input terms; however, the same measures
      to circumvent that problem for input constants apply to input terms as well.
›

  
subsection ‹Computations using the computation_conv› antiquotation›

text ‹
  Computations are a device for implementing fast proof procedures.
  The results of such computations are often assumed to be trustworthy
  and thus wrapped in oracles (see cite"isabelle-isar-ref"),
  as in the following example:\footnote{
  The technical details of how numerals are dealt with are given later in
  \secref{sec:literal_input}.}
›

ML %quote local

  fun raw_dvd (b, ct) =
    instantiatex = ct and y = if b then ctermTrue else ctermFalse
      in cterm x ≡ y for x y :: bool;

  val (_, dvd_oracle) = Theory.setup_result (Thm.add_oracle (bindingdvd, raw_dvd));

  in

  val conv_dvd = @{computation_conv bool terms:
    "Rings.dvd :: int  int  bool"
    "plus :: int  int  int"
    "minus :: int  int  int"
    "times :: int  int  int"
    "0 :: int" "1 :: int" "2 :: int" "3 :: int" "-1 :: int"
  } (K (curry dvd_oracle))

  end

text  Antiquotation @{ML_antiquotation computation_conv} basically yields
      a conversion of type ML_typeProof.context -> cterm -> thm
      (see further cite"isabelle-implementation").

     The antiquotation expects one functional argument to bridge the
      \qt{untrusted gap};  this can be done e.g.~using an oracle.  Since that oracle
      will only yield \qt{valid} results in the context of that particular
      computation, the implementor must make sure that it does not leave
      the local ML scope;  in this example, this is achieved using
      an explicit local› ML block.  The very presence of the oracle
      in the code acknowledges that each computation requires explicit thinking
      before it can be considered trustworthy!

     Postprocessing just operates as a further conversion after normalization.

     Partiality makes the whole conversion fall back to reflexivity.
› (*<*)

(*>*) ML_val %quote conv_dvd context cterm7 dvd ( 62437867527846782 :: int);
  conv_dvd context cterm7 dvd (-62437867527846783 :: int);

text ‹
  \noindent An oracle is not the only way to construct a valid theorem.
  A computation result can be used to construct an appropriate certificate:
›

lemma %quote conv_div_cert:
  "(Code_Target_Int.positive r * Code_Target_Int.positive s)
     div Code_Target_Int.positive s ≡ (numeral r :: int)" (is "?lhs ≡ ?rhs")
proof (rule eq_reflection)
  have "?lhs = numeral r * numeral s div numeral s"
    by simp
  also have "numeral r * numeral s div numeral s = ?rhs"
    by (rule nonzero_mult_div_cancel_right) simp
  finally show "?lhs = ?rhs" .
qed

lemma %quote positive_mult:
  "Code_Target_Int.positive r * Code_Target_Int.positive s = 
    Code_Target_Int.positive (r * s)"
  by simp
  
ML %quote local

  fun integer_of_int (@{code int_of_integer} k) = k

  val cterm_of_int = Thm.cterm_of context o HOLogic.mk_numeral o integer_of_int;

  val divisor = Thm.dest_arg o Thm.dest_arg;

  val evaluate_simps = map mk_eq @{thms arith_simps positive_mult};

  fun evaluate ctxt = 
    Simplifier.rewrite_rule ctxt evaluate_simps;

  fun revaluate ctxt k ct =
    @{thm conv_div_cert}
    |> Thm.instantiate' [] [SOME (cterm_of_int k), SOME (divisor ct)]
    |> evaluate ctxt;

  in

  val conv_div = @{computation_conv int terms:
    "divide :: int  int  int"
    "0 :: int" "1 :: int" "2 :: int" "3 :: int"
  } revaluate

  end

ML_val %quote conv_div context
    cterm46782454343499999992777742432342242323423425 div (7 :: int)

text ‹
  \noindent The example is intentionally kept simple: only positive
  integers are covered, only remainder-free divisions are feasible,
  and the input term is expected to have a particular shape.
  This exhibits the idea more clearly:
  the result of the computation is used as a mere
  hint on how to instantiate @{fact conv_div_cert}, from which the required
  theorem is obtained by performing multiplication using the
  simplifier;  hence that theorem is built of proper inferences,
  with no oracles involved.
›


subsection ‹Computations using the computation_check› antiquotation›

text ‹
  The computation_check› antiquotation is convenient if
  only positive checking of propositions is desired, because then
  the result type is fixed (typprop) and all the technical
  matter concerning postprocessing and oracles is handled by the framework
  once and for all:
›

ML %quote val check_nat = @{computation_check terms:
    Trueprop "less :: nat  nat  bool" "plus :: nat  nat  nat" 
    "times :: nat  nat  nat"
    "0 :: nat" Suc
  }

text ‹
  \noindent The HOL judgement termTrueprop embeds an expression
  of type typbool into typprop.
›

ML_val %quote check_nat context cpropless (Suc (Suc 0)) (Suc (Suc (Suc 0)))

text ‹
  \noindent Note that such computations can only ‹check›
  whether a typprop holds, but cannot ‹decide› it.
›


subsection ‹Some practical hints›  

subsubsection ‹Inspecting generated code›

text ‹
  The antiquotations for computations attempt to produce meaningful error
  messages.  If these are not sufficient, it might be useful to
  inspect the generated code, which can be achieved using
›
  
declare %quote [[ML_source_trace]] (*<*) declare %quote [[ML_source_trace = false]] (*>*)


subsubsection ‹Literals as input constants \label{sec:literal_input}›

text ‹
  Literals (characters, numerals) in computations cannot be processed
  naively: the compilation pattern for computations fails whenever
  target-language literals are involved; since various
  common code generator setups (see \secref{sec:common_adaptation})
  implement typnat and typint by target-language literals,
  this problem manifests itself whenever numeric types are involved.
  In practice, this is circumvented with a dedicated preprocessor
  setup for literals (see also \secref{sec:input_constants_pitfalls}).

  The following examples illustrate the pattern
  for specifying input constants when literals are involved, without going into
  too much detail:
›

paragraph ‹An example for typnat

ML %quote val check_nat = @{computation_check terms:
    Trueprop "even :: nat  bool" "plus :: nat  nat  nat" 
    "0 :: nat" Suc "1 :: nat" "2 :: nat" "3 :: nat"
  }

ML_val %quote check_nat context cpropeven (Suc 0 + 1 + 2 + 3 + 4 + 5)
  
paragraph ‹An example for typint

ML %quote val check_int = @{computation_check terms:
    Trueprop "even :: int  bool" "plus :: int  int  int" 
    "0 :: int" "1 :: int" "2 :: int" "3 :: int" "-1 :: int"
  }

ML_val %quote check_int context cpropeven ((0::int) + 1 + 2 + 3 + -1 + -2 + -3)
  
paragraph ‹An example for typString.literal

definition %quote is_cap_letter :: "String.literal ⇒ bool"
  where "is_cap_letter s ⟷ (case String.asciis_of_literal s
    of [] ⇒ False | k # _ ⇒ 65 ≤ k ∧ k ≤ 90)" (*<*)

(*>*) ML %quote val check_literal = @{computation_check terms:
    Trueprop is_cap_letter datatypes: bool String.literal
  }

ML_val %quote check_literal context cpropis_cap_letter (STR ''Q'')

  
subsubsection ‹Preprocessing HOL terms into evaluable shape›

text ‹
  When integrating decision procedures developed inside HOL into HOL itself,
  a common approach is to use a deep embedding where operators etc.
  are represented by datatypes in Isabelle/HOL.  Then it is necessary
  to turn generic shallowly embedded statements into that particular
  deep embedding (\qt{reification}).

  One option is to hard-code the reification using code antiquotations (see \secref{sec:code_antiq}).
  Another option is to use pre-existing infrastructure in HOL:
  MLReification.conv and MLReification.tac.

  A simplistic example:
›

datatype %quote form_ord = T | F | Less nat nat
  | And form_ord form_ord | Or form_ord form_ord | Neg form_ord

primrec %quote interp :: "form_ord ⇒ 'a::order list ⇒ bool"
where
  "interp T vs  True"
| "interp F vs  False"
| "interp (Less i j) vs  vs ! i < vs ! j"
| "interp (And f1 f2) vs  interp f1 vs  interp f2 vs"
| "interp (Or f1 f2) vs  interp f1 vs  interp f2 vs"
| "interp (Neg f) vs  ¬ interp f vs"

text ‹
  \noindent The datatype typeform_ord represents formulae whose semantics is given by
  constinterp.  Note that values are represented by variable indices (typnat)
  whose concrete values are given in the list termvs.
›

ML %quote (*<*) ‹›
lemma "thm": fixes x y z :: "'a::order" shows "x < y  x < z  interp (And (Less (Suc 0) (Suc (Suc 0))) (Less (Suc 0) 0)) [z, x, y]"
ML_prf %quote
(*>*) val thm =
  Reification.conv context @{thms interp.simps} ctermx < y ∧ x < z (*<*)
by (tactic ALLGOALS (resolve_tac context [thm]))
(*>*)

text ‹
  \noindent By virtue of @{fact interp.simps}, MLReification.conv provides a conversion
  which, for this concrete example, yields @{thm thm [no_vars]}.  Note that the argument
  to constinterp does not contain any free variables and can thus be evaluated
  by means of the code generator (see the example below).

  A less meager example can be found in the AFP, session Regular-Sets›,
  theory Regexp_Method›.
›
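
text ‹
  \noindent Once the value list is instantiated by concrete values, such a deeply
  embedded statement can be evaluated directly, e.g.~using the value› command:
›

value %quote "interp (And (Less (Suc 0) (Suc (Suc 0))) (Less (Suc 0) 0)) [3 :: nat, 1, 2]"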

end