Theory Refinement

theory Refinement
imports Setup
begin

section ‹Program and datatype refinement \label{sec:refinement}›

text ‹
  Code generation by shallow embedding (cf.~\secref{sec:principle})
  allows to choose code equations and datatype constructors freely,
  given that some very basic syntactic properties are met; this
  flexibility opens up mechanisms for refinement which allow to extend
  the scope and quality of generated code dramatically.
›


subsection ‹Program refinement›

text ‹
  Program refinement works by choosing appropriate code equations
  explicitly (cf.~\secref{sec:equations}); as example, we use Fibonacci
  numbers:
›

fun %quote fib :: "nat  nat" where
    "fib 0 = 0"
  | "fib (Suc 0) = Suc 0"
  | "fib (Suc (Suc n)) = fib n + fib (Suc n)"

text ‹
  \noindent The runtime of the corresponding code grows exponential due
  to two recursive calls:
›

text %quote @{code_stmts fib constant: fib (Haskell)}

text ‹
  \noindent A more efficient implementation would use dynamic
  programming, e.g.~sharing of common intermediate results between
  recursive calls.  This idea is expressed by an auxiliary operation
  which computes a Fibonacci number and its successor simultaneously:
›

definition %quote fib_step :: "nat  nat × nat" where
  "fib_step n = (fib (Suc n), fib n)"

text ‹
  \noindent This operation can be implemented by recursion using
  dynamic programming:
›

lemma %quote [code]:
  "fib_step 0 = (Suc 0, 0)"
  "fib_step (Suc n) = (let (m, q) = fib_step n in (m + q, m))"
  by (simp_all add: fib_step_def)

text ‹
  \noindent What remains is to implement constfib by constfib_step as follows:
›

lemma %quote [code]:
  "fib 0 = 0"
  "fib (Suc n) = fst (fib_step n)"
  by (simp_all add: fib_step_def)

text ‹
  \noindent The resulting code shows only linear growth of runtime:
›

text %quote @{code_stmts fib constant: fib fib_step (Haskell)}


subsection ‹Datatype refinement›

text ‹
  Selecting specific code equations \emph{and} datatype constructors
  leads to datatype refinement.  As an example, we will develop an
  alternative representation of the queue example given in
  \secref{sec:queue_example}.  The amortised representation is
  convenient for generating code but exposes its \qt{implementation}
  details, which may be cumbersome when proving theorems about it.
  Therefore, here is a simple, straightforward representation of
  queues:
›

datatype %quote 'a queue = Queue "'a list"

definition %quote empty :: "'a queue" where
  "empty = Queue []"

primrec %quote enqueue :: "'a  'a queue  'a queue" where
  "enqueue x (Queue xs) = Queue (xs @ [x])"

fun %quote dequeue :: "'a queue  'a option × 'a queue" where
    "dequeue (Queue []) = (None, Queue [])"
  | "dequeue (Queue (x # xs)) = (Some x, Queue xs)"

text ‹
  \noindent This we can use directly for proving;  for executing,
  we provide an alternative characterisation:
›

definition %quote AQueue :: "'a list  'a list  'a queue" where
  "AQueue xs ys = Queue (ys @ rev xs)"

code_datatype %quote AQueue

text ‹
  \noindent Here we define a \qt{constructor} constAQueue which
  is defined in terms of Queue› and interprets its arguments
  according to what the \emph{content} of an amortised queue is supposed
  to be.

  The prerequisite for datatype constructors is only syntactical: a
  constructor must be of type τ = … ⇒ κ α1 … αn where 1, …, αn}› is exactly the set of \emph{all} type variables in
  τ›; then κ› is its corresponding datatype.  The
  HOL datatype package by default registers any new datatype with its
  constructors, but this may be changed using @{command_def
  code_datatype}; the currently chosen constructors can be inspected
  using the @{command print_codesetup} command.

  Equipped with this, we are able to prove the following equations
  for our primitive queue operations which \qt{implement} the simple
  queues in an amortised fashion:
›

lemma %quote empty_AQueue [code]:
  "empty = AQueue [] []"
  by (simp add: AQueue_def empty_def)

lemma %quote enqueue_AQueue [code]:
  "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
  by (simp add: AQueue_def)

lemma %quote dequeue_AQueue [code]:
  "dequeue (AQueue xs []) =
    (if xs = [] then (None, AQueue [] [])
    else dequeue (AQueue [] (rev xs)))"
  "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
  by (simp_all add: AQueue_def)

text ‹
  \noindent It is good style, although no absolute requirement, to
  provide code equations for the original artefacts of the implemented
  type, if possible; in our case, these are the datatype constructor
  constQueue and the case combinator constcase_queue:
›

lemma %quote Queue_AQueue [code]:
  "Queue = AQueue []"
  by (simp add: AQueue_def fun_eq_iff)

lemma %quote case_queue_AQueue [code]:
  "case_queue f (AQueue xs ys) = f (ys @ rev xs)"
  by (simp add: AQueue_def)

text ‹
  \noindent The resulting code looks as expected:
›

text %quote @{code_stmts empty enqueue dequeue Queue case_queue (SML)}

text ‹
  The same techniques can also be applied to types which are not
  specified as datatypes, e.g.~type typint is originally specified
  as quotient type by means of @{command_def typedef}, but for code
  generation constants allowing construction of binary numeral values
  are used as constructors for typint.

  This approach however fails if the representation of a type demands
  invariants; this issue is discussed in the next section.
›


subsection ‹Datatype refinement involving invariants \label{sec:invariant}›

text ‹
  Datatype representation involving invariants require a dedicated
  setup for the type and its primitive operations.  As a running
  example, we implement a type typ'a dlist of lists consisting
  of distinct elements.

  The specification of typ'a dlist itself can be found in theory
  theoryHOL-Library.Dlist.

  The first step is to decide on which representation the abstract
  type (in our example typ'a dlist) should be implemented.
  Here we choose typ'a list.  Then a conversion from the concrete
  type to the abstract type must be specified, here:
›

text %quote term_typeDlist

text ‹
  \noindent Next follows the specification of a suitable \emph{projection},
  i.e.~a conversion from abstract to concrete type:
›

text %quote term_typelist_of_dlist

text ‹
  \noindent This projection must be specified such that the following
  \emph{abstract datatype certificate} can be proven:
›

lemma %quote [code abstype]:
  "Dlist (list_of_dlist dxs) = dxs"
  by (fact Dlist_list_of_dlist)

text ‹
  \noindent Note that so far the invariant on representations
  (term_typedistinct) has never been mentioned explicitly:
  the invariant is only referred to implicitly: all values in
  set term{xs. list_of_dlist (Dlist xs) = xs} are invariant,
  and in our example this is exactly term{xs. distinct xs}.
  
  The primitive operations on typ'a dlist are specified
  indirectly using the projection constlist_of_dlist.  For
  the empty dlist›, constDlist.empty, we finally want
  the code equation
›

text %quote termDlist.empty = Dlist []

text ‹
  \noindent This we have to prove indirectly as follows:
›

lemma %quote [code]:
  "list_of_dlist Dlist.empty = []"
  by (fact list_of_dlist_empty)

text ‹
  \noindent This equation logically encodes both the desired code
  equation and that the expression constDlist is applied to obeys
  the implicit invariant.  Equations for insertion and removal are
  similar:
›

lemma %quote [code]:
  "list_of_dlist (Dlist.insert x dxs) = List.insert x (list_of_dlist dxs)"
  by (fact list_of_dlist_insert)

lemma %quote [code]:
  "list_of_dlist (Dlist.remove x dxs) = remove1 x (list_of_dlist dxs)"
  by (fact list_of_dlist_remove)

text ‹
  \noindent Then the corresponding code is as follows:
›

text %quote @{code_stmts Dlist.empty Dlist.insert Dlist.remove list_of_dlist (SML)}

text ‹
  To reduce manual work for datatype refinement, @{command_def lift_definition}
  is a valuable tool.  See the corresponding section in cite"isabelle-isar-ref".

  See further cite"Haftmann-Kraus-Kuncar-Nipkow:2013:data_refinement"
  for the meta theory of datatype refinement involving invariants.

  Typical data structures implemented by representations involving
  invariants are available in the library, theory theoryHOL-Library.Mapping
  specifies key-value-mappings (type typ('a, 'b) mapping);
  these can be implemented by red-black-trees (theory theoryHOL-Library.RBT).
›

end