Embeddable Spreadsheet -- Software Requirements Specification
Steven J Zeil:
1 Introduction
1.1 Purpose
This document provides the software requirements specification for the embeddable spreadsheet project. It is intended to serve as a sample SRS for CS350 students.
1.2 Scope
The embeddable spreadsheet is an implementation of the familiar computational model of a spreadsheet.
The principle component of this project is a library (API) that can be incorporated into other applications to add a spreadsheet-like capability.
Command-line and GUI interfaces will also be provided, but are secondary products, as their purpose is to add convenience in exploiting the library.
1.3 Definitions, acronyms, and abbreviations
Spreadsheet: a computational model generally envisioned as a rectangular arrangement of cells. Expressions associated with these cells create an implicit order dependency in which the cells must be evaluated: If a cell A contains an expression with references to cells B1, B2, … BN, then A must be evaluated after the evaluation of B1, B2, … BN.
Cell: A container that can be empty or hold an expression (possibly referencing other cells). If the cell contains an expression, then it also has a current value denoting the evaluation of that expression.
Value: The result of evaluating an expression. Values can be of various kinds, including numeric, string, and various artificial values used to indicate problems in the calculation.
direct dependence: A cell C that has been (most recently) assigned an expression E is considered to be directly dependent on any cells referenced within E.
dependence: A cell C is said to be dependent on any cells within the transitive closure of the direct dependence relationship defined above.
indirect dependence: A cell C is said to be indirectly dependent on any cell B if C depends on B but is not directly dependent on B.
1.4 References
1.4.1 Spreadsheets
- zeil15a
- Embeddable Spreadsheet – Requirements Definition, Steven Zeil, CS 350 lecture notes, Sep 18, 2018
- wikiSS
- Spreadsheet, http://en.wikipedia.org/wiki/Spreadsheet, accessed Dec 16, 2016
- zeil14a
- CS350: Dynamic Binding: Class-Appropriate behavior, https://secweb.cs.odu.edu/~zeil/cs330/s14/Public/dynamicBinding/dynamicBinding__html.html#example:evaluatingacell, Section 4.2, Feb 24, 2014
- zeil14b
- CS350: The Observer Pattern, https://secweb.cs.odu.edu/~zeil/cs330/s14/Public/idioms/idioms__html.html#theobserverpattern, Mar 11, 2014
- zeil15b
- CS361: The Topological Sort, https://www.cs.odu.edu/~zeil/cs361/sum15/Public/graphalgorithms/#topological-sort, Jul 02, 2015
1.4.2 Parsing
- zeil03a
- CS355: Basic Components - Lexical, Steven Zeil, http://www.cs.odu.edu/~zeil/cs355/Lectures/1components/lexical_summary.pdf, Aug 28, 2003
- zeil03b
- CS355: Basic Components - Syntax, Steven Zeil, http://www.cs.odu.edu/~zeil/cs355/Lectures/1components/syntax_summary.pdf, Sep 7, 2001
1.4.3 Parser Generators
- zeller05
- Zeller & Kinke, Essential Open Source Toolset, 2005, Wiley, ch 5–7
1.5 Overview
Section 2 of this document describes the Embeddable Spreadsheet_ project in general terms.
Section 3 presents the detailed requirements, organized by feature.
The appendices are devoted to the grammar for the spreadsheet expression language and a list of functions related to it.
2 Overall description
2.1 Product perspective
The spreadsheet is a now familiar model of calculation. Spreadsheets are typically envisioned as a rectangular arrangement of cells, each cell containing a mathematical expression. These expressions frequently contain references to other cells, and it is a defining characteristic of the spreadsheet model that the resulting computational order dependencies are resolved automatically rather than by explicit programming (e.g., as in a general programming language like C++ or Java). [zeil15a]
Commonly, the GUI for a spreadsheet is often seen as integral to this model. But, in fact, there are many applications that could benefit from the ability to incorporate user-supplied calculations. The spreadsheet model suggests a means of doing so without requiring the embedding of an entire scripting language.
The embeddable spreadsheet is envisioned as a library that can be incorporated into applications to add a spreadsheet-like capability.
2.2 Product functions
The system should be accessible through a Java API that would allow construction, loading and saving, and modification of spreadsheets and the querying of values in the evaluated spreadsheet.
It should also be possible to launch a command line application to load, evaluate, and report on a spreadsheet.
Optional, but highly desirable would be a GUI to aid in the construction of spreadsheets, even if the primary subsequent use of those spreadsheets would subsequently be evaluation via the API or via command line operation.
2.3 User characteristics
The embeddable nature of this spreadsheet means that the primary user group for this product will be application developers seeking to incorporate spreadsheet-style calculations into their applications software.
A secondary user group would be the users of those applications. We distinguish between
-
consumers of calculation results, who may be unaware of the method by which those calculations were carries out. Such users will have minimal impact on this project.
-
authors of calculations, who supply formulas involving variables related to the application within which the spreadsheet is embedded. Ease of entry and language will be importnat to this class of users.
For example, an application in the education domain might involve collecting scores from a variety of assignments and assessments. The course instructor would play author role in using the spreadsheet to tailor the calculation of overall grades to the course policy (e.g., dropping a lowest assignment grade, optional extra-credit work). Students in that class, by way of contrast, would play the consumer role in receiving the computed overall grade.
2.4 Constraints
The system should run under Windows, Linux, and OS/X systems equipped with a Java JRE of version 7 or later.
The system should be buildable under Windows, Linux, and OS/X systems equipped with a Java SDK of version 7 or later and the ant
build manager.
2.5 Assumptions and dependencies
3 Specific requirements
3.1 External interface requirements
3.1.1 User interfaces
The system shall support
- a command-line interface permitting the loading and evaluation of a spreadsheet.
- a GUI (optional) supporting the interactive creation, loading, editing, and saving of spreadsheets
3.1.2 Hardware interfaces
N/A
3.1.3 Software interfaces
The system shall support a Java API providing operations to
- create new spreadsheets
- load a spreadsheet from a file
- assign a formula to a cell in a spreadsheet
- query the value of the expression in a cell
- save a spreadsheet to a file
- report on all values in the spreadsheet
3.1.4 Communications interfaces
3.2 System features
3.2.1 Dimensionality and CellNames
Introduction/Purpose of feature
A spreadsheet is generally envisioned as a rectangular arrangement of cells.[zeil13a] In this rectangular view, cells are named by column (typically an alphabetic string) and row (numeric), for example, A1
or ABC23
.
The most common spreadsheets go beyond this simple 2D rectangular model. Most add a third dimension, the “sheet”, also identified by alphabetic strings, written before the column and row, with a separating character (‘!’) used to indicate the end of the first string and the start of the column string, e.g., Data!A1
or Summary!BB170
. In the absence of a sheet designation, the appropriate sheet is determined by context.
There is, however, no fundamental reason in the computation model to limit the arrangements of cells to 3 dimensions, nor to preclude the possibility of one-dimensional naming (e.g., ‘X’, ‘Y’, etc.)
Stimulus/Response sequence
N/A
Associated functional requirements
- DC-1
- Alphabetic characters within a cell name are case-insensitive and treated as upper-case for sorting purposes.
- DC-2
- If the purely numeric component is empty, it is treated as if it had been given as ‘0’ for sorting purposes.
- DC-3
- The dimension 1 (purely numeric) component may not contain leading zeros unless the number is entirely zero, in which case it may consist of only a single digit.
- DC-4
- For sorting purposes, the values of dimension 1 are ordered as integers.
- DC-5
- For sorting purposes, the values of dimension 2 are ordered so that shorter strings precede longer strings, and strings of equal length are ordered by conventional alphabetic (lexicographic) ordering, after replacing any lower-case letters by their upper-case equivalents.
- DC-6
- For sorting purposes, the values of dimensions above 2 are ordered by conventional alphabetic (lexicographic) ordering, after replacing any lower-case letters by their upper-case equivalents.
3.2.2 Expressions
Introduction/Purpose of feature
Associated with each non-empty cell is an expression. The syntax for expressions is conventional algebraic expressions combined with functions written as a function name followed by a parenthesized, comma-separated parameter list.
An expression defines a computation for a cell’s value as functions of constants and of the values of other cells.
Stimulus/Response sequence
N/A
Associated functional requirements
- EX-1
- The syntax for expressions is given as the grammar in Appendix sections Expression Lexemes and Expression Grammar.
- EX-2
- An
AreaReference
is considered a reference to all non-empty cells within that region. - EX-3
- Any rectangular region of 2 or more rows and 2 or more columns can be specified in 4 ways, all of which are entirely equivalent in meaning (e.g.,
A1:B3
,B3:A1
,A3:B1
, andB1:A3
all specify the same region).
- EX-4
- The minimal set of functions to be supported by the spreadsheet are listed in Appendix section Functions.
- EX-5
- A
<functionCall>
in which thefunctionName
matches one of the functions listed in Appendix section Functions is valid only if the number of parameters and their type match the specification given in the Appendix. - EX-6
- When an expression is copied from a source cell to a destination cell offset from the source by $(c,r)$ columns and rows, then each
CellReference
andAreaReference
in the expression copy is modified from the original by adding $c$ columns to any unfrozen column coordinate and adding $r$ rows to an unfrozen row coordinate.
3.2.3 Values
Introduction/Purpose of feature
When evaluated, expressions yield values.
Stimulus/Response sequence
N/A
Associated functional requirements
- V-1
- The system shall support expressions over numeric (floating-point) values.
- V-2
- The system shall support expressions over string values.
- V-3
- A special value
Error
, rendered as “error”, shall be supported for the purpose of reporting errors encountered during expression evaluation. - V-4
- Any function or operator evaluations that receive
Error
as an input shall returnError
as their result.
3.2.4 Assignments
Introduction/Purpose of feature
An assignment is an action of storing an expression in a cell.
An assignment to a cell C has the potential to affect the value of C, of all cells that reference C in their own expressions, of all cells referencing any of those cells, and so on.
Stimulus/Response sequence
API
stimulus An expression is inserted into a cell.
response Subsequent retrievals of expression from that cell show the new expression.
response Subsequent retrievals of values from that cell show the value obtained by evaluation of the new expression. Subsequent retrievals of values from cells that depend, directly or indirectly, on this cell may also change.
GUI
stimulus A cell in the spreadsheet is selected and a new expression entered. assignments
Subsequent retrievals of expression from that cell show the new expression.
response Subsequent retrievals of values from that cell show the value obtained by evaluation of the new expression. The GUI display of the cell is updated to reflect this new value.
response Subsequent retrievals of values form cells that depend, directly or indirectly, on this cell may also change. For any such dependent cells currently visible in the GUI, the displays of those cells are updated to reflect their new values.
GUI/Command Line
stimulus A spreadsheet is loaded from a file containing a list of assignments
response Identical to multiple API assignments
Associated functional requirements
- AS-1
- Assignment of an expression to a cell removes any prior direct dependencies of that cell, but may add new direct dependencies.
- AS-2
- Following any assignment of an expression E to a cell, a value retrieval from that cell will produce the value of E.
(This requirement should not be read as indicating that the re-evaluation of C must occur at the time of the assignment, nor as an indication that it must at the time of the value retrieval. This requirement is satisfied if the expression evaluation is performed at either of those times or at any time in between. )
- AS-3
- Operations that copy an expression from one cell to another are considered an assignment to the second cell. (The copied expression is modified as described in EX-6.)
- AS-4
- The primary input format for loading spreadsheets from files consists of a series of assignments, one per line, with syntax given in Expression Grammar.
- AS-5
- The action of loading a file is equivalent to creating an empty spreadsheet, then performing each of the assignments indicated in the file, in the order presented.
- AS-6
- An input assignment to an
AreaReference
is treated as an assignment of the indicated expression to the cell named to the left of the ‘:’, followed by a copy from that cell to each other cell in the rectangular region denoted byAreaReference
. - AS-7
- An input assignment that is syntactically incorrect results in an error message “Syntax error in line ##” and no change to the spreadsheet. Subsequent lines in the input will be processed normally.
- AS-8
- An API assignment of an expression that is syntactically incorrect results in an exception, with no change made to the cell.
- AS-9
- A GUI assignment of an expression that is syntactically incorrect results in no change to the cell, but a non-fatal notification within the GUI.
3.2.5 Outputs
Introduction/Purpose of feature
The possible outputs from the spreadsheet involve:
-
direct queries in the API for the expression or value of a cell
-
a “saved” spreadsheet format listing expression assignments to each non-empty cell.
-
a “value” report listing the values of each non-empty cell
Stimulus/Response sequence
API/GUI
stimulus Request is made to save the spreadsheet in a given file
response A list of assignments sufficient to recreate the spreadsheet is written to that file.
API/GUI
stimulus Request is made to save the value report in a given file
response The value report is written to that file.
Command Line
stimulus Command line program is run.
response At end, value report is generated.
Associated functional requirements
- OU-1
- A saved spreadsheet report consists of a list of assignments conforming to the syntax of Expression Grammar.
- OU-2
- The contents of a saved spreadsheet report shall be sufficient to re-create the value of the spreadsheet at the time of the save operation.
- OU-3
- The saved spreadsheet may employ
AreaRefernece
s to reduce the number of assignments to be written out. This is optional (though encouraged).
- OU-4
- The value report consists of multiple lines containing a cell name, a single blank space, and then the value of that cell.
- OU-5
- Only non-empty cells appear in the value report.
- OU-6
- Cells are listed in the value report in ascending order by name.
- OU-7
- Cellnames with row 0 are written to the value report with an empty row designator. (e.g.,
Total0
would be written to the report asTotal
.)
3.3 Performance requirements
The reference platform for performance evaluation shall be CS Dept’s Linux servers, sirius
and atria
.
-
The system should be capable of evaluating a spreadsheet with 10,000 non-empty cells, with a total of 15,000 inter-cell references, within one second on the reference platform.
-
The system should be capable of loading and parsing a file describing a spreadsheet with 10,000 non-empty cells, with a total of 15,000 inter-cell references, within 3 seconds on the reference platform.
3.4 Design constraints
3.5 Software system attributes
- SSA-1
- The set of value types supported by the system shall be extendable, requiring a maximum of 40 hours effort from a programmer with a grasp of inheritance (B-level CS330) and parsing (B-level CS488).
- SSA-2
- The set of functions supported by the system shall be extendable, requiring a maximum of 5 hours effort from a programmer with a grasp of inheritance (B-level CS330).
3.6 Other requirements
- OR-1
- The system should run on Windows, Linux and OS/X systems.
- OR-2
- The system should build on Windows, Linux and OS/X systems.
4 Appendices
4.1 Expression Lexemes
This section presents the definitions of the tokens that make up the expression language. Each token type is denoted by a regular expression.
-
CellName
: used to name a cell for the purpose of retrieval (API) or as a target for an assignment.([A-Za-z][A-Za-z0-9]*!)*([A-Za-z][A-za-z]*)([0-9]*)
The dimension of a cellname is the number of substrings matching the parenthesized components in the expression of DC-1. Each such matching substring specifies a coordinate along that dimensional axis.
Dimensions in the names are counted from right to left, starting with 1, so the purely numeric component at the end is dimension 1 (and sometimes referred to as the “row”), the purely alphabetic component before that is dimension 2 (sometimes referred to as the “column”), and so on.
-
CellReference
: used within expressions to denote a cell from which a value should be fetched([A-Za-z][A-Za-z0-9]*!)*($?[A-Za-z][A-za-z]*)($?[0-9]*)
The optional
$
symbols, if present, indicate that the following dimension coordinate is frozen during movement and copy operations. -
AreaName
: denotes a rectangular area of a spreadsheet as a target for an assignment.([A-Za-z][A-Za-z0-9]*!)*($?[A-Za-z][A-za-z]*)($?[0-9]*):(?[A-Za-z][A-za-z]*)(?[0-9]*)
-
AreaReference
: describes a rectangular region of the spreadsheet from which values should be fetched.1([A-Za-z][A-Za-z0-9]*!)*($?[A-Za-z][A-za-z]*)($?[0-9]*):($?[A-Za-z][A-za-z]*)($?[0-9]*)
-
functionName
: name of a function being called using the parenthesized comma-separated list of parameters form.[A-Za-z]A-Za-z0-9]*
4.2 Expression Grammar
The primary starting points in this grammar are <Expression>
and <InputFile>
As is conventional when describing syntax, whitespace is not included in the grammar. Only blanks and tab characters are considered valid whitespace in this application. Line terminators are represented as explicit tokens.
<InputFile> ::= <assignmentList>
<assignmentList> ::= | <Assignment> <assignmentList> lineTerminator
<Assignment> ::= CellReference '=' <Expression>
| AreaReference '=' <Expression>
<Expression> ::= <Compared> <relop> <Expression> | <Compared>
<Compared> ::= <Factor> <multOp> <Compared> | <Factor>
<Factor> ::= <Term> <addOp> <Factor> | <Term>
<Term> ::= <Literal> | <FunctionCall> | ( <Expression> )
<FunctionCall> ::= functionName ( <paramList> )
<paramList> ::= <param> | <param> , <paramList>
<param> ::= <Expression> | AreaReference
<Literal> :== numericLiteral | stringLiteral | CellRefernece
<relOp> :== '==' | '!=' | '<' | '<=' | '>' | '>='
<multOp> :== '*' | '/' | '%'
<addOp> :== '+' | '-'
An example of a sentence satisfying <InputFile>
would be
B4 = sum(B1:B3)
A1 = 23
A2 = 27
A3 = A2+4
B1:B3 = sqrt(2*a1)
The final line in the above example is understood as a shorthand for
B1 = sqrt(2*a1)
B2 = sqrt(2*a2)
B3 = sqrt(2*a3)
4.3 Functions
The minimum set of function to be supported are
-
if(cond, then, else)
- cond:
<Expression>
- then:
<Expression>
- else:
<Expression>
cond
. If numeric and non-zero or string and non-empty, evaluates and returns the value ofthen
, otherwise evaluates and returns the value ofelse
. - cond:
-
sqrt(x)
- x:
<Expression>
x
if numeric and non-negative. - x:
-
min(...)
- parameters are 1 or more
<Expression>
orAreaReference
- parameters are 1 or more
-
max(...)
- parameters are 1 or more
<Expression>
orAreaReference
- parameters are 1 or more
-
count(A)
A
: anAreaReference
A
-
sum(A)
A
: anAreaReference
A
-
sumproduct(A,B)
A
: anAreaReference
B
: anAreaReference
of the same dimensions asA
A
andB
. -
average(A)
A
: anAreaReference
A
contains at least one non-empty, numerically-valued cell, returns the mean average of all such cells inA
. -
stddev(A)
A
: anAreaReference
A
contains at least two non-empty, numerically-valued cells, returns the sample standard deviation of all such cells inA
.
In all cases, inputs not conforming to the above descriptions cause the function to return the Error
value.
1: This is a deliberate simplification of the similar idea described in the original requirements definition, which allowed prismatic regions of arbitrary dimension.