WIKIPEDIA MISSION - Tvikia's primary mission is to allow searching and viewing of a large subset of Wikipedia on a common television set in the developing world.
CHEAP PC MISSION - Tvikia's secondary mission is to disseminate a sub-$20 general purpose personal computer platform into the developing world.
COMMUNITY MISSION - Tvikia's tertiary mission is to foster a community of developers within the developed world who can aid, support, and partially subsidize the platform.
The design consists of three Atmega8 MCUs surrounded by a variety of peripherals and expansion headers. The layout can be only partially populated in order to reduce cost for the Wikipedia-specific mission.
The three microcontroller units are:
Power - All board components except the SD Card are 5V and powered from the USB connector without further regulation. The Micro-USB connector is being adopted as the standard power connector for all cell phones over the next couple of years. We hope this will save the cost of providing a separate power adaptor (or at least ensure it will be generic).
3.3V Power - A voltage regulator is used to power the 3.3V SD Card.
DIO Bus - All three MCUs have limited access to the digital IO bus. All MCUs have access to DIO lines 0 through 7, although the V and P Chips have DIO0 (RX) and DIO1 (TX) reversed so they can talk to the M Chip. The DIO bus is broken out in the Arduino-compatible expansion headers. Several peripherals, including the IR devices, the audio output, and input switches, are located on the DIO bus, and can thus be used from any of the MCUs.
SPI Bus - The P and M chips are connected to the SPI bus. The SPI bus is also broken out in the Arduino-compatible expansion headers. However the P chip has MISO and MOSI reversed so that it can talk directly to the M chip. The P chip also has access to the M chip's RESET line, so it can be used for serial programming.
3.3V SD Card Signals - The SD Card requires 3.3V signals. We use a simple set of voltage dividers to divide down our 5V MCU outputs. The voltage divider values are taken from the Uzebox design schematic. The Arduino Wave Shield uses the same voltage divider method. Our MCU is tolerant to the 3.3V outputs from the SD card.
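The divider arithmetic can be checked with a few lines of Python. This is only a sketch: the resistor values below are hypothetical placeholders, not the values from the Uzebox schematic, and the calculation ignores loading by the SD card input.

```python
def divider_output(v_in: float, r_top: float, r_bottom: float) -> float:
    """Unloaded output voltage of a two-resistor voltage divider."""
    return v_in * r_bottom / (r_top + r_bottom)

# Hypothetical resistor values, NOT taken from the Tvikia or Uzebox schematics.
R_TOP_OHMS = 1800.0     # between the 5V MCU output and the SD card pin
R_BOTTOM_OHMS = 3300.0  # between the SD card pin and ground

v_sd = divider_output(5.0, R_TOP_OHMS, R_BOTTOM_OHMS)
print(f"SD card input sees about {v_sd:.2f} V")
```

With these placeholder values a 5V output is divided down to roughly 3.2V. The same function can be rerun with the actual schematic values to confirm they land near 3.3V.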
We could easily move the entire design to a 3.3V regulated power bus - even reducing part counts in the USB and SD interfaces. However, doing so would break our Arduino shield compatibility, increase the board current demands, and require minor revision to several interface circuits that we copied from other 5V designs.
Analog Bus - The Analog Bus connects the MCHIP MCU directly to the Arduino-compatible analog expansion header.
ICSP - In-Circuit Serial Programming. A separate ICSP header is provided for each MCU. MOSI, MISO, and SCK are shared across the SPI Bus for the M and P chips. However, each chip has a separate reset, and MOSI/MISO is reversed for the P Chip, so there should be no interference during programming unless the other chip is actively trying to disrupt the SPI bus. Caveat emptor for the programmer.
DIO Bus Allocations - The IR Receiver module, if installed, will take over pin DIO7 and make it unavailable for other uses (such as on an expansion Shield). The four input buttons will not interfere with their bus lines unless pressed - they are meant to be used with the internal pull-up resistors on the MCUs. The IR LED and audio output on the DIO bus are output-only and unintended output is mostly harmless if the lines are used for other purposes, although it should be considered carefully. DIO8 and DIO9 are tied to RESET for the V and P Chips, so they are unavailable for other uses.
RESET - There is no board-wide reset signal for all three MCUs. There are no reset buttons. Full reset can be achieved by removing the power connector. The M Chip MRESET is provided to the standard Arduino expansion power header, and P and V Chip RESETs are available on DIO8 and 9, respectively. Alternatively, reset of all MCUs could be achieved by programming the M Chip to invoke the reset lines of the V and P Chips upon startup if a reset button is provided via expansion shield.
The P Chip has control over the M Chip RESET. The M Chip has control over the P and V chip reset lines via the DIO8 and 9 signals.
(optional) indicates parts not needed for the Wikipedia Mission
Video DAC - This is based on the Tellymate Shield design, which provides a partial VT52 emulation. The high-frequency line is controlled by the SPI MOSI line, while blanking interval control is via a GPIO pin. Pixels are shifted out through the AVR SPI peripheral. We do not provide the 75 ohm termination option that the Tellymate offers on some versions.
Buttons - Four tactile switches are used for basic input. They are arranged in a single-thumb up/down/left/right configuration. They should be adequate for Wikipedia navigation, and a text entry method for search can easily be devised. The buttons pull down DIO lines 2, 3, 4, and 5 respectively when pressed. A matrix configuration was rejected so that arbitrary button chording could be achieved.
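Because each button has its own line, any combination of buttons can be read in one sample. The sketch below models that decoding in Python. It assumes the eight DIO lines have already been read into one byte with bit n holding the state of DIO n; that packing and the function name are illustrative only, not part of the Tvikia firmware.

```python
# DIO line used by each button, in the up/down/left/right order given above.
BUTTON_LINES = {"up": 2, "down": 3, "left": 4, "right": 5}

def pressed_buttons(dio_bits: int) -> set:
    """Return the names of all buttons whose line reads low.

    With the MCU pull-ups enabled, an idle line reads 1 and a pressed
    button pulls its line to 0, so the logic is active-low.
    """
    return {name for name, line in BUTTON_LINES.items()
            if not (dio_bits >> line) & 1}

idle = 0b11111111
chord = idle & ~(1 << 2) & ~(1 << 5)   # up and right held together
print(sorted(pressed_buttons(chord)))
```

A matrix layout would need a scan over rows and columns and could report phantom presses for some chords. With one line per button the chord is simply the set of low bits.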
MicroSD Connector - Our MicroSD interface is based on the Uzebox.
USB Power is fed through a 500mA fuse and filtered but not regulated.
USB Data (optional) - USB signal lines are supposed to be 3.3V, but we have a 5V design. The D+ and D- lines are thus clamped to 3.6V by two diodes after a pair of buffering resistors. The D- line is also pulled up by the 2.2k resistor. Both D+ and D- lines are fed into the P Chip. This voltage conversion design came from the internet (url???).
Audio Out (optional) - Simple PWM audio output is possible using the DIO5 line and the second RCA phono jack. A very simple capacitor filter is used. This design came from the internet (url???).
Infrared Communications (optional) - A 38 kHz IR detector outputs to DIO7. An IR LED is driven by a diode controlled by DIO6. This configuration should support communications at least up to 800 baud, and perhaps 2000 baud. The IR LED will have to be modulated at 38 kHz by one of the AVRs using a timer-driven interrupt.
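The timing budget behind these figures can be worked out as follows. This is a rough sketch: the 16 MHz clock is an assumption (no clock frequency is stated here), and the compare value assumes a timer that toggles the LED line on every compare match.

```python
CARRIER_HZ = 38_000
F_CPU_HZ = 16_000_000  # ASSUMED MCU clock; not stated in this document

def carrier_cycles_per_bit(baud: int) -> float:
    """How many 38 kHz carrier cycles fit into one bit period."""
    return CARRIER_HZ / baud

def toggle_compare_value(f_cpu_hz: int, prescaler: int = 1) -> int:
    """Compare value for a timer that toggles the pin on every match.

    Two toggles make one carrier cycle, so the timer must fire at
    2 * CARRIER_HZ. The count is reduced by one because the timer
    counts from zero.
    """
    return round(f_cpu_hz / (2 * CARRIER_HZ * prescaler)) - 1

for baud in (800, 2000):
    print(baud, "baud:", carrier_cycles_per_bit(baud), "carrier cycles per bit")
print("compare value:", toggle_compare_value(F_CPU_HZ))
```

At 800 baud each bit spans 47.5 carrier cycles, while at 2000 baud only 19 remain, which leaves the detector less carrier to lock onto per bit.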
Keyboard IF (optional) - The PS/2 keyboard data and clock (KBDATA, KBCLK) are connected to GPIO pins on the V Chip. The intention is that the lines could be sampled during horizontal blanking intervals and input sent via the serial lines to the M Chip.
We provide a set of physically and electrically Arduino-compatible expansion headers. Two of the headers break out the DIO and SPI bus, one is devoted to M Chip Analog I/O, and one is a power header.
Tvikia is not capable of taking unregulated power from the VIN pin of the Arduino power header. The VIN pin is unconnected. This is also true of the LilyPad.
Tvikia does not provide 3.3V on the power header. Only Arduino Diecimila provides 3.3V on this header.
The board is designed to provide flexibility in programming the three MCUs.
SD Card Bootloader - For general purpose computing, the M Chip will require a bootloader to boot various programs off of the SD Card. The Uzebox project has recently created a similar bootloader which could be used as a starting point.
In addition to loading image files for the M Chip, the M Chip should also have a means to reprogram the V and P chips from the SD Card. (see serial method below).
ICSP - Each MCU has its own ICSP header through which it can be programmed directly on the most basic level.
M Chip by USB Serial via Serial Boot Loader - The P Chip can be programmed with software from the V-USB project to act as a serial USB device which transceives on the Rx and Tx lines (which are reversed between the M Chip and the P Chip). The M Chip can then be programmed via the P Chip if the M Chip is programmed with the stock serial bootloader.
M Chip by USB Serial via ICSP - The P Chip can take control of the MISO, MOSI, SCK, and MRESET lines of the M Chip. It can thus program the M Chip via ICSP using AVRICE-compatible software written by the V-USB project.
P and V Chips by M Chip via Serial Bootloader - If the P and V chips are loaded with the default serial bootloader, then they can be put into reset and programmed by the M Chip over the Rx/Tx lines.
|S4 CONFLICT Audio Out
|Green LED/PCHIP RESET (CONFLICT)
|SD Card Chip Select/VCHIP RESET (CONFLICT)
|SD Card Chip Select (CONFLICT WITH SHIELDS USE PB1)
|SD Card (shared)
|SD Card (shared)
|SD Card (shared)
|SD Card (shared)
|SD Card (shared), Green LED
|SD Card (shared)
|Keyboard Clock (KBCLK)
|Keyboard Clock (KBCLK)
|Keyboard Data (KBDATA)
|Keyboard Data (KBDATA)
The Wikipedia data is stored on-disk in a B-Tree database. A custom B-Tree database system with an extremely small memory footprint called "bgtree" was created to run on the microcontroller. It resides in software/bgtree
A bgtree database consists of table files (extension .bgt) which contain mappings of 32-bit keys to 32-bit numbers. The domain of the mapping is often a CRC32 checksum of a string, while the target of the mapping is often a 32-bit offset into a blob file.
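The key-to-offset mapping can be pictured with a small Python model. This is a sketch only: it uses zlib's standard CRC-32 as a stand-in, which may or may not match the CRC32 variant bgtree uses, and a plain dict stands in for the on-disk .bgt table. The title and offset are made up.

```python
import zlib

def title_key(title: str) -> int:
    """32-bit key for a string, using zlib's CRC-32 as a stand-in hash."""
    return zlib.crc32(title.encode("utf-8")) & 0xFFFFFFFF

# Toy in-memory stand-in for a .bgt table: 32-bit key -> 32-bit blob offset.
table = {title_key("EXAMPLE TITLE"): 0}

key = title_key("EXAMPLE TITLE")
print(f"{key:08x} -> blob offset {table[key]}")
```

Hashing the title to a fixed-width key means the table never stores the variable-length string itself, which keeps every entry the same size.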
A blob file (extension .bgb) consists of a series of entries containing a combination of binary signed or unsigned 32-bit numbers and variable length strings. Variable length strings are stored as a 32-bit length followed by the specified number of bytes.
The format of the data contained in a blob file is described with a schema string. For example, the schema string "xiIs" indicates an unsigned hexadecimal 32-bit binary integer (x) followed by a signed 32-bit integer (i), an unsigned 32-bit integer (I), and finally a variable length string (s).
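A schema-driven decoder for one blob entry might look like the following Python sketch. It assumes little-endian byte order and no padding between fields; neither is stated here, so treat both as guesses. It also treats "x" as an ordinary unsigned integer that is merely displayed in hexadecimal. The function name is illustrative and is not part of bgtree.

```python
import struct

def decode_record(schema: str, data: bytes, offset: int = 0):
    """Decode one blob entry according to a schema string.

    Returns (fields, next_offset). Byte order is ASSUMED little-endian.
    """
    fields = []
    for code in schema:
        if code in "xI":            # unsigned 32-bit (x is shown as hex)
            (value,) = struct.unpack_from("<I", data, offset)
            offset += 4
        elif code == "i":           # signed 32-bit
            (value,) = struct.unpack_from("<i", data, offset)
            offset += 4
        elif code == "s":           # 32-bit length, then that many bytes
            (length,) = struct.unpack_from("<I", data, offset)
            offset += 4
            value = data[offset:offset + length]
            offset += length
        else:
            raise ValueError(f"unknown schema code {code!r}")
        fields.append(value)
    return fields, offset

# Build a sample "xiIs" entry by hand and decode it again.
record = struct.pack("<IiI", 0x1F, -2, 7) + struct.pack("<I", 5) + b"hello"
fields, end = decode_record("xiIs", record)
print(fields, end)
```

Because the decoder returns the offset just past the entry, consecutive entries in a blob can be walked by feeding that offset back in.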
The Wikipedia database is stored in several files:
The schema used in the sections blob file is "xiss":
Articles are stored as a set of Sections in a non-intuitive but efficient way. The CRC32 checksum of an article title string forms a B-tree key. This key maps to an offset for the top level article section in the sec.bgb blob file. However sub-sections of the article are enumerated in sequence using the article key as a start. For example, the article key+1, if it exists, points to the first sub-section of the article. Article key+2 points to the second sub-section, which may (depending on the level field) be a sub-section of the top level or a sub-section of the second section.
Thus once the top CRC32 checksum of an article title generates the first key, you must iterate through the subsequent keys to get all the sections of an article. An article is complete when the subsequent key does not exist in the mapping.
Because of the sorted structure of the B-tree nodes, this traversal through subsequent keys is efficient - if a little odd.
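The traversal can be sketched in Python with a dict standing in for the B-tree table. The first two entries use the key and offset values that appear in the Anarchism example in this document. The third entry is a made-up unrelated article.

```python
def article_sections(table: dict, article_key: int) -> list:
    """Collect blob offsets for an article and all of its sub-sections.

    Starts at the article key and keeps incrementing until a key is
    missing from the table, which marks the end of the article.
    """
    offsets = []
    key = article_key
    while key in table:
        offsets.append(table[key])
        key += 1
    return offsets

table = {
    0x178ED3BA: 0x56,     # top-level section
    0x178ED3BB: 0xB67,    # first sub-section
    0x20000000: 0x1000,   # hypothetical unrelated article
}
print([hex(o) for o in article_sections(table, 0x178ED3BA)])
```

In a real B-tree the consecutive keys sit next to each other in the same or neighbouring leaf nodes, so each step is cheaper than a fresh lookup from the root. The dict here does not model that.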
Data can be extracted from the Wikipedia bgtree database file using the bgTreeCmd command line program built in software/build/unix/bgtree/
Because articles are stored in Sections, we will start by getting the CRC32 checksum for a title string. Let's assume we are looking for the "Anarchism" article. Our database contains titles in all capital letters, so we will search for "ANARCHISM".
bgTreeCmd search sec.bgt SANARCHISM
CRC32=178ed3ba for ANARCHISM len=9
The prefix "S" to ANARCHISM tells bgTreeCmd that this is a string which will be hashed into a CRC32 key, and not a hex key itself. The output from the command tells us that the CRC32 for ANARCHISM is 178ed3ba. It also tells us that the blob file offset is 0x56 bytes (it is one of the first entries in the blob file).
We now extract the top level section, using the CRC32 key:
bgTreeCmd search sec.bgt 178ed3ba sec.bgb xiss
178ed3ba 00000056 0 0 Anarchism
'''Anarchism''' is a political philosophy [...snip....]
To obtain each sub-section, we increment the key:
bgTreeCmd search sec.bgt 178ed3bb sec.bgb xiss
178ed3bb 00000b67 1 395236282 Origins Some claim anarchist themes can be found in the works of Taoism [...snip...]
Increment the key and repeat until a key is not found.
Raw XML Wikipedia dumps are processed using the parseWikiDump.py script in software/preprocess/. This script does multiple stages of data manipulation and article pruning.
Articles are pruned from the collection until the resulting database meets a specified size. Pruning is done by counting the number of references to each article from other articles, and dropping the lowest scoring articles.
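The pruning rule can be illustrated with a short Python sketch. It is not the logic of parseWikiDump.py: the function name, the data shapes, the alphabetical tie-breaking, and the choice to count references once up front (rather than recounting after each drop) are all assumptions made for the example.

```python
def prune_articles(sizes: dict, links: dict, max_total_size: int) -> set:
    """Drop the least-referenced articles until the total size fits.

    sizes: title -> size in bytes
    links: title -> iterable of titles that article references
    """
    # Count inbound references, ignoring self-links and unknown targets.
    inbound = {title: 0 for title in sizes}
    for src, targets in links.items():
        for dst in set(targets):
            if dst in inbound and dst != src:
                inbound[dst] += 1

    kept = set(sizes)
    total = sum(sizes.values())
    # Lowest score first; ties broken alphabetically for repeatability.
    for title in sorted(sizes, key=lambda t: (inbound[t], t)):
        if total <= max_total_size:
            break
        kept.discard(title)
        total -= sizes[title]
    return kept

sizes = {"A": 100, "B": 200, "C": 300, "D": 50}
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C", "A"]}
print(sorted(prune_articles(sizes, links, 500)))
```

In this toy data D has no inbound references and B has one, so those two are dropped first and the remaining 400 bytes fit under the 500 byte limit.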