
Implements BIOS emulation support for BHyVe: A BSD Hypervisor

Abstract

Current BHyVe only supports FreeBSD/amd64 as a guest OS. One of the reasons BHyVe cannot support other OSes is its lack of BIOS support. My project implements a BIOS emulator on BHyVe to remove this limitation.

1. Background

1.1 History of virtualization on x86 architecture

There is a famous set of requirements called the "Popek & Goldberg virtualization requirements" [1], which defines conditions sufficient for an architecture to support virtualization efficiently. Efficient virtualization means virtualizing a machine without full CPU emulation, running guest code natively.

Put simply, for an architecture to be virtualizable, all sensitive instructions must be privileged instructions. A sensitive instruction is one that can interfere with the global state of the system. In other words, every sensitive instruction executed in user mode must trap to a program running in privileged mode. Without this condition, a guest OS can affect the host OS system state and crash the system.

The x86 architecture did not meet this requirement, because it had non-privileged sensitive instructions. To virtualize it efficiently, hypervisors had to avoid executing these instructions and replace them with suitable operations. There were several approaches to implementing this.

In the VMware approach, the hypervisor replaces problematic sensitive instructions on the fly while the guest machine is running. This approach is called binary translation [2]. It could run most unmodified OSes, but it had some performance overhead.

In the Xen approach, the hypervisor requires a pre-modified guest OS in which the problematic sensitive instructions have been replaced by dedicated operations called hypercalls. This approach is called para-virtualization [3]. It has less performance overhead than binary translation under some conditions, but it requires a pre-modified guest OS.

Due to the increasing popularity of virtualization on x86 machines, Intel decided to enhance the x86 architecture to make it virtualizable. The feature is called Intel VT-x, or hardware-assisted virtualization, which is the vendor-neutral term. AMD also developed a hardware-assisted virtualization feature for its own CPUs, called AMD-V.

1.2 Detail of Intel VT-x

VT-x provides a new protection model for virtualization that is separate from ring protection. It adds two CPU modes, a hypervisor mode and a guest machine mode. The hypervisor mode is called VMX root mode, and the guest machine mode is called VMX non-root mode (Figure 1).

[1] Gerald J. Popek and Robert P. Goldberg. 1974. Formal requirements for virtualizable third generation architectures. Commun. ACM 17, 7 (July 1974), 412-421. DOI 10.1145/361011.361073
[2] Brian Walters. 1999. VMware Virtual Platform. Linux J. 1999, 63es, Article 6 (July 1999).
[3] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, NY, USA, 164-177. DOI 10.1145/945445.945462 http://doi.acm.org/10.1145/945445.945462

Figure 1. VMX root mode and VMX non-root mode (each mode has its own User (Ring 3) and Kernel (Ring 0) levels)

On VT-x, the hypervisor can run a guest OS in VMX non-root mode without any modification, including its sensitive instructions, and without affecting the host OS system state. When a sensitive instruction is executed in VMX non-root mode, the CPU stops executing in VMX non-root mode and exits to VMX root mode. The exit is trapped by the hypervisor, which emulates the instruction the guest tried to execute.

The mode change from VMX root mode to VMX non-root mode is called VMEntry, and the change from VMX non-root mode to VMX root mode is called VMExit (Figure 2).

Figure 2. VMEntry and VMExit (VMEntry: from VMX root mode to VMX non-root mode; VMExit: from VMX non-root mode back to VMX root mode)

Events other than sensitive instructions that the hypervisor needs to intercept also cause a VMExit. For example, an IN/OUT instruction causes a VMExit, and the hypervisor emulates the virtual device access.

VT-x defines a number of events that can cause a VMExit, and the hypervisor needs to enable or disable each VMExit event. The cause of a VMExit is called the VMExit reason, and it is classified by kind of event. Here is the list of VMExit reasons:

- Exception or NMI
- External interrupt
- Triple fault
- INIT signal received
- SIPI received
- SMI received
- Internal interrupt
- Task switch
- CPUID instruction
- Intel SMX instructions
- Cache operation instructions (INVD, WBINVD)
- TLB operation instructions (INVLPG, INVPCID)
- IO operation instructions (INB, OUTB, etc.)
- Performance monitoring counter operation instruction (RDTSC)
- SMM related instruction (RSM)
- VT-x instructions (can be used to implement nested virtualization)
- Accesses to control registers
- Accesses to debug registers
- Accesses to MSR
- MONITOR/MWAIT instructions
- PAUSE instruction
- Accesses to Local APIC
- Accesses to GDTR, IDTR, LDTR, TR
- VMX preemption timer
- RDRAND instruction
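To make the idea of classifying VMExit reasons concrete, here is a minimal sketch in C of how a hypervisor might map an exit reason to an action. It is not BHyVe's code: the `exit_action` enum and `classify_exit` function are hypothetical names, only a handful of reasons are covered, and the numeric values are the basic exit reason numbers from the Intel SDM for those few events.

```c
#include <assert.h>
#include <stdint.h>

/* Basic exit reason numbers from the Intel SDM (small subset). */
#define EXIT_REASON_TRIPLE_FAULT    2
#define EXIT_REASON_CPUID          10
#define EXIT_REASON_VMCALL         18
#define EXIT_REASON_IO_INSTRUCTION 30

/* Hypothetical actions; the names are illustrative only. */
enum exit_action {
    ACT_EMULATE_CPUID,
    ACT_EMULATE_IO,
    ACT_HANDLE_HYPERCALL,
    ACT_STOP_GUEST,
    ACT_UNHANDLED
};

/* Map the raw exit-reason field read from the VMCS to an action. */
static enum exit_action classify_exit(uint32_t exit_reason_field)
{
    /* The basic exit reason occupies the low 16 bits of the field. */
    uint32_t basic = exit_reason_field & 0xffffu;

    switch (basic) {
    case EXIT_REASON_CPUID:          return ACT_EMULATE_CPUID;
    case EXIT_REASON_IO_INSTRUCTION: return ACT_EMULATE_IO;
    case EXIT_REASON_VMCALL:         return ACT_HANDLE_HYPERCALL;
    case EXIT_REASON_TRIPLE_FAULT:   return ACT_STOP_GUEST;
    default:                         return ACT_UNHANDLED;
    }
}

/* Small usage example. */
static void self_check(void)
{
    assert(classify_exit(EXIT_REASON_IO_INSTRUCTION) == ACT_EMULATE_IO);
    assert(classify_exit(EXIT_REASON_VMCALL) == ACT_HANDLE_HYPERCALL);
    assert(classify_exit(999) == ACT_UNHANDLED);
}
```

A real hypervisor would read the field with VMREAD and then run the matching emulation code; the sketch only shows the dispatch step.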

Figure 3. Structure of VMCS

- VMCS revision identifier: VMCS data format revision number.
- VMX-abort indicator: error code of a VMExit failure.
- VMCS data:
  - Guest-state area: an area for saving and restoring guest registers. Register saving and restoring is performed automatically at VMExit/VMEntry. (Not all registers are covered; some must be saved manually by the hypervisor.) The area also saves some non-register state, such as instruction blocking state.
  - Host-state area: an area for saving and restoring hypervisor registers. Its usage is almost identical to the guest-state area.
  - VM-execution control fields: control processor behavior in VMX non-root operation. Events that cause a VMExit are configured here.
  - VM-exit control fields: control processor behavior in VMExit operation.
  - VM-entry control fields: control processor behavior in VMEntry operation. Enabling or disabling 64-bit mode is configured here.
  - VM-exit information fields: the VMExit reason is stored here.

All configuration data related to VT-x is stored in the VMCS (Virtual Machine Control Structure), an in-memory data structure allocated for each guest machine [4]. Figure 3 shows the VMCS structure.

1.3 VT-x enabled hypervisor lifecycle

Hypervisors for VT-x follow the lifecycle below (Figure 4).

1. VT-x enabling
VT-x must be enabled before its features can be used. To enable it, set the VMXE bit in the CR4 register and invoke the VMXON instruction.

2. VMCS initialization
The VMCS is a 4 KB page aligned on a 4 KB boundary. You notify the CPU of the page address by invoking the VMPTRLD instruction, then write the initial configuration values with the VMWRITE instruction. The initial register values must also be written here; in BHyVe this is done by /usr/sbin/bhyveload.

3. VMEntry to VMX non-root mode
Enter VMX non-root mode by invoking the VMLAUNCH or VMRESUME instruction. The first launch must use VMLAUNCH; after that you must use VMRESUME. Before the entry operation, you need to save the host OS registers and restore the guest OS registers. VT-x offers only minimal automatic save/restore features, so the rest of the registers have to be handled manually.

4. Run guest machine
The CPU runs in VMX non-root mode, and the guest machine runs natively.

[4] If the guest system has two or more virtual CPUs, a VMCS is needed for each vCPU.

Figure 4. VT-x enabled hypervisor lifecycle (1. VT-x enabling; 2. VMCS initialization; 3. VMEntry to VMX non-root mode; 4. Run guest machine; 5. VMExit for some exit reason; 6. Do emulation for the exit reason; 7. Run another process)

5. VMExit for some reason
When an event that causes a VMExit occurs, the CPU returns to VMX root mode. You first need to save and restore registers, then check the VMExit reason.

6. Do emulation for the exit reason
If the VMExit reason is an event that requires emulation by the hypervisor, perform the emulation (for example, the guest OS wrote data to the HDD).

7. Run another process
Depending on host OS scheduling, the hypervisor may resume the VM by starting again from step 3, or task-switch to another process.

1.4 Memory Virtualization

Modern multi-tasking OSes use paging to provide a separate memory space for each process. When guest OS code runs natively, address translation through paging becomes a problem.

For example (Figure 5), suppose you allocate physical pages 1-4 to Guest A and pages 5-8 to Guest B. In both guests, a process maps its page 1 to page 1 of guest physical memory. The translations should then be:

- Page 1 of Process A on Guest A -> Page 1 of guest physical memory -> Page 1 of host physical memory
- Page 1 of Process B on Guest B -> Page 1 of guest physical memory -> Page 5 of host physical memory

But if you run the guest OS natively, the CPU will translate page 1 of Process B on Guest B to page 1 of host physical memory, because the CPU does not know that the guest's paging is nested.

There is a software technique to solve this problem, called shadow paging (Figure 6). The hypervisor creates a clone of the guest page table, fills it with host physical addresses, traps guest writes to the CR3 register, and loads the cloned page table into CR3 instead. The CPU then sees the correct mapping of guest memory.

This technique was used both by binary translation based VMware and by early implementations of hypervisors for VT-x. But it has a large overhead, so Intel decided to add nested paging support to VT-x starting with the Nehalem micro-architecture.

EPT is the name of this nested paging feature (Figure 7). It simply adds a translation table from guest physical addresses to host physical addresses. The hypervisor no longer needs to manage guest paging, so it becomes much simpler and faster.

Figure 5. Problem of memory virtualization (Guest A and Guest B each map process pages through their page tables to guest physical pages 1-4, which must land on host physical pages 1-4 and 5-8 respectively)

Figure 6. Shadow paging (shadow page tables A' and B' map guest process pages directly to host physical memory)

Figure 7. EPT (guest page tables map to guest physical memory; EPT A maps guest physical pages to host physical pages)
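The two-stage translation in the Figure 5 example can be sketched as a toy model in C. This is only an illustration under simplifying assumptions: page numbers stand in for addresses, each guest has a single page table instead of one per process, and the names `struct guest`, `make_guest` and `translate` are made up for this sketch.

```c
#include <assert.h>

#define GUEST_PAGES 4      /* each guest owns 4 pages, as in Figure 5 */
#define NOT_MAPPED  (-1)

/* Toy model; page numbers are 1-based to match the figures. */
struct guest {
    int page_table[GUEST_PAGES + 1]; /* guest virtual page  -> guest physical page */
    int ept[GUEST_PAGES + 1];        /* guest physical page -> host physical page  */
};

/* Build a guest whose physical pages 1..4 live at host pages
 * host_base..host_base+3. No virtual pages are mapped yet. */
static struct guest make_guest(int host_base)
{
    struct guest g;
    for (int p = 0; p <= GUEST_PAGES; p++) {
        g.page_table[p] = NOT_MAPPED;
        g.ept[p] = (p == 0) ? NOT_MAPPED : host_base + p - 1;
    }
    return g;
}

/* Two-stage walk: guest page table first, then the EPT-like table. */
static int translate(const struct guest *g, int virt_page)
{
    if (virt_page < 1 || virt_page > GUEST_PAGES)
        return NOT_MAPPED;
    int guest_phys = g->page_table[virt_page];
    if (guest_phys < 1 || guest_phys > GUEST_PAGES)
        return NOT_MAPPED;
    return g->ept[guest_phys];
}

/* Reproduces the example: the same guest mapping lands on different host pages. */
static void self_check(void)
{
    struct guest a = make_guest(1); /* Guest A: host pages 1-4 */
    struct guest b = make_guest(5); /* Guest B: host pages 5-8 */
    a.page_table[1] = 1;
    b.page_table[1] = 1;
    assert(translate(&a, 1) == 1);
    assert(translate(&b, 1) == 5);
    assert(translate(&b, 2) == NOT_MAPPED);
}
```

With shadow paging the hypervisor would precompute the composition of the two tables in software; with EPT the hardware performs the second lookup itself.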

Not all CPUs that support VT-x also support EPT; on those CPUs, hypervisors still need to do shadow paging.

2. BHyVe: BSD Hypervisor

2.1 What is BHyVe?

BHyVe is a new project to implement a hypervisor that will be integrated into FreeBSD. The concept is similar to Linux KVM: it provides a "hypervisor driver" to an unmodified BSD kernel running on a bare-metal machine. With the driver, the kernel becomes a hypervisor, able to run a guest OS just like a normal process on the kernel.

Both hypervisors are designed for hardware-assisted virtualization, unlike Xen's para-virtualization and VMware's binary translation. The kernel module only provides the ability to switch the CPU between host mode and guest mode; almost all device emulation is performed in a userland process.

2.2 Difference of approach between Linux KVM and BHyVe

Linux KVM uses a modified QEMU [5] as its userland part [6]. This is a good way to cover a large range of guest OSes, because QEMU is a highly developed emulator and many people have already confirmed that a variety of OSes run on it. KVM could support almost the same features as QEMU, and it just worked.

BHyVe's approach is different. BHyVe implements, from scratch, the minimum set of device support required to run a FreeBSD guest. As a result, we have a completely GPL-free, BSD-licensed, well-coded hypervisor, but at this point it only supports FreeBSD/amd64 as a guest OS.

One of the reasons BHyVe cannot support other OSes is its lack of BIOS support. BHyVe loads and executes the FreeBSD kernel directly, using a custom OS loader that runs on the host OS, instead of booting from a disk image. With this method, we need to implement an OS loader for each OS, and currently we do not have a loader for anything other than FreeBSD. It also cannot support OSes that call BIOS functions while running.

So I started this project to implement a BIOS emulator on BHyVe and remove these limitations.

2.3 Hardware requirements

BHyVe requires an Intel CPU that supports Intel VT-x and EPT. This means you need a Nehalem core or later Intel CPU, because EPT is only supported on these processors. Currently, AMD-V is not supported. Installing on a physical machine is the best choice, but BHyVe also works on recent versions of VMware, using the nested virtualization feature [7].

[5] The original QEMU has full emulation of the x86 CPU, but on KVM we want to use VT-x hardware-assisted virtualization instead of CPU emulation, so the CPU emulation code is replaced with calls to the KVM driver.
[6] Strictly speaking, KVM has another userland implementation called Linux Native KVM Tools, which is built from scratch, the same as BHyVe's userland part. It has limitations similar to BHyVe's.
[7] The technology that enables running a hypervisor on a hypervisor. Note that it still requires a Nehalem core or later Intel CPU even on VMware.

2.3 Supported features

BHyVe only supports FreeBSD/amd64 8-10 as a guest OS.

It emulates the following devices:

- HDD controller: virtio-blk
- NIC controller: virtio-net
- Serial console: 16550 compatible PCI UART
- PCI/PCIe devices passthrough (VT-d)

Booting from virtio-blk with a PCI UART console is not a common hardware configuration on the PC architecture, so we need to change the guest kernel settings in /boot/loader.conf (on the guest disk image). Some older FreeBSD versions also need the virtio drivers added [8]. PCI device passthrough is also supported, so physical PCI/PCIe devices can be used directly. Recently ACPI support and IO-APIC support were added, which improves compatibility with existing OSes.

2.4 BHyVe internal

BHyVe is built from two parts: a kernel module and a userland process. The kernel module is called vmm.ko; it performs actions that require privileged mode (for example, executing VT-x instructions). The userland process is named /usr/sbin/bhyve; it provides the user interface and emulates virtual hardware.

BHyVe also has an OS loader called /usr/sbin/bhyveload, which loads and initializes the guest kernel without a BIOS. The /usr/sbin/bhyveload source code is based on the FreeBSD bootloader, so it displays the bootloader screen, but the VM instance is not yet executing at that stage. It runs on the host OS, creates a VM instance, loads the kernel into the guest memory area, and initializes the guest machine registers to prepare for a direct kernel boot.

To destroy a VM instance, the VM control utility /usr/sbin/bhyvectl is available. These userland programs access vmm.ko via a VMM control library called libvmmapi. Figure 8 illustrates the overall view of BHyVe.

Figure 8. BHyVe overall view (1. bhyveload creates the VM instance and loads the guest kernel; 2. bhyve runs the VM instance, backing the guest's HD, NIC and console with a disk image, a tap device and stdin/stdout; 3. bhyvectl destroys the VM instance; all three use libvmmapi, which reaches /dev/vmm/{vm name} (vmm.ko) in the FreeBSD kernel through mmap/ioctl)

[8] virtio is a para-virtual driver designed for Linux KVM. A para-virtual driver needs a special driver in the guest, but is usually much faster than a full emulation driver.

3. Implement BIOS Emulation

3.1 BIOS on real hardware

BIOS interrupt calls are implemented as software interrupt handlers in real mode (Figure 9). At machine startup the CPU executes initialization code in the BIOS ROM, which initializes the real mode interrupt vector to handle a number of software interrupts reserved for BIOS interrupt calls (Figure 10).

BIOS interrupt calls are not only for legacy OSes like MS-DOS; almost all boot loaders for modern OSes use BIOS interrupt calls to access disks, the display and the keyboard.

Figure 9. BIOS interrupt call mechanism on real hardware (int 13h -> software interrupt (INTx) -> CPU reads interrupt vector -> execute BIOS call handler -> IO -> hardware)

Figure 10. Memory map on real hardware (1. fetch the interrupt handler address from the interrupt vector at 0000:0000; 2. jump to the handler address in the ROM BIOS at F000:0000; 3. the handler accesses hardware by IO instruction; video RAM etc. starts at A000:0000)

3.2 BIOS on Linux KVM

On Linux KVM, QEMU loads a real BIOS (called SeaBIOS) into the guest memory area at the beginning of QEMU startup. The BIOS call handler in the KVM version of SeaBIOS accesses hardware by IO instructions or memory mapped IO, and its behavior is basically the same as a BIOS for real hardware. The difference is how the hardware access is handled.

On KVM, the hardware access is trapped by the KVM hypervisor driver, QEMU emulates the hardware device, and then the KVM hypervisor driver resumes the guest environment (Figure 11). In this implementation, KVM and QEMU do not trap BIOS interrupt calls; they just load a real BIOS into the guest memory space (Figure 12) and emulate the hardware devices.

Figure 11. BIOS interrupt call mechanism on KVM (guest: int 13h -> software interrupt (INTx) -> CPU reads interrupt vector -> execute BIOS call handler -> SeaBIOS performs IO to virtual HW; hypervisor: IO trap -> QEMU HW emulation)

Figure 12. Memory map on KVM (1. fetch the interrupt handler address from the interrupt vector at 0000:0000; 2. jump to the handler address in SeaBIOS; 3. the handler accesses HW by IO instruction and QEMU emulates the IO)

3.3 Emulating BIOS on BHyVe

3.3.1 doscmd

Porting SeaBIOS to BHyVe and implementing hardware emulation was an option, and probably the best way to improve compatibility with legacy code, but SeaBIOS is GPL'd software, so it is not comfortable to bring into the FreeBSD code tree. And there is no non-GPL open-source BIOS implementation.

Instead, there is a BSD-licensed DOS emulator called doscmd. It is software for running old DOS applications on FreeBSD using virtual 8086 mode, similar to DOSBox (but DOSBox is GPL'd software).

The emulator mechanism is as follows:

1. Map pages to the lowmem area (beginning at 0x0) and load the DOS application into that area.
2. Enter virtual 8086 mode and start executing the DOS application.
3. The DOS application invokes a BIOS interrupt call or DOS API call with an INTx instruction.
4. The DOS emulator traps the software interrupt and emulates the BIOS interrupt call or DOS API call.
5. Resume the DOS application.

It traps BIOS interrupt calls and DOS API calls and emulates them in a FreeBSD protected mode program. I decided to port the BIOS interrupt call emulation code to BHyVe and trap BIOS interrupt calls on BHyVe, instead of porting a real BIOS.

3.3.2 Run real mode program on VT-x

Older VT-x enabled CPUs do not allow a VMEntry into a guest that has not enabled paging. This means a real mode program cannot run on VT-x, and hypervisors needed to virtualize real mode without VT-x. Linux KVM used full CPU emulation with QEMU to virtualize real mode. Some other hypervisors used virtual 8086 mode.

This issue was resolved by extending the VT-x features. Intel added unrestricted guest mode on the Westmere micro-architecture and later Intel CPUs; it uses EPT to translate guest physical address accesses to host physical addresses. With this mode, VMEntry without paging enabled is allowed. I decided to use this mode for BHyVe BIOS emulation.

3.3.3 Trapping BIOS interrupt call

VT-x can trap various events in guest mode; this is done by changing the VT-x configuration structure called the VMCS. The BHyVe kernel module can report these events to userland through the return of an ioctl. So all I need to do to trap a BIOS call is change the configuration in the VMCS and report the event through the ioctl return when it is trapped.

The problem is which VMExit event is best for the purpose. Trapping software interrupts looks like the easiest way, but it may cause problems after the guest OS switches to protected mode. Real mode and protected mode have different interrupt vectors, and a BIOS interrupt call vector number can be reused for a different purpose in protected mode. We could perhaps detect mode changes between real mode and protected mode and enable or disable software interrupt trapping accordingly, but that is a bit complicated.

Instead of implementing complicated mode change detection, I decided to implement a software interrupt handler which causes a VMExit.
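The trapping scheme relies on how real mode dispatches an INTx instruction through the interrupt vector (section 3.1). As a minimal sketch in C, the lookup can be modelled as below. The function names are made up for this illustration, `lowmem` stands for a host buffer holding the first 1 KB of guest memory, and the F000:1234 handler address is an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A real-mode interrupt vector entry: 16-bit offset, then 16-bit segment. */
struct ivt_entry {
    uint16_t offset;
    uint16_t segment;
};

/* Read the entry for `vector` from a buffer holding the first 1 KB of
 * memory. Entry n lives at address n * 4, stored little-endian. */
static struct ivt_entry ivt_read(const uint8_t *lowmem, uint8_t vector)
{
    const uint8_t *p = lowmem + (size_t)vector * 4;
    struct ivt_entry e;
    e.offset  = (uint16_t)(p[0] | (p[1] << 8));
    e.segment = (uint16_t)(p[2] | (p[3] << 8));
    return e;
}

/* Real-mode address formation: linear = segment * 16 + offset. */
static uint32_t linear_address(struct ivt_entry e)
{
    return ((uint32_t)e.segment << 4) + e.offset;
}

/* Example: pretend INT 13h points at F000:1234. */
static void self_check(void)
{
    uint8_t lowmem[1024] = {0};
    lowmem[0x13 * 4 + 0] = 0x34; /* offset low byte   */
    lowmem[0x13 * 4 + 1] = 0x12; /* offset high byte  */
    lowmem[0x13 * 4 + 2] = 0x00; /* segment low byte  */
    lowmem[0x13 * 4 + 3] = 0xF0; /* segment high byte */

    struct ivt_entry e = ivt_read(lowmem, 0x13);
    assert(e.segment == 0xF000 && e.offset == 0x1234);
    assert(linear_address(e) == 0xF1234);
}
```

Because the CPU performs this lookup itself in real mode, a hypervisor that controls the table contents controls where every BIOS call lands.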

The handler does not contain any code for handling the BIOS interrupt call; it just performs a VMExit with the VMCALL instruction. VMCALL causes an unconditional VMExit. It exists for calling the hypervisor from a guest OS; such a function is called a hypercall.

The following is the simplest handler implementation:

    VMCALL
    IRET

Even though the code is identical, you need a separate copy of the handler for each vector, because the guest EIP is then used to determine which vector was handled. If you place the BIOS interrupt call handlers starting at 0x400, and each handler is 4 bytes long (VMCALL is 3 bytes, IRET is 1 byte), the hypervisor can determine the vector number with the following code:

    vector = (guest_eip - 0x400) / 0x4;

BHyVe needs to initialize the interrupt vector and set each entry to point to the handler described above. In this way, it no longer has to care about mode changes.

Figure 13 shows the BIOS interrupt call mechanism in my implementation. In this implementation, the hypervisor traps the BIOS interrupt call itself and emulates it.

Figure 13. BIOS interrupt call mechanism on BHyVe (guest: int 13h -> software interrupt (INTx) -> CPU reads interrupt vector -> execute pseudo BIOS call handler -> pseudo BIOS issues VMCALL instruction (hypercall); hypervisor: VMCALL trap -> BHyVe BIOS emulation emulates the BIOS call)

4. Implementation

Most of the work is rewriting doscmd to fit the BHyVe interface instead of the FreeBSD virtual 8086 API.

- Code was 64-bit unsafe
doscmd was designed only for 32-bit x86, and BHyVe is only for amd64, so I needed to rewrite some code to be 64-bit safe. For example:

    u_long -> uint32_t

- Guest memory area started from 0x0
To use virtual 8086 mode, doscmd places the guest memory area at 0x0. But BHyVe's guest memory area is mapped at a non-zero address, so every address has to be moved into BHyVe's guest memory area. For example:

    *(char *)(0x400) = 0;  ->  *(char *)(0x400 + guest_mem) = 0;

- Interface with /usr/sbin/bhyve
I did not want to mix doscmd's complicated source code with the /usr/sbin/bhyve code, so I modified doscmd's Makefile to build it as a library, and named it libbiosemul.

It exposes only a few functions:

    void biosemul_init(struct vmctx *ctx, int vcpu, char *lomem, int trace_mode);
    int biosemul_call(struct vmctx *ctx, int vcpu);

biosemul_init is called at initialization. biosemul_call is the main function, which is called at every BIOS call.

- Guest register storage
doscmd stores guest register values in its own structure, but BHyVe needs to call ioctl to get and set register values. It is hard to rewrite all the code to call ioctl, so I did not change the doscmd code. I just copy all register values into the doscmd struct at the beginning of BIOS call emulation, and copy them back at the end of the emulation.

- Instruction level tracing
I implemented an instruction level tracer to debug the BIOS emulator. It is also implemented using the pseudo BIOS interrupt call handler.

5. Development status

It is still at an early stage of development, and no OS boots with the BIOS emulator yet. I am focusing on booting FreeBSD/amd64; for now, mbr and boot1 are working correctly.
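The pseudo BIOS handler layout from section 3.3.3 (4-byte stubs starting at 0x400, vector recovered from the guest EIP) can be sketched in C as follows. This is an illustration, not the author's code: `install_stubs` and `vector_from_eip` are hypothetical names, installing stubs for all 256 vectors is an assumption, and `guest_mem` is assumed to be a host pointer to guest physical address 0. The byte values are the standard x86 encodings of VMCALL (0F 01 C1) and IRET (CF).

```c
#include <assert.h>
#include <stdint.h>

#define STUB_BASE   0x400u /* start of the handler area, as in the text */
#define STUB_SIZE   4u     /* VMCALL (3 bytes) + IRET (1 byte)          */
#define NUM_VECTORS 256u   /* assumption: one stub for every vector     */

/* Write one stub per vector and point each interrupt vector entry at it.
 * guest_mem must cover at least STUB_BASE + NUM_VECTORS * STUB_SIZE bytes. */
static void install_stubs(uint8_t *guest_mem)
{
    for (uint32_t v = 0; v < NUM_VECTORS; v++) {
        uint32_t stub = STUB_BASE + v * STUB_SIZE;
        uint8_t *code = guest_mem + stub;
        uint8_t *ivt  = guest_mem + v * 4;

        code[0] = 0x0F; code[1] = 0x01; code[2] = 0xC1; /* VMCALL */
        code[3] = 0xCF;                                  /* IRET   */

        ivt[0] = (uint8_t)(stub & 0xFF); /* handler offset, low byte  */
        ivt[1] = (uint8_t)(stub >> 8);   /* handler offset, high byte */
        ivt[2] = 0;                      /* handler segment 0000      */
        ivt[3] = 0;
    }
}

/* Recover the vector number from the guest EIP seen at the VMCALL exit.
 * Returns -1 if the EIP is outside the stub area. */
static int vector_from_eip(uint32_t guest_eip)
{
    if (guest_eip < STUB_BASE ||
        guest_eip >= STUB_BASE + NUM_VECTORS * STUB_SIZE)
        return -1;
    return (int)((guest_eip - STUB_BASE) / STUB_SIZE);
}

/* Example: the stub for INT 13h sits at 0x400 + 0x13 * 4 = 0x44C. */
static void self_check(void)
{
    static uint8_t guest_mem[STUB_BASE + NUM_VECTORS * STUB_SIZE];
    install_stubs(guest_mem);

    assert(guest_mem[0x13 * 4] == 0x4C && guest_mem[0x13 * 4 + 1] == 0x04);
    assert(guest_mem[0x44C] == 0x0F && guest_mem[0x44F] == 0xCF);
    assert(vector_from_eip(0x44C) == 0x13);
    assert(vector_from_eip(0x3FF) == -1);
}
```

Integer division means the vector is recovered whether the reported EIP still points at the VMCALL or has already been advanced to the IRET within the same stub.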
