summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2013-06-25ruby: profiler: lots of inter-related changesNilay Vaish
The patch started of with removing the global variables from the profiler for profiling the miss latency of requests made to the cache. The corrresponding histograms have been moved to the Sequencer. These are combined together when the histograms are printed. Separate histograms are now maintained for tracking latency of all requests together, of hits only and of misses only. A particular set of histograms used to use the type GenericMachineType defined in one of the protocol files. This patch removes this type. Now, everything that relied on this type would use MachineType instead. To do this, SLICC has been changed so that multiple machine types can be declared by a controller in its preamble.
2013-06-24ruby: remove the three files related to profilingNilay Vaish
This patch removes the following three files: RubySlicc_Profiler.sm, RubySlicc_Profiler_interface.cc and RubySlicc_Profiler_interface.hh. Only one function prototyped in the file RubySlicc_Profiler.sm. Rest of the code appearing in any of these files is not in use. Therefore, these files are being removed. That one single function, profileMsgDelay(), is being moved to the protocol files where it is in use. If we need any of these deleted functions, I think the right way to make them visible is to have the AbstractController class in a .sm and let the controller state machine inherit from this class. The AbstractController class can then have the prototypes of these profiling functions in its definition.
2013-06-24ruby: MessageBuffer: Remove unused m_size variableJoel Hestness ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E)
The m_size variable attempted to track m_prio_heap.size(), but it did so incorrectly due to the functions reanalyzeMessages and reanalyzeAllMessages(). Since this variable is intended to track m_prio_heap.size(), we can simply replace instances where m_size is referenced with m_prio_heap.size(), which has the added bonus of removing the need for m_size. Note: This patch also removes an extraneous DPRINTF format string designator from reanalyzeAllMessages() Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-20ruby: fix typo in MOESI_CMP_token protocolLena Olson
2013-06-18ruby: Fix prefetching for MESI_CMP_DirectoryLena Olson
Transitions from present on PF_Ifetch were missing, causing a crash when prefetching is enabled. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-18ruby: fix slicc compiler to complain about duplicate symbolsLena Olson
Previously, .sm files were allowed to use the same name for a type and a variable. This is unnecessarily confusing and has some bad side effects, like not being able to declare later variables in the same scope with the same type. This causes the compiler to complain and die on things like Address Address. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-18ruby: restrict Address to being a type and not a variable nameLena Olson
Change all occurrances of Address as a variable name to instead use Addr. Address is an allowed name in slicc even when Address is also being used as a type, leading to declarations of "Address Address". While this works, it prevents adding another field of type Address because the compiler then thinks Address is a variable name, not type. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-18x86: Add support for maintaining the x87 tag wordAndreas Sandberg
The current implementation of the x87 never updates the x87 tag word. This is currently not a big issue since the simulated x87 never checks for stack overflows, however this becomes an issue when switching between a virtualized CPU and a simulated CPU. This changeset adds support, which is enabled by default, for updating the tag register to every floating point microop that updates the stack top using the spm mechanism. The new tag words is generated by the helper function X86ISA::genX87Tags(). This function is currently limited to flagging a stack position as valid or invalid and does not try to distinguish between the valid, zero, and special states.
2013-06-18x86: Fix loading of floating point constantsAndreas Sandberg
This changeset actually fixes two issues: * The lfpimm instruction didn't work correctly when applied to a floating point constant (it did work for integers containing the bit string representation of a constant) since it used reinterpret_cast to convert a double to a uint64_t. This caused a compilation error, at least, in gcc 4.6.3. * The instructions loading floating point constants in the x87 processor didn't work correctly since they just stored a truncated integer instead of a double in the floating point register. This changeset fixes the old microcode by using lfpimm instruction instead of the limm instructions.
2013-06-18x86: Initialize the MXCSR registerAndreas Sandberg
2013-06-18x86: Make the boot state VMX compliantAndreas Sandberg
This patch allows the default x86 state to be used when by CPUs that use hardware virtualization.
2013-06-18x86: Make fprem like the fprem on a real x87Andreas Sandberg
The current implementation of fprem simply does an fmod and doesn't simulate any of the iterative behavior in a real fprem. This isn't normally a problem, however, it can lead to problems when switching between CPU models. If switching from a real CPU in the middle of an fprem loop to a simulated CPU, the output of the fprem loop becomes correupted. This changeset changes the fprem implementation to work like the one on real hardware.
2013-06-18kvm: Use the address finalization code in the TLBAndreas Sandberg
Reuse the address finalization code in the TLB instead of replicating it when handling MMIO. This patch also adds support for injecting memory mapped IPR requests into the memory system.
2013-06-18x86: Add helper functions to access rflagsAndreas Sandberg
The rflags register is spread across several different registers. Most of the flags are stored in MISCREG_RFLAGS, but some are stored in microcode registers. When accessing RFLAGS, we need to reconstruct it from these registers. This changeset adds two functions, X86ISA::getRFlags() and X86ISA::setRFlags(), that take care of this magic.
2013-06-18x86: Fix the flag handling code in FABS and FCHSAndreas Sandberg
This changeset fixes two problems in the FABS and FCHS implementation. First, the ISA parser expects the assignment in flag_code to be a pure assignment and not an and-assignment, which leads to the isa_parser omitting the misc reg update. Second, the FCHS and FABS macro-ops don't set the SetStatus flag, which means that the default micro-op version, which doesn't update FSW, is executed.
2013-06-11kvm: Add more VM statsAndreas Sandberg
This changeset adds the following stats to KVM: * numVMHalfEntries: Number of entries into KVM to finalize pending IO operations without executing guest instructions. These typically happen as a result of a drain where the guest must finalize some operations before the guest state is consistent. * numExitSignal: Number of VM exits that have been triggered by a signal. These usually happen as a result of the timer that limits the time spent in KVM.
2013-06-11kvm: Separate host frequency from simulated CPU frequencyAndreas Sandberg
We used to use the KVM CPU's clock to specify the host frequency. This was not ideal for several reasons. One of them being that the clock parameter of a CPU determines the frequency of some of the components connected to the CPU. This changeset adds a separate hostFreq parameter that should be used to specify the host frequency until we add code to autodetect it. The hostFactor should still be used to specify the conversion factor between the host performance and that of the simulated system.
2013-06-11kvm: Don't handle IO and execute in the same tickAndreas Sandberg
We currently execute instructions in the guest and then handle any IO request right after we break out of the virtualized environment. This has the effect of executing IO requests in the exact same tick as the first instruction in the sequence that was just run. There seem to be cases where this simplification upsets some timing-sensitive devices. This changeset splits execute and IO (and other services) across multiple ticks. This is implemented by adding a separate RunningService state to the CPU state machine. When a VM requires service, it enters into this state and pending IO is then serviced in the future instead of immediately. The delay between getting the request and servicing it depends on the number of cycles executed in the guest, which allows other components to catch up with the CPU.
2013-06-11kvm: Maintain a local instruction counter and update totalNumInstsAndreas Sandberg
Update the system's totalNumInst counter when exiting from KVM and maintain an internal absolute instruction count instead of relying on the one from perf.
2013-06-11x86: Fix bug when copying TSC on CPU handoverAndreas Sandberg
The TSC value stored in MISCREG_TSC is actually just an offset from the current CPU cycle to the actual TSC value. Writes with side-effects to the TSC subtract the current cycle count before storing the new value, while reads add the current cycle count. When switching CPUs, the current value is copied without side-effects. This works as long as the source and the destination CPUs have the same clock frequencies. The TSC will jump, sometimes backwards, if they have different clock frequencies. Most OSes assume the TSC to be monotonic and break when this happens. This changeset makes sure that the TSC is copied with side-effects to ensure that the offset is updated to match the new CPU.
2013-06-11sim: Revert [34e3295b0e39] (sim: Fix early termination in mult...)Andreas Sandberg
HG changset 34e3295b0e39 introduced a check in the main simulation loop that discards exit events that happen at the same tick as another exit event. This was supposed to fix a problem where a simulation script got confused by multiple exit events. This obviously breaks the simulator since it can hide important simulation events, such as a simulation failure, that happen at the same time as a non-fatal simulation event.
2013-06-11cpu: Add support for scheduling multiple inst/load stop eventsAndreas Sandberg
Currently, the only way to get a CPU to stop after a fixed number of instructions/loads is to set a property on the CPU that causes a SimLoopExitEvent to be scheduled when the CPU is constructed. This is clearly not ideal in cases where the simulation script wants the CPU to stop at multiple instruction counts (e.g., SimPoint generation). This changeset adds the methods scheduleInstStop() and scheduleLoadStop() to the BaseCPU. These methods are exported to Python and are designed to be used from the simulation script. By using these methods instead of the old properties, a simulation script can schedule a stop at any point during simulation or schedule multiple stops. The number of instructions specified when scheduling a stop is relative to the current point of execution.
2013-06-09ruby: remove several unused variables in ProfilerNilay Vaish
This patch removes per processor cycle count, histogram for filter stats, histogram for multicasts, histogram for prefetch wait, some function prototypes that do not have definitions.
2013-06-09ruby: remove periodic event from ProfilerNilay Vaish
The Profiler class does not need an event for dumping statistics periodically. This is because there is a method for dumping statistics for all the sim objects periodically. Since Ruby is a sim object, its statistics are also included.
2013-06-09ruby: stats: use gem5's stats for cache and memory controllersNilay Vaish
This moves event and transition count statistics for cache controllers to gem5's statistics. It does the same for the statistics associated with the memory controller in ruby. All the cache/directory/dma controllers individually collect the event and transition counts. A callback function, collateStats(), has been added that is invoked on the controller version 0 of each controller class. This function adds all the individual controller statistics to a vector variables. All the code for registering the statistical variables and collating them is generated by SLICC. The patch removes the files *_Profiler.{cc,hh} and *_ProfileDumper.{cc,hh} which were earlier used for collecting and dumping statistics respectively.
2013-06-09ruby: remove undefined functions in Address classNilay Vaish
2013-06-09stats: allow printing vectors on a single lineNilay Vaish
This patch adds a new flag to specify if the data values for a given vector should be printed in one line in the stats.txt file. The default behavior will be to print the data in multiple lines. It makes changes to print functions to enforce this behavior.
2013-06-04dev: Clarify why updates are delayed when the MC14818 is activatedAndreas Sandberg
2013-06-03arch: Create a method to finalize physical addressesAndreas Sandberg
in the TLB Some architectures (currently only x86) require some fixing-up of physical addresses after a normal address translation. This is usually to remap devices such as the APIC, but could be used for other memory mapped devices as well. When running the CPU in a using hardware virtualization, we still need to do these address fix-ups before inserting the request into the memory system. This patch moves this patch allows that code to be used by such CPUs without doing full address translations.
2013-06-03base: Make the Python module loader PEP302 compliantAndreas Sandberg
The custom Python loader didn't comply with PEP302 for two reasons: * Previously, we would overwrite old modules on name conflicts. PEP302 explicitly states that: "If there is an existing module object named 'fullname' in sys.modules, the loader must use that existing module". * The "__package__" attribute wasn't set. PEP302: "The __package__ attribute must be set." This changeset addresses both of these issues.
2013-06-03kvm: Allow architectures to override the cycle accounting mechanismAndreas Sandberg
Some architectures have special registers in the guest that can be used to do cycle accounting. This is generally preferrable since the prevents the guest from seeing a non-monotonic clock. This changeset adds a virtual method, getHostCycles(), that the architecture-specific code can override to implement this functionallity. The default implementation uses the hwCycles counter.
2013-06-03kvm: Add handling of EAGAIN when creating timersAndreas Sandberg
timer_create can apparently return -1 and set errno to EAGAIN if the kernel suffered a temporary failure when allocating a timer. This happens from time to time, so we need to handle it.
2013-06-03sim: Add debug output when executing pseudo-instructionsAndreas Sandberg
2013-06-03kvm: Add a call to thread->startup() in startup()Andreas Sandberg
It is now required to initialize the thread context by calling startup() on it. Failing to do so currently causes decoder in x86-based CPUs to get very confused when restoring from checkpoints.
2013-06-03dev: Add support for disabling ticking and the divider in MC146818Andreas Sandberg
Some Linux versions disable updates (regB.set = 1) to prevent the chip from updating its internal state while the OS is updating it. Support for this was already there, this patch merely disables the check in writeReg that prevented it from being enabled. The patch also includes support for disabling the divider, which is used to control when clock updates should start after setting the internal RTC state. These changes are required to boot most vanilla Linux distributions that update the RTC settings at boot.
2013-06-03dev: Clean up MC146818 register (A & B) handlingAndreas Sandberg
Rewrite reg A & B handling to use the bitunion stuff instead of bit masking. Add better error messages when the kernel tries to enable unsupported stuff.
2013-05-30mem: More descriptive DRAM config namesAndreas Hansson
This patch changes the class names of the variuos DRAM configurations to better reflect what memory they are based on. The speed and interface width is now part of the name, and also the alias that is used to select them on the command line. Some minor changes are done to the actual parameters, to better reflect the named configurations. As a result of these changes the regressions change slightly and the stats will be bumped in a separate patch.
2013-05-30mem: Add bytes per activate DRAM controller statAndreas Hansson
This patch adds a histogram to track how many bytes are accessed in an open row before it is closed. This metric is useful in characterising a workload and the efficiency of the DRAM scheduler. For example, a DDR3-1600 device requires 44 cycles (tRC) before it can activate another row in the same bank. For a x32 interface (8 bytes per cycle) that means 8 x 44 = 352 bytes must be transferred to hide the preparation time.
2013-05-30mem: Add static latency to the DRAM controllerAndreas Hansson
This patch adds a frontend and backend static latency to the DRAM controller by delaying the responses. Two parameters expressing the frontend and backend contributions in absolute time are added to the controller, and the appropriate latency is added to the responses when adding them to the (infinite) queued port for sending. For writes and reads that hit in the write buffer, only the frontend latency is added. For reads that are serviced by the DRAM, the static latency is the sum of the pipeline latencies of the entire frontend, backend and PHY. The default values are chosen based on having roughly 10 pipeline stages in total at 500 MHz. In the future, it would be sensible to make the controller use its clock and convert these latencies (and a few of the DRAM timings) to cycles.
2013-05-30mem: Spring cleaning of MSHR and MSHRQueueAndreas Hansson
This patch does some minor tidying up of the MSHR and MSHRQueue. The clean up started as part of some ad-hoc tracing and debugging, but seems worthwhile enough to go in as a separate patch. The highlights of the changes are reduced scoping (private) members where possible, avoiding redundant new/delete, and constructor initialisation to please static code analyzers.
2013-05-30mem: Fix MSHR print formatAndreas Hansson
This patch fixes an incorrect print format string by adding an additional string element.
2013-05-30cpu: Prune the stale TraceCPUAndreas Hansson
This patch prunes the TraceCPU as the code is stale and the functionality that it provided can now be achieved with the TrafficGen using its trace playback mode. The TraceCPU was able to play back pre-recorded memory traces of a few different formats, and to achieve this level of flexibility with the TrafficGen, use the util/encode_packet_trace (with suitable modifications) to create a protobuf trace off-line.
2013-05-30cpu: Check that minimum TrafficGen period is less than max periodSascha Bischoff
Add a check which ensures that the minumum period for the LINEAR and RANDOM traffic generator states is less than or equal to the maximum period. If the minimum period is greater than the maximum period a fatal is triggered.
2013-05-30cpu: Fix bug when reading in TrafficGen state transitionsSascha Bischoff
This patch fixes a bug with the traffic generator which occured when reading in the state transitions from the configuration file. Previously, the size of the vector which stored the transitions was used to get the size of the transitions matrix, rather than using the number of states. Therefore, if there were more transitions than states, i.e. some transitions has a probability of less than 1, then the traffic generator would fatal when trying to check the transitions. This issue has been addressed by using the number of input states, rather then the number of transitions.
2013-05-30cpu: Add request elasticity to the traffic generatorAndreas Hansson
This patch adds an optional request elasticity to the traffic generator, effectievly compensating for it in the case of the linear and random generators, and adding it in the case of the trace generator. The accounting is left with the top-level traffic generator, and the individual generators do the necessary math as part of determining the next packet tick. Note that in the linear and random generators we have to compensate for the blocked time to not be elastic, i.e. without this patch the aforementioned generators will slow down in the case of back-pressure.
2013-05-30cpu: Block traffic generator when requests have to retryAndreas Hansson
This patch changes the queued port for a conventional master port and stalls the traffic generator when requests are not immediately accepted. This is a first step to allowing elasticity in the injection of requests. The patch also adds stats for the sent packets and retries, and slightly changes how the nextPacketTick and getNextPacket interact. The advancing of the trace is now moved to getNextPacket and nextPacketTick is only responsible for answering the question when the next packet should be sent.
2013-05-30cpu: Move traffic generator sending out of generator statesAndreas Hansson
This patch moves the responsibility for sending packets out of the generator states and leaves it with the top-level traffic generator. The main aim of this patch is to enable a transition to non-queued ports, i.e. with send/retry flow control, and to do so it is much more convenient to not wrap the port interactions and instead leave it all local to the traffic generator. The generator states now only govern when they are ready to send something new, and the generation of the packets to send. They thus have no knowledge of the port that is used.
2013-05-30cpu: Fold together the StateGraph and the TrafficGenAndreas Hansson
This patch simplifies the object hierarchy of the traffic generator by getting rid of the StateGraph class and folding this functionality into the traffic generator itself. The main goal of this patch is to facilitate upcoming changes by reducing the number of affected layers.
2013-05-30mem: Make returning snoop responses occupy response layerAndreas Hansson
This patch introduces a mirrored internal snoop port to facilitate easy addition of flow control for the snoop responses that are turned into normal responses on their return. To perform this, the slave ports of the coherent bus are wrapped in internal master ports that are passed as the source ports to the response layer in question. As a result of this patch, there is more contention for the response resources, and as such system performance will decrease slightly. A consequence of the mirrored internal port is that the port the bus tells to retry (the internal one) and the port actually retrying (the mirrored) one are not the same. Thus, the existing check in tryTiming is not longer correct. In fact, the test is redundant as the layer is only in the retry state while calling sendRetry on the waiting port, and if the latter does not immediately call the bus then the retry state is left. Consequently the check is removed.
2013-05-30mem: Make the buses multi layeredAndreas Hansson
This patch makes the buses multi layered, and effectively creates a crossbar structure with distributed contention ports at the destination ports. Before this patch, a bus could have a single request, response and snoop response in flight at any time, and with these changes there can be as many requests as connected slaves (bus master ports), and as many responses as connected masters (bus slave ports). Together with address interleaving, this patch enables us to create high-throughput memory interconnects, e.g. 50+ GByte/s.