All eyes on Zynq SoC for smarter vision
Mike Santarini, Xilinx 5/16/2013 11:29 AM EDT
This article is republished from Issue #83 of the Xilinx Xcell Journal with the kind permission of Xilinx.
If you have seen a demonstration of Audi’s Automated Parking technology, in which the car autonomously finds a parking spot and parks itself without a driver, or if you have played an Xbox 360 game with its Kinect controller, or even just bitten into a flawless piece of fruit from your local grocery store, then you can count yourself as an eyewitness to the dawning of the era of smarter vision systems.
All manner of products, from the most sophisticated electronic systems down to the humble apple, are affected by smarter vision technologies.
And while today’s systems are impressive enough, some experts predict that in 10 years’ time, a vast majority of electronics systems, from automotive to factory automation, medical, surveillance, consumer, aerospace and defense, will include smarter vision technologies with even more remarkable capabilities.
As smarter vision systems increase in complexity, we’ll very likely become passengers in autonomous automobiles flowing in networked highways.
Medical equipment such as Intuitive Surgical’s amazing robotic-assisted surgical system will advance even further and may enable surgeons to perform procedures from remote locations.
Television and telepresence will reach new levels of immersion and interactivity, while the content on screens in theaters, homes and stores will cater to each individual consumer’s interests, even our moods.
Xilinx All Programmable solutions for Smarter Vision are at the forefront of this revolution.
With the Zynq-7000 All Programmable SoC as the foundation (the first device to marry an ARM dual-core Cortex-A9 MPCore, programmable logic and key peripherals on a single chip), Xilinx has fielded a supporting infrastructure of tools and IP that will play a pivotal role in enabling the development and faster delivery of these innovations in vision.
The supporting infrastructure includes Vivado HLS (high-level synthesis), the new IP Integrator tools, OpenCV (computer vision) libraries, SmartCORE IP and specialized development kits.
"Through Xilinx’s All Programmable Smarter Vision technologies, we are enabling our customers to pioneer the next generation of smarter vision systems," said Steve Glaser, senior vice president of corporate strategy and marketing at Xilinx. "Over the last decade, customers have leveraged our FPGAs to speed up functions that wouldn’t run fast enough in the processors they were using in their systems. With the Zynq-7000 All Programmable SoC, the processor and FPGA logic are on the same chip, which means developers now have a silicon platform ideally suited for smarter vision applications."
Glaser added: "We’ve complemented the Zynq-7000 All Programmable SoC with a robust development environment consisting of Vivado HLS, new IP Integrator tools, OpenCV libraries, SmartCORE IP and development kits. With these Smarter Vision technologies, our customers will get a jump on their next design and be able to achieve new levels of efficiency, lower system power, increase system performance and drastically reduce the bill of materials – enriching and even saving lives while increasing profitability as these innovations roll out at an ever faster pace."
From dumb cameras to smarter vision
At the root of Smarter Vision systems is embedded vision. As defined by the rapidly growing industry group the Embedded Vision Alliance (www.embedded-vision.com), embedded vision is the merging of two technologies: embedded systems (any electronic system, other than a general-purpose computer, that uses a processor) and computer vision (also sometimes referred to as machine vision).
Jeff Bier, founder of the Embedded Vision Alliance and CEO of consulting firm BDTI, said embedded vision technology has had a tremendous impact on several industries as the discipline has evolved beyond motorized pan-tilt-zoom analog camera-based systems. "We have all been living in the digital age for some time now, and we have seen embedded vision rapidly evolve from early digital systems that excelled in compressing, storing or enhancing the appearance of what cameras are looking at into today’s smarter embedded vision systems that are now able to know what they are looking at," said Bier.
Cutting-edge embedded vision systems not only enhance and analyze images, but also trigger actions based on those analyses. As such, the amount of processing and compute power, and the sophistication of the algorithms, have spiked dramatically. A case in point is the rapidly advancing market of surveillance.
Twenty years ago, surveillance systems vendors were in a race to provide the best lenses enhanced by mechanical systems that performed autofocus and tilting for a clearer and wider field of view. These systems were essentially analog video cameras connected via coaxial cables to analog monitors, coupled with video-recording devices monitored by security guards. The clarity, reliability and thus effectiveness of these systems were only as good as the quality of the optics and lenses, and the diligence of the security guards in monitoring what the cameras displayed.
With embedded vision technology, surveillance equipment companies began to use lower-cost cameras based on digital technology. This digital processing gave their systems extraordinary features that outclassed and underpriced analog and lens-based security systems. Fisheye lenses and embedded processing systems with various vision-centric algorithms dramatically enhanced the image the camera was producing. Techniques that correct for lighting conditions, improve focus, enhance color and digitally zoom in on areas of interest also eliminated the need for mechanical motor control to perform pan, tilt and zoom, improving system reliability.
Digital signal processing has enabled video resolution of 1080p and higher.
But a clearer image that can be manipulated through digital signal processing was just the beginning. With considerably more advanced pixel processing, surveillance system manufacturers began to create more sophisticated embedded vision systems that performed analytics in real time on the high-quality images their digital systems were capturing.
The earliest of these embedded vision systems had the capacity to detect particular colors, shapes and movement. This capability rapidly advanced to algorithms that detect whether something has crossed a virtual fence in a camera’s field of view; determine if the object in the image is in fact a human; and, through links to databases, even identify individuals.
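The virtual-fence check described above can be reduced to a small geometric test. The sketch below is one minimal way to do it, assuming an upstream tracker already supplies an object's position in two consecutive frames; the function names and the use of a single straight fence segment are illustrative assumptions, not a description of any vendor's analytics.

```python
def _cross(o, a, b):
    """Signed area term: its sign tells on which side of the line o->a point b lies."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def crossed_fence(prev_pos, curr_pos, fence_a, fence_b):
    """Return True if the move prev_pos -> curr_pos strictly crosses
    the fence segment fence_a -> fence_b (hypothetical helper)."""
    # The two positions must lie on opposite sides of the fence line...
    d1 = _cross(fence_a, fence_b, prev_pos)
    d2 = _cross(fence_a, fence_b, curr_pos)
    # ...and the fence end points on opposite sides of the motion path.
    d3 = _cross(prev_pos, curr_pos, fence_a)
    d4 = _cross(prev_pos, curr_pos, fence_b)
    return d1 * d2 < 0 and d3 * d4 < 0
```

A real system would run such a test per tracked object per frame, and would need extra handling for positions that land exactly on the fence line, which this strict test ignores.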
The most advanced surveillance systems include analytics that track individuals of interest as they move through the field of view of the security network, even as they leave the field of view of one camera, move into a blind spot and then enter into the field of view of another camera in the surveillance network. Vision designers have programmed some of these systems to even detect unusual or suspicious movements. "Analytics is the biggest trend in the surveillance market today," said Mark Timmons, system architect in Xilinx’s Industrial, Scientific and Medical (ISM) group. "It can account for human error and even take away the need for diligent human viewing and decision making. As you can imagine, surveillance in crowded environments such as train stations and sporting events can become extremely difficult, so having analytics that can spot dangerous overcrowding conditions or track individuals displaying suspicious behavior, perhaps radical movements, is very advantageous."
To further enhance this analysis and increase the effectiveness of these systems, surveillance and many other markets leveraging smarter vision are increasingly using "fusion" architectures that combine cameras with other sensing technologies such as thermal vision, radar, sonar and LIDAR (Light/Laser Detection and Ranging). In this way, the systems can enable night vision; detect thermal/heat signatures; or pick up objects not captured by or visible to the camera alone. This capability drastically reduces false detections and in turn allows for much more precise analytics. Needless to say, the added complexity of fusing the technologies and then analyzing that data requires ever more analytic-processing horsepower.
Timmons said that another megatrend in this market is products that perform all these forms of complex analysis "at the edge" of a surveillance system network – that is, within each camera – rather than having each camera transmit its data to a central mainframe system, which then performs a more refined analysis from these multiple feeds. Localized analytics adds resilience to the overall security system, makes each point in the system much faster and more accurate in detection, and thus can warn security operators sooner if indeed a camera spots a valid threat.
Localized analytics means that each unit not only requires greater processing horsepower to enhance and analyze what it is seeing, but must also be compact and yet incorporate highly integrated electronics. And because each unit must be able to communicate reliably with the rest of the network, it must also integrate electronic communication capabilities, adding further compute complexity. Increasingly, these surveillance units are connected via a wireless network as part of a larger surveillance system. And increasingly, these surveillance systems are becoming part of larger enterprise networks or even larger, global networks, like the U.S. military’s Global Information Grid (see cover story, Xcell Journal issue 69).
This high degree of sophistication is being employed in the military-and-defense market in everything from foot soldier helmets to defense satellites networked to central command centers. What’s perhaps more remarkable is how fast smarter vision technology is moving into other markets to enhance quality of life and safety.
Smarter vision for the perfect apple
Take, for example, an apple.
Ever wonder how an apple makes it to your grocery store in such good condition?
Giulio Corradi, an architect in Xilinx’s ISM group, said that food companies are using ever-smarter vision systems in food inspection lines to, for example, sort the bad apples from the good ones. Corradi said first-generation embedded vision systems deployed on high-speed food inspection lines typically used a camera or perhaps several cameras to spot surface defects in apples or other produce.
If the embedded vision system spotted an unusual color, the apple would be marked/sorted for further inspection or thrown away.
Beneath the skin
But what happens if, at some point before that, the fruit was dropped but the damage wasn’t visible? "In some cases, damage that resulted from a drop may not be easily spotted by a camera, let alone by the human eye," said Corradi. "The damage may actually be in the flesh of the apple. So some smarter vision systems fuse an infrared sensor with the cameras to detect the damage beneath the surface of the apple’s skin. Finding a bruised fruit triggers a mechanical sorter to pull the apple off the line before it gets packed for the grocery store." If the damaged apple had passed by without the smarter fusion vision system, the damage would likely become apparent by the time it was displayed on the grocery store shelves; the fruit would probably have to be thrown away. One rotten apple can, of course, spoil the bunch.
Analytics can also help a food company determine if the bruised apple is in good enough condition to divert to a new line, in which another smarter vision system can tell if it is suitable for some other purpose – to make applesauce, dried fruit or, if it is too far gone, best suited for composting.
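The sorting decision Corradi describes can be pictured as a simple rule that fuses the two sensor readings. The sketch below is purely illustrative: the defect scores in the range 0 to 1, the thresholds and the rule of taking the worse of the two readings are all assumptions made for this example, not details of any actual inspection line.

```python
def route_apple(surface_defect, subsurface_bruise,
                accept_below=0.2, reject_above=0.7):
    """Route one apple from two hypothetical defect scores in [0, 1].

    surface_defect    -- score derived from the visible-light camera
    subsurface_bruise -- score derived from the infrared sensor
    """
    # Assumed fusion rule: the worse of the two readings decides.
    severity = max(surface_defect, subsurface_bruise)
    if severity < accept_below:
        return "pack"          # good enough for the grocery store
    if severity < reject_above:
        return "processing"    # e.g. applesauce or dried fruit
    return "compost"           # too far gone
```

The point of the fusion step is visible in the second case below: an apple that looks clean to the camera can still be diverted because the infrared reading is high.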
Factory floors are another site for smarter vision, Corradi said. A growing number use robotic-assisted technologies or completely automated robotic lines that manufacturers can retool for different tasks. The traditional safety cages around the robots are too restrictive (or too small) to accommodate the range of movement required to manufacture changing product lines.
So to protect workers while not restricting the range of motion of automated factory lines, companies are employing smarter vision to create safety systems. Cameras and lasers erect "virtual fences or barriers" that audibly warn workers (and safety monitor personnel) if someone is getting too close to the factory line given the product being manufactured. Some installations include a multiphase virtual barrier system that will send an audible warning as someone crosses an outer barrier, and shut down the entire line automatically if the individual crosses a second barrier that is closer to the robot, preventing injury. Bier of the Embedded Vision Alliance notes that this type of virtual barrier technology has wide applicability.
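The multiphase barrier logic lends itself to a short sketch. The version below assumes the vision system has already estimated a person's floor position in meters, and it models the two barriers as circles around the robot; the circular zones, the radii and the function name are assumptions for illustration only.

```python
import math


def barrier_action(person_xy, robot_xy, outer_radius_m, inner_radius_m):
    """Return the action for a two-stage virtual barrier (hypothetical helper)."""
    distance = math.dist(person_xy, robot_xy)
    if distance <= inner_radius_m:
        return "shutdown"   # second barrier crossed: stop the line
    if distance <= outer_radius_m:
        return "warn"       # outer barrier crossed: audible warning
    return "clear"
```

Because the zones are just parameters, the same check can be retuned whenever the line is retooled for a different product, which is the flexibility a fixed safety cage lacks.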
"It can have a tremendous impact in reducing the number of accidents in factories, but why not also have virtual barriers in amusement parks, or at our homes around swimming pools or on cars?" said Bier. "I think we’ll see a lot more virtual barrier systems in our daily lives very soon."
Smarter vision for better driving
Automotive is another market that is fully embracing smarter vision to create a less stressful and safer driving experience. Paul Zoratti, a systems architect within Xilinx Automotive, said that advanced driver assistance systems (ADAS) are all about using remote sensing technologies, including smarter vision, to assist drivers (see cover story, Xcell Journal issue 66).
Each year over the past decade, automakers have unveiled ever more impressive driver assistance (DA) features in their luxury lines, while also making a growing number of DA features available in their sport and standard product lines. Many of these features, such as blind-spot detection, lane change assist, and pedestrian and sign detection, send drivers a warning if they sense a potentially dangerous situation.
Recent offerings from auto manufacturers include even more advanced systems such as automatic emergency braking and lane keeping, which not only monitor the vehicle environment for potential problems but assist the driver in taking corrective actions to avoid accidents or decrease their severity.
Zoratti said that some new-model cars today are outfitted with four cameras, located on the sides, front and rear of the vehicle, that offer a continuous 360-degree view of the vehicle’s surroundings.
While the first-generation surround-view systems are using those cameras to provide an image to the driver, future systems will bundle additional DA features. Using the same four cameras and image-processing analytics, these next-generation systems will simultaneously generate a bird’s eye view of the vehicle and also warn of potential danger such as the presence of a pedestrian. Furthermore, while the vehicle is traveling at higher speeds, the automobile will use the cameras on the side and rear of the vehicle for blind-spot detection, lane change assistance and lane departure warning.
Adding another, forward-looking camera behind the windshield will support traffic sign recognition and forward-collision warning features.
Finally, when the driver reaches his or her destination and activates automated parking, the system will employ those same cameras along with other sensors to help the car semi-autonomously maneuver into a parking spot.
Handling all these tasks in real time requires a tremendous amount of processing power that is well-suited for parallel hardware computation, Zoratti said.
That’s why many early DA systems paired standalone microprocessors with FPGAs, with the FPGA handling most of the parallel computations and microprocessors performing the serial decision making.
The cost pressures in automotive are driving the analytics to be performed in a central compute hub, rather than at each camera as is the case in other markets, such as surveillance.
In this way, carmakers can minimize the cost of each camera sensor and, ultimately, the cost of the entire system.
That means the processing platform in the central unit needs to deliver very high performance and bandwidth to support the simultaneous processing of four, five or even six real-time video inputs.
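A rough estimate shows why the central unit's bandwidth matters. The calculation below is a back-of-the-envelope sketch: the 1280x720 resolution, 30 frames per second and 16 bits per pixel are assumed values chosen for illustration, not figures from Zoratti or Xilinx, and the function name is hypothetical.

```python
def raw_video_bandwidth_mbps(cameras, width, height, fps, bits_per_pixel):
    """Uncompressed pixel traffic in megabits per second for a set of
    identical camera streams (illustrative estimate only)."""
    bits_per_second = cameras * width * height * fps * bits_per_pixel
    return bits_per_second / 1e6


# Assumed stream format: 1280x720, 30 frames/s, 16 bits per pixel.
four_cameras = raw_video_bandwidth_mbps(4, 1280, 720, 30, 16)
six_cameras = raw_video_bandwidth_mbps(6, 1280, 720, 30, 16)
```

Under these assumptions, four cameras generate roughly 1.77 Gbit/s of raw pixel data and six cameras roughly 2.65 Gbit/s, all of which must be moved and processed before any analytics result is produced.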
A smarter vision for longer lives
Another area where smarter vision is making a dramatic difference is in the medical electronics industry, which uses smarter vision technology in a wide range of medical imaging systems, from endoscopes and imaging scanners (CT, MRI, etc.) to robotic-surgical systems such as Intuitive Surgical’s Da Vinci, detailed in Xcell Journal issue 77.
Da Vinci’s sophisticated 3D vision system allows surgeons to guide robotic surgical instruments with extreme precision, fluidity and tactile sensitivity to perform a number of delicate, intricate surgical procedures.
With every generation of system, surgeons are able to perform a greater number and variety of surgeries, helping to ensure better patient outcomes and faster recovery times.
The degree of technological sophistication to control and coordinate these procedures is remarkable and is heavily reliant on the combined horsepower of processing and logic.
Each generation of newer technology will thus benefit from greater integration between processor and logic.
A smarter vision for an immersive experience
Smarter vision is also making great strides in keeping us connected.
If you work in a modern office building, chances are your company has at least one conference room with an advanced telepresence conferencing system that not only allows you to talk to others around the world, but also lets you see them as though they were there in person.
These videoconferencing systems are increasing in sophistication to the point that they can sense who at a table or conference is speaking, and automatically turn and zoom in on that person, displaying him or her in ever-higher-quality immersive video.
Ben Runyan, director of the broadcast and consumer segment marketing at Xilinx, said that companies developing telepresence technologies are seeking ways to create a more immersive experience for users.
"The goal is to make users feel like they are in the same room when in fact they could be on the other side of the globe," said Runyan.
"To do this requires state-of-the-art cameras and display technologies, which require advanced image processing.
As these technologies become more advanced and deliver a more immersive experience, it will make collaboration easier and make companies more productive while cutting down the need, and thus expense, to travel."
Xilinx: All-Programmable for smarter vision
Enabling smarter vision to progress rapidly on all fronts and to reach new markets requires an extremely flexible processing platform, a rich set of resources and a viable ecosystem dedicated to smarter vision.
Xilinx devices have played a key role in helping companies innovate these vision systems over the last decade. Today, after five years in development, Xilinx is delivering a holistic solution that will help developers of smarter vision applications to quickly deliver the next generation of innovations.
For more than a decade, embedded vision designers have leveraged the programmability, parallel computing and fast I/O capabilities of Xilinx FPGAs in a vast number of embedded vision systems.
Traditionally, designers have used FPGAs to speed up functions that were slowing down the main processor in their systems, or to run parallel computing tasks that processors simply could not perform.
Now, with the Zynq-7000 All Programmable SoC, embedded vision developers have a fully programmable device that is ideally suited for developing the next generation of smarter vision applications.
"Smarter vision can be implemented in separate processors and FPGAs communicating on the same board, but what the Zynq SoC delivers is a level of integration that the electronics industry didn’t have before," said Jose Alvarez, engineering director of video technology at Xilinx.
"Now, instead of interchanging information between the main intelligent processor and FPGA logic at board speeds, we can do it at silicon speeds through 3,000 high-performance connections between the processor and logic on the same chip."
Figure 1 reveals the benefits of the Zynq SoC over a traditional multicamera, multichip architecture in the creation of a multifeature automotive driver assistance system.
Using one set of cameras connected to one Zynq SoC, the Xilinx architecture (bottom left in the graphic) can enable feature bundles such as blind-spot detection, 360-degree surround view, lane departure warning and pedestrian detection.
In comparison, existing multifeature DA systems require multiple chips and multiple cameras, which complicates integration, adversely affects performance and system power consumption, and leads to higher BOM costs.
Figure 1. Zynq All Programmable SoC vs. multichip, multiple-camera systems in driver assistance applications.
A few silicon vendors offer ASSPs that pair ARM processors with DSPs or with GPUs, but those devices tend to be too rigid or provide insufficient compute performance for many of today’s smarter vision applications. Often, solutions based on these devices require the addition of standalone FPGAs to fix these deficiencies.
Programmability and performance
The Zynq SoC’s programmability and performance provide key advantages over GPU- and DSP-centric SoCs. The ARM processing system is software programmable; the FPGA logic is programmable via HDLs or C++; and even the I/O is fully programmable.
As a result, customers can create extremely high-performance smarter vision systems suited to their specific applications, and differentiate their systems from those offered by competitors.
Figure 2 details a generic signal flow of a smarter vision system and shows how the Zynq SoC stacks up against ARM-plus-DSP and ARM-plus-GPU-based ASSPs.
The first signal-processing block in the flow (shown in green) is the input that connects the device to a camera sensor.
In the Zynq SoC, developers can accommodate a wide range of I/O signals to conform to whatever camera connectivity their customers require.
The next signal-processing block performs pixel-level processing or video processing (depending on whether the application is for image processing or display).
The next block performs analytics on the image, a compute-intensive process that often requires parallel computing best implemented in FPGA logic.
The subsequent three blocks (in red) are where the processing system derives metadata results from the analytics, creates a graphic representation of the result (the graphics step) and encodes results for transmission.
Figure 2. Generic video-processing and image-processing system flow.
In the Zynq SoC, the processing subsystem and FPGA logic work together.
When compression is required, the appropriate codec can be readily implemented in FPGA logic. Then, in the final signal-processing block (labeled "Output"), the Zynq SoC’s programmable I/O allows developers to target a vast number of communication protocols and video transport standards, whether they be proprietary, market specific or industry-standard IP protocols.
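The flow just described can be mimicked in a few lines of software to make the hand-offs between blocks concrete. The sketch below is a toy model, not Xilinx code: the frame is a small list of 8-bit pixel rows, the gain, threshold and function names are assumptions, the graphics step is omitted, and JSON stands in for whatever transport encoding a real product would use. On a Zynq SoC the first two functions correspond to work that would typically sit in FPGA logic, and the last two to work for the processing system.

```python
import json


def pixel_processing(frame, gain):
    """Pixel-level step: apply a gain and clip to the 8-bit range."""
    return [[min(255, int(p * gain)) for p in row] for row in frame]


def analytics(frame, threshold):
    """Analytics step: bounding box (x0, y0, x1, y1) of all pixels
    brighter than threshold, or None if there are none."""
    hits = [(x, y) for y, row in enumerate(frame)
            for x, p in enumerate(row) if p > threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


def to_metadata(box):
    """Metadata step: summarize the analytics result."""
    return {"object_detected": box is not None, "box": box}


def encode(metadata):
    """Encode step: serialize the metadata for transmission."""
    return json.dumps(metadata).encode("utf-8")


def run_pipeline(frame, gain=1.0, threshold=128):
    """Chain the stages in the order of the generic signal flow."""
    processed = pixel_processing(frame, gain)
    return encode(to_metadata(analytics(processed, threshold)))
```

Even in this toy form, the per-pixel loops dominate the work while the metadata and encode steps touch only a handful of values, which is the asymmetry that makes the split between logic and processor attractive.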
In comparison, in both DSP- and GPU-centric SoCs, developers run the risk of developing algorithms that require performance not achievable with the DSP or GPU sections of these ASSPs. They will often have to make up for this deficiency by adding a standalone FPGA to their systems.
While the Zynq SoC is clearly the best silicon choice for smarter vision systems, Xilinx realized early in the device’s development that programming needed to be streamlined, especially for designers who are more accustomed to C- and C++-based development of vision algorithms.
To this end, in June 2012 Xilinx delivered to customers a state-of-the-art software environment called the Vivado Design Suite that includes, among other technologies, best-in-class high-level synthesis technology that the company gained in its January 2011 acquisition of AutoESL.
Vivado HLS is particularly well-suited to embedded vision applications.
If, for example, vision developers using the Zynq SoC have created an algorithm in C or C++ that doesn’t run fast enough or is overburdening the processing system, they can send their C algorithms to Vivado HLS and synthesize the algorithms into Verilog or VHDL to run in the FPGA logic on the device.
This frees up the processing subsystem on the Zynq SoC to handle tasks it is better suited to run, and thus speeds up the overall system performance.
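How much the overall system gains from moving one function into logic can be estimated with the standard Amdahl's law relation. The sketch below is a generic calculation, not a Xilinx tool or measurement: the fraction of runtime spent in the offloaded function and the speedup of that function are inputs the designer would have to profile, and the model ignores the cost of moving data between processor and logic.

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: whole-system speedup when a fraction of the original
    runtime is accelerated by a given factor."""
    remaining = 1.0 - offloaded_fraction          # part left on the processor
    accelerated = offloaded_fraction / kernel_speedup
    return 1.0 / (remaining + accelerated)
```

As a hypothetical example, if a filter accounts for 80 percent of the runtime and runs 10 times faster in logic, the system as a whole speeds up by about 3.6 times, because the 20 percent left on the processor now dominates.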
Xilinx has also rounded out its Smarter Vision technology offering by releasing its OpenCV (computer vision) library.
OpenCV is an industry-standard, open-source library of algorithms from OpenCV.org that embedded vision developers use to quickly create vision systems.
Embedded vision developers across the world actively contribute new algorithms to the library, which now contains more than 2,500 algorithms written in C, C++, Java and Python. Algorithms in the library range in complexity from simple functions such as image filters to more advanced functions for analytics such as motion detection.
Alvarez said that these OpenCV algorithms target implementation in just about any commercial microprocessor and DSP. Because the Zynq SoC uses an ARM processing system, users can implement these algorithms, written in C++, in its processor portion.
Thanks to Vivado HLS, said Alvarez, users can also take these algorithms written in C or C++, modify function calls from OpenCV to HLS and then, using Vivado HLS, synthesize or compile the algorithms into RTL code optimized for implementation in the logic portion of the Zynq-7000 SoC.
Having OpenCV in the Vivado environment allows smarter vision architects to easily compare whether a given algorithm in their design will run better in the processor or in the FPGA logic portion of the Zynq-7000 All Programmable SoC. With the release of its OpenCV library, Xilinx has essentially given customers a head start.
Using Vivado HLS, Xilinx has already compiled more than 30 of the most used embedded vision algorithms from the OpenCV library. Customers can quickly make processor vs. logic trade-offs at the systems level and run them immediately in the Zynq-7000 All Programmable SoC to derive the optimal system for their given application.
Xilinx and its Alliance members will actively migrate more functions from the OpenCV library on an ongoing basis, making them available to Xilinx’s user base quarterly. Because developers can run OpenCV libraries on just about any commercial processor, vision designers will be able to compare and even benchmark the performance of algorithms running on various silicon devices.
As part of its Smarter Vision initiative, Xilinx has also created an intellectual property (IP) suite called SmartCORE IP, addressing smarter vision requirements from across the many market segments that will design smarter vision into their next-generation products. Customers can implement cores from the SmartCORE IP suite and algorithms from the OpenCV library into their designs quickly using Xilinx’s newly introduced IP Integrator tool. The new tool is a modern plug-and-play IP environment that allows users to work in schematics or, if they prefer, a command-line environment.
Targeted platform aware
Alvarez said that Xilinx has architected the Vivado Design Suite from its inception to be device aware, so as to take full advantage of each device’s capabilities.
He added that thanks to IP Integrator, the Vivado Design Suite is not only device aware but now targeted platform aware as well, supporting all Zynq SoC and 7 series FPGA boards and kits. Being targeted platform aware means that the Vivado Design Suite will configure and apply board-specific design rule checks, which ensures rapid bring-up of working systems.
For example, when a designer selects the Xilinx Zynq-7000 SoC Video and Imaging Kit, and instantiates a Zynq SoC processing system within IP Integrator, Vivado Design Suite preconfigures the processing system with the correct peripherals, drivers and memory map to support the board.
Embedded design teams can now more rapidly identify, reuse and integrate both software and hardware IP, targeting the dual-core ARM processing system and high-performance FPGA logic.
Users specify the interface between the processing system and their logic with a series of dialog boxes.
IP Integrator then automatically generates RTL and optimizes it for performance or area. Then users can add their own custom logic or use the Vivado IP catalog to complete their designs.
It’s remarkable to see what smarter vision systems Xilinx customers have created to date with Xilinx FPGAs.
The advent of the Zynq-7000 All Programmable SoC and the powerful Smarter Vision environment guarantees that the next crop of products will be even more amazing.
About the author
Mike Santarini is the publisher of the Xilinx Xcell Journal. Mike can be contacted at [email protected]