Asus TinkerBoard

The Tinker is getting better SW support over time - Armbian stays fairly close to current (I think they're on 4.14). The upside with Tinker is that it is Asus, which may be important to some...

Go with Armbian on the TinkerBoard -- https://www.armbian.com/tinkerboard/

They're staying on top of the Rockchip, upstream, and it just generally works....

Code:
 _____ _       _             _                         _
|_   _(_)_ __ | | _____ _ __| |__   ___   __ _ _ __ __| |
  | | | | '_ \| |/ / _ \ '__| '_ \ / _ \ / _` | '__/ _` |
  | | | | | | |   <  __/ |  | |_) | (_) | (_| | | | (_| |
  |_| |_|_| |_|_|\_\___|_|  |_.__/ \___/ \__,_|_|  \__,_|
                                                      

Welcome to ARMBIAN 5.59 stable Ubuntu 18.04.1 LTS 4.14.67-rockchip
System load:   0.00 0.02 0.01  Up time:       24 min
Memory usage:  9 % of 2005MB IP:            192.168.1.121
CPU temp:      63°C        
Usage of /:    5% of 30G
 
I like the armbian team. I should find time to say hello to them.

Having spent some time on SBCs, I feel like jumping to the conclusion that none of the third-party distros has optimal memory settings. My board vendor does a bit better, but beyond their supplied image, performance degrades, especially once you move to a mainline kernel.

Below is my homebrew distribution, mainly based on Arch Linux. I haven't seen any better performance data online, not even from my vendor. I should find time to open up my distro to the FOSS community.

KMUGdfR.png
 
They do a decent job - and they're getting their head wrapped around the new chip on your board...

If you start collaborating with the Armbian folks - the best contrib would be probably in the realm of uboot and the device tree stuff...

FWIW - care to share the rest of the benchmark on tinymembench?

Here's the TinkerBoard on current Armbian...

https://pastebin.com/Bm6ZusKB

Part of the challenge with the TinkerBoard is that it's very aggressive at throttling, and it runs pretty warm, even at idle. I need to find a better thermal solution there - the stock heat sink is holding it back...
 
Anyways - some byte-unixbench numbers - I think the Tinkerboard is thermally challenged under a heavy workload...

Pi3B+ isn't throttling as far as I can tell, the Tinker is bumping up against thermals here, and limits itself...

Code:
Tinker on Armbian

------------------------------------------------------------------------
Benchmark Run: Sat Sep 15 2018 14:52:50 - 15:20:51
4 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   11497736.5    985.2
Double-Precision Whetstone                       55.0       1648.0    299.6
Execl Throughput                                 43.0       1591.8    370.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     141548.4    357.4
File Copy 256 bufsize 500 maxblocks            1655.0      42321.0    255.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     365313.7    629.9
Pipe Throughput                               12440.0     245091.5    197.0
Pipe-based Context Switching                   4000.0      52519.3    131.3
Process Creation                                126.0       2811.9    223.2
Shell Scripts (1 concurrent)                     42.4       2979.0    702.6
Shell Scripts (8 concurrent)                      6.0        820.8   1368.0
System Call Overhead                          15000.0     663250.9    442.2
                                                                   ========
System Benchmarks Index Score                                         397.2

------------------------------------------------------------------------
Benchmark Run: Sat Sep 15 2018 15:20:51 - 15:49:42
4 CPUs in system; running 4 parallel copies of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   23644142.4   2026.1
Double-Precision Whetstone                       55.0       3994.6    726.3
Execl Throughput                                 43.0       3267.2    759.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     159748.6    403.4
File Copy 256 bufsize 500 maxblocks            1655.0      43267.3    261.4
File Copy 4096 bufsize 8000 maxblocks          5800.0     505426.9    871.4
Pipe Throughput                               12440.0     590428.1    474.6
Pipe-based Context Switching                   4000.0     155592.2    389.0
Process Creation                                126.0       6727.1    533.9
Shell Scripts (1 concurrent)                     42.4       5702.9   1345.0
Shell Scripts (8 concurrent)                      6.0        731.5   1219.2
System Call Overhead                          15000.0    1763473.5   1175.6
                                                                   ========
System Benchmarks Index Score                                         720.4

Rpi3 on Raspbian

Benchmark Run: Sat Sep 15 2018 14:52:58 - 15:21:01
4 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    4640107.5    397.6
Double-Precision Whetstone                       55.0       1189.1    216.2
Execl Throughput                                 43.0        952.8    221.6
File Copy 1024 bufsize 2000 maxblocks          3960.0     154128.0    389.2
File Copy 256 bufsize 500 maxblocks            1655.0      45153.5    272.8
File Copy 4096 bufsize 8000 maxblocks          5800.0     375447.9    647.3
Pipe Throughput                               12440.0     323826.2    260.3
Pipe-based Context Switching                   4000.0      54346.7    135.9
Process Creation                                126.0       2336.4    185.4
Shell Scripts (1 concurrent)                     42.4       1832.0    432.1
Shell Scripts (8 concurrent)                      6.0        577.7    962.8
System Call Overhead                          15000.0     663497.8    442.3
                                                                   ========
System Benchmarks Index Score                                         328.5

------------------------------------------------------------------------
Benchmark Run: Sat Sep 15 2018 15:21:01 - 15:49:07
4 CPUs in system; running 4 parallel copies of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   17276440.4   1480.4
Double-Precision Whetstone                       55.0       4216.0    766.5
Execl Throughput                                 43.0       2274.2    528.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     238696.6    602.8
File Copy 256 bufsize 500 maxblocks            1655.0      65352.9    394.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     594693.6   1025.3
Pipe Throughput                               12440.0    1174283.6    944.0
Pipe-based Context Switching                   4000.0     236455.0    591.1
Process Creation                                126.0       4742.4    376.4
Shell Scripts (1 concurrent)                     42.4       4473.8   1055.1
Shell Scripts (8 concurrent)                      6.0        618.0   1029.9
System Call Overhead                          15000.0    2325050.9   1550.0
                                                                   ========
System Benchmarks Index Score                                         781.4
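
For anyone wanting to sanity-check the numbers above: as far as I know, each INDEX is RESULT / BASELINE x 10, and the final score is the geometric mean of the INDEX column. Below is a minimal Python sketch of that arithmetic. The function names are my own, and the list is just the INDEX column copied from the single-copy Tinker run. The `scaling` helper shows how much the score grows from 1 copy to 4 copies, which is one rough way to see the thermal limit.

```python
import math

# INDEX column copied from the single-copy TinkerBoard run above.
tinker_single = [985.2, 299.6, 370.2, 357.4, 255.7, 629.9,
                 197.0, 131.3, 223.2, 702.6, 1368.0, 442.2]


def index_value(baseline, result):
    """Per-test index: result relative to the baseline machine, times 10."""
    return result / baseline * 10.0


def index_score(indices):
    """Overall score: geometric mean of the per-test index values."""
    return math.exp(sum(math.log(i) for i in indices) / len(indices))


def scaling(single_score, multi_score):
    """How many times higher the 4-copy score is than the 1-copy score."""
    return multi_score / single_score


print("Dhrystone index:", round(index_value(116700.0, 11497736.5), 1))
print("Tinker 1-copy score:", round(index_score(tinker_single), 1))
print("Tinker 1 -> 4 copies:", round(scaling(397.2, 720.4), 2))
print("Pi 1 -> 4 copies:", round(scaling(328.5, 781.4), 2))
```

On these runs the Pi gains noticeably more from going to four copies than the Tinker does, which fits with the Tinker limiting itself under sustained load.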
 
FWIW - care to share the rest of the benchmark on tinymembench?

Here's the TinkerBoard on current Armbian...

https://pastebin.com/Bm6ZusKB

Here you go: rk3328 on renegade
https://pastebin.com/raw/byugVQw4

Note that comparing across different SoCs is less meaningful. One advantage of the rk3288 is dual-channel DRAM, which the rk3328 doesn't have. But my board uses DDR4, so it has some advantage over Pine64's Rock64, which uses DDR3.

Code:
[root@alarm ~]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4
CPU max MHz:         1416.0000
CPU min MHz:         408.0000
BogoMIPS:            48.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32
 
One advantage in rk3288 is dual channel DRAM that rk3328 doesn't have. But my board uses DDR4 and there is some advantage over Pine64's Rock64 that uses DDR3..

Honestly, I think the RK3328 is probably better in this form factor... bit more power efficient, and not as thermally challenged, and the USB3 port is a plus...
 
You're spot on. The rk3328 also has non-trivial throttling. Maybe the rk3288 throttles a bit more since it has higher clocks. I saw the same on mine when pushing four cores to the limit. I would assert everyone is about the same in this form factor. Some people install a fan, but that breaks the beauty of this form factor.

On my build, I pushed the thermal threshold up by 5 °C, to 75 °C. This delays the throttling process and gives the SoC a bit more time to stay closer to 85 °C, where throttling starts. It's like black magic: the SoC spends more time between 80 and 85 °C and overall performance is better. I saw that the Rock64 kernel is very aggressive on this, but their numbers don't work out nicely for my board. Perhaps give it a try and find the sweet spot in your DTS file.
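
Before editing the DTS it helps to see which trip points the running kernel actually ended up with. Here is a minimal Python sketch, assuming the standard Linux thermal sysfs layout (`trip_point_N_temp` and `trip_point_N_type` files inside a `thermal_zoneN` directory, with temperatures in millidegrees Celsius). The function name is my own, and which zone belongs to the CPU on a given board is something you would have to check yourself.

```python
from pathlib import Path


def read_trip_points(zone_dir):
    """Return [(trip_type, temp_in_celsius), ...] for one thermal zone directory."""
    zone = Path(zone_dir)
    trips = []
    for temp_file in sorted(zone.glob("trip_point_*_temp")):
        type_file = zone / temp_file.name.replace("_temp", "_type")
        # sysfs reports trip temperatures in millidegrees Celsius
        millideg = int(temp_file.read_text().strip())
        trip_type = type_file.read_text().strip() if type_file.exists() else "unknown"
        trips.append((trip_type, millideg / 1000.0))
    return trips

# Example call on a real board (zone number is an assumption):
#   read_trip_points("/sys/class/thermal/thermal_zone0")
```

Comparing the output before and after a DTS change is a quick way to confirm that the new thresholds were really picked up.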
 
The 3288 is just overkill in this form factor - maybe for Chromebook/Chromebox or Android Set Top box where one can have a better thermal solution... as it stands now, once the throttling starts, even at 600MHz it can't shed heat fast enough, and basically gets heat soaked... Part of the challenge is that all four cores are locked together in the time domain - so if one core turbos up to max speed, they all do, even if they're idle... and being big cores (like the A9/A15), look at the thermal solutions on most Router/AP's like Asus, Netgear, Linksys and the like - big honking heatsinks there.

Going to a bigger heat sink on the Tinker - one starts running into clearance issues everywhere on the board - between the GPIO connector, the display and camera connectors, and at least one capacitor, there's just not a lot of room for a passive solution - so active cooling with a fan is really the only option, or just live with the throttling on this board.

I've done some prelim exploration on the DVFS settings, but the Armbian folks have sorted out what is probably the best compromise for the 3288 - even there though - if one sets the governor to "performance", it idles at 70C with the stock 1.8GHz clocks... "ondemand" helps out there, and no real impact to performance.

From cpufreq-info (part of cpufrequtils package) - here's the armbian curves for the Tinkerboard - I'll have to check what they're doing with the MiQi board, but I suspect it's similar..

Code:
driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0 1 2 3
  maximum transition latency: 136 us.
  hardware limits: 600 MHz - 1.80 GHz
  available frequency steps: 600 MHz, 816 MHz, 1.01 GHz, 1.20 GHz, 1.42 GHz, 1.51 GHz, 1.61 GHz, 1.70 GHz, 1.80 GHz
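
If cpufrequtils isn't installed, the same information can be read straight from sysfs. This is a minimal Python sketch, assuming the usual cpufreq files (`scaling_governor`, `related_cpus`, `scaling_available_frequencies`, the last one in kHz) under a directory such as `/sys/devices/system/cpu/cpu0/cpufreq`. The function name and the returned dictionary layout are my own.

```python
from pathlib import Path


def read_policy(cpufreq_dir):
    """Summarise one cpufreq policy directory: governor, linked CPUs, steps in MHz."""
    d = Path(cpufreq_dir)

    def read(name):
        f = d / name
        return f.read_text().split() if f.exists() else []

    governor = read("scaling_governor")
    return {
        "governor": governor[0] if governor else None,
        # CPUs that share one clock and therefore always change frequency together
        "related_cpus": [int(c) for c in read("related_cpus")],
        # sysfs lists the available steps in kHz
        "freqs_mhz": [int(k) // 1000 for k in read("scaling_available_frequencies")],
    }

# Example call on a real board:
#   read_policy("/sys/devices/system/cpu/cpu0/cpufreq")
```

A `related_cpus` list covering all four cores corresponds to the "CPUs which run at the same hardware frequency" line in the cpufreq-info output.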

Changing topic a bit - have you had a chance to play around with the crypto stuff on the 3328 in ARMv8 mode?

The 3328 does support the Cortex-A53 crypto extensions (unlike the Pi3 line), so might be interesting to compare the ARM stuff against the dedicated crypto blocks on the 3328.
 
Maybe rk3288 throttles a bit more since it has higher clocks.

Throw a load on it... watch how fast it throttles...

openssl speed -multi 4

Code:
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
14:57:24:  600MHz  1.00   0%   0%   0%   0%   0%   0% 68.1°C  0/14
14:57:30: 1800MHz  0.92  18%   0%  18%   0%   0%   0% 75.8°C  12/14
14:57:38:  816MHz  1.47 100%   0%  99%   0%   0%   0% 72.5°C  11/14
14:57:45: 1008MHz  1.67 100%   0%  99%   0%   0%   0% 72.1°C  13/14
14:57:54: 1200MHz  2.10 100%   0%  99%   0%   0%   0% 72.1°C  12/14
14:58:02: 1008MHz  2.47 100%   0%  99%   0%   0%   0% 70.8°C  12/14
14:58:11:  816MHz  2.59 100%   0%  99%   0%   0%   0% 72.1°C  12/14
14:58:19:  816MHz  2.88 100%   0%  99%   0%   0%   0% 71.2°C  13/14
14:58:28:  600MHz  3.13 100%   0%  99%   0%   0%   0% 72.1°C  14/14
14:58:38:  600MHz  3.34 100%   0%  99%   0%   0%   0% 71.7°C  13/14
14:58:47:  600MHz  3.51 100%   0%  99%   0%   0%   0% 71.7°C  14/14
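
To put a number on that, here is a small Python sketch that parses the samples above and averages the clock over the rows where the CPU is fully loaded. The column positions are taken from this one log, so treat the parser as an assumption that only holds for this output format. The function names are my own.

```python
LOG = """\
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
14:57:24:  600MHz  1.00   0%   0%   0%   0%   0%   0% 68.1°C  0/14
14:57:30: 1800MHz  0.92  18%   0%  18%   0%   0%   0% 75.8°C  12/14
14:57:38:  816MHz  1.47 100%   0%  99%   0%   0%   0% 72.5°C  11/14
14:57:45: 1008MHz  1.67 100%   0%  99%   0%   0%   0% 72.1°C  13/14
14:57:54: 1200MHz  2.10 100%   0%  99%   0%   0%   0% 72.1°C  12/14
14:58:02: 1008MHz  2.47 100%   0%  99%   0%   0%   0% 70.8°C  12/14
14:58:11:  816MHz  2.59 100%   0%  99%   0%   0%   0% 72.1°C  12/14
14:58:19:  816MHz  2.88 100%   0%  99%   0%   0%   0% 71.2°C  13/14
14:58:28:  600MHz  3.13 100%   0%  99%   0%   0%   0% 72.1°C  14/14
14:58:38:  600MHz  3.34 100%   0%  99%   0%   0%   0% 71.7°C  13/14
14:58:47:  600MHz  3.51 100%   0%  99%   0%   0%   0% 71.7°C  14/14
"""


def parse_samples(text):
    """Turn each data row into a dict; the header and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[1].endswith("MHz"):
            continue
        samples.append({
            "time": fields[0].rstrip(":"),
            "mhz": int(fields[1][:-3]),
            "cpu_pct": int(fields[3].rstrip("%")),
            "temp_c": float(fields[-2].replace("°C", "")),
        })
    return samples


def mean_loaded_mhz(samples, min_cpu_pct=90):
    """Average clock over the samples taken while the CPU was busy."""
    loaded = [s["mhz"] for s in samples if s["cpu_pct"] >= min_cpu_pct]
    return sum(loaded) / len(loaded) if loaded else None


samples = parse_samples(LOG)
print("loaded samples average (MHz):", round(mean_loaded_mhz(samples)))
```

Over this short capture the loaded samples average well under half of the 1800MHz maximum, and the last three rows sit at the 600MHz floor.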
 
Part of the challenge is that all four cores are locked together in the time domain - so if one core turbos up to max speed, they all do, even if they're idle...

This sounds interesting. I wonder if the rk3328 is similar. Probably not, because I see a noticeable temperature difference depending on the number of cores pushed to the limit.

look at the thermal solutions on most Router/AP's like Asus, Netgear, Linksys and the like - big honking heatsinks there.

I've had some thoughts about those big heatsinks. If there is any analysis out there, it would be nice to read. I mean, I doubt vendors do precise math to get it working. More likely they stamp one as big as they can go. Since routers are mostly idle, it works. When you push to the limit, e.g. on the 86U, I suspect it throttles without reporting to users.

Systems like the 86U having only two cores alleviates the problem a lot. The rk3328, for example, has no issues with one or two cores at full speed. The problem is when all four cores are at full load. A big heatsink isn't going to help in that case. Heat dissipation at ambient room temperature with stale airflow isn't fast enough.

I quickly checked the rk3288 DTS file for you. The thermal thresholds are already 75/85/115. That's exactly what I'm using for the rk3328. Not that it's comparable across SoCs, but I found the coincidence a bit surprising. Perhaps, given its longer time on the market, the rk3288 already has better defaults.

I should re-visit some of your other points on the freqs. Too long in a single reply..

Changing topic a bit - have you had a chance to play around with the crypto stuff on the 3328 in ARMv8 mode?

The 3328 does support the Cortex-A53 crypto extensions (unlike the Pi3 line), so might be interesting to compare the ARM stuff against the dedicated crypto blocks on the 3328.

HW crypto is something I thought I could contribute, but it turns out to be useless in practice, as performance is much worse than the ARMv8 crypto extensions. I actually borrowed the driver from the rk3288. I wouldn't recommend that people try it on the rk3328. Perhaps it's worth something on the rk3288.

The driver works, and you could get it from the mainline kernel if it isn't in Armbian. I believe the crypto block either isn't defined in the DTS or is disabled. That's the case for the rk3328. Re-creating the definition in the DTS is a bit of black magic and requires a bit of guesswork. Overall I think it isn't worth the time, but it's a good learning experience.

My last hope is on the HW RNG that sadly is still lacking a driver!
 
This sounds interesting I wonder if rk3328 is similar. Probably not because I see a noticeable temperature difference with respect to number of cores pushed to the limit.

Reading thru the RK3288 docs - there's actually six power domains.

Each core has its own PD, the cache has one, and then the package.. looks like currently they're only using the package one, which locks all cores to the same timeline....

Code:
Six separate power domains for every core to support internal power switch and
externally turn on/off based on different application scenario

* PD_A17_0: 1st Cortex-A17 + Neon + FPU + L1 I/D Cache
* PD_A17_1: 2nd Cortex-A17 + Neon + FPU + L1 I/D Cache
* PD_A17_2: 3rd Cortex-A17 + Neon + FPU + L1 I/D Cache
* PD_A17_3: 4th Cortex-A17 + Neon + FPU + L1 I/D Cache
* PD_SCU: SCU + L2 Cache controller, and including PD_A17_0, PD_A17_1, PD_A17_2, PD_A17_3, debug logic

One isolated voltage domain to support DVFS

Sent a clinker over to the Armbian forum asking about this... not expecting a firm answer there...
 
HW crypto is something I thought I could contribute but turns out useless in practice as performance is much worse than ARMv8 crypto extension. I actually borrowed the driver from rk3288. I won't recommend people to try for rk3328. Perhaps worth something on rk3288.

The driver works and you could get it from mainline kernel if it isn't in Armbian. I believe the crypto block either isn't defined in DTS or disabled. That's the case for rk3328. To re-create the definition in DTS is a bit black magic and requires a bit guess work. Overall I think it isn't worth the time but a good learning experience.

There have been a couple of issues raised on GitHub about rk_crypto and whether it actually works or not... it's not just device tree, but the stock kernel doesn't have it defined - one has to build a custom kernel to even try...

Haven't really looked at hwrng on the rockchip - most times, I just use haveged, works well enough these days to keep the entropy pool full enough on headless configs.

haveged outperforms Broadcom's hwrng on the Pi's, that much I can say, when rngd is configured to use the hwrng there
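
For a quick look at how the pool is doing on a headless box, here is a minimal Python sketch. It assumes the usual `/proc/sys/kernel/random/entropy_avail` file and the `hw_random` misc device under sysfs. The function names are my own, and a missing file simply comes back as `None` (for example when no hwrng driver is loaded).

```python
from pathlib import Path


def read_text_or_none(path):
    """Read a small text file, returning None when it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def entropy_status(entropy_path="/proc/sys/kernel/random/entropy_avail",
                   rng_path="/sys/class/misc/hw_random/rng_current"):
    """Report the kernel's entropy estimate and the active hwrng backend, if any."""
    raw = read_text_or_none(entropy_path)
    return {
        "entropy_avail": int(raw) if raw is not None else None,
        "hwrng_current": read_text_or_none(rng_path),
    }
```

Calling `entropy_status()` with haveged stopped and then running gives a rough before/after comparison on the same board.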
 
HW crypto is something I thought I could contribute but turns out useless in practice as performance is much worse than ARMv8 crypto extension. I actually borrowed the driver from rk3288. I won't recommend people to try for rk3328. Perhaps worth something on rk3288.

the rk_crypto module is built on armbian, but it's hard to test as the userland doesn't have cryptodev built (if one is looking at openssl) - I've seen reports of some interesting numbers, but not to the scale of other ARM or x86 chips I've looked at in the past.

The thermal issues compound the complexity when trying to even get a baseline - half a mind to just lock the chip to a supportable clock - let's say 600MHz... which kind of defeats the purpose.

Anyways - wondering if I should go down that rat-hole of tuning this board when I have other things to do...

The armbian folks have done a decent job already.

Brings to mind - one can tune a board, but one cannot tuna fish...
 
Part of the challenge is that all four cores are locked together in the time domain - so if one core turbos up to max speed, they all do, even if they're idle...

I was pondering this while working on something else last night. In retrospect, rk3328 should be similar. When one core's frequency changes, the other three follow. So they change together.

I've a lot of things to talk about from the adventure that I went through in the short few weeks. I'll come back to this thread..
 
I was pondering this while working on something else last night. In retrospect, rk3328 should be similar. When one core's frequency changes, the other three follow. So they change together.

Take a look at the mainline device tree for the rk's - some input direct from the Rockchip engineers, along with check-ins from the Chromium team -- there's a fair number of Chrome Things running the rk3288, along with the Mali GPU...

Just gets a bit complicated where one starts with the uBoot and carry over into the device tree options and kernel/app tuning...

I've got a little Allwinner H3 IoT device enroute from Shenzhen - something tells me it'll be just as painful... it's for another project that is actually related to the main site here...

OLED-NEO_en_01.jpg


Nothing against Rockchip or Allwinner - but I've had better luck working with TI/Freescale/Qualcomm/Marvell, and even Broadcom - better docs, better BSP's, better pizza..
 
There's been a couple of issues raised about the rk_crypto and whether it actually works or not over on github... it's not just device tree, but the stock kernel doesn't have it defined - one has to a custom kernel to even try...

rk_crypto is the right driver. I borrowed from there and it works for me. I just went through the rk3288 DTS file in the 4.4.y Rockchip kernel. The crypto block is already there. Perhaps it's missing in the 4.14.x kernel used in the Armbian build. After you add it, the binding should work.

I would suggest getting the kernel built and the module successfully loaded first, then proceeding with tests from user space. Many things can go wrong if people attempt to test from the openssl util. I think it's worth a shot for the rk3288. As I mentioned earlier, don't waste time on the rk3328, because the performance is very poor compared to the ARMv8 built-in crypto. It was like 400MB/s vs 40MB/s on aes-128-cbc..
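
One way to confirm the "module successfully loaded" step before touching openssl is to look at `/proc/crypto`, where every registered implementation shows up with its driver name and priority. Below is a minimal Python sketch that parses that file and lists the implementations of one algorithm, highest priority first. The function names are my own, and the exact driver name the Rockchip module registers is something to look up on your own board rather than take from here.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto content into one dict per registered implementation."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def drivers_for(entries, algorithm):
    """Implementations of one algorithm, sorted by priority (highest first)."""
    matches = [e for e in entries if e.get("name") == algorithm]
    return sorted(matches, key=lambda e: int(e.get("priority", 0)), reverse=True)

# Example call on a real board:
#   with open("/proc/crypto") as f:
#       for e in drivers_for(parse_proc_crypto(f.read()), "cbc(aes)"):
#           print(e["driver"], e["priority"])
```

If the hardware driver is not in the list for `cbc(aes)`, no amount of userland tuning will reach it. If it is listed, the entry with the highest priority is the one the kernel hands out by default.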
 
haveged out performs Broadcom's hwrng on the pi's, that much I can suggest when rngd is configured to use the hwrng there

rk3328 data sheet says there is "true rng" and I guess I just want to see how it works out. Rockchip didn't make a huge noise about the hw crypto on newer chips. Perhaps that says something already about its performance as well as the hw rng maybe.

there's a fair number of Chrome Things running the rk3288, along with the Mali GPU...

Have you had a chance to test Rockchip's MPP/VSP/xyz... all that media stuff? It would be interesting to hear. I have them compiled but haven't had time to try them out, mainly because my SBC is headless. I would want to have them working for batch-mode encoding/decoding if possible.

Probably best to post the story over on https://kazoo.ga/

happy to collaborate on the article if needed...

I plan to, before I forget, though I did make some notes. Before that I have to sort out the VPS that hosts the blog... I still haven't attempted it yet. That's one reason that keeps me from actively adding new content.
 
rk_crypto is the right driver. I borrowed from there and it works for me. I just went through rk3288 dts file in the 4.4.y rock chip kernel. The crypto block is already there. Perhaps it's missing in 4.14.x kernel used in armbian build. After you add that the binding should work.

I would suggest having kernel built and module successfully loaded. Then proceed with test from user space. Many things could go wrong if people attempt to test from openssl util. I think it worth a shot for rk3288. As I mention earlier, don't waste time for rk3328 because the performance is very poor when compared to armv8 built-in crypto. It was like 400MB/s vs 40MB/s on aes-128-cbc..

rk_crypto is definitely built in the armbian images, but it looks like it's half-implemented... e.g. the kernel module exists and is loaded, I'm still looking into whether the ARM TrustZone firmware is present - ChromeOS does appear to load the ARM firmware image, and they do use it to speed up hashing of their BSP per design* there...

* the ChromeOS design docs are indeed very interesting on how they secure their OS.

However, for Armbian or TinkerOS - in the userland neither cryptodev nor af_alg appears to be exposed - even if it were, there's likely some extra effort for userland stuff like OpenSSL to rebuild with the right options for cryptodev, or building the af_alg plugin and adding it to openssl.cnf

There might be fair reasons why all this is not turned on by default - remember all the controversy about RdRand and Via's Padlock stuff for random number generation not being all that random - Fast bad crypto is far worse than slow good crypto... so for some, it's safer to nix the HW elements altogether, and go back to SW where things can be observed and changed if needed.

Going over the 4.4 Rockchip BSP kernel - actually check things out on the main Linux tip for device tree - Rockchip has added/fixed a lot of things there, along with Linaro, and this is part of the mainlining effort not just for the rk3288 evp, but for the various flavors from the OEM's - yes, there's a specific entry for tinker there.

Anyways - while interesting to spend a bit of time on the crypto stuff including the TRNG, my focus at the moment is more on the power domains. The other items are on the to-do list, but much lower priority...
 