Skip to content

Fix an error with nvc++#249

Open
amontoison wants to merge 10 commits into
masterfrom
amontoison-patch-1-1
Open

Fix an error with nvc++#249
amontoison wants to merge 10 commits into
masterfrom
amontoison-patch-1-1

Conversation

@amontoison

@amontoison amontoison commented Apr 9, 2025

Copy link
Copy Markdown
Collaborator

@jfowkes
If you give me access to the settings like GALAHAD, I can use my self-hosted cluster (moonshot) at ANL with NVIDIA GPUs for the tests.

Related issue #225

@jfowkes

jfowkes commented Apr 9, 2025

Copy link
Copy Markdown
Contributor

@amontoison will do, although we know from multiple bug reports that the GPU code is very broken 😞

@jfowkes

jfowkes commented Apr 9, 2025

Copy link
Copy Markdown
Contributor

Looks like we're still seeing that upstream LLVM bug.

@amontoison

amontoison commented Apr 9, 2025

Copy link
Copy Markdown
Collaborator Author

I know :(
But with CI, it will maybe help to fix it.

For the issue with the NVIDIA compilers, it seems that we forget a default value:

int adep_idx = (blk < iblk) ? blk*block_size*lda + iblk*block_size

We probably need something like that:

int c = a < b ? d : e

I am not an expert in C++ so I may be wrong.

@jfowkes

jfowkes commented Apr 9, 2025

Copy link
Copy Markdown
Contributor

I think you are right there, the question is what should the default value be? Indeed what does the GNU compiler assume that it is if it is not explicitly given?

@amontoison

amontoison commented Apr 9, 2025

Copy link
Copy Markdown
Collaborator Author

@jfowkes The : e was on the next line.
I think you need a quite robust parser to handle that.
I just move it to the same line to help the parser / compiler.
Let's see if it works now.

@jfowkes

jfowkes commented Apr 9, 2025

Copy link
Copy Markdown
Contributor

@amontoison okay I can see why their parser wouldn't like that!

Comment thread src/ssids/cpu/kernels/ldlt_app.cxx Outdated
Comment thread src/ssids/cpu/kernels/ldlt_app.cxx Outdated
Comment thread src/ssids/cpu/kernels/ldlt_app.cxx Outdated
@amontoison amontoison changed the title [CI] Test Nvidia compilers Fix an error with nvc++ Apr 9, 2025
@amontoison amontoison force-pushed the amontoison-patch-1-1 branch from bbd4582 to e5dc2bc Compare August 7, 2025 15:56
@amontoison

Copy link
Copy Markdown
Collaborator Author

@jfowkes It seems to still fail with the last version of nvc++.

@jfowkes

jfowkes commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

@nimgould I thought that you managed to get SSIDS to compile with nvfortran and nvc++? Still failing here.

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

nvc++ 25.7-0 64-bit target on x86-64 Linux -tp haswell
nvfortran 25.7-0 64-bit target on x86-64 Linux -tp haswell

From the GALAHAD fork -

Compiling spral_kinds [ OK ]
Compiling spral_kinds_real [ OK ]
Compiling matrix_util [ OK ]
Compiling random [ OK ]
Compiling rutherford_boeing [ OK ]
Compiling cuda_nocuda [ OK ]
Compiling compat [ OK ]
Compiling guess_topology [ OK ]
Compiling omp [ OK ]
Compiling profile [ OK ]
Compiling SymbolicSubtree [ OK ]
Compiling NumericSubtree [ OK ]
Compiling cholesky [ OK ]
Compiling ldlt_app "ldlt_app.cxx", line 1503: warning: variable "adep_idx" was declared but never referenced [declared_but_not_referenced]
ipc_ adep_idx = (blk < iblk) ? blkblock_sizelda + iblk*block_size
^
detected during:
instantiation of "ipc_ spral::ssids::cpu::ldlt_app_internal_dbl::LDLT<T, BLOCK_SIZE, Backup, use_tasks, debug, Allocator>::factor(ipc_, ipc_, ipc_ *, T *, ipc_, T *, Backup &, const spral::ssids::cpu::cpu_factor_options &, spral::ssids::cpu::PivotMethod, ipc_, T, T *, ipc_, std::vector<spral::ssids::cpu::Workspace, std::allocatorspral::ssids::cpu::Workspace> &, const Allocator &) [with T=rpc_, BLOCK_SIZE=32, Backup=spral::ssids::cpu::ldlt_app_internal_dbl::CopyBackup<rpc_, spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>>, use_tasks=false, debug=false, Allocator=spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>]" at line 1009
instantiation of "ipc_ spral::ssids::cpu::ldlt_app_internal_dbl::Block<T, INNER_BLOCK_SIZE, IntAlloc>::factor(ipc_, ipc_ *, T *, const spral::ssids::cpu::cpu_factor_options &, std::vector<spral::ssids::cpu::Workspace, std::allocatorspral::ssids::cpu::Workspace> &, const Allocator &) [with T=rpc_, INNER_BLOCK_SIZE=32, IntAlloc=spral::ssids::cpu::BuddyAllocator<ipc_, std::allocator<rpc_>>, Allocator=spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>]" at line 1833
instantiation of "ipc_ spral::ssids::cpu::ldlt_app_internal_dbl::LDLT<T, BLOCK_SIZE, Backup, use_tasks, debug, Allocator>::run_elim_unpivoted(ipc_, ipc_, ipc_ *, T *, ipc_, T *, spral::ssids::cpu::ldlt_app_internal_dbl::ColumnData<T, spral::ssids::cpu::ldlt_app_internal_dbl::LDLT<T, BLOCK_SIZE, Backup, use_tasks, debug, Allocator>::IntAlloc> &, Backup &, ipc_ *, const spral::ssids::cpu::cpu_factor_options &, ipc_, T, T *, ipc_, std::vector<spral::ssids::cpu::Workspace, std::allocatorspral::ssids::cpu::Workspace> &, const Allocator &) [with T=rpc_, BLOCK_SIZE=32, Backup=spral::ssids::cpu::ldlt_app_internal_dbl::CopyBackup<rpc_, spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>>, use_tasks=true, debug=false, Allocator=spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>]" at line 2387
instantiation of "ipc_ spral::ssids::cpu::ldlt_app_internal_dbl::LDLT<T, BLOCK_SIZE, Backup, use_tasks, debug, Allocator>::factor(ipc_, ipc_, ipc_ *, T *, ipc_, T *, Backup &, const spral::ssids::cpu::cpu_factor_options &, spral::ssids::cpu::PivotMethod, ipc_, T, T *, ipc_, std::vector<spral::ssids::cpu::Workspace, std::allocatorspral::ssids::cpu::Workspace> &, const Allocator &) [with T=rpc_, BLOCK_SIZE=32, Backup=spral::ssids::cpu::ldlt_app_internal_dbl::CopyBackup<rpc_, spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>>, use_tasks=true, debug=false, Allocator=spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>]" at line 2607
instantiation of "ipc_ spral::ssids::cpu::ldlt_app_factor_dbl(ipc_, ipc_, ipc_ *, T *, ipc_, T *, T, T *, ipc_, const spral::ssids::cpu::cpu_factor_options &, std::vector<spral::ssids::cpu::Workspace, std::allocatorspral::ssids::cpu::Workspace> &, const Allocator &) [with T=rpc_, Allocator=spral::ssids::cpu::BuddyAllocator<rpc_, std::allocator<rpc_>>]" at line 2615

Remark: individual warnings can be suppressed with "--diag_suppress "

[ OK ]
Compiling ldlt_nopiv [ OK ]
Compiling ldlt_tpp [ OK ]
Compiling wrappers [ OK ]
Compiling ThreadStats [ OK ]
Compiling hw_topology [ OK ]
Compiling spral_scaling [ OK ]
Compiling match_order [ OK ]
Compiling datatypes [ OK ]
Compiling core_analyse [ OK ]
Compiling pgm [ OK ]
Compiling inform [ OK ]
Compiling contrib [ OK ]
Compiling subtree [ OK ]
Compiling akeep [ OK ]
Compiling blas_iface [ OK ]
Compiling lapack_iface [ OK ]
Compiling cpu_iface [ OK ]
Compiling cpu_subtree [ OK ]
Compiling profile_iface [ OK ]
Compiling gpu_subtree_no_cuda [ OK ]
Compiling anal [ OK ]
Compiling fkeep [ OK ]
Compiling contrib_free [ OK ]
Compiling ssids [ OK ]
GALAHAD: SSIDS (double precision version) compiled successfully
Compiling ssidst [ OK ]

Exhaustive test of subroutine interface to ssids
Warning: ieee_inexact is signaling
FORTRAN STOP
ssids double precision tests

provided ordering
RHS refine partial
ok ok ok

ssids tests completed

@jfowkes

jfowkes commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

@amontoison compiles fine for the GALAHAD fork as above, what's different here to what Nick has?

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

Thus the ldlt_app.cxx error that you see is just classed as a warning with me, and all is ok.

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

(Of course this is with the makefile version, but previously even this failed - maybe we have different compiler flags?)

@jfowkes

jfowkes commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

We're seeing

nvc++ -Ilibspral.so.p -I. -I.. -Iinclude -I../include -Isrc -I../src -I../../deps/include -g -DHAVE_HWLOC 
      -DHAVE_SCHED_GETCPU -mp -fPIC -o libspral.so.p/src_ssids_cpu_kernels_ldlt_app.cxx.o
      -c ../src/ssids/cpu/kernels/ldlt_app.cxx
/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/share/llvm/bin/llc: error:
/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/share/llvm/bin/llc: 
/tmp/nvc++fd6p0IAlRmo.ll:10297:148: error: use of undefined value '%dblk' invoke void @_ZN5spral5ssids3cpu17ldlt_app_internal5BlockIdLi32ENS1_14BuddyAllocatorIiSaIdEEEE6backupINS2_10CopyBackupIdNS4_IdS5_EEEEEEvRT_ (ptr %dblk, ptr  %20) mustprogress

So perhaps we need to ensure that llc here is also the latest version?

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

Here

/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/share/llvm/bin/llc
-rwxr-xr-x 1 root root 67877816 Jul 18 20:21 /opt/nvidia/hpc_sdk/Linux_x86_64/25
.7/compilers/share/llvm/bin/llc

but this comes as part of the nvidia bundle

@jfowkes

jfowkes commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

Ah I know what it might be, @nimgould are you enabling OpenMP? NVIDIA say that this is a bug with the OpenMP flag.

@jfowkes

jfowkes commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

nvc++ uses -mp to enable OpenMP support and upstream think that this is where the problem lies.

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

That is probably it, I have OPENMP = -not_openmp as one of the environment variables for nvfortran calls (it is, e.g., OPENMP = -fopenmp for gfortran)

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

OK, so it is still a compiler bug, but we can work around it at the expense of no openp support. I wonder why nvidia haven't fixed something as crucial as this? But, I guess they are not alone (e.g. NAG)

@nimgould

nimgould commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

(and yes, the unused variable I see is simply because it is only used within an openmp pragma)

@jfowkes

jfowkes commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

I reported it here in April:
https://developer.nvidia.com/bugs/5217463
They said they're able to reproduce it and that someone will take a look.

@amontoison

amontoison commented Aug 8, 2025

Copy link
Copy Markdown
Collaborator Author

@jfowkes @nimgould
I updated the CI script to compile without OpenMP when we use the NVIDIA compilers.
The compilation works now.

 2/39 SPRAL:ssmfe+tests+fortran / ssmfet                FAIL             0.13s   killed by signal 11 SIGSEGV
>>> OMP_PROC_BIND=true LD_LIBRARY_PATH=/home/runner/work/spral/spral/builddir/:/opt/hostedtoolcache/Python/3.12.11/x64/lib MALLOC_PERTURB_=33 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 OMP_CANCELLATION=true MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /home/runner/work/spral/spral/builddir/ssmfet
18/39 SPRAL:ssmfe+examples+fortran / hermitians         FAIL             0.09s   killed by signal 11 SIGSEGV
>>> OMP_PROC_BIND=true LD_LIBRARY_PATH=/home/runner/work/spral/spral/builddir/:/opt/hostedtoolcache/Python/3.12.11/x64/lib UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 MALLOC_PERTURB_=32 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 OMP_CANCELLATION=true MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /home/runner/work/spral/spral/builddir/hermitians

Note that two Fortran tests are failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants